LiteKit: Baidu's Mobile AI Deployment Framework for Fast AI Capability Integration
LiteKit, Baidu’s mobile AI deployment framework built on Paddle Lite, delivers out‑of‑the‑box video super‑resolution, human segmentation and gesture‑recognition SDKs that reduce integration complexity to three simple steps across Objective‑C, Java and C++, achieving real‑time performance (25 FPS) while lowering development effort and platform barriers.
As more AI scenarios are directly deployed on mobile devices, the advantages of mobile-side AI include real-time processing, bandwidth savings, and enhanced security. These AI capabilities bring tremendous imagination to mobile products and promote the prosperity of the mobile internet's next phase.
Behind mobile intelligence, there are mobile developers and AI algorithm engineers. In actual business development, AI algorithms developed by algorithm engineers need to be delivered to mobile developers for engineering implementation. This implementation chain involves two main challenges:
High access threshold: a product team with no prior AI experience must handle not only model training and inference-engine integration, but also model-specific data pre- and post-processing, including conversions between color spaces and image storage formats. This work can span multiple languages, such as Python, C/C++, Objective-C, and Java.
Heavy workload: integrating an AI capability is complex, involving engine-call development, pre/post-processing development, and even concurrency handling. Supporting multiple business scenarios with portable, reusable, modular code can double the workload.
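To make the pre-processing burden concrete, here is a minimal sketch of the kind of color-space glue code a team would otherwise write by hand: converting one YUV pixel to RGB. The integer BT.601 video-range approximation used here is an illustrative choice, not LiteKit's internal implementation.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

static uint8_t clamp8(int v) {
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Integer approximation of the BT.601 matrix (video range: Y in 16-235).
Rgb YuvToRgb(uint8_t y, uint8_t u, uint8_t v) {
    int c = y - 16, d = u - 128, e = v - 128;
    return Rgb{
        clamp8((298 * c + 409 * e + 128) >> 8),
        clamp8((298 * c - 100 * d - 208 * e + 128) >> 8),
        clamp8((298 * c + 516 * d + 128) >> 8),
    };
}
```

Multiply this by every model's input format, on every platform, and the "high access threshold" above becomes clear.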
LiteKit provides a solution to these problems. It offers video super-resolution, human segmentation, and gesture recognition AI capabilities, all derived from Baidu's years of technical accumulation in various AI business scenarios, provided in the form of SDKs for out-of-the-box use.
Video Super-Resolution
For mobile application scenarios, what matters most in video super-resolution is not pursuing the most extreme super-resolution quality, but balancing quality against performance. The main goal of mobile video super-resolution is to enhance and reconstruct the picture while sustaining 25FPS on mobile devices.
LiteKit's video super-resolution is the first openly available solution in the industry to reach 25FPS, supporting 360p-to-480p super-resolution in real time. It can also consume YUV420 video frame data directly as decoded by the player and return output in the same format, eliminating extra data-format conversion, which both simplifies integration and improves processing performance.
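Skipping format conversion matters because planar YUV420 (I420) has a specific byte layout: a full-resolution luma plane followed by two quarter-resolution chroma planes. A small sketch of that layout (names are illustrative):

```cpp
#include <cstddef>

struct Yuv420Layout {
    size_t ySize, uSize, vSize, total;
};

// Planar YUV420: one Y byte per pixel, U and V subsampled 2x2.
Yuv420Layout LayoutFor(size_t width, size_t height) {
    size_t y = width * height;
    size_t c = (width / 2) * (height / 2);
    return {y, c, c, y + 2 * c};  // total = width * height * 3 / 2
}
```

For a 640x360 (360p) frame this gives a 230,400-byte Y plane and 57,600 bytes per chroma plane; consuming and producing this layout directly means no per-frame repacking on either side of the predictor.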
Performance test results show that on iPhone XS Max (2018), video super-resolution can achieve 32.15ms per frame prediction speed, supporting 25FPS super-resolution. On the latest iPhone 12, prediction speed is further improved by 30% compared to iPhone XS Max. On Huawei nova 5 Pro (released in 2019), video super-resolution can achieve 23.84ms per frame, fully supporting 25FPS super-resolution.
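The arithmetic behind these claims: at 25FPS, each frame has a 1000 / 25 = 40 ms budget, so a 32.15 ms prediction leaves headroom for decode and render work. As a sketch:

```cpp
// Per-frame time budget at a given frame rate, in milliseconds.
double FrameBudgetMs(double fps) { return 1000.0 / fps; }

// A predictor sustains the target rate if it fits inside the budget.
bool SustainsFps(double predictMs, double fps) {
    return predictMs <= FrameBudgetMs(fps);
}
```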
Gesture Recognition
LiteKit provides a gesture recognition AI capability that can accurately detect the rectangular coordinates of the gesture's location, the gesture type, and a confidence score. It supports six gesture types: hand, five-finger gesture, V gesture, fist, one-finger gesture, and OK gesture.
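A detection result of this shape (rectangle, label, confidence) typically gets filtered by a confidence threshold before use. The following struct and helper are a hypothetical sketch, not LiteKit's actual API:

```cpp
#include <string>
#include <vector>

// Illustrative shape of a gesture detection result.
struct GestureResult {
    float x, y, width, height;  // bounding rectangle in pixel coordinates
    std::string label;          // e.g. "fist", "OK"
    float confidence;           // 0.0 - 1.0
};

// Keep only detections at or above a confidence threshold.
std::vector<GestureResult> FilterByConfidence(
        const std::vector<GestureResult>& in, float minConf) {
    std::vector<GestureResult> out;
    for (const auto& r : in)
        if (r.confidence >= minConf) out.push_back(r);
    return out;
}
```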
Human Segmentation
LiteKit provides real-time human segmentation capability that can accurately segment human figures from backgrounds, usable for background removal, portrait matting, photo synthesis, background replacement, and other business scenarios.
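Background replacement with a segmentation mask reduces, per pixel, to an alpha blend of the original frame over the new background weighted by the mask. A minimal single-channel sketch (assuming a mask where 255 means "person" and 0 means "background"):

```cpp
#include <cstdint>

// Integer alpha blend: out = fg * a + bg * (1 - a), where a = mask / 255.
uint8_t Composite(uint8_t fg, uint8_t bg, uint8_t mask) {
    return static_cast<uint8_t>((fg * mask + bg * (255 - mask) + 127) / 255);
}
```

Applying this per channel across the frame yields background removal, replacement, or photo synthesis, depending on what the background buffer holds.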
Integration Process
LiteKit's integration requires only three steps: creating a predictor, performing inference, and releasing the predictor. Although video super-resolution, human segmentation, and gesture recognition have different inputs and outputs, the overall process and API style are abstracted into similar steps, and even API naming is kept highly consistent to minimize learning costs.
Objective-C Demo:
// 1. Create predictor. The creation interface is synchronous and requires no additional configuration.
self.srVideo = [LiteKitVideoSuperResolutionor createVideoSuperResolutionorWithError:&error];
// 2. Inference. For video super-resolution, only input image and scale factor are needed to obtain the super-resolved image.
UIImage *newImg = [self.srVideo superResolutionWithUIImage:inputImage scale:1.0 error:&error];
// 3. Release predictor.
self.srVideo = nil;

Java Demo:
// 1. Create predictor. The creation interface is synchronous and requires no additional configuration.
Long handle = VideoSuperResolution.init(this);
// 2. Inference. For video super-resolution, only input image, scale factor, and predictor handle are needed to obtain the super-resolved image.
Bitmap bitmap = VideoSuperResolution.nativePreditBitmap(handle, lowBitmap, scale);
// 3. Release predictor.
VideoSuperResolution.nativeReleaseSrSdk(handle);

LiteKit Architecture
LiteKit is divided into three layers from bottom to top:
Paddle Lite (Bottom layer): Open-source inference engine provided by Baidu's PaddlePaddle deep learning platform, capable of performing inference on CPU, GPU, and other environments.
LiteKitCore Framework Layer (Middle layer): Isolates business parties from direct dependence on Paddle Lite and provides consistent Objective-C, Java, and C++ APIs, basic structure and data type definitions, and common tool sets to upper layers.
LiteKit Business Layer (Top layer): Encapsulates capabilities such as human segmentation, video super-resolution, and gesture recognition based on different businesses. LiteKit's capabilities are continuously expanding.
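The value of the middle layer is the isolation it provides: business code programs against an abstract interface, so the underlying engine can change without touching callers. A minimal sketch of that pattern, with all class names illustrative rather than LiteKit's real ones:

```cpp
#include <memory>
#include <vector>

// Business code depends only on this interface, never on the engine.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;
    virtual std::vector<float> Run(const std::vector<float>& input) = 0;
};

// Stand-in for a Paddle Lite-backed implementation; the body is a
// placeholder (doubles each value) where real inference would run.
class PaddleLiteBackend : public InferenceBackend {
public:
    std::vector<float> Run(const std::vector<float>& input) override {
        std::vector<float> out(input);
        for (float& v : out) v *= 2.0f;
        return out;
    }
};

// The framework layer hands out a backend without exposing
// engine-specific configuration to the caller.
std::unique_ptr<InferenceBackend> CreateBackend() {
    return std::make_unique<PaddleLiteBackend>();
}
```

Swapping the engine then means adding a new `InferenceBackend` subclass; the business layer above it is untouched.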
After mobile AI developers integrate LiteKitCore, they no longer need to worry about complex operations such as inference engine configuration and model loading. LiteKit internally manages most complex configurations of the inference engine while still supporting flexible configuration of key parameters such as model addresses.
LiteKitCore provides three sets of interfaces (Java/Objective-C/C++) to AI capability developers above, which can greatly reduce the integration cost for end-side AI developers.
C++ Example for CPU AI Inference:
// 1. Create config
litekit_framework::LiteKitConfig config;
// machine_type setting
config.machine_type = litekit_framework::LiteKitConfig::MachineType::PaddleLite;
litekit_framework::LiteKitConfig::PaddleLiteConfig paddle_config;
// Set model path
paddle_config.model_type = litekit_framework::LiteKitConfig::PaddleLiteConfig::LITE_MODEL_FROM_FILE;
paddle_config.model.model_from_file.data = fileDir.data;
paddle_config.model.model_from_file.size = fileDir.length;
config.machine_config.paddle_lite_config = paddle_config;
/* Some unimportant property settings omitted */
// 2. Load machine
std::shared_ptr<litekit_framework::LiteKitMachineService> service = litekit_framework::CreateLiteKitMachineService(config);

C++ Inference Example:
// 1. Create input
std::unique_ptr<litekit_framework::LiteKitData> inputData = service->getInputData(0);
litekit_framework::LiteKitTensor *ainput = inputData->litekitTensor;
// 2. Execute predict
service->run();
// 3. Get output
std::unique_ptr<const litekit_framework::LiteKitData> outputData = service->getOutputData(0);
const litekit_framework::LiteKitTensor *output_tensor = outputData->litekitTensor;

LiteKit, as PaddlePaddle's mobile deployment tool, builds on Paddle Lite, Baidu's lightweight inference engine, to deploy AI capabilities quickly. It enables AI capabilities to be engineered and shipped in any app and any scenario, letting developers add AI features to their products with minimal effort.
In the near future, LiteKit will also open more capabilities such as OCR and support more business scenarios.
Baidu App Technology
Official Baidu App Tech Account