Implementation of a Lightweight Real-Time Face Sticker for Live Streaming on iOS
This article details the design and implementation of a low‑overhead face‑sticker solution for live‑streaming iOS apps, covering technical requirements, chosen libraries, JSON sticker format, 3D transformation pipelines, landmark smoothing, Euler‑angle handling, and the resulting performance advantages.
To meet a business need for live‑streaming, a lightweight face‑sticker system was built for iOS, focusing on high performance, a minimal increase in package size, low intrusiveness, and ease of use.
Technical analysis identified four key constraints: maintain visual quality, optimise CPU/GPU consumption, keep the binary size small, and ensure a non‑intrusive integration.
The final stack consists of GPUImage for video processing, Apple’s Vision framework for face detection, and OpenGL for compositing the sticker onto the video frames. Vision was chosen because it adds no binary overhead and works from iOS 12 onward.
Sticker JSON parsing extracts position, rotation centre, scale points and trigger events. Example snippet:
{
  "position": {
    "flag": 49,
    "x": 185.59,
    "y": 299.39
  },
  "rotateCenter": [{
    "flag": 49,
    "x": 239.92,
    "y": 357.87
  }],
  "scale": {
    "pointA": {"flag": 6, "x": 185.59, "y": 299.39},
    "pointB": {"flag": 13, "x": 293.67, "y": 298.72}
  },
  "begin": "sticker1.complete"
}

The sticker is then transformed in three‑dimensional space. Five coordinate systems are involved: Vision, model, world, camera and homogeneous (NDC). Vision points are first converted to model space, then the classic Model‑View‑Projection (MVP) matrices are applied:
_modelViewMatrix = GLKMatrix4Identity;                                 // model at the origin
_viewMatrix = GLKMatrix4MakeLookAt(0, 0, 6, 0, 0, 0, 0, 1, 0);         // camera at z = 6, looking at the origin, y up
float ratio = outputFramebuffer.size.width / outputFramebuffer.size.height;
_projectionMatrix = GLKMatrix4MakeFrustum(-ratio, ratio, -1, 1, 3, 9); // near = 3, far = 9

After MVP multiplication, the sticker vertices (including a manually added z = 0 and w = 1) are fed to the vertex shader, where additional Euler‑angle rotations are applied.
Landmark smoothing addresses jitter in Vision’s points. After testing Kalman and Savitzky‑Golay filters, a dynamic first‑order low‑pass filter was adopted:

Y(n) = α·X(n) + (1−α)·Y(n−1)

The filter coefficient α is increased when the landmark motion is consistent and decreased (or reset) when the direction changes, achieving both responsiveness and stability.
Euler‑angle handling notes that Vision provides roll and yaw, while pitch is reported only from iOS 15 onward. For a more accurate head pose, a PnP solution via OpenCV’s solvePnP was explored, though its accuracy degrades beyond 45° of yaw.
Project advantages include a tiny binary footprint (no extra native libraries), CPU increase of only 5‑20 % on a single core, memory growth of 10‑20 MB, seamless streaming integration, modular JSON conversion, and easy extensibility thanks to the GPUImage architecture.
Additional notes cover buffer‑cache management, a brief introduction to Euler angles, homogeneous coordinates, and MVP matrix theory.
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.