Real-Time Video Stream Subject Recognition for E-commerce (MetaSight)
MetaSight is a real‑time video‑stream subject recognition system that replaces the traditional capture‑upload‑search flow with continuous, on‑camera product identification. It combines a sub‑10 MB edge model, global IDs for frame‑to‑frame continuity, edge‑cloud collaboration, and batch processing to cut interaction steps, lower server load, and prepare for future AR/XR shopping experiences.
This article introduces a real‑time video stream subject recognition scenario for e‑commerce, dubbed “MetaSight”. It explains how the traditional image‑search workflow (capture → upload → result page) is being transformed into a continuous video‑stream search.
The business value includes optimizing the existing image‑search mode, providing richer contextual information directly on the camera view, reducing user interaction steps, enhancing shopping decision‑making, and laying groundwork for future XR shopping experiences.
Key technical challenges identified are: maintaining subject continuity across frames, fitting edge‑side models within a sub‑10 MB footprint, and balancing performance with user experience in a high‑throughput video stream.
The proposed solution addresses three problems: (1) assigning a global unique ID to the same subject across consecutive frames, (2) using edge‑cloud collaboration to overcome resource limits and improve accuracy, and (3) employing batch processing to reduce network requests and server load.
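To make point (1) concrete, here is a minimal Python sketch of how a subject could keep one global ID across consecutive frames. The article does not say how MetaSight matches subjects, so the greedy intersection‑over‑union (IoU) matching, the `SubjectTracker` and `Box` names, and the 0.5 threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a detected subject (pixel coordinates)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, in the range [0, 1]."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


class SubjectTracker:
    """Hypothetical tracker: a detection that overlaps a subject from the
    previous frame inherits that subject's ID; otherwise it gets a new one."""

    def __init__(self, iou_threshold: float = 0.5) -> None:
        self.iou_threshold = iou_threshold  # assumed value, not from the article
        self._ids = count(1)
        self._last_seen: dict[int, Box] = {}

    def update(self, detections: list[Box]) -> list[int]:
        """Return one global ID per detection, in input order."""
        unmatched = dict(self._last_seen)  # previous-frame subjects not yet claimed
        current: dict[int, Box] = {}
        assigned: list[int] = []
        for det in detections:
            best_id, best_iou = None, self.iou_threshold
            for sid, box in unmatched.items():
                score = iou(det, box)
                if score >= best_iou:
                    best_id, best_iou = sid, score
            if best_id is None:
                best_id = next(self._ids)  # new subject entering the view
            else:
                del unmatched[best_id]  # each previous subject is matched at most once
            current[best_id] = det
            assigned.append(best_id)
        # Subjects absent from this frame are forgotten in this simple sketch.
        self._last_seen = current
        return assigned
```

Calling `update` once per frame returns stable IDs as long as a subject's box overlaps its previous position enough. A production tracker would likely also tolerate a few missed frames and use appearance features, which this sketch leaves out.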
The architecture consists of five layers: the UI layer (card lifecycle management), the data layer (subject caching and batch upload), task management, the algorithm unit, and the base platform, which is built on the in‑house MNN framework with supporting middleware.
The data layer pre‑downloads models during idle time, uses edge‑cloud feedback loops to correct algorithm results, and applies a cache‑plus‑batch strategy to make image‑search requests efficient.
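The cache‑plus‑batch idea can be sketched as follows. This is an illustration under stated assumptions, not the MetaSight data layer: the `BatchUploader` class, the `send` callback, and the batch size of 4 are hypothetical, and subject crops are represented as raw bytes keyed by global ID.

```python
from __future__ import annotations

from typing import Callable

Batch = list[tuple[int, bytes]]  # (global subject ID, cropped image bytes)


class BatchUploader:
    """Hypothetical cache-plus-batch buffer for image-search requests."""

    def __init__(self, send: Callable[[Batch], None], batch_size: int = 4) -> None:
        self._send = send              # assumed callback that performs one network request
        self._batch_size = batch_size  # assumed value, not from the article
        self._pending: dict[int, bytes] = {}
        self._uploaded: set[int] = set()

    def submit(self, subject_id: int, crop: bytes) -> None:
        """Queue a subject crop; subjects already uploaded are skipped."""
        if subject_id in self._uploaded:
            return
        # A newer crop of the same pending subject replaces the older one,
        # so each subject costs at most one slot in the next request.
        self._pending[subject_id] = crop
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything pending as a single request."""
        if not self._pending:
            return
        batch = list(self._pending.items())
        self._pending.clear()
        self._uploaded.update(sid for sid, _ in batch)
        self._send(batch)
```

Because subjects are keyed by global ID, a product that stays in view for many frames triggers one upload rather than one per frame. A real implementation would probably also call `flush` on a timer so a lone subject is not left waiting for the batch to fill.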
The UI layer manages three card types (fallback, small, and large) using the Weex 2.0 cross‑platform framework. It implements collision avoidance and dynamic rendering to keep user interaction smooth.
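One simple way to picture collision avoidance is a greedy pass that shows higher‑priority cards first and hides any card that would overlap one already on screen. The article does not describe the actual rule, so the `Card` fields, the `priority` value, and the hide‑on‑overlap policy below are assumptions chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical on-screen card anchored to a recognized subject."""
    subject_id: int
    x: float        # left edge
    y: float        # top edge
    width: float
    height: float
    priority: int   # higher value wins a collision (assumed field)


def overlaps(a: Card, b: Card) -> bool:
    """True when the two card rectangles share any area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def visible_cards(cards: list[Card]) -> list[Card]:
    """Greedy layout: keep a card only if it collides with no kept card."""
    placed: list[Card] = []
    for card in sorted(cards, key=lambda c: c.priority, reverse=True):
        if not any(overlaps(card, other) for other in placed):
            placed.append(card)
    return placed
```

A fuller version might shrink or reposition a colliding card instead of hiding it, for instance by switching between the card sizes the UI layer supports.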
In summary, the “real‑time search” solution provides an end‑to‑end video‑stream search pipeline, improves user experience, and positions the product for future AR/XR integration under the MetaSight vision.
DaTaobao Tech
Official account of DaTaobao Technology