Tagged articles
19 articles
Machine Learning Algorithms & Natural Language Processing
May 22, 2026 · Artificial Intelligence

Li Mu Returns to Bilibili with a Real-Time AI Avatar

Li Mu (沐神) returns to Bilibili after a year to showcase Higgs Avatar v1, a fully AI‑generated real‑time digital human that can listen, speak, lip‑sync and display facial expressions. The article reports 16 ms per frame on a single H100 GPU, surveys potential applications ranging from customer service to training, and raises ethical considerations about identity and trust.

AI Avatar · Boson AI · Higgs Avatar
7 min read
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Where Is End‑to‑End Speech AI Heading? Product vs Engineering Perspectives

The article clarifies the dual meaning of “end‑to‑end” in speech AI (product simplicity and engineering unification), then outlines six emerging trends: real‑time conversational latency, multilingual robustness, token‑based audio pipelines, voice‑specific security, edge privacy, and the growing importance of data quality and reproducibility.

End-to-End · Real-Time Interaction · Speech AI
8 min read
Lao Guo's Learning Space
Apr 21, 2026 · Artificial Intelligence

HappyOyster: Build an Explorable Interactive World with a Single Prompt

Alibaba’s ATH team unveiled HappyOyster, a real‑time world‑model platform that lets users generate and explore interactive 3D environments from a single sentence or image. It offers two modes, Wander for exploration and Direct for creation, and the article details its streaming architecture, multimodal foundation, competitive advantages, use cases, and current limitations.

AI Video · Game Development · Real-Time Interaction
11 min read
Machine Heart
Apr 20, 2026 · Artificial Intelligence

AURA: Real-Time Video Understanding Shifts from Post-Play Q&A to Continuous Interaction

AURA introduces an always‑on video LLM that processes streams frame‑by‑frame and decides when to stay silent or answer, using a dual sliding‑window context and a Silent‑Speech Balanced Loss. It achieves state‑of‑the‑art scores on StreamingBench, OVO‑Bench and OmniMMI, and runs at 2 FPS with ~312 ms end‑to‑end latency on two 80 GB GPUs.

AURA · Real-Time Interaction · Silent-Speech Loss
15 min read
21CTO
Nov 4, 2025 · Artificial Intelligence

LongCat-Flash-Omni: How an Open-Source 560B Model Achieves Real-Time Multimodal Mastery

LongCat-Flash-Omni, an open‑source 560 billion‑parameter multimodal model, combines efficient Shortcut‑Connected MoE architecture with advanced perception and speech modules to deliver low‑latency real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, video, and audio tasks.

Efficient Inference · Large Language Model · Real-Time Interaction
10 min read
Meituan Technology Team
Nov 3, 2025 · Artificial Intelligence

LongCat-Flash-Omni: 560B Open‑Source Multimodal Model with Real‑Time Interaction

LongCat-Flash-Omni, the latest open‑source model from Meituan, combines a 560 billion‑parameter architecture, efficient multimodal perception and speech reconstruction modules, and a progressive training strategy to deliver real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, audio, and video tasks.

AI · Large Language Model · Multimodal
9 min read
Instant Consumer Technology Team
Oct 31, 2025 · Cloud Computing

How WebRTC Enables Millisecond‑Level Dual‑Direction Streaming in Cloud‑Based Mobile Testing

This article explains how a cloud testing platform leverages WebRTC to achieve sub‑200 ms bidirectional video transmission, enabling ultra‑low‑latency screen casting and remote camera feed replacement for mobile devices, and details the architecture, optimizations, performance gains, and future enhancements.

Mobile Automation · Real-Time Interaction · WebRTC
20 min read
Instant Consumer Technology Team
Jun 19, 2025 · Artificial Intelligence

Exploring II-Agent: An Open‑Source AI Agent Framework for Multi‑Domain Automation

II-Agent is an open‑source, multi‑domain AI agent framework that leverages powerful large language models, a rich toolset, planning‑and‑reflection mechanisms, and advanced context management to enable autonomous task execution, real‑time interaction, and seamless integration across development, data analysis, and enterprise workflows.

AI agent · Context Management · Large Language Model
21 min read
KooFE Frontend Team
May 22, 2025 · Artificial Intelligence

How AG-UI Protocol Bridges AI Agents and User Interfaces for Real‑Time Collaboration

The AG-UI (Agent User Interaction) protocol standardizes communication between backend AI agents and front‑end interfaces using a single JSON event stream, addressing real‑time streaming, tool orchestration, shared state, concurrency, security, and framework fragmentation to enable seamless human‑agent collaboration.

AG-UI · AI Agents · Protocol
8 min read
AI Frontier Lectures
Apr 10, 2025 · Artificial Intelligence

How WonderTurbo Generates Interactive 3D Worlds in Just 0.72 Seconds

WonderTurbo introduces a real‑time 3D scene generation pipeline that accelerates both geometry and appearance modeling to under a second per view, using StepSplat, QuickDepth, and FastPaint modules, achieving up to 15× speedup while maintaining high visual quality.

3D generation · Depth Completion · Geometry Modeling
16 min read
DataFunSummit
Dec 25, 2024 · Artificial Intelligence

Design and Implementation of a Multimodal Real-Time Voice AI Teammate for Naraka: Bladepoint

This article explains the design, implementation, and underlying Agent‑Oriented‑Programming framework of NetEase Fuxi’s multimodal real‑time voice AI teammate for the mobile game ‘Naraka: Bladepoint’, highlighting its capabilities such as autonomous navigation, combat assistance, natural dialogue, teaching, and broader applications of voice technology in games.

Naraka Bladepoint · Real-Time Interaction · agent-oriented programming
12 min read
Bilibili Tech
Sep 13, 2024 · Backend Development

Architectural Evolution of Bilibili Live Interaction Center

To solve duplicated functionality, legacy code, and scalability limits in Bilibili’s live‑streaming interaction services, the team created a unified Interaction Center. It abstracts RTC, consolidates session, link, UI, scoring and role management, and introduces a shared state machine and tracing. The architecture evolved in phases toward extensibility, higher performance and maintainability.

RTC · Real-Time Interaction · live streaming
22 min read
Bilibili Tech
May 30, 2023 · Backend Development

Evolution of Interactive Live Streaming: Bilibili's Open Platform Journey

Bilibili’s live‑streaming tech team created an open interactive platform, spurred by the 600,000‑viewer success of Xiu Gou Nightclub, that supports hang‑up, host‑enhanced, and tool‑assisted streams. The platform provides SDKs, APIs, and data‑compliant authentication, tackles latency and rendering challenges, and now explores advertising, sponsorship and game‑promotion models to sustain its ecosystem.

Real-Time Interaction · SDK Development · WebSocket communication
13 min read
DataFunSummit
Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar · 3D avatar · Real-Time Interaction
15 min read
Baidu Geek Talk
Sep 7, 2022 · Artificial Intelligence

Design and Architecture of AI Digital Human Live Streaming System

The paper presents a cloud‑native architecture for AI‑driven digital‑human live‑streaming, detailing three‑layer asset, interaction, and media modules, real‑time script and Q&A scheduling, fault‑tolerant rendering and control services, and demonstrates how virtual anchors can deliver continuous, lifelike 24/7 e‑commerce streams.

AI · Real-Time Interaction · System Architecture
21 min read
Tencent Cloud Developer
Sep 4, 2020 · Frontend Development

Introducing TWebLive: Tencent Cloud Web Live Interactive SDK

TWebLive, Tencent Cloud’s new web‑live interactive SDK, bundles TRTC, TIM and TCPlayer to let developers add push streaming, low‑latency WebRTC or CDN playback, and real‑time chat or bullet‑screen interaction with simple APIs, demo projects and open‑source code, replacing legacy Flash solutions.

JavaScript · Real-Time Interaction · SDK
11 min read
Youku Technology
Nov 21, 2019 · Industry Insights

How Alibaba Delivered a Global 4K Dolby‑Atmos Live Stream to 200+ Countries

Alibaba Entertainment’s 2019 Double‑11 "Cat Night" showcased a suite of streaming technologies, including multi‑angle frame alignment, Dolby‑Atmos audio, low‑latency SRT transport, smart bitrate, edge‑cloud distribution, and a zero‑loss quality‑assurance system, that enabled a seamless 4K experience for viewers in over 200 countries.

Dolby Atmos · Real-Time Interaction · edge-cloud
9 min read
Alibaba Cloud Developer
Jan 16, 2019 · Artificial Intelligence

How Alibaba’s AliPlayStudio Powers Real‑Time AI Video Interactions on Mobile

This article details the research and engineering behind Alibaba's AliPlayStudio, a video‑interactive platform that combines computer‑vision algorithms such as human parsing, gesture and pose detection, and controllable style transfer, all optimized for real‑time deployment on low‑power mobile and embedded devices.

Real-Time Interaction · gesture recognition · mobile AI
17 min read
Alibaba Cloud Developer
Jan 3, 2017 · Backend Development

How Alibaba Engineered Real‑Time, Cross‑Device Interaction for the 2016 Double‑11 Live Show

The article details Alibaba's technical innovations for the 2016 Double‑11 live event, covering two‑way audience interaction, time‑offset synchronization, massive real‑time like ranking, AR cross‑screen features, and the custom internet‑director console that together enabled seamless, high‑concurrency, multi‑platform engagement.

AR · Backend Engineering · Real-Time Interaction
14 min read