When Should a Streaming Video LLM Speak? Evidence‑Condition Alignment via Explicit Scene Graphs (Response‑G1)
The ACL 2026 paper introduces Response‑G1, a proactive streaming video‑LLM framework that decides when to respond by aligning visual evidence with response conditions. It combines explicit scene‑graph modeling, memory‑augmented retrieval, and trigger‑based decision making. The paper reports improvements of 12.8% and 15.1% on the active tasks of OVO‑Bench and StreamingBench, along with gains in passive settings.
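To make the idea of evidence-condition alignment concrete, here is a toy Python sketch. It is not the paper's implementation: the summary above does not specify data structures or algorithms, so every name here (`SceneGraphMemory`, `should_respond`, `first_trigger_step`) is hypothetical. The sketch assumes that a scene graph can be reduced to (subject, relation, object) triples and that a response condition is a set of such triples, and it substitutes exact set matching for whatever retrieval and trigger models the actual system uses.

```python
from dataclasses import dataclass, field

# Assumption: a scene-graph edge is a (subject, relation, object) triple.
Triple = tuple[str, str, str]


@dataclass
class SceneGraphMemory:
    """Hypothetical store of all triples observed so far in the stream."""

    triples: set[Triple] = field(default_factory=set)

    def add_frame(self, frame_triples: set[Triple]) -> None:
        # Accumulate the evidence extracted from the newest frame.
        self.triples |= frame_triples

    def retrieve(self, condition: set[Triple]) -> set[Triple]:
        # Toy retrieval: exact matching against the stored triples.
        return self.triples & condition


def should_respond(memory: SceneGraphMemory, condition: set[Triple]) -> bool:
    # Toy trigger: fire once every triple in the condition has been observed.
    return memory.retrieve(condition) == condition


def first_trigger_step(stream: list[set[Triple]], condition: set[Triple]):
    """Return the index of the first frame at which the trigger fires, or None."""
    memory = SceneGraphMemory()
    for step, frame_triples in enumerate(stream):
        memory.add_frame(frame_triples)
        if should_respond(memory, condition):
            return step
    return None


# Illustrative data only; not taken from the paper or its benchmarks.
stream = [
    {("person", "holds", "cup")},
    {("person", "near", "table")},
    {("cup", "on", "table")},
]
condition = {("person", "holds", "cup"), ("cup", "on", "table")}
trigger_step = first_trigger_step(stream, condition)
```

In this sketch the trigger stays silent on the first two frames and fires on the third, when the last piece of evidence required by the condition enters memory. A real system would need soft matching between evidence and conditions rather than exact triple equality.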
