Tagged articles
12 articles
Machine Heart
May 27, 2026 · Artificial Intelligence

CVPR 2026: Learning Camera Pose from 10M Unlabeled Driving Videos

LA‑Pose shows that a model can acquire accurate camera pose estimation for autonomous driving by self‑supervised pretraining on roughly ten million unlabeled driving video clips and fine‑tuning with only a small amount of high‑quality 3D annotations, achieving over 10% accuracy gains while drastically reducing labeling cost.

Autonomous Driving · CVPR 2026 · LA-Pose
0 likes · 8 min read
Machine Learning Algorithms & Natural Language Processing
May 26, 2026 · Artificial Intelligence

AI Trends in Medical Imaging: From Recognition to Workflow Automation (CVPR'26)

The article reviews CVPR 2026 medical imaging papers, highlighting a shift from pure image recognition toward efficient model adaptation, clinical semantic understanding, and cross‑modal reasoning, with examples ranging from simple AI agents optimizing workflows to multimodal foundation models for CT, ultrasound, spatial transcriptomics, IMU‑video alignment, and dual‑view X‑ray analysis.

AI · CVPR 2026 · Multimodal
0 likes · 24 min read
Machine Heart
May 5, 2026 · Artificial Intelligence

Monocular Open‑Vocabulary Occupancy Prediction Sets New SOTA for Indoor 3D Scenes (CVPR 2026 Oral)

The paper introduces LegoOcc, a monocular open‑vocabulary occupancy framework that unifies geometry and semantics via language‑embedded Gaussians, uses Poisson‑based aggregation and progressive temperature decay, and achieves over twice the previous mIoU on Occ‑ScanNet while running at 22.47 FPS, making it well suited for embodied robots.

3D vision · CVPR 2026 · Monocular
0 likes · 12 min read
Machine Heart
Apr 23, 2026 · Artificial Intelligence

UniLS: End-to-End Audio-Driven Framework Eliminates the ‘Poker Face’ in Digital Human Dialogue

UniLS, the first end‑to‑end audio‑driven framework that jointly generates speaking and listening facial motions for digital humans, achieves state‑of‑the‑art speaking accuracy, improves listening naturalness by 44.1%, and runs at over 500 FPS, as shown by the extensive quantitative experiments and user studies in the CVPR 2026‑accepted paper.

CVPR 2026 · audio-driven animation · digital humans
0 likes · 9 min read
Machine Heart
Apr 12, 2026 · Artificial Intelligence

CVPR 2026 WorldArena Challenge Launches with Amap’s Open‑Source High‑Performance World Model Baseline

The CVPR 2026 WorldArena Challenge, organized by top academic institutions and Amap, introduces a new evaluation framework that tests video world models for physical realism and functional utility, while Amap releases its high‑performance ABot‑PhysWorld model and benchmark scores that set a new state‑of‑the‑art.

ABot-PhysWorld · CVPR 2026 · Physical Consistency
0 likes · 9 min read
Machine Heart
Apr 12, 2026 · Artificial Intelligence

Breaking Camera Dependence: M4Human Advances Millimeter-Wave Human Perception to New Levels

The M4Human paper introduces a large‑scale multimodal mmWave radar benchmark for high‑fidelity human mesh reconstruction, detailing its data collection pipeline, annotation quality, benchmark splits, a raw‑radar‑tensor baseline (RT‑Mesh), and extensive experiments that show radar’s privacy‑friendly robustness and complementary strength to visual sensors.

CVPR 2026 · M4Human · RF dataset
0 likes · 13 min read
Machine Heart
Apr 10, 2026 · Artificial Intelligence

Ant AI Wins CVPR 2026 Challenge: A Powerful Countermeasure Against Deepfake Abuse

Amid rising deepfake misuse in entertainment, Ant Group’s AI Security Lab won the CVPR 2026 NTIRE Robust AIGC Image Detection challenge with a ROC AUC of 0.9723, presenting a DINOv3‑based robust detection framework, extensive multi‑source data, and novel augmentation and optimization techniques to combat the abuse of AI‑generated content.

AIGC · CVPR 2026 · DINOv3
0 likes · 10 min read
Machine Heart
Apr 8, 2026 · Artificial Intelligence

From a Single Image to a Physically Realistic 4D Video in One Minute

PhysGM, a CVPR 2026 paper from the Beijing Institute of Technology and Li Auto, transforms a single static image into a high‑fidelity 4D video that obeys real‑world physics in under a minute, using a dual‑decoder transformer, DPO alignment, and a newly built 50k‑item PhysAssets dataset, and outperforms prior methods in speed and quality.

3D Gaussian Splatting · CVPR 2026 · Direct Preference Optimization
0 likes · 7 min read
vivo Internet Technology
Apr 1, 2026 · Artificial Intelligence

Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

CVPR 2026 · Classifier-Free Guidance · Plug-and-Play
0 likes · 7 min read
Machine Learning Algorithms & Natural Language Processing
Mar 22, 2026 · Artificial Intelligence

NS-Diff: Adding a Physics Engine to Diffusion Models for Fluid and Rigid‑Body Dynamics

The CVPR 2026 paper introduces NS‑Diff, a physics‑guided video diffusion framework that combines a noise‑robust dynamics detector, a physical‑condition latent injection module, and reinforcement‑learning optimization to reduce jerk error by 43% and fluid divergence by 33%, achieving superior physical realism and visual quality across multiple benchmarks.

CVPR 2026 · NS‑Diff · Navier-Stokes
0 likes · 13 min read
AIWalker
Mar 12, 2026 · Artificial Intelligence

BeautyGRPO: RL‑Driven Realistic Portrait Retouching Ends Over‑Beautification (CVPR 2026)

The paper introduces BeautyGRPO, a reinforcement‑learning framework that combines a fine‑grained preference dataset (FRPref‑10K) with Dynamic Path Guidance to balance aesthetic enhancement and high‑fidelity preservation in portrait retouching, achieving superior metrics and user preference over existing SFT and RL models.

AI aesthetics · CVPR 2026 · dynamic path guidance
0 likes · 11 min read
Xiaomi Tech
Mar 3, 2026 · Artificial Intelligence

Xiaomi Scores 14 Papers at CVPR 2026, Showcasing Breakthroughs in Large Models and Autonomous Driving

CVPR 2026 accepted 14 Xiaomi papers spanning long‑video understanding, multimodal reasoning, GUI agents, and autonomous driving. The article lists each paper with its arXiv and GitHub links; the frameworks introduced include REVISOR, EMO‑R3, TimeViper, MSJoE, SafeGRPO, GUI‑CEval, ProactiveMobile, ParkGaussian, UFO, TraqPoint, SimScale, MeanFuser and DVGT.

Autonomous Driving · CVPR 2026 · Long Video Understanding
0 likes · 19 min read