Tag

computer vision

AntTech
Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Multimodal Models · computer vision
0 likes · 20 min read
Kuaishou Large Model
Jun 11, 2025 · Artificial Intelligence

12 Kuaishou Breakthrough Papers at CVPR 2025: Video Generation, Diffusion & Multimodal AI

CVPR 2025 in Nashville will feature 12 Kuaishou papers spanning large‑scale video datasets, quality assessment, 3D/4D reconstruction, controllable generation, diffusion scaling laws, multimodal simulation, and novel benchmarks, highlighting the company's cutting‑edge contributions to video AI research.

computer vision · diffusion models · large-scale datasets
0 likes · 21 min read
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · computer vision
0 likes · 21 min read
Kuaishou Tech
Jun 10, 2025 · Artificial Intelligence

Top 12 Cutting-Edge Video Generation Papers from Kuaishou at CVPR 2025

The article highlights CVPR 2025’s acceptance statistics and showcases twelve cutting‑edge video‑generation papers from Kuaishou, spanning datasets, quality assessment, style control, scaling laws, 4D simulation, interleaved image‑text data, vision‑language acceleration, high‑fidelity avatars, patch‑wise super‑resolution, narrative‑driven benchmarks, sketch‑based editing, and spatio‑temporal diffusion, each with links and abstracts.

CVPR2025 · Kuaishou · computer vision
0 likes · 20 min read
Kuaishou Tech
May 26, 2025 · Artificial Intelligence

CineMaster: A 3D‑Aware and Controllable Framework for Cinematic Text‑to‑Video Generation

Researchers introduce CineMaster, a 3D‑aware, controllable text‑to‑video generation framework described in a SIGGRAPH 2025 paper. Through an interactive workflow, users specify the target objects and camera motion, and the framework generates cinematic video that follows those directions.

3D-aware · AI video · CineMaster
0 likes · 6 min read
JD Tech
May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Large Language Models · computer vision
0 likes · 15 min read
DaTaobao Tech
May 16, 2025 · Artificial Intelligence

JianYi: AI‑Powered Image Segmentation and Matting System for Taobao Home‑Decoration

The article introduces JianYi, a self‑developed image segmentation and matting system for Taobao's home‑decoration business that supports product, human, and panoramic segmentation with multi‑modal interaction, achieving high‑precision real‑time performance and powering AI tools such as "Jiazuo" and "Fang Wo Jia".

Artificial Intelligence · computer vision · deep learning
0 likes · 11 min read
php中文网 Courses
Apr 23, 2025 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article explains how to set up a PHP environment, control a camera, and use the OpenCV library for real‑time face detection and recognition. Its code examples illustrate a practical approach for security applications such as access control and surveillance.

PHP · computer vision · face recognition
0 likes · 6 min read
JD Tech Talk
Apr 22, 2025 · Artificial Intelligence

End-to-End 3D Spatial Video Generation via Monocular Depth Estimation, Novel View Synthesis, and MV-HEVC Encoding

Leveraging AI-driven monocular depth estimation, novel view synthesis, and MV‑HEVC encoding, the JD Retail Content R&D team presents an end‑to‑end pipeline that converts 2D video assets into high‑quality immersive 3D spatial videos, introduces the large‑scale StereoV1K dataset, and demonstrates superior performance over existing methods.

3D video generation · AIGC · MV-HEVC
0 likes · 22 min read
Amap Tech
Apr 21, 2025 · Artificial Intelligence

Lenna: Language‑Enhanced Reasoning Detection Assistant and a Chain‑of‑Thought Image Editing Framework Using Multimodal Large Language Models

At ICASSP 2025, Gaode’s two accepted papers present Lenna, a language‑enhanced reasoning detection assistant that adds a DET token to multimodal LLMs and achieves state‑of‑the‑art accuracy on RefCOCO benchmarks, and a chain‑of‑thought image‑editing framework that converts complex prompts into segmented masks and repair prompts for diffusion‑based inpainting, surpassing existing methods.

AI · Chain-of-Thought · ICASSP
0 likes · 10 min read
Python Programming Learning Circle
Apr 19, 2025 · Artificial Intelligence

Building an AI‑Powered Dou Dizhu Card‑Playing Assistant with YOLOv5 and DouZero

This tutorial explains how to create an AI‑driven Dou Dizhu (Chinese poker) assistant that captures game screenshots, uses YOLOv5 for card detection, leverages the DouZero model for optimal move prediction, and provides a PyQt5 UI for real‑time play assistance, including environment setup and code examples.

AI · DouZero · PyQt5
0 likes · 13 min read
DataFunTalk
Apr 18, 2025 · Artificial Intelligence

Applying ByteDance’s Doubao‑1.5 Vision Model for Image Counting and Automated Annotation

The article demonstrates how ByteDance’s new Doubao‑1.5 multimodal model can be used to locate and count objects in images—such as sushi plates, street signs, and cartoon hats—by generating coordinates and overlaying visual annotations through a concise Python script.

AI · Doubao · Image Annotation
0 likes · 5 min read
JD Retail Technology
Apr 16, 2025 · Artificial Intelligence

AI‑Driven 3D Spatial Video Generation from Monocular 2D Content with MV‑HEVC Encoding

This work presents an end‑to‑end AI pipeline that transforms existing monocular 2D videos into immersive 3D spatial streams by combining DINO‑v2‑based depth estimation, multi‑branch view synthesis, and MV‑HEVC encoding, achieving up to 33 % BD‑Rate reduction, 31 % speed gains, state‑of‑the‑art visual quality, and real‑time production suitability, validated on the new StereoV1K benchmark and deployed in JD.Vision’s e‑commerce catalog.

3D video · AI generation · AIGC
0 likes · 21 min read
Python Programming Learning Circle
Mar 29, 2025 · Artificial Intelligence

Hand Gesture Detection Using OpenCV and Python: Skin Color and Contour Processing

This article presents a step‑by‑step tutorial for building a hand‑gesture detection system in Python using OpenCV, covering video capture, skin‑color detection via YCrCb conversion, contour extraction, and full source code for processing frames and visualizing results.

Hand Gesture · Python · Skin Detection
0 likes · 6 min read
AntTech
Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a novel algorithm that enhances multimodal large language models' ability to perceive and reason about graphical user interfaces by integrating text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, achieving state‑of‑the‑art performance on multiple GUI benchmarks.

CVPR2025 · GUI Understanding · MLLM
0 likes · 5 min read
php中文网 Courses
Mar 13, 2025 · Artificial Intelligence

Real-Time Image Processing with PHP and OpenCV: A Step-by-Step Tutorial

This tutorial guides PHP developers through installing OpenCV and the php‑opencv extension, capturing live video, displaying frames in a browser, and performing real‑time face detection using Haar cascades, providing a practical introduction to computer‑vision tasks in PHP.

PHP · computer vision · face detection
0 likes · 6 min read
DataFunTalk
Mar 2, 2025 · Artificial Intelligence

Top 10 AI Research Papers of 2024: Summaries, Contributions, and Practical Uses

This article presents a curated selection of ten groundbreaking 2024 AI research papers, detailing each model’s abstract, key contributions, and practical application scenarios across computer vision, multimodal learning, NLP, and efficient inference, offering readers inspiration and actionable insights for real‑world projects.

2024 research · AI · NLP
0 likes · 18 min read
Xiaohongshu Tech REDtech
Feb 27, 2025 · Artificial Intelligence

SAFE: A Lightweight General AI Image Detection Method Achieving 96.7% Accuracy Across 33 Test Subsets

SAFE is a lightweight AI‑image detection framework using only 1.44 M parameters and 2.30 B FLOPs that preserves fine‑grained artifacts through crop‑based preprocessing, invariant augmentations, and high‑frequency wavelet features, achieving an average 96.7 % accuracy across 33 test subsets and strong generalization to unseen GAN and diffusion generators.

AI image detection · Generative Models · computer vision
0 likes · 11 min read
Xiaohongshu Tech REDtech
Feb 24, 2025 · Artificial Intelligence

AIDE: Hybrid Feature Detector for AI‑Generated Image Detection and the Chameleon Benchmark

The paper introduces AIDE, a hybrid AI‑generated image detector that fuses low‑level pixel statistics with high‑level semantic embeddings, and the manually curated Chameleon benchmark of ~26 000 diverse, high‑realism images, showing AIDE surpasses nine state‑of‑the‑art methods by up to 4.6 % while highlighting remaining challenges on this tougher dataset.

AI-generated image detection · benchmark dataset · computer vision
0 likes · 14 min read
DevOps
Feb 17, 2025 · Artificial Intelligence

Microsoft OmniParser V2.0: A Visual Agent Parsing Framework for Enhanced UI Understanding

Microsoft's OmniParser V2.0 transforms large language models such as DeepSeek‑R1, GPT‑4o, and Qwen‑2.5VL into visual AI agents by accurately detecting interactive UI elements, providing semantic descriptions, and generating structured representations that boost inference speed, reduce latency by 60%, and dramatically improve benchmark accuracy.

AI Agent · DeepSeek · GPT-4o
0 likes · 7 min read