Tagged articles

653 articles

May 31, 2026 · Artificial Intelligence

How a Child’s Finger‑Drawn Moustache Fooled AI Age Verification (and Made Engineers Speechless)

When Discord switched to a teen‑by‑default policy, users discovered that a simple thumb sketch with two eyes and a mouth could trick the on‑device AI age estimator into granting adult access, exposing the limits of lightweight facial analysis models.

AI age verification · Discord · Meta

0 likes · 6 min read

Machine Heart

May 17, 2026 · Artificial Intelligence

ViT³: Vision Test‑Time Training Architecture Breaking Transformer Complexity (CVPR 2026 Oral)

The paper systematically studies Test‑Time Training (TTT) for vision, derives six design principles, and introduces ViT³—a pure TTT architecture that uses full‑batch internal training, a learning rate of 1.0, and lightweight SwiGLU‑Depthwise convolution modules, achieving state‑of‑the‑art linear‑complexity performance across classification, detection, segmentation and generation tasks.

Linear Complexity · Sequence Modeling · Test-Time Training

0 likes · 14 min read

Data Party THU

May 15, 2026 · Artificial Intelligence

94% Precision: YOLO11‑Based Detection of Near‑Earth Object and Satellite Streaks

The StreakMind system built by the Spanish Royal Navy Academy uses a YOLO11‑OBB detector trained on over 2,000 real astronomical images and 280 synthetic streaks to automatically identify satellite and asteroid streaks with 94% precision and 97% recall, delivering standardized database entries and robust frame‑to‑frame tracking.

StreakMind · YOLO11 · astronomical imaging

0 likes · 10 min read

Machine Heart

May 15, 2026 · Artificial Intelligence

How X2SAM Empowers Multimodal Models to Segment Images and Videos at Pixel Level

X2SAM is a unified multimodal large model that combines image and video segmentation with language and visual prompts, introduces a Mask Memory for temporal consistency, defines a new V‑VGD task, and achieves state‑of‑the‑art results while cutting training cost by over 30%.

Large Language Model · V-VGD · X2SAM

0 likes · 9 min read

HyperAI Super Neural

May 14, 2026 · Artificial Intelligence

YOLO‑11 Enables 94% Detection of Near‑Earth Object and Satellite Streaks

StreakMind, developed by the Spanish Royal Navy Academy’s observatory, combines real and synthetic astronomical images to train a YOLO‑11 oriented‑bounding‑box detector that robustly identifies satellite and asteroid streaks, achieving 94% precision and 97% recall on an independent test set of 273 images, and automatically integrates results into a standardized MPC database.

AI · StreakMind · YOLO-11

0 likes · 8 min read

Machine Heart

May 14, 2026 · Artificial Intelligence

Breaking the 3D Perception Bottleneck: VGGT Series Enables Dynamic High‑Fidelity Reconstruction

The VGGT series from KOKONI 3D and collaborators tackles three core 3D perception limits—unbounded sequence memory, dynamic‑static entanglement, and compute‑precision trade‑offs—by introducing StreamCacheVGGT, progressive decoupling, and HD‑VGGT, achieving O(1) memory streaming, 15%+ accuracy gains on dynamic benchmarks, and record‑high AUC on RealEstate10K.

3D Reconstruction · VGGT · computer vision

0 likes · 10 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

Scal3R Enables Stable Kilometer-Scale 3D Reconstruction of Long Videos

Scal3R introduces test‑time training with a global‑context memory and synchronization mechanism that lets models train on and infer over ultra‑long video sequences, achieving accurate camera poses and dense point clouds for kilometer‑scale scenes while outperforming prior SLAM, SfM and streaming baselines on multiple benchmarks.

3D Reconstruction · Scal3R · Test-Time Training

0 likes · 11 min read

Machine Heart

May 3, 2026 · Artificial Intelligence

How LEADER Beats Traditional LiDAR Relocalization in Accuracy and Speed

The LEADER framework achieves ten‑millisecond "eye‑open" LiDAR relocalization while surpassing the decimeter‑level accuracy of classic retrieval‑registration pipelines, using cylindrical projection, sparse convolution, and a Truncated Relative Reliability loss, as demonstrated on the NCLT benchmark.

Autonomous Driving · LEADER · LiDAR

0 likes · 9 min read

AI Explorer

May 2, 2026 · Artificial Intelligence

How DeepSeek’s “Cyber Finger” Gives AI a Physical Sense of the World

DeepSeek introduces a “cyber finger” that lets AI not only recognize objects but also infer their spatial relationships, orientations, and manipulability, turning visual perception into a digital simulation of touch and enabling more realistic interaction in robotics, AR, and assistive technologies.

AI · DeepSeek · augmented reality

0 likes · 6 min read

Geek Labs

Apr 30, 2026 · Artificial Intelligence

Why the 14-Year-Old ccv Library Remains a Top Choice for Modern Computer Vision

The ccv library, created in 2010 and still actively maintained, offers a highly portable C‑based computer‑vision toolkit with minimal dependencies, a built‑in cache for preprocessing, a full libnnc neural‑network runtime, and easy builds via Bazel, Make, or Swift Package Manager.

C library · Neural Network · computer vision

0 likes · 5 min read

Machine Heart

Apr 27, 2026 · Artificial Intelligence

Google DeepMind Open‑Sources TIPSv2: State‑of‑the‑Art Patch‑Text Alignment at CVPR 2026

The DeepMind team unveils TIPSv2, a vision‑language pre‑training model that dramatically improves patch‑level image‑text alignment through iBOT++, Head‑only EMA, and multi‑granularity captions, achieving record‑breaking results on nine tasks across twenty datasets while remaining fully open‑source.

DeepMind · Multimodal Pretraining · Patch-Text Alignment

0 likes · 12 min read

Java Tech Enthusiast

Apr 25, 2026 · Fundamentals

Turn a MacBook into a Touchscreen in 16 Hours for Under $7 Using Only a Mirror

A team led by Anish Athalye built a $1‑$7 hardware add‑on that lets a MacBook’s built‑in webcam detect finger touches via a tiny mirror, using classic computer‑vision techniques to map those touches to screen coordinates and generate mouse events.

Low‑cost hardware · MacBook · Mirror

0 likes · 8 min read

AI Explorer

Apr 24, 2026 · Artificial Intelligence

Google’s ‘Banana’ Model Redefines Visual Transformers with Dynamic Sparse Attention

Google’s newly unveiled “Banana” visual Transformer introduces dynamic sparse attention that cuts inference cost 3‑5×, reduces memory by 70%, and improves ImageNet accuracy, while demonstrating real‑world gains in autonomous driving, medical imaging, and satellite analysis.

Dynamic Sparse Attention · Google · ImageNet

0 likes · 6 min read

Xiaomi Tech

Apr 22, 2026 · Artificial Intelligence

SVOR Wins CVPR 2026 Video Object Removal Challenge – Xiaomi’s Open‑Source Solution for Three Tough Problems

The article introduces SVOR, a Xiaomi‑developed video object removal framework that tackles shadow residues, motion jitter, and mask defects with MUSE, DA‑Seg, and a two‑stage training pipeline, achieves new SOTA on multiple benchmarks, and clinches first place in the CVPR 2026 video removal contest, with all code and models released publicly.

DA‑Seg · MUSE · Open Source

0 likes · 8 min read

Machine Heart

Apr 19, 2026 · Artificial Intelligence

How Google Turns Your CAPTCHA Clicks into Training Data for the Next Generation of AI

The article explains how YouTube’s AI‑video rating and Google’s reCAPTCHA system covertly collect billions of user interactions each day, converting them into labeled data that fuels Google’s computer‑vision models such as Veo, Maps and Waymo, effectively turning routine security checks into a massive, unpaid AI training workforce.

AI training · Google · Waymo

0 likes · 7 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

AI Tech Daily: Top AI Research and Industry Updates on April 16 2026

This roundup highlights recent AI breakthroughs such as NVIDIA‑MIT’s Sol‑RL framework for faster diffusion model training, Peking University’s CPL++ visual localization improvement, DeepMind’s TIPSv2 for image recognition, Boston Dynamics Spot’s AI upgrade, Anthropic’s safety paper, a major MCP protocol vulnerability, OpenAI’s GPT‑5.4 release, and the shifting AI video landscape.

AI · AI safety · Machine Learning

0 likes · 5 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

CPL++: A Self‑Aware, Self‑Correcting Framework for Weakly Supervised Visual Grounding

The CPL++ framework equips weakly supervised visual grounding models with confidence‑aware pseudo‑label learning, self‑supervised association correction, and dynamic validation, enabling the model to detect and amend erroneous region‑query links during training, which yields absolute performance gains of 1–6% across five benchmark datasets.

Visual Grounding · Weak Supervision · computer vision

0 likes · 9 min read

AIWalker

Apr 10, 2026 · Artificial Intelligence

How RealRestorer Bridges the Gap in Real‑World Image Restoration

RealRestorer leverages large‑scale image‑editing models, a hybrid synthetic‑and‑real degradation pipeline, and a two‑stage training strategy to deliver state‑of‑the‑art open‑source restoration that generalizes across nine real‑world degradation types while preserving content consistency.

benchmark · computer vision · deep learning

0 likes · 13 min read

HyperAI Super Neural

Apr 9, 2026 · Artificial Intelligence

Cornell’s EMSeek Generates Insights from EM Images in 2–5 Minutes, 50× Faster Than Experts

EMSeek, a modular multi‑agent platform from Cornell, integrates perception, structural reconstruction, property prediction, and literature reasoning to automate electron microscopy analysis across 20 material systems and five tasks, achieving up to twice the speed of Segment Anything, over 90% structural similarity, and a 50‑fold reduction in processing time compared with expert workflows, while requiring only about 2% labeled data for calibration.

EMSeek · Materials Discovery · Multi-Agent AI

0 likes · 16 min read

JD Cloud Developers

Apr 8, 2026 · Artificial Intelligence

How JoyAI-Image-Edit Brings Spatial Intelligence to Open‑Source Image Editing

JoyAI-Image-Edit, an open‑source multimodal foundation model from JD Research Institute, integrates text‑to‑image generation, image understanding, and instruction‑driven spatial editing, achieving world‑leading spatial perception and editing capabilities that unlock new applications across e‑commerce, robotics, 3D reconstruction, and design.

Generative Models · computer vision · image editing

0 likes · 7 min read

AIWalker

Apr 6, 2026 · Artificial Intelligence

BIPNet: Adaptive Progressive Upsampling Drives a Leap in Burst Image Restoration (TPAMI 2025)

The TPAMI 2025 paper introduces BIPNet, a unified burst‑image framework that tackles alignment, fusion, and upsampling challenges with edge‑enhanced alignment, pseudo‑burst feature fusion, and adaptive group upsampling, achieving state‑of‑the‑art results across super‑resolution, low‑light enhancement, and denoising while offering lightweight mobile variants.

BIPNet · Burst Image Processing · Denoising

0 likes · 13 min read

AIWalker

Apr 6, 2026 · Artificial Intelligence

How TIR‑Agent Turns Image‑Restoration Tools into a Learnable Decision‑Making Agent

The paper introduces TIR‑Agent, an image‑restoration agent that learns a tool‑calling policy via supervised fine‑tuning and reinforcement learning, addressing exploration stagnation and multi‑objective reward imbalance, and demonstrates over 2.5× faster inference and superior multi‑metric performance on synthetic and real degradation datasets.

agent-based AI · computer vision · image restoration

0 likes · 18 min read

Data Party THU

Apr 1, 2026 · Artificial Intelligence

How SwiftTailor Accelerates Realistic 3D Garment Generation

SwiftTailor introduces a two‑stage, geometry‑centric framework that unifies pattern inference and mesh synthesis, dramatically cutting inference time to seconds while achieving state‑of‑the‑art accuracy and visual realism on the Multimodal GarmentCodeData benchmark for digital fashion.

3D garment generation · AI · SwiftTailor

0 likes · 4 min read

Machine Heart

Mar 30, 2026 · Artificial Intelligence

InfoTok: Information-Theoretic Adaptive Video Tokenizer Redefines Efficient Tokenization (ICLR 2026 Oral)

InfoTok, a collaborative effort by Stanford, NVIDIA Cosmos, and NUS, leverages information theory and an ELBO‑based router to allocate tokens adaptively, achieving 2.3× higher compression, 11× faster inference, and superior reconstruction quality on benchmarks such as TokenBench and DAVIS.

ELBO · ICLR 2026 · InfoTok

0 likes · 11 min read

Data Party THU

Mar 29, 2026 · Artificial Intelligence

How LoGeR Enables Minute‑Long 3D Reconstruction with Hybrid Memory

The article presents LoGeR, a long‑context geometric reconstruction framework that combines test‑time‑training memory and sliding‑window attention to achieve minute‑scale, fully‑feedforward 3D reconstruction with superior accuracy on benchmarks such as KITTI and VBR.

3D Reconstruction · Hybrid Memory · LoGeR

0 likes · 11 min read

AIWalker

Mar 23, 2026 · Artificial Intelligence

Dynamic Dense Computing and Minimal End‑to‑End Design: YOLO-Master & YOLO26

By introducing a dynamic mixture‑of‑experts routing scheme and an end‑to‑end architecture that eliminates NMS and DFL, YOLO‑Master and YOLO26 dramatically cut compute waste and latency on edge devices, achieving up to 43% faster CPU inference while keeping model accuracy, with all code openly released.

Dynamic Routing · Edge AI · Mixture of Experts

0 likes · 7 min read

AI Frontier Lectures

Mar 19, 2026 · Artificial Intelligence

Can Circulant Attention Reduce Vision Transformer Cost by 7×?

The article reviews the AAAI 2026 paper "Vision Transformers are Circulant Attention Learners", explaining how modeling self‑attention as a Block‑Circulant matrix enables FFT‑based multiplication that cuts the quadratic complexity of standard attention, achieving up to seven‑fold inference speed‑up while preserving accuracy across ImageNet, COCO and ADE20K benchmarks.

BCCB Matrix · Circulant Attention · Efficient Attention

0 likes · 15 min read

AI Frontier Lectures

Mar 19, 2026 · Artificial Intelligence

Why Sharing Parameters in Vision Transformers Hurts Performance—and How Layer Specialization Fixes It

The article analyzes the hidden conflict between [CLS] and patch tokens in Vision Transformers, reveals how shared normalization and linear layers cause computational friction, and demonstrates that layer‑specific parameters dramatically improve dense prediction tasks without increasing inference FLOPs.

Dense Prediction · Layer Specialization · Self-Attention

0 likes · 9 min read

AIWalker

Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua’s Huang‑Gao Team Redesigns Vision‑Transformer Attention via Fourier Transforms

The AAAI 2026 paper by Tsinghua’s Huang‑Gao team shows that modeling Vision‑Transformer attention as a Block‑Circulant matrix and computing it with FFT reduces the quadratic complexity to O(N log N), delivering up to seven‑fold real‑world speedups without sacrificing accuracy.

AAAI 2026 · Circulant Matrices · Efficiency

0 likes · 15 min read

SuanNi

Mar 16, 2026 · Artificial Intelligence

How NaLaFormer Revives Linear Attention with Query‑Norm Awareness

NaLaFormer introduces a norm‑aware linear attention mechanism that restores the query‑norm‑driven sharpness of softmax attention, achieving up to 7.5% higher ImageNet accuracy and 92% memory reduction in super‑resolution, while delivering strong results across classification, detection, segmentation, and language modeling tasks.

AI · Linear Attention · NaLaFormer

0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing

Mar 15, 2026 · Artificial Intelligence

A 17‑Year‑Old High‑Schooler Becomes First‑Author on a CVPR Paper

A 17‑year‑old high‑school student from Anhui Ansheng School is the first author of the CVPR 2026 paper "CraftMesh," which presents a 3D mesh editing framework that combines image editing, mesh generation, and Poisson seamless fusion. The work reports superior quantitative metrics and highlights the growing presence of young researchers at top AI conferences.

3D mesh generation · CVPR · CraftMesh

0 likes · 7 min read

AI Explorer

Mar 8, 2026 · Artificial Intelligence

Can a Pure‑Vision Model Redefine AI Perception? Inside ByteDance’s VideoWorld 2

ByteDance and Beijing Jiaotong University unveil VideoWorld 2, a visual‑only AI model that learns from massive video data without language mediation, promising richer detail retention, reduced bias, and a potential paradigm shift in how artificial intelligence perceives the world.

AI perception · ByteDance · VideoWorld 2

0 likes · 7 min read

AIWalker

Mar 7, 2026 · Artificial Intelligence

YOLO-Master v2026.02 Unveils Four Innovations for SOTA Object Detection

Tencent’s YOLO-Master v2026.02 adds a Mixture‑of‑Experts architecture, zero‑overhead LoRA fine‑tuning, Sparse SAHI inference for large images, and Cluster‑Weighted NMS, delivering 3‑5× faster inference, up to 70% reduced training resources, and markedly higher detection accuracy across diverse benchmarks.

LoRA · Mixture of Experts · Sparse Inference

0 likes · 15 min read

Code Mala Tang

Mar 5, 2026 · Artificial Intelligence

Master YOLOv12: A Step‑by‑Step Guide to Build, Train, and Deploy Custom Models

This tutorial walks readers through the fundamentals of YOLOv12, covering model variants, dataset preparation with Roboflow, optional FlashAttention acceleration, installation, model selection, training commands, post‑training tasks such as tracking, validation, inference, exporting to ONNX, and benchmarking, all with concrete code snippets and practical tips.

FlashAttention · Python · Roboflow

0 likes · 8 min read

Code Mala Tang

Mar 1, 2026 · Artificial Intelligence

Why YOLO Dominates Real-Time Object Detection: A Complete Guide

This article provides a comprehensive overview of the YOLO (You Only Look Once) algorithm, explaining its core principles, architecture, version history, training workflow, real‑world applications, strengths, and current limitations for modern computer‑vision tasks.

YOLO · computer vision · deep learning

0 likes · 9 min read

AIWalker

Feb 26, 2026 · Artificial Intelligence

Overcoming Vision Transformer Bottlenecks: The Plug‑and‑Play Upgrade of ViT‑5

ViT‑5 systematically revisits five years of Transformer architecture advances, introducing seven plug‑and‑play components—LayerScale, RMSNorm, GELU, dual positional encodings, high‑frequency RoPE for register tokens, QK‑Norm, and bias‑free projections—that together raise ImageNet‑1k Top‑1 accuracy to 84.2% (Base) and achieve superior performance across classification, generation, and segmentation tasks.

ViT-5 · Vision Transformer · computer vision

0 likes · 14 min read

HyperAI Super Neural

Feb 22, 2026 · Artificial Intelligence

OCR Models Guide: DeepSeek, PaddlePaddle, Others for High Accuracy & Local Deployment

This article surveys the latest open‑source OCR models—including GLM‑OCR, PaddleOCR‑VL‑1.5, LightOnOCR‑2‑1B, DeepSeek‑OCR 2, and MonkeyOCR—detailing their architectures, benchmark scores on OmniDocBench, hardware requirements, and how to run them via online demos.

Model Benchmark · OCR · Open Source

0 likes · 8 min read

Data Party THU

Feb 19, 2026 · Artificial Intelligence

How Data Priors and Scene Parameterization Boost 3D Indoor Reconstruction

This thesis investigates the two core challenges of data prior utilization and scene parameterization in multi‑view RGB‑based 3D indoor reconstruction, proposing novel representations and learning‑based methods to improve reconstruction quality, generalization, and applicability across AR, robotics, and autonomous navigation.

3D Reconstruction · computer vision · data priors

0 likes · 8 min read

AI Algorithm Path

Feb 18, 2026 · Artificial Intelligence

Using Autoencoders for Industrial Defect Detection

This article explains how to train a simple fully‑connected autoencoder on defect‑free images, use reconstruction error to highlight anomalies in industrial parts, and convert the error into a single metric that cleanly separates good from defective components.

Anomaly Detection · Autoencoder · Keras

0 likes · 7 min read

AI Cyberspace

Feb 13, 2026 · Artificial Intelligence

How Attention Mechanisms Revolutionized Computer Vision and Machine Translation

This article traces the evolution of attention mechanisms from their inaugural application in computer vision and machine translation to their central role in modern Transformer models, detailing the underlying RNN‑Attention designs, the breakthrough in sequence alignment, and the innovations that enabled high‑performance, parallelizable deep learning architectures.

Transformer · attention mechanism · computer vision

0 likes · 14 min read

php Courses

Dec 9, 2025 · Artificial Intelligence

How to Supercharge Your PHP Apps with AI: A Practical Guide

This guide explains why PHP applications need AI, outlines core AI use cases such as intelligent content processing, computer vision, personalization, and chatbots, and provides step‑by‑step implementation paths, tools, best‑practice recommendations, real‑world case studies, and future trends for developers.

AI integration · Best Practices · Machine Learning

0 likes · 10 min read

Kuaishou Tech

Dec 4, 2025 · Artificial Intelligence

Can a Tree‑Reasoned Model Master Video Emotion Understanding?

The paper introduces VidEmo, a multimodal video foundation model that uses a two‑stage emotion‑clue‑guided reasoning framework and a large emotion‑centric dataset (Emo‑CFG) to achieve state‑of‑the‑art performance on facial attribute, expression, and fine‑grained emotion tasks, surpassing Gemini 2.0.

AI · Foundation Model · Multimodal

0 likes · 15 min read

Tencent Advertising Technology

Dec 4, 2025 · Artificial Intelligence

How POPEN Boosts LVLM Reasoning Segmentation with Preference Optimization and Ensemble

The paper introduces POPEN, a new framework that uses preference‑based optimization and ensemble methods to reduce hallucinations and improve segmentation accuracy in large visual language models, achieving state‑of‑the‑art results on multiple benchmarks.

LVLM · Preference Optimization · Segmentation

0 likes · 14 min read

Sohu Smart Platform Tech Team

Nov 20, 2025 · Artificial Intelligence

How Hooop Turns HarmonyOS into an Offline AI Basketball Coach

Hooop leverages HarmonyOS's on‑device AI and custom vision algorithms to provide real‑time, offline basketball training by detecting shots, analyzing trajectories, automatically clipping scoring clips, and tracking performance metrics without an internet connection.

AI · HarmonyOS · Video processing

0 likes · 12 min read

Tencent Technical Engineering

Nov 5, 2025 · Artificial Intelligence

iDetex: The Winning AI Model Transforming Image Quality Assessment

iDetex, the champion solution of the ICCV 2025 MIPI Detailed Image Quality Assessment Challenge, introduces a novel multimodal LLM-driven framework that precisely locates, describes, and grades image distortions, outperforming traditional IQA models and enabling practical deployments across video, live streaming, e‑commerce, and image‑processing pipelines.

AI · ICCV 2025 · computer vision

0 likes · 18 min read

JD Tech Talk

Nov 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try-On Transforms Fashion E‑Commerce

The article explains how JD.com's AI virtual try‑on system Oxygen Tryon uses advanced computer‑vision and generative models to let shoppers instantly preview clothing on their own photos, dramatically improving purchase decisions, reducing return rates, and outlining technical challenges, innovations, and future development plans.

AI · Fashion E‑commerce · computer vision

0 likes · 7 min read

JD Cloud Developers

Nov 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try‑On Is Revolutionizing Fashion E‑Commerce

The article explains how JD.com's AI try‑on system Oxygen Tryon uses advanced computer‑vision models to let shoppers instantly preview garments on their own photos, dramatically improving fit perception, reducing return rates, and outlining future technical and business expansions.

AI · Fashion E‑commerce · computer vision

0 likes · 6 min read

AsiaInfo Technology: New Tech Exploration

Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Large Models · Multimodal

0 likes · 20 min read

AI Algorithm Path

Nov 1, 2025 · Artificial Intelligence

Deep Dive into Vision Transformer Patch Embedding Mechanisms

This article explains how Vision Transformers convert images into patch embeddings, compares flattening versus convolutional approaches, discusses position and CLS tokens, analyzes the effect of patch size, explores pixel‑level tokens, and contrasts ViT’s inductive bias with CNNs.

Convolution · Inductive Bias · Patch Embedding

0 likes · 10 min read

JD Retail Technology

Oct 31, 2025 · Artificial Intelligence

How JD’s AI Try‑On “Oxygen Tryon” Revolutionizes Online Fashion Shopping

JD’s Oxygen Tryon leverages advanced AI, keypoint detection, and real‑time rendering to let shoppers virtually try on clothing, dramatically cutting return rates, boosting conversion, and outlining technical challenges, innovations, and future plans for broader fashion applications.

AI try-on · Fashion E‑commerce · computer vision

0 likes · 6 min read

Liangxu Linux

Oct 29, 2025 · Artificial Intelligence

7 Must‑Try Open‑Source Tools for Remote Jobs, AI, and Dev Productivity

This article curates seven open‑source projects—including a remote‑work company list, a versatile file‑conversion platform, a personal finance manager, an AI‑powered resume optimizer, Claude Code resources, a computer‑vision toolbox, and a lightweight AI assistant—each with key features and GitHub links for easy adoption.

AI tools · Finance · computer vision

0 likes · 7 min read

Network Intelligence Research Center (NIRC)

Oct 24, 2025 · Artificial Intelligence

Next‑Gen VR Interaction via Micro‑Gesture Recognition: The “MiaoKong Virtual Realm” Demo

At Beijing University of Posts and Telecommunications' 70th anniversary, the Network Intelligence Research Center showcased a micro‑gesture‑driven VR system that captures millimeter‑scale finger motions with high‑precision, low‑latency hand tracking, delivering efficient, fatigue‑reducing interactions and earning strong audience approval.

VR interaction · XR · computer vision

0 likes · 8 min read

Alimama Tech

Oct 22, 2025 · Artificial Intelligence

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

This article details Alibaba’s Taobao Star fashion AIGC model, explaining its data pipeline, captioning strategy, multi‑stage training, and impressive virtual try‑on results for users and merchants, while showcasing model‑based and model‑free generation and pose‑transfer capabilities.

AI · AIGC · computer vision

0 likes · 11 min read

Amap Tech

Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Video Generation · computer vision

0 likes · 11 min read

HyperAI Super Neural

Sep 29, 2025 · Artificial Intelligence

8 Popular Remote Sensing Object Detection Datasets with One-Click Downloads

This article presents a curated list of eight widely used remote sensing object detection datasets covering indoor scenes, landslides, drone imagery, crop diseases, safety vests, human fractures, urban issues, and plant diseases, each with size estimates and direct download links for researchers.

AI · computer vision · datasets

0 likes · 10 min read

Data Party THU

Sep 27, 2025 · Artificial Intelligence

How Depth-Guided Texture Diffusion Boosts Image Semantic Segmentation

This article reviews the depth‑guided texture diffusion method, detailing its texture extraction, diffusion, structural consistency optimization, and integration into segmentation networks, and shows how it narrows the depth‑RGB gap to achieve state‑of‑the‑art performance on various semantic segmentation tasks.

computer vision · depth-guided diffusion · semantic segmentation

0 likes · 13 min read

AntTech

Sep 25, 2025 · Artificial Intelligence

ICCV Spotlight: Pixel Tracing for Copy Detection and Skip-Vision Model Acceleration

The ICCV 2025 live session will deep‑dive into two cutting‑edge papers—PixTrace with CopyNCE for precise image copy detection and Skip‑Vision for dramatically faster training and inference of vision‑language models—showcasing their methods, results, and real‑world impact.

ICCV 2025 · Vision-Language Models · computer vision

0 likes · 5 min read

Data Party THU

Sep 16, 2025 · Artificial Intelligence

How Dynamic Snake Convolution Boosts Tubular Segmentation and Infrared Small Target Detection

This article reviews two recent AI papers that introduce dynamic convolution kernels guided by geometric or statistical priors and adaptive loss mechanisms, demonstrating significant improvements in tubular structure segmentation and infrared small‑target detection across multiple 2D and 3D datasets.

Medical Image Segmentation · computer vision · dynamic convolution

0 likes · 6 min read

AIWalker

Sep 2, 2025 · Artificial Intelligence

BEVANet’s Triple Boost for Real-Time Segmentation: Field, Edge, Speed

BEVANet tackles the efficiency‑accuracy trade‑off in real‑time semantic segmentation by integrating large‑kernel attention, an efficient visual attention (EVA) module, a bilateral architecture, and boundary‑guided adaptive fusion, delivering up to 81% mIoU on Cityscapes at 33 FPS and surpassing prior state‑of‑the‑art models on both accuracy and speed.

Efficiency · computer vision · large kernel attention

0 likes · 19 min read

AntTech

Aug 21, 2025 · Artificial Intelligence

How the Mixture-of-Queries Transformer Tackles Camouflaged Instance Segmentation

The IJCAI 2025 paper showcase introduces the Mixture‑of‑Queries Transformer, a novel model that combines frequency‑domain feature enhancement with collaborative query decoding to achieve state‑of‑the‑art camouflaged instance segmentation across multiple datasets.

IJCAI 2025 · Transformer · camouflaged segmentation

0 likes · 4 min read

AIWalker

Aug 18, 2025 · Artificial Intelligence

UniConvNet: Expanding Effective Receptive Field for a SOTA CNN Vision Backbone (ICCV 2025)

UniConvNet introduces a three‑layer receptive‑field aggregator that combines small kernels to enlarge the effective receptive field while preserving its Gaussian distribution, achieving state‑of‑the‑art results on ImageNet‑1K, COCO2017 and ADE20K with only 30M parameters and 5.1G FLOPs.

CNN · Effective Receptive Field · ICCV2025

0 likes · 6 min read

AI Algorithm Path

Aug 16, 2025 · Artificial Intelligence

Meta Unveils DINOv3: A Universal Self‑Supervised Visual AI for All Image Tasks

Meta's DINOv3 is a 7‑billion‑parameter self‑supervised visual foundation model trained on 1.7 billion Instagram images without any labels, introducing dense feature extraction, Gram‑Anchoring to prevent feature collapse, high‑resolution adaptation, and multi‑student distillation that together enable out‑of‑the‑box performance on segmentation, depth estimation, 3D matching, and tracking while surpassing prior models such as DINOv2, CLIP, and SAM.

DINOv3 · Gram Anchoring · computer vision

0 likes · 8 min read

AIWalker

Aug 13, 2025 · Artificial Intelligence

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

The paper presents OMFA, a diffusion‑based unified framework for virtual try‑on and try‑off that removes the need for garment templates, segmentation masks, and fixed poses by leveraging a novel partial‑diffusion mechanism and SMPL‑X pose conditioning, achieving state‑of‑the‑art results on VITON‑HD and DeepFashion‑MultiModal datasets.

AI try-on · SMPL-X · computer vision

0 likes · 15 min read

AIWalker

Aug 3, 2025 · Artificial Intelligence

Tree-Guided CNN Boosts Image Super-Resolution in Joint University Study

A collaborative team from five universities proposes a tree-structured convolutional neural network that leverages binary‑tree guidance, cosine cross‑domain extraction, and an adaptive Nesterov momentum optimizer to markedly improve image super‑resolution performance.

adaptive optimizer · computer vision · deep learning

0 likes · 5 min read

Data Party THU

Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPE · Vision Transformer · computer vision

0 likes · 11 min read

Sohu Tech Products

Jul 30, 2025 · Artificial Intelligence

How 3D Gaussian Splatting Enables Low‑Cost 3D Reconstruction from Simple Videos

This article explains how 3D Gaussian Splatting transforms ordinary video footage into high‑quality 3D reconstructions with minimal equipment, outlines the low‑cost workflow using ffmpeg and COLMAP, and discusses practical challenges and future possibilities for the technology.

3D Reconstruction · COLMAP · Gaussian splatting

0 likes · 5 min read

AI Frontier Lectures

Jul 26, 2025 · Artificial Intelligence

Training-Free Universal Virtual Try-On: OmniVTON’s Multi-Person Breakthrough

OmniVTON introduces a training‑free universal virtual try‑on framework that decouples garment texture and human pose, achieving high‑fidelity results across both in‑shop and in‑the‑wild scenarios, and uniquely supporting multi‑person virtual dressing, as demonstrated by extensive quantitative and qualitative experiments.

Artificial Intelligence · Multi-Person · computer vision

0 likes · 9 min read

AI Frontier Lectures

Jul 17, 2025 · Artificial Intelligence

Top 8 Tencent Youtu Papers Accepted at ICCV 2025: Innovations in AI and Vision

The 20th ICCV conference announced 8 papers from Tencent Youtu Lab covering stylized face recognition, AI‑generated image detection, heterogeneous knowledge distillation, multi‑conditional diffusion, multimodal LLM distillation, palmprint recognition, low‑light vision, and oracle bone script decipherment, each pushing the frontier of computer vision and AI research.

Artificial Intelligence · ICCV 2025 · Knowledge Distillation

0 likes · 17 min read

AIWalker

Jul 15, 2025 · Artificial Intelligence

Dynamic Vision Mamba: Re‑ordering Pruning and Adaptive Block Selection Cut FLOPs by 35.2%

This article presents Dynamic Vision Mamba (DyVM), a method that tackles token and block redundancy in Mamba‑based visual models through a novel re‑ordering pruning strategy and dynamic block selection, achieving a 35.2% FLOPs reduction with only a 1.7% accuracy loss while demonstrating strong generalization across tasks and architectures.

Dynamic Block Selection · FLOPs Reduction · Token Pruning

0 likes · 22 min read

Amap Tech

Jul 14, 2025 · Artificial Intelligence

How UPRE Achieves Zero-Shot Domain Adaptation for Object Detection with Unified Prompts

The UPRE paper, presented at ICCV, introduces a multi‑view domain prompt and a unified representation enhancement to enable zero‑shot domain adaptation for object detection, achieving state‑of‑the‑art performance across diverse weather, geographic, and synthetic‑to‑real scenarios.

computer vision · object detection · prompt engineering

0 likes · 10 min read

Baidu Geek Talk

Jul 9, 2025 · Artificial Intelligence

PaddleOCR 3.1 Unveils Multilingual PP‑OCRv5, Document Translation, and MCP Server Integration

PaddleOCR 3.1 introduces three major upgrades: a multilingual PP‑OCRv5 model that supports 37 languages with an accuracy gain of over 30%, a PP‑DocTranslation pipeline for high‑quality multi‑language document translation, and MCP server support for flexible AI application integration. The article also covers detailed CLI usage, demo scenarios, and open‑source resources.

AI · MCP · OCR

0 likes · 11 min read

AI Frontier Lectures

Jul 8, 2025 · Artificial Intelligence

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

The LaVin-DiT paper presents a large vision diffusion transformer that integrates a spatio‑temporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient multi‑task generation for images and videos, and details its training via flow‑matching and experimental results.

3D RoPE · Generative Modeling · Joint Diffusion Transformer

0 likes · 12 min read

Zhuanzhuan Tech

Jul 2, 2025 · Artificial Intelligence

How to Build an Image Similarity Search System with ResNet, Milvus, and YOLO

This article walks through the end‑to‑end process of building an image similarity solution—from vectorizing images with ResNet, storing high‑dimensional vectors in Milvus, using HNSW for fast ANN search, to applying YOLO for object detection and practical training tips.

HNSW · Milvus · ResNet

0 likes · 15 min read

Huolala Tech

Jul 2, 2025 · Artificial Intelligence

Can Diffusion Models Revolutionize Salient Object Detection?

This article introduces a diffusion‑based framework for salient object detection, discusses its background, challenges, and motivations, details the model architecture and training, presents extensive experiments and ablation studies, and outlines limitations and future research directions.

computer vision · deep learning · diffusion model

0 likes · 11 min read

Qborfy AI

Jul 1, 2025 · Artificial Intelligence

Why CNNs Outperform Fully Connected Networks: A Deep Dive into Architecture and Applications

This article explains the fundamentals of convolutional neural networks (CNNs), detailing their definition, advantages over fully connected networks, architectural components such as input, hidden, and output layers, key operations like convolution, pooling, and activation, and showcases practical applications and notable insights.

Artificial Intelligence · CNN · Model architecture

0 likes · 5 min read

Amap Tech

Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Autonomous Driving · Sequence Modeling · Transformer

0 likes · 10 min read

Rare Earth Juejin Tech Community

Jun 27, 2025 · Artificial Intelligence

Image Encryption, Watermarking, Detection & Green Screen Removal in Python

This tutorial walks through Python-based computer‑vision techniques—including XOR‑based image encryption, mask and ROI methods, digital watermark embedding via bit‑plane and LSB, sensitivity‑driven object detection, and HSV‑based green‑screen removal—providing complete code snippets and practical guidance for rapid AI‑assisted learning.

OpenCV · Python · computer vision

0 likes · 17 min read

AntTech

Jun 25, 2025 · Artificial Intelligence

CVPR 2025: Semi-Body Digital Humans, Video Upscaling, Mobile Super‑Res

In this CVPR 2025 showcase, Ant Group presents three cutting‑edge papers—EchoMimicV2 introducing an open‑source semi‑body digital human generation framework, RivuletMLP offering an efficient MLP‑based architecture for compressed video quality enhancement, and a quantized super‑resolution model that achieves real‑time 3× upscaling on mobile NPUs.

AI · CVPR · computer vision

0 likes · 6 min read

AIWalker

Jun 24, 2025 · Artificial Intelligence

How Multimodal Fusion Accelerates Paper Publication: Key Insights and Resources

The article surveys 117 recent multimodal‑fusion papers, classifies them into improvement‑based and combination‑based approaches, highlights representative works such as TimeXL, OGP‑Net, MMR‑Mamba and FusionSight, and provides a free collection of papers, classic models and code repositories for researchers.

AI research · computer vision · deep learning

0 likes · 8 min read

AI Algorithm Path

Jun 20, 2025 · Artificial Intelligence

Beginner’s Guide to Visual Language Models – Day 1: What They Are and Why They Matter

This article introduces visual‑language models (VLMs), explaining how they combine large language models with visual encoders, why they overcome the rigidity of traditional computer‑vision systems, their key advantages, modular architecture, training methods, and practical applications such as image captioning and visual question answering.

AI applications · computer vision · contrastive learning

0 likes · 8 min read

AntTech

Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Video Generation · computer vision

0 likes · 20 min read

AI Frontier Lectures

Jun 14, 2025 · Industry Insights

CVPR 2025 Awards Unveiled: Breakthrough Papers and Rising Stars

The CVPR 2025 awards spotlight groundbreaking research, honoring young scholars and top papers such as VGGT, Neural Inverse Rendering, and several honorable mentions, while summarizing each work's core contributions, methodologies, and potential impact on computer vision and related fields.

2025 · CVPR · Paper Awards

0 likes · 13 min read

AI Frontier Lectures

Jun 14, 2025 · Industry Insights

CVPR 2025 Awards Unveiled: Best Papers, Young Researchers, and Industry Highlights

The CVPR 2025 conference announced record‑breaking submission numbers, awarded Best Paper to VGGT, honored young researchers and student papers, listed multiple honorable mentions, and highlighted the strong presence of Chinese institutions among the award candidates.

Award Summary · Best Paper · CVPR 2025

0 likes · 12 min read

Kuaishou Tech

Jun 10, 2025 · Artificial Intelligence

Can MaIR’s Locality‑Preserving Mamba Boost Image Restoration?

The article presents MaIR, a locality‑ and continuity‑preserving Mamba‑based model for image restoration, detailing its three‑stage architecture, novel scanning strategy, loss functions, experimental results on super‑resolution and denoising, and ablation studies, with links to the arXiv paper and source code.

Denoising · Mamba · computer vision

0 likes · 5 min read

AI Frontier Lectures

Jun 3, 2025 · Artificial Intelligence

How MaIR Advances Image Restoration with a Locality‑Preserving Mamba Architecture

The article presents MaIR, a Mamba‑based image restoration model that preserves locality and continuity, detailing its architecture, scanning strategies, loss functions, experimental results on super‑resolution and denoising, and an ablation study, while providing links to the arXiv paper and GitHub source code.

Denoising · Mamba · computer vision

0 likes · 5 min read

JD Tech

May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · computer vision · large language models

0 likes · 15 min read

JD Tech

May 20, 2025 · Artificial Intelligence

How Re‑parameterization and Adaptive Learning Boost Visual Deep Learning Efficiency

The award‑winning project from Tsinghua University and JD Retail introduces re‑parameterization model design, cross‑scene adaptive learning, and platform‑aware compression to overcome accuracy‑efficiency trade‑offs in visual deep learning, achieving over 20% accuracy gains and more than 50% inference speedup in real‑world e‑commerce deployments.

AI research · Model Compression · adaptive models

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

May 19, 2025 · Artificial Intelligence

How 3D Modeling Powers Digital Humans: From 3DMM to NeRF

The article explains what digital humans are, reviews the evolution of 3D modeling techniques—from early 2D hand‑drawn methods and 3DMM to deep‑learning‑based implicit models like NeRF—and discusses current challenges and future research directions.

3D modeling · 3DMM · NeRF

0 likes · 7 min read

AIWalker

May 18, 2025 · Artificial Intelligence

YOLOE: Open‑Source Real‑Time Anything Detector Beats YOLO‑World v2

YOLOE unifies object detection and segmentation in a single efficient model that supports text, visual, and prompt‑free inference, introduces RepRTA, SAVPE, and LRPC strategies, and achieves higher AP with up to three‑fold lower training cost and 1.4× faster inference on GPUs and mobile devices, as demonstrated by extensive LVIS and COCO experiments.

YOLOE · computer vision · object detection

0 likes · 29 min read

DaTaobao Tech

May 16, 2025 · Artificial Intelligence

JianYi: AI‑Powered Image Segmentation and Matting System for Taobao Home‑Decoration

The article introduces JianYi, a self‑developed image segmentation and matting system for Taobao's home‑decoration business that supports product, human, and panoramic segmentation with multi‑modal interaction, achieving high‑precision real‑time performance and powering AI tools such as "Jiazuo" and "Fang Wo Jia".

Artificial Intelligence · computer vision · deep learning

0 likes · 11 min read

Bilibili Tech

May 16, 2025 · Artificial Intelligence

How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

FineVQ · Multimodal · UGC

0 likes · 9 min read

AI Frontier Lectures

May 15, 2025 · Artificial Intelligence

OverLoCK: How a Bio‑Inspired Three‑Stage ConvNet Beats Transformers on Vision Tasks

OverLoCK introduces a bio‑inspired depth‑stage decomposition that splits a network into Base‑Net, Overview‑Net and Focus‑Net, and a novel Context‑Mix dynamic convolution, achieving state‑of‑the‑art accuracy on image classification, detection and segmentation while balancing speed and model size.

ConvNet · computer vision

0 likes · 11 min read

AI Frontier Lectures

May 15, 2025 · Artificial Intelligence

DefMamba: How Deformable Scanning Boosts Vision State‑Space Models

DefMamba introduces a deformable visual state‑space model that dynamically adjusts scanning paths and reference points, preserving spatial structure and improving feature capture, achieving state‑of‑the‑art results on ImageNet classification, COCO detection, and ADE20K segmentation while reducing computational cost.

DefMamba · Deformable Scanning · State Space Model

0 likes · 23 min read

AIWalker

May 14, 2025 · Artificial Intelligence

How HGO‑YOLO Achieves 87.4% Accuracy at 56 FPS with Only 4.6 MB Parameters

This paper presents HGO‑YOLO, a lightweight real‑time anomaly‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP with just 4.6 MB of parameters and 56 FPS on CPU, and validates its performance across multiple datasets and hardware platforms.

Anomaly Detection · Edge AI · Lightweight Models

0 likes · 25 min read

AIWalker

May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model, training on 14 million image‑mask pairs and achieving state‑of‑the‑art structural and semantic consistency across Places2, CelebA‑HQ and FFHQ benchmarks.

PixelHacker · SOTA · computer vision

0 likes · 16 min read

AIWalker

May 12, 2025 · Artificial Intelligence

DefMamba: A Deformable Multi‑Scale Visual Foundation Model that Boosts Vision Tasks

DefMamba introduces a multi‑scale backbone, deformable Mamba modules, and a dynamic scanning strategy to preserve image spatial structure, achieving state‑of‑the‑art performance on image classification, object detection, and semantic segmentation benchmarks.

DefMamba · Image Classification · computer vision

0 likes · 23 min read

Meituan Technology Team

Apr 24, 2025 · Artificial Intelligence

Meituan AI Recruitment: Join Our Advanced Technology Teams

Meituan's AI recruitment page showcases diverse opportunities across AI infrastructure, intelligent interaction, visual intelligence, and intelligent products, featuring roles from algorithm engineers to product managers working on cutting-edge technologies including large models, intelligent agents, and multimodal systems.

AI Recruitment · Intelligent agents · Large Models

0 likes · 5 min read

php Courses

Apr 23, 2025 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article explains how to set up a PHP environment, control a camera, and use the OpenCV library to perform real‑time face detection and recognition with code examples, demonstrating a practical security solution for applications such as access control and surveillance.

OpenCV · PHP · computer vision

0 likes · 6 min read

Liangxu Linux

Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

GitHub · Multimodal · OCR

0 likes · 9 min read
