Author

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.

162 Articles

Latest from AIWalker

100 recent articles max

AIWalker

May 22, 2025 · Artificial Intelligence

VisionReasoner: RL‑Unified System Beats YOLO‑World on Detection, Segmentation, Counting

VisionReasoner introduces a reinforcement‑learning‑driven unified framework that handles detection, segmentation, and counting within a single model. It improves COCO detection AP by 29.1%, ReasonSeg segmentation by 22.1%, and CountBench counting by 15.3%, requires only 7,000 training samples, and performs efficient multi‑target matching via batch computation and the Hungarian algorithm.

LVLM · Object Counting · VisionReasoner

0 likes · 19 min read


AIWalker

May 22, 2025 · Artificial Intelligence

174 Innovative Attention Mechanism Tweaks Backed by Leading Researchers (Yao Qizhi Included)

This article surveys 174 recent attention‑mechanism modifications from 2024‑2025, summarizing each paper’s core idea, performance gains, and code links while offering practical guidance on selecting and integrating the right attention variant for tasks such as object detection.

AI · Transformers · attention mechanisms

0 likes · 9 min read


AIWalker

May 18, 2025 · Artificial Intelligence

YOLOE: Open‑Source Real‑Time Anything Detector Beats YOLO‑World v2

YOLOE unifies object detection and segmentation in a single efficient model that supports text, visual, and prompt‑free inference, introduces RepRTA, SAVPE, and LRPC strategies, and achieves higher AP with up to three‑fold lower training cost and 1.4× faster inference on GPUs and mobile devices, as demonstrated by extensive LVIS and COCO experiments.

YOLOE · computer vision · object detection

0 likes · 29 min read


AIWalker

May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Video Generation · autoregressive modeling · causal attention

0 likes · 16 min read


AIWalker

May 14, 2025 · Artificial Intelligence

Guided Image Filtering Explained: Insights from He Kaiming’s Classic Paper

This article reviews the Guided Image Filtering technique introduced by He Kaiming, compares it with other edge‑preserving filters, provides detailed OpenCV C++ and Python implementations, discusses the fast variant, analyzes computational complexity, and showcases visual results.

Edge Preserving · Guided Filtering · OpenCV

0 likes · 10 min read


AIWalker

May 14, 2025 · Artificial Intelligence

How HGO‑YOLO Achieves 87.4% Accuracy at 56 FPS with Only 4.6 MB Parameters

This paper presents HGO‑YOLO, a lightweight real‑time anomalous‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP at 56 FPS on CPU with a model size of just 4.6 MB, and validates its performance across multiple datasets and hardware platforms.

Anomaly Detection · Edge AI · Lightweight Models

0 likes · 25 min read


AIWalker

May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model. Trained on 14 million image‑mask pairs, it achieves state‑of‑the‑art structural and semantic consistency on the Places2, CelebA‑HQ, and FFHQ benchmarks.

PixelHacker · SOTA · computer vision

0 likes · 16 min read


AIWalker

May 12, 2025 · Artificial Intelligence

DefMamba: A Deformable Multi‑Scale Visual Foundation Model that Boosts Vision Tasks

DefMamba introduces a multi‑scale backbone, deformable Mamba modules, and a dynamic scanning strategy to preserve image spatial structure, achieving state‑of‑the‑art performance on image classification, object detection, and semantic segmentation benchmarks.

DefMamba · Image Classification · computer vision

0 likes · 23 min read


AIWalker

May 11, 2025 · Artificial Intelligence

Unified Multimodal Understanding and Generation: A 30K‑Word Survey of Recent Advances

This comprehensive survey reviews the rapid progress of multimodal understanding and text‑to‑image generation models, categorises existing unified architectures into diffusion‑based, autoregressive, and hybrid paradigms, analyses their tokenisation strategies, datasets and benchmarks, and highlights current challenges and future research directions.

Autoregressive Models · Survey · benchmarks

0 likes · 64 min read


AIWalker

May 6, 2025 · Artificial Intelligence

SimpleAR: High‑Quality 1024×1024 Images with Just 0.5B Parameters via Pretraining, SFT, and RL

SimpleAR demonstrates that a vanilla autoregressive model with only 0.5 B parameters, trained through pretraining, supervised fine‑tuning, and reinforcement learning, can generate high‑fidelity 1024×1024 images. It achieves competitive GenEval (0.59) and DPG‑Bench (79.66) scores, and vLLM and KV‑cache optimizations reduce inference time to about 14 seconds.

Supervised Fine‑Tuning · autoregressive · benchmark

0 likes · 14 min read
