Author

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms. Shares in-depth technical content, engineering practice, and analysis from the perspective of a working AI practitioner.

162 articles

Latest from AIWalker

100 recent articles max

AIWalker

Mar 8, 2026 · Artificial Intelligence

FireRed-Image-Edit v1.1 Boosts OOTD Element Fusion and Portrait Consistency

The Super Intelligence team at Xiaohongshu unveils FireRed-Image-Edit v1.1, an open‑source image‑editing model that dramatically improves ID‑consistent edits, multi‑element OOTD fusion, portrait makeup, and font style rendering while delivering end‑to‑end generation in 4.5 seconds on 30 GB VRAM, backed by a full training‑distillation pipeline and a technical report on arXiv.

AI model · FireRed-Image-Edit · LoRA

0 likes · 10 min read

AIWalker

Mar 7, 2026 · Artificial Intelligence

YOLO-Master v2026.02 Unveils Four Innovations for SOTA Object Detection

Tencent’s YOLO-Master v2026.02 adds a Mixture‑of‑Experts architecture, zero‑overhead LoRA fine‑tuning, Sparse SAHI inference for large images, and Cluster‑Weighted NMS, delivering 3‑5× faster inference, up to 70% lower training resource usage, and markedly higher detection accuracy across diverse benchmarks.

LoRA · Mixture of Experts · Sparse Inference

0 likes · 15 min read

AIWalker

Mar 6, 2026 · Artificial Intelligence

VA‑π: Pixel‑Level Alignment Achieves 50% FID Reduction with 25‑Minute Fine‑Tuning

The paper introduces VA‑π, a lightweight post‑training framework that aligns pixel‑level reconstruction with autoregressive generation using variational inference and reinforcement learning, achieving up to 50% FID reduction after just 25 minutes of fine‑tuning on LlamaGen‑XXL.

AR Models · Pixel Alignment · Vision Generation

0 likes · 14 min read

AIWalker

Mar 5, 2026 · Artificial Intelligence

How ViDA-UGC Leverages Large Multimodal Models for Fine-Grained Visual Quality Assessment

The article introduces ViDA-UGC, a large‑scale UGC visual‑quality dataset and its companion benchmark ViDA‑Bench, explains the MILP‑driven sampling, expert annotation pipeline, and CoT‑based evaluation framework, and shows how fine‑tuning popular multimodal LLMs on this data markedly improves low‑level quality perception, grounding, and description capabilities.

benchmark · chain-of-thought · dataset

0 likes · 12 min read

AIWalker

Mar 4, 2026 · Artificial Intelligence

Drifting Models Enable One‑Step Generation, Shattering Speed Records

The paper introduces Drifting Models, a new generative paradigm that moves the distribution evolution to the training phase, achieving true one‑step (1‑NFE) generation with state‑of‑the‑art ImageNet FID scores of 1.54 in latent space and 1.61 in pixel space, while eliminating the need for distillation or classifier‑free guidance.

Drifting Models · Generative Modeling · ImageNet

0 likes · 24 min read

AIWalker

Mar 3, 2026 · Artificial Intelligence

How NanoSD Cuts 90% Parameters to Enable Real‑Time Photo Editing on Mobile

NanoSD distills Stable Diffusion 1.5 into a 130 M‑parameter model that runs inference in 20 ms on a Qualcomm SM8750 NPU, using hardware‑aware module pruning, module‑level knowledge distillation, and Bayesian optimization to achieve Pareto‑optimal quality‑efficiency trade‑offs for on‑device image restoration.

Bayesian Optimization · Knowledge Distillation · Model Compression

0 likes · 14 min read

AIWalker

Mar 3, 2026 · Artificial Intelligence

RetouchIQ’s Instruction‑Driven AI Editing Overcomes Traditional Retouching Limits

RetouchIQ introduces an instruction‑driven AI retouching system that uses a general reward model to interpret abstract user commands, delivering precise image adjustments with higher semantic consistency and visual naturalness than existing multimodal large language models, thereby lowering the technical barrier for cinematic‑style edits.

AI Image Editing · RetouchIQ · Reward Model

0 likes · 3 min read

AIWalker

Mar 1, 2026 · Artificial Intelligence

How X2HDR Enables AI to Achieve True Transparent HDR Imaging

X2HDR tackles the long‑standing HDR generation problem by converting color data into a perceptually uniform space and applying lightweight LoRA fine‑tuning, dramatically boosting visual fidelity while slashing data and compute demands for film, gaming, and VR.

AI · HDR imaging · LoRA

0 likes · 3 min read

AIWalker

Feb 27, 2026 · Artificial Intelligence

YOLO26 Review: End-to-End, NMS‑Free Edge AI Boosts CPU Inference by 43%

This article analyzes YOLO26’s architecture redesign that eliminates NMS, removes DFL, introduces progressive loss balancing, STAL, and the MuSGD optimizer, achieving up to 43% faster CPU inference and simplifying deployment for edge vision tasks across detection, segmentation, classification, pose estimation, and oriented bounding box (OBB) detection.

CPU inference · Model Deployment · NMS-free

0 likes · 13 min read

AIWalker

Feb 26, 2026 · Artificial Intelligence

Overcoming Vision Transformer Bottlenecks: The Plug‑and‑Play Upgrade of ViT‑5

ViT‑5 systematically revisits five years of Transformer architecture advances, introducing seven plug‑and‑play components—LayerScale, RMSNorm, GeLU, dual positional encodings, high‑frequency RoPE for register tokens, QK‑Norm, and bias‑free projections—that together raise ImageNet‑1k Top‑1 accuracy to 84.2% (Base) and achieve superior performance across classification, generation, and segmentation tasks.

ViT-5 · Vision Transformer · computer vision

0 likes · 14 min read
