Venus‑DeFakerOne: Breaking Forgery Boundaries with a Unified Fake Image Detection Model

Venus‑DeFakerOne is a unified foundation model that combines global semantics with fine‑grained pixel analysis to detect and localize diverse image forgeries. It is trained on a 12.5 M mixed dataset, pairs an MLLM with SAM2, and reports top scores on 39 detection and 9 localization benchmarks, including 95.8% accuracy on the GPT‑Image‑2 attack suite.

AntTech

Jun 1, 2026

Recent generative models such as GPT‑Image‑2 and Nano Banana 2 have blurred the lines between traditional forgery categories (document tampering, natural‑image editing, DeepFake faces, and full‑image synthesis), leaving detection systems, each built for a single category, increasingly fragmented.

The authors identify three core challenges for a universal AI forensic judge: (1) paradigm fragmentation, where existing detectors specialize in single domains and cannot handle mixed‑type attacks; (2) data silos, where forgery traces differ across documents, faces, and scenes, and the lack of a unified modeling approach leads to poor cross‑domain generalization; (3) supervision granularity loss, where most methods output only a binary real/fake label, with no pixel‑level localization or explanation.

To address these issues, Ant Group’s GuangJian team proposes DeFakerOne, a unified Fake Image Detection and Localization (FIDL) foundation model built on a data‑driven design that covers both global and fine‑grained cues. The architecture integrates a multimodal large language model (MLLM) based on InternVL2 with the segmentation model SAM2, so a single model produces both an image‑level classification and a pixel‑level mask.
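To make the dual output concrete, here is a minimal sketch of the kind of result a unified detector could hand back: one image‑level verdict plus one pixel mask. The names `ForensicResult`, `tampered_region`, and `tampered_fraction` are hypothetical and are not taken from DeFakerOne; the mask is a plain nested list so the example stays dependency‑free.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ForensicResult:
    """Hypothetical container: image-level verdict plus pixel-level mask."""
    is_fake: bool          # image-level classification
    score: float           # assumed confidence in [0, 1]
    mask: List[List[int]]  # 1 = pixel flagged as forged, 0 = untouched


def tampered_region(result: ForensicResult) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of flagged pixels, or None."""
    rows = [r for r, row in enumerate(result.mask) if any(row)]
    cols = [c for row in result.mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def tampered_fraction(result: ForensicResult) -> float:
    """Share of pixels flagged as forged."""
    total = sum(len(row) for row in result.mask)
    return sum(map(sum, result.mask)) / total if total else 0.0


# Toy 4x4 result with a 2x2 forged patch in the middle.
example = ForensicResult(
    is_fake=True,
    score=0.97,
    mask=[
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ],
)
```

Keeping the verdict and the mask in one object lets downstream code answer both "is it fake?" and "where?" from a single call.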

Key innovations include:

Paradigm unification: a single model learns from four forensic domains using a 12.5 M “commercial + open‑source” mixed training set. According to the authors, this shows that cross‑domain collaborative learning transfers capabilities between sub‑domains and mitigates feature interference among them.

Data evolution pipeline: an automated, agent‑driven pipeline generates high‑quality hard examples through a “hard‑example mining → reverse analysis → targeted synthesis → iterative optimization” loop, converting challenging samples into core training material.

Collaborative perception: the SA2VA multi‑task framework treats binary detection as a visual question answering (VQA) task, allowing the model to reason about forgery motives. SAM2 extracts multi‑scale hierarchical features and fuses them with InternVL2’s dedicated segmentation token via cross‑attention, so that refined masks reinforce detection accuracy.
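The cross‑attention fusion mentioned above can be pictured with a toy example. This is a minimal sketch, not the DeFakerOne implementation: it assumes a single attention head, omits the learned query/key/value projections, and stands in for the segmentation token and the SAM2 features with tiny hand‑written vectors. All names (`cross_attend`, `seg_token`, `coarse`, `fine`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, features: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product cross-attention.

    query    : stand-in for the segmentation token embedding
    features : stand-ins for image features, all scales flattened into one list
    Learned projections are omitted (identity) to keep the sketch short.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(len(query))
    ]
    return fused, weights


# Toy inputs: two "scales" of features, concatenated before attention.
coarse = [[1.0, 0.0], [0.0, 1.0]]
fine = [[2.0, 0.0], [0.0, 0.0]]
seg_token = [1.0, 0.0]

fused, weights = cross_attend(seg_token, coarse + fine)
```

Here the token attends most strongly to the feature that aligns with it (the first `fine` vector), and `fused` is the attention‑weighted mix of all features. A real model would feed a vector like this into a mask decoder.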

In the reported experiments, DeFakerOne leads 39 forgery‑detection benchmarks and 9 localization benchmarks. It stays robust under Gaussian blur, illumination changes, and JPEG compression, and reaches 95.8% accuracy on GPT‑Image‑2‑Bench, a result the authors attribute to jointly capturing global semantic logic and local pixel anomalies.
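A robustness check of this kind usually means re‑scoring the same labeled images after perturbing them. The sketch below shows that idea in plain Python under loud assumptions: `toy_detector` is a deliberately brittle placeholder rather than DeFakerOne, the images are 3x3 grayscale grids, and only blur and illumination are covered because JPEG compression needs an image codec. The article does not describe the actual evaluation protocol.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # grayscale image as rows of pixel values in 0..255


def adjust_illumination(img: Image, gain: float) -> Image:
    """Scale brightness by `gain`, clipping to the 0..255 range."""
    return [[min(255.0, max(0.0, p * gain)) for p in row] for row in img]


def gaussian_blur3(img: Image) -> Image:
    """Blur with a 3x3 Gaussian kernel (1-2-1 weights), replicating edge pixels."""
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += k[dy + 1][dx + 1] * img[yy][xx]
            row.append(acc / 16.0)
        out.append(row)
    return out


def accuracy(
    detector: Callable[[Image], bool],
    images: List[Image],
    labels: List[bool],
    perturb: Optional[Callable[[Image], Image]] = None,
) -> float:
    """Fraction of images classified correctly, optionally after a perturbation."""
    correct = 0
    for img, label in zip(images, labels):
        if perturb is not None:
            img = perturb(img)
        correct += detector(img) == label
    return correct / len(images)


def toy_detector(img: Image) -> bool:
    """Placeholder detector: flags an image whose pixel range exceeds 100."""
    flat = [p for row in img for p in row]
    return max(flat) - min(flat) > 100


real = [[100.0] * 3 for _ in range(3)]
fake = [[100.0, 100.0, 100.0], [100.0, 250.0, 100.0], [100.0, 100.0, 100.0]]
images, labels = [real, fake], [False, True]

clean_acc = accuracy(toy_detector, images, labels)
blur_acc = accuracy(toy_detector, images, labels, perturb=gaussian_blur3)
dim_acc = accuracy(
    toy_detector, images, labels,
    perturb=lambda im: adjust_illumination(im, 0.5),
)
```

On this toy data the clean accuracy is 1.0, while blur and dimming each drop it to 0.5. That gap between clean and perturbed scores is what a robustness comparison is meant to expose.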

Looking forward, the authors envision a unified, robust, and explainable image‑trust analysis capability that can serve social media platforms, financial and governmental document verification, news media image provenance, and broader AI‑generated content governance, shifting the question from “Which detector?” to “Is the image trustworthy, where is it untrustworthy, and why?”
