Venus‑DeFakerOne: Breaking Forgery Boundaries with a Unified Fake Image Detection Model

Venus‑DeFakerOne introduces a unified foundation model that combines global semantics with fine‑grained pixel analysis to detect and localize diverse image forgeries. Built on an MLLM + SAM2 architecture and trained on a 12.5 M mixed dataset, it achieves top scores on 39 detection and 9 localization benchmarks and reaches 95.8% accuracy on GPT‑Image‑2‑Bench.

AntTech

Recent advances in generative models such as GPT‑Image‑2 and Nano Banana 2 have blurred the lines between traditional forgery categories (document tampering, natural‑image editing, DeepFake faces, and full‑image synthesis), exposing how fragmented today's category‑specific detection systems are.

The authors identify three core challenges for a universal AI forensic judge: (1) paradigm fragmentation, since existing detectors specialize in a single domain and cannot handle mixed‑type attacks; (2) data silos, since forgery traces differ across documents, faces, and scenes and lack a unified modeling approach, which leads to poor cross‑domain generalization; and (3) loss of supervision granularity, since most methods output only a binary real/fake label, with no pixel‑level localization or explanation.

To address these issues, Ant Group’s GuangJian team proposes DeFakerOne, a unified Fake Image Detection and Localization (FIDL) foundation model built on a data‑driven design that combines global and fine‑grained analysis. The architecture couples an InternVL2‑based multimodal large language model (MLLM) with the SAM2 segmentation model, so the model produces an image‑level classification and a pixel‑level mask at the same time.
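To make the "one model, two outputs" idea concrete, here is a minimal sketch of what a unified detection‑and‑localization result might look like. It assumes the MLLM's image‑level answer arrives as free text starting with "Fake" or "Real" and that the segmentation branch yields a per‑pixel forgery probability map; `ForensicResult`, `parse_verdict`, `binarize`, and `combine` are hypothetical names, not part of any published DeFakerOne interface.

```python
from dataclasses import dataclass
from typing import List

Mask = List[List[int]]


@dataclass
class ForensicResult:
    """Hypothetical unified output: one image-level verdict plus one pixel mask."""
    is_fake: bool      # image-level decision parsed from the MLLM's text answer
    rationale: str     # the free-text answer, kept as the explanation
    mask: Mask         # 1 = pixel flagged as manipulated, 0 = untouched


def parse_verdict(answer: str) -> bool:
    """Map a free-text answer to a binary label (assumes it starts with Fake/Real)."""
    text = answer.strip().lower()
    if text.startswith("fake"):
        return True
    if text.startswith("real"):
        return False
    raise ValueError(f"unrecognised answer: {answer!r}")


def binarize(prob_map: List[List[float]], threshold: float = 0.5) -> Mask:
    """Threshold per-pixel forgery probabilities into a binary mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def combine(answer: str, prob_map: List[List[float]], threshold: float = 0.5) -> ForensicResult:
    """Merge the classification answer and the segmentation output into one record."""
    return ForensicResult(
        is_fake=parse_verdict(answer),
        rationale=answer.strip(),
        mask=binarize(prob_map, threshold),
    )


if __name__ == "__main__":
    result = combine("Fake: font edges in the amount field are inconsistent.",
                     [[0.1, 0.9], [0.7, 0.2]])
    print(result.is_fake, result.mask)  # True [[0, 1], [1, 0]]
```

Keeping the verdict, the rationale, and the mask in one record reflects the goal of answering whether an image is fake, where, and why from a single model.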

Key innovations include:

Paradigm unification: a single model learns from all four forensic domains using a 12.5 M mixed training set of commercial and open‑source data. The authors report that this cross‑domain collaborative learning transfers capabilities between domains and mitigates feature interference among sub‑domains.

Data evolution pipeline: an automated, agent‑driven pipeline generates high‑quality hard examples through a “hard‑example mining → reverse analysis → targeted synthesis → iterative optimization” loop, turning challenging samples into core training material.

Collaborative perception: the SA2VA multi‑task framework treats binary detection as a visual question answering (VQA) task, which lets the model reason about forgery motives. In parallel, SAM2 extracts multi‑scale hierarchical features and fuses them, via cross‑attention, with a dedicated segmentation token from InternVL2; the refined masks in turn reinforce detection accuracy.
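The data evolution loop can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: the source describes an agent‑driven pipeline, whereas here each stage is a plain function, "reverse analysis" is reduced to counting which forgery types fail most often, and `toy_factory` and `toy_synth` are hypothetical stand‑ins for model training and targeted data synthesis.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, object]             # hypothetical fields: "label" (1 = fake), "forgery_type"
Predictor = Callable[[Sample], float]  # returns P(fake)


def mine_hard(pool: List[Sample], predict: Predictor, margin: float = 0.2) -> List[Sample]:
    """Hard-example mining: keep samples that are misclassified or close to the 0.5 boundary."""
    hard = []
    for s in pool:
        p_fake = predict(s)
        wrong = (p_fake >= 0.5) != bool(s["label"])
        if wrong or abs(p_fake - 0.5) < margin:
            hard.append(s)
    return hard


def analyse(hard: List[Sample], top_k: int = 2) -> List[str]:
    """Simplified reverse analysis: which forgery types fail most often?"""
    counts = Counter(str(s["forgery_type"]) for s in hard)
    return [ftype for ftype, _ in counts.most_common(top_k)]


def evolve(train: List[Sample], pool: List[Sample],
           fit: Callable[[List[Sample]], Predictor],
           synthesize: Callable[[str, int], List[Sample]],
           rounds: int = 3, per_type: int = 4) -> List[Sample]:
    """Mine -> analyse -> synthesise -> retrain, for at most `rounds` iterations."""
    for _ in range(rounds):
        predict = fit(train)                      # (re)train on the current set
        hard = mine_hard(pool, predict)
        if not hard:                              # nothing left to fix
            break
        for ftype in analyse(hard):
            train = train + synthesize(ftype, per_type)  # targeted synthesis
    return train


def toy_factory(train: List[Sample]) -> Predictor:
    """Toy 'model': confidence on a forgery type grows with its training count."""
    counts = Counter(s["forgery_type"] for s in train if s["label"])

    def predict(s: Sample) -> float:
        if s["forgery_type"] == "none":           # treat untampered images as easy
            return 0.1
        return min(0.99, 0.1 * counts[s["forgery_type"]])
    return predict


def toy_synth(ftype: str, n: int) -> List[Sample]:
    """Toy generator: emit n new fake samples of the requested forgery type."""
    return [{"label": 1, "forgery_type": ftype} for _ in range(n)]


if __name__ == "__main__":
    train0 = [{"label": 1, "forgery_type": "face"} for _ in range(8)]
    pool = [{"label": 1, "forgery_type": "doc"},
            {"label": 1, "forgery_type": "face"},
            {"label": 0, "forgery_type": "none"}]
    final = evolve(train0, pool, toy_factory, toy_synth)
    print(Counter(s["forgery_type"] for s in final))
```

In this toy run the model initially fails on the unseen `doc` type, so the loop synthesizes document forgeries for two rounds and stops once the pool no longer yields hard examples.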
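The fusion step in the collaborative perception item, where the segmentation token attends over SAM2's multi‑scale features, follows the general pattern of scaled dot‑product cross‑attention. The sketch below assumes a single attention head, identity query/key/value projections, and features already projected to the token dimension; `fuse_seg_token` is a hypothetical helper, and DeFakerOne's actual layer design is not described in the source.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                                 # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query over many key/value pairs."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def fuse_seg_token(seg_token: Vector, multi_scale: List[List[Vector]]) -> Vector:
    """Let the segmentation token attend over pixel features from every scale.

    multi_scale holds one list of feature vectors per scale (e.g. flattened H*W
    positions), assumed already projected to the token's dimension. Identity
    Q/K/V projections keep the sketch short; a trained model would learn them.
    """
    tokens = [f for level in multi_scale for f in level]   # concatenate all scales
    attended = cross_attend(seg_token, tokens, tokens)
    return [s + a for s, a in zip(seg_token, attended)]    # residual connection


if __name__ == "__main__":
    seg = [1.0, 0.0]
    coarse = [[1.0, 0.0]]              # one position at a coarse scale
    fine = [[0.0, 1.0], [0.5, 0.5]]    # two positions at a finer scale
    print(fuse_seg_token(seg, [coarse, fine]))
```

How the fused token is consumed downstream (for example by a mask decoder) is not specified in the source and is left out of this sketch.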

Extensive experiments show that DeFakerOne leads on 39 anti‑forgery detection benchmarks and tops 9 localization benchmarks. It remains robust under Gaussian blur, illumination changes, and JPEG compression, and it reaches 95.8% accuracy on GPT‑Image‑2‑Bench, which the authors attribute to its joint capture of global semantic logic and local pixel anomalies.
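The source does not say which metrics its 9 localization benchmarks use. Forgery localization work commonly reports pixel‑level F1 and IoU between a predicted mask and the ground‑truth mask, so a minimal sketch of those two scores is shown here as an assumption about the evaluation, not as DeFakerOne's evaluation code.

```python
from typing import List, Tuple

Mask = List[List[int]]


def confusion(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives over all pixels."""
    if len(pred) != len(gt) or any(len(a) != len(b) for a, b in zip(pred, gt)):
        raise ValueError("pred and gt masks must have the same shape")
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1
            elif p:            # predicted forged, actually untouched
                fp += 1
            elif g:            # actually forged, missed by the prediction
                fn += 1
    return tp, fp, fn


def pixel_f1(pred: Mask, gt: Mask) -> float:
    tp, fp, fn = confusion(pred, gt)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom   # both masks empty: count as a match


def pixel_iou(pred: Mask, gt: Mask) -> float:
    tp, fp, fn = confusion(pred, gt)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [1, 0]]
    print(pixel_f1(pred, gt), pixel_iou(pred, gt))
```

For instance, the prediction `[[1, 1], [0, 0]]` against ground truth `[[1, 0], [1, 0]]` has one true positive, one false positive, and one false negative, giving F1 = 0.5 and IoU = 1/3. How to score an image whose masks are both empty varies between benchmarks; treating it as a perfect match is one choice among several.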

Looking forward, the authors envision a unified, robust, and explainable image‑trust analysis capability that can serve social media platforms, financial and government document verification, news image provenance, and the broader governance of AI‑generated content. The aim is to shift the question from “Which detector should I use?” to “Is this image trustworthy, where is it untrustworthy, and why?”

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

AntTech

Technology is the core driver of Ant's future creation.
