FireRed-Image-Edit v1.1 Boosts OOTD Element Fusion and Portrait Consistency
The Super Intelligence team at Xiaohongshu unveils FireRed-Image-Edit v1.1, an open‑source image‑editing model that improves ID‑consistent edits, multi‑element OOTD (outfit‑of‑the‑day) fusion, portrait makeup, and font‑style rendering. End‑to‑end generation takes a reported 4.5 seconds on 30 GB of VRAM, and the release is backed by a full training and distillation pipeline and a technical report on arXiv.
Model Overview
FireRed-Image-Edit v1.1 was released less than a month after the 1.0 launch. The update expands identity‑preserving editing, multi‑element fusion, portrait makeup, and font‑style reference while providing a full training, deployment, and optimization stack.
Key Editing Capabilities
ID consistency: integrates a state‑of‑the‑art open‑source identity‑preserving technique, keeping a person recognizable after complex edits.
Multi‑element fusion: an Agent automatically crops and stitches more than ten elements, removing the need for long prompts.
Portrait makeup: supports dozens of styles, from professional retouching to creative Halloween looks.
Font‑style reference: generates high‑fidelity typography comparable to closed‑source solutions.
Photo restoration: delivers high‑quality old‑photo repair with fine‑grained detail recovery.
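The crop-and-stitch step in the multi‑element fusion item can be pictured as a layout problem: many cropped elements have to be packed onto one reference canvas. The article does not describe how the Agent actually lays elements out, so the following is only a minimal sketch under assumptions. The function name plan_grid, the Placement record, and the 256‑pixel cell size are all hypothetical and do not come from the FireRed repository.

```python
import math
from dataclasses import dataclass


@dataclass
class Placement:
    index: int   # position of the element in the input list
    x: int       # left edge on the canvas
    y: int       # top edge on the canvas
    width: int   # scaled width
    height: int  # scaled height


def plan_grid(sizes, cell=256):
    """Plan a near-square grid that holds every element, one per cell.

    sizes: list of (width, height) tuples for the cropped elements.
    Returns (canvas_width, canvas_height, placements).
    Hypothetical helper: the real Agent's layout logic is not documented here.
    """
    if not sizes:
        return 0, 0, []
    cols = math.ceil(math.sqrt(len(sizes)))
    rows = math.ceil(len(sizes) / cols)
    placements = []
    for i, (w, h) in enumerate(sizes):
        # Scale to fit inside the cell while keeping the aspect ratio.
        scale = min(cell / w, cell / h)
        sw, sh = max(1, round(w * scale)), max(1, round(h * scale))
        col, row = i % cols, i // cols
        # Centre the scaled element inside its cell.
        x = col * cell + (cell - sw) // 2
        y = row * cell + (cell - sh) // 2
        placements.append(Placement(i, x, y, sw, sh))
    return cols * cell, rows * cell, placements
```

With eleven elements this plan uses a 4 x 3 grid, so a single stitched canvas can stand in for eleven separate reference images. An image library would then paste each crop at its planned position.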
Engineering Optimizations
LoRA training ecosystem: open‑source training code enables custom style creation; the sampler is tuned to maximize GPU efficiency for identical tasks, image sizes, and batch counts.
Speed and memory: end‑to‑end generation finishes in 4.5 seconds using only 30 GB of VRAM, thanks to integrated model distillation, quantization, and static compilation.
Agent workflow: automates multi‑image processing for complex compositions (e.g., virtual try‑on) without manual prompt engineering.
Cross‑platform deployment: native ComfyUI node support and the lightweight GGUF format enable seamless integration into production pipelines.
Efficient training: offline feature extraction calls the VLM only for generation results, decoupling training from inference and accelerating the training loop.
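One way to read the offline feature extraction item is that expensive encoder outputs are computed once, written to disk, and then only read back during training. The sketch below illustrates that general caching pattern and nothing more. FeatureCache, toy_encoder, and run_demo are hypothetical names, the toy encoder is a placeholder rather than a real VLM, and the actual FireRed training code may organize this differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class FeatureCache:
    """Store encoder outputs on disk so the training loop never calls the encoder."""

    def __init__(self, root, encode_fn):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.encode_fn = encode_fn  # hypothetical stand-in for an expensive encoder
        self.encoder_calls = 0      # counts how often the encoder really ran

    def _path(self, sample_id):
        key = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
        return self.root / f"{key}.json"

    def precompute(self, samples):
        """Offline pass: encode each sample once and write the result to disk."""
        for sample_id, payload in samples.items():
            path = self._path(sample_id)
            if path.exists():
                continue  # already cached, skip the expensive call
            self.encoder_calls += 1
            path.write_text(json.dumps(self.encode_fn(payload)))

    def load(self, sample_id):
        """Training-time read: only disk access, no encoder involved."""
        return json.loads(self._path(sample_id).read_text())


def toy_encoder(text):
    # Placeholder feature: character codes scaled to [0, 1]. Not a real VLM.
    return [ord(c) / 255 for c in text[:8]]


def run_demo():
    """Precompute twice, then read one cached feature back."""
    samples = {"a": "add a pearl hairpin", "b": "replace the background"}
    with tempfile.TemporaryDirectory() as tmp:
        cache = FeatureCache(tmp, toy_encoder)
        cache.precompute(samples)
        cache.precompute(samples)  # second pass hits the cache only
        return cache.encoder_calls, cache.load("a")
```

In the demo the encoder runs once per sample even though precompute is called twice, which is the decoupling the item describes: the training loop depends on the cached files, not on the encoder being loaded.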
Backbone Architecture
The editing ability is injected into a Qwen‑Image text‑to‑image backbone through a full pre‑training → SFT → RL pipeline. Because the capability is model‑agnostic, it can be transferred to any T2I foundation model; Qwen‑VL is invoked only for vision‑language grounding.
Demonstrations
Composite a man wearing a “New York Bears” jacket, camouflage pants, and AJ1 high‑top shoes on a sunny football field while preserving a red‑capped hat and a vintage leather travel bag.
Replace the background with a pastel‑blue natural‑light scene, add a pearl hairpin, and insert a sword, keeping the smiling expression.
Apply a full makeup pipeline: ivory matte foundation, brown eyebrows, subtle brown eye‑shadow, black eyeliner, false lashes, bean‑red lipstick, and pink blush sweep.
Additional LoRA adapters, makeuplora for custom makeup and covercraftlora for cover generation, are provided, with showcase images in the repository.
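For readers unfamiliar with LoRA, an adapter stores two small matrices whose product is added to a frozen weight matrix. The sketch below shows that generic low‑rank update, W + (alpha / r) * B A, in plain Python. It is an illustration of the standard technique only: the ranks, scaling factors, and target layers used by makeuplora and covercraftlora are not stated in the article, and merge_lora is a hypothetical helper, not a function from the repository.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a).

    weight: d_out x d_in frozen base weight
    lora_b: d_out x r, lora_a: r x d_in (r is the adapter rank)
    alpha:  scaling hyperparameter; its value here is an assumption
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

Because the update is a sum, an adapter can be merged into the base weights for deployment or kept separate and swapped per style, which is what makes shipping several small adapters alongside one backbone practical.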
Open‑Source Resources
GitHub: https://github.com/FireRedTeam/FireRed-Image-Edit
Technical report (arXiv:2602.13344): https://arxiv.org/abs/2602.13344
Interactive demo: https://huggingface.co/spaces/FireRedTeam/FireRed-Image-Edit-1.1
Conclusion
FireRed-Image-Edit v1.1 offers a substantial upgrade in visual quality and engineering efficiency, establishing it as a leading open‑source solution for community‑driven image editing and creative workflows.
AIWalker
Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.
