Tag

diffusion models


Kuaishou Large Model
Jun 11, 2025 · Artificial Intelligence

12 Kuaishou Breakthrough Papers at CVPR 2025: Video Generation, Diffusion & Multimodal AI

CVPR 2025 in Nashville will feature 12 Kuaishou papers spanning large‑scale video datasets, quality assessment, 3D/4D reconstruction, controllable generation, diffusion scaling laws, multimodal simulation, and novel benchmarks, highlighting the company's cutting‑edge contributions to video AI research.

Computer Vision · diffusion models · large-scale datasets
21 min read
DataFunTalk
Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · MAGI-1 · autoregressive models
70 min read
Amap Tech
Jun 5, 2025 · Artificial Intelligence

How MVPainter Achieves Accurate, High‑Detail 3D Texture Generation with Multi‑View Diffusion

MVPainter introduces a fully open‑source pipeline that generates high‑quality, PBR‑compatible 3D textures from a single reference image and a white model by leveraging multi‑view diffusion, geometric control, and a human‑aligned evaluation framework, dramatically improving texture fidelity, alignment, and detail.

3D texture generation · AI · PBR
10 min read
AntTech
Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Large Language Models · diffusion models · generative modeling
10 min read
Alimama Tech
Apr 17, 2025 · Artificial Intelligence

PosterMaker: High-Quality Product Poster Generation with Accurate Text Rendering

PosterMaker leverages a ControlNet‑based TextRenderNet with character‑level visual features and a reward‑driven foreground‑extension detector to generate high‑quality product posters that accurately render Chinese text (over 90% sentence accuracy) while preserving product fidelity, and is already deployed in Alibaba’s AI creative tool.

E-commerce AI · character-level features · diffusion models
18 min read
DaTaobao Tech
Apr 7, 2025 · Artificial Intelligence

Flow Matching for Generative Modeling

Flow Matching reformulates generative modeling by learning a time‑dependent vector field that deterministically transports Gaussian noise to data, using a neural network trained with an analytically derived L2 loss, yielding simpler training, faster convergence, and deterministic sampling that matches or exceeds diffusion model quality.

AI · continuous normalizing flow · diffusion models
13 min read
Tencent Cloud Developer
Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large to smaller diffusion models, covering hard and soft labels, temperature scaling, and contrasting it with data distillation, while detailing techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and step reduction.

adversarial post-training · adversarial training · consistency models
19 min read
Bilibili Tech
Dec 24, 2024 · Artificial Intelligence

AniSora: An Integrated System for Anime Video Generation with Data Flywheel, Controllable Diffusion Models, and Evaluation Benchmark

AniSora combines a 10‑million‑pair anime text‑video dataset, a controllable diffusion‑transformer with temporal‑mask conditioning for text‑to‑video, interpolation and region‑guided animation, and a 948‑video benchmark, delivering industry‑leading character and motion consistency and already powering low‑cost dynamic‑comic production for multiple IPs.

AI Animation · Anime Video Generation · Dataset Benchmark
21 min read
DaTaobao Tech
Dec 16, 2024 · Artificial Intelligence

Reference Image Generation for Subject‑Driven Diffusion

This work presents a subject‑driven diffusion pipeline that injects multi‑scale reference features (ReferenceNet‑style) into high‑fidelity backbones such as SD‑XL and Flux, enabling zero‑shot, fine‑grained product consistency across diverse scenes and outperforming current fine‑tuned and zero‑shot methods while noting limits in category coverage and human interactions.

AI · Dreambooth · IP-Adapter
9 min read
AntTech
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: An End-to-End Audio‑Driven Semi‑Body Human Animation Framework

EchoMimicV2, an open‑source project from Ant Group's Alipay AI team, introduces an end‑to‑end audio‑driven framework that generates high‑quality semi‑body portrait videos by jointly coordinating audio, pose, and image inputs, while addressing challenges of condition complexity, model stability, and computational cost.

AI research · Digital Human · Multimodal Generation
16 min read
Alimama Tech
Nov 27, 2024 · Artificial Intelligence

FlowDCN: Efficient Arbitrary-Resolution Image Generation via Groupwise Multi‑Scale Deformable Convolution

FlowDCN introduces Groupwise‑MSDCN, a sparse deformable convolution that replaces attention, enabling efficient arbitrary‑resolution image generation with linear complexity, fewer parameters and FLOPs, and achieving state‑of‑the‑art FID scores on ImageNet while requiring far fewer training steps.

Image Generation · arbitrary resolution · deformable convolution
12 min read
JD Tech
Nov 15, 2024 · Artificial Intelligence

Reliable Feedback Network (RFNet) for Improving Usable Advertising Image Generation

The paper proposes a multimodal Reliable Feedback Network (RFNet) and a consistency‑regularized fine‑tuning method (RFFT) that dramatically increase the proportion of usable advertising images generated by diffusion models while preserving visual appeal, and introduces the large‑scale RF1M dataset for training and evaluation.

Image Generation · RFNet · advertising images
9 min read
JD Retail Technology
Nov 14, 2024 · Artificial Intelligence

Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)

The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together boost the usable rate of automatically generated advertisement images while preserving visual quality, supported by a new million‑image annotated dataset and extensive ECCV‑2024 experiments.

AI · ECCV2024 · advertisement generation
8 min read
Tencent Cloud Developer
Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · Clip · Computer Vision
19 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Image Generation · diffusion models
9 min read
Alimama Tech
Aug 16, 2024 · Artificial Intelligence

SPLAM: Sub‑Path Linear Approximation for Accelerating Diffusion Model Sampling

SPLAM (Sub‑Path Linear Approximation Model) accelerates diffusion‑model image synthesis by linearly approximating short sub‑paths of the probability‑flow ODE, allowing high‑quality generation in as few as four steps, outperforming prior fast‑sampling methods on COCO benchmarks and being deployed in Alimama’s recommendation system.

AI image generation · SPLAM · diffusion models
11 min read
Kuaishou Tech
Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (formerly 可图) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on, while also offering strategic advice for the domestic visual‑generation community.

AI applications · Kolors · diffusion models
27 min read
Tencent Advertising Technology
Jul 31, 2024 · Artificial Intelligence

MimicMotion: A Controllable Video Generation Framework for High-Quality Human Motion Synthesis

MimicMotion is a controllable video generation framework that produces smooth, high-quality human motion videos by leveraging skeletal action guidance, addressing challenges in video generation such as limited length, weak controllability, and lack of dynamic detail.

AI · Advertising Technology · MimicMotion
13 min read
AntTech
Jul 24, 2024 · Artificial Intelligence

EchoMimic: An Open‑Source AIGC‑Driven Framework for 2D/3D Digital Human Generation

EchoMimic, an open‑source project from Ant Group, presents a flexible, audio‑ and pose‑driven digital human generation pipeline that combines 2D, 3D and AIGC techniques, reduces production costs, achieves real‑time inference, and includes a detailed architecture, related work analysis, and future research directions.

AIGC · Computer Vision · Digital Human
18 min read
Kuaishou Tech
Jul 11, 2024 · Artificial Intelligence

Kuaishou Open-Sources Kolors: A High-Performance Text-to-Image Model Rivaling Midjourney v6

Kuaishou has officially open-sourced Kolors, a state-of-the-art text-to-image diffusion model that leverages ChatGLM3 for advanced bilingual text understanding and employs a two-stage training strategy to achieve photographic image quality rivaling leading proprietary systems.

Computer Vision · Large Language Models · Text-to-Image Generation
8 min read