
Multidimensional Preference Model (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis

This article introduces the Multidimensional Preference Model (MPS), the first multi‑dimensional scoring system for evaluating text‑to‑image generation. MPS is built on the newly released MHP dataset, whose human annotations cover four dimensions: aesthetics, semantic alignment, detail quality, and overall preference. The article reports comprehensive experiments showing that MPS outperforms existing scorers, and it demonstrates how MPS can be integrated into RLHF pipelines.

Kuaishou Tech

We propose the Multidimensional Preference Model (MPS), the first multi‑dimensional scoring model for evaluating text‑to‑image generation. It is trained on the newly released Multidimensional Human Preference (MHP) dataset, which contains 918,315 pairwise comparisons spanning four dimensions: aesthetics, semantic alignment, detail quality, and overall preference.
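The article does not state the training objective. For pairwise human-preference data, a common choice is a Bradley-Terry-style cross-entropy over a two-way softmax of the two images' scores, which reduces to a sigmoid of the score difference. The sketch below assumes that formulation, applied independently to each annotated dimension; the function names, labels, and scores are hypothetical and are not taken from the MPS code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(score_a: float, score_b: float, label_a: float) -> float:
    """Cross-entropy between a two-way softmax of scores and the human label.

    Assumed formulation, not confirmed as the MPS loss.
    label_a is the target probability that image A is preferred:
    1.0 (A wins), 0.0 (B wins) or 0.5 (tie).
    """
    if not 0.0 <= label_a <= 1.0:
        raise ValueError("label_a must be in [0, 1]")
    diff = score_a - score_b
    return -(label_a * log_sigmoid(diff) + (1.0 - label_a) * log_sigmoid(-diff))


# One loss term per annotated dimension: (score_a, score_b, label_a), hypothetical values.
comparisons = {
    "aesthetics": (2.1, 1.3, 1.0),
    "semantic_alignment": (0.4, 0.9, 0.0),
    "detail_quality": (1.0, 1.0, 0.5),
    "overall": (1.7, 1.2, 1.0),
}
losses = {dim: pairwise_preference_loss(*c) for dim, c in comparisons.items()}
```

Labelling a tie as 0.5 lets tied annotations contribute a gradient toward equal scores instead of being discarded.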

MHP was built from a category‑balanced set of prompts collected from multiple sources and supplemented with GPT‑4‑generated prompts to cover long‑tail categories. Images for these prompts were generated by diffusion, GAN, and autoregressive models, yielding over 600k images, which were then annotated by humans.
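The article does not give the prompt taxonomy or the balancing rule. As a minimal sketch, assuming a fixed per-category quota, balancing can cap over-represented categories and report how many prompts each long-tail category still needs; that shortfall is what LLM-generated prompts would fill. The function, categories, and quota below are hypothetical, and the GPT‑4 call itself is not shown.

```python
def balance_prompts(prompts, categories, quota):
    """Cap each category at `quota` prompts and report per-category shortfalls.

    prompts: iterable of (category, prompt_text) pairs from the collected sources.
    categories: the target taxonomy (hypothetical); other categories are ignored.
    Returns (selected, deficits): selected maps category -> kept prompts,
    deficits maps each under-filled category -> number of prompts still needed.
    """
    selected = {c: [] for c in categories}
    for category, text in prompts:
        # Keep the first `quota` prompts per category (a real pipeline might sample instead).
        if category in selected and len(selected[category]) < quota:
            selected[category].append(text)
    # Long-tail categories that would be topped up with generated prompts.
    deficits = {c: quota - len(p) for c, p in selected.items() if len(p) < quota}
    return selected, deficits


collected = [
    ("people", "a portrait of an old fisherman"),
    ("people", "a child flying a kite"),
    ("people", "two chefs arguing in a kitchen"),
    ("animals", "a red fox in fresh snow"),
    ("architecture", "a glass library at dusk"),
]
selected, deficits = balance_prompts(
    collected, ["people", "animals", "architecture", "food"], quota=2
)
# deficits == {"animals": 1, "architecture": 1, "food": 2}
```

Passing the taxonomy explicitly matters: a category with no collected prompts at all ("food" here) still shows up as a deficit instead of silently disappearing.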

The MPS architecture extends CLIP with a preference‑condition module that injects a conditional mask into the cross‑attention layers. Because the mask depends on the requested preference dimension, a single shared backbone can predict a separate score for each dimension.
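The article does not specify how the conditional mask is computed. The sketch below assumes one plausible scheme: compare an embedding of the preference condition (for example, "aesthetics") with each prompt-token embedding, keep tokens whose cosine similarity passes a threshold, and let image tokens attend only to the kept tokens. Learned projections and the CLIP encoders are omitted, every name is hypothetical, and pure Python is used only to keep the example dependency-free.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def condition_mask(prompt_tokens, condition, threshold):
    """1 for prompt tokens similar enough to the preference condition, else 0."""
    sims = [cosine(t, condition) for t in prompt_tokens]
    mask = [1 if s >= threshold else 0 for s in sims]
    if not any(mask):
        # Fall back to the single most similar token so attention is never empty.
        mask[max(range(len(sims)), key=sims.__getitem__)] = 1
    return mask


def masked_cross_attention(image_tokens, prompt_tokens, mask):
    """Image tokens (queries) attend only to unmasked prompt tokens (keys and values).

    Projections are omitted: keys and values are the raw prompt embeddings.
    """
    d = len(prompt_tokens[0])
    outputs = []
    for q in image_tokens:
        # Scaled dot-product logits; masked tokens get -inf so softmax assigns them 0.
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if m else float("-inf")
            for k, m in zip(prompt_tokens, mask)
        ]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]  # exp(-inf) == 0.0
        total = sum(weights)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, prompt_tokens)) / total for i in range(d)]
        )
    return outputs


prompt_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy 2-d token embeddings
condition = [1.0, 0.0]  # toy embedding of one preference condition
mask = condition_mask(prompt_tokens, condition, threshold=0.5)  # [1, 0, 1]
attended = masked_cross_attention([[0.3, 0.9]], prompt_tokens, mask)
```

Swapping in a different condition embedding changes only the mask, which is how one set of backbone weights could yield a different score per preference dimension under this assumed design.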

Extensive experiments on three public benchmarks and on the MHP benchmark show that MPS outperforms existing scoring methods on both overall and per‑dimension metrics. Grad‑CAM visualizations further show that the conditional mask focuses the model on the prompt tokens relevant to each preference dimension.
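The article does not name the metric. Preference scorers are commonly compared by pairwise accuracy: the fraction of human-annotated pairs in which the scorer ranks the preferred image higher. The sketch below assumes that metric and gives half credit for tied scores; tie handling is an assumption, since benchmarks differ (some exclude ties). A per-dimension figure would come from running it on each dimension's pairs separately. Names and scores are hypothetical.

```python
def pairwise_accuracy(pairs, score):
    """Fraction of annotated pairs where the scorer ranks the human-preferred item higher.

    pairs: iterable of (preferred, rejected) items.
    score: callable mapping an item to a float.
    Tied scores count as half-correct (an assumption).
    """
    correct = 0.0
    n = 0
    for preferred, rejected in pairs:
        sp, sr = score(preferred), score(rejected)
        correct += 1.0 if sp > sr else 0.5 if sp == sr else 0.0
        n += 1
    if n == 0:
        raise ValueError("no pairs to evaluate")
    return correct / n


toy_scores = {"img_a": 0.9, "img_b": 0.2, "img_c": 0.5, "img_d": 0.7, "img_e": 0.4, "img_f": 0.4}
pairs = [("img_a", "img_b"), ("img_c", "img_d"), ("img_e", "img_f")]
acc = pairwise_accuracy(pairs, toy_scores.__getitem__)  # (1 + 0 + 0.5) / 3 == 0.5
```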

We also integrate MPS into reinforcement learning from human feedback (RLHF) pipelines, such as PPO and DPO, to fine‑tune large text‑to‑image models, improving their aesthetic quality and realism. The model, dataset, and code are publicly released.
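How MPS is wired into each pipeline is not detailed here. In PPO-style fine-tuning, a scorer's output can serve directly as the scalar reward. DPO instead needs (chosen, rejected) pairs, and one common recipe, assumed here rather than confirmed as the authors' setup, is to sample several images per prompt, score them, and pair the best with the worst. The function and data below are hypothetical; `score` stands in for any MPS dimension score.

```python
def build_dpo_pairs(candidates, score, min_margin=0.0):
    """Turn scored samples into (prompt, chosen, rejected) triples for DPO.

    candidates: dict mapping prompt -> list of generated images (any objects).
    score: callable (prompt, image) -> float, e.g. a preference-model score.
    For each prompt, the highest- and lowest-scoring images form one pair;
    prompts whose best/worst gap does not exceed min_margin are skipped as ambiguous.
    """
    pairs = []
    for prompt, images in candidates.items():
        if len(images) < 2:
            continue  # need at least two samples to form a pair
        scored = [(score(prompt, img), img) for img in images]
        best_score, best = max(scored, key=lambda s: s[0])
        worst_score, worst = min(scored, key=lambda s: s[0])
        if best_score - worst_score > min_margin:
            pairs.append((prompt, best, worst))
    return pairs


# Toy score table standing in for model outputs (hypothetical values).
table = {
    ("p1", "x"): 0.8, ("p1", "y"): 0.1, ("p1", "z"): 0.5,
    ("p2", "u"): 0.3, ("p2", "v"): 0.3,
    ("p3", "w"): 0.9,
}
candidates = {"p1": ["x", "y", "z"], "p2": ["u", "v"], "p3": ["w"]}
dpo_pairs = build_dpo_pairs(candidates, lambda p, img: table[(p, img)])
# dpo_pairs == [("p1", "x", "y")]
```

Raising `min_margin` trades fewer pairs for cleaner ones, which can help when the scorer is uncertain between near-identical samples.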
