ProteinOPD: Tsinghua’s Efficient Multi‑Objective Preference Alignment Framework for Protein Design

ProteinOPD introduces a multi‑teacher, on‑policy preference‑distillation framework that aligns protein language models with multiple design objectives (foldability, solubility, and thermostability) while preserving generation quality. It reports a thermostability gain of up to 54.2% over the ProtGPT2 base model and reaches reinforcement‑learning‑level thermostability in roughly one eighth of the training time.

Data Party THU

Background

Protein language models are transitioning from generating merely plausible sequences to designing proteins with target properties such as high foldability, solubility, and thermal stability. Multi‑objective preference alignment traditionally suffers from two problems: (1) improvement of specific attributes often causes the model to forget the design capability learned during pre‑training, and (2) balancing competing objectives is unstable.

Method

The ProteinOPD framework separates preference acquisition from preference combination:

Preference acquisition: For each target attribute (foldability, solubility, thermostability), an attribute oracle scores protein sequences. A small set of high‑scoring sequences is selected to form a preference‑specific training set.
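The selection step can be sketched as a simple score-and-rank routine. This is a minimal illustration rather than the authors' pipeline: `select_top_k`, `toy_oracle`, and the candidate sequences are hypothetical, and a real attribute oracle would be a learned predictor instead of the residue-counting stand-in used here.

```python
from typing import Callable, List, Tuple


def select_top_k(
    sequences: List[str],
    oracle: Callable[[str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every sequence with the oracle and keep the k highest-scoring ones."""
    scored = [(seq, oracle(seq)) for seq in sequences]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def toy_oracle(seq: str) -> float:
    """Hypothetical stand-in for an attribute oracle.

    Returns the fraction of charged residues (D, E, K, R). It only exists
    to make the example runnable and is not a real solubility predictor.
    """
    if not seq:
        return 0.0
    return sum(seq.count(aa) for aa in "DEKR") / len(seq)


# Hypothetical candidate pool; in practice these would be model samples.
candidates = ["MKKDEER", "MAAAAAG", "MDEKRLL", "MGGGGKV"]
preference_set = select_top_k(candidates, toy_oracle, k=2)

for seq, score in preference_set:
    print(f"{seq}\t{score:.3f}")
```

One such set would be built per attribute, each with its own oracle, and then used to train the corresponding teacher.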

Teacher construction: Lightweight adapters convert the pretrained protein language model into multiple teachers, each dedicated to a single attribute. The teachers are trained on the corresponding preference‑specific set.

Multi‑teacher OPD: During student training, the student generates a prefix, each teacher supplies a next‑token probability distribution, and the distributions are merged by a normalized Product‑of‑Experts (PoE) to obtain a geometric consensus distribution. The consensus emphasizes tokens that all teachers jointly support, thereby resolving conflicts among objectives.
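A normalized PoE over next-token distributions can be written as p(x) = (1/Z) · ∏_k p_k(x)^{w_k}, where Z sums the unnormalized product over the vocabulary. The sketch below computes this in log space for numerical stability. It assumes equal weights w_k = 1/K (a plain geometric mean); the weighting actually used in the paper is not stated in this article, and `poe_consensus` and the three-token vocabulary are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def poe_consensus(
    teacher_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    eps: float = 1e-12,
) -> Tuple[List[float], float]:
    """Merge K next-token distributions with a normalized Product-of-Experts.

    Returns the consensus distribution and log Z, the log of the normalizer.
    Equal weights summing to 1 are an assumption of this sketch.
    """
    num_teachers = len(teacher_probs)
    vocab_size = len(teacher_probs[0])
    if weights is None:
        weights = [1.0 / num_teachers] * num_teachers

    # Weighted sum of log-probabilities = log of the unnormalized product.
    log_unnorm = [
        sum(w * math.log(p[v] + eps) for w, p in zip(weights, teacher_probs))
        for v in range(vocab_size)
    ]

    # Log-sum-exp gives log Z without underflow.
    peak = max(log_unnorm)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in log_unnorm))
    consensus = [math.exp(l - log_z) for l in log_unnorm]
    return consensus, log_z


# Two hypothetical teachers that disagree on tokens 0 and 2
# but both give moderate support to token 1.
teachers = [
    [0.60, 0.35, 0.05],
    [0.05, 0.35, 0.60],
]
consensus, log_z = poe_consensus(teachers)
print([round(p, 3) for p in consensus])
```

In this toy case the jointly supported middle token receives the largest consensus probability, even though neither teacher ranks it first.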

On‑policy token‑level distillation: The student receives a token‑level loss computed on its own generation trajectory, which corrects it in the states it actually visits. Compared with offline imitation, this on‑policy approach provides dense supervision and mitigates the mismatch between training‑time and generation‑time distributions.
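To make the on-policy idea concrete, the sketch below rolls out a trajectory by sampling from the student and accumulates a per-token divergence between the student and a target distribution at every prefix the student visits. Several details are assumptions: the article does not say which divergence is used, so KL(student ‖ target) is chosen for illustration, and `uniform_student` and `toy_consensus` are hypothetical stand-ins for real models. Only the loss value is computed; in actual training the gradient would flow through the student's logits.

```python
import math
import random
from typing import Callable, List, Tuple

Distribution = List[float]
Policy = Callable[[List[int]], Distribution]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def on_policy_distillation_loss(
    student: Policy,
    target: Policy,
    length: int,
    rng: random.Random,
) -> Tuple[float, List[int]]:
    """Average token-level KL along a trajectory sampled from the student."""
    prefix: List[int] = []
    per_token_loss: List[float] = []
    for _ in range(length):
        p_student = student(prefix)
        p_target = target(prefix)
        # Dense supervision: one loss term for every visited prefix.
        per_token_loss.append(kl_divergence(p_student, p_target))
        # On-policy: the next token comes from the student, not from a dataset.
        token = rng.choices(range(len(p_student)), weights=p_student, k=1)[0]
        prefix.append(token)
    return sum(per_token_loss) / length, prefix


def uniform_student(prefix: List[int]) -> Distribution:
    """Hypothetical untrained student: uniform over a 3-token vocabulary."""
    return [1 / 3, 1 / 3, 1 / 3]


def toy_consensus(prefix: List[int]) -> Distribution:
    """Hypothetical target whose preferred token rotates with position."""
    base = [0.6, 0.3, 0.1]
    shift = len(prefix) % 3
    return base[shift:] + base[:shift]


loss, trajectory = on_policy_distillation_loss(
    uniform_student, toy_consensus, length=8, rng=random.Random(0)
)
print(f"mean token-level loss: {loss:.4f}")
```

Because the prefixes are sampled from the student itself, the loss is evaluated exactly where the student's own generation would go at inference time.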

The normalized PoE also yields a normalization factor that reflects the degree of disagreement among teachers, offering an implicit signal of attribute conflict without extra computation.
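A small numerical check illustrates why the normalizer can act as a conflict signal. Under the assumption that the PoE weights sum to 1 (which this article does not confirm), Z is at most 1 and equals 1 only when all teachers agree; for two equally weighted teachers it is the Bhattacharyya coefficient between their distributions. The function name `poe_log_normalizer` and the example distributions are hypothetical.

```python
import math
from typing import Sequence


def poe_log_normalizer(
    teacher_probs: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Return log Z for an equal-weight geometric-mean PoE.

    Z = sum over tokens of prod_k p_k(token) ** (1 / K).
    Equal weights are an assumption of this sketch.
    """
    num_teachers = len(teacher_probs)
    vocab_size = len(teacher_probs[0])
    z = 0.0
    for v in range(vocab_size):
        log_term = sum(math.log(p[v] + eps) for p in teacher_probs) / num_teachers
        z += math.exp(log_term)
    return math.log(z)


# Teachers that agree exactly: log Z is approximately 0.
agree = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]

# Teachers that prefer opposite tokens: log Z is clearly negative.
conflict = [[0.90, 0.05, 0.05], [0.05, 0.05, 0.90]]

print("agreement log Z:", poe_log_normalizer(agree))
print("conflict  log Z:", poe_log_normalizer(conflict))
```

Since Z is already needed to normalize the consensus distribution, reading off log Z as a disagreement score adds no extra model evaluations.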

Results

Multi‑objective alignment: Compared with the strongest cross‑domain baseline MoMPNN, ProteinOPD improves hypervolume (HV) by 34.8%. Relative to the base model ProtGPT2, ProteinOPD increases foldability by 14.8%, solubility by 16.9%, and thermostability by 54.2%.
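For readers unfamiliar with the metric, hypervolume measures the volume of objective space that a set of solutions dominates relative to a reference point, so a larger value means a better trade-off front. The sketch below computes it exactly for small point sets by inclusion-exclusion over axis-aligned boxes. It assumes all objectives are maximized; the reference point and the example scores are made up for illustration and are not the paper's data, and the paper's exact HV protocol is not described in this article.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence


def hypervolume(points: Sequence[Sequence[float]], reference: Sequence[float]) -> float:
    """Exact hypervolume for maximization objectives via inclusion-exclusion.

    Each point spans a box from the reference point to itself; the result is
    the volume of the union of those boxes. Cost grows exponentially with the
    number of points, so this is only suitable for small illustrative sets.
    """
    # Points that do not exceed the reference in every objective add no volume.
    valid: List[Sequence[float]] = [
        p for p in points if all(x > r for x, r in zip(p, reference))
    ]
    total = 0.0
    for size in range(1, len(valid) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(valid, size):
            # The intersection of boxes is the box up to the coordinate-wise minimum.
            corner = [min(coords) for coords in zip(*subset)]
            total += sign * prod(c - r for c, r in zip(corner, reference))
    return total


# Hypothetical (foldability, solubility, thermostability) scores in [0, 1].
reference = (0.0, 0.0, 0.0)
baseline_front = [(0.5, 0.5, 0.5)]
aligned_front = [(0.6, 0.6, 0.7), (0.7, 0.5, 0.6)]

hv_base = hypervolume(baseline_front, reference)
hv_aligned = hypervolume(aligned_front, reference)
print(f"relative HV improvement: {(hv_aligned - hv_base) / hv_base:.1%}")
```

A relative HV improvement such as the one reported above is then the difference between the two hypervolumes divided by the baseline's hypervolume.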

Training efficiency: ProteinOPD matches the thermostability improvement of reinforcement‑learning methods in roughly 1/8 of the training time (≈8× speed‑up). Teacher construction requires only a few oracle‑selected high‑quality samples.

Single‑objective experiments: In unconditional generation, ProteinOPD retains most attribute gains while losing less novelty than direct fine‑tuning. In conditional generation, the ProTrek Score rises, indicating that alignment does not degrade condition consistency.

Case Study

A head‑to‑head comparison with ASPO on sequences whose thermal‑stability score exceeds 0.95 and whose identity to UniRef is below 5% shows that ProteinOPD achieves pLDDT 0.73 versus 0.49 for ASPO, and solubility score 0.69 versus 0.43, demonstrating superior multi‑attribute alignment while preserving novelty.

Resources

Paper: https://arxiv.org/abs/2605.10189

Open‑source code and releases: https://github.com/THU-AI4S/ProteinOPD

Colab inference notebook: https://colab.research.google.com/github/THU-AI4S/ProteinOPD/blob/main/notebooks/proteinopd_inference.ipynb

Source: ScienceAI
Written by Data Party THU, the official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.