ProteinOPD: Tsinghua’s Efficient Multi‑Objective Preference Alignment Framework for Protein Design
ProteinOPD introduces a multi‑teacher, on‑policy preference‑distillation framework that aligns protein language models with multiple design objectives (foldability, solubility, and thermostability) while preserving generation quality. It improves thermostability by up to 54% over its base model and reaches the thermostability gains of reinforcement‑learning methods in roughly one‑eighth of the training time.
Background
Protein language models are moving from generating merely plausible sequences to designing proteins with target properties such as high foldability, solubility, and thermal stability. Multi‑objective preference alignment has traditionally faced two problems: (1) optimizing specific attributes often makes the model forget the design capability it learned during pre‑training, and (2) training that balances competing objectives tends to be unstable.
Method
The ProteinOPD framework separates preference acquisition from preference combination:
Preference acquisition: For each target attribute (foldability, solubility, thermostability), an attribute oracle scores protein sequences, and a small set of high‑scoring sequences is selected to form a preference‑specific training set.
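A minimal Python sketch of the acquisition step, assuming each oracle maps a sequence to a scalar score and that "a small set" means a fixed top‑k cutoff. `toy_solubility_oracle` (fraction of charged residues) and the `build_*` helpers are hypothetical illustrations, not the paper's oracles or code.

```python
from typing import Callable, Dict, List

Oracle = Callable[[str], float]


def toy_solubility_oracle(seq: str) -> float:
    """Hypothetical stand-in oracle: fraction of charged residues (D, E, K, R).
    A real pipeline would call a trained property predictor instead."""
    if not seq:
        return 0.0
    return sum(seq.count(aa) for aa in "DEKR") / len(seq)


def build_preference_set(candidates: List[str], oracle: Oracle, top_k: int) -> List[str]:
    """Score every candidate with one attribute oracle and keep the top_k."""
    return sorted(candidates, key=oracle, reverse=True)[:top_k]


def build_all_preference_sets(candidates: List[str], oracles: Dict[str, Oracle],
                              top_k: int) -> Dict[str, List[str]]:
    """One preference-specific training set per attribute."""
    return {name: build_preference_set(candidates, fn, top_k) for name, fn in oracles.items()}


if __name__ == "__main__":
    pool = ["MKKLLDEERK", "MAAAGGGLLV", "MDEKRDEKRA", "MSTTPPGGAA"]
    print(build_all_preference_sets(pool, {"solubility": toy_solubility_oracle}, top_k=2))
    # {'solubility': ['MDEKRDEKRA', 'MKKLLDEERK']}
```

Keeping the oracles in a dictionary makes each attribute's selection independent, which mirrors the idea of acquiring every preference separately before any combination happens.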
Teacher construction: Lightweight adapters turn the pretrained protein language model into multiple teachers, one per attribute, and each teacher is trained on its attribute's preference‑specific set.
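The summary does not say which adapter type ProteinOPD uses. The sketch below swaps in a LoRA‑style low‑rank adapter purely for illustration: one frozen base weight is shared, and each attribute gets its own small trainable update. `LowRankAdapter`, `MultiTeacherLinear`, and the rank of 2 are hypothetical.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LowRankAdapter:
    """Trainable update B @ A of rank r (LoRA-style, assumed for illustration).
    B starts at zero, so an untrained adapter leaves the base layer unchanged."""

    def __init__(self, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.A: Matrix = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x: Vector) -> Vector:
        return matvec(self.B, matvec(self.A, x))


class MultiTeacherLinear:
    """One frozen base weight shared by several attribute-specific adapters."""

    def __init__(self, base_weight: Matrix, attributes: List[str], rank: int = 2):
        self.W = base_weight  # frozen: never updated while training teachers
        d_out, d_in = len(base_weight), len(base_weight[0])
        self.adapters: Dict[str, LowRankAdapter] = {
            name: LowRankAdapter(d_in, d_out, rank, seed=i) for i, name in enumerate(attributes)
        }

    def forward(self, x: Vector, teacher: str) -> Vector:
        base = matvec(self.W, x)
        update = self.adapters[teacher].delta(x)
        return [b + u for b, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
    layer = MultiTeacherLinear(W, ["foldability", "solubility", "thermostability"])
    print(layer.forward([1.0, 2.0, 3.0], "solubility"))  # [7.0, -1.5], same as the base layer
```

Under this assumption only `A` and `B` would be updated when a teacher is trained on its preference‑specific set, so adding an attribute costs one small adapter rather than a full model copy.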
Multi‑teacher OPD: During student training, the student generates a prefix, each teacher supplies a next‑token probability distribution for that prefix, and a normalized Product‑of‑Experts (PoE) merges these distributions into a geometric consensus distribution. Because a product of probabilities is large only where every teacher assigns substantial probability, the consensus emphasizes tokens that all teachers jointly support, which resolves conflicts among objectives.
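The normalized Product‑of‑Experts can be sketched as a weighted geometric mean of the teacher distributions followed by renormalization. Equal default weights, the log‑space computation, and the name `normalized_poe` are assumptions for illustration; the example distributions are made up.

```python
import math
from typing import List, Optional, Sequence, Tuple


def normalized_poe(
    teacher_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], float]:
    """Merge K next-token distributions over one vocabulary.

    consensus(x) = prod_k p_k(x) ** w_k / Z,  Z = sum_x prod_k p_k(x) ** w_k.
    Computed in log space for numerical stability; returns (consensus, Z).
    """
    k = len(teacher_probs)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights: a weighted geometric mean
    log_scores: List[float] = []
    for x in range(len(teacher_probs[0])):
        s = 0.0
        for w, p in zip(weights, teacher_probs):
            if w == 0.0:
                continue  # a zero-weight teacher has no influence
            if p[x] <= 0.0:
                s = -math.inf  # a zero from any weighted teacher vetoes the token
                break
            s += w * math.log(p[x])
        log_scores.append(s)
    m = max(log_scores)
    if m == -math.inf:
        raise ValueError("teachers share no token with nonzero probability")
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores))
    consensus = [math.exp(s - log_z) for s in log_scores]
    return consensus, math.exp(log_z)


if __name__ == "__main__":
    foldability_teacher = [0.5, 0.4, 0.1]
    solubility_teacher = [0.1, 0.4, 0.5]
    q, z = normalized_poe([foldability_teacher, solubility_teacher])
    print([round(p, 3) for p in q], round(z, 3))  # [0.264, 0.472, 0.264] 0.847
```

In the example, neither toy teacher ranks token 1 first on its own, yet it becomes the consensus mode because both give it substantial probability.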
On‑policy token‑level distillation: The student receives a token‑level loss computed along its own generation trajectory, so it is corrected in the states it actually visits. Compared with offline imitation, this on‑policy approach provides dense supervision and mitigates the mismatch between training‑time and generation‑time distributions.
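The on‑policy loop can be sketched as follows: the student samples its own trajectory, and at every visited prefix its distribution is compared with the teachers' consensus. The choice of reverse KL, KL(student ‖ consensus), is an assumption because the summary does not name the divergence, and all function names are hypothetical. The toy policies ignore the prefix, whereas real models condition on it, and the sketch only evaluates the loss; in training, gradients would flow through the student's probabilities.

```python
import math
import random
from typing import Callable, List, Sequence

Dist = List[float]
Policy = Callable[[Sequence[int]], Dist]  # prefix -> next-token distribution


def poe(dists: Sequence[Dist]) -> Dist:
    """Equal-weight normalized product of experts (assumes strictly positive probs)."""
    k = len(dists)
    scores = [math.exp(sum(math.log(d[x]) for d in dists) / k) for x in range(len(dists[0]))]
    z = sum(scores)
    return [s / z for s in scores]


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q); q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def on_policy_distill_loss(student: Policy, teachers: Sequence[Policy],
                           length: int, rng: random.Random) -> float:
    """Average per-token reverse KL(student || consensus) along one trajectory
    sampled from the student itself, so every visited state is supervised."""
    prefix: List[int] = []
    total = 0.0
    for _ in range(length):
        p_s = student(prefix)
        p_c = poe([t(prefix) for t in teachers])  # consensus at the same state
        total += kl(p_s, p_c)
        # On-policy: the next state comes from the student's own sample.
        prefix.append(rng.choices(range(len(p_s)), weights=p_s, k=1)[0])
    return total / length


if __name__ == "__main__":
    teachers = [lambda prefix: [0.6, 0.3, 0.1], lambda prefix: [0.2, 0.5, 0.3]]

    def uniform_student(prefix: Sequence[int]) -> Dist:
        return [1 / 3, 1 / 3, 1 / 3]

    def matched_student(prefix: Sequence[int]) -> Dist:
        return poe([t(prefix) for t in teachers])

    rng = random.Random(0)
    print(on_policy_distill_loss(uniform_student, teachers, 8, rng) > 0)  # True
    print(on_policy_distill_loss(matched_student, teachers, 8, rng))  # 0.0
```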
The normalized PoE also yields a normalization factor that reflects how strongly the teachers disagree, giving an implicit signal of attribute conflict at no extra computational cost.
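Assuming the PoE uses nonnegative weights $w_k$ that sum to one (the exact weighting is not given in this summary), and writing $s$ for the student's prefix and $x$ for the next token, the normalization factor is bounded by one, which is why it can act as a conflict signal:

```latex
q(x \mid s) = \frac{1}{Z(s)} \prod_{k=1}^{K} p_k(x \mid s)^{w_k},
\qquad
Z(s) = \sum_{x} \prod_{k=1}^{K} p_k(x \mid s)^{w_k},
\qquad
w_k \ge 0,\; \sum_{k=1}^{K} w_k = 1

% Weighted AM-GM, applied token by token:
\prod_{k} p_k(x \mid s)^{w_k} \le \sum_{k} w_k \, p_k(x \mid s)
\;\Longrightarrow\;
Z(s) \le \sum_{x} \sum_{k} w_k \, p_k(x \mid s) = \sum_{k} w_k = 1
```

Equality holds only when every teacher with positive weight assigns the same distribution at state $s$, so $-\log Z(s)$ is zero under full agreement and positive whenever the teachers disagree.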
Results
Multi‑objective alignment: Compared with MoMPNN, the strongest cross‑domain baseline, ProteinOPD improves hypervolume (HV) by 34.8%. Relative to the base model ProtGPT2, it increases foldability by 14.8%, solubility by 16.9%, and thermostability by 54.2%.
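Hypervolume measures the region of objective space dominated by a set of solutions relative to a reference point. ProteinOPD's HV spans three objectives; the sketch below shows only the two‑objective case, where HV reduces to the area under a staircase. The reference point, the toy front, and the function names are illustrative assumptions, and `relative_gain` only shows how a percentage such as +34.8% is derived from two HV values.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`,
    for two objectives that are both maximized."""
    # Points that do not beat the reference in both objectives contribute nothing.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:  # sweep from the largest first objective downward
        if y > best_y:  # a point at or below the current staircase is dominated
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def relative_gain(new: float, old: float) -> float:
    """Relative improvement; 0.348 corresponds to +34.8%."""
    return (new - old) / old


if __name__ == "__main__":
    front = [(0.8, 0.3), (0.5, 0.6), (0.2, 0.9)]
    print(round(hypervolume_2d(front, (0.0, 0.0)), 6))  # 0.45
```

Exact HV in three or more dimensions needs more involved algorithms, so in practice one would rely on an established multi‑objective optimization library rather than extend this sweep.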
Training efficiency: ProteinOPD matches the thermostability improvement of reinforcement‑learning methods in roughly 1/8 of the training time (≈8× speed‑up). Teacher construction requires only a few oracle‑selected high‑quality samples.
Single‑objective experiments: In unconditional generation, ProteinOPD retains most of the attribute gains while losing less novelty than direct fine‑tuning. In conditional generation, the ProTrek Score rises, indicating that alignment does not degrade consistency with the condition.
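The summary does not define novelty. A common convention, assumed here, is one minus a design's highest sequence identity to a reference set. Real pipelines compute identity with alignment tools such as BLAST or MMseqs2; the `difflib`‑based `approx_identity` below is only a rough, alignment‑free stand‑in, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Iterable, List


def approx_identity(a: str, b: str) -> float:
    """Matched residues divided by the longer length (rough proxy, not a true alignment)."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return matches / max(len(a), len(b))


def novelty(seq: str, references: Iterable[str]) -> float:
    """1 - highest identity to any reference; 1.0 if there are no references."""
    return 1.0 - max((approx_identity(seq, r) for r in references), default=0.0)


def mean_novelty(generated: List[str], references: List[str]) -> float:
    """Average novelty of a batch of generated sequences."""
    return mean(novelty(s, references) for s in generated)


if __name__ == "__main__":
    print(novelty("MKVLA", ["MKVWW", "AAAAA"]))  # 0.4
```

Comparing `mean_novelty` before and after alignment is one way to quantify the "novelty loss" the experiments refer to, under the assumed definition.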
Case Study
In a head‑to‑head comparison with ASPO on sequences whose thermal‑stability score exceeds 0.95 and whose identity to UniRef is below 5%, ProteinOPD achieves a pLDDT of 0.73 versus 0.49 for ASPO and a solubility score of 0.69 versus 0.43, demonstrating stronger multi‑attribute alignment while preserving novelty.
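The case‑study selection amounts to a filter followed by averaging. The `Design` record, its field names, the strict inequalities, and the [0, 1] pLDDT scale (inferred from the reported 0.73 and 0.49) are assumptions; this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class Design:
    thermostability: float  # oracle score, assumed in [0, 1]
    uniref_identity: float  # highest identity to UniRef, as a fraction
    plddt: float            # structure confidence, assumed rescaled to [0, 1]
    solubility: float       # oracle score, assumed in [0, 1]


def case_study_subset(designs: List[Design], min_thermo: float = 0.95,
                      max_identity: float = 0.05) -> List[Design]:
    """Highly thermostable designs that are also far from known sequences."""
    return [d for d in designs
            if d.thermostability > min_thermo and d.uniref_identity < max_identity]


def summarize(designs: List[Design]) -> Tuple[int, float, float]:
    """(count, mean pLDDT, mean solubility) over the case-study subset."""
    kept = case_study_subset(designs)
    if not kept:
        return 0, float("nan"), float("nan")
    return len(kept), mean(d.plddt for d in kept), mean(d.solubility for d in kept)
```

Running `summarize` separately on each method's designs yields per‑method figures in the same form as the pLDDT and solubility comparison above.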
Resources
Paper: https://arxiv.org/abs/2605.10189
Open‑source code and releases: https://github.com/THU-AI4S/ProteinOPD
Colab inference notebook: https://colab.research.google.com/github/THU-AI4S/ProteinOPD/blob/main/notebooks/proteinopd_inference.ipynb
Source: ScienceAI
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
