Can Graph Neural Networks Accurately Predict Antibody‑Antigen Binding Affinity?

A recent Oxford study introduces Graphinity, an equivariant graph neural network that predicts ΔΔG directly from antibody‑antigen structures. The model reaches up to r = 0.89 on large synthetic datasets, yet the study finds that data volume and diversity, rather than model architecture, remain the primary bottleneck for reliable affinity prediction.

Data Party THU

Antibody therapeutics rely on strong and specific binding to their antigens, but measuring changes in binding free energy (ΔΔG) experimentally is slow and costly. Researchers have therefore turned to machine‑learning models, which are often held back by limited training data.
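For reference, ΔΔG is conventionally defined from dissociation constants as below. This is the standard thermodynamic relation rather than anything specific to the study, and sign conventions differ between tools and datasets, so it is worth checking which one a given source uses.

```latex
\Delta G_{\mathrm{bind}} = RT \ln K_d, \qquad
\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
             = RT \ln \frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}}
```

Here K_d is expressed relative to a 1 M standard state. Under this convention a positive ΔΔG means the mutation weakens binding: at 298 K, RT ≈ 0.59 kcal/mol, so a 10‑fold increase in K_d corresponds to ΔΔG ≈ 0.59 × ln 10 ≈ 1.36 kcal/mol.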

Graphinity: An Equivariant GNN for ΔΔG Prediction

The Oxford team developed Graphinity, a Siamese equivariant graph neural network (EGNN) that takes the 3‑D structures of the wild‑type and mutant antibody‑antigen complexes as input and predicts ΔΔG directly. Each structure is processed as a graph in a way that respects rotational and translational symmetry, so the prediction does not depend on how the complex happens to be positioned or oriented in space.
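To make the symmetry argument concrete, here is a minimal sketch of an E(n)-equivariant message-passing layer in the style of Satorras et al. (2021), wrapped in a Siamese setup in which the same function scores the wild-type and the mutant graph and their difference is reported as ΔΔG. This is not Graphinity's code: the scalar node features, the fixed `tanh` functions standing in for learned MLPs, the 5 Å radius graph, the sum readout and the subtraction head are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Graph = Tuple[List[float], List[Vec]]  # (node features h, coordinates x)


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def radius_edges(x: Sequence[Vec], cutoff: float) -> List[Tuple[int, int]]:
    # Directed edges between nodes within `cutoff`; distances are rotation-invariant.
    return [(i, j) for i in range(len(x)) for j in range(len(x))
            if i != j and sq_dist(x[i], x[j]) <= cutoff ** 2]


def egnn_layer(h: List[float], x: List[Vec], edges, coord_scale: float = 0.1):
    """One E(n)-equivariant layer: messages see only invariants (features and
    squared distances); coordinates move along relative vectors x_i - x_j."""
    n = len(h)
    agg = [0.0] * n
    shift = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i, j in edges:
        m_ij = math.tanh(h[i] + h[j] - 0.1 * sq_dist(x[i], x[j]))  # stand-in for phi_e
        agg[i] += m_ij
        w = math.tanh(m_ij)  # stand-in for phi_x: an invariant scalar weight
        for k in range(3):
            shift[i][k] += (x[i][k] - x[j][k]) * w
    new_x = [tuple(x[i][k] + coord_scale * shift[i][k] for k in range(3)) for i in range(n)]
    new_h = [math.tanh(h[i] + agg[i]) for i in range(n)]  # stand-in for phi_h
    return new_h, new_x


def embed(graph: Graph, n_layers: int = 2, cutoff: float = 5.0) -> float:
    h, x = graph
    edges = radius_edges(x, cutoff)
    for _ in range(n_layers):
        h, x = egnn_layer(h, x, edges)
    return sum(h)  # invariant readout


def predict_ddg(wild_type: Graph, mutant: Graph) -> float:
    # Siamese: the same (shared-weight) embed() scores both complexes.
    return embed(mutant) - embed(wild_type)


def rotate_z_and_shift(x: Sequence[Vec], theta: float, t: Vec) -> List[Vec]:
    c, s = math.cos(theta), math.sin(theta)
    return [(c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1], p[2] + t[2]) for p in x]


def move(g: Graph) -> Graph:
    return g[0], rotate_z_and_shift(g[1], 0.7, (3.0, -2.0, 1.0))


# Hypothetical 4-node complex; the "mutation" changes the feature of node 1.
wt = ([0.2, -0.1, 0.5, 0.3], [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 1.5)])
mut = ([0.2, 0.7, 0.5, 0.3], wt[1])
print(predict_ddg(wt, mut), predict_ddg(move(wt), move(mut)))  # equal up to float rounding
```

Rotating and translating both complexes leaves the prediction unchanged, because every quantity the readout sees is built from features and distances only, while the coordinates themselves move with the input.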

Data Generation Strategy

To overcome data scarcity, the authors generated a massive synthetic dataset by running FoldX on structures from the Structural Antibody Database (SAbDab). They exhaustively mutated interface residues across 29 complexes, producing nearly 1 million ΔΔG points (645 single‑point mutations per complex), and also drew on an additional 20 k experimentally measured cases.
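Exhaustive single-point mutagenesis of an interface is simple combinatorics: every interface position can be mutated to each of the 19 other standard amino acids. The sketch below enumerates these mutations for a hypothetical list of interface residues and writes them in the compact style FoldX uses for its mutant lists (wild-type residue, chain, residue number, mutant residue). The residue list is invented, interface detection is omitted, and the exact input-file format should be checked against the FoldX documentation.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def enumerate_point_mutations(interface: List[Tuple[str, str, int]]) -> List[str]:
    """Return every single-point mutation of the given interface residues.

    `interface` holds (wild-type residue, chain ID, residue number) tuples;
    each position yields 19 mutations, one per alternative amino acid.
    """
    mutations = []
    for wild_type, chain, resnum in interface:
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutations.append(f"{wild_type}{chain}{resnum}{aa}")
    return mutations


# Hypothetical interface: two heavy-chain positions and one light-chain position.
interface = [("Y", "H", 33), ("W", "H", 100), ("D", "L", 50)]
mutations = enumerate_point_mutations(interface)
print(len(mutations))  # 57 (3 positions x 19 alternatives)
print(mutations[:3])   # ['YH33A', 'YH33C', 'YH33D']
```

Each generated mutation would then go through the structure-based scoring step to obtain its ΔΔG label.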

Graphinity architecture and synthetic dataset preparation

Performance Evaluation

In 10‑fold cross‑validation on the synthetic set, Graphinity achieved a Pearson correlation coefficient of r = 0.87. When the training and test data were separated using a CDR‑sequence similarity cutoff of 100 % (with a 90 % length match), performance dropped by 63 %, indicating that the model had been overfitting to closely related sequences.
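A similarity-based split of this kind typically works as follows: mutations are grouped by antibody, antibodies whose CDR sequences are too similar are kept together, and whole groups are assigned to folds so that no test antibody closely resembles a training antibody. The minimal sketch below is one way to do this; the identity measure (position-wise identity, with sequences of different length treated as dissimilar), the single-linkage grouping and the toy CDR strings are all simplifying assumptions rather than the study's exact protocol.

```python
from collections import defaultdict
from typing import List


def cdr_identity(a: str, b: str) -> float:
    # Position-wise identity; different-length sequences count as dissimilar (assumption).
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_clusters(seqs: List[str], cutoff: float) -> List[List[int]]:
    """Single-linkage clusters via union-find: i and j share a cluster if a
    chain of pairs with identity >= cutoff connects them."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if cdr_identity(seqs[i], seqs[j]) >= cutoff:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[find(i)].append(i)
    return list(groups.values())


def similarity_folds(seqs: List[str], cutoff: float, k: int) -> List[List[int]]:
    # Assign whole clusters, largest first, to the currently smallest fold.
    folds: List[List[int]] = [[] for _ in range(k)]
    for cluster in sorted(similarity_clusters(seqs, cutoff), key=len, reverse=True):
        min(folds, key=len).extend(cluster)
    return folds


cdrs = ["ARDYW", "ARDYF", "GSSNT", "GSSNA", "QQYNS", "KKKKK"]  # toy CDR strings
print(similarity_folds(cdrs, cutoff=0.8, k=3))  # [[0, 1], [2, 3], [4, 5]]
```

Because clusters are connected components, any two antibodies in different folds fall below the cutoff, which is exactly the property a similarity split is meant to guarantee.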

With a stricter 90 % CDR similarity split, the model reached r = 0.89. On an independent test set of 36 391 experimental ΔΔG values, Graphinity obtained a ROC AUC of 0.90 and an average precision of 0.82, evidence of genuine generalisation.
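The three reported metrics are standard and straightforward to compute. The sketch below implements Pearson r for continuous ΔΔG predictions, ROC AUC via the Mann-Whitney pairwise formulation (ties counted as half), and average precision as the mean precision at the rank of each positive (exact when scores are untied). ROC AUC and average precision need binary labels, so experimental ΔΔG values would first have to be thresholded; the threshold and the toy values below are hypothetical, not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    # Probability that a random positive outranks a random negative.
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[bool], scores: Sequence[float]) -> float:
    # Mean of precision@rank taken at each positive, ranking by descending score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive")
    return total / hits


true_ddg = [1.8, -0.4, 0.9, 0.1]  # toy values
pred_ddg = [1.5, 0.2, 1.1, -0.3]
labels = [d > 0.5 for d in true_ddg]  # hypothetical threshold
print(round(pearson_r(true_ddg, pred_ddg), 3))
print(roc_auc(labels, pred_ddg), average_precision(labels, pred_ddg))  # 1.0 1.0
```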

Graphinity performance on ΔΔG prediction

Data Quantity and Diversity Analysis

The authors then investigated how training‑set size affects performance. Pearson r only stabilised above 0.85 once at least 90 000 mutations were used, out of 94 126 data points available in total.
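A training-set-size experiment of this kind can be organised as below: for each size, draw random subsets of the training mutations, retrain, and record the mean and spread of the test metric. `train_and_eval` is a hypothetical placeholder for fitting a model such as Graphinity and returning its test Pearson r; the sizes, the three repeat seeds and the toy evaluator are illustrative assumptions, not the study's setup.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def learning_curve(
    train: Sequence,
    test: Sequence,
    sizes: List[int],
    train_and_eval: Callable[[list, Sequence], float],
    seeds: Sequence[int] = (0, 1, 2),
) -> Dict[int, Tuple[float, float]]:
    """Map each training-set size to (mean, standard deviation) of the test
    metric over one random subset per seed."""
    pool = list(train)
    curve = {}
    for n in sorted(sizes):
        scores = [train_and_eval(random.Random(seed).sample(pool, n), test) for seed in seeds]
        curve[n] = (statistics.mean(scores), statistics.stdev(scores))
    return curve


def toy_train_and_eval(subset, test):
    # Hypothetical stand-in for real training; returns a made-up score.
    return min(0.9, len(subset) / 100_000)


print(learning_curve(range(94126), [], [1000, 10_000, 90_000], toy_train_and_eval))
```

Reporting the spread across seeds alongside the mean makes it visible when a curve has genuinely plateaued rather than fluctuating around a value.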

They also examined three aspects of diversity, each tested on a 100 k‑mutation subset:

Sequence diversity: reducing the number of antibodies from 1 177 to 75 lowered the standard deviation by 23 %.

Amino‑acid substitution type diversity: compressing the 380 substitution types to 19 common ones reduced the standard deviation by a further 60 %.

Interface‑mutation spatial distribution: restricting mutations to core or peripheral interface regions showed negligible impact on performance.
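The diversity experiments amount to filtering a pool of mutations along one axis and then drawing equal-sized training sets, so that diversity rather than volume is what changes. The sketch below assumes a hypothetical record layout of (antibody ID, wild-type residue, mutant residue, ΔΔG) and shows two such filters: keeping a random subset of antibodies, and keeping only the k most frequent substitution types (at most 20 × 19 = 380 exist for the standard amino acids). How the study actually selected its 75 antibodies and 19 substitution types is not described here, so random sampling and frequency ranking are assumptions.

```python
import random
from collections import Counter
from typing import List, Tuple

Mutation = Tuple[str, str, str, float]  # (antibody_id, wild_type, mutant, ddg): hypothetical layout


def keep_antibodies(mutations: List[Mutation], n: int, seed: int = 0) -> List[Mutation]:
    # Randomly keep the mutations of n distinct antibodies (lower sequence diversity).
    ids = sorted({m[0] for m in mutations})
    kept = set(random.Random(seed).sample(ids, n))
    return [m for m in mutations if m[0] in kept]


def keep_top_substitutions(mutations: List[Mutation], k: int) -> List[Mutation]:
    # Keep only the k most frequent wild-type -> mutant substitution types.
    counts = Counter((m[1], m[2]) for m in mutations)
    kept = {sub for sub, _ in counts.most_common(k)}
    return [m for m in mutations if (m[1], m[2]) in kept]


def fixed_size_sample(mutations: List[Mutation], size: int, seed: int = 0) -> List[Mutation]:
    # Equal-sized training sets so that diversity, not volume, is the variable.
    return random.Random(seed).sample(mutations, size)


pool = [
    ("ab1", "Y", "A", 1.2), ("ab1", "Y", "A", 0.8), ("ab2", "Y", "A", 0.5),
    ("ab2", "D", "N", 0.1), ("ab3", "D", "N", -0.2), ("ab3", "W", "F", 0.9),
]
print(len(keep_top_substitutions(pool, k=2)))  # 5: the single W->F mutation is dropped
print(len(fixed_size_sample(keep_antibodies(pool, n=2), size=2)))  # 2
```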

These results suggest that rich coverage of antibody sequences and substitution chemistry, rather than the precise location of mutations, drives model success.

Conclusions and Future Directions

The study concludes that the dominant obstacle to accurate ΔΔG prediction is the lack of diverse, high‑quality experimental data, not the sophistication of the GNN architecture. The authors estimate that at least 90 000 experimentally measured ΔΔG values are needed to surpass a Pearson r of 0.85. They call for larger, more varied datasets, potentially reaching hundreds of thousands to millions of points, to enable truly generalisable affinity‑prediction models.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Data Party THU, the official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
