Bridging LLMs' Social Gap: Graphia Uses Social Graphs as Supervision for Full Macro‑Micro Alignment
Graphia, a new LLM‑based social simulation framework, leverages social graph data as high‑quality supervision to jointly align microscopic interaction predictions and macroscopic network structures, achieving significant gains on TDGG and IDGG benchmarks across three real‑world datasets.
1. Background
Large language models (LLMs) have shown promise for simulating human‑like social behavior, yet existing approaches either model only graph topology [2,3] or rely on qualitative case studies [4], leaving a gap between microscopic interactions and macroscopic network structure.
The authors identify three challenges: (1) traditional deep‑learning graph generators cannot capture text‑driven social activity; (2) no unified training framework uses social‑graph data as supervision to optimize both the micro and macro levels; (3) quantitative metrics for evaluating alignment at both levels are lacking.
2. Method: Graphia Framework
2.1 Problem Formalization
The task is to model dynamic text‑attributed social graphs, represented as a sequence of timestamped sub‑graphs each containing nodes (users), edges (interactions), node attributes (text profiles) and edge attributes (messages and types). Given a historical window τ, the goal is to generate the future graph sequence.
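To make the formalization concrete, here is a minimal Python sketch of one way such a graph could be represented. The class and field names (`Edge`, `DynamicTextGraph`, `split`) are illustrative assumptions and do not come from the Graphia paper or the GDGB benchmark.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    """One timestamped interaction with its text attributes (hypothetical schema)."""
    src: str        # source user id
    dst: str        # target user id
    timestamp: int  # discrete time step
    message: str    # edge attribute: interaction text
    category: str   # edge attribute: interaction type


@dataclass
class DynamicTextGraph:
    """Node text profiles plus a list of timestamped, text-attributed edges."""
    profiles: Dict[str, str] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def snapshot(self, start: int, end: int) -> List[Edge]:
        """Edges whose timestamp falls in the half-open interval [start, end)."""
        return [e for e in self.edges if start <= e.timestamp < end]

    def split(self, t_now: int, tau: int) -> Tuple[List[Edge], List[Edge]]:
        """History window of length tau before t_now, and the future edges to generate."""
        history = self.snapshot(t_now - tau, t_now)
        future = [e for e in self.edges if e.timestamp >= t_now]
        return history, future


graph = DynamicTextGraph(
    profiles={"u1": "tech blogger", "u2": "phone reviewer", "u3": "student"},
    edges=[
        Edge("u1", "u2", 1, "Nice review!", "comment"),
        Edge("u2", "u3", 5, "Reposting this.", "repost"),
        Edge("u3", "u1", 9, "Thanks for the tip.", "comment"),
    ],
)
history, future = graph.split(t_now=6, tau=5)
```

Here `history` plays the role of the window τ given to the model, and `future` is the ground truth that a generated graph sequence would be compared against.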
Following the GDGB benchmark, the problem is split into two settings:
TDGG (Transductive Dynamic Graph Generation): source nodes are known; evaluate microscopic interaction alignment.
IDGG (Inductive Dynamic Graph Generation): source nodes are unknown; evaluate macroscopic structural alignment.
2.2 Graphia Learning Framework
Graphia treats the social graph as a high‑quality supervision signal for post‑training LLMs. The framework consists of three modules:
Activity Prediction: an Informer‑based predictor estimates the future out‑degree of each source node, providing a structural prior for the IDGG task.
Interaction Policy Learning: two specialized LLM agents are trained via reinforcement learning with graph‑neural‑network (GNN)‑derived rewards.
Graphia‑Q (Target‑Node Selection): generates descriptive queries to retrieve candidate target nodes. Its reward combines format quality and retrieval accuracy, and the agent is optimized with GRPO.
Graphia‑E (Edge Generation): generates the interaction message and category for each node pair. Its reward has two parts: a category‑prediction term that follows a curriculum from soft GNN guidance to exact match, and a message‑quality term scored by an LLM‑as‑a‑judge on six dimensions (goal achievement, contextual fidelity, persona depth, dynamic adaptation, immersion, content richness). Training proceeds from supervised fine‑tuning (SFT) to GRPO optimization.
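The reward design described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the weight `w_format`, the linear curriculum schedule controlled by `progress`, and all function names are hypothetical. The only part taken from the standard GRPO formulation is the group‑relative advantage, which normalizes each sampled response's reward by the mean and standard deviation of its group (implementations differ on population versus sample standard deviation).

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set


def query_reward(well_formed: bool, retrieved: Sequence[str],
                 true_targets: Set[str], w_format: float = 0.2) -> float:
    """Hypothetical Graphia-Q-style reward: format term plus retrieval hit rate."""
    format_term = 1.0 if well_formed else 0.0
    if retrieved:
        hit_rate = sum(1 for n in retrieved if n in true_targets) / len(retrieved)
    else:
        hit_rate = 0.0
    return w_format * format_term + (1.0 - w_format) * hit_rate


def category_reward(predicted: str, truth: str,
                    gnn_probs: Dict[str, float], progress: float) -> float:
    """Hypothetical curriculum: progress=0 uses only the soft GNN probability,
    progress=1 uses only exact match, with linear blending in between."""
    soft = gnn_probs.get(predicted, 0.0)
    exact = 1.0 if predicted == truth else 0.0
    p = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return (1.0 - p) * soft + p * exact


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled queries for one source node, scored and standardized.
group = [
    query_reward(True, ["a", "b"], {"a"}),
    query_reward(True, ["a", "c"], {"a", "c"}),
    query_reward(False, ["x"], {"a"}),
    query_reward(True, [], {"a"}),
]
advantages = group_relative_advantages(group)
```

Responses that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged, with no separate value network needed.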
2.3 Graph Generation Pipelines
Different pipelines are designed for TDGG and IDGG:
TDGG Pipeline: (1) Graphia‑Q generates a query for each source node u; (2) the query retrieves a set of target nodes; (3) Graphia‑E generates interaction messages for each target.
IDGG Pipeline: (1) the activity predictor forecasts out‑degree for all nodes; (2) nodes with out‑degree > 0 are selected as active sources; (3) Graphia‑Q and Graphia‑E are applied to these sources; (4) the resulting edges are assembled into the future graph sequence.
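The IDGG steps above can be sketched as a single function that takes the three components as plain callables. All names here are hypothetical stand‑ins: the toy lambdas replace the Informer‑based predictor, Graphia‑Q, the retriever, and Graphia‑E. Using the predicted out‑degree as the number of targets to retrieve is an assumption of this sketch, not something the source states.

```python
from typing import Callable, Dict, List, Tuple

# (source, target, category, message)
GeneratedEdge = Tuple[str, str, str, str]


def idgg_step(
    nodes: List[str],
    predict_out_degree: Callable[[str], int],          # stand-in for the activity predictor
    make_query: Callable[[str], str],                  # stand-in for Graphia-Q
    retrieve: Callable[[str, int], List[str]],         # stand-in for candidate retrieval
    make_edge: Callable[[str, str], Tuple[str, str]],  # stand-in for Graphia-E
) -> List[GeneratedEdge]:
    """Generate the edges of one future time step."""
    edges: List[GeneratedEdge] = []
    for u in nodes:
        k = predict_out_degree(u)        # (1) forecast activity
        if k <= 0:                       # (2) keep only active sources
            continue
        query = make_query(u)            # (3a) describe the desired targets
        for v in retrieve(query, k):     # (3b) pick targets (k = out-degree is assumed)
            category, message = make_edge(u, v)
            edges.append((u, v, category, message))
    return edges                         # (4) assembled edges for this step


nodes = ["u1", "u2", "u3"]
toy_degree: Dict[str, int] = {"u1": 2, "u2": 0, "u3": 1}
future_edges = idgg_step(
    nodes,
    predict_out_degree=lambda u: toy_degree[u],
    make_query=lambda u: f"users likely to interact with {u}",
    retrieve=lambda query, k: ["u2", "u3", "u1"][:k],
    make_edge=lambda u, v: ("comment", f"{u} replies to {v}"),
)
```

In the TDGG setting the source nodes are given, so steps (1) and (2) would be skipped and only the query, retrieval, and edge‑generation calls remain.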
3. Experimental Results
Graphia is evaluated on three real‑world social networks: Propagate‑En (Taobao e‑commerce), Weibo Tech, and Weibo Daily (all from the GDGB benchmark [5]).
3.1 Microscopic Alignment (TDGG)
Target‑Node Selection: Graphia achieves an aggregated selection score of 0.848, surpassing the best baseline, Qwen3‑32B, by 6.1% and matching larger models despite using an 8B‑parameter backbone.
Edge Generation: Under LLM‑as‑a‑judge scoring, Graphia leads on all six evaluation dimensions, improving the average score by 0.77 points (+28%). Automatic metrics show a 12% increase in category‑prediction accuracy and a 27.9% boost in BERTScore.
3.2 Macroscopic Alignment (IDGG)
Structural Reproduction: Graphia attains the lowest MMD scores for degree distribution, clustering coefficient, and spectral properties across all datasets, indicating high structural similarity. It also outperforms baselines on edge‑overlap (EO), where other deep‑learning models approach zero.
Social Phenomena Reproduction: Three quantitative metrics are introduced: KOL identification precision (P@100‑KOL), echo‑chamber alignment (ΔC), and power‑law exponent gap (Δα). Graphia achieves the best or second‑best results, improving macro‑phenomena scores by 27.65% over the strongest baseline.
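For readers unfamiliar with MMD (maximum mean discrepancy), the sketch below compares sets of degree histograms using a Gaussian kernel and the simple biased estimator. The kernel, its bandwidth `sigma`, the histogram binning, and the function names are assumptions made for illustration; the source does not specify which variant the evaluation uses.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def degree_histogram(edges: Sequence[Tuple[str, str]], bins: int) -> List[float]:
    """Normalized histogram of node degrees; degrees >= bins-1 share the last bin."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = [0.0] * bins
    for d in deg.values():
        hist[min(d, bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: List[List[float]], ys: List[List[float]], sigma: float = 1.0) -> float:
    """Biased estimate of MMD^2 between two sets of histogram vectors."""
    def avg_kernel(a: List[List[float]], b: List[List[float]]) -> float:
        return sum(gaussian_kernel(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return avg_kernel(xs, xs) + avg_kernel(ys, ys) - 2.0 * avg_kernel(xs, ys)


real = [degree_histogram([("a", "b"), ("a", "c")], bins=4)]        # one hub, two leaves
gen_same = [degree_histogram([("x", "y"), ("x", "z")], bins=4)]    # same shape
gen_diff = [degree_histogram([("a", "b"), ("c", "d")], bins=4)]    # two disjoint pairs

mmd_same = mmd_squared(real, gen_same)
mmd_diff = mmd_squared(real, gen_diff)
```

A value near zero means the generated graphs have degree statistics close to the real ones, which is why lower MMD is read as better structural reproduction.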
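Two of these metrics can be illustrated with a short sketch. The definitions below are plausible readings, not the paper's exact formulas: precision@k is taken as the overlap between the top‑k highest‑degree nodes of the real and generated graphs, and the exponent is estimated with the standard continuous maximum‑likelihood formula α = 1 + n / Σ ln(x_i / x_min). The function names and the choice of `x_min` are assumptions.

```python
import math
from typing import Dict, List


def precision_at_k(real_degree: Dict[str, int], gen_degree: Dict[str, int], k: int) -> float:
    """Share of the generated graph's top-k nodes that are also top-k in the real graph."""
    if k <= 0:
        raise ValueError("k must be positive")

    def top_k(deg: Dict[str, int]) -> set:
        # Sort by degree descending, breaking ties by node id for determinism.
        return set(sorted(deg, key=lambda n: (-deg[n], n))[:k])

    return len(top_k(real_degree) & top_k(gen_degree)) / k


def power_law_alpha(degrees: List[int], x_min: int = 1) -> float:
    """Continuous MLE of the power-law exponent over degrees >= x_min."""
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one degree strictly above x_min")
    return 1.0 + len(tail) / log_sum


def alpha_gap(real: List[int], generated: List[int], x_min: int = 1) -> float:
    """Absolute difference between the two estimated exponents."""
    return abs(power_law_alpha(real, x_min) - power_law_alpha(generated, x_min))


real_deg = {"a": 5, "b": 3, "c": 1}
gen_deg = {"a": 4, "c": 3, "b": 1}
p_at_2 = precision_at_k(real_deg, gen_deg, k=2)
gap = alpha_gap([1, 2, 4, 8], [1, 1, 2, 8])
```

A smaller exponent gap indicates that the generated graph's degree tail decays at a rate close to that of the real network.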
4. Conclusion and Outlook
The contributions are twofold: (1) Graphia is the first unified training framework that uses social‑graph data as supervision to enhance LLM‑based social simulation, aligning both who‑to‑interact and how‑to‑interact; (2) a unified micro‑macro evaluation paradigm (TDGG/IDGG with quantitative metrics) demonstrates substantial superiority over existing methods.
Future work includes (a) causal mechanism analysis to explain agent behavior, and (b) designing higher‑order structural rewards (e.g., community cohesion, triadic closure) to improve generalization across diverse graph topologies.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
