Bridging LLMs' Social Gap: Graphia Uses Social Graphs as Supervision for Full Macro‑Micro Alignment

Graphia, a new LLM‑based social simulation framework, leverages social graph data as high‑quality supervision to jointly align microscopic interaction predictions and macroscopic network structures, achieving significant gains on TDGG and IDGG benchmarks across three real‑world datasets.

Alimama Tech

1. Background

Large language models (LLMs) have shown promise for simulating human‑like social behavior, yet existing approaches either model only graph topology [2,3] or rely on qualitative case studies [4], leaving a gap between microscopic interactions and macroscopic network structure.

The authors identify three challenges: (1) traditional deep‑learning graph generators cannot capture text‑driven social activity; (2) no unified training framework uses social‑graph data as supervision to optimize both the micro and macro levels; (3) no quantitative metrics exist for evaluating alignment at both levels.

2. Method: Graphia Framework

2.1 Problem Formalization

The task is to model dynamic text‑attributed social graphs, represented as a sequence of timestamped sub‑graphs each containing nodes (users), edges (interactions), node attributes (text profiles) and edge attributes (messages and types). Given a historical window τ, the goal is to generate the future graph sequence.
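As a rough illustration of this formalization, the sketch below stores a dynamic text‑attributed graph as node profiles plus timestamped edges. The class names, field names, integer time steps, and the half‑open window convention are assumptions made for the example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edge:
    src: int          # source user id
    dst: int          # target user id
    timestamp: int    # discrete time step of the interaction
    category: str     # edge type, e.g. "comment" (illustrative label)
    message: str      # text carried by the interaction


@dataclass
class DynamicTextGraph:
    profiles: Dict[int, str] = field(default_factory=dict)  # node id -> text profile
    edges: List[Edge] = field(default_factory=list)

    def snapshot(self, t: int) -> List[Edge]:
        """Edges of the sub-graph stamped with time step t."""
        return [e for e in self.edges if e.timestamp == t]

    def history(self, t: int, tau: int) -> List[Edge]:
        """Edges inside the historical window [t - tau, t)."""
        return [e for e in self.edges if t - tau <= e.timestamp < t]


graph = DynamicTextGraph(profiles={0: "tech blogger", 1: "student", 2: "gamer"})
graph.edges += [
    Edge(0, 1, 1, "comment", "Nice write-up!"),
    Edge(1, 2, 2, "repost", "Sharing this."),
    Edge(0, 2, 3, "comment", "Agreed."),
]
window = graph.history(t=3, tau=2)  # the two edges stamped 1 and 2
```

Under this layout, generating the future graph sequence amounts to producing the edges of `snapshot(t)` for time steps beyond the window.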

Following the GDGB benchmark, the problem is split into two settings:

TDGG (Transductive Dynamic Graph Generation): source nodes are known; evaluate microscopic interaction alignment.

IDGG (Inductive Dynamic Graph Generation): source nodes are unknown; evaluate macroscopic structural alignment.

2.2 Graphia Learning Framework

Graphia treats the social graph as a high‑quality supervision signal for post‑training LLMs. The framework consists of three modules:

Activity Prediction: an Informer‑based predictor estimates future out‑degree for each source node, providing a structural prior for the IDGG task.

Interaction Policy Learning: two specialized LLM agents are trained via reinforcement learning with graph‑neural‑network (GNN)‑derived rewards.

Graphia‑Q (Target‑Node Selection): generates descriptive queries to retrieve candidate target nodes; rewards combine format quality and retrieval accuracy; optimized with GRPO.

Graphia‑E (Edge Generation): generates an interaction message and category for each node pair. The category reward follows a curriculum that moves from soft GNN guidance to exact match. The message reward uses an LLM‑as‑a‑judge to score six dimensions: goal achievement, contextual fidelity, persona depth, dynamic adaptation, immersion, and content richness. Training proceeds from supervised fine‑tuning (SFT) to GRPO optimization.
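The reward design described above can be sketched roughly as follows. The article gives no formulas, so the weight `w_format`, the use of recall as the retrieval score, the `gnn_probs` input, and the linear curriculum blend are all hypothetical choices for illustration. The final function shows the group‑relative normalization that GRPO applies to the rewards of responses sampled for the same prompt.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def query_reward(format_ok: bool, retrieved: List[int], true_targets: Set[int],
                 w_format: float = 0.2) -> float:
    """Graphia-Q style reward: format quality plus retrieval accuracy.

    Assumption: retrieval accuracy is the recall of the true targets among
    the retrieved candidates; w_format is a hypothetical weight.
    """
    hits = len(true_targets.intersection(retrieved))
    recall = hits / len(true_targets) if true_targets else 0.0
    return w_format * float(format_ok) + (1.0 - w_format) * recall


def category_reward(pred: str, gold: str, gnn_probs: Dict[str, float],
                    progress: float) -> float:
    """Curriculum from soft GNN guidance (progress=0) to exact match (progress=1).

    Assumption: gnn_probs is a GNN-produced distribution over edge
    categories, and the two signals are blended linearly.
    """
    soft = gnn_probs.get(pred, 0.0)
    exact = 1.0 if pred == gold else 0.0
    return (1.0 - progress) * soft + progress * exact


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled queries for the same source node, scored and then normalized.
group = [
    query_reward(True, [3, 5, 7], {3, 5}),
    query_reward(True, [3, 9, 7], {3, 5}),
    query_reward(False, [3, 5, 7], {3, 5}),
    query_reward(True, [8, 9, 7], {3, 5}),
]
advantages = group_relative_advantages(group)
```

Because advantages are computed relative to the group mean, a query is reinforced only when it retrieves better than its sibling samples, which removes the need for a separate value model.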

2.3 Graph Generation Pipelines

Different pipelines are designed for TDGG and IDGG:

TDGG Pipeline: (1) Graphia‑Q generates a query for each source node u; (2) the query retrieves a set of target nodes; (3) Graphia‑E generates interaction messages for each target.

IDGG Pipeline: (1) the activity predictor forecasts out‑degree for all nodes; (2) nodes with out‑degree > 0 are selected as active sources; (3) Graphia‑Q and Graphia‑E are applied to these sources; (4) the resulting edges are assembled into the future graph sequence.
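The IDGG steps listed above can be laid out as a short control‑flow sketch. Every component is passed in as a plain callable, and the toy lambdas at the bottom are stand‑ins, not the real predictor or agents. Using the predicted out‑degree as the number of targets to retrieve is an assumption; the article only says the prediction serves as a structural prior.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int, str, str]  # (source, target, category, message)


def idgg_step(
    nodes: List[int],
    predict_out_degree: Callable[[int], int],
    make_query: Callable[[int], str],
    retrieve: Callable[[str, int], List[int]],
    make_edge: Callable[[int, int], Tuple[str, str]],
) -> List[Edge]:
    """One generation step following the four IDGG stages."""
    edges: List[Edge] = []
    for u in nodes:
        k = predict_out_degree(u)        # (1) activity forecast
        if k <= 0:                       # (2) keep only active sources
            continue
        query = make_query(u)            # (3) Graphia-Q role: describe wanted targets
        for v in retrieve(query, k):     #     retrieve up to k candidate targets
            category, message = make_edge(u, v)  # Graphia-E role
            edges.append((u, v, category, message))
    return edges                         # (4) edges of the next sub-graph


# Toy stand-ins so the sketch runs end to end.
degrees: Dict[int, int] = {0: 2, 1: 0, 2: 1}
edges = idgg_step(
    nodes=[0, 1, 2],
    predict_out_degree=lambda u: degrees[u],
    make_query=lambda u: f"users relevant to node {u}",
    retrieve=lambda q, k: [10, 11, 12][:k],
    make_edge=lambda u, v: ("comment", f"{u} -> {v}"),
)
```

The TDGG pipeline corresponds to the same loop with stages (1) and (2) dropped, since the source nodes are given.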

3. Experimental Results

Graphia is evaluated on three real‑world social networks: Propagate‑En (Taobao e‑commerce), Weibo Tech, and Weibo Daily (all from the GDGB benchmark [5]).

3.1 Microscopic Alignment (TDGG)

Target‑Node Selection: Graphia achieves an aggregated selection score of 0.848, surpassing the best baseline, Qwen3‑32B, by 6.1% and matching larger models despite using an 8B‑parameter backbone.

Edge Generation: Under LLM‑as‑a‑judge evaluation, Graphia leads on all six dimensions, improving the average score by 0.77 points (+28%). On automatic metrics, category‑prediction accuracy rises by 12% and BERTScore by 27.9%.

3.2 Macroscopic Alignment (IDGG)

Structural Reproduction: Graphia attains the lowest maximum mean discrepancy (MMD) scores for degree distribution, clustering coefficient, and spectral properties across all datasets, indicating high structural similarity to the real graphs. It also outperforms baselines on edge overlap (EO), where the deep‑learning models score close to zero.
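To make the MMD comparison concrete, here is a minimal sketch that computes a squared MMD between two sets of degree histograms. The Gaussian kernel, its bandwidth, the histogram binning, and the toy degree lists are assumptions; the article does not say which kernel or estimator the benchmark uses.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: List[Sequence[float]], ys: List[Sequence[float]],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sample sets."""
    def avg_kernel(a, b):
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))
    return avg_kernel(xs, xs) + avg_kernel(ys, ys) - 2.0 * avg_kernel(xs, ys)


def degree_histogram(degrees: List[int], max_degree: int) -> List[float]:
    """Normalized histogram over degrees 0..max_degree (larger values are clipped)."""
    counts = [0] * (max_degree + 1)
    for d in degrees:
        counts[min(d, max_degree)] += 1
    return [c / len(degrees) for c in counts]


# Each list holds the degree histograms of a few graph snapshots (toy data).
real = [degree_histogram(d, 4) for d in ([1, 1, 2, 4], [1, 2, 2, 3])]
close = [degree_histogram(d, 4) for d in ([1, 1, 2, 3], [1, 2, 2, 4])]
far = [degree_histogram(d, 4) for d in ([4, 4, 4, 4], [3, 4, 4, 4])]

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

A lower value means the generated snapshots have degree statistics closer to the real ones, so in this toy data `mmd_close` comes out smaller than `mmd_far`.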

Social Phenomena Reproduction: Three quantitative metrics are introduced: KOL identification precision (P@100‑KOL), echo‑chamber alignment (ΔC), and power‑law exponent gap (Δα). Graphia achieves the best or second‑best result on these metrics, improving macro‑phenomena scores by 27.65% over the strongest baseline.
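Two of these metrics can be illustrated with a short sketch. The article does not define them precisely, so the choices below are assumptions: KOLs are taken to be the highest‑degree nodes, P@k is the overlap between the real and generated top‑k sets, and Δα is the absolute difference between exponents fitted with the standard continuous maximum‑likelihood estimator. The echo‑chamber metric ΔC is left out because its definition is not given here.

```python
import math
from typing import Dict, List


def power_law_alpha(degrees: List[int], d_min: int = 1) -> float:
    """Continuous maximum-likelihood estimate of a power-law exponent.

    alpha = 1 + n / sum(ln(d / d_min)) over the n degrees with d >= d_min.
    """
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one degree above d_min")
    return 1.0 + len(tail) / log_sum


def precision_at_k(real_degree: Dict[str, int], generated_degree: Dict[str, int],
                   k: int) -> float:
    """Share of the generated top-k nodes that are also real top-k nodes."""
    def top_k(scores: Dict[str, int]) -> set:
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top_k(real_degree) & top_k(generated_degree)) / k


# Toy degree tables for the real and generated graphs.
real_degree = {"a": 50, "b": 30, "c": 5, "d": 2}
generated_degree = {"a": 40, "c": 25, "b": 3, "d": 1}
p_at_2 = precision_at_k(real_degree, generated_degree, k=2)  # top sets {a, b} vs {a, c}

alpha_real = power_law_alpha([1, 1, 1, 2, 2, 3, 5, 8])
alpha_generated = power_law_alpha([1, 1, 2, 2, 3, 3, 4, 4])
delta_alpha = abs(alpha_real - alpha_generated)
```

With the paper's setting, `k` would be 100 and the degree tables would come from the real and generated graph sequences.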

4. Conclusion and Outlook

The contributions are twofold: (1) Graphia is the first unified training framework that uses social‑graph data as supervision to enhance LLM‑based social simulation, aligning both who‑to‑interact and how‑to‑interact; (2) a unified micro‑macro evaluation paradigm (TDGG/IDGG with quantitative metrics) demonstrates substantial superiority over existing methods.

Future work includes (a) causal mechanism analysis to explain agent behavior, and (b) designing higher‑order structural rewards (e.g., community cohesion, triadic closure) to improve generalization across diverse graph topologies.
