GraphGPT: Enabling Large Language Models as Zero‑Shot Graph Learners
GraphGPT integrates large language models with graph neural networks by introducing graph tokens and instruction tuning. The framework enables zero-shot graph learning for tasks such as node classification and link prediction, and demonstrates strong performance and generalization across both supervised and zero-shot benchmarks.
In this talk, Ph.D. student Tang Jiabin from the University of Hong Kong introduces GraphGPT, a framework that equips large language models (LLMs) with the ability to directly process graph-structured data and perform downstream graph tasks in a zero‑shot manner.
Graph data, composed of nodes and edges, underlies many applications such as recommendation systems, social networks, and drug discovery. While graph neural networks (GNNs) have become powerful tools for modeling such data, the heterogeneous semantics of different graphs make it difficult to design a single model that generalizes across them. In contrast, LLMs excel at learning from massive textual corpora and exhibit strong transfer capabilities.
GraphGPT addresses three core challenges: (1) how to feed graph structures into an LLM (natural‑language versus other formats), (2) how to align the LLM’s understanding with graph representations, and (3) how to enable step‑by‑step reasoning for complex graph tasks.
The solution consists of three parts. First, graphs are encoded into a sequence of graph tokens using a pretrained graph encoder (any GNN such as GCN or Graph Transformer). These tokens are projected into the same space as natural‑language tokens and concatenated before being fed to the LLM, achieving effective Text‑Graph grounding.
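The projection step above is essentially a learned linear map from the graph encoder's output space into the LLM's token-embedding space, followed by concatenation with the text tokens. Below is a minimal sketch of that idea; the dimensions, names, and random data are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical dimensions: 128-d graph-encoder output, 4096-d LLM token space.
D_GRAPH, D_LLM = 128, 4096

rng = np.random.default_rng(0)
# The projector: a single learned linear map (here just a random matrix
# standing in for trained weights).
W_proj = rng.normal(scale=0.02, size=(D_GRAPH, D_LLM))

def project_graph_tokens(node_embeddings: np.ndarray) -> np.ndarray:
    """Map frozen graph-encoder node embeddings into the LLM token space."""
    return node_embeddings @ W_proj

# A toy subgraph with 5 nodes, already encoded by a (frozen) GNN.
graph_tokens = project_graph_tokens(rng.normal(size=(5, D_GRAPH)))

# Text-side embeddings for the instruction prompt (10 tokens here).
text_tokens = rng.normal(size=(10, D_LLM))

# Concatenate graph tokens with text tokens to form the LLM input sequence.
llm_input = np.concatenate([graph_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (15, 4096)
```

In the actual framework the projector is trained end-to-end while the encoder and LLM stay frozen; this sketch only shows the shape bookkeeping of the text-graph grounding step.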
Second, a two‑stage instruction‑tuning paradigm is proposed. In the self‑supervised stage, a graph‑matching task aligns each graph token with its corresponding node description, using human‑question prompts that contain a shuffled list of node texts. This stage trains only the projector while keeping the LLM and graph encoder frozen, improving zero‑shot transfer. In the second stage, task‑specific instructions (e.g., node classification, link prediction) fine‑tune the model to generate appropriate answers for each graph learning task.
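The stage-1 graph-matching task can be pictured as a simple prompt-construction routine: the model sees the graph tokens plus a shuffled list of node descriptions and must match each token to its text. The sketch below uses made-up node texts and a hypothetical prompt format to illustrate the idea; it is not the paper's exact template.

```python
import random

# Toy node descriptions (hypothetical paper titles) keyed by node id.
node_texts = {
    0: "Attention Is All You Need",
    1: "Semi-Supervised Classification with Graph Convolutional Networks",
    2: "Language Models are Few-Shot Learners",
}

def build_graph_matching_instruction(node_texts, seed=0):
    """Build a stage-1 training instance: a prompt with graph tokens and a
    shuffled list of node descriptions, plus the ground-truth matching."""
    ids = list(node_texts)
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    listing = "\n".join(
        f"{i + 1}. {node_texts[n]}" for i, n in enumerate(shuffled)
    )
    prompt = (
        "Given a sequence of graph tokens <graph> and the shuffled node "
        "descriptions below, match each graph token to its description:\n"
        + listing
    )
    # Ground truth: for each node id (in graph-token order), its position
    # in the shuffled listing.
    answer = {n: shuffled.index(n) + 1 for n in ids}
    return prompt, answer

prompt, answer = build_graph_matching_instruction(node_texts)
```

Supervising the projector on this matching objective (with the LLM and graph encoder frozen) is what forces graph tokens and text tokens into an aligned space before any task-specific tuning.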
Third, chain‑of‑thought (CoT) reasoning is distilled from a closed‑source GPT‑3.5 model, enabling GraphGPT to perform multi‑step inference without increasing model size. For node classification on citation graphs, the model receives node abstracts, titles, and task descriptions, then generates step‑wise reasoning to produce accurate predictions.
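One way to picture the distillation step is as packaging each node into a training instance whose target is the teacher model's step-wise rationale plus the final label. The helper below is a hypothetical sketch of that packaging; the field names, prompt wording, and example data are assumptions for illustration.

```python
def build_cot_instance(title, abstract, classes, teacher_rationale, label):
    """Package one CoT-distillation example: the prompt carries the paper's
    title, abstract, and candidate classes; the target is the teacher's
    step-wise rationale followed by the final answer."""
    prompt = (
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        f"Which of the following categories does this paper belong to: "
        f"{', '.join(classes)}? Think step by step."
    )
    target = f"{teacher_rationale}\nTherefore, the category is: {label}."
    return {"prompt": prompt, "target": target}

# Toy example with invented content (not from the Arxiv dataset).
example = build_cot_instance(
    title="Graph Attention Networks",
    abstract="We present an attention-based architecture for graphs...",
    classes=["cs.LG", "cs.CL", "cs.CV"],
    teacher_rationale=(
        "Step 1: The abstract describes a neural architecture for graphs. "
        "Step 2: This is a machine-learning methods contribution."
    ),
    label="cs.LG",
)
```

Fine-tuning on such instances is how the student model picks up multi-step reasoning behavior from the closed-source teacher without any increase in its own parameter count.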
Extensive experiments show that GraphGPT consistently outperforms state-of-the-art baselines in both supervised and zero-shot settings, achieving 2-10× accuracy gains in zero-shot scenarios. Ablation studies confirm the importance of the self-supervised graph-matching stage and the benefits of mixing diverse instruction data. Efficiency analyses reveal that freezing the LLM and graph encoder while fine-tuning only the projector reduces the trainable parameter count by over 50× and alleviates GPU memory constraints.
Case studies on the Arxiv dataset demonstrate that GraphGPT can leverage subgraph structures to provide accurate predictions and sensible explanations, whereas models relying solely on textual node information struggle with interdisciplinary papers.
Finally, the authors discuss future directions, emphasizing the need for universal graph foundation models that combine LLM language understanding with graph structural reasoning, and they invite the community to experiment with their open‑source repository.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.