
Privacy-Preserving Vertical Federated Graph Neural Network for Node Classification

This article presents VFGNN, a privacy‑preserving vertical federated graph neural network designed for node classification, detailing its architecture, differential‑privacy enhancements, and experimental results that demonstrate superior accuracy over single‑party baselines across multiple graph datasets.

AntTech

At IJCAI‑ECAI 2022 (the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence), a paper co‑authored by Ant Group, "Privacy‑Preserving Vertical Federated Graph Neural Network for Node Classification", was accepted.

IJCAI 2022 received over 4,500 submissions, with an acceptance rate of only about 15%. The 隐语 (SecretFlow) team, together with Zhejiang University and other collaborators, proposed the VFGNN model for vertically partitioned data scenarios. It enables privacy‑preserving node classification, extends to other GNN models, and addresses the data‑island problem in real‑world business.

Vertical data partitioning commonly occurs in cross‑industry or cross‑service collaborations where different parties hold disjoint feature sets of the same entities. Enabling secure data flow across such parties is crucial for the digital economy.

Graph Neural Networks (GNNs) are deep‑learning methods that operate directly on graph structures and excel at non‑Euclidean, graph‑structured data such as social networks, traffic networks, knowledge graphs, and complex file systems. Adding differential privacy further strengthens privacy protection and broadens the range of data that can be safely used.

Abstract

GNN models achieve excellent performance on many tasks thanks to rich node features and edge information, but in practice these data often belong to different owners, creating data‑island problems under privacy constraints. This paper proposes VFGNN, which completes node classification in vertically partitioned scenarios (different feature spaces, same sample space) while protecting data privacy; the algorithm also generalizes to other GNNs. VFGNN splits the computation graph into three parts: privacy‑related feature computations remain with the data owners, loss‑function‑related computations are delegated to a semi‑honest server, and differential privacy is applied to the information exchanged between owners and server to further limit leakage.

1. Problem

In a vertically partitioned setting, assume three data owners A, B, and C each hold the same four nodes. Features are vertically split: A has three feature dimensions (f1, f2, f3), B has two (f4, f5), and C has two (f6, f7). Moreover, each owner has a different edge set. Only A possesses node labels. The challenge is to build a federated GNN model that leverages the data of A, B, and C.
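The setup can be made concrete with a toy numpy partition. All numbers, edge lists, and labels below are illustrative placeholders, not data from the paper:

```python
import numpy as np

# Toy vertical partition: four shared nodes, seven total feature dimensions
# split column-wise across owners A, B, and C.
rng = np.random.default_rng(0)
full_features = rng.normal(size=(4, 7))   # 4 nodes, 7 feature dims in total

X_A = full_features[:, 0:3]  # A holds f1, f2, f3
X_B = full_features[:, 3:5]  # B holds f4, f5
X_C = full_features[:, 5:7]  # C holds f6, f7

# Each owner also holds its own (possibly different) edge set over the same nodes.
edges_A = [(0, 1), (1, 2), (2, 3)]
edges_B = [(0, 2), (1, 3)]
edges_C = [(0, 3), (2, 3)]

# Only A possesses the node labels.
labels_A = np.array([0, 1, 0, 1])
```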

2. Method (VFGNN Model)

Figure 2 illustrates that VFGNN’s computation is divided into three parts:

Privacy‑related feature computation : Using multi‑party computation (MPC), the three owners jointly compute initial node embeddings from private features (Step 1). Each owner then aggregates multi‑hop neighbor embeddings to obtain local embeddings (Step 2).

Non‑privacy computation : Performed on the server. The server fuses the local embeddings (e.g., by concatenation, element‑wise averaging, or regression‑based combination) into a global embedding (Step 3) and runs the subsequent layers to produce the server‑side output.

Privacy‑related label computation : The label‑holding owner receives the server output, applies a Softmax for classification, computes loss, and back‑propagates gradients.
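The three-part split above can be sketched end to end in plain numpy. The shapes, the ReLU layers, and the mean fusion are simplifying assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_embedding(X, W):
    """Owner side (Steps 1-2): initial embedding plus neighbor aggregation,
    collapsed here into a single linear map with ReLU for brevity."""
    return np.maximum(X @ W, 0.0)

def server_forward(local_embs, W_server):
    """Server side (Step 3): fuse local embeddings (mean fusion here) and
    apply a further dense layer; the server never sees raw features."""
    fused = np.mean(np.stack(local_embs), axis=0)
    return fused @ W_server

def label_owner_loss(logits, labels):
    """Label holder: softmax + cross-entropy on the server output."""
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Hypothetical shapes: 4 nodes; owners hold 3/2/2 features; embedding dim 8; 2 classes.
Xs = [rng.normal(size=(4, d)) for d in (3, 2, 2)]
Ws = [rng.normal(size=(d, 8)) for d in (3, 2, 2)]
W_srv = rng.normal(size=(8, 2))

embs = [local_embedding(X, W) for X, W in zip(Xs, Ws)]
logits = server_forward(embs, W_srv)
loss = label_owner_loss(logits, np.array([0, 1, 0, 1]))
```

Gradients from the loss would then flow back through the server to the owners, which is where the differential‑privacy protection described later applies.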

3. Core Computation Steps

Figure 3 shows the three core steps:

Because features are vertically split, two strategies exist for generating initial embeddings:

Independent computation : Each owner computes its own initial embedding using only its local features and weight matrix.

Joint computation : Owners jointly generate a unified initial embedding via additive secret sharing (cryptographic MPC).
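The joint strategy's additive secret sharing can be illustrated with hypothetical `share`/`reconstruct` helpers. Real MPC protocols operate over a finite ring with integer arithmetic; the floating-point version below only conveys the intuition that each individual share reveals nothing while the sum recovers the secret:

```python
import numpy as np

rng = np.random.default_rng(2)

def share(secret, n_parties=3):
    """Additively secret-share a matrix: n-1 uniformly random shares plus
    one correction share so that all shares sum back to the secret."""
    shares = [rng.normal(size=secret.shape) for _ in range(n_parties - 1)]
    shares.append(secret - sum(shares))
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return sum(shares)

X = rng.normal(size=(4, 3))       # one owner's private feature block
shares = share(X)                 # distribute one share to each party
assert np.allclose(reconstruct(shares), X)
```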

Subsequently, each owner applies GraphSAGE to aggregate neighbor information and obtain a local embedding. The aggregation function (AGG) can be Mean, LSTM, or Pooling.
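A single GraphSAGE layer with Mean aggregation, as each owner might run it locally, can be sketched as follows; `sage_mean_layer` and the toy adjacency matrix are illustrative, not taken from the paper:

```python
import numpy as np

def sage_mean_layer(H, adj, W_self, W_neigh):
    """One GraphSAGE layer with Mean aggregation:
    h_v' = ReLU(h_v @ W_self + mean(h_u for u in N(v)) @ W_neigh)."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1                       # guard isolated nodes against /0
    neigh_mean = (adj @ H) / deg            # mean over each node's neighbors
    return np.maximum(H @ W_self + neigh_mean @ W_neigh, 0.0)

rng = np.random.default_rng(3)
H = rng.normal(size=(4, 5))                 # 4 nodes, 5-dim initial embeddings
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float) # one owner's local edge set
out = sage_mean_layer(H, adj, rng.normal(size=(5, 5)), rng.normal(size=(5, 5)))
```

Stacking such layers aggregates multi‑hop neighborhood information; LSTM or Pooling aggregators would replace the mean in `neigh_mean`.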

The server then fuses all local embeddings into a global embedding using one of three methods (Figure 4):

Concat – column‑wise concatenation.

Mean – element‑wise averaging.

Regression – linear regression‑style fusion.
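The three fusion options take only a few lines of numpy. The per-owner weights in the Regression variant would be learned during training; the values below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)
local_embs = [rng.normal(size=(4, 8)) for _ in range(3)]  # 3 owners, 4 nodes, dim 8

# Concat: column-wise concatenation -> shape (4, 24)
concat = np.concatenate(local_embs, axis=1)

# Mean: element-wise average -> shape (4, 8)
mean = np.mean(np.stack(local_embs), axis=0)

# Regression: weighted combination with learnable per-owner weights
# (hypothetical fixed values here) -> shape (4, 8)
alphas = np.array([0.5, 0.3, 0.2])
regression = sum(a * e for a, e in zip(alphas, local_embs))
```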

4. Privacy Enhancement

During forward propagation, owners send local embeddings to the server; during back‑propagation, the label‑holding owner sends gradients back, which could leak private information. To mitigate this, VFGNN injects differential privacy into both forward embeddings and backward gradients using Gaussian noise and a James‑Stein estimator (the noise generation follows the original paper).
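As a rough sketch of the noise-injection step, the standard (ε, δ)-DP Gaussian mechanism below shows how such noise could be calibrated; the paper's exact calibration and its James‑Stein estimator are not reproduced here:

```python
import numpy as np

def gaussian_mechanism(values, sensitivity, epsilon, delta, rng):
    """Classic (epsilon, delta)-DP Gaussian mechanism: add noise with
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return values + rng.normal(scale=sigma, size=values.shape)

rng = np.random.default_rng(5)
emb = rng.normal(size=(4, 8))               # a local embedding bound for the server
noisy = gaussian_mechanism(emb, sensitivity=1.0, epsilon=1.0, delta=1e-5, rng=rng)
```

The same mechanism would be applied to the gradients the label holder sends back; smaller ε means more noise and stronger privacy at some cost in accuracy.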

5. Experimental Results

Experiments were conducted on four graph datasets: Cora, Pubmed, Citeseer, and arXiv. Table 1 lists dataset statistics; Table 2 compares model accuracies.

VFGNN consistently outperforms single‑party GraphSAGE (A or B) regardless of the fusion method, and its accuracy is comparable to the centralized GraphSAGE A+B baseline because the initial embedding is generated from all owners’ features.

Further analysis (Table 3) shows that more balanced data splits lead to slightly lower accuracy, while increasing the number of data owners degrades performance because each owner holds less edge information (Table 4). Table 5 shows that larger privacy budgets (i.e., weaker privacy guarantees and less noise) yield higher accuracy, and that the James‑Stein estimator outperforms plain Gaussian noise.

Beyond the Paper

The research focuses on privacy‑preserving GNNs for vertically partitioned data, intersecting privacy computing and graph machine learning, with broad applications in drug discovery, financial risk control, and other domains where data are siloed across parties.

Conference Information

IJCAI (International Joint Conference on Artificial Intelligence) is a top‑tier AI conference, rated class A by CCF and A* in the CORE conference ranking. First held in 1969, it has been held annually since 2016 and is sometimes co‑organized with regional AI conferences such as ECAI.

Tags: privacy, graph neural networks, federated learning, differential privacy, vertical partitioning, node classification
Written by

AntTech

Technology is the core driver of Ant's future creation.
