Privacy-Preserving Gradient Boosting Decision Trees via Multi-Party Computation and the Squirrel Framework
This article introduces a privacy-preserving gradient boosting decision tree (GBDT) solution built on multi-party computation. It covers the background, the training steps, the MPC tools used, and the Squirrel framework's workflow, then discusses performance challenges and experimental results demonstrating scalability to millions of samples.
Background: The work addresses the need to train decision tree models over vertically partitioned data, where banks and financial companies hold different features for the same users and must combine them without exposing raw features, so as to support loan-approval prediction while preserving user privacy.
Gradient Boosting Decision Trees (GBDT): GBDT is an ensemble learning method that iteratively adds weak decision trees to reduce prediction error. Training initializes the model with a constant, computes residuals (the negative gradients of the loss), fits a weak learner to those residuals, adds it to the ensemble with a learning rate, and repeats until convergence.
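The training loop above can be sketched in plaintext (no MPC) using depth-1 stumps as the weak learners and squared loss, where the negative gradient is simply the residual. The helper names (`fit_stump`, `gbdt_fit`, `gbdt_predict`) and all parameter choices are illustrative, not Squirrel's implementation:

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree minimizing squared error on the residuals."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lval, rval = best
    return lambda q: np.where(q <= thr, lval, rval)

def gbdt_fit(x, y, n_trees=50, lr=0.1):
    base = y.mean()                      # step 1: initialize with a constant model
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # step 2: residuals = negative gradient of squared loss
        tree = fit_stump(x, residual)    # step 3: fit a weak learner to the residuals
        trees.append(tree)
        pred += lr * tree(x)             # step 4: update the ensemble; repeat
    return base, trees

def gbdt_predict(x, base, trees, lr=0.1):
    return base + lr * sum(t(x) for t in trees)
```

In the two-party setting these same steps run over secret-shared gradients rather than plaintext residuals, which is where the MPC machinery below comes in.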
Secure Multi-Party Computation (MPC): To protect privacy, the solution employs semi-honest two-party computation: additive secret sharing handles the non-linear operations, while linearly homomorphic encryption handles the linear ones. This hybrid approach combines the strengths of both, favoring lattice-based schemes (e.g., BFV) over Paillier for efficiency and for compatibility with the secret-sharing modulus.
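The additive-secret-sharing half of the hybrid can be shown in a few lines. A minimal sketch, assuming a power-of-two ring `Z_{2^64}` (a common but here illustrative choice): each share alone is uniformly random, and linear operations cost nothing because each party just adds its local shares.

```python
import secrets

MOD = 2 ** 64  # shares live in the ring Z_{2^64}; modulus choice is illustrative

def share(x):
    """Split x into two additive shares; either share alone is uniformly random."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

# Addition is local: each party adds the shares it holds,
# and the resulting shares reconstruct to the sum of the secrets.
a0, a1 = share(42)
b0, b1 = share(100)
c0, c1 = (a0 + b0) % MOD, (a1 + b1) % MOD
assert reconstruct(c0, c1) == 142
```

A power-of-two share modulus is one reason a lattice-based scheme such as BFV pairs well with secret sharing, whereas Paillier's plaintext space is tied to an RSA-type modulus.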
Squirrel Framework Workflow: The framework proceeds through five steps: (1) Private Set Intersection (PSI) aligns the two parties' samples; (2) features are bucketized and encoded; (3) gradients are computed in secret-shared form across the parties; (4) encrypted vector multiplication accumulates the gradients; (5) the sigmoid is approximated efficiently with a Fourier-series-based method that requires only a single round of multiplication.
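The source does not give the paper's actual sigmoid coefficients, but the idea behind step (5) can be illustrated in the clear: `sigmoid(x) - 1/2` is an odd function, so a truncated Fourier series on a bounded interval needs only sine terms. The interval half-width `B` and term count `K` below are arbitrary choices, not the paper's parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# sigmoid(x) - 1/2 is odd, so its Fourier series on [-B, B] has only sines.
B, K = 8.0, 16                      # illustrative interval and truncation order
xs = np.linspace(-B, B, 8001)
dx = xs[1] - xs[0]
# b_k = (1/B) * integral over [-B, B] of (sigmoid(x) - 1/2) * sin(k*pi*x/B) dx,
# computed here by a simple Riemann sum.
coeffs = [np.sum((sigmoid(xs) - 0.5) * np.sin(k * np.pi * xs / B)) * dx / B
          for k in range(1, K + 1)]

def sigmoid_fourier(x):
    """Evaluate the truncated sine-series approximation of sigmoid."""
    return 0.5 + sum(b * np.sin(k * np.pi * x / B)
                     for k, b in enumerate(coeffs, start=1))
```

The appeal inside MPC, as the summary describes it, is that this form can be evaluated with a single round of multiplication, unlike iterative or high-degree polynomial approximations.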
Performance Challenges: While MPC enables privacy, it incurs high computational cost and communication overhead, especially on large datasets. The proposed method mitigates this by converting RLWE ciphertexts to LWE ciphertexts for bucket summation and then recombining them to reduce communication, and by encrypting vectors in polynomial coefficients so that vector operations become polynomial multiplications.
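The coefficient-encoding idea behind the vector operations can be demonstrated in plaintext: packing a vector into polynomial coefficients lets one polynomial multiplication compute an inner product, which is why a single RLWE ciphertext product can accumulate a whole gradient vector into a bucket. The vectors below are arbitrary examples:

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5])       # e.g. one party's gradient values
b = np.array([1, 0, 1, 1, 0])       # e.g. a 0/1 bucket-membership indicator
n = len(a)

# Multiplying a(x) = sum_i a_i x^i by the reversed polynomial of b places
# the inner product <a, b> in the coefficient of x^(n-1);
# np.convolve is exactly polynomial multiplication of coefficient arrays.
prod = np.convolve(a, b[::-1])
assert prod[n - 1] == np.dot(a, b)  # 3 + 4 + 1 = 8
```

Done under RLWE encryption, the same trick means one homomorphic multiplication replaces `n` homomorphic scalar operations, and the RLWE-to-LWE conversion then extracts just the needed coefficient instead of shipping the whole ciphertext.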
Experimental Results: Experiments demonstrate that the approach can train secure two-party GBDT models on datasets ranging from hundreds of thousands to tens of millions of samples within minutes per tree, with accuracy comparable to plaintext XGBoost and LightGBM while retaining the interpretability of tree models.
Conclusion: By integrating RLWE↔LWE conversion, indicator vectors for split tracking, and a Fourier-based sigmoid approximation, the system offers a scalable, privacy-preserving GBDT solution. The work has been accepted at USENIX Security 2023.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.