Privacy-Preserving Machine Learning for AI and Big Data Using Intel SGX, Occlum, and BigDL PPML
This article presents an end‑to‑end privacy‑preserving machine‑learning solution for AI and big‑data workloads built on Intel SGX, the open‑source TEE operating system Occlum, and Intel's open‑source BigDL PPML, detailing its architecture, key features, deployment steps, and real‑world applications. Protecting data security and privacy in AI and big‑data applications is a pressing challenge, and this stack addresses it without requiring applications to be rewritten.
The solution, now integrated into Occlum 1.0, enables secure distributed analytics (e.g., Spark) and AI workloads, and is demonstrated through a detailed GBDT deployment example and its use on Ant Group’s MOSS privacy‑computing platform.
Background: Digital transformation accelerates data flow and creates risks in multi‑party data storage, transfer, and processing. This is especially true in AI and big‑data scenarios, where no single organization holds all the data a task requires, so organizations must collaborate on data sharing while safeguarding privacy.
Solution Overview: Ant Group and Intel co‑developed an end‑to‑end privacy‑preserving ML stack. Occlum is a memory‑safe, multi‑process, user‑mode library OS (LibOS) for TEEs, including Intel® SGX. It runs existing workloads inside SGX with minimal code changes, offering high‑performance multi‑tasking, support for multiple file systems, Rust‑based memory safety, and compatibility with musl libc, glibc, and many languages (C/C++, Java, Python, Go, Rust).
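As an illustrative sketch of how little changes for the application, the typical Occlum workflow for wrapping an existing Linux binary looks roughly like the following (the binary name and paths here are hypothetical; consult the Occlum documentation for the exact steps for your build):

```shell
# Create a new Occlum instance directory and enter it
occlum new occlum_instance && cd occlum_instance

# Copy the unmodified (musl-libc or glibc) application into the instance image
cp ../hello_world image/bin/

# Build the trusted enclave image from Occlum.json and the image/ directory
occlum build

# Run the application inside the SGX enclave
occlum run /bin/hello_world
```

The application itself is not recompiled against an SGX SDK; the LibOS supplies the system-call layer inside the enclave.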
BigDL PPML builds a distributed privacy‑preserving ML platform on top of Occlum, protecting data at every stage (input, analysis, model training, inference). It supports secure parameter aggregation, private set intersection, and federated learning, allowing existing big‑data applications to migrate seamlessly to a secure environment.
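To make the private‑set‑intersection idea concrete, here is a deliberately simplified salted‑hash sketch in Python. This is not BigDL PPML's actual PSI protocol (which uses proper cryptographic constructions) and a plain salted hash is not secure for low‑entropy identifiers; it only illustrates the goal: two parties learn their common records without exchanging raw values.

```python
import hashlib


def blind(items, salt):
    """Hash each item with a shared salt so raw values are never exchanged."""
    return {hashlib.sha256(salt + item.encode()).hexdigest(): item for item in items}


def private_set_intersection(party_a, party_b, salt=b"shared-secret-salt"):
    """Toy salted-hash PSI: each party reveals only digests; matching
    digests identify the intersection, which each party recovers locally."""
    blinded_a = blind(party_a, salt)
    blinded_b = blind(party_b, salt)
    common = blinded_a.keys() & blinded_b.keys()
    return {blinded_a[h] for h in common}


# Two organizations find shared user IDs without exchanging their full lists.
print(sorted(private_set_intersection({"u1", "u2", "u3"}, {"u2", "u3", "u4"})))
# → ['u2', 'u3']
```

In the real stack, this exchange additionally runs inside SGX enclaves, so even the blinded inputs are protected during processing.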
Using Apache Spark as an example, BigDL PPML and Occlum run Spark inside SGX without code modifications, leveraging the large SGX EPC on 3rd Gen Intel Xeon Scalable platforms to protect in‑memory computation and provide remote attestation transparently.
Deployment Process: The article outlines environment setup (Kubernetes SGX plugin, attestation service, HDFS with KMS encryption, Docker images with Occlum‑enabled BigDL PPML) and application‑side steps (app registration, submission, result retrieval). Detailed GBDT training on Spark demonstrates configuration choices (e.g., 30 executors per node, 5 cores, 25 GB Occlum memory) and performance trade‑offs when scaling EPC resources.
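A submission for a job like the GBDT example might be sketched as below. The executor counts and core settings follow the figures in the article; the container image name, application class, and the Occlum memory property are illustrative placeholders, not the exact keys from the BigDL PPML deployment guide:

```shell
# Illustrative spark-submit for GBDT training on Kubernetes with SGX.
# <bigdl-ppml-occlum-image>, com.example.GBDTTraining, and the
# occlum.instance.memory key are hypothetical placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name gbdt-ppml \
  --num-executors 30 \
  --executor-cores 5 \
  --conf spark.kubernetes.container.image=<bigdl-ppml-occlum-image> \
  --conf occlum.instance.memory=25GB \
  --class com.example.GBDTTraining \
  local:///ppml/app/gbdt-training.jar
```

The 25 GB figure corresponds to the per‑executor Occlum memory discussed in the article; sizing it against available EPC is the main performance trade‑off when scaling.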
The solution is also integrated into Ant Group’s MOSS privacy‑computing platform and the FAIR blockchain‑based data‑trust collaboration platform, enabling secure multi‑party data analytics across industries such as finance, telecom, and retail.
Future innovations include lightweight coroutine scheduling and Rust async runtime within Occlum, and the adoption of Linux io_uring for asynchronous I/O, both reducing enclave‑host transitions and improving performance.
BigDL PPML also provides an extensible end‑to‑end encryption framework that abstracts key‑management APIs, allowing Spark applications to read encrypted data via PPMLContext, automatically obtain decryption keys, process data, and write encrypted results without modifying application logic.
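The read‑encrypted → fetch‑key → process → write‑encrypted pattern can be sketched in plain Python. This is not the real PPMLContext API, and the XOR keystream below is a toy cipher for illustration only (never use it for real data); the point is that application logic touches only plaintext while key handling and encryption stay inside the context:

```python
import hashlib
from itertools import count


class ToyKMS:
    """Stand-in for a key-management service: maps key IDs to data keys."""
    def __init__(self):
        self._keys = {"data-key-1": b"not-a-real-secret"}

    def get_data_key(self, key_id):
        return self._keys[key_id]


def _keystream(key, n):
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out = b""
    for i in count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()


def xor_crypt(data, key):
    """Toy symmetric cipher: XOR with a hash-derived keystream.
    Encryption and decryption are the same operation."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class ToyPPMLContext:
    """Sketch of the pattern: the context fetches the data key from the KMS,
    decrypts input transparently, and re-encrypts results on write."""
    def __init__(self, kms, key_id):
        self._key = kms.get_data_key(key_id)

    def read(self, ciphertext):
        return xor_crypt(ciphertext, self._key).decode()

    def write(self, plaintext):
        return xor_crypt(plaintext.encode(), self._key)


kms = ToyKMS()
ctx = ToyPPMLContext(kms, "data-key-1")
encrypted_input = ctx.write("1.0,2.0,3.0")       # pretend this arrived from HDFS
values = [float(v) for v in ctx.read(encrypted_input).split(",")]
encrypted_output = ctx.write(str(sum(values)))   # application logic never sees keys
print(ctx.read(encrypted_output))
# → 6.0
```

In the real stack, the KMS call, key material, and cipher operations all live inside the SGX enclave, so keys and plaintext are never exposed to the host.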
Conclusion: With increasing regulations on data security and privacy, the combined Intel SGX, Occlum, and BigDL PPML stack offers an effective, open‑source solution for secure AI and big‑data workloads, fostering ecosystem collaboration and enabling broader adoption of privacy‑preserving machine learning.
AntTech
Technology is the core driving force behind Ant's creation of the future.