
Elastic Federated Learning Solution (EFLS): Project Overview, Architecture, and Technical Implementation

This article introduces Alibaba's Elastic Federated Learning Solution (EFLS), describing its business motivations, core functionalities, system architecture, sample‑set intersection, federated training pipeline, novel algorithms, product console, and future roadmap for privacy‑preserving advertising in large‑scale sparse scenarios.

DataFunTalk

Project Background

In the mobile‑Internet era, privacy and data‑security concerns limit open data exchange among apps, creating information silos. To address this, Google proposed federated learning in 2016. Alibaba’s algorithm engineering team and the external advertising algorithm team have open‑sourced the Elastic Federated Learning Solution (EFLS) to bring federated learning into Alibaba’s advertising business.

Business Application

Business Background

Federated learning is mature in finance but still nascent in large‑scale sparse advertising scenarios. Alibaba's external advertising business (placing ads on third‑party media platforms) faces challenges such as fragmented multi‑media traffic, a lack of user‑side behavior data, and strict privacy regulations.

Media‑driven traffic requires unified ad placement across multiple platforms while providing robust ROI analysis.

External ads lack pre‑click user behavior data, preventing closed‑loop optimization.

Both parties need privacy‑preserving collaboration to improve ROI.

Application Scheme

During the online ad‑placement phase, media and e‑commerce parties jointly train a click‑through‑rate (CTR) prediction model via federated learning without sharing raw user data. Sample alignment is performed using encrypted identifiers (instance_id) and a privacy‑preserving set‑intersection protocol.

Project Architecture & Core Functions

EFLS supports both vertical and horizontal federated learning. Version 0.1 focuses on a two‑party vertical scenario.

The system consists of three main modules: EFLS‑Data (sample set intersection), EFLS‑Train (federated training), and EFLS‑Console (web UI). The algorithm package provides two effective federated learning models.

Web console built on a workflow abstraction for user, task, data, and permission management.

Sample‑set intersection implemented with Flink on Kubernetes, supporting billions of samples.

Lightweight client for quick testing without a full Flink setup.

High‑performance C++ communication layer for federated training with fine‑grained checkpointing.

Two open‑sourced vertical federated learning algorithms: horizontal aggregation and hierarchical aggregation (AutoHERI).

Technical Details

Sample Set Intersection

Intersection runs in three steps: bucketization, per‑bucket intersection via gRPC, and checksum verification. For sensitive keys, EFLS uses a blinded‑RSA private set intersection (PSI) protocol so identifiers are encrypted before transmission.
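The blinded‑RSA step can be sketched as follows. This is an illustrative toy, not EFLS's implementation: the key is tiny, the full‑domain hash is simplified, and all names are ours. The client blinds each hashed identifier with a random factor, the server signs the blinded value without learning the identifier, and the client unblinds and compares fingerprints.

```python
import hashlib
from math import gcd

# Toy RSA parameters (illustration only; real PSI uses >=2048-bit keys).
p, q = 104729, 104723
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def h(item: str) -> int:
    """Hash an identifier into Z_n (simplified full-domain hash)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % n

def fingerprint(x: int) -> str:
    """Second hash applied to the RSA signature before comparison."""
    return hashlib.sha256(str(x).encode()).hexdigest()

# --- Server (key holder) publishes fingerprints of its own set ---
server_set = {"id_1", "id_2", "id_3"}
server_fps = {fingerprint(pow(h(s), d, n)) for s in server_set}

# --- Client blinds its items so the server never sees them in the clear ---
client_set = ["id_2", "id_9"]
r = 123457                       # random blinding factor, coprime with n
assert gcd(r, n) == 1
blinded = [h(c) * pow(r, e, n) % n for c in client_set]

# --- Server signs blinded values: (H(c) * r^e)^d = H(c)^d * r  (mod n) ---
signed = [pow(b, d, n) for b in blinded]

# --- Client unblinds (multiplies by r^-1) and matches fingerprints ---
r_inv = pow(r, -1, n)
intersection = [c for c, s in zip(client_set, signed)
                if fingerprint(s * r_inv % n) in server_fps]
print(intersection)  # ["id_2"]
```

Because the server only ever sees blinded values, it learns nothing about the client's identifiers; the client learns only which of its own items are in the intersection.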

Federated Model Training

In vertical federated learning, each party holds its own features and model parameters. The collaborator encrypts its intermediate results (forward activations) and sends them to the leader, which holds the labels; the leader computes the loss and gradients, returns the gradients encrypted, and the collaborator then updates its local parameters.
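This exchange can be sketched with a toy logistic CTR model split across the two parties. The sketch below is illustrative, not EFLS code: encryption and differential‑privacy noise are omitted (noted in comments), and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vertical split: both sides see the same samples (after PSI alignment)
# but hold disjoint feature columns and disjoint model parameters.
x_collab = rng.normal(size=(4, 3))   # media-side features
x_leader = rng.normal(size=(4, 2))   # e-commerce-side features
y = np.array([1.0, 0.0, 1.0, 0.0])   # labels live on the leader side

w_collab = rng.normal(size=(3,)) * 0.1
w_leader = rng.normal(size=(2,)) * 0.1
lr = 0.5

for step in range(500):
    # 1) Collaborator forward pass; in EFLS this activation would be
    #    privacy-encrypted before leaving the party.
    z_collab = x_collab @ w_collab

    # 2) Leader combines both partial logits and computes the CTR loss.
    logits = z_collab + x_leader @ w_leader
    p = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    grad_logits = (p - y) / len(y)         # d(log-loss)/d(logits)

    # 3) Leader updates its own parameters and returns grad_logits
    #    (again encrypted in the real system).
    w_leader -= lr * (x_leader.T @ grad_logits)

    # 4) Collaborator backprops through its local half of the model.
    w_collab -= lr * (x_collab.T @ grad_logits)

# After training, predictions should separate the two classes.
final_p = 1.0 / (1.0 + np.exp(-(x_collab @ w_collab + x_leader @ w_leader)))
print(np.round(final_p, 2))
```

Note that raw features and labels never cross the party boundary; only the partial logits and their gradients are exchanged.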

EFLS‑Train extends TensorFlow 1.15 with a federated dataset interface, secure gRPC communication, privacy encryption (including differential privacy), and a Keras‑like high‑level API.

import efl
import tensorflow as tf

# Build a federated CTR model with EFLS-Train's Keras-like API.
CTR = efl.FederalModel()
CTR.input_fn(input_fn)   # federated dataset input
CTR.loss_fn(model_fn)    # model graph and loss definition
CTR.optimizer_fn(efl.optimizer_fn.optimizer_setter(
    tf.train.GradientDescentOptimizer(0.001)))
CTR.compile()
CTR.fit(efl.procedure_fn.train(), log_step=1, project_name="train")

Algorithm Innovations

Two novel vertical federated learning methods are open‑sourced:

Horizontal aggregation: Uses attention mechanisms to fuse media‑side feature vectors with e‑commerce features, allowing early‑layer integration.
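A minimal sketch of such attention fusion (illustrative only, not the EFLS implementation): the e‑commerce feature vector acts as the query over the media‑side feature vectors, and the weighted summary is concatenated back for downstream layers. All names and shapes here are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(ecom_vec, media_vecs):
    """Attention fusion: the e-commerce vector queries the media-side
    vectors and receives a weighted summary to concatenate."""
    d = ecom_vec.shape[0]
    scores = media_vecs @ ecom_vec / np.sqrt(d)   # scaled dot-product
    weights = softmax(scores)                      # attention distribution
    summary = weights @ media_vecs                 # convex combination
    return np.concatenate([ecom_vec, summary])

rng = np.random.default_rng(1)
ecom = rng.normal(size=(8,))
media = rng.normal(size=(5, 8))   # five media-side feature vectors
fused = fuse(ecom, media)
print(fused.shape)  # (16,)
```

Fusing at this early layer lets the model weight each media signal by its relevance to the e‑commerce context instead of concatenating everything uniformly.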

Hierarchical aggregation (AutoHERI): Employs neural architecture search to automatically discover optimal hierarchical connections between media embeddings and e‑commerce model layers.
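One common way to make such a connection search differentiable is a DARTS‑style continuous relaxation: each e‑commerce layer holds learnable weights over the candidate media embedding layers. The sketch below illustrates that general technique as an assumption about the approach, not AutoHERI's actual code; every name is ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Media side exposes embeddings from several of its layers (privacy-
# encrypted in the real system).
media_layers = [rng.normal(size=(6,)) for _ in range(3)]

# Architecture parameters: one weight vector per e-commerce layer,
# softmaxed into a distribution over candidate media layers. During
# search these alphas would be trained jointly with the model weights.
alphas = [rng.normal(size=(3,)) for _ in range(2)]

def connect(layer_idx):
    """Weighted mixture of media embeddings fed into one e-commerce layer."""
    w = softmax(alphas[layer_idx])
    return sum(wi * m for wi, m in zip(w, media_layers))

h = rng.normal(size=(6,))            # e-commerce hidden state
for i in range(2):
    h = np.tanh(h + connect(i))      # inject the learned connection
print(h.shape)  # (6,)
```

After search, the softmax distributions are typically discretized, keeping only the strongest media‑to‑layer connections.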

Product Console

A web console abstracts the federated workflow, enabling visual management of users, tasks, data, and permissions, thereby reducing operational overhead.

Future Roadmap

EFLS will add multi‑party scalability, auto‑elastic scaling, and performance optimizations for privacy encryption. Algorithmically, it will explore multi‑party federated learning, federated graph learning, and integration of pre‑training models for ad recall and ranking under strict privacy constraints.

References

[1] McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data," 2017.
[2] Yang et al., "Federated Machine Learning: Concept and Applications," 2019.
[3] De Cristofaro & Tsudik, "Practical Private Set Intersection Protocols," 2009.
[4] Apache Flink documentation.
[5] Kubernetes documentation.
[6] Dwork, "Differential Privacy: A Survey of Results," 2008.
[7] Wei et al., "AutoHERI: Automated Hierarchical Representation Integration for Post-Click Conversion Rate Estimation," CIKM 2021.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
