TensorNet: A Distributed Training Framework Optimized for Large-Scale Sparse Feature Models on TensorFlow
TensorNet is a TensorFlow‑based distributed training framework that tackles the challenges of massive data and billions of sparse parameters in advertising and recommendation systems by enabling near‑infinite sparse feature dimensions, drastically reducing synchronization overhead, and delivering up to 35% inference speed improvements.
TensorNet is a distributed training framework built on TensorFlow specifically designed for large‑scale sparse feature scenarios such as advertising recommendation, allowing developers to train models with sparse parameters exceeding tens of billions.
Training such models faces two major challenges: extremely large training data (over 100 TB) and massive model parameter counts (over 10 billion), making single‑machine training prohibitively slow and prompting the need for distributed solutions.
Standard TensorFlow struggles with sparse models because it caps the sparse feature dimension and synchronizes all parameters during distributed training, which severely slows training.
TensorNet addresses these issues by extending TensorFlow to support virtually unlimited sparse feature dimensions and by reducing the synchronized parameter volume to as little as one‑ten‑thousandth of the original, cutting offline training time from 3.5 hours to 25 minutes and improving online inference performance by roughly 35% in real‑world 360 advertising workloads.
The framework offers both asynchronous and synchronous training modes. In asynchronous mode, each worker hosts separate sparse and dense parameter servers, uses a distributed hash table for sparse parameters, and merges dense parameters into a distributed array, thereby minimizing network requests.
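The distributed hash table for sparse parameters can be sketched in a few lines. This is a minimal, illustrative stand-in for TensorNet's sparse parameter server (the class and method names here are assumptions, not TensorNet's actual API): embeddings are created lazily on first access, so the feature ID space is effectively unbounded, and updates touch only the IDs seen in the current batch.

```python
import random

class SparseParamServer:
    """Sketch of a hash-table-backed sparse parameter store.
    Illustrative only; not TensorNet's real interface."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}              # feature_id -> embedding vector
        self.rng = random.Random(seed)

    def pull(self, feature_ids):
        # Lazily create an embedding the first time an ID is seen,
        # so the feature dimension is effectively unlimited.
        out = []
        for fid in feature_ids:
            if fid not in self.table:
                self.table[fid] = [self.rng.uniform(-0.01, 0.01)
                                   for _ in range(self.dim)]
            out.append(self.table[fid])
        return out

    def push(self, feature_ids, grads, lr=0.1):
        # Apply SGD updates only to the rows touched by this batch.
        for fid, g in zip(feature_ids, grads):
            vec = self.table[fid]
            for i in range(self.dim):
                vec[i] -= lr * g[i]

ps = SparseParamServer(dim=4)
embs = ps.pull([10**12, 7])   # arbitrarily large feature IDs are fine
print(len(ps.table))          # -> 2
```

Because the table is keyed by raw feature ID, no global vocabulary needs to be declared up front, which is what removes TensorFlow's fixed feature-dimension limit.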
In synchronous mode, TensorNet retains a MultiWorkerMirroredStrategy‑like architecture but introduces a dedicated sparse parameter server and synchronizes only the batch‑relevant sparse features, reducing communication traffic to a fraction of TensorFlow’s original cost.
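A back-of-the-envelope calculation shows why synchronizing only the batch-relevant rows cuts traffic so sharply. The numbers below are illustrative placeholders, not figures from the article:

```python
# Full-model sync moves every sparse embedding row; batch-level sync
# moves only the rows touched by the current batch.
vocab_rows = 10_000_000_000    # total sparse parameter rows (~10 billion)
emb_dim = 8
batch_unique_ids = 100_000     # distinct sparse features in one batch

full_sync = vocab_rows * emb_dim
batch_sync = batch_unique_ids * emb_dim

ratio = batch_sync / full_sync
print(ratio)   # -> 1e-05, i.e. one hundred-thousandth of the traffic
```

With these (assumed) numbers the synchronized volume drops by five orders of magnitude, which is consistent in spirit with the "one ten-thousandth" figure the article reports for real workloads.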
The core optimization shrinks the embedding matrix by bounding its size by the batch size and mapping raw user IDs to a "virtual" sparse feature. The process: collect the user IDs in the batch, de-duplicate and sort them, assign each a local index, fetch only those embeddings from the parameter server, and feed the resulting virtual sparse feature into the model.
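The steps above can be sketched in plain Python (the function name is hypothetical; TensorNet implements this inside its TensorFlow ops):

```python
def build_virtual_feature(batch_ids):
    """Map arbitrary (possibly huge) feature IDs in one batch to dense
    local indices, so the trainable embedding matrix needs only
    len(unique_ids) rows rather than vocabulary-sized rows.
    Illustrative sketch, not TensorNet's actual API."""
    unique_ids = sorted(set(batch_ids))                      # collect + sort
    index_of = {fid: i for i, fid in enumerate(unique_ids)}  # assign indices
    local_indices = [index_of[fid] for fid in batch_ids]     # virtual feature
    return unique_ids, local_indices

ids = [900000007, 42, 900000007, 13]
uniq, virt = build_virtual_feature(ids)
print(uniq)   # -> [13, 42, 900000007]
print(virt)   # -> [2, 1, 2, 0]
```

`unique_ids` tells the parameter server which rows to fetch, and `local_indices` is what the model's embedding lookup actually consumes, so the in-graph embedding matrix never grows beyond the number of distinct IDs in one batch.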
For inference, TensorNet splits the model into an embedding_lookup_graph used only during offline training and an inference_graph for online serving; the sparse embedding dictionary can be exported and, when combined with XLA AOT, yields an additional ~35% performance gain.
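The serving-side split can be illustrated with a small sketch: the trained sparse table is frozen into a plain dictionary (standing in for the exported embedding dictionary), and the online path does pure lookups with a fallback for IDs never seen in training. Function names here are assumptions for illustration:

```python
def export_embedding_dict(sparse_table):
    # Freeze the trained sparse table into a static dict for serving;
    # the training-only embedding_lookup_graph is dropped at this point.
    return dict(sparse_table)

def serve_lookup(embedding_dict, feature_ids, dim):
    # Online inference path: dictionary lookups only, with a zero
    # vector for IDs never seen during training.
    zero = [0.0] * dim
    return [embedding_dict.get(fid, zero) for fid in feature_ids]

trained = {7: [0.1, 0.2], 42: [0.3, 0.4]}
frozen = export_embedding_dict(trained)
vecs = serve_lookup(frozen, [42, 999], dim=2)
print(vecs)   # -> [[0.3, 0.4], [0.0, 0.0]]
```

Because the online inference_graph contains only dense computation plus these lookups, it is a good candidate for ahead-of-time compilation, which is where the XLA AOT gain the article mentions comes in.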
TensorNet is open‑sourced on GitHub with comprehensive documentation, tutorials, and deployment guides, and the authors provide contact information for further collaboration.
360 Tech Engineering