TensorNet: A Distributed Training Framework Optimized for Large-Scale Sparse Feature Models on TensorFlow
TensorNet is a TensorFlow‑based distributed training framework that tackles the challenges of massive data and billions of sparse parameters in advertising and recommendation systems by enabling near‑infinite sparse feature dimensions, drastically reducing synchronization overhead, and delivering up to 35% inference speed improvements.
TensorNet is a distributed training framework built on TensorFlow specifically designed for large‑scale sparse feature scenarios such as advertising recommendation, allowing developers to train models with sparse parameters exceeding tens of billions.
Training such models faces two major challenges: extremely large training data (over 100 TB) and massive model parameter counts (over 10 billion), making single‑machine training prohibitively slow and prompting the need for distributed solutions.
Standard TensorFlow struggles with sparse models because it caps the sparse feature dimension and synchronizes all parameters during distributed training, which severely slows training.
TensorNet addresses these issues by extending TensorFlow to support virtually unlimited sparse feature dimensions and by reducing the synchronized parameter volume to as little as one‑ten‑thousandth of the original, cutting offline training time from 3.5 hours to 25 minutes and improving online inference performance by roughly 35% in real‑world 360 advertising workloads.
The framework offers both asynchronous and synchronous training modes. In asynchronous mode, each worker hosts separate sparse and dense parameter servers, uses a distributed hash table for sparse parameters, and merges dense parameters into a distributed array, thereby minimizing network requests.
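The distributed hash table for sparse parameters can be sketched in a few lines. This is a minimal, illustrative stand-in for TensorNet's sparse parameter server (the class and method names here are assumptions, not TensorNet's actual API): embeddings are created lazily on first access, so the feature ID space is effectively unbounded, and updates touch only the IDs seen in the current batch.

```python
import random

class SparseParamServer:
    """Sketch of a hash-table-backed sparse parameter store.
    Illustrative only; not TensorNet's real interface."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}              # feature_id -> embedding vector
        self.rng = random.Random(seed)

    def pull(self, feature_ids):
        # Lazily create an embedding the first time an ID is seen,
        # so the feature dimension is effectively unlimited.
        out = []
        for fid in feature_ids:
            if fid not in self.table:
                self.table[fid] = [self.rng.uniform(-0.01, 0.01)
                                   for _ in range(self.dim)]
            out.append(self.table[fid])
        return out

    def push(self, feature_ids, grads, lr=0.1):
        # Apply SGD updates only to the rows touched by this batch.
        for fid, g in zip(feature_ids, grads):
            vec = self.table[fid]
            for i in range(self.dim):
                vec[i] -= lr * g[i]

ps = SparseParamServer(dim=4)
embs = ps.pull([10**12, 7])   # arbitrarily large feature IDs are fine
print(len(ps.table))          # -> 2
```

Because the table is keyed by raw feature ID, no global vocabulary needs to be declared up front, which is what removes TensorFlow's fixed feature-dimension limit.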
In synchronous mode, TensorNet retains a MultiWorkerMirroredStrategy‑like architecture but introduces a dedicated sparse parameter server and synchronizes only the batch‑relevant sparse features, reducing communication traffic to a fraction of TensorFlow’s original cost.
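A back-of-the-envelope calculation shows why synchronizing only the batch-relevant rows cuts traffic so sharply. The numbers below are illustrative placeholders, not figures from the article:

```python
# Full-model sync moves every sparse embedding row; batch-level sync
# moves only the rows touched by the current batch.
vocab_rows = 10_000_000_000    # total sparse parameter rows (~10 billion)
emb_dim = 8
batch_unique_ids = 100_000     # distinct sparse features in one batch

full_sync = vocab_rows * emb_dim
batch_sync = batch_unique_ids * emb_dim

ratio = batch_sync / full_sync
print(ratio)   # -> 1e-05, i.e. one hundred-thousandth of the traffic
```

With these (assumed) numbers the synchronized volume drops by five orders of magnitude, which is consistent in spirit with the "one ten-thousandth" figure the article reports for real workloads.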
The core optimization shrinks the embedding matrix by bounding its size by the batch size and mapping raw user IDs to a "virtual" sparse feature. The process: collect the user IDs in the batch, de-duplicate and sort them, assign each a local index, fetch only those embeddings from the parameter server, and feed the resulting virtual sparse feature into the model.
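The steps above can be sketched in plain Python (the function name is hypothetical; TensorNet implements this inside its TensorFlow ops):

```python
def build_virtual_feature(batch_ids):
    """Map arbitrary (possibly huge) feature IDs in one batch to dense
    local indices, so the trainable embedding matrix needs only
    len(unique_ids) rows rather than vocabulary-sized rows.
    Illustrative sketch, not TensorNet's actual API."""
    unique_ids = sorted(set(batch_ids))                      # collect + sort
    index_of = {fid: i for i, fid in enumerate(unique_ids)}  # assign indices
    local_indices = [index_of[fid] for fid in batch_ids]     # virtual feature
    return unique_ids, local_indices

ids = [900000007, 42, 900000007, 13]
uniq, virt = build_virtual_feature(ids)
print(uniq)   # -> [13, 42, 900000007]
print(virt)   # -> [2, 1, 2, 0]
```

`unique_ids` tells the parameter server which rows to fetch, and `local_indices` is what the model's embedding lookup actually consumes, so the in-graph embedding matrix never grows beyond the number of distinct IDs in one batch.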
For inference, TensorNet splits the model into an embedding_lookup_graph used only during offline training and an inference_graph for online serving; the sparse embedding dictionary can be exported and, when combined with XLA AOT, yields an additional ~35% performance gain.
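The serving-side split can be illustrated with a small sketch: the trained sparse table is frozen into a plain dictionary (standing in for the exported embedding dictionary), and the online path does pure lookups with a fallback for IDs never seen in training. Function names here are assumptions for illustration:

```python
def export_embedding_dict(sparse_table):
    # Freeze the trained sparse table into a static dict for serving;
    # the training-only embedding_lookup_graph is dropped at this point.
    return dict(sparse_table)

def serve_lookup(embedding_dict, feature_ids, dim):
    # Online inference path: dictionary lookups only, with a zero
    # vector for IDs never seen during training.
    zero = [0.0] * dim
    return [embedding_dict.get(fid, zero) for fid in feature_ids]

trained = {7: [0.1, 0.2], 42: [0.3, 0.4]}
frozen = export_embedding_dict(trained)
vecs = serve_lookup(frozen, [42, 999], dim=2)
print(vecs)   # -> [[0.3, 0.4], [0.0, 0.0]]
```

Because the online inference_graph contains only dense computation plus these lookups, it is a good candidate for ahead-of-time compilation, which is where the XLA AOT gain the article mentions comes in.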
TensorNet is open‑sourced on GitHub with comprehensive documentation, tutorials, and deployment guides, and the authors provide contact information for further collaboration.
360 Tech Engineering