PaddleBox: A GPU‑Based Ultra‑Large‑Scale Sparse DNN Training Framework

PaddleBox is Baidu’s GPU‑based ultra‑large‑scale sparse DNN training framework that combines a three‑tier hierarchical parameter server (SSD, DRAM, HBM) with pipelined scheduling and multi‑machine multi‑GPU communication, delivering 5–40× cost‑performance gains over traditional CPU solutions and powering Baidu’s advertising services.

GPU · PaddleBox · Sparse Parameters

15 min read

DataFunTalk

Dec 23, 2021 · Artificial Intelligence
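The three‑tier parameter server in the PaddleBox summary can be pictured as a cache hierarchy: hot sparse parameters live in GPU HBM, warmer ones in DRAM, and the long tail on SSD. The sketch below is hypothetical and is not PaddleBox's code. The `TieredEmbeddingStore` class, the LRU promotion and demotion policy, and the zero initialization of unseen keys are all assumptions. Each tier is simulated with an in‑memory Python container rather than real HBM, DRAM or SSD.

```python
from collections import OrderedDict


class TieredEmbeddingStore:
    """Toy three-tier embedding store (hypothetical, not PaddleBox's design).

    'hbm' and 'dram' are bounded LRU tiers; 'ssd' is an unbounded backing
    store. All three are ordinary in-memory Python containers here.
    """

    def __init__(self, hbm_capacity, dram_capacity, dim):
        self.hbm = OrderedDict()   # fastest, smallest tier
        self.dram = OrderedDict()  # middle tier
        self.ssd = {}              # slowest tier, unbounded in this sketch
        self.hbm_capacity = hbm_capacity
        self.dram_capacity = dram_capacity
        self.dim = dim

    def _put_dram(self, key, vec):
        self.dram[key] = vec  # new keys are appended as most recently used
        if len(self.dram) > self.dram_capacity:
            old_key, old_vec = self.dram.popitem(last=False)  # evict LRU entry to SSD
            self.ssd[old_key] = old_vec

    def _put_hbm(self, key, vec):
        self.hbm[key] = vec
        if len(self.hbm) > self.hbm_capacity:
            old_key, old_vec = self.hbm.popitem(last=False)  # demote LRU entry to DRAM
            self._put_dram(old_key, old_vec)

    def lookup(self, key):
        """Return the embedding for key and promote it to the HBM tier."""
        if key in self.hbm:
            self.hbm.move_to_end(key)  # mark as most recently used
            return self.hbm[key]
        if key in self.dram:
            vec = self.dram.pop(key)
        elif key in self.ssd:
            vec = self.ssd.pop(key)
        else:
            vec = [0.0] * self.dim  # unseen sparse feature: zero-initialized
        self._put_hbm(key, vec)
        return vec

    def tier_of(self, key):
        """Name of the tier currently holding key, or None if never seen."""
        for name, tier in (("hbm", self.hbm), ("dram", self.dram), ("ssd", self.ssd)):
            if key in tier:
                return name
        return None


store = TieredEmbeddingStore(hbm_capacity=2, dram_capacity=2, dim=4)
for k in [1, 2, 3, 4, 5]:
    store.lookup(k)
print(store.tier_of(5), store.tier_of(2), store.tier_of(1))  # hbm dram ssd
```

In this toy version, every lookup pulls its key into the fastest tier. Recently used features therefore stay in HBM, and cold ones drift down toward SSD. A real system would move parameters between tiers in batches rather than one key at a time.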

Deep Customization and Optimization of TensorFlow for Large-Scale Sparse Training at Meituan

This article details Meituan's internal, heavily customized TensorFlow 1.x implementation. The implementation adds large‑scale sparse parameter support, tackles distributed training challenges and communication bottlenecks, and applies pipeline optimizations. It achieves over ten‑fold scalability improvements and significant per‑node performance gains on recommendation‑system workloads.

Performance Optimization · Sparse Parameters · TensorFlow

32 min read

Meituan Technology Team

Dec 9, 2021 · Artificial Intelligence

Deep Customization of TensorFlow for Large-Scale Sparse Training at Meituan

Meituan heavily customized TensorFlow 1.x for large‑scale sparse training. It replaced Variable‑based embeddings with hash tables, improved load balancing, adopted RDMA communication, pipelined the embedding graph, built high‑performance hash tables and fused operators. These changes achieved over ten‑fold scalability and operator speedups of up to 51%. They also enabled models with billions of parameters on CPU clusters, with GPU support planned as future work.

Performance Optimization · Recommendation Systems · Sparse Parameters

31 min read

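The Meituan summary mentions replacing Variable‑based embeddings with hash tables. A fixed‑shape `[vocab_size, dim]` variable must know its ID range up front. A hash‑keyed table instead creates rows lazily for any feature ID and updates only the rows a batch touched. Below is a minimal sketch of that idea, assuming plain Python containers. `HashEmbeddingTable`, its uniform initializer and the plain SGD update are hypothetical illustrations, not Meituan's implementation or a TensorFlow API.

```python
import random


class HashEmbeddingTable:
    """Embedding table keyed by arbitrary feature IDs (hypothetical sketch).

    Unlike a fixed-shape [vocab_size, dim] variable, rows are created lazily
    the first time an ID is seen, so the ID space need not be known up front.
    """

    def __init__(self, dim, init_scale=0.01, seed=0):
        self.dim = dim
        self.init_scale = init_scale
        self.rng = random.Random(seed)
        self.rows = {}  # feature ID -> list of floats

    def lookup(self, ids):
        """Return a copy of one embedding row per ID, inserting unseen IDs."""
        out = []
        for fid in ids:
            if fid not in self.rows:
                # Assumed initializer: small uniform values in [-scale, scale].
                self.rows[fid] = [self.rng.uniform(-self.init_scale, self.init_scale)
                                  for _ in range(self.dim)]
            out.append(list(self.rows[fid]))
        return out

    def apply_sparse_gradients(self, ids, grads, lr):
        """Plain SGD step that touches only the rows named in ids."""
        for fid, grad in zip(ids, grads):
            row = self.rows[fid]
            for i in range(self.dim):
                row[i] -= lr * grad[i]


table = HashEmbeddingTable(dim=3)
vecs = table.lookup([10**12, 7, 10**12])
print(len(table.rows))  # 2: the repeated ID maps to a single row
```

Rows are keyed by the raw ID, so very large or scattered IDs such as `10**12` do not need to be remapped into a dense index range first.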