Tag: Distributed Deep Learning


AntTech
Jul 13, 2020 · Artificial Intelligence

ElasticDL: An Open‑Source Distributed Deep Learning Framework with Elastic Scheduling

ElasticDL is an open‑source distributed deep learning framework built on TensorFlow 2.x and Kubernetes that simplifies programming by letting users define models with the Keras API, while providing elastic scheduling, fault tolerance, and significant performance gains demonstrated through extensive benchmarks.

Distributed Deep Learning · ElasticDL · Kubernetes
19 min read
AntTech
Oct 17, 2019 · Artificial Intelligence

From a 30‑Year Coding Journey to AI Infrastructure: Wang Yi’s Story and the Open‑Source Projects SQLFlow and ElasticDL

The article chronicles Wang Yi’s three‑decade programming career, his moves across Tencent, Google, Baidu and Ant Financial, and explains how his open‑source AI infrastructure projects SQLFlow and ElasticDL transform model development for analysts while promoting a culture of code review and practical engineering.

AI infrastructure · Distributed Deep Learning · ElasticDL
12 min read
AntTech
Sep 11, 2019 · Artificial Intelligence

ElasticDL: An Open‑Source Elastic Deep Learning System Built on TensorFlow 2.0 and Kubernetes

ElasticDL, the first industry‑level open‑source system for elastic deep learning on TensorFlow, leverages Kubernetes‑native scheduling, fault‑tolerance, and TensorFlow 2.0 Eager Execution to dramatically improve cluster utilization, simplify distributed training, and integrate seamlessly with tools like Kubeflow and SQLFlow.

AI infrastructure · Distributed Deep Learning · ElasticDL
13 min read