How Distributed Scheduling Redefines AI Large-Model Training Architecture

The article examines how the explosive compute, storage, network, and fault‑tolerance demands of AI large‑model training force a fundamental redesign of system architecture, covering layered storage, optimized All‑Reduce communication, elastic resource orchestration, observability, and cost‑saving strategies.

AI Architecture · Compute Scheduling · Storage Hierarchy

9 min read


JD Retail Technology

Mar 18, 2025 · Artificial Intelligence

Multi‑Agent Reinforcement Learning Based Full‑Chain Computation Allocation (MaRCA) for Advertising Systems

MaRCA, a multi-agent reinforcement-learning framework, allocates compute across JD’s full ad-serving chain. It jointly estimates user value, resource consumption, and action outcomes while adjusting dynamically to real-time load, and achieves roughly 15% higher ad revenue without extra compute resources.

Advertising · Compute Scheduling · Deep Learning

18 min read


Baidu Geek Talk

Dec 30, 2024 · Industry Insights

How Baidu’s HTAP Table Storage Achieves Massive IO Gains and Faster Development

Baidu’s Search Content Storage team built an HTAP table storage system and a serverless compute-scheduling architecture that separates OLTP and OLAP workloads. The system delivers up to 200 GB/s peak IO, reduces storage cost by 75%, and enables SQL-style task development with native FaaS functions.

Big Data · Compute Scheduling · HTAP

20 min read


iQIYI Technical Product Team

Oct 24, 2024 · Big Data

iQIYI Multi-AZ Unified Scheduling Architecture for Big Data

iQIYI’s Multi-AZ unified scheduling architecture combines a unified storage layer (QBFS), an abstracted compute scheduler (QBCS), and a federated metadata service (Waggle Dance) to route data and jobs seamlessly across availability zones. It cuts storage costs by up to 65%, reduces overall big-data workload expenses by more than 35%, and lays the groundwork for future hybrid-cloud expansion.

Big Data · Compute Scheduling · Unified Storage

15 min read


Yum! Tech Team

Jan 29, 2024 · Cloud Computing

Flexible Compute Scheduling Practices in the Restaurant Industry: A Yum China Case Study

This article examines the challenges of uneven compute resource distribution across China and presents Yum China's practical approaches—including multi‑unit deployment, dual‑data‑center scheduling, and supporting platforms—to achieve flexible, cost‑effective compute scheduling for the restaurant sector.

Compute Scheduling · Resource Optimization · Yum China

5 min read
