Tagged articles
Baidu Intelligent Cloud Tech Hub
May 27, 2026 · Artificial Intelligence

Optimizing Large Model Inference Architecture for the Agent Era: Engineering Practices and Challenges

The article analyzes the architectural challenges of large‑model inference in the Agent era—such as memory‑intensive MLA structures, MoE communication overhead, exploding KV‑Cache size, and tool‑call accuracy—and presents a series of engineering solutions including hierarchical KV‑Cache pooling, sequence parallelism, offloading strategies, and chip‑level adaptations to achieve higher throughput and lower token costs.

AI Infra · Agent · DeepSeek
15 min read
Alibaba Cloud Big Data AI Platform
Sep 26, 2024 · Artificial Intelligence

How Alibaba Cloud’s PAI Tackles Large‑Model Training and Inference Challenges in 2024

At the 2024 Yunqi Conference, Alibaba Cloud’s AI Infra experts detailed the latest challenges of large‑model deployment—such as hardware costs, resource management, and software‑hardware coordination—and introduced PAI’s new capabilities, including stability tools, automated distributed training, reinforcement‑learning frameworks, inference optimizations, and integrated big‑data AI solutions.

AI Infra · Big Data Integration · Inference Optimization
14 min read
AntTech
Jul 12, 2023 · Artificial Intelligence

Hybrid Embedding Architecture for Large‑Scale Sparse CTR Models

This article describes the Hybrid Embedding solution proposed by Ant AI Infra to address storage, resource, and feature‑governance challenges of massive sparse CTR models, detailing its multi‑layer storage design, KV‑based parameter server, and performance gains in large‑scale recommendation systems.

AI Infra · CTR · Hybrid Embedding
9 min read