Baidu Geek Talk
Author

Follow us to discover more Baidu tech insights.

Recent Articles

Latest from Baidu Geek Talk

Mar 25, 2026 · Artificial Intelligence

Master OpenClaw: Install, Configure, and Scale Multi‑Agent AI Automation

An in‑depth guide walks you through installing OpenClaw, understanding its gateway, channel, agent, tool, and skill architecture, managing token costs, creating multi‑agent workflows, securing API keys, and troubleshooting common issues, empowering developers to build scalable AI‑driven automation.

Installation · OpenClaw · automation
0 likes · 26 min read
Mar 23, 2026 · Databases

How Baidu’s MEG Platform Revamped ClickHouse with a Lakehouse Architecture

This article analyzes the challenges of scaling ClickHouse within Baidu’s MEG data platform and details a lakehouse solution that decouples storage from compute, integrates a meta‑service for transparent data access, optimizes query performance through caching, data roll‑up, and layout tuning, and introduces a unified query gateway that falls back to Spark for complex workloads.

ClickHouse · Data Platform · Lakehouse
0 likes · 25 min read
Mar 9, 2026 · Artificial Intelligence

Mastering Agent Skills: Design, Implementation, and Evaluation for Efficient AI Workflows

This article explains why large‑model agents need structured Skills to capture team‑specific knowledge and workflows, describes the evolution from MCP to Skill, details the progressive‑disclosure design, shows how to write, organize, and install Skills, and provides a systematic evaluation and iteration process to ensure high‑quality, token‑efficient agent behavior.

AI · Skill
0 likes · 26 min read
Feb 9, 2026 · Databases

How Mantle Redefined Cloud Object Storage Metadata for Billion‑File Scale

This article recounts how Baidu's storage team tackled the performance and scalability limits of traditional object storage by redesigning metadata handling with the Mantle and MantleX architectures, introducing a centralized IndexNode, strong consistency, delta‑record writes, and a seamless single‑node to distributed transition for massive file systems.

Filesystem · Performance Optimization · Scalability
0 likes · 37 min read
Feb 2, 2026 · Artificial Intelligence

How Cloud AI Infra Powers the Next Wave of Embodied Intelligence

This article outlines the rapid rise of embodied intelligence, the explosion of Vision‑Language‑Action (VLA) research, and how cloud‑based AI infrastructure—including multi‑level IaaS, data pipelines, dual‑system model designs, and reinforcement‑learning workflows—addresses emerging scaling and deployment challenges.

VLA · multimodal models · reinforcement learning
0 likes · 13 min read
Jan 7, 2026 · Artificial Intelligence

How Baidu’s vLLM‑Kunlun Plugin Powered MiMo Flash V2 on Kunlun XPU in 2 Days

Within two days, Baidu’s Baige and Kunlun Chip teams adapted the 309‑billion‑parameter MiMo Flash V2 model—featuring a hybrid SWA+Sink and Full Attention mechanism—to run efficiently on the Kunlun P800 XPU using the vLLM‑Kunlun Plugin, achieving lossless performance comparable to GPU inference.

AI inference · Kunlun XPU · MiMo Flash V2
0 likes · 7 min read
Dec 24, 2025 · Artificial Intelligence

Context Parallelism Slashes TTFT by 80% for 128K-Token LLMs

The article explains how Baidu’s Baige team integrated a Context Parallelism (CP) strategy into DeepSeek V3.2, detailing the DSA architecture, the limitations of traditional tensor and sequence parallelism, and how CP distributes computation and memory across GPUs to cut time‑to‑first‑token (TTFT) latency by up to 80% for ultra‑long 128K‑token contexts.

Context Parallelism · DeepSeek · LLM
0 likes · 9 min read
Dec 17, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, jointly released by Baidu Baige and Kunlun Chip, provides a high‑performance, zero‑intrusion solution for deploying open‑source large language models on domestic Kunlun XPU hardware, includes fused operators, precision‑validation and profiling tools, and supports over twenty mainstream and multimodal models.

Kunlun XPU · Model Deployment · Open Source
0 likes · 7 min read
Dec 10, 2025 · Artificial Intelligence

How Offloading Latent Cache Boosts DeepSeek‑V3.2‑Exp Decoding Throughput

This report analyzes the memory bottleneck of DeepSeek‑V3.2‑Exp’s sparse‑attention decoder, proposes the Expanded Sparse Server (ESS) to offload the latent cache to CPU memory, and demonstrates through high‑fidelity simulation that the approach dramatically improves decode throughput while keeping latency within acceptable limits.

Cache offload · GPU Memory · LLM inference
0 likes · 20 min read
Nov 10, 2025 · Cloud Native

How Polar‑TCP Breaks Kernel Network Bottlenecks for Cloud‑Native High‑Performance Services

This article explains how traditional kernel network stacks struggle with high‑concurrency, low‑latency cloud data‑center workloads and introduces Baidu Intelligent Cloud’s Polar solution—Polar‑TCP and Polar‑RDMA—which combine user‑space DPDK drivers, a lightweight TCP stack, and an industrial RPC framework to achieve near‑RDMA performance while preserving compatibility with existing TCP ecosystems.

DPDK · Network Stack · Performance Optimization
0 likes · 23 min read