Author

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.


Latest from Old Zhang's AI Learning


Old Zhang's AI Learning

May 31, 2026 · Artificial Intelligence

Qwen3.6-35B-A3B NVFP4: A Stable, Highly Compressed Quantized Model

NVIDIA's NVFP4 quantization cuts Qwen3.6-35B-A3B's memory footprint roughly threefold with almost no accuracy loss, deploys plug-and-play via vLLM, and outperforms other 4-bit formats on Hopper/Blackwell GPUs, making it a practical choice for production AI workloads.

MoE, NVFP4, Qwen3.6-35B-A3B

0 likes · 13 min read


Old Zhang's AI Learning

May 31, 2026 · Artificial Intelligence

vLLM 0.22 Release: Production-Ready DeepSeek V4 and Extreme KV Cache Compression

The vLLM 0.22 stable release adds production-grade DeepSeek V4 support, massive kernel fusions, speedups of up to 10-20×, Batch Invariance with a 28.9% latency gain, a Rust front end, and multi-level KV cache offload that can double context length. It also broadens hardware coverage across NVIDIA, AMD, CPU, and RISC-V, making it a pivotal upgrade for inference infrastructure teams.

Batch Invariance, DeepSeek V4, Inference Optimization

0 likes · 13 min read


Old Zhang's AI Learning

May 30, 2026 · Artificial Intelligence

Turning Technical Books into Claude Code Skills: Unlocking Internal Documentation as Reusable Skills

The article introduces the open‑source "book-to-skill" tool that compiles PDFs or EPUBs into Claude Code skills, explains its on‑demand loading architecture, compares it with raw PDF retrieval and RAG, and provides detailed implementation steps, performance numbers, and practical usage guidelines.

AI, Claude, RAG

0 likes · 12 min read


Old Zhang's AI Learning

May 30, 2026 · Artificial Intelligence

vLLM Semantic Router Deep Dive: Engineering Multimodal Routing and Bug Fixes

The article details the vLLM Semantic Router's Signal-Decision architecture, explores multimodal routing challenges, uncovers an 82% visual signal reversal issue, and walks through three layered bug fixes that restore cosine similarity above 0.999 across extensive tests.

Bug Fix, Embedding, Multimodal

0 likes · 13 min read


Old Zhang's AI Learning

May 30, 2026 · Artificial Intelligence

vLLM Introduces Native RL API for Seamless Weight Synchronization

vLLM’s new native RL API introduces a four‑stage weight‑transfer protocol, pluggable backends, and a keep‑mode pause/resume mechanism that eliminates deadlocks in DPEP deployments, with large‑scale validations on SkyRL and Prime‑RL demonstrating reliability and performance gains.

CUDA IPC, NCCL, RL API

0 likes · 14 min read


Old Zhang's AI Learning

May 30, 2026 · Artificial Intelligence

Set Up an Entire AI Development Pipeline with a Single Command

AI Factory is an npm package that automates the configuration of a full AI development pipeline: it detects the project stack, installs the required skills and services, and provides a spec-driven, multi-agent workflow with planning, implementation, verification, and handoff commands, so developers can focus on writing requirements.

AI agents, AI development, automation

0 likes · 9 min read


Old Zhang's AI Learning

May 29, 2026 · Artificial Intelligence

Run Your Own AI‑Powered Company with 170+ Ready‑to‑Work Agents

The article reviews the open-source “The Agency” repository, which bundles more than 170 AI subagents across 17 departments, from engineering and design to marketing and sales. The repository provides role-based prompts, SOPs, and deliverables for Claude Code and other tools, and the article shares installation steps, usage examples, and practical tips.

AI agents, Claude Code, Open Source

0 likes · 10 min read


Old Zhang's AI Learning

May 29, 2026 · Artificial Intelligence

How NVIDIA’s Polar Enables Any Agent Framework to Plug Into Reinforcement Learning

Integrating diverse AI agent harnesses into reinforcement‑learning pipelines is notoriously labor‑intensive, but NVIDIA’s new Polar system inserts an API‑proxy layer that treats any harness as a black box, enabling seamless rollout recording and trajectory reconstruction, as demonstrated by dramatic performance gains on a 4B model across multiple harnesses.

AI agent, API Proxy, NVIDIA

0 likes · 10 min read


Old Zhang's AI Learning

May 29, 2026 · Artificial Intelligence

How I Got an AI Agent to Open a Browser, Scrape Hugging Face Papers, and Auto‑Post to X

This article reviews LocoAgent, an open‑source AI‑powered social‑media agent that uses real Chrome sessions to fetch Hugging Face daily papers, process them with a lightweight model, and automatically post summaries to X via customizable workflows, detailing setup, execution, and observed results.

AI agent, Hugging Face, Social Media

0 likes · 8 min read


Old Zhang's AI Learning

May 28, 2026 · Artificial Intelligence

How Anthropic Contains Claude: Three Isolation Strategies Explained

Anthropic's engineering blog argues that securing powerful AI agents like Claude means focusing on blast radius and defending three layers (model, environment, and external content) with distinct isolation approaches and hard OS sandboxes, and it shares practical lessons from real-world pitfalls.

AI agent security, Anthropic, Claude

0 likes · 11 min read
