Old Zhang's AI Learning
Author

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

210 articles · 0 likes · 266 views · 0 comments
Recent Articles

Latest from Old Zhang's AI Learning
Old Zhang's AI Learning
May 7, 2026 · Artificial Intelligence

How Unsloth and NVIDIA Boost Consumer‑GPU LLM Training by ~25% with Three Simple Optimizations

Unsloth and NVIDIA identified three low‑level bottlenecks in LLM fine‑tuning on consumer GPUs: repeated packed‑sequence metadata construction, serialized copy‑and‑compute during gradient checkpointing, and per‑expert routing overhead in MoE. Targeted patches for the three together deliver roughly a 25% speedup without requiring changes to hardware, code, or frameworks.

GPU optimization · LLM training · Mixture of Experts
0 likes · 12 min read
Old Zhang's AI Learning
May 6, 2026 · Artificial Intelligence

Google Boosts Gemma 4 Inference Speed Up to 3× with MTP Drafter and Day‑0 vLLM Support

Google’s new Multi‑Token Prediction (MTP) drafter for Gemma 4 delivers up to three‑fold inference speedups across hardware and frameworks, validated by official benchmarks and independent DGX Spark tests, while preserving identical output quality. It is immediately usable via Hugging Face, vLLM, MLX, Ollama and edge‑device runtimes.

Apple Silicon · Gemma 4 · LLM inference
0 likes · 9 min read
Old Zhang's AI Learning
May 6, 2026 · Information Security

Why Large‑Model AI Agents Need Strict Security Controls

The article compares AWS Rex, which enforces Cedar policies on Rhai scripts, with Vercel deepsec, which lets powerful coding agents hunt vulnerabilities, showing how both defensive and offensive approaches are shaping the emerging security model for AI agents in production.

AI agents · Cedar · Rex
0 likes · 12 min read
Old Zhang's AI Learning
May 6, 2026 · Artificial Intelligence

GPT-5.5 Instant Arrives: Smarter, Clearer, More Personalized AI

OpenAI has silently replaced the default ChatGPT model with GPT‑5.5 Instant, delivering a 52.5% drop in hallucinations, 30% shorter responses, deeper personalization via memory sources, and higher benchmark scores across a range of professional tasks, while rolling out new pricing and usage tiers.

AI benchmarks · ChatGPT · GPT-5.5
0 likes · 11 min read
Old Zhang's AI Learning
May 6, 2026 · Artificial Intelligence

Solving RAG’s Biggest Pain Point: Introducing the Open‑Source CocoIndex

The article argues that the main weakness of RAG and agent contexts is stale data rather than chunking or reranking. CocoIndex, a Rust‑based incremental engine with a declarative Python API, offers fresh, delta‑processed context, automatic schema evolution, and production‑grade features, which the article demonstrates through PDF‑to‑Markdown pipelines and a podcast knowledge‑graph case study.

Python · RAG · Rust
0 likes · 13 min read
Old Zhang's AI Learning
May 6, 2026 · Frontend Development

Testing Open‑Slide: A React‑Based PPT Framework Built for AI Agents

Open‑slide is a React and Tailwind powered slide framework designed for AI coding agents such as Claude Code. Natural‑language prompts generate 1920×1080 decks, and the framework provides agent‑native authoring, inspector comments, asset management, presenter mode, and static deployment. The article also gives a hands‑on evaluation of its strengths and limitations.

AI agents · Claude Code · Frontend
0 likes · 11 min read
Old Zhang's AI Learning
May 5, 2026 · Artificial Intelligence

Claude Enters Finance: 10 Open‑Source Financial Agent Templates Unveiled

Anthropic released ten ready‑to‑use financial agent templates that bundle skills, data connectors and sub‑agents. The templates run natively in Excel, PowerPoint, Word and Outlook, are open‑sourced on GitHub, support two deployment modes, score 64.37% on the Vals AI finance benchmark, and integrate dozens of market data sources. The article covers both their strengths and their notable limitations.

Agent Templates · Claude · Data Connectors
0 likes · 14 min read
Old Zhang's AI Learning
May 5, 2026 · Artificial Intelligence

vLLM 0.20.1 Fixes Instability and Speed Issues for DeepSeek V4

The vLLM 0.20.1 patch, released shortly after 0.20.0, consolidates stability fixes and performance optimizations for DeepSeek V4, adds several bug fixes, updates installation instructions, and provides targeted upgrade recommendations for different user scenarios.

Bug Fix · DeepSeek V4 · GPU inference
0 likes · 9 min read
Old Zhang's AI Learning
May 4, 2026 · Artificial Intelligence

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by introducing points and boxes as reasoning units, achieving up to 8× token efficiency and leading benchmark scores in counting, spatial reasoning, and maze navigation compared with GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash.

DeepSeek · Multimodal · Visual Primitives
0 likes · 10 min read