Old Zhang's AI Learning
Author

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

210 articles · 0 likes · 266 views · 0 comments
Recent Articles

Latest from Old Zhang's AI Learning

100 recent articles max
Old Zhang's AI Learning
Apr 28, 2026 · Artificial Intelligence

OpenAI’s Latest Open‑Source Releases: Codex CLI, Plugins, Symphony, and Privacy‑Filter

OpenAI has recently open‑sourced four projects: Codex CLI, the openai/plugins repository, the engineering‑preview Symphony orchestration service, and the privacy‑filter model. The article covers installation, plugin architecture, workflow orchestration design, and usage examples, compares the tools with competing agents, and notes practical constraints.

AI agent · Codex CLI · Open Source
0 likes · 17 min read
Old Zhang's AI Learning
Apr 27, 2026 · Artificial Intelligence

Taming Claude Code: A Simple Skill Slashes Unnecessary Code Bloat

The author evaluates a community‑built “Karpathy Skills” plugin for Claude Code that applies four concise coding principles. In a controlled experiment, the skill‑guided model makes far fewer superfluous changes (38 lines versus 95) while still fixing the targeted bug and improving code quality.

Claude Code · LLM · code quality
0 likes · 12 min read
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs

Qwopus3.6-27B‑v1 is a preview model distilled from Claude Opus onto Qwen3.6‑27B via SFT with the Unsloth stack on a curated set of 12K high‑quality reasoning samples. The article evaluates it on agentic reasoning, front‑end design, and Canvas/WebGL tasks using an RTX 5090, and shows how to deploy it locally with llama.cpp GGUF quantizations, including detailed memory guidelines.

Apache-2.0 · Claude Opus · GGUF
0 likes · 7 min read
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Why Deploying DeepSeek‑V4 Locally with vLLM Is So Challenging

The article dissects local deployment of DeepSeek‑V4 with vLLM, explaining the steep hardware requirements, the complex heterogeneous KV‑cache architecture, and the aggressive kernel‑fusion and multi‑stream optimizations that together make long‑context inference both memory‑intensive and engineering‑heavy.

DeepSeek V4 · GPU Memory · KV Cache
0 likes · 15 min read
Old Zhang's AI Learning
Apr 25, 2026 · Artificial Intelligence

Deploying DeepSeek‑V4‑Flash Locally on 2 × NVIDIA H20 (96 GB) – Quick Performance Test

This article walks through deploying DeepSeek‑V4‑Flash on a server with two NVIDIA H20 GPUs (96 GB each). It covers model download, Docker image preparation, launch script tweaks, and memory compression via FP8 and expert parallelism, then reports the observed concurrency limits and tokens‑per‑second speeds, including a test with the model's thinking mode disabled.

DeepSeek V4 · Docker · FP8 quantization
0 likes · 6 min read
Old Zhang's AI Learning
Apr 23, 2026 · Artificial Intelligence

DeepSeek Quietly Open‑Sources TileKernels to Push GPU Performance to Its Limits

DeepSeek has released TileKernels, a GPU kernel library written in the TileLang DSL that targets H100/H200/B200 GPUs and claims to approach hardware limits in compute intensity and memory bandwidth. It offers MoE routing, FP8/FP4 quantization, and dual‑language PyTorch references for deep‑learning engineers.

FP8 quantization · GPU optimization · LLM training
0 likes · 9 min read