Author

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends. Publishes original technical articles daily.

210

Articles

Likes

266

Views

Comments

Latest from Old Zhang's AI Learning

100 recent articles max

Old Zhang's AI Learning

May 11, 2026 · Artificial Intelligence

Open‑Source Qwen3.6‑35B‑A3B Runs at 162 tok/s on a Single RTX 5090

The article introduces the open‑source Qwen3.6‑35B‑A3B model, explains its MoE architecture and three‑stage LoRA fine‑tuning, presents benchmark results in which it reaches 161.9 tok/s on an RTX 5090 (2.6× faster than a dense 27B counterpart), and discusses deployment tips, the quantized GGUF release, and known compatibility pitfalls.

GGUF quantization · Large Language Model · LoRA fine-tuning

0 likes · 7 min read

Old Zhang's AI Learning

May 11, 2026 · Artificial Intelligence

Exploring Hermes Workspace: An Open‑Source Multi‑Agent UI for Local LLMs

Hermes Workspace is an open‑source web‑based dashboard that integrates chat, file management, memory, skills, terminal access, and multi‑agent scheduling into a single interface, offering a zero‑fork deployment model and advanced swarm capabilities for local large‑model workflows.

Hermes Agent · Open Source · PWA

0 likes · 6 min read

Old Zhang's AI Learning

May 11, 2026 · Information Security

Critical CVE-2026-7482 'Bleeding Llama' in Ollama: Why You Must Upgrade Now

Ollama versions before 0.17.1 contain a heap out‑of‑bounds read vulnerability (CVE‑2026‑7482, CVSS 9.1) that lets attackers upload malicious GGUF files, read server memory (including environment variables and API keys), and exfiltrate the data. With over 300,000 publicly exposed servers affected, immediate upgrade and hardening are essential.

API vulnerability · Bleeding Llama · CVE-2026-7482

0 likes · 5 min read

Old Zhang's AI Learning

May 11, 2026 · Artificial Intelligence

Ling-2.6-1T: 1T‑Parameter, Fast‑Thinking, Agent‑Ready Model After DeepSeek‑V4

Ant Group's Ling‑2.6‑1T, a 1‑trillion‑parameter LLM built for token efficiency and fast thinking, posts strong results on elite reasoning and agentic benchmarks, offers easy local deployment via vLLM or SGLang, provides a quantized 3.6‑bit version, and comes with practical usage tips for developers and knowledge workers.

Agentic Model · Claude Code Integration · Ling-2.6-1T

0 likes · 12 min read

Old Zhang's AI Learning

May 10, 2026 · Frontend Development

How Konva’s New MCP Server Lets AI Agents Draw Whiteboards and Flowcharts

The article explores Konva.js, a popular 2D Canvas library, and its new official MCP server, which feeds the full documentation to LLMs so that AI agents like Cursor and Claude Desktop can generate accurate interactive Canvas code. It also compares Konva with other canvas solutions.

AI agents · Canvas · Frontend

0 likes · 12 min read

Old Zhang's AI Learning

May 10, 2026 · Artificial Intelligence

DFlash Boosts Large Model Inference Up to 6× – Now Supporting DeepSeek-V4

DFlash replaces the speculative draft model with a block‑diffusion drafter, generating 16 tokens per forward pass and achieving up to 6× speedup over baseline (2.5× over EAGLE‑3) without quality loss, while supporting a wide range of open‑source LLMs and multiple back‑ends.

Block Diffusion · DFlash · LLM inference

0 likes · 12 min read

Old Zhang's AI Learning

May 9, 2026 · Artificial Intelligence

Claude’s Open‑Source Financial Skills: A Deep Dive

Anthropic’s new claude‑for‑financial‑services repository bundles 11 ready‑to‑run agents, vertical plugins, and 11 MCP data connectors that automate core Wall Street workflows—from pitch decks and earnings reviews to valuation modeling—while offering clear installation paths and guidance for enterprise customization.

AI agents · Claude · Financial Services

0 likes · 13 min read

Old Zhang's AI Learning

May 9, 2026 · Artificial Intelligence

Why Gemini’s Multimodal RAG with File Search Is So Compelling

The article analyzes Google Gemini’s File Search tool as a fully managed multimodal RAG solution, detailing its architecture, key features, pricing model, step‑by‑step usage, strengths, limitations, and how it compares with OpenAI Assistants File Search and Vertex AI Search.

AI Retrieval · Embedding · File Search

0 likes · 14 min read

Old Zhang's AI Learning

May 9, 2026 · Artificial Intelligence

Run Local LLM Agents on Claude Code, Codex and OpenClaw with Just 24 GB VRAM via Unsloth API

The article explains how Unsloth’s dual‑protocol API lets you run Claude Code, Codex and OpenClaw locally on a 24 GB GPU, details installation steps, hardware limits, configuration for each CLI, and shares real‑world performance pros and cons.

24GB VRAM · Claude Code · Codex

0 likes · 12 min read

Old Zhang's AI Learning

May 8, 2026 · Artificial Intelligence

Testing RHTV: Native AI Agent Powers One‑Stop Face‑Swap, Image Refinement, and Video Production

The article evaluates RunningHub’s RHTV platform, showing how its native AI agent integrates face‑swap, product‑image refinement, and video generation on a single infinite canvas. This removes the fragmented workflow of other tools and enables rapid, controllable short‑form video creation, which the article demonstrates with a toothbrush‑promotion example.

AI agents · AI video generation · RHTV

0 likes · 7 min read
