Author

Lao Guo's Learning Space

AI learning, discussion, and hands‑on practice with self‑reflection

Latest from Lao Guo's Learning Space

62 recent articles

Lao Guo's Learning Space

May 13, 2026 · Artificial Intelligence

Can Trillion-Parameter Models Skip ‘Slow Thinking’? Ant’s Ling‑2.6‑1T Redefines Efficient LLMs

Ant’s newly released Ling‑2.6‑1T, a trillion‑parameter LLM, uses a hybrid MLA‑plus‑Linear Attention architecture to deliver a 256K context window, ultra‑low token cost, and millisecond‑level latency. It reaches GPT‑5.4‑level performance on multiple benchmarks and is open‑sourced for developers.

Ant AI · Fast Thinking · Hybrid Architecture

0 likes · 10 min read

Lao Guo's Learning Space

May 12, 2026 · Artificial Intelligence

Demystifying the Core Technologies Behind ChatGPT, GPT‑4, and DeepSeek

This article breaks down the key algorithms that power large‑language models—Transformer, Mixture‑of‑Experts, Flash Attention, KV‑Cache, Multi‑Token Prediction, quantization, Chain‑of‑Thought and Retrieval‑Augmented Generation—explaining how each contributes to the performance of ChatGPT, GPT‑4 and DeepSeek.

Flash Attention · KV Cache · Mixture of Experts

0 likes · 10 min read

Lao Guo's Learning Space

May 12, 2026 · Artificial Intelligence

Which Inference Framework Maximizes Your GPU Performance in 2026?

This article compares six popular LLM inference frameworks—vLLM, TensorRT‑LLM, llama.cpp, ds4.c, Ollama, and Omlx—across performance, ease of use, and hardware compatibility, then provides a practical matrix to help users select the best fit for their GPU.

Apple Silicon · GPU performance · LLM inference

0 likes · 10 min read

Lao Guo's Learning Space

May 11, 2026 · Artificial Intelligence

Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac

Redis founder antirez unveiled ds4.c, a pure‑C inference engine that leverages Objective‑C and Metal to run DeepSeek V4 locally on Mac devices, delivering about 27 tokens/s on an M3 Ultra. That is far slower than GPU servers, but it offers a dependency‑free, on‑device solution that keeps data private.

AI · C · DeepSeek

0 likes · 8 min read

Lao Guo's Learning Space

May 10, 2026 · Industry Insights

Don't Rush to Buy GPUs: 5 Truths About Deploying Enterprise Large Models

The article reveals five hard‑won truths for enterprises adopting large AI models, showing why buying GPUs first often stalls projects and outlining how to define business goals, start with API‑based pilots, run small‑scale trials, invest in data pipelines, and build robust evaluation frameworks.

API pilot · GPU procurement · data preparation

0 likes · 9 min read

Lao Guo's Learning Space

May 9, 2026 · Artificial Intelligence

How Top Credit Data Firms Use AI to Transform Risk Management: 5 Key Practices

AI is transforming credit risk assessment by automating data profiling, anomaly detection, rating, early warning, and compliance auditing. The article reports that this cuts manual review costs that previously ran into the millions, raises data coverage above 99%, improves consistency and speed, and lets firms shift from reactive to proactive risk control.

AI · Anomaly Detection · Compliance Automation

0 likes · 11 min read

Lao Guo's Learning Space

May 7, 2026 · Artificial Intelligence

Gemma 4 MTP Deep Dive: Speculative Decoding & KV‑Cache Sharing for 3× Faster Inference

The article explains why large‑language‑model inference is bottlenecked by memory bandwidth, then details Google’s Gemma 4 MTP technique, which pairs a small draft model with speculative decoding and a shared KV‑Cache to parallelize token prediction. It reports up to three‑fold speed gains without any loss in output quality and provides step‑by‑step local deployment instructions.

Gemma 4 · Inference Optimization · KV Cache

0 likes · 11 min read

Lao Guo's Learning Space

May 6, 2026 · Artificial Intelligence

Why Your RAG Keeps Missing the Mark: Enterprise‑Level Pitfall Guide

This article examines why Retrieval‑Augmented Generation systems that work in demos often fail in production, detailing common pitfalls—from chunking and vector‑database selection to hybrid retrieval and re‑ranking—and offers concrete strategies, configuration tips, and a decision tree to build reliable enterprise‑grade RAG solutions.

Chunking · Hybrid Retrieval · RAG

0 likes · 12 min read

Lao Guo's Learning Space

May 5, 2026 · Artificial Intelligence

Top DIY AI Supercomputer Builds 2026: RTX 5090 & GB300 from $300 to $100k

Analyzing the cost‑benefit of building personal AI supercomputers, the article compares cloud GPU rentals to DIY setups across budgets from $300 to $100k, detailing component choices such as RTX 5090, GB300, Mac Studio, and DGX Spark, while highlighting performance gains, ROI timelines, and common build pitfalls.

AI workstation · DIY supercomputer · GB300

0 likes · 14 min read

Lao Guo's Learning Space

May 5, 2026 · Artificial Intelligence

AMD Ryzen AI MAX+ PRO 495 Review: The Most Powerful Mobile APU Yet

The AMD Ryzen AI MAX+ PRO 495 (code‑named Gorgon Halo) boosts memory bandwidth, expands unified memory to up to 256 GB, and delivers 55‑60 TOPS NPU performance, resulting in roughly 4 % multi‑core and 3 % single‑core gains over its predecessor while targeting demanding AI workloads on thin‑and‑light laptops.

AMD · Mobile APU · NPU

0 likes · 9 min read
