Collection size: 96 articles
James' Growth Diary
May 13, 2026 · Artificial Intelligence

Multimodal RAG: A Complete Guide to Ingesting Images, Tables, and PDFs

This article examines the blind spot of pure‑text RAG for visual content, compares three multimodal ingestion strategies—CLIP embeddings, image‑to‑text captioning with a MultiVectorRetriever, and ColPali visual retrieval—covers table‑specific handling, presents end‑to‑end TypeScript implementations, and lists common pitfalls to avoid when deploying production‑grade multimodal RAG pipelines.

CLIP · ColPali · Image Captioning
0 likes · 22 min read

JavaEdge
Apr 2, 2026 · Artificial Intelligence

Unlocking Qwen3.6-Plus: Features, Multimodal Performance, and API Guide

This article provides an in‑depth overview of the Qwen3.6‑Plus model, detailing its million‑token context window, enhanced multimodal reasoning, benchmark results across language and vision tasks, and step‑by‑step instructions for using the official API and integrating the model with popular coding assistants.

Qwen3.6-Plus · api-integration · code agents
0 likes · 12 min read

ByteFE
Oct 11, 2023 · Artificial Intelligence

CR Copilot: An Open‑Source LLM‑Based Code Review Assistant with Private Knowledge Base

This article describes the design and implementation of a code‑review assistant powered by open‑source large language models and a privately hosted knowledge base, covering background, pain points, system architecture, model selection, vector‑store integration, prompt engineering, diff parsing, and practical reflections.

AI · Code Review · Knowledge Base
0 likes · 24 min read

James' Growth Diary
May 14, 2026 · Artificial Intelligence

LLM Semantic Routing Explained: Model‑Based Intent Classification and Three Keyword‑Matching Pitfalls

This article frames LLM semantic routing as a classification problem, compares keyword‑, embedding‑, and LLM‑based routing, provides full TypeScript implementations, introduces hybrid routing to balance speed and accuracy, and covers production‑grade observability and dynamic configuration to avoid common pitfalls.

Hybrid Routing · LLM · LangChain
0 likes · 33 min read

Ops Development Stories
Jul 29, 2025 · Artificial Intelligence

Master AI Agents with LangGraph: Build Adaptive RAG, Translation, and ReAct Agents

This comprehensive guide explains what an AI Agent is, its core capabilities and design patterns, and walks through step‑by‑step implementations of RAG, Translation, and ReAct agents using LangGraph, complete with code samples, workflow diagrams, and practical tips for building personal ops knowledge‑base agents.

LLM · LangGraph · RAG
0 likes · 64 min read

Baobao Algorithm Notes
Jan 27, 2026 · Artificial Intelligence

Putting Kimi K2.5 and Kimi Code to the Test: Real‑World AI Agent Benchmarks

This article presents a hands‑on evaluation of Kimi K2.5 and its open‑source Kimi Code agent across a series of hard‑core prompts, covering Python API generation, cost‑optimized routing, multimodal ECharts visualisation, massive‑scale SQL optimisation, web‑search‑driven research, MoE explanation and video‑to‑code workflows.

AI agent · Kimi · Large Language Model
0 likes · 9 min read

Ops Development Stories
Sep 19, 2024 · Artificial Intelligence

How to Connect Qwen LLMs with Higress AI Gateway: A Hands‑On Guide

This tutorial walks through setting up a local k3d cluster, installing Higress, and using its AI plugins—including AI Proxy, AI JSON formatter, AI Agent, and AI Statistics—to integrate and observe Alibaba Cloud's Qwen large language models across various use cases such as weather and flight queries.

AI gateway · AI plugins · Higress
0 likes · 30 min read

Geek Labs
Mar 31, 2026 · Artificial Intelligence

5 Open‑Source AI Projects: Lark CLI, OpenSpace, G0DM0D3, Awesome‑AI List, and Meta TribeV2

The article presents five notable open‑source AI projects, outlining their features, use cases, and performance: Lark CLI for office automation, OpenSpace with self‑evolving agents (4.2× gain, 46% token saving), G0DM0D3 as a privacy‑focused multi‑model chat alternative, a curated list of truly open AI projects, and Meta’s TribeV2 multimodal brain‑encoding model for neuroscience research.

AI agents · G0DM0D3 · Meta TribeV2
0 likes · 12 min read

Machine Learning Algorithms & Natural Language Processing
Apr 22, 2026 · Artificial Intelligence

Hands‑On Kimi K2.6 + Hermes: A Karpathy‑Style Step‑by‑Step Guide

This article presents a detailed, hands‑on tutorial for deploying Kimi K2.6 with Hermes and Obsidian. It showcases multimodal video note‑taking, skill creation, self‑evolving LLM‑driven knowledge bases, and large‑scale agent clusters, and discusses both the strengths and the current limitations of the system.

Agent Systems · Hermes · Kimi K2.6
0 likes · 10 min read

Full-Stack Cultivation Path
Sep 4, 2024 · Artificial Intelligence

Hot Open-Source RAG Tool for Document Chat: GraphRAG, Multimodal QA & Complex Reasoning

This article introduces Kotaemon, an open‑source Retrieval‑Augmented Generation platform that lets users chat with their documents, offering a self‑hosted web UI, support for local and API LLMs, hybrid retrieval, multimodal question answering, GraphRAG indexing, and advanced reasoning capabilities, along with step‑by‑step installation via App or Docker.

GraphRAG · LLM · Multimodal QA
0 likes · 6 min read

SuanNi
May 30, 2026 · Artificial Intelligence

Step 3.7 Flash: High‑Efficiency Pro‑Level Agent Model with 400 TPS and Low Cost

Step 3.7 Flash is a 196B‑parameter multimodal agent model with 11B activated parameters. It delivers 400 TPS inference, stronger code generation and cross‑framework stability, a cost‑effective Advisor Mode, and strong vision and search performance, with broad benchmark gains over its predecessor and competing models.

AI agent · Advisor Mode · Multimodal
0 likes · 12 min read

Kuaishou Tech
Aug 23, 2025 · Artificial Intelligence

How Thyme Enables Models to Think Beyond Images with Code‑Driven Multimodal Reasoning

The Kwai Keye team presents Thyme, a novel multimodal reasoning framework that lets large language models generate and safely execute Python code for image manipulation and complex calculations, achieving significant performance gains over existing vision‑language models across perception, reasoning, and hallucination‑reduction benchmarks.

AI research · Large Language Model · Multimodal
0 likes · 12 min read

Tech Minimalism
Jan 28, 2026 · Artificial Intelligence

Master Oh My Claude Code: Complete Guide to Multi‑Agent AI Coding with Claude

Oh My Claude Code transforms Claude Code into a multi‑agent orchestration platform, offering five execution modes, 32 specialized agents, automatic model routing, and simple installation, enabling developers to automate complex coding tasks from planning to testing with natural‑language commands.

AI coding · Claude Code · Oh My Claude Code
0 likes · 15 min read

Old Zhang's AI Learning
Jan 23, 2026 · Artificial Intelligence

Top AI Projects: Mobile‑Controlled Claude Code, Hackathon‑Winner Claude Config, and AI‑Powered Document Illustration

The article introduces four standout AI tooling projects—Happy Coder for mobile Claude Code access, the Everything Claude Code configuration suite from a hackathon champion, the Document Illustrator skill that auto‑generates illustrations, and the free Gemini CLI course—plus the skills.sh marketplace, detailing their features, installation steps, and practical evaluations.

Agent Skills · Claude Code · Document Illustrator
0 likes · 12 min read

Old Meng AI Explorer
Apr 17, 2026 · Artificial Intelligence

Which AI Coding CLI Reigns Supreme? A Data‑Driven Comparison of Claude, Codex, Copilot, and Gemini

This article presents a hands‑on, data‑backed comparison of the four leading AI programming command‑line tools—Claude Code, GitHub Copilot CLI, Google Gemini CLI, and OpenAI Codex—covering installation ease, command design, agent capabilities, extensibility, multimodal support, security, pricing, real‑world scenarios, and benchmark results to help developers choose the tool that best fits their specific workflow.

AI coding · CLI tools · comparison
0 likes · 18 min read

Old Zhang's AI Learning
May 9, 2026 · Artificial Intelligence

Why Gemini’s Multimodal RAG with File Search Is So Compelling

The article analyzes Google Gemini’s File Search tool as a fully managed multimodal RAG solution, detailing its architecture, key features, pricing model, step‑by‑step usage, strengths, limitations, and how it compares with OpenAI Assistants File Search and Vertex AI Search.

AI Retrieval · Embedding · File Search
0 likes · 14 min read