Author

DeepHub IMBA

A public account sharing practical AI insights. Internet + machine learning + big data + architecture = IMBA

Latest from DeepHub IMBA

60 recent articles

DeepHub IMBA

Apr 13, 2026 · Artificial Intelligence

From Retrieval to Answer: Three Overlooked Failure Points in RAG Pipelines

The article reveals silent failures in production RAG systems—where high retrieval scores and fluent LLM outputs still deliver incorrect answers—and proposes a four‑step observability loop (relevance gating, post‑generation evaluation, session‑wide tracing, and user‑signal logging) to detect and remediate these faults.

LLM evaluation · Observability · RAG

0 likes · 12 min read

DeepHub IMBA

Apr 11, 2026 · Artificial Intelligence

Understanding Vector Similarity Search: Flat Index, IVF, and HNSW

This article explains why vector databases are needed for semantic search of unstructured data, covers cosine similarity, and provides a detailed, step‑by‑step comparison of three core index types (Flat Index, IVF, and HNSW), highlighting their trade‑offs in accuracy and speed.

Embeddings · HNSW · IVF

0 likes · 10 min read

DeepHub IMBA

Apr 9, 2026 · Artificial Intelligence

Prompt, Context, Harness: Decoding the Three‑Layer Architecture of AI Agent Engineering

The article traces the evolution from Prompt Engineering to Context Engineering and finally Harness Engineering, explains why each layer is needed, and provides concrete code examples, diagnostic scripts, and practical guidelines for building reliable AI coding agents.

AI Agents · Agent Architecture · Context Engineering

0 likes · 22 min read

DeepHub IMBA

Apr 8, 2026 · Artificial Intelligence

Choosing a Vector Database: Pinecone for Production, Chroma for Prototyping, Weaviate for Hybrid Search

This article compares three popular vector databases—Pinecone, Chroma, and Weaviate—explaining how they store embeddings for RAG systems, showing Python setup code, and outlining each solution's architecture, scaling limits, cost considerations, and ideal use cases.

Chroma · Embedding · Hybrid Search

0 likes · 7 min read

DeepHub IMBA

Apr 7, 2026 · Artificial Intelligence

instinct: A Confidence‑Based Self‑Learning Memory System for AI Agents

The article introduces instinct, a confidence‑driven memory framework that lets AI coding agents automatically observe, consolidate, and suggest reusable patterns across sessions, using SQLite for storage, MCP for integration, and a Python API for extensibility.

AI · Agent Memory · Python

0 likes · 11 min read

DeepHub IMBA

Apr 6, 2026 · Artificial Intelligence

Mastering Machine Learning Feature Engineering: Scaling, Encoding, Aggregation, Embedding, and Automation

The article explains why good features matter more than fancy algorithms and walks through practical techniques, including scaling, log transforms, binning, interaction features, various encoding schemes, datetime extraction, text statistics, geospatial distances, aggregation, feature selection, and automated feature generation, each illustrated with concrete pandas and scikit‑learn code examples.

Encoding · Feature Engineering · Automation

0 likes · 16 min read

DeepHub IMBA

Apr 5, 2026 · Artificial Intelligence

Understanding ADK Multi‑Agent Orchestration: SequentialAgent, ParallelAgent, and LoopAgent Explained

The article explains ADK's three core orchestration modes—SequentialAgent for ordered pipelines, ParallelAgent for independent concurrent tasks, and LoopAgent for iterative quality‑control loops—detailing their suitable scenarios, state‑flow mechanisms, and how to build a complete order‑to‑delivery workflow without writing explicit orchestration code.

ADK · LLM · LoopAgent

0 likes · 16 min read

DeepHub IMBA

Apr 4, 2026 · Artificial Intelligence

Building Mini-vLLM from Scratch: KV‑Cache, Dynamic Batching, and Distributed Inference

This article walks through constructing Mini-vLLM, a from‑scratch LLM inference engine that tackles the O(N²) attention cost with KV‑cache, boosts throughput via dynamic batching, adds observability with Prometheus/Grafana, supports gRPC, and scales across multiple workers, with benchmark numbers demonstrating its CPU‑only performance.

Docker · Dynamic Batching · Inference Engine

0 likes · 12 min read

DeepHub IMBA

Apr 3, 2026 · Artificial Intelligence

Multi‑Aspect Embedding: Integrating Context Signals into Vector Similarity Search

The article analyzes how traditional vector database pipelines use external filters for context constraints and proposes the Aspect Database’s multi‑aspect embedding approach, which encodes contextual attributes directly into similarity vectors to enable unified, context‑aware retrieval for AI systems.

AI Systems · ANN search · Embedding

0 likes · 9 min read

DeepHub IMBA

Apr 2, 2026 · Artificial Intelligence

Speculative Decoding Explained: Small Draft Model + One‑Shot Verification

The article details how speculative decoding—using a fast small model to draft tokens and a large model to verify them—overcomes the memory‑bandwidth bottleneck of autoregressive inference, introduces SSD’s self‑draft and tree‑verification stages, presents real‑world benchmark gains, and shows how to enable it in vLLM.

GPU memory bandwidth · Inference Optimization · SSD

0 likes · 14 min read
