Tagged articles
3 articles
Java Companion
May 26, 2026 · Artificial Intelligence

How a Terminal AI Agent Achieves a 99.82% Cache Hit Rate with DeepSeek API

DeepSeek-Reasonix, a terminal‑based AI coding agent tightly integrated with the DeepSeek API, delivers a 99.82% prefix‑cache hit rate that cuts daily token costs from $61 to $1.38, while offering file editing, command execution, memory, hooks, MCP support, and a preview Tauri desktop client.

AI coding agent · DeepSeek · Reasonix
14 min read
James' Growth Diary
May 24, 2026 · Artificial Intelligence

Execution → Observation → Reflection → Improvement: How Hermes Closes the Skill Loop

The article dissects Hermes' background review mechanism, showing how a silent daemon thread performs post‑conversation reflection and writes valuable insights to a skill or memory store. It also covers prompt designs, fork‑agent isolation, priority update rules, and common pitfalls for building continuously learning LLM agents.

Background Review · Daemon Thread · Hermes
14 min read
58 Tech
Apr 11, 2025 · Artificial Intelligence

Optimization of Multimodal Visual Large Model Inference: Pre‑processing, ViT TensorRT, CUDA Graphs, Tokenization, Prefix Cache, and Quantization

This report details a comprehensive set of optimizations for multimodal visual large‑model (VLM) inference: image pre‑processing acceleration, TensorRT integration for the ViT module, CUDA‑Graph replay, token‑count reduction, prefix‑cache handling, and weight quantization. Together they demonstrate up to three‑fold throughput gains while maintaining accuracy.

CUDA Graph · Multimodal · TensorRT
19 min read