Tagged articles

12 articles

May 11, 2026 · Artificial Intelligence

Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac

Redis founder antirez unveiled ds4.c, a pure‑C inference engine that leverages Objective‑C and Metal to run DeepSeek V4 locally on Mac devices, delivering about 27 tokens/s on an M3 Ultra. That is far slower than GPU servers, but it offers a dependency‑free, on‑device solution that keeps data private.

AI · C · DeepSeek

0 likes · 8 min read

DeepHub IMBA

Apr 4, 2026 · Artificial Intelligence

Building Mini-vLLM from Scratch: KV‑Cache, Dynamic Batching, and Distributed Inference

This article walks through constructing Mini-vLLM, a from‑scratch LLM inference engine that tackles the O(N²) attention cost with KV‑cache, boosts throughput via dynamic batching, adds observability with Prometheus/Grafana, supports gRPC, and scales across multiple workers, with benchmark numbers demonstrating its CPU‑only performance.

Docker · Dynamic Batching · Inference Engine

0 likes · 12 min read

JavaEdge

Jun 27, 2025 · Artificial Intelligence

Why Inference Engines Are Essential for Deploying Large Language Models in Production

The article explains what inference engines are, why they are needed beyond raw Python scripts, and outlines best practices such as model quantization, batching, and parallelism, while comparing popular open‑source and commercial options for production AI workloads.

AI deployment · Batching · Inference Engine

0 likes · 14 min read

DataFunSummit

Dec 24, 2024 · Artificial Intelligence

Considerations and Practices for Domesticating Large‑Model Inference Engines

This article examines the importance of domestic large‑model inference engines, compares Chinese and international chips, evaluates four architectural approaches, discusses practical challenges such as performance loss and model support, and outlines future expectations for high‑performance, heterogeneous‑chip inference solutions.

Domestic Chip · Inference Engine · Performance Optimization

0 likes · 9 min read

DataFunSummit

Sep 11, 2023 · Artificial Intelligence

Challenges and Insights for Deploying Large Models on Edge with MNN

The talk presents an overview of the MNN inference engine, outlines the end‑to‑end workflow for deploying large language models on mobile devices, discusses technical challenges and practical solutions, and concludes with future directions for edge AI deployment.

AI · Inference Engine · Large Models

0 likes · 2 min read

OPPO Kernel Craftsman

Oct 28, 2022 · Artificial Intelligence

ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and achieving speed gains of up to 90%.

GPU · Inference Engine · Performance

0 likes · 11 min read

ByteDance Terminal Technology

Jul 29, 2022 · Artificial Intelligence

Pitaya: ByteDance’s End‑Side AI Engineering Platform Overview

Pitaya, built by ByteDance’s Client AI and MLX teams, is a comprehensive end‑side AI engineering platform that provides a full workflow from model development and data preparation to deployment, monitoring, and federated learning, supporting large‑scale commercial scenarios across multiple apps.

AI Platform · Edge AI · Inference Engine

0 likes · 14 min read

DataFunTalk

Apr 14, 2022 · Artificial Intelligence

PaddlePaddle Deep Learning Platform: Architecture, Core Technologies, and Real‑World Applications

The article presents a comprehensive overview of Baidu's open‑source deep learning platform PaddlePaddle, detailing its full‑stack architecture and core technologies, including a unified dynamic‑static graph, large‑scale distributed training, multi‑platform inference, an extensive model zoo, and hardware adaptation, and showcasing a real‑world deployment in power‑grid monitoring.

AI Framework · Inference Engine · Model Compression

0 likes · 15 min read

DaTaobao Tech

Mar 11, 2022 · Artificial Intelligence

How Alibaba’s MNN Engine Achieves 350% CPU Speedup and Sparse Acceleration

Alibaba’s MNN, a lightweight, high‑performance deep‑learning inference engine, earned top honors in China’s 2022 “Science & Innovation China” awards. It delivers gains such as a 350% speedup on x86 CPUs and 2.1‑2.3× acceleration on ARM with sparse models, plus integrated OpenCV/NumPy functionality for edge AI deployment.

AI deployment · Alibaba · Inference Engine

0 likes · 4 min read

Alibaba Terminal Technology

Feb 3, 2021 · Frontend Development

How Front-End AI Inference Engines Achieve Real-Time Smart Recognition

This article explains on‑device machine learning concepts, compares front‑end inference engines such as TensorFlow.js, ONNX.js and WebDNN across CPU, WASM and WebGL, and presents practical optimization techniques like vectorization, memory layout, graph fusion and mixed‑precision to boost performance for real‑time applications.

Frontend · Inference Engine · Machine Learning

0 likes · 11 min read

Alibaba Cloud Developer

Jul 2, 2019 · Artificial Intelligence

How MNN Powers Mobile AI: Inside Alibaba’s Open‑Source Inference Engine

Alibaba’s MNN (Mobile Neural Network) engine, now open‑sourced on GitHub, showcases how a lightweight, end‑side deep‑learning inference framework tackles fragmentation, optimizes model conversion, scheduling, and execution across diverse devices, delivering significant performance gains for mobile and IoT AI applications.

Inference Engine · MNN · Operator fusion

0 likes · 15 min read

Alibaba Cloud Developer

Aug 30, 2017 · Artificial Intelligence

How Alibaba’s Knowledge Graph Powers Real‑Time Product Governance with AI

Alibaba’s massive product knowledge graph combines billions of triples, AI‑driven inference, and semantic reasoning to enable millisecond‑level, explainable detection of illegal or counterfeit items across its e‑commerce ecosystem, improving platform governance and consumer experience.

AI · Alibaba · Inference Engine

0 likes · 8 min read
