Tag

Model Inference

8 articles collected under this tag.

Zhihu Tech Column
Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu’s ZhiLight Large Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu’s technical talk on the ZhiLight large‑model inference framework, detailing model execution mechanisms, GPU load analysis, multi‑GPU parallel strategies, open‑source engine comparisons, compute‑communication overlap, quantization techniques, benchmark results, and future directions for scalable LLM deployment.

GPU parallelism · Model Inference · SGLang
11 min read
DataFunTalk
Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference accelerator for video generation models that leverages spatio-temporal activation sparsity, mixed-precision DSP58 extensions, and adaptive scheduling to achieve up to a 1.30× speedup and 4.49× energy-efficiency gain over an NVIDIA 3090 GPU while preserving model accuracy.

AI · FPGA · Mixed Precision
11 min read
ZhongAn Tech Team
Feb 16, 2025 · Artificial Intelligence

DeepSeek R1 and V3: Model Innovations, Industry Impact, and Future Trends

The article reviews DeepSeek's open‑source R1 and V3 large language models, highlighting their technical breakthroughs, cost advantages, expert opinions, industry adoption across chips, cloud services, and applications, and discusses future directions for model scaling, distillation, and AI competition.

AI Industry · AI competition · DeepSeek
13 min read
Alimama Tech
Feb 12, 2025 · Artificial Intelligence

HighService: A High‑Performance Pythonic AI Service Framework for Model Inference and Global Resource Scheduling

HighService, Alibaba's Pythonic AI service framework, accelerates large-model inference and maximizes GPU utilization by separating CPU and GPU processes, offering out-of-the-box quantization, parallelism, and caching, and dynamically reallocating idle GPUs across clusters through a master-worker scheduler, keeping online latency low while boosting offline throughput for diffusion and LLM workloads.

AI Service · High Performance · Model Inference
16 min read
Rare Earth Juejin Tech Community
Jan 17, 2024 · Artificial Intelligence

Building a License Plate Recognition Service with C++, TensorRT, and Go

This article details how to train a YOLOv8-pose model for license-plate detection, convert it to a TensorRT engine, implement C++ preprocessing and inference, expose the functionality to Go via CGO, and assemble a lightweight web service for real-time plate recognition.

C++ · Go · License Plate Recognition
12 min read
HelloTech
Aug 9, 2023 · Artificial Intelligence

Device Intelligence: Concepts, Architecture, and Applications

Device intelligence brings on-device reasoning and real-time inference to smartphones and IoT gateways, delivering low-latency, privacy-preserving, personalized services such as AR/VR enhancements and recommendation re-ranking. It confronts challenges of hardware fragmentation and model size, and complements cloud AI through architectures like Hala's MNN-based pipeline.

Device Intelligence · Model Inference · On-Device AI
10 min read
HelloTech
May 8, 2023 · Artificial Intelligence

One‑Stop AI Platform for Cloud, Edge, Mobile, Flink, and Application Intelligence: Architecture, Challenges, and Solutions

The article presents a comprehensive one-stop AI platform that unifies training, model, feature, and decision services across cloud, edge, mobile, Flink, and application environments. It details the platform's architecture, the limitations of cloud-centric inference, the advantages of localized inference, and the challenges and solutions for model and feature localization, SDK design, and future AutoML enhancements.

AI Platform · Feature Engineering · Flink
17 min read
iQIYI Technical Product Team
Jul 23, 2021 · Artificial Intelligence

XGBoost Serving: An Open‑Source High‑Performance Inference System for GBDT and GBDT+FM Models

XGBoost Serving is an open-source, high-performance inference system built on TensorFlow Serving. It adds dedicated servables for pure GBDT, GBDT+FM binary-classification, and GBDT+FM multi-classification models, provides automatic version lifecycle management and gRPC/HTTP APIs, and delivers up to a 50% latency reduction; it is now available on GitHub after successful deployment in iQIYI's recommendation platform.

Factorization Machines · GBDT · Machine Learning
12 min read