Tagged articles

151 articles

May 5, 2026 · Artificial Intelligence

vLLM 0.20.1 Fixes Instability and Speed Issues for DeepSeek V4

The vLLM 0.20.1 patch, released shortly after 0.20.0, consolidates stability fixes and performance optimizations for DeepSeek V4, adds several bug fixes, updates installation instructions, and provides targeted upgrade recommendations for different user scenarios.

Bug Fix · DeepSeek V4 · GPU inference

0 likes · 9 min read

Architect

May 3, 2026 · Artificial Intelligence

Why the Same Model Feels Different in Coding Agents: Model Sets the Capability Ceiling, Harness Sets the Production Floor

The article examines how a model defines an agent’s ultimate capabilities while the harness determines its production reliability, detailing continuous evaluation, context‑budgeting, tool‑error classification, multi‑model migration, and SRE‑style engineering practices needed to keep AI coding agents stable and performant.

AI agents · Agent Harness · Context Management

0 likes · 31 min read

Lao Guo's Learning Space

May 3, 2026 · Artificial Intelligence

2026 Enterprise Guide to Large Model Fine‑Tuning: Choosing, Training, and Deploying

This comprehensive guide explains why enterprises should fine‑tune large language models instead of using raw APIs or RAG, compares six fine‑tuning techniques (Full, LoRA, QLoRA, AdaLoRA, DoRA, Prompt‑Tuning), evaluates popular toolchains, outlines a step‑by‑step workflow, presents cost analyses, real‑world case studies, and practical best‑practice recommendations for 2026.

LoRA · Model Deployment · QLoRA

0 likes · 18 min read

PMTalk Product Manager Community

Apr 30, 2026 · Artificial Intelligence

How a Large AI Model Is Trained: Insights from a High‑Earning AI Product Manager

The article walks through model training, validation, ensemble learning, and deployment from an AI product manager’s viewpoint, using a churn‑prediction case to illustrate decision boundaries, metric choices, industry‑specific algorithm trade‑offs, cost considerations, and practical serving options.

AI product management · Model Deployment · ensemble learning

0 likes · 6 min read

AI Explorer

Apr 29, 2026 · Artificial Intelligence

Open-Source ML Intern: One-Click Paper Reading, Training & Deployment – Hype or Real Deal?

ml‑intern, an open‑source AI agent from Hugging Face, automates the full ML workflow—reading papers, generating code, training and deploying models—using an asynchronous event‑driven loop with submission and event queues, supporting interactive and headless modes, Slack notifications, and multiple LLM back‑ends.

AI agent · Hugging Face · LLM

0 likes · 5 min read

Sohu Tech Products

Apr 15, 2026 · Artificial Intelligence

Why Harness Engineering Is the Next Evolution in AI System Design

This tutorial explains the three-stage evolution from Prompt Engineering to Context Engineering and finally Harness Engineering, detailing their motivations, core components, practical implementations, and why stable, end‑to‑end AI agents require a full harness to manage tasks, context, tools, execution, state, and error recovery.

AI Systems · Agent Design · Context Engineering

0 likes · 31 min read

SuanNi

Apr 13, 2026 · Artificial Intelligence

Deploy Qwen3 8B Model with vLLM: Step‑by‑Step Guide for Remote Inference

This guide walks you through deploying Alibaba’s open‑source Qwen3 8B model on the SumW platform using vLLM, covering environment activation, server launch with OpenAI‑compatible parameters, SSH tunneling for remote access, and Python client calls, while highlighting key configuration tips and common pitfalls.

Model Deployment · OpenAI API · Python SDK

0 likes · 6 min read

AI Large-Model Wave and Transformation Guide

Apr 11, 2026 · Artificial Intelligence

How to Engineer Reliable AI Models: From Infrastructure to Deployment

This article presents a comprehensive, step‑by‑step framework for turning laboratory AI models into production‑ready systems, covering capability mapping, technology stack choices, model selection, prompt engineering, data pipelines, training strategies, and cross‑team collaboration to ensure stability, observability, and trustworthiness.

AI model engineering · Model Deployment · Model Monitoring

0 likes · 14 min read

Baidu Intelligent Cloud Tech Hub

Apr 8, 2026 · Artificial Intelligence

Unlocking 8‑Hour Autonomous Coding: GLM‑5.1’s Leap with Kunlun XPU

The open‑source GLM‑5.1 model, adapted to Baidu Baige's Kunlun XPU via the vLLM‑Kunlun Plugin, delivers record‑breaking SWE‑bench scores, eight‑hour autonomous coding, long‑context handling up to 64K tokens, and scalable deployment across tens of thousands of chips, showcasing end‑to‑end AI acceleration.

GLM-5.1 · Kunlun XPU · Model Deployment

0 likes · 8 min read

Old Zhang's AI Learning

Apr 8, 2026 · Artificial Intelligence

GLM‑5.1 Outperforms Claude Opus in Benchmarks – The Open‑Source LLM’s Edge

GLM‑5.1, the new 744B‑parameter open‑source LLM from Zhipu, tops SWE‑Bench Pro with a score of 58.4, outpacing Claude Opus, GPT‑5.4 and Gemini, excels at long‑duration autonomous tasks, yet shows gaps in single‑turn generation and pure mathematical reasoning.

Agent Programming · GLM-5.1 · Model Deployment

0 likes · 22 min read

IT Services Circle

Apr 5, 2026 · Artificial Intelligence

Why Harness Engineering Is the Next Frontier in AI System Design

This article explains how AI engineering has evolved from Prompt Engineering to Context Engineering and now Harness Engineering, detailing each stage's challenges, core techniques, and real‑world practices that turn large language models into reliable, long‑running production systems.

Context Engineering · Harness Engineering · LLM operations

0 likes · 32 min read

Advanced AI Application Practice

Mar 24, 2026 · Artificial Intelligence

Connecting OpenClaw to Ollama: Step‑by‑Step Guide and Common Pitfalls

This article explains why Ollama has become popular for local LLM deployment, outlines its core features, and provides a detailed, step‑by‑step tutorial for integrating OpenClaw with Ollama—including model selection, configuration, troubleshooting common errors, and advanced tips for customization and multi‑model switching.

AI · Model Deployment · Ollama

0 likes · 9 min read

Old Zhang's AI Learning

Mar 5, 2026 · Artificial Intelligence

Timber: The “Ollama” for Traditional Machine Learning Models

Timber is a multi‑pass compiler that transforms classic ML models such as XGBoost and LightGBM into zero‑dependency C99 binaries, offering microsecond‑level inference latency, HTTP‑compatible serving, and substantial performance gains over Python runtimes, making it ideal for high‑throughput, low‑latency production scenarios.

LightGBM · ML compiler · Model Deployment

0 likes · 8 min read

Alibaba Cloud Native

Mar 3, 2026 · Cloud Native

Deploy Alibaba's Qwen3.5-397B Model in Minutes with Serverless Function Compute

This guide explains how to quickly deploy the new Qwen3.5-397B-A17B open‑source large model using Alibaba Cloud Function Compute's serverless GPU service, covering model features, deployment steps, required commands, and performance benefits.

AI · Cloud Native · Function Compute

0 likes · 5 min read

Woodpecker Software Testing

Mar 3, 2026 · Artificial Intelligence

How AI Transforms Performance Testing: Essential Insights for Test Engineers

The article explains how AI-driven predictive modeling, intelligent load orchestration, and self‑healing bottleneck detection can dramatically improve performance testing efficiency, cutting defect detection time by 68% and resource consumption by 41%, and outlines practical stacks and common pitfalls.

AI · DevOps · Load Orchestration

0 likes · 8 min read

AIWalker

Feb 27, 2026 · Artificial Intelligence

YOLO26 Review: End-to-End, NMS‑Free Edge AI Boosts CPU Inference by 43%

This article analyzes YOLO26’s architecture redesign that eliminates NMS, removes DFL, introduces progressive loss balancing, STAL, and the MuSGD optimizer, achieving up to 43% faster CPU inference and simplifying deployment for edge vision tasks across detection, segmentation, classification, pose estimation, and OBB.

CPU inference · Model Deployment · NMS-free

0 likes · 13 min read

Baobao Algorithm Notes

Feb 25, 2026 · Artificial Intelligence

Exploring Qwen 3.5: Small‑Scale MoE Models, Architecture, and Deployment Guides

This article reviews the three open‑source Qwen 3.5 models—including a 35B MoE, a 122B MoE, and a 27B dense version—detailing their parameter layouts, core attention designs, context length, inference performance, hardware requirements, and provides step‑by‑step code examples for loading them with Hugging Face Transformers and vLLM.

AI · Large Language Model · MoE

0 likes · 10 min read

Baidu Intelligent Cloud Tech Hub

Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how the new GLM-5 large model is adapted to Baidu's Kunlun P800 XPU, detailing the async reinforcement learning framework Slime, optimization techniques like INT8 quantization and tensor‑parallelism, and provides step‑by‑step deployment commands using the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 Quantization

0 likes · 6 min read

Baidu Geek Talk

Dec 17, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, jointly released by Baidu Baige and Kunlun Chip, provides a high‑performance, zero‑intrusion solution for deploying open‑source large language models on domestic Kunlun XPU hardware, includes fused operators, precision‑validation and profiling tools, and supports over twenty mainstream and multimodal models.

Kunlun XPU · Model Deployment · Open Source

0 likes · 7 min read

Python Programming Learning Circle

Nov 3, 2025 · Artificial Intelligence

Boost Deep Learning Deployment on Windows with LabVIEW + Python

This article explains how to combine Python for training deep‑learning models with LabVIEW for rapid Windows‑based UI development and model deployment, showing step‑by‑step usage of LabVIEW's Python Node and array passing techniques.

AI · Graphical Programming · LabVIEW

0 likes · 4 min read

21CTO

Oct 6, 2025 · Artificial Intelligence

How to Become an AI Engineer: Skills, Workflow, and Career Path

This guide explains what AI engineering entails, outlines the end‑to‑end workflow from problem definition and data preparation through model development, deployment, and monitoring, and highlights the essential programming, cloud, and MLOps skills, career tracks, emerging trends, and salary outlook for aspiring AI engineers.

AI Engineering · Cloud Computing · MLOps

0 likes · 11 min read

DevOps Cloud Academy

Sep 25, 2025 · Artificial Intelligence

How to Build Scalable MLOps Infrastructure for Enterprise AI Success

This article explains what MLOps is, why a robust MLOps framework is essential for businesses, outlines its core components, compares MLOps with AIOps, details the benefits of investing in MLOps, and provides a step‑by‑step guide to designing enterprise‑grade AI MLOps infrastructure.

AI infrastructure · Governance · MLOps

0 likes · 17 min read

DevOps Cloud Academy

Sep 21, 2025 · Artificial Intelligence

How to Deploy Machine Learning Models Efficiently: A Complete Guide

This guide explains what model deployment is, why it matters, the various deployment types, readiness criteria, best practices, common challenges, real‑world case studies, and the most popular tools and platforms for deploying machine learning models in production.

AI · MLOps · Model Deployment

0 likes · 20 min read

DaTaobao Tech

Sep 17, 2025 · Artificial Intelligence

Boosting ID Card Photo Quality with Multimodal AI: A Practical Deployment Guide

This article details how a multimodal AI model was integrated to detect and improve ID card photo quality, covering common image issues, differences between OCR and multimodal extraction, deployment strategies, performance metrics, cost estimation, and the resulting business and technical benefits.

ID verification · Model Deployment · OCR

0 likes · 13 min read

Alibaba Cloud Developer

Sep 3, 2025 · Artificial Intelligence

Deploy OpenAI’s gpt-oss-20b on Alibaba Cloud in 10 Minutes – A No‑Code Guide

This step‑by‑step tutorial shows how to quickly launch OpenAI’s open‑source gpt‑oss‑20b model on Alibaba Cloud PAI without writing code, configure the deployment, and start chatting with the model using the Cherry Studio client.

Alibaba Cloud · Cherry Studio · GPT-OSS

0 likes · 5 min read

Dunmao Tech Hub

Sep 1, 2025 · Artificial Intelligence

Deploy DeepSeek‑r1 Locally with a One‑Click Ollama Script

This guide walks you through a Bash script that automatically checks for Ollama, installs it if missing, lets you choose a DeepSeek‑r1 model size, starts the Ollama service, and runs the selected model locally, complete with usage examples and a token‑cost note.

AI · DeepSeek · Model Deployment

0 likes · 7 min read

DataFunTalk

Sep 1, 2025 · Artificial Intelligence

Unlocking 560B‑Parameter AI: Inside LongCat‑Flash‑Chat’s Zero‑Computation MoE

LongCat‑Flash‑Chat, a 560‑billion‑parameter Mixture‑of‑Experts model with Zero‑Computation Experts, delivers top‑tier benchmark scores and fast inference while activating only a fraction of its parameters, and is fully open‑sourced with easy deployment scripts.

Artificial Intelligence · Large Language Model · Mixture of Experts

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Aug 2, 2025 · Artificial Intelligence

Deploy and Use Qwen3‑Coder on Alibaba Cloud PAI for AI‑Powered Coding

This guide explains how to deploy the open‑source Qwen3‑Coder model on Alibaba Cloud's PAI platform, use the interactive PAI‑DSW environment, run code snippets, and generate notebook tutorials with the model's agentic CLI, covering both enterprise and individual developer scenarios.

AI coding model · Agentic CLI · Alibaba Cloud

0 likes · 7 min read

Alibaba Cloud Developer

Jul 17, 2025 · Artificial Intelligence

How to Build a House Price Prediction Model with Python: A Step‑by‑Step Guide

This tutorial walks developers through the complete workflow of building a house‑price regression model—from problem definition, data collection and preprocessing, feature engineering, and model selection, to training, hyper‑parameter tuning, evaluation, optimization, deployment as a Flask service, and ongoing monitoring—using Python, pandas, scikit‑learn, and visualisation libraries.

Machine Learning · Model Deployment · Python

0 likes · 29 min read

DataFunSummit

Jun 18, 2025 · Artificial Intelligence

How to Upload, Test, and Deploy MiniLM on Modelers.cn: A Step‑by‑Step Guide

This article walks through uploading a MiniLM model to the Modelers.cn community, explains why testing is essential, demonstrates both usability and local tests with openMind, and provides complete Python code for classification and simple question‑answering, enabling developers to quickly deploy and evaluate MiniLM in practice.

MiniLM · Model Deployment · NLP

0 likes · 9 min read

DaTaobao Tech

Jun 4, 2025 · Artificial Intelligence

Understanding Large Language Model Architecture, Parameters, Memory, Storage, and Fine‑Tuning Techniques

This article provides a comprehensive overview of large language models (LLMs), covering their transformer architecture, parameter counts, GPU memory and storage requirements, and detailed fine‑tuning methods such as prompt engineering, data construction, LoRA, PEFT, RLHF, and DPO, along with practical deployment and inference acceleration strategies.

DPO · LLM · LoRA

0 likes · 17 min read

Architect

May 31, 2025 · Artificial Intelligence

Edge Intelligence Implementation in the Vivo Official App: Architecture, Feature Engineering, and Model Deployment

The article details how edge intelligence is applied to the Vivo official app to improve product recommendation on the smart‑hardware floor by abstracting the problem, designing feature engineering pipelines, training TensorFlow models, converting them to TFLite, and deploying inference on mobile devices, while also covering monitoring and performance considerations.

Edge AI · Model Deployment · TensorFlow Lite

0 likes · 19 min read

Alibaba Cloud Developer

May 28, 2025 · Artificial Intelligence

Unlocking LLM Fine‑Tuning: From Architecture to LoRA, DPO and Deployment

This article provides a comprehensive guide to large language model fine‑tuning, covering model architecture, parameter and memory calculations, prompt engineering, data construction, LoRA and PEFT techniques, reinforcement learning methods such as DPO, and practical deployment workflows on internal platforms.

Fine‑Tuning · LLM · LoRA

0 likes · 21 min read

vivo Internet Technology

May 21, 2025 · Artificial Intelligence

How Vivo’s App Leverages Edge AI to Personalize Product Recommendations

This article details how Vivo’s official app implements edge intelligence to dynamically rank and recommend hardware products on its homepage, covering problem abstraction, data collection, feature engineering, model design, TensorFlow‑Lite conversion, on‑device inference, and monitoring for a personalized user experience.

Android · Edge AI · Machine Learning

0 likes · 19 min read

Baidu Geek Talk

May 12, 2025 · Artificial Intelligence

One‑Click Deployment of Qwen3 Large Models on Baidu Baige AI Platform

This guide explains how to use Baidu Baige's AI heterogeneous computing platform to deploy the eight‑model Qwen3 family—including dense and MoE variants—via a one‑click process, covering resource configuration, inference acceleration options, and post‑deployment service access.

AI · Baidu Baige · Cloud AI

0 likes · 4 min read

Volcano Engine Developer Services

May 8, 2025 · Artificial Intelligence

Connect Your Self‑Hosted LLM to Volcengine Edge Gateway in 4 Simple Steps

This step‑by‑step tutorial explains how to add a self‑deployed large language model to Volcengine's Edge Large Model Gateway, configure a secure calling channel, bind it to a gateway access key, and integrate the provided sample code for seamless API access.

LLM · Model Deployment · api-integration

0 likes · 9 min read

Architect's Alchemy Furnace

Mar 31, 2025 · Artificial Intelligence

How to Deploy and Run Large Language Models with Xinference: A Step‑by‑Step Guide

Xinference is a powerful distributed inference framework that enables quick deployment and efficient serving of open‑source large language models via Docker or source installation, offering Web UI, CLI, and API interfaces with detailed setup, model launching, and Chatbox integration instructions.

API · Docker · LLM

0 likes · 11 min read

Alibaba Cloud Developer

Mar 25, 2025 · Cloud Computing

How to Deploy QwQ-32B Model on Alibaba Cloud Function Compute Using CAP

This guide walks you through deploying the open‑source QwQ‑32B model on Alibaba Cloud Function Compute with CAP, covering required services, step‑by‑step deployment, cost notes, accessing the demo UI, interacting with the model, scaling settings, and resource cleanup.

Alibaba Cloud · CAP · Function Compute

0 likes · 7 min read

Top Architect

Mar 22, 2025 · Artificial Intelligence

Spring AI: Intelligent Development Trend for Java Developers

The article introduces Spring AI as an emerging tool for Java developers, explains its background, goals, and core components such as data processing, model training, deployment and monitoring, showcases application scenarios like NLP, image processing, recommendation systems and predictive analytics, and also includes promotional offers for AI resources and community groups.

Artificial Intelligence · Java · Machine Learning

0 likes · 17 min read

Top Architecture Tech Stack

Mar 22, 2025 · Artificial Intelligence

Spring AI: An Overview of Intelligent Development Trends

This article introduces Spring AI, a Spring ecosystem module that simplifies building, training, and deploying AI applications for Java developers, covering its background, goals, core components such as data processing, model training, deployment, practical code examples, use cases, advantages, challenges, and future outlook.

Artificial Intelligence · Java · Machine Learning

0 likes · 12 min read

Architecture Digest

Mar 21, 2025 · Artificial Intelligence

Spring AI: Emerging Trends in Intelligent Development

This article introduces Spring AI, explains its background, goals, core components such as data processing, model training, deployment and monitoring, showcases practical use cases like NLP, image processing and recommendation systems, and discusses its advantages, challenges, and future outlook for Java developers.

Artificial Intelligence · Java · Machine Learning

0 likes · 16 min read

Alibaba Cloud Developer

Mar 12, 2025 · Artificial Intelligence

Deploy Alibaba Cloud’s QwQ-32B LLM: Benchmarks, Agent Features, and One‑Click Setup

This guide introduces Alibaba Cloud’s open‑source QwQ-32B large language model, highlights its superior benchmark performance over competing models, explains its integrated agent capabilities, and provides step‑by‑step instructions for one‑click deployment via the PAI‑Model Gallery.

Alibaba Cloud · LLM · Model Deployment

0 likes · 7 min read

Efficient Ops

Mar 9, 2025 · Artificial Intelligence

Essential LLMOps Tools: Build, Deploy, Monitor, and Manage Large Language Models

LLMOps, the end-to-end methodology for managing large language models, encompasses a curated set of development, deployment, monitoring, and local management tools—such as LangChain, vLLM, LangSmith, and Ollama—enabling practitioners to efficiently build, scale, and maintain AI applications.

AI development · LLMOps · Model Deployment

0 likes · 6 min read

Programmer DD

Mar 6, 2025 · Artificial Intelligence

Discover QwQ-32B: A 32B LLM Matching 671B DeepSeek‑R1 Performance

The QwQ-32B model, released by Alibaba Cloud, delivers DeepSeek‑R1‑level results with only 32 billion parameters, offers integrated agent capabilities, is open‑source under Apache 2.0, and can be quickly deployed locally via Ollama or integrated into Java applications using Spring AI.

AI inference · Large Language Model · Model Deployment

0 likes · 4 min read

Alibaba Cloud Native

Feb 19, 2025 · Cloud Native

Engineering Traffic Management for DeepSeek: Cloud‑Native Deployment Strategies

This article outlines practical cloud‑native deployment options for DeepSeek models, explains common engineering challenges such as traffic spikes, latency, security, quota control, and provides detailed AI‑gateway solutions—including fallback, content safety, API key management, gray‑release routing, caching, and observability—to ensure reliable large‑model applications.

DeepSeek · Model Deployment · traffic management

0 likes · 9 min read

Alibaba Cloud Native

Feb 18, 2025 · Cloud Native

Deploy DeepSeek‑R1 on Alibaba Cloud ACK One Using ACS GPU in Minutes

This guide shows how to overcome on‑premise compute limits by registering a local Kubernetes cluster to Alibaba Cloud ACK One, provisioning ACS GPU resources, and deploying the DeepSeek‑R1 inference model with the vLLM framework through a series of concrete commands and YAML configurations.

ACK One · ACS GPU · DeepSeek

0 likes · 15 min read

Architecture & Thinking

Feb 18, 2025 · Artificial Intelligence

Why Is DeepSeek Server Overloaded? Causes and Practical Workarounds

The article investigates why DeepSeek frequently returns a “server busy” message, analyzing factors such as sudden traffic spikes, compute and bandwidth limitations, security attacks, and maintenance policies, and then offers actionable solutions including query optimization, off‑peak usage, third‑party cloud platforms, and local deployment.

AI · DeepSeek · Model Deployment

0 likes · 10 min read

Architect

Feb 17, 2025 · Artificial Intelligence

Deploying DeepSeek R1 on Huawei Ascend 910B: Weight Conversion and Troubleshooting

This article details a step‑by‑step deployment of the DeepSeek R1 model on Huawei Ascend 910B NPUs, covering FP8‑to‑BF16 weight conversion, custom container image preparation, configuration of MindIE services, common pitfalls, and practical troubleshooting tips for large‑scale inference.

DeepSeek · Huawei Ascend · MindIE

0 likes · 8 min read

ByteDance Cloud Native

Feb 13, 2025 · Cloud Computing

Deploy the Full‑Size DeepSeek‑R1 Model on Volcengine Cloud with Terraform and Kubernetes

This guide walks you through two practical solutions for deploying the massive DeepSeek‑R1 model on Volcengine Cloud—one using Terraform for a quick two‑node GPU setup and another leveraging cloud‑native multi‑node distributed inference with Kubernetes, covering resource sizing, environment preparation, model download, monitoring, autoscaling, and storage acceleration.

AI · Kubernetes · Model Deployment

0 likes · 22 min read

Alibaba Cloud Infrastructure

Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek

0 likes · 14 min read

DeWu Technology

Feb 12, 2025 · Artificial Intelligence

Edge Intelligence for Intelligent Video Cover Recommendation

The article describes an edge‑based video‑cover recommendation system for DeWu that leverages the MNN SDK and a lightweight MobileNetV3 model, performing on‑device inference with quantization and parallel processing to automatically select high‑quality covers, achieving sub‑second latency and boosting click‑through rates by up to 18%.

Edge AI · Inference Optimization · Model Deployment

0 likes · 12 min read

Alibaba Cloud Native

Feb 10, 2025 · Cloud Native

How to Deploy DeepSeek‑R1‑Distill Models on Alibaba Cloud CAP (Ollama & Transformer)

This guide walks you through deploying various DeepSeek‑R1‑Distill models on Alibaba Cloud's Serverless AI platform CAP, covering supported models, deployment options (Ollama and Transformer), step‑by‑step template and model‑service setups, validation methods, and tips for adding custom models.

AI · CAP · DeepSeek

0 likes · 10 min read

Infra Learning Club

Feb 8, 2025 · Artificial Intelligence

Why People Pay for DeepSeek Installation Packages (and How to Install It Yourself)

The article explains that DeepSeek is an open‑source LLM that many sellers monetize by offering paid installation packages, outlines the model lineup and size options, and provides a step‑by‑step guide to install and run DeepSeek locally with Ollama and Open WebUI.

AI models · DeepSeek · LLM

0 likes · 7 min read

Tencent Cloud Developer

Feb 7, 2025 · Artificial Intelligence

Launch DeepSeek Models in Seconds with One‑Click Cloud Development

This guide shows how to start DeepSeek large‑language models on cnb.cool in just 5‑10 seconds without downloading, using a simple three‑step process that includes forking the repository, selecting a model branch, and running Ollama or Docker commands, plus options for long‑term cloud deployment.

AI · Cloud Native · DeepSeek

0 likes · 3 min read

Alibaba Cloud Developer

Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek Models on Alibaba Cloud PAI with One-Click: A Step-by-Step Guide

This tutorial shows how to log into Alibaba Cloud PAI, navigate to the Model Gallery, select a DeepSeek model such as the distilled DeepSeek‑R1‑Distill‑Qwen‑7B, and deploy it with a single click using vLLM or BladeLLM, providing endpoint and token details for immediate use.

AI · Alibaba Cloud · BladeLLM

0 likes · 3 min read

Huawei Cloud Developer Alliance

Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek‑V3 on Ascend: Step‑by‑Step Guide for Fast AI Inference

This guide walks developers through obtaining the DeepSeek‑V3 model on the Ascend community, converting weights for GPU and NPU, loading the appropriate MindIE Docker image, launching the container, and configuring service‑level parameters to achieve efficient, out‑of‑the‑box AI inference on Ascend hardware.

AI inference · Ascend · DeepSeek

0 likes · 4 min read

21CTO

Feb 4, 2025 · Artificial Intelligence

Run DeepSeek Locally with Ollama: A Complete Step‑by‑Step Guide

This guide walks you through installing Ollama, selecting the appropriate DeepSeek model, running it locally, and exploring integration options, highlighting the benefits of offline AI such as data privacy, faster performance, and zero subscription costs.

AI Tutorial · Artificial Intelligence · DeepSeek

0 likes · 7 min read

Tencent Tech

Feb 4, 2025 · Artificial Intelligence

Deploy and Test DeepSeek Large Language Models on Tencent Cloud TI in Minutes

This guide walks you through quickly deploying DeepSeek series models on the Tencent Cloud TI platform, covering model selection, resource planning, step‑by‑step service creation, free online trial, API testing via built‑in tools or curl, and managing inference services for both large and compact models.

AI inference · DeepSeek · Model Deployment

0 likes · 13 min read

JavaEdge

Feb 2, 2025 · Artificial Intelligence

Mastering LLMOps: From Model Deployment to Scalable AI Operations

This article explains LLMOps—its goals, core activities, benefits, best practices, and how using an LLMOps platform like Dify can dramatically cut development time, simplify prompt engineering, data preparation, monitoring, and deployment of large language models.

AI Operations · Data Management · LLMOps

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference · Alibaba Cloud · DeepSeek

0 likes · 6 min read

DataFunSummit

Jan 19, 2025 · Artificial Intelligence

Understanding MLOps and LMOps: Evolution, Engineering Practices, and Future Trends for Large Models

This article reviews the development of MLOps, introduces the emerging LMOps framework for large‑model engineering, outlines key architectural components, discusses current challenges and industry trends, and presents future directions and standardization efforts in AI operations.

AI Engineering · AI Ops · LMOps

0 likes · 18 min read

DevOps

Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM

0 likes · 16 min read

DeWu Technology

Dec 11, 2024 · Artificial Intelligence

MLOps Practices for Improving Order Fulfillment Timeliness

The supply‑chain team leveraged core MLOps practices—versioning, testing, automated reproducible pipelines, deployment monitoring, and documentation—to eliminate data leakage, ensure online consistency, and accelerate model upgrades, using traffic‑replay, FAAS‑based decoupling, and approval workflows, ultimately cutting order‑fulfillment times, reducing costs, and enabling business teams to adopt reliable AI models at scale.

MLOps · Model Deployment · automation

0 likes · 18 min read

Test Development Learning Exchange

Dec 5, 2024 · Artificial Intelligence

End-to-End House Prices Prediction Project: Data Collection, Preprocessing, Modeling, Evaluation, and Deployment with Python

This tutorial walks through a complete house price prediction project, covering data collection from Kaggle, preprocessing with pandas and scikit‑learn, model training using RandomForestRegressor, evaluation, and deployment of a Flask API for real‑time predictions, providing full code examples.

Flask · Machine Learning · Model Deployment

0 likes · 9 min read

Baidu Geek Talk

Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91 % recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Model Deployment · PP-ShiTuV2 · PaddleX

0 likes · 8 min read

Huolala Tech

Oct 24, 2024 · Artificial Intelligence

How Huolala’s Dolphin Platform Accelerates AI Model Delivery with Cloud‑Native Automation

This article describes how Huolala built a cloud‑native AI development platform called Dolphin to overcome low model delivery efficiency and poor compute‑resource utilization, detailing its architecture, one‑stop workflow, resource‑pooling, observability, and future roadmap for scaling AI across the company.

Cloud Native · Kubernetes · Model Deployment

0 likes · 10 min read

Baidu Geek Talk

Sep 23, 2024 · Artificial Intelligence

Intelligent Early Screening System for Malignant Skin Tumors Based on PaddleX Low‑Code AI

The Meikel Studio team created an intelligent early‑screening system for malignant skin tumors on the PaddleX low‑code AI platform, which automatically captures dermatoscopic images, segments lesions with the PP‑LiteSeg model, achieves high accuracy (mIoU 0.868) and rapid inference, and offers one‑click deployment via RESTful API to improve diagnosis efficiency and support future medical‑imaging applications.

AI segmentation · Low‑code · Model Deployment

0 likes · 9 min read

DeWu Technology

Aug 19, 2024 · Artificial Intelligence

Multi‑LoRA Deployment for Large Language Models: Concepts, Fine‑tuning, and Cost‑Effective Strategies

The article introduces a multi‑LoRA strategy that lets many scenario‑specific adapters share a single base LLM, dramatically cutting GPU usage and cost while preserving performance, and explains how to fine‑tune with LoRA, merge adapters, and serve them efficiently using vLLM.

LoRA · Model Deployment · fine-tuning

0 likes · 10 min read

Qunhe Technology Quality Tech

Aug 14, 2024 · Artificial Intelligence

Should Your Testing Team Build a Private LLM or Use RAG with a General Model?

This article compares the high costs and technical challenges of building a private large language model with the benefits, flexibility, and lower risk of using Retrieval‑Augmented Generation (RAG) on a general LLM, offering practical guidance for testing teams seeking AI assistance.

AI · Model Deployment · RAG

0 likes · 11 min read

58 Tech

Aug 7, 2024 · Artificial Intelligence

Bridging Compute and Applications: 58.com AI Lab’s Large‑Model Platform and AI Agent Solutions

In this article, 58.com AI Lab senior director Zhan Kunlin explains how the company built a multi‑layer AI platform, created a vertical large‑language model called LingXi, and developed an AI Agent system with RAG capabilities to accelerate practical AI applications across various business scenarios.

AI Platform · AI agents · Large Language Model

0 likes · 10 min read

DataFunTalk

Jun 21, 2024 · Artificial Intelligence

Fine‑tuning Large Language Models with Alibaba Cloud PAI: Practices, Techniques, and Deployment

This article introduces the Alibaba Cloud PAI platform for large language model (LLM) fine‑tuning, covering model‑training pipelines, performance‑cost trade‑offs, retrieval‑augmented generation, fine‑tuning methods such as full‑parameter, LoRA and QLoRA, model selection, data preparation, evaluation, and real‑world deployment examples.

AI Platform · LLM · Model Deployment

0 likes · 20 min read

Alibaba Cloud Infrastructure

Jun 12, 2024 · Artificial Intelligence

Deploy Llama‑2 on ACK with KServe, Triton, and TensorRT‑LLM – Step‑by‑Step Guide

This tutorial walks through deploying the Llama‑2‑7b‑hf model on Alibaba Cloud Kubernetes (ACK) using KServe, Triton Inference Server with the TensorRT‑LLM backend, covering prerequisites, model preparation, YAML configuration, PV/PVC setup, runtime creation, and troubleshooting steps.

AI inference · KServe · Kubernetes

0 likes · 13 min read

Baidu Tech Salon

Jun 7, 2024 · Artificial Intelligence

How AI Transforms Financial Report Extraction: From Layout Analysis to Table Recognition

This article examines the challenges of extracting data from complex financial reports and presents an AI‑driven solution that combines advanced layout analysis, table recognition, OCR, and large‑language‑model integration using Baidu’s PaddlePaddle low‑code platform, detailing model selection, training, performance tuning, and deployment.

AI · Document Extraction · Layout Analysis

0 likes · 11 min read

Sohu Tech Products

Jun 5, 2024 · Artificial Intelligence

How Treelite Supercharges Tree Model Inference by Up to 6×

This article introduces Treelite, an open‑source library that compiles XGBoost, LightGBM, and scikit‑learn tree models into optimized shared libraries, explains its branch‑prediction and comparison‑simplification techniques, and provides step‑by‑step Python examples showing significant inference speed gains across different batch sizes.

LightGBM · Model Deployment · Python

0 likes · 6 min read

DataFunSummit

May 10, 2024 · Artificial Intelligence

LLMOps: Definition, Fine‑tuning Techniques, Application Architecture, Challenges and Solutions

This article introduces LLMOps by defining large language model operations, explains the three stages of LLM development, details modern fine‑tuning methods such as PEFT, Adapter, Prefix, Prompt and LoRA, outlines the architecture for building LLM applications, discusses the main difficulties of agent‑based deployments, and presents practical solutions including Prompt IDE, low‑code deployment, monitoring and cost control.

AI Operations · LLMOps · Model Deployment

0 likes · 14 min read

Eric Tech Circle

Apr 18, 2024 · Artificial Intelligence

Hands‑On Review of LM Studio: Install, Run, and Evaluate Open‑Source LLMs on Windows

This article walks through installing LM Studio on a Windows PC, downloading models from Hugging Face, using the AI Chat interface (including a Codellama‑generated Snake game), measuring resource usage, exploring the built‑in OpenAI‑compatible API, and summarizing its strengths and limitations.

AI chat · Hugging Face · LM Studio

0 likes · 5 min read

360 Tech Engineering

Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.

AI · DeepSpeed · FastChat

0 likes · 10 min read

OPPO Kernel Craftsman

Mar 22, 2024 · Artificial Intelligence

InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide

This tutorial walks through fine‑tuning Shanghai AI Lab’s open‑source InternLM models with XTuner, explaining chat‑format conventions, loading and inference (including multimodal InternLM‑XComposer), dataset preparation, configuration sections, DeepSpeed acceleration, and memory‑efficient QLoRA details for 7‑B‑parameter chat models.

Chat Format · DeepSpeed · InternLM

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

Feb 29, 2024 · Artificial Intelligence

Deploy and Fine‑Tune Qwen1.5 LLM with Alibaba PAI‑QuickStart

This article introduces Alibaba Cloud's open‑source Qwen1.5 large language model series, highlights its multilingual support, human‑preference alignment, and long‑context capabilities, and provides step‑by‑step guidance on using PAI‑QuickStart for model deployment, fine‑tuning, and Python SDK integration.

Model Deployment · PAI-QuickStart · Qwen1.5

0 likes · 9 min read

DataFunSummit

Feb 25, 2024 · Artificial Intelligence

Tencent FinTech AI Development Platform: Architecture, Challenges, and Solutions

This article introduces Tencent FinTech’s AI development platform, outlining its business background and goals, the technical challenges encountered in feature engineering, model training, and inference stability, and the solutions adopted: a unified feature engine, a distributed training framework, and optimized deployment. It closes with future plans for large‑scale graph training and AutoML.

AI Platform · FinTech · Model Deployment

0 likes · 13 min read

21CTO

Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on notebooks, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI safety · Gemma · Google AI

0 likes · 6 min read

DataFunSummit

Feb 3, 2024 · Artificial Intelligence

Practical Application of Large Language Models in MaShang Consumer Finance: From Model Building to Deployment

This article details how MaShang Consumer Finance leverages large language models for sales, collection, and customer service, covering company background, AI research achievements, model training infrastructure, data‑quality and compliance challenges, prompt engineering, inference acceleration, evaluation methods, and lessons learned from real‑world deployment.

Compliance · Finance · LLM

0 likes · 21 min read

Alibaba Cloud Big Data AI Platform

Jan 12, 2024 · Artificial Intelligence

Deploy and Fine‑Tune Mixtral‑8x7B on Alibaba Cloud PAI: A Step‑by‑Step Guide

This guide introduces the open‑source Mixtral‑8x7B large language model, explains its architecture and performance, and provides detailed instructions for using Alibaba Cloud PAI‑QuickStart to deploy, invoke via API or SDK, and fine‑tune the model with LoRA on Lingjun GPU resources.

Alibaba Cloud PAI · Mixtral · Model Deployment

0 likes · 16 min read

Alibaba Cloud Native

Jan 6, 2024 · Cloud Computing

Deploy ModelScope Models to Alibaba Cloud Function Compute in 5 Minutes

This guide walks readers through using ModelScope’s SwingDeploy service to locate, configure, and instantly deploy open‑source AI models to Alibaba Cloud Function Compute, explaining the resources created, how to invoke the model via HTTP triggers, and how to optimize performance with provisioned instances, logging, and concurrency settings.

AI model serving · Alibaba Cloud · Function Compute

0 likes · 15 min read

Alibaba Cloud Native

Dec 27, 2023 · Cloud Computing

One‑Click Deployment of LLMs to Alibaba Cloud Function Compute with SwingDeploy

This guide explains how to quickly select a ModelScope open‑source LLM, deploy it to Alibaba Cloud Function Compute using the SwingDeploy one‑click feature, enable reserved idle billing, and evaluate the cost savings compared with traditional GPU provisioning.

Function Compute · GPU · LLM

0 likes · 11 min read

Rare Earth Juejin Tech Community

Dec 27, 2023 · Artificial Intelligence

Comprehensive Overview of Large Language Models: Capabilities, Limitations, Deployment, and Future Trends

This article provides a detailed examination of large language models, covering their underlying technologies, capabilities and constraints, model families, training processes, cloud and edge deployment challenges, agent architectures, and emerging trends, offering practical insights for developers, product managers, and researchers.

Artificial Intelligence · LLM · Model Deployment

0 likes · 43 min read

Baobao Algorithm Notes

Dec 1, 2023 · Operations

Deploy Hugging Face Transformers with One Click Using LMDeploy

This article explains how LMDeploy streamlines the deployment of Hugging Face transformer models by adding online conversion, offering an OpenAI‑compatible API server, a Gradio WebUI, and 4‑bit weight‑only quantization with AWQ, providing step‑by‑step commands, code examples, and performance insights.

AI inference · API Server · Hugging Face

0 likes · 9 min read

Ant R&D Efficiency

Oct 26, 2023 · Artificial Intelligence

TestAgent: Open-Source 7B LLM That Supercharges Automated Test Generation

TestAgent is an open-source 7B test-domain LLM that delivers multi-language test-case generation, automatic assert completion, and a rapid deployment framework, offering industry-leading pass@1 scores, a ChatBot UI, and detailed setup instructions for diverse hardware environments.

AI testing · Large Language Model · Model Deployment

0 likes · 8 min read

DaTaobao Tech

Oct 25, 2023 · Artificial Intelligence

Prompt Engineering, LLM Supervised Fine‑Tuning, and Mobile Tmall AI Assistant Application

The article explains prompt engineering techniques, supervised fine‑tuning of large language models, and their practical deployment in the Mobile Tmall AI shopping assistant, detailing ChatGPT’s generation steps, Transformer architecture, prompt clarity, delimiters, role‑play, few‑shot and chain‑of‑thought prompting, SFT versus pre‑training, LoRA adapters, data collection, Qwen‑14B training configuration, SDK‑based inference, and comprehensive evaluation.

AI assistant · LLM fine-tuning · Model Deployment

0 likes · 14 min read

Baidu Geek Talk

Oct 11, 2023 · Artificial Intelligence

How Baidu’s Qianfan 2.0 Supercharges Large‑Model Development and Deployment

The article reviews Baidu Cloud’s Qianfan 2.0 platform, detailing its expanded model catalog, dataset library, Chinese‑language enhancements, compression and speed gains, robust AI infrastructure, application templates, and end‑to‑end data‑labeling pipeline that together lower cost and accelerate large‑model adoption across industries.

AI Platform · Cloud AI · Model Deployment

0 likes · 14 min read

DataFunTalk

Sep 1, 2023 · Artificial Intelligence

Risk Control Model Construction for Online Small‑Loan Scenarios: Pre‑Loan, In‑Loan, Post‑Loan and Monitoring

This article explains how to build and deploy risk‑control models for online micro‑loans across pre‑loan, in‑loan and post‑loan stages, covering data ingestion, feature engineering, model scoring, decision flow, optimization attempts, and monitoring practices.

Credit Scoring · FinTech · Machine Learning

0 likes · 16 min read

JD Tech

Aug 4, 2023 · Artificial Intelligence

Deploying and Evaluating the Vicuna Open‑Source Large Language Model on a Single Machine

This article details a step‑by‑step guide to deploying the Vicuna open‑source LLM on a single server, covering model preparation, environment setup, dependency installation, GPU and CUDA configuration, inference commands, performance evaluation, and attempted fine‑tuning, while sharing practical observations and results.

Fine‑tuning · GPU · LLM

0 likes · 16 min read

360 Quality & Efficiency

Aug 4, 2023 · Artificial Intelligence

Machine Learning Model Testing Workflow and Best Practices

This article outlines the essential concepts, data preparation, model creation, training, deployment, and verification steps for testing machine‑learning models, highlighting dataset requirements, algorithm categories, framework choices, resource considerations, and provides a sample inference request.

AI · Machine Learning · Model Deployment

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Jul 25, 2023 · Artificial Intelligence

Fine‑Tune and Deploy Llama 2 on Alibaba Cloud PAI in Minutes

This guide walks you through using Meta's open‑source Llama 2 models on Alibaba Cloud's PAI platform, covering low‑code LoRA fine‑tuning, full‑parameter fine‑tuning with PAI‑DSW, and rapid WebUI deployment via PAI‑EAS, complete with step‑by‑step instructions, code snippets, and resource requirements.

AI · Alibaba Cloud · Llama2

0 likes · 16 min read

DataFunTalk

Jul 11, 2023 · Artificial Intelligence

Sunshine Insurance Group's Zhèngyán Large Model Open Platform: Architecture, Tools, and Business Applications

The article describes Sunshine Insurance Group's Zhèngyán Large Model Open Platform, detailing its three‑layer architecture, AutoTrain tool, self‑developed LLM, smart routing, plugin marketplace, intelligent review, and how these capabilities empower insurance marketing, sales, service, and management through AI‑driven solutions.

AI Platform · Insurance Technology · Model Deployment

0 likes · 13 min read

DataFunSummit

Jun 24, 2023 · Artificial Intelligence

From Model to Service: Alibaba Cloud Machine Learning PAI One‑Stop Model Development and Deployment Practice

This article presents an end‑to‑end overview of Alibaba Cloud’s Machine Learning PAI platform, detailing the three‑stage ML workflow, challenges in model development, the role of pre‑trained and open‑source models, PAI’s architecture, a hands‑on demo, and MLOps best practices for efficient model deployment.

Alibaba Cloud · MLOps · Model Deployment

0 likes · 11 min read

JD Retail Technology

May 18, 2023 · Artificial Intelligence

Local Deployment, Inference, and Fine‑tuning of the Vicuna‑7B Large Language Model

This article details the step‑by‑step process of preparing the environment, merging weights, installing dependencies, running inference, evaluating Vicuna‑7B against other models, and attempting fine‑tuning, while highlighting performance results, encountered issues, and future work for large language model deployment.

GPU · Large Language Model · Model Deployment

0 likes · 11 min read

HelloTech

Apr 19, 2023 · Cloud Native

How FaaS Transforms AI Platforms: Lessons from Haro’s Cloud‑Native Journey

The article analyzes the operational, stability, and cost challenges of Haro’s AI platform, explains why a serverless FaaS architecture—specifically Knative—was selected, and details the implementation steps, performance gains, and future scenarios for AI workloads.

AI Platform · Cloud Native · FaaS

0 likes · 8 min read

HelloTech

Apr 12, 2023 · Artificial Intelligence

Integrating Machine Learning Ranking into Elasticsearch: Architecture, Components, and Performance

The team embedded a full machine‑learning ranking pipeline as an Elasticsearch plug‑in—combining real‑time and offline feature stores, hot‑loadable model jars via Dragonfly, an MLeap execution engine, and a DSL for feature definition—replacing the coarse‑ranking logistic‑regression with a tree model that adds ~10 ms latency but yields a 1.2 % AB‑test lift, while maintaining high throughput, low CPU usage, and supporting future batch deep‑learning rescoring.

Model Deployment · feature engineering · online prediction

0 likes · 16 min read

Tencent Advertising Technology

Mar 30, 2023 · Artificial Intelligence

Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent’s Taiji machine learning platform, a cloud‑native, distributed parameter‑server system, provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring, supporting massive models up to billions of parameters while improving efficiency, scalability, and resource management.

MLOps · Machine Learning Platform · Model Deployment

0 likes · 18 min read
