Tag

LLM deployment

1 view collected for this tag.

Alibaba Cloud Infrastructure
Apr 30, 2025 · Cloud Native

Deploying Qwen3-8B Large Language Model on Alibaba Cloud ACK with ACS GPU Acceleration

This guide explains how to prepare, deploy, and verify the Qwen3‑8B large language model on an Alibaba Cloud Container Service for Kubernetes (ACK) cluster using ACS GPU resources, covering prerequisites, model download, storage setup, Kubernetes manifests, and testing the inference service.

ACS · ACK · Cloud Native
0 likes · 8 min read
Data Thinking Notes
Feb 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 671B Model Locally with Ollama: A Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the 671‑billion‑parameter DeepSeek R1 model using Ollama, covering model selection, hardware requirements, dynamic quantization, detailed installation steps, performance observations, and practical recommendations for consumer‑grade hardware.

AI model optimization · DeepSeek · GPU inference
0 likes · 14 min read
Architecture Digest
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to deploy the full 671B DeepSeek R1 model on local hardware using Ollama, leveraging dynamic quantization to shrink the model's footprint, and details hardware requirements, step‑by‑step installation, configuration, performance observations, and practical recommendations.

DeepSeek · GPU · LLM deployment
0 likes · 12 min read
DataFunTalk
Jan 4, 2024 · Artificial Intelligence

Using OpenLLM to Quickly Build and Deploy Large Language Model Applications

This presentation explains how OpenLLM, an open‑source LLM framework, together with BentoML, addresses the challenges of deploying large language models by offering model switching, memory optimizations, multi‑GPU support, observability, and easy containerized deployment for production AI applications.

AI optimization · BentoML · LLM deployment
0 likes · 18 min read