How We Deployed an Office AI System with 8 NVIDIA A800 GPUs: Model Selection Guide
The author details the deployment of an office AI system on an internal network using eight NVIDIA A800 GPUs, explaining model choices, inference engines, GPU allocations, compatibility issues, and presenting the overall architecture diagram.
Background and Constraints
The client’s environment is internal‑network only and cannot run the newest large models. Eight NVIDIA A800 GPUs (50 GB each) were allocated to satisfy office AI workloads.
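A rough memory estimate helps explain why card size constrains model choice here. The sketch below is a back-of-the-envelope calculation, not a figure from the source. It assumes 16-bit weights (2 bytes per parameter) and a hypothetical 90% usable fraction per GPU, and it ignores KV cache and activations, so the results are lower bounds. The function names are illustrative.

```python
import math

def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billion * bytes_per_param

def min_gpus_for_weights(params_billion: float, gpu_memory_gb: float,
                         bytes_per_param: float = 2.0,
                         usable_fraction: float = 0.9) -> int:
    """Lower bound on GPUs needed just to hold the weights."""
    needed = weight_memory_gb(params_billion, bytes_per_param)
    return math.ceil(needed / (gpu_memory_gb * usable_fraction))

# Assumed parameter counts, read off the model names (35B, 27B, 32B).
for name, size_b in [("Qwen3.5-35B-A3B", 35), ("qwen3.5-27b", 27), ("qwen3-32b", 32)]:
    print(name, weight_memory_gb(size_b), "GB ->",
          min_gpus_for_weights(size_b, gpu_memory_gb=50), "GPU(s) minimum")
```

Under these assumptions, none of the three language models fits on a single 50 GB card without quantization. This is consistent with the single-GPU start failure noted in the deployment list.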
Model Deployment Table
1. Engine: vLLM. Model: Qwen3.5-35B-A3B (latest large inference model). GPU 0 (A800), 50 GB dedicated memory; port 8001; Docker launch. Note: the GPUs in this array do not support GPU pooling under the latest vLLM, so only single-GPU mode is available, and the model fails to start on a single GPU.
2. Engine: vLLM. Model: qwen3.5-27b/qwen3.5-27b-1 (smaller recent inference model). GPUs 0-1, 50 GB each; ports 8005 and 8007; Docker launch. Note: version incompatibility.
3. Engine: vLLM 0.8.4. Model: qwen3-32b (general inference model). GPUs 0-3 pooled, for a total of 168 GB dedicated + 32 GB shared memory; port 8004; Docker launch. Note: GPU pooling enabled, safety protection applied.
4. Engine: vLLM 0.8.4. Model: bge-m3 (embedding). GPU 4, 40 GB memory; port 8002; Docker launch. Note: safety protection applied.
5. Engine: vLLM 0.8.4. Model: bge-reranker-v2-m3 (reranking). GPU 5, 40 GB memory; port 8003; Docker launch. Note: safety protection applied.
6. Engine: vLLM 0.8.4. Task: speech-to-text. GPU 6, 40 GB memory; port 8006; Docker launch. Note: safety protection applied.
7. Engine: Ollama. Task: image-text OCR (testing). GPU 7; port 11434; Docker launch. Note: occasional stalls.
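The list above can be sanity-checked as data. The following sketch transcribes the rows into a Python structure and reports any GPU or port claimed by more than one row. The structure and the helper name `find_shared` are illustrative, not part of the original deployment.

```python
from collections import defaultdict

# Rows transcribed from the list above: (row, model_or_task, gpu_ids, ports)
DEPLOYMENTS = [
    (1, "Qwen3.5-35B-A3B", [0], [8001]),
    (2, "qwen3.5-27b", [0, 1], [8005, 8007]),
    (3, "qwen3-32b", [0, 1, 2, 3], [8004]),
    (4, "bge-m3", [4], [8002]),
    (5, "bge-reranker-v2-m3", [5], [8003]),
    (6, "speech-to-text", [6], [8006]),
    (7, "image-text OCR", [7], [11434]),
]

def find_shared(deployments, field_index):
    """Map each GPU id or port to the rows that claim it, keeping only shared ones."""
    owners = defaultdict(list)
    for row in deployments:
        for value in row[field_index]:
            owners[value].append(row[0])
    return {value: rows for value, rows in owners.items() if len(rows) > 1}

gpu_conflicts = find_shared(DEPLOYMENTS, 2)   # index 2 holds the GPU ids
port_conflicts = find_shared(DEPLOYMENTS, 3)  # index 3 holds the ports

print("GPUs claimed by several rows:", gpu_conflicts)
print("Ports claimed by several rows:", port_conflicts)
```

Every port is unique, but GPUs 0 and 1 appear in rows 1 to 3. Since rows 1 and 2 carry start-failure and incompatibility notes, one plausible reading is that they were trials on GPUs that row 3 ended up occupying. The source does not say so explicitly.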
Architecture Diagram
Each service runs in its own Docker container and is assigned GPUs according to their memory limits. Together, the containers provide a functional office AI system despite the older infrastructure.
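As an illustration of the one-container-per-service pattern, the sketch below builds a `docker run` command that pins a vLLM server to specific GPUs and publishes it on a host port. The source does not show the actual launch commands, so everything here is an assumption: the image tag `vllm/vllm-openai:v0.8.4`, the local model directory `/models/qwen3-32b`, the container name, and the choice of tensor parallelism for pooling. On an internal-only network the image and weights would need to be loaded locally beforehand.

```python
import shlex

def vllm_docker_command(name, model_path, gpu_ids, host_port,
                        image="vllm/vllm-openai:v0.8.4"):
    """Build a `docker run` command for one vLLM service (illustrative only)."""
    devices = ",".join(str(g) for g in gpu_ids)
    args = [
        "docker", "run", "-d",
        "--name", name,
        # Docker expects the device list wrapped in literal double quotes.
        "--gpus", f'"device={devices}"',
        "--ipc=host",                        # shared memory for multi-GPU workers
        "-p", f"{host_port}:8000",           # the vLLM server listens on 8000 by default
        "-v", f"{model_path}:/model:ro",     # hypothetical local weights directory
        image,
        "--model", "/model",
        "--served-model-name", name,
        "--tensor-parallel-size", str(len(gpu_ids)),
    ]
    return shlex.join(args)

# Row 3 of the deployment list: qwen3-32b pooled across GPUs 0-3 on port 8004.
print(vllm_docker_command("qwen3-32b", "/models/qwen3-32b", [0, 1, 2, 3], 8004))
```

Generating the commands from one function keeps the GPU pinning and port mapping consistent across services. The single-GPU embedding and reranking services would use the same call with a one-element GPU list.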
AI Large-Model Wave and Transformation Guide
Focuses on the latest large-model trends, applications, technical architectures, and related information.
