Running Qwen3.5 Locally: Step‑by‑Step Guide with Unsloth Dynamic Quantization

This article explains how to run the 397B‑parameter Qwen3.5 model on a Mac using Unsloth Dynamic 2.0 quantization (2‑bit, 3‑bit, or 4‑bit). It outlines the hardware requirements, provides the llama.cpp compilation and download commands, shows how to launch inference in thinking and non‑thinking modes, and compares several deployment options: llama‑server, Transformers, SGLang/vLLM, and MLX.

Dynamic Quantization · GGUF · LLM deployment

14 min read
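The bit width is what drives the hardware requirements, and a rough way to see this is the nominal weight footprint: parameters × bits ÷ 8 bytes. The sketch below is a back‑of‑the‑envelope estimate under stated assumptions, not figures from the article. It assumes every weight uses the same bit width, whereas dynamic schemes such as Unsloth's keep some layers at higher precision. It also ignores quantization scales, the KV cache, and runtime overhead, so real GGUF files and memory use will typically be larger. The helper name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Nominal weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: a uniform bit width for every weight. Ignores per-block
    quantization scales, layers kept at higher precision, the KV cache,
    and runtime overhead, so it is a lower-bound style estimate.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 397e9  # 397B parameters, as stated for Qwen3.5
    for bits in (2, 3, 4):
        print(f"{bits}-bit: ~{weight_size_gb(params, bits):.1f} GB of weights")
```

Comparing the three printed figures against a machine's unified memory gives a first sense of which quantization level is even worth downloading.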

Data Thinking Notes

Feb 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 671B Model Locally with Ollama: A Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the 671‑billion‑parameter DeepSeek R1 model using Ollama, covering model selection, hardware requirements, dynamic quantization, detailed installation steps, performance observations, and practical recommendations for consumer‑grade hardware.

AI model optimization · DeepSeek · Dynamic Quantization

14 min read

Top Architect

Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Dynamic Quantization

16 min read

Architecture Digest

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to deploy the full 671B DeepSeek R1 model on local hardware with Ollama, using dynamic quantization to shrink the model. It also covers hardware requirements, step‑by‑step installation, configuration, performance observations, and practical recommendations.

DeepSeek · Dynamic Quantization · GPU

12 min read

DaTaobao Tech

Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details dynamic quantization in MNN’s CPU backend for Transformer‑type models. It describes runtime int8 conversion, block‑wise matrix‑multiply optimizations using the ARM SMMLA/SDOT and AVX‑512 VNNI instructions, and weight‑group and batch‑wise quantization techniques, and it reports speed‑ups of up to three times on Snapdragon 8 Gen 3.

CPU optimization · Dynamic Quantization · INT8

19 min read
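The core idea of dynamic int8 quantization for a matrix multiply can be sketched generically. Weights are quantized once ahead of time with one scale per output row. Activations are quantized at run time with one scale per input row. Dot products are accumulated in integers and then rescaled to float. This is plain Python for illustration only: it is not MNN's implementation, it uses none of the SMMLA/SDOT or AVX‑512 VNNI kernels, the per‑row granularity is a simplifying assumption (the article's weight‑group scheme may use finer groups), and the names `quantize_rows` and `dynamic_int8_matmul` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_rows(m: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric per-row int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    q_rows, scales = [], []
    for row in m:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero on all-zero rows
        # Clamp guards against float rounding pushing a value just past 127.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dynamic_int8_matmul(x: Matrix, w_q: IntMatrix, w_scales: List[float]) -> Matrix:
    """Approximate x @ W^T, quantizing the activations x on the fly.

    x:        float activations, shape [batch, k]
    w_q:      int8 weights quantized offline, shape [n, k] (one row per output channel)
    w_scales: one float scale per weight row
    """
    x_q, x_scales = quantize_rows(x)  # the "dynamic" step: one scale per input row
    out = []
    for xi, sx in zip(x_q, x_scales):
        row = []
        for wj, sw in zip(w_q, w_scales):
            acc = sum(a * b for a, b in zip(xi, wj))  # integer accumulation (int32 on real hardware)
            row.append(acc * sx * sw)  # rescale the integer result back to float
        out.append(row)
    return out


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
    w_q, w_s = quantize_rows(weights)  # done once, ahead of inference
    activations = [[1.0, 2.0, 3.0], [-0.5, 0.1, 0.4]]
    approx = dynamic_int8_matmul(activations, w_q, w_s)
    exact = [[sum(a * b for a, b in zip(xr, wr)) for wr in weights] for xr in activations]
    print("int8 approx:", approx)
    print("float exact:", exact)
```

Giving each activation row its own scale, rather than one scale for the whole batch, keeps an outlier in one row from coarsening the resolution of every other row.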