Deploying a Private LLM Knowledge Base on a MacBook
The guide walks through installing and quantizing the open‑source ChatGLM3‑6B model and the m3e‑base embedder on a MacBook, wrapping them in a FastAPI service with OpenAI‑compatible endpoints, routing requests through a One‑API gateway, storing metadata in MongoDB and vectors in PostgreSQL with pgvector, deploying FastGPT for retrieval‑augmented generation (RAG), and ingesting data. It demonstrates 5–7 second response times and closes with directions for future improvement.
This article describes how to set up a private large‑language‑model (LLM) knowledge‑base solution on a MacBook to assist personal knowledge management.
It first explains the motivation for a local deployment (data security, flexibility) and then outlines the overall architecture, which combines the Chinese open‑source model ChatGLM3‑6B, the embedding model m3e‑base, a FastAPI wrapper exposing OpenAI‑compatible endpoints, the One‑API gateway, and the FastGPT knowledge‑base platform.
Model preparation: download ChatGLM3‑6B from Hugging Face or ModelScope, quantize it with chatglm.cpp (8‑bit or lower), and verify the quantized model interactively with ./build/bin/main -m chatglm3-ggml-q8.bin -i. Also download the m3e‑base embedding model.
Model API service: a FastAPI application (see code excerpt) provides /v1/chat/completions and /v1/embeddings endpoints, using chatglm_cpp.Pipeline for inference and SentenceTransformer for embeddings. The service is run with uvicorn chatglm_cpp.openai_api:app --host 127.0.0.1 --port 8000.
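At its core, the wrapper translates between the OpenAI wire format and the local pipeline. A minimal sketch of that translation (the make_chat_completion helper and the inlined reply are illustrative, not the guide's actual code) builds a response shaped like OpenAI's /v1/chat/completions payload:

```python
import time
import uuid

def make_chat_completion(model: str, reply_text: str) -> dict:
    """Wrap locally generated text in an OpenAI-style
    /v1/chat/completions response body."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }

# What a FastAPI handler would return after calling the local pipeline
resp = make_chat_completion("chatglm3-6b", "你好！有什么可以帮你？")
print(resp["choices"][0]["message"]["content"])
```

Because the body matches what OpenAI clients expect, existing SDKs can point at the local service unchanged.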
One‑API gateway: the open‑source One‑API project (Go/Node) is compiled and configured to route requests to the local ChatGLM3‑6B and m3e‑base services, providing unified API management and token accounting.
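Conceptually, the gateway keeps a routing table from model name to local upstream, so clients talk to one endpoint regardless of which model serves the request. A toy sketch of that mapping (the UPSTREAMS table and route helper are hypothetical; the port matches the FastAPI service above):

```python
# Toy model-name -> upstream routing table, mimicking what the
# One-API gateway does for this setup (port 8000 is the local
# FastAPI wrapper started above).
UPSTREAMS = {
    "chatglm3-6b": "http://127.0.0.1:8000/v1/chat/completions",
    "m3e-base": "http://127.0.0.1:8000/v1/embeddings",
}

def route(model: str) -> str:
    """Return the upstream URL that should serve the requested model."""
    try:
        return UPSTREAMS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model}")

print(route("chatglm3-6b"))
```

The real gateway adds key management and token accounting on top of this lookup, but the dispatch idea is the same.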
Knowledge‑base backend: MongoDB stores metadata while PostgreSQL with the pgvector extension stores the vector embeddings. Installation commands for both databases on macOS are provided.
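pgvector's job here is nearest-neighbour search over the stored embeddings; its <=> operator returns cosine distance, which can be sketched in plain Python:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as computed by pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Identical directions -> distance 0; orthogonal -> distance 1
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

In SQL, retrieval then amounts to ordering rows by this distance against the query embedding and taking the closest few.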
FastGPT deployment: FastGPT (an open‑source RAG platform) is cloned and its environment variables are pointed at the One‑API endpoint, MongoDB, and PostgreSQL. After installing the Node.js dependencies (pnpm i) and launching the app (pnpm dev), the web UI is accessible at http://localhost:3000.
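As a rough illustration, the wiring amounts to a handful of .env entries. The variable names below follow FastGPT's configuration templates at the time of writing, and the One‑API port, key, and credentials are placeholders; verify every name against the .env.template shipped with your FastGPT version.

```shell
# Assumed FastGPT .env fragment -- names and values are placeholders,
# check the project's own .env.template for your version.
OPENAI_BASE_URL=http://localhost:3001/v1      # One-API gateway (port is a placeholder)
CHAT_API_KEY=sk-xxxx                          # key issued by One-API
MONGODB_URI=mongodb://localhost:27017/fastgpt # metadata store
PG_URL=postgresql://postgres:password@localhost:5432/fastgpt  # pgvector store
```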
Knowledge ingestion: the guide shows how to create a knowledge base, select m3e‑base as the index model, and import data via manual entry, CSV upload, or the API. Example queries demonstrate successful retrieval with source citations.
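Retrieval quality depends heavily on how imported documents are split before embedding. A minimal fixed-size chunker with overlap (sizes here are illustrative; FastGPT applies its own splitting logic) can be sketched as:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content
    straddling a boundary appears in two adjacent chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# 500 characters, 200-char chunks, 50-char overlap -> 3 chunks
chunks = chunk_text("a" * 500, size=200, overlap=50)
print(len(chunks))  # 3
```

The overlap is what lets a sentence cut at a chunk boundary still be retrieved whole from at least one chunk.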
Validation and results : sample chat interactions with ChatGLM3 show response times around 5–7 seconds and memory usage of ~3.8 GB on a 16 GB MacBook.
Future work : suggestions include improving chunking and embedding strategies, advanced prompt engineering, workflow orchestration, scaling hardware, and applying the system to real business problems.
DaTaobao Tech
Official account of DaTaobao Technology