
Overview of Ollama: Architecture, Storage Structure, and Dialogue Process

This article provides a comprehensive overview of Ollama, a lightweight tool for running large language models, detailing its client‑server architecture, local storage layout, and the step‑by‑step workflow of user interactions with the model.


Ollama Overview

Ollama is a fast, easy‑to‑use tool for working with LLMs (large language models). By installing Ollama, users can interact with large language models without complex environment setup.

This article analyses the overall architecture of Ollama and explains the specific processing flow when a user converses with it.

Ollama Overall Architecture

Ollama adopts a classic client‑server (CS) architecture:

Client interacts with the user via the command line.

Server can be started through the command line, a desktop application based on the Electron framework, or Docker; all launch the same executable.

Client and server communicate over HTTP.
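Because the interface is plain HTTP, the bundled CLI is only one possible client. As a minimal sketch (assuming the server's default port 11434 and the /api/generate endpoint), the request for a single prompt could be built like this in Python; actually sending it requires a running server:

```python
import json

# Ollama's HTTP server listens on port 11434 by default; /api/generate is the
# single-prompt endpoint. This sketch only builds the request, it does not
# send it.
OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return the endpoint URL and the JSON request body for a generate call."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return f"{OLLAMA_URL}/api/generate", json.dumps(body).encode("utf-8")

url, body = build_generate_request("llama3.2", "Why is the sky blue?")
print(url)                        # http://localhost:11434/api/generate
print(json.loads(body)["model"])  # llama3.2
```

With a server running, the same body could be POSTed with any HTTP library; setting `"stream": False` asks for a single JSON response instead of a streamed one.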

The Ollama Server consists of two core components:

ollama-http-server: handles interaction with the client.

llama.cpp: the LLM inference engine that loads and runs models, processes inference requests, and returns results.

ollama-http-server and llama.cpp communicate with each other via HTTP.

llama.cpp is an independent open‑source project that is cross‑platform and hardware‑friendly, capable of running on devices without a GPU, even on Raspberry Pi.

Ollama Storage Structure

The default local storage directory for Ollama is $HOME/.ollama.

Its files fall into three categories:

Log files, including the user conversation history file and the server log (logs/server.log).

Key files: the id_ed25519 private key and the id_ed25519.pub public key.

Model files: blobs (raw data) and manifests (metadata).

Manifest files are JSON documents that borrow concepts from the OCI spec used in cloud‑native and container ecosystems; the digest field in a manifest identifies the corresponding blob file.
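Based on this layout, resolving a manifest entry to its blob is a simple path transformation. The snippet below is a sketch using a hypothetical manifest fragment; the models/ subdirectory and the sha256:&lt;hex&gt; to sha256-&lt;hex&gt; renaming are assumptions modeled on the OCI-style layout described above:

```python
import json
from pathlib import Path

# Hypothetical manifest fragment: each layer carries an OCI-style digest
# ("sha256:<hex>") that names a file under blobs/, with the colon replaced
# by a dash.
manifest = json.loads("""
{
  "layers": [
    {"mediaType": "application/vnd.ollama.image.model",
     "digest": "sha256:0123abcd", "size": 4}
  ]
}
""")

def blob_path(ollama_home: Path, digest: str) -> Path:
    """Map a manifest digest to its blob file, e.g. sha256:ab -> blobs/sha256-ab."""
    return ollama_home / "models" / "blobs" / digest.replace(":", "-")

home = Path.home() / ".ollama"
for layer in manifest["layers"]:
    print(blob_path(home, layer["digest"]))
```

This indirection is what lets several model manifests share the same underlying blob: two models whose layers carry the same digest point at one file on disk.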

Ollama Dialogue Processing Flow

The overall user‑to‑model conversation flow is as follows:

The user starts a conversation with the CLI command ollama run llama3.2 (llama3.2 is an open‑source LLM; other models can be used as well).

Preparation stage: the CLI client sends an HTTP request to ollama-http-server to obtain model information. The server tries to read the local manifest file and returns 404 if it is not found. In that case, the CLI asks ollama-http-server to pull the model from the remote repository and store it locally, then requests the model information again.

Interactive dialogue stage: the CLI first sends an empty /api/generate request to ollama-http-server, which performs internal channel handling. If the model metadata contains messages, they are displayed; users can also save the current session as a new model, with the conversation stored as messages.

For the actual conversation, the CLI calls /api/chat on ollama-http-server. The server relies on the llama.cpp engine to load the model and perform inference: before forwarding the inference request, it checks the engine's health via /health, then issues a /completion request, receives the response, and returns it to the CLI for display.
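The preparation stage above (request info, get 404, pull, retry) can be sketched with an in-memory stand-in for the server. FakeServer and its methods are illustrative only, not the real ollama-http-server API:

```python
# Toy model of the preparation stage: the client asks for model info, and on
# a 404 (no local manifest) it triggers a pull and asks again.
class FakeServer:
    def __init__(self):
        self.local_models = set()

    def show(self, model: str):
        # Mirrors reading the local manifest: 404 when the model is missing.
        if model not in self.local_models:
            return 404, None
        return 200, {"model": model, "format": "gguf"}

    def pull(self, model: str):
        # Mirrors downloading the manifest and blobs from the remote registry.
        self.local_models.add(model)

def prepare(server: FakeServer, model: str):
    status, info = server.show(model)
    if status == 404:          # model not present locally
        server.pull(model)     # fetch it, then ask again
        status, info = server.show(model)
    return info

info = prepare(FakeServer(), "llama3.2")
print(info)  # {'model': 'llama3.2', 'format': 'gguf'}
```

The point of the retry is that the CLI never special-cases a freshly pulled model: after the pull, the same show request succeeds and the flow continues identically.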
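The ordering in the chat stage, health check first and completion second, can likewise be sketched with a stub engine standing in for llama.cpp; the prompt flattening here is a placeholder for real model-specific chat templating:

```python
# Sketch of the server-side ordering during one chat turn: verify the
# inference engine is healthy, then forward a completion request.
class StubEngine:
    def health(self) -> bool:
        return True  # a real check would GET llama.cpp's /health endpoint

    def completion(self, prompt: str) -> str:
        return f"echo: {prompt}"  # a real call would POST to /completion

def chat(engine: StubEngine, messages: list[dict]) -> str:
    if not engine.health():
        raise RuntimeError("inference engine not ready")
    # Flatten the chat history into a single prompt string; the real
    # template comes from the model's own metadata.
    prompt = "\n".join(f'{m["role"]}: {m["content"]}' for m in messages)
    return engine.completion(prompt)

reply = chat(StubEngine(), [{"role": "user", "content": "hello"}])
print(reply)  # echo: user: hello
```

Doing the health check before every inference call lets the server fail fast, and reload or restart the engine, instead of handing a dead connection a full prompt.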

Through these steps, Ollama completes the interactive dialogue between the user and the large language model.

Conclusion

By integrating the llama.cpp inference engine and encapsulating complex LLM technology, Ollama provides developers and technical users with an efficient, flexible tool that makes large‑model inference and interaction readily accessible for various application scenarios.

Tags: AI tools, LLM, Ollama, Client-Server, llama.cpp
Written by

System Architect Go

Programming, architecture, application development, message queues, middleware, databases, containerization, big data, image processing, machine learning, AI, personal growth.
