Run AI Models Locally with Docker Model Runner and Java Integration
This article explains how Docker Model Runner enables effortless local execution of AI models, details platform support, provides a full command reference, shows how to use the REST endpoint, and demonstrates integration with Java via LangChain4j, including code examples and a feature comparison with Ollama.
Docker introduced the Model Runner feature in Docker Desktop 4.40, making it simple to run AI models locally without complex environment setup.
Current platform support: Docker Model Runner is available on Apple Silicon (M‑series) Macs, with Windows support planned for future releases.
The feature marks a significant step by Docker into AI development, allowing developers to manage and run large language models locally without relying on external cloud services.
Available Commands
Check Model Runner Status
Check whether Docker Model Runner is active:
<code>docker model status</code>
List All Commands
Show help information and available sub‑commands:
<code>docker model help</code>
Output:
<code>Usage: docker model COMMAND
Commands:
list List locally available models
pull Download a model from Docker Hub
rm Remove a downloaded model
run Run a model interactively or with a prompt
status Check if the model runner is running
version Show the current version</code>Pull a Model
Pull a model from Docker Hub to the local environment:
<code>docker model pull <model></code>
Example:
<code>docker model pull ai/deepseek-r1-distill-llama</code>
Output:
<code>Downloaded: 257.71 MB
Model ai/deepseek-r1-distill-llama pulled successfully</code>
List Available Models
List all models currently pulled to the local environment:
<code>docker model list</code>
Sample output:
<code>MODEL PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED SIZE
ai/deepseek-r1-distill-llama 361.82 M IQ2_XXS/Q4_K_M llama 354bf30d0aa3 1 days ago 256.35 MiB</code>Run a Model
Run a model with a single prompt or in interactive chat mode.
Single Prompt
<code>docker model run ai/deepseek-r1-distill-llama "Hi"</code>
Output:
<code>Hello! How can I assist you today?</code>
Interactive Chat
<code>docker model run ai/deepseek-r1-distill-llama</code>
Output:
<code>Interactive chat mode started. Type '/bye' to exit.
> Hi
Hi there! It's SmolLM, AI assistant. How can I help you today?
> /bye
Chat session ended.</code>
Delete a Model
<code>docker model rm <model></code>
Output:
<code>Model <model> removed successfully</code>
Using the REST Endpoint
Enable host‑side TCP support in Docker Desktop GUI or CLI:
<code>docker desktop enable model-runner --tcp <port></code>
Then interact via the chosen port, for example:
<code>curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/deepseek-r1-distill-llama",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Please write a summary about Docker."}
]
}'</code>
LangChain4j Integration
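Before wiring up a framework, the same REST call can be exercised from plain Java with the JDK's built-in `java.net.http.HttpClient` (Java 11+). The sketch below builds the identical request to the curl example; the actual send is left commented out so the class also runs without a live Model Runner, and `extractContent` is a hypothetical demo helper that pulls the assistant's reply out of an OpenAI-style response body with plain string operations (use a real JSON library in production):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ModelRunnerHttpDemo {

    static final String ENDPOINT =
            "http://localhost:12434/engines/llama.cpp/v1/chat/completions";

    // Build the same JSON payload as the curl example above.
    public static String payload(String userMessage) {
        return """
            {
              "model": "ai/deepseek-r1-distill-llama",
              "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "%s"}
              ]
            }""".formatted(userMessage);
    }

    // Hypothetical helper: extract the "content" field of the first choice
    // from an OpenAI-style response body without a JSON library.
    public static String extractContent(String body) {
        int key = body.indexOf("\"content\"");
        if (key < 0) return null;
        int start = body.indexOf('"', body.indexOf(':', key) + 1) + 1;
        int end = body.indexOf('"', start);
        return body.substring(start, end);
    }

    public static void main(String[] args) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        payload("Please write a summary about Docker.")))
                .build();

        // Uncomment once Model Runner is listening on port 12434:
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
        // System.out.println(extractContent(response.body()));

        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the endpoint follows the OpenAI chat-completions schema, any OpenAI-compatible client can talk to it; LangChain4j exploits exactly this.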
LangChain4j is a Java framework for building applications powered by large language models (LLMs), offering a simple way for Java developers to interact with various LLMs.
Setup Steps
1. Ensure Docker Model Runner Is Enabled
Make sure the Model Runner feature is turned on in Docker Desktop.
2. Add LangChain4j Dependency
Add the following dependencies to your pom.xml:
<code><dependencies>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>1.0.0-beta2</version>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-open-ai</artifactId>
<version>1.0.0-beta2</version>
</dependency>
</dependencies></code>
3. Pull and Run the Desired Model
<code>docker model pull ai/deepseek-r1-distill-llama</code>
4. Configure LangChain4j to Connect to the Local Model
<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
public class ModelConfig {
public ChatLanguageModel chatLanguageModel() {
return OpenAiChatModel.builder()
.baseUrl("http://localhost:12434/engines/llama.cpp/v1")
.modelName("ai/deepseek-r1-distill-llama")
.temperature(0.7)
.build();
}
}</code>Sample Application
<code>import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;

public class DockerModelExample {

    interface Assistant {
        String chat(String message);
    }

    public static void main(String[] args) {
        ModelConfig config = new ModelConfig();
        ChatLanguageModel model = config.chatLanguageModel();

        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .build();

        String response = assistant.chat("Write a simple Hello World program in Java");
        System.out.println(response);
    }
}</code>
Summary
Docker Model Runner and Ollama both aim to simplify local AI model execution, but Docker Model Runner is tightly integrated with the Docker ecosystem, while Ollama is a standalone, cross‑platform tool with broader language support and more flexible model customization.
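Because both tools expose OpenAI-compatible HTTP endpoints, the LangChain4j configuration shown earlier can target either one by switching only the base URL. A minimal sketch (the Ollama URL assumes its default port 11434 and its built-in OpenAI-compatible `/v1` route):

```java
// Sketch: both runtimes speak the OpenAI chat-completions API, so from a
// client's point of view the only difference is the base URL it connects to.
public class LocalRuntimes {

    enum LocalRuntime { DOCKER_MODEL_RUNNER, OLLAMA }

    public static String baseUrl(LocalRuntime rt) {
        return switch (rt) {
            // Docker Model Runner's llama.cpp engine (TCP enabled on port 12434)
            case DOCKER_MODEL_RUNNER -> "http://localhost:12434/engines/llama.cpp/v1";
            // Ollama's OpenAI-compatible endpoint (default port 11434)
            case OLLAMA -> "http://localhost:11434/v1";
        };
    }

    public static void main(String[] args) {
        for (LocalRuntime rt : LocalRuntime.values()) {
            System.out.println(rt + " -> " + baseUrl(rt));
        }
    }
}
```

Passing the chosen URL to `OpenAiChatModel.builder().baseUrl(...)` is all that is needed to move an application between the two runtimes.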
Java Architecture Diary