Building a Simple Local AI Question‑Answer System with Java, LangChain4J, Ollama, and ChromaDB
This article guides readers through the concepts of large language models, embeddings, vector databases, and Retrieval‑Augmented Generation, then demonstrates step‑by‑step how to set up Ollama, install a local Chroma vector store, configure Maven dependencies, and write Java code using LangChain4J to build and test a functional AI Q&A application.
Introduction
The author, interested in AI large models, shares a concise guide for building a local AI question‑answer system using Java, avoiding the complexities of ChatGPT and OpenAI APIs by opting for open‑source models like LLaMA and Qwen.
(1) Large Language Model (LLM)
LLMs are deep‑learning models with billions of parameters, typically based on the Transformer architecture, trained on massive text corpora to perform tasks such as text generation, translation, and question answering.
(2) Embedding
Embeddings map words, sentences, or documents into high‑dimensional vectors that capture semantic similarity. Common methods include Word2Vec, GloVe, FastText, BERT, ELMo, and Sentence‑Transformers.
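Semantic similarity between embeddings is typically measured with cosine similarity. A minimal sketch, with invented three-dimensional vectors standing in for real model output (actual embeddings have hundreds of dimensions):

```java
// Cosine similarity: 1.0 means same direction (semantically close),
// 0.0 means orthogonal (unrelated). The vectors here are toy values
// invented for illustration, not real model embeddings.
public class CosineSimilarity {

    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] cat    = {0.9, 0.1, 0.3};
        double[] kitten = {0.8, 0.2, 0.3};
        double[] car    = {0.1, 0.9, 0.1};
        // "cat" should score closer to "kitten" than to "car"
        System.out.printf("cat~kitten = %.3f%n", cosine(cat, kitten));
        System.out.printf("cat~car    = %.3f%n", cosine(cat, car));
    }
}
```

This single function is the similarity measure used, in essence, by every embedding-based retrieval step later in the article.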
(3) Vector Database
Vector databases store and index high‑dimensional vectors for efficient similarity search, supporting ANN queries, hybrid filtering, scalability, and real‑time updates. Examples: FAISS, Pinecone, Weaviate, Qdrant, Milvus.
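Conceptually, the core operation of a vector database is "find the stored vector most similar to a query vector." A toy in-memory sketch using an exact linear scan (ids and vectors are made up; real systems replace the scan with ANN indexes such as HNSW or IVF to stay fast at millions of vectors):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory "vector store": exact nearest-neighbor by linear scan.
// Production vector databases (FAISS, Milvus, Chroma, ...) use ANN
// indexes for sub-linear search; this sketch only shows the semantics.
public class NaiveVectorStore {

    private final Map<String, double[]> vectors = new LinkedHashMap<>();

    public void add(String id, double[] vector) {
        vectors.put(id, vector);
    }

    // Returns the id of the stored vector most similar to the query.
    public String nearest(double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        NaiveVectorStore store = new NaiveVectorStore();
        store.add("doc-animals", new double[]{0.9, 0.1});
        store.add("doc-cars",    new double[]{0.1, 0.9});
        System.out.println(store.nearest(new double[]{0.8, 0.2})); // prints doc-animals
    }
}
```

ChromaDB, used later in this article, plays exactly this role, plus persistence, collections, and metadata filtering.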
(4) Retrieval‑Augmented Generation (RAG)
RAG combines retrieval of relevant documents from an external knowledge base with generative LLM output, reducing hallucinations, improving factuality, and enabling domain‑specific answers.
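The RAG flow itself is small: embed the question, retrieve the most similar knowledge snippet, and splice it into the prompt. A stubbed sketch with hand-made two-dimensional "embeddings" (the generation step is omitted because it needs a live model; the full example later uses Ollama for it):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of the RAG pipeline: retrieve, then ground the prompt.
// Embeddings are invented 2-d vectors; a real system computes them
// with an embedding model and stores them in a vector database.
public class RagSketch {

    public static class Doc {
        public final String text;
        public final double[] embedding;
        public Doc(String text, double[] embedding) {
            this.text = text;
            this.embedding = embedding;
        }
    }

    // Retrieval: pick the snippet whose embedding is closest to the query's.
    public static String retrieve(double[] queryEmbedding, List<Doc> docs) {
        return docs.stream()
                .max(Comparator.comparingDouble(d -> cosine(queryEmbedding, d.embedding)))
                .map(d -> d.text)
                .orElse("");
    }

    // Augmentation: ground the model with the retrieved context.
    public static String buildPrompt(String context, String question) {
        return "Answer based on the following information:\n" + context
                + "\nQuestion:\n" + question;
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<Doc> knowledgeBase = Arrays.asList(
                new Doc("Ollama serves local models on port 11434.", new double[]{0.9, 0.1}),
                new Doc("ChromaDB stores embedding vectors.",        new double[]{0.1, 0.9}));
        double[] questionEmbedding = {0.8, 0.2}; // pretend embedding of the question
        String context = retrieve(questionEmbedding, knowledgeBase);
        System.out.println(buildPrompt(context, "Which port does Ollama listen on?"));
        // A chat model would now generate an answer grounded in this context.
    }
}
```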
AI Application Development Framework
(1) LangChain
LangChain is a framework that simplifies LLM application development by providing chains, agents, memory, loaders, prompt engineering, and integrations with external data sources.
(2) LangChain4J
LangChain4J brings LangChain’s capabilities to the Java ecosystem, offering modular components, multi‑model support, memory, tool integration, and chain execution.
Local Environment Preparation
(1) Start a Local Model with Ollama
Download Ollama, install it, and pull open‑source models (e.g., llama3, qwen) via ollama pull modelName. Verify with ollama list and run a model using ollama run modelName.
(2) Launch a Local Vector Database (ChromaDB)
Install with pip install chromadb and start the service using chroma run.
Implementing the Local AI Q&A in Java
(1) Maven Dependencies
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <langchain4j.version>0.31.0</langchain4j.version>
</properties>
<dependencies>
    <!-- LangChain4J core -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-core</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    <!-- Other LangChain4J modules (ollama, chroma, embeddings) -->
    ...
</dependencies>
(2) Core Java Code
Key steps are illustrated in the code below.
// Imports below assume LangChain4J 0.31.0; Client and ApiException come
// from the ChromaDB Java client used alongside it, and log is the class's logger.
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentByLineSplitter;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.openai.OpenAiTokenizer;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

public static void main(String[] args) throws ApiException {
    // Load a local text file as knowledge base
    Document document = getDocument("笑话.txt");
    // Split the document into segments (max 200 tokens, no overlap)
    DocumentByLineSplitter lineSplitter =
            new DocumentByLineSplitter(200, 0, new OpenAiTokenizer());
    List<TextSegment> segments = lineSplitter.split(document);
    // Embed segments using the local Ollama model
    OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
            .baseUrl("http://localhost:11434")
            .modelName("llama3")
            .build();
    // Store embeddings in ChromaDB
    Client client = new Client(CHROMA_URL);
    EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
            .baseUrl(CHROMA_URL)
            .collectionName(CHROMA_DB_DEFAULT_COLLECTION_NAME)
            .build();
    segments.forEach(segment -> {
        Embedding e = embeddingModel.embed(segment).content();
        embeddingStore.add(e, segment);
    });
    // Retrieve the most relevant segment for a query
    String qryText = "北极熊";
    Embedding queryEmbedding = embeddingModel.embed(qryText).content();
    EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(1)
            .build();
    EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
    TextSegment textSegment = result.matches().get(0).embedded();
    // Build the prompt and query the LLM
    PromptTemplate promptTemplate = PromptTemplate.from(
            "基于如下信息用中文回答:\n{{context}}\n提问:\n{{question}}");
    Map<String, Object> vars = new HashMap<>();
    vars.put("context", textSegment.text());
    vars.put("question", "北极熊干了什么");
    Prompt prompt = promptTemplate.apply(vars);
    OllamaChatModel chatModel = OllamaChatModel.builder()
            .baseUrl("http://localhost:11434")
            .modelName("llama3")
            .build();
    Response<AiMessage> resp = chatModel.generate(prompt.toUserMessage());
    System.out.println("Answer: " + resp.content().text());
}

private static Document getDocument(String fileName) {
    URL docUrl = LangChainMainTest.class.getClassLoader().getResource(fileName);
    if (docUrl == null) {
        log.error("File not found");
        return null;
    }
    try {
        Path path = Paths.get(docUrl.toURI());
        return FileSystemDocumentLoader.loadDocument(path);
    } catch (URISyntaxException e) {
        log.error("Error loading file", e);
        return null;
    }
}
(3) Testing
The sample text "有一只北极熊和一只企鹅…" ("there was a polar bear and a penguin…") is loaded, split, embedded, stored, and queried. When asked "北极熊干了什么" ("what did the polar bear do"), the system correctly returns "北极熊把自己的身上的毛一根一根地拔了下来" ("the polar bear plucked out its own fur one hair at a time").
Conclusion
The guide demonstrates a minimal end‑to‑end AI Q&A pipeline using Java, LangChain4J, Ollama, and ChromaDB, and suggests extending it with Spring Boot, advanced prompting, tool calling, and memory features.
References
LangChain official site
LangChain4J GitHub repository
Ollama documentation
ChromaDB project
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.