Build a RAG-Powered Knowledge Base with Spring Boot, Milvus, and Ollama
This guide walks through building a Retrieval‑Augmented Generation (RAG) system with Spring Boot 3.4.2, the Milvus vector database, and the bge‑m3 embedding model served by Ollama. It covers environment setup, dependency configuration, vector store operations, and integration with a large language model to deliver refined, similarity‑based answers.
1. Introduction
1.1 What is RAG?
Retrieval‑Augmented Generation (RAG) combines a large language model (LLM) with an external knowledge base to improve the accuracy and relevance of generated text.
RAG enables the model to retrieve relevant documents from a set of files and incorporate that information into its responses, rather than relying solely on its pre‑trained knowledge.
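The retrieve-then-generate flow can be sketched in plain Java with toy stand-ins (the `retrieve` and `generate` methods and the sample sentences below are illustrative placeholders, not Spring AI APIs):

```java
import java.util.List;

// Minimal sketch of the RAG flow: 1) retrieve text relevant to the question,
// 2) splice it into the prompt, 3) hand the augmented prompt to the model.
public class RagFlow {

    // Stand-in for a vector-store similarity search.
    static List<String> retrieve(String question) {
        return List.of(
                "Milvus is a vector database.",
                "bge-m3 produces 1024-dimensional embeddings.");
    }

    // Stand-in for the LLM call; a real system would invoke a chat model here.
    static String generate(String prompt) {
        return "Answer based on: " + prompt;
    }

    static String answer(String question) {
        String context = String.join("\n", retrieve(question));
        String augmented = question
                + "\n\nUse the following information to answer the question:\n"
                + context;
        return generate(augmented);
    }

    public static void main(String[] args) {
        System.out.println(answer("What is Milvus?"));
    }
}
```

The rest of this guide replaces each stand-in with a real component: Milvus plus bge-m3 for `retrieve`, and a chat model for `generate`.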
1.2 What is a vector database?
A vector database stores embeddings (numeric vectors) and performs similarity search instead of exact matching. It allows you to find items with similar semantic meaning, such as images or sentences.
Example: the text "I love Spring full‑stack case source code" is converted by an embedding model into a vector like [0.24, -0.56, 0.89].
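Similarity between two embeddings is typically measured with cosine similarity; a similarity search just returns the stored vectors that score highest against the query vector. A toy sketch (3-dimensional vectors for readability; a real model such as bge-m3 emits 1024 dimensions):

```java
// Cosine similarity: 1.0 means the vectors point the same way (similar
// meaning), values near 0 or below mean unrelated or opposed meaning.
public class CosineDemo {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] query = {0.24, -0.56, 0.89};
        double[] close = {0.20, -0.50, 0.90};  // embedding of a similar sentence
        double[] far   = {-0.90, 0.10, -0.30}; // embedding of an unrelated one
        System.out.printf("close: %.3f%n", cosine(query, close)); // near 1.0
        System.out.printf("far:   %.3f%n", cosine(query, far));   // negative
    }
}
```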
1.3 Milvus Overview
Milvus is a popular open‑source vector database. For this tutorial we only need to know how to use it; detailed documentation is available at https://milvus.io/docs/zh.
2. Practical Example
2.1 Environment Preparation
Install Milvus (standalone) using the provided script.
Install the bge‑m3 embedding model with ollama pull bge-m3:latest.
<code># Download the standalone install script
$ curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
# Start the Milvus container
$ bash standalone_embed.sh start
# Pull the embedding model
$ ollama pull bge-m3:latest
</code>
2.2 Project Configuration
Add the following Maven dependencies:
<code><dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M6.1</version>
</dependency>
</code>
Key configuration (YAML style):
<code>spring:
  ai:
    dashscope:
      api-key: sk-xxxooo
      base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
      chat:
        options:
          model: qwen-turbo
      embedding:
        enabled: false
---
spring:
  ai:
    ollama:
      chat:
        enabled: false
      base-url: http://localhost:11434
      embedding:
        enabled: true
        options:
          model: bge-m3:latest
---
spring:
  ai:
    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
          username: root
          password: root
        initialize-schema: true
        embedding-dimension: 1024
</code>
2.3 Vector Store Operations
Service to save documents and perform similarity search:
<code>import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentService {

    private final VectorStore vectorStore;

    public DocumentService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Save sample texts; each one is embedded by bge-m3 and stored in Milvus
    public void save() {
        List<Document> documents = List.of(
                new Document("banana"),
                new Document("apple"),
                new Document("orange"),
                new Document("strawberry"),
                new Document("Java"),
                new Document("python"),
                new Document("C#"),
                new Document("tiger"));
        this.vectorStore.add(documents);
    }

    // Similarity search: return the topK documents closest to the prompt
    public List<Document> query(String prompt, int topK) {
        SearchRequest request = SearchRequest.builder()
                .query(prompt)
                .topK(topK)
                .build();
        return this.vectorStore.similaritySearch(request);
    }
}
</code>
Controller exposing endpoints:
<code>import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/rag")
public class RagController {

    private final DocumentService documentService;

    public RagController(DocumentService documentService) {
        this.documentService = documentService;
    }

    // Writes the sample documents into Milvus
    @GetMapping("/save")
    public ResponseEntity<String> save() {
        this.documentService.save();
        return ResponseEntity.ok("success");
    }

    // e.g. GET /rag/3?prompt=fruit
    @GetMapping("/{topK}")
    public ResponseEntity<List<Document>> query(@PathVariable Integer topK,
                                                @RequestParam String prompt) {
        return ResponseEntity.ok(this.documentService.query(prompt, topK));
    }
}
</code>
2.4 Combine with LLM
Configure a ChatClient bean:
<code>import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {

    // ChatClient with request/response logging via SimpleLoggerAdvisor
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultAdvisors(List.of(new SimpleLoggerAdvisor()))
                .build();
    }
}
</code>
Endpoint that retrieves relevant documents, builds a prompt, and calls the LLM:
<code>@GetMapping("/query/{topK}")
public ResponseEntity<String> queryLLM(@PathVariable Integer topK,
                                       @RequestParam String prompt) {
    SearchRequest request = SearchRequest.builder()
            .query(prompt)
            .topK(topK)
            .build();
    List<Document> docs = this.vectorStore.similaritySearch(request);
    // Join the retrieved document texts so the template receives plain text
    // rather than the List's toString() output
    String contents = docs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n"));
    PromptTemplate template = new PromptTemplate(
            "{userMessage}\n\nUse the following information to answer the question:\n{contents}");
    Prompt finalPrompt = template.create(Map.of("userMessage", prompt, "contents", contents));
    String result = this.chatClient.prompt(finalPrompt).call().content();
    return ResponseEntity.ok(result);
}
</code>
Running /rag/save stores the sample texts in Milvus; /rag/{topK}?prompt=... performs a raw similarity search; the /rag/query/{topK} endpoint passes the retrieved texts to the LLM, which filters and formats the final answer.
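To make the template step concrete, here is a dependency-free sketch of the same substitution, using plain String.replace as a stand-in for PromptTemplate's {placeholder} filling:

```java
import java.util.List;

// Illustrates what the LLM actually receives: the user's question followed
// by the retrieved document texts, joined with newlines and substituted
// into the template's {userMessage} and {contents} slots.
public class PromptAssemblyDemo {

    static String buildPrompt(String userMessage, List<String> retrieved) {
        String template = "{userMessage}\n\nUse the following information to answer the question:\n{contents}";
        return template
                .replace("{userMessage}", userMessage)
                .replace("{contents}", String.join("\n", retrieved));
    }

    public static void main(String[] args) {
        String prompt = buildPrompt("Which of these are fruits?",
                List.of("banana", "apple", "tiger"));
        System.out.println(prompt);
    }
}
```

With the sample data above, a query like "fruit" retrieves the fruit documents, and the LLM's job reduces to filtering and formatting that short, highly relevant context.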