Spring AI RAG: Concepts, Hands‑On Implementation, and Full Code

This article explains the limitations of large language models, introduces Retrieval‑Augmented Generation (RAG) and its four‑step workflow, details Spring AI's RAG components and vector‑store options, and provides complete, runnable Java code—including Maven, configuration, and service classes—to build a local knowledge‑base Q&A system.

The Dominant Programmer

RAG Overview

Large language models (LLMs) suffer from knowledge staleness, hallucinations, and limited domain expertise. Retrieval‑Augmented Generation (RAG) mitigates these problems by first retrieving relevant information from an external knowledge source and then feeding that context to the LLM for answer generation.

Four‑step RAG workflow

Ingestion: Load raw documents (PDF, TXT, etc.) and split them into small chunks suitable for embedding. The example uses DocumentReader (specifically TikaDocumentReader) and TokenTextSplitter.

Embedding & Store: Convert each chunk into a high‑dimensional vector with an EmbeddingModel and store the vectors in a vector database. The demo uses SimpleVectorStore (in‑memory) but mentions persistent alternatives such as PgVector, Elasticsearch, Milvus, Weaviate, and Chroma.

Retrieval: When a user asks a question, the query is embedded and a similarity search returns the most relevant chunks.

Generation: Retrieved chunks are injected into the prompt (via ChatClient and a PromptTemplate) and the LLM generates an answer grounded in that context.
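The generation step can be pictured as plain string assembly. The sketch below is not Spring AI's implementation; it is a minimal stand-in that only uses the JDK. The class name and the template wording are hypothetical, and the two placeholder names are the ones the article mentions for custom prompt templates.

```java
import java.util.List;

/** Toy illustration of the generation step: retrieved chunks are merged into the prompt. */
final class PromptAssemblySketch {

    // Hypothetical template text; only the two placeholder names come from the article.
    static final String TEMPLATE = """
            Answer the question using only the context below.
            If the context does not contain the answer, say so.

            Context:
            {question_answer_context}

            Question: {query}
            """;

    /** Joins the retrieved chunks and substitutes both placeholders. */
    static String buildPrompt(String query, List<String> retrievedChunks) {
        String context = String.join("\n---\n", retrievedChunks);
        return TEMPLATE
                .replace("{question_answer_context}", context)
                .replace("{query}", query);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of(
                "Annual leave is 15 days per year.",
                "Leave requests go through the HR portal.");
        System.out.println(buildPrompt("How many days of annual leave do I get?", chunks));
    }
}
```

In the real application this assembly is done by the advisor, so the service code only passes the user's question.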

Vector store semantics

Vector stores enable semantic search; for example, the vectors for “Apple phone” and “iPhone” are close, allowing matching despite different keywords. Spring AI abstracts access through the VectorStore interface.
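To make "close" concrete, here is a minimal sketch of cosine similarity, a common measure for comparing embedding vectors. The three-dimensional vectors are hand-made for illustration and are not real model output; the class name is hypothetical.

```java
/** Toy illustration of semantic closeness using cosine similarity. */
final class CosineSimilaritySketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns NaN if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hand-made vectors, NOT produced by an embedding model.
        double[] applePhone = {0.9, 0.8, 0.1};
        double[] iphone = {0.85, 0.9, 0.05};
        double[] banana = {0.1, 0.2, 0.95};
        System.out.printf("Apple phone vs iPhone: %.3f%n", cosine(applePhone, iphone));
        System.out.printf("Apple phone vs banana: %.3f%n", cosine(applePhone, banana));
    }
}
```

The first pair scores much higher than the second, which is the property a similarity search relies on.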

Key Spring AI components

Document – represents a piece of content and its metadata (class org.springframework.ai.document.Document).

DocumentReader – loads documents from a source such as the file system or classpath (implementations: JsonReader, TextReader, PagePdfDocumentReader, TikaDocumentReader).

TextSplitter – splits long text into chunks (implementation: TokenTextSplitter).

EmbeddingModel – transforms text chunks into vectors (here provided by Ollama, e.g., nomic-embed-text).

VectorStore – stores and retrieves vectors (implementations: SimpleVectorStore, ElasticsearchVectorStore, etc.).

QuestionAnswerAdvisor – intercepts user requests, performs retrieval, and injects context before generation (built with QuestionAnswerAdvisor.builder(vectorStore)).

Maven dependencies (Spring AI 1.1.2)

<properties>
  <java.version>17</java.version>
  <spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>${spring-ai.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
  </dependency>
</dependencies>

VectorStoreConfig – document loading & vector store initialization

package com.badao.ai.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import java.util.List;

@Configuration
public class VectorStoreConfig {
    private static final Logger logger = LoggerFactory.getLogger(VectorStoreConfig.class);

    @Value("classpath:knowledge-base/badao-internal.txt")
    private Resource knowledgeResource;

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel) {
        // In‑memory store for demo purposes
        return SimpleVectorStore.builder(embeddingModel).build();
    }

    @Bean
    public CommandLineRunner loadDocuments(VectorStore vectorStore) {
        return args -> {
            // 1. Load documents (auto‑detect format)
            DocumentReader reader = new TikaDocumentReader(knowledgeResource);
            List<Document> documents = reader.get();
            logger.info("Loaded {} documents", documents.size());

            // 2. Split into chunks (max 300 tokens, min 50 chars, keep separator)
            TokenTextSplitter splitter = TokenTextSplitter.builder()
                .withChunkSize(300)
                .withMinChunkSizeChars(50)
                .withMinChunkLengthToEmbed(5)
                .withKeepSeparator(true)
                .build();
            List<Document> chunks = splitter.apply(documents);
            logger.info("Split into {} chunks", chunks.size());

            // 3. Vectorize and store
            vectorStore.add(chunks);
            logger.info("Vector store initialized");
        };
    }
}

RagConfig – registering the RAG advisor

package com.badao.ai.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {
    @Bean
    public ChatClient chatClient(ChatModel chatModel, VectorStore vectorStore) {
        return ChatClient.builder(chatModel)
            .defaultAdvisors(
                QuestionAnswerAdvisor.builder(vectorStore)
                    .searchRequest(SearchRequest.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .build())
                    .build())
            .build();
    }
}

Service and controller layers

package com.badao.ai.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class RagService {
    private final ChatClient chatClient;
    public RagService(ChatClient chatClient) { this.chatClient = chatClient; }
    public String ask(String question) {
        return chatClient.prompt()
            .user(question)
            .call()
            .content();
    }
}

package com.badao.ai.controller;

import com.badao.ai.service.RagService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class RagController {
    private final RagService ragService;
    public RagController(RagService ragService) { this.ragService = ragService; }
    @PostMapping("/rag")
    public ChatResponse rag(@RequestBody ChatRequest request) {
        String result = ragService.ask(request.message());
        return new ChatResponse(200, "success", result);
    }
    public record ChatRequest(String message) {}
    public record ChatResponse(int code, String msg, String data) {}
}
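To try the endpoint, a small JDK-only client is enough. This is a usage sketch, not part of the original project: the class name, the sample question, and the hand-rolled JSON escaping are assumptions, while the path /api/rag, the message field, and port 886 come from the controller and the application.yml in this article.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal client for POST /api/rag using java.net.http (JDK 11+). */
final class RagClientSketch {

    /** Builds the POST request; the JSON shape mirrors the ChatRequest record. */
    static HttpRequest buildRequest(String baseUrl, String question) {
        // Escapes backslashes and double quotes only; a JSON library is the safer choice.
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"message\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/rag"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the application is already running locally on the configured port.
        HttpRequest request = buildRequest("http://localhost:886", "What is the leave policy?");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The response body should be the JSON form of the ChatResponse record, with the generated answer in its data field.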

Application configuration (application.yml)

server:
  port: 886
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: qwen2.5:7b-instruct
        options:
          temperature: 0.3
      embedding:
        model: nomic-embed-text
        options:
          num-batch: 4
logging:
  level:
    org.springframework.ai.rag: DEBUG
    org.springframework.ai.vectorstore: DEBUG

Embedding model download

ollama pull nomic-embed-text

Model selection notes

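The chat model qwen2.5:7b-instruct supports tool calling and handles Chinese well; alternatives include deepseek-r1:8b and llama3.1:8b. The embedding model nomic-embed-text produces 768‑dimensional vectors and is free to use; alternatives such as bge-m3 (1024‑dim) or mxbai-embed-large (1024‑dim) also work, provided the vector store's dimension setting is adjusted to match. Both the chat model and the embedding model must be pulled into the local Ollama installation before the application starts (the chat model with ollama pull qwen2.5:7b-instruct). SimpleVectorStore has no dimension setting, so the match matters mainly once you move to a persistent store with a fixed‑dimension index. In either case, vectors produced by different embedding models should not be mixed in one store.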
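The dimension constraint can be illustrated with a tiny store that rejects vectors of the wrong length. This is a hypothetical class written for this explanation, not how SimpleVectorStore or any Spring AI store is implemented; it only shows why switching from a 768‑dim model to a 1024‑dim model requires a matching store configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy store with a fixed vector dimension, similar in spirit to a fixed-dimension index. */
final class DimensionCheckedStore {
    private final int dimensions;
    private final List<float[]> vectors = new ArrayList<>();

    DimensionCheckedStore(int dimensions) {
        this.dimensions = dimensions;
    }

    /** Stores a copy of the embedding, or fails if its length does not match the index. */
    void add(float[] embedding) {
        if (embedding.length != dimensions) {
            throw new IllegalArgumentException(
                    "Expected " + dimensions + " dimensions but got " + embedding.length);
        }
        vectors.add(embedding.clone());
    }

    int size() {
        return vectors.size();
    }

    public static void main(String[] args) {
        DimensionCheckedStore store = new DimensionCheckedStore(768); // sized for nomic-embed-text
        store.add(new float[768]);                                    // accepted
        try {
            store.add(new float[1024]);                               // e.g. a 1024-dim model's output
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```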

ETL model explanation

The RAG pipeline follows an Extract‑Transform‑Load (ETL) pattern: Extract reads documents from the knowledge base; Transform splits them into chunks and converts each chunk to a vector via the embedding model; Load writes the vectors into the vector database. These steps are typically executed at application startup. At runtime, a user query is embedded, the vector store is searched, and the retrieved documents are injected into the LLM prompt.
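The ETL shape can be sketched without any framework. Everything below is a simplified stand-in: words are used in place of tokens, the "embedding" is a 26‑slot letter‑frequency vector rather than a real model, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy Extract-Transform-Load pipeline for a knowledge base. */
final class EtlSketch {

    /** A chunk of text plus its vector, i.e. what ends up in the store. */
    record EmbeddedChunk(String text, double[] vector) {}

    /** Transform, part 1: split into chunks of at most maxWords words. */
    static List<String> split(String text, int maxWords) {
        List<String> chunks = new ArrayList<>();
        if (text.isBlank()) {
            return chunks;
        }
        List<String> words = List.of(text.trim().split("\\s+"));
        for (int start = 0; start < words.size(); start += maxWords) {
            int end = Math.min(start + maxWords, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
        }
        return chunks;
    }

    /** Transform, part 2: toy embedding that counts the letters a to z (not a real model). */
    static double[] embed(String text) {
        double[] vector = new double[26];
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                vector[c - 'a']++;
            }
        }
        return vector;
    }

    /** Takes the extracted raw text, transforms it, and loads the result into a list. */
    static List<EmbeddedChunk> run(String rawDocument, int maxWords) {
        List<EmbeddedChunk> store = new ArrayList<>();            // Load target
        for (String chunk : split(rawDocument, maxWords)) {       // Transform: split
            store.add(new EmbeddedChunk(chunk, embed(chunk)));    // Transform: embed, then Load
        }
        return store;
    }

    public static void main(String[] args) {
        String doc = "Employees receive fifteen days of annual leave. Requests go through the HR portal.";
        run(doc, 6).forEach(c -> System.out.println(c.text()));
    }
}
```

In the article's project the same three roles are played by TikaDocumentReader, TokenTextSplitter plus the EmbeddingModel, and VectorStore.add.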

Common optimization strategies

Increase similarity threshold – filters out less‑relevant documents (e.g., similarityThreshold(0.8)).

Decrease similarity threshold – retrieves more candidates when too few chunks pass the filter, at the cost of admitting weaker matches (e.g., similarityThreshold(0.5)).

Control return count – set topK to avoid exceeding the model’s context window (e.g., topK(3)).

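Chunk size – balance precision and recall: smaller chunks match more precisely, larger chunks carry more context (e.g., withChunkSize(300)). The builder options used in VectorStoreConfig (withChunkSize, withMinChunkSizeChars, withMinChunkLengthToEmbed, withKeepSeparator) do not include an overlap setting, so overlap between chunks would need a custom splitter.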

Dynamic filtering – filter by metadata such as type or date (e.g., .param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'manual'")).

Custom prompt template – control how context and question are concatenated using placeholders {query} and {question_answer_context}.
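The effect of similarityThreshold and topK can be seen in a few lines of plain Java. The scores below are invented for illustration and the class is hypothetical; a real vector store computes the scores from embeddings and applies the same kind of filter and cap.

```java
import java.util.Comparator;
import java.util.List;

/** Toy illustration of threshold filtering followed by a top-K cut. */
final class RetrievalFilterSketch {

    /** A candidate chunk with its similarity score to the query (higher is closer). */
    record Scored(String text, double score) {}

    /** Keeps candidates scoring at least threshold, best first, capped at topK. */
    static List<Scored> select(List<Scored> candidates, double threshold, int topK) {
        return candidates.stream()
                .filter(c -> c.score() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        // Invented scores, not real similarity values.
        List<Scored> candidates = List.of(
                new Scored("leave policy", 0.91),
                new Scored("expense policy", 0.74),
                new Scored("office map", 0.55),
                new Scored("holiday calendar", 0.82));
        System.out.println(select(candidates, 0.7, 3)); // settings used in RagConfig
        System.out.println(select(candidates, 0.8, 3)); // stricter threshold
        System.out.println(select(candidates, 0.5, 2)); // looser threshold, smaller topK
    }
}
```

Raising the threshold shrinks the result set before topK is applied, so a strict threshold can leave the model with no context at all; it is worth logging how many chunks survive.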

Pipeline summary

Document loading – TikaDocumentReader auto‑detects PDF, Word, TXT, etc.

Text chunking – TokenTextSplitter with withChunkSize(300) and withMinChunkSizeChars(50).

Embedding – Ollama nomic-embed-text (768‑dim).

Vector store – SimpleVectorStore for demo; replace with PgVector, Elasticsearch, etc., for production.

Retrieval + generation – QuestionAnswerAdvisor with similarityThreshold and topK to control quality.

Enhanced generation – the QuestionAnswerAdvisor registered on the ChatClient injects the retrieved chunks into the prompt, so the service code only passes the user's question.

Written by

The Dominant Programmer

Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi