Spring AI ChatMemory: Concepts, Practical Setup, and Common Issues

This guide explains how Spring AI abstracts LLM conversation memory using a three‑layer architecture, demonstrates configuring MessageWindowChatMemory with a sliding‑window strategy, shows two ways to register the memory advisor, and provides complete Maven, YAML, and Java code examples with test screenshots.

The Dominant Programmer

Scenario

This article follows on from the earlier "Spring AI Advisor full guide: interceptor mechanism and practical walkthrough". That guide implemented basic session memory; this one covers the topic in more depth.

Large language models (LLMs) are stateless: each call is independent. If you tell the model "My name is Xiao Ming" in the first turn and ask "What is my name?" in the second, it cannot answer, because the second request carries no trace of the first.

To achieve coherent multi‑turn dialogue, the core idea is to send the complete conversation history with each request. Spring AI encapsulates this process through ChatMemory and the Advisor mechanism, allowing developers to enable memory, conversation isolation, and even persistence with minimal configuration.
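To make the idea concrete, here is a minimal plain-Java sketch with no Spring AI dependency. The `reply` method is a hypothetical stand-in for an LLM call, not a real API: it can only see the messages passed to it in that one call, which is the property that forces the client to resend history.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an LLM: it sees only the messages passed to this one call.
class StatelessModelDemo {

    // Answers "What is my name?" only if an earlier message in the SAME request states the name.
    static String reply(List<String> messages) {
        String name = null;
        for (String m : messages) {
            if (m.startsWith("My name is ")) {
                name = m.substring("My name is ".length());
            }
        }
        String last = messages.get(messages.size() - 1);
        if (last.equals("What is my name?")) {
            return name != null ? "Your name is " + name : "I don't know your name.";
        }
        return "OK.";
    }

    public static void main(String[] args) {
        // Turn 2 sent alone: the model has no access to turn 1.
        System.out.println(reply(List.of("What is my name?")));

        // Turn 2 sent together with the accumulated history: the model can answer.
        List<String> history = new ArrayList<>();
        history.add("My name is Xiao Ming");
        history.add("What is my name?");
        System.out.println(reply(history));
    }
}
```

Keeping that `history` list up to date and attaching it to every request is the bookkeeping that ChatMemory and the advisor take over.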

Core Concepts

Spring AI splits conversation‑memory management into three layers, each with a distinct role:

ChatMemory (memory strategy layer) – decides which messages to keep and when to trim (e.g., retain only the most recent N messages).

ChatMemoryRepository (storage layer) – purely handles CRUD of messages (in‑memory, JDBC, Redis, etc.).

MessageChatMemoryAdvisor (interceptor layer) – automatically injects the conversation history into each request and saves new messages after the response.
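The division of labor between the three layers can be modeled in plain Java. This is only a sketch: `Repository`, `Memory`, and `advise` are illustrative names chosen here, not Spring AI types, and the fake model simply reports how many messages it received.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of the three layers. All names are illustrative, not Spring AI types.
class ThreeLayerSketch {

    // Storage layer: only reads and writes messages per conversation.
    static class Repository {
        private final Map<String, List<String>> store = new HashMap<>();

        List<String> find(String conversationId) {
            return new ArrayList<>(store.getOrDefault(conversationId, List.of()));
        }

        void save(String conversationId, List<String> messages) {
            store.put(conversationId, new ArrayList<>(messages));
        }
    }

    // Strategy layer: decides what is kept; here, the most recent maxMessages entries.
    static class Memory {
        private final Repository repository;
        private final int maxMessages;

        Memory(Repository repository, int maxMessages) {
            this.repository = repository;
            this.maxMessages = maxMessages;
        }

        void add(String conversationId, String message) {
            List<String> all = repository.find(conversationId);
            all.add(message);
            int from = Math.max(0, all.size() - maxMessages);
            repository.save(conversationId, all.subList(from, all.size()));
        }

        List<String> get(String conversationId) {
            return repository.find(conversationId);
        }
    }

    // Interceptor layer: stores the user message, sends the stored history to the model,
    // then stores the answer.
    static String advise(Memory memory, String conversationId, String userText,
                         Function<List<String>, String> model) {
        memory.add(conversationId, "user: " + userText);
        String answer = model.apply(memory.get(conversationId));
        memory.add(conversationId, "assistant: " + answer);
        return answer;
    }

    public static void main(String[] args) {
        Memory memory = new Memory(new Repository(), 10);
        // Fake model: reports how many messages it received in this request.
        Function<List<String>, String> model = msgs -> "saw " + msgs.size() + " message(s)";
        System.out.println(advise(memory, "001", "hello", model));
        System.out.println(advise(memory, "001", "again", model));
    }
}
```

Because the layers only talk through narrow methods, the storage could be swapped for a database-backed one without touching the trimming rule or the interceptor, which is the point of the split.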

MessageWindowChatMemory: Sliding‑Window Memory

MessageWindowChatMemory is the recommended implementation. It maintains a fixed‑size window of messages (default 20). When the limit is exceeded, the oldest messages are removed, while system messages are retained. This bounds the number of messages sent to the model on each request, which keeps token usage from growing without limit as the conversation gets longer.
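The eviction rule described above can be sketched in a few lines of plain Java. This is a simplified model written for this guide, not the Spring AI implementation; the `Msg` record and `trim` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a message window; not the Spring AI implementation.
class SlidingWindowSketch {

    record Msg(String role, String text) {}

    // Evicts the oldest non-system messages until at most maxMessages remain
    // (or until only system messages are left).
    static List<Msg> trim(List<Msg> messages, int maxMessages) {
        List<Msg> result = new ArrayList<>(messages);
        int i = 0;
        while (result.size() > maxMessages && i < result.size()) {
            if (result.get(i).role().equals("system")) {
                i++;               // keep system messages in place
            } else {
                result.remove(i);  // drop the oldest remaining non-system message
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Msg> history = List.of(
                new Msg("system", "You are a helpful assistant."),
                new Msg("user", "My name is Xiao Ming"),
                new Msg("assistant", "Nice to meet you, Xiao Ming."),
                new Msg("user", "What is 2 + 2?"),
                new Msg("assistant", "4"));
        // With a window of 3, the system message and the two newest messages survive.
        trim(history, 3).forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

Note the trade-off this makes visible: with a window of 3, the turn that stated the name is evicted, so a later "What is my name?" could no longer be answered. The window size should be chosen with the expected conversation length in mind.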

Two Ways to Register an Advisor

Global registration (at builder time) – ChatClient.builder().defaultAdvisors(...) applies to all requests.

Per‑request registration (on demand) – prompt().advisors(a -> a.param(...)) passes runtime parameters such as conversationId. The two approaches combine naturally: in this guide the MessageChatMemoryAdvisor is registered globally, and the conversation ID is supplied per request so that each conversation loads its own history.
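How build-time defaults and per-request parameters interact can be modeled as a simple map merge. The sketch below is plain Java with hypothetical names (`CONVERSATION_ID`, `DEFAULT_ID`, `resolveConversationId`); it assumes, as a labeled assumption rather than a documented guarantee, that a request without a conversation ID falls back to one shared default.

```java
import java.util.HashMap;
import java.util.Map;

// Model of default vs. per-request advisor parameters. Names and fallback are assumptions.
class AdvisorParamsSketch {

    static final String CONVERSATION_ID = "conversation_id"; // hypothetical parameter key
    static final String DEFAULT_ID = "default";              // hypothetical shared fallback

    // Per-request parameters override build-time defaults; a missing ID falls back to DEFAULT_ID.
    static String resolveConversationId(Map<String, Object> defaults,
                                        Map<String, Object> perRequest) {
        Map<String, Object> merged = new HashMap<>(defaults);
        merged.putAll(perRequest);
        Object id = merged.get(CONVERSATION_ID);
        return id != null ? id.toString() : DEFAULT_ID;
    }

    public static void main(String[] args) {
        // No ID anywhere: every caller would land in the same shared conversation.
        System.out.println(resolveConversationId(Map.of(), Map.of()));
        // ID passed per request: each caller gets its own history.
        System.out.println(resolveConversationId(Map.of(), Map.of(CONVERSATION_ID, "001")));
    }
}
```

Under that assumption, forgetting the per-request ID would not fail loudly; different users would silently share one history, which is why the controller in this guide always passes it.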

Implementation

pom.xml

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.3.3</version> <!-- downgrade to stable version -->
</parent>
<groupId>com.example</groupId>
<artifactId>spring-ai-ollama-demo</artifactId>
<version>1.0</version>
<properties>
    <java.version>17</java.version>
    <spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Spring AI Ollama core -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
        <version>${spring-ai.version}</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>spring-milestones</id>
        <url>https://repo.spring.io/milestone</url>
        <snapshots><enabled>false</enabled></snapshots>
    </repository>
</repositories>

application.yml

server:
  port: 886
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: qwen2.5:7b-instruct
        options:
          temperature: 0.7
          num-ctx: 4096   # Ollama context window size
logging:
  level:
    org.springframework.ai.chat.client.advisor: DEBUG   # observe memory injection logs

MemoryConfig – Creating ChatMemory

package com.badao.ai.config;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MemoryConfig {
    @Bean
    public ChatMemory chatMemory() {
        // 1. Create the underlying storage repository (in‑memory implementation)
        InMemoryChatMemoryRepository repository = new InMemoryChatMemoryRepository();
        // 2. Wrap with a sliding‑window strategy, limiting to the 10 most recent messages
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(repository)
                .maxMessages(10)
                .build();
    }
}

ChatConfig – Registering the Memory Advisor

package com.badao.ai.config;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {
    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient.builder(chatModel)
                .defaultAdvisors(
                        MessageChatMemoryAdvisor.builder(chatMemory).build()
                )
                .build();
    }
}

Controller – Multi‑turn Chat API

package com.badao.ai.controller;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory; // import ChatMemory interface
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class MemoryChatController {
    private final ChatClient chatClient;
    public MemoryChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }
    @PostMapping("/chat/memory")
    public ChatResponse chatWithMemory(@RequestBody MemoryChatRequest request) {
        String result = chatClient.prompt()
                .user(request.message())
                .advisors(advisor -> advisor.param(
                        ChatMemory.CONVERSATION_ID,
                        request.conversationId()
                ))
                .call()
                .content();
        return new ChatResponse(200, "success", result);
    }
    public record MemoryChatRequest(String message, String conversationId) {}
    public record ChatResponse(int code, String msg, String data) {}
}

Testing Verification

Test the session memory by sending conversation ID "001", stating a name, then asking for the name again. The following screenshots show the expected behavior.

[Screenshot: Test 001]

Next, send conversation ID "002" and ask for the name again. Conversation "002" has no stored history, so the model should not be able to recall the name. The screenshot confirms that memory is scoped per conversation.

[Screenshot: Test 002]
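The two test runs above can also be reproduced without a GUI client. The sketch below uses the JDK's built-in java.net.http.HttpClient and assumes the application is running locally with the port and path from this guide (886 and /api/chat/memory). The class name and helper methods are made up for illustration, and the JSON is built by hand without escaping, so it only suits simple test strings.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative test client for the /api/chat/memory endpoint defined in this guide.
class MemoryChatClientSketch {

    static final String ENDPOINT = "http://localhost:886/api/chat/memory";

    // Builds the JSON body by hand; no escaping, so avoid quotes and backslashes in inputs.
    static String body(String message, String conversationId) {
        return "{\"message\":\"" + message + "\",\"conversationId\":\"" + conversationId + "\"}";
    }

    static HttpRequest request(String message, String conversationId) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(message, conversationId)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        String[][] turns = {
                {"My name is Xiao Ming", "001"},
                {"What is my name?", "001"},   // same conversation: history is available
                {"What is my name?", "002"}    // new conversation: no history
        };
        try {
            for (String[] t : turns) {
                HttpResponse<String> resp =
                        client.send(request(t[0], t[1]), HttpResponse.BodyHandlers.ofString());
                System.out.println(t[1] + " -> " + resp.body());
            }
        } catch (IOException e) {
            // Typically means the Spring Boot application is not running on the assumed port.
            System.out.println("Server not reachable: " + e.getMessage());
        }
    }
}
```

The actual replies depend on the model, so none are shown here; the thing to check is that only the second request for "001" can refer back to the name.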
Written by

The Dominant Programmer

Resources and tutorials for programmers' advanced learning journey. Advanced tracks in Java, Python, and C#. Blog: https://blog.csdn.net/badao_liumang_qizhi
