
Why Switch from Hand‑Written HTTP Calls to Spring AI for Large‑Model Integration?

The article analyzes the drawbacks of manually coding HTTP calls to large language models—hard‑coded keys, fragile request construction, missing retries, and poor observability—and demonstrates how Spring AI’s layered abstraction, unified configuration, built‑in resilience, function calling, RAG support, and seamless Spring ecosystem integration solve these problems for production‑grade Java applications.

Su San Talks Tech

May 27, 2026

Preface

A colleague asked why everyone prefers Spring AI over direct RestTemplate calls to OpenAI or DeepSeek. The author illustrates the hidden costs of hand‑written HTTP code and then introduces Spring AI as a solution.

1. Why Not Write HTTP Manually?

1.1 Classic Hand‑Written HTTP Example

Calling the DeepSeek API with RestTemplate typically ends up with a hard-coded API key and URL, manual request-body assembly, manual header setup, and verbose yet shallow error handling:

@RestController
public class NativeAIController {
    // Hard‑coded key and URL – first pitfall
    private static final String API_KEY = "sk-xxxxxxxx";
    private static final String API_URL = "https://api.deepseek.com/v1/chat/completions";

    @GetMapping("/ai/chat")
    public String chat(@RequestParam String msg) {
        RestTemplate restTemplate = new RestTemplate();
        // 1. Assemble request body – error‑prone
        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", "deepseek-chat");
        requestBody.put("messages", List.of(Map.of("role", "user", "content", msg)));
        // 2. Build headers
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(API_KEY);
        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(requestBody, headers);
        // 3. Send request and process response
        try {
            ResponseEntity<Map> response = restTemplate.exchange(API_URL, HttpMethod.POST, entity, Map.class);
            Map body = response.getBody();
            if (body != null && body.containsKey("choices")) {
                List<Map<String, Object>> choices = (List) body.get("choices");
                Map<String, Object> choice = choices.get(0);
                Map<String, Object> message = (Map) choice.get("message");
                return (String) message.get("content");
            }
            return "No valid response";
        } catch (RestClientException e) {
            // Extremely simplistic error handling
            return "Call error: " + e.getMessage();
        }
    }
}

This code is quick to write, and for a personal demo it may suffice. In a production environment, however, it hides six serious problems that behave like time bombs:

Hard‑coded API key – committing the source reveals the secret.

Hard‑coded request parameters – each model has a different payload structure (e.g., messages vs input), requiring code changes for every switch.

Poor exception handling – model APIs are unstable; timeouts, rate limits, and degradations need sophisticated retry and fallback logic.

No connection‑pool management – creating a new RestTemplate per call leads to excessive TCP connections under load.

Fragile response parsing – any change in JSON structure throws exceptions and forces full regression testing.

Lack of observability – no logging, tracing, or metrics, making production debugging impossible.
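To give a sense of what the "sophisticated retry" in the third item involves, here is a minimal sketch of retry with exponential backoff in plain Java. BackoffRetry and its parameters are hypothetical names used only for illustration; a production version would also distinguish retryable errors (timeouts, HTTP 429) from permanent ones and add jitter.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper (not from any library): retries a call with exponential backoff. */
final class BackoffRetry {

    private BackoffRetry() {}

    /**
     * Runs the action up to maxAttempts times and doubles the wait after each failure.
     * The last failure is rethrown once all attempts are used up.
     */
    static <T> T call(Supplier<T> action, int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMillis = initialDelay.toMillis();
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                    delayMillis *= 2; // exponential growth: d, 2d, 4d, ...
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

Even this small helper covers only one of the six problems, which is the point of the list above: each item needs comparable code plus tests.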

1.2 Trying to Wrap It Yourself?

Even a custom wrapper quickly balloons to dozens of fields: thread‑safe HTTP client, retry policy, rate limiter, circuit breaker, connection pool, metrics, logger, plus request‑body construction and multi‑model adapters.

public class LLMClient {
    // 1. Thread‑safe client reuse
    private final HttpClient httpClient;
    // 2. Retry strategy
    private final RetryPolicy retryPolicy;
    // 3. Rate limiter
    private final RateLimiter rateLimiter;
    // 4. Circuit breaker
    private final CircuitBreaker circuitBreaker;
    // 5. Connection pool
    private final ConnectionPool connectionPool;
    // 6. Monitoring hooks
    private final MeterRegistry meterRegistry;
    // 7. Logging
    private final Logger logger;
    // ... plus request building, multi‑model adaptation, streaming handling …
}

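Spring AI reduces the calling code to a few lines, because HTTP transport, provider adaptation, and retry are supplied by auto-configuration instead of being written by hand:

@Service
public class AIChatService {
    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder bean when a model starter is on the classpath
    public AIChatService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String chat(String question) {
        return chatClient.prompt(question).call().content();
    }
}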

2. Spring AI: A “Swiss‑Army Knife” for Model Calls

2.1 Framework Overview

Spring AI, whose first milestone releases appeared in 2024, provides a standardized, modular AI toolchain for Java developers. It aims to lower the AI development barrier, unify the Java AI ecosystem, and strengthen enterprise-grade support. After about two years of development, Spring AI 1.0 reached general availability in May 2025 as the first release with a stable API. Within the first week it reportedly recorded over 150 k downloads and 32 k GitHub stars, and it supports more than 20 model families.

2.2 Four‑Layer Abstraction

The core design consists of four layers, each solving a specific engineering problem:

Model abstraction layer : Interfaces like ChatModel and EmbeddingModel define a uniform contract regardless of the underlying provider.

Adapter layer : Provider‑specific adapters implement the interfaces, handling protocol translation. Built‑in adapters cover OpenAI, Azure OpenAI, Anthropic Claude, Alibaba Tongyi Qianwen, Ollama (Llama 3), etc. Adding a new model only requires a new adapter class.

Service orchestration layer : Exposes ChatClient with a fluent API, PromptTemplate, and conversation context management.

Application integration layer : Spring Boot starters auto-configure beans such as ChatModel and ChatClient.Builder, so the model client is injected like any other Spring bean and can sit behind existing Spring Security rules for access control.
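The first two layers can be illustrated with a small plain-Java sketch of the adapter pattern. SimpleChatModel, MessagesStyleAdapter, and InputStyleAdapter are hypothetical names for illustration, not Spring AI classes; the sketch only shows how one interface can hide the "messages" versus "input" payload difference mentioned earlier.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical uniform contract, loosely modelled on the idea behind ChatModel. */
interface SimpleChatModel {
    /** Builds the provider-specific request payload for a single user message. */
    Map<String, Object> buildRequest(String userMessage);
}

/** Adapter for a provider that expects a "messages" array of role/content pairs. */
final class MessagesStyleAdapter implements SimpleChatModel {
    private final String model;

    MessagesStyleAdapter(String model) {
        this.model = model;
    }

    @Override
    public Map<String, Object> buildRequest(String userMessage) {
        return Map.of(
                "model", model,
                "messages", List.of(Map.of("role", "user", "content", userMessage)));
    }
}

/** Adapter for a provider that expects a flat "input" field instead. */
final class InputStyleAdapter implements SimpleChatModel {
    private final String model;

    InputStyleAdapter(String model) {
        this.model = model;
    }

    @Override
    public Map<String, Object> buildRequest(String userMessage) {
        return Map.of("model", model, "input", userMessage);
    }
}
```

Business code that depends only on SimpleChatModel does not change when the adapter is swapped, which is the property the model abstraction and adapter layers are meant to provide.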

2.3 Unified Configuration

Switching models comes down to swapping the provider's starter dependency and editing application.yml. The three blocks below are alternatives, not a single file:

# OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4-turbo

# DeepSeek: replace the provider block
spring:
  ai:
    deepseek:
      api-key: ${DEEPSEEK_API_KEY}
      chat:
        options:
          model: deepseek-chat

# Local Ollama (Llama 3)
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3:8b

Business code written against ChatClient remains untouched; API keys are injected via environment variables, eliminating hard-coding.

2.4 ChatClient vs ChatModel

ChatModel – low-level interface that talks directly to the model and returns raw responses.

ChatClient – high-level wrapper offering a fluent, Spring-style API, similar to WebClient or RestClient.

2.5 Function Calling

Spring AI lets a model invoke business code via annotated functions. Example:

// The bean name is what the ChatClient refers to, so it is set explicitly here
@Component("queryOrder")
@Description("Query order by ID")
public class OrderQueryFunction implements Function<OrderQueryFunction.Request, OrderQueryFunction.Response> {
    @Autowired
    private OrderService orderService;
    public record Request(@JsonProperty(required = true) String orderId) {}
    public record Response(String status, BigDecimal amount, String deliveryTime) {}
    @Override
    public Response apply(Request request) {
        Order order = orderService.getOrder(request.orderId());
        return new Response(order.getStatus(), order.getAmount(), order.getDeliveryTime());
    }
}

// Register the function with ChatClient by bean name
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultToolNames("queryOrder") // Spring AI 1.0 name; pre-1.0 milestones used defaultFunctions(...)
    .build();

When a user asks “Where is my order ORD‑001?”, the model decides that the queryOrder function is needed, Spring AI invokes it with the extracted parameter, and the result is passed back to the model to produce the final answer, all transparently to the caller.
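The dispatch step behind function calling can be sketched in a few lines of plain Java. ToolRegistry is a hypothetical class for illustration and is not how Spring AI is implemented; it assumes the model's reply has already been parsed into a tool name and a map of string arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry illustrating tool dispatch; not Spring AI's implementation. */
final class ToolRegistry {

    // Tool name -> function that takes the model-supplied arguments and returns a text result
    private final Map<String, Function<Map<String, String>, String>> tools = new LinkedHashMap<>();

    /** Makes a tool available under the name the model will use to request it. */
    void register(String name, Function<Map<String, String>, String> tool) {
        tools.put(name, tool);
    }

    /**
     * Executes the tool the model asked for. In a real loop the returned text
     * would be sent back to the model so it can compose the final answer.
     */
    String dispatch(String toolName, Map<String, String> arguments) {
        Function<Map<String, String>, String> tool = tools.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return tool.apply(arguments);
    }
}
```

A framework adds the parts this sketch leaves out: describing each tool to the model as a JSON schema, converting arguments to typed objects, and looping until the model stops requesting tools.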

3. Hand‑Written HTTP vs Spring AI Comparison

3.1 Overview

Multi‑model support : Hand‑written HTTP requires code changes per model; Spring AI needs only a config change.

Exception/Retry/Fallback : Manual implementation is error‑prone; Spring AI provides built‑in, production‑grade mechanisms.

Streaming response : Manual SSE handling is complex; Spring AI uses chatClient.prompt().stream() in one line.

Structured output : Manual JSON parsing is fragile; Spring AI auto‑converts to Java records via StructuredOutputConverter.

Function calling : Manual API design required; Spring AI registers functions with annotations.

Observability : Manual Micrometer integration needed; Spring AI emits Micrometer metrics and traces for model calls, which surface through Spring Boot Actuator.

Spring ecosystem integration : Manual bean wiring; Spring AI auto‑configures.

Development efficiency : ~50 lines per interface manually; ~5 lines with Spring AI.

3.2 Engineering Capabilities

Enterprise applications demand retries, circuit breakers, rate limiting, monitoring, and audit trails. Implementing these manually can exceed 500 lines of code and requires extensive testing. Spring AI ships retry support that is configurable through spring.ai.retry.* properties (built on Spring Retry), and because the model client is an ordinary Spring bean, circuit breakers and rate limiters from the wider ecosystem, for example Resilience4j through Spring Cloud Circuit Breaker, can be applied to it declaratively.
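As a rough illustration of one item on that list, the sketch below shows a count-based circuit breaker in plain Java. SimpleCircuitBreaker is a hypothetical class, far simpler than library implementations such as Resilience4j: it opens after a number of consecutive failures, answers from a fallback while open, and lets a trial call through once the open period has passed.

```java
import java.util.function.Supplier;

/** Hypothetical, deliberately minimal circuit breaker for illustration only. */
final class SimpleCircuitBreaker {

    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures;
    private long openedAt = -1; // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /**
     * Runs the action unless the breaker is open; returns the fallback when the
     * breaker is open or the action fails. Synchronized for simplicity, which
     * serializes calls; real implementations avoid holding a lock during I/O.
     */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get(); // open: fail fast without touching the remote model
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // success closes the breaker again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // trip (or re-trip after a failed trial call)
            }
            return fallback.get();
        }
    }
}
```

With a threshold of 2, the third call in a row against a failing model is answered from the fallback without invoking the action at all.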

3.3 Streaming Responses

Streaming with raw HTTP involves handling SSE, buffering chunks, and assembling the final output. Spring AI simplifies this to a single call:

@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String prompt) {
    // stream() returns a response spec; content() exposes the text chunks as a Flux<String>
    return chatClient.prompt(prompt).stream().content();
}
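For contrast, the manual side of this comparison starts with something like the following plain-Java sketch, which pulls the payloads out of a raw SSE body. SseLineParser is a hypothetical helper, and the "[DONE]" end marker is an assumption based on the convention used by OpenAI-compatible APIs; each extracted payload would still need JSON parsing to reach the text delta.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing the first step of manual SSE handling. */
final class SseLineParser {

    private SseLineParser() {}

    /** Extracts the payload of each "data:" line, stopping at the "[DONE]" sentinel. */
    static List<String> dataPayloads(String rawStream) {
        List<String> payloads = new ArrayList<>();
        for (String line : rawStream.split("\\R")) { // \R matches any line break
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other SSE fields
            }
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break; // assumed end-of-stream marker
            }
            payloads.add(payload);
        }
        return payloads;
    }
}
```

A real client would also have to read the body incrementally, handle events split across network chunks, and propagate cancellation, which is the work the one-line call hides.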

3.4 Structured Output

Spring AI can map JSON responses directly to Java records:

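public record SentimentResult(String label, double score) {}

BeanOutputConverter<SentimentResult> converter = new BeanOutputConverter<>(SentimentResult.class);
// getFormat() supplies the JSON schema instructions the model needs in order to match the record
String json = chatClient.prompt("User review: " + review + ". Classify the sentiment.\n" + converter.getFormat())
    .call().content();
SentimentResult result = converter.convert(json); // e.g. result.label() -> "POSITIVE", result.score() -> 0.95
// Shortcut that does both steps: chatClient.prompt(...).call().entity(SentimentResult.class)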

3.5 Function Calling (Enterprise Focus)

With Spring AI, implementing a function is as simple as creating a class that implements Function and annotating it. The framework handles registration, parameter binding, and routing automatically.

3.6 Retrieval‑Augmented Generation (RAG)

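Spring AI provides a VectorStore abstraction that unifies access to Elasticsearch, Milvus, PgVector, etc., along with DocumentReader implementations for PDF/Word parsing and a DocumentRetriever abstraction for fetching relevant documents. After configuring the store, attaching a QuestionAnswerAdvisor to the ChatClient runs the retrieval step and adds the matching documents to the prompt before the model is called, so the RAG pipeline needs no hand-written glue code.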
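The retrieval step that a vector store performs can be sketched with an in-memory map and cosine similarity. InMemoryVectorSearch is a hypothetical class for illustration, and the tiny two-dimensional vectors in the usage note stand in for real embeddings, which an embedding model would produce.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for the similarity search a vector store performs. */
final class InMemoryVectorSearch {

    private InMemoryVectorSearch() {}

    /** Cosine similarity of two non-zero vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k documents most similar to the query, best first. */
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Builds the augmented prompt: retrieved context first, then the user's question. */
    static String augment(String question, List<String> contextSnippets) {
        return "Context:\n" + String.join("\n", contextSnippets) + "\n\nQuestion: " + question;
    }
}
```

For example, with documents "refund" at (1, 0), "shipping" at (0, 1), and "returns" at (0.9, 0.1), a query vector of (1, 0) ranks "refund" first and "returns" second. A production store replaces the linear scan with an approximate nearest-neighbor index.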

3.7 Conversation Memory

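Spring AI's ChatMemory abstraction manages multi-turn dialogue state per conversation ID. A MessageWindowChatMemory keeps only the most recent messages, a ChatMemoryRepository persists them to an external store so that multiple instances can share them, and a MessageChatMemoryAdvisor adds the history to each prompt, so developers do not stitch messages together by hand.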
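The sliding-window idea behind conversation memory can be sketched in plain Java. WindowedMemory is a hypothetical class for illustration that keeps messages in process memory only; sharing state across instances would require an external store.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-conversation sliding window; keeps only the newest messages. */
final class WindowedMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> conversations = new HashMap<>();

    WindowedMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones beyond the window size. */
    synchronized void add(String conversationId, String message) {
        Deque<String> window = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    /** Returns a copy of the retained messages, oldest first. */
    synchronized List<String> history(String conversationId) {
        Deque<String> window = conversations.get(conversationId);
        return window == null ? List.of() : new ArrayList<>(window);
    }
}
```

Bounding the window keeps the prompt within the model's context limit, at the cost of forgetting older turns; summarizing evicted messages is a common refinement.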

4. Pain Points and Suitable Scenarios

4.1 Current Limitations

Learning curve & version stability : As a relatively new framework, rapid iteration may cause breaking changes.

Agent orchestration : Compared with LangChain4j, Spring AI offers fewer out‑of‑the‑box agent plugins.

Community ecosystem : Although gaining stars, the plugin ecosystem is still maturing; some adapters may need custom implementation.

4.2 Ideal Use Cases

Traditional Spring Boot microservices where teams want AI without refactoring existing architecture.

Enterprises needing flexible multi‑model switching via configuration.

Production environments demanding full‑stack engineering features (retries, circuit breakers, monitoring).

Scenarios where AI must tightly integrate with existing business services (e.g., order processing).

Regulated industries (finance, healthcare, government) requiring audit logs, permission control, and explainability.

4.3 When to Avoid Spring AI

Simple prototypes or one‑off scripts where the overhead of a framework outweighs benefits.

Complex agent orchestration requiring the richer toolchain of LangChain4j.

Edge‑computing or highly resource‑constrained environments where the Spring footprint is too large.

Conclusion

For quick demos, hand‑written HTTP is sufficient. However, for enterprise‑grade, scalable AI integration, Spring AI offers a unified abstraction, deep Spring ecosystem integration, and robust engineering capabilities that turn Java into a competitive platform for AI applications.

Choosing the right approach ultimately depends on team expertise, project scale, and long‑term maintenance considerations. By 2026, Spring AI is reshaping the Java AI development landscape.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
