Getting Started with Spring Cloud Alibaba AI: Integrating Text, Image, and Audio Models in a Spring Boot Application
This tutorial introduces Spring AI and Spring Cloud Alibaba AI, explains their core features, shows how to set up a Maven project with the required dependencies, and provides step‑by‑step code examples for invoking text, image, and audio generation models using Spring Boot.
Spring AI draws inspiration from Python projects like LangChain and LlamaIndex, aiming to make generative AI applications available across many programming languages, not just Python. Its core capabilities include a portable abstraction over AI providers, simplified AI development, support for multiple model types and vector stores, and Spring Boot auto-configuration.
Spring Cloud Alibaba AI extends Spring AI with native support for Chinese large models such as Alibaba's Tongyi series, offering adapters for chat, text‑to‑image, and text‑to‑speech use cases, along with example projects.
Setup
Create a Maven project using JDK 17 and add the following dependencies:
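
<!-- Under <dependencyManagement><dependencies>: import the Spring Cloud Alibaba BOM -->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-alibaba-dependencies</artifactId>
    <version>2023.0.1.0</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<!-- Under <dependencies>: the AI starter; its version comes from the BOM -->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-ai</artifactId>
</dependency>

Configure the Tongyi API key in application.yml: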

server:
  port: 8080
spring:
  application:
    name: alibaba-spring-ai-demo
  cloud:
    ai:
      tongyi:
        api-key: your-api-key

Create the main Spring Boot class:
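
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class MyAiApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyAiApplication.class, args);
    }
}

Text Model Integration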
Define a REST controller at /ai/simple that delegates to a TongYiService implementation:
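
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/ai")
@CrossOrigin
public class TongYiController {

    // Select the text implementation by its default bean name.
    @Autowired
    @Qualifier("tongYiSimpleServiceImpl")
    private TongYiService tongYiSimpleService;

    // Default prompt: "Where should Java developers go in the AI era?"
    @GetMapping("/simple")
    public String completion(@RequestParam(value = "message", defaultValue = "AI时代下Java开发者该何去何从?") String message) {
        return tongYiSimpleService.completion(message);
    }
}

The service interface declares methods for text completion, image generation, and audio synthesis: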

import org.springframework.ai.image.ImageResponse;

public interface TongYiService {

    /** Basic Q&A */
    String completion(String message);

    /** Text-to-image */
    ImageResponse genImg(String imgPrompt);

    /** Speech synthesis */
    String genAudio(String text);
}

The implementation receives Spring AI's ChatClient and StreamingChatClient through constructor injection:
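
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.StreamingChatClient;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class TongYiSimpleServiceImpl extends AbstractTongYiServiceImpl {

    private final ChatClient chatClient;
    // Injected for streaming responses; not used by the blocking call below.
    private final StreamingChatClient streamingChatClient;

    @Autowired
    public TongYiSimpleServiceImpl(ChatClient chatClient, StreamingChatClient streamingChatClient) {
        this.chatClient = chatClient;
        this.streamingChatClient = streamingChatClient;
    }

    @Override
    public String completion(String message) {
        // Wrap the user text in a Prompt and return the text of the first result.
        Prompt prompt = new Prompt(new UserMessage(message));
        return chatClient.call(prompt).getResult().getOutput().getContent();
    }
}

Sending the default prompt “AI时代下Java开发者该何去何从?” (“Where should Java developers go in the AI era?”) to /ai/simple returns a generated answer after roughly 10 seconds.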
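The article does not show AbstractTongYiServiceImpl. One plausible shape, sketched here in plain Java with no Spring dependencies, is a base class that rejects every operation by default, so each concrete service overrides only the method it supports. All names in the sketch (GenService, AbstractGenService, EchoTextService) are hypothetical stand-ins rather than types from the example project, and String replaces ImageResponse to keep the code self-contained.

```java
// Hypothetical sketch: none of these types come from Spring AI or the article's project.
final class ServiceLayeringSketch {

    /** Stand-in for TongYiService; String replaces ImageResponse to avoid external types. */
    interface GenService {
        String completion(String message);
        String genImg(String imgPrompt);
        String genAudio(String text);
    }

    /** Assumed shape of the base class: every operation is unsupported unless overridden. */
    static abstract class AbstractGenService implements GenService {
        @Override
        public String completion(String message) {
            throw new UnsupportedOperationException("completion is not supported");
        }

        @Override
        public String genImg(String imgPrompt) {
            throw new UnsupportedOperationException("genImg is not supported");
        }

        @Override
        public String genAudio(String text) {
            throw new UnsupportedOperationException("genAudio is not supported");
        }
    }

    /** Text-only service: overrides completion and inherits the rejecting defaults. */
    static final class EchoTextService extends AbstractGenService {
        @Override
        public String completion(String message) {
            // A real implementation would call the chat model here.
            return "echo: " + message;
        }
    }

    public static void main(String[] args) {
        GenService service = new EchoTextService();
        System.out.println(service.completion("hello"));
        try {
            service.genImg("a boy coding");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this layout, a controller can hold several GenService references, one per implementation, and pick the right one by bean name as the @Qualifier in the controller does.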
Image Generation Model
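The image service creates an ImagePrompt and calls the ImageClient:

import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.image.ImageClient;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class TongYiImagesServiceImpl extends AbstractTongYiServiceImpl {

    private final ImageClient imageClient;

    @Autowired
    public TongYiImagesServiceImpl(ImageClient client) {
        this.imageClient = client;
    }

    @Override
    public ImageResponse genImg(String imgPrompt) {
        // Wrap the text description in an ImagePrompt and let the model render it.
        var prompt = new ImagePrompt(imgPrompt);
        return imageClient.call(prompt);
    }
}

Testing with the prompt “Painting a boy coding in front of the desk, with his dog.” produces a high-quality image, as shown in the original screenshots.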
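Spring AI's image output type exposes both a URL and a Base64 payload (getUrl() and getB64Json()); the article does not say which one the Tongyi adapter fills in. Assuming a Base64 payload is available, a minimal sketch of writing it to disk could look like this. ImageFiles and its methods are hypothetical helpers, not part of Spring AI, and the sketch uses only the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: decodes a Base64 image payload and writes it to a file.
final class ImageFiles {
    private ImageFiles() {}

    /** Decodes a Base64 string into raw image bytes. */
    static byte[] decode(String b64) {
        if (b64 == null || b64.isBlank()) {
            throw new IllegalArgumentException("empty Base64 payload");
        }
        return Base64.getDecoder().decode(b64.strip());
    }

    /** Writes the decoded bytes to target, creating parent directories if needed. */
    static Path saveBase64(String b64, Path target) {
        byte[] bytes = decode(b64);
        try {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            return Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException("could not write image file", e);
        }
    }
}
```

A controller endpoint could call such a helper with the payload taken from the ImageResponse result, or skip the file entirely and return the URL when only that field is populated.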
Audio Synthesis Model
The audio service uses SpeechClient to synthesize WAV audio from text:
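
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
// SpeechClient (Spring Cloud Alibaba AI) and SpeechSynthesisAudioFormat (DashScope SDK)
// must also be imported; their packages depend on the library version in use.

@Service
@Slf4j
public class TongYiAudioSimpleServiceImpl extends AbstractTongYiServiceImpl {

    private final SpeechClient speechClient;

    @Autowired
    public TongYiAudioSimpleServiceImpl(SpeechClient client) {
        this.speechClient = client;
    }

    @Override
    public String genAudio(String text) {
        log.info("gen audio prompt is: {}", text);
        var resWAV = speechClient.call(text);
        // save the WAV file locally (code omitted)
        return save(resWAV, SpeechSynthesisAudioFormat.WAV.getValue());
    }
}

The generated audio file plays correctly, confirming successful integration.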
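The save helper is omitted in the article. A minimal sketch of what such a helper might do is shown here, under two assumptions that the article does not confirm: that speechClient.call(text) yields a java.nio.ByteBuffer, and that the format value is a file extension such as "wav". AudioFiles, its save method, and the extra directory parameter are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper: persists synthesized audio bytes under a random file name.
final class AudioFiles {
    private AudioFiles() {}

    /**
     * Writes the readable bytes of audio to dir/<random-uuid>.<extension>
     * and returns the absolute path of the new file.
     */
    static String save(ByteBuffer audio, String extension, Path dir) {
        // duplicate() leaves the caller's buffer position untouched.
        ByteBuffer view = audio.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(UUID.randomUUID() + "." + extension);
            Files.write(target, bytes);
            return target.toAbsolutePath().toString();
        } catch (IOException e) {
            throw new UncheckedIOException("could not write audio file", e);
        }
    }
}
```

Returning the path as a String matches the genAudio signature in the service interface, so the controller can hand the location straight back to the caller.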
Experience Summary
Simplified Development: Spring AI abstracts away low‑level SDK calls, making complex AI features easier to maintain.
Response Time: Basic text Q&A takes around 10 seconds; performance can vary with model size and workload.
Model Selection: Spring Cloud Alibaba AI targets Tongyi models by default; using other providers requires additional configuration.
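Given the roughly 10-second response time noted in the list above, a caller may want to bound how long it waits for an answer. The sketch below shows one way to do that with the JDK's CompletableFuture; it is not taken from the article. TimedCompletion is a hypothetical helper, and the Function argument stands in for a blocking call such as tongYiSimpleService.completion.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper: bounds the wait for a blocking model call.
final class TimedCompletion {
    private TimedCompletion() {}

    /**
     * Runs model.apply(message) on a background thread and returns its result,
     * or fallback if no result arrives within timeout.
     * The background call is not cancelled on timeout; it keeps running
     * and its late result is discarded.
     */
    static String callWithTimeout(Function<String, String> model,
                                  String message,
                                  Duration timeout,
                                  String fallback) {
        return CompletableFuture
                .supplyAsync(() -> model.apply(message))
                .completeOnTimeout(fallback, timeout.toMillis(), TimeUnit.MILLISECONDS)
                .join();
    }
}
```

In the controller this could be used as TimedCompletion.callWithTimeout(tongYiSimpleService::completion, message, Duration.ofSeconds(30), "The model did not answer in time."), where the 30-second limit and the fallback text are placeholder values.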
Future work for Spring Cloud Alibaba AI includes support for VectorStore, Embedding, and ETL pipelines to enable richer RAG applications.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn