SpringAI

2025-07-15

Official quick start

@Bean
public CommandLineRunner runner(ChatClient.Builder builder) {
    return args -> {
        ChatClient chatClient = builder.build();
        String response = chatClient.prompt("Tell me a joke").call().content();
        System.out.println(response);
    };
}

Prompt Templates

The official implementation uses the OSS library StringTemplate:

Tell me a {adjective} joke about {content}.
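
A minimal sketch of rendering this template with Spring AI's PromptTemplate (org.springframework.ai.chat.prompt.PromptTemplate, backed by StringTemplate); chatClient is assumed to be the client built in the quick start above:

// Fill the placeholders and render the template into a Prompt.
PromptTemplate template = new PromptTemplate("Tell me a {adjective} joke about {content}");
Prompt prompt = template.create(Map.of("adjective", "funny", "content", "cows"));
String response = chatClient.prompt(prompt).call().content();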

Embeddings

Embeddings work by converting text, images, and video into arrays of floating-point numbers called vectors. These vectors are designed to capture the meaning of the text, image, or video. The length of the embedding array is called the vector's dimensionality.

By calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects that produced the embedding vectors.
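
A minimal sketch of that distance calculation, assuming an EmbeddingModel bean (Spring AI's org.springframework.ai.embedding.EmbeddingModel) has been configured; cosine similarity is one common choice of metric:

float[] a = embeddingModel.embed("The cat sits on the mat");
float[] b = embeddingModel.embed("A feline rests on a rug");

// Cosine similarity: values near 1.0 mean the two texts are semantically close.
double dot = 0, normA = 0, normB = 0;
for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
}
double similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));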

Tokens

Tokens are the building blocks of how AI models work. On input, a model converts words into tokens; on output, it converts tokens back into words.

In English, one token corresponds to roughly 75% of a word. As a reference point, the complete works of Shakespeare total about 900,000 words, which translates to roughly 1.2 million tokens.
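
As a quick check of that arithmetic:

// At roughly 0.75 words per token, tokens ≈ words / 0.75.
long words = 900_000;                            // complete works of Shakespeare
long estimatedTokens = Math.round(words / 0.75); // ≈ 1,200,000 tokens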

Structured Output

The output of an AI model traditionally arrives as a java.lang.String, even if you ask for the reply in JSON format. It may be valid JSON, but it is not a JSON data structure; it is just a string.

You craft the prompt to produce the intended output, then convert the resulting plain string into a data structure you can use for application integration.
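
A minimal sketch of that conversion using ChatClient's entity(...) support, covered in more detail under "Mapping model output to entities" below; the Joke record is hypothetical:

record Joke(String setup, String punchline) {}

// Spring AI appends format instructions to the prompt and converts
// the returned string into the target type.
Joke joke = chatClient.prompt("Tell me a joke")
        .call()
        .entity(Joke.class);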

Bringing your data to the model

Techniques for customizing an AI model to incorporate your data:

  • Fine-tuning: this traditional machine-learning technique tailors the model by changing its internal weights. It is a challenging process even for machine-learning experts and, given the size of models like GPT, extremely resource-intensive; some models may not offer the option at all.
  • Prompt stuffing: a more practical alternative is to embed your data in the prompt provided to the model. Given a model's token limit, this requires techniques for presenting the relevant data inside the model's context window. The approach is colloquially known as "stuffing the prompt" (see the sketch after this list). The Spring AI library helps you implement solutions based on prompt stuffing, also known as Retrieval-Augmented Generation (RAG).
  • Tool calling: this technique lets you register tools (user-defined services) that connect large language models to the APIs of external systems. Spring AI greatly simplifies the code you need to write to support tool calling.
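
A minimal sketch of prompt stuffing, assuming the relevant documents have already been retrieved (a real RAG pipeline would fetch them from a vector store based on the user's question); chatClient is the fluent client from the quick start:

List<String> retrievedDocs = List.of(
        "Our return policy allows returns within 30 days of purchase.",
        "Refunds are issued to the original payment method.");

// Stuff the retrieved context into the system message so the model
// answers from your data rather than its training data alone.
String answer = chatClient.prompt()
        .system("Answer using only the following context:\n"
                + String.join("\n", retrievedDocs))
        .user("How long do I have to return an item?")
        .call()
        .content();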


Spring AI Maven setup

https://docs.spring.io/spring-ai/reference/getting-started.html

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

API walkthrough

Chat models

ChatModel

public interface ChatModel extends Model<Prompt, ChatResponse> {

	default String call(String message) {...}

    @Override
	ChatResponse call(Prompt prompt);
}

StreamingChatModel

public interface StreamingChatModel extends StreamingModel<Prompt, ChatResponse> {

    default Flux<String> stream(String message) {...}

    @Override
	Flux<ChatResponse> stream(Prompt prompt);
}

Prompt

public class Prompt implements ModelRequest<List<Message>> {

    private final List<Message> messages;

    private ChatOptions modelOptions;

	@Override
	public ChatOptions getOptions() {...}

	@Override
	public List<Message> getInstructions() {...}

    // constructors and utility methods omitted
}

Content

public interface Content {

	String getText();

	Map<String, Object> getMetadata();
}

public interface Message extends Content {

	MessageType getMessageType();
}

public interface MediaContent extends Content {

	Collection<Media> getMedia();

}

ChatOptions

public interface ChatOptions extends ModelOptions {

	String getModel();
	Float getFrequencyPenalty();
	Integer getMaxTokens();
	Float getPresencePenalty();
	List<String> getStopSequences();
	Float getTemperature();
	Integer getTopK();
	Float getTopP();
	ChatOptions copy();

}

ChatResponse

public class ChatResponse implements ModelResponse<Generation> {

    private final ChatResponseMetadata chatResponseMetadata;
	private final List<Generation> generations;

	@Override
	public ChatResponseMetadata getMetadata() {...}

    @Override
	public List<Generation> getResults() {...}

    // other methods omitted
}

public class Generation implements ModelResult<AssistantMessage> {

	private final AssistantMessage assistantMessage;
	private ChatGenerationMetadata chatGenerationMetadata;

	@Override
	public AssistantMessage getOutput() {...}

	@Override
	public ChatGenerationMetadata getMetadata() {...}

    // other methods omitted
}
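
A minimal sketch of how these classes fit together when unwrapping a ChatResponse: getResult() returns the first Generation, whose output is the AssistantMessage carrying the model's text.

ChatResponse response = chatModel.call(new Prompt("Tell me a joke"));
String text = response.getResult().getOutput().getText();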

Mapping model output to entities

record ActorFilms(String actor, List<String> movies) {}

Mapping to a concrete type

ActorFilms actorFilms = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class);

Mapping to a generic type

List<ActorFilms> actorFilms = chatClient.prompt()
    .user("Generate the filmography of 5 movies for Tom Hanks and Bill Murray.")
    .call()
    .entity(new ParameterizedTypeReference<List<ActorFilms>>() {});

Prompt templates in ChatClient

String answer = ChatClient.create(chatModel).prompt()
    .user(u -> u
            .text("Tell me the names of 5 movies whose soundtrack was composed by {composer}")
            .param("composer", "John Williams"))
    .call()
    .content();

Custom template delimiters

String answer = ChatClient.create(chatModel).prompt()
    .user(u -> u
            .text("Tell me the names of 5 movies whose soundtrack was composed by <composer>")
            .param("composer", "John Williams"))
    .templateRenderer(StTemplateRenderer.builder().startDelimiterToken('<').endDelimiterToken('>').build())
    .call()
    .content();

Ollama-ai

Requires Ollama 0.2.8 or later.

Configuration properties

spring.ai.ollama.chat.options.model=hf.co/bartowski/gemma-2-2b-it-GGUF
spring.ai.ollama.init.pull-model-strategy=always

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=qwen2.5-coder:3b
spring.ai.ollama.chat.options.temperature=0.7

Dependency

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama</artifactId>
</dependency>

Basic usage

var ollamaApi = OllamaApi.builder().build();

var chatModel = OllamaChatModel.builder()
                    .ollamaApi(ollamaApi)
                    .defaultOptions(
                        OllamaOptions.builder()
                            .model(OllamaModel.MISTRAL)
                            .temperature(0.9)
                            .build())
                    .build();

ChatResponse response = this.chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming responses
Flux<ChatResponse> response = this.chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));

Runtime options (per-request overrides)

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));

Controller

@RestController
public class ChatController {

    private final OllamaChatModel chatModel;

    @Autowired
    public ChatController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String,String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }

}

Auto-pulling models

Spring AI's Ollama integration can automatically pull a model when it is not available in your Ollama instance. This feature is especially useful for development and testing, and when deploying applications to new environments.

There are three strategies for pulling models:

  • always (PullModelStrategy.ALWAYS): always pull the model, even if it is already available. Useful for making sure you are using the latest version of the model.
  • when_missing (PullModelStrategy.WHEN_MISSING): only pull the model if it is not already available. This may mean running an older version of the model.
  • never (PullModelStrategy.NEVER): never pull the model automatically.

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        timeout: 60s
        max-retries: 1
        chat:
          include: false # exclude models of this type from auto-pull

Pulling additional models

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        chat:
          additional-models:
            - llama3.2
            - qwen2.5

Multimodal calls

var imageResource = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage("Explain what do you see on this picture?",
        new Media(MimeTypeUtils.IMAGE_PNG, this.imageResource));

ChatResponse response = chatModel.call(new Prompt(this.userMessage,
        OllamaOptions.builder().model(OllamaModel.LLAVA).build()));

Structured output via the chat options builder

String jsonSchema = """
        {
            "type": "object",
            "properties": {
                "steps": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "explanation": { "type": "string" },
                            "output": { "type": "string" }
                        },
                        "required": ["explanation", "output"],
                        "additionalProperties": false
                    }
                },
                "final_answer": { "type": "string" }
            },
            "required": ["steps", "final_answer"],
            "additionalProperties": false
        }
        """;

Prompt prompt = new Prompt("how can I solve 8x + 7 = -23",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_2.getName())
            .format(new ObjectMapper().readValue(jsonSchema, Map.class))
            .build());

ChatResponse response = this.ollamaChatModel.call(this.prompt);

Integration with the BeanOutputConverter utility

record MathReasoning(
    @JsonProperty(required = true, value = "steps") Steps steps,
    @JsonProperty(required = true, value = "final_answer") String finalAnswer) {

    record Steps(
        @JsonProperty(required = true, value = "items") Items[] items) {

        record Items(
            @JsonProperty(required = true, value = "explanation") String explanation,
            @JsonProperty(required = true, value = "output") String output) {
        }
    }
}

var outputConverter = new BeanOutputConverter<>(MathReasoning.class);

Prompt prompt = new Prompt("how can I solve 8x + 7 = -23",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_2.getName())
            .format(outputConverter.getJsonSchemaMap())
            .build());

ChatResponse response = this.ollamaChatModel.call(this.prompt);
String content = this.response.getResult().getOutput().getText();

MathReasoning mathReasoning = this.outputConverter.convert(this.content);

Programmatic use of the low-level OllamaApi

OllamaApi ollamaApi = new OllamaApi("YOUR_HOST:YOUR_PORT");

// Sync request
var request = ChatRequest.builder("orca-mini")
    .stream(false) // not streaming
    .messages(List.of(
            Message.builder(Role.SYSTEM)
                .content("You are a geography teacher. You are talking to a student.")
                .build(),
            Message.builder(Role.USER)
                .content("What is the capital of Bulgaria and what is the size? "
                        + "What is the national anthem?")
                .build()))
    .options(OllamaOptions.builder().temperature(0.9).build())
    .build();

ChatResponse response = this.ollamaApi.chat(this.request);

// Streaming request
var request2 = ChatRequest.builder("orca-mini")
    .stream(true) // streaming
    .messages(List.of(Message.builder(Role.USER)
        .content("What is the capital of Bulgaria and what is the size? " + "What is the national anthem?")
        .build()))
    .options(OllamaOptions.builder().temperature(0.9).build().toMap())
    .build();

Flux<ChatResponse> streamingResponse = this.ollamaApi.streamingChat(this.request2);

Working with multiple models

import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}

Selecting a model from the console

@Configuration
public class ChatClientExample {

    @Bean
    CommandLineRunner cli(
            @Qualifier("openAiChatClient") ChatClient openAiChatClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicChatClient) {

        return args -> {
            var scanner = new Scanner(System.in);
            ChatClient chat;

            // Model selection
            System.out.println("\nSelect your AI model:");
            System.out.println("1. OpenAI");
            System.out.println("2. Anthropic");
            System.out.print("Enter your choice (1 or 2): ");

            String choice = scanner.nextLine().trim();

            if (choice.equals("1")) {
                chat = openAiChatClient;
                System.out.println("Using OpenAI model");
            } else {
                chat = anthropicChatClient;
                System.out.println("Using Anthropic model");
            }

            // Use the selected chat client
            System.out.print("\nEnter your question: ");
            String input = scanner.nextLine();
            String response = chat.prompt(input).call().content();
            System.out.println("ASSISTANT: " + response);

            scanner.close();
        };
    }
}

Using different API endpoints

@Service
public class MultiModelService {

    private static final Logger logger = LoggerFactory.getLogger(MultiModelService.class);

    @Autowired
    private OpenAiChatModel baseChatModel;

    @Autowired
    private OpenAiApi baseOpenAiApi;

    public void multiClientFlow() {
        try {
            // Derive a new OpenAiApi for Groq (Llama3)
            OpenAiApi groqApi = baseOpenAiApi.mutate()
                .baseUrl("https://api.groq.com/openai")
                .apiKey(System.getenv("GROQ_API_KEY"))
                .build();

            // Derive a new OpenAiApi for OpenAI GPT-4
            OpenAiApi gpt4Api = baseOpenAiApi.mutate()
                .baseUrl("https://api.openai.com")
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();

            // Derive a new OpenAiChatModel for Groq
            OpenAiChatModel groqModel = baseChatModel.mutate()
                .openAiApi(groqApi)
                .defaultOptions(OpenAiChatOptions.builder().model("llama3-70b-8192").temperature(0.5).build())
                .build();

            // Derive a new OpenAiChatModel for GPT-4
            OpenAiChatModel gpt4Model = baseChatModel.mutate()
                .openAiApi(gpt4Api)
                .defaultOptions(OpenAiChatOptions.builder().model("gpt-4").temperature(0.7).build())
                .build();

            // Simple prompt for both models
            String prompt = "What is the capital of France?";

            String groqResponse = ChatClient.builder(groqModel).build().prompt(prompt).call().content();
            String gpt4Response = ChatClient.builder(gpt4Model).build().prompt(prompt).call().content();

            logger.info("Groq (Llama3) response: {}", groqResponse);
            logger.info("OpenAI GPT-4 response: {}", gpt4Response);
        }
        catch (Exception e) {
            logger.error("Error in multi-client flow", e);
        }
    }
}