Ollama Chat
Prerequisites
You first need to run Ollama on your local machine. Refer to the official Ollama project README to get started running models locally.
Add Repositories and BOM
Spring AI artifacts are published in the Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.
To help with dependency management, Spring AI provides a BOM (bill of materials) that ensures a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Ollama Chat Client.
To enable it, add the following dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Chat Properties
The prefix spring.ai.ollama is the property prefix used to configure the connection to Ollama.
Property | Description | Default |
---|---|---|
spring.ai.ollama.base-url | Base URL where the Ollama API server is running. | http://localhost:11434 |
The prefix spring.ai.ollama.chat.options is the property prefix that configures the chat client implementation for Ollama.
Property | Description | Default |
---|---|---|
spring.ai.ollama.chat.enabled | Enable the Ollama chat client. | true |
spring.ai.ollama.chat.options.model | The name of the supported model to use. | mistral |
spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false |
spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048 |
spring.ai.ollama.chat.options.num-batch | ??? | 512 |
spring.ai.ollama.chat.options.num-gqa | The number of GQA groups in the transformer layer. Required for some models; for example, it is 8 for llama2:70b. | 1 |
spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. -1 here indicates that NumGPU should be set dynamically. | -1 |
spring.ai.ollama.chat.options.main-gpu | ??? | - |
spring.ai.ollama.chat.options.low-vram | ??? | false |
spring.ai.ollama.chat.options.f16-kv | ??? | true |
spring.ai.ollama.chat.options.logits-all | ??? | - |
spring.ai.ollama.chat.options.vocab-only | ??? | - |
spring.ai.ollama.chat.options.use-mmap | ??? | true |
spring.ai.ollama.chat.options.use-mlock | ??? | false |
spring.ai.ollama.chat.options.embedding-only | ??? | false |
spring.ai.ollama.chat.options.rope-frequency-base | ??? | 10000.0 |
spring.ai.ollama.chat.options.rope-frequency-scale | ??? | 1.0 |
spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | 0 |
spring.ai.ollama.chat.options.num-keep | ??? | 0 |
spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1 |
spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context) | -1 |
spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 40 |
spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | 0.9 |
spring.ai.ollama.chat.options.tfs-z | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | 1.0 |
spring.ai.ollama.chat.options.typical-p | ??? | 1.0 |
spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition. (0 = disabled, -1 = num_ctx) | 64 |
spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.8 |
spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | 1.1 |
spring.ai.ollama.chat.options.presence-penalty | ??? | 0.0 |
spring.ai.ollama.chat.options.frequency-penalty | ??? | 0.0 |
spring.ai.ollama.chat.options.mirostat | Enable Mirostat sampling for controlling perplexity. (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | 0 |
spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0 |
spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1 |
spring.ai.ollama.chat.options.penalize-newline | ??? | true |
spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | - |
The list of options for chat is to be reviewed. This issue will track progress.
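To make the interaction between temperature, top-k and top-p more concrete, here is a small plain-Java illustration of the general technique. It is a generic sketch of how such sampling filters are commonly described, not Ollama's actual implementation; the class `SamplingSketch` and its methods are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SamplingSketch {

    // Softmax with temperature: a lower temperature sharpens the distribution,
    // a higher temperature flattens it.
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) {
            max = Math.max(max, l);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Returns the token indices that survive top-k filtering followed by top-p
    // (nucleus) filtering, ordered from most to least probable.
    static List<Integer> candidates(double[] probs, int topK, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -probs[i]));

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int idx : order) {
            if (kept.size() >= topK) {
                break; // top-k limit reached
            }
            kept.add(idx);
            cumulative += probs[idx];
            if (cumulative >= topP) {
                break; // enough probability mass collected for top-p
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5, -1.0}; // toy values, not real model output
        double[] probs = softmax(logits, 0.8);
        System.out.println(candidates(probs, 40, 0.9));
    }
}
```

In this sketch, lowering `topK` or `topP` shrinks the candidate list, which corresponds to the "more conservative" behavior described in the table.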
Chat Options
OllamaOptions.java provides model configurations, such as the model to use, the temperature, etc.
On start-up, the default options can be configured with the OllamaChatClient(api, options) constructor or the spring.ai.ollama.chat.options.* properties.
At run-time you can override the default options by adding new, request-specific options to the Prompt call.
For example, to override the default model and temperature for a specific request:
ChatResponse response = chatClient.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.create()
            .withModel("llama2")
            .withTemperature(0.4f) // float literal, consistent with the other examples
    ));
In addition to the model-specific OllamaOptions, you can use a portable ChatOptions instance, created with ChatOptionsBuilder#builder().
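The precedence described above (start-up defaults, overridden per request) can be pictured as a simple map merge. The following is a conceptual sketch in plain Java, not Spring AI's actual implementation; the class `OptionMergeSketch` and the `merge` helper are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionMergeSketch {

    // Hypothetical helper: request-specific values win over start-up defaults.
    static Map<String, Object> merge(Map<String, Object> defaults,
                                     Map<String, Object> requestOptions) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        merged.putAll(requestOptions); // overrides keys present in both maps
        return merged;
    }

    public static void main(String[] args) {
        // Defaults, as they might be set through spring.ai.ollama.chat.options.*
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("model", "mistral");
        defaults.put("temperature", 0.7);
        defaults.put("top_k", 40);

        // Options attached to a single request
        Map<String, Object> perRequest = Map.of("model", "llama2", "temperature", 0.4);

        System.out.println(merge(defaults, perRequest));
    }
}
```

Options that the request does not mention (here `top_k`) keep their default values, while `model` and `temperature` take the request-specific ones.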
Sample Controller (Auto-configuration)
Create a new Spring Boot project and add the spring-ai-ollama-spring-boot-starter to your pom (or gradle) dependencies.
Add an application.properties file, under the src/main/resources directory, to enable and configure the Ollama Chat client:
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=mistral
spring.ai.ollama.chat.options.temperature=0.7
This will create an OllamaChatClient implementation that you can inject into your class.
Here is an example of a simple @RestController class that uses the chat client for text generation.
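import java.util.Map;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {
    private final OllamaChatClient chatClient;
    @Autowired
    public ChatController(OllamaChatClient chatClient) {
        this.chatClient = chatClient;
    }
    // Blocking call: returns the generated text in a small JSON object
    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", chatClient.call(message));
    }
    // Streaming call: emits partial responses as they arrive
    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return chatClient.stream(prompt);
    }
}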
Manual Configuration
If you don't want to use the Spring Boot auto-configuration, you can manually configure the OllamaChatClient in your application.
The OllamaChatClient implements the ChatClient and StreamingChatClient interfaces and uses the Low-level OllamaApi Client to connect to the Ollama service.
To use it, add the spring-ai-ollama dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Next, create an OllamaChatClient instance and use it for text generation requests:
var ollamaApi = new OllamaApi();

// Default options apply to every request made through this client
var chatClient = new OllamaChatClient(ollamaApi)
    .withDefaultOptions(OllamaOptions.create()
        .withModel(OllamaOptions.DEFAULT_MODEL)
        .withTemperature(0.9f));

ChatResponse response = chatClient.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming responses
Flux<ChatResponse> streamingResponse = chatClient.stream(
    new Prompt("Generate the names of 5 famous pirates."));
OllamaOptions provides the configuration information for all chat requests.
Low-level OllamaApi Client
OllamaApi is a lightweight Java client for the Ollama Chat Completion API.
The following class diagram illustrates the OllamaApi chat interfaces and building blocks:
Here is a simple snippet showing how to use the API programmatically:
OllamaApi ollamaApi =
    new OllamaApi("YOUR_HOST:YOUR_PORT");

// Sync request
var request = ChatRequest.builder("orca-mini")
    .withStream(false) // not streaming
    .withMessages(List.of(
        Message.builder(Role.SYSTEM)
            .withContent("You are a geography teacher. You are talking to a student.")
            .build(),
        Message.builder(Role.USER)
            .withContent("What is the capital of Bulgaria and what is the size? "
                + "What is the national anthem?")
            .build()))
    .withOptions(OllamaOptions.create().withTemperature(0.9f))
    .build();

ChatResponse response = ollamaApi.chat(request);

// Streaming request
var request2 = ChatRequest.builder("orca-mini")
    .withStream(true) // streaming
    .withMessages(List.of(Message.builder(Role.USER)
        .withContent("What is the capital of Bulgaria and what is the size? "
            + "What is the national anthem?")
        .build()))
    .withOptions(OllamaOptions.create().withTemperature(0.9f).toMap())
    .build();

Flux<ChatResponse> streamingResponse = ollamaApi.streamingChat(request2);
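For orientation, the request built above corresponds roughly to a JSON body posted to Ollama's /api/chat endpoint. The following plain-Java sketch builds such a body by hand and shows how it could be sent with the JDK HTTP client. It is an illustration under stated assumptions, not how OllamaApi is implemented: the class `RawOllamaChatSketch` and its helper methods are hypothetical, the JSON escaping is deliberately minimal, and the base URL in the comment assumes a local server on the default port.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawOllamaChatSketch {

    // Minimal JSON string escaping (backslashes, quotes, newlines only);
    // a real client would use a JSON library instead.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Builds a non-streaming chat request body with a single user message.
    static String chatBody(String model, String userContent, double temperature) {
        return "{\"model\":\"" + escape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escape(userContent) + "\"}],"
                + "\"stream\":false,"
                + "\"options\":{\"temperature\":" + temperature + "}}";
    }

    // Posts the body to {baseUrl}/api/chat and returns the raw JSON response.
    static String send(String baseUrl, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        String body = chatBody("orca-mini", "What is the capital of Bulgaria?", 0.9);
        System.out.println(body);
        // Uncomment when an Ollama server is running locally (assumed address):
        // System.out.println(send("http://localhost:11434", body));
    }
}
```

Keeping body construction separate from the network call makes the payload easy to inspect without a running server.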