Ollama Embeddings

With Ollama you can run various large language models (LLMs) locally and generate embeddings from them. Spring AI supports Ollama text embeddings through the `OllamaEmbeddingClient`.

An embedding is a vector (list) of floating-point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness, and large distances suggest low relatedness.
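To make the idea of distance concrete, here is a minimal sketch that computes the Euclidean distance between small vectors. The class name, method name, and three-dimensional toy vectors are invented for illustration; real embeddings have many more dimensions, and other measures (for example cosine distance) are also commonly used.

```java
public class EmbeddingDistance {

    /** Euclidean distance: 0 for identical vectors, larger as the vectors diverge. */
    static double euclideanDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Toy vectors standing in for embeddings of three texts (made-up values).
        double[] cat = {0.9, 0.1, 0.0};
        double[] kitten = {0.8, 0.2, 0.1};
        double[] car = {0.0, 0.2, 0.9};

        System.out.println("cat vs kitten: " + euclideanDistance(cat, kitten));
        System.out.println("cat vs car:    " + euclideanDistance(cat, car));
    }
}
```

With these toy values the first pair is closer than the second, which is the sense in which a smaller distance suggests higher relatedness.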

Prerequisites

You first need to run Ollama on your local machine.

Refer to the official Ollama project README to get started running models on your local machine.

Note that running `ollama run llama2` downloads the llama2 model, which is roughly 4GB.

Add Repositories and BOM

Spring AI artifacts are published in the Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) that ensures a consistent version of Spring AI is used throughout the project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Ollama Embedding Client. To enable it, add the following dependency to your Maven pom.xml file:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

The spring.ai.ollama.embedding.options.* properties configure the default options used for all embedding requests. They are applied as the default options instance set through OllamaEmbeddingClient#withDefaultOptions().

Embedding Properties

The prefix spring.ai.ollama is the property prefix for configuring the connection to Ollama.

Property | Description | Default
spring.ai.ollama.base-url | Base URL where the Ollama API server is running. | http://localhost:11434

The prefix spring.ai.ollama.embedding is the property prefix that configures the EmbeddingClient implementation for Ollama. Model and tuning options sit under spring.ai.ollama.embedding.options.

Property | Description | Default
spring.ai.ollama.embedding.enabled | Enable Ollama embedding client. | true
spring.ai.ollama.embedding.model (DEPRECATED) | The name of the model to use. Deprecated: use spring.ai.ollama.embedding.options.model instead. | mistral
spring.ai.ollama.embedding.options.model | The name of the supported model to use. | mistral
spring.ai.ollama.embedding.options.numa | Whether to use NUMA. | false
spring.ai.ollama.embedding.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
spring.ai.ollama.embedding.options.num-batch | ??? | -
spring.ai.ollama.embedding.options.num-gqa | The number of GQA groups in the transformer layer. Required for some models; for example, it is 8 for llama2:70b. | -
spring.ai.ollama.embedding.options.num-gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. | -
spring.ai.ollama.embedding.options.main-gpu | ??? | -
spring.ai.ollama.embedding.options.low-vram | ??? | -
spring.ai.ollama.embedding.options.f16-kv | ??? | -
spring.ai.ollama.embedding.options.logits-all | ??? | -
spring.ai.ollama.embedding.options.vocab-only | ??? | -
spring.ai.ollama.embedding.options.use-mmap | ??? | -
spring.ai.ollama.embedding.options.use-mlock | ??? | -
spring.ai.ollama.embedding.options.embedding-only | ??? | -
spring.ai.ollama.embedding.options.rope-frequency-base | ??? | -
spring.ai.ollama.embedding.options.rope-frequency-scale | ??? | -
spring.ai.ollama.embedding.options.num-thread | Sets the number of threads to use during computation. By default, Ollama detects this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | -
spring.ai.ollama.embedding.options.num-keep | ??? | -
spring.ai.ollama.embedding.options.seed | Sets the random number seed to use for generation. Setting this to a specific number makes the model generate the same text for the same prompt. | 0
spring.ai.ollama.embedding.options.num-predict | Maximum number of tokens to predict when generating text (-1 = infinite generation, -2 = fill context). | 128
spring.ai.ollama.embedding.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) gives more diverse answers, while a lower value (e.g., 10) is more conservative. | 40
spring.ai.ollama.embedding.options.top-p | Works together with top-k. A higher value (e.g., 0.95) leads to more diverse text, while a lower value (e.g., 0.5) generates more focused and conservative text. | 0.9
spring.ai.ollama.embedding.options.tfs-z | Tail-free sampling reduces the impact of less probable tokens on the output. A higher value (e.g., 2.0) reduces the impact more, while a value of 1.0 disables this setting. | 1
spring.ai.ollama.embedding.options.typical-p | ??? | -
spring.ai.ollama.embedding.options.repeat-last-n | Sets how far back the model looks to prevent repetition (0 = disabled, -1 = num_ctx). | 64
spring.ai.ollama.embedding.options.temperature | The temperature of the model. Increasing the temperature makes the model answer more creatively. | 0.8
spring.ai.ollama.embedding.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) penalizes repetitions more strongly, while a lower value (e.g., 0.9) is more lenient. | 1.1
spring.ai.ollama.embedding.options.presence-penalty | ??? | -
spring.ai.ollama.embedding.options.frequency-penalty | ??? | -
spring.ai.ollama.embedding.options.mirostat | Enable Mirostat sampling for controlling perplexity (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0). | 0
spring.ai.ollama.embedding.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value results in more focused and coherent text. | 5.0
spring.ai.ollama.embedding.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate results in slower adjustments, while a higher learning rate makes the algorithm more responsive. | 0.1
spring.ai.ollama.embedding.options.penalize-newline | ??? | -
spring.ai.ollama.embedding.options.stop | Sets the stop sequences to use. When this pattern is encountered, the LLM stops generating text and returns. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -

The spring.ai.ollama.embedding.options.* properties are based on the Ollama Valid Parameters and Values and the Ollama Types.

All properties prefixed with spring.ai.ollama.embedding.options can be overridden at runtime by adding request-specific Embedding Options to the EmbeddingRequest call.
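As an illustration of how the properties listed above fit together, a Spring Boot application.properties file might contain entries like the following. The property names and the base URL, model, and context-size values are the ones given in the tables; the num-thread value is an arbitrary example, not a recommendation.

```properties
# Connection to the local Ollama server (this is the documented default)
spring.ai.ollama.base-url=http://localhost:11434

# Default options applied to every embedding request
spring.ai.ollama.embedding.options.model=mistral
spring.ai.ollama.embedding.options.num-ctx=2048
# Example value only: set to the number of physical CPU cores on your machine
spring.ai.ollama.embedding.options.num-thread=8
```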

Embedding Options

OllamaOptions.java provides the Ollama configuration, such as the model to use and the low-level GPU and CPU tuning.

The default options can also be configured using the spring.ai.ollama.embedding.options properties.

At start time, use OllamaEmbeddingClient#withDefaultOptions() to configure the default options used for all embedding requests. At run time, you can override the default options by passing an OllamaOptions instance as part of your EmbeddingRequest.

For example, to override the default model name for a specific request:

EmbeddingResponse embeddingResponse = embeddingClient.call(
    new EmbeddingRequest(List.of("Hello World", "World is big and salvation is near"),
        OllamaOptions.create()
            .withModel("Different-Embedding-Model-Deployment-Name")));
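The precedence described here, where request-specific options win over the start-time defaults, can be pictured as a simple map merge. The sketch below is only an illustration of that precedence rule and is not Spring AI's actual merging code; the class name, method name, and option keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionsMergeSketch {

    /**
     * Returns the defaults with every non-null request option layered on top.
     * Neither input map is modified.
     */
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> requestOptions) {
        Map<String, Object> merged = new HashMap<>(defaults);
        requestOptions.forEach((key, value) -> {
            if (value != null) {
                merged.put(key, value); // a request-level value overrides the default
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("model", "mistral", "num-thread", 4);
        Map<String, Object> perRequest = Map.of("model", "another-embedding-model");

        // The model comes from the request; num-thread falls back to the default.
        System.out.println(merge(defaults, perRequest));
    }
}
```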

Sample Controller (Auto-configuration)

The auto-configuration creates an EmbeddingClient implementation that you can inject into your class. Here is an example of a simple @RestController class that uses the EmbeddingClient implementation.

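import java.util.List;
import java.util.Map;

import org.springframework.ai.embedding.EmbeddingClient;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class EmbeddingController {

    private final EmbeddingClient embeddingClient;

    // The auto-configured EmbeddingClient is injected through the constructor.
    @Autowired
    public EmbeddingController(EmbeddingClient embeddingClient) {
        this.embeddingClient = embeddingClient;
    }

    // Returns the embedding response for the "message" request parameter.
    @GetMapping("/ai/embedding")
    public Map<String, EmbeddingResponse> embed(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        EmbeddingResponse embeddingResponse = this.embeddingClient.embedForResponse(List.of(message));
        return Map.of("embedding", embeddingResponse);
    }
}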

Manual Configuration

If you are not using Spring Boot, you can manually configure the OllamaEmbeddingClient. To do so, add the spring-ai-ollama dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

The spring-ai-ollama dependency also provides access to the OllamaChatClient. For more information about the OllamaChatClient, refer to the Ollama Chat Client section.

Next, create an OllamaEmbeddingClient instance and use it to compute the embeddings for two input texts:

// Uses the default Ollama base URL.
var ollamaApi = new OllamaApi();

// Default options applied to every embedding request made by this client.
var embeddingClient = new OllamaEmbeddingClient(ollamaApi)
    .withDefaultOptions(OllamaOptions.create()
        .withModel(OllamaOptions.DEFAULT_MODEL));

EmbeddingResponse embeddingResponse = embeddingClient
    .embedForResponse(List.of("Hello World", "World is big and salvation is near"));

OllamaOptions provides the configuration information for all embedding requests.
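A similarity score between the two input texts can then be derived from the returned vectors. The following is a minimal sketch using cosine similarity. It assumes each embedding is available as a List<Double> (for example, from something like embeddingResponse.getResults().get(i).getOutput(), which may differ between Spring AI versions); the class name and the stand-in vectors are hypothetical.

```java
import java.util.List;

public class EmbeddingSimilarity {

    /** Cosine similarity in [-1, 1]; values near 1 mean the vectors point the same way. */
    static double cosineSimilarity(List<Double> a, List<Double> b) {
        if (a.isEmpty() || a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Stand-in values; in practice these would be the two vectors returned by the client.
        List<Double> first = List.of(0.1, 0.3, 0.5);
        List<Double> second = List.of(0.2, 0.1, 0.4);
        System.out.printf("cosine similarity = %.4f%n", cosineSimilarity(first, second));
    }
}
```

Cosine similarity ignores vector length and compares direction only, which is a common choice for text embeddings; a distance-based measure would work as well.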