PostgresML Embeddings
Spring AI supports the PostgresML text embedding models.
Embeddings are numeric representations of text. They represent words and sentences as vectors, that is, arrays of numbers. Embeddings can be used to find similar pieces of text by comparing the numeric vectors with a distance measure, or as input features for other machine learning models, since most algorithms cannot use text directly.
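To make the idea of comparing vectors with a distance measure concrete, here is a minimal sketch in plain Java. It ranks candidate texts by Euclidean distance to a query vector. The class name, method names, and the tiny 3-dimensional vectors are made up for illustration; real embeddings come from a model and typically have hundreds of dimensions.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spring AI or PostgresML.
class NearestTextSketch {

    // Euclidean distance between two vectors of equal length.
    static double euclideanDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the candidate texts ordered from closest to farthest from the query vector.
    static List<String> rankByDistance(double[] query, Map<String, double[]> candidates) {
        return candidates.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> euclideanDistance(query, e.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for embeddings of three sentences.
        Map<String, double[]> candidates = new LinkedHashMap<>();
        candidates.put("stock prices fell", new double[] {0.0, 0.2, 0.9});
        candidates.put("a cat sat on the mat", new double[] {0.9, 0.1, 0.0});
        double[] query = {0.8, 0.2, 0.1}; // stands in for "a kitten on a rug"

        System.out.println(rankByDistance(query, candidates));
    }
}
```

A smaller distance means the texts are more alike under the model that produced the vectors. Other measures, such as cosine similarity, are used in the same way.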
Many pre-trained LLMs can be used to generate embeddings from text within PostgresML. You can browse the models available on Hugging Face to find the best fit for your use case.
Add Repositories and BOM
Spring AI artifacts are published in the Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.
To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the PostgresML Embedding Client.
To enable it, add the following dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-postgresml-spring-boot-starter</artifactId>
</dependency>
or add the following to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-postgresml-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Use the spring.ai.postgresml.embedding.options.* properties to configure your PostgresMlEmbeddingClient.
Embedding Properties
The prefix spring.ai.postgresml.embedding is the property prefix that configures the EmbeddingClient implementation for PostgresML embeddings.
Property | Description | Default
spring.ai.postgresml.embedding.enabled | Enable the PostgresML embedding client. | true
spring.ai.postgresml.embedding.options.transformer | The Hugging Face transformer model to use for the embedding. | distilbert-base-uncased
spring.ai.postgresml.embedding.options.kwargs | Additional transformer-specific options. | empty map
spring.ai.postgresml.embedding.options.vectorType | PostgresML vector type to use for the embedding. Two options are supported: PG_ARRAY and PG_VECTOR. | PG_ARRAY
spring.ai.postgresml.embedding.options.metadataMode | Document metadata aggregation mode. | EMBED
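All properties prefixed with spring.ai.postgresml.embedding.options can be overridden at run-time by adding a request-specific PostgresMlEmbeddingOptions to the EmbeddingRequest.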
EmbeddingOptions
Use PostgresMlEmbeddingOptions.java to configure the PostgresMlEmbeddingClient with options such as the transformer model to use.
At start-up, you can pass a PostgresMlEmbeddingOptions to the PostgresMlEmbeddingClient constructor to configure the default options used for all embedding requests.
At run-time, you can override the default options by passing a PostgresMlEmbeddingOptions in your EmbeddingRequest.
For example, to override the default model name for a specific request:
EmbeddingResponse embeddingResponse = embeddingClient.call(
    new EmbeddingRequest(List.of("Hello World", "World is big and salvation is near"),
        PostgresMlEmbeddingOptions.builder()
            .withTransformer("intfloat/e5-small")
            .withVectorType(VectorType.PG_ARRAY)
            .withKwargs(Map.of("device", "gpu"))
            .build()));
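The relationship between start-up defaults and per-request options can be pictured as a simple merge in which request values win. The sketch below is a conceptual illustration in plain Java, not Spring AI's actual merging code; the class name, the method name, and the use of string keys that mirror the property names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "defaults plus per-request overrides".
class OptionOverrideSketch {

    // Starts from the defaults and lets every non-null request option replace the default value.
    static Map<String, Object> effectiveOptions(Map<String, Object> defaults,
                                                Map<String, Object> requestOptions) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        requestOptions.forEach((key, value) -> {
            if (value != null) {
                merged.put(key, value);
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of(
                "transformer", "distilbert-base-uncased",
                "vectorType", "PG_ARRAY");
        // Only the transformer is overridden for this request.
        Map<String, Object> requestOptions = Map.of("transformer", "intfloat/e5-small");

        System.out.println(effectiveOptions(defaults, requestOptions));
    }
}
```

Options that the request does not set keep their default values, so a request only needs to name what it wants to change.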
Sample Controller (Auto-configuration)
The auto-configuration creates an EmbeddingClient implementation that you can inject into your class.
Here is an example of the application properties and a simple @RestController class that uses the EmbeddingClient implementation.
spring.ai.postgresml.embedding.options.transformer=distilbert-base-uncased
spring.ai.postgresml.embedding.options.vectorType=PG_ARRAY
spring.ai.postgresml.embedding.options.metadataMode=EMBED
spring.ai.postgresml.embedding.options.kwargs.device=cpu
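import java.util.List;
import java.util.Map;

import org.springframework.ai.embedding.EmbeddingClient;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class EmbeddingController {

    private final EmbeddingClient embeddingClient;

    @Autowired
    public EmbeddingController(EmbeddingClient embeddingClient) {
        this.embeddingClient = embeddingClient;
    }

    // Returns the embedding response for the "message" request parameter.
    @GetMapping("/ai/embedding")
    public Map<String, EmbeddingResponse> embed(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        EmbeddingResponse embeddingResponse = this.embeddingClient.embedForResponse(List.of(message));
        return Map.of("embedding", embeddingResponse);
    }
}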
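To try the endpoint, any HTTP client will do. The following sketch uses the JDK's java.net.http.HttpClient. The base URL http://localhost:8080 is an assumption (Spring Boot's default port); adjust it to wherever the application actually runs. The class and method names are made up for the example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /ai/embedding endpoint shown above.
class EmbeddingEndpointClientSketch {

    // Builds the request URI, URL-encoding the message query parameter.
    static URI embeddingUri(String baseUrl, String message) {
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/embedding?message=" + encoded);
    }

    // Sends a GET request and returns the raw JSON body of the response.
    static String fetchEmbeddingJson(String baseUrl, String message) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(embeddingUri(baseUrl, message)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Always print the URI; only call the server when a base URL is passed as an argument.
        System.out.println(embeddingUri("http://localhost:8080", "Hello World"));
        if (args.length > 0) {
            System.out.println(fetchEmbeddingJson(args[0], "Hello World"));
        }
    }
}
```

Running the class with the base URL as its first argument, while the application is up, prints the JSON that the controller returns.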
Manual configuration
Instead of using the Spring Boot auto-configuration, you can create the PostgresMlEmbeddingClient manually.
To do so, add the spring-ai-postgresml dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-postgresml</artifactId>
</dependency>
or add the following to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-postgresml'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Next, create a PostgresMlEmbeddingClient instance and use it to compute the embeddings of two input texts, which can then be compared for similarity:
var jdbcTemplate = new JdbcTemplate(dataSource); // your PostgresML data source

PostgresMlEmbeddingClient embeddingClient = new PostgresMlEmbeddingClient(jdbcTemplate,
        PostgresMlEmbeddingOptions.builder()
            .withTransformer("distilbert-base-uncased") // Hugging Face transformer model name.
            .withVectorType(VectorType.PG_VECTOR) // vector type in PostgreSQL.
            .withKwargs(Map.of("device", "cpu")) // optional arguments.
            .withMetadataMode(MetadataMode.EMBED) // Document metadata mode.
            .build());

embeddingClient.afterPropertiesSet(); // initialize the JDBC template and database.

EmbeddingResponse embeddingResponse = embeddingClient
        .embedForResponse(List.of("Hello World", "World is big and salvation is near"));
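The example above stops at the embedding response. To turn two embeddings into a single similarity score, one common choice is cosine similarity. The sketch below is plain Java and assumes the two vectors have already been read from the response as List<Double> values; how you extract them depends on your Spring AI version. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper, not part of Spring AI.
class CosineSimilaritySketch {

    // Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    static double cosineSimilarity(List<Double> a, List<Double> b) {
        if (a.isEmpty() || a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for the two embeddings returned by the client.
        List<Double> first = List.of(0.12, -0.40, 0.88);
        List<Double> second = List.of(0.10, -0.35, 0.91);

        System.out.println(cosineSimilarity(first, second));
    }
}
```

The score lies between -1 and 1, and values closer to 1 indicate that the two texts are more similar under the chosen transformer model.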
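When created manually, you must call afterPropertiesSet() after setting the properties and before using the client. Alternatively, declare the PostgresMlEmbeddingClient as a @Bean, so that Spring calls afterPropertiesSet() for you: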
@Bean
public EmbeddingClient embeddingClient(JdbcTemplate jdbcTemplate) {
    return new PostgresMlEmbeddingClient(jdbcTemplate,
            PostgresMlEmbeddingOptions.builder()
                ....
                .build());
}