Hibernate Search 中文操作指南

18. Elasticsearch backend

18.1. Compatibility

18.1.1. Overview

Hibernate Search 的 Elasticsearch 后端与 Elasticsearch 的多个发行版兼容:

Hibernate Search’s Elasticsearch backend is compatible with multiple distributions of Elasticsearch:

有关 Hibernate Search 的哪些版本与 Elasticsearch/OpenSearch 的特定版本兼容,请参考 compatibility matrix

For information about which versions of Hibernate Search are compatible with a given version of Elasticsearch/OpenSearch, refer to the compatibility matrix.

有关 Hibernate Search 的哪些未来版本应该会保留与 Elasticsearch/OpenSearch 当前兼容版本的兼容性,请参考 compatibility policy

For information about which future versions of Hibernate Search you can expect to retain compatibility with currently compatible versions of Elasticsearch/OpenSearch, refer to the compatibility policy.

如果可能,集群上运行的分发和版本会在启动时自动检测,而 Hibernate Search 会根据该信息进行调整。

Where possible, the distribution and version running on your cluster will be automatically detected on startup, and Hibernate Search will adapt based on that.

使用 Amazon OpenSearch Serverless,或者当您的群集在启动时不可用时,您将必须明确配置 Hibernate Search 应该期望的版本:有关详细信息,请参见 Version compatibility

With Amazon OpenSearch Serverless, or when your cluster is not available on startup, you will have to configure the version Hibernate Search should expect explicitly: see Version compatibility for details.

目标版本对于 Hibernate Search 用户而言大多是透明的,但是 Hibernate Search 根据可能影响你的 Elasticsearch 发行版和版本的行为存在一些差异。以下章节详细介绍了这些差异。

The targeted version is mostly transparent to Hibernate Search users, but there are a few differences in how Hibernate Search behaves depending on the Elasticsearch distribution and version that may affect you. The following sections detail those differences.

18.1.2. Elasticsearch

Hibernate Search 的 Elasticsearch 后端与运行版本 7.10+ 或 8.x 的 Elasticsearch 集群兼容,并针对版本 7.10、7.17 或 8.14 定期进行测试。

Hibernate Search’s Elasticsearch backend is compatible with Elasticsearch clusters running version 7.10+ or 8.x and regularly tested against versions 7.10, 7.17 or 8.14.

目前,使用 Elasticsearch 不需要特定的配置且不会暗示特定的限制。

Using Elasticsearch currently doesn’t require specific configuration and doesn’t imply specific limitations.

18.1.3. OpenSearch

Hibernate Search 的 Elasticsearch 后端与运行版本 1.3 或 2.x 的 OpenSearch 集群兼容,并针对版本 1.3 或 2.14 定期进行测试。

Hibernate Search’s Elasticsearch backend is compatible with OpenSearch clusters running version 1.3 or 2.x and regularly tested against versions 1.3 or 2.14.

目前使用 OpenSearch 无需特定配置。Hibernate Search 在将 knn predicate 与 OpenSearch 一起使用时会施加一些限制。这些限制源自 OpenSearch 的功能可用性,更多详情请参阅 this section of the documentation。

Using OpenSearch currently doesn’t require specific configuration. There are some limitations applied by Hibernate Search when it comes to using a knn predicate with OpenSearch. These limitations come from OpenSearch’s feature availability; see this section of the documentation for more details.

18.1.4. Amazon OpenSearch Service

Hibernate Search 的 Elasticsearch 后端与 Amazon OpenSearch Service 兼容,并针对关键版本定期进行测试。

Hibernate Search’s Elasticsearch backend is compatible with Amazon OpenSearch Service and regularly tested against key versions.

使用亚马逊 OpenSearch 服务需要 proprietary authentication,其中涉及额外的配置。

Using Amazon OpenSearch Service requires proprietary authentication that involves extra configuration.

使用亚马逊 OpenSearch 服务意味着一项限制:在运行 Elasticsearch(而非 OpenSearch)并且仅在 7.1 或更低版本时,closing indexes is not possible,因此 automatic schema updates(not recommended in production)在尝试更新分析器定义时将失败。

Using Amazon OpenSearch Service implies a single limitation: when running Elasticsearch (not OpenSearch), and only in version 7.1 or older, closing indexes is not possible; as a result, automatic schema updates (not recommended in production) will fail when trying to update analyzer definitions.

18.1.5. Amazon OpenSearch Serverless (incubating)

以下列出的特性尚处于 incubating 阶段:它们仍在积极开发中。

Features detailed below are incubating: they are still under active development.

通常 compatibility policy 不适用:孵化元素(例如类型、方法、配置属性等)的契约在后续版本中可能会以向后不兼容的方式更改,甚至可能被移除。

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

我们建议您使用孵化特性,以便开发团队可以收集反馈并对其进行改进,但在需要时您应做好更新依赖于这些特性的代码的准备。

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

Amazon OpenSearch Serverless 兼容性已实现并处于孵化中;随时对 HSEARCH-4867 提供反馈。

Amazon OpenSearch Serverless compatibility is implemented and incubating; feel free to provide feedback on HSEARCH-4867.

但是,请注意:

However, be aware that:

  1. Hibernate Search does not currently get tested against Amazon OpenSearch Serverless; see HSEARCH-4919.

  2. Connecting to an Amazon OpenSearch Serverless cluster requires proprietary authentication that involves extra configuration.

  3. Compatibility with Amazon OpenSearch Serverless must be enabled explicitly by setting hibernate.search.backend.version to amazon-opensearch-serverless.

此外, Amazon OpenSearch Serverless也有其自身的特定限制:

Also, Amazon OpenSearch Serverless has its own, specific limitations:

  1. Closing indexes is not possible, and as a result automatic schema updates (not recommended in production) will fail when trying to update analyzer definitions.

  2. Distribution/version detection on startup is not possible, so it is disabled by default and cannot be enabled explicitly.

  3. Minimal index status requirement for schema management is not possible, so it is disabled by default and cannot be enabled explicitly.

  4. Purging, flushing, refreshing, or merging segments is not possible, so attempts to perform these operations explicitly will always fail.

  5. The mass indexer will fail if you attempt a purge on start (the default), because Amazon OpenSearch Serverless doesn’t support it. Use .dropAndCreateSchemaOnStart(…​) to drop the indexes on start instead. See HSEARCH-4930.

  6. The mass indexer will skip the flush, refresh and merge-segments operations by default, and attempting to enable them explicitly will result in failures, because Amazon OpenSearch Serverless doesn’t support them.

  7. The Jakarta Batch integration is not currently supported. See HSEARCH-4929, HSEARCH-4930.

18.1.6. Upgrading Elasticsearch

升级 Elasticsearch 集群后,您的集群上仍需要某些操作:Hibernate Search 不会处理这些操作。

When upgrading your Elasticsearch cluster, some administrative tasks are still required on your cluster: Hibernate Search will not take care of those.

除此之外,Elasticsearch 的一些版本之间可能存在一些基本差异。请参阅 Elasticsearch 文档和迁移指南,以识别任何不兼容的模式更改。

On top of that, there might be some fundamental differences between some versions of Elasticsearch. Please refer to the Elasticsearch documentation and migration guides to identify any incompatible schema changes.

在上述情况下,最简单的升级方式是手动删除索引,让 Hibernate Search 重新创建索引及其架构并 reindex your data

In such cases, the easiest way to upgrade is to delete your indexes manually, make Hibernate Search re-create the indexes along with their schema, and reindex your data.

18.2. Basic configuration

Elasticsearch 后端的全部配置属性都是可选的,但默认值可能并不适合所有人。尤其是您的生产 Elasticsearch 集群可能无法在 http://localhost:9200 访问,因此您需要通过 configuring the client 设置集群的地址。

All configuration properties of the Elasticsearch backend are optional, but the defaults might not suit everyone. In particular your production Elasticsearch cluster is probably not reachable at http://localhost:9200, so you will need to set the address of your cluster by configuring the client.

相关配置属性在本文档的相关部分中有所提及。您可以在 the Elasticsearch backend configuration properties appendix中找到可用属性的完整参考。

Configuration properties are mentioned in the relevant parts of this documentation. You can find a full reference of available properties in the Elasticsearch backend configuration properties appendix.

18.3. Configuration of the Elasticsearch cluster

多数情况下,Hibernate Search 不需要手动向 Elasticsearch 集群应用任何特定配置,除了 can be automatically generated 的索引映射(架构)之外。

Most of the time, Hibernate Search does not require any specific configuration to be applied by hand to the Elasticsearch cluster, beyond the index mapping (schema) which can be automatically generated.

唯一的例外是 Sharding,需要显式启用。

The only exception is Sharding, which needs to be enabled explicitly.

18.4. Client configuration

Elasticsearch 后端通过 REST 客户端与 Elasticsearch 集群通信。以下选项会影响此客户端。

An Elasticsearch backend communicates with an Elasticsearch cluster through a REST client. Below are the options that affect this client.

18.4.1. Target hosts

以下属性配置了用于发送索引请求和搜索查询的 Elasticsearch 主机(一个或多个):

The following property configures the Elasticsearch host (or hosts) to send indexing requests and search queries to:

hibernate.search.backend.hosts = localhost:9200

此属性的默认值是 localhost:9200

The default for this property is localhost:9200.

此属性可以设置为表示主机和端口的字符串,例如 localhost 或 es.mycompany.com:4400,或包含多个以逗号分隔的此类主机和端口字符串的字符串,或包含此类主机和端口字符串的 Collection<String>。

This property may be set to a String representing a host and port such as localhost or es.mycompany.com:4400, or a String containing multiple such host-and-port strings separated by commas, or a Collection<String> containing such host-and-port strings.
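For example, assuming a two-node cluster (the hostnames below are placeholders), the comma-separated form could look like this:

```properties
# Hypothetical hostnames; replace with your actual nodes
hibernate.search.backend.hosts = es1.mycompany.com:9200,es2.mycompany.com:9200
```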

您可以使用此配置属性更改用于与主机通信的协议:

You may change the protocol used to communicate with the hosts using this configuration property:

hibernate.search.backend.protocol = http

此属性的默认值是 http

The default for this property is http.

此属性可以设置为 http 或 https。

This property may be set to either http or https.

或者,这两种协议和主机都可以使用单个属性定义为一个或多个 URI:

Alternatively, it is possible to define both the protocol and hosts as one or more URIs using a single property:

hibernate.search.backend.uris = http://localhost:9200

此属性可以设置为表示 URI 的字符串,例如 http://localhost 或 https://es.mycompany.com:4400,或包含多个以逗号分隔的此类 URI 字符串的字符串,或包含此类 URI 字符串的 Collection<String>。

This property may be set to a String representing a URI such as http://localhost or https://es.mycompany.com:4400, or a String containing multiple such URI strings separated by commas, or a Collection<String> containing such URI strings.

对于使用此属性有一些限制:

There are some constraints regarding the use of this property:

所有 URI 必须使用相同的协议。

All the URIs must use the same protocol.

如果设置了 hosts 或 protocol,则无法使用此属性。

This property cannot be used if hosts or protocol are set.

提供的 URI 列表不得为空。

The provided list of URIs must not be empty.
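Putting these constraints together, a hypothetical two-node TLS setup configured through uris might look like the following (hostnames are placeholders):

```properties
# Both URIs must use the same protocol (here https)
hibernate.search.backend.uris = https://es1.mycompany.com:9243,https://es2.mycompany.com:9243
```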

18.4.2. Path prefix

默认情况下,REST API 预计在根路径 (/) 下可用。例如,一个面向所有索引的搜索查询会被发送到路径 /_search。这正是标准 Elasticsearch 设置所需的。

By default, the REST API is expected to be available at the root path (/). For example, a search query targeting all indexes will be sent to path /_search. This is what you need for a standard Elasticsearch setup.

如果你的设置非标准,例如因为应用程序和 Elasticsearch 集群之间存在非透明的代理,你可以使用与此类似的配置:

If your setup is non-standard, for example because of a non-transparent proxy between the application and the Elasticsearch cluster, you can use a configuration similar to this:

hibernate.search.backend.path_prefix = my/path

通过以上方式,面向所有索引的搜索查询将被发送到 /my/path/_search 路径,而不是 /_search 路径。对于发送到 Elasticsearch 的所有请求,路径的前缀都类似。

With the above, a search query targeting all indexes will be sent to path /my/path/_search instead of /_search. The path will be prefixed similarly for all requests sent to Elasticsearch.

18.4.3. Node discovery

在使用自动发现时,Elasticsearch 客户端会定期探测群集中的新节点,并将这些节点添加到主机列表中(请参阅 Client configuration中的 hosts)。

When using automatic discovery, the Elasticsearch client will periodically probe for new nodes in the cluster, and will add those to the host list (see hosts in Client configuration).

自动发现通过下列属性控制:

Automatic discovery is controlled by the following properties:

hibernate.search.backend.discovery.enabled = false
hibernate.search.backend.discovery.refresh_interval = 10
  1. discovery.enabled defines whether the feature is enabled. Expects a boolean value. The default for this property is false.

  2. discovery.refresh_interval defines the interval between two executions of the automatic discovery. Expects a positive integer, in seconds. The default for this property is 10.

18.4.4. HTTP authentication

HTTP 身份验证默认情况下处于禁用状态,但是可以通过设置以下配置属性启用:

HTTP authentication is disabled by default, but may be enabled by setting the following configuration properties:

hibernate.search.backend.username = ironman
hibernate.search.backend.password = j@rv1s

这些属性的默认值为一个空字符串。

The default for these properties is an empty string.

连接到 Elasticsearch 服务器时要发送的用户名和密码。

The username and password to send when connecting to the Elasticsearch servers.

如果您使用 HTTP 而不是 HTTPS(见上文),您的密码将以明文形式通过网络传输。

If you use HTTP instead of HTTPS (see above), your password will be transmitted in clear text over the network.
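To avoid transmitting the password in clear text, HTTP authentication is typically combined with HTTPS; a sketch, reusing the credentials above (host and port are placeholders):

```properties
hibernate.search.backend.protocol = https
hibernate.search.backend.hosts = es.mycompany.com:9243
hibernate.search.backend.username = ironman
hibernate.search.backend.password = j@rv1s
```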

18.4.5. Authentication on Amazon Web Services

完成配置后,Hibernate Search Elasticsearch 后端在大多数设置中会正常运作。但是,如果您需要使用 Amazon OpenSearch ServiceAmazon OpenSearch Serverless,您会发现它们需要一种专用身份验证方法: request signing

The Hibernate Search Elasticsearch backend, once configured, will work just fine in most setups. However, if you need to use Amazon OpenSearch Service or Amazon OpenSearch Serverless, you will find they require a proprietary authentication method: request signing.

虽然请求签名默认情况下不受支持,但是你可以使用附加的依赖项和几分配置启用它。

While request signing is not supported by default, you can enable it with an additional dependency and a bit of configuration.

你需要添加此依赖项:

You will need to add this dependency:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch-aws</artifactId>
   <version>7.2.0.Alpha2</version>
</dependency>

在类路径中添加此依赖项后,你仍需要对其进行配置。

With that dependency in your classpath, you will still need to configure it.

下列配置是强制要求的:

The following configuration is mandatory:

hibernate.search.backend.aws.signing.enabled = true
hibernate.search.backend.aws.region = us-east-1
  1. aws.signing.enabled defines whether request signing is enabled. Expects a boolean value. Defaults to false.

  2. aws.region defines the AWS region. Expects a string value. This property has no default and must be provided for the AWS authentication to work.

默认情况下,Hibernate Search 将依赖 AWS SDK 的默认凭据提供者。此提供者会在各种位置(Java 系统属性、环境变量、特定于 AWS 的配置等)中查找凭据。有关默认凭据提供者工作方式的更多信息,请参阅其官方文档。

By default, Hibernate Search will rely on the default credentials provider from the AWS SDK. This provider will look for credentials in various places (Java system properties, environment variables, AWS-specific configuration, …​). For more information about how the default credentials provider works, see its official documentation.

或者,你可以用以下选项设置静态凭据:

Optionally, you can set static credentials with the following options:

hibernate.search.backend.aws.credentials.type = static
hibernate.search.backend.aws.credentials.access_key_id = AKIDEXAMPLE
hibernate.search.backend.aws.credentials.secret_access_key = wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY
  1. aws.credentials.type defines the type of credentials. Set to default to get the default behavior (as explained above), or to static to provide credentials using the properties below.

  2. aws.credentials.access_key_id defines the access key ID. Expects a string value. This property has no default and must be provided when the credentials type is set to static.

  3. aws.credentials.secret_access_key defines the secret access key. Expects a string value. This property has no default and must be provided when the credentials type is set to static.

18.4.6. Connection tuning

Timeouts

# hibernate.search.backend.request_timeout = 30000
hibernate.search.backend.connection_timeout = 1000
hibernate.search.backend.read_timeout = 30000

request_timeout 定义执行请求时的超时时间。这包括建立连接、发送请求和读取响应所需的时间。此属性默认未定义。

request_timeout defines the timeout when executing a request. This includes the time needed to establish a connection, send the request and read the response. This property is not defined by default.

connection_timeout 定义建立连接时的超时。此属性的默认值为 1000

connection_timeout defines the timeout when establishing a connection. The default for this property is 1000.

read_timeout 定义读取响应时的超时时间。此属性的默认值为 30000

read_timeout defines the timeout when reading a response. The default for this property is 30000.

这些属性需要以毫秒为单位的正数 Integer value,例如 3000

These properties expect a positive Integer value in milliseconds, such as 3000.

Connection pool

hibernate.search.backend.max_connections = 20
hibernate.search.backend.max_connections_per_route = 10

max_connections 定义到 Elasticsearch 集群的最大同时连接数(所有主机合计)。此属性的默认值为 20。

max_connections defines the maximum number of simultaneous connections to the Elasticsearch cluster, all hosts taken together. The default for this property is 20.

max_connections_per_route 定义与 Elasticsearch 集群的每个主机的最大并发连接数。此属性的默认值为 10

max_connections_per_route defines maximum number of simultaneous connections to each host of the Elasticsearch cluster. The default for this property is 10.

这些属性需要一个正数 Integer value,例如 20

These properties expect a positive Integer value, such as 20.

Keep Alive

hibernate.search.backend.max_keep_alive = 10000

max_keep_alive 定义与 Elasticsearch 集群的连接可以保持空闲的最长时间。

max_keep_alive defines how long connections to the Elasticsearch cluster can be kept idle.

需要以毫秒为单位的正数 Long value,例如 60000

Expects a positive Long value in milliseconds, such as 60000.

如果 Elasticsearch 集群的响应包含 Keep-Alive 标头,则有效最大空闲时间为 Keep-Alive 标头或此属性的值(如果已设置)中较低的一个。

If the response from an Elasticsearch cluster contains a Keep-Alive header, then the effective max idle time will be whichever is lower: the duration from the Keep-Alive header or the value of this property (if set).

如果未设置此属性,则仅考虑 Keep-Alive 标头,如果它不存在,则空闲连接将永久保留。

If this property is not set, only the Keep-Alive header is considered, and if it’s absent, idle connections will be kept forever.

18.4.7. Custom HTTP client configurations

可以使用 org.apache.http.impl.nio.client.HttpAsyncClientBuilder 的实例直接配置 HTTP 客户端。

It is possible to configure the HTTP client directly using an instance of org.apache.http.impl.nio.client.HttpAsyncClientBuilder.

使用此 API,您可以添加拦截器、更改连接保持活动、最大连接数、SSL 密钥/信任存储设置以及许多其他客户端配置。

With this API you can add interceptors, change the keep alive, the max connections, the SSL key/trust store settings and many other client configurations.

直接配置 HTTP 客户端需要两步:

Configuring the HTTP client directly requires two steps:

  • Define a class that implements the org.hibernate.search.backend.elasticsearch.client.ElasticsearchHttpClientConfigurer interface.

  • Configure Hibernate Search to use that implementation by setting the configuration property hibernate.search.backend.client.configurer to a bean reference pointing to the implementation, for example class:org.hibernate.search.documentation.backend.elasticsearch.client.HttpClientConfigurer.

示例 429. 实现和使用 ElasticsearchHttpClientConfigurer

Example 429. Implementing and using an ElasticsearchHttpClientConfigurer

public class HttpClientConfigurer implements ElasticsearchHttpClientConfigurer { (1)

    @Override
    public void configure(ElasticsearchHttpClientConfigurationContext context) { (2)
        HttpAsyncClientBuilder clientBuilder = context.clientBuilder(); (3)
        clientBuilder.setMaxConnPerRoute( 7 ); (4)
        clientBuilder.addInterceptorFirst( (HttpResponseInterceptor) (response, httpContext) -> {
            long contentLength = response.getEntity().getContentLength();
            // doing some stuff with contentLength
        } );
    }
}
示例 430. 在属性中定义自定义 HTTP 客户端配置程序

Example 430. Defining a custom HTTP client configurer in the properties

(1)
hibernate.search.backend.client.configurer = class:org.hibernate.search.documentation.backend.elasticsearch.client.HttpClientConfigurer

自定义 HTTP 客户端配置器定义的任何设置都将覆盖 Hibernate Search 定义的任何其他设置。

Any setting defined by a custom HTTP client configurer will override any other setting defined by Hibernate Search.

18.5. Version compatibility

18.5.1. Version assumed by Hibernate Search

不同发行版和版本的 Elasticsearch/OpenSearch 公开了稍有不同的 API。因此,Hibernate Search 需要了解它要通信的发行版和版本,以生成正确的 HTTP 请求。

Different distributions and versions of Elasticsearch/OpenSearch expose slightly different APIs. As a result, Hibernate Search needs to be aware of the distribution and version it is talking to in order to generate correct HTTP requests.

默认情况下,Hibernate Search 将在启动时查询 Elasticsearch/OpenSearch 集群以检索此信息,并将推断出要采用的正确行为。

By default, Hibernate Search will query the Elasticsearch/OpenSearch cluster at boot time to retrieve this information, and will infer the correct behavior to adopt.

您可以通过将属性 hibernate.search.backend.version 设置为遵循 x.y.z-qualifier<distribution>:x.y.z-qualifier 或仅 <distribution> 格式的版本字符串来强制 Hibernate Search 期望某个特定版本的 Elasticsearch/OpenSearch,其中:

You can force Hibernate Search to expect a specific version of Elasticsearch/OpenSearch by setting the property hibernate.search.backend.version to a version string following the format x.y.z-qualifier or <distribution>:x.y.z-qualifier or just <distribution>, where:

  1. <distribution> is either elastic, opensearch or amazon-opensearch-serverless. Optional, defaults to elastic.

  2. x, y and z are integers. x is mandatory, y and z are optional.

  3. qualifier is a string of word characters (alphanumeric or _). Optional.

例如,8、8.0、8.9、opensearch:2.9、amazon-opensearch-serverless 均为有效的版本字符串。

For example, 8, 8.0, 8.9, opensearch:2.9, amazon-opensearch-serverless are all valid version strings.
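As an illustration, pinning the expected version for an OpenSearch cluster uses the <distribution>:x.y form:

```properties
# Expect an OpenSearch cluster running 2.9.x
hibernate.search.backend.version = opensearch:2.9
```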

Amazon OpenSearch Serverless 是一个特例,因为它不使用版本号。

Amazon OpenSearch Serverless is a special case as it doesn’t use version numbers.

使用该平台时,您必须设置版本,并且必须简单地将它设置为 amazon-opensearch-serverless ,没有拖尾的 : 或版本号。

When using that platform, you must set the version and it must be set to simply amazon-opensearch-serverless, without a trailing : or version number.

Hibernate Search 仍会查询 Elasticsearch/OpenSearch 集群以检测该集群的实际分布和版本(不支持的除外,即 Amazon OpenSearch Serverless),以便检查已配置的分布和版本是否与实际版本相匹配。

Hibernate Search will still query the Elasticsearch/OpenSearch cluster to detect the actual distribution and version of the cluster (except where not supported, i.e. Amazon OpenSearch Serverless), in order to check that the configured distribution and version match the actual ones.

18.5.2. Disabling the version check on startup

如有必要,您可以在启动时禁用对 Elasticsearch/OpenSearch 集群的调用,并手动提供信息。

If necessary, you can disable the call to the Elasticsearch/OpenSearch cluster on startup, and provide the information manually.

为此,将属性 hibernate.search.backend.version_check.enabled 设置为 false

To do that, set the property hibernate.search.backend.version_check.enabled to false.

您还必须将属性 _hibernate.search.backend.version_设置为一个版本字符串,如 previous section中所述。

You will also have to set the property hibernate.search.backend.version to a version string as explained in the previous section.

在这种情况下,主版本号和次版本号(上述格式中的 x 和 y)是必需的;如果发行版是默认值(elastic),则可以省略,所有其他组成部分(微版本号、限定符)仍然是可选的。例如,8.0、8.9、opensearch:2.9 在这种情况下都是有效的版本字符串,但 8 不够精确。

In this case, both major and minor version numbers (x and y in the formats above) are mandatory, but the distribution can be left out if it is the default (elastic), and all other components (micro, qualifier) remain optional. For example, 8.0, 8.9, opensearch:2.9 are all valid version strings in this case, but 8 is not precise enough.
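For example, to skip the startup call against a cluster assumed to run Elasticsearch 8.9, the two properties are set together:

```properties
hibernate.search.backend.version_check.enabled = false
# Major and minor version are mandatory when the check is disabled
hibernate.search.backend.version = 8.9
```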

18.6. Request logging

hibernate.search.backend.log.json_pretty_printing boolean property定义是否将 request logs中包含的 JSON 漂亮打印(缩进,带换行符)。其默认值为 false

The hibernate.search.backend.log.json_pretty_printing boolean property defines whether JSON included in request logs should be pretty-printed (indented, with line breaks). It defaults to false.

18.7. Sharding

有关分片的初步介绍,包括它在 Hibernate Search 中的工作方式以及它的局限性是什么,请参阅 Sharding and routing

For a preliminary introduction to sharding, including how it works in Hibernate Search and what its limitations are, see Sharding and routing.

Elasticsearch 默认禁用分片。要启用它,请在您的集群中设置属性 index.number_of_shards。

Elasticsearch disables sharding by default. To enable it, set the property index.number_of_shards in your cluster.

18.8. Schema management

Elasticsearch 索引需要在用于编制索引和搜索之前创建;有关如何在 Hibernate Search 中创建索引及其架构的更多信息,请参阅 Managing the index schema

Elasticsearch indexes need to be created before they can be used for indexing and searching; see Managing the index schema for more information about how to create indexes and their schema in Hibernate Search.

专门针对 Elasticsearch,可以通过以下选项进行一些微调:

For Elasticsearch specifically, some fine-tuning is available through the following options:

# To configure the defaults for all indexes:
hibernate.search.backend.schema_management.minimal_required_status = green
hibernate.search.backend.schema_management.minimal_required_status_wait_timeout = 10000
# To configure a specific index:
hibernate.search.backend.indexes.<index-name>.schema_management.minimal_required_status = green
hibernate.search.backend.indexes.<index-name>.schema_management.minimal_required_status_wait_timeout = 10000
  1. minimal_required_status defines the minimal required status of an index before creation is considered complete. The default for this property is yellow, except on Amazon OpenSearch Serverless, where index status checks are skipped because that platform does not support them.

  2. minimal_required_status_wait_timeout defines the maximum time to wait for this status, as an integer value in milliseconds. The default for this property is 10000.

这些属性仅在作为模式管理一部分创建或验证索引时才有效。

These properties are only effective when creating or validating an index as part of schema management.

18.9. Index layout

Hibernate Search 使用带别名的索引。这意味着 Hibernate Search 中具有给定名称的索引不会直接映射到 Elasticsearch 中具有相同名称的索引。

Hibernate Search works with aliased indexes. This means an index with a given name in Hibernate Search will not directly be mapped to an index with the same name in Elasticsearch.

索引布局是 Hibernate Search 索引名称映射到 Elasticsearch 索引的方式,并且控制该布局的策略在后端级别设置:

The index layout is how Hibernate Search index names are mapped to Elasticsearch indexes, and the strategy controlling that layout is set at the backend level:

hibernate.search.backend.layout.strategy = simple

此属性的默认值为 simple

The default for this property is simple.

有关可用策略的详细信息,请参阅以下小节。

See the following subsections for details about available strategies.

18.9.1. simple: the default, future-proof strategy

对于 Hibernate Search 中名称为 myIndex 的索引:

For an index whose name in Hibernate Search is myIndex:

  1. If Hibernate Search creates the index automatically, it will name the index myindex-000001 and will automatically create the write and read aliases.

  2. Write operations (indexing, purge, …​) will target the alias myindex-write.

  3. Read operations (searching, explaining, …​) will target the alias myindex-read.

simple 布局比它可能存在的要复杂一些,但它遵循了最佳实践。

The simple layout is a bit more complex than it could be, but it follows the best practices.

使用别名比直接针对索引有显著的优势:它可以在不宕机的情况下对实时应用程序进行完全重新索引,当 listener-triggered indexing 禁用 ( completelypartially ) 且您需要定期(例如每天)完全重新索引时尤其有用。

Using aliases has a significant advantage over directly targeting the index: it makes full reindexing on a live application possible without downtime, which is useful in particular when listener-triggered indexing is disabled (completely or partially) and you need to fully reindex periodically (for example on a daily basis).

使用别名,您只需将读取别名(由搜索查询使用)定向到索引的旧副本,而写入别名(由文档写入使用)将被重定向到索引的新副本。在没有别名(特别是使用 no-alias 布局)的情况下,这是不可能的。

With aliases, you just need to direct the read alias (used by search queries) to an old copy of the index, while the write alias (used by document writes) is redirected to a new copy of the index. Without aliases (in particular with the no-alias layout), this is impossible.

这种“零宕机”重新索引与 "blue/green" deployment 有一些共同特点,目前还没有由 Hibernate Search 本身提供。然而,您可以通过直接向 Elasticsearch 的 REST API 发出命令在您的应用程序中实现它。基本操作顺序如下:

This "zero-downtime" reindexing, which shares some characteristics with "blue/green" deployment, is not currently provided by Hibernate Search itself. However, you can implement it in your application by directly issuing commands to Elasticsearch’s REST APIs. The basic sequence of actions is the following:

创建一个新索引 myindex-000002

Create a new index, myindex-000002.

将写别名 myindex-writemyindex-000001 切换到 myindex-000002

Switch the write alias, myindex-write, from myindex-000001 to myindex-000002.

使用 mass indexer等重新索引。

Reindex, for example using the mass indexer.

将读别名 myindex-read 从 myindex-000001 切换到 myindex-000002。

Switch the read alias, myindex-read, from myindex-000001 to myindex-000002.

删除 myindex-000001

Delete myindex-000001.

请注意,这仅在 Hibernate Search 映射没有更改时才有效;具有不断变化的模式的零宕机升级将更加复杂。您将在 HSEARCH-2861HSEARCH-3499 中找到关于此主题的讨论。

Note this will only work if the Hibernate Search mapping did not change; a zero-downtime upgrade with a changing schema would be considerably more complex. You will find discussions on this topic in HSEARCH-2861 and HSEARCH-3499.
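The alias switches in the sequence above can each be performed atomically with Elasticsearch’s `_aliases` API. As a sketch, step 2 (moving the write alias) could be a single `POST /_aliases` request, with index and alias names following the simple layout example above:

```json
{
  "actions": [
    { "remove": { "index": "myindex-000001", "alias": "myindex-write" } },
    { "add": { "index": "myindex-000002", "alias": "myindex-write" } }
  ]
}
```

Because both actions are applied in one request, there is no window during which the write alias points at no index at all.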

18.9.2. no-alias: a strategy without index aliases

此策略主要对旧版集群有用。

This strategy is mostly useful on legacy clusters.

对于 Hibernate Search 中名称为 myIndex 的索引:

For an index whose name in Hibernate Search is myIndex:

  1. If Hibernate Search creates the index automatically, it will name the index myindex and will not create any alias.

  2. Write operations (indexing, purge, …​) will target the index directly by its name, myindex.

  3. Read operations (searching, explaining, …​) will target the index directly by its name myindex.

18.9.3. Custom strategy

如果内置布局策略不符合您的要求,您可以通过两个简单步骤定义自定义布局:

If the built-in layout strategies do not fit your requirements, you can define a custom layout in two simple steps:

  • Define a class that implements the interface org.hibernate.search.backend.elasticsearch.index.layout.IndexLayoutStrategy.

  • Configure the backend to use that implementation by setting the configuration property hibernate.search.backend.layout.strategy to a bean reference pointing to the implementation, for example class:com.mycompany.MyLayoutStrategy.

例如,下面的实现将为名为 myIndex 的索引生成以下布局:

For example, the implementation below will lead to the following layout for an index named myIndex:

  1. Write operations (indexing, purge, …​) will target the alias myindex-write.

  2. Read operations (searching, explaining, …​) will target the alias myindex (no suffix).

  3. If Hibernate Search creates the index automatically at exactly 19:19:00 on November 6th, 2017, it will name the index myindex-20171106-191900-000000000.

示例 431. 使用 Elasticsearch 后端实现自定义索引布局策略

Example 431. Implementing a custom index layout strategy with the Elasticsearch backend

import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.hibernate.search.backend.elasticsearch.index.layout.IndexLayoutStrategy;

public class CustomLayoutStrategy implements IndexLayoutStrategy {

    private static final DateTimeFormatter INDEX_SUFFIX_FORMATTER =
            DateTimeFormatter.ofPattern( "uuuuMMdd-HHmmss-SSSSSSSSS", Locale.ROOT )
                    .withZone( ZoneOffset.UTC );
    private static final Pattern UNIQUE_KEY_PATTERN =
            Pattern.compile( "(.*)-\\d+-\\d+-\\d+" );

    @Override
    public String createInitialElasticsearchIndexName(String hibernateSearchIndexName) {
        // Clock is Clock.systemUTC() in production, may be overridden in tests
        Clock clock = MyApplicationClock.get();
        return hibernateSearchIndexName + "-"
                + INDEX_SUFFIX_FORMATTER.format( Instant.now( clock ) );
    }

    @Override
    public String createWriteAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "-write";
    }

    @Override
    public String createReadAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName;
    }

    @Override
    public String extractUniqueKeyFromHibernateSearchIndexName(
            String hibernateSearchIndexName) {
        return hibernateSearchIndexName;
    }

    @Override
    public String extractUniqueKeyFromElasticsearchIndexName(
            String elasticsearchIndexName) {
        Matcher matcher = UNIQUE_KEY_PATTERN.matcher( elasticsearchIndexName );
        if ( !matcher.matches() ) {
            throw new IllegalArgumentException(
                    "Unrecognized index name: " + elasticsearchIndexName
            );
        }
        return matcher.group( 1 );
    }
}
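The alias and unique-key rules of the strategy above can be exercised in isolation. Below is a minimal, self-contained sketch (the class name LayoutNamingSketch and its standalone methods are illustrative, not part of Hibernate Search) reproducing the same naming logic without implementing the Hibernate Search interface:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LayoutNamingSketch {

    // Same pattern as the custom strategy above:
    // <unique key>-<digits>-<digits>-<digits>
    static final Pattern UNIQUE_KEY_PATTERN = Pattern.compile( "(.*)-\\d+-\\d+-\\d+" );

    static String writeAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "-write";
    }

    static String readAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName; // no suffix
    }

    static String extractUniqueKey(String elasticsearchIndexName) {
        Matcher matcher = UNIQUE_KEY_PATTERN.matcher( elasticsearchIndexName );
        if ( !matcher.matches() ) {
            throw new IllegalArgumentException( "Unrecognized index name: " + elasticsearchIndexName );
        }
        return matcher.group( 1 );
    }

    public static void main(String[] args) {
        System.out.println( writeAlias( "myindex" ) );  // myindex-write
        System.out.println( readAlias( "myindex" ) );   // myindex
        System.out.println( extractUniqueKey( "myindex-20171106-191900-000000000" ) ); // myindex
    }
}
```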

18.9.4. Retrieving index or alias names

用于读写的索引或别名名称可从 metamodel中检索。

Index or alias names used to read and write can be retrieved from the metamodel.

示例 432. 从 Elasticsearch 索引管理器中检索索引名称

. Example 432. Retrieving the index names from an Elasticsearch index manager

SearchMapping mapping = /* ... */ (1)
IndexManager indexManager = mapping.indexManager( "Book" ); (2)
ElasticsearchIndexManager esIndexManager = indexManager.unwrap( ElasticsearchIndexManager.class ); (3)
ElasticsearchIndexDescriptor descriptor = esIndexManager.descriptor(); (4)
String readName = descriptor.readName(); (5)
String writeName = descriptor.writeName(); (5)

18.10. Schema ("mapping")

Elasticsearch 中所谓的架构是分配给每个索引的架构,它指定每个“属性”的数据类型和功能(在 Hibernate Search 中称为“索引字段”)。

What Elasticsearch calls the "mapping" is the schema assigned to each index, specifying the data type and capabilities of each "property" (called an "index field" in Hibernate Search).

在大部分情况下,Elasticsearch 映射是从 the mapping configured through Hibernate Search’s mapping APIs推断出来的,这些映射是通用的,并且与 Elasticsearch 无关。

For the most part, the Elasticsearch mapping is inferred from the mapping configured through Hibernate Search’s mapping APIs, which are generic and independent of Elasticsearch.

本节对针对 Elasticsearch 后端的一些特定方面进行了说明。

Aspects that are specific to the Elasticsearch backend are explained in this section.

Hibernate Search 可以配置为在通过 schema management 创建索引时将映射推送到 Elasticsearch。

Hibernate Search can be configured to push the mapping to Elasticsearch when creating the indexes through schema management.

18.10.1. Field types

Available field types

Elasticsearch 后端并不直接支持某些类型,但它们仍然可以工作,因为它们由映射器“桥接”。例如,实体模型中的 java.util.Date “桥接”到 Elasticsearch 后端支持的 java.time.Instant。有关更多信息,请参阅 Supported property types

Some types are not supported directly by the Elasticsearch backend, but will work anyway because they are "bridged" by the mapper. For example a java.util.Date in your entity model is "bridged" to java.time.Instant, which is supported by the Elasticsearch backend. See Supported property types for more information.
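The bridging mentioned above boils down to the standard java.util.Date to java.time.Instant conversion; a minimal sketch of what the built-in bridge conceptually does (the class and method names here are illustrative, not Hibernate Search API):

```java
import java.time.Instant;
import java.util.Date;

public class DateBridgingSketch {

    // Conceptually what the mapper's built-in bridge does:
    // convert the legacy java.util.Date to the supported java.time.Instant.
    static Instant toIndexedValue(Date date) {
        return date == null ? null : date.toInstant();
    }

    public static void main(String[] args) {
        System.out.println( toIndexedValue( new Date( 0L ) ) ); // 1970-01-01T00:00:00Z
    }
}
```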

不在此列表中的字段类型仍然可以使用更少的工作:

Field types that are not in this list can still be used with a little bit more work:

如果实体模型中的属性具有不受支持的类型,但可以转换为受支持的类型,则需要桥接。请参见 Binding and bridges

If a property in the entity model has an unsupported type, but can be converted to a supported type, you will need a bridge. See Binding and bridges.

如果您需要一个 Hibernate Search 不支持的特定类型的索引字段,您将需要定义本机字段类型的桥接器。请参阅 Index field type DSL extension

If you need an index field with a specific type that is not supported by Hibernate Search, you will need a bridge that defines a native field type. See Index field type DSL extension.

表 14. Elasticsearch 后端支持的字段类型

Table 14. Field types supported by the Elasticsearch backend

| Field type | Data type in Elasticsearch | Limitations |
|---|---|---|
| java.lang.String | text if an analyzer is defined, keyword otherwise | - |
| java.lang.Byte | byte | - |
| java.lang.Short | short | - |
| java.lang.Integer | integer | - |
| java.lang.Long | long | - |
| java.lang.Double | double | - |
| java.lang.Float | float | - |
| java.lang.Boolean | boolean | - |
| java.math.BigDecimal | scaled_float with a scaling_factor equal to 10^(decimalScale) | - |
| java.math.BigInteger | scaled_float with a scaling_factor equal to 10^(decimalScale) | - |
| java.time.Instant | date with format uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ | Lower range/resolution |
| java.time.LocalDate | date with format uuuu-MM-dd | Lower range/resolution |
| java.time.LocalTime | date with format HH:mm:ss.SSSSSSSSS | Lower range/resolution |
| java.time.LocalDateTime | date with format uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSS | Lower range/resolution |
| java.time.ZonedDateTime | date with format uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ'['VV']' | Lower range/resolution |
| java.time.OffsetDateTime | date with format uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ | Lower range/resolution |
| java.time.OffsetTime | date with format HH:mm:ss.SSSSSSSSSZZZZZ | Lower range/resolution |
| java.time.Year | date with format uuuu | Lower range/resolution |
| java.time.YearMonth | date with format uuuu-MM | Lower range/resolution |
| java.time.MonthDay | date with format uuuu-MM-dd. The year is always set to 0. | - |
| GeoPoint | geo_point | - |
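To make the scaled_float entries concrete: with decimalScale = 2, the scaling factor is 10^2 = 100, so the value is effectively stored multiplied by 100 as an integer. The following arithmetic sketch is illustrative only (the actual encoding happens inside Elasticsearch):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaledFloatSketch {

    // scaling_factor = 10^(decimalScale)
    static long scalingFactor(int decimalScale) {
        return (long) Math.pow( 10, decimalScale );
    }

    // The value is effectively stored as value * scaling_factor, rounded to a long.
    static long scaledValue(BigDecimal value, int decimalScale) {
        return value.movePointRight( decimalScale )
                .setScale( 0, RoundingMode.HALF_UP )
                .longValueExact();
    }

    public static void main(String[] args) {
        System.out.println( scalingFactor( 2 ) );                          // 100
        System.out.println( scaledValue( new BigDecimal( "12.34" ), 2 ) ); // 1234
    }
}
```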

Range and resolution of date/time fields

Elasticsearch 的 date 类型不支持 java.time 类型可以表示的全部年份范围:

The Elasticsearch date type does not support the whole range of years that can be represented in java.time types:

java.time 可以表示从 -999,999,999 到 999,999,999 的年份。

java.time can represent years ranging from -999,999,999 to 999,999,999.

Elasticsearch 的 date 类型支持从年份 -292,275,054 到年份 292,278,993 的日期。

Elasticsearch’s date type supports dates ranging from year -292,275,054 to year 292,278,993.

超出范围的值会触发索引失败。

Values that are out of range will trigger indexing failures.

精度也较低:

Resolution is also lower:

java.time 支持纳秒精度。

java.time supports nanosecond-resolution.

Elasticsearch 的 date 类型支持毫秒分辨率。

Elasticsearch’s date type supports millisecond-resolution.

索引时,毫秒精度以上的精度会丢失。

Precision beyond the millisecond will be lost when indexing.
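The millisecond truncation described above can be reproduced with plain java.time:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateResolutionSketch {
    public static void main(String[] args) {
        Instant precise = Instant.parse( "2017-11-06T19:19:00.123456789Z" );
        // Elasticsearch's date type keeps millisecond resolution:
        // anything below the millisecond is lost when indexing.
        Instant stored = precise.truncatedTo( ChronoUnit.MILLIS );
        System.out.println( stored ); // 2017-11-06T19:19:00.123Z
    }
}
```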

Index field type DSL extension

并非所有 Elasticsearch 字段类型都在 Hibernate Search 中得到内置支持。但是,通过利用“原生”字段类型,仍然可以使用不受支持的字段类型。使用此字段类型,Elasticsearch“映射”可直接定义为 JSON,就能访问 Elasticsearch 提供的一切功能。

Not all Elasticsearch field types have built-in support in Hibernate Search. Unsupported field types can still be used, however, by taking advantage of the "native" field type. Using this field type, the Elasticsearch "mapping" can be defined as JSON directly, giving access to everything Elasticsearch can offer.

以下是如何使用 Elasticsearch“原生”类型的示例。

Below is an example of how to use the Elasticsearch "native" type.

示例 433. 使用 Elasticsearch“原生”类型

. Example 433. Using the Elasticsearch "native" type

import com.google.gson.JsonElement;
import com.google.gson.JsonPrimitive;

import org.hibernate.search.backend.elasticsearch.ElasticsearchExtension;
import org.hibernate.search.mapper.pojo.bridge.ValueBridge;
import org.hibernate.search.mapper.pojo.bridge.binding.ValueBindingContext;
import org.hibernate.search.mapper.pojo.bridge.mapping.programmatic.ValueBinder;
import org.hibernate.search.mapper.pojo.bridge.runtime.ValueBridgeFromIndexedValueContext;
import org.hibernate.search.mapper.pojo.bridge.runtime.ValueBridgeToIndexedValueContext;

public class IpAddressValueBinder implements ValueBinder { (1)
    @Override
    public void bind(ValueBindingContext<?> context) {
        context.bridge(
                String.class,
                new IpAddressValueBridge(),
                context.typeFactory() (2)
                        .extension( ElasticsearchExtension.get() ) (3)
                        .asNative() (4)
                                .mapping( "{\"type\": \"ip\"}" ) (5)
        );
    }

    private static class IpAddressValueBridge implements ValueBridge<String, JsonElement> {
        @Override
        public JsonElement toIndexedValue(String value,
                ValueBridgeToIndexedValueContext context) {
            return value == null ? null : new JsonPrimitive( value ); (6)
        }

        @Override
        public String fromIndexedValue(JsonElement value,
                ValueBridgeFromIndexedValueContext context) {
            return value == null ? null : value.getAsString(); (7)
        }
    }
}
@Entity
@Indexed
public class CompanyServer {

    @Id
    @GeneratedValue
    private Integer id;

    @NonStandardField( (1)
            valueBinder = @ValueBinderRef(type = IpAddressValueBinder.class) (2)
    )
    private String ipAddress;

    // Getters and setters
    // ...

}

18.10.2. Entity type name mapping

当 Hibernate Search 执行针对多个实体类型(即多个索引)的搜索查询时,它需要确定每个搜索结果的 entity type,以便将其映射回一个实体。

When Hibernate Search performs a search query targeting multiple entity types, and thus multiple indexes, it needs to determine the entity type of each search hit in order to map it back to an entity.

有多种策略用于处理此“实体类型名称解析”,每种策略各有优缺点。

There are multiple strategies to handle this "entity type name resolution", and each has pros and cons.

此策略在后端级别设置:

The strategy is set at the backend level:

hibernate.search.backend.mapping.type_name.strategy = discriminator

此属性的默认值为 discriminator

The default for this property is discriminator.

有关可用策略的详细信息,请参阅以下小节。

See the following subsections for details about available strategies.

discriminator: type name mapping using a discriminator field

使用 discriminator 策略,判别符字段用于直接从每个文档检索实体类型名称。

With the discriminator strategy, a discriminator field is used to retrieve the entity type name directly from each document.

索引时,_entity_type 字段会自动填充每个文档的 entity type 名称。

When indexing, the _entity_type field is populated transparently with the name of the entity type for each document.

搜索时,_entity_type 字段的 docvalues 会透明地从 Elasticsearch 请求并从响应中提取。

When searching, the docvalues for the _entity_type field are transparently requested from Elasticsearch and extracted from the response.

优点:

Pros:

  1. Works correctly when targeting index aliases.

缺点:

Cons:

  1. Small storage overhead: a few bytes of storage per document.

  2. Requires full reindexing if an entity name changes, even if the index name doesn’t change.

index-name: type name mapping using the index name

使用 index-name 策略,为每个搜索结果返回的 _index 元字段用于解析索引名称,进而解析 entity type 名称。

With the index-name strategy, the _index meta-field returned for each search hit is used to resolve the index name, and from that the entity type name.

优点:

Pros:

  1. No storage overhead.

缺点:

Cons:

  1. Relies on the actual index name, not aliases, because the _index meta-field returned by Elasticsearch contains the actual index name (e.g. myindex-000001), not the alias (e.g. myindex-read). Thus, if indexes do not follow the default naming scheme <hibernateSearchIndexName>-<6 digits>, a custom index layout must be configured.
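The default naming scheme mentioned above can be resolved with a simple pattern. The following is a hypothetical sketch of the kind of extraction involved (the actual resolution is internal to Hibernate Search; the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexNameResolutionSketch {

    // Default naming scheme: <hibernateSearchIndexName>-<6 digits>
    private static final Pattern DEFAULT_SCHEME = Pattern.compile( "(.*)-\\d{6}" );

    static String resolveHibernateSearchIndexName(String actualIndexName) {
        Matcher matcher = DEFAULT_SCHEME.matcher( actualIndexName );
        if ( !matcher.matches() ) {
            throw new IllegalArgumentException( "Unrecognized index name: " + actualIndexName );
        }
        return matcher.group( 1 );
    }

    public static void main(String[] args) {
        // The _index meta-field returns the actual name, e.g. myindex-000001, not the alias.
        System.out.println( resolveHibernateSearchIndexName( "myindex-000001" ) ); // myindex
    }
}
```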

18.10.3. Dynamic mapping

默认情况下,Hibernate Search 将 Elasticsearch 索引映射中的 dynamic 属性设置为 strict 。这意味着尝试索引映射中不存在字段的文档将导致索引失败。

By default, Hibernate Search sets the dynamic property in Elasticsearch index mappings to strict. This means that attempting to index documents with fields that are not present in the mapping will lead to an indexing failure.

如果 Hibernate Search 是唯一的客户端,这不会成问题,因为 Hibernate Search 通常仅对已声明的模式字段进行操作。对于我们需要更改此设置的其他情况,可以使用以下索引级别属性更改值。

If Hibernate Search is the only client, that won’t be a problem, since Hibernate Search usually works on declared schema fields only. For the other cases in which we need to change this setting, we can use the following index-level property to change the value.

# To configure the defaults for all indexes:
hibernate.search.backend.dynamic_mapping = strict
# To configure a specific index:
hibernate.search.backend.indexes.<index-name>.dynamic_mapping = strict

此属性的默认值为 strict

The default for this property is strict.

我们说过 Hibernate Search 通常在已声明的模式字段上工作。更确切地说,如果未定义 Dynamic fields with field templates ,它始终会这样。当定义字段模板时, dynamic 将被强制为 true ,以允许动态字段。在这种情况下, dynamic_mapping 属性的值将被忽略。

We said that Hibernate Search usually works on declared schema fields. More precisely, it always does if no Dynamic fields with field templates are defined. When field templates are defined, dynamic will be forced to true, in order to allow for dynamic fields. In that case, the value of the dynamic_mapping property is ignored.

18.10.4. Multi-tenancy

根据在当前会话中定义的租户 ID,多租户功能得到支持并且会以透明的方式处理:

Multi-tenancy is supported and handled transparently, according to the tenant ID defined in the current session:

  1. documents will be indexed with the appropriate values, allowing later filtering;

  2. queries will filter results appropriately.

如果在映射器中启用了多租户,则在后端中会自动启用多租户,例如,如果 a multi-tenancy strategy is selected in Hibernate ORM,或者如果 multi-tenancy is explicitly configured in the Standalone POJO mapper

The multi-tenancy is automatically enabled in the backend if it is enabled in the mapper, e.g. if a multi-tenancy strategy is selected in Hibernate ORM, or if multi-tenancy is explicitly configured in the Standalone POJO mapper.

但是,可以手动启用多租户功能。

However, it is possible to enable multi-tenancy manually.

多租户策略是在后端级别设置的:

The multi-tenancy strategy is set at the backend level:

hibernate.search.backend.multi_tenancy.strategy = none

有关可用策略的详细信息,请参阅以下小节。

See the following subsections for details about available strategies.

none: single-tenancy

none 策略(默认策略)完全禁用多租户功能。

The none strategy (the default) disables multi-tenancy completely.

尝试设置租户 ID 会导致索引编制失败。

Attempting to set a tenant ID will lead to a failure when indexing.

discriminator: multi-tenancy using a discriminator field

使用 discriminator 策略,所有租户的所有文档都存储在同一个索引中。每个文档的 Elasticsearch ID 设置为租户 ID 和原始 ID 的连接。

With the discriminator strategy, all documents from all tenants are stored in the same index. The Elasticsearch ID of each document is set to the concatenation of the tenant ID and original ID.

在索引时,为每个文档透明填充两个字段:

When indexing, two fields are populated transparently for each document:

  1. _tenant_id: the "discriminator" field holding the tenant ID.

  2. _tenant_doc_id: a field holding the original (tenant-scoped) document ID.

在搜索时,将针对租户 ID 字段的过滤器透明添加到搜索查询,以便仅返回当前租户的搜索结果。ID 字段用于检索原始文档 ID。

When searching, a filter targeting the tenant ID field is added transparently to the search query to only return search hits for the current tenant. The ID field is used to retrieve the original document IDs.
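The ID concatenation described above can be sketched as follows. The separator and any escaping used by Hibernate Search are internal details; this hypothetical helper only illustrates the principle:

```java
public class TenantIdSketch {

    // Illustrative only: compose a tenant-scoped Elasticsearch ID
    // from the tenant ID and the original (tenant-scoped) document ID.
    static String elasticsearchId(String tenantId, String originalId) {
        return tenantId + "_" + originalId;
    }

    public static void main(String[] args) {
        System.out.println( elasticsearchId( "tenant1", "42" ) ); // tenant1_42
    }
}
```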

18.10.5. Custom index mapping

Basics

Hibernate Search 可以 create and validate indexes,但默认情况下,创建的索引仅包括编制索引和搜索所需的最低限度:映射和分析设置。如果您需要自定义一些 mapping parameters,可以向 Hibernate Search 提供自定义映射:在创建索引时,它将包括自定义映射。

Hibernate Search can create and validate indexes, but by default created indexes will only include the bare minimum required to index and search: the mapping, and the analysis settings. Should you need to customize some mapping parameters, it is possible to provide a custom mapping to Hibernate Search: it will include the custom mapping when creating an index.

Hibernate Search 映射与自定义 Elasticsearch 映射的一致性将不会以任何方式得到检查。您有责任确保映射中的任何覆盖都可以工作,例如您不会将索引字段的类型从 text 更改为 integer ,或对用于排序的字段禁用 doc_values

The consistency of the custom Elasticsearch mapping with the Hibernate Search mapping will not get checked in any way. You are responsible for making sure that any override in your mapping can work, e.g. that you’re not changing the type of an index field from text to integer, or disabling doc_values on a field used for sorting.

无效的自定义映射在引导时可能不会触发任何异常,而是在之后索引或查询时才触发。在最坏的情况下,它可能根本不触发任何异常,而只是导致错误的搜索结果。请极其谨慎。

An invalid custom mapping may not trigger any exception on bootstrap, but only later while indexing or querying. In the worst case, it might not trigger any exception at all, but simply lead to incorrect search results. Exercise extreme caution.

# To configure the defaults for all indexes:
hibernate.search.backend.schema_management.mapping_file = custom/index-mapping.json
# To configure a specific index:
hibernate.search.backend.indexes.<index-name>.schema_management.mapping_file = custom/index-mapping.json
示例 434. custom/index-mapping.json 文件可能的內容

. Example 434. Possible content of custom/index-mapping.json file

{
  "properties":{
    "userField":{
      "type":"keyword",
      "index":true,
      "norms":true,
      "doc_values":true
    },
    "userObject":{
      "dynamic":"true",
      "type":"object"
    }
  },
  "_source": {
    "enabled": false
  }
}

仅在自定义映射文件中定义但尚未由 Hibernate Search 映射的属性将对 Hibernate Search 不可见。

Properties that are only defined in the custom mappings file but not mapped by Hibernate Search will not be visible to Hibernate Search.

这意味着如果您尝试在 Search DSL 中引用这些属性,或者尝试在 writing to a document from a bridge 中引用这些属性,Hibernate Search 将抛出异常。

This means Hibernate Search will throw exceptions if you try to reference these properties in the Search DSL, or when writing to a document from a bridge.

该文件不需要包含完整的映射:Hibernate Search 会自动将缺少的属性(索引字段)注入给定的映射。

The file does not need to contain the full mapping: Hibernate Search will automatically inject missing properties (index fields) in the given mapping.

将如下处理给定映射与 Hibernate Search 生成的映射之间的冲突:

Conflicts between the given mapping and the mapping generated by Hibernate Search will be handled as follows:

  1. The dynamic_templates/_routing/dynamic mapping parameters will be those from the given mapping, falling back to the value generated by Hibernate Search (if any).

  2. Any other mapping parameters besides the properties at the root of the mapping will be those from the given mapping; those generated by Hibernate Search will be ignored.

  3. properties will be merged, using properties defined in both the given mapping and the mapping generated by Hibernate Search.

  4. If a property is defined on both sides, it will be merged recursively, following steps 1-4.

在上面的示例中,合成的结果映射可能如下所示:

In the example above, the resulting, merged mapping could look like this:

示例 435. 将 custom/index-mapping.json 的内容与 Hibernate Search 映射合并后的可能的映射结果

. Example 435. Possible resulting mapping after merging the content of custom/index-mapping.json with the Hibernate Search mapping

{
  "_source":{
    "enabled":false
  },
  "dynamic":"strict",
  "properties":{
    "_entity_type":{ (1)
      "type":"keyword",
      "index":false
    },
    "title":{ (2)
      "type":"text",
      "analyzer":"english"
    },
    "userField":{
      "type":"keyword",
      "norms":true
    },
    "userObject":{
      "type":"object",
      "dynamic":"true"
    }
  }
}
Disabling _source

使用该功能,可以 disable the _source field 。例如,您可以传递以下 custom/index-mapping.json 文件:

Using this feature it is possible to disable the _source field. For instance, you could pass a custom/index-mapping.json file like the following:

示例 436. 可能的 custom/index-mapping.json 文件内容以禁用 _source 字段

. Example 436. Possible content of custom/index-mapping.json file to disable the _source field

{
  "_source": {
    "enabled": false
  }
}

禁用 _source 便于减小文件系统中 Elasticsearch 索引的大小,但这是有代价的。

Disabling the _source is useful to reduce the size of Elasticsearch indexes on the filesystem, but it comes at a cost.

一些 projections 依赖于启用 source 。如果您尝试在禁用 source 的情况下使用投影,则行为未定义:搜索查询可能返回 null 次命中,也可能完全失败并出现异常。

Several projections rely on the source being enabled. If you try to use projections with source disabled, behavior is undefined: the search query may return null hits, or it may fail completely with exceptions.

18.11. Analysis

18.11.1. Basics

Analysis 是由分析器执行的文本处理,包括在索引编制(文档处理)时和在搜索(查询处理)时。

Analysis is the text processing performed by analyzers, both when indexing (document processing) and when searching (query processing).

全部 built-in Elasticsearch analyzers均可立即使用,无需在 Hibernate Search 中进行任何配置:只需在 Hibernate Search 期望为分析器名称的任何位置使用其名称即可。但是,分析也可以显式配置。

All built-in Elasticsearch analyzers can be used transparently, without any configuration in Hibernate Search: just use their name wherever Hibernate Search expects an analyzer name. However, analysis can also be configured explicitly.

Elasticsearch 分析配置不会在启动时立即应用:它需要推送到 Elasticsearch 集群。

Elasticsearch analysis configuration is not applied immediately on startup: it needs to be pushed to the Elasticsearch cluster.

只有在通过 schema management 告知 Hibernate Search 之后,它才会将配置推送到集群。

Hibernate Search will only push the configuration to the cluster if instructed to do so through schema management.

要配置 Elasticsearch 后端中的分析,您需要:

To configure analysis in an Elasticsearch backend, you will need to:

  • Define a class that implements the org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer interface.

  • Configure the backend to use that implementation by setting the configuration property hibernate.search.backend.analysis.configurer to a bean reference pointing to the implementation, for example class:com.mycompany.MyAnalysisConfigurer.

当启动时,Hibernate Search 将调用此实现的 configure 方法,而配置器能够充分利用 DSL 定义 analyzers and normalizers

Hibernate Search will call the configure method of this implementation on startup, and the configurer will be able to take advantage of a DSL to define analyzers and normalizers.

可将不同的分析配置器分配给每个索引:

A different analysis configurer can be assigned to each index:

# To set the default configurer for all indexes:
hibernate.search.backend.analysis.configurer = class:com.mycompany.MyAnalysisConfigurer
# To assign a specific configurer to a specific index:
hibernate.search.backend.indexes.<index-name>.analysis.configurer = class:com.mycompany.MySpecificAnalysisConfigurer

如果将特定配置器分配给索引,则该索引将忽略默认配置器:只考虑来自特定配置器的定义。

If a specific configurer is assigned to an index, the default configurer will be ignored for that index: only definitions from the specific configurer will be taken into account.

18.11.2. Built-in analyzers

开箱即用的内置分析器不需要显式配置。如有必要,通过用相同名称定义自己的分析器可以覆盖它们。

Built-in analyzers are available out-of-the-box and don’t require explicit configuration. If necessary, they can be overridden by defining your own analyzer with the same name.

Elasticsearch 后端带有几个内置分析器。确切的列表取决于 Elasticsearch 的版本,且可在 here 中找到。

The Elasticsearch backend comes with several built-in analyzers. The exact list depends on the version of Elasticsearch and can be found here.

无论 Elasticsearch 版本如何,名称在 org.hibernate.search.engine.backend.analysis.AnalyzerNames 中列为常量的分析器始终可用:

Regardless of the Elasticsearch version, analyzers whose name is listed as a constant in org.hibernate.search.engine.backend.analysis.AnalyzerNames are always available:

default

@FullTextField 默认使用的分析器。

The analyzer used by default with @FullTextField.

这只是 standard 的别名。

This is just an alias for standard by default.

standard

默认行为:首先,使用标准标记化器进行标记化,该标记化器遵循 Unicode 文本分段算法的单词分隔规则,如 Unicode Standard Annex #29中指定的那样。然后,将每个标记小写。

Default behavior: first, tokenize using the standard tokenizer, which follows Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Then, lowercase each token.

simple

默认行为:首先在非字母字符处拆分文本。然后将每个单词转换为小写。

Default behavior: first, split the text at non-letter characters. Then, lowercase each token.

whitespace

默认行为:在空格字符处拆分文本。不更改单词。

Default behavior: split the text at whitespace characters. Do not change the tokens.

stop

默认行为:首先在非字母字符处拆分文本。然后将每个单词转换为小写。最后,删除英语停用词。

Default behavior: first, split the text at non-letter characters. Then, lowercase each token. Finally, remove English stop words.

keyword

默认行为:不以任何方式更改文本。

Default behavior: do not change the text in any way.

通过这个分析器,全文字段的行为将类似于关键字字段,但功能更少:例如,不支持 terms 聚合。

With this analyzer a full text field would behave similarly to a keyword field, but with fewer features: no terms aggregations, for example.

请考虑改用 @KeywordField

Consider using a @KeywordField instead.

18.11.3. Built-in normalizers

Elasticsearch 后端不提供任何内置规范化器。

The Elasticsearch backend does not provide any built-in normalizer.

18.11.4. Custom analyzers and normalizers

传递给配置器的上下文采用了 DSL 来定义分析器和规范化器:

The context passed to the configurer exposes a DSL to define analyzers and normalizers:

示例 437. 使用 Elasticsearch 后端实施和使用分析配置器来定义分析器和规范化器

. Example 437. Implementing and using an analysis configurer to define analyzers and normalizers with the Elasticsearch backend

package org.hibernate.search.documentation.analysis;

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;

public class MyElasticsearchAnalysisConfigurer implements ElasticsearchAnalysisConfigurer {
    @Override
    public void configure(ElasticsearchAnalysisConfigurationContext context) {
        context.analyzer( "english" ).custom() (1)
                .tokenizer( "standard" ) (2)
                .charFilters( "html_strip" ) (3)
                .tokenFilters( "lowercase", "snowball_english", "asciifolding" ); (4)

        context.tokenFilter( "snowball_english" ) (5)
                .type( "snowball" )
                .param( "language", "English" ); (6)

        context.normalizer( "lowercase" ).custom() (7)
                .tokenFilters( "lowercase", "asciifolding" );

        context.analyzer( "french" ).custom() (8)
                .tokenizer( "standard" )
                .tokenFilters( "lowercase", "snowball_french", "asciifolding" );

        context.tokenFilter( "snowball_french" )
                .type( "snowball" )
                .param( "language", "French" );
    }
}
(1)
hibernate.search.backend.analysis.configurer = class:org.hibernate.search.documentation.analysis.MyElasticsearchAnalysisConfigurer

也可以向带参数的内置分析器分配一个名称:

It is also possible to assign a name to a parameterized built-in analyzer:

示例 438. 在 Elasticsearch 后端中命名参数化内置分析器

. Example 438. Naming a parameterized built-in analyzer in the Elasticsearch backend

context.analyzer( "english_stopwords" ).type( "standard" ) (1)
        .param( "stopwords", "_english_" ); (2)

要了解有哪些分析器、字符过滤器、分词器和分词器过滤器可用,请参阅文档:

To know which analyzers, character filters, tokenizers and token filters are available, refer to the documentation:

  1. If you want to use a built-in analyzer and not create your own: analyzers;

  2. If you want to define your own analyzer: character filters, tokenizers, token filters.

18.11.5. Overriding the default analyzer

使用 @FullTextField 但未明确指定分析器时的默认分析器名为 default

The default analyzer, used with @FullTextField when no analyzer is specified explicitly, is named default.

如同其他 built-in analyzer 一样,通过定义同名 custom analyzer,可以替代默认分析器:

Like any other built-in analyzer, it is possible to override the default analyzer by defining a custom analyzer with the same name:

示例 439. 在 Elasticsearch 后端中覆盖默认分析器

. Example 439. Overriding the default analyzer in the Elasticsearch backend

package org.hibernate.search.documentation.analysis;

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;

public class DefaultOverridingElasticsearchAnalysisConfigurer
        implements ElasticsearchAnalysisConfigurer {
    @Override
    public void configure(ElasticsearchAnalysisConfigurationContext context) {
        context.analyzer( "default" ).custom() (1)
                .tokenizer( "standard" )
                .tokenFilters( "lowercase", "snowball_french", "asciifolding" );

        context.tokenFilter( "snowball_french" )
                .type( "snowball" )
                .param( "language", "French" );
    }
}
(1)
hibernate.search.backend.analysis.configurer = class:org.hibernate.search.documentation.analysis.DefaultOverridingElasticsearchAnalysisConfigurer

18.12. Custom index settings

Hibernate Search 可以 create and validate indexes,但是根据默认设置创建的索引只包含索引和搜索所需的最低限度:映射和分析设置。如果需要设置某些 custom index settings,可以将其提供给 Hibernate Search:在创建索引和验证索引时会包含这些设置。

Hibernate Search can create and validate indexes, but by default created indexes will only include the bare minimum required to index and search: the mapping, and the analysis settings. Should you need to set some custom index settings, it is possible to provide these settings to Hibernate Search: it will include them when creating an index and take them into account when validating an index.

# To configure the defaults for all indexes:
hibernate.search.backend.schema_management.settings_file = custom/index-settings.json
# To configure a specific index:
hibernate.search.backend.indexes.<index-name>.schema_management.settings_file = custom/index-settings.json
示例 440. Possible content of custom/index-settings.json file

. Example 440. Possible content of custom/index-settings.json file

{
  "number_of_shards": "3",
  "number_of_replicas": "3",
  "analysis": {
    "analyzer": {
      "my_standard-english": {
        "type": "standard",
        "stopwords": "_english_"
      },
      "my_analyzer_ngram": {
        "type": "custom",
        "tokenizer": "my_analyzer_ngram_tokenizer"
      }
    },
    "tokenizer": {
      "my_analyzer_ngram_tokenizer": {
        "type": "ngram",
        "min_gram": "5",
        "max_gram": "6"
      }
    }
  }
}

所提供的设置将与由 Hibernate Search 生成的设置合并,包括分析器定义。当通过 analysis configurer 和自定义设置配置分析时,将未定义该行为;不应依赖它。

The provided settings will be merged with those generated by Hibernate Search, including analyzer definitions. When analysis is configured both through an analysis configurer and these custom settings, the behavior is undefined; it should not be relied upon.

自定义索引设置必须以简化形式提供,即不带 index 属性。

Custom index settings must be provided in the simplified form, i.e. without the index attribute.

18.12.1. Max result window size

如果使用了 index.max_result_window 设置,当用户未在查询上定义限制时,Hibernate Search 将使用该值来限制返回的命中数。在这种情况下,如果 Hibernate Search 注意到还有更多结果,将记录一条警告。

If the setting index.max_result_window is used, Hibernate Search will use this value to limit the number of returned hits when no limit has been defined on the query by the user. In this case, if Hibernate Search notices there are more results, a warning will be logged.

18.13. Threads

Elasticsearch 后端依赖于内部线程池来协调索引请求(添加/更新/删除)和计划请求超时。

The Elasticsearch backend relies on an internal thread pool to orchestrate indexing requests (add/update/delete) and to schedule request timeouts.

默认情况下,此池包含的线程数恰好等于引导时 JVM 可用的处理器数。可以使用配置属性更改此设置:

By default, the pool contains exactly as many threads as the number of processors available to the JVM on bootstrap. That can be changed using a configuration property:

hibernate.search.backend.thread_pool.size = 4

这个数字是每个后端的,而不是每个索引的。添加更多索引不会添加更多线程。

This number is per backend, not per index. Adding more indexes will not add more threads.

由于此线程池中发生的所有操作都是无阻塞的,因此将其大小提升到超出 JVM 可用的处理器内核数不会带来明显的性能优势。

As all operations happening in this thread-pool are non-blocking, raising its size above the number of processor cores available to the JVM will not bring noticeable performance benefits.

更改此设置的唯一原因是减少线程数;例如,在具有单个索引和单个索引队列的应用程序中,在具有 64 个处理器内核的机器上运行时,您可能想减少线程数。

The only reason to alter this setting would be to reduce the number of threads; for example, in an application with a single index with a single indexing queue, running on a machine with 64 processor cores, you might want to bring down the number of threads.

18.14. Indexing queues

在 Hibernate Search 发送给 Elasticsearch 的所有请求中,预期会有许多“索引”请求来创建/更新/删除特定文档。逐个发送这些请求会很低效(主要是因为网络延迟)。此外,我们通常希望保留这些请求的相对顺序,当它们与同一文档相关时。

Among all the requests sent by Hibernate Search to Elasticsearch, it is expected that there will be a lot of "indexing" requests to create/update/delete a specific document. Sending these requests one by one would be inefficient (mainly because of network latency). Also, we generally want to preserve the relative order of these requests when they are about the same documents.

由于这些原因,Hibernate Search 将这些请求推送到有序队列并依靠 Bulk API 分批发送它们。每个索引维护 10 个队列,每个队列最多包含 1000 个元素,每个队列将发送最多包含 100 个索引请求的批量请求。队列独立(并行)操作,但每个队列都会逐个发送一个批量请求,因此在任何给定时间,每个索引最多可以发送 10 个批量请求。

For these reasons, Hibernate Search pushes these requests to ordered queues and relies on the Bulk API to send them in batches. Each index maintains 10 queues holding at most 1000 elements each, and each queue will send bulk requests of at most 100 indexing requests. Queues operate independently (in parallel), but each queue sends one bulk request after the other, so at any given time there can be at most 10 bulk requests being sent for each index.

相对于同一文档 ID 的索引操作始终会被推送到同一队列。

Indexing operations relative to the same document ID are always pushed to the same queue.

为了降低 Elasticsearch 服务器上的负载,或相反,为了提高吞吐量,可以自定义队列。这可通过以下配置属性完成:

It is possible to customize the queues in order to reduce the load on the Elasticsearch server, or on the contrary to improve throughput. This is done through the following configuration properties:

# To configure the defaults for all indexes:
hibernate.search.backend.indexing.queue_count = 10
hibernate.search.backend.indexing.queue_size = 1000
hibernate.search.backend.indexing.max_bulk_size = 100
# To configure a specific index:
hibernate.search.backend.indexes.<index-name>.indexing.queue_count = 10
hibernate.search.backend.indexes.<index-name>.indexing.queue_size = 1000
hibernate.search.backend.indexes.<index-name>.indexing.max_bulk_size = 100
  1. indexing.queue_count defines the number of queues. Expects a strictly positive integer value. The default for this property is 10.

较高值会导致并行使用更多连接,这可能会提高索引吞吐量,但会有使 Elasticsearch 过载的风险,导致 Elasticsearch 放弃某些请求并造成索引失败。

Higher values will lead to more connections being used in parallel, which may lead to higher indexing throughput, but incurs a risk of overloading Elasticsearch, leading to Elasticsearch giving up on some requests and resulting in indexing failures.

  2. indexing.queue_size defines the maximum number of elements each queue can hold. Expects a strictly positive integer value. The default for this property is 1000.

较低值可能降低内存使用量,尤其是在队列数量较多时;但值过低会降低达到最大批量大小的可能性,并增加应用程序线程因队列已满而阻塞的可能性,这可能会降低索引吞吐量。

Lower values may lead to lower memory usage, especially if there are many queues, but values that are too low will reduce the likelihood of reaching the max bulk size and increase the likelihood of application threads blocking because the queue is full, which may lead to lower indexing throughput.

  3. indexing.max_bulk_size defines the maximum number of indexing requests in each bulk request. Expects a strictly positive integer value. The default for this property is 100.

较高值会导致发送给 Elasticsearch 的每个 HTTP 请求中包含更多文档,这可能会提高索引吞吐量,但会有使 Elasticsearch 过载的风险,导致 Elasticsearch 放弃某些请求并造成索引失败。

Higher values will lead to more documents being sent in each HTTP request sent to Elasticsearch, which may lead to higher indexing throughput, but incurs a risk of overloading Elasticsearch, leading to Elasticsearch giving up on some requests and resulting in indexing failures.

请注意,将此数字提高到高于队列大小时不起作用,因为批量不能包含比队列中包含的更多请求。

Note that raising this number above the queue size has no effect, as bulks cannot include more requests than are contained in the queue.

当队列已满时,任何请求索引的尝试都会阻塞,直到该请求可以放入队列。

When a queue is full, any attempt to request indexing will block until the request can be put into the queue.

为了达到合理的性能水平,务必将队列的大小设置为足够高的数字,以便仅在应用程序负载非常高时才会发生此类阻塞。

In order to achieve a reasonable level of performance, be sure to set the size of queues to a high enough number that this kind of blocking only happens when the application is under very high load.

Elasticsearch 节点只能处理有限数量的并行请求;特别是,它们会限制在任何给定时间可用于存储所有待处理请求的内存量。

Elasticsearch nodes can only handle so many parallel requests, and in particular they limit the amount of memory available to store all pending requests at any given time.

为了避免索引故障,请避免为队列数和最大批量大小使用过大的数字,尤其是当您希望您的索引包含大量文档时。

In order to avoid indexing failures, avoid using overly large numbers for the number of queues and the maximum bulk size, especially if you expect your index to hold large documents.
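As a sketch of how these knobs combine, a conservative configuration for a small cluster might look like the following; the values are purely illustrative, not recommendations:

# Illustrative values only: fewer queues and smaller bulks reduce the parallel load on Elasticsearch
hibernate.search.backend.indexing.queue_count = 4
hibernate.search.backend.indexing.queue_size = 500
hibernate.search.backend.indexing.max_bulk_size = 50

With these values, at most 4 bulk requests of at most 50 indexing requests each can be in flight per index at any given time.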

18.15. Writing and reading

有关在 Hibernate Search 中写入和读取索引的初步介绍(特别是提交和刷新的概念),请参见 Commit and refresh。

For a preliminary introduction to writing to and reading from indexes in Hibernate Search, including in particular the concepts of commit and refresh, see Commit and refresh.

18.15.1. Commit

当写入索引时,Elasticsearch 依靠 transaction log 来确保更改(即使是未提交的更改)在 REST API 调用返回后始终是安全的。

When writing to indexes, Elasticsearch relies on a transaction log to make sure that changes, even uncommitted, are always safe as soon as the REST API call returns.

因此,“提交”的概念对 Elasticsearch 后端并不重要,提交要求在很大程度上无关紧要。

For that reason, the concept of "commit" is not as important to the Elasticsearch backend, and commit requirements are largely irrelevant.

18.15.2. Refresh

当从索引中读取时,Elasticsearch 依靠周期性刷新的索引读取器,这意味着搜索查询将返回稍过时的结果,除非强制刷新:这称为 near-real-time 行为。

When reading from indexes, Elasticsearch relies on a periodically refreshed index reader, meaning that search queries will return slightly out-of-date results, unless a refresh was forced: this is called near-real-time behavior.

默认情况下,索引读取器每秒刷新一次,但这可以通过索引设置在 Elasticsearch 端进行自定义:请参阅 this page 上的 refresh_interval 设置。

By default, the index reader is refreshed every second, but this can be customized on the Elasticsearch side through index settings: see the refresh_interval setting on this page.
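For example, the refresh interval could be raised to trade result freshness for indexing throughput. One way to do so (a sketch, assuming the custom index settings file mechanism described earlier in this chapter) is to include the setting in its simplified form:

{
  "refresh_interval": "5s"
}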

18.16. Searching

使用 Elasticsearch 后端进行搜索依赖于与任何其他后端相同的 API。

Searching with the Elasticsearch backend relies on the same APIs as any other backend.

本节详细介绍与搜索相关的 Elasticsearch 特定配置。

This section details Elasticsearch-specific configuration related to searching.

18.16.1. Scroll timeout

使用 Elasticsearch 后端时,scrolls 会受到超时限制。如果长时间(默认:60 秒)未调用 next(),滚动将自动关闭,并且下一次调用 next() 将失败。

With the Elasticsearch backend, scrolls are subject to timeout. If next() is not called for a long period of time (default: 60 seconds), the scroll will be closed automatically and the next call to next() will fail.

在后端级别使用以下配置属性配置超时(以秒为单位):

Use the following configuration property at the backend level to configure the timeout (in seconds):

hibernate.search.backend.scroll_timeout = 60

此属性的默认值为 60

The default for this property is 60.

18.16.2. Partial shard failure

使用 Elasticsearch 后端时,fetching results 可能会遇到部分分片失败,即一些分片无法产生结果,而其他分片成功。在这种情况下,Elasticsearch 集群会返回状态码为成功的响应,但其中会包含有关失败分片及其失败原因的附加信息。

With the Elasticsearch backend, fetching results may result in partial shard failures, i.e. some of the shards will fail to produce results while the others will succeed. In such situations, an Elasticsearch cluster will produce a response with a successful status code, but will contain additional information on failed shards, and the reason they have failed.

默认情况下,Hibernate Search 将在获取结果时检查是否有任何分片失败,如果失败将抛出一个异常。

By default, Hibernate Search will check whether any shards have failed while fetching the results and, if so, will throw an exception.

使用后端级别的以下配置属性来更改默认行为:

Use the following configuration property at the backend level to change the default behavior:

hibernate.search.backend.query.shard_failure.ignore = true

该属性的默认值为 false

The default for this property is false.