Hibernate Search Operations Guide

13. Managing the index schema

13.1. Basics

Before indexes can be used for indexing or searching, they must be created on disk (Lucene) or in the remote cluster (Elasticsearch). With Elasticsearch in particular, this creation may not be obvious, since it requires describing the schema of each index, which includes in particular:

  1. the definition of every analyzer or normalizer used in this index;

  2. the definition of every single field used in this index, including in particular its type, the analyzer assigned to it, whether it requires doc values, etc.

Hibernate Search has all the information needed to generate this schema automatically, so the task of managing the schema can be delegated to Hibernate Search.

13.2. Automatic schema management on startup/shutdown

The property hibernate.search.schema_management.strategy can be set to one of the following values in order to define what to do with the indexes and their schema on startup and shutdown.

Strategy: none
  Definition: A strategy that does not do anything on startup or shutdown. Indexes and their schema will not be created nor deleted on startup or shutdown. Hibernate Search will not even check that the index actually exists.
  Warnings: With Elasticsearch, indexes and their schema will have to be created explicitly before startup.

Strategy: validate
  Definition: A strategy that does not change indexes nor their schema, but checks that indexes exist and validates their schema on startup. An exception will be thrown on startup if indexes are missing OR, with the Elasticsearch backend only, if indexes exist but their schema does not match the requirements of the Hibernate Search mapping: missing fields, fields with incorrect type, missing analyzer definitions or normalizer definitions, etc. "Compatible" differences such as extra fields are ignored.
  Warnings: Indexes and their schema will have to be created explicitly before startup. With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

Strategy: create
  Definition: A strategy that creates missing indexes and their schema on startup, but does not touch existing indexes and assumes their schema is correct without validating it.
  Warnings: Creating a schema does not populate indexed data.

Strategy: create-or-validate (default)
  Definition: A strategy that creates missing indexes and their schema on startup, and validates the schema of existing indexes. With the Elasticsearch backend only, an exception will be thrown on startup if some indexes already exist but their schema does not match the requirements of the Hibernate Search mapping: missing fields, fields with incorrect type, missing analyzer definitions or normalizer definitions, etc. "Compatible" differences such as extra fields are ignored.
  Warnings: Creating a schema does not populate indexed data. With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

Strategy: create-or-update
  Definition: A strategy that creates missing indexes and their schema on startup, and updates the schema of existing indexes if possible.
  Warnings: Updating a schema does not update indexed data. This strategy is unfit for production environments, due to several limitations including the impossibility to change the type of an existing field or the requirement to close indexes while updating analyzer definitions (which is not possible at all on AWS). With the Lucene backend, schema update is a no-op, because local Lucene indexes don’t have a schema.

Strategy: drop-and-create
  Definition: A strategy that drops existing indexes and re-creates them and their schema on startup.
  Warnings: All indexed data will be lost on startup.

Strategy: drop-and-create-and-drop
  Definition: A strategy that drops existing indexes and re-creates them and their schema on startup, then drops the indexes on shutdown.
  Warnings: All indexed data will be lost on startup and shutdown.
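As a minimal sketch of how the strategy might be set programmatically, the snippet below builds a java.util.Properties object that holds the property. The class and method names (SchemaStrategySettings, forStrategy) are hypothetical helpers, not part of Hibernate Search; only the property key and the strategy names are taken from the list above. How the resulting properties reach Hibernate Search (a configuration file, a framework setting, and so on) depends on your setup and is not shown.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Hibernate Search.
final class SchemaStrategySettings {
    // Property key as named in this section.
    static final String KEY = "hibernate.search.schema_management.strategy";

    // Strategy names as listed in this section.
    static final List<String> STRATEGIES = List.of(
            "none", "validate", "create", "create-or-validate",
            "create-or-update", "drop-and-create", "drop-and-create-and-drop");

    // Returns properties containing the strategy, rejecting unknown names early.
    static Properties forStrategy(String strategy) {
        if (!STRATEGIES.contains(strategy)) {
            throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
        Properties properties = new Properties();
        properties.setProperty(KEY, strategy);
        return properties;
    }

    public static void main(String[] args) {
        // A typical choice for automated tests: start clean, leave nothing behind.
        Properties properties = forStrategy("drop-and-create-and-drop");
        System.out.println(KEY + "=" + properties.getProperty(KEY));
    }
}
```

Rejecting unknown names in the helper is a design choice of this sketch: it surfaces a typo before the application is bootstrapped.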

13.3. Manual schema management

Schema management does not have to happen automatically on startup and shutdown.

Using the SearchSchemaManager interface, it is possible to trigger schema management operations explicitly after Hibernate Search has started.

The most common use case is to set the automatic schema management strategy to none and handle the creation/deletion of indexes manually once some other condition is met, for example once the Elasticsearch cluster has finished booting.
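The "wait for a condition, then manage the schema" pattern could be sketched as follows. Everything here is a stand-in: readiness represents whatever check you use to decide that the cluster has finished booting, and schemaAction represents a schema management call made through a SearchSchemaManager. Neither the class nor the method is provided by Hibernate Search.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Hibernate Search.
final class DeferredSchemaManagement {
    /**
     * Polls the readiness check until it succeeds or the attempts run out,
     * then runs the schema action once. Returns true if the action ran.
     */
    static boolean runWhenReady(BooleanSupplier readiness, Runnable schemaAction,
            int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (readiness.getAsBoolean()) {
                schemaAction.run();
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up without running the action.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        boolean ran = runWhenReady(
                () -> ++checks[0] >= 3, // pretend the cluster is ready on the third check
                () -> System.out.println("schema action would run here"),
                5, Duration.ofMillis(10));
        System.out.println("ran=" + ran + " after " + checks[0] + " checks");
    }
}
```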

After schema management operations are complete, you will often want to populate indexes. To that end, use the mass indexer.

The SearchSchemaManager interface exposes the following methods.

Method: validate()
  Definition: Does not change indexes nor their schema, but checks that indexes exist and validates their schema.
  Warnings: With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

Method: createIfMissing()
  Definition: Creates missing indexes and their schema, but does not touch existing indexes and assumes their schema is correct without validating it.
  Warnings: Creating a schema does not populate indexed data.

Method: createOrValidate()
  Definition: Creates missing indexes and their schema, and validates the schema of existing indexes.
  Warnings: Creating a schema does not populate indexed data. With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

Method: createOrUpdate()
  Definition: Creates missing indexes and their schema, and updates the schema of existing indexes if possible.
  Warnings: Updating a schema does not update indexed data. With the Elasticsearch backend, updating a schema may fail. With the Elasticsearch backend, updating a schema may close indexes while updating analyzer definitions (which is not possible at all on Amazon OpenSearch Service). With the Lucene backend, schema update is a no-op (it just creates missing indexes), because local Lucene indexes don’t have a schema.

Method: dropIfExisting()
  Definition: Drops existing indexes.
  Warnings: All indexed data will be lost.

Method: dropAndCreate()
  Definition: Drops existing indexes and re-creates them and their schema.
  Warnings: All indexed data will be lost.

Below is an example using a SearchSchemaManager to drop and create indexes, then using a mass indexer to re-populate the indexes. The dropAndCreateSchemaOnStart setting of the mass indexer is an alternative way to achieve the same result.

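Example 137. Reinitializing indexes using a SearchSchemaManager

SearchSession searchSession = /* ... */ // Retrieve a SearchSession.
SearchSchemaManager schemaManager = searchSession.schemaManager(); // Get a schema manager targeting all indexes.
schemaManager.dropAndCreate(); // Drop the indexes and re-create them along with their schema.
searchSession.massIndexer()
        .purgeAllOnStart( false ) // The indexes were just re-created and are empty, so there is nothing to purge.
        .startAndWait(); // Re-populate the indexes and wait for mass indexing to finish.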

You can also select entity types when creating a schema manager, to manage the indexes of these types only (and their indexed subtypes, if any):

Example 138. Reinitializing only some indexes using a SearchSchemaManager

SearchSchemaManager schemaManager = searchSession.schemaManager( Book.class ); // Get a schema manager targeting only the index of the Book entity type.
schemaManager.dropAndCreate(); // Drop and re-create that index only; other indexes are left untouched.

13.4. How schema management works

Creating/updating a schema does not create/update indexed data

Creating or updating indexes and their schema through schema management will not populate the indexes:

  - Newly created indexes will always be empty.
  - Indexes with a recently updated schema will still contain the same indexed data, i.e. new fields won’t be added to documents just because they were added to the schema.

This is by design: reindexing is a potentially long-running task that should be triggered explicitly. To populate indexes with pre-existing data from the database, use mass indexing.

Dropping the schema means losing indexed data

Dropping a schema will drop the whole index, including all indexed data.

A dropped index will need to be re-created through schema management, then populated with pre-existing data from the database through mass indexing.

Schema validation and update are not effective with Lucene

The Lucene backend will only validate that the index actually exists and create missing indexes, because there is no concept of schema in Lucene beyond the existence of index segments.

Schema validation is permissive

With Elasticsearch, schema validation is as permissive as possible:

  - Fields that are unknown to Hibernate Search will be ignored.
  - Settings that are more powerful than required will be deemed valid. For example, a field that is not marked as sortable in Hibernate Search but is marked as "doc_values": true in Elasticsearch will be deemed valid.
  - Analyzer/normalizer definitions that are unknown to Hibernate Search will be ignored.

One exception: date formats must exactly match the formats specified by Hibernate Search, due to implementation constraints.
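The permissive rules can be illustrated with a toy model. The sketch below is not Hibernate Search's validator: the Field class, the validate method and the problem messages are all invented for illustration, and the date format exception is not modeled. It only shows the shape of the comparison: the expected schema drives the check, so extra fields in the actual schema are never inspected, and a setting that is stronger than required passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model for illustration; not Hibernate Search's actual validation code.
final class PermissiveValidationSketch {
    static final class Field {
        final String type;
        final boolean docValues;

        Field(String type, boolean docValues) {
            this.type = type;
            this.docValues = docValues;
        }
    }

    // Compares what the mapping expects with what the cluster actually has.
    static List<String> validate(Map<String, Field> expected, Map<String, Field> actual) {
        List<String> problems = new ArrayList<>();
        // Only expected fields are examined: fields unknown to the mapping are ignored.
        for (Map.Entry<String, Field> entry : expected.entrySet()) {
            String name = entry.getKey();
            Field want = entry.getValue();
            Field have = actual.get(name);
            if (have == null) {
                problems.add("missing field: " + name);
            } else if (!want.type.equals(have.type)) {
                problems.add("incorrect type for field: " + name);
            } else if (want.docValues && !have.docValues) {
                // Doc values enabled but not required is fine; the reverse is not.
                problems.add("doc values required but disabled: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Field> expected = Map.of("title", new Field("text", false));
        Map<String, Field> actual = Map.of(
                "title", new Field("text", true),      // stronger than required
                "legacy", new Field("keyword", true)); // unknown to the mapping
        System.out.println(validate(expected, actual)); // prints []
    }
}
```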

Schema updates may fail

A schema update, triggered by the create-or-update strategy, may simply fail. This is because schemas may change in an incompatible way, for example when a field has its type or its analyzer changed.

Worse, since updates are handled on a per-index basis, a schema update may succeed for one index but fail for another, leaving your schema as a whole half-updated.

For these reasons, using schema updates in a production environment is not recommended. Whenever the schema changes, you should either:

  - drop and create indexes, then reindex;
  - OR update the schema manually through custom scripts.

When a schema update fails, the create-or-update strategy will prevent Hibernate Search from starting, but it may already have successfully updated the schema for another index, making a rollback difficult.
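Why a failure leaves the schema half-updated can be seen in a toy model. The sketch below is not how Hibernate Search orders or executes updates: the class, the method and the index names are hypothetical. It only shows the consequence of handling updates one index at a time with no rollback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model for illustration; not Hibernate Search's actual update code.
final class PerIndexUpdateSketch {
    /**
     * Applies the update to each index in turn and stops at the first failure.
     * Returns the indexes that were already updated when the failure occurred.
     */
    static List<String> updateAll(List<String> indexNames, Consumer<String> updateOne) {
        List<String> updated = new ArrayList<>();
        for (String indexName : indexNames) {
            try {
                updateOne.accept(indexName);
                updated.add(indexName);
            } catch (RuntimeException e) {
                // Nothing is rolled back: earlier indexes keep their new schema.
                break;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<String> updated = updateAll(List.of("book", "author"), indexName -> {
            if (indexName.equals("author")) {
                throw new IllegalStateException("incompatible change on " + indexName);
            }
        });
        System.out.println("updated before failure: " + updated); // prints updated before failure: [book]
    }
}
```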

Schema updates on Elasticsearch may close indexes

Elasticsearch does not allow updating analyzer/normalizer definitions on an open index. Thus, when analyzer or normalizer definitions have to be updated during a schema update, Hibernate Search will temporarily close the affected indexes.

For this reason, the create-or-update strategy should be used with caution when multiple clients use Elasticsearch indexes managed by Hibernate Search: those clients should be synchronized in such a way that, while Hibernate Search is starting, no other client needs to access the indexes.

Also, on Amazon OpenSearch Service running Elasticsearch (not OpenSearch) in version 7.1 or older, as well as on Amazon OpenSearch Serverless, the close/open operations are not supported, so the schema update will fail when trying to update analyzer definitions. The only workaround is to avoid schema updates on these platforms. They should be avoided in production environments regardless: see "Schema updates may fail" above.

13.5. Exporting the schema

13.5.1. Exporting the schema to a set of files

The schema manager provides a way to export schemas to the filesystem. The output is backend-specific.

Schema exports are constructed based on the mapping information and configuration (e.g. the backend version). The resulting exports are not compared to or validated against the actual backend schema.

For Elasticsearch, the files provide the information necessary to create indexes (along with their settings and mapping). The file tree structure of an export is shown below:

# For the default backend the index schema will be written to:
.../backend/indexes/<index-name>/create-index.json
.../backend/indexes/<index-name>/create-index-query-params.json
# For additional named backends:
.../backends/<name of a particular backend>/indexes/<index-name>/create-index.json
.../backends/<name of a particular backend>/indexes/<index-name>/create-index-query-params.json

Example 139. Exporting the schema to the filesystem

SearchSchemaManager schemaManager = searchSession.schemaManager(); // Get a schema manager targeting all indexes.
schemaManager.exportExpectedSchema( targetDirectory ); // Export the expected schema to the target directory.
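To locate the exported files afterwards, the file tree layout described in this section can be turned into a small path helper. This is a sketch under assumptions: SchemaExportPaths and indexDirectory are hypothetical names, not part of Hibernate Search, and the layout is taken only from the tree shown in this section.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Hibernate Search.
final class SchemaExportPaths {
    /**
     * Resolves the directory holding the export of one index.
     * An empty backend name stands for the default backend.
     */
    static Path indexDirectory(Path targetDirectory, Optional<String> backendName, String indexName) {
        Path backendDirectory = backendName
                .map(name -> targetDirectory.resolve("backends").resolve(name)) // named backend
                .orElseGet(() -> targetDirectory.resolve("backend"));           // default backend
        return backendDirectory.resolve("indexes").resolve(indexName);
    }

    public static void main(String[] args) {
        Path target = Paths.get("export");
        Path directory = indexDirectory(target, Optional.empty(), "book");
        // The two files written per index, according to the layout in this section.
        System.out.println(directory.resolve("create-index.json"));
        System.out.println(directory.resolve("create-index-query-params.json"));
    }
}
```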

13.5.2. Exporting to a custom collector

Search schema managers allow walking through the schema exports based on the data such managers contain. To do so, a SearchSchemaCollector must be implemented and passed to the schema manager’s exportExpectedSchema(..) method.

Schema exports are constructed based on the mapping information and configuration (e.g. the backend version). The resulting exports are not compared to or validated against the actual backend schema.

Example 140. Exporting to a custom collector

SearchSchemaManager schemaManager = searchSession.schemaManager(); // Get a schema manager targeting all indexes.
schemaManager.exportExpectedSchema(
        new SearchSchemaCollector() { // Pass a custom collector; it is called once per index.
            @Override
            public void indexSchema(Optional<String> backendName, String indexName, SchemaExport export) {
                // Build a name from the backend name (empty for the default backend) and the index name.
                String name = backendName.orElse( "default" ) + ":" + indexName;
                // perform any other actions with an index schema export
            }
        }
);

To access backend-specific functionality, an extension to SchemaExport can be applied:

new SearchSchemaCollector() {
    @Override
    public void indexSchema(Optional<String> backendName, String indexName, SchemaExport export) {
        List<JsonObject> bodyParts = export
                .extension( ElasticsearchExtension.get() ) // (1)
                .bodyParts(); // (2)
    }
}

(1) Extend the SchemaExport with the Elasticsearch extension.
(2) Access the HTTP body of the request that is needed to create an index in an Elasticsearch cluster.

13.5.3. Exporting in offline mode

Sometimes it can be useful to export the schema offline, from an environment that doesn’t have access to the backend, e.g. the Elasticsearch cluster.

See the documentation on offline startup for more information on how to achieve this.