Hibernate Search Operations Guide
16. Explicit backend/index operations
16.1. Applying configured analyzers/normalizers to a string
Features detailed below are incubating: they are still under active development.
The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way, or even removed, in subsequent releases.
You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.
Hibernate Search provides an API that applies an analyzer/normalizer to a given string. This can be useful to test how these analyzers/normalizers work.
. Example 413. Inspecting tokens produced by a configured analyzer.
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
IndexManager indexManager = mapping.indexManager( "Book" ); // (2) Retrieve the IndexManager for the "Book" index.
List<? extends AnalysisToken> tokens = indexManager.analyze( // (3) Apply an analyzer to a given string.
        "my-analyzer", // (4) The name of a configured analyzer.
        "The quick brown fox jumps right over the little lazy dog" // (5) The string to analyze.
);
for ( AnalysisToken token : tokens ) { // (6) Inspect the tokens produced by the analyzer.
    String term = token.term();
    int startOffset = token.startOffset();
    int endOffset = token.endOffset();
    // ...
}
. Example 414. Inspecting tokens produced by a configured normalizer.
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
IndexManager indexManager = mapping.indexManager( "Book" ); // (2) Retrieve the IndexManager for the "Book" index.
AnalysisToken normalizedToken = indexManager.normalize( // (3) Apply a normalizer to a given string.
        "my-normalizer", // (4) The name of a configured normalizer.
        "The quick brown fox jumps right over the little lazy dog" // (5) The string to normalize.
);
String term = normalizedToken.term(); // (6) Inspect the single token produced by the normalizer.
// ...
There are also asynchronous versions of the methods to perform analysis/normalization: analyzeAsync(..)/normalizeAsync(..).
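For instance, here is a minimal sketch of the asynchronous variant, assuming an IndexManager retrieved as in the examples above; the exact generics of the returned CompletionStage are left out here, so treat it as illustrative rather than definitive:

indexManager.analyzeAsync( // Same parameters as analyze(..), but returns a CompletionStage.
        "my-analyzer",
        "The quick brown fox jumps right over the little lazy dog"
)
        .thenAccept( tokens -> {
            for ( AnalysisToken token : tokens ) {
                // Inspect token.term(), token.startOffset(), ... without blocking the caller.
            }
        } );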
16.2. Explicitly altering a whole index
Some index operations are not about a specific entity/document, but rather about a large number of documents, possibly all of them. This includes, for example, purging the index to remove all of its content.
These operations are accessed through the SearchWorkspace interface and executed immediately, outside the context of a SearchSession, Hibernate ORM session or transaction.
The SearchWorkspace can be retrieved from the SearchMapping, and can target one, several or all indexes:
. Example 415. Retrieving a SearchWorkspace from the SearchMapping
SearchMapping searchMapping = /* ... */; // (1) Retrieve the SearchMapping.
SearchWorkspace allEntitiesWorkspace = searchMapping.scope( Object.class ).workspace(); // (2) Target all indexes.
SearchWorkspace bookWorkspace = searchMapping.scope( Book.class ).workspace(); // (3) Target the Book index only.
SearchWorkspace bookAndAuthorWorkspace = searchMapping.scope( Arrays.asList( Book.class, Author.class ) )
        .workspace(); // (4) Target the Book and Author indexes.
Alternatively, for convenience, the SearchWorkspace can be retrieved from the SearchSession:
. Example 416. Retrieving a SearchWorkspace from the SearchSession
SearchSession searchSession = /* ... */; // (1) Retrieve the SearchSession.
SearchWorkspace allEntitiesWorkspace = searchSession.workspace(); // (2) Target all indexes.
SearchWorkspace bookWorkspace = searchSession.workspace( Book.class ); // (3) Target the Book index only.
SearchWorkspace bookAndAuthorWorkspace = searchSession.workspace( Book.class, Author.class ); // (4) Target the Book and Author indexes.
The SearchWorkspace exposes various large-scale operations that can be applied to an index or a set of indexes. These operations are triggered as soon as they are requested, without waiting for the SearchSession to be closed or the Hibernate ORM transaction to be committed.
This interface offers the following methods:
purge()
Delete all documents from indexes targeted by this workspace.
With multi-tenancy enabled, only documents of the current tenant will be removed: the tenant of the session from which this workspace originated.
purgeAsync()
Asynchronous version of purge() returning a CompletionStage.
purge(Set<String> routingKeys)
Delete documents from indexes targeted by this workspace that were indexed with any of the given routing keys.
With multi-tenancy enabled, only documents of the current tenant will be removed: the tenant of the session from which this workspace originated.
purgeAsync(Set<String> routingKeys)
Asynchronous version of purge(Set<String>) returning a CompletionStage.
flush()
Flush to disk the changes to indexes that have not been committed yet. In the case of backends with a transaction log (Elasticsearch), also apply operations from the transaction log that were not applied yet.
This is generally not useful as Hibernate Search commits changes automatically. See Commit and refresh for more information.
flushAsync()
Asynchronous version of flush() returning a CompletionStage.
refresh()
Refresh the indexes so that all changes executed so far will be visible in search queries.
This is generally not useful as indexes are refreshed automatically. See Commit and refresh for more information.
refreshAsync()
Asynchronous version of refresh() returning a CompletionStage.
mergeSegments()
Merge each index targeted by this workspace into a single segment. This operation does not always improve performance: see Merging segments and performance.
mergeSegmentsAsync()
Asynchronous version of mergeSegments() returning a CompletionStage. This operation does not always improve performance: see Merging segments and performance.
Merging segments and performance
The merge-segments operation may affect performance positively as well as negatively.
This operation will regroup all index data into a single, huge segment (a file). This may speed up search at first, but as documents are deleted, this huge segment will begin to fill with "holes" which have to be handled as special cases during search, degrading performance.
Elasticsearch/Lucene do address this by rebuilding the segment at some point, but only once a certain ratio of deleted documents is reached. If all documents are in a single, huge segment, this ratio is less likely to be reached, and the index performance will continue to degrade for a long time.
There are, however, two situations in which merging segments may help:
No deletions or document updates are expected for an extended period of time.
Most or all documents have just been removed from the index, leading to segments consisting mostly of deleted documents. In that case, it makes sense to regroup the few remaining documents into a single segment, though Elasticsearch/Lucene will probably do it automatically (see the sketch right after this list).
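For the second situation, a minimal sketch of such maintenance, using the SearchWorkspace API described above (the entity type is an assumption for illustration):

SearchWorkspace workspace = searchSession.workspace( Book.class );
workspace.purge(); // Remove all documents; segments now consist mostly of deleted documents.
workspace.mergeSegments(); // Reclaim the space held by the deleted documents.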
Below is an example using a SearchWorkspace to purge several indexes.
. Example 417. Purging indexes using a SearchWorkspace
SearchSession searchSession = /* ... */; // (1) Retrieve the SearchSession.
SearchWorkspace workspace = searchSession.workspace( Book.class, Author.class ); // (2) Get a workspace targeting the Book and Author indexes.
workspace.purge(); // (3) Delete all documents from the targeted indexes.
16.3. Lucene-specific explicit backend/index operations
16.3.1. Retrieving analyzers and normalizers through the Lucene-specific Backend
Lucene analyzers and normalizers defined in Hibernate Search can be retrieved from the Lucene backend.
. Example 418. Retrieving the Lucene analyzers by name from the backend
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
Backend backend = mapping.backend(); // (2) Retrieve the Backend.
LuceneBackend luceneBackend = backend.unwrap( LuceneBackend.class ); // (3) Narrow it down to the Lucene backend.
Optional<? extends Analyzer> analyzer = luceneBackend.analyzer( "english" ); // (4) Retrieve an analyzer by name.
Optional<? extends Analyzer> normalizer = luceneBackend.normalizer( "isbn" ); // (5) Retrieve a normalizer by name.
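Once retrieved, such an analyzer can be used directly through the standard Lucene API. Below is a minimal, hedged sketch; the field name "title" and the sample text are made up for illustration:

analyzer.ifPresent( luceneAnalyzer -> {
    try ( TokenStream stream = luceneAnalyzer.tokenStream( "title", "The Quick Brown Fox" ) ) {
        CharTermAttribute termAttribute = stream.addAttribute( CharTermAttribute.class );
        stream.reset(); // Mandatory before the first incrementToken() call.
        while ( stream.incrementToken() ) {
            String term = termAttribute.toString(); // One analyzed term per token.
            // ...
        }
        stream.end();
    }
    catch (IOException e) {
        throw new UncheckedIOException( e );
    }
} );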
Alternatively, you can also retrieve the (composite) analyzers for a whole index. These analyzers behave differently for each field, delegating to the analyzer configured in the mapping for each field.
. Example 419. Retrieving the Lucene analyzers for a whole index
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
IndexManager indexManager = mapping.indexManager( "Book" ); // (2) Retrieve the IndexManager for the "Book" index.
LuceneIndexManager luceneIndexManager = indexManager.unwrap( LuceneIndexManager.class ); // (3) Narrow it down to the Lucene index manager.
Analyzer indexingAnalyzer = luceneIndexManager.indexingAnalyzer(); // (4) The analyzer used when indexing documents.
Analyzer searchAnalyzer = luceneIndexManager.searchAnalyzer(); // (5) The analyzer used when building search queries.
16.3.2. Retrieving the size of a Lucene index
The size of a Lucene index can be retrieved from the LuceneIndexManager.
. Example 420. Retrieving the index size from a Lucene index manager
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
IndexManager indexManager = mapping.indexManager( "Book" ); // (2) Retrieve the IndexManager for the "Book" index.
LuceneIndexManager luceneIndexManager = indexManager.unwrap( LuceneIndexManager.class ); // (3) Narrow it down to the Lucene index manager.
long size = luceneIndexManager.computeSizeInBytes(); // (4) Compute the size synchronously.
luceneIndexManager.computeSizeInBytesAsync() // (5) Or compute it asynchronously.
        .thenAccept( sizeInBytes -> {
            // ...
        } );
16.3.3. Retrieving a Lucene IndexReader
The low-level IndexReader can be retrieved from the LuceneIndexScope.
. Example 421. Retrieving the index reader from a Lucene index scope
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
LuceneIndexScope indexScope = mapping
        .scope( Book.class ).extension( LuceneExtension.get() ); // (2) Create a Lucene-specific scope targeting the Book index.
try ( IndexReader indexReader = indexScope.openIndexReader() ) { // (3) Open an IndexReader; it must be closed after use.
    // Work with the low-level index reader:
    int numDocs = indexReader.numDocs();
}
Even if multi-tenancy is enabled, the returned reader exposes documents of all tenants.
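Beyond simple statistics such as numDocs(), the reader can back any low-level Lucene read operation. A hedged sketch follows; the field name and term are hypothetical and must match the names produced by your Hibernate Search mapping, and topDocs.totalHits.value assumes the Lucene 8+ API:

try ( IndexReader indexReader = indexScope.openIndexReader() ) {
    IndexSearcher searcher = new IndexSearcher( indexReader ); // A plain Lucene searcher on top of the reader.
    TopDocs topDocs = searcher.search( new TermQuery( new Term( "title", "fox" ) ), 10 );
    long totalHits = topDocs.totalHits.value; // Number of matching documents.
    // ...
}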
16.4. Elasticsearch-specific explicit backend/index operations
16.4.1. Retrieving the REST client
When writing complex applications with advanced requirements, it may be necessary from time to time to send requests to the Elasticsearch cluster directly, in particular if Hibernate Search does not support this kind of request out of the box.
To that end, you can retrieve the Elasticsearch backend, then gain access to the Elasticsearch client used internally by Hibernate Search. See below for an example.
. Example 422. Accessing the low-level REST client
SearchMapping mapping = /* ... */; // (1) Retrieve the SearchMapping.
Backend backend = mapping.backend(); // (2) Retrieve the Backend.
ElasticsearchBackend elasticsearchBackend = backend.unwrap( ElasticsearchBackend.class ); // (3) Narrow it down to the Elasticsearch backend.
RestClient client = elasticsearchBackend.client( RestClient.class ); // (4) Get the client, passing the expected client type as an argument.
The client itself is not part of the Hibernate Search API, but of the official Elasticsearch REST client API.
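As a hedged illustration of what direct use of that client can look like (the endpoint and the synchronous call are just one option; the low-level REST client also offers performRequestAsync):

Request request = new Request( "GET", "/_cluster/health" ); // Build a request for an arbitrary endpoint.
Response response = client.performRequest( request ); // Execute it synchronously; throws IOException on failure.
int statusCode = response.getStatusLine().getStatusCode();
// ...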
Hibernate Search may one day switch to another client with a different Java type, without prior notice. If that happens, the code retrieving the RestClient in the example above will throw an exception.