Hibernate Search Operations Guide

21. Known issues and limitations

21.1. Without coordination, in rare cases, indexing involving @IndexedEmbedded may lead to out-of-sync indexes

21.1.1. Description

With the default settings (no coordination), if two entity instances are indexed-embedded in the same "index-embedding" entity, and these two entity instances are updated in parallel transactions, there is a small risk that the transaction commits interleave in just the wrong way, so that the index-embedding entity is re-indexed with only part of the updates.

For example, consider an indexed entity A, which index-embeds B and C. The following course of events involving two parallel transactions (T1 and T2) will lead to an out-of-date index:

  1. T1: Load B.

  2. T1: Change B in a way that will require reindexing A.

  3. T2: Load C.

  4. T2: Change C in a way that will require reindexing A.

  5. T2: Request the transaction commit. Hibernate Search builds the document for A. While doing so, it automatically loads B. B appears unmodified, as T1 wasn’t committed yet.

  6. T1: Request the transaction commit. Hibernate Search builds documents to index. While doing so, it automatically loads C. C appears unmodified, as T2 wasn’t committed yet.

  7. T1: Transaction is committed. Hibernate Search automatically sends the updated A to the index. In this version, B is updated, but C is not.

  8. T2: Transaction is committed. Hibernate Search automatically sends the updated A to the index. In this version, C is updated, but B is not.

This chain of events ends with the index containing a version of A where C is updated, but B is not.
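The interleaving above can be replayed deterministically with a small toy model. This is only a sketch under stated assumptions: the `Tx` class, the `committed` map and the string "document" are hypothetical stand-ins invented for illustration, not Hibernate Search or Hibernate ORM types. The only behavior taken from the text is that each transaction builds the document for A when the commit is requested, seeing its own change plus whatever is already committed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the eight steps above; not Hibernate Search code. */
final class StaleIndexRace {

    /** Committed database state, visible to every transaction. */
    static final Map<String, String> committed = new HashMap<>();

    /** The "index": holds the last document sent for A. */
    static String indexedA;

    /** Hypothetical transaction: sees its own changes plus committed state. */
    static final class Tx {
        final Map<String, String> local = new HashMap<>();
        String pendingDocument;

        void change(String entity, String value) {
            local.put(entity, value);
        }

        String read(String entity) {
            return local.containsKey(entity) ? local.get(entity) : committed.get(entity);
        }

        /** Commit requested: the document for A is built before the commit. */
        void requestCommit() {
            pendingDocument = "A{b=" + read("B") + ", c=" + read("C") + "}";
        }

        /** Commit done: changes become visible, then the document is sent to the index. */
        void commit() {
            committed.putAll(local);
            indexedA = pendingDocument;
        }
    }

    /** Replays steps 1 to 8 and returns the document left in the index. */
    static String run() {
        committed.clear();
        committed.put("B", "b-old");
        committed.put("C", "c-old");
        Tx t1 = new Tx();
        Tx t2 = new Tx();
        t1.change("B", "b-new"); // steps 1-2
        t2.change("C", "c-new"); // steps 3-4
        t2.requestCommit();      // step 5: B still looks unmodified to T2
        t1.requestCommit();      // step 6: C still looks unmodified to T1
        t1.commit();             // step 7: index gets B updated, C stale
        t2.commit();             // step 8: index gets C updated, B stale
        return indexedA;
    }

    public static void main(String[] args) {
        System.out.println("index:    " + run());
        System.out.println("database: B=" + committed.get("B") + ", C=" + committed.get("C"));
    }
}
```

In this model the database ends with both `b-new` and `c-new`, while the index keeps `A{b=b-old, c=c-new}`, which is the outcome described above.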

21.1.2. Solutions and workarounds

The following solutions can help circumvent this limitation:

21.1.3. Roadmap

This limitation is caused directly by the lack of coordination between threads or application nodes, so it can only be addressed completely by configuring coordination.

There are no other solutions currently on the roadmap.

21.2. Without coordination, backend errors during indexing may lead to out-of-sync indexes

21.2.1. Description

With the default settings (no coordination), indexing applies index changes in the backend just after the transaction commit, without any kind of transaction log for the index changes.

Consequently, should an error occur in the backend while indexing (e.g. an I/O error), this indexing will be cancelled, with no way to cancel the corresponding database transaction: the index will thus become out of sync.
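The ordering problem described above can be sketched in a few lines. This is a toy model only: `database`, `index`, `writeToIndex` and `save` are hypothetical names made up for the illustration, and the backend failure is simulated with a flag. It shows a commit followed by an index write, with nothing recording the pending index change if the write fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy model of "commit first, index second"; not Hibernate Search code. */
final class PostCommitIndexingFailure {

    static final List<String> database = new ArrayList<>();
    static final List<String> index = new ArrayList<>();

    /** Hypothetical backend write that fails when the backend is "down". */
    static void writeToIndex(String document, boolean backendDown) throws IOException {
        if (backendDown) {
            throw new IOException("simulated backend failure");
        }
        index.add(document);
    }

    /** Returns true if the index change was applied, false if it was lost. */
    static boolean save(String entity, boolean backendDown) {
        database.add(entity); // the commit is final from here on
        try {
            writeToIndex(entity, backendDown);
            return true;
        } catch (IOException e) {
            // No log of the pending change exists, so it cannot be replayed,
            // and the committed row cannot be rolled back either.
            return false;
        }
    }

    public static void main(String[] args) {
        save("book-1", false);
        save("book-2", true);
        System.out.println("database: " + database);
        System.out.println("index:    " + index);
    }
}
```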

21.2.2. Solutions and workarounds

The following solutions can help circumvent this limitation:

21.2.3. Roadmap

This limitation is caused directly by the lack of persistence of entity change events, so it can only be addressed completely by persisting those events, e.g. by switching to the outbox-polling coordination strategy.

Some incomplete countermeasures may be considered in future versions, such as automatic in-thread retries, but they will never solve the problem completely, and they are not currently on the roadmap.
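As a rough illustration of switching to the outbox-polling coordination strategy, the sketch below collects the relevant setting in a plain `java.util.Properties` object. The property key and value are given as recalled from the Hibernate Search reference documentation and should be checked against the version in use; the strategy also needs an additional Hibernate Search module on the classpath, which is not shown here. The class and method names are hypothetical.

```java
import java.util.Properties;

/** Sketch: settings one might pass to the persistence unit; names are illustrative. */
final class OutboxPollingSettings {

    static Properties settings() {
        Properties properties = new Properties();
        // Assumed key and value; verify against the reference documentation
        // of the Hibernate Search version you are running.
        properties.setProperty("hibernate.search.coordination.strategy", "outbox-polling");
        return properties;
    }

    public static void main(String[] args) {
        settings().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real application the same key and value would more likely live in `persistence.xml` or the framework's configuration file than in code.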

21.3. Listener-triggered indexing only considers changes applied directly to entity instances in Hibernate ORM sessions

21.3.1. Description

Because of how Hibernate Search uses internal events of Hibernate ORM to detect changes, it will not detect changes resulting from insert/delete/update queries, whether SQL or JPQL/HQL.

This is because these queries are executed on the database side, without Hibernate ORM or Hibernate Search having any knowledge of which entities are actually created, deleted or updated.

21.3.2. Solutions and workarounds

One workaround is to reindex after you run JPQL/SQL queries, using the MassIndexer, the Jakarta Batch mass indexing job, or explicit indexing of the affected entities.

21.3.3. Roadmap

One solution to actually detect these changes would be to source entity change events directly from the database, for example using Debezium.

This is tracked as HSEARCH-3513, but it is a long-term goal.

21.4. Listener-triggered indexing ignores asymmetric association updates

21.4.1. Description

Hibernate ORM is able to handle asymmetric updates of associations, where only the owning side of the association is updated and the other side is ignored. The entities in the session will be inconsistent for the duration of the session, but upon reloading they will be consistent once again, due to how entity loading works.

This practice of asymmetric updates of associations can cause problems in applications in general, but also in Hibernate Search specifically, where it may lead to out-of-sync indexes. Thus, it must be avoided.

For example, let’s assume an indexed entity A has an @IndexedEmbedded association A.b to entity B, and that B owns that association on its side, B.a. One can just set B.a to null in order to remove the association between A and B, and the effect on the database will be exactly what we want.

However, Hibernate Search will only be able to detect that B.a changed, and by the time it tries to infer which entities need to be re-indexed, it will no longer be able to know what B.a used to refer to. That change in itself is useless to Hibernate Search: Hibernate Search will not know that A, specifically, needs to be re-indexed. It will "forget" to reindex A, leading to an out-of-sync index where A.b still contains B.

In the end, the only way for Hibernate Search to know that A needs to be re-indexed is to also set A.b to null, which will cause Hibernate Search to detect that A.b changed, and thus that A changed too.
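The difference between the two ways of removing the association can be shown with a toy model. This is a sketch under stated assumptions: the plain classes `A` and `B` and the `toReindex` resolver are hypothetical and only mimic the reasoning described above, namely that the entities to reindex are inferred from the current in-memory state of the changed entities. It is not how Hibernate Search is implemented.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of asymmetric vs. symmetric association updates. */
final class AsymmetricUpdateDemo {

    static final class A { B b; } // stands for the indexed entity, with A.b indexed-embedded
    static final class B { A a; } // stands for the owning side, B.a

    /**
     * Hypothetical resolver: from the entities reported as changed, infer which
     * A instances to reindex, looking only at the current state of each entity.
     */
    static Set<A> toReindex(Object... changedEntities) {
        Set<A> result = new LinkedHashSet<>();
        for (Object entity : changedEntities) {
            if (entity instanceof A) {
                result.add((A) entity);       // A itself changed
            } else if (entity instanceof B) {
                A current = ((B) entity).a;   // the previous value of B.a is unknown
                if (current != null) {
                    result.add(current);
                }
            }
        }
        return result;
    }

    /** Only B.a is cleared: nothing points back to A any more. */
    static int asymmetricUpdate() {
        A a = new A();
        B b = new B();
        a.b = b;
        b.a = a;
        b.a = null;
        return toReindex(b).size();
    }

    /** Both sides are cleared: A is reported as changed and gets reindexed. */
    static int symmetricUpdate() {
        A a = new A();
        B b = new B();
        a.b = b;
        b.a = a;
        b.a = null;
        a.b = null;
        return toReindex(b, a).size();
    }

    public static void main(String[] args) {
        System.out.println("asymmetric update, entities to reindex: " + asymmetricUpdate());
        System.out.println("symmetric update, entities to reindex:  " + symmetricUpdate());
    }
}
```

In this model the asymmetric update yields no entity to reindex, while the symmetric update yields A.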

21.4.2. Solutions and workarounds

The following solutions can help circumvent this limitation:

  • When you update one side of an association, always update the other side consistently.

  • When the above is not possible, reindex the affected entities after the association update, using the MassIndexer, the Jakarta Batch mass indexing job, or explicit indexing.

21.4.3. Roadmap

Hibernate Search may handle asymmetric association updates in the future, by keeping track of entities added to or removed from an association. However, this will only solve the problem completely if indexing happens asynchronously in a background thread, such as with the outbox-polling coordination strategy. This is tracked as HSEARCH-3567.

Alternatively, sourcing entity change events directly from the database, for example using Debezium, would also solve the problem. This is tracked as HSEARCH-3513, but it is a long-term goal.

21.5. Listener-triggered indexing is not compatible with Session serialization

21.5.1. Description

When listener-triggered indexing is enabled, Hibernate Search collects entity change events to build an "indexing plan" inside the ORM EntityManager/Session. The indexing plan holds information about which entities need to be re-indexed, and sometimes documents that have not been indexed yet.

The indexing plan cannot be serialized.

If the ORM Session gets serialized, all collected change events will be lost upon deserializing the session, and Hibernate Search will likely "forget" to reindex some entities.
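The effect can be reproduced with standard Java serialization. The sketch below is a toy model: `ToySession` is a hypothetical stand-in, not the ORM Session, and the indexing plan is modeled as a `transient` field purely as an assumption for illustration, since this section does not describe how the plan is actually attached to the session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Toy model of change events lost across session serialization. */
final class SessionSerializationDemo {

    /** Hypothetical stand-in for a session holding a non-serializable plan. */
    static final class ToySession implements Serializable {
        private static final long serialVersionUID = 1L;

        // Assumption: the plan is modeled as transient, so it is never written out.
        private transient List<String> indexingPlan = new ArrayList<>();

        void recordChange(String entity) {
            plan().add(entity);
        }

        int pendingChanges() {
            return plan().size();
        }

        private List<String> plan() {
            if (indexingPlan == null) { // the case after deserialization
                indexingPlan = new ArrayList<>();
            }
            return indexingPlan;
        }
    }

    /** Serializes and deserializes the session in memory. */
    static ToySession roundTrip(ToySession session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToySession) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.recordChange("Book#1");
        System.out.println("before: " + session.pendingChanges());
        System.out.println("after:  " + roundTrip(session).pendingChanges());
    }
}
```

The recorded change is present before the round trip and gone afterwards, which is why the entity would never be reindexed.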

This is fine in most applications, since they do not rely on serializing the session, but it might be a problem with some JEE applications relying on bean passivation.

21.5.2. Solutions and workarounds

Avoid serializing an ORM EntityManager/Session after changing entities.

21.5.3. Roadmap

There are no plans to address this limitation. We do not intend to support Session serialization when Hibernate Search is enabled.