PostgreSQL Chinese Operation Guide

Chapter 50. Replication Progress Tracking

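Replication origins are intended to make it easier to implement logical replication solutions on top of logical decoding. They provide a solution to two common problems: how to safely keep track of replication progress, and how to change replication behavior based on the origin of a row, for example to prevent loops in bi-directional replication setups.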

Replication origins have just two properties, a name and an ID. The name, which is what should be used to refer to the origin across systems, is free-form text. It should be chosen in a way that makes conflicts between replication origins created by different replication solutions unlikely; e.g., by prefixing the replication solution's name to it. The ID is used only to avoid having to store the long name in situations where space efficiency is important. It should never be shared across systems.

Replication origins can be created using the function pg_replication_origin_create(); dropped using pg_replication_origin_drop(); and seen in the pg_replication_origin system catalog.
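
As an illustration of the naming advice and the functions above, the following Python sketch builds the statements a replication solution might issue. It only constructs (sql, params) pairs and sends nothing to a server. The prefix myrepl and the node names are hypothetical, and the %s placeholder style assumes a DB-API driver such as psycopg.

```python
# Sketch only: builds (sql, params) pairs; nothing is executed here.
SOLUTION_PREFIX = "myrepl"  # hypothetical replication solution name


def origin_name(remote_node: str) -> str:
    """Prefix the name so origins of other replication solutions are unlikely to collide."""
    return f"{SOLUTION_PREFIX}_{remote_node}"


def create_origin(name: str):
    # pg_replication_origin_create() returns the internal ID assigned to the origin.
    return ("SELECT pg_replication_origin_create(%s)", (name,))


def drop_origin(name: str):
    return ("SELECT pg_replication_origin_drop(%s)", (name,))


def list_origins():
    # roident is the local ID, roname the free-form name.
    return ("SELECT roident, roname FROM pg_replication_origin", ())
```

Only the name returned by origin_name() would be exchanged between systems; the roident value stays local to each cluster.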

One nontrivial part of building a replication solution is to keep track of replay progress in a safe manner. When the applying process, or the whole cluster, dies, it needs to be possible to find out up to where data has successfully been replicated. Naive solutions to this, such as updating a row in a table for every replayed transaction, have problems like run-time overhead and database bloat.

Using the replication origin infrastructure, a session can be marked as replaying from a remote node (using the pg_replication_origin_session_setup() function). Additionally, the LSN and commit time stamp of every source transaction can be configured on a per-transaction basis using pg_replication_origin_xact_setup(). If that is done, replication progress will persist in a crash-safe manner. Replay progress for all replication origins can be seen in the pg_replication_origin_status view. An individual origin's progress, e.g., when resuming replication, can be acquired using pg_replication_origin_progress() for any origin or pg_replication_origin_session_progress() for the origin configured in the current session.
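
The flow described above can be sketched in Python as a plan of statements for an apply session. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names (parse_lsn, replay_plan) and the shape of remote_txns are hypothetical, the %s placeholders assume a DB-API driver such as psycopg, and treating a missing progress value as "nothing replayed yet" is a choice made for this sketch. Only the pg_replication_origin_* functions come from PostgreSQL.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal text form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def session_setup(origin: str):
    # Issued once per apply session, before replaying anything.
    return ("SELECT pg_replication_origin_session_setup(%s)", (origin,))


def session_progress():
    # flush = true: the corresponding local transaction is guaranteed
    # to have been flushed to disk.
    return ("SELECT pg_replication_origin_session_progress(true)", ())


def replay_plan(progress_lsn, remote_txns):
    """Build statements for source transactions beyond progress_lsn.

    remote_txns: iterable of (origin_lsn, origin_timestamp, [change SQL]).
    progress_lsn: text LSN, or None when no progress has been recorded
    (an assumption of this sketch).
    """
    done = parse_lsn(progress_lsn) if progress_lsn else 0
    plan = []
    for origin_lsn, origin_ts, changes in remote_txns:
        if parse_lsn(origin_lsn) <= done:
            continue  # already replayed before the crash or restart
        plan.append(("BEGIN", ()))
        # Record the source LSN and commit time stamp inside the same
        # local transaction, so progress commits together with the data.
        plan.append(("SELECT pg_replication_origin_xact_setup(%s, %s)",
                     (origin_lsn, origin_ts)))
        plan.extend((sql, ()) for sql in changes)
        plan.append(("COMMIT", ()))
    return plan
```

Because the progress is recorded in the same local transaction as the replayed changes, a restart can read it back with session_progress() and skip everything at or below that LSN, without the per-transaction row update that the naive approach needs.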

In replication topologies more complex than replication from exactly one system to one other system, another problem can be that it is hard to avoid replicating replayed rows again. That can lead both to cycles in the replication and to inefficiencies. Replication origins provide an optional mechanism to recognize and prevent that. When configured using the functions referenced in the previous paragraph, every change and transaction passed to output plugin callbacks (see Section 49.6) generated by the session is tagged with the replication origin of the generating session. This allows treating them differently in the output plugin, e.g., ignoring all but locally-originating rows. Additionally, the filter_by_origin_cb callback can be used to filter the logical decoding change stream based on the source. While less flexible, filtering via that callback is considerably more efficient than doing it in the output plugin.
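
The filtering decision can be modeled in a few lines of Python. A real output plugin is written in C and registers filter_by_origin_cb with the logical decoding machinery; the Change class and decode() function below are hypothetical stand-ins used only to show the logic of keeping locally-originating rows. The constant mirrors PostgreSQL's InvalidRepOriginId, which marks changes that were not replayed from any origin.

```python
from dataclasses import dataclass

INVALID_REP_ORIGIN_ID = 0  # changes made locally carry no replication origin


@dataclass
class Change:
    origin_id: int   # origin the generating session was set up with
    table: str
    payload: dict


def filter_by_origin(origin_id: int) -> bool:
    """Model of the callback: True means the change is filtered away."""
    return origin_id != INVALID_REP_ORIGIN_ID


def decode(changes):
    # Forward only changes that originated on this node, so rows replayed
    # from another node are not sent onward again.
    return [c for c in changes if not filter_by_origin(c.origin_id)]
```

In a two-node bi-directional setup, a filter like this on both nodes would keep a row that was replayed from the peer from being sent back to it.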