Postgresql 中文操作指南
27.2. Log-Shipping Standby Servers #
连续归档可用于创建 high availability(HA)集群配置,其中一个或多个 standby servers 就绪,如果主服务器发生故障,便可接手操作。此功能通常称为 warm standby 或 log shipping。
Continuous archiving can be used to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails. This capability is widely referred to as warm standby or log shipping.
主服务器和备用服务器协同工作以提供此功能,尽管服务器只是松散耦合的。主服务器以连续归档模式运行,而每个备用服务器则以连续恢复模式运行,从主服务器中读取 WAL 文件。不需要对数据库表进行任何更改以启用此功能,因此与其他一些复制解决方案相比,它提供了较低管理开销。此配置对主服务器的性能影响也相对较小。
The primary and standby server work together to provide this capability, though the servers are only loosely coupled. The primary server operates in continuous archiving mode, while each standby server operates in continuous recovery mode, reading the WAL files from the primary. No changes to the database tables are required to enable this capability, so it offers low administration overhead compared to some other replication solutions. This configuration also has relatively low performance impact on the primary server.
直接将 WAL 记录从一个数据库服务器移动到另一个数据库服务器通常称为日志传送。PostgreSQL 通过一次一个文件(WAL 段)传输 WAL 记录实现基于文件的日志传送。无论传输到邻近系统、同一站点上的另一个系统还是地球另一端的另一个系统,WAL 文件(16MB)都可以轻松且廉价地进行传输。此技术所需的带宽因主服务器的事务速率而异。基于记录的日志传送更加细致,并通过网络连接增量传输 WAL 更改(参见 Section 27.2.5)。
Directly moving WAL records from one database server to another is typically described as log shipping. PostgreSQL implements file-based log shipping by transferring WAL records one file (WAL segment) at a time. WAL files (16MB) can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system at the same site, or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server. Record-based log shipping is more granular and streams WAL changes incrementally over a network connection (see Section 27.2.5).
请注意,日志传送是异步的,也就是说,WAL 记录是在事务提交后传送的。因此,如果主服务器发生灾难性故障,将存在数据丢失的窗口;尚未传送的事务将丢失。可以使用 archive_timeout 参数限制基于文件的日志传送中的数据丢失窗口的大小,该参数可以低至几秒钟。但是,如此低的值将显著增加文件传输所需的带宽。流复制(参见 Section 27.2.5)允许的数据丢失窗口要小得多。
It should be noted that log shipping is asynchronous, i.e., the WAL records are shipped after transaction commit. As a result, there is a window for data loss should the primary server suffer a catastrophic failure; transactions not yet shipped will be lost. The size of the data loss window in file-based log shipping can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds. However such a low setting will substantially increase the bandwidth required for file shipping. Streaming replication (see Section 27.2.5) allows a much smaller window of data loss.
恢复性能足够好,因此备用数据库一旦被激活,通常只需片刻即可完全可用。因此,这被称为热备用配置,它提供高可用性。从存档的基本备份和前滚恢复服务器需要花费更长的时间,因此该技术仅为灾难恢复提供解决方案,而不是高可用性。备用服务器也可以用于只读查询,在这种情况下,称为 hot standby 服务器。有关更多信息,请参见 Section 27.4。
Recovery performance is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration which offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability. A standby server can also be used for read-only queries, in which case it is called a hot standby server. See Section 27.4 for more information.
27.2.1. Planning #
通常情况下,最好创建尽可能相似的原始服务器和备用服务器,至少从数据库服务器的角度来看是这样。特别是,与表空间关联的路径名称将直接传递,因此,如果使用了此功能,则原始服务器和备用服务器必须具有相同的表空间装载路径。请记住,如果在原始服务器上执行 CREATE TABLESPACE ,则需要为其创建的任何新装载点必须在命令执行之前先在原始服务器和所有备用服务器上创建。硬件不必完全相同,但经验表明,在应用程序和系统的生命周期中维护两个相同系统比维护两个不同系统更容易。在任何情况下,硬件架构必须相同——例如,从 32 位系统传输到 64 位系统将不起作用。
It is usually wise to create the primary and standby servers so that they are as similar as possible, at least from the perspective of the database server. In particular, the path names associated with tablespaces will be passed across unmodified, so both primary and standby servers must have the same mount paths for tablespaces if that feature is used. Keep in mind that if CREATE TABLESPACE is executed on the primary, any new mount point needed for it must be created on the primary and all standby servers before the command is executed. Hardware need not be exactly the same, but experience shows that maintaining two identical systems is easier than maintaining two dissimilar ones over the lifetime of the application and system. In any case the hardware architecture must be the same — shipping from, say, a 32-bit to a 64-bit system will not work.
一般情况下,无法在运行不同主要 PostgreSQL 发行级别的服务器之间进行日志传输。PostgreSQL 全球开发组的策略是,在次要发行升级期间不对磁盘格式进行任何更改,因此在主服务器和备用服务器上运行不同次要发行级别很可能能够成功运行。然而,对此未提供正式的支持,建议您尽可能将主服务器和备用服务器保持在相同的发行级别。当升级到新的次要发行时,最安全的策略是先更新备用服务器——与从较前次要发行读取 WAL 文件相比,较新的次要发行更有可能能够读取较前次要发行的 WAL 文件。
In general, log shipping between servers running different major PostgreSQL release levels is not possible. It is the policy of the PostgreSQL Global Development Group not to make changes to disk formats during minor release upgrades, so it is likely that running different minor release levels on primary and standby servers will work successfully. However, no formal support for that is offered and you are advised to keep primary and standby servers at the same release level as much as possible. When updating to a new minor release, the safest policy is to update the standby servers first — a new minor release is more likely to be able to read WAL files from a previous minor release than vice versa.
27.2.2. Standby Server Operation #
如果在启动服务器时服务器中存在 standby.signal 文件,则服务器将进入备用模式。
A server enters standby mode if a standby.signal file exists in the data directory when the server is started.
在备用模式下,服务器不断应用从主服务器接收的 WAL。备用服务器可以从 WAL 存档(见 restore_command)或直接从主服务器通过 TCP 连接(流复制)读取 WAL。备用服务器还将尝试恢复备用集群 pg_wal 目录中发现的任何 WAL。这通常发生在服务器重新启动之后,当时备用数据库再次重放故障转移之前从主数据库串流的 WAL,但是你也可以随时手动将文件复制到 pg_wal 来重放它们。
In standby mode, the server continuously applies WAL received from the primary server. The standby server can read WAL from a WAL archive (see restore_command) or directly from the primary over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby cluster’s pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed.
在启动时,备用服务器首先从档案库位置恢复所有可用的 WAL,并调用 restore_command。一旦到达那里可用的 WAL 的末尾,且 restore_command 失败,它将尝试恢复 pg_wal 目录中找到的任何 WAL。如果失败,并且已配置流复制,则备用服务器将尝试连接到主服务器,并尝试从存档或 pg_wal 中找到的最后一条有效记录开始流式传输 WAL。如果该尝试失败,或者未配置流复制,或者连接稍后断开,备用服务器将返回步骤 1,并尝试再次从存档恢复文件。此从档案库、pg_wal 和通过流复制进行重试的循环将一直持续到服务器停止或被提升。
At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or is promoted.
当运行 pg_ctl promote 或调用 pg_promote() 时,将退出备用模式,服务器将切换到正常操作。在故障转移之前,将恢复档案库或 pg_wal 中立即可用的任何 WAL,但不尝试连接到主服务器。
Standby mode is exited and the server switches to normal operation when pg_ctl promote is run, or pg_promote() is called. Before failover, any WAL immediately available in the archive or in pg_wal will be restored, but no attempt is made to connect to the primary.
27.2.3. Preparing the Primary for Standby Servers #
在 Section 26.3 中所述的在主数据库上设置连续归档到备用数据库可以访问的归档目录。即使主数据库关闭,备用数据库也应该能够访问归档位置,也就是说,它应该驻留在备用服务器本身或另一个受信任的服务器上,而不是主服务器上。
Set up continuous archiving on the primary to an archive directory accessible from the standby, as described in Section 26.3. The archive location should be accessible from the standby even when the primary is down, i.e., it should reside on the standby server itself or another trusted server, not on the primary server.
如果您要使用流复制,请在主服务器上设置身份验证,以允许备用服务器的复制连接;即创建角色并在 pg_hba.conf 中提供设置数据库字段为 replication 的适当条目。还要确保在主服务器的配置文件中将 max_wal_senders 设置为足够大的值。如果要使用复制插槽,请确保也将 max_replication_slots 设置得足够高。
If you want to use streaming replication, set up authentication on the primary server to allow replication connections from the standby server(s); that is, create a role and provide a suitable entry or entries in pg_hba.conf with the database field set to replication. Also ensure max_wal_senders is set to a sufficiently large value in the configuration file of the primary server. If replication slots will be used, ensure that max_replication_slots is set sufficiently high as well.
按照 Section 26.3.2 中所述进行基本备份以引导备用服务器。
Take a base backup as described in Section 26.3.2 to bootstrap the standby server.
27.2.4. Setting Up a Standby Server #
要设置备用服务器,请恢复从原始服务器获取的 base 备份(参见 Section 26.3.4 )。在备用服务器的集群数据目录中创建一个 standby.signal 文件。将 restore_command 设置为一个简单命令,以从 WAL 存档复制文件。如果您计划使用多个备用服务器来保证高可用性,请确保将 recovery_target_timeline 设置为 latest (默认值),以使备用服务器在故障转移至其他备用服务器时遵循时间表的变化。
To set up the standby server, restore the base backup taken from primary server (see Section 26.3.4). Create a file standby.signal in the standby’s cluster data directory. Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, make sure that recovery_target_timeline is set to latest (the default), to make the standby server follow the timeline change that occurs at failover to another standby.
Note
如果文件不存在, restore_command 应立即返回;如果需要,服务器将再次重试该命令。
restore_command should return immediately if the file does not exist; the server will retry the command again if necessary.
如果你想使用流复制,请使用一个 libpq 连接字符串填充 primary_conninfo,包括主机名(或 IP 地址)和连接到主服务器所需的任何其他详细信息。如果主服务器需要密码进行身份验证,密码也需要在 primary_conninfo 中指定。
If you want to use streaming replication, fill in primary_conninfo with a libpq connection string, including the host name (or IP address) and any additional details needed to connect to the primary server. If the primary needs a password for authentication, the password needs to be specified in primary_conninfo as well.
如果您出于高可用性目的而设置备用服务器,请像设置主服务器那样设置 WAL 归档、连接和身份验证,因为故障转移后备用服务器将作为主服务器工作。
If you’re setting up the standby server for high availability purposes, set up WAL archiving, connections and authentication like the primary server, because the standby server will work as a primary server after failover.
如果您正在使用 WAL 存档,可以使用 archive_cleanup_command 参数来最小化其大小,以删除备用服务器不再需要的文件。pg_archivecleanup 实用程序专门用于在典型的单备用配置中与 archive_cleanup_command 配合使用,请参见 pg_archivecleanup 。但请注意,如果您将存档用于备份目的,则需要保留至少最新 base 备份所需的文件,即使备用服务器不再需要这些文件。
If you’re using a WAL archive, its size can be minimized using the archive_cleanup_command parameter to remove files that are no longer required by the standby server. The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations, see pg_archivecleanup. Note however, that if you’re using the archive for backup purposes, you need to retain files needed to recover from at least the latest base backup, even if they’re no longer needed by the standby.
配置的一个简单示例是:
A simple example of configuration is:
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
可以使用任意数量的备用服务器,但如果使用流复制,请确保在主服务器中设置 max_wal_senders 足够高,以允许它们同时连接。
You can have any number of standby servers, but if you use streaming replication, make sure you set max_wal_senders high enough in the primary to allow them to be connected simultaneously.
27.2.5. Streaming Replication #
流复制允许备用服务器比使用基于文件的日志传输更加最新。备用连接到主服务器,它将 WAL 记录流式传输到备用,因为它们被生成,而不等待 WAL 文件被填充。
Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they’re generated, without waiting for the WAL file to be filled.
流复制默认情况下是异步的(见 Section 27.2.8),在这种情况下,在主数据库中提交事务与备用数据库中可见更改之间有一个小的延迟。然而,此延迟远小于基于文件的日志传输,通常在备用数据库足够强大以跟上负载的情况下,小于一秒。使用流复制时,不需要 archive_timeout 来减少数据丢失窗口。
Streaming replication is asynchronous by default (see Section 27.2.8), in which case there is a small delay between committing a transaction in the primary and the changes becoming visible in the standby. This delay is however much smaller than with file-based log shipping, typically under one second assuming the standby is powerful enough to keep up with the load. With streaming replication, archive_timeout is not required to reduce the data loss window.
如果您使用流复制而不使用基于文件的连续归档,则服务器可能会在备用接收旧的 WAL 段之前回收旧的 WAL 段。如果发生这种情况,则需要从新的基本备份重新初始化备用。可以通过将 wal_keep_size 设置为足够大的值以确保不会过早回收 WAL 段,或为备用配置复制槽来避免这种情况。如果您设置了备用可访问的 WAL 存档,则不需要这些解决方案,因为如果备用保留足够多的段,则备用始终可以使用存档来追赶。
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_size to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that’s accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
要使用流复制,请按照 Section 27.2 中所述设置基于文件的日志传输备用服务器。将基于文件的日志传输备用服务器转换为流复制备用服务器的步骤是将 primary_conninfo 设置指向主服务器。在主数据库上设置 listen_addresses 和身份验证选项(见 pg_hba.conf),以便备用服务器可以连接到主服务器上的 replication 伪数据库(见 Section 27.2.5.1)。
To use streaming replication, set up a file-based log-shipping standby server as described in Section 27.2. The step that turns a file-based log-shipping standby into streaming replication standby is setting the primary_conninfo setting to point to the primary server. Set listen_addresses and authentication options (see pg_hba.conf) on the primary so that the standby server can connect to the replication pseudo-database on the primary server (see Section 27.2.5.1).
在支持保留套接字选项的系统上,设置 tcp_keepalives_idle、 tcp_keepalives_interval 和 tcp_keepalives_count 有助于主数据库及时发现断开的连接。
On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
设置来自备用服务器的最大并发连接数(有关详细信息,请参见 max_wal_senders)。
Set the maximum number of concurrent connections from the standby servers (see max_wal_senders for details).
当备用启动并且 primary_conninfo 设置正确时,备用将在重播存档中可用的所有 WAL 文件后连接到主服务器。如果成功建立连接,您将在备用中看到 walreceiver,以及主服务器中相应的 walsender 进程。
When the standby is started and primary_conninfo is set correctly, the standby will connect to the primary after replaying all WAL files available in the archive. If the connection is established successfully, you will see a walreceiver in the standby, and a corresponding walsender process in the primary.
27.2.5.1. Authentication #
设置复制的访问权限非常重要,只有受信任的用户才能读取 WAL 流,因为从中提取特权信息非常容易。备用服务器必须以具有 REPLICATION 权限或超级用户的帐户向主服务器进行身份验证。建议为复制创建具有 REPLICATION 和 LOGIN 权限的专用用户帐户。虽然 REPLICATION 权限授予非常高的权限,但它不允许用户修改主系统上的任何数据,而 SUPERUSER 权限则允许。
It is very important that the access privileges for replication be set up so that only trusted users can read the WAL stream, because it is easy to extract privileged information from it. Standby servers must authenticate to the primary as an account that has the REPLICATION privilege or a superuser. It is recommended to create a dedicated user account with REPLICATION and LOGIN privileges for replication. While REPLICATION privilege gives very high permissions, it does not allow the user to modify any data on the primary system, which the SUPERUSER privilege does.
复制的客户端身份验证由 pg_hba.conf 记录控制,该记录在 database 字段中指定 replication。例如,如果备用在主机 IP 192.168.1.100 上运行,并且复制的帐户名称为 foo,则管理员可以在主服务器上的 pg_hba.conf 文件中添加以下行:
Client authentication for replication is controlled by a pg_hba.conf record specifying replication in the database field. For example, if the standby is running on host IP 192.168.1.100 and the account name for replication is foo, the administrator can add the following line to the pg_hba.conf file on the primary:
# Allow the user "foo" from host 192.168.1.100 to connect to the primary
# as a replication standby if the user's password is correctly supplied.
#
# TYPE DATABASE USER ADDRESS METHOD
host replication foo 192.168.1.100/32 md5
主数据库的主机名和端口号、连接用户名和密码在 primary_conninfo 中指定。密码也可以在备用服务器上的 ~/.pgpass 文件中设置(在 database 字段中指定 replication)。例如,如果主数据库在主机 IP 192.168.1.50、端口 5432 上运行,复制的帐户名称是 foo,密码是 foopass,则管理员可以在备用服务器上的 postgresql.conf 文件中添加以下行:
The host name and port number of the primary, connection user name, and password are specified in the primary_conninfo. The password can also be set in the ~/.pgpass file on the standby (specify replication in the database field). For example, if the primary is running on host IP 192.168.1.50, port 5432, the account name for replication is foo, and the password is foopass, the administrator can add the following line to the postgresql.conf file on the standby:
# The standby connects to the primary that is running on host 192.168.1.50
# and port 5432 as the user "foo" whose password is "foopass".
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
27.2.5.2. Monitoring #
流复制的一个重要健康指标是在主数据库中生成的大量 WAL 记录,但尚未在备用数据库中应用。您可以通过比较主数据库上的当前 WAL 写入位置与备用数据库接收到的最后一个 WAL 位置来计算此滞后。可以使用 pg_current_wal_lsn(在主数据库上)和 pg_last_wal_receive_lsn(在备用数据库上)分别检索这些位置(有关详细信息,请参见 Table 9.91 和 Table 9.92)。备用数据库中的最后一个 WAL 接收位置也显示在 WAL 接收器进程的进程状态中,该状态使用 ps 命令显示(有关详细信息,请参见 Section 28.1)。
An important health indicator of streaming replication is the amount of WAL records generated in the primary, but not yet applied in the standby. You can calculate this lag by comparing the current WAL write location on the primary with the last WAL location received by the standby. These locations can be retrieved using pg_current_wal_lsn on the primary and pg_last_wal_receive_lsn on the standby, respectively (see Table 9.91 and Table 9.92 for details). The last WAL receive location in the standby is also displayed in the process status of the WAL receiver process, displayed using the ps command (see Section 28.1 for details).
您可以通过 pg_stat_replication 视图来检索 WAL 发送器进程的列表。 pg_current_wal_lsn 和视图的 sent_lsn 字段之间较大的差异可能表明原始服务器负载过重,而备用服务器上的 sent_lsn 和 pg_last_wal_receive_lsn 之间的差异可能表明网络延迟或备用服务器负载过重。
You can retrieve a list of WAL sender processes via the pg_stat_replication view. Large differences between pg_current_wal_lsn and the view’s sent_lsn field might indicate that the primary server is under heavy load, while differences between sent_lsn and pg_last_wal_receive_lsn on the standby might indicate network delay, or that the standby is under heavy load.
在热备用中,可以通过 pg_stat_wal_receiver 视图来检索 WAL 接收器进程的状态。 pg_last_wal_replay_lsn 和视图的 flushed_lsn 之间的较大差异表明接收 WAL 的速度比重放的速度快。
On a hot standby, the status of the WAL receiver process can be retrieved via the pg_stat_wal_receiver view. A large difference between pg_last_wal_replay_lsn and the view’s flushed_lsn indicates that WAL is being received faster than it can be replayed.
27.2.6. Replication Slots #
复制槽提供了一种自动化方式,以确保主数据库在备用数据库全部接收 WAL 段后才删除 WAL 段,并且即使备用数据库断开连接,主数据库也不会删除可能导致 recovery conflict 的行。
Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.
如果没有使用复制槽,可以使用 wal_keep_size 防止删除旧 WAL 段,或者使用 archive_command 或 archive_library 将 WAL 段存储在存档中。但是,这些方法通常会导致保留比所需更多的 WAL 段,而复制槽只会保留已知需要的段数。另一方面,复制槽可以保留如此多的 WAL 段,以至于它们填满了为 pg_wal 分配的空间; max_slot_wal_keep_size 限制了复制槽保留的 WAL 文件的大小。
In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command or archive_library. However, these methods often result in retaining more WAL segments than required, whereas replication slots retain only the number of segments known to be needed. On the other hand, replication slots can retain so many WAL segments that they fill up the space allocated for pg_wal; max_slot_wal_keep_size limits the size of WAL files retained by replication slots.
类似地, hot_standby_feedback 本身,如果不使用复制槽,可以防止 vacuum 删除相关行,但在备用数据库未连接的任何时间段内,并不提供任何保护。复制槽克服了这些缺点。
Similarly, hot_standby_feedback on its own, without also using a replication slot, provides protection against relevant rows being removed by vacuum, but provides no protection during any time period when the standby is not connected. Replication slots overcome these disadvantages.
27.2.6.1. Querying and Manipulating Replication Slots #
每个复制槽都有一个名称,该名称可以包含小写字母、数字和下划线字符。
Each replication slot has a name, which can contain lower-case letters, numbers, and the underscore character.
可以在 pg_replication_slots 视图中查看现有的复制槽及其状态。
Existing replication slots and their state can be seen in the pg_replication_slots view.
插槽可以通过流复制协议(见 Section 55.4)或 SQL 函数(见 Section 9.27.6)创建和删除。
Slots can be created and dropped either via the streaming replication protocol (see Section 55.4) or via SQL functions (see Section 9.27.6).
27.2.6.2. Configuration Example #
您可以这样创建一个复制槽位:
You can create a replication slot like this:
postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
slot_name | lsn
-------------+-----
node_a_slot |
postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
slot_name | slot_type | active
-------------+-----------+--------
node_a_slot | physical | f
(1 row)
要配置备用使用此槽位,应该在备用上配置 primary_slot_name。这是一个简单的示例:
To configure the standby to use this slot, primary_slot_name should be configured on the standby. Here is a simple example:
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
primary_slot_name = 'node_a_slot'
27.2.7. Cascading Replication #
级联复制功能允许备用服务器接受复制连接并将 WAL 记录流式传输到其他备用服务器,充当一个中继。这可用于减少到主机的直接连接数量,还可以最大程度地减少站点间带宽开销。
The cascading replication feature allows a standby server to accept replication connections and stream WAL records to other standbys, acting as a relay. This can be used to reduce the number of direct connections to the primary and also to minimize inter-site bandwidth overheads.
既作为接收者又作为发送者的备用称为级联备用。更直接连接到主机的备用称为上游服务器,而更“远”的备用服务器称为下游服务器。级联复制不会对下游服务器的数量或排列施加限制,尽管每个备用只连接到一个上游服务器,而上游服务器最终链接到一个主服务器。
A standby acting as both a receiver and a sender is known as a cascading standby. Standbys that are more directly connected to the primary are known as upstream servers, while those standby servers further away are downstream servers. Cascading replication does not place limits on the number or arrangement of downstream servers, though each standby connects to only one upstream server which eventually links to a single primary server.
级联备用发送的不仅从主接收的 WAL 记录,还包括从存档恢复的 WAL 记录。因此,即使某些上游连接中的复制连接终止,只要有新的 WAL 记录可用,流复制仍会继续进行下游部分。
A cascading standby sends not only WAL records received from the primary but also those restored from the archive. So even if the replication connection in some upstream connection is terminated, streaming replication continues downstream for as long as new WAL records are available.
级联复制当前是异步的。同步复制(见 Section 27.2.8)设置目前对级联复制没有影响。
Cascading replication is currently asynchronous. Synchronous replication (see Section 27.2.8) settings have no effect on cascading replication at present.
热备用反馈向上游传播,不管级联的排列如何。
Hot standby feedback propagates upstream, whatever the cascaded arrangement.
如果上游备用服务器被提升为新的主机,如果将 recovery_target_timeline 设置为 'latest'(默认值),下游服务器将继续从新主机流式传输。
If an upstream standby server is promoted to become the new primary, downstream servers will continue to stream from the new primary if recovery_target_timeline is set to 'latest' (the default).
要使用级联复制,请设置级联备用服务器,以便它可以接受复制连接(即,设置 max_wal_senders 和 hot_standby,并配置 host-based authentication)。您还需要在下游备用服务器中设置 primary_conninfo 以指向级联备用服务器。
To use cascading replication, set up the cascading standby so that it can accept replication connections (that is, set max_wal_senders and hot_standby, and configure host-based authentication). You will also need to set primary_conninfo in the downstream standby to point to the cascading standby.
27.2.8. Synchronous Replication #
PostgreSQL 流复制默认是异步的。如果主机服务器崩溃,那么提交的一些事务可能没有复制到备用服务器,这会导致数据丢失。数据丢失的量与故障转移时的复制延迟成正比。
PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover.
同步复制提供了确认事务所做的所有更改已传输到一个或多个同步备用服务器的功能。这扩展了事务提交提供的那一标准级别的持久性。这种保护级别在计算机科学理论中称为 2 安全复制,在将 synchronous_commit 设置为 remote_write 时称为组 1 安全(组安全和 1 安全)。
Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to one or more synchronous standby servers. This extends that standard level of durability offered by a transaction commit. This level of protection is referred to as 2-safe replication in computer science theory, and group-1-safe (group-safe and 1-safe) when synchronous_commit is set to remote_write.
在请求同步复制时,每次提交写事务都会等待收到确认,表明该提交已同时写入主机和备用服务器磁盘上的预写日志。唯一可能丢失数据的情况是主机和备用同时发生崩溃。虽然这可以提供更高的持久性级别,但前提是 sysadmin 谨慎安置和管理这两台服务器。等待确认会增加用户对服务器崩溃时更改不会丢失的信心,但也一定会增加请求事务的响应时间。最短等待时间是主机和备用之间的往返时间。
When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server. The only possibility that data can be lost is if both the primary and the standby suffer crashes at the same time. This can provide a much higher level of durability, though only if the sysadmin is cautious about the placement and management of the two servers. Waiting for confirmation increases the user’s confidence that the changes will not be lost in the event of server crashes but it also necessarily increases the response time for the requesting transaction. The minimum wait time is the round-trip time between primary and standby.
只读事务和事务回滚不必等待备用服务器回复。子事务提交不等待备用服务器回复,只等待顶级提交。诸如数据加载或索引构建之类的长时间操作不等待最后的提交消息。所有两阶段提交操作都需要提交等待,包括准备和提交。
Read-only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only top-level commits. Long running actions such as data loading or index building do not wait until the very final commit message. All two-phase commit actions require commit waits, including both prepare and commit.
同步备用可以是物理复制备用或逻辑复制订阅者。它还可以是任何其他物理或逻辑 WAL 复制流消费者,该消费者知道如何发送适当的反馈消息。除了内置的物理和逻辑复制系统之外,这还包括诸如 pg_receivewal 和 pg_recvlogical 这样的特殊程序以及一些第三方复制系统和自定义程序。有关同步复制支持的详细信息,请查看相应文档。
A synchronous standby can be a physical replication standby or a logical replication subscriber. It can also be any other physical or logical WAL replication stream consumer that knows how to send the appropriate feedback messages. Besides the built-in physical and logical replication systems, this includes special programs such as pg_receivewal and pg_recvlogical as well as some third-party replication systems and custom programs. Check the respective documentation for details on synchronous replication support.
27.2.8.1. Basic Configuration #
在配置了流复制后,配置同步复制只需要一个额外的配置步骤: synchronous_standby_names 必须设置为非空值。synchronous_commit 也必须设置为 on,但由于这是默认值,通常不需要更改。(请参见 Section 20.5.1 和 Section 20.6.2。)此配置将导致每次提交都需要等待确认备用服务器已将提交记录写入到持久性存储中。synchronous_commit 可以由各个用户设置,因此它可以在配置文件中针对特定用户或数据库进行配置,或者在每个事务的基础上由应用程序动态控制持久性保证。
Once streaming replication has been configured, configuring synchronous replication requires only one additional configuration step: synchronous_standby_names must be set to a non-empty value. synchronous_commit must also be set to on, but since this is the default value, typically no change is required. (See Section 20.5.1 and Section 20.6.2.) This configuration will cause each commit to wait for confirmation that the standby has written the commit record to durable storage. synchronous_commit can be set by individual users, so it can be configured in the configuration file, for particular users or databases, or dynamically by applications, in order to control the durability guarantee on a per-transaction basis.
提交记录写入主机的磁盘后,WAL 记录将发送到备用。除非备用上将 wal_receiver_status_interval 设置为零,否则备用将在每次将新 WAL 数据批量写入磁盘时发送回复消息。如果将 synchronous_commit 设置为 remote_apply,则备用将在重新执行提交记录时发送回复消息,使事务可见。如果根据主服务器上 synchronous_standby_names 的设置将备用选为同步备用,那么将考虑来自该备用的回复消息以及其他同步备用的回复消息,以决定何时释放等待确认提交记录已接收的事务。这些参数允许管理员指定哪些备用服务器应该是同步备用。请注意,同步复制的配置主要在主机上进行。已命名的备用必须直接连接到主机;主机不知道使用级联复制的下游备用服务器。
After a commit record has been written to disk on the primary, the WAL record is then sent to the standby. The standby sends reply messages each time a new batch of WAL data is written to disk, unless wal_receiver_status_interval is set to zero on the standby. In the case that synchronous_commit is set to remote_apply, the standby sends reply messages when the commit record is replayed, making the transaction visible. If the standby is chosen as a synchronous standby, according to the setting of synchronous_standby_names on the primary, the reply messages from that standby will be considered along with those from other synchronous standbys to decide when to release transactions waiting for confirmation that the commit record has been received. These parameters allow the administrator to specify which standby servers should be synchronous standbys. Note that the configuration of synchronous replication is mainly on the primary. Named standbys must be directly connected to the primary; the primary knows nothing about downstream standby servers using cascaded replication.
将 synchronous_commit 设置为 remote_write 将导致每次提交都等待确认备用 đã收到提交记录并将其写入其自身操作系统,但不是等待将数据刷新到备用上的磁盘。此设置提供的持久性保证弱于 on:备用可能会在操作系统崩溃的情况下丢失数据,但不会在 PostgreSQL 崩溃的情况下丢失数据。但是,在实践中这是一种有用的设置,因为它可以减少事务的响应时间。只有当主机和备用同时崩溃且主机的数据库同时损坏时,才会丢失数据。
Setting synchronous_commit to remote_write will cause each commit to wait for confirmation that the standby has received the commit record and written it out to its own operating system, but not for the data to be flushed to disk on the standby. This setting provides a weaker guarantee of durability than on does: the standby could lose the data in the event of an operating system crash, though not a PostgreSQL crash. However, it’s a useful setting in practice because it can decrease the response time for the transaction. Data loss could only occur if both the primary and the standby crash and the database of the primary gets corrupted at the same time.
将 synchronous_commit 设置为 remote_apply 将导致每次提交都等待当前同步备用报告它们已重新执行事务,使其对用户查询可见。在简单的情况下,这允许具有因果一致性的负载平衡。
Setting synchronous_commit to remote_apply will cause each commit to wait until the current synchronous standbys report that they have replayed the transaction, making it visible to user queries. In simple cases, this allows for load balancing with causal consistency.
如果请求快速关闭,则用户将停止等待。但是,与使用异步复制一样,在所有未完成 WAL 记录传输到当前连接的备用服务器之前,服务器不会完全关闭。
Users will stop waiting if a fast shutdown is requested. However, as when using asynchronous replication, the server will not fully shutdown until all outstanding WAL records are transferred to the currently connected standby servers.
27.2.8.2. Multiple Synchronous Standbys #
同步复制支持一个或多个同步备用服务器;事务将等待直到被视为同步的所有备用服务器确认收到其数据。事务必须等待从其收到答复的同步备用服务器的数量在 synchronous_standby_names 中指定。此参数还指定备用名称列表以及从已列出的备用名称中选择同步备用的方法 (FIRST 和 ANY)。
Synchronous replication supports one or more synchronous standby servers; transactions will wait until all the standby servers which are considered as synchronous confirm receipt of their data. The number of synchronous standbys that transactions must wait for replies from is specified in synchronous_standby_names. This parameter also specifies a list of standby names and the method (FIRST and ANY) to choose synchronous standbys from the listed ones.
方法 FIRST 指定了基于优先级的同步复制,并且使得事务提交等待到它们的 WAL 记录复制到根据其优先级选择的请求的同步备用服务器数量为止。在列表中较早出现名称的备用服务器具有较高优先级,并将被视为同步。稍后出现在此列表中的其他备用服务器表示潜在的同步备用服务器。如果当前任何同步备用服务器由于某种原因断开连接,则它将立即被具有次高优先级的备用服务器取代。
The method FIRST specifies a priority-based synchronous replication and makes transaction commits wait until their WAL records are replicated to the requested number of synchronous standbys chosen based on their priorities. The standbys whose names appear earlier in the list are given higher priority and will be considered as synchronous. Other standby servers appearing later in this list represent potential synchronous standbys. If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby.
基于优先级的多个同步备用服务器的 synchronous_standby_names 示例如下:
An example of synchronous_standby_names for a priority-based multiple synchronous standbys is:
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
在此示例中,如果四个备用服务器 s1、s2、s3 和 s4 正在运行,则两个备用服务器 s1 和 s2 将被选为同步备用服务器,因为它们的名称出现在备用名称列表中较早的位置。s3 是一个潜在的同步备用服务器,当 s1 或 s2 出现故障时,它将接管同步备用服务器的角色。s4 是一个异步备用服务器,因为其名称不在列表中。
In this example, if four standby servers s1, s2, s3 and s4 are running, the two standbys s1 and s2 will be chosen as synchronous standbys because their names appear early in the list of standby names. s3 is a potential synchronous standby and will take over the role of synchronous standby when either of s1 or s2 fails. s4 is an asynchronous standby since its name is not in the list.
方法 ANY 指定了基于法定的同步复制,并且使得事务提交等待到它们的 WAL 记录复制到请求的 @{17} 列表中的同步备用服务器数量为止。
The method ANY specifies a quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to at least the requested number of synchronous standbys in the list.
基于法定的多个同步备用服务器的 synchronous_standby_names 示例如下:
An example of synchronous_standby_names for a quorum-based multiple synchronous standbys is:
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
在此示例中,如果四个备用服务器 s1、s2、s3 和 s4 正在运行,则事务提交将等待来自 s1、s2 和 s3 的任何两个备用服务器的答复。s4 是一个异步备用服务器,因为其名称不在列表中。
In this example, if four standby servers s1, s2, s3 and s4 are running, transaction commits will wait for replies from at least any two standbys of s1, s2 and s3. s4 is an asynchronous standby since its name is not in the list.
可以使用 pg_stat_replication 视图查看备用服务器的同步状态。
The synchronous states of standby servers can be viewed using the pg_stat_replication view.
27.2.8.3. Planning for Performance #
同步复制通常需要仔细计划并放置备用服务器,以确保应用程序性能可以接受。等待不会利用系统资源,但事务锁将继续保持,直到传输得到确认。因此,不谨慎地使用同步复制会因增加响应时间和更高的争用而降低数据库应用程序的性能。
Synchronous replication usually requires carefully planned and placed standby servers to ensure applications perform acceptably. Waiting doesn’t utilize system resources, but transaction locks continue to be held until the transfer is confirmed. As a result, incautious use of synchronous replication will reduce performance for database applications because of increased response times and higher contention.
PostgreSQL 允许应用程序开发人员通过复制指定所需的持久性级别。这可针对整个系统指定,尽管也可以针对特定用户或连接,甚至是单个事务指定。
PostgreSQL allows the application developer to specify the durability level required via replication. This can be specified for the system overall, though it can also be specified for specific users or connections, or even individual transactions.
例如,应用程序工作负载可能包括:10% 的更改是重要的客户详细数据,而 90% 的更改是不太重要的数据,如果丢失,业务可以更轻松地应对,例如用户之间的聊天消息。
For example, an application workload might consist of: 10% of changes are important customer details, while 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users.
通过在应用程序级别(在主服务器上)指定的同步复制选项,我们可以为最重要的更改提供同步复制,而不会减慢全部工作负载的大部分。应用程序级别选项是允许高性能应用程序获得同步复制优势的重要且实用的工具。
With synchronous replication options specified at the application level (on the primary) we can offer synchronous replication for the most important changes, without slowing down the bulk of the total workload. Application level options are an important and practical tool for allowing the benefits of synchronous replication for high performance applications.
您应该考虑到网络带宽必须高于 WAL 数据生成速率。
You should consider that the network bandwidth must be higher than the rate of generation of WAL data.
27.2.8.4. Planning for High Availability #
当 synchronous_commit 设置为 on、remote_apply 或 remote_write 时,synchronous_standby_names 指定事务提交将等待其答复的同步备用服务器的数量和名称。如果任何同步备用服务器发生崩溃,则此类事务提交可能永远无法完成。
synchronous_standby_names specifies the number and names of synchronous standbys that transaction commits made when synchronous_commit is set to on, remote_apply or remote_write will wait for responses from. Such transaction commits may never be completed if any one of the synchronous standbys should crash.
实现高可用性的最佳解决方案是确保您保持尽可能多的请求同步备用服务器。这可以通过使用 synchronous_standby_names 命名多个潜在同步备用服务器来实现。
The best solution for high availability is to ensure you keep as many synchronous standbys as requested. This can be achieved by naming multiple potential synchronous standbys using synchronous_standby_names.
在基于优先级的同步复制中,名称出现在列表中较早位置的备用服务器将用作同步备用服务器。这些备用服务器之后列出的备用服务器将在当前备用之一发生故障时接管同步备用服务器的角色。
In a priority-based synchronous replication, the standbys whose names appear earlier in the list will be used as synchronous standbys. Standbys listed after these will take over the role of synchronous standby if one of current ones should fail.
在基于法定的同步复制中,出现在列表中的所有备用服务器都将用作同步备用服务器候选。即使其中一个发生故障,其他备用服务器也将继续执行同步备用服务器候选的角色。
In a quorum-based synchronous replication, all the standbys appearing in the list will be used as candidates for synchronous standbys. Even if one of them should fail, the other standbys will keep performing the role of candidates of synchronous standby.
当备用服务器首次连接到主服务器时,它仍未正确同步。这被称为 catchup 模式。一旦备用服务器和主服务器之间的滞后第一次达到零,我们将进入实时 streaming 状态。恢复持续时间在备用服务器创建后立即可能很长。如果关闭备用服务器,则恢复时间段将根据备用服务器关闭的时间长度增加。一旦达到 streaming 状态,备用服务器才能够成为同步备用服务器。可以通过 pg_stat_replication 视图查看此状态。
When a standby first attaches to the primary, it will not yet be properly synchronized. This is described as catchup mode. Once the lag between standby and primary reaches zero for the first time we move to real-time streaming state. The catch-up duration may be long immediately after the standby has been created. If the standby is shut down, then the catch-up period will increase according to the length of time the standby has been down. The standby is only able to become a synchronous standby once it has reached streaming state. This state can be viewed using the pg_stat_replication view.
如果在提交等待确认时主数据库重新启动,则主数据库恢复后这些等待的事务将被标记为已完全提交。无法确定所有备用数据库在主数据库崩溃时都收到了所有未处理的 WAL 数据。即使备用数据库上显示一些事务已提交,但它们在主数据库上也可能显示为已提交。我们提供的保证是,在所有同步备用数据库安全收到 WAL 数据之前,应用程序不会收到事务已成功提交的明确确认。
If primary restarts while commits are waiting for acknowledgment, those waiting transactions will be marked fully committed once the primary database recovers. There is no way to be certain that all standbys have received all outstanding WAL data at time of the crash of the primary. Some transactions may not show as committed on the standby, even though they show as committed on the primary. The guarantee we offer is that the application will not receive explicit acknowledgment of the successful commit of a transaction until the WAL data is known to be safely received by all the synchronous standbys.
如果您确实无法保留尽可能多的同步备用数据库,那么您应该减少事务提交必须等待响应的同步备用数据库数量 synchronous_standby_names(或将其禁用),并重新加载主服务器上的配置文件。
If you really cannot keep as many synchronous standbys as requested then you should decrease the number of synchronous standbys that transaction commits must wait for responses from in synchronous_standby_names (or disable it) and reload the configuration file on the primary server.
如果主数据库与其余备用服务器隔离,则应该故障转移到这些其他剩余备用服务器中最好的备用服务器。
If the primary is isolated from remaining standby servers you should fail over to the best candidate of those other remaining standby servers.
如果您需要在事务等待时重新创建备用服务器,请确保在会话中运行命令 pg_backup_start() 和 pg_backup_stop(),其中 synchronous_commit = off,否则这些请求将无限期等待备用服务器出现。
If you need to re-create a standby server while transactions are waiting, make sure that the commands pg_backup_start() and pg_backup_stop() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear.
27.2.9. Continuous Archiving in Standby #
在备用服务器中使用连续 WAL 归档时,有两种不同的情况:WAL 归档可以在主数据库和备用数据库之间共享,或者备用数据库可以有自己的 WAL 归档。当备用数据库有自己的 WAL 归档时,将 archive_mode 设置为 always,备用数据库将为其接收的每一段 WAL 调用归档命令,无论它是通过从归档中恢复还是通过流复制。共享归档可以以类似的方式处理,但是 archive_command 或 archive_library 必须测试被归档的文件是否已存在,并且现有文件是否具有相同的内容。这要求在 archive_command 或 archive_library 中更加小心,因为它必须小心不要用其他内容覆盖现有文件,但在两次归档完全相同的文件时返回成功状态。并且所有这些都必须在没有竞争条件的情况下完成,如果两台服务器尝试同时归档同一个文件。
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it’s by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command or archive_library must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command or archive_library, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
如果将 archive_mode 设置为 on,则在恢复或备用模式期间不会启用归档器。如果备用服务器已提升,则它将在提升后开始归档,但不会归档它自己未生成的任何 WAL 或时间线历史文件。若要在归档中获得 WAL 文件的完整系列,必须确保在 WAL 到达备用数据库之前对所有 WAL 进行归档。使用基于文件的日志传输时,必然会遇到这种情况,因为备用数据库只能恢复归档中找到的文件,但启用流复制时则不会。当服务器不在恢复模式中时,on 模式和 always 模式之间没有区别。
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL or timeline history files that it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication is enabled. When a server is not in recovery mode, there is no difference between on and always modes.