Postgresql 中文操作指南
pg_rewind
pg_rewind — 使用它分叉的另一个数据目录同步 PostgreSQL 数据目录
pg_rewind — synchronize a PostgreSQL data directory with another data directory that was forked from it
Synopsis
pg_rewind [ option …] { -D | —target-pgdata } directory { —source-pgdata=_directory _ | _—source-server=_connstr }
pg_rewind [option…] { -D | —target-pgdata } directory { _—source-pgdata=_directory | _—source-server=_connstr }
Description
在集群的时间线出现分歧后,pg_rewind 是一个用于将 PostgreSQL 集群与同一集群的另一个副本同步的工具。典型的情况是,在故障切换后,将旧的主服务器作为跟随新主服务器的备用服务器重新联机。
pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster, after the clusters' timelines have diverged. A typical scenario is to bring an old primary server back online after failover as a standby that follows the new primary.
在成功回滚后,目标数据目录的状态类似于源数据目录的基本备份。与创建新的基本备份或使用 rsync 等工具不同,pg_rewind 不需要比较或复制集群中未更改的关系块。只会复制现有关系文件中更改的块;所有其他文件,包括新的关系文件、配置文件和 WAL 段,都会全部复制。因此,当数据库较大且集群之间只有很小一部分块不同时,回滚操作明显比其他方法快。
After a successful rewind, the state of the target data directory is analogous to a base backup of the source data directory. Unlike taking a new base backup or using a tool like rsync, pg_rewind does not require comparing or copying unchanged relation blocks in the cluster. Only changed blocks from existing relation files are copied; all other files, including new relation files, configuration files, and WAL segments, are copied in full. As such the rewind operation is significantly faster than other approaches when the database is large and only a small fraction of blocks differ between the clusters.
pg_rewind 会检查源集群和目标集群的时间线历史记录,以确定它们分歧的点,并期望在目标集群的 pg_wal 目录中找到一直回溯到分歧点的 WAL。分歧点可以在目标时间线、源时间线或其公共祖先上找到。在故障切换的典型场景中,目标集群在分歧后不久被关闭,这不是问题,但如果目标集群在分歧后运行了很长时间,它的旧 WAL 文件可能不再存在。在这种情况下,你可以手动将它们从 WAL 存档复制到 pg_wal 目录,或使用 -c 选项运行 pg_rewind,以自动从 WAL 存档中检索它们。pg_rewind 的使用不仅限于故障切换,例如,可以提升备用服务器,运行一些写入事务,然后再回滚为备用服务器。
pg_rewind examines the timeline histories of the source and target clusters to determine the point where they diverged, and expects to find WAL in the target cluster’s pg_wal directory reaching all the way back to the point of divergence. The point of divergence can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the target cluster ran for a long time after the divergence, its old WAL files might no longer be present. In this case, you can manually copy them from the WAL archive to the pg_wal directory, or run pg_rewind with the -c option to automatically retrieve them from the WAL archive. The use of pg_rewind is not limited to failover, e.g., a standby server can be promoted, run some write transactions, and then rewound to become a standby again.
在运行 pg_rewind 后,WAL 重新播放需要完成,以便数据目录处于一致的状态。当目标服务器重新启动时,它将进入存档恢复并重新播放分歧点之前最后一个检查点以来源服务器中生成的所有 WAL。如果在运行 pg_rewind 时某些 WAL 不再在源服务器中可用,因此无法被 pg_rewind 会话复制,则必须在目标服务器启动时使其可用。这可以通过在目标数据目录中创建一个 recovery.signal 文件和配置 postgresql.conf 中合适的 restore_command 来完成。
After running pg_rewind, WAL replay needs to complete for the data directory to be in a consistent state. When the target server is started again it will enter archive recovery and replay all WAL generated in the source server from the last checkpoint before the point of divergence. If some of the WAL was no longer available in the source server when pg_rewind was run, and therefore could not be copied by the pg_rewind session, it must be made available when the target server is started. This can be done by creating a recovery.signal file in the target data directory and by configuring a suitable restore_command in postgresql.conf.
pg_rewind 要求目标服务器在 postgresql.conf 中启用 wal_log_hints 选项或在使用 initdb 初始化集群时启用数据校验和。默认情况下,这两个目前都不启用。 full_page_writes 也必须设置为 on ,但默认情况下它是启用的。
pg_rewind requires that the target server either has the wal_log_hints option enabled in postgresql.conf or data checksums enabled when the cluster was initialized with initdb. Neither of these are currently on by default. full_page_writes must also be set to on, but is enabled by default.
Warning: Failures While Rewinding
如果 pg_rewind 在处理过程中失败,则目标的数据文件夹可能处于无法恢复的状态。在这种情况下,建议采用新的全新备份。
If pg_rewind fails while processing, then the data folder of the target is likely not in a state that can be recovered. In such a case, taking a new fresh backup is recommended.
由于 pg_rewind 完全从源中复制配置文件,因此在重新启动目标服务器之前,可能需要更正用于恢复的配置,特别是在目标重新作为源的备用服务器引入时。如果在回滚操作完成后但未配置恢复的情况下重新启动服务器,目标可能再次与主服务器分歧。
As pg_rewind copies configuration files entirely from the source, it may be required to correct the configuration used for recovery before restarting the target server, especially if the target is reintroduced as a standby of the source. If you restart the server after the rewind operation has finished but without configuring recovery, the target may again diverge from the primary.
如果 pg_rewind 找到无法直接写入的文件,它将立即失败。例如,当源和目标服务器使用相同的只读 SSL 密钥和证书的文件映射时,可能会发生这种情况。如果此类文件存在于目标服务器上,建议在运行 pg_rewind 之前删除它们。在进行回滚后,其中一些文件可能已从源复制,在这种情况下,可能需要删除复制的数据并恢复回滚之前使用的链接集。
pg_rewind will fail immediately if it finds files it cannot write directly to. This can happen for example when the source and the target server use the same file mapping for read-only SSL keys and certificates. If such files are present on the target server it is recommended to remove them before running pg_rewind. After doing the rewind, some of those files may have been copied from the source, in which case it may be necessary to remove the data copied and restore back the set of links used before the rewind.
Options
pg_rewind 接受以下命令行参数:
pg_rewind accepts the following command-line arguments:
-
-D _directory—target-pgdata=_directory
-
This option specifies the target data directory that is synchronized with the source. The target server must be shut down cleanly before running pg_rewind
-
-
—source-pgdata=_directory_
-
Specifies the file system path to the data directory of the source server to synchronize the target with. This option requires the source server to be cleanly shut down.
-
-
—source-server=_connstr_
-
Specifies a libpq connection string to connect to the source PostgreSQL server to synchronize the target with. The connection must be a normal (non-replication) connection with a role having sufficient permissions to execute the functions used by pg_rewind on the source server (see Notes section for details) or a superuser role. This option requires the source server to be running and accepting connections.
-
-
-R_—write-recovery-conf_
-
Create standby.signal and append connection settings to postgresql.auto.conf in the output directory. —source-server is mandatory with this option.
-
-
-n_—dry-run_
-
Do everything except actually modifying the target directory.
-
-
-N_—no-sync_
-
By default, pg_rewind will wait for all files to be written safely to disk. This option causes pg_rewind to return without waiting, which is faster, but means that a subsequent operating system crash can leave the data directory corrupt. Generally, this option is useful for testing but should not be used on a production installation.
-
-
-P_—progress_
-
Enables progress reporting. Turning this on will deliver an approximate progress report while copying data from the source cluster.
-
-
-c_—restore-target-wal_
-
Use restore_command defined in the target cluster configuration to retrieve WAL files from the WAL archive if these files are no longer available in the pg_wal directory.
-
-
—config-file=_filename_
-
Use the specified main server configuration file for the target cluster. This affects pg_rewind when it uses internally the postgres command for the rewind operation on this cluster (when retrieving restore_command with the option -c/--restore-target-wal and when forcing a completion of crash recovery).
-
-
—debug
-
Print verbose debugging output that is mostly useful for developers debugging pg_rewind.
-
-
—no-ensure-shutdown
-
pg_rewind requires that the target server is cleanly shut down before rewinding. By default, if the target server is not shut down cleanly, pg_rewind starts the target server in single-user mode to complete crash recovery first, and stops it. By passing this option, pg_rewind skips this and errors out immediately if the server is not cleanly shut down. Users are expected to handle the situation themselves in that case.
-
-
-V_—version_
-
Display version information, then exit.
-
-
-?_—help_
-
Show help, then exit.
-
Environment
使用 —source-server 选项时,pg_rewind 还会使用 libpq 支持的环境变量(请参阅 Section 34.15 )。
When —source-server option is used, pg_rewind also uses the environment variables supported by libpq (see Section 34.15).
环境变量 PG_COLOR 指定是否在诊断消息中使用颜色。可能的值是 always 、 auto 和 never 。
The environment variable PG_COLOR specifies whether to use color in diagnostic messages. Possible values are always, auto and never.
Notes
以联机集群作为源执行 pg_rewind 时,可以使用具有足够权限以执行源集群上 pg_rewind 使用的函数的角色,而不使用超级用户。下面介绍如何创建这种角色(在此我们将其命名为 rewind_user ):
When executing pg_rewind using an online cluster as source, a role having sufficient permissions to execute the functions used by pg_rewind on the source cluster can be used instead of a superuser. Here is how to create such a role, named rewind_user here:
CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;