Postgresql 中文操作指南
26.3. Continuous Archiving and Point-in-Time Recovery (PITR) #
在所有情况下,PostgreSQL 都会在集群数据目录的 pg_wal/ 子目录中维护 write ahead log (WAL)。该日志会记录对数据库数据文件进行的所有更改。此日志主要用于防崩溃目的:如果系统崩溃,可以通过“重放”自上次检查点以来所做的日志条目来将数据库恢复到一致状态。然而,日志的存在使得可以使用第三种策略来备份数据库:我们可以将文件系统级别的备份与 WAL 文件的备份相结合。如果需要恢复,我们会还原文件系统备份,然后从备份的 WAL 文件中重放,以将系统恢复到当前状态。与前两种方法相比,这种方法的管理较为复杂,但它具有一些显著的优点:
At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ subdirectory of the cluster’s data directory. The log records every change made to the database’s data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by “replaying” the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state. This approach is more complex to administer than either of the previous approaches, but it has some significant benefits:
Note
pg_dump 和 pg_dumpall 不会生成文件系统级别的备份,不能用作连续归档解决方案的一部分。这些转储是 logical 并且不包含 WAL 重放所需的足够信息。
pg_dump and pg_dumpall do not produce file-system-level backups and cannot be used as part of a continuous-archiving solution. Such dumps are logical and do not contain enough information to be used by WAL replay.
与普通文件系统备份技术一样,此方法只能支持整个数据库集群的恢复,而不是支持子集。此外,它需要大量的归档存储:基本备份可能会很大,而且繁忙的系统会生成需要归档的数兆字节的 WAL 流量。尽管如此,在需要高可靠性的许多情况下,它仍然是首选的备份技术。
As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset. Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that have to be archived. Still, it is the preferred backup technique in many situations where high reliability is needed.
要成功地使用连续归档(许多数据库供应商也称之为“在线备份”),您需要一个连续的已归档 WAL 文件序列,该序列至少可以追溯到备份的开始时间。因此,要开始,您应该设置并测试用于归档 WAL 文件 before 的过程,然后进行首次基本备份。因此,我们首先讨论 WAL 文件归档的机制。
To recover successfully using continuous archiving (also called “online backup” by many database vendors), you need a continuous sequence of archived WAL files that extends back at least as far as the start time of your backup. So to get started, you should set up and test your procedure for archiving WAL files before you take your first base backup. Accordingly, we first discuss the mechanics of archiving WAL files.
26.3.1. Setting Up WAL Archiving #
从抽象意义上讲,正在运行的 PostgreSQL 系统会产生无限长的 WAL 记录序列。系统将在物理上将此序列划分为 WAL segment files,它们通常每个 16MB(尽管可以在 initdb 期间更改段大小)。段文件被赋予数字名称,这些名称反映它们在抽象 WAL 序列中的位置。当不使用 WAL 归档时,系统通常只会创建几个段文件,然后通过将不再需要的段文件重命名为更高的段号来“循环”使用它们。假定内容在最后一个检查点之前的段文件不再需要,并且可以循环使用。
In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece (although the segment size can be altered during initdb). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then “recycles” them by renaming no-longer-needed segment files to higher segment numbers. It’s assumed that segment files whose contents precede the last checkpoint are no longer of interest and can be recycled.
在归档 WAL 数据时,我们需要在每个区段文件填写后立即获取其内容,并在区段文件被重新利用之前将数据存储到某个位置。根据应用程序和可用硬件的不同,“将数据存储到某个位置”可能会采用多种不同的方式:我们可以将区段文件复制到另一台机器上 NFS 挂载的目录中,将它们写入磁带驱动器(确保有方法识别每个文件的原始名称),或将它们批量处理并刻录到 CD 中,或采用其他方式。为了为数据库管理员提供灵活性,PostgreSQL 尝试不对归档方式进行任何假设。相反,PostgreSQL 允许管理员指定要在将已完成的区段文件复制到所需位置时执行的 shell 命令或归档库。这可能就像一个简单的 shell 命令,使用 cp,或者它可能调用一个复杂的 C 函数——这由您决定。
When archiving WAL data, we need to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse. Depending on the application and the available hardware, there could be many different ways of “saving the data somewhere”: we could copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive (ensuring that you have a way of identifying the original name of each file), or batch them together and burn them onto CDs, or something else entirely. To provide the database administrator with flexibility, PostgreSQL tries not to make any assumptions about how the archiving will be done. Instead, PostgreSQL lets the administrator specify a shell command or an archive library to be executed to copy a completed segment file to wherever it needs to go. This could be as simple as a shell command that uses cp, or it could invoke a complex C function — it’s all up to you.
要启用 WAL 存档,请设置 wal_level配置参数为 replica_或更高版本、 archive_mode为 _on、在 archive_command配置参数中指定要使用的 shell 命令或在 archive_library配置参数中指定要使用的库。实践中,这些设置将始终放在 _postgresql.conf_文件中。
To enable WAL archiving, set the wal_level configuration parameter to replica or higher, archive_mode to on, specify the shell command to use in the archive_command configuration parameter or specify the library to use in the archive_library configuration parameter. In practice these settings will always be placed in the postgresql.conf file.
在 archive_command 中,%p 被归档文件的路径名替换,而 %f 仅被文件名替换。(路径名相对于当前工作目录,即集群的数据目录。)如果您需要在命令中嵌入实际 % 字符,请使用 %%。最有用的简单命令类似于:
In archive_command, %p is replaced by the path name of the file to archive, while %f is replaced by only the file name. (The path name is relative to the current working directory, i.e., the cluster’s data directory.) Use %% if you need to embed an actual % character in the command. The simplest useful command is something like:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix
archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows
这将把可归档的 WAL 区段复制到目录 /mnt/server/archivedir。(这是一个示例,而不是建议,并且可能无法在所有平台上运行。)替换 %p 和 %f 参数后,实际执行的命令可能如下所示:
which will copy archivable WAL segments to the directory /mnt/server/archivedir. (This is an example, not a recommendation, and might not work on all platforms.) After the %p and %f parameters have been replaced, the actual command executed might look like this:
test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/00000001000000A900000065 /mnt/server/archivedir/00000001000000A900000065
对于每个要归档的新文件,都会生成一条类似的命令。
A similar command will be generated for each new file to be archived.
将以与 PostgreSQL 服务器运行的相同用户的所有权执行归档命令。由于要归档的 WAL 文件序列实际上包含了数据库中的所有内容,因此您需要确保归档数据受到保护,不受窥探;例如,归档到没有组或世界读取权限的目录中。
The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
如果归档命令及其仅在成功时返回退出状态 0,那么这一点非常重要。在获得 0 结果后,PostgreSQL 将假设文件已经成功归档,并将删除或重新利用该文件。但是,非零状态告诉 PostgreSQL 该文件尚未归档;它将定期尝试,直至成功。
It is important that the archive command return zero exit status if and only if it succeeds. Upon getting a zero result, PostgreSQL will assume that the file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.
另一种存档方式是将自定义存档模块用作 archive_library。由于此类模块使用 _C_编写,因此创建自己的模块可能需要比编写 shell 命令付出更大的努力。但是,存档模块可以比通过 shell 存档执行更好的性能,并且它们可以访问许多有用的服务器资源。有关存档模块的详细信息,请参见 Chapter 51。
Another way to archive is to use a custom archive module as the archive_library. Since such modules are written in C, creating your own may require considerably more effort than writing a shell command. However, archive modules can be more performant than archiving via shell, and they will have access to many useful server resources. For more information about archive modules, see Chapter 51.
当存档命令因信号(SIGTERM 除外,该信号用作服务器关闭的一部分)或 shell 出错且退出状态大于 125(例如找不到命令),或者如果存档函数发出 _ERROR_或 _FATAL_时,存档程序将中止并由后主重新启动。在这种情况下, pg_stat_archiver中未报告失败。
When the archive command is terminated by a signal (other than SIGTERM that is used as part of a server shutdown) or an error by the shell with an exit status greater than 125 (such as command not found), or if the archive function emits an ERROR or FATAL, the archiver process aborts and gets restarted by the postmaster. In such cases, the failure is not reported in pg_stat_archiver.
归档命令和库通常应设计为拒绝覆盖任何预先存在的归档文件。这是一项重要的安全功能,可以保护归档的完整性,以防管理员出错(例如将两个不同服务器的输出发送到同一归档目录)。建议测试您建议的归档库以确保它不会覆盖现有文件。
Archive commands and libraries should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). It is advisable to test your proposed archive library to ensure that it does not overwrite an existing file.
在罕见的情况下,PostgreSQL 可能尝试重新归档以前归档过的 WAL 文件。例如,如果在服务器对归档成功进行持久记录之前发生系统崩溃,则服务器将在重新启动后再次尝试归档文件(前提是归档仍处于启用状态)。当归档命令或库遇到预先存在的文件时,如果 WAL 文件与预先存在的归档具有相同的内容并且预先存在的归档已完全持久化到存储中,则它应分别返回状态 0 或 true。如果预先存在的文件包含与所归档的 WAL 文件不同的内容,则归档命令或库 must 将分别返回非零状态或 false。
In rare cases, PostgreSQL may attempt to re-archive a WAL file that was previously archived. For example, if the system crashes before the server makes a durable record of archival success, the server will attempt to archive the file again after restarting (provided archiving is still enabled). When an archive command or library encounters a pre-existing file, it should return a zero status or true, respectively, if the WAL file has identical contents to the pre-existing archive and the pre-existing archive is fully persisted to storage. If a pre-existing file contains different contents than the WAL file being archived, the archive command or library must return a nonzero status or false, respectively.
上面用于 Unix 的示例命令通过包括单独的 test 步骤来避免覆盖预先存在的归档。在某些 Unix 平台上,cp 具有如 -i 等开关,可以用来做同样的事情,但你不应该依赖这些方法,除非验证是否返回了正确的退出状态。(特别是,当使用 -i 并且目标文件已经存在时,GNU cp 将返回状态 0,这是 not 的预期行为。)
The example command above for Unix avoids overwriting a pre-existing archive by including a separate test step. On some Unix platforms, cp has switches such as -i that can be used to do the same thing less verbosely, but you should not rely on these without verifying that the right exit status is returned. (In particular, GNU cp will return status zero when -i is used and the target file already exists, which is not the desired behavior.)
在设计归档设置时,请考虑如果归档命令或库反复失败,因为某些方面需要操作员干预或归档空间不足时会发生什么。例如,如果在没有自动更换器的情况下写入磁带,则可能会发生这种情况;当磁带填满时,在交换磁带之前无法再归档任何内容。您应确保以适当的方式报告任何错误条件或对人工操作员的请求,以便能够合理快速地解决这种情况。pg_wal/ 目录将继续填写 WAL 区段文件,直到解决这种情况。(如果包含 pg_wal/ 的文件系统已满,PostgreSQL 将执行 PANIC 关机。不会丢失已提交的事务,但数据库将在您释放一些空间之前保持脱机状态。)
While designing your archiving setup, consider what will happen if the archive command or library fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_wal/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_wal/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)
只要归档命令或库能够跟上服务器生成 WAL 数据的平均速率,那么它的速度就不重要。即使归档进程稍微落后,正常操作也会继续。如果归档显著落后,这将增加在发生灾难时丢失的数据量。这也意味着 pg_wal/ 目录将包含大量尚未归档的区段文件,这最终可能会超过可用磁盘空间。建议您监控归档进程,以确保它按照您的意愿工作。
The speed of the archive command or library is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the pg_wal/ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.
在编写归档命令或库时,您应该假设要归档的文件名最长为 64 个字符,并且可以包含任何组合的 ASCII 字母、数字和点。不必保留原始相对路径 (%p),但必须保留文件名 (%f)。
In writing your archive command or library, you should assume that the file names to be archived can be up to 64 characters long and can contain any combination of ASCII letters, digits, and dots. It is not necessary to preserve the original relative path (%p) but it is necessary to preserve the file name (%f).
请注意,虽然 WAL 存档允许您恢复对 PostgreSQL 数据库中数据所做的任何修改,但它不会恢复对配置文件所做的更改(即 postgresql.conf、pg_hba.conf_和 _pg_ident.conf),因为这些更改是手动编辑的,而不是通过 SQL 操作完成的。您可能希望将配置文件保存在常规文件系统备份程序将备份的位置。关于如何重新定位配置文件,请参见 Section 20.2。
Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, postgresql.conf, pg_hba.conf and pg_ident.conf), since those are edited manually rather than through SQL operations. You might wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See Section 20.2 for how to relocate the configuration files.
存档命令或函数仅在完成的 WAL 段上调用。因此,如果您的服务器仅生成少量 WAL 流量(或有空闲时间),那么在事务完成及其在存档存储中安全记录之间可能会有较长的延迟。要限制未存档数据可能存在的时间,您可以设置 archive_timeout以强制服务器至少在此频率上切换到新的 WAL 段文件。请注意,由于强制切换而提前存档的已存档文件仍与完全满的文件长度相同。因此,不建议设置很短的 archive_timeout——它会增加存档存储的容量。大约一分钟或更短的 _archive_timeout_设置通常是合理的。
The archive command or function is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are archived early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.
另外,如果您希望确保新完成的事务尽快存档,则可以使用 _pg_switch_wal_强制执行段切换。 Table 9.91中列出了与 WAL 管理相关的其他实用程序函数。
Also, you can force a segment switch manually with pg_switch_wal if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are listed in Table 9.91.
当 wal_level_为 _minimal_时,某些 SQL 命令已优化,以避免 WAL 日志记录,如 Section 14.4.7中所述。如果在执行其中一条语句期间打开了存档或流复制,则 WAL 将不包含足够的信息进行存档恢复。(崩溃恢复不受影响。)因此,_wal_level_只能在服务器启动时更改。但是,可以重新加载配置文件来更改 _archive_command_和 _archive_library。如果您通过 shell 进行存档并希望暂时停止存档,则一种方法是将 archive_command_设置为一个空字符串('')。这将导致 WAL 文件累积在 _pg_wal/_中,直到重新建立一个可用的 _archive_command。
When wal_level is minimal some SQL commands are optimized to avoid WAL logging, as described in Section 14.4.7. If archiving or streaming replication were turned on during execution of one of these statements, WAL would not contain enough information for archive recovery. (Crash recovery is unaffected.) For this reason, wal_level can only be changed at server start. However, archive_command and archive_library can be changed with a configuration file reload. If you are archiving via shell and wish to temporarily stop archiving, one way to do it is to set archive_command to the empty string (''). This will cause WAL files to accumulate in pg_wal/ until a working archive_command is re-established.
26.3.2. Making a Base Backup #
执行基准备份的最简单方法是使用 pg_basebackup 工具。它可以创建一个基准备份,也可以创建常规文件或 tar 存档。如果需要比 pg_basebackup 提供的更多灵活性,还可以使用低级 API 进行基准备份(请参阅 Section 26.3.3 )。
The easiest way to perform a base backup is to use the pg_basebackup tool. It can create a base backup either as regular files or as a tar archive. If more flexibility than pg_basebackup can provide is required, you can also make a base backup using the low level API (see Section 26.3.3).
關於建立基礎備份所花費的時間,你不需要擔心。但是,如果你通常在已停用 full_page_writes 的狀態下執行伺服器,你可能會注意到備份執行期間的效能下降,因為在備份模式期間,full_page_writes 實際上是被強制開啟的。
It is not necessary to be concerned about the amount of time it takes to make a base backup. However, if you normally run the server with full_page_writes disabled, you might notice a drop in performance while the backup runs since full_page_writes is effectively forced on during backup mode.
要使用備份,你需要保留在檔案系統備份期間及之後所產生的所有 WAL 段檔案。為了協助你執行此動作,基礎備份流程會建立一個 backup history file,並立即儲存在 WAL 備份區域中。此檔案的名稱會根據你所需的檔案系統備份的第一個 WAL 段檔案命名。例如,如果起始 WAL 檔案是 0000000100001234000055CD,備份記錄檔的名稱可能會類似於 0000000100001234000055CD.007C9330.backup。(檔案名稱的第二個部分代表 WAL 檔案中的確切位置,而且通常可以忽略。)一旦你已安全地封存檔案系統備份,以及在備份期間使用的 WAL 段檔案(如備份記錄檔中所指定),所有數值上較小的封存 WAL 段現在已經不需要來復原檔案系統備份,而且可以刪除。但是,你應該考慮保留數個備份組,以絕對確定你可以復原資料。
To make use of the backup, you will need to keep all the WAL segment files generated during and after the file system backup. To aid you in doing this, the base backup process creates a backup history file that is immediately stored into the WAL archive area. This file is named after the first WAL segment file that you need for the file system backup. For example, if the starting WAL file is 0000000100001234000055CD the backup history file will be named something like 0000000100001234000055CD.007C9330.backup. (The second part of the file name stands for an exact position within the WAL file, and can ordinarily be ignored.) Once you have safely archived the file system backup and the WAL segment files used during the backup (as specified in the backup history file), all archived WAL segments with names numerically less are no longer needed to recover the file system backup and can be deleted. However, you should consider keeping several backup sets to be absolutely certain that you can recover your data.
备份历史记录文件只是一个小的文本文件。它包含您提供给 pg_basebackup 的标签字符串,以及备份的开始和结束时间和 WAL 段。如果您使用该标签来标识关联的转储文件,则存档的历史记录文件足以告诉您要还原哪个转储文件。
The backup history file is just a small text file. It contains the label string you gave to pg_basebackup, as well as the starting and ending times and WAL segments of the backup. If you used the label to identify the associated dump file, then the archived history file is enough to tell you which dump file to restore.
由於你必須保留自上次基礎備份的所有已封存 WAL 檔案,因此你通常應該根據你希望花費在已封存 WAL 檔案上的儲存空間,來選擇基礎備份的間隔時間。你也應該考慮如果需要復原時,你準備花費多長時間復原 —— 系統將必須重新執行所有 WAL 段,如果自上次基礎備份已過一段長時間,這可能會花費一段時間。
Since you have to keep around all the archived WAL files back to your last base backup, the interval between base backups should usually be chosen based on how much storage you want to expend on archived WAL files. You should also consider how long you are prepared to spend recovering, if recovery should be necessary — the system will have to replay all those WAL segments, and that could take awhile if it has been a long time since the last base backup.
26.3.3. Making a Base Backup Using the Low Level API #
使用低级 API 进行基准备份的过程比 pg_basebackup 方法多几个步骤,但相对简单。执行这些步骤的顺序非常重要,并且在继续执行下一步之前必须验证步骤是否成功。
The procedure for making a base backup using the low level APIs contains a few more steps than the pg_basebackup method, but is relatively simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step.
可以同时运行多个备份(包括使用此备份 API 启动的备份和使用 pg_basebackup 启动的备份)。
Multiple backups are able to be run concurrently (both those started using this backup API and those started using pg_basebackup).
26.3.3.1. Backing Up the Data Directory #
如果在文件复制过程中尝试复制的文件发生更改,则一些文件系统备份工具会发出警告或错误。在对活动数据库进行基本备份时,这种情况很常见,并不是错误。但是,您需要确保您可以将此类投诉与实际错误区分开来。例如,某些版本的 rsync 为“消失的源文件”返回一个单独的退出代码,并且您可以编写一个驱动程序脚本来将此退出代码接受为一个非错误情况。此外,如果在 tar 复制文件时文件被截断,则某些版本的 GNU tar 会返回与致命错误无法区分的错误代码。幸运的是,1.16 及更高版本的 GNU tar 在文件在备份过程中发生更改时退出为 1,其他错误退出为 2。使用 1.23 及更高版本的 GNU tar,您可以使用警告选项 _—warning=no-file-changed --warning=no-file-removed_来隐藏相关的警告消息。
Some file system backup tools emit warnings or errors if the files they are trying to copy change while the copy proceeds. When taking a base backup of an active database, this situation is normal and not an error. However, you need to ensure that you can distinguish complaints of this sort from real errors. For example, some versions of rsync return a separate exit code for “vanished source files”, and you can write a driver script to accept this exit code as a non-error case. Also, some versions of GNU tar return an error code indistinguishable from a fatal error if a file was truncated while tar was copying it. Fortunately, GNU tar versions 1.16 and later exit with 1 if a file was changed during the backup, and 2 for other errors. With GNU tar version 1.23 and later, you can use the warning options —warning=no-file-changed --warning=no-file-removed to hide the related warning messages.
請確保你的備份包括資料庫叢集目錄(例如 /usr/local/pgsql/data)下的所有檔案。如果你正在使用不在此目錄下的表格空間,請務必同時將其納入(並確保你的備份封存將符號連結作為連結,否則復原將會損毀你的表格空間)。
Be certain that your backup includes all of the files under the database cluster directory (e.g., /usr/local/pgsql/data). If you are using tablespaces that do not reside underneath this directory, be careful to include them as well (and be sure that your backup archives symbolic links as links, otherwise the restore will corrupt your tablespaces).
但是,你應該在備份中省略叢集 pg_wal/ 子目錄下的檔案。這項輕微的調整很值得花費,因為它降低了復原時的錯誤風險。如果 pg_wal/ 是指向叢集目錄外某個地方的符號連結,這很容易安排,這無論如何都是由於效能考量而普遍採用的設定。你可能還想排除 postmaster.pid 和 postmaster.opts,它們會記錄正在執行的 Postmaster,而不是將使用此備份的 Postmaster 的資訊。(這些檔案可能會讓 pg_ctl 混淆。)
You should, however, omit from the backup the files within the cluster’s pg_wal/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if pg_wal/ is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude postmaster.pid and postmaster.opts, which record information about the running postmaster, not about the postmaster which will eventually use this backup. (These files can confuse pg_ctl.)
通常建議在備份中省略叢集 pg_replslot/ 目錄下的檔案,這樣在主資料庫上存在的複製槽就不會成為備份的一部分。否則,隨後使用備份來建立備援可能導致備援上 WAL 檔案無限保留,如果已啟用熱備援回饋,則可能會在主資料庫上膨脹,因為使用這些複製槽的用戶端仍會連線到主資料庫並更新槽,而不是備援。即使備份僅供建立新的主資料庫,複製槽的複製預期也沒有特別有用,因為當新的主資料庫上線時,這些槽的內容可能會過時。
It is often a good idea to also omit from the backup the files within the cluster’s pg_replslot/ directory, so that replication slots that exist on the primary do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the primary if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the primary, not the standby. Even if the backup is only intended for use in creating a new primary, copying the replication slots isn’t expected to be particularly useful, since the contents of those slots will likely be badly out of date by the time the new primary comes on line.
目錄 pg_dynshmem/、pg_notify/、pg_serial/、pg_snapshots/、pg_stat_tmp/ 和 pg_subtrans/ 的內容(但不包括目錄本身)可以從備份中省略,因為它們會在 Postmaster 啟動時初始化。
The contents of the directories pg_dynshmem/, pg_notify/, pg_serial/, pg_snapshots/, pg_stat_tmp/, and pg_subtrans/ (but not the directories themselves) can be omitted from the backup as they will be initialized on postmaster startup.
任何以 pgsql_tmp 開頭的檔案或目錄都可以從備份中省略。這些檔案會在 Postmaster 啟動時移除,而且目錄會視需要重新建立。
Any file or directory beginning with pgsql_tmp can be omitted from the backup. These files are removed on postmaster start and the directories will be recreated as needed.
當找到對應名稱的檔案時,pg_internal.init 檔案可以從備份中省略。這些檔案包含關聯快取資料,在復原時會始終重建。
pg_internal.init files can be omitted from the backup whenever a file of that name is found. These files contain relation cache data that is always rebuilt when recovering.
備份標籤檔案包括你提供給 pg_backup_start 的標籤字串,以及 pg_backup_start 的執行時間,以及起始 WAL 檔案的名稱。如果產生混淆,因此有可能查看備份檔案內部並確定傾印檔案來自哪個備份工作階段。表格空間對應檔案包含 pg_tblspc/ 目錄中的符號連結名稱和每個符號連結的完整路徑。這些檔案不僅供你參考;它們的存在和內容對系統復原流程的正確運作至關重要。
The backup label file includes the label string you gave to pg_backup_start, as well as the time at which pg_backup_start was run, and the name of the starting WAL file. In case of confusion it is therefore possible to look inside a backup file and determine exactly which backup session the dump file came from. The tablespace map file includes the symbolic link names as they exist in the directory pg_tblspc/ and the full path of each symbolic link. These files are not merely for your information; their presence and contents are critical to the proper operation of the system’s recovery process.
在伺服器停止時建立備份也是可行的。在這種情況下,你明顯無法使用 pg_backup_start 或 pg_backup_stop,因此你必須自行追蹤哪個備份是哪個,以及關聯的 WAL 檔案能回溯多久。通常最好遵循上述的持續封存程序。
It is also possible to make a backup while the server is stopped. In this case, you obviously cannot use pg_backup_start or pg_backup_stop, and you will therefore be left to your own devices to keep track of which backup is which and how far back the associated WAL files go. It is generally better to follow the continuous archiving procedure above.
26.3.4. Recovering Using a Continuous Archive Backup #
好吧,最糟糕的情況發生了,而且你需要從備份中復原。以下是程序:
Okay, the worst has happened and you need to recover from your backup. Here is the procedure:
所有這些的關鍵是建立一個復原組態,說明你希望如何復原,以及復原應該執行到什麼程度。你絕對必須指定的一件事是 restore_command,它會告訴 PostgreSQL 如何擷取已封存的 WAL 檔案段。與 archive_command 相同,這是一個 shell 命令字串。它可以包含 %f,它會被所需 WAL 檔案的名稱取代,以及 %p,它會被 WAL 檔案的複製路徑名稱取代。(路徑名稱與目前的工作目錄相對應,即叢集的資料目錄。)如果你需要在命令中嵌入實際的 % 字元,請寫入 %%。最簡單有用的指令類似於:
The key part of all this is to set up a recovery configuration that describes how you want to recover and how far the recovery should run. The one thing that you absolutely must specify is the restore_command, which tells PostgreSQL how to retrieve archived WAL file segments. Like the archive_command, this is a shell command string. It can contain %f, which is replaced by the name of the desired WAL file, and %p, which is replaced by the path name to copy the WAL file to. (The path name is relative to the current working directory, i.e., the cluster’s data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like:
restore_command = 'cp /mnt/server/archivedir/%f %p'
這將從目錄 /mnt/server/archivedir 複製先前已封存的 WAL 段。當然,你可以使用更複雜的配置,甚至使用 shell 指令碼,要求操作員裝入適當的磁帶。
which will copy previously archived WAL segments from the directory /mnt/server/archivedir. Of course, you can use something much more complicated, perhaps even a shell script that requests the operator to mount an appropriate tape.
命令在失败时返回非零退出状态非常重要。可以请求不在存档中存在的文件的命令 will;必须在收到此类请求时返回非零值。这不是错误条件。但例外情况是,如果命令被信号(SIGTERM 除外,SIGTERM 用作数据库服务器关闭的一部分)或 Shell 错误(例如未找到命令)终止,那么恢复将中止,服务器将无法启动。
It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
并非所有请求的文件都是 WAL 段文件;您还应预期请求带有后缀名的文件 .history。还要注意 %p 的基本名称路径将不同于 %f;不要指望它们可以互换。
Not all of the requested files will be WAL segment files; you should also expect requests for files with a suffix of .history. Also be aware that the base name of the %p path will be different from %f; do not expect them to be interchangeable.
无法在存档中找到的 WAL 段将在 pg_wal/ 中查找;这允许使用最近未存档的段。但是,将优先使用存档中可用的段,而不是 pg_wal/ 中的文件。
WAL segments that cannot be found in the archive will be sought in pg_wal/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_wal/.
通常,恢复将继续遍历所有可用的 WAL 段,因而将数据库恢复到当前时间点,或在可用的 WAL 段的情况下尽可能接近当前时间点。因此,常规恢复将以“文件未找到”消息结束,错误消息的确切文本取决于您对 restore_command_的选择。您还可能会在恢复开始时看到一个错误消息,错误消息的文件名类似于 _00000001.history。这也很正常,并且在简单的恢复情况下并不表示存在问题;有关讨论,请参见 Section 26.3.5。
Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as possible given the available WAL segments). Therefore, a normal recovery will end with a “file not found” message, the exact text of the error message depending upon your choice of restore_command. You may also see an error message at the start of recovery for a file named something like 00000001.history. This is also normal and does not indicate a problem in simple recovery situations; see Section 26.3.5 for discussion.
如果您希望恢复到某个先前的点(例如,就在初级 DBA 删除了您的主事务表之前),只需指定所需的 stopping point。您可以通过日期/时间、命名的还原点或完成特定事务 ID 来指定停止点,该停止点称为“恢复目标”。截至本文撰写之时,只有日期/时间和命名的还原点选项非常可用,因为没有工具可以帮助您准确地识别要使用的事务 ID。
If you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the required stopping point. You can specify the stop point, known as the “recovery target”, either by date/time, named restore point or by completion of a specific transaction ID. As of this writing only the date/time and named restore point options are very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.
Note
停止点必须在基本备份的结束时间之后,即 pg_backup_stop 的结束时间。您不能使用基本备份恢复到该备份正在进行的时间。(要恢复到这样的时间,您必须返回到以前的底层备份,然后从那里向前滚动。)
The stop point must be after the ending time of the base backup, i.e., the end time of pg_backup_stop. You cannot use a base backup to recover to a time when that backup was in progress. (To recover to such a time, you must go back to your previous base backup and roll forward from there.)
如果恢复发现损坏的 WAL 数据,恢复将在该点停止,并且服务器将无法启动。在这种情况下,可以从头重新运行恢复过程,在损坏点之前指定“恢复目标”,以便恢复可以正常完成。如果恢复因外部原因(例如系统崩溃或 WAL 存档无法访问)而失败,那么恢复可以简单地重新启动,它将从失败位置附近的位置重新启动。恢复重新启动的工作方式与正常操作中的检查点类似:服务器会定期将所有状态强制到磁盘,然后更新 pg_control 文件以指示不必再次扫描已处理的 WAL 数据。
If recovery finds corrupted WAL data, recovery will halt at that point and the server will not start. In such a case the recovery process could be re-run from the beginning, specifying a “recovery target” before the point of corruption so that recovery can complete normally. If recovery fails for an external reason, such as a system crash or if the WAL archive has become inaccessible, then the recovery can simply be restarted and it will restart almost from where it failed. Recovery restart works much like checkpointing in normal operation: the server periodically forces all its state to disk, and then updates the pg_control file to indicate that the already-processed WAL data need not be scanned again.
26.3.5. Timelines #
将数据库恢复到以前的时间点有能力产生一些复杂性,类似于关于时间旅行和平行宇宙的科幻故事。例如,假设在数据库的原始历史中,您在周二晚上 5:15 时删除了一个关键表,但在周三中午才发现您的错误。毫不动摇,您拿出备份,将时间还原到周二晚上 5:14,然后就可以启动并运行。在数据库宇宙的 this 历史中,您从未删除过该表。但是,假设您后来意识到这是一个糟糕的想法,并且希望回到原始历史中的周三早晨。如果您在数据库启动并运行时覆盖了导致您现在希望可以恢复到的时间的一些 WAL 段文件,那么您将无法实现此操作。因此,为了避免这种情况,您需要区分在执行时间点恢复后生成的一系列 WAL 记录和在原始数据库历史中生成的 WAL 记录。
The ability to restore the database to a previous point in time creates some complexities that are akin to science-fiction stories about time travel and parallel universes. For example, in the original history of the database, suppose you dropped a critical table at 5:15PM on Tuesday evening, but didn’t realize your mistake until Wednesday noon. Unfazed, you get out your backup, restore to the point-in-time 5:14PM Tuesday evening, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this wasn’t such a great idea, and would like to return to sometime Wednesday morning in the original history. You won’t be able to if, while your database was up-and-running, it overwrote some of the WAL segment files that led up to the time you now wish you could get back to. Thus, to avoid this, you need to distinguish the series of WAL records generated after you’ve done a point-in-time recovery from those that were generated in the original database history.
为了解决这个问题,PostgreSQL 有 timelines 的概念。每当存档恢复完成时,都会创建一个新的时间轴来标识该恢复后生成的一系列 WAL 记录。时间轴 ID 号是 WAL 段文件名称的一部分,因此新时间轴不会覆盖由前一时间轴生成的数据。例如,在 WAL 文件名 0000000100001234000055CD 中,处于前列的 00000001 是十六进制中的时间轴 ID。(请注意,在其他情况下(例如服务器日志消息),时间轴 ID 通常以十进制打印。)
To deal with this problem, PostgreSQL has a notion of timelines. Whenever an archive recovery completes, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names so a new timeline does not overwrite the WAL data generated by previous timelines. For example, in the WAL file name 0000000100001234000055CD, the leading 00000001 is the timeline ID in hexadecimal. (Note that in other contexts, such as server log messages, timeline IDs are usually printed in decimal.)
实际上可以存档许多不同的时间轴。虽然这似乎是一个无用的功能,但它通常是救命稻草。考虑一下这样的情况:您不确定要恢复到哪个时间点,因此必须通过反复试验进行多次时间点恢复,直到找到从旧历史中分支出来的最佳位置。如果没有时间轴,这个过程很快就会产生难以管理的混乱。有了时间轴,您可以恢复到 any 先前状态,包括您之前放弃的时间轴分支中的状态。
It is in fact possible to archive many different timelines. While that might seem like a useless feature, it’s often a lifesaver. Consider the situation where you aren’t quite sure what point-in-time to recover to, and so have to do several point-in-time recoveries by trial and error until you find the best place to branch off from the old history. Without timelines this process would soon generate an unmanageable mess. With timelines, you can recover to any prior state, including states in timeline branches that you abandoned earlier.
每次创建一个新时间轴时,PostgreSQL 都会创建一个“时间轴历史”文件,其中显示它从哪个时间轴分支出来的以及何时分支出来的。这些历史文件对于允许系统在从包含多个时间轴的存档中恢复时挑选正确 WAL 段文件是必需的。因此,它们就像 WAL 段文件一样被存档到 WAL 存档区域。历史文件只是小的文本文件,因此无限期地保留它们很便宜而且合适(不像大的段文件)。如果您愿意,可以在历史文件中添加注释,以记录有关如何以及为何创建此特定时间轴的自己的注释。在您通过试验产生大量不同时间轴时,此类注释将特别有价值。
Every time a new timeline is created, PostgreSQL creates a “timeline history” file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, they are archived into the WAL archive area just like WAL segment files. The history files are just small text files, so it’s cheap and appropriate to keep them around indefinitely (unlike the segment files which are large). You can, if you like, add comments to a history file to record your own notes about how and why this particular timeline was created. Such comments will be especially valuable when you have a thicket of different timelines as a result of experimentation.
恢复的默认行为是恢复到存档中找到的最新时间线。如果您希望恢复到在进行基本备份时有效的的时间线或特定的子时间线(即,您希望返回到在恢复尝试后本身已生成的状态),则需要指定 _current_或 recovery_target_timeline中的目标时间线 ID。您无法恢复到早于基本备份分支的时间线中。
The default behavior of recovery is to recover to the latest timeline found in the archive. If you wish to recover to the timeline that was current when the base backup was taken or into a specific child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify current or the target timeline ID in recovery_target_timeline. You cannot recover into timelines that branched off earlier than the base backup.
26.3.6. Tips and Examples #
在此处提供了配置连续存档的一些技巧。
Some tips for configuring continuous archiving are given here.
26.3.6.1. Standalone Hot Backups #
可以使用 PostgreSQL 的备份工具来生成独立热备份。这些备份无法用于时间点恢复,但通常比 pg_dump 转储的备份和恢复速度快得多。(它们也比 pg_dump 转储大得多,因此在某些情况下速度优势可能会被抵消。)
It is possible to use PostgreSQL’s backup facilities to produce standalone hot backups. These are backups that cannot be used for point-in-time recovery, yet are typically much faster to backup and restore than pg_dump dumps. (They are also much larger than pg_dump dumps, so in some cases the speed advantage might be negated.)
与基准备份一样,生成独立热备份的最简单方法是使用 pg_basebackup 工具。如果您在调用它时包含 -X 参数,则使用备份所需的所有预写式日志将自动包含在备份中,并且不需要特殊操作来还原备份。
As with base backups, the easiest way to produce a standalone hot backup is to use the pg_basebackup tool. If you include the -X parameter when calling it, all the write-ahead log required to use the backup will be included in the backup automatically, and no special action is required to restore the backup.
26.3.6.2. Compressed Archive Logs #
如果存档存储大小是一个问题,您可以使用 gzip 压缩存档文件:
If archive storage size is a concern, you can use gzip to compress the archive files:
archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz'
然后您需要在恢复过程中使用 gunzip:
You will then need to use gunzip during recovery:
restore_command = 'gunzip < /mnt/server/archivedir/%f.gz > %p'
26.3.6.3. archive_command Scripts #
许多人选择使用脚本来定义他们的 archive_command,以便他们的 postgresql.conf 条目看起来非常简单:
Many people choose to use scripts to define their archive_command, so that their postgresql.conf entry looks very simple:
archive_command = 'local_backup_script.sh "%p" "%f"'
如果您想在归档过程中使用多于一个命令,则建议使用单独的脚本文件。这允许在脚本中管理所有复杂性,该脚本可以用常用的脚本语言(例如 bash 或 perl)编写。
Using a separate script file is advisable any time you want to use more than a single command in the archiving process. This allows all complexity to be managed within the script, which can be written in a popular scripting language such as bash or perl.
可以在脚本中解决的要求示例包括:
Examples of requirements that might be solved within a script include:
Tip
在使用 _archive_command_脚本时,最好启用 logging_collector。然后脚本从 stderr 写入的任何消息都将显示在数据库服务器日志中,如果复杂的配置失败,这将允许轻松诊断。
When using an archive_command script, it’s desirable to enable logging_collector. Any messages written to stderr from the script will then appear in the database server log, allowing complex configurations to be diagnosed easily if they fail.
26.3.7. Caveats #
在撰写这份材料时,持续存档技术存在一些限制。这些限制很可能会在未来的版本中得到解决:
At this writing, there are several limitations of the continuous archiving technique. These will probably be fixed in future releases:
值得注意的是,默认的 WAL 格式非常庞大,因为它包含许多磁盘页快照。这些页面快照旨在支持故障恢复,因为我们可能需要修复部分写入的磁盘页面。根据系统硬件和软件,部分写入的风险可能小到可以忽略,在这种情况下,使用 full_page_writes 参数关闭页面快照可以显著减少存档的 WAL 文件的总容量。(在执行此操作之前,请阅读 Chapter 30 中的注释和警告。)关闭页面快照不会阻止将 WAL 用于 PITR 操作。未来的发展领域是在 full_page_writes 打开的情况下通过删除不必要的页面副本压缩存档的 WAL 数据。与此同时,管理员可能希望通过尽可能增加检查点间隔参数来减少包含在 WAL 中的页面快照的数量。
It should also be noted that the default WAL format is fairly bulky since it includes many disk page snapshots. These page snapshots are designed to support crash recovery, since we might need to fix partially-written disk pages. Depending on your system hardware and software, the risk of partial writes might be small enough to ignore, in which case you can significantly reduce the total volume of archived WAL files by turning off page snapshots using the full_page_writes parameter. (Read the notes and warnings in Chapter 30 before you do so.) Turning off page snapshots does not prevent use of the WAL for PITR operations. An area for future development is to compress archived WAL data by removing unnecessary page copies even when full_page_writes is on. In the meantime, administrators might wish to reduce the number of page snapshots included in WAL by increasing the checkpoint interval parameters as much as feasible.