PostgreSQL Chinese Operation Guide

30.3. Write-Ahead Logging (WAL)

Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. A detailed description can be found in most (if not all) books about transaction processing. Briefly, WAL’s central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after WAL records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the WAL records. (This is roll-forward recovery, also known as REDO.)
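
The write-ahead rule and REDO recovery described above can be illustrated with a small sketch. This is a toy model, not PostgreSQL code: the `TinyStore` class, the JSON record format, and the file name `toy.wal` are all assumptions made for illustration. It logs each change and flushes the log before touching the in-memory "data pages", then rebuilds those pages from the log after a simulated crash.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store illustrating the write-ahead rule (hypothetical, not PostgreSQL)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.pages = {}  # stands in for data pages; assumed lost on a crash

    def commit(self, key, value):
        # 1. Append a record describing the change and flush it to storage.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only now may the change reach the "data pages", which this toy
        #    keeps in memory and never writes out at commit time.
        self.pages[key] = value

    def recover(self):
        # REDO: re-apply every logged change in log order.
        self.pages = {}
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    record = json.loads(line)
                    self.pages[record["key"]] = record["value"]
        return self.pages


def demo_redo():
    with tempfile.TemporaryDirectory() as tmp:
        wal_path = os.path.join(tmp, "toy.wal")
        store = TinyStore(wal_path)
        store.commit("a", 1)
        store.commit("b", 2)
        store.commit("a", 3)
        # Simulate a crash: the in-memory pages vanish, the log survives.
        crashed = TinyStore(wal_path)
        return crashed.recover()
```

Calling `demo_redo()` rebuilds the state purely from the log, which is why the toy never needs to write its data pages at commit time.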

Tip

Because WAL restores database file contents after a crash, journaled file systems are not necessary for reliable storage of the data files or WAL files. In fact, journaling overhead can reduce performance, especially if journaling causes file system data to be flushed to disk. Fortunately, data flushing during journaling can often be disabled with a file system mount option, e.g., data=writeback on a Linux ext3 file system. Journaled file systems do improve boot speed after a crash.

Using WAL results in a significantly reduced number of disk writes, because only the WAL file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction. The WAL file is written sequentially, and so the cost of syncing the WAL is much less than the cost of flushing the data pages. This is especially true for servers handling many small transactions touching different parts of the data store. Furthermore, when the server is processing many small concurrent transactions, one fsync of the WAL file may suffice to commit many transactions.
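
The last point, that one fsync may cover many commits, can be sketched by counting flushes. The function names and record format below are hypothetical, and the sketch only models the idea of batching commit records before a single flush. It is not how PostgreSQL schedules its WAL writes.

```python
import os
import tempfile


def commit_individually(wal, records):
    """Flush after every commit record; returns the number of fsync calls made."""
    syncs = 0
    for record in records:
        wal.write(record)
        wal.flush()
        os.fsync(wal.fileno())
        syncs += 1
    return syncs


def commit_grouped(wal, records):
    """Write all pending commit records, then flush once for the whole group."""
    for record in records:
        wal.write(record)
    wal.flush()
    os.fsync(wal.fileno())
    return 1


def demo_group_commit(n=100):
    # Hypothetical commit records for n small transactions.
    records = [f"commit txn {i}\n".encode() for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "individual.wal"), "ab") as wal:
            individual = commit_individually(wal, records)
        with open(os.path.join(tmp, "grouped.wal"), "ab") as wal:
            grouped = commit_grouped(wal, records)
    return individual, grouped
```

Both files end up with the same sequential contents; the only difference is how many times the log is forced to storage.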

WAL also makes it possible to support on-line backup and point-in-time recovery, as described in Section 26.3. By archiving the WAL data we can support reverting to any time instant covered by the available WAL data: we simply install a prior physical backup of the database, and replay the WAL just as far as the desired time. What’s more, the physical backup doesn’t have to be an instantaneous snapshot of the database state — if it is made over some period of time, then replaying the WAL for that period will fix any internal inconsistencies.
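
As a rough illustration of replaying archived WAL on top of a base backup, the sketch below restores a toy key-value state to a chosen time. The `WalRecord` type, the integer timestamps, and the `restore_to` function are assumptions made for this example. Real recovery works on physical pages and is driven by server configuration, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    timestamp: int  # hypothetical commit time, in arbitrary ticks
    key: str
    value: int


def restore_to(base_backup, archived_wal, target_time):
    """Start from a copy of the base backup and replay WAL up to target_time."""
    state = dict(base_backup)
    for record in sorted(archived_wal, key=lambda r: r.timestamp):
        if record.timestamp > target_time:
            break  # stop replay at the requested point in time
        state[record.key] = record.value
    return state


def demo_pitr():
    wal = [
        WalRecord(1, "a", 10),
        WalRecord(2, "b", 20),
        WalRecord(3, "a", 30),
        WalRecord(4, "b", 40),
    ]
    # A "fuzzy" backup copied while records 1 and 2 were being applied:
    # it captured the write to "b" but missed the write to "a".
    fuzzy_backup = {"b": 20}
    return restore_to(fuzzy_backup, wal, target_time=3)
```

Because replay starts from the beginning of the period during which the backup was taken and each record simply sets a value, the write the backup missed is filled in, and the result matches the state at tick 3.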