PostgreSQL Chinese Operation Guide
26.1. SQL Dump #
The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose. The basic usage of this command is:
pg_dump dbname > dumpfile
As you see, pg_dump writes its result to the standard output. We will see below how this can be useful. While the above command creates a text file, pg_dump can create files in other formats that allow for parallelism and more fine-grained control of object restoration.
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can perform this backup procedure from any remote host that has access to the database. But remember that pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in order to back up the entire database you almost always have to run it as a database superuser. (If you do not have sufficient privileges to back up the entire database, you can still back up portions of the database to which you do have access using options such as -n schema or -t table.)
To specify which database server pg_dump should contact, use the command line options -h host and -p port. The default host is the local host or whatever your PGHOST environment variable specifies. Similarly, the default port is indicated by the PGPORT environment variable or, failing that, by the compiled-in default. (Conveniently, the server will normally have the same compiled-in default.)
Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms (which are described in Chapter 21).
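Putting these connection options together, a remote dump might be invoked as follows; this is only an illustrative sketch, and the host name, port, user name, and database name are hypothetical examples:

```shell
# Dump "mydb" from a remote server, connecting as database user "alice".
pg_dump -h db.example.com -p 5433 -U alice mydb > dumpfile

# Equivalently, the connection settings can be supplied via the environment:
PGHOST=db.example.com PGPORT=5433 PGUSER=alice pg_dump mydb > dumpfile
```

Either form produces the same connection; the environment-variable form is convenient when several client programs (pg_dump, psql, pg_restore) must all target the same server.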
An important advantage of pg_dump over the other backup methods described later is that pg_dump’s output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific. pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.
Dumps created by pg_dump are internally consistent, meaning that the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)
26.1.1. Restoring the Dump #
Text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is
psql dbname < dumpfile
where dumpfile is the file output by the pg_dump command. The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname). psql supports options similar to pg_dump for specifying the database server to connect to and the user name to use. See the psql reference page for more information. Non-text file dumps are restored using the pg_restore utility.
Before restoring an SQL dump, all the users who own objects or were granted permissions on objects in the dumped database must already exist. If they do not, the restore will fail to recreate the objects with the original ownership and/or permissions. (Sometimes this is what you want, but usually it is not.)
By default, the psql script will continue to execute after an SQL error is encountered. You might wish to run psql with the ON_ERROR_STOP variable set to alter that behavior and have psql exit with an exit status of 3 if an SQL error occurs:
psql --set ON_ERROR_STOP=on dbname < dumpfile
Either way, you will only have a partially restored database. Alternatively, you can specify that the whole dump should be restored as a single transaction, so the restore is either fully completed or fully rolled back. This mode can be specified by passing the -1 or --single-transaction command-line options to psql. When using this mode, be aware that even a minor error can roll back a restore that has already run for many hours. However, that might still be preferable to manually cleaning up a complex database after a partially restored dump.
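For example, the two behaviors can be combined so that psql stops at the first SQL error and the single enclosing transaction rolls the whole restore back:

```shell
# Abort on the first SQL error and roll back the entire restore.
psql --single-transaction --set ON_ERROR_STOP=on dbname < dumpfile
```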
The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another, for example:
pg_dump -h host1 dbname | psql -h host2 dbname
Important
The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.
After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics; see Section 25.1.3 and Section 25.1.6 for more information. For more advice on how to load large amounts of data into PostgreSQL efficiently, refer to Section 14.4.
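One convenient way to do this is the vacuumdb wrapper program, which can run ANALYZE across every database in the cluster in a single step:

```shell
# Refresh planner statistics for all databases without performing a full vacuum.
vacuumdb --all --analyze-only
```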
26.1.2. Using pg_dumpall #
pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:
pg_dumpall > dumpfile
The resulting dump can be restored with psql:
psql -f dumpfile postgres
(Actually, you can specify any existing database name to start from, but if you are loading into an empty cluster then postgres should usually be used.) It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the role and tablespace information. If you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.
pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases are not synchronized.
Cluster-wide data can be dumped alone using the pg_dumpall --globals-only option. This is necessary to fully back up the cluster if the pg_dump command is run on individual databases.
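A cluster backup assembled from per-database dumps might therefore be sketched as follows; the database names are hypothetical examples:

```shell
# Roles and tablespace definitions first...
pg_dumpall --globals-only > globals.sql

# ...then each database on its own.
pg_dump db1 > db1.sql
pg_dump db2 > db2.sql
```

On restore, globals.sql would be replayed (as superuser) before the per-database files.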
26.1.3. Handling Large Databases #
Some operating systems have maximum file size limits that cause problems when creating large pg_dump output files. Fortunately, pg_dump can write to the standard output, so you can use standard Unix tools to work around this potential problem. There are several possible methods:
**Use compressed dumps.** You can use your favorite compression program, for example gzip:
pg_dump dbname | gzip > filename.gz
Reload with:
gunzip -c filename.gz | psql dbname
or:
cat filename.gz | gunzip | psql dbname
**Use split.** The split command allows you to split the output into smaller files that are acceptable in size to the underlying file system. For example, to make 2 gigabyte chunks:
pg_dump dbname | split -b 2G - filename
Reload with:
cat filename* | psql dbname
If using GNU split, it is possible to use it and gzip together:
pg_dump dbname | split -b 2G --filter='gzip > $FILE.gz' - filename
It can be restored using zcat.
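For instance, the compressed chunks produced above can be reassembled and fed to psql in one pipeline, using the same filename prefix as the dump command:

```shell
# Decompress and concatenate the chunks in order, then replay them.
zcat filename*.gz | psql dbname
```

The shell glob expands the split suffixes (aa, ab, ...) in lexical order, so the chunks are concatenated in the order split wrote them.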
**Use pg_dump’s custom dump format.** If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format:
pg_dump -Fc dbname > filename
A custom-format dump is not a script for psql, but instead must be restored with pg_restore, for example:
pg_restore -d dbname filename
See the pg_dump and pg_restore reference pages for details.
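As a brief illustration of the selective-restore advantage mentioned above, pg_restore can list an archive's contents and restore a single table; the table name here is a hypothetical example:

```shell
# List the contents of the archive without restoring anything.
pg_restore -l filename

# Restore only the table "orders" into the target database.
pg_restore -d dbname -t orders filename
```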
For very large databases, you might need to combine split with one of the other two approaches.
**Use pg_dump’s parallel dump feature.** To speed up the dump of a large database, you can use pg_dump’s parallel mode. This will dump multiple tables at the same time. You can control the degree of parallelism with the -j parameter. Parallel dumps are only supported for the "directory" archive format.
pg_dump -j num -F d -f out.dir dbname
You can use pg_restore -j to restore a dump in parallel. This will work for any archive of either the "custom" or the "directory" archive mode, whether or not it has been created with pg_dump -j.
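A matching pair of parallel dump and restore commands might look like this; a worker count of 4 is an arbitrary example:

```shell
# Dump with 4 parallel workers into the directory-format archive out.dir.
pg_dump -j 4 -F d -f out.dir dbname

# Restore it with 4 parallel workers into an existing (empty) database.
pg_restore -j 4 -d dbname out.dir
```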