Postgresql 中文操作指南

19.2. Creating a Database Cluster #

在执行任何操作之前,您必须初始化磁盘上的数据库存储区。我们称之为 database cluster。(SQL 标准使用术语目录集群。)数据库集群是由运行数据库服务器的一个实例管理的数据库集合。初始化后,数据库集群将包含一个名为 postgres 的数据库,该数据库旨在作为实用工具、用户和第三方应用程序使用的默认数据库。数据库服务器本身不要求存在 postgres 数据库,但许多外部实用工具程序假定它存在。在每个集群初始化过程中,另外创建两个数据库,分别命名为 template1template0。顾名思义,这些数据库将用作后续创建的数据库的模板;它们不应用于实际工作。(有关在集群中创建新数据库的信息,请参见 Chapter 23。)

Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (The SQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists. There are two more databases created within each cluster during initialization, named template1 and template0. As the names suggest, these will be used as templates for subsequently-created databases; they should not be used for actual work. (See Chapter 23 for information about creating new databases within a cluster.)

在文件系统术语中,数据库集群是将存储所有数据的一个目录。我们称其为 data directorydata area 。将数据存储在何处完全由你决定。没有默认位置,但诸如 /usr/local/pgsql/data/var/lib/pgsql/data 之类的位置很常见。在使用之前,必须使用随 PostgreSQL 一起安装的程序 initdb 初始化数据目录。

In file system terms, a database cluster is a single directory under which all data will be stored. We call this the data directory or data area. It is completely up to you where you choose to store your data. There is no default, although locations such as /usr/local/pgsql/data or /var/lib/pgsql/data are popular. The data directory must be initialized before being used, using the program initdb which is installed with PostgreSQL.

如果你使用的是 PostgreSQL 的预打包版本,它很可能对放置数据目录的位置有特定的约定,并且它还可能提供一个用于创建数据目录的脚本。如果是这种情况,你应该优先使用该脚本,而不是直接运行 initdb。有关详细信息,请参阅包级文档。

If you are using a pre-packaged version of PostgreSQL, it may well have a specific convention for where to place the data directory, and it may also provide a script for creating the data directory. In that case you should use that script in preference to running initdb directly. Consult the package-level documentation for details.

要手动初始化数据库簇,请运行 initdb 并使用 -D 选项指定数据库簇所需的文件系统位置,例如:

To initialize a database cluster manually, run initdb and specify the desired file system location of the database cluster with the -D option, for example:

$ initdb -D /usr/local/pgsql/data

请注意,必须在登录到 PostgreSQL 用户帐户时执行此命令,该帐户在上一部分中进行了描述。

Note that you must execute this command while logged into the PostgreSQL user account, which is described in the previous section.

Tip

作为 -D 选项的替代方法,可以设置环境变量 PGDATA

As an alternative to the -D option, you can set the environment variable PGDATA.

或者,你可以像这样通过 pg_ctl 程序运行 initdb

Alternatively, you can run initdb via the pg_ctl program like so:

$ pg_ctl -D /usr/local/pgsql/data initdb

如果您使用 pg_ctl 开始和停止服务器(见 Section 19.3),则这可能会更直观,这样 pg_ctl 将是您用于管理数据库服务器实例的唯一命令。

This may be more intuitive if you are using pg_ctl for starting and stopping the server (see Section 19.3), so that pg_ctl would be the sole command you use for managing the database server instance.

如果目录不存在,initdb 将尝试创建你指定的目录。当然,如果 initdb 没有权限在父目录中写入,这将失败。通常建议 PostgreSQL 用户不仅拥有数据目录,还拥有其父目录,因此这应该不是问题。如果所需父目录也不存在,则需要首先创建它,如果祖父母目录不可写入,则需要使用 root 权限创建它。因此,流程可能如下:

initdb will attempt to create the directory you specify if it does not already exist. Of course, this will fail if initdb does not have permissions to write in the parent directory. It’s generally recommendable that the PostgreSQL user own not just the data directory but its parent directory as well, so that this should not be a problem. If the desired parent directory doesn’t exist either, you will need to create it first, using root privileges if the grandparent directory isn’t writable. So the process might look like this:

root# mkdir /usr/local/pgsql
root# chown postgres /usr/local/pgsql
root# su postgres
postgres$ initdb -D /usr/local/pgsql/data

如果数据目录存在并且已经包含文件,initdb 将拒绝运行;这是为了防止意外覆盖现有安装。

initdb will refuse to run if the data directory exists and already contains files; this is to prevent accidentally overwriting an existing installation.

由于数据目录包含了所有存储在数据库中的数据,因此必须确保安全,防止未经授权的访问。initdb 因此撤销了除了 PostgreSQL 用户和可选的组之外的所有人的访问权限。启用组访问时,仅为只读模式。这允许群集所有者所在同一组中的无特权用户备份群集数据或执行其他仅需要读取权限的操作。

Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access. initdb therefore revokes access permissions from everyone but the PostgreSQL user, and optionally, group. Group access, when enabled, is read-only. This allows an unprivileged user in the same group as the cluster owner to take a backup of the cluster data or perform other operations that only require read access.

请注意,在现有群集上启用或禁用组访问权限需要关闭群集,并在重新启动 PostgreSQL 之前在所有目录和文件上设置相应模式。否则,数据目录中可能存在混杂模式。对于仅允许所有者访问的群集,相应模式是 0700 用于目录,0600 用于文件。对于还允许组读取的群集,相应模式是 0750 用于目录,0640 用于文件。

Note that enabling or disabling group access on an existing cluster requires the cluster to be shut down and the appropriate mode to be set on all directories and files before restarting PostgreSQL. Otherwise, a mix of modes might exist in the data directory. For clusters that allow access only by the owner, the appropriate modes are 0700 for directories and 0600 for files. For clusters that also allow reads by the group, the appropriate modes are 0750 for directories and 0640 for files.

但是,虽然目录内容是安全的,但默认客户端身份验证设置允许任何本地用户连接到数据库,甚至成为数据库超级用户。如果您不信任其他本地用户,我们建议您使用 initdb-W—​pwprompt—​pwfile 选项为数据库超级用户分配密码。此外,指定 -A scram-sha-256,以不使用默认 trust 身份验证模式;或者在运行 initdb 后修改生成的 pg_hba.conf 文件,但在 before 首次启动服务器。 (其他合理的方法包括使用 peer 身份验证或文件系统权限来限制连接。有关详细信息,请参见 Chapter 21。)

However, while the directory contents are secure, the default client authentication setup allows any local user to connect to the database and even become the database superuser. If you do not trust other local users, we recommend you use one of initdb's -W, —​pwprompt or —​pwfile options to assign a password to the database superuser. Also, specify -A scram-sha-256 so that the default trust authentication mode is not used; or modify the generated pg_hba.conf file after running initdb, but before you start the server for the first time. (Other reasonable approaches include using peer authentication or file system permissions to restrict connections. See Chapter 21 for more information.)

initdb 还初始化数据库集群的默认区域设置。通常,它只会采用环境中的区域设置并将其应用到已初始化的数据库。可以为数据库指定其他区域设置;更多信息请参见 Section 24.1。initdb 创建的特定数据库集群中使用的默认排序顺序由 initdb 设置,虽然您可以使用不同的排序顺序创建新数据库,但在模板数据库中使用的顺序无法在不删除并重新创建它们的情况下进行更改。对于使用除 CPOSIX 之外的区域设置,也会存在性能影响。因此,首次正确做出此选择非常重要。

initdb also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found in Section 24.1. The default sort order used within the particular database cluster is set by initdb, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other than C or POSIX. Therefore, it is important to make this choice correctly the first time.

initdb 还设置数据库集群的默认字符集编码。通常这应选择与区域设置匹配。有关详细信息,请参见 Section 24.3

initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details see Section 24.3.

C 和非 POSIX 区域设置依赖于操作系统的排序规则库来进行字符集排序。这控制了存储在索引中的密钥的排序顺序。因此,一个群集无法通过快照还原、二进制流复制、不同的操作系统或操作系统升级切换到不兼容的排序规则库版本。

Non-C and non-POSIX locales rely on the operating system’s collation library for character set ordering. This controls the ordering of keys stored in indexes. For this reason, a cluster cannot switch to an incompatible collation library version, either through snapshot restore, binary streaming replication, a different operating system, or an operating system upgrade.

19.2.1. Use of Secondary File Systems #

许多安装在其机器的“根”卷以外的文件系统(卷)上创建其数据库群集。如果您选择这样做,不建议尝试使用辅助卷的最顶层目录(装入点)作为数据目录。最佳做法是在装入点目录内创建一个由 PostgreSQL 用户拥有的目录,然后在其中创建数据目录。这样可以避免权限问题,特别是对于 pg_upgrade 等操作,并且如果辅助卷处于脱机状态,还可以确保干净失败。

Many installations create their database clusters on file systems (volumes) other than the machine’s “root” volume. If you choose to do this, it is not advisable to try to use the secondary volume’s topmost directory (mount point) as the data directory. Best practice is to create a directory within the mount-point directory that is owned by the PostgreSQL user, and then create the data directory within that. This avoids permissions problems, particularly for operations such as pg_upgrade, and it also ensures clean failures if the secondary volume is taken offline.

19.2.2. File Systems #

一般来说,任何具有 POSIX 语法的文件系统都可以用于 PostgreSQL。由于各种原因,用户更喜欢不同的文件系统,包括供应商支持、性能和熟悉程度。经验表明,在其他条件相同的情况下,不应该仅仅因为切换文件系统或进行微小的文件系统配置更改,而期望产生重大的性能或行为变化。

Generally, any file system with POSIX semantics can be used for PostgreSQL. Users prefer different file systems for a variety of reasons, including vendor support, performance, and familiarity. Experience suggests that, all other things being equal, one should not expect major performance or behavior changes merely from switching file systems or making minor file system configuration changes.

19.2.2.1. NFS #

可以将 NFS 文件系统用于存储 PostgreSQL 数据目录。PostgreSQL 没有针对 NFS 文件系统采取任何特殊措施,这意味着它假设 NFS 的行为与本地连接的驱动器完全相同。PostgreSQL 没有使用任何已知在 NFS 上具有非标准行为的功能,例如文件锁定。

It is possible to use an NFS file system for storing the PostgreSQL data directory. PostgreSQL does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like locally-connected drives. PostgreSQL does not use any functionality that is known to have nonstandard behavior on NFS, such as file locking.

将 NFS 与 PostgreSQL 结合使用的唯一严格要求是文件系统使用 hard 选项装入。通过 hard 选项,如果出现网络问题,进程可能会无限期地“挂起”,所以此配置将需要精心设置监控。soft 选项将在出现网络问题时中断系统调用,但 PostgreSQL 不会重复以这种方式中断的系统调用,所以任何此类中断都会导致报告 I/O 错误。

The only firm requirement for using NFS with PostgreSQL is that the file system is mounted using the hard option. With the hard option, processes can “hang” indefinitely if there are network problems, so this configuration will require a careful monitoring setup. The soft option will interrupt system calls in case of network problems, but PostgreSQL will not repeat system calls interrupted in this way, so any such interruption will result in an I/O error being reported.

不必使用 sync 挂载选项。async 选项的行为已足够,因为 PostgreSQL 在适当的时候发出 fsync 调用以刷新写缓存。(这类似于它在本地文件系统上工作的方式。)但是,强烈建议在存在该 sync 选项的 NFS server 中使用 sync 导出选项(主要是 Linux)。否则,NFS 客户端上的 fsync 或等效命令实际上无法到达服务器上的永久存储,这可能导致类似于使用参数 fsync 的损坏。这些挂载和导出选项的默认值在不同的供应商和版本之间有所不同,因此建议在任何情况下都检查并可能明确指定它们以避免任何歧义。

It is not necessary to use the sync mount option. The behavior of the async option is sufficient, since PostgreSQL issues fsync calls at appropriate times to flush the write caches. (This is analogous to how it works on a local file system.) However, it is strongly recommended to use the sync export option on the NFS server on systems where it exists (mainly Linux). Otherwise, an fsync or equivalent on the NFS client is not actually guaranteed to reach permanent storage on the server, which could cause corruption similar to running with the parameter fsync off. The defaults of these mount and export options differ between vendors and versions, so it is recommended to check and perhaps specify them explicitly in any case to avoid any ambiguity.

在某些情况下,可以通过 NFS 或低级别协议(如 iSCSI)访问外部存储产品。在后一种情况下,存储显示为块设备,可以在其上创建任何可用文件系统。这种方法可以使 DBA 不必处理 NFS 的一些特殊习惯,但当然管理远程存储的复杂性会在其他级别发生。

In some cases, an external storage product can be accessed either via NFS or a lower-level protocol such as iSCSI. In the latter case, the storage appears as a block device and any available file system can be created on it. That approach might relieve the DBA from having to deal with some of the idiosyncrasies of NFS, but of course the complexity of managing remote storage then happens at other levels.