19.4. Managing Kernel Resources
PostgreSQL can sometimes exhaust various operating system resource limits, especially when multiple copies of the server are running on the same system, or in very large installations. This section explains the kernel resources used by PostgreSQL and the steps you can take to resolve problems related to kernel resource consumption.
19.4.1. Shared Memory and Semaphores
PostgreSQL requires the operating system to provide inter-process communication (IPC) features, specifically shared memory and semaphores. Unix-derived systems typically provide “System V” IPC, “POSIX” IPC, or both. Windows has its own implementation of these features and is not discussed here.
By default, PostgreSQL allocates a very small amount of System V shared memory, as well as a much larger amount of anonymous mmap shared memory. Alternatively, a single large System V shared memory region can be used (see shared_memory_type). In addition, a significant number of semaphores, which can be either System V or POSIX style, are created at server startup. Currently, POSIX semaphores are used on Linux and FreeBSD systems while other platforms use System V semaphores.
System V IPC features are typically constrained by system-wide allocation limits. When PostgreSQL exceeds one of these limits, the server will refuse to start and should leave an instructive error message describing the problem and what to do about it. (See also Section 19.3.1.) The relevant kernel parameters are named consistently across different systems; Table 19.1 gives an overview. The methods to set them, however, vary. Suggestions for some platforms are given below.
Table 19.1. System V IPC Parameters
Name | Description | Values needed to run one PostgreSQL instance
--- | --- | ---
SHMMAX | Maximum size of shared memory segment (bytes) | at least 1 kB, but the default is usually much higher
SHMMIN | Minimum size of shared memory segment (bytes) | 1
SHMALL | Total amount of shared memory available (bytes or pages) | same as SHMMAX if bytes, or ceil(SHMMAX/PAGE_SIZE) if pages, plus room for other applications
SHMSEG | Maximum number of shared memory segments per process | only 1 segment is needed, but the default is much higher
SHMMNI | Maximum number of shared memory segments system-wide | like SHMSEG plus room for other applications
SEMMNI | Maximum number of semaphore identifiers (i.e., sets) | at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) plus room for other applications
SEMMNS | Maximum number of semaphores system-wide | ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for other applications
SEMMSL | Maximum number of semaphores per set | at least 17
SEMMAP | Number of entries in semaphore map | see text
SEMVMX | Maximum value of semaphore | at least 1000 (the default is often 32767; do not change unless necessary)
PostgreSQL requires a few bytes of System V shared memory (typically 48 bytes, on 64-bit platforms) for each copy of the server. On most modern operating systems, this amount can easily be allocated. However, if you are running many copies of the server or you explicitly configure the server to use large amounts of System V shared memory (see shared_memory_type and dynamic_shared_memory_type), it may be necessary to increase SHMALL, which is the total amount of System V shared memory system-wide. Note that SHMALL is measured in pages rather than bytes on many systems.
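Because SHMALL is counted in pages on many systems, a byte value has to be converted before it is set. The following is a minimal sketch of the ceil(SHMMAX/PAGE_SIZE) rule from Table 19.1; the function name shm_bytes_to_pages is made up for illustration, and falling back to getconf PAGE_SIZE when no page size is given is an assumption about the target system.

```shell
# Convert a shared memory size in bytes to a page count, rounding up.
# Usage: shm_bytes_to_pages BYTES [PAGE_SIZE]
# (hypothetical helper, not part of PostgreSQL)
shm_bytes_to_pages() (
    bytes=$1
    # Assumption: use the running system's page size when none is given.
    page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
)

# Example: a 16 GB limit expressed in 4 kB pages.
shm_bytes_to_pages 17179869184 4096    # prints 4194304
```

The rounding matters only when the byte value is not a multiple of the page size; in that case the result is rounded up so that the whole segment still fits.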
Less likely to cause problems is the minimum size for shared memory segments (SHMMIN), which should be at most approximately 32 bytes for PostgreSQL (it is usually just 1). The maximum number of segments system-wide (SHMMNI) or per-process (SHMSEG) are unlikely to cause a problem unless your system has them set to zero.
When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers), allowed WAL sender process (max_wal_senders), and allowed background process (max_worker_processes), in sets of 16. Each such set also contains a 17th semaphore holding a “magic number”, used to detect collisions with semaphore sets used by other applications. The maximum number of semaphores in the system is set by SEMMNS, which consequently must be at least as high as max_connections plus autovacuum_max_workers plus max_wal_senders plus max_worker_processes, plus one extra for each 16 allowed connections plus workers (see the formula in Table 19.1). The parameter SEMMNI determines the limit on the number of semaphore sets that can exist on the system at one time. Hence this parameter must be at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16). Lowering the number of allowed connections is a temporary workaround for failures from the function semget, which are usually reported with the confusing message “No space left on device”.
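The two formulas can be evaluated with plain shell arithmetic. This is a sketch only: the function names pg_sem_sets and pg_sem_total are invented here, the four arguments are illustrative values rather than a recommendation, and the results are lower bounds to which room for other applications still has to be added.

```shell
# Number of semaphore sets (lower bound for SEMMNI), per Table 19.1.
# Arguments: max_connections autovacuum_max_workers max_wal_senders max_worker_processes
# (hypothetical helper names)
pg_sem_sets() (
    total=$(( $1 + $2 + $3 + $4 + 5 ))
    echo $(( (total + 15) / 16 ))          # ceil(total / 16)
)

# Total semaphores (lower bound for SEMMNS): 16 per set plus the magic-number one.
pg_sem_total() (
    echo $(( $(pg_sem_sets "$@") * 17 ))
)

pg_sem_sets 100 3 10 8     # prints 8
pg_sem_total 100 3 10 8    # prints 136
```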
In some cases it might also be necessary to increase SEMMAP to be at least on the order of SEMMNS. If the system has this parameter (many do not), it defines the size of the semaphore resource map, in which each contiguous block of available semaphores needs an entry. When a semaphore set is freed it is either added to an existing entry that is adjacent to the freed block or it is registered under a new map entry. If the map is full, the freed semaphores get lost (until reboot). Fragmentation of the semaphore space could over time lead to fewer available semaphores than there should be.
Various other settings related to “semaphore undo”, such as SEMMNU and SEMUME, do not affect PostgreSQL.
When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is, one semaphore per allowed connection (max_connections), allowed autovacuum worker process (autovacuum_max_workers), allowed WAL sender process (max_wal_senders), and allowed background process (max_worker_processes). On the platforms where this option is preferred, there is no specific kernel limit on the number of POSIX semaphores.
AIX
It should not be necessary to do any special configuration for such parameters as SHMMAX, as it appears this is configured to allow all memory to be used as shared memory. That is the sort of configuration commonly used for other databases such as DB/2.
It might, however, be necessary to modify the global ulimit information in /etc/security/limits, as the default hard limits for file sizes (fsize) and numbers of files (nofiles) might be too low.
FreeBSD
The default shared memory settings are usually good enough, unless you have set shared_memory_type to sysv. System V semaphores are not used on this platform.
The default IPC settings can be changed using the sysctl or loader interfaces. The following parameters can be set using sysctl:
# sysctl kern.ipc.shmall=32768
# sysctl kern.ipc.shmmax=134217728
To make these settings persist over reboots, modify /etc/sysctl.conf.
If you have set shared_memory_type to sysv, you might also want to configure your kernel to lock System V shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
If running in a FreeBSD jail, you should set its sysvshm parameter to new, so that it has its own separate System V shared memory namespace. (Before FreeBSD 11.0, it was necessary to enable shared access to the host’s IPC namespace from jails, and take measures to avoid collisions.)
NetBSD
The default shared memory settings are usually good enough, unless you have set shared_memory_type to sysv. You will usually want to increase kern.ipc.semmni and kern.ipc.semmns, as NetBSD’s default settings for these are uncomfortably small.
IPC parameters can be adjusted using sysctl, for example:
# sysctl -w kern.ipc.semmni=100
To make these settings persist over reboots, modify /etc/sysctl.conf.
If you have set shared_memory_type to sysv, you might also want to configure your kernel to lock System V shared memory into RAM and prevent it from being paged out to swap. This can be accomplished using the sysctl setting kern.ipc.shm_use_phys.
OpenBSD
The default shared memory settings are usually good enough, unless you have set shared_memory_type to sysv. You will usually want to increase kern.seminfo.semmni and kern.seminfo.semmns, as OpenBSD’s default settings for these are uncomfortably small.
IPC parameters can be adjusted using sysctl, for example:
# sysctl kern.seminfo.semmni=100
To make these settings persist over reboots, modify /etc/sysctl.conf.
Linux
The default shared memory settings are usually good enough, unless you have set shared_memory_type to sysv, and even then only on older kernel versions that shipped with low defaults. System V semaphores are not used on this platform.
The shared memory size settings can be changed via the sysctl interface. For example, to allow 16 GB (run as root):
# sysctl -w kernel.shmmax=17179869184
# sysctl -w kernel.shmall=4194304
To make these settings persist over reboots, add them to /etc/sysctl.conf.
macOS
The default shared memory and semaphore settings are usually good enough, unless you have set shared_memory_type to sysv.
The recommended method for configuring shared memory in macOS is to create a file named /etc/sysctl.conf, containing variable assignments such as:
kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024
Note that in some macOS versions, all five shared-memory parameters must be set in /etc/sysctl.conf, else the values will be ignored.
SHMMAX can only be set to a multiple of 4096.
SHMALL is measured in 4 kB pages on this platform.
It is possible to change all but SHMMNI on the fly, using sysctl. But it’s still best to set up your preferred values via /etc/sysctl.conf, so that the values will be kept across reboots.
Solaris, illumos
The default shared memory and semaphore settings are usually good enough for most PostgreSQL applications. Solaris defaults to a SHMMAX of one-quarter of system RAM. To further adjust this setting, use a project setting associated with the postgres user. For example, run the following as root:
projadd -c "PostgreSQL DB User" -K "project.max-shm-memory=(privileged,8GB,deny)" -U postgres -G postgres user.postgres
This command adds the user.postgres project and sets the shared memory maximum for the postgres user to 8GB. It takes effect the next time that user logs in, or when you restart PostgreSQL (not reload). The above assumes that PostgreSQL is run by the postgres user in the postgres group. No server reboot is required.
Other recommended kernel setting changes for database servers which will have a large number of connections are:
project.max-shm-ids=(priv,32768,deny)
project.max-sem-ids=(priv,4096,deny)
project.max-msg-ids=(priv,4096,deny)
Additionally, if you are running PostgreSQL inside a zone, you may need to raise the zone resource usage limits as well. See "Chapter 2: Projects and Tasks" in the System Administrator’s Guide for more information on projects and prctl.
19.4.2. systemd RemoveIPC
If systemd is in use, some care must be taken that IPC resources (including shared memory) are not prematurely removed by the operating system. This is especially of concern when installing PostgreSQL from source. Users of distribution packages of PostgreSQL are less likely to be affected, as the postgres user is then normally created as a system user.
The setting RemoveIPC in logind.conf controls whether IPC objects are removed when a user fully logs out. System users are exempt. This setting defaults to on in stock systemd, but some operating system distributions default it to off.
A typical observed effect when this setting is on is that shared memory objects used for parallel query execution are removed at apparently random times, leading to errors and warnings while attempting to open and remove them, such as:
WARNING: could not remove shared memory segment "/PostgreSQL.1450751626": No such file or directory
Different types of IPC objects (shared memory vs. semaphores, System V vs. POSIX) are treated slightly differently by systemd, so one might observe that some IPC resources are not removed in the same way as others. But it is not advisable to rely on these subtle differences.
A “user logging out” might happen as part of a maintenance job or manually when an administrator logs in as the postgres user or something similar, so it is hard to prevent in general.
What is a “system user” is determined at systemd compile time from the SYS_UID_MAX setting in /etc/login.defs.
Packaging and deployment scripts should be careful to create the postgres user as a system user by using useradd -r, adduser --system, or equivalent.
Alternatively, if the user account was created incorrectly or cannot be changed, it is recommended to set
RemoveIPC=no
in /etc/systemd/logind.conf or another appropriate configuration file.
Caution
At least one of these two things has to be ensured, or the PostgreSQL server will be very unreliable.
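Both conditions can be checked from a shell before relying on an installation. The sketch below is illustrative only: the helper names are invented, it reads a single logind.conf-style file and ignores any drop-in files, and it assumes the stock systemd default (on) when the key is not set. Comparing a UID against SYS_UID_MAX is likewise an approximation, since the threshold is fixed when systemd is compiled.

```shell
# Print the effective RemoveIPC value found in a logind.conf-style file.
# Assumption: when the file or the key is absent, report the stock default "yes".
# (hypothetical helper; drop-in configuration files are not consulted)
logind_remove_ipc() (
    conf=${1:-/etc/systemd/logind.conf}
    value=""
    if [ -r "$conf" ]; then
        # Keep the last uncommented RemoveIPC= assignment.
        value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | tail -n 1)
    fi
    echo "${value:-yes}"
)

# Succeed when UID ($1) is at or below the system-user threshold ($2).
is_system_uid() {
    [ "$1" -le "$2" ]
}

# Possible use, with 999 as an assumed SYS_UID_MAX:
#   is_system_uid "$(id -u postgres)" 999 || logind_remove_ipc
```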
19.4.3. Resource Limits
Unix-like operating systems enforce various kinds of resource limits that might interfere with the operation of your PostgreSQL server. Of particular importance are limits on the number of processes per user, the number of open files per process, and the amount of memory available to each process. Each of these has a “hard” and a “soft” limit. The soft limit is what actually counts, but it can be changed by the user up to the hard limit. The hard limit can only be changed by the root user. The system call setrlimit is responsible for setting these parameters. The shell’s built-in command ulimit (Bourne shells) or limit (csh) is used to control the resource limits from the command line. On BSD-derived systems the file /etc/login.conf controls the various resource limits set during login. See the operating system documentation for details. The relevant parameters are maxproc, openfiles, and datasize. For example:
default:\
...
:datasize-cur=256M:\
:maxproc-cur=256:\
:openfiles-cur=256:\
...
(-cur is the soft limit. Append -max to set the hard limit.)
Kernels can also have system-wide limits on some resources.
The PostgreSQL server uses one process per connection so you should provide for at least as many processes as allowed connections, in addition to what you need for the rest of your system. This is usually not a problem but if you run several servers on one machine things might get tight.
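A quick comparison of a limit against the number of processes needed might look like the following sketch. The helper name limit_ok and the figures are hypothetical: 100 allowed connections, 50 processes reserved for the rest of the system, and a soft limit of 256 as in the login.conf example. In practice the first argument would come from the shell's ulimit built-in, whose option letter for the process limit differs between shells.

```shell
# Succeed when LIMIT ($1: a number, or "unlimited" as printed by ulimit)
# is at least NEEDED ($2). Hypothetical helper.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Assumed sizing: 100 allowed connections plus 50 other processes.
needed=$(( 100 + 50 ))
if limit_ok 256 "$needed"; then
    echo "limit of 256 is enough for $needed processes"
else
    echo "raise the process limit to at least $needed"
fi
```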
The factory default limit on open files is often set to “socially friendly” values that allow many users to coexist on a machine without using an inappropriate fraction of the system resources. If you run many servers on a machine this is perhaps what you want, but on dedicated servers you might want to raise this limit.
On the other side of the coin, some systems allow individual processes to open large numbers of files; if more than a few processes do so then the system-wide limit can easily be exceeded. If you find this happening, and you do not want to alter the system-wide limit, you can set PostgreSQL’s max_files_per_process configuration parameter to limit the consumption of open files.
Another kernel limit that may be of concern when supporting large numbers of client connections is the maximum socket connection queue length. If more than that many connection requests arrive within a very short period, some may get rejected before the PostgreSQL server can service the requests, with those clients receiving unhelpful connection failure errors such as “Resource temporarily unavailable” or “Connection refused”. The default queue length limit is 128 on many platforms. To raise it, adjust the appropriate kernel parameter via sysctl, then restart the PostgreSQL server. The parameter is variously named net.core.somaxconn on Linux, kern.ipc.soacceptqueue on newer FreeBSD, and kern.ipc.somaxconn on macOS and other BSD variants.
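Since the parameter name depends on the platform, a script that tunes it has to pick the right one. The sketch below maps the output of uname -s to the names given above; the function name is invented, and other BSD variants are deliberately left unmapped because their exact names should be confirmed against the platform's own sysctl list.

```shell
# Print the sysctl name of the listen queue limit for an OS name as
# reported by `uname -s`. Returns non-zero for platforms not mapped here.
# (hypothetical helper)
listen_queue_param() {
    case "$1" in
        Linux)   echo net.core.somaxconn ;;
        FreeBSD) echo kern.ipc.soacceptqueue ;;   # newer FreeBSD releases
        Darwin)  echo kern.ipc.somaxconn ;;       # macOS
        *)       return 1 ;;
    esac
}

listen_queue_param Linux    # prints net.core.somaxconn
# On the running system:  sysctl "$(listen_queue_param "$(uname -s)")"
```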
19.4.4. Linux Memory Overcommit
The default virtual memory behavior on Linux is not optimal for PostgreSQL. Because of the way that the kernel implements memory overcommit, the kernel might terminate the PostgreSQL postmaster (the supervisor server process) if the memory demands of either PostgreSQL or another process cause the system to run out of virtual memory.
If this happens, you will see a kernel message that looks like this (consult your system documentation and configuration on where to look for such a message):
Out of Memory: Killed process 12345 (postgres).
This indicates that the postgres process has been terminated due to memory pressure. Although existing database connections will continue to function normally, no new connections will be accepted. To recover, PostgreSQL will need to be restarted.
One way to avoid this problem is to run PostgreSQL on a machine where you can be sure that other processes will not run the machine out of memory. If memory is tight, increasing the swap space of the operating system can help avoid the problem, because the out-of-memory (OOM) killer is invoked only when physical memory and swap space are exhausted.
If PostgreSQL itself is the cause of the system running out of memory, you can avoid the problem by changing your configuration. In some cases, it may help to lower memory-related configuration parameters, particularly shared_buffers, work_mem, and hash_mem_multiplier. In other cases, the problem may be caused by allowing too many connections to the database server itself. In many cases, it may be better to reduce max_connections and instead make use of external connection-pooling software.
It is possible to modify the kernel’s behavior so that it will not “overcommit” memory. Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior. This is done by selecting strict overcommit mode via sysctl:
sysctl -w vm.overcommit_memory=2
or placing an equivalent entry in /etc/sysctl.conf. You might also wish to modify the related setting vm.overcommit_ratio. For details see the kernel documentation file https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.
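To judge whether vm.overcommit_ratio needs changing, it helps to estimate how much memory strict mode will allow to be committed. As described in the kernel documentation, the limit in mode 2 is swap plus a percentage of physical RAM, with the percentage given by vm.overcommit_ratio (kernel default 50). The sketch below is an approximation under that rule: the function name is made up, and memory reserved for huge pages, which the kernel excludes from the RAM term, is ignored.

```shell
# Approximate commit limit in kB under vm.overcommit_memory=2:
#   swap + RAM * overcommit_ratio / 100
# Arguments: RAM_KB SWAP_KB [RATIO]   (hypothetical helper; huge pages ignored)
commit_limit_kb() (
    ram_kb=$1
    swap_kb=$2
    ratio=${3:-50}    # kernel default for vm.overcommit_ratio
    echo $(( swap_kb + ram_kb * ratio / 100 ))
)

# Example: 16 GB of RAM and 4 GB of swap with the default ratio.
commit_limit_kb 16777216 4194304    # prints 12582912 (12 GB)
```

On a running system the kernel reports its own figure as CommitLimit in /proc/meminfo, which is the value to trust when the two differ.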
Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer. The simplest way to do this is to execute
echo -1000 > /proc/self/oom_score_adj
in the PostgreSQL startup script just before invoking postgres. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking postgres:
export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
export PG_OOM_ADJUST_VALUE=0
These settings will cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them at need. You could use some other value for PG_OOM_ADJUST_VALUE if you want the child processes to run with some other OOM score adjustment. (PG_OOM_ADJUST_VALUE can also be omitted, in which case it defaults to zero.) If you do not set PG_OOM_ADJUST_FILE, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.
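Put together, the relevant part of a root-owned startup script could be organized as in the sketch below. The function names are invented for illustration and the final step of starting postgres is left as a comment, since it depends on how the script already launches the server.

```shell
# Lower the OOM score adjustment of the current shell; the postmaster
# inherits it when started from this shell. Must run as root.
# (hypothetical helper)
protect_postmaster() {
    echo -1000 > /proc/self/oom_score_adj
}

# Ask the postmaster to reset its child processes to the normal
# adjustment of zero, so the OOM killer can still target them.
set_child_oom_env() {
    export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
    export PG_OOM_ADJUST_VALUE=0
}

# Intended order inside the startup script:
#   protect_postmaster || echo "warning: could not adjust OOM score" >&2
#   set_child_oom_env
#   ...then start postgres as the script normally does
```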
19.4.5. Linux Huge Pages
Using huge pages reduces overhead when using large contiguous chunks of memory, as PostgreSQL does, particularly when using large values of shared_buffers. To use this feature in PostgreSQL you need a kernel with CONFIG_HUGETLBFS=y and CONFIG_HUGETLB_PAGE=y. You will also have to configure the operating system to provide enough huge pages of the desired size. To determine the number of huge pages needed, use the postgres command to see the value of shared_memory_size_in_huge_pages. Note that the server must be shut down to view this runtime-computed parameter. This might look like:
$ postgres -D $PGDATA -C shared_memory_size_in_huge_pages
3170
$ grep ^Hugepagesize /proc/meminfo
Hugepagesize: 2048 kB
$ ls /sys/kernel/mm/hugepages
hugepages-1048576kB hugepages-2048kB
In this example the default is 2MB, but you can also explicitly request either 2MB or 1GB with huge_page_size to adapt the number of pages calculated by shared_memory_size_in_huge_pages. While we need at least 3170 huge pages in this example, a larger setting would be appropriate if other programs on the machine also need huge pages. We can set this with:
# sysctl -w vm.nr_hugepages=3170
Don’t forget to add this setting to /etc/sysctl.conf so that it is reapplied after reboots. For non-default huge page sizes, we can instead use:
# echo 3170 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
It is also possible to provide these settings at boot time using kernel parameters such as hugepagesz=2M hugepages=3170.
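When switching to a different huge page size, the page count changes accordingly. The following sketch redoes the arithmetic by hand for the example figures above (3170 pages of 2048 kB); the function name is hypothetical. Setting huge_page_size and querying shared_memory_size_in_huge_pages again remains the authoritative way to obtain the number.

```shell
# Huge pages needed to hold SIZE_KB ($1) of shared memory using pages of
# PAGE_KB ($2), rounded up. Hypothetical helper.
hugepages_needed() (
    echo $(( ($1 + $2 - 1) / $2 ))
)

# 3170 pages of 2048 kB, expressed in 1 GB (1048576 kB) pages.
hugepages_needed $(( 3170 * 2048 )) 1048576    # prints 7
```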
Sometimes the kernel is not able to allocate the desired number of huge pages immediately due to fragmentation, so it might be necessary to repeat the command or to reboot. (Immediately after a reboot, most of the machine’s memory should be available to convert into huge pages.) To verify the huge page allocation situation for a given size, use:
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
It may also be necessary to give the database server’s operating system user permission to use huge pages by setting vm.hugetlb_shm_group via sysctl, and/or give permission to lock memory with ulimit -l.
The default behavior for huge pages in PostgreSQL is to use them when possible, with the system’s default huge page size, and to fall back to normal pages on failure. To enforce the use of huge pages, you can set huge_pages to on in postgresql.conf. Note that with this setting PostgreSQL will fail to start if not enough huge pages are available.
For a detailed description of the Linux huge pages feature have a look at https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.