PostgreSQL Chinese Operation Guide
Chapter 48. Background Worker Processes
PostgreSQL can be extended to run user-supplied code in separate processes. Such processes are started, stopped and monitored by postgres, which permits them to have a lifetime closely linked to the server’s status. These processes are attached to PostgreSQL’s shared memory area and have the option to connect to databases internally; they can also run multiple transactions serially, just like a regular client-connected server process. Also, by linking to libpq they can connect to the server and behave like a regular client application.
Warning
There are considerable robustness and security risks in using background worker processes because, being written in the C language, they have unrestricted access to data. Administrators wishing to enable modules that include background worker processes should exercise extreme caution. Only carefully audited modules should be permitted to run background worker processes.
Background workers can be initialized at the time that PostgreSQL is started by including the module name in shared_preload_libraries. A module wishing to run a background worker can register it by calling RegisterBackgroundWorker(BackgroundWorker *worker) from its _PG_init() function. Background workers can also be started after the system is up and running by calling RegisterDynamicBackgroundWorker(BackgroundWorker *worker, BackgroundWorkerHandle **handle). Unlike RegisterBackgroundWorker, which can only be called from within the postmaster process, RegisterDynamicBackgroundWorker must be called from a regular backend or another background worker.
The structure BackgroundWorker is defined thus:
typedef void (*bgworker_main_type)(Datum main_arg);
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;       /* in seconds, or BGW_NEVER_RESTART */
    char        bgw_library_name[BGW_MAXLEN];
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    pid_t       bgw_notify_pid;
} BackgroundWorker;
bgw_name and bgw_type are strings to be used in log messages, process listings and similar contexts. bgw_type should be the same for all background workers of the same type, so that it is possible to group such workers in a process listing, for example. bgw_name on the other hand can contain additional information about the specific process. (Typically, the string for bgw_name will contain the type somehow, but that is not strictly required.)
bgw_flags is a bitwise-or’d bit mask indicating the capabilities that the module wants. Possible values are:
- BGWORKER_SHMEM_ACCESS: Requests shared memory access. This flag is required.
- BGWORKER_BACKEND_DATABASE_CONNECTION: Requests the ability to establish a database connection through which it can later run transactions and queries. A background worker using BGWORKER_BACKEND_DATABASE_CONNECTION to connect to a database must also attach shared memory using BGWORKER_SHMEM_ACCESS, or worker start-up will fail.
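The two rules above can be checked before registration. The following is a minimal sketch, not PostgreSQL code: the helper bgw_flags_ok is hypothetical, and the macro values are stand-ins defined locally so the example compiles on its own. A real module would take the macros from postmaster/bgworker.h.

```c
#include <stdbool.h>

/*
 * Stand-in values so this sketch compiles on its own. In a real module
 * these macros come from postmaster/bgworker.h.
 */
#define BGWORKER_SHMEM_ACCESS                 0x0001
#define BGWORKER_BACKEND_DATABASE_CONNECTION  0x0002

/*
 * Hypothetical helper: returns true if the flag combination satisfies the
 * two rules described in the text.
 */
bool
bgw_flags_ok(int flags)
{
    bool wants_shmem = (flags & BGWORKER_SHMEM_ACCESS) != 0;
    bool wants_db = (flags & BGWORKER_BACKEND_DATABASE_CONNECTION) != 0;

    /* A database connection without shared memory fails at start-up. */
    if (wants_db && !wants_shmem)
        return false;

    /* Shared memory access is required in every case. */
    return wants_shmem;
}
```

A worker that runs queries would typically pass BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION; a worker that never touches a database can pass BGWORKER_SHMEM_ACCESS alone.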
bgw_start_time is the server state during which postgres should start the process; it can be one of BgWorkerStart_PostmasterStart (start as soon as postgres itself has finished its own initialization; processes requesting this are not eligible for database connections), BgWorkerStart_ConsistentState (start as soon as a consistent state has been reached in a hot standby, allowing processes to connect to databases and run read-only queries), and BgWorkerStart_RecoveryFinished (start as soon as the system has entered normal read-write state). Note the last two values are equivalent in a server that’s not a hot standby. Note that this setting only indicates when the processes are to be started; they do not stop when a different state is reached.
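The relationship between the three start times can be modeled as a small predicate. This is only an illustration of the rules stated above, not how the postmaster is implemented: the ServerPhase enum and worker_may_start are hypothetical, and the BgWorkerStartTime enum is a local stand-in for the one declared in postmaster/bgworker.h.

```c
#include <stdbool.h>

/* Stand-in for the enum declared in postmaster/bgworker.h. */
typedef enum
{
    BgWorkerStart_PostmasterStart,
    BgWorkerStart_ConsistentState,
    BgWorkerStart_RecoveryFinished
} BgWorkerStartTime;

/* Hypothetical, simplified view of how far server start-up has progressed. */
typedef enum
{
    PHASE_INITIALIZED,          /* postgres finished its own initialization */
    PHASE_CONSISTENT,           /* hot standby reached a consistent state */
    PHASE_READ_WRITE            /* normal read-write operation */
} ServerPhase;

/* Returns true once a worker with the given start time may be launched. */
bool
worker_may_start(BgWorkerStartTime when, ServerPhase phase)
{
    switch (when)
    {
        case BgWorkerStart_PostmasterStart:
            return true;
        case BgWorkerStart_ConsistentState:
            return phase >= PHASE_CONSISTENT;
        case BgWorkerStart_RecoveryFinished:
            return phase == PHASE_READ_WRITE;
    }
    return false;
}
```

The predicate only answers when a worker becomes eligible to start. As the text notes, a worker that is already running is not stopped when the server moves to a different state.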
bgw_restart_time is the interval, in seconds, that postgres should wait before restarting the process in the event that it crashes. It can be any positive value, or BGW_NEVER_RESTART, indicating not to restart the process in case of a crash.
bgw_library_name is the name of a library in which the initial entry point for the background worker should be sought. The named library will be dynamically loaded by the worker process and bgw_function_name will be used to identify the function to be called. If calling a function in the core code, this must be set to "postgres".
bgw_function_name is the name of the function to use as the initial entry point for the new background worker. If this function is in a dynamically loaded library, it must be marked PGDLLEXPORT (and not static).
bgw_main_arg is the Datum argument to the background worker main function. This main function should take a single argument of type Datum and return void. bgw_main_arg will be passed as the argument. In addition, the global variable MyBgworkerEntry points to a copy of the BackgroundWorker structure passed at registration time; the worker may find it helpful to examine this structure.
On Windows (and anywhere else where EXEC_BACKEND is defined) or in dynamic background workers it is not safe to pass a Datum by reference, only by value. If an argument is required, it is safest to pass an int32 or other small value and use that as an index into an array allocated in shared memory. If a value like a cstring or text is passed then the pointer won’t be valid from the new background worker process.
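The index-passing approach recommended above might look like the following sketch. It is a standalone model under stated assumptions: Datum, Int32GetDatum and DatumGetInt32 are local stand-ins for the definitions in postgres.h, a plain global array stands in for an array allocated in shared memory, and TaskSlot, make_main_arg and task_from_main_arg are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins; real code gets these from postgres.h. */
typedef uintptr_t Datum;
#define Int32GetDatum(x)  ((Datum) (int32_t) (x))
#define DatumGetInt32(d)  ((int32_t) (d))

#define MAX_TASKS 4

/* Hypothetical per-worker parameters. */
typedef struct TaskSlot
{
    int32_t     table_id;
    int32_t     batch_size;
} TaskSlot;

/* A plain global array stands in for an array allocated in shared memory. */
TaskSlot    task_slots[MAX_TASKS];

/*
 * Registering side: store the parameters in a slot and return the slot
 * index as a by-value Datum suitable for bgw_main_arg. The caller must
 * pass a slot in the range 0 .. MAX_TASKS - 1.
 */
Datum
make_main_arg(int32_t slot, int32_t table_id, int32_t batch_size)
{
    task_slots[slot].table_id = table_id;
    task_slots[slot].batch_size = batch_size;
    return Int32GetDatum(slot);
}

/*
 * Worker side: turn the Datum back into an index and look up the slot.
 * Returns NULL if the index is out of range.
 */
const TaskSlot *
task_from_main_arg(Datum main_arg)
{
    int32_t     slot = DatumGetInt32(main_arg);

    if (slot < 0 || slot >= MAX_TASKS)
        return NULL;
    return &task_slots[slot];
}
```

Only the small integer crosses the process boundary, so nothing depends on a pointer being valid in the new process.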
bgw_extra can contain extra data to be passed to the background worker. Unlike bgw_main_arg, this data is not passed as an argument to the worker’s main function, but it can be accessed via MyBgworkerEntry, as discussed above.
bgw_notify_pid is the PID of a PostgreSQL backend process to which the postmaster should send SIGUSR1 when the process is started or exits. It should be 0 for workers registered at postmaster startup time, or when the backend registering the worker does not wish to wait for the worker to start up. Otherwise, it should be initialized to MyProcPid.
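Putting the field descriptions together, filling in a BackgroundWorker might look roughly like this. The sketch is standalone: the struct is a local copy of the definition shown earlier, the macro values and buffer sizes are assumptions made so it compiles without the PostgreSQL headers, and the names init_demo_worker, demo_worker and demo_worker_main are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>          /* pid_t */

/* Stand-ins; a real module includes postgres.h and postmaster/bgworker.h. */
#define BGW_MAXLEN          96  /* assumed size */
#define BGW_EXTRALEN        128 /* assumed size */
#define BGWORKER_SHMEM_ACCESS                 0x0001
#define BGWORKER_BACKEND_DATABASE_CONNECTION  0x0002
typedef uintptr_t Datum;
typedef enum
{
    BgWorkerStart_PostmasterStart,
    BgWorkerStart_ConsistentState,
    BgWorkerStart_RecoveryFinished
} BgWorkerStartTime;

typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;
    char        bgw_library_name[BGW_MAXLEN];
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    pid_t       bgw_notify_pid;
} BackgroundWorker;

/* Hypothetical helper that prepares the description of one worker. */
void
init_demo_worker(BackgroundWorker *worker, int index, const char *queue_name)
{
    memset(worker, 0, sizeof(*worker));

    /* Same type for every worker of this kind; the name adds the index. */
    snprintf(worker->bgw_type, BGW_MAXLEN, "demo worker");
    snprintf(worker->bgw_name, BGW_MAXLEN, "demo worker %d", index);

    worker->bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker->bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker->bgw_restart_time = 10;      /* seconds to wait after a crash */

    /* Library and exported entry point to call in the new process. */
    snprintf(worker->bgw_library_name, BGW_MAXLEN, "demo_worker");
    snprintf(worker->bgw_function_name, BGW_MAXLEN, "demo_worker_main");

    /* Small by-value argument, plus extra data read via MyBgworkerEntry. */
    worker->bgw_main_arg = (Datum) index;
    snprintf(worker->bgw_extra, BGW_EXTRALEN, "%s", queue_name);

    /* 0: nobody waits for start-up; use MyProcPid to be notified. */
    worker->bgw_notify_pid = 0;
}
```

In a real extension the filled structure would then be passed to RegisterBackgroundWorker from _PG_init(), or to RegisterDynamicBackgroundWorker from a regular backend.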
Once running, the process can connect to a database by calling BackgroundWorkerInitializeConnection(char *dbname, char *username, uint32 flags) or BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, uint32 flags). This allows the process to run transactions and queries using the SPI interface. If dbname is NULL or dboid is InvalidOid, the session is not connected to any particular database, but shared catalogs can be accessed. If username is NULL or useroid is InvalidOid, the process will run as the superuser created during initdb. If BGWORKER_BYPASS_ALLOWCONN is specified as flags it is possible to bypass the restriction to connect to databases not allowing user connections. A background worker can only call one of these two functions, and only once. It is not possible to switch databases.
Signals are initially blocked when control reaches the background worker’s main function, and must be unblocked by it; this is to allow the process to customize its signal handlers, if necessary. Signals can be unblocked in the new process by calling BackgroundWorkerUnblockSignals and blocked by calling BackgroundWorkerBlockSignals.
If bgw_restart_time for a background worker is configured as BGW_NEVER_RESTART, or if it exits with an exit code of 0 or is terminated by TerminateBackgroundWorker, it will be automatically unregistered by the postmaster on exit. Otherwise, it will be restarted after the time period configured via bgw_restart_time, or immediately if the postmaster reinitializes the cluster due to a backend failure. Backends which need to suspend execution only temporarily should use an interruptible sleep rather than exiting; this can be achieved by calling WaitLatch(). Make sure the WL_POSTMASTER_DEATH flag is set when calling that function, and verify the return code for a prompt exit in the emergency case that postgres itself has terminated.
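The exit handling described above reduces to a short decision procedure. The following is a model of those rules rather than postmaster source code: ExitAction, action_on_exit and restart_delay are hypothetical names, and BGW_NEVER_RESTART is defined locally as a stand-in for the macro in postmaster/bgworker.h.

```c
#include <stdbool.h>

/* Stand-in; the real macro comes from postmaster/bgworker.h. */
#define BGW_NEVER_RESTART   (-1)

/* Hypothetical result type for this sketch. */
typedef enum
{
    WORKER_UNREGISTER,          /* forget the worker */
    WORKER_RESTART              /* start it again later */
} ExitAction;

/* Decide what happens to a worker that has just exited. */
ExitAction
action_on_exit(int bgw_restart_time, int exit_code, bool terminated)
{
    if (bgw_restart_time == BGW_NEVER_RESTART)
        return WORKER_UNREGISTER;
    if (exit_code == 0)
        return WORKER_UNREGISTER;
    if (terminated)             /* via TerminateBackgroundWorker */
        return WORKER_UNREGISTER;
    return WORKER_RESTART;
}

/*
 * Seconds to wait before a restart: none if the postmaster reinitialized
 * the cluster after a backend failure, bgw_restart_time otherwise.
 */
int
restart_delay(int bgw_restart_time, bool cluster_reinitialized)
{
    return cluster_reinitialized ? 0 : bgw_restart_time;
}
```

One consequence worth noting: a worker that wants to be restarted must exit with a nonzero code, because an exit code of 0 unregisters it.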
When a background worker is registered using the RegisterDynamicBackgroundWorker function, it is possible for the backend performing the registration to obtain information regarding the status of the worker. Backends wishing to do this should pass the address of a BackgroundWorkerHandle * as the second argument to RegisterDynamicBackgroundWorker. If the worker is successfully registered, this pointer will be initialized with an opaque handle that can subsequently be passed to GetBackgroundWorkerPid(BackgroundWorkerHandle *, pid_t *) or TerminateBackgroundWorker(BackgroundWorkerHandle *). GetBackgroundWorkerPid can be used to poll the status of the worker: a return value of BGWH_NOT_YET_STARTED indicates that the worker has not yet been started by the postmaster; BGWH_STOPPED indicates that it has been started but is no longer running; and BGWH_STARTED indicates that it is currently running. In this last case, the PID will also be returned via the second argument. TerminateBackgroundWorker causes the postmaster to send SIGTERM to the worker if it is running, and to unregister it as soon as it is not.
In some cases, a process which registers a background worker may wish to wait for the worker to start up. This can be accomplished by initializing bgw_notify_pid to MyProcPid and then passing the BackgroundWorkerHandle * obtained at registration time to the WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *) function. This function will block until the postmaster has attempted to start the background worker, or until the postmaster dies. If the background worker is running, the return value will be BGWH_STARTED, and the PID will be written to the provided address. Otherwise, the return value will be BGWH_STOPPED or BGWH_POSTMASTER_DIED.
A process can also wait for a background worker to shut down, by using the WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle) function and passing the BackgroundWorkerHandle * obtained at registration. This function will block until the background worker exits or the postmaster dies. When the background worker exits, the return value is BGWH_STOPPED; if the postmaster dies, the return value is BGWH_POSTMASTER_DIED.
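Code that calls the status and wait functions usually has to branch on the returned status. The following sketch shows one way to do that, using a local stand-in for the BgwHandleStatus enum from postmaster/bgworker.h so that it compiles on its own. The helper describe_status is hypothetical and only formats a message; real code would act on each case instead.

```c
#include <stdio.h>
#include <sys/types.h>          /* pid_t */

/* Stand-in for the enum declared in postmaster/bgworker.h. */
typedef enum
{
    BGWH_STARTED,
    BGWH_NOT_YET_STARTED,
    BGWH_STOPPED,
    BGWH_POSTMASTER_DIED
} BgwHandleStatus;

/*
 * Hypothetical helper: write a description of the status into buf and
 * return buf. The pid is meaningful only for BGWH_STARTED.
 */
const char *
describe_status(BgwHandleStatus status, pid_t pid, char *buf, size_t buflen)
{
    if (buflen > 0)
        buf[0] = '\0';

    switch (status)
    {
        case BGWH_STARTED:
            snprintf(buf, buflen, "running with PID %ld", (long) pid);
            break;
        case BGWH_NOT_YET_STARTED:
            snprintf(buf, buflen, "not yet started by the postmaster");
            break;
        case BGWH_STOPPED:
            snprintf(buf, buflen, "started earlier but no longer running");
            break;
        case BGWH_POSTMASTER_DIED:
            snprintf(buf, buflen, "postmaster died while waiting");
            break;
    }
    return buf;
}
```

Per the text, GetBackgroundWorkerPid reports the first three values, while the two wait functions can additionally report BGWH_POSTMASTER_DIED.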
Background workers can send asynchronous notification messages, either by using the NOTIFY command via SPI, or directly via Async_Notify(). Such notifications will be sent at transaction commit. Background workers should not register to receive asynchronous notifications with the LISTEN command, as there is no infrastructure for a worker to consume such notifications.
The src/test/modules/worker_spi module contains a working example, which demonstrates some useful techniques.
The maximum number of registered background workers is limited by max_worker_processes.