Postgresql 中文操作指南

15.4. Parallel Safety #

计划器将查询中涉及的操作分类为 parallel safeparallel restrictedparallel unsafe。并行安全操作不会与并行查询的使用产生冲突。并行受限操作不能在并行工作器中执行,但可以在并行查询使用期间在领导者中执行。因此,并行受限操作永远不能出现在 GatherGather Merge 节点下面,但可以在包含此类节点的计划中的其他位置出现。并行不安全操作不能在并行查询使用期间执行,甚至不能在领导者中执行。当查询包含任何与并行不安全相关的内容时,将为该查询完全禁用并行查询。

The planner classifies operations involved in a query as either parallel safe, parallel restricted, or parallel unsafe. A parallel safe operation is one that does not conflict with the use of parallel query. A parallel restricted operation is one that cannot be performed in a parallel worker, but that can be performed in the leader while parallel query is in use. Therefore, parallel restricted operations can never occur below a Gather or Gather Merge node, but can occur elsewhere in a plan that contains such a node. A parallel unsafe operation is one that cannot be performed while parallel query is in use, not even in the leader. When a query contains anything that is parallel unsafe, parallel query is completely disabled for that query.

以下操作始终是并行受限的:

The following operations are always parallel restricted:

15.4.1. Parallel Labeling for Functions and Aggregates #

规划器无法自动确定用户自定义函数或聚合是并行安全的、并行受限的还是并行不安全的,因为这需要预测函数可能执行的每一项操作。总的来说,这等同于停机问题,因此是不可能的。即使对于可以想象可以执行的简单函数,我们也不会尝试,因为这将非常昂贵且容易出错。相反,除非另有标记,否则所有用户自定义函数都假定为并行不安全。当使用 CREATE FUNCTIONALTER FUNCTION 时,可以通过适当地指定 PARALLEL SAFEPARALLEL RESTRICTEDPARALLEL UNSAFE 来设置标记。当使用 CREATE AGGREGATE 时,可以将 PARALLEL 选项指定为 SAFERESTRICTEDUNSAFE 作为对应的值。

The planner cannot automatically determine whether a user-defined function or aggregate is parallel safe, parallel restricted, or parallel unsafe, because this would require predicting every operation that the function could possibly perform. In general, this is equivalent to the Halting Problem and therefore impossible. Even for simple functions where it could conceivably be done, we do not try, since this would be expensive and error-prone. Instead, all user-defined functions are assumed to be parallel unsafe unless otherwise marked. When using CREATE FUNCTION or ALTER FUNCTION, markings can be set by specifying PARALLEL SAFE, PARALLEL RESTRICTED, or PARALLEL UNSAFE as appropriate. When using CREATE AGGREGATE, the PARALLEL option can be specified with SAFE, RESTRICTED, or UNSAFE as the corresponding value.

如果函数和聚合写入数据库、访问序列、更改事务状态,即使是暂时更改(例如,建立 EXCEPTION 块以捕获错误的 PL/pgSQL 函数),或者对设置进行持久更改,则必须将它们标记为 PARALLEL UNSAFE。类似地,如果函数访问临时表、客户端连接状态、游标、预处理语句或系统无法跨工作器同步的其他后端本地状态,则必须将它们标记为 PARALLEL RESTRICTED。例如,出于此最后原因,setseedrandom 是并行受限的。

Functions and aggregates must be marked PARALLEL UNSAFE if they write to the database, access sequences, change the transaction state even temporarily (e.g., a PL/pgSQL function that establishes an EXCEPTION block to catch errors), or make persistent changes to settings. Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state that the system cannot synchronize across workers. For example, setseed and random are parallel restricted for this last reason.

一般而言,如果一个函数被标注为受限或不安全时被标记为安全,或者如果一个函数被标注为受限,而事实上它是 不安全的,那么当在并行查询中使用它时,它可能会引发错误或生成错误答案。如果标注错误,用 C 语言编写的函数理论上可能表现出完全未定义的行为,因为系统无法保护自身免遭任意的 C 代码,但在大多数情况下,结果不会比任何其他函数差。如有疑问,最好将函数标记为 UNSAFE

In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, it is probably best to label functions as UNSAFE.

如果在并行工作进程中执行的函数获取了领导者不持有的锁,例如通过查询未在查询中引用的表,这些锁将在工作进程退出时释放,而不会在事务结束时释放。如果您编写了执行此操作的函数,并且这种行为差异对您很重要,请将此类函数标记为 PARALLEL RESTRICTED 以确保它们仅在领导者中执行。

If a function executed within a parallel worker acquires locks that are not held by the leader, for example by querying a table not referenced in the query, those locks will be released at worker exit, not end of transaction. If you write a function that does this, and this behavior difference is important to you, mark such functions as PARALLEL RESTRICTED to ensure that they execute only in the leader.

请注意,查询计划器不会考虑延迟查询中涉及的并行受限函数或聚合的评估,以获取更好的计划。因此,例如,如果应用于特定表的 WHERE 语句受并行限制,则查询计划器将不会考虑在计划的并行部分中扫描该表。在某些情况下,可以在查询的并行部分中包含该表的扫描并且延迟评估 WHERE 语句,以便它发生在 Gather 节点之上,这是可能的(甚至可能是有效的)。但是,计划器不会执行此操作。

Note that the query planner does not consider deferring the evaluation of parallel-restricted functions or aggregates involved in the query in order to obtain a superior plan. So, for example, if a WHERE clause applied to a particular table is parallel restricted, the query planner will not consider performing a scan of that table in the parallel portion of a plan. In some cases, it would be possible (and perhaps even efficient) to include the scan of that table in the parallel portion of the query and defer the evaluation of the WHERE clause so that it happens above the Gather node. However, the planner does not do this.