Postgresql 中文操作指南
15.1. How Parallel Query Works #
当优化器确定并行查询是特定查询最快的执行策略时,它将创建一个包含 Gather 或 Gather Merge 节点的查询计划。下面是一个简单的示例:
When the optimizer determines that parallel query is the fastest execution strategy for a particular query, it will create a query plan that includes a Gather or Gather Merge node. Here is a simple example:
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
QUERY PLAN
-------------------------------------------------------------------------------------
Gather (cost=1000.00..217018.43 rows=1 width=97)
Workers Planned: 2
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97)
Filter: (filler ~~ '%x%'::text)
(4 rows)
在所有情况下,Gather 或 Gather Merge 节点将只有一个子计划,这是将在并行中执行的计划部分。如果 Gather 或 Gather Merge 节点位于计划树的最顶部,则整个查询将并行执行。如果它位于计划树中的其他位置,则它下面的计划部分将并行执行。在上面的示例中,该查询仅访问一个表,因此除了 Gather 节点本身之外,只有一个计划节点;由于该计划节点是 Gather 节点的子节点,因此它将并行执行。
In all cases, the Gather or Gather Merge node will have exactly one child plan, which is the portion of the plan that will be executed in parallel. If the Gather or Gather Merge node is at the very top of the plan tree, then the entire query will execute in parallel. If it is somewhere else in the plan tree, then only the portion of the plan below it will run in parallel. In the example above, the query accesses only one table, so there is only one plan node other than the Gather node itself; since that plan node is a child of the Gather node, it will run in parallel.
Using EXPLAIN,您可以看到规划器选择的工人的数量。当在查询执行过程中到达 Gather 节点时,实现用户会话的进程将请求与规划器选择的工人的数量相等数量的 background worker processes。规划器将考虑使用的后台工人的数量最多限制为 max_parallel_workers_per_gather。在任何时间点可能存在的后台工人的总数同时受到 max_worker_processes 和 max_parallel_workers 的限制。因此,可以以少量于计划数量的工人,甚至没有工人来运行并行查询。最佳计划可能取决于可用的工人数量,因此这会导致查询性能不佳。如果这种情况经常发生,请考虑增加 max_worker_processes 和 max_parallel_workers,以便可以同时运行更多工人,或者减少 max_parallel_workers_per_gather,以便规划器请求更少的工人。
Using EXPLAIN, you can see the number of workers chosen by the planner. When the Gather node is reached during query execution, the process that is implementing the user’s session will request a number of background worker processes equal to the number of workers chosen by the planner. The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this occurrence is frequent, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.
为给定的并行查询成功启动的每个后台 worker 进程都将执行计划的并行部分。领导者也将执行计划的该部分,但它还有额外的职责:它还必须读取 worker 生成的所有元组。当计划的并行部分只生成少量的元组时,领导者通常表现得非常像一个额外的 worker,加速了查询执行。相反,当计划的并行部分生成大量的元组时,领导者可能几乎完全忙于读取 worker 生成的元组,并执行 Gather 节点或 Gather Merge 节点上方计划节点所需的任何进一步处理步骤。在这种情况下,领导者将在执行计划的并行部分方面做得很少。
Every background worker process that is successfully started for a given parallel query will execute the parallel portion of the plan. The leader will also execute that portion of the plan, but it has an additional responsibility: it must also read all of the tuples generated by the workers. When the parallel portion of the plan generates only a small number of tuples, the leader will often behave very much like an additional worker, speeding up query execution. Conversely, when the parallel portion of the plan generates a large number of tuples, the leader may be almost entirely occupied with reading the tuples generated by the workers and performing any further processing steps that are required by plan nodes above the level of the Gather node or Gather Merge node. In such cases, the leader will do very little of the work of executing the parallel portion of the plan.
当计划并行部分顶部的节点是 Gather Merge 而不是 Gather 时,它表明执行计划并行部分的每个进程正在以排序顺序生成元组,并且领导者正在执行保序合并。相比之下,Gather 从 worker 按照任何方便的方式读取元组,破坏可能存在的任何排序顺序。
When the node at the top of the parallel portion of the plan is Gather Merge rather than Gather, it indicates that each process executing the parallel portion of the plan is producing tuples in sorted order, and that the leader is performing an order-preserving merge. In contrast, Gather reads tuples from the workers in whatever order is convenient, destroying any sort order that may have existed.