Postgresql 中文操作指南
52.5. Planner/Optimizer #
planner/optimizer 的任务是创建最佳执行计划。一个给定的 SQL 查询(因此,一个查询树)实际上可以用多种不同的方式执行,每种方式都会产生相同的结果集。如果计算可行,查询优化器将检查每个可能的执行计划,最终选择预计运行最快的执行计划。
The task of the planner/optimizer is to create an optimal execution plan. A given SQL query (and hence, a query tree) can be actually executed in a wide variety of different ways, each of which will produce the same set of results. If it is computationally feasible, the query optimizer will examine each of these possible execution plans, ultimately selecting the execution plan that is expected to run the fastest.
Note
在某些情况下,查看查询可以执行的每种可能方式将占用大量时间和内存。尤其是在执行涉及大量联接操作的查询时,会出现这种情况。为了在合理的时间内确定一个合理的(不一定最优的)查询计划,当联接数超过一个阈值(参见 geqo_threshold)时,PostgreSQL 使用 Genetic Query Optimizer(参见 Chapter 62)。
In some situations, examining each possible way in which a query can be executed would take an excessive amount of time and memory. In particular, this occurs when executing queries involving large numbers of join operations. In order to determine a reasonable (not necessarily optimal) query plan in a reasonable amount of time, PostgreSQL uses a Genetic Query Optimizer (see Chapter 62) when the number of joins exceeds a threshold (see geqo_threshold).
计划程序的搜索过程实际上使用称为 paths 的数据结构,这些结构是仅包含计划程序进行决策所需信息的精简计划表示形式。在确定最便宜的路径之后,构造一个成熟的 plan tree 传递给执行器。这表示对执行器运行所需的足够详细的预期执行计划。在本节的其余部分中,我们将忽略路径和计划之间的区别。
The planner’s search procedure actually works with data structures called paths, which are simply cut-down representations of plans containing only as much information as the planner needs to make its decisions. After the cheapest path is determined, a full-fledged plan tree is built to pass to the executor. This represents the desired execution plan in sufficient detail for the executor to run it. In the rest of this section we’ll ignore the distinction between paths and plans.
52.5.1. Generating Possible Plans #
计划程序/优化器首先通过生成用于扫描查询中使用的每个单独关系(表)的计划来开始。可能的计划由每个关系上的可用索引确定。始终有可能在关系上执行顺序扫描,因此始终创建一个顺序扫描计划。假设在关系上定义了一个索引(例如 B 树索引)并且查询包含限制 relation.attribute OPR constant。如果 relation.attribute 碰巧与 B 树索引的键匹配并且 OPR 是索引 operator class 中列出的运算符之一,将使用 B 树索引创建一个其他计划来扫描关系。如果存在其他索引并且查询中的限制碰巧与某个索引的键匹配,则将考虑制定其他计划。索引扫描计划也针对具有可以与查询的 ORDER BY 子句(如果有的话)相匹配的排序顺序或可能对合并连接有用的排序顺序的索引进行生成(见下文)。
The planner/optimizer starts by generating plans for scanning each individual relation (table) used in the query. The possible plans are determined by the available indexes on each relation. There is always the possibility of performing a sequential scan on a relation, so a sequential scan plan is always created. Assume an index is defined on a relation (for example a B-tree index) and a query contains the restriction relation.attribute OPR constant. If relation.attribute happens to match the key of the B-tree index and OPR is one of the operators listed in the index’s operator class, another plan is created using the B-tree index to scan the relation. If there are further indexes present and the restrictions in the query happen to match a key of an index, further plans will be considered. Index scan plans are also generated for indexes that have a sort ordering that can match the query’s ORDER BY clause (if any), or a sort ordering that might be useful for merge joining (see below).
如果查询需要连接两个或更多个关系,则在为扫描单个关系找到所有可行的计划后考虑连接关系的计划。三种可用的连接策略是:
If the query requires joining two or more relations, plans for joining relations are considered after all feasible plans have been found for scanning single relations. The three available join strategies are:
当查询涉及两个以上关系时,最终结果必须通过连接步骤的树来构建,每个步骤有两个输入。计划程序检查不同的可能的连接序列以找到最便宜的序列。
When the query involves more than two relations, the final result must be built up by a tree of join steps, each with two inputs. The planner examines different possible join sequences to find the cheapest one.
如果查询使用少于 geqo_threshold 个关系,将进行近乎穷举的搜索,以找出最佳联接顺序。计划器优先考虑存在相应联接子句的任何两个关系之间的联接(即,存在类似 where rel1.attr1=rel2.attr2 的限制)。只有在没有其他选择时才会考虑没有联接子句的联接对,也就是说,特定关系没有与任何其他关系的可用联接子句。对于计划器考虑的每个联接对,将生成所有可能的计划,并选择(估计为)最便宜的计划。
If the query uses fewer than geqo_threshold relations, a near-exhaustive search is conducted to find the best join sequence. The planner preferentially considers joins between any two relations for which there exists a corresponding join clause in the WHERE qualification (i.e., for which a restriction like where rel1.attr1=rel2.attr2 exists). Join pairs with no join clause are considered only when there is no other choice, that is, a particular relation has no available join clauses to any other relation. All possible plans are generated for every join pair considered by the planner, and the one that is (estimated to be) the cheapest is chosen.
当超过 geqo_threshold 时,将通过启发法确定所考虑的联接顺序,如 Chapter 62 中所述。否则,这个过程是相同的。
When geqo_threshold is exceeded, the join sequences considered are determined by heuristics, as described in Chapter 62. Otherwise the process is the same.
完成的计划树由基础关系的顺序扫描或索引扫描组成,以及根据需要嵌套循环、合并或哈希连接节点,以及任何所需的辅助步骤,例如排序节点或聚合函数计算节点。这些计划节点类型中的大多数具有执行 selection(丢弃不满足指定布尔条件的行)和 projection(基于给定列值计算派生列集,即在需要时计算标量表达式)的附加能力。计划程序的职责之一是将 WHERE 子句中的选择条件和对所需输出表达式的计算附加到计划树的最合适节点。
The finished plan tree consists of sequential or index scans of the base relations, plus nested-loop, merge, or hash join nodes as needed, plus any auxiliary steps needed, such as sort nodes or aggregate-function calculation nodes. Most of these plan node types have the additional ability to do selection (discarding rows that do not meet a specified Boolean condition) and projection (computation of a derived column set based on given column values, that is, evaluation of scalar expressions where needed). One of the responsibilities of the planner is to attach selection conditions from the WHERE clause and computation of required output expressions to the most appropriate nodes of the plan tree.