PostgreSQL Chinese Operation Guide
Synopsis
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    SETTINGS [ boolean ]
    GENERIC_PLAN [ boolean ]
    BUFFERS [ boolean ]
    WAL [ boolean ]
    TIMING [ boolean ]
    SUMMARY [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }
Description
This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned (by plain sequential scan, index scan, etc.) and, if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
The most critical part of the display is the estimated statement execution cost, which is the planner’s guess at how long it will take to run the statement (measured in cost units that are arbitrary, but conventionally mean disk page fetches). Two numbers are actually shown: the start-up cost before the first row can be returned, and the total cost to return all the rows. For most queries the total cost is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up cost instead of the smallest total cost (since the executor will stop after getting one row anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.
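As a rough illustration of the LIMIT interpolation, the statement below assumes the 10000-row table foo used in the Examples section of this page. The Limit node in the resulting plan would be expected to carry a total cost that is only a small fraction of the underlying scan's total cost, roughly in proportion to the fraction of rows requested. The exact figures depend on the data and the PostgreSQL version.

```sql
-- Sketch only: assumes a table foo with about 10000 rows, as in the Examples section.
-- The planner interpolates the Limit node's cost between the child scan's
-- start-up cost and total cost, based on the fraction of rows requested.
EXPLAIN SELECT * FROM foo LIMIT 10;
```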
The ANALYZE option causes the statement to be actually executed, not only planned. Actual run-time statistics are then added to the display, including the total elapsed time expended within each plan node (in milliseconds) and the total number of rows it actually returned. This is useful for seeing whether the planner’s estimates are close to reality.
Important
Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, MERGE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach:
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
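For instance, a data-modifying statement can be profiled without keeping its changes. The following is a minimal sketch; the table foo and its integer column i are borrowed from the examples on this page and are assumptions about your schema.

```sql
BEGIN;
-- The UPDATE really runs here, so triggers fire and row locks are held
-- until the transaction ends.
EXPLAIN ANALYZE UPDATE foo SET i = i + 1 WHERE i < 10;
-- Discard the changes made by the UPDATE.
ROLLBACK;
```

Effects that are not transactional, such as sequence values consumed by the statement, are not undone by the rollback.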
Only the ANALYZE and VERBOSE options can be specified, and only in that order, without surrounding the option list in parentheses. Prior to PostgreSQL 9.0, the unparenthesized syntax was the only one supported. It is expected that all new options will be supported only in the parenthesized syntax.
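The two syntaxes can be compared side by side. This is a sketch that assumes the table foo from the examples on this page. Note that each of these statements executes the query, because ANALYZE is given.

```sql
-- Unparenthesized form: only ANALYZE and VERBOSE, and only in this order.
EXPLAIN ANALYZE VERBOSE SELECT * FROM foo WHERE i = 4;

-- Equivalent parenthesized form; options may be listed in any order.
EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM foo WHERE i = 4;

-- Other options, such as BUFFERS, are accepted only inside the parentheses.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE i = 4;
```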
Parameters
ANALYZE
Carry out the command and show actual run times and other statistics. This parameter defaults to FALSE.
VERBOSE
Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. The query identifier will also be displayed if one has been computed; see compute_query_id for more details. This parameter defaults to FALSE.
COSTS
Include information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. This parameter defaults to TRUE.
SETTINGS
Include information on configuration parameters. Specifically, include options affecting query planning whose value differs from the built-in default. This parameter defaults to FALSE.
GENERIC_PLAN
Allow the statement to contain parameter placeholders like $1, and generate a generic plan that does not depend on the values of those parameters. See PREPARE for details about generic plans and the types of statement that support parameters. This parameter cannot be used together with ANALYZE. It defaults to FALSE.
BUFFERS
Include information on buffer usage. Specifically, include the number of shared blocks hit, read, dirtied, and written, the number of local blocks hit, read, dirtied, and written, the number of temp blocks read and written, and the time spent reading and writing data file blocks and temporary file blocks (in milliseconds) if track_io_timing is enabled. A hit means that a read was avoided because the block was found already in cache when needed. Shared blocks contain data from regular tables and indexes; local blocks contain data from temporary tables and indexes; while temporary blocks contain short-term working data used in sorts, hashes, Materialize plan nodes, and similar cases. The number of blocks dirtied indicates the number of previously unmodified blocks that were changed by this query; while the number of blocks written indicates the number of previously-dirtied blocks evicted from cache by this backend during query processing. The number of blocks shown for an upper-level node includes those used by all its child nodes. In text format, only non-zero values are printed. This parameter defaults to FALSE.
WAL
Include information on WAL record generation. Specifically, include the number of records, number of full page images (fpi) and the amount of WAL generated in bytes. In text format, only non-zero values are printed. This parameter may only be used when ANALYZE is also enabled. It defaults to FALSE.
TIMING
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when ANALYZE is also enabled. It defaults to TRUE.
SUMMARY
Include summary information (e.g., totaled timing information) after the query plan. Summary information is included by default when ANALYZE is used; otherwise it is omitted unless enabled with this option. Planning time in EXPLAIN EXECUTE includes the time required to fetch the plan from the cache and the time required for re-planning, if necessary.
FORMAT
Specify the output format, which can be TEXT, XML, JSON, or YAML. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults to TEXT.
boolean
Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.
statement
Any SELECT, INSERT, UPDATE, DELETE, MERGE, VALUES, EXECUTE, DECLARE, CREATE TABLE AS, or CREATE MATERIALIZED VIEW AS statement, whose execution plan you wish to see.
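A few of the options above can be combined as follows. This is a sketch under the assumption that the table foo from the examples on this page exists, has a single integer column i, and has no unique constraint on it.

```sql
-- Buffer usage alongside actual run-time statistics.
EXPLAIN (ANALYZE, BUFFERS) SELECT sum(i) FROM foo WHERE i < 10;

-- WAL may only be used together with ANALYZE. The transaction is rolled back
-- so that the inserted rows are not kept.
BEGIN;
EXPLAIN (ANALYZE, WAL) INSERT INTO foo SELECT generate_series(1, 100);
ROLLBACK;

-- Show planner-related settings that differ from their built-in defaults,
-- and request the summary without executing the statement.
EXPLAIN (SETTINGS, SUMMARY) SELECT * FROM foo WHERE i = 4;

-- Three equivalent ways to turn an option off.
EXPLAIN (COSTS FALSE) SELECT * FROM foo;
EXPLAIN (COSTS OFF) SELECT * FROM foo;
EXPLAIN (COSTS 0) SELECT * FROM foo;
```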
Outputs
The command’s result is a textual description of the plan selected for the statement, optionally annotated with execution statistics. Section 14.1 describes the information provided.
Notes
In order to allow the PostgreSQL query planner to make reasonably informed decisions when optimizing queries, the pg_statistic data should be up-to-date for all tables used in the query. Normally the autovacuum daemon will take care of that automatically. But if a table has recently had substantial changes in its contents, you might need to do a manual ANALYZE rather than wait for autovacuum to catch up with the changes.
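One way to check whether statistics are current and to refresh them by hand is sketched below. The table name foo is an assumption taken from the examples on this page; pg_stat_user_tables is the standard statistics view.

```sql
-- See when statistics were last gathered for foo, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'foo';

-- Refresh the statistics by hand, then look at the plan again.
ANALYZE foo;
EXPLAIN SELECT * FROM foo WHERE i = 4;
```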
In order to measure the run-time cost of each node in the execution plan, the current implementation of EXPLAIN ANALYZE adds profiling overhead to query execution. As a result, running EXPLAIN ANALYZE on a query can sometimes take significantly longer than executing the query normally. The amount of overhead depends on the nature of the query, as well as the platform being used. The worst case occurs for plan nodes that in themselves require very little time per execution, and on machines that have relatively slow operating system calls for obtaining the time of day.
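When only actual row counts are needed, the per-node clock readings can be skipped with the TIMING option described under Parameters. The statement below is a sketch that assumes the table foo from the examples on this page.

```sql
-- Keep actual row and loop counts, but skip per-node timing.
-- The run time of the entire statement is still measured.
EXPLAIN (ANALYZE, TIMING OFF) SELECT sum(i) FROM foo WHERE i < 10;
```

The pg_test_timing utility shipped with PostgreSQL can be used to gauge how expensive clock readings are on a given machine.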
Examples
To show the plan for a simple query on a table with a single integer column and 10000 rows:
EXPLAIN SELECT * FROM foo;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4)
(1 row)
Here is the same query, with JSON output formatting:
EXPLAIN (FORMAT JSON) SELECT * FROM foo;
           QUERY PLAN
--------------------------------
 [                             +
   {                           +
     "Plan": {                 +
       "Node Type": "Seq Scan",+
       "Relation Name": "foo", +
       "Alias": "foo",         +
       "Startup Cost": 0.00,   +
       "Total Cost": 155.00,   +
       "Plan Rows": 10000,     +
       "Plan Width": 4         +
     }                         +
   }                           +
 ]
(1 row)
If there is an index and the query has an indexable WHERE condition, EXPLAIN might show a different plan:
EXPLAIN SELECT * FROM foo WHERE i = 4;
                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using fi on foo  (cost=0.00..5.98 rows=1 width=4)
   Index Cond: (i = 4)
(2 rows)
Here is the same query, but in YAML format:
EXPLAIN (FORMAT YAML) SELECT * FROM foo WHERE i='4';
          QUERY PLAN
-------------------------------
 - Plan:                      +
     Node Type: "Index Scan"  +
     Scan Direction: "Forward"+
     Index Name: "fi"         +
     Relation Name: "foo"     +
     Alias: "foo"             +
     Startup Cost: 0.00       +
     Total Cost: 5.98         +
     Plan Rows: 1             +
     Plan Width: 4            +
     Index Cond: "(i = 4)"
(1 row)
XML format is left as an exercise for the reader.
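For readers who want to try it, the statement would look like the following sketch, which assumes the same table foo and index as the preceding examples. The output is not reproduced here.

```sql
-- Same query as above, with the plan rendered as an XML document.
EXPLAIN (FORMAT XML) SELECT * FROM foo WHERE i = 4;
```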
Here is the same plan with cost estimates suppressed:
EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4;
         QUERY PLAN
----------------------------
 Index Scan using fi on foo
   Index Cond: (i = 4)
(2 rows)
Here is an example of a query plan for a query using an aggregate function:
EXPLAIN SELECT sum(i) FROM foo WHERE i < 10;
                             QUERY PLAN
---------------------------------------------------------------------
 Aggregate  (cost=23.93..23.93 rows=1 width=4)
   ->  Index Scan using fi on foo  (cost=0.00..23.92 rows=6 width=4)
         Index Cond: (i < 10)
(3 rows)
Here is an example of using EXPLAIN EXECUTE to display the execution plan for a prepared query:
PREPARE query(int, int) AS SELECT sum(bar) FROM test
    WHERE id > $1 AND id < $2
    GROUP BY foo;
EXPLAIN ANALYZE EXECUTE query(100, 200);
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.77..10.87 rows=10 width=12) (actual time=0.043..0.044 rows=10 loops=1)
   Group Key: foo
   Batches: 1  Memory Usage: 24kB
   ->  Index Scan using test_pkey on test  (cost=0.29..10.27 rows=99 width=8) (actual time=0.009..0.025 rows=99 loops=1)
         Index Cond: ((id > 100) AND (id < 200))
 Planning Time: 0.244 ms
 Execution Time: 0.073 ms
(7 rows)
Of course, the specific numbers shown here depend on the actual contents of the tables involved. Also note that the numbers, and even the selected query strategy, might vary between PostgreSQL releases due to planner improvements. In addition, the ANALYZE command uses random sampling to estimate data statistics; therefore, it is possible for cost estimates to change after a fresh run of ANALYZE, even if the actual distribution of data in the table has not changed.
Notice that the previous example showed a “custom” plan for the specific parameter values given in EXECUTE. We might also wish to see the generic plan for a parameterized query, which can be done with GENERIC_PLAN:
EXPLAIN (GENERIC_PLAN)
  SELECT sum(bar) FROM test
    WHERE id > $1 AND id < $2
    GROUP BY foo;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=26.79..26.89 rows=10 width=12)
   Group Key: foo
   ->  Index Scan using test_pkey on test  (cost=0.29..24.29 rows=500 width=8)
         Index Cond: ((id > $1) AND (id < $2))
(4 rows)
In this case the parser correctly inferred that $1 and $2 should have the same data type as id, so the lack of parameter type information from PREPARE was not a problem. In other cases it might be necessary to explicitly specify types for the parameter symbols, which can be done by casting them, for example:
EXPLAIN (GENERIC_PLAN)
  SELECT sum(bar) FROM test
    WHERE id > $1::integer AND id < $2::integer
    GROUP BY foo;
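GENERIC_PLAN cannot be used together with ANALYZE. One possible way to observe a generic plan with actual run-time statistics, offered here as a sketch rather than a recommendation, is to prepare the statement and set plan_cache_mode to force_generic_plan for the session. The statement name q_generic is hypothetical, and the table test is the one assumed in the examples above.

```sql
-- Hypothetical prepared statement, mirroring the earlier example.
PREPARE q_generic(int, int) AS SELECT sum(bar) FROM test
    WHERE id > $1 AND id < $2
    GROUP BY foo;

-- Ask the plan cache to use the generic plan rather than a custom one.
SET plan_cache_mode = force_generic_plan;
EXPLAIN ANALYZE EXECUTE q_generic(100, 200);

-- Restore the default behavior and drop the prepared statement.
RESET plan_cache_mode;
DEALLOCATE q_generic;
```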