Postgresql 中文操作指南

ANALYZE

ANALYZE — 收集有关数据库的统计信息

ANALYZE — collect statistics about a database

Synopsis

ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]

where option can be one of:

    VERBOSE [ boolean ]
    SKIP_LOCKED [ boolean ]
    BUFFER_USAGE_LIMIT size

and table_and_columns is:

    table_name [ ( column_name [, ...] ) ]

Description

ANALYZE 收集有关数据库中表内容的统计信息,并将结果存储在 pg_statistic 系统目录中。随后,查询计划程序使用这些统计信息来帮助确定查询的最有效执行计划。

ANALYZE collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.

如果没有 table_and_columns 列表, ANALYZE 将处理当前数据库中当前用户有权分析的每个表和物化视图。对于列表来说, ANALYZE 只处理那些表。此外,还可以为表提供一个列名列表,在这种情况下,只收集这些列的统计信息。

Without a table_and_columns list, ANALYZE processes every table and materialized view in the current database that the current user has permission to analyze. With a list, ANALYZE processes only those table(s). It is further possible to give a list of column names for a table, in which case only the statistics for those columns are collected.

当选项列表被括号包围时,可以按任何顺序编写选项。PostgreSQL 11 中添加了带括号的语法;不带括号的语法已被弃用。

When the option list is surrounded by parentheses, the options can be written in any order. The parenthesized syntax was added in PostgreSQL 11; the unparenthesized syntax is deprecated.

Parameters

  • VERBOSE

    • Enables display of progress messages.

  • SKIP_LOCKED

    • Specifies that ANALYZE should not wait for any conflicting locks to be released when beginning work on a relation: if a relation cannot be locked immediately without waiting, the relation is skipped. Note that even with this option, ANALYZE may still block when opening the relation’s indexes or when acquiring sample rows from partitions, table inheritance children, and some types of foreign tables. Also, while ANALYZE ordinarily processes all partitions of specified partitioned tables, this option will cause ANALYZE to skip all partitions if there is a conflicting lock on the partitioned table.

  • BUFFER_USAGE_LIMIT

    • Specifies the [role="bare"]glossary.html#GLOSSARY-BUFFER-ACCESS-STRATEGYBuffer Access Strategy ring buffer size for ANALYZE. This size is used to calculate the number of shared buffers which will be reused as part of this strategy. 0 disables use of a Buffer Access Strategy. When this option is not specified, ANALYZE uses the value from vacuum_buffer_usage_limit. Higher settings can allow ANALYZE to run more quickly, but having too large a setting may cause too many other useful pages to be evicted from shared buffers. The minimum value is 128 kB and the maximum value is 16 GB.

  • boolean

    • Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.

  • size

    • Specifies an amount of memory in kilobytes. Sizes may also be specified as a string containing the numerical size followed by any one of the following memory units: B (bytes), kB (kilobytes), MB (megabytes), GB (gigabytes), or TB (terabytes).

  • table_name

    • The name (possibly schema-qualified) of a specific table to analyze. If omitted, all regular tables, partitioned tables, and materialized views in the current database are analyzed (but not foreign tables). If the specified table is a partitioned table, both the inheritance statistics of the partitioned table as a whole and statistics of the individual partitions are updated.

  • column_name

    • The name of a specific column to analyze. Defaults to all columns.

Outputs

在指定了 VERBOSE 时, ANALYZE 发出进度消息来指示当前正在处理哪个表。还会打印有关表的各种统计信息。

When VERBOSE is specified, ANALYZE emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well.

Notes

要分析表,一般必须是该表的拥有者或超级用户。但是,允许数据库所有者分析其数据库中的所有表,但共享目录除外。(共享目录的限制意味着真正的数据库范围内的 ANALYZE 只能由超级用户执行。) ANALYZE 将跳过调用用户无权分析的任何表。

To analyze a table, one must ordinarily be the table’s owner or a superuser. However, database owners are allowed to analyze all tables in their databases, except shared catalogs. (The restriction for shared catalogs means that a true database-wide ANALYZE can only be performed by a superuser.) ANALYZE will skip over any tables that the calling user does not have permission to analyze.

外键表只能在明确选择时进行分析。并非所有外键数据包装器都支持 ANALYZE 。如果该表的包装器不支持 ANALYZE ,该命令会打印一个警告,并且不执行任何操作。

Foreign tables are analyzed only when explicitly selected. Not all foreign data wrappers support ANALYZE. If the table’s wrapper does not support ANALYZE, the command prints a warning and does nothing.

在默认 PostgreSQL 配置中,自动真空守护进程(请参阅 Section 25.1.6 )在首次向表加载数据时以及在整个常规操作过程中表发生更改时,负责表的自动分析。禁用自动真空时,建议定期运行 ANALYZE ,或者刚好在表的内容中进行了重大更改之后运行。准确的统计数据将帮助规划器选择最合适的查询计划,从而提高查询处理速度。仅读数据库的共同策略是在一天中使用率较低的时间段内每天运行一次 VACUUMANALYZE 。(如果有繁重的更新活动,这将不足以应对。)

In the default PostgreSQL configuration, the autovacuum daemon (see Section 25.1.6) takes care of automatic analyzing of tables when they are first loaded with data, and as they change throughout regular operation. When autovacuum is disabled, it is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy for read-mostly databases is to run VACUUM and ANALYZE once a day during a low-usage time of day. (This will not be sufficient if there is heavy update activity.)

ANALYZE 只需要读锁对目标表进行锁定,这样便可在与表中的其他活动并行运行时。

ANALYZE requires only a read lock on the target table, so it can run in parallel with other activity on the table.

ANALYZE 收集的统计数据通常包括每次各列中部分最常见值的列表以及显示各列中近似数据分布的直方图。如果 ANALYZE 认为这些数据不有趣(例如,在唯一键一列中,没有常见值),或者列数据类型不支持合适的运算符,则可以省略其中一个或两个数据。 Chapter 25 中提供了有关统计数据的更多信息。

The statistics collected by ANALYZE usually include a list of some of the most common values in each column and a histogram showing the approximate data distribution in each column. One or both of these can be omitted if ANALYZE deems them uninteresting (for example, in a unique-key column, there are no common values) or if the column data type does not support the appropriate operators. There is more information about the statistics in Chapter 25.

对于较大的表, ANALYZE 会对表内容进行随机抽样,而不是检查每一行。这样即使是非常大的表也可以在短时间内完成分析。然而,请注意统计数据只是一种近似,并且每次运行 ANALYZE 时都会略有变化,即使实际表内容没有更改。这可能会导致 EXPLAIN 显示的规划器估算成本中出现小幅更改。在极少数情况下,这种不确定性会导致规划器在运行 ANALYZE 之后更改查询计划的选择。为避免这种情况,请提高 ANALYZE 收集的统计数据量,如下所述。

For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time. Note, however, that the statistics are only approximate, and will change slightly each time ANALYZE is run, even if the actual table contents did not change. This might result in small changes in the planner’s estimated costs shown by EXPLAIN. In rare situations, this non-determinism will cause the planner’s choices of query plans to change after ANALYZE is run. To avoid this, raise the amount of statistics collected by ANALYZE, as described below.

可以通过调整 default_statistics_target 配置变量或者通过 ALTER TABLE …​ ALTER COLUMN …​ SET STATISTICS 为每列设置每列统计信息目标来控制分析范围。目标值设置最常见值列表中的最大条目数和直方图中的最大槽数。默认目标值为 100,但可以向上或向下调整目标值,以平衡规划器估算的准确性与运行 ANALYZE 所需时间和 pg_statistic 占据的空间。特别是,将统计信息目标设为零将禁用对该列统计信息进行收集。对不会作为查询的 WHEREGROUP BYORDER BY 子句的一部分使用的列执行此操作可能有用,因为规划器将无法使用此类列上的统计信息。

The extent of analysis can be controlled by adjusting the default_statistics_target configuration variable, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE …​ ALTER COLUMN …​ SET STATISTICS. The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 100, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE and the amount of space occupied in pg_statistic. In particular, setting the statistics target to zero disables collection of statistics for that column. It might be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns.

要分析的列中最多的统计信息目标将决定要准备这些统计信息时所抽样的表行数。增加目标值将使 ANALYZE 所需时间和空间成比例增加。

The largest statistics target among the columns being analyzed determines the number of table rows sampled to prepare the statistics. Increasing the target causes a proportional increase in the time and space needed to do ANALYZE.

ANALYZE 估算的一个值是各列中出现的唯一值的数量。由于只检查了子集行,因此即使具有最大的统计信息目标,有时该估算可能相当不准确。如果这种不准确性导致糟糕的查询计划,可以手动确定一个更准确的值,然后使用 ALTER TABLE …​ ALTER COLUMN …​ SET (n_distinct = …​) 安装。

One of the values estimated by ANALYZE is the number of distinct values that appear in each column. Because only a subset of the rows are examined, this estimate can sometimes be quite inaccurate, even with the largest possible statistics target. If this inaccuracy leads to bad query plans, a more accurate value can be determined manually and then installed with ALTER TABLE …​ ALTER COLUMN …​ SET (n_distinct = …​).

如果要分析的表具有继承子项, ANALYZE 将收集两组统计数据:一组仅针对父表的行,另一组包括父表的行和所有子表的行。在将继承树作为一个整体进行处理的查询规划时需要这第二组数据。在这种情况下,不会单独分析子表。然而,自动真空守护进程在决定是否对该表触发自动分析时,只会考虑父表本身的插入或更新。如果很少插入或更新该表,则除非您手动运行 ANALYZE ,否则继承统计信息将不会是最新的。

If the table being analyzed has inheritance children, ANALYZE gathers two sets of statistics: one on the rows of the parent table only, and a second including rows of both the parent table and all of its children. This second set of statistics is needed when planning queries that process the inheritance tree as a whole. The child tables themselves are not individually analyzed in this case. The autovacuum daemon, however, will only consider inserts or updates on the parent table itself when deciding whether to trigger an automatic analyze for that table. If that table is rarely inserted into or updated, the inheritance statistics will not be up to date unless you run ANALYZE manually.

对于分区表, ANALYZE 通过从所有分区抽样行来收集统计信息;此外,它还会递归到每个分区并更新其统计信息。即使是多级分区,每个叶分区也只进行一次分析。不会仅针对父表(不包含其分区中的数据)收集统计信息,因为在分区的情况下,该表保证是空的。

For partitioned tables, ANALYZE gathers statistics by sampling rows from all partitions; in addition, it will recurse into each partition and update its statistics. Each leaf partition is analyzed only once, even with multi-level partitioning. No statistics are collected for only the parent table (without data from its partitions), because with partitioning it’s guaranteed to be empty.

自动真空守护进程不会处理分区表,也不会处理仅修改了子项的继承父项。通常需要定期手动运行 ANALYZE 以保持表层次结构的统计信息是最新的。

The autovacuum daemon does not process partitioned tables, nor does it process inheritance parents if only the children are ever modified. It is usually necessary to periodically run a manual ANALYZE to keep the statistics of the table hierarchy up to date.

如果任何子表或分区的外国数据包装器不支持 ANALYZE 的话,则在收集继承统计信息时会忽略这些表。

If any child tables or partitions are foreign tables whose foreign data wrappers do not support ANALYZE, those tables are ignored while gathering inheritance statistics.

如果被分析的表是完全空的, ANALYZE 将不会为该表记录新的统计信息。将保留任何现有的统计信息。

If the table being analyzed is completely empty, ANALYZE will not record new statistics for that table. Any existing statistics will be retained.

每个运行 ANALYZE 的后端都会在 pg_stat_progress_analyze 视图中报告其进度。有关详细信息,请参阅 Section 28.4.1

Each backend running ANALYZE will report its progress in the pg_stat_progress_analyze view. See Section 28.4.1 for details.

Compatibility

SQL 标准中没有 ANALYZE 语句。

There is no ANALYZE statement in the SQL standard.