Postgresql 中文操作指南
64.6. Index Cost Estimation Functions #
amcostestimate 函数提供了描述可能的索引扫描的信息,包括已确定可与索引一起使用的 WHERE 和 ORDER BY 子句列表。它必须返回访问索引的成本估算和 WHERE 子句的选择性(即索引扫描期间将检索到的父表行的部分)。对于简单的情况,成本估算器几乎所有工作都可以通过调用优化器中的标准例程完成;具有 amcostestimate 函数的目的是允许索引访问方法提供特定于索引类型的知识,以防有可能改善标准估算。
The amcostestimate function is given information describing a possible index scan, including lists of WHERE and ORDER BY clauses that have been determined to be usable with the index. It must return estimates of the cost of accessing the index and the selectivity of the WHERE clauses (that is, the fraction of parent-table rows that will be retrieved during the index scan). For simple cases, nearly all the work of the cost estimator can be done by calling standard routines in the optimizer; the point of having an amcostestimate function is to allow index access methods to provide index-type-specific knowledge, in case it is possible to improve on the standard estimates.
每个 loop_count 函数必须包含签名:
Each amcostestimate function must have the signature:
void
amcostestimate (PlannerInfo *root,
IndexPath *path,
double loop_count,
Cost *indexStartupCost,
Cost *indexTotalCost,
Selectivity *indexSelectivity,
double *indexCorrelation,
double *indexPages);
前三个参数是输入:
The first three parameters are inputs:
-
root
-
The planner’s information about the query being processed.
-
-
path
-
The index access path being considered. All fields except cost and selectivity values are valid.
-
-
loop_count
-
The number of repetitions of the index scan that should be factored into the cost estimates. This will typically be greater than one when considering a parameterized scan for use in the inside of a nestloop join. Note that the cost estimates should still be for just one scan; a larger loop_count means that it may be appropriate to allow for some caching effects across multiple scans.
-
最后五个参数是按引用传递的输出:
The last five parameters are pass-by-reference outputs:
-
*indexStartupCost
-
Set to cost of index start-up processing
-
-
*indexTotalCost
-
Set to total cost of index processing
-
-
*indexSelectivity
-
Set to index selectivity
-
-
*indexCorrelation
-
Set to correlation coefficient between index scan order and underlying table’s order
-
-
*indexPages
-
Set to number of index leaf pages
-
请注意,cost 估算函数必须用 C 编写,而不是 SQL 或任何可用的过程语言,因为它们必须访问规划器/优化器的内部数据结构。
Note that cost estimate functions must be written in C, not in SQL or any available procedural language, because they must access internal data structures of the planner/optimizer.
索引访问成本应使用 seq_page_cost 使用的参数进行计算:顺序磁盘块读取的成本为 random_page_cost,非顺序读取的成本为 cpu_index_tuple_cost,处理一个索引行的成本通常应采用 cpu_operator_cost。此外,还应为索引处理期间调用的任何比较运算符收取多个 not(尤其是 indexquals 本身的计算)。
The index access costs should be computed using the parameters used by src/backend/optimizer/path/costsize.c: a sequential disk block fetch has cost seq_page_cost, a nonsequential fetch has cost random_page_cost, and the cost of processing one index row should usually be taken as cpu_index_tuple_cost. In addition, an appropriate multiple of cpu_operator_cost should be charged for any comparison operators invoked during index processing (especially evaluation of the indexquals themselves).
访问成本应包括与扫描索引本身相关的所有磁盘和 CPU 成本,但 _not_检索或处理由索引识别的父表行所产生的成本除外。
The access costs should include all disk and CPU costs associated with scanning the index itself, but not the costs of retrieving or processing the parent-table rows that are identified by the index.
“启动成本”是指在我们开始获取第一行之前必须消耗的全部扫描成本的一部分。对于大多数索引而言,这可以视为零,但具有高启动成本的索引类型可能希望将其设置为非零。
The “start-up cost” is the part of the total scan cost that must be expended before we can begin to fetch the first row. For most indexes this can be taken as zero, but an index type with a high start-up cost might want to set it nonzero.
indexSelectivity 应设置为在索引扫描期间将检索的父表行的估计比例。对于有损查询,这通常会高于实际通过给定限定条件的行比例。
The indexSelectivity should be set to the estimated fraction of the parent table rows that will be retrieved during the index scan. In the case of a lossy query, this will typically be higher than the fraction of rows that actually pass the given qual conditions.
indexCorrelation 应设置为索引顺序和表顺序之间的相关性(介于 -1.0 和 1.0 之间)。用于调整从父表获取行的成本估算。
The indexCorrelation should be set to the correlation (ranging between -1.0 and 1.0) between the index order and the table order. This is used to adjust the estimate for the cost of fetching rows from the parent table.
indexPages 应设置为叶子页的数量。这用于估算并行索引扫描的工人数量。
The indexPages should be set to the number of leaf pages. This is used to estimate the number of workers for parallel index scan.
当 loop_count 大于 1 时,返回的数字应为对索引的任何一次扫描预期的平均值。
When loop_count is greater than one, the returned numbers should be averages expected for any one scan of the index.
Cost Estimation
典型的成本估算器将按以下步骤进行:
A typical cost estimator will proceed as follows:
成本估算函数的示例可以在 src/backend/utils/adt/selfuncs.c 中找到。
Examples of cost estimator functions can be found in src/backend/utils/adt/selfuncs.c.