PostgreSQL 中文操作指南

76.1. Row Estimation Examples

下面显示的示例使用 PostgreSQL 回归测试数据库中的表。显示的输出取自 8.3 版本。更早(或更新)版本的行为可能会有所不同。还要注意,由于 ANALYZE 在生成统计信息时使用随机采样,每次重新运行 ANALYZE 之后,结果都会略有变化。

The examples shown below use tables in the PostgreSQL regression test database. The outputs shown are taken from version 8.3. The behavior of earlier (or later) versions might vary. Note also that since ANALYZE uses random sampling while producing statistics, the results will change slightly after any new ANALYZE.

让我们从一个非常简单的查询开始:

Let’s start with a very simple query:

EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

Section 14.2 中介绍了规划器如何确定 tenk1 的基数,但为了完整起见,这里再重复一遍。页面数和行数是在 pg_class 中查找的:

How the planner determines the cardinality of tenk1 is covered in Section 14.2, but is repeated here for completeness. The number of pages and rows is looked up in pg_class:

SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';

 relpages | reltuples
----------+-----------
      358 |     10000

这些数字是该表最近一次 VACUUM 或 ANALYZE 时的值。随后,规划器获取表当前实际的页面数(这是一个开销很小的操作,不需要扫描表)。如果它与 relpages 不同,则按比例调整 reltuples,得出当前的行数估计。在上面的示例中,relpages 的值是最新的,因此行数估计与 reltuples 相同。

These numbers are current as of the last VACUUM or ANALYZE on the table. The planner then fetches the actual current number of pages in the table (this is a cheap operation, not requiring a table scan). If that is different from relpages then reltuples is scaled accordingly to arrive at a current number-of-rows estimate. In the example above, the value of relpages is up-to-date so the rows estimate is the same as reltuples.
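
作为补充说明,下面给出一个假设性的 SQL 草图,用 pg_relation_size 和 block_size 设置得到当前页面数,并据此按比例调整 reltuples。这只是对上述做法的近似演示,并不是规划器的实际代码:

As a hedged aside, here is an illustrative SQL sketch that uses pg_relation_size and the block_size setting to obtain the current page count and scale reltuples accordingly. This only approximates what the planner does internally; it is not its actual code:

-- illustrative only: approximate the planner's page-based scaling of reltuples
SELECT reltuples,
       relpages,
       pg_relation_size(oid::regclass) / current_setting('block_size')::int AS current_pages,
       reltuples
         * (pg_relation_size(oid::regclass) / current_setting('block_size')::int)
         / relpages AS scaled_row_estimate
FROM pg_class
WHERE relname = 'tenk1' AND relpages > 0;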

让我们继续讨论一个在 WHERE 子句中带范围条件的示例:

Let’s move on to an example with a range condition in its WHERE clause:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=24.06..394.64 rows=1007 width=244)
   Recheck Cond: (unique1 < 1000)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
         Index Cond: (unique1 < 1000)

规划器检查 WHERE 子句条件,并在 pg_operator 中查找 < 操作符的选择性函数。该函数记录在 oprrest 列中,本例中的条目是 scalarltsel。scalarltsel 函数从 pg_statistic 中检索 unique1 的直方图。对于手工查询来说,在更简单的 pg_stats 视图中查看更为方便:

The planner examines the WHERE clause condition and looks up the selectivity function for the operator < in pg_operator. This is held in the column oprrest, and the entry in this case is scalarltsel. The scalarltsel function retrieves the histogram for unique1 from pg_statistic. For manual queries it is more convenient to look in the simpler pg_stats view:

SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='unique1';

                   histogram_bounds
------------------------------------------------------
 {0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}
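
顺带一提,上面提到的 oprrest 查找也可以手工重现。下面是一个示意性的查询(假设操作数类型为 integer;oprjoin 列会在后面的联接示例中再次出现):

As an aside, the oprrest lookup mentioned above can be reproduced by hand. The following is an illustrative query, assuming integer operands (the oprjoin column will come up again in the join example later):

-- illustrative: inspect the restriction and join selectivity functions for "<"
SELECT oprname, oprrest, oprjoin
FROM pg_operator
WHERE oprname = '<'
  AND oprleft = 'integer'::regtype
  AND oprright = 'integer'::regtype;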

接下来,计算出直方图中“< 1000”所占的比例,这就是选择性。直方图把取值范围划分成若干等频的存储段,所以我们只需找到我们的值所在的存储段,把它的一部分加上之前所有存储段的整体即可。值 1000 显然位于第二个存储段(993–1997)中。假设每个存储段内的值呈线性分布,我们可以这样计算选择性:

Next the fraction of the histogram occupied by “< 1000” is worked out. This is the selectivity. The histogram divides the range into equal frequency buckets, so all we have to do is locate the bucket that our value is in and count part of it and all of the ones before. The value 1000 is clearly in the second bucket (993–1997). Assuming a linear distribution of values inside each bucket, we can calculate the selectivity as:

selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets
            = (1 + (1000 - 993)/(1997 - 993))/10
            = 0.100697

也就是说,一个完整的存储段加上第二个存储段的线性部分,除以存储段的数量。现在可以将基数与选择性相乘,计算出估计行数:

that is, one whole bucket plus a linear fraction of the second, divided by the number of buckets. The estimated number of rows can now be calculated as the product of the selectivity and the cardinality of tenk1:

rows = rel_cardinality * selectivity
     = 10000 * 0.100697
     = 1007  (rounding off)
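
上面的算术也可以直接用 SQL 复核(数值取自前面显示的直方图边界,仅作演示):

The arithmetic above can also be checked directly in SQL (the numbers are taken from the histogram bounds shown earlier; this is purely illustrative):

-- illustrative check of the bucket-based selectivity calculation
SELECT (1 + (1000 - 993)::numeric / (1997 - 993)) / 10 AS selectivity,
       round(10000 * (1 + (1000 - 993)::numeric / (1997 - 993)) / 10) AS estimated_rows;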

接下来让我们考虑一个在 WHERE 子句中具有相等条件的示例:

Next let’s consider an example with an equality condition in its WHERE clause:

EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA';

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=30 width=244)
   Filter: (stringu1 = 'CRAAAA'::name)

规划器再次检查 WHERE 子句条件,并查找 = 操作符的选择性函数,即 eqsel。对于相等条件的估计,直方图没有用处;取而代之,使用最常见值(MCV)列表来确定选择性。让我们看看这些 MCV,以及几个稍后会用到的附加列:

Again the planner examines the WHERE clause condition and looks up the selectivity function for =, which is eqsel. For equality estimation the histogram is not useful; instead the list of most common values (MCVs) is used to determine the selectivity. Let’s have a look at the MCVs, with some additional columns that will be useful later:

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';

null_frac         | 0
n_distinct        | 676
most_common_vals  | {EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,JOAAAA,MCAAAA,NAAAAA,WGAAAA}
most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003}

由于 CRAAAA 出现在 MCV 列表中,选择性就是最常见频率(MCF)列表中对应的条目:

Since CRAAAA appears in the list of MCVs, the selectivity is merely the corresponding entry in the list of most common frequencies (MCFs):

selectivity = mcf[3]
            = 0.003

和以前一样,估计行数只是此数与 tenk1 基数的乘积:

As before, the estimated number of rows is just the product of this with the cardinality of tenk1:

rows = 10000 * 0.003
     = 30
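
如果想以编程方式取出某个 MCV 对应的频率,可以使用类似下面的示意查询(假设服务器版本较新,支持多参数 unnest,并通过 text 中转来展开 anyarray 列):

To pull the frequency for a specific MCV out of pg_stats programmatically, a sketch along the following lines can be used (this assumes a reasonably recent server that supports multi-argument unnest, and goes through text to expand the anyarray column):

-- illustrative: look up the MCV frequency for 'CRAAAA' and scale by the table cardinality
SELECT m.val, m.freq, round(10000 * m.freq) AS estimated_rows
FROM pg_stats s,
     unnest(s.most_common_vals::text::text[], s.most_common_freqs) AS m(val, freq)
WHERE s.tablename = 'tenk1'
  AND s.attname = 'stringu1'
  AND m.val = 'CRAAAA';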

现在考虑相同的查询,但使用不在 MCV 列表中的常量:

Now consider the same query, but with a constant that is not in the MCV list:

EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'xxx';

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=15 width=244)
   Filter: (stringu1 = 'xxx'::name)

这是一个截然不同的问题:当值不在 MCV 列表中时,如何估计选择性。解决方法是利用该值不在列表中这一事实,并结合所有 MCV 的频率:

This is quite a different problem: how to estimate the selectivity when the value is not in the MCV list. The approach is to use the fact that the value is not in the list, combined with the knowledge of the frequencies for all of the MCVs:

selectivity = (1 - sum(mcv_freqs))/(num_distinct - num_mcv)
            = (1 - (0.00333333 + 0.003 + 0.003 + 0.003 + 0.003 + 0.003 +
                    0.003 + 0.003 + 0.003 + 0.003))/(676 - 10)
            = 0.0014559

也就是说,把所有 MCV 的频率加起来,用 1 减去这个和,再除以其余不同值的数目。这相当于假设列中不属于任何 MCV 的那部分在所有其他不同值之间均匀分布。请注意这里没有空值,所以我们不必考虑空值(否则我们还要从分子中减去空值的比例)。然后,按常规方式计算估计行数:

That is, add up all the frequencies for the MCVs and subtract them from one, then divide by the number of other distinct values. This amounts to assuming that the fraction of the column that is not any of the MCVs is evenly distributed among all the other distinct values. Notice that there are no null values so we don’t have to worry about those (otherwise we’d subtract the null fraction from the numerator as well). The estimated number of rows is then calculated as usual:

rows = 10000 * 0.0014559
     = 15  (rounding off)
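
这一计算同样可以用 SQL 重现(数值来自上面显示的 MCV 频率列表和 n_distinct,仅作演示):

Again, this calculation can be reproduced in SQL (the numbers come from the MCV frequency list and n_distinct shown above; purely illustrative):

-- illustrative check of the "not in MCV list" selectivity calculation
SELECT (1 - (0.00333333 + 9 * 0.003)) / (676 - 10) AS selectivity,
       round(10000 * (1 - (0.00333333 + 9 * 0.003)) / (676 - 10)) AS estimated_rows;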

之前使用 unique1 < 1000 的示例过度简化了 scalarltsel 的实际做法;现在我们已经看过 MCV 的用法,可以补充更多细节。就那个示例本身而言,它是正确的,因为 unique1 是唯一列,没有 MCV(显然,没有哪个值比其他值更常见)。对于非唯一列,通常既有直方图又有 MCV 列表,而直方图不包含由 MCV 代表的那部分列值。之所以这样做,是因为这样可以得到更精确的估计。在这种情况下,scalarltsel 会把条件(例如“< 1000”)直接应用于 MCV 列表中的每个值,并把条件为真的那些 MCV 的频率累加起来。这为表中属于 MCV 的部分给出了精确的选择性估计。然后,以与上面相同的方式使用直方图,估计表中不属于 MCV 的部分的选择性,再把这两个数字结合起来,得到整体选择性的估计。例如,考虑

The previous example with unique1 < 1000 was an oversimplification of what scalarltsel really does; now that we have seen an example of the use of MCVs, we can fill in some more detail. The example was correct as far as it went, because since unique1 is a unique column it has no MCVs (obviously, no value is any more common than any other value). For a non-unique column, there will normally be both a histogram and an MCV list, and the histogram does not include the portion of the column population represented by the MCVs. We do things this way because it allows more precise estimation. In this situation scalarltsel directly applies the condition (e.g., “< 1000”) to each value of the MCV list, and adds up the frequencies of the MCVs for which the condition is true. This gives an exact estimate of the selectivity within the portion of the table that is MCVs. The histogram is then used in the same way as above to estimate the selectivity in the portion of the table that is not MCVs, and then the two numbers are combined to estimate the overall selectivity. For example, consider

EXPLAIN SELECT * FROM tenk1 WHERE stringu1 < 'IAAAAA';

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=3077 width=244)
   Filter: (stringu1 < 'IAAAAA'::name)

我们已经看到 stringu1 的 MCV 信息,下面是它的直方图:

We already saw the MCV information for stringu1, and here is its histogram:

SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';

                                histogram_bounds
--------------------------------------------------------------------------------
 {AAAAAA,CQAAAA,FRAAAA,IBAAAA,KRAAAA,NFAAAA,PSAAAA,SGAAAA,VAAAAA,XLAAAA,ZZAAAA}

检查 MCV 列表,我们发现条件 stringu1 < 'IAAAAA' 被前六个条目满足,而不被最后四个条目满足,因此列值中 MCV 部分的选择性为

Checking the MCV list, we find that the condition stringu1 < 'IAAAAA' is satisfied by the first six entries and not the last four, so the selectivity within the MCV part of the population is

selectivity = sum(relevant mcfs)
            = 0.00333333 + 0.003 + 0.003 + 0.003 + 0.003 + 0.003
            = 0.01833333

把所有 MCF 加起来还可以知道,MCV 所代表的列值比例总计为 0.03033333,因此直方图所代表的比例为 0.96966667(同样,这里没有空值,否则我们还得在这里排除它们)。我们可以看到,值 IAAAAA 差不多落在第三个直方图存储段的末尾。通过对不同字符出现频率做一些相当粗略的假设,规划器得出估计值 0.298387,作为直方图部分中小于 IAAAAA 的那部分比例。然后,我们把 MCV 部分和非 MCV 部分的估计合并起来:

Summing all the MCFs also tells us that the total fraction of the population represented by MCVs is 0.03033333, and therefore the fraction represented by the histogram is 0.96966667 (again, there are no nulls, else we’d have to exclude them here). We can see that the value IAAAAA falls nearly at the end of the third histogram bucket. Using some rather cheesy assumptions about the frequency of different characters, the planner arrives at the estimate 0.298387 for the portion of the histogram population that is less than IAAAAA. We then combine the estimates for the MCV and non-MCV populations:

selectivity = mcv_selectivity + histogram_selectivity * histogram_fraction
            = 0.01833333 + 0.298387 * 0.96966667
            = 0.307669

rows        = 10000 * 0.307669
            = 3077  (rounding off)
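
合并这两部分的计算也可以用 SQL 复核(数值取自上文,仅作演示):

The combined calculation can likewise be double-checked in SQL (numbers taken from the text above; purely illustrative):

-- illustrative check of combining the MCV and histogram selectivities
SELECT 0.01833333 + 0.298387 * 0.96966667 AS selectivity,
       round(10000 * (0.01833333 + 0.298387 * 0.96966667)) AS estimated_rows;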

在这个特定示例中,来自 MCV 列表的修正相当小,因为该列的分布实际上相当平坦(统计信息之所以显示这些特定值比其他值更常见,主要是由于抽样误差)。在更典型的情况下,当某些值明显比其他值更常见时,这个复杂的过程能带来有用的精度提升,因为最常见值的选择性可以被精确求出。

In this particular example, the correction from the MCV list is fairly small, because the column distribution is actually quite flat (the statistics showing these particular values as being more common than others are mostly due to sampling error). In a more typical case where some values are significantly more common than others, this complicated process gives a useful improvement in accuracy because the selectivity for the most common values is found exactly.

现在我们考虑 WHERE 子句中有多个条件的情况:

Now let’s consider a case with more than one condition in the WHERE clause:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';

                                   QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=23.80..396.91 rows=1 width=244)
   Recheck Cond: (unique1 < 1000)
   Filter: (stringu1 = 'xxx'::name)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
         Index Cond: (unique1 < 1000)

规划器假设这两个条件相互独立,因此可以把各个子句的选择性相乘:

The planner assumes that the two conditions are independent, so that the individual selectivities of the clauses can be multiplied together:

selectivity = selectivity(unique1 < 1000) * selectivity(stringu1 = 'xxx')
            = 0.100697 * 0.0014559
            = 0.0001466

rows        = 10000 * 0.0001466
            = 1  (rounding off)
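
这个相乘的过程同样可以用 SQL 复核(数值取自前面两个示例,仅作演示):

This multiplication can also be checked in SQL (the numbers come from the two earlier examples; purely illustrative):

-- illustrative check of multiplying the two independent selectivities
SELECT 0.100697 * 0.0014559 AS selectivity,
       round(10000 * 0.100697 * 0.0014559) AS estimated_rows;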

请注意,从位图索引扫描中估计返回的行数仅反映了与索引一起使用的条件;这很重要,因为它会影响后续堆获取的成本估算。

Notice that the number of rows estimated to be returned from the bitmap index scan reflects only the condition used with the index; this is important since it affects the cost estimate for the subsequent heap fetches.

最后我们将检查涉及联接的查询:

Finally we will examine a query that involves a join:

EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Nested Loop  (cost=4.64..456.23 rows=50 width=488)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.64..142.17 rows=50 width=244)
         Recheck Cond: (unique1 < 50)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.63 rows=50 width=0)
               Index Cond: (unique1 < 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..6.27 rows=1 width=244)
         Index Cond: (unique2 = t1.unique2)

对 tenk1 的限制条件 unique1 < 50 在嵌套循环联接之前求值。其处理方式与前面的范围条件示例类似。这一次,值 50 落在 unique1 直方图的第一个存储段中:

The restriction on tenk1, unique1 < 50, is evaluated before the nested-loop join. This is handled analogously to the previous range example. This time the value 50 falls into the first bucket of the unique1 histogram:

selectivity = (0 + (50 - bucket[1].min)/(bucket[1].max - bucket[1].min))/num_buckets
            = (0 + (50 - 0)/(993 - 0))/10
            = 0.005035

rows        = 10000 * 0.005035
            = 50  (rounding off)
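
第一个存储段的计算用 SQL 表示如下(边界值来自前面 unique1 的直方图,仅作演示):

The first-bucket calculation looks like this in SQL (the bounds come from the unique1 histogram shown earlier; purely illustrative):

-- illustrative check of the first-bucket selectivity for unique1 < 50
SELECT (0 + (50 - 0)::numeric / (993 - 0)) / 10 AS selectivity,
       round(10000 * (0 + (50 - 0)::numeric / (993 - 0)) / 10) AS estimated_rows;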

联接的限制条件是 t2.unique2 = t1.unique2。操作符仍是我们熟悉的 =,不过选择性函数取自 pg_operator 的 oprjoin 列,即 eqjoinsel。eqjoinsel 会同时查找 tenk2 和 tenk1 的统计信息:

The restriction for the join is t2.unique2 = t1.unique2. The operator is just our familiar =, however the selectivity function is obtained from the oprjoin column of pg_operator, and is eqjoinsel. eqjoinsel looks up the statistical information for both tenk2 and tenk1:

SELECT tablename, null_frac, n_distinct, most_common_vals FROM pg_stats
WHERE tablename IN ('tenk1', 'tenk2') AND attname='unique2';

tablename  | null_frac | n_distinct | most_common_vals
-----------+-----------+------------+------------------
 tenk1     |         0 |         -1 |
 tenk2     |         0 |         -1 |

在这种情况下,对于 unique2,没有 MCV 信息,因为所有值似乎都是唯一的,因此我们使用仅依赖于两个关系的不同值的数量及其 null 分数的算法:

In this case there is no MCV information for unique2 because all the values appear to be unique, so we use an algorithm that relies only on the number of distinct values for both relations together with their null fractions:

selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
            = (1 - 0) * (1 - 0) / max(10000, 10000)
            = 0.0001

也就是说,对每个关系,用 1 减去其空值比例,再除以两者中不同值数量的较大者。联接可能产生的行数等于两个输入的笛卡尔积的基数乘以该选择性:

That is, subtract the null fraction from one for each of the relations, and divide by the maximum of the numbers of distinct values. The number of rows that the join is likely to emit is calculated as the cardinality of the Cartesian product of the two inputs, multiplied by the selectivity:

rows = (outer_cardinality * inner_cardinality) * selectivity
     = (50 * 10000) * 0.0001
     = 50
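
这个公式也可以写成 SQL 形式复核一下(用 least 取两个 1/num_distinct 中较小者,与上面除以较大的不同值数量是等价的,仅作演示):

The formula can be written out in SQL as a check (using least for the smaller of the two 1/num_distinct values, which is equivalent to dividing by the larger distinct count above; purely illustrative):

-- illustrative check of the join selectivity and the resulting row estimate
SELECT (1 - 0) * (1 - 0) * least(1.0 / 10000, 1.0 / 10000) AS selectivity,
       round(50 * 10000 * (1 - 0) * (1 - 0) * least(1.0 / 10000, 1.0 / 10000)) AS estimated_rows;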

如果这两列有 MCV 列表,eqjoinsel 会直接比较两份 MCV 列表,以确定由 MCV 代表的那部分列值内的联接选择性;其余部分的估计则遵循这里所示的同样方法。

Had there been MCV lists for the two columns, eqjoinsel would have used direct comparison of the MCV lists to determine the join selectivity within the part of the column populations represented by the MCVs. The estimate for the remainder of the populations follows the same approach shown here.

请注意,我们把 inner_cardinality 写作 10000,即 tenk2 未经修改的大小。通过查看 EXPLAIN 输出,可能会以为联接行数的估计来自 50 * 1,即外层行数乘以每次在 tenk2 上进行内层索引扫描所估计得到的行数。但事实并非如此:联接关系的大小是在考虑任何具体联接计划之前估计的。如果一切工作正常,这两种估计联接大小的方式会得出大致相同的答案,但由于舍入误差和其他因素,它们有时会出现明显差异。

Notice that we showed inner_cardinality as 10000, that is, the unmodified size of tenk2. It might appear from inspection of the EXPLAIN output that the estimate of join rows comes from 50 * 1, that is, the number of outer rows times the estimated number of rows obtained by each inner index scan on tenk2. But this is not the case: the join relation size is estimated before any particular join plan has been considered. If everything is working well then the two ways of estimating the join size will produce about the same answer, but due to round-off error and other factors they sometimes diverge significantly.

对于有兴趣了解更多细节的读者:表大小(在应用任何 WHERE 子句之前)的估计在 src/backend/optimizer/util/plancat.c 中完成。子句选择性的通用逻辑位于 src/backend/optimizer/path/clausesel.c。特定于操作符的选择性函数大多位于 src/backend/utils/adt/selfuncs.c。

For those interested in further details, estimation of the size of a table (before any WHERE clauses) is done in src/backend/optimizer/util/plancat.c. The generic logic for clause selectivities is in src/backend/optimizer/path/clausesel.c. The operator-specific selectivity functions are mostly found in src/backend/utils/adt/selfuncs.c.