Postgresql 中文操作指南
11.5. Combining Multiple Indexes #
单个索引扫描只能使用查询子句,这些子句使用索引列与其运算符类中的运算符,并与 AND 联接。例如,给出 (a, b) 上的索引后,像 WHERE a = 5 AND b = 6 这样的查询条件可以使用该索引,但像 WHERE a = 5 OR b = 6 这样的查询无法直接使用该索引。
A single index scan can only use query clauses that use the index’s columns with operators of its operator class and are joined with AND. For example, given an index on (a, b) a query condition like WHERE a = 5 AND b = 6 could use the index, but a query like WHERE a = 5 OR b = 6 could not directly use the index.
幸运的是,PostgreSQL 能够合并多个索引(包括同个索引的多重使用),以处理无法通过单个索引扫描实现的情况。该系统可以在多个索引扫描中形成 AND 和 OR 条件。例如,像 WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 这样的查询可以分解为对 x 上的索引的四个独立扫描,每个扫描都使用一个查询子句。然后将这些扫描的结果按位 OR 在一起以产生结果。另一个示例是,如果我们对 x 和 y 有独立索引,则类似 WHERE x = 5 AND y = 6 这样的查询的可能实现是使用每个索引加上适当的查询子句,然后将索引结果通过按位 AND 运算找出结果行。
Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND and OR conditions across several index scans. For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses. The results of these scans are then ORed together to produce the result. Another example is that if we have separate indexes on x and y, one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND together the index results to identify the result rows.
为了合并多个索引,系统将扫描每个所需的索引,并在内存中准备一个 bitmap,给出报告与该索引条件匹配的表行的位置。然后根据需要按位 AND 和按位 OR 运算位图。最后,访问并返回实际的表行。表行按物理顺序访问,因为这就是位图的布局方式;这意味着原始索引的任何顺序都将会丢失,因此如果查询具有 ORDER BY 子句,则需要一个单独的排序步骤。出于此原因,并且由于每个额外的索引扫描都会增加额外的时间,因此计划程序有时会选择使用简单的索引扫描,即使可以同时使用其他可用索引。
To combine multiple indexes, the system scans each needed index and prepares a bitmap in memory giving the locations of table rows that are reported as matching that index’s conditions. The bitmaps are then ANDed and ORed together as needed by the query. Finally, the actual table rows are visited and returned. The table rows are visited in physical order, because that is how the bitmap is laid out; this means that any ordering of the original indexes is lost, and so a separate sort step will be needed if the query has an ORDER BY clause. For this reason, and because each additional index scan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well.
除了最简单的应用程序外,还有可能会有各种有用的索引组合,而数据库开发人员必须权衡做出取舍来决定提供哪些索引。有时多列索引最好,但有时最好创建单独的索引并依靠索引组合功能。例如,如果你的工作负载包含有时仅涉及列 x、有时仅涉及列 y、有时同时涉及这两个列的查询的混合,则你可能选择在 _x_和 _y_上创建两个单独的索引,同时依赖于索引组合来处理同时使用这两列的查询。你也可以在 _(x, y)_上创建多列索引。这个索引通常比索引组合更高效,适用于涉及这两列的查询,但如 Section 11.3中所述,对于仅涉及 _y_的查询它几乎是无用的,因此它不应该是唯一的索引。多列索引与 _y_上一个单独索引的组合将很好地起到作用。对于仅涉及 _x_的查询,可以使用多列索引,尽管它在体积上会更大和速度会更慢,而仅_x_上的索引则没有这些问题。最后一个备选方案是创建所有三个索引,但这可能仅在表被搜索的频率远远高于其被更新,并且所有三种类型的查询都很常见的情况下才合理。如果其中一种类型的查询的常见程度远低于其他类型,则你可能将满足于仅创建与最常见的类型最匹配的两个索引。
In all but the simplest applications, there are various combinations of indexes that might be useful, and the database developer must make trade-offs to decide which indexes to provide. Sometimes multicolumn indexes are best, but sometimes it’s better to create separate indexes and rely on the index-combination feature. For example, if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common. If one of the types of query is much less common than the others, you’d probably settle for creating just the two indexes that best match the common types.