PostgreSQL Chinese Operation Guide

64.3. Index Scanning

In an index scan, the index access method is responsible for regurgitating the TIDs of all the tuples it has been told about that match the scan keys. The access method is not involved in actually fetching those tuples from the index’s parent table, nor in determining whether they pass the scan’s visibility test or other conditions.

A scan key is the internal representation of a WHERE clause of the form index_key operator constant, where the index key is one of the columns of the index and the operator is one of the members of the operator family associated with that index column. An index scan has zero or more scan keys, which are implicitly ANDed: the returned tuples are expected to satisfy all the indicated conditions.
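
As a rough illustration of implicitly ANDed scan keys, the following Python sketch models each key as a (column, operator, constant) triple. ScanKey and matches are hypothetical names invented for this example; PostgreSQL's real scan keys are C structures with more fields, so treat this only as a toy model.

```python
import operator
from typing import Any, Callable, NamedTuple, Sequence


class ScanKey(NamedTuple):
    """Toy stand-in for one condition: index_key operator constant."""
    column: int                      # position of the index column
    op: Callable[[Any, Any], bool]   # comparison operator
    constant: Any                    # right-hand constant


def matches(index_entry: tuple, keys: Sequence[ScanKey]) -> bool:
    """An entry qualifies only if it satisfies every scan key (implicit AND)."""
    return all(k.op(index_entry[k.column], k.constant) for k in keys)


# WHERE a >= 10 AND b = 'x' on a hypothetical two-column index (a, b)
keys = [ScanKey(0, operator.ge, 10), ScanKey(1, operator.eq, "x")]
entries = [(5, "x"), (10, "x"), (12, "y"), (20, "x")]
selected = [e for e in entries if matches(e, keys)]
```

With zero scan keys, every entry qualifies in this model, because all() over an empty sequence is true.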

The access method can report that the index is lossy, or requires rechecks, for a particular query. This implies that the index scan will return all the entries that pass the scan key, plus possibly additional entries that do not. The core system’s index-scan machinery will then apply the index conditions again to the heap tuple to verify whether or not it really should be selected. If the recheck option is not specified, the index scan must return exactly the set of matching entries.
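
The recheck behavior can be pictured with a small Python model. Everything here is an assumption made for illustration: the heap is a dictionary from TID to value, the "lossy" index keeps only value // 10 instead of the value itself, and lossy_scan and scan_with_recheck are hypothetical helpers, not PostgreSQL functions.

```python
# Toy heap: TID -> indexed column value.
heap = {1: 3, 2: 14, 3: 17, 4: 25}

# Lossy index: stores only a coarse bucket (value // 10) for each TID.
lossy_index = [(value // 10, tid) for tid, value in heap.items()]


def lossy_scan(lower_bound):
    """Return every TID that might satisfy value > lower_bound.

    Working on buckets alone can only yield a superset of the true matches.
    """
    return [tid for bucket, tid in lossy_index if bucket >= lower_bound // 10]


def scan_with_recheck(lower_bound):
    """Re-apply the original condition to the heap value of each candidate."""
    return [tid for tid in lossy_scan(lower_bound) if heap[tid] > lower_bound]


candidates = lossy_scan(15)        # includes TID 2, whose value 14 does not match
result = scan_with_recheck(15)     # the recheck removes the false positive
```

The sketch never misses a true match, because value > lower_bound implies value // 10 >= lower_bound // 10; it only over-reports, which is exactly what the recheck is there to correct.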

Note that it is entirely up to the access method to ensure that it correctly finds all and only the entries passing all the given scan keys. Also, the core system will simply hand off all the WHERE clauses that match the index keys and operator families, without any semantic analysis to determine whether they are redundant or contradictory. As an example, given WHERE x > 4 AND x > 14 where x is a b-tree indexed column, it is left to the b-tree amrescan function to realize that the first scan key is redundant and can be discarded. The extent of preprocessing needed during amrescan will depend on the extent to which the index access method needs to reduce the scan keys to a “normalized” form.
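
The kind of preprocessing left to amrescan can be sketched as follows. This is a simplified illustration and not the b-tree implementation: it assumes a single column, only the > and < operators, and a hypothetical normalize helper.

```python
def normalize(keys):
    """Reduce (operator, constant) keys on one column to the tightest bounds.

    Returns (lower, upper, satisfiable). lower and upper are exclusive bounds,
    or None when that side is unbounded. satisfiable is False only when the
    bounds plainly contradict each other.
    """
    lower = upper = None
    for op, const in keys:
        if op == ">":
            # Among several lower bounds, only the largest one matters.
            lower = const if lower is None else max(lower, const)
        elif op == "<":
            # Among several upper bounds, only the smallest one matters.
            upper = const if upper is None else min(upper, const)
        else:
            raise ValueError(f"unsupported operator: {op}")
    satisfiable = lower is None or upper is None or lower < upper
    return lower, upper, satisfiable


# WHERE x > 4 AND x > 14: the first key is redundant.
redundant = normalize([(">", 4), (">", 14)])

# WHERE x > 14 AND x < 4: the keys contradict each other.
contradictory = normalize([(">", 14), ("<", 4)])
```

In this model a contradictory set of keys lets the scan finish immediately without visiting any index entries.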

Some access methods return index entries in a well-defined order, others do not. There are actually two different ways that an access method can support sorted output:

Access methods that always return entries in the natural ordering of their data (such as btree) should set amcanorder to true.

Access methods that support ordering operators should set amcanorderbyop to true. This indicates that the index is capable of returning entries in an order satisfying ORDER BY index_key operator constant.

The amgettuple function has a direction argument, which can be either ForwardScanDirection (the normal case) or BackwardScanDirection. If the first call after amrescan specifies BackwardScanDirection, then the set of matching index entries is to be scanned back-to-front rather than in the normal front-to-back direction, so amgettuple must return the last matching tuple in the index, rather than the first one as it normally would. (This will only occur for access methods that set amcanorder to true.) After the first call, amgettuple must be prepared to advance the scan in either direction from the most recently returned entry. (But if amcanbackward is false, all subsequent calls will have the same direction as the first one.)
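
The direction rules can be modeled in a few lines of Python. ToyOrderedScan and Direction are hypothetical names for this sketch; the list of matching TIDs stands in for the index, and the model assumes an access method that can move in either direction.

```python
from enum import Enum


class Direction(Enum):
    FORWARD = 1
    BACKWARD = -1


class ToyOrderedScan:
    def __init__(self, matching_tids):
        self.tids = list(matching_tids)  # matching entries, in index order
        self.pos = None                  # None until the first call after rescan

    def rescan(self):
        self.pos = None

    def gettuple(self, direction):
        if self.pos is None:
            # First call: start just before the first entry, or just after
            # the last one, depending on the requested direction.
            self.pos = -1 if direction is Direction.FORWARD else len(self.tids)
        # Step from the most recently returned entry; stay within one slot
        # past either end so the scan can turn around later.
        step = self.pos + direction.value
        self.pos = max(-1, min(len(self.tids), step))
        if 0 <= self.pos < len(self.tids):
            return self.tids[self.pos]
        return None                      # no more entries in that direction


scan = ToyOrderedScan(["t1", "t2", "t3"])
first = scan.gettuple(Direction.BACKWARD)   # last matching entry
second = scan.gettuple(Direction.BACKWARD)  # the one before it
third = scan.gettuple(Direction.FORWARD)    # direction reversed mid-scan
```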

Access methods that support ordered scans must support “marking” a position in a scan and later returning to the marked position. The same position might be restored multiple times. However, only one position need be remembered per scan; a new ammarkpos call overrides the previously marked position. An access method that does not support ordered scans need not provide ammarkpos and amrestrpos functions in IndexAmRoutine; set those pointers to NULL instead.
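
A minimal sketch of the mark and restore contract, assuming a forward-only toy scan over a list of TIDs. ToyMarkableScan and its method names are illustrative only and merely echo ammarkpos and amrestrpos.

```python
class ToyMarkableScan:
    def __init__(self, tids):
        self.tids = list(tids)
        self.pos = -1        # index of the most recently returned entry
        self.mark = None     # the single remembered position

    def gettuple(self):
        if self.pos + 1 >= len(self.tids):
            return None
        self.pos += 1
        return self.tids[self.pos]

    def markpos(self):
        # Only one mark is kept; a new call overrides the previous one.
        self.mark = self.pos

    def restrpos(self):
        if self.mark is None:
            raise RuntimeError("no position has been marked")
        # The mark itself is left untouched, so it can be restored again.
        self.pos = self.mark


scan = ToyMarkableScan([10, 20, 30, 40])
scan.gettuple()                  # returns 10
scan.markpos()                   # remember the position at entry 10
scan.gettuple()                  # returns 20
scan.gettuple()                  # returns 30
scan.restrpos()
after_first_restore = scan.gettuple()
scan.restrpos()                  # the same mark can be restored repeatedly
after_second_restore = scan.gettuple()
```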

Both the scan position and the mark position (if any) must be maintained consistently in the face of concurrent insertions or deletions in the index. It is OK if a freshly-inserted entry is not returned by a scan that would have found the entry if it had existed when the scan started, or for the scan to return such an entry upon rescanning or backing up even though it had not been returned the first time through. Similarly, a concurrent delete might or might not be reflected in the results of a scan. What is important is that insertions or deletions not cause the scan to miss or multiply return entries that were not themselves being inserted or deleted.

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data, not just the TID of the heap tuple. This will only avoid I/O if the visibility map shows that the TID is on an all-visible page; else the heap tuple must be visited anyway to check MVCC visibility. But that is no concern of the access method’s.
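
To make the role of the visibility map concrete, here is a toy Python model of the decision the core system makes for each index entry. The data layout is an assumption for illustration: TIDs are (page, offset) pairs, the visibility map is a dictionary from page number to an all-visible flag, and each heap entry carries a precomputed "visible to this snapshot" flag. index_only_scan is a hypothetical helper.

```python
def index_only_scan(index_entries, visibility_map, heap):
    """Return (rows, heap_fetches) for a toy index-only scan."""
    rows, heap_fetches = [], 0
    for value, tid in index_entries:
        page, _offset = tid
        if visibility_map.get(page, False):
            # All-visible page: answer from the index, no heap access needed.
            rows.append(value)
        else:
            # Otherwise the heap tuple must be visited to check visibility.
            heap_fetches += 1
            _heap_value, visible = heap[tid]
            if visible:
                rows.append(value)
    return rows, heap_fetches


index_entries = [("a", (0, 1)), ("b", (0, 2)), ("c", (1, 1)), ("d", (1, 2))]
visibility_map = {0: True, 1: False}
heap = {
    (0, 1): ("a", True),
    (0, 2): ("b", True),
    (1, 1): ("c", False),   # not visible to this snapshot
    (1, 2): ("d", True),
}
rows, heap_fetches = index_only_scan(index_entries, visibility_map, heap)
```

In this model the two entries on page 0 are returned without touching the heap, while both entries on page 1 cost a heap visit.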

Instead of using amgettuple, an index scan can be done with amgetbitmap to fetch all tuples in one call. This can be noticeably more efficient than amgettuple because it allows avoiding lock/unlock cycles within the access method. In principle amgetbitmap should have the same effects as repeated amgettuple calls, but we impose several restrictions to simplify matters. First of all, amgetbitmap returns all tuples at once and marking or restoring scan positions isn’t supported. Secondly, the tuples are returned in a bitmap which doesn’t have any specific ordering, which is why amgetbitmap doesn’t take a direction argument. (Ordering operators will never be supplied for such a scan, either.) Also, there is no provision for index-only scans with amgetbitmap, since there is no way to return the contents of index tuples. Finally, amgetbitmap does not guarantee any locking of the returned tuples, with implications spelled out in Section 64.4.
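
The relationship between the two interfaces can be sketched in Python. Both functions below are hypothetical stand-ins: a generator plays the role of repeated amgettuple calls, and a plain set plays the role of the bitmap, which has no ordering and carries no index tuple contents.

```python
def gettuple_iter(index_entries, predicate):
    """Yield matching TIDs one at a time, in index order."""
    for key, tid in index_entries:
        if predicate(key):
            yield tid


def getbitmap(index_entries, predicate):
    """Collect all matching TIDs in one call, as an unordered set."""
    return {tid for key, tid in index_entries if predicate(key)}


def in_range(key):
    return 5 <= key <= 10


index_entries = [(3, "t7"), (8, "t2"), (8, "t9"), (15, "t4")]

one_at_a_time = list(gettuple_iter(index_entries, in_range))  # ordered
all_at_once = getbitmap(index_entries, in_range)              # unordered
```

Both calls identify the same tuples; only the ordering and the per-tuple round trips differ in this model.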

Note that it is permitted for an access method to implement only amgetbitmap and not amgettuple, or vice versa, if its internal implementation is unsuited to one API or the other.