Postgresql 中文操作指南
69.4. Implementation #
本部分涵盖 SP-GiST 运算符类实施者了解的有用实现详细信息和其他技巧。
This section covers implementation details and other tricks that are useful for implementers of SP-GiST operator classes to know.
69.4.1. SP-GiST Limits #
单独的叶元组和内部元组必须适合单个索引页(默认为 8kB)。因此,在为可变长度数据类型的值编制索引时,长值只能通过诸如基数树的方法支持,其中树的每一层都包含一个前缀,其足够短以适合一页,并且最终叶层包含一个后缀,也足够短以适合一页。只有当运算符类准备好安排这种情况发生时,才应将 longValuesOK 设置为真。否则,SP-GiST 核心将拒绝任何索引太大而无法放入索引页的值的请求。
Individual leaf tuples and inner tuples must fit on a single index page (8kB by default). Therefore, when indexing values of variable-length data types, long values can only be supported by methods such as radix trees, in which each level of the tree includes a prefix that is short enough to fit on a page, and the final leaf level includes a suffix also short enough to fit on a page. The operator class should set longValuesOK to true only if it is prepared to arrange for this to happen. Otherwise, the SP-GiST core will reject any request to index a value that is too large to fit on an index page.
同样,内部元组不增长为太大而无法放入索引页是运算符类的责任;这限制了可在单个内部元组中使用的子节点数,以及前缀值的长度上限。
Likewise, it is the operator class’s responsibility that inner tuples do not grow too large to fit on an index page; this limits the number of child nodes that can be used in one inner tuple, as well as the maximum size of a prefix value.
另一个限制是,当内元组的节点指向一组叶元组时,那些元组都必须位于同一索引页面中。(这是一项设计决策,旨在减少寻址和节省串联这些元组的链接中的空间。)如果叶元组的集合增长到超过一个页面,则会执行拆分,并插入一个中间内元组。为了解决这个问题,新的内元组 _must_会将叶值的集合划分为多个节点组。如果运算符类的 _picksplit_函数无法执行此操作,则 SP-GiST 核心将诉诸 Section 69.4.3中描述的非常规措施。
Another limitation is that when an inner tuple’s node points to a set of leaf tuples, those tuples must all be in the same index page. (This is a design decision to reduce seeking and save space in the links that chain such tuples together.) If the set of leaf tuples grows too large for a page, a split is performed and an intermediate inner tuple is inserted. For this to fix the problem, the new inner tuple must divide the set of leaf values into more than one node group. If the operator class’s picksplit function fails to do that, the SP-GiST core resorts to extraordinary measures described in Section 69.4.3.
当 longValuesOK 为真时,预计 SP-GiST 树的连续层级会将更多信息吸收进内部元组的前缀和节点标签中,使所需的叶数据越来越小,直到最终可以放入一页中。为了防止运算符类中的错误导致无限插入循环,如果叶数据在十个 choose 个方法调用周期内没有变小,则 SP-GiST 核心将引发错误。
When longValuesOK is true, it is expected that successive levels of the SP-GiST tree will absorb more and more information into the prefixes and node labels of the inner tuples, making the required leaf datum smaller and smaller, so that eventually it will fit on a page. To prevent bugs in operator classes from causing infinite insertion loops, the SP-GiST core will raise an error if the leaf datum does not become any smaller within ten cycles of choose method calls.
69.4.2. SP-GiST Without Node Labels #
一些树算法为每个内部元组使用固定的节点集;例如,在四叉树中,始终有恰好四个节点对应于内部元组质心点周围的四个象限。在这种情况下,代码通常按编号处理节点,并且不需要明确的节点标签。为了取消节点标签(从而节省一些空间),picksplit 函数可以为 nodeLabels 数组返回 NULL,同样,choose 函数可以在 spgSplitTuple 操作期间返回 NULL 以获得 prefixNodeLabels 数组。这将导致 nodeLabels 在对 choose 和 inner_consistent 的后续调用期间变为 NULL。原则上,节点标签可以用于某些内部元组,也可以在同一个索引中省略其他内部元组的节点标签。
Some tree algorithms use a fixed set of nodes for each inner tuple; for example, in a quad-tree there are always exactly four nodes corresponding to the four quadrants around the inner tuple’s centroid point. In such a case the code typically works with the nodes by number, and there is no need for explicit node labels. To suppress node labels (and thereby save some space), the picksplit function can return NULL for the nodeLabels array, and likewise the choose function can return NULL for the prefixNodeLabels array during a spgSplitTuple action. This will in turn result in nodeLabels being NULL during subsequent calls to choose and inner_consistent. In principle, node labels could be used for some inner tuples and omitted for others in the same index.
在处理具有未标记节点的内部元组时,对于 choose 返回 spgAddNode 是一个错误,因为在这种情况下节点集应该是固定的。
When working with an inner tuple having unlabeled nodes, it is an error for choose to return spgAddNode, since the set of nodes is supposed to be fixed in such cases.
69.4.3. “All-the-Same” Inner Tuples #
当 picksplit 无法将提供的叶值划分为至少两个节点类别时,SP-GiST 核心可以覆盖运算符类 picksplit 函数的结果。当发生这种情况时,新内部元组将创建具有多个节点,每个节点都具有相同的标签(如果有)picksplit 给它使用的那个节点,并且叶值在这些等价节点之间随机分配。allTheSame 标记设置在内部元组上,以警告 choose 和 inner_consistent 函数,该元组没有它们可能期望的节点集。
The SP-GiST core can override the results of the operator class’s picksplit function when picksplit fails to divide the supplied leaf values into at least two node categories. When this happens, the new inner tuple is created with multiple nodes that each have the same label (if any) that picksplit gave to the one node it did use, and the leaf values are divided at random among these equivalent nodes. The allTheSame flag is set on the inner tuple to warn the choose and inner_consistent functions that the tuple does not have the node set that they might otherwise expect.
在处理 allTheSame 元组时,spgMatchNode 的 choose 结果被解释为新值可以分配给任何等价节点;核心代码将忽略提供的 nodeN 值并随机进入一个节点(以保持树平衡)。对于 choose 返回 spgAddNode 是一个错误,因为那样会导致节点不完全相等;如果要插入的值与现有节点不匹配,则必须使用 spgSplitTuple 操作。
When dealing with an allTheSame tuple, a choose result of spgMatchNode is interpreted to mean that the new value can be assigned to any of the equivalent nodes; the core code will ignore the supplied nodeN value and descend into one of the nodes at random (so as to keep the tree balanced). It is an error for choose to return spgAddNode, since that would make the nodes not all equivalent; the spgSplitTuple action must be used if the value to be inserted doesn’t match the existing nodes.
在处理 allTheSame 元组时,inner_consistent 函数应返回全部或无节点作为继续索引搜索的目标,因为它们是等价的。这可能需要或不需要任何特殊情况代码,具体取决于 inner_consistent 函数通常对节点的意义有多少假设。
When dealing with an allTheSame tuple, the inner_consistent function should return either all or none of the nodes as targets for continuing the index search, since they are all equivalent. This may or may not require any special-case code, depending on how much the inner_consistent function normally assumes about the meaning of the nodes.