PostgreSQL Chinese Operation Guide

68.4. Implementation

68.4.1. GiST Index Build Methods

The simplest way to build a GiST index is just to insert all the entries, one by one. This tends to be slow for large indexes, because if the index tuples are scattered across the index and the index is large enough to not fit in cache, a lot of random I/O will be needed. PostgreSQL supports two alternative methods for initial build of a GiST index: sorted and buffered modes.

The sorted method is only available if each of the opclasses used by the index provides a sortsupport function, as described in Section 68.3. If they do, this method is usually the best, so it is used by default.
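One way to see which GiST operator families could qualify for the sorted method is to look for a registered sortsupport function in the system catalogs. The query below is a sketch that assumes the sortsupport function occupies GiST support function number 11; verify that number against the support function list in Section 68.3 for your PostgreSQL version before relying on it.

```sql
-- List GiST operator families that register a support function in slot 11.
-- Assumption: slot 11 is the sortsupport function for GiST.
SELECT opf.opfname AS operator_family,
       p.amproclefttype::regtype AS input_type,
       p.amproc AS support_function
FROM pg_amproc AS p
JOIN pg_opfamily AS opf ON opf.oid = p.amprocfamily
JOIN pg_am AS am ON am.oid = opf.opfmethod
WHERE am.amname = 'gist'
  AND p.amprocnum = 11
ORDER BY opf.opfname;
```

An operator family that does not appear in the result provides no sortsupport function, so an index using it cannot be built with the sorted method.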

The buffered method does not insert each tuple directly into the index right away; instead, tuples are held in buffers and inserted in batches. This can dramatically reduce the amount of random I/O needed for non-ordered data sets. For well-ordered data sets the benefit is smaller or non-existent, because only a small number of pages receive new tuples at a time, and those pages fit in cache even if the index as a whole does not.

The buffered method needs to call the penalty function more often than the simple method does, which consumes some extra CPU resources. Also, the buffers need temporary disk space, up to the size of the resulting index. Buffering can also influence the quality of the resulting index, in both positive and negative directions. That influence depends on various factors, like the distribution of the input data and the operator class implementation.

If sorting is not possible, then by default a GiST index build switches to the buffering method when the index size reaches effective_cache_size. Buffering can be manually forced or prevented by the buffering parameter to the CREATE INDEX command. The default behavior is good for most cases, but turning buffering off might speed up the build somewhat if the input data is ordered.
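As an illustration of the buffering parameter, the statements below build the same index three times with each accepted setting. The table points, its column p, and the index names are hypothetical and exist only for this sketch; the parameter values on, off, and auto are those accepted by CREATE INDEX for GiST indexes.

```sql
-- Hypothetical test table with unordered two-dimensional points.
CREATE TABLE points (id bigint, p point);
INSERT INTO points
SELECT g, point(random(), random())
FROM generate_series(1, 100000) AS g;

-- The threshold that the default behavior compares the index size against.
SHOW effective_cache_size;

-- Default behavior: the build method is chosen automatically.
CREATE INDEX points_p_auto ON points USING gist (p) WITH (buffering = auto);

-- Force the buffered method from the start of the build.
CREATE INDEX points_p_buffered ON points USING gist (p) WITH (buffering = on);

-- Prevent buffering, for example when the input is known to be well ordered.
CREATE INDEX points_p_unbuffered ON points USING gist (p) WITH (buffering = off);
```

Timing each CREATE INDEX (for example with \timing in psql) on a realistically sized data set is the most direct way to judge whether overriding the default is worthwhile.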