Postgresql 中文操作指南

71.1. Introduction #

BRIN 表示块范围索引。BRIN 旨在处理具有非常大的表的表,其中某些列与它们在表内的物理位置具有一定的自然相关性。

BRIN stands for Block Range Index. BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table.

BRIN 在 block ranges (或“页面范围”)方面起作用。块范围是表中物理上相邻的一组页面;针对每个块范围,索引会存储一些摘要信息。例如,存储某个商店的销售订单的表可能有一个日期列,每个订单都会在此日期列上放置,并且大多数情况下,早期订单的条目也会早些显示在表中;存储邮政编码列的表可能将某一城市的邮政编码组合在一起。

BRIN works in terms of block ranges (or “page ranges”). A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index. For example, a table storing a store’s sale orders might have a date column on which each order was placed, and most of the time the entries for earlier orders will appear earlier in the table as well; a table storing a ZIP code column might have all codes for a city grouped together naturally.

BRIN 索引可以通过常规位图索引扫描来满足查询,并将返回每个范围内的所有页面中的所有元组(如果索引存储的摘要信息 consistent 满足查询条件)。查询执行器负责重新检查这些元组并丢弃不满足查询条件的元组——换句话说,这些索引是有损的。由于 BRIN 索引非常小,因此与顺序扫描相比,扫描索引几乎没有开销,但可以避免扫描已知不包含匹配元组的大部分表。

BRIN indexes can satisfy queries via regular bitmap index scans, and will return all tuples in all pages within each range if the summary info stored by the index is consistent with the query conditions. The query executor is in charge of rechecking these tuples and discarding those that do not match the query conditions — in other words, these indexes are lossy. Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples.

BRIN 索引将存储的具体数据以及索引将能够满足的具体查询取决于为索引的每一列选择的运算符类。具有线性排序顺序的数据类型可以具有运算符类,例如,这些运算符类存储每个块范围内的最小值和最大值;几何类型可以存储块范围中所有对象的边界框。

The specific data that a BRIN index will store, as well as the specific queries that the index will be able to satisfy, depend on the operator class selected for each column of the index. Data types having a linear sort order can have operator classes that store the minimum and maximum value within each block range, for instance; geometrical types might store the bounding box for all the objects in the block range.

块范围的大小是在索引创建时由 pages_per_range 存储参数确定的。索引条目的数量将等于页面中的关系大小除以为 pages_per_range 选择的值。因此,数字越小,索引大小越大(因为需要存储更多索引条目),但同时,存储的摘要数据可以更精确,在索引扫描期间可以跳过更多数据块。

The size of the block range is determined at index creation time by the pages_per_range storage parameter. The number of index entries will be equal to the size of the relation in pages divided by the selected value for pages_per_range. Therefore, the smaller the number, the larger the index becomes (because of the need to store more index entries), but at the same time the summary data stored can be more precise and more data blocks can be skipped during an index scan.

71.1.1. Index Maintenance #

在创建时,扫描所有现有的堆页面,并针对每个范围(包括末尾可能不完整的范围)创建一个摘要索引元组。当新页面填满数据时,已经过摘要的页面范围将导致摘要信息使用新元组的数据进行更新。在创建不属于上次总结的范围的新页面时,新页面所属的范围不会自动获取摘要元组;这些元组将保持未总结状态,直到以后调用总结运行,为该范围创建初始总结。

At the time of creation, all existing heap pages are scanned and a summary index tuple is created for each range, including the possibly-incomplete range at the end. As new pages are filled with data, page ranges that are already summarized will cause the summary information to be updated with data from the new tuples. When a new page is created that does not fall within the last summarized range, the range that the new page belongs to does not automatically acquire a summary tuple; those tuples remain unsummarized until a summarization run is invoked later, creating the initial summary for that range.

有几种方法可以触发页面范围的初始汇总。如果表是真空的,无论是手动还是通过 autovacuum,所有现有的未汇总页面范围都会汇总。此外,如果启用了索引的 autosummarize参数(默认情况下没有启用),那么无论表本身是否由自动真空处理,只要在该数据库中运行自动真空,就会汇总所有已填满的未汇总页面范围;请参见下文。

There are several ways to trigger the initial summarization of a page range. If the table is vacuumed, either manually or by autovacuum, all existing unsummarized page ranges are summarized. Also, if the index’s autosummarize parameter is enabled, which it isn’t by default, whenever autovacuum runs in that database, summarization will occur for all unsummarized page ranges that have been filled, regardless of whether the table itself is processed by autovacuum; see below.

最后,可以使用以下功能:

Lastly, the following functions can be used:

当启用自动总结时,当检测到对下一个块范围的第一个页面的第一个项目进行插入时,请求将被发送到 autovacuum ,以便针对块范围执行定向总结,以在下一次自动清理工作器在同一数据库中完成运行时完成。如果请求队列已满,则不会记录请求,并且会向服务器日志发送一条消息:

When autosummarization is enabled, a request is sent to autovacuum to execute a targeted summarization for a block range when an insertion is detected for the first item of the first page of the next block range, to be fulfilled the next time an autovacuum worker finishes running in the same database. If the request queue is full, the request is not recorded and a message is sent to the server log:

LOG:  request for BRIN range summarization for index "brin_wi_idx" page 128 was not recorded

发生这种情况时,该范围将保持未总结状态,直到在表上执行下一次定期清理运行,或调用上面提到的函数之一。

When this happens, the range will remain unsummarized until the next regular vacuum run on the table, or one of the functions mentioned above are invoked.

相反,可以使用 _brin_desummarize_range(regclass, bigint)_函数对范围进行取消汇总,这在索引元组不再是很好的表示由于现有值已更改时很有用。有关详细信息,请参见 Section 9.27.8

Conversely, a range can be de-summarized using the brin_desummarize_range(regclass, bigint) function, which is useful when the index tuple is no longer a very good representation because the existing values have changed. See Section 9.27.8 for details.