PostgreSQL Chinese Operation Guide
64.1. Basic API Structure for Indexes
Each index access method is described by a row in the pg_am system catalog. The pg_am entry specifies a name and a handler function for the index access method. These entries can be created and deleted using the CREATE ACCESS METHOD and DROP ACCESS METHOD SQL commands.
An index access method handler function must be declared to accept a single argument of type internal and to return the pseudo-type index_am_handler. The argument is a dummy value that simply serves to prevent handler functions from being called directly from SQL commands. The result of the function must be a palloc’d struct of type IndexAmRoutine, which contains everything that the core code needs to know to make use of the index access method. The IndexAmRoutine struct, also called the access method’s API struct, includes fields specifying assorted fixed properties of the access method, such as whether it can support multicolumn indexes. More importantly, it contains pointers to support functions for the access method, which do all of the real work to access indexes. These support functions are plain C functions and are not visible or callable at the SQL level. The support functions are described in Section 64.2.
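A real handler can only be compiled inside the server. In an extension it would typically allocate the struct with makeNode(IndexAmRoutine), fill it in, and return it with PG_RETURN_POINTER. The standalone sketch below mimics that pattern so it can run anywhere. MiniAmRoutine, myam_handler, and myam_gettuple are hypothetical names. MiniAmRoutine borrows a few real field names but is not the real struct or its layout, and calloc stands in for palloc.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily trimmed stand-in for IndexAmRoutine: a few of the
 * real field names, but NOT the real struct or its layout. */
typedef struct MiniAmRoutine
{
    uint16_t amstrategies;          /* number of operator strategies */
    uint16_t amsupport;             /* number of support functions */
    bool     amcanorder;
    bool     amcanunique;
    bool     amcanmulticol;
    bool     amoptionalkey;
    bool   (*amgettuple)(void *scan);   /* support function, may be NULL */
} MiniAmRoutine;

/* Hypothetical support function; a real one would advance an index scan. */
static bool
myam_gettuple(void *scan)
{
    (void) scan;
    return false;                   /* "no more tuples" */
}

/* Hypothetical handler. The dummy argument plays the role of the
 * 'internal' argument of a real handler: it is never inspected.
 * calloc stands in for palloc/makeNode, which exist only in the server. */
MiniAmRoutine *
myam_handler(void *dummy)
{
    (void) dummy;
    MiniAmRoutine *amroutine = calloc(1, sizeof(MiniAmRoutine));
    if (amroutine == NULL)
        return NULL;

    /* Fixed properties of the access method. */
    amroutine->amstrategies = 5;    /* e.g. <, <=, =, >=, > */
    amroutine->amsupport = 1;
    amroutine->amcanorder = true;
    amroutine->amcanunique = false;
    amroutine->amcanmulticol = true;
    amroutine->amoptionalkey = true;

    /* Pointers to the support functions that do the real work. */
    amroutine->amgettuple = myam_gettuple;
    return amroutine;
}
```

The caller gets back one heap-allocated struct that holds both the static properties and the entry points, which is why the core needs nothing else to drive the access method.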
The structure IndexAmRoutine is defined thus:
```c
typedef struct IndexAmRoutine
{
    NodeTag type;
    /*
     * Total number of strategies (operators) by which we can traverse/search
     * this AM. Zero if AM does not have a fixed set of strategy assignments.
     */
    uint16 amstrategies;
    /* total number of support functions that this AM uses */
    uint16 amsupport;
    /* opclass options support function number or 0 */
    uint16 amoptsprocnum;
    /* does AM support ORDER BY indexed column's value? */
    bool amcanorder;
    /* does AM support ORDER BY result of an operator on indexed column? */
    bool amcanorderbyop;
    /* does AM support backward scanning? */
    bool amcanbackward;
    /* does AM support UNIQUE indexes? */
    bool amcanunique;
    /* does AM support multi-column indexes? */
    bool amcanmulticol;
    /* does AM require scans to have a constraint on the first index column? */
    bool amoptionalkey;
    /* does AM handle ScalarArrayOpExpr quals? */
    bool amsearcharray;
    /* does AM handle IS NULL/IS NOT NULL quals? */
    bool amsearchnulls;
    /* can index storage data type differ from column data type? */
    bool amstorage;
    /* can an index of this type be clustered on? */
    bool amclusterable;
    /* does AM handle predicate locks? */
    bool ampredlocks;
    /* does AM support parallel scan? */
    bool amcanparallel;
    /* does AM support columns included with clause INCLUDE? */
    bool amcaninclude;
    /* does AM use maintenance_work_mem? */
    bool amusemaintenanceworkmem;
    /* does AM summarize tuples, with at least all tuples in the block
     * summarized in one summary */
    bool amsummarizing;
    /* OR of parallel vacuum flags */
    uint8 amparallelvacuumoptions;
    /* type of data stored in index, or InvalidOid if variable */
    Oid amkeytype;

    /* interface functions */
    ambuild_function ambuild;
    ambuildempty_function ambuildempty;
    aminsert_function aminsert;
    ambulkdelete_function ambulkdelete;
    amvacuumcleanup_function amvacuumcleanup;
    amcanreturn_function amcanreturn;            /* can be NULL */
    amcostestimate_function amcostestimate;
    amoptions_function amoptions;
    amproperty_function amproperty;              /* can be NULL */
    ambuildphasename_function ambuildphasename;  /* can be NULL */
    amvalidate_function amvalidate;
    amadjustmembers_function amadjustmembers;    /* can be NULL */
    ambeginscan_function ambeginscan;
    amrescan_function amrescan;
    amgettuple_function amgettuple;              /* can be NULL */
    amgetbitmap_function amgetbitmap;            /* can be NULL */
    amendscan_function amendscan;
    ammarkpos_function ammarkpos;                /* can be NULL */
    amrestrpos_function amrestrpos;              /* can be NULL */

    /* interface functions to support parallel index scans */
    amestimateparallelscan_function amestimateparallelscan;  /* can be NULL */
    aminitparallelscan_function aminitparallelscan;          /* can be NULL */
    amparallelrescan_function amparallelrescan;              /* can be NULL */
} IndexAmRoutine;
```
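The slots marked "can be NULL" let an access method opt out of features. GIN and BRIN, for example, leave amgettuple NULL and so support only bitmap scans. A consumer can therefore infer capabilities from which pointers are set. PostgreSQL's planner does something along these lines when it records whether an index has amgettuple and amgetbitmap, although the real server logic checks more conditions. The standalone sketch below shows the idea with hypothetical types (OptionalCallbacks, ScanCapabilities) and stub callbacks. It is not server code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the callback slots marked "can be NULL". */
typedef struct OptionalCallbacks
{
    bool (*amgettuple)(void *scan);                 /* plain index scans */
    long (*amgetbitmap)(void *scan, void *bitmap);  /* bitmap scans */
    bool (*amcanreturn)(void *index, int attno);    /* index-only scans */
} OptionalCallbacks;

typedef struct ScanCapabilities
{
    bool plain_scan;
    bool bitmap_scan;
    bool index_only_scan;   /* checked for attribute 1 only, in this sketch */
} ScanCapabilities;

/* Derive what the core may do with the index purely from which optional
 * callbacks the access method filled in. */
ScanCapabilities
scan_capabilities(const OptionalCallbacks *cb)
{
    ScanCapabilities caps;

    caps.plain_scan = (cb->amgettuple != NULL);
    caps.bitmap_scan = (cb->amgetbitmap != NULL);
    /* Index-only scans are driven through amgettuple, and a NULL
     * amcanreturn means "never"; otherwise ask the access method. */
    caps.index_only_scan = caps.plain_scan &&
        cb->amcanreturn != NULL && cb->amcanreturn(NULL, 1);
    return caps;
}

/* Hypothetical stub callbacks for filling in an OptionalCallbacks. */
static bool
stub_gettuple(void *scan)
{
    (void) scan;
    return false;               /* no more tuples */
}

static long
stub_getbitmap(void *scan, void *bitmap)
{
    (void) scan;
    (void) bitmap;
    return 0;                   /* number of tuples added to the bitmap */
}

static bool
stub_canreturn(void *index, int attno)
{
    (void) index;
    (void) attno;
    return true;                /* the stored key can be returned as-is */
}
```

A bitmap-only access method would be described as { NULL, stub_getbitmap, NULL }. scan_capabilities then reports that only bitmap scans are possible.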
To be useful, an index access method must also have one or more operator families and operator classes defined in pg_opfamily, pg_opclass, pg_amop, and pg_amproc. These entries allow the planner to determine what kinds of query qualifications can be used with indexes of this access method. Operator families and classes are described in Section 38.16, which is prerequisite material for reading this chapter.
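Here is a rough illustration of how pg_amop-style entries tell the planner which operators an index can use. The sketch flattens one operator family into an array of (operator name, strategy number) pairs, using the standard B-tree strategy numbers: 1 for <, 2 for <=, 3 for =, 4 for >=, 5 for >. Real catalogs identify operators by OID and also match operand types. The names int4_btree_family, lookup_strategy, and qual_is_indexable are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened stand-in for the pg_amop rows of one operator
 * family: which operator implements which strategy number. */
typedef struct AmopEntry
{
    const char *oprname;
    int         strategy;
} AmopEntry;

/* B-tree strategy numbers: 1 <, 2 <=, 3 =, 4 >=, 5 >. */
static const AmopEntry int4_btree_family[] = {
    {"<", 1}, {"<=", 2}, {"=", 3}, {">=", 4}, {">", 5},
};

/* Returns the strategy number of 'op' within the family, or 0 if the
 * operator is not a member (a qual using it cannot use this index). */
int
lookup_strategy(const AmopEntry *family, size_t n, const char *op)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(family[i].oprname, op) == 0)
            return family[i].strategy;
    return 0;
}

/* Is "indexed_column <op> constant" usable with this index? */
bool
qual_is_indexable(const char *op)
{
    size_t n = sizeof(int4_btree_family) / sizeof(int4_btree_family[0]);
    return lookup_strategy(int4_btree_family, n, op) != 0;
}
```

Under this model, WHERE x >= 10 maps to strategy 4 and is indexable, while WHERE x <> 10 finds no entry and has to be evaluated some other way.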
An individual index is defined by a pg_class entry that describes it as a physical relation, plus a pg_index entry that shows the logical content of the index — that is, the set of index columns it has and the semantics of those columns, as captured by the associated operator classes. The index columns (key values) can be either simple columns of the underlying table or expressions over the table rows. The index access method normally has no interest in where the index key values come from (it is always handed precomputed key values) but it will be very interested in the operator class information in pg_index. Both of these catalog entries can be accessed as part of the Relation data structure that is passed to all operations on the index.
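The point that the access method "is always handed precomputed key values" can be sketched as follows. The core evaluates plain columns and index expressions first, then passes only a values array and a parallel null-flag array to the insert callback. The real aminsert callback does receive values/isnull arrays, along with further arguments described in Section 64.2. Everything below (Row, InsertCall, myam_insert, core_index_row, and the index on (a, (a + b))) is a hypothetical, simplified model.

```c
#include <stdbool.h>

#define MAX_KEYS 4

/* Hypothetical heap row with two nullable integer columns. */
typedef struct Row
{
    int  a, b;
    bool a_null, b_null;
} Row;

/* Hypothetical record of what the access method last received. */
typedef struct InsertCall
{
    int  values[MAX_KEYS];
    bool isnull[MAX_KEYS];
    int  nkeys;
} InsertCall;

static InsertCall last_call;

/* The access method's insert callback: it sees only precomputed values and
 * null flags, never the row or the expressions that produced them. */
static void
myam_insert(const int *values, const bool *isnull, int nkeys)
{
    last_call.nkeys = nkeys;
    for (int i = 0; i < nkeys; i++)
    {
        last_call.values[i] = values[i];
        last_call.isnull[i] = isnull[i];
    }
}

/* Core side, for a hypothetical index on (a, (a + b)): column 1 is a plain
 * column, column 2 is an expression evaluated before the AM is called. */
void
core_index_row(const Row *row)
{
    int  values[2];
    bool isnull[2];

    values[0] = row->a;
    isnull[0] = row->a_null;

    /* a + b is NULL if either input is NULL, as for a strict operator. */
    isnull[1] = row->a_null || row->b_null;
    values[1] = isnull[1] ? 0 : row->a + row->b;

    myam_insert(values, isnull, 2);
}
```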
Some of the flag fields of IndexAmRoutine have nonobvious implications. The requirements of amcanunique are discussed in Section 64.5. The amcanmulticol flag asserts that the access method supports multi-key-column indexes, while amoptionalkey asserts that it allows scans where no indexable restriction clause is given for the first index column. When amcanmulticol is false, amoptionalkey essentially says whether the access method supports full-index scans without any restriction clause. Access methods that support multiple index columns must support scans that omit restrictions on any or all of the columns after the first; however they are permitted to require some restriction to appear for the first index column, and this is signaled by setting amoptionalkey false. One reason that an index AM might set amoptionalkey false is if it doesn’t index null values. Since most indexable operators are strict and hence cannot return true for null inputs, it is at first sight attractive to not store index entries for null values: they could never be returned by an index scan anyway. However, this argument fails when an index scan has no restriction clause for a given index column. In practice this means that indexes that have amoptionalkey true must index nulls, since the planner might decide to use such an index with no scan keys at all. A related restriction is that an index access method that supports multiple index columns must support indexing null values in columns after the first, because the planner will assume the index can be used for queries that do not restrict these columns. For example, consider an index on (a,b) and a query with WHERE a = 4. The system will assume the index can be used to scan for rows with a = 4, which is wrong if the index omits rows where b is null. It is, however, OK to omit rows where the first indexed column is null. 
An index access method that does index nulls may also set amsearchnulls, indicating that it supports IS NULL and IS NOT NULL clauses as search conditions.
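The index on (a,b) with WHERE a = 4 can be made concrete. The standalone sketch below compares a sequential scan with a hypothetical flawed index that never stores entries for rows whose second column is null. The table contents and function names are invented for illustration, and the "index" is modeled as a filtered loop rather than a real data structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* A tiny table of (a, b) rows; b may be NULL. */
typedef struct Row
{
    int  a;
    int  b;
    bool b_null;
} Row;

static const Row table[] = {
    {4, 10, false},
    {4, 0,  true},      /* b IS NULL */
    {5, 20, false},
    {4, 30, false},
};
#define NROWS (sizeof(table) / sizeof(table[0]))

/* Ground truth: a sequential scan for WHERE a = key. */
size_t
seqscan_count(int key)
{
    size_t n = 0;
    for (size_t i = 0; i < NROWS; i++)
        if (table[i].a == key)
            n++;
    return n;
}

/* Hypothetical flawed multicolumn index on (a, b) that never built an
 * entry for a row whose second column is NULL. */
size_t
bad_index_count(int key)
{
    size_t n = 0;
    for (size_t i = 0; i < NROWS; i++)
    {
        if (table[i].b_null)
            continue;           /* no index entry exists for this row */
        if (table[i].a == key)
            n++;
    }
    return n;
}
```

The planner would consider the index scan equivalent to the sequential scan for WHERE a = 4. Here it finds only two of the three matching rows, because the row where b is null has no index entry.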
The amcaninclude flag indicates whether the access method supports “included” columns, that is, whether it can store (without processing) additional columns beyond the key column(s). The requirements of the preceding paragraph apply only to the key columns. In particular, the combination of amcanmulticol=false and amcaninclude=true is sensible: it means that there can only be one key column, but there can also be included column(s). Also, included columns must be allowed to be null, independently of amoptionalkey.
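These flags effectively limit which index definitions are acceptable. A minimal validator in that spirit is sketched below. AmCaps and check_index_definition are hypothetical names, and the error strings only loosely paraphrase the kind of message the server reports. The example shows that one key column plus included columns is acceptable even when amcanmulticol is false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the access method's capability flags. */
typedef struct AmCaps
{
    bool amcanmulticol;
    bool amcaninclude;
    bool amcanunique;
} AmCaps;

/* Returns NULL if the definition is acceptable, else an error message.
 * nkeycols counts key columns; nincluded counts INCLUDE columns, which
 * never count against amcanmulticol. */
const char *
check_index_definition(const AmCaps *am, int nkeycols, int nincluded,
                       bool unique)
{
    if (nkeycols < 1)
        return "index needs at least one key column";
    if (nkeycols > 1 && !am->amcanmulticol)
        return "access method does not support multicolumn indexes";
    if (nincluded > 0 && !am->amcaninclude)
        return "access method does not support included columns";
    if (unique && !am->amcanunique)
        return "access method does not support unique indexes";
    return NULL;
}
```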
The amsummarizing flag indicates whether the access method summarizes the indexed tuples, with summarizing granularity of at least per block. Access methods that do not point to individual tuples, but to block ranges (like BRIN), may allow the HOT optimization to continue. This does not apply to attributes referenced in index predicates; an update of such an attribute always disables HOT.
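The rule can be summarized as a decision function. This is a simplified model. It considers only which columns an update modifies and ignores the other conditions a HOT update needs, such as free space on the heap page. IndexDesc, ColSet, and update_can_be_hot are hypothetical names, and columns are encoded as bits (bit 0 is column 1).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set of columns as a bitmask: bit 0 = column 1, bit 1 = column 2, ... */
typedef uint32_t ColSet;

/* Hypothetical description of one index on the table. */
typedef struct IndexDesc
{
    ColSet key_cols;        /* columns (or expression inputs) it indexes */
    ColSet predicate_cols;  /* columns in its partial-index WHERE clause */
    bool   amsummarizing;   /* e.g. true for BRIN */
} IndexDesc;

/* Decide whether an update modifying 'modified' columns may stay HOT.
 * Simplified: ignores page free space and other heap-level conditions. */
bool
update_can_be_hot(const IndexDesc *indexes, size_t nindexes, ColSet modified)
{
    for (size_t i = 0; i < nindexes; i++)
    {
        /* Predicate columns block HOT even for summarizing indexes. */
        if (modified & indexes[i].predicate_cols)
            return false;
        /* Key columns block HOT only for non-summarizing indexes. */
        if (!indexes[i].amsummarizing && (modified & indexes[i].key_cols))
            return false;
    }
    return true;
}
```

With a B-tree on column 1, a BRIN index on column 2, and a partial BRIN index on column 3 whose predicate references column 4, this model gives the following results. Updating only column 2 can stay HOT. Updating column 1 or column 4 cannot.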