Postgresql 中文操作指南
Chapter 60. Writing a Table Sampling Method
Table of Contents
PostgreSQL 对 TABLESAMPLE 子句的实现支持自定义表抽样方法,除了 SQL 标准要求的 BERNOULLI 和 SYSTEM 方法。抽样方法确定在使用 TABLESAMPLE 子句时选择表的哪些行。
PostgreSQL’s implementation of the TABLESAMPLE clause supports custom table sampling methods, in addition to the BERNOULLI and SYSTEM methods that are required by the SQL standard. The sampling method determines which rows of the table will be selected when the TABLESAMPLE clause is used.
在 SQL 级别,表抽样方法由单个 SQL 函数表示,通常在 C 中实现,具有签名
At the SQL level, a table sampling method is represented by a single SQL function, typically implemented in C, having the signature
method_name(internal) RETURNS tsm_handler
函数的名称是出现在 TABLESAMPLE 子句中的相同方法名称。参数 internal 是一个虚拟参数(始终具有零值),简单地阻止此函数直接从 SQL 命令调用。函数的结果必须是类型的 palloc’d 结构 TsmRoutine ,其中包含指向抽样方法的支持函数的指针。这些支持函数是普通的 C 函数,在 SQL 级不可见或不可调用。在 Section 60.1 中描述了支持函数。
The name of the function is the same method name appearing in the TABLESAMPLE clause. The internal argument is a dummy (always having value zero) that simply serves to prevent this function from being called directly from an SQL command. The result of the function must be a palloc’d struct of type TsmRoutine, which contains pointers to support functions for the sampling method. These support functions are plain C functions and are not visible or callable at the SQL level. The support functions are described in Section 60.1.
除了函数指针外, TsmRoutine 结构还必须提供以下附加字段:
In addition to function pointers, the TsmRoutine struct must provide these additional fields:
-
List *parameterTypes
-
This is an OID list containing the data type OIDs of the parameter(s) that will be accepted by the TABLESAMPLE clause when this sampling method is used. For example, for the built-in methods, this list contains a single item with value FLOAT4OID, which represents the sampling percentage. Custom sampling methods can have more or different parameters.
-
-
bool repeatable_across_queries
-
If true, the sampling method can deliver identical samples across successive queries, if the same parameters and REPEATABLE seed value are supplied each time and the table contents have not changed. When this is false, the REPEATABLE clause is not accepted for use with the sampling method.
-
-
bool repeatable_across_scans
-
If true, the sampling method can deliver identical samples across successive scans in the same query (assuming unchanging parameters, seed value, and snapshot). When this is false, the planner will not select plans that would require scanning the sampled table more than once, since that might result in inconsistent query output.
-
TsmRoutine 结构类型在 src/include/access/tsmapi.h 中声明,有关附加的详细信息,参见此处。
The TsmRoutine struct type is declared in src/include/access/tsmapi.h, which see for additional details.
标准分布中包含的表采样方法在尝试撰写您自己的采样方法时是一种很好的参考。查看源代码树的 src/backend/access/tablesample 子目录以了解内置采样方法,并查看 contrib 子目录以了解附加方法。
The table sampling methods included in the standard distribution are good references when trying to write your own. Look into the src/backend/access/tablesample subdirectory of the source tree for the built-in sampling methods, and into the contrib subdirectory for add-on methods.