Postgresql 中文操作指南
F.20. intarray — manipulate arrays of integers #
intarray 模块提供了一些实用函数和运算符,用于操作无空整数数组。还支持使用某些运算符进行索引搜索。
The intarray module provides a number of useful functions and operators for manipulating null-free arrays of integers. There is also support for indexed searches using some of the operators.
如果提供的数组包含任何 NULL 元素,所有这些操作都将引发错误。
All of these operations will throw an error if a supplied array contains any NULL elements.
其中的许多操作仅对一维数组有意义。尽管它们会接受更多维的输入数组,但数据在处理中将视为存储顺序中的线性数组。
Many of these operations are only sensible for one-dimensional arrays. Although they will accept input arrays of more dimensions, the data is treated as though it were a linear array in storage order.
此模块被认为是“受信任的”,也就是说,它可以由在当前数据库上具有 CREATE 权限的非超级用户安装。
This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database.
F.20.1. intarray Functions and Operators #
intarray 模块提供的函数如 Table F.9 所示,操作符如 Table F.10 所示。
The functions provided by the intarray module are shown in Table F.9, the operators in Table F.10.
Table F.9. intarray Functions
Table F.9. intarray Functions
Function Description Example(s) |
icount ( integer[] ) → integer Returns the number of elements in the array. icount('{1,2,3}'::integer[]) → 3 |
sort ( integer[], dir text ) → integer[] Sorts the array in either ascending or descending order. dir must be asc or desc. sort('{1,3,2}'::integer[], 'desc') → {3,2,1} |
sort ( integer[] ) → integer[] sort_asc ( integer[] ) → integer[] Sorts in ascending order. sort(array[11,77,44]) → {11,44,77} |
sort_desc ( integer[] ) → integer[] Sorts in descending order. sort_desc(array[11,77,44]) → {77,44,11} |
uniq ( integer[] ) → integer[] Removes adjacent duplicates. Often used with sort to remove all duplicates. uniq('{1,2,2,3,1,1}'::integer[]) → {1,2,3,1} uniq(sort('{1,2,3,2,1}'::integer[])) → {1,2,3} |
idx ( integer[], item integer ) → integer Returns index of the first array element matching item, or 0 if no match. idx(array[11,22,33,22,11], 22) → 2 |
subarray ( integer[], start integer, len integer ) → integer[] Extracts the portion of the array starting at position start, with len elements. subarray('{1,2,3,2,1}'::integer[], 2, 3) → {2,3,2} |
subarray ( integer[], start integer ) → integer[] Extracts the portion of the array starting at position start. subarray('{1,2,3,2,1}'::integer[], 2) → {2,3,2,1} |
intset ( integer ) → integer[] Makes a single-element array. intset(42) → {42} |
Table F.10. intarray Operators
Table F.10. intarray Operators
Operator Description |
integer[] && integer[] → boolean Do arrays overlap (have at least one element in common)? |
integer[] @> integer[] → boolean Does left array contain right array? |
integer[] <@ integer[] → boolean Is left array contained in right array? |
# integer[] → integer Returns the number of elements in the array. |
integer[] # integer → integer Returns index of the first array element matching the right argument, or 0 if no match. (Same as idx function.) |
integer[] + integer → integer[] Adds element to end of array. |
integer[] + integer[] → integer[] Concatenates the arrays. |
integer[] - integer → integer[] Removes entries matching the right argument from the array. |
integer[] - integer[] → integer[] Removes elements of the right array from the left array. |
integer[] _ |
_ integer → integer[] Computes the union of the arguments. |
integer[] _ |
_ integer[] → integer[] Computes the union of the arguments. |
integer[] & integer[] → integer[] Computes the intersection of the arguments. |
integer[] @@ query_int → boolean Does array satisfy query? (see below) |
query_int ~~ integer[] → boolean Does array satisfy query? (commutator of @@) |
运算符 &&、@> 和 <@ 等同于 PostgreSQL 的同名内置运算符,但它们仅在不包含空值的整数数组中起作用,而内置运算符适用于任何数组类型。这种限制使得它们在很多情况下比内置运算符更快。
The operators &&, @> and <@ are equivalent to PostgreSQL’s built-in operators of the same names, except that they work only on integer arrays that do not contain nulls, while the built-in operators work for any array type. This restriction makes them faster than the built-in operators in many cases.
@@ 和 ~~ 运算符测试数组是否满足 query,后者表示为专门数据类型 query_int 的值。一个 query 包含检查对数组元素的整型值,可能通过以下运算符组合实现:& (AND)、| (OR) 和 ! (NOT)。必要时可以使用括号。例如,查询 1&(2|3) 匹配既包含 1 又包含 2 或 3 的数组。
The @@ and ~~ operators test whether an array satisfies a query, which is expressed as a value of a specialized data type query_int. A query consists of integer values that are checked against the elements of the array, possibly combined using the operators & (AND), | (OR), and ! (NOT). Parentheses can be used as needed. For example, the query 1&(2|3) matches arrays that contain 1 and also contain either 2 or 3.
F.20.2. Index Support #
intarray 为运算符 &&、@> 和 @@ 提供了索引支持,以及常规数组相等性。
intarray provides index support for the &&, @>, and @@ operators, as well as regular array equality.
提供了两个参数化 GiST 索引运算符类:gist_int_ops(默认使用)适合于中小型数据集,而 gist_intbig_ops 则使用更大的签名,更适合于索引大型数据集(即包含大量不同数组值列)。该实现使用内置的有损压缩功能的 RD 树数据结构。
Two parameterized GiST index operator classes are provided: gist_int_ops (used by default) is suitable for small- to medium-size data sets, while gist_intbig_ops uses a larger signature and is more suitable for indexing large data sets (i.e., columns containing a large number of distinct array values). The implementation uses an RD-tree data structure with built-in lossy compression.
gist__int_ops 通过一组整数范围逼近一个整数集。它的可选整数参数 numranges 确定一个索引键中范围的最大数量。numranges 的默认值为 100。有效值介于 1 和 253 之间。使用更大的数组作为 GiST 索引键会导致更精确的搜索(扫描较小部分的索引和较少的堆页),代价是索引更大。
gist__int_ops approximates an integer set as an array of integer ranges. Its optional integer parameter numranges determines the maximum number of ranges in one index key. The default value of numranges is 100. Valid values are between 1 and 253. Using larger arrays as GiST index keys leads to a more precise search (scanning a smaller fraction of the index and fewer heap pages), at the cost of a larger index.
gist__intbig_ops 通过位图签名逼近一个整数集。它的可选整数参数 siglen 以字节为单位确定签名长度。默认签名长度为 16 个字节。有效的签名长度介于 1 和 2024 个字节之间。更长的签名会导致更精确的搜索(扫描更小部分的索引和更少的堆页),代价是索引更大。
gist__intbig_ops approximates an integer set as a bitmap signature. Its optional integer parameter siglen determines the signature length in bytes. The default signature length is 16 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures lead to a more precise search (scanning a smaller fraction of the index and fewer heap pages), at the cost of a larger index.
还有一类非默认 GIN 运算符类 gin__int_ops,它支持这些运算符以及 <@。
There is also a non-default GIN operator class gin__int_ops, which supports these operators as well as <@.
在 GiST 和 GIN 索引之间的选择取决于 GiST 和 GIN 的相对性能特性,这一点在其他地方已经讨论过了。
The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.
F.20.3. Example #
-- a message can be in one or more “sections”
CREATE TABLE message (mid INT PRIMARY KEY, sections INT[], ...);
-- create specialized index with signature length of 32 bytes
CREATE INDEX message_rdtree_idx ON message USING GIST (sections gist__intbig_ops (siglen = 32));
-- select messages in section 1 OR 2 - OVERLAP operator
SELECT message.mid FROM message WHERE message.sections && '{1,2}';
-- select messages in sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
-- the same, using QUERY operator
SELECT message.mid FROM message WHERE message.sections @@ '1&2'::query_int;
F.20.4. Benchmark #
源目录 contrib/intarray/bench 包含一个基准测试套件,可以在已安装 PostgreSQL 服务器上运行。(它还需要安装 DBD::Pg。)要运行:
The source directory contrib/intarray/bench contains a benchmark test suite, which can be run against an installed PostgreSQL server. (It also requires DBD::Pg to be installed.) To run:
cd .../contrib/intarray/bench
createdb TEST
psql -c "CREATE EXTENSION intarray" TEST
./create_test.pl | psql TEST
./bench.pl
bench.pl 脚本有许多选项,在没有参数的情况下运行时将显示这些选项。
The bench.pl script has numerous options, which are displayed when it is run without any arguments.
F.20.5. Authors #
所有工作均由 Teodor Sigaev ( < link:mailto:teodor@sigaev.ru[teodor@sigaev.ru]> ) 和 Oleg Bartunov ( < link:mailto:oleg@sai.msu.su[oleg@sai.msu.su]> ) 完成。有关其他信息,请参见:http://www.sai.msu.su/ megera/postgres/gist/[http://www.sai.msu.su/ megera/postgres/gist/。Andrey Oktyabrski 在添加新函数和操作方面做了出色的工作。
All work was done by Teodor Sigaev (<link:mailto:teodor@sigaev.ru[teodor@sigaev.ru]>) and Oleg Bartunov (<link:mailto:oleg@sai.msu.su[oleg@sai.msu.su]>). See http://www.sai.msu.su/megera/postgres/gist/ for additional information. Andrey Oktyabrski did a great work on adding new functions and operations.