Postgresql 中文操作指南

7.2. Table Expressions #

table expression 计算表。表表达式包含一个 FROM 子句,其后可选择地跟有 WHEREGROUP BYHAVING 子句。简单的表表达式只引用磁盘上的表,即所谓的基表,但是可以使用更复杂的表达式以各种方式修改或合并基表。

A table expression computes a table. The table expression contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.

表表达式中的可选 WHEREGROUP BYHAVING 子句指定对 FROM 子句中派生的表执行连续转换的管道。所有这些转换都会生成提供行以传递给选择列表用于计算查询输出行的虚拟表。

The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of successive transformations performed on the table derived in the FROM clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.

7.2.1. The FROM Clause #

FROM 从句从逗号分隔的表参考列表中给出的一个或多个其他表中派生一个表。

The FROM clause derives a table from one or more other tables given in a comma-separated table reference list.

FROM table_reference [, table_reference [, ...]]

表引用可以是表名(可能具有模式限定),或者派生表(例如子查询、JOIN 构造或它们的复杂组合)。如果 FROM 子句中列出了多个表引用,则表将被交叉连接(也就是说,会形成它们的行的笛卡尔积;请参阅下方)。FROM 列表的结果是一个中间虚拟表,然后可以由 WHEREGROUP BYHAVING 子句进行转换,最终成为总表表达式的结果。

A table reference can be a table name (possibly schema-qualified), or a derived table such as a subquery, a JOIN construct, or complex combinations of these. If more than one table reference is listed in the FROM clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed; see below). The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

当表引用指定表为表继承层次结构的父表时,表引用不仅会生成该表中的行,还会生成它所有下级表中的行,除非表名之前有关键字 ONLY。但是,引用只会生成已命名表中出现的列 — 忽略已分表中添加的任何列。

When a table reference names a table that is the parent of a table inheritance hierarchy, the table reference produces rows of not only that table but all of its descendant tables, unless the key word ONLY precedes the table name. However, the reference produces only the columns that appear in the named table — any columns added in subtables are ignored.

您可以在表名前面写 ONLY 来表示表名,也可以在表名后面写 * 来明确指定包含下级表。目前使用这种语法已经没有真正理由,因为现在始终将搜索下级表作为默认行为。但是,为了与较早版本兼容,仍然支持它。

Instead of writing ONLY before the table name, you can write * after the table name to explicitly specify that descendant tables are included. There is no real reason to use this syntax any more, because searching descendant tables is now always the default behavior. However, it is supported for compatibility with older releases.

7.2.1.1. Joined Tables #

联接表是依据特定的联接类型规则从两个其他(真实或派生)表派生出的表。可以使用内联接、外联接和交叉连接。联接表的常规语法是

A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is

T1 join_type T2 [ join_condition ]

所有类型的联接都可以链接在一起,或者嵌套:T1T2,或者二者,都可以是联接表。圆括号可以围绕 JOIN 子句用来控制联接顺序。在没有圆括号的情况下,JOIN 子句按从左至右嵌套。

Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables. Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.

Join Types

  • Cross join

T1 CROSS JOIN T2
  • For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

  • FROM _T1 CROSS JOIN T2 is equivalent to _FROM _T1 INNER JOIN T2 ON TRUE_ (see below). It is also equivalent to FROM _T1, _T2.

    • Qualified joins

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2
  • The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join.

  • The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL. The join condition determines which rows from the two source tables are considered to “match”, as explained in detail below.

  • The possible types of qualified join are:

  • The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true.

  • The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON _T1.a = T2.a AND T1.b = T2.b_.

  • Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

  • Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN …​ ON TRUE, producing a cross-product join.

Note

后者当出现两个以上表时,该等价关系并不完全成立,因为 JOIN 的结合比逗号更紧密。例如, FROM _T1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as FROM _T1, T2 INNER JOIN T3 ON condition because the condition 在第一种情况下可以引用 T1 ,但在第二种情况下不能。

This latter equivalence does not hold exactly when more than two tables appear, because JOIN binds more tightly than comma. For example FROM _T1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as _FROM _T1, T2 INNER JOIN T3 ON condition because the _condition can reference T1 in the first case but not the second.

  • INNER JOIN

    • For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.

  • LEFT OUTER JOIN

    • First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

  • RIGHT OUTER JOIN

    • First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.

  • FULL OUTER JOIN

    • First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

Note

USING 对于连接关系中的列更改相当安全,因为只有列出的列被合并。NATURAL 的风险更大,因为对任一关系的任何导致出现新匹配的列名的架构更改都将导致连接也合并该新列。

USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.

综上所述,假设我们有以下各表:

To put this together, assume we have tables t1:

 num | name
-----+------
   1 | a
   2 | b
   3 | c

和:

and t2:

 num | value
-----+-------
   1 | xxx
   3 | yyy
   5 | zzz

那么,我们会得到以下用于不同联接的结果:

then we get the following results for the various joins:

=> SELECT * FROM t1 CROSS JOIN t2;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 NATURAL INNER JOIN t2;
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
(3 rows)

=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   2 | b    |
   3 | c    | yyy
(3 rows)

=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
     |      |   5 | zzz
(3 rows)

=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
     |      |   5 | zzz
(4 rows)

ON 指定的联接条件也可以包含与联接无关的条件。这对于某些查询很有用,但需要仔细考虑。例如:

The join condition specified with ON can also contain conditions that do not relate directly to the join. This can prove useful for some queries but needs to be thought out carefully. For example:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |     |
(3 rows)

请注意,将限制条件放在 WHERE 条款中会产生不同的结果:

Notice that placing the restriction in the WHERE clause produces a different result:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
(1 row)

这是因为放在 ON 条款中的限制条件是在联接之前处理的,而放在 WHERE 条款中的限制条件是在联接之后处理的。对于内联接,这一点无关紧要,但对于外联接则影响很大。

This is because a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.

7.2.1.2. Table and Column Aliases #

可以为复杂表引用和临时表指定一个临时名称,以便在查询的其余部分引用派生表中。这称为 table alias

A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.

要创建表别名,请这样写

To create a table alias, write

FROM table_reference AS alias

or

FROM table_reference alias

AS 关键字是可选噪音。alias 可以是任何标识符。

The AS key word is optional noise. alias can be any identifier.

表别名的一个典型应用是为长表名指定短标识符,以使联接条款保持可读性。例如:

A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable. For example:

SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;

别名成为表引用新的名称,就当前查询而言,它不被允许在查询中其他地方用原始名称引用表。因此,这是无效的:

The alias becomes the new name of the table reference so far as the current query is concerned — it is not allowed to refer to the table by the original name elsewhere in the query. Thus, this is not valid:

SELECT * FROM my_table AS m WHERE my_table.a > 5;    -- wrong

表别名主要是为了方便表示,但是当将表自身连接起来时(例如),必须使用表别名:

Table aliases are mainly for notational convenience, but it is necessary to use them when joining a table to itself, e.g.:

SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;

括号用于解决歧义。在下面的示例中,第 1 条语句将别名 b 分配给 my_table 的第二个实例,但第 2 条语句将别名分配给连接的结果:

Parentheses are used to resolve ambiguities. In the following example, the first statement assigns the alias b to the second instance of my_table, but the second statement assigns the alias to the result of the join:

SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...

另一种形式的表别名给表的列以及表自身赋予了临时名称:

Another form of table aliasing gives temporary names to the columns of the table, as well as the table itself:

FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )

如果指定小于表的实际列数的列别名,则不会重命名其余列。此语法对于自连接或子查询特别有用。

If fewer column aliases are specified than the actual table has columns, the remaining columns are not renamed. This syntax is especially useful for self-joins or subqueries.

当将别名应用于 JOIN 子句的输出时,别名将隐藏 JOIN 内的原始名称。例如:

When an alias is applied to the output of a JOIN clause, the alias hides the original name(s) within the JOIN. For example:

SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...

是有效的 SQL,但:

is valid SQL, but:

SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c

无效;表别名 a 在别名 c 外不可见。

is not valid; the table alias a is not visible outside the alias c.

7.2.1.3. Subqueries #

指定派生表的子查询必须括在括号中。可以为它们分配表别名,还可以选择分配列别名(如 Section 7.2.1.2 中所示)。例如:

Subqueries specifying a derived table must be enclosed in parentheses. They may be assigned a table alias name, and optionally column alias names (as in Section 7.2.1.2). For example:

FROM (SELECT * FROM table1) AS alias_name

此示例等效于 FROM table1 AS alias_name。当子查询涉及分组或聚合时,更有趣的案例(无法简化为普通连接)会产生。

This example is equivalent to FROM table1 AS alias_name. More interesting cases, which cannot be reduced to a plain join, arise when the subquery involves grouping or aggregation.

子查询还可为 VALUES 列表:

A subquery can also be a VALUES list:

FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
     AS names(first, last)

同样,一个表别名是可选的。为 VALUES 列表中的列分配别名是可选的,但这是一个好习惯。有关详细信息,请参见 Section 7.7

Again, a table alias is optional. Assigning alias names to the columns of the VALUES list is optional, but is good practice. For more information see Section 7.7.

根据 SQL 标准,必须为子查询提供一个表别名。PostgreSQL 允许 AS 并省略别名,但在可能移植到另一个系统的 SQL 代码中编写别名是一个良好的做法。

According to the SQL standard, a table alias name must be supplied for a subquery. PostgreSQL allows AS and the alias to be omitted, but writing one is good practice in SQL code that might be ported to another system.

7.2.1.4. Table Functions #

表函数是产生一组行的函数,由基本数据类型(标量类型)或复合数据类型(表行)组成。它们在查询的 FROM 子句中像表、视图或子查询一样使用。表函数返回的列可以与表、视图或子查询的列在 SELECTJOINWHERE 子句中以相同的方式包括。

Table functions are functions that produce a set of rows, made up of either base data types (scalar types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE clauses in the same manner as columns of a table, view, or subquery.

也可以使用 ROWS FROM 语法组合表函数,并以并行列形式返回结果;在这种情况下结果行的数量是最大的函数结果,而较小的结果将用空值填充以匹配。

Table functions may also be combined using the ROWS FROM syntax, with the results returned in parallel columns; the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.

function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
ROWS FROM( function_call [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

如果指定 WITH ORDINALITY 子句,将向函数结果列添加额外的 bigint 类型列。此列对函数结果集的行进行编号,从 1 开始。(这是 UNNEST …​ WITH ORDINALITY 的 SQL 标准语法的概括。)默认情况下,序数列称为 ordinality,但可以使用 AS 子句为其分配不同的列名。

If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1. (This is a generalization of the SQL-standard syntax for UNNEST …​ WITH ORDINALITY.) By default, the ordinal column is called ordinality, but a different column name can be assigned to it using an AS clause.

特殊表函数 UNNEST 可使用任意数量的数组参数调用,并返回相应数量的列,就好像 UNNEST ( Section 9.19) 已分别对每个参数调用并使用 ROWS FROM 构造组合在一起一样。

The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.19) had been called on each parameter separately and combined using the ROWS FROM construct.

UNNEST( array_expression [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

如果未指定 table_alias,则函数名称用作表名称;在 ROWS FROM() 构造函数的情况下,使用第一个函数的名称。

If no table_alias is specified, the function name is used as the table name; in the case of a ROWS FROM() construct, the first function’s name is used.

如果未提供列别名,则对于返回基本数据类型的函数,列名也与函数名称相同。对于返回复合类型的函数,结果列获取该类型的各个属性的名称。

If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.

举例:

Some examples:

CREATE TABLE foo (fooid int, foosubid int, fooname text);

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
    SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1;

SELECT * FROM foo
    WHERE foosubid IN (
                        SELECT foosubid
                        FROM getfoo(foo.fooid) z
                        WHERE z.fooid = foo.fooid
                      );

CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);

SELECT * FROM vw_getfoo;

在某些情况下,定义表函数很有用,这些函数可以根据它们的调用方式返回不同的列集。为了支持这一点,可以将表函数声明为返回伪类型 record,没有 OUT 参数。当在查询中使用这样的函数时,必须在查询本身中指定预期的行结构,这样系统才能知道如何解析和计划查询。此语法如下所示:

In some cases it is useful to define table functions that can return different column sets depending on how they are invoked. To support this, the table function can be declared as returning the pseudo-type record with no OUT parameters. When such a function is used in a query, the expected row structure must be specified in the query itself, so that the system can know how to parse and plan the query. This syntax looks like:

function_call [AS] alias (column_definition [, ... ])
function_call AS [alias] (column_definition [, ... ])
ROWS FROM( ... function_call AS (column_definition [, ... ]) [, ... ] )

如果不使用 ROWS FROM() 语法,column_definition 列表将替换可附加到 FROM 项的列别名列表;列定义中的名称用作列别名。当使用 ROWS FROM() 语法时,可以分别为每个成员函数附加 column_definition 列表;或如果只有一个成员函数且没有 WITH ORDINALITY 子句,则可以编写 column_definition 列表来代替 ROWS FROM() 后的列别名列表。

When not using the ROWS FROM() syntax, the column_definition list replaces the column alias list that could otherwise be attached to the FROM item; the names in the column definitions serve as column aliases. When using the ROWS FROM() syntax, a column_definition list can be attached to each member function separately; or if there is only one member function and no WITH ORDINALITY clause, a column_definition list can be written in place of a column alias list following ROWS FROM().

考虑以下示例:

Consider this example:

SELECT *
    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

dblink 函数(是 dblink 模块的一部分)可执行远程查询。它被声明为返回 record ,因为它可以用于任何种类的查询。实际列集必须在调用查询中指定,以便解析器知道 * 应扩展到哪些内容。

The dblink function (part of the dblink module) executes a remote query. It is declared to return record since it might be used for any kind of query. The actual column set must be specified in the calling query so that the parser knows, for example, what * should expand to.

此示例使用 ROWS FROM

This example uses ROWS FROM:

SELECT *
FROM ROWS FROM
    (
        json_to_recordset('[{"a":40,"b":"foo"},{"a":"100","b":"bar"}]')
            AS (a INTEGER, b TEXT),
        generate_series(1, 3)
    ) AS x (p, q, s)
ORDER BY p;

  p  |  q  | s
-----+-----+---
  40 | foo | 1
 100 | bar | 2
     |     | 3

它将两个函数连接到一个 FROM 目标。json_to_recordset() 被指示返回两列,第一列 integer 和第二列 textgenerate_series() 的结果直接使用。ORDER BY 子句将列值按整数排序。

It joins two functions into a single FROM target. json_to_recordset() is instructed to return two columns, the first integer and the second text. The result of generate_series() is used directly. The ORDER BY clause sorts the column values as integers.

7.2.1.5. LATERAL Subqueries #

出现在 FROM 中的子查询可以以关键字 LATERAL 为前缀。这允许它们引用前面 FROM 项提供的列。(如果没有 LATERAL,每个子查询都是独立评估的,因此无法交叉引用任何其他 FROM 项。)

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.)

出现在 FROM 中的表函数也可以以关键字 LATERAL 为前缀,但对于函数来说关键字是可选的;函数的参数在任何情况下都可以包含对先前 FROM 项提供的列的引用。

Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function’s arguments can contain references to columns provided by preceding FROM items in any case.

LATERAL 项可以出现在 FROM 列表的顶层,或出现在 JOIN 树中。在后一种情况下,它还可以引用位于它右侧 JOIN 的左侧的任何项。

A LATERAL item can appear at the top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

FROM 项包含 LATERAL 交叉引用时,评估过程如下:对于提供交叉引用列的 FROM 项的每一行,或提供列的多个 FROM 项的行集, LATERAL 项使用该行或行使用列集的值进行评估。所得行与计算它们的行的连接方式与通常一样。这会针对列源表中的每一行或行集重复进行。

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set’s values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).

LATERAL 的一个简单示例是

A trivial example of LATERAL is

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

这不是特别有用,因为它与更传统的

This is not especially useful since it has exactly the same result as the more conventional

SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;

LATERAL 主要是当交叉引用列对于计算要加入的行是必需的时候才有用。一种常见的应用是为集合返回函数提供参数值。例如,假设 vertices(polygon) 返回多边形的顶点集合,我们可以用以下方法识别存储在表中的多边形的紧密相连的顶点:

LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function. For example, supposing that vertices(polygon) returns the set of vertices of a polygon, we could identify close-together vertices of polygons stored in a table with:

SELECT p1.id, p2.id, v1, v2
FROM polygons p1, polygons p2,
     LATERAL vertices(p1.poly) v1,
     LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

此查询还可以编写为

This query could also be written

SELECT p1.id, p2.id, v1, v2
FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
     polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

或以几个其他等效公式编写。(如前所述,本例中 LATERAL 关键字是不必要的,但我们为清楚起见使用了它。)

or in several other equivalent formulations. (As already mentioned, the LATERAL key word is unnecessary in this example, but we use it for clarity.)

通常,将 LEFT JOINLATERAL 子查询特别方便,这样即使 LATERAL 子查询没有为它们生成行,源行也会出现在结果中。例如,如果 get_product_names() 返回某个制造商制造的产品名称,但我们表中的一些制造商目前没有生产任何产品,我们可以这样来找出这些制造商:

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:

SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;

7.2.2. The WHERE Clause #

WHERE 条款的语法是

The syntax of the WHERE clause is

WHERE search_condition

其中 search_condition 是返回类型为 boolean 的任何值表达式(参见 Section 4.2)。

where search_condition is any value expression (see Section 4.2) that returns a value of type boolean.

FROM 子句的处理完成后,派生虚拟表的每一行都会针对搜索条件进行检查。如果条件的结果为真,则该行保留在输出表中,否则(即,如果结果为假或空)则将其丢弃。搜索条件通常引用 FROM 子句中生成的表的至少一列;这不是必需的,但如果没有它,WHERE 子句将相当无用。

After the processing of the FROM clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise (i.e., if the result is false or null) it is discarded. The search condition typically references at least one column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will be fairly useless.

Note

内连接的连接条件可以写在 WHERE 子句或 JOIN 子句中。例如,这些表表达式是等价的:

The join condition of an inner join can be written either in the WHERE clause or in the JOIN clause. For example, these table expressions are equivalent:

FROM a, b WHERE a.id = b.id AND b.val > 5

及:

and:

FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

或甚至:

or perhaps even:

FROM a NATURAL JOIN b WHERE b.val > 5

使用哪一种主要取决于风格。JOINFROM 子句中的语法可能不像 SQL 标准中那样便携到其他 SQL 数据库管理系统。对于外部连接,没有选择:必须在 FROM 子句中完成。外部连接的 ONUSING 子句_not_ 等同于 WHERE 条件,因为会导致添加行(对于不匹配的输入行)以及删除最终结果中的行。

Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is probably not as portable to other SQL database management systems, even though it is in the SQL standard. For outer joins there is no choice: they must be done in the FROM clause. The ON or USING clause of an outer join is not equivalent to a WHERE condition, because it results in the addition of rows (for unmatched input rows) as well as the removal of rows in the final result.

下面是一些 WHERE 子句的示例:

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

fdtFROM 子句中派生的表。不满足 WHERE 子句搜索条件的行将从 fdt 中消除。注意标量子查询用作值表达式。就像任何其他查询一样,子查询可以使用复杂表表达式。还要注意 @{13} 在子查询中引用的方式。只有当 c1 也是子查询的派生输入表中一列的名称时,才需要将 c1 限定为 fdt.c1。但即使不需要限定列名,限定列名也会增加清晰度。此示例显示了外部查询的列命名范围如何扩展到其内部查询。

fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced in the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This example shows how the column naming scope of an outer query extends into its inner queries.

7.2.3. The GROUP BY and HAVING Clauses #

通过 WHERE 筛选器后,派生的输入表可能受到使用 GROUP BY 子句的分组以及使用 HAVING 子句消除组行的影响。

After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [, grouping_column_reference]...

GROUP BY 条款用于对表中列出所有列中具有相同值的行进行分组。列出列的顺序无关紧要。效果是将具有公共值的每组行合并到一个组行中,该组行表示组中的所有行。这是为了消除输出结果中的冗余和/或计算适用于这些组的聚合。例如:

The GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

在第二个查询中,我们无法编写 SELECT * FROM test1 GROUP BY x,因为没有可以与每个组关联的列 y 的单个值。分组列可以在选择列表中引用,因为它们在每个组中具有单个值。

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.

通常,如果对表进行分组,则不能在聚合表达式中引用尚未列在 GROUP BY 中的列。带有聚合表达式的示例如下:

In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

此处 sum 是一个聚合函数,它计算整个组的单个值。有关可用聚合函数的详细信息,请参见 Section 9.21

Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in Section 9.21.

Tip

在没有任何聚合表达式的条件下进行分组实际上就是计算列中不同的值集。也可以使用 DISTINCT 子句(参见 Section 7.3.3)来实现此目的。

Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCT clause (see Section 7.3.3).

这是另一个示例:它计算每个产品的总销售额(而不是所有产品的总销售额):

Here is another example: it calculates the total sales for each product (rather than the total sales of all products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

在此示例中,列 product_idp.namep.price 必须在 GROUP BY 子句中,因为它们在查询选择列表中被引用(但请参见下文)。列 s.units 不必在 GROUP BY 列表中,因为它仅用于表示产品销售额的聚合表达式(sum(…​))。对于每个产品,查询都会返回有关该产品所有销售的摘要行。

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(…​)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

如果产品表设置得很好,比如 product_id 是主键,则在上述示例中按 product_id 分组就足够了,因为在产品 ID 中 name 和 price 会是 functionally dependent,因此每个产品 ID 组要返回的 name 和 price 值不会有歧义。

If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.

在严格的 SQL 中,GROUP BY 只可以按源表的列分组,但 PostgreSQL 将其扩展为还允许 GROUP BY 按选择列表中的列分组。还允许按值表达式而不是简单列名分组。

In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.

如果已使用 GROUP BY 对表进行了分组,但只关注某些组,则可以像 WHERE 子句一样使用 HAVING 子句,从结果中消除组。语法如下:

If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

HAVING 子句中的表达式可以引用组表达式和未分组表达式(它们必然涉及聚合函数)。

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).

示例:

Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

同样,一个更现实的示例:

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

在上面的示例中,WHERE 子句按未分组的列选择行(该表达式仅对过去四周的销售有效),而 HAVING 子句将输出限制为总毛销售额超过 5000 的组。请注意,聚合函数不一定在查询的所有部分中都相同。

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.

如果查询包含聚合函数调用,但没有 GROUP BY 子句,仍然会进行分组:结果是单个组行(或者如果单行随后被 HAVING 消除),则可能没有行。如果查询包含 HAVING 子句,即使没有任何聚合函数调用或 GROUP BY 子句,情况也是如此。

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.

7.2.4. GROUPING SETS, CUBE, and ROLLUP #

还可以使用 grouping sets 的概念来执行比上述更复杂的分组操作。由 FROMWHERE 子句选择的数据会按每个指定的分组集分别进行分组,针对每个分组计算聚合值,就像对简单的 GROUP BY 子句所做的那样,然后返回结果。例如:

More complex grouping operations than those described above are possible using the concept of grouping sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned. For example:

=> SELECT * FROM items_sold;
 brand | size | sales
-------+------+-------
 Foo   | L    |  10
 Foo   | M    |  20
 Bar   | M    |  15
 Bar   | L    |  5
(4 rows)

=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size | sum
-------+------+-----
 Foo   |      |  30
 Bar   |      |  20
       | L    |  15
       | M    |  35
       |      |  50
(5 rows)

GROUPING SETS 的每个子列表可以指定零个或更多列或表达式,它的解释方式与直接在 GROUP BY 子句中指定一样。一个空的分组集表示所有行均被聚合到一个单一分组(即使没有输入行,也会输出该分组),如之前对于没有 GROUP BY 子句的聚合函数所述。

Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted the same way as though it were directly in the GROUP BY clause. An empty grouping set means that all rows are aggregated down to a single group (which is output even if no input rows were present), as described above for the case of aggregate functions with no GROUP BY clause.

对于未出现这些列的分组集,对分组列或表达式的引用将在结果行中替换为 null 值。要区分特定输出行是由哪个分组产生的,请参见 Table 9.63

References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. To distinguish which grouping a particular output row resulted from, see Table 9.63.

提供了一种速记法来指定两种常见的分组集类型。这种形式的子句

A shorthand notation is provided for specifying two common types of grouping set. A clause of the form

ROLLUP ( e1, e2, e3, ... )

表示给定的表达式列表以及所有列表的前缀,包括空列表;因此它等效于

represents the given list of expressions and all prefixes of the list including the empty list; thus it is equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)

这通常用于对分层数据进行分析;例如,按部门、部门和全公司的工资总计。

This is commonly used for analysis over hierarchical data; e.g., total salary by department, division, and company-wide total.

这种形式的子句

A clause of the form

CUBE ( e1, e2, ... )

表示给定的列表及其所有可能的子集(即幂集)。因此

represents the given list and all of its possible subsets (i.e., the power set). Thus

CUBE ( a, b, c )

等效于

is equivalent to

GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)

CUBEROLLUP 子句的各个元素可以是单个表达式,也可以是括号中的元素子列表。在后一种情况下,这些子列表在生成各个分组集时被视为单个单元。例如:

The individual elements of a CUBE or ROLLUP clause may be either individual expressions, or sublists of elements in parentheses. In the latter case, the sublists are treated as single units for the purposes of generating the individual grouping sets. For example:

CUBE ( (a, b), (c, d) )

等效于

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b       ),
    (       c, d ),
    (            )
)

and

ROLLUP ( a, (b, c), d )

等效于

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b, c    ),
    ( a          ),
    (            )
)

CUBEROLLUP 构造均可以在 GROUP BY 子句中直接使用,也可以嵌套在 GROUPING SETS 子句内。如果一个 GROUPING SETS 子句嵌套在另一个内,效果与将内部子句的所有元素直接写在外部子句中相同。

The CUBE and ROLLUP constructs can be used either directly in the GROUP BY clause, or nested inside a GROUPING SETS clause. If one GROUPING SETS clause is nested inside another, the effect is the same as if all the elements of the inner clause had been written directly in the outer clause.

如果在一个 GROUP BY 子句中指定多个分组项目,那么分组集的最终列表就是各项目的笛卡儿积。例如:

If multiple grouping items are specified in a single GROUP BY clause, then the final list of grouping sets is the cross product of the individual items. For example:

GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))

等效于

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

在同时指定多个分组项时,最终的分组集可能包含重复项。例如:

When specifying multiple grouping items together, the final set of grouping sets might contain duplicates. For example:

GROUP BY ROLLUP (a, b), ROLLUP (a, c)

等效于

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, b),
    (a, c),
    (a),
    (a),
    (a, c),
    (a),
    ()
)

如果这些副本不需要,可以使用 DISTINCT 子句直接在 GROUP BY 上将其删除。因此:

If these duplicates are undesirable, they can be removed using the DISTINCT clause directly on the GROUP BY. Therefore:

GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)

等效于

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, c),
    (a),
    ()
)

这不等于使用 SELECT DISTINCT,因为输出行仍然可能包含副本。如果任何未分组列所含的内容均为空,则从该列被分组时使用的空值中无法将其区分开来。

This is not the same as using SELECT DISTINCT because the output rows may still contain duplicates. If any of the ungrouped columns contains NULL, it will be indistinguishable from the NULL used when that same column is grouped.

Note

构造 (a, b) 通常在表达式的 row constructor 中被识别。在 GROUP BY 条款内,这并不适用于表达式最上层,而 (a, b) 被解析为上述表达式列表。如果您出于某种原因在分组表达式中省略 need 行构造函数,请使用 ROW(a, b)

The construct (a, b) is normally recognized in expressions as a row constructor. Within the GROUP BY clause, this does not apply at the top levels of expressions, and (a, b) is parsed as a list of expressions as described above. If for some reason you need a row constructor in a grouping expression, use ROW(a, b).

7.2.5. Window Function Processing #

如果查询包含任何窗口函数(参见 Section 3.5Section 9.22Section 4.2.8),那么在执行任何分组、聚合和 HAVING 筛选之后,将对这些函数进行求值。也就是说,如果查询使用任何聚合、GROUP BYHAVING,那么窗口函数看到的行是组行,而不是来自 FROM/WHERE 的原始表行。

If the query contains any window functions (see Section 3.5, Section 9.22 and Section 4.2.8), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.

在使用多个窗口函数时,所有窗口定义中拥有语法等效 PARTITION BYORDER BY 子句的窗口函数必定会对数据执行一次遍历后评估。因此,它们将看到相同的排序顺序,即使 ORDER BY 并未唯一确定一个顺序。然而,对于拥有不同 PARTITION BYORDER BY 规格的函数,不作任何评估保证。(在这种情况下,窗口函数评估的各遍历之间通常需要执行一个排序步骤,而排序无法保留其 ORDER BY 视为等效的行顺序。)

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY sees as equivalent.)

当前,窗口函数始终需要预排序数据,因此查询输出将根据一个或多个窗口函数的 PARTITION BY/ORDER BY 子句进行排序。然而,不建议依赖这一点。如果您希望确保以特定方式对结果进行排序,请使用显式顶级 ORDER BY 子句。

Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY/ORDER BY clauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be sure the results are sorted in a particular way.