Postgresql 中文操作指南

3.5. Window Functions #

window function 对一组以某种方式与当前行相关联的表行执行计算。这与可以使用聚合函数执行的计算类型类似。但是,窗口函数不会导致行像非窗口聚合调用那样分组到一个单独的输出行中。相反,这些行保持自己的独立性。在幕后,窗口函数能够访问的不仅仅是查询结果的当前行。

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

这里有一个示例,演示如何将每个员工的工资与他或她所在部门的平均工资进行比较:

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

前三列输出直接来自表 empsalary,表中每一行对应一个输出行。第四列表示对所有具有与当前行相同的 depname 值的表行取平均值。(实际上,这与非窗口 avg 聚合是相同的函数,但 OVER 子句导致它被视为窗口函数,并在窗口框架中进行计算。)

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)

窗口函数调用总是包含一个 OVER 子句,紧跟在窗口函数的名称和参数之后。这在语法上将其与普通函数或非窗口聚合区分开来。OVER 子句精确地确定查询的行如何分割,以便由窗口函数进行处理。OVER 中的 PARTITION BY 子句将行按共享 PARTITION BY 表达式的相同值划分为组或分区。对于每行,窗口函数在与当前行属于同一分区的行中进行计算。

A window function call always contains an OVER clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

您还可以在 OVER 中使用 ORDER BY 控制窗口函数处理行的方式。(窗口 ORDER BY 甚至不必与行的输出顺序相匹配。)以下是一个示例:

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
  depname  | empno | salary | rank
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2
 sales     |     3 |   4800 |    2
(10 rows)

如下所示,rank 函数使用 ORDER BY 子句定义的顺序,为当前行的每个不同的 ORDER BY 值生成数字排名。rank 无需显式参数,因为它的行为完全由 OVER 子句决定。

As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row’s partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

窗口函数考虑的行是查询的 FROM 子句生产的“虚拟表”的行,如果存在它的 WHEREGROUP BYHAVING 子句,则由这些子句过滤。例如,由于不满足 WHERE 条件而删除的行不会被任何窗口函数看到。一个查询可以包含多个窗口函数,它们使用不同的 OVER 子句以不同的方式对数据进行切分,但它们都在同一个由该虚拟表定义的行集合上执行。

The rows considered by a window function are those of the “virtual table” produced by the query’s FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

我们已经看到,如果行的顺序不重要,可以省略 ORDER BY。也可以省略 PARTITION BY,在这种情况下,只有一个包含所有行的分区。

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.

还有另一个与窗口函数相关的重要概念:对于每行,其分区中有一组行,称为其 window frame 。某些窗口函数只针对窗口范围的行操作,而不是针对整个分区操作。默认情况下,如果提供了 ORDER BY ,那么该范围将包含从分区开始到当前行的所有行,外加根据 ORDER BY 子句与当前行相等的所有后续行。当省略 ORDER BY 时,默认范围将包含分区中的所有行。 [5 ] 这里是一个使用 sum 的示例:

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. [5] Here is an example using sum:

SELECT salary, sum(salary) OVER () FROM empsalary;
 salary |  sum
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)

上面,由于 OVER 子句中没有 ORDER BY,所以窗口框架与分区相同,由于没有 PARTITION BY,因此是整个表;换句话说,每次求和都对整个表进行,因此我们对每一行输出获得相同的结果。但如果我们添加一个 ORDER BY 子句,我们得到的结果非常不同:

Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table; in other words each sum is taken over the whole table and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
 salary |  sum
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

在这里,对从最小的第一个薪水到当前薪水(包括当前薪水的任何重复项)进行了求和(注意重复薪水的结果)。

Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).

窗口函数只允许在查询的 SELECT 列表和 ORDER BY 子句中。它们在其他地方是不允许的,例如在 GROUP BYHAVINGWHERE 子句中。这是因为它们在逻辑上在这些子句处理后执行。此外,窗口函数在非窗口聚合函数之后执行。这意味着将一个聚合函数调用包含在窗口函数的参数中是有效的,但反之则不行。

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

如果需要在窗口计算执行后过滤或分组行,可以使用子选择。例如:

If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:

SELECT depname, empno, salary, enroll_date
FROM
  (SELECT depname, empno, salary, enroll_date,
          rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
     FROM empsalary
  ) AS ss
WHERE pos < 3;

上面的查询只显示内部查询中 rank 小于 3 的行。

The above query only shows the rows from the inner query having rank less than 3.

当一个查询涉及多个窗口函数时,可以用一个单独的 OVER 子句写出每一个函数,但如果需要对几个函数使用相同的窗口行为,则这是重复且容易出错的。相反,可以在 WINDOW 子句中为每个窗口行为命名,然后在 OVER 中引用。例如:

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:

SELECT sum(salary) OVER w, avg(salary) OVER w
  FROM empsalary
  WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

有关窗口函数的更多详细信息,请参阅 Section 4.2.8Section 9.22Section 7.2.5SELECT 参考页面。

More details about window functions can be found in Section 4.2.8, Section 9.22, Section 7.2.5, and the SELECT reference page.

[5 ] 还有其他方式定义窗口范围的选项,但本教程不包含这些内容。有关详细信息,请参阅 Section 4.2.8

[5] There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details.