Postgresql 中文操作指南
2.7. Aggregate Functions #
与大多数其他关系数据库产品一样,PostgreSQL 支持 aggregate functions。一个聚合函数从多个输入行计算一个单一结果。例如,有一些聚合函数计算 count、sum、avg(平均值)、max(最大值)和 min(最小值)的一组行。
Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.
例如,我们可以使用以下内容找到任何地方的最高低温读数:
As an example, we can find the highest low-temperature reading anywhere with:
SELECT max(temp_lo) FROM weather;
max
-----
46
(1 row)
如果我们想知道发生阅读的城市(或城市),我们可能会尝试:
If we wanted to know what city (or cities) that reading occurred in, we might try:
SELECT city FROM weather WHERE temp_lo = max(temp_lo); WRONG
但这将不起作用,因为总计 max 无法在 WHERE 子句中使用。(存在此限制是因为 WHERE 子句确定将哪些行包括在总计计算中;因此显然必须在计算总计函数之前对其进行评估。)但是,通常可以重新表述查询以实现所需的结果,这里通过使用 subquery:
but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is often the case the query can be restated to accomplish the desired result, here by using a subquery:
SELECT city FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
city
---------------
San Francisco
(1 row)
这可以,因为子查询是一个独立的计算,它单独计算自己的总计,与外部查询中的发生无关。
This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query.
聚合与 GROUP BY 子句结合使用时也非常有用。例如,我们可以通过以下方式获取各个城市的读数数量和观察到的最高低温:
Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the number of readings and the maximum low temperature observed in each city with:
SELECT city, count(*), max(temp_lo)
FROM weather
GROUP BY city;
city | count | max
---------------+-------+-----
Hayward | 1 | 37
San Francisco | 2 | 46
(2 rows)
这为我们提供了每个城市一行输出。每个总计结果都是通过与该城市匹配的表行计算的。我们可以使用 HAVING 过滤这些分组行:
which gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using HAVING:
SELECT city, count(*), max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;
city | count | max
---------+-------+-----
Hayward | 1 | 37
(1 row)
该操作为我们提供了仅对所有 temp_lo 值低于 40 的城市的相同结果。最后,如果我们只关心名称以“S”开头的城市,我们可能会执行以下操作:
which gives us the same results for only the cities that have all temp_lo values below 40. Finally, if we only care about cities whose names begin with “S”, we might do:
SELECT city, count(*), max(temp_lo)
FROM weather
WHERE city LIKE 'S%' -- (1)
GROUP BY city;
city | count | max
---------------+-------+-----
San Francisco | 2 | 46
(1 row)
了解总计与 SQL 的 WHERE 和 HAVING 子句之间的交互非常重要。WHERE 和 HAVING 之间的根本区别在于:WHERE 在计算组和总计之前选择输入行(因此,它控制哪些行进入总计计算),而 HAVING 在计算组和总计之后选择组行。因此,WHERE 子句不得包含总计函数;尝试使用总计来确定哪些行将成为总计的输入是没有意义的。另一方面,HAVING 子句始终包含总计函数。(严格来说,允许你编写不使用总计的 HAVING 子句,但它很少有用。相同条件可以在 WHERE 阶段更有效地使用。)
It is important to understand the interaction between aggregates and SQL’s WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn’t use aggregates, but it’s seldom useful. The same condition could be used more efficiently at the WHERE stage.)
在前面的示例中,我们可以在 WHERE 中应用城市名称限制,因为它不需要总计。这比将限制添加到 HAVING 更有效,因为我们避免对无法通过 WHERE 检查的所有行执行分组和总计计算。
In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate. This is more efficient than adding the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.
选择进入总计计算的行的方法是使用 FILTER,这是一个按总计选项计算的选项:
Another way to select the rows that go into an aggregate computation is to use FILTER, which is a per-aggregate option:
SELECT city, count(*) FILTER (WHERE temp_lo < 45), max(temp_lo)
FROM weather
GROUP BY city;
city | count | max
---------------+-------+-----
Hayward | 1 | 37
San Francisco | 1 | 46
(2 rows)
FILTER 与 WHERE 非常相似,但它仅从其附加到的特定总计函数的输入中移除行。此处,count 总计仅计算 temp_lo 低于 45 的行;但 max 总计仍应用于所有行,因此它仍找到 46 的读数。
FILTER is much like WHERE, except that it removes rows only from the input of the particular aggregate function that it is attached to. Here, the count aggregate counts only rows with temp_lo below 45; but the max aggregate is still applied to all rows, so it still finds the reading of 46.