Postgresql 中文操作指南
2.6. Joins Between Tables #
到目前为止,我们的查询一次只访问一个表格。查询可以一次访问多个表格,或者以一种方式访问同一个表格,即同时处理该表格的多行。一次访问多个表格(或同一个表格的多个实例)的查询称为 join 查询。它们将来自一个表格的行与来自第二个表格的行组合在一起,表达式指定了哪些行将被配对。例如,要返回所有天气记录以及相关城市的所在位置,数据库需要将 weather 表格中每行的 city 列与 cities 表格中所有行的 name 列进行比较,并选择这些值匹配的行对。 [4 ] 这将通过以下查询完成:
Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. Queries that access multiple tables (or multiple instances of the same table) at one time are called join queries. They combine rows from one table with rows from a second table, with an expression specifying which rows are to be paired. For example, to return all the weather records together with the location of the associated city, the database needs to compare the city column of each row of the weather table with the name column of all rows in the cities table, and select the pairs of rows where these values match.[4] This would be accomplished by the following query:
SELECT * FROM weather JOIN cities ON city = name;
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(2 rows)
请注意结果集中的两件事:
Observe two things about the result set:
由于这些列的所有名称都不同,因此解析器会自动找出它们属于哪张表。如果两个表中存在重复的列名,则需要“限定”列名以表明你的目的是什么,如下所示:
Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column names in the two tables you’d need to qualify the column names to show which one you meant, as in:
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
FROM weather JOIN cities ON weather.city = cities.name;
通常将连接查询中的所有列名称限定为好习惯,以便在以后将重复的列名称添加到其中一个表时,查询不会失败。
It is widely considered good style to qualify all column names in a join query, so that the query won’t fail if a duplicate column name is later added to one of the tables.
类似迄今为止看到的连接查询也可以以这种形式编写:
Join queries of the kind seen thus far can also be written in this form:
SELECT *
FROM weather, cities
WHERE city = name;
此语法早于 SQL-92 中引入的“JOIN”/“ON”语法。这些表只是在“FROM”子句中列出,比较表达式被添加到“ON”子句中。这种较旧的隐式语法的结果和较新的显式“JOIN”/“ON”语法的结果相同。但对于查询的阅读者而言,显式语法使它的含义更易于理解:联接条件由其自己的关键字引入,而以前该条件与其他条件一起混合在“WHERE”子句中。
This syntax pre-dates the JOIN/ON syntax, which was introduced in SQL-92. The tables are simply listed in the FROM clause, and the comparison expression is added to the WHERE clause. The results from this older implicit syntax and the newer explicit JOIN/ON syntax are identical. But for a reader of the query, the explicit syntax makes its meaning easier to understand: The join condition is introduced by its own key word whereas previously the condition was mixed into the WHERE clause together with other conditions.
现在,我们将找出如何找回 Hayward 记录。我们希望查询扫描“天气”表,并针对每行查找匹配的“城市”行。如果找不到匹配的行,我们希望为“城市”表的列替换一些“空值”。这种查询称为“左联接”。(到目前为止,我们看到的联接是“内联接”。)命令如下所示:
Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the weather table and for each row to find the matching cities row(s). If no matching row is found we want some “empty values” to be substituted for the cities table’s columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:
SELECT *
FROM weather LEFT OUTER JOIN cities ON weather.city = cities.name;
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
此查询称为“左外联接”,因为联接操作符左侧提到的表的每一行都会至少一次出现在输出中,而右侧的表只有那些与左侧表的一些行匹配的行才会输出。为没有右表匹配项的左表行输出时,空(null)值将替换为右表列。
This query is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a left-table row for which there is no right-table match, empty (null) values are substituted for the right-table columns.
*Exercise: *还有右外联接和全外联接。尝试找出它们的作用。
*Exercise: * There are also right outer joins and full outer joins. Try to find out what those do.
我们还可以将一张表与其自身联接起来。这称为“自联接”。例如,假设我们希望找出温度范围在其他天气记录范围内的所有天气记录。因此,我们需要将每一行“天气”表的“温度”和“湿度”列与所有其他“天气”行的“温度”和“湿度”列进行比较。我们可以使用以下查询来执行此操作:
We can also join a table against itself. This is called a self join. As an example, suppose we wish to find all the weather records that are in the temperature range of other weather records. So we need to compare the temp_lo and temp_hi columns of each weather row to the temp_lo and temp_hi columns of all other weather rows. We can do this with the following query:
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
w2.city, w2.temp_lo AS low, w2.temp_hi AS high
FROM weather w1 JOIN weather w2
ON w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi;
city | low | high | city | low | high
---------------+-----+------+---------------+-----+------
San Francisco | 43 | 57 | San Francisco | 46 | 50
Hayward | 37 | 54 | San Francisco | 46 | 50
(2 rows)
这里,我们将天气表重新标记为 w1 和 w2,以便区分连接的左侧和右侧。您还可以在其他查询中使用这些别名来节省一些键入,例如:
Here we have relabeled the weather table as w1 and w2 to be able to distinguish the left and right side of the join. You can also use these kinds of aliases in other queries to save some typing, e.g.:
SELECT *
FROM weather w JOIN cities c ON w.city = c.name;
您会经常遇到这种缩写样式。
You will encounter this style of abbreviating quite frequently.
[4 ] 这仅仅是一个概念模型。联接通常会以比实际比较每一行对更有效的方式进行,但这对于用户是不可见的。
[4] This is only a conceptual model. The join is usually performed in a more efficient manner than actually comparing each possible pair of rows, but this is invisible to the user.