Hive 简明教程

HiveQL - Select-Joins

JOIN 是一种从两个表中组合特定字段的从句,它使用这两个表中公用的值。用于组合数据库中两个或更多个表中的记录。

JOIN is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database.

Syntax

join_table:

   table_reference JOIN table_factor [join_condition]
   | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
   join_condition
   | table_reference LEFT SEMI JOIN table_reference join_condition
   | table_reference CROSS JOIN table_reference [join_condition]

Example

我们在本章中将使用以下两张表。考虑名为 CUSTOMERS 的表。

We will use the following two tables in this chapter. Consider the following table named CUSTOMERS..

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
| 1  | Ramesh   | 32  | Ahmedabad | 2000.00  |
| 2  | Khilan   | 25  | Delhi     | 1500.00  |
| 3  | kaushik  | 23  | Kota      | 2000.00  |
| 4  | Chaitali | 25  | Mumbai    | 6500.00  |
| 5  | Hardik   | 27  | Bhopal    | 8500.00  |
| 6  | Komal    | 22  | MP        | 4500.00  |
| 7  | Muffy    | 24  | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

考虑另一个表 ORDERS 如下:

Consider another table ORDERS as follows:

+-----+---------------------+-------------+--------+
|OID  | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 | 3000   |
| 100 | 2009-10-08 00:00:00 |           3 | 1500   |
| 101 | 2009-11-20 00:00:00 |           2 | 1560   |
| 103 | 2008-05-20 00:00:00 |           4 | 2060   |
+-----+---------------------+-------------+--------+

给出如下几种类型的联接:

There are different types of joins given as follows:

  1. JOIN

  2. LEFT OUTER JOIN

  3. RIGHT OUTER JOIN

  4. FULL OUTER JOIN

JOIN

JOIN 子句用于组合和检索来自多张表的记录。JOIN 与 SQL 中的 OUTER JOIN 相同。JOIN 条件应使用表的键和外键创建。

JOIN clause is used to combine and retrieve the records from multiple tables. JOIN is same as OUTER JOIN in SQL. A JOIN condition is to be raised using the primary keys and foreign keys of the tables.

以下查询对 CUSTOMER 和 ORDER 表执行 JOIN 并检索记录:

The following query executes JOIN on the CUSTOMER and ORDER tables, and retrieves the records:

hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
| 3  | kaushik  | 23  | 3000   |
| 3  | kaushik  | 23  | 1500   |
| 2  | Khilan   | 25  | 1560   |
| 4  | Chaitali | 25  | 2060   |
+----+----------+-----+--------+

LEFT OUTER JOIN

HiveQL LEFT OUTER JOIN 返回左表的全部行,即使在右表中没有匹配行也是如此。这意味着,如果 ON 子句在右表中匹配 0(零)个记录,则 JOIN 仍然会返回结果中的行,但在右表的每一列中都会返回 NULL。

The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table. This means, if the ON clause matches 0 (zero) records in the right table, the JOIN still returns a row in the result, but with NULL in each column from the right table.

LEFT JOIN 返回左表中的全部值以及右表中的匹配值,如果匹配 JOIN 谓词不存在,则返回 NULL。

A LEFT JOIN returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching JOIN predicate.

以下查询演示了 CUSTOMER 和 ORDER 表之间的 LEFT OUTER JOIN:

The following query demonstrates LEFT OUTER JOIN between CUSTOMER and ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

+----+----------+--------+---------------------+
| ID | NAME     | AMOUNT | DATE                |
+----+----------+--------+---------------------+
| 1  | Ramesh   | NULL   | NULL                |
| 2  | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3  | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3  | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4  | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5  | Hardik   | NULL   | NULL                |
| 6  | Komal    | NULL   | NULL                |
| 7  | Muffy    | NULL   | NULL                |
+----+----------+--------+---------------------+

RIGHT OUTER JOIN

HiveQL RIGHT OUTER JOIN 返回右表的全部行,即使在左表中没有匹配行也是如此。如果 ON 子句在左表中匹配 0(零)个记录,则 JOIN 仍然会返回结果中的行,但在左表的每一列中都会返回 NULL。

The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left table, the JOIN still returns a row in the result, but with NULL in each column from the left table.

RIGHT JOIN 返回右表中的全部值以及左表中的匹配值,如果匹配 JOIN 谓词不存在,则返回 NULL。

A RIGHT JOIN returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join predicate.

以下查询演示了 CUSTOMER 和 ORDER 表之间的 RIGHT OUTER JOIN。

The following query demonstrates RIGHT OUTER JOIN between the CUSTOMER and ORDER tables.

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+

FULL OUTER JOIN

HiveQL FULL OUTER JOIN 组合满足 JOIN 条件的左外表和右外表的记录。连接表包含来自两张表的所有记录,或为两边缺失的匹配填入 NULL 值。

The HiveQL FULL OUTER JOIN combines the records of both the left and the right outer tables that fulfil the JOIN condition. The joined table contains either all the records from both the tables, or fills in NULL values for missing matches on either side.

以下查询演示了 CUSTOMER 表与 ORDER 表之间的 FULL OUTER JOIN:

The following query demonstrates FULL OUTER JOIN between CUSTOMER and ORDER tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 1    | Ramesh   | NULL   | NULL                |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5    | Hardik   | NULL   | NULL                |
| 6    | Komal    | NULL   | NULL                |
| 7    | Muffy    | NULL   | NULL                |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+