Hive 简明教程
HiveQL - Select-Joins
JOIN 是一种从两个表中组合特定字段的从句,它使用这两个表中公用的值。用于组合数据库中两个或更多个表中的记录。
Syntax
join_table:
table_reference JOIN table_factor [join_condition]
| table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference
join_condition
| table_reference LEFT SEMI JOIN table_reference join_condition
| table_reference CROSS JOIN table_reference [join_condition]
Example
我们在本章中将使用以下两张表。考虑名为 CUSTOMERS 的表。
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
考虑另一个表 ORDERS 如下:
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
给出如下几种类型的联接:
-
JOIN
-
LEFT OUTER JOIN
-
RIGHT OUTER JOIN
-
FULL OUTER JOIN
JOIN
JOIN 子句用于组合和检索来自多张表的记录。JOIN 与 SQL 中的 OUTER JOIN 相同。JOIN 条件应使用表的键和外键创建。
以下查询对 CUSTOMER 和 ORDER 表执行 JOIN 并检索记录:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
在成功执行查询后,您可以看到以下响应:
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
LEFT OUTER JOIN
HiveQL LEFT OUTER JOIN 返回左表的全部行,即使在右表中没有匹配行也是如此。这意味着,如果 ON 子句在右表中匹配 0(零)个记录,则 JOIN 仍然会返回结果中的行,但在右表的每一列中都会返回 NULL。
LEFT JOIN 返回左表中的全部值以及右表中的匹配值,如果匹配 JOIN 谓词不存在,则返回 NULL。
以下查询演示了 CUSTOMER 和 ORDER 表之间的 LEFT OUTER JOIN:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
在成功执行查询后,您可以看到以下响应:
+----+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+----+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+----+----------+--------+---------------------+
RIGHT OUTER JOIN
HiveQL RIGHT OUTER JOIN 返回右表的全部行,即使在左表中没有匹配行也是如此。如果 ON 子句在左表中匹配 0(零)个记录,则 JOIN 仍然会返回结果中的行,但在左表的每一列中都会返回 NULL。
RIGHT JOIN 返回右表中的全部值以及左表中的匹配值,如果匹配 JOIN 谓词不存在,则返回 NULL。
以下查询演示了 CUSTOMER 和 ORDER 表之间的 RIGHT OUTER JOIN。
在成功执行查询后,您可以看到以下响应:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
FULL OUTER JOIN
HiveQL FULL OUTER JOIN 组合满足 JOIN 条件的左外表和右外表的记录。连接表包含来自两张表的所有记录,或为两边缺失的匹配填入 NULL 值。
以下查询演示了 CUSTOMER 表与 ORDER 表之间的 FULL OUTER JOIN:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
在成功执行查询后,您可以看到以下响应:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+