Teradata 简明教程

Teradata - JOIN strategies

本章讨论 Teradata 中可用的各种 JOIN 策略。

This chapter discusses the various JOIN strategies available in Teradata.

Join Methods

Teradata 使用不同的连接方法来执行连接操作。一些常用的连接方法包括：

Teradata uses different join methods to perform join operations. Some of the commonly used Join methods are −

Merge Join
Nested Join
Product Join

Merge Join

当连接基于相等条件时，将执行 Merge Join 方法。Merge Join 要求连接的行位于同一 AMP 上。行是根据其行哈希连接的。Merge Join 使用不同的连接策略将行带到同一个 AMP。

Merge Join method takes place when the join is based on the equality condition. Merge Join requires the joining rows to be on the same AMP. Rows are joined based on their row hash. Merge Join uses different join strategies to bring the rows to the same AMP.

Strategy

如果连接列是相应表的初级索引，那么连接行已经位于同一个 AMP 上。在这种情况下，不需要分配。

If the join columns are the primary indexes of the corresponding tables, then the joining rows are already on the same AMP. In this case, no distribution is required.

考虑以下员工和薪水表。

Consider the following Employee and Salary Tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );

CREATE SET TABLE Salary (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
UNIQUE PRIMARY INDEX(EmployeeNo);

当这两个表按 EmployeeNo 列连接时，则不会发生重新分配，因为 EmployeeNo 是被连接的两个表的初级索引。

When these two tables are joined on EmployeeNo column, then no redistribution takes place since EmployeeNo is the primary index of both the tables which are being joined.

Strategy

考虑以下员工和部门表。

Consider the following Employee and Department tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );

CREATE SET TABLE DEPARTMENT,FALLBACK (
   DepartmentNo BYTEINT,
   DepartmentName CHAR(15)
)
UNIQUE PRIMARY INDEX ( DepartmentNo );

如果这两个表按 DeparmentNo 列连接，那么需要重新分配这些行，因为 DepartmentNo 是一个表中的初级索引，而另一个表中的非初级索引。在此场景中，连接行可能不在同一个 AMP 上。在这种情况下，Teradata 可能会在 DepartmentNo 列上重新分配员工表。

If these two tables are joined on DeparmentNo column, then the rows need to be redistributed since DepartmentNo is a primary index in one table and non-primary index in another table. In this scenario, joining rows may not be on the same AMP. In such case, Teradata may redistribute employee table on DepartmentNo column.

Strategy

对于上面的员工和部门表，如果部门表大小较小，Teradata 可能会在所有 AMP 上复制部门表。

For the above Employee and Department tables, Teradata may duplicate the Department table on all AMPs, if the size of Department table is small.

Nested Join

Nested Join 并不使用所有 AMP。对于 Nested Join 来说，一个条件应该是对一个表的唯一初级索引的相等，然后将此列连接到另一个表上的任何索引。

Nested Join doesn’t use all AMPs. For the Nested Join to take place, one of the condition should be equality on the unique primary index of one table and then joining this column to any index on the other table.

在此情况下，系统将使用一个表唯一初级索引获取一行，并使用该行哈希从其他表获取匹配的记录。Nested join 是所有连接方法中最有效的。

In this scenario, the system will fetch the one row using Unique Primary index of one table and use that row hash to fetch the matching records from other table. Nested join is the most efficient of all Join methods.

Product Join

Product Join 将一个表中每个符合条件的行与另一个表中每个符合条件的行进行比较。由于以下一些因素，可能会发生 Product join：

Product Join compares each qualifying row from one table with each qualifying row from other table. Product join may take place due to some of the following factors −

Where condition is missing.
Join condition is not based on equality condition.
Table aliases is not correct.
Multiple join conditions.