Teradata 简明教程

Teradata - JOIN Index

JOIN INDEX 是物化视图。其定义永久存储,每当关联索引中引用的基础表更新时数据都会更新。JOIN INDEX 可能包含一个或多个表,并且还包含预聚合数据。连接索引主要用于提高性能。

JOIN INDEX is a materialized view. Its definition is permanently stored and the data is updated whenever the base tables referred in the join index is updated. JOIN INDEX may contain one or more tables and also contain pre-aggregated data. Join indexes are mainly used for improving the performance.

有不同类型的 join 索引可用。

There are different types of join indexes available.

  1. Single Table Join Index (STJI)

  2. Multi Table Join Index (MTJI)

  3. Aggregated Join Index (AJI)

Single Table Join Index

单表连接索引允许基于不同于基础表的主索引列对大表进行分区。

Single Table Join index allows to partition a large table based on the different primary index columns than the one from the base table.

Syntax

以下是 JOIN INDEX 的语法。

Following is the syntax of a JOIN INDEX.

CREATE JOIN INDEX <index name>
AS
<SELECT Query>
<Index Definition>;

Example

考虑以下 Employee 和 Salary 表。

Consider the following Employee and Salary tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );

CREATE SET TABLE SALARY,FALLBACK (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
PRIMARY INDEX ( EmployeeNo )
UNIQUE INDEX (EmployeeNo);

以下是一个示例,在 Employee 表上创建名为 Employee_JI 的连接索引。

Following is an example that creates a Join index named Employee_JI on Employee table.

CREATE JOIN INDEX Employee_JI
AS
SELECT EmployeeNo,FirstName,LastName,
BirthDate,JoinedDate,DepartmentNo
FROM Employee
PRIMARY INDEX(FirstName);

如果用户提交一个带有 EmployeeNo 的 WHERE 子句的查询,那么系统将使用唯一主索引对 Employee 表进行查询。如果用户使用 employee_name 对 employee 表进行查询,那么系统可能会使用 employee_name 访问关联索引 Employee_JI。连接索引的行在 employee_name 列上进行哈希处理。如果没有定义连接索引,并且 employee_name 未定义为二级索引,那么系统将执行全表扫描以访问耗时的行。

If the user submits a query with a WHERE clause on EmployeeNo, then the system will query the Employee table using the unique primary index. If the user queries the employee table using employee_name, then the system may access the join index Employee_JI using employee_name. The rows of the join index are hashed on employee_name column. If the join index is not defined and the employee_name is not defined as secondary index, then the system will perform full table scan to access the rows which is time consuming.

您可以运行以下 EXPLAIN 计划并验证优化器计划。在以下示例中,您会看到优化器在使用 Employee_Name 列查询表时使用连接索引而不是基础 Employee 表。

You can run the following EXPLAIN plan and verify the optimizer plan. In the following example you can see that the optimizer is using the Join Index instead of base Employee table when the table queries using the Employee_Name column.

EXPLAIN SELECT * FROM EMPLOYEE WHERE FirstName='Mike';
*** Help information returned. 8 rows.
*** Total elapsed time was 1 second.
Explanation
------------------------------------------------------------------------
   1) First, we do a single-AMP RETRIEVE step from EMPLOYEE_JI by
      way of the primary index "EMPLOYEE_JI.FirstName = 'Mike'"
      with no residual conditions into Spool 1 (one-amp), which is built
      locally on that AMP.  The size of Spool 1 is estimated with low
      confidence to be 2 rows (232 bytes).  The estimated time for this
      step is 0.02 seconds.
   → The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.02 seconds.

Multi Table Join Index

多表连接索引通过连接多张表来创建。多表连接索引可用于存储常用连接表的 result 集,以提高性能。

A multi-table join index is created by joining more than one table. Multi-table join index can be used to store the result set of frequently joined tables to improve the performance.

Example

以下示例通过连接 Employee 和 Salary 表创建名为 Employee_Salary_JI 的 JOIN INDEX。

The following example creates a JOIN INDEX named Employee_Salary_JI by joining Employee and Salary tables.

CREATE JOIN INDEX Employee_Salary_JI
AS
SELECT a.EmployeeNo,a.FirstName,a.LastName,
a.BirthDate,a.JoinedDate,a.DepartmentNo,b.Gross,b.Deduction,b.NetPay
FROM Employee a
INNER JOIN Salary b
ON(a.EmployeeNo = b.EmployeeNo)
PRIMARY INDEX(FirstName);

每当更新基础表 Employee 或 Salary 时,连接索引 Employee_Salary_JI 也会自动更新。如果您正在运行一个连接 Employee 和 Salary 表的查询,那么优化器可能会选择直接从 Employee_Salary_JI 访问数据,而不是连接表。可以对查询使用 EXPLAIN 计划来验证优化器是否会选择基础表或连接索引。

Whenever the base tables Employee or Salary are updated, then the Join index Employee_Salary_JI is also automatically updated. If you are running a query joining Employee and Salary tables, then the optimizer may choose to access the data from Employee_Salary_JI directly instead of joining the tables. EXPLAIN plan on the query can be used to verify if the optimizer will choose the base table or Join index.

Aggregate Join Index

如果某个表总是在某些列上聚合,那么可以在表上定义聚合连接索引来提升性能。聚合连接索引的一个限制是它仅支持 SUM 和 COUNT 函数。

If a table is consistently aggregated on certain columns, then aggregate join index can be defined on the table to improve the performance. One limitation of aggregate join index is that it supports only SUM and COUNT functions.

Example

在以下示例中,员工和薪水连接起来,从而按部门识别总薪水。

In the following example Employee and Salary is joined to identify the total salary per Department.

CREATE JOIN INDEX Employee_Salary_JI
AS
SELECT a.DepartmentNo,SUM(b.NetPay) AS TotalPay
FROM Employee a
INNER JOIN Salary b
ON(a.EmployeeNo = b.EmployeeNo)
GROUP BY a.DepartmentNo
Primary Index(DepartmentNo);