Teradata 简明教程

Teradata - Performance Tuning

本章讨论 Teradata 中的性能调优过程。

This chapter discusses the procedure of performance tuning in Teradata.

Explain

性能调优的第一步是针对查询使用 EXPLAIN。EXPLAIN 计划提供优化器执行查询的详细信息。在 EXPLAIN 计划中,检查关键字(如置信级别、已使用的联接策略、假脱机文件大小、重新分配等)。

The first step in performance tuning is the use of EXPLAIN on your query. EXPLAIN plan gives the details of how optimizer will execute your query. In the Explain plan, check for the keywords like confidence level, join strategy used, spool file size, redistribution, etc.

Collect Statistics

优化器使用数据人口统计数据提出有效的执行策略。COLLECT STATISTICS 命令用于收集表的数据人口统计数据。确保收集的列上的统计数据是最新的。

Optimizer uses Data demographics to come up with effective execution strategy. COLLECT STATISTICS command is used to collect data demographics of the table. Make sure that the statistics collected on the columns are up to date.

  1. Collect statistics on the columns that are used in WHERE clause and on the columns used in the joining condition.

  2. Collect statistics on the Unique Primary Index columns.

  3. Collect statistics on Non Unique Secondary Index columns. Optimizer will decide if it can use NUSI or Full Table Scan.

  4. Collect statistics on the Join Index though the statistics on base table is collected.

  5. Collect statistics on the partitioning columns.

Data Types

确保使用适当的数据类型。这将避免使用超出所需的大量存储空间。

Make sure that proper data types are used. This will avoid the use of excessive storage than required.

Conversion

确保联接条件所用列的数据类型兼容,以避免显式数据转换。

Make sure that the data types of the columns used in join condition are compatible to avoid explicit data conversions.

Sort

除非需要,否则删除不必要的 ORDER BY 子句。

Remove unnecessary ORDER BY clauses unless required.

Spool Space Issue

如果查询超过用户为每个 AMP 分配的卷轴空间限制,则会生成卷轴空间错误。验证执行计划并确定占用更多卷轴空间的步骤。可以将这些中间查询拆分并分别放置以构建临时表。

Spool space error is generated if the query exceeds per AMP spool space limit for that user. Verify the explain plan and identify the step that consumes more spool space. These intermediate queries can be split and put as separately to build temporary tables.

Primary Index

确保表的主索引正确定义。主索引列应均匀分布数据,并且应经常用于访问数据。

Make sure that the Primary Index is correctly defined for the table. The primary index column should evenly distribute the data and should be frequently used to access the data.

SET Table

如果您定义 SET 表,则优化器会检查对于所插入的每条记录,是否存在重复记录。若要删除重复检查条件,您可以为表定义唯一二级索引。

If you define a SET table, then the optimizer will check if the record is duplicate for each and every record inserted. To remove the duplicate check condition, you can define Unique Secondary Index for the table.

UPDATE on Large Table

更新大型表会很耗时。您可以删除记录并插入包含修改后的行的新记录,而不是更新表。

Updating the large table will be time consuming. Instead of updating the table, you can delete the records and insert the records with modified rows.

Dropping Temporary Tables

如果不再需要,请删除临时表(暂存表)和临时变量。这将释放永久空间和卷轴空间。

Drop the temporary tables (staging tables) and volatiles if they are no longer needed. This will free up permanent space and spool space.

MULTISET Table

如果您确信输入记录没有重复记录,则可以将目标表定义为 MULTISET 表,以避免使用 SET 表执行重复行检查。

If you are sure that the input records will not have duplicate records, then you can define the target table as MULTISET table to avoid the duplicate row check used by SET table.