Teradata 简明教程

Teradata - Partitioned Primary Index

分区主索引 (PPI) 是一种索引机制，可用于提高某些查询的性能。当行插入表中时，它们会存储在 AMP 中，并按其行哈希顺序排列。当使用 PPI 定义表时，行会按其分区号排序。在每个分区内，它们会按行哈希排列。行会根据定义的分区表达式分配给分区。

Partitioned Primary Index (PPI) is an indexing mechanism that is useful in improving the performance of certain queries. When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. When a table is defined with PPI, the rows are sorted by their partition number. Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined.

Advantages

Avoid full table scan for certain queries.
Avoid using secondary index that requires additional physical structure and additional I/O maintenance.
Access a subset of a large table quickly.
Drop the old data quickly and add new data.

Example

考虑具有订单号为主索引的如下订单表。

Consider the following Orders table with Primary Index on OrderNo.

StoreNo	OrderNo	OrderDate	OrderTotal
101	7501	2015-10-01	900
101	7502	2015-10-02	1,200
102	7503	2015-10-02	3,000
102	7504	2015-10-03	2,454
101	7505	2015-10-03	1201
103	7506	2015-10-04	2,454
101	7507	2015-10-05	1201
101	7508	2015-10-05	1201

假设记录如以下表格所示在 AMP 之间进行分配。记录存储在 AMP 中，根据其行哈希进行排序。

Assume that the records are distributed between AMPs as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.

RowHash	OrderNo	OrderDate
1	7505	2015-10-03
2	7504	2015-10-03
3	7501	2015-10-01
4	7508	2015-10-05

RowHash	OrderNo	OrderDate
1	7507	2015-10-05
2	7502	2015-10-02
3	7506	2015-10-04
4	7503	2015-10-02

如果您运行一个查询以提取某个特定日期的订单，则优化器可能会选择使用全表扫描，然后访问 AMP 中的所有记录。为了避免这种情况，您可以将订单日期定义为分区主索引。当行插入订单表时，它们按订单日期分区。它们将在每个分区中按其行哈希进行排序。

If you run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. To avoid this, you can define the order date as Partitioned Primary Index. When rows are inserted into orders table, they are partitioned by the order date. Within each partition they will be ordered by their row hash.

以下数据显示了如果按订单日期分区，记录如何存储在 AMP 中。如果运行一个查询以按订单日期访问记录，则只会访问包含特定订单记录的分区。

The following data shows how the records will be stored in AMPs, if they are partitioned by Order Date. If a query is run to access the records by Order Date, then only the partition that contains the records for that particular order will be accessed.

Partition	RowHash	OrderNo	OrderDate
0	3	7501	2015-10-01
1	1	7505	2015-10-03
1	2	7504	2015-10-03
2	4	7508	2015-10-05

Partition	RowHash	OrderNo	OrderDate
0	2	7502	2015-10-02
0	4	7503	2015-10-02
1	3	7506	2015-10-04
2	1	7507	2015-10-05

以下是一个带分区主索引创建表的示例。PARTITION BY 子句用于定义分区。

Following is an example to create a table with partition primary Index. PARTITION BY clause is used to define the partition.

CREATE SET TABLE Orders (
   StoreNo SMALLINT,
   OrderNo INTEGER,
   OrderDate DATE FORMAT 'YYYY-MM-DD',
   OrderTotal INTEGER
)
PRIMARY INDEX(OrderNo)
PARTITION BY RANGE_N  (
   OrderDate BETWEEN DATE '2010-01-01' AND '2016-12-31' EACH INTERVAL '1' DAY
);

在上述示例中，该表按 OrderDate 列分区。每一天将会有一个单独的分区。

In the above example, the table is partitioned by OrderDate column. There will be one separate partition for each day.