Teradata 简明教程

Teradata - Partitioned Primary Index

分区主索引 (PPI) 是一种索引机制,可用于提高某些查询的性能。当行插入表中时,它们会存储在 AMP 中,并按其行哈希顺序排列。当使用 PPI 定义表时,行会按其分区号排序。在每个分区内,它们会按行哈希排列。行会根据定义的分区表达式分配给分区。

Partitioned Primary Index (PPI) is an indexing mechanism that is useful in improving the performance of certain queries. When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. When a table is defined with PPI, the rows are sorted by their partition number. Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined.

Advantages

  1. Avoid full table scan for certain queries.

  2. Avoid using secondary index that requires additional physical structure and additional I/O maintenance.

  3. Access a subset of a large table quickly.

  4. Drop the old data quickly and add new data.

Example

考虑具有订单号为主索引的如下订单表。

Consider the following Orders table with Primary Index on OrderNo.

StoreNo

OrderNo

OrderDate

OrderTotal

101

7501

2015-10-01

900

101

7502

2015-10-02

1,200

102

7503

2015-10-02

3,000

102

7504

2015-10-03

2,454

101

7505

2015-10-03

1201

103

7506

2015-10-04

2,454

101

7507

2015-10-05

1201

101

7508

2015-10-05

1201

假设记录如以下表格所示在 AMP 之间进行分配。记录存储在 AMP 中,根据其行哈希进行排序。

Assume that the records are distributed between AMPs as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.

**

RowHash

OrderNo

OrderDate

1

7505

2015-10-03

2

7504

2015-10-03

3

7501

2015-10-01

4

7508

2015-10-05

**

RowHash

OrderNo

OrderDate

1

7507

2015-10-05

2

7502

2015-10-02

3

7506

2015-10-04

4

7503

2015-10-02

如果您运行一个查询以提取某个特定日期的订单,则优化器可能会选择使用全表扫描,然后访问 AMP 中的所有记录。为了避免这种情况,您可以将订单日期定义为分区主索引。当行插入订单表时,它们按订单日期分区。它们将在每个分区中按其行哈希进行排序。

If you run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. To avoid this, you can define the order date as Partitioned Primary Index. When rows are inserted into orders table, they are partitioned by the order date. Within each partition they will be ordered by their row hash.

以下数据显示了如果按订单日期分区,记录如何存储在 AMP 中。如果运行一个查询以按订单日期访问记录,则只会访问包含特定订单记录的分区。

The following data shows how the records will be stored in AMPs, if they are partitioned by Order Date. If a query is run to access the records by Order Date, then only the partition that contains the records for that particular order will be accessed.

**

Partition

RowHash

OrderNo

OrderDate

0

3

7501

2015-10-01

1

1

7505

2015-10-03

1

2

7504

2015-10-03

2

4

7508

2015-10-05

**

Partition

RowHash

OrderNo

OrderDate

0

2

7502

2015-10-02

0

4

7503

2015-10-02

1

3

7506

2015-10-04

2

1

7507

2015-10-05

以下是一个带分区主索引创建表的示例。PARTITION BY 子句用于定义分区。

Following is an example to create a table with partition primary Index. PARTITION BY clause is used to define the partition.

CREATE SET TABLE Orders (
   StoreNo SMALLINT,
   OrderNo INTEGER,
   OrderDate DATE FORMAT 'YYYY-MM-DD',
   OrderTotal INTEGER
)
PRIMARY INDEX(OrderNo)
PARTITION BY RANGE_N  (
   OrderDate BETWEEN DATE '2010-01-01' AND '2016-12-31' EACH INTERVAL '1' DAY
);

在上述示例中,该表按 OrderDate 列分区。每一天将会有一个单独的分区。

In the above example, the table is partitioned by OrderDate column. There will be one separate partition for each day.