Teradata 简明教程
Teradata - Hashing Algorithm
行是根据主键值分配给特定 AMP 的。Teradata 使用哈希算法来确定哪一个 AMP 获取行。
A row is assigned to a particular AMP based on the primary index value. Teradata uses hashing algorithm to determine which AMP gets the row.
以下是有关哈希算法的高级图表。
Following is a high level diagram on hashing algorithm.

以下是插入数据的步骤。
Following are the steps to insert the data.
-
The client submits a query.
-
The parser receives the query and passes the PI value of the record to the hashing algorithm.
-
The hashing algorithm hashes the primary index value and returns a 32 bit number, called Row Hash.
-
The higher order bits of the row hash (first 16 bits) is used to identify the hash map entry. The hash map contains one AMP #. Hash map is an array of buckets which contains specific AMP #.
-
BYNET sends the data to the identified AMP.
-
AMP uses the 32 bit Row hash to locate the row within its disk.
-
If there is any record with same row hash, then it increments the uniqueness ID which is a 32 bit number. For new row hash, uniqueness ID is assigned as 1 and incremented whenever a record with same row hash is inserted.
-
The combination of Row hash and Uniqueness ID is called as Row ID.
-
Row ID prefixes each record in the disk.
-
Each table row in the AMP is logically sorted by their Row IDs.
How Tables are Stored
表格按其行 ID(行哈希 + 唯一性 ID)排序,然后存储在 AMP 中。行 ID 与每行数据一起存储。
Tables are sorted by their Row ID (Row hash + uniqueness id) and then stored within the AMPs. Row ID is stored with each data row.
Row Hash |
Uniqueness ID |
EmployeeNo |
FirstName |
LastName |
2A01 2611 |
0000 0001 |
101 |
Mike |
James |
2A01 2612 |
0000 0001 |
104 |
Alex |
Stuart |
2A01 2613 |
0000 0001 |
102 |
Robert |
Williams |
2A01 2614 |
0000 0001 |
105 |
Robert |
James |
2A01 2615 |
0000 0001 |
103 |
Peter |
Paul |