Teradata Tutorial

Teradata - Hashing Algorithm

A row is assigned to a particular AMP based on its primary index value. Teradata uses a hashing algorithm to determine which AMP gets the row.
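The following is a minimal Python sketch of how a primary index value is mapped to an AMP. This section does not give Teradata's actual hash function, so `zlib.crc32` stands in for it purely as an assumption. The 4-AMP system and the round-robin hash map contents are also hypothetical. Following the steps below, the hash bucket is taken from the first (high-order) 16 bits of the 32-bit row hash.

```python
import zlib

NUM_AMPS = 4            # hypothetical system size
NUM_BUCKETS = 1 << 16   # 16-bit hash bucket number, as described in this section

# Hypothetical hash map: an array of buckets, each holding one AMP number.
# Round-robin assignment is only an illustration, not how Teradata builds it.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value: str) -> int:
    """Return a 32-bit row hash for a primary index value.

    Assumption: zlib.crc32 stands in for Teradata's own hashing algorithm.
    """
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF


def hash_bucket(rh: int) -> int:
    """Use the high-order 16 bits of the 32-bit row hash as the bucket number."""
    return rh >> 16


def target_amp(pi_value: str) -> int:
    """Map a primary index value to the AMP that will store the row."""
    return HASH_MAP[hash_bucket(row_hash(pi_value))]


if __name__ == "__main__":
    for emp_no in ("101", "102", "103", "104", "105"):
        rh = row_hash(emp_no)
        print(f"PI={emp_no} row hash={rh:08X} "
              f"bucket={hash_bucket(rh):04X} AMP={target_amp(emp_no)}")
```

Because the same PI value always produces the same row hash, every row with that value lands on the same AMP. This is why a lookup on the primary index only has to involve one AMP.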

The following is a high-level diagram of the hashing algorithm.

[Figure: hashing algorithm]

The steps to insert a row are as follows.

  1. The client submits a query.

  2. The parser receives the query and passes the primary index (PI) value of the record to the hashing algorithm.

  3. The hashing algorithm hashes the primary index value and returns a 32-bit number, called the row hash.

  4. The high-order bits of the row hash (the first 16 bits) are used to identify the hash map entry. The hash map is an array of buckets, each of which contains a specific AMP number, so the entry identifies the AMP that stores the row.

  5. The BYNET sends the data to the identified AMP.

  6. The AMP uses the 32-bit row hash to locate the row on its disk.

  7. If a row with the same row hash already exists, the AMP increments the uniqueness ID, which is a 32-bit number. For a new row hash, the uniqueness ID is set to 1, and it is incremented each time another row with the same row hash is inserted.

  8. The combination of the row hash and the uniqueness ID is called the Row ID.

  9. The Row ID prefixes each record on disk.

  10. The table rows on each AMP are logically sorted by their Row IDs.
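Steps 6 to 10 can be simulated with a small in-memory model of one AMP. The `Amp` class and its `insert` method are hypothetical. Packing the Row ID into a single 64-bit integer (row hash in the high 32 bits, uniqueness ID in the low 32 bits) is an illustrative assumption, not Teradata's on-disk format. Row hashes are passed in directly, so the sketch does not depend on any particular hash function. The rows with EmployeeNo 999 and 200 are made-up examples.

```python
from bisect import insort
from collections import defaultdict


class Amp:
    """Hypothetical in-memory model of the rows stored on a single AMP."""

    def __init__(self):
        # Highest uniqueness ID issued so far for each 32-bit row hash.
        self._last_uniq = defaultdict(int)
        # (row_id, row) pairs kept in ascending Row ID order. Row IDs are
        # unique on an AMP, so the row dicts themselves are never compared.
        self.rows = []

    def insert(self, rh, row):
        """Store a row under row hash `rh` and return its 64-bit Row ID."""
        # The first row with this hash gets uniqueness ID 1; each later row
        # with the same hash gets the next value.
        self._last_uniq[rh] += 1
        uniq = self._last_uniq[rh]
        # Assumed packing: row hash in the high 32 bits, uniqueness ID in the low 32.
        row_id = (rh << 32) | uniq
        insort(self.rows, (row_id, row))
        return row_id


if __name__ == "__main__":
    amp = Amp()
    print(hex(amp.insert(0x2A012611, {"EmployeeNo": 101})))  # 0x2a01261100000001
    print(hex(amp.insert(0x2A012611, {"EmployeeNo": 999})))  # 0x2a01261100000002 (same row hash)
    print(hex(amp.insert(0x2A012610, {"EmployeeNo": 200})))  # 0x2a01261000000001
    print([hex(rid) for rid, _ in amp.rows])
```

Because the row hash occupies the high-order bits, sorting by this combined value orders rows first by row hash and then by uniqueness ID. This matches the Row ID ordering described in step 10.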

How Tables are Stored

Table rows are sorted by their Row ID (row hash + uniqueness ID) and then stored within the AMPs. The Row ID is stored with each data row.

Row Hash    Uniqueness ID    EmployeeNo    FirstName    LastName
2A01 2611   0000 0001        101           Mike         James
2A01 2612   0000 0001        104           Alex         Stuart
2A01 2613   0000 0001        102           Robert       Williams
2A01 2614   0000 0001        105           Robert       James
2A01 2615   0000 0001        103           Peter        Paul
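As a worked check on the sample table, the following Python snippet combines each row's hexadecimal row hash and uniqueness ID into one 64-bit Row ID value. This packing is an illustrative assumption, not Teradata's storage format. The snippet then confirms that the rows appear in Row ID order and extracts the high-order 16 bits of each row hash. All five sample rows share the value 0x2A01, so under the steps described earlier they map to the same hash map entry and therefore to the same AMP.

```python
# Sample rows from the table: (row hash, uniqueness ID, EmployeeNo, FirstName, LastName)
ROWS = [
    ("2A01 2611", "0000 0001", 101, "Mike", "James"),
    ("2A01 2612", "0000 0001", 104, "Alex", "Stuart"),
    ("2A01 2613", "0000 0001", 102, "Robert", "Williams"),
    ("2A01 2614", "0000 0001", 105, "Robert", "James"),
    ("2A01 2615", "0000 0001", 103, "Peter", "Paul"),
]


def parse_hex(text: str) -> int:
    """Turn a spaced hex string such as '2A01 2611' into an integer."""
    return int(text.replace(" ", ""), 16)


def row_id(row_hash_hex: str, uniq_hex: str) -> int:
    """Pack row hash (high 32 bits) and uniqueness ID (low 32 bits) into one value."""
    return (parse_hex(row_hash_hex) << 32) | parse_hex(uniq_hex)


row_ids = [row_id(rh, u) for rh, u, *_ in ROWS]
buckets = {parse_hex(rh) >> 16 for rh, *_ in ROWS}

print(row_ids == sorted(row_ids))     # True: rows are listed in Row ID order
print([f"{b:04X}" for b in buckets])  # ['2A01']: one hash bucket for all five rows
for (_, _, emp_no, *_), rid in zip(ROWS, row_ids):
    print(emp_no, f"{rid:016X}")
```

The EmployeeNo values (101, 104, 102, 105, 103) are not in order. The rows are sorted by Row ID, not by primary index value.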