Teradata 简明教程

Teradata - Quick Guide

Teradata - Introduction

What is Teradata?

Teradata 是流行的关系数据库管理系统之一。它主要适用于构建大型数据仓库应用程序。Teradata 通过并行性的概念实现了这一点。它是由 Teradata 公司开发的。

Teradata is one of the popular Relational Database Management System. It is mainly suitable for building large scale data warehousing applications. Teradata achieves this by the concept of parallelism. It is developed by the company called Teradata.

History of Teradata

以下是 Teradata 历史的快速摘要,列出了主要里程碑。

Following is a quick summary of the history of Teradata, listing major milestones.

  1. 1979 − Teradata was incorporated.

  2. 1984 − Release of first database computer DBC/1012.

  3. 1986 − Fortune magazine names Teradata as ‘Product of the Year’.

  4. 1999 − Largest database in the world using Teradata with 130 Terabytes.

  5. 2002 − Teradata V2R5 released with Partition Primary Index and compression.

  6. 2006 − Launch of Teradata Master Data Management solution.

  7. 2008 − Teradata 13.0 released with Active Data Warehousing.

  8. 2011 − Acquires Teradata Aster and enters into Advanced Analytics Space.

  9. 2012 − Teradata 14.0 introduced.

  10. 2014 − Teradata 15.0 introduced.

Features of Teradata

以下是 Teradata 的一些功能 -

Following are some of the features of Teradata −

  1. Unlimited Parallelism − Teradata database system is based on Massively Parallel Processing (MPP) Architecture. MPP architecture divides the workload evenly across the entire system. Teradata system splits the task among its processes and runs them in parallel to ensure that the task is completed quickly.

  2. Shared Nothing Architecture − Teradata’s architecture is called as Shared Nothing Architecture. Teradata Nodes, its Access Module Processors (AMPs) and the disks associated with AMPs work independently. They are not shared with others.

  3. Linear Scalability − Teradata systems are highly scalable. They can scale up to 2048 Nodes. For example, you can double the capacity of the system by doubling the number of AMPs.

  4. Connectivity − Teradata can connect to Channel-attached systems such as Mainframe or Network-attached systems.

  5. Mature Optimizer − Teradata optimizer is one of the matured optimizer in the market. It has been designed to be parallel since its beginning. It has been refined for each release.

  6. SQL − Teradata supports industry standard SQL to interact with the data stored in tables. In addition to this, it provides its own extension.

  7. Robust Utilities − Teradata provides robust utilities to import/export data from/to Teradata system such as FastLoad, MultiLoad, FastExport and TPT.

  8. Automatic Distribution − Teradata automatically distributes the data evenly to the disks without any manual intervention.

Teradata - Installation

Teradata 提供了适用于 VMWARE 的 Teradata Express,这是一个全面运行的 Teradata 虚拟机。它提供了高达 1 TB 的存储空间。Teradata 同时提供了 40 GB 和 1 TB 版本的 VMware。

Teradata provides Teradata express for VMWARE which is a fully operational Teradata virtual machine. It provides up to 1 terabyte of storage. Teradata provides both 40GB and 1TB version of VMware.

Prerequisites

由于 VM 是 64 位,因此你的 CPU 必须支持 64 位。

Since the VM is 64 bit, your CPU must support 64-bit.

Installation Steps for Windows

Step 1 − Download the required VM version from the link, https://downloads.teradata.com/download/database/teradata-express-for-vmware-player

Step 2 − 提取文件并指定目标文件夹。

Step 2 − Extract the file and specify the target folder.

Step 3 − 从以下链接下载 VMWare Workstation 播放器, https://my.vmware.com/web/vmware/downloads 。它适用于 Windows 和 Linux。下载适用于 Windows 的 VMWARE Workstation 播放器。

Step 3 − Download the VMWare Workstation player from the link, https://my.vmware.com/web/vmware/downloads. It is available for both Windows and Linux. Download the VMWARE workstation player for Windows.

vmware workstation player

Step 4 − 下载完成后,安装软件。

Step 4 − Once the download is complete, install the software.

Step 5 − 安装完成后,运行 VMWARE 客户端。

Step 5 − After the installation is complete, run the VMWARE client.

Step 6 − 选择“打开虚拟机”。浏览解压的 Teradata VMWare 文件夹并选择扩展名为 .vmdk 的文件。

Step 6 − Select 'Open a Virtual Machine'. Navigate through the extracted Teradata VMWare folder and select the file with extension .vmdk.

open virtual machine

Step 7 − Teradata VMWare 已添加到 VMWare 客户端。选择添加的 Teradata VMware 然后单击“播放虚拟机”。

Step 7 − Teradata VMWare is added to the VMWare client. Select the added Teradata VMware and click ‘Play Virtual Machine’.

play virtual machine

Step 8 − 如果软件更新中弹出窗口,您可以选择“稍后提醒我”。

Step 8 − If you get a popup on software updates, you can select ‘Remind Me Later’.

Step 9 − 输入用户名 root,按 tab 并输入密码 root,再次按 Enter。

Step 9 − Enter the user name as root, press tab and enter password as root and again press Enter.

welcome tdexpress

Step 10 − 桌面出现以下屏幕后,双击“root 的主页”。然后双击“Genome 终端”。这将打开 Shell。

Step 10 − Once the following screen appears on the desktop, double-click on ‘root’s home’. Then double-click on ‘Genome’s Terminal’. This will open the Shell.

open shell

Step 11 − 在以下 shell 中,输入命令 /etc/init.d/tpa start。这将启动 Teradata 服务器。

Step 11 − From the following shell, enter the command /etc/init.d/tpa start. This will start the Teradata server.

start teradata server

Starting BTEQ

BTEQ 实用程序用于交互方式提交 SQL 查询。以下是启动 BTEQ 实用程序的步骤。

BTEQ utility is used to submit SQL queries interactively. Following are the steps to start BTEQ utility.

Step 1 − 输入命令 /sbin/ifconfig 并记下 VMWare 的 IP 地址。

Step 1 − Enter the command /sbin/ifconfig and note down the IP address of the VMWare.

Step 2 − 运行命令 bteq。在登录提示符下,输入命令。

Step 2 − Run the command bteq. At the logon prompt, enter the command.

Logon <ipaddress>/dbc,dbc; 并在密码提示符下,输入密码 dbc;

Logon <ipaddress>/dbc,dbc; and enter At the password prompt, enter password as dbc;

start bteq

您可以使用 BTEQ 登录到 Teradata 系统并运行任何 SQL 查询。

You can log into Teradata system using BTEQ and run any SQL queries.

Teradata - Architecture

Teradata 架构基于大规模并行处理 (MPP) 架构。Teradata 的主要组件有解析引擎、BYNET 和访问模块处理器 (AMP)。下图显示了 Teradata 节点的高层次架构。

Teradata architecture is based on Massively Parallel Processing (MPP) architecture. The major components of Teradata are Parsing Engine, BYNET and Access Module Processors (AMPs). The following diagram shows the high level architecture of a Teradata Node.

teradata node architecture

Components of Teradata

Teradata 的关键组件如下所示 −

The key components of Teradata are as follows −

  1. Node − It is the basic unit in Teradata System. Each individual server in a Teradata system is referred as a Node. A node consists of its own operating system, CPU, memory, own copy of Teradata RDBMS software and disk space. A cabinet consists of one or more Nodes.

  2. Parsing Engine − Parsing Engine is responsible for receiving queries from the client and preparing an efficient execution plan. The responsibilities of parsing engine are − Receive the SQL query from the client Parse the SQL query check for syntax errors Check if the user has required privilege against the objects used in the SQL query Check if the objects used in the SQL actually exists Prepare the execution plan to execute the SQL query and pass it to BYNET Receives the results from the AMPs and send to the client

  3. Message Passing Layer − Message Passing Layer called as BYNET, is the networking layer in Teradata system. It allows the communication between PE and AMP and also between the nodes. It receives the execution plan from Parsing Engine and sends to AMP. Similarly, it receives the results from the AMPs and sends to Parsing Engine.

  4. Access Module Processor (AMP) − AMPs, called as Virtual Processors (vprocs) are the one that actually stores and retrieves the data. AMPs receive the data and execution plan from Parsing Engine, performs any data type conversion, aggregation, filter, sorting and stores the data in the disks associated with them. Records from the tables are evenly distributed among the AMPs in the system. Each AMP is associated with a set of disks on which data is stored. Only that AMP can read/write data from the disks.

Storage Architecture

当客户端运行查询以插入记录时,解析引擎会将这些记录发送给 BYNET。BYNET 检索记录并将行发送给目标 AMP。AMP 将这些记录存储在其磁盘上。下图显示了 Teradata 的存储架构。

When the client runs queries to insert records, Parsing engine sends the records to BYNET. BYNET retrieves the records and sends the row to the target AMP. AMP stores these records on its disks. Following diagram shows the storage architecture of Teradata.

storage architecture

Retrieval Architecture

当客户端运行查询以检索记录时,解析引擎会向 BYNET 发送一个请求。BYNET 将检索请求发送给适当的 AMP。然后,AMP 会并行搜索其磁盘并识别所需记录,然后发送给 BYNET。然后 BYNET 会将记录发送给解析引擎,解析引擎会将其发送给客户端。以下是 Teradata 的检索架构。

When the client runs queries to retrieve records, the Parsing engine sends a request to BYNET. BYNET sends the retrieval request to appropriate AMPs. Then AMPs search their disks in parallel and identify the required records and sends to BYNET. BYNET then sends the records to Parsing Engine which in turn will send to the client. Following is the retrieval architecture of Teradata.

retrieval architecture

Teradata - Relational Concepts

关系数据库管理系统 (RDBMS) 是一种 DBMS 软件,有助于与数据库交互。它们使用结构化查询语言 (SQL) 与存储在表中的数据交互。

Relational Database Management System (RDBMS) is a DBMS software that helps to interact with databases. They use Structured Query Language (SQL) to interact with the data stored in tables.

Database

数据库是逻辑相关数据的集合。许多用户出于不同的目的对其进行访问。例如,销售数据库包含存储在许多表中的整个销售信息。

Database is a collection of logically related data. They are accessed by many users for different purposes. For example, a sales database contains entire information about sales which is stored in many tables.

Tables

表是 RDBMS 中存储数据的基本单位。表是行和列的集合。以下是 employee 表的示例。

Tables is the basic unit in RDBMS where the data is stored. A table is a collection of rows and columns. Following is an example of employee table.

EmployeeNo

FirstName

LastName

BirthDate

101

Mike

James

1/5/1980

104

Alex

Stuart

11/6/1984

102

Robert

Williams

3/5/1983

105

Robert

James

12/1/1984

103

Peter

Paul

4/1/1983

Columns

一列中包含类似的数据。例如,Employee 表中的列 BirthDate 包含所有员工的 birth_date 信息。

A column contains similar data. For example, the column BirthDate in Employee table contains birth_date information for all employees.

BirthDate

1/5/1980

11/6/1984

3/5/1983

12/1/1984

4/1/1983

Row

行是所有列的一个实例。例如,在 employee 表中,一行包含关于单个员工的信息。

Row is one instance of all the columns. For example, in employee table one row contains information about single employee.

EmployeeNo

FirstName

LastName

BirthDate

101

Mike

James

1/5/1980

Primary Key

主键用于唯一标识表中的行。主键列中不允许重复的值,并且它们不能接受 NULL 值。它是表中的一个必填字段。

Primary key is used to uniquely identify a row in a table. No duplicate values are allowed in a primary key column and they cannot accept NULL values. It is a mandatory field in a table.

Foreign Key

外键用于建立表之间的关系。子表中的外键被定义为主表中的主键。一个表可以有多个外键。它可以接受重复值和空值。外键在表中是可选的。

Foreign keys are used to build a relationship between the tables. A foreign key in a child table is defined as the primary key in the parent table. A table can have more than one foreign key. It can accept duplicate values and also null values. Foreign keys are optional in a table.

Teradata - Data Types

表中的每一行都与数据类型相关联。数据类型指定将存储在该列中的值类型。Teradata 支持多种数据类型。以下是一些常用的数据类型。

Each column in a table is associated with a data type. Data types specify what kind of values will be stored in the column. Teradata supports several data types. Following are some of the frequently used data types.

Data Types

Length (Bytes)

Range of values

BYTEINT

1

-128 to +127

SMALLINT

2

-32768 to +32767

INTEGER

4

-2,147,483,648 to +2147,483,647

BIGINT

8

-9,233,372,036,854,775,80 8 to +9,233,372,036,854,775,8 07

DECIMAL

1-16

NUMERIC

1-16

FLOAT

8

IEEE format

CHAR

Fixed Format

1-64,000

VARCHAR

Variable

1-64,000

DATE

4

YYYYYMMDD

TIME

6 or 8

HHMMSS.nnnnnn or HHMMSS.nnnnnn+HHMM

TIMESTAMP

10 or 12

YYMMDDHHMMSS.nnnnnn or YYMMDDHHMMSS.nnnnnn +HHMM

Teradata - Tables

关系模型中的表定义为数据的集合。它们表示为行和列。

Tables in Relational model are defined as collection of data. They are represented as rows and columns.

Table Types

类型 Teradata 支持不同类型的表。

Types Teradata supports different types of tables.

  1. Permanent Table − This is the default table and it contains data inserted by the user and stores the data permanently.

  2. Volatile Table − The data inserted into a volatile table is retained only during the user session. The table and data is dropped at the end of the session. These tables are mainly used to hold the intermediate data during data transformation.

  3. Global Temporary Table − The definition of Global Temporary table are persistent but the data in the table is deleted at the end of user session.

  4. Derived Table − Derived table holds the intermediate results in a query. Their lifetime is within the query in which they are created, used and dropped.

Set Versus Multiset

Teradata 根据重复记录的处理方式将表分类为 SET 或 MULTISET 表。定义为 SET 表的表不存储重复记录,而 MULTISET 表则可以存储重复记录。

Teradata classifies the tables as SET or MULTISET tables based on how the duplicate records are handled. A table defined as SET table doesn’t store the duplicate records, whereas the MULTISET table can store duplicate records.

Sr.No

Table Commands & Description

1

Create TableCREATE TABLE command is used to create tables in Teradata.

2

Alter TableALTER TABLE command is used to add or drop columns from an existing table.

3

Drop TableDROP TABLE command is used to drop a table.

Teradata - Data Manipulation

本章介绍用于处理存储在 Teradata 表中的数据的 SQL 命令。

This chapter introduces the SQL commands used to manipulate the data stored in Teradata tables.

Insert Records

INSERT INTO 语句用于向表中插入记录。

INSERT INTO statement is used to insert records into the table.

Syntax

以下是 INSERT INTO 的通用语法。

Following is the generic syntax for INSERT INTO.

INSERT INTO <tablename>
(column1, column2, column3,…)
VALUES
(value1, value2, value3 …);

Example

以下示例向 employee 表中插入记录。

The following example inserts records into the employee table.

INSERT INTO Employee (
   EmployeeNo,
   FirstName,
   LastName,
   BirthDate,
   JoinedDate,
   DepartmentNo
)
VALUES (
   101,
   'Mike',
   'James',
   '1980-01-05',
   '2005-03-27',
   01
);

插入上述查询后,可以使用 SELECT 语句从表中查看记录。

Once the above query is inserted, you can use the SELECT statement to view the records from the table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

Insert from Another Table

INSERT SELECT 语句用于从另一个表中插入记录。

INSERT SELECT statement is used to insert records from another table.

Syntax

以下是 INSERT INTO 的通用语法。

Following is the generic syntax for INSERT INTO.

INSERT INTO <tablename>
(column1, column2, column3,…)
SELECT
column1, column2, column3…
FROM
<source table>;

Example

以下示例向 employee 表中插入记录。在运行以下插入查询之前,创建一个名为 Employee_Bkup 的表,其列定义与 employee 表相同。

The following example inserts records into the employee table. Create a table called Employee_Bkup with the same column definition as employee table before running the following insert query.

INSERT INTO Employee_Bkup (
   EmployeeNo,
   FirstName,
   LastName,
   BirthDate,
   JoinedDate,
   DepartmentNo
)
SELECT
   EmployeeNo,
   FirstName,
   LastName,
   BirthDate,
   JoinedDate,
   DepartmentNo
FROM
   Employee;

执行上述查询时,将把 employee 表中的所有记录插入到 employee_bkup 表中。

When the above query is executed, it will insert all records from the employee table into employee_bkup table.

Rules

  1. The number of columns specified in the VALUES list should match with the columns specified in the INSERT INTO clause.

  2. Values are mandatory for NOT NULL columns.

  3. If no values are specified, then NULL is inserted for nullable fields.

  4. The data types of columns specified in the VALUES clause should be compatible with the data types of columns in the INSERT clause.

Update Records

UPDATE 语句用于更新表中的记录。

UPDATE statement is used to update records from the table.

Syntax

以下是 UPDATE 的通用语法。

Following is the generic syntax for UPDATE.

UPDATE <tablename>
SET <columnnamme> = <new value>
[WHERE condition];

Example

下面的示例将雇员 101 的员工部门更新为 03。

The following example updates the employee dept to 03 for employee 101.

UPDATE Employee
SET DepartmentNo = 03
WHERE EmployeeNo = 101;

在下面的输出中,您可以看到雇员编号为 101 的部门编号从 1 更新为 3。

In the following output, you can see that the DepartmentNo is updated from 1 to 3 for EmployeeNo 101.

SELECT Employeeno, DepartmentNo FROM Employee;
*** Query completed. One row found. 2 columns returned.
*** Total elapsed time was 1 second.
EmployeeNo    DepartmentNo
-----------  -------------
   101             3

Rules

  1. You can update one or more values of the table.

  2. If WHERE condition is not specified then all rows of the table are impacted.

  3. You can update a table with the values from another table.

Delete Records

DELETE FROM 语句用于更新表中的记录。

DELETE FROM statement is used to update records from the table.

Syntax

以下是 DELETE FROM 的一般语法。

Following is the generic syntax for DELETE FROM.

DELETE FROM  <tablename>
[WHERE condition];

Example

下面的示例从表 employee 中删除雇员 101。

The following example deletes the employee 101 from the table employee.

DELETE FROM Employee
WHERE EmployeeNo = 101;

在下面的输出中,您可以看到雇员 101 已从表中删除。

In the following output, you can see that employee 101 is deleted from the table.

SELECT EmployeeNo FROM Employee;
*** Query completed. No rows found.
*** Total elapsed time was 1 second.

Rules

  1. You can update one or more records of the table.

  2. If WHERE condition is not specified then all rows of the table are deleted.

  3. You can update a table with the values from another table.

Teradata - SELECT Statement

SELECT 语句用于从表中检索记录。

SELECT statement is used to retrieve records from a table.

Syntax

以下是 SELECT 语句的基本语法。

Following is the basic syntax of SELECT statement.

SELECT
column 1, column 2, .....
FROM
tablename;

Example

考虑以下员工表。

Consider the following employee table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

下面是 SELECT 语句的一个示例。

Following is an example of SELECT statement.

SELECT EmployeeNo,FirstName,LastName
FROM Employee;

执行此查询时,它会从 employee 表中提取 EmployeeNo、FirstName 和 LastName 列。

When this query is executed, it fetches EmployeeNo, FirstName and LastName columns from the employee table.

 EmployeeNo            FirstName                       LastName
-----------  ------------------------------  ---------------------------
   101                   Mike                            James
   104                   Alex                            Stuart
   102                   Robert                          Williams
   105                   Robert                          James
   103                   Peter                           Paul

如果您想提取表中的所有列,您可以使用以下命令,而不必列出所有列。

If you want to fetch all the columns from a table, you can use the following command instead of listing down all columns.

SELECT * FROM Employee;

上述查询将从 employee 表中提取所有记录。

The above query will fetch all records from the employee table.

WHERE Clause

WHERE 子句用于过滤 SELECT 语句返回的记录。一个条件与 WHERE 子句相关联。只有满足 WHERE 子句中条件的记录才会被返回。

WHERE clause is used to filter the records returned by the SELECT statement. A condition is associated with WHERE clause. Only, the records that satisfy the condition in the WHERE clause are returned.

Syntax

下面是带 WHERE 子句的 SELECT 语句的语法。

Following is the syntax of the SELECT statement with WHERE clause.

SELECT * FROM tablename
WHERE[condition];

Example

以下查询提取 EmployeeNo 为 101 的记录。

The following query fetches records where EmployeeNo is 101.

SELECT * FROM Employee
WHERE EmployeeNo = 101;

执行此查询时,它返回以下记录。

When this query is executed, it returns the following records.

 EmployeeNo          FirstName                      LastName
----------- ------------------------------ -----------------------------
   101                 Mike                           James

ORDER BY

执行 SELECT 语句时,返回的行没有任何特定顺序。ORDER BY 子句用于以升序/降序在任何列上排列记录。

When the SELECT statement is executed, the returned rows are not in any specific order. ORDER BY clause is used to arrange the records in ascending/descending order on any columns.

Syntax

以下是带 ORDER BY 子句的 SELECT 语句的语法。

Following is the syntax of the SELECT statement with ORDER BY clause.

SELECT * FROM tablename
ORDER BY column 1, column 2..;

Example

以下查询从 employee 表中提取记录,并按 FirstName 对结果进行排序。

The following query fetches records from the employee table and orders the results by FirstName.

SELECT * FROM Employee
ORDER BY FirstName;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

 EmployeeNo         FirstName                      LastName
----------- ------------------------------ -----------------------------
    104               Alex                           Stuart
    101               Mike                           James
    103               Peter                          Paul
    102               Robert                         Williams
    105               Robert                         James

GROUP BY

GROUP BY 子句与 SELECT 语句配合使用,并将类似的记录整理到组中。

GROUP BY clause is used with SELECT statement and arranges similar records into groups.

Syntax

以下是带有 GROUP BY 子句的 SELECT 语句的语法。

Following is the syntax of the SELECT statement with GROUP BY clause.

SELECT column 1, column2 …. FROM tablename
GROUP BY column 1, column 2..;

Example

以下示例按 DepartmentNo 列对记录进行分组,并识别每个部门的总数。

The following example groups the records by DepartmentNo column and identifies the total count from each department.

SELECT DepartmentNo,Count(*) FROM
Employee
GROUP BY DepartmentNo;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

 DepartmentNo    Count(*)
------------  -----------
     3             1
     1             1
     2             3

Teradata - Logical and Conditional Operators

Teradata 支持以下逻辑和条件运算符。这些运算符用于执行比较并组合多个条件。

Teradata supports the following logical and conditional operators. These operators are used to perform comparison and combine multiple conditions.

Syntax

Meaning

>

Greater than

<

Less than

>=

Greater than or equal to

Less than or equal to

=

Equal to

BETWEEN

If values within range

IN

If values in <expression>

NOT IN

If values not in <expression>

IS NULL

If value is NULL

IS NOT NULL

If value is NOT NULL

AND

Combine multiple conditions. Evaluates to true only if all conditions are met

OR

Combine multiple conditions. Evaluates to true only if either of the conditions is met.

NOT

Reverses the meaning of the condition

BETWEEN

BETWEEN 命令用于检查某个值是否在一个值范围内。

BETWEEN command is used to check if a value is within a range of values.

Example

考虑以下员工表。

Consider the following employee table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

以下示例获取雇员编号在 101、102 和 103 之间的记录。

The following example fetches records with employee numbers in the range between 101,102 and 103.

SELECT EmployeeNo, FirstName FROM
Employee
WHERE EmployeeNo BETWEEN 101 AND 103;

在执行上述查询时,它返回雇员编号在 101 和 103 之间的雇员记录。

When the above query is executed, it returns the employee records with employee no between 101 and 103.

*** Query completed. 3 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo            FirstName
-----------  ------------------------------
   101                   Mike
   102                   Robert
   103                   Peter

IN

IN 命令用于针对给定的值列表检查值。

IN command is used to check the value against a given list of values.

Example

以下示例获取雇员编号为 101、102 和 103 的记录。

The following example fetches records with employee numbers in 101, 102 and 103.

SELECT EmployeeNo, FirstName FROM
Employee
WHERE EmployeeNo in (101,102,103);

上述查询返回以下记录。

The above query returns the following records.

*** Query completed. 3 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo            FirstName
-----------  ------------------------------
   101                   Mike
   102                   Robert
   103                   Peter

NOT IN

NOT IN 命令颠倒了 IN 命令的结果。它获取与给定列表不匹配的值的记录。

NOT IN command reverses the result of IN command. It fetches records with values that don’t match with the given list.

Example

以下示例提取了不在 101、102 和 103 中的员工编号记录。

The following example fetches records with employee numbers not in 101, 102 and 103.

SELECT * FROM
Employee
WHERE EmployeeNo not in (101,102,103);

上述查询返回以下记录。

The above query returns the following records.

*** Query completed. 2 rows found. 6 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo          FirstName                      LastName
----------- ------------------------------ -----------------------------
    104                Alex                          Stuart
    105                Robert                        James

Teradata - SET Operators

集合运算符结合了来自多个 SELECT 语句的结果。这可能看起来类似于联接,但联接合并了来自多个表的列,而集合运算符合并了来自多行中的行。

SET operators combine results from multiple SELECT statement. This may look similar to Joins, but joins combines columns from multiple tables whereas SET operators combines rows from multiple rows.

Rules

  1. The number of columns from each SELECT statement should be same.

  2. The data types from each SELECT must be compatible.

  3. ORDER BY should be included only in the final SELECT statement.

UNION

UNION 语句用于结合来自多个 SELECT 语句的结果。它忽略了重复项。

UNION statement is used to combine results from multiple SELECT statements. It ignores duplicates.

Syntax

以下是 UNION 语句的基本语法。

Following is the basic syntax of the UNION statement.

SELECT col1, col2, col3…
FROM
<table 1>
[WHERE condition]
UNION

SELECT col1, col2, col3…
FROM
<table 2>
[WHERE condition];

Example

考虑以下员工表和薪资表。

Consider the following employee table and salary table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

以下 UNION 查询结合了 Employee 和 Salary 表中的 EmployeeNo 值。

The following UNION query combines the EmployeeNo value from both Employee and Salary table.

SELECT EmployeeNo
FROM
Employee
UNION

SELECT EmployeeNo
FROM
Salary;

执行查询后,会产生以下输出。

When the query is executed, it produces the following output.

EmployeeNo
-----------
   101
   102
   103
   104
   105

UNION ALL

UNION ALL 语句类似于 UNION,它结合了来自多个表的包括重复行在内的结果。

UNION ALL statement is similar to UNION, it combines results from multiple tables including duplicate rows.

Syntax

以下是 UNION ALL 语句的基本语法。

Following is the basic syntax of the UNION ALL statement.

SELECT col1, col2, col3…
FROM
<table 1>
[WHERE condition]
UNION ALL

SELECT col1, col2, col3…
FROM
<table 2>
[WHERE condition];

Example

以下是 UNION ALL 语句的一个示例。

Following is an example for UNION ALL statement.

SELECT EmployeeNo
FROM
Employee
UNION ALL

SELECT EmployeeNo
FROM
Salary;

执行以上查询后,会产生以下输出。你可以看到它还返回了重复项。

When the above query is executed, it produces the following output. You can see that it returns the duplicates also.

 EmployeeNo
-----------
    101
    104
    102
    105
    103
    101
    104
    102
    103

INTERSECT

INTERSECT 命令也用于结合来自多个 SELECT 语句的结果。它返回了在第二个 SELECT 语句中具有对应匹配项的第一个 SELECT 语句中的行。换句话说,它返回了同时存在于这两个 SELECT 语句中的行。

INTERSECT command is also used to combine results from multiple SELECT statements. It returns the rows from the first SELECT statement that has corresponding match in the second SELECT statements. In other words, it returns the rows that exist in both SELECT statements.

Syntax

以下是 INTERSECT 语句的基本语法。

Following is the basic syntax of the INTERSECT statement.

SELECT col1, col2, col3…
FROM
<table 1>
[WHERE condition]
INTERSECT

SELECT col1, col2, col3…
FROM
<table 2>
[WHERE condition];

Example

以下是 INTERSECT 语句的一个示例。它返回了同时存在于这两个表中的 EmployeeNo 值。

Following is an example of INTERSECT statement. It returns the EmployeeNo values that exist in both tables.

SELECT EmployeeNo
FROM
Employee
INTERSECT

SELECT EmployeeNo
FROM
Salary;

执行以上查询后,会返回以下记录。EmployeeNo 105 被排除在外,因为它不存在于 SALARY 表中。

When the above query is executed, it returns the following records. EmployeeNo 105 is excluded since it doesn’t exist in SALARY table.

EmployeeNo
-----------
   101
   104
   102
   103

MINUS/EXCEPT

MINUS/EXCEPT 命令结合了来自多个表的行,并返回存在于第一个 SELECT 中但不存在于第二个 SELECT 中的行。它们都返回相同的结果。

MINUS/EXCEPT commands combine rows from multiple tables and returns the rows which are in first SELECT but not in second SELECT. They both return the same results.

Syntax

以下是 MINUS 语句的基本语法。

Following is the basic syntax of the MINUS statement.

SELECT col1, col2, col3…
FROM
<table 1>
[WHERE condition]
MINUS

SELECT col1, col2, col3…
FROM
<table 2>
[WHERE condition];

Example

以下是 MINUS 语句的一个示例。

Following is an example of MINUS statement.

SELECT EmployeeNo
FROM
Employee
MINUS

SELECT EmployeeNo
FROM
Salary;

执行此查询时,它会返回以下记录。

When this query is executed, it returns the following record.

EmployeeNo
-----------
   105

Teradata - String Manipulation

Teradata 提供了多个函数来处理字符串。这些函数与 ANSI 标准兼容。

Teradata provides several functions to manipulate the strings. These functions are compatible with ANSI standard.

Sr.No

String Function & Description

1

*

*Concatenates strings together

2

*SUBSTR*Extracts a portion of a string (Teradata extension)

3

*SUBSTRING*Extracts a portion of a string (ANSI standard)

4

*INDEX*Locates the position of a character in a string (Teradata extension)

5

*POSITION*Locates the position of a character in a string (ANSI standard)

6

*TRIM*Trims blanks from a string

7

*UPPER*Converts a string to uppercase

8

*LOWER*Converts a string to lowercase

Example

下表列出了某些字符串函数及其结果。

Following table lists some of the string functions with the results.

String Function

Result

SELECT SUBSTRING(‘warehouse’ FROM 1 FOR 4)

ware

SELECT SUBSTR(‘warehouse’,1,4)

ware

SELECT ‘data’

‘ ‘

‘warehouse’

data warehouse

SELECT UPPER(‘data’)

DATA

SELECT LOWER(‘DATA’)

data

Teradata - Date/Time Functions

本章讨论 Teradata 中提供的日期/时间函数。

This chapter discusses the date/time functions available in Teradata.

Date Storage

日期使用以下公式在内部存储为整数。

Dates are stored as integer internally using the following formula.

((YEAR - 1900) * 10000) + (MONTH * 100) + DAY

您可以使用以下查询检查日期的存储方式。

You can use the following query to check how the dates are stored.

SELECT CAST(CURRENT_DATE AS INTEGER);

由于日期存储为整数,因此您可以在日期上执行一些算术运算。Teradata 提供可执行这些运算的函数。

Since the dates are stored as integer, you can perform some arithmetic operations on them. Teradata provides functions to perform these operations.

EXTRACT

EXTRACT 函数从 DATE 值中摘取日期、月份和年份的部分。该函数也用于从 TIME/TIMESTAMP 值中摘取小时、分钟和秒。

EXTRACT function extracts portions of day, month and year from a DATE value. This function is also used to extract hour, minute and second from TIME/TIMESTAMP value.

Example

以下示例显示如何从日期和时间戳值中摘取年份、月份、日期、小时、分钟和秒值。

Following examples show how to extract Year, Month, Date, Hour, Minute and second values from Date and Timestamp values.

SELECT EXTRACT(YEAR FROM CURRENT_DATE);
EXTRACT(YEAR FROM Date)
-----------------------
        2016
SELECT EXTRACT(MONTH FROM CURRENT_DATE);
EXTRACT(MONTH FROM Date)
------------------------
          1
SELECT EXTRACT(DAY FROM CURRENT_DATE);
EXTRACT(DAY FROM Date)
------------------------
          1

SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP);
EXTRACT(HOUR FROM Current TimeStamp(6))
---------------------------------------
                 4
SELECT EXTRACT(MINUTE FROM CURRENT_TIMESTAMP);
EXTRACT(MINUTE FROM Current TimeStamp(6))
-----------------------------------------
                 54
SELECT EXTRACT(SECOND FROM CURRENT_TIMESTAMP);
EXTRACT(SECOND FROM Current TimeStamp(6))
-----------------------------------------
              27.140000

INTERVAL

Teradata 提供 INTERVAL 函数对 DATE 和 TIME 值执行算术运算。有两种类型的 INTERVAL 函数。

Teradata provides INTERVAL function to perform arithmetic operations on DATE and TIME values. There are two types of INTERVAL functions.

Year-Month Interval

  1. YEAR

  2. YEAR TO MONTH

  3. MONTH

Day-Time Interval

  1. DAY

  2. DAY TO HOUR

  3. DAY TO MINUTE

  4. DAY TO SECOND

  5. HOUR

  6. HOUR TO MINUTE

  7. HOUR TO SECOND

  8. MINUTE

  9. MINUTE TO SECOND

  10. SECOND

Example

以下示例将 3 年添加到当前日期。

The following example adds 3 years to current date.

SELECT CURRENT_DATE, CURRENT_DATE + INTERVAL '03' YEAR;
  Date    (Date+ 3)
--------  ---------
16/01/01   19/01/01

以下示例将 3 年和 01 个月添加到当前日期。

The following example adds 3 years and 01 month to current date.

SELECT CURRENT_DATE, CURRENT_DATE + INTERVAL '03-01' YEAR TO MONTH;
 Date     (Date+ 3-01)
--------  ------------
16/01/01    19/02/01

以下示例将 01 天、05 小时和 10 分钟添加到当前时间戳。

The following example adds 01 day, 05 hours and 10 minutes to current timestamp.

SELECT CURRENT_TIMESTAMP,CURRENT_TIMESTAMP + INTERVAL '01 05:10' DAY TO MINUTE;
     Current TimeStamp(6)         (Current TimeStamp(6)+ 1 05:10)
--------------------------------  --------------------------------
2016-01-01 04:57:26.360000+00:00  2016-01-02 10:07:26.360000+00:00

Teradata - Built-in Functions

Teradata 提供内置函数,这是对 SQL 的扩展。以下是常见的内置函数。

Teradata provides built-in functions which are extensions to SQL. Following are the common built-in functions.

Function

Result

SELECT DATE;

Date -------- 16/01/01

SELECT CURRENT_DATE;

Date -------- 16/01/01

SELECT TIME;

Time -------- 04:50:29

SELECT CURRENT_TIME;

Time -------- 04:50:29

SELECT CURRENT_TIMESTAMP;

Current TimeStamp(6) -------------------------------- 2016-01-01 04:51:06.990000+00:00

SELECT DATABASE;

Database ------------------------------ TDUSER

Teradata - Aggregate Functions

Teradata 支持常见的聚合函数。它们可以与 SELECT 语句一起使用。

Teradata supports common aggregate functions. They can be used with the SELECT statement.

  1. COUNT − Counts the rows

  2. SUM − Sums up the values of the specified column(s)

  3. MAX − Returns the large value of the specified column

  4. MIN − Returns the minimum value of the specified column

  5. AVG − Returns the average value of the specified column

Example

请考虑以下 Salary 表。

Consider the following Salary Table.

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

104

75,000

5,000

70,000

102

80,000

6,000

74,000

105

70,000

4,000

66,000

103

90,000

7,000

83,000

COUNT

以下示例计算 Salary 表中的记录数。

The following example counts the number of records in the Salary table.

SELECT count(*) from Salary;

  Count(*)
-----------
    5

MAX

以下示例返回最大员工净薪水值。

The following example returns maximum employee net salary value.

SELECT max(NetPay) from Salary;
   Maximum(NetPay)
---------------------
       83000

MIN

以下示例从 Salary 表中返回最低员工净薪水值。

The following example returns minimum employee net salary value from the Salary table.

SELECT min(NetPay) from Salary;

   Minimum(NetPay)
---------------------
        36000

AVG

以下示例从表中返回员工净薪水值的平均值。

The following example returns the average of employees net salary value from the table.

SELECT avg(NetPay) from Salary;

   Average(NetPay)
---------------------
       65800

SUM

以下示例计算 Salary 表所有记录中员工净薪水总和。

The following example calculates the sum of employees net salary from all records of the Salary table.

SELECT sum(NetPay) from Salary;

   Sum(NetPay)
-----------------
     329000

Teradata - CASE and COALESCE

本章介绍 Teradata 的 CASE 和 COALESCE 函数。

This chapter explains the CASE and COALESCE functions of Teradata.

CASE Expression

CASE 表达式针对条件或 WHEN 子句评估每一行,并返回第一次匹配的结果。如果没有匹配项,则返回 ELSE 部分的结果。

CASE expression evaluates each row against a condition or WHEN clause and returns the result of the first match. If there are no matches then the result from ELSE part of returned.

Syntax

以下是 CASE 表达式的语法。

Following is the syntax of the CASE expression.

CASE <expression>
WHEN <expression> THEN result-1
WHEN <expression> THEN result-2

ELSE
   Result-n
END

Example

考虑以下 Employee 表。

Consider the following Employee table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

以下示例评估 DepartmentNo 列,如果部门编号为 1,则返回 1;如果部门编号为 3,则返回 2;否则,返回“无效部门”值。

The following example evaluates the DepartmentNo column and returns value of 1 if the department number is 1; returns 2 if the department number is 3; otherwise it returns value as invalid department.

SELECT
   EmployeeNo,
CASE DepartmentNo
   WHEN 1 THEN 'Admin'
   WHEN 2 THEN 'IT'
ELSE 'Invalid Dept'
   END AS Department
FROM Employee;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

*** Query completed. 5 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo    Department
-----------   ------------
   101         Admin
   104         IT
   102         IT
   105         Invalid Dept
   103         IT

上面的 CASE 表达式也可以写成以下形式,产生的结果与上面相同。

The above CASE expression can also be written in the following form which will produce the same result as above.

SELECT
   EmployeeNo,
CASE
   WHEN DepartmentNo = 1 THEN 'Admin'
   WHEN  DepartmentNo = 2 THEN 'IT'
ELSE 'Invalid Dept'
   END AS Department
FROM Employee;

COALESCE

COALESCE 是一个语句,返回表达式中的第一个非空值。如果表达式的所有参数都计算为空值,则它返回 NULL。以下是语法。

COALESCE is a statement that returns the first non-null value of the expression. It returns NULL if all the arguments of the expression evaluates to NULL. Following is the syntax.

Syntax

COALESCE(expression 1, expression 2, ....)

Example

SELECT
   EmployeeNo,
   COALESCE(dept_no, 'Department not found')
FROM
   employee;

NULLIF

NULLIF 语句在参数相等时返回 NULL。

NULLIF statement returns NULL if the arguments are equal.

Syntax

以下是 NULLIF 语句的语法。

Following is the syntax of the NULLIF statement.

NULLIF(expression 1, expression 2)

Example

以下示例在 DepartmentNo 等于 3 时返回 NULL。否则,返回 DepartmentNo 值。

The following example returns NULL if the DepartmentNo is equal to 3. Otherwise, it returns the DepartmentNo value.

SELECT
   EmployeeNo,
   NULLIF(DepartmentNo,3) AS department
FROM Employee;

以上查询返回以下记录。您可以看到员工 105 的部门号为 NULL。

The above query returns the following records. You can see that employee 105 has department no. as NULL.

*** Query completed. 5 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo      department
-----------  ------------------
    101              1
    104              2
    102              2
    105              ?
    103              2

Teradata - Primary Index

主键索引用于指定数据在 Teradata 中所在的位置。它用于指定哪个 AMP 获取数据行。Teradata 中的每个表格都需要定义一个主键索引。如果未定义主键索引,则 Teradata 会自动分配主键索引。主键索引提供了访问数据的最快速方式。主键最多可以有 64 列。

Primary index is used to specify where the data resides in Teradata. It is used to specify which AMP gets the data row. Each table in Teradata is required to have a primary index defined. If the primary index is not defined, Teradata automatically assigns the primary index. Primary index provides the fastest way to access the data. A primary may have a maximum of 64 columns.

主键索引在创建表格时定义。主键索引有 2 种类型。

Primary index is defined while creating a table. There are 2 types of Primary Indexes.

  1. Unique Primary Index(UPI)

  2. Non Unique Primary Index(NUPI)

Unique Primary Index (UPI)

如果将表格定义为具有 UPI,则被视为 UPI 的列不应具有任何重复值。如果插入任何重复值,它们将被拒绝。

If the table is defined to be having UPI, then the column deemed as UPI should not have any duplicate values. If any duplicate values are inserted, they will be rejected.

Create Unique Primary Index

以下示例创建 Salary 表格,其列 EmployeeNo 为唯一主键索引。

The following example creates the Salary table with column EmployeeNo as Unique Primary Index.

CREATE SET TABLE Salary (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
UNIQUE PRIMARY INDEX(EmployeeNo);

Non Unique Primary Index (NUPI)

如果将表格定义为具有 NUPI,则被视为 UPI 的列可以接受重复值。

If the table is defined to be having NUPI, then the column deemed as UPI can accept duplicate values.

Create Non Unique Primary Index

以下示例创建员工帐户表格,其列 EmployeeNo 为非唯一主键索引。EmployeeNo 定义为非唯一主键索引,因为一名员工可以在表格中拥有多个帐户:一个用于薪资帐户,另一个用于报销帐户。

The following example creates the employee accounts table with column EmployeeNo as Non Unique Primary Index. EmployeeNo is defined as Non Unique Primary Index since an employee can have multiple accounts in the table; one for salary account and another one for reimbursement account.

CREATE SET TABLE Employee _Accounts (
   EmployeeNo INTEGER,
   employee_bank_account_type BYTEINT.
   employee_bank_account_number INTEGER,
   employee_bank_name VARCHAR(30),
   employee_bank_city VARCHAR(30)
)
PRIMARY INDEX(EmployeeNo);

Teradata - Joins

Join 用于将来自多张表的数据记录组合在一起。表是根据这些表中公共栏位/值相连接的。

Join is used to combine records from more than one table. Tables are joined based on the common columns/values from these tables.

有不同类型的 Join 可用。

There are different types of Joins available.

  1. Inner Join

  2. Left Outer Join

  3. Right Outer Join

  4. Full Outer Join

  5. Self Join

  6. Cross Join

  7. Cartesian Production Join

INNER JOIN

Inner Join 组合来自多张表的数据记录,并返回同时存在于这两张表中的值。

Inner Join combines records from multiple tables and returns the values that exist in both the tables.

Syntax

以下是 INNER JOIN 语句的语法。

Following is the syntax of the INNER JOIN statement.

SELECT col1, col2, col3….
FROM
Table-1
INNER JOIN
Table-2
ON (col1 = col2)
<WHERE condition>;

Example

考虑以下员工表和薪资表。

Consider the following employee table and salary table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentNo

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

以下查询在公共栏位 EmployeeNo 上连接 Employee 表和 Salary 表。每个表都被赋予别名 A 和 B,并且栏位使用正确的别名进行引用。

The following query joins the Employee table and Salary table on the common column EmployeeNo. Each table is assigned an alias A & B and the columns are referenced with the correct alias.

SELECT A.EmployeeNo, A.DepartmentNo, B.NetPay
FROM
Employee A
INNER JOIN
Salary B
ON (A.EmployeeNo = B. EmployeeNo);

当执行以上查询时,将会返回以下记录。员工 105 未包含在结果中,因为它在 Salary 表中没有匹配的记录。

When the above query is executed, it returns the following records. Employee 105 is not included in the result since it doesn’t have matching records in the Salary table.

*** Query completed. 4 rows found. 3 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo   DepartmentNo     NetPay
-----------  ------------   -----------
    101           1            36000
    102           2            74000
    103           2            83000
    104           2            70000

OUTER JOIN

LEFT OUTER JOIN 和 RIGHT OUTER JOIN 也合并来自多张表的结果。

LEFT OUTER JOIN and RIGHT OUTER JOIN also combine the results from multiple table.

  1. LEFT OUTER JOIN returns all the records from the left table and returns only the matching records from the right table.

  2. RIGHT OUTER JOIN returns all the records from the right table and returns only matching rows from the left table.

  3. FULL OUTER JOIN combines the results from both LEFT OUTER and RIGHT OUTER JOINS. It returns both matching and non-matching rows from the joined tables.

Syntax

以下为 OUTER JOIN 语句的语法。你需要使用 LEFT OUTER JOIN、RIGHT OUTER JOIN 或 FULL OUTER JOIN 中的一个选项。

Following is the syntax of the OUTER JOIN statement. You need to use one of the options from LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN.

SELECT col1, col2, col3….
FROM
Table-1
LEFT OUTER JOIN/RIGHT OUTER JOIN/FULL OUTER JOIN
Table-2
ON (col1 = col2)
<WHERE condition>;

Example

考虑以下 LEFT OUTER JOIN 查询示例。它返回 Employee 表中的所有记录以及 Salary 表中的匹配记录。

Consider the following example of the LEFT OUTER JOIN query. It returns all the records from Employee table and matching records from Salary table.

SELECT A.EmployeeNo, A.DepartmentNo, B.NetPay
FROM
Employee A
LEFT OUTER JOIN
Salary B
ON (A.EmployeeNo = B. EmployeeNo)
ORDER BY A.EmployeeNo;

当执行以上查询时,将生成以下输出。对于员工 105,NetPay 值为 NULL,因为它在 Salary 表中没有匹配的记录。

When the above query is executed, it produces the following output. For employee 105, NetPay value is NULL, since it doesn’t have matching records in Salary table.

*** Query completed. 5 rows found. 3 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo  DepartmentNo     NetPay
-----------  ------------   -----------
    101           1           36000
    102           2           74000
    103           2           83000
    104           2           70000
    105           3             ?

CROSS JOIN

Cross Join 将左表中的每行与右表中的每行进行连接。

Cross Join joins every row from the left table to every row from the right table.

Syntax

以下是 CROSS JOIN 语句的语法。

Following is the syntax of the CROSS JOIN statement.

SELECT A.EmployeeNo, A.DepartmentNo, B.EmployeeNo,B.NetPay
FROM
Employee A
CROSS JOIN
Salary B
WHERE A.EmployeeNo = 101
ORDER BY B.EmployeeNo;

执行上述查询时,将得到以下输出。将 Employee 表中的员工编号 101 与 Salary 表中的每一条记录联接。

When the above query is executed, it produces the following output. Employee No 101 from Employee table is joined with each and every record from Salary Table.

*** Query completed. 4 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo  DepartmentNo   EmployeeNo    NetPay
-----------  ------------  -----------  -----------
    101           1            101         36000
    101           1            104         70000
    101           1            102         74000
    101           1            103         83000

Teradata - SubQueries

子查询基于其他表中的值从一个表中返回记录。它是另一个查询中的 SELECT 查询。称为内部查询的 SELECT 查询首先执行,而结果则由外部查询使用。其一些显著特征:

A subquery returns records from one table based on the values from another table. It is a SELECT query within another query. The SELECT query called as inner query is executed first and the result is used by the outer query. Some of its salient features are −

  1. A query can have multiple subqueries and subqueries may contain another subquery.

  2. Subqueries doesn’t return duplicate records.

  3. If subquery returns only one value, you can use = operator to use it with the outer query. If it returns multiple values you can use IN or NOT IN.

Syntax

以下是子查询的通用语法。

Following is the generic syntax of subqueries.

SELECT col1, col2, col3,…
FROM
Outer Table
WHERE col1 OPERATOR ( Inner SELECT Query);

Example

考虑以下 Salary 表格。

Consider the following Salary table.

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

以下查询标识出具有最高工资的员工编号。内部 SELECT 执行聚合函数以返回最大的 NetPay 值,而外部 SELECT 查询使用此值返回具有此值的员工记录。

The following query identifies the employee number with highest salary. The inner SELECT performs the aggregation function to return the maximum NetPay value and the outer SELECT query uses this value to return the employee record with this value.

SELECT EmployeeNo, NetPay
FROM Salary
WHERE NetPay =
(SELECT MAX(NetPay)
FROM Salary);

执行此查询时,将得到以下输出。

When this query is executed, it produces the following output.

*** Query completed. One row found. 2 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo     NetPay
-----------  -----------
    103         83000

Teradata - Table Types

Teradata 支持以下表格类型来保存临时数据。

Teradata supports the following table types to hold temporary data.

  1. Derived Table

  2. Volatile Table

  3. Global Temporary Table

Derived Table

导出表在查询中创建、使用和删除。它们用于在查询中存储中间结果。

Derived tables are created, used and dropped within a query. These are used to store intermediate results within a query.

Example

以下示例使用工资超过 75000 的员工记录构建了导出表 EmpSal。

The following example builds a derived table EmpSal with records of employees with salary greater than 75000.

SELECT
Emp.EmployeeNo,
Emp.FirstName,
Empsal.NetPay
FROM
Employee Emp,
(select EmployeeNo , NetPay
from Salary
where NetPay >= 75000) Empsal
where Emp.EmployeeNo = Empsal.EmployeeNo;

当执行以上查询时,它将返回工资超过 75000 的员工。

When the above query is executed, it returns the employees with salary greater than 75000.

*** Query completed. One row found. 3 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo            FirstName               NetPay
-----------  ------------------------------  -----------
    103                  Peter                 83000

Volatile Table

易失表在用户会话中创建、使用和删除。它们的定义未存储在数据字典中。它们保存查询中经常使用的中间数据。以下是语法。

Volatile tables are created, used and dropped within a user session. Their definition is not stored in data dictionary. They hold intermediate data of the query which is frequently used. Following is the syntax.

Syntax

CREATE [SET|MULTISET] VOALTILE TABLE tablename
<table definitions>
<column definitions>
<index definitions>
ON COMMIT [DELETE|PRESERVE] ROWS

Example

CREATE VOLATILE TABLE dept_stat (
   dept_no INTEGER,
   avg_salary INTEGER,
   max_salary INTEGER,
   min_salary INTEGER
)
PRIMARY INDEX(dept_no)
ON COMMIT PRESERVE ROWS;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

*** Table has been created.
*** Total elapsed time was 1 second.

Global Temporary Table

全局临时表的定义存储在数据字典中,并且可以在许多用户/会话中使用它们。但加载到全局临时表中的数据仅在会话期间保留。每个会话最多可以实现 2000 个全局临时表。以下是语法。

The definition of Global Temporary table is stored in data dictionary and they can be used by many users/sessions. But the data loaded into global temporary table is retained only during the session. You can materialize up to 2000 global temporary tables per session. Following is the syntax.

Syntax

CREATE [SET|MULTISET] GLOBAL TEMPORARY TABLE tablename
<table definitions>
<column definitions>
<index definitions>

Example

CREATE SET GLOBAL TEMPORARY TABLE dept_stat (
   dept_no INTEGER,
   avg_salary INTEGER,
   max_salary INTEGER,
   min_salary INTEGER
)
PRIMARY INDEX(dept_no);

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

*** Table has been created.
*** Total elapsed time was 1 second.

Teradata - Space Concepts

Teradata 中有三种类型的空间可用。

There are three types of spaces available in Teradata.

Permanent Space

永久空间是用户/数据库可用于保存数据行的最大可用空间。永久表、日记、回退表和辅助索引子表使用永久空间。

Permanent space is the maximum amount of space available for the user/database to hold data rows. Permanent tables, journals, fallback tables and secondary index sub-tables use permanent space.

永久空间不会预先分配给数据库/用户。它们仅被定义为数据库/用户可以使用的最大空间量。永久空间的数量将除以 AMP 的数量。每当 AMP 限制超过时,就会生成一个错误消息。

Permanent space is not pre-allocated for the database/user. They are just defined as the maximum amount of space the database/user can use. The amount of permanent space is divided by the number of AMPs. Whenever per AMP limit exceeds, an error message is generated.

Spool Space

池空间是未使用的永久空间,系统使用它来保留 SQL 查询的中间结果。没有池空间的用户无法执行任何查询。

Spool space is the unused permanent space which is used by the system to keep the intermediate results of the SQL query. Users without spool space cannot execute any query.

类似永久空间,池空间定义用户可以使用的最大空间量。池空间将除以 AMP 的数量。每当 AMP 限制超过时,用户都会收到池空间错误。

Similar to Permanent space, spool space defines the maximum amount of space the user can use. Spool space is divided by the number of AMPs. Whenever per AMP limit exceeds, the user will get a spool space error.

Temp Space

临时空间是未使用的永久空间,由全局临时表使用。临时空间也会除以 AMP 的数量。

Temp space is the unused permanent space which is used by Global Temporary tables. Temp space is also divided by the number of AMPs.

Teradata - Secondary Index

表只能包含一个主键索引。更常见的是,您将遇到表包含其他列的情况,其中使用经常访问数据。Teradata 将对这些查询执行全表扫描。辅助索引解决了此问题。

A table can contain only one primary index. More often, you will come across scenarios where the table contains other columns, using which the data is frequently accessed. Teradata will perform full table scan for those queries. Secondary indexes resolve this issue.

辅助索引是访问数据的替代路径。主键索引与辅助索引之间存在一些差异。

Secondary indexes are an alternate path to access the data. There are some differences between the primary index and the secondary index.

  1. Secondary index is not involved in data distribution.

  2. Secondary index values are stored in sub tables. These tables are built in all AMPs.

  3. Secondary indexes are optional.

  4. They can be created during table creation or after a table is created.

  5. They occupy additional space since they build sub-table and they also require maintenance since the sub-tables need to be updated for each new row.

有两种类型的二级索引 -

There are two types of secondary indexes −

  1. Unique Secondary Index (USI)

  2. Non-Unique Secondary Index (NUSI)

Unique Secondary Index (USI)

唯一二级索引仅允许定义为 USI 的列的唯一值。通过 USI 访问行是两 amp 操作。

A Unique Secondary Index allows only unique values for the columns defined as USI. Accessing the row by USI is a two amp operation.

Create Unique Secondary Index

以下示例在 employee 表的 EmployeeNo 列上创建 USI。

The following example creates USI on EmployeeNo column of employee table.

CREATE UNIQUE INDEX(EmployeeNo) on employee;

Non Unique Secondary Index (NUSI)

非唯一二级索引允许定义为 NUSI 的列的重复值。通过 NUSI 访问行是全部-amp 操作。

A Non-Unique Secondary Index allows duplicate values for the columns defined as NUSI. Accessing the row by NUSI is all-amp operation.

Create Non Unique Secondary Index

以下示例在 employee 表的 FirstName 上创建 NUSI。

The following example creates NUSI on FirstName column of employee table.

CREATE INDEX(FirstName) on Employee;

Teradata - Statistics

Teradata 优化器会针对每个 SQL 查询提出执行策略。此执行策略基于在 SQL 查询所用表中收集的统计信息。使用 COLLECT STATISTICS 命令收集表的统计信息。优化器需要环境信息和数据人口统计信息来提出最佳执行策略。

Teradata optimizer comes up with an execution strategy for every SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy.

Environment Information

  1. Number of Nodes, AMPs and CPUs

  2. Amount of memory

Data Demographics

  1. Number of rows

  2. Row size

  3. Range of values in the table

  4. Number of rows per value

  5. Number of Nulls

有三种方法可用于收集表中的统计信息。

There are three approaches to collect statistics on the table.

  1. Random AMP Sampling

  2. Full statistics collection

  3. Using SAMPLE option

Collecting Statistics

COLLECT STATISTICS 命令用于收集表中的统计信息。

COLLECT STATISTICS command is used to collect statistics on a table.

Syntax

以下是收集表中的统计信息的语法。

Following is the basic syntax to collect statistics on a table.

COLLECT [SUMMARY] STATISTICS
INDEX (indexname) COLUMN (columnname)
ON <tablename>;

Example

以下示例收集 Employee 表的 EmployeeNo 列中的统计信息。

The following example collects statistics on EmployeeNo column of Employee table.

COLLECT STATISTICS COLUMN(EmployeeNo) ON Employee;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

*** Update completed. 2 rows changed.
*** Total elapsed time was 1 second.

Viewing Statistics

您可以使用 HELP STATISTICS 命令查看收集到的统计信息。

You can view the collected statistics using HELP STATISTICS command.

Syntax

以下是查看所收集统计信息的语法。

Following is the syntax to view the statistics collected.

HELP STATISTICS <tablename>;

Example

以下是查看 Employee 表中收集到的统计信息的示例。

Following is an example to view the statistics collected on Employee table.

HELP STATISTICS employee;

当执行上面的查询时,会生成以下结果。

When the above query is executed, it produces the following result.

  Date       Time      Unique Values           Column Names
--------   -------- -------------------- -----------------------
16/01/01   08:07:04         5                       *
16/01/01   07:24:16         3                   DepartmentNo
16/01/01   08:07:04         5                   EmployeeNo

Teradata - Compression

压缩用于减少表格使用的存储空间。在 Teradata 中,压缩可以压缩 255 个不同的值,包括 NULL。由于存储减少,因此 Teradata 可以在一个块中存储更多的记录。由于每个 I/O 操作可以处理每个块中的更多行,因此这将改善查询响应时间。可以在创建表格时使用 CREATE TABLE 添加压缩,或在创建表格后使用 ALTER TABLE 命令添加压缩。

Compression is used to reduce the storage used by the tables. In Teradata, compression can compress up to 255 distinct values including NULL. Since the storage is reduced, Teradata can store more records in a block. This results in improved query response time since any I/O operation can process more rows per block. Compression can be added at table creation using CREATE TABLE or after table creation using ALTER TABLE command.

Limitations

  1. Only 255 values can be compressed per column.

  2. Primary Index column cannot be compressed.

  3. Volatile tables cannot be compressed.

Multi-Value Compression (MVC)

下表将字段 DepatmentNo 压缩为值 1、2 和 3。当压缩应用于某一列时,该列的值不会与该行存储在一起。而是这些值存储在每个 AMP 的表头中,并且只添加表示值到行中的存在位。

The following table compresses the field DepatmentNo for values 1, 2 and 3. When compression is applied on a column, the values for this column is not stored with the row. Instead the values are stored in the Table header in each AMP and only presence bits are added to the row to indicate the value.

CREATE SET TABLE employee (
   EmployeeNo integer,
   FirstName CHAR(30),
   LastName CHAR(30),
   BirthDate DATE FORMAT 'YYYY-MM-DD-',
   JoinedDate DATE FORMAT 'YYYY-MM-DD-',
   employee_gender CHAR(1),
   DepartmentNo CHAR(02) COMPRESS(1,2,3)
)
UNIQUE PRIMARY INDEX(EmployeeNo);

多值压缩可用于处理包含有限值的大表中的列。

Multi-Value compression can be used when you have a column in a large table with finite values.

Teradata - Explain

EXPLAIN 命令以英语返回解析引擎的执行计划。它可以与任何 SQL 语句一起使用,但不能与另一个 EXPLAIN 命令一起使用。当查询以 EXPLAIN 命令开头时,解析引擎的执行计划将返回给用户,而不是 AMP。

EXPLAIN command returns the execution plan of parsing engine in English. It can be used with any SQL statement except on another EXPLAIN command. When a query is preceded with EXPLAIN command, the execution plan of the Parsing Engine is returned to the user instead of AMPs.

Examples of EXPLAIN

考虑以下定义的 Employee 表。

Consider the table Employee with the following definition.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30),
   LastName VARCHAR(30),
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );

下面给出了一些 EXPLAIN 计划示例。

Some examples of EXPLAIN plan are given below.

Full Table Scan (FTS)

当 SELECT 语句中未指定条件时,优化器可能会选择使用全表扫描,在该扫描中将访问表中的每一行。

When no conditions are specified in the SELECT statement, then the optimizer may choose to use Full Table Scan where each and every row of the table is accessed.

Example

以下是优化器可能选择 FTS 的示例查询。

Following is a sample query where the optimizer may choose FTS.

EXPLAIN SELECT * FROM employee;

当执行以上查询时,它将生成以下输出。可以看出,优化器选择访问所有 AMP 和 AMP 中的所有行。

When the above query is executed, it produces the following output. As can be seen the optimizer chooses to access all AMPs and all rows within the AMP.

1) First, we lock a distinct TDUSER."pseudo table" for read on a
   RowHash to prevent global deadlock for TDUSER.employee.
2) Next, we lock TDUSER.employee for read.
3) We do an all-AMPs RETRIEVE step from TDUSER.employee by way of an
   all-rows scan with no residual conditions into Spool 1
   (group_amps), which is built locally on the AMPs.  The size of
   Spool 1 is estimated with low confidence to be 2 rows (116 bytes).
   The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
→ The contents of Spool 1 are sent back to the user as the result of
   statement 1.  The total estimated time is 0.03 seconds.

Unique Primary Index

当使用唯一的主索引访问行时,则是一个 AMP 操作。

When the rows are accessed using Unique Primary Index, then it is one AMP operation.

EXPLAIN SELECT * FROM employee WHERE EmployeeNo = 101;

当执行以上查询时,它将生成以下输出。可以看到,这是一个单 AMP 检索,并且优化器正在使用唯一的主索引来访问行。

When the above query is executed, it produces the following output. As can be seen it is a single-AMP retrieval and the optimizer is using the unique primary index to access the row.

1) First, we do a single-AMP RETRIEVE step from TDUSER.employee by
   way of the unique primary index "TDUSER.employee.EmployeeNo = 101"
   with no residual conditions. The estimated time for this step is
   0.01 seconds.
→ The row is sent directly back to the user as the result of
   statement 1.  The total estimated time is 0.01 seconds.

Unique Secondary Index

当使用唯一辅助索引访问行时,这是一个两安操作。

When the rows are accessed using Unique Secondary Index, it’s a two amp operation.

Example

查看如下定义的表“薪资”。

Consider the table Salary with the following definition.

CREATE SET TABLE SALARY,FALLBACK (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
PRIMARY INDEX ( EmployeeNo )
UNIQUE INDEX (EmployeeNo);

查看以下 SELECT 语句。

Consider the following SELECT statement.

EXPLAIN SELECT * FROM Salary WHERE EmployeeNo = 101;

当执行上述查询时,会生成以下输出。正如可以看到的那样,优化器使用唯一的辅助索引在两安操作中检索行。

When the above query is executed, it produces the following output. As can be seen the optimizer retrieves the row in two amp operation using unique secondary index.

1) First, we do a two-AMP RETRIEVE step from TDUSER.Salary
   by way of unique index # 4 "TDUSER.Salary.EmployeeNo =
   101" with no residual conditions.  The estimated time for this
   step is 0.01 seconds.
→ The row is sent directly back to the user as the result of
   statement 1.  The total estimated time is 0.01 seconds.

Additional Terms

以下是 EXPLAIN 计划中常见的术语列表。

Following is the list of terms commonly seen in EXPLAIN plan.

…​ (Last Use) …

…​ (Last Use) …

不再需要暂存文件,并且在完成此步骤时将释放该文件。

A spool file is no longer needed and will be released when this step completes.

…​ with no residual conditions …

…​ with no residual conditions …

所有适用的条件都已应用于行。

All applicable conditions have been applied to the rows.

…​ END TRANSACTION …

…​ END TRANSACTION …

释放事务锁,并且提交更改。

Transaction locks are released, and changes are committed.

…​ eliminating duplicate rows …​

…​ eliminating duplicate rows …​

仅在暂存文件中存在重复行,不存在集合表中。执行 DISTINCT 操作。

Duplicate rows only exist in spool files, not set tables. Doing a DISTINCT operation.

…​ by way of a traversal of index #n extracting row ids only …

…​ by way of a traversal of index #n extracting row ids only …

构建一个包含在辅助索引(索引 #n)中找到的行 ID 的暂存文件。

A spool file is built containing the Row IDs found in a secondary index (index #n)

…​ we do a SMS (set manipulation step) …

…​ we do a SMS (set manipulation step) …

使用 UNION、MINUS 或 INTERSECT 运算符组合行。

Combining rows using a UNION, MINUS, or INTERSECT operator.

…​ which is redistributed by hash code to all AMPs.

…​ which is redistributed by hash code to all AMPs.

在准备联接时重新分发数据。

Redistributing data in preparation for a join.

…​ which is duplicated on all AMPs.

…​ which is duplicated on all AMPs.

复制更小表(根据 SPOOL)中数据以准备进行联接。

Duplicating data from the smaller table (in terms of SPOOL) in preparation for a join.

…​ (one_AMP) or (group_AMPs)

…​ (one_AMP) or (group_AMPs)

指明将使用一个 AMP 或 AMP 子集而不是全部 AMP。

Indicates one AMP or subset of AMPs will be used instead of all AMPs.

Teradata - Hashing Algorithm

行是根据主键值分配给特定 AMP 的。Teradata 使用哈希算法来确定哪一个 AMP 获取行。

A row is assigned to a particular AMP based on the primary index value. Teradata uses hashing algorithm to determine which AMP gets the row.

以下是有关哈希算法的高级图表。

Following is a high level diagram on hashing algorithm.

hashing algorithm

以下是插入数据的步骤。

Following are the steps to insert the data.

  1. The client submits a query.

  2. The parser receives the query and passes the PI value of the record to the hashing algorithm.

  3. The hashing algorithm hashes the primary index value and returns a 32 bit number, called Row Hash.

  4. The higher order bits of the row hash (first 16 bits) is used to identify the hash map entry. The hash map contains one AMP #. Hash map is an array of buckets which contains specific AMP #.

  5. BYNET sends the data to the identified AMP.

  6. AMP uses the 32 bit Row hash to locate the row within its disk.

  7. If there is any record with same row hash, then it increments the uniqueness ID which is a 32 bit number. For new row hash, uniqueness ID is assigned as 1 and incremented whenever a record with same row hash is inserted.

  8. The combination of Row hash and Uniqueness ID is called as Row ID.

  9. Row ID prefixes each record in the disk.

  10. Each table row in the AMP is logically sorted by their Row IDs.

How Tables are Stored

表格按其行 ID(行哈希 + 唯一性 ID)排序,然后存储在 AMP 中。行 ID 与每行数据一起存储。

Tables are sorted by their Row ID (Row hash + uniqueness id) and then stored within the AMPs. Row ID is stored with each data row.

Row Hash

Uniqueness ID

EmployeeNo

FirstName

LastName

2A01 2611

0000 0001

101

Mike

James

2A01 2612

0000 0001

104

Alex

Stuart

2A01 2613

0000 0001

102

Robert

Williams

2A01 2614

0000 0001

105

Robert

James

2A01 2615

0000 0001

103

Peter

Paul

Teradata - JOIN Index

JOIN INDEX 是物化视图。其定义永久存储,每当关联索引中引用的基础表更新时数据都会更新。JOIN INDEX 可能包含一个或多个表,并且还包含预聚合数据。连接索引主要用于提高性能。

JOIN INDEX is a materialized view. Its definition is permanently stored and the data is updated whenever the base tables referred in the join index is updated. JOIN INDEX may contain one or more tables and also contain pre-aggregated data. Join indexes are mainly used for improving the performance.

有不同类型的 join 索引可用。

There are different types of join indexes available.

  1. Single Table Join Index (STJI)

  2. Multi Table Join Index (MTJI)

  3. Aggregated Join Index (AJI)

Single Table Join Index

单表连接索引允许基于不同于基础表的主索引列对大表进行分区。

Single Table Join index allows to partition a large table based on the different primary index columns than the one from the base table.

Syntax

以下是 JOIN INDEX 的语法。

Following is the syntax of a JOIN INDEX.

CREATE JOIN INDEX <index name>
AS
<SELECT Query>
<Index Definition>;

Example

考虑以下 Employee 和 Salary 表。

Consider the following Employee and Salary tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );

CREATE SET TABLE SALARY,FALLBACK (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
PRIMARY INDEX ( EmployeeNo )
UNIQUE INDEX (EmployeeNo);

以下是一个示例,在 Employee 表上创建名为 Employee_JI 的连接索引。

Following is an example that creates a Join index named Employee_JI on Employee table.

CREATE JOIN INDEX Employee_JI
AS
SELECT EmployeeNo,FirstName,LastName,
BirthDate,JoinedDate,DepartmentNo
FROM Employee
PRIMARY INDEX(FirstName);

如果用户提交一个带有 EmployeeNo 的 WHERE 子句的查询,那么系统将使用唯一主索引对 Employee 表进行查询。如果用户使用 employee_name 对 employee 表进行查询,那么系统可能会使用 employee_name 访问关联索引 Employee_JI。连接索引的行在 employee_name 列上进行哈希处理。如果没有定义连接索引,并且 employee_name 未定义为二级索引,那么系统将执行全表扫描以访问耗时的行。

If the user submits a query with a WHERE clause on EmployeeNo, then the system will query the Employee table using the unique primary index. If the user queries the employee table using employee_name, then the system may access the join index Employee_JI using employee_name. The rows of the join index are hashed on employee_name column. If the join index is not defined and the employee_name is not defined as secondary index, then the system will perform full table scan to access the rows which is time consuming.

您可以运行以下 EXPLAIN 计划并验证优化器计划。在以下示例中,您会看到优化器在使用 Employee_Name 列查询表时使用连接索引而不是基础 Employee 表。

You can run the following EXPLAIN plan and verify the optimizer plan. In the following example you can see that the optimizer is using the Join Index instead of base Employee table when the table queries using the Employee_Name column.

EXPLAIN SELECT * FROM EMPLOYEE WHERE FirstName='Mike';
*** Help information returned. 8 rows.
*** Total elapsed time was 1 second.
Explanation
------------------------------------------------------------------------
   1) First, we do a single-AMP RETRIEVE step from EMPLOYEE_JI by
      way of the primary index "EMPLOYEE_JI.FirstName = 'Mike'"
      with no residual conditions into Spool 1 (one-amp), which is built
      locally on that AMP.  The size of Spool 1 is estimated with low
      confidence to be 2 rows (232 bytes).  The estimated time for this
      step is 0.02 seconds.
   → The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.02 seconds.

Multi Table Join Index

多表连接索引通过连接多张表来创建。多表连接索引可用于存储常用连接表的 result 集,以提高性能。

A multi-table join index is created by joining more than one table. Multi-table join index can be used to store the result set of frequently joined tables to improve the performance.

Example

以下示例通过连接 Employee 和 Salary 表创建名为 Employee_Salary_JI 的 JOIN INDEX。

The following example creates a JOIN INDEX named Employee_Salary_JI by joining Employee and Salary tables.

CREATE JOIN INDEX Employee_Salary_JI
AS
SELECT a.EmployeeNo,a.FirstName,a.LastName,
a.BirthDate,a.JoinedDate,a.DepartmentNo,b.Gross,b.Deduction,b.NetPay
FROM Employee a
INNER JOIN Salary b
ON(a.EmployeeNo = b.EmployeeNo)
PRIMARY INDEX(FirstName);

每当更新基础表 Employee 或 Salary 时,连接索引 Employee_Salary_JI 也会自动更新。如果您正在运行一个连接 Employee 和 Salary 表的查询,那么优化器可能会选择直接从 Employee_Salary_JI 访问数据,而不是连接表。可以对查询使用 EXPLAIN 计划来验证优化器是否会选择基础表或连接索引。

Whenever the base tables Employee or Salary are updated, then the Join index Employee_Salary_JI is also automatically updated. If you are running a query joining Employee and Salary tables, then the optimizer may choose to access the data from Employee_Salary_JI directly instead of joining the tables. EXPLAIN plan on the query can be used to verify if the optimizer will choose the base table or Join index.

Aggregate Join Index

如果某个表总是在某些列上聚合,那么可以在表上定义聚合连接索引来提升性能。聚合连接索引的一个限制是它仅支持 SUM 和 COUNT 函数。

If a table is consistently aggregated on certain columns, then aggregate join index can be defined on the table to improve the performance. One limitation of aggregate join index is that it supports only SUM and COUNT functions.

Example

在以下示例中,员工和薪水连接起来,从而按部门识别总薪水。

In the following example Employee and Salary is joined to identify the total salary per Department.

CREATE JOIN INDEX Employee_Salary_JI
AS
SELECT a.DepartmentNo,SUM(b.NetPay) AS TotalPay
FROM Employee a
INNER JOIN Salary b
ON(a.EmployeeNo = b.EmployeeNo)
GROUP BY a.DepartmentNo
Primary Index(DepartmentNo);

Teradata - Views

视图是通过查询构建的数据库对象。视图可以使用单个表或通过连接使用多个表构建。其定义永久存储在数据字典中,但它们不会存储数据的副本。视图的数据是动态构建的。

Views are database objects that are built by the query. Views can be built using a single table or multiple tables by way of join. Their definition is stored permanently in data dictionary but they don’t store copy of the data. Data for the view is built dynamically.

视图可能包含表的子行集或表的子列集。

A view may contain a subset of rows of the table or a subset of columns of the table.

Create a View

使用 CREATE VIEW 语句创建视图。

Views are created using CREATE VIEW statement.

Syntax

以下是创建视图的语法。

Following is the syntax for creating a view.

CREATE/REPLACE VIEW <viewname>
AS
<select query>;

Example

考虑以下 Employee 表。

Consider the following Employee table.

EmployeeNo

FirstName

LastName

BirthDate

101

Mike

James

1/5/1980

104

Alex

Stuart

11/6/1984

102

Robert

Williams

3/5/1983

105

Robert

James

12/1/1984

103

Peter

Paul

4/1/1983

以下示例在 Employee 表上创建了一个视图。

The following example creates a view on Employee table.

CREATE VIEW Employee_View
AS
SELECT
EmployeeNo,
FirstName,
LastName,
FROM
Employee;

Using Views

您可以使用常规 SELECT 语句从视图中检索数据。

You can use regular SELECT statement to retrieve data from Views.

Example

以下示例从 Employee_View 中检索记录;

The following example retrieves the records from Employee_View;

SELECT EmployeeNo, FirstName, LastName FROM Employee_View;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

*** Query completed. 5 rows found. 3 columns returned.
*** Total elapsed time was 1 second.
 EmployeeNo            FirstName                       LastName
-----------  ------------------------------  ---------------------------
    101                  Mike                           James
    104                  Alex                           Stuart
    102                  Robert                         Williams
    105                  Robert                         James
    103                  Peter                          Paul

Modifying Views

可以使用 REPLACE VIEW 语句修改现有视图。

An existing view can be modified using REPLACE VIEW statement.

以下是修改视图的语法。

Following is the syntax to modify a view.

REPLACE VIEW <viewname>
AS
<select query>;

Example

以下示例修改视图 Employee_View 以添加额外列。

The following example modifies the view Employee_View for adding additional columns.

REPLACE VIEW Employee_View
AS
SELECT
EmployeeNo,
FirstName,
BirthDate,
JoinedDate
DepartmentNo
FROM
Employee;

Drop View

可以使用 DROP VIEW 语句删除现有视图。

An existing view can be dropped using DROP VIEW statement.

Syntax

以下是 DROP VIEW 的语法。

Following is the syntax of DROP VIEW.

DROP VIEW <viewname>;

Example

以下是删除视图 Employee_View 的示例。

Following is an example to drop the view Employee_View.

DROP VIEW Employee_View;

Advantages of Views

  1. Views provide additional level of security by restricting the rows or columns of a table.

  2. Users can be given access only to views instead of base tables.

  3. Simplifies the use of multiple tables by pre-joining them using Views.

Teradata - Macros

宏是一组 SQL 语句,它们通过调用宏名称来存储和执行。宏的定义存储在数据字典中。用户只需要 EXEC 权限来执行宏。用户不需要对宏中使用的数据库对象有单独的权限。宏语句作为单个事务执行。如果宏中的一个 SQL 语句失败,那么所有语句都会回滚。宏可以接受参数。宏可以包含 DDL 语句,但该语句应为宏中的最后一条语句。

Macro is a set of SQL statements which are stored and executed by calling the Macro name. The definition of Macros is stored in Data Dictionary. Users only need EXEC privilege to execute the Macro. Users don’t need separate privileges on the database objects used inside the Macro. Macro statements are executed as a single transaction. If one of the SQL statements in Macro fails, then all the statements are rolled back. Macros can accept parameters. Macros can contain DDL statements, but that should be the last statement in Macro.

Create Macros

使用 CREATE MACRO 语句创建宏。

Macros are created using CREATE MACRO statement.

Syntax

以下为 CREATE MACRO 命令的通用语法。

Following is the generic syntax of CREATE MACRO command.

CREATE MACRO <macroname> [(parameter1, parameter2,...)] (
   <sql statements>
);

Example

考虑以下 Employee 表。

Consider the following Employee table.

EmployeeNo

FirstName

LastName

BirthDate

101

Mike

James

1/5/1980

104

Alex

Stuart

11/6/1984

102

Robert

Williams

3/5/1983

105

Robert

James

12/1/1984

103

Peter

Paul

4/1/1983

以下示例创建了一个名为 Get_Emp 的宏。它包含一个 select 语句以从员工表中检索记录。

The following example creates a Macro called Get_Emp. It contains a select statement to retrieve records from employee table.

CREATE MACRO Get_Emp AS (
   SELECT
   EmployeeNo,
   FirstName,
   LastName
   FROM
   employee
   ORDER BY EmployeeNo;
);

Executing Macros

使用 EXEC 命令执行宏。

Macros are executed using EXEC command.

Syntax

以下为 EXECUTE MACRO 命令的语法。

Following is the syntax of EXECUTE MACRO command.

EXEC <macroname>;

Example

以下示例执行名为 Get_Emp 的宏;当执行以下命令时,它会从员工表中检索所有记录。

The following example executes the Macro names Get_Emp; When the following command is executed, it retrieves all records from employee table.

EXEC Get_Emp;
*** Query completed. 5 rows found. 3 columns returned.
*** Total elapsed time was 1 second.
EmployeeNo             FirstName                      LastName
-----------  ------------------------------  ---------------------------
   101                  Mike                          James
   102                  Robert                        Williams
   103                  Peter                         Paul
   104                  Alex                          Stuart
   105                  Robert                        James

Parameterized Macros

Teradata 宏可以接受参数。在宏中,这些参数用 ;(分号)引用。

Teradata Macros can accept parameters. Within a Macro, these parameters are referenced with ; (semicolon).

以下是接受参数的宏示例。

Following is an example of a Macro that accepts parameters.

CREATE MACRO Get_Emp_Salary(EmployeeNo INTEGER) AS (
   SELECT
   EmployeeNo,
   NetPay
   FROM
   Salary
   WHERE EmployeeNo = :EmployeeNo;
);

Executing Parameterized Macros

使用 EXEC 命令执行宏。您需要 EXEC 权限才能执行宏。

Macros are executed using EXEC command. You need EXEC privilege to execute the Macros.

Syntax

以下为 EXECUTE MACRO 语句的语法。

Following is the syntax of EXECUTE MACRO statement.

EXEC <macroname>(value);

Example

以下示例执行名为 Get_Emp 的宏;它接受员工编号作为参数,并为该员工从员工表中提取记录。

The following example executes the Macro names Get_Emp; It accepts employee no as parameter and extracts records from employee table for that employee.

EXEC Get_Emp_Salary(101);
*** Query completed. One row found. 2 columns returned.
*** Total elapsed time was 1 second.

EmployeeNo      NetPay
-----------  ------------
   101           36000

Teradata - Stored Procedure

存储过程包含一组 SQL 语句和过程语句。它们可能仅包含过程语句。存储过程的定义存储在数据库中,并且参数存储在数据字典表中。

A stored procedure contains a set of SQL statements and procedural statements. They may contain only procedural statements. The definition of stored procedure is stored in database and the parameters are stored in data dictionary tables.

Advantages

  1. Stored procedures reduce the network load between the client and the server.

  2. Provides better security since the data is accessed through stored procedures instead of accessing them directly.

  3. Gives better maintenance since the business logic is tested and stored in the server.

Creating Procedure

存储过程使用 CREATE PROCEDURE 语句创建。

Stored Procedures are created using CREATE PROCEDURE statement.

Syntax

以下是 CREATE PROCEDURE 语句的通用语法。

Following is the generic syntax of the CREATE PROCEDURE statement.

CREATE PROCEDURE <procedurename> ( [parameter 1 data type, parameter 2 data type..] )
BEGIN
   <SQL or SPL statements>;
END;

Example

请考虑以下 Salary 表。

Consider the following Salary Table.

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

以下示例创建了一个名为 InsertSalary 的存储过程以接受值并将其插入 Salary 表中。

The following example creates a stored procedure named InsertSalary to accept the values and insert into Salary Table.

CREATE PROCEDURE InsertSalary (
   IN in_EmployeeNo INTEGER, IN in_Gross INTEGER,
   IN in_Deduction INTEGER, IN in_NetPay INTEGER
)
BEGIN
   INSERT INTO Salary (
      EmployeeNo,
      Gross,
      Deduction,
      NetPay
   )
   VALUES (
      :in_EmployeeNo,
      :in_Gross,
      :in_Deduction,
      :in_NetPay
   );
END;

Executing Procedures

存储过程使用 CALL 语句执行。

Stored Procedures are executed using CALL statement.

Syntax

以下是 CALL 语句的通用语法。

Following is the generic syntax of the CALL statement.

CALL <procedure name> [(parameter values)];

Example

以下示例调用存储过程 InsertSalary 并将记录插入 Salary 表中。

The following example calls the stored procedure InsertSalary and inserts records to Salary Table.

CALL InsertSalary(105,20000,2000,18000);

一旦执行以上查询,它将生成以下输出,您可以在 Salary 表中看到已插入的行。

Once the above query is executed, it produces the following output and you can see the inserted row in Salary table.

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

105

20,000

2,000

18,000

Teradata - JOIN strategies

本章讨论 Teradata 中可用的各种 JOIN 策略。

This chapter discusses the various JOIN strategies available in Teradata.

Join Methods

Teradata 使用不同的连接方法来执行连接操作。一些常用的连接方法包括:

Teradata uses different join methods to perform join operations. Some of the commonly used Join methods are −

  1. Merge Join

  2. Nested Join

  3. Product Join

Merge Join

当连接基于相等条件时,将执行 Merge Join 方法。Merge Join 要求连接的行位于同一 AMP 上。行是根据其行哈希连接的。Merge Join 使用不同的连接策略将行带到同一个 AMP。

Merge Join method takes place when the join is based on the equality condition. Merge Join requires the joining rows to be on the same AMP. Rows are joined based on their row hash. Merge Join uses different join strategies to bring the rows to the same AMP.

Strategy

如果连接列是相应表的初级索引,那么连接行已经位于同一个 AMP 上。在这种情况下,不需要分配。

If the join columns are the primary indexes of the corresponding tables, then the joining rows are already on the same AMP. In this case, no distribution is required.

考虑以下员工和薪水表。

Consider the following Employee and Salary Tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );
CREATE SET TABLE Salary (
   EmployeeNo INTEGER,
   Gross INTEGER,
   Deduction INTEGER,
   NetPay INTEGER
)
UNIQUE PRIMARY INDEX(EmployeeNo);

当这两个表按 EmployeeNo 列连接时,则不会发生重新分配,因为 EmployeeNo 是被连接的两个表的初级索引。

When these two tables are joined on EmployeeNo column, then no redistribution takes place since EmployeeNo is the primary index of both the tables which are being joined.

Strategy

考虑以下员工和部门表。

Consider the following Employee and Department tables.

CREATE SET TABLE EMPLOYEE,FALLBACK (
   EmployeeNo INTEGER,
   FirstName VARCHAR(30) ,
   LastName VARCHAR(30) ,
   DOB DATE FORMAT 'YYYY-MM-DD',
   JoinedDate DATE FORMAT 'YYYY-MM-DD',
   DepartmentNo BYTEINT
)
UNIQUE PRIMARY INDEX ( EmployeeNo );
CREATE SET TABLE DEPARTMENT,FALLBACK (
   DepartmentNo BYTEINT,
   DepartmentName CHAR(15)
)
UNIQUE PRIMARY INDEX ( DepartmentNo );

如果这两个表按 DeparmentNo 列连接,那么需要重新分配这些行,因为 DepartmentNo 是一个表中的初级索引,而另一个表中的非初级索引。在此场景中,连接行可能不在同一个 AMP 上。在这种情况下,Teradata 可能会在 DepartmentNo 列上重新分配员工表。

If these two tables are joined on DeparmentNo column, then the rows need to be redistributed since DepartmentNo is a primary index in one table and non-primary index in another table. In this scenario, joining rows may not be on the same AMP. In such case, Teradata may redistribute employee table on DepartmentNo column.

Strategy

对于上面的员工和部门表,如果部门表大小较小,Teradata 可能会在所有 AMP 上复制部门表。

For the above Employee and Department tables, Teradata may duplicate the Department table on all AMPs, if the size of Department table is small.

Nested Join

Nested Join 并不使用所有 AMP。对于 Nested Join 来说,一个条件应该是对一个表的唯一初级索引的相等,然后将此列连接到另一个表上的任何索引。

Nested Join doesn’t use all AMPs. For the Nested Join to take place, one of the condition should be equality on the unique primary index of one table and then joining this column to any index on the other table.

在此情况下,系统将使用一个表唯一初级索引获取一行,并使用该行哈希从其他表获取匹配的记录。Nested join 是所有连接方法中最有效的。

In this scenario, the system will fetch the one row using Unique Primary index of one table and use that row hash to fetch the matching records from other table. Nested join is the most efficient of all Join methods.

Product Join

Product Join 将一个表中每个符合条件的行与另一个表中每个符合条件的行进行比较。由于以下一些因素,可能会发生 Product join:

Product Join compares each qualifying row from one table with each qualifying row from other table. Product join may take place due to some of the following factors −

  1. Where condition is missing.

  2. Join condition is not based on equality condition.

  3. Table aliases is not correct.

  4. Multiple join conditions.

Teradata - Partitioned Primary Index

分区主索引 (PPI) 是一种索引机制,可用于提高某些查询的性能。当行插入表中时,它们会存储在 AMP 中,并按其行哈希顺序排列。当使用 PPI 定义表时,行会按其分区号排序。在每个分区内,它们会按行哈希排列。行会根据定义的分区表达式分配给分区。

Partitioned Primary Index (PPI) is an indexing mechanism that is useful in improving the performance of certain queries. When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. When a table is defined with PPI, the rows are sorted by their partition number. Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined.

Advantages

  1. Avoid full table scan for certain queries.

  2. Avoid using secondary index that requires additional physical structure and additional I/O maintenance.

  3. Access a subset of a large table quickly.

  4. Drop the old data quickly and add new data.

Example

考虑具有订单号为主索引的如下订单表。

Consider the following Orders table with Primary Index on OrderNo.

StoreNo

OrderNo

OrderDate

OrderTotal

101

7501

2015-10-01

900

101

7502

2015-10-02

1,200

102

7503

2015-10-02

3,000

102

7504

2015-10-03

2,454

101

7505

2015-10-03

1201

103

7506

2015-10-04

2,454

101

7507

2015-10-05

1201

101

7508

2015-10-05

1201

假设记录如以下表格所示在 AMP 之间进行分配。记录存储在 AMP 中,根据其行哈希进行排序。

Assume that the records are distributed between AMPs as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.

**

RowHash

OrderNo

OrderDate

1

7505

2015-10-03

2

7504

2015-10-03

3

7501

2015-10-01

4

7508

2015-10-05

**

RowHash

OrderNo

OrderDate

1

7507

2015-10-05

2

7502

2015-10-02

3

7506

2015-10-04

4

7503

2015-10-02

如果您运行一个查询以提取某个特定日期的订单,则优化器可能会选择使用全表扫描,然后访问 AMP 中的所有记录。为了避免这种情况,您可以将订单日期定义为分区主索引。当行插入订单表时,它们按订单日期分区。它们将在每个分区中按其行哈希进行排序。

If you run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. To avoid this, you can define the order date as Partitioned Primary Index. When rows are inserted into orders table, they are partitioned by the order date. Within each partition they will be ordered by their row hash.

以下数据显示了如果按订单日期分区,记录如何存储在 AMP 中。如果运行一个查询以按订单日期访问记录,则只会访问包含特定订单记录的分区。

The following data shows how the records will be stored in AMPs, if they are partitioned by Order Date. If a query is run to access the records by Order Date, then only the partition that contains the records for that particular order will be accessed.

**

Partition

RowHash

OrderNo

OrderDate

0

3

7501

2015-10-01

1

1

7505

2015-10-03

1

2

7504

2015-10-03

2

4

7508

2015-10-05

**

Partition

RowHash

OrderNo

OrderDate

0

2

7502

2015-10-02

0

4

7503

2015-10-02

1

3

7506

2015-10-04

2

1

7507

2015-10-05

以下是一个带分区主索引创建表的示例。PARTITION BY 子句用于定义分区。

Following is an example to create a table with partition primary Index. PARTITION BY clause is used to define the partition.

CREATE SET TABLE Orders (
   StoreNo SMALLINT,
   OrderNo INTEGER,
   OrderDate DATE FORMAT 'YYYY-MM-DD',
   OrderTotal INTEGER
)
PRIMARY INDEX(OrderNo)
PARTITION BY RANGE_N  (
   OrderDate BETWEEN DATE '2010-01-01' AND '2016-12-31' EACH INTERVAL '1' DAY
);

在上述示例中,该表按 OrderDate 列分区。每一天将会有一个单独的分区。

In the above example, the table is partitioned by OrderDate column. There will be one separate partition for each day.

Teradata - OLAP Functions

OLAP 函数与聚合函数类似,只不过聚合函数仅返回一个值,而 OLAP 函数将提供各个行和聚合。

OLAP functions are similar to aggregate functions except that the aggregate functions will return only one value whereas the OLAP function will provide the individual rows in addition to the aggregates.

Syntax

以下是 OLAP 函数的一般语法。

Following is the general syntax of the OLAP function.

<aggregate function> OVER
([PARTITION BY] [ORDER BY columnname][ROWS BETWEEN
UNBOUDED PRECEDING AND UNBOUNDED FOLLOWING)

聚合函数可以是 SUM、COUNT、MAX、MIN、AVG。

Aggregation functions can be SUM, COUNT, MAX,MIN, AVG.

Example

考虑以下 Salary 表格。

Consider the following Salary table.

EmployeeNo

Gross

Deduction

NetPay

101

40,000

4,000

36,000

102

80,000

6,000

74,000

103

90,000

7,000

83,000

104

75,000

5,000

70,000

以下是一个在 Salary 表格上查找 NetPay 的累积和或运行总和的示例。记录按 EmployeeNo 排序,并在 NetPay 列上计算累积和。

Following is an example to find the cumulative sum or running total of NetPay on Salary table. Records are sorted by EmployeeNo and cumulative sum is calculated on NetPay column.

SELECT
EmployeeNo, NetPay,
SUM(Netpay) OVER(ORDER BY EmployeeNo ROWS
UNBOUNDED PRECEDING) as TotalSalary
FROM Salary;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

EmployeeNo     NetPay     TotalSalary
-----------  -----------  -----------
   101         36000        36000
   102         74000        110000
   103         83000        193000
   104         70000        263000
   105         18000        281000

RANK

RANK 函数根据提供的列对记录进行排序。RANK 函数还可以根据排名过滤返回的记录数。

RANK function orders the records based on the column provided. RANK function can also filter the number of records returned based on the rank.

Syntax

下面是使用 RANK 函数的通用语法。

Following is the generic syntax to use the RANK function.

RANK() OVER
([PARTITION BY columnnlist] [ORDER BY columnlist][DESC|ASC])

Example

考虑以下 Employee 表。

Consider the following Employee table.

EmployeeNo

FirstName

LastName

JoinedDate

DepartmentID

BirthDate

101

Mike

James

3/27/2005

1

1/5/1980

102

Robert

Williams

4/25/2007

2

3/5/1983

103

Peter

Paul

3/21/2007

2

4/1/1983

104

Alex

Stuart

2/1/2008

2

11/6/1984

105

Robert

James

1/4/2008

3

12/1/1984

下面的查询按加入日期对员工表中的记录排序,并按加入日期分配排名。

Following query orders the records of the employee table by Joined Date and assigns the ranking on Joined Date.

SELECT EmployeeNo, JoinedDate,RANK()
OVER(ORDER BY JoinedDate) as Seniority
FROM Employee;

执行以上查询后,将产生以下输出。

When the above query is executed, it produces the following output.

EmployeeNo   JoinedDate   Seniority
-----------  ----------  -----------
   101       2005-03-27       1
   103       2007-03-21       2
   102       2007-04-25       3
   105       2008-01-04       4
   104       2008-02-01       5

PARTITION BY 子句按 PARTITION BY 子句中定义的列对数据进行分组,并在每个组内执行 OLAP 函数。下面是使用 PARTITION BY 子句的查询示例。

PARTITION BY clause groups the data by the columns defined in the PARTITION BY clause and performs the OLAP function within each group. Following is an example of the query that uses PARTITION BY clause.

SELECT EmployeeNo, JoinedDate,RANK()
OVER(PARTITION BY DeparmentNo ORDER BY JoinedDate) as Seniority
FROM Employee;

执行上述查询时,它会产生以下输出。可以看到每个部门的排名都已重置。

When the above query is executed, it produces the following output. You can see that the Rank is reset for each Department.

EmployeeNo  DepartmentNo  JoinedDate   Seniority
-----------  ------------  ----------  -----------

    101           1        2005-03-27       1
    103           2        2007-03-21       1
    102           2        2007-04-25       2
    104           2        2008-02-01       3
    105           3        2008-01-04       1

Teradata - Data Protection

本章介绍了 Teradata 中提供的用于数据保护的功能。

This chapter discusses the features available for data protection in Teradata.

Transient Journal

Teradata 使用临时日志来防止事务失败导致的数据丢失。每当运行任何事务时,临时日志都会保留受影响行的映像副本,直到事务成功或成功回滚为止。然后,将丢弃之前的映像。临时日志保存在每个 AMP 中。这是一个自动过程,并且无法禁用。

Teradata uses Transient Journal to protect data from transaction failures. Whenever any transactions are run, Transient journal keeps a copy of the before images of the affected rows until the transaction is successful or rolled back successfully. Then, the before images are discarded. Transient journal is kept in each AMPs. It is an automatic process and cannot be disabled.

Fallback

备用保护通过在另一个称为备用 AMP 的 AMP 上存储表的第二行副本来保护表数据。如果一个 AMP 发生故障,则可以访问备用行。这样,即使一个 AMP 发生故障,数据仍然可以通过备用 AMP 获得。备用选项可以在创建表时或创建表后使用。备用可确保始终将表行的第二副本存储在另一个 AMP 中,以防止 AMP 故障导致数据丢失。但是,备用占据了插入/删除/更新操作的两倍存储空间和 I/O。

Fallback protects the table data by storing the second copy of rows of a table on another AMP called as Fallback AMP. If one AMP fails, then the fallback rows are accessed. With this, even if one AMP fails, data is still available through fallback AMP. Fallback option can be used at table creation or after table creation. Fallback ensures that the second copy of the rows of the table is always stored in another AMP to protect the data from AMP failure. However, fallback occupies twice the storage and I/O for Insert/Delete/Update.

以下图表显示了如何将行的备用副本存储在另一个 AMP 中。

Following diagram shows how fallback copy of the rows are stored in another AMP.

fallback

Down AMP Recovery Journal

当 AMP 发生故障并且表受到备用保护时,会激活 Down AMP 恢复日志。此日志会跟踪对发生故障的 AMP 的数据的所有更改。日志会在集群中剩余的 AMP 上激活。这是一个自动过程,并且无法禁用。当有故障的 AMP 恢复后,Down AMP 恢复日志中的数据将与 AMP 同步。完成此操作后,将丢弃日志。

The Down AMP recovery journal is activated when the AMP fails and the table is fallback protected. This journal keeps track of all the changes to the data of the failed AMP. The journal is activated on the remaining AMPs in the cluster. It is an automatic process and cannot be disabled. Once the failed AMP is live then the data from the Down AMP recovery journal is synchronized with the AMP. Once this is done, the journal is discarded.

down amp recovery journal

Cliques

隔离是一种机制,Teradata 使用它来保护数据免遭节点故障的影响。隔离只不过是一组共享一组公共磁盘阵列的 Teradata 节点。当一个节点发生故障时,有故障节点的 vproc 将迁移到隔离中的其他节点,并继续访问其磁盘阵列。

Clique is a mechanism used by Teradata to protect data from Node failures. A clique is nothing but a set of Teradata nodes that share a common set of Disk Arrays. When a node fails, then the vprocs from the failed node will migrate to other nodes in the clique and continue to access their disk arrays.

Hot Standby Node

热备用节点是不参与生产环境的节点。如果一个节点发生故障,则有故障节点的 vproc 将迁移到热备用节点。一旦恢复发生故障的节点,它将成为热备用节点。热备用节点用于在发生节点故障时维持性能。

Hot Standby Node is a node that does not participate in the production environment. If a node fails then the vprocs from the failed nodes will migrate to the hot standby node. Once the failed node is recovered it becomes the hot standby node. Hot Standby nodes are used to maintain the performance in case of node failures.

RAID

独立磁盘冗余阵列 (RAID) 是一种用于保护数据免受磁盘故障影响的机制。磁盘阵列由一组磁盘组成,这些磁盘被分组为一个逻辑单元。此单元在用户看来可能像一个单元,但可能分布在多个磁盘上。

Redundant Array of Independent Disks (RAID) is a mechanism used to protect data from Disk Failures. Disk Array consists of a set of disks which are grouped as a logical unit. This unit may look like a single unit to the user but they may be spread across several disks.

RAID 1 通常在 Teradata 中使用。在 RAID 1 中,每个磁盘都与一个镜像磁盘关联。对主磁盘中的数据的任何更改也会反映在镜像副本中。如果主磁盘发生故障,则可以访问镜像磁盘中的数据。

RAID 1 is commonly used in Teradata. In RAID 1, each disk is associated with a mirror disk. Any changes to the data in primary disk is reflected in mirror copy also. If the primary disk fails, then the data from mirror disk can be accessed.

raid

Teradata - User Management

本章讨论了 Teradata 中用户管理的各种策略。

This chapter discussed the various strategies of user management in Teradata.

Users

使用 CREATE USER 命令创建用户。在 Teradata 中,用户类似于数据库。它们都可被分配空间并包含数据库对象,只不过用户被分配了密码。

A user is created using CREATE USER command. In Teradata, a user is also similar to a database. They both can be assigned space and contain database objects except that the user is assigned a password.

Syntax

以下是 CREATE USER 的语法。

Following is the syntax for CREATE USER.

CREATE USER username
AS
[PERMANENT|PERM] = n BYTES
PASSWORD = password
TEMPORARY = n BYTES
SPOOL = n BYTES;

创建用户时,必须提供用户名、永久空间和密码的值。其他字段是可选的。

While creating a user, the values for user name, Permanent space and Password is mandatory. Other fields are optional.

Example

以下是创建用户 TD01 的示例。

Following is an example to create the user TD01.

CREATE USER TD01
AS
PERMANENT = 1000000 BYTES
PASSWORD = ABC$124
TEMPORARY = 1000000 BYTES
SPOOL = 1000000 BYTES;

Accounts

创建新用户时,可以将用户分配给一个帐号。CREATE USER 中的 ACCOUNT 选项用于分配帐号。用户可被分配给多个帐号。

While creating a new user, the user may be assigned to an account. ACCOUNT option in CREATE USER is used to assign the account. A user may be assigned to multiple accounts.

Syntax

以下是使用帐户选项的 CREATE USER 的语法。

Following is the syntax for CREATE USER with account option.

CREATE USER username
PERM = n BYTES
PASSWORD = password
ACCOUNT = accountid

Example

以下示例创建用户 TD02,并将帐户分配为 IT 和 Admin。

The following example creates the user TD02 and assigns the account as IT and Admin.

CREATE USER TD02
AS
PERMANENT = 1000000 BYTES
PASSWORD = abc$123
TEMPORARY = 1000000 BYTES
SPOOL = 1000000 BYTES
ACCOUNT = (‘IT’,’Admin’);

用户可以在登录 Teradata 系统时或使用 SET SESSION 命令登录系统后指定帐户 ID。

The user can specify the account id while logging into Teradata system or after being logged into the system using SET SESSION command.

.LOGON username, passowrd,accountid
OR
SET SESSION ACCOUNT = accountid

Grant Privileges

GRANT 命令用于将数据库对象的多个权限分配给用户或数据库。

GRANT command is used to assign one or more privileges on the database objects to the user or database.

Syntax

以下是 GRANT 命令的语法。

Following is the syntax of the GRANT command.

GRANT privileges ON objectname TO username;

权限可以是 INSERT、SELECT、UPDATE 和 REFERENCES。

Privileges can be INSERT, SELECT, UPDATE, REFERENCES.

Example

以下是一个 GRANT 语句的示例。

Following is an example of GRANT statement.

GRANT SELECT,INSERT,UPDATE ON Employee TO TD01;

Revoke Privileges

REVOKE 命令从用户或数据库中删除权限。REVOKE 命令只能删除显式权限。

REVOKE command removes the privileges from the users or databases. The REVOKE command can only remove explicit privileges.

Syntax

以下是 REVOKE 命令的基本语法。

Following is the basic syntax for REVOKE command.

REVOKE [ALL|privileges] ON objectname FROM username;

Example

以下是一个 REVOKE 命令的示例。

Following is an example of REVOKE command.

REVOKE INSERT,SELECT ON Employee FROM TD01;

Teradata - Performance Tuning

本章讨论 Teradata 中的性能调优过程。

This chapter discusses the procedure of performance tuning in Teradata.

Explain

性能调优的第一步是针对查询使用 EXPLAIN。EXPLAIN 计划提供优化器执行查询的详细信息。在 EXPLAIN 计划中,检查关键字(如置信级别、已使用的联接策略、假脱机文件大小、重新分配等)。

The first step in performance tuning is the use of EXPLAIN on your query. EXPLAIN plan gives the details of how optimizer will execute your query. In the Explain plan, check for the keywords like confidence level, join strategy used, spool file size, redistribution, etc.

Collect Statistics

优化器使用数据人口统计数据提出有效的执行策略。COLLECT STATISTICS 命令用于收集表的数据人口统计数据。确保收集的列上的统计数据是最新的。

Optimizer uses Data demographics to come up with effective execution strategy. COLLECT STATISTICS command is used to collect data demographics of the table. Make sure that the statistics collected on the columns are up to date.

  1. Collect statistics on the columns that are used in WHERE clause and on the columns used in the joining condition.

  2. Collect statistics on the Unique Primary Index columns.

  3. Collect statistics on Non Unique Secondary Index columns. Optimizer will decide if it can use NUSI or Full Table Scan.

  4. Collect statistics on the Join Index though the statistics on base table is collected.

  5. Collect statistics on the partitioning columns.

Data Types

确保使用适当的数据类型。这将避免使用超出所需的大量存储空间。

Make sure that proper data types are used. This will avoid the use of excessive storage than required.

Conversion

确保联接条件所用列的数据类型兼容,以避免显式数据转换。

Make sure that the data types of the columns used in join condition are compatible to avoid explicit data conversions.

Sort

除非需要,否则删除不必要的 ORDER BY 子句。

Remove unnecessary ORDER BY clauses unless required.

Spool Space Issue

如果查询超过用户为每个 AMP 分配的卷轴空间限制,则会生成卷轴空间错误。验证执行计划并确定占用更多卷轴空间的步骤。可以将这些中间查询拆分并分别放置以构建临时表。

Spool space error is generated if the query exceeds per AMP spool space limit for that user. Verify the explain plan and identify the step that consumes more spool space. These intermediate queries can be split and put as separately to build temporary tables.

Primary Index

确保表的主索引正确定义。主索引列应均匀分布数据,并且应经常用于访问数据。

Make sure that the Primary Index is correctly defined for the table. The primary index column should evenly distribute the data and should be frequently used to access the data.

SET Table

如果您定义 SET 表,则优化器会检查对于所插入的每条记录,是否存在重复记录。若要删除重复检查条件,您可以为表定义唯一二级索引。

If you define a SET table, then the optimizer will check if the record is duplicate for each and every record inserted. To remove the duplicate check condition, you can define Unique Secondary Index for the table.

UPDATE on Large Table

更新大型表会很耗时。您可以删除记录并插入包含修改后的行的新记录,而不是更新表。

Updating the large table will be time consuming. Instead of updating the table, you can delete the records and insert the records with modified rows.

Dropping Temporary Tables

如果不再需要,请删除临时表(暂存表)和临时变量。这将释放永久空间和卷轴空间。

Drop the temporary tables (staging tables) and volatiles if they are no longer needed. This will free up permanent space and spool space.

MULTISET Table

如果您确信输入记录没有重复记录,则可以将目标表定义为 MULTISET 表,以避免使用 SET 表执行重复行检查。

If you are sure that the input records will not have duplicate records, then you can define the target table as MULTISET table to avoid the duplicate row check used by SET table.

Teradata - FastLoad

FastLoad 实用工具用于将数据加载到空表中。因为它不使用瞬态日志,所以数据可以快速加载。即使目标表是 MULTISET 表,它也不会加载重复行。

FastLoad utility is used to load data into empty tables. Since it does not use transient journals, data can be loaded quickly. It doesn’t load duplicate rows even if the target table is a MULTISET table.

Limitation

目标表不应具有辅助索引、联接索引和外键引用。

Target table should not have secondary index, join index and foreign key reference.

How FastLoad Works

FastLoad 分两阶段执行。

FastLoad is executed in two phases.

Phase 1

  1. The Parsing engines read the records from the input file and sends a block to each AMP.

  2. Each AMP stores the blocks of records.

  3. Then AMPs hash each record and redistribute them to the correct AMP.

  4. At the end of Phase 1, each AMP has its rows but they are not in row hash sequence.

Phase 2

  1. Phase 2 starts when FastLoad receives the END LOADING statement.

  2. Each AMP sorts the records on row hash and writes them to the disk.

  3. Locks on the target table is released and the error tables are dropped.

Example

使用以下记录创建一个文本文件,并将该文件命名为 employee.txt。

Create a text file with the following records and name the file as employee.txt.

101,Mike,James,1980-01-05,2010-03-01,1
102,Robert,Williams,1983-03-05,2010-09-01,1
103,Peter,Paul,1983-04-01,2009-02-12,2
104,Alex,Stuart,1984-11-06,2014-01-01,2
105,Robert,James,1984-12-01,2015-03-09,3

下面是一个示例 FastLoad 脚本,用于将上述文件加载到 Employee_Stg 表中。

Following is a sample FastLoad script to load the above file into Employee_Stg table.

LOGON 192.168.1.102/dbc,dbc;
   DATABASE tduser;
   BEGIN LOADING tduser.Employee_Stg
      ERRORFILES Employee_ET, Employee_UV
      CHECKPOINT 10;
      SET RECORD VARTEXT ",";
      DEFINE in_EmployeeNo (VARCHAR(10)),
         in_FirstName (VARCHAR(30)),
         in_LastName (VARCHAR(30)),
         in_BirthDate (VARCHAR(10)),
         in_JoinedDate (VARCHAR(10)),
         in_DepartmentNo (VARCHAR(02)),
         FILE = employee.txt;
      INSERT INTO Employee_Stg (
         EmployeeNo,
         FirstName,
         LastName,
         BirthDate,
         JoinedDate,
         DepartmentNo
      )
      VALUES (
         :in_EmployeeNo,
         :in_FirstName,
         :in_LastName,
         :in_BirthDate (FORMAT 'YYYY-MM-DD'),
         :in_JoinedDate (FORMAT 'YYYY-MM-DD'),
         :in_DepartmentNo
      );
   END LOADING;
LOGOFF;

Executing a FastLoad Script

在创建 input 文件 employee.txt 并且 FastLoad 脚本被命名为 EmployeeLoad.fl 后,您可以在 UNIX 和 Windows 中使用以下命令运行 FastLoad 脚本。

Once the input file employee.txt is created and the FastLoad script is named as EmployeeLoad.fl, you can run the FastLoad script using the following command in UNIX and Windows.

FastLoad < EmployeeLoad.fl;

执行上述命令后,FastLoad 脚本将运行并生成日志。在日志中,您可以看到 FastLoad 处理的记录数量和状态码。

Once the above command is executed, the FastLoad script will run and produce the log. In the log, you can see the number of records processed by FastLoad and status code.

**** 03:19:14 END LOADING COMPLETE
   Total Records Read              =  5
   Total Error Table 1             =  0  ---- Table has been dropped
   Total Error Table 2             =  0  ---- Table has been dropped
   Total Inserts Applied           =  5
   Total Duplicate Rows            =  0
   Start:   Fri Jan  8 03:19:13 2016
   End  :   Fri Jan  8 03:19:14 2016
**** 03:19:14 Application Phase statistics:
              Elapsed time: 00:00:01 (in hh:mm:ss)
0008  LOGOFF;
**** 03:19:15 Logging off all sessions

FastLoad Terms

以下是 FastLoad 脚本中常用的术语列表。

Following is the list of common terms used in FastLoad script.

  1. LOGON − Logs into Teradata and initiates one or more sessions.

  2. DATABASE − Sets the default database.

  3. BEGIN LOADING − Identifies the table to be loaded.

  4. ERRORFILES − Identifies the 2 error tables that needs to be created/updated.

  5. CHECKPOINT − Defines when to take checkpoint.

  6. SET RECORD − Specifies if the input file format is formatted, binary, text or unformatted.

  7. DEFINE − Defines the input file layout.

  8. FILE − Specifies the input file name and path.

  9. INSERT − Inserts the records from the input file into the target table.

  10. END LOADING − Initiates phase 2 of the FastLoad. Distributes the records into the target table.

  11. LOGOFF − Ends all sessions and terminates FastLoad.

Teradata - MultiLoad

MultiLoad 能够一次加载多张表,它还可以执行不同类型的任务,比如 INSERT、DELETE、UPDATE 和 UPSERT。它一次最多可以加载 5 张表,并且在一个脚本中最多可以执行 20 个 DML 操作。MultiLoad 不需要目标表。

MultiLoad can load multiple tables at a time and it can also perform different types of tasks such as INSERT, DELETE, UPDATE and UPSERT. It can load up to 5 tables at a time and perform up to 20 DML operations in a script. The target table is not required for MultiLoad.

MultiLoad 支持两种模式 −

MultiLoad supports two modes −

  1. IMPORT

  2. DELETE

MultiLoad 除目标表外,还需要一个工作表、一个日志表和两个错误表。

MultiLoad requires a work table, a log table and two error tables in addition to the target table.

  1. Log Table − Used to maintain the checkpoints taken during load which will be used for restart.

  2. Error Tables − These tables are inserted during load when an error occurs. First error table stores conversion errors whereas second error table stores duplicate records.

  3. Log Table − Maintains the results from each phase of MultiLoad for restart purpose.

  4. Work table − MultiLoad script creates one work table per target table. Work table is used to keep DML tasks and the input data.

Limitation

MultiLoad 有一些限制。

MultiLoad has some limitations.

  1. Unique Secondary Index not supported on target table.

  2. Referential integrity not supported.

  3. Triggers not supported.

How MultiLoad Works

MultiLoad 导入有五个阶段 −

MultiLoad import has five phases −

  1. Phase 1 − Preliminary Phase – Performs basic setup activities.

  2. Phase 2 − DML Transaction Phase – Verifies the syntax of DML statements and brings them to Teradata system.

  3. Phase 3 − Acquisition Phase – Brings the input data into work tables and locks the table.

  4. Phase 4 − Application Phase – Applies all DML operations.

  5. Phase 5 − Cleanup Phase – Releases the table lock.

MultiLoad 脚本中涉及的步骤 −

The steps involved in a MultiLoad script are −

  1. Step 1 − Set up the log table.

  2. Step 2 − Log on to Teradata.

  3. Step 3 − Specify the Target, Work and Error tables.

  4. Step 4 − Define INPUT file layout.

  5. Step 5 − Define the DML queries.

  6. Step 6 − Name the IMPORT file.

  7. Step 7 − Specify the LAYOUT to be used.

  8. Step 8 − Initiate the Load.

  9. Step 9 − Finish the load and terminate the sessions.

Example

使用以下记录创建一个文本文件,并将该文件命名为 employee.txt。

Create a text file with the following records and name the file as employee.txt.

101,Mike,James,1980-01-05,2010-03-01,1
102,Robert,Williams,1983-03-05,2010-09-01,1
103,Peter,Paul,1983-04-01,2009-02-12,2
104,Alex,Stuart,1984-11-06,2014-01-01,2
105,Robert,James,1984-12-01,2015-03-09,3

以下示例为一个 MultiLoad 脚本,该脚本读取 employee 表中的记录并将记录加载到 Employee_Stg 表中。

The following example is a MultiLoad script that reads records from employee table and loads into Employee_Stg table.

.LOGTABLE tduser.Employee_log;
.LOGON 192.168.1.102/dbc,dbc;
   .BEGIN MLOAD TABLES Employee_Stg;
      .LAYOUT Employee;
      .FIELD in_EmployeeNo * VARCHAR(10);
      .FIELD in_FirstName * VARCHAR(30);
      .FIELD in_LastName * VARCHAR(30);
      .FIELD in_BirthDate * VARCHAR(10);
      .FIELD in_JoinedDate * VARCHAR(10);
      .FIELD in_DepartmentNo * VARCHAR(02);

      .DML LABEL EmpLabel;
      INSERT INTO Employee_Stg (
         EmployeeNo,
         FirstName,
         LastName,
         BirthDate,
         JoinedDate,
         DepartmentNo
      )
      VALUES (
         :in_EmployeeNo,
         :in_FirstName,
         :in_Lastname,
         :in_BirthDate,
         :in_JoinedDate,
         :in_DepartmentNo
      );
      .IMPORT INFILE employee.txt
      FORMAT VARTEXT ','
      LAYOUT Employee
      APPLY EmpLabel;
   .END MLOAD;
LOGOFF;

Executing a MultiLoad Script

创建输入文件 employee.txt 并将 MultiLoad 脚本命名为 EmployeeLoad.ml 后,可使用 UNIX 和 Windows 中的以下命令运行 MultiLoad 脚本。

Once the input file employee.txt is created and the multiload script is named as EmployeeLoad.ml, then you can run the Multiload script using the following command in UNIX and Windows.

Multiload < EmployeeLoad.ml;

Teradata - FastExport

FastExport 实用程序用于将数据从 Teradata 表导出到平面文件中。它还可以生成报告格式的数据。可以使用 Join 从一个或多个表中提取数据。由于 FastExport 以 64K 块导出数据,因此它对于提取大量数据非常有用。

FastExport utility is used to export data from Teradata tables into flat files. It can also generate the data in report format. Data can be extracted from one or more tables using Join. Since FastExport exports the data in 64K blocks, it is useful for extracting large volume of data.

Example

考虑以下 Employee 表。

Consider the following Employee table.

EmployeeNo

FirstName

LastName

BirthDate

101

Mike

James

1/5/1980

104

Alex

Stuart

11/6/1984

102

Robert

Williams

3/5/1983

105

Robert

James

12/1/1984

103

Peter

Paul

4/1/1983

以下是 FastExport 脚本的一个示例。它从 employee 表中导出数据,并写入到文件 employeedata.txt 中。

Following is an example of a FastExport script. It exports data from employee table and writes into a file employeedata.txt.

.LOGTABLE tduser.employee_log;
.LOGON 192.168.1.102/dbc,dbc;
   DATABASE tduser;
   .BEGIN EXPORT SESSIONS 2;
      .EXPORT OUTFILE employeedata.txt
      MODE RECORD FORMAT TEXT;
      SELECT CAST(EmployeeNo AS CHAR(10)),
         CAST(FirstName AS CHAR(15)),
         CAST(LastName AS CHAR(15)),
         CAST(BirthDate AS CHAR(10))
      FROM
      Employee;
   .END EXPORT;
.LOGOFF;

Executing a FastExport Script

一旦脚本编写完成并命名为 employee.fx,你可以使用以下命令执行脚本。

Once the script is written and named as employee.fx, you can use the following command to execute the script.

fexp < employee.fx

执行上述命令之后,将在文件 employeedata.txt 中收到以下输出。

After executing the above command, you will receive the following output in the file employeedata.txt.

103       Peter          Paul           1983-04-01
101       Mike           James          1980-01-05
102       Robert         Williams       1983-03-05
105       Robert         James          1984-12-01
104       Alex           Stuart         1984-11-06

FastExport Terms

以下是 FastExport 脚本中常用的术语列表。

Following is the list of terms commonly used in FastExport script.

  1. LOGTABLE − Specifies the log table for restart purpose.

  2. LOGON − Logs into Teradata and initiates one or more sessions.

  3. DATABASE − Sets the default database.

  4. BEGIN EXPORT − Indicates the beginning of the export.

  5. EXPORT − Specifies the target file and the export format.

  6. SELECT − Specifies the select query to export data.

  7. END EXPORT − Specifies the end of FastExport.

  8. LOGOFF − Ends all sessions and terminates FastExport.

Teradata - BTEQ

BTEQ 实用工具是 Teradata 中的一个强大实用工具,可以在批处理和交互模式下使用。它可用于运行任何 DDL 语句、DML 语句、创建宏和存储过程。BTEQ 可用于从平面文件导入数据到 Teradata 表,还可用于将数据从表中提取到文件或报告。

BTEQ utility is a powerful utility in Teradata that can be used in both batch and interactive mode. It can be used to run any DDL statement, DML statement, create Macros and stored procedures. BTEQ can be used to import data into Teradata tables from flat file and it can also be used to extract data from tables into files or reports.

BTEQ Terms

以下是 BTEQ 脚本中常用的术语列表。

Following is the list of terms commonly used in BTEQ scripts.

  1. LOGON − Used to log into Teradata system.

  2. ACTIVITYCOUNT − Returns the number of rows affected by the previous query.

  3. ERRORCODE − Returns the status code of the previous query.

  4. DATABASE − Sets the default database.

  5. LABEL − Assigns a label to a set of SQL commands.

  6. RUN FILE − Executes the query contained in a file.

  7. GOTO − Transfers control to a label.

  8. LOGOFF − Logs off from database and terminates all sessions.

  9. IMPORT − Specifies the input file path.

  10. EXPORT − Specifies the output file path and initiates the export.

Example

以下是一个 BTEQ 脚本示例。

Following is a sample BTEQ script.

.LOGON 192.168.1.102/dbc,dbc;
   DATABASE tduser;

   CREATE TABLE employee_bkup (
      EmployeeNo INTEGER,
      FirstName CHAR(30),
      LastName CHAR(30),
      DepartmentNo SMALLINT,
      NetPay INTEGER
   )
   Unique Primary Index(EmployeeNo);

   .IF ERRORCODE <> 0 THEN .EXIT ERRORCODE;

   SELECT * FROM
   Employee
   Sample 1;
   .IF ACTIVITYCOUNT <> 0 THEN .GOTO InsertEmployee;

   DROP TABLE employee_bkup;

   .IF ERRORCODE <> 0 THEN .EXIT ERRORCODE;

   .LABEL InsertEmployee
   INSERT INTO employee_bkup
   SELECT a.EmployeeNo,
      a.FirstName,
      a.LastName,
      a.DepartmentNo,
      b.NetPay
   FROM
   Employee a INNER JOIN Salary b
   ON (a.EmployeeNo = b.EmployeeNo);

   .IF ERRORCODE <> 0 THEN .EXIT ERRORCODE;
.LOGOFF;

上面的脚本执行以下任务。

The above script performs the following tasks.

  1. Logs into Teradata System.

  2. Sets the Default Database.

  3. Creates a table called employee_bkup.

  4. Selects one record from Employee table to check if the table has any records.

  5. Drops employee_bkup table, if the table is empty.

  6. Transfers the control to a Label InsertEmployee which inserts records into employee_bkup table

  7. Checks ERRORCODE to make sure that the statement is successful, following each SQL statement.

  8. ACTIVITYCOUNT returns number of records selected/impacted by the previous SQL query.