A Concise Hive Tutorial

Hive - Partitioning

Hive organizes tables into partitions. Partitioning divides a table into related parts based on the values of partition columns such as date, city, or department. This makes it easy to query just a portion of the data.

Tables or partitions can be further subdivided into buckets. Buckets give the data extra structure that can be used for more efficient querying. Bucketing assigns each row to a bucket based on the hash of the value of one or more of the table's columns.
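To make the hash-based assignment concrete, here is a minimal Python sketch that spreads rows across a fixed number of buckets. It is not Hive's implementation. It assumes the bucketing column is a 32-bit integer whose hash is the value itself, which roughly follows Hive's classic scheme for INT columns. Newer Hive releases use a different hash function, so real bucket numbers may differ. The helper names `bucket_for` and `assign_buckets` are hypothetical.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """Return the bucket index for a 32-bit integer key (simplified model)."""
    # Mask off the sign bit so negative keys still map to a non-negative bucket.
    return (key & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, column, num_buckets):
    """Group rows (dicts) into num_buckets lists by hashing one column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


employees = [
    {"id": 1, "name": "gopal"},
    {"id": 2, "name": "kiran"},
    {"id": 3, "name": "kaleel"},
    {"id": 4, "name": "Prasanth"},
]

buckets = assign_buckets(employees, "id", 2)
for i, bucket in enumerate(buckets):
    print(i, [row["id"] for row in bucket])
# 0 [2, 4]
# 1 [1, 3]
```

The bucket index depends only on the column value. As a result, rows with equal keys always end up in the same bucket.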

For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (year of joining). Suppose you need the details of all employees who joined in 2012. Without partitioning, the query has to scan the whole table for the required information. Suppose instead that the employee data is partitioned by year, with each year stored in a separate file. The query then reads only the 2012 data, which reduces query processing time. The following example shows how to partition a file and its data:

The following file contains the employeedata table:

/tab1/employeedata/file1

id, name, dept, yoj
1, gopal, TP, 2012
2, kiran, HR, 2012
3, kaleel, SC, 2013
4, Prasanth, SC, 2013

The data above is partitioned by year into two files:

/tab1/employeedata/2012/file2

1, gopal, TP, 2012
2, kiran, HR, 2012

/tab1/employeedata/2013/file3

3, kaleel, SC, 2013
4, Prasanth, SC, 2013
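The split above can be modelled in a few lines of Python. This is only a sketch. A dict keyed by directory path stands in for the file system, and the function names are hypothetical. It also shows why partitioning helps: a query for 2012 looks up one partition instead of scanning every row. Hive itself normally names partition directories in the form year=2012. The paths here simply follow the example above.

```python
from collections import defaultdict

# Rows from /tab1/employeedata/file1: (id, name, dept, yoj)
rows = [
    (1, "gopal", "TP", 2012),
    (2, "kiran", "HR", 2012),
    (3, "kaleel", "SC", 2013),
    (4, "Prasanth", "SC", 2013),
]


def partition_by_year(rows):
    """Map each partition directory path to the rows stored in it."""
    partitions = defaultdict(list)
    for row in rows:
        yoj = row[3]
        partitions[f"/tab1/employeedata/{yoj}"].append(row)
    return dict(partitions)


def employees_joined_in(partitions, year):
    """Read only the partition for `year` instead of scanning every row."""
    return partitions.get(f"/tab1/employeedata/{year}", [])


partitions = partition_by_year(rows)
print(sorted(partitions))  # ['/tab1/employeedata/2012', '/tab1/employeedata/2013']
print(employees_joined_in(partitions, 2012))
# [(1, 'gopal', 'TP', 2012), (2, 'kiran', 'HR', 2012)]
```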

Adding a Partition

Partitions can be added to a table with ALTER TABLE. Assume we have a table called employee with fields such as Id, Name, Salary, Designation, Dept, and yoj.

Syntax:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] [PARTITION partition_spec [LOCATION 'location2'] ...];

partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)

The following query adds a partition to the employee table. It assumes the table was created with year as its partition column:

hive> ALTER TABLE employee
    > ADD PARTITION (year='2012')
    > LOCATION '/2012/part2012';
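The semantics of ADD PARTITION can be illustrated with a small in-memory model. The `PartitionedTable` class below is hypothetical and tracks metadata only: each partition spec maps to a location. It reflects two behaviours of the statement. First, IF NOT EXISTS turns adding an existing partition into a no-op instead of an error. Second, a partition added without LOCATION gets a directory named column=value under the table's location. The warehouse path in the usage lines is an assumed example.

```python
class PartitionedTable:
    """Hypothetical metadata-only model of a partitioned Hive table."""

    def __init__(self, location, partition_columns):
        self.location = location
        self.partition_columns = list(partition_columns)
        self.partitions = {}  # partition key (tuple of (column, value)) -> location

    def _key(self, spec):
        # A full partition spec must name exactly the table's partition columns.
        if set(spec) != set(self.partition_columns):
            raise ValueError(f"{spec} does not match partition columns {self.partition_columns}")
        return tuple((col, str(spec[col])) for col in self.partition_columns)

    def add_partition(self, spec, location=None, if_not_exists=False):
        key = self._key(spec)
        if key in self.partitions:
            if if_not_exists:
                return  # IF NOT EXISTS: skip the existing partition silently
            raise ValueError(f"partition {spec} already exists")
        if location is None:
            # No LOCATION clause: use <table location>/column=value/...
            location = self.location + "".join(f"/{col}={val}" for col, val in key)
        self.partitions[key] = location


employee = PartitionedTable("/user/hive/warehouse/employee", ["year"])  # assumed path
employee.add_partition({"year": "2012"}, location="/2012/part2012")
employee.add_partition({"year": "2013"})
employee.add_partition({"year": "2012"}, if_not_exists=True)  # no-op
for key, location in employee.partitions.items():
    print(key, location)
```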

Renaming a Partition

The syntax for renaming a partition is as follows:

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

The following query renames a partition. The new partition spec must use the same partition column as the old one. Only the partition value changes:

hive> ALTER TABLE employee PARTITION (year='1203')
    > RENAME TO PARTITION (year='2013');
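The following hypothetical Python helper illustrates the rule that RENAME TO PARTITION changes a partition's value, not its column. It works on a set of partition specs. The model renames the year='1203' partition to year='2013' and rejects a target spec that names a different column such as Yoj. It also refuses to overwrite a partition that already exists. It tracks metadata only and does not model how Hive moves the underlying data.

```python
def rename_partition(partitions, partition_columns, old_spec, new_spec):
    """Rename one partition in a set of partition keys (hypothetical model)."""

    def key(spec):
        # Both specs must name exactly the table's partition columns, so a
        # rename can change a partition value but never the column itself.
        if set(spec) != set(partition_columns):
            raise ValueError(f"{spec} does not match partition columns {partition_columns}")
        return tuple((col, spec[col]) for col in partition_columns)

    old_key, new_key = key(old_spec), key(new_spec)
    if old_key not in partitions:
        raise KeyError(f"partition {old_spec} does not exist")
    if new_key in partitions:
        raise ValueError(f"partition {new_spec} already exists")
    partitions.remove(old_key)
    partitions.add(new_key)


partitions = {(("year", "1203"),), (("year", "2012"),)}
rename_partition(partitions, ["year"], {"year": "1203"}, {"year": "2013"})
print(sorted(partitions))  # [(('year', '2012'),), (('year', '2013'),)]

try:
    # Changing the column name (year -> Yoj) is rejected.
    rename_partition(partitions, ["year"], {"year": "2013"}, {"Yoj": "2013"})
except ValueError as err:
    print("rejected:", err)
```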

Dropping a Partition

The following syntax is used to drop one or more partitions:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;

The following query drops a partition:

hive> ALTER TABLE employee DROP IF EXISTS
    > PARTITION (year='1203');
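A hypothetical `drop_partitions` helper in the same style shows what IF EXISTS changes. Without it, naming a partition that does not exist is an error. With it, missing partitions are skipped. The sketch makes two simplifying assumptions: it only accepts full partition specs, and it checks all of them before dropping anything.

```python
def drop_partitions(partitions, partition_columns, specs, if_exists=False):
    """Drop full partition specs from a set of partition keys (hypothetical model)."""
    keys = []
    for spec in specs:
        # Only full specs are modelled here: every partition column must be named.
        if set(spec) != set(partition_columns):
            raise ValueError(f"{spec} does not match partition columns {partition_columns}")
        keys.append(tuple((col, spec[col]) for col in partition_columns))

    missing = [key for key in keys if key not in partitions]
    if missing and not if_exists:
        # Without IF EXISTS, a missing partition is an error and nothing is dropped.
        raise KeyError(f"partitions not found: {missing}")
    for key in keys:
        partitions.discard(key)  # discard() ignores keys that are already gone


partitions = {(("year", "1203"),), (("year", "2012"),)}
drop_partitions(partitions, ["year"], [{"year": "1203"}], if_exists=True)
drop_partitions(partitions, ["year"], [{"year": "1999"}], if_exists=True)  # missing: skipped
print(sorted(partitions))  # [(('year', '2012'),)]
```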