Hive 简明教程
Hive - Create Table
本章说明了如何创建表以及如何向其中插入数据。在 HIVE 中创建表的约定与使用 SQL 创建表非常相似。
This chapter explains how to create a table and how to insert data into it. The conventions of creating a table in HIVE is quite similar to creating a table using SQL.
Create Table Statement
Create Table 是一款用于在 Hive 中创建表的语句。语法和示例如下:
Create Table is a statement used to create a table in Hive. The syntax and example are as follows:
Syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
Example
让我们假设您需要使用 CREATE TABLE 语句创建一个名为 employee 的表。下表列出了员工表中的字段及其数据类型:
Let us assume you need to create a table named employee using CREATE TABLE statement. The following table lists the fields and their data types in employee table:
Sr.No |
Field Name |
Data Type |
1 |
Eid |
int |
2 |
Name |
String |
3 |
Salary |
Float |
4 |
Designation |
string |
以下数据是一条注释,行格式字段(例如字段终止符、行终止符和存储文件类型)。
The following data is a Comment, Row formatted fields such as Field terminator, Lines terminator, and Stored File type.
COMMENT ‘Employee details’
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
STORED IN TEXT FILE
以下查询使用上述数据创建名为 employee 的表。
The following query creates a table named employee using the above data.
hive> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT ‘Employee details’
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;
如果你添加了 IF NOT EXISTS 选项,如果表已经存在,Hive 会忽略该声明。
If you add the option IF NOT EXISTS, Hive ignores the statement in case the table already exists.
在成功创建表后,你可以看到以下响应:
On successful creation of table, you get to see the following response:
OK
Time taken: 5.905 seconds
hive>
JDBC Program
下面给出了创建表的 JDBC 程序示例。
The JDBC program to create a table is given example.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveCreateTable {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
// Register driver and create driver instance
Class.forName(driverName);
// get connection
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("CREATE TABLE IF NOT EXISTS "
+" employee ( eid int, name String, "
+" salary String, destignation String)"
+" COMMENT ‘Employee details’"
+" ROW FORMAT DELIMITED"
+" FIELDS TERMINATED BY ‘\t’"
+" LINES TERMINATED BY ‘\n’"
+" STORED AS TEXTFILE;");
System.out.println(“ Table employee created.”);
con.close();
}
}
将程序保存到名为 HiveCreateDb.java 的文件中。使用以下命令编译并执行此程序。
Save the program in a file named HiveCreateDb.java. The following commands are used to compile and execute this program.
$ javac HiveCreateDb.java
$ java HiveCreateDb
Load Data Statement
通常,在 SQL 中创建表之后,我们可以使用 Insert 语句插入数据。但在 Hive 中,我们可以使用 LOAD DATA 语句插入数据。
Generally, after creating a table in SQL, we can insert data using the Insert statement. But in Hive, we can insert data using the LOAD DATA statement.
在将数据插入 Hive 时,最好使用 LOAD DATA 存储大量记录。有两种方法可以加载数据:一种是来自本地文件系统,另一种是来自 Hadoop 文件系统。
While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from local file system and second is from Hadoop file system.
Syntax
加载数据的语法如下:
The syntax for load data is as follows:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
-
LOCAL is identifier to specify the local path. It is optional.
-
OVERWRITE is optional to overwrite the data in the table.
-
PARTITION is optional.
Example
我们将向表中插入以下数据。它是一个文本文件,名为 sample.txt ,位于 /home/user 目录中。
We will insert the following data into the table. It is a text file named sample.txt in /home/user directory.
1201 Gopal 45000 Technical manager
1202 Manisha 45000 Proof reader
1203 Masthanvali 40000 Technical writer
1204 Kiran 40000 Hr Admin
1205 Kranthi 30000 Op Admin
以下查询将给定的文本加载到表中。
The following query loads the given text into the table.
hive> LOAD DATA LOCAL INPATH '/home/user/sample.txt'
OVERWRITE INTO TABLE employee;
在成功下载后,你可以看到以下响应:
On successful download, you get to see the following response:
OK
Time taken: 15.905 seconds
hive>
JDBC Program
下面给出了将给定数据加载到表中的 JDBC 程序。
Given below is the JDBC program to load given data into the table.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveLoadData {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
// Register driver and create driver instance
Class.forName(driverName);
// get connection
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("LOAD DATA LOCAL INPATH '/home/user/sample.txt'" + "OVERWRITE INTO TABLE employee;");
System.out.println("Load Data into employee successful");
con.close();
}
}
将程序保存到名为 HiveLoadData.java 的文件中。使用以下命令编译并执行此程序。
Save the program in a file named HiveLoadData.java. Use the following commands to compile and execute this program.
$ javac HiveLoadData.java
$ java HiveLoadData