A Concise Sqoop Tutorial

Sqoop - Job

This chapter describes how to create and maintain Sqoop jobs. A Sqoop job creates and saves an import or export command, together with the parameters needed to identify and recall it later. Re-executing a saved job is especially useful for incremental imports, which bring only the new or updated rows of an RDBMS table into HDFS.
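To make the incremental-import use case concrete, here is a minimal sketch of a saved job that runs an append-mode incremental import. It wraps the command in a small shell function. The function name create_incremental_job, the job name incjob, and the check column id are hypothetical, and the sketch assumes the employee table has an integer id column that only grows as rows are added. The --incremental, --check-column, and --last-value options are standard arguments of the Sqoop import tool.

```shell
# Minimal sketch: create a saved job that performs an append-mode incremental import.
# create_incremental_job is a hypothetical helper; the job name and check column are passed in.
# Assumes the check column is an integer that only increases as rows are inserted.
create_incremental_job() {
  local job_name="$1" check_column="$2"
  sqoop job --create "$job_name" \
    -- import \
    --connect jdbc:mysql://localhost/db \
    --username root \
    --table employee \
    --incremental append \
    --check-column "$check_column" \
    --last-value 0 \
    -m 1
}

# Usage (requires Sqoop on the PATH):
#   create_incremental_job incjob id
#   sqoop job --exec incjob   # first run imports rows with id > 0
#   sqoop job --exec incjob   # later runs import only rows added since the previous run
```

When an incremental import runs from a saved job, Sqoop stores the new incremental.last.value in the job, so each later --exec continues from where the previous run stopped.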

Syntax

The following is the syntax of the Sqoop job tool, which is used to create, list, inspect, and execute saved jobs. The sqoop job and sqoop-job forms are equivalent.

$ sqoop job (generic-args) (job-args)
   [-- [subtool-name] (subtool-args)]

$ sqoop-job (generic-args) (job-args)
   [-- [subtool-name] (subtool-args)]

Create Job (--create)

Here we create a job named myjob that imports table data from an RDBMS table into HDFS. The following command creates a job that imports the employee table from the db database into HDFS. Everything after the standalone -- (note the space before import) is the tool command that the job saves.

$ sqoop job --create myjob \
-- import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee -m 1

Verify Job (--list)

The ‘--list’ argument lists the saved jobs, which lets you verify that a job was created. The following command lists the saved Sqoop jobs.

$ sqoop job --list

It shows the list of saved jobs.

Available jobs:
   myjob
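If you script around saved jobs, the --list output can be checked before creating or executing a job. The sketch below assumes the output format shown above (an "Available jobs:" header followed by one job name per line); the function name job_exists is hypothetical.

```shell
# Minimal sketch: succeed (exit status 0) if the named job appears in `sqoop job --list`.
# Assumes the output format shown above: an "Available jobs:" header, then one job per line.
# Anything Sqoop prints to stderr (such as log messages) is discarded.
job_exists() {
  local name="$1"
  sqoop job --list 2>/dev/null |
    awk -v n="$name" '
      seen_header && $1 == n { found = 1 }   # compare the first field of each job line
      /^Available jobs:/ { seen_header = 1 } # start matching after the header line
      END { exit !found }
    '
}

# Usage:
#   job_exists myjob && echo "myjob is saved"
```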

Inspect Job (--show)

The ‘--show’ argument inspects a particular saved job and displays its details. The following command and sample output inspect the job called myjob.

$ sqoop job --show myjob

It shows the tool that myjob runs and the options saved with it (the sample below is abridged).

Job: myjob
 Tool: import Options:
 ----------------------------
 direct.import = true
 codegen.input.delimiters.record = 0
 hdfs.append.dir = false
 db.table = employee
 ...
 incremental.last.value = 1206
 ...
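Because the --show output is a list of key = value lines, a single saved option, such as incremental.last.value for an incremental job, can be pulled out with awk. This is a sketch that assumes the layout shown above; job_property is a hypothetical helper name, and a value that itself contains " = " would be truncated.

```shell
# Minimal sketch: print one property from `sqoop job --show <job>` output.
# Assumes each property is printed as "key = value", possibly indented, as shown above.
job_property() {
  local job="$1" key="$2"
  sqoop job --show "$job" 2>/dev/null |
    awk -F ' = ' -v k="$key" '
      { name = $1; sub(/^[ \t]+/, "", name) }  # strip leading indentation from the key
      name == k { print $2; exit }             # print the value of the first match
    '
}

# Usage, against the sample output above:
#   job_property myjob incremental.last.value   # prints 1206
```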

Execute Job (--exec)

The ‘--exec’ option executes a saved job. The following command executes the saved job called myjob.

$ sqoop job --exec myjob

It produces output similar to the following.

10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
...
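Arguments supplied after a -- at exec time override the ones saved in the job, for example to give a different user and prompt for the password with -P (this override behavior is described in the sqoop-job section of the Sqoop User Guide). The hypothetical run_job helper below sketches that pattern.

```shell
# Minimal sketch: execute a saved job, optionally overriding saved arguments.
# run_job is a hypothetical helper; anything after the job name is passed to Sqoop after "--".
run_job() {
  local job="$1"
  shift
  if [ "$#" -gt 0 ]; then
    sqoop job --exec "$job" -- "$@"
  else
    sqoop job --exec "$job"
  fi
}

# Usage:
#   run_job myjob                          # run with the saved arguments
#   run_job myjob --username someuser -P   # override the user and prompt for a password
```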