Sqoop Quick Guide
Sqoop - Introduction
Traditional application management systems, in which applications interact with relational databases through an RDBMS, are one of the sources of Big Data. The large volumes of data generated this way are stored in relational database servers, organized in relational database structures.
When the Hadoop ecosystem's Big Data storage and analysis tools, such as MapReduce, Hive, HBase, Cassandra, and Pig, came into the picture, they needed a tool that could interact with relational database servers to import the Big Data residing in them and export data back. Sqoop fills this role in the Hadoop ecosystem, providing a practical bridge between relational database servers and Hadoop's HDFS.
Sqoop: “SQL to Hadoop and Hadoop to SQL”
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It imports data from relational databases such as MySQL and Oracle into HDFS, and exports data from HDFS back to relational databases. It is a project of the Apache Software Foundation.
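As a rough illustration of the two transfer directions, the following Python sketch only assembles (it does not run) Sqoop 1 command lines. The options `--connect`, `--username`, `-P`, `--table`, `--target-dir`, and `--export-dir` are standard Sqoop 1 arguments; the helper names, JDBC URL, user name, table names, and HDFS paths are hypothetical placeholders.

```python
import shlex
from typing import List


def build_import_cmd(jdbc_url: str, user: str, table: str, target_dir: str) -> List[str]:
    """Hypothetical helper: argv for importing one RDBMS table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URL of the source database
        "--username", user,
        "-P",                         # prompt for the password on the console
        "--table", table,             # table to copy into HDFS
        "--target-dir", target_dir,   # HDFS directory that receives the files
    ]


def build_export_cmd(jdbc_url: str, user: str, table: str, export_dir: str) -> List[str]:
    """Hypothetical helper: argv for exporting HDFS files back into an RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,        # JDBC URL of the destination database
        "--username", user,
        "-P",
        "--table", table,             # destination table (must already exist)
        "--export-dir", export_dir,   # HDFS directory holding the input files
    ]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/shop"   # hypothetical host and database
    print(shlex.join(build_import_cmd(url, "etl", "orders", "/data/orders")))
    print(shlex.join(build_export_cmd(url, "etl", "orders_summary", "/data/summary")))
```

Building the argument list instead of a single string keeps each value intact if it is later passed to something like `subprocess.run`; `shlex.join` is used here only to print a readable command.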
Sqoop Import
The import tool imports individual tables from an RDBMS into HDFS. Each row of a table is treated as a single record in HDFS. Records are stored either as text data in text files or as binary data in Avro data files or SequenceFiles.
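The sketch below is a conceptual model of a text-file import, not Sqoop's implementation (Sqoop itself runs MapReduce tasks that read the table over JDBC). An in-memory SQLite table stands in for the source RDBMS table, and each row is written as one comma-delimited line, comma being Sqoop's default field delimiter for text output. The table, its columns, the sample rows, and the helper name are hypothetical. In Sqoop, binary output is chosen with `--as-avrodatafile` or `--as-sequencefile` instead of the default `--as-textfile`; the sketch covers only the text case.

```python
import sqlite3
from typing import Iterable, List, Tuple


def rows_to_text_records(rows: Iterable[Tuple], delimiter: str = ",") -> List[str]:
    """Model of a text-file import: one table row becomes one delimited line."""
    return [delimiter.join(str(value) for value in row) for row in rows]


if __name__ == "__main__":
    # An in-memory SQLite table stands in for the source RDBMS table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                     [(1, "alice", 50000), (2, "bob", 45000)])
    rows = conn.execute("SELECT id, name, salary FROM emp ORDER BY id").fetchall()
    for line in rows_to_text_records(rows):
        print(line)
    # 1,alice,50000
    # 2,bob,45000
```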
Sqoop Export
The export tool exports a set of files from HDFS back to an RDBMS. The files given to Sqoop as input contain records, which become rows in the target table. Sqoop reads these files and parses them into a set of records, splitting the fields of each record on a user-specified delimiter.
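To make the export steps concrete, here is a minimal Python model under stated assumptions: a list of tab-delimited strings stands in for the contents of an HDFS file, and an in-memory SQLite table stands in for the destination table, which, as with Sqoop, must already exist. It mirrors the fact that Sqoop's default export mode turns input records into INSERT statements, but it is not Sqoop's code; the helper names, table, and sample lines are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def parse_records(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[str, ...]]:
    """Split each non-empty input line into fields using the user-specified delimiter."""
    return [tuple(line.rstrip("\n").split(delimiter)) for line in lines if line.strip()]


def export_records(conn: sqlite3.Connection, table: str,
                   records: List[Tuple[str, ...]]) -> int:
    """Insert parsed records into an existing table, running the INSERT once per record."""
    if not records:
        return 0
    placeholders = ", ".join("?" for _ in records[0])
    # The table name is interpolated directly, so it must come from trusted input.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", records)
    conn.commit()
    return len(records)


if __name__ == "__main__":
    # Hypothetical lines as they might appear in an HDFS file, here tab-delimited.
    hdfs_lines = ["1\talice\t50000\n", "2\tbob\t45000\n"]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, salary INTEGER)")
    count = export_records(conn, "emp", parse_records(hdfs_lines, delimiter="\t"))
    print(count, conn.execute("SELECT * FROM emp ORDER BY id").fetchall())
```

The parser keeps every field as a string; in this model SQLite's column type affinity converts the numeric strings to integers on insert, whereas Sqoop relies on the destination table's column types.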