Use Sqoop - E-MapReduce - Alibaba Cloud Documentation Center

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datasets, such as relational databases.

Background information

Apache Sqoop is commonly used to transfer data in the following scenarios:

Import data from MySQL to HDFS
Import data from HDFS to MySQL
Import data from Hive to MySQL
Import data from MySQL to Hive
Import data from MySQL to OSS
Import data from OSS to MySQL
Use an SQL statement to import data

Import data from MySQL to HDFS

Run the following command on the master node of a cluster:

sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <hdfs-dir> --split-by <split-column> --check-column <col> --incremental <mode> --last-value <value>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
tablename	The name of the MySQL table.
hdfs-dir	The Hadoop Distributed File System (HDFS) directory to which you want to import data. Example: /user/hive/result.
split-column	Optional. The column that you want to use to split tasks. The default value is the primary key column.
col	Optional. The column that you want to use to determine the rows to be imported in incremental import scenarios.
mode	Optional. The incremental import mode. Valid values: append and lastmodified.
value	Optional. The maximum value of the column in the previous incremental import.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.

Import data from HDFS to MySQL

After you create MySQL tables that comply with the data structure of HDFS, run the following command on the master node of a cluster:

sqoop export --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hdfs-dir>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
tablename	The name of the MySQL table.
hdfs-dir	The Hadoop Distributed File System (HDFS) directory to which you want to import data. Example: /user/hive/result.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.

Import data from MySQL to Hive

Run the following command on the master node of a cluster:

sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --check-column <col> --incremental <mode> --last-value <value> --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
tablename	The name of the MySQL table.
col	Optional. The column that you want to use to determine the rows to be imported in incremental import scenarios.
mode	Optional. The incremental import mode. Valid values: append and lastmodified.
value	Optional. The maximum value of the column in the previous incremental import.
hdfs-dir	The Hadoop Distributed File System (HDFS) directory to which you want to import data. Example: /user/hive/result.
hive-tablename	The name of the Hive table.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.

Import data from Hive to MySQL

You can use the same command that is used to import data from HDFS to MySQL. You must specify the HDFS directory of the Hive tables from which you import data to MySQL. For more information, see Import data from HDFS to MySQL.

Import data from MySQL to OSS

Run the following command on the master node of a cluster:

sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
tablename	The name of the MySQL table.
oss-dir	The Object Storage Service (OSS) directory to which you want to import data. Example: oss://<AccessKeyID>:<AccessKey_Secret>@<BucketName>.oss-cn-hangzhou-internal.aliyuncs.com/result.
oss-tmpdir	The OSS directory to which you want to temporarily write data. If you set the mode parameter to append, you must specify this parameter. If you set the mode parameter to append, Apache Sqoop imports data to a temporary directory, and then renames the files in the temporary directory to store the files to the destination directory. If the destination directory exists in HDFS, Apache Sqoop does not import data to the directory or overwrite data in the directory.
col	Optional. The column that you want to use to determine the rows to be imported in incremental import scenarios.
mode	Optional. The incremental import mode. Valid values: append and lastmodified.
value	Optional. The maximum value of the column in the previous incremental import.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.

Import data from OSS to MySQL

After you create MySQL tables that comply with the data structure of OSS, run the following command on the master node of a cluster:

sqoop export --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <oss-dir>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
tablename	The name of the MySQL table.
oss-dir	The Object Storage Service (OSS) directory to which you want to import data. Example: oss://<AccessKeyID>:<AccessKey_Secret>@<BucketName>.oss-cn-hangzhou-internal.aliyuncs.com/result.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.

Use an SQL statement to import data

Run the following command:

sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --query <query-sql> --split-by <sp-column> --hive-import --hive-table <hive-tablename> --target-dir <hdfs-dir>

Parameter	Description
dburi	The access URL of the database. Example: jdbc:mysql://192.168..:3306/.
dbname	The name of the database.
username	The username that you want to use to log on to the database.
password	The password that you want to use to log on to the database.
query-sql	The SQL statement that is used to select the data you want to import. Example: `SELECT * FROM profile WHERE id>1 AND \$CONDITIONS`.
sp-column	The name of the column to be split. In most cases, the value of this parameter is the primary key of the MySQL table.
hdfs-dir	The Hadoop Distributed File System (HDFS) directory to which you want to import data. Example: /user/hive/result.
hive-tablename	The name of the Hive table.

For more information about the parameters, see the 7.2. Syntax section of Sqoop User Guide.