Sqoop

Last Updated: Mar 28, 2017

Sqoop is an open-source tool used for transferring data between different data stores. It supports multiple types of storage systems.

Data transmission

Common scenarios are as follows:

  • MySQL > HDFS

  • HDFS > MySQL

  • MySQL > Hive

  • Hive > MySQL

  • MySQL > OSS

  • OSS > MySQL

  • Use SQL as the importing condition.

Note: Before executing the following commands, switch your user to hadoop.

  su hadoop

From MySQL to HDFS

Execute the following commands on the master node of the cluster:

  sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <hdfs-dir>

Parameter description:

dburi: The connection string for the database, such as: jdbc:mysql://192.168.1.124:3306/.

dbname: The database name, such as: user.

username: The user name used to log on to the database.

password: The password for the user name.

tablename: The name of the MySQL table.

hdfs-dir: The HDFS directory to write to, such as: /user/hive/result.

For more detailed parameters, see Sqoop Import.
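For illustration, a filled-in command might look like the following. The host, database, and directory reuse the examples above; the user name, password, and table name are hypothetical and should be replaced with your own values.

  # Hypothetical values; replace the credentials and table name with your own.
  sqoop import \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile \
    --target-dir /user/hive/result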

From HDFS to MySQL

First, you need to create the MySQL table corresponding to the data structure in HDFS, and then execute the following commands on the master node of the cluster and specify the path of the data file you want to export.

  sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hdfs-dir>

Parameter description:

dburi: The connection string for the database, such as: jdbc:mysql://192.168.1.124:3306/.

dbname: The database name, such as: user.

username: The user name used to log on to the database.

password: The password for the user name.

tablename: The name of the MySQL table.

hdfs-dir: The HDFS directory containing the data you want to export to MySQL, such as: /user/hive/result.

For more detailed parameters, see Sqoop Export.
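As a sketch, suppose the files under /user/hive/result hold two comma-separated columns (id and name). A matching MySQL table and export command might then look like the following; the table definition, credentials, and delimiter are assumptions, not part of the original example.

  # Hypothetical target table matching two-column, comma-separated HDFS data.
  mysql -h 192.168.1.124 -u sqoop_user -p \
    -e "CREATE TABLE user.profile_export (id INT, name VARCHAR(64));"

  # Export the HDFS files into the new table.
  sqoop export \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile_export \
    --export-dir /user/hive/result \
    --input-fields-terminated-by ','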

From MySQL to Hive

Create a Hive table during importing by executing the following commands:

  sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>

Parameter description:

dburi: The connection string for the database, such as: jdbc:mysql://192.168.1.124:3306/.

dbname: The database name, such as: user.

username: The user name used to log on to the database.

password: The password for the user name.

tablename: The name of the MySQL table.

hdfs-dir: The HDFS target directory for the import, such as: /user/hive/result.

hive-tablename: The name of the target table in Hive. It can be in the form database.table, for example xxx.yyy.

For more detailed parameters, see Sqoop Import.
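For example, a filled-in command might look like the following; the credentials, MySQL table name, and Hive table name are hypothetical.

  # Hypothetical values; "default.profile" is the target Hive table (database.table).
  sqoop import \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile \
    --fields-terminated-by "\t" \
    --lines-terminated-by "\n" \
    --hive-import \
    --target-dir /user/hive/result \
    --hive-table default.profile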

From Hive to MySQL

Refer to the above commands from HDFS to MySQL. You only need to specify the HDFS path corresponding to the Hive table.
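As a sketch (with hypothetical table names), you can look up the HDFS location of a Hive table with DESCRIBE FORMATTED and then export that directory. A Hive table created with the default settings stores its fields separated by \001 (Ctrl-A), so the export command should declare that delimiter.

  # Look for the "Location:" field in the output to find the table's HDFS path.
  hive -e "DESCRIBE FORMATTED default.profile;"

  # Export that directory; \001 is Hive's default field delimiter.
  sqoop export \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile_export \
    --export-dir /user/hive/warehouse/profile \
    --input-fields-terminated-by '\001'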

From MySQL to OSS

It is similar to data transfer from MySQL to HDFS; the only difference is that --target-dir points to an OSS directory. Execute the following command on the master node of the cluster:

Note: OSS host addresses include an intranet address, an Internet address, and a VPC address. If you are using a classic network, specify the intranet address; for the Hangzhou region, it is oss-cn-hangzhou-internal.aliyuncs.com. If you are using a VPC network, for the Hangzhou region, use vpc100-oss-cn-hangzhou.aliyuncs.com.

  sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir>

Parameter description:

Dburi: The access connection to the database, such as: jdbc:mysql://192.168.1.124:3306/.

Dbname: Database name, such as: user.

Username: The user name used for logging on to the database.

Password: The password corresponding to the user name.

Tablename: The MySQL table name.

oss-dir: The OSS directory to write to, such as: oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result.

For more detailed parameters, see Sqoop Import.
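For illustration, a filled-in command might look like the following; the credentials and table name are hypothetical, and the angle-bracket placeholders in the OSS path must be replaced with your own AccessKey ID, AccessKey secret, and bucket name.

  # Hypothetical values; quote the OSS URI so the shell does not interpret it.
  sqoop import \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile \
    --target-dir 'oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result'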

From OSS to MySQL

Data transfer from OSS to MySQL is similar to that from HDFS to MySQL; the only difference is that --export-dir points to an OSS directory. First, create the MySQL table that matches the structure of the data in OSS.

Then execute the following command on the master node of the cluster, specifying the path of the data files you want to export:

  sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <oss-dir>

Parameter description:

dburi: The connection string for the database, such as: jdbc:mysql://192.168.1.124:3306/.

dbname: The database name, such as: user.

username: The user name used to log on to the database.

password: The password for the user name.

tablename: The name of the MySQL table.

oss-dir: The OSS directory containing the data you want to export to MySQL, such as: oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result.

Note: OSS host addresses include an intranet address, an Internet address, and a VPC address. If you are using a classic network, specify the intranet address; for the Hangzhou region, it is oss-cn-hangzhou-internal.aliyuncs.com. If you are using a VPC network, for the Hangzhou region, use vpc100-oss-cn-hangzhou.aliyuncs.com.

For more detailed parameters, see Sqoop Export.
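A corresponding export sketch, again with hypothetical credentials and table name:

  # Hypothetical values; the target MySQL table must exist before exporting.
  sqoop export \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --table profile_export \
    --export-dir 'oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result'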

Use SQL as the importing condition

In addition to importing an entire MySQL table, you can also select the data to import by writing a SQL statement, as follows:

  sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --query <query-sql> --split-by <sp-column> --hive-import --hive-table <hive-tablename> --target-dir <hdfs-dir>

Parameter description:

dburi: The connection string for the database, such as: jdbc:mysql://192.168.1.124:3306/.

dbname: The database name, such as: user.

username: The user name used to log on to the database.

password: The password for the user name.

query-sql: The query statement to use, such as: "SELECT * FROM profile WHERE id>1 AND \$CONDITIONS". Remember to enclose the statement in quotes and to append AND \$CONDITIONS at the end.

sp-column: The column used to split the data, usually the primary key of the MySQL table.

hdfs-dir: The HDFS target directory for the import, such as: /user/hive/result.

hive-tablename: The name of the target table in Hive. It can be in the form database.table, for example xxx.yyy.

For more detailed parameters, see Sqoop Query Import.
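For illustration, a filled-in command might look like the following; the credentials, query, and table names are hypothetical. $CONDITIONS is escaped because the query is enclosed in double quotes.

  # \$CONDITIONS lets Sqoop inject the split predicates for the parallel import.
  sqoop import \
    --connect jdbc:mysql://192.168.1.124:3306/user \
    --username sqoop_user \
    --password sqoop_pass \
    --query "SELECT * FROM profile WHERE id > 1 AND \$CONDITIONS" \
    --split-by id \
    --hive-import \
    --hive-table default.profile \
    --target-dir /user/hive/result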
