Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores, such as relational databases.

Prerequisites

A Hadoop cluster is created. For more information about how to create a cluster, see Create a cluster.

Import data from MySQL to HDFS

Run the following command on the master node of your cluster:
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <hdfs-dir> --split-by <split-column> --check-column <col> --incremental <mode> --last-value <value>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
tablename The name of the MySQL table.
hdfs-dir The HDFS directory. Example: /user/hive/result.
split-column Optional. The column that is used to split the import into parallel tasks. By default, the primary key column of the table is used.
col Optional. The column that is checked to determine which rows to import in an incremental import.
mode Optional. The incremental import mode. Valid values: append and lastmodified.
value Optional. The maximum value of the check column from the previous incremental import task.

For more information about the parameters, see the Sqoop Import section of Sqoop User Guide.
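As a concrete illustration, the command below fills in the placeholders with hypothetical values (the host 192.168.0.100, database salesdb, user sqoop_user, table orders, and split column id are examples only). The command is assembled into a variable and printed so you can review it; on a real cluster you would run the sqoop command itself on the master node.

```shell
# Hypothetical connection details -- replace them with your own.
DBURI="jdbc:mysql://192.168.0.100:3306"
DBNAME="salesdb"

# Assemble the import command for review before running it.
CMD="sqoop import \
  --connect ${DBURI}/${DBNAME} \
  --username sqoop_user \
  --password MySecret1 \
  --table orders \
  --target-dir /user/hive/result \
  --split-by id"
echo "${CMD}"
```

The incremental options (--check-column, --incremental, --last-value) are omitted here; add them only when you repeatedly import the same table and want to transfer new rows only.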

Import data from HDFS to MySQL

After you create MySQL tables whose schema matches the structure of the data in HDFS, run the following command on the master node of your cluster:
sqoop export --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hdfs-dir>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
tablename The name of the MySQL table.
hdfs-dir The HDFS directory. Example: /user/hive/result.

For more information about the parameters, see the Sqoop Export section of Sqoop User Guide.
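A filled-in sketch of the export command follows. All connection details (host, database, user, table, and directory) are hypothetical placeholders; substitute your own values before running the command on the master node.

```shell
# Hypothetical values -- replace with your own.
CMD="sqoop export \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --table orders_copy \
  --export-dir /user/hive/result"
echo "${CMD}"
```

Note that sqoop export appends rows to the destination MySQL table; the table must already exist with a compatible schema.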

Import data from MySQL to Hive

Run the following command on the master node of your cluster:
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --check-column <col> --incremental <mode> --last-value <value> --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
tablename The name of the MySQL table.
col Optional. The column that is checked to determine which rows to import in an incremental import.
mode Optional. The incremental import mode. Valid values: append and lastmodified.
value Optional. The maximum value of the check column from the previous incremental import task.
hdfs-dir The HDFS directory. Example: /user/hive/result.
hive-tablename The name of the Hive table.

For more information about the parameters, see the Sqoop Import section of Sqoop User Guide.
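For reference, here is a filled-in version of the Hive import command with hypothetical values (host, database, user, and the table names orders and orders_hive are examples only). The command is assembled into a variable and printed for review; on the master node you would run it directly.

```shell
# Hypothetical values -- replace with your own. The delimiter options
# are single-quoted so the shell passes the backslash sequences
# through to Sqoop unmodified.
CMD="sqoop import \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --table orders \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --hive-import \
  --target-dir /user/hive/result \
  --hive-table orders_hive"
printf '%s\n' "${CMD}"
```

The incremental options are omitted for brevity; add --check-column, --incremental, and --last-value when you re-import the same table.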

Import data from Hive to MySQL

You can reuse the command that imports data from HDFS to MySQL. Set the export directory to the HDFS directory of the Hive table from which you want to export data. For more information, see Import data from HDFS to MySQL.
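For example, to export a Hive table, point --export-dir at the table's directory in the Hive warehouse. The sketch below assumes the default warehouse location /user/hive/warehouse and hypothetical database (salesdb) and table (orders) names; substitute your own.

```shell
# Hypothetical values -- replace with your own. The export directory
# assumes the default Hive warehouse layout: /user/hive/warehouse/<db>.db/<table>.
CMD="sqoop export \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --table orders_copy \
  --export-dir /user/hive/warehouse/salesdb.db/orders"
echo "${CMD}"
```

If the Hive table does not use Sqoop's default field delimiter, you may also need to pass --input-fields-terminated-by with the delimiter the table's files actually use.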

Import data from MySQL to OSS

Run the following command on the master node of your cluster:
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
tablename The name of the MySQL table.
oss-dir The Object Storage Service (OSS) directory. Example: oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result.
oss-tmpdir The temporary OSS directory. This parameter is required if mode is set to append.

In append mode, Sqoop first imports data into the temporary directory, and then renames the files and moves them to the destination directory. Make sure that the new file names do not conflict with existing file names in the destination directory. If the destination directory already exists, Sqoop neither imports data into it nor overwrites its data.

col Optional. The column that is checked to determine which rows to import in an incremental import.
mode Optional. The incremental import mode. Valid values: append and lastmodified.
value Optional. The maximum value of the check column from the previous incremental import task.

For more information about the parameters, see the Sqoop Import section of Sqoop User Guide.
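A filled-in sketch of the OSS import command follows. The database details are hypothetical, and the OSS placeholders (<accessid>, <accesskey>, <bucketname>) are deliberately left as-is; replace every value with your own before running the command on the master node.

```shell
# Hypothetical database values -- replace with your own.
# The OSS URL placeholders are kept literal on purpose.
OSS_DIR='oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result'
OSS_TMP='oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/tmp'

CMD="sqoop import \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --table orders \
  --target-dir ${OSS_DIR} \
  --temporary-rootdir ${OSS_TMP}"
echo "${CMD}"
```

The --temporary-rootdir option is only needed when you use incremental append mode, as described in the parameter table above.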

Import data from OSS to MySQL

After you create MySQL tables whose schema matches the structure of the data in OSS, run the following command on the master node of your cluster:
sqoop export --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <oss-dir>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
tablename The name of the MySQL table.
oss-dir The Object Storage Service (OSS) directory. Example: oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result.

For more information about the parameters, see the Sqoop Export section of Sqoop User Guide.
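A filled-in sketch of the OSS export command follows, with hypothetical database details and the OSS URL placeholders left as-is; substitute your own values before running the command on the master node.

```shell
# Hypothetical database values -- replace with your own.
OSS_DIR='oss://<accessid>:<accesskey>@<bucketname>.oss-cn-hangzhou-internal.aliyuncs.com/result'

CMD="sqoop export \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --table orders_copy \
  --export-dir ${OSS_DIR}"
echo "${CMD}"
```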

Use an SQL statement to import data

Run the following command on the master node of your cluster:
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --query <query-sql> --split-by <sp-column> --hive-import --hive-table <hive-tablename> --target-dir <hdfs-dir>
Parameter Description
dburi The access URL of the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
dbname The name of the database.
username The username that is used to log on to the database.
password The password that is used to log on to the database.
query-sql The SQL statement that is used to select the data you want to import. The WHERE clause must include the $CONDITIONS placeholder, which Sqoop replaces with split conditions at runtime. Example: SELECT * FROM profile WHERE id>1 AND \$CONDITIONS.
sp-column The column that is used to split the query into parallel tasks. In most cases, the value of this parameter is the primary key column of the MySQL table.
hdfs-dir The HDFS directory. Example: /user/hive/result.
hive-tablename The name of the Hive table.

For more information about the parameters, see the Sqoop Import section of Sqoop User Guide.
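A filled-in sketch of the query-based import follows, with hypothetical values (host, database, user, and the Hive table name profile_hive are examples only). The query is held in a single-quoted variable so the shell does not expand the mandatory $CONDITIONS placeholder, which Sqoop fills in at runtime.

```shell
# Hypothetical values -- replace with your own. Single quotes keep the
# shell from expanding $CONDITIONS; Sqoop substitutes it at runtime.
QUERY='SELECT * FROM profile WHERE id>1 AND $CONDITIONS'

CMD="sqoop import \
  --connect jdbc:mysql://192.168.0.100:3306/salesdb \
  --username sqoop_user \
  --password MySecret1 \
  --query '${QUERY}' \
  --split-by id \
  --hive-import \
  --hive-table profile_hive \
  --target-dir /user/hive/result"
echo "${CMD}"
```

If you type the command directly instead of building it in a variable, keep the query in single quotes for the same reason.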