Apache Sqoop transfers bulk data between Apache Hadoop and structured datasets, such as relational databases. Use it to move data between MySQL, Hadoop Distributed File System (HDFS), Hive, and Object Storage Service (OSS) on EMR clusters.
Run all commands on the master node of your cluster.
Supported scenarios
Import data from MySQL to HDFS
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> \
--table <tablename> --target-dir <hdfs-dir> \
--split-by <split-column> \
--check-column <col> --incremental <mode> --last-value <value>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
tablename |
Required | MySQL table name |
hdfs-dir |
Required | HDFS directory to write data to. Example: /user/hive/result |
split-column |
Optional | Column used to split parallel tasks. Defaults to the primary key column |
col |
Optional | Column used to identify rows for incremental import |
mode |
Optional | Incremental import mode. Valid values: append and lastmodified |
value |
Optional | Maximum column value from the previous incremental import |
For the full parameter reference, see Syntax in the Sqoop User Guide.
Import data from HDFS to MySQL
Before running this command, create a MySQL table whose schema matches the HDFS data structure.
sqoop export --connect <dburi>/<dbname> --username <username> --password <password> \
--table <tablename> --export-dir <hdfs-dir>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
tablename |
Required | MySQL table name |
hdfs-dir |
Required | HDFS source directory. Example: /user/hive/result |
For the full parameter reference, see Syntax in the Sqoop User Guide.
Import data from Hive to MySQL
Hive tables are stored as files in HDFS. To export a Hive table to MySQL, use the same sqoop export command as Import data from HDFS to MySQL, specifying the HDFS directory of the Hive table as --export-dir.
Import data from MySQL to Hive
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> \
--table <tablename> \
--check-column <col> --incremental <mode> --last-value <value> \
--fields-terminated-by "\t" --lines-terminated-by "\n" \
--hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
tablename |
Required | MySQL table name |
hdfs-dir |
Required | Intermediate HDFS staging directory. Example: /user/hive/result |
hive-tablename |
Required | Target Hive table name |
col |
Optional | Column used to identify rows for incremental import |
mode |
Optional | Incremental import mode. Valid values: append and lastmodified |
value |
Optional | Maximum column value from the previous incremental import |
For the full parameter reference, see Syntax in the Sqoop User Guide.
Import data from MySQL to OSS
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> \
--table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> \
--check-column <col> --incremental <mode> --last-value <value>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
tablename |
Required | MySQL table name |
oss-dir |
Required | OSS destination directory. Example: oss://<AccessKeyId>:<AccessKeySecret>@<BucketName>.oss-cn-hangzhou-internal.aliyuncs.com/result |
oss-tmpdir |
Required if mode=append |
OSS temporary directory. When mode is append, Sqoop writes data to this directory first, then renames the files to the destination. If the destination directory already exists in HDFS, Sqoop does not import or overwrite data |
col |
Optional | Column used to identify rows for incremental import |
mode |
Optional | Incremental import mode. Valid values: append and lastmodified |
value |
Optional | Maximum column value from the previous incremental import |
For the full parameter reference, see Syntax in the Sqoop User Guide.
Import data from OSS to MySQL
Before running this command, create a MySQL table whose schema matches the OSS data structure.
sqoop export --connect <dburi>/<dbname> --username <username> --password <password> \
--table <tablename> --export-dir <oss-dir>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
tablename |
Required | MySQL table name |
oss-dir |
Required | OSS source directory. Example: oss://<AccessKeyId>:<AccessKeySecret>@<BucketName>.oss-cn-hangzhou-internal.aliyuncs.com/result |
For the full parameter reference, see Syntax in the Sqoop User Guide.
Use an SQL statement to import data
Use --query to filter rows with a custom SQL statement instead of importing an entire table.
sqoop import --connect <dburi>/<dbname> --username <username> --password <password> \
--query <query-sql> --split-by <sp-column> \
--hive-import --hive-table <hive-tablename> --target-dir <hdfs-dir>
| Parameter | Required | Description |
|---|---|---|
dburi |
Required | Database access URL. Example: jdbc:mysql://192.168..:3306/ |
dbname |
Required | Database name |
username |
Required | Username to log on to the database |
password |
Required | Password to log on to the database |
query-sql |
Required | SQL statement to select the data to import. Must include $CONDITIONS. Example: SELECT * FROM profile WHERE id>1 AND \$CONDITIONS |
sp-column |
Required | Column used to split parallel tasks. Usually the primary key of the MySQL table |
hdfs-dir |
Required | Intermediate HDFS staging directory. Example: /user/hive/result |
hive-tablename |
Required | Target Hive table name |
For the full parameter reference, see Syntax in the Sqoop User Guide.