This topic describes how to use Apache Sqoop on an E-MapReduce (EMR) cluster to implement read and write access to data stored in OSS-HDFS.
Prerequisites
-
You have created an EMR cluster that runs EMR V3.42.0 or later, or EMR V5.8.0 or later, and selected the Sqoop component. For more information, see Create a cluster.
-
You have activated OSS-HDFS and granted the required access permissions. For more information, see Activate OSS-HDFS.
Procedure
-
Connect to the EMR cluster.
-
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
-
Click the EMR cluster that you created.
-
Click the Nodes tab and click the
icon to the left of the node group. -
Click the ECS ID. On the Instances page, click Connect next to the instance ID.
To log on to the cluster from a Windows or Linux environment with an SSH key pair or password, see Log on to a cluster.
-
-
Import data from OSS-HDFS to MySQL.
sudo sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value> --as-<format> -m <count>-
The following table describes the parameters in the command.
Parameter
Required
Description
dburi
Yes
The database connection URI. Example:
jdbc:mysql://192.168.xxx.xxx:3306/.dbname
Yes
The name of the database.
username
Yes
The username to log on to the database.
password
Yes
The password for the database user.
tablename
Yes
The name of the MySQL table.
oss-dir
Yes
The destination directory in OSS-HDFS for the imported data. Example:
oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/.oss-tmpdir
No
The temporary directory for the import job. This parameter is required for the 'append' incremental import mode.
If you set the mode parameter to append, Apache Sqoop imports data to a temporary directory, and then renames and stores the files in the destination directory. If the destination directory exists in OSS-HDFS, Apache Sqoop does not import data to the directory or overwrite data in the directory.
col
No
The check column for an incremental import.
mode
No
The incremental import mode. Valid values:
appendandlastmodified.-
append: Imports data incrementally based on an auto-incrementing column. -
lastmodified: Imports data incrementally based on a timestamp column.
value
No
The maximum value of the check column from the previous incremental import.
format
No
The output file format. Valid values:
avrodatafile,sequencefile,textfile(default), andparquetfile.count
No
The number of MapReduce tasks.
-
-
Example
The following sample code provides an example on how to import data in a specified directory of the examplebucket to the src_kv table in MySQL:
sqoop import --connect jdbc:mysql://master-1-1/sqoop_test --username root --password password1 --table src_kv -m 1 --target-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv --as-parquetfile
-