All Products
Search
Document Center

Object Storage Service:Use Apache Sqoop on an EMR cluster to implement read and write access to data stored in OSS-HDFS

Last Updated:Apr 29, 2026

This topic describes how to use Apache Sqoop on an E-MapReduce (EMR) cluster to implement read and write access to data stored in OSS-HDFS.

Prerequisites

  • You have created an EMR cluster that runs EMR V3.42.0 or later, or EMR V5.8.0 or later, and selected the Sqoop component. For more information, see Create a cluster.

  • You have activated OSS-HDFS and granted the required access permissions. For more information, see Activate OSS-HDFS.

Procedure

  1. Connect to the EMR cluster.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. Click the EMR cluster that you created.

    3. Click the Nodes tab and click the + icon to the left of the node group.

    4. Click the ECS ID. On the Instances page, click Connect next to the instance ID.

    To log on to the cluster from a Windows or Linux environment with an SSH key pair or password, see Log on to a cluster.

  2. Import data from OSS-HDFS to MySQL.

    sudo sqoop import --connect  <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir>  --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value> --as-<format> -m <count>
    • The following table describes the parameters in the command.

      Parameter

      Required

      Description

      dburi

      Yes

      The database connection URI. Example: jdbc:mysql://192.168.xxx.xxx:3306/.

      dbname

      Yes

      The name of the database.

      username

      Yes

      The username to log on to the database.

      password

      Yes

      The password for the database user.

      tablename

      Yes

      The name of the MySQL table.

      oss-dir

      Yes

      The destination directory in OSS-HDFS for the imported data. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/.

      oss-tmpdir

      No

      The temporary directory for the import job. This parameter is required for the 'append' incremental import mode.

      If you set the mode parameter to append, Apache Sqoop imports data to a temporary directory, and then renames and stores the files in the destination directory. If the destination directory exists in OSS-HDFS, Apache Sqoop does not import data to the directory or overwrite data in the directory.

      col

      No

      The check column for an incremental import.

      mode

      No

      The incremental import mode. Valid values: append and lastmodified.

      • append: Imports data incrementally based on an auto-incrementing column.

      • lastmodified: Imports data incrementally based on a timestamp column.

      value

      No

      The maximum value of the check column from the previous incremental import.

      format

      No

      The output file format. Valid values: avrodatafile, sequencefile, textfile (default), and parquetfile.

      count

      No

      The number of MapReduce tasks.

    • Example

      The following sample code provides an example on how to import data in a specified directory of the examplebucket to the src_kv table in MySQL:

      sqoop import --connect jdbc:mysql://master-1-1/sqoop_test --username root --password password1  --table src_kv -m 1 --target-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv --as-parquetfile