
Object Storage Service: Use Apache Sqoop on an EMR cluster to implement read and write access to data stored in OSS-HDFS

Last Updated: Sep 28, 2023

This topic describes how to use Apache Sqoop on an E-MapReduce (EMR) cluster to implement read and write access to data stored in OSS-HDFS.

Prerequisites

Procedure

  1. Connect to the EMR cluster.
    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
    2. Click the EMR cluster that you created.
    3. Click the Nodes tab, and then click the plus icon (+) on the left side of the node group.
    4. Click the ID of the ECS instance. On the Instances page, click Connect next to the instance ID.
    For more information about how to log on to a cluster in Windows or Linux by using an SSH key pair or SSH password, see Log on to a cluster.
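    For example, if the public IP address of the master node is 203.0.113.10 (a placeholder; substitute the IP address of your own instance) and you log on as the root user, the SSH command looks like this:

    ```shell
    # Connect to the EMR master node over SSH.
    # 203.0.113.10 is a placeholder; replace it with the public IP address
    # shown for your ECS instance in the console.
    ssh root@203.0.113.10
    ```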
  2. Import data from MySQL to OSS-HDFS.
    sudo sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value> --as-<format> -m <count>
    • The following list describes the parameters in the preceding command.
      • dburi (required): The URI that is used to access the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/.
      • dbname (required): The name of the database.
      • username (required): The username that you want to use to log on to the database.
      • password (required): The password of the user.
      • tablename (required): The name of the MySQL table.
      • oss-dir (required): The OSS-HDFS directory from which you want to read data or to which you want to write data. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/.
      • oss-tmpdir (optional): The directory to which data is temporarily written. You must specify this parameter if the mode parameter is set to append. In this case, Apache Sqoop imports data to the temporary directory first and then renames the files into the destination directory. If the destination directory already exists in OSS-HDFS, Apache Sqoop does not import data to the directory or overwrite data in the directory.
      • col (optional): The check column that is used to determine which rows to import in incremental import scenarios.
      • mode (optional): The incremental import mode. Valid values:
        • append: Data is imported in increments based on an incremental column.
        • lastmodified: Data is imported in increments based on a time column.
      • value (optional): The maximum value of the check column from the previous incremental import.
      • format (optional): The format in which the data files are stored. Valid values: avrodatafile, sequencefile, textfile (default), and parquetfile.
      • count (optional): The number of map tasks.
    • Examples:

      The following sample code provides an example on how to import data from the src_kv table in MySQL to a specified directory in examplebucket:

      sudo sqoop import --connect jdbc:mysql://master-1-1/sqoop_test --username root --password password1 --table src_kv -m 1 --target-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv --as-parquetfile
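      After the import completes, you can list the destination directory to confirm that the files were written, and use sqoop export to write the data back from OSS-HDFS to MySQL. The following sketch reuses the bucket, database, and table names from the example above and assumes the src_kv table already exists in the sqoop_test database; exporting Parquet-formatted data may require additional configuration depending on your Sqoop version.

      ```shell
      # Verify the imported files in OSS-HDFS. On an EMR cluster, the Hadoop
      # client is already configured to access OSS-HDFS paths.
      hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv

      # Write the data back from OSS-HDFS to the src_kv table in MySQL.
      # --export-dir points at the directory that the import command populated.
      sudo sqoop export --connect jdbc:mysql://master-1-1/sqoop_test \
        --username root --password password1 \
        --table src_kv -m 1 \
        --export-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv
      ```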