If you need to import data from external storage into a MaxCompute table or a partition of a table, you can use the LOAD command. This topic describes how to use the LOAD command to import CSV data or data in other open source formats into MaxCompute.

Before you perform a load operation, you must have the permissions to create tables (CreateTable) and modify tables (Alter) in the project. For more information, see Authorize users.

You can execute the statements that are described in this topic on the MaxCompute client and other supported platforms.

Features

MaxCompute allows you to use the load overwrite or load into command to import CSV data or data in other open source formats from external storage, such as OSS, Amazon Redshift, and BigQuery, into MaxCompute tables or table partitions. Data in Amazon Redshift and BigQuery must first be exported to OSS before it can be imported into MaxCompute through OSS.

MaxCompute also supports importing data into the partitions of a MaxCompute partitioned table in dynamic partition mode.

The load into command appends data to a table or partition. The load overwrite command clears the existing data from the table or partition before it inserts the new data.
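
As a sketch of the difference (the table name and OSS path are placeholders), the two commands share the same clauses and differ only in the keyword:

```sql
-- Sketch with placeholder names: load into appends, load overwrite replaces.
load into table my_table
from location 'oss://<oss_endpoint>/<object>'
stored by 'com.aliyun.odps.CsvStorageHandler';  -- existing rows are kept

load overwrite table my_table
from location 'oss://<oss_endpoint>/<object>'
stored by 'com.aliyun.odps.CsvStorageHandler';  -- existing rows are cleared first
```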

Prerequisites

Before you import data from external storage into MaxCompute, you must grant MaxCompute the required access permissions. The load overwrite and load into commands use the same authorization modes as MaxCompute external tables:

  • Explicit AccessKey pair: specify the AccessKey ID and AccessKey secret in the directory in which the data is stored. Example:
    'oss://<AccessKey_Id>:<AccessKey_Secret>@<oss_endpoint>/<object>'
    Note With this method, the AccessKey ID and AccessKey secret are displayed in plaintext, which may pose security risks. We recommend that you do not use this method. For more information about the parameters, see OSS domain names.
  • STS authorization: supports one-click authorization and provides higher security. For more information, see STS authorization.

After authorization is complete, select the import method, described in the following sections, that matches the format of the data to be imported.

Limits

  • You can import only external data that is stored in the same region as the MaxCompute project.
  • If you import data into a partitioned table, the schema of the data in the OSS directory must not include partition key columns.

Use an extractor or a storage handler to import data

  • Syntax
    {load overwrite|into} table <table_name> [partition (<pt_spec>)]
    from location <external_location>
    stored by <StorageHandler>
    [with serdeproperties (<Options>)];
  • Parameter
    • table_name: required. The name of the destination table into which you want to insert data. You must create a destination table before you append data to it. The schema of the destination table must be compatible with the format of the MaxCompute external table.
    • pt_spec: optional. The partition of the destination table into which you want to insert data, in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
    • external_location: required. The OSS directory in which the data is stored, in the 'oss://<oss_endpoint>/<object>' format. For more information, see OSS domain names. By default, the system reads all files in this directory.
    • StorageHandler: required. The name of the built-in storage handler. com.aliyun.odps.CsvStorageHandler is a built-in storage handler that processes CSV files and defines how they are read and written; the reading and writing logic is implemented by the system. You can use this parameter the same way as for MaxCompute external tables. For more information, see Process OSS data stored in open source formats.
    • Options: optional. The external-table parameters to specify. The attributes supported in serdeproperties are the same as those of MaxCompute external tables. For the attribute list, see Access OSS data by using the built-in extractor.
  • Examples

    Use an extractor or a storage handler to import data. Assume that the MaxCompute project and the OSS bucket belong to the same Alibaba Cloud account, and import the data of the vehicle.csv file into MaxCompute over the Alibaba Cloud internal network.

    1. Click here to complete one-click authorization.
    2. Create an Object Storage Service (OSS) bucket if you do not already have one. For more information, see Create buckets. Then save the vehicle.csv file to the OSS bucket directory mc-test/data_location/ in the oss-cn-hangzhou region, and organize the OSS directory path.

      The vehicle.csv file data is as follows:

      1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
      1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
      1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
      1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
      The OSS directory path is organized based on the bucket, region, and endpoint information as follows:
      oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
    3. Log on to the MaxCompute client to create a destination table ambulance_data_csv_load. The following code provides an example:
      create table ambulance_data_csv_load (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING );
    4. Execute the load overwrite command to import the vehicle.csv file on OSS to the destination table. The following code provides an example:
      load overwrite table ambulance_data_csv_load
      from
      location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
      stored by 'com.aliyun.odps.CsvStorageHandler'
      with serdeproperties (
      'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',   -- The Alibaba Cloud Resource Name (ARN) of the AliyunODPSDefaultRole role.
      'odps.text.option.delimiter'=','
      );
    5. View the import results of the destination table ambulance_data_csv_load. The following code provides an example:
      -- Enable a full table scan only for the current session.
      set odps.sql.allow.fullscan=true;
      select * from ambulance_data_csv_load;
      The following result is returned:
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+
      | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  |
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+
      | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
      | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          |
      | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW         |
      | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          |
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+
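
To append additional files from the same OSS directory to this table instead of replacing its contents, the same statement can be issued with load into. A sketch assuming the table and path from the preceding steps:

```sql
load into table ambulance_data_csv_load
from
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
stored by 'com.aliyun.odps.CsvStorageHandler'
with serdeproperties (
'odps.text.option.delimiter'=','
);
```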

Import data stored in an open source format

  • Syntax
    {load overwrite|into} table <table_name> [partition (<pt_spec>)]
    from location <external_location>
    [row format serde '<serde_class>'
      [with serdeproperties (<Options>)]
    ]
    stored as <file_format>;
  • Parameter
    • table_name: required. The name of the destination table into which you want to insert data. You must create a destination table before you append data to it. The schema of the destination table must be compatible with the format of the MaxCompute external table.
    • pt_spec: optional. The partition of the destination table into which you want to insert data, in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
    • external_location: required. The OSS directory in which the data is stored, in the 'oss://<oss_endpoint>/<object>' format. For more information, see OSS domain names. By default, the system reads all files in this directory.
    • serde_class: Optional. You can use this parameter in the same way as you use it for the MaxCompute external table. For more information, see Process OSS data stored in open source formats.
      Mappings between file formats and SerDe classes:
      • SEQUENCEFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      • TEXTFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      • RCFILE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
      • ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde
      • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcSerde
      • PARQUET: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
      • AVRO: org.apache.hadoop.hive.serde2.avro.AvroSerDe
    • Options: optional. The external-table parameters to specify. The attributes supported in serdeproperties are the same as those of MaxCompute external tables. For the attribute list, see Process OSS data stored in open source formats.
    • file_format: required. The format of the data files to import, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, or TEXTFILE. You can use this parameter the same way as for MaxCompute external tables. For more information, see Process OSS data stored in open source formats.
      Note
      • serde_class and Options can be omitted when the default values are used.
      • In the STORED AS clause, a single file cannot exceed 3 GB. If a file exceeds 3 GB, split it.
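  • When the default SerDe of the target file format is acceptable, the row format serde and with serdeproperties clauses can be omitted entirely. A minimal sketch with placeholder names, assuming PARQUET files in the OSS directory:

    ```sql
    -- Sketch: the default SerDe for PARQUET is used implicitly.
    load overwrite table target_parquet_table
    from location 'oss://<oss_endpoint>/<object>'
    stored as parquet;
    ```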
  • Examples
    • Example 1: import data in other open source formats. Assume that the MaxCompute project and the OSS bucket belong to the same Alibaba Cloud account, and import the data of the vehicle.textfile file into MaxCompute over the Alibaba Cloud internal network.
      1. Click here to complete one-click authorization.
      2. Create an Object Storage Service (OSS) bucket if you do not already have one. For more information, see Create buckets. Then save the vehicle.textfile file to the OSS bucket directory mc-test/data_location/ in the oss-cn-hangzhou region, and organize the OSS directory path.

        The vehicle.textfile file data is as follows:

        1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
        1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
        1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
        1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
        The OSS directory path is organized based on the bucket, region, and endpoint information as follows:
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
      3. Log on to the MaxCompute client to create a destination table ambulance_data_textfile_load_pt. The following code provides an example:
        create table ambulance_data_textfile_load_pt (
        vehicleId STRING,
        recordId STRING,
        patientId STRING,
        calls STRING,
        locationLatitute STRING,
        locationLongtitue STRING,
        recordTime STRING,
        direction STRING)
        partitioned by (ds STRING);
      4. Execute the load overwrite command to import the vehicle.textfile file on OSS into the destination table. The following code provides an example:
        load overwrite table ambulance_data_textfile_load_pt partition(ds='20200910')
        from
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
        row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
        stored as textfile;
      5. View the import results of the destination table ambulance_data_textfile_load_pt. The following code provides an example:
        -- Enable a full table scan only for the current session.
        set odps.sql.allow.fullscan=true;
        select * from ambulance_data_textfile_load_pt;
        The following result is returned. Because LazySimpleSerDe uses '\001' (Ctrl-A) as its default field delimiter, each comma-separated line is read into the first column as a whole:
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  | ds         |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
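      If you want the comma-separated fields split into separate columns instead, one option is to specify a SerDe that treats the comma as the field delimiter, such as OpenCSVSerde. A sketch assuming the same table and OSS path as above:

      ```sql
      load overwrite table ambulance_data_textfile_load_pt partition(ds='20200910')
      from
      location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
      row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
      stored as textfile;
      ```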
    • Import data to a destination table in dynamic partition mode.
      Note If the subdirectories under the OSS directory are organized by partition name, you can import the data into the partitioned table in dynamic partition mode.
      1. Click here to complete one-click authorization.
      2. Create an Object Storage Service (OSS) bucket if you do not already have one. For more information, see Create buckets. Then save the vehicle1.csv file and the vehicle2.csv file to the OSS bucket directories mc-test/data_location/ds=20200909/ and mc-test/data_location/ds=20200910/, respectively, in the oss-cn-hangzhou region, and organize the OSS directory paths.

        The vehicle1.csv file and vehicle2.csv file data are as follows:

        --vehicle1.csv
        1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
        1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
        --vehicle2.csv
        1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
        1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
        1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
        The OSS directory path is organized based on the bucket, region, and endpoint information as follows:
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/
      3. Log on to the MaxCompute client to create a destination table ambulance_data_csv_load_dynpt. The following code provides an example:
        create table ambulance_data_csv_load_dynpt (
        vehicleId STRING,
        recordId STRING,
        patientId STRING,
        calls STRING,
        locationLatitute STRING,
        locationLongtitue STRING,
        recordTime STRING,
        direction STRING)
        partitioned by (ds STRING);
      4. Execute the load overwrite command to import files on OSS into the destination table. The following code provides an example:
        load overwrite table ambulance_data_csv_load_dynpt partition(ds)
        from
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
        row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
        stored as textfile;
      5. View the import results of the destination table ambulance_data_csv_load_dynpt. The following code provides an example:
        -- Enable a full table scan only for the current session.
        set odps.sql.allow.fullscan=true;
        select * from ambulance_data_csv_load_dynpt;
        The following result is returned:
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  | ds         |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200909   |
        | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200909   |
        | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          | 20200909   |
        | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          | 20200910   |
        | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW         | 20200910   |
        | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200910   |
        | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          | 20200910   |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
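
      To check a single imported partition without scanning the full table, you can filter on the partition key column. A sketch using the table from this example:

      ```sql
      -- Reading one partition does not require enabling a full table scan.
      select * from ambulance_data_csv_load_dynpt where ds='20200909';
      ```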