If you want to import data from an external data store into a MaxCompute table or a partition of the table, you can use LOAD statements. This topic describes how to use LOAD statements to import data in the CSV format or another open source format from an external data store into a MaxCompute table.

Before you execute LOAD statements, make sure that you have the CREATE TABLE and ALTER permissions on MaxCompute projects. For more information about authorization, see Authorize users.

Description

MaxCompute allows you to execute the LOAD OVERWRITE or LOAD INTO statement to import data from an external data store into a MaxCompute table or a partition of the table. The data that you want to import must be in the CSV format or another open source format. The external data store can be Object Storage Service (OSS), Amazon Redshift, or BigQuery. If you want to import data from Amazon Redshift or BigQuery into a MaxCompute table, you must first import the data into OSS.

MaxCompute also allows you to import data into a partition of a partitioned table in dynamic partition mode.

The LOAD INTO statement directly appends data to a table or a partition of the table. The LOAD OVERWRITE statement clears a table or a partition of the table and then inserts data into the table or partition.
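
The following minimal sketch contrasts the two statements. The table name sale_data is a placeholder, and the OSS path placeholders follow the format that is described later in this topic:

    -- Append the imported data to the existing data in the table.
    load into table sale_data
    from location 'oss://<oss_endpoint>/<object>'
    stored by 'com.aliyun.odps.CsvStorageHandler';
    -- Clear the table or partition first, and then insert the imported data.
    load overwrite table sale_data
    from location 'oss://<oss_endpoint>/<object>'
    stored by 'com.aliyun.odps.CsvStorageHandler';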

Prerequisites

MaxCompute must be granted the related permissions before you can import data. The authorization methods for the LOAD OVERWRITE and LOAD INTO statements are the same as those for MaxCompute external tables. We recommend one-click authorization, which provides higher security. For more information, see STS authorization.

After the authorization is complete, you need to select an appropriate import method based on the format of the data that you want to import.

Limits

  • Data in an external data store can be imported into only the MaxCompute projects that are in the same region as the external data store.
  • If you want to import data into a partitioned table, make sure that the data in the external data store does not contain partition key columns. Partition values are supplied by pt_spec in the LOAD statement or, in dynamic partition mode, by the names of OSS subdirectories, as the sketch after this list shows.
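
For example, the following sketch (with a hypothetical table named t_vehicle) satisfies the second limit: the source files contain only the two data columns, and the partition key column ds exists only in the MaxCompute table definition.

    -- The source files contain only vehicle_id and direction values.
    -- ds is not present in the source files. Its value is set by pt_spec
    -- or derived from OSS subdirectory names in dynamic partition mode.
    create table t_vehicle (
    vehicle_id STRING,
    direction STRING)
    partitioned by (ds STRING);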

Use a built-in extractor to import data

  • Syntax
    {load overwrite|into} table <table_name> [partition (<pt_spec>)]
    from location <external_location>
    stored by <StorageHandler>
    [with serdeproperties (<Options>)];
  • Parameters
    • table_name: required. The name of the table into which you want to insert data. You must create the table before you insert data into it. The schema of the table, excluding partition key columns, must be consistent with the layout of the data in the external data store.
    • pt_spec: optional. The partition information of the table into which you want to insert data. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format, for example, (ds='20200910'). For a usage sketch, see the statement after this list.
    • external_location: required. The OSS directory that stores the data. The value of this parameter is in the 'oss://<oss_endpoint>/<object>' format. For more information about OSS endpoints, see OSS domain names. By default, MaxCompute reads all files in this directory.
    • StorageHandler: required. The name of a built-in storage handler. For example, com.aliyun.odps.CsvStorageHandler is the built-in storage handler for CSV files and defines how data is read from and written to them. You need to specify only the handler name; the read and write logic is implemented by the system. This parameter works in the same way as it does for a MaxCompute external table. For more information, see Access OSS data by using a built-in extractor.
    • Options: optional. The properties related to the external table in WITH SERDEPROPERTIES. The properties are the same as those for creating the MaxCompute external table. For more information about the properties, see Access OSS data by using a built-in extractor.
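  • Static partition sketch
    As referenced in the pt_spec description, the following sketch loads CSV data into a single static partition. The table name t_vehicle is a placeholder; the storage handler and the delimiter property are the ones described above.
    load overwrite table t_vehicle partition (ds='20200910')
    from location 'oss://<oss_endpoint>/<object>'
    stored by 'com.aliyun.odps.CsvStorageHandler'
    with serdeproperties ('odps.text.option.delimiter'=',');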
  • Example

    The MaxCompute project and the OSS bucket belong to the same Alibaba Cloud account. Use a built-in extractor to import data from the vehicle.csv file into a MaxCompute table over a virtual private cloud (VPC).

    1. Perform one-click authorization.
    2. Save the vehicle.csv file to the mc-test/data_location/ directory of an OSS bucket in the oss-cn-hangzhou region and construct the OSS directory path. For more information about how to create an OSS bucket, see Create buckets.

      The vehicle.csv file contains the following data:

      1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
      1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
      1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
      1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
      Construct the following OSS directory path from the bucket, region, and endpoint:
      oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
    3. Log on to the MaxCompute client and create a table named ambulance_data_csv_load. Sample statement:
      create table ambulance_data_csv_load (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING );
    4. Execute the LOAD OVERWRITE statement to import the vehicle.csv file from OSS into the destination table. Sample statement:
      load overwrite table ambulance_data_csv_load
      from
      location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
      stored by 'com.aliyun.odps.CsvStorageHandler'
      with serdeproperties (
      'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',   -- The Alibaba Cloud Resource Name (ARN) of the AliyunODPSDefaultRole role. You can view the ARN on the RAM Roles page. 
      'odps.text.option.delimiter'=','
      );
    5. View the import result of the ambulance_data_csv_load table. Sample statements:
      -- Enable a full table scan. This operation takes effect only for the current session. 
      set odps.sql.allow.fullscan=true;
      select * from ambulance_data_csv_load;
      The following result is returned:
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+
      | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  |
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+
      | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
      | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
      | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          |
      | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW         |
      | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
      | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          |
      +------------+------------+------------+------------+------------------+-------------------+------------+------------+

Import data in another open source format

  • Syntax
    {load overwrite|into} table <table_name> [partition (<pt_spec>)]
    from location <external_location>
    [row format serde '<serde_class>'
      [with serdeproperties (<Options>)]
    ]
    stored as <file_format>;
  • Parameters
    • table_name: required. The name of the table into which you want to insert data. You must create the table before you insert data into it. The schema of the table, excluding partition key columns, must be consistent with the layout of the data in the external data store.
    • pt_spec: optional. The partition information of the table into which you want to insert data. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
    • external_location: required. The OSS directory that stores the data. The value of this parameter is in the 'oss://<oss_endpoint>/<object>' format. For more information about OSS endpoints, see OSS domain names. By default, MaxCompute reads all files in this directory.
    • serde_class: optional. You can use this parameter in the same way as you use it for a MaxCompute external table. For more information, see Open source data formats supported by OSS external tables.
    • Options: optional. The properties related to the external table in WITH SERDEPROPERTIES. The properties are the same as those for creating the MaxCompute external table. For more information about the properties, see Open source data formats supported by OSS external tables.
    • file_format: required. The format of the data that you want to import, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, or TEXTFILE. You can use this parameter in the same way as you use it for a MaxCompute external table. For more information, see Open source data formats supported by OSS external tables.
      Note
      • If you want to use the default values of serde_class and Options, you do not need to specify these parameters. For a minimal statement that relies on the defaults, see the sketch after this list.
      • The size of a file that you want to import cannot exceed 3 GB. If the size exceeds 3 GB, split the file.
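  • Minimal sketch with default serde settings
    For example, ORC files carry their own schema metadata, so you can rely on the default values of serde_class and Options when you import ORC data. The table name t_orc_load is a placeholder.
    -- Neither row format serde nor with serdeproperties is specified;
    -- the defaults for the ORC format are used.
    load overwrite table t_orc_load
    from location 'oss://<oss_endpoint>/<object>'
    stored as orc;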
  • Examples
    • Example 1: Import data in another open source format. The MaxCompute project and the OSS bucket belong to the same Alibaba Cloud account. Import data from a text file named vehicle into a MaxCompute table over a VPC.
      1. Perform one-click authorization.
      2. Save the vehicle text file to the mc-test/data_location/ directory of an OSS bucket in the oss-cn-hangzhou region and construct the OSS directory path. For more information about how to create an OSS bucket, see Create buckets.

        The vehicle text file contains the following data:

        1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
        1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
        1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
        1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
        Construct the following OSS directory path from the bucket, region, and endpoint:
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
      3. Log on to the MaxCompute client and create a destination table named ambulance_data_textfile_load_pt. Sample statement:
        create table ambulance_data_textfile_load_pt (
        vehicleId STRING,
        recordId STRING,
        patientId STRING,
        calls STRING,
        locationLatitute STRING,
        locationLongtitue STRING,
        recordTime STRING,
        direction STRING)
        partitioned by (ds STRING);
      4. Execute the LOAD OVERWRITE statement to import the vehicle text file from OSS to the destination table. Sample statement:
        load overwrite table ambulance_data_textfile_load_pt partition(ds='20200910')
        from
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
        row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
        stored as textfile;
      5. View the import result of the ambulance_data_textfile_load_pt table. Sample statements:
        -- Enable a full table scan. This operation takes effect only for the current session. 
        set odps.sql.allow.fullscan=true;
        select * from ambulance_data_textfile_load_pt;
        The following result is returned. Because LazySimpleSerDe uses '\001' (Ctrl+A) as its default field delimiter, each comma-separated row is read into the first column as a single string (a corrected variant follows the result table):
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  | ds         |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        | 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL       | NULL       | NULL       | NULL             | NULL              | NULL       | NULL       | 20200910   |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
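        If you want the comma-separated fields to be split into separate columns, one option is to pass a field delimiter to the serde. The following statement is a hedged sketch that assumes LazySimpleSerDe honors the Hive field.delim property in WITH SERDEPROPERTIES:
        load overwrite table ambulance_data_textfile_load_pt partition(ds='20200910')
        from
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
        row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
        with serdeproperties ('field.delim'=',')
        stored as textfile;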
    • Example 2: Import data into a destination table in dynamic partition mode.
      Note If the subdirectories under an OSS directory are mapped to partition names, you can import data into a partitioned table in dynamic partition mode.
      1. Perform one-click authorization.
      2. Save the vehicle1.csv file to the mc-test/data_location/ds=20200909/ directory and the vehicle2.csv file to the mc-test/data_location/ds=20200910/ directory of an OSS bucket in the oss-cn-hangzhou region, and then construct the OSS directory paths. For more information about how to create an OSS bucket, see Create buckets.

        The vehicle1.csv and vehicle2.csv files contain the following data:

        --vehicle1.csv
        1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
        1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
        1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
        --vehicle2.csv
        1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
        1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
        1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
        1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
        Construct the following OSS directory paths from the bucket, region, and endpoint:
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
        oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/
      3. Log on to the MaxCompute client and create a destination table named ambulance_data_csv_load_dynpt. Sample statement:
        create table ambulance_data_csv_load_dynpt (
        vehicleId STRING,
        recordId STRING,
        patientId STRING,
        calls STRING,
        locationLatitute STRING,
        locationLongtitue STRING,
        recordTime STRING,
        direction STRING)
        partitioned by (ds STRING);
      4. Execute the LOAD OVERWRITE statement to import the files from OSS into the destination table. Sample statement:
        load overwrite table ambulance_data_csv_load_dynpt partition(ds)
        from
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
        row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
        stored as textfile;
      5. View the import result of the ambulance_data_csv_load_dynpt table. Sample statements:
        -- Enable a full table scan. This operation takes effect only for the current session. 
        set odps.sql.allow.fullscan=true;
        select * from ambulance_data_csv_load_dynpt;
        The following result is returned:
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime | direction  | ds         |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
        | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200909   |
        | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200909   |
        | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          | 20200909   |
        | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20200909   |
        | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          | 20200910   |
        | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW         | 20200910   |
        | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         | 20200910   |
        | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | N          | 20200910   |
        +------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+