This topic describes how to use the LOAD statement. The LOAD statement imports data from an external data store, such as Object Storage Service (OSS), into a MaxCompute table or table partition.

Precautions

Before you execute the LOAD statement, make sure that MaxCompute is granted the permissions required to read the source data. The authorization methods are the same as those used to grant access to MaxCompute external tables. You can use either of the following methods:
  • Specify the AccessKey ID and AccessKey secret in the OSS directory from which data is read. Example of the directory on OSS:
    'oss://<yourAccessKeyId>:<yourAccessKeySecret>@oss-cn-hangzhou-zmf.aliyuncs.com/my_bucket_id/my_location/'
    Note If you use this method, the AccessKey ID and AccessKey secret are displayed in plaintext. This may pose security risks. We recommend that you do not use this method.
  • Use STS. For more information, see STS authorization.
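The two methods differ only in where the credentials appear in the LOAD statement. The following is a minimal sketch based on the examples later in this topic; the table name my_table and the placeholder values are illustrative, not real identifiers:

```sql
-- Method 1: plaintext AccessKey pair embedded in the OSS directory (not recommended).
LOAD OVERWRITE TABLE my_table
FROM LOCATION 'oss://<yourAccessKeyId>:<yourAccessKeySecret>@oss-cn-hangzhou-zmf.aliyuncs.com/my_bucket_id/my_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler';

-- Method 2: STS authorization. The OSS directory carries no credentials; the
-- ARN of the authorized RAM role is passed in SERDEPROPERTIES instead.
LOAD OVERWRITE TABLE my_table
FROM LOCATION 'oss://oss-cn-hangzhou-zmf.aliyuncs.com/my_bucket_id/my_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<yourAccountId>:role/aliyunodpsdefaultrole');
```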

Use an extractor or a storage handler to import data

  • Syntax
    LOAD OVERWRITE|INTO TABLE table_name [PARTITION(partcol1=val1, partcol2=val2 ...)]
    FROM LOCATION external_location
    [STORED BY StorageHandler]
    [WITH SERDEPROPERTIES (Options)];
  • Parameters
    • LOAD INTO: appends data to a table or partition.
    • LOAD OVERWRITE: clears the existing data in a table or partition and then inserts data into the table or partition.
    • table_name: the name of the destination table. You must create the destination table before you run the LOAD statement. The schema of the destination table must be compatible with the format of the MaxCompute external table.
    • LOCATION: the OSS directory in which the data files you want to read are saved. The system reads all files in the directory by default.
    • STORED BY: the name of the storage handler. You can use this parameter in the same way as you use it for a MaxCompute external table. For more information, see Access OSS data by using the built-in extractor.
    • WITH SERDEPROPERTIES: the properties of the MaxCompute external table. The properties supported by this parameter are the same as those supported by the MaxCompute external table. For more information about the properties, see Access OSS data by using the built-in extractor.
  • Example
    Assume that the vehicle.csv file contains the following data:
    1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
    1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
    1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
    1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
    Sample code:
    -- Save the vehicle.csv file to a folder in an OSS bucket. Obtain the structure of the directory where the MaxCompute external table is saved.
    oss://${accessKeyId}:${accessKeySecret}@oss-cn-hangzhou-zmf.aliyuncs.com/bucket/data_location/
    
    -- Create destination table ambulance_data_csv_load.
    CREATE TABLE ambulance_data_csv_load (
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING );
    
    -- Import the CSV file from OSS to the destination table.
    LOAD OVERWRITE TABLE ambulance_data_csv_load
    FROM
    LOCATION 'oss://oss-cn-hangzhou-zmf.aliyuncs.com/bucket/data_location/'
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    WITH SERDEPROPERTIES (
    'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',   -- The Alibaba Cloud Resource Name (ARN) specified in the AliyunODPSDefaultRole role.
    'odps.text.option.delimiter'=','
    );
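The preceding example uses LOAD OVERWRITE. As a hedged sketch, assuming the same bucket path, role ARN, and destination table as above, the append variant differs only in the keyword:

```sql
-- Sketch: LOAD INTO appends to ambulance_data_csv_load instead of replacing
-- its contents. Run after the LOAD OVERWRITE above, the table would hold two
-- copies of the ten CSV rows.
LOAD INTO TABLE ambulance_data_csv_load
FROM
LOCATION 'oss://oss-cn-hangzhou-zmf.aliyuncs.com/bucket/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
'odps.text.option.delimiter'=','
);
```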

Import data stored in an open source format

  • Syntax
    LOAD OVERWRITE|INTO TABLE table_name [PARTITION(partcol1=val1, partcol2=val2 ...)]
    FROM LOCATION external_location
    [ROW FORMAT SERDE '<serde class>'
      [WITH SERDEPROPERTIES ('odps.properties.rolearn'='${rolearn}' [,'name2'='value2',...])]
    ]
    STORED AS <file format>;
  • Parameters
    • LOAD INTO: appends data to a table or partition.
    • LOAD OVERWRITE: clears the existing data in a table or partition and then inserts data into the table or partition.
    • table_name: the name of the destination table. You must create the destination table before you run the LOAD statement. The schema of the destination table must be compatible with the format of the MaxCompute external table.
    • ROW FORMAT SERDE: This parameter is optional. You can use this parameter in the same way as you use it for the MaxCompute external table. For more information, see Process OSS data stored in open source formats.
      Mappings between file formats and SerDe classes:
      • SEQUENCEFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      • TEXTFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      • RCFILE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
      • ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde
      • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcSerde
      • PARQUET: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
      • AVRO: org.apache.hadoop.hive.serde2.avro.AvroSerDe
    • SERDEPROPERTIES: If you use STS to grant MaxCompute the required permissions, you must use this parameter to specify the odps.properties.rolearn property, which is the ARN specified in the AliyunODPSDefaultRole role. For more information, see STS authorization. If MaxCompute and OSS are owned by the same Alibaba Cloud account, one-click authorization is supported. For more information about the SERDEPROPERTIES supported by other open source formats, see Process OSS data stored in open source formats.
    • STORED AS: the format of the imported data file, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, and TEXTFILE. You can use this parameter in the same way as you use it for the MaxCompute external table. For more information, see Process OSS data stored in open source formats.
  • Example:
     -- Assume that the directory where the MaxCompute external table is saved uses the following structure. The data files in the directory do not contain partition key columns; the partition value is encoded in the directory path.
     oss://${accessKeyId}:${accessKeySecret}@oss-cn-hangzhou-zmf.aliyuncs.com/bucket/data_location/ds=20200910/
    
    -- Create destination table ambulance_data_csv_load_pt.
    CREATE TABLE ambulance_data_csv_load_pt (
    vehicleId STRING,
    recordId STRING,
    patientId STRING,
    calls STRING,
    locationLatitute STRING,
    locationLongtitue STRING,
    recordTime STRING,
    direction STRING)
    PARTITIONED BY (ds STRING);
    
    -- Import data stored in open source formats from OSS to destination table ambulance_data_csv_load_pt.
    LOAD OVERWRITE TABLE ambulance_data_csv_load_pt PARTITION(ds='20200910')
    FROM
     LOCATION 'oss://<yourAccessKeyId>:<yourAccessKeySecret>@oss-cn-hangzhou-zmf.aliyuncs.com/bucket/data_location/ds=20200910/'
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE;
     Note If you load data into a partitioned table, the data in the OSS directory must not contain partition key columns. In the preceding example, the files in the directory contain only the business fields; the partition value is supplied by the PARTITION clause.
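After the import, a quick read of the loaded partition can confirm the result. A minimal sketch, assuming the LOAD statement above succeeded:

```sql
-- Sketch: the ds column is populated from the PARTITION clause, not from the
-- CSV files, so every returned row should carry ds='20200910'.
SELECT * FROM ambulance_data_csv_load_pt WHERE ds='20200910';
```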