MaxCompute allows you to execute INSERT statements to write data from a MaxCompute project to an Object Storage Service (OSS) directory by using OSS external tables. This topic describes how to write data to OSS and provides examples of data write operations.

Background information

You can write data in an internal table of MaxCompute to OSS. After you read data from an OSS external table and process the data, you can write the processed data back to OSS.
  • Write data to OSS by using a built-in text extractor or a built-in open source data extractor.

    If you need to write data in the CSV or TSV format, an open source format, or a compression format that is supported by MaxCompute to OSS, you can create an OSS external table by using a built-in text extractor or a built-in open source data extractor of MaxCompute and execute an INSERT statement to write data to OSS by using the OSS external table.

  • Write data to OSS by using a custom extractor.

    If you need to write data in a custom format to OSS, you can create an OSS external table by using a custom extractor and execute an INSERT statement to write data to OSS by using the OSS external table.

    For more information, see Example: Write data to OSS by using a custom extractor.

  • Write data to OSS by using the multipart upload feature.

    If you need to write data in an open source format to OSS, you can create an OSS external table by using a built-in open source data extractor, enable or disable the multipart upload feature, and execute an INSERT statement to write data to OSS by using the OSS external table. To enable or disable the multipart upload feature, you must specify the odps.sql.unstructured.oss.commit.mode parameter at the session or project level. By default, this feature is disabled. For more information about the multipart upload feature, see Multipart upload.

    Before you use this feature, make sure that jobConf2 is enabled for your project to generate a job execution plan. To enable jobConf2, set the odps.sql.jobconf.odps2 parameter to True.

    Note The default value of odps.sql.jobconf.odps2 is True. If it is not set to True, run the set odps.sql.jobconf.odps2=true; command to enable jobConf2 at the session level.
    Valid values of the odps.sql.unstructured.oss.commit.mode parameter:
    • False: Data that is written from MaxCompute to an OSS external table is stored in the .odps folder in the directory specified by LOCATION. A .meta file is included in the .odps folder to ensure that the data written to OSS is consistent with the data in MaxCompute. Only MaxCompute can correctly process the data in the .odps folder. If another data processing engine parses the data in this folder, an error is returned.
    • True: MaxCompute uses the multipart upload feature to ensure that the data written to OSS is consistent with the data in MaxCompute in two-phase commit mode. In this case, the .odps folder and the .meta file are not generated. Other data processing engines can normally parse the data that is written from MaxCompute to OSS.
    Notice In extreme cases, if a job that executes the INSERT OVERWRITE statement fails, historical data is deleted but new data is not written to the destination table.

    Cause: A hardware failure occurs or the metadata fails to be updated. This issue rarely occurs. If this issue occurs, new data cannot be written to the destination table, and the deleted historical data cannot be restored because the delete operation in OSS cannot be rolled back.

    Solution:
    • If you want to use historical data to overwrite the data of an OSS external table, you must back up the OSS data. This way, you can use the backup data to overwrite the OSS external table when the job fails. For example, you can execute the insert overwrite table T select * from table T; statement to overwrite table T by using the historical data of table T.
    • If a job that executes the INSERT OVERWRITE statement can be repeatedly submitted, you can submit the job again when the job fails.

Prerequisites

Before you write data to OSS, make sure that the following requirements are met:

  • (Optional) An OSS bucket, OSS directories, and OSS data files are prepared.
    Note MaxCompute can automatically create an OSS directory. You can also manually create an OSS directory.

    For more information about how to create an OSS bucket, how to create an OSS directory, and how to upload data files, see Create buckets, Create directories, and Upload files.

Example: Write data to a non-partitioned OSS directory by using a built-in text extractor

Read data from the OSS external table mc_oss_csv_external1 that is created in Access OSS data by using a built-in extractor and write the data in the CSV format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/output/.

Perform the following steps:

  1. Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external table maps to the OSS directory to which you want to write data.
    create external table if not exists mc_oss_csv_external4
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    stored by 'com.aliyun.odps.CsvStorageHandler' 
    with serdeproperties (
     'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
    ) 
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/output/';                       
  2. Execute the INSERT OVERWRITE or INSERT INTO statement on the MaxCompute client to write data to the created OSS external table.
    insert into table mc_oss_csv_external4 select * from mc_oss_csv_external1;

    If the statement is successfully executed, you can view the exported files in the OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains a .meta file and a folder where .csv files are saved. Example of a file in the OSS directory: output/.odps/20210330*********/R2_1_0_0-0_TableSink1-0-.csv.

    Export result
    Note
    • The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records valid data in the .odps folder. If the INSERT statement is successfully executed, all data in the .odps folder is valid. MaxCompute parses the .meta file only when the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement is used and the job that executes the statement fails or is terminated, execute the INSERT OVERWRITE statement again.
    • If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number of files generated in the OSS directory is the same as the number of concurrent SQL jobs.
    • If you execute the insert overwrite ... select ... from ...; statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
    • You can use flexible semantics and configurations of MaxCompute to limit the number of files that can be generated. If an outputer runs in a mapper, you can adjust the number of generated files by changing the value of odps.stage.mapper.split.size to adjust the number of concurrent mappers. If an outputer runs in a reducer, you can adjust the number of generated files by changing the value of odps.stage.reducer.num. If an outputer runs in a joiner, you can adjust the number of generated files by changing the value of odps.stage.joiner.num.

Example: Write data to a partitioned OSS directory by using a built-in text extractor

Read data from the OSS external table mc_oss_csv_external2 that is created in Access OSS data by using a built-in extractor and write the data in the CSV format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/output/.

Perform the following steps:

  1. Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external table maps to the OSS directory to which you want to write data.
    create external table if not exists mc_oss_csv_external5
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string
    )
    partitioned by (
    direction string
    )
    stored by 'com.aliyun.odps.CsvStorageHandler' 
    with serdeproperties (
     'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
    ) 
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/output/';                      
  2. Execute the INSERT OVERWRITE or INSERT INTO statement on the MaxCompute client to write data to the created OSS external table.
    insert into table mc_oss_csv_external5 partition (direction) select * from mc_oss_csv_external2;

    If the statement is successfully executed, you can view the exported files in the OSS directory. Subdirectories that are mapped to the partitions specified in the INSERT statement are generated in the output folder. A folder named .odps is generated in each subdirectory. The .odps folder contains the .meta file and a folder where .csv files are saved. Example of a file in the OSS directory: output/direction=N/.odps/20210330*********/R2_1_0_0-0_TableSink1-0-.csv.

    Export result
    Note
    • The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records valid data in the .odps folder. If the INSERT statement is successfully executed, all data in the .odps folder is valid. MaxCompute parses the .meta file only when the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement is used and the job that executes the statement fails or is terminated, execute the INSERT OVERWRITE statement again.
    • If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number of files generated in the OSS directory is the same as the number of concurrent SQL jobs.
    • If you execute the insert overwrite ... select ... from ...; statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
    • You can use flexible semantics and configurations of MaxCompute to limit the number of files that can be generated. If an outputer runs in a mapper, you can adjust the number of generated files by changing the value of odps.stage.mapper.split.size to adjust the number of concurrent mappers. If an outputer runs in a reducer, you can adjust the number of generated files by changing the value of odps.stage.reducer.num. If an outputer runs in a joiner, you can adjust the number of generated files by changing the value of odps.stage.joiner.num.

Example: Write data in a compression format to OSS by using a built-in text extractor

Read data from the OSS external table mc_oss_csv_external1 that is created in Access OSS data by using a built-in extractor and write the data in the GZIP format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/output/.

Perform the following steps:

  1. Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external table maps to the OSS directory to which you want to write data.
    create external table if not exists mc_oss_csv_external6
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    stored by 'com.aliyun.odps.CsvStorageHandler' 
    with serdeproperties (
     'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole',
     'odps.text.option.gzip.input.enabled'='true',
     'odps.text.option.gzip.output.enabled'='true' 
    ) 
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/output/';
  2. Execute the INSERT OVERWRITE or INSERT INTO statement on the MaxCompute client to write data to the created OSS external table.
    insert into table mc_oss_csv_external6 select * from mc_oss_csv_external1;

    If the statement is successfully executed, you can view the exported files in the OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where .gz files are saved. Example of a file in the OSS directory: output/.odps/20220413*********/M1_0_0-0_TableSink1-0-.csv.gz.

    Export result
    Note
    • The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records valid data in the .odps folder. If the INSERT statement is successfully executed, all data in the .odps folder is valid. MaxCompute parses the .meta file only when the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement is used and the job that executes the statement fails or is terminated, execute the INSERT OVERWRITE statement again.
    • If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number of files generated in the OSS directory is the same as the number of concurrent SQL jobs.
    • If you execute the insert overwrite ... select ... from ...; statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
    • You can use flexible semantics and configurations of MaxCompute to limit the number of files that can be generated. If an outputer runs in a mapper, you can adjust the number of generated files by changing the value of odps.stage.mapper.split.size to adjust the number of concurrent mappers. If an outputer runs in a reducer, you can adjust the number of generated files by changing the value of odps.stage.reducer.num. If an outputer runs in a joiner, you can adjust the number of generated files by changing the value of odps.stage.joiner.num.

Example: Write data to OSS by using a built-in open source data extractor

Read data from the OSS external table mc_oss_csv_external1 that is created in Access OSS data by using a built-in extractor and write the data in the ORC format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo4/output/.

Perform the following steps:

  1. Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external table maps to the OSS directory to which you want to write data.
    create external table if not exists mc_oss_orc_external
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    stored as orc  
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo4/output/';
  2. Execute the INSERT OVERWRITE or INSERT INTO statement on the MaxCompute client to write data to the created OSS external table.
    insert into table mc_oss_orc_external select * from mc_oss_csv_external1;

    If the statement is successfully executed, you can view the exported files in the OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where ORC files are saved. Example of a file in the OSS directory: output/.odps/20220413*********/M1_0_0-0_TableSink1.

    Export result
    Note
    • The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records valid data in the .odps folder. If the INSERT statement is successfully executed, all data in the .odps folder is valid. MaxCompute parses the .meta file only when the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement is used and the job that executes the statement fails or is terminated, execute the INSERT OVERWRITE statement again.
    • If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number of files generated in the OSS directory is the same as the number of concurrent SQL jobs.
    • If you execute the insert overwrite ... select ... from ...; statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
    • You can use flexible semantics and configurations of MaxCompute to limit the number of files that can be generated. If an outputer runs in a mapper, you can adjust the number of generated files by changing the value of odps.stage.mapper.split.size to adjust the number of concurrent mappers. If an outputer runs in a reducer, you can adjust the number of generated files by changing the value of odps.stage.reducer.num. If an outputer runs in a joiner, you can adjust the number of generated files by changing the value of odps.stage.joiner.num.

Example: Write data to OSS by using a custom extractor

This example shows how to write data from MaxCompute to an OSS directory by using the custom extractor that is developed in Access OSS data by using a built-in extractor.

Perform the following steps:

  1. Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external table maps to the OSS directory to which you want to write data.
    create external table if not exists mc_oss_external 
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    stored by 'com.aliyun.odps.udf.example.text.TextStorageHandler' 
      with serdeproperties (
    'delimiter'='|',  
    'odps.properties.rolearn'='acs:ram::xxxxxxxxxxxxx:role/aliyunodpsdefaultrole'
    )
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/output'
    using 'javatest-1.0-SNAPSHOT.jar'; 
  2. Execute the INSERT OVERWRITE or INSERT INTO statement on the MaxCompute client to write data to the created OSS external table.
    insert into table mc_oss_external select * from mc_oss_csv_external1;

    If the statement is successfully executed, you can view the exported files in the OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where TEXT files are saved. Example of a file in the OSS directory: output/.odps/20220413*********/M1_0_0-0_TableSink1-0.

    Export result
    Note
    • The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records valid data in the .odps folder. If the INSERT statement is successfully executed, all data in the .odps folder is valid. MaxCompute parses the .meta file only when the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement is used and the job that executes the statement fails or is terminated, execute the INSERT OVERWRITE statement again.
    • If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number of files generated in the OSS directory is the same as the number of concurrent SQL jobs.
    • If you execute the insert overwrite ... select ... from ...; statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
    • You can use flexible semantics and configurations of MaxCompute to limit the number of files that can be generated. If an outputer runs in a mapper, you can adjust the number of generated files by changing the value of odps.stage.mapper.split.size to adjust the number of concurrent mappers. If an outputer runs in a reducer, you can adjust the number of generated files by changing the value of odps.stage.reducer.num. If an outputer runs in a joiner, you can adjust the number of generated files by changing the value of odps.stage.joiner.num.