MaxCompute allows you to execute INSERT statements to write data from a MaxCompute
project to an Object Storage Service (OSS) directory by using OSS external tables.
This topic describes how to write data to OSS and provides examples of data write
operations.
Background information
You can write data in an internal table of MaxCompute to OSS. After you read data
from an OSS external table and process the data, you can write the processed data
back to OSS.
- Write data to OSS by using a built-in text extractor or a built-in open source data
extractor.
If you need to write data in the CSV or TSV format, an open source format, or a compression
format that is supported by MaxCompute to OSS, you can create an OSS external table
by using a built-in text extractor or a built-in open source data extractor of MaxCompute
and execute an INSERT statement to write data to OSS by using the OSS external table.
- Write data to OSS by using a custom extractor.
If you need to write data in a custom format to OSS, you can create an OSS external
table by using a custom extractor and execute an INSERT statement to write data to
OSS by using the OSS external table.
For more information, see Example: Write data to OSS by using a custom extractor.
- Write data to OSS by using the multipart upload feature.
If you need to write data in an open source format to OSS, you can create an OSS external
table by using a built-in open source data extractor, enable or disable the multipart
upload feature, and execute an INSERT statement to write data to OSS by using the
OSS external table. To enable or disable the multipart upload feature, you must configure
the odps.sql.unstructured.oss.commit.mode
parameter at the session level or project level. By default, this feature is disabled.
For more information about the multipart upload feature, see Multipart upload.
Before you use this feature, make sure that jobConf2 is enabled for your project to
generate a job execution plan. To enable jobConf2, set the odps.sql.jobconf.odps2
parameter to True.
Note The default value of odps.sql.jobconf.odps2
is True. If it is not set to True, run the set odps.sql.jobconf.odps2=true;
command to enable jobConf2 at the session level.
Valid values of the
odps.sql.unstructured.oss.commit.mode
parameter:
- False: Data that is written from MaxCompute to an OSS external table is stored in
the .odps folder in the directory specified by LOCATION. A .meta file is included in the .odps folder to ensure that the data written to OSS is consistent with the data in MaxCompute.
Only MaxCompute can correctly process the data in the .odps folder. If another data processing engine parses the data in this folder, an error
is returned.
- True: MaxCompute uses the multipart upload feature to ensure that the data written
to OSS is consistent with the data in MaxCompute in
two-phase commit
mode. In this case, the .odps folder and the .meta file are not generated. Other data processing engines can normally parse the data
that is written from MaxCompute to OSS.
You can configure the
odps.sql.unstructured.oss.commit.mode
parameter at the project level and session level. Sample commands:
Important In extreme cases, if a job that executes the
INSERT OVERWRITE
statement fails, historical data is deleted but new data is not written to the destination
table.
Cause: A hardware failure occurs or the metadata fails to be updated. This issue rarely
occurs. If this issue occurs, new data cannot be written to the destination table,
and the deleted historical data cannot be restored because the delete operation in
OSS cannot be rolled back.
Solution:
- If you want to use historical data to overwrite the data of an OSS external table,
you must back up the OSS data. This way, you can use the backup data to overwrite
the OSS external table when the job fails. For example, you can execute the
insert overwrite table T select * from table T;
statement to overwrite table T by using the historical data of table T.
- If a job that executes the
INSERT OVERWRITE
statement can be repeatedly submitted, you can submit the job again when the job
fails.
Prerequisites
Before you write data to OSS, make sure that the following requirements are met:
- The Alibaba Cloud account or RAM user is granted access permissions on OSS.
For more information about authorization, see STS authorization for OSS.
- (Optional) An OSS bucket, OSS directories, and OSS data files are prepared.
Note MaxCompute can automatically create an OSS directory. You can also manually create
an OSS directory.
For more information about how to create an OSS bucket, how to create an OSS directory,
and how to upload data files, see Create buckets, Create directories, and Upload files.
Example: Write data to a non-partitioned OSS directory by using a built-in text extractor
Read data from the OSS external table mc_oss_csv_external1 that is created in Example: Create a non-partitioned table as an OSS external table by using a built-in
text extractor and write the data in the CSV format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/output/
.
Perform the following steps:
- Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external
table maps to the OSS directory to which you want to write data.
create external table if not exists mc_oss_csv_external4
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler'
with serdeproperties (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/output/';
- Execute the
INSERT OVERWRITE
or INSERT INTO
statement on the MaxCompute client to write data to the created OSS external table.
insert into table mc_oss_csv_external4 select * from mc_oss_csv_external1;
If the statement is successfully executed, you can view the exported files in the
OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains a .meta file and a folder where .csv files are saved. Example of a file in the OSS directory: output/.odps/20210330*********/R2_1_0_0-0_TableSink1-0-.csv
.

Note
- The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records
valid data in the .odps folder. If the INSERT statement is successfully executed,
all data in the .odps folder is valid. MaxCompute parses the .meta file only when
the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement
is used and the job that executes the statement fails or is terminated, execute the
INSERT OVERWRITE statement again.
- If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number
of files generated in the OSS directory is the same as the number of concurrent SQL
jobs.
- If you execute the
insert overwrite ... select ... from ...;
statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
- You can use flexible semantics and configurations of MaxCompute to limit the number
of files that can be generated. If an outputer runs in a mapper, you can adjust the
number of generated files by changing the value of
odps.stage.mapper.split.size
to adjust the number of concurrent mappers. If an outputer runs in a reducer, you
can adjust the number of generated files by changing the value of odps.stage.reducer.num
. If an outputer runs in a joiner, you can adjust the number of generated files by
changing the value of odps.stage.joiner.num
.
Example: Write data to a partitioned OSS directory by using a built-in text extractor
Read data from the OSS external table mc_oss_csv_external2 that is created in Example: Create a partitioned table as an OSS external table by using a built-in text
extractor and write the data in the CSV format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/output/
.
Perform the following steps:
- Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external
table maps to the OSS directory to which you want to write data.
create external table if not exists mc_oss_csv_external5
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string
)
partitioned by (
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler'
with serdeproperties (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/output/';
- Execute the
INSERT OVERWRITE
or INSERT INTO
statement on the MaxCompute client to write data to the created OSS external table.
insert into table mc_oss_csv_external5 partition (direction) select * from mc_oss_csv_external2;
If the statement is successfully executed, you can view the exported files in the
OSS directory. Subdirectories that are mapped to the partitions specified in the INSERT
statement are generated in the output folder. A folder named .odps is generated in each subdirectory. The .odps folder contains the .meta file and a folder where .csv files are saved. Example of a file in the OSS directory: output/direction=N/.odps/20210330*********/R2_1_0_0-0_TableSink1-0-.csv
.

Note
- The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records
valid data in the .odps folder. If the INSERT statement is successfully executed,
all data in the .odps folder is valid. MaxCompute parses the .meta file only when
the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement
is used and the job that executes the statement fails or is terminated, execute the
INSERT OVERWRITE statement again.
- If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number
of files generated in the OSS directory is the same as the number of concurrent SQL
jobs.
- If you execute the
insert overwrite ... select ... from ...;
statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
- You can use flexible semantics and configurations of MaxCompute to limit the number
of files that can be generated. If an outputer runs in a mapper, you can adjust the
number of generated files by changing the value of
odps.stage.mapper.split.size
to adjust the number of concurrent mappers. If an outputer runs in a reducer, you
can adjust the number of generated files by changing the value of odps.stage.reducer.num
. If an outputer runs in a joiner, you can adjust the number of generated files by
changing the value of odps.stage.joiner.num
.
Example: Write data in a compression format to OSS by using a built-in text extractor
Read data from the OSS external table mc_oss_csv_external1 that is created in Example: Create a non-partitioned table as an OSS external table by using a built-in
text extractor and write the data in the GZIP format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/output/
.
Perform the following steps:
- Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external
table maps to the OSS directory to which you want to write data.
create external table if not exists mc_oss_csv_external6
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler'
with serdeproperties (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole',
'odps.text.option.gzip.input.enabled'='true',
'odps.text.option.gzip.output.enabled'='true'
)
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/output/';
- Execute the
INSERT OVERWRITE
or INSERT INTO
statement on the MaxCompute client to write data to the created OSS external table.
insert into table mc_oss_csv_external6 select * from mc_oss_csv_external1;
If the statement is successfully executed, you can view the exported files in the
OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where .gz files are saved. Example of a file in the OSS directory: output/.odps/20220413*********/M1_0_0-0_TableSink1-0-.csv.gz
.

Note
- The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records
valid data in the .odps folder. If the INSERT statement is successfully executed,
all data in the .odps folder is valid. MaxCompute parses the .meta file only when
the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement
is used and the job that executes the statement fails or is terminated, execute the
INSERT OVERWRITE statement again.
- If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number
of files generated in the OSS directory is the same as the number of concurrent SQL
jobs.
- If you execute the
insert overwrite ... select ... from ...;
statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
- You can use flexible semantics and configurations of MaxCompute to limit the number
of files that can be generated. If an outputer runs in a mapper, you can adjust the
number of generated files by changing the value of
odps.stage.mapper.split.size
to adjust the number of concurrent mappers. If an outputer runs in a reducer, you
can adjust the number of generated files by changing the value of odps.stage.reducer.num
. If an outputer runs in a joiner, you can adjust the number of generated files by
changing the value of odps.stage.joiner.num
.
Example: Write data to OSS by using a built-in open source data extractor
Read data from the OSS external table mc_oss_csv_external1 that is created in Example: Create a non-partitioned table as an OSS external table by using a built-in
text extractor and write the data in the ORC format to the OSS directory oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo4/output/
.
Perform the following steps:
- Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external
table maps to the OSS directory to which you want to write data.
create external table if not exists mc_oss_orc_external
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored as orc
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo4/output/';
- Execute the
INSERT OVERWRITE
or INSERT INTO
statement on the MaxCompute client to write data to the created OSS external table.
insert into table mc_oss_orc_external select * from mc_oss_csv_external1;
If the statement is successfully executed, you can view the exported files in the
OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where ORC files are saved. Example of a file in the OSS directory:
output/.odps/20220413*********/M1_0_0-0_TableSink1
.

Note
- The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records
valid data in the .odps folder. If the INSERT statement is successfully executed,
all data in the .odps folder is valid. MaxCompute parses the .meta file only when
the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement
is used and the job that executes the statement fails or is terminated, execute the
INSERT OVERWRITE statement again.
- If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number
of files generated in the OSS directory is the same as the number of concurrent SQL
jobs.
- If you execute the
insert overwrite ... select ... from ...;
statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
- You can use flexible semantics and configurations of MaxCompute to limit the number
of files that can be generated. If an outputer runs in a mapper, you can adjust the
number of generated files by changing the value of
odps.stage.mapper.split.size
to adjust the number of concurrent mappers. If an outputer runs in a reducer, you
can adjust the number of generated files by changing the value of odps.stage.reducer.num
. If an outputer runs in a joiner, you can adjust the number of generated files by
changing the value of odps.stage.joiner.num
.
Example: Write data to OSS by using a custom extractor
This example shows how to write data from MaxCompute to an OSS directory by using
the custom extractor that is developed in Example: Create an OSS external table by using a custom extractor.
Perform the following steps:
- Log on to the MaxCompute client and execute the following statement to create an OSS external table. The OSS external
table maps to the OSS directory to which you want to write data.
create external table if not exists mc_oss_external
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored by 'com.aliyun.odps.udf.example.text.TextStorageHandler'
with serdeproperties (
'delimiter'='|',
'odps.properties.rolearn'='acs:ram::xxxxxxxxxxxxx:role/aliyunodpsdefaultrole'
)
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/output'
using 'javatest-1.0-SNAPSHOT.jar';
- Execute the
INSERT OVERWRITE
or INSERT INTO
statement on the MaxCompute client to write data to the created OSS external table.
insert into table mc_oss_external select * from mc_oss_csv_external1;
If the statement is successfully executed, you can view the exported files in the
OSS directory. A folder named .odps is generated in the output folder. The .odps folder contains the .meta file and a folder where TEXT files are saved. Example of a file in the OSS directory:
output/.odps/20220413*********/M1_0_0-0_TableSink1-0
.

Note
- The .meta file in the .odps folder is an extra macro data file that is generated by MaxCompute. This file records
valid data in the .odps folder. If the INSERT statement is successfully executed,
all data in the .odps folder is valid. MaxCompute parses the .meta file only when
the job that executes the INSERT statement fails. If the INSERT OVERWRITE statement
is used and the job that executes the statement fails or is terminated, execute the
INSERT OVERWRITE statement again.
- If you use a built-in extractor of MaxCompute to process TSV or CSV files, the number
of files generated in the OSS directory is the same as the number of concurrent SQL
jobs.
- If you execute the
insert overwrite ... select ... from ...;
statement and 1,000 mappers are allocated on the source table specified by from_tablename, 1,000 TSV or CSV files will be generated.
- You can use flexible semantics and configurations of MaxCompute to limit the number
of files that can be generated. If an outputer runs in a mapper, you can adjust the
number of generated files by changing the value of
odps.stage.mapper.split.size
to adjust the number of concurrent mappers. If an outputer runs in a reducer, you
can adjust the number of generated files by changing the value of odps.stage.reducer.num
. If an outputer runs in a joiner, you can adjust the number of generated files by
changing the value of odps.stage.joiner.num
.