MaxCompute lets you export data from a MaxCompute project to an external data store, such as Object Storage Service (OSS) or Hologres. Other compute engines can then use this data. This topic describes how to use the UNLOAD command and provides examples.
Feature introduction
You can run the UNLOAD command to export data to an external data store, such as OSS or Hologres. You can store data in OSS in CSV format or other open source formats. Note the following points; a minimal sketch of the command follows them:
To export MaxCompute data to other compute engines, such as Amazon Redshift or BigQuery, you must first export the data to OSS.
If you export data multiple times, the new data is exported to new files. Existing files are not overwritten.
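For orientation, the following is a minimal sketch of an export to OSS. The table name, path, and role are placeholders; the full syntax, parameters, and concrete examples are described in the sections below.
-- Minimal sketch: export a table to OSS as CSV files (placeholders throughout).
UNLOAD FROM <table_name>
INTO LOCATION 'oss://<oss_endpoint>/<object>'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<uid>:role/<your_role>');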
Limits
The UNLOAD command has the following limits:
When you export a file to OSS, the system automatically splits the file and generates a name for it. Therefore, you cannot specify a custom filename or file extension.
When you export data from MaxCompute to Hologres, dual-signature authorization is not supported.
You cannot export data from MaxCompute to a partitioned Hologres table.
Adding file extensions to exported files in open source formats is not supported.
Notes
The UNLOAD command is free of charge. However, the query clauses in the UNLOAD command scan data and use computing resources. These query clauses are billed as standard SQL jobs.
In some scenarios, storing structured data in OSS can reduce storage costs. However, you should estimate the costs in advance.
MaxCompute storage costs USD 0.018 per GB per month. For more information about storage billing, see Storage pricing (pay-as-you-go). Data imported into MaxCompute has an approximate compression ratio of 5:1. You are billed based on the size of the compressed data.
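As a rough worked example based on these rates: 100 GB of raw data compresses to about 20 GB in MaxCompute at the approximate 5:1 ratio, so its MaxCompute storage cost is about 20 GB × USD 0.018/GB/month = USD 0.36 per month. Per the note in the export examples below, data written to OSS is typically about four times larger than its compressed size in MaxCompute, so the same data would occupy roughly 80 GB in OSS, or about USD 1.44 per month at the Standard rate listed next.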
The unit price for the OSS Standard storage class is USD 0.018 per GB per month. Other storage classes, such as Infrequent Access, Archive, and Cold Archive, are also available. For more information, see Storage pricing.
If you are exporting data solely to save on storage costs, you should first test your data to estimate the compression ratio. Then, estimate the cost of the UNLOAD operation based on the export query statement. You should also evaluate how you will access the exported data. This planning helps you avoid extra costs from unnecessary data migration.
Prerequisites
The account has the required MaxCompute permissions.
Before you run an UNLOAD operation, the account must have the read permission (Select) on the table data that you want to export from the MaxCompute project. For more information about how to grant permissions, see MaxCompute permissions.
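For example, a project administrator can grant this permission with a statement like the following sketch (the table and user names are placeholders, not from this topic):
-- Hypothetical sketch: grant the Select permission on the source table to a project member.
GRANT Select ON TABLE sale_detail TO USER ALIYUN$<account_name>;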
The account has the required permissions on the data source for the external storage.
Before you export data from MaxCompute to an external storage service, you must grant MaxCompute access to that service (OSS or Hologres). The authorization method for the UNLOAD command is the same as the method for MaxCompute external tables. The following sections describe how to grant authorization for OSS and Hologres.
External storage: OSS
For higher security, you can use one-click authorization. For more information, see STS mode authorization.
Note: The examples in this topic use one-click authorization. The role name is AliyunODPSDefaultRole.
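After one-click authorization, the ARN of this role, which you pass to the odps.properties.rolearn property in the examples below, takes the following form, where <uid> is the ID of your Alibaba Cloud account:
acs:ram::<uid>:role/AliyunODPSDefaultRole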
External storage: Hologres
To complete the authorization, you can create a RAM role, grant it permissions to access MaxCompute, and then add the role to a Hologres instance. For more information, see Create a Hologres external table (STS mode).
After you complete the authorization, select an export method based on the format of the data you want to export:
Export data using a built-in extractor (StorageHandler)
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO LOCATION <external_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Parameters
External storage: OSS
| Parameter | Required | Description |
| --- | --- | --- |
| select_statement | No | A SELECT clause that queries the data to insert into the destination OSS path from the source table. The source table can be a partitioned or non-partitioned table. For more information about the SELECT clause, see SELECT syntax. |
| table_name, pt_spec | No | Specifies the data to export by table name, or by table name and partition name. This export method does not generate a query statement, so no fees are incurred. The format of pt_spec is (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). |
| external_location | Yes | The destination OSS path where the exported data is stored, in the format 'oss://<oss_endpoint>/<object>'. For more information about OSS paths, see Access OSS over IPv6. |
| StorageHandler | Yes | The name of the built-in StorageHandler. Set this to com.aliyun.odps.CsvStorageHandler or com.aliyun.odps.TsvStorageHandler. These built-in StorageHandlers process CSV and TSV files and define how to read and write them. The system implements the related logic, so you only need to specify this parameter. This method adds the .csv or .tsv extension to the exported filename by default. The usage is the same as for MaxCompute external tables. For more information, see Create an OSS external table. |
| '<property_name>'='<property_value>' | No | property_name is the property name, and property_value is the property value. The supported properties are the same as for MaxCompute external tables. For more information, see Create an OSS external table. |
External storage: Hologres
| Parameter | Required | Description |
| --- | --- | --- |
| select_statement | No | A SELECT clause that queries the data to insert into the destination Hologres path from the source table. The source table must be a non-partitioned table. For more information about the SELECT clause, see SELECT syntax. |
| table_name | No | Specifies the data to export by table name. This export method does not generate a query statement, so no fees are incurred. |
| external_location | Yes | The destination Hologres path where the exported data is stored, in the format 'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&][useSSL={true|false}&]table=<holo_table_name>/'. For more information about Hologres paths, see Create a Hologres external table. |
| StorageHandler | Yes | The name of the built-in StorageHandler. Set this to com.aliyun.odps.jdbc.JdbcStorageHandler to use the JDBC StorageHandler connection method. |
| '<property_name>'='<property_value>' | No | property_name is the property name, and property_value is the property value. When you export data to Hologres, you must specify the following properties. 'odps.properties.rolearn'='<ram_arn>': the Alibaba Cloud Resource Name (ARN) of the RAM role for Security Token Service (STS) authentication; you can obtain the ARN from the Basic Information section of the target RAM role's details page in the Resource Access Management console. 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver': the driver used to connect to the Hologres database; set this to org.postgresql.Driver. 'odps.federation.jdbc.target.db.type'='holo': the type of the database to connect to; set this to holo. Other supported properties are the same as for MaxCompute external tables. For more information, see Create a Hologres external table. |
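Putting these parameters together, a skeleton of a Hologres export looks like the following; the endpoint, port, database, schema, table, and ARN are placeholders, and a concrete example appears in the Examples section below.
UNLOAD FROM <table_name>
INTO LOCATION 'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&currentSchema=<schema>&useSSL=false&table=<holo_table_name>/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='<ram_arn>',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);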
Examples
External storage: OSS
Assume that you want to export data from the sale_detail table in a MaxCompute project to OSS. The sale_detail table contains the following data:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
Log on to the OSS console. In the oss-cn-hangzhou region, create the OSS bucket folder mc-unload/data_location/ and construct the OSS path. For more information about how to create an OSS bucket, see Create buckets.
The OSS path is constructed from the bucket, region, and endpoint information, as shown in the following example:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
Log on to the MaxCompute client and run the UNLOAD command to export data from the sale_detail table to OSS. The following sample commands show how to do this:
Example 1: Export data from the sale_detail table to a CSV file and compress it using the GZIP format. The following sample command shows how to do this:
-- Control the number of exported files: set the amount of data a single worker reads from the MaxCompute table. Unit: MB.
-- Because MaxCompute tables are compressed, the data exported to OSS is typically about four times larger.
SET odps.stage.mapper.split.size=256;
-- Export the data.
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true');
-- This is equivalent to the following statement.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM sale_detail
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true');
Example 2: Export data from the partition where sale_date='2013' and region='china' in the sale_detail table to a TSV file and compress it using the GZIP format.
-- Control the number of exported files: set the amount of data a single worker reads from the MaxCompute table. Unit: MB.
-- Because MaxCompute tables are compressed, the data exported to OSS is typically about four times larger.
SET odps.stage.mapper.split.size=256;
-- Export the data.
UNLOAD FROM sale_detail PARTITION (sale_date='2013', region='china')
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.TsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true');
'odps.text.option.gzip.output.enabled'='true' specifies that the exported file is compressed in the GZIP format. Currently, only GZIP is supported.
Log on to the OSS console to view the export result in the destination OSS path.
Result for Example 1: the exported GZIP-compressed CSV files appear in the data_location folder.
Result for Example 2: the exported GZIP-compressed TSV files appear in the data_location folder.
External storage: Hologres
Assume that you want to export data from the data_test table in a MaxCompute project to Hologres. The data_test table contains the following data:
+------------+----------+
| id         | name     |
+------------+----------+
| 3          | rgege    |
| 4          | Gegegegr |
+------------+----------+
In Hologres, create a destination table named mc_2_holo in the test database. You can run the following CREATE TABLE statement in the SQL editor of HoloWeb. For more information, see Connect to HoloWeb and run queries.
Note: The data types of the fields in the destination table must match the data types of the fields in the MaxCompute table. For more information, see Data type mapping between MaxCompute and Hologres.
CREATE TABLE mc_2_holo (id INT, name TEXT);
Log on to the MaxCompute client and run the UNLOAD command to export data from the data_test table to Hologres. The sample command is as follows:
UNLOAD FROM (SELECT * FROM data_test)
INTO LOCATION 'jdbc:postgresql://hgprecn-cn-5y**********-cn-hangzhou-internal.hologres.aliyuncs.com:80/test?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=mc_2_holo/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::13**************:role/aliyunodpsholorole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Query the exported data in Hologres:
SELECT * FROM mc_2_holo;
The following result is returned:
 id | name
----+----------
  4 | Gegegegr
  3 | rgege
Export data in other open source formats
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO LOCATION <external_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
]
STORED AS <file_format>
[PROPERTIES ('<tbproperty_name>'='<tbproperty_value>')];
Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| select_statement | No | A SELECT clause that queries the data to insert into the destination OSS path from the source table. The source table can be a partitioned or non-partitioned table. For more information about the SELECT clause, see SELECT syntax. |
| table_name, pt_spec | No | Specifies the data to export by table name, or by table name and partition name. This export method does not generate a query statement, so no fees are incurred. The format of pt_spec is (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). |
| external_location | Yes | The destination OSS path where the exported data is stored, in the format 'oss://<oss_endpoint>/<object>'. For more information about OSS paths, see Access OSS over IPv6. |
| serde_class | No | The usage is the same as for MaxCompute external tables. For more information, see Create an OSS external table. |
| '<property_name>'='<property_value>' | No | property_name is the property name, and property_value is the property value. The supported properties are the same as for MaxCompute external tables. For more information, see Create an OSS external table. |
| file_format | Yes | The format of the exported data file, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, or TEXTFILE. The usage is the same as for MaxCompute external tables. For more information, see Create an OSS external table. |
| '<tbproperty_name>'='<tbproperty_value>' | No | tbproperty_name is the name of an extended property of the external table, and tbproperty_value is the value of the property. For example, to export open source data in the SNAPPY or LZO compression format, set the compression property, such as 'mcfed.parquet.compression'='SNAPPY'. |
Example: Export data from the sale_detail table in a MaxCompute project to OSS
Sample data
-- Data in the sale_detail table.
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
Log on to the OSS console. In the oss-cn-hangzhou region, create the OSS bucket folder mc-unload/data_location/ and construct the OSS path. For more information about how to create an OSS bucket, see Create buckets.
The OSS path is constructed from the bucket, region, and endpoint information, as shown in the following example:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
Log on to the MaxCompute client and run the UNLOAD command to export data from the sale_detail table to OSS.
Example 1: Export data from the sale_detail table in PARQUET format and compress it using SNAPPY.
-- Control the number of exported files: set the amount of data a single worker reads from the MaxCompute table. Unit: MB.
-- Because MaxCompute tables are compressed, the data exported to OSS is typically about four times larger.
SET odps.stage.mapper.split.size=256;
-- Export the data.
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole')
STORED AS PARQUET
PROPERTIES('mcfed.parquet.compression'='SNAPPY');
Example 2: Export data from the partition where sale_date='2013' and region='china' in the sale_detail table. Export the data in PARQUET format and compress it using SNAPPY.
-- Control the number of exported files: set the amount of data a single worker reads from the MaxCompute table. Unit: MB.
-- Because MaxCompute tables are compressed, the data exported to OSS is typically about four times larger.
SET odps.stage.mapper.split.size=256;
-- Export the data.
UNLOAD FROM sale_detail PARTITION (sale_date='2013', region='china')
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole')
STORED AS PARQUET
PROPERTIES('mcfed.parquet.compression'='SNAPPY');
Note: When you export data that is compressed using the SNAPPY or LZO format, the exported file does not have the .snappy or .lzo extension.
Example 3: Export data from the sale_detail table to a .txt file and specify a comma as the separator.
SET odps.sql.allow.fullscan=true;
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/mc-unload/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'=',')
STORED AS TEXTFILE
PROPERTIES ('odps.external.data.enable.extension'='true');
Set prefixes, suffixes, and extensions for exported files
When you use the UNLOAD command to export a MaxCompute table to a file, some business scenarios require a specific file prefix, suffix, or extension. You can follow these steps to customize the file prefix and suffix and to automatically generate the extension for the corresponding file format.
Syntax
Export files in formats such as CSV and TSV using a built-in extractor:
-- Use a built-in extractor to export files in formats such as CSV and TSV.
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO LOCATION <external_location>
[STORED BY <StorageHandler>]
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Export files in open source formats such as ORC and Parquet:
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO LOCATION <external_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
]
STORED AS <file_format>
[PROPERTIES('<tbproperty_name>'='<tbproperty_value>')];
For the built-in extractor syntax, the property_name for the prefix is odps.external.data.prefix, and its value can be a custom string of up to 10 characters. The property_name for the extension is odps.external.data.enable.extension; a value of true displays the file format as the extension. For more information about other parameters, see Export data using a built-in extractor (StorageHandler).
For the open source format syntax, the tbproperty_name for the prefix is odps.external.data.prefix, and its value can be a custom string of up to 10 characters. The tbproperty_name for the extension is odps.external.data.enable.extension; a value of true displays the file format as the extension. For more information about other parameters, see Export data in other open source formats.
Parameters
| Property | Scenario | Description | property_value | Default value |
| --- | --- | --- | --- | --- |
| odps.external.data.output.prefix (compatible with odps.external.data.prefix) | Add this property to add a custom prefix to the output file name. | The value contains only letters, digits, and underscores (a-z, A-Z, 0-9, _) and must be 1 to 10 characters in length. | A valid string, such as 'mc_'. | None |
| odps.external.data.enable.extension | Add this property to display the extension of the output file. | A value of true displays the file extension. A value of false hides the file extension. | true, false | false |
| odps.external.data.output.suffix | Add this property to add a custom suffix to the output file name. | The value contains only letters, digits, and underscores (a-z, A-Z, 0-9, _). | A valid string, such as '_hangzhou'. | None |
| odps.external.data.output.explicit.extension | Add this property to add a custom extension to the output file. | The value contains only letters, digits, and underscores (a-z, A-Z, 0-9, _) and must be 1 to 10 characters in length. This property has a higher priority than odps.external.data.enable.extension. | A valid string, such as 'jsonl'. | None |
For more information about other parameters, see Export data using a built-in extractor (StorageHandler) and Export data in other open source formats.
Extension reference
The following table lists the extensions that are automatically generated for exported files when you set the odps.external.data.enable.extension=true parameter.
| File format | SerDe | Extension |
| --- | --- | --- |
| SEQUENCEFILE | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | .sequencefile |
| TEXTFILE | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | .txt |
| RCFILE | org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe | .rcfile |
| ORC | org.apache.hadoop.hive.ql.io.orc.OrcSerde | .orc |
| PARQUET | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | .parquet |
| AVRO | org.apache.hadoop.hive.serde2.avro.AvroSerDe | .avro |
| JSON | org.apache.hive.hcatalog.data.JsonSerDe | .json |
| CSV | org.apache.hadoop.hive.serde2.OpenCSVSerde | .csv |
Examples
Export a file in TEXTFILE format. Add the mc_ prefix, add the _beijing suffix, and automatically generate the file extension.
UNLOAD FROM (SELECT * FROM vehicle)
INTO LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/textfile'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE
PROPERTIES (
  'odps.external.data.output.prefix'='mc_',
  'odps.external.data.output.suffix'='_beijing',
  'odps.external.data.enable.extension'='true');
View the exported file in the destination OSS path: the file name carries the mc_ prefix, the _beijing suffix, and the .txt extension.
Export a file in JSON format. Add the mc_ prefix, add the _beijing suffix, and specify jsonl as the custom extension.
UNLOAD FROM (SELECT * FROM vehicle)
INTO LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/json'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
PROPERTIES (
  'odps.external.data.output.prefix'='mc_',
  'odps.external.data.output.suffix'='_beijing',
  'odps.external.data.output.explicit.extension'='jsonl');
View the exported file in the destination OSS path: the file name carries the mc_ prefix, the _beijing suffix, and the custom .jsonl extension.
Related topics
To import data in CSV format or other open source formats from an external storage service into MaxCompute, see LOAD.