The UNLOAD command exports data from a MaxCompute project to external storage (OSS or Hologres) for use by other computing engines.
Permission requirements
MaxCompute permissions: Your account must have the Select permission on the MaxCompute table that you want to export. For instructions on how to grant permissions, see MaxCompute permissions.
External storage permissions: Before you export data from MaxCompute to external storage, you must authorize MaxCompute to access the external storage, such as OSS or Hologres.
OSS authorization: OSS supports a one-click authorization method that provides higher security. For more information, see Authorize a RAM role in STS mode. The examples in this topic use the one-click authorization method, and the RAM role is named AliyunODPSDefaultRole.
Hologres authorization: To complete the authorization, you must create a RAM role, grant it permissions to access Hologres, and add the role to your Hologres instance. For more information, see Create a Hologres external table (STS mode).
Export to external storage
The UNLOAD command uses append mode. When you repeatedly export data to the same destination path, the system generates new files in that path instead of overwriting existing ones. To overwrite the data, clear the destination path before executing the UNLOAD command.
Export to OSS
Text format (CSV or TSV)
This method uses the built-in StorageHandler to export data and adds a .csv or .tsv extension to the files by default.
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO
LOCATION <oss_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Key parameters
Parameter | Description |
oss_location | The destination OSS path, in the format 'oss://<oss_endpoint>/<bucket_name>/<directory>/'. |
StorageHandler | Specifies the built-in storage handler: 'com.aliyun.odps.CsvStorageHandler' for CSV files or 'com.aliyun.odps.TsvStorageHandler' for TSV files. |
SERDEPROPERTIES | Configures export properties. The most common properties are 'odps.properties.rolearn', which specifies the ARN of the RAM role used for authorization, and 'odps.text.option.gzip.output.enabled', which enables GZIP compression when set to 'true'. |
Examples
Example 1: Export as GZIP-compressed CSV
Export data from the sale_detail table in a MaxCompute project to OSS. The following sample data is from the sale_detail table:
-- Partition fields: sale_date, region
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
Procedure:
In OSS: Log on to the OSS console and create the mc-unload/data_location/ directory in a bucket in the oss-cn-hangzhou region. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information: oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
Method 1: Use a SELECT query
-- Set the amount of data that a single worker can read from the MaxCompute table to 256 MB. This setting controls the number of output files.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true'
);
Method 2: Specify the table name directly (no query fees are incurred)
-- Set the amount of data that a single worker can read from the MaxCompute table to 256 MB. This setting controls the number of output files.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM sale_detail
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true'
);
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The data is exported in CSV format and compressed in GZIP format.
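Downstream consumers can parse the GZIP-compressed CSV files directly after downloading them from OSS. The following Python sketch simulates one exported file in memory and reads it back; the sample rows and the absence of a header row are assumptions for illustration based on the sale_detail data above, not guaranteed properties of your own output files.

```python
import csv
import gzip
import io

# Simulate one GZIP-compressed CSV file such as UNLOAD might produce
# (data rows only; whether a header row is present depends on your
# export settings, so this layout is an assumption).
rows = [
    ["s1", "c1", "100.1", "2013", "china"],
    ["s2", "c2", "100.2", "2013", "china"],
]
buf = io.BytesIO()
with gzip.open(buf, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# A downstream consumer decompresses and parses the file.
buf.seek(0)
with gzip.open(buf, "rt", newline="") as f:
    parsed = list(csv.reader(f))
print(parsed)
```

In practice you would pass the downloaded object's path to gzip.open instead of the in-memory buffer used here.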
Example 2: Export partition data as compressed TSV
Export the data from the sale_date='2013' and region='china' partitions of the sale_detail table in TSV format and compress it with GZIP. The sample data is the same as in Example 1.
Procedure:
In OSS: Log on to the OSS console and create the mc-unload/data_location/ directory in a bucket in the oss-cn-hangzhou region. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information: oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM sale_detail PARTITION (sale_date='2013',region='china')
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
STORED BY 'com.aliyun.odps.TsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole',
  'odps.text.option.gzip.output.enabled'='true'
);
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The data is exported in TSV format and compressed in GZIP format.
Open source format (Parquet or ORC)
This method exports data in open source formats, including columnar formats such as PARQUET and ORC and text-based formats such as TEXTFILE and JSON.
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO
LOCATION <oss_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
]
STORED AS <file_format>
[PROPERTIES ('<tbproperty_name>'='<tbproperty_value>')];
Key parameters
Parameter | Description |
oss_location | The destination OSS path, in the format 'oss://<oss_endpoint>/<bucket_name>/<directory>/'. |
serde_class | Optional. Specifies the serialization/deserialization (SerDe) library, such as 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' for PARQUET files. For the SerDe class that corresponds to each file format, see the Extensions table in this topic. |
SERDEPROPERTIES | Used together with ROW FORMAT SERDE to configure export properties, such as 'odps.properties.rolearn', which specifies the ARN of the RAM role used for authorization. |
file_format | Required. Specifies the file format, such as PARQUET, ORC, TEXTFILE, RCFILE, SEQUENCEFILE, AVRO, or JSON. |
PROPERTIES | Optional. Specifies export-related properties, such as 'mcfed.parquet.compression' for the compression codec, or the file naming properties described in the Customize file names section. |
Supported formats and compression
File format | Supported compression codecs | Description |
PARQUET | SNAPPY, LZO | A columnar storage format suitable for analytical queries. |
ORC | SNAPPY, LZO | A columnar storage format suitable for the Hadoop ecosystem. |
TEXTFILE | GZIP | A text format that supports custom delimiters. |
RCFILE | - | A row-column hybrid storage format. |
SEQUENCEFILE | - | A Hadoop sequence file format. |
AVRO | - | A data serialization format. |
JSON | - | The JSON format. |
Examples
Example 1: Export as SNAPPY-compressed Parquet
Export data from the sale_detail table in a MaxCompute project to OSS in PARQUET format with SNAPPY compression. The sample data is the same as in the preceding examples.
Procedure:
In OSS: Log on to the OSS console and create the mc-unload/data_location/ directory in a bucket in the oss-cn-hangzhou region. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information: oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
-- Set the amount of data that a single worker can read from the MaxCompute table to 256 MB. This setting controls the number of output files.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole'
)
STORED AS PARQUET
PROPERTIES('mcfed.parquet.compression'='SNAPPY');
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The data is exported in PARQUET format and compressed by using SNAPPY.
Example 2: Export partition data as Parquet
Export data from the sale_date='2013' and region='china' partitions of the sale_detail table to OSS in PARQUET format with SNAPPY compression. The sample data is the same as in the preceding examples.
Procedure:
In OSS: Log on to the OSS console and create the mc-unload/data_location/ directory in a bucket in the oss-cn-hangzhou region. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information: oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
-- Set the amount of data that a single worker can read from the MaxCompute table to 256 MB. This setting controls the number of output files.
SET odps.stage.mapper.split.size=256;
UNLOAD FROM sale_detail PARTITION (sale_date='2013',region='china')
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<uid>:role/AliyunODPSDefaultRole')
STORED AS PARQUET
PROPERTIES('mcfed.parquet.compression'='SNAPPY');
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The data is exported in PARQUET format and compressed by using SNAPPY.
Example 3: Export text file with a custom delimiter
Export the data from the sale_detail table as a TXT file with a comma as the field delimiter. The sample data is the same as in the preceding examples.
Procedure:
In OSS: Log on to the OSS console and create the mc-unload/data_location/ directory in a bucket in the oss-cn-hangzhou region. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information: oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
-- Allow a full scan of the partitioned source table.
SET odps.sql.allow.fullscan=true;
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'=',')
STORED AS TEXTFILE
PROPERTIES ('odps.external.data.enable.extension'='true');
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The exported files are in TXT format, and the data is delimited by commas.
Customize file names
When you UNLOAD to OSS, you can use the properties in PROPERTIES or SERDEPROPERTIES to customize the prefix, suffix, and extension of the output file.
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO
LOCATION <oss_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
]
STORED AS <file_format>
[PROPERTIES ('<tbproperty_name>'='<tbproperty_value>',...)];
Key parameters
property_name/tbproperty_name | Description | Example value |
odps.external.data.output.prefix | A file name prefix of 1 to 10 characters, containing letters, digits, and underscores (_). | 'mc_' |
odps.external.data.output.suffix | A file name suffix containing letters, digits, and underscores (_). | '_hangzhou' |
odps.external.data.enable.extension | Specifies whether to include the default file extension, such as .csv or .txt, in the names of the exported files. | 'true' |
odps.external.data.output.explicit.extension | A custom file extension that overrides the default extension. | 'jsonl' |
Extensions
When the odps.external.data.enable.extension='true' parameter is set, file extensions are automatically generated based on the file format and SerDe class as follows.
File format | SerDe | Extension |
SEQUENCEFILE | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | .sequencefile |
TEXTFILE | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | .txt |
RCFILE | org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe | .rcfile |
ORC | org.apache.hadoop.hive.ql.io.orc.OrcSerde | .orc |
PARQUET | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | .parquet |
AVRO | org.apache.hadoop.hive.serde2.avro.AvroSerDe | .avro |
JSON | org.apache.hive.hcatalog.data.JsonSerDe | .json |
CSV | org.apache.hadoop.hive.serde2.OpenCSVSerde | .csv |
When you export data with SNAPPY or LZO compression, the .snappy or .lzo file extension is not automatically added to the file names.
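Taken together, the naming properties compose output file names as <prefix><system-generated name><suffix><extension>. The small Python helper below is purely illustrative; the function name and the "part-00000" placeholder are invented for this sketch, and the real system-generated segment varies per file.

```python
def unload_file_name(generated, prefix="", suffix="", extension=""):
    """Compose an output file name from the UNLOAD naming properties.

    `generated` stands in for the system-generated name. `extension`
    corresponds to the default extension (when
    odps.external.data.enable.extension='true') or to
    odps.external.data.output.explicit.extension.
    """
    ext = f".{extension}" if extension else ""
    return f"{prefix}{generated}{suffix}{ext}"

# Reproduces the naming pattern used in the examples in this topic.
print(unload_file_name("part-00000", prefix="mc_", suffix="_beijing",
                       extension="txt"))
print(unload_file_name("part-00000", prefix="mc_", suffix="_beijing",
                       extension="jsonl"))
```

With no properties set, the function returns the system-generated name unchanged, which matches the default UNLOAD behavior of emitting files without a prefix, suffix, or extension.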
Examples
Example 1: Customize text file name
Export data from the sale_detail table in a MaxCompute project to a TXT file in OSS named mc_<system-generated name>_beijing.txt. The sample data is the same as in the preceding examples.
Procedure:
In OSS: Log on to the OSS console and create a destination directory in a bucket. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information.
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/textfile'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE
PROPERTIES (
  'odps.external.data.output.prefix'='mc_',
  'odps.external.data.output.suffix'='_beijing',
  'odps.external.data.enable.extension'='true');
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The name of the exported file is mc_<system-generated name>_beijing.txt.
Example 2: Customize JSON file extension
Export data from the sale_detail table in a MaxCompute project to OSS as a JSON file named mc_<system-generated name>_beijing.json. The sample data is the same as in the preceding examples.
Procedure:
In OSS: Log on to the OSS console and create a destination directory in a bucket. For more information about how to create a bucket, see Create a bucket in the console. Construct the OSS path based on the bucket, region, and endpoint information.
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
UNLOAD FROM (SELECT * FROM sale_detail)
INTO LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/json'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
  'odps.external.data.output.prefix'='mc_',
  'odps.external.data.output.suffix'='_beijing',
  'odps.external.data.output.explicit.extension'='json')
STORED AS JSON;
Verify the result: Log on to the OSS console and check the exported files in the destination OSS path. The name of the exported file is mc_<system-generated name>_beijing.json.
Export to Hologres
Limitations
Dual-signature authorization: Dual-signature authorization is not supported for exporting data to Hologres.
Partitioned tables: Exporting data to a partitioned table in Hologres is not supported.
Data type mapping: The column data types in the destination Hologres table must match those in the source MaxCompute table. For more information, see Data type mapping between MaxCompute and Hologres.
Syntax
UNLOAD FROM {<select_statement>|<table_name> [PARTITION (<pt_spec>)]}
INTO
LOCATION <hologres_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Key parameters
Parameter | Description |
hologres_location | The JDBC connection string for the destination Hologres table, in the format 'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&currentSchema=<schema>&useSSL={true|false}&table=<holo_table_name>/'. |
StorageHandler | Specifies the built-in handler. The value is fixed to 'com.aliyun.odps.jdbc.JdbcStorageHandler'. |
SERDEPROPERTIES | Must include the following three properties: 'odps.properties.rolearn', which specifies the ARN of the RAM role used for authorization; 'mcfed.mapreduce.jdbc.driver.class', which is fixed to 'org.postgresql.Driver'; and 'odps.federation.jdbc.target.db.type', which is fixed to 'holo'. |
Example
Export the MaxCompute table data_test to the Hologres table mc_2_holo. The data in data_test is as follows:
+------------+------+
| id | name |
+------------+------+
| 3 | rgege |
| 4 | Gegegegr |
+------------+------+
Procedure:
In Hologres: Create a destination table named mc_2_holo.
CREATE TABLE mc_2_holo (id INT, name TEXT);
In MaxCompute: Log on to the MaxCompute client and execute the UNLOAD command.
UNLOAD FROM (SELECT * FROM data_test)
INTO LOCATION 'jdbc:postgresql://hgprecn-cn-5y**********-cn-hangzhou-internal.hologres.aliyuncs.com:80/test?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=mc_2_holo/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::13**************:role/aliyunodpsholorole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Verify the result: View the exported data in Hologres.
SELECT * FROM mc_2_holo;
The following result is returned:
 id |   name
----+----------
  4 | Gegegegr
  3 | rgege
Billing
Command fees: The UNLOAD command itself is free of charge.
Query clause fees: The query clause in an UNLOAD command scans data and consumes computing resources to generate the exported data. Therefore, the query clause is billed as a standard SQL job.
Cost estimation
Storing data in OSS versus MaxCompute involves different cost considerations. To determine if exporting data from MaxCompute is cost-effective, you must estimate the total cost, which includes direct storage costs, changes in data volume, and subsequent query costs.
Direct storage costs
MaxCompute offers Standard, Infrequent Access, and archive storage classes, all supporting zone-redundant storage. For more information about storage billing, see Storage fees.
OSS also provides Standard, Infrequent Access, and archive storage classes that support zone-redundant storage and are priced similarly to MaxCompute storage. Additionally, OSS offers lower-cost cold archive and deep cold archive storage classes that do not support zone-redundant storage. For more information, see OSS storage fees.
Data volume changes
MaxCompute compresses imported data at an approximate 5:1 ratio, and you are billed for the compressed data size.
When you store data in OSS, you must specify and manage the data format and compression method. For information about the data formats and compression methods that MaxCompute supports for data export, see OSS external tables. If you do not apply compression, the data volume will expand when exported to OSS.
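As a rough illustration of this volume change, the sketch below compares billed storage sizes under the approximate 5:1 MaxCompute compression ratio described above. The 1,000 GB figure and the flat per-GB price are hypothetical placeholders, not actual Alibaba Cloud prices.

```python
# Hypothetical figures for illustration only.
logical_gb = 1000.0            # logical (uncompressed) data volume
mc_ratio = 5.0                 # approximate MaxCompute compression ratio

mc_billed_gb = logical_gb / mc_ratio   # MaxCompute bills the compressed size
oss_uncompressed_gb = logical_gb       # exported to OSS without compression

price_per_gb_month = 0.02      # hypothetical flat unit price
mc_monthly = mc_billed_gb * price_per_gb_month
oss_monthly = oss_uncompressed_gb * price_per_gb_month

print(f"MaxCompute: {mc_billed_gb} GB billed, {mc_monthly:.2f}/month")
print(f"OSS, no compression: {oss_uncompressed_gb} GB billed, {oss_monthly:.2f}/month")
```

Under these assumptions, uncompressed data exported to OSS occupies roughly five times the billed volume, which is why choosing a compression codec for the export matters for the total cost estimate.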
Subsequent query costs
Data stored in MaxCompute can be queried at any time, but additional data access fees apply for Infrequent Access and archive storage. When you read data from non-standard storage classes in OSS, data retrieval or early deletion fees may apply.
Related documents
To import data in CSV format or other open-source formats from external storage into MaxCompute, see LOAD.