MaxCompute allows you to export data from MaxCompute projects to Object Storage Service (OSS). This provides an easy way to store structured data in OSS. This also allows other computing engines to use the data that is exported from MaxCompute to OSS. This topic describes how to use UNLOAD statements to export data in the CSV format or another open source format from MaxCompute to OSS.

You can execute the statements that are described in this topic on the following platforms:

Prerequisites

  • OSS is activated.

    For more information about how to activate OSS, see Activate OSS.

  • You have the SELECT permission on the table that you want to export from a MaxCompute project.

    For more information about authorization, see Authorize users.

Limits

The use of UNLOAD statements has the following limits:
  • MaxCompute automatically splits the file that is exported to OSS into multiple parts and generates a name for the file. You cannot customize the name or file name extension for the exported file.
  • File name extensions cannot be added to the exported files in an open source format.
  • If you repeatedly export data, the previously exported file is not overwritten. Instead, a new file is generated.

Usage notes

  • You are not charged for UNLOAD statements. The subquery clauses in the UNLOAD statements need to scan data and use computing resources to calculate the results. Therefore, the subquery clauses are charged as common SQL jobs.
  • In some scenarios, you can store structured data in OSS to reduce storage costs. However, you must estimate the costs in advance.

    The MaxCompute storage fee is USD 0.018 per GB per month. For more information about storage fees, see Storage pricing (pay-as-you-go). The data compression ratio is about 5:1 for the data that is imported into MaxCompute. You are charged based on the size of data after compression.

    If you use the Standard storage class of OSS to store your data, the unit price is USD 0.018 per GB per month. For more information about the fees for the Infrequent Access (IA), Archive, and Cold Archive storage classes, see Storage fees.

    If you want to export data to reduce storage costs, we recommend that you: (1) Evaluate the data compression ratio based on the data feature test. (2) Estimate the costs of using UNLOAD statements based on the query statement used when you export data. (3) Evaluate the method for accessing the exported data to avoid extra costs caused by unnecessary data migration.

Authorization for access to OSS

If you want to export data to OSS, you must grant MaxCompute the permissions to access OSS. The authorization methods for UNLOAD statements are the same as those for MaxCompute external tables. You need to configure the properties specified by WITH SERDEPROPERTIES in UNLOAD statements and then use RAM roles to complete OSS authorization. Authorization procedure:
  1. Log on to the RAM console. In the left-side navigation pane, choose RAM Roles. On the page that appears, click Create RAM Role. In the Create RAM Role panel, specify Alibaba Cloud Service for Trusted entity type in the Select Role Type step. In the Configure Role step, specify Normal Service Role for Role Type, specify RAM Role Name, such as unload2oss, and then select MaxCompute from the Select Trusted Service drop-down list.

    For more information, see Create a RAM role for a trusted Alibaba Cloud service.

    Create RAM Role
  2. After the role is created, attach the system policy AliyunOSSFullAccess to the role.

    For more information, see Grant permissions to a RAM role.

    Add Permissions
After the authorization is complete, you must select one of the following export methods based on the format of the data that you want to export:

Use a built-in extractor to export data

  • Syntax
    unload from {<select_statement>|<table_name> [partition (<pt_spec>)]} 
    into 
    location <external_location>
    [stored by <StorageHandler>]
    [with serdeproperties ('<property_name>'='<property_value>',...)];
  • Parameters
    • select_statement: a SELECT clause. This clause is used to query the data that needs to be inserted into a table in the destination OSS directory from the source table. The source table can be a partitioned table or a non-partitioned table. For more information about SELECT clauses, see SELECT syntax.
    • table_name and pt_spec: You can use the table name or the combination of the table and partition names to specify the data that you want to export. This export method does not automatically generate query statements. Therefore, no fees are incurred. The value of pt_spec is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
    • external_location: required. The destination OSS directory to which you want to export data. The value of this parameter is in the 'oss://<oss_endpoint>/<object>' format . For more information about OSS directories, see OSS domain names.
    • StorageHandler: required. The name of the storage handler that is considered a built-in extractor. Set this parameter to com.aliyun.odps.CsvStorageHandler or com.aliyun.odps.TsvStorageHandler. The storage handler is used to process CSV and TSV files and defines how to read data from or write data to these files. You need to specify only this parameter based on your business requirements. The related logic is implemented by the system. If you use a built-in extractor to export data, the file name extension .csv or .tsv is automatically added to the files that are exported. You can use this parameter in the same way as you use it for MaxCompute external tables. For more information, see Access OSS data by using a built-in extractor.
    • <property_name>'='<property_value>': optional. property_name specifies the name of a property and property_value specifies the value of a property. You can use this clause in the same way as you use it for MaxCompute external tables. For more information about the properties, see Access OSS data by using a built-in extractor.
  • Examples
    This section demonstrates how to export data of the sale_detail table from a MaxCompute project to OSS. The sale_detail table contains the following data:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
    1. Log on to the OSS console and create the directory mc-unload/data_location/ in the OSS bucket in the oss-cn-hangzhou region and organize the OSS directory. For more information about how to create an OSS bucket, see Create buckets. Bucket
      The following OSS directory is organized based on the bucket, region, and endpoint.
      oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
    2. Log on to the MaxCompute client and execute the UNLOAD statement to export data of the sale_detail table to OSS. The following examples are provided:
      • Example 1: Export the data of the sale_detail table as a CSV file and package the file into a GZIP file. Sample statements:
        -- Control the number of exported files: Set the size of data of the MaxCompute table read by a single worker. Unit: megabytes. The MaxCompute table is compressed before you export it. The size of the exported data is about four times the data size before the export. 
        set odps.stage.mapper.split.size=256;
        -- Export data. 
        unload from
        (select * from sale_detail)
        into
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
        stored by 'com.aliyun.odps.CsvStorageHandler'
        with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/unload2oss', 'odps.text.option.gzip.output.enabled'='true');
        -- The preceding statements are equivalent to the following statements: 
        set odps.stage.mapper.split.size=256;
        unload from sale_detail 
        into
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
        stored by 'com.aliyun.odps.CsvStorageHandler'
        with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/unload2oss', 'odps.text.option.gzip.output.enabled'='true');                                 
      • Example 2: Export data from the partition (sale_date='2013', region='china') in the sale_detail table as a TSV file to OSS and package the file into a GZIP file.
        -- Control the number of exported files: Set the size of data of the MaxCompute table read by a single worker. Unit: megabytes. The MaxCompute table is compressed before you export it. The size of the exported data is about four times the data size before the export. 
        set odps.stage.mapper.split.size=256;
        -- Export data. 
        unload from sale_detail partition (sale_date='2013',region='china')
        into
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location'
        stored by 'com.aliyun.odps.TsvStorageHandler'
        with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/unload2oss', 'odps.text.option.gzip.output.enabled'='true');

      'odps.text.option.gzip.output.enabled'='true' specifies that the exported file is compressed in a GZIP file. Only the GZIP format is supported.

    3. Log on to the OSS console to view the import result in the destination OSS directory.
      • Import result of Example 1 Import result
      • Import result of Example 2 Import result

Export data in another open source format

  • Syntax
    unload from {<select_statement>|<table_name> [partition (<pt_spec>)]}
    into 
    location <external_location>
    [row format serde '<serde_class>'
      [with serdeproperties ('<property_name>'='<property_value>',...)]
    ]
    storeds as <file_format>
    [properties('<tbproperty_name>'='<tbproperty_value>')];
  • Parameters
    • select_statement: a SELECT clause. This clause is used to query the data that needs to be inserted into a table in the destination OSS directory from the source table. The source table can be a partitioned table or a non-partitioned table. For more information about SELECT clauses, see SELECT syntax.
    • table_name and pt_spec: You can use the table name or the combination of the table and partition names to specify the data that you want to export. This export method does not automatically generate query statements. Therefore, no fees are incurred. The value of pt_spec is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format.
    • external_location: required. The destination OSS directory to which you want to export data. The value of this parameter is in the 'oss://<oss_endpoint>/<object>' format . For more information about OSS directories, see OSS domain names.
    • serde_class: optional. You can use this clause in the same way as you use it for MaxCompute external tables. For more information, see Open source data formats supported by OSS external tables.
    • '<property_name>'='<property_value>': optional. property_name specifies the name of a property and property_value specifies the value of a property. The supported properties are the same as those supported by MaxCompute external tables. For more information about the properties, see Open source data formats supported by OSS external tables.
    • file_format: required. The format of the exported file, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, or TEXTFILE. You can use this parameter in the same way as you use it for MaxCompute external tables. For more information, see Open source data formats supported by OSS external tables.
    • '<tbproperty_name>'='<tbproperty_value>': optional. tbproperty_name specifies the name of a property in the extended information of the external table. tbproperty_value specifies the value of a property in the extended information of the external table. For example, if you want to export data in an open source format as a file compressed by using Snappy or LZO, you can set the mcfed.parquet.compression property to SNAPPY or LZO.
  • Examples
    This section demonstrates how to export data of the sale_detail table from a MaxCompute project to OSS. The sale_detail table contains the following data:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
    1. Log on to the OSS console and create the directory mc-unload/data_location/ in the OSS bucket in the oss-cn-hangzhou region and organize the OSS directory. For more information about how to create an OSS bucket, see Create buckets. Bucket
      The following OSS directory is organized based on the bucket, region, and endpoint.
      oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
    2. Log on to the MaxCompute client and execute the UNLOAD statement to export data of the sale_detail table to OSS. The following examples are provided:
      • Example 1: Export data of the sale_detail table as a file in the PARQUET format and compressed by using Snappy. Sample statements:
        -- Control the number of exported files: Set the size of data of the MaxCompute table read by a single worker. Unit: megabytes. The MaxCompute table is compressed before you export it. The size of the exported data is about four times the data size before the export. 
        set odps.stage.mapper.split.size=256;
        -- Export data. 
        unload from
        (select * from sale_detail)
        into 
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' 
        row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
        with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/unload2oss') 
        stored as parquet 
        properties('mcfed.parquet.compression'='SNAPPY');
      • Example 2: Export data from the partition (sale_date='2013', region='china') in the sale_detail table as a PARQUET file to OSS and compress the file by using Snappy. Sample statements:
        -- Control the number of exported files: Set the size of data of the MaxCompute table read by a single worker. Unit: megabytes. The MaxCompute table is compressed before you export it. The size of the exported data is about four times the data size before the export. 
        set odps.stage.mapper.split.size=256;
        -- Export data. 
        unload from sale_detail partition (sale_date='2013',region='china') 
        into 
        location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' 
        row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
        with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/unload2oss') 
        stored as parquet 
        properties('mcfed.parquet.compression'='SNAPPY');
    3. Log on to the OSS console to view the import result in the destination OSS directory.
      • Import result of Example 1 Import result
      • Import result of Example 2 Import result
      Note If the exported data is compressed by using Snappy or LZO, the file name extension .snappy or .lzo of the exported file cannot be displayed.