MaxCompute allows you to create an Object Storage Service (OSS) external table in your MaxCompute project. The OSS external table maps to an OSS directory. You can use the OSS external table to access unstructured data in data files in the OSS directory or write data from your MaxCompute project to the OSS directory. This topic describes the syntax and parameters that are used to create an OSS external table. This topic also provides examples on how to create an OSS external table.

Background information

OSS is a secure, cost-effective, and highly reliable cloud storage service that can store a large number of data files in any format. If you want to use MaxCompute to read data from an OSS directory or write data from a MaxCompute project to an OSS directory, you can create an OSS external table in your MaxCompute project. The OSS external table maps to the OSS directory.

You can create a partitioned table or a non-partitioned table as an OSS external table. The type of the table that you need to create varies based on the format of the OSS directory in which data files are stored. If data files are stored in a partitioned directory, you must create a partitioned table. Otherwise, you must create a non-partitioned table. For more information about how to read partition data, see Read OSS data stored in partitions.

Prerequisites

Before you create an OSS external table, make sure that the Alibaba Cloud account or RAM user that you use meets the following requirements:
  • The Alibaba Cloud account or RAM user is granted access permissions on OSS.

    For more information about authorization, see STS authorization for OSS.

  • The Alibaba Cloud account or RAM user is granted the CreateTable permission on your project.

    For more information about operation permissions on tables in a MaxCompute project, see Permissions.

Usage notes

When you use OSS external tables, take note of the following items:
  • For data files in different formats, the statements that are used to create OSS external tables are different only in some parameters. We recommend that you carefully read the syntax that is used to create an OSS external table and the parameter descriptions. This helps you create an OSS external table that meets your business requirements. If an invalid parameter is configured when you create an OSS external table, you may fail to read OSS data or write data to OSS by using the OSS external table.
  • OSS external tables record only the mappings between these tables and OSS directories. If you delete an OSS external table, data files in the OSS directory that is mapped to the table are not deleted.
  • If a data file stored in OSS is an object of the Archive storage class, you must first restore the file. For more information about the restoration operation, see Restore objects.

Operation platforms

The following table lists platforms in which you can create OSS external tables.
Creation method Platform
MaxCompute SQL statements MaxCompute client
Query editor in the MaxCompute console
ODPS SQL nodes in the DataWorks console
SQL scripts in MaxCompute Studio
Web UI SQL scripts in MaxCompute Studio
DataWorks console

Syntax used to create an OSS external table

The following table describes the syntax that is used to create OSS external tables in different scenarios. For more information about syntax parameters and properties, see Reference: Parameters in the syntax, Reference: WITH SERDEPROPERTIES properties, and Reference: TBLPROPERTIES properties.

Scenario Syntax Supported format of OSS data files
Create an external table by using a built-in text extractor
create external table [if not exists] <mc_oss_extable_name> 
(
<col_name> <data_type>,
...
)
[partitioned by (<col_name> <data_type>, ...)] 
stored by '<StorageHandler>'  
with serdeproperties (
 ['<property_name>'='<property_value>',...]
) 
location '<oss_location>';
  • CSV data files
  • CSV data files compressed in the GZIP format
  • TSV data files
  • TSV data files compressed in the GZIP format
Create an external table by using a built-in open source data extractor
create external table [if not exists] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[partitioned by (<col_name> <data_type>, ...)]
[row format serde '<serde_class>'
  [with serdeproperties (
    ['<property_name>'='<property_value>',...])
]
stored as <file_format> 
location '<oss_location>' 
[using '<resource_name>']
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];
  • PARQUET data files
  • PARQUET data files compressed in the SNAPPY, GZIP, or LZO format
  • TEXTFILE (JSON or TEXT) data files
  • TEXTFILE data files compressed in the SNAPPY, LZO, BZ2, GZ, or DEFLATE format
  • ORC data files
  • ORC data files compressed in the SNAPPY or ZLIB format
  • AVRO data files
  • SEQUENCEFILE data files
Create an external table by using a custom extractor
create external table [if not exists] <mc_oss_extable_name> 
(
<col_name> <date_type>,
...
)
[partitioned by (<col_name> <data_type>, ...)] 
stored by '<StorageHandler>' 
with serdeproperties (
 ['<property_name>'='<property_value>',...]
) 
location '<oss_location>' 
using '<jar_name>';
Data files except the preceding formats

Syntax used to add partition data to OSS external tables

If the OSS external table that you created is a partitioned table, partition data needs to be added to the OSS external table. MaxCompute automatically adds partition data to the OSS external table based on the OSS directory that is specified when you create the OSS external table. Syntax:
  • Syntax 1 (recommended): After you execute the following statement, MaxCompute automatically parses the OSS directory structure, identifies partitions, and adds partition data to the OSS external table.
    msck repair table <mc_oss_extable_name> add partitions;
  • Syntax 2: Execute the following statement to manually add partition data to the OSS external table.
    alter table <mc_oss_extable_name> add partition (<col_name>=<col_value>) [add partition (<col_name>=<col_value>) ...];
    The values of col_name and col_value must be consistent with the names of the OSS subdirectories where the partition data file is stored. If the OSS directory structure of the partition data file shown in the following figure is used, the value of col_name is direction, and the values of col_value are N, NE, S, SW, and W. One add partition clause in the ALTER TABLE statement is used to add a partition that is mapped to an OSS subdirectory. The number of add partition clauses must be the same as that of OSS subdirectories. Directories

Example: Prepare data

Sample data is provided for you to better understand the examples in this topic.
  • oss_endpoint: oss-cn-hangzhou-internal.aliyuncs.com, which indicates that OSS is deployed in the China (Hangzhou) region.
  • Bucket name: oss-mc-test.
  • Directory name: Demo1/, Demo2/, Demo3/, and SampleData/.
    The following data files are uploaded to the preceding directories:
    • The vehicle.csv file is uploaded to the Demo1/ directory. The vehicle.csv file is used to create a mapping between the Demo1/ directory and the non-partitioned external table that is created by using a built-in text extractor. The file contains the following data:
      1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
      1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
      1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
      1,6,9,1,46.81006,-92.08174,9/15/2014 0:00,S
      1,7,53,1,46.81006,-92.08174,9/15/2014 0:00,N
      1,8,63,1,46.81006,-92.08174,9/15/2014 0:00,SW
      1,9,4,1,46.81006,-92.08174,9/15/2014 0:00,NE
      1,10,31,1,46.81006,-92.08174,9/15/2014 0:00,N
    • The Demo2/ directory contains the following subdirectories: direction=N/, direction=NE/, direction=S/, direction=SW/, and direction=W/. The vehicle1.csv, vehicle2.csv, vehicle3.csv, vehicle4.csv, and vehicle5.csv files are separately uploaded to the five subdirectories. The files are used to create mappings between the Demo2/ directory and the partitioned external table that is created by using a built-in text extractor. The files contain the following data:
      --vehicle1.csv
      1,7,53,1,46.81006,-92.08174,9/15/2014 0:00
      1,10,31,1,46.81006,-92.08174,9/15/2014 0:00
      
      --vehicle2.csv
      1,2,13,1,46.81006,-92.08174,9/14/2014 0:00
      1,3,48,1,46.81006,-92.08174,9/14/2014 0:00
      1,9,4,1,46.81006,-92.08174,9/15/2014 0:00
      
      --vehicle3.csv
      1,6,9,1,46.81006,-92.08174,9/15/2014 0:00
      1,5,47,1,46.81006,-92.08174,9/14/2014 0:00
      1,6,9,1,46.81006,-92.08174,9/15/2014 0:00
      
      --vehicle4.csv
      1,8,63,1,46.81006,-92.08174,9/15/2014 0:00
      
      --vehicle5.csv
      1,4,30,1,46.81006,-92.08174,9/14/2014 0:00
      Demo2
    • The vehicle.csv.gz file is uploaded to the Demo3/ directory. The vehicle.csv file is contained in the vehicle.csv.gz file and has the same content as the vehicle.csv file in the Demo1/ directory. The vehicle.csv file is used to create a mapping between the Demo3/ directory and the OSS external table whose data is compressed. Compressed file
    • The vehicle6.csv file is uploaded to the SampleData/ directory. The vehicle6.csv file is used to create a mapping between the SampleData/ directory and the OSS external table that is created by using a built-in open source data extractor. The file contains the following data:
      1|1|51|1|46.81006|-92.08174|9/14/2014 0:00|S
      1|2|13|1|46.81006|-92.08174|9/14/2014 0:00|NE
      1|3|48|1|46.81006|-92.08174|9/14/2014 0:00|NE
      1|4|30|1|46.81006|-92.08174|9/14/2014 0:00|W
      1|5|47|1|46.81006|-92.08174|9/14/2014 0:00|S
      1|6|9|1|46.81006|-92.08174|9/14/2014 0:00|S
      1|7|53|1|46.81006|-92.08174|9/14/2014 0:00|N
      1|8|63|1|46.81006|-92.08174|9/14/2014 0:00|SW
      1|9|4|1|46.81006|-92.08174|9/14/2014 0:00|NE
      1|10|31|1|46.81006|-92.08174|9/14/2014 0:00|N

Example: Create a non-partitioned table as an OSS external table by using a built-in text extractor

Create a mapping between the non-partitioned external table and the Demo1/ directory in Example: Prepare data. Sample statement used to create an OSS external table:

create external table if not exists mc_oss_csv_external1
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler' 
with serdeproperties (
 'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
) 
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';

You can execute the DESC EXTENDED mc_oss_csv_external1; statement to view the schema of the OSS external table that you created.

Example: Create a partitioned table as an OSS external table by using a built-in text extractor

Create a mapping between the partitioned external table and the Demo2/ directory in Example: Prepare data. Sample statements used to create an OSS external table and add partition data to the OSS external table:

create external table if not exists mc_oss_csv_external2
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string
)
partitioned by (
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler' 
with serdeproperties (
 'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
) 
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';
-- Add partition data. 
msck repair table mc_oss_csv_external2 add partitions;
-- The preceding statement is equivalent to the following statement: 
alter table mc_oss_csv_external2 add partition (direction = 'N') partition (direction = 'NE') partition (direction = 'S') partition (direction = 'SW') partition (direction = 'W');

You can execute the DESC EXTENDED mc_oss_csv_external2; statement to view the schema of the OSS external table that you created.

Example: Create an OSS external table that is used to read or write compressed data by using a built-in text extractor

Create a mapping between the OSS external table and the Demo3/ directory in Example: Prepare data. Sample statement used to create an OSS external table:

create external table if not exists mc_oss_csv_external3
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
stored by 'com.aliyun.odps.CsvStorageHandler' 
with serdeproperties (
 'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole',
 'odps.text.option.gzip.input.enabled'='true',
 'odps.text.option.gzip.output.enabled'='true' 
) 
location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';

You can execute the DESC EXTENDED mc_oss_csv_external3; statement to view the schema of the OSS external table that you created.

Example: Create an OSS external table by using a built-in open source data extractor

  • PARQUET data files are mapped to the OSS external table.
    create external table [if not exists] <mc_oss_extable_name>
    (
    <col_name> <data_type>,
    ...
    )
    [partitioned by (<col_name> <data_type>, ...)]
    stored as parquet  
    location '<oss_location>';
    select ... from <mc_oss_extable_name> ...;
  • TEXTFILE data files are mapped to the OSS external table.
    • Example of creating an OSS external table that maps to TEXT data files
      create external table [if not exists] <mc_oss_extable_name>
      (
      <col_name> <data_type>,
      ...
      )
      [partitioned by (<col_name> <data_type>, ...)]
      row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
      stored as textfile
      location '<oss_location>';
      select ... from <mc_oss_extable_name> ...;
      CREATE EXTERNAL TABLE statements do not support custom row format settings. Default row format settings:
      FIELDS TERMINATED BY: '\001'
      ESCAPED BY: '\'
      COLLECTION ITEMS TERMINATED BY: '\002'
      MAP KEYS TERMINATED BY: '\003'
      LINES TERMINATED BY: '\n'
      NULL DEFINED AS: '\N'
    • Example of creating an OSS external table that maps to CSV data files
      -- Disable the native text reader. 
      set odps.ext.hive.lazy.simple.serde.native=false;
      -- Create an OSS external table. 
      create external table <mc_oss_extable_name>
      (
      <col_name> <data_type>,
      ...
      )
      row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
      with serdeproperties (
      "separatorChar" = ",",
      "quoteChar"= '"',
      "escapeChar"= "\\"
      )
      stored as textfile
      location '<oss_location>'
      tblproperties (
      "skip.header.line.count"="1",
      "skip.footer.line.count"="1"
      );
      select ... from <mc_oss_extable_name> ...;
      Note OpenCSVSerde supports only data of the STRING type. When you execute a Data Manipulation Language (DML) statement, you must add the set odps.sql.hive.compatible=true; command before the DML statement and submit this command with the statement.
    • Example of creating an OSS external table that maps to JSON data files
      create external table [if not exists] <mc_oss_extable_name>
      (
      <col_name> <data_type>,
      ...
      )
      [partitioned by (<col_name> <data_type>, ...)]
      row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
      stored as textfile
      location '<oss_location>';
      select ... from <mc_oss_extable_name> ...;
  • ORC data files are mapped to the OSS external table.
    create external table [if not exists] <mc_oss_extable_name>
    (
    <col_name> <data_type>,
    ...
    )
    [partitioned by (<col_name> <data_type>, ...)]
    stored as orc
    location '<oss_location>';
    select ... from <mc_oss_extable_name> ...;
  • AVRO data files are mapped to the OSS external table.
    create external table [if not exists] <mc_oss_extable_name>
    (
    <col_name> <data_type>,
    ...
    )
    [partitioned by (<col_name> <data_type>, ...)]
    stored as avro
    location '<oss_location>';
    select ... from <mc_oss_extable_name> ...;
  • SEQUENCEFILE data files are mapped to the OSS external table.
    create external table [if not exists] <mc_oss_extable_name>
    (
    <col_name> <data_type>,
    ...
    )
    [partitioned by (<col_name> <data_type>, ...)]
    stored as sequencefile
    location '<oss_location>';
    select ... from <mc_oss_extable_name> ...;

Example: Create an OSS external table by using a custom extractor

Create a mapping between the OSS external table and the SampleData/ directory in Example: Prepare data.

Perform the following steps:

  1. Use the Java programs TextExtractor.java, TextOutputer.java, SplitReader.java, and TextStorageHandler.java that are developed by using MaxCompute Studio.
    For more information about how to develop a Java program, see Develop a UDF.
  2. MaxCompute Studio allows you to package the TextStorageHandler.java program into a JAR file and upload the file to your MaxCompute project as a resource.
    In this example, the JAR file is named javatest-1.0-SNAPSHOT.jar. For more information about how to package resources into files and upload the files to your MaxCompute project, see Package a Java program, upload the package, and create a MaxCompute UDF.
  3. Execute the following statement to create an OSS external table:
    create external table if not exists ambulance_data_txt_external
    (
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    stored by 'com.aliyun.odps.udf.example.text.TextStorageHandler' 
      with serdeproperties (
    'delimiter'='|',  
    'odps.properties.rolearn'='acs:ram::xxxxxxxxxxxxx:role/aliyunodpsdefaultrole'
    )
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/'
    using 'javatest-1.0-SNAPSHOT.jar'; 

    delimiter indicates the delimiter of OSS data.

    You can execute the DESC EXTENDED ambulance_data_txt_external; statement to view the schema of the OSS external table that you created.

Reference: Parameters in the syntax

This section describes the parameters in the syntax.

Parameter Required Description
mc_oss_extable_name Yes The name of the OSS external table that you want to create.

Table names are not case-sensitive. When you query external tables, the table names are not case-sensitive, and forced uppercase and lowercase conversions are not supported.

col_name Yes The name of a column in the OSS external table.

Before you read OSS data, make sure that the schema of the OSS external table is the same as the schema of the OSS data files. Otherwise, OSS data cannot be read.

data_type Yes The data type of a column in the OSS external table.

Before you read OSS data, make sure that the data types of columns in the OSS external table are the same as the data types of columns in the OSS data files. Otherwise, OSS data cannot be read.

partitioned by (<col_name> <data_type>, ...) Required if specific conditions are met If data files in OSS are stored in partitioned directories, this parameter is required to create a partitioned external table.
  • col_name: The name of a partition key column.
  • data_type: The data type of a partition key column.
StorageHandler Required if specific conditions are met If you use a built-in text extractor or a custom extractor to create an OSS external table, this parameter is required. MaxCompute extractors are categorized into the following types:
  • Built-in text extractors:
    • com.aliyun.odps.CsvStorageHandler: defines how to read data from and write data to CSV files or CSV.GZ files.
    • com.aliyun.odps.TsvStorageHandler: defines how to read data from and write data to TSV files or TSV.GZ files.
  • Custom extractors: You can write MaxCompute user defined functions (UDFs) to create a custom extractor. For more information about how to write MaxCompute UDFs, see Develop a UDF.
'<property_name>'='<property_value>' Yes The extended property of the external table. For more information the properties, see Reference: WITH SERDEPROPERTIES properties.
oss_location Yes The OSS directory where data files are stored. The OSS directory is in the oss://<oss_endpoint>/<Bucket name>/<Directory name>/ format. MaxCompute automatically reads data from all files in the OSS directory that you specified.
  • oss_endpoint: the OSS endpoint. We recommend that you use an internal endpoint of OSS to prevent extra fees that are incurred by Internet traffic. For more information about the internal endpoints of OSS, see Regions and endpoints.
    Note We recommend that OSS for storing data files is deployed in the same region as your MaxCompute project. MaxCompute can be deployed only in some regions. Therefore, cross-region data connectivity issues may occur.
  • Bucket name: the name of the OSS bucket. For more information about how to view bucket names, see List buckets.
  • Directory name: the name of the OSS directory. You do not need to include file names in directory names.
    -- Example of a valid OSS directory: 
    oss://oss-cn-shanghai-internal.aliyuncs.com/oss-mc-test/Demo1/
    -- Examples of invalid OSS directories: 
    http://oss-cn-shanghai-internal.aliyuncs.com/oss-mc-test/Demo1/                -- HTTP connections are not supported. 
    https://oss-cn-shanghai-internal.aliyuncs.com/oss-mc-test/Demo1/               -- HTTPS connections are not supported. 
    oss://oss-cn-shanghai-internal.aliyuncs.com/Demo1                              -- The endpoint is invalid. 
    oss://oss-cn-shanghai-internal.aliyuncs.com/oss-mc-test/Demo1/vehicle.csv     -- The file name does not need to be included in the directory name. 
jar_name Required if specific conditions are met If you use a custom extractor to create an OSS external table, this parameter is required. This parameter specifies the JAR package that corresponds to the custom extractor code. The JAR package must be added to your MaxCompute project as a resource.

For more information about how to add resources, see Add resources.

serde_class Required if specific conditions are met This parameter specifies the built-in open source data extractor of MaxCompute. If the data file is in the TEXTFILE format, this parameter is required. You do not need to configure this parameter in other scenarios.
Mappings between open source data formats that are supported by MaxCompute and SerDe classes:
  • PARQUET: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.
  • TEXTFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, and org.apache.hadoop.hive.serde2.OpenCSVSerde.
  • ORC: org.apache.hadoop.hive.ql.io.orc.OrcSerde.
  • AVRO: org.apache.hadoop.hive.serde2.avro.AvroSerDe.
  • SEQUENCEFILE: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.
  • RCFILE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe.
  • ORCFILE: org.apache.hadoop.hive.ql.io.orc.OrcSerde.
file_format Required if specific conditions are met If OSS data files are in an open source format, this parameter is required. This parameter specifies the format of the OSS data files.
Note The size of a single file cannot exceed 3 GB. If the file size exceeds 3 GB, we recommend that you split the file into multiple files.
resource_name Required if specific conditions are met If you use a custom SerDe class, you must use this parameter to specify the resource that contains the custom SerDe class.

JAR packages that are related to the SerDe class must be added to your MaxCompute project as resources.

For more information about how to add resources, see Add resources.

'<tbproperty_name>'='<tbproperty_value>' Required if specific conditions are met The extended property of the external table. For more information the properties, see Reference: TBLPROPERTIES properties.

Reference: WITH SERDEPROPERTIES properties

property_name Scenario Description property_value Default value
odps.properties.rolearn If you use Security Token Service (STS) authorization, add this property. Specifies the Alibaba Cloud Resource Name (ARN) of AliyunODPSDefaultRole in Resource Access Management (RAM). You can obtain the ARN from the role details page in the RAM console. None
delimiter If you need to specify column delimiters for a CSV or TSV data file, add this property. MaxCompute normally reads data from each column in an OSS data file based on the specified delimiters. A single character Comma (,)
odps.text.option.gzip.input.enabled If you need to read data from a CSV.GZ or TSV.GZ data file, add this property. CSV or TSV compression properties. If you set this parameter to True, MaxCompute normally reads data from CSV.GZ or TSV.GZ data files. Otherwise, the data read operation fails.
  • True
  • False
False
odps.text.option.gzip.output.enabled If you need to write data to OSS by using the GZIP compression algorithm, add this property. CSV or TSV compression properties. If you set this parameter to True, MaxCompute normally writes data to OSS by using the GZIP compression algorithm. Otherwise, the data is not compressed.
  • True
  • False
False
odps.text.option.header.lines.count If the OSS data file is in the CSV or TSV format and you need to ignore the first N rows in the OSS data file, add this property. Specifies the number of rows that are ignored when you use MaxCompute to read OSS data files. Non-negative integer 0
odps.text.option.null.indicator If the OSS data file is in the CSV or TSV format and you need to define parsing rules of NULL in the OSS data file, add this property. Specifies the string that is parsed as NULL in an SQL statement. For example, if you use \N to represent NULL, a,\N,b is parsed as a, NULL, b. String Empty string
odps.text.option.ignore.empty.lines If the OSS data file is in the CSV or TSV format and you need to define processing rules of empty rows in the OSS data file, add this property. If you set this parameter to True, MaxCompute reads all rows in the data file. Otherwise, MaxCompute reads the empty rows.
  • True
  • False
True
odps.text.option.encoding If the OSS data file is in the CSV or TSV format and the encoding format of the OSS data file is not the default encoding format, add this property. Make sure that the encoding format that is configured in this scenario is the same as the encoding format of the OSS data file. Otherwise, MaxCompute cannot read data successfully.
  • UTF-8
  • UTF-16
  • US-ASCII
  • GBK
UTF-8
odps.text.option.delimiter If you need to specify column delimiters for a CSV or TSV data file, add this property. Make sure that the column delimiters specified in this scenario can be used to normally read each column of the OSS data file. Otherwise, the data read by MaxCompute is misaligned. A single character Comma (,)
odps.text.option.use.quote If fields in a CSV or TSV data file contain carriage return, line feed (CRLF) pairs, double quotation marks ("), or commas (,), add this property. Specifies whether to recognize column delimiters in a CSV file if it uses double quotation marks (") as column delimiters. If fields in the CSV file contain specified symbols that are used to separate multiple values, these fields must be enclosed in double quotation marks ("). The symbols include CRLF pairs, double quotation marks ("), and commas (,). If a field contains a double quotation mark ("), replace the double quotation mark (") with two double quotation marks ("") for escaping.
  • True
  • False
False
mcfed.parquet.compression If you need to write data in a PARQUET data file to OSS by using a compression algorithm, add this property. PARQUET compression property. By default, data in the PARQUET data file is not compressed. If you specify a compression algorithm, you must add the set odps.sql.hive.compatible=true; command before the statement that is used to create an OSS external table and submit this command with the statement.
  • SNAPPY
  • GZIP
None
parquet.file.cache.size If you need to improve the performance of reading data in a PARQUET data file, add this property. Specifies the maximum amount of data that can be cached when you read OSS data files. Unit: KB. 1024 None
parquet.io.buffer.size If you need to improve the performance of reading data in a PARQUET data file, add this property. Specifies the maximum amount of data that can be cached when the size of the OSS data file exceeds 1,024 KB. Unit: KB. 4096 None
separatorChar If you need to specify column delimiters for a CSV data file in the TEXTFILE format, add this property. Specifies the column delimiters for a CSV data file. A single string Comma (,)
quoteChar If fields in a CSV data file in the TEXTFILE format contain CRLF pairs, double quotation marks ("), or commas (,), add this property. Specifies the quote for a CSV data file. A single string None
escapeChar If you need to specify the rules that are used to escape a CSV data file in the TEXTFILE format, add this property. Specifies escape characters for a CSV data file. A single string None

Reference: TBLPROPERTIES properties

property_name Scenario Description property_value Default value
skip.header.line.count If you need to ignore the first N rows in a CSV data file in the TEXTFILE format, add this property. If you use MaxCompute to read OSS data, the data in the specified number of rows that start from the first row is ignored. Non-negative integer None
skip.footer.line.count If you need to ignore the last N rows in a CSV data file in the TEXTFILE format, add this property. If you use MaxCompute to read OSS data, the data in the specified number of rows that start from the last row is ignored. Non-negative integer None
mcfed.orc.compress If you need to write data in an ORC data file to OSS by using a compression algorithm, add this property. ORC compression property. Specifies the compression algorithm of ORC data files. If you specify a compression algorithm, you must add the set odps.sql.hive.compatible=true; command before the statement that is used to create an OSS external table and submit this command with the statement.
  • SNAPPY
  • ZLIB
None
mcfed.mapreduce.output.fileoutputformat.compress If you need to write data in a TEXTFILE data file to OSS by using a compression algorithm, add this property. TEXTFILE compression property. If you set this parameter to True, MaxCompute writes data in the TEXTFILE data files to OSS by using a compression algorithm. Otherwise, data is not compressed. If you specify a compression algorithm, you must add the set odps.sql.hive.compatible=true; command before the statement that is used to create an OSS external table and submit this command with the statement.
  • True
  • False
False
mcfed.mapreduce.output.fileoutputformat.compress.codec If you need to write data in a TEXTFILE data file to OSS by using a compression algorithm, add this property. TEXTFILE compression property. Specifies the compression algorithm of the TEXTFILE data files. If you specify a compression algorithm, you must add the set odps.sql.hive.compatible=true; command before the statement that is used to create an OSS external table and submit this command with the statement.
  • com.hadoop.compression.lzo.LzoCodec
  • com.hadoop.compression.lzo.LzopCodec
  • org.apache.hadoop.io.compress.SnappyCodec
  • com.aliyun.odps.io.compress.SnappyRawCodec
None
io.compression.codecs If the OSS data file is in the raw Snappy format, add this property. If you set this parameter to True, MaxCompute normally reads data from compressed files. Otherwise, the data read operation fails. com.aliyun.odps.io.compress.SnappyRawCodec None