
MaxCompute: CSV/TSV external tables

Last Updated: Sep 12, 2025

This topic describes how to create, read from, and write to CSV and TSV external tables in Object Storage Service (OSS).

Prerequisites

  • Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.

  • (Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.

    MaxCompute can automatically create directories in OSS, so you can read from or write to an external table, including one that uses a UDF, with a single SQL statement. Manual directory creation is no longer required, but the legacy method is still supported.
  • A MaxCompute project has been created. For more information, see Create a MaxCompute project.

    MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
  • The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.

Limits

  • The cluster attribute is not supported for OSS external tables.

  • A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it.

Notes

An error is reported when you read data from a CSV or TSV file if the number of columns in the file does not match the number of columns in the external table's DDL statement. For example: FAILED: ODPS-0123131:User defined function exception - Traceback:java.lang.RuntimeException: SCHEMA MISMATCH:xxx.

To control this behavior, you can set the odps.sql.text.schema.mismatch.mode parameter at the session level:

  • SET odps.sql.text.schema.mismatch.mode=error: If the number of columns does not match, the system reports an error.

  • SET odps.sql.text.schema.mismatch.mode=truncate: If the number of columns does not match, extra columns are discarded. If the source data has fewer columns than the table definition, the missing columns are populated with NULL.
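For example, the following minimal sketch applies the truncate behavior before a query; the external table name my_csv_ext is a placeholder:

-- Session-level setting: valid only for the current session.
SET odps.sql.text.schema.mismatch.mode=truncate;
-- Rows with extra columns are truncated; rows with missing columns are NULL-filled.
SELECT * FROM my_csv_ext;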

Data type support

| Data type | com.aliyun.odps.CsvStorageHandler/TsvStorageHandler (built-in text data parser) | org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser) |
| --- | --- | --- |
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INT | Supported | Supported |
| BIGINT | Supported | Supported |
| BINARY | Not supported | Not supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL(precision,scale) | Supported | Supported |
| VARCHAR(n) | Supported | Supported |
| CHAR(n) | Supported | Supported |
| STRING | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIMESTAMP_NTZ | Supported | Not supported |
| BOOLEAN | Supported | Supported |
| ARRAY | Not supported | Not supported |
| MAP | Not supported | Not supported |
| STRUCT | Not supported | Not supported |
| JSON | Not supported | Not supported |

Compression format support

When you read from or write to compressed OSS files, you must add the corresponding properties in the WITH serdeproperties clause of the CREATE TABLE statement. For more information, see with serdeproperties parameters.

| Compression format | com.aliyun.odps.CsvStorageHandler/TsvStorageHandler (built-in text data parser) | org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser) |
| --- | --- | --- |
| GZIP | Supported | Supported |
| SNAPPY | Not supported | Supported |
| LZO | Not supported | Supported |

Create an external table

Syntax

For more information about the syntax for creating external tables in different formats, see OSS external tables.

  • Built-in text data parser

    CSV format

    CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name> 
    (
      <col_name> <data_type>,
      ...
    )
    [COMMENT <table_comment>]
    [PARTITIONED BY (<col_name> <data_type>, ...)] 
    STORED BY 'com.aliyun.odps.CsvStorageHandler'  
    [WITH serdeproperties (
      ['<property_name>'='<property_value>',...]
    )] 
    LOCATION '<oss_location>'
    [tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

    TSV format

    CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name> 
    (
      <col_name> <data_type>,
      ...
    )
    [COMMENT <table_comment>]
    [PARTITIONED BY (<col_name> <data_type>, ...)] 
    STORED BY 'com.aliyun.odps.TsvStorageHandler'  
    [WITH serdeproperties (
      ['<property_name>'='<property_value>',...]
    )] 
    LOCATION '<oss_location>'
    [tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];
  • Built-in open source data parser

    CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
    (
      <col_name> <data_type>,
      ...
    )
    [COMMENT <table_comment>]
    [PARTITIONED BY (<col_name> <data_type>, ...)]
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    [WITH serdeproperties (
      ['<property_name>'='<property_value>',...]
    )]
    STORED AS TEXTFILE 
    LOCATION '<oss_location>' 
    [tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

Common parameters

For more information about common parameters, see Basic syntax parameters.

Unique parameters

with serdeproperties parameters

| Applicable parser | Property name | Scenario | Description | Property value | Default value |
| --- | --- | --- | --- | --- | --- |
| Built-in text data parser (CsvStorageHandler/TsvStorageHandler) | odps.text.option.gzip.input.enabled | Add this property to read data from GZIP-compressed CSV or TSV files. | CSV and TSV compression property. Set this parameter to True to allow MaxCompute to read compressed files. Otherwise, the read operation fails. | True, False | False |
| | odps.text.option.gzip.output.enabled | Add this property to write data to OSS in the GZIP compression format. | CSV and TSV compression property. Set this parameter to True to allow MaxCompute to write data to OSS in the GZIP compression format. Otherwise, the data is not compressed. | True, False | False |
| | odps.text.option.header.lines.count | Add this property to ignore the first N rows of a CSV or TSV data file in OSS. | When MaxCompute reads a data file from OSS, it ignores the specified number of rows. | A non-negative integer | 0 |
| | odps.text.option.null.indicator | Add this property to define how NULL values are parsed in a CSV or TSV data file in OSS. | The string specified by this parameter is parsed as NULL in SQL. For example, if \N represents NULL, a,\N,b is parsed as a, NULL, b. | A string | Empty string |
| | odps.text.option.ignore.empty.lines | Add this property to define how empty lines are handled in a CSV or TSV data file in OSS. | If this parameter is set to True, MaxCompute ignores empty lines in the data file. Otherwise, empty lines are read. | True, False | True |
| | odps.text.option.encoding | Add this property if the encoding of the CSV or TSV data file in OSS is not the default encoding. | Make sure that the encoding specified here matches the encoding of the OSS data file. Otherwise, MaxCompute cannot read the data. | UTF-8, UTF-16, US-ASCII, GBK | UTF-8 |
| | odps.text.option.delimiter | Add this property to specify the column delimiter for a CSV or TSV data file. | Make sure that the specified column delimiter can correctly separate each column of the OSS data file. Otherwise, the data read by MaxCompute may be misaligned. | A single character | A comma (,) |
| | odps.text.option.use.quote | Add this property if a field in the CSV or TSV data file contains a line break (CRLF), a double quotation mark, or a comma. | When a field in a CSV file contains a line break, a double quotation mark (escaped by prefixing it with another "), or a comma, the entire field must be enclosed in double quotation marks ("). This parameter specifies whether to recognize the double quotation mark (") as the CSV quote character. | True, False | False |
| | odps.sql.text.option.flush.header | Add this property to write the column headers as the first row of each file block when writing data to OSS. | This parameter is effective only for CSV files. | True, False | False |
| | odps.sql.text.schema.mismatch.mode | Add this property to handle OSS files in which the number of data columns does not match the number of columns in the external table schema. | Specifies how to handle rows with a mismatched number of columns. Note: This setting has no effect when the odps.text.option.use.quote parameter is set to True. | error: reports an error; truncate: discards extra columns and fills missing columns with NULL; ignore: discards inconsistent rows | error |
| Built-in open source data parser (OpenCSVSerde) | separatorChar | Add this property to specify the column delimiter for CSV data saved in the TEXTFILE format. | Specifies the column delimiter for CSV data. | A single string | A comma (,) |
| | quoteChar | Add this property if a field in the CSV data saved in the TEXTFILE format contains a line break, a double quotation mark, or a comma. | Specifies the quote character for CSV data. | A single string | None |
| | escapeChar | Add this property to specify the escape rule for CSV data saved in the TEXTFILE format. | Specifies the escape character for CSV data. | A single string | None |
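The following minimal sketch combines several of these properties; the table name and OSS path are placeholders. It creates a CSV external table that reads semicolon-delimited, GZIP-compressed files, skips one header row, and parses the string NULL as a NULL value:

CREATE EXTERNAL TABLE IF NOT EXISTS csv_props_demo
(
  id   BIGINT,
  name STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.text.option.delimiter'=';',
  'odps.text.option.header.lines.count'='1',
  'odps.text.option.null.indicator'='NULL',
  'odps.text.option.gzip.input.enabled'='true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/props-demo/';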

tblproperties parameters

| Applicable parser | Property name | Scenario | Description | Property value | Default value |
| --- | --- | --- | --- | --- | --- |
| Built-in open source data parser (OpenCSVSerde) | skip.header.line.count | Add this property to ignore the first N rows of a CSV file saved in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the first row. | A non-negative integer | None |
| | skip.footer.line.count | Add this property to ignore the last N rows of a CSV file saved in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the last row. | A non-negative integer | None |
| | mcfed.mapreduce.output.fileoutputformat.compress | Add this property to write TEXTFILE data files to OSS in a compressed format. | TEXTFILE compression property. Set this parameter to True to allow MaxCompute to write TEXTFILE data files to OSS in a compressed format. Otherwise, the data is not compressed. | True, False | False |
| | mcfed.mapreduce.output.fileoutputformat.compress.codec | Add this property to write TEXTFILE data files to OSS in a compressed format. | TEXTFILE compression property. Sets the compression method for TEXTFILE data files. Note: Only the four listed compression codecs are supported. | com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.SnappyCodec, com.aliyun.odps.io.compress.SnappyRawCodec | None |
| | io.compression.codecs | Add this property if the OSS data file is in the raw Snappy format. | Set this parameter to com.aliyun.odps.io.compress.SnappyRawCodec to allow MaxCompute to read raw Snappy-compressed data. Otherwise, MaxCompute cannot read the data. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
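The following minimal sketch combines these tblproperties with the open source parser; the table name and OSS path are placeholders. It skips one header row on read and writes Snappy-compressed TEXTFILE data:

CREATE EXTERNAL TABLE IF NOT EXISTS csv_snappy_demo
(
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH serdeproperties ('separatorChar'=',')
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/snappy-demo/'
tblproperties (
  'skip.header.line.count'='1',
  'mcfed.mapreduce.output.fileoutputformat.compress'='true',
  'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.SnappyCodec'
);

-- To read open source compressed data, run SET odps.sql.hive.compatible=true; before your query.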

Write data

For more information about the write syntax in MaxCompute, see Write syntax.

Query analysis

More flexible type parsing with Smart Parse

MaxCompute SQL uses Data type 2.0 to read and write data for CSV-formatted OSS external tables. This provides compatibility with various value representations in CSV files, whereas previously only values in strict formats were supported. The following sketch and per-type entries provide more details.
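For example, the following minimal sketch (table name, OSS path, and file content are illustrative) maps a CSV line to typed columns; per the per-type entries below, Smart Parse loads the row as 1234567, 1.2345, and true:

-- Hypothetical OSS file content (one line): 1_234_567,123.45%,Yes
CREATE EXTERNAL TABLE IF NOT EXISTS smart_parse_demo
(
  big_col  BIGINT,
  dbl_col  DOUBLE,
  bool_col BOOLEAN
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/smart-parse-demo/';

SELECT * FROM smart_parse_demo; -- returns 1234567, 1.2345, true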

BOOLEAN

| Input (string) | Output (string) |
| --- | --- |
| "true"/"false" | "true"/"false" |
| "T"/"F" | "true"/"false" |
| "1"/"0" | "true"/"false" |
| "Yes"/"No" | "true"/"false" |
| "Y"/"N" | "true"/"false" |
| "" | "" |

An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file. The input is trimmed using the trim() operation during parsing. If the value is not in the enumeration, parsing fails.

TINYINT, SMALLINT, INT, BIGINT

The integer types share the following parsing behavior.

| Input (string) | Output (string) |
| --- | --- |
| "0" | "0" |
| "1" | "1" |
| "-100" | "-100" |
| "1,234,567" (thousands separator notation; the comma cannot be the first or last character of the number string) | "1234567" |
| "1_234_567" (Java-style; the underscore cannot be the first or last character of the number string) | "1234567" |
| "0.3e2" (scientific notation; parsed only if the actual value is an integer, otherwise an error is reported) | "30" |
| "-1e5" (scientific notation) | "-100000" |
| "0xff" (hexadecimal, case-insensitive) | "255" |
| "0b1001" (binary, case-insensitive) | "9" |
| "4/2" (fraction; parsed only if the actual value is an integer, otherwise an error is reported) | "2" |
| "1000%" (percentage; parsed only if the actual value is an integer, otherwise an error is reported) | "10" |
| "1000‰" (permille; parsed only if the actual value is an integer, otherwise an error is reported) | "1" |
| "1,000 $" (with currency symbol) | "1000" |
| "$ 1,000" (with currency symbol) | "1000" |
| "3M" (K8s-style, base-1000 units) | "3000000" (1M = 1000 × 1000) |
| "2Gi" (K8s-style, base-1024 units) | "2147483648" (1 Gi = 1024 × 1024 × 1024) |
| "" | "" |

Note
  • The input is trimmed using the trim() operation during parsing.
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • If you use thousands separator notation, such as "1,234,567", you must manually set the CSV delimiter to a character other than a comma. For more information, see the usage of odps.text.option.delimiter in with serdeproperties parameters.
  • K8s-style supports base-1000 units such as K, M, G, P, and T, and base-1024 units such as Ki, Mi, Gi, Pi, and Ti. For more information, see resource-management.
  • Supported currency symbols include $/¥/€/£/₩/USD/CNY/EUR/GBP/JPY/KRW/IDR/RP.
  • The values "0", "1", "-100", and "" can also be parsed in naive mode.

Per-type value ranges:
  • TINYINT: 8-bit integer. An error is reported if the value is outside the range [-128, 127].
  • SMALLINT: 16-bit integer. An error is reported if the value is outside the range [-32768, 32767].
  • INT: 32-bit integer. An error is reported if the value is outside the range [-2147483648, 2147483647].
  • BIGINT: 64-bit integer. An error is reported if the value is outside the range [-9223372036854775807, 9223372036854775807]. Note that -2^63 (which is -9223372036854775808) is not within the value range due to a limitation of the SQL engine.

FLOAT

| Input (string) | Output (string) |
| --- | --- |
| "3.14" | "3.14" |
| "0.314e1" (scientific notation) | "3.14" |
| "2/5" (fraction) | "0.4" |
| "123.45%" (percentage) | "1.2345" |
| "123.45‰" (permille) | "0.12345" |
| "1,234,567.89" (thousands separator notation) | "1234567.89" |
| "1,234.56 $" (with currency symbol) | "1234.56" |
| "$ 1,234.56" (with currency symbol) | "1234.56" |
| "1.2M" (K8s-style, base-1000 units) | "1200000" |
| "2Gi" (K8s-style, base-1024 units) | "2147483648" |
| "NaN" (case-insensitive) | "NaN" |
| "Inf" (case-insensitive) | "Infinity" |
| "-Inf" (case-insensitive) | "-Infinity" |
| "Infinity" (case-insensitive) | "Infinity" |
| "-Infinity" (case-insensitive) | "-Infinity" |
| "" | "" |

Note
  • The input is trimmed using the trim() operation during parsing.
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • The values "3.14", "0.314e1", "NaN", "Infinity", "-Infinity", and "" can also be parsed in naive mode.

Special values (case-insensitive) include NaN, Inf, -Inf, Infinity, and -Infinity. An error is reported if the value is outside the value range. If the precision is exceeded, the value is rounded and truncated.

DOUBLE

DOUBLE shares the same parsing behavior, special values, and range and precision rules as FLOAT. For example, "3.1415926" is parsed and output as "3.1415926". The values "3.1415926", "0.314e1", "NaN", "Infinity", "-Infinity", and "" can also be parsed in naive mode.

DECIMAL(precision,scale)

Example: DECIMAL(15,2).

| Input (string) | Output (string) |
| --- | --- |
| "3.358" | "3.36" (rounded) |
| "2/5" (fraction) | "0.4" |
| "123.45%" (percentage) | "1.23" (rounded) |
| "123.45‰" (permille) | "0.12" (rounded) |
| "1,234,567.89" (thousands separator notation) | "1234567.89" |
| "1,234.56 $" (with currency symbol) | "1234.56" |
| "$ 1,234.56" (with currency symbol) | "1234.56" |
| "1.2M" (K8s-style, base-1000 units) | "1200000" |
| "2Gi" (K8s-style, base-1024 units) | "2147483648" |
| "" | "" |

Note
  • The input is trimmed using the trim() operation during parsing.
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • The values "3.358" and "" can also be parsed in naive mode.

An error is reported if the integer part exceeds precision minus scale digits. If the fractional part exceeds the scale, the value is rounded and truncated.

CHAR(n)

Example: CHAR(7).

| Input (string) | Output (string) |
| --- | --- |
| "abcdefg" | "abcdefg" |
| "abcdefghijklmn" | "abcdefg" (the rest is truncated) |
| "abc" | "abc____" ("abc" followed by four spaces, represented here by the _ character) |
| "" | "" |

An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file. The maximum length is 255. If the length is less than n, the string is padded with spaces; the padding spaces are not used in comparisons. If the length exceeds n, the string is truncated.

VARCHAR(n)

Example: VARCHAR(7).

| Input (string) | Output (string) |
| --- | --- |
| "abcdefg" | "abcdefg" |
| "abcdefghijklmn" | "abcdefg" (the rest is truncated) |
| "abc" | "abc" |
| "" | "" |

An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file. The maximum length is 65535. If the length exceeds n, the string is truncated.

STRING

| Input (string) | Output (string) |
| --- | --- |
| "abcdefg" | "abcdefg" |
| "abc" | "abc" |
| "" | "" |

An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file. The length limit is 8 MB.

DATE

Supported input formats:
  • "yyyy-MM-dd" (for example, "2025-02-21")
  • "yyyyMMdd" (for example, "20250221")
  • "MMM d,yyyy" (for example, "Oct 1,2025")
  • "MMMM d,yyyy" (for example, "October 1,2025")
  • "MMMM d, yyyy" (for example, "October 1, 2025")
  • "" (An empty string is parsed as NULL when read into the table.)

Note: You can also customize the parsing method for input by setting odps.text.option.date.io.format. For example, if you set the format to 'dd/MM/yyyy#yyyy--MM--dd', you can parse '21/02/2025' and '2025--02--21'.

Output: values such as "2000-01-01"; a NULL value is output as an empty string to the CSV file.

Description:
  • There is no hour, minute, or second information. Changing the time zone does not affect the output. The default output format is "yyyy-MM-dd".
  • You can also customize the parsing and output format by setting odps.text.option.date.io.format. The output uses the first custom pattern. For more information about the format, see DateTimeFormatter.

TIMESTAMP_NTZ

Note: OpenCSVSerde does not support this type because it is not compatible with the Hive data format.

The nanosecond part can have a length of 0 to 9. MaxCompute supports the following built-in parsing formats:

| Input (string) | Output (string) |
| --- | --- |
| "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]" (for example, "2000-01-01 00:00:00.123") | "2000-01-01 00:00:00.123000000" |
| "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS]" (for example, "2000-01-01T00:00:00.123456789") | "2000-01-01 00:00:00.123456789" |
| "yyyyMMddHHmmss" (for example, "20000101000000") | "2000-01-01 00:00:00.000000000" |
| "" | "" |

You can also customize the parsing method for input by setting odps.text.option.timestamp_ntz.io.format. For example, if you set the format to 'ddMMyyyy-HHmmss', you can parse a string such as '31102024-103055'.

Description:
  • This is a nanosecond-granularity timestamp. Changing the time zone does not affect the output. By default, the output uses the standard UTC time zone and the format "yyyy-MM-dd HH:mm:ss.SSSSSSSSS".
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • You can also customize the parsing and output format by setting odps.text.option.timestamp_ntz.io.format. For more information about the format, see DateTimeFormatter.

DATETIME

The millisecond part can have a length of 0 to 3, and x represents the time zone offset. If the system time zone is Asia/Shanghai, MaxCompute supports the following built-in parsing formats:

| Input (string) | Output (string, Asia/Shanghai) |
| --- | --- |
| "yyyy-MM-dd HH:mm:ss[.SSS][x]" (for example, "2000-01-01 00:00:00.123") | "2000-01-01 00:00:00.123+0800" |
| "yyyy-MM-ddTHH:mm:ss[.SSS][x]" (for example, "2000-01-01T00:00:00.123+0000") | "2000-01-01 08:00:00.123+0800" |
| "yyyyMMddHHmmss[x]" (for example, "20000101000000+0000") | "2000-01-01 08:00:00.000+0800" |
| "" | "" |

You can also customize the parsing method for input by setting odps.text.option.datetime.io.format. For example, if you set the format to 'yyyyMMdd-HHmmss.SSS', you can parse a string such as '20241031-103055.123'.

Description:
  • This is a millisecond-granularity timestamp. Changing the time zone affects the output. The default output format is "yyyy-MM-dd HH:mm:ss.SSSx".
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • You can change the system time zone using odps.sql.timezone to control the time zone offset of the output value.
  • You can also customize the parsing and output format by setting odps.text.option.datetime.io.format. For more information about the format, see DateTimeFormatter.

TIMESTAMP

The nanosecond part can have a length of 0 to 9, and x represents the time zone offset. If the system time zone is Asia/Shanghai, MaxCompute supports the following built-in parsing formats:

| Input (string) | Output (string, Asia/Shanghai) |
| --- | --- |
| "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS][x]" (for example, "2000-01-01 00:00:00.123456") | "2000-01-01 00:00:00.123456000+0800" |
| "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS][x]" (for example, "2000-01-01T00:00:00.123+0000") | "2000-01-01 08:00:00.123000000+0800" |
| "yyyyMMddHHmmss[x]" (for example, "20000101000000+0000") | "2000-01-01 08:00:00.000000000+0800" |
| "" | "" |

You can also customize the parsing method for input by setting odps.text.option.timestamp.io.format. For example, if you set the format to 'yyyyMMdd-HHmmss', you can parse a string such as '20240910-103055'.

Description:
  • This is a nanosecond-granularity timestamp. Changing the time zone affects the output. The default output format is "yyyy-MM-dd HH:mm:ss.SSSSSSSSSx".
  • An empty string is parsed as NULL when read into the table, and a NULL value is output as an empty string to the CSV file.
  • You can change the system time zone using odps.sql.timezone to control the time zone offset of the output value.
  • You can also customize the parsing and output format by setting odps.text.option.timestamp.io.format. For more information about the format, see DateTimeFormatter.

Notes

  • For any data type, an empty string in a CSV data file is parsed as NULL when read into the table.

  • Complex types such as STRUCT, ARRAY, and MAP are not supported because their values can easily conflict with common CSV delimiters, such as the comma (,), which can cause parsing to fail.

  • BINARY and INTERVAL types are not currently supported. To use these types, you can contact MaxCompute technical support.

  • For INT, SMALLINT, TINYINT, BIGINT, FLOAT, DOUBLE, and DECIMAL data types, the default parsing capability is flexible.

    If you want to parse only basic number strings, you can set the odps.text.option.smart.parse.level parameter to naive in tblproperties. This configures the parser to support only regular number strings, such as "123" and "123.456". Parsing other string formats results in an error. A minimal sketch of this setting follows this list.

  • For the four date and time-related types (DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ), the underlying code uses java.time.format.DateTimeFormatter for processing. MaxCompute has several built-in default parsing formats. You can also define multiple parsing formats and one output format by setting odps.text.option.<date|datetime|timestamp|timestamp_ntz>.io.format in tblproperties. You can separate parsing formats with a hash symbol (#).

    • Custom formats have a higher priority than MaxCompute's built-in parsing formats. The output format uses the first custom format.

      For example, if you define a custom format string for the DATE type as pattern1#pattern2#pattern3, you can parse strings that match pattern1, pattern2, or pattern3. However, when you write data to a file, the output uses the pattern1 format. For more information, see DateTimeFormatter.

    • Do not use 'z' (time-zone name) as a custom time zone format, especially for users in China, because 'z' can be ambiguous in some contexts.

      For example, 'CST' usually means China Standard Time (UTC+8) in China. However, when 'CST' is parsed by java.time.format.DateTimeFormatter, it is identified as US Central Standard Time (UTC-6). This can lead to unexpected input or output. Use the 'x' (zone-offset) or 'VV' (time-zone ID) pattern instead.
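The following is a minimal sketch of the naive setting mentioned above; the table name and OSS path are placeholders:

CREATE EXTERNAL TABLE IF NOT EXISTS naive_parse_demo
(
  id     BIGINT,
  amount DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/naive-demo/'
TBLPROPERTIES ('odps.text.option.smart.parse.level'='naive');

-- With naive parsing, regular number strings such as "123" and "123.456" load normally,
-- but formats such as "1_234" or "50%" result in a parse error.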

Examples

Note
  • When you run the following sample code, replace <uid> with your Alibaba Cloud account ID.

  • The role used in the following examples is aliyunodpsdefaultrole. If you want to use a different role, replace aliyunodpsdefaultrole with the name of the target role and grant the target role permissions to access OSS.

Create an OSS external table using the built-in text data parser

Example 1: Non-partitioned table

  1. Map the table to the Demo1/ folder in the sample data. The following sample code shows how to create an OSS external table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external1
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';
    
    -- You can run the desc extended mc_oss_csv_external1; command to view the schema of the created OSS external table.
  2. Query the non-partitioned external table.

    SELECT * FROM mc_oss_csv_external1;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  3. Write data to the non-partitioned external table and check whether the data is written.

    INSERT INTO mc_oss_csv_external1 VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
    SELECT * FROM mc_oss_csv_external1 WHERE recordId=12;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    A new file is generated in the Demo1/ path in OSS.

Example 2: Partitioned table

  1. Map the table to the Demo2/ folder in the sample data. The following sample code shows how to create an OSS external table and import partition data.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external2
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING
    )
    PARTITIONED BY (
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';
    
    -- You can run the DESC EXTENDED mc_oss_csv_external2; command to view the schema of the created external table.
  2. Import partition data. If the OSS external table that you created is a partitioned table, you must also import the partition data. For more information, see Syntax for adding partition data to an OSS external table.

    MSCK REPAIR TABLE mc_oss_csv_external2 ADD PARTITIONS;
    
    --This is equivalent to the following statement.
    ALTER TABLE mc_oss_csv_external2 ADD PARTITION (direction = 'N') 
      PARTITION (direction = 'NE') PARTITION (direction = 'S') 
      PARTITION (direction = 'SW') PARTITION (direction = 'W');
  3. Query the partitioned external table.

    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE';

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  4. Write data to the partitioned external table and check whether the data is written.

    INSERT INTO mc_oss_csv_external2 PARTITION(direction='NE') VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10');
    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE' AND recordId=12;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    A new file is generated in the Demo2/direction=NE path in OSS.

Example 3: Compressed data

This example shows how to create a GZIP-compressed CSV external table and perform read and write operations.

  1. Create an internal table and write test data to it.

    CREATE TABLE vehicle_test(
      vehicleid INT, 
      recordid INT, 
      patientid INT, 
      calls INT, 
      locationlatitute DOUBLE, 
      locationLongitude DOUBLE, 
      recordtime STRING, 
      direction STRING
    );
    
    INSERT INTO vehicle_test VALUES (1,1,51,1,46.81006,-92.08174,'9/14/2014 0:00','S');
  2. Create a GZIP-compressed CSV external table and map it to the Demo3/ (compressed data) folder in the sample data. The following sample statement is used to create an OSS external table:

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external3
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    PARTITIONED BY (dt STRING)
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
      'odps.text.option.gzip.input.enabled'='true',
      'odps.text.option.gzip.output.enabled'='true' 
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';
    
    -- Import partition data.
    MSCK REPAIR TABLE mc_oss_csv_external3 ADD PARTITIONS;
    
    -- You can run the DESC EXTENDED mc_oss_csv_external3; command to view the schema of the created external table.
  3. Use the MaxCompute client to read data from OSS. The following is a sample command:

    Note

    If the compressed data in OSS is in an open source data format, you must add the set odps.sql.hive.compatible=true; command before the SQL statement and submit them together.

    --Enable a full table scan. This setting is valid only for the current session.
    SET odps.sql.allow.fullscan=true;
    SELECT recordId, patientId, direction FROM mc_oss_csv_external3 WHERE patientId > 25;

    The following result is returned:

    +------------+------------+------------+
    | recordid   | patientid  | direction  |
    +------------+------------+------------+
    | 1          | 51         | S          |
    | 3          | 48         | NE         |
    | 4          | 30         | W          |
    | 5          | 47         | S          |
    | 7          | 53         | N          |
    | 8          | 63         | SW         |
    | 10         | 31         | N          |
    +------------+------------+------------+
  4. Read data from the internal table and write it to the OSS external table.

    You can run the INSERT OVERWRITE or INSERT INTO command on the external table using the MaxCompute client to write data to OSS.

    INSERT INTO TABLE mc_oss_csv_external3 PARTITION (dt='20250418') SELECT * FROM vehicle_test;

    After the command is successfully executed, you can view the exported file in the OSS folder.

Create an OSS external table and specify the first row of the corresponding OSS file as the header

Create the Demo11 folder in the oss-mc-test bucket from the sample data and run the following statements:

--Create an external table.
CREATE EXTERNAL TABLE mf_oss_wtt
(
  id BIGINT,
  name STRING,
  tran_amt DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.text.option.header.lines.count' = '1',
  'odps.sql.text.option.flush.header' = 'true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo11/';

--Insert data.
INSERT OVERWRITE TABLE mf_oss_wtt VALUES (1, 'val1', 1.1),(2, 'value2', 1.3);

--Query data.
--When you query the table, the header row is read back as data. To avoid a parsing error,
--either define all fields as STRING when you create the table, or, as in this example,
--add the 'odps.text.option.header.lines.count' = '1' property to skip the header.
SELECT * FROM mf_oss_wtt;

The following result is returned:

+----------+--------+------------+
| id       | name   | tran_amt   |
+----------+--------+------------+
| 1        | val1   | 1.1        |
| 2        | value2 | 1.3        |
+----------+--------+------------+

Create an OSS external table with a mismatched column count

  1. Create the demo folder in the oss-mc-test bucket from the sample data and upload the test.csv file. The test.csv file contains the following content.

    1,kyle1,this is desc1
    2,kyle2,this is desc2,this is two
    3,kyle3,this is desc3,this is three, I have 4 columns
  2. Create an external table.

    1. Set the behavior for rows with a mismatched number of columns to TRUNCATE.

      --Drop the table if it already exists.
      DROP TABLE IF EXISTS test_mismatch;
      --Create a new external table.
      CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch
      (
        id string,
        name string,
        dect string,
        col4 string
      )
      STORED BY 'com.aliyun.odps.CsvStorageHandler'
      WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'truncate')
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';
    2. Set the behavior for rows with a mismatched number of columns to IGNORE.

      --Drop the table if it already exists.
      DROP TABLE IF EXISTS test_mismatch01;
      --Create a new external table.
      CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch01
      (
        id STRING,
        name STRING,
        dect STRING,
        col4 STRING
      )
      STORED BY 'com.aliyun.odps.CsvStorageHandler'
      WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'ignore')
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';
    3. Query the table data.

      • Query the test_mismatch table.

        SELECT * FROM test_mismatch;
        
        --Result
        +----+-------+---------------+---------------+
        | id | name  | dect          | col4          |
        +----+-------+---------------+---------------+
        | 1  | kyle1 | this is desc1 | NULL          |
        | 2  | kyle2 | this is desc2 | this is two   |
        | 3  | kyle3 | this is desc3 | this is three |
        +----+-------+---------------+---------------+
      • Query the test_mismatch01 table.

        SELECT * FROM test_mismatch01;
        
        --Result
        +----+-------+---------------+-------------+
        | id | name  | dect          | col4        |
        +----+-------+---------------+-------------+
        | 2  | kyle2 | this is desc2 | this is two |
        +----+-------+---------------+-------------+

Create an OSS external table using the built-in open source parser

This example shows how to create an OSS external table using the built-in open source parser to read a comma-separated file and ignore the first and last rows of data.

  1. Create the demo-test folder in the oss-mc-test bucket from the sample data and upload the test file test.csv.

    The test file contains the following data.

    1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
    1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,6,9,1,46.81006,-92.08174,9/15/2014 0:00,S
    1,7,53,1,46.81006,-92.08174,9/15/2014 0:00,N
    1,8,63,1,46.81006,-92.08174,9/15/2014 0:00,SW
    1,9,4,1,46.81006,-92.08174,9/15/2014 0:00,NE
    1,10,31,1,46.81006,-92.08174,9/15/2014 0:00,N
  2. Create an external table, specify the delimiter as a comma, and set parameters to ignore the first and last rows.

    CREATE EXTERNAL TABLE ext_csv_test08
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH serdeproperties (
      "separatorChar" = ","
    )
    STORED AS TEXTFILE
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/***/'
    -- Set parameters to ignore the first and last rows.
    TBLPROPERTIES (
      "skip.header.line.count"="1",
      "skip.footer.line.count"="1"
    );
  3. Read the external table.

    SELECT * FROM ext_csv_test08;
    
    -- The result shows that only 8 rows were read, ignoring the first and last rows.
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Create a CSV-formatted OSS external table with custom date and time data types

For more information about custom parsing and output formats for CSV time types, see More flexible type parsing with Smart Parse.

  1. Create a CSV external table for various date and time data types (DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ).

    CREATE EXTERNAL TABLE test_csv
    (
      col_date DATE,
      col_datetime DATETIME,
      col_timestamp TIMESTAMP,
      col_timestamp_ntz TIMESTAMP_NTZ
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/'
    TBLPROPERTIES (
      'odps.text.option.date.io.format' = 'MM/dd/yyyy',
      'odps.text.option.datetime.io.format' = 'yyyy-MM-dd-HH-mm-ss x',
      'odps.text.option.timestamp.io.format' = 'yyyy-MM-dd HH-mm-ss VV',
      'odps.text.option.timestamp_ntz.io.format' = 'yyyy-MM-dd HH:mm:ss.SS'
    );
    
    INSERT OVERWRITE TABLE test_csv 
      VALUES(DATE'2025-02-21', DATETIME'2025-02-21 08:30:00', TIMESTAMP'2025-02-21 12:30:00', TIMESTAMP_NTZ'2025-02-21 16:30:00.123456789');
  2. After you insert the data, the content of the CSV file is:

    02/21/2025,2025-02-21-08-30-00 +08,2025-02-21 12-30-00 Asia/Shanghai,2025-02-21 16:30:00.12
  3. Read the data again to see the result.

    SELECT * FROM test_csv;

    The following result is returned:

    +------------+---------------------+---------------------+------------------------+
    | col_date   | col_datetime        | col_timestamp       | col_timestamp_ntz      |
    +------------+---------------------+---------------------+------------------------+
    | 2025-02-21 | 2025-02-21 08:30:00 | 2025-02-21 12:30:00 | 2025-02-21 16:30:00.12 |
    +------------+---------------------+---------------------+------------------------+