
MaxCompute: CSV and TSV foreign tables

Last Updated: Jan 28, 2026

This topic describes how to create, read from, and write to Object Storage Service (OSS) foreign tables in CSV and TSV formats.

Scope

  • OSS foreign tables do not support the cluster property.

  • The size of a single file cannot exceed 2 GB. You must split files that are larger than 2 GB.

  • MaxCompute and OSS must be in the same region.

Supported data types

For more information about MaxCompute data types, see Data types V1.0 and Data types V2.0.

For more information about Smart Parse, see Smart Parse for more flexible type compatibility.

| Data type | com.aliyun.odps.CsvStorageHandler / TsvStorageHandler (built-in text data parser) | org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser) |
| --- | --- | --- |
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INT | Supported | Supported |
| BIGINT | Supported | Supported |
| BINARY | Not supported | Not supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL(precision,scale) | Supported | Supported |
| VARCHAR(n) | Supported | Supported |
| CHAR(n) | Supported | Supported |
| STRING | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIMESTAMP_NTZ | Supported | Not supported |
| BOOLEAN | Supported | Supported |
| ARRAY | Not supported | Not supported |
| MAP | Not supported | Not supported |
| STRUCT | Not supported | Not supported |
| JSON | Not supported | Not supported |

Supported compression formats

When you read from or write to compressed OSS files, add the corresponding compression properties to the with serdeproperties clause of the CREATE TABLE statement. For more information, see with serdeproperties parameters.

| Compression format | com.aliyun.odps.CsvStorageHandler / TsvStorageHandler (built-in text data parser) | org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser) |
| --- | --- | --- |
| GZIP | Supported | Supported |
| SNAPPY | Not supported | Supported |
| LZO | Not supported | Supported |

Schema evolution support

| Operation type | Supported | Description |
| --- | --- | --- |
| Add columns | Supported | You cannot add columns with default values, and you cannot add columns of complex or nested types. |
| Delete columns | Supported | Not recommended, because it may cause a mismatch between the schema and the data. |
| Change column order | Supported | Not recommended, because it may cause a mismatch between the schema and the data. |
| Change column data types | Supported | For the data type conversion table, see Change column data types. |
| Change column names | Supported | None. |
| Change column comments | Supported | The comment must be a valid string of no more than 1024 bytes. Otherwise, an error is reported. |
| Change the NOT NULL property of a column | Not supported | Columns of a foreign table are always nullable. |
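The following statements are a minimal sketch of these schema-evolution operations. The table name mc_oss_extable_demo and the column names are hypothetical placeholders:

ALTER TABLE mc_oss_extable_demo ADD COLUMNS (new_col STRING);           -- Add a column. No default value, no complex or nested types.
ALTER TABLE mc_oss_extable_demo CHANGE COLUMN new_col RENAME TO col_v2; -- Rename a column.
ALTER TABLE mc_oss_extable_demo CHANGE COLUMN col_v2 COMMENT 'demo';    -- Change a column comment. At most 1024 bytes.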

Parameter settings

The schema of a CSV or TSV foreign table is mapped to the file columns by position. If the number of columns in an OSS file does not match the number of columns in the foreign table schema, you can use the odps.sql.text.schema.mismatch.mode parameter to specify how to handle the mismatched rows.

  • If odps.sql.text.schema.mismatch.mode=truncate is set, after you modify a column:

    • Data that conforms to the modified schema is read as is.

    • For existing data that uses the old schema, extra columns are discarded and missing columns are filled with NULL.

      For example, if you add a column to a table, the value for this new column in existing rows is filled with NULL when you read the table.

  • If odps.sql.text.schema.mismatch.mode=ignore is set, after you modify a column:

    • Data that conforms to the modified schema is read as is.

    • Existing rows that use the old schema are discarded, because their column count no longer matches the schema.

      For example, if you add a column to a table, the entire row of existing data is discarded when you read the table.

  • If odps.sql.text.schema.mismatch.mode=error is set, after you modify a column:

    • Data that conforms to the modified schema is read as is.

    • Existing data that uses the old schema causes an error when it is read.

      For example, if you add a column to a table, an error occurs when you read historical data because that data lacks the new column.
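As a quick illustration, this parameter can also be set at the session level before a query, as the FAQ at the end of this topic describes. The table name below is a hypothetical placeholder:

-- Pad or truncate mismatched rows for the current session only.
SET odps.sql.text.schema.mismatch.mode=truncate;
SELECT * FROM mc_oss_extable_demo;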

Create a foreign table

Syntax

Built-in text data parser

CSV format

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name> 
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)] 
STORED BY 'com.aliyun.odps.CsvStorageHandler'  
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)] 
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

TSV format

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name> 
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)] 
STORED BY 'com.aliyun.odps.TsvStorageHandler'  
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)] 
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

Built-in open source data parser

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)]
STORED AS TEXTFILE 
LOCATION '<oss_location>' 
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

Common parameters

For more information about common parameters, see Basic syntax parameters.

Unique parameters

with serdeproperties parameters

| Applicable parser | property_name | Scenario | Description | property_value | Default value |
| --- | --- | --- | --- | --- | --- |
| Built-in text data parser (CsvStorageHandler/TsvStorageHandler) | odps.text.option.gzip.input.enabled | Read data from GZIP-compressed CSV or TSV files. | CSV/TSV compression property. Set this parameter to True to allow MaxCompute to read compressed files. Otherwise, the read operation fails. | True, False | False |
| Built-in text data parser | odps.text.option.gzip.output.enabled | Write data to OSS in the GZIP compression format. | CSV/TSV compression property. Set this parameter to True to allow MaxCompute to write GZIP-compressed data to OSS. Otherwise, the data is not compressed. | True, False | False |
| Built-in text data parser | odps.text.option.header.lines.count | Ignore the first N rows of a CSV or TSV file. | When MaxCompute reads the OSS data file, it ignores the specified number of rows. | A non-negative integer | 0 |
| Built-in text data parser | odps.text.option.null.indicator | Define the parsing rule for NULL values in a CSV or TSV file. | The string specified by this parameter is parsed as NULL in SQL. For example, to parse \N in a file as NULL, set this parameter to \\N (the first \ is an escape character); a,\N,b is then parsed as a, NULL, b. | String | Empty string |
| Built-in text data parser | odps.text.option.ignore.empty.lines | Define how empty lines in a CSV or TSV file are handled. | If this parameter is set to True, MaxCompute ignores empty lines in the data file. Otherwise, empty lines are read. | True, False | True |
| Built-in text data parser | odps.text.option.encoding | The encoding of the CSV or TSV file is not the default. | The encoding specified here must match the encoding of the OSS data file. Otherwise, MaxCompute cannot read the data. | UTF-8, UTF-16, US-ASCII, GBK | UTF-8 |
| Built-in text data parser | odps.text.option.delimiter | Specify the column delimiter of a CSV or TSV file. | The delimiter specified here must correctly separate the columns of the OSS data file. Otherwise, the data read by MaxCompute may be misaligned. | A single character | Comma (,) |
| Built-in text data parser | odps.text.option.use.quote | A field in the CSV or TSV file contains a line break (CRLF), a double quotation mark, or a comma. | Such a field must be enclosed in double quotation marks ("), and a double quotation mark inside the field must be escaped with a preceding ". This parameter specifies whether to recognize the double quotation mark as a field qualifier. | True, False | False |
| Built-in text data parser | odps.sql.text.option.flush.header | Use the first row of each file block as the table header when you write data to OSS. | This parameter takes effect only for files in the CSV format. | True, False | False |
| Built-in text data parser | odps.sql.text.schema.mismatch.mode | The number of data columns in the OSS file does not match the number of columns in the foreign table schema. | Specifies how to handle rows with a mismatched number of columns. error: an error is reported. truncate: data in columns that exceed the number of columns in the foreign table is discarded, and missing columns are filled with NULL. ignore: mismatched rows are discarded. Note: this feature does not take effect when odps.text.option.use.quote is set to True. | error, truncate, ignore | error |
| Built-in open source data parser (OpenCSVSerde) | separatorChar | Specify the column delimiter of CSV data stored in the TEXTFILE format. | Specifies the column delimiter for CSV data. | A single string | Comma (,) |
| Built-in open source data parser (OpenCSVSerde) | quoteChar | A field in CSV data stored in the TEXTFILE format contains a line break, a double quotation mark, or a comma. | Specifies the quote character for CSV data. | A single string | None |
| Built-in open source data parser (OpenCSVSerde) | escapeChar | Specify the escape rule for CSV data stored in the TEXTFILE format. | Specifies the escape character for CSV data. | A single string | None |
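The following CREATE statement is a minimal sketch that combines several of these properties. The table name and OSS path are hypothetical placeholders:

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_props_demo
(
  id BIGINT,
  name STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.text.option.delimiter'=';',            -- Columns are separated by semicolons.
  'odps.text.option.null.indicator'='\\N',     -- Parse \N as NULL.
  'odps.text.option.header.lines.count'='1',   -- Skip the header row.
  'odps.text.option.ignore.empty.lines'='true' -- Ignore empty lines.
)
LOCATION '<oss://bucket/path/>';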

tblproperties parameters

| Applicable parser | property_name | Scenario | Description | property_value | Default value |
| --- | --- | --- | --- | --- | --- |
| Built-in open source data parser (OpenCSVSerde) | skip.header.line.count | Ignore the first N rows of a CSV file stored in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the first row. | A non-negative integer | None |
| Built-in open source data parser (OpenCSVSerde) | skip.footer.line.count | Ignore the last N rows of a CSV file stored in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the last row. | A non-negative integer | None |
| Built-in open source data parser (OpenCSVSerde) | mcfed.mapreduce.output.fileoutputformat.compress | Write a TEXTFILE data file to OSS in a compressed format. | TEXTFILE compression property. Set this parameter to True to allow MaxCompute to write a compressed TEXTFILE data file to OSS. Otherwise, the data is not compressed. | True, False | False |
| Built-in open source data parser (OpenCSVSerde) | mcfed.mapreduce.output.fileoutputformat.compress.codec | Write a TEXTFILE data file to OSS in a compressed format. | TEXTFILE compression property. Sets the compression codec for the TEXTFILE data file. Note: only the four listed compression codecs are supported. | com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.SnappyCodec, com.aliyun.odps.io.compress.SnappyRawCodec | None |
| Built-in open source data parser (OpenCSVSerde) | io.compression.codecs | The OSS data file is in the raw Snappy format. | Set this parameter to the listed codec to allow MaxCompute to read raw Snappy-compressed data. Otherwise, MaxCompute cannot read the data. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
| Built-in text data parser (CsvStorageHandler) | odps.text.option.bad.row.skipping | Skip dirty data when the OSS data file is in the CSV format. | Specifies whether MaxCompute skips dirty data when it reads an OSS data file. rigid: the skip logic must be executed and cannot be overwritten by session-level or project-level settings. flexible: enables flexible skipping at the data plane; this setting can be overwritten by session-level or project-level settings. | rigid, flexible | None |
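As a minimal sketch of writing compressed data through the open source parser, the following statement combines the two mcfed properties. The table name and OSS path are hypothetical placeholders, and the hive-compatible flag follows the note in the compressed-data example later in this topic:

SET odps.sql.hive.compatible=true;

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_snappy_demo
(
  id BIGINT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '<oss://bucket/path/>'
tblproperties (
  -- Write Snappy-compressed TEXTFILE data to OSS.
  'mcfed.mapreduce.output.fileoutputformat.compress'='true',
  'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.SnappyCodec'
);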

Write data

For more information about the MaxCompute write syntax, see Write syntax.

Query and analysis

BadRowSkipping

If dirty data exists in CSV data, you can use the BadRowSkipping feature to specify whether to skip the rows that cause parsing errors. Enabling or disabling this feature does not affect the parsing of the underlying data format.

Parameter settings

  • Table-level parameter odps.text.option.bad.row.skipping

    • rigid: The skip logic must be executed and cannot be overwritten by session-level or project-level settings.

    • flexible: Enables flexible skipping at the data plane. This setting can be overwritten by session-level or project-level settings.

  • Session-level or project-level parameters

    • odps.sql.unstructured.text.bad.row.skipping: This parameter can overwrite the flexible table-level parameter but not the rigid table-level parameter.

      • on: Actively enabled. If this parameter is not configured for the table, it is automatically enabled.

      • off: Actively disabled. If the table-level parameter is set to flexible, this feature is disabled. In other cases, the table-level parameter takes precedence.

      • <null> or an invalid value: The behavior is determined only by the table-level configuration.

    • odps.sql.unstructured.text.bad.row.skipping.debug.num: Specifies the number of error results that can be printed to standard output (stdout) in Logview.

      • The maximum value is 1000.

      • If the value is less than or equal to 0, this feature is disabled.

      • If the input is invalid, this feature is disabled.

  • Interaction between session-level and table-level parameters

    | tbl property | session flag | result |
    | --- | --- | --- |
    | rigid | on | On, forcibly enabled |
    | rigid | off | On, forcibly enabled |
    | rigid | <null>, an invalid value, or not configured | On, forcibly enabled |
    | flexible | on | On |
    | flexible | off | Off, disabled by the session |
    | flexible | <null>, an invalid value, or not configured | On |
    | Not configured | on | On, enabled by the session |
    | Not configured | off | Off |
    | Not configured | <null>, an invalid value, or not configured | Off |

Usage examples

  1. Data preparation

    Upload the test data file csv_bad_row_skipping.csv, which contains dirty data, to the oss-mc-test/badrow/ directory in OSS.

  2. Create a CSV foreign table

    The following three scenarios describe different combinations of table-level parameters and session-level flags.

    • Table parameter: odps.text.option.bad.row.skipping = flexible/rigid/<not specified>

    • Session flag: odps.sql.unstructured.text.bad.row.skipping = on/off/<not specified>

    No table-level parameter is configured

    -- No table-level parameter is configured. Errors are reported as they occur. The behavior is controlled by the session-level flag.
    CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flag
    (
      a INT,
      b INT
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    location '<oss://databucketpath>';

    The table-level parameter is set to skip error rows, but this can be disabled at the session level

    -- The table-level parameter is set to skip error rows, but this can be actively disabled by the session-level flag.
    CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flexible
    (
      a INT,
      b INT
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    location '<oss://databucketpath>'
    tblproperties (
      'odps.text.option.bad.row.skipping' = 'flexible'   -- Flexibly enabled. Can be disabled at the session level.
    );

    The table-level parameter is set to skip error rows, and this cannot be disabled at the session level

    -- The table-level parameter is set to forcibly skip error rows. This cannot be disabled at the session level.
    CREATE EXTERNAL TABLE test_csv_bad_data_skipping_rigid
    (
      a INT,
      b INT
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    location '<oss://databucketpath>'
    tblproperties (
      'odps.text.option.bad.row.skipping' = 'rigid'  -- Forcibly enabled.
    );
  3. Verify the query result

    No table-level parameter is configured

    -- Enable at the session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at the session level. If the table-level parameter is set to 'flexible', this feature is disabled. If the table-level parameter is set to 'rigid', this setting has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print problematic row data at the session level. A maximum of 1,000 rows can be printed. If the value is less than or equal to 0, printing is disabled.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_csv_bad_data_skipping_flag;

    The following error is returned: FAILED: ODPS-0123131:User defined function exception

    The table-level parameter is set to skip error rows, but this can be disabled at the session level

    -- Enable at the session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at the session level. If the table-level parameter is set to 'flexible', this feature is disabled. If the table-level parameter is set to 'rigid', this setting has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print problematic row data at the session level. A maximum of 1,000 rows can be printed. If the value is less than or equal to 0, printing is disabled.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_csv_bad_data_skipping_flexible;

    The following error is returned: FAILED: ODPS-0123131:User defined function exception

    The table-level parameter is set to skip error rows, and this cannot be disabled at the session level

    -- Enable at the session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at the session level. If the table-level parameter is set to 'flexible', this feature is disabled. If the table-level parameter is set to 'rigid', this setting has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print problematic row data at the session level. A maximum of 1,000 rows can be printed. If the value is less than or equal to 0, printing is disabled.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_csv_bad_data_skipping_rigid;

    The following result is returned:

    +------------+------------+
    | a          | b          | 
    +------------+------------+
    | 1          | 26         | 
    | 5          | 37         | 
    +------------+------------+

Smart Parse for more flexible type compatibility

For OSS foreign tables in CSV format, MaxCompute SQL uses data types V2.0 for read and write operations. MaxCompute is compatible with a wide range of value representations in CSV files. The following entries provide the details:

BOOLEAN

Input as String → Output as String:

  • "true"/"false" → "true"/"false"
  • "T"/"F" → "true"/"false"
  • "1"/"0" → "true"/"false"
  • "Yes"/"No" → "true"/"false"
  • "Y"/"N" → "true"/"false"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Note: The input is trimmed using the trim() operation during parsing.

Description: If the value is not in the enumeration, the parsing fails.

TINYINT / SMALLINT / INT / BIGINT

Input as String → Output as String (shared by all four integer types):

  • "0" → "0"
  • "1" → "1"
  • "-100" → "-100"
  • "1,234,567" (thousands separator notation; the comma cannot be the first or last character of the number string) → "1234567"
  • "1_234_567" (Java-style; the underscore cannot be the first or last character of the number string) → "1234567"
  • "0.3e2" (scientific notation; parsed if the actual value is an integer, otherwise an error is reported) → "30"
  • "-1e5" (scientific notation) → "-100000"
  • "0xff" (hexadecimal, case-insensitive) → "255"
  • "0b1001" (binary, case-insensitive) → "9"
  • "4/2" (fraction; parsed if the actual value is an integer, otherwise an error is reported) → "2"
  • "1000%" (percentage; parsed if the actual value is an integer, otherwise an error is reported) → "10"
  • "1000‰" (permille; parsed if the actual value is an integer, otherwise an error is reported) → "1"
  • "1,000 $" (with currency symbol) → "1000"
  • "$ 1,000" (with currency symbol) → "1000"
  • "3M" (K8s-style, base-1000 units; 1 M = 1000 × 1000) → "3000000"
  • "2Gi" (K8s-style, base-1024 units; 1 Gi = 1024 × 1024 × 1024) → "2147483648"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Note
  • The input is trimmed using the trim() operation during parsing.
  • If you use thousands separator notation, such as "1,234,567", you must manually set the CSV delimiter to a character other than a comma. For more information, see the usage of odps.text.option.delimiter in with serdeproperties parameters.
  • The supported K8s-style base-1000 units include K, M, G, P, and T. The supported base-1024 units include Ki, Mi, Gi, Pi, and Ti. For more information, see resource-management.
  • The currency symbols include $/¥/€/£/₩/USD/CNY/EUR/GBP/JPY/KRW/IDR/RP.
  • "0", "1", "-100", and "" can also be correctly parsed in naive mode.

Description:
  • TINYINT: 8-bit integer. If the value is outside the range [-128, 127], an error is reported.
  • SMALLINT: 16-bit integer. If the value is outside the range [-32768, 32767], an error is reported.
  • INT: 32-bit integer. If the value is outside the range [-2147483648, 2147483647], an error is reported.
  • BIGINT: 64-bit integer. If the value is outside the range [-9223372036854775807, 9223372036854775807], an error is reported. Note: -2^63 (which is -9223372036854775808) is not within the value range due to a limitation of the SQL engine.

FLOAT

Input as String → Output as String:

  • "3.14" → "3.14"
  • "0.314e1" (scientific notation) → "3.14"
  • "2/5" (fraction) → "0.4"
  • "123.45%" (percentage) → "1.2345"
  • "123.45‰" (permille) → "0.12345"
  • "1,234,567.89" (thousands separator notation) → "1234567.89"
  • "1,234.56 $" (with currency symbol) → "1234.56"
  • "$ 1,234.56" (with currency symbol) → "1234.56"
  • "1.2M" (K8s-style, base-1000 units) → "1200000"
  • "2Gi" (K8s-style, base-1024 units) → "2147483648"
  • "NaN" (case-insensitive) → "NaN"
  • "Inf" (case-insensitive) → "Infinity"
  • "-Inf" (case-insensitive) → "-Infinity"
  • "Infinity" (case-insensitive) → "Infinity"
  • "-Infinity" (case-insensitive) → "-Infinity"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Note
  • The input is trimmed using the trim() operation during parsing.
  • "3.14", "0.314e1", "NaN", "Infinity", "-Infinity", and "" can also be correctly parsed in naive mode.

Description: Special values (case-insensitive) include NaN, Inf, -Inf, Infinity, and -Infinity. If the value is outside the value range, an error is reported. If the precision is exceeded, the value is rounded and truncated.

DOUBLE

Input as String → Output as String:

  • "3.1415926" → "3.1415926"
  • "0.314e1" (scientific notation) → "3.14"
  • "2/5" (fraction) → "0.4"
  • "123.45%" (percentage) → "1.2345"
  • "123.45‰" (permille) → "0.12345"
  • "1,234,567.89" (thousands separator notation) → "1234567.89"
  • "1,234.56 $" (with currency symbol) → "1234.56"
  • "$ 1,234.56" (with currency symbol) → "1234.56"
  • "1.2M" (K8s-style, base-1000 units) → "1200000"
  • "2Gi" (K8s-style, base-1024 units) → "2147483648"
  • "NaN" (case-insensitive) → "NaN"
  • "Inf" (case-insensitive) → "Infinity"
  • "-Inf" (case-insensitive) → "-Infinity"
  • "Infinity" (case-insensitive) → "Infinity"
  • "-Infinity" (case-insensitive) → "-Infinity"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Note
  • The input is trimmed using the trim() operation during parsing.
  • "3.1415926", "0.314e1", "NaN", "Infinity", "-Infinity", and "" can also be correctly parsed in naive mode.

Description: Special values (case-insensitive) include NaN, Inf, -Inf, Infinity, and -Infinity. If the value is outside the value range, an error is reported. If the precision is exceeded, the value is rounded and truncated.

DECIMAL(precision, scale)

Example: DECIMAL(15,2)

Input as String → Output as String:

  • "3.358" → "3.36" (rounded)
  • "2/5" (fraction) → "0.4"
  • "123.45%" (percentage) → "1.23" (rounded)
  • "123.45‰" (permille) → "0.12" (rounded)
  • "1,234,567.89" (thousands separator notation) → "1234567.89"
  • "1,234.56 $" (with currency symbol) → "1234.56"
  • "$ 1,234.56" (with currency symbol) → "1234.56"
  • "1.2M" (K8s-style, base-1000 units) → "1200000"
  • "2Gi" (K8s-style, base-1024 units) → "2147483648"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Note
  • The input is trimmed using the trim() operation during parsing.
  • "3.358" and "" can also be correctly parsed in naive mode.

Description: If the number of digits in the integer part exceeds precision - scale, an error is reported. If the fractional part exceeds the scale, the value is rounded and truncated.

CHAR(n)

Example: CHAR(7)

Input as String → Output as String:

  • "abcdefg" → "abcdefg"
  • "abcdefghijklmn" → "abcdefg" (The rest of the string is truncated.)
  • "abc" → "abc____" (The _ character represents a space. The string is padded with four spaces.)
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Description: The maximum length is 255. If the string is shorter than n, it is padded with spaces, and the padding spaces are not used in comparisons. If the length exceeds n, the string is truncated.

VARCHAR(n)

Example: VARCHAR(7)

Input as String → Output as String:

  • "abcdefg" → "abcdefg"
  • "abcdefghijklmn" → "abcdefg" (The rest of the string is truncated.)
  • "abc" → "abc"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Description: The maximum length is 65535. If the length exceeds n, the string is truncated.

STRING

Input as String → Output as String:

  • "abcdefg" → "abcdefg"
  • "abc" → "abc"
  • "" → "" (An empty string is parsed as NULL when read into the table; NULL is output as an empty string to the CSV file.)

Description: The length is limited to 8 MB.

DATE

Input as String:

  • "yyyy-MM-dd" (for example, "2025-02-21")
  • "yyyyMMdd" (for example, "20250221")
  • "MMM d,yyyy" (for example, "Oct 1,2025")
  • "MMMM d,yyyy" (for example, "October 1,2025")
  • "yyyy-M-d" (for example, "2025-10-1")
  • "" (An empty string is parsed as NULL when read into the table.)

Note

You can also customize the parsing method for reads by setting odps.text.option.date.io.format. For example, if you set the format to 'dd/MM/yyyy#yyyy--MM--dd', you can parse '21/02/2025' and '2025--02--21'.

Output as String:

  • "2000-01-01"
  • "" (NULL is output as an empty string to the CSV file.)

Description:

  • DATE does not contain time information. Changing the time zone does not affect the output. The default output format is "yyyy-MM-dd".
  • You can also customize the parsing and output formats by setting odps.text.option.date.io.format. The first custom pattern is used as the output format. For more information about the format, see DateTimeFormatter.

TIMESTAMP_NTZ

Note

OpenCSVSerde does not support this data type because it is not compatible with the Hive data format.

Input as String:

  • The nanosecond part can have a length of 0 to 9. MaxCompute supports the following built-in parsing formats:
    • "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]" (for example, "2000-01-01 00:00:00.123")
    • "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS]" (for example, "2000-01-01T00:00:00.123456789")
    • "yyyyMMddHHmmss" (for example, "20000101000000")
    • "" (An empty string is parsed as NULL when read into the table.)
  • You can also customize the parsing method for reads by setting odps.text.option.timestamp_ntz.io.format. For example, if you set the format to 'ddMMyyyy-HHmmss', you can parse a string such as '31102024-103055'.

Output as String:

  • "2000-01-01 00:00:00.123000000"
  • "2000-01-01 00:00:00.123456789"
  • "2000-01-01 00:00:00.000000000"
  • "" (NULL is output as an empty string to the CSV file.)

Description:

  • TIMESTAMP_NTZ is essentially a timestamp with nanosecond granularity. Changing the time zone does not affect the output. By default, the output is in the standard UTC time zone. The default output format is "yyyy-MM-dd HH:mm:ss.SSSSSSSSS".
  • You can also customize the parsing and output formats by setting odps.text.option.timestamp_ntz.io.format. For more information about the format, see DateTimeFormatter.

DATETIME

Input as String:

  • The millisecond part can have a length of 0 to 3, and x represents the time zone offset. If the system time zone is Asia/Shanghai, MaxCompute supports the following built-in parsing formats:
    • "yyyy-MM-dd HH:mm:ss[.SSS][x]" (for example, "2000-01-01 00:00:00.123")
    • "yyyy-MM-ddTHH:mm:ss[.SSS][x]" (for example, "2000-01-01T00:00:00.123+0000")
    • "yyyyMMddHHmmss[x]" (for example, "20000101000000+0000")
    • "" (An empty string is parsed as NULL when read into the table.)
  • You can also customize the parsing method for reads by setting odps.text.option.datetime.io.format. For example, if you set the format to 'yyyyMMdd-HHmmss.SSS', you can parse a string such as '20241031-103055.123'.

Output as String (if the system time zone is Asia/Shanghai):

  • "2000-01-01 00:00:00.123+0800"
  • "2000-01-01 08:00:00.123+0800"
  • "2000-01-01 08:00:00.000+0800"
  • "" (NULL is output as an empty string to the CSV file.)

Description:

  • DATETIME is essentially a timestamp with millisecond granularity. Changing the time zone affects the output. The default output format is "yyyy-MM-dd HH:mm:ss.SSSx".
  • You can use odps.sql.timezone to change the system time zone to control the time zone offset of the output value.
  • You can also customize the parsing and output formats by setting odps.text.option.datetime.io.format. For more information about the format, see DateTimeFormatter.

TIMESTAMP

Input as String:

  • The nanosecond part can have a length of 0 to 9, and x represents the time zone offset. If the system time zone is Asia/Shanghai, MaxCompute supports the following built-in parsing formats:
    • "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS][x]" (for example, "2000-01-01 00:00:00.123456")
    • "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS][x]" (for example, "2000-01-01T00:00:00.123+0000")
    • "yyyyMMddHHmmss[x]" (for example, "20000101000000+0000")
    • "" (An empty string is parsed as NULL when read into the table.)
  • You can also customize the parsing method for reads by setting odps.text.option.timestamp.io.format. For example, if you set the format to 'yyyyMMdd-HHmmss', you can parse a string such as '20240910-103055'.

Output as String (if the system time zone is Asia/Shanghai):

  • "2000-01-01 00:00:00.123456000+0800"
  • "2000-01-01 08:00:00.123000000+0800"
  • "2000-01-01 08:00:00.000000000+0800"
  • "" (NULL is output as an empty string to the CSV file.)

Description:

  • TIMESTAMP is essentially a timestamp with nanosecond granularity. Changing the time zone affects the output. The default output format is "yyyy-MM-dd HH:mm:ss.SSSSSSSSSx".
  • You can use odps.sql.timezone to change the system time zone to control the time zone offset of the output value.
  • You can also customize the parsing and output formats by setting odps.text.option.timestamp.io.format. For more information about the format, see DateTimeFormatter.

  • General rules

    • For any data type, an empty string in a CSV data file is parsed as NULL when read into a table.

  • Unsupported data types

    • Complex types (STRUCT, ARRAY, and MAP): Not supported. This is because the values of these complex types can easily conflict with common CSV separators, such as the comma (,), which can cause parsing failures.

    • BINARY and INTERVAL types: Not supported. To use these types, you can contact MaxCompute technical support.

  • Numeric types (such as INT and DOUBLE)

    • For the INT, SMALLINT, TINYINT, BIGINT, FLOAT, DOUBLE, and DECIMAL data types, the default parsing capability is flexible.

    • If you want to parse only basic number strings, you can set the odps.text.option.smart.parse.level parameter to naive in tblproperties, as shown in the sketch after this list. This configures the parser to support only common number strings, such as "123" and "123.456". An error is reported when parsing strings in other formats.

  • Date and time types (such as DATE and TIMESTAMP)

    • The four date or time-related types, DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ, are all processed using java.time.format.DateTimeFormatter in the underlying code.

    • Default format: MaxCompute has several built-in default parsing formats.

    • Custom format:

      • You can set odps.text.option.<date|datetime|timestamp|timestamp_ntz>.io.format in tblproperties to customize multiple parsing formats and one output format.

      • You can separate multiple parsing formats with a number sign (#).

      • Custom formats have a higher priority than MaxCompute's built-in parsing formats. The first custom format is used as the output format.

      • Example: If you customize the format string for the DATE type as pattern1#pattern2#pattern3, you can parse strings that conform to pattern1, pattern2, or pattern3. However, when writing to a file, the pattern1 format is used. For more information, see DateTimeFormatter.

  • Important: Notes on the 'z' time zone format

    • We recommend that you do not use 'z' (time-zone name) as a custom time zone format, especially for users in China, because 'z' can be ambiguous in some contexts.

    • We recommend that you use 'x' (zone-offset) or 'VV' (time-zone ID) as the time zone pattern instead.

    • Example: 'CST' usually stands for China Standard Time (UTC+8) in China. However, when 'CST' is parsed by java.time.format.DateTimeFormatter, it is identified as US Central Standard Time (UTC-6). This may lead to unexpected input or output results.
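The following is a minimal sketch of restricting parsing to naive mode. The table name and OSS path are hypothetical placeholders:

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_naive_demo
(
  id BIGINT,
  amount DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION '<oss://bucket/path/>'
tblproperties (
  -- Parse only common number strings such as "123" and "123.456";
  -- formats such as "1,234,567", "0xff", or "3M" report an error.
  'odps.text.option.smart.parse.level' = 'naive'
);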

Usage examples

Prerequisites

  • A MaxCompute project is created.

  • An OSS bucket and folder are available. For more information, see Create a bucket and Manage folders.

    MaxCompute supports automatic folder creation in OSS. If an SQL statement involves external tables and user-defined functions (UDFs), you can use a single statement to read from and write to the tables and use the UDFs. You can also create folders manually.

    MaxCompute is deployed only in specific regions. To avoid potential issues with cross-region data connections, ensure that your OSS bucket is in the same region as your MaxCompute project.
  • Authorization

    • You must have permissions to access OSS. You can use an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role to access OSS external tables. For more information about authorization, see Authorize access in STS mode for OSS.

    • You must have the CreateTable permission in the MaxCompute project. For more information about table permissions, see MaxCompute permissions.

Create an OSS foreign table using the built-in text data parser

Example 1: Non-partitioned table

  1. Map the table to the Demo1/ folder in Example data. The following sample code shows how to create an OSS foreign table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external1
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';
    
    -- You can run the desc extended mc_oss_csv_external1; command to view the schema of the created OSS foreign table.

    The role used in this example is aliyunodpsdefaultrole. If you use another role, you can replace aliyunodpsdefaultrole with the name of your role and grant the role permissions to access OSS.

  2. Query the non-partitioned foreign table.

    SELECT * FROM mc_oss_csv_external1;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  3. Write data to the non-partitioned foreign table and check whether the data is written.

    INSERT INTO mc_oss_csv_external1 VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
    SELECT * FROM mc_oss_csv_external1 WHERE recordId=12;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    Check the OSS path Demo1/. A new file is generated.

Example 2: Partitioned table

  1. Map the table to the Demo2/ folder in Example data. The following sample code shows how to create an OSS foreign table and import partition data.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external2
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING
    )
    PARTITIONED BY (
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';
    
    -- You can run the DESC EXTENDED mc_oss_csv_external2; command to view the schema of the created foreign table.

    The role used in this example is aliyunodpsdefaultrole. If you use another role, you can replace aliyunodpsdefaultrole with the name of your role and grant the role permissions to access OSS.

  2. Import partition data. When you create a partitioned OSS foreign table, you must perform an additional operation to import partition data. For more information, see Syntax for adding partition data to an OSS foreign table.

    MSCK REPAIR TABLE mc_oss_csv_external2 ADD PARTITIONS;
    
    -- This is equivalent to the following statement.
    ALTER TABLE mc_oss_csv_external2 ADD PARTITION (direction = 'N') 
      PARTITION (direction = 'NE') PARTITION (direction = 'S') 
      PARTITION (direction = 'SW') PARTITION (direction = 'W');
  3. Query the partitioned foreign table.

    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE';

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  4. Write data to the partitioned foreign table and check whether the data is written.

    INSERT INTO mc_oss_csv_external2 PARTITION(direction='NE') VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10');
    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE' AND recordId=12;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    Check the OSS path Demo2/direction=NE. A new file is generated.

Example 3: Compressed data

This example shows how to create a GZIP-compressed foreign table in CSV format and perform read and write operations.

  1. Create an internal table and write test data to it for subsequent write tests.

    CREATE TABLE vehicle_test(
      vehicleid INT, 
      recordid INT, 
      patientid INT, 
      calls INT, 
      locationlatitute DOUBLE, 
      locationlongtitue DOUBLE, 
      recordtime STRING, 
      direction STRING
    );
    
    INSERT INTO vehicle_test VALUES (1,1,51,1,46.81006,-92.08174,'9/14/2014 0:00','S');
  2. Create a GZIP-compressed foreign table in CSV format and map it to the Demo3/ (compressed data) folder in Example data. The following sample code shows how to create the OSS foreign table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external3
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    PARTITIONED BY (dt STRING)
    STORED BY 'com.aliyun.odps.CsvStorageHandler' 
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
      'odps.text.option.gzip.input.enabled'='true',
      'odps.text.option.gzip.output.enabled'='true' 
    ) 
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';
    
    -- Import partition data.
    MSCK REPAIR TABLE mc_oss_csv_external3 ADD PARTITIONS;
    
    -- You can run the DESC EXTENDED mc_oss_csv_external3; command to view the schema of the created foreign table.

    The role used in this example is aliyunodpsdefaultrole. If you use another role, you can replace aliyunodpsdefaultrole with the name of your role and grant the role permissions to access OSS.

  3. Use the MaxCompute client to read data from OSS. The following sample code provides an example:

    Note

    If the compressed data in OSS is in an open source data format, you must add the set odps.sql.hive.compatible=true; command before the SQL statement and submit them together.

    --Enable a full table scan. This setting is valid only for the current session.
    SET odps.sql.allow.fullscan=true;
    SELECT recordId, patientId, direction FROM mc_oss_csv_external3 WHERE patientId > 25;

    The following result is returned:

    +------------+------------+------------+
    | recordid   | patientid  | direction  |
    +------------+------------+------------+
    | 1          | 51         | S          |
    | 3          | 48         | NE         |
    | 4          | 30         | W          |
    | 5          | 47         | S          |
    | 7          | 53         | N          |
    | 8          | 63         | SW         |
    | 10         | 31         | N          |
    +------------+------------+------------+
  4. Read data from the internal table and write it to the OSS foreign table.

    You can run the INSERT OVERWRITE or INSERT INTO command on the foreign table using the MaxCompute client to write data to OSS.

    INSERT INTO TABLE mc_oss_csv_external3 PARTITION (dt='20250418') SELECT * FROM vehicle_test;

    After the statement is executed, you can view the exported file in the OSS folder.

Create an OSS foreign table and specify the first row of the corresponding OSS file as the table header

Create the Demo11 folder in the oss-mc-test bucket from Example data and run the following statements:

--Create a foreign table.
CREATE EXTERNAL TABLE mf_oss_wtt
(
  id BIGINT,
  name STRING,
  tran_amt DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.text.option.header.lines.count' = '1',
  'odps.sql.text.option.flush.header' = 'true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo11/';

--Insert data.
INSERT OVERWRITE TABLE mf_oss_wtt VALUES (1, 'val1', 1.1),(2, 'value2', 1.3);

--Query data.
--If the header row is not skipped, reading it into non-STRING columns reports an error. Either define all fields as STRING
--when you create the table, or add the header-skipping parameter as above: 'odps.text.option.header.lines.count' = '1'.
SELECT * FROM mf_oss_wtt;

The role used in this example is aliyunodpsdefaultrole. If you use another role, you can replace aliyunodpsdefaultrole with the name of your role and grant the role permissions to access OSS.

The following result is returned:

+----------+--------+------------+
| id       | name   | tran_amt   |
+----------+--------+------------+
| 1        | val1   | 1.1        |
| 2        | value2 | 1.3        |
+----------+--------+------------+

Create an OSS foreign table where the number of columns does not match the number of columns in the OSS data

  1. Create the demo folder in the oss-mc-test bucket from Example data and upload the test.csv file. The test.csv file contains the following content.

    1,kyle1,this is desc1
    2,kyle2,this is desc2,this is two
    3,kyle3,this is desc3,this is three, I have 4 columns
  2. Create a foreign table.

    1. Set the handling mode for rows with a mismatched number of columns to TRUNCATE.

      -- Delete the table.
      DROP TABLE test_mismatch;
      -- Create a new foreign table.
      CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch
      (
        id string,
        name string,
        dect string,
        col4 string
      )
      STORED BY 'com.aliyun.odps.CsvStorageHandler'
      WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'truncate')
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';
    2. Set the handling mode for rows with a mismatched number of columns to IGNORE.

      -- Delete the table.
      DROP TABLE test_mismatch01;
      -- Create a new foreign table.
      CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch01
      (
        id STRING,
        name STRING,
        dect STRING,
        col4 STRING
      )
      STORED BY 'com.aliyun.odps.CsvStorageHandler'
      WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'ignore')
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';
    3. Query table data.

      • Query the test_mismatch table.

        SELECT * FROM test_mismatch;
        
        --Result
        +----+-------+---------------+---------------+
        | id | name  | dect          | col4          |
        +----+-------+---------------+---------------+
        | 1  | kyle1 | this is desc1 | NULL          |
        | 2  | kyle2 | this is desc2 | this is two   |
        | 3  | kyle3 | this is desc3 | this is three |
        +----+-------+---------------+---------------+
      • Query the test_mismatch01 table.

        SELECT * FROM test_mismatch01;
        
        --Result
        +----+-------+----------------+-------------+
        | id | name  | dect           | col4        |
        +----+-------+----------------+-------------+
        | 2  | kyle2 | this is desc2  | this is two |
        +----+-------+----------------+-------------+

Create an OSS foreign table using the built-in open source parser

This example shows how to create an OSS foreign table using the built-in open source parser to read a comma-separated file and ignore the first and last rows.

  1. Create the demo-test folder in the oss-mc-test bucket from Example data and upload the test file test.csv.

    The test file contains the following data.

    1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
    1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
    1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
    1,6,9,1,46.81006,-92.08174,9/15/2014 0:00,S
    1,7,53,1,46.81006,-92.08174,9/15/2014 0:00,N
    1,8,63,1,46.81006,-92.08174,9/15/2014 0:00,SW
    1,9,4,1,46.81006,-92.08174,9/15/2014 0:00,NE
    1,10,31,1,46.81006,-92.08174,9/15/2014 0:00,N
  2. Create a foreign table, specify the separator as a comma, and set parameters to ignore the first and last rows.

    CREATE EXTERNAL TABLE ext_csv_test08
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH serdeproperties (
    "separatorChar" = ","
    )
    stored AS textfile
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/***/'
    -- Set parameters to ignore the first and last rows.
    -- Set parameters to ignore the first and last rows.
    TBLPROPERTIES (
    "skip.header.line.count"="1",
    "skip.footer.line.count"="1"
    )
    ;
  3. Read the foreign table.

    SELECT * FROM ext_csv_test08;
    
    -- Only 8 rows are read; the header and footer rows are ignored.
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Create an OSS foreign table in CSV format with custom date and time data types

For more information about custom parsing and output formats for time types in CSV, see Smart Parse for more flexible type compatibility.

  1. Create a CSV foreign table for each time data type (datetime, timestamp, and timestamp_ntz).

    CREATE EXTERNAL TABLE test_csv
    (
      col_date DATE,
      col_datetime DATETIME,
      col_timestamp TIMESTAMP,
      col_timestamp_ntz TIMESTAMP_NTZ
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/'
    TBLPROPERTIES (
      'odps.text.option.date.io.format' = 'MM/dd/yyyy',
      'odps.text.option.datetime.io.format' = 'yyyy-MM-dd-HH-mm-ss x',
      'odps.text.option.timestamp.io.format' = 'yyyy-MM-dd HH-mm-ss VV',
      'odps.text.option.timestamp_ntz.io.format' = 'yyyy-MM-dd HH:mm:ss.SS'
    );
    
    INSERT OVERWRITE TABLE test_csv 
      VALUES(DATE'2025-02-21', DATETIME'2025-02-21 08:30:00', TIMESTAMP'2025-02-21 12:30:00', TIMESTAMP_NTZ'2025-02-21 16:30:00.123456789');
  2. After the data is inserted, the content of the CSV file is:

    02/21/2025,2025-02-21-08-30-00 +08,2025-02-21 12-30-00 Asia/Shanghai,2025-02-21 16:30:00.12
  3. Read the data again to view the result.

    SELECT * FROM test_csv;

    The following result is returned:

    +------------+---------------------+---------------------+------------------------+
    | col_date   | col_datetime        | col_timestamp       | col_timestamp_ntz      |
    +------------+---------------------+---------------------+------------------------+
    | 2025-02-21 | 2025-02-21 08:30:00 | 2025-02-21 12:30:00 | 2025-02-21 16:30:00.12 |
    +------------+---------------------+---------------------+------------------------+

FAQ

A "column count mismatch" error is reported when reading CSV/TSV data

  • Symptom

    If the number of columns in a CSV/TSV file does not match the number of columns in the foreign table DDL, a "column count mismatch" error is reported when you read the CSV/TSV data. For example: FAILED: ODPS-0123131:User defined function exception - Traceback:java.lang.RuntimeException: SCHEMA MISMATCH:xxx.

  • Solution

    You can control the output by setting the odps.sql.text.schema.mismatch.mode parameter at the session level:

    • SET odps.sql.text.schema.mismatch.mode=error: If the number of columns does not match, the system reports an error.

    • SET odps.sql.text.schema.mismatch.mode=truncate: If the file has more columns than the foreign table DDL, the extra data is discarded. If the file has fewer columns than the foreign table DDL, the missing columns are filled with NULL.