
MaxCompute:CSV and TSV external tables

Last Updated: Mar 27, 2026

Use CSV or TSV files stored in Object Storage Service (OSS) as MaxCompute external tables to query, write, and analyze data without moving it into MaxCompute internal storage.

Limitations

  • OSS external tables do not support the cluster property.

  • A single file cannot exceed 2 GB. Split files larger than 2 GB before using them.

  • MaxCompute and OSS must be in the same region.

Choose a parser

MaxCompute provides two built-in parsers for CSV and TSV external tables:

Parser | Class name | When to use
Built-in text data parser | com.aliyun.odps.CsvStorageHandler (CSV) / com.aliyun.odps.TsvStorageHandler (TSV) | Standard use. Supports GZIP compression, Smart Parse, and TIMESTAMP_NTZ.
Built-in open source data parser | org.apache.hadoop.hive.serde2.OpenCSVSerde | Hive-compatible workloads. Supports SNAPPY and LZO compression.

Supported data types

For details on MaxCompute data types, see Data types V1.0 and Data types V2.0.

Data type | Built-in text data parser | OpenCSVSerde
TINYINT | Supported | Supported
SMALLINT | Supported | Supported
INT | Supported | Supported
BIGINT | Supported | Supported
BINARY | Not supported | Not supported
FLOAT | Supported | Supported
DOUBLE | Supported | Supported
DECIMAL(precision,scale) | Supported | Supported
VARCHAR(n) | Supported | Supported
CHAR(n) | Supported | Supported
STRING | Supported | Supported
DATE | Supported | Supported
DATETIME | Supported | Supported
TIMESTAMP | Supported | Supported
TIMESTAMP_NTZ | Supported | Not supported
BOOLEAN | Supported | Supported
ARRAY | Not supported | Not supported
MAP | Not supported | Not supported
STRUCT | Not supported | Not supported
JSON | Not supported | Not supported

Supported compression formats

Compression format | Built-in text data parser | OpenCSVSerde
GZIP | Supported | Supported
SNAPPY | Not supported | Supported
LZO | Not supported | Supported

Add the WITH serdeproperties clause to the CREATE TABLE statement to enable compression. See serdeproperties parameters for the specific properties.

Schema evolution

The schema of a CSV or TSV external table is mapped to file columns by position.

Operation | Supported | Notes
Add columns | Yes | Cannot add columns with default values or complex/nested types.
Delete columns | Yes | Not recommended; may cause a schema-data mismatch.
Change column order | Yes | Not recommended; may cause a schema-data mismatch.
Change column data types | Yes | See Change column data types for the conversion table.
Change column names | Yes |
Change column comments | Yes | Comments must be no more than 1,024 bytes.
Change the NOT NULL property | No | All columns are nullable by default.

Handle column count mismatches

If an OSS file has a different number of columns than the external table schema, set odps.sql.text.schema.mismatch.mode to control how mismatched rows are handled:

Value | Rows with more columns than the schema | Rows with fewer columns than the schema
error (default) | Reports an error | Reports an error
truncate | Extra columns are discarded | Missing columns are filled with null
ignore | The entire row is discarded | The entire row is discarded

This parameter has no effect when odps.text.option.use.quote is set to True.
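The three modes can be restated as a small decision function. The Python below is an illustrative sketch of the documented semantics, not MaxCompute code; the `apply_mismatch_mode` name and the use of `None` for NULL are this example's own conventions:

```python
def apply_mismatch_mode(row, schema_width, mode="error"):
    """Mimic odps.sql.text.schema.mismatch.mode for one parsed CSV row."""
    if len(row) == schema_width:
        return row
    if mode == "error":
        raise ValueError(
            f"SCHEMA MISMATCH: got {len(row)} columns, expected {schema_width}")
    if mode == "truncate":
        # Extra columns are dropped; missing ones become NULL (None here).
        return (row + [None] * schema_width)[:schema_width]
    if mode == "ignore":
        return None  # the whole row is discarded
    raise ValueError(f"unknown mode: {mode}")
```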

Prerequisites

Before you begin, ensure that you have:

Create an external table

Syntax

Built-in text data parser — CSV format

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.CsvStorageHandler'
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]

Built-in text data parser — TSV format

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.TsvStorageHandler'
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]

Built-in open source data parser

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
[WITH serdeproperties (
  ['<property_name>'='<property_value>',...]
)]
STORED AS TEXTFILE
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]

For common parameters shared across all external table types, see Basic syntax parameters.

serdeproperties parameters

Built-in text data parser (CsvStorageHandler / TsvStorageHandler)

Parameter | Applicable to | Description | Values | Default
odps.text.option.gzip.input.enabled | Read | Enables reading from GZIP-compressed CSV or TSV files. Set to True; otherwise MaxCompute cannot read the file. | True / False | False
odps.text.option.gzip.output.enabled | Write | Enables writing to OSS in GZIP format. If not set, data is written uncompressed. | True / False | False
odps.text.option.header.lines.count | Read | Number of rows to skip from the start of the file (for example, to skip a header row). | Non-negative integer | 0
odps.text.option.null.indicator | Read | Parses a specific string as NULL. For example, set to \\N to parse \N in the file as NULL (the first \ is an escape character), so a,\N,b becomes a, NULL, b. Only one string can be specified. | String | Empty string
odps.text.option.ignore.empty.lines | Read | If True, empty lines in the file are skipped. | True / False | True
odps.text.option.encoding | Read | Character encoding of the OSS file. Must match the actual encoding; otherwise MaxCompute cannot read the data. | UTF-8 / UTF-16 / US-ASCII / GBK | UTF-8
odps.text.option.delimiter | Read | Column delimiter character. Must match the actual delimiter in the file; otherwise columns are misaligned. | Single character | , (comma)
odps.text.option.use.quote | Read | When a CSV field contains a line break, a double quote (escaped as ""), or a comma, the field must be enclosed in double quotes. Set to True to enable this detection. When enabled, odps.sql.text.schema.mismatch.mode has no effect. | True / False | False
odps.sql.text.option.flush.header | Write | Writes the table header as the first row of each file block. Takes effect only for CSV format. | True / False | False
odps.sql.text.schema.mismatch.mode | Read | Controls behavior when the number of columns in the OSS file does not match the external table schema. See Handle column count mismatches. Has no effect when odps.text.option.use.quote is True. | error / truncate / ignore | error

Built-in open source data parser (OpenCSVSerde)

Parameter | Applicable to | Description | Values | Default
separatorChar | Read / Write | Column delimiter for CSV data. | Single character | , (comma)
quoteChar | Read / Write | Quote character for fields that contain the delimiter. | Single character | None
escapeChar | Read / Write | Escape character for special characters. | Single character | None

tblproperties parameters

Built-in text data parser (CsvStorageHandler / TsvStorageHandler)

Parameter | Applicable to | Description | Values | Default
odps.text.option.smart.parse.level | Read | Controls the numeric parsing strictness. Set to naive to restrict parsing to basic number strings only (such as "123" and "123.456"); any other format returns an error. By default, flexible parsing is used. See Smart Parse. | naive | (flexible)
odps.text.option.date.io.format | Read / Write | Custom parse and output format for DATE columns. Separate multiple parsing formats with #. The first format is used for output. Custom formats take priority over built-in formats. See DateTimeFormatter. | Format string | (built-in formats)
odps.text.option.datetime.io.format | Read / Write | Custom parse and output format for DATETIME columns. Same rules as above. | Format string | (built-in formats)
odps.text.option.timestamp.io.format | Read / Write | Custom parse and output format for TIMESTAMP columns. Same rules as above. | Format string | (built-in formats)
odps.text.option.timestamp_ntz.io.format | Read / Write | Custom parse and output format for TIMESTAMP_NTZ columns. Same rules as above. | Format string | (built-in formats)

Built-in open source data parser (OpenCSVSerde)

Parameter | Applicable to | Description | Values | Default
skip.header.line.count | Read | Number of rows to skip from the start of the file. | Non-negative integer | None
skip.footer.line.count | Read | Number of rows to skip from the end of the file. | Non-negative integer | None
mcfed.mapreduce.output.fileoutputformat.compress | Write | Enables writing TEXTFILE data to OSS in a compressed format. | True / False | False
mcfed.mapreduce.output.fileoutputformat.compress.codec | Write | Sets the compression codec for TEXTFILE output. Only the four listed values are supported. | com.hadoop.compression.lzo.LzoCodec / com.hadoop.compression.lzo.LzopCodec / org.apache.hadoop.io.compress.SnappyCodec / com.aliyun.odps.io.compress.SnappyRawCodec | None
io.compression.codecs | Read | Required when the OSS file is in Raw-Snappy format. Set to com.aliyun.odps.io.compress.SnappyRawCodec. | com.aliyun.odps.io.compress.SnappyRawCodec | None
odps.text.option.bad.row.skipping | Read | Controls how dirty data rows are handled. See BadRowSkipping. | rigid / flexible | None

Write data

Use INSERT OVERWRITE or INSERT INTO on the external table to write data to OSS. For the full write syntax, see Write syntax.

Query and analysis

BadRowSkipping

BadRowSkipping lets you skip rows that cause parse errors in CSV data. Enabling or disabling this feature does not affect how the underlying data format is parsed.

Configure bad row skipping

Control the behavior using a combination of a table-level parameter and a session-level parameter.

Table-level parameter: `odps.text.option.bad.row.skipping` (set in tblproperties)

  • rigid — skipping is always on and cannot be disabled at the session level.

  • flexible — skipping is on by default but can be overridden at the session level.

  • Not set — session-level settings determine the behavior.

Session-level parameters

Parameter | Value | Effect
odps.sql.unstructured.text.bad.row.skipping | on | Enables bad row skipping for the session. If the table-level parameter is not set, skipping is automatically enabled.
 | off | Disables bad row skipping. If the table-level parameter is flexible, skipping is disabled. If it is rigid, this setting has no effect.
 | <null> or invalid | Behavior is controlled entirely by the table-level parameter.
odps.sql.unstructured.text.bad.row.skipping.debug.num | Integer (max 1,000) | Prints details of problematic rows to stdout in Logview, up to the specified number. Set to 0 or a negative number to disable printing.

Interaction between table-level and session-level parameters

Table-level (tblproperties) | Session-level (SET) | Result
rigid | on | On (forcibly enabled)
rigid | off | On (forcibly enabled; rigid takes precedence)
rigid | <null> or not set | On (forcibly enabled)
flexible | on | On
flexible | off | Off (disabled by the session)
flexible | <null> or not set | On
Not set | on | On (enabled by the session)
Not set | off | Off
Not set | <null> or not set | Off (the default behavior)
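The interaction table above reduces to a short decision function. This Python sketch simply restates the table and is illustrative only; the function name is this example's own:

```python
def bad_row_skipping_enabled(table_level=None, session_flag=None):
    """Effective BadRowSkipping state from the table-level tblproperties
    value ('rigid', 'flexible', or None) and the session-level SET flag
    ('on', 'off', or None)."""
    if table_level == "rigid":
        return True                   # forcibly on; the session flag is ignored
    if table_level == "flexible":
        return session_flag != "off"  # on unless explicitly disabled
    # Table-level parameter not set: the session flag alone decides.
    return session_flag == "on"
```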

Debug bad rows

To inspect rows that were skipped, set odps.sql.unstructured.text.bad.row.skipping.debug.num before running your query:

SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
SELECT * FROM <your_table>;

Details of up to the specified number of problematic rows are printed to stdout in Logview. The maximum value is 1,000.

Smart Parse

Smart Parse enables MaxCompute SQL to parse a wider range of input formats for CSV data when using data types V2.0. This applies to the built-in text data parser only.

Numeric types

For INT, SMALLINT, TINYINT, BIGINT, FLOAT, DOUBLE, and DECIMAL, flexible parsing is enabled by default. The following input formats are accepted in addition to standard number strings:

  • Thousands separator: "1,234,567" — if using comma as the thousands separator, change the CSV column delimiter to a different character using odps.text.option.delimiter.

  • Scientific notation: "0.3e2", "-1e5"

  • Hexadecimal: "0xff" (case-insensitive)

  • Binary: "0b1001" (case-insensitive)

  • Fraction (integer result only): "4/2"

  • Percentage (integer result only): "1000%"

  • Permille (integer result only): "1000‰"

  • With currency symbol: "1,000 $", "$ 1,000" — supported symbols: $/¥/€/£/₩/USD/CNY/EUR/GBP/JPY/KRW/IDR/RP

  • K8s-style base-1000 units: K, M, G, P, T — for example, "3M" = 3,000,000

  • K8s-style base-1024 units: Ki, Mi, Gi, Pi, Ti — for example, "2Gi" = 2,147,483,648

To restrict parsing to basic number strings only (such as "123" and "123.456"), set odps.text.option.smart.parse.level to naive in tblproperties. Any other format returns an error.
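To make the flexible formats concrete, here is an illustrative, integer-focused Python sketch covering a subset of them (thousands separators, scientific notation, hexadecimal, binary, fractions, percent/permille, and unit suffixes; currency symbols are omitted). It mirrors the behavior described above but is not the actual MaxCompute parser:

```python
from fractions import Fraction

# Base-1000 and base-1024 unit multipliers, as listed above.
UNITS_1000 = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
UNITS_1024 = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}

def smart_parse_int(s):
    """Parse an integer from a subset of the flexible formats."""
    s = s.strip().replace(",", "")          # drop thousands separators
    low = s.lower()
    if low.startswith(("0x", "-0x")):
        return int(s, 16)                   # hexadecimal, case-insensitive
    if low.startswith(("0b", "-0b")):
        return int(s, 2)                    # binary, case-insensitive
    if s.endswith("%"):
        return int(Fraction(s[:-1]) / 100)  # percentage, integer result
    if s.endswith("\u2030"):
        return int(Fraction(s[:-1]) / 1000) # permille, integer result
    # Check two-character (base-1024) suffixes before one-character ones.
    for unit, mult in {**UNITS_1024, **UNITS_1000}.items():
        if s.endswith(unit):
            return int(Fraction(s[: -len(unit)]) * mult)
    if "/" in s:
        return int(Fraction(s))             # fraction with an integer result
    return int(float(s))                    # plain or scientific notation
```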

BOOLEAN

Input | Output
"true" / "false" | "true" / "false"
"T" / "F" | "true" / "false"
"1" / "0" | "true" / "false"
"Yes" / "No" | "true" / "false"
"Y" / "N" | "true" / "false"
"" (empty string) | NULL

Input is trimmed with trim() during parsing. If the value is not in the list above, parsing fails.
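A minimal Python sketch of the mapping above, assuming matching is case-insensitive (the table shows mixed-case tokens and the exact case rules are not spelled out here); the token sets and `parse_bool` name are this example's own:

```python
TRUE_TOKENS = {"true", "t", "1", "yes", "y"}
FALSE_TOKENS = {"false", "f", "0", "no", "n"}

def parse_bool(s):
    """Trim the input, then match it against the accepted token lists."""
    s = s.strip()
    if s == "":
        return None  # an empty string parses as NULL
    low = s.lower()
    if low in TRUE_TOKENS:
        return True
    if low in FALSE_TOKENS:
        return False
    raise ValueError(f"cannot parse {s!r} as BOOLEAN")
```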

Date and time types

All four date/time types — DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ — use java.time.format.DateTimeFormatter internally.

Built-in parsing formats:

Type | Accepted input formats
DATE | "yyyy-MM-dd", "yyyyMMdd", "MMM d,yyyy", "MMMM d,yyyy", "yyyy-M-d", "" → NULL
TIMESTAMP_NTZ | "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]", "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS]", "yyyyMMddHHmmss", "" → NULL
DATETIME | "yyyy-MM-dd HH:mm:ss[.SSS][x]", "yyyy-MM-ddTHH:mm:ss[.SSS][x]", "yyyyMMddHHmmss[x]", "" → NULL
TIMESTAMP | "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS][x]", "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS][x]", "yyyyMMddHHmmss[x]", "" → NULL

Custom formats:

Set odps.text.option.<date|datetime|timestamp|timestamp_ntz>.io.format in tblproperties to define custom parsing and output formats:

  • Separate multiple parsing formats with #. For example: 'pattern1#pattern2#pattern3'

  • The first format is used for output.

  • Custom formats take priority over built-in formats.

  • For format syntax, see DateTimeFormatter.

Time zone format:

Use 'x' (zone-offset) or 'VV' (time-zone ID) as the time zone pattern. Avoid 'z' (time-zone name) — it is ambiguous in some contexts. For example, 'CST' is parsed by DateTimeFormatter as US Central Standard Time (UTC-6), not China Standard Time (UTC+8).
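The '#'-separated io.format convention (try every pattern for parsing, use the first for output) can be sketched in Python, with `strptime` patterns standing in for Java DateTimeFormatter patterns; `make_io_format` is this example's own helper, not a MaxCompute API:

```python
from datetime import datetime

def make_io_format(patterns):
    """Build (parse, output) functions from a '#'-separated pattern list."""
    fmts = patterns.split("#")

    def parse(s):
        # Try each pattern in order until one matches.
        for fmt in fmts:
            try:
                return datetime.strptime(s, fmt)
            except ValueError:
                continue
        raise ValueError(f"no pattern in {patterns!r} matches {s!r}")

    def output(dt):
        return dt.strftime(fmts[0])  # the first pattern drives output

    return parse, output
```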

General rules:

  • An empty string in a CSV file is parsed as NULL for all data types.

  • Complex types (STRUCT, ARRAY, MAP) are not supported because their values can conflict with common CSV separators.

  • BINARY and INTERVAL types are not supported. Contact MaxCompute technical support to use these types.

  • DATETIME and TIMESTAMP output is time-zone-sensitive. Use odps.sql.timezone to set the system time zone.

  • DATE and TIMESTAMP_NTZ output is not affected by time zone changes.

Examples

Example 1: Non-partitioned table (built-in text data parser)

This example maps an external table to the Demo1/ folder in OSS.

  1. Create the external table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external1
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';
    
    -- View the schema of the created table.
    DESC EXTENDED mc_oss_csv_external1;

    If you use a different role, replace aliyunodpsdefaultrole with that role's name, and grant the role permissions to access OSS.

  2. Query the table.

    SELECT * FROM mc_oss_csv_external1;

    Expected output:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  3. Write a row and verify.

    INSERT INTO mc_oss_csv_external1 VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
    SELECT * FROM mc_oss_csv_external1 WHERE recordId=12;

    Expected output:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    A new file is generated in the Demo1/ path in OSS.

Example 2: Partitioned table (built-in text data parser)

This example maps a partitioned external table to the Demo2/ folder in OSS.

  1. Create the external table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external2
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING
    )
    PARTITIONED BY (
      direction STRING
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';
    
    -- View the schema of the created table.
    DESC EXTENDED mc_oss_csv_external2;
  2. Import partition data. For partitioned OSS external tables, run MSCK REPAIR TABLE to populate the partitions.

    MSCK REPAIR TABLE mc_oss_csv_external2 ADD PARTITIONS;
    
    -- Equivalent to:
    ALTER TABLE mc_oss_csv_external2 ADD PARTITION (direction = 'N')
      PARTITION (direction = 'NE') PARTITION (direction = 'S')
      PARTITION (direction = 'SW') PARTITION (direction = 'W');

    For details, see Populate partitions for an OSS external table.

  3. Query the partitioned table.

    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE';

    Expected output:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
  4. Write a row and verify.

    INSERT INTO mc_oss_csv_external2 PARTITION(direction='NE') VALUES(1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10');
    SELECT * FROM mc_oss_csv_external2 WHERE direction='NE' AND recordId=12;

    Expected output:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | NE         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

    A new file is generated in the Demo2/direction=NE path in OSS.

Example 3: GZIP-compressed CSV table (built-in text data parser)

This example creates a GZIP-compressed external table in CSV format and performs read and write operations.

  1. Create an internal table and insert test data.

    CREATE TABLE vehicle_test(
      vehicleid INT,
      recordid INT,
      patientid INT,
      calls INT,
      locationlatitute DOUBLE,
      locationlongtitue DOUBLE,
      recordtime STRING,
      direction STRING
    );
    
    INSERT INTO vehicle_test VALUES (1,1,51,1,46.81006,-92.08174,'9/14/2014 0:00','S');
  2. Create a GZIP-compressed external table mapped to the Demo3/ folder in OSS.

    CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external3
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    PARTITIONED BY (dt STRING)
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
      'odps.text.option.gzip.input.enabled'='true',
      'odps.text.option.gzip.output.enabled'='true'
    )
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';
    
    -- Import partition data.
    MSCK REPAIR TABLE mc_oss_csv_external3 ADD PARTITIONS;
    
    -- View the schema.
    DESC EXTENDED mc_oss_csv_external3;
  3. Read data from OSS.

    If the compressed data is in an open source data format, prepend SET odps.sql.hive.compatible=true; and submit it together with the SQL statement.
    -- Enable a full table scan for this session.
    SET odps.sql.allow.fullscan=true;
    SELECT recordId, patientId, direction FROM mc_oss_csv_external3 WHERE patientId > 25;

    Expected output:

    +------------+------------+------------+
    | recordid   | patientid  | direction  |
    +------------+------------+------------+
    | 1          | 51         | S          |
    | 3          | 48         | NE         |
    | 4          | 30         | W          |
    | 5          | 47         | S          |
    | 7          | 53         | N          |
    | 8          | 63         | SW         |
    | 10         | 31         | N          |
    +------------+------------+------------+
  4. Write from the internal table to the external table.

    INSERT INTO TABLE mc_oss_csv_external3 PARTITION (dt='20250418') SELECT * FROM vehicle_test;

    After the statement completes, the exported file appears in the OSS folder.

Example 4: Write a table header to OSS

This example creates an external table that writes the column names as the first row of each file block.

-- Create the external table.
CREATE EXTERNAL TABLE mf_oss_wtt
(
  id BIGINT,
  name STRING,
  tran_amt DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.text.option.header.lines.count' = '1',
  'odps.sql.text.option.flush.header' = 'true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo11/';

-- Insert data.
INSERT OVERWRITE TABLE mf_oss_wtt VALUES (1, 'val1', 1.1),(2, 'value2', 1.3);

-- Query data.
-- If you did not set all column types to STRING, set 'odps.text.option.header.lines.count' = '1'
-- to skip the header row when reading.
SELECT * FROM mf_oss_wtt;

Expected output:

+----------+--------+------------+
| id       | name   | tran_amt   |
+----------+--------+------------+
| 1        | val1   | 1.1        |
| 2        | value2 | 1.3        |
+----------+--------+------------+

Example 5: Handle column count mismatches

This example shows how odps.sql.text.schema.mismatch.mode affects rows where the file has a different number of columns than the schema.

The test.csv file contains:

1,kyle1,this is desc1
2,kyle2,this is desc2,this is two
3,kyle3,this is desc3,this is three, I have 4 columns

With `truncate` mode — extra columns are discarded; missing columns are filled with null:

DROP TABLE test_mismatch;
CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch
(
  id string,
  name string,
  dect string,
  col4 string
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'truncate')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

SELECT * FROM test_mismatch;

Result — row 1 has fewer columns than the schema, so col4 is NULL; row 3 has more columns, so the extra column is discarded:

+----+-------+---------------+---------------+
| id | name  | dect          | col4          |
+----+-------+---------------+---------------+
| 1  | kyle1 | this is desc1 | NULL          |
| 2  | kyle2 | this is desc2 | this is two   |
| 3  | kyle3 | this is desc3 | this is three |
+----+-------+---------------+---------------+

With `ignore` mode — rows where column counts don't match are discarded entirely:

DROP TABLE test_mismatch01;
CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch01
(
  id STRING,
  name STRING,
  dect STRING,
  col4 STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'ignore')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

SELECT * FROM test_mismatch01;

Result — only the row where the column count matches the schema is returned:

+----+-------+----------------+-------------+
| id | name  | dect           | col4        |
+----+-------+----------------+-------------+
| 2  | kyle2 | this is desc2  | this is two |
+----+-------+----------------+-------------+

Example 6: Open source parser with header and footer skipping

This example uses the OpenCSVSerde parser to read a comma-separated file and skip the first and last rows.

  1. Upload test.csv to the demo-test folder in the oss-mc-test bucket.

  2. Create the external table.

    CREATE EXTERNAL TABLE ext_csv_test08
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH serdeproperties (
      "separatorChar" = ","
    )
    STORED AS TEXTFILE
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
      "skip.header.line.count"="1",
      "skip.footer.line.count"="1"
    );
  3. Query the table.

    SELECT * FROM ext_csv_test08;

    -- The header and footer rows were skipped when reading.

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+
    | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
    | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Example 7: Custom date and time formats

This example creates a CSV external table with custom parse and output formats for each time data type.

  1. Create the table with custom date and time formats.

    CREATE EXTERNAL TABLE test_csv
    (
      col_date DATE,
      col_datetime DATETIME,
      col_timestamp TIMESTAMP,
      col_timestamp_ntz TIMESTAMP_NTZ
    )
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/'
    TBLPROPERTIES (
      'odps.text.option.date.io.format' = 'MM/dd/yyyy',
      'odps.text.option.datetime.io.format' = 'yyyy-MM-dd-HH-mm-ss x',
      'odps.text.option.timestamp.io.format' = 'yyyy-MM-dd HH-mm-ss VV',
      'odps.text.option.timestamp_ntz.io.format' = 'yyyy-MM-dd HH:mm:ss.SS'
    );
    
    INSERT OVERWRITE TABLE test_csv
      VALUES(DATE'2025-02-21', DATETIME'2025-02-21 08:30:00', TIMESTAMP'2025-02-21 12:30:00', TIMESTAMP_NTZ'2025-02-21 16:30:00.123456789');
  2. The CSV file content after the insert:

    02/21/2025,2025-02-21-08-30-00 +08,2025-02-21 12-30-00 Asia/Shanghai,2025-02-21 16:30:00.12
  3. Read back to verify.

    SELECT * FROM test_csv;

    Expected output:

    +------------+---------------------+---------------------+------------------------+
    | col_date   | col_datetime        | col_timestamp       | col_timestamp_ntz      |
    +------------+---------------------+---------------------+------------------------+
    | 2025-02-21 | 2025-02-21 08:30:00 | 2025-02-21 12:30:00 | 2025-02-21 16:30:00.12 |
    +------------+---------------------+---------------------+------------------------+

Example 8: BadRowSkipping

This example shows the three BadRowSkipping configurations using a CSV file with dirty data.

Data preparation: Upload csv_bad_row_skipping.csv to the oss-mc-test/badrow/ directory in OSS.

Create three tables, one for each configuration:

-- No table-level parameter: behavior is controlled by the session-level flag.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flag
(
  a INT,
  b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>';

-- Flexible: skipping is on by default but can be disabled at the session level.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flexible
(
  a INT,
  b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
  'odps.text.option.bad.row.skipping' = 'flexible'
);

-- Rigid: skipping is always on and cannot be disabled at the session level.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_rigid
(
  a INT,
  b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
  'odps.text.option.bad.row.skipping' = 'rigid'
);

Verify results — set session-level flags and run each query:

-- Enable bad row skipping at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;

-- Disable bad row skipping at the session level.
-- For 'flexible' tables, this disables skipping. For 'rigid' tables, this has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;

-- Print details of up to 10 problematic rows to stdout in Logview.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;

The table without a table-level parameter returns an error when the session flag is off or not set; the flexible table returns an error only when the flag is explicitly off:

FAILED: ODPS-0123131:User defined function exception

The rigid table always skips bad rows, returning only the valid rows regardless of session flags:

+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 26         |
| 5          | 37         |
+------------+------------+

FAQ

A "column count mismatch" error occurs when reading CSV or TSV data

If the number of columns in a CSV or TSV file doesn't match the external table DDL, MaxCompute reports an error similar to:

FAILED: ODPS-0123131:User defined function exception - Traceback:java.lang.RuntimeException: SCHEMA MISMATCH:xxx

Set odps.sql.text.schema.mismatch.mode at the session level to control the behavior:

  • SET odps.sql.text.schema.mismatch.mode=truncate — if the file has more columns than the DDL, extra data is discarded; if fewer, the missing columns are filled with NULL.

  • SET odps.sql.text.schema.mismatch.mode=error — reports an error when column counts don't match (default behavior).