Use CSV or TSV files stored in Object Storage Service (OSS) as MaxCompute external tables to query, write, and analyze data without moving it into MaxCompute internal storage.
Limitations
- OSS external tables do not support the cluster property.
- A single file cannot exceed 2 GB. Split files larger than 2 GB before using them.
- MaxCompute and OSS must be in the same region.
Choose a parser
MaxCompute provides two built-in parsers for CSV and TSV external tables:
| Parser | Class name | When to use |
|---|---|---|
| Built-in text data parser | com.aliyun.odps.CsvStorageHandler (CSV) / com.aliyun.odps.TsvStorageHandler (TSV) | Standard use. Supports GZIP compression, Smart Parse, and TIMESTAMP_NTZ. |
| Built-in open source data parser | org.apache.hadoop.hive.serde2.OpenCSVSerde | Hive-compatible workloads. Supports SNAPPY and LZO compression. |
Supported data types
For details on MaxCompute data types, see Data types V1.0 and Data types V2.0.
| Data type | Built-in text data parser | OpenCSVSerde |
|---|---|---|
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INT | Supported | Supported |
| BIGINT | Supported | Supported |
| BINARY | Not supported | Not supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL(precision,scale) | Supported | Supported |
| VARCHAR(n) | Supported | Supported |
| CHAR(n) | Supported | Supported |
| STRING | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIMESTAMP_NTZ | Supported | Not supported |
| BOOLEAN | Supported | Supported |
| ARRAY | Not supported | Not supported |
| MAP | Not supported | Not supported |
| STRUCT | Not supported | Not supported |
| JSON | Not supported | Not supported |
Supported compression formats
| Compression format | Built-in text data parser | OpenCSVSerde |
|---|---|---|
| GZIP | Supported | Supported |
| SNAPPY | Not supported | Supported |
| LZO | Not supported | Supported |
Add the WITH serdeproperties clause to the CREATE TABLE statement to enable compression. See serdeproperties parameters for the specific properties.
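As a minimal sketch (the table name and OSS path are hypothetical), GZIP compression for the built-in text data parser is switched on through two serdeproperties:

```sql
-- Hedged sketch: hypothetical table name and OSS path.
CREATE EXTERNAL TABLE IF NOT EXISTS my_gzip_csv_ext
(
    id BIGINT,
    name STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.text.option.gzip.input.enabled'='true',   -- read GZIP-compressed files
    'odps.text.option.gzip.output.enabled'='true'   -- write GZIP-compressed files
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/gzip-demo/';
```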
Schema evolution
The schema of a CSV or TSV external table is mapped to file columns by position.
| Operation | Supported | Notes |
|---|---|---|
| Add columns | Yes | Cannot add columns with default values or complex/nested types. |
| Delete columns | Yes | Not recommended — may cause a schema-data mismatch. |
| Change column order | Yes | Not recommended — may cause a schema-data mismatch. |
| Change column data types | Yes | See Change column data types for the conversion table. |
| Change column names | Yes | |
| Change column comments | Yes | Comments must be no more than 1,024 bytes. |
| Change the NOT NULL property | No | All columns are Nullable by default. |
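The supported operations above use the general MaxCompute ALTER TABLE statements. A minimal sketch, assuming a hypothetical external table named `my_csv_ext`:

```sql
-- Hedged sketch; my_csv_ext is a hypothetical table name.
-- Add a column (default values and complex/nested types are not allowed).
ALTER TABLE my_csv_ext ADD COLUMNS (new_col STRING);
-- Rename a column.
ALTER TABLE my_csv_ext CHANGE COLUMN new_col RENAME TO renamed_col;
-- Change a column comment (no more than 1,024 bytes).
ALTER TABLE my_csv_ext CHANGE COLUMN renamed_col COMMENT 'example comment';
```

Because the schema is mapped to file columns by position, deleting or reordering columns shifts the mapping for every column to the right of the change, which is why those operations are not recommended.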
Handle column count mismatches
If an OSS file has a different number of columns than the external table schema, set odps.sql.text.schema.mismatch.mode to control how mismatched rows are handled:
| Value | Rows with more columns than the schema | Rows with fewer columns than the schema |
|---|---|---|
| error (default) | Reports an error | Reports an error |
| truncate | Extra columns are discarded | Missing columns are filled with null |
| ignore | The entire row is discarded | The entire row is discarded |
This parameter has no effect when `odps.text.option.use.quote` is set to `True`.
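For a one-off query over ragged files, the mismatch mode can also be set at the session level. A hedged sketch with a hypothetical table name:

```sql
-- Tolerate ragged rows for this session only.
SET odps.sql.text.schema.mismatch.mode=truncate;
SELECT * FROM my_oss_ext_table;  -- hypothetical external table
```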
Prerequisites
Before you begin, ensure that you have:
- An OSS bucket in the same region as your MaxCompute project — see Create a bucket and Manage folders (MaxCompute can also create OSS folders automatically)
- Authorization to access OSS (Alibaba Cloud account, Resource Access Management (RAM) user, or RAM role) — see Authorize access in STS mode for OSS
- The `CreateTable` permission in the MaxCompute project — see MaxCompute permissions
Create an external table
Syntax
Built-in text data parser — CSV format

```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.CsvStorageHandler'
[WITH serdeproperties (
    ['<property_name>'='<property_value>',...]
)]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]
```

Built-in text data parser — TSV format
```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.TsvStorageHandler'
[WITH serdeproperties (
    ['<property_name>'='<property_value>',...]
)]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]
```

Built-in open source data parser
```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
[WITH serdeproperties (
    ['<property_name>'='<property_value>',...]
)]
STORED AS TEXTFILE
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)]
```

For common parameters shared across all external table types, see Basic syntax parameters.
serdeproperties parameters
Built-in text data parser (CsvStorageHandler / TsvStorageHandler)
| Parameter | Applicable to | Description | Values | Default |
|---|---|---|---|---|
| odps.text.option.gzip.input.enabled | Read | Enables reading from GZIP-compressed CSV or TSV files. Set to True or MaxCompute cannot read the file. | True / False | False |
| odps.text.option.gzip.output.enabled | Write | Enables writing to OSS in GZIP format. If not set, data is written uncompressed. | True / False | False |
| odps.text.option.header.lines.count | Read | Number of rows to skip from the start of the file (for example, to skip a header row). | Non-negative integer | 0 |
| odps.text.option.null.indicator | Read | Parses a specific string as NULL. For example, set to \\N to parse \N in the file as NULL (the first \ is an escape character), so a,\N,b becomes a, NULL, b. Only one string can be specified. | String | Empty string |
| odps.text.option.ignore.empty.lines | Read | If True, empty lines in the file are skipped. | True / False | True |
| odps.text.option.encoding | Read | Character encoding of the OSS file. Must match the actual encoding or MaxCompute cannot read the data. | UTF-8 / UTF-16 / US-ASCII / GBK | UTF-8 |
| odps.text.option.delimiter | Read | Column delimiter character. Must match the actual delimiter in the file or columns will be misaligned. | Single character | , (comma) |
| odps.text.option.use.quote | Read | When a CSV field contains a line break, a double quote (escaped as ""), or a comma, the field must be enclosed in double quotes. Set to True to enable this detection. When enabled, odps.sql.text.schema.mismatch.mode has no effect. | True / False | False |
| odps.sql.text.option.flush.header | Write | Writes the table header as the first row of each file block. Takes effect only for CSV format. | True / False | False |
| odps.sql.text.schema.mismatch.mode | Read | Controls behavior when the number of columns in the OSS file does not match the external table schema. See Handle column count mismatches. Has no effect when odps.text.option.use.quote is True. | error / truncate / ignore | error |
Built-in open source data parser (OpenCSVSerde)
| Parameter | Applicable to | Description | Values | Default |
|---|---|---|---|---|
| separatorChar | Read / Write | Column delimiter for CSV data. | Single character | , (comma) |
| quoteChar | Read / Write | Quote character for fields that contain the delimiter. | Single character | None |
| escapeChar | Read / Write | Escape character for special characters. | Single character | None |
tblproperties parameters
Built-in text data parser (CsvStorageHandler / TsvStorageHandler)
| Parameter | Applicable to | Description | Values | Default |
|---|---|---|---|---|
| odps.text.option.smart.parse.level | Read | Controls the numeric parsing strictness. Set to naive to restrict parsing to basic number strings only (such as "123" and "123.456"). Any other format returns an error. By default, flexible parsing is used. See Smart Parse. | naive | (flexible) |
| odps.text.option.date.io.format | Read / Write | Custom parse and output format for DATE columns. Separate multiple parsing formats with #. The first format is used for output. Custom formats take priority over built-in formats. See DateTimeFormatter. | Format string | (built-in formats) |
| odps.text.option.datetime.io.format | Read / Write | Custom parse and output format for DATETIME columns. Same rules as above. | Format string | (built-in formats) |
| odps.text.option.timestamp.io.format | Read / Write | Custom parse and output format for TIMESTAMP columns. Same rules as above. | Format string | (built-in formats) |
| odps.text.option.timestamp_ntz.io.format | Read / Write | Custom parse and output format for TIMESTAMP_NTZ columns. Same rules as above. | Format string | (built-in formats) |
Built-in open source data parser (OpenCSVSerde)
| Parameter | Applicable to | Description | Values | Default |
|---|---|---|---|---|
| skip.header.line.count | Read | Number of rows to skip from the start of the file. | Non-negative integer | None |
| skip.footer.line.count | Read | Number of rows to skip from the end of the file. | Non-negative integer | None |
| mcfed.mapreduce.output.fileoutputformat.compress | Write | Enables writing TEXTFILE data to OSS in a compressed format. | True / False | False |
| mcfed.mapreduce.output.fileoutputformat.compress.codec | Write | Sets the compression codec for TEXTFILE output. Only the four listed values are supported. | com.hadoop.compression.lzo.LzoCodec / com.hadoop.compression.lzo.LzopCodec / org.apache.hadoop.io.compress.SnappyCodec / com.aliyun.odps.io.compress.SnappyRawCodec | None |
| io.compression.codecs | Read | Required when the OSS file is in Raw-Snappy format. Set to com.aliyun.odps.io.compress.SnappyRawCodec. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
| odps.text.option.bad.row.skipping | Read | Controls how dirty data rows are handled. See BadRowSkipping. | rigid / flexible | None |
Write data
Use INSERT OVERWRITE or INSERT INTO on the external table to write data to OSS. For the full write syntax, see Write syntax.
Query and analysis
For the SELECT syntax, see Query syntax.
For query plan optimization, see Query optimization.
BadRowSkipping
BadRowSkipping lets you skip rows that cause parse errors in CSV data. Enabling or disabling this feature does not affect how the underlying data format is parsed.
Configure bad row skipping
Control the behavior using a combination of a table-level parameter and a session-level parameter.
Table-level parameter: `odps.text.option.bad.row.skipping` (set in tblproperties)
- `rigid` — skipping is always on and cannot be disabled at the session level.
- `flexible` — skipping is on by default but can be overridden at the session level.
- Not set — session-level settings determine the behavior.
Session-level parameters
| Parameter | Value | Effect |
|---|---|---|
| odps.sql.unstructured.text.bad.row.skipping | on | Enables bad row skipping for the session. If the table-level parameter is not set, skipping is automatically enabled. |
| | off | Disables bad row skipping. If the table-level parameter is flexible, skipping is disabled. If it is rigid, this setting has no effect. |
| | `<null>` or invalid | Behavior is controlled entirely by the table-level parameter. |
| odps.sql.unstructured.text.bad.row.skipping.debug.num | Integer (max 1,000) | Prints details of problematic rows to stdout in Logview, up to the specified number. Set to 0 or a negative number to disable printing. |
Interaction between table-level and session-level parameters
| Table-level (tblproperties) | Session-level (SET) | Result |
|---|---|---|
| rigid | on | On — forcibly enabled |
| rigid | off | On — forcibly enabled (rigid takes precedence) |
| rigid | `<null>` or not set | On — forcibly enabled |
| flexible | on | On |
| flexible | off | Off — disabled by the session |
| flexible | `<null>` or not set | On |
| Not set | on | On — enabled by the session |
| Not set | off | Off |
| Not set | `<null>` or not set | Off (default) |
Debug bad rows
To inspect rows that were skipped, set odps.sql.unstructured.text.bad.row.skipping.debug.num before running your query:
```sql
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
SELECT * FROM <your_table>;
```

Details of up to the specified number of problematic rows are printed to stdout in Logview. The maximum value is 1,000.
Smart Parse
Smart Parse enables MaxCompute SQL to parse a wider range of input formats for CSV data when using data types V2.0. This applies to the built-in text data parser only.
Numeric types
For INT, SMALLINT, TINYINT, BIGINT, FLOAT, DOUBLE, and DECIMAL, flexible parsing is enabled by default. The following input formats are accepted in addition to standard number strings:
- Thousands separator: `"1,234,567"` — if using comma as the thousands separator, change the CSV column delimiter to a different character using `odps.text.option.delimiter`.
- Scientific notation: `"0.3e2"`, `"-1e5"`
- Hexadecimal: `"0xff"` (case-insensitive)
- Binary: `"0b1001"` (case-insensitive)
- Fraction (integer result only): `"4/2"`
- Percentage (integer result only): `"1000%"`
- Permille (integer result only): `"1000‰"`
- With currency symbol: `"1,000 $"`, `"$ 1,000"` — supported symbols: $ / ¥ / € / £ / ₩ / USD / CNY / EUR / GBP / JPY / KRW / IDR / RP
- K8s-style base-1000 units: K, M, G, P, T — for example, `"3M"` = 3,000,000
- K8s-style base-1024 units: Ki, Mi, Gi, Pi, Ti — for example, `"2Gi"` = 2,147,483,648
To restrict parsing to basic number strings only (such as "123" and "123.456"), set odps.text.option.smart.parse.level to naive in tblproperties. Any other format returns an error.
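For instance, strict parsing could be declared when the table is created. A sketch with hypothetical table and bucket names:

```sql
-- Hedged sketch: only plain number strings such as "123" or "123.456" will parse.
CREATE EXTERNAL TABLE IF NOT EXISTS my_strict_csv_ext
(
    amount DECIMAL(18,2)
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/strict-demo/'
TBLPROPERTIES ('odps.text.option.smart.parse.level'='naive');
```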
BOOLEAN
| Input | Output |
|---|---|
| "true" / "false" | "true" / "false" |
| "T" / "F" | "true" / "false" |
| "1" / "0" | "true" / "false" |
| "Yes" / "No" | "true" / "false" |
| "Y" / "N" | "true" / "false" |
| "" (empty string) | NULL |
Input is trimmed with trim() during parsing. If the value is not in the list above, parsing fails.
Date and time types
All four date/time types — DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ — use java.time.format.DateTimeFormatter internally.
Built-in parsing formats:
| Type | Accepted input formats |
|---|---|
| DATE | "yyyy-MM-dd", "yyyyMMdd", "MMM d,yyyy", "MMMM d,yyyy", "yyyy-M-d", "" → NULL |
| TIMESTAMP_NTZ | "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]", "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS]", "yyyyMMddHHmmss", "" → NULL |
| DATETIME | "yyyy-MM-dd HH:mm:ss[.SSS][x]", "yyyy-MM-ddTHH:mm:ss[.SSS][x]", "yyyyMMddHHmmss[x]", "" → NULL |
| TIMESTAMP | "yyyy-MM-dd HH:mm:ss[.SSSSSSSSS][x]", "yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS][x]", "yyyyMMddHHmmss[x]", "" → NULL |
Custom formats:
Set odps.text.option.<date|datetime|timestamp|timestamp_ntz>.io.format in tblproperties to define custom parsing and output formats:
- Separate multiple parsing formats with `#`. For example: `'pattern1#pattern2#pattern3'`.
- The first format is used for output.
- Custom formats take priority over built-in formats.
For format syntax, see DateTimeFormatter.
Time zone format:
Use 'x' (zone-offset) or 'VV' (time-zone ID) as the time zone pattern. Avoid 'z' (time-zone name) — it is ambiguous in some contexts. For example, 'CST' is parsed by DateTimeFormatter as US Central Standard Time (UTC-6), not China Standard Time (UTC+8).
General rules:
- An empty string in a CSV file is parsed as NULL for all data types.
- Complex types (STRUCT, ARRAY, MAP) are not supported because their values can conflict with common CSV separators.
- BINARY and INTERVAL types are not supported. Contact MaxCompute technical support to use these types.
- DATETIME and TIMESTAMP output is time-zone-sensitive. Use `odps.sql.timezone` to set the system time zone.
- DATE and TIMESTAMP_NTZ output is not affected by time zone changes.
Examples
Example 1: Non-partitioned table (built-in text data parser)
This example maps an external table to the Demo1/ folder in OSS.
Create the external table.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external1
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';

-- View the schema of the created table.
DESC EXTENDED mc_oss_csv_external1;
```

Replace `aliyunodpsdefaultrole` with the name of your role if you use a different role, and grant that role permissions to access OSS.

Query the table.

```sql
SELECT * FROM mc_oss_csv_external1;
```

Expected output:

```
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | W         |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | S         |
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | N         |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE        |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | N         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
```

Write a row and verify.

```sql
INSERT INTO mc_oss_csv_external1 VALUES (1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
SELECT * FROM mc_oss_csv_external1 WHERE recordId=12;
```

Expected output:

```
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 12       | 76        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW        |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
```

A new file is generated in the `Demo1/` path in OSS.
Example 2: Partitioned table (built-in text data parser)
This example maps a partitioned external table to the Demo2/ folder in OSS.
Create the external table.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external2
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING
)
PARTITIONED BY (
    direction STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';

-- View the schema of the created table.
DESC EXTENDED mc_oss_csv_external2;
```

Import partition data. For partitioned OSS external tables, run `MSCK REPAIR TABLE` to populate the partitions.

```sql
MSCK REPAIR TABLE mc_oss_csv_external2 ADD PARTITIONS;
-- Equivalent to:
ALTER TABLE mc_oss_csv_external2
    ADD PARTITION (direction = 'N')
        PARTITION (direction = 'NE')
        PARTITION (direction = 'S')
        PARTITION (direction = 'SW')
        PARTITION (direction = 'W');
```

For details, see Populate partitions for an OSS external table.

Query the partitioned table.

```sql
SELECT * FROM mc_oss_csv_external2 WHERE direction='NE';
```

Expected output:

```
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE        |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
```

Write a row and verify.

```sql
INSERT INTO mc_oss_csv_external2 PARTITION (direction='NE') VALUES (1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10');
SELECT * FROM mc_oss_csv_external2 WHERE direction='NE' AND recordId=12;
```

Expected output:

```
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 12       | 76        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:10 | NE        |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
```

A new file is generated in the `Demo2/direction=NE` path in OSS.
Example 3: GZIP-compressed CSV table (built-in text data parser)
This example creates a GZIP-compressed external table in CSV format and performs read and write operations.
Create an internal table and insert test data.
```sql
CREATE TABLE vehicle_test
(
    vehicleid INT,
    recordid INT,
    patientid INT,
    calls INT,
    locationlatitute DOUBLE,
    locationlongtitue DOUBLE,
    recordtime STRING,
    direction STRING
);
INSERT INTO vehicle_test VALUES (1,1,51,1,46.81006,-92.08174,'9/14/2014 0:00','S');
```

Create a GZIP-compressed external table mapped to the `Demo3/` folder in OSS.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external3
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING
)
PARTITIONED BY (dt STRING)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'odps.text.option.gzip.input.enabled'='true',
    'odps.text.option.gzip.output.enabled'='true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';

-- Import partition data.
MSCK REPAIR TABLE mc_oss_csv_external3 ADD PARTITIONS;
-- View the schema.
DESC EXTENDED mc_oss_csv_external3;
```

Read data from OSS. If the compressed data is in an open source data format, prepend `SET odps.sql.hive.compatible=true;` and submit it together with the SQL statement.

```sql
-- Enable a full table scan for this session.
SET odps.sql.allow.fullscan=true;
SELECT recordId, patientId, direction FROM mc_oss_csv_external3 WHERE patientId > 25;
```

Expected output:

```
+----------+-----------+-----------+
| recordid | patientid | direction |
+----------+-----------+-----------+
| 1        | 51        | S         |
| 3        | 48        | NE        |
| 4        | 30        | W         |
| 5        | 47        | S         |
| 7        | 53        | N         |
| 8        | 63        | SW        |
| 10       | 31        | N         |
+----------+-----------+-----------+
```

Write from the internal table to the external table.

```sql
INSERT INTO TABLE mc_oss_csv_external3 PARTITION (dt='20250418')
SELECT * FROM vehicle_test;
```

After the statement completes, the exported file appears in the OSS folder.
Example 4: Write a table header to OSS
This example creates an external table that writes the column names as the first row of each file block.
```sql
-- Create the external table.
CREATE EXTERNAL TABLE mf_oss_wtt
(
    id BIGINT,
    name STRING,
    tran_amt DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.text.option.header.lines.count' = '1',
    'odps.sql.text.option.flush.header' = 'true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo11/';

-- Insert data.
INSERT OVERWRITE TABLE mf_oss_wtt VALUES (1, 'val1', 1.1), (2, 'value2', 1.3);

-- Query data.
-- If you did not set all column types to STRING, set 'odps.text.option.header.lines.count' = '1'
-- to skip the header row when reading.
SELECT * FROM mf_oss_wtt;
```

Expected output:

```
+----------+--------+------------+
| id       | name   | tran_amt   |
+----------+--------+------------+
| 1        | val1   | 1.1        |
| 2        | value2 | 1.3        |
+----------+--------+------------+
```

Example 5: Handle column count mismatches
This example shows how odps.sql.text.schema.mismatch.mode affects rows where the file has a different number of columns than the schema.
The test.csv file contains:
```
1,kyle1,this is desc1
2,kyle2,this is desc2,this is two
3,kyle3,this is desc3,this is three, I have 4 columns
```

With `truncate` mode — extra columns are discarded; missing columns are filled with null:
```sql
DROP TABLE test_mismatch;

CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch
(
    id STRING,
    name STRING,
    dect STRING,
    col4 STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'truncate')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

SELECT * FROM test_mismatch;
```

Result — row 1 has fewer columns than the schema, so col4 is NULL; row 3 has more columns, so the extra column is discarded:
```
+----+-------+---------------+---------------+
| id | name  | dect          | col4          |
+----+-------+---------------+---------------+
| 1  | kyle1 | this is desc1 | NULL          |
| 2  | kyle2 | this is desc2 | this is two   |
| 3  | kyle3 | this is desc3 | this is three |
+----+-------+---------------+---------------+
```

With `ignore` mode — rows where column counts don't match are discarded entirely:
```sql
DROP TABLE test_mismatch01;

CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch01
(
    id STRING,
    name STRING,
    dect STRING,
    col4 STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'ignore')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

SELECT * FROM test_mismatch01;
```

Result — only the row where the column count matches the schema is returned:
```
+----+-------+---------------+-------------+
| id | name  | dect          | col4        |
+----+-------+---------------+-------------+
| 2  | kyle2 | this is desc2 | this is two |
+----+-------+---------------+-------------+
```

Example 6: Open source parser with header and footer skipping
This example uses the OpenCSVSerde parser to read a comma-separated file and skip the first and last rows.
Upload test.csv to the `demo-test` folder in the `oss-mc-test` bucket.

Create the external table.

```sql
CREATE EXTERNAL TABLE ext_csv_test08
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH serdeproperties (
    "separatorChar" = ","
)
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    "skip.header.line.count"="1",
    "skip.footer.line.count"="1"
);
```

Query the table.
```sql
SELECT * FROM ext_csv_test08;
-- The header and footer rows were ignored.
```

Expected output:

```
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | W         |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | S         |
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | N         |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE        |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/15/2014 0:00 | N         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
```
Example 7: Custom date and time formats
This example creates a CSV external table with custom parse and output formats for each time data type.
Create the table with custom date and time formats.
```sql
CREATE EXTERNAL TABLE test_csv
(
    col_date DATE,
    col_datetime DATETIME,
    col_timestamp TIMESTAMP,
    col_timestamp_ntz TIMESTAMP_NTZ
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/'
TBLPROPERTIES (
    'odps.text.option.date.io.format' = 'MM/dd/yyyy',
    'odps.text.option.datetime.io.format' = 'yyyy-MM-dd-HH-mm-ss x',
    'odps.text.option.timestamp.io.format' = 'yyyy-MM-dd HH-mm-ss VV',
    'odps.text.option.timestamp_ntz.io.format' = 'yyyy-MM-dd HH:mm:ss.SS'
);

INSERT OVERWRITE TABLE test_csv VALUES (
    DATE'2025-02-21',
    DATETIME'2025-02-21 08:30:00',
    TIMESTAMP'2025-02-21 12:30:00',
    TIMESTAMP_NTZ'2025-02-21 16:30:00.123456789'
);
```

The CSV file content after the insert:

```
02/21/2025,2025-02-21-08-30-00 +08,2025-02-21 12-30-00 Asia/Shanghai,2025-02-21 16:30:00.12
```

Read back to verify.

```sql
SELECT * FROM test_csv;
```

Expected output:

```
+------------+---------------------+---------------------+------------------------+
| col_date   | col_datetime        | col_timestamp       | col_timestamp_ntz      |
+------------+---------------------+---------------------+------------------------+
| 2025-02-21 | 2025-02-21 08:30:00 | 2025-02-21 12:30:00 | 2025-02-21 16:30:00.12 |
+------------+---------------------+---------------------+------------------------+
```
Example 8: BadRowSkipping
This example shows the three BadRowSkipping configurations using a CSV file with dirty data.
Data preparation: Upload csv_bad_row_skipping.csv to the oss-mc-test/badrow/ directory in OSS.
Create three tables, one for each configuration:
```sql
-- No table-level parameter: behavior is controlled by the session-level flag.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flag
(
    a INT,
    b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>';

-- Flexible: skipping is on by default but can be disabled at the session level.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_flexible
(
    a INT,
    b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
    'odps.text.option.bad.row.skipping' = 'flexible'
);

-- Rigid: skipping is always on and cannot be disabled at the session level.
CREATE EXTERNAL TABLE test_csv_bad_data_skipping_rigid
(
    a INT,
    b INT
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
    'odps.text.option.bad.row.skipping' = 'rigid'
);
```

Verify results — set session-level flags and run each query:
```sql
-- Enable bad row skipping at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;

-- Disable bad row skipping at the session level.
-- For 'flexible' tables, this disables skipping. For 'rigid' tables, this has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;

-- Print details of up to 10 problematic rows to stdout in Logview.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
```

The table without a table-level parameter and the flexible table both return an error when skipping is disabled (off) or not set:
```
FAILED: ODPS-0123131:User defined function exception
```

The rigid table always skips bad rows, returning only the valid rows regardless of session flags:
```
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 26         |
| 5          | 37         |
+------------+------------+
```

FAQ
A "column count mismatch" error occurs when reading CSV or TSV data
If the number of columns in a CSV or TSV file doesn't match the external table DDL, MaxCompute reports an error similar to:
```
FAILED: ODPS-0123131:User defined function exception - Traceback:
java.lang.RuntimeException: SCHEMA MISMATCH:xxx
```

Set odps.sql.text.schema.mismatch.mode at the session level to control the behavior:

- `SET odps.sql.text.schema.mismatch.mode=truncate;` — if the file has more columns than the DDL, extra data is discarded; if fewer, the missing columns are filled with NULL.
- `SET odps.sql.text.schema.mismatch.mode=ignore;` — discards every mismatched row entirely.
- `SET odps.sql.text.schema.mismatch.mode=error;` — reports an error when column counts don't match (default behavior).