MaxCompute:ORC external tables

Last Updated: Mar 26, 2026

ORC external tables let you read and write ORC-formatted data stored in OSS directly from MaxCompute, without loading the data into MaxCompute storage.

Constraints

  • OSS external tables do not support the cluster property.

  • A single file cannot exceed 2 GB. Split larger files before use.

  • MaxCompute and OSS must be in the same region.

Create an external table

Use either the simplified syntax or the full syntax to create an ORC external table.

Simplified syntax (recommended)

Use this syntax when you want MaxCompute to handle authorization automatically using the default RAM role.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED AS orc
LOCATION '<oss_location>';
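For instance, a minimal non-partitioned table created with the simplified syntax could look like the following. The table name, columns, endpoint, and bucket are placeholders for illustration only:

```sql
-- Minimal sketch of the simplified syntax; replace <endpoint> and <bucket>
-- with your OSS endpoint and bucket name
CREATE EXTERNAL TABLE IF NOT EXISTS orc_simple
(
    id   BIGINT,
    name STRING
)
STORED AS orc
LOCATION 'oss://<endpoint>/<bucket>/orc_simple/';
```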

Full syntax

Use this syntax when you need to explicitly configure the SerDe, authorization, or table properties such as compression.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties(
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS orc
LOCATION '<oss_location>'
tblproperties (
    '<xxx>'='<yyy>'
);

For information about common parameters such as LOCATION and PARTITIONED BY, see Basic syntax parameters.

Schema mismatch behavior

If the ORC file schema and the external table schema do not match, MaxCompute handles the differences as follows:

  • Fewer columns in the file than in the table: Missing columns are filled with NULL.

  • More columns in the file than in the table: Extra columns are discarded.

  • Type mismatch: A STRING column can read INT data from ORC files, although this is not recommended. An INT column reading STRING data accepts numeric strings and converts non-numeric values to NULL.

WITH serdeproperties

| Property | When to use | Value | Default |
| --- | --- | --- | --- |
| mcfed.orc.schema.resolution | The ORC files in the same external table have inconsistent schemas. | name: maps columns by name. position: maps columns by position. | position |
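For example, if the ORC files under one location were written with varying column orders, you can map columns by name at table creation. The table name, columns, and OSS path below are illustrative:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS orc_by_name
(
    id   BIGINT,
    name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties (
    -- Match table columns to ORC fields by name rather than by position
    'mcfed.orc.schema.resolution'='name'
)
STORED AS orc
LOCATION 'oss://<endpoint>/<bucket>/orc_by_name/';
```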

tblproperties

| Property | When to use | Description | Value | Default |
| --- | --- | --- | --- | --- |
| mcfed.orc.compress | Write ORC data to OSS in a compressed format. | Compression algorithm for ORC output files. | SNAPPY or ZLIB | None |
| io.compression.codecs | OSS data files are in Raw-Snappy format. | Enables MaxCompute to read Raw-Snappy compressed data. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
| odps.external.data.output.prefix (also odps.external.data.prefix) | Add a custom prefix to output files. | Prefix string. Allowed characters: letters, digits, and underscores (a-z, A-Z, 0-9, _). Length: 1-10 characters. | Example: mc_ | None |
| odps.external.data.enable.extension | Control whether output files include a file extension. | True displays the extension; False hides it. | True or False | False |
| odps.external.data.output.suffix | Add a custom suffix to output files. | Suffix string. Allowed characters: letters, digits, and underscores. | Example: _hangzhou | None |
| odps.external.data.output.explicit.extension | Add a custom extension to output files. Takes priority over odps.external.data.enable.extension. | Extension string. Allowed characters: letters, digits, and underscores. Length: 1-10 characters. | Example: jsonl | None |
| mcfed.orc.batch.size | Tune memory usage and processing throughput. | Number of rows processed per batch (ORC batch size). | Non-negative integer | 1000 |
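Several of these properties can be combined in one table definition. The sketch below names output files with a custom prefix and a visible extension while writing ZLIB-compressed ORC data; the table name, columns, and OSS path are placeholders:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS orc_named_output
(
    id      BIGINT,
    payload STRING
)
STORED AS orc
LOCATION 'oss://<endpoint>/<bucket>/orc_named_output/'
tblproperties (
    'mcfed.orc.compress'='ZLIB',                -- compress output files with ZLIB
    'odps.external.data.output.prefix'='mc_',   -- prefix output file names
    'odps.external.data.enable.extension'='True' -- show the file extension
);
```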

Write data

For the write syntax, see Write syntax.

Query data

For the SELECT syntax and query plan optimization, see Query syntax and Query optimization.

Enable predicate pushdown

Predicate pushdown (PPD) improves query performance on ORC external tables by pushing filter conditions down to the data scan layer. PPD requires native mode (odps.ext.oss.orc.native=true).

Add the following statements before your SQL query:

-- Enable the ORC native reader
SET odps.ext.oss.orc.native=true;

-- Enable ORC predicate pushdown
SET odps.storage.orc.use.predicate.pushdown=true;
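Putting this together, a session that reads the partitioned example table from the next section with PPD enabled might look like this (the filter column and values are taken from that example):

```sql
SET odps.ext.oss.orc.native=true;
SET odps.storage.orc.use.predicate.pushdown=true;

-- With PPD enabled, the filter on recordId can be evaluated at the scan layer
SELECT vehicleId, recordId, direction
FROM   orc_data_type_snappy
WHERE  dt = '20250526' AND recordId < 5;
```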

Example

This example creates an ORC external table with SNAPPY compression, adds an existing partition, reads data, and writes a new row.

Prerequisites

Before you begin, make sure that you have:

  • A MaxCompute project and an OSS bucket (oss-mc-test in this example) in the same region.

  • Authorized MaxCompute to access OSS, for example through the default role AliyunODPSDefaultRole.

Step 1: Prepare the data file

Using the provided sample data, create the folder path orc_snappy/dt=20250526 in the oss-mc-test bucket. Upload the SNAPPY file to the dt=20250526 partition folder.

Step 2: Create the external table

CREATE EXTERNAL TABLE orc_data_type_snappy
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS ORC
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/orc_snappy/'
tblproperties (
    'mcfed.orc.compress'='SNAPPY'
);

Step 3: Add existing partitions

For partitioned external tables, run MSCK REPAIR TABLE to register existing OSS partitions with MaxCompute. For the full syntax, see Adding partitions to an OSS external table.

MSCK REPAIR TABLE orc_data_type_snappy ADD PARTITIONS;
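If you prefer to register partitions one at a time instead of scanning the whole OSS path, standard ALTER TABLE syntax also works. The sketch below adds the single partition used in this example:

```sql
-- One-at-a-time alternative to MSCK REPAIR TABLE
ALTER TABLE orc_data_type_snappy
ADD IF NOT EXISTS PARTITION (dt = '20250526');
```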

Step 4: Read data

SELECT * FROM orc_data_type_snappy WHERE dt = '20250526' LIMIT 10;

The query returns:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  | dt         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
| 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20250526   |
| 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:01 | NE         | 20250526   |
| 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:02 | NE         | 20250526   |
| 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:03 | W          | 20250526   |
| 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:04 | S          | 20250526   |
| 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:05 | S          | 20250526   |
| 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:06 | N          | 20250526   |
| 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:07 | SW         | 20250526   |
| 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:08 | NE         | 20250526   |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+

Step 5: Write data

INSERT INTO orc_data_type_snappy PARTITION (dt = '20250526')
  VALUES (1, 16, 76, 1, 46.81006, -92.08174, '9/14/2014 0:10', 'SW');

-- Verify the inserted row
SELECT * FROM orc_data_type_snappy WHERE dt = '20250526' AND recordid = 16;

The query returns:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  | dt         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1          | 16         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+

Supported data types

MaxCompute supports two modes for reading ORC external tables:

  • JNI mode (SET odps.ext.oss.orc.native=false;): Supports both read and write operations.

  • Native mode (SET odps.ext.oss.orc.native=true;): Supports read operations only. Enables predicate pushdown (PPD).

| Data type | JNI mode (read and write) | Native mode (read only) |
| --- | --- | --- |
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INT | Supported | Supported |
| BIGINT | Supported | Supported |
| BINARY | Supported | Supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL(precision,scale) | Supported | Supported |
| VARCHAR(n) | Supported | Supported |
| CHAR(n) | Supported | Supported |
| STRING | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Not supported | Supported |
| TIMESTAMP | Not supported | Not supported |
| TIMESTAMP_NTZ | Supported | Not supported |
| BOOLEAN | Supported | Supported |
| ARRAY | Supported | Supported |
| MAP | Supported | Supported |
| STRUCT | Supported | Supported |
| JSON | Not supported | Not supported |

For a complete list of MaxCompute data types, see Data type version 1.0 and Data type version 2.0.

Supported compression formats

To read or write compressed ORC files, add the mcfed.orc.compress property to the tblproperties section when creating the table. MaxCompute supports SNAPPY and ZLIB compression.

For Raw-Snappy files, also add io.compression.codecs with the value com.aliyun.odps.io.compress.SnappyRawCodec.
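For instance, a table that both writes SNAPPY-compressed ORC data and reads existing Raw-Snappy files could set both properties. The table name, columns, and OSS path below are placeholders:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS orc_raw_snappy
(
    id      BIGINT,
    payload STRING
)
STORED AS orc
LOCATION 'oss://<endpoint>/<bucket>/orc_raw_snappy/'
tblproperties (
    -- Compress ORC files written by MaxCompute with SNAPPY
    'mcfed.orc.compress'='SNAPPY',
    -- Allow MaxCompute to read existing Raw-Snappy data files
    'io.compression.codecs'='com.aliyun.odps.io.compress.SnappyRawCodec'
);
```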

Schema evolution

ORC external tables support two methods for mapping table columns to ORC file fields: position-based mapping and name-based mapping.

  • Position-based mapping (default): Set 'mcfed.orc.schema.resolution'='position' or omit the property. Columns are matched by their order, so the table column order must exactly match the field order in the ORC file.

  • Name-based mapping: Set 'mcfed.orc.schema.resolution'='name'. Columns are matched by name, independent of order.

The table below shows which schema change operations are compatible with each mapping method. "Compatible" means both newly written data and historical data can be read correctly after the operation.

| Schema change | Position-based | Name-based | Notes |
| --- | --- | --- | --- |
| Add column | Supported | Supported | New columns are added at the end. Default values apply only to data written from MaxCompute. Historical data without the new column reads as NULL. |
| Delete column | Not supported | Supported | With position-based mapping, deleting a column shifts positions and causes read errors on historical data. |
| Modify column order | Not supported | Supported | With position-based mapping, reordering columns causes historical data to be read with the wrong schema. |
| Change column data type | Supported | Supported | For allowed type conversions, see Change column data type. |
| Rename column | Supported | Not supported | With position-based mapping, renaming is compatible and historical data reads correctly. With name-based mapping, the renamed column no longer matches historical data and reads as empty. |
| Modify column comment | Supported | Supported | Comment must be a valid string no longer than 1,024 bytes. |
| Modify column nullability | Not supported | Not supported | Columns are nullable by default. |
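As an illustration, adding a column, which the table above marks as compatible under both mapping methods, uses standard DDL. The column name here is hypothetical:

```sql
-- Append a new column to the end of the schema; historical ORC files
-- that lack this column read as NULL
ALTER TABLE orc_data_type_snappy ADD COLUMNS (speed DOUBLE);
```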