This topic describes how to create, read, and write ORC-formatted OSS external tables.
Prerequisites
The Alibaba Cloud account, RAM user, or RAM role that accesses OSS external tables is granted permissions to access OSS. For more information about authorization, see STS authorization.
A MaxCompute project is created.
MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
The Alibaba Cloud account or RAM user is granted the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.
(Optional) An OSS bucket, OSS directories, and OSS data files are prepared. For more information, see Create a bucket, Manage directories, and Simple upload.
MaxCompute can automatically create the OSS directory. This allows a single SQL statement that involves external tables and UDFs to read data from and write data to OSS. You can also manually create the OSS directory.
Limits
OSS external tables do not support clustering.
A single file cannot exceed 3 GB. If one does, we recommend that you split it into multiple files.
Supported data types
In the following table, a check mark (✓) indicates that the data type is supported, and a cross (✗) indicates that it is not supported.
For more information about MaxCompute data types, see MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition.
MaxCompute reads ORC external tables in one of two modes:
JNI mode (the Native ORC Reader is not used when reading tables): set odps.ext.oss.orc.native=false;. This mode supports both reading and writing.
Native mode (the Native ORC Reader is used when reading tables): set odps.ext.oss.orc.native=true;. This mode supports only reading.
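For example, a minimal sketch of toggling the reader mode for the current session; my_orc_extable is a placeholder name, not a table defined in this topic:
-- Read through the Native ORC Reader (native mode is read-only)
SET odps.ext.oss.orc.native=true;
SELECT * FROM my_orc_extable LIMIT 10;
-- Switch back to JNI mode before writing in the same session
SET odps.ext.oss.orc.native=false;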
Data type | JNI mode support | Native mode support |
--- | --- | --- |
TINYINT | | |
SMALLINT | | |
INT | | |
BIGINT | | |
BINARY | | |
FLOAT | | |
DOUBLE | | |
DECIMAL(precision,scale) | | |
VARCHAR(n) | | |
CHAR(n) | | |
STRING | | |
DATE | | |
DATETIME | | |
TIMESTAMP | | |
TIMESTAMP_NTZ | | |
BOOLEAN | | |
ARRAY | | |
MAP | | |
STRUCT | | |
JSON | | |
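As an illustration of the listed types, the following sketch creates an ORC external table that mixes primitive and complex types; the table name, columns, and OSS path are hypothetical:
CREATE EXTERNAL TABLE mc_oss_orc_types
(
    id BIGINT,
    score DOUBLE,
    tags ARRAY<STRING>,                    -- complex type: array
    attrs MAP<STRING, STRING>,             -- complex type: map
    addr STRUCT<city:STRING, zip:STRING>   -- complex type: struct
)
STORED AS orc
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/your-bucket/orc_types/';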
Compression formats
To read or write compressed OSS files, add the corresponding property to the table creation statement. For more information, see With serdeproperties parameters.
Supported data file formats for reading and writing: ORC files compressed using SNAPPY or ZLIB.
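For instance, a minimal sketch of a table whose writes produce ZLIB-compressed ORC files; the table name and OSS path are placeholders:
CREATE EXTERNAL TABLE mc_oss_orc_zlib
(
    id BIGINT,
    name STRING
)
STORED AS orc
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/your-bucket/orc_zlib/'
-- Write ORC files with the ZLIB compression algorithm
tblproperties ('mcfed.orc.compress'='ZLIB');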
Create an external table
Syntax
For more information about the syntax for creating external tables of different formats, see OSS external tables.
Simplified syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED AS orc
LOCATION '<oss_location>';
Complete syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS orc
LOCATION '<oss_location>'
tblproperties ('<xxx>'='<yyy>');
Common parameters
For more information about common parameters, see Basic syntax parameters.
Specific parameters
With serdeproperties attribute parameters
property_name | Scenario | Description | property_value | Default value |
--- | --- | --- | --- | --- |
mcfed.orc.schema.resolution | Add this property when the schemas of data in the same OSS external table are different. | Sets how ORC files are parsed. | name | By default, parsing is based on column numbers, which is equivalent to 'mcfed.orc.schema.resolution'='position'. |
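For example, a sketch of a table definition that resolves ORC columns by name instead of by position; the table name, columns, and OSS path are placeholders:
CREATE EXTERNAL TABLE mc_oss_orc_by_name
(
    id BIGINT,
    name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    -- Parse ORC files by column name rather than by column position
    'mcfed.orc.schema.resolution'='name'
)
STORED AS orc
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/your-bucket/orc_by_name/';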
Tblproperties attribute parameters
property_name | Scenario | Description | property_value | Default value |
--- | --- | --- | --- | --- |
mcfed.orc.compress | Add this property to write data in an ORC data file to OSS using a compression algorithm. | ORC compression property. Specifies the compression algorithm of ORC data files. | SNAPPY, ZLIB | None |
io.compression.codecs | Add this property when the OSS data file is in the raw Snappy format. | When this property is set, MaxCompute can correctly read data from raw Snappy-compressed files. Otherwise, the read operation fails. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
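As a sketch of attaching this codec property, assuming raw Snappy data files already sit under the hypothetical OSS path below:
CREATE EXTERNAL TABLE mc_oss_raw_snappy
(
    id BIGINT,
    name STRING
)
STORED AS orc
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/your-bucket/raw_snappy/'
-- Allow MaxCompute to read files compressed in the raw Snappy format
tblproperties ('io.compression.codecs'='com.aliyun.odps.io.compress.SnappyRawCodec');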
Write data
For more information about the syntax for writing data, see Write syntax.
Query and analyze
For more information about the SELECT syntax, see Syntax.
For more information about how to optimize query plans, see Query optimization.
Scenarios
Create an external table in ORC format with SNAPPY compression, and perform data reading and writing operations.
When you execute the following sample code, replace <uid> with your Alibaba Cloud account ID. The role used in the following examples is aliyunodpsdefaultrole. To use a different role, replace aliyunodpsdefaultrole with the name of the target role and grant the target role permissions to access OSS.
Prepare a SNAPPY-format data file.
Create the orc_snappy/dt=20250526 directory hierarchy in the oss-mc-test bucket described in Sample data, and store the Snappy file in the partition directory dt=20250526.
Create an ORC external table with the SNAPPY compression format.
CREATE EXTERNAL TABLE orc_data_type_snappy
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS ORC
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/orc_snappy/'
tblproperties ('mcfed.orc.compress'='SNAPPY');

-- Import partition data
MSCK REPAIR TABLE orc_data_type_snappy ADD PARTITIONS;
Read data from the ORC external table.
SELECT * FROM orc_data_type_snappy WHERE dt='20250526' LIMIT 10;
The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  | dt         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
| 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20250526   |
| 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:01 | NE         | 20250526   |
| 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:02 | NE         | 20250526   |
| 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:03 | W          | 20250526   |
| 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:04 | S          | 20250526   |
| 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:05 | S          | 20250526   |
| 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:06 | N          | 20250526   |
| 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:07 | SW         | 20250526   |
| 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:08 | NE         | 20250526   |
| 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:09 | N          | 20250526   |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
Write data to the ORC external table and query.
INSERT INTO orc_data_type_snappy PARTITION (dt='20250526')
VALUES (1,16,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');

-- Query the newly written data
SELECT * FROM orc_data_type_snappy WHERE dt='20250526' AND recordid=16;
The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  | dt         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1          | 16         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+