SequenceFile external tables let you query and write data stored in Hadoop SequenceFile format on Object Storage Service (OSS) directly from MaxCompute, without loading data into MaxCompute first. This is useful when you have existing Hadoop or Hive workloads and want to access them from MaxCompute, or when you need to share data between systems.
Limitations
| Limitation | Details |
|---|---|
cluster property |
Not supported. Do not include the cluster property when creating a SequenceFile external table. |
| Single file size | Cannot exceed 2 GB. Split files larger than 2 GB before referencing them in an external table. |
| Region | MaxCompute and OSS must be in the same region. |
Supported data types
The following MaxCompute data types are supported for SequenceFile external tables. For the full type specification, see Data types (Version 1.0) and Data types (Version 2.0).
| Data type | Supported | Data type | Supported |
|---|---|---|---|
| TINYINT | Yes | STRING | Yes |
| SMALLINT | Yes | DATE | Yes |
| INT | Yes | DATETIME | Yes |
| BIGINT | Yes | TIMESTAMP | Yes |
| BINARY | Yes | TIMESTAMP_NTZ | Yes |
| FLOAT | Yes | BOOLEAN | Yes |
| DOUBLE | Yes | ARRAY | Yes |
| DECIMAL(precision,scale) | Yes | MAP | Yes |
| VARCHAR(n) | Yes | STRUCT | Yes |
| CHAR(n) | Yes | JSON | Yes |
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
STORED AS sequencefile
LOCATION '<oss_location>';
Parameters
| Parameter | Required | Description |
|---|---|---|
mc_oss_extable_name |
Yes | The name of the external table in MaxCompute. |
col_name data_type |
Yes | Column definitions. All supported data types are listed in the table above. |
COMMENT table_comment |
No | An optional description for the table. |
PARTITIONED BY |
No | Defines partition columns. If the OSS data is organized in Hive-style partitions (for example, dt=20250521/), add a matching partition column here. |
STORED AS sequencefile |
Yes | Specifies the SequenceFile storage format. |
LOCATION |
Yes | The OSS path where the SequenceFile data is stored, in the format oss://<endpoint>/<path>/. |
For a full parameter reference, see Basic syntax parameters.
Write data
For the syntax to write data from MaxCompute to OSS, see Write data to OSS.
Query and analyze data
-
For SELECT syntax, see Read data from OSS.
-
For query performance tips, see Query optimization.
Example: create, query, and write a SequenceFile external table
This example walks through creating a partitioned SequenceFile external table, loading a test dataset from OSS, querying the data, and writing a new row.
Prerequisites
Before you begin, make sure you have:
-
A MaxCompute project. See Create a project.
-
An OSS bucket and folder in the same region as your MaxCompute project. See Create a bucket and Manage folders.
-
OSS access permissions through an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role. See Authorize access in STS mode for OSS.
-
The
CreateTablepermission in your MaxCompute project. See MaxCompute permissions.
MaxCompute supports automatic folder creation in OSS. If an SQL statement involves external tables and user-defined functions (UDFs), you can use a single statement to read from and write to the tables and use the UDFs. You can also create the folder manually in the OSS console before running the statements below.
Step 1: Upload test data
Log in to the OSS console and upload the test file 20250611TableSink to the external-table-test/sequence/dt=20250521/ folder in your OSS bucket. For upload instructions, see Upload files to OSS.
Step 2: Create the external table
Create a partitioned SequenceFile external table that points to the OSS path where the test data is stored.
CREATE EXTERNAL TABLE mc_oss_extable_name_sequencefile1_pt
(
vehicleId STRING,
recordId STRING,
patientId STRING,
calls STRING,
locationLatitute STRING,
locationLongtitue STRING,
recordTime STRING,
direction STRING
)
PARTITIONED BY (dt STRING)
STORED AS sequencefile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/sequence/';
Step 3: Import partition data
Because the table is partitioned, run MSCK REPAIR TABLE to detect and register the partitions that already exist in OSS.
MSCK REPAIR TABLE mc_oss_extable_name_sequencefile1_pt ADD PARTITIONS;
Step 4: Query the data
SELECT * FROM mc_oss_extable_name_sequencefile1_pt WHERE dt=20250521;
The query returns 11 rows from the test dataset:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | dt |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| 1 | 12 | 76 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:10 | SW | 20250521 |
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20250521 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:01 | NE | 20250521 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:02 | NE | 20250521 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:03 | W | 20250521 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:04 | S | 20250521 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:05 | S | 20250521 |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:06 | N | 20250521 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:07 | SW | 20250521 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:08 | NE | 20250521 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:09 | N | 20250521 |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
Step 5: Write data
Insert a new row into the partition.
INSERT INTO mc_oss_extable_name_sequencefile1_pt PARTITION (dt='20250521')
VALUES
('1','16','76','1','46.81006','-92.08174','9/14/2014 0:10','SW');
Step 6: Verify the write
Query the row you just inserted to confirm it was written correctly.
SELECT * FROM mc_oss_extable_name_sequencefile1_pt WHERE dt=20250521 AND recordId='16';
Expected output:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | dt |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| 1 | 16 | 76 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:10 | SW | 20250521 |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
What's next
-
To learn how external tables work with user-defined functions (UDFs), see Basic syntax parameters.
-
To optimize query performance on large SequenceFile datasets, see Query optimization.