External tables let MaxCompute query and analyze data in external storage systems — such as Object Storage Service (OSS), Tablestore, Hologres, and Hadoop Distributed File System (HDFS) — without importing it first. A Data Definition Language (DDL) statement defines the table name, schema, properties, permissions, location, and access protocol. MaxCompute SQL uses this metadata to connect to the external data source at query time.
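For example, a minimal sketch of such a DDL statement for CSV files stored in OSS might look like the following. The table name, columns, endpoint, bucket, and directory are hypothetical, and the exact storage handler and properties depend on the file format and authorization mode; see the format-specific topics listed below.

```sql
-- Minimal sketch (hypothetical names): map a directory of CSV files in OSS
-- to an external table. No data is imported; queries read the files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS vehicle_records_ext (
    vehicle_id   BIGINT,
    record_time  STRING,
    speed        DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'   -- built-in CSV storage handler
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/vehicle-records/';
```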
How it works
MaxCompute supports two patterns for working with external data:
| Pattern | Description | Trade-off |
|---|---|---|
| Import then compute | Import data into MaxCompute internal tables, then run queries. | Higher read/write efficiency, but the data must be copied into and stored in MaxCompute. |
| Compute on external data | Run MaxCompute SQL directly against the external data source. Data stays in the external system. | No data movement required, but read/write efficiency is lower than on internal tables. |
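As a rough sketch of the difference, assuming the external table vehicle_records_ext from the example above and a hypothetical internal table vehicle_records:

```sql
-- Pattern 1: import then compute. Copy the external data into an internal
-- table once, then run queries against MaxCompute storage.
CREATE TABLE IF NOT EXISTS vehicle_records AS
SELECT * FROM vehicle_records_ext;

SELECT vehicle_id, AVG(speed) AS avg_speed
FROM   vehicle_records
GROUP  BY vehicle_id;

-- Pattern 2: compute on external data. Query the external table directly;
-- the data stays in OSS and nothing is copied.
SELECT vehicle_id, AVG(speed) AS avg_speed
FROM   vehicle_records_ext
GROUP  BY vehicle_id;
```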
Use cases
Query external data directly — Run SQL analytics on data that resides outside MaxCompute.
Centralized data governance — Register and manage external data sources under MaxCompute.
Data lake batch processing — Process structured and unstructured data stored in data lakes.
Cross-system data sharing — Share datasets across systems without duplication.
Archive from real-time to batch — Move data from a real-time data warehouse into a batch data warehouse model.
Supported external table types
OSS external tables
| Format | Documentation |
|---|---|
| CSV/TSV | CSV/TSV external tables |
| Parquet | Parquet external tables |
| ORC | ORC external tables |
| Paimon | Paimon external tables |
| JSON | JSON external tables |
| Avro | Avro external tables |
| Hudi (Discontinued) | Hudi external tables |
| Delta Lake (Discontinued) | Delta Lake external tables |
| RCFile | RCFile external tables |
| SequenceFile | SequenceFile external tables |
For general information, see OSS external tables.
Other data sources
| Data source | Documentation |
|---|---|
| Tablestore | Tablestore external tables |
| Hologres | Hologres external tables |
| Paimon | Paimon external tables |
Billing
External tables incur no MaxCompute storage costs because the data remains in the external system.
Compute costs
| Billing method | Details |
|---|---|
| Subscription or elastically reserved compute units (CUs) | Included in prepaid fees. |
| Pay-as-you-go (billable) | OSS and Tablestore — billed based on data scanned. |
| Pay-as-you-go (non-billable) | HDFS, Hologres, ApsaraDB RDS, ApsaraDB for HBase, and Lindorm — not currently billed, whether accessed through external tables or through Data Lakehouse 2.0 external schemas. The amount of scanned data is not tracked, so no compute costs are generated. |
Network costs
Using a public MaxCompute endpoint generates Internet traffic and download fees.
External data source costs
The external data source may generate its own access, compute, and storage costs. Check the billing documentation of each service.
For full pricing details, see Billable items and billing methods.
Limitations
Tunnel
The Tunnel feature and the Tunnel SDK do not support operations on external tables. To load data, either upload it to MaxCompute internal tables through Tunnel, or upload it to OSS through the OSS Python SDK and then create an external table that maps to the OSS files.
Write consistency
External tables support writing to data sources, but write consistency depends on the external system:
| Data source | Write behavior |
|---|---|
| Hologres | Accesses metadata through Java Database Connectivity (JDBC). No atomicity guarantee for write transactions. The MaxCompute SQL engine can only read, not write, the underlying Pangu data of Hologres. INSERT OVERWRITE is not supported. On job failure, partial data may be written. |
| HDFS | Writing through Hive Metastore (HMS) has a small chance of inconsistency. |
| OSS | Writing with a .odps metadata file reduces the chance of incomplete writes; without the .odps file, a small chance of incomplete writes remains. |
For details on writing data to OSS, see Write data to OSS.
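As a sketch, a write to the hypothetical OSS external table from the examples above could look like this; MaxCompute writes new files into the mapped OSS directory:

```sql
-- Sketch (hypothetical tables): append query results to the OSS directory
-- mapped by the external table. Write completeness depends on the .odps
-- metadata file behavior described in the table above.
INSERT INTO TABLE vehicle_records_ext
SELECT vehicle_id, record_time, speed
FROM   vehicle_records
WHERE  record_time >= '2024-01-01';
```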
INSERT OVERWRITE behavior
When running INSERT OVERWRITE against an external data source, MaxCompute writes new data first, then deletes the existing data during the DDL commit phase. Deleted data cannot be rolled back or recovered. Back up data before running INSERT OVERWRITE. After the write operation, validate the data. If issues are found, repeat the full write operation.
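A cautious workflow, sketched with the hypothetical tables used above, is to back up first, then overwrite, then validate:

```sql
-- 1. Back up the current external data into an internal table.
CREATE TABLE IF NOT EXISTS vehicle_records_backup AS
SELECT * FROM vehicle_records_ext;

-- 2. Overwrite the external data. New data is written first; the existing
--    data is deleted during the DDL commit phase and cannot be recovered.
INSERT OVERWRITE TABLE vehicle_records_ext
SELECT vehicle_id, record_time, speed
FROM   vehicle_records_staging;              -- hypothetical source table

-- 3. Validate the result; if issues are found, repeat the full write
--    from the backup.
SELECT COUNT(*) FROM vehicle_records_ext;
```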
Related resources
STS mode authorization — Grant MaxCompute access to OSS through a custom RAM role.
External tables in DataWorks — Create and manage external tables through the DataWorks console.
Access unstructured data in Tablestore — Read unstructured data from Tablestore.