
MaxCompute: External tables overview

Last Updated: Mar 11, 2026

External tables let MaxCompute query and analyze data in external storage systems — such as Object Storage Service (OSS), Tablestore, Hologres, and Hadoop Distributed File System (HDFS) — without importing it first. A Data Definition Language (DDL) statement defines the table name, schema, properties, permissions, location, and access protocol. MaxCompute SQL uses this metadata to connect to the external data source at query time.

How it works

MaxCompute supports two patterns for working with external data:

Pattern | Description | Trade-off
Import then compute | Import data into MaxCompute internal tables, then run queries. | Higher read/write efficiency.
Compute on external data | Run MaxCompute SQL directly against the external data source; data stays in the external system. | No data movement required.

Use cases

  • Query external data directly — Run SQL analytics on data that resides outside MaxCompute.

  • Centralized data governance — Register and manage external data sources under MaxCompute.

  • Data lake batch processing — Process structured and unstructured data stored in data lakes.

  • Cross-system data sharing — Share datasets across systems without duplication.

  • Archive from real-time to batch — Move data from a real-time data warehouse into a batch data warehouse model.

Supported external table types

OSS external tables

Format | Documentation
CSV/TSV | CSV/TSV external tables
Parquet | Parquet external tables
ORC | ORC external tables
Paimon | Paimon external tables
JSON | JSON external tables
Avro | Avro external tables
Hudi (Discontinued) | Hudi external tables
Delta Lake (Discontinued) | Delta Lake external tables
RCFile | RCFile external tables
SequenceFile | SequenceFile external tables

For general information, see OSS external tables.

Other data sources

Data source | Documentation
Tablestore | Tablestore external tables
Hologres | Hologres external tables
Paimon | Paimon external tables

Billing

External tables incur no MaxCompute storage costs, because the data remains in the external system.

Compute costs

Billing method | Details
Subscription or elastically reserved compute units (CUs) | Included in prepaid fees.
Pay-as-you-go (billable) | OSS and Tablestore: billed based on the amount of data scanned.
Pay-as-you-go (non-billable) | HDFS, Hologres, ApsaraDB RDS, ApsaraDB for HBase, and Lindorm: not currently billed, whether accessed through external tables or Data Lakehouse 2.0 external schemas. The amount of scanned data is not tracked, and no compute costs are generated.

Network costs

Using a public MaxCompute endpoint generates Internet traffic and download fees.

External data source costs

The external data source may generate its own access, compute, and storage costs. Check the billing documentation of each service.

For full pricing details, see Billable items and billing methods.

Limitations

Tunnel

The Tunnel feature and Tunnel SDK do not support operations on external tables. To load data, either upload it to MaxCompute internal tables through Tunnel, or upload it to OSS through the OSS Python SDK and then create an external table that maps to it.

Write consistency

External tables support writing to data sources, but write consistency depends on the external system:

Data source | Write behavior
Hologres | Accesses metadata through Java Database Connectivity (JDBC). No atomicity guarantee for write transactions. The MaxCompute SQL engine can only read, not write, the underlying Pangu data of Hologres. INSERT OVERWRITE is not supported. On job failure, partial data may be written.
HDFS | Writing through Hive Metastore (HMS) has a small chance of inconsistency.
OSS | A .odps metadata file reduces the chance of incomplete writes. Without the .odps file, a small chance of incomplete writes exists.

For details on writing data to OSS, see Write data to OSS.

INSERT OVERWRITE behavior

When you run INSERT OVERWRITE against an external data source, MaxCompute writes the new data first and then deletes the existing data during the DDL commit phase. Deleted data cannot be rolled back or recovered, so back up the data before running INSERT OVERWRITE. After the write completes, validate the data; if issues are found, repeat the full write operation.

Related resources