When you need to query MaxCompute data directly from AnalyticDB for MySQL, or load it into internal tables for high-performance analytics, external tables let you do both without a separate data movement step. This topic describes how to create MaxCompute external tables and use them to import data into AnalyticDB for MySQL.
When to use external tables
External tables are read-only virtual mappings to MaxCompute source tables. Use them when:
-
You want to run ad-hoc queries on MaxCompute data without copying it first.
-
You want to use
INSERT ... SELECTto copy MaxCompute data into AnalyticDB for MySQL internal tables.
For recurring, large-scale imports with strict isolation requirements, consider the elastic import method described in this topic (requires Milvus V3.1.10.0 or later and a job resource group).
External tables are not a substitute for other import methods such as Data Transmission Service (DTS) or OSS-based batch load when those methods better fit your pipeline.
Access methods by edition
| Edition | Access method | Kernel version | Throughput |
|---|---|---|---|
| Enterprise Edition, Basic Edition, Data Lakehouse Edition | Tunnel Record API | No limit | Suitable for small-scale imports. Slower throughput. |
| Enterprise Edition, Basic Edition, Data Lakehouse Edition | Tunnel Arrow API | V3.2.2.3 or later | Reads data in columns, reducing import time. Faster throughput. |
| Data Warehouse Edition | Tunnel Record API | No limit | Uses a shared public Data Transmission Service resource group. Slower throughput. |
Prerequisites
Before you begin, make sure you have:
-
A MaxCompute project and an AnalyticDB for MySQL cluster in the same region. See Create a cluster.
-
The Virtual Private Cloud (VPC) CIDR block of the AnalyticDB for MySQL cluster added to the whitelist of the MaxCompute project. See Manage IP whitelists. To find the CIDR block: log on to the AnalyticDB for MySQL console and note the VPC ID on the Cluster Information page. Then log on to the VPC console and look up the CIDR block for that VPC.
-
For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters:
-
Elastic Network Interface (ENI) access enabled. In the console, go to Cluster Management > Cluster Information > Network Information and turn on the ENI access switch.
ImportantEnabling or disabling ENI access interrupts database connections for approximately 2 minutes. No reads or writes are possible during this window. Evaluate the impact before making this change.
-
(Tunnel Arrow API only) Cluster kernel version V3.2.2.3 or later. To view and update the minor version, go to the Configuration Information section of the Cluster Information page.
-
-
The AccessKey ID and AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user with read permissions on the MaxCompute project.
Sample data
The following examples use the MaxCompute project odps_project and table odps_nopart_import_test:
-- Create the source table in MaxCompute
CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
id int,
name string,
age int)
PARTITIONED BY (dt string);
-- Add a partition
ALTER TABLE odps_nopart_import_test
ADD PARTITION (dt='202207');
-- Insert sample data
INSERT INTO odps_project.odps_nopart_import_test
PARTITION (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
Import data using the Tunnel Record API
The Tunnel Record API supports two import methods: regular import (default) and elastic import.
| Method | How it works | Use when |
|---|---|---|
| Regular import | Reads source data on compute nodes and builds indexes on storage nodes. Consumes both compute and storage resources. | General use; no additional resource group required. |
| Elastic import | Reads source data and builds indexes inside a Serverless Spark job running in a job resource group. | You want to minimize impact on real-time reads and writes and improve resource isolation. Requires Milvus V3.1.10.0 or later and a configured job resource group. |
Regular import
-
Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.
-
Create an external database.
CREATE EXTERNAL DATABASE adb_external_db; -
Create an external table that maps to the MaxCompute source table.
Note-
The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and compatible data types. For all
TABLE_PROPERTIESparameters, see CREATE EXTERNAL TABLE.
Parameter Description ENGINE='ODPS'Specifies MaxCompute as the storage engine. endpointThe VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region. accessidThe AccessKey ID with read permissions on the MaxCompute project. accesskeyThe AccessKey secret corresponding to the AccessKey ID. partition_columnThe partition column name. Omit this parameter if the MaxCompute source table is not partitioned. project_nameThe name of the MaxCompute workspace. table_nameThe name of the source table in MaxCompute. CREATE EXTERNAL TABLE IF NOT EXISTS adb_external_db.test_adb ( id int, name varchar(1023), age int, dt string ) ENGINE='ODPS' TABLE_PROPERTIES='{ "accessid":"yourAccessKeyID", "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun.com/api", "accesskey":"yourAccessKeySecret", "partition_column":"dt", "project_name":"odps_project", "table_name":"odps_nopart_import_test" }';Key parameters:
-
-
Verify the external table by querying the data.
SELECT * FROM adb_external_db.test_adb;Expected output:
+------+-------+------+---------+ | id | name | age | dt | +------+-------+------+---------+ | 1 | james | 10 | 202207 | | 2 | bond | 20 | 202207 | | 3 | jack | 30 | 202207 | | 4 | lucy | 40 | 202207 | +------+-------+------+---------+ -
Create a destination database and table in AnalyticDB for MySQL.
NoteThe destination table and the external table must have the same field order and number of fields, with compatible data types.
CREATE DATABASE adb_demo;CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test( id int, name string, age int, dt string, PRIMARY KEY(id,dt) ) DISTRIBUTED BY HASH(id) PARTITION BY VALUE('dt'); -
Import the data. Choose one of the following methods:
-
Method 1 — `INSERT INTO`: Imports data and ignores rows with duplicate primary keys (equivalent to
INSERT IGNORE INTO). See INSERT INTO.INSERT INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.test_adb;To import from a specific partition only:
INSERT INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.test_adb WHERE dt = '202207'; -
Method 2 — `INSERT OVERWRITE INTO`: Overwrites all existing data in the destination table.
INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.test_adb; -
Method 3 — Asynchronous `INSERT OVERWRITE INTO`: Submits the import as a background job. Add the
/*+ direct_batch_load=true*/hint to accelerate the job. See Asynchronous writes.SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.test_adb;The command returns a job ID:
+---------------------------------------+ | job_id | +---------------------------------------+ | 2020112122202917203100908203303****** | +---------------------------------------+Use the job ID to monitor or cancel the job. See Submit an asynchronous import task.
-
Elastic import
Elastic import runs the import inside a Serverless Spark job, reducing impact on real-time workloads. It requires Milvus V3.1.10.0 or later and a configured job resource group.
Elastic import only supports INSERT OVERWRITE INTO. You cannot use INSERT INTO with elastic import.
-
Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.
-
Create a database (skip if one already exists).
CREATE DATABASE adb_demo; -
Create an external table.
Note-
The external table name must match the MaxCompute project name. Otherwise, the external table creation fails.
-
Elastic import requires creating external tables with
CREATE TABLE(notCREATE EXTERNAL TABLE). -
The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and same data types.
CREATE TABLE IF NOT EXISTS test_adb ( id int, name string, age int, dt string ) ENGINE='ODPS' TABLE_PROPERTIES='{ "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun-inc.com/api", "accessid":"yourAccessKeyID", "accesskey":"yourAccessKeySecret", "partition_column":"dt", "project_name":"odps_project", "table_name":"odps_nopart_import_test" }';For
TABLE_PROPERTIESparameter descriptions, see Parameter descriptions. -
-
Verify the external table by querying the data.
SELECT * FROM adb_demo.test_adb; -
Create the destination table in AnalyticDB for MySQL.
NoteThe destination table and the external table must have the same field names, number of fields, field order, and data types.
CREATE TABLE IF NOT EXISTS adb_import_test ( id int, name string, age int, dt string, PRIMARY KEY(id,dt) ) DISTRIBUTED BY HASH(id) PARTITION BY VALUE('dt') LIFECYCLE 30; -
Import the data using one of the following methods. Both methods use the
elastic_loadandelastic_load_configshint parameters to enable and configure the elastic import job. Setelastic_loadtotrueto enable elastic import (default:false). Useelastic_load_configsto specify configuration parameters enclosed in[ ]; separate multiple parameters with|.elastic_load_configsaccepts the following parameters:-
Method 1 — Synchronous elastic import:
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/ INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb; -
Method 2 — Asynchronous elastic import:
ImportantPriority queues are not supported for asynchronous elastic import tasks.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/ SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.
Parameter Required Description adb.load.resource.group.nameYes The name of the job resource group that runs the elastic import job. adb.load.job.max.acuNo Maximum resources for the elastic import job. Unit: AnalyticDB Compute Units (ACUs). Minimum: 5 ACUs. Default: number of shards plus 1. To query the number of shards: SELECT count(1) FROM information_schema.kepler_meta_shards;spark.driver.resourceSpecNo Resource type of the Spark driver. Default: small. See the Type column in the Spark resource specifications table.spark.executor.resourceSpecNo Resource type of the Spark executor. Default: large. See the Type column in the Spark resource specifications table.spark.adb.executorDiskSizeNo Disk capacity of the Spark executor. Valid values: (0, 100]. Unit: GiB. Default: 10 GiB. See Specify driver and executor resources. -
-
(Optional) Verify that the job ran as an elastic import task.
SELECT job_name, (job_type = 3) AS is_elastic_load FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_name = "2023081818010602101701907303151******";Expected output:
+---------------------------------------+------------------+ | job_name | is_elastic_load | +---------------------------------------+------------------+ | 2023081517195203101701907203151****** | 1 | +---------------------------------------+------------------+is_elastic_load = 1indicates an elastic import task.is_elastic_load = 0indicates a regular import task.
Import data using the Tunnel Arrow API
The Tunnel Arrow API reads MaxCompute data in columns, which reduces data transfer volume and speeds up imports. It requires cluster kernel version V3.2.2.3 or later.
Step 1: Enable the Arrow API
Enable the Arrow API at the cluster level using SET ADB_CONFIG, or at the query level using a hint.
-
Cluster level (persists across queries):
SET ADB_CONFIG <config_name>= <value>; -
Query level (applies to a single query):
/*<config_name>= <value>*/ SELECT * FROM table;
Arrow API configuration parameters:
| Parameter | Description |
|---|---|
ODPS_TUNNEL_ARROW_ENABLED |
Enable the Arrow API. Valid values: true (enabled), false (disabled, default). |
ODPS_TUNNEL_SPLIT_BY_SIZE_ENABLED |
Enable dynamic split. Valid values: true (enabled), false (disabled, default). |
Step 2: Import MaxCompute data
Once the Arrow API is enabled, the import steps are the same as for regular import. Follow steps 1–6 in Regular import.
The cluster automatically uses the Tunnel Arrow API for all subsequent MaxCompute data access and import operations.
Import data (Data Warehouse Edition)
Data Warehouse Edition uses the Tunnel Record API with a shared public Data Transmission Service resource group.
-
Create the destination database.
CREATE DATABASE test_adb; -
Create a MaxCompute external table. This example uses
odps_nopart_import_test_external_table.Parameter Description ENGINE='ODPS'Specifies MaxCompute as the storage engine. endpointThe VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region. accessidThe AccessKey ID of an Alibaba Cloud account or a RAM user with permissions to access MaxCompute. See Accounts and permissions. accesskeyThe AccessKey secret corresponding to the AccessKey ID. See Accounts and permissions. partition_columnThe partition column name. Omit this parameter if the MaxCompute source table is not partitioned. project_nameThe name of the MaxCompute workspace. table_nameThe name of the source table in MaxCompute. CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table ( id int, name string, age int, dt string ) ENGINE='ODPS' TABLE_PROPERTIES='{ "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api", "accessid":"yourAccessKeyID", "accesskey":"yourAccessKeySecret", "partition_column":"dt", "project_name":"odps_project1", "table_name":"odps_nopart_import_test" }'; -
Create the destination table in AnalyticDB for MySQL.
CREATE TABLE IF NOT EXISTS adb_nopart_import_test ( id int, name string, age int, dt string, PRIMARY KEY(id,dt) ) DISTRIBUTED BY HASH(id) PARTITION BY VALUE('dt') LIFECYCLE 30; -
Import the data. Choose one of the following methods:
-
Method 1 — `INSERT INTO`: Ignores rows with duplicate primary keys (equivalent to
INSERT IGNORE INTO). See INSERT INTO.INSERT INTO adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table;To verify the imported data:
SELECT * FROM adb_nopart_import_test;Expected output:
+------+-------+------+---------+ | id | name | age | dt | +------+-------+------+---------+ | 1 | james | 10 | 202207 | | 2 | bond | 20 | 202207 | | 3 | jack | 30 | 202207 | | 4 | lucy | 40 | 202207 | +------+-------+------+---------+To import from a specific partition only:
INSERT INTO adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table WHERE dt = '202207'; -
Method 2 — `INSERT OVERWRITE`: Overwrites all existing data in the destination table.
INSERT OVERWRITE adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table; -
Method 3 — Asynchronous `INSERT OVERWRITE`: Submits the import as a background job. See Asynchronous write.
SUBMIT JOB INSERT OVERWRITE adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table;The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.
-