Import MaxCompute data by using external tables - AnalyticDB

When to use external tables

External tables are read-only virtual mappings to MaxCompute source tables. Use them when:

You want to run ad-hoc queries on MaxCompute data without copying it first.
You want to use INSERT ... SELECT to copy MaxCompute data into AnalyticDB for MySQL internal tables.

For recurring, large-scale imports with strict isolation requirements, consider the elastic import method described in this topic (requires Milvus V3.1.10.0 or later and a job resource group).

External tables are not a substitute for other import methods such as Data Transmission Service (DTS) or OSS-based batch load when those methods better fit your pipeline.

Access methods by edition

Edition	Access method	Kernel version	Throughput
Enterprise Edition, Basic Edition, Data Lakehouse Edition	Tunnel Record API	No limit	Suitable for small-scale imports. Slower throughput.
Enterprise Edition, Basic Edition, Data Lakehouse Edition	Tunnel Arrow API	V3.2.2.3 or later	Reads data in columns, reducing import time. Faster throughput.
Data Warehouse Edition	Tunnel Record API	No limit	Uses a shared public Data Transmission Service resource group. Slower throughput.

Prerequisites

Before you begin, make sure you have:

A MaxCompute project and an AnalyticDB for MySQL cluster in the same region. See Create a cluster.
The Virtual Private Cloud (VPC) CIDR block of the AnalyticDB for MySQL cluster added to the whitelist of the MaxCompute project. See Manage IP whitelists. To find the CIDR block: log on to the AnalyticDB for MySQL console and note the VPC ID on the Cluster Information page. Then log on to the VPC console and look up the CIDR block for that VPC.
For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters:
- Elastic Network Interface (ENI) access enabled. In the console, go to Cluster Management > Cluster Information > Network Information and turn on the ENI access switch.
  
  Important
  Enabling or disabling ENI access interrupts database connections for approximately 2 minutes. No reads or writes are possible during this window. Evaluate the impact before making this change.
- (Tunnel Arrow API only) Cluster kernel version V3.2.2.3 or later. To view and update the minor version, go to the Configuration Information section of the Cluster Information page.
The AccessKey ID and AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user with read permissions on the MaxCompute project.

Sample data

The following examples use the MaxCompute project odps_project and table odps_nopart_import_test:

-- Create the source table in MaxCompute
CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int)
PARTITIONED BY (dt string);

-- Add a partition
ALTER TABLE odps_nopart_import_test
ADD PARTITION (dt='202207');

-- Insert sample data
INSERT INTO odps_project.odps_nopart_import_test
PARTITION (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);

Import data using the Tunnel Record API

The Tunnel Record API supports two import methods: regular import (default) and elastic import.

Method	How it works	Use when
Regular import	Reads source data on compute nodes and builds indexes on storage nodes. Consumes both compute and storage resources.	General use; no additional resource group required.
Elastic import	Reads source data and builds indexes inside a Serverless Spark job running in a job resource group.	You want to minimize impact on real-time reads and writes and improve resource isolation. Requires Milvus V3.1.10.0 or later and a configured job resource group.

Regular import

Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.

Create an external database.

CREATE EXTERNAL DATABASE adb_external_db;

Create an external table that maps to the MaxCompute source table.

Note

The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and compatible data types. For all TABLE_PROPERTIES parameters, see CREATE EXTERNAL TABLE.

Parameter	Description
`ENGINE='ODPS'`	Specifies MaxCompute as the storage engine.
`endpoint`	The VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region.
`accessid`	The AccessKey ID with read permissions on the MaxCompute project.
`accesskey`	The AccessKey secret corresponding to the AccessKey ID.
`partition_column`	The partition column name. Omit this parameter if the MaxCompute source table is not partitioned.
`project_name`	The name of the MaxCompute workspace.
`table_name`	The name of the source table in MaxCompute.

CREATE EXTERNAL TABLE IF NOT EXISTS adb_external_db.test_adb (
    id int,
    name varchar(1023),
    age int,
    dt string
) ENGINE='ODPS'
TABLE_PROPERTIES='{
"accessid":"yourAccessKeyID",
"endpoint":"http://service.cn-hangzhou.maxcompute.aliyun.com/api",
"accesskey":"yourAccessKeySecret",
"partition_column":"dt",
"project_name":"odps_project",
"table_name":"odps_nopart_import_test"
}';

Key parameters:

Verify the external table by querying the data.

SELECT * FROM adb_external_db.test_adb;

Expected output:

+------+-------+------+---------+
| id   | name  | age  |   dt    |
+------+-------+------+---------+
|    1 | james |   10 |  202207 |
|    2 | bond  |   20 |  202207 |
|    3 | jack  |   30 |  202207 |
|    4 | lucy  |   40 |  202207 |
+------+-------+------+---------+

Create a destination database and table in AnalyticDB for MySQL.

Note

The destination table and the external table must have the same field order and number of fields, with compatible data types.

CREATE DATABASE adb_demo;

CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test(
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
)
DISTRIBUTED BY HASH(id)
PARTITION BY VALUE('dt');

Import the data. Choose one of the following methods:
- Method 1 — INSERT INTO: Imports data and ignores rows with duplicate primary keys (equivalent to INSERT IGNORE INTO). See INSERT INTO.
```
INSERT INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb;
```
  To import from a specific partition only:
```
INSERT INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb
WHERE dt = '202207';
```
- Method 2 — INSERT OVERWRITE INTO: Overwrites all existing data in the destination table.
```
INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb;
```
- Method 3 — Asynchronous INSERT OVERWRITE INTO: Submits the import as a background job. Add the /*+ direct_batch_load=true*/ hint to accelerate the job. See Asynchronous writes.
```
SUBMIT JOB
INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb;
```
  The command returns a job ID:
```
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
```
  Use the job ID to monitor or cancel the job. See Submit an asynchronous import task.

Elastic import

Elastic import runs the import inside a Serverless Spark job, reducing impact on real-time workloads. It requires Milvus V3.1.10.0 or later and a configured job resource group.

Important

Elastic import only supports INSERT OVERWRITE INTO. You cannot use INSERT INTO with elastic import.

Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.
Create a database (skip if one already exists).
```
CREATE DATABASE adb_demo;
```
Create an external table.
Note
- The external table name must match the MaxCompute project name. Otherwise, the external table creation fails.
- Elastic import requires creating external tables with CREATE TABLE (not CREATE EXTERNAL TABLE).
- The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and same data types.
```
CREATE TABLE IF NOT EXISTS test_adb
(
    id int,
    name string,
    age int,
    dt string
)
ENGINE='ODPS'
TABLE_PROPERTIES='{
"endpoint":"http://service.cn-hangzhou.maxcompute.aliyun-inc.com/api",
"accessid":"yourAccessKeyID",
"accesskey":"yourAccessKeySecret",
"partition_column":"dt",
"project_name":"odps_project",
"table_name":"odps_nopart_import_test"
}';
```
For TABLE_PROPERTIES parameter descriptions, see Parameter descriptions.
Verify the external table by querying the data.
```
SELECT * FROM adb_demo.test_adb;
```

Create the destination table in AnalyticDB for MySQL.

Note

The destination table and the external table must have the same field names, number of fields, field order, and data types.

CREATE TABLE IF NOT EXISTS adb_import_test
(
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
)
DISTRIBUTED BY HASH(id)
PARTITION BY VALUE('dt') LIFECYCLE 30;

Import the data using one of the following methods. Both methods use the elastic_load and elastic_load_configs hint parameters to enable and configure the elastic import job. Set elastic_load to true to enable elastic import (default: false). Use elastic_load_configs to specify configuration parameters enclosed in [ ]; separate multiple parameters with |. elastic_load_configs accepts the following parameters:

Method 1 — Synchronous elastic import:

/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;

Method 2 — Asynchronous elastic import:

Important
Priority queues are not supported for asynchronous elastic import tasks.
```
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;
```
The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.

Parameter	Required	Description
`adb.load.resource.group.name`	Yes	The name of the job resource group that runs the elastic import job.
`adb.load.job.max.acu`	No	Maximum resources for the elastic import job. Unit: AnalyticDB Compute Units (ACUs). Minimum: 5 ACUs. Default: number of shards plus 1. To query the number of shards: `SELECT count(1) FROM information_schema.kepler_meta_shards;`
`spark.driver.resourceSpec`	No	Resource type of the Spark driver. Default: `small`. See the Type column in the Spark resource specifications table.
`spark.executor.resourceSpec`	No	Resource type of the Spark executor. Default: `large`. See the Type column in the Spark resource specifications table.
`spark.adb.executorDiskSize`	No	Disk capacity of the Spark executor. Valid values: (0, 100]. Unit: GiB. Default: 10 GiB. See Specify driver and executor resources.

(Optional) Verify that the job ran as an elastic import task.

SELECT job_name, (job_type = 3) AS is_elastic_load
FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_name = "2023081818010602101701907303151******";

Expected output:

+---------------------------------------+------------------+
| job_name                              | is_elastic_load  |
+---------------------------------------+------------------+
| 2023081517195203101701907203151****** |       1          |
+---------------------------------------+------------------+

is_elastic_load = 1 indicates an elastic import task. is_elastic_load = 0 indicates a regular import task.

Import data using the Tunnel Arrow API

The Tunnel Arrow API reads MaxCompute data in columns, which reduces data transfer volume and speeds up imports. It requires cluster kernel version V3.2.2.3 or later.

Step 1: Enable the Arrow API

Enable the Arrow API at the cluster level using SET ADB_CONFIG, or at the query level using a hint.

Cluster level (persists across queries):
```
SET ADB_CONFIG <config_name>= <value>;
```

Query level (applies to a single query):

/*<config_name>= <value>*/ SELECT * FROM table;

Arrow API configuration parameters:

Parameter	Description
`ODPS_TUNNEL_ARROW_ENABLED`	Enable the Arrow API. Valid values: `true` (enabled), `false` (disabled, default).
`ODPS_TUNNEL_SPLIT_BY_SIZE_ENABLED`	Enable dynamic split. Valid values: `true` (enabled), `false` (disabled, default).

Step 2: Import MaxCompute data

Once the Arrow API is enabled, the import steps are the same as for regular import. Follow steps 1–6 in Regular import.

The cluster automatically uses the Tunnel Arrow API for all subsequent MaxCompute data access and import operations.

Import data (Data Warehouse Edition)

Data Warehouse Edition uses the Tunnel Record API with a shared public Data Transmission Service resource group.

Connect to the AnalyticDB for MySQL cluster.
Create the destination database.
```
CREATE DATABASE test_adb;
```

Create a MaxCompute external table. This example uses odps_nopart_import_test_external_table.

Parameter	Description
`ENGINE='ODPS'`	Specifies MaxCompute as the storage engine.
`endpoint`	The VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region.
`accessid`	The AccessKey ID of an Alibaba Cloud account or a RAM user with permissions to access MaxCompute. See Accounts and permissions.
`accesskey`	The AccessKey secret corresponding to the AccessKey ID. See Accounts and permissions.
`partition_column`	The partition column name. Omit this parameter if the MaxCompute source table is not partitioned.
`project_name`	The name of the MaxCompute workspace.
`table_name`	The name of the source table in MaxCompute.

CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table
(
    id int,
    name string,
    age int,
    dt string
)
ENGINE='ODPS'
TABLE_PROPERTIES='{
"endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
"accessid":"yourAccessKeyID",
"accesskey":"yourAccessKeySecret",
"partition_column":"dt",
"project_name":"odps_project1",
"table_name":"odps_nopart_import_test"
}';

Create the destination table in AnalyticDB for MySQL.

CREATE TABLE IF NOT EXISTS adb_nopart_import_test
(
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
)
DISTRIBUTED BY HASH(id)
PARTITION BY VALUE('dt') LIFECYCLE 30;

Import the data. Choose one of the following methods:

Method 1 — INSERT INTO: Ignores rows with duplicate primary keys (equivalent to INSERT IGNORE INTO). See INSERT INTO.

INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;

To verify the imported data:

SELECT * FROM adb_nopart_import_test;

Expected output:

+------+-------+------+---------+
| id   | name  | age  |   dt    |
+------+-------+------+---------+
|    1 | james |   10 |  202207 |
|    2 | bond  |   20 |  202207 |
|    3 | jack  |   30 |  202207 |
|    4 | lucy  |   40 |  202207 |
+------+-------+------+---------+

To import from a specific partition only:

INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table
WHERE dt = '202207';

Method 2 — INSERT OVERWRITE: Overwrites all existing data in the destination table.

INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;

Method 3 — Asynchronous INSERT OVERWRITE: Submits the import as a background job. See Asynchronous write.
```
SUBMIT JOB
INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
```
The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.

AnalyticDB:Import MaxCompute data by using external tables

When to use external tables

Access methods by edition

Prerequisites

Sample data

Import data using the Tunnel Record API

Regular import

Elastic import

Import data using the Tunnel Arrow API

Step 1: Enable the Arrow API

Step 2: Import MaxCompute data

Import data (Data Warehouse Edition)

Related topics