All Products
Search
Document Center

AnalyticDB:Import MaxCompute data by using external tables

Last Updated:Mar 30, 2026

When you need to query MaxCompute data directly from AnalyticDB for MySQL, or load it into internal tables for high-performance analytics, external tables let you do both without a separate data movement step. This topic describes how to create MaxCompute external tables and use them to import data into AnalyticDB for MySQL.

When to use external tables

External tables are read-only virtual mappings to MaxCompute source tables. Use them when:

  • You want to run ad-hoc queries on MaxCompute data without copying it first.

  • You want to use INSERT ... SELECT to copy MaxCompute data into AnalyticDB for MySQL internal tables.

For recurring, large-scale imports with strict isolation requirements, consider the elastic import method described in this topic (requires Milvus V3.1.10.0 or later and a job resource group).

External tables are not a substitute for other import methods such as Data Transmission Service (DTS) or OSS-based batch load when those methods better fit your pipeline.

Access methods by edition

Edition Access method Kernel version Throughput
Enterprise Edition, Basic Edition, Data Lakehouse Edition Tunnel Record API No limit Suitable for small-scale imports. Slower throughput.
Enterprise Edition, Basic Edition, Data Lakehouse Edition Tunnel Arrow API V3.2.2.3 or later Reads data in columns, reducing import time. Faster throughput.
Data Warehouse Edition Tunnel Record API No limit Uses a shared public Data Transmission Service resource group. Slower throughput.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project and an AnalyticDB for MySQL cluster in the same region. See Create a cluster.

  • The Virtual Private Cloud (VPC) CIDR block of the AnalyticDB for MySQL cluster added to the whitelist of the MaxCompute project. See Manage IP whitelists. To find the CIDR block: log on to the AnalyticDB for MySQL console and note the VPC ID on the Cluster Information page. Then log on to the VPC console and look up the CIDR block for that VPC.

  • For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters:

    • Elastic Network Interface (ENI) access enabled. In the console, go to Cluster Management > Cluster Information > Network Information and turn on the ENI access switch.

      Important

      Enabling or disabling ENI access interrupts database connections for approximately 2 minutes. No reads or writes are possible during this window. Evaluate the impact before making this change.

    • (Tunnel Arrow API only) Cluster kernel version V3.2.2.3 or later. To view and update the minor version, go to the Configuration Information section of the Cluster Information page.

  • The AccessKey ID and AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user with read permissions on the MaxCompute project.

Sample data

The following examples use the MaxCompute project odps_project and table odps_nopart_import_test:

-- Create the source table in MaxCompute
CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int)
PARTITIONED BY (dt string);

-- Add a partition
ALTER TABLE odps_nopart_import_test
ADD PARTITION (dt='202207');

-- Insert sample data
INSERT INTO odps_project.odps_nopart_import_test
PARTITION (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);

Import data using the Tunnel Record API

The Tunnel Record API supports two import methods: regular import (default) and elastic import.

Method How it works Use when
Regular import Reads source data on compute nodes and builds indexes on storage nodes. Consumes both compute and storage resources. General use; no additional resource group required.
Elastic import Reads source data and builds indexes inside a Serverless Spark job running in a job resource group. You want to minimize impact on real-time reads and writes and improve resource isolation. Requires Milvus V3.1.10.0 or later and a configured job resource group.

Regular import

  1. Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.

  2. Create an external database.

    CREATE EXTERNAL DATABASE adb_external_db;
  3. Create an external table that maps to the MaxCompute source table.

    Note
    • The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and compatible data types. For all TABLE_PROPERTIES parameters, see CREATE EXTERNAL TABLE.

    Parameter Description
    ENGINE='ODPS' Specifies MaxCompute as the storage engine.
    endpoint The VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region.
    accessid The AccessKey ID with read permissions on the MaxCompute project.
    accesskey The AccessKey secret corresponding to the AccessKey ID.
    partition_column The partition column name. Omit this parameter if the MaxCompute source table is not partitioned.
    project_name The name of the MaxCompute workspace.
    table_name The name of the source table in MaxCompute.
    CREATE EXTERNAL TABLE IF NOT EXISTS adb_external_db.test_adb (
        id int,
        name varchar(1023),
        age int,
        dt string
    ) ENGINE='ODPS'
    TABLE_PROPERTIES='{
    "accessid":"yourAccessKeyID",
    "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun.com/api",
    "accesskey":"yourAccessKeySecret",
    "partition_column":"dt",
    "project_name":"odps_project",
    "table_name":"odps_nopart_import_test"
    }';

    Key parameters:

  4. Verify the external table by querying the data.

    SELECT * FROM adb_external_db.test_adb;

    Expected output:

    +------+-------+------+---------+
    | id   | name  | age  |   dt    |
    +------+-------+------+---------+
    |    1 | james |   10 |  202207 |
    |    2 | bond  |   20 |  202207 |
    |    3 | jack  |   30 |  202207 |
    |    4 | lucy  |   40 |  202207 |
    +------+-------+------+---------+
  5. Create a destination database and table in AnalyticDB for MySQL.

    Note

    The destination table and the external table must have the same field order and number of fields, with compatible data types.

    CREATE DATABASE adb_demo;
    CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test(
        id int,
        name string,
        age int,
        dt string,
        PRIMARY KEY(id,dt)
    )
    DISTRIBUTED BY HASH(id)
    PARTITION BY VALUE('dt');
  6. Import the data. Choose one of the following methods:

    • Method 1 — `INSERT INTO`: Imports data and ignores rows with duplicate primary keys (equivalent to INSERT IGNORE INTO). See INSERT INTO.

      INSERT INTO adb_demo.adb_import_test
      SELECT * FROM adb_external_db.test_adb;

      To import from a specific partition only:

      INSERT INTO adb_demo.adb_import_test
      SELECT * FROM adb_external_db.test_adb
      WHERE dt = '202207';
    • Method 2 — `INSERT OVERWRITE INTO`: Overwrites all existing data in the destination table.

      INSERT OVERWRITE INTO adb_demo.adb_import_test
      SELECT * FROM adb_external_db.test_adb;
    • Method 3 — Asynchronous `INSERT OVERWRITE INTO`: Submits the import as a background job. Add the /*+ direct_batch_load=true*/ hint to accelerate the job. See Asynchronous writes.

      SUBMIT JOB
      INSERT OVERWRITE INTO adb_demo.adb_import_test
      SELECT * FROM adb_external_db.test_adb;

      The command returns a job ID:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |
      +---------------------------------------+

      Use the job ID to monitor or cancel the job. See Submit an asynchronous import task.

Elastic import

Elastic import runs the import inside a Serverless Spark job, reducing impact on real-time workloads. It requires Milvus V3.1.10.0 or later and a configured job resource group.

Important

Elastic import only supports INSERT OVERWRITE INTO. You cannot use INSERT INTO with elastic import.

  1. Open the SQL editor. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, find and click the cluster ID, then choose Job Development > SQL Development.

  2. Create a database (skip if one already exists).

    CREATE DATABASE adb_demo;
  3. Create an external table.

    Note
    • The external table name must match the MaxCompute project name. Otherwise, the external table creation fails.

    • Elastic import requires creating external tables with CREATE TABLE (not CREATE EXTERNAL TABLE).

    • The external table columns must match the MaxCompute source table exactly: same field names, same number of fields, same field order, and same data types.

    CREATE TABLE IF NOT EXISTS test_adb
    (
        id int,
        name string,
        age int,
        dt string
    )
    ENGINE='ODPS'
    TABLE_PROPERTIES='{
    "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun-inc.com/api",
    "accessid":"yourAccessKeyID",
    "accesskey":"yourAccessKeySecret",
    "partition_column":"dt",
    "project_name":"odps_project",
    "table_name":"odps_nopart_import_test"
    }';

    For TABLE_PROPERTIES parameter descriptions, see Parameter descriptions.

  4. Verify the external table by querying the data.

    SELECT * FROM adb_demo.test_adb;
  5. Create the destination table in AnalyticDB for MySQL.

    Note

    The destination table and the external table must have the same field names, number of fields, field order, and data types.

    CREATE TABLE IF NOT EXISTS adb_import_test
    (
        id int,
        name string,
        age int,
        dt string,
        PRIMARY KEY(id,dt)
    )
    DISTRIBUTED BY HASH(id)
    PARTITION BY VALUE('dt') LIFECYCLE 30;
  6. Import the data using one of the following methods. Both methods use the elastic_load and elastic_load_configs hint parameters to enable and configure the elastic import job. Set elastic_load to true to enable elastic import (default: false). Use elastic_load_configs to specify configuration parameters enclosed in [ ]; separate multiple parameters with |. elastic_load_configs accepts the following parameters:

    • Method 1 — Synchronous elastic import:

      /*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
      INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;
    • Method 2 — Asynchronous elastic import:

      Important

      Priority queues are not supported for asynchronous elastic import tasks.

      /*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
      SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;

      The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.

    Parameter Required Description
    adb.load.resource.group.name Yes The name of the job resource group that runs the elastic import job.
    adb.load.job.max.acu No Maximum resources for the elastic import job. Unit: AnalyticDB Compute Units (ACUs). Minimum: 5 ACUs. Default: number of shards plus 1. To query the number of shards: SELECT count(1) FROM information_schema.kepler_meta_shards;
    spark.driver.resourceSpec No Resource type of the Spark driver. Default: small. See the Type column in the Spark resource specifications table.
    spark.executor.resourceSpec No Resource type of the Spark executor. Default: large. See the Type column in the Spark resource specifications table.
    spark.adb.executorDiskSize No Disk capacity of the Spark executor. Valid values: (0, 100]. Unit: GiB. Default: 10 GiB. See Specify driver and executor resources.
  7. (Optional) Verify that the job ran as an elastic import task.

    SELECT job_name, (job_type = 3) AS is_elastic_load
    FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
    WHERE job_name = "2023081818010602101701907303151******";

    Expected output:

    +---------------------------------------+------------------+
    | job_name                              | is_elastic_load  |
    +---------------------------------------+------------------+
    | 2023081517195203101701907203151****** |       1          |
    +---------------------------------------+------------------+

    is_elastic_load = 1 indicates an elastic import task. is_elastic_load = 0 indicates a regular import task.

Import data using the Tunnel Arrow API

The Tunnel Arrow API reads MaxCompute data in columns, which reduces data transfer volume and speeds up imports. It requires cluster kernel version V3.2.2.3 or later.

Step 1: Enable the Arrow API

Enable the Arrow API at the cluster level using SET ADB_CONFIG, or at the query level using a hint.

  • Cluster level (persists across queries):

    SET ADB_CONFIG <config_name>= <value>;
  • Query level (applies to a single query):

    /*<config_name>= <value>*/ SELECT * FROM table;

Arrow API configuration parameters:

Parameter Description
ODPS_TUNNEL_ARROW_ENABLED Enable the Arrow API. Valid values: true (enabled), false (disabled, default).
ODPS_TUNNEL_SPLIT_BY_SIZE_ENABLED Enable dynamic split. Valid values: true (enabled), false (disabled, default).

Step 2: Import MaxCompute data

Once the Arrow API is enabled, the import steps are the same as for regular import. Follow steps 1–6 in Regular import.

The cluster automatically uses the Tunnel Arrow API for all subsequent MaxCompute data access and import operations.

Import data (Data Warehouse Edition)

Data Warehouse Edition uses the Tunnel Record API with a shared public Data Transmission Service resource group.

  1. Connect to the AnalyticDB for MySQL cluster.

  2. Create the destination database.

    CREATE DATABASE test_adb;
  3. Create a MaxCompute external table. This example uses odps_nopart_import_test_external_table.

    Parameter Description
    ENGINE='ODPS' Specifies MaxCompute as the storage engine.
    endpoint The VPC endpoint of MaxCompute. Only VPC endpoints are supported. See VPC endpoints for endpoints by region.
    accessid The AccessKey ID of an Alibaba Cloud account or a RAM user with permissions to access MaxCompute. See Accounts and permissions.
    accesskey The AccessKey secret corresponding to the AccessKey ID. See Accounts and permissions.
    partition_column The partition column name. Omit this parameter if the MaxCompute source table is not partitioned.
    project_name The name of the MaxCompute workspace.
    table_name The name of the source table in MaxCompute.
    CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table
    (
        id int,
        name string,
        age int,
        dt string
    )
    ENGINE='ODPS'
    TABLE_PROPERTIES='{
    "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
    "accessid":"yourAccessKeyID",
    "accesskey":"yourAccessKeySecret",
    "partition_column":"dt",
    "project_name":"odps_project1",
    "table_name":"odps_nopart_import_test"
    }';
  4. Create the destination table in AnalyticDB for MySQL.

    CREATE TABLE IF NOT EXISTS adb_nopart_import_test
    (
        id int,
        name string,
        age int,
        dt string,
        PRIMARY KEY(id,dt)
    )
    DISTRIBUTED BY HASH(id)
    PARTITION BY VALUE('dt') LIFECYCLE 30;
  5. Import the data. Choose one of the following methods:

    • Method 1 — `INSERT INTO`: Ignores rows with duplicate primary keys (equivalent to INSERT IGNORE INTO). See INSERT INTO.

      INSERT INTO adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;

      To verify the imported data:

      SELECT * FROM adb_nopart_import_test;

      Expected output:

      +------+-------+------+---------+
      | id   | name  | age  |   dt    |
      +------+-------+------+---------+
      |    1 | james |   10 |  202207 |
      |    2 | bond  |   20 |  202207 |
      |    3 | jack  |   30 |  202207 |
      |    4 | lucy  |   40 |  202207 |
      +------+-------+------+---------+

      To import from a specific partition only:

      INSERT INTO adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table
      WHERE dt = '202207';
    • Method 2 — `INSERT OVERWRITE`: Overwrites all existing data in the destination table.

      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;
    • Method 3 — Asynchronous `INSERT OVERWRITE`: Submits the import as a background job. See Asynchronous write.

      SUBMIT JOB
      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;

      The command returns a job ID. Use it to monitor or cancel the job. See Submit an asynchronous import task.

Related topics