AnalyticDB for MySQL: Use external tables to import data to Data Warehouse Edition

Last Updated: Mar 11, 2024

AnalyticDB for MySQL allows you to access and import MaxCompute data by using external tables. This maximizes the utilization of cluster resources and improves data import performance. This topic describes how to use external tables to import data from MaxCompute to AnalyticDB for MySQL Data Warehouse Edition (V3.0).

Prerequisites

  • A MaxCompute project and an AnalyticDB for MySQL cluster are created in the same region. For more information, see Create a cluster.

  • The CIDR blocks of the virtual private cloud (VPC) in which the AnalyticDB for MySQL cluster is located are added to a whitelist of the MaxCompute project.

    To obtain the CIDR blocks, log on to the AnalyticDB for MySQL console and view the VPC ID on the Cluster Information page. Then, log on to the VPC console, find that VPC on the VPCs page, and view its CIDR blocks. For information about how to configure a MaxCompute whitelist, see Manage IP address whitelists; a minimal sketch follows.
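
    For illustration only, the following sketch shows how such a whitelist update might look in the MaxCompute client (odpscmd). The property name follows the MaxCompute whitelist documentation, but the CIDR block is a placeholder; replace it with your VPC's actual CIDR blocks and confirm the exact value format in Manage IP address whitelists.

      -- Hypothetical value: replace 192.168.0.0/16 with the CIDR blocks of your VPC.
      setproject odps.security.ip.whitelist=192.168.0.0/16;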

Sample data

In this example, a MaxCompute project named odps_project1 and a MaxCompute table named odps_nopart_import_test are used. Execute the following statement to create a table:

CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int)
PARTITIONED BY (dt string);

Execute the following statement to create a partition in the odps_nopart_import_test table:

ALTER TABLE odps_nopart_import_test
ADD PARTITION (dt='202207');
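
Optionally, you can confirm that the partition exists by executing the following MaxCompute statement:

SHOW PARTITIONS odps_nopart_import_test;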

Execute the following statement to insert data into the partition:

INSERT INTO odps_project1.odps_nopart_import_test 
PARTITION (dt='202207') 
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
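
Optionally, verify the inserted rows on the MaxCompute side before you import them:

SELECT * FROM odps_project1.odps_nopart_import_test WHERE dt = '202207';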

Access and import MaxCompute data in tunnel mode

  1. Connect to the AnalyticDB for MySQL cluster. For more information, see Connect to an AnalyticDB for MySQL cluster.

  2. Create a database.

    CREATE DATABASE test_adb;
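
    The statements in the following steps are assumed to run in the test_adb database. If your client session is not already using it, switch to the database first:

    USE test_adb;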
  3. Create a MaxCompute external table. In this example, an external table named odps_nopart_import_test_external_table is used.

    CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table
    (
        id int,
        name string,
        age int,
        dt string
    )
    ENGINE='ODPS'
    TABLE_PROPERTIES='{
        "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
        "accessid":"L*******FsE",
        "accesskey":"CcwF********iWjv",
        "partition_column":"dt",
        "project_name":"odps_project1",
        "table_name":"odps_nopart_import_test"
    }';

    The following list describes the parameters:

    • ENGINE='ODPS': the storage engine of the external table. To read and write MaxCompute data, set the storage engine to ODPS.

    • endpoint: the endpoint of MaxCompute. Note: AnalyticDB for MySQL can access MaxCompute only by using VPC endpoints. For information about the VPC endpoint of each region, see the "Endpoints in different regions (VPC)" section of the Endpoints topic.

    • accessid: the AccessKey ID of the Alibaba Cloud account or Resource Access Management (RAM) user that is used to access MaxCompute. For information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.

    • accesskey: the AccessKey secret of the Alibaba Cloud account or RAM user that is used to access MaxCompute. For information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.

    • partition_column: the partition column. This parameter is required when you create a partitioned external table. If the MaxCompute table is not partitioned, create an external table without partitions in AnalyticDB for MySQL; in that case, do not configure the partition_column parameter.

    • project_name: the name of the MaxCompute project.

    • table_name: the name of the MaxCompute table.
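
    Before you import data, you can query the external table directly to confirm that the AnalyticDB for MySQL cluster can reach MaxCompute (a quick connectivity check):

    SELECT * FROM odps_nopart_import_test_external_table LIMIT 10;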

  4. Create a table named adb_nopart_import_test in the test_adb database to store data imported from MaxCompute.

    CREATE TABLE IF NOT EXISTS adb_nopart_import_test
    (
        id int,
        name string,
        age int,
        dt string,
        PRIMARY KEY(id,dt)
    )
    DISTRIBUTE BY HASH(id)
    PARTITION BY VALUE('dt') LIFECYCLE 30;
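
    In this definition, DISTRIBUTE BY HASH(id) distributes rows across nodes by the id column, PARTITION BY VALUE('dt') partitions rows by the value of the dt column, and LIFECYCLE 30 retains only the 30 most recent partitions. You can check the resulting definition with a standard statement:

    SHOW CREATE TABLE adb_nopart_import_test;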
  5. Import data.

    • Method 1: Execute the INSERT INTO statement to import data. If a row with the same primary key value already exists, the new row is not inserted. In this case, the INSERT INTO statement is equivalent to the INSERT IGNORE INTO statement. For more information, see INSERT INTO.

      INSERT INTO adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table;   

      Execute the SELECT statement to query the data written to the table.

      SELECT * FROM adb_nopart_import_test;

      Sample result:

      +------+-------+------+---------+
      | id   | name  | age  |   dt    |
      +------+-------+------+---------+
      |    1 | james |   10 |  202207 |
      |    2 | bond  |   20 |  202207 |
      |    3 | jack  |   30 |  202207 |
      |    4 | lucy  |   40 |  202207 |
      +------+-------+------+---------+

      If you want to import data from a specific partition to the adb_nopart_import_test table, you can execute the following statement:

      INSERT INTO adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table WHERE dt = '202207'; 
    • Method 2: Execute the INSERT OVERWRITE statement to import data. If a row with the same primary key value already exists, it is overwritten by the new row.

      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;
    • Method 3: Execute the INSERT OVERWRITE statement to asynchronously import data. Typically, the SUBMIT JOB statement is used to submit an asynchronous job. You can add a hint before the data import statement to accelerate the job. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic.

      SUBMIT JOB INSERT OVERWRITE adb_nopart_import_test SELECT * FROM odps_nopart_import_test_external_table;  

      Sample result:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |
      +---------------------------------------+

      For information about how to asynchronously submit a job, see Asynchronously submit an import job.
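
      As an illustrative sketch of how the returned job_id can be used: the statements below follow the asynchronous import documentation, but the INFORMATION_SCHEMA view name and the CANCEL JOB syntax should be confirmed in the Asynchronously submit an import job topic for your cluster version.

      -- Query the state of the asynchronous job by its job_id.
      SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
      WHERE job_id = '2020112122202917203100908203303******';

      -- Cancel the job if it is no longer needed.
      CANCEL JOB '2020112122202917203100908203303******';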