AnalyticDB for MySQL: Use external tables to import data to Data Warehouse Edition

Last Updated: Jun 20, 2023

This topic describes how to use external tables to import data from MaxCompute to AnalyticDB for MySQL Data Warehouse Edition (V3.0).

Prerequisites

  • A MaxCompute project and a MaxCompute table are created. For more information about the MaxCompute project, table, and data used in this example, see the "Sample data" section of this topic.

  • An AnalyticDB for MySQL cluster is created in the same region as the MaxCompute project. For more information, see Create a Data Warehouse Edition (V3.0) cluster.

  • The destination AnalyticDB for MySQL database is created.

  • The CIDR blocks of the virtual private cloud (VPC) in which the AnalyticDB for MySQL cluster is located are added to a whitelist of the MaxCompute project.

    To obtain the CIDR blocks, log on to the AnalyticDB for MySQL console and view the VPC ID on the Cluster Information page. Then, log on to the VPC console, find the VPC by its ID on the VPCs page, and view its CIDR blocks. For more information about how to configure a MaxCompute whitelist, see Manage IP address whitelists.

Sample data

In this example, a MaxCompute project named odps_project1 and a MaxCompute table named odps_nopart_import_test are used. The following statement shows how to create this table:

CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int)
PARTITIONED BY (dt string);

The following statement shows how to create a partition in the odps_nopart_import_test table:

ALTER TABLE odps_nopart_import_test ADD PARTITION (dt='202207');

The following statement shows how to insert data into the partition:

INSERT INTO odps_project1.odps_nopart_import_test 
PARTITION (dt='202207') 
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
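
If you want to confirm that the sample data was written, you can query the partition in MaxCompute. The following statement is a simple check and is not part of the original sample:

SELECT * FROM odps_project1.odps_nopart_import_test WHERE dt = '202207';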

Procedure

The following example demonstrates how to import data from the odps_nopart_import_test table to an AnalyticDB for MySQL database named test_adb.

  1. Connect to the AnalyticDB for MySQL cluster and enter the test_adb database.
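
    For example, after you connect to the cluster by using a MySQL client, you can switch to the destination database with the following statement. The connection details depend on your cluster and account and are not part of this example:

    USE test_adb;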

  2. Execute the following CREATE TABLE statement to create an external table named odps_nopart_import_test_external_table in the test_adb database:

    CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table
    (
        id int,
        name string,
        age int,
        dt string
    )
    ENGINE='ODPS'
    TABLE_PROPERTIES='{
        "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
        "accessid":"L*******FsE",
        "accesskey":"CcwF********iWjv",
        "partition_column":"dt",
        "project_name":"odps_project1",
        "table_name":"odps_nopart_import_test"
    }';

    The following list describes the parameters:

    • ENGINE='ODPS': The storage engine of the external table. To read and write MaxCompute data, set the storage engine to ODPS.

    • endpoint: The endpoint of MaxCompute. Note: AnalyticDB for MySQL can access MaxCompute only by using VPC endpoints. For more information about the VPC endpoint of each region, see Endpoints in different regions (VPC).

    • accessid: The AccessKey ID of the Alibaba Cloud account or Resource Access Management (RAM) user that is used to access MaxCompute. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.

    • accesskey: The AccessKey secret of the Alibaba Cloud account or RAM user that is used to access MaxCompute. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.

    • partition_column: The partition column. This parameter is required when the external table maps to a partitioned MaxCompute table. If the MaxCompute table is not partitioned, create an external table without partitions in AnalyticDB for MySQL and do not configure the partition_column parameter, as shown in the sketch after this list.

    • project_name: The name of the MaxCompute project.

    • table_name: The name of the MaxCompute table.
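
    If the MaxCompute table is not partitioned, omit the partition column and the partition_column property. The following statement is only a sketch that assumes a hypothetical non-partitioned MaxCompute table named odps_nopart_test in the odps_project1 project:

    CREATE TABLE IF NOT EXISTS odps_nopart_test_external_table
    (
        id int,
        name string,
        age int
    )
    ENGINE='ODPS'
    TABLE_PROPERTIES='{
        "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
        "accessid":"L*******FsE",
        "accesskey":"CcwF********iWjv",
        "project_name":"odps_project1",
        "table_name":"odps_nopart_test"
    }';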

  3. Execute the following CREATE TABLE statement to create a table named adb_nopart_import_test in the test_adb database to store data imported from MaxCompute:

    CREATE TABLE IF NOT EXISTS adb_nopart_import_test
    (   id int,
        name string,
        age int,
        dt string,
        PRIMARY KEY(id,dt)
    )
    DISTRIBUTE BY HASH(id)
    PARTITION BY VALUE('dt') LIFECYCLE 30;  
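
    Before you import data, you can optionally query the external table to confirm that it reads the MaxCompute data as expected. This check is not part of the original procedure:

    SELECT * FROM odps_nopart_import_test_external_table LIMIT 10;
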
  4. Use the external table odps_nopart_import_test_external_table to import data from MaxCompute to the adb_nopart_import_test table of AnalyticDB for MySQL.

    • Method 1: Execute an INSERT INTO statement to import data. If a row with the same primary key value already exists, the new row is not inserted. In this case, the INSERT INTO statement is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO.

      INSERT INTO adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;   

      Execute a SELECT statement to query the data written to the table.

      SELECT * FROM adb_nopart_import_test;

      The following information is returned:

      +------+-------+------+---------+
      | id   | name  | age  |   dt    |
      +------+-------+------+---------+
      |    1 | james |   10 |  202207 |
      |    2 | bond  |   20 |  202207 |
      |    3 | jack  |   30 |  202207 |
      |    4 | lucy  |   40 |  202207 |
      +------+-------+------+---------+

      If you need to import data from a specific partition to the adb_nopart_import_test table, you can execute the following statement:

      INSERT INTO adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table 
      WHERE dt = '202207'; 
    • Method 2: Execute an INSERT OVERWRITE statement to import data. If a row with the same primary key value already exists, the existing row is overwritten by the new row.

      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;
    • Method 3: Execute a SUBMIT JOB statement that contains INSERT OVERWRITE to submit an asynchronous import task. You can add a hint (/* direct_batch_load=true*/) before the data import statement to accelerate the task, as shown in the sketch at the end of this method. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic.

      SUBMIT JOB
      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;  

      The following information is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |
      +---------------------------------------+

      For more information about how to asynchronously submit a task, see Asynchronously submit an import task.
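
      If you use the hint mentioned above to accelerate the task, the hint is placed before the INSERT OVERWRITE clause. The following statement is a sketch of this form and is based only on the hint described in this topic:

      SUBMIT JOB /* direct_batch_load=true*/
      INSERT OVERWRITE adb_nopart_import_test
      SELECT * FROM odps_nopart_import_test_external_table;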