This topic describes how to use external tables to import data from MaxCompute to AnalyticDB for MySQL Data Warehouse Edition (V3.0).
Prerequisites
A MaxCompute project and a MaxCompute table are created. For more information about the MaxCompute project, table, and data used in this example, see the "Sample data" section of this topic.
An AnalyticDB for MySQL cluster is created in the same region as the MaxCompute project. For more information, see Create a Data Warehouse Edition (V3.0) cluster.
The destination AnalyticDB for MySQL database is created.
The CIDR blocks of the virtual private cloud (VPC) in which the AnalyticDB for MySQL cluster is located are added to a whitelist of the MaxCompute project.
You can log on to the AnalyticDB for MySQL console and view the VPC ID on the Cluster Information page. Then, log on to the VPC console, find the VPC on the VPCs page, and view its CIDR blocks. For more information about how to configure a MaxCompute whitelist, see Manage IP address whitelists.
Sample data
In this example, a MaxCompute project named odps_project1 and a MaxCompute table named odps_nopart_import_test are used. The following statement shows how to create this table:
CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int
) PARTITIONED BY (dt string);
The following statement shows how to create a partition in the odps_nopart_import_test table:
ALTER TABLE odps_nopart_import_test ADD PARTITION (dt='202207');
The following statement shows how to insert data into the partition:
INSERT INTO odps_project1.odps_nopart_import_test
PARTITION (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
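Optionally, you can verify the sample data in MaxCompute before importing it. The following query is not part of the required setup; it is a standard SELECT that returns the four rows inserted above:

```sql
SELECT * FROM odps_project1.odps_nopart_import_test WHERE dt = '202207';
```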
Procedure
The following example demonstrates how to import data from the odps_nopart_import_test table to an AnalyticDB for MySQL database named test_adb.
Connect to the AnalyticDB for MySQL cluster and enter the test_adb database.
Execute the following CREATE TABLE statement to create an external table named odps_nopart_import_test_external_table in the test_adb database:
CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table (
    id int,
    name string,
    age int,
    dt string
) ENGINE='ODPS'
TABLE_PROPERTIES='{
    "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
    "accessid":"L*******FsE",
    "accesskey":"CcwF********iWjv",
    "partition_column":"dt",
    "project_name":"odps_project1",
    "table_name":"odps_nopart_import_test"
}';
The following list describes the parameters:
ENGINE='ODPS': The storage engine of the external table. To read and write MaxCompute data, set the storage engine to ODPS.
endpoint: The endpoint of MaxCompute. Note: AnalyticDB for MySQL can access MaxCompute only by using VPC endpoints. For more information about the VPC endpoint of each region, see Endpoints in different regions (VPC).
accessid: The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user used to access MaxCompute. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.
accesskey: The AccessKey secret of an Alibaba Cloud account or a RAM user used to access MaxCompute. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.
partition_column: The partition column. The partition_column parameter is required to create a partitioned table. If the MaxCompute table does not contain partitions, create a table without partitions in AnalyticDB for MySQL; in this case, the partition_column parameter is not required.
project_name: The name of the MaxCompute project.
table_name: The name of the MaxCompute table.
Execute the following CREATE TABLE statement to create a table named adb_nopart_import_test in the test_adb database to store data imported from MaxCompute:
CREATE TABLE IF NOT EXISTS adb_nopart_import_test (
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
)
DISTRIBUTE BY HASH(id)
PARTITION BY VALUE('dt')
LIFECYCLE 30;
Use the external table odps_nopart_import_test_external_table to import data from MaxCompute to the adb_nopart_import_test table of AnalyticDB for MySQL.
Method 1: Execute an INSERT INTO statement to import data. If the primary key has duplicate values, data is not repeatedly inserted and the INSERT INTO statement is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO.
INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
Execute a SELECT statement to query the data written to the table.
SELECT * FROM adb_nopart_import_test;
The following information is returned:
+------+-------+------+---------+
| id   | name  | age  | dt      |
+------+-------+------+---------+
|    1 | james |   10 | 202207  |
|    2 | bond  |   20 | 202207  |
|    3 | jack  |   30 | 202207  |
|    4 | lucy  |   40 | 202207  |
+------+-------+------+---------+
If you need to import data from a specific partition to the adb_nopart_import_test table, you can execute the following statement:
INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table
WHERE dt = '202207';
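After a partition-level import, you may want to confirm how many rows were written for that partition value. The following query is an optional check using standard SQL; it is not part of the original procedure:

```sql
SELECT COUNT(*) FROM adb_nopart_import_test WHERE dt = '202207';
```

With the sample data above, this should return 4.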
Method 2: Execute an INSERT OVERWRITE statement to import data. If the primary key has duplicate values, the original value is overwritten by the new value.
INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
Method 3: Execute a SUBMIT JOB statement that contains INSERT OVERWRITE to submit an asynchronous import task. You can add a hint (/* direct_batch_load=true*/) before the data import statement to accelerate the task. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic.
SUBMIT JOB INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
The following information is returned:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
For more information about how to asynchronously submit a task, see Asynchronously submit an import task.
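After the job_id is returned, you typically check the state of the asynchronous task before querying the destination table. The following query is a sketch only: the metadata table name below is an assumption based on the asynchronous import documentation, so verify the exact interface in the "Asynchronously submit an import task" topic before relying on it.

```sql
-- Assumption: asynchronous job metadata is exposed in
-- INFORMATION_SCHEMA.kepler_meta_async_jobs; confirm against the linked topic.
SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_id = '2020112122202917203100908203303******';
```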