All Products
Search
Document Center

One-click metadata synchronization to ApsaraDB for RDS

Last Updated: May 23, 2019

Background

Data Lake Analytics (DLA) serves as the hub for in-cloud data processing. It allows you to query and analyze the data in ApsaraDB for RDS (MySQL, SQL Server, and PostgreSQL) through standard Java Database Connectivity (JDBC), and insert the results into ApsaraDB for RDS (RDS).

You can create an RDS schema, create mapping tables in the schema, and then read data from RDS in DLA. For more information, see Use DLA to read and write MySQL data. If the RDS database that uses the schema contains multiple data tables, you need to create a mapping table for each data table in DLA. This operation takes more and more time as the number of data tables increases. DLA supports one-step metadata synchronization from RDS. You only need to create an RDS schema in DLA, and then run the MSCK REPAIR TABLE command to synchronize the data of all tables in the database to DLA immediately.

This topic describes the one-step metadata synchronization feature by using ApsaraDB RDS for MySQL (MySQL) as an example.

Notes

Before you create a MySQL schema, SQL Server schema, or PostgreSQL schema in DLA, you must add the CIDR block 100.104.0.0/16 to the whitelist of your RDS instance.

Your RDS instance is in a VPC, and by default DLA has no access to resources in the VPC. To enable access to your RDS instance from DLA, you need to use reverse VPC technology, that is, add the CIDR block 100.104.0.0/16 to the whitelist of your RDS instance.

Permission statements: When you add the CIDR block 100.104.0.0/16 to the whitelist of your RDS instance, you agree to the use of reverse VPC technology to read and write RDS data.

Prerequisites

Test data is ready in your RDS instance. If not, follow these steps to prepare test data in the instance:

  1. Create a MySQL instance.

  2. Configure a whitelist.

  3. Create accounts and databases.

  4. Connect to the instance.

  5. Create tables.

Procedure

You can execute SQL statements on the DMS for Data Lake Analytics page to create a MySQL schema. Alternatively, you can connect to DLA by using a MySQL client or MySQL CLI tool, and then execute SQL statements to create a MySQL schema.

Step 1: Create an RDS schema

  1. Log on to the Data Lake Analytics console.

  2. In the left-side navigation pane, click Endpoint. In the endpoint list, locate the row that contains the target endpoint and click Log on in DMS in the Actions column. On the DMS for Data Lake Analytics page that appears, execute the following statement to create a MySQL schema in DLA:

    1. CREATE SCHEMA hello_mysql_vpc_rds WITH DBPROPERTIES (
    2. CATALOG = 'mysql',
    3. LOCATION = 'jdbc:mysql://rm-2zer0vg58mf*****.mysql.rds.aliyuncs.com:3306/rds_mysql_dbname',
    4. USER = 'rds_mysqldb_username',
    5. PASSWORD = 'rds_mysqldb_password',
    6. INSTANCE_ID = 'rds_mysql_instance_id',
    7. VPC_ID = 'rds_mysqldb_vpcid'
    8. );
    • VPC_ID: the ID of the VPC where your MySQL instance is located, as shown in the following figure.

      1

    • INSTANCE_ID: the ID of your MySQL instance. You can view the instance ID on the Instance Details page.

Step 2: Run the MSCK REPAIR TABLE command to synchronize data in tables to DLA.

Run show tables to check whether tables exist in the schema:

  1. mysql> use hello_mysql_vpc_rds;
  2. Database changed
  3. mysql> show tables;
  4. Empty set (0.01 sec)

Execute the following SQL statement to synchronize data of all the tables in the RDS database to the schema.

  1. mysql> msck repair database hello_mysql_vpc_rds;
  2. +-------------------------------+-----------+
  3. | TableName | Operation |
  4. +-------------------------------+-----------+
  5. | admin_acc_log | CREATED |
  6. | fractest | CREATED |
  7. | person | CREATED |
  8. | prep_stmt_test | CREATED |
  9. | staples | CREATED |
  10. | test_datetime | CREATED |
  11. | top10_user | CREATED |
  12. | type_test | CREATED |
  13. | world1 | CREATED |
  14. | world2 | CREATED |
  15. +-------------------------------+-----------+
  16. 17 rows in set (1.60 sec)
  17. mysql> show tables;
  18. +-------------------------------+
  19. | Table_Name |
  20. +-------------------------------+
  21. | admin_acc_log |
  22. | fractest |
  23. | person |
  24. | prep_stmt_test |
  25. | staples |
  26. | test_datetime |
  27. | top10_user |
  28. | type_test |
  29. | world1 |
  30. | world2 |
  31. +-------------------------------+
  32. 17 rows in set (0.02 sec)

After all data in RDS is synchronized to DLA, you can manage the data in DLA.