All Products
Search
Document Center

Perform joint query on multiple MySQL instances in DLA

Last Updated: May 23, 2019

Background

Data Lake Analytics (DLA) serves as the hub for in-cloud data processing. It allows you to query and analyze the data in a single ApsaraDB for RDS (RDS), Table Store, OSS, ApsaraDB for MongoDB (MongoDB), or ApsaraDB for Redis (Redis) instance through standard Java Database Connectivity (JDBC). To support business growth or data sharding, you may need to create multiple instances of RDS, Table Store, OSS, or other types to store data. In such multi-data source scenarios, you can use DLA to perform joint queries in multiple data sources of the same type.

This topic takes two ApsaraDB RDS for MySQL (MySQL) instances as an example to demonstrate how to use DLA to perform joint queries in multiple database instances.

Prerequisites

Test data is ready in the MySQL instances. If not, follow these steps to insert test data into the MySQL instances:

  1. Create two MySQL instances.

    DLA and the two MySQL instances must be in the same region. Otherwise, the operations described in this topic are not allowed.

    DLA connects to a MySQL instance over the VPC of the MySQL instance. Therefore, we recommend that you select VPC as the network type when creating the MySQL instances. In addition, you can change the network type of the MySQL instances from classic network to VPC by following the instructions provided in Set the network type.

  2. Configure a whitelist.

  3. Connect to the instance.

  4. Insert test data.

    • In MySQL instance 1, create a database named orders_db, create a table named order_item in the database, and then insert the following data into the table:

      1. create table orders_db.order_item (
      2. id bigint not null primary key auto_increment,
      3. prod_id bigint comment 'product ID',
      4. prod_cnt int comment 'product quantity'
      5. );
      6. insert into orders_db.order_item values]
      7. (1, 1, 2),
      8. (2, 2, 3),
      9. (3, 3, 4),
      10. (4, 2, 5),
      11. (5, 1, 6);
    • In MySQL instance 2, create a database named prod_db, create a table named prod in the database, and then insert the following data into the table:

      1. create table prod_db.prod (
      2. id bigint not null primary key auto_increment,
      3. prod_name varchar(31) comment 'product name'
      4. );
      5. insert into prod_db.prod values
      6. (1, 'keyboard'),
      7. (2, 'mouse'),
      8. (3, 'monitor');

Procedure

Notes

Before creating a connection to the MySQL database in DLA, you must add the CIDR block 100.104.0.0/16 to the MySQL whitelist.

Your MySQL instances are in a VPC, and by default DLA has no access to resources in the VPC. To enable access to the MySQL instances from DLA, use the reverse VPC technology, that is, add the CIDR block 100.104.0.0/16 to the MySQL whitelist.

Permission statements: When you add the CIDR block 100.104.0.0/16 to the whitelist, you agree to the use of reverse VPC technology to read data from the MySQL instances.

Step 1: Create two databases mapped to the two MySQL databases in DLA

Log on to the Data Lake Analytics console. In the endpoint list, locate the row that contains the target endpoint and click Log on in DMS in the Actions column to create two databases mapped to the two MySQL databases in DLA.

  1. CREATE SCHEMA dla_orders_db WITH DBPROPERTIES (
  2. CATALOG = 'mysql',
  3. LOCATION = 'jdbc:mysql://rm-******.mysql.rds.aliyuncs.com:3306/orders_db',
  4. USER = 'mysql_db_user_name',
  5. PASSWORD = 'mysql_db_password',
  6. VPC_ID = 'mysql_vpc_id',
  7. INSTANCE_ID = 'mysql_instance_id'
  8. );
  9. CREATE SCHEMA dla_prod_db WITH DBPROPERTIES (
  10. CATALOG = 'mysql',
  11. LOCATION = 'jdbc:mysql://rm-******.mysql.rds.aliyuncs.com:3306/prod_db',
  12. USER = 'mysql_db_user_name',
  13. PASSWORD = 'mysql_db_password',
  14. VPC_ID = 'mysql_vpc_id',
  15. INSTANCE_ID = 'mysql_instance_id'
  16. );

Parameter description

  • LOCATION: the URL in the format of jdbc:mysql://MySQL intranet address:Port/MySQL database name.

    1

  • USER: the username that is used to connect to the MySQL database.

  • PASSWORD: the password of the username that is used to connect to the MySQL database.

  • VPC_ID: the ID of the VPC where the MySQL instance is located.

  • INSTANCE_ID: the ID of your MySQL instance.

Step 2: Create a MySQL external table in DLA

For the order_item table in orders_db on MySQL instance 1, create an external table named order_item in DMS for Data Lake Analytics:

  1. create external table order_item (
  2. id bigint,
  3. prod_id bigint,
  4. prod_cnt int,
  5. );

For the prod table in prod_db on MySQL instance 2, create an external table named prod in DMS for Data Lake Analytics:

  1. create external table prod (
  2. id bigint,
  3. prod_name varchar(31)
  4. );

Step 3: Use DLA to perform joint queries on multiple MySQL instances

After the MySQL database connections and external tables are created, you can use your MySQL client or MySQL CLI tool to connect to DLA and execute standard SQL statements in the MySQL databases.

Alternatively, you can directly perform operations in the MySQL databases in DMS for Data Lake Analytics.

The following example uses the MySQL CLI tool to connect to DLA and perform joint queries in the order_item and prod tables:

  1. mysql> select * from dla_orders_db.order_item;
  2. +------+---------+----------+
  3. | id | prod_id | prod_cnt |
  4. +------+---------+----------+
  5. | 1 | 1 | 2 |
  6. | 2 | 2 | 3 |
  7. | 3 | 3 | 4 |
  8. | 4 | 2 | 5 |
  9. | 5 | 1 | 6 |
  10. +------+---------+----------+
  11. 5 rows in set (0.22 sec)
  12. mysql> select * from dla_prod_db.prod;
  13. +------+-----------+
  14. | id | prod_name |
  15. +------+-----------+
  16. | 1 | Keyboard |
  17. | 2 | Mouse |
  18. | 3 | Monitor |
  19. +------+-----------+
  20. 3 rows in set (0.18 sec)
  21. select p.prod_name, sum(o.prod_cnt) cnt
  22. from dla_prod_db.prod p inner join dla_orders_db.order_item o
  23. on p.id = o.prod_id
  24. group by p.prod_name
  25. order by cnt desc;
  26. +------+-----------+
  27. | prod_name | cnt |
  28. +-----------+------+
  29. | Keyboard | 8 |
  30. | Mouse | 8 |
  31. | Monitor | 4 |
  32. +-----------+------+
  33. 3 rows in set (0.55 sec)

Subsequent operations

Use DLA to store the data that is obtained by the joint queries in Alibaba Cloud Object Storage Service (OSS). OSS is an Alibaba Cloud storage service that features massive capacity, robust security, low cost, and high reliability. When needed, data that is stored in OSS can be read directly by using DLA, without moving the OSS files.