This topic describes how to use the Presto service of an E-MapReduce (EMR) cluster to query table data in an ApsaraDB RDS for MySQL database. To use this feature, make sure that the EMR cluster and ApsaraDB RDS for MySQL database are created in the same virtual private cloud (VPC).
Prerequisites
- An EMR cluster is created, and Presto is selected from the optional services when you create the cluster. For more information, see Create a cluster.
- An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance.
Note We recommend that you select 5.7 from the MySQL drop-down list for Database Engine and set Edition to High-availability when you create an ApsaraDB RDS for MySQL instance.
Background information
Step 1: Configure a connector
- Go to the Presto service page.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
- Click the Cluster Management tab.
- On the Cluster Management page, find your cluster and click Details in the Actions column.
- In the left-side navigation pane, choose .
- Configure connector-related parameters.
- Save the configurations.
- In the upper-right corner of the Service Configuration section, click Save.
- In the Confirm Changes dialog box, configure Description and turn on Auto-update Configuration.
- Click OK.
- Restart the service for the configurations to take effect.
- In the upper-right corner of the Presto service page, choose .
- In the Cluster Activities dialog box, configure Description.
- Click OK.
- In the Confirm message, click OK.
Step 2: View information about the ApsaraDB RDS for MySQL database
Step 3: Query table data
Note In this example,
hive.default.tbl_department
is a Hive table, and connector1.emruser.tbl_employee
is a MySQL table.
- Query data in the emruser.tbl_employee table.
select * from connector1.emruser.tbl_employee;
Data similar to the following example is returned:id | name | dept_id | salary ----+------------+---------+--------- 1 | Ming Li | 1 | 10000.0 2 | Eric Cai | 1 | 11000.0 3 | Bonnie Liu | 2 | 11000.0 (3 rows)
- Query data in the tbl_department table.
select * from hive.default.tbl_department;
Data similar to the following example is returned:dept_id | dept_name --------+---------- 1 | IT 2 | Finance (2 rows)
- Perform a cross query on data in the tables.
Run the following command to perform a cross query on data in the hive.default.tbl_department and connector1.emruser.tbl_employee tables.
select * from hive.default.tbl_department a, connector1.emruser.tbl_employee b where a.dept_id = b.dept_id;
The following information is returned:dept_id |dept_name | id | name | dept_id | salary --------+----------+----+------------+---------+--------- 2 | Finance | 3 | Bonnie Liu | 2 | 11000.0 1 | IT | 2 | Eric Cai | 1 | 11000.0 1 | IT | 1 | Ming Li | 1 | 10000.0 (3 rows)