This topic describes how to use the Presto service of an E-MapReduce (EMR) cluster to query table data in an ApsaraDB RDS for MySQL database. The EMR cluster and ApsaraDB RDS for MySQL database are created in the same VPC.

Prerequisites

  • An EMR cluster is created, and Presto is selected from the optional services during the cluster creation. For more information, see Create a cluster.
  • An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance.
    Note We recommend that you select 5.7 from the MySQL drop-down list for Database Engine and set Edition to High-availability when you create an ApsaraDB RDS for MySQL instance.

Background information

For terms related to Presto, see Terms.

Step 1: Configure a connector

  1. Go to the Presto service page.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
    5. In the left-side navigation pane, choose Cluster Service > Presto.
  2. Configure connector-related parameters.
    1. On the Presto service page, click the Configure tab.
    2. In the Service Configuration section, click the connector1.properties tab.
      To connect to multiple ApsaraDB RDS for MySQL databases, you can also configure parameters on the connector2.properties and connector3.properties tabs.
    3. Set connector.name to mysql.
    4. In the upper-right corner of the Service Configuration section, click Custom Configuration.
    5. In the Add Configuration Item dialog box, add the parameters that are described in the following table.
      Parameter Description
      connection-user The username that is used to access the database. In this example, the username is hiveuser.
      connection-password The password that is used to access the database.
      connection-url The URL of the database. For more information, see View and change the internal and public endpoints and port numbers of an ApsaraDB RDS for MySQL instance.

      Example: jdbc:mysql://rm-2ze5ipacsu8265qxxxxxxxx.mysql.rds.aliyuncs.com:3306.

  3. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  4. Restart the service for the configurations to take effect.
    1. In the upper-right corner of the Presto service page, choose Actions > Restart All Components.
    2. In the Cluster Activities dialog box, specify Description.
    3. Click OK.
    4. In the Confirm message, click OK.

Step 2: View information about the ApsaraDB RDS for MySQL database

  1. Log on to the master node of the EMR cluster in SSH mode.
  2. Run the following command to connect to the Presto client:
    presto --server emr-header-1:9090 --catalog hive --schema default --user hadoop
    If the following information is returned, the Presto client is connected:
    presto:default>
  3. Run the following command to view the schema from connector1.properties:
    show schemas from connector1;
    Information similar to the following example is returned:
          Schema
    --------------------
     emruser
     information_schema
     performance_schema
     sys
    (4 rows)

Step 3: Query table data

Note In this example, hive.default.tbl_department is a Hive table and connector1.emruser.tbl_employee is a MySQL table.
  • Query data in the connector1.emruser.tbl_employee table.
    select * from connector1.emruser.tbl_employee;
    Data similar to the following example is returned:
     id |    name    | dept_id | salary
    ----+------------+---------+---------
      1 | Ming Li    |       1 | 10000.0
      2 | Eric Cai   |       1 | 11000.0
      3 | Bonnie Liu |       2 | 11000.0
    (3 rows)                       
  • Query data in the hive.default.tbl_department table.
    select * from hive.default.tbl_department;
    Data similar to the following example is returned:
    dept_id | dept_name
    --------+----------
          1 | IT
          2 | Finance
    (2 rows)
  • Perform a cross query on data in the tables.
    Run the following command to perform a cross query on data in the hive.default.tbl_department and connector1.emruser.tbl_employee tables.
    select * from hive.default.tbl_department a, connector1.emruser.tbl_employee b where a.dept_id = b.dept_id;
    The following information is returned:
    dept_id |dept_name | id |    name    | dept_id | salary
    --------+----------+----+------------+---------+---------
          2 | Finance  |  3 | Bonnie Liu |       2 | 11000.0
          1 | IT       |  2 | Eric Cai   |       1 | 11000.0
          1 | IT       |  1 | Ming Li    |       1 | 10000.0
    (3 rows)