You can use the MySQL connector to query data in tables in an external MySQL instance and create tables in the instance. You can use the connector to join tables between two systems such as MySQL and Hive, or between two MySQL instances.
Background information
Prerequisites
- A Hadoop or data lake cluster with the Presto service deployed is created, or a Presto cluster is created. For more information, see Create a cluster.
- An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance.
Limits
- Only Hadoop or Presto clusters of EMR V3.38.0 and later support the MySQL connector.
- You can use the MySQL connector to connect only to MySQL 5.7, 8.0, or later.
- You must make sure that the coordinator node and all the worker nodes of Presto can access MySQL. The default port is 3306.
- The MySQL connector does not support the following SQL statements:
- DELETE
- GRANT
- REVOKE
- SHOW GRANTS
- SHOW ROLES
- SHOW ROLE GRANTS
Configure the MySQL connector
Modify the configurations of the MySQL connector. For more information, see Configure a connector.
Default configurations of the connector
Configuration item | Description |
---|---|
connection-url | The URL of the ApsaraDB RDS for MySQL database that you want to access. For more information,
see View and change the internal and public endpoints and port numbers of an ApsaraDB
RDS for MySQL instance.
Example: jdbc:mysql://rm-2ze5ipacsu8265q****.mysql.rds.aliyuncs.com:3306. |
connection-user | The username of the account that is used to access the database. You must make sure that the account has the permissions to access tables in the database. |
connection-password | The password of the account that is specified by the connection-user configuration item. |
Configure multiple MySQL services
If you want to connect to multiple MySQL services, you can create multiple configuration files in the etc/catalog directory. Make sure that the file names are different and the file name extension is .properties. For example, if you create a configuration file named sales.properties, Presto uses the connector configured in the file to create a catalog named sales.
Example
- Go to the Configure tab of the Presto service in the EMR console. On the Configure tab, click the mysql.properties tab. Modify the connection-user, connection-password, and connection-url configuration items.
You can also click Add Configuration Item in the upper part of the Configure tab to add custom configuration items. For more information, see Add configuration items..
- Save the configurations.
- In the lower part of the Configure tab, click Save.
- In the Save dialog box, enter a reason in the Execution Reason field, turn on Automatically Update Configurations, and then click Save.
- Restart the Presto service.
- In the upper-right corner, choose .
- In the dialog box that appears, enter a reason in the Execution Reason field and click OK.
- In the Confirm message, click OK.
Data type mappings
Processing of data of the DECIMAL type
To map the MySQL DECIMAL type whose precision is greater than 38 to the Presto DECIMAL type, you can specify the decimal_mapping configuration property or set the session property of the decimal_mapping configuration property to allow_overflow. The scope of the resulting data type is determined by the decimal-default-scale or decimal-rounding-mode configuration property, and the precision is fixed as 38.
By default, values that require rounding or truncation for mapping result in a failure at runtime. The rounding mode is determined by the decimal-rounding-mode configuration property or the session property of the decimal-rounding-mode configuration property. You can set this configuration property to UNNECESSARY, UP, DOWN, CEILING, FLOOR, HALF_UP, HALF_DOWN, or HALF_EVEN. The default value is UNNECESSARY.
General configuration properties
Property | Description |
---|---|
unsupported-type-handling | The method used to process columns whose data types are not supported. Valid values:
|
jdbc-types-mapped-to-varchar | Specifies whether to allow the forcible conversion of data that consists of elements separated by commas (,) into data of the unbounded VARCHAR type. |
case-insensitive-name-matching | Specifies whether the names of databases and tables are case-sensitive. Valid values:
|
case-insensitive-name-matching.cache-ttl | The cache time to live (TTL) of the database and table names that are not case-sensitive.
Default value: 1. Unit: minutes. |
metadata.cache-ttl | The cache TTL of metadata, which includes the statistical information of tables and
columns.
Default value: 0. A value of 0 specifies that metadata cannot be cached. |
metadata.cache-missing | Specifies whether to cache the status of metadata, which includes the statistical
information of tables and columns. Valid values:
|
Example: Query MySQL data
Pushdown
For more information about the pushdown feature, see Pushdown in the official Presto documentation.
- Operators
- Join
- Limit
- Top-N
- Functions
- avg()
- count()
- max()
- min()
- sum()
- stddev()
- stddev_pop()
- stddev_samp()
- variance()
- var_pop()
- var_samp()