
AnalyticDB: Read data from MySQL by using Spark SQL

Last Updated: Mar 30, 2026

AnalyticDB for MySQL lets you submit Spark SQL jobs to query data from self-managed MySQL databases, ApsaraDB RDS for MySQL, or PolarDB for MySQL. This topic shows how to read ApsaraDB RDS for MySQL data using two access modes: view mode and catalog mode.

Prerequisites

Before you begin, ensure that you have:

  - An AnalyticDB for MySQL Data Lakehouse Edition cluster with a job resource group.
  - An ApsaraDB RDS for MySQL instance that the cluster can reach.
  - An Object Storage Service (OSS) bucket in the same region, if you plan to use SSL encryption (see Step 2).

Step 1: Prepare data

Create a database and table in the ApsaraDB RDS for MySQL instance, then insert sample rows:

CREATE DATABASE `db`;

CREATE TABLE `db`.`test` (
  `id` int(11) DEFAULT NULL,
  `first_name` varchar(32) DEFAULT NULL,
  `last_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `db`.`test` VALUES(1,'a','b',5);
INSERT INTO `db`.`test` VALUES(2,'c','d',6);
INSERT INTO `db`.`test` VALUES(3,'e','f',7);

Step 2: (Optional) Upload a CA certificate and the ApsaraDB RDS for MySQL driver

Skip this step if you do not need SSL encryption. Go directly to Step 3: Submit a Spark SQL job.
  1. Enable SSL encryption on the ApsaraDB RDS for MySQL instance and download the CA certificate. See Configure the SSL encryption feature.

    Important

    You can encrypt an internal or public connection. This example uses an internal connection. The CA certificate is valid for one year. Generate a new certificate before it expires — an expired certificate cannot establish an SSL connection.

  2. Download the JDBC driver that matches your ApsaraDB RDS for MySQL engine version from https://dev.mysql.com/downloads/connector/j/. This example uses mysql-connector-java-8.0.11.jar.

  3. Extract the CA certificate package. Upload the JKS file and the driver JAR to Object Storage Service (OSS). See Upload objects.

Step 3: Submit a Spark SQL job

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and a job resource group.

  4. Enter and run the SQL for your access mode.

Important

The examples below use placeholder values for user and password. Do not hardcode credentials in production jobs. Store them securely and reference them at runtime.
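One way to keep credentials out of the saved job text is to fill the user and password placeholders at submission time. The sketch below is illustrative only: the environment variable names RDS_USER and RDS_PASSWORD and the helper itself are assumptions, not an AnalyticDB convention.

```python
import os
from string import Template

# Hypothetical helper: substitute credentials from environment variables
# into the Spark SQL job template at submission time, so the stored job
# text never contains a literal password. RDS_USER / RDS_PASSWORD are
# illustrative names chosen for this sketch.
SQL_TEMPLATE = Template("""
CREATE TEMPORARY VIEW table_tmp
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://${endpoint}:3306',
  dbtable 'db.test',
  user '${user}',
  password '${password}'
);
""")

def render_job_sql(endpoint: str) -> str:
    return SQL_TEMPLATE.substitute(
        endpoint=endpoint,
        user=os.environ["RDS_USER"],
        password=os.environ["RDS_PASSWORD"],
    )
```

The rendered SQL can then be pasted into the SQLConsole tab or passed to whatever submission tooling you use.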

View mode

In view mode, Spark SQL creates a temporary view over the MySQL table using JDBC. The view is session-scoped — recreate it each time you start a new job.

Choose one of the following connection methods.

Connect over an elastic network interface (ENI)

set spark.adb.eni.enabled=true;
set spark.adb.eni.vswitchId=<vsw-bp1sxxsodv28ey5dl****>;
set spark.adb.eni.securityGroupId=<sg-bp19mr685pmg4ihc****>;

CREATE TEMPORARY VIEW table_tmp
USING org.apache.spark.sql.jdbc
OPTIONS (
  url '<jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306>',
  dbtable '<db.test>',
  user '<user>',
  password '<password>'
);

SELECT * FROM table_tmp;

Replace the placeholders with your actual values:

Parameter Description
spark.adb.eni.enabled Set to true to enable ENI routing
spark.adb.eni.vswitchId The vSwitch ID of the ApsaraDB RDS for MySQL instance. On the Database Connection page, hover over VPC to view the vSwitch ID
spark.adb.eni.securityGroupId The security group ID. See Configure a security group for an ApsaraDB RDS for MySQL instance
url The internal endpoint and port of the instance. Format: jdbc:mysql://<endpoint>:<port>
dbtable The table to read. Format: db_name.table_name. This example uses db.test
user The database account name
password The database account password
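A malformed `url` option is a common cause of connection failures. The helper below is a small sketch, not part of any SDK, that composes the url in the jdbc:mysql://<endpoint>:<port> format described above and rejects obviously malformed values before the job is submitted.

```python
import re

# Illustrative helper: build the `url` OPTION for view mode from an
# instance endpoint and port, and sanity-check the format
# jdbc:mysql://<endpoint>:<port> described in the parameter table.
def jdbc_url(endpoint: str, port: int = 3306) -> str:
    url = f"jdbc:mysql://{endpoint}:{port}"
    if not re.fullmatch(r"jdbc:mysql://[\w.-]+:\d+", url):
        raise ValueError(f"malformed JDBC URL: {url}")
    return url
```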

Connect over SSL

add jar oss://<bucketname>/mysql-connector-java-8.0.11.jar;
set spark.app.name=SSL_RDS_SQL;
set spark.adb.eni.enabled=true;
set spark.adb.eni.vswitchId=<vsw-bp1sxxsodv28ey5dl****>;
set spark.adb.eni.securityGroupId=<sg-bp19mr685pmg4ihc****>;
set spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES=oss://<bucketname>/ApsaraDB-CA-Chain.jks;
set spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES=oss://<bucketname>/ApsaraDB-CA-Chain.jks;

CREATE TEMPORARY VIEW table_view
USING org.apache.spark.sql.jdbc
OPTIONS (
  url '<jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306/?useSSL=true&clientCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&clientCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&trustCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreType=JKS&clientCertificateKeyStoreType=JKS>',
  dbtable '<db.test>',
  user '<user>',
  password '<password>'
);

Replace the placeholders with your actual values:

Parameter Description
add jar The OSS path of the JDBC driver. This example uses mysql-connector-java-8.0.11.jar
spark.app.name The Spark SQL job name
spark.adb.eni.enabled Set to true to enable ENI routing
spark.adb.eni.vswitchId The vSwitch ID of the instance. On the Database Connection page, hover over VPC to view the ID
spark.adb.eni.securityGroupId The security group ID
spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES The OSS path of the JKS certificate for the Spark driver. Separate multiple paths with commas. Example: oss://testBucketName/a.jks,oss://testBucketName/b.jks
spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES The OSS path of the JKS certificate for the Spark executor. Separate multiple paths with commas
url The internal endpoint, port, and SSL parameters. The JKS keystore password is apsaradb and the store type is JKS. For SSL parameter details, see the Parameters section
dbtable The table to read. Format: db_name.table_name. This example uses db.test
user The database account name
password The database account password
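The file:// paths inside the SSL url suggest that files listed in ADB_SPARK_DOWNLOAD_FILES are downloaded to /tmp/<bucket>/<object path> inside the Spark driver and executors. The sketch below assumes exactly that mapping, which is inferred from the example url rather than stated by the documentation, so verify the actual download location in your environment before relying on it.

```python
# Sketch, assuming files in ADB_SPARK_DOWNLOAD_FILES land under
# /tmp/<bucket>/<object path>, as the file:// paths in the example
# URL suggest. Verify against your own job logs.
def local_path(oss_uri: str) -> str:
    # oss://bucket/folder/file.jks -> /tmp/bucket/folder/file.jks
    assert oss_uri.startswith("oss://"), "expected an oss:// URI"
    return "/tmp/" + oss_uri[len("oss://"):]

def ssl_jdbc_url(endpoint: str, jks_oss_uri: str,
                 store_password: str = "apsaradb") -> str:
    # Compose the SSL connection URL with the same parameters as the
    # example above (JKS keystore used as both client and trust store).
    jks = "file://" + local_path(jks_oss_uri)
    params = {
        "useSSL": "true",
        "clientCertificateKeyStoreUrl": jks,
        "clientCertificateKeyStorePassword": store_password,
        "trustCertificateKeyStoreUrl": jks,
        "trustCertificateKeyStorePassword": store_password,
        "trustCertificateKeyStoreType": "JKS",
        "clientCertificateKeyStoreType": "JKS",
    }
    query = "&".join(f"{k}={v}" for k, v in params.items())
    return f"jdbc:mysql://{endpoint}:3306/?{query}"
```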

Catalog mode

In catalog mode, you configure the MySQL data source using Spark's JDBC catalog interface. Run the following statements to register the catalog and query data.

set spark.adb.eni.enabled=true;
set spark.adb.eni.vswitchId=<vsw-bp1d14ddiw46fkgu1****>;
set spark.adb.eni.securityGroupId=<sg-bp19varsa8j0hyb****>;
set spark.sql.catalog.jdbc=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog;
set spark.sql.catalog.jdbc.url=<jdbc:mysql://rm-bp11mpql1e01l****.mysql.rds.aliyuncs.com:3306>;
set spark.sql.catalog.jdbc.user=<user>;
set spark.sql.catalog.jdbc.password=<password>;
USE jdbc;
SELECT * FROM db.test;

Replace the placeholders with your actual values:

Parameter Description
spark.sql.catalog.jdbc The catalog implementation class. Always set to org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog. The suffix jdbc is the catalog name — replace it with a custom name if needed (for example, spark.sql.catalog.mycatalog)
spark.sql.catalog.jdbc.url The internal endpoint and port of the instance. Format: jdbc:mysql://<endpoint>:<port>
spark.sql.catalog.jdbc.user The database account name
spark.sql.catalog.jdbc.password The database account password
For the full list of JDBC parameters supported by Spark SQL, see JDBC to other databases.
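The `set` statements above map naturally onto a small dictionary of configuration keys. The helper below is a hypothetical sketch that assumes only the key pattern spark.sql.catalog.<name>[.suffix] shown in the table; it is not an AnalyticDB or Spark API.

```python
# Illustrative sketch: generate the spark.sql.catalog.* settings for a
# JDBC catalog under a custom name, following the key pattern shown in
# the parameter table above.
JDBC_CATALOG_CLASS = (
    "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog"
)

def jdbc_catalog_conf(name: str, url: str, user: str, password: str) -> dict:
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: JDBC_CATALOG_CLASS,
        f"{prefix}.url": url,
        f"{prefix}.user": user,
        f"{prefix}.password": password,
    }
```

With a custom name such as mycatalog, the resulting keys match the renamed form described for spark.sql.catalog.jdbc, and the session would then select it with USE mycatalog.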