AnalyticDB for MySQL Data Lakehouse Edition connects to ApsaraDB RDS for MySQL through Spark jobs using two methods: an elastic network interface (ENI) or an SSL connection. ENI-based access is simpler to configure. SSL-based access adds transport-layer encryption for environments with stricter data security requirements. Both methods support same-account and cross-account scenarios.
Choose a connection method
| Method | When to use | Security |
|---|---|---|
| ENI | The RDS instance and the AnalyticDB cluster are in the same VPC, or cross-account access with a RAM role is acceptable | Standard VPC-level isolation |
| SSL | Encrypted connections are required by compliance policy or the data is highly sensitive | Transport-layer encryption (SSL) on top of VPC |
Prerequisites
Before you begin, ensure that you have:
Cluster and instance setup:
An AnalyticDB for MySQL Data Lakehouse Edition cluster. See Create a cluster.
An ApsaraDB RDS for MySQL instance in the same region as the cluster. See Create an ApsaraDB RDS for MySQL instance.
A job resource group for the cluster. See Create a resource group.
An Object Storage Service (OSS) bucket in the same region as the cluster. See Activate OSS and Create buckets.
Accounts and permissions:
A database account for the AnalyticDB for MySQL cluster:
Alibaba Cloud account: create a privileged account. See the "Create a privileged account" section of Create a database account.
Resource Access Management (RAM) user: create a privileged account and a standard account, then associate the standard account with the RAM user. See Create a database account and Associate or disassociate a database account with or from a RAM user.
Authorization is complete. See Perform authorization.
Network and security:
The RDS instance is added to a security group whose inbound and outbound rules allow traffic on the instance's port. See Configure a security group for an ApsaraDB RDS for MySQL instance and Add a security group rule.
Same-account access requires the AliyunADBSparkProcessingDataRole permission. Cross-account access requires additional authorization for the other Alibaba Cloud accounts.
Prepare data
Run the following statements against your ApsaraDB RDS for MySQL instance to create the sample database and table used in the examples below:
CREATE DATABASE `test`;
CREATE TABLE `test`.`persons` (
  `id` int(11) DEFAULT NULL,
  `first_name` varchar(32) DEFAULT NULL,
  `last_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `test`.`persons` VALUES(1,'a','b',5);
INSERT INTO `test`.`persons` VALUES(2,'c','d',6);
INSERT INTO `test`.`persons` VALUES(3,'e','f',7);
Access ApsaraDB RDS for MySQL over an ENI
All examples in this section use spark.adb.eni.enabled: true to route Spark job traffic through an elastic network interface to the RDS instance.
Step 1: Prepare the driver and JAR package
Write a Spark program that reads from the RDS table, then compile it into a JAR package. The example below produces a JAR named rds_test.jar:
package com.aliyun.spark
import org.apache.spark.sql.SparkSession
object SparkRDS {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("rds mysql test")
      .getOrCreate()
    // Internal endpoint of the ApsaraDB RDS for MySQL instance.
    // See "View and change the endpoints and port numbers" for details.
    val url = "jdbc:mysql://rm-bp11mpql1e01****.mysql.rds.aliyuncs.com"
    // Table name in db_name.table_name format
    val dbtable = "test.persons"
    // Database account credentials. Pass via args or environment variables in production.
    val user = "mysql_username"
    val password = "mysql_password"
    val jdbcDF = sparkSession.read
      .format("jdbc")
      .option("url", url)
      // Driver class name for MySQL Connector/J 8.x
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", dbtable)
      .option("user", user)
      .option("password", password)
      .load()
    jdbcDF.show()
  }
}
Download the MySQL Connector/J driver that matches your RDS engine version from https://dev.mysql.com/downloads/connector/j/. This example uses mysql-connector-java-8.0.11.jar.
Upload both the JAR package (rds_test.jar) and the driver (mysql-connector-java-8.0.11.jar) to your OSS bucket. See Upload objects.
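The sample program above hardcodes the database credentials for brevity. A minimal sketch of the pattern its comment recommends for production: resolve each credential from a positional argument, falling back to an environment variable. The object and method names, and the RDS_USER/RDS_PASSWORD variable names, are illustrative choices, not part of any AnalyticDB or Spark API.

```scala
object Credentials {
  // Prefer the positional argument; fall back to an environment variable.
  // Fails fast with a clear message when neither is available.
  def resolve(args: Array[String], index: Int, envVar: String): String =
    if (args.length > index) args(index)
    else sys.env.getOrElse(envVar, sys.error(s"missing credential: set $envVar"))

  def main(args: Array[String]): Unit = {
    val user = resolve(args, 0, "RDS_USER")
    val password = resolve(args, 1, "RDS_PASSWORD")
    require(password.nonEmpty)
    // Hand user/password to the JDBC .option(...) calls instead of literals.
    println(s"resolved credentials for user $user")
  }
}
```

Submitting the credentials through `args` keeps them out of the compiled JAR; the SSL example later in this topic uses the same argument-passing approach.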
Step 2: Submit the Spark job
Same-account access
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters. On the Clusters page, click an edition tab, find the cluster, and click its ID.
In the left-side navigation pane, choose Job Development > Spark JAR Development.
Select a job resource group and set the job type to Batch.
Paste the following configuration into the Spark editor, replacing the placeholder values:
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Name of the Spark job |
| jars | Yes | OSS path of the MySQL Connector/J driver JAR |
| file | Yes | OSS path of the Spark job JAR package |
| className | Yes | Entry class of the Java or Scala application. In this example: com.aliyun.spark.SparkRDS |
| conf | Yes | Spark configuration parameters in key:value format, separated by commas. See Spark application configuration parameters |
| spark.adb.eni.enabled | Yes | Set to true to enable ENI-based routing |
| spark.adb.eni.vswitchId | Yes | vSwitch ID of the RDS instance. To find it, move the pointer over VPC on the Database Connection page |
| spark.adb.eni.securityGroupId | Yes | ID of the security group that the RDS instance belongs to. See Configure a security group for an ApsaraDB RDS for MySQL instance |
| spark.driver.resourceSpec | Yes | Resource specification for the Spark driver |
| spark.executor.instances | Yes | Number of Spark executor instances |
| spark.executor.resourceSpec | Yes | Resource specification for each Spark executor |
{
  "name": "rds-mysql-example",
  "jars": [
    "oss://testBucketName/mysql-connector-java-8.0.11.jar"
  ],
  "file": "oss://testBucketName/rds_test.jar",
  "className": "com.aliyun.spark.SparkRDS",
  "conf": {
    "spark.adb.eni.enabled": "true",
    "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****",
    "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****",
    "spark.driver.resourceSpec": "small",
    "spark.executor.instances": 1,
    "spark.executor.resourceSpec": "small"
  }
}
Click Run Now.
After the job completes, verify the result in the Spark job logs. See the View information about a Spark application section of the Spark editor topic.
Cross-account access
Follow steps 1–3 from Same-account access, then use the configuration below. It adds spark.adb.eni.roleArn to assume a RAM role in the account that owns the RDS instance.
{
"name": "rds-mysql-example",
"jars": [
"oss://testBucketName/mysql-connector-java-8.0.11.jar"
],
"file": "oss://testBucketName/rds_test.jar",
"className": "com.aliyun.spark.SparkRDS",
"conf": {
"spark.adb.eni.enabled": "true",
"spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****",
"spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****",
"spark.driver.resourceSpec": "small",
"spark.executor.instances": 1,
"spark.executor.resourceSpec": "small",
"spark.adb.eni.roleArn": "acs:ram::testAccountID:role/testUserName"
}
}
| Parameter | Required | Description |
|---|---|---|
| spark.adb.eni.roleArn | Yes (cross-account only) | ARN of the RAM role used to access the RDS instance across accounts. Separate multiple ARNs with commas. Format: acs:ram::testAccountID:role/testUserName, where testAccountID is the ID of the account that owns the RDS instance and testUserName is the RAM role name created during cross-account authorization. See Perform authorization across Alibaba Cloud accounts |
All other parameters are the same as in the same-account configuration above.
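To make the documented ARN format concrete, here is a small helper that assembles the spark.adb.eni.roleArn value from account ID and role name pairs. The object and method names are ours, chosen only for illustration; the acs:ram::&lt;accountId&gt;:role/&lt;roleName&gt; pattern and the comma separator come from the parameter description above.

```scala
object RoleArn {
  // One ARN in the documented format: acs:ram::<accountId>:role/<roleName>
  def build(accountId: String, roleName: String): String =
    s"acs:ram::$accountId:role/$roleName"

  // Multiple ARNs are separated with commas, per the parameter description.
  def buildAll(pairs: Seq[(String, String)]): String =
    pairs.map { case (id, role) => build(id, role) }.mkString(",")
}
```

For example, `RoleArn.build("testAccountID", "testUserName")` yields the placeholder value used in the configuration above.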
Click Run Now, then verify the result in the Spark job logs.
Access ApsaraDB RDS for MySQL over an SSL connection
Before proceeding, enable SSL encryption for the ApsaraDB RDS for MySQL instance and make sure the internal endpoint connection is encrypted. See Configure the SSL encryption feature.
Step 1: Download the CA certificate and upload to OSS
Log on to the ApsaraDB RDS console. In the upper-left corner, select a region. In the left-side navigation pane, click Instances, then click the instance ID.
In the left-side navigation pane, click Data Security.
Click Download CA Certificate.
Important: The CA certificate is valid for one year. After it expires, generate a new one. An expired certificate cannot be used to establish an SSL connection.
Decompress the downloaded package, then upload the extracted JKS file (ApsaraDB-CA-Chain.jks) to your OSS bucket. See Upload objects.
Step 2: Prepare the Spark JAR package
Write a Spark program that connects to RDS over SSL, then compile it into test.jar. The program reads the JKS certificate path and database credentials from command-line arguments, keeping sensitive values out of the source code:
package org.example
import org.apache.spark.sql.SparkSession
object Test {
def main(args: Array[String]): Unit = {
// OSS path of the JKS certificate file, e.g. oss://testBucketName/folder/ApsaraDB-CA-Chain.jks
val JKS_FILE_PATH = args(0)
// Database account credentials
val USERNAME = args(1)
val PASSWORD = args(2)
// Database and table names
val DATABASE_NAME = args(3)
val TABLE_NAME = args(4)
// Internal endpoint of the ApsaraDB RDS for MySQL instance.
// The JKS file is downloaded to /tmp/<OSS_PATH> at runtime by the ADB_SPARK_DOWNLOAD_FILES setting.
val mysqlUrl = "jdbc:mysql://rm-bp11mpql1e01****.mysql.rds.aliyuncs.com:3306/?" +
"useSSL=true" +
s"&trustCertificateKeyStoreUrl=file:///tmp/testBucketName/folder/ApsaraDB-CA-Chain.jks" +
"&trustCertificateKeyStorePassword=apsaradb" +
"&trustCertificateKeyStoreType=JKS"
val spark = SparkSession.builder().getOrCreate()
spark.read.format("jdbc")
.option("driver", "com.mysql.cj.jdbc.Driver")
.option("url", mysqlUrl)
.option("user", USERNAME)
.option("password", PASSWORD)
.option("dbtable", s"${DATABASE_NAME}.${TABLE_NAME}")
.load()
.show()
}
}
The JDBC connection URL uses the following SSL parameters:
| Parameter | Value | Required | Description |
|---|---|---|---|
| useSSL | true (default: false) | Yes | Enables SSL encryption for the JDBC connection |
| trustCertificateKeyStoreUrl | file:///tmp/<JKS_FILE_PATH> | Yes | Local path of the JKS certificate file. The runtime maps the OSS path to /tmp/<OSS_PATH>. For example, oss://testBucketName/folder/ApsaraDB-CA-Chain.jks maps to file:///tmp/testBucketName/folder/ApsaraDB-CA-Chain.jks |
| trustCertificateKeyStorePassword | apsaradb | Yes | Fixed password for the JKS certificate file |
| trustCertificateKeyStoreType | JKS | Yes | Certificate storage format |
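The OSS-to-local mapping in the trustCertificateKeyStoreUrl row follows a fixed rule: strip the oss:// scheme and prefix the rest with /tmp. A minimal sketch of that derivation, assuming only the mapping stated in the table (the object and method names are ours, for illustration):

```scala
object JksPath {
  // Map an OSS object path to the local file URL where the runtime downloads
  // it, per the ADB_SPARK_DOWNLOAD_FILES convention described in this topic:
  //   oss://bucket/folder/file.jks -> file:///tmp/bucket/folder/file.jks
  def toTrustStoreUrl(ossPath: String): String = {
    require(ossPath.startsWith("oss://"), s"not an OSS path: $ossPath")
    "file:///tmp/" + ossPath.stripPrefix("oss://")
  }
}
```

Deriving the URL this way keeps the Spark program consistent with whatever OSS path you set in ADB_SPARK_DOWNLOAD_FILES, instead of maintaining the two strings by hand.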
Upload test.jar to your OSS bucket. See Upload objects.
Step 3: Submit the Spark job
Same-account access
Log on to the AnalyticDB for MySQL console. Select a region, click Clusters, find the cluster, and click its ID.
In the left-side navigation pane, choose Job Development > Spark JAR Development.
Select a job resource group and set the job type to Batch.
Paste the following configuration into the Spark editor:
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Name of the Spark job |
| file | Yes | OSS path of the Spark job JAR package |
| className | Yes | Entry class of the Java or Scala application. In this example: org.example.Test |
| conf | Yes | Spark configuration parameters in key:value format, separated by commas. See Spark application configuration parameters |
| spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES | Yes | OSS path of the JKS certificate file for the Spark driver. Separate multiple files with commas. Example: oss://testBucketName/a.jks,oss://testBucketName/b.jks |
| spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES | Yes | OSS path of the JKS certificate file for the Spark executor. Separate multiple files with commas |
| spark.adb.eni.enabled | Yes | Set to true to enable ENI-based routing |
| spark.adb.eni.vswitchId | Yes | vSwitch ID of the RDS instance. Find it by hovering over VPC on the Database Connection page |
| spark.adb.eni.securityGroupId | Yes | ID of the security group that the RDS instance belongs to. See Configure a security group for an ApsaraDB RDS for MySQL instance |
| spark.driver.resourceSpec | Yes | Resource specification for the Spark driver |
| spark.executor.instances | Yes | Number of Spark executor instances |
| spark.executor.resourceSpec | Yes | Resource specification for each Spark executor |
{
  "file": "oss://testBucketName/test.jar",
  "className": "org.example.Test",
  "name": "MYSQL PEM Test",
  "conf": {
    "spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks",
    "spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks",
    "spark.driver.resourceSpec": "small",
    "spark.executor.instances": 1,
    "spark.executor.resourceSpec": "small",
    "spark.adb.eni.enabled": "true",
    "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****",
    "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****"
  }
}
Click Run Now.
After the job completes, verify the result in the Spark job logs. See the View information about a Spark application section of the Spark editor topic.
Cross-account access
Follow steps 1–3 from Same-account access, then add spark.adb.eni.roleArn to the configuration:
{
"file": "oss://testBucketName/test.jar",
"className": "org.example.Test",
"name": "MYSQL PEM Test",
"conf": {
"spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks",
"spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks",
"spark.driver.resourceSpec": "small",
"spark.executor.instances": 1,
"spark.executor.resourceSpec": "small",
"spark.adb.eni.enabled": "true",
"spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****",
"spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****",
"spark.adb.eni.roleArn": "acs:ram::testAccountID:role/testUserName"
}
}
spark.adb.eni.roleArn is the only additional parameter. Its format and behavior are the same as described in Cross-account access for the ENI method.
Click Run Now, then verify the result in the Spark job logs.