This topic describes how to use the serverless Spark engine of Data Lake Analytics (DLA) to access ApsaraDB RDS databases in a virtual private cloud (VPC) over an elastic network interface (ENI).

Prerequisites

The serverless Spark engine is authorized to access your VPC. For more information about how to authorize the serverless Spark engine to access a VPC, see Configure the network of data sources.

Configure a whitelist for an ApsaraDB RDS instance

For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance. Add the classless inter-domain routing (CIDR) block of the vSwitch where the ENI resides to a whitelist of the ApsaraDB RDS instance. Alternatively, add the security group to which the ENI belongs to the security group of the ApsaraDB RDS instance.
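The whitelist check is a prefix match: a connection is allowed when the client IP address falls inside a whitelisted CIDR block. The following minimal Scala sketch (the addresses are illustrative, not real endpoints) shows how such a check works, which can help you confirm that the private IP address of the ENI is covered by the vSwitch CIDR block that you added:

```scala
object CidrCheck {
  // Convert a dotted-quad IPv4 address to an unsigned 32-bit value.
  private def toLong(ip: String): Long =
    ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toInt)

  // Return true if the IPv4 address falls inside the CIDR block,
  // e.g. contains("192.168.0.0/24", "192.168.0.10").
  def contains(cidr: String, ip: String): Boolean = {
    val Array(base, bits) = cidr.split('/')
    val prefix = bits.toInt
    val mask =
      if (prefix == 0) 0L
      else (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL
    (toLong(base) & mask) == (toLong(ip) & mask)
  }
}
```

For example, if the vSwitch CIDR block is 192.168.0.0/24, an ENI IP address of 192.168.0.10 is covered, while 192.168.1.10 is not.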

Write test data to an ApsaraDB RDS database

For more information, see Use DMS to log on to an ApsaraDB RDS for MySQL instance. The following statements create the test table and insert the test data:
CREATE TABLE `persons` (
  `id` int(11) DEFAULT NULL,
  `first_name` varchar(32) DEFAULT NULL,
  `last_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

INSERT INTO persons VALUES(1,'a','b',5);
INSERT INTO persons VALUES(2,'c','d',6);
INSERT INTO persons VALUES(3,'e','f',7);

Write a Spark application to access the ApsaraDB RDS database

The Spark application uses the serverless Spark engine to connect to the ApsaraDB RDS database based on the connection information that you pass in, reads the test table, and displays its content. The following sample code shows the application:
package com.aliyun.spark

import org.apache.spark.sql.SparkSession

object SparkRDS {

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("rds test")
      .getOrCreate()

    val url = args(0)      // JDBC URL of the ApsaraDB RDS database
    val dbtable = args(1)  // Name of the table to read
    val user = args(2)     // Database account
    val password = args(3) // Password of the database account

    val jdbcDF = sparkSession.read
      .format("jdbc")
      .option("url", url)
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", dbtable)
      .option("user", user)
      .option("password", password)
      .load()

    jdbcDF.show()
  }

}
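The first argument that is passed to the application must be a complete JDBC URL. As a minimal sketch (the host name and database name below are hypothetical placeholders, not real endpoints), a small helper can assemble the URL from its parts:

```scala
object JdbcUrlBuilder {
  // Assemble the JDBC URL that is passed to the application as args(0).
  // The host, port, and database values are supplied by the caller.
  def mysqlUrl(host: String, port: Int, database: String): String =
    s"jdbc:mysql://$host:$port/$database"
}
```

For example, `JdbcUrlBuilder.mysqlUrl("rm-example.mysql.rds.aliyuncs.com", 3306, "testdb")` returns `jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/testdb`, which matches the format expected in the job arguments below.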

Upload files to OSS

Compile the code and package it into a JAR file. Then, upload the JAR file of the Spark application and the MySQL driver dependency to Object Storage Service (OSS). You can click the link to download the MySQL driver dependency.

For more information, see Upload objects.

Submit a job

Write a spark-submit script for the serverless Spark engine. For more information, see Create and run Spark jobs. The following code shows the script:
{
    "args": [
        "jdbc:mysql://URL of your ApsaraDB RDS database",
        "persons",
        "spark",
        "Password to access your ApsaraDB RDS database"
    ],
    "name": "changqing-dla-test",
    "jars": [
        "oss://changqing-dla-test/mysql-connector-java.jar"
    ],
    "file": "oss://changqing-dla-test/rds_test.jar",
    "className": "com.aliyun.spark.SparkRDS",
    "conf": {
        "spark.dla.eni.enable": "true",
        "spark.dla.eni.vswitch.id": "ID of the vSwitch that you selected",
        "spark.dla.eni.security.group.id": "ID of the security group that you selected",
        "spark.driver.resourceSpec": "medium",
        "spark.executor.instances": 1,
        "spark.executor.resourceSpec": "medium"
    }
}
After the job succeeds, click Log in the Operation column. If the data of the persons table appears in the log, the serverless Spark engine can access the ApsaraDB RDS database.