AnalyticDB for MySQL allows you to submit a Spark SQL job to access self-managed SQL Server databases or ApsaraDB RDS for SQL Server in view mode. This topic describes how to use Spark SQL to access data from ApsaraDB RDS for SQL Server.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
A job resource group is created for the AnalyticDB for MySQL cluster.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account, you need to only create a privileged account.
If you use a Resource Access Management (RAM) user, you must create a privileged account and a standard account and associate the standard account with the RAM user.
An ApsaraDB RDS for SQL Server instance is created in the same virtual private cloud (VPC) as the AnalyticDB for MySQL cluster.
The ApsaraDB RDS for SQL Server instance is added to a security group. The inbound and outbound rules of the security group allow requests from the port of the ApsaraDB RDS for SQL Server instance.
The CIDR block of the vSwitch to which the ApsaraDB RDS for SQL Server instance is connected is added to an IP address whitelist of the ApsaraDB RDS for SQL Server instance.
Step 1: Prepare data
Create a database in the ApsaraDB RDS for SQL Server instance, create a table in the database, and then insert data into the table. Sample statements:
CREATE DATABASE db;
CREATE TABLE test (
id INT NULL,
first_name NVARCHAR(32) NULL,
last_name NVARCHAR(32) NULL,
age INT NULL
);
INSERT INTO test VALUES(1,'a','b',5);
INSERT INTO test VALUES(2,'c','d',6);
INSERT INTO test VALUES(3,'e','f',7);Step 2: Download and upload the ApsaraDB RDS for SQL Server driver
Decompress the downloaded driver package. Upload the JAR package with the suffix jre8.jar in the jars folder to Object Storage Service (OSS).
Step 3: Submit a Spark SQL job
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
On the SQLConsole tab, select the Spark engine and a job resource group.
On the SQLConsole tab, configure the parameters based on the access mode.
ADD jar oss://testBucketName/mssql-jdbc-12.8.1.jre8.jar; SET spark.adb.eni.enabled=true; SET spark.adb.eni.vswitchId=<vsw-bp1sxxsodv28ey5dl****>; SET spark.adb.eni.securityGroupId=<sg-bp19mr685pmg4ihc****>; CREATE TEMPORARY VIEW table_tmp USING org.apache.spark.sql.jdbc OPTIONS ( url '<jdbc:mysql://rm-t4nn744j34****.mssql.singapore.rds.aliyuncs.com:1433;databaseName=db;trustServerCertificate=true;encrypt=true;>', driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver', dbtable '<tableName>', user '<user>', password '<password>' ); INSERT INTO table_tmp VALUES(4,'e','f',8); SELECT * FROM table_tmp;The following table describes the parameters.
Parameter
Required
Description
ADD jar
Yes
The OSS path of the ApsaraDB RDS for SQL Server driver.
In this example, the OSS path of the mssql-jdbc-12.8.1.jre8.jar file is used.
spark.adb.eni.enabled
Yes
Specifies whether to enable Elastic Network Interface (ENI).
When you access data, you must set the
spark.adb.eni.enabledparameter to true.spark.adb.eni.vswitchId
Yes
The vSwitch ID of the ApsaraDB RDS for SQL Server instance. To view the vSwitch ID, move the pointer over VPC on the Database Connection page.
spark.adb.eni.securityGroupId
Yes
The ID of the security group to which the ApsaraDB RDS for SQL Server instance is added. For more information, see Configure security group settings.
USING org.apache.spark.sql.jdbc
Yes
The value of this parameter is automatically set to
USING org.apache.spark.sql.jdbc.OPTIONS
Yes
url: the internal endpoint and port number of the ApsaraDB RDS for SQL Server instance. Format:
jdbc:sqlserver://rm-t4nn744j34****.mssql.singapore.rds.aliyuncs.com:1433;databaseName=db;trustServerCertificate=true;encrypt=true;. Set the databaseName parameter to the actual name of the database. Set thetrustServerCertificateandencryptparameters totrue.driver: The value of this parameter is automatically set to
com.microsoft.sqlserver.jdbc.SQLServerDriver.dbtable: the name of the table in the ApsaraDB RDS for SQL Server instance. Format:
db_name.table_name. In this example, db.test is used.user: the name of the database account of the ApsaraDB RDS for SQL Server instance.
password: the password of the database account.
Click Execute.
After you run the Spark job, go to the Spark JAR Development page and click Logs in the Actions column on the Applications tab to view the table data. For more information, see Spark editor.