AnalyticDB for MySQL supports Spark SQL jobs that read from and write to ApsaraDB RDS for PostgreSQL databases over JDBC. This tutorial walks through setting up the JDBC driver, creating a temporary view, and running read and write queries against an RDS for PostgreSQL table.
Prerequisites
Before you begin, ensure that you have:
-
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster
-
A job resource group created for the cluster — Create a resource group
-
A database account for the AnalyticDB for MySQL cluster:
-
Alibaba Cloud account: a privileged account
-
Resource Access Management (RAM) user: a privileged account and a standard account, with the standard account associated with the RAM user
-
-
An RDS for PostgreSQL instance in the same VPC as the AnalyticDB for MySQL cluster
-
A database account for the RDS for PostgreSQL instance
-
A security group configured for the RDS for PostgreSQL instance, with rules that allow inbound and outbound access on the PostgreSQL port
-
The vSwitch network segment associated with the RDS for PostgreSQL instance added to the RDS for PostgreSQL whitelist
Step 1: Prepare data in RDS for PostgreSQL
In your RDS for PostgreSQL instance, create a schema named demo and a table named tbl, then insert sample rows:
CREATE SCHEMA demo;
CREATE TABLE demo.tbl (
id INTEGER,
first_name VARCHAR(32),
last_name VARCHAR(32),
age INTEGER
);
INSERT INTO demo.tbl VALUES(1, 'a', 'b', 5);
INSERT INTO demo.tbl VALUES(2, 'c', 'd', 6);
INSERT INTO demo.tbl VALUES(3, 'e', 'f', 7);
Step 2: Download and upload the PostgreSQL JDBC driver
Spark requires the PostgreSQL JDBC driver on the classpath to open a JDBC connection at runtime. Upload the driver to Object Storage Service (OSS) so that Spark can load it when the job starts.
-
Download the driver that matches your RDS for PostgreSQL version from the official website. This tutorial uses postgresql-42.7.7.jar.
-
Upload the JAR file to OSS.
Step 3: Submit the Spark SQL job
-
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, then click the target cluster ID.
-
In the left-side navigation pane, choose Job Development > SQL Development.
-
In the SQLConsole window, select the Spark engine and the job-specific resource group.
-
Enter the following statements. Replace the placeholders with your actual values.
-- Load the PostgreSQL JDBC driver from OSS ADD jar oss://<OSS_BUCKET_NAME>/postgresql-42.7.7.jar; -- Enable elastic network interface (ENI) access so Spark can reach the RDS for PostgreSQL instance SET spark.adb.eni.enabled=true; SET spark.adb.eni.vswitchId=<VSWITCH_ID>; SET spark.adb.eni.securityGroupId=<SECURITY_GROUP_ID>; -- Create a temporary view mapped to the RDS for PostgreSQL table CREATE TEMPORARY VIEW table_tmp USING org.apache.spark.sql.jdbc OPTIONS ( url 'jdbc:postgresql://<RDS_ENDPOINT>:5432/<DB_NAME>', driver 'org.postgresql.Driver', dbtable 'demo.tbl', user '<DB_USER>', password '<DB_PASSWORD>' ); -- Write: insert a new row into the RDS for PostgreSQL table through the view INSERT INTO table_tmp VALUES(4, 'e', 'f', 8); -- Read: query all rows from the RDS for PostgreSQL table through the view SELECT * FROM table_tmp;Replace each placeholder with the appropriate value:
Placeholder Description Example <OSS_BUCKET_NAME>Name of the OSS bucket where you uploaded the driver my-bucket<VSWITCH_ID>ID of the vSwitch associated with the RDS for PostgreSQL instance. Find it on the instance's Database Connection page by hovering over the VPC. vsw-bp1sxxsodv28ey5dl****<SECURITY_GROUP_ID>ID of the security group associated with the RDS for PostgreSQL instance sg-bp19mr685pmg4ihc****<RDS_ENDPOINT>Connection endpoint of the RDS for PostgreSQL instance pgm-t4n37****.pgsql.singapore.rds.aliyuncs.com<DB_NAME>Name of the target database test<DB_USER>Database account of the RDS for PostgreSQL instance myuser<DB_PASSWORD>Password for the database account — The following table describes the key parameters:
Parameter Description Scope ADD jarOSS path of the PostgreSQL JDBC driver JAR Read/Write spark.adb.eni.enabledEnables ENI-based network access. Set to true.Read/Write spark.adb.eni.vswitchIdvSwitch ID for the RDS for PostgreSQL instance Read/Write spark.adb.eni.securityGroupIdSecurity group ID associated with the RDS for PostgreSQL instance Read/Write table_tmpThe name of the temporary view. This tutorial uses table_tmpas an example.Read/Write USING org.apache.spark.sql.jdbcFixed value. Specifies the JDBC data source. Read/Write urlJDBC connection string. Format: jdbc:postgresql://<host>:5432/<db_name>Read/Write driverFixed value: org.postgresql.DriverRead/Write dbtableTable to access. Format: schema_name.table_nameRead/Write userDatabase account of the RDS for PostgreSQL instance Read/Write passwordPassword for the database account Read/Write -
Click Execute.
-
After the job completes, view the query results in Logs on the Applications tab of the Job Development > Spark JAR Development page. For details, see Spark Development Editor.