StarRocks provides an official Spark Connector to read and write data between Spark and StarRocks. You can configure Serverless Spark to connect to StarRocks during development. This topic describes how to read data from and write data to StarRocks in EMR Serverless Spark.
Prerequisites
A Serverless Spark workspace has been created. For more information, see Create a workspace.
An EMR Serverless StarRocks instance has been created. For more information, see Create an instance.
Limitations
The Serverless Spark engine version must be esr-2.5.0, esr-3.1.0, esr-4.1.0, or a later version.
Procedure
Step 1: Get the Spark Connector JAR file and upload it to OSS
Download the Spark Connector JAR file that matches your Spark version. For more information, see Read data from StarRocks using Spark connector.
This topic uses a pre-compiled JAR file downloaded from the Maven Central Repository as an example.
Note: The Connector JAR file is named in the format starrocks-spark-connector-${spark_version}_${scala_version}-${connector_version}.jar. For example, if you use engine version esr-4.1.0 (Spark 3.5.2, Scala 2.12) and want to use Connector version 1.1.2, select starrocks-spark-connector-3.5_2.12-1.1.2.jar.
Upload the downloaded Spark Connector JAR file to Alibaba Cloud Object Storage Service (OSS). For more information about uploading files, see Simple upload.
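The naming convention in the note can be sketched as a small helper. This is an illustrative snippet only: the function name is ours, and the version values are the examples from the note, not a compatibility list.

```python
# Sketch of the Connector JAR naming convention described in the note above.
def connector_jar_name(spark_version: str, scala_version: str, connector_version: str) -> str:
    """Build the expected Connector JAR file name for a version combination."""
    return f"starrocks-spark-connector-{spark_version}_{scala_version}-{connector_version}.jar"

# Engine version esr-4.1.0 ships Spark 3.5.2 with Scala 2.12; the JAR name
# uses only the major.minor Spark version (3.5).
print(connector_jar_name("3.5", "2.12", "1.1.2"))
# starrocks-spark-connector-3.5_2.12-1.1.2.jar
```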
Step 2: Add network connectivity
Obtain network information.
On the EMR Serverless StarRocks page, go to the Instance Details page of the target StarRocks instance to obtain its virtual private cloud (VPC) and vSwitch information.
Add a network connection.
On the EMR Serverless Spark page, go to the Network Connection page of the target Spark workspace and click Create Network Connection.
In the Create Network Connection dialog box, enter a Name, select the VPC and vSwitch of the StarRocks instance, and then click OK.
For more information about network connectivity, see Enable network communication between EMR Serverless Spark and other VPCs.
Step 3: Create a database and a table in StarRocks
Connect to the StarRocks instance. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager.
On the Queries page of the SQL Editor, click File or the corresponding icon in the right-side area, and then click Confirm to add a new file.
In the new file, enter the following SQL statement and click Run.
CREATE DATABASE `testdb`;

CREATE TABLE `testdb`.`score_board` (
    `id` int(11) NOT NULL COMMENT '',
    `name` varchar(65533) NULL DEFAULT '' COMMENT '',
    `score` int(11) NOT NULL DEFAULT '0' COMMENT ''
) ENGINE=OLAP
PRIMARY KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`);
Step 4: Read from and write to StarRocks using Serverless Spark
Method 1: Use SQL sessions and Notebook sessions
For more information about session types, see Session Manager.
SQL sessions
Write data to StarRocks using Serverless Spark.
Create an SQL session. For more information, see Manage SQL sessions.
When you create the session, select an engine version that matches your Spark Connector version. For Network Connection, select the network connection that you created in the previous step. In the Spark Configuration section, add the following parameter to load the Spark Connector.
spark.emr.serverless.user.defined.jars    oss://<bucketname>/path/connector.jar
Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector JAR file that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.
On the Development page, create a SparkSQL task. In the upper-right corner, select the SQL session that you created.
For more information, see Develop a SparkSQL job.
Copy the following code to the new SparkSQL tab. Modify the parameters as needed, and then click Run.
CREATE TEMPORARY VIEW score_board
USING starrocks
OPTIONS (
    "starrocks.table.identifier" = "testdb.score_board",
    "starrocks.fe.http.url" = "<fe_host>:<fe_http_port>",
    "starrocks.fe.jdbc.url" = "jdbc:mysql://<fe_host>:<fe_query_port>",
    "starrocks.user" = "<user>",
    "starrocks.password" = "<password>"
);

INSERT INTO `score_board` VALUES (1, "starrocks", 100), (2, "spark", 100);

The parameters are described as follows.
<fe_host>: The internal or public endpoint of the FE node in the Serverless StarRocks instance. You can find this on the FE Details tab of the Instance Details page.
  - If you use an internal endpoint, ensure that the services are in the same VPC.
  - If you use a public endpoint, ensure that the security group rules allow communication on the required ports. For more information, see Network access and security configuration.
<fe_http_port>: The HTTP port of the FE node in the Serverless StarRocks instance. The default port is 8030. You can find this on the FE Details tab of the Instance Details page.
<fe_query_port>: The query port of the FE node in the Serverless StarRocks instance. The default port is 9030. You can find this on the FE Details tab of the Instance Details page.
<user>: The username for the Serverless StarRocks instance. The default administrator username is admin. You can also add a new user on the User Management page. For more information about adding a user, see Manage users and grant permissions.
<password>: The password that corresponds to the <user>.
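As a quick sanity check, the placeholders above can be assembled into the connector option keys in plain Python. The endpoint and credentials below are hypothetical example values, not real instance details; replace them with the values from your FE Details tab.

```python
# Hypothetical endpoint and credentials for illustration only.
fe_host = "192.168.0.10"   # internal FE endpoint (example value)
fe_http_port = "8030"      # default FE HTTP port
fe_query_port = "9030"     # default FE query port
user = "admin"             # default administrator username

# The same option keys that appear in the OPTIONS clause above.
starrocks_options = {
    "starrocks.table.identifier": "testdb.score_board",
    "starrocks.fe.http.url": f"{fe_host}:{fe_http_port}",
    "starrocks.fe.jdbc.url": f"jdbc:mysql://{fe_host}:{fe_query_port}",
    "starrocks.user": user,
    "starrocks.password": "<password>",  # keep real credentials out of shared code
}
print(starrocks_options["starrocks.fe.jdbc.url"])
# jdbc:mysql://192.168.0.10:9030
```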
Query the written data using Serverless Spark.
In this example, a temporary view named test_view is created in the SparkSQL task to query data from score_board. Copy the following code to the new SparkSQL tab, select the code, and then click Run Selected.

CREATE TEMPORARY VIEW test_view
USING starrocks
OPTIONS (
    "starrocks.table.identifier" = "testdb.score_board",
    "starrocks.fe.http.url" = "<fe_host>:<fe_http_port>",
    "starrocks.fe.jdbc.url" = "jdbc:mysql://<fe_host>:<fe_query_port>",
    "starrocks.user" = "<user>",
    "starrocks.password" = "<password>"
);

SELECT * FROM test_view;

The output is shown in the following figure.

Notebook sessions
Write data to StarRocks using Serverless Spark.
Create a Notebook session. For more information, see Manage Notebook sessions.
When you create the session, select an engine version that matches your Spark Connector version. For Network Connection, select the network connection that you created in the previous step. In the Spark Configuration section, add the following parameter to load the Spark Connector.
spark.emr.serverless.user.defined.jars    oss://<bucketname>/path/connector.jar
Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector JAR file that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.
On the Data Development page, create a Notebook task. In the upper-right corner, select the Notebook session that you created.
For more information, see Manage Notebook sessions.
Copy the following code into a Python cell of the new Notebook and click Run.
# Replace with your Serverless StarRocks configurations.
fe_host = "<fe_host>"
fe_http_port = "<fe_http_port>"
fe_query_port = "<fe_query_port>"
user = "<user>"
password = "<password>"

# Create a view.
create_table_sql = f"""
CREATE TEMPORARY VIEW score_board
USING starrocks
OPTIONS (
    "starrocks.table.identifier" = "testdb.score_board",
    "starrocks.fe.http.url" = "{fe_host}:{fe_http_port}",
    "starrocks.fe.jdbc.url" = "jdbc:mysql://{fe_host}:{fe_query_port}",
    "starrocks.user" = "{user}",
    "starrocks.password" = "{password}"
)
"""
spark.sql(create_table_sql)

# Insert data.
insert_data_sql = """
INSERT INTO `score_board` VALUES
    (1, "starrocks", 100),
    (2, "spark", 100)
"""
spark.sql(insert_data_sql)

The following figure shows an example.

The parameters are described as follows.
<fe_host>: The internal or public endpoint of the FE node in the Serverless StarRocks instance. You can find this on the FE Details tab of the Instance Details page.
  - If you use an internal endpoint, ensure that the services are in the same VPC.
  - If you use a public endpoint, ensure that the security group rules allow communication on the required ports. For more information, see Network access and security configuration.
<fe_http_port>: The HTTP port of the FE node in the Serverless StarRocks instance. The default port is 8030. You can find this on the FE Details tab of the Instance Details page.
<fe_query_port>: The query port of the FE node in the Serverless StarRocks instance. The default port is 9030. You can find this on the FE Details tab of the Instance Details page.
<user>: The username for the Serverless StarRocks instance. The default administrator username is admin. You can also add a new user on the User Management page. For more information about adding a user, see Manage users and grant permissions.
<password>: The password that corresponds to the <user>.
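Besides the INSERT INTO statement used in the write cell earlier in this section, the connector also exposes a DataFrame write path. The following is a hedged sketch, not the topic's canonical method: it assumes the "starrocks" format name (matching the USING clause above) and a live SparkSession named spark; only the plain-Python row data at the top actually runs here.

```python
# The rows inserted earlier, as plain Python tuples matching the
# (id, name, score) schema of testdb.score_board.
rows = [(1, "starrocks", 100), (2, "spark", 100)]

# In a Notebook cell with a live SparkSession `spark`, the same data could be
# written through the DataFrame API instead of INSERT INTO (sketch, not run here):
#   df = spark.createDataFrame(rows, schema="id INT, name STRING, score INT")
#   (df.write.format("starrocks")
#      .option("starrocks.table.identifier", "testdb.score_board")
#      .option("starrocks.fe.http.url", f"{fe_host}:{fe_http_port}")
#      .option("starrocks.fe.jdbc.url", f"jdbc:mysql://{fe_host}:{fe_query_port}")
#      .option("starrocks.user", user)
#      .option("starrocks.password", password)
#      .mode("append")
#      .save())

print(len(rows))
# 2
```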
Query the written data using Serverless Spark.
In this example, add a new Python cell to create a temporary view named test_view, and use this view to query data from score_board. Copy the following code into the new Python cell and click the Run icon.

# Create a view.
create_view_sql = f"""
CREATE TEMPORARY VIEW test_view
USING starrocks
OPTIONS (
    "starrocks.table.identifier" = "testdb.score_board",
    "starrocks.fe.http.url" = "{fe_host}:{fe_http_port}",
    "starrocks.fe.jdbc.url" = "jdbc:mysql://{fe_host}:{fe_query_port}",
    "starrocks.user" = "{user}",
    "starrocks.password" = "{password}"
)
"""
spark.sql(create_view_sql)

# Query.
query_sql = "SELECT * FROM test_view"
result_df = spark.sql(query_sql)
result_df.show()

The output is shown in the following figure.

Method 2: Use a Spark batch job
Create a Spark batch job.
On the EMR Serverless Spark page, click Development in the navigation pane on the left.
On the Development tab, click the New icon.
In the New dialog box, enter a Name, set Type to SparkSQL, and then click OK.
You can select a different type based on your needs. This topic uses SQL as an example. For more information about job types, see Develop an application.
Read data from and write data to StarRocks using a Spark batch job.
In the upper-right corner of the new task development page, select a queue.
For more information about adding a queue, see Manage resource queues.
On the new task development page, configure the following parameters and leave the others at their default settings. Then, click Run.
Parameter
Description
SQL File
This example uses the spark_sql_starrocks.sql file. The file contains the SQL statements from the SQL session section. Before you use the file, download it, modify the configurations as needed, and then upload it on the Artifacts page.
Engine Version
Select an engine version that matches your Spark Connector version.
Network Connection
Select the network connection that you created.
Spark Configuration
In the Spark Configuration section, add the following parameter to load the Spark Connector.
spark.emr.serverless.user.defined.jars    oss://<bucketname>/path/connector.jar
Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector JAR file that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.
View log information.
In the Execution Records section at the bottom, click Details in the Actions column.
Click the Log Exploration tab to view the log information for the task.

References
Official StarRocks documentation