
E-MapReduce:Read from and write to StarRocks

Last Updated: Nov 29, 2025

StarRocks provides an official Spark Connector that lets Spark read data from and write data to StarRocks. You can configure Serverless Spark to connect to a StarRocks instance during development. This topic describes how to read from and write to StarRocks in EMR Serverless Spark.

Prerequisites

  • A Serverless Spark workspace has been created. For more information, see Create a workspace.

  • An EMR Serverless StarRocks instance has been created. For more information, see Create an instance.

Limitations

The Serverless Spark engine version must be esr-2.5.0, esr-3.1.0, esr-4.1.0, or a later version.

Procedure

Step 1: Get the Spark Connector JAR file and upload it to OSS

  1. Download the Spark Connector JAR file that corresponds to your Spark version. For more information, see Read data from StarRocks using Spark connector.

    This topic uses a pre-compiled JAR file downloaded from the Maven Central Repository as an example.

    Note

    The Connector JAR file is named in the format starrocks-spark-connector-${spark_version}_${scala_version}-${connector_version}.jar. For example, if you use engine version esr-4.1.0 (Spark 3.5.2, Scala 2.12) and want to use Connector version 1.1.2, you must select starrocks-spark-connector-3.5_2.12-1.1.2.jar.
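
    As a quick sanity check, the naming format from the note can be reproduced in a few lines of Python. The version strings below match the esr-4.1.0 example; adjust them to your own engine and Connector versions.

```python
# Build the expected connector JAR file name from the format
# starrocks-spark-connector-${spark_version}_${scala_version}-${connector_version}.jar
def connector_jar_name(spark_version: str, scala_version: str, connector_version: str) -> str:
    return (
        f"starrocks-spark-connector-"
        f"{spark_version}_{scala_version}-{connector_version}.jar"
    )

# esr-4.1.0 ships Spark 3.5.x with Scala 2.12; the connector artifact
# uses the "3.5" series in its name.
print(connector_jar_name("3.5", "2.12", "1.1.2"))
# starrocks-spark-connector-3.5_2.12-1.1.2.jar
```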

  2. Upload the downloaded Spark Connector JAR file to Alibaba Cloud Object Storage Service (OSS). For more information about uploading files, see Simple upload.

Step 2: Add network connectivity

  1. Obtain network information.

    On the EMR Serverless StarRocks page, go to the Instance Details page of the target StarRocks instance to obtain its virtual private cloud (VPC) and vSwitch information.

  2. Add a network connection.

    1. On the EMR Serverless Spark page, go to the Network Connection page of the target Spark workspace and click Create Network Connection.

    2. In the Create Network Connection dialog box, enter a Name, select the VPC and vSwitch of the StarRocks instance, and then click OK.

      For more information about network connectivity, see Enable network communication between EMR Serverless Spark and other VPCs.

Step 3: Create a database and a table in StarRocks

  1. Connect to the StarRocks instance. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager.

  2. On the Queries page of the SQL Editor, click File or the icon in the right-side area, and then click Confirm to add a new file.

  3. In the new file, enter the following SQL statement and click Run.

    CREATE DATABASE `testdb`;
    
    CREATE TABLE `testdb`.`score_board`
    (
        `id` int(11) NOT NULL COMMENT '',
        `name` varchar(65533) NULL DEFAULT '' COMMENT '',
        `score` int(11) NOT NULL DEFAULT '0' COMMENT ''
    )
    ENGINE=OLAP
    PRIMARY KEY(`id`)
    COMMENT 'OLAP'
    DISTRIBUTED BY HASH(`id`);

Step 4: Read from and write to StarRocks using Serverless Spark

Method 1: Use SQL sessions and Notebook sessions

For more information about session types, see Session Manager.

SQL sessions

  1. Write data to StarRocks using Serverless Spark.

    1. Create an SQL session. For more information, see Manage SQL sessions.

      When you create the session, select an engine version that matches your Spark Connector version. For Network Connection, select the network connection that you created in the previous step. In the Spark Configuration section, add the following parameter to load the Spark Connector.

      spark.emr.serverless.user.defined.jars  oss://<bucketname>/path/connector.jar

      Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.
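
      A malformed OSS path is a common reason the Connector fails to load. The following rough check, a sketch rather than a full OSS URI validator, catches obvious mistakes before you start the session.

```python
import re

# Rough sanity check for an OSS JAR path of the form
# oss://<bucketname>/<object-key>.jar. Bucket names use lowercase
# letters, digits, and hyphens; this is a sketch, not a full validator.
OSS_JAR_PATTERN = re.compile(r"^oss://[a-z0-9][a-z0-9-]{1,61}[a-z0-9]/.+\.jar$")

def looks_like_oss_jar(path: str) -> bool:
    return OSS_JAR_PATTERN.fullmatch(path) is not None

print(looks_like_oss_jar(
    "oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar"
))
# True
```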

    2. On the Development page, create a SparkSQL task. In the upper-right corner, select the SQL session that you created.

      For more information, see Develop a SparkSQL job.

    3. Copy the following code to the new SparkSQL tab. Modify the parameters as needed, and then click Run.

      CREATE TEMPORARY VIEW score_board
      USING starrocks
      OPTIONS
      (
        "starrocks.table.identifier" = "testdb.score_board",
        "starrocks.fe.http.url" = "<fe_host>:<fe_http_port>",
        "starrocks.fe.jdbc.url" = "jdbc:mysql://<fe_host>:<fe_query_port>",
        "starrocks.user" = "<user>",
        "starrocks.password" = "<password>"
      );
      
      INSERT INTO `score_board` VALUES (1, "starrocks", 100), (2, "spark", 100);

      The following table describes the parameters.

      • <fe_host>: The internal or public endpoint of the FE node in the Serverless StarRocks instance. You can find this on the FE Details tab of the Instance Details page.

        • If you use an internal endpoint, ensure that the services are in the same VPC.

        • If you use a public endpoint, ensure that the security group rules allow communication on the required ports. For more information, see Network access and security configuration.

      • <fe_http_port>: The HTTP port of the FE node in the Serverless StarRocks instance. The default port is 8030. You can find this on the FE Details tab of the Instance Details page.

      • <fe_query_port>: The query port of the FE node in the Serverless StarRocks instance. The default port is 9030. You can find this on the FE Details tab of the Instance Details page.

      • <user>: The username for the Serverless StarRocks instance. The default administrator username is admin. You can also add a new user on the User Management page. For more information about adding a user, see Manage users and grant permissions.

      • <password>: The password that corresponds to the <user>.
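
      The two URL options are easy to mix up: starrocks.fe.http.url takes no scheme and uses the HTTP port, while starrocks.fe.jdbc.url takes the jdbc:mysql:// scheme and the query port. The following sketch builds both from a host name, assuming the default ports 8030 and 9030 (the host below is a made-up placeholder):

```python
# Build the two FE endpoint strings used in the connector options.
# 8030 and 9030 are the default FE HTTP and query ports.
def fe_endpoints(fe_host: str, http_port: int = 8030, query_port: int = 9030) -> dict:
    return {
        # No scheme; HTTP port.
        "starrocks.fe.http.url": f"{fe_host}:{http_port}",
        # jdbc:mysql:// scheme; query port.
        "starrocks.fe.jdbc.url": f"jdbc:mysql://{fe_host}:{query_port}",
    }

endpoints = fe_endpoints("192.168.0.10")  # placeholder internal endpoint
print(endpoints["starrocks.fe.jdbc.url"])
# jdbc:mysql://192.168.0.10:9030
```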

  2. Query the written data using Serverless Spark.

    In this example, a temporary view named test_view is created in the SparkSQL task to query data from score_board. Copy the following code to the SparkSQL tab, select it, and then click Run Selected.

    CREATE TEMPORARY VIEW test_view
    USING starrocks
    OPTIONS
    (
       "starrocks.table.identifier" = "testdb.score_board",
       "starrocks.fe.http.url" = "<fe_host>:<fe_http_port>",
       "starrocks.fe.jdbc.url" = "jdbc:mysql://<fe_host>:<fe_query_port>",
       "starrocks.user" = "<user>",
       "starrocks.password" = "<password>"
    );
    
    SELECT * FROM test_view;

    The query returns the two rows that were inserted: (1, starrocks, 100) and (2, spark, 100).

Notebook sessions

  1. Write data to StarRocks using Serverless Spark.

    1. Create a Notebook session. For more information, see Manage Notebook sessions.

      When you create the session, select an engine version that matches your Spark Connector version. For Network Connection, select the network connection that you created in the previous step. In the Spark Configuration section, add the following parameter to load the Spark Connector.

      spark.emr.serverless.user.defined.jars  oss://<bucketname>/path/connector.jar

      Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.

    2. On the Data Development page, create an Interactive Development > Notebook task. In the upper-right corner, select the Notebook session that you created.

      For more information, see Manage Notebook sessions.

    3. Copy the following code into a Python cell of the new Notebook and click Run.

      # Replace with your Serverless StarRocks configurations.
      fe_host = "<fe_host>"
      fe_http_port = "<fe_http_port>"
      fe_query_port = "<fe_query_port>"
      user = "<user>"
      password = "<password>"
      
      # Create a view.
      create_table_sql = f"""
      CREATE TEMPORARY VIEW score_board
      USING starrocks
      OPTIONS (
        "starrocks.table.identifier" = "testdb.score_board",
        "starrocks.fe.http.url" = "{fe_host}:{fe_http_port}",
        "starrocks.fe.jdbc.url" = "jdbc:mysql://{fe_host}:{fe_query_port}",
        "starrocks.user" = "{user}",
        "starrocks.password" = "{password}"
      )
      """
      
      spark.sql(create_table_sql)
      
      # Insert data.
      insert_data_sql = """
      INSERT INTO `score_board` VALUES (1, "starrocks", 100), (2, "spark", 100)
      """
      
      spark.sql(insert_data_sql)
      


      The following table describes the parameters.

      • <fe_host>: The internal or public endpoint of the FE node in the Serverless StarRocks instance. You can find this on the FE Details tab of the Instance Details page.

        • If you use an internal endpoint, ensure that the services are in the same VPC.

        • If you use a public endpoint, ensure that the security group rules allow communication on the required ports. For more information, see Network access and security configuration.

      • <fe_http_port>: The HTTP port of the FE node in the Serverless StarRocks instance. The default port is 8030. You can find this on the FE Details tab of the Instance Details page.

      • <fe_query_port>: The query port of the FE node in the Serverless StarRocks instance. The default port is 9030. You can find this on the FE Details tab of the Instance Details page.

      • <user>: The username for the Serverless StarRocks instance. The default administrator username is admin. You can also add a new user on the User Management page. For more information about adding a user, see Manage users and grant permissions.

      • <password>: The password that corresponds to the <user>.

  2. Query the written data using Serverless Spark.

    In this example, add a new Python cell to create a temporary view named test_view. Use this view to query data from score_board. Copy the following code into the new Python cell and click Run.

    # Create a view.
    create_view_sql = f"""
    CREATE TEMPORARY VIEW test_view
    USING starrocks
    OPTIONS (
      "starrocks.table.identifier" = "testdb.score_board",
      "starrocks.fe.http.url" = "{fe_host}:{fe_http_port}",
      "starrocks.fe.jdbc.url" = "jdbc:mysql://{fe_host}:{fe_query_port}",
      "starrocks.user" = "{user}",
      "starrocks.password" = "{password}"
    )
    """
    spark.sql(create_view_sql)
      
    # Query.
    query_sql = "SELECT * FROM test_view"
    result_df = spark.sql(query_sql)
    result_df.show()

    The query returns the two rows that were inserted: (1, starrocks, 100) and (2, spark, 100).
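
    In a Notebook, you can also keep the connector options in one dictionary so the write and read paths cannot drift apart. The helper below only builds the option map (the host and credentials are placeholders); the commented lines show how it could be passed to the DataFrame API, which the Spark Connector also supports.

```python
# Collect the StarRocks connector options once and reuse them for both
# reads and writes. Host and credentials below are placeholders.
def starrocks_options(fe_host, fe_http_port, fe_query_port, user, password,
                      table="testdb.score_board"):
    return {
        "starrocks.table.identifier": table,
        "starrocks.fe.http.url": f"{fe_host}:{fe_http_port}",
        "starrocks.fe.jdbc.url": f"jdbc:mysql://{fe_host}:{fe_query_port}",
        "starrocks.user": user,
        "starrocks.password": password,
    }

opts = starrocks_options("<fe_host>", "8030", "9030", "admin", "<password>")

# With a live Spark session, the DataFrame API equivalent of the
# temporary-view approach would look like this:
#   df = spark.read.format("starrocks").options(**opts).load()
#   df.write.format("starrocks").options(**opts).mode("append").save()
```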

Method 2: Use a Spark batch job

  1. Create a Spark batch job.

    1. On the EMR Serverless Spark page, click Development in the navigation pane on the left.

    2. On the Development tab, click the icon for creating a task.

    3. In the New dialog box, enter a Name, set Type to Application(Batch) > SQL, and then click OK.

      You can select a different type based on your needs. This topic uses SQL as an example. For more information about job types, see Develop an application.

  2. Read data from and write data to StarRocks using a Spark batch job.

    1. In the upper-right corner of the new task development page, select a queue.

      For more information about adding a queue, see Manage resource queues.

    2. On the new task development page, configure the following parameters and leave the others at their default settings. Then, click Run.

      • SQL File: This example uses the spark_sql_starrocks.sql file. The file contains the SQL statements from the SQL session section. Before you use the file, download it, modify the configurations as needed, and then upload it on the Artifacts page.

        The spark_sql_starrocks.sql file uses the following parameters:

        • <fe_host>: The internal or public endpoint of the FE node in the Serverless StarRocks instance. You can find this on the FE Details tab of the Instance Details page.

          • If you use an internal endpoint, ensure that the services are in the same VPC.

          • If you use a public endpoint, ensure that the security group rules allow communication on the required ports. For more information, see Network access and security configuration.

        • <fe_http_port>: The HTTP port of the FE node in the Serverless StarRocks instance. The default port is 8030. You can find this on the FE Details tab of the Instance Details page.

        • <fe_query_port>: The query port of the FE node in the Serverless StarRocks instance. The default port is 9030. You can find this on the FE Details tab of the Instance Details page.

        • <user>: The username for the Serverless StarRocks instance. The default administrator username is admin. You can also add a new user on the User Management page. For more information about adding a user, see Manage users and grant permissions.

        • <password>: The password that corresponds to the <user>.

      • Engine Version: Select an engine version that matches your Spark Connector version.

      • Network Connection: Select the network connection that you created.

      • Spark Configuration: Add the following parameter to load the Spark Connector.

        spark.emr.serverless.user.defined.jars  oss://<bucketname>/path/connector.jar

        Replace oss://<bucketname>/path/connector.jar with the OSS path of the Spark Connector that you uploaded in Step 1. For example, oss://emr-oss/spark/starrocks-spark-connector-3.5_2.12-1.1.2.jar.

  3. View log information.

    1. In the Execution Records section at the bottom, click Details in the Actions column.

    2. Click the Log Exploration tab to view the log information for the task.


References

Official StarRocks documentation