E-MapReduce: Use DataWorks to connect to a StarRocks instance

Last Updated: Apr 22, 2025

DataWorks provides StarRocks Reader and StarRocks Writer for you to read data from and write data to StarRocks data sources. You can create a StarRocks node to develop and periodically schedule StarRocks tasks and integrate StarRocks tasks with other types of tasks. This topic describes how to use DataWorks to connect to an E-MapReduce (EMR) StarRocks instance.

Prerequisites

  • DataWorks is activated, and a workspace is created. For more information, see Activate DataWorks.

  • A resource group is purchased and associated with your workspace, and network settings are configured for the resource group. For more information, see Resource group management.

  • An EMR Serverless StarRocks instance is created. For more information, see Create an instance.

Procedure

Step 1: Configure network settings

Before you create the data source, add the IP address or CIDR block of the resource group that you want to use to the internal IP address whitelist of the EMR Serverless StarRocks instance. Otherwise, the resource group cannot connect to the instance.
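Before you edit the whitelist, it can help to confirm which resource group addresses the existing entries already cover. The following is a minimal sketch that uses only the Python standard library. The function name and the sample addresses are hypothetical, and the sketch assumes that each whitelist entry is either a single IP address or a CIDR block.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(address: str, whitelist: Iterable[str]) -> bool:
    """Return True if `address` falls inside any whitelist entry.

    Each entry may be a single IP address ("10.20.30.40") or a
    CIDR block ("192.168.0.0/16").
    """
    ip = ipaddress.ip_address(address)
    for entry in whitelist:
        # strict=False tolerates entries with host bits set, such as 192.168.1.10/24
        network = ipaddress.ip_network(entry, strict=False)
        if ip.version == network.version and ip in network:
            return True
    return False


# Hypothetical values for illustration only
sample_whitelist = ["192.168.0.0/16", "10.20.30.40"]
print(is_whitelisted("192.168.5.17", sample_whitelist))
print(is_whitelisted("172.16.0.9", sample_whitelist))
```

An address that is not covered, such as the second one in the sample, would have to be added to the whitelist before the resource group can connect.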

Step 2: Create a StarRocks data source

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Data source to go to the Data Sources page.

  3. On the Data Sources page, click Add Data Source.

    1. In the Add Data Source dialog box, enter StarRocks in the search box and click StarRocks.

    2. In the Add StarRocks Data Source dialog box, configure the required parameters described in the following table and use the default values for other parameters.

      Parameter | Description

      Data Source Name | Specify a name based on your business requirements. In this example, the name is StarRocks.

      Configuration Mode | Select Alibaba Cloud Instance Mode. To connect to the EMR Serverless StarRocks instance over an internal network, make sure that the DataWorks resource group that you use and the StarRocks instance are in the same VPC. To connect over the Internet, select Connection String Mode instead. For more information, see StarRocks data source.

      Region | Select the region where the EMR Serverless StarRocks instance resides. Example: China East 1 (Hangzhou).

      Instance | Select the EMR Serverless StarRocks instance that you created from the drop-down list.

      Database Name | Specify the name of the database to which you want to connect. To list the databases in the instance, you can, for example, run SHOW DATABASES; in the SQL Editor of EMR StarRocks Manager. Note: To access tables across databases in SQL statements that you write in DataWorks, make sure that you are granted the permissions to access the database, and specify the table name in the <database name>.<table name> format.

      Username and Password | The username and password used to access the EMR Serverless StarRocks instance. The default administrator user is admin, and its password is the one that you specified when you created the StarRocks instance. If you forget the password, reset it as described in How do I reset the password of a StarRocks instance?

  4. In the Connection Configuration section of the dialog box, find the resource group that is associated with the workspace and click Test Network Connectivity in the Connection Status column.

    • If Connected is displayed in the Connection Status column, proceed to the next step.

    • If Connection failed is displayed in the Connection Status column, the resource group cannot connect to the data source, and tasks that use the data source cannot run. In the Network Connectivity Diagnostic Tool panel that appears, view the cause of the failure and troubleshoot the connectivity issue.

  5. Click Complete.
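If the connectivity test in the procedure above fails, a plain TCP check from a host in the same VPC can help tell a whitelist or routing problem apart from a misconfigured data source. The following is a minimal sketch, not a replacement for the Network Connectivity Diagnostic Tool. It assumes that the instance accepts client connections on port 9030, the default StarRocks FE query port. The function name is hypothetical, and the check only reflects the network path of the host on which you run it, not that of the DataWorks resource group.

```python
import socket


def can_open_tcp(host: str, port: int = 9030, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    The default port 9030 is an assumption (StarRocks FE default query port).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

To use the sketch, call can_open_tcp with the internal address of your instance. A result of False from a host inside the VPC suggests that the whitelist or the network route is the problem rather than the username, password, or database name.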

Step 3: Create a StarRocks node

You can write SQL statements for a StarRocks node to develop, debug, and schedule tasks. After you create a StarRocks node, you can specify the scheduling cycle of the node.

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Find the desired workflow, right-click the workflow name, and then choose Create Node > Database > StarRocks.

  3. In the Create Node dialog box, configure the Name parameter and click Confirm. Then, you can use the created node to develop and configure tasks.

Step 4: Develop StarRocks tasks

  1. On the configuration tab of the StarRocks node, select the created StarRocks data source from the Select Data Source drop-down list.

  2. Write and run SQL code.

    Write SQL code based on your business requirements and run the SQL code. In the dialog box that appears, select the resource group for scheduling that you want to use from the drop-down list. The following examples describe how to develop StarRocks tasks:

    • Example 1: Create a database

      CREATE DATABASE IF NOT EXISTS load_test;

      After the statement is successfully executed, you can verify the result in EMR StarRocks Manager.

      1. Use EMR StarRocks Manager to connect to an EMR Serverless StarRocks instance.

      2. In the left-side navigation pane of EMR StarRocks Manager, click SQL Editor. Create a file, enter the following command in the SQL editor, and then click Run:

        SHOW DATABASES;

        If the load_test database appears in the result, the database is created.

    • Example 2: Query information about the tables in the StarRocks database

      SELECT * FROM information_schema.tables
      WHERE table_type = 'BASE TABLE';

      The following figure shows the output.

      [Figure: query output]
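Step 2 notes that a table in another database must be referenced in the <database name>.<table name> format. If you generate SQL for StarRocks nodes from a script, a small helper can apply that format consistently. The following is a minimal sketch. The function names and the table name orders are hypothetical, load_test is the database from Example 1, and the helper uses MySQL-style backtick quoting and rejects names that contain a backtick instead of trying to escape them.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks; reject empty names and embedded backticks."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def qualified_table(database: str, table: str) -> str:
    """Build a <database name>.<table name> reference."""
    return f"{quote_identifier(database)}.{quote_identifier(table)}"


# load_test comes from Example 1; the table name orders is hypothetical
sql = f"SELECT COUNT(*) FROM {qualified_table('load_test', 'orders')};"
print(sql)  # SELECT COUNT(*) FROM `load_test`.`orders`;
```

The generated statement still requires that the user configured in the data source is granted access to the referenced database.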

References

For more information about how to use a StarRocks node to develop and schedule tasks, see Configure a StarRocks node.