DataWorks: Best practices for DataWorks on EMR Serverless StarRocks

Last Updated: Jul 09, 2025

You can add a StarRocks data source to a DataWorks workspace and use the data source to connect DataWorks to E-MapReduce (EMR) Serverless StarRocks. The connection enables data synchronization, data development, data analysis, and data services for EMR Serverless StarRocks. This topic describes the relevant procedures in DataWorks.

Background information

Overview of EMR Serverless StarRocks

StarRocks is a next-generation, high-speed data analytics engine that is built on a Massively Parallel Processing (MPP) architecture and supports efficient, unified data analysis.

EMR Serverless StarRocks is a fully managed service of open source StarRocks on Alibaba Cloud. You can create StarRocks instances and manage the instances and data on the EMR Serverless StarRocks page in the EMR console in a flexible manner. StarRocks is compatible with the MySQL protocol. StarRocks provides excellent performance and supports a variety of data models in online analytical processing (OLAP) scenarios, including multi-dimensional analysis, data lake analysis, high-concurrency queries, and real-time data analysis.

Important
  • For workspaces that Participate in Public Preview of Data Studio, a data source with the same name is automatically generated when you associate an EMR Serverless StarRocks computing resource. In this case, you do not need to create a data source as described in this topic.

  • For workspaces that do not Participate in Public Preview of Data Studio, follow the steps in this topic to manually create a StarRocks data source before using StarRocks in DataWorks for development.

Overview of DataWorks on EMR Serverless StarRocks

DataWorks is an end-to-end big data development and governance platform. After you add a StarRocks data source to a DataWorks workspace and connect DataWorks to EMR Serverless StarRocks based on the data source, you can implement data synchronization and periodic job scheduling for EMR Serverless StarRocks. In addition, DataWorks builds on the fast data analysis and data service performance of the StarRocks engine to support the use of StarRocks in a wide range of business scenarios.

View terms and the overview of DataWorks services

The following table describes the terms and DataWorks services that are involved when you use EMR Serverless StarRocks.

| Term/DataWorks service | Description | References |
| --- | --- | --- |
| Resource group | You must use DataWorks resource groups to run various types of tasks in DataWorks. | - |
| Data source | You must add a data source to a DataWorks workspace before you can use the data source in DataWorks. To use EMR Serverless StarRocks, you must first add a StarRocks data source. You can then connect DataWorks to EMR Serverless StarRocks based on the data source to develop and run tasks. | StarRocks data source |
| Data Integration | DataWorks provides the Data Integration service that you can use to synchronize data between multiple types of data sources in multiple synchronization scenarios. | Data Integration overview |
| DataStudio and Operation Center | DataWorks provides the DataStudio and Operation Center services. You can develop and debug tasks in DataStudio, and then commit and deploy the tasks to Operation Center for periodic running. | - |
| DataAnalysis | DataWorks provides the DataAnalysis service that allows you to analyze, modify, and share data online. | DataAnalysis overview |
| DataService Studio | DataWorks provides the DataService Studio service, a flexible, lightweight, secure, and stable platform that allows you to create and publish APIs. DataService Studio provides comprehensive data services and data sharing capabilities for individuals, teams, and enterprises to help them manage APIs for internal and external systems in a centralized manner. | DataService Studio overview |
| Data Map | DataWorks provides the Data Map service that is used to manage data directories of enterprises based on metadata. The service provides features such as global data search, display of metadata details, data preview, display of data lineages, and data category management. Data Map helps you search for, understand, and use data. | Data Map overview |

Prerequisites

  • DataWorks is activated and a workspace is created. For more information about how to activate DataWorks, see Activate DataWorks.

  • A resource group is purchased and associated with your workspace, and network settings are configured for the resource group. For more information, see Resource group management.

  • An EMR Serverless StarRocks instance is created. For more information, see the EMR Serverless StarRocks documentation.

    Note

    After you create an EMR Serverless StarRocks instance, you can view information about the instance in the EMR console, and connect to the instance by using EMR StarRocks Manager to view database and table information.

  • The IP address or CIDR block of the resource group is added to the IP address whitelist of the EMR Serverless StarRocks instance.

    The following figure shows the entry points for accessing the IP address whitelists of an EMR Serverless StarRocks instance.

    [Figure: entry points for the IP address whitelists of an EMR Serverless StarRocks instance]
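Before you test connectivity, it can help to confirm that the resource group address is actually covered by the whitelist entries you plan to add. The following is a minimal sketch that uses only the Python standard library. The addresses and CIDR blocks are hypothetical placeholders, not values from any real instance or resource group.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(address: str, whitelist: Iterable[str]) -> bool:
    """Return True if `address` falls inside any whitelist entry.

    Each entry may be a single IP address or a CIDR block.
    """
    ip = ipaddress.ip_address(address)
    for entry in whitelist:
        # strict=False accepts entries such as "10.0.0.5/24" and reads
        # them as the enclosing block (10.0.0.0/24).
        network = ipaddress.ip_network(entry, strict=False)
        if ip in network:
            return True
    return False


# Hypothetical whitelist entries and resource group addresses.
example_whitelist = ["192.168.0.0/24", "10.1.2.3"]
print(is_whitelisted("192.168.0.17", example_whitelist))
print(is_whitelisted("172.16.0.1", example_whitelist))
```

A check like this only validates the entries themselves. The connectivity test in the data source configuration remains the authoritative check.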

Add a data source

Before you can use EMR Serverless StarRocks in DataWorks, you must add a StarRocks data source and connect DataWorks to the created EMR Serverless StarRocks instance based on the StarRocks data source. This way, you can use EMR Serverless StarRocks in different services of DataWorks.

For more information about the StarRocks data source, see StarRocks data source. The following steps describe where to add a StarRocks data source and the key parameters to configure.

  1. Go to the Data Sources page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the SettingCenter page, click Data Sources.

  2. On the Data Sources page, click Add Data Source and configure the parameters. The following table describes the key configurations. You can use the default values for other parameters.

    The method to add a StarRocks data source varies based on the network connectivity between the EMR Serverless StarRocks instance and your DataWorks resource group. For more information about network connectivity solutions, see Network connectivity solutions.

    Connection over an internal network

    | Key parameter | Description |
    | --- | --- |
    | Configuration Mode | The mode in which you want to add the data source. Select Alibaba Cloud Instance Mode. |
    | Alibaba Cloud Account | Select Current Alibaba Cloud Account if the EMR Serverless StarRocks instance belongs to the same Alibaba Cloud account as DataWorks. Select Another Alibaba Cloud Account if the instance does not belong to the same Alibaba Cloud account as DataWorks. In that case, you must also configure the UID of Another Alibaba Cloud Account and RAM Role parameters. For more information about RAM role configuration, see Configure cross-account authorization. |
    | Region | Select the region in which the EMR Serverless StarRocks instance resides. |
    | Instance | Select the created EMR Serverless StarRocks instance. |
    | Database Name | The name of the database to be connected. You can use EMR StarRocks Manager to connect to the instance and view the database name on the Metadata Management page. |
    | Username and Password | The username and password that are used to access the instance. By default, when you create an EMR Serverless StarRocks instance, an admin user is created. The password is the custom password that is specified when you create the instance. |
    | Connection Configuration | In this section, you must test the network connectivity between the data source and the purchased resource group. If the connection status is Connected, the data source is connected to the resource group. |

    Connection over the Internet

    | Key parameter | Description |
    | --- | --- |
    | Configuration Mode | The mode in which you want to add the data source. Select Connection String Mode. |
    | Host Address/IP Address | The public address of the FE node in the EMR Serverless StarRocks instance. |
    | Port | The query port of the FE node in the EMR Serverless StarRocks instance. The default query port is 9030. |
    | Load URL | The address of the FE node that is used to access Stream Load in StarRocks. The address is in the <Public address of an FE node>:<HTTP port of an FE node> format. If you specify multiple addresses, separate the addresses with commas (,). |
    | Database Name | The name of the database to be connected. You can use EMR StarRocks Manager to connect to the instance and view the database name on the Metadata Management page. |
    | Username and Password | The username and password that are used to access the instance. By default, when you create an EMR Serverless StarRocks instance, an admin user is created. The password is the custom password that is specified when you create the instance. |
    | Connection Configuration | In this section, you must test the network connectivity between the data source and the purchased resource group. If the connection status is Connected, the data source is connected to the resource group. |
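The Connection String Mode parameters can be gathered in a small structure before you enter them in the console, which makes the Load URL format easier to get right. The following sketch is illustrative only. The host names are hypothetical, and the HTTP port value 8030 is an assumption based on the default FE HTTP port in open source StarRocks, so verify the actual port on your instance.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_QUERY_PORT = 9030  # default FE query port stated in this topic


@dataclass
class StarRocksConnection:
    """Values entered in Connection String Mode (illustrative only)."""

    host: str                 # public address of the FE node
    database: str             # database to connect to
    username: str             # for example, the default admin user
    fe_http_hosts: List[str]  # FE addresses used for Stream Load
    http_port: int            # FE HTTP port (assumed 8030 in the example below)
    port: int = DEFAULT_QUERY_PORT

    def load_url(self) -> str:
        """Build the Load URL: host:http_port pairs separated by commas."""
        return ",".join(f"{host}:{self.http_port}" for host in self.fe_http_hosts)


# Hypothetical values for illustration.
conn = StarRocksConnection(
    host="fe1.example.com",
    database="demo_db",
    username="admin",
    fe_http_hosts=["fe1.example.com", "fe2.example.com"],
    http_port=8030,
)
print(conn.load_url())
```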

Data synchronization

DataWorks allows you to synchronize data from multiple types of data sources, such as MySQL, Hive, Kafka, Object Storage Service (OSS), and Hadoop Distributed File System (HDFS), to EMR Serverless StarRocks tables. The following example describes how to create a batch synchronization node in DataStudio to synchronize data from a MySQL data source to EMR Serverless StarRocks tables.

Note

For information about how to configure a synchronization node for a StarRocks data source, see StarRocks data source.

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Create a batch synchronization node. Specify MySQL as the source and StarRocks as the destination.

  3. Select a resource group, and then test the network connectivity between the resource group and the source and between the resource group and the destination.

  4. Specify the scheduling cycle, and commit and deploy the node.

    After the debugging is complete, you can perform the following operations to allow the node to be periodically run: Click Properties in the right-side navigation pane of the configuration tab of the node. On the Properties tab, configure parameters such as the scheduling cycle and rerun policy, and select a resource group that you want to use to run the node. Then, commit and deploy the node.
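The Load URL of the StarRocks data source points to the FE HTTP endpoint that serves Stream Load. This topic does not describe how the synchronization node writes data internally, so the following is only a sketch of what a Stream Load request looks like according to the open source StarRocks HTTP interface. The host, database, table, credentials, and label are hypothetical, and the function only builds the request without sending it.

```python
import base64
from typing import Dict, Tuple


def build_stream_load_request(
    fe_address: str,
    database: str,
    table: str,
    username: str,
    password: str,
    label: str,
) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a Stream Load request of CSV data.

    `fe_address` is one "host:http_port" entry from the Load URL.
    """
    url = f"http://{fe_address}/api/{database}/{table}/_stream_load"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",  # HTTP basic authentication
        "Expect": "100-continue",           # sent before the request body
        "label": label,                     # identifies the load job
        "column_separator": ",",            # CSV column delimiter
    }
    return url, headers


# Hypothetical values for illustration.
url, headers = build_stream_load_request(
    "fe1.example.com:8030", "demo_db", "orders", "admin", "secret", "orders_20250708"
)
print(url)
```

An HTTP client would send the data as the body of a PUT request to this URL. In DataWorks, the synchronization node handles the load for you, so a sketch like this is mainly useful for troubleshooting the Load URL.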

Data development, scheduling, and O&M

For an EMR Serverless StarRocks task that requires periodic scheduling, you can perform the following steps to allow the task to be periodically run:

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Create a StarRocks node in DataStudio. On the configuration tab of the StarRocks node, select the desired StarRocks data source, and write SQL statements.

  3. Select the SQL statements that you want to debug, click Run, and then select a resource group to perform the debug operation.

  4. Specify the scheduling cycle, and commit and deploy the node.

    After the debugging is complete, you can perform the following operations to allow the task on the node to be periodically run: Click Properties in the right-side navigation pane of the configuration tab of the node. On the Properties tab, configure parameters such as the scheduling cycle and rerun policy, and select a resource group that you want to use to run the node. Then, commit and deploy the node.
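A periodically scheduled StarRocks node typically processes one day of data per run. The following sketch shows how such a statement could be rendered for a given run date. It assumes a scheduling placeholder named ${bizdate} that resolves to the day before the run date in yyyymmdd format. Scheduling parameters are not covered in this topic, and the table and column names are hypothetical.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical SQL for a StarRocks node. ${bizdate} is the assumed placeholder.
SQL_TEMPLATE = Template(
    "INSERT INTO daily_order_summary\n"
    "SELECT ds, COUNT(*) AS order_count\n"
    "FROM orders\n"
    "WHERE ds = '${bizdate}'\n"
    "GROUP BY ds;"
)


def render_for_run(run_date: date) -> str:
    """Render the SQL for the instance scheduled on `run_date`.

    Assumes the placeholder resolves to the previous day in yyyymmdd format.
    """
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    return SQL_TEMPLATE.substitute(bizdate=bizdate)


print(render_for_run(date(2025, 7, 9)))
```

Rendering the statement locally like this is only a way to preview what a scheduled instance would execute. In DataStudio, the substitution happens when the node runs.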

Data analysis

You can use the DataAnalysis service provided by DataWorks to quickly analyze data in EMR Serverless StarRocks tables.

  1. Go to the SQL Query page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Analysis and Service > DataAnalysis. On the page that appears, click Go to DataAnalysis. In the left-side navigation pane of the page that appears, click SQL Query.

  2. Click the image icon in the left-side navigation pane and choose More > System Management. On the System Management page, select a resource group for the StarRocks engine type.

  3. Go to the SQL Query page and create an SQL query file. In the upper-right corner of the configuration tab of the file, select StarRocks from the Data Source Type drop-down list and select a data source from the Data Source Name drop-down list. Then, write and execute SQL query statements to analyze EMR Serverless StarRocks data.


Data services

DataService Studio allows you to create APIs of the StarRocks data source type.

  1. Go to the DataService Studio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Analysis and Service > DataService Studio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataService Studio.

  2. Create an API and configure parameters.

    DataService Studio allows you to create an API by using the codeless UI or the code editor. In code editor mode, the request parameters and response parameters of an API can be automatically generated based on SQL statements. The following figure shows an example of how to create an API by using the codeless UI.

    [Figure: creating an API by using the codeless UI]

    On the configuration tab of the API, set the Datasource Type parameter to StarRocks, select the added StarRocks data source, and then select a table. Then, configure the request parameters and response parameters as prompted.

  3. Click Resource Group in the right-side navigation pane of the configuration tab of the API. On the Resource Group tab, set the Scheme parameter to Exclusive Resource Group for DataService Studio and select an exclusive resource group for DataService Studio based on your business requirements.

  4. Test the API. After the test is successful, submit and publish the API.
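After an API is published, a client calls it over HTTP. The following sketch builds such a request with the Python standard library without sending it. The endpoint, path, parameter names, and AppCode value are all hypothetical, and the APPCODE authorization header is an assumption that applies only if the API is configured for simple AppCode authentication.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(
    base_url: str, path: str, params: Dict[str, str], app_code: str
) -> Request:
    """Build a GET request for a published API (illustrative only)."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}?{query}"
    # Assumed authentication scheme: an AppCode passed in the Authorization header.
    return Request(url, headers={"Authorization": f"APPCODE {app_code}"}, method="GET")


# Hypothetical endpoint and request parameters.
request = build_api_request(
    "https://api.example.com",
    "/orders/summary",
    {"ds": "20250708", "pageNum": "1"},
    "demo-app-code",
)
print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the call. The actual endpoint and authentication method are shown on the details page of the published API.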

Data Map

The Data Map service allows you to collect metadata of the StarRocks data source, search for tables, and view information on the table details page.

Collect metadata

  1. Go to the Data Map page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Map. On the page that appears, click Go to Data Map.

  2. In the left-side navigation pane of the Data Map page, click the image icon. On the page that appears, find StarRocks on the Data Source Perspective tab and click Manage in the upper-right corner.

  3. Click the Data Sources for Which No Crawler Is Created tab. Find the desired data source and click Create Crawler in the Actions column.

  4. In the Configure Collection Plan dialog box, configure the Resource Group Name parameter and click Test Network Connectivity to test the network connectivity. After the test is successful, configure the Collection Plan parameter and click Confirmation.

    Note
    • For more information about metadata collection, see Metadata collection.

    • Only serverless resource groups can be used to run this type of task.

Search for tables

  1. Go to the Data Map page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Map. On the page that appears, click Go to Data Map.

  2. In the left-side navigation pane of the Data Map page, click the image icon. On the page that appears, select StarRocks on the Data Source tab, and then search for tables by type in the upper part of the page.

    Note

    For more information about table search, see Query and manage common data.

View table details

  1. Go to the Data Map page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Map. On the page that appears, click Go to Data Map.

  2. Find the desired table on the homepage of Data Map or use the search feature to find the desired table. Then, click the table name to go to the details page of the table.

  3. On the table details page, you can view information in the Table Basic Information, Technical Information, and Business Information sections, and view information on the Details, Output, and Lineage tabs.

    Note
    • For more information about table details, see Query and manage common data.

    • EMR Serverless StarRocks instances that run V3.1.13, V3.2.9, or a later minor version of either release allow you to enable the metadata lineage analysis feature. For more information, see View lineages.

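The version requirement for metadata lineage analysis can be expressed as a small check. The following sketch reads the requirement as "V3.1.13 or later within the 3.1 line, or V3.2.9 or later within the 3.2 line". That reading is an assumption, and other release lines are treated as unsupported here only because this topic does not mention them.

```python
from typing import Tuple

# Minimum patch version per release line, as read from the note above (assumption).
MIN_PATCH_BY_LINE = {(3, 1): 13, (3, 2): 9}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse strings such as "V3.2.9" or "3.1.13" into (major, minor, patch)."""
    major, minor, patch = version.lstrip("Vv").split(".")[:3]
    return int(major), int(minor), int(patch)


def supports_lineage(version: str) -> bool:
    """Return True if the instance version meets the assumed lineage requirement."""
    major, minor, patch = parse_version(version)
    min_patch = MIN_PATCH_BY_LINE.get((major, minor))
    return min_patch is not None and patch >= min_patch


for candidate in ["V3.1.13", "3.2.11", "3.1.12"]:
    print(candidate, supports_lineage(candidate))
```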