You can add a StarRocks data source to a DataWorks workspace and use it to connect DataWorks to E-MapReduce (EMR) Serverless StarRocks. The connection enables features such as data synchronization, data development, data analysis, and data services for EMR Serverless StarRocks. This topic describes the relevant procedures in DataWorks.
Background information
Overview of EMR Serverless StarRocks
StarRocks is a next-generation, high-speed data analytics engine that is built on the Massively Parallel Processing (MPP) framework and supports efficient, unified data analysis.
EMR Serverless StarRocks is a fully managed service of open source StarRocks on Alibaba Cloud. You can create StarRocks instances and manage the instances and data on the EMR Serverless StarRocks page in the EMR console in a flexible manner. StarRocks is compatible with the MySQL protocol. StarRocks provides excellent performance and supports a variety of data models in online analytical processing (OLAP) scenarios, including multi-dimensional analysis, data lake analysis, high-concurrency queries, and real-time data analysis.
For workspaces that Participate in Public Preview of Data Studio, a data source with the same name is automatically generated when you associate an EMR Serverless StarRocks computing resource. In this case, you do not need to create a data source as described in this topic.
For workspaces that do not Participate in Public Preview of Data Studio, follow the steps in this topic to manually create a StarRocks data source before using StarRocks in DataWorks for development.
Overview of DataWorks on EMR Serverless StarRocks
DataWorks is an end-to-end big data development and governance platform. After you add a StarRocks data source to a DataWorks workspace and connect DataWorks to EMR Serverless StarRocks based on the StarRocks data source, you can implement data synchronization and periodic job scheduling for EMR Serverless StarRocks. In addition, DataWorks builds on the high-speed data analysis and data service performance of the StarRocks engine to support the use of StarRocks in various business scenarios.
Prerequisites
DataWorks is activated and a workspace is created. For more information about how to activate DataWorks, see Activate DataWorks.
A resource group is purchased and associated with your workspace, and network settings are configured for the resource group. For more information, see Resource group management.
An EMR Serverless StarRocks instance is created. For more information.
Note: After you create an EMR Serverless StarRocks instance, you can view information about the instance in the EMR console, and connect to the instance by using EMR StarRocks Manager to view database and table information.
The IP address or CIDR block of the resource group is added to the IP address whitelist of the EMR Serverless StarRocks instance.
The following figure shows the entry points for accessing the IP address whitelists of an EMR Serverless StarRocks instance.

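Before you test connectivity, it can help to confirm that the address of your resource group is actually covered by the whitelist entries you added. The following is a minimal sketch that uses only the Python standard library. The function name and all addresses are hypothetical examples, not values from DataWorks or EMR.

```python
import ipaddress


def is_whitelisted(resource_ip: str, whitelist: list[str]) -> bool:
    """Return True if resource_ip falls inside any whitelist entry.

    Each entry may be a single IP address ("10.0.0.5") or a CIDR block
    ("192.168.0.0/24"). All values used here are made-up examples.
    """
    ip = ipaddress.ip_address(resource_ip)
    for entry in whitelist:
        # strict=False also accepts entries whose host bits are set,
        # such as "10.0.0.5/24".
        network = ipaddress.ip_network(entry, strict=False)
        if ip.version == network.version and ip in network:
            return True
    return False


# Hypothetical whitelist entries and resource group addresses.
example_whitelist = ["192.168.0.0/24", "10.0.0.5"]
print(is_whitelisted("192.168.0.37", example_whitelist))
print(is_whitelisted("10.0.0.6", example_whitelist))
```

A single IP address entry is treated as a block that contains only that address, so the same check works for both kinds of whitelist entries.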
Add a data source
Before you can use EMR Serverless StarRocks in DataWorks, you must add a StarRocks data source and connect DataWorks to the created EMR Serverless StarRocks instance based on the StarRocks data source. This way, you can use EMR Serverless StarRocks in different services of DataWorks.
For workspaces that Participate in Public Preview of Data Studio, a data source with the same name is automatically generated when you associate EMR Serverless StarRocks computing resources. In this case, you do not need to create a data source as described in this topic.
For workspaces that do not Participate in Public Preview of Data Studio, follow the steps in this topic to manually create a StarRocks data source before using StarRocks in DataWorks for development.
For more information about a StarRocks data source, see StarRocks data source. The following information describes the entry point for adding a StarRocks data source and the key configurations when you add a StarRocks data source.
Go to the Data Sources page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, click Data Sources.
On the Data Sources page, click Add Data Source and configure the parameters. The following table describes the key configurations. You can use the default values for other parameters.
The method to add a StarRocks data source varies based on the network connectivity between the EMR Serverless StarRocks instance and your DataWorks resource group. For more information about network connectivity solutions, see Network connectivity solutions.
Connection over an internal network
Key parameter: Description
Configuration Mode: The mode in which you want to add the data source. Select Alibaba Cloud Instance Mode.
Alibaba Cloud Account: Select Current Alibaba Cloud Account if the EMR Serverless StarRocks instance belongs to the same Alibaba Cloud account as DataWorks. Select Another Alibaba Cloud Account if the EMR Serverless StarRocks instance does not belong to the same Alibaba Cloud account as DataWorks. After you select Another Alibaba Cloud Account, you must also configure the UID of Another Alibaba Cloud Account and RAM Role parameters. For more information about RAM role configuration, see Configure cross-account authorization.
Region: Select the region in which the EMR Serverless StarRocks instance resides.
Instance: Select the created EMR Serverless StarRocks instance.
Database Name: The name of the database to be connected. You can use EMR StarRocks Manager to connect to the instance and view the database name on the Metadata Management page.
Username and Password: The username and password that are used to access the instance. By default, when you create an EMR Serverless StarRocks instance, an admin user is created. The password is the custom password that is specified when you create the instance.
Connection Configuration: In this section, you must test the network connectivity between the data source and the purchased resource group. If the connection status is Connected, the data source is connected to the resource group.
Connection over the Internet
Key parameter: Description
Configuration Mode: The mode in which you want to add the data source. Select Connection String Mode.
Host Address/IP Address: The public address of the FE node in the EMR Serverless StarRocks instance.
Port: The query port of the FE node in the EMR Serverless StarRocks instance. The default query port is 9030.
Load URL: The address of the FE node that is used to access Stream Load in StarRocks. The address is in the "Public address of an FE node:HTTP port of an FE node" format. If you specify multiple addresses, separate the addresses with commas (,).
Database Name: The name of the database to be connected. You can use EMR StarRocks Manager to connect to the instance and view the database name on the Metadata Management page.
Username and Password: The username and password that are used to access the instance. By default, when you create an EMR Serverless StarRocks instance, an admin user is created. The password is the custom password that is specified when you create the instance.
Connection Configuration: In this section, you must test the network connectivity between the data source and the purchased resource group. If the connection status is Connected, the data source is connected to the resource group.
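The Load URL format described above (public address of an FE node, a colon, the HTTP port of that node, with multiple addresses separated by commas) can be assembled as in the following sketch. The class, function, and host names are hypothetical, and the HTTP port 8030 is only an assumed example value. Use the addresses and ports shown for your own instance.

```python
from dataclasses import dataclass

# Default FE query port for the Port parameter, as stated in this topic.
DEFAULT_QUERY_PORT = 9030


@dataclass
class FeNode:
    """One FE node. Field names are hypothetical, for illustration only."""
    public_address: str
    http_port: int


def build_load_url(nodes: list[FeNode]) -> str:
    """Join FE nodes as 'address:http_port', separated by commas."""
    if not nodes:
        raise ValueError("at least one FE node is required")
    for node in nodes:
        if not 0 < node.http_port < 65536:
            raise ValueError(f"invalid HTTP port: {node.http_port}")
    return ",".join(f"{node.public_address}:{node.http_port}" for node in nodes)


# Made-up FE addresses; 8030 is an assumed example HTTP port.
example_nodes = [
    FeNode("fe1.example.com", 8030),
    FeNode("fe2.example.com", 8030),
]
print(build_load_url(example_nodes))
```

The resulting string is what you would paste into the Load URL field. The query port (9030 by default) goes into the separate Port field and is not part of the Load URL.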
Data synchronization
DataWorks allows you to synchronize data from multiple types of data sources, such as MySQL, Hive, Kafka, Object Storage Service (OSS), and Hadoop Distributed File System (HDFS), to EMR Serverless StarRocks tables. The following example describes how to create a batch synchronization node in DataStudio to synchronize data from a MySQL data source to EMR Serverless StarRocks tables.
For information about how to configure a synchronization node for a StarRocks data source, see StarRocks data source.
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a batch synchronization node. Specify MySQL as the source and StarRocks as the destination.

Select a resource group and separately test the network connectivity between the resource group and the source and destination.
Specify the scheduling cycle, and commit and deploy the node.
After the debugging is complete, you can perform the following operations to allow the node to be periodically run: Click Properties in the right-side navigation pane of the configuration tab of the node. On the Properties tab, configure parameters such as the scheduling cycle and rerun policy, and select a resource group that you want to use to run the node. Then, commit and deploy the node.
Data development, scheduling, and O&M
For an EMR Serverless StarRocks task that requires periodic scheduling, you can perform the following steps to allow the task to be periodically run:
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a StarRocks node in DataStudio. On the configuration tab of the StarRocks node, select the desired StarRocks data source, and write SQL statements.

Select the SQL statements that you want to debug, click Run, and then select a resource group to perform the debug operation.
Specify the scheduling cycle, and commit and deploy the node.
After the debugging is complete, you can perform the following operations to allow the task on the node to be periodically run: Click Properties in the right-side navigation pane of the configuration tab of the node. On the Properties tab, configure parameters such as the scheduling cycle and rerun policy, and select a resource group that you want to use to run the node. Then, commit and deploy the node.
Data analysis
You can use the DataAnalysis service provided by DataWorks to quickly analyze data in EMR Serverless StarRocks tables.
Go to the SQL Query page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to DataAnalysis. In the left-side navigation pane of the page that appears, click SQL Query.
Click the icon in the left-side navigation pane and choose . On the System Management page, select a resource group for the StarRocks engine type.
Go to the SQL Query page and create an SQL query file. In the upper-right corner of the configuration tab of the created file, select StarRocks from the Data Source Type drop-down list and select a data source from the Data Source Name drop-down list. Then, modify and execute SQL query statements to analyze EMR Serverless StarRocks data.

Data services
DataService Studio allows you to create APIs of the StarRocks data source type.
Go to the DataService Studio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataService Studio.
Create an API and configure parameters.
DataService Studio allows you to create an API by using the codeless UI or the code editor. In code editor mode, the request parameters and response parameters of an API can be automatically generated based on SQL statements. The following figure shows an example of how to create an API by using the codeless UI.

On the configuration tab of the API, set the Datasource Type parameter to StarRocks, select the added StarRocks data source, and then select a table. Then, configure API settings such as request parameters and response parameters as prompted.
Click Resource Group in the right-side navigation pane of the configuration tab of the API. On the Resource Group tab, set the Scheme parameter to Exclusive Resource Group for DataService Studio and select an exclusive resource group for DataService Studio based on your business requirements.

Test the API. After the test is successful, submit and publish the API.
Data Map
The Data Map service allows you to collect metadata of the StarRocks data source, search for tables, and view information on the table details page.
Collect metadata
Go to the Data Map page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Map.
In the left-side navigation pane of the DataMap page, click the icon. On the page that appears, find StarRocks on the Data Source Perspective tab and click Manage in the upper-right corner. Click the Data Sources for Which No Crawler Is Created tab. Find the desired data source and click Create Crawler in the Actions column.
In the Configure Collection Plan dialog box, configure the Resource Group Name parameter and click Test Network Connectivity to test the network connectivity. After the test is successful, configure the Collection Plan parameter and click Confirmation.
Note: For more information about metadata collection, see Metadata collection.
Only serverless resource groups can be used to run this type of task.
Search for tables
Go to the Data Map page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Map.
In the left-side navigation pane of the DataMap page, click the icon. On the page that appears, select StarRocks on the Data Source tab, and then search for tables by type in the upper part of the page.
Note: For more information about table search, see Query and manage common data.

View table details
Go to the Data Map page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Map.
Find the desired table on the homepage of Data Map or use the search feature to find the desired table. Then, click the table name to go to the details page of the table.
On the table details page, you can view information in the Table Basic Information, Technical Information, and Business Information sections, and view information on the Details, Output, and Lineage tabs.
Note: For more information about table details, see Query and manage common data.
EMR Serverless StarRocks instances of V3.1.13, V3.2.9, or a minor version later than V3.1.13 or V3.2.9 allow you to enable the metadata lineage analysis feature. For more information, see View lineages.
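The version requirement above can be expressed as a small check. The following sketch rests on one reading of that requirement: a V3.1 instance qualifies from V3.1.13 onward, and any other instance qualifies from V3.2.9 onward. That reading, and both function names, are assumptions for illustration. Confirm the supported versions against View lineages.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a version such as 'V3.2.9' or '3.2.9' into (3, 2, 9)."""
    major, minor, patch = version.lstrip("Vv").split(".")
    return int(major), int(minor), int(patch)


def supports_lineage(version: str) -> bool:
    """Check a version against the assumed lineage requirement.

    Assumption: the V3.1 line qualifies from V3.1.13, and every other
    version qualifies only if it is V3.2.9 or later.
    """
    parsed = parse_version(version)
    if parsed[:2] == (3, 1):
        return parsed >= (3, 1, 13)
    return parsed >= (3, 2, 9)


# Example version strings, not taken from a real instance.
for candidate in ("V3.1.12", "V3.1.13", "V3.2.8", "V3.2.9"):
    print(candidate, supports_lineage(candidate))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "3.1.9" would sort after "3.1.13".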
