DataWorks provides StarRocks Reader and StarRocks Writer, which you can use to read data from and write data to StarRocks data sources. You can also create StarRocks nodes to develop StarRocks tasks, schedule them periodically, and integrate them with other types of tasks. This topic describes how to use DataWorks to connect to an E-MapReduce (EMR) Serverless StarRocks instance.
Prerequisites
DataWorks is activated, and a workspace is created. For more information, see Activate DataWorks.
A resource group is purchased and associated with your workspace, and network settings are configured for the resource group. For more information, see Resource group management.
An EMR Serverless StarRocks instance is created. For more information, see Create an instance.
Procedure
Step 1: Configure network settings
Before you create the data source, add the IP address or CIDR block of the resource group that you want to use to the internal IP address whitelist of the EMR Serverless StarRocks instance. Otherwise, the resource group cannot connect to the instance.
For information about how to obtain the IP address or CIDR block of a resource group in DataWorks, see Configure an IP address whitelist.
The following figure shows the entry points for accessing IP address whitelists of an EMR Serverless StarRocks instance. For more information, see Configure network security settings.
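A whitelist entry can be a single IP address or a CIDR block. Before you add an entry, you may want to check whether the address of the resource group is already covered by the existing entries. The following Python sketch uses only the standard ipaddress module to perform that check. The function name and the sample addresses are hypothetical. Use the actual resource group address from DataWorks and the actual whitelist entries from the EMR console.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(address: str, whitelist: Iterable[str]) -> bool:
    """Return True if `address` (a single IP or a CIDR block) lies entirely
    inside at least one whitelist entry (a single IP or a CIDR block)."""
    # strict=False accepts entries that have host bits set, such as "10.0.3.5/24".
    target = ipaddress.ip_network(address, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed, so skip those pairs.
        if target.version == allowed.version and target.subnet_of(allowed):
            return True
    return False


# Hypothetical values: replace them with the resource group's real address
# and the entries of the instance's internal IP address whitelist.
whitelist = ["10.0.0.0/16", "192.168.1.10"]
print(is_whitelisted("10.0.3.0/24", whitelist))   # True
print(is_whitelisted("192.168.1.11", whitelist))  # False
```

A single IP address is treated as a /32 (or /128) network. This lets one containment test handle both kinds of entries.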
Step 2: Create a StarRocks data source
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Data source to go to the Data Sources page.
On the Data Sources page, click Add Data Source.
In the Add Data Source dialog box, enter StarRocks in the search box and click StarRocks.
In the Add StarRocks Data Source dialog box, configure the required parameters described in the following table and use the default values for other parameters.
Data Source Name: Specify a name based on your business requirements. In this example, the name is StarRocks.
Configuration Mode: Select Alibaba Cloud Instance Mode.
If you connect to the EMR Serverless StarRocks instance over an internal network, make sure that the DataWorks resource group and the StarRocks instance are in the same VPC.
If you want to connect to the instance over the Internet, select Connection String Mode instead. For more information, see StarRocks data source.
Region: Select the region where the EMR Serverless StarRocks instance resides. Example: China East 1 (Hangzhou).
Instance: Select the EMR Serverless StarRocks instance that you created from the drop-down list.
Database Name: Specify the name of the database to which you want to connect. You can obtain a database name in either of the following ways:
Use EMR StarRocks Manager to connect to the EMR Serverless StarRocks instance and view the names of existing databases on the Metadata Management page.
Use the name of a built-in database of the instance, such as information_schema.
Note: If the SQL statements that you write in DataWorks access tables in other databases, make sure that you have permissions on those databases and specify each table in the <database name>.<table name> format.
Username and Password: The username and password used to access the EMR Serverless StarRocks instance. The default administrator user is admin, and the password is the one that you specified when you created the StarRocks instance. If you forget the password, reset it by following How do I reset the password of a StarRocks instance?
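Cross-database references use the <database name>.<table name> format. If you generate SQL text in a script, the following Python sketch builds such a reference and quotes each part with backticks, the MySQL-style identifier quoting that StarRocks accepts. The helper name and the orders table are hypothetical. Names that contain backticks are rejected rather than escaped, so the sketch does not depend on a particular escaping rule.

```python
def qualified_table_name(database: str, table: str) -> str:
    """Return a `<database name>.<table name>` reference with each part
    quoted in backticks (MySQL-style identifier quoting)."""

    def quote(identifier: str) -> str:
        # Reject empty names and names that contain backticks instead of
        # guessing at an escaping rule.
        if not identifier or "`" in identifier:
            raise ValueError(f"unsupported identifier: {identifier!r}")
        return f"`{identifier}`"

    return f"{quote(database)}.{quote(table)}"


# "orders" is a hypothetical table in the load_test database.
sql = f"SELECT * FROM {qualified_table_name('load_test', 'orders')} LIMIT 10;"
print(sql)  # SELECT * FROM `load_test`.`orders` LIMIT 10;
```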
In the Connection Configuration section of the dialog box, find the resource group that is associated with the workspace and click Test Network Connectivity in the Connection Status column.
If Connected is displayed in the Connection Status column, proceed to the next step.
If Connection failed is displayed in the Connection Status column, the resource group cannot connect to the data source, and tasks that use the data source cannot run. Use the Network Connectivity Diagnostic Tool panel that appears to view the cause of the failure and troubleshoot the issue.
Click Complete.
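The Test Network Connectivity button runs its check from the resource group itself, and the console result is authoritative. To troubleshoot from another machine in the same VPC, you can use the following Python sketch. It only checks whether a TCP connection to the instance opens. It does not verify the username, the password, or the database. The sketch assumes port 9030, which is the default StarRocks frontend query port (MySQL protocol). If the internal endpoint of your instance lists a different port, use that port instead.

```python
import socket


def can_reach(host: str, port: int = 9030, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    9030 is the default StarRocks frontend query port (MySQL protocol);
    pass the port of your instance's internal endpoint if it differs.
    """
    try:
        # create_connection raises OSError (timeouts included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("<internal endpoint of the instance>")` returns False when the whitelist or VPC settings from Step 1 block the connection. The placeholder endpoint is hypothetical. Copy the real endpoint from the EMR console.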
Step 3: Create a StarRocks node
You can write SQL statements for a StarRocks node to develop, debug, and schedule tasks. After you create a StarRocks node, you can specify the scheduling cycle of the node.
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Find the desired workflow, right-click the workflow name, and then choose .
In the Create Node dialog box, configure the Name parameter and click Confirm. You can then use the node to develop and configure tasks.
Step 4: Develop StarRocks tasks
On the configuration tab of the StarRocks node, select the created StarRocks data source from the Select Data Source drop-down list.
Write and run SQL code.
Write SQL code based on your business requirements and run it. In the dialog box that appears, select the resource group for scheduling from the drop-down list. Use the resource group whose network connectivity you tested in Step 2. The following examples show how to develop StarRocks tasks:
Example 1: Create a database
CREATE DATABASE IF NOT EXISTS load_test;
After the statement is successfully executed, you can verify the result in EMR StarRocks Manager.
Use EMR StarRocks Manager to connect to an EMR Serverless StarRocks instance.
In the left-side navigation pane of EMR StarRocks Manager, click SQL Editor. Create a file, enter the following command in the SQL editor, and then click Run:
SHOW DATABASES;
If the load_test database appears in the result, the database is created.
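The same check can be scripted. StarRocks speaks the MySQL protocol, so any MySQL DB-API 2.0 driver can run SHOW DATABASES. The following Python sketch takes a cursor from such a driver and reports whether a database is listed. The function name is hypothetical. The usage comment assumes the third-party PyMySQL package, port 9030, and a placeholder endpoint.

```python
def database_exists(cursor, name: str) -> bool:
    """Run SHOW DATABASES through a DB-API 2.0 cursor and report whether
    `name` is listed. Each result row holds one database name in column 0."""
    cursor.execute("SHOW DATABASES")
    return any(row[0] == name for row in cursor.fetchall())


# Usage sketch (assumes PyMySQL is installed; endpoint and password are placeholders):
# import pymysql
# conn = pymysql.connect(host="<internal endpoint>", port=9030,
#                        user="admin", password="<password>")
# with conn.cursor() as cur:
#     print(database_exists(cur, "load_test"))
```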
Example 2: Query information about the base tables in the StarRocks instance
SELECT * FROM information_schema.tables WHERE table_type = 'BASE TABLE';
The following figure shows the output.
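The query in Example 2 returns every column of information_schema.tables. If you only need to know which tables each database contains, you can select the table_schema and table_name columns and group the rows. The following Python sketch does this through any DB-API 2.0 cursor, for example one from a MySQL driver connected to the instance. The constant and function names are hypothetical.

```python
from collections import defaultdict

# Same filter as Example 2, but selecting only the two columns used below.
BASE_TABLES_SQL = (
    "SELECT table_schema, table_name FROM information_schema.tables "
    "WHERE table_type = 'BASE TABLE'"
)


def base_tables_by_database(cursor) -> dict:
    """Run BASE_TABLES_SQL through a DB-API 2.0 cursor and return a mapping
    from database name to its base table names, sorted alphabetically."""
    cursor.execute(BASE_TABLES_SQL)
    grouped = defaultdict(list)
    for schema, table in cursor.fetchall():
        grouped[schema].append(table)
    return {schema: sorted(tables) for schema, tables in grouped.items()}
```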
References
For more information about how to use a StarRocks node to develop and schedule tasks, see Configure a StarRocks node.