DataWorks StarRocks nodes enable you to develop and periodically schedule StarRocks tasks and integrate them with other jobs. This topic describes the main steps for developing tasks using StarRocks nodes.
Background information
StarRocks is a Massively Parallel Processing (MPP) database and OLAP analytics engine that is compatible with the MySQL protocol. It is designed for high-performance analytics across OLAP scenarios such as multidimensional analysis, data lake analytics, high-concurrency queries, and real-time analytics.
Prerequisites
- A business flow has been created.
  Data Studio organizes engine-specific development by business flow, so you must create a business flow before you create a node. For more information, see Create a business flow.
- A StarRocks data source has been created.
  You must first register your StarRocks database as a StarRocks data source in DataWorks. For more information, see Create a StarRocks data source.
  Note: StarRocks nodes support only StarRocks data sources created using a Java Database Connectivity (JDBC) connection string.
- (Optional; required for Resource Access Management (RAM) users) The RAM user used for task development has been added to the target workspace and assigned either the Development or Workspace Administrator role. The Workspace Administrator role grants broad permissions, so assign it with caution. For more information about adding members and granting permissions, see Add members to a workspace.
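The StarRocks data source in the prerequisites above is registered with a JDBC connection string that points at a StarRocks frontend (FE). As a hedged sketch, an account with sufficient system-level privileges can look up the FE hosts and query ports from a MySQL-compatible client. The QueryPort column (9030 by default) is the port typically used in a URL of the form jdbc:mysql://<fe_host>:<query_port>/<database>. The exact format DataWorks expects is defined in Create a StarRocks data source, so treat this only as an illustration.

```sql
-- Run in a MySQL-compatible client connected to StarRocks, not in DataWorks.
-- Lists the frontend (FE) nodes; check the Host and QueryPort columns.
SHOW FRONTENDS;

-- Confirm that the database you plan to put in the connection string exists.
SHOW DATABASES;
```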
Limits
Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Step 1: Create a StarRocks node
- Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
- Right-click the target business flow and choose .
- In the Create Node dialog box, enter a Name for the node and click OK. The node is created, and you can now develop and configure your task in it.
Step 2: Develop a StarRocks task
(Optional) Select a StarRocks data source
If your workspace has multiple StarRocks data sources, select the target data source from the drop-down list at the top of the StarRocks node editing page. If only one StarRocks data source exists, it is used by default.
StarRocks nodes support only StarRocks data sources created using a Java Database Connectivity (JDBC) connection string.
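After you select a data source, it can help to confirm which account and database the node actually connects with before writing real logic. The statements below are a minimal sketch using standard StarRocks functions; what they return depends entirely on how the selected data source is configured.

```sql
-- Account that the selected data source uses to connect.
SELECT CURRENT_USER();

-- Database that the session currently uses, if any.
SELECT DATABASE();

-- Version of the connected StarRocks cluster.
SELECT CURRENT_VERSION();
```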
Develop SQL code: Simple example
Write the StarRocks task in the code editor of the StarRocks node. The following example queries information about all base tables in the StarRocks database:
SELECT * FROM information_schema.tables
WHERE table_type = 'BASE TABLE';
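The query above returns every column for every base table across all databases. A narrower variant, sketched below, selects a few standard information_schema.tables columns and restricts results to one database. The database name demo_db is a hypothetical placeholder, and the column list and LIMIT value are illustrative choices rather than requirements.

```sql
-- List base tables in one database with their creation times, newest first.
-- demo_db is a hypothetical database name; replace it with your own.
SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_TIME
FROM information_schema.tables
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_SCHEMA = 'demo_db'
ORDER BY CREATE_TIME DESC
LIMIT 20;
```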
Develop SQL code: Switch catalog and database
SET CATALOG catalog_name; -- Switch the current session to the specified catalog.
USE catalog_name.db_name; -- Switch the current session to the specified catalog and database.
If a catalog or database name is a keyword, wrap it in backticks (``) to avoid parsing errors.
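The statements below sketch a session that moves from the internal catalog to an external catalog and back. The names hive_catalog, sales_db, and order are hypothetical placeholders; default_catalog is the built-in StarRocks internal catalog. The last statement applies the backtick quoting described above to a database whose name is a reserved word.

```sql
-- List the catalogs visible to the current account.
SHOW CATALOGS;

-- Switch to a hypothetical external catalog and list its databases.
SET CATALOG hive_catalog;
SHOW DATABASES;

-- Switch catalog and database in one statement.
USE hive_catalog.sales_db;

-- Return to the internal catalog; `order` is a reserved word, so it is quoted.
USE default_catalog.`order`;
```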
Develop SQL code: Use scheduling parameters
DataWorks scheduling parameters let recurring tasks receive values that change from run to run. In your node code, define variables in the format ${variable_name}. Then, in the Scheduling > Scheduling Parameter section of the right-side navigation pane, assign values to these variables. For supported formats and configuration details, see Supported scheduling parameter formats and Configure and use scheduling parameters.
In the following example, the scheduling parameter a is set to $[yyyymmdd] (today’s date). The code queries tables created on the current day:
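SELECT * FROM information_schema.tables
-- CREATE_TIME is a DATETIME, so format it as yyyymmdd before comparing it with ${a}.
WHERE DATE_FORMAT(CREATE_TIME, '%Y%m%d') = '${a}';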
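As a further sketch, the daily aggregation below uses a hypothetical parameter bizdate assigned the built-in value $bizdate, the business date, which is one day before the scheduled run date. The table demo_db.ods_orders and its dt column (assumed to store yyyymmdd strings) are placeholders for illustration. DataWorks substitutes the resolved value into the code before it is sent to StarRocks, so the trailing comment shows roughly what StarRocks receives.

```sql
-- Scheduling Parameter (hypothetical): bizdate=$bizdate
-- Count orders for the business date; dt is assumed to hold yyyymmdd strings.
SELECT '${bizdate}' AS biz_date,
       COUNT(*)      AS order_cnt
FROM demo_db.ods_orders
WHERE dt = '${bizdate}';

-- For a run scheduled on 2024-06-02, StarRocks would receive:
-- SELECT '20240601' AS biz_date, COUNT(*) AS order_cnt
-- FROM demo_db.ods_orders WHERE dt = '20240601';
```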
Step 3: Configure task scheduling
To periodically run the node task, click Scheduling on the right side of the node editing page and configure scheduling settings based on your needs. For more information, see Overview of task scheduling properties.
You must configure the node’s Rerun attribute and Parent Nodes before you can submit the node.
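Because the Rerun attribute controls whether a node may run again for the same cycle, it helps to make the node's write logic idempotent. A minimal sketch, assuming a hypothetical table demo_db.dws_order_daily with columns (dt, order_cnt) that is partitioned by day with partitions named p<yyyymmdd>: INSERT OVERWRITE replaces the target partition instead of appending to it, so running the same cycle twice leaves a single copy of the result.

```sql
-- Scheduling Parameter (hypothetical): bizdate=$bizdate
-- Overwrite only the partition for the business date, so a rerun replaces
-- the previous result instead of duplicating it.
INSERT OVERWRITE demo_db.dws_order_daily PARTITION (p${bizdate})
SELECT dt,
       COUNT(*) AS order_cnt
FROM demo_db.ods_orders
WHERE dt = '${bizdate}'
GROUP BY dt;
```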
Step 4: Test task code
Perform the following test operations as needed to verify that the task behaves as expected.
- (Optional) Select a resource group and assign custom parameter values.
  - Click the icon in the toolbar. In the Parameter dialog box, select the schedule resource group for testing.
  - If your task code uses scheduling parameter variables, assign values to them here for testing. For more information about parameter assignment logic, see Task debugging process.
- Save and run the task code.
  Click the icon in the toolbar to save your task code. Then click the icon to run the task.
- (Optional) Perform smoke testing.
  To verify in the development environment that the scheduled node task executes as expected, perform smoke testing when you submit the node or after it has been submitted. For more information, see Perform smoke testing.
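When testing in the editor, one low-risk check is to run a parameterized query with a literal value first, then run it again with the value assigned in the Parameter dialog box and compare the results. The sketch below adapts the information_schema example from the scheduling parameter section; the date literal is an arbitrary illustration, and a is the same parameter name used in that example.

```sql
-- 1. Run with a hard-coded date to confirm the filter logic itself.
SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_TIME
FROM information_schema.tables
WHERE DATE_FORMAT(CREATE_TIME, '%Y%m%d') = '20240601';

-- 2. Run the parameterized form after assigning a test value to a
--    in the Parameter dialog box.
SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_TIME
FROM information_schema.tables
WHERE DATE_FORMAT(CREATE_TIME, '%Y%m%d') = '${a}';
```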
Step 5: Submit and publish the task
After configuring the node task, submit and publish it. Once published, the node runs periodically based on its scheduling configuration.
- Click the icon in the toolbar to save the node.
- Click the icon in the toolbar to submit the node task. In the Submission dialog box, enter a Change Description and, optionally, choose whether to require code review after submission.
  Note:
  - You must configure the node's Rerun attribute and Parent Nodes before you can submit the node.
  - Code review helps ensure code quality and prevents errors caused by unreviewed code being published directly to production. If code review is enabled, the submitted node code must be approved by reviewers before it can be published. For more information, see Code review.
- If you are using a workspace in standard mode, after the task is submitted, click Publish in the upper-right corner of the node editing page to deploy the task to the production environment. For more information, see Publish a task.
What to do next
After the task is submitted and published, it runs periodically based on its configuration. Click O&M Personnel in the upper-right corner of the node editing interface to go to Operation Center and monitor the scheduling status of recurring tasks. For more information, see Manage recurring tasks.