Use the DRDS node in DataWorks to run SQL tasks against a Distributed Relational Database Service (DRDS) database on a recurring schedule and to integrate them with other jobs in your workflow. DRDS is a distributed database service that horizontally scales a relational database, such as MySQL, into a distributed system. It supports massive data storage and access while retaining the features of a relational database. For more information, see DRDS Product Overview.
Prerequisites
Before you begin, ensure that you have:
A workflow (business flow) in DataStudio. DataStudio organizes development tasks by workflow. For more information, see Create a workflow.
A DRDS data source added to DataWorks using a Java Database Connectivity (JDBC) connection string. For more information, see Data Source Management and DRDS (PolarDB-X 1.0) data source.
Network connectivity established between the data source and the resource group. For more information, see Network connection solutions.
(Required only if you use a RAM user) The RAM user added to the workspace and assigned the Develop or Workspace Administrator role. The Workspace Administrator role has extensive permissions, so grant it with caution. For more information, see Add members to a workspace.
DRDS nodes support only DRDS data sources created using a JDBC connection string.
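DRDS is compatible with the MySQL protocol, so the JDBC connection string for a DRDS data source typically follows the MySQL layout. The sketch below assembles and sanity-checks such a string. The `jdbc:mysql://host:port/database` format, the endpoint, the port, and the helper names are assumptions for illustration only. Always use the connection details that the DRDS console shows for your instance.

```python
from urllib.parse import urlparse


def build_drds_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a MySQL-style JDBC URL (assumed format: jdbc:mysql://host:port/database)."""
    if not host or not database:
        raise ValueError("host and database are required")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:mysql://{host}:{port}/{database}"


def parse_jdbc_url(url: str) -> dict:
    """Split a jdbc:mysql:// URL into host, port, and database for a quick sanity check."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("not a JDBC connection string")
    # Strip "jdbc:" so urlparse sees an ordinary "mysql://host:port/database" URL.
    parsed = urlparse(url[len(prefix):])
    if parsed.scheme != "mysql":
        raise ValueError(f"unexpected subprotocol: {parsed.scheme}")
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }


# Hypothetical endpoint, port, and database name, for illustration only.
url = build_drds_jdbc_url("drds-example.internal", 3306, "demo_db")
print(url)  # jdbc:mysql://drds-example.internal:3306/demo_db
print(parse_jdbc_url(url))
```

Parsing the string back is a cheap way to catch typos, such as a missing port or database name, before you paste the string into the data source configuration.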
Supported regions
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Step 1: Create a DRDS node
Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Right-click the target business flow and choose .
In the Create Node dialog box, set the Name parameter for the node and click OK.
Step 2: Develop a DRDS task
Select a DRDS data source (optional)
If multiple DRDS data sources exist in your workspace, select the required data source on the node editing page. If only one data source exists, it is used by default.
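You can model the selection rule above as a small function. If a workspace has a single data source, the node uses it by default. If it has several, you must choose one. This is a hypothetical model of the described behavior, not DataWorks code. `resolve_data_source` and the data source names are illustrative.

```python
from typing import Optional, Sequence


def resolve_data_source(available: Sequence[str], selected: Optional[str] = None) -> str:
    """Return the DRDS data source a node would use (hypothetical model of the rule above)."""
    if not available:
        raise ValueError("no DRDS data source is configured in the workspace")
    if selected is not None:
        if selected not in available:
            raise ValueError(f"unknown data source: {selected}")
        return selected
    if len(available) == 1:
        # Only one data source exists, so it is used by default.
        return available[0]
    # Several data sources exist and none was selected on the node editing page.
    raise ValueError("multiple DRDS data sources exist; select one explicitly")


print(resolve_data_source(["drds_prod"]))                   # drds_prod
print(resolve_data_source(["drds_a", "drds_b"], "drds_b"))  # drds_b
```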
Write SQL code
In the code editor, write the SQL for the task. The following example selects all records from a table. Replace usertablename with the name of your table.
SELECT * FROM usertablename;
Use scheduling parameters
DataWorks provides scheduling parameters that pass dynamic values into your SQL code at runtime. Define each variable in your code in the ${variable_name} format. Then, on the Schedule tab in the right-side pane, go to the Scheduling Parameters section and assign a value to each variable.
The following example uses a scheduling parameter. At runtime, DataWorks replaces ${var} with the value you configured in the Scheduling Parameters section.
SELECT '${var}';
For supported variable formats and configuration details, see Supported formats of scheduling parameters and Configure and use scheduling parameters.
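The following sketch shows the substitution step in concrete terms. It replaces each `${name}` placeholder in a SQL string with a configured value, which is conceptually what happens at runtime. It is only an illustration, because DataWorks performs the replacement itself. The `render_sql` helper, the `bizdate` parameter name, and the decision to fail on an unassigned variable are all assumptions of this sketch, not documented DataWorks behavior.

```python
import re
from typing import Mapping

# Matches ${name} placeholders; names are assumed to be letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_sql(sql: str, params: Mapping[str, str]) -> str:
    """Replace every ${name} in sql with params[name]; raise if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value assigned to scheduling parameter '{name}'")
        return params[name]

    # A replacement function (not a string) avoids backslash escaping in the values.
    return PLACEHOLDER.sub(substitute, sql)


print(render_sql("SELECT '${var}';", {"var": "hello"}))  # SELECT 'hello';
print(render_sql("SELECT * FROM usertablename WHERE ds = '${bizdate}';", {"bizdate": "20240101"}))
```

The sketch does plain text replacement, so any quotes around a placeholder, as in `'${var}'`, must already be in the SQL.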
Step 3: Configure task scheduling
In the right-side pane, click Scheduling Configuration to configure the scheduling and dependency properties of the node.
You must configure the Rerun Property and Upstream Dependent Node parameters before you can submit the task.
For a full reference of all scheduling options, see Overview.
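Upstream dependencies determine run order: a node runs only after the nodes it depends on. The sketch below derives one valid order for a hypothetical workflow using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The node names are made up, and the sketch ignores schedule times and rerun settings.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each node maps to the set of its upstream dependent nodes.
upstream = {
    "root_node": set(),
    "sync_orders": {"root_node"},
    "sync_users": {"root_node"},
    "drds_report": {"sync_orders", "sync_users"},
}


def run_order(dependencies: dict) -> list:
    """Return one order in which every node comes after all of its upstream nodes.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(upstream)
print(order)
```

Independent nodes such as `sync_orders` and `sync_users` can appear in either order. A dependency cycle is rejected, which is why circular upstream dependencies cannot be scheduled.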
Step 4: Debug the task code
(Optional) Select a debugging resource group and assign parameter values.
In the toolbar, click the icon that opens the Parameters dialog box. In the dialog box, select a resource group and assign values to any scheduling parameters for debugging. For more information, see Task debugging process.
Save and run the task. Click the Save icon in the toolbar to save the task, and then click the Run icon to run it.
(Optional) Run a smoke test during or after submission to verify that the task runs as expected in the development environment. For more information, see Perform smoke testing.
Step 5: Submit and publish the task
Make sure that the Rerun Property and Upstream Dependent Node parameters are configured before you submit the task. For details, see Step 3: Configure task scheduling.
Click the Save icon in the toolbar to save the node. Then click the Submit icon to submit the node. In the Submit dialog box, enter a Change Description and configure the code review options.
Note: If code review is enabled, a reviewer must approve the code before it can be published. For more information, see Code review.
In standard mode workspaces, click Publish in the upper-right corner to deploy the task to production. For more information, see Publish tasks.
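You can summarize the submit and publish flow above as a simple gate. A task can be submitted only after its rerun and upstream settings are configured. In a standard mode workspace, it can be published only after it is submitted and, if code review is enabled, approved. The sketch below is a hypothetical model of these rules. The `TaskVersion` class and its fields are assumptions, not a DataWorks API.

```python
from dataclasses import dataclass


@dataclass
class TaskVersion:
    """Hypothetical snapshot of a node version in a standard mode workspace."""
    rerun_property_set: bool
    upstream_configured: bool
    submitted: bool = False
    code_review_enabled: bool = False
    review_approved: bool = False


def can_submit(task: TaskVersion) -> bool:
    # The Rerun Property and Upstream Dependent Node must be configured first.
    return task.rerun_property_set and task.upstream_configured


def can_publish(task: TaskVersion) -> bool:
    # The task must be submitted; if code review is enabled, it must also be approved.
    if not task.submitted:
        return False
    return task.review_approved or not task.code_review_enabled


draft = TaskVersion(rerun_property_set=True, upstream_configured=True)
print(can_submit(draft))   # True
draft.submitted = True
draft.code_review_enabled = True
print(can_publish(draft))  # False: the review is still pending
draft.review_approved = True
print(can_publish(draft))  # True
```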
What to do next
After the task is published, it runs on a recurring schedule based on your configuration. Click O&M in the upper-right corner to go to Operation Center, where you can monitor the scheduling and running status of the task. For more information, see Manage recurring tasks.