Automate Redshift SQL Jobs via DataWorks Scheduling - DataWorks

Use the Redshift node in DataWorks to write, schedule, and monitor SQL tasks against an Amazon Redshift database. This topic covers the end-to-end process: create the node, write SQL, configure the schedule, debug, and publish.

Background information

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can use Amazon Redshift Serverless to access and analyze data without configuring a provisioned data warehouse. For more information, see Amazon Redshift.

Prerequisites

Before you begin, ensure that you have:

A Business Flow. DataStudio organizes development by Business Flows. For details, see Create a workflow.
A Redshift data source added to your DataWorks workspace. For general data source setup, see Data Source Management. For Redshift-specific configuration, see Amazon Redshift data source.
Network connectivity between the data source and the resource group you plan to use. See Network connection solutions.
(RAM users only) Your RAM user added to the workspace with the Develop or Workspace Administrator role. Grant the Workspace Administrator role with caution — it carries high privileges. See Add members to a workspace.

Important

Redshift nodes support only data sources connected via a Java Database Connectivity (JDBC) connection string. Data sources using other connection methods are not supported.

Supported regions

China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

Step 1: Create a Redshift node

Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O\&M \> Data Development. Select your workspace from the drop-down list and click Go to Data Development.
In the DataStudio page, right-click the target Business Flow and choose Create Node \> Database \> Redshift.
In the Create Node dialog box, enter a Name and click OK.

Step 2: Develop a Redshift task

Select a data source

If your workspace has multiple Redshift data sources, select the appropriate one on the node configuration page. If only one exists, it is selected by default.

Write SQL

In the code editor, write the SQL for your task.

Simple query:

SELECT * FROM usertablename;

Query with scheduling parameters:

DataWorks scheduling parameters let you pass dynamic values into recurring tasks. Define a variable in your SQL using the ${variable_name} syntax, then assign a value to it under Schedule \> Scheduling Parameters in the right panel.

SELECT '${var}'; -- Use with scheduling parameters.

For supported parameter formats, see Supported formats of scheduling parameters. For configuration steps, see Configure and use scheduling parameters.

Step 3: Configure task scheduling

Click Scheduling Configuration in the right panel and set the scheduling properties. For details, see Overview.

Note

Configure Rerun Property and Upstream Dependent Node before submitting the node.

Step 4: Debug the task

(Optional) Select a resource group and assign parameter values for debugging. Click the icon in the toolbar. In the Parameters dialog box, select a resource group and assign values to any scheduling parameters. See Task debugging process.
Save and run the task. Click the icon to save, then click the icon to run.
(Optional) Run a smoke test. Run a smoke test during or after submission to verify execution in the development environment. See Perform smoke testing.

Step 5: Submit and publish the task

Click the icon to save the node.
Click the icon to submit the node. In the Submit dialog box, enter a Change Description and select code review options.

Note
If code review is enabled, a reviewer must approve the code before it can be published. See Code review.
In standard mode workspaces, click Publish in the upper-right corner to deploy to the production environment. See Publish tasks.

What's next

After the task is published, it runs on a recurring basis. Click O\&M in the upper-right corner of the node configuration tab to open Operation Center, where you can monitor the scheduling and run status of the task. See Manage recurring tasks.