Use the DB2 node in DataWorks to develop SQL tasks against an IBM DB2 database, schedule them on a recurring basis, and integrate them with other nodes in a Business Flow.
Background information
DB2 is a relational database management system (RDBMS) that stores, retrieves, and manages data. It is suited to high-throughput workloads, large datasets, complex queries, and transaction processing in data warehouse environments. For more information, see the official DB2 website.
Limitations
DB2 nodes are supported in the following countries and regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), Hong Kong (China), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Prerequisites
Before you begin, ensure that you have:
- A Business Flow. DataStudio organizes development by Business Flows. For more information, see Create a workflow.
- A DB2 data source created using a JDBC connection string. DB2 nodes do not support other connection types. For general instructions, see Data Source Management. For DB2-specific configuration, see DB2 data source.
- Network connectivity established between the data source and the resource group. For configuration options, see Network connection solutions.
- (Optional, required for RAM users) The RAM user added to the workspace with the Develop or Workspace Administrator role. Grant the Workspace Administrator role with caution due to its elevated privileges. For more information, see Add members to a workspace.
Step 1: Create a DB2 node
- Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the target workspace from the drop-down list and click Go to Data Development.
- In the left panel, right-click the target Business Flow and choose Create Node > Database > DB2.
- In the Create Node dialog box, set the Name for the node and click Confirm. The node configuration tab opens, where you can develop and configure the task.
Step 2: Develop a DB2 task
Select a data source (optional)
If your workspace has multiple DB2 data sources, select the one to use on the node configuration page. If only one DB2 data source exists, it is selected by default.
Write SQL code
In the code editor, write the SQL for the task. For example:
SELECT * FROM usertablename;
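When exploring a large table, it can help to cap the number of rows returned. The following is a minimal sketch that uses the DB2 FETCH FIRST clause. The table name usertablename is the placeholder from the example above, and the limit of 10 rows is an arbitrary choice.

```sql
-- Preview at most 10 rows instead of returning the whole table.
-- usertablename is a placeholder; replace it with a table in your DB2 schema.
SELECT *
FROM usertablename
FETCH FIRST 10 ROWS ONLY;
```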
Use scheduling parameters
DataWorks scheduling parameters let you pass dynamic values at runtime without modifying your SQL. Define a variable in your code using the ${variable_name} format, then assign its value in the right-side pane under Schedule > Parameters.
SELECT '${var}';
For supported formats, see Supported formats of scheduling parameters. For step-by-step setup, see Configure and use scheduling parameters.
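In practice, a scheduling parameter usually filters the data that a particular run processes. The following sketch assumes a hypothetical table named orders with a character column order_date that stores dates as yyyymmdd, and assumes that var is assigned a yyyymmdd value under Schedule > Parameters. None of these names come from DataWorks or DB2; they are for illustration only.

```sql
-- Hypothetical table and column names, for illustration only.
-- Assumes var is assigned a yyyymmdd value under Schedule > Parameters
-- (for example, var=$bizdate, if that parameter suits your schedule).
-- The placeholder is replaced with the assigned value at runtime, so for a
-- value of 20240101 the condition becomes: order_date = '20240101'
SELECT order_id, order_date, amount
FROM orders
WHERE order_date = '${var}';
```

Because the value is substituted as text, keep the quotation marks around the placeholder for character columns and omit them for numeric columns.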
Step 3: Configure task scheduling
In the right-side pane, click Scheduling Configuration and set the scheduling properties.
Configure Rerun Property and Upstream Dependent Node before submitting the node.
For a full list of scheduling options, see Overview.
Step 4: Debug the task
- (Optional) Select a debugging resource group and assign parameter values. In the toolbar, click the icon that opens the Parameters dialog box. Select a resource group and assign values to any scheduling parameters used in the task. For parameter assignment details, see Task debugging process.
- Save and run the task. Click the Save icon, then click the Run icon.
- (Optional) Run a smoke test during or after submission to verify execution in the development environment. For instructions, see Perform smoke testing.
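Before debugging a longer task, a trivial query can confirm that the resource group reaches the data source. The following is a minimal sketch, assuming the DB2 account used by the data source can read the one-row catalog table SYSIBM.SYSDUMMY1. The column aliases are arbitrary.

```sql
-- Returns a single row with the connected database name and the server time.
-- Assumes the DB2 user can read SYSIBM.SYSDUMMY1.
SELECT CURRENT SERVER AS db_name,
       CURRENT TIMESTAMP AS server_time
FROM SYSIBM.SYSDUMMY1;
```

If this statement fails with a connection error rather than a SQL error, the network configuration or the data source settings are a more likely cause than the task code.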
Step 5: Submit and publish the task
- Click the Save icon to save the node.
- Click the Submit icon to submit. In the Submit dialog box, enter a Change Description and select code review options.
  Note: Configure Rerun Property and Upstream Dependent Node before submitting. If code review is enabled, a reviewer must approve the code before it can be published. For more information, see Code review.
- In standard mode workspaces, click Publish in the upper-right corner to deploy the task to production. For more information, see Publish tasks.
What's next
After the task is published, it runs on the schedule defined in its configuration. To monitor the task, click O&M in the upper-right corner of the node configuration tab to open Operation Center, where you can view the scheduling and run status of recurring tasks. For more information, see Manage recurring tasks.