PolarDB for PostgreSQL nodes in DataWorks let you write and schedule SQL tasks against a PolarDB for PostgreSQL database, and integrate those tasks with other jobs in your pipeline.
Background
PolarDB for PostgreSQL is a cloud-native relational database product developed by Alibaba Cloud. It is 100% compatible with PostgreSQL and provides high compatibility with Oracle syntax. It offers a fast, elastic, high-performance, secure, and reliable database service with mass storage capabilities. It also supports the Alibaba Cloud-developed Ganos multi-dimensional and multi-modal spatiotemporal information engine and the open source PostGIS geographic information engine. For more information, see PolarDB for PostgreSQL.
Prerequisites
Before you begin, ensure that you have:
A Business Flow in DataStudio. DataStudio organizes development by Business Flows, so create one before creating any node. See Create a workflow.
A PolarDB for PostgreSQL data source added to your workspace. See Data source management and PolarDB data sources.
Network connectivity between the data source and the resource group you plan to use. See Network connection solutions.
(RAM users only) The Develop or Workspace Administrator role in the workspace. The Workspace Administrator role carries high privileges, so assign it with caution. See Add members to a workspace.
PolarDB for PostgreSQL nodes support only PolarDB for PostgreSQL data sources created with a Java Database Connectivity (JDBC) connection string.
Supported regions
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Step 1: Create a PolarDB for PostgreSQL node
Log on to the DataWorks console. In the top navigation bar, select your region. In the left-side navigation pane, go to Data Development and O&M > Data Development, select your workspace, and click Go to Data Development.
In DataStudio, right-click the target Business Flow and choose Create Node > Database > PolarDB PostgreSQL.
In the Create Node dialog box, enter a Name for the node and click OK. The node configuration tab opens. Use it to develop and configure the task.
Step 2: Develop the task
Select a data source (if needed)
If your workspace has multiple PolarDB for PostgreSQL data sources, select the one to use on the node editing page. If only one exists, it is used by default.
Write SQL code
In the code editor, write the SQL to execute. Two common patterns:
Basic query
SELECT * FROM usertablename;
Query with scheduling parameters
DataWorks supports scheduling parameters for recurring schedules. Define variables in ${variable_name} format in your code, then assign values on the Schedule tab under Scheduling parameters.
SELECT '${var}'; -- You can use this with scheduling parameters.
For supported formats and configuration details, see Supported formats of scheduling parameters and Configure and use scheduling parameters.
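As a concrete sketch of the pattern, a daily task might filter on a date variable and write results to a summary table. The table and column names below are hypothetical, and the variable ${bizdate} is assumed to be assigned a value (for example, bizdate=$bizdate) on the Schedule tab under Scheduling parameters.

```sql
-- Hypothetical tables: replace "orders" and "daily_order_summary" with your own.
-- ${bizdate} is replaced by DataWorks at run time with the value assigned
-- on the Schedule tab, e.g. the data timestamp in yyyymmdd format.
INSERT INTO daily_order_summary (stat_date, order_count)
SELECT '${bizdate}'::date, COUNT(*)
FROM orders
WHERE order_date = '${bizdate}'::date;
```

Because the variable is substituted as literal text before the SQL is executed, quote it and cast it to the column's type, as shown with ::date above.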
Step 3: Configure task scheduling
Click Scheduling Configuration on the right panel and set the scheduling properties.
Set the Rerun Property and Upstream Dependent Node before submitting the task. For all available properties, see Overview.
Step 4: Debug the task
(Optional) Select a debugging resource group and assign parameter values. Click the run-with-parameters icon in the toolbar. In the Parameters dialog box, select a resource group and assign values to any scheduling parameters. See Task debugging process.
Save and run the task. Click the save icon to save, then click the run icon to run.
(Optional) Run a smoke test to verify execution in the development environment. See Perform smoke testing.
Step 5: Submit and publish the task
Click the save icon to save the node.
Click the submit icon to submit. In the Submit dialog box, enter a Change Description and select code review options.
Note: If code review is enabled, a reviewer must approve the code before it can be published. See Code review.
In standard mode workspaces, click Publish in the upper-right corner to deploy to production. See Publish tasks.
What to do next
After publishing, the task runs on a recurring schedule based on the node's configuration. Click O&M in the upper-right corner of the node configuration tab to open Operation Center, where you can monitor the scheduling and run status of the task. See Manage recurring tasks.