Use the Vertica node in DataWorks to write and periodically schedule SQL tasks against a Vertica database, and integrate those tasks into larger pipelines.
Background information
Vertica is a high-performance, columnar storage database management system (DBMS). It is designed for high-speed processing and querying of large-scale datasets and is primarily used for big data analytics and real-time queries. For more information, see the official Vertica website.
Prerequisites
Before you begin, make sure you have:
-
A business flow. DataStudio organizes development by business flows. Create a workflow if you don't have one.
-
A Vertica data source configured with a JDBC connection string. Vertica nodes only support data sources created using a Java Database Connectivity (JDBC) connection string — other connection types are not supported. To create one, see Data Source Management and Vertica data source.
-
Network connectivity between the data source and the resource group. Verify that the resource group can reach your Vertica database. For configuration options, see Network connection solutions.
-
(RAM users only) The Develop or Workspace Administrator role. Add the RAM user to the workspace and assign the appropriate role. Grant the Workspace Administrator role with caution — it carries elevated privileges. See Add members to a workspace.
Supported regions
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Step 1: Create a Vertica node
-
Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O\&M > Data Development. Select the target workspace from the drop-down list and click Go to Data Development.
-
Right-click the target business flow and choose Create Node > Database > Vertica.
-
In the Create Node dialog box, enter a Name and click Confirm.
Step 2: Develop the Vertica task
(Optional) Select a data source
If the workspace has multiple Vertica data sources, select the appropriate one on the Vertica node configuration tab. If only one Vertica data source exists, it is selected by default.
Only data sources created using a JDBC connection string are supported.
Write SQL
Enter SQL in the code editor. The following example queries a table:
SELECT * FROM usertablename;
Use scheduling parameters
Scheduling parameters let you pass dynamic values into recurring tasks at runtime. Define variables in your SQL using ${variable_name}, then assign values under Schedule > Parameters in the right-side pane.
SELECT '${var}'; -- Use with scheduling parameters.
For supported formats and configuration steps, see Supported formats for scheduling parameters and Configure and use scheduling parameters.
Step 3: Configure task scheduling
Click Scheduling Configuration on the right-side pane and set the scheduling properties.
Configure Rerun Property and Upstream Dependent Node before submitting the task. For a full reference, see Overview.
Step 4: Debug the task
-
(Optional) Set a debugging resource group and assign parameter values. Click the
icon in the toolbar. In the Parameters dialog box, select a resource group and assign values to any scheduling parameters used in the task. For parameter assignment logic, see Task debugging process. -
Save and run the task. Click the
icon to save, then click the
icon to run. -
(Optional) Run a smoke test. Run a smoke test to verify execution in the development environment during or after submission. See Perform smoke testing.
Step 5: Submit and publish the task
-
Click the
icon to save the node. -
Click the
icon. In the Submit dialog box, enter a Change Description and select code review options.Configure Rerun Property and Upstream Dependent Node before submitting. If code review is enabled, a reviewer must approve the code before it can be published. See Code review.
-
In standard mode workspaces, click Publish in the upper-right corner to deploy the task to the production environment. See Publish tasks.
What's next
After the task is published, it runs on a recurring schedule. To monitor its status, click O\&M in the upper-right corner of the node configuration tab to open Operation Center, where you can view the scheduling and run history of recurring tasks. See Manage recurring tasks.