DataWorks: Database nodes

Last Updated: Apr 23, 2026

DataWorks lets you create various types of database nodes to develop SQL tasks, run them on a schedule, and integrate them with other jobs.

Prerequisites

  • A RAM user is added to the workspace (optional).

    The RAM user for task development has been added to the workspace and granted the Development or Workspace Administrator role. The Workspace Administrator role provides extensive permissions, so grant it with caution. For more information, see Add members to a workspace.

  • A DataWorks data source is created.

  • A database node has been created. For more information, see Create task nodes.

Step 1: Develop a database node

  1. After you create a database node, you can develop it.

    1. Select a data source.

      From the Select a data source drop-down list, select the data source for the task. If the required data source is not available, click Add Connection to add a new data source.

      Note
      • In a standard mode workspace, DataWorks displays only data sources configured for both the development and production environments.

      • Database nodes support only data sources created using a JDBC connection string.

    2. Develop the SQL script.

      In the SQL editor, write SQL statements to create the task. The following code is a simple query example:

      SELECT * FROM your_table_name;  --Query the table.
      SELECT '${var}'; --Configure placeholder parameters.
      Note

      You can write statements based on the SQL syntax supported by your configured data source.

    3. Configure a resource group for debugging.

      Click Run Configuration, and from the Compute Resource > DataWorks Resource Group drop-down list, select a serverless resource group that has network connectivity to the data source.

      Note

      To access data sources in a public network or VPC environment, use a scheduling resource group that has passed the connectivity test with the data source. For more information, see Network connectivity solutions.

    4. Configure debugging parameters.

      Click Run Configuration. In the Script Parameters section, you can assign values to the parameters configured in the database node script.

    5. After configuration, click the Save icon to save the SQL node, and then click the Run icon to run the SQL script and verify that it works as expected.

  2. After debugging the SQL script, click schedule settings on the right side of the SQL editor to configure the schedule for the database node. For more information, see Configure schedule settings.
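
The relationship between the `${var}` placeholder in the sample script and the values entered under Script Parameters can be pictured as a plain text substitution that happens before the statement runs. The following is a minimal sketch of that idea, not DataWorks's actual implementation: the `render_sql` helper, the `bizdate` alias, and the value `20260423` are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3
from string import Template


def render_sql(script, params):
    """Replace ${name} placeholders with values (illustration only).

    Template.substitute raises KeyError when a placeholder has no value,
    which mirrors forgetting to assign a script parameter.
    """
    return Template(script).substitute(params)


# Hypothetical script and parameter value, modeled on the sample query.
script = "SELECT '${var}' AS bizdate;"
rendered = render_sql(script, {"var": "20260423"})
print(rendered)

# SQLite is used here only so the sketch runs without a real data source.
conn = sqlite3.connect(":memory:")
row = conn.execute(rendered).fetchone()
print(row[0])
conn.close()
```

Because the substitution is textual, the placeholder in the sample sits inside quotation marks so that the resulting statement contains a string literal.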

Step 2: Deploy and manage database nodes

  1. After you configure the schedule settings, you can submit and deploy the database node to the production environment. For more information, see Submit and deploy nodes.

  2. After deployment, the task runs periodically based on the schedule you configured. You can view the deployed scheduled tasks in Operation and Maintenance Center > Node O&M > Auto Triggered Task O&M > Auto Triggered Task and perform O&M operations. For more information, see Manage scheduled tasks.

Supported data sources

DataWorks supports creating database nodes from various data sources. The supported data source types are listed below:

Note
  • Data sources used for database nodes must be created using a JDBC connection string.

  • Some databases natively support stored procedures, but stored procedures are not supported in DataWorks Data Studio.
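
For reference, a JDBC connection string often takes the shape `jdbc:<subprotocol>://<host>:<port>/<database>`. The sketch below shows one way to pull those parts out of such a string, for example to confirm which host and port a resource group needs to reach. It is an illustration under stated assumptions: the `parse_jdbc_url` helper and the host `db.example.internal` are hypothetical, and the helper handles only this common URL shape (formats such as Oracle's `jdbc:oracle:thin:@host:port:SID` are rejected).

```python
from urllib.parse import urlsplit


def parse_jdbc_url(url):
    """Split a jdbc:<subprotocol>://host:port/database string into parts."""
    if not url.startswith("jdbc:"):
        raise ValueError("not a JDBC connection string")
    # Drop the "jdbc:" prefix so the remainder parses like an ordinary URL.
    parts = urlsplit(url[len("jdbc:"):])
    if not parts.hostname:
        raise ValueError("unsupported JDBC connection string shape")
    return {
        "subprotocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical connection string for a MySQL data source.
info = parse_jdbc_url("jdbc:mysql://db.example.internal:3306/sales?useSSL=true")
print(info)
```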

Data sources that support database nodes (each data source type is followed by its description)

MySQL

MySQL is a relational database management system (RDBMS) used to store and process data. It is one of the most popular RDBMSs, known for its small footprint, fast speed, and low total cost of ownership. For more information, see MySQL.

SQL Server

SQL Server is an RDBMS used to store and process data. It provides reliable, efficient, and secure data management and analytics services. For more information, see SQL Server.

Oracle

Oracle is an RDBMS used to store and process data. It provides reliable, efficient, and secure data management and analytics services. For more information, see Oracle.

PostgreSQL

PostgreSQL is a powerful and flexible open-source RDBMS with a robust data model, high scalability, stability, and a rich set of core features. For more information, see PostgreSQL.

DRDS

DRDS is a distributed database service. It allows you to horizontally scale relational databases into distributed systems, supporting massive data storage and access while maintaining the features of relational databases such as MySQL. For more information, see DRDS.

PolarDB MySQL

PolarDB for MySQL is a next-generation cloud-native database independently developed by Alibaba Cloud. Built on a compute-storage separation architecture, it leverages the advantages of integrated hardware and software to provide highly elastic, high-performance, massively scalable, secure, and reliable database services. It is 100% compatible with MySQL. For more information, see PolarDB for MySQL.

PolarDB PostgreSQL

PolarDB for PostgreSQL is a cloud-native relational database fully developed by Alibaba Cloud. It is 100% compatible with PostgreSQL and highly compatible with Oracle syntax. It provides fast elastic scaling, high performance, massive storage, and secure and reliable database services, and supports the Alibaba Cloud proprietary Ganos multi-dimensional spatiotemporal engine and the open-source PostGIS geographic information engine. For more information, see PolarDB for PostgreSQL.

Doris

Apache Doris is a high-performance, real-time analytical database that is well suited for report analysis, ad hoc queries, and data lake federated query acceleration. For more information, see Introduction to Doris.

MariaDB

MariaDB is an open-source RDBMS that is highly compatible with MySQL. It can seamlessly replace MySQL. After you uninstall MySQL, you can install MariaDB in its place without modifying your application code. For more information, see MariaDB.

SelectDB

SelectDB is a next-generation multi-cloud native real-time data warehouse built on Apache Doris. It focuses on meeting enterprise-level real-time big data analytics needs and provides cost-effective, easy-to-use data analytics services. For more information, see SelectDB.

Redshift

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service. You can access and analyze data through Amazon Redshift Serverless without configuring a provisioned data warehouse. For more information, see Amazon Redshift.

SAP HANA

SAP HANA is a high-performance in-memory database and application platform that combines database, data processing, and application platform capabilities to deliver enterprise-level in-memory computing. For more information, see SAP HANA.

Vertica

Vertica is a high-performance columnar storage database management system (DBMS) that can process and query large-scale datasets at high speed. It is primarily used for big data analytics and real-time queries. For more information, see Vertica.

DM

DM (Dameng) is an OLTP database integrated into business systems. It combines the advantages of distributed computing, elastic computing, and cloud computing, and features flexibility, ease of use, reliability, and high security. For more information, see DM.

KingbaseES

KingbaseES is a large-scale RDBMS that supports the SQL standard. It is suitable for enterprise-level applications that require processing large volumes of data with high concurrency and high availability. For more information, see KingbaseES.

OceanBase

OceanBase is a distributed relational database independently developed by Ant Group and Alibaba. It features strong data consistency, high availability, high performance, online scalability, high compatibility with the SQL standard and mainstream relational databases, and low cost. For more information, see OceanBase.

DB2

DB2 is an RDBMS used to store, retrieve, and manage data. It is suitable for handling high-throughput, large-scale datasets, and complex queries and transaction processing in data warehouses. For more information, see DB2.

GBase 8a

GBase 8a is an RDBMS that supports large-volume data storage and high-concurrency read/write operations. It is commonly used in government, finance, telecom, and energy sectors. GBase 8a supports the SQL standard and provides a range of enterprise-level features such as data partitioning, load balancing, and disaster recovery. For more information, see GBase 8a.