This topic describes how to create a database SQL task in Dataphin.
Limits
You can create and execute stored procedures, but Dataphin does not verify or parse the syntax.
To execute database SQL, you need execution permissions for the data source. Table-level permission verification is not supported. For more information, see Apply for, renew, and return data source permissions.
Triggering data source quality rules through SQL data updates based on MySQL or Oracle data sources is not supported. For more information, see Create data source quality rules.
Full download of query results is not supported. Only sample data download is supported.
Database SQL is not supported for offline computing template node types.
When using a data source that is configured only for the production environment in a Dev-Prod mode project, data backfill or run operations in the development environment may modify production data. Use with caution.
Procedure
In the top navigation bar of the Dataphin homepage, choose Develop > Data Development.
In the top navigation bar of the Development page, select Project (for Dev-Prod mode, you need to select an environment).
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the
icon and select Database Sql.In the Create Database SQL Task dialog box, configure the following parameters.
Parameter
Description
Basic Information
Task Name
Enter a name for the code task.
The name cannot exceed 256 characters and cannot contain vertical lines (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").
Schedule Type
Select the scheduling type for the task. Schedule Type includes:
Recurring Task: Automatically participates in the system's periodic scheduling.
One-Time Task: Requires manual triggering to run.
Select Directory
Select the directory where the task will be stored.
If no directory exists, you can Create Folder as follows:
Click the
icon above the computing task list on the left side of the page to open the Create Folder dialog box.In the Create Folder dialog box, enter the folder Name and Select Directory location as needed.
Click OK.
Use Template
Turn on the Use Template switch to specify whether to use a code template. If you turn on this switch, you must also select a Template and a Template Version. By default, the latest version of the selected template is used.
Reference a code template to improve development efficiency. The code of a template task is read-only. Simply configure the template parameters to complete code development. For more information, see Create an offline computing template.
NoteAfter a database SQL task that uses a data source of AnalyticDB for PostgreSQL, StarRocks, Doris, SelectDB, or GaussDB (DWS) references an offline computing template, the task type cannot be changed.
Description
Enter a brief description of the task, within 1,000 characters.
Data Source Configuration
Data Source Type
Select the data source type for the SQL task. For more information about the supported data source types, see the Offline R&D - Database SQL column in Data sources supported by Dataphin.
If you use a template, the data source type is automatically set to that of the template and cannot be changed.
Datasource
Select the data source for the SQL task. If no data source is available, you can click +Create Data Source to create one.
If you use a template, the data source is automatically set to that of the template, but you can select a different one.
Catalog
If you set Data Source Type to Presto or Trino, you must also configure the catalog after you configure the data source.
Database/Schema
If you set Data Source Type to AnalyticDB for PostgreSQL, Oracle, Presto, GaussDB (DWS), Microsoft SQL Server, Hologres, openGauss, DM, OceanBase (Oracle tenant mode), Trino, or PolarDB-X 2.0, you must also configure the schema after you configure the data source.
If you use a template, the schema is automatically set to that of the template, but you can select a different one.
Click OK.
On the tab for the database SQL task, write the code for the offline database SQL task in the code editor. Database SQL supports cross-node output parameters. For more information, see Assign values to cross-node output parameters and Best practices for cross-node parameters. After you write the code, click Run above the code editor.
Click Property in the right sidebar. In the Property panel, configure the task's Basic Information, Runtime Parameter, Scheduling Properties (for recurring tasks), Schedule Dependency (for recurring tasks), Run Configuration, and Resource Configuration.
Basic Information
Used to define basic information such as the name of the scheduling task, the responsible person, and description. For configuration instructions, see Configure basic task information.
Runtime Parameter
If your task calls parameter variables, you can assign values to the parameters in the properties, so that when the node is scheduled, parameter variables can be automatically replaced with the corresponding variable values. For configuration instructions, see Parameter configuration and using node parameters.
Scheduling Properties (for recurring tasks)
If the scheduling type of the offline computing task is Recurring Task, in addition to Basic Information, you also need to configure the scheduling properties of the task. For configuration instructions, see Configure scheduling properties.
Schedule Dependency (for recurring tasks)
If the scheduling type of the offline computing task is Recurring Task, in addition to Basic Information, you also need to configure the scheduling dependencies of the task. For configuration instructions, see Configure scheduling dependencies.
ImportantAutomatic dependency parsing by the system may not be accurate. Please confirm based on your code.
Run Configuration
You can configure task-level runtime timeout and retry policies for task failures based on your business scenarios. If not configured, the default values set at the tenant level will be inherited. For configuration instructions, see Computing task run configuration.
Resource Configuration
Resources consumed by instances generated from the task scheduling. Resources between different resource groups are isolated and do not affect each other. Database SQL is a shared resource task and does not support specifying custom resource groups. For configuration instructions, see Computing task resource configuration.
In the current database SQL task tab, save and submit the current task.
Click the
icon above the code editing area to save the code.Click the
icon above the code editing area to submit the code.
On the Submitting Log page, confirm the Submission Content and Pre-Check results, and enter remarks. For more information, see Offline computing task submission instructions.
After confirmation, click OK And Submit.
What to do next
If your development mode is Dev-Prod mode, after the task is successfully submitted, you need to go to the release list to publish the task to the production environment. For more information, see Manage release tasks.
If your development mode is Basic mode, the successfully submitted database SQL task can participate in production environment scheduling. You can go to the Operation Center to view your published tasks. For more information, see View and manage script tasks, View and manage one-time tasks.
Appendix: Modify the task type
For a database SQL task that uses AnalyticDB for PostgreSQL, StarRocks, Doris, SelectDB, or GaussDB (DWS) as its data source, you can change its type to a corresponding offline computing engine SQL task, such as ADB_PG_SQL, STARROCKS_SQL, DORIS_SQL, SELECTDB_SQL, or GAUSSDB_SQL.
On the Dataphin homepage, choose Develop > Data Development from the top menu bar.
On the Development page, select a project from the top menu bar. If you are in Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, select the target database SQL task.
Click the
icon next to the database SQL task and select Modify Task Type.In the Modify Task Type dialog box, click OK to switch the task type.
After you switch the task type, task properties such as the owner, scheduling settings, and upstream and downstream dependencies remain unchanged.
If the database SQL task uses an offline computing template, you cannot modify the task type.
If the current project does not have the corresponding SQL task type, you cannot change the database SQL task to the corresponding offline computing engine SQL task.
After you convert a database SQL task to an SQL task of the same engine type, you can roll it back to a database SQL task. For more information, see View and manage historical versions of offline tasks.