Create offline computing template

Dataphin supports the creation of offline computing templates to streamline task development. This topic describes the steps to create, configure, and submit offline computing templates.

Background information

When multiple tasks share similar code logic but differ in certain configuration items or input parameters, you can encapsulate the code into an offline computing template with variable parameters for these configurations and inputs. By referencing the template in subsequent tasks, you can easily maintain and reuse common code logic, enhancing task code development efficiency.

Typically, tasks have dedicated runtime resources, which can lead to high resource consumption and affect task concurrency when many tasks run simultaneously. Dataphin allows multiple tasks referencing the same offline computing template to share runtime resources, ensuring efficient resource utilization and uninterrupted operation of other tasks. To enable this feature, activate the shared runtime resource switch for the offline computing template.

Prerequisites

Before enabling shared runtime resources for an offline computing template, ensure the global shared resource switch is active. For more information, see Runtime Settings.

Limits

Creation, configuration, and submission of offline computing templates are restricted to users with superuser, project administrator, or project developer roles.
For more information about how to obtain the project administrator and developer roles, see Add project members.
Only super administrators and system administrators can enable the shared runtime resource switch for offline computing templates.

Procedure

On the Dataphin home page, choose Development > Data Development from the top menu bar.
On the Development page, select a project from the top menu bar. In Dev-Prod mode, also select the environment.
In the navigation pane on the left, choose Data Processing > Template. In the Template list, click the icon and select Offline Computing Template.

In the Create Offline Computing Template dialog box, you can configure the following parameters.

Parameter	Description
Template Name	Supports uppercase and lowercase English letters, digits, underscores (_), and hyphens (-). Must be globally unique and cannot exceed 64 characters.
Node Type	Supports Shell, Python, and Database SQL. You can also create different offline computing templates based on the offline compute engine. The supported node types, data source types, and additional settings are listed after this table.
Select Directory	The default selection is the offline computing template. You can also create a target folder on the Template page and select it as the directory for the computing template. To create a folder, click the icon above the computing template list to open the Create Folder dialog box, enter the folder Name, select Offline Type, choose a Select Directory location as needed, and then click Confirm.
Description	Provide a brief description of the offline computing template, within 1,000 characters.

Offline computing templates supported by different compute engines:
- MaxCompute compute engine: MaxCompute SQL, Spark on MaxCompute, MapReduce on MaxCompute
- Hadoop (Hive) compute engine: Hive SQL, Impala SQL (Impala must be configured and enabled), Spark on Yarn, MapReduce on Yarn
- TDH Inceptor compute engine: Inceptor SQL, Spark on Yarn, MapReduce on Yarn
- StarRocks compute engine: StarRocks SQL
- ArgoDB compute engine: ArgoDB SQL
- GaussDB (DWS) compute engine: GaussDB SQL
- Databricks compute engine: Databricks SQL
- SelectDB compute engine: SelectDB SQL
- Doris compute engine: Doris SQL
- AnalyticDB for PostgreSQL compute engine: AnalyticDB for PostgreSQL SQL

Data source types supported by Database SQL: MySQL, Microsoft SQL Server, PostgreSQL, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, OceanBase, Oracle, ClickHouse, DM, openGauss, GaussDB (DWS), Hologres, StarRocks, Doris, SelectDB, Presto, and Trino.

Additional settings by node type:
- Shell or Python: You can configure Python Third-party Packages. After you add a third-party module to a Python package, you must declare a reference to the module in the node before you can import it in the code. You can edit the referenced module in the Python Module configuration item of the compute node properties.
- Database SQL: You must also select a Database/Schema. For Presto data sources, you must also configure the Catalog.
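
The Template Name rule in the table above can be checked locally before you open the dialog box. The following Python snippet is a minimal sketch, not part of Dataphin: the function name and the regular expression are assumptions derived from the stated rule. It does not check global uniqueness, which only Dataphin can verify.

```python
import re

# Assumed pattern derived from the naming rule stated in the table:
# letters, digits, underscores, and hyphens; 1 to 64 characters.
TEMPLATE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_template_name(name: str) -> bool:
    """Return True if the name satisfies the character and length rules.

    Global uniqueness is not checked here; it can only be verified by Dataphin.
    """
    return TEMPLATE_NAME_PATTERN.fullmatch(name) is not None


# Hypothetical example names.
print(is_valid_template_name("daily_load-v2"))
print(is_valid_template_name("daily load"))
```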

Click Confirm.
Develop the offline computing template code on the code development page.
You can define template variable parameters in the format @@{variable_parameter_name}. The parameter name must start with a letter and can contain only letters, digits, and underscores (_). For example, @@{variable}.
After you finish developing the code for the offline computing template, click the Test button in the upper-left corner of the page. In the Enter Parameters dialog box, enter values for the parameters.
Click Confirm.
On the code development page, click Attribute on the right side.
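
To illustrate the @@{variable_parameter_name} format used in the template code, the following Python snippet is a minimal local sketch. It is not how Dataphin performs substitution internally. The functions find_parameters and render, the sample SQL, and the parameter names are all hypothetical. The regular expression encodes the stated naming rule: the name starts with a letter and contains only letters, digits, and underscores.

```python
import re
from typing import Dict, List

# Matches @@{name}, where name starts with a letter and contains only
# letters, digits, and underscores (the rule stated in this topic).
PARAM_PATTERN = re.compile(r"@@\{([A-Za-z][A-Za-z0-9_]*)\}")


def find_parameters(code: str) -> List[str]:
    """Return distinct parameter names in order of first appearance."""
    seen: List[str] = []
    for name in PARAM_PATTERN.findall(code):
        if name not in seen:
            seen.append(name)
    return seen


def render(code: str, values: Dict[str, str]) -> str:
    """Replace each @@{name} with its value; raise if a value is missing."""
    missing = [n for n in find_parameters(code) if n not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    return PARAM_PATTERN.sub(lambda m: values[m.group(1)], code)


# Hypothetical template code with three variable parameters.
template = "INSERT INTO @@{target} SELECT * FROM @@{source} WHERE ds = '@@{biz_date}'"
print(find_parameters(template))
print(render(template, {"target": "t", "source": "s", "biz_date": "20240101"}))
```

A local check of this kind can list the values that the Enter Parameters dialog box is likely to ask for when you test the template.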

In the Attribute panel, set the parameters.

Parameter		Description
Basic Information	Description	Provide a brief description of the offline computing template.
Python Third-Party Packages		Select the required Python third-party packages. For more information, see Install and manage Python third-party packages. Note: After you add a third-party module to a Python third-party package, you must declare a reference to it in the task before you can import the module in the code.
Runtime Configuration	Shared Runtime Resources	After you enable shared runtime resources, task instances that reference this template share runtime resources, which reduces resource consumption. Only superusers can perform this operation. Important: You can enable Shared Runtime Resources for the template only if global shared resources are enabled in the operations (metadata warehouse) tenant. Otherwise, the configuration is not supported, and tasks created by referencing this template are exclusive resource tasks. For details, see Runtime settings.
Parameter Settings	Parameter Description	Describe each parameter used in the code to help developers understand it.
	Default Value	Assign default values to the parameters in the code. You can modify the parameter values in tasks that reference this template. The modified values take effect when the task runs.
	Parameter Encryption	After enabling parameter encryption, the default parameter values will be stored in ciphertext to protect sensitive data. In subsequent tasks that reference this template, the default parameter values cannot be viewed in plaintext. Dataphin will automatically decrypt the default parameter values when the task runs. After disabling parameter encryption, the configured default parameter values will be automatically cleared.
Resource Configuration	Resource Group	Task scheduling consumes scheduling resources. You can specify the scheduling resource group that task instances generated by referencing the template can use. During scheduling, an instance occupies resources from the quota of the specified resource group. If that resource group has no available resources, the instance enters the Waiting for Scheduling Resources status. Resource groups are isolated from each other, which keeps scheduling stable. When Shared Runtime Resources is enabled: custom resource groups cannot be specified, and tasks created by referencing the template are shared resource tasks. Shared resource tasks use a shared scheduling resource pool by default, which supports the scheduling of all shared resource tasks. To change the maximum concurrency of the shared resource pool, contact the metadata warehouse system administrator. When Shared Runtime Resources is disabled: you can specify a custom resource group, but only one that is used for Daily node scheduling and is associated with the project to which the current node belongs. For more information, see Resource group configuration. Important: For tasks created by referencing the template, the scheduling resource group can be configured only on the template. If Project Default Resource Group is selected, the setting is updated automatically based on the configuration of the project default resource group.
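
Two rules from the Attribute settings above can be modeled in a few lines: a custom resource group cannot be combined with shared runtime resources, and parameter values set in a referencing task replace the template defaults. The following Python snippet is only a sketch of that logic. The class TemplateAttributes, its fields, and both functions are hypothetical names, not a Dataphin API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TemplateAttributes:
    """Hypothetical local model of the template Attribute panel."""
    shared_runtime_resources: bool = False
    resource_group: Optional[str] = None  # None means no custom resource group
    defaults: Dict[str, str] = field(default_factory=dict)


def validate(attrs: TemplateAttributes) -> List[str]:
    """Return a list of rule violations (empty if the settings are consistent)."""
    problems: List[str] = []
    if attrs.shared_runtime_resources and attrs.resource_group is not None:
        problems.append(
            "a custom resource group cannot be specified "
            "when shared runtime resources are enabled"
        )
    return problems


def effective_values(attrs: TemplateAttributes, task_values: Dict[str, str]) -> Dict[str, str]:
    """Merge template defaults with values modified in the referencing task."""
    return {**attrs.defaults, **task_values}


# Hypothetical settings: one default is overridden by the referencing task.
attrs = TemplateAttributes(defaults={"biz_date": "20240101", "region": "cn"})
print(validate(attrs))
print(effective_values(attrs, {"region": "eu"}))
```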

Click Confirm to finalize the offline computing template configuration.
On the code development page for the offline computing template, click the Submit button at the top of the page. In the submission remarks dialog box, enter your Remarks.
Click Confirm And Submit.
After you edit a submitted Database SQL computing template and submit it again, the following checks apply:
- The system automatically checks whether the template code introduces a new input table. If no new input table is introduced, the submission proceeds as normal.
- If a new input table is introduced in the template code, the system checks whether any downstream nodes are configured to automatically reference the latest version. If such nodes exist, the submission fails. Otherwise, the submission proceeds as normal.
- If the SQL code fails to parse, you can still submit the template. However, the system displays the following prompt: The system cannot parse the code because template variables exist. Modifying the SQL template code may introduce new input tables, which can cause node instances that reference this template to lack upstream dependencies. Proceed with caution.
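
The submission checks above amount to a small decision procedure, sketched below in Python. This is an illustration of the rules as described, not Dataphin's implementation. The names Outcome and check_submission, and the convention of passing None for code that cannot be parsed, are assumptions.

```python
from enum import Enum
from typing import Optional, Set


class Outcome(Enum):
    SUBMITTED = "submitted"
    SUBMITTED_WITH_WARNING = "submitted with warning"
    REJECTED = "rejected"


def check_submission(
    old_inputs: Set[str],
    new_inputs: Optional[Set[str]],
    has_auto_latest_downstream: bool,
) -> Outcome:
    """Decide the outcome of resubmitting an edited template.

    new_inputs is None when the edited SQL cannot be parsed (assumed convention).
    has_auto_latest_downstream is True if any downstream node is configured
    to automatically reference the latest template version.
    """
    if new_inputs is None:
        # Parsing failed: submission is still allowed, with a warning prompt.
        return Outcome.SUBMITTED_WITH_WARNING
    added = new_inputs - old_inputs
    if not added:
        # No new input table was introduced.
        return Outcome.SUBMITTED
    if has_auto_latest_downstream:
        # A new input table plus auto-referencing downstream nodes: fail.
        return Outcome.REJECTED
    return Outcome.SUBMITTED


# Hypothetical table names.
print(check_submission({"orders"}, {"orders", "users"}, True))
```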

What to do next

Create tasks using the new offline computing template. For detailed instructions, refer to: