This document describes the advanced features of the overview report (LHMPackageOverview.xls). You can edit the report to modify scheduling properties for workflows and nodes and to create a blacklist.
1. What is the LHM standard scheduling package?
The standard scheduling package is an intermediate product in the LHM scheduling migration process. It serves as a unified description layer with a standard data structure for N2N scheduling migration.
The standard package contains an overview report file (LHMPackageOverview.xls). This file summarizes the basic information about the transformed workflows, nodes, resources, functions, and data sources.
2. Advanced features of the overview report
LHM lets you edit the overview report to modify common scheduling properties and create a blacklist.
2.1 How to perform operations
Advanced operations in the overview report take effect during the initialization phase of the next migration stage:
1. If you modify the package in the export tool, the scheduling transform tool retrieves the changes during initialization and applies them.
2. If you modify the package in the transform tool, the destination import tool retrieves the changes during initialization and applies them.
2.2 Modify workflow scheduling properties
You can change some workflow properties in the report. Editable fields are marked in blue.
Property Name | Description | Type | Value | Example |
ID | Workflow ID | Read-only | / | 16373885761152 |
Name | Workflow name | Read-only | / | TestWorkflow2 |
Path | Path where the workflow is located | Read-only | / | ds3_0410 |
Owner | / | Read/Write | / | admin |
Description | / | Read/Write | / | This is a test workflow 123 |
Scheduling parameters | Workflow-level parameters | Read/Write | JSON format. You can modify, add, or delete parameters. | {"prop1":"value1","prop2":"value2"} |
Scheduling information | Cron expression | Read/Write | Must be a valid cron expression. | 00 00 * * * * ? |
Instance generation method | / | Read/Write | IMMEDIATELY: Generates instances immediately after publishing. T_PLUS_1: Generates instances on the next day (T+1). | T_PLUS_1 |
Number of nodes | / | Read-only | / | 3 |
Source ID (if the workflow was migrated) | Advanced feature | Read-only | / | / |
Statistics by node type (multiple columns) | / | Read-only | / | / |
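Because the next migration stage parses the edited cells during initialization, it can help to validate them before handing the report off. The sketch below checks a Scheduling parameters cell; the function name and the flat string-to-string constraint are assumptions for illustration, not part of the tool.

```python
import json

def validate_scheduling_parameters(cell_value: str) -> dict:
    """Parse an editable 'Scheduling parameters' cell and confirm it is a
    flat JSON object mapping string keys to string values (an assumed
    constraint, matching the example format in the table above)."""
    params = json.loads(cell_value)
    if not isinstance(params, dict):
        raise ValueError("cell must contain a JSON object")
    for key, value in params.items():
        if not isinstance(value, str):
            raise ValueError(f"parameter {key!r} must map to a string value")
    return params

# Example cell value from the workflow table above.
cell = '{"prop1":"value1","prop2":"value2"}'
print(validate_scheduling_parameters(cell))
```

A similar pre-check can be applied to the node table's JSON cells before the import stage runs.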
2.3 Modify node scheduling properties
You can change some node properties in the report. Editable fields are marked in blue.
Property Name | Description | Type | Value | Example |
ID | / | Read-only | / | 16373881518720 |
Name | / | Read-only | / | Node1 |
Owner | / | Read/Write | / | admin |
Description | / | Read/Write | / | This is a test node 123 |
Parent Workflow ID | / | Read-only | / | 16373885761152 |
Parent Workflow Name | / | Read-only | / | TestWorkflow2 |
Type | / | Read/Write | / | SQL |
Related data source type | / | Read/Write | / | MYSQL |
Related data source | / | Read/Write | / | test_mysql_123 |
Scheduling parameters [1] | Node-level parameters | Read/Write | JSON format. You can modify, add, or delete parameters. | {"prop1":"value1","prop2":"value2"} |
Script parameter extraction [1] | Parameter references in the script | Read/Write | JSON format. You can modify, add, or delete parameters. | {"$(param1)":"${Param1}"} |
Scheduling information | Cron expression | Read/Write | Must be a valid cron expression. | 00 00 * * * * ? |
Instance generation method | / | Read/Write | IMMEDIATELY: Generates instances immediately after publishing. T_PLUS_1: Generates instances on the next day (T+1). | T_PLUS_1 |
Scheduling type | / | Read/Write | NORMAL: Normal scheduling. PAUSE: Pause scheduling. SKIP: Dry-run scheduling. | NORMAL |
Rerun type | / | Read/Write | ALL_ALLOWED: Reruns are allowed after both successful and failed runs. ALL_DENIED: Reruns are not allowed after successful or failed runs. FAILURE_ALLOWED: Reruns are not allowed after successful runs, but are allowed after failed runs. | ALL_ALLOWED |
Scheduling resource group | DataWorks scheduling resource group | Read/Write | DataWorks general or scheduling resource group ID | Serverless_res_group_580581087550304_692540198941344 |
Compute CU | DataWorks compute CU | Read/Write | Float | 0.25 |
Image | DataWorks image | Read/Write | Image ID | System_emr_datalake_5151_20240731 |
Data Integration resource group (DI only) | DataWorks integration resource group | Read/Write | DataWorks general or integration resource group ID | Serverless_res_group_580581087550304_692540198941344 |
Data Integration CU (DI only) | DataWorks Data Integration CU | Read/Write | Float | 0.5 |
Source ID (if the node was migrated) | Advanced feature | Read-only | / | / |
Note [1]: For usage details, see section 2.3.1, "Handle variables in nodes and code."
2.3.1 Handle variables in nodes and code
2.3.1.1 Differences in node variables among scheduling engines
Different scheduling engines use node variables in different ways. These differences must be handled during the scheduling transformation. The main differences fall into three categories:
· Differences in call format
Common call formats include ${param}, $[param], and $(param).
In DataWorks, node variables primarily use the ${} and $[] formats. When transforming schedules, you must handle variable references in the node code.
· Differences in built-in variables of scheduling engines
Scheduling engines provide various built-in variables, with time variables being the most common. Other variables, such as ${workflowName} and ${taskName}, are also available. Different scheduling engines offer different sets of built-in variables, and the format of time variables may also differ. These differences must be handled during the transformation.
In addition, DataWorks requires you to define built-in variables in the node's parameter table before you can use them. However, some scheduling engines, such as WeData, allow you to use built-in variables directly in the node code. When migrating to DataWorks, you must add these variables to the node's variable table.
· Differences in multi-level parameter references
The rules for multi-level references to Project-level, Workflow-level, and Node-level variables may differ.
2.3.1.2 Use the overview report to modify variables in nodes and code
In a migration scenario, the node variables of the source and destination scheduling engines differ in many ways. The tool provides a general feature that lets you handle these differences with simple edits. Perform this handling after the scheduling transformation is complete and before the scheduling import.
· Node variable completion
The tool can automatically detect variables referenced in the node code and compare them with the node's existing variables. If a variable is not found in the node's variable list, the tool automatically pre-populates the parameter in the parameter column of the report.
The tool detects variables in the node code by searching for substrings enclosed in ${}, $[], or $() and then deduplicating them.
For example, a script contains three substrings that appear to be variables: param1, param2, and param3. These variables do not exist in the node's custom variable table.
The tool automatically detects these substrings and displays a prompt in the overview report.
The tool automatically creates variable names. You can edit the table to add values for these variables. If a variable was incorrectly detected, you can delete it or leave it unchanged.
You can also edit the table to add additional node parameters, even if the tool did not pre-create the variable names.
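The detection step described above (find substrings enclosed in ${}, $[], or $(), then deduplicate) can be sketched as follows. This is an illustrative reimplementation, not the tool's actual code; the function name and regex are assumptions.

```python
import re

# One alternative per call format described above: ${name}, $[name], $(name).
CALL_FORMATS = re.compile(r"\$\{(\w+)\}|\$\[(\w+)\]|\$\((\w+)\)")

def detect_missing_variables(code: str, declared: set) -> list:
    """Return variable names referenced in the node code but absent from the
    node's variable table, deduplicated in order of first appearance."""
    missing = []
    for match in CALL_FORMATS.finditer(code):
        name = match.group(1) or match.group(2) or match.group(3)
        if name not in declared and name not in missing:
            missing.append(name)
    return missing

script = "SELECT ${param1}, $[param2] FROM t WHERE dt = $(param3) AND x = ${param1}"
print(detect_missing_variables(script, declared=set()))
# ['param1', 'param2', 'param3']
```

Any name this scan returns would be pre-populated in the report's parameter column for you to confirm, value, or delete.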
· Replace variable call methods in node code
Because call formats differ, you must replace the variable references in the node code.
As an example, a node's code contains three substrings that appear to be parameters: param1, param2, and param3.
The tool automatically detects these substrings and builds a map. You can modify this map to replace the variable call methods in the node code.
The key (before the colon) is the original string in the node code. The value (after the colon) is the replacement string.
For example, to replace all instances of $() and $[] with ${}, you can modify the cell as follows:
{"${param1}":"${param1}","$[param2]":"${param2}","$(param3)":"${param3}"}
The replacement uses the Java String.replace(CharSequence target, CharSequence replacement) method, which substitutes literal strings. Regular expressions are not supported.
You can also edit this cell to replace variable names in the code. Note that if you change a variable name, you must also modify the node's scheduling variable table accordingly. For example, you can replace param1 with P1.
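The replacement behavior can be sketched in Python, whose str.replace also substitutes literal strings and so matches the Java String.replace semantics cited above. The function name is an assumption; the map format is the one shown in the cell example.

```python
import json

def rewrite_variable_calls(code: str, mapping_cell: str) -> str:
    """Apply the report's replacement map to node code. Each key is the
    original literal string; each value is its replacement. Plain string
    substitution only, mirroring Java's String.replace (no regex)."""
    mapping = json.loads(mapping_cell)
    for original, replacement in mapping.items():
        code = code.replace(original, replacement)
    return code

cell = '{"$[param2]":"${param2}","$(param3)":"${param3}"}'
print(rewrite_variable_calls("a = $[param2], b = $(param3)", cell))
# a = ${param2}, b = ${param3}
```

Note that replacements are applied one after another, so a replacement string that matches a later key would itself be rewritten; avoid maps whose values overlap other keys.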
2.4 Workflow blacklist
The report lets you prevent the tool from processing certain workflows by deleting their rows from the workflow child table. This feature is known as the workflow blacklist.
Note: If workflows depend on each other, process them in the same batch. Do not separate them using the blacklist; separating dependent workflows will cause errors.
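Before deleting rows, it can help to confirm that no kept workflow depends on a blacklisted one. The sketch below assumes you have extracted a dependency map (workflow ID to the IDs it depends on); that structure and the function name are illustrative, not part of the report itself.

```python
def check_blacklist(dependencies: dict, blacklist: set) -> list:
    """Return (kept_workflow, blacklisted_dependency) pairs that would be
    broken if the blacklisted workflow rows were deleted from the report.
    An empty result means the blacklist does not split any dependency."""
    broken = []
    for workflow, deps in dependencies.items():
        if workflow in blacklist:
            continue  # the workflow itself is dropped, nothing to check
        for dep in deps:
            if dep in blacklist:
                broken.append((workflow, dep))
    return broken

# wf_a depends on wf_b; blacklisting only wf_b would break wf_a.
deps = {"wf_a": {"wf_b"}, "wf_b": set(), "wf_c": set()}
print(check_blacklist(deps, blacklist={"wf_b"}))
# [('wf_a', 'wf_b')]
```

If the check reports any pair, either remove the dependency's ID from the blacklist or blacklist both workflows so they migrate in the same later batch.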