This document describes the advanced features of the overview report (LHMPackageOverview.xls). You can edit the report to modify scheduling properties for workflows and nodes and to create a blacklist.
1. What is the LHM standard scheduling package?
The standard scheduling package is an intermediate product in the LHM scheduling migration process. It serves as a unified description layer with a standard data structure for N2N scheduling migration.
The standard package contains an overview report file (LHMPackageOverview.xls). This file summarizes the basic information about the transformed workflows, nodes, resources, functions, and data sources.
2. Advanced features of the overview report
LHM lets you edit the overview report to modify common scheduling properties and create a blacklist.
2.1 How to perform operations
Advanced operations in the overview report take effect during the initialization phase of the next migration stage:
1. If you modify the package in the export tool, the scheduling transform tool retrieves the changes during initialization and applies them.
2. If you modify the package in the transform tool, the destination import tool retrieves the changes during initialization and applies them.
2.2 Modify workflow scheduling properties
You can change some workflow properties in the report. Editable fields are marked in blue.
Property Name | Description | Type | Value | Example |
ID | Workflow ID | Read-only | / | 16373885761152 |
Name | Workflow name | Read-only | / | TestWorkflow2 |
Path | Path where the workflow is located | Read-only | / | ds3_0410 |
Owner | / | Read/Write | / | admin |
Description | / | Read/Write | / | This is a test workflow 123 |
Scheduling parameters | Workflow-level parameters | Read/Write | JSON format. You can modify, add, or delete parameters. | {"prop1":"value1","prop2":"value2"} |
Scheduling information | Cron expression | Read/Write | Must be a valid cron expression. | 00 00 * * * * ? |
Instance generation method | / | Read/Write | IMMEDIATELY: Generates instances immediately after publishing. T_PLUS_1: Generates instances on the next day (T+1). | T_PLUS_1 |
Number of nodes | / | Read-only | / | 3 |
Source ID (if the workflow was migrated) | Advanced feature | Read-only | / | / |
Statistics by node type (multiple columns) | / | Read-only | / | / |
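Because the next migration stage parses the edited cells during initialization, it can help to validate them before handing the report off. The sketch below checks a Scheduling parameters cell; the function name and the flat string-to-string constraint are assumptions for illustration, not part of the tool.

```python
import json

def validate_scheduling_parameters(cell_value: str) -> dict:
    """Parse an editable 'Scheduling parameters' cell and confirm it is a
    flat JSON object mapping string keys to string values (an assumed
    constraint, matching the example format in the table above)."""
    params = json.loads(cell_value)
    if not isinstance(params, dict):
        raise ValueError("cell must contain a JSON object")
    for key, value in params.items():
        if not isinstance(value, str):
            raise ValueError(f"parameter {key!r} must map to a string value")
    return params

# Example cell value from the workflow table above.
cell = '{"prop1":"value1","prop2":"value2"}'
print(validate_scheduling_parameters(cell))
```

A similar pre-check can be applied to the node table's JSON cells before the import stage runs.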
2.3 Modify node scheduling properties
You can change some node properties in the report. Editable fields are marked in blue.
Property Name | Description | Type | Value | Example |
ID | / | Read-only | / | 16373881518720 |
Name | / | Read-only | / | Node1 |
Owner | / | Read/Write | / | admin |
Description | / | Read/Write | / | This is a test node 123 |
Parent Workflow ID | / | Read-only | / | 16373885761152 |
Parent Workflow Name | / | Read-only | / | TestWorkflow2 |
Type | / | Read/Write | / | SQL |
Related data source type | / | Read/Write | / | MYSQL |
Related data source | / | Read/Write | / | test_mysql_123 |
Scheduling parameters [1] | Node-level parameters | Read/Write | JSON format. You can modify, add, or delete parameters. | {"prop1":"value1","prop2":"value2"} |
Script parameter extraction [1] | Parameter references in the script | Read/Write | JSON format. You can modify, add, or delete parameters. | {"$(param1)":"${Param1}"} |
Scheduling information | Cron expression | Read/Write | Must be a valid cron expression. | 00 00 * * * * ? |
Instance generation method | / | Read/Write | IMMEDIATELY: Generates instances immediately after publishing. T_PLUS_1: Generates instances on the next day (T+1). | T_PLUS_1 |
Scheduling type | / | Read/Write | NORMAL: Normal scheduling. PAUSE: Pause scheduling. SKIP: Dry-run scheduling. | NORMAL |
Rerun type | / | Read/Write | ALL_ALLOWED: Reruns are allowed after both successful and failed runs. ALL_DENIED: Reruns are not allowed after successful or failed runs. FAILURE_ALLOWED: Reruns are not allowed after successful runs, but are allowed after failed runs. | ALL_ALLOWED |
Scheduling resource group | DataWorks scheduling resource group | Read/Write | DataWorks general or scheduling resource group ID | Serverless_res_group_580581087550304_692540198941344 |
Compute CU | DataWorks compute CU | Read/Write | Float | 0.25 |
Image | DataWorks image | Read/Write | Image ID | System_emr_datalake_5151_20240731 |
Data Integration resource group (DI only) | DataWorks integration resource group | Read/Write | DataWorks general or integration resource group ID | Serverless_res_group_580581087550304_692540198941344 |
Data Integration CU (DI only) | DataWorks Data Integration CU | Read/Write | Float | 0.5 |
Source ID (if the node was migrated) | Advanced feature | Read-only | / | / |
Note [1]: For usage details, see section 2.3.1, "Handle variables in nodes and code."
2.3.1 Handle variables in nodes and code
2.3.1.1 Differences in node variables among scheduling engines
Different scheduling engines use node variables in different ways. These differences must be handled during the scheduling transformation. The main differences fall into three categories:
· Differences in call format
Common call formats include ${param}, $[param], and $(param).
In DataWorks, node variables primarily use the ${} and $[] formats. When transforming schedules, you must handle variable references in the node code.
· Differences in built-in variables of scheduling engines
Scheduling engines provide various built-in variables, with time variables being the most common. Other variables, such as ${workflowName} and ${taskName}, are also available. Different scheduling engines offer different sets of built-in variables, and the format of time variables may also differ. These differences must be handled during the transformation.
In addition, DataWorks requires you to define built-in variables in the node's parameter table before you can use them. However, some scheduling engines, such as WeData, allow you to use built-in variables directly in the node code. When migrating to DataWorks, you must add these variables to the node's variable table.
· Differences in multi-level parameter references
The rules for multi-level references to Project-level, Workflow-level, and Node-level variables may differ.
2.3.1.2 Use the overview report to modify variables in nodes and code
In a migration scenario, the node variables of the source and destination scheduling engines differ in many ways. The tool provides a general feature that lets you handle these differences with simple edits. Perform this handling after the scheduling transformation is complete and before the scheduling import.
· Node variable completion
The tool can automatically detect variables referenced in the node code and compare them with the node's existing variables. If a variable is not found in the node's variable list, the tool automatically pre-populates the parameter in the parameter column of the report.
The tool detects variables in the node code by searching for substrings enclosed in ${}, $[], or $() and then deduplicating them.
For example, a script contains three substrings that appear to be variables: param1, param2, and param3. These variables do not exist in the node's custom variable table.
The tool automatically detects these substrings and displays a prompt in the overview report.
The tool automatically creates variable names. You can edit the table to add values for these variables. If a variable was incorrectly detected, you can delete it or leave it unchanged.
You can also edit the table to add additional node parameters, even if the tool did not pre-create the variable names.
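The detection step described above (find substrings enclosed in ${}, $[], or $(), then deduplicate) can be sketched as follows. This is an illustrative reimplementation, not the tool's actual code; the function name and regex are assumptions.

```python
import re

# One alternative per call format described above: ${name}, $[name], $(name).
CALL_FORMATS = re.compile(r"\$\{(\w+)\}|\$\[(\w+)\]|\$\((\w+)\)")

def detect_missing_variables(code: str, declared: set) -> list:
    """Return variable names referenced in the node code but absent from the
    node's variable table, deduplicated in order of first appearance."""
    missing = []
    for match in CALL_FORMATS.finditer(code):
        name = match.group(1) or match.group(2) or match.group(3)
        if name not in declared and name not in missing:
            missing.append(name)
    return missing

script = "SELECT ${param1}, $[param2] FROM t WHERE dt = $(param3) AND x = ${param1}"
print(detect_missing_variables(script, declared=set()))
# ['param1', 'param2', 'param3']
```

Any name this scan returns would be pre-populated in the report's parameter column for you to confirm, value, or delete.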
· Replace variable call methods in node code
Because call formats differ, you must replace the variable references in the node code.
As an example, a node's code contains three substrings that appear to be parameters: param1, param2, and param3.
The tool automatically detects these substrings and builds a map. You can modify this map to replace the variable call methods in the node code.
The key (before the colon) is the original string in the node code. The value (after the colon) is the replacement string.
For example, to replace all instances of $() and $[] with ${}, you can modify the cell as follows:
{"${param1}":"${param1}","$[param2]":"${param2}","$(param3)":"${param3}"}
The replacement uses the Java String.replace(CharSequence target, CharSequence replacement) method, which substitutes literal strings. Regular expressions are not supported.
You can also edit this cell to replace variable names in the code. Note that if you change a variable name, you must also modify the node's scheduling variable table accordingly. For example, you can replace param1 with P1.
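The replacement behavior can be sketched in Python, whose str.replace also substitutes literal strings and so matches the Java String.replace semantics cited above. The function name is an assumption; the map format is the one shown in the cell example.

```python
import json

def rewrite_variable_calls(code: str, mapping_cell: str) -> str:
    """Apply the report's replacement map to node code. Each key is the
    original literal string; each value is its replacement. Plain string
    substitution only, mirroring Java's String.replace (no regex)."""
    mapping = json.loads(mapping_cell)
    for original, replacement in mapping.items():
        code = code.replace(original, replacement)
    return code

cell = '{"$[param2]":"${param2}","$(param3)":"${param3}"}'
print(rewrite_variable_calls("a = $[param2], b = $(param3)", cell))
# a = ${param2}, b = ${param3}
```

Note that replacements are applied one after another, so a replacement string that matches a later key would itself be rewritten; avoid maps whose values overlap other keys.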
2.4 Workflow blacklist
The report lets you prevent the tool from processing certain workflows by deleting their rows from the workflow child table. This feature is known as the workflow blacklist.
Note: If workflows depend on each other, process them in the same batch. Do not separate them using the blacklist; separating dependent workflows will cause errors.
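Before deleting rows, it can help to confirm that no kept workflow depends on a blacklisted one. The sketch below assumes you have extracted a dependency map (workflow ID to the IDs it depends on); that structure and the function name are illustrative, not part of the report itself.

```python
def check_blacklist(dependencies: dict, blacklist: set) -> list:
    """Return (kept_workflow, blacklisted_dependency) pairs that would be
    broken if the blacklisted workflow rows were deleted from the report.
    An empty result means the blacklist does not split any dependency."""
    broken = []
    for workflow, deps in dependencies.items():
        if workflow in blacklist:
            continue  # the workflow itself is dropped, nothing to check
        for dep in deps:
            if dep in blacklist:
                broken.append((workflow, dep))
    return broken

# wf_a depends on wf_b; blacklisting only wf_b would break wf_a.
deps = {"wf_a": {"wf_b"}, "wf_b": set(), "wf_c": set()}
print(check_blacklist(deps, blacklist={"wf_b"}))
# [('wf_a', 'wf_b')]
```

If the check reports any pair, either remove the dependency's ID from the blacklist or blacklist both workflows so they migrate in the same later batch.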