
Migration Hub: Feature overview

Last Updated: Sep 02, 2025

This topic describes the basic features of the scheduling migration tool in Lakehouse Migration.

Function overview

Lakehouse Migration (LHM) lets you quickly migrate jobs from open source and other cloud scheduling engines to DataWorks.

  • The scheduling migration process consists of three steps: exporting source jobs, transforming heterogeneous jobs, and importing jobs into DataWorks. Intermediate results are accessible, which gives you full control over the migration.

  • Flexible transformation configurations support multiple compute engines in DataWorks, such as MaxCompute, EMR, and Hologres.

  • Lightweight deployment: the tool requires only a JDK 17 runtime environment and network connectivity.

  • Enhanced data security: the migration runs locally, and intermediate results are never uploaded.
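Because the only runtime requirement is JDK 17 or later, you can verify a machine before deploying the tool. The helper below is a hypothetical sketch for such a pre-flight check, not part of LHM itself:

```python
import re
import subprocess

def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_292") and the modern
    scheme ("17.0.9").
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        raise ValueError("could not find a Java version string")
    parts = match.group(1).split(".")
    # Legacy releases report "1.x"; modern releases put the major first.
    return int(parts[1]) if parts[0] == "1" else int(parts[0])

def check_jdk17() -> bool:
    """Run `java -version` and confirm the runtime is JDK 17 or later."""
    # By convention, `java -version` prints to stderr.
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    return java_major_version(result.stderr) >= 17
```

Network connectivity to the source scheduler and to DataWorks still needs to be checked separately, for example with a simple reachability test against each endpoint.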

Architecture diagram:

Scheduling migration steps

The LHM scheduling migration tool migrates and transforms jobs from a supported scheduling engine to DataWorks in a three-step process.

  1. Export scheduling tasks from the migration source (source discovery).

The tool retrieves scheduling task information from the source and parses it into the standard LHM data structure for scheduling workflows, which standardizes the data for the subsequent steps.

  2. Transform scheduling properties from the migration source to DataWorks properties.

Source scheduling task properties are transformed into DataWorks task properties. This includes task types, scheduling settings, task parameters, and scripts for some task types. The transformation is based on the standard LHM data structure for scheduling workflows.

  3. Import scheduling tasks into DataWorks.

The tool automatically builds DataWorks workflow definitions and imports tasks by calling the DataWorks software development kit (SDK). The tool automatically determines whether to create or update tasks. This supports multiple migration rounds and the synchronization of source changes.
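The three steps above can be sketched end to end. Everything in this sketch — the intermediate task structure, the node-type mapping, and the create-or-update decision — is a simplified, hypothetical illustration of the flow, not the LHM tool's actual API or rule set:

```python
from dataclasses import dataclass, field

# Step 1: source tasks are parsed into a standard intermediate structure
# (a hypothetical stand-in for LHM's internal workflow model).
@dataclass
class StandardTask:
    name: str
    task_type: str                      # source node type, e.g. "Shell"
    script: str = ""
    upstream: list = field(default_factory=list)

# Step 2: source node types are mapped to DataWorks node types.
# This mapping is illustrative only.
NODE_TYPE_MAP = {
    "Shell": "DIDE_SHELL",
    "SQL": "ODPS_SQL",
    "Python": "PYODPS",
}

def transform(task: StandardTask) -> dict:
    """Convert a standardized source task into DataWorks task properties."""
    if task.task_type not in NODE_TYPE_MAP:
        raise ValueError(f"unsupported node type: {task.task_type}")
    return {
        "name": task.name,
        "type": NODE_TYPE_MAP[task.task_type],
        "script": task.script,
        "dependencies": task.upstream,
    }

# Step 3: import with create-or-update semantics, so repeated migration
# rounds synchronize source changes instead of duplicating tasks.
def import_task(task_props: dict, existing: dict) -> str:
    """Upsert one task into a registry keyed by task name."""
    action = "update" if task_props["name"] in existing else "create"
    existing[task_props["name"]] = task_props
    return action
```

In the real tool, the registry lookup and upsert in step 3 are performed against DataWorks through its SDK rather than an in-memory dictionary.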

Scheduling migration capability matrix

The LHM tool currently supports automated migration of tasks from the following scheduling engines to DataWorks.

Scheduling migration from open source engines to DataWorks

| Source type | Source version | Supported node types for transformation |
| --- | --- | --- |
| DolphinScheduler | 1.x | Shell, SQL, Python, DataX, Sqoop, Spark (Java, Python, SQL), MapReduce, Conditions, Dependent, SubProcess |
| DolphinScheduler | 2.x | Shell, SQL, Python, DataX, Sqoop, HiveCLI, Spark (Java, Python, SQL), MapReduce, Procedure, HTTP, Conditions, Switch, Dependent, SubProcess |
| DolphinScheduler | 3.x | Shell, SQL, Python, DataX, Sqoop, SeaTunnel, HiveCLI, Spark (Java, Python, SQL), MapReduce, Procedure, HTTP, Conditions, Switch, Dependent, SubProcess (renamed to SubWorkflow in version 3.3.0-alpha) |
| Airflow | 2.x | EmptyOperator, DummyOperator, ExternalTaskSensor, BashOperator, HiveToMySqlTransfer, PrestoToMySqlTransfer, PythonOperator, HiveOperator, SqoopOperator, SparkSqlOperator, SparkSubmitOperator, SQLExecuteQueryOperator, PostgresOperator, MySqlOperator |
| Azkaban (Beta) | 3.x | Noop, Shell, Subprocess |
| Oozie (Beta) | 5.x | Start, End, Kill, Decision, Fork, Join, MapReduce, Pig, FS, SubWorkflow, Java |
| HUE (Beta) | Latest | Fork, Join, OK, Error, Sqoop, Hive, Hive2, Shell |

  • Latest refers to the latest version as of May 2025.

Scheduling migration from other cloud scheduling engines to DataWorks

| Source type | Source version | Supported node types for transformation |
| --- | --- | --- |
| DataArts (DGC) | Latest | CDMJob, HiveSQL, DWSSQL, DLISQL, RDSSQL, SparkSQL, Shell, DLISpark, MRSSpark, DLFSubJob, RESTAPI, Note, Dummy |
| WeData | Latest | Shell, HiveSql, JDBCSql, Python, SparkPy, SparkSql, Foreach, ForeachStart, ForeachEnd, Offline Sync |
| Azure Data Factory (ADF) (Beta) | Latest | DatabricksNotebook, ExecutePipeline, Copy, Script, Wait, WebActivity, AppendVariable, Delete, DatabricksSparkJar, DatabricksSparkPython, Fail, Filter, ForEach, GetMetadata, HDInsightHive, HDInsightMapReduce, HDInsightSpark, IfCondition, Lookup, SetVariable, SqlServerStoredProcedure, Switch, Until, Validation, SparkJob |

Scheduling migration from EMR Workflow to DataWorks

| Source type | Source version | Supported node types for transformation |
| --- | --- | --- |
| EMR Workflow | 2024.03 (Latest) | Shell, SQL, Python, DataX, Sqoop, SeaTunnel, HiveCLI, Spark, ImpalaShell, RemoteShell, MapReduce, Procedure, HTTP, Conditions, Switch, Dependent, SubProcess |

DataWorks like-for-like migration path

| Source type | Source version | Supported node types for transformation |
| --- | --- | --- |
| DataWorks | New version | All nodes included in a periodically scheduled workflow |
| DataWorks Spec | New version | All nodes included in a periodically scheduled workflow |