DataWorks allows you to migrate tasks from open source scheduling engines, such as Oozie, Azkaban, and Airflow, to DataWorks. This topic describes the requirements for exporting such tasks.
Background information
Export a task from Oozie
- Requirements
The package must contain XML-formatted definition files and configuration items of a flow task. The package is exported in the ZIP format.
- Structure
Oozie task descriptions are saved in an HDFS directory. For example, each subdirectory under the apps directory in the Examples package on the Apache Oozie official website is a flow task of Oozie. Each subdirectory contains the XML-formatted definition files and configuration items of the flow task.
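For example, the following Python sketch lists the action nodes in a workflow definition file. The my-flow/workflow.xml path is hypothetical; it stands for a flow task directory that has been copied from HDFS to the local file system.
import xml.etree.ElementTree as ET

# Hypothetical local copy of an Oozie workflow definition file.
tree = ET.parse("my-flow/workflow.xml")
root = tree.getroot()

# Oozie workflow definitions use a namespaced schema; derive the namespace
# from the root tag, e.g. {uri:oozie:workflow:0.5}workflow-app.
ns = root.tag.split("}")[0] + "}" if "}" in root.tag else ""
for action in root.iter(ns + "action"):
    print(action.get("name"))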
Export a task from Azkaban
You can download a specific flow task in the Azkaban console.
Export a task from Airflow
- To export a task from Airflow, perform the following steps:
- Go to the running environment of Airflow.
- Use the Python library of Airflow to load the directed acyclic graph (DAG) folder that is scheduled on Airflow. The DAG Python file is stored in the DAG folder.
- Use the export tool to read, in memory, the task information and dependencies stored in the DAG Python files based on the Python library of Airflow. Then, write the generated DAG information to a JSON file and export the file. The sketch after this procedure roughly illustrates this step.
You can download the export tool on the Schemes of Scheduling Engine Export page of Migration Assistant in the DataWorks console. For more information about how to go to the Schemes of Scheduling Engine Export page, see Export a task of another open source engine.
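The following Python sketch roughly illustrates what the export tool does. It is not the tool itself: the DagBag class it uses is part of the Airflow Python library, but the JSON structure it writes is a simplified assumption rather than the exact format that Migration Assistant expects.
import json
from airflow.models import DagBag

# Load every DAG Python file from the scheduled DAG folder (the path is an example).
dag_bag = DagBag(dag_folder="/path/to/airflow/dag/folder/", include_examples=False)

dags = {}
for dag_id, dag in dag_bag.dags.items():
    dags[dag_id] = {
        "tasks": [
            {
                "task_id": task.task_id,
                "operator": type(task).__name__,
                # Upstream task IDs capture the dependencies inside the DAG.
                "upstream": sorted(task.upstream_task_ids),
            }
            for task in dag.tasks
        ]
    }

with open("output.json", "w") as f:
    json.dump(dags, f, indent=2)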
- Usage notes of the export tool
- Execute the following statement to decompress the airflow-exporter.tgz package:
tar zxvf airflow-exporter.tgz
- Execute the following statement to set PYTHONPATH to the directory of the Python library:
export PYTHONPATH=/usr/local/lib/python3.6/site-packages
- Execute the following statement to export the task from Airflow:
cd airflow-exporter
python3.6 ./parser -d /path/to/airflow/dag/folder/ -o output.json
- Go to the Scheduling Engine Import tab of the Migration Assistant page in the DataWorks console to import the task.
- Execute the following statement to compress the exported output.json file into a ZIP package (a scripted alternative is shown after this procedure):
zip out.zip output.json
- Go to the Import Tasks page of Migration Assistant in the DataWorks console to import the generated out.zip package. For more information, see Import tasks of open source engines.
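If you prefer to script the compression step, the following sketch produces the same out.zip package as the zip command above and, as an optional safeguard that Migration Assistant does not require, first verifies that output.json parses as valid JSON:
import json
import zipfile

# Fail early if the export tool produced malformed JSON.
with open("output.json") as f:
    json.load(f)

# Package the file in the same way as `zip out.zip output.json`.
with zipfile.ZipFile("out.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("output.json")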
Export a task of another open source engine
DataWorks provides a standard template for you to export tasks of open source engines other than Oozie, Azkaban, and Airflow. Before you export such tasks, you must download the standard template and modify its content based on the file structure defined in the template. You can go to the Schemes of Scheduling Engine Export page to download the standard template and view the file structure.
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
- Click the icon in the upper-left corner. Then, choose the menu entry for Migration Assistant.
- In the left-side navigation pane, choose Schemes of Scheduling Engine Export to go to the Schemes of Scheduling Engine Export page.
- Click Standard Template.
- On the Standard Template tab, click Standard Format Template to download the template.
- Modify the content in the template to generate a package that you want to export.
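As a sketch of the last step, assuming that you saved the modified template files in a local directory named my-export (a hypothetical name; keep the file structure that the standard template defines), you can compress them into the package to export:
import zipfile
from pathlib import Path

# Hypothetical directory that holds the files modified from the standard template.
template_dir = Path("./my-export")

with zipfile.ZipFile("my-export.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in template_dir.rglob("*"):
        if path.is_file():
            # Preserve the file structure that the template defines.
            zf.write(path, path.relative_to(template_dir))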