DataWorks allows you to migrate tasks from open source scheduling engines such as Oozie and Azkaban to DataWorks. This topic describes the requirements for the exported files.

Background information

Before you import a task of an open source scheduling engine to DataWorks, you must export the task to your machine or Object Storage Service (OSS). For more information about the import procedure, see Import tasks of open source engines.

Export a task from Oozie

Requirements and structure of the exported package:
  • Requirements

    The package must contain XML-formatted definition files and configuration items of a flow task. The package is exported in the ZIP format.

  • Structure
    Oozie task descriptions are saved in an HDFS directory. For example, each subdirectory under the apps directory in the Examples package on the official Apache Oozie website is an Oozie flow task. Each subdirectory contains the XML-formatted definition files and configuration items of that flow task.
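
The packaging requirement above can be sketched in Python. This is a minimal illustration, not an official DataWorks tool: it assumes the flow-task directory has already been copied from HDFS to your machine, the function name package_oozie_app is hypothetical, and keeping the flow-task directory as the top-level folder inside the ZIP file is an assumption about the layout.

```python
import zipfile
from pathlib import Path


def package_oozie_app(app_dir, zip_path):
    """Zip one local copy of an Oozie flow-task directory.

    Returns the list of entry names written to the archive.
    """
    root = Path(app_dir).resolve()
    files = sorted(p for p in root.rglob("*") if p.is_file())

    # The package must contain XML-formatted definition files,
    # so refuse to build an archive that has none.
    if not any(p.suffix.lower() == ".xml" for p in files):
        raise ValueError(f"no XML definition file found under {root}")

    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            # Assumption: keep the flow-task directory as the top-level folder.
            arcname = p.relative_to(root.parent).as_posix()
            zf.write(p, arcname)
            names.append(arcname)
    return names
```

For example, calling package_oozie_app("apps/map-reduce", "map-reduce.zip") on a local copy of one of the Oozie example applications would produce a ZIP file that contains the XML definition files and configuration files of that flow task.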

Export a task from Azkaban

You can download a specific flow task in the Azkaban console.

  1. Log on to the Azkaban console to go to the Projects page.
  2. Select a project whose package you want to download. On the page for the project, click Flows to show all flow tasks under the project.
  3. Click Download in the upper-right corner of the page to download the package of the project.

    Native Azkaban packages can be exported. No limit is imposed on the exported packages of Azkaban. The exported package in the ZIP format contains information about all tasks and relationships under a specific project of Azkaban.
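
Before you import the downloaded package, you may want to confirm that it is an intact ZIP file and see what it contains. The following sketch is a hypothetical helper (summarize_azkaban_zip is not part of Azkaban or DataWorks) that counts the files in the package by extension. Azkaban projects typically define jobs in .job files, and in .flow files for Flow 2.0 projects, but the exact contents depend on your project.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_azkaban_zip(zip_path):
    """Return a Counter of file extensions found in the downloaded package."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt entry, or None.
        corrupt = zf.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt entry in package: {corrupt}")
        # Skip directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() for n in names)
```

A result with no .job or .flow entries suggests that the wrong file was downloaded or that the project is empty.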

Export a task of another open source engine

DataWorks provides a standard template for you to export tasks of open source engines other than Oozie and Azkaban. Before you run an export task, you must download the standard template and modify the content based on the file structure in the template. You can go to the Open Source engine export page to download the standard template and view the file structure.

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Click the icon in the upper-left corner. Then, choose All Products > Other > Migration Assistant.
  3. In the left-side navigation pane, choose Cloud tasks > Open Source engine export to go to the Open Source engine export scheme selection page.
  4. Click the Standard Template tab.
  5. On the Standard Template tab, click standard format Template to download the template.
  6. Modify the content in the template to generate a package to be exported.