This topic describes how to migrate E-MapReduce (EMR) projects to DataWorks workspaces by using the one-click migration feature or by exporting and importing a package.

Prerequisites

  • DataWorks is activated, and a DataWorks workspace is created.
  • If you use the Migration Assistant service of DataWorks as a RAM user, the RAM user is assigned the administrator role or is granted the AliyunDataWorksFullAccess and AliyunEMRFullAccess permissions.
  • An EMR cluster is associated with the DataWorks workspace. For more information, see Configure a workspace.

Background information

You can use the Migration Assistant service of DataWorks to migrate workflows, one-time tasks, resources, and data sources from EMR clusters to DataWorks workspaces in either of the following two ways. When you migrate workflows, you can also migrate their nodes and scheduling policies. You can view the migration progress, migration result, and migration reports on the Migration Assistant page of the DataWorks console. For more information, see View the migration reports and result.

Method 1: Use the one-click migration feature in the EMR console

You can use the one-click migration feature in the EMR console to migrate the configuration information in EMR clusters to DataWorks workspaces. Perform the following steps:

  1. Log on to the EMR console.
  2. In the top navigation bar, select the region where the project that you want to migrate resides.
  3. Click the Data Platform tab and click the ID of the project that you want to migrate. On the page that appears, click the Projects tab, click Project Migration to DataWorks in the left-side navigation pane, and then click Select DataWorks Workspace.
  4. On the Procedure for Migrating EMR Workflow to DataWorks page, select a workspace, enter your remarks, and then click Migrate.
    Note After you click Migrate, the system automatically compresses the configuration information that you migrate into a package, exports the package from EMR, and then imports the package to the specified DataWorks workspace.
  5. The Note message shows how the types of nodes, scheduling policies, one-time tasks, resources, and data sources in EMR map to their types in DataWorks after the migration. Use these mappings to check the integrity and validity of the migration. If the types are correct, click OK.
    • Original Node Type: the type in EMR before the project is migrated.
    • DataWorks Node Type: the type in DataWorks after the project is migrated to DataWorks.
  6. The system starts to migrate the project. You can click View import tasks to view the migration progress. For more information, see View the migration reports and result.
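Step 5 relies on a manual comparison of the types shown in the Note message. If you copy the migrated items and the type mappings into your own script, a check like the following can flag items that are missing or that were mapped to an unexpected type. This is a minimal Python sketch, not a DataWorks feature: the `MigratedItem` class, the `check_type_mappings` function, and the type names in the example (such as `HIVE_SQL` and `EMR_HIVE`) are hypothetical placeholders. Replace them with the mappings that the Note message actually shows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MigratedItem:
    """One node, task, resource, or data source before and after the migration."""
    name: str
    original_type: str              # Original Node Type (in EMR)
    dataworks_type: Optional[str]   # DataWorks Node Type; None if the item is missing


def check_type_mappings(items: List[MigratedItem], expected: Dict[str, str]) -> List[str]:
    """Return one message per problem; an empty list means every item matches."""
    problems = []
    for item in items:
        if item.dataworks_type is None:
            problems.append(f"{item.name}: missing in DataWorks")
            continue
        want = expected.get(item.original_type)
        if want is None:
            problems.append(f"{item.name}: no mapping for type {item.original_type}")
        elif item.dataworks_type != want:
            problems.append(f"{item.name}: expected {want}, got {item.dataworks_type}")
    return problems


if __name__ == "__main__":
    # Hypothetical mappings and items; use the pairs from the Note message instead.
    expected = {"HIVE_SQL": "EMR_HIVE", "SPARK": "EMR_SPARK"}
    items = [
        MigratedItem("daily_report", "HIVE_SQL", "EMR_HIVE"),
        MigratedItem("clean_logs", "SPARK", None),
    ]
    for problem in check_type_mappings(items, expected):
        print(problem)
```

The check treats an item with no mapping as a problem rather than skipping it, because an unmapped type is exactly the kind of gap the Note message is meant to reveal.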

Method 2: Use the DataWorks console to export and import a package

You can use the DataWorks console to export the nodes, scheduling policies, one-time tasks, resources, and data sources that are stored in an EMR cluster into a package, and then import the package to a specified DataWorks workspace. The migration policies that the Migration Assistant service provides vary by DataWorks edition, and the permissions to use the service vary by role. For more information, see Limits.

Note If you use the Migration Assistant service as a RAM user, make sure that the RAM user is granted the AliyunEMRFullAccess permission. Otherwise, when you select a value from the Project Name drop-down list, the system reports an error.
  1. Go to the Migration Assistant page in the DataWorks console.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics in the Actions column.
    4. On the DataStudio page, click the icon in the upper-left corner and choose All Products > Other > Migration Assistant.
  2. Export a project from EMR.
    1. In the left-side navigation pane of the Migration Assistant page, choose Cloud tasks > Scheduling Engine Export.
    2. On the Schemes of Scheduling Engine Export page, click EMR and click Create Export Task.
    3. In the Create Export Task dialog box, enter the name of the export task in the Name field, select a project from the Project Name drop-down list, and then click Export.
    4. After the project is exported, return to the Schemes of Scheduling Engine Export page to view the export result. Click Download Export Package to download the exported package to your computer.
      Note The download address is valid for 30 days. Download the package before it expires. To download the package after it expires, you must export the project again.
  3. Import the downloaded package to a specified DataWorks workspace.
    1. In the left-side navigation pane of the Migration Assistant page, choose Cloud tasks > Scheduling Engine Import and click Create Import Task.
    2. In the Create Import Task dialog box, set the following parameters and click OK.
      • Name: The name of the import task. You can customize the name.
      • Engine type: The type of the engine for the project that you want to import. Select E-MapReduce (EMR).
      • Upload From: The source of the package that you want to upload. Valid values:
        • Local: Upload the package from your computer to the DataWorks workspace. Use this option if the package is 30 MB or smaller.
        • OSS: Upload the package to Object Storage Service (OSS) and specify the OSS URL of the package. Use this option if the package exceeds 30 MB. You can copy the OSS URL from the View Details panel of the package in the OSS console. For more information about how to upload objects to OSS, see Upload objects. For more information about how to obtain the OSS URL of an object, see Share objects.
      • Select File: Select the exported package of the EMR project. After the package is uploaded, the system automatically verifies whether the package meets the requirements.
      • File Name: The name of the uploaded package. The system automatically generates the name based on the package that you upload.
      • Remarks: The description of the import task.
    3. On the Edit Import Tasks page, select the project that you want to import and click Import.
    4. The system starts to migrate the project. You can click View import tasks to view the migration progress. For more information, see View the migration reports and result.
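Method 2 has two limits that are easy to check before you create the import task: the download address of the export package is valid for 30 days, and a package larger than 30 MB must be uploaded from OSS instead of from your computer. The following minimal Python sketch encodes both rules. The function names and the file name `emr_export_package.zip` are hypothetical. Treating 30 MB as 30 * 1024 * 1024 bytes, and treating a link that is exactly 30 days old as still valid, are assumptions; this topic does not specify either boundary.

```python
import os
from datetime import datetime, timedelta

# Limits from this topic; the byte definition of "MB" is an assumption.
LOCAL_UPLOAD_LIMIT_BYTES = 30 * 1024 * 1024
DOWNLOAD_LINK_VALIDITY = timedelta(days=30)


def choose_upload_source(package_size_bytes: int) -> str:
    """Return the Upload From option: 'Local' for packages up to 30 MB, otherwise 'OSS'."""
    if package_size_bytes < 0:
        raise ValueError("package size must be non-negative")
    return "Local" if package_size_bytes <= LOCAL_UPLOAD_LIMIT_BYTES else "OSS"


def download_link_expired(exported_at: datetime, now: datetime) -> bool:
    """Return True if more than 30 days have passed since the package was exported."""
    return now - exported_at > DOWNLOAD_LINK_VALIDITY


if __name__ == "__main__":
    # Hypothetical path of the package downloaded from Download Export Package.
    path = "emr_export_package.zip"
    if os.path.exists(path):
        print(path, "->", choose_upload_source(os.path.getsize(path)))
    if download_link_expired(datetime(2024, 1, 1), datetime.now()):
        print("Download address expired: export the project again.")
```

Keeping the two thresholds as named constants makes it easy to update the sketch if the limits in your DataWorks edition differ from the ones stated in this topic.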

View the migration reports and result

After the project is migrated, you can go to the Migration Assistant page to view the migration progress, migration result, and migration reports.

  • View the report on the import task

    On the Scheduling Engine Import page, find the import task and click View Import Report in the Actions column.

  • View the report on the export task

    On the Scheduling Engine Job Export page, click the EMR tab, find the export task, and then click View Export Report in the Actions column.
