This topic describes how to migrate data across DataWorks workspaces in the same region.

Prerequisites

All the steps in the tutorial Build an online operation analysis platform are completed. For more information, see Business scenario and development process.

Background information

This topic uses the bigdata_DOC workspace created in the tutorial Build an online operation analysis platform as the source workspace. You need to create a destination workspace to store the tables, resources, configurations, and data synchronized from the source workspace.

Procedure

  1. Create a destination workspace.
    1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces.
    2. On the Workspaces page that appears, select the China (Hangzhou) region in the upper-left corner and click Create Workspace.
    3. In the Create Workspace pane that appears, set parameters in the Basic Settings step and click Next.
      Basic Information section:
      • Workspace Name: The name of the workspace. The name must be 3 to 27 characters in length, must start with a letter, and can contain only letters, underscores (_), and digits.
      • Display Name: The display name of the workspace. The display name can be up to 27 characters in length, must start with a letter, and can contain only letters, underscores (_), and digits.
      • Mode: The mode of the workspace. DataWorks provides the basic mode and the standard mode. For more information about the differences between the two modes, see Basic mode and standard mode.
        • Basic mode: A basic workspace is associated with only one MaxCompute project. Basic workspaces do not isolate the development environment from the production environment. In basic workspaces, you can perform only basic data development and cannot strictly control the data development process or table permissions.
        • Standard mode: A standard workspace is associated with two MaxCompute projects. One project serves as the development environment, and the other serves as the production environment. Standard workspaces enforce a standard development process and allow you to strictly control table permissions. For data security, standard workspaces impose restrictions on table operations in the production environment.
      • Description: The description of the workspace.
      Advanced Settings section:
      • Download SELECT Query Result: Specifies whether workspace members can download the query results returned by SELECT statements in DataStudio. If you disable this option, workspace members cannot download the query results.

      Because the source workspace bigdata_DOC is in basic mode, set Mode to Basic Mode (Production Environment Only) in the Basic Settings step when you create the destination workspace.

      Set Workspace Name to a globally unique name. We recommend that you use a name that is easy to distinguish. In this example, set Workspace Name to clone_test_doc.

    4. In the Select Engines and Services step, select the MaxCompute check box, select Pay-As-You-Go in the Compute Engines section, and click Next.
    5. In the Engine Details step, set the required parameters and click Create Workspace.
      MaxCompute compute engine:
      • Instance Display Name: The display name of the compute engine instance. The display name must be 3 to 27 characters in length, must start with a letter, and can contain only letters, underscores (_), and digits.
      • MaxCompute Project Name: The name of the MaxCompute project. By default, the name is the same as that of the DataWorks workspace.
      • Account for Accessing MaxCompute: The identity used to access the MaxCompute project. For the development environment, the value is fixed to Task owner. For the production environment, the valid values are Alibaba Cloud primary account and Alibaba Cloud sub-account.
      • Resource Group: The quotas of computing resources and disk space for the compute engine instance.
  2. Clone node configurations and resources across workspaces.
    You can use the cross-workspace cloning feature of DataWorks to clone the node configurations and resources from the bigdata_DOC workspace to the clone_test_doc workspace. For more information, see Clone nodes across workspaces.
    Note
    • The cross-workspace cloning feature cannot clone table schemas or data.
    • The cross-workspace cloning feature cannot clone combined nodes. If the destination workspace needs to use the combined nodes that exist in the source workspace, you need to manually create the combined nodes in the destination workspace.
    1. Go to the bigdata_DOC workspace and click Cross-project cloning in the upper-right corner. The Create Clone Task page appears.
    2. Set Target Workspace to clone_test_doc and Workflow to the Workshop workflow that you want to clone. Select all the nodes in the workflow and click Add to List. Then, click To-Be-Cloned Node List in the upper-right corner.
    3. In the Nodes to Clone pane that appears, click Clone All. The selected nodes are cloned to the clone_test_doc workspace.
    4. Go to the destination workspace and check whether the nodes are cloned.
  3. Create tables.
    The cross-workspace cloning feature cannot clone table schemas. Therefore, you need to manually create required tables in the destination workspace.
    • For non-partitioned tables, we recommend that you use the following SQL statement to synchronize the table schema from the source workspace. Replace source_project_name with the MaxCompute project name of the source workspace:
      create table table_name as select * from source_project_name.source_table_name;
    • For partitioned tables, we recommend that you use the following SQL statement to re-create the table schema from the source workspace. List the table's columns explicitly and declare the partition key column:
      create table table_name (column_name column_type, ...) partitioned by (partition_column string);

    Commit the tables to the production environment. For more information, see Create tables.
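
    For example, the rpt_user_trace_log table that is synchronized in the next step is partitioned by dt. The following DDL is only a sketch: the column names in it are hypothetical placeholders, and you must replace them with the table's actual columns from the source workspace. The create table ... like statement at the end is an alternative that copies the schema, including the partition definition, in one statement; it assumes that your account can read the source project and that the source MaxCompute project uses the default name bigdata_DOC.
      -- Sketch with hypothetical columns; replace them with the actual columns.
      create table rpt_user_trace_log (
          device string,
          pv bigint,
          uv bigint
      ) partitioned by (dt string);

      -- Alternative: copy the schema (including partitions) from the source project,
      -- assuming your account is allowed to read the bigdata_DOC project.
      create table if not exists rpt_user_trace_log like bigdata_DOC.rpt_user_trace_log;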

  4. Synchronize data.
    The cross-workspace cloning feature cannot clone data from the source workspace to the destination workspace. You need to manually synchronize required data to the destination workspace. To synchronize the data of the rpt_user_trace_log table from the source workspace to the destination workspace, follow these steps:
    1. Create a connection.
      1. Go to the Data Integration page and click Connection in the left-side navigation pane.
      2. On the Data Source page that appears, click Add a Connection in the upper-right corner. In the Add Connection dialog box that appears, select MaxCompute(ODPS) in the Big Data Storage section.
      3. In the Add MaxCompute(ODPS) Connection dialog box that appears, set Connection Name, MaxCompute Project Name, AccessKey ID, and AccessKey Secret, and click Complete. For more information, see Configure a MaxCompute connection.
    2. Create a batch sync node.
      1. Go to the DataStudio page, click the Data Analytics tab, and then click Workshop under Business Flow. Right-click Data Integration and choose Create > Batch Synchronization to create a batch sync node.
      2. On the configuration tab of the batch sync node, set the required parameters. In this example, set the Connection parameter under Source to bigdata_DOC and the Connection parameter under Target to odps_first. Set Table to rpt_user_trace_log. After the configuration is complete, click the Properties tab in the right-side navigation pane. (A SQL-only alternative to this sync node is sketched after these steps.)
      3. Click Use Root Node in the Dependencies section and commit the batch sync node.
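
    The preceding steps configure the sync node in the DataWorks UI. As a point of comparison only, if the account that you use can read the source MaxCompute project directly, a one-off copy of the same partitions can also be written as a single cross-project SQL statement. The following is a sketch that assumes the source project uses the default name bigdata_DOC and that the destination table already exists. The batch sync node remains the recommended approach because it can be scheduled and rerun retroactively, as shown in the next step.
      -- One-off alternative to the batch sync node: copy seven daily partitions
      -- from the source project in a single statement (dynamic partitioning).
      insert overwrite table rpt_user_trace_log partition (dt)
      select * from bigdata_DOC.rpt_user_trace_log
      where dt between '20190611' and '20190617';
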
    3. Generate retroactive data for the batch sync node.
      1. On the DataStudio page, click the DataWorks icon in the upper-left corner and choose All Products > Operation Center.
      2. On the page that appears, choose Cycle Task Maintenance > Cycle Task in the left-side navigation pane.
      3. On the page that appears, find the batch sync node you created in the node list and click the node name. On the canvas that appears on the right, right-click the batch sync node and choose Run > Current Node Retroactively.
      4. In the Patch Data dialog box that appears, set the required parameters. In this example, set Data Timestamp to Jun 11, 2019 - Jun 17, 2019 to synchronize data from multiple partitions. Click OK.
      5. On the Patch Data page that appears, check the running status of the retroactive instances that are generated. If Successful appears in the STATUS column of a retroactive instance, the instance is run and the corresponding data is synchronized.
    4. Verify the data synchronization.
      On the Data Analytics tab of the DataStudio page, right-click the Workshop workflow under Business Flow and choose Create > MaxCompute > ODPS SQL to create an ODPS SQL node. On the configuration tab of the ODPS SQL node, run the following SQL statement to check whether data is synchronized to the destination workspace:
      select * from rpt_user_trace_log where dt between '20190611' and '20190617';
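
      To further confirm that all seven daily partitions were synchronized, you can also count the rows in each partition. If a dt value is missing from the result, the corresponding partition was not synchronized:
      select dt, count(*) as record_count
      from rpt_user_trace_log
      where dt between '20190611' and '20190617'
      group by dt;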