The offline integration feature of Data Management (DMS) provides a low-code tool that you can use to develop data processing tasks. You can combine various task nodes to form a data flow and configure periodic scheduling to process or synchronize data.
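
Conceptually, a data flow is a directed graph of task nodes (data import, transformation, and data output nodes) that runs on a periodic schedule. The following minimal Python sketch models this idea with hypothetical names; it is not part of any DMS SDK and only illustrates the structure that you assemble visually in the console.

```python
from dataclasses import dataclass, field

# Hypothetical model of a data flow: task nodes connected into a small DAG.
# All names here are illustrative; in DMS you configure this in the console.

@dataclass
class TaskNode:
    name: str
    node_type: str                      # e.g. "Data Import", "SQL", "Data Output"
    upstream: list = field(default_factory=list)

@dataclass
class DataFlow:
    name: str
    nodes: list = field(default_factory=list)
    schedule: str = ""                  # periodic schedule, e.g. a cron-style expression

    def add(self, node: TaskNode) -> TaskNode:
        self.nodes.append(node)
        return node

# Example: import from MySQL, transform, and output to MaxCompute.
flow = DataFlow(name="daily_orders", schedule="0 2 * * *")
src = flow.add(TaskNode("orders_src", "Data Import"))
xform = flow.add(TaskNode("clean_orders", "SQL", upstream=[src]))
flow.add(TaskNode("orders_out", "Data Output", upstream=[xform]))
```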

Supported database types

  • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB for Xscale, AnalyticDB for MySQL V3.0, and MySQL databases that are not on Alibaba Cloud
  • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases that are not on Alibaba Cloud
  • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, AnalyticDB for PostgreSQL, and PostgreSQL databases that are not on Alibaba Cloud
  • Oracle
  • Db2
  • MaxCompute
  • Hologres
    Note Hologres is supported only for Data Import nodes.
  • OSS
    Note OSS is supported only for Data Output nodes.
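
The notes above encode per-engine constraints on the node types in which an engine may be used. The following hypothetical helper (not part of any DMS SDK) illustrates the check:

```python
# Node types in which each engine may be used, per the notes above.
# Engines not listed here (MySQL, SQL Server, PostgreSQL, Oracle, Db2,
# MaxCompute) carry no such restriction in this document.
RESTRICTED_ENGINES = {
    "Hologres": {"Data Import"},   # supported only for Data Import nodes
    "OSS": {"Data Output"},        # supported only for Data Output nodes
}

def node_type_allowed(engine: str, node_type: str) -> bool:
    """Return True if the engine may be used in a node of the given type."""
    allowed = RESTRICTED_ENGINES.get(engine)
    return allowed is None or node_type in allowed

assert node_type_allowed("Hologres", "Data Import")
assert not node_type_allowed("OSS", "Data Import")
assert node_type_allowed("MySQL", "Data Output")
```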

Scenarios

The offline integration feature supports the batch processing of data. You can use the feature in the following scenarios:

  • You can build an offline data warehouse in a visualized way by using this low-code tool. Then, you can use the data warehouse for ad hoc queries, multi-dimensional data analysis, data mining, and offline computing.
  • You can process large amounts of complex data in scenarios such as refined enterprise operations, digital marketing, and intelligent recommendation.
  • You can use the offline integration feature, which is developed based on Spark SQL, to run Spark SQL nodes significantly more efficiently than on a Hadoop-based platform. A sketch of a typical Spark SQL transformation follows the note below.
Note For technical support or to provide feedback, search for the DingTalk group ID 31826394 and join the DingTalk group.
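
Because the feature is built on Spark SQL, the work that a Spark SQL node performs is essentially a batch SQL transformation. The following self-contained PySpark sketch shows the kind of query involved; the table and column names are hypothetical and chosen only for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("offline-batch-demo").getOrCreate()

# Register a tiny in-memory table so the query below is self-contained.
spark.createDataFrame(
    [("u1", "2024-01-01", 10.0),
     ("u1", "2024-01-01", 5.0),
     ("u2", "2024-01-01", 7.5)],
    ["user_id", "dt", "amount"],
).createOrReplaceTempView("orders")

# The kind of batch aggregation a Spark SQL node would run.
spark.sql("""
    SELECT user_id,
           COUNT(*)    AS order_cnt,
           SUM(amount) AS total_amount
    FROM orders
    WHERE dt = '2024-01-01'
    GROUP BY user_id
""").show()

spark.stop()
```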

Procedure

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Integration > Batch Integration.
  3. Click Create Data Flow.
  4. In the Create Data Flow dialog box, set the Data Flow Name and Description parameters. Then, click OK.
  5. On the details page of the data flow, create nodes for the data flow. For more information, see Configure a data flow.
  6. Click the blank area on the canvas to configure the data flow.
    1. Click the Data Flow Information tab. In the Properties section, set the Data Flow Name, Description, Owner, and Stakeholders parameters.
    2. In the Scheduling Settings section, turn on Enable Scheduling to schedule the data flow based on your needs. For more information, see Overview.
    3. Click the Advanced Settings tab and configure variables. For more information, see Configure time variables. A sketch of how such a time variable resolves is shown after this procedure.
  7. Publish the data flow. For more information, see Publish a data flow.
  8. Optional: In the upper-right corner of the page, click Go to O&M to perform O&M operations on the data flow. For more information, see Manage a data flow.
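
Time variables substitute date and time values derived from the scheduled run time into node configurations and SQL statements. The exact formats that DMS supports are described in Configure time variables; the following sketch only illustrates the resolution logic for a hypothetical bizdate-style variable defined as "one day before the scheduled time, formatted yyyyMMdd".

```python
from datetime import datetime, timedelta

def resolve_bizdate(scheduled_at: datetime, offset_days: int = -1) -> str:
    """Resolve a hypothetical bizdate-style time variable: the scheduled
    run time shifted by offset_days, formatted as yyyyMMdd."""
    return (scheduled_at + timedelta(days=offset_days)).strftime("%Y%m%d")

# A data flow scheduled for 2024-01-02 02:00 would see bizdate = "20240101",
# so a node could filter on the previous day's data partition.
print(resolve_bizdate(datetime(2024, 1, 2, 2, 0)))   # -> 20240101
```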