Data Management: Overview

Last Updated: Oct 31, 2023

The offline integration feature of Data Management (DMS) provides a low-code tool that you can use to develop data processing tasks. You can combine a variety of task nodes to create a data flow and configure periodic scheduling to process or synchronize data.

Supported database types

  • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, ApsaraDB MyBase for MySQL, PolarDB-X, AnalyticDB for MySQL V3.0, and MySQL databases from other sources

  • SQL Server: ApsaraDB RDS for SQL Server, ApsaraDB MyBase for SQL Server, and SQL Server databases from other sources

  • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, ApsaraDB MyBase for PostgreSQL, AnalyticDB for PostgreSQL, and PostgreSQL databases from other sources

  • Oracle

  • Db2

  • MaxCompute

  • Hologres

  • Object Storage Service (OSS)

Note

Hologres databases can be used only to configure Data Import nodes. OSS buckets can be used only to configure Data Output nodes.

Scenarios

The offline integration feature supports the batch processing of data. You can use the feature in the following scenarios:

  • You can use the low-code tool to visually build an offline data warehouse. You can then use the data warehouse for ad hoc queries, multidimensional data analysis, data mining, and offline computing.

  • You can use the offline integration feature to process complex big data in scenarios such as refined enterprise operations, digital marketing, and intelligent recommendation.

  • The offline integration feature is developed based on Spark SQL. You can use it to improve the efficiency of Spark SQL nodes on a Hadoop-based platform.

Note

To contact technical support and give feedback, search for the DingTalk group ID 31826394 and join the DingTalk group.

Data integration flowchart

[Figure: data flow diagram]

Procedure

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS > Data Integration > Batch Integration.

    Note

    If you use the DMS console in simple mode, move the pointer over the icon in the upper-left corner of the DMS console and choose All functions > DTS > Data Integration > Batch Integration.

  3. Click Create Data Flow. In the Create Data Flow dialog box, enter a name in the Data Flow Name field. Then, click OK.

  4. On the details page of the data flow, create nodes for the data flow. For more information, see Configure a data flow.

  5. In the lower part of the page, configure the data flow.

    1. Click the Data Flow Information tab. In the Properties section, configure the Data Flow Name, Description, Owner, and Stakeholders parameters.

    2. To schedule the data flow, turn on Enable Scheduling in the Scheduling Settings section and configure the required parameters.

      The following list describes each parameter.

      Scheduling Type: The scheduling type of the data flow. Valid values:

        • Cyclic scheduling: The data flow is periodically scheduled. For example, the data flow is run once a week.

        • Schedule once: The data flow is run once at a specific point in time. You need to specify only the point in time when the data flow is run.

      Effective Time: The period during which the scheduling properties take effect. The default time period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling properties permanently take effect.

      Scheduling Cycle: The scheduling cycle of the data flow. Valid values:

        • Hour: The data flow is run within the hours that you specify. If you select this value, you must also set the Timed Scheduling parameter.

        • Day: The data flow is run at the specified point in time every day. If you select this value, you must set the Specific Point in Time parameter.

        • Week: The data flow is run at the specified point in time on the days that you select every week. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

        • Month: The data flow is run at the specified point in time on the days that you select every month. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

      Timed Scheduling: The scheduling method of the data flow. This parameter is displayed only if you select Hour as the value of the Scheduling Cycle parameter. DMS provides the following scheduling methods:

        • Run at an interval: You must specify the time range and the interval to run the data flow. Unit: hours.

          For example, you can set the Starting Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59. In this case, DMS runs the data flow at 00:00, 06:00, 12:00, and 18:00.

        • Run at the specified point in time: You can select the hours at which DMS runs the data flow by using the Specified Time parameter.

          For example, if you select 0Hour and 5Hour, DMS runs the data flow at 00:00 and 05:00.

      Specified Time: This parameter is displayed if you select Week or Month as the value of the Scheduling Cycle parameter.

        • If you set the Scheduling Cycle parameter to Week, you can select one or more days of a week from the drop-down list.

        • If you set the Scheduling Cycle parameter to Month, you can select one or more days of a month from the drop-down list.

      Specific Point in Time: The point in time on the specified days at which DMS runs the data flow. This parameter is displayed if you select Day, Week, or Month as the value of the Scheduling Cycle parameter, or if you select Schedule once as the value of the Scheduling Type parameter.

        For example, if you set this parameter to 02:55, DMS runs the data flow at 02:55 on the specified days.

      Cron Expression: The CRON expression that is automatically generated based on the values that you specify for the preceding parameters.

    3. Click the Advanced Settings tab to configure the parameters in the Variable Setting section. For more information, see the "Configure time variables" section of the Variables topic.

  6. Publish the data flow. For more information, see Publish a data flow.

  7. Optional: In the upper-right corner of the page, click Go to O&M to perform O&M operations on the data flow. For more information, see Manage a data flow.
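
The two Timed Scheduling examples in the Scheduling Settings section can be checked with a short sketch. The following Python code is an illustration only and is not part of DMS: the function names interval_run_times and specified_hour_run_times are hypothetical, and the code assumes that the first run happens exactly at the Starting Time and that the End Time is inclusive.

```python
from datetime import date, datetime, time, timedelta


def interval_run_times(start: time, end: time, interval_hours: int) -> list[time]:
    """Sketch of "Run at an interval": step from start to end by interval_hours.

    Assumption: the end time is inclusive and all runs fall within one day.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    day = date(2000, 1, 1)  # arbitrary date, used only for time arithmetic
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    runs = []
    while current <= last:
        runs.append(current.time())
        current += timedelta(hours=interval_hours)
    return runs


def specified_hour_run_times(hours: list[int]) -> list[time]:
    """Sketch of "Run at the specified point in time": one run per selected hour."""
    return [time(hour=h) for h in sorted(set(hours))]


# Starting Time 00:00, Intervals 6, End Time 20:59
print([t.strftime("%H:%M") for t in interval_run_times(time(0, 0), time(20, 59), 6)])
# ['00:00', '06:00', '12:00', '18:00']

# Selected hours 0 and 5
print([t.strftime("%H:%M") for t in specified_hour_run_times([0, 5])])
# ['00:00', '05:00']
```

The printed times match the examples given for the Timed Scheduling parameter. The next candidate run after 18:00 would be 24:00, which falls after the End Time of 20:59, so it is not scheduled.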