
Data Management: Configure a data flow

Last Updated: Aug 29, 2023

Data Management (DMS) provides the batch integration feature that allows you to combine various task nodes to form a data flow and configure periodic scheduling to process or synchronize data. This topic describes how to configure a data flow.

Limits

The databases to be used in a data flow must be of the following types:

  • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB for Xscale, AnalyticDB for MySQL V3.0, and MySQL databases that are not on Alibaba Cloud

  • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases that are not on Alibaba Cloud

  • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, AnalyticDB for PostgreSQL, and PostgreSQL databases that are not on Alibaba Cloud

  • Oracle

  • Db2

  • MaxCompute

  • Hologres

    Note

    Hologres databases can be used only to configure Data Import nodes.

  • OSS

    Note

    OSS buckets can be used only to configure Data Output nodes.

Procedure

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, choose DTS > Data Integration > Batch Integration.

    Note

    If you use the DMS console in simple mode, move the pointer over the icon in the upper-left corner of the DMS console and choose All functions > DTS > Data Integration > Batch Integration.

  3. Click the name of the data flow that you want to configure. The details page of the data flow appears.

  4. Configure a Data Import node.

    Note

    The first node of the data flow must be a Data Import node, which specifies the source table from which the data flow reads data.

    1. In the Task Type list on the left side of the canvas, drag the Data Import node to the blank area on the canvas.

    2. Click the Data Import node on the canvas. On the Source of data tab in the lower part, configure the following parameters:

      • Database Type: The type of the database from which the data flow reads data.

      • Database: The name of the database from which the data flow reads data. Enter a keyword to search for databases and select the source database from the drop-down list. If you have not logged on to the database, configure the Database Account and Database password parameters in the Login Instance dialog box.

      • Table: The name of the table from which the data flow reads data.
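      For the examples in the rest of this topic, assume a hypothetical source table named customers. The table name and columns below are illustrative only and are not part of the DMS configuration:

        CREATE TABLE customers (
          id   BIGINT PRIMARY KEY,
          name VARCHAR(64),
          age  INT,
          city VARCHAR(64)
        );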

  5. Configure a data processing node. In this example, a Data Filtering node is configured to filter data in the data source.

    Note

    All nodes except Data Import and Data Output nodes can be configured as data processing nodes.

    1. In the Task Type list on the left side of the canvas, drag the Data Filtering node to the blank area on the canvas.

    2. Move the pointer over the Data Import node, click the hollow circle on the right side of the Data Import node, and then drag the connection line to the Data Filtering node.

    3. Click the Data Filtering node on the canvas. On the Data Filtering tab in the lower part, configure the Filter Expressions parameter.

      For example, you can enter name='Jack' in the field as a filter condition.

      Note

      You can also double-click a function on the right side of the Data Filtering tab to specify filter conditions.
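      A filter expression can also combine multiple conditions. The following sketch assumes the hypothetical customers table from step 4 and is illustrative only; the exact set of operators and functions that are supported depends on the Data Filtering node:

        name = 'Jack' AND age > 18 AND city IN ('Hangzhou', 'Beijing')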

  6. Configure a Data Output node.

    Note

    The last node of the data flow must be a Data Output node, which specifies the destination table to which the processed data is written.

    1. In the Task Type list on the left side of the canvas, drag the Data Output node to the blank area on the canvas.

    2. Click the Data Output node on the canvas. On the Data Output tab in the lower part, configure the parameters based on your business requirements.

      • Configure the following parameters if the data output is a database:

        • Database Type: The type of the database in which the destination table resides.

        • Database: The name of the database in which the destination table resides. Enter a keyword to search for databases and select the destination database from the drop-down list. If you have not logged on to the database, configure the Database Account and Database password parameters in the Login Instance dialog box.

        • Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.

        • SQL Statements Executed Before Writing: The SQL statements to be executed before the data is written. See the example after this list.

        • SQL Statements Executed After Writing: The SQL statements to be executed after the data is written.

        • Automatic Table Creation: Specifies whether to automatically create the destination table if the specified destination table does not exist. You can turn Automatic Table Creation on or off.

          • Turn off: does not automatically create the destination table. If the destination table does not exist, the data flow fails to run.

          • Turn on: automatically creates the destination table if it does not exist, and the data flow continues to run.
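        As a sketch of the pre- and post-write statements, the following assumes a MySQL destination and the hypothetical customers_filtered table; adapt the statements to your destination database:

          -- Executed before writing: clear stale rows from the destination table
          TRUNCATE TABLE customers_filtered;

          -- Executed after writing: refresh optimizer statistics on the loaded table
          ANALYZE TABLE customers_filtered;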

      • Configure the following parameters if the data output is an OSS bucket:

        • Database Type: The type of the data output. In this example, OSS is selected.

        • OSS Bucket: The OSS bucket in which the destination table resides. Enter a keyword to search for OSS buckets and select the destination bucket from the drop-down list.

        • OSS Directory: The path in the OSS bucket in which the data is stored.

        • Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.

        • Overwrite Destination Table: Specifies whether to overwrite the existing data in the specified destination table. You can turn Overwrite Destination Table on or off.

          • Turn off: appends the data to the existing data in the destination table.

          • Turn on: clears the existing data in the destination table or partition, and then writes the data.

        • File Format: The storage format of the destination table. Valid values: parquet, orc, avro, and csv.

        • Compress: The compression format of the destination table.

        • Partition: The partition key of the destination table. You can use the value of the partition key to query only the data that you need. See the example after this list.

          Note

          You can configure this parameter only after you configure the Data Import node and connect the nodes on the canvas.
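        As a sketch of querying by partition key, the following assumes that the hypothetical customers_filtered table is partitioned by a date column named dt; both names are illustrative only:

          -- Read only the rows in one partition instead of scanning the whole table
          SELECT * FROM customers_filtered WHERE dt = '2023-08-29';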

    3. Move the pointer over the Data Filtering node, click the hollow circle on the right side of the Data Filtering node, and then drag the connection line to the Data Output node.

      Then, the exclamation point (!) icon on the right side of the nodes disappears. This indicates that the dependencies between the nodes in the data flow are configured.