Data Management (DMS) provides the offline integration feature that allows you to combine various task nodes to form a data flow and configure periodic scheduling to process or synchronize data. This topic describes how to create and configure a data flow.

Limits

The following types of databases are supported:
  • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB-X, AnalyticDB for MySQL V3.0, and MySQL databases that are not on Alibaba Cloud
  • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases that are not on Alibaba Cloud
  • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, AnalyticDB for PostgreSQL, and PostgreSQL databases that are not on Alibaba Cloud
  • Oracle
  • Db2
  • MaxCompute
  • Hologres
    Note Hologres is supported only for Data Import nodes.
  • OSS
    Note OSS is supported only for Data Output nodes.

Procedure

  1. Log on to the DMS console.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data integration > Data processing.
  3. Click the name of the data flow that you want to configure to go to the details page of the data flow.
  4. Configure a Data Import node.
    Note The first node of the data flow must be a Data Import node, which specifies the source table from which the data flow reads data.
    1. In the Task Type list on the left side of the canvas, drag the Data Import node to the blank area on the canvas.
    2. Select the Data Import node that you created on the canvas. On the Source of data tab in the lower part, configure the data source.
      • Database Type: Select the type of the database from which the data flow reads data.
      • Database:
        1. Enter a keyword to search for databases and select the source database from the drop-down list.
        2. If you have not logged on to the selected database, the Login Instance dialog box appears. In the dialog box, set the Database Account and Database password parameters.
      • Table: Select the table from which the data flow reads data.
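      For reference, the examples in this topic assume a minimal source table similar to the following sketch. Only the name column and the value 'Jack' come from the filter condition in the next step; the table name, the other columns, and their data types are hypothetical.
        CREATE TABLE customer (           -- hypothetical source table name
          id   BIGINT,                    -- hypothetical primary key column
          name VARCHAR(64),               -- column referenced by the filter condition name='Jack'
          age  INT,                       -- hypothetical column used in the richer filter example
          city VARCHAR(64)                -- hypothetical column used in the richer filter example
        );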
  5. Configure a data processing node. In this example, a Data Filtering node is configured to filter data in the data source.
    Note All types of nodes other than Data Import and Data Output can be configured as data processing nodes.
    1. In the Task Type list on the left side of the canvas, drag the Data Filtering node to the blank area on the canvas.
    2. Move the pointer over the Data Import node, click the hollow dot on the right side of the Data Import node, and then drag the connection line to the Data Filtering node.
    3. Select the Data Filtering node that you created on the canvas. On the Data Filtering tab in the lower part, enter filter conditions for the data source in the Filter Expressions field.
      For example, you can enter name='Jack' in the field as a filter condition.
      Note You can also double-click a function on the right side of the Data Filtering tab to specify filter conditions.
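      The following sketch shows a slightly richer filter condition. It assumes that the Filter Expressions field accepts SQL WHERE-style conditions and that the source table contains the hypothetical name, age, and city columns described in the previous step; the operators and functions that are actually available are listed on the right side of the tab.
        name = 'Jack' AND age >= 18 AND city IN ('Hangzhou', 'Beijing')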
  6. Configure a Data Output node.
    Note The last node of the data flow must be a Data Output node, which specifies the destination table to which the processed data is written.
    1. In the Task Type list on the left side of the canvas, drag the Data Output node to the blank area on the canvas.
    2. Select the Data Output node that you created on the canvas. On the Data Output tab in the lower part, configure the data output.
      • Parameters that you can set if the data output is a database:
        • Database Type: The type of the database in which the destination table resides.
        • Database: The name of the destination database. You can enter a keyword to search for databases.
          Note If you have not logged on to the selected database, enter the database account and password in the Login Instance dialog box that appears.
        • Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.
        • SQL Statements Executed Before Writing: The SQL statements to be executed before the data is written. An example is sketched after this list.
        • SQL Statements Executed After Writing: The SQL statements to be executed after the data is written.
        • Automatic Table Creation: Specifies whether to automatically create the destination table if the specified destination table does not exist. You can turn Automatic Table Creation on or off.
          • Turn off: The destination table is not automatically created. If the specified destination table does not exist, the data flow fails to run.
          • Turn on: The destination table is automatically created if it does not exist, and the data flow continues to run.
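        The following sketch shows the kind of statements that are typically entered for the two SQL parameters. It assumes a MySQL-compatible destination database and a hypothetical destination table named customer_filtered; replace the table name and statements with ones that fit your scenario.
          -- SQL Statements Executed Before Writing: clear old data so that reruns do not duplicate rows (assumes a full refresh is wanted)
          DELETE FROM customer_filtered;
          -- SQL Statements Executed After Writing: refresh table statistics after the load (MySQL syntax)
          ANALYZE TABLE customer_filtered;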
      • Parameters that you can set if the data output is an Object Storage Service (OSS) bucket:
        • Database Type: The type of the data store in which the destination table resides. In this example, select OSS.
        • OSS Bucket: The OSS bucket in which the destination table resides. You can enter a keyword to search for OSS buckets.
        • OSS Directory: The path in which data is stored in the OSS bucket.
        • Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.
        • Overwrite Destination Table: Specifies whether to overwrite the existing data in the specified destination table. You can turn Overwrite Destination Table on or off.
          • Turn off: writes data to the destination table without clearing the existing data.
          • Turn on: clears the existing data in the destination table or partition, and then writes data.
        • File Format: The storage format of the destination table. Valid values: parquet, orc, avro, and csv.
        • Compress: The compression format of the destination table.
        • Partition: The partition key of the destination table. You can use the value of the partition key to query the data you need.
          Note You can set this parameter only after you configure the Data Import node and connect the nodes on the canvas.
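        To illustrate how the OSS Directory, Partition, File Format, and Compress parameters relate to each other, the following is a hypothetical layout of the objects that an output configured with the dataflow/output directory, a partition key named city, and the parquet file format might produce. The bucket name, directory, and object names are assumptions; the actual layout is determined by the data flow.
          oss://examplebucket/dataflow/output/city=Hangzhou/part-00000.parquet
          oss://examplebucket/dataflow/output/city=Beijing/part-00000.parquet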
    3. Move the pointer over the Data Filtering node, click the hollow dot on the right side of the Data Filtering node, and then drag the connection line to the Data Output node.
      The ! icons on the right side of the nodes then automatically disappear, which indicates that the dependencies between the nodes in the data flow are configured.