Data Management (DMS) provides the batch integration feature that allows you to combine various task nodes to form a data flow and configure periodic scheduling to process or synchronize data. This topic describes how to configure a data flow.
Limits
The databases to be used in a data flow must be of the following types:
MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB for Xscale, AnalyticDB for MySQL V3.0, and MySQL databases that are not on Alibaba Cloud
SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases that are not on Alibaba Cloud
PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, AnalyticDB for PostgreSQL, and PostgreSQL databases that are not on Alibaba Cloud
Oracle
Db2
MaxCompute
Hologres
Note: Hologres databases can be used only to configure Data Import nodes.
OSS
Note: OSS buckets can be used only to configure Data Output nodes.
Procedure
- Log on to the DMS console V5.0.
In the top navigation bar, click .
Note: If you use the DMS console in simple mode, move the pointer over the icon in the upper-left corner of the DMS console and choose .
Click the name of the data flow that you want to configure to go to the details page of the data flow.
Configure a Data Import node.
Note: The first node of the data flow must be a Data Import node, which specifies the source table from which the data flow reads data.
In the Task Type list on the left side of the canvas, drag the Data Import node to the blank area on the canvas.
Click the Data Import node on the canvas. On the Source of data tab in the lower part, configure the parameters that are described in the following table.
Database Type: The type of the database from which the data flow reads data.
Database: The name of the database from which the data flow reads data. Enter a keyword to search for databases and select the source database from the drop-down list.
Note: If you have not logged on to the database, configure the Database Account and Database password parameters in the Login Instance dialog box.
Table: The name of the table from which the data flow reads data.
Configure a data processing node. In this example, a Data Filtering node is configured to filter data in the data source.
Note: All nodes except Data Import and Data Output nodes can be configured as data processing nodes.
In the Task Type list on the left side of the canvas, drag the Data Filtering node to the blank area on the canvas.
Move the pointer over the Data Import node, click the hollow circle on the right side of the Data Import node, and then drag the connection line to the Data Filtering node.
Click the Data Filtering node on the canvas. On the Data Filtering tab in the lower part, configure the Filter Expressions parameter.
For example, you can enter name='Jack' in the field as a filter condition.
Note: You can also double-click a function on the right side of the Data Filtering tab to specify filter conditions.
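A filter expression behaves like a SQL WHERE clause: each source row is kept only if it matches the condition. The following minimal Python sketch illustrates that semantics with a hypothetical in-memory row set (the column names are illustrative, not from DMS):

```python
# Illustrative sketch: a Data Filtering node keeps only the rows that
# satisfy the filter condition, analogous to SQL's WHERE name = 'Jack'.
rows = [
    {"id": 1, "name": "Jack"},
    {"id": 2, "name": "Lucy"},
    {"id": 3, "name": "Jack"},
]

# Apply the filter expression name='Jack' to each row.
filtered = [r for r in rows if r["name"] == "Jack"]

print([r["id"] for r in filtered])  # → [1, 3]
```

Rows that do not match the condition are dropped from the flow and are not passed to downstream nodes.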
Configure a Data Output node.
Note: The last node of the data flow must be a Data Output node, which specifies the destination table to which the processed data is written.
In the Task Type list on the left side of the canvas, drag the Data Output node to the blank area on the canvas.
Click the Data Output node on the canvas. On the Data Output tab in the lower part, configure the parameters based on your business requirements.
The following table describes the parameters that you can configure if the data output is a database.
Database Type: The type of the database in which the destination table resides.
Database: The name of the database in which the destination table resides. Enter a keyword to search for databases and select the destination database from the drop-down list.
Note: If you have not logged on to the database, configure the Database Account and Database password parameters in the Login Instance dialog box.
Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.
SQL Statements Executed Before Writing: The SQL statements to execute before the data is written.
SQL Statements Executed After Writing: The SQL statements to execute after the data is written.
Automatic Table Creation: Specifies whether to automatically create the destination table if the specified destination table does not exist. You can turn Automatic Table Creation on or off.
Turn off: does not automatically create the destination table. If the table does not exist, the data flow fails to run.
Turn on: automatically creates the destination table if it does not exist, and the data flow continues to run.
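The write sequence for a database output — pre-write SQL, optional automatic table creation, the write itself, then post-write SQL — can be sketched with Python's sqlite3 module as a stand-in for the destination database. The table, column names, and specific statements below are hypothetical, not DMS defaults:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the destination database

# Automatic Table Creation turned on: create the destination table if it
# does not exist, so the flow continues instead of failing.
conn.execute("CREATE TABLE IF NOT EXISTS dest (id INTEGER, name TEXT)")

# SQL Statements Executed Before Writing, e.g. clear out stale rows.
conn.execute("DELETE FROM dest")

# The write itself: insert the processed rows.
conn.executemany("INSERT INTO dest VALUES (?, ?)", [(1, "Jack"), (3, "Jack")])

# SQL Statements Executed After Writing, e.g. build an index for queries.
conn.execute("CREATE INDEX IF NOT EXISTS idx_dest_name ON dest (name)")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # → 2
```

The pre- and post-write statements are commonly used for housekeeping such as truncating stale data before a load or rebuilding indexes afterward.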
The following table describes the parameters that you can configure if the data output is an OSS bucket.
Database Type: The type of the destination in which the data is stored. In this example, OSS is selected.
OSS Bucket: The OSS bucket to which the data is written. Enter a keyword to search for OSS buckets and select the destination bucket from the drop-down list.
OSS Directory: The path in the OSS bucket in which the data is stored.
Table name: The destination table to which the data flow writes the processed data. Enter the name of an existing table or a new table.
Overwrite Destination Table: Specifies whether to overwrite the existing data in the specified destination table. You can turn Overwrite Destination Table on or off.
Turn off: writes data to the destination table without clearing the existing data.
Turn on: clears the existing data in the destination table or partition, and then writes data.
File Format: The storage format of the destination table. Valid values: parquet, orc, avro, and csv.
Compress: The compression format of the destination table.
Partition: The partition key of the destination table. You can use the value of the partition key to query the data that you need.
Note: You can configure this parameter only after you configure the Data Import node and connect the nodes on the canvas.
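When a partition key is set, engines that write to object storage typically lay files out under the configured directory in Hive-style key=value subdirectories, so that queries can prune by partition value. The sketch below shows that common path convention; the bucket directory, table, column, and file names are illustrative, and the exact layout DMS produces may differ:

```python
import posixpath

def partition_path(oss_dir, table, partition_key, value, filename):
    """Build a Hive-style partitioned object key, e.g.
    exports/dest/name=Jack/part-0.csv (illustrative convention)."""
    return posixpath.join(oss_dir, table, f"{partition_key}={value}", filename)

print(partition_path("exports", "dest", "name", "Jack", "part-0.csv"))
# → exports/dest/name=Jack/part-0.csv
```

With this layout, a query that filters on the partition key only needs to read the objects under the matching key=value prefix.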
Move the pointer over the Data Filtering node, click the hollow circle on the right side of the Data Filtering node, and then drag the connection line to the Data Output node.
The hollow circle on the right side of each node then disappears, which indicates that the dependencies between the nodes in the data flow are configured.
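Conceptually, the finished flow is a pipeline of three stages, run in the order defined by the connection lines: import, filter, output. The following minimal Python sketch models that dataflow; the stage functions and row data are hypothetical illustrations, not DMS APIs:

```python
def data_import():
    # Data Import node: read rows from the source table.
    return [{"id": 1, "name": "Jack"}, {"id": 2, "name": "Lucy"}]

def data_filter(rows):
    # Data Filtering node: keep only rows matching name='Jack'.
    return [r for r in rows if r["name"] == "Jack"]

sink = []

def data_output(rows):
    # Data Output node: write the processed rows to the destination.
    sink.extend(rows)

# The connection lines on the canvas define this dependency order.
data_output(data_filter(data_import()))
print(sink)  # → [{'id': 1, 'name': 'Jack'}]
```

Each node consumes the output of the node it is connected to, which is why the first node must be a Data Import node and the last must be a Data Output node.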