After you configure the DM input component, you can retrieve data from a DM data source into Dataphin for data integration and data development. This topic describes how to configure the DM input component.
Prerequisites
A DM data source is created. For more information, see Create a DM data source.
The account used to configure the DM input component has read-through permission for the data source. If you do not have this permission, request it. For more information, see Request, renew, and return data source permissions.
Procedure
On the menu bar at the top of the Dataphin home page, choose Development > Data Integration.
On the menu bar at the top of the Data Integration page, select a project. In Dev-Prod mode, also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the target batch pipeline to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane of the Component Library panel, choose Input. Find the DM component in the list and drag it to the canvas.
Click the
icon on the DM input component card to open the DM Input Configuration dialog box.
In the DM Input Configuration dialog box, configure the parameters.
Parameter
Description
Step Name
The name of the DM input component. Dataphin automatically generates a step name. You can also change the name as needed. The naming conventions are as follows:
The name can contain only Chinese characters, letters, underscores (_), and digits.
The name cannot exceed 64 characters in length.
Datasource
The data source drop-down list displays all DM data sources in the current Dataphin project, including both data sources for which you have read-through permission and those for which you do not. Click the
icon to copy the current data source name.
For a data source for which you do not have read-through permission, click Request next to the data source to request the permission. For more information, see Request, renew, and return data source permissions.
If you do not have a DM data source, click Create Data Source to create one. For more information, see Create a DM data source.
Number of Source Tables
Select whether to read from a single table or from multiple tables with the same schema. Valid values for Number of Source Tables: Single Table and Multiple Tables.
Single Table: Use this option to sync data from one source table to one destination table.
Multiple Tables: Use this option to sync data from multiple source tables to the same destination table. When data from multiple tables is written to a single data table, the union algorithm is used.
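As a conceptual illustration, writing multiple same-schema source tables to one destination behaves like a SQL UNION ALL over those tables. The following Python sketch (table and column names are hypothetical, and this is not Dataphin's internal implementation) shows the effective query shape:

```python
# Illustrative sketch: multiple same-schema source tables feeding one
# destination are logically combined with UNION ALL.
def build_union_query(tables, columns):
    """Build a UNION ALL query over same-schema tables (hypothetical helper)."""
    col_list = ", ".join(columns)
    selects = [f"SELECT {col_list} FROM {t}" for t in tables]
    return "\nUNION ALL\n".join(selects)

sql = build_union_query(["orders_001", "orders_002"], ["id", "amount"])
print(sql)
```

UNION ALL (rather than UNION) keeps duplicate rows, which matches the goal of syncing every source row to the destination.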
Table Match Method
Select General Rule or Database Regex.
Note: This parameter is available only if you set Number of Source Tables to Multiple Tables.
Table
Select the source table or tables:
If you set Number of Source Tables to Single Table, enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the
icon to copy the name of the selected table.
If you set Number of Source Tables to Multiple Tables, enter an expression to add tables based on the selected table match method.
If you select General Rule for Table Match Method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumerations, regular expression-like patterns, and a mix of both. For example, table_[001-100];table_102.
If you select Database Regex for Table Match Method: In the input box, enter a regular expression that the current database supports. The system matches tables in the destination database based on this expression. At runtime, the node re-evaluates the database regex in real time, so newly matched tables are included in the synchronization.
After you enter the expression, click Exact Match to view the list of matched tables in the Confirm Match Details dialog box.
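To make the General Rule syntax concrete, the following Python sketch expands an enumeration-plus-range expression into table names. The expansion rules are an assumption inferred from the example table_[001-100];table_102 and may differ from Dataphin's exact matcher:

```python
import re

def expand_table_expression(expr):
    """Expand a table expression such as 'table_[001-100];table_102'.
    Illustrative only: assumes ';' separates entries and '[lo-hi]' is a
    zero-padded numeric range."""
    tables = []
    for part in filter(None, expr.split(";")):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # preserve zero padding, e.g. 001
            tables.extend(f"{prefix}{str(i).zfill(width)}{suffix}"
                          for i in range(int(lo), int(hi) + 1))
        else:
            tables.append(part)  # plain enumeration entry
    return tables

names = expand_table_expression("table_[001-003];table_102")
# names == ['table_001', 'table_002', 'table_003', 'table_102']
```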
Split Key (Optional)
The system partitions data based on the configured split key. You can use this parameter with the concurrency parameter to enable concurrent reads. You can use a column from the source table as the split key. Use a primary key or an indexed column as the split key to ensure high performance.
Important: If you select a date and time type, the system identifies the maximum and minimum values and performs a rough split based on the total time range and concurrency. The splits are not guaranteed to be even.
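The rough split described above can be sketched in Python. This is an illustration of range splitting by concurrency under simple assumptions (integer split key, contiguous ranges), not Dataphin's actual algorithm:

```python
def split_ranges(min_val, max_val, concurrency):
    """Roughly split [min_val, max_val] into contiguous ranges, one per
    concurrent reader (illustrative sketch of split-key behavior)."""
    step = (max_val - min_val + 1) // concurrency or 1
    ranges = []
    lo = min_val
    for i in range(concurrency):
        # The last reader absorbs any remainder, so splits may be uneven.
        hi = max_val if i == concurrency - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
        if lo > max_val:
            break
    return ranges

print(split_ranges(1, 100, 4))
# Each range maps to a predicate, e.g. WHERE id BETWEEN 1 AND 25
```

Each reader then scans only its own range, which is why an indexed split key keeps the per-range queries fast.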
Batch Read Size (Optional)
The number of records to read at a time. When reading from the source database, configure a specific batch read size, such as 1024 records, instead of reading one record at a time. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
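To see why batch reads reduce round trips, the following sketch uses the Python DB-API method fetchmany to pull rows in batches of 1024, with SQLite standing in for DM purely for illustration:

```python
import sqlite3

def read_in_batches(conn, query, batch_size=1024):
    """Yield rows in batches instead of one at a time; fewer round trips
    to the source database. SQLite stands in for DM here."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3000)])
batches = list(read_in_batches(conn, "SELECT id FROM t", batch_size=1024))
# 3000 rows arrive as batches of 1024, 1024, and 952
```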
Input Filter (Optional)
Enter the filter condition for the input fields, for example, ds=${bizdate}. The Input Filter is applicable to the following scenarios:
Filtering a fixed portion of data.
Parameter-based filtering.
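Parameter-based filtering works by substituting scheduling parameters such as ${bizdate} into the filter expression at run time. A minimal sketch of that substitution (the render_filter helper is hypothetical, not a Dataphin API):

```python
import re
from datetime import date, timedelta

def render_filter(template, params):
    """Substitute ${name} placeholders in a filter expression
    (illustrative; Dataphin resolves scheduling parameters itself)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: params[m.group(1)], template)

# bizdate is conventionally the day before the run date (an assumption here).
bizdate = (date(2024, 5, 2) - timedelta(days=1)).strftime("%Y%m%d")
clause = render_filter("ds=${bizdate}", {"bizdate": bizdate})
# clause == 'ds=20240501'
```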
Output Fields
The Output Fields section displays all fields from the selected tables that match the filter criteria. The following operations are supported:
Field Management: If you do not need to output certain fields to downstream components, delete them:
To delete a single field: To delete a small number of fields, click the
icon in the Actions column for each extra field.
To delete fields in a batch: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the
left arrow icon to move the selected fields to the unselected list, and then click Confirm.
Batch Add: Click Batch Add. You can configure fields in a batch using JSON, TEXT, or DDL format.
Note: After you add fields in a batch and click Confirm, the existing field configuration is overwritten.
To configure in a batch using JSON format, for example:
[
  { "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: The index parameter specifies the position of the column in the source. The name parameter defines the field name, and the type parameter defines the field type after import. For example, "index":3,"name":"user_id","type":"String" indicates that the third column is imported, with user_id as the field name and String as the field type.
To configure in a batch using TEXT format, for example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
The row delimiter separates the information for each field. The default row delimiter is a line feed (\n). You can also use a semicolon (;) or a period (.).
The column delimiter separates the values within a field definition, such as the field name and the field type. The default column delimiter is a comma (,). The field type is optional.
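Assuming the delimiter rules above, the TEXT format can be parsed as in the following sketch (parse_text_fields is an illustrative helper, not part of Dataphin):

```python
def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Parse the TEXT batch-add format into field dicts
    (a sketch of the documented format, not Dataphin's parser)."""
    fields = []
    for row in filter(None, text.split(row_delim)):
        parts = [p.strip() for p in row.split(col_delim)]
        fields.append({
            "index": int(parts[0]),
            "name": parts[1],
            # Trailing values (type, mapType, comment) are optional.
            "type": parts[2] if len(parts) > 2 else None,
            "mapType": parts[3] if len(parts) > 3 else None,
            "comment": parts[4] if len(parts) > 4 else None,
        })
    return fields

fields = parse_text_fields(
    "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
)
```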
To configure in a batch using DDL format, for example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
Add Output Field: Click +Add Output Field. Follow the prompts to enter the Column, Type, and Comment, and select the Mapping Type. After you configure the current row, click the
icon to save.
Click Confirm to complete the configuration of the DM input component properties.