Configure the Doris input component - Dataphin - Alibaba Cloud Documentation Center

The Doris input component reads data from a Doris data source into Dataphin for data integration and development.

Prerequisites

You have created a Doris data source. For more information, see Create a Doris data source.
The account that you use to configure the Doris input component must have sync read permission for the data source. If the permission is not granted, request it. For more information, see Request data source permissions.

Procedure

On the Dataphin home page, in the top menu bar, choose Develop > Data Integration.
On the integration page, in the top menu bar, select a Project. In Dev-Prod mode, also select an environment.
In the navigation pane on the left, click Offline Integration. Then, in the Offline Integration list, click the offline pipeline you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the left navigation pane of the Component Library panel, select Input. In the list of input components on the right, locate the Doris component and drag it to the canvas.
Click the icon on the Doris input component card to open the Doris Input Configuration dialog box.

In the Doris Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the Doris input component. Dataphin automatically generates a step name, which you can change. The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters in length.
Datasource	Lists all Doris data sources in the current Dataphin project, including those you have sync read permission for and those you do not. Click the icon to copy the data source name. For a data source for which you do not have sync read permission, click Request next to the data source to request the permission. For more information, see Request data source permissions. If you do not have a Doris data source, click Create Data Source to create one. For more information, see Create a Doris data source.
Number of source tables	Specifies whether to read from one table or from multiple tables that share the same schema. Single Table: syncs data from one source table to a single destination table. Multiple Tables: syncs data from multiple source tables to a single destination table by taking the union of their rows.
Table matching mode	The matching mode for selecting source tables. You can select General Rule or Database Regex. Note: This parameter is available only when Number of source tables is set to Multiple Tables.
Table	The source table or tables to read. Single Table: enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the selected table. Multiple Tables: enter an expression to add tables, based on the selected table matching mode. With General Rule, enter an expression that selects tables with the same schema. Enumerations, regular expression-like patterns, and a mix of both are supported, for example `table_[001-100];table_102;`. With Database Regex, enter a regular expression that the source database supports. The system matches tables against this expression and dynamically synchronizes any newly matched tables at runtime. After you enter the expression, click Exact Match to view the matched tables in the Confirm Match Details dialog box.
Split key	An integer column from the source table used to partition data for concurrent reads. Use the primary key or an indexed column for best performance.
Batch read size	The number of records to read per batch. Configuring a batch size (for example, 1024) reduces round trips to the data source and improves I/O efficiency.
Input filter	A filter condition that restricts which rows are read. Use a static value to extract a fixed subset of data, for example `ds=20210101`. Use a variable to extract a subset that is resolved when the task runs, for example `ds=${bizdate}`.
Output Fields	Displays all fields from the selected tables that match the filter conditions. You can manage the fields as follows:
Delete fields: Remove fields that you do not need to output to downstream components. To delete a single field, click the icon in the Actions column. To delete fields in a batch, click Manage Fields. In the Manage Fields dialog box, select multiple fields, click the shift-left icon to move them to the unselected list, and then click OK.
Batch Add: Click Batch Add to configure fields in a batch using JSON, TEXT, or DDL format. Note: After you add fields in a batch and click OK, the existing field configuration is overwritten.
JSON format, for example: `[{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]` Note: `index` specifies the column number of the object, `name` specifies the field name after import, and `type` specifies the field type after import. For example, `"index":3,"name":"user_id","type":"String"` means that the fourth column of the file is imported with the field name `user_id` and the field type `String`.
TEXT format, for example:
`1,id,int(10),Long,comment1`
`2,user_name,varchar(255),String,comment2`
The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported. The column delimiter separates the field name from the field type. The default is a half-width comma (,). The field type can be omitted.
DDL format, for example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP );`
Add Output Field: Click + Add Output Field, follow the prompts to enter the Column, Type, and Comment, and select the Mapping Type. After you configure the current row, click the icon to save.
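
The General Rule expression described in the Table row mixes enumerations with range patterns. As a rough illustration of how such an expression might expand into table names, here is a minimal Python sketch. The function name `expand_general_rule` and the expansion rules (semicolon-separated items, inclusive numeric ranges that keep the zero padding of the lower bound) are assumptions inferred from the example `table_[001-100];table_102;`. They are not Dataphin's actual parser.

```python
import re

# Assumption: a range looks like [001-100] and is inclusive on both ends.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_general_rule(expression):
    """Expand a semicolon-separated expression into concrete table names.

    Hypothetical helper for illustration only.
    """
    tables = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing semicolon
        match = _RANGE.search(item)
        if match is None:
            tables.append(item)  # plain enumeration entry
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)  # keep the zero padding of the lower bound
        prefix, suffix = item[:match.start()], item[match.end():]
        for n in range(int(start), int(end) + 1):
            tables.append(prefix + str(n).zfill(width) + suffix)
    return tables


names = expand_general_rule("table_[001-100];table_102;")
print(len(names), names[0], names[-2], names[-1])
# prints: 101 table_001 table_100 table_102
```

A quick check like this can help you estimate how many tables an expression covers before you confirm the match in the Confirm Match Details dialog box, which remains the authoritative result.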

Click Confirm to save the configuration for the Doris input component.
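
The Batch Add option in the Output Fields row accepts the same field definitions in JSON or TEXT form. The following Python sketch shows one way to generate the TEXT form from the JSON form, using the field values from the JSON example. The helper name `json_to_text` is hypothetical, and the column order (index, name, type, mapping type, comment) is an assumption taken from the TEXT example; Dataphin does not provide this function.

```python
import json

# Field definitions copied from the JSON example for Batch Add.
FIELDS_JSON = """
[
  {"index": 1, "name": "id", "type": "int(10)",
   "mapType": "Long", "comment": "comment1"},
  {"index": 2, "name": "user_name", "type": "varchar(255)",
   "mapType": "String", "comment": "comment2"}
]
"""


def json_to_text(fields_json, row_delimiter="\n", column_delimiter=","):
    """Render JSON field definitions as TEXT-format rows.

    Hypothetical helper: defaults follow the documented default
    delimiters (line feed between rows, comma between columns).
    """
    fields = json.loads(fields_json)
    rows = []
    for field in fields:
        columns = [
            str(field["index"]),
            field["name"],
            field["type"],
            field["mapType"],
            field.get("comment", ""),  # treat a missing comment as empty
        ]
        rows.append(column_delimiter.join(columns))
    return row_delimiter.join(rows)


text = json_to_text(FIELDS_JSON)
print(text)
# prints:
# 1,id,int(10),Long,comment1
# 2,user_name,varchar(255),String,comment2
```

Because Batch Add overwrites the existing field configuration, generating the full field list in one pass and pasting it once is safer than adding fields in several batches.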