After you configure the Doris input component, you can read data from a Doris data source to Dataphin for data integration and data development. This topic describes how to configure a Doris input component.
Prerequisites
A Doris data source is added. For more information, see Create a Doris data source.
The account that configures the Doris input component properties must have the read-through permission on the data source. If you do not have the permission, you need to request the data source permission. For more information, see Request data source permissions.
Procedure
In the top navigation bar of the Dataphin homepage, choose Development > Data Integration.
In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Inputs. Find the Doris component in the input component list on the right and drag it to the canvas.
Click the icon in the Doris input component card to open the Doris Input Configuration dialog box.
In the Doris Input Configuration dialog box, configure the parameters.
Parameter
Description
Step Name
The name of the Doris input component. Dataphin automatically generates a step name. You can also modify it based on your business scenario. The name must meet the following requirements:
It can contain only Chinese characters, letters, underscores (_), and digits.
It cannot exceed 64 characters in length.
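The naming rules above can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation: the function name `is_valid_step_name` is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs range, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by U+4E00 to U+9FFF.
# Assumption: a name must contain at least one character.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and has 1 to 64 of them."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["doris_input_1", "doris-input", "a" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```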
Datasource
The data source dropdown list displays all Doris data sources in the current Dataphin instance, including those for which you do not have the read-through permission. Click the icon to copy the current data source name.
For a data source for which you do not have the read-through permission, click Request next to the data source to request the read permission. For more information, see Request data source permissions.
If you do not have a Doris data source, click Create Data Source to create a data source. For more information, see Create a Doris data source.
Source Table Quantity
Based on your actual scenario requirements, select a single table or multiple tables with the same structure as the input. Source Table Quantity includes Single Table and Multiple Tables:
Single Table: This option is applicable to scenarios where business data from one table is synchronized to one destination table.
Multiple Tables: This option is applicable to scenarios where business data from multiple tables is synchronized to the same destination table. When data from multiple tables is written to the same data table, the union algorithm is used.
Table
Select a source table:
If you select Single Table for Source Table Quantity, you can enter a keyword to search for a table or enter the exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table.
If you select Multiple Tables for Source Table Quantity, perform the following operations to add tables:
In the input box, enter a table expression to filter tables with the same structure.
The system supports enumeration, regular expressions, and a combination of both. For example, table_[001-100];table_102.
Click Exact Match. In the Confirm Matching Details dialog box, view the list of matched tables.
Click OK.
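To get a feel for what an expression such as table_[001-100];table_102 might match, the sketch below expands it into table names. It rests on assumptions that the text does not confirm: a semicolon separates enumerated items, and [001-100] is a zero-padded, inclusive numeric range. The function `expand_table_expression` is hypothetical and does not reproduce Dataphin's matching logic or its regular expression support.

```python
import re
from typing import List

# Assumption: a bracketed range looks like [start-end] with decimal digits.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_table_expression(expression: str) -> List[str]:
    """Expand semicolon-separated items, unrolling one [start-end] range per item."""
    names: List[str] = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue
        match = RANGE_PATTERN.search(item)
        if match is None:
            names.append(item)  # plain enumerated table name
            continue
        start_text, end_text = match.groups()
        width = len(start_text)  # keep the zero padding of the start value
        prefix, suffix = item[:match.start()], item[match.end():]
        for number in range(int(start_text), int(end_text) + 1):
            names.append(prefix + str(number).zfill(width) + suffix)
    return names


if __name__ == "__main__":
    tables = expand_table_expression("table_[001-100];table_102")
    print(len(tables), tables[0], tables[-1])
```

The Confirm Matching Details dialog box remains the authoritative list of matched tables.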
Shard Key
You can use a column of the integer type in the source data table as the shard key. We recommend that you use the primary key or a column with an index as the shard key. When reading data, the system shards the data based on the configured shard key field to implement concurrent reading, which can improve data synchronization efficiency.
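The idea behind a shard key can be illustrated by splitting the key's value range into contiguous chunks that independent readers could query in parallel. This is only a sketch under the assumption of an even range split; the text does not describe how Dataphin actually computes shards, and `split_key_range` is a hypothetical helper.

```python
from typing import List, Tuple


def split_key_range(min_key: int, max_key: int, shard_count: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous inclusive ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    total = max_key - min_key + 1
    shard_count = min(shard_count, total)  # never create empty shards
    base, extra = divmod(total, shard_count)
    ranges: List[Tuple[int, int]] = []
    lower = min_key
    for index in range(shard_count):
        size = base + (1 if index < extra else 0)  # spread the remainder over the first shards
        upper = lower + size - 1
        ranges.append((lower, upper))
        lower = upper + 1
    return ranges


if __name__ == "__main__":
    # Each range could back a query such as: WHERE id BETWEEN lower AND upper
    for lower, upper in split_key_range(1, 10, 3):
        print(lower, upper)
```

An even split works best when key values are distributed roughly uniformly, which is one reason an indexed integer column such as the primary key is a sensible choice.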
Batch Read Count
The number of records to read at a time. When reading data from the source database, you can configure a specific batch read count (such as 1,024 records) instead of reading records one by one. This reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.
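The effect of a batch read count can be sketched with the `fetchmany` method of a Python DB-API cursor. The example uses an in-memory SQLite database purely so that it runs without a Doris cluster; the table name `orders` and the helper `read_in_batches` are hypothetical, and this is not how Dataphin itself reads data.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most batch_size rows until the result set is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in source table with 2,500 rows (SQLite replaces Doris here).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
connection.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 2501)])

cursor = connection.execute("SELECT id FROM orders ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cursor, 1024)]
print(batch_sizes)
```

With a batch read count of 1,024, the 2,500 rows arrive in three fetches instead of 2,500 single-row reads.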
Input Filter
Configure the filtering conditions for data extraction. The configuration instructions are as follows:
Configure a static value to extract the corresponding data, for example, ds=20210101.
Configure a variable parameter to extract a specific part of the data, for example, ds=${bizdate}.
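The difference between a static filter and a variable filter can be sketched with Python's `string.Template`, which happens to use the same ${name} placeholder syntax. Dataphin resolves variables itself at run time; the helper `render_filter` and the way the value of bizdate is supplied here are assumptions made only for illustration.

```python
from string import Template
from typing import Mapping


def render_filter(filter_expression: str, parameters: Mapping[str, str]) -> str:
    """Replace ${name} placeholders with parameter values; raise KeyError if one is missing."""
    return Template(filter_expression).substitute(parameters)


if __name__ == "__main__":
    # A static filter is returned unchanged.
    print(render_filter("ds=20210101", {}))
    # A variable filter is resolved from the supplied parameters.
    print(render_filter("ds=${bizdate}", {"bizdate": "20210101"}))
```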
Output Fields
The Output Fields section displays all fields that match the selected table and filtering conditions. You can perform the following operations:
Field Management: If you do not need to output certain fields to downstream components, you can delete these fields:
Delete a single field: If you need to delete a small number of fields, click the icon in the Operation column to delete unnecessary fields.
Delete multiple fields in batches: If you need to delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields, and then click OK to complete the batch deletion.
Batch Add: Click Batch Add to configure fields in JSON format, TEXT format, or DDL format in batches.
Note: After you complete the batch addition and click OK, the system overwrites the configured field information.
Configure fields in JSON format, for example:
// Example:
[{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]
Note: index indicates the column number of the specified object, name indicates the field name after import, and type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column in the file is imported, the field name is user_id, and the field type is String.
Configure fields in TEXT format, for example:
// Example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
The row delimiter separates the information of each field. The default is a line feed (\n). Line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the field name and the field type. The default is a comma (,), which is the supported delimiter. The field type can be omitted.
Configure fields in DDL format, for example:
CREATE TABLE tablename (
    user_id serial,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(255),
    created_on TIMESTAMP
);
Create an output field: Click + Create Output Field, enter Column, Type, and Description, and select Mapping Type as prompted. After you configure the current row, click the icon to save it.
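The relationship between the TEXT and JSON formats described under Batch Add can be sketched by converting one into the other. This sketch assumes that each TEXT row carries index, name, type, mapType, and comment in that order, mirroring the keys of the JSON example; `text_to_field_list` is a hypothetical helper, not a Dataphin API. Because it splits on the column delimiter, a type that itself contains a comma, such as decimal(10,2), would not parse correctly.

```python
import json
from typing import Dict, List, Union

# Assumption: column order in TEXT rows matches the keys of the JSON example.
FIELD_KEYS = ("index", "name", "type", "mapType", "comment")


def text_to_field_list(
    text: str, row_delimiter: str = "\n", column_delimiter: str = ","
) -> List[Dict[str, Union[int, str]]]:
    """Parse TEXT-format field definitions into a list of JSON-style dictionaries."""
    fields: List[Dict[str, Union[int, str]]] = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue
        # Limit the split so that a comment may itself contain the delimiter.
        parts = [part.strip() for part in row.split(column_delimiter, len(FIELD_KEYS) - 1)]
        field: Dict[str, Union[int, str]] = dict(zip(FIELD_KEYS, parts))
        field["index"] = int(field["index"])
        fields.append(field)
    return fields


if __name__ == "__main__":
    sample = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
    print(json.dumps(text_to_field_list(sample), indent=2))
```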
Click OK to complete the property configuration of the Doris input component.