The OceanBase input component reads data from an OceanBase data source. When you synchronize data from OceanBase to another data source, configure the source information in the OceanBase input component first, and then configure the target data source. This topic describes how to configure the OceanBase input component.
Prerequisites
An OceanBase data source has been created. For more information, see Create OceanBase Data Source.
To configure the OceanBase input component, your account must have read-through permission on the data source. If you do not have the required permission, request it first. For more information, see Request, Renew, and Return Data Source Permissions.
Procedure
Select Development > Data Integration from the top menu bar on the Dataphin home page.
On the Data Integration page, select a project from the top menu bar (in Dev-Prod mode, you must also select the environment).
In the left-side navigation pane, click Batch Pipeline. From the Batch Pipeline list, select the offline pipeline you want to develop to access its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Input. Locate the OceanBase component in the list on the right and drag it onto the canvas.
On the OceanBase component card, click the configuration icon to open the OceanBase Input Configuration dialog box, and then configure the following parameters.
Parameter
Description
Step Name
The name of the OceanBase input component. Dataphin automatically generates a step name, which you can modify to suit your business scenario. The naming conventions are as follows:
Can only contain Chinese characters, letters, underscores (_), and numbers.
Cannot exceed 64 characters.
Datasource
Select the data source. The drop-down list displays all OceanBase data sources in the current Dataphin instance, including those for which you have read-through permission and those for which you do not. Click the copy icon to copy the name of the current data source.
For a data source that you do not have read-through permission for, click Request next to the data source to request the permission. For more information, see Request, Renew, and Return Data Source Permissions.
If no OceanBase data source is available, click Create to create one. For more information, see Create OceanBase Data Source.
Table
Select the source table for data synchronization. You can search by a table name keyword, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the currently selected table.
Shard Key (Optional)
The system shards the data based on the configured shard key field; combined with the concurrency setting, this enables concurrent reading. A column of the source table can be used as the shard key. Using the primary key or an indexed column as the shard key is recommended to ensure transmission performance.
Important: If a date/time column is used as the shard key, the system splits the data by force based on the detected minimum and maximum values, the total time range, and the concurrency. An even distribution across shards is not guaranteed.
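For illustration only, range-based sharding on a numeric shard key roughly corresponds to one range query per concurrent reader, as in the following sketch. The table name orders, the shard key id, and the boundary values are assumptions, not the SQL that Dataphin actually generates:
```sql
-- Assume MIN(id) = 1, MAX(id) = 4000 and a concurrency of 4:
-- each concurrent reader scans one contiguous id range.
SELECT * FROM orders WHERE id >= 1    AND id < 1001;   -- reader 1
SELECT * FROM orders WHERE id >= 1001 AND id < 2001;   -- reader 2
SELECT * FROM orders WHERE id >= 2001 AND id < 3001;   -- reader 3
SELECT * FROM orders WHERE id >= 3001 AND id <= 4000;  -- reader 4
```
An indexed shard key keeps each of these range scans efficient, which is why the primary key or an indexed column is recommended.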
Batch Read Count (Optional)
The number of rows read per batch. When reading from the source database, configure a batch read count (for example, 1024 rows) instead of reading rows one by one to reduce the number of interactions with the data source, improve I/O efficiency, and reduce network latency.
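The effect can be pictured with the following conceptual sketch. The table named orders with contiguous, indexed id values is an assumption, and the actual reader does not necessarily issue paging statements like these:
```sql
-- With a batch read count of 1024, rows come back in chunks of up to
-- 1024 per fetch instead of one row per round trip.
SELECT * FROM orders WHERE id > 0    ORDER BY id LIMIT 1024;  -- batch 1
SELECT * FROM orders WHERE id > 1024 ORDER BY id LIMIT 1024;  -- batch 2
-- ... and so on: reading 1,000,000 rows takes on the order of 1,000
-- fetches rather than 1,000,000 round trips.
```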
Input Filter (Optional)
Configure the filter conditions for extracting data. The configuration instructions are as follows:
Configure a static field: extracts the corresponding data, for example, ds=20210101.
Configure a variable parameter: extracts a portion of the data, for example, ds=${bizdate}.
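For reference, the filter condition is typically applied as a WHERE clause when the data is extracted. The following is a minimal sketch; the table name orders and the exact SQL that Dataphin generates are assumptions:
```sql
-- Static field: only rows where ds = '20210101' are read.
SELECT * FROM orders WHERE ds = '20210101';

-- Variable parameter: ${bizdate} is replaced at run time with the
-- business date of the scheduled instance, for example '20240601'.
SELECT * FROM orders WHERE ds = '${bizdate}';
```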
Output Fields
The output fields area displays all fields returned by the selected table and filter conditions. You can perform the following operations:
Field Management: If you do not need to output certain fields to downstream components, you can delete the corresponding fields:
Single field deletion: To remove a small number of fields, click the delete icon in the Actions column of each field that you do not need.
Batch field deletion: To remove many fields, click Field Management, select multiple fields in the Field Management dialog box, click the left-shift icon to move them from the selected input fields to the unselected input fields, and then click Confirm to complete the batch deletion.
Batch addition: Click Batch Addition to configure fields in batches in JSON, TEXT, or DDL format.
Batch configuration in JSON format, for example:
[{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]
Note: index indicates the column number of the source column, name indicates the field name after import, and type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column of the file is imported, with the field name user_id and the field type String.
Batch configuration in TEXT format, for example:
0,id,int(10),Long,comment1
1,user_name,varchar(255),String,comment2
The row delimiter separates the information of each field. The default is the line feed (\n); line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the attributes of each field, such as the field name and field type. The default is a comma (,).
Batch configuration in DDL format, for example:
CREATE TABLE tablename (
    user_id serial,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(255),
    created_on TIMESTAMP
);
Create an output field: Click +Create Output Field, and then fill in Column, Type, and Comment and select a Mapping Type as prompted on the page.
Note: Field types in pipeline development are mapped uniformly by the Dataphin system based on the source fields. For manually added fields, set the field type in the format Type(Length), for example, int(10).
If the selected data source has network connectivity with the Dataphin cluster, the fields in the output fields area can only be deleted and cannot be edited.
Click Confirm to complete the property configuration of the OceanBase input component.