
Dataphin: Configure the Oracle Input Component

Last Updated: Mar 05, 2026

The Oracle input component retrieves data from an Oracle data source. When you sync data from an Oracle data source to another data source, configure the Oracle input component first to specify the source data source and then configure the destination data source. This topic describes how to configure the Oracle input component.

Prerequisites

  • You have created an Oracle data source. For more information, see Create an Oracle Data Source.

  • The account used to configure the Oracle input component must have sync-read permission on the data source. If the account does not have this permission, request it. For more information, see Request Data Source Permissions.

Procedure

  1. On the Dataphin homepage, in the top menu bar, click Develop, and then click Data Integration.

  2. On the Data Integration page, in the top menu bar, select a Project. If you are using Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop. The offline pipeline configuration page opens.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, click Input. In the input component list on the right, locate the Oracle component and drag it onto the canvas.

  6. On the Oracle input component card, click the configuration icon to open the Oracle Input Configuration dialog box.

  7. In the Oracle Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the Oracle input component. Dataphin generates a step name automatically. You can change it based on your business scenario. Naming rules:

    • Use only Chinese characters, letters, underscores (_), and digits.

    • Keep the name up to 64 characters long.

    Datasource

    The drop-down list shows all Oracle data sources in Dataphin, including those for which you have sync-read permission and those for which you do not. Click the copy icon to copy the current data source name.

    • If you do not have sync-read permission for a data source, click Request next to the data source to request sync-read permission. For more information, see Request Data Source Permissions.

    • If you do not have an Oracle data source, click Create Data Source to create one. For more information, see Create an Oracle Data Source.

    Time Zone

    Dataphin processes time-formatted data based on the current time zone. By default, Dataphin uses the time zone configured for the selected data source. You cannot change this setting.

    Note

    For tasks created before version V5.1.2, you can choose Data Source Default Configuration or Channel Configuration Time Zone. The default option is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured for the current integration task in Properties > Channel Configuration.

    Schema (optional)

    Select the schema where the table resides. This supports cross-schema table selection. If you do not specify a schema, Dataphin uses the schema configured for the data source.

    Source Table Count

    Select the number of source tables. Options are Single Table and Multiple Tables:

    • Single Table: Use this option when you sync business data from one source table to one destination table.

    • Multiple Tables: Use this option when you sync business data from multiple source tables to one destination table. When writing data from multiple tables into one destination table, Dataphin uses the union algorithm.

    Table Matching Method

    Choose Generic Rule or Database Regex.

    Note

    This parameter is available only when you select Multiple Tables for Source Table Count.

    Table

    Select the source table:

    • If you select Single Table for Source Table Count, search for the table by entering a keyword, or enter the exact table name and click Exact Match. After you select a table, Dataphin automatically checks its status. Click the copy icon to copy the selected table name.

    • If you select Multiple Tables for Source Table Count, enter an expression based on the table matching method.

      • If you select Generic Rule for table matching: Enter an expression in the field to filter tables with the same structure. Dataphin supports enumeration, regex-like patterns, and mixed formats. For example: table_[001-100];table_102;.

      • If you select Database Regex for table matching: Enter a regex pattern supported by your database. Dataphin matches tables in the destination database using this pattern. During task runtime, Dataphin dynamically matches new tables based on the regex.

      After you enter the expression, click Exact Match. In the Confirm Match Details dialog box, view the list of matched tables.
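To make the Generic Rule behavior concrete, the following sketch expands an enumeration/range expression into the table names it covers. The expansion semantics assumed here (semicolon-separated entries, zero-padded `[A-B]` numeric ranges) are an illustration based on the example above, not Dataphin's actual parser:

```python
import re

def expand_generic_rule(expression):
    """Expand an expression such as 'table_[001-100];table_102' into table names.
    Assumed semantics: ';' separates entries; '[A-B]' is a zero-padded range."""
    tables = []
    for entry in filter(None, expression.split(";")):
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", entry)
        if match:
            prefix, start, end, suffix = match.groups()
            width = len(start)  # preserve zero padding, e.g. '001'
            for n in range(int(start), int(end) + 1):
                tables.append(f"{prefix}{str(n).zfill(width)}{suffix}")
        else:
            tables.append(entry)  # plain enumeration entry
    return tables

print(expand_generic_rule("table_[001-003];table_102;"))
# ['table_001', 'table_002', 'table_003', 'table_102']
```

A Database Regex expression, by contrast, is passed to the database itself and re-evaluated at runtime, which is why it can pick up newly created tables.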

    Split Key (optional)

    Dataphin splits data based on the split key column you specify. Use this with concurrency settings to read data concurrently. You can use any column from the source table as the split key. For best performance, use a primary key or indexed column.

    Important

    If you select a date-time type, Dataphin performs brute-force splitting across the full time range based on the maximum and minimum values. This method does not guarantee even distribution.
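To see why splitting on min/max does not guarantee an even distribution, here is a sketch of range-based splitting on a numeric key. The function and algorithm are illustrative assumptions, not Dataphin's internal implementation: the key space is divided evenly, but if the rows cluster in part of that space, some readers get far more rows than others.

```python
def split_ranges(min_val, max_val, concurrency):
    """Split the closed interval [min_val, max_val] into `concurrency`
    contiguous ranges, one per concurrent reader (illustrative sketch)."""
    span = max_val - min_val + 1
    step = span // concurrency
    ranges = []
    lo = min_val
    for i in range(concurrency):
        # Last range absorbs any remainder so the full interval is covered.
        hi = max_val if i == concurrency - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

print(split_ranges(1, 100, 4))
# [(1, 25), (26, 50), (51, 75), (76, 100)]
```

The ranges are even in key space, not in row count, which is why a date-time key whose rows cluster in a narrow time window can split unevenly.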

    Batch Read Size (optional)

    The number of records to read at a time. Configure a batch size such as 1024 records instead of reading one record at a time. This reduces interactions with the source database, improves I/O efficiency, and lowers network latency.
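The effect of batched reads can be sketched with the Python DB-API's `fetchmany`, using `sqlite3` as a stand-in for an Oracle connection (an assumption made for portability; the batching principle is the same):

```python
import sqlite3

# Build a small in-memory table to read from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(i * 1.5,) for i in range(5000)])

BATCH_SIZE = 1024  # analogous to the component's Batch Read Size
cursor = conn.execute("SELECT id, amount FROM orders")
total = 0
while True:
    # One call pulls up to BATCH_SIZE rows, instead of one round trip per row.
    rows = cursor.fetchmany(BATCH_SIZE)
    if not rows:
        break
    total += len(rows)  # a real pipeline would hand the batch downstream here
print(total)  # 5000
```

Fewer, larger fetches mean fewer interactions with the source database, which is where the I/O and latency savings come from.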

    Codec (optional)

    Select the codec used to read data. Supported codecs: UTF-8, GBK, and ISO-8859-1.

    Input Filter (optional)

    Set conditions to filter the data you extract. Configuration details:

    • Static value: Extract specific data. For example: ds=20211111.

    • Variable parameter: Extract part of the data. For example: ds=${bizdate}.
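Variable parameters such as `${bizdate}` are resolved to concrete values before extraction. A minimal sketch of that substitution, assuming `${var}`-style placeholders (the helper `render_filter` is hypothetical, not a Dataphin API):

```python
from string import Template

def render_filter(condition, params):
    """Substitute ${var} placeholders in an input-filter condition.
    string.Template uses the same ${var} syntax as the example above;
    safe_substitute leaves unknown placeholders untouched."""
    return Template(condition).safe_substitute(params)

print(render_filter("ds=${bizdate}", {"bizdate": "20211111"}))  # ds=20211111
```

At runtime the resolved condition restricts which rows the component extracts, much like a WHERE clause.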

    Output Fields

    This section lists all fields from the selected table, after your input filter is applied. You can perform the following actions:

    • Field management: Remove fields you do not need to pass to downstream components:

      • Remove individual fields: Click the delete icon in the Actions column to remove fields you do not need.

      • Remove fields in batches: To remove many fields at once, click Field Management. In the Field Management dialog box, select the fields, click the left-shift icon to move them from the selected list to the unselected list, and then click OK.


    • Batch add: Click Batch Add to configure output fields in JSON, TEXT, or DDL format.

      Note

      After you click OK, the batch configuration overwrites existing field configurations.

      • JSON format example:

        // Example:
          [{
             "index": 1,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 2,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        index is the column number. name is the field name after import. type is the field type after import. For example, "index":3,"name":"user_id","type":"String" imports column 4 from the file as user_id with type String.

      • Batch configuration in TEXT format, for example:

        // Example:
        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2
        • The row delimiter separates field entries. The default is a line feed (\n). You can also use a semicolon (;) or a period (.).

        • The column delimiter separates the attributes of each entry (index, name, type, and so on). The default is a comma (,). The field type is optional.

      • DDL format example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR(255),
        	created_on TIMESTAMP
        );
    • Create a new output field: Click + Create Output Field. Enter the Column, Type, and Comment, select a Mapping Type, and then click the save icon to save the row.
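The TEXT and JSON batch formats above describe the same field list. The following sketch converts TEXT rows into the JSON structure, assuming the default delimiters (the helper `text_to_fields` is hypothetical, not a Dataphin API):

```python
import json

def text_to_fields(text, row_delim="\n", col_delim=","):
    """Convert the TEXT batch format (index,name,type,mapType,comment per row)
    into the JSON field structure. Sketch only: Dataphin's own parser may
    handle missing attributes differently."""
    fields = []
    for row in filter(None, (r.strip() for r in text.split(row_delim))):
        parts = row.split(col_delim)
        field = {"index": int(parts[0]), "name": parts[1]}
        if len(parts) > 2:
            field["type"] = parts[2]       # field type is optional
        if len(parts) > 3:
            field["mapType"] = parts[3]
        if len(parts) > 4:
            field["comment"] = parts[4]
        fields.append(field)
    return fields

text = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
print(json.dumps(text_to_fields(text), indent=2))
```

Because clicking OK overwrites the existing field configuration, it can be worth validating a generated field list like this before pasting it into the dialog box.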

  8. Click OK to complete the configuration of the Oracle input component.