The Easysearch input component reads data from an Easysearch data source. To sync data from an Easysearch data source to another data source, you must first configure the Easysearch input component to read the source data. This topic describes how to configure the Easysearch input component.
Prerequisites
An Easysearch data source has been created. For more information, see Create an Easysearch data source.
The account that you use to configure the Easysearch input component must have read-through permission for the data source. If your account does not have the required permission, you must request it. For more information, see Request data source permissions.
Procedure
In the top menu bar of the Dataphin home page, choose Develop > Data Integration.
On the Data Integration page, select a Project from the top menu bar. In Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to configure to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, choose Input. Find the Easysearch component in the list of input components and drag it to the canvas.
On the Easysearch input component, click the icon to open the Easysearch Input Configuration dialog box.
In the Easysearch Input Configuration dialog box, configure the parameters.
Parameter
Description
Basic Configuration
Step Name
The name of the Easysearch input component. Dataphin automatically generates a step name. You can also change it as needed. The naming convention is as follows:
Can contain only Chinese characters, letters, underscores (_), and digits.
Cannot exceed 64 characters in length.
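The naming convention above can be checked before a step name is submitted. The following is a minimal sketch, not Dataphin's own validation logic. It assumes that "Chinese characters" means the CJK Unified Ideographs block, that "letters" means ASCII letters, and that an empty name is invalid. The function name `is_valid_step_name` is hypothetical.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); the product may accept a wider range.
# Assumption: names must contain at least one character.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is at most 64 long."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["easysearch_input_1", "bad-name", "a" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```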
Datasource
The drop-down list displays all Easysearch data sources and project levels in the current Dataphin instance, including data sources for which you do not have read-through permission. Click the icon to copy the current data source name.
For a data source without read-through permission, click Request next to the data source to request the permission. For more information, see Request data source permissions.
If you do not have an Easysearch data source, click Create to create one. For more information, see Create an Easysearch data source.
Index Document
The name of the index in Easysearch. Click the icon to copy the name of the selected index document.
Query Condition
The query parameter for Easysearch. Use it for full or incremental queries. For example, {"match_all": {}} performs a full query.
Cursor Time
The duration to keep the scroll context alive. This is the paging parameter for Easysearch.
If this value is too small and the idle time between two data fetches exceeds the scroll time, the scroll context expires. This can cause data loss.
If this value is too large and too many queries are initiated at the same time, the number of open scroll contexts may exceed the max_open_scroll_context setting on the server. This causes query errors.
For example, `5m` specifies a 5-minute scroll time.
Unit: d (days), h (hours), m (minutes), s (seconds), ms (milliseconds), micros (microseconds), nanos (nanoseconds).
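To reason about whether a Cursor Time value is long enough, it can help to convert it to a single unit. The following is a minimal sketch that parses a value such as `5m` into milliseconds using the units listed above. It assumes the value is a non-negative integer immediately followed by one unit. The function names and the idle-time check are hypothetical helpers, not part of Dataphin or Easysearch.

```python
import re

# Nanoseconds per unit, for the units listed in the Cursor Time description.
_UNIT_TO_NANOS = {
    "d": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "micros": 10**3,
    "nanos": 1,
}

# Assumption: an integer amount directly followed by a single unit, e.g. "5m".
_SCROLL_TIME = re.compile(r"(\d+)(d|h|ms|micros|nanos|m|s)")


def scroll_time_to_millis(value: str) -> float:
    """Convert a scroll time string such as '5m' to milliseconds."""
    match = _SCROLL_TIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized scroll time: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_NANOS[unit] / 10**6


def covers_idle_time(scroll_time: str, max_idle_millis: float) -> bool:
    """Return True if the scroll context outlives the longest expected idle gap."""
    return scroll_time_to_millis(scroll_time) > max_idle_millis


if __name__ == "__main__":
    print(scroll_time_to_millis("5m"))
    print(covers_idle_time("30s", max_idle_millis=45_000))
```

A check like `covers_idle_time` only addresses the "too small" case. Whether a value is too large depends on how many scroll contexts are open on the server at once, which this sketch does not model.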
Advanced Configuration
Batch Size
The number of records to read at a time. The default value is 1024. Reading records in batches rather than one at a time reduces the number of interactions with the data source, which improves I/O efficiency and reduces network overhead.
Connection Timeout
The client connection timeout. The default value is 60000 milliseconds.
Read Timeout
The client read timeout. The default value is 60000 milliseconds.
Date Format
If a field to be synced is of the date type and its mapping does not have a format configuration, you must configure the dateFormat parameter. The default format in Easysearch is yyyy-MM-dd'T'HH:mm:ssZ.
Output Fields
Displays the output fields.
Add fields in batch.
Click Batch Add.
Configure in JSON format. The following is an example:
[{"name":"col_integer","type":"integer"}, {"name":"col_long","type":"long"}, {"name":"col_double","type":"double"}]
Note
`name` is the name of the imported field, and `type` is the data type of the field after import. For example, "name":"user_id","type":"String" imports the field named `user_id` and sets its data type to String.
Configure in TEXT format. The following is an example:
col_long,long
col_double,double
The row delimiter separates the information for each field. The default delimiter is a line feed (\n). Semicolons (;) and periods (.) are also supported.
The column delimiter separates the field name from the field type. The default delimiter is a comma (,).
Click OK.
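The two batch formats carry the same information, a list of field name and type pairs. The following is a minimal sketch of how each format could be read into that shape, for example to check a definition before pasting it into the dialog box. It is an illustration only, not the parser Dataphin uses. The function names are hypothetical, and the sketch assumes that a single row delimiter is chosen for the whole TEXT input.

```python
import json


def parse_fields_json(text: str) -> list[tuple[str, str]]:
    """Parse the JSON format: a list of objects with 'name' and 'type' keys."""
    return [(field["name"], field["type"]) for field in json.loads(text)]


def parse_fields_text(
    text: str, row_delim: str = "\n", col_delim: str = ","
) -> list[tuple[str, str]]:
    """Parse the TEXT format: one 'name<col_delim>type' entry per row."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip blank rows, such as a trailing line feed
        name, sep, type_ = row.partition(col_delim)
        if not sep:
            raise ValueError(f"missing column delimiter in row: {row!r}")
        fields.append((name.strip(), type_.strip()))
    return fields


if __name__ == "__main__":
    print(parse_fields_json('[{"name":"col_long","type":"long"}]'))
    print(parse_fields_text("col_long,long\ncol_double,double"))
    print(parse_fields_text("col_long,long;col_double,double", row_delim=";"))
```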
Create an output field.
Click Create Output Field, then enter a Column and select a Type as prompted.
Manage output fields.
You can perform the following operations on added fields:
Click and drag the move icon next to Column to change the position of the field.
Click the Edit icon in the Actions column to edit an existing field.
Click the delete icon in the Actions column to delete an existing field.
Click Confirm to complete the property configuration for the Easysearch input component.
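As an illustration of the Query Condition parameter described in the table, the following sketch builds a full query and an incremental query as JSON. It assumes that Easysearch accepts an Elasticsearch-style range query in this parameter, which this topic does not confirm. The field name `update_time`, the timestamp format, and the function names are hypothetical.

```python
import json
from datetime import datetime

# Assumption: the index stores timestamps in this format, without a time zone.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def full_query() -> dict:
    """Query condition that matches every document (the example from this topic)."""
    return {"match_all": {}}


def incremental_query(field: str, start: datetime, end: datetime) -> dict:
    """Query condition that matches documents with start <= field < end."""
    return {
        "range": {
            field: {
                "gte": start.strftime(TIME_FORMAT),
                "lt": end.strftime(TIME_FORMAT),
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(full_query()))
    # 'update_time' is a hypothetical field name used only for illustration.
    window = incremental_query(
        "update_time", datetime(2024, 1, 1), datetime(2024, 1, 2)
    )
    print(json.dumps(window))
```

A half-open window (inclusive start, exclusive end) is used so that consecutive runs do not read the same document twice.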