This topic describes how to configure the Elasticsearch output component.
Prerequisites
An Elasticsearch data source is created. For more information, see Create an Elasticsearch Data Source.
To configure the Elasticsearch output component properties, your account must have write-through permission for the data source. If you do not have this permission, request it first. For more information, see Request Data Source Permission.
Procedure
In the top menu bar of the Dataphin home page, select Development > Data Integration.
In the top menu bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
Click Component Library in the upper right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Output, find the Elasticsearch component in the output component list on the right, and drag it to the canvas.
Click and drag the icon of the target input component to connect it to the current Elasticsearch output component.
Click the icon in the Elasticsearch output component card to open the Elasticsearch Output Configuration dialog box.
In the Elasticsearch Output Configuration dialog box, configure parameters.
Parameter
Description
Step Name
The name of the Elasticsearch output component. Dataphin automatically generates the step name, which you can modify according to your business scenario. The naming convention is as follows:
Contains only Chinese characters, uppercase and lowercase English letters, underscores (_), and numbers.
Does not exceed 64 characters in length.
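The naming convention above can be checked before a step name is entered. The following is a minimal sketch, not Dataphin's own validation: the function name is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs block, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's actual check may differ.
# Assumption: an empty name is rejected (the source only states the upper limit).
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name uses only allowed characters and has 1 to 64 characters."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["es_output_1", "写入订单索引", "bad-name", "x" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```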
Datasource
The data source drop-down list displays all Elasticsearch data sources, including those for which you have write-through permission and those for which you do not.
If you do not have write-through permission for a data source, click Request next to the data source to apply for the permission. For more information, see Request, Renew, and Return Data Source Permission.
If you do not have an Elasticsearch-type data source, click Create to create a data source. For more information, see Create an Elasticsearch Data Source.
Query Type
You can select the index document to write based on Index or Alias. Different query types require different configuration information.
Important: When you select Alias, only aliases that point to a single index and aliases that have is_write_index set are supported. Otherwise, the write fails.
Index:
Index Document: The index name in Elasticsearch.
Index Document Type: The type name of the index in Elasticsearch.
Note: Index Document and Index Document Type are required for Elasticsearch 6.x and 7.x, but not for Elasticsearch 8.x.
Alias:
Index Alias: The index alias in Elasticsearch.
Index Document Type: The type name of the index in Elasticsearch.
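Whether an alias meets the write restriction described for the Alias query type can be checked ahead of time. The sketch below is only an illustration: alias_is_writable is a hypothetical helper, and it assumes you have already fetched and parsed the response of the Elasticsearch GET /_alias/<alias> request, which maps each index name to its aliases and their properties.

```python
def alias_is_writable(alias_response: dict, alias: str) -> bool:
    """Check a parsed GET /_alias/<alias> response against the write rule."""
    # Collect the alias properties for every index that carries this alias.
    targets = {
        index: info["aliases"][alias]
        for index, info in alias_response.items()
        if alias in info.get("aliases", {})
    }
    if len(targets) == 1:
        # A single index is writable unless it is explicitly marked
        # is_write_index: false.
        props = next(iter(targets.values()))
        return props.get("is_write_index") is not False
    # With several indices (or none), exactly one must be the write index.
    write_indices = [
        index for index, props in targets.items()
        if props.get("is_write_index") is True
    ]
    return len(write_indices) == 1


if __name__ == "__main__":
    # Illustrative response: the alias and index names are made up.
    response = {
        "orders-000001": {"aliases": {"orders": {}}},
        "orders-000002": {"aliases": {"orders": {"is_write_index": True}}},
    }
    print(alias_is_writable(response, "orders"))
```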
Field Separator
Optional. Enter the separator between fields. If you leave this blank, a comma (,) is used as the separator.
Loading Policy
Select the policy for writing data to the target table. Loading Policy includes:
Overwrite Data means overwriting historical data in the target table based on the current source table.
Append Data means appending data to the existing data in the target table without modifying historical data.
Note: When Query Type is set to Alias, Loading Policy can only be set to Append Data.
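The constraint between Query Type and Loading Policy can be expressed as a small pre-check, for example when component settings are prepared in a script before they are entered. This is a hypothetical sketch: the function name is invented, and the string values simply mirror the labels in the dialog box.

```python
QUERY_TYPES = {"Index", "Alias"}
LOADING_POLICIES = {"Overwrite Data", "Append Data"}


def validate_loading_policy(query_type: str, loading_policy: str) -> None:
    """Raise ValueError if the combination is not allowed by the rules above."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type!r}")
    if loading_policy not in LOADING_POLICIES:
        raise ValueError(f"unknown loading policy: {loading_policy!r}")
    # An alias target only supports appending data.
    if query_type == "Alias" and loading_policy != "Append Data":
        raise ValueError("Alias only supports the Append Data loading policy")


if __name__ == "__main__":
    validate_loading_policy("Index", "Overwrite Data")  # allowed
    validate_loading_policy("Alias", "Append Data")     # allowed
```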
Input Fields
Displays input fields based on the upstream output.
Output Fields
Displays output fields.
Get Field Information.
When Query Type is set to Index, you can click Get Field Information to obtain the field information of the selected Index.
Batch Add Fields.
Click Batch Add.
Configure in JSON format in batches. The example is as follows:
[{"name":"col_integer","type":"integer"}, {"name":"col_long","type":"long"}, {"name":"col_double","type":"double"}]
Note: name indicates the name of the imported field, and type indicates the type of the imported field. For example, "name":"user_id","type":"String" imports the field named user_id and sets the field type to String.
Configure in TEXT format in batches. The example is as follows:
col_long,long
col_double,double
The row delimiter separates the entries for each field. The default is a line feed (\n). Line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the field name from the field type. The default is a comma (,).
Click Confirm.
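To make the two batch formats concrete, the sketch below parses each of them into the same list of (name, type) pairs. It illustrates the formats only and is not Dataphin's parser; the function names are hypothetical, and skipping blank rows is an assumption.

```python
import json


def parse_fields_json(text: str) -> list:
    """Parse the JSON batch format into a list of (name, type) tuples."""
    return [(item["name"], item["type"]) for item in json.loads(text)]


def parse_fields_text(text: str, row_delimiter: str = "\n",
                      column_delimiter: str = ",") -> list:
    """Parse the TEXT batch format into a list of (name, type) tuples."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # assumption: blank rows are ignored
        # A row without the column delimiter raises ValueError here.
        name, field_type = (part.strip() for part in row.split(column_delimiter, 1))
        fields.append((name, field_type))
    return fields


if __name__ == "__main__":
    print(parse_fields_json('[{"name":"col_long","type":"long"}]'))
    print(parse_fields_text("col_long,long\ncol_double,double"))
    print(parse_fields_text("col_long,long;col_double,double", row_delimiter=";"))
```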
Create Output Field.
Click Create Output Field, and fill in Column and select Type as prompted on the page.
Copy Upstream Fields.
Reference upstream input fields as output fields.
Manage Output Fields.
You can perform the following operations on the added fields:
Drag the shift icon next to a field in the Column column to change the position of the field.
Click the edit icon in the Operation column to edit an existing field.
Click the delete icon in the Operation column to remove an existing field.
Mapping
Mapping is used to map the input fields of the source table to the output fields of the target table, facilitating subsequent data synchronization. Mapping includes same-name mapping and same-row mapping. The applicable scenarios are described as follows:
Same-name Mapping: Maps fields with the same field name.
Same-row Mapping: Maps fields by row position. Use this when the field names of the source table and the target table differ but the fields in corresponding rows must be mapped. Only fields in the same row are mapped.
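The difference between the two mapping modes can be sketched as follows. The function names are hypothetical, and two details are assumptions: name comparison is case-sensitive, and fields without a counterpart are left unmapped.

```python
def same_name_mapping(input_fields, output_fields):
    """Pair each input field with the output field that has the same name."""
    output_names = set(output_fields)
    return [(name, name) for name in input_fields if name in output_names]


def same_row_mapping(input_fields, output_fields):
    """Pair fields by row position; extra fields on the longer side stay unmapped."""
    return list(zip(input_fields, output_fields))


if __name__ == "__main__":
    inputs = ["id", "name", "ts"]
    outputs = ["id", "title", "ts"]
    print(same_name_mapping(inputs, outputs))
    print(same_row_mapping(inputs, outputs))
```

In this example, same-name mapping skips name and title because the names differ, while same-row mapping pairs them because they occupy the same row.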
Index Schema
Note: This item needs to be configured only when Query Type is set to Index and Loading Policy is set to Overwrite Data.
You can select System Default or Reuse Online.
Reuse Online: Reuse the existing Elasticsearch index schema each time the index is rebuilt.
System Default: Automatically generate the index schema based on the output fields configured in the Elasticsearch output component each time the index is rebuilt.
Click Confirm to complete the property configuration of the Elasticsearch output component.
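For the System Default option of Index Schema, the schema is generated from the configured output fields. How Dataphin builds it is not documented here, so the following is only a sketch of what such a generated schema might look like. It assumes the typeless mapping layout used by Elasticsearch 7.x and 8.x; build_index_body is a hypothetical helper.

```python
import json


def build_index_body(output_fields):
    """Build a create-index request body from (name, type) pairs."""
    properties = {name: {"type": field_type} for name, field_type in output_fields}
    # Assumption: typeless mappings, as in Elasticsearch 7.x and 8.x.
    # Elasticsearch 6.x nests the properties under a document type name.
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    fields = [("col_integer", "integer"), ("col_long", "long"), ("col_double", "double")]
    print(json.dumps(build_index_body(fields), indent=2))
```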