
Dataphin:Configure Elasticsearch Input Components

Last Updated:May 28, 2025

The Elasticsearch input component is designed to read data from Elasticsearch data sources. When synchronizing data from Elasticsearch to other data sources, it's essential to configure the Elasticsearch input component first, followed by the target data source for synchronization. This guide describes the configuration process for the Elasticsearch input component.

Prerequisites

  • An Elasticsearch data source has been created. For more information, see Create Elasticsearch Data Source.

  • The account used to configure the Elasticsearch input component must have read permission on the data source. If you do not have this permission, request it first. For more information, see Request Data Source Permission.

Procedure

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the integration page, select a project (in Dev-Prod mode, you must also select an environment).

  3. Click Batch Pipeline in the left-side navigation pane. Then, in the Batch Pipeline list, click the offline pipeline to open its configuration page.

  4. Click Component Library in the upper-right corner to open the Component Library panel.

  5. In the Component Library panel's left-side navigation pane, select Input. Find the Elasticsearch component in the list on the right and drag it to the canvas.

  6. Click the configuration icon on the Elasticsearch input component card to open the Elasticsearch Input Configuration dialog box.

  7. Configure parameters in the Elasticsearch Input Configuration dialog box.

    Basic Configuration

    Step Name

    The name of the Elasticsearch input component. Dataphin generates the step name automatically; you can modify it to suit your business scenario. The naming conventions are as follows:

    • The name can contain only Chinese characters, letters, underscores (_), and digits.

    • The name cannot exceed 64 characters.

    Datasource

    The drop-down list displays all Elasticsearch data sources in the current Dataphin instance, together with the project each belongs to and whether you have read permission on it. Click the copy icon to copy the name of the current data source.

    • For a data source on which you do not have read permission, click Request next to the data source to request read permission. For more information, see Request Data Source Permission.

    • If no Elasticsearch data source exists, click Create to create one. For more information, see Create Elasticsearch Data Source.

    Query Type

    Select whether to read index documents by index or by index alias. The two query types require different parameters.

    • Index.

      • Index Document: The index name in Elasticsearch. Click the copy icon to copy the name of the currently selected index document.

      • Index Document Type: The type name of the index in Elasticsearch.

        Note

        Index Document and Index Document Type are required in Elasticsearch 6.x and Elasticsearch 7.x versions, and optional in Elasticsearch 8.x version.

    • Index Alias.

      • Index Alias: The alias of the index in Elasticsearch.

      • Index Document Type: The type name of the index in Elasticsearch.

    Query Conditions

    The query parameter of Elasticsearch, used for full or incremental queries. For example, { "match_all": {}} indicates a full query.
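As an illustration, the same Query Conditions field can hold either a full query or, for incremental synchronization, a range query on an update-time field. The field name gmt_modified below is an assumed example, not part of Dataphin:

```python
import json

# Full query: match every document in the index.
full_query = {"match_all": {}}

# Incremental query on an assumed update-time field "gmt_modified":
# read only documents modified on or after the given timestamp.
incremental_query = {
    "range": {
        "gmt_modified": {"gte": "2025-05-01T00:00:00Z"}
    }
}

# The component expects the JSON text of the query object.
print(json.dumps(full_query))  # {"match_all": {}}
print(json.dumps(incremental_query))
```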

    Cursor Time

    Specify the cursor retention time, which is the scroll paging parameter of Elasticsearch. For example, 5m represents a cursor time of 5 minutes.

    • If the value is too small and the idle time between retrieving two pages of data exceeds the scroll time, the cursor expires and data is lost.

    • If the value is too large and too many concurrent queries exceed the server-side max_open_scroll_context setting, the data query fails with an error.

    Units: days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (micros), and nanoseconds (nanos).
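The cursor time is written as a number followed by one of the units above. A minimal sketch of converting such a value to seconds (the helper below is illustrative, not part of Dataphin):

```python
import re

# Seconds per supported unit.
_UNITS = {
    "d": 86400, "h": 3600, "m": 60, "s": 1,
    "ms": 1e-3, "micros": 1e-6, "nanos": 1e-9,
}

def scroll_to_seconds(value: str) -> float:
    """Convert a cursor-time string such as '5m' into seconds."""
    match = re.fullmatch(r"(\d+)(d|h|m|s|ms|micros|nanos)", value)
    if not match:
        raise ValueError(f"invalid cursor time: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]

print(scroll_to_seconds("5m"))  # 300
```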

    Advanced Configuration

    Batch Read Count

    The number of records read at a time. Default: 1024. Reading data from the source in batches instead of record by record reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.

    Connection Timeout

    The client connection timeout, default is 6000 seconds.

    Management Timeout

    The client read timeout, default is 6000 seconds.

    Date Format

    If a synchronized field has a date type and the field's mapping does not specify a format, you must configure the dateFormat parameter. The default format in Elasticsearch is yyyy-MM-dd'T'HH:mm:ssZ.
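The pattern above uses Java date-format letters. Its rough Python equivalent is "%Y-%m-%dT%H:%M:%S%z", which you can use to check whether your data matches the pattern (the sample timestamp below is illustrative):

```python
from datetime import datetime

# Java pattern yyyy-MM-dd'T'HH:mm:ssZ roughly corresponds to
# Python's "%Y-%m-%dT%H:%M:%S%z".
sample = "2025-05-28T10:30:00+0000"
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S%z")
print(parsed.isoformat())  # 2025-05-28T10:30:00+00:00
```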

    Output Fields

    Displays and manages the output fields of the component. The following operations are supported:

    • Retrieve Field Information.

      When the query type is Index, you can click Retrieve Field Information to obtain the field information of the selected Index.

    • Batch Add Fields.

      1. Click Batch Add.

        • Configure in JSON format in batches. The following sample code provides an example:

          [{"name":"col_integer","type":"integer"},
           {"name":"col_long","type":"long"},
           {"name":"col_double","type":"double"}]
          Note

          name indicates the name of the field to be introduced, and type indicates the type of the field after introduction. For example: "name":"user_id","type":"String" indicates that the field named user_id is introduced and the field type is set to String.

        • Configure in TEXT format in batches. The following sample code provides an example:

          col_long,long
          col_double,double
          • The row delimiter is used to separate each field's information. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).

          • The column delimiter is used to separate the field name and field type. The default is a comma (,).

      2. Click Confirm.

    • Create New Output Field.

      Click Create New Output Field, and fill in Column and select Type according to the page prompts.

    • Manage Output Fields.

      You can perform the following operations on the added fields:

      • Drag the move icon next to a field's Column to change the position of the field.

      • Click the edit icon in the Operation column to edit an existing field.

      • Click the delete icon in the Operation column to delete an existing field.
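The JSON and TEXT batch formats describe the same field list. As a sketch, the converter below (illustrative, not part of Dataphin) turns TEXT input with the default delimiters into the equivalent JSON form:

```python
import json

def text_fields_to_json(text: str, row_sep: str = "\n", col_sep: str = ",") -> str:
    """Convert TEXT-format field definitions into the JSON-format list."""
    fields = []
    for row in text.split(row_sep):
        row = row.strip()
        if not row:
            continue
        name, field_type = (part.strip() for part in row.split(col_sep))
        fields.append({"name": name, "type": field_type})
    return json.dumps(fields)

text = "col_long,long\ncol_double,double"
print(text_fields_to_json(text))
# [{"name": "col_long", "type": "long"}, {"name": "col_double", "type": "double"}]
```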

  8. Click Confirm to finalize the Elasticsearch input component's property configuration.