DataWorks: Configure Elasticsearch Writer

Last Updated: Aug 01, 2023

You can build a real-time data warehouse by using the real-time write capability of Elasticsearch.

Prerequisites

A reader or conversion node is configured. For more information about the data sources that support real-time synchronization, see Data source types that support real-time synchronization.

Limits

DataWorks allows you to add Alibaba Cloud Elasticsearch V5.X, V6.X, and V7.X clusters as data sources. Self-managed Elasticsearch clusters are not supported.

Procedure

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. On the Workspaces page, find the workspace and click Shortcuts > Data Development in the Actions column.

  2. In the Scheduled Workflow pane, move the pointer over the Create a table icon and choose Create Node > Data Integration > Real-time synchronization.

    Alternatively, right-click the required workflow, and then choose Create Node > Data Integration > Real-time synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.

    Important

    The node name cannot exceed 128 characters in length and can contain letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

  5. On the configuration tab of the real-time synchronization node, drag Elasticsearch in the Output section to the canvas. Connect it to the configured reader or conversion node in the canvas.
  6. Click the Elasticsearch node. In the panel that appears, configure the parameters.
    Parameter: Description
    Data source: The Elasticsearch data source that you configured. In this example, you can select only an Elasticsearch data source.

    If no data source is available, click New data source on the right to add a data source on the Data Source page. For more information, see Add an Elasticsearch data source.

    Index: The name of the index to which you want to write data.
    You can click Create Index on the right to create an index. You can use the default index settings as they are, or modify Index Name, Index Type, Dynamic Mapping Status, Shards, Replicas, and Statement for Creating Index before you create the index.
    • Index Type: This parameter is available only for Elasticsearch V6.X, V5.X, or earlier.
    • Dynamic Mapping Status: This parameter is used to specify the value of the dynamic parameter. The dynamic parameter determines whether Elasticsearch Writer dynamically writes new fields to the mappings of the index.
      • If you use an Elasticsearch cluster whose version is earlier than V7.10, this parameter has the following valid values: true, false, and strict.
      • If you use an Elasticsearch cluster whose version is V7.10 or later, this parameter has the following valid values: true, false, strict, and runtime.
      The values have the following meanings:
      • true: Elasticsearch Writer writes new fields to the mappings of the index, and the fields can be searched.
      • false: Elasticsearch Writer does not write new fields to the mappings of the index. The fields are kept in the document source but are not indexed and cannot be searched.
      • strict: If Elasticsearch Writer detects new fields, it returns an error message and does not write the fields to the mappings of the index.
      • runtime: Elasticsearch Writer writes new fields to the mappings of the index as runtime fields. The fields are not indexed and are loaded from the document source at query time.
      For more information, see the dynamic parameter for open source Elasticsearch.
    • Shards: the number of primary shards. An index can be divided into multiple primary shards. These primary shards can be distributed among different nodes to support distributed searches. When you create an index, you must specify the number of primary shards for the index. After the index is created, you cannot change the number. For more information, see Terms.
    • Replicas: the number of replica shards for each primary shard. Replica shards provide fault tolerance and help process the read requests of the cluster. Set Replicas to 1 if the capacity of the cluster is insufficient, if a single backup of each primary shard is enough, or if the cluster encounters bottlenecks in write performance.
    • Statement for Creating Index: The fields are defined in properties. You can modify the types of the fields.
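As a rough illustration of how the index settings above fit together, the following Python sketch assembles a request body in the style of the open source Elasticsearch create index API for a V7.X cluster (no index type). The helper names `allowed_dynamic_values` and `build_index_body`, the sample field names, and the version check are assumptions made for this example. The statement that DataWorks generates may be structured differently.

```python
import json

# Valid values of the dynamic parameter, as listed in the parameter description.
BASE_DYNAMIC_VALUES = {"true", "false", "strict"}


def allowed_dynamic_values(es_version):
    """Return the valid dynamic values for a (major, minor) cluster version."""
    values = set(BASE_DYNAMIC_VALUES)
    if es_version >= (7, 10):  # runtime is listed only for V7.10 or later
        values.add("runtime")
    return values


def build_index_body(shards, replicas, dynamic, properties, es_version=(7, 10)):
    """Assemble a create-index request body (hypothetical helper)."""
    if dynamic not in allowed_dynamic_values(es_version):
        raise ValueError(f"dynamic={dynamic!r} is not valid for version {es_version}")
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    # Send true/false as JSON booleans; strict and runtime stay strings.
    dynamic_value = {"true": True, "false": False}.get(dynamic, dynamic)
    return {
        "settings": {
            "number_of_shards": shards,      # cannot be changed after creation
            "number_of_replicas": replicas,  # replica shards per primary shard
        },
        "mappings": {
            "dynamic": dynamic_value,
            "properties": properties,  # field name -> {"type": ...}
        },
    }


# Sample field names are made up for this example.
body = build_index_body(
    shards=5,
    replicas=1,
    dynamic="strict",
    properties={"user_id": {"type": "keyword"}, "created_at": {"type": "date"}},
)
print(json.dumps(body, indent=2))
```

Validating the dynamic value before the index is created is useful because the number of primary shards cannot be changed afterwards, so a rejected request is cheaper than a misconfigured index.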
    Enable Partitioning for Elasticsearch Indexes: Specifies whether to enable the routing mechanism. You can customize the value of the routing parameter. The default value of routing is the ID of a document. A hash function converts the value of routing to a number, and the number is divided by the number of primary shards to obtain a remainder. The remainder determines the primary shard in which the document is stored.
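The hash-then-remainder rule described above can be sketched in a few lines of Python. This is a simplified model, not the Elasticsearch implementation: `zlib.crc32` stands in for the hash function that Elasticsearch uses internally, and `pick_primary_shard` and the sample IDs are hypothetical.

```python
import zlib


def pick_primary_shard(routing, num_primary_shards):
    """Map a routing value to a primary shard number by hash-then-modulo."""
    if num_primary_shards < 1:
        raise ValueError("num_primary_shards must be >= 1")
    # CRC32 is a stand-in; Elasticsearch uses its own hash function internally.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


# By default the routing value is the document ID.
for doc_id in ("order-1001", "order-1002", "order-1003"):
    print(doc_id, "->", pick_primary_shard(doc_id, num_primary_shards=5))
```

Because the result depends on the number of primary shards, the same routing value always lands on the same shard only while that number stays fixed, which is one reason the shard count cannot be changed after the index is created.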
    Set Primary Key (By_Id): The method used to assign values to the IDs of Elasticsearch documents during data synchronization.
    • Primary Key: uses one of the columns in the source table as the primary key.
    • Composite Primary Key: combines multiple columns in the source table to form the primary key.
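To make the difference between the two options concrete, the following Python sketch derives a document ID from one column or from several columns of a source row. The way DataWorks actually combines the columns is not described here, so the separator, the string conversion, the helper name `build_document_id`, and the sample row are all assumptions for illustration only.

```python
def build_document_id(row, key_columns, separator="_"):
    """Build a document ID from one or more source columns (hypothetical helper)."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    missing = [c for c in key_columns if c not in row]
    if missing:
        raise KeyError(f"missing key columns: {missing}")
    # The separator and the string conversion are assumptions for illustration.
    return separator.join(str(row[c]) for c in key_columns)


# Sample source row with made-up column names.
row = {"order_id": 1001, "region": "cn-hangzhou", "amount": 25.5}
single_key = build_document_id(row, ["order_id"])
composite_key = build_document_id(row, ["order_id", "region"])
print(single_key)
print(composite_key)
```

A composite key is worth considering when no single source column is unique, because two rows that produce the same ID would overwrite each other in the index.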
    Field Mapping: Configure field mappings between the source and destination. The synchronization node synchronizes data based on the field mappings.
  7. Click the Save icon in the top toolbar to save the configurations.