This topic describes the data types and parameters that Elasticsearch Writer supports, such as the parameters for write modes, field mappings, and connections. It also provides an example of how to configure Elasticsearch Writer.

Background information

The shared resource group supports Elasticsearch Writer only for Elasticsearch V5.X. Exclusive resource groups for Data Integration support Elasticsearch Writer for Elasticsearch V5.X, V6.X, and V7.X. For more information about exclusive resource groups for Data Integration, see Exclusive resource groups for Data Integration.

Elasticsearch is an open source, mainstream enterprise-class search engine. It is built on Apache Lucene and provides distributed data search and analysis services. The following list maps the core concepts of a relational database to the core concepts of Elasticsearch:
Relational database (instance) -> Database -> Table -> Row      -> Column
Elasticsearch                  -> Index    -> Type  -> Document -> Field

An Elasticsearch cluster can contain multiple indexes (databases). Each index can contain multiple types (tables), each type can contain multiple documents (rows), and each document can contain multiple fields (columns). Elasticsearch Writer uses the REST API of Elasticsearch to write the data records retrieved by a reader to Elasticsearch in batches.
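To make the mapping concrete, the following sketch shows how a single source row with a keyword column and a long column might look after it is written as a document.

- The index name test-1, the type name default, and the column names col_keyword and col_long are borrowed from the example in Code editor mode.
- The document ID and the field values are hypothetical.
- The layout follows the general shape of the response that the Elasticsearch get document API returns in V5.X and V6.X. The exact metadata fields depend on your version.

```json
{
    "_index": "test-1",
    "_type": "default",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "col_keyword": "hello",
        "col_long": 100
    }
}
```

In this document, the index plays the role of the database and the type plays the role of the table. The document, identified by _id, is the row, and each key under _source is a column.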

Parameters

Parameter Description Required Default value
endpoint The endpoint of Elasticsearch, in the format of http://xxxx.com:9999. No N/A
accessId The AccessKey ID that is used for authorization when a connection to Elasticsearch is established.
Note The accessId and accessKey parameters are required. If you do not configure them, an error is returned. If you use a self-managed Elasticsearch cluster for which basic authentication is not configured, no AccessKey pair is needed. In this case, set accessId and accessKey to arbitrary values.
Yes N/A
accessKey The AccessKey secret that is used to connect to Elasticsearch. Yes N/A
index The index name in Elasticsearch. No N/A
indexType The type name in the index of Elasticsearch. No Elasticsearch
cleanup Specifies whether to clear the existing data in the index. A value of true indicates that the existing data is cleared by deleting and re-creating the index. The default value false indicates that the existing data in the index is retained. No false
batchSize The number of data records to write at a time. No 1,000
trySize The number of retries after a failure. No 30
timeout The connection timeout period of the client. Unit: milliseconds. No 600,000
discovery Specifies whether to enable node discovery. If node discovery is enabled, the client periodically polls the cluster and updates its server list. No false
compression Specifies whether to compress HTTP requests. No true
multiThread Specifies whether to use multiple threads to send HTTP requests. No true
ignoreWriteError Specifies whether to ignore write errors and proceed with writing without retries. No false
ignoreParseError Specifies whether to ignore format parsing errors and proceed with writing. No true
alias The alias to create for the index after the data is imported. An Elasticsearch alias is similar to a view in a database. For example, if you create the alias my_index_alias for the index my_index, operations on my_index_alias also take effect on my_index.

No N/A
aliasMode The mode in which the alias is added after the data is imported. Valid values: append and exclusive.
  • append: The alias is added to the current index, and its existing mappings to other indexes are kept. One alias can map to multiple indexes.
  • exclusive: The existing mappings of the alias are deleted, and then the alias is added to the current index. One alias maps to only one index.

Elasticsearch Writer can convert aliases to actual index names. Aliases allow you to do the following:
  • Migrate data from one index to another.
  • Search data across multiple indexes in a unified manner.
  • Create a view on a subset of the data in an index.

No append
splitter The delimiter that is used to split the source data when you write an array to Elasticsearch.

For example, a source column of the STRING type stores a-,-b-,-c-,-d. The default delimiter -,- splits this value into the array ["a", "b", "c", "d"], which is then written to the corresponding field in Elasticsearch.

No -,-
settings The settings of the index, such as number_of_shards and number_of_replicas. The settings must comply with the official Elasticsearch specifications. No N/A
column The fields of the document. Each field supports basic parameters, such as name and type, and advanced parameters, such as analyzer, format, and array.
Elasticsearch Writer supports the following field types:
- id: corresponds to the _id field in Elasticsearch and can be regarded as the unique primary key. If a document with the same ID already exists, it is overwritten instead of being indexed as a new document.
- string
- text
- keyword
- long
- integer
- short
- byte
- double
- float
- date
- boolean
- binary
- integer_range
- float_range
- long_range
- double_range
- date_range
- geo_point
- geo_shape
- ip
- token_count
- array
- object
- nested
The following field types support additional parameters:
  • If the field type is text, you can specify the analyzer, norms, and index_options parameters. Example:
    {
        "name": "col_text",
        "type": "text",
        "analyzer": "ik_max_word"
    }
  • If the field type is date, you can specify the format and timezone parameters, which indicate the date serialization format and the time zone. You can also specify the origin parameter instead of the timezone parameter. Example:
    {
        "name": "col_date",
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss",
        "origin": true
    }
    Note You must specify either the timezone parameter or the origin parameter.
    • If you specify the origin parameter, Elasticsearch Writer updates the mappings between aliases and indexes and writes data to Elasticsearch in the original format. We recommend that you specify the origin parameter.
    • If you want to use the Data Integration service to convert the time zone, delete the origin parameter and specify the timezone parameter.
  • If the field type is geo_shape, you can specify the tree (geohash or quadtree) and precision parameters. Example:
    {
        "name": "col_geo_shape",
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "10m"
    }
If you set the array parameter of a field to true, the field is an array column. Elasticsearch Writer splits the source data by using the delimiter that is specified by the splitter parameter. It then converts the data to an array of strings and writes the array to the destination. Only one delimiter can be specified for each sync node. Example:
{
    "name": "col_integer_array",
    "type": "integer",
    "array": true
}
Yes N/A
dynamic Specifies whether to use the mapping configuration of Elasticsearch. A value of true indicates that Elasticsearch Writer uses the mapping configuration of Elasticsearch instead of the mapping configuration of Data Integration.

In Elasticsearch V7.X, the default value of the type parameter is _doc. If you use the mapping configuration of Elasticsearch V7.X, set the type parameter to _doc and set the esVersion parameter to 7. To set esVersion, add the following configuration to the code: "esVersion": "7".

No false
actionType The type of the action for writing data to Elasticsearch. Data Integration supports only the following action types: index and update. Default value: index.
  • index: Data Integration uses Index.Builder of the Elasticsearch SDK to construct a request for writing multiple data records at a time. In the index mode, Elasticsearch first checks whether an ID is specified for the document to be inserted.
    • If no ID is specified, Elasticsearch generates a unique ID, and the document is inserted directly.
    • If an ID is specified, Elasticsearch replaces the existing document with the new document. Specific fields in the existing document cannot be modified.
      Note This full replacement differs from an update operation in Elasticsearch, which can modify specific fields.
  • update: Data Integration uses Update.Builder of the Elasticsearch SDK to construct a request for writing multiple data records at a time. In update mode, Elasticsearch calls the get method of InternalEngine to retrieve the original document for each update, so that only the specified fields are modified. Because the original document must be retrieved for every update, this mode significantly reduces write performance. If the original document does not exist, the new document is inserted directly.
No index
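The following writer parameter fragment is a sketch of how several of the preceding parameters fit together. It does three things:

- It updates specific fields of existing documents. actionType is set to update, and an id column identifies the target documents.
- It points an alias exclusively at the index after the import, by using the alias and aliasMode parameters.
- It splits a STRING source column into an integer array, by using the default -,- delimiter with the splitter and array parameters.

The alias name test-1-alias is hypothetical, the connection values are placeholders, and the remaining names come from the example in Code editor mode. The fragment is meant to replace the parameter object of the writer step in a full job configuration.

```json
{
    "endpoint": "http://xxxx.com:9999",
    "accessId": "xxxx",
    "accessKey": "yyyy",
    "index": "test-1",
    "type": "default",
    "cleanup": false,
    "actionType": "update",
    "alias": "test-1-alias",
    "aliasMode": "exclusive",
    "splitter": "-,-",
    "batchSize": 1000,
    "column": [
        {
            "name": "pk",
            "type": "id"
        },
        {
            "name": "col_keyword",
            "type": "keyword"
        },
        {
            "name": "col_integer_array",
            "type": "integer",
            "array": true
        }
    ]
}
```

In this sketch, cleanup stays false. Deleting and re-creating the index would leave no existing documents to update, so every record would simply be inserted. Because update mode reads the original document for each record, expect lower throughput than with the default index mode.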

Code editor mode

For more information about the code editor mode, see Create a sync node by using the code editor.

The following example shows how to configure a sync node that uses Elasticsearch Writer in the code editor. For more information about the parameters, see the preceding parameter description.
{
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1,
            "throttle": false
        }
    },
    "steps": [
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {

            },
            "stepType": "stream"
        },
        {
            "category": "writer",
            "name": "Writer",
            "parameter": {
                "endpoint": "http://xxxx.com:9999",
                "accessId": "xxxx",
                "accessKey": "yyyy",
                "index": "test-1",
                "type": "default",
                "cleanup": true,
                "settings": {
                    "index": {
                        "number_of_shards": 1,
                        "number_of_replicas": 0
                    }
                },
                "discovery": false,
                "batchSize": 1000,
                "splitter": ",",
                "column": [
                    {
                        "name": "pk",
                        "type": "id"
                    },
                    {
                        "name": "col_ip",
                        "type": "ip"
                    },
                    {
                        "name": "col_double",
                        "type": "double"
                    },
                    {
                        "name": "col_long",
                        "type": "long"
                    },
                    {
                        "name": "col_integer",
                        "type": "integer"
                    },
                    {
                        "name": "col_keyword",
                        "type": "keyword"
                    },
                    {
                        "name": "col_text",
                        "type": "text",
                        "analyzer": "ik_max_word"
                    },
                    {
                        "name": "col_geo_point",
                        "type": "geo_point"
                    },
                    {
                        "name": "col_date",
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    {
                        "name": "col_nested1",
                        "type": "nested"
                    },
                    {
                        "name": "col_nested2",
                        "type": "nested"
                    },
                    {
                        "name": "col_object1",
                        "type": "object"
                    },
                    {
                        "name": "col_object2",
                        "type": "object"
                    },
                    {
                        "name": "col_integer_array",
                        "type": "integer",
                        "array": true
                    },
                    {
                        "name": "col_geo_shape",
                        "type": "geo_shape",
                        "tree": "quadtree",
                        "precision": "10m"
                    }
                ]
            },
            "stepType": "elasticsearch"
        }
    ],
    "type": "job",
    "version": "2.0"
}
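For an Elasticsearch V7.X cluster whose index mappings already exist, the dynamic and esVersion parameters apply. The following writer parameter fragment is a minimal sketch based on these assumptions:

- The index test-1 and its mappings were created in Elasticsearch beforehand, and the column types match those mappings.
- esVersion belongs inside the writer parameter object. This placement is inferred from the parameter description.

The fragment sets type to _doc, esVersion to "7", and dynamic to true. cleanup is set to false so that the existing index and its mappings are kept. Following the earlier recommendation, the date column uses origin. The column list is shortened and the connection values are placeholders.

```json
{
    "endpoint": "http://xxxx.com:9999",
    "accessId": "xxxx",
    "accessKey": "yyyy",
    "index": "test-1",
    "type": "_doc",
    "esVersion": "7",
    "dynamic": true,
    "cleanup": false,
    "batchSize": 1000,
    "column": [
        {
            "name": "pk",
            "type": "id"
        },
        {
            "name": "col_keyword",
            "type": "keyword"
        },
        {
            "name": "col_date",
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss",
            "origin": true
        }
    ]
}
```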
Note If you use the default resource group to connect to an Elasticsearch cluster that is deployed in a virtual private cloud (VPC), the connection may fail. To write data to Elasticsearch in a VPC, use exclusive resource groups for Data Integration or custom resource groups. For more information about how to add these resource groups, see Exclusive resources for Data Integration and Add a custom resource group.