This topic describes the parameters that are supported by Elasticsearch Writer and how to configure Elasticsearch Writer by using the code editor.
Limits
You can add Elasticsearch V5.X, V6.X, and V7.X data sources to DataWorks. Self-managed Elasticsearch data sources are not supported.
Background information
Elasticsearch Writer can write data to Elasticsearch V5.X clusters by using the shared resource group for Data Integration and to Elasticsearch V5.X, V6.X, and V7.X clusters by using exclusive resource groups for Data Integration. For more information about exclusive resource groups for Data Integration, see Create and use an exclusive resource group for Data Integration.
The following comparison maps the concepts of a relational database to the concepts of Elasticsearch:
Relational database instance -> Database -> Table -> Row -> Column
Elasticsearch -> Index -> Type -> Document -> Field
An Elasticsearch cluster can contain multiple indexes (databases). Each index can contain multiple types (tables). Each type can contain multiple documents (rows). Each document can contain multiple fields (columns). Elasticsearch Writer obtains data records from a reader and uses the RESTful API of Elasticsearch to write the data records to Elasticsearch in batches.
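The batch write described above can be pictured with a short Python sketch. The source does not show how Elasticsearch Writer is implemented, so this is only an illustration under assumptions: it assumes the batches are sent in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` endpoint, and the helper names `chunk` and `build_bulk_body` are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional


def chunk(records: Iterable[Dict], batch_size: int = 1000) -> Iterator[List[Dict]]:
    """Yield lists of at most batch_size records (mirrors the batchSize parameter)."""
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_bulk_body(index: str, rows: List[Dict],
                    doc_type: Optional[str] = None,
                    id_field: Optional[str] = None) -> str:
    """Serialize rows into the newline-delimited JSON body of a _bulk request.

    Assumption: each row becomes one "index" action. The real writer may
    build its requests differently.
    """
    lines = []
    for row in rows:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # types exist only in older Elasticsearch versions
        source = dict(row)
        if id_field is not None:
            # Use one column of the row as the document ID.
            meta["_id"] = str(source.pop(id_field))
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # The _bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Five rows (records) become documents written in batches of two.
rows = [{"pk": i, "col_keyword": f"value-{i}"} for i in range(5)]
batches = list(chunk(rows, batch_size=2))
bodies = [build_bulk_body("test-1", batch, id_field="pk") for batch in batches]
```

Each string in `bodies` could be sent as the body of one HTTP POST request to the cluster, so a larger batch size means fewer requests for the same number of rows.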
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
endpoint | The endpoint of Elasticsearch. Specify the endpoint in the http://example.com:9999 format. | No | No default value |
accessId | The AccessKey ID that is used to connect to the destination Elasticsearch cluster. The AccessKey ID is used for authentication before a connection to the Elasticsearch cluster can be established. Note: The accessId and accessKey parameters are required. If you do not specify the parameters, an error is returned. If you use a self-managed Elasticsearch cluster for which basic access authentication is not configured, the AccessKey ID and AccessKey secret are not required. In this case, you can set the accessId and accessKey parameters to random values. | No | No default value |
accessKey | The AccessKey secret that is used to connect to the destination Elasticsearch cluster. | No | No default value |
index | The name of the index in the destination Elasticsearch cluster. | No | No default value |
indexType | The name of the index type in the destination Elasticsearch cluster. | No | Elasticsearch |
cleanup | Specifies whether to delete the existing data from the index. To delete the existing data, you must delete and recreate the index. The default value of this parameter is false, which indicates that the existing data in the index is retained. | No | false |
batchSize | The number of data records to write at a time. | No | 1,000 |
trySize | The maximum number of retries allowed after a failure. | No | 30 |
timeout | The timeout period of the connection to the client. | No | 600,000 |
discovery | Specifies whether to enable node discovery. If node discovery is enabled, the server list in the client is polled and regularly updated. | No | false |
compression | Specifies whether to enable compression for an HTTP request. | No | true |
multiThread | Specifies whether to use multiple threads for an HTTP request. | No | true |
ignoreWriteError | Specifies whether to ignore write errors and proceed with data write without retries. | No | false |
ignoreParseError | Specifies whether to ignore format parsing errors and proceed with data write. | No | true |
alias | The alias feature of Elasticsearch is similar to the view feature of a database. For example, if you create an alias named my_index_alias for the index my_index, the operations on my_index_alias also take effect on my_index. If you configure the alias parameter, the alias that you specify in this parameter is created for the index after data is written to the index. | No | No default value |
aliasMode | The mode in which an alias is added after data is written to the index. Valid values: append and exclusive. Elasticsearch Writer can convert aliases to actual index names. You can use aliases to migrate data from one index to another index, search for data across multiple indexes in a unified manner, and create a view on a subset of data in an index. | No | append |
splitter | The delimiter (-,-) based on which Elasticsearch Writer splits the source data if the source data is an array. For example, the source column stores the | No | -,- |
settings | The settings of the index. The settings must follow official Elasticsearch specifications. | No | No default value |
column | The fields of the document. The parameters for each field include basic parameters such as name and type, and advanced parameters such as analyzer, format, and array. Elasticsearch supports the following field types: The following information describes the field types: If you set the array parameter to true for a field, the field is an array column. In this case, Elasticsearch Writer splits the source data based on the delimiter that is specified by the splitter parameter, converts the data to an array of strings, and writes the array to the index. Only one type of delimiter is supported for one node. Example: | Yes | No default value |
dynamic | If you set this parameter to true, Elasticsearch Writer uses the mapping configuration of the destination Elasticsearch cluster instead of the mapping configuration of Data Integration. In Elasticsearch V7.X, the default value of the type parameter is _doc. If you use the mapping configuration of the destination Elasticsearch cluster, set the type parameter to _doc and the esVersion parameter to 7. You must add the following parameter configuration that specifies the version information to the code: | No | false |
actionType | The type of action for writing data to the destination Elasticsearch cluster. Data Integration supports only the following action types: index and update. Default value: index. | No | index |
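The interaction between the splitter and array parameters in the table above can be illustrated with a minimal Python sketch. It only mimics the documented behavior (split the source string by the delimiter and produce an array of strings). The function name `split_array_column` is hypothetical, and treating an empty source string as an empty array is an assumption that the table does not confirm.

```python
from typing import List


def split_array_column(value: str, splitter: str = "-,-") -> List[str]:
    """Split a source string into an array of strings by the given delimiter.

    The default delimiter matches the default value of the splitter parameter.
    """
    if value == "":
        # Assumption: an empty source value yields an empty array.
        return []
    return value.split(splitter)


# With the default delimiter, "a-,-b-,-c" becomes three string elements.
tags = split_array_column("a-,-b-,-c")

# With "splitter": ",", a comma-separated value is split the same way.
# The elements stay strings; any conversion to the field type happens later.
numbers = split_array_column("1,2,3", splitter=",")
```

Because only one type of delimiter is supported for one node, every array column in the same node has to use the delimiter chosen here.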
Configure Elasticsearch Writer by using the code editor
For more information about how to configure a synchronization node by using the code editor, see Create a sync node by using the code editor.
{
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true, // Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent": 1, // The maximum number of parallel threads.
            "mbps": "12" // The maximum transmission rate.
        }
    },
    "steps": [
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
            },
            "stepType": "stream"
        },
        {
            "category": "writer",
            "name": "Writer",
            "parameter": {
                "endpoint": "http://example.com:9999",
                "accessId": "xxxx",
                "accessKey": "yyyy",
                "index": "test-1",
                "type": "default",
                "cleanup": true,
                "settings": {
                    "index": {
                        "number_of_shards": 1,
                        "number_of_replicas": 0
                    }
                },
                "discovery": false,
                "batchSize": 1000,
                "splitter": ",",
                "column": [
                    {
                        "name": "pk",
                        "type": "id"
                    },
                    {
                        "name": "col_ip",
                        "type": "ip"
                    },
                    {
                        "name": "col_double",
                        "type": "double"
                    },
                    {
                        "name": "col_long",
                        "type": "long"
                    },
                    {
                        "name": "col_integer",
                        "type": "integer"
                    },
                    {
                        "name": "col_keyword",
                        "type": "keyword"
                    },
                    {
                        "name": "col_text",
                        "type": "text",
                        "analyzer": "ik_max_word"
                    },
                    {
                        "name": "col_geo_point",
                        "type": "geo_point"
                    },
                    {
                        "name": "col_date",
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    {
                        "name": "col_nested1",
                        "type": "nested"
                    },
                    {
                        "name": "col_nested2",
                        "type": "nested"
                    },
                    {
                        "name": "col_object1",
                        "type": "object"
                    },
                    {
                        "name": "col_object2",
                        "type": "object"
                    },
                    {
                        "name": "col_integer_array",
                        "type": "integer",
                        "array": true
                    },
                    {
                        "name": "col_geo_shape",
                        "type": "geo_shape",
                        "tree": "quadtree",
                        "precision": "10m"
                    }
                ]
            },
            "stepType": "elasticsearch"
        }
    ],
    "type": "job",
    "version": "2.0"
}
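Before you paste a writer configuration into the code editor, it can help to see which values take effect when optional parameters are omitted. The following Python sketch is not part of DataWorks. It is a hypothetical helper, `resolve_writer_parameters`, that fills in the default values listed in the Parameters table and checks the documented constraints (column is required, and aliasMode and actionType accept only the listed values). It assumes that the writer parameters are already available as a Python dictionary.

```python
from typing import Any, Dict

# Default values copied from the Parameters table in this topic.
DEFAULTS: Dict[str, Any] = {
    "indexType": "Elasticsearch",
    "cleanup": False,
    "batchSize": 1000,
    "trySize": 30,
    "timeout": 600000,
    "discovery": False,
    "compression": True,
    "multiThread": True,
    "ignoreWriteError": False,
    "ignoreParseError": True,
    "aliasMode": "append",
    "splitter": "-,-",
    "dynamic": False,
    "actionType": "index",
}

# Parameters whose valid values are listed in the table.
VALID_VALUES = {
    "aliasMode": {"append", "exclusive"},
    "actionType": {"index", "update"},
}


def resolve_writer_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with table defaults applied and basic checks run."""
    if not params.get("column"):
        raise ValueError("column is required and must list at least one field")
    resolved = {**DEFAULTS, **params}  # explicit values override defaults
    for name, allowed in VALID_VALUES.items():
        if resolved[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return resolved


resolved = resolve_writer_parameters({
    "endpoint": "http://example.com:9999",
    "index": "test-1",
    "splitter": ",",
    "column": [{"name": "pk", "type": "id"}],
})
```

In this example, `resolved["splitter"]` keeps the explicit value `","`, while omitted parameters such as batchSize and actionType fall back to the defaults from the table.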