
Configure ElasticSearch Writer

Last Updated: Apr 03, 2018

Quick Information

Plug-in for importing data into ElasticSearch

How it works

The writer takes the data read by a Reader and writes it to ElasticSearch in bulk through the ElasticSearch REST API.
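The bulk write described above can be sketched as follows. This is a minimal illustration of how records are batched into the body of an ElasticSearch `_bulk` request, not the plug-in's actual implementation; the function name and sample records are hypothetical.

```python
import json

def build_bulk_payload(index, type_name, records):
    """Build the newline-delimited JSON body for ElasticSearch's _bulk API.

    Each record becomes an action line ({"index": ...}) followed by the
    document source, which is how a bulk writer batches documents.
    """
    lines = []
    for doc_id, source in records:
        action = {"index": {"_index": index, "_type": type_name}}
        if doc_id is not None:
            action["index"]["_id"] = doc_id
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The _bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

payload = build_bulk_payload("test-1", "default",
                             [("1", {"col_long": 42}), (None, {"col_long": 7})])
# The payload would then be POSTed to <endpoint>/_bulk with
# Content-Type: application/x-ndjson, in batches of batchSize records,
# retrying up to trySize times on failure.
print(payload)
```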

Features

Parameter description

  • endpoint

    • Description: The URL to ElasticSearch.

    • Required: No

    • Default value: None
  • accessId

    • Description: Username for HTTP basic auth.

    • Required: No

    • Default value: None
  • accessKey

    • Description: Password for HTTP basic auth.

    • Required: No

    • Default value: None
  • index

    • Description: Index name in ElasticSearch.

    • Required: No

    • Default value: None
  • indexType

    • Description: Type name of index in ElasticSearch.

    • Required: No

    • Default value: elasticsearch
  • cleanup

    • Description: Whether to delete the original index before writing.

    • Required: No

    • Default value: False
  • batchSize

    • Description: Number of records written per bulk request.

    • Required: No

    • Default value: 1000
  • trySize

    • Description: Number of retries after failure.

    • Required: No

    • Default value: 30
  • timeout

    • Description: Client timeout, in milliseconds.

    • Required: No

    • Default value: 600000
  • discovery

    • Description: Whether to enable node discovery; when enabled, the server list held by the client is polled and updated regularly.

    • Required: No

    • Default value: False
  • compression

    • Description: Whether to enable compression for HTTP requests.

    • Required: No

    • Default value: True
  • multiThread

    • Description: Whether to send HTTP requests using multiple threads.

    • Required: No

    • Default value: True
  • ignoreWriteError

    • Description: Whether to ignore write errors and continue writing without retrying.

    • Required: No

    • Default value: False
  • ignoreParseError

    • Description: Whether to ignore data parsing errors and continue writing.

    • Required: No

    • Default value: True
  • alias

    • Description: Alias to attach to the index after the data is imported.

    • Required: No

    • Default value: None
  • aliasMode

    • Description: Mode for adding the alias after the data is imported: append or exclusive.

    • Required: No

    • Default value: Append
  • settings

    • Description: Index-creation settings, identical to the official ElasticSearch index settings.

    • Required: No

    • Default value: None
  • splitter

    • Description: Separator to use when the data to be inserted is an array.

    • Required: No

    • Default value: -,-
  • column

    • Description: Field definitions for the index. The sample configuration in the script-mode section covers the field types supported by ElasticSearch.

    • Required: Yes

    • Default value: None
  • dynamic

    • Description: Use the automatic mappings of ElasticSearch instead of the mappings defined by DataX.

    • Required: No

    • Default value: False
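The two aliasMode values can be sketched as actions against ElasticSearch's `_aliases` API: append adds the alias alongside any indices that already carry it, while exclusive leaves the alias on the new index only. The function below is an illustrative sketch of that distinction, not the plug-in's actual code; the assumption that exclusive mode first removes the alias from the other indices is labeled as such.

```python
def alias_actions(index, alias, mode, existing_indices):
    """Build _aliases API actions to run after an import finishes.

    In "append" mode the alias is simply added to the new index.
    In "exclusive" mode (assumed behavior) the alias is first removed
    from every other index so only the new index keeps it.
    """
    actions = []
    if mode == "exclusive":
        for old in existing_indices:
            if old != index:
                actions.append({"remove": {"index": old, "alias": alias}})
    actions.append({"add": {"index": index, "alias": alias}})
    return actions
```

The resulting list would be sent as the `actions` array in a POST to `<endpoint>/_aliases`.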

Development in script mode

```json
{
  "job": {
    "setting": {
      ...
    },
    "content": [
      {
        "reader": {
          ...
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "xxxx",
            "accessId": "xxx",
            "accessKey": "xxx",
            "index": "test-1",
            "type": "default",
            "cleanup": true,
            "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 0}},
            "discovery": false,
            "batchSize": 1000,
            "splitter": ",",
            "column": [
              {"name": "pk", "type": "id"},
              {"name": "col_ip", "type": "ip"},
              {"name": "col_double", "type": "double"},
              {"name": "col_long", "type": "long"},
              {"name": "col_integer", "type": "integer"},
              {"name": "col_keyword", "type": "keyword"},
              {"name": "col_text", "type": "text", "analyzer": "ik_max_word"},
              {"name": "col_geo_point", "type": "geo_point"},
              {"name": "col_date", "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
              {"name": "col_nested1", "type": "nested"},
              {"name": "col_nested2", "type": "nested"},
              {"name": "col_object1", "type": "object"},
              {"name": "col_object2", "type": "object"},
              {"name": "col_integer_array", "type": "integer", "array": true},
              {"name": "col_geo_shape", "type": "geo_shape", "tree": "quadtree", "precision": "10m"}
            ]
          }
        }
      }
    ]
  }
}
```
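In the sample above, col_integer_array is declared with "array": true while the job's splitter is ",". A minimal sketch of how such a delimited value might be expanded into a JSON array before indexing (the function name and the int conversion are illustrative assumptions, not the plug-in's actual code):

```python
def split_array_field(value, splitter=","):
    # A column declared with "array": true arrives as a single delimited
    # string; the writer splits it on the configured splitter so that
    # ElasticSearch receives a JSON array for that field.
    return [part.strip() for part in str(value).split(splitter)]

# For an integer array column, each part would additionally be parsed
# as an integer before the document is indexed.
doc = {"col_integer_array": [int(v) for v in split_array_field("1, 2, 3")]}
```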

Note: An ElasticSearch instance in a VPC environment can only use custom scheduling resources; running the job on the default resource group causes network failures. For more information about how to add scheduling resources, see Add scheduling resources.
