Configure Elasticsearch synonyms - Elasticsearch - Alibaba Cloud Documentation Center

Alibaba Cloud Elasticsearch allows you to update a synonym dictionary file that has been uploaded. After you upload the updated synonym dictionary file, you can apply it to your Alibaba Cloud Elasticsearch cluster and use the new dictionary in searches. You can configure synonyms by using either of the following methods: use a synonym dictionary file, or reference synonyms directly in the index settings. This topic describes both methods.

Background information

All commands provided in this topic can be run in the Kibana console. For information about how to log on to the Kibana console, see Log on to the Kibana console.

Method 1: Use a synonym dictionary file

Prerequisites: A synonym dictionary file is uploaded. For more information, see Upload a synonym dictionary file.

In the following example, synonyms are applied by a synonym token filter, and the test file aliyun_synonyms.txt contains the single synonym rule "begin, start".
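The synonym filter in this example does not set a format, so the file is read in the Solr synonym format, which is the default for the Elasticsearch synonym token filter. In that format, each line is one rule, comma-separated terms are equivalent, and "=>" replaces the terms on the left with the terms on the right. As an illustration only, the following Python sketch models how such rules expand a single token. It assumes the default expand: true and single-word terms; parse_solr_synonyms and expand are hypothetical helpers, not part of Elasticsearch.

```python
def parse_solr_synonyms(lines):
    """Map each term to the terms emitted in its place (Solr synonym format, simplified)."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by terms on the right.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent synonyms (assumes expand=true): every term maps to the whole group.
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for term in sources:
            merged = rules.get(term, []) + targets
            rules[term] = list(dict.fromkeys(merged))  # drop duplicates, keep order
    return rules


def expand(token, rules):
    """Terms emitted for a token; tokens without a rule pass through unchanged."""
    return rules.get(token, [token])


rules = parse_solr_synonyms(["# test dictionary", "begin, start"])
print(expand("begin", rules))  # ['begin', 'start']
print(expand("work", rules))   # ['work']
```

With the test file, both begin and start expand to the same pair of terms, which is why a search for either word can match documents that contain the other.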

Run the following command to create an index:

```
PUT /aliyun-index-test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          },
          "by_max_word": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          }
        },
        "filter": {
          "by_tfr": {
            "type": "stop",
            "stopwords": [" "]
          },
          "by_sfr": {
            "type": "synonym",
            "synonyms_path": "analysis/aliyun_synonyms.txt"
          }
        },
        "char_filter": {
          "by_cfr": {
            "type": "mapping",
            "mappings": ["| => |"]
          }
        }
      }
    }
  }
}
```
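If you generate this request body in a script rather than typing it in the Kibana console, a small pre-flight check can catch a misspelled filter name before the index is created. The following Python sketch is hypothetical (it is not an Elasticsearch or Kibana feature). It only recognizes names defined in the same analysis block, so built-in or plugin-provided components such as lowercase must be passed in builtins.

```python
import json

# The analysis settings from the preceding request, written as a Python dict.
analysis = {
    "analyzer": {
        "by_smart": {"type": "custom", "tokenizer": "ik_smart",
                     "filter": ["by_tfr", "by_sfr"], "char_filter": ["by_cfr"]},
        "by_max_word": {"type": "custom", "tokenizer": "ik_max_word",
                        "filter": ["by_tfr", "by_sfr"], "char_filter": ["by_cfr"]},
    },
    "filter": {
        "by_tfr": {"type": "stop", "stopwords": [" "]},
        "by_sfr": {"type": "synonym", "synonyms_path": "analysis/aliyun_synonyms.txt"},
    },
    "char_filter": {"by_cfr": {"type": "mapping", "mappings": ["| => |"]}},
}


def undefined_references(analysis, builtins=()):
    """List filter and char_filter names that analyzers use but the block never defines."""
    missing = []
    for name, analyzer in analysis.get("analyzer", {}).items():
        for kind in ("filter", "char_filter"):
            for ref in analyzer.get(kind, []):
                if ref not in analysis.get(kind, {}) and ref not in builtins:
                    missing.append(f"{name}.{kind}: {ref}")
    return missing


print(undefined_references(analysis))  # []
# Request body for PUT /aliyun-index-test
body = json.dumps({"settings": {"index": {"analysis": analysis}}}, indent=2)
```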

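Configure the mapping of the title field so that it uses the by_max_word analyzer at index time and the by_smart analyzer at search time.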
- Command for an Elasticsearch cluster of a version earlier than V7.0
```
PUT /aliyun-index-test/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}
```
- Command for an Elasticsearch cluster of V7.0 or later
```
PUT /aliyun-index-test/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}
```
  Important: In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. Do not specify a mapping type in the mapping configuration of an index; otherwise, an error is reported.

Run the following command to verify synonyms:

```
GET /aliyun-index-test/_analyze
{
  "analyzer": "by_smart",
  "text": "begin"
}
```

If the command is successfully run, the following result is returned:

```
{
  "tokens": [
    {
      "token": "begin",
      "start_offset": 0,
      "end_offset": 5,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "start",
      "start_offset": 0,
      "end_offset": 5,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
```
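In the _analyze output, the synonym start is emitted with the type SYNONYM at the same position and offsets as the original token begin. That shared position is what lets a query for either word match both. The following Python sketch groups the tokens of an _analyze response by position; here the response is pasted in as a literal, and tokens_by_position is a hypothetical helper.

```python
import json
from collections import defaultdict

# The _analyze response shown above, pasted in as a string.
response = json.loads("""
{"tokens": [
  {"token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0},
  {"token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0}
]}
""")


def tokens_by_position(resp):
    """Group token strings by position; synonyms share the original token's position."""
    groups = defaultdict(list)
    for tok in resp["tokens"]:
        groups[tok["position"]].append(tok["token"])
    return dict(groups)


print(tokens_by_position(response))  # {0: ['begin', 'start']}
```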

Add data for further testing.
- Command for an Elasticsearch cluster of a version earlier than V7.0
```
PUT /aliyun-index-test/doc/1
{
  "title": "Shall I begin?"
}

PUT /aliyun-index-test/doc/2
{
  "title": "I start work at nine."
}
```
- Command for an Elasticsearch cluster of V7.0 or later
```
PUT /aliyun-index-test/_doc/1
{
  "title": "Shall I begin?"
}

PUT /aliyun-index-test/_doc/2
{
  "title": "I start work at nine."
}
```
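As an alternative to one PUT request per document, both test documents can be indexed in a single _bulk request. The body is newline-delimited JSON (one action line followed by one source line per document) and must end with a newline. The following Python sketch builds that body; bulk_body is a hypothetical helper, and you would paste its printed output under POST _bulk in the Kibana console or send it with your own client.

```python
import json


def bulk_body(index, docs, doc_type=None):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            action["_type"] = doc_type  # needed only before V7.0, for example "doc"
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


docs = {"1": {"title": "Shall I begin?"}, "2": {"title": "I start work at nine."}}
print(bulk_body("aliyun-index-test", docs))  # V7.0 or later
```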

Run the following command to perform a search test and verify synonyms:

```
GET /aliyun-index-test/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}
```
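The match query analyzes the search text with the search_analyzer of the title field (by_smart), so begin becomes begin and start, and by default a document matches if it contains any of the resulting terms. The following Python sketch illustrates only that matching logic, not scoring. The regular-expression word splitter is an assumption that stands in for the ik tokenizers, and analyze and match are hypothetical helpers.

```python
import re

# Stand-in for by_smart / by_max_word: lowercase words plus the rule "begin, start".
SYNONYM_GROUPS = [{"begin", "start"}]


def analyze(text):
    """Return the set of terms for a text, with synonym groups expanded."""
    terms = set()
    for word in re.findall(r"\w+", text.lower()):
        terms.add(word)
        for group in SYNONYM_GROUPS:
            if word in group:
                terms |= group
    return terms


def match(query, docs):
    """IDs of documents sharing at least one analyzed term with the query (OR semantics)."""
    query_terms = analyze(query)
    return sorted(doc_id for doc_id, text in docs.items() if query_terms & analyze(text))


docs = {"1": "Shall I begin?", "2": "I start work at nine."}
print(match("begin", docs))  # ['1', '2']
```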

If the command is successfully run, a result similar to the following is returned. This sample output is from a cluster of a version earlier than V7.0. In V7.0 and later, _type is _doc and hits.total is an object that contains the value and relation fields.

```
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.41048482,
    "hits": [
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41048482,
        "_source": {
          "title": "I start work at nine."
        },
        "highlight": {
          "title": [
            "I <red>start</red> work at nine."
          ]
        }
      },
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": {
          "title": "Shall I begin?"
        },
        "highlight": {
          "title": [
            "Shall I <red>begin</red>?"
          ]
        }
      }
    ]
  }
}
```
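Because this topic covers clusters both earlier and later than V7.0, client code that reads the search response should handle both shapes of hits.total: an integer before V7.0, and an object such as {"value": 2, "relation": "eq"} in V7.0 and later. The following Python sketch shows one way to do that and to collect the highlighted fragments; total_hits and highlights are hypothetical helpers, and the response is a trimmed copy of the sample above.

```python
def total_hits(resp):
    """Return the hit count from either response shape."""
    total = resp["hits"]["total"]
    # Before V7.0: an integer. V7.0 and later: {"value": n, "relation": "eq" or "gte"}.
    return total["value"] if isinstance(total, dict) else total


def highlights(resp, field="title"):
    """Map each hit's _id to its highlighted fragments for the given field."""
    return {hit["_id"]: hit.get("highlight", {}).get(field, []) for hit in resp["hits"]["hits"]}


# A trimmed copy of the sample response above (pre-V7.0 shape).
old_style = {"hits": {"total": 2, "hits": [
    {"_id": "2", "highlight": {"title": ["I <red>start</red> work at nine."]}},
    {"_id": "1", "highlight": {"title": ["Shall I <red>begin</red>?"]}},
]}}
print(total_hits(old_style))  # 2
print(highlights(old_style))
```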

Method 2: Reference synonyms

In the following example, the synonyms are specified directly in the synonyms parameter of the filter instead of in a dictionary file, and the ik_smart tokenizer is used for word segmentation.

Run the following command to create an index:

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonyms": {
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ],
          "tokenizer": "ik_smart"
        }
      },
      "filter": {
        "my_synonym_filter": {
          "synonyms": [
            "begin,start"
          ],
          "type": "synonym"
        }
      }
    }
  }
}
```

The preceding command works as follows:

- The my_synonym_filter synonym filter is configured, and its synonym rules are defined in the synonyms parameter.
- The my_synonyms analyzer is configured to use the ik_smart tokenizer to split text into words.
- After the text is split, the lowercase filter converts all tokens to lowercase, and then my_synonym_filter adds the configured synonyms for matching tokens.
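The analysis chain described above (tokenize, lowercase, then add synonyms) can be simulated in Python for illustration. This sketch uses a simple regular-expression word splitter as a stand-in for ik_smart, which is an assumption: ik_smart segments text with a dictionary and labels English tokens as ENGLISH, whereas this sketch uses the hypothetical type WORD. The filter order matters: because lowercase runs first, an input such as BEGIN still matches the rule.

```python
import re

# Inline rule from my_synonym_filter: "begin,start" (equivalent synonyms).
SYNONYMS = {"begin": ["start"], "start": ["begin"]}


def analyze(text):
    """Tokenize, lowercase, then add synonyms at the same position and offsets."""
    tokens = []
    # Assumption: a simple word regex stands in for the ik_smart tokenizer.
    for position, m in enumerate(re.finditer(r"\w+", text)):
        term = m.group().lower()  # lowercase filter
        tokens.append({"token": term, "start_offset": m.start(), "end_offset": m.end(),
                       "type": "WORD", "position": position})
        for synonym in SYNONYMS.get(term, []):  # my_synonym_filter
            tokens.append({"token": synonym, "start_offset": m.start(), "end_offset": m.end(),
                           "type": "SYNONYM", "position": position})
    return tokens


for tok in analyze("Shall I begin?"):
    print(tok["position"], tok["token"], tok["type"])
```

For "Shall I begin?", the sketch produces shall, i, begin, and start, with start at the same position and offsets as begin, which mirrors the _analyze result for this index.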

Configure the mapping of the title field so that it uses the my_synonyms analyzer.
- Command for an Elasticsearch cluster of a version earlier than V7.0
```
PUT /my_index/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
- Command for an Elasticsearch cluster of V7.0 or later
```
PUT /my_index/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
  Important: In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. Do not specify a mapping type in the mapping configuration of an index; otherwise, an error is reported.

Run the following command to verify synonyms:

```
GET /my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "Shall I begin?"
}
```

If the command is successfully run, the following result is returned:

```
{
  "tokens": [
    {
      "token": "shall",
      "start_offset": 0,
      "end_offset": 5,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "i",
      "start_offset": 6,
      "end_offset": 7,
      "type": "ENGLISH",
      "position": 1
    },
    {
      "token": "begin",
      "start_offset": 8,
      "end_offset": 13,
      "type": "ENGLISH",
      "position": 2
    },
    {
      "token": "start",
      "start_offset": 8,
      "end_offset": 13,
      "type": "SYNONYM",
      "position": 2
    }
  ]
}
```

Add data for further testing.
- Command for an Elasticsearch cluster of a version earlier than V7.0
```
PUT /my_index/doc/1
{
  "title": "Shall I begin?"
}

PUT /my_index/doc/2
{
  "title": "I start work at nine."
}
```
- Command for an Elasticsearch cluster of V7.0 or later
```
PUT /my_index/_doc/1
{
  "title": "Shall I begin?"
}

PUT /my_index/_doc/2
{
  "title": "I start work at nine."
}
```

Run the following command to perform a search test and verify synonyms:

```
GET /my_index/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}
```

If the command is successfully run, a result similar to the following is returned. This sample output is from a cluster of a version earlier than V7.0. In V7.0 and later, _type is _doc and hits.total is an object that contains the value and relation fields.

```
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.41913947,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41913947,
        "_source": {
          "title": "I start work at nine."
        },
        "highlight": {
          "title": [
            "I <red>start</red> work at nine."
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": {
          "title": "Shall I begin?"
        },
        "highlight": {
          "title": [
            "Shall I <red>begin</red>?"
          ]
        }
      }
    ]
  }
}
```