Enable Synonym Search via Dictionary Files - Elasticsearch

Alibaba Cloud Elasticsearch supports two ways to configure synonyms for search: uploading a synonym dictionary file to the cluster, or defining synonyms inline in the index settings. Both approaches use the synonym token filter in a custom analyzer.

Run all commands in this topic from the Kibana console. For access instructions, see Log on to the Kibana console.

Method 1: Use a synonym dictionary file

Prerequisites

Before you begin, ensure that you have:

A synonym dictionary file uploaded to the cluster. See Upload a synonym dictionary file.

The examples below use analysis/aliyun_synonyms.txt, which contains the entry begin, start.
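To make the file format concrete, the following Python sketch parses synonym rules written in the Solr format that the synonym token filter accepts: comma-separated terms are treated as equivalent, and => maps the terms on its left to the terms on its right. The function name parse_synonym_rules is hypothetical. The sketch only illustrates how a line such as begin, start is interpreted; it does not reproduce how Elasticsearch loads the file.

```python
def parse_synonym_rules(text):
    """Parse Solr-style synonym lines into (inputs, outputs) pairs.

    Hypothetical helper for illustration only.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by terms on the right.
            left, right = line.split("=>", 1)
            inputs = [t.strip() for t in left.split(",") if t.strip()]
            outputs = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent synonyms: every term maps to the whole group.
            inputs = [t.strip() for t in line.split(",") if t.strip()]
            outputs = list(inputs)
        rules.append((inputs, outputs))
    return rules


rules = parse_synonym_rules("begin, start\n")
print(rules)  # [(['begin', 'start'], ['begin', 'start'])]
```

With the single entry used in this topic, begin and start form one equivalence group, so either term expands to both.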

Step 1: Create an index

Create an index with a custom analyzer that references the uploaded synonym file:

PUT /aliyun-index-test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          },
          "by_max_word": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          }
        },
        "filter": {
          "by_tfr": {
            "type": "stop",
            "stopwords": [" "]
          },
          "by_sfr": {
            "type": "synonym",
            "synonyms_path": "analysis/aliyun_synonyms.txt"
          }
        },
        "char_filter": {
          "by_cfr": {
            "type": "mapping",
            "mappings": ["| => |"]
          }
        }
      }
    }
  }
}

The by_sfr filter loads synonyms from the uploaded dictionary file. Both analyzers apply this filter. The next step assigns by_max_word as the index-time analyzer and by_smart as the search-time analyzer.
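A request like the one above fails if an analyzer references a filter or character filter name that is not defined. The following Python sketch checks those cross-references locally before the request is sent. The helper name missing_components and its builtins parameter are assumptions made for this illustration and are not part of any Elasticsearch client.

```python
def missing_components(body, builtins=frozenset()):
    """Return (analyzer, kind, name) for every reference that is not defined.

    Hypothetical helper; `builtins` lists names that Elasticsearch provides
    out of the box (for example "lowercase") and that need no definition.
    """
    settings = body["settings"]
    # Analysis settings may sit under "index" or directly under "settings".
    analysis = settings.get("index", settings)["analysis"]
    defined = {
        "filter": set(analysis.get("filter", {})) | set(builtins),
        "char_filter": set(analysis.get("char_filter", {})) | set(builtins),
    }
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for kind in ("filter", "char_filter"):
            for ref in analyzer.get(kind, []):
                if ref not in defined[kind]:
                    missing.append((analyzer_name, kind, ref))
    return missing


# The index settings from this step, expressed as a Python dict.
body = {
    "settings": {"index": {"analysis": {
        "analyzer": {
            "by_smart": {"type": "custom", "tokenizer": "ik_smart",
                         "filter": ["by_tfr", "by_sfr"],
                         "char_filter": ["by_cfr"]},
            "by_max_word": {"type": "custom", "tokenizer": "ik_max_word",
                            "filter": ["by_tfr", "by_sfr"],
                            "char_filter": ["by_cfr"]},
        },
        "filter": {
            "by_tfr": {"type": "stop", "stopwords": [" "]},
            "by_sfr": {"type": "synonym",
                       "synonyms_path": "analysis/aliyun_synonyms.txt"},
        },
        "char_filter": {
            "by_cfr": {"type": "mapping", "mappings": ["| => |"]},
        },
    }}}
}

print(missing_components(body))  # [] because every reference is defined
```

The check covers only names inside the request body. It cannot confirm that the dictionary file exists on the cluster or that the IK tokenizers are installed.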

Step 2: Configure the mapping

Map the title field to use the custom analyzers. The command differs by cluster version.

Elasticsearch 7.0 or later:

PUT /aliyun-index-test/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}

Earlier than Elasticsearch 7.0:

PUT /aliyun-index-test/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}

Important

In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. Do not specify a mapping type in the path. If you do, an error is returned.
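When the same script has to target clusters on both sides of the 7.0 boundary, the path difference can be isolated in small helpers. The following Python sketch is one way to do that. The function names mapping_path and document_path are hypothetical, and the default type name doc is taken from the pre-7.0 examples in this topic.

```python
def mapping_path(index, major_version, doc_type="doc"):
    """Build the request path for the put-mapping call.

    Hypothetical helper: 7.0 and later omit the mapping type,
    earlier versions append it to the path.
    """
    if major_version >= 7:
        return f"/{index}/_mapping"
    return f"/{index}/_mapping/{doc_type}"


def document_path(index, doc_id, major_version, doc_type="doc"):
    """Build the request path for indexing a single document."""
    type_segment = "_doc" if major_version >= 7 else doc_type
    return f"/{index}/{type_segment}/{doc_id}"


print(mapping_path("aliyun-index-test", 7))      # /aliyun-index-test/_mapping
print(mapping_path("aliyun-index-test", 6))      # /aliyun-index-test/_mapping/doc
print(document_path("aliyun-index-test", 1, 7))  # /aliyun-index-test/_doc/1
print(document_path("aliyun-index-test", 1, 6))  # /aliyun-index-test/doc/1
```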

Step 3: Verify the analyzer

Confirm that the synonym filter is working by analyzing a test term:

GET /aliyun-index-test/_analyze
{
"analyzer": "by_smart",
"text":"begin"
}

The response should include both begin (type ENGLISH) and start (type SYNONYM) at the same position:

{
"tokens": [
 {
   "token": "begin",
   "start_offset": 0,
   "end_offset": 5,
   "type": "ENGLISH",
   "position": 0
 },
 {
   "token": "start",
   "start_offset": 0,
   "end_offset": 5,
   "type": "SYNONYM",
   "position": 0
 }
]
}
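The same check can be automated. The following Python sketch groups the tokens of an _analyze response by position and reports whether a term and its synonym share a position. The function names tokens_by_position and has_synonym are hypothetical, and the response dict is copied from the sample output above rather than fetched from a cluster.

```python
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group (token, type) pairs by their position in the token stream."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append((tok["token"], tok["type"]))
    return dict(grouped)


def has_synonym(analyze_response, term, synonym):
    """Return True if term and synonym are emitted at the same position."""
    for tokens in tokens_by_position(analyze_response).values():
        names = {name for name, _ in tokens}
        if term in names and synonym in names:
            return True
    return False


# Sample _analyze response from this step.
response = {
    "tokens": [
        {"token": "begin", "start_offset": 0, "end_offset": 5,
         "type": "ENGLISH", "position": 0},
        {"token": "start", "start_offset": 0, "end_offset": 5,
         "type": "SYNONYM", "position": 0},
    ]
}

print(tokens_by_position(response))
print(has_synonym(response, "begin", "start"))  # True
```

If has_synonym returns False for a term that is listed in the dictionary file, the analyzer is most likely not applying the synonym filter, or the file on the cluster differs from the one expected.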

Step 4: Add test documents

Index two documents with different but synonymous terms.

Elasticsearch 7.0 or later:

PUT /aliyun-index-test/_doc/1
{
"title": "Shall I begin?"
}

PUT /aliyun-index-test/_doc/2
{
"title": "I start work at nine."
}

Earlier than Elasticsearch 7.0:

PUT /aliyun-index-test/doc/1
{
"title": "Shall I begin?"
}

PUT /aliyun-index-test/doc/2
{
"title": "I start work at nine."
}
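For more than a handful of documents, the bulk API is usually more convenient than individual PUT requests. The following Python sketch builds the newline-delimited body for POST /_bulk from the two test documents. The function name bulk_body is hypothetical. The doc_type argument is needed only on clusters earlier than 7.0 and should be left unset on 7.0 or later.

```python
import json


def bulk_body(index, docs, doc_type=None):
    """Build an NDJSON body for the bulk API.

    Hypothetical helper. Each document produces two lines: an action line
    and a source line. The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            # Clusters earlier than 7.0 expect a mapping type.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


docs = {
    "1": {"title": "Shall I begin?"},
    "2": {"title": "I start work at nine."},
}

body = bulk_body("aliyun-index-test", docs)
print(body, end="")
```

The resulting text is sent as the request body of POST /_bulk with the Content-Type header set to application/x-ndjson.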

Step 5: Test synonym search

Search for begin and verify that both documents are returned, including the one that contains only start:

GET /aliyun-index-test/_search
{
  "query": { "match": { "title": "begin" }},
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}

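A successful response returns both documents, with the matched terms highlighted. The sample below comes from a cluster earlier than Elasticsearch 7.0. On 7.0 or later, hits.total is an object (for example, {"value": 2, "relation": "eq"}) and _type is _doc: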

{
"took": 11,
"timed_out": false,
"_shards": {
 "total": 5,
 "successful": 5,
 "failed": 0
},
"hits": {
 "total": 2,
 "max_score": 0.41048482,
 "hits": [
   {
     "_index": "aliyun-index-test",
     "_type": "doc",
     "_id": "2",
     "_score": 0.41048482,
     "_source": {
       "title": "I start work at nine."
     },
     "highlight": {
       "title": [
         "I <red>start</red> work at nine."
       ]
     }
   },
   {
     "_index": "aliyun-index-test",
     "_type": "doc",
     "_id": "1",
     "_score": 0.39556286,
     "_source": {
       "title": "Shall I begin?"
     },
     "highlight": {
       "title": [
         "Shall I <red>begin</red>?"
       ]
     }
   }
 ]
}
}

Both documents are matched because begin and start are defined as synonyms.
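The following Python sketch shows one way to verify this result programmatically. It reads the hit count and the highlighted fragments from a search response and accepts both shapes of hits.total: the integer returned by clusters earlier than 7.0 and the object returned by 7.0 or later. The function names total_hits and highlighted_titles are hypothetical, and the response dict is a trimmed copy of the sample above.

```python
def total_hits(response):
    """Return the hit count for either shape of hits.total."""
    total = response["hits"]["total"]
    # 7.0 and later return {"value": N, "relation": "..."}; earlier versions return N.
    return total["value"] if isinstance(total, dict) else total


def highlighted_titles(response):
    """Map each document ID to its highlighted title fragments."""
    return {
        hit["_id"]: hit.get("highlight", {}).get("title", [])
        for hit in response["hits"]["hits"]
    }


# Trimmed copy of the sample search response.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "2",
             "_source": {"title": "I start work at nine."},
             "highlight": {"title": ["I <red>start</red> work at nine."]}},
            {"_id": "1",
             "_source": {"title": "Shall I begin?"},
             "highlight": {"title": ["Shall I <red>begin</red>?"]}},
        ],
    }
}

print(total_hits(response))  # 2
for doc_id, fragments in highlighted_titles(response).items():
    print(doc_id, fragments)
```

Document 2 is highlighted on start even though the query term is begin, which is the behavior the synonym filter is expected to produce.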

Method 2: Define inline synonyms

This method embeds synonym rules directly in the index settings using the synonyms parameter. Use this approach for simple, stable synonym sets that don't need frequent updates.

Step 1: Create an index with inline synonyms

PUT /my_index
{
 "settings": {
     "analysis": {
         "analyzer": {
             "my_synonyms": {
                 "filter": [
                     "lowercase",
                     "my_synonym_filter"
                 ],
                 "tokenizer": "ik_smart"
             }
         },
         "filter": {
             "my_synonym_filter": {
                 "synonyms": [
                     "begin,start"
                 ],
                 "type": "synonym"
             }
         }
     }
 }
}

This configuration does the following:

Defines a synonym filter (my_synonym_filter) with the inline synonym rule begin,start.
Creates the my_synonyms analyzer using the ik_smart tokenizer.
After tokenization, applies lowercase and then my_synonym_filter to each token, in that order.
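The effect of this filter chain can be sketched in plain Python. The following simplified model works on (token, position) pairs and assumes that the tokenizer has already split the text. The function names lowercase_filter and synonym_filter are hypothetical. The model is not the Lucene implementation and ignores offsets, token types, and multi-word synonyms.

```python
def lowercase_filter(tokens):
    """Lowercase every token and keep its position."""
    return [(term.lower(), pos) for term, pos in tokens]


def synonym_filter(tokens, groups):
    """Emit each token followed by its synonyms at the same position.

    `groups` is a list of tuples of equivalent terms, such as ("begin", "start").
    """
    out = []
    for term, pos in tokens:
        out.append((term, pos))
        for group in groups:
            if term in group:
                out.extend((other, pos) for other in group if other != term)
    return out


# Assumed tokenizer output for "Shall I begin?" before any filter runs.
tokens = [("Shall", 0), ("I", 1), ("begin", 2)]
groups = [("begin", "start")]

result = synonym_filter(lowercase_filter(tokens), groups)
print(result)  # [('shall', 0), ('i', 1), ('begin', 2), ('start', 2)]
```

In this model the order of the filters matters: a capitalized token such as Begin matches the rule begin,start only because it is lowercased first.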

Step 2: Configure the mapping

Elasticsearch 7.0 or later:

PUT /my_index/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}

Earlier than Elasticsearch 7.0:

PUT /my_index/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}

Important

In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. Do not specify a mapping type in the path. If you do, an error is returned.

Step 3: Verify the analyzer

GET /my_index/_analyze
{
 "analyzer":"my_synonyms",
 "text":"Shall I begin?"
}

The response includes begin (type ENGLISH) and start (type SYNONYM) at the same position, confirming the synonym filter is active:

{
"tokens": [
 {
   "token": "shall",
   "start_offset": 0,
   "end_offset": 5,
   "type": "ENGLISH",
   "position": 0
 },
 {
   "token": "i",
   "start_offset": 6,
   "end_offset": 7,
   "type": "ENGLISH",
   "position": 1
 },
 {
   "token": "begin",
   "start_offset": 8,
   "end_offset": 13,
   "type": "ENGLISH",
   "position": 2
 },
 {
   "token": "start",
   "start_offset": 8,
   "end_offset": 13,
   "type": "SYNONYM",
   "position": 2
 }
]
}

Step 4: Add test documents

Elasticsearch 7.0 or later:

PUT /my_index/_doc/1
{
"title": "Shall I begin?"
}

PUT /my_index/_doc/2
{
"title": "I start work at nine."
}

Earlier than Elasticsearch 7.0:

PUT /my_index/doc/1
{
"title": "Shall I begin?"
}

PUT /my_index/doc/2
{
"title": "I start work at nine."
}

Step 5: Test synonym search

GET /my_index/_search
{
  "query": { "match": { "title": "begin" }},
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}

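The response returns both documents, with the matched terms highlighted. As in Method 1, the sample comes from a cluster earlier than Elasticsearch 7.0: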

{
"took": 11,
"timed_out": false,
"_shards": {
 "total": 5,
 "successful": 5,
 "failed": 0
},
"hits": {
 "total": 2,
 "max_score": 0.41913947,
 "hits": [
   {
     "_index": "my_index",
     "_type": "doc",
     "_id": "2",
     "_score": 0.41913947,
     "_source": {
       "title": "I start work at nine."
     },
     "highlight": {
       "title": [
         "I <red>start</red> work at nine."
       ]
     }
   },
   {
     "_index": "my_index",
     "_type": "doc",
     "_id": "1",
     "_score": 0.39556286,
     "_source": {
       "title": "Shall I begin?"
     },
     "highlight": {
       "title": [
         "Shall I <red>begin</red>?"
       ]
     }
   }
 ]
}
}