Enable synonym search with a dictionary file - Elasticsearch (ES)

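Alibaba Cloud Elasticsearch lets you update synonym dictionary files that you have uploaded. After you upload an updated synonym dictionary file and apply it to your Alibaba Cloud Elasticsearch cluster, searches use the new dictionary. You can use synonyms in either of the following ways: by using a synonym dictionary file, or by referencing synonyms directly in the index settings. This topic describes how to use synonyms with each method.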

Background information

You can run all commands in this topic in the Kibana console. For information about how to log on to the Kibana console, see Log on to the Kibana console.

Method 1: Use a synonym dictionary file

Prerequisite: A synonym dictionary file is uploaded. For more information, see Upload a synonym dictionary file.

In the following example, synonyms are applied through a token filter, and a file named aliyun_synonyms.txt that contains the line "begin, start" is used as the test file.
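
For reference, the test dictionary holds a single rule that treats begin and start as equivalent terms. The following Python sketch shows one way to prepare such a file locally before uploading it. It assumes the dictionary is plain UTF-8 text with one comma-separated group of equivalent terms per line; format_synonym_rules and write_synonym_file are hypothetical helper names used only for illustration.

```python
from pathlib import Path


def format_synonym_rules(groups):
    """Return dictionary text with one comma-separated group of equivalent terms per line."""
    return "".join(", ".join(group) + "\n" for group in groups)


def write_synonym_file(path, groups):
    """Write the rules as UTF-8 text (assumed encoding) so the file can be uploaded."""
    Path(path).write_text(format_synonym_rules(groups), encoding="utf-8")


if __name__ == "__main__":
    # Same rule as the test file used in this topic.
    print(format_synonym_rules([["begin", "start"]]), end="")  # begin, start
```

Calling write_synonym_file("aliyun_synonyms.txt", [["begin", "start"]]) would produce a file with the single line used in this example.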

Run the following command to create an index:

```
PUT /aliyun-index-test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          },
          "by_max_word": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          }
        },
        "filter": {
          "by_tfr": { // Stopword filter
            "type": "stop",
            "stopwords": [" "]
          },
          "by_sfr": { // Synonym filter
            "type": "synonym",
            "synonyms_path": "analysis/aliyun_synonyms.txt"
          }
        },
        "char_filter": { // Character filter
          "by_cfr": {
            "type": "mapping",
            "mappings": ["| => |"]
          }
        }
      }
    }
  }
}
```
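
If an analyzer names a filter or character filter that is neither built in nor defined in the settings, index creation is expected to fail. The following Python sketch is a hypothetical pre-flight check: it mirrors the settings above as a dictionary (without the // comments) and lists any unresolved references. The BUILTIN set is a small, non-exhaustive assumption, not the full list of names that Elasticsearch provides.

```python
# Mirror of the index settings above, without comments.
SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "by_smart": {"type": "custom", "tokenizer": "ik_smart",
                             "filter": ["by_tfr", "by_sfr"], "char_filter": ["by_cfr"]},
                "by_max_word": {"type": "custom", "tokenizer": "ik_max_word",
                                "filter": ["by_tfr", "by_sfr"], "char_filter": ["by_cfr"]},
            },
            "filter": {
                "by_tfr": {"type": "stop", "stopwords": [" "]},
                "by_sfr": {"type": "synonym", "synonyms_path": "analysis/aliyun_synonyms.txt"},
            },
            "char_filter": {
                "by_cfr": {"type": "mapping", "mappings": ["| => |"]},
            },
        }
    }
}

# Assumption: a few built-in names only; extend this set for your own analyzers.
BUILTIN = {
    "filter": {"lowercase", "uppercase", "stop", "asciifolding"},
    "char_filter": {"html_strip"},
}


def undefined_references(settings, builtin=BUILTIN):
    """Return (analyzer, kind, name) for each filter or char_filter an analyzer
    uses that is neither defined in the settings nor listed as built-in."""
    # Accept settings with or without the "index" wrapper.
    analysis = settings.get("index", settings).get("analysis", {})
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for kind in ("filter", "char_filter"):
            defined = set(analysis.get(kind, {})) | builtin.get(kind, set())
            for name in analyzer.get(kind, []):
                if name not in defined:
                    missing.append((analyzer_name, kind, name))
    return missing


if __name__ == "__main__":
    print(undefined_references(SETTINGS))  # []
```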

Configure the title field to use the synonym analyzers.
- Command for Elasticsearch clusters earlier than V7.0
```
PUT /aliyun-index-test/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}
```
- Command for Elasticsearch clusters of V7.0 or later
```
PUT /aliyun-index-test/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}
```
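  Important: In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. You do not need to specify a mapping type in the mapping settings of an index. If you specify one, an error is reported.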
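
The two commands differ only in the request path. As a sketch, assuming you script these requests instead of typing them in the Kibana console, a hypothetical helper can pick the path from the cluster's major version and build the same mapping body:

```python
import json


def mapping_request(index, major_version, analyzer, search_analyzer=None):
    """Return (method, path, body) for the put-mapping call on the title field.

    Clusters earlier than 7.0 need the mapping type (doc) in the path;
    7.0 and later use the typeless _mapping endpoint.
    """
    path = f"/{index}/_mapping/doc" if major_version < 7 else f"/{index}/_mapping"
    field = {"type": "text", "analyzer": analyzer}
    if search_analyzer is not None:
        field["search_analyzer"] = search_analyzer
    return "PUT", path, {"properties": {"title": field}}


if __name__ == "__main__":
    method, path, body = mapping_request("aliyun-index-test", 7, "by_max_word", "by_smart")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Passing my_index, 7, and my_synonyms without a search analyzer produces the mapping used in Method 2.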

Run the following command to verify the synonym:

```
GET /aliyun-index-test/_analyze
{
  "analyzer": "by_smart",
  "text": "begin"
}
```

If the command runs successfully, the following result is returned:

```
{
  "tokens": [
    {
      "token": "begin",
      "start_offset": 0,
      "end_offset": 5,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "start",
      "start_offset": 0,
      "end_offset": 5,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
```
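
The synonym filter places each synonym at the same position and offsets as the token it expands, which is why start appears at position 0 alongside begin. A minimal Python sketch, using a hypothetical helper, that finds such stacked positions in an _analyze response that has already been parsed into a dictionary:

```python
from collections import defaultdict

# _analyze response from the example above.
RESPONSE = {
    "tokens": [
        {"token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0},
        {"token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0},
    ]
}


def synonyms_by_position(response):
    """Map each position that holds more than one token to those tokens.

    Stacked positions are where the synonym filter injected synonyms.
    """
    stacks = defaultdict(list)
    for tok in response["tokens"]:
        stacks[tok["position"]].append(tok["token"])
    return {pos: words for pos, words in stacks.items() if len(words) > 1}


if __name__ == "__main__":
    print(synonyms_by_position(RESPONSE))  # {0: ['begin', 'start']}
```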

Add data for further testing.

Commands for Elasticsearch clusters earlier than V7.0:

```
PUT /aliyun-index-test/doc/1
{
  "title": "Shall I begin?"
}

PUT /aliyun-index-test/doc/2
{
  "title": "I start work at nine."
}
```

Commands for Elasticsearch clusters of V7.0 or later:

```
PUT /aliyun-index-test/_doc/1
{
  "title": "Shall I begin?"
}

PUT /aliyun-index-test/_doc/2
{
  "title": "I start work at nine."
}
```
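
As an alternative to sending the documents one at a time, the two test documents could be indexed in a single POST /_bulk request, whose body is newline-delimited JSON: an action line followed by a source line for each document. The following Python sketch, with a hypothetical helper name, builds that body for either cluster version:

```python
import json


def bulk_body(index, docs, major_version):
    """Build an NDJSON body for POST /_bulk that indexes each doc under its ID.

    Clusters earlier than 7.0 also need the mapping type (doc) in each action.
    """
    lines = []
    for doc_id, source in docs.items():
        action = {"_index": index, "_id": doc_id}
        if major_version < 7:
            action["_type"] = "doc"
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = {"1": {"title": "Shall I begin?"}, "2": {"title": "I start work at nine."}}
    print(bulk_body("aliyun-index-test", docs, 7), end="")
```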

Run the following command to perform a search test and verify the synonym:

```
GET /aliyun-index-test/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}
```
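
Why does a search for begin also return the document that contains start? A match query analyzes the query text, and by default a document matches if it shares any resulting term. The following Python sketch models this with a toy analyzer. It assumes a simple lowercase word tokenizer in place of the IK tokenizers and does not reflect how Elasticsearch actually executes or scores queries.

```python
import re

SYNONYM_GROUPS = [{"begin", "start"}]  # same rule as aliyun_synonyms.txt


def analyze(text, groups=SYNONYM_GROUPS):
    """Toy analyzer: lowercase word tokens plus every synonym of each token.

    A stand-in for by_smart and by_max_word; it does not reproduce the IK tokenizers.
    """
    terms = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        terms.add(word)
        for group in groups:
            if word in group:
                terms |= group
    return terms


def match_ids(query, docs, groups=SYNONYM_GROUPS):
    """Return IDs of docs that share at least one term with the query (OR semantics)."""
    query_terms = analyze(query, groups)
    return sorted(doc_id for doc_id, title in docs.items() if analyze(title, groups) & query_terms)


if __name__ == "__main__":
    docs = {"1": "Shall I begin?", "2": "I start work at nine."}
    print(match_ids("begin", docs))  # ['1', '2']
    print(match_ids("begin", docs, groups=[]))  # ['1']
```

Without the synonym rule, only document 1 would match, which is the behavior the synonym filter changes.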

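If the command runs successfully, the following result is returned. This sample response comes from a cluster earlier than V7.0. On V7.0 and later, _type is _doc and hits.total is an object, such as {"value": 2, "relation": "eq"}.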

```
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.41048482,
    "hits": [
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41048482,
        "_source": {
          "title": "I start work at nine."
        },
        "highlight": {
          "title": [
            "I <red>start</red> work at nine."
          ]
        }
      },
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": {
          "title": "Shall I begin?"
        },
        "highlight": {
          "title": [
            "Shall I <red>begin</red>?"
          ]
        }
      }
    ]
  }
}
```
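
If you check results in a script rather than by eye, note that the shape of hits.total depends on the cluster version. A minimal Python sketch, assuming the response has already been parsed into a dictionary; SAMPLE is the result above, trimmed to the fields the hypothetical helper reads:

```python
SAMPLE = {
    "hits": {
        "total": 2,
        "max_score": 0.41048482,
        "hits": [
            {"_id": "2", "_score": 0.41048482,
             "highlight": {"title": ["I <red>start</red> work at nine."]}},
            {"_id": "1", "_score": 0.39556286,
             "highlight": {"title": ["Shall I <red>begin</red>?"]}},
        ],
    }
}


def summarize_hits(response):
    """Return (total, [(id, score, first highlighted title fragment), ...]).

    hits.total is an integer before 7.0 and an object such as
    {"value": 2, "relation": "eq"} on 7.0 and later; both are accepted.
    """
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        total = total["value"]
    rows = []
    for hit in hits["hits"]:
        fragments = hit.get("highlight", {}).get("title", [])
        rows.append((hit["_id"], hit["_score"], fragments[0] if fragments else None))
    return total, rows


if __name__ == "__main__":
    total, rows = summarize_hits(SAMPLE)
    print(total)
    for row in rows:
        print(row)
```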

Method 2: Reference synonyms

In the following example, synonyms are referenced directly in the index settings, and the ik_smart tokenizer is used for word segmentation.

Run the following command to create an index:

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": { // Analyzer settings
        "my_synonyms": {
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ],
          "tokenizer": "ik_smart"
        }
      },
      "filter": { // Filter settings
        "my_synonym_filter": { // Synonym filter
          "synonyms": [
            "begin,start"
          ],
          "type": "synonym"
        }
      }
    }
  }
}
```
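
The inline rule begin,start uses the same comma-separated format as the dictionary file. As a rough model of how such an equivalent-synonym rule is interpreted, the following Python sketch maps each term to the tokens the filter would emit for it, including the effect of the synonym filter's expand parameter (true by default). It handles only comma-separated rules of the kind used in this topic, and parse_equivalent_rule is a hypothetical name.

```python
def parse_equivalent_rule(rule, expand=True):
    """Parse a comma-separated rule such as "begin,start".

    With expand=True, every term maps to all terms in the rule.
    With expand=False, every term maps to the first term only.
    """
    terms = [t.strip() for t in rule.split(",") if t.strip()]
    targets = terms if expand else terms[:1]
    return {term: list(targets) for term in terms}


if __name__ == "__main__":
    print(parse_equivalent_rule("begin,start"))
    print(parse_equivalent_rule("begin,start", expand=False))
```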

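The preceding command works as follows:

- It defines the my_synonym_filter synonym filter and its synonym rules.
- It defines the my_synonyms analyzer, which uses the ik_smart tokenizer to split text into words.
- After tokenization, the lowercase filter converts all tokens to lowercase, and then my_synonym_filter applies the synonym rules to every token.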
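
The three steps can be sketched as a toy pipeline in Python. This is an illustration under stated assumptions, not the analyzer's real implementation: a regular expression stands in for ik_smart and handles only ASCII words, and the rules dictionary is a hypothetical stand-in for my_synonym_filter.

```python
import re


def toy_my_synonyms(text, synonyms):
    """Mimic tokenizer -> lowercase -> synonym filter for the my_synonyms analyzer.

    Returns (token, start_offset, end_offset, type, position) tuples.
    Synonym tokens share the offsets and position of the token they expand.
    """
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):  # tokenizer stand-in
        word = m.group().lower()  # lowercase filter
        tokens.append((word, m.start(), m.end(), "word", position))
        for syn in synonyms.get(word, []):  # my_synonym_filter stand-in
            if syn != word:
                tokens.append((syn, m.start(), m.end(), "SYNONYM", position))
    return tokens


if __name__ == "__main__":
    rules = {"begin": ["begin", "start"], "start": ["begin", "start"]}
    for tok in toy_my_synonyms("Shall I begin?", rules):
        print(tok)
```

For the text Shall I begin?, it yields shall, i, begin, and start with the same offsets and positions as the _analyze result in this method; only the token types differ.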

Configure the title field to use the synonym analyzer.
- Command for Elasticsearch clusters earlier than V7.0
```
PUT /my_index/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
- Command for Elasticsearch clusters of V7.0 or later
```
PUT /my_index/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
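  Important: In open source Elasticsearch 7.0 and later, mapping types are deprecated and _doc is used automatically. You do not need to specify a mapping type in the mapping settings of an index. If you specify one, an error is reported.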
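
Unlike Method 1, this mapping sets only analyzer. When search_analyzer is omitted, Elasticsearch applies the index-time analyzer at search time as well, so queries on title also pass through my_synonyms. A minimal sketch of that defaulting rule, with a hypothetical helper name and ignoring any index-level default analyzer:

```python
def effective_analyzers(field_mapping):
    """Return (index-time analyzer, search-time analyzer) for a text field.

    search_analyzer falls back to analyzer; analyzer falls back to standard
    (assuming no default analyzer is configured at the index level).
    """
    index_analyzer = field_mapping.get("analyzer", "standard")
    return index_analyzer, field_mapping.get("search_analyzer", index_analyzer)


if __name__ == "__main__":
    print(effective_analyzers({"type": "text", "analyzer": "my_synonyms"}))
    print(effective_analyzers({"type": "text", "analyzer": "by_max_word", "search_analyzer": "by_smart"}))
```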

Run the following command to verify the synonym:

```
GET /my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "Shall I begin?"
}
```

If the command runs successfully, the following result is returned:

```
{
  "tokens": [
    {
      "token": "shall",
      "start_offset": 0,
      "end_offset": 5,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "i",
      "start_offset": 6,
      "end_offset": 7,
      "type": "ENGLISH",
      "position": 1
    },
    {
      "token": "begin",
      "start_offset": 8,
      "end_offset": 13,
      "type": "ENGLISH",
      "position": 2
    },
    {
      "token": "start",
      "start_offset": 8,
      "end_offset": 13,
      "type": "SYNONYM",
      "position": 2
    }
  ]
}
```

Add data for further testing.

Commands for Elasticsearch clusters earlier than V7.0:

```
PUT /my_index/doc/1
{
  "title": "Shall I begin?"
}

PUT /my_index/doc/2
{
  "title": "I start work at nine."
}
```

Commands for Elasticsearch clusters of V7.0 or later:

```
PUT /my_index/_doc/1
{
  "title": "Shall I begin?"
}

PUT /my_index/_doc/2
{
  "title": "I start work at nine."
}
```

Run the following command to perform a search test and verify the synonym:

```
GET /my_index/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": {
      "title": {}
    }
  }
}
```

If the command runs successfully, the following result is returned (pre-V7.0 response format):

```
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.41913947,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41913947,
        "_source": {
          "title": "I start work at nine."
        },
        "highlight": {
          "title": [
            "I <red>start</red> work at nine."
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": {
          "title": "Shall I begin?"
        },
        "highlight": {
          "title": [
            "Shall I <red>begin</red>?"
          ]
        }
      }
    ]
  }
}
```