Elasticsearch: Use the analysis-ik plug-in

Last Updated: Aug 19, 2025

The analysis-ik plug-in is a Chinese tokenization extension for Alibaba Cloud Elasticsearch (ES). It includes several built-in default dictionaries that you can use directly, and you can customize these default dictionaries or add new ones to optimize tokenization and make the results more suitable for your scenarios. The plug-in also supports dynamic loading of dictionary files from Object Storage Service (OSS), which enables remote dictionary management and improves operational efficiency. This topic describes the tokenization rules and dictionary types of the analysis-ik plug-in, and explains how to update dictionaries and use the plug-in.

Background information

The analysis-ik plug-in consists of three main components: tokenizers, dictionary files, and dictionary update methods.

  • Tokenizer: Splits Chinese text into meaningful words (tokens) and determines the tokenization granularity.

  • Dictionary files: Provide the vocabulary base that the tokenizer uses for tokenization. These files support extension and customization.

  • Dictionary update methods: The plug-in supports cold and hot updates, which lets you flexibly adjust dictionaries as needed and ensure that tokenization meets your business requirements.

Tokenization rules

The analysis-ik plug-in supports the following tokenization rules:

  • ik_max_word: Splits text at the finest granularity, which makes it suitable for term queries.

  • ik_smart: Splits text at a coarse granularity, which makes it suitable for phrase queries.
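
To see the difference between the two rules, you can run them against the same text in Kibana Dev Tools. The following is a minimal sketch that reuses the sample text from the verification steps later in this topic; in practice, you would pass your own (typically Chinese) text.

# Fine-grained rule: typically produces more, overlapping tokens.
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "computer Chinese character input method"
}

# Coarse-grained rule: typically produces fewer, longer tokens.
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "computer Chinese character input method"
}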

Dictionary types

The following table describes the dictionary types supported by the analysis-ik plug-in.

| Dictionary type | Description | Dictionary file requirement | Supported update methods |
| --- | --- | --- | --- |
| Main dictionary | The default built-in main dictionary is main.dic, which contains more than 270,000 Chinese words. If you specify the main dictionary when you create an ES index, the cluster matches the data written to the index against the words in the main dictionary and indexes the matched words, which you can then retrieve by using the corresponding keywords. | One word per line, saved as a UTF-8-encoded .dic file. | Cold update and hot update |
| Stopword dictionary | The default built-in stopword dictionary is stopword.dic, which contains English stopwords such as a, the, and, at, and but. If you specify a stopword dictionary when you create an ES index, the cluster matches the data written to the index against the words in the stopword dictionary. The matched words are filtered out and do not appear in the inverted index. | One word per line, saved as a UTF-8-encoded .dic file. | Cold update and hot update |
| Preposition dictionary | The default built-in preposition dictionary is preposition.dic. It stores prepositions to help the tokenizer split prepositions from the words that follow them. | Not applicable | Cold update |
| Quantifier dictionary | The default built-in quantifier dictionary is quantifier.dic. It stores unit-related words and quantifiers to help the tokenizer identify combinations of quantifiers and nouns. | Not applicable | Cold update |
| suffix.dic | Stores suffixes to help the tokenizer split words with suffixes. | Not applicable | Updates are not supported. |
| surname.dic | Stores Chinese surnames to help the tokenizer recognize personal names. | Not applicable | Updates are not supported. |
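
For reference, a dictionary file is plain UTF-8 text with a .dic extension and one entry per line, as required above. The following is a minimal sketch of a custom extension dictionary; the file name my_ext.dic and its entries are hypothetical placeholders.

阿里云
云计算
分词插件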

Dictionary update methods

If the default dictionaries do not meet your business requirements, you can update them. The analysis-ik plug-in supports the following dictionary update methods.

| Update method | Description | Scenarios |
| --- | --- | --- |
| Cold update | Dictionary changes take effect after the ES cluster restarts, and the update applies to the entire cluster. The system sends the uploaded dictionary file to the ES nodes and then restarts the nodes to load the file. After the restart, the new configuration takes effect. | Replace the default dictionary file or delete content from it. Update the preposition or quantifier dictionary file. |
| Hot update | If only the content of an existing dictionary changes, no cluster restart is triggered: the cluster loads the new dictionary at runtime. If the name of a dictionary or the list of dictionary files changes (that is, a file is added or deleted), a cluster restart is triggered to reload the dictionary configuration. After the restart, the new configuration takes effect. Note: Only the main and stopword dictionaries can be updated this way. | Extend the main or stopword dictionaries by adding extension dictionaries alongside the defaults. Change the content of existing main or stopword dictionary files, including default and extension dictionaries. |

Prerequisites

  • Make sure that the instance is in the Normal status. You can view the instance status on the Basic Information page of the instance.

    Note

    The operations in this topic are demonstrated on an Alibaba Cloud ES 7.10.0 instance. The console interface and supported features may vary for different versions. The actual console interface takes precedence.

  • (Optional) To update a dictionary, complete the following preparations:

    • To update a dictionary by using the Upload OSS File method, first create an OSS bucket and upload the required dictionary file to it.

    • To update a dictionary by using the Upload Local File method, first save the required dictionary file to your computer.

Update IK dictionaries

If the default IK dictionaries do not meet your business requirements, you can update them. Before you update, familiarize yourself with the corresponding update methods. For an index that is configured with IK tokenization, an updated dictionary takes effect only on new data, which includes newly added and updated data. If you want the update to take effect for all data, you must re-create the index.
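
Re-creating an index is typically done with the _reindex API. The following is a minimal sketch, assuming a source index named my_index and a destination index named my_index_v2 that you have already created with the updated analyzer; both index names are hypothetical.

# Hypothetical sketch: copy all documents into a new index so that the
# updated dictionary applies to existing data, not only to new writes.
POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}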

Cold update

To perform a cold update on an IK dictionary, follow these steps:

Warning

A cold update (Standard Update in the console) triggers a cluster restart. To ensure that your business is not affected, we recommend that you perform the update during off-peak hours.

  1. Go to the instance details page.

    1. Log on to the Alibaba Cloud Elasticsearch console.

    2. In the left navigation pane, click Elasticsearch Instances.

    3. In the top menu bar, select a resource group and a region.

    4. In the Elasticsearch Instances list, click the ID of the target instance to go to its details page.

  2. Go to the cold update page for the analysis-ik plug-in.

    1. In the left navigation pane, choose Configuration and Management > Plug-in Configuration.

    2. On the Built-in Plug-ins tab, find the analysis-ik plug-in, and then click Standard Update in the Actions column.

  3. Perform the cold update.

    1. In the IK Dictionary Configuration - Cold Update dialog box, find the dictionary that you want to update and click Edit. Follow the on-screen instructions to upload the dictionary file and click Save.

      You can upload a dictionary file in one of the following ways:

      • Upload Local File: Click the upload icon or drag a local file to the dialog box to upload the file.

      • Upload OSS File: Enter the bucket name and the name of the dictionary file. Then, click Add.

        Note

        • The bucket and the Alibaba Cloud ES instance must be in the same region.

        • Automatic synchronization of dictionary files in OSS is not supported. If the content of the source dictionary file in OSS changes, you must manually perform an IK dictionary update to apply the changes.

    2. To restart the instance, select the risk notification check box and click OK.

      After the ES instance restarts, the dictionary file is updated.

  4. (Optional) Verify that the dictionary is effective.

    1. Log on to the Kibana console.

    2. Click the menu icon in the upper-left corner and choose Management > Dev Tools to open the Code Editor page.

      For example, run the following code to split the input text "computer Chinese character input method" at a coarse granularity.

      Note

      When you run this code, replace the value of text with a word from your dictionary.

      GET _analyze
      {
        "analyzer": "ik_smart",
        "text": "computer Chinese character input method"
      }

      The expected result is as follows.

      {
        "tokens" : [
          {
            "token" : "computer",
            "start_offset" : 0,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 0
          },
          {
            "token" : "Chinese character input",
            "start_offset" : 3,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 1
          },
          {
            "token" : "method",
            "start_offset" : 7,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 2
          }
        ]
      }

Hot update

To perform a hot update on an IK dictionary, follow these steps:

Note

A cluster restart is not required if only the file content changes. If you change the number of files or the file names, the cluster must be restarted. To avoid business interruptions, perform this operation during off-peak hours. After the restart, the dictionary takes effect automatically.

  1. Go to the instance details page.

    1. Log on to the Alibaba Cloud Elasticsearch console.

    2. In the left navigation pane, click Elasticsearch Instances.

    3. In the top menu bar, select a resource group and a region.

    4. In the Elasticsearch Instances list, click the ID of the target instance to go to its details page.

  2. Go to the hot update page for the analysis-ik plug-in.

    1. In the left navigation pane, choose Configuration and Management > Plug-in Configuration.

    2. On the Built-in Plug-ins tab, in the Actions column for the analysis-ik plug-in, click Rolling Update.

  3. Perform the hot update.

    1. In the IK Dictionary Configuration - Hot Update dialog box, click Edit for the target dictionary. Follow the on-screen instructions to upload the dictionary file and click Save.

      You can upload a dictionary file in one of the following ways:

      • Upload Local File: Click the upload icon or drag a local file to the dialog box to upload the file.

      • Upload OSS File: Enter the bucket name and the name of the dictionary file. Then, click Add.

        Note

        • The bucket and the Alibaba Cloud ES instance must be in the same region.

        • Automatic synchronization of dictionary files in OSS is not supported. If the content of the source dictionary file in OSS changes, you must manually perform an IK dictionary update to apply the changes.

      Note

      • You can upload multiple dictionary files. The file extension must be .dic. A file name can contain uppercase letters, lowercase letters, digits, and underscores (_) and must be 30 characters or less.

      • To modify an uploaded dictionary file, click the download icon to the right of the file to download and modify it. Then, delete the original file and upload the modified version. You must click Save after you delete the original file. Otherwise, an error message indicating that a file with the same name already exists appears when you upload the modified file.

    2. Click OK and wait for the dictionary to load on the ES nodes.

      The plug-in on the Alibaba Cloud ES nodes can automatically load dictionary files, but the time it takes for each node to retrieve the file varies. After the file is loaded, the dictionary takes effect. This process may take some time.

  4. (Optional) Verify that the dictionary is effective.

    1. Log on to the Kibana console.

    2. Click the menu icon in the upper-left corner and choose Management > Dev Tools to open the Code Editor page.

      For example, run the following code to split the input text "computer Chinese character input method" at a coarse granularity.

      Note

      When you run this code, replace the value of text with a word from your dictionary.

      GET _analyze
      {
        "analyzer": "ik_smart",
        "text": "computer Chinese character input method"
      }

      The expected result is as follows.

      {
        "tokens" : [
          {
            "token" : "computer",
            "start_offset" : 0,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 0
          },
          {
            "token" : "Chinese character input",
            "start_offset" : 3,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 1
          },
          {
            "token" : "method",
            "start_offset" : 7,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 2
          }
        ]
      }

Use the analysis-ik plug-in

This example shows how to use the IK tokenizer and the Pinyin filter to tokenize specified text.

  1. Go to the Kibana Dev Tools page of the ES instance.

    1. Log on to the Kibana console.

    2. Click the menu icon in the upper-left corner and choose Management > Dev Tools to open the code editor.

  2. Create an index and configure the IK tokenizer and Pinyin filter.

    On the Dev Tools page, run the following command to create the ik_pinyin index and a custom analyzer named ik_pinyin_analyzer. This analyzer uses the ik_max_word fine-grained tokenization rule and a Pinyin filter to convert Chinese words into Pinyin.

    Note

    The Pinyin filter runs after Chinese tokenization is complete. It first tokenizes the Chinese text and then converts the tokenization results into Pinyin for output.

    PUT ik_pinyin
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_pinyin": {
              "type": "pinyin",
              "keep_separate_first_letter": false,
              "keep_full_pinyin": true,
              "keep_original": true,
              "limit_first_letter_length": 16,
              "lowercase": true,
              "remove_duplicated_term": true
            }
          },
          "analyzer": {
            "ik_pinyin_analyzer": {
              "type": "custom",
              "tokenizer": "ik_max_word",
              "filter": ["my_pinyin"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "text": {
            "type": "text",
            "analyzer": "ik_pinyin_analyzer"
          }
        }
      }
    }

    The core parameters are described as follows:

    • Pinyin Filter (filter)

      Note

      For more information about the configurations of the Pinyin analysis plug-in, see Pinyin Analysis for Elasticsearch.

      | Parameter | Description |
      | --- | --- |
      | my_pinyin | The name of the Pinyin filter that you define. |
      | type | Set to pinyin to specify the Pinyin filter. |
      | keep_separate_first_letter | Set to false so that the first letter of each character is not kept as a separate token. |
      | keep_full_pinyin | Set to true to keep the full Pinyin of each character. |
      | keep_original | Set to true to keep the original input text. |
      | limit_first_letter_length | Set to 16 to limit the first-letter string to a maximum of 16 characters. |
      | lowercase | Set to true to output Pinyin in lowercase. |
      | remove_duplicated_term | Set to true to remove duplicate terms, which avoids results such as "zh, zh". |

    • Analyzer (analyzer):

      | Parameter | Description |
      | --- | --- |
      | ik_pinyin_analyzer | The name of the analyzer that you define. |
      | type | Set to custom to specify a custom analyzer. |
      | tokenizer | Set to ik_max_word to split text at the finest granularity. |
      | filter | Set to my_pinyin to apply the my_pinyin Pinyin filter. |

      If the command succeeds, an acknowledgement is returned, which indicates that the index was created.

  3. Verify the tokenization results.

    You can run the following code to tokenize the input text "This is a test".

    GET ik_pinyin/_analyze
    {
      "text": "This is a test",
      "analyzer": "ik_pinyin_analyzer"
    }

    The expected result is as follows.

    {
      "tokens" : [
        {
          "token" : "zhe",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "This is",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "zs",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "shi",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "CN_WORD",
          "position" : 1
        },
        {
          "token" : "ge",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "CN_CHAR",
          "position" : 2
        },
        {
          "token" : "a",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "CN_CHAR",
          "position" : 2
        },
        {
          "token" : "g",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "CN_CHAR",
          "position" : 2
        },
        {
          "token" : "ce",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "CN_WORD",
          "position" : 3
        },
        {
          "token" : "shi",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "CN_WORD",
          "position" : 4
        },
        {
          "token" : "test",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "CN_WORD",
          "position" : 4
        },
        {
          "token" : "cs",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "CN_WORD",
          "position" : 4
        }
      ]
    }
    
