Elasticsearch: Use the analysis-ik plug-in

Last Updated: Oct 16, 2024

analysis-ik is the IK analysis plug-in provided by Alibaba Cloud Elasticsearch. The plug-in provides dictionary-based tokenization and, by default, cannot be removed. It uses its built-in dictionary files by default. You can update these files to improve tokenization and make the results better suited to your business scenarios. The plug-in integrates the features of the IK analysis plug-in for open source Elasticsearch and can dynamically load dictionary files that are stored in Object Storage Service (OSS).

Prerequisites

Your Alibaba Cloud Elasticsearch cluster is in a normal state. You can view the status of the cluster on the Basic Information page of the cluster. For more information, see View the basic information of a cluster.

Note

The Basic Information page and the supported features may differ for clusters of earlier versions. The page and features that you actually see in the console prevail.

Introduction to the analysis-ik plug-in

Tokenizers

The analysis-ik plug-in integrates the following tokenizers:

  • ik_max_word: splits text at the finest granularity. This tokenizer is suitable for term searches.

  • ik_smart: splits text at a coarse granularity. This tokenizer is suitable for phrase searches.
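To compare the two tokenizers on the same input, you can send one `_analyze` request per analyzer from the Dev Tools page of the Kibana console. The following Python sketch only builds and prints the request bodies. The helper name `build_analyze_request` and the sample text are assumptions made for illustration and are not part of the plug-in.

```python
import json


def build_analyze_request(analyzer: str, text: str) -> dict:
    """Return the JSON body of a GET _analyze request for one analyzer."""
    return {"analyzer": analyzer, "text": [text]}


# Illustrative sample text; replace it with text from your own business data.
SAMPLE_TEXT = "中华人民共和国国歌"

# One request body per tokenizer integrated by the analysis-ik plug-in.
bodies = {
    name: build_analyze_request(name, SAMPLE_TEXT)
    for name in ("ik_max_word", "ik_smart")
}

for name, body in bodies.items():
    # Paste each printed request into Dev Tools and compare the returned tokens.
    print("GET _analyze")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

In general, ik_max_word is expected to return more and shorter tokens than ik_smart for the same text.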

Dictionary types

The analysis-ik plug-in supports the following dictionary types.

Main dictionary (main.dic)

  • Description: The built-in main dictionary of the analysis-ik plug-in contains more than 270,000 Chinese words. When you create an index in an Elasticsearch cluster, you can specify a main dictionary. This way, if you write data that contains a token in the main dictionary to the index, the system writes the data to the index, and you can use the token as a keyword to query the data in the index.

  • Dictionary file requirement: A dictionary file is a DIC file that is encoded in UTF-8. In a dictionary file, each row contains one word.

Stopword dictionary (stopword.dic)

  • Description: The built-in stopword dictionary of the analysis-ik plug-in contains English stopwords, such as a, the, and, at, and but. When you create an index in an Elasticsearch cluster, you can specify a stopword dictionary. This way, if you write data that contains a token in the stopword dictionary to the index, the system filters out the token, and the token is not included in an inverted index.

  • Dictionary file requirement: A dictionary file is a DIC file that is encoded in UTF-8. In a dictionary file, each row contains one word.

Preposition dictionary (preposition.dic)

  • Description: The dictionary contains prepositions.

  • Dictionary file requirement: -

  • Update method: Standard update

Quantifier dictionary (quantifier.dic)

  • Description: The dictionary contains quantifiers and unit-related words.

  • Dictionary file requirement: -

  • Update method: Standard update

suffix.dic

  • Description: The dictionary contains suffixes.

  • Dictionary file requirement: -

  • Update method: Updates are not supported.

surname.dic

  • Description: The dictionary contains surnames.

  • Dictionary file requirement: -

  • Update method: Updates are not supported.
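The file requirement for the main and stopword dictionaries (UTF-8 encoding, one word per row) can be met with a few lines of Python. This is a minimal sketch. The helper name `write_dic` is an assumption, and stripping whitespace and dropping duplicates are conveniences added here, not requirements stated for the plug-in.

```python
from pathlib import Path


def write_dic(path, words):
    """Write words to a UTF-8 dictionary file, one word per row.

    Returns the list of words that were actually written.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        # Skip empty rows and repeated words (a convenience, not a plug-in rule).
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    Path(path).write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return cleaned
```

For example, `write_dic("my_words.dic", ["云计算", "对象存储"])` produces a file with two rows that you can then upload in the console.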

Dictionary update methods

The analysis-ik plug-in supports two update methods for IK dictionaries: standard update and rolling update. The following table describes the two methods.

Standard update

  • Application mode: This method updates the dictionaries on all nodes in an Elasticsearch cluster. The update can take effect only after the cluster is restarted.

  • Loading mode: The system sends an uploaded dictionary file to all nodes in an Elasticsearch cluster and restarts the nodes to load the file.

  • Recommended use scenario:

    • You want to replace the default dictionary file or delete content from the default dictionary file.

    • You want to update the preposition or quantifier dictionary file.

Rolling update

  • Application mode:

    • If you upload a dictionary file to an Elasticsearch cluster and the file has the same name as the original dictionary file, the cluster is not restarted after the upload. Dictionaries are automatically loaded during the running of the cluster.

    • If you upload a dictionary file for the first time, add or delete a dictionary file, or the name of a dictionary file is changed, a cluster restart is triggered to load dictionary configurations.

  • Loading mode:

    • If the content of a dictionary file changes, all nodes in the cluster automatically load the uploaded file to update dictionaries.

    • If the dictionary file list or dictionary name changes, the cluster needs to be restarted to re-load dictionary configurations.

  • Recommended use scenario: You want to extend the main dictionary file or stopword dictionary file or want to update an extended dictionary file.
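The restart rule for rolling updates described above comes down to comparing file names: a restart is triggered only when the list of dictionary file names changes. The following sketch expresses that rule. The function name `rolling_update_needs_restart` and the file names are hypothetical, and the function only models the documented behavior without calling any Alibaba Cloud API.

```python
def rolling_update_needs_restart(current_files, uploaded_files):
    """Return True if a rolling update changes the dictionary file list.

    A first upload, an added or deleted file, or a renamed file changes the
    set of names and triggers a restart. A content-only change keeps the set
    of names identical, so the nodes reload the file without a restart.
    """
    return set(current_files) != set(uploaded_files)


# Same file name, new content: loaded automatically, no restart.
print(rolling_update_needs_restart(["ext_main.dic"], ["ext_main.dic"]))

# First upload or an additional file: a restart is triggered.
print(rolling_update_needs_restart([], ["ext_main.dic"]))
print(rolling_update_needs_restart(["ext_main.dic"], ["ext_main.dic", "ext_stop.dic"]))
```

Checking this before you upload helps you decide whether the change should wait for off-peak hours.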

Update dictionaries

New dictionaries take effect only for data that is inserted or updated after a standard or rolling update. If you also want the new dictionaries to take effect for existing data, you must reindex the existing data.
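One way to reindex existing data is the Elasticsearch `_reindex` API, which copies documents from a source index into a destination index so that they are analyzed again at write time. The sketch below only builds and prints the request body. The index names `articles_v1` and `articles_v2` are hypothetical, and the sketch assumes that you have already created the destination index with the mappings and analyzers that you need.

```python
import json


def build_reindex_request(source_index: str, dest_index: str) -> dict:
    """Return the JSON body of a POST _reindex request."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical index names, used for illustration only.
reindex_body = build_reindex_request("articles_v1", "articles_v2")

print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```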

Standard update

You can perform a standard update to replace an original dictionary file of an Elasticsearch cluster. The replacement can take effect only after the cluster is restarted.

Warning

A standard update triggers a cluster restart. To ensure that your business is not affected, we recommend that you perform the update during off-peak hours.

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, choose Configuration and Management > Plug-ins.

  5. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.

  6. In the Configure IK Dictionaries - Standard Update panel, click Edit on the right side of the dictionary that you want to update, add or replace the related dictionary file, and then click Save.

    You can use one of the following methods to update a dictionary file:

    • Upload On-premises File: Click the upload area and select the file that you want to upload from your on-premises machine. Alternatively, drag the file that you want to upload from your on-premises machine to the upload area.

    • Upload OSS File: Configure the Bucket Name and File Name parameters, and click Add.

      • Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster.

      • The dictionary file that you specify cannot be automatically updated. If the content of the dictionary file that is stored in OSS changes, you must perform a standard update to make the changes take effect.

  7. Click OK.

    After the cluster is restarted, the dictionary file is updated.

  8. Optional. Log on to the Kibana console of the cluster and check whether the new dictionary file takes effect.

    Note

    For information about how to log on to the Kibana console of a cluster, see Log on to the Kibana console.

    On the Dev Tools page of the Kibana console, run the following command:

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }
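The verification in step 8 amounts to checking whether a word from the new dictionary file appears as a single token in the `_analyze` response. The sketch below performs that check on a response that has already been parsed into a Python dict. The helper names and the sample response are hypothetical. Only the top-level `tokens` list and the `token` field of each entry are relied on.

```python
def tokens_in(response: dict) -> list:
    """Return the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def contains_token(response: dict, word: str) -> bool:
    """Return True if the word was emitted as a token of its own."""
    return word in tokens_in(response)


# Hypothetical response for a word that was added to the main dictionary.
sample_response = {
    "tokens": [
        {
            "token": "云计算",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0,
        }
    ]
}

print(contains_token(sample_response, "云计算"))
```

For a word that was added to the stopword dictionary, the expectation is reversed: the word should no longer appear among the returned tokens.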

Rolling update

You can update extended dictionaries to extend the main dictionary and stopword dictionary. If the dictionary file name and the number of dictionary files remain unchanged, the system does not restart the cluster.

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, choose Configuration and Management > Plug-ins.

  5. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Rolling Update in the Actions column.

  6. In the Configure IK Dictionaries - Rolling Update panel, click Edit on the right side of the dictionary that you want to update, upload a dictionary file, and then click Save.

    You can use one of the following methods to update a dictionary file:

    • Upload On-premises File: Click the upload area and select the file that you want to upload from your on-premises machine. Alternatively, drag the file that you want to upload from your on-premises machine to the upload area.

    • Upload OSS File: Configure the Bucket Name and File Name parameters, and click Add.

      • Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster.

      • The dictionary file that you specify cannot be automatically updated. If the content of the dictionary file that is stored in OSS changes, you must perform a rolling update to make the changes take effect.

    Note
    • The file name extension must be .dic. The file name must be 1 to 30 characters in length and can contain letters, digits, and underscores (_).

    • To modify an uploaded dictionary file, click the download icon next to the file, download and edit the file, delete the original file from the list, and then upload the edited file.

    • You can upload multiple dictionary files. If the dictionary file names and the number of dictionary files remain unchanged, the system does not restart the cluster, and changes to file content are loaded automatically. If you upload a dictionary file for the first time, or if a dictionary file is added, deleted, or renamed, the cluster is restarted, and the new dictionary file takes effect after the restart is complete. To ensure that your business is not affected, we recommend that you perform the update during off-peak hours.

  7. Click OK.

    The analysis-ik plug-in on the nodes of the Elasticsearch cluster automatically loads the dictionary file. The time that is required by each node to load the dictionary file varies.

  8. Optional. Log on to the Kibana console of the cluster and check whether the new dictionary file takes effect.

    Note

    For information about how to log on to the Kibana console of a cluster, see Log on to the Kibana console.

    Nodes load the dictionary file at different times, so run the following command several times on the Dev Tools page of the Kibana console to confirm the result:

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }
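The file name rules in the note for step 6 can be checked locally before you upload a file. The following sketch makes two assumptions that the note does not state explicitly: the limit of 1 to 30 characters applies to the part of the name before the .dic extension, and "letters" means ASCII letters. The function name `is_valid_dic_name` is hypothetical.

```python
import re

# Assumption: 1 to 30 ASCII letters, digits, or underscores, followed by ".dic".
_DIC_NAME = re.compile(r"[A-Za-z0-9_]{1,30}\.dic")


def is_valid_dic_name(file_name: str) -> bool:
    """Return True if the file name satisfies the assumed naming rules."""
    return _DIC_NAME.fullmatch(file_name) is not None


print(is_valid_dic_name("ext_main_2024.dic"))  # letters, digits, underscores
print(is_valid_dic_name("ext-main.dic"))       # hyphen is not an allowed character
print(is_valid_dic_name("ext_main.txt"))       # wrong extension
```

If the console rejects a name that this check accepts, the console rules prevail and the assumptions above should be adjusted.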

References

  • API operation for performing a rolling update on IK dictionaries: UpdateHotIkDicts

  • API operation for performing a standard update on IK dictionaries: UpdateDict