analysis-ik is an IK analysis plug-in provided by Alibaba Cloud Elasticsearch. This plug-in cannot be removed by default. The plug-in integrates the features of the IK analysis plug-in provided by open source Elasticsearch, can dynamically load the dictionaries that are stored in Object Storage Service (OSS), and provides the standard update and rolling update methods to update dictionaries. This topic describes how to use the analysis-ik plug-in.

Prerequisites

Your Alibaba Cloud Elasticsearch cluster is in a normal state. You can view the status of the cluster on the Basic Information page of the cluster. For more information, see View the basic information of a cluster.

Limits

You cannot perform a standard update for IK dictionaries on Elasticsearch clusters of V7.16 or later, or on Elasticsearch V7.10 clusters that are deployed based on the new cloud-native control architecture in some regions.

Precautions

New dictionaries take effect only for data that is inserted or updated after a standard or rolling update. If you also want the new dictionaries to take effect for existing data, you must reindex the existing data.
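
One way to reindex existing data is the Elasticsearch `_reindex` API. The following is a minimal sketch that only builds the request body. The index names `my_index` and `my_index_v2` are placeholders, and the destination index is assumed to already exist with the same mappings and analyzers as the source index.

```python
import json


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Build the body of a `POST _reindex` request.

    Copying documents into another index re-analyzes every text field, so the
    updated IK dictionaries are applied to the existing data as well.
    """
    if source_index == dest_index:
        raise ValueError("source and destination indexes must differ")
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Index names are placeholders for illustration only.
    print(json.dumps(build_reindex_body("my_index", "my_index_v2"), indent=2))
```

You can paste the printed body after `POST _reindex` in the Kibana console.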

Dictionary update methods

The analysis-ik plug-in supports two update methods for IK dictionaries: standard update and rolling update. The following table describes the two methods.

Update method: Standard update
  • Application mode: This method updates the dictionaries on all nodes in an Elasticsearch cluster. The update can take effect only after the cluster is restarted.
  • Loading mode: The system automatically sends an uploaded dictionary file to all nodes in an Elasticsearch cluster, modifies the IKAnalyzer.cfg.xml file, and then restarts the nodes to load the file.
  • Description: You can use the standard update method to update the built-in IK main dictionary and stopword list of the analysis-ik plug-in. In the Standard Update panel of the Elasticsearch console, you can view the built-in main dictionary SYSTEM_MAIN.dic and the built-in stopword list SYSTEM_STOPWORD.dic.
    Important If you want to customize the IKAnalyzer.cfg.xml file, you can download dictionary files, modify the files, and upload them. For more information, see Update a dictionary file.

Update method: Rolling update
  • Application mode: The first time you upload a dictionary file, the dictionaries on all nodes in an Elasticsearch cluster are updated. The update can take effect only after the cluster is restarted. If the name of the dictionary file that you upload is the same as that of the existing dictionary file, the cluster is not restarted. The dictionaries are automatically loaded during the running of the cluster.
  • Loading mode: If the content of a dictionary file changes, you can use this method to update the dictionaries on all nodes in an Elasticsearch cluster. After you upload the latest dictionary file, the nodes automatically load the file.
  • Description: If the list of dictionary files changes when you perform a rolling update, the changes are synchronized to the IKAnalyzer.cfg.xml file, and all nodes in the cluster need to reload dictionary configurations. For example, if you upload a new dictionary file or delete an existing dictionary file when you perform a rolling update, all nodes in the cluster reload dictionary configurations.
    The first time you upload a dictionary file, the system automatically modifies the IKAnalyzer.cfg.xml file. After dictionaries are updated, the cluster must be restarted for the update to take effect.
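
The restart behavior of the two update methods described above can be summarized as a small decision function. This is only a sketch of one reading of the rules. The function name and parameters are hypothetical, and treating every change to the list of dictionary file names as restart-triggering during a rolling update is an assumption of the sketch, not a documented guarantee.

```python
from typing import Iterable


def restart_expected(method: str,
                     existing_files: Iterable[str],
                     uploaded_files: Iterable[str]) -> bool:
    """Estimate whether saving a dictionary update restarts the cluster.

    method: "standard" or "rolling" (labels chosen for this sketch).
    existing_files: dictionary file names currently configured.
    uploaded_files: dictionary file names configured after the update.
    """
    if method == "standard":
        # A standard update always restarts the cluster.
        return True
    if method == "rolling":
        # Assumption: a restart is needed only when the set of file names
        # changes (first upload, added file, or deleted file). Replacing a
        # file under the same name is loaded while the cluster runs.
        return set(existing_files) != set(uploaded_files)
    raise ValueError(f"unknown update method: {method!r}")


if __name__ == "__main__":
    print(restart_expected("rolling", [], ["dic_0.dic"]))
    print(restart_expected("rolling", ["dic_0.dic"], ["dic_0.dic"]))
```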

Instructions on dictionaries

Dictionary type: Main dictionary
  • Description: When you create an index in an Elasticsearch cluster, you can specify a main dictionary. If data that you write to the index contains a token from the main dictionary, the system writes the data to the index, and you can use the token as a keyword to query the data in the index.
  • Limit:
    • A dictionary file is a DIC file that is encoded in UTF-8. In a dictionary file, each row contains one word.
    • The name of a dictionary file cannot exceed 30 characters in length and can contain letters, digits, and underscores (_).
    • You are not allowed to upload a file that is named the same as an existing dictionary file for a main dictionary or stopword list.
  • Update method: Standard update

Dictionary type: Stopword list
  • Description: When you create an index in an Elasticsearch cluster, you can specify a stopword list. If data that you write to the index contains a token from the stopword list, the system filters out the token.
    Alibaba Cloud Elasticsearch provides a built-in stopword list. The list contains the following tokens: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. If you want to remove some tokens from the stopword list, you can download the related dictionary file, remove the tokens from the file, rename the file to SYSTEM_STOPWORD.dic, and then upload the file. For more information, see Update a dictionary file.
    Note If you want to use Chinese stopwords, you can refer to the extra_stopword.dic file in the config directory of the analysis-ik plug-in provided by open source Elasticsearch. The extra_stopword.dic file contains the following Chinese tokens: 也, 了, 仍, 从, 以, 使, 则, 却, 又, 及, 对, 就, 并, 很, 或, 把, 是, 的, 着, 给, 而, 被, 让, 在, 还, 比, 等, 当, 与, 于, 但.
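
The file limits above can be checked locally before you upload a dictionary. The following is a minimal sketch with a hypothetical helper, `validate_dictionary`. It assumes that the 30-character limit includes the .dic extension, that "letters" means ASCII letters, and that a row containing whitespace-separated parts counts as more than one word. None of these details are confirmed by the limits themselves.

```python
import re
from typing import List

MAX_NAME_LENGTH = 30  # assumed to include the ".dic" extension
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+\.dic")


def validate_dictionary(file_name: str, raw: bytes) -> List[str]:
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    if len(file_name) > MAX_NAME_LENGTH:
        problems.append("file name is longer than 30 characters")
    if not NAME_PATTERN.fullmatch(file_name):
        problems.append(
            "file name must contain only letters, digits, and underscores "
            "and end in .dic"
        )
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    for number, line in enumerate(text.splitlines(), start=1):
        # One word per row: flag rows that hold several whitespace-separated parts.
        if len(line.split()) > 1:
            problems.append(f"line {number} contains more than one word")
    return problems


if __name__ == "__main__":
    print(validate_dictionary("dic_0.dic", "阿里云\n词典\n".encode("utf-8")))
```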

Perform a standard update for IK dictionaries

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, choose Configuration and Management > Plug-ins.
  5. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.
  6. In the Standard Update panel, click Configure in the lower-left corner.
  7. Select a method to upload a dictionary file from the drop-down list that is below the IK Main Dictionary section. Then, upload the dictionary file based on the following instructions.
    You can select the Upload DIC File or Add OSS File method.
    • Upload DIC File: If you select this method, click Upload DIC File and select the file that you want to upload from your on-premises machine.
    • Add OSS File: If you select this method, configure the Bucket Name and File Name parameters, and click Add.

      Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster and the file that you want to upload is a DIC file. If the content of the dictionary that is stored in OSS changes, you must upload the dictionary file again.

    Warning The following operation restarts your Elasticsearch cluster. Before you perform this operation, make sure that the restart does not affect your business.
  8. Scroll down to the lower part of the panel, select This operation will restart the cluster. Continue?, and then click Save.
    If you use the standard update method, the system restarts your cluster regardless of whether you upload a new dictionary file, remove a dictionary file, or update dictionary content.
  9. After the cluster is restarted, log on to the Kibana console of the cluster and run the following command to check whether the new dictionary file is in effect.
    For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }

    If the dictionary file is in effect, the system splits the value of text based on the tokenization rules of the tokenizer ik_smart and returns the results.

    Note The analysis-ik plug-in integrates the following tokenizers:
    • ik_max_word: splits text at the finest granularity. This tokenizer is suitable for term searches.
    • ik_smart: splits text at a coarse granularity. This tokenizer is suitable for phrase searches.
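
If you call the `_analyze` API from a script instead of reading the output by eye, the check can be reduced to looking for the new word in the returned token list. The sketch below assumes the standard `_analyze` response shape, a `tokens` array whose entries carry a `token` field. The sample response and the helper names are hypothetical and serve as illustration only.

```python
def analyzed_tokens(response: dict) -> list:
    """Extract the token strings from an `_analyze` response body."""
    return [entry["token"] for entry in response.get("tokens", [])]


def dictionary_in_effect(response: dict, expected_word: str) -> bool:
    """Return True when the expected word was emitted as a single token."""
    return expected_word in analyzed_tokens(response)


if __name__ == "__main__":
    # Hypothetical response for illustration; real offsets and types will differ.
    sample = {
        "tokens": [
            {"token": "云搜索", "start_offset": 0, "end_offset": 3,
             "type": "CN_WORD", "position": 0},
            {"token": "服务", "start_offset": 3, "end_offset": 5,
             "type": "CN_WORD", "position": 1},
        ]
    }
    print(dictionary_in_effect(sample, "云搜索"))
```

A word that is missing from the dictionaries is usually split into smaller tokens, so it does not appear in the list as a single entry.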

Perform a rolling update for IK dictionaries

  1. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Rolling Update in the Actions column.
  2. In the Rolling Update panel, click Configure in the lower-left corner.
  3. Select a method to upload a dictionary file from the drop-down list that is below the IK Main Dictionary section. Then, upload the dictionary file based on the following instructions.
    Note You cannot use the rolling update method to modify the built-in main dictionary. If you want to modify the built-in main dictionary, use the standard update method.
    You can select the Upload DIC File or Add OSS File method.
    • Upload DIC File: If you select this method, click Upload DIC File and select the file that you want to upload from your on-premises machine.
    • Add OSS File: If you select this method, configure the Bucket Name and File Name parameters, and click Add.

      Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster and the file that you want to upload is a DIC file. The following steps use a file named dic_0.dic as an example. If the content of the dictionary that is stored in OSS changes, you must upload the related dictionary file again.

    Warning The following operation restarts your Elasticsearch cluster. Before you perform this operation, make sure that the restart does not affect your business.
  4. Scroll down to the lower part of the panel, select This operation will restart the cluster. Continue?, and then click Save. The first time you upload a dictionary file, the system automatically restarts the cluster.

    After you click Save, the system performs a rolling restart for the cluster. After the rolling restart is complete, the new dictionary takes effect.

    If you want to add tokens to or remove tokens from the dictionary, perform the following steps to modify the dic_0.dic file:

  5. In the Rolling Update panel, delete the existing dic_0.dic file and upload a new dictionary file. The name of the new dictionary file must be the same as that of the existing dictionary file.
    The system does not restart the cluster because the dictionary file name remains unchanged.
  6. Click Save.
    The analysis-ik plug-in on the nodes of the Elasticsearch cluster automatically loads the dictionary file. The load time varies by node, and about 2 minutes are required for all nodes to finish loading the file. You can then log on to the Kibana console of your cluster and run the following command to check whether the new dictionary is in effect.
    Note For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }

    If the dictionary file is in effect, the system splits the value of text based on the tokenization rules of the tokenizer ik_smart and returns the results.
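
Because nodes load the file at different times, a scripted check after a rolling update may need to retry for a short while. The following is a minimal polling sketch. The `analyze` argument is a caller-supplied function (not part of any library) that is assumed to run the `_analyze` request above and return the token strings. The default timeout and interval are arbitrary choices for illustration.

```python
import time
from typing import Callable, List


def wait_for_token(
    analyze: Callable[[str], List[str]],
    word: str,
    timeout_seconds: float = 180.0,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `word` comes back as a single token or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # The word counts as loaded once the analyzer keeps it as one token.
        if word in analyze(word):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_seconds)


if __name__ == "__main__":
    # Stand-in for a real request: pretends the dictionary is already loaded.
    print(wait_for_token(lambda text: [text], "new_word", timeout_seconds=1.0))
```

A single successful response shows only that the node that served the request has loaded the file, so repeating the check a few times is a reasonable precaution.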

Update a dictionary file

If you want to update a dictionary file that is uploaded, you can download the file, modify the file, and then upload the file again. This section describes how to download and update the SYSTEM_STOPWORD.dic file in the Standard Update panel.

  1. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.
  2. In the Standard Update panel, click the Download icon next to the SYSTEM_STOPWORD.dic file in the IK Stopword List section.
    Important If you use the standard update method, you can modify the built-in main dictionary or stopword list. You cannot delete the built-in main dictionary or stopword list. You can use the following methods to modify the built-in main dictionary or stopword list:
    • To update the built-in main dictionary, upload a dictionary file named SYSTEM_MAIN.dic. The new dictionary file automatically overwrites the existing file. For more information, see IK Analysis for Elasticsearch.
    • To update the built-in stopword list, upload a file named SYSTEM_STOPWORD.dic. The new file automatically overwrites the existing file. For more information, see IK Analysis for Elasticsearch.
  3. Modify the downloaded file and upload the file again.
    For more information about how to upload the file, see the related steps in the Perform a standard update for IK dictionaries or Perform a rolling update for IK dictionaries section.
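
The modify step can also be scripted. The following sketch removes selected tokens from the content of a downloaded stopword file and keeps one word per line. The function name is hypothetical, and the inline sample stands in for the real file content, which you would read and write as UTF-8 under the name SYSTEM_STOPWORD.dic.

```python
from typing import Iterable


def remove_stopwords(dictionary_text: str, tokens_to_remove: Iterable[str]) -> str:
    """Return the dictionary content without the given tokens, one word per line."""
    unwanted = set(tokens_to_remove)
    kept = []
    for line in dictionary_text.splitlines():
        word = line.strip()
        # Skip blank rows and every token that should no longer be a stopword.
        if word and word not in unwanted:
            kept.append(word)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Inline sample instead of the real downloaded file content.
    sample = "a\nan\nno\nnot\nof\n"
    print(remove_stopwords(sample, ["no", "not"]), end="")
```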

References

  • API operation for performing a rolling update on IK dictionaries: UpdateHotIkDicts
  • API operation for performing a standard update on IK dictionaries: UpdateDict