analysis-ik is an IK analysis plug-in provided by Alibaba Cloud Elasticsearch. This plug-in cannot be removed. In addition to open source features, the plug-in can dynamically load the dictionaries that are stored in Object Storage Service (OSS) and allows you to use the standard or rolling update method to update dictionaries. This topic describes how to use the analysis-ik plug-in.

Background information

The analysis-ik plug-in supports two update methods for IK dictionaries: standard update and rolling update. The two methods differ as follows.

Standard update
  • Application mode: This method updates the dictionaries on all the nodes in an Elasticsearch cluster. The update can take effect only after the cluster is restarted.
  • Loading mode: The system sends an uploaded dictionary file to all the nodes in an Elasticsearch cluster, modifies the IKAnalyzer.cfg.xml file, and then restarts the nodes to load the file.
  • Description: You can use the standard update method to update the built-in IK main dictionary and stopword list of the analysis-ik plug-in. In the Standard Update panel of the Elasticsearch console, you can view the built-in main dictionary SYSTEM_MAIN.dic and the built-in stopword list SYSTEM_STOPWORD.dic.
    Notice If you want to customize the IKAnalyzer.cfg.xml file, you can download dictionary files, modify them, and upload them again. For more information, see Update an uploaded dictionary file.

Rolling update
  • Application mode: The first time you upload a dictionary file, the dictionaries on all the nodes in an Elasticsearch cluster are updated. The update can take effect only after the cluster is restarted. If the dictionary file that you upload has the same name as the existing dictionary file, the cluster is not restarted. The dictionaries are directly loaded when the cluster is running.
  • Loading mode: If the content of a dictionary file changes, you can use this method to update the dictionaries on all the nodes in an Elasticsearch cluster. After you upload the latest dictionary file, the nodes automatically load the file.
    If the dictionary file list changes when you perform a rolling update, all the nodes in the cluster need to reload dictionary configurations. For example, when you upload a new dictionary file or delete an existing dictionary file, the changes are synchronized to the IKAnalyzer.cfg.xml file.
  • Description: The first time you upload a dictionary file, the system modifies the IKAnalyzer.cfg.xml file. After the dictionaries are updated, the cluster must be restarted to make the update take effect.

Notice

New dictionaries apply only to data that is inserted after a standard or rolling update. If you want to apply the new dictionaries to both existing data and new data, you must reindex the existing data.

If you use the standard update method, you can modify the built-in main dictionary or stopword list. However, you cannot delete the built-in main dictionary or stopword list. You can use the following methods to modify the built-in main dictionary or stopword list:
  • To update the built-in main dictionary, upload a dictionary file named SYSTEM_MAIN.dic. The new dictionary file automatically overwrites the existing file. For more information, see IK Analysis for Elasticsearch.
  • To update the built-in stopword list, upload a file named SYSTEM_STOPWORD.dic. The new file automatically overwrites the existing file. For more information, see IK Analysis for Elasticsearch and Configure a stopword list.
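
The notice above says that existing data keeps the tokens of the old dictionary until it is reindexed. The following Python sketch shows two ways this could be requested, assuming the cluster exposes the standard Elasticsearch `_update_by_query` and `_reindex` APIs. The index names `my_index` and `my_index_v2` and all helper names are hypothetical. The sketch only builds the requests and prints them in Kibana console form; it does not send anything to a cluster.

```python
import json


def build_update_by_query(index):
    """Request that reindexes every document of `index` in place,
    so that its text fields are analyzed again."""
    return {
        "method": "POST",
        "path": f"/{index}/_update_by_query?conflicts=proceed",
        "body": None,
    }


def build_reindex(source_index, dest_index):
    """Request that copies all documents into a separate destination index."""
    return {
        "method": "POST",
        "path": "/_reindex",
        "body": {
            "source": {"index": source_index},
            "dest": {"index": dest_index},
        },
    }


def to_console(request):
    """Render a request in the format used by the Kibana console."""
    lines = [f"{request['method']} {request['path'].lstrip('/')}"]
    if request["body"] is not None:
        lines.append(json.dumps(request["body"], indent=2))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    print(to_console(build_update_by_query("my_index")))
    print(to_console(build_reindex("my_index", "my_index_v2")))
```

Reindexing in place keeps the index name but rewrites every document. Copying to a second index leaves the original untouched until you switch over, which may be preferable for large indexes.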

Prerequisites

Your Elasticsearch cluster is in a normal state. You can check the cluster status on the Basic Information page of the cluster. For more information, see View the basic information of a cluster.

Perform a standard update for IK dictionaries

  1. Log on to the Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane, click Plug-ins.
  5. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.
  6. In the Standard Update panel, click Configure in the lower-right corner.
  7. Select a method to upload a dictionary file from the drop-down list that is below the IK Main Dictionary section. Then, upload the dictionary file based on the following instructions.
    You can select the Upload DIC File or Add OSS File method.
    • Upload DIC File: If you select this method, click Upload DIC File and select the file that you want to upload from your on-premises machine.
    • Add OSS File: If you select this method, configure Bucket Name and File Name, and click Add.

      Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster and the file that you want to upload is a DIC file. If the content of the dictionary that is stored in OSS changes, you must manually upload the dictionary file again.

    Warning The following operation restarts your Elasticsearch cluster. Before you perform this operation, make sure that the restart does not affect your business.
  8. Scroll down to the lower part of the panel, select This operation will restart the cluster. Continue?, and then click Save.
    If you use the standard update method, the system restarts your cluster regardless of whether you upload a new dictionary file, remove a dictionary file, or update dictionary content.
  9. After the cluster is restarted, log on to the Kibana console of the cluster and run the following command to check whether the new dictionary file is in effect.
    For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }
    Note The analysis-ik plug-in provides two tokenizers, ik_smart and ik_max_word, which differ as follows:
    • ik_max_word: splits text at the finest granularity. For example, National Anthem of the People's Republic of China is split into National Anthem, the People's Republic of China, People's Republic, Republic of China, People, Republic, and China. This tokenizer is suitable for term searches.
    • ik_smart: splits text at a coarse granularity. For example, National Anthem of the People's Republic of China is split into National Anthem and the People's Republic of China. This tokenizer is suitable for phrase searches.
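
To see how the two tokenizers differ on your own text, you could send the same text to `_analyze` twice and compare the results. The following Python sketch assumes the standard `_analyze` response shape, a `tokens` array whose entries carry a `token` field. The helper names are hypothetical, and the sample responses are written by hand from the example in the note above; they are not captured from a cluster.

```python
def build_analyze_request(analyzer, text):
    """Body for GET _analyze, in the same shape as the command above."""
    return {"analyzer": analyzer, "text": [text]}


def tokens_from_response(response):
    """Extract the token strings from an _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def only_in_fine_grained(max_word_response, smart_response):
    """Tokens that ik_max_word produced but ik_smart did not, in order."""
    smart = set(tokens_from_response(smart_response))
    return [t for t in tokens_from_response(max_word_response) if t not in smart]


# Hand-written responses that mirror the example in the note above.
MAX_WORD_SAMPLE = {"tokens": [{"token": t} for t in [
    "National Anthem", "the People's Republic of China", "People's Republic",
    "Republic of China", "People", "Republic", "China",
]]}
SMART_SAMPLE = {"tokens": [{"token": t} for t in [
    "National Anthem", "the People's Republic of China",
]]}

if __name__ == "__main__":
    print(build_analyze_request("ik_max_word", "text to analyze"))
    print(only_in_fine_grained(MAX_WORD_SAMPLE, SMART_SAMPLE))
```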

Perform a rolling update for IK dictionaries

  1. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Rolling Update in the Actions column.
  2. In the Rolling Update panel, click Configure in the lower-right corner.
  3. Select a method to upload a dictionary file from the drop-down list that is below the IK Main Dictionary section. Then, upload the dictionary file based on the following instructions.
    Note You cannot use the rolling update method to modify the built-in main dictionary. If you want to modify the built-in main dictionary, use the standard update method.
    You can select the Upload DIC File or Add OSS File method.
    • Upload DIC File: If you select this method, click Upload DIC File and select the file that you want to upload from your on-premises machine.
    • Add OSS File: If you select this method, configure Bucket Name and File Name, and click Add.

      Make sure that the bucket you specify resides in the same region as your Elasticsearch cluster and the file that you want to upload is a DIC file. The dic_0.dic file is used in the following operations. If the content of the dictionary that is stored in OSS changes, you must manually upload the dictionary file again.

    Warning The following operation restarts your Elasticsearch cluster. Before you perform this operation, make sure that the restart does not affect your business.
  4. Scroll down to the lower part of the panel, select This operation will restart the instance. Continue?, and then click Save. If this is the first time that you upload a dictionary file, the system automatically restarts the cluster.

    After you click Save, the system performs a rolling restart for the cluster. After the rolling restart is complete, the new dictionary takes effect.

    If you want to add tokens to or remove tokens from the dictionary, perform the following steps to modify the dic_0.dic file:

  5. In the Rolling Update panel, delete the existing dic_0.dic file and upload a new dictionary file. The new dictionary file must have the same name.
    Because the new file has the same name as the existing file, this operation changes only the content of the dictionary in the cluster. The system does not need to restart the cluster for the update to take effect.
  6. Click Save.
    The analysis-ik plug-in on each node of the Elasticsearch cluster automatically loads the dictionary file. The load time varies from node to node, and it takes about 2 minutes for all the nodes to finish loading. You can then log on to the Kibana console and run the following command to verify that the new dictionary is in effect.
    Note For more information, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["Tokens in the new dictionary file"]
    }
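
Because the nodes load the file at different times, one way to check the result is to repeat the `_analyze` request until the expected token appears. The following Python sketch is an assumption-laden illustration: `analyze` is a hypothetical callable that you supply, which sends the given body to `GET _analyze` on your cluster and returns the parsed response. The 180-second timeout and 10-second interval are arbitrary defaults chosen around the 2-minute estimate above.

```python
import time


def wait_for_token(analyze, text, expected_token,
                   timeout_s=180.0, interval_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll _analyze until `expected_token` shows up or the timeout passes.

    `analyze` is a caller-supplied function (hypothetical) that takes the
    request body and returns the parsed _analyze response as a dict.
    Returns True if the token was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        response = analyze({"analyzer": "ik_smart", "text": [text]})
        tokens = [entry["token"] for entry in response.get("tokens", [])]
        if expected_token in tokens:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A single successful response only shows that the node that served the request has loaded the file. Repeating the check a few times after the first success gives more confidence that the other nodes have caught up.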

Configure a stopword list

Alibaba Cloud Elasticsearch provides a built-in stopword list. The list contains the following predefined tokens: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.

You can perform the following steps to remove tokens from the stopword list:

  1. Download the default IK configuration file from the official website of open source Elasticsearch.
  2. Decompress the downloaded package and open the stopword.dic dictionary file in the config folder.
  3. Remove the tokens that you no longer require and save the dictionary file.
  4. Change the name of the stopword.dic dictionary file to SYSTEM_STOPWORD.dic.
  5. In the Standard Update panel, upload the SYSTEM_STOPWORD.dic file to your Elasticsearch cluster. The file automatically overwrites the existing stopword list.
  6. Select the check box for restart confirmation and click Save. After the cluster is restarted, the new stopword list takes effect.
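
Steps 2 to 4 above can also be scripted. The following Python sketch assumes that stopword.dic is a UTF-8 text file with one token per line, which is how IK dictionary files are normally laid out; verify this against the file you downloaded. The function names and paths are hypothetical.

```python
from pathlib import Path


def remove_stopwords(dic_text, tokens_to_remove):
    """Return the dictionary text without the given tokens.

    Assumes one token per line; blank lines are dropped.
    """
    drop = {token.strip() for token in tokens_to_remove}
    kept = [line.strip() for line in dic_text.splitlines()
            if line.strip() and line.strip() not in drop]
    return "".join(f"{token}\n" for token in kept)


def write_system_stopword_file(source_path, tokens_to_remove, out_dir):
    """Read stopword.dic, remove tokens, and save the result under the
    file name that overwrites the built-in stopword list."""
    text = Path(source_path).read_text(encoding="utf-8")
    out_path = Path(out_dir) / "SYSTEM_STOPWORD.dic"
    out_path.write_text(remove_stopwords(text, tokens_to_remove), encoding="utf-8")
    return out_path
```

For example, `write_system_stopword_file("config/stopword.dic", ["the", "a"], ".")` would produce a SYSTEM_STOPWORD.dic file in the current directory in which the and a are no longer stopwords.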

Update an uploaded dictionary file

If you want to update an uploaded dictionary file, you can download the file, modify it, and then upload it. This section describes how to download and update the SYSTEM_STOPWORD.dic file in the Standard Update panel.

  1. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.
  2. In the Standard Update panel, click the Download icon next to the SYSTEM_STOPWORD.dic file in the IK Stopword List section.
  3. Modify the downloaded file and upload it again.
    For more information about how to upload the file, see the related step in the Perform a standard update for IK dictionaries or Perform a rolling update for IK dictionaries section.
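
Before you upload the modified file, it can help to confirm exactly which tokens changed. The following Python sketch compares the downloaded file with the modified one, again assuming UTF-8 text with one token per line. The function names are hypothetical.

```python
def load_tokens(dic_text):
    """Set of tokens in a dictionary file, ignoring blank lines."""
    return {line.strip() for line in dic_text.splitlines() if line.strip()}


def diff_dictionaries(old_text, new_text):
    """Tokens that the modified file adds to or removes from the original."""
    old, new = load_tokens(old_text), load_tokens(new_text)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__":
    downloaded = "a\nthe\nof\n"
    modified = "a\nof\ncloud\n"
    print(diff_dictionaries(downloaded, modified))
```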