The analysis-ik plug-in is an IK analyzer plug-in of Alibaba Cloud Elasticsearch. This plug-in cannot be removed. In addition to its open-source functionalities, analysis-ik supports the dynamic loading of dictionaries stored on Object Storage Service (OSS). It also allows you to use the standard update or rolling update method to update dictionaries.

Background information

The analysis-ik plug-in supports standard update and rolling update. For more information, see Standard update of IK dictionaries and Rolling update of IK dictionaries. The two methods compare as follows.

  • Standard update
    • Application mode: This method updates the dictionaries on all nodes in an Elasticsearch cluster. It requires a restart of the cluster for the update to take effect.
    • Loading mode: Elasticsearch sends the uploaded dictionary file to all nodes in the cluster, modifies the IKAnalyzer.cfg.xml file, and then restarts the nodes to load the file.
    • Description: You can use the standard update method to update the IK main dictionary and stopword list. In the standard update pane, you can check the built-in main dictionary SYSTEM_MAIN.dic and the built-in stopword list SYSTEM_STOPWORD.dic.
  • Rolling update
    • Application mode: If you upload a dictionary file for the first time, the dictionaries on all nodes in a cluster are updated. A restart of the cluster is required for the update to take effect. If you upload a file with the same name for the second time, a cluster restart is not required. The dictionaries are directly loaded while the cluster is running.
    • Loading mode: When the content of a dictionary file changes, you can use this method to update the dictionaries. After you upload the latest dictionary file, all nodes in the Elasticsearch cluster automatically load the file.

When you perform a rolling update, if the dictionary file list changes, all nodes in the cluster need to reload the dictionary configuration. For example, when you upload a new dictionary file or delete an existing dictionary file, the changes are synchronized to the IKAnalyzer.cfg.xml file.

When you upload a dictionary file for the first time, you must modify the IKAnalyzer.cfg.xml file. After the dictionaries are updated, a restart of the cluster is required for the update to take effect.
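
The restart rules described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not code from the plug-in: the names `UpdateMethod` and `restart_required` are hypothetical, and deleting a dictionary file is not modeled because the text above only says that it triggers a configuration reload.

```python
from enum import Enum
from typing import Set


class UpdateMethod(Enum):
    STANDARD = "standard"
    ROLLING = "rolling"


def restart_required(method: UpdateMethod, file_name: str, known_files: Set[str]) -> bool:
    """Return True if uploading file_name is expected to restart the cluster."""
    if method is UpdateMethod.STANDARD:
        # A standard update always restarts the cluster.
        return True
    # Rolling update: only a file name that the cluster has not seen before
    # changes IKAnalyzer.cfg.xml and therefore needs a restart.
    return file_name not in known_files


known = {"dic_0.dic"}  # files already uploaded through rolling update
for method, name in [
    (UpdateMethod.STANDARD, "dic_0.dic"),
    (UpdateMethod.ROLLING, "dic_1.dic"),
    (UpdateMethod.ROLLING, "dic_0.dic"),
]:
    print(method.value, name, restart_required(method, name, known))
```

In this sketch, only the last case (a rolling update that reuses the name dic_0.dic) avoids a restart.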
Notice

If indexes are configured with analysis-ik, the new dictionaries only apply to new data in these indexes. If you want to apply the new dictionaries to both the existing data and new data, you must recreate the indexes.
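
One common way to recreate an index is to create a new index and copy the existing documents into it with the Elasticsearch `_reindex` API, so that they are analyzed again with the updated dictionaries. The sketch below only builds the request body. The index names `articles_v1` and `articles_v2` are hypothetical, and the destination index is assumed to exist already with the intended IK analyzer settings.

```python
import json


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Build the JSON body for a POST _reindex request."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


body = build_reindex_body("articles_v1", "articles_v2")
print("POST _reindex")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into the Kibana console after the new dictionaries have taken effect.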

If you select the standard update method, you cannot delete a built-in main dictionary or stopword list. However, you can modify the built-in main dictionary or stopword list. The following two modification methods are available:
  • If you want to update the built-in main dictionary, upload a dictionary file named SYSTEM_MAIN.dic. The new dictionary file automatically overwrites the existing file. For more information, visit IK Analysis for Elasticsearch.
  • If you want to update the built-in stopword list, upload a dictionary file named SYSTEM_STOPWORD.dic. The new dictionary file automatically overwrites the existing file. For more information, visit IK Analysis for Elasticsearch and Stopword configuration.
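
Because an uploaded SYSTEM_MAIN.dic overwrites the built-in file, it can help to build the replacement from an existing word list instead of starting from scratch. The following sketch assumes the open-source IK dictionary layout, which is a UTF-8 text file with one term per line. The function names and the sample terms are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def merge_terms(existing: Iterable[str], additions: Iterable[str]) -> List[str]:
    """Return existing terms followed by new ones, without blanks or duplicates."""
    seen = set()
    merged = []
    for term in list(existing) + list(additions):
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


def write_dictionary(path: Path, terms: Iterable[str]) -> None:
    """Write one term per line, UTF-8 encoded (assumed IK dictionary format)."""
    path.write_text("\n".join(terms) + "\n", encoding="utf-8")


# Sample terms only; a real main dictionary is much larger.
terms = merge_terms(["阿里云", "搜索 "], ["搜索", "", "分词器"])
print(terms)
```

Calling `write_dictionary(Path("SYSTEM_MAIN.dic"), terms)` would then produce a file with the name that the standard update method expects.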

Prerequisites

The cluster status is normal. You can query the status on the Basic Information page.

Standard update of IK dictionaries

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the top navigation bar, select the region where your Alibaba Cloud Elasticsearch cluster resides.
  3. Find the target cluster and click its ID.
  4. In the left-side navigation pane, click Plug-ins.
  5. On the Built-in Plug-ins tab, find the analysis-ik plug-in, and click Standard Update in the Actions column.
  6. In the Plug-ins pane, click Configure.
  7. Below IK Main Dictionary, select an upload method from the drop-down list, and upload a dictionary file as follows:
    You can select either Upload DIC File or Add OSS File.
    • Upload DIC File: Click Upload DIC File and select a local file to upload.
    • Add OSS File: Enter the bucket name and object name, and click Add.

      Make sure that the bucket is in the same region as the Elasticsearch cluster and the object is in the DIC format. If the content of a dictionary stored on OSS changes, you must manually upload the dictionary file again.

    Warning The following operation restarts the Elasticsearch cluster. Make sure that your business is not affected before you confirm the operation.
  8. Scroll down to the lower part of the page, select This operation will restart the cluster. Continue? and click Save.
    Note If you choose the standard update method, Elasticsearch restarts the cluster regardless of whether you upload a new dictionary file, remove a dictionary file, or update dictionary content.
  9. After the cluster is restarted, log on to the Kibana console of the Elasticsearch cluster and run the following command to check whether the new dictionaries have taken effect.
    Note For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["tokens in the new dictionaries"]
    }
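
The check in the last step can also be scripted. The sketch below builds the same `_analyze` request body and tests whether a term comes back as a single token. The `tokens` list with a `token` field follows the response shape of the Elasticsearch `_analyze` API, but the sample response is illustrative only and was not captured from a real cluster. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def build_analyze_request(text: str, analyzer: str = "ik_smart") -> Dict[str, Any]:
    """Build the body of a GET _analyze request."""
    return {"analyzer": analyzer, "text": [text]}


def tokens_from_response(response: Dict[str, Any]) -> List[str]:
    """Extract the token strings from an _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def term_is_single_token(response: Dict[str, Any], term: str) -> bool:
    """Return True if the term appears as one token, not split into pieces."""
    return term in tokens_from_response(response)


# Illustrative response shape only, not real cluster output.
sample_response = {
    "tokens": [
        {"token": "阿里云", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
        {"token": "搜索", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 1},
    ]
}

print(build_analyze_request("阿里云搜索"))
print(term_is_single_token(sample_response, "阿里云"))
```

If a term from the new dictionary is still split into smaller tokens, the dictionary has probably not been loaded on the node that served the request.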

Rolling update of IK dictionaries

  1. On the Built-in Plug-ins tab, find the analysis-ik plug-in, and click Rolling Update in the Actions column.
  2. In the Plug-ins pane, click Configure.
  3. Below IK Main Dictionary, select an upload method from the drop-down list, and upload a dictionary file as follows:
    Note You cannot use the rolling update method to modify a built-in main dictionary. If you want to modify the built-in main dictionary, use the standard update method.
    You can select either Upload DIC File or Add OSS File.
    • Upload DIC File: Click Upload DIC File and select a local file to upload.
    • Add OSS File: Enter the bucket name and object name, and click Add.

      Make sure that the bucket is in the same region as the Elasticsearch cluster and the object is in the DIC format. The following operations use dic_0.dic as an example. If the content of a dictionary stored on OSS changes, you must manually upload the dictionary file again.

    Warning The following operation restarts the Elasticsearch cluster. Make sure that your business is not affected before you confirm the operation.
  4. Scroll down to the lower part of the page, select This operation will restart the instance. Continue? and click Save. If this is the first time that you upload a dictionary file, Elasticsearch automatically restarts the cluster.

    After you click Save, the cluster performs a rolling update. After the rolling update is complete, the new dictionaries take effect.

    If you want to add tokens to or remove tokens from the new dictionaries, perform the following steps to modify the dic_0.dic file:

  5. In the rolling update pane, delete the existing dic_0.dic file, and then upload a new dictionary file. The new dictionary file must have the same name.
    Because the new file has the same name as the deleted file, this operation changes only the content of the existing dictionary file in the cluster. Elasticsearch does not need to restart the cluster for the update to take effect.
  6. Click Save.
    The analysis-ik plug-in on the nodes of the Elasticsearch cluster automatically loads the dictionary file. The loading time varies by node, and it takes about two minutes for all nodes to load the dictionary file. You can log on to the Kibana console of the Elasticsearch cluster and run the following command multiple times to verify the new dictionaries.
    Note For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": ["tokens in the new dictionaries"]
    }
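
Because nodes load the file at different times, a single successful check does not show that every node has the new dictionary. The following sketch repeats the check until it succeeds several times in a row. The `analyze` argument is a hypothetical callable that sends the text to `_analyze` and returns the token strings. The default of 12 attempts at 10-second intervals is an assumption chosen to cover roughly the two minutes mentioned above.

```python
import time
from typing import Callable, List


def wait_for_term(
    analyze: Callable[[str], List[str]],
    term: str,
    required_successes: int = 3,
    attempts: int = 12,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the term is returned as one token several times in a row."""
    streak = 0
    for attempt in range(attempts):
        if term in analyze(term):
            streak += 1
            if streak >= required_successes:
                return True
        else:
            # A node that has not loaded the file yet answered; start over.
            streak = 0
        if attempt < attempts - 1:
            sleep(interval_seconds)
    return False


# Simulated responses from nodes that load the dictionary at different times.
responses = iter([["新", "词"], ["新词"], ["新", "词"], ["新词"], ["新词"], ["新词"]])
print(wait_for_term(lambda text: next(responses), "新词", sleep=lambda s: None))  # True
```

Requiring consecutive successes is a heuristic. It lowers the chance of stopping early but does not guarantee that every node was queried.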

Stopword configuration

Elasticsearch provides a built-in stopword list, which contains the following predefined tokens: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.

You can follow these steps to delete tokens from the stopword list:

  1. Download the default IK configuration package from open-source Elasticsearch.
  2. Decompress the package and open the stopword.dic dictionary file in the config folder.
  3. Delete the tokens that you no longer want to treat as stopwords, and save the dictionary file.
  4. Change the name of the stopword.dic dictionary file to SYSTEM_STOPWORD.dic.
  5. Upload the SYSTEM_STOPWORD.dic file to your Elasticsearch cluster. The file automatically overwrites the existing stopword list.
  6. After the cluster is restarted, the new stopword list takes effect.
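
Steps 2 to 4 can be sketched as a short script. It assumes that stopword.dic is a UTF-8 text file with one stopword per line. The function names and the tokens chosen for removal are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set


def remove_stopwords(lines: Iterable[str], to_remove: Set[str]) -> List[str]:
    """Keep every non-empty entry that is not listed in to_remove."""
    kept = []
    for line in lines:
        word = line.strip()
        if word and word not in to_remove:
            kept.append(word)
    return kept


def rewrite_stopword_file(source: Path, target: Path, to_remove: Set[str]) -> List[str]:
    """Read source, drop the given tokens, and write the result to target."""
    kept = remove_stopwords(source.read_text(encoding="utf-8").splitlines(), to_remove)
    target.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


# First entries of the predefined list above, with two tokens removed.
default_list = ["a", "an", "and", "are", "as"]
print(remove_stopwords(default_list, {"and", "as"}))  # ['a', 'an', 'are']
```

Calling `rewrite_stopword_file(Path("config/stopword.dic"), Path("SYSTEM_STOPWORD.dic"), {"and", "as"})` would write the file under the name that step 5 uploads. The `config/stopword.dic` path assumes the layout of the decompressed package.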