analysis-ik is an IK analysis plug-in provided by Alibaba Cloud Elasticsearch. This plug-in cannot be removed by default. The plug-in integrates the features of the IK analysis plug-in provided by open source Elasticsearch, can dynamically load the dictionaries that are stored in Object Storage Service (OSS), and provides the standard update and rolling update methods to update dictionaries. This topic describes how to use the analysis-ik plug-in.
Prerequisites
Your Alibaba Cloud Elasticsearch cluster is in a normal state. You can view the status of the cluster on the Basic Information page of the cluster. For more information, see View the basic information of a cluster.
Dictionary update methods
The analysis-ik plug-in supports two update methods for IK dictionaries: standard update and rolling update. The following table describes the two methods.
Update method | Application mode | Loading mode | Description |
---|---|---|---|
Standard update | This method updates the dictionaries on all nodes in an Elasticsearch cluster. The update can take effect only after the cluster is restarted. | The system sends an uploaded dictionary file to all nodes in an Elasticsearch cluster, modifies the IKAnalyzer.cfg.xml file, and then restarts the nodes to load the file. | You can use the standard update method to update the built-in IK main dictionary and stopword list of the analysis-ik plug-in. In the Standard Update panel of the Elasticsearch console, you can view the built-in main dictionary SYSTEM_MAIN.dic and the built-in stopword list SYSTEM_STOPWORD.dic. Notice: If you want to customize the IKAnalyzer.cfg.xml file, you can download dictionary files, modify the files, and upload them. For more information, see Update a dictionary file. |
Rolling update | The first time you upload a dictionary file, the dictionaries on all nodes in an Elasticsearch cluster are updated. The update can take effect only after the cluster is restarted. If the name of the dictionary file that you upload is the same as that of the existing dictionary file, the cluster is not restarted. The dictionaries are automatically loaded during the running of the cluster. | The first time you upload a dictionary file, the system modifies the IKAnalyzer.cfg.xml file. After dictionaries are updated, the cluster must be restarted for the update to take effect. | If the content of a dictionary file changes, you can use this method to update the dictionaries on all nodes in an Elasticsearch cluster. After you upload the latest dictionary file, the nodes automatically load the file. If the list of dictionary files changes when you perform a rolling update, the changes are synchronized to the IKAnalyzer.cfg.xml file, and all nodes in the cluster need to reload dictionary configurations. For example, if you upload a new dictionary file or delete an existing dictionary file when you perform a rolling update, all nodes in the cluster reload dictionary configurations. |
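
The rolling update rules in the table can be read as a small decision function. The following Python sketch only illustrates those rules and is not code from the plug-in or the console. The `Effect` enum and the `rolling_update_effect` function are hypothetical names, and treating an empty file list as "the first upload" is an assumption.

```python
from enum import Enum


class Effect(Enum):
    """Possible outcomes of uploading a file in a rolling update (illustrative)."""
    RESTART = "cluster must be restarted for the update to take effect"
    HOT_LOAD = "nodes load the file automatically, no restart"
    RELOAD_CONFIG = "IKAnalyzer.cfg.xml changes, nodes reload dictionary configurations"


def rolling_update_effect(existing_files: set, uploaded_name: str) -> Effect:
    """Return the expected effect of uploading `uploaded_name`.

    `existing_files` holds the names of dictionary files that were already
    uploaded through rolling updates (assumption: empty means first upload).
    """
    if not existing_files:
        # First upload: IKAnalyzer.cfg.xml is modified and a restart is needed.
        return Effect.RESTART
    if uploaded_name in existing_files:
        # Same file name, new content: loaded while the cluster is running.
        return Effect.HOT_LOAD
    # A new file name changes the list of dictionary files.
    return Effect.RELOAD_CONFIG


if __name__ == "__main__":
    print(rolling_update_effect(set(), "custom.dic").name)            # RESTART
    print(rolling_update_effect({"custom.dic"}, "custom.dic").name)   # HOT_LOAD
    print(rolling_update_effect({"custom.dic"}, "other.dic").name)    # RELOAD_CONFIG
```

The sketch covers uploads only. According to the table, deleting an existing dictionary file also changes the file list and causes the nodes to reload dictionary configurations.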
Instructions on dictionaries
Dictionary type | Description | Limit | Update method |
---|---|---|---|
Main dictionary | When you create an index in an Elasticsearch cluster, you can specify a main dictionary. This way, when data that contains a token from the main dictionary is written to the index, you can use the token as a keyword to query that data in the index. | | Perform a standard update for IK dictionaries |
Stopword list | When you create an index in an Elasticsearch cluster, you can specify a stopword list. This way, when data that contains a token from the stopword list is written to the index, the system filters out the token. Alibaba Cloud Elasticsearch provides a built-in stopword list. The list contains the following tokens: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. If you want to remove some tokens from the stopword list, you can download the related dictionary file, remove the tokens from the file, name the file SYSTEM_STOPWORD.dic, and then upload the file. For more information, see Update a dictionary file. Note: If you want to use Chinese tokens, you can refer to the extra_stopword.dic file in the config directory of the open source analysis-ik plug-in. The extra_stopword.dic file contains the following Chinese tokens: 也, 了, 仍, 从, 以, 使, 则, 却, 又, 及, 对, 就, 并, 很, 或, 把, 是, 的, 着, 给, 而, 被, 让, 在, 还, 比, 等, 当, 与, 于, 但. |
Perform a standard update for IK dictionaries
Perform a rolling update for IK dictionaries
Update a dictionary file
If you want to update a dictionary file that has been uploaded, you can download the file, modify it, and then upload it again. This section describes how to download and update the SYSTEM_STOPWORD.dic file in the Standard Update panel.
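
The "modify the file" step can also be scripted after you download the file. The following Python sketch is a minimal illustration and not part of the console workflow. It assumes that the dictionary file is UTF-8 encoded with one token per line. The `remove_tokens` function and the `downloaded.dic` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def remove_tokens(source: Path, target: Path, tokens_to_remove: set) -> int:
    """Copy a dictionary file to `target` without the given tokens.

    Assumes one token per line and UTF-8 encoding.
    Returns the number of lines that were dropped.
    """
    lines = source.read_text(encoding="utf-8").splitlines()
    kept = [line for line in lines if line.strip() not in tokens_to_remove]
    # Write each kept token on its own line.
    target.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(lines) - len(kept)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the file downloaded from the Standard Update panel.
        downloaded = Path(tmp) / "downloaded.dic"
        downloaded.write_text("a\nan\nand\nthe\n", encoding="utf-8")

        # The file to upload must be named SYSTEM_STOPWORD.dic.
        target = Path(tmp) / "SYSTEM_STOPWORD.dic"
        removed = remove_tokens(downloaded, target, {"an", "the"})
        print(removed)                                       # 2
        print(target.read_text(encoding="utf-8").split())    # ['a', 'and']
```

After the script runs, you still upload the resulting SYSTEM_STOPWORD.dic file in the console yourself.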