analysis-ik is an IK analysis plug-in provided by Alibaba Cloud Elasticsearch. The plug-in provides dictionary-based tokenization. It is installed by default and cannot be removed. By default, the plug-in uses its built-in dictionary files. You can update these files to improve tokenization and make the results better suited to your business scenarios. The plug-in integrates the features of the open source IK analysis plug-in for Elasticsearch and can also dynamically load dictionary files that are stored in Object Storage Service (OSS).
Prerequisites
Your Alibaba Cloud Elasticsearch cluster is in a normal state. You can view the status of the cluster on the Basic Information page of the cluster. For more information, see View the basic information of a cluster.
The Basic Information page and the supported features may differ for clusters of earlier versions. The page and features that are displayed in the console for your cluster prevail.
Introduction to the analysis-ik plug-in
Tokenizers
The analysis-ik plug-in integrates the following tokenizers:
ik_max_word: splits text at the finest granularity. This tokenizer is suitable for term searches.
ik_smart: splits text at a coarse granularity. This tokenizer is suitable for phrase searches.
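The two tokenizers are commonly paired in a single field: the documentation of the open source IK plug-in shows ik_max_word as the index-time analyzer and ik_smart as the search-time analyzer. The following sketch builds such a create-index request body. It assumes the typeless mapping format of Elasticsearch 7.x and later; the field name `content` and the index name `my-index` are hypothetical.

```python
import json


def ik_mapping(field: str) -> dict:
    """Build a create-index body that indexes `field` with ik_max_word
    and analyzes queries against it with ik_smart."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Finest granularity at index time, so more terms are stored.
                    "analyzer": "ik_max_word",
                    # Coarser granularity at search time, so queries use fewer, longer terms.
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON after "PUT my-index" on the Dev Tools page of Kibana.
    print(json.dumps(ik_mapping("content"), indent=2))
```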
Dictionary types
The analysis-ik plug-in supports the following dictionary types.
Dictionary type | Description | Dictionary file requirement | Update method |
Main dictionary (main.dic) | The built-in main dictionary of the analysis-ik plug-in contains more than 270,000 Chinese words. When you create an index in an Elasticsearch cluster, you can specify a main dictionary. Words in the main dictionary are then recognized as tokens when data is written to the index, and you can use these tokens as keywords to query the data. | A dictionary file is a DIC file that is encoded in UTF-8. Each row of the file contains one word. | Standard update and rolling update |
Stopword dictionary (stopword.dic) | The built-in stopword dictionary of the analysis-ik plug-in contains English stopwords, such as a, the, and, at, and but. When you create an index in an Elasticsearch cluster, you can specify a stopword dictionary. Tokens that appear in the stopword dictionary are filtered out when data is written to the index and are not included in the inverted index. | A dictionary file is a DIC file that is encoded in UTF-8. Each row of the file contains one word. | Standard update and rolling update |
Preposition dictionary (preposition.dic) | The dictionary contains prepositions. | - | Standard update |
Quantifier dictionary (quantifier.dic) | The dictionary contains quantifiers and unit-related words. | - | Standard update |
suffix.dic | The dictionary contains suffixes. | - | Updates are not supported. |
surname.dic | The dictionary contains surnames. | - | Updates are not supported. |
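A dictionary file must be UTF-8 encoded with one word per row, and the upload forms later in this topic also require a `.dic` extension and a name of 1 to 30 letters, digits, or underscores. The following sketch prepares such a file locally. Two points are assumptions, not documented rules: the 30-character limit is applied to the part of the name before `.dic`, and "letters" means ASCII letters. Blank rows and duplicate words are dropped, and the bytes are written without a byte order mark and with `\n` line endings. The file name `my_ext_words.dic` is hypothetical.

```python
import re
from pathlib import Path

# Assumed reading of the naming rule: 1 to 30 ASCII letters, digits, or
# underscores, followed by the required ".dic" extension.
DIC_NAME = re.compile(r"[A-Za-z0-9_]{1,30}\.dic")


def is_valid_dic_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rule."""
    return DIC_NAME.fullmatch(name) is not None


def render_dic(words) -> bytes:
    """Return UTF-8 bytes with one unique, non-blank word per row (order kept)."""
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    return "".join(w + "\n" for w in unique).encode("utf-8")


def write_dic(path: Path, words) -> None:
    """Validate the file name, then write the dictionary file."""
    if not is_valid_dic_name(path.name):
        raise ValueError(f"invalid dictionary file name: {path.name}")
    # write_bytes avoids platform-specific newline translation.
    path.write_bytes(render_dic(words))


if __name__ == "__main__":
    write_dic(Path("my_ext_words.dic"), ["阿里云", "弹性计算"])
```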
Dictionary update methods
The analysis-ik plug-in supports two update methods for IK dictionaries: standard update and rolling update. The following table describes the two methods.
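Update method | Application mode | Loading mode | Recommended use scenario |
Standard update | This method updates the dictionaries on all nodes in an Elasticsearch cluster. The update takes effect only after the cluster is restarted. | The system sends an uploaded dictionary file to all nodes in the cluster and restarts the nodes to load the file. | You want to replace an original dictionary file of the cluster. |
Rolling update | This method updates the extended dictionaries on all nodes in the cluster. The cluster is not restarted if the dictionary file names and the number of dictionary files remain unchanged. | The analysis-ik plug-in on each node automatically loads the uploaded dictionary file. The time that each node requires to load the file varies. | You want to extend the main dictionary file or stopword dictionary file, or update an extended dictionary file. |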
Update dictionaries
New dictionaries take effect only for data that is inserted or updated after a standard or rolling update. If you also want the new dictionaries to take effect for existing data, you must reindex the existing data.
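Existing documents keep the tokens that were produced when they were indexed, so they must be written again for the analyzer to run with the new dictionaries. The following sketch generates a Kibana Dev Tools command for either of two standard Elasticsearch approaches: `_update_by_query` without a script, which rewrites each document in place, or `_reindex` into another index. `wait_for_completion=false` runs the job as a background task. The index names are hypothetical. If you use `_reindex`, create the destination index with the IK analyzers in its mapping first; otherwise it is created with dynamic mapping and does not use the IK analyzers.

```python
import json
from typing import Optional


def reindex_command(index: str, dest_index: Optional[str] = None) -> str:
    """Return a Dev Tools command that re-analyzes the documents in `index`.

    Without dest_index, rewrite every document in place with _update_by_query.
    With dest_index, copy the documents into dest_index with _reindex."""
    if dest_index is None:
        # conflicts=proceed: skip documents changed concurrently instead of aborting.
        return f"POST {index}/_update_by_query?conflicts=proceed&wait_for_completion=false"
    body = {"source": {"index": index}, "dest": {"index": dest_index}}
    return "POST _reindex?wait_for_completion=false\n" + json.dumps(body, indent=2)


if __name__ == "__main__":
    print(reindex_command("news"))
    print(reindex_command("news", "news_v2"))
```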
Standard update
You can perform a standard update to replace an original dictionary file of an Elasticsearch cluster. The replacement can take effect only after the cluster is restarted.
A standard update triggers a cluster restart. To ensure that your business is not affected, we recommend that you perform the update during off-peak hours.
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- Navigate to the desired cluster:
  - In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
  - On the Elasticsearch Clusters page, find the cluster and click its ID.
In the left-side navigation pane of the page that appears, choose .
On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Standard Update in the Actions column.
In the Configure IK Dictionaries - Standard Update panel, click Edit on the right side of the dictionary that you want to update, add or replace the related dictionary file, and then click Save.
You can use one of the following methods to update a dictionary file:
Upload On-premises File: Click the upload area and select the file that you want to upload from your on-premises machine. Alternatively, drag the file that you want to upload from your on-premises machine to the upload area.
Upload OSS File: Configure the Bucket Name and File Name parameters, and click Add.
Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster.
The system does not automatically synchronize changes to the dictionary file in OSS. If the content of the dictionary file that is stored in OSS changes, you must perform a standard update again to make the changes take effect.
Note: The file name extension must be .dic. The file name must be 1 to 30 characters in length and can contain letters, digits, and underscores (_). You can upload only one DIC file for each type of dictionary. After the update is initiated, the system replaces the original dictionary file with the uploaded file.
If you want to recover a default dictionary file, you can download the file and upload it. Default dictionary files:
Click OK.
After the cluster is restarted, the dictionary file is updated.
Optional. Log on to the Kibana console of the cluster and check whether the new dictionary file takes effect.
Note: For information about how to log on to the Kibana console of a cluster, see Log on to the Kibana console.
On the Dev Tools page of the Kibana console, run the following command:
GET _analyze
{
  "analyzer": "ik_smart",
  "text": ["Tokens in the new dictionary file"]
}
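To check the result from a script instead of Kibana, you can send the same `_analyze` request over HTTP and compare the returned tokens with the words that you added. The following is a minimal sketch that uses only the Python standard library. It assumes that the cluster endpoint is reachable from your machine and uses basic authentication (for example, the elastic account); the endpoint, password, and sample words are placeholders. A word counts as loaded only if it comes back as one whole token.

```python
import base64
import json
import urllib.request


def analyze_request(endpoint: str, user: str, password: str, text: str,
                    analyzer: str = "ik_smart") -> urllib.request.Request:
    """Build (but do not send) a POST <endpoint>/_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": [text]}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/_analyze",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


def missing_words(response: dict, words) -> list:
    """Return the words that are not returned as a whole token."""
    tokens = {t["token"] for t in response.get("tokens", [])}
    return [w for w in words if w not in tokens]


if __name__ == "__main__":
    # Placeholder endpoint and credentials: replace them before running.
    req = analyze_request("http://<cluster-endpoint>:9200", "elastic", "<password>", "弹性计算")
    with urllib.request.urlopen(req) as resp:
        print(missing_words(json.load(resp), ["弹性计算"]))
```

An empty list means that every listed word was returned as a single token.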
Rolling update
You can perform a rolling update to upload extended dictionaries that supplement the main dictionary and stopword dictionary. If the dictionary file names and the number of dictionary files remain unchanged, the system does not restart the cluster.
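The restart rule in the preceding paragraph depends only on the list of file names, not on file content. The following sketch expresses it as a predicate that you could use to anticipate whether a planned rolling update will restart the cluster. It is an illustration of the documented rule under the assumption that nothing else triggers a restart; the function name and file names are hypothetical.

```python
from typing import Iterable


def rolling_update_restarts(current: Iterable[str], uploaded: Iterable[str]) -> bool:
    """Return True if the planned upload changes the file names or the
    number of dictionary files, which is when a restart is expected.

    Comparing sorted lists detects both a renamed file and a changed count;
    edits to the content of an existing file do not affect the result."""
    return sorted(current) != sorted(uploaded)
```

For example, re-uploading `ext_words.dic` with new content returns False, while adding a second file returns True.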
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- Navigate to the desired cluster:
  - In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
  - On the Elasticsearch Clusters page, find the cluster and click its ID.
In the left-side navigation pane of the page that appears, choose .
On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Rolling Update in the Actions column.
In the Configure IK Dictionaries - Rolling Update panel, click Edit on the right side of the dictionary that you want to update, upload a dictionary file, and then click Save.
You can use one of the following methods to update a dictionary file:
Upload On-premises File: Click the upload area and select the file that you want to upload from your on-premises machine. Alternatively, drag the file that you want to upload from your on-premises machine to the upload area.
Upload OSS File: Configure the Bucket Name and File Name parameters, and click Add.
Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster.
The system does not automatically synchronize changes to the dictionary file in OSS. If the content of the dictionary file that is stored in OSS changes, you must perform a rolling update again to make the changes take effect.
Note: The file name extension must be .dic. The file name must be 1 to 30 characters in length and can contain letters, digits, and underscores (_). If you want to modify an uploaded dictionary file, click the icon next to the file to download it and modify it. Then, delete the file and upload the modified file again.
You can upload multiple dictionary files. The cluster needs to be restarted only when the dictionary file names or the number of dictionary files change. If the dictionary file names and the number of dictionary files remain unchanged, the system does not restart the cluster. If a restart is required, we recommend that you perform the update during off-peak hours to ensure that your business is not affected. After the restart is complete, the new dictionary file takes effect.
Click OK.
The analysis-ik plug-in on the nodes of the Elasticsearch cluster automatically loads the dictionary file. The time that is required by each node to load the dictionary file varies.
Optional. Log on to the Kibana console of the cluster and check whether the new dictionary file takes effect.
Note: For information about how to log on to the Kibana console of a cluster, see Log on to the Kibana console.
Because each node loads the dictionary file at a different time, run the following command multiple times on the Dev Tools page of the Kibana console to verify the result:
GET _analyze
{
  "analyzer": "ik_smart",
  "text": ["Tokens in the new dictionary file"]
}
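Running the check several times can be automated by requiring several consecutive successes before treating the update as loaded. The following sketch assumes that repeated requests may be served by different nodes, so a single success is not conclusive. `check` is any function that you supply, for example one that sends the `_analyze` request and returns True when all new words come back as tokens. The thresholds are arbitrary defaults, and even a long streak does not guarantee that every node was queried.

```python
import time
from typing import Callable


def confirm_loaded(check: Callable[[], bool], required: int = 5,
                   max_attempts: int = 30, interval: float = 2.0) -> bool:
    """Call `check` until it succeeds `required` times in a row.

    Returns False if that does not happen within `max_attempts` calls.
    A failed call resets the streak, because the node that served it may
    not have loaded the new dictionary yet."""
    streak = 0
    for attempt in range(max_attempts):
        if check():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0
        if attempt < max_attempts - 1:
            time.sleep(interval)  # Wait before the next request.
    return False
```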
References
API operation for performing a rolling update on IK dictionaries: UpdateHotIkDicts
API operation for performing a standard update on IK dictionaries: UpdateDict