The AliNLP tokenization plug-in, also known as analysis-aliws, is a built-in plug-in of Alibaba Cloud Elasticsearch. This plug-in integrates an analyzer and a tokenizer into Elasticsearch to implement document analysis and retrieval. The plug-in allows you to upload a tailored dictionary file to it. After the upload, the system performs a rolling update for your Elasticsearch cluster to apply the dictionary file. This topic describes how to use the analysis-aliws plug-in.
Background information
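After the plug-in is installed, it integrates the following analyzer and tokenizer into Elasticsearch:
- Analyzer: aliws, which does not return function words, function phrases, or symbols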
- Tokenizer: aliws_tokenizer
You can use the analyzer and tokenizer to search for documents. You can also upload a tailored dictionary file to the plug-in. For more information, see Search for a document and Configure dictionaries. If the analysis-aliws plug-in does not return the expected results, see Test the analyzer and Test the tokenizer to locate the cause.
Prerequisites
The analysis-aliws plug-in is installed. It is not installed by default.
If the analysis-aliws plug-in is not installed, install it. Make sure that each data node in your Elasticsearch cluster has at least 4 GiB of memory. If your cluster runs in a production environment, each data node must have at least 8 GiB of memory. For more information about how to install the analysis-aliws plug-in, see Install and remove a built-in plug-in.
- Elasticsearch V5.X clusters do not support the analysis-aliws plug-in.
- If the memory size of data nodes in your cluster does not meet the preceding requirements, upgrade the configuration of your cluster. For more information, see Upgrade the configuration of a cluster.
Limits
Elasticsearch V5.X clusters do not support the analysis-aliws plug-in.
Search for a document
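As a minimal sketch, the aliws analyzer could be applied to a text field and then used in a match query as shown below. The sketch assumes an Elasticsearch 7.x cluster that uses typeless mappings. The index name my_index and the field name content are hypothetical and are used only for illustration.

```
# Create a hypothetical index whose content field is analyzed by aliws.
PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "aliws"
      }
    }
  }
}

# Index one document. refresh=true makes it searchable right away.
PUT /my_index/_doc/1?refresh=true
{
  "content": "I like go to school."
}

# Search the field. The query text is analyzed by the same analyzer.
GET /my_index/_search
{
  "query": {
    "match": {
      "content": "school"
    }
  }
}
```

If the search does not return the document, the requests in Test the analyzer and Test the tokenizer can help show which tokens were actually produced for the text.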
Configure dictionaries
- After the analysis-aliws plug-in is installed, no default dictionary file is provided. You must manually upload a tailored dictionary file.
- Before you upload a tailored dictionary file, you must name the dictionary file aliws_ext_dict.txt.
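One way to check whether an uploaded dictionary is in effect is to analyze a term that the dictionary contains. The request below is only a sketch: the term 云原生数据库 is a hypothetical example and is assumed to be listed in your aliws_ext_dict.txt file. Replace it with a term from your own dictionary, and run the request after the rolling update is complete.

```
# The text below is a hypothetical dictionary term, not one shipped with the plug-in.
GET _analyze
{
  "analyzer": "aliws",
  "text": "云原生数据库"
}
```

If the dictionary is applied, the term would be expected to come back as a single token rather than as several shorter tokens.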
Test the analyzer
Run the following command to test the aliws analyzer:
GET _analyze
{
  "text": "I like go to school.",
  "analyzer": "aliws"
}
If the analyzer works as expected, a result similar to the following is returned:
{
  "tokens" : [
    {
      "token" : "i",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "like",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "go",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "school",
      "start_offset" : 13,
      "end_offset" : 19,
      "type" : "word",
      "position" : 8
    }
  ]
}
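When the result differs from what you expect, the explain parameter of the Elasticsearch _analyze API can return more detail about how the tokens were produced. The following request is a sketch that reuses the same sample text. The exact attributes in the response depend on your Elasticsearch version.

```
# Same test as in this section, with detailed output enabled.
GET _analyze
{
  "text": "I like go to school.",
  "analyzer": "aliws",
  "explain": true
}
```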
Test the tokenizer
Run the following command to test the aliws_tokenizer tokenizer:
GET _analyze
{
  "text": "I like go to school.",
  "tokenizer": "aliws_tokenizer"
}
If the tokenizer works as expected, a result similar to the following is returned:
{
  "tokens" : [
    {
      "token" : "I",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : " ",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "like",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : " ",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "go",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : " ",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "to",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : " ",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "school",
      "start_offset" : 13,
      "end_offset" : 19,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : ".",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "word",
      "position" : 9
    }
  ]
}
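Unlike the aliws analyzer, the tokenizer on its own keeps the original case and also returns whitespace and punctuation as tokens. If you need the tokenizer with your own filter chain, one option is a custom analyzer built on aliws_tokenizer. The following is a minimal sketch. The index name my_custom_index and the analyzer name my_aliws_analyzer are hypothetical, and lowercase is the built-in Elasticsearch token filter.

```
# Define a hypothetical custom analyzer that lowercases the tokenizer output.
PUT /my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_aliws_analyzer": {
          "type": "custom",
          "tokenizer": "aliws_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

# Test the custom analyzer against the same sample text.
GET /my_custom_index/_analyze
{
  "analyzer": "my_aliws_analyzer",
  "text": "I like go to school."
}
```

This sketch only adds lowercasing. The lowercase filter does not remove any tokens, so whitespace and symbol tokens are not filtered out by this configuration alone.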
FAQ
- How do I configure the analysis-aliws plug-in? What is the format of the dictionary file for this plug-in?
- What are the differences among Elasticsearch synonyms, IK tokens, and AliNLP tokens?
- If I use the rolling update method to update dictionaries that are dynamically loaded from OSS and the dictionaries stored in OSS are updated, will the dictionaries on all nodes in my Elasticsearch cluster be automatically updated?