analysis-aliws is a built-in plug-in of Alibaba Cloud Elasticsearch. The plug-in integrates an analyzer and a tokenizer into Elasticsearch for document analysis and retrieval.

Prerequisites

The analysis-aliws plug-in is installed. By default, this plug-in is not installed.

If the plug-in is not installed, install it first. Before you install the plug-in, make sure that your Elasticsearch cluster has at least 4 GiB of memory. For more information about the installation procedure, see Install and remove a built-in plug-in.

Notice If your cluster runs in a production environment, make sure that the cluster has at least 8 GiB of memory. If the memory of the cluster does not meet this requirement, upgrade the memory first. For more information, see Upgrade the cluster configuration.

Background information

After the analysis-aliws plug-in is installed, the following analyzer and tokenizer are integrated into Elasticsearch:
  • Analyzer: aliws, which does not return function words, function phrases, or symbols
  • Tokenizer: aliws_tokenizer

You can use the analyzer and tokenizer to search for documents that contain specific content.
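If the default behavior of the aliws analyzer does not fit your scenario, you can typically combine aliws_tokenizer with standard Elasticsearch token filters in a custom analyzer. The following sketch is an assumption-based example: the index name my_index and the analyzer name my_aliws are hypothetical, and the lowercase and stop filters are stock Elasticsearch filters rather than features of the plug-in:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_aliws": {
          "type": "custom",
          "tokenizer": "aliws_tokenizer",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}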

Procedure

  1. Log on to the Kibana console of your Elasticsearch cluster.
    For more information about how to log on to the console, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab, run the following command to create an index:
    PUT /index
    {
        "mappings": {
            "fulltext": {
                "properties": {
                    "content": {
                        "type": "text",
                        "analyzer": "aliws"
                    }
                }
            }
        }
    }

    The preceding command creates an index named index with a mapping type named fulltext. The mapping defines a content field of the text type and specifies aliws as the analyzer of the field.

    If the command succeeds, the following result is returned:
    {
      "acknowledged": true,
      "shards_acknowledged": true,
      "index": "index"
    }
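    Optionally, run the following command to confirm that the aliws analyzer is configured for the content field. This is a standard Elasticsearch call that returns the mapping of the index:

    GET /index/_mapping

    In the response, the content field is expected to show "analyzer": "aliws".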
  4. Run the following command to add a document:
    POST /index/fulltext/1
    {
      "content": "I like go to school."
    }

    The preceding command adds a document whose ID is 1 and sets the content field of the document to I like go to school.

    If the command succeeds, the following result is returned:
    {
      "_index": "index",
      "_type": "fulltext",
      "_id": "1",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
      },
      "_seq_no": 0,
      "_primary_term": 1
    }
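    If you want to add multiple documents at a time, you can use the standard _bulk API. The following sketch is only an example: the document IDs 2 and 3 and their content are hypothetical and are not used in the subsequent steps:

    POST /index/fulltext/_bulk
    { "index": { "_id": "2" } }
    { "content": "I like reading books." }
    { "index": { "_id": "3" } }
    { "content": "We go home after class." }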
  5. Run the following command to search for the document:
    GET /index/fulltext/_search
    {
      "query": {
        "match": {
          "content": "school"
        }
      }
    }

    The preceding command searches documents of the fulltext type and returns the documents whose content field contains school. The query string is analyzed with the aliws analyzer because the analyzer is configured for the content field.

    If the command succeeds, the following result is returned:
    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "index",
            "_type": "fulltext",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "content": "I like go to school."
            }
          }
        ]
      }
    }
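    To highlight the matched terms in the search results, you can add a standard highlight clause to the query. The highlight feature is a stock Elasticsearch capability, not a feature of the analysis-aliws plug-in:

    GET /index/fulltext/_search
    {
      "query": {
        "match": {
          "content": "school"
        }
      },
      "highlight": {
        "fields": {
          "content": {}
        }
      }
    }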
Note If the analysis-aliws plug-in does not deliver the expected results, troubleshoot the issue by following the instructions provided in Test the analyzer and Test the tokenizer.

Test the analyzer

Run the following command to test the aliws analyzer:

GET _analyze
{
  "text": "I like go to school.",
  "analyzer": "aliws"
}
If the command succeeds, the following result is returned. The aliws analyzer converts tokens to lowercase and filters out the function word to, the spaces, and the period:
{
  "tokens" : [
    {
      "token" : "i",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "like",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "go",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "school",
      "start_offset" : 13,
      "end_offset" : 19,
      "type" : "word",
      "position" : 8
    }
  ]
}
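
You can also test the analyzer against the content field of the index that you created in the preceding procedure. The field parameter is a standard option of the _analyze API, and Elasticsearch applies the analyzer that is configured for the field, which is aliws in this example:

GET /index/_analyze
{
  "field": "content",
  "text": "I like go to school."
}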

Test the tokenizer

Run the following command to test the aliws_tokenizer tokenizer:

GET _analyze
{
  "text": "I like go to school.",
  "tokenizer": "aliws_tokenizer"
}
If the command succeeds, the following result is returned. Unlike the aliws analyzer, aliws_tokenizer preserves the original case and retains the spaces and the period as tokens:
{
  "tokens" : [
    {
      "token" : "I",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : " ",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "like",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : " ",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "go",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : " ",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "to",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : " ",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "school",
      "start_offset" : 13,
      "end_offset" : 19,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : ".",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "word",
      "position" : 9
    }
  ]
}
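
If you want to preview how aliws_tokenizer behaves together with token filters before you define a custom analyzer, the _analyze API accepts an ad hoc filter chain. The following sketch chains the standard lowercase filter after the tokenizer; this combination is an example and does not reproduce the exact behavior of the aliws analyzer:

GET _analyze
{
  "text": "I like go to school.",
  "tokenizer": "aliws_tokenizer",
  "filter": ["lowercase"]
}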