Intervention dictionaries for term weight analysis

Last Updated: Sep 15, 2021

Overview

After you create an intervention dictionary for term weight analysis, you can select it when you create or modify a query analysis rule to intervene in the analysis of term weights. OpenSearch provides a built-in dictionary for term weight analysis. To intervene in term weight analysis, perform the following steps:

  1. Create an intervention dictionary for term weight analysis. To create an intervention dictionary, log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Dictionary Management in the left-side pane. On the Dictionary Management page, click Create in the upper-right corner. Specify a name for the intervention dictionary, select a dictionary type, and then click Save. After the intervention dictionary is created, it appears in the dictionary list.

  2. Create and manage intervention entries in the intervention dictionary. To go to the entry management page of the created dictionary, find the dictionary in the dictionary list and click Manage Entries in the Actions column. On this page, create and manage intervention entries as needed. You can specify a search query in an intervention entry and select an analyzer to segment the search query into terms. Supported analyzers include the general analyzer, the e-commerce analyzer, the IT content analyzer, and custom analyzers that are developed based on one of these analyzers. Then, you can specify a high, medium, or low weight for each term.

  3. Use the intervention dictionary. After you create intervention entries in the intervention dictionary, you can use the intervention dictionary in a query analysis rule on an application as needed.

  4. Test and publish the intervention dictionary. After the intervention dictionary is associated with the query analysis rule, we recommend that you perform a search test before you apply the rule to online environments. This helps you confirm that the search results meet your expectations.

Intervention rules

Intervention search queries are used to intervene in actual search queries based on the following rules:

  1. Rule 1: If an intervention search query fully or partially matches an actual search query, the corresponding intervention entry prevails over other intervention entries. For a partial match, the number of matched terms cannot exceed five.

  2. Rule 2: If the terms of an intervention search query match the terms that are closest to the beginning of an actual search query, this intervention search query has a higher priority.

  3. Rule 3: If one or more terms of two intervention search queries match the same terms of an actual search query, the intervention search query with more terms that match the actual search query takes priority. A maximum of five matched terms is supported. For example, the search query "mysqldatabase" contains two terms, which are "mysql" and "database".

  4. Rule 4: When you associate an intervention dictionary for term weight analysis with a query analysis rule, you can specify whether to ignore spaces in search queries.

Example on intervention rules

  • Sample intervention search queries:

    a. Database permission management: This search query can be segmented into three terms, which are database, permission, and management. The weights of the three terms are 7, 4, and 1 in sequence.

    b. MySQL database: This search query can be segmented into two terms, which are MySQL and database. The weights of the two terms are 7 and 1 in sequence.

    c. Database permission: This search query can be segmented into two terms, which are database and permission. The weights of the two terms are 4 and 1 in sequence.

    d. MySQL database permission management in the Linux environment: This search query can be segmented into eight terms, which are MySQL, database, permission, management, in, the, Linux, and environment. The weights of the eight terms are 7, 4, 1, 1, 1, 1, 7, and 1 in sequence. The number of the terms in this search query exceeds five.

  • Actual search queries:

    • MySQL database permission management: The intervention search query b takes effect based on Rule 2.

    • SQLServer database permission management: The intervention search query a takes effect based on Rule 3.

    • Database permission configuration: The intervention search query c takes effect based on Rule 2.

    • Configure database permissions: No intervention search query is matched.

    • MySQL database permission management in the Linux environment: The intervention search query d takes effect based on Rule 1.

    • Instructions on the MySQL database permission management in the Linux environment: The intervention search query d partially matches this actual search query. However, the number of terms after segmentation exceeds five. As a result, the intervention search query d does not take effect.

    • MySQL database permission configuration: The intervention search queries b and c take effect.
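
One way to read Rules 1 to 3 against the samples above is sketched below. This is an illustrative simplification, not the actual OpenSearch matcher. The class and method names are hypothetical, whitespace splitting stands in for the analyzer, and a match is assumed to be a contiguous run of terms. Only a single winning entry is returned, so the last case above, where both b and c take effect, is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified reading of Rules 1 to 3, not the OpenSearch matcher.
class InterventionMatchSketch {
    // Assumed cap on the number of matched terms for a partial match.
    static final int MAX_PARTIAL_TERMS = 5;

    // Stand-in for an analyzer: lowercase the query and split it on whitespace.
    static List<String> terms(String query) {
        return Arrays.asList(query.trim().toLowerCase().split("\\s+"));
    }

    // Sample intervention search queries a to d from the example.
    static Map<String, List<String>> sampleEntries() {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        entries.put("a", terms("Database permission management"));
        entries.put("b", terms("MySQL database"));
        entries.put("c", terms("Database permission"));
        entries.put("d", terms("MySQL database permission management in the Linux environment"));
        return entries;
    }

    // Returns the label of the winning entry, or null when no entry matches.
    static String pick(String actualQuery) {
        List<String> query = terms(actualQuery);
        String best = null;
        int bestStart = Integer.MAX_VALUE;
        int bestLength = 0;
        for (Map.Entry<String, List<String>> e : sampleEntries().entrySet()) {
            List<String> entry = e.getValue();
            if (entry.equals(query)) {
                return e.getKey(); // Rule 1: a full match prevails.
            }
            int start = Collections.indexOfSubList(query, entry);
            if (start < 0 || entry.size() > MAX_PARTIAL_TERMS) {
                continue; // No partial match, or too many matched terms.
            }
            boolean earlier = start < bestStart;                               // Rule 2
            boolean longer = start == bestStart && entry.size() > bestLength;  // Rule 3
            if (earlier || longer) {
                best = e.getKey();
                bestStart = start;
                bestLength = entry.size();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick("MySQL database permission management"));     // b
        System.out.println(pick("SQLServer database permission management")); // a
        System.out.println(pick("Configure database permissions"));           // null
    }
}
```

The ordering by start position and then by length is a design choice made for this sketch. It reproduces the first six cases above, but the service may resolve ties differently.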

Note:

  • When OpenSearch intervenes in a search query, the search query may be rewritten into two versions. The first version retrieves documents based on the terms whose weight is 7 or 4. The second version is used for a second search, which by default is performed only when the first version returns no search results. To increase the number of retrieved documents, the second search is based on only the terms whose weight is 4.

  • Error code 6612: term_weight makeup data fail. This error indicates that no intervention search query takes effect.
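
The term selection described in the first bullet of the note can be illustrated with a small filter. The sketch below is not the OpenSearch rewriting logic. The class and method names are hypothetical, and it only shows which terms remain for a given weight threshold, using the weights 7, 4, and 1 from the examples.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: selects the terms that a retrieval pass would keep.
class RetrievalTermsSketch {
    // Returns the terms whose weight is at least minWeight, joined by single spaces.
    static String keep(String[] tokens, int[] weights, int minWeight) {
        if (tokens.length != weights.length) {
            throw new IllegalArgumentException("tokens and weights must have the same length");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (weights[i] >= minWeight) {
                kept.add(tokens[i]);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String[] tokens = {"data", "permission", "management"};
        int[] weights = {7, 4, 1};
        // With the weights 7, 4, and 1, "weight is 7 or 4" is the same as "weight >= 4".
        System.out.println(keep(tokens, weights, 4)); // data permission
    }
}
```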

Example

Scenario: You have created a query analysis rule with term weight analysis enabled for the OpenSearch application of your e-commerce shopping guide service. After you apply this rule to the online application, the returned search results are unsatisfactory. To resolve the issue, you intervene in term weight analysis.

Unsatisfactory search results: The search query "data permission management" is rewritten as follows: default:'permission' RANK default:'data' RANK default:'management'. However, the target term in the original search query is "data" instead of "permission".

Problem description: The built-in dictionary for term weight analysis cannot meet your requirements. You must intervene in term weight analysis.

Solution: Create an intervention dictionary and associate the intervention dictionary with a query analysis rule that is used for the online application.

Procedure:

1. Log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Dictionary Management in the left-side pane. On the Dictionary Management page, click Create in the upper-right corner to create an intervention dictionary.

In the Create Query Analysis Dictionary panel, specify a name for the intervention dictionary and set the Dictionary Type parameter to Term Weight.

2. Find the created intervention dictionary and click Manage Entries in the Actions column. On the page that appears, click Add Intervention Entry. In the Add Intervention Entry panel, enter a search query, select an analyzer, and specify the weight for each term.

3. Go to the Query Analysis page and click Create in the upper-right corner. In the Add Rule panel, associate the created intervention dictionary for term weight analysis with the query analysis rule. Do not publish the rule in this step.

Note: You can specify whether to ignore spaces in search queries during intervention. By default, spaces are not ignored. For example, the actual search query is sql database. The intervention search query is sqldatabase. If you choose to ignore spaces, the intervention is implemented based on the term weights that are specified in the intervention search query. If you choose to not ignore spaces, no intervention is implemented.

4. Perform a search test to check that the intervention increases the weight of the term "data".
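
The note on ignoring spaces in step 3 can be illustrated as follows. This is a minimal sketch with hypothetical names. It covers only the full-match comparison from the note's example and assumes that ignoring spaces means removing space characters from both queries before they are compared.

```java
// Illustrative only: compares an actual search query with an intervention search query.
class IgnoreSpacesSketch {
    // Returns true when the two queries are equal, optionally after removing spaces.
    static boolean sameQuery(String actual, String intervention, boolean ignoreSpaces) {
        if (ignoreSpaces) {
            actual = actual.replace(" ", "");
            intervention = intervention.replace(" ", "");
        }
        return actual.equals(intervention);
    }

    public static void main(String[] args) {
        // Spaces ignored: "sql database" is treated like "sqldatabase".
        System.out.println(sameQuery("sql database", "sqldatabase", true));  // true
        // Default behavior: spaces are not ignored, so no intervention applies.
        System.out.println(sameQuery("sql database", "sqldatabase", false)); // false
    }
}
```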

Use an SDK to create an intervention dictionary for term weight analysis

Maven dependencies for the SDK for Java:

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-opensearch</artifactId>
    <version>0.7.0</version>
</dependency>
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>4.5.0</version>
</dependency>

Demo of the SDK for Java:

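import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.exceptions.ClientException;
import com.aliyuncs.http.FormatType;
import com.aliyuncs.http.HttpResponse;
import com.aliyuncs.opensearch.model.v20171225.*;
import com.aliyuncs.profile.DefaultProfile;
import com.aliyuncs.profile.IClientProfile;

import java.io.UnsupportedEncodingException;

public class TestTermWeightingInQueryProcessor {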
    private static DefaultAcsClient client;
    public static void main(String[] args) throws Exception {
        String regionId = "cn-hangzhou"; // region Id
        IClientProfile profile = DefaultProfile.getProfile(regionId, "{ak}", "{secret}");
        DefaultProfile.addEndpoint(regionId, regionId, "Opensearch", "opensearch." + regionId + ".aliyuncs.com");
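        // Assign the static field instead of declaring a local variable, so that the helper methods below can use the client.
        client = new DefaultAcsClient(profile);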
        String dictionaryName = "Name of the intervention dictionary"; // The name of the intervention dictionary that you want to create. This demo also uses it as the name of the query analysis rule.
        String appName = "Application name"; // The name of the application that you want to use.
        int versionId = 1234;  // The version ID of the application.
//        System.out.println("List intervention dictionaries");
//        listInterventionDictionaries();
        Thread.sleep(10000);
        System.out.println("Create intervention dictionary: " + dictionaryName);
        createDictionary(dictionaryName);
//
//        Thread.sleep(10000);
//        System.out.println("Describe intervention dictionary");
//        describeInterventionDictionary(dictionaryName);
//
//        Thread.sleep(10000);
//        System.out.println("List intervention dictionary entries before");
//        listEntries(dictionaryName);
        Thread.sleep(10000);
        System.out.println("Post dictionary entries added");
        postEntries(dictionaryName, "add");
        Thread.sleep(10000);
        System.out.println("List intervention dictionary entries after add");
        listEntries(dictionaryName);
        Thread.sleep(10000);
        System.out.println("Set intervention dictionary to qp");
        setQueryProcessor(appName, versionId, dictionaryName);
        // Proceed with caution. Test the query analysis rule in a search test. Do not set the query analysis rule as the default rule until it satisfies your requirements.
        Thread.sleep(10000);
        System.out.println("Set default query processor");
        setDefaultQueryProcessor(appName, versionId, dictionaryName);
//        Thread.sleep(10000);
//        System.out.println("Delete dictionary");
//        deleteDictionary(dictionaryName);
    }
    public static void listInterventionDictionaries() throws ClientException {
        ListInterventionDictionariesRequest listInterventionDictionariesRequest = new ListInterventionDictionariesRequest();
        listInterventionDictionariesRequest.setPageSize(50);
        HttpResponse response = client.doAction(listInterventionDictionariesRequest);
        System.out.println(response.getHttpContentString());
    }
    public static void createDictionary(String dictionaryName) throws UnsupportedEncodingException, ClientException {
        CreateInterventionDictionaryRequest request = new CreateInterventionDictionaryRequest();
        String body = "{\"name\": \"" + dictionaryName + "\", \"type\": \"term_weighting\"}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void describeInterventionDictionary(String dictionaryName) throws ClientException {
        DescribeInterventionDictionaryRequest request = new DescribeInterventionDictionaryRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void listEntries(String dictionaryName) throws ClientException {
        ListInterventionDictionaryEntriesRequest request = new ListInterventionDictionaryEntriesRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void postEntries(String dictionaryName, String cmd) throws UnsupportedEncodingException, ClientException {
        PushInterventionDictionaryEntriesRequest request = new PushInterventionDictionaryEntriesRequest();
        request.setName(dictionaryName);
        // Replace the following search query with the one that you want to intervene in. In this example, the term weights are 7, 4, and 1 in descending order.
        // For more information about the data structure, see InterventionDictionaryEntry (https://www.alibabacloud.com/help/doc-detail/173606.html)
        String body = "[{\n" +
                "  \"word\": \"data permission management\",\n" +
                "  \"cmd\": \"" + cmd + "\",\n" +
                "  \"tokens\": [\n" +
                "    {\n" +
                "      \"token\": \"data\",\n" +
                "      \"weight\": 7\n" +
                "    },\n" +
                "    {\n" +
                "      \"token\": \"permission\",\n" +
                "      \"weight\": 4\n" +
                "    },\n" +
                "    {\n" +
                "      \"token\": \"management\",\n" +
                "      \"weight\": 1\n" +
                "    }\n" +
                "  ]\n" +
                "}]";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void deleteDictionary(String dictionaryName) throws ClientException {
        RemoveInterventionDictionaryRequest request = new RemoveInterventionDictionaryRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void setQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
        CreateQueryProcessorRequest request = new CreateQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        // For more information about search query processors, see QueryProcessor (https://www.alibabacloud.com/help/doc-detail/170014.html)
        String body = "{\"name\":\""+  dictionaryName +"\",\"domain\":\"GENERAL\",\"indexes\":[\"default\"],\"processors\":[{\"name\":\"term_weighting\",\"useSystemDictionary\":true, \"interventionDictionary\":\""+dictionaryName+"\"}]}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void setDefaultQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
        ModifyQueryProcessorRequest request = new ModifyQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        request.setName(dictionaryName);
        String body = "{\"active\":true}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void deleteQueryProcessor(String appName, int versionId, String dictionaryName) throws ClientException {
        RemoveQueryProcessorRequest request = new RemoveQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void listQueryProcessors(String appName, int versionId) throws ClientException {
        ListQueryProcessorsRequest request = new ListQueryProcessorsRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
}
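
The request body in postEntries is concatenated by hand, which is easy to get wrong. A small helper could build the same array from a search query and its tokens. The sketch below is illustrative: the class and method names are hypothetical, it uses no JSON library, and its escaping covers only double quotes and backslashes. The field names word, cmd, tokens, token, and weight are taken from the demo above.

```java
// Illustrative only: builds the JSON body for one intervention entry.
class EntryBodySketch {
    // Wraps a value in double quotes and escapes quotes and backslashes.
    // Control characters are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // tokens and weights are parallel arrays: weights[i] belongs to tokens[i].
    static String entryBody(String word, String cmd, String[] tokens, int[] weights) {
        if (tokens.length != weights.length) {
            throw new IllegalArgumentException("tokens and weights must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("[{\"word\":").append(quote(word));
        sb.append(",\"cmd\":").append(quote(cmd));
        sb.append(",\"tokens\":[");
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"token\":").append(quote(tokens[i]));
            sb.append(",\"weight\":").append(weights[i]).append('}');
        }
        return sb.append("]}]").toString();
    }

    public static void main(String[] args) {
        String body = entryBody("data permission management", "add",
                new String[] {"data", "permission", "management"},
                new int[] {7, 4, 1});
        System.out.println(body);
    }
}
```

The resulting string could be passed to request.setHttpContent in place of the concatenated body. For production use, a JSON library is the safer choice.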