OpenSearch: Intervention dictionaries for term weight analysis

Last Updated: Nov 22, 2023

Overview

OpenSearch provides a built-in dictionary to analyze term weights. To intervene in this analysis, create an intervention dictionary for term weight analysis and specify it when you create or modify a query analysis rule. Perform the following steps:

  1. Create an intervention dictionary for term weight analysis. To create an intervention dictionary, log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Dictionary Management in the left-side pane. On the Dictionary Management page, click Create. In the Create Query Analysis Dictionary panel, enter a name for the intervention dictionary, select a dictionary type, and then click Save. After the intervention dictionary is created, it appears in the dictionary list.

  2. Add and manage intervention entries in the intervention dictionary. Find the created dictionary in the dictionary list and click Manage Entries in the Actions column to go to the Manage Entries page. On this page, add and manage intervention entries as needed. You can specify a search query in an intervention entry and select an analyzer to segment the search query into terms. Supported analyzers include general analyzers, E-commerce analyzers, IT content analyzers, and custom analyzers that are developed based on one of these analyzer types. Then, you can specify a high, medium, or low priority level for each term.

  3. Use the intervention dictionary. After you add intervention entries to the intervention dictionary, you can associate the intervention dictionary with a query analysis rule on an application as needed.

  4. Test and publish the intervention dictionary. After the intervention dictionary is associated with the query analysis rule, we recommend that you perform a search test before you apply the rule to online environments. This ensures expected search performance.

Intervention rules

Intervention search queries are used to intervene in actual search queries based on the following rules:

  1. Rule 1: If an actual search query fully or partially matches an intervention search query, this intervention entry prevails over other intervention entries. If the actual search query partially matches the intervention search query, the number of matched terms cannot exceed five.

  2. Rule 2: If the terms of an intervention search query match the terms that are closest to the beginning of an actual search query, this intervention search query has a higher priority.

  3. Rule 3: If one or more terms of two intervention search queries match the same terms of an actual search query, the intervention search query that matches more terms of the actual search query takes priority. A maximum of five matched terms is supported. For example, the search query "mysqldatabase" contains two terms, which are "mysql" and "database".

  4. Rule 4: When you associate an intervention dictionary for term weight analysis with a query analysis rule, you can specify whether to ignore spaces in search queries.

Examples

  • Sample intervention search queries:

    1. database permission management: This search query can be segmented into three terms, which are database, permission, and management. The weights of the three terms are 7, 4, and 1 in sequence.

    2. MySQL database: This search query can be segmented into two terms, which are MySQL and database. The weights of the two terms are 7 and 1 in sequence.

    3. database permission: This search query can be segmented into two terms, which are database and permission. The weights of the two terms are 4 and 1 in sequence.

    4. MySQL database permission management in the Linux environment: This search query can be segmented into eight terms, which are MySQL, database, permission, management, in, the, Linux, and environment. The weights of the eight terms are 7, 4, 1, 1, 1, 1, 7, and 1. The number of terms in this search query exceeds five.

  • Actual search queries:

    • MySQL database permission management: The intervention search query "MySQL database" takes effect based on Rule 2.

    • SQL Server database permission management: The intervention search query "database permission management" takes effect based on Rule 3.

    • database permission configuration: The intervention search query "database permission" takes effect based on Rule 2.

    • configure database permissions: No intervention search query is matched.

    • MySQL database permission management in the Linux environment: The intervention search query "MySQL database permission management in the Linux environment" takes effect based on Rule 1.

    • instructions on the MySQL database permission management in the Linux environment: This actual search query partially matches the intervention search query "MySQL database permission management in the Linux environment". However, the number of matched terms after analysis exceeds five. As a result, the intervention search query does not take effect.

    • MySQL database permission configuration: The intervention search queries "MySQL database" and "database permission" take effect.
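The sample intervention search queries above can be written down as data, which makes the five-term limit from Rule 1 easier to check. The following is a minimal sketch, not OpenSearch code: the class `InterventionEntrySketch` and its methods are hypothetical names, and the check assumes that an entry with more than five terms can take effect only on a full match, which is one reading of Rule 1 that agrees with the last two examples above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an intervention entry; this class is not part of the OpenSearch SDK.
final class InterventionEntrySketch {
    // Assumption drawn from Rule 1: a partial match may cover at most five terms.
    static final int MAX_PARTIAL_MATCH_TERMS = 5;

    final String query;
    final List<String> terms;
    final List<Integer> weights;

    InterventionEntrySketch(String query, List<String> terms, List<Integer> weights) {
        if (terms.size() != weights.size()) {
            throw new IllegalArgumentException("each term needs exactly one weight");
        }
        for (int weight : weights) {
            // The examples in this topic use only the weights 7, 4, and 1.
            if (weight != 7 && weight != 4 && weight != 1) {
                throw new IllegalArgumentException("unexpected weight: " + weight);
            }
        }
        this.query = query;
        this.terms = terms;
        this.weights = weights;
    }

    // True if the entry is short enough to take effect on a partial match (assumed reading of Rule 1).
    boolean usableForPartialMatch() {
        return terms.size() <= MAX_PARTIAL_MATCH_TERMS;
    }

    // The four sample intervention search queries listed in this topic.
    static List<InterventionEntrySketch> samples() {
        List<InterventionEntrySketch> entries = new ArrayList<InterventionEntrySketch>();
        entries.add(new InterventionEntrySketch("database permission management",
                Arrays.asList("database", "permission", "management"),
                Arrays.asList(7, 4, 1)));
        entries.add(new InterventionEntrySketch("MySQL database",
                Arrays.asList("MySQL", "database"),
                Arrays.asList(7, 1)));
        entries.add(new InterventionEntrySketch("database permission",
                Arrays.asList("database", "permission"),
                Arrays.asList(4, 1)));
        entries.add(new InterventionEntrySketch(
                "MySQL database permission management in the Linux environment",
                Arrays.asList("MySQL", "database", "permission", "management",
                        "in", "the", "Linux", "environment"),
                Arrays.asList(7, 4, 1, 1, 1, 1, 7, 1)));
        return entries;
    }

    public static void main(String[] args) {
        for (InterventionEntrySketch entry : samples()) {
            System.out.println(entry.query + " -> usable for partial match: "
                    + entry.usableForPartialMatch());
        }
    }
}
```

Only the eight-term entry is reported as not usable for a partial match. The sketch does not attempt to reproduce how OpenSearch chooses between competing entries under Rule 2 and Rule 3.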

Important

  • When OpenSearch intervenes in a search query, it may rewrite the search query into two versions. The first version retrieves documents based on the terms whose weight is 7 or 4. The second version is used for a re-search, which by default is performed only when the first version returns no search results. To increase the number of retrieved documents, the second version uses only the terms whose weight is 7.

  • Error code 6612: term_weight makeup data fail. This error indicates that no intervention search query takes effect.

  • When you add intervention entries to an intervention dictionary for term weight analysis, you can configure a tailored model analyzer only for exclusive applications.
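The two query versions described in the first item above can be illustrated by selecting terms by weight. The following is a minimal sketch under stated assumptions: `RewriteVersionsSketch` and its methods are hypothetical names, the weights are those of the "data permission management" entry in the SDK demo later in this topic, and the sketch only selects terms. It does not reproduce the query clause that OpenSearch actually generates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that illustrates term selection only; not part of the OpenSearch SDK.
final class RewriteVersionsSketch {

    // Returns the terms whose weight is at least minWeight, in their original order.
    static List<String> termsWithWeightAtLeast(Map<String, Integer> weights, int minWeight) {
        List<String> selected = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (entry.getValue() >= minWeight) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    // First version: terms whose weight is 7 or 4.
    static List<String> firstVersionTerms(Map<String, Integer> weights) {
        return termsWithWeightAtLeast(weights, 4);
    }

    // Second version (re-search): terms whose weight is 7 only.
    static List<String> reSearchTerms(Map<String, Integer> weights) {
        return termsWithWeightAtLeast(weights, 7);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the terms in query order.
        Map<String, Integer> weights = new LinkedHashMap<String, Integer>();
        weights.put("data", 7);
        weights.put("permission", 4);
        weights.put("management", 1);

        System.out.println(firstVersionTerms(weights)); // prints [data, permission]
        System.out.println(reSearchTerms(weights));     // prints [data]
    }
}
```

Dropping the weight-4 term in the second version loosens the query, which is why the re-search can return documents when the first version returns none.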

Example

Scenario: You have created query analysis rules with term weight analysis capabilities for the OpenSearch application of your E-commerce shopping guide service. After you apply these rules to the online application, the returned search results are unsatisfactory. To resolve the issue, intervention in term weight analysis is implemented.

Problem description: A user enters the search query "data permission management". After term weight analysis, the query clause is rewritten as default:'permission' RANK default:'data' RANK default:'management'. However, the user expects the search to hit "data" rather than "permission".

Cause: The built-in dictionary for term weight analysis cannot meet requirements.

Solution: Create an intervention dictionary for term weight analysis and associate the intervention dictionary with a query analysis rule on the online application.

Procedure:

  1. Log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Dictionary Management in the left-side pane. On the Dictionary Management page, click Create.

In the Create Query Analysis Dictionary panel, enter a name for the intervention dictionary and set the Dictionary Type parameter to Term Weight.

  2. Find the created intervention dictionary and click Manage Entries in the Actions column. On the page that appears, click Add Intervention Entry. In the Add Intervention Entries panel, enter a search query, set the Analyzer Type parameter to Built-in Analyzer or Custom Analyzer, and then specify the weight for each term. You can select an analyzer type based on the type of analyzer used by indexes specified in the query analysis rule that you want to use.

If a tailored model analyzer is used for indexes specified in the query analysis rule, set the Analyzer Type parameter to Tailored Model Analyzer, select the name of your application from the Select Instance drop-down list, and then select the tailored model analyzer.

  3. On the Query Analysis Rule Configuration page, associate the created intervention dictionary for term weight analysis with a query analysis rule. Do not apply the rule to the online application in this step.

Important

You can specify whether to ignore spaces in search queries during intervention in term weight analysis. By default, spaces are not ignored. For example, the actual search query is "sql database". The intervention search query is "sqldatabase". If you choose to ignore spaces, intervention is implemented based on the term weights that are specified for the intervention search query. If you choose not to ignore spaces, intervention is not implemented.

  4. Perform a search test. In this example, the weight of the term "data" is higher than that of the term "permission".
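The option to ignore spaces, described in the Important note of this procedure, can be pictured as a normalization step applied before the actual search query is compared with the intervention search query. The following is a minimal sketch of that idea. `IgnoreSpacesSketch` and its methods are hypothetical names, and the comparison is a simplification that covers only full matches. It is not how OpenSearch implements the option.

```java
// Hypothetical illustration of the "ignore spaces" option; not part of the OpenSearch SDK.
final class IgnoreSpacesSketch {

    // Removes all whitespace if ignoreSpaces is true; otherwise only trims the ends.
    static String normalize(String query, boolean ignoreSpaces) {
        String trimmed = query.trim();
        return ignoreSpaces ? trimmed.replaceAll("\\s+", "") : trimmed;
    }

    // Simplified full-match comparison between an actual and an intervention search query.
    static boolean sameQuery(String actual, String intervention, boolean ignoreSpaces) {
        return normalize(actual, ignoreSpaces).equals(normalize(intervention, ignoreSpaces));
    }

    public static void main(String[] args) {
        // Spaces ignored: "sql database" is treated like "sqldatabase".
        System.out.println(sameQuery("sql database", "sqldatabase", true));  // prints true
        // Default behavior: spaces are significant, so the queries differ.
        System.out.println(sameQuery("sql database", "sqldatabase", false)); // prints false
    }
}
```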

Use an SDK to create an intervention dictionary for term weight analysis

Maven dependencies for the SDK for Java

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-opensearch</artifactId>
    <version>0.7.0</version>
</dependency>
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>4.5.0</version>
</dependency>

Demo of the SDK for Java

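import java.io.UnsupportedEncodingException;

import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.exceptions.ClientException;
import com.aliyuncs.http.FormatType;
import com.aliyuncs.http.HttpResponse;
// Request classes of the OpenSearch SDK. The package name is assumed for this SDK version; adjust it if your version differs.
import com.aliyuncs.opensearch.model.v20171225.*;
import com.aliyuncs.profile.DefaultProfile;
import com.aliyuncs.profile.IClientProfile;

public class TestTermWeightingInQueryProcessor {
    // The shared client that is used by all helper methods in this class.
    private static DefaultAcsClient client;
    public static void main(String[] args) throws Exception {
        String regionId = "cn-hangzhou"; // The region ID.
        // Replace {ak} and {secret} with your AccessKey ID and AccessKey secret.
        IClientProfile profile = DefaultProfile.getProfile(regionId, "{ak}", "{secret}");
        DefaultProfile.addEndpoint(regionId, regionId, "Opensearch", "opensearch." + regionId + ".aliyuncs.com");
        // Assign the static field. Declaring a local variable here would leave the field null in the helper methods.
        client = new DefaultAcsClient(profile);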
        String dictionaryName = "Name of the intervention dictionary"; // The name of the intervention dictionary that you want to create.
        String appName = "Application name"; // The name of the application that you want to use.
        int versionId = 1234;  // The version ID of the application.
//        System.out.println("List intervention dictionaries");
//        listInterventionDictionaries();
        Thread.sleep(10000);
        System.out.println("Create intervention dictionary: " + dictionaryName);
        createDictionary(dictionaryName);
//
//        Thread.sleep(10000);
//        System.out.println("Describe intervention dictionary");
//        describeInterventionDictionary(dictionaryName);
//
//        Thread.sleep(10000);
//        System.out.println("List intervention dictionary entries before");
//        listEntries(dictionaryName);
        Thread.sleep(10000);
        System.out.println("Post dictionary entries added");
        postEntries(dictionaryName, "add");
        Thread.sleep(10000);
        System.out.println("List intervention dictionary entries after add");
        listEntries(dictionaryName);
        Thread.sleep(10000);
        System.out.println("Set intervention dictionary to qp");
        setQueryProcessor(appName, versionId, dictionaryName);
        // Proceed with caution. Verify the query analysis rule in a search test first. Do not set the rule as the default rule until it meets your requirements.
        Thread.sleep(10000);
        System.out.println("Set default query processor");
        setDefaultQueryProcessor(appName, versionId, dictionaryName);
//        Thread.sleep(10000);
//        System.out.println("Delete dictionary");
//        deleteDictionary(dictionaryName);
    }
    public static void listInterventionDictionaries() throws ClientException {
        ListInterventionDictionariesRequest listInterventionDictionariesRequest = new ListInterventionDictionariesRequest();
        listInterventionDictionariesRequest.setPageSize(50);
        HttpResponse response = client.doAction(listInterventionDictionariesRequest);
        System.out.println(response.getHttpContentString());
    }
    public static void createDictionary(String dictionaryName) throws UnsupportedEncodingException, ClientException {
        CreateInterventionDictionaryRequest request = new CreateInterventionDictionaryRequest();
        String body = "{\"name\": \"" + dictionaryName + "\", \"type\": \"term_weighting\"}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void describeInterventionDictionary(String dictionaryName) throws ClientException {
        DescribeInterventionDictionaryRequest request = new DescribeInterventionDictionaryRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void listEntries(String dictionaryName) throws ClientException {
        ListInterventionDictionaryEntriesRequest request = new ListInterventionDictionaryEntriesRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void postEntries(String dictionaryName, String cmd) throws UnsupportedEncodingException, ClientException {
        PushInterventionDictionaryEntriesRequest request = new PushInterventionDictionaryEntriesRequest();
        request.setName(dictionaryName);
        // Replace the following search query with the one that you want to intervene in. In this example, the term weights are 7, 4, and 1 in descending order.
        String body = "[{\n" +
                "  \"word\": \"data permission management\",\n" +
                "  \"cmd\": \"" + cmd + "\",\n" +
                "  \"tokens\": [\n" +
                "    {\n" +
                "      \"token\": \"data\",\n" +
                "      \"weight\": 7\n" +
                "    },\n" +
                "    {\n" +
                "      \"token\": \"permission\",\n" +
                "      \"weight\": 4\n" +
                "    },\n" +
                "    {\n" +
                "      \"token\": \"management\",\n" +
                "      \"weight\": 1\n" +
                "    }\n" +
                "  ]\n" +
                "}]";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void deleteDictionary(String dictionaryName) throws ClientException {
        RemoveInterventionDictionaryRequest request = new RemoveInterventionDictionaryRequest();
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void setQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
        CreateQueryProcessorRequest request = new CreateQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        
        String body = "{\"name\":\""+  dictionaryName +"\",\"domain\":\"GENERAL\",\"indexes\":[\"default\"],\"processors\":[{\"name\":\"term_weighting\",\"useSystemDictionary\":true, \"interventionDictionary\":\""+dictionaryName+"\"}]}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void setDefaultQueryProcessor(String appName, int versionId, String dictionaryName) throws UnsupportedEncodingException, ClientException {
        ModifyQueryProcessorRequest request = new ModifyQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        request.setName(dictionaryName);
        String body = "{\"active\":true}";
        request.setHttpContent(body.getBytes("UTF-8"), "UTF-8", FormatType.JSON);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void deleteQueryProcessor(String appName, int versionId, String dictionaryName) throws ClientException {
        RemoveQueryProcessorRequest request = new RemoveQueryProcessorRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        request.setName(dictionaryName);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
    public static void listQueryProcessors(String appName, int versionId) throws ClientException {
        ListQueryProcessorsRequest request = new ListQueryProcessorsRequest();
        request.setAppGroupIdentity(appName);
        request.setAppId(versionId);
        HttpResponse response = client.doAction(request);
        System.out.println(response.getHttpContentString());
    }
}
Note

  • For more information about the data structure of an intervention entry, see InterventionDictionaryEntry.

  • If a search query in an intervention entry contains English letters, use lowercase letters only.

  • For more information about the data structure of a query analysis rule, see QueryProcessor.
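The request body that postEntries in the demo assembles by hand can also be produced by a small helper that lowercases English text, in line with the note above. The following is a minimal sketch: `EntryBodySketch` and `buildBody` are hypothetical names that are not part of the SDK, and the field names word, cmd, tokens, token, and weight are copied from the demo. The escaping handles only quotation marks and backslashes, so a JSON library is a better choice for arbitrary input.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that builds the entry body used by postEntries; not part of the OpenSearch SDK.
final class EntryBodySketch {

    // Minimal JSON string escaping: backslashes and double quotation marks only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a one-entry JSON array. English letters in the query and tokens are lowercased.
    static String buildBody(String word, String cmd, Map<String, Integer> tokenWeights) {
        StringBuilder body = new StringBuilder();
        body.append("[{\"word\":\"").append(escape(word.toLowerCase(Locale.ROOT))).append("\",");
        body.append("\"cmd\":\"").append(escape(cmd)).append("\",");
        body.append("\"tokens\":[");
        boolean first = true;
        for (Map.Entry<String, Integer> entry : tokenWeights.entrySet()) {
            if (!first) {
                body.append(',');
            }
            first = false;
            body.append("{\"token\":\"")
                .append(escape(entry.getKey().toLowerCase(Locale.ROOT)))
                .append("\",\"weight\":")
                .append(entry.getValue())
                .append('}');
        }
        body.append("]}]");
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tokens in the order in which they appear in the query.
        Map<String, Integer> tokenWeights = new LinkedHashMap<String, Integer>();
        tokenWeights.put("data", 7);
        tokenWeights.put("permission", 4);
        tokenWeights.put("management", 1);
        System.out.println(buildBody("data permission management", "add", tokenWeights));
    }
}
```

The returned string can be passed to request.setHttpContent in the same way as the hand-written body in the demo.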