
OpenSearch: NER

Last Updated: Mar 02, 2023

Overview

The named entity recognition (NER) feature identifies entities with specific meanings in a search query. Query analysis then rewrites query clauses based on the weights of the identified entity categories so that the retrieved documents better match the search intent. The NER feature of OpenSearch supports only text from the E-commerce industry. The following table lists the major entity categories.

Entity category
Common
Material
Style
Element
Color
Brand
Function
Size
Quality
Scenario
People
Suit
Season
Model
New-release
Series
Marketing
Region
Name
Entertainment
Organization
Movie
Game
Number
Unit
Category
New-word
Adjective
Proper-noun
Category-modifier
Symbol
Prefix
Suffix
Gift
Negative
Agent

Scenarios of the NER feature

In query analysis, the NER feature helps rewrite query clauses and improve category prediction.

Rewrite query clauses based on the NER result

Query analysis can rewrite a query clause at most twice. The first rewritten clause is more precise. The second rewritten clause uses fewer terms to retrieve documents and therefore matches more documents. If the first rewritten clause retrieves too few documents, you can use the second rewritten clause to retrieve more.
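The fallback from the first to the second rewritten clause can be sketched as follows. This is a minimal Python illustration, not OpenSearch code: `search` stands in for whatever call actually runs a query, and `min_hits`, the threshold for "enough" documents, is a hypothetical parameter that this section does not define.

```python
from typing import Callable, List, Sequence


def search_with_fallback(
    clauses: Sequence[str],
    search: Callable[[str], List[str]],
    min_hits: int,
) -> List[str]:
    """Run rewritten clauses in order (most precise first) and stop at the
    first one that returns at least `min_hits` documents.

    `search` and `min_hits` are hypothetical stand-ins, not OpenSearch APIs.
    If no clause reaches the threshold, the last clause's hits are returned.
    """
    hits: List[str] = []
    for clause in clauses:
        hits = search(clause)
        if len(hits) >= min_hits:
            break
    return hits


if __name__ == "__main__":
    # Stubbed "index" that maps each clause to its matching document IDs.
    fake_results = {
        "first": ["doc1"],
        "second": ["doc1", "doc2", "doc3"],
    }
    print(search_with_fallback(["first", "second"], lambda c: fake_results[c], min_hits=2))
    # prints ['doc1', 'doc2', 'doc3']
```

Trying the precise clause first and widening only on a shortfall keeps results as relevant as possible while still avoiding near-empty result pages.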

OpenSearch rewrites query clauses based on entity priorities. Entities with higher priority are used to retrieve documents, whereas entities with lower priority do not affect retrieval but do affect sorting. Entity priority has three levels: high, medium, and low.

The following rules are applied to rewrite query clauses:

  1. Entities with high priority are used to retrieve documents.

  2. Entities with low priority are not used to retrieve documents.

  3. Medium-priority entity categories are further ranked in the order in which they are configured: the earlier a category is listed, the higher its priority. Medium-priority entities are used as follows:

    1. If the query clause contains high-priority entities, all medium-priority entities are used to retrieve documents in the first rewritten clause. In the second rewritten clause, they are used only for sorting.

    2. If the query clause contains no high-priority entities, all medium-priority entities are used to retrieve documents in the first rewritten clause. In the second rewritten clause, only the entities of the highest-ranked medium-priority category present in the query are used to retrieve documents; the others are used only for sorting.

  4. If a query clause contains no high- or medium-priority entities, the NER result is not used to rewrite the query clause.

  5. If a query clause contains only high-priority entities, or only high- and low-priority entities, OpenSearch rewrites the query clause only once.

Examples

The following examples use this priority configuration. Entities of all other categories have low priority:

High: category
Medium: brand, material, element, style, and color

1. A query clause that contains entities with high, medium, and low priorities:

query=default:'Yang Mi Same-style Nike Slim Dresses Delivery-free'
NER result: Yang Mi (name) Same-style (suffix) Nike (brand) Slim (element) Dresses (category) Delivery-free (marketing)

Rewritten query clauses:
First rewritten clause: query=default:'Nike' AND default:'Slim' AND default:'Dresses' RANK default:'Yang Mi' RANK default:'Delivery-free' RANK default:'Same-style'
Second rewritten clause: query=default:'Dresses' RANK default:'Yang Mi' RANK default:'Nike' RANK default:'Delivery-free' RANK default:'Same-style' RANK default:'Slim'

2. A query clause that contains entities with high and low priorities:

query=default:'Dresses Delivery-free'
NER result: Dresses (category) Delivery-free (marketing)

Rewritten query clause:
query=default:'Dresses' RANK default:'Delivery-free'

3. A query clause that contains only entities with high priority:

query=default:'Dresses'
NER result: Dresses (category)

Rewritten query clause:
query=default:'Dresses'

4. A query clause that contains entities with medium and low priorities:

query=default:'Nike Slim Delivery-free'
NER result: Nike (brand) Slim (element) Delivery-free (marketing)

Rewritten query clauses:
First rewritten clause: query=default:'Nike' AND default:'Slim' RANK default:'Delivery-free'
Second rewritten clause: query=default:'Nike' RANK default:'Slim' RANK default:'Delivery-free'

5. A query clause that contains only entities with low priority:

query=default:'Yang Mi Same-style Delivery-free'
NER result: Yang Mi (name) Same-style (suffix) Delivery-free (marketing)

No rewritten query clause is generated based on the NER result.
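The rewrite rules can be reproduced with a short Python sketch that turns an NER result into at most two clauses, consistent with the five examples above. It is an illustration under stated assumptions, not OpenSearch's implementation: the NER result is modeled as a list of `(term, category)` pairs in query order, `HIGH` and `MEDIUM` copy the example configuration, every other category is treated as low priority, and `rewrite` and `to_query` are hypothetical helper names. RANK terms are emitted in query order, so their order may differ from the examples above.

```python
from typing import Callable, List, Sequence, Tuple

# An NER result: (term, category) pairs in the order they appear in the query.
Entity = Tuple[str, str]
# A rewritten clause: (terms used for retrieval, terms used only for RANK).
Clause = Tuple[List[str], List[str]]

# Assumed priority configuration, copied from the examples above.
HIGH = {"category"}
MEDIUM = ["brand", "material", "element", "style", "color"]  # earlier = higher


def rewrite(entities: Sequence[Entity]) -> List[Clause]:
    """Return zero, one, or two rewritten clauses for an NER result."""
    has_high = any(cat in HIGH for _, cat in entities)
    medium_ranks = [MEDIUM.index(cat) for _, cat in entities if cat in MEDIUM]

    # Rule 4: no high- or medium-priority entities, so the NER result is not used.
    if not has_high and not medium_ranks:
        return []

    def build(use_medium: Callable[[str], bool]) -> Clause:
        retrieve: List[str] = []
        rank: List[str] = []
        for term, cat in entities:
            # Rule 1: high priority always retrieves; medium retrieves if selected.
            if cat in HIGH or (cat in MEDIUM and use_medium(cat)):
                retrieve.append(term)
            else:
                rank.append(term)  # Rule 2: low priority only affects sorting.
        return retrieve, rank

    # First clause: every high- and medium-priority entity is used for retrieval.
    first = build(lambda cat: True)
    # Rule 5: with no medium-priority entities, rewrite only once.
    if not medium_ranks:
        return [first]
    if has_high:
        # Rule 3, first case: medium-priority entities move to RANK.
        second = build(lambda cat: False)
    else:
        # Rule 3, second case: keep only the highest-ranked medium category present.
        top = min(medium_ranks)
        second = build(lambda cat: MEDIUM.index(cat) == top)
    return [first, second]


def to_query(clause: Clause) -> str:
    """Format a clause in the query syntax used in the examples."""
    retrieve, rank = clause
    text = " AND ".join(f"default:'{t}'" for t in retrieve)
    text += "".join(f" RANK default:'{t}'" for t in rank)
    return "query=" + text


if __name__ == "__main__":
    # Example 4: medium- and low-priority entities only.
    ner = [("Nike", "brand"), ("Slim", "element"), ("Delivery-free", "marketing")]
    for clause in rewrite(ner):
        print(to_query(clause))
    # query=default:'Nike' AND default:'Slim' RANK default:'Delivery-free'
    # query=default:'Nike' RANK default:'Slim' RANK default:'Delivery-free'
```

Keeping the clause as a `(retrieve, rank)` pair, rather than a string, makes it easy to compare against the examples without depending on RANK term order.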

Use NER with category prediction

Entities of different categories carry different weights in a query clause. If category prediction on the original search query does not produce the expected result, OpenSearch removes entities that are weakly related or unrelated to the search intent and then runs category prediction on the remaining terms. This is especially helpful for long search queries. Entities of the following categories are retained:

Category
People
Season
Element
Style

Example

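If the NER result of a query clause is Yang Mi (name) Same-style (suffix) Spring (season) Slim (element) Dresses (category), OpenSearch removes Yang Mi and Same-style, whose categories are not retained, and uses the remaining entities to generate the following search queries in order of priority: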

Spring Slim Dresses
Spring Dresses
Slim Dresses
Dresses

OpenSearch retrieves documents by using the preceding search queries in order.
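The query list in this example can be reproduced by the following Python sketch. The enumeration order is an assumption inferred from this single example, not a documented algorithm: entities outside the retained categories are dropped, the category entity is always kept, and the other retained entities are combined from the largest subset to the smallest while preserving query order. `prediction_queries` is a hypothetical name, and the check that triggers this step (an unexpected prediction for the original query) is not modeled.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Categories retained for category prediction, as listed in this section.
RESERVED = {"category", "people", "season", "element", "style"}


def prediction_queries(entities: Sequence[Tuple[str, str]]) -> List[str]:
    """Build fallback queries for category prediction from an NER result.

    Assumed ordering: drop unretained entities, always keep category terms,
    and try subsets of the remaining modifiers from largest to smallest.
    """
    kept = [(term, cat) for term, cat in entities if cat in RESERVED]
    core = [i for i, (_, cat) in enumerate(kept) if cat == "category"]
    modifiers = [i for i, (_, cat) in enumerate(kept) if cat != "category"]

    queries: List[str] = []
    for size in range(len(modifiers), -1, -1):
        for combo in combinations(modifiers, size):
            chosen = sorted(list(combo) + core)  # preserve original query order
            if chosen:  # skip the empty query when nothing is retained
                queries.append(" ".join(kept[i][0] for i in chosen))
    return queries


if __name__ == "__main__":
    ner = [("Yang Mi", "name"), ("Same-style", "suffix"), ("Spring", "season"),
           ("Slim", "element"), ("Dresses", "category")]
    for query in prediction_queries(ner):
        print(query)
    # Spring Slim Dresses
    # Spring Dresses
    # Slim Dresses
    # Dresses
```

The number of subsets grows exponentially with the number of retained modifiers, which is acceptable for short e-commerce queries but would need a cap for longer ones.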

Procedure

1. Log on to the OpenSearch console. In the left-side navigation pane, click Retrieval Configuration. On the Basic Configuration page, click Query Analysis Rule Configuration in the left-side pane. On the Query Analysis Rule Configuration page, select an application and the online or offline version of the application, and click Create.


2. In the Create Rule panel, enter a rule name, specify an index range, set the Industry Type parameter to E-commerce, select Entity Recognition, and then click OK.

Note: In the Entity Priorities section, you can add or remove entity categories. By default, NER uses the built-in dictionary. If some entity categories identified by the built-in dictionary are incorrect, specify an intervention dictionary.

3. After the rule is created, run a search test.

Figure: search results of the test.
Figure: viewing the process of query analysis.

4. After you confirm that the process of query analysis is correct, click Index Orientation on the Query Analysis Rule Configuration page. Then, specify the created query analysis rule as the default query analysis rule.


5. Check the default query analysis rule.


Intervention dictionaries for NER

The meaning of an entity can vary across business scenarios. OpenSearch lets you customize how specific entities are interpreted by using an intervention dictionary for NER. You can intervene in NER in two ways: adjust NER results or adjust the priorities of entity categories. To use an intervention dictionary for NER, create the dictionary and then configure it in a query analysis rule. For more information about how to create and use an intervention dictionary for NER, see Intervention dictionaries for NER.