
Named entity recognition

Last Updated: Sep 09, 2021

Overview

The named entity recognition (NER) feature identifies entities with specific semantic meanings in a search query. Query analysis then rewrites query clauses based on the weights of the identified entity categories so that the retrieved documents better match your expectations. The NER feature of OpenSearch supports only text from the e-commerce industry. The following table lists the major entity categories.

Entity category
Common
Material
Style
Element
Color
Brand
Function
Size
Quality
Scenario
People
Suit
Season
Model
New-release
Series
Marketing
Region
Name
Entertainment
Organization
Movie
Game
Number
Unit
Category
New-word
Adjective
Proper-noun
Category-modifier
Symbol
Prefix
Suffix
Gift
Negative
Agent

Scenarios of the NER feature

In query analysis, the NER feature helps rewrite query clauses and improves category prediction.

Rewrite query clauses based on the NER result

Query analysis can rewrite a query clause at most twice. The first rewritten clause is more precise. The second rewritten clause uses fewer terms to retrieve documents, so it matches more broadly. If the first rewritten clause retrieves too few documents, you can use the second rewritten clause to retrieve more documents.
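The fallback from the first to the second rewritten clause can be sketched as follows. This is a hypothetical illustration, not OpenSearch code: `search` stands in for whatever call executes a query clause and returns document IDs, and `min_hits`, the threshold for "too few documents", is an assumed parameter that this document does not define.

```python
from typing import Callable, List, Sequence


def search_with_fallback(clauses: Sequence[str],
                         search: Callable[[str], List[str]],
                         min_hits: int = 10) -> List[str]:
    """Run rewritten clauses in order until one returns enough documents.

    Hypothetical helper: `search` executes one query clause and returns
    document IDs; `min_hits` is an assumed threshold for "insufficient".
    """
    hits: List[str] = []
    for clause in clauses:
        hits = search(clause)
        if len(hits) >= min_hits:
            break  # the more precise clause returned enough documents
    # If every clause falls short, the hits of the broadest (last) clause are returned.
    return hits


if __name__ == "__main__":
    # Fake results: the first clause is precise, the second is broader.
    fake_index = {"first": ["d1"], "second": ["d1", "d2", "d3"]}
    print(search_with_fallback(["first", "second"], fake_index.__getitem__, min_hits=2))
    # ['d1', 'd2', 'd3']
```

Trying the clauses in order keeps the precise clause whenever it is good enough and only broadens the match when it is not.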

OpenSearch rewrites query clauses based on entity priorities, which have three levels: high, medium, and low. Entities with higher priorities are used to retrieve documents. Entities with lower priorities do not affect retrieval but do affect sorting: they are appended to the query clause as RANK terms.

Rules of rewriting query clauses

  1. Entities with high priorities are used to retrieve documents.

  2. Entities with low priorities are not used to retrieve documents.

  3. Medium-priority entities are further ranked: their priorities decrease in the configured order. Medium-priority entities are used according to the following rules:

    1. If a query clause contains high-priority entities, medium-priority entities are used to retrieve documents in the first rewritten clause but not in the second rewritten clause.

    2. If a query clause contains no high-priority entities, medium-priority entities are used to retrieve documents in the first rewritten clause. In the second rewritten clause, only the medium-priority entities of the highest-ranked category are used to retrieve documents.

  4. If a query clause contains no high- or medium-priority entities, the NER result is not used to rewrite the query clause.

  5. If a query clause contains only entities with high priorities, or only entities with high and low priorities, OpenSearch rewrites the query clause only once.
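The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OpenSearch implementation: the priority configuration is copied from the example in this section, the NER output is assumed to be a list of (term, category) pairs in query order, and `rewrite` and `to_clause` are hypothetical names. Rule 5 is applied as "the query contains no medium-priority entities", which is what examples 2 and 3 show. The sketch lists RANK terms in query order, which may differ from the order in which OpenSearch emits them.

```python
from typing import Callable, List, Sequence, Set, Tuple

Entity = Tuple[str, str]                      # (term, entity category), in query order
RewriteResult = Tuple[List[str], List[str]]   # (retrieval terms, RANK-only terms)

# Assumed priority configuration, copied from this section's example.
HIGH: Set[str] = {"category"}
# Medium-priority categories in descending order of priority.
MEDIUM_ORDER: List[str] = ["brand", "material", "element", "style", "color"]


def rewrite(entities: Sequence[Entity],
            high: Set[str] = HIGH,
            medium_order: Sequence[str] = MEDIUM_ORDER) -> List[RewriteResult]:
    """Return zero, one, or two rewrites following the documented rules."""
    has_high = any(c in high for _, c in entities)
    medium_ranks = [medium_order.index(c) for _, c in entities if c in medium_order]

    # Rule 4: no high- or medium-priority entities, so NER is not used.
    if not has_high and not medium_ranks:
        return []

    def split(keep: Callable[[str], bool]) -> RewriteResult:
        # Terms whose category passes `keep` retrieve; all others only rank.
        retrieve = [t for t, c in entities if keep(c)]
        rank = [t for t, c in entities if not keep(c)]
        return retrieve, rank

    # First rewrite: high- and medium-priority entities retrieve (rules 1 to 3).
    first = split(lambda c: c in high or c in medium_order)

    # Rule 5 (as illustrated by examples 2 and 3): no medium entities, rewrite once.
    if not medium_ranks:
        return [first]

    if has_high:
        # Rule 3.1: the second rewrite drops medium-priority entities.
        second = split(lambda c: c in high)
    else:
        # Rule 3.2: keep only the highest-ranked medium category present.
        top = medium_order[min(medium_ranks)]
        second = split(lambda c: c == top)
    return [first, second]


def to_clause(retrieve: List[str], rank: List[str], index: str = "default") -> str:
    """Format one rewrite as a query clause string like those in the examples."""
    clause = " AND ".join(f"{index}:'{t}'" for t in retrieve)
    clause += "".join(f" RANK {index}:'{t}'" for t in rank)
    return "query=" + clause


if __name__ == "__main__":
    # Example 4: entities with medium and low priorities.
    ner = [("耐克", "brand"), ("修身", "element"), ("包邮", "marketing")]
    for retrieve, rank in rewrite(ner):
        print(to_clause(retrieve, rank))
    # query=default:'耐克' AND default:'修身' RANK default:'包邮'
    # query=default:'耐克' RANK default:'修身' RANK default:'包邮'
```

Returning structured (retrieve, rank) pairs and formatting them separately keeps the priority logic independent of the query syntax.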

Example:

Priorities of entities:

High: category
Medium: brand, material, element, style, color (in descending order of priority)
Low: other categories, such as name, suffix, and marketing

1. Query clause that contains entities with high and medium priorities:

query=default:'杨幂同款耐克修身连衣裙包邮'
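NER result: 杨幂 (name) 同款 (suffix) 耐克 (brand) 修身 (element) 连衣裙 (category) 包邮 (marketing)
Meaning of the terms: 杨幂 = Yang Mi (a celebrity), 同款 = same style as, 耐克 = Nike, 修身 = slim-fit, 连衣裙 = dress, 包邮 = free shipping.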

Rewritten query clauses:
First rewritten clause: query=default:'耐克' AND default:'修身' AND default:'连衣裙' RANK default:'杨幂' RANK default:'包邮' RANK default:'同款'
Second rewritten clause: query=default:'连衣裙' RANK default:'杨幂' RANK default:'耐克' RANK default:'包邮' RANK default:'同款' RANK default:'修身'

2. Query clause that contains entities with high and low priorities:

query=default:'连衣裙包邮'
NER result: 连衣裙 (category) 包邮 (marketing)

Rewritten query clause:
query=default:'连衣裙' RANK default:'包邮'

3. Query clause that contains only entities with high priorities:

query=default:'连衣裙'
NER result: 连衣裙 (category)

Rewritten query clause:
query=default:'连衣裙'

4. Query clause that contains entities with medium and low priorities:

query=default:'耐克修身包邮'
NER result: 耐克 (brand) 修身 (element) 包邮 (marketing)

Rewritten query clauses:
First rewritten clause: query=default:'耐克' AND default:'修身' RANK default:'包邮'
Second rewritten clause: query=default:'耐克' RANK default:'修身' RANK default:'包邮'

5. Query clause that contains only entities with low priorities:

query=default:'杨幂同款包邮'
NER result: 杨幂 (name) 同款 (suffix) 包邮 (marketing)

No rewritten query clause is generated based on the NER result.

Use NER with category prediction

Different entity categories carry different weights in a query clause. If category prediction on the original search query does not return the expected result, OpenSearch removes entities that are weakly related or unrelated to the search intent and then runs category prediction again. This improves category prediction for long search queries. Entities of the following categories are retained:

Category
People
Season
Element
Style

Example:

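If the NER result of a query clause is 杨幂 (name) 同款 (suffix) 春季 (season, meaning "spring") 修身 (element) 连衣裙 (category), OpenSearch removes the entities whose categories are not retained (杨幂 and 同款) and generates the following search queries in descending order of priority: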

春季修身连衣裙
春季连衣裙
修身连衣裙
连衣裙

OpenSearch retrieves documents by using the preceding search queries in order.
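The candidate list in the example above can be reproduced with the following sketch. It is an assumption inferred from this single example, not the documented OpenSearch algorithm: the category entity is always kept, the other retained entities are dropped in every combination from the largest subset to the smallest, and ties are broken by the order of the retained categories listed above, treated here as a priority order. `prediction_queries` and `RETAINED` are hypothetical names.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Entity = Tuple[str, str]  # (term, entity category), in query order

# Categories retained for category prediction, in the order listed above.
# Assumption: this order is treated as descending priority.
RETAINED = ("category", "people", "season", "element", "style")


def prediction_queries(entities: Sequence[Entity]) -> List[str]:
    """Build candidate queries for category prediction, most specific first."""
    # Drop entities whose category is not retained (for example, name and suffix).
    kept = [(i, t, c) for i, (t, c) in enumerate(entities) if c in RETAINED]
    required = [k for k in kept if k[2] == "category"]
    # Optional entities, ordered by their assumed priority in RETAINED.
    optional = sorted((k for k in kept if k[2] != "category"),
                      key=lambda k: RETAINED.index(k[2]))
    queries: List[str] = []
    # Largest subsets first; the number of candidates grows as 2 ** len(optional).
    for size in range(len(optional), -1, -1):
        for subset in combinations(optional, size):
            # Re-join the surviving terms in their original query order.
            terms = sorted(required + list(subset))
            if terms:
                queries.append("".join(t for _, t, _ in terms))
    return queries


if __name__ == "__main__":
    ner = [("杨幂", "name"), ("同款", "suffix"), ("春季", "season"),
           ("修身", "element"), ("连衣裙", "category")]
    print(prediction_queries(ner))
    # ['春季修身连衣裙', '春季连衣裙', '修身连衣裙', '连衣裙']
```

Because the candidates double with every optional entity, this approach only suits the short entity lists that remain after filtering.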

Procedure

1. Log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Query Analysis Rule Management in the left-side pane.


2. On the Query Analysis Rule Management page, select an application name and the online or offline version of the application, and then click Create.


3. In the Add Rule panel, enter a rule name, select indexes, set the Industry Type parameter to E-commerce, select Entity Recognition, and then click OK.


Note: In the Configure Entity Type Importance section, you can add or remove entity categories. By default, a built-in dictionary is used. If OpenSearch assigns incorrect categories to specific entities, specify an intervention dictionary.

4. After the rule is created, run a search test and view the process of query analysis.

5. After you confirm that the process of query analysis is correct, click Index Orientation on the Query Analysis Rule Management page. Then, specify the created query analysis rule as the default query analysis rule.


6. Check the default query analysis rule.


Intervention dictionary for NER

The meaning of a specific entity can vary across business scenarios. OpenSearch allows you to use an intervention dictionary for NER to customize the meanings of specific entities. You can intervene in NER in two ways: by adjusting NER results or by adjusting the priorities of entity categories. If NER results are not ideal, you can use an intervention dictionary for NER to adjust them. To use an intervention dictionary for NER, create the dictionary and then configure it for a query analysis rule. For more information about how to create and use an intervention dictionary for NER, see Intervention dictionaries for NER.