Overview
Entity Recognition, also known as named entity recognition (NER), identifies semantic entities in a search query. Query analysis then uses these results to rewrite the query based on the priority of each entity type. This ensures the retrieved documents match the user's search intent. Currently, the Entity Recognition feature in OpenSearch supports only the e-commerce industry. The main entity types are as follows:
| Category | | | |
| Common word | Material | Style | Style element |
| Color | Brand | Functionality | Size/Specification |
| Quality/Condition | Scenario | People | Set |
| Time/season | Model | New release | Series |
| Marketing service | Location/region | Person name | Cultural products |
| Organization | Film/TV title | Game title | Number |
| Unit | Category | New word | Modifier |
| Proper noun | Category modifier | Symbol | Prefix |
| Suffix | Gift | Negation | Agent |
How entity recognition works
In query analysis, Entity Recognition is primarily used for query rewriting and category prediction.
Query rewriting
OpenSearch query analysis generates up to two rewritten queries. The first query is more precise. The second query uses fewer terms to broaden the search, which is useful when the first query returns too few results.
Query rewriting is based on entity priority, which has three levels: high, medium, and low. Terms from high-priority entities are kept as retrieval conditions, so they always affect recall. Terms from low-priority entities do not affect recall; they only influence algorithmic sorting.
The rules for query rewriting are as follows:
- High-priority entities are always included in retrieval conditions.
- Low-priority entities are never included in retrieval conditions.
- Medium-priority entities are ranked by their configured order in query analysis: the earlier an entity type is listed, the higher its precedence. They are rewritten as follows:
  - If a query contains high-priority entities, the system includes medium-priority entities in the first rewritten query but excludes them from the second.
  - If a query contains no high-priority entities, the system includes medium-priority entities in the first rewritten query. In the second, it includes only terms from the highest-ranked medium-priority entity type; all others are excluded.
- If a query contains no high- or medium-priority entities, Entity Recognition results are not used for query rewriting.
- If a query contains only high-priority entities, or only high- and low-priority entities, the system generates only one rewritten query.
Example:
Assume the entity priorities are set as follows:
High: Category
Medium: Brand, Material, Style element, Style, Color
1. Query with high- and medium-priority entities:
query=default:'Yang Mi same style Nike slim-fit dress free shipping'
Entity Recognition result: Yang Mi (Person name) same style (Suffix) Nike (Brand) slim-fit (Style element) dress (Category) free shipping (Marketing service)
Rewritten queries:
Query1: (default:'Nike' AND default:'slim-fit' AND default:'dress' RANK default:'Yang Mi' RANK default:'free shipping' RANK default:'same style')
Query2: (default:'dress' RANK default:'Yang Mi' RANK default:'Nike' RANK default:'free shipping' RANK default:'same style' RANK default:'slim-fit')
2. Query with high- and low-priority entities:
query=default:'dress free shipping'
Entity Recognition result: dress (Category) free shipping (Marketing service)
Rewritten query:
Query1: (default:'dress' RANK default:'free shipping')
3. Query with only high-priority entities:
query=default:'dress'
Entity Recognition result: dress (Category)
Rewritten query:
Query1: (default:'dress')
4. Query with medium- and low-priority entities:
query=default:'Nike slim-fit free shipping'
Entity Recognition result: Nike (Brand) slim-fit (Style element) free shipping (Marketing service)
Rewritten queries:
Query1: (default:'Nike' AND default:'slim-fit' RANK default:'free shipping')
Query2: (default:'Nike' RANK default:'slim-fit' RANK default:'free shipping')
5. Query with only low-priority entities:
query=default:'Yang Mi same style free shipping'
Entity Recognition result: Yang Mi (Person name) same style (Suffix) free shipping (Marketing service)
No query is rewritten based on Entity Recognition.
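The rewriting rules and the five examples above can be sketched in Python as follows. This is a minimal illustration, not the OpenSearch implementation: the names `Rewritten` and `rewrite` are hypothetical, the priority lists are the ones assumed in the example, and every entity type outside those lists is treated as low priority. The sketch keeps RANK terms in query order, whereas the order of RANK terms in the example output above is not specified by the rules. It also assumes that a second query identical to the first is emitted only once.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Priority configuration assumed in the example above.
HIGH = ["Category"]
MEDIUM = ["Brand", "Material", "Style element", "Style", "Color"]  # configured order

Term = Tuple[str, str]  # (term text, entity type) from Entity Recognition


@dataclass
class Rewritten:
    required: List[str]  # joined with AND: used for recall
    optional: List[str]  # attached with RANK: used for sorting only

    def render(self, index: str = "default") -> str:
        expr = " AND ".join(f"{index}:'{t}'" for t in self.required)
        for t in self.optional:
            expr += f" RANK {index}:'{t}'"
        return f"({expr})"


def rewrite(terms: Sequence[Term],
            high: Sequence[str] = HIGH,
            medium: Sequence[str] = MEDIUM) -> List[Rewritten]:
    """Return zero, one, or two rewritten queries for the recognized terms."""
    medium = list(medium)
    high_idx = [i for i, (_, e) in enumerate(terms) if e in high]
    med_idx = [i for i, (_, e) in enumerate(terms) if e in medium]

    # No high- or medium-priority entities: recognition results are not used.
    if not high_idx and not med_idx:
        return []

    def make(required_idx: List[int]) -> Rewritten:
        req = set(required_idx)
        return Rewritten(
            required=[t for i, (t, _) in enumerate(terms) if i in req],
            optional=[t for i, (t, _) in enumerate(terms) if i not in req],
        )

    # First query: high- and medium-priority terms are retrieval conditions.
    first = make(high_idx + med_idx)
    if not med_idx:
        return [first]  # only high (and possibly low) entities: one query

    if high_idx:
        # Second query: keep only the high-priority terms.
        second = make(high_idx)
    else:
        # Second query: keep only the highest-ranked medium entity type present.
        top_type = medium[min(medium.index(terms[i][1]) for i in med_idx)]
        second = make([i for i in med_idx if terms[i][1] == top_type])

    # Assumption: an identical second query is not emitted twice.
    return [first] if second == first else [first, second]


if __name__ == "__main__":
    recognized = [("Nike", "Brand"), ("slim-fit", "Style element"),
                  ("free shipping", "Marketing service")]
    for query in rewrite(recognized):
        print(query.render())
```

Run against example 4, the sketch prints the same two expressions as Query1 and Query2 in that example.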
Use with category prediction
If a query fails to yield a category prediction, the system removes terms with low relevance to the category and retries the prediction. This process significantly improves category prediction for long-tail queries. The entity types that are kept for this process include:
Category
People
Time/season
Style element
Style
Example:
For the query Yang Mi (Person name) same style (Suffix) spring (Time/season) slim-fit (Style element) dress (Category), the queries after term removal are prioritized as follows:
spring slim-fit dress
spring dress
slim-fit dress
dress
The system attempts category prediction by using these queries in the specified order.
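The term-removal step can be sketched as follows. The source does not state the exact ordering rule, so this sketch rests on assumptions inferred from the single example above: terms of the Category type are always kept, the other retained terms are dropped in progressively larger groups, and ties follow query order. The function name `fallback_queries` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Entity types that are kept when retrying category prediction.
KEPT_TYPES = {"Category", "People", "Time/season", "Style element", "Style"}


def fallback_queries(terms: Sequence[Tuple[str, str]]) -> List[str]:
    """Build candidate queries, most specific first (assumed ordering)."""
    # Drop terms whose entity type is not in the kept list.
    kept = [(t, e) for t, e in terms if e in KEPT_TYPES]
    # Positions of kept terms that are not Category terms.
    modifier_idx = [i for i, (_, e) in enumerate(kept) if e != "Category"]

    queries: List[str] = []
    # Keep all modifiers first, then progressively fewer of them.
    for size in range(len(modifier_idx), -1, -1):
        for combo in combinations(modifier_idx, size):
            chosen = set(combo)
            words = [t for i, (t, e) in enumerate(kept)
                     if e == "Category" or i in chosen]
            if words:
                queries.append(" ".join(words))
    return queries


if __name__ == "__main__":
    recognized = [("Yang Mi", "Person name"), ("same style", "Suffix"),
                  ("spring", "Time/season"), ("slim-fit", "Style element"),
                  ("dress", "Category")]
    for q in fallback_queries(recognized):
        print(q)
```

For the example query, the sketch yields the four candidates in the same order as listed above.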
Procedure
1. In the OpenSearch console, navigate to retrieval configuration in the left-side navigation pane, and then click query analysis configuration. Select the application name and application type (online/offline), and then click Create.
2. Enter a rule name, select an index range, set Industry Type to Enhanced Query Analysis for E-commerce, select Entity Recognition as the feature, and then click OK.
By default, entity priorities are as follows:
High: Brand, Category
Medium: Material, Style element, Style, Color, Functionality, Scenario, People, Time/season, Model, Location/region, Person name, Modifier, Category modifier
Low: Size/Specification, Quality/Condition, Set, New release, Series, Marketing service, Cultural products, Organization
You can customize the entity types in each level.
Note: When you customize entity priorities, the system still uses its built-in dictionary to recognize entities. If the system incorrectly identifies entities, you can use an intervention dictionary to correct the results.
3. After the rule is created, you can test its search performance.
In the list of query analysis configurations, find the target rule and click Search Test in the Actions column. On the search test page, enter a query (for example, query=title:'Chinese style summer hanfu silk mulberry silk modified version cheongsam dress'), select your query analysis rule name (for example, test_entity) for the qp parameter, and click Search.
At the bottom of the page, the Actual Query Executed by System section shows how the query was rewritten. In this example, the query becomes multiple AND conditions on the title field: (title:'Chinese' AND title:'style' AND title:'summer' AND title:'hanfu' AND title:'silk' AND title:'mulberry' AND title:'silk' AND title:'modified' AND title:'version' AND title:'cheongsam' AND title:'dress').
To view the detailed analysis, click View query analysis details. The Query analysis details pop-up window displays the rewritten query (a boolean search expression), the original query, and the index name. It also lists the processing results of each feature in a table, including the specific output of normalization, tokenization, and Entity Recognition, and provides configuration links for spelling correction, stop word, synonym, and category prediction.
At the bottom, a link labeled Incorrect query analysis? Go to intervene takes you to the intervention dictionary, where you can make corrections.
4. After debugging is complete and the rule works as expected, go to the query analysis page and switch to the "Index View" tab. Find your rule and click Set as Default in the Actions column to make it the default rule.
5. After you set a custom query analysis rule as the default, the UI updates as follows:
A [Default] tag is added to the rule name, for example, [Default] test_entity.
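The default entity priorities from step 2, and the effect of customizing them, can be pictured with a small sketch. The data structure and the helper names `priority_of` and `customize` are hypothetical and only illustrate the idea; they are not an OpenSearch API. Entity types that the default configuration does not list return `None` here because the source does not say which level they fall into.

```python
from typing import Dict, List, Optional

# Default priorities as listed in step 2 of the procedure.
DEFAULT_PRIORITIES: Dict[str, List[str]] = {
    "high": ["Brand", "Category"],
    "medium": ["Material", "Style element", "Style", "Color", "Functionality",
               "Scenario", "People", "Time/season", "Model", "Location/region",
               "Person name", "Modifier", "Category modifier"],
    "low": ["Size/Specification", "Quality/Condition", "Set", "New release",
            "Series", "Marketing service", "Cultural products", "Organization"],
}


def priority_of(entity_type: str,
                priorities: Dict[str, List[str]] = DEFAULT_PRIORITIES) -> Optional[str]:
    """Return the level of an entity type, or None if it is not configured."""
    for level, types in priorities.items():
        if entity_type in types:
            return level
    return None


def customize(priorities: Dict[str, List[str]],
              entity_type: str, level: str) -> Dict[str, List[str]]:
    """Return a copy in which entity_type belongs to exactly one level."""
    updated = {lvl: [t for t in types if t != entity_type]
               for lvl, types in priorities.items()}
    updated[level].append(entity_type)  # appended last, so lowest precedence
    return updated


if __name__ == "__main__":
    custom = customize(DEFAULT_PRIORITIES, "Brand", "medium")
    print(priority_of("Brand"), "->", priority_of("Brand", custom))
```

Because medium-priority entities take precedence in their configured order, the position at which a type is added to the medium level matters; this sketch simply appends it at the end.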
Entity recognition intervention dictionary
Because the meaning of a term can vary by business scenario, OpenSearch provides an intervention dictionary for Entity Recognition that lets you customize entity meanings. You can intervene in two main ways: by directly correcting the Entity Recognition results or by adjusting entity priorities. After you create an intervention dictionary and associate it with your query analysis rule, its entries override the default behavior of the Entity Recognition feature. For details about how to configure and use intervention dictionaries, see Intervention Dictionaries.