OpenSearch: Basic built-in features

Last Updated: Jan 26, 2024

This topic describes the basic built-in features used in custom sorting models.

Diagram of basic features

[Figure: diagram of basic features]

Types of basic features

The basic features are classified into two categories: item features and user features.

Item features:

  • Field features: Select the feature fields that you want to process. By default, all fields in an application are supported. Then, select a processing method for each feature field. For example, for feature fields of the text type, you can select tokenization or vectorization. For feature fields of the numeric type, select original value mapping. If the required feature fields are not in the application, you can add them by using a MaxCompute external table.

  • Statistical features: The system collects statistics such as the number of impressions, the number of clicks, and the click-through rate (CTR) of an item in the previous seven days based on the search logs and collected behavior data of the application.
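The statistical features described above can be pictured as a windowed aggregation over behavior events. The following is a minimal sketch, not the system's actual implementation: the event tuple layout, the `window_stats` name, and the treatment of the window as the seven full days before `today` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def window_stats(events, today, days=7):
    """Aggregate impressions, clicks, and CTR per item over the previous `days` days.

    `events` is an iterable of (item_id, event_date, event_type) tuples, where
    event_type is "impression" or "click". This record layout is hypothetical.
    """
    start = today - timedelta(days=days)
    counts = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for item_id, event_date, event_type in events:
        # Keep only events that fall in [today - days, today).
        if not (start <= event_date < today):
            continue
        if event_type == "impression":
            counts[item_id]["impressions"] += 1
        elif event_type == "click":
            counts[item_id]["clicks"] += 1

    stats = {}
    for item_id, c in counts.items():
        # CTR = clicks / impressions; items without impressions get 0.0.
        ctr = c["clicks"] / c["impressions"] if c["impressions"] else 0.0
        stats[item_id] = {"impressions": c["impressions"], "clicks": c["clicks"], "ctr": ctr}
    return stats


events = [
    ("a", date(2024, 1, 20), "impression"),
    ("a", date(2024, 1, 20), "impression"),
    ("a", date(2024, 1, 21), "click"),
    ("a", date(2024, 1, 1), "impression"),  # outside the seven-day window
    ("b", date(2024, 1, 25), "impression"),
]
stats = window_stats(events, today=date(2024, 1, 26))
```

Passing `days=30` or `days=1` to the same function gives the other window lengths. The built-in features are additionally discretized; that step is not shown because this topic does not describe how it is done.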

User features:

  • User profile features: You can use a MaxCompute external table to import user profile features for model training. When user behaviors are predicted, the query field is used to pass parameters. This type of feature is not supported in OpenSearch.

  • Query features: You can pass query features by specifying raw_query in a query request. Features of this type carry query-related information, such as the query tokens and their vectorization.
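As a rough illustration of passing raw_query alongside a query, the sketch below URL-encodes a set of request parameters. Only the raw_query parameter name comes from this topic. The `build_search_params` helper and the form of the `query` clause are hypothetical placeholders, and no endpoint or SDK call is shown; check the search API reference for the actual request format.

```python
from urllib.parse import urlencode


def build_search_params(user_query):
    """Return request parameters that carry the user's original query text."""
    return {
        # Hypothetical placeholder for the application's own query clause.
        "query": f"default:'{user_query}'",
        # raw_query carries the original, untokenized query text.
        "raw_query": user_query,
    }


params = build_search_params("white t-shirt")
query_string = urlencode(params)
```

Keeping raw_query separate from the query clause means the original user input stays available even when the query clause itself is rewritten.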

Processing methods for item features:

  • Original value mapping

  • Tokenization

  • Generating lookup features after tokenization

  • Counting the number of terms after tokenization

Examples:

Suppose that the field to be processed contains "White T-shirt". The following list shows the result of each processing method:

Original value mapping: "White T-shirt".

Tokenization: "White^]T-shirt". The tokens are separated by ^].

Generating lookup features after tokenization: "White:White^]T-shirt:T-shirt".

Counting the number of terms after tokenization: 2.
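The four results above can be reproduced with a short sketch. It assumes a whitespace tokenizer in place of the real analyzer and treats the separator as the literal two-character string "^]" as it appears in this topic (in stored data it may be a single control character). All function names are illustrative.

```python
SEP = "^]"  # separator as displayed in this topic (assumption: literal text)


def tokenize(text):
    # Assumption: whitespace splitting stands in for the real analyzer.
    return text.split()


def original_value(text):
    # Original value mapping: the field value is used unchanged.
    return text


def tokenized(text):
    # Tokenization: tokens joined by the separator.
    return SEP.join(tokenize(text))


def lookup_feature(text):
    # Lookup feature: each token becomes a "token:token" key-value entry.
    return SEP.join(f"{token}:{token}" for token in tokenize(text))


def term_count(text):
    # Number of terms after tokenization.
    return len(tokenize(text))


field = "White T-shirt"
results = {
    "original": original_value(field),
    "tokenized": tokenized(field),
    "lookup": lookup_feature(field),
    "count": term_count(field),
}
```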

Built-in item features

| Field name (feature name) | Field type | Description |
| --- | --- | --- |
| system_item_id | STRING | The ID of the item, which uniquely identifies the item. |
| system_all_nid_ctr_30 | BIGINT | The CTR of the item in the previous 30 days. The feature is discretized. |
| system_all_nid_ctr_7 | BIGINT | The CTR of the item in the previous seven days. The feature is discretized. |
| system_all_nid_ctr_1 | BIGINT | The CTR of the item within the previous day. The feature is discretized. |
| system_all_nid_pv_30 | BIGINT | The number of impressions of the item in the previous 30 days. The feature is discretized. |
| system_all_nid_pv_7 | BIGINT | The number of impressions of the item in the previous seven days. The feature is discretized. |
| system_all_nid_pv_1 | BIGINT | The number of impressions of the item within the previous day. The feature is discretized. |
| system_all_nid_ipv_30 | BIGINT | The number of clicks on the item in the previous 30 days. The feature is discretized. |
| system_all_nid_ipv_7 | BIGINT | The number of clicks on the item in the previous seven days. The feature is discretized. |
| system_all_nid_ipv_1 | BIGINT | The number of clicks on the item within the previous day. The feature is discretized. |
| system_query_score_decay | STRING | The click ratios of the top N queries relevant to the item. By default, N is 20. Example: 'query1:score1^]query2:score2'. |
| system_qterm_score_decay | STRING | The click ratios of the top N query terms relevant to the item. By default, N is 300. Example: 'term1:score1^]term2:score2'. |
| system_query_ctr_decay | STRING | The CTRs of the top N queries relevant to the item. Example: 'query1:ctr1^]query2:ctr2'. |
| system_qterm_ctr_decay | STRING | The CTRs of the top N query terms relevant to the item. Example: 'term1:ctr1^]term2:ctr2'. |
| system_query_match_decay | STRING | The search queries and their matches in the top N queries relevant to the item. Example: 'query1:query1^]query2:query2'. |
| system_qterm_match_decay | STRING | The search query terms and their matches in the top N query terms relevant to the item. Example: 'term1:term1^]term2:term2'. |
| system_query_seq_decay | STRING | The top N queries relevant to the item. This is a multi-valued feature. Example: 'query1^]query2'. |
| system_qterm_seq_decay | STRING | The top N query terms relevant to the item. This is a multi-valued feature. Example: 'term1^]term2'. |
| system_query_cnt | BIGINT | The number of values in the system_query_seq_decay feature. |
| system_qterm_cnt | BIGINT | The number of values in the system_qterm_seq_decay feature. |
| dt | STRING | The time partition by day. Example: 20230316. |
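The STRING features above pack several values into one field, either as key:value pairs or as plain values. The following sketch shows one way to unpack them. It assumes the separator is the literal text "^]" as shown in the examples, and the function names are illustrative. Values are kept as strings because some features, such as system_query_match_decay, pair a query with a query rather than with a number.

```python
SEP = "^]"  # separator as displayed in the examples (assumption: literal text)


def parse_kv_feature(value):
    """Parse 'key1:val1^]key2:val2' into a list of (key, val) string pairs."""
    if not value:
        return []
    pairs = []
    for entry in value.split(SEP):
        # Split on the last colon so that keys containing ':' stay intact.
        key, _, val = entry.rpartition(":")
        pairs.append((key, val))
    return pairs


def parse_seq_feature(value):
    """Parse a multi-valued feature such as 'query1^]query2' into a list."""
    return value.split(SEP) if value else []


scores = parse_kv_feature("query1:score1^]query2:score2")
queries = parse_seq_feature("query1^]query2")
query_cnt = len(queries)  # corresponds to system_query_cnt
```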

Built-in user features

The following table describes the built-in user features.

| Feature name | Type | Description |
| --- | --- | --- |
| system_exp_time | STRING | The day of the week on which the behavior is performed. Examples: Monday and Tuesday. |
| system_terms2 | STRING | The first 15 query tokens in the list. |
| system_user_id | STRING | The user ID. |
| system_raw_q_ultra | STRING | The original query before tokenization. |
| system_term_seq | STRING | The sequence feature of the query. |
| system_term_seq_length | DOUBLE | The length of the sequence feature of the query. |
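To make a few of these features concrete, the sketch below derives them from a raw query and a behavior date. It is an illustration under stated assumptions, not the system's implementation: whitespace tokenization, the "^]" join for system_terms2, and the `user_query_features` helper are all hypothetical. The sequence features (system_term_seq and system_term_seq_length) are left out because this topic does not specify their encoding.

```python
from datetime import date

SEP = "^]"  # assumption: same separator as in the item features
MAX_TERMS = 15  # system_terms2 keeps the first 15 query tokens
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def user_query_features(raw_query, behavior_date, user_id):
    """Derive a subset of the built-in user features for one behavior record."""
    tokens = raw_query.split()  # assumption: whitespace tokenizer
    return {
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        "system_exp_time": DAY_NAMES[behavior_date.weekday()],
        "system_terms2": SEP.join(tokens[:MAX_TERMS]),
        "system_user_id": user_id,
        "system_raw_q_ultra": raw_query,
    }


features = user_query_features("white t-shirt", date(2024, 1, 26), "user_001")
long_query = " ".join(f"t{i}" for i in range(20))
long_features = user_query_features(long_query, date(2024, 1, 22), "user_002")
```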