This topic describes the basic built-in features used in custom sorting models.
Diagram of basic features
Types of basic features
The basic features are classified into two categories: item features and user features.
Item features:
Field features: Select the feature fields that you want to process. By default, all fields in an application are supported. Then, select a processing method for each feature field. For example, for feature fields of the text type, you can select tokenization or vectorization. For feature fields of the numeric type, select original value mapping. If the required feature fields are not in the application, you can add them by using a MaxCompute external table.
Statistical features: The system collects statistics such as the number of impressions, the number of clicks, and the click-through rate (CTR) of an item in the previous seven days based on the search logs and collected behavior data of the application.
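The statistics described above can be sketched as a simple aggregation over behavior events. This is only an illustration: the event layout (`item_id`, `day`, `type`) and the function name `window_stats` are hypothetical, and the discretization that the system applies afterwards is not shown because its method is not described in this topic.

```python
from datetime import date, timedelta


def window_stats(events, item_id, today, days=7):
    """Return (impressions, clicks, ctr) for one item over the `days` days before `today`.

    `events` is a hypothetical list of dicts with the keys
    "item_id", "day" (a datetime.date) and "type" ("impression" or "click").
    """
    start = today - timedelta(days=days)
    impressions = 0
    clicks = 0
    for event in events:
        # Skip other items and events outside the [start, today) window.
        if event["item_id"] != item_id or not (start <= event["day"] < today):
            continue
        if event["type"] == "impression":
            impressions += 1
        elif event["type"] == "click":
            clicks += 1
    # CTR is clicks divided by impressions; avoid dividing by zero.
    ctr = clicks / impressions if impressions else 0.0
    return impressions, clicks, ctr


if __name__ == "__main__":
    today = date(2023, 3, 16)
    sample = [
        {"item_id": "a", "day": date(2023, 3, 15), "type": "impression"},
        {"item_id": "a", "day": date(2023, 3, 15), "type": "click"},
        {"item_id": "a", "day": date(2023, 3, 10), "type": "impression"},
        {"item_id": "a", "day": date(2023, 2, 1), "type": "impression"},
    ]
    print(window_stats(sample, "a", today, days=7))
```

Changing `days` to 1 or 30 gives the other windows listed in the built-in item features table.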
User features:
User profile features: You can use a MaxCompute external table to import user profile features for model training. When user behaviors are predicted, the query field is used to pass parameters. This type of feature is not supported in OpenSearch.
Query features: You can pass query features by specifying raw_query in a query request. This type of feature describes query-related information, such as query tokens and their vectorized form.
Processing methods for item features:
Original value mapping
Tokenization
Generating lookup features after tokenization
Counting the number of terms after tokenization
Examples:
Suppose that the content of the field to be processed is "White T-shirt". The following list shows the result of each processing method:
Original value mapping: "White T-shirt".
Tokenization: "White^]T-shirt". The tokens are separated by ^].
Generating lookup features after tokenization: "White:White^]T-shirt:T-shirt".
Counting the number of terms after tokenization: 2.
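The four results above can be reproduced with a minimal sketch. Two assumptions apply: the separator is treated as the literal string `^]` exactly as it is printed in this topic (in stored data it may instead be a single control character), and tokenization is approximated by splitting on whitespace, which is only a stand-in for the analyzer that the system actually uses. All function names are hypothetical.

```python
SEP = "^]"  # assumption: the separator as printed in this topic


def tokenize(text):
    # Stand-in tokenizer: split on whitespace only (not the real analyzer).
    return text.split()


def original_value_mapping(text):
    # The field value is passed through unchanged.
    return text


def tokenization(text):
    # Join the tokens with the separator.
    return SEP.join(tokenize(text))


def lookup_after_tokenization(text):
    # Each token becomes a "token:token" pair.
    return SEP.join(f"{token}:{token}" for token in tokenize(text))


def term_count_after_tokenization(text):
    # Number of tokens produced by the tokenizer.
    return len(tokenize(text))


if __name__ == "__main__":
    value = "White T-shirt"
    print(original_value_mapping(value))         # White T-shirt
    print(tokenization(value))                   # White^]T-shirt
    print(lookup_after_tokenization(value))      # White:White^]T-shirt:T-shirt
    print(term_count_after_tokenization(value))  # 2
```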
Built-in item features
Field name (Feature name) | Field type | Description |
system_item_id | STRING | The ID of the item. It is the unique identifier of an item. |
system_all_nid_ctr_30 | BIGINT | The CTR of the item in the previous 30 days. The feature is discretized. |
system_all_nid_ctr_7 | BIGINT | The CTR of the item in the previous seven days. The feature is discretized. |
system_all_nid_ctr_1 | BIGINT | The CTR of the item within the previous day. The feature is discretized. |
system_all_nid_pv_30 | BIGINT | The number of impressions of the item in the previous 30 days. The feature is discretized. |
system_all_nid_pv_7 | BIGINT | The number of impressions of the item in the previous seven days. The feature is discretized. |
system_all_nid_pv_1 | BIGINT | The number of impressions of the item within the previous day. The feature is discretized. |
system_all_nid_ipv_30 | BIGINT | The number of clicks of the item in the previous 30 days. The feature is discretized. |
system_all_nid_ipv_7 | BIGINT | The number of clicks of the item in the previous seven days. The feature is discretized. |
system_all_nid_ipv_1 | BIGINT | The number of clicks of the item within the previous day. The feature is discretized. |
system_query_score_decay | STRING | The click ratios of the top N queries relevant to the item. The default value of N in top N queries is 20. Example: 'query1:score1^]query2:score2'. |
system_qterm_score_decay | STRING | The click ratios of the top N query terms relevant to the item. The default value of N in top N query terms is 300. Example: 'term1:score1^]term2:score2'. |
system_query_ctr_decay | STRING | The CTRs of the top N queries relevant to the item. Example: 'query1:ctr1^]query2:ctr2'. |
system_qterm_ctr_decay | STRING | The CTRs of the top N query terms relevant to the item. Example: 'term1:ctr1^]term2:ctr2'. |
system_query_match_decay | STRING | The search queries and their matches in the top N queries relevant to the item. Example: 'query1:query1^]query2:query2'. |
system_qterm_match_decay | STRING | The search query terms and their matches in the top N query terms relevant to the item. Example: 'term1:term1^]term2:term2'. |
system_query_seq_decay | STRING | The top N queries relevant to the item. It is a multi-valued feature. Example: 'query1^]query2'. |
system_qterm_seq_decay | STRING | The top N query terms relevant to the item. It is a multi-valued feature. Example: 'term1^]term2'. |
system_query_cnt | BIGINT | The number of values in the system_query_seq_decay feature. |
system_qterm_cnt | BIGINT | The number of values in the system_qterm_seq_decay feature. |
dt | STRING | The time partition by day. Example: 20230316. |
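The STRING features in the preceding table share two layouts: key-value lists such as 'query1:score1^]query2:score2' and plain multi-valued lists such as 'query1^]query2'. The following sketch shows one way to read both. It assumes that the separator is the literal string `^]` as printed in the examples, and it keeps the values as strings because their exact numeric format is not specified in this topic. The function names are hypothetical.

```python
SEP = "^]"  # assumption: the separator as printed in the table examples


def parse_key_value_feature(value):
    """Parse 'k1:v1^]k2:v2' into a dict; values are kept as strings."""
    result = {}
    if not value:
        return result
    for entry in value.split(SEP):
        # Split on the last colon so that keys containing ':' stay intact.
        key, colon, val = entry.rpartition(":")
        if not colon:
            continue  # skip malformed entries that have no colon
        result[key] = val
    return result


def parse_multi_value_feature(value):
    """Parse 'v1^]v2' into a list of values."""
    return value.split(SEP) if value else []


if __name__ == "__main__":
    print(parse_key_value_feature("query1:score1^]query2:score2"))
    queries = parse_multi_value_feature("query1^]query2")
    # The list length corresponds to what system_query_cnt describes.
    print(queries, len(queries))
```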
Built-in user features
The following table describes the built-in user features.
Feature name | Type | Description |
system_exp_time | STRING | The date on which the behavior is performed. The value is a day of the week. Examples: Monday and Tuesday. |
system_terms2 | STRING | The first 15 query tokens in the list. |
system_user_id | STRING | The user ID. |
system_raw_q_ultra | STRING | The original query before tokenization. |
system_term_seq | STRING | The sequence feature of the query. |
system_term_seq_length | DOUBLE | The length of the sequence feature of the query. |
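To make the descriptions in the preceding table concrete, the following sketch derives a few of these features from a raw query and a behavior date. It is an illustration under several assumptions, not the system's implementation: tokens come from a whitespace split, multiple tokens are joined with the literal string `^]`, and the sequence length is taken to be the number of tokens. The function name and the returned dictionary layout are hypothetical.

```python
from datetime import date

SEP = "^]"      # assumption: same separator as in the item feature examples
MAX_TERMS = 15  # system_terms2 keeps the first 15 query tokens
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def sketch_user_features(user_id, raw_query, behavior_date):
    # Stand-in tokenizer: split on whitespace only (not the real analyzer).
    tokens = raw_query.split()
    return {
        "system_user_id": user_id,
        # The original query before tokenization.
        "system_raw_q_ultra": raw_query,
        # The first 15 query tokens.
        "system_terms2": SEP.join(tokens[:MAX_TERMS]),
        # Assumption: the sequence length is the token count, stored as DOUBLE.
        "system_term_seq_length": float(len(tokens)),
        # Day of the week on which the behavior is performed.
        "system_exp_time": DAY_NAMES[behavior_date.weekday()],
    }


if __name__ == "__main__":
    features = sketch_user_features("u001", "white t-shirt", date(2023, 3, 16))
    for name, value in features.items():
        print(name, "=", value)
```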