The sort stage runs after the fine-grained sorting stage. In this stage, you can sort items, apply diversification, and add window rules.
How to configure
The sorting configuration corresponds to `SortConfs` in the configuration overview. `SortConfs` is a `[]object` structure, so you can configure multiple sorting policies. PAI-Rec provides the following built-in policies: BoostScoreSort, BoostScoreByWeight, ItemRankScore, DiversityRuleSort, DPPSort, SSDSort, and MultiRecallMixSort.
Common sorting configurations
Each sorting configuration uses a subset of the common configurations. This section explains these configurations to avoid repetition.
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "",
      "SortType": ""
    }
  ]
}
```
Field | Type | Required | Description |
Name | string | Yes | A custom name for the sort. You can reference this name in `SortNames`. |
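SortType | string | Yes | The sorting type. Enumeration values: `BoostScoreSort`, `BoostScoreByWeight`, `DiversityRuleSort`, `DPPSort`, `SSDSort`, and `MultiRecallMixSort`. |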
Boost score sort (BoostScoreSort)
After the fine-grained sorting model is invoked, each item receives a score. Based on business requirements, you may need to modify this score by boosting or demoting it.
The boost or demotion operation consists of two parts:
Set conditional rules. You can use item or user properties, such as category or gender, to determine whether the conditions are met.
Set the boost or demotion expression. Currently, expressions can only modify the score, for example `score * 1.2` or `score * 0.5`.
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "BoostScoreSort",
      "SortType": "BoostScoreSort",
      "Debug": false,
      "BoostScoreConditions": [
        {
          "Conditions": [
            {
              "Name": "sex",
              "Domain": "item",
              "Type": "string",
              "Value": "male",
              "Operator": "equal"
            }
          ],
          "Expression": "score * 2"
        }
      ]
    }
  ]
}
```
This configuration multiplies the score by 2 for items whose `sex` feature is `male`.
Field | Type | Required | Description |
Name | string | Yes | A custom sort name. |
SortType | string | Yes | The sorting type. Static field: `BoostScoreSort`. |
Debug | bool | No | A debug flag. If set to `true`, the original score before the boost or demotion is recorded in the item's `properties` as `org_score`. You can then enable the debug flag in the request to view the item property value. This is for debugging only and should not be enabled in a production environment. |
BoostScoreConditions | json array | Yes | The conditional configuration for boosting or demoting scores. You can configure multiple conditions to boost or demote scores. |
BoostScoreConditions.Conditions | []FilterParamConfig | Yes | The conditional rules for boosting or demoting scores. |
BoostScoreConditions.Expression | string | Yes | The expression to boost or demote the score. `score` represents the current item score. The expression can reference item properties. For example, if `item_weight` is an item property, you can set the expression to `score * item_weight`. |
The `FilterParamConfig` configuration is as follows:
Field | Type | Required | Description |
Name | string | Yes | The feature name for the item or user. |
Domain | string | Yes | Enumeration value: `item` or `user`. Specifies whether `Name` refers to an item feature or a user feature. The feature must exist in the `properties` of the item or user. |
Operator | string | Yes | Enumeration values: `equal`, `not_equal`, `in`, `not_in`, `greater`, `greaterThan`, `less`, `lessThan`, `contains`, or `not_contains`. |
Type | string | Yes | The type of the feature. |
Value | object | Yes | The value of the feature. |
For more information about conditional settings, see Adjust count filter (AdjustCountFilter).
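The following Python sketch shows one way a `BoostScoreConditions` entry could be evaluated. It is an illustration, not PAI-Rec's implementation. `matches`, `apply_boost`, and the dictionary layout are hypothetical. The sketch covers only the `equal`, `not_equal`, `in`, and `not_in` operators. It assumes that all conditions in one entry must match (AND), and it replaces the expression string with a Python function.

```python
from typing import Any, Dict, List

# Hypothetical in-memory shapes for illustration; PAI-Rec's own types differ.
Props = Dict[str, Any]
Rule = Dict[str, Any]


def matches(cond: Dict[str, Any], item: Props, user: Props) -> bool:
    """Evaluate one FilterParamConfig-style condition (operator subset only)."""
    source = item if cond["Domain"] == "item" else user
    if cond["Name"] not in source:
        return False  # assumption: a missing feature never matches
    actual, expected = source[cond["Name"]], cond["Value"]
    op = cond["Operator"]
    if op == "equal":
        return actual == expected
    if op == "not_equal":
        return actual != expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    raise ValueError(f"operator not covered by this sketch: {op}")


def apply_boost(score: float, item: Props, user: Props, rules: List[Rule]) -> float:
    """Apply each rule whose conditions all match (AND semantics assumed)."""
    for rule in rules:
        if all(matches(c, item, user) for c in rule["Conditions"]):
            score = rule["Expression"](score, item)
    return score


rules = [{
    "Conditions": [{"Name": "sex", "Domain": "item", "Type": "string",
                    "Value": "male", "Operator": "equal"}],
    # Stands in for the expression string "score * 2".
    "Expression": lambda score, item: score * 2,
}]

print(apply_boost(0.5, {"sex": "male"}, {}, rules))    # 1.0
print(apply_boost(0.5, {"sex": "female"}, {}, rules))  # 0.5
```

In this sketch, an expression such as `score * item_weight` would be written as `lambda score, item: score * item["item_weight"]`.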
Boost score by weight (BoostScoreByWeight)
Items can have different weights for boosting or demoting their scores. The weight is stored as a field in an item weight table and is used to adjust each item's score.
Score formula: `weight × item.score`
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "BoostScoreByWeight",
      "SortType": "BoostScoreByWeight",
      "TimeInterval": 172800,
      "BoostScoreByWeightDao": {
        "AdapterType": "hologres",
        "HologresName": "pai_rec",
        "HologresTableName": "test",
        "ItemFieldName": "item_id",
        "WeightFieldName": "weight"
      }
    }
  ]
}
```
BoostScoreByWeightDao
Field | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Currently, only `hologres` is supported. |
HologresName | string | Yes | The custom name of the Hologres instance configured in the data source configuration (`HologresConfs`), such as `pai_rec` in the example above. |
HologresTableName | string | Yes | The name of the item weight table in Hologres. |
ItemFieldName | string | Yes | The primary key of the item weight table. |
WeightFieldName | string | Yes | The weight field in the item weight table. |
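Here is a minimal sketch of the `weight × item.score` formula. It assumes the weight table has already been read into a dictionary keyed by the `ItemFieldName` value. `boost_by_weight` is a hypothetical helper. Keeping the original score (weight 1.0) for items missing from the table is an assumption, not documented behavior.

```python
from typing import Dict, List, Tuple


def boost_by_weight(items: List[Tuple[str, float]],
                    weights: Dict[str, float],
                    default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Return (item_id, weight * score); items absent from the table use default_weight."""
    return [(item_id, weights.get(item_id, default_weight) * score)
            for item_id, score in items]


# Hypothetical rows loaded from the weight table: ItemFieldName -> WeightFieldName.
weights = {"item_1": 1.5, "item_2": 0.5}
print(boost_by_weight([("item_1", 0.5), ("item_2", 0.8), ("item_3", 0.2)], weights))
# [('item_1', 0.75), ('item_2', 0.4), ('item_3', 0.2)]
```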
Item rank score (ItemRankScore)
`ItemRankScore` sorts items in descending order of score. It is built into the DPI engine, so you do not need to define it in `SortConfs`; reference it directly in `SortNames`.
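As a sketch, descending-score ordering reduces to a single sort. This example keeps the incoming order for tied scores, because Python's `sorted` is stable even with `reverse=True`. That tie handling is an assumption of the sketch, not documented PAI-Rec behavior.

```python
from typing import List, Tuple


def item_rank_score(items: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (item_id, score) pairs by score, highest first; ties keep input order."""
    return sorted(items, key=lambda pair: pair[1], reverse=True)


print(item_rank_score([("a", 0.2), ("b", 0.9), ("c", 0.2), ("d", 0.5)]))
# [('b', 0.9), ('d', 0.5), ('a', 0.2), ('c', 0.2)]
```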
Diversity rule sort (DiversityRuleSort)
When outputting recommendation results, consider item diversity as well as user interests, so that the output mixes items from different categories and with different properties.
You can configure the rules as follows:
Recommendation diversity rules
Glossary:
Diversification dimension: The item property used for diversification, such as category, author, or tag.
Diversification policy:
Minimum interval k: Items that share the same value of a diversification dimension can appear consecutively at most k times.
Maximum frequency m: Within any window of n consecutive positions, items that share the same value of a diversification dimension can appear at most m times.
Diversification logic:
Diversification rules apply only to the results of a single request. Cross-request diversity is not considered.
You can configure multiple diversification dimensions.
For each diversification dimension, you can configure multiple diversification policies, each with different parameters (k, m, n).
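To make k, n, and m concrete, the sketch below checks an already ordered list of dimension values (for example, each item's category) against one policy. It returns the 1-based positions that break the policy. `find_violations` is a hypothetical helper. The sketch assumes a window is the n consecutive positions ending at the current item.

```python
from typing import List, Optional, Sequence


def find_violations(values: Sequence[str], k: Optional[int] = None,
                    n: Optional[int] = None, m: Optional[int] = None) -> List[int]:
    """Return 1-based positions where a diversification policy is broken.

    k: a dimension value may appear at most k times in a row.
    n, m: within any n consecutive positions, a value may appear at most m times.
    """
    bad = []
    run = 0
    for i, value in enumerate(values):
        # Length of the run of identical values ending at position i.
        run = run + 1 if i > 0 and values[i - 1] == value else 1
        too_many_in_a_row = k is not None and run > k
        too_frequent = False
        if n is not None and m is not None:
            window = values[max(0, i - n + 1): i + 1]
            too_frequent = window.count(value) > m
        if too_many_in_a_row or too_frequent:
            bad.append(i + 1)
    return bad


categories = ["shoes", "shoes", "shoes", "bags", "shoes", "hats"]
print(find_violations(categories, k=2))       # [3]
print(find_violations(categories, n=3, m=1))  # [2, 3, 5]
```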
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "DiversityRuleSort",
      "SortType": "DiversityRuleSort",
      "DiversitySize": 100,
      "DiversityRules": [
        {
          "Dimensions": ["spfl"],
          "WindowSize": 10,
          "FrequencySize": 1
        }
      ],
      "ExcludeRecalls": [
        "ColdStartVideoVectorRecall",
        "LinUcbRecall_default2"
      ],
      "Conditions": [
        {
          "Name": "spflPick",
          "Domain": "user",
          "Type": "string",
          "Value": "",
          "Operator": "equal"
        }
      ]
    }
  ]
}
```
This sort must be used with the `ExcludeRecalls` parameter.
DiversityRules
Field | Type | Required | Description |
Name | string | Yes | A custom sort name. |
SortType | string | Yes | The sorting type. Static field: `DiversityRuleSort`. |
DiversitySize | int | No | The number of items to diversify. The default value is the `size` of the request. |
Conditions | []FilterParamConfig | No | The conditions for applying the diversification rules. The rules are applied only when the user properties meet these conditions. For details about conditional settings, see Operator examples for conditional matching. Because these conditions are evaluated against user properties, you must set `Domain` to `user`. |
ExcludeRecalls | []string | No | A list of recall IDs to exclude from diversity sorting. |
DiversityRules | json array | Yes | Diversification rules. You can set multiple rules. |
DiversityRules.Dimensions | []string | Yes | The item properties to use for diversification. |
| int | Yes | Controls the number of consecutive occurrences of items from the same dimension. This is the k value described earlier. |
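DiversityRules.WindowSize | int | No | The window size. This is the n value described earlier. |
DiversityRules.FrequencySize | int | No | The maximum number of items with the same dimension value within the window. This is the m value described earlier. |
DiversityRules.Weight | int | No | The weight of the diversity rule. Weights decide which item is output when no candidate satisfies all rules. |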
ExclusionRules | json array | No | Exclusion rules. Excludes certain items that meet the conditions from specific positions. |
ExclusionRules.Positions | []int | Yes | The output positions at which matching items are not allowed. Positions start from 1: if the request `size` is 10, the positions are 1, 2, 3, ..., 10. |
ExclusionRules.Conditions | []FilterParamConfig | Yes | The rule conditions. For details about conditional settings, see Operator examples for conditional matching. These conditions are typically set on item properties. |
ExploreItemSize | int | No | By default, if a candidate does not satisfy the diversification rules, the search continues through the entire candidate set. This parameter limits the number of candidates searched; once this number is reached, the search stops. |
See the following example for exclusion rules and search depth.
Items that meet the condition `tag=t1` do not appear in output positions 1, 2, 3, or 4. The items that appear in positions 1, 2, 3, and 4 must still comply with the diversification rules, but they cannot have `tag=t1`.
```json
{
  "Name": "DiversityRuleSort",
  "SortType": "DiversityRuleSort",
  "DiversityRules": [
    {
      "Dimensions": [
        "tag"
      ],
      "WindowSize": 5,
      "FrequencySize": 1
    }
  ],
  "ExclusionRules": [
    {
      "Positions": [1, 2, 3, 4],
      "Conditions": [
        {
          "Name": "tag",
          "Domain": "item",
          "Type": "string",
          "Value": "t1",
          "Operator": "equal"
        }
      ]
    }
  ],
  "ExploreItemSize": 200
}
```
When searching for items that satisfy diversity rules, an item is output by default only if it meets all rules. If any diversity rule is not met, the system searches for the next item until one that meets all rules is found. If no item in the candidate set meets all rules, the first item searched is selected for output.
Alternatively, you can assign weights to the diversity rules. In that case, if no item in the candidate set meets all rules, the item whose satisfied rules have the highest total weight is selected for output. If multiple items have the same highest total weight, the one that appears earliest in the list is chosen.
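The search described above can be sketched as a greedy loop. For each output position, the loop scans at most `ExploreItemSize` remaining candidates in score order. It skips candidates that `ExclusionRules` bans at that position and takes the first candidate that passes every rule. Otherwise it falls back as described. This is an illustrative reimplementation, not PAI-Rec's code. It makes the following assumptions:

- Rules are checked only through `WindowSize` and `FrequencySize`.
- Multiple `Dimensions` in one rule combine into a single key.
- Once weights are in use, a rule without `Weight` counts as 1.
- The loop stops if every explored candidate is excluded.

`rule_ok`, `diversify`, and the `excluded` callback are hypothetical.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, str]
Rule = Dict[str, Any]


def rule_ok(selected: List[Item], cand: Item, rule: Rule) -> bool:
    """True if adding cand keeps its dimension key at or below FrequencySize
    occurrences within the last WindowSize positions (cand included)."""
    n, m, dims = rule["WindowSize"], rule["FrequencySize"], rule["Dimensions"]
    key = tuple(cand.get(d) for d in dims)  # assumption: dimensions form one key
    recent = selected[-(n - 1):] if n > 1 else []
    same = sum(1 for it in recent if tuple(it.get(d) for d in dims) == key)
    return same + 1 <= m


def diversify(candidates: List[Item], rules: List[Rule], size: int,
              explore_size: int,
              excluded: Callable[[int, Item], bool] = lambda pos, item: False
              ) -> List[Item]:
    remaining = list(candidates)  # assumed to arrive sorted by score, best first
    weighted = any("Weight" in r for r in rules)
    output: List[Item] = []
    while remaining and len(output) < size:
        pos = len(output) + 1  # output positions start from 1
        # Only the first explore_size candidates are searched (ExploreItemSize).
        window = [(i, c) for i, c in enumerate(remaining[:explore_size])
                  if not excluded(pos, c)]
        if not window:
            break  # this sketch simply stops if every explored item is excluded
        chosen = next((i for i, c in window
                       if all(rule_ok(output, c, r) for r in rules)), None)
        if chosen is None and weighted:
            # max() returns the earliest candidate among equal total weights.
            chosen = max(window, key=lambda ic: sum(
                r.get("Weight", 1) for r in rules if rule_ok(output, ic[1], r)))[0]
        elif chosen is None:
            chosen = window[0][0]  # unweighted fallback: first item searched
        output.append(remaining.pop(chosen))
    return output


items = [{"id": "a", "tag": "t1", "category": "c1"},
         {"id": "b", "tag": "t1", "category": "c2"},
         {"id": "c", "tag": "t2", "category": "c1"},
         {"id": "d", "tag": "t3", "category": "c3"}]
tag_rule = {"Dimensions": ["tag"], "WindowSize": 5, "FrequencySize": 1}
out = diversify(items, [tag_rule], size=4, explore_size=200,
                excluded=lambda pos, it: pos == 1 and it["tag"] == "t1")
print([it["id"] for it in out])  # ['c', 'a', 'd', 'b']
```

Here `tag=t1` is banned only from position 1. Item `b` is output last through the unweighted fallback because no remaining candidate satisfies the tag rule.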
See the following example for a configuration with weights:
```json
{
  "Name": "DiversityRuleSort",
  "SortType": "DiversityRuleSort",
  "DiversityRules": [
    {
      "Dimensions": [
        "tag"
      ],
      "WindowSize": 5,
      "FrequencySize": 1,
      "Weight": 1
    },
    {
      "Dimensions": [
        "category"
      ],
      "WindowSize": 3,
      "FrequencySize": 1,
      "Weight": 3
    }
  ],
  "ExclusionRules": [
    {
      "Positions": [1],
      "Conditions": [
        {
          "Name": "tag",
          "Domain": "item",
          "Type": "string",
          "Value": "t1",
          "Operator": "equal"
        }
      ]
    }
  ]
}
```
DPPSort
For more information about the DPP (Determinantal Point Process) diversity algorithm, see An Intuitive Understanding of the Determinantal Point Process-based Algorithm for Improving Recommendation Diversity.
Prerequisites: To use the DPP algorithm, you must have item embedding vectors. These vectors must represent the content of the items. The similarity of the embeddings must represent similarity at the item content level, not at other levels such as behavior. For example:
Recommended: Item image embeddings, text description embeddings, or embeddings generated from a combination of static item content such as categories and properties.
Not recommended: Embeddings trained from models based on user behavioral data.
Essentially, the dimensions you want to diversify must be reflected in the embeddings. For example, if you want the recommendation list to have diversity in the product price dimension, the price feature must be included when training the model to obtain the embedding vectors. Otherwise, you will not achieve the desired effect.
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "DPPSort",
      "SortType": "DPPSort",
      "DPPConf": {
        "Name": "DPPSort",
        "DaoConf": {
          "AdapterType": "hologres",
          "HologresName": "geeko_rec"
        },
        "TableName": "item_embedding_metric_learning",
        "TableSuffixParam": "embedding_date",
        "TablePKey": "product_id",
        "EmbeddingColumn": "embedding",
        "Alpha": 4.5,
        "NormalizeEmb": "false",
        "WindowSize": 10
      }
    }
  ]
}
```
DPPConf
Field | Type | Required | Description |
Name | string | Yes | A custom sort name. |
DaoConf | DaoConfig | Yes | Configure Hologres information. |
TableName | string | No | The name of the item embedding vector table in Hologres. This is required if `EmbeddingHookNames` is not configured. |
TableSuffixParam | string | No | If not empty, the system retrieves the value of this configuration item for the current scenario from the `Parameter Management` module in the PAI-Rec |
TablePKey | string | No | The primary key of the embedding vector table. |
EmbeddingColumn | string | No | The name of the vector field in the embedding vector table. |
EmbeddingSeparator | string | No | The separator for the embedding vector. The default is a comma. |
Alpha | float | Yes | A parameter in the DPP algorithm used to balance relevance and diversity. A larger value favors relevance. |
CacheTimeInMinutes | int | No | The time in minutes to cache the embedding vectors in memory. Default: 360. |
EmbeddingHookNames | []string | No | The names of the functions that generate item embeddings. These functions must be registered in advance. |
NormalizeEmb | string | No | Specifies whether to perform L2 normalization on the embedding vectors. If L2 normalization was already done when the embeddings were generated, you do not need to do it again. Otherwise, set this to `true`. |
WindowSize | int | No | The size of the sliding window for the diversity algorithm. Diversity is only guaranteed within the window. Default: 10. |
EmbMissedThreshold | float | No | An error is reported if the proportion of items missing embeddings is higher than this value. Default: 0.5. |
FilterRetrieveIds | []string | No | A list of recall (retrieve) IDs whose items skip the DPP module, for example cold-start recalls. |
EnsurePositiveSim | string | No | Specifies whether to ensure that the item similarity calculated based on embeddings is a positive value. Default: `true`. |
CandidateCount | int | No | The size of the candidate set for diversification. By default, it includes all items that enter the sorting stage. You can set it to a smaller number to limit the candidate set to the original top N items. |
AbortRunCount | int | No | If the number of items entering the sorting stage is less than this value, diversification is not performed for the current request. Default: 0. |
MinScorePercent | float | No | An item is eligible to be output by the diversification module only if its max-normalized sorting score is greater than this value. Default: 0. |
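To build intuition about `Alpha`, `NormalizeEmb`, and `EnsurePositiveSim`, here is a pure-Python sketch of greedy DPP MAP inference with incremental Cholesky updates. It follows the fast greedy algorithm of Chen et al. (NeurIPS 2018). Two parts of the sketch are assumptions, not confirmed PAI-Rec internals: the kernel `L[i][j] = q[i] * sim(i, j) * q[j]` with `q[i] = exp(Alpha * score[i])`, and the `(1 + cos) / 2` mapping used for `EnsurePositiveSim`. The sketch also ignores `WindowSize` and the other filtering fields. Items left over when the marginal gain drops to zero keep their input order.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def dpp_rerank(scores: Sequence[float], embs: Sequence[Sequence[float]],
               alpha: float, normalize: bool = True,
               ensure_positive: bool = True, eps: float = 1e-10) -> List[int]:
    """Return item indices in greedy DPP order; leftovers keep input order."""
    n = len(scores)
    if n == 0:
        return []
    vecs = [l2_normalize(e) if normalize else [float(x) for x in e] for e in embs]
    q = [math.exp(alpha * s) for s in scores]  # assumed relevance term

    def kernel(i: int, j: int) -> float:
        sim = sum(a * b for a, b in zip(vecs[i], vecs[j]))
        if ensure_positive:
            sim = (1.0 + sim) / 2.0  # assumed mapping of cosine into [0, 1]
        return q[i] * sim * q[j]

    cis: List[List[float]] = [[] for _ in range(n)]  # incremental Cholesky rows
    # d2[i] = det(L[S + i]) / det(L[S]) for the current selection S.
    d2 = [kernel(i, i) for i in range(n)]
    selected: List[int] = []
    j = max(range(n), key=lambda i: d2[i])
    while d2[j] > eps:
        selected.append(j)
        cj, dj = list(cis[j]), math.sqrt(d2[j])
        for i in range(n):
            e = (kernel(j, i) - sum(a * b for a, b in zip(cj, cis[i]))) / dj
            cis[i].append(e)
            d2[i] -= e * e
        d2[j] = -math.inf  # never pick the same item twice
        j = max(range(n), key=lambda i: d2[i])
    rest = [i for i in range(n) if i not in selected]
    return selected + rest


# Items 0 and 1 have identical embeddings; item 2 points elsewhere.
print(dpp_rerank([1.0, 0.95, 0.5], [[1, 0], [1, 0], [0, 1]], alpha=1.0))  # [0, 2, 1]
```

The near-duplicate item 1 drops behind item 2 even though its score is higher, because adding it contributes almost no volume to the kernel determinant.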
SSDSort
For more information about the SSD (Sliding Spectrum Decomposition) diversity algorithm, see Improving the Diversity of Recommendation Results: An Analysis of MMR/DPP/SSD Principles.
Prerequisites: To use the SSD algorithm, you must have item embedding vectors. These vectors must represent the content of the items. The similarity of the embeddings must represent similarity at the item content level, not at other levels such as behavior. For example:
Recommended: Item image embeddings, text description embeddings, or embeddings generated from a combination of static item content such as categories and properties.
Not recommended: Embeddings trained from models based on user behavioral data.
Essentially, the dimensions you want to diversify must be reflected in the embeddings. For example, if you want the recommendation list to have diversity in the product price dimension, the price feature must be included when training the model to obtain the embedding vectors. Otherwise, you will not achieve the desired effect.
Configuration example:
```json
{
  "SortConfs": [
    {
      "Name": "SSDSort",
      "SortType": "SSDSort",
      "SSDConf": {
        "Name": "SSDSort",
        "DaoConf": {
          "AdapterType": "hologres",
          "HologresName": "geeko_rec"
        },
        "TableName": "item_embedding_metric_learning",
        "TablePKey": "item_id",
        "EmbeddingColumn": "embedding",
        "Gamma": 0.25,
        "UseSSDStar": true,
        "NormalizeEmb": "false",
        "MinScorePercent": 0.1,
        "CandidateCount": 200,
        "WindowSize": 5
      }
    }
  ]
}
```
SSDConf
Field | Type | Required | Description |
Name | string | Yes | A custom sort name. |
DaoConf | DaoConfig | Yes | Configure Hologres information. |
TableName | string | No | The name of the item embedding vector table in Hologres. This is required if `EmbeddingHookNames` is not configured. |
TableSuffixParam | string | No | If not empty, the system retrieves the value of this configuration item for the current scenario from the `Parameter Management` module in the PAI-Rec |
TablePKey | string | No | The primary key of the embedding vector table. |
EmbeddingColumn | string | No | The name of the vector field in the embedding vector table. |
EmbeddingSeparator | string | No | The separator for the embedding vector. The default is a comma. |
Gamma | float | Yes | A parameter in the SSD algorithm used to balance relevance and diversity. A larger value favors diversity. |
UseSSDStar | bool | No | Specifies whether to enable the optimization algorithm mentioned in the SSD paper. Default: `false`. Enabling it is recommended. |
CacheTimeInMinutes | int | No | The time in minutes to cache the embedding vectors in memory. Default: 360. |
EmbeddingHookNames | []string | No | The names of the functions that generate item embeddings. These functions must be registered in advance. |
NormalizeEmb | string | No | Specifies whether to perform L2 normalization on the embedding vectors. If L2 normalization was already done when the embeddings were generated, you do not need to do it again. Otherwise, set this to `true`. |
WindowSize | int | No | The size of the sliding window for the diversity algorithm. Diversity is only guaranteed within the item list in the window. Default: 5. |
EmbMissedThreshold | float | No | An error is reported if the proportion of items missing embeddings is higher than this value. Default: 0.5. |
FilterRetrieveIds | []string | No | A list of recall (retrieve) IDs whose items skip the SSD module, for example cold-start recalls. |
EnsurePositiveSim | string | No | Specifies whether to ensure that the item similarity calculated based on embeddings is a positive value. Default: `true`. |
CandidateCount | int | No | The size of the candidate set for diversification. By default, it includes all items that enter the sorting stage. You can set it to a smaller number to limit the candidate set to the original top N items. |
AbortRunCount | int | No | If the number of items entering the sorting stage is less than this value, diversification is not performed for the current request. Default: 0. |
MinScorePercent | float | No | An item is eligible to be output by the diversification module only if its max-normalized sorting score is greater than this value. Default: 0. |
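The following pure-Python sketch shows a sliding-window greedy step in the style of SSD (Huang et al., KDD 2021). At each step, it picks the item that maximizes `score + Gamma * ||v_perp||`. Here `v_perp` is the item's normalized embedding with its components along the last `WindowSize - 1` selected items removed (Gram-Schmidt). This objective form is an assumption of the sketch, and `ssd_rerank` is a hypothetical name. `UseSSDStar`, caching, and candidate filtering are not modeled.

```python
import math
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def _remove_components(v: List[float], basis: List[List[float]]) -> List[float]:
    """Subtract v's projection onto each vector of an orthonormal basis."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    return v


def ssd_rerank(scores: Sequence[float], embs: Sequence[Sequence[float]],
               gamma: float, window: int = 5, eps: float = 1e-12) -> List[int]:
    """Greedy SSD-style ordering of item indices (assumed objective, see text)."""
    vecs = [[x / _norm(e) for x in e] if _norm(e) > 0 else [float(x) for x in e]
            for e in embs]
    remaining = list(range(len(scores)))
    order: List[int] = []
    while remaining:
        # Orthonormal basis for the last (window - 1) selected embeddings.
        basis: List[List[float]] = []
        for idx in order[max(0, len(order) - (window - 1)):]:
            r = _remove_components(vecs[idx], basis)
            if _norm(r) > eps:
                basis.append([x / _norm(r) for x in r])
        # Relevance plus Gamma times the length left after removing the window.
        best = max(remaining, key=lambda i: (
            scores[i] + gamma * _norm(_remove_components(vecs[i], basis))))
        order.append(best)
        remaining.remove(best)
    return order


embs = [[1, 0], [1, 0], [0, 1]]
print(ssd_rerank([1.0, 0.95, 0.5], embs, gamma=1.0))   # [0, 2, 1]
print(ssd_rerank([1.0, 0.95, 0.5], embs, gamma=0.25))  # [0, 1, 2]
```

The two calls show the trade-off from the table: with a larger `Gamma`, the distinct item 2 moves ahead of the higher-scoring duplicate. With a smaller `Gamma`, relevance wins.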
Multi-channel recall sort (MultiRecallMixSort)
Typically, you will have many recall channels. Sometimes, based on business requirements, you need to mix the output based on the recall type. For example:
There are exposure quantity requirements for cold-start recall.
There are position requirements for a specific recall channel.
A single configuration can also combine multiple mixing rules.
The configuration is as follows:
```json
{
  "SortConfs": [
    {
      "Name": "MixSort",
      "SortType": "MultiRecallMixSort",
      "RemainItem": false,
      "MixSortRules": [
        {
          "MixStrategy": "random_position",
          "NumberRate": 0.1,
          "RecallNames": [
            "OTSGlobalHot"
          ]
        },
        {
          "MixStrategy": "fix_position",
          "Positions": [1, 3, 5],
          "RecallNames": [
            "RecallName1"
          ]
        }
      ]
    }
  ]
}
```
In addition to using recall names to match and filter items, you can also use conditional filtering to select items for exposure.
```json
{
  "SortConfs": [
    {
      "Name": "MixSortByItemFeature",
      "SortType": "MultiRecallMixSort",
      "RemainItem": false,
      "MixSortRules": [
        {
          "MixStrategy": "random_position",
          "NumberRate": 0.1,
          "Conditions": [
            {
              "Name": "gender",
              "Domain": "item",
              "Type": "string",
              "Value": "man",
              "Operator": "equal"
            }
          ]
        }
      ]
    }
  ]
}
```
Field | Type | Required | Description |
Name | string | Yes | A custom sort name. |
SortType | string | Yes | The sorting type. Static field: `MultiRecallMixSort`. |
RemainItem | bool | No | Specifies whether to keep all items. For example, if there are 500 items to process and the request size is 30, setting this to `false` keeps only the 30 mixed results. If set to `true`, the remaining items are also kept and placed after the 30 mixed items, so that subsequent sort stages can process them further. |
MixSortRules | json array | Yes | Mixing rules. You can set multiple rules. |
MixSortRules.MixStrategy | string | Yes | The mixing policy. Enumeration values: `random_position` or `fix_position`. |
MixSortRules.Positions | []int | No | The output positions for mixed items when `MixStrategy` is `fix_position`. Positions start from 1. Set either `Positions` or `PositionField`. |
MixSortRules.PositionField | string | No | When `MixStrategy` is `fix_position`, reads the position from this item property field instead. `Positions` and `PositionField` are mutually exclusive; set only one of them. |
| int | No | The number of mixed items, as an absolute count. |
MixSortRules.NumberRate | float | No | The proportion of mixed items. Set this only when `MixStrategy` is `random_position`. Valid values range from 0 to 1. The number of mixed items is calculated as `request_size × NumberRate`. |
MixSortRules.RecallNames | []string | No | The names of the recall channels. If you set multiple names, the channels share this rule. Which channel supplies each mixed item is not fixed: items are taken in the order in which they entered this sort stage. |
MixSortRules.Conditions | []FilterParamConfig | No | Items that meet these conditions are mixed. For details about conditional settings, see Operator examples for conditional matching. |
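Here is a simplified Python sketch of the two mixing strategies. It matches items by recall name only and does not model conditions or the absolute-count field. It makes the following assumptions:

- Each item carries a `recall` field.
- Items matched by a rule are removed from the ordinary fill, so they appear only through their rule.
- `random_position` slots are drawn uniformly from positions not already taken.
- `RemainItem` appends every unplaced item after the mixed page.

`mix_sort` is a hypothetical helper.

```python
import random
from typing import Dict, List, Optional

Item = Dict[str, str]


def mix_sort(items: List[Item], rules: List[Dict], size: int,
             remain_item: bool = False, seed: Optional[int] = None) -> List[Item]:
    """Place rule-matched items at fixed or random positions, then fill the rest."""
    rng = random.Random(seed)
    slots: Dict[int, Item] = {}  # 1-based output position -> item
    claimed, placed = set(), set()  # item indices matched / actually output
    for rule in rules:
        pool = [i for i, it in enumerate(items)
                if it["recall"] in rule["RecallNames"] and i not in claimed]
        claimed.update(pool)  # assumption: matched items skip the ordinary fill
        if rule["MixStrategy"] == "fix_position":
            positions = [p for p in rule["Positions"] if p <= size]
        else:  # random_position: request_size * NumberRate random free slots
            free = [p for p in range(1, size + 1) if p not in slots]
            count = min(int(size * rule["NumberRate"]), len(free))
            positions = sorted(rng.sample(free, count))
        open_positions = [p for p in positions if p not in slots]
        for pos, idx in zip(open_positions, pool):
            slots[pos] = items[idx]
            placed.add(idx)
    # Ordinary items fill the remaining positions in their incoming order.
    filler = [i for i in range(len(items)) if i not in claimed]
    page: List[Item] = []
    for pos in range(1, size + 1):
        if pos in slots:
            page.append(slots[pos])
        elif filler:
            idx = filler.pop(0)
            page.append(items[idx])
            placed.add(idx)
    if remain_item:  # RemainItem: keep unplaced items after the mixed page
        page += [it for i, it in enumerate(items) if i not in placed]
    return page


items = [{"id": "i1", "recall": "main"}, {"id": "i2", "recall": "main"},
         {"id": "c1", "recall": "cold"}, {"id": "i3", "recall": "main"},
         {"id": "c2", "recall": "cold"}, {"id": "i4", "recall": "main"}]
fixed = [{"MixStrategy": "fix_position", "Positions": [1, 3], "RecallNames": ["cold"]}]
print([it["id"] for it in mix_sort(items, fixed, size=5)])
# ['c1', 'i1', 'c2', 'i2', 'i3']
```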
How to use
The sorting configuration works like the recall configuration. After you define the sorts in `SortConfs`, reference them for each scenario in the `SortNames` parameter. `SortNames` is a `map[string][]string` structure: the key is the scenario name, and the value is the list of sorting policies used in that scenario.
```json
{
  "SortNames": {
    "${scene_name}": [
      "ItemRankScore"
    ]
  }
}
```
`${scene_name}` is the scenario name. If you want to use the same configuration for multiple scenarios, you can use `default`.
`ItemRankScore`: the name of a sort. Each entry in the list must be either a custom `Name` defined in `SortConfs` or a built-in sort such as `ItemRankScore`.
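The lookup can be sketched as follows. The sketch assumes that a scenario without its own entry falls back to `default`. It also assumes each listed name must be either a `Name` from `SortConfs` or the built-in `ItemRankScore`. The helper `resolve_sorts` and the scenario names `home_feed` and `search` are hypothetical. The helper only validates and resolves names; it does not run the sorts.

```python
from typing import Dict, List

BUILT_IN_SORTS = {"ItemRankScore"}


def resolve_sorts(config: Dict, scene: str) -> List[str]:
    """Return the sort names configured for a scenario, validating each one."""
    sort_names: Dict[str, List[str]] = config.get("SortNames", {})
    names = sort_names.get(scene, sort_names.get("default"))
    if names is None:
        raise KeyError(f"no SortNames entry for scene {scene!r} and no 'default'")
    defined = {conf["Name"] for conf in config.get("SortConfs", [])}
    unknown = [n for n in names if n not in defined | BUILT_IN_SORTS]
    if unknown:
        raise ValueError(f"sort names not defined in SortConfs: {unknown}")
    return names


config = {
    "SortConfs": [{"Name": "MixSort", "SortType": "MultiRecallMixSort"}],
    "SortNames": {"home_feed": ["ItemRankScore", "MixSort"],
                  "default": ["ItemRankScore"]},
}
print(resolve_sorts(config, "home_feed"))  # ['ItemRankScore', 'MixSort']
print(resolve_sorts(config, "search"))     # ['ItemRankScore']
```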