Sorting configuration - Artificial Intelligence Recommendation

The sort stage runs after the fine-grained sorting stage. In this stage, you can sort items, apply diversification, and add window rules.

How to configure

The sorting configuration corresponds to `SortConfs` in the configuration overview. `SortConfs` is a `[]object` structure that lets you configure multiple sorting policies. PAI-Rec provides the following built-in policies: BoostScoreSort, BoostScoreByWeight, ItemRankScore, DiversityRuleSort, DPPSort, SSDSort, and MultiRecallMixSort.

Common sorting configurations

Every sorting configuration shares the following common fields. They are described once here to avoid repetition.

Configuration example:

{
    "SortConfs": [
        {
            "Name": "",
            "SortType": ""
        }
    ]
}

Field	Type	Required	Description
Name	string	Yes	A custom name for the sort. You can reference this name in `SortNames`.
SortType	string	Yes	The sorting type. Enumeration values: `ItemRankScore`, `BoostScoreSort`, `BoostScoreByWeight`, `DiversityRuleSort`, `DPPSort`, `SSDSort`, or `MultiRecallMixSort`.

Boost score sort (BoostScoreSort)

After the fine-grained sorting model is invoked, each item receives a score. Based on business requirements, you may need to modify this score by boosting or demoting it.

The boost or demotion operation consists of two parts:

Set conditional rules. You can use item or user properties, such as category or gender, to determine whether the conditions are met.
Set the boost or demotion expression. Currently, the expression can only modify the score, for example `score * 1.2` (boost) or `score * 0.5` (demotion).

Configuration example:

{
    "SortConfs": [
        {
            "Name": "BoostScoreSort",
            "SortType": "BoostScoreSort",
            "Debug": false,
            "BoostScoreConditions": [
                {
                    "Conditions": [
                        {
                            "Name": "sex",
                            "Domain": "item",
                            "Type": "string",
                            "Value": "male",
                            "Operator": "equal"
                        }
                    ],
                    "Expression": "score * 2"
                }
            ]
        }
    ]
}

This configuration multiplies the score by 2 for items where the `sex` feature is `male`.

Field	Type	Required	Description
Name	string	Yes	A custom sort name.
SortType	string	Yes	The sorting type. Static field: `BoostScoreSort`.
Debug	bool	No	A debug flag. If set to `true`, the original score before the boost or demotion is recorded in the item's `properties` as `org_score`. You can then enable the debug flag in the request to view the item property value. This is for debugging only and should not be enabled in a production environment.
BoostScoreConditions	json array	Yes	The conditional configuration for boosting or demoting scores. You can configure multiple conditions to boost or demote scores.
Conditions	[]FilterParamConfig	Yes	The conditional rules for boosting or demoting scores.
Expression	string	Yes	The expression to boost or demote the score. `score` represents the current item score. The expression can reference item properties. For example, if `item_weight` is an item property, you can set the expression to `score * item_weight`.

The `FilterParamConfig` configuration is as follows:

Field	Type	Required	Description
Name	string	Yes	The feature name for the item or user.
Domain	string	Yes	Enumeration value: `item` or `user`. Specifies whether the `Name` option is an item feature or a user feature. The name must be found in the `properties` of the item or user.
Operator	string	Yes	Enumeration values: `equal`, `not_equal`, `in`, `not_in`, `greater`, `greaterThan`, `less`, `lessThan`, `contains`, or `not_contains`.
Type	string	Yes	The type of the feature.
Value	object	Yes	The value of the feature.

For more information about conditional settings, see Adjust count filter (AdjustCountFilter).
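
The two-part logic described in this section (match the conditions, then apply the expression) can be sketched in Python. This is an illustrative sketch, not the PAI-Rec implementation. The helper names `matches` and `boost_score` are hypothetical, only three operators are covered, and `Expression` is simplified to a numeric `multiplier`. The sketch also assumes that all conditions in a rule must match and that only the first matching rule is applied; the engine may behave differently.

```python
def matches(condition, item_props, user_props):
    """Evaluate one FilterParamConfig-style condition (subset of operators)."""
    props = item_props if condition["Domain"] == "item" else user_props
    actual = props.get(condition["Name"])
    op, expected = condition["Operator"], condition["Value"]
    if op == "equal":
        return actual == expected
    if op == "not_equal":
        return actual != expected
    if op == "in":
        return actual in expected
    raise ValueError(f"operator not covered by this sketch: {op}")


def boost_score(item, user_props, rules, debug=False):
    """Apply the first rule whose conditions all match (assumption).

    `multiplier` is a hypothetical stand-in for the `Expression` string.
    """
    for rule in rules:
        if all(matches(c, item["properties"], user_props)
               for c in rule["Conditions"]):
            if debug:
                # Mirrors the Debug flag: keep the pre-boost score.
                item["properties"]["org_score"] = item["score"]
            item["score"] *= rule["multiplier"]
            break
    return item


# Usage: double the score of items whose `sex` property is `male`.
rules = [{
    "Conditions": [{"Name": "sex", "Domain": "item", "Type": "string",
                    "Value": "male", "Operator": "equal"}],
    "multiplier": 2.0,
}]
boosted = boost_score({"score": 0.4, "properties": {"sex": "male"}}, {}, rules)
```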

Boost score by weight (BoostScoreByWeight)

When you boost or demote item scores, different items can have different weights. The weight is a field in the item table and is used to adjust the score.

Score formula: `weight × item.score`

Configuration example:

{
    "SortConfs": [
        {
            "Name": "BoostScoreByWeight",
            "SortType": "BoostScoreByWeight",
            "TimeInterval": 172800,
            "BoostScoreByWeightDao": {
                "AdapterType": "hologres",
                "HologresName": "pai_rec",
                "HologresTableName": "test",
                "ItemFieldName": "item_id",
                "WeightFieldName": "weight"
            }
        }
    ]
}

BoostScoreByWeightDao

Field	Type	Required	Description
AdapterType	string	Yes	The type of the data source. Currently, only `hologres` is supported.
HologresName	string	Yes	The custom name of the Hologres instance configured in the data source configuration (`HologresConfs`), such as `holo_info` in the data source configuration.
HologresTableName	string	Yes	The name of the item weight table in Hologres.
ItemFieldName	string	Yes	The primary key of the item weight table.
WeightFieldName	string	Yes	The weight field in the item weight table.

Item rank score (ItemRankScore)

`ItemRankScore` sorts items in descending order based on their scores. This feature is built into the DPI engine and can be used directly in `SortNames`.

Diversity rule sort (DiversityRuleSort)

Recommendation results should reflect not only user interests but also item diversity. Diversity rules let you mix items from different categories and with different properties in the output.

You can configure the rules as follows:

Recommendation diversity rules

Glossary:

Diversification dimension: The item property used for diversification, such as category, author, or tag.
Diversification policy:
- Minimum interval k: An item from a specific diversification dimension can appear consecutively at most k times.
- Maximum frequency m: Within a window of size n, an item from the same diversification dimension cannot appear more than m times.

Diversification logic:

Diversification rules apply only to the results of a single request. Cross-request diversity is not considered.
You can configure multiple diversification dimensions.
For each diversification dimension, you can configure multiple diversification policies, each with different parameters (k, m, n).
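
The two policies in the glossary can be read as a check that runs before each item is appended to the output. The sketch below is one possible reading, not the engine's code. The function name `violates_rule` is hypothetical, and the parameters `interval_size`, `window_size`, and `frequency_size` correspond to k, n, and m. It assumes the window is made of the most recent n - 1 output items plus the candidate.

```python
def violates_rule(selected, candidate, interval_size=0,
                  window_size=0, frequency_size=0):
    """Return True if appending `candidate` to `selected` breaks the rule.

    `selected` holds the dimension values (for example, categories) of the
    items already output, in order. A value of 0 disables a policy.
    """
    if interval_size > 0:
        tail = selected[-interval_size:]
        if len(tail) == interval_size and all(v == candidate for v in tail):
            return True  # would be the (k+1)-th consecutive occurrence
    if window_size > 0 and frequency_size > 0:
        recent = selected[-(window_size - 1):] if window_size > 1 else []
        if recent.count(candidate) + 1 > frequency_size:
            return True  # would exceed m occurrences in a window of n
    return False


# Usage: with n=10 and m=1, a category already seen in the last
# 9 positions cannot be output again yet.
blocked = violates_rule(["a", "b", "c"], "a", window_size=10, frequency_size=1)
```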

Configuration example:

{
    "SortConfs": [
        {
            "Name": "DiversityRuleSort",
            "SortType": "DiversityRuleSort",
            "DiversitySize": 100,
            "DiversityRules": [
                {
                    "Dimensions": ["spfl"],
                    "WindowSize": 10,
                    "FrequencySize": 1
                }
            ],
            "ExcludeRecalls": [
                "ColdStartVideoVectorRecall",
                "LinUcbRecall_default2"
            ],
            "Conditions": [
                {
                    "Name": "spflPick",
                    "Domain": "user",
                    "Type": "string",
                    "Value": "",
                    "Operator": "equal"
                }
            ]
        }
    ]
}

This sort must be used with the `ExcludeRecalls` parameter.

DiversityRules

Field	Type	Required	Description
Name	string	Yes	A custom sort name.
SortType	string	Yes	The sorting type. Static field: `DiversityRuleSort`.
DiversitySize	int	No	The number of items to diversify. The default value is the `size` of the request.
Conditions	[]FilterParamConfig	No	The condition for the diversification rule. The diversification rule is applied only when user properties meet certain conditions. For details about conditional settings, see Operator examples for conditional matching. The conditions here are set based on user properties, so you must set `Domain` to `user`.
ExcludeRecalls	[]string	No	A list of recall IDs to exclude from diversity sorting.
DiversityRules	json array	Yes	Diversification rules. You can set multiple rules.
Dimensions	[]string	Yes	The item properties to use for diversification.
IntervalSize	int	Yes	Controls the number of consecutive occurrences of items from the same dimension. This is the k value described earlier.
WindowSize	int	No	The window size. This is the n value described earlier.
FrequencySize	int	No	The number of repetitions within the window. This is the m value described earlier.
Weight	int	No	The weight of the diversity rule.
ExclusionRules	json array	No	Exclusion rules. Excludes certain items that meet the conditions from specific positions.
Positions	[]int	Yes	The output position of the item, starting from 1. If the request `size` is 10, the positions are 1, 2, 3, ..., 10.
Conditions	[]FilterParamConfig	Yes	Rule conditions. For details about conditional settings, see Operator examples for conditional matching. These are mainly set for item properties.
ExploreItemSize	int	No	By default, if the first item in the candidate set does not meet the diversification rules, the search continues through the rest of the candidate set. This parameter limits how many candidates are searched. Once this number is exceeded, the search stops.

See the following example for exclusion rules and search depth.

Items that meet the condition `tag=t1` do not appear in output positions 1, 2, 3, or 4. The items that appear in positions 1, 2, 3, and 4 must still comply with the diversification rules, but they cannot have `tag=t1`.

{
    "Name": "DiversityRuleSort",
    "SortType": "DiversityRuleSort",
    "DiversityRules": [
        {
            "Dimensions": [
                "tag"
            ],
            "WindowSize": 5,
            "FrequencySize": 1
        }
    ],
    "ExclusionRules": [
        {
            "Positions": [
                1,2,3,4
            ],
            "Conditions": [
                {
                    "Name": "tag",
                    "Domain": "item",
                    "Type": "string",
                    "Value": "t1",
                    "Operator": "equal"
                }
            ]
        }
    ],
    "ExploreItemSize": 200
}

When searching for items that satisfy diversity rules, an item is output by default only if it meets all rules. If any diversity rule is not met, the system searches for the next item until one that meets all rules is found. If no item in the candidate set meets all rules, the first item searched is selected for output.

Another policy is available. You can add weights to the diversity rules. If no item in the candidate set meets all rules, the item that satisfies the rules with the highest total weight is selected for output. If multiple items have the same highest total weight, the one that appears earliest in the list is chosen.

See the following example for a configuration with weights:

{
    "Name": "DiversityRuleSort",
    "SortType": "DiversityRuleSort",
    "DiversityRules": [
        {
            "Dimensions": [
                "tag"
            ],
            "WindowSize": 5,
            "FrequencySize": 1,
            "Weight": 1
        },
        {
            "Dimensions": [
                "category"
            ],
            "WindowSize": 3,
            "FrequencySize": 1,
            "Weight": 3
        }
    ],
    "ExclusionRules": [
        {
            "Positions": [
                1
            ],
            "Conditions": [
                {
                    "Name": "tag",
                    "Domain": "item",
                    "Type": "string",
                    "Value": "t1",
                    "Operator": "equal"
                }
            ]
        }
    ]
}
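
The selection policy described above (output the first item that meets every rule, otherwise fall back to the best partial match) can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's implementation: `breaks_window_rule` and `pick_next` are hypothetical names, each rule uses a single dimension, only the `WindowSize`/`FrequencySize` policy is checked, and exclusion rules are ignored.

```python
def breaks_window_rule(output, item, rule):
    """True if adding `item` exceeds FrequencySize within WindowSize slots."""
    dim = rule["Dimensions"][0]  # single-dimension rules only, for brevity
    size = rule["WindowSize"]
    recent = output[-(size - 1):] if size > 1 else []
    same = sum(1 for it in recent if it[dim] == item[dim])
    return same + 1 > rule["FrequencySize"]


def pick_next(output, candidates, rules, explore_item_size=None):
    """Pick the next item to output from `candidates` (in score order)."""
    searched = candidates[:explore_item_size] if explore_item_size else candidates
    best, best_weight = None, -1
    for item in searched:
        satisfied = [r for r in rules if not breaks_window_rule(output, item, r)]
        if len(satisfied) == len(rules):
            return item  # meets every rule: output it immediately
        weight = sum(r.get("Weight", 0) for r in satisfied)
        if weight > best_weight:  # strict '>' keeps the earliest item on ties
            best, best_weight = item, weight
    return best


# Usage with the two weighted rules from the configuration above.
rules = [
    {"Dimensions": ["tag"], "WindowSize": 5, "FrequencySize": 1, "Weight": 1},
    {"Dimensions": ["category"], "WindowSize": 3, "FrequencySize": 1, "Weight": 3},
]
output = [{"id": 1, "tag": "t1", "category": "c1"}]
candidates = [
    {"id": 2, "tag": "t1", "category": "c1"},  # breaks both rules
    {"id": 3, "tag": "t2", "category": "c1"},  # satisfies only the tag rule
    {"id": 4, "tag": "t1", "category": "c2"},  # satisfies only the category rule
]
chosen = pick_next(output, candidates, rules)
```

In this sketch, rules without a `Weight` all count as 0, so the fallback reduces to the default behavior of selecting the first item searched.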

DPPSort

For more information about the DPP (Determinantal Point Process) diversity algorithm, see An Intuitive Understanding of the Determinantal Point Process-based Algorithm for Improving Recommendation Diversity.

Prerequisites: To use the DPP algorithm, you must have item embedding vectors. These vectors must represent the content of the items. The similarity of the embeddings must represent similarity at the item content level, not at other levels such as behavior. For example:

Recommended: Item image embeddings, text description embeddings, or embeddings generated from a combination of static item content such as categories and properties.
Not recommended: Embeddings trained from models based on user behavioral data.

Essentially, the dimensions you want to diversify must be reflected in the embeddings. For example, if you want the recommendation list to have diversity in the product price dimension, the price feature must be included when training the model to obtain the embedding vectors. Otherwise, you will not achieve the desired effect.

Configuration example:

{
    "SortConfs": [
        {
            "Name": "DPPSort",
            "SortType": "DPPSort",
            "DPPConf": {
                "Name": "DPPSort",
                "DaoConf": {
                    "AdapterType": "hologres",
                    "HologresName": "geeko_rec"
                },
                "TableName": "item_embedding_metric_learning",
                "TableSuffixParam": "embedding_date",
                "TablePKey": "product_id",
                "EmbeddingColumn": "embedding",
                "Alpha": 4.5,
                "NormalizeEmb": "false",
                "WindowSize": 10
            }
        }
    ]
}

DPPConf

Field	Type	Required	Description
Name	string	Yes	A custom sort name.
DaoConf	DaoConfig	Yes	Configure Hologres information.
TableName	string	No	The name of the item embedding vector table in Hologres. This is required if `EmbeddingHookNames` is not configured.
TableSuffixParam	string	No	If not empty, the system retrieves the value of this configuration item for the current scenario from the `Parameter Management` module in the PAI-Rec `DPI Engine Service Management` page. This value is used as a suffix for `TableName`. This is used to switch vector tables daily to keep the embeddings up to date. In this case, the Hologres table usually needs to be a partitioned table.
TablePKey	string	No	The primary key of the embedding vector table.
EmbeddingColumn	string	No	The name of the vector field in the embedding vector table.
EmbeddingSeparator	string	No	The separator for the embedding vector. The default is a comma.
Alpha	float	Yes	A parameter in the DPP algorithm used to balance relevance and diversity. A larger value favors relevance.
CacheTimeInMinutes	int	No	The time in minutes to cache the embedding vectors in memory. Default: 360.
EmbeddingHookNames	[]string	No	The names of the functions that generate item embeddings. These functions must be registered in advance.
NormalizeEmb	string	No	Specifies whether to perform L2 normalization on the embedding vectors. If L2 normalization was already done when the embeddings were generated, you do not need to do it again. Otherwise, set this to `true`.
WindowSize	int	No	The size of the sliding window for the diversity algorithm. Diversity is only guaranteed within the window. Default: 10.
EmbMissedThreshold	float	No	An error is reported if the proportion of items missing embeddings is higher than this value. Default: 0.5.
FilterRetrieveIds	[]string	No	Specifies a list of items that do not need to call the DPP module, such as cold-start items.
EnsurePositiveSim	string	No	Specifies whether to ensure that the item similarity calculated based on embeddings is a positive value. Default: `true`.
CandidateCount	int	No	The size of the candidate set for diversification. By default, it includes all items that enter the sorting stage. You can set it to a smaller number to limit the candidate set to the original top N items.
AbortRunCount	int	No	If the number of items entering the sorting stage is less than this value, diversification is not performed for the current request. Default: 0.
MinScorePercent	float	No	An item is eligible to be output by the diversification module only if its max-normalized sorting score is greater than this value. Default: 0.
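
To make `EmbeddingSeparator` and `NormalizeEmb` more concrete, the sketch below parses a separator-delimited embedding and applies L2 normalization. After normalization, the dot product of two vectors equals their cosine similarity, which is why embeddings that were already normalized do not need to be normalized again. The helper names are hypothetical, and storing the embedding as a delimited string is an assumption inferred from the separator field.

```python
import math


def parse_embedding(raw, separator=","):
    """Split a delimited embedding string into floats."""
    return [float(x) for x in raw.split(separator)]


def l2_normalize(vec):
    """Scale `vec` to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Usage: two embeddings that point in the same direction but differ in
# length have similarity close to 1 after normalization.
a = l2_normalize(parse_embedding("3,4"))
b = l2_normalize(parse_embedding("6,8"))
similarity = dot(a, b)
```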

SSDSort

For more information about the SSD (Sliding Spectrum Decomposition) diversity algorithm, see Improving the Diversity of Recommendation Results: An Analysis of MMR/DPP/SSD Principles.

Prerequisites: To use the SSD algorithm, you must have item embedding vectors. These vectors must represent the content of the items. The similarity of the embeddings must represent similarity at the item content level, not at other levels such as behavior. For example:

Recommended: Item image embeddings, text description embeddings, or embeddings generated from a combination of static item content such as categories and properties.

Not recommended: Embeddings trained from models based on user behavioral data.

Configuration example:

{
    "SortConfs": [
        {
            "Name": "SSDSort",
            "SortType": "SSDSort",
            "SSDConf": {
                "Name": "SSDSort",
                "DaoConf": {
                    "AdapterType": "hologres",
                    "HologresName": "geeko_rec"
                },
                "TableName": "item_embedding_metric_learning",
                "TablePKey": "item_id",
                "EmbeddingColumn": "embedding",
                "Gamma": 0.25,
                "UseSSDStar": true,
                "NormalizeEmb": "false",
                "MinScorePercent": 0.1,
                "CandidateCount": 200,
                "WindowSize": 5
            }
        }
    ]
}

SSDConf

Field	Type	Required	Description
Name	string	Yes	A custom sort name.
DaoConf	DaoConfig	Yes	Configure Hologres information.
TableName	string	No	The name of the item embedding vector table in Hologres. This is required if `EmbeddingHookNames` is not configured.
TableSuffixParam	string	No	If not empty, the system retrieves the value of this configuration item for the current scenario from the `Parameter Management` module in the PAI-Rec `DPI Engine Service Management` page. This value is used as a suffix for `TableName`. This is used to switch vector tables daily to keep the embeddings up to date. In this case, the Hologres table usually needs to be a partitioned table.
TablePKey	string	No	The primary key of the embedding vector table.
EmbeddingColumn	string	No	The name of the vector field in the embedding vector table.
EmbeddingSeparator	string	No	The separator for the embedding vector. The default is a comma.
Gamma	float	Yes	A parameter in the SSD algorithm used to balance relevance and diversity. A larger value favors diversity.
UseSSDStar	bool	No	Specifies whether to enable the optimization algorithm mentioned in the SSD paper. Default: `false`. Enabling it is recommended.
CacheTimeInMinutes	int	No	The time in minutes to cache the embedding vectors in memory. Default: 360.
EmbeddingHookNames	[]string	No	The names of the functions that generate item embeddings. These functions must be registered in advance.
NormalizeEmb	string	No	Specifies whether to perform L2 normalization on the embedding vectors. If L2 normalization was already done when the embeddings were generated, you do not need to do it again. Otherwise, set this to `true`.
WindowSize	int	No	The size of the sliding window for the diversity algorithm. Diversity is only guaranteed within the item list in the window. Default: 5.
EmbMissedThreshold	float	No	An error is reported if the proportion of items missing embeddings is higher than this value. Default: 0.5.
FilterRetrieveIds	[]string	No	Specifies a list of items that do not need to call the SSD module, such as cold-start items.
EnsurePositiveSim	string	No	Specifies whether to ensure that the item similarity calculated based on embeddings is a positive value. Default: `true`.
CandidateCount	int	No	The size of the candidate set for diversification. By default, it includes all items that enter the sorting stage. You can set it to a smaller number to limit the candidate set to the original top N items.
AbortRunCount	int	No	If the number of items entering the sorting stage is less than this value, diversification is not performed for the current request. Default: 0.
MinScorePercent	float	No	An item is eligible to be output by the diversification module only if its max-normalized sorting score is greater than this value. Default: 0.

Multi-channel recall sort (MultiRecallMixSort)

Typically, you will have many recall channels. Sometimes, based on business requirements, you need to mix the output based on the recall type. For example:

There are exposure quantity requirements for cold-start recall.
There are position requirements for a specific recall channel.

This may also include multiple mixing rules.

The configuration is as follows:

{
    "SortConfs": [
        {
            "Name": "MixSort",
            "SortType": "MultiRecallMixSort",
            "RemainItem": false,
            "MixSortRules": [
                {
                    "MixStrategy": "random_position",
                    "NumberRate": 0.1,
                    "RecallNames": [
                        "OTSGlobalHot"
                    ]
                },
                {
                    "MixStrategy": "fix_position",
                    "Positions": [1,3,5],
                    "RecallNames": [
                        "RecallName1"
                    ]
                }
            ]
        }
    ]
}

In addition to using recall names to match and filter items, you can also use conditional filtering to select items for exposure.

{
    "SortConfs": [
        {
            "Name": "MixSortByItemFeature",
            "SortType": "MultiRecallMixSort",
            "RemainItem": false,
            "MixSortRules": [
                {
                    "MixStrategy": "random_position",
                    "NumberRate": 0.1,
                    "Conditions": [
                        {
                            "Name": "gender",
                            "Domain": "item",
                            "Type": "string",
                            "Value": "man",
                            "Operator": "equal"
                        }
                    ]
                }
            ]
        }
    ]
}

Field	Type	Required	Description
Name	string	Yes	A custom sort name.
SortType	string	Yes	The sorting type. Static field: `MultiRecallMixSort`.
RemainItem	bool	No	Specifies whether to keep all items. For example, if there are 500 items to process but a request size is 30, setting this to `false` means only the mixed results are kept. If set to `true`, the remaining items are also kept, but they are placed after the 30 mixed items. This allows further control and processing by subsequent sort stages.
MixSortRules	json array	Yes	Mixing rules. You can set multiple rules.
MixStrategy	string	Yes	The mixing policy. Enumeration values: `random_position` or `fix_position`. `random_position`: Indicates that the position is random. `fix_position`: Indicates a fixed position. You must specify `Positions`.
Positions	[]int	No	If `MixStrategy` is `fix_position`, you must specify `Positions`. Positions start from 1.
PositionField	string	No	If `MixStrategy` is `fix_position`, you can get the position from an item's property field. `Positions` and `PositionField` are mutually exclusive. You can only set one of them.
Number	int	No	The number of mixed items, given as an absolute count.
NumberRate	float	No	The proportion of mixed items. This is set only when `MixStrategy` is `random_position`. The valid value is from 0 to 1. The specific number is calculated by `request_size × NumberRate`.
RecallNames	[]string	No	The names of the recall channels. You can set multiple names. If multiple names are set, they share the configuration. However, the specific recall channel used is not fixed. The order is determined by the position when entering this sort stage.
Conditions	[]FilterParamConfig	No	Items that meet the matching conditions are mixed. For details about conditional settings, see Operator examples for conditional matching.
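
The two mixing strategies can be illustrated with a small sketch. It is not the engine's implementation: `random_position_count` and `mix_fixed_positions` are hypothetical helpers, truncating `request_size × NumberRate` to an integer is an assumption, and items are represented by plain IDs.

```python
def random_position_count(request_size, number_rate):
    """Number of items to mix in for `random_position` rules.

    Truncation toward zero is an assumption of this sketch.
    """
    return int(request_size * number_rate)


def mix_fixed_positions(ranked, pinned, positions, size):
    """Place `pinned` items at 1-based `positions`; fill the rest from `ranked`."""
    slots = [None] * size
    queue = list(pinned)
    for pos in sorted(positions):
        if queue and 1 <= pos <= size:
            slots[pos - 1] = queue.pop(0)
    placed = {s for s in slots if s is not None}
    filler = (item for item in ranked if item not in placed)
    mixed = [s if s is not None else next(filler, None) for s in slots]
    # Drop empty slots if there were not enough items to fill the page.
    return [item for item in mixed if item is not None]


# Usage: items from one recall channel take positions 1, 3, and 5.
page = mix_fixed_positions(
    ranked=["a", "b", "c", "d", "e"],
    pinned=["x", "y", "z"],
    positions=[1, 3, 5],
    size=5,
)
```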

How to use

The sorting configuration is similar to the recall configuration. After configuration, you provide a `SortNames` parameter for use in different scenarios. `SortNames` is a `Map[string]object` structure where the key is the scenario, and each scenario corresponds to a set of sorting policies.

{
    "SortNames": {
        "${scene_name}": [
            "ItemRankScore"
        ]
    }
}

`${scene_name}` is the scenario name. To use the same configuration for multiple scenarios, use `default` as the key.
Each entry in the list, such as `ItemRankScore`, is either the built-in `ItemRankScore` sort or the custom `Name` of a sort defined in `SortConfs`.
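
As a rough sketch of how a `SortNames` map might be resolved for a request, the function below looks up the scenario and falls back to `default`. The function name `resolve_sorts` and the scenario names are hypothetical. The fallback behavior and the idea that sorts run in list order are assumptions based on the description above, not confirmed engine behavior.

```python
def resolve_sorts(sort_names, scene_name):
    """Return the ordered sort names for a scene, falling back to `default`."""
    return sort_names.get(scene_name, sort_names.get("default", []))


# Usage: `home_feed` has its own chain; every other scene uses `default`.
sort_names = {
    "home_feed": ["BoostScoreSort", "ItemRankScore", "DiversityRuleSort"],
    "default": ["ItemRankScore"],
}
home_chain = resolve_sorts(sort_names, "home_feed")
other_chain = resolve_sorts(sort_names, "detail_page")
```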