The PAI-Rec engine provides multiple built-in filter templates, such as User2ItemExposureFilter, ItemStateFilter, and AdjustCountFilter.
Filter configurations
You configure filters in the FilterConfs parameter, as shown in the following sample code. FilterConfs is an array of objects, so you can use it to define multiple filter policies.
Overview of common filter configurations
This section describes the configurations that are shared by different filters. These configurations are not repeated in the description of each filter in this topic.
Sample configuration:
"FilterConfs":[
{
"Name":"",
"FilterType":"",
"Dimension":"",
"DaoConf":{},
"AdjustCountConfs":[{}],
"ItemStateDaoConf":{},
"FilterParams":[{}],
"DiversityDaoConf":{},
"FilterVal":{}
}
]
Parameter | Type | Required | Description |
Name | string | Yes | The custom name of the filter. You use this name when you configure the FilterNames parameter. |
FilterType | string | Yes | The type of the built-in filter. Valid values include the filter types described in this topic, such as User2ItemExposureFilter and ItemStateFilter. |
Dimension | string | No | The dimension of the item. |
DaoConf | DaoConfig | No | The information about the source table. |
AdjustCountConfs | json array | No | The configurations of the PriorityAdjustCountFilter filter. |
ItemStateDaoConf | ItemStateDaoConfig | No | The configurations of the ItemStateFilter filter. |
FilterParams | json array | No | The configurations of contextual conditions. |
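To make the shape of FilterConfs concrete, the following minimal Go sketch decodes a FilterConfs array with the standard encoding/json package. The FilterConf and FilterConfig struct names and the subset of fields they cover are hypothetical and are not PAI-Rec's internal types; only the JSON keys come from this topic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FilterConf is a hypothetical struct that mirrors a subset of the common
// FilterConfs keys. It is not PAI-Rec's own type.
type FilterConf struct {
	Name         string            `json:"Name"`
	FilterType   string            `json:"FilterType"`
	Dimension    string            `json:"Dimension,omitempty"`
	DaoConf      json.RawMessage   `json:"DaoConf,omitempty"`      // kept raw: its keys depend on AdapterType
	FilterParams []json.RawMessage `json:"FilterParams,omitempty"` // one raw object per condition
}

// FilterConfig wraps the top-level FilterConfs array.
type FilterConfig struct {
	FilterConfs []FilterConf `json:"FilterConfs"`
}

// ParseFilterConfs decodes a JSON document that contains a FilterConfs array.
// Keys that FilterConf does not declare are ignored by encoding/json.
func ParseFilterConfs(data []byte) ([]FilterConf, error) {
	var cfg FilterConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg.FilterConfs, nil
}

func main() {
	data := []byte(`{
		"FilterConfs": [
			{"Name": "adjust_count_filter", "FilterType": "AdjustCountFilter"},
			{"Name": "holo_exposure_filter", "FilterType": "User2ItemExposureFilter",
			 "DaoConf": {"AdapterType": "hologres"}}
		]
	}`)
	confs, err := ParseFilterConfs(data)
	if err != nil {
		panic(err)
	}
	for _, c := range confs {
		fmt.Println(c.Name, c.FilterType)
	}
}
```

Keeping DaoConf as json.RawMessage defers decoding until the adapter type is known, because the Hologres, Redis, and Tablestore variants use different keys.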
User2ItemExposureFilter
Exposure filtering is required in many business scenarios to prevent items from being recommended repeatedly. In this context, exposure refers to pseudo-exposure.
Because real-time logs are delayed, exposed items cannot be identified immediately. Therefore, the items returned by the recommendation engine are treated as pseudo-exposed items.
The PAI-Rec engine reads and writes pseudo-exposure data. Real exposure data is collected from the real-time logs of users and written to a database by a real-time computing engine such as Apache Flink, after which PAI-Rec can consume it.
The following table describes the common parameter configurations of different data sources for User2ItemExposureFilter.
Parameter | Type | Required | Description |
Name | string | Yes | The custom name of the filter. |
FilterType | string | Yes | The type of the filter. Set the value to User2ItemExposureFilter. |
MaxItems | int | Yes | The maximum number of recent batches of items. This parameter is equivalent to limit ${MaxItems} in an SQL statement. MaxItems specifies the maximum number of batches, instead of the maximum number of items. One batch of items is returned for a recommendation request. |
TimeInterval | int | Yes | The time window for retrieving exposed items based on their timestamps. Unit: seconds. For example, 172800 specifies the last two days. |
WriteLog | bool | Yes | Specifies whether to write exposure logs. |
ClearLogIfNotEnoughScene | string | No | Specifies the scenario in which data of the exposure table is to be deleted. |
GenerateItemDataFuncName | string | No | The function that generates the item data written to the exposure table. If this parameter is left empty, the built-in function of the PAI-Rec engine is used, and only item IDs are written. |
WriteLogExcludeScenes | []string | No | Specifies the scenarios in which exposure logs are not written. |
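Because MaxItems counts batches rather than items, the interaction with TimeInterval is easy to misread. The following Go sketch illustrates the semantics described in the table, assuming that each exposure record is one batch (the items returned for one request) with a Unix timestamp in seconds. The ExposureBatch type and the ExposedItems and FilterExposed functions are hypothetical; this is not the engine's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// ExposureBatch is a hypothetical record: the items returned for one request.
type ExposureBatch struct {
	ItemIDs    []string
	CreateTime int64 // Unix timestamp in seconds
}

// ExposedItems collects item IDs from at most maxItems of the most recent
// batches whose timestamps fall within timeInterval seconds before now.
func ExposedItems(batches []ExposureBatch, maxItems int, timeInterval, now int64) map[string]bool {
	recent := make([]ExposureBatch, 0, len(batches))
	for _, b := range batches {
		if b.CreateTime >= now-timeInterval {
			recent = append(recent, b)
		}
	}
	// Newest batches first, then keep maxItems batches (not maxItems items).
	sort.Slice(recent, func(i, j int) bool { return recent[i].CreateTime > recent[j].CreateTime })
	if len(recent) > maxItems {
		recent = recent[:maxItems]
	}
	exposed := make(map[string]bool)
	for _, b := range recent {
		for _, id := range b.ItemIDs {
			exposed[id] = true
		}
	}
	return exposed
}

// FilterExposed removes candidates that were already exposed, keeping order.
func FilterExposed(candidates []string, exposed map[string]bool) []string {
	kept := make([]string, 0, len(candidates))
	for _, id := range candidates {
		if !exposed[id] {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	now := int64(1_700_000_000)
	batches := []ExposureBatch{
		{ItemIDs: []string{"a", "b"}, CreateTime: now - 100},
		{ItemIDs: []string{"c"}, CreateTime: now - 50},
		{ItemIDs: []string{"d"}, CreateTime: now - 200_000}, // older than 172800 seconds
	}
	exposed := ExposedItems(batches, 100, 172800, now)
	fmt.Println(FilterExposed([]string{"a", "c", "d", "e"}, exposed)) // [d e]
}
```

With MaxItems set to 1, the newest batch alone can still contribute several item IDs, which is why MaxItems is not a cap on the number of items.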
Hologres
"FilterConfs" :[
{
"Name": "holo_exposure_filter",
"FilterType": "User2ItemExposureFilter",
"MaxItems": 100,
"TimeInterval": 172800,
"WriteLog": true,
"DaoConf":{
"AdapterType": "hologres",
"HologresName": "holo_info",
"HologresTableName": "exposure_history"
}
}
]
Parameters of DaoConf
Parameter | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to hologres. |
HologresName | string | Yes | The custom name of the data source that is specified in the HologresConfs parameter. Example: holo_info. |
HologresTableName | string | Yes | The name of the exposure table. |
Use the following statements to create the exposure table. Set the time_to_live_in_seconds property based on your business requirements.
BEGIN;
CREATE TABLE "exposure_history" (
"uid" text NOT NULL,
"item" text NOT NULL,
"create_time" int4 NOT NULL,
PRIMARY KEY ("uid","create_time")
);
CALL SET_TABLE_PROPERTY('"exposure_history"', 'orientation', 'column');
CALL SET_TABLE_PROPERTY('"exposure_history"', 'clustering_key', '"uid","create_time"');
CALL SET_TABLE_PROPERTY('"exposure_history"', 'segment_key', '"create_time"');
CALL SET_TABLE_PROPERTY('"exposure_history"', 'bitmap_columns', '"uid","item"');
CALL SET_TABLE_PROPERTY('"exposure_history"', 'dictionary_encoding_columns', '"uid","item"');
CALL SET_TABLE_PROPERTY('"exposure_history"', 'time_to_live_in_seconds', '172800');
comment on table "exposure_history" is 'the table that stores exposure records';
COMMIT;
ApsaraDB for Redis
"FilterConfs" :[
{
"Name": "redis_exposure_filter",
"FilterType": "User2ItemExposureFilter",
"MaxItems": 100,
"TimeInterval": 172800,
"WriteLog": true,
"DaoConf":{
"AdapterType": "redis",
"RedisName": "redis_info",
"RedisPrefix": "exposure_"
}
}
]
Parameters of DaoConf
Parameter | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to redis. |
RedisName | string | Yes | The custom name of the data source that is specified in the RedisConfs parameter. Example: redis_info. |
RedisPrefix | string | No | The prefix of the key for exposure data. A key consists of the value of RedisPrefix and the unique ID (UID) of a user. |
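The key layout described for RedisPrefix can be illustrated with a one-line Go helper. The ExposureKey function name is hypothetical, and the format of the value stored under the key is not covered in this topic.

```go
package main

import "fmt"

// ExposureKey builds the Redis key for a user's exposure data by
// concatenating the RedisPrefix value and the user's UID.
func ExposureKey(redisPrefix, uid string) string {
	return redisPrefix + uid
}

func main() {
	fmt.Println(ExposureKey("exposure_", "10944750")) // exposure_10944750
}
```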
Tablestore
"FilterConfs" :[
{
"Name": "ots_exposure_filter",
"FilterType": "User2ItemExposureFilter",
"MaxItems": 100,
"TimeInterval": 172800,
"WriteLog": true,
"DaoConf":{
"AdapterType": "tablestore",
"TableStoreName": "tablestore_info",
"TableStoreTableName": "exposure_history"
}
}
]
Parameters of DaoConf
Parameter | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to tablestore. |
TableStoreName | string | Yes | The custom name of the data source that is specified in the TableStoreConfs parameter. Example: tablestore_info. |
TableStoreTableName | string | Yes | The name of the exposure table. |
The exposure table uses the following schema. Set time_to_live_in_seconds, which specifies the lifecycle of the data, to a custom value based on your business requirements.
Parameter | Category | Type | Description | Example |
user_id | Primary key | string | The UID of the user. | 10944750 |
auto_id | Primary key | integer | The auto-increment column. | |
item_ids | Property | string | The item IDs, separated with commas (,). When multiple items are exposed at the same time, the system inserts a single record that contains all of their IDs. | 17019277,17019278 |
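Because one row stores a whole batch as a comma-separated string, reading the table means splitting and merging several item_ids values. The following Go sketch shows that encoding under the schema above. EncodeItemIDs and DecodeItemIDs are hypothetical helper names, not part of the PAI-Rec or Tablestore SDKs.

```go
package main

import (
	"fmt"
	"strings"
)

// EncodeItemIDs joins the IDs of one exposed batch into a single item_ids
// value, separated with commas, as stored in one Tablestore row.
func EncodeItemIDs(ids []string) string {
	return strings.Join(ids, ",")
}

// DecodeItemIDs collects the item IDs from several item_ids values into a set,
// skipping empty entries.
func DecodeItemIDs(values []string) map[string]bool {
	set := make(map[string]bool)
	for _, v := range values {
		for _, id := range strings.Split(v, ",") {
			if id = strings.TrimSpace(id); id != "" {
				set[id] = true
			}
		}
	}
	return set
}

func main() {
	row := EncodeItemIDs([]string{"17019277", "17019278"})
	fmt.Println(row) // 17019277,17019278
	set := DecodeItemIDs([]string{row, "17019300"})
	fmt.Println(len(set)) // 3
}
```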
User2ItemCustomFilter
User2ItemCustomFilter filters items based on a custom user-to-item (U2I) table that you provide.
Tablestore
"FilterConfs" :[
{
"Name": "u2i_custom_filter",
"FilterType": "User2ItemCustomFilter",
"DaoConf":{
"AdapterType": "tablestore",
"TableStoreName": "tablestore_info",
"TableStoreTableName": "u2i_table"
}
}
]
Parameters of DaoConf
Parameter | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to tablestore. |
TableStoreName | string | Yes | The custom name of the data source that is specified in the TableStoreConfs parameter. Example: tablestore_info. |
TableStoreTableName | string | Yes | The name of the U2I table. |
The U2I table uses the following schema.
Parameter | Category | Type | Description | Example |
user_id | Primary key | string | The UID of the user. | 10944750 |
item_ids | Property | string | The item IDs. Multiple item IDs are separated with commas (,). | 17019277,17019278 |
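The following Go sketch shows one way a U2I row could be applied to a user's candidates, assuming that the items listed for a user are the ones to filter out. The U2IRow type and the FilterByU2I function are hypothetical, and the sketch works on an in-memory row rather than reading Tablestore.

```go
package main

import (
	"fmt"
	"strings"
)

// U2IRow is a hypothetical in-memory form of one row of the U2I table.
type U2IRow struct {
	UserID  string
	ItemIDs string // comma-separated, for example "17019277,17019278"
}

// FilterByU2I removes the candidates listed for the user in the U2I row,
// assuming that listed items are the ones to filter out.
func FilterByU2I(candidates []string, row U2IRow) []string {
	blocked := make(map[string]bool)
	for _, id := range strings.Split(row.ItemIDs, ",") {
		blocked[strings.TrimSpace(id)] = true
	}
	kept := make([]string, 0, len(candidates))
	for _, id := range candidates {
		if !blocked[id] {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	row := U2IRow{UserID: "10944750", ItemIDs: "17019277,17019278"}
	fmt.Println(FilterByU2I([]string{"17019277", "17019999"}, row)) // [17019999]
}
```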
AdjustCountFilter
AdjustCountFilter optionally shuffles the items returned by recall links in random order and then retains a specified number of items.
Sample configuration:
"FilterConfs" :[
{
"Name": "adjust_count_filter",
"FilterType": "AdjustCountFilter",
"ShuffleItem": true,
"RetainNum": 500
}
]
Parameter | Type | Required | Description |
ShuffleItem | bool | Yes | Specifies whether to shuffle the items returned by the recall links. |
RetainNum | int | Yes | The number of items that you want to retain. |
PriorityAdjustCountFilter
PriorityAdjustCountFilter controls the number of items that are selected from the results of each recall link. The items of each recall link are sorted by score before they are selected.
Sample configuration:
"FilterConfs" :[
{
"Name": "priority_adjust_count_filter",
"FilterType": "PriorityAdjustCountFilter",
"AdjustCountConfs" :[
{
"RecallName" :"recall_1",
"Count" :125,
"Type" : "accumulator"
},
{
"RecallName" :"recall_2",
"Count" :250,
"Type" : "accumulator"
},
{
"RecallName" :"recall_3",
"Count" :400,
"Type" : "accumulator"
}
]
}
]
Parameter | Type | Required | Description |
Name | string | Yes | The custom name of the filter. |
FilterType | string | Yes | The type of the filter. Set the value to PriorityAdjustCountFilter. |
AdjustCountConfs | json array | Yes | The configurations of the PriorityAdjustCountFilter filter. Each object contains the RecallName, Count, and Type parameters. |
RecallName | string | Yes | The name of the recall link. |
Count | int | Yes | The maximum number of items that are selected from the returned results of the recall link. |
Type | string | No | The type of the number adjustment. Valid values: accumulator and fix. |
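The per-link selection can be sketched in Go as follows: group items by recall link, sort each group by score in descending order, and keep at most Count items. The Item type and the CapPerRecall function are hypothetical, and the sketch deliberately does not model the difference between the accumulator and fix values of Type, which this topic does not describe in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical recalled item with the name of its recall link.
type Item struct {
	ID         string
	RecallName string
	Score      float64
}

// CapPerRecall sorts each recall link's items by score (descending) and keeps
// at most counts[RecallName] of them. Links without an entry are kept whole.
// The accumulator/fix distinction of the Type parameter is not modeled.
func CapPerRecall(items []Item, counts map[string]int) []Item {
	byRecall := make(map[string][]Item)
	var order []string // recall links in order of first appearance
	for _, it := range items {
		if _, ok := byRecall[it.RecallName]; !ok {
			order = append(order, it.RecallName)
		}
		byRecall[it.RecallName] = append(byRecall[it.RecallName], it)
	}
	var out []Item
	for _, name := range order {
		group := byRecall[name]
		sort.SliceStable(group, func(i, j int) bool { return group[i].Score > group[j].Score })
		if n, ok := counts[name]; ok && len(group) > n {
			group = group[:n]
		}
		out = append(out, group...)
	}
	return out
}

func main() {
	items := []Item{
		{"i1", "recall_1", 0.2}, {"i2", "recall_1", 0.9}, {"i3", "recall_2", 0.5},
	}
	for _, it := range CapPerRecall(items, map[string]int{"recall_1": 1}) {
		fmt.Println(it.ID, it.RecallName)
	}
}
```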
ItemStateFilter
ItemStateFilter filters the items returned by recall links based on their states. Because the state of an item can change in real time, item states are usually stored in a dedicated table and queried in real time before filtering.
Hologres
"FilterConfs" :[
{
"Name": "ItemStateFilter",
"FilterType": "ItemStateFilter",
"ItemStateDaoConf":{
"AdapterType": "hologres",
"HologresName": "",
"HologresTableName": "",
"ItemFieldName" : "",
"WhereClause": "",
"SelectFields" :""
},
"FilterParams" :[]
}
]
Parameters of ItemStateDaoConfig
Parameter | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Valid values: hologres, mysql, and tablestore. |
HologresName | string | Yes | The custom name of the data source that is specified in the HologresConfs parameter. Example: holo_info. |
HologresTableName | string | Yes | The name of the table that stores item states in the Hologres instance. |
ItemFieldName | string | Yes | The primary key of the table that stores item states. |
WhereClause | string | No | The conditional statement that is used for filtering. |
SelectFields | string | Yes | The fields that you want to query. |
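The query that PAI-Rec builds from ItemStateDaoConf is not documented in this topic. The following Go sketch shows one plausible mapping, in which SelectFields, HologresTableName, and ItemFieldName form a parameterized lookup and WhereClause is appended as an extra condition. The struct, the BuildStateQuery function, and the example table and column names (item_state, item_id, status) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ItemStateDaoConf mirrors the parameters in the table above (hypothetical struct).
type ItemStateDaoConf struct {
	HologresTableName string
	ItemFieldName     string
	WhereClause       string
	SelectFields      string
}

// BuildStateQuery sketches a parameterized query that fetches the states of the
// given items. Table, field, and clause values are assumed to come from trusted
// configuration; only the item IDs are passed as bind arguments.
func BuildStateQuery(conf ItemStateDaoConf, itemIDs []string) (string, []any) {
	placeholders := make([]string, len(itemIDs))
	args := make([]any, len(itemIDs))
	for i, id := range itemIDs {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}
	query := fmt.Sprintf("SELECT %s FROM %s WHERE %s IN (%s)",
		conf.SelectFields, conf.HologresTableName, conf.ItemFieldName, strings.Join(placeholders, ", "))
	if conf.WhereClause != "" {
		query += " AND (" + conf.WhereClause + ")"
	}
	return query, args
}

func main() {
	conf := ItemStateDaoConf{
		HologresTableName: "item_state",
		ItemFieldName:     "item_id",
		WhereClause:       "status = 1",
		SelectFields:      "item_id, publicStatus, state",
	}
	q, args := BuildStateQuery(conf, []string{"17019277", "17019278"})
	fmt.Println(q) // SELECT item_id, publicStatus, state FROM item_state WHERE item_id IN ($1, $2) AND (status = 1)
	fmt.Println(args)
}
```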
Parameters of FilterParams
"FilterParams" :[
{
"Name" : "publicStatus",
"Type" : "int",
"Operator" : "equal",
"Value" : 0
},
{
"Name" : "state",
"Type" : "int",
"Operator" : "equal",
"Value" : 1
},
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "not_equal",
"Value" : 2
},
{
"Name" : "norec",
"Type" : "int",
"Operator" : "not_equal",
"Value" : 1
}
]
Parameter | Type | Required | Description |
Name | string | Yes | The name of the feature. |
Domain | string | No | The object to which the feature belongs. Valid values: user and item. |
Operator | string | Yes | The operator. Valid values: equal, not_equal, in, greater, greaterThan, less, and lessThan. |
Type | string | Yes | The type of the feature. |
Value | object | Yes | The condition value. |
You can configure both WhereClause and FilterParams for filtering. WhereClause adds filtering conditions to the query request that is sent to the data source, whereas FilterParams filters the results that the query returns.
For more information about how to use operators, see the Appendix section of this topic.
CompletelyFairFilter
CompletelyFairFilter sorts the items returned by each recall link by score and then selects items from the results of each recall link in a fair manner. RetainNum specifies the number of items to retain.
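This topic does not define "fair" precisely. The following Go sketch assumes a round-robin interpretation: after each recall link's items are sorted by score, one item is taken from each link in turn until RetainNum items are selected or every link is exhausted. The Item type and the FairSelect function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical recalled item.
type Item struct {
	ID         string
	RecallName string
	Score      float64
}

// FairSelect sorts each recall link's items by score and then takes one item
// from each link in turn (round-robin) until retainNum items are selected.
// Round-robin is an assumed interpretation of "fair" selection.
func FairSelect(items []Item, retainNum int) []Item {
	groups := make(map[string][]Item)
	var order []string // recall links in order of first appearance
	for _, it := range items {
		if _, ok := groups[it.RecallName]; !ok {
			order = append(order, it.RecallName)
		}
		groups[it.RecallName] = append(groups[it.RecallName], it)
	}
	for _, name := range order {
		g := groups[name]
		sort.SliceStable(g, func(i, j int) bool { return g[i].Score > g[j].Score })
	}
	var out []Item
	for round := 0; len(out) < retainNum; round++ {
		added := false
		for _, name := range order {
			if g := groups[name]; round < len(g) && len(out) < retainNum {
				out = append(out, g[round])
				added = true
			}
		}
		if !added {
			break // every recall link is exhausted
		}
	}
	return out
}

func main() {
	items := []Item{
		{"a1", "recall_a", 0.9}, {"a2", "recall_a", 0.8}, {"a3", "recall_a", 0.7},
		{"b1", "recall_b", 0.4},
	}
	for _, it := range FairSelect(items, 3) {
		fmt.Print(it.ID, " ")
	}
	fmt.Println() // prints: a1 b1 a2
}
```

Without the round-robin step, a single recall link with many high-scoring items could fill all RetainNum slots; alternating between links gives each link a share.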
"FilterConfs" :[
{
"Name": "CompletelyFairFilter",
"FilterType": "CompletelyFairFilter",
"RetainNum": 500
}
]
DimensionFieldUniqueFilter
DimensionFieldUniqueFilter works differently from UniqueFilter: instead of deduplicating items by item ID, it removes items that have duplicate field values.
"FilterConfs" :[
{
"Name": "dimension_field_unique_filter",
"FilterType": "DimensionFieldUniqueFilter",
"Dimension": ""
}
]
UniqueFilter
UniqueFilter ensures that each item ID appears only once. If two recall links return the same item ID, UniqueFilter keeps the item that is returned first.
UniqueFilter is built in, so you can reference it in the FilterNames parameter without configuring it in FilterConfs.
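Both UniqueFilter and DimensionFieldUniqueFilter keep the first occurrence of a key and drop later duplicates; they differ only in the key. The following Go sketch expresses that with a single UniqueBy helper. The Item type, the UniqueBy function, and the author field are hypothetical, and how the Dimension parameter maps to an item field is an assumption.

```go
package main

import "fmt"

// Item is a hypothetical recalled item with arbitrary string fields.
type Item struct {
	ID     string
	Fields map[string]string
}

// UniqueBy keeps the first item for each key returned by keyOf and drops later
// duplicates, preserving the original order.
func UniqueBy(items []Item, keyOf func(Item) string) []Item {
	seen := make(map[string]bool)
	var out []Item
	for _, it := range items {
		k := keyOf(it)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, it)
	}
	return out
}

func main() {
	items := []Item{
		{ID: "1", Fields: map[string]string{"author": "x"}},
		{ID: "2", Fields: map[string]string{"author": "x"}},
		{ID: "1", Fields: map[string]string{"author": "y"}},
	}
	// UniqueFilter style: deduplicate by item ID.
	byID := UniqueBy(items, func(it Item) string { return it.ID })
	// DimensionFieldUniqueFilter style: deduplicate by the value of one field.
	byAuthor := UniqueBy(items, func(it Item) string { return it.Fields["author"] })
	fmt.Println(len(byID), len(byAuthor)) // 2 2
}
```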
Filter usage
Filters are used in a similar way to recall links. After you configure filters, specify them in the FilterNames parameter for each scenario. FilterNames uses the Map<String, Object> structure, in which each key is a scenario and each value is the set of filter policies (filter names) to apply in that scenario.
"FilterNames": {
"default": [
"UniqueFilter"
]
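}
default: the name of the scenario. If a scenario is not explicitly configured in FilterNames, the filters under default are used.
UniqueFilter: the name of a filter to apply in the scenario. The value can be the custom name of a filter defined in the FilterConfs parameter or the name of a built-in filter such as UniqueFilter.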
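The fallback to default can be sketched as a simple map lookup in Go. The FilterNamesFor function and the scenario names home_feed and detail_page are hypothetical.

```go
package main

import "fmt"

// FilterNamesFor returns the filter names configured for a scenario and falls
// back to the "default" entry when the scenario is not configured.
func FilterNamesFor(filterNames map[string][]string, scene string) []string {
	if names, ok := filterNames[scene]; ok {
		return names
	}
	return filterNames["default"]
}

func main() {
	filterNames := map[string][]string{
		"default":   {"UniqueFilter"},
		"home_feed": {"UniqueFilter", "holo_exposure_filter"},
	}
	fmt.Println(FilterNamesFor(filterNames, "home_feed"))   // [UniqueFilter holo_exposure_filter]
	fmt.Println(FilterNamesFor(filterNames, "detail_page")) // [UniqueFilter]
}
```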
Appendix
Operator examples
equal (equal to the specified value)
{
"Name" : "publicStatus",
"Type" : "int",
"Operator" : "equal",
"Value" : 0
}
not_equal (not equal to the specified value)
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "not_equal",
"Value" : 2
}
greater (greater than the specified value)
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "greater",
"Value" : 2
}
greaterThan (greater than or equal to the specified value)
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "greaterThan",
"Value" : 2
}
less (less than the specified value)
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "less",
"Value" : 2
}
lessThan (less than or equal to the specified value)
{
"Name" : "checkStatus",
"Type" : "int",
"Operator" : "lessThan",
"Value" : 2
}
in (equal to any value in the array)
{
"Name" : "state",
"Type" : "int",
"Operator" : "in",
"Value" : [2, 4, 6]
}
The in operator also accepts string values:
{
"Name" : "state",
"Type" : "string",
"Operator" : "in",
"Value" : ["success", "ok"]
}
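The operator meanings listed in this appendix can be summarized in a small Go evaluator. The FilterParam struct and the Match and KeepItem functions are hypothetical; the sketch supports only int64 and string values, treats a missing or mismatched value as a failed condition, and assumes that an item is kept only if it satisfies every FilterParams entry. None of these choices is confirmed by this topic.

```go
package main

import "fmt"

// FilterParam is a hypothetical Go form of one FilterParams entry.
type FilterParam struct {
	Name     string
	Operator string
	Value    any // int64 or string; []int64 or []string for "in"
}

// order3 returns -1, 0, or 1 as a three-way comparison of x and y.
func order3[T int64 | string](x, y T) int {
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	}
	return 0
}

// compare orders a and b when both are int64 or both are strings;
// ok is false for missing or mismatched values.
func compare(a, b any) (c int, ok bool) {
	switch x := a.(type) {
	case int64:
		if y, isInt := b.(int64); isInt {
			return order3(x, y), true
		}
	case string:
		if y, isStr := b.(string); isStr {
			return order3(x, y), true
		}
	}
	return 0, false
}

// Match applies one operator with the meanings listed in the appendix:
// greater is >, greaterThan is >=, less is <, and lessThan is <=.
func Match(actual any, p FilterParam) bool {
	if p.Operator == "in" {
		var candidates []any
		switch vs := p.Value.(type) {
		case []int64:
			for _, v := range vs {
				candidates = append(candidates, v)
			}
		case []string:
			for _, v := range vs {
				candidates = append(candidates, v)
			}
		}
		for _, v := range candidates {
			if c, ok := compare(actual, v); ok && c == 0 {
				return true
			}
		}
		return false
	}
	c, ok := compare(actual, p.Value)
	if !ok {
		return false // missing or mismatched value fails (assumption)
	}
	switch p.Operator {
	case "equal":
		return c == 0
	case "not_equal":
		return c != 0
	case "greater":
		return c > 0
	case "greaterThan":
		return c >= 0
	case "less":
		return c < 0
	case "lessThan":
		return c <= 0
	}
	return false
}

// KeepItem keeps an item only if its state satisfies every entry (assumed AND).
func KeepItem(state map[string]any, params []FilterParam) bool {
	for _, p := range params {
		if !Match(state[p.Name], p) {
			return false
		}
	}
	return true
}

func main() {
	params := []FilterParam{
		{Name: "publicStatus", Operator: "equal", Value: int64(0)},
		{Name: "checkStatus", Operator: "not_equal", Value: int64(2)},
		{Name: "state", Operator: "in", Value: []int64{2, 4, 6}},
	}
	item := map[string]any{"publicStatus": int64(0), "checkStatus": int64(1), "state": int64(4)}
	fmt.Println(KeepItem(item, params)) // true
}
```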