This topic describes how to configure the loading of user or item features.
Feature configuration corresponds to FeatureConfs in the configuration overview. FeatureConfs is a Map[string]object structure where the key is the scenario name. You can configure different features for different scenarios.
Load features
Before fine-ranking, you must retrieve user or item feature data from the feature source.
In some cases, the retrieved feature data requires further processing. For example, you can generate new features from existing ones or combine existing features.
Features can be loaded from multiple data sources, such as Hologres, Redis, Tablestore (OTS), and PAI-FeatureStore.
Hologres
Configuration example
{
"FeatureConfs": {
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "hologres",
"HologresName": "holo-pai",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "user_id",
"HologresTableName": "recom_user_features_processed_holo_online",
"UserSelectFields": "rids_count,sex,alladdfriendnum,allpayrosenum",
"FeatureStore": "user"
},
"Features": []
},
{
"FeatureDaoConf": {
"AdapterType": "hologres",
"HologresName": "holo-pai",
"ItemFeatureKeyName": "item_id",
"FeatureKey": "item:id",
"HologresTableName": "recom_user_features_processed_holo_online",
"ItemSelectFields": "rids_count as rids2_count,sex as guestsex,alladdfriendnum as alladdfriendnum2",
"FeatureStore": "item"
},
"Features": []
}
]
}
}
}
AsynLoadFeature: Specifies whether to load features asynchronously. This allows features to be loaded concurrently when multiple `FeatureLoadConfs` are configured.
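The effect of asynchronous loading can be sketched in Python as follows. This is only an illustration, not the engine's implementation: the two loader functions and the values they return are hypothetical stand-ins for two `FeatureLoadConfs` entries.

```python
from concurrent.futures import ThreadPoolExecutor

def load_user_features():
    # Hypothetical loader standing in for the first FeatureLoadConfs entry.
    return {"sex": "F", "rids_count": 3}

def load_item_features():
    # Hypothetical loader standing in for the second FeatureLoadConfs entry.
    return {"guestsex": "M"}

def load_all(loaders, asyn_load_feature=True):
    """Run every loader and return the results in configuration order."""
    if asyn_load_feature:
        # Concurrent: total latency is close to that of the slowest loader.
        with ThreadPoolExecutor(max_workers=max(1, len(loaders))) as pool:
            return list(pool.map(lambda loader: loader(), loaders))
    # Sequential: total latency is the sum of all loaders.
    return [loader() for loader in loaders]

results = load_all([load_user_features, load_item_features])
```

Both modes return the same data. Only the total latency differs.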
FeatureLoadConfs/FeatureDaoConf
Field name | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to `hologres`. |
HologresName | string | Yes | The custom name of the Hologres instance defined in `HologresConfs`. For example, `holo_info`. |
FeatureKey | string | Yes | The feature used to query the feature table. The `FeatureKey` specifies which field of the user or item to use for the lookup. For example, `user:uid` gets the value of the `uid` property from the user, and `item:id` gets the value of the `id` property from the item. |
UserFeatureKeyName | string | Yes | The primary key field of the user feature table. |
ItemFeatureKeyName | string | Yes | The primary key field of the item feature table. |
HologresTableName | string | Yes | The name of the feature table in Hologres. |
UserSelectFields | string | No | The user features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
ItemSelectFields | string | No | The item features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
FeatureStore | string | Yes | The location where the retrieved features are stored. Valid values: `user` or `item`. |
CacheSize | integer | No | The number of feature entries to cache locally. The default value is 0, which means data is not cached. |
CacheTime | integer | No | The expiration time for locally cached feature entries, in seconds. This parameter takes effect only if `CacheSize` is greater than 0. The default value is 3600. |
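The following Python sketch illustrates how a `FeatureKey` such as `user:uid` could be resolved to a lookup value. The `build_query` helper is an assumption: the document does not state how the engine queries Hologres, and the helper only shows how a select list with aliases might be combined with the table name and the primary key field.

```python
def resolve_feature_key(feature_key, user, item):
    """Return the lookup value named by a FeatureKey such as 'user:uid'."""
    side, _, prop = feature_key.partition(":")
    if side not in ("user", "item") or not prop:
        raise ValueError(f"unsupported FeatureKey: {feature_key!r}")
    source = user if side == "user" else item
    return source[prop]

def build_query(table, key_name, select_fields):
    # Assumption: the select list is passed through as SQL, which would
    # explain why aliases such as "sex as guestsex" can be used.
    return f"SELECT {select_fields} FROM {table} WHERE {key_name} = %s"

user = {"uid": "u1001"}   # hypothetical user properties
item = {"id": "i42"}      # hypothetical item properties

lookup_value = resolve_feature_key("user:uid", user, item)
query = build_query("recom_user_features_processed_holo_online",
                    "user_id", "rids_count,sex")
```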
PAI-FeatureStore
For more information about how to configure the PAI-FeatureStore platform, see FeatureStore overview.
Configuration example
{
"FeatureConfs": {
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "featurestore",
"FeatureStoreName": "pairec-fs",
"FeatureKey": "user:uid",
"FeatureStoreModelName": "rank_v1",
"FeatureStoreEntityName": "user",
"FeatureStore": "user"
}
}
]
}
}
}
FeatureLoadConfs/FeatureDaoConf
Field name | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to `featurestore`. |
FeatureStoreName | string | Yes | The custom name of the FeatureStore instance defined in `FeatureStoreConfs`. For example, `pairec-fs`. |
FeatureKey | string | Yes | The feature used to query the feature table. The `FeatureKey` specifies which field of the user or item to use for the lookup. For example, `user:uid` gets the value of the `uid` property from the user, and `item:id` gets the value of the `id` property from the item. |
FeatureStoreModelName | string | Yes | The model feature name in PAI-FeatureStore. |
FeatureStoreEntityName | string | Yes | The entity name in PAI-FeatureStore. |
FeatureStore | string | Yes | The location where the retrieved features are stored. Valid values: `user` or `item`. |
CacheSize | integer | No | The number of feature entries to cache locally. The default value is 0, which means data is not cached. |
CacheTime | integer | No | The expiration time for locally cached feature entries, in seconds. This parameter takes effect only if `CacheSize` is greater than 0. The default value is 3600. |
The preceding configuration retrieves all features of the user entity from the rank_v1 model feature. To retrieve features of the item entity, configure the parameters as follows:
{
"FeatureConfs" :{
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [{
"FeatureDaoConf": {
"AdapterType": "featurestore",
"FeatureStoreName": "pairec-fs",
"FeatureKey": "item:id",
"FeatureStoreModelName": "rank_v1",
"FeatureStoreEntityName": "item",
"FeatureStore": "item"
}
}]
}
}
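}
However, you often need to retrieve only some of the features, from a specific feature view. For example, the following configuration retrieves all features from the feature view named user_table_preprocess_all_feature_v1 and stores them in the user properties, and retrieves only the author, duration, and category features from item_table_preprocess_all_feature_v1 and stores them in the item properties: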
{
"FeatureConfs" :{
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [{
"FeatureDaoConf": {
"AdapterType": "featurestore",
"FeatureStoreName": "pairec-fs",
"FeatureKey": "user:uid",
"FeatureStoreViewName": "user_table_preprocess_all_feature_v1",
"UserSelectFields": "*",
"FeatureStore": "user"
}
},
{
"FeatureDaoConf": {
"AdapterType": "featurestore",
"FeatureStoreName": "pairec-fs",
"FeatureKey": "item:id",
"FeatureStoreViewName": "item_table_preprocess_all_feature_v1",
"ItemSelectFields": "author,duration,category",
"FeatureStore": "item"
}
}]
}
}
}
Field name | Type | Required | Description |
FeatureStoreViewName | string | Yes | The name of the feature view. |
UserSelectFields | string | No | The user features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
ItemSelectFields | string | No | The item features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
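The meaning of `*` and of a comma-separated list in `UserSelectFields` and `ItemSelectFields` can be sketched in Python as follows. The `row` dictionary is a hypothetical feature record, and silently skipping names that are missing from the record is an assumption.

```python
def select_features(row, select_fields):
    """Keep all features for '*', otherwise only the listed names."""
    if select_fields.strip() == "*":
        return dict(row)
    wanted = [name.strip() for name in select_fields.split(",") if name.strip()]
    # Assumption: names that are absent from the record are skipped.
    return {name: row[name] for name in wanted if name in row}

# Hypothetical item record returned by a feature view.
row = {"author": "a1", "duration": 30, "category": "news", "city": "hz"}

all_features = select_features(row, "*")
some_features = select_features(row, "author,duration,category")
```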
Tablestore (OTS)
Configuration example
{
"FeatureConfs": {
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "tablestore",
"TableStoreName": "tablestore_info",
"FeatureKey": "user:uid",
"UserFeatureKeyName": "uid",
"TableStoreTableName": "",
"UserSelectFields": "*",
"FeatureStore": "user"
},
"Features": []
},
{
"FeatureDaoConf": {
"AdapterType": "tablestore",
"TableStoreName": "tablestore_info",
"FeatureKey": "item:id",
"ItemFeatureKeyName": "item_id",
"TableStoreTableName": "",
"ItemSelectFields": "*",
"FeatureStore": "item"
},
"Features": []
}
]
}
}
}
FeatureLoadConfs/FeatureDaoConf
Field name | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to `tablestore`. |
TableStoreName | string | Yes | The custom name of the Tablestore instance defined in `TableStoreConfs`. For example, `tablestore_info`. |
FeatureKey | string | Yes | The feature used to query the feature table. The `FeatureKey` specifies which field of the user or item to use for the lookup. For example, `user:uid` gets the value of the `uid` property from the user, and `item:pair_id` gets the value of the `pair_id` property from the item. |
UserFeatureKeyName | string | No | The primary key field of the user feature table. |
ItemFeatureKeyName | string | No | The primary key field of the item feature table. |
TableStoreTableName | string | Yes | The name of the feature table in Tablestore. |
UserSelectFields | string | No | The user features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
ItemSelectFields | string | No | The item features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. |
FeatureStore | string | Yes | The location where the retrieved features are stored. Valid values: `user` or `item`. |
CacheSize | integer | No | The number of feature entries to cache locally. The default value is 0, which means data is not cached. |
CacheTime | integer | No | The expiration time for locally cached feature entries, in seconds. This parameter takes effect only if `CacheSize` is greater than 0. The default value is 3600. |
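The interaction between `CacheSize` and `CacheTime` can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `FeatureCache` class is hypothetical, and evicting the oldest entry when the cache is full is an assumed policy that the document does not specify.

```python
import time
from collections import OrderedDict

class FeatureCache:
    def __init__(self, cache_size=0, cache_time=3600, clock=time.monotonic):
        self.cache_size = cache_size    # maximum number of cached entries
        self.cache_time = cache_time    # time to live, in seconds
        self.clock = clock              # injectable clock, eases testing
        self._entries = OrderedDict()   # key -> (expires_at, features)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, features = entry
        if self.clock() >= expires_at:
            del self._entries[key]      # expired: drop it and report a miss
            return None
        return features

    def put(self, key, features):
        if self.cache_size <= 0:
            return                      # CacheSize 0 means no caching
        self._entries[key] = (self.clock() + self.cache_time, features)
        self._entries.move_to_end(key)
        while len(self._entries) > self.cache_size:
            self._entries.popitem(last=False)   # assumed policy: evict oldest
```

With the default `cache_size` of 0, `put` stores nothing, so `cache_time` has no effect. This mirrors the rule that `CacheTime` takes effect only if `CacheSize` is greater than 0.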
Redis
Redis provides flexible data storage. You can use a key-value (KV) format, where the value can be in CSV or JSON format. You can also use the HASH format to retrieve all or part of the data by specifying field names.
Configuration example
{
"FeatureConfs": {
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "redis",
"RedisName": "user_redis",
"RedisPrefix": "UF_V2_",
"FeatureKey": "user:uid",
"FeatureStore": "user",
"RedisDataType": "string",
"RedisFieldType": "csv",
"RedisValueDelimeter": ","
},
"Features": []
},
{
"FeatureDaoConf": {
"AdapterType": "redis",
"RedisName": "item_redis",
"RedisPrefix": "IF_V2_FM_",
"FeatureKey": "item:id",
"FeatureStore": "item",
"RedisDataType": "string",
"RedisFieldType": "json"
},
"Features": [ ]
}
]
}
}
}
In the preceding example, user features and item features are both retrieved in KV format. However, user features are stored as CSV with the separator specified by `RedisValueDelimeter`, while item features are stored as JSON.
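The difference between the two KV value formats can be sketched in Python as follows. Two points are assumptions that the document does not confirm: that the Redis key is `RedisPrefix` followed by the `FeatureKey` value, and that each CSV entry has the form `name:value`.

```python
import json

def redis_key(prefix, key_value):
    # Assumption: the Redis key is RedisPrefix followed by the lookup value.
    return f"{prefix}{key_value}"

def decode_value(raw, field_type, delimiter=","):
    """Decode a KV string value into a dict of features."""
    if field_type == "json":
        return json.loads(raw)
    if field_type == "csv":
        features = {}
        for entry in raw.split(delimiter):
            # Assumption: each CSV entry is written as name:value.
            name, sep, value = entry.partition(":")
            if sep:
                features[name] = value
        return features
    raise ValueError(f"unsupported RedisFieldType: {field_type!r}")

user_key = redis_key("UF_V2_", "u1001")
user_features = decode_value("age:30,sex:F", "csv", delimiter=",")
item_features = decode_value('{"author": "a1", "duration": 30}', "json")
```

Note that CSV values stay strings in this sketch, while JSON values keep their JSON types.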
The following example demonstrates how to retrieve feature data in the HASH format.
{
"FeatureConfs": {
"scene_name": {
"AsynLoadFeature": true,
"FeatureLoadConfs": [
{
"FeatureDaoConf": {
"AdapterType": "redis",
"RedisName": "user_redis",
"RedisPrefix": "UF_V2_",
"FeatureKey": "user:uid",
"FeatureStore": "user",
"RedisDataType": "hash",
"UserSelectFields": "*"
},
"Features": []
},
{
"FeatureDaoConf": {
"AdapterType": "redis",
"RedisName": "item_redis",
"RedisPrefix": "IF_V2_FM_",
"FeatureKey": "item:id",
"FeatureStore": "item",
"RedisDataType": "hash",
"ItemSelectFields": "city,author,duration"
},
"Features": []
}
]
}
}
}
FeatureLoadConfs/FeatureDaoConf
Field name | Type | Required | Description |
AdapterType | string | Yes | The type of the data source. Set the value to `redis`. |
RedisName | string | Yes | The custom name of the Redis instance defined in `RedisConfs`. For example, `redis_info`. |
RedisPrefix | string | Yes | The key prefix. |
FeatureKey | string | Yes | The feature used to query the feature table. The `FeatureKey` specifies which field of the user or item to use for the lookup. For example, `user:uid` gets the value of the `uid` property from the user, and `item:pair_id` gets the value of the `pair_id` property from the item. |
UserSelectFields | string | No | The user features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. This parameter is effective only when `RedisDataType` is set to `hash`. |
ItemSelectFields | string | No | The item features to retrieve. Use `*` to retrieve all features, or specify features in a comma-separated list, such as `feature1,feature2`. This parameter is effective only when `RedisDataType` is set to `hash`. |
FeatureStore | string | Yes | The location where the retrieved features are stored. Valid values: `user` or `item`. |
RedisDataType | string | Yes | The data storage format. Valid values: `string` (KV format) and `hash` (HASH format). |
RedisFieldType | string | No | Valid values are `csv` or `json`. This parameter is effective only when `RedisDataType` is set to `string`. |
RedisValueDelimeter | string | No | The data separator. This parameter is effective only when `RedisFieldType` is set to `csv`. |
CacheSize | integer | No | The number of feature entries to cache locally. The default value is 0, which means data is not cached. |
CacheTime | integer | No | The expiration time for locally cached feature entries, in seconds. This parameter takes effect only if `CacheSize` is greater than 0. The default value is 3600. |
Feature transformation
After features are loaded, you may need to process them further by transforming them or generating new ones. The engine has several built-in feature transformation types, which are described below.
FeatureType (Transformation type) | Description |
new_feature | Generates a completely new feature. |
raw_feature | Creates a new feature based on an existing feature. |
compose_feature | Composite feature. Combines multiple existing features to generate a new feature. |
new_feature
The day_h and week_day features are often used as user features. Both are generated in real time.
{
"FeatureType": "new_feature",
"FeatureName": "day_h",
"Normalizer": "hour_in_day",
"FeatureStore": "user"
}
{
"FeatureType": "new_feature",
"FeatureName": "week_day",
"Normalizer": "weekday",
"FeatureStore": "user"
}
FeatureName: The name of the generated feature.
FeatureStore: Specifies whether the feature is stored on the user side or the item side. Valid values: user or item.
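As a rough illustration of the day_h and week_day features, the following Python sketch derives both values from a timestamp. It assumes that `hour_in_day` yields the hour of the day (0 to 23) and that `weekday` yields a day-of-week number. The numbering shown (Monday = 0) is Python's convention and may differ from the engine's.

```python
from datetime import datetime

def hour_in_day(now):
    # Hour of the day, 0 to 23.
    return now.hour

def weekday(now):
    # Assumption: Monday = 0 ... Sunday = 6 (Python's convention).
    return now.weekday()

now = datetime(2024, 5, 17, 14, 30)   # a Friday afternoon
user_features = {"day_h": hour_in_day(now), "week_day": weekday(now)}
```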
Generate a random number. Random numbers are sometimes used for probability-based judgments. The following configuration generates the rand_int_v feature with a value in the range of [0, 100).
{
"FeatureType": "new_feature",
"FeatureName": "rand_int_v",
"Normalizer": "random",
"FeatureStore": "user"
}
Generate a static field. The following configuration generates a feature named alg with the value ALRC.
{
"FeatureType": "new_feature",
"FeatureStore": "user",
"Normalizer": "const_value",
"FeatureValue": "ALRC",
"FeatureName": "alg"
}
Generate a feature based on an expression. The expression is interpreted and executed by the https://github.com/Knetic/govaluate library.
The following configuration generates a boolean feature named is_retarget by checking if recall_name is in an array. The value of the boolean feature is represented by 1 or 0.
{
"FeatureType": "new_feature",
"FeatureStore": "item",
"FeatureSource": "item:recall_name",
"Normalizer": "expression",
"Expression": "recall_name in ('retarget_u2i','realtime_retarget_click')",
"FeatureName": "is_retarget"
}
Expression: For more information about expression rules, see https://github.com/Knetic/govaluate/blob/master/MANUAL.md.
FeatureSource: Specifies the source of the feature value. For example, item:recall_name indicates that the value comes from the recall_name feature of the item. If the expression contains multiple item properties, you can omit FeatureSource. In this case, all item properties are passed to the expression for calculation.
You can use currentTime to refer to the current system Unix timestamp in seconds.
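The logic of the is_retarget expression can be sketched in plain Python as follows. The sketch does not use the govaluate library. The `seconds_since` helper and the `publish_time` property are hypothetical, added only to show how `currentTime` might be used in an expression.

```python
import time

RETARGET_RECALLS = ("retarget_u2i", "realtime_retarget_click")

def is_retarget(item):
    # Mirrors: recall_name in ('retarget_u2i','realtime_retarget_click')
    # The boolean result is represented as 1 or 0.
    return 1 if item.get("recall_name") in RETARGET_RECALLS else 0

def seconds_since(item, now=None):
    # Hypothetical: mirrors an expression such as "currentTime - publish_time".
    now = int(time.time()) if now is None else now
    return now - item["publish_time"]

item = {"recall_name": "retarget_u2i", "publish_time": 100}
item["is_retarget"] = is_retarget(item)
```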
Generate a feature based on an expression (V2). This version offers more flexible syntax and better performance. It uses the https://github.com/expr-lang/expr library to interpret and execute the expression. This is the recommended method for using expressions.
The preceding expression can be rewritten as follows:
{
"FeatureType": "new_feature",
"FeatureStore": "item",
"Normalizer": "expr",
"Expression": "item.recall_name in ['retarget_u2i','realtime_retarget_click']",
"FeatureName": "is_retarget"
}
Expression: For more information about expression rules, see https://expr-lang.org/docs/language-definition.
For item-side features, you can add the `item.` prefix. For user-side features, you can add the `user.` prefix. For example, `item.recall_name` refers to the recall_name feature on the item side. When FeatureStore is set to item, you can also reference user-side features in the expression.
You can use currentTime to refer to the current system Unix timestamp in seconds.
To use both user-side and item-side features, see the following example:
{
"FeatureType": "new_feature",
"FeatureStore": "item",
"Normalizer": "expr",
"Expression": "user.user_index - item.index",
"FeatureName": "index_delta"
}
raw_feature
Transforms an existing feature.
{
"FeatureType": "raw_feature",
"FeatureStore": "user",
"FeatureSource": "user:age",
"RemoveFeatureSource": true,
"FeatureName": "age_v2"
}
This configuration generates a new feature named age_v2 based on the user's age feature. The value of the age feature is assigned to age_v2. The RemoveFeatureSource parameter determines whether to delete the original source feature (age).
user:age uses a user-side feature. To use an item-side feature, you would specify it as item:city.
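The effect of raw_feature with and without `RemoveFeatureSource` can be sketched in Python as follows. The `apply_raw_feature` function is a hypothetical illustration, not the engine's code. Leaving the features unchanged when the source feature is missing is an assumption.

```python
def apply_raw_feature(features, source, name, remove_source=False):
    """Copy features[source] to features[name], optionally dropping source."""
    result = dict(features)
    if source not in result:
        return result                 # assumption: a missing source is a no-op
    result[name] = result[source]
    if remove_source and source != name:
        del result[source]
    return result

user = {"age": 30, "sex": "F"}        # hypothetical user features
renamed = apply_raw_feature(user, "age", "age_v2", remove_source=True)
copied = apply_raw_feature(user, "age", "age_v2", remove_source=False)
```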
compose_feature
Generates a composite feature.
{
"FeatureType": "compose_feature",
"FeatureStore": "item",
"FeatureSource": "user:category,item:author",
"FeatureName": "item_author"
}
To reference multiple features in `FeatureSource`, add the `user:` prefix to user features and the `item:` prefix to item features.
A new feature, `item_author`, is generated by joining the values of the source features with an underscore (`_`). For example, if the value of `user:category` is `category1` and the value of `item:author` is `author1`, the value of `item_author` is `category1_author1`.
You can also create this type of composite feature more flexibly using the expression (V2) method for new_feature.
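The joining rule for compose_feature can be sketched in Python as follows. The `compose_feature` function is a hypothetical illustration. It assumes that every referenced feature exists and that values are converted to strings before they are joined.

```python
def compose_feature(user, item, feature_source, separator="_"):
    """Join the values named in FeatureSource with an underscore."""
    parts = []
    for ref in feature_source.split(","):
        side, _, prop = ref.strip().partition(":")
        if side not in ("user", "item"):
            raise ValueError(f"unsupported prefix in {ref!r}")
        source = user if side == "user" else item
        parts.append(str(source[prop]))   # assumption: values are stringified
    return separator.join(parts)

user = {"category": "category1"}
item = {"author": "author1"}
item["item_author"] = compose_feature(user, item, "user:category,item:author")
```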
Built-in expression functions
In the new_feature section, the expression-based methods use expressions to generate new features. The engine also provides the following built-in functions that you can use directly in expressions.
Function name | Function signature | Description |
getString | getString(a, b) | If a is not an empty value, returns a. Otherwise, returns b. |
trim | trim(str, cutset) | Removes all leading and trailing characters from `str` that are contained in `cutset`. |
trimPrefix | trimPrefix(str, cutset) | Removes the leading characters from `str` that are contained in `cutset`. |
replace | replace(str, old, new) | Replaces occurrences of the `old` string with the `new` string in `str`. |
round | round(number), round(number, places) | |
hash32 | hash32(str) | A hash algorithm that uses murmur3.Sum32. |
log | log(number) | Calculates the natural logarithm (base e) of a number. |
log10 | log10(number) | Calculates the common logarithm (base 10) of a number. |
log2 | log2(number) | Calculates the binary logarithm (base 2) of a number. |
pow | pow(base, exponent) | Calculates the power of a number. |
s2CellID | s2CellID(lat, lng), s2CellID(lat, lng, level) | |
s2CellNeighbors | s2CellNeighbors(lat, lng), s2CellNeighbors(lat, lng, level) | |
geoHash | geoHash(lat, lng), geoHash(lat, lng, precision) | |
geoHashWithNeighbors | geoHashWithNeighbors(lat, lng), geoHashWithNeighbors(lat, lng, precision) | |
haversine | haversine(lng1, lat1, lng2, lat2) | |
sphereDistance | sphereDistance(lng1, lat1, lng2, lat2) | |
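As a rough illustration of a few of these functions, the following Python sketch reimplements getString, trim, and haversine from their signatures and descriptions. These are not the engine's implementations. For haversine, the use of degrees as input, kilometers as output, and a mean Earth radius of 6371 km are assumptions, because the table does not state the units.

```python
import math

def get_string(a, b):
    # Returns a unless it is empty. Here "empty" means None or "",
    # which may be narrower than the engine's definition.
    return a if a not in (None, "") else b

def trim(s, cutset):
    # Removes leading and trailing characters that appear in cutset.
    return s.strip(cutset)

def haversine(lng1, lat1, lng2, lat2, radius_km=6371.0):
    # Great-circle distance using the haversine formula.
    # Assumptions: inputs in degrees, result in kilometers.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

one_degree_km = haversine(0.0, 0.0, 0.0, 1.0)   # one degree of latitude
```

Note the argument order in the signature: longitude comes before latitude.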