id_feature
Function introduction
The id_feature operator represents a discrete feature. It supports single-value discrete features, such as user IDs and item IDs, along with multi-value discrete features, such as product colors that can have multiple values.
Configuration
{
"feature_type": "id_feature",
"feature_name": "item_is_main",
"expression": "item:is_main",
"need_prefix": true,
"separator": "\u001D",
"default_value": ""
}
Field name | Required | Description |
feature_name | Yes | The prefix for the output feature. |
expression | Yes | The source field that the feature depends on. |
need_prefix | No | Specifies whether to add the feature_name as a prefix. Valid values:
|
value_type | No | The data type of the output feature. The default value is string. |
separator | No | The separator for multi-value input features. The default value is "\u001D". |
default_value | No | The default value to use when the input feature is empty. |
weighted | No | Specifies whether the input is in key:value format. If you set this parameter to |
value_dimension | No | This parameter truncates the output if a feature has multiple values. The default value is If the value is |
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
This operator supports binning. For more information, see Feature binning (discretization).
This operator supports multi-value inputs of the array type.
Example
The following examples use the item-side feature is_main to show feature input and output with different configurations:
Type | Value of item:is_main | Output feature |
int64_t | 100 | item_is_main_100 |
double | 5.2 | item_is_main_5.2 |
string | abc | item_is_main_abc |
Multi-value string | abc^]bcd | [item_is_main_abc, item_is_main_bcd] |
Multi-value int | 123^]456 | [item_is_main_123, item_is_main_456] |
The ^] symbol represents the multi-value separator. This is a single character with the ASCII code "\x1D", which can also be written as "\u001d".
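The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the input-to-output relationship, not the operator's implementation: the helper name id_feature and its handling of empty input are assumptions, and single values are returned as one-element lists for simplicity.

```python
SEP = "\x1d"  # the multi-value separator, shown as ^] in the table above


def id_feature(value, feature_name="item_is_main", need_prefix=True,
               separator=SEP, default_value=""):
    """Hypothetical helper: split a raw value and optionally prefix each part."""
    # Fall back to default_value when the input is empty (assumed behavior).
    text = str(value) if value not in (None, "") else default_value
    parts = text.split(separator) if text else []
    if need_prefix:
        return [f"{feature_name}_{part}" for part in parts]
    return parts


print(id_feature(100))            # ['item_is_main_100']
print(id_feature("abc\x1dbcd"))   # ['item_is_main_abc', 'item_is_main_bcd']
```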
raw_feature
Function introduction
The raw_feature operator represents a continuous feature. It supports numeric types such as int, float, and double, and handles both single-value and multi-value continuous features.
Configuration
{
"feature_type" : "raw_feature",
"feature_name" : "ctr",
"expression" : "item:ctr",
"normalizer" : "method=log10"
}
Field name | Required | Description |
feature_name | Yes | The feature name. |
expression | Yes | The source field that the feature depends on. The source must be user, item, or context. |
normalizer | No | The normalization method. For more information, see the following sections. |
value_type | No | The data type of the output feature. The default value is float. |
separator | No | The separator for multi-value input features. The default value is "\u001D". Only a single character is supported. |
default_value | No | The default value to use when the input feature is empty. |
value_dimension | No | The dimension of the output field. The default value is 1. You can use this parameter to truncate the output in offline tasks. If the value is 1, the schema type of the output table is value_type. Otherwise, the schema type is array<value_type>. |
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
This operator supports binning. For more information, see Feature binning (discretization).
This operator supports multi-value inputs of the array type.
Example
The ^] symbol represents the multi-value separator. Note that this is a single character with the ASCII code "\x1D", not two characters.
Type | Value of item:ctr | Output feature |
int64_t | 100 | 100 |
double | 100.1 | 100.1 |
Multi-value int | 123^]456 | [123, 456] (The input field must have the same dimension as the configured dimension). |
Normalizer
The raw_feature and match_feature operators support four normalization methods: minmax, zscore, log10, and expression. The configuration and calculation methods are as follows:
minmax
Example configuration: method=minmax,min=2.1,max=2.2
Formula: x = (x - min) / (max - min)
zscore
Example configuration: method=zscore,mean=0.0,standard_deviation=10.0
Formula: x = (x - mean) / standard_deviation
log10
Example configuration: method=log10,threshold=1e-10,default=-10
Formula: x = x > threshold ? log10(x) : default;
expression
Example configuration: method=expression,expr=sign(x)
Formula: You can configure any function or expression. The variable name is fixed as x, which represents the input of the expression.
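The first three formulas can be checked with a short Python sketch. The function names and the apply_normalizer parser are hypothetical. The fallback values used for log10 (threshold=1e-10, default=-10) are copied from the example configuration and are an assumption about the defaults. The expression method is left out because it needs an expression evaluator.

```python
import math


def minmax(x, min_value, max_value):
    # x = (x - min) / (max - min)
    return (x - min_value) / (max_value - min_value)


def zscore(x, mean, standard_deviation):
    # x = (x - mean) / standard_deviation
    return (x - mean) / standard_deviation


def log10_norm(x, threshold=1e-10, default=-10.0):
    # x = x > threshold ? log10(x) : default
    return math.log10(x) if x > threshold else default


def apply_normalizer(spec, x):
    """Parse a 'method=...,key=value' string and apply it to x (illustrative only)."""
    params = dict(item.split("=", 1) for item in spec.split(","))
    method = params.pop("method")
    if method == "minmax":
        return minmax(x, float(params["min"]), float(params["max"]))
    if method == "zscore":
        return zscore(x, float(params["mean"]), float(params["standard_deviation"]))
    if method == "log10":
        return log10_norm(x, float(params.get("threshold", 1e-10)),
                          float(params.get("default", -10.0)))
    raise ValueError(f"unsupported method in this sketch: {method}")


print(apply_normalizer("method=zscore,mean=0.0,standard_deviation=10.0", 5.0))  # 0.5
print(apply_normalizer("method=log10,threshold=1e-10,default=-10", 0.0))        # -10.0
```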
expr_feature
Function introduction
The expr_feature operator represents an expression feature. It evaluates an expression and outputs a feature value of the float type. It supports batch computing and broadcasting.
Note: When you use this feature operator, all inputs must be convertible to the double type.
Configuration
{
"feature_type" : "expr_feature",
"feature_name" : "ctr_sigmoid",
"expression" : "sigmoid(pv/(1+click))",
"variables": ["item:pv", "item:click"]
}
If pv = 2, click = 3, the value of the expression feature is: 0.6224593312
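The value above can be reproduced with plain Python, assuming sigmoid denotes the standard logistic function 1 / (1 + e^(-x)):

```python
import math


def sigmoid(x):
    # Standard logistic function (assumed to match the built-in sigmoid).
    return 1.0 / (1.0 + math.exp(-x))


pv, click = 2, 3
value = sigmoid(pv / (1 + click))  # sigmoid(0.5)
print(round(value, 10))            # 0.6224593312
```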
Field name | Required | Description |
feature_name | Yes | The feature name. |
expression | Yes | Specifies the content of the expression. |
variables | Yes | The variables used in the expression, which are the input fields. The source must be user, item, or context. |
separator | No | The separator for multi-value input fields of the string type. The default value is "\u001D". Only a single character is supported. |
default_value | No | The default value to use when the input feature is empty. |
value_dimension | No | The dimension of the output field. The default value is 0. You can use this to truncate or pad the output. If the value is 1, the schema type of the output table is value_type. Otherwise, the schema type is array<value_type>. |
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
Configuration example
{
"feature_name": "expr_feat",
"feature_type": "expr_feature",
"expression": "a+b",
"variables": ["a", "b"],
"value_dimension": 3
}
Scalar and vector calculations (broadcasting)
If variable a=1 and variable b=[1, 2, 6], the result is [2, 3, 7].
Vector-to-vector element-wise computation
If variable a=[3, 2, 1] and variable b=[1, 2, 6], the result is [4, 4, 7].
Supports temporary variables and comma expressions
For example: x=roundp(a),(a-x)*b. In this example, x is a temporary variable and does not need to be configured in variables.
A comma expression is evaluated from left to right. The value of the rightmost sub-expression is returned.
To reduce memory overhead, reuse existing variables as temporary variables where semantically appropriate.
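The broadcasting and element-wise rules described above can be mimicked in pure Python. The broadcast helper below is hypothetical and only mirrors the two documented cases. Raising an error on vectors of different lengths is an assumption of this sketch, not documented operator behavior.

```python
import operator


def broadcast(op, a, b):
    """Apply a binary op to scalars and/or equal-length lists (illustrative only)."""
    a_is_vec, b_is_vec = isinstance(a, list), isinstance(b, list)
    if a_is_vec and b_is_vec:
        if len(a) != len(b):
            # Assumption: this sketch rejects mismatched lengths.
            raise ValueError("vectors must have the same length")
        return [op(x, y) for x, y in zip(a, b)]
    if a_is_vec:
        return [op(x, b) for x in a]   # the scalar b is broadcast over a
    if b_is_vec:
        return [op(a, y) for y in b]   # the scalar a is broadcast over b
    return op(a, b)


print(broadcast(operator.add, 1, [1, 2, 6]))          # [2, 3, 7]
print(broadcast(operator.add, [3, 2, 1], [1, 2, 6]))  # [4, 4, 7]
```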
Combining expression features and sequence features
{
"features": [
{
"feature_name": "sphere_distance",
"feature_type": "expr_feature",
"expression": "sphere_dist(click_id_lng,click_id_lat,j_lng,j_lat)",
"variables": ["user:click_id_lng", "user:click_id_lat", "item:j_lng", "item:j_lat"],
"default_value": "0",
"value_dimension": 3,
"stub_type": true
},
{
"feature_name": "time_diff",
"feature_type": "expr_feature",
"variables": ["user:cur_time", "user:clk_time_seq"],
"expression": "cur_time-clk_time_seq",
"default_value": "0",
"separator": ";",
"value_dimension": 3,
"stub_type": true
},
{
"sequence_name": "click_seq",
"sequence_length": 3,
"sequence_delim": ";",
"sequence_pk": "user:click_item",
"features": [
{
"feature_name": "spherical_distance",
"feature_type": "raw_feature",
"expression": "feature:sphere_distance",
"default_value": "0.0"
},
{
"feature_name": "time_diff_seq",
"feature_type": "id_feature",
"expression": "feature:time_diff",
"default_value": "0.0",
"num_buckets": 10000
}
]
}
]
}
Expressions
Built-in functions (scalar)
Function name | Number of parameters | Description |
rnd | 0 | Generates a random number between 0 and 1. |
sin | 1 | Sine function. |
cos | 1 | Cosine function. |
tan | 1 | Tangent function. |
asin | 1 | Arcsine function. |
acos | 1 | Arccosine function. |
atan | 1 | Arctangent function. |
sinh | 1 | Hyperbolic sine function. |
cosh | 1 | Hyperbolic cosine function. |
tanh | 1 | Hyperbolic tangent function. |
asinh | 1 | Inverse hyperbolic sine function. |
acosh | 1 | Inverse hyperbolic cosine function. |
atanh | 1 | Inverse hyperbolic tangent function. |
log2 | 1 | Logarithm to base 2. |
log10 | 1 | Logarithm to base 10. |
log | 1 | Logarithm to base e (2.71828...). |
ln | 1 | Logarithm to base e (2.71828...). |
exp | 1 | e raised to the power of x. |
sqrt | 1 | Square root of a value. |
sign | 1 | Sign function: -1 if x<0; 1 if x>0. |
abs | 1 | Absolute value. |
rint | 1 | Rounds to the nearest integer. |
round | 1 | Rounds to the nearest integer. It always rounds half away from zero. |
roundp | 1 | Rounds to a custom precision. For example, roundp(3.14159,2)=3.14. |
mod | 2 | Modulo operation. |
floor | 1 | Rounds down to the nearest integer. |
ceil | 1 | Rounds up to the nearest integer. |
trunc | 1 | Truncates to an integer by removing the decimal part. |
sigmoid | 1 | Sigmoid function. |
sphere_dist | 4 | Spherical distance between two GPS points. Arguments: (lng1, lat1, lng2, lat2). |
haversine | 4 | Haversine distance between two GPS points. Arguments: (lng1, lat1, lng2, lat2). |
min | var. | Minimum of all arguments. |
max | var. | Maximum of all arguments. |
sum | var. | Sum of all arguments. |
avg | var. | Mean value of all arguments. |
Note: The preceding built-in functions support batch computing and broadcasting.
Built-in vector operation functions
Function name | Number of parameters | Description |
len | 1 | Length of a vector. |
l2_norm | 1 | L2 norm of a vector. |
squared_norm | 1 | Squared norm of a vector. |
dot | 2 | Dot product of two vectors. |
euclid_dist | 2 | Euclidean distance between two vectors. |
corr | 2 | Pearson correlation coefficient of two vectors. |
std_dev | 1 | Standard deviation of a vector (divides by n). |
pop_std_dev | 1 | Population standard deviation of a vector (divides by n-1). |
variance | 1 | Sample variance of a vector (divides by n). |
pop_variance | 1 | Population variance of a vector (divides by n-1). |
reduce_min | 1 | Minimum of the elements of a vector. |
reduce_max | 1 | Maximum of the elements of a vector. |
reduce_sum | 1 | Sum of the elements of a vector. |
reduce_mean | 1 | Mean of the elements of a vector. |
reduce_prod | 1 | Product of the elements of a vector. |
Note: If an expression contains the preceding built-in vector operation functions, other variables that are not vector function parameters must be scalars.
Built-in binary operators
Operator | Description | Priority |
= | assignment * | 0 |
|| | logical or | 1 |
&& | logical and | 2 |
| | bitwise or | 3 |
& | bitwise and | 4 |
<= | less or equal | 5 |
>= | greater or equal | 5 |
!= | not equal | 5 |
== | equal | 5 |
> | greater than | 5 |
< | less than | 5 |
+ | addition | 6 |
- | subtraction | 6 |
* | multiplication | 7 |
/ | division | 7 |
% | modulo | 7 |
^ | raise x to the power of y | 8 |
The assignment operator is special because it modifies one of its arguments and can only be applied to variables.
Built-in ternary operator
Supports if-else syntax.
It uses lazy evaluation to ensure only the necessary branch of the expression is evaluated.
Operator | Description | Remarks |
?: | if-then-else operator | C++ style syntax |
Built-in constants
Constant | Description | Value |
_pi | The one and only pi. | 3.141592653589793238462643 |
_e | Euler's number. | 2.718281828459045235360287 |
combo_feature
Function introduction
The combo_feature operator creates a combination (Cartesian product) of multiple fields or expressions. The id_feature operator can be considered a special case of combo_feature where only one field is used for the cross-product. Typically, the fields involved in the cross-product come from different tables, such as crossing a user feature with an item feature.
Configuration
{
"feature_type" : "combo_feature",
"feature_name" : "comb_age_item",
"expression" : ["user:age_class", "item:item_id"],
"need_prefix": true,
"separator": "\u001D",
"default_value": ""
}
Field name | Required | Description |
feature_name | Yes | The prefix for the output feature. |
expression | Yes | A list that specifies the source fields that the feature depends on. |
need_prefix | No | Specifies whether to add the feature_name as a prefix. Valid values:
|
value_type | No | The data type of the output feature. The default value is string. |
separator | No | The separator for multi-value input features. The default value is "\u001D". Only a single character is supported. |
default_value | No | The default value to use when the input feature is empty. |
value_dimension | No | The default value is 0. You can use this parameter to truncate the output in offline tasks. If the value is 1, the schema type of the output table is value_type. Otherwise, the schema type is array<value_type>. |
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
This operator supports binning. For more information, see Feature binning (discretization).
This operator supports multi-value inputs of the array type.
Example
The ^] symbol represents the multi-value separator. Note that this is a single character with the ASCII code "\x1D", not two characters.
Value of user:age_class | Value of item:item_id | Output feature |
123 | 45678 | comb_age_item_123_45678 |
abc, bcd | 45678 | [comb_age_item_abc_45678, comb_age_item_bcd_45678] |
abc, bcd | 12345^]45678 | [comb_age_item_abc_12345, comb_age_item_abc_45678, comb_age_item_bcd_12345, comb_age_item_bcd_45678] |
The number of output features is:
|F1| * |F2| * ... * |Fn|
Fn refers to the number of values in the nth dependent field.
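The cross-product and the output count can be illustrated with itertools.product. The helper name combo_feature and the use of "_" to join values are assumptions that mirror the example table, not the operator's actual code.

```python
from itertools import product


def combo_feature(feature_name, *fields, need_prefix=True):
    """Cross every value of every field (Cartesian product); illustrative only."""
    outputs = []
    for combination in product(*fields):
        joined = "_".join(combination)
        outputs.append(f"{feature_name}_{joined}" if need_prefix else joined)
    return outputs


age_class = ["abc", "bcd"]     # multi-value user:age_class
item_id = ["12345", "45678"]   # multi-value item:item_id
result = combo_feature("comb_age_item", age_class, item_id)
print(result)
print(len(result) == len(age_class) * len(item_id))  # True: |F1| * |F2| outputs
```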
lookup_feature
Function introduction
Similar to match_feature, the lookup_feature operator matches and retrieves a required result from a set of key-value pairs.
The lookup_feature operator depends on two fields: map and key.
The map is a dictionary or a multi-value string (MultiString) field where each string is in a format such as "k1:v1".
The key can be a field of any type. If you have multiple keys, an array-type input is recommended. When a feature is generated, the value of the key is retrieved, transformed into the key-value type of the map, and then matched against the key-value pairs in the map field to obtain the final feature.
Configuration
{
"feature_type": "lookup_feature",
"feature_name": "item_match_item",
"map": "item:item_attr",
"key": "item:item_value",
"need_discrete": true,
"need_key": true
}
Field name | Required | Description |
feature_name | Yes | The prefix for the output feature. |
map | Yes | The content of the dictionary, which is a set of key-value pairs. |
key | Yes | The key to look up in the dictionary. |
value_type | No | The data type of the output feature. The default value is string. |
separator | No | The separator for the multi-value |
default_value | No | The default value to use when the input feature is empty. |
need_prefix | No | Specifies whether to add the feature_name as a prefix. Valid values:
|
need_key | No | Specifies whether to add the key as a prefix. This parameter takes effect only when value_type is set to string. Valid values:
|
normalizer | No | The normalization method. This parameter has the same meaning as the parameter of the same name for raw_feature. |
combiner | No | Specifies how to merge multiple values that are retrieved by multiple keys. Valid values: sum (default), avg/mean, max, and min. |
need_discrete | No | true: Does not execute the combiner and directly outputs multiple values. The default value is false. |
value_dimension | No | Valid values:
|
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
This operator supports binning. For more information, see Feature binning (discretization).
The dictionary supports map type inputs, and the key supports array type inputs.
Example
For the preceding configuration, assume that for a specific document:
item_attr : "k1:v1^]k2:v2^]k3:v3"
The ^] symbol represents the multi-value separator. This is a single character with the ASCII code "\x1D", not two characters. You can enter this character in emacs by pressing C-q C-5, or in vim by pressing C-v C-5. Here, item_attr is a multi-value string.
When the map represents multiple key-value pairs, it is a multi-value string, not a single string.
item_value : "k2"
The result of the feature transform is item_match_item_k2_v2.
need_prefix == true
feature_name: fg
map: {"k1:123", "k2:234", "k3:3"}
key: {"k1"}
Result: feature={"fg_123"}
need_prefix == false
map: {"k1:123", "k2:234", "k3:3"}
key: {"k1"}
Result: feature={123}
Merging query results
If there are multiple keys, you can configure a combiner to combine the multiple retrieved values. Possible configurations are sum, mean, max, and min.
To use a combiner, set need_discrete to false. In this case, the value must be a numeric type or a string that can be converted to a numeric type.
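The lookup and combiner behavior described in this section can be sketched as follows. The names parse_map and lookup_feature, the default argument values, and the return of None when no key matches are assumptions made for illustration. default_value, normalizer, and binning are not modeled.

```python
SEP = "\x1d"  # the multi-value separator, shown as ^] in this section

COMBINERS = {
    "sum": sum,
    "max": max,
    "min": min,
    "mean": lambda values: sum(values) / len(values),
    "avg": lambda values: sum(values) / len(values),
}


def parse_map(text, separator=SEP):
    """Turn 'k1:v1^]k2:v2' into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(item.split(":", 1) for item in text.split(separator) if item)


def lookup_feature(feature_name, mapping, keys, need_prefix=True,
                   need_key=False, need_discrete=True, combiner="sum"):
    hits = [(key, mapping[key]) for key in keys if key in mapping]
    if need_discrete:
        # No combiner: emit one output per matched key.
        outputs = []
        for key, value in hits:
            token = f"{key}_{value}" if need_key else value
            outputs.append(f"{feature_name}_{token}" if need_prefix else token)
        return outputs
    # Combiner path: values must be numeric or convertible to numbers.
    values = [float(value) for _, value in hits]
    return COMBINERS[combiner](values) if values else None


attrs = parse_map("k1:v1\x1dk2:v2\x1dk3:v3")
print(lookup_feature("item_match_item", attrs, ["k2"], need_key=True))
# ['item_match_item_k2_v2']

numbers = parse_map("k1:123\x1dk2:234\x1dk3:3")
print(lookup_feature("fg", numbers, ["k1", "k2"], need_discrete=False))  # 357.0
```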
match_feature
Function introduction
The match_feature operator is typically used to define matching relationships between features. It implements a two-level map matching process.
Configuration
The configuration file uses the JSON format.
{
"feature_name": "user__l1_ctr_1",
"feature_type": "match_feature",
"category": "ALL",
"need_discrete": false,
"item": "item:category_level1",
"user": "user:l1_ctr_1",
"match_type": "hit"
}
user: A nested dictionary (nested_dict), which is a dict of dicts.
The user field uses a string to describe a two-level map.
The | character is the separator between items in the first-level map. The ^ character is the separator between the key and value in the first-level map.
The , character is the separator between items in the second-level map. The : character is the separator between the key and value in the second-level map.
category: The primary key, which is the key for looking up in the first-level map.
ALL is a wildcard character that indicates that all keys at this level can be matched.
item: The secondary key, which is the key for looking up in the second-level map.
ALL is a wildcard character that indicates that all keys at this level can be matched.
need_discrete
true: The model uses the feature name output by match_feature and ignores the feature value. The default value is false.
false: The model uses the feature value output by match_feature and ignores the feature name.
match_type
hit: Outputs the matched feature. The operator uses the value of category to search in the first-level map, and then uses the value of item to search in the second-level map to get a result. If you need only one level of matching instead of two, you can set the key in the first level of the map to ALL and also set the category parameter in the feature generation configuration to "ALL".
multihit: Allows the category and item fields to be set to the MATCH_WILDCARD option, which is "ALL", to match multiple values.
normalizer
Optional. The normalization method. This parameter has the same meaning as the parameter of the same name for raw_feature. This parameter takes effect only when need_discrete=false.
show_category
Specifies whether to add the category prefix to the query result. The default value is true when need_discrete=true and match_type=hit. Otherwise, the default value is false.
show_item
Specifies whether to add the item prefix to the query result. The default value is true when need_discrete=true and match_type=hit. Otherwise, the default value is false.
value_type
Optional. The data type of the output feature. The default value is string.
separator
Optional. The separator for the multi-value key field of the string type. The default value is "\u001D". Only a single character is supported.
default_value
Optional. The default value to use when the input feature is empty.
value_dimension
Optional. The default value is 0. You can use this parameter to truncate the output in offline tasks. If the value is 1, the schema type of the output table is value_type. Otherwise, the schema type is array<value_type>.
stub_type
Optional. The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model.
Example
Example of a user-side feature (nested dict)
For example, a string such as 50011740^50011740:0.2,36806676:0.3,122572685:0.5|50006842^16788:0.1 is converted to a two-level map as follows:
{
"50011740": {
"50011740": 0.2,
"36806676": 0.3,
"122572685": 0.5
},
"50006842": {
"16788": 0.1
}
}
hit
Example configuration for a hit match type.
{
"feature_name": "brand_hit",
"feature_type": "match_feature",
"category": "item:auction_root_category",
"need_discrete": true,
"item": "item:brand_id",
"user": "user:user_brand_tags_hit",
"match_type": "hit"
}
Assume the field values are as follows:
Field | Value |
user_brand_tags_hit | 50011740^107287172:0.2,36806676:0.3,122572685:0.5|50006842^16788816:0.1,10122:0.2,29889:0.3,30068:19 |
auction_root_category | 50006842 |
brand_id | 30068 |
If need_discrete=true, the operator first uses the auction_root_category value 50006842 to query user_brand_tags_hit, which returns 16788816:0.1,10122:0.2,29889:0.3,30068:19. Then, it uses 30068 to query this result, which returns the value 19. The final result is: brand_hit_50006842_30068_19.
If need_discrete=false, the result is: 19.0.
If you use only one level of matching, you need to change the value of category in the preceding configuration to ALL. Assume the field values are as follows:
Field | Value |
user_brand_tags_hit | ALL^16788816:40,10122:40,29889:20,30068:20 |
brand_id | 30068 |
If need_discrete=true, the result is: brand_hit_ALL_30068_20.
If need_discrete=false, the result is: 20.0.
In this case, you can also use the lookup_feature operator. The value format in user_brand_tags_hit needs to be changed to: "16788816:40^]10122:40^]29889:20^]30068:20". The '^]' symbol is the multi-value separator \u001d, which is a non-printable character.
The lookup_feature operator supports complex input types such as map and array, and therefore provides better performance.
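The two-level parsing and the hit examples in this section can be reproduced with a small Python sketch. The function names parse_nested and match_hit are hypothetical. Only match_type=hit is modeled, and returning None when nothing matches is an assumption (the operator would presumably apply default_value).

```python
def parse_nested(text):
    """Parse 'c1^i1:v1,i2:v2|c2^i3:v3' into a dict of dicts."""
    nested = {}
    for group in text.split("|"):            # first-level items
        key, _, items = group.partition("^")  # first-level key ^ value
        nested[key] = dict(pair.split(":", 1) for pair in items.split(",") if pair)
    return nested


def match_hit(feature_name, nested, category, item, need_discrete):
    inner = nested.get(category)
    if inner is None or item not in inner:
        return None  # assumption: no match yields no output in this sketch
    value = inner[item]
    if need_discrete:
        return f"{feature_name}_{category}_{item}_{value}"
    return float(value)


tags = parse_nested(
    "50011740^107287172:0.2,36806676:0.3,122572685:0.5"
    "|50006842^16788816:0.1,10122:0.2,29889:0.3,30068:19"
)
print(match_hit("brand_hit", tags, "50006842", "30068", True))   # brand_hit_50006842_30068_19
print(match_hit("brand_hit", tags, "50006842", "30068", False))  # 19.0

one_level = parse_nested("ALL^16788816:40,10122:40,29889:20,30068:20")
print(match_hit("brand_hit", one_level, "ALL", "30068", True))   # brand_hit_ALL_30068_20
```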
overlap_feature
Function introduction
Outputs features that contain string and word matching information. For example, in a search scenario, it calculates whether a search query is contained in a product title.
Method | Description |
query_common_ratio | Calculates the ratio of common terms between the query and the title to the total number of terms in the query. The value is in the range [0, 1]. |
title_common_ratio | Calculates the ratio of common terms between the query and the title to the total number of terms in the title. The value is in the range [0, 1]. |
is_contain | Calculates whether the entire query is contained in the title, preserving the order. Valid values:
|
is_equal | Calculates whether the query is identical to the title. Valid values:
|
index_of | Calculates the position of the first occurrence of the entire query in the title. Returns -1.0 if not found. |
proximity_min_cover | Calculates the proximity of query terms in the title. The value is in the range [0, length(title)]. A value of 0 indicates that there are terms that cannot be matched. |
proximity_min_dist | Calculates the proximity of query terms in the title (minimum pairwise distance). The value is in the range [0, length(title)+1]. A value of length(title)+1 indicates that no terms are matched. |
proximity_max_dist | Calculates the proximity of query terms in the title (maximum pairwise distance). The value is in the range [0, length(title)+1]. A value of length(title)+1 indicates that no terms are matched. |
proximity_avg_dist | Calculates the proximity of query terms in the title (average pairwise distance). The value is in the range [0, length(title)+1]. A value of length(title)+1 indicates that no terms are matched. |
The calculation method for Term Proximity Measures features is described in the paper "An Exploration of Proximity Measures in Information Retrieval".
Assume the term sequence of the title (document) is: t1,t2,t1,t3,t5,t4,t2,t3,t4
MinCover is defined as the length of the shortest document segment that covers each query term at least once in a document.
MinDist (Minimum pair distance): Calculates the minimum of all pairwise minimum distances. For example, if Q=t1,t2,t3, then MinDist=min(1,2,3)=1.
MaxDist (Maximum pair distance): The opposite of MinDist. It finds the maximum value. For example, if Q=t1,t2,t3, then MaxDist=max(1,2,3)=3.
AveDist (Average pair distance): Calculates the average of all pairwise minimum distances. For example, if Q=t1,t2,t3, then AveDist=(1+2+3)/3=2.
Note that all aggregate operators (MinDist, MaxDist, and AveDist) are defined based on the pairwise distances between matching search query terms. When a document matches only one search query term, MinDist, AveDist, and MaxDist are all defined as the length of the document.
Configuration
{
"feature_type" : "overlap_feature",
"feature_name" : "is_contain",
"query" : "user:attr1",
"title" : "item:attr2",
"method" : "is_contain",
"separator" : " ",
"normalizer" : ""
}
Field name | Required | Description |
feature_type | Yes | The type of the feature. |
feature_name | Yes | The prefix for the output feature. |
query | Yes | The table that the query depends on. attr1 is a multi-value string. |
title | Yes | The table that the title depends on. attr2 is a multi-value string. |
method | Yes | Valid values include query_common_ratio, title_common_ratio, is_contain, and is_equal. |
separator | - | The separator character in the input. If not specified, the default is |
normalizer | No | The normalization method. This parameter has the same meaning as the parameter of the same name for raw_feature. |
stub_type | No | The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
The output of an overlap feature is of the float type.
Example 1
The query is "high,high2,fiberglass,abc", and the title is "high,quality,fiberglass,tube,for,golf,bag".
method | feature |
query_common_ratio | 0.5 |
title_common_ratio | 0.28 |
is_contain | 0 |
is_equal | 0 |
Example 2
The method is index_of, and the title is "the cat sat on the mat".
query | feature |
the cat | 0 |
sat | 2 |
the mat | 4 |
cap | -1 |
gap | -1 |
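The numbers in Example 1 and Example 2 can be reproduced with the sketch below. It rests on assumptions: terms are compared as exact strings, common terms are counted as distinct terms, index_of looks for the query as a contiguous run of terms, and is_contain is derived from index_of. The function names simply mirror the method names.

```python
def query_common_ratio(query, title):
    # Distinct common terms divided by the number of query terms.
    return len(set(query) & set(title)) / len(query)


def title_common_ratio(query, title):
    # Distinct common terms divided by the number of title terms.
    return len(set(query) & set(title)) / len(title)


def index_of(query, title):
    """Term position of the first contiguous occurrence of query in title, else -1.0."""
    n = len(query)
    for start in range(len(title) - n + 1):
        if title[start:start + n] == query:
            return float(start)
    return -1.0


def is_contain(query, title):
    # Assumption: "contained, preserving the order" means a contiguous match.
    return 1.0 if index_of(query, title) >= 0 else 0.0


query = "high,high2,fiberglass,abc".split(",")
title = "high,quality,fiberglass,tube,for,golf,bag".split(",")
print(query_common_ratio(query, title))  # 0.5
print(title_common_ratio(query, title))  # 2/7, shown as 0.28 in Example 1

sentence = "the cat sat on the mat".split(" ")
print(index_of("the mat".split(" "), sentence))  # 4.0
```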
sequence_feature
Function introduction
User historical behavior is an important feature. Historical behavior is typically a sequence, such as a click sequence or a purchase sequence. The entities that make up the sequence can be the items themselves or the properties of the items.
Configuration
For example, to process a user's click sequence with a length of 50, you can extract the item_id, price, and ts features for each sequence. Here, ts = request_time - event_time. The configuration is as follows:
{
"sequence_name": "click_50_seq",
"sequence_length": 50,
"sequence_delim": ";",
"sequence_pk": "user:click_50_seq",
"features": [
{
"feature_name": "item_id",
"feature_type": "id_feature",
"value_type": "string",
"expression": "item:item_id"
},
{
"feature_name": "price",
"feature_type": "raw_feature",
"expression": "item:price"
},
{
"feature_name": "ts",
"feature_type": "raw_feature",
"expression": "user:ts"
},
{
"feature_name": "time_diff_seq",
"feature_type": "custom_feature",
"operator_name": "SeqExpr",
"operator_lib_file": "3rdparty/lib64/libseq_expr.so",
"expression": ["user:cur_time", "user:clk_time_seq"],
"formula": "cur_time - clk_time_seq",
"sequence_fields": ["clk_time_seq"],
"default_value": "0",
"value_type": "double",
"is_op_thread_safe": false,
"value_dimension": 1
}
]
}
sequence_name: The sequence name.
sequence_length: The maximum length of the sequence.
sequence_delim: The separator between elements in the sequence.
sequence_pk: The primary key of the sequence, such as user:click_50_seq. It stores the 50 most recent item IDs that the user clicked. The model inference service uses this field as a key to query side info.
The request parameters for the online inference service (EAS Processor) must include a feature whose key is the value of sequence_pk.
For example: click_50_seq: 5410233389955966;1832586 (The separator is the value configured for sequence_delim). In the preceding example, the value of the click_50_seq feature is 5410233389955966;1832586.
Item-side sub-features of the sequence do not need to be passed to the model inference service in the request parameters.
The model inference service uses this field as a key to query the item's side info.
For example, in this configuration, the item_id and price features in the sequence feature do not need to be passed to the inference service in the request. Instead, they are read from the Processor's item cache and concatenated by the feature generation (FG) SDK within the Processor to ensure the format is consistent with that used during offline training.
User-side sub-features of the sequence must be passed to the model inference service in the request parameters.
The feature name is ${sequence_name}__${input_name}, such as click_50_seq__ts.
${input_name} is generally configured using the expression configuration item, but the configuration may vary for different sub-feature types, and ${input_name} does not include an input domain prefix (item: or user:).
features: The side info of the sequence, which includes static property values of the item and behavior time information.
sequence_fields: Specifies the field names of the input sequence. The value is a string or a [string] array.
When the feature operator has only one input field, the content of that field must be a sequence. In this case, you do not need to configure sequence_fields.
When the feature operator has multiple input fields, if you do not configure sequence_fields, all item-side features (item:XXX) are assumed to be sequence input fields.
The FG input table used in offline tasks must contain columns corresponding to all sub-features.
When a column is a sequence (based on the sequence_fields rules), name it ${sequence_name}__${input_name}.
For example, in this configuration, the offline table requires four columns: click_50_seq__item_id, click_50_seq__price, click_50_seq__ts, and click_50_seq__clk_time_seq.
We recommend that the column type in the offline table be an array for better performance. A string type with sequence_delim as the element separator is also supported.
When a column is not a sequence, name it ${input_name} without a prefix.
For example, in this configuration, the offline table requires one non-sequence column: ${cur_time}.
You can configure input_alias globally to set a shorter alias for a long column name (see the example below).
This operator supports binning. For more information, see Feature binning (discretization). When binning is configured, the output element type is int64, and the shape is determined by the value_dimension configuration below.
value_dimension (can be abbreviated as value_dim): The dimension of each element in the sequence. For sequence_raw_feature, if this is set to 1, the output type is array<float>. If set to other values, the output type is array<array<float>>. For sequence_id_feature, if this is set to 1, the output type is array<string>. If set to other values, the output type is array<array<string>>. The default value is 0.
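The column-naming rules for the offline FG input table can be sketched as a small helper. The function offline_columns is hypothetical. It only handles sub-features whose inputs are listed under expression, and it applies the sequence_fields rules as stated above. Other sub-feature types name their inputs differently and are not covered.

```python
def offline_columns(sequence_name, sub_features):
    """Return (sequence_columns, non_sequence_columns) for a sequence feature config."""
    seq_cols, plain_cols = [], []
    for feature in sub_features:
        expr = feature["expression"]
        inputs = [expr] if isinstance(expr, str) else list(expr)
        seq_fields = feature.get("sequence_fields")
        if isinstance(seq_fields, str):
            seq_fields = [seq_fields]
        for field in inputs:
            # Strip the input domain prefix (item: or user:).
            domain, _, name = field.rpartition(":")
            if len(inputs) == 1:
                is_sequence = True                # a single input must be a sequence
            elif seq_fields is not None:
                is_sequence = name in seq_fields  # explicitly listed sequence fields
            else:
                is_sequence = domain == "item"    # default: item-side inputs are sequences
            if is_sequence:
                seq_cols.append(f"{sequence_name}__{name}")
            else:
                plain_cols.append(name)
    return seq_cols, plain_cols


features = [
    {"feature_name": "item_id", "expression": "item:item_id"},
    {"feature_name": "price", "expression": "item:price"},
    {"feature_name": "ts", "expression": "user:ts"},
    {"feature_name": "time_diff_seq",
     "expression": ["user:cur_time", "user:clk_time_seq"],
     "sequence_fields": ["clk_time_seq"]},
]
seq_cols, plain_cols = offline_columns("click_50_seq", features)
print(seq_cols)
print(plain_cols)
```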
Any feature can be configured as a sub-feature of a sequence feature, as shown in the following example:
{
"features": [
{
"sequence_name": "common_seq",
"sequence_length": 50,
"sequence_delim": ";",
"sequence_pk": "user:click_50_seq",
"features": [
{
"feature_name": "item_id",
"feature_type": "id_feature",
"value_type": "String",
"expression": "item:item_id",
"value_dimension": 1
},
{
"feature_name": "price",
"feature_type": "raw_feature",
"expression": "item:price"
},
{
"feature_name": "ts",
"feature_type": "raw_feature",
"expression": "user:ts"
},
{
"feature_name": "expr_feat",
"feature_type": "expr_feature",
"expression": "a > b",
"variables": ["item:a", "item:b"],
"sequence_fields": "a",
"default_value": "0",
"value_dimension": 1
},
{
"feature_name": "lookup_feat",
"feature_type": "lookup_feature",
"map": "user:dict",
"key": "item:prop",
"separator": ",",
"default_value": "0",
"value_type": "float",
"combiner": "sum",
"boundaries": [0.0, 0.15, 0.5]
},
{
"feature_name": "match_feat",
"feature_type": "match_feature",
"user": "user:nested_dict",
"category": "item:pkey",
"item": "item:skey",
"separator": "\u001D",
"default_value": "0",
"matchType": "hit",
"value_type": "float",
"value_dimension": 1
},
{
"feature_name": "bm25_score",
"feature_type": "bm25_feature",
"separator": " ",
"default_value": "0",
"query": "user:query",
"document": "item:document",
"sequence_fields": "query",
"document_number": 100,
"avg_doc_length": 6,
"term_doc_freq_dict": {
"this": 30,
"example": 10,
"document": 15
}
},
{
"feature_name": "overlap_feat",
"feature_type": "overlap_feature",
"query": "user:query2",
"title": "item:title2",
"sequence_fields": "query2",
"method": "index_of",
"separator": " ",
"default_value": "-1"
},
{
"feature_type": "kv_dot_product",
"feature_name": "query_doc_sim",
"query": "user:query3",
"document": "item:title",
"sequence_fields": "query3",
"separator": "|",
"default_value": "0"
},
{
"feature_name": "seg_feat",
"feature_type": "tokenize_feature",
"expression": "input_a",
"default_value": "0",
"output_type": "word",
"tokenizer_type": "sentencepiece",
"vocab_file": "spmodel.model"
},
{
"feature_name": "txt_norm",
"feature_type": "text_normalizer",
"expression": "input",
"default_value": "<oov>",
"parameter": 28
},
{
"feature_name": "seq_combo_feat",
"feature_type": "combo_feature",
"expression": ["user:tags", "item:cat"],
"sequence_fields": ["tags"],
"separator": "_",
"default_value": "0",
"value_dimension": 1
},
{
"feature_name": "norm_str",
"feature_type": "str_replace_feature",
"expression": ["user:profile"],
"default_value": "",
"replace_file": "synonyms.txt",
"replacements": {
"|": "",
"aa": "x",
"a": "X"
},
"value_dimension": 1
},
{
"feature_name": "query_tokens",
"feature_type": "regex_replace_feature",
"expression": ["user:query_tokens"],
"default_value": "",
"value_type": "string",
"regex_pattern": [ "\\|", "#", "\\(.*\\)" ],
"replacement": "",
"value_dimension": 1
},
{
"feature_name": "slice",
"feature_type": "slice_feature",
"value_type": "int32",
"expression": ["context:array"],
"slice": "0:3",
"value_dimension": 3,
"num_buckets": 100000
},
{
"feature_name": "mask_feature",
"feature_type": "bool_mask_feature",
"value_type": "float",
"expression": [
"user:click_items",
"item:is_valid"
]
},
{
"feature_name": "time_diff_seq",
"feature_type": "custom_feature",
"operator_name": "SeqExpr",
"operator_lib_file": "3rdparty/lib64/libseq_expr.so",
"expression": ["user:cur_time", "user:clk_time_seq"],
"formula": "cur_time - clk_time_seq",
"sequence_fields": ["clk_time_seq"],
"default_value": "0",
"value_type": "double",
"is_op_thread_safe": false,
"value_dimension": 1
}
]
}
],
"input_alias": {
"common_seq__clk_time_seq": "clk_time_seq"
}
}
Note: The input_alias parameter configures aliases for input fields. The format is "origin_field": "alias_field". You can use a shorter name to replace the original input field name.
Tiled configuration
In most cases, you can create a sequence version of a non-sequence feature by adding the sequence_ prefix to its feature_type. Note that you must typically configure a default_value for sequence versions of features.
Examples:
sequence_id_feature
sequence_raw_feature
Special case 1: Some feature transform types have both sequence and non-sequence versions.
In this case, you can activate the corresponding version by setting is_sequence: true/false.
In this case, the feature_type configuration item does not need the sequence_ prefix.
Examples:
Special case 2: Some feature transform types only have a sequence version, not a non-sequence version.
In this case, the feature_type configuration item does not need the sequence_ prefix.
Examples:
For these two special cases, you can add the following optional configurations:
sequence_length: The maximum length of the sequence. Any excess will be truncated. The default value is -1, which means no truncation.
sequence_delim: The separator between elements in the sequence. The default value is ;.
Configuration example:
{
"feature_name": "clk_seq__item_id",
"feature_type": "sequence_id_feature",
"sequence_name": "clk_seq",
"sequence_length": 50,
"sequence_delim": ";",
"expression": "item:clk_item_seq",
"separator": "\u001D",
"default_value": ""
},
{
"feature_name": "clk_seq__item_price",
"feature_type": "sequence_raw_feature",
"sequence_name": "clk_seq",
"sequence_length": 50,
"sequence_delim": ";",
"expression": "item:clk_item_prices",
"separator": "\u001D",
"default_value": "0"
},
{
"feature_name": "test",
"feature_type": "sequence_lookup_feature",
"map": "user:prefer_tags",
"key": "item:tags",
"sequence_length": 2,
"separator": ",",
"default_value": "-1024",
"value_type": "int32",
"normalizer": "method=expression,expr=x+1",
"combiner": "sum",
"default_bucketize_value": 50,
"num_buckets": 10000
},
{
"feature_name": "test",
"feature_type": "sequence_combo_feature",
"separator": "_",
"default_value": "0",
"expression": ["user:f1", "item:f2"],
"hash_bucket_size": 10000
}
In the preceding example, the input fields clk_item_seq and clk_item_prices must be sequences. They can be of the array type or the string type, with element values separated by the character configured in sequence_delim.
With this configuration method, the online service (Processor) does not query sideinfo. The user must provide the complete input.
The input field names for tiled sequence features remain the same as configured and are not prefixed with ${sequence_name}__.
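To make the roles of sequence_delim, separator, and sequence_length concrete, the following Python sketch splits a string-typed sequence input into a list of multi-value elements. The function name parse_sequence is hypothetical, and the sketch assumes that truncation keeps the first sequence_length elements, which the text above does not specify.

```python
def parse_sequence(raw, sequence_delim=";", separator="\u001D", sequence_length=-1):
    """Split a string-typed sequence column into a list of multi-value elements."""
    if not raw:
        return []
    # Elements of the sequence are separated by sequence_delim.
    elements = raw.split(sequence_delim)
    if sequence_length >= 0:
        # Assumption: excess elements are dropped from the end of the sequence.
        elements = elements[:sequence_length]
    # Each element may itself hold several values separated by `separator`.
    return [element.split(separator) for element in elements]


print(parse_sequence("1\u001D2;3;4", sequence_length=2))
```

With sequence_length left at -1, no truncation is applied, which matches the documented default.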
Online feature generation
There are two ways to obtain behavior side info. One way is to retrieve side info from the item cache of the EasyRec Processor. It uses the field configured in sequence_pk as the primary key to find item property information from the item cache. The other way is for the user to provide the corresponding field values in the request. For example, the "ts" field in the preceding configuration means (request_time - event_time). Because this value changes with the request time, it must be obtained from the request:
user_features {
key: "click_50_seq"
value {
string_feature: "9008721;34926279;22487529;73379;840804;911247;31999202;7421440;4911004;40866551"
}
}
user_features {
key: "click__ts"
value {
string_feature: "23;113;401363;401369;401375;401405;486678;486803;486922;486969"
}
}
tokenize_feature
Function introduction
The tokenize_feature operator tokenizes an input string and returns the tokenized string or the token IDs. It supports tokenizer.json files from tokenizers-cpp.
Tokenization dictionary format:
1. https://github.com/huggingface/tokenizers
2. https://github.com/mlc-ai/tokenizers-cpp
Configuration
{
"feature_name": "title_token",
"feature_type": "tokenize_feature",
"expression": "item:title",
"default_value": "",
"vocab_file": "tokenizer.json",
"tokenizer_type": "sentencepiece",
"output_type": "word_id",
"output_delim": ","
}
Field name | Required | Description |
feature_name | Yes | The feature name. |
expression | Yes | The source field that the feature depends on. The source must be user, item, or context. |
vocab_file | Yes | The path to the vocabulary file. |
default_value | - | The default value for the input string. |
tokenizer_type | - | Optional. The tokenizer type. If you set this to `sentencepiece` or leave it unset, the JSON content of the vocab_file determines which Hugging Face tokenizer to use. |
output_type | - |
|
output_delim | - | The separator for the output |
stub_type | No | Optional. The default value is false. If you set this parameter to true, the configured feature transform is used only as an intermediate result in the pipeline and is not output to the model. |
Example
If output_type=word_id, the operator takes a string as input and outputs a comma-separated string of token IDs.
Type | item:title | Output feature |
string | It is good today! | 1147,310,1175,3063,2 |
Configuration file examples
File name | Tokenizer type | Download link |
bert-base-chinese-vocab.json | WordPiece | |
tokenizer.json | BPE | |
spiece.model | sentencepiece | |
text_normalizer
Function introduction
Performs text normalization. Functions include case conversion, traditional to simplified Chinese conversion, full-width to half-width character conversion, special character filtering, GBK/UTF8 encoding conversion, and Chinese character splitting.
Configuration
{
"feature_name": "txt_norm",
"feature_type": "text_normalizer",
"expression": "item:title",
"stop_char_file": "stop_char.txt",
"max_length": 256,
"parameter": 0,
"remove_space": false,
"is_gbk_input": false,
"is_gbk_output": false
}
Field name | Required | Description |
feature_name | Yes | The feature name. |
expression | Yes | The source field that the feature depends on. The source must be user, item, or context. |
stop_char_file | No | The path to a file that stores the special characters to be deleted. This file must use GBK encoding. If not configured, the system's built-in list of special characters is used. |
max_length | - | If the input text length exceeds this value, text normalization is not performed, and the input value is output as is. |
remove_space | - | Specifies whether to remove spaces. |
is_gbk_input | No | Specifies whether the input is GBK encoded. false indicates that the input is UTF-8 encoded. |
is_gbk_output | No | Specifies whether to use GBK encoding for the output. false indicates that the output uses UTF-8 encoding. |
parameter | - | Text normalization options. |
default_value | No | The default value to use when the input feature is empty. |
Note:
The stop_char_file file must use GBK encoding.
Each line in the stop_char_file file can contain only one character. Otherwise, filtering will fail.
Text normalization options
For the parameter, select one or more of the following numeric values and add them together.
For example, to convert uppercase to lowercase, full-width to half-width, traditional to simplified Chinese, and filter special characters, set parameter=4+8+16+32=60.
The default value of the parameter is 60.
#define __NORMALIZED_LOWER2UPPER__ 2 /*Convert lowercase to uppercase*/
#define __NORMALIZED_UPPER2LOWER__ 4 /*Convert uppercase to lowercase*/
#define __NORMALIZED_SBC2DBC__ 8 /*Convert full-width to half-width*/
#define __NORMALIZED_BIG52GBK__ 16 /*Convert traditional to simplified Chinese*/
#define __NORMALIZED_FILTER__ 32 /*Filter special characters*/
#define __NORMALIZED_SPLITCHARS__ 512 /*Split Chinese characters into single characters (space-separated)*/
Example
{
"feature_name": "txt_norm",
"feature_type": "text_normalizer",
"expression": "input_a",
"parameter": 28
}
inputs=["Regular expression code generator", "HTML filtering tool", "Regular expression syntax cheat sheet", "The Cat/"]
outputs=["regular expression code generator", "html filtering tool", "regular expression syntax cheat sheet", "the cat/"]
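Because the parameter value is a sum of option values, it can be composed and decoded with bitwise operations. The Python sketch below mirrors the constants listed above; the Python names and the helper decode_parameter are hypothetical and not part of the operator.

```python
# Option values copied from the #define list above (Python names are illustrative).
FLAGS = {
    "LOWER2UPPER": 2,
    "UPPER2LOWER": 4,
    "SBC2DBC": 8,
    "BIG52GBK": 16,
    "FILTER": 32,
    "SPLITCHARS": 512,
}


def decode_parameter(parameter):
    """Return the names of the options enabled in a parameter value."""
    return [name for name, bit in FLAGS.items() if parameter & bit]


default_parameter = (
    FLAGS["UPPER2LOWER"] | FLAGS["SBC2DBC"] | FLAGS["BIG52GBK"] | FLAGS["FILTER"]
)
print(default_parameter)      # 60
print(decode_parameter(28))   # ['UPPER2LOWER', 'SBC2DBC', 'BIG52GBK']
```

Decoding 28 shows why the example above lowercases its input without filtering special characters: the value 32 is not included.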
bm25_feature
Function introduction
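The BM25 (Best Matching) algorithm is a mainstream text matching algorithm in the field of information retrieval, typically used for search relevance scoring. It parses a query $Q$ into morphemes $q_i$, computes the relevance score between each morpheme $q_i$ and a document $d$, and takes the weighted sum of these scores as the relevance score between $Q$ and $d$.
For Chinese text, you can tokenize the query as morpheme analysis, treating each word (term) as a morpheme $q_i$.
The general formula for the BM25 algorithm is as follows:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
Here, $Q$ is the query, $q_i$ is the $i$-th term parsed from $Q$, $d$ is a document, $W_i$ is the weight of $q_i$, and $R(q_i, d)$ is the relevance score between $q_i$ and $d$.
Term importance
There are multiple methods to determine the weight of a term's relevance to a document. A commonly used method is Inverse Document Frequency (IDF). The IDF formula is as follows:
$$\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
Here, $N$ is the total number of documents in the collection, and $n(q_i)$ is the number of documents that contain $q_i$.
The definition of IDF shows that for a given document collection, the more documents that contain $q_i$, the lower the weight of $q_i$.
Term relevance
The relevance score between the term $q_i$ and the document $d$ is commonly computed as follows:
$$R(q_i, d) = \frac{f_i \cdot (k_1 + 1)}{f_i + K}, \qquad K = k_1 \cdot \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
In this formula, $k_1$ and $b$ are tuning parameters, $f_i$ is the frequency of $q_i$ in $d$, $dl$ is the length of $d$, and $avgdl$ is the average length of all documents.
The definition of $K$ shows that $b$ controls how strongly the document length affects the score. The larger $b$ is, the greater the effect. The longer a document is relative to the average length, the larger $K$ becomes and the lower the relevance score.
In summary, the relevance score formula for the BM25 algorithm is as follows:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f_i \cdot (k_1 + 1)}{f_i + k_1 \cdot \left(1 - b + b \cdot \frac{dl}{avgdl}\right)}$$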
The BM25 formula shows that different search relevance score calculation methods can be derived using various methods for tokenization, term weighting, and determining the relevance between a term and a document. This provides significant flexibility for algorithm design.
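As an illustration of how term weighting and term relevance combine, the following Python sketch computes a BM25 score using the widely used Robertson/Sparck Jones IDF and the k1/b length normalization. It is not the operator's implementation: the function name bm25_score and its argument names are hypothetical, the IDF variant used by the operator may differ, and repeated query terms are counted once.

```python
import math


def bm25_score(query_terms, doc_terms, term_doc_freq, doc_count,
               avg_doc_length, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query (standard BM25)."""
    doc_len = len(doc_terms)
    # Length normalization factor: grows with the relative document length.
    K = k1 * (1.0 - b + b * doc_len / avg_doc_length)
    score = 0.0
    for term in set(query_terms):
        f = doc_terms.count(term)          # term frequency in the document
        if f == 0:
            continue
        n = term_doc_freq.get(term, 0)     # number of documents containing the term
        idf = math.log((doc_count - n + 0.5) / (n + 0.5))
        score += idf * f * (k1 + 1.0) / (f + K)
    return score


freq = {"this": 30, "example": 10, "document": 15}
doc = ["an", "example", "of", "six", "short", "words"]
print(bm25_score(["example"], doc, freq, doc_count=100, avg_doc_length=6))
```

When the document length equals the average length and the term occurs once, the term's contribution reduces to its IDF, which is a convenient sanity check for any implementation.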
Configuration method
{
"feature_type": "bm25_feature",
"feature_name": "query_doc_relevance",
"query": "user:query",
"document": "item:title",
"term_doc_freq_file": "term_doc_freq.txt",
"avg_doc_length": 100.0,
"k1": 1.2,
"b": 0.75,
"separator": "\u001D",
"default_value": ""
}
Field name | Required | Description |
feature_name | Yes | The name of the final output feature. |
query | Yes | The source of the query field that the feature uses. |
document | Yes | The source of the document field that the feature uses. |
term_doc_freq_file | No | The path to a file that contains terms and the number of documents that contain each term. Each line contains one record, with the term and its document count separated by a whitespace character. |
term_doc_freq_dict | No | The content is the same as |
k1 | No | A parameter for the BM25 algorithm. The default value is 1.2. Typical values include 1.2 and 2.0. |
b | No | A parameter for the BM25 algorithm. The default value is 0.75. |
separator | No | The separator for a multi-valued input feature. The default is |
normalizer | No | The normalization method. For more information, see the raw_feature configuration. |
default_value | No | The default value to use if the input feature is empty. |
stub_type | No | The default value is false. If set to true, the configured feature transformation is used only as an intermediate result in the pipeline and is not included in the final output to the model. |
Use either term_doc_freq_file or term_doc_freq_dict. The former has priority, and the system uses it if both are configured.
To use this feature for an online service, place the term_doc_freq_file file and fg.json in the same folder.
kv_dot_product
Function introduction
Calculates the dot product of two key-value index vectors or the size of the intersection of two sets.
Configuration method
{
"feature_type": "kv_dot_product",
"feature_name": "query_doc_sim",
"query": "user:query",
"document": "item:title",
"separator": "|",
"default_value": "0"
}
Field name | Required | Description |
feature_name | Yes | The name of the output feature. |
query | Yes | The source of the query field that the feature depends on. |
document | Yes | The source of the document field that the feature depends on. |
separator | No | Specifies the separator for multi-value input features. The default value is "\u001D". The separator must be a single character. |
kv_delimiter | No | Specifies the separator between key-value pairs in the input feature. The default value is ":". The separator must be a single character. |
normalizer | No | The normalization method. For more information, see the configuration of raw_feature. |
default_value | No | The default value to use if the input feature is empty. The default value is 0. |
stub_type | No | The default value is `false`. If this parameter is set to `true`, the configured feature transformation is used only as an intermediate result in the pipeline and is not output to the model. |
This feature supports complex input types, such as array and map. Use complex types when possible.
If an input does not have a value part, the default value is 1.0. Use this behavior to calculate the size of the intersection of two sets.
If default_value is not configured, the default value is set to 0.
Examples
query | document | output |
"a:0.5|b:0.5" | "d:0.5|b:0.5" | 0.25 |
["a:0.5", "b:0.5"] | ["d:0.5", "b:0.5"] | 0.25 |
{"a":0.5, "b":0.5} | {"d":0.5, "b":0.5} | 0.25 |
["a:0.5", "b:0.5"] | {"d":0.5, "b":0.5} | 0.25 |
["a", "b", "c"] | ["a", "b", "d"] | 2.0 |
["a", "b", "c"] | "a|b|d" | 2.0 |
["a", "b", "c"] | {"a":0.5, "b":0.5} | 1.0 |
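The rows above can be reproduced with a short Python sketch. It is only an approximation of the behavior described in this section: the helper names parse_kv and kv_dot_product are hypothetical, and the default separator is set to "|" to match the examples rather than the operator's documented default.

```python
def parse_kv(value, separator="|", kv_delimiter=":"):
    """Convert a string, list, or dict input into a {key: weight} dict."""
    if isinstance(value, dict):
        return {str(k): float(v) for k, v in value.items()}
    items = value.split(separator) if isinstance(value, str) else list(value)
    result = {}
    for item in items:
        if not item:
            continue
        key, found, weight = str(item).partition(kv_delimiter)
        # A key without a value part gets the weight 1.0.
        result[key] = float(weight) if found else 1.0
    return result


def kv_dot_product(query, document, separator="|", kv_delimiter=":"):
    """Sum the products of weights for keys present on both sides."""
    q = parse_kv(query, separator, kv_delimiter)
    d = parse_kv(document, separator, kv_delimiter)
    return sum(weight * d[key] for key, weight in q.items() if key in d)


print(kv_dot_product("a:0.5|b:0.5", "d:0.5|b:0.5"))      # 0.25
print(kv_dot_product(["a", "b", "c"], ["a", "b", "d"]))  # 2.0
```

Because missing weights default to 1.0, two plain key lists yield the size of their intersection.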
str_replace_feature
Function introduction
The str_replace_feature is a string replacement feature. It replaces all matched substrings with specified substrings.
Overlapping matches are replaced greedily.
Configuration method
{
"feature_name": "norm_str",
"feature_type": "str_replace_feature",
"expression": ["user:query"],
"default_value": "",
"replacements": {
"brown": "box",
"dogs": "jugs",
"fox": "with",
"jumped": "five",
"over": "dozen",
"quick": "my",
"the": "pack",
"the lazy": "liquor",
"|": "",
"aa": "x",
"a": "X"
},
"value_dimension": 1
}
Field name | Description |
feature_name | Required. The name of the output feature. |
expression | Required. The expression describes the source field that the feature depends on. |
default_value | Optional. The default value to use if the input feature is empty. |
replacements | Optional. This parameter becomes required if |
replace_file | Optional. This parameter becomes required if |
is_sequence | Optional. Marks whether the feature is a sequence feature. The default value is |
sequence_length | Optional. The maximum length of the sequence. The sequence is truncated if it exceeds this length. |
sequence_delim | Optional. The separator between sequence elements. Set this only when the input is a string. |
separator | Optional. This parameter is valid only when |
value_dimension | Optional. The default value is 0. This can be used in offline tasks to truncate the output. |
stub_type | Optional. The default value is false. If set to true, the configured feature transformation is used only as an intermediate result in the pipeline and is not output to the model. |
You can configure both replace_file and replacements. The replacement dictionaries are merged. The replacements dictionary has a higher priority.
Binning operations are supported. For configuration methods, see the Feature Binning (Discretization) operation documentation:
hash_bucket_size: Hashes the feature transformation result and performs a modulo operation.
vocab_list: Bins the input based on a vocabulary list and maps the input to an index in the vocabulary.
vocab_dict: The binning result is the value from the vocab_dict dictionary that corresponds to the feature value.
vocab_file: Reads the vocab_list or vocab_dict from a file.
Multi-value inputs of the array type are supported.
Example
The execution results of the preceding configuration are as follows:
Value of user:query | Output feature |
the quick brown fox jumped over the lazy dogs | pack my box with five dozen liquor jugs |
aaa | xX |
Feature|Generation|Tool|Very useful | FeXtureGenerXtionToolVery useful |
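One way to read "replaced greedily" is that, at each position, the longest matching key wins. The Python sketch below implements that interpretation with a single regular expression and reproduces the first two rows of the table; it is an assumption about the matching rule, not the operator's actual code, and the name str_replace is hypothetical.

```python
import re

# Replacement dictionary copied from the configuration above.
REPLACEMENTS = {
    "brown": "box", "dogs": "jugs", "fox": "with", "jumped": "five",
    "over": "dozen", "quick": "my", "the": "pack", "the lazy": "liquor",
    "|": "", "aa": "x", "a": "X",
}


def str_replace(text, replacements):
    """Replace every key with its value, preferring the longest key at each position."""
    if not replacements:
        return text
    # Longer keys come first so that "the lazy" beats "the" and "aa" beats "a".
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


print(str_replace("the quick brown fox jumped over the lazy dogs", REPLACEMENTS))
print(str_replace("aaa", REPLACEMENTS))
```

A single left-to-right pass also guarantees that replaced text is never matched again, so the result does not depend on the order of the dictionary entries.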
regex_replace_feature
Function introduction
The regex_replace_feature feature replaces matched substrings with a specified substring.
You can configure multiple patterns. Any substring that matches one of the patterns is replaced.
Configuration method
{
"feature_name": "query",
"feature_type": "regex_replace_feature",
"expression": ["user:query"],
"regex_pattern": "\\|",
"replacement": " ",
"default_value": ""
}
Field name | Description |
feature_name | Required. The name of the output feature. |
expression | Required. The expression describes the source field that the feature depends on. |
default_value | Optional. The default value to use if the input feature is empty. |
regex_pattern | Required. The regular expression. Matched text segments are replaced. |
replacement | Optional. The replacement text. If this parameter is empty, the matched text segments are deleted. |
replace_all | Optional. Specifies whether to perform a global replacement. The default value is |
icase | Optional. Specifies whether the regular expression matching is case-sensitive. The default value is |
is_sequence | Optional. Marks whether the feature is a sequence feature. The default value is |
sequence_length | Optional. The maximum length of the sequence. The sequence is truncated if it exceeds this value. |
sequence_delim | Optional. The separator between sequence elements. Set this parameter only when the input is a string. |
separator | Optional. This parameter is valid only when |
value_dimension | Optional. The default value is 0. You can use this parameter in an offline task to truncate the output. |
stub_type | Optional. The default value is false. If this parameter is set to true, the configured feature transformation is used only as an intermediate result in the pipeline and is not included in the final output to the model. |
This feature supports binning operations. For more information about the configuration, see the Feature binning (discretization) document:
hash_bucket_size: Performs a hash and a modulo operation on the feature transformation result.
vocab_list: Bins data based on a vocabulary list and maps the input to an index in the list.
vocab_dict: The binning result is the value from the vocab_dict dictionary that corresponds to the feature value.
vocab_file: Reads a vocab_list or vocab_dict from a file.
This feature supports multi-valued input of the array type.
Examples
Value of user:query | Output feature |
China|People|Republic | China People Republic |
Feature|Generation|Tool|Is great | Feature Generation Tool Is great |
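The same transformation can be approximated in Python with re.sub. This sketch uses Python's regular expression syntax, which may differ in details from the engine used by the operator; the function name regex_replace is hypothetical, and every match is replaced.

```python
import re


def regex_replace(text, regex_pattern, replacement=""):
    """Replace every substring that matches any of the given patterns."""
    # Accept either a single pattern or a list of patterns.
    patterns = [regex_pattern] if isinstance(regex_pattern, str) else list(regex_pattern)
    combined = "|".join(f"(?:{pattern})" for pattern in patterns)
    return re.sub(combined, replacement, text)


print(regex_replace("Feature|Generation|Tool|Is great", r"\|", " "))
print(regex_replace("a|b#c(d)", [r"\|", "#", r"\(.*\)"], ""))
```

Wrapping each pattern in a non-capturing group keeps alternation inside one pattern from leaking into its neighbors when several patterns are combined.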
bool_mask_feature
Function introduction
Filters elements using a Boolean value. This is similar to tf.boolean_mask(tensor, mask).
This is a sequence feature.
Configuration
{
"feature_name": "mask_feature",
"feature_type": "bool_mask_feature",
"value_type": "float",
"expression": [
"user:click_items",
"item:is_valid"
],
"sequence_delim": ","
}
Field name | Meaning |
feature_name | Required. The `feature_name` is the prefix for the final output feature. |
expression | Required. A list. The `expression` describes the source fields for the feature. The second field is the mask. |
default_value | Optional. The default value to use when the input feature is empty. If this parameter is not set, the default value is |
value_type | Required. The data type of the output feature. |
sequence_length | Optional. The maximum length of the sequence. The sequence is truncated if it exceeds this length. |
sequence_delim | Optional. The separator between sequence elements. This parameter is required only if the input is a string. |
separator | Optional. The multi-value separator for the input. The default value is "\u001D". The separator must be a single character. |
value_dimension | Optional. The default value is 0. This parameter is used in an offline task to truncate the output. |
normalizer | Optional. The normalization method. This parameter applies only to numerical features. For more information, see RawFeature. |
stub_type | Optional. The default value is false. If set to true, the feature transformation is used only as an intermediate result in the pipeline. It is not included in the final output to the model. |
Supports binning operations. For more information about configuration, see Feature binning (discretization).
Supports multi-value inputs represented by array types and nested array types.
Examples
Input | Mask | Output |
"123,456,90,80" | "true,false,true,false" | ["123", "90"] |
"123,456,90,80" | [1, 0, 1, 0] | ["123", "90"] |
[1, 2, 3, 4] | [1, 0, 1, 0] | [1, 3] |
[1, 2, 3, 4] | "true,false,true,false" | [1, 3] |
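The filtering shown in the table can be sketched in a few lines of Python. The helper name bool_mask is hypothetical, and the set of strings treated as true ("true" and "1") is an assumption, since the accepted mask spellings are not listed here.

```python
def bool_mask(values, mask, sequence_delim=","):
    """Keep the elements of `values` whose corresponding mask entry is true."""
    def to_list(x):
        return x.split(sequence_delim) if isinstance(x, str) else list(x)

    def is_true(flag):
        if isinstance(flag, str):
            # Assumption: "true" and "1" count as true, anything else as false.
            return flag.strip().lower() in ("true", "1")
        return bool(flag)

    return [v for v, flag in zip(to_list(values), to_list(mask)) if is_true(flag)]


print(bool_mask("123,456,90,80", "true,false,true,false"))  # ['123', '90']
print(bool_mask([1, 2, 3, 4], [1, 0, 1, 0]))                # [1, 3]
```

As in the table, string inputs produce string elements and array inputs keep their element type.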
Use with expression features
{
"features": [
{
"feature_name": "mask",
"feature_type": "expr_feature",
"expression": "price>100",
"variables": ["item:price"],
"value_dimension": 3
},
{
"feature_name": "filter_list",
"feature_type": "bool_mask_feature",
"expression": [
"user:click_items",
"feature:mask"
],
"num_buckets": 10000
}
]
}
slice_feature
Function introduction
Slices an input array using Python-like syntax or gets an element from a specific index.
This is a type of sequence feature.
Configuration
{
"feature_name": "test_feature",
"feature_type": "slice_feature",
"value_type": "float",
"expression": [
"user:click_items"
],
"slice": "2:4"
}
Field name | Description |
feature_name | Required. The feature_name is used as the prefix for the final output feature. |
expression | Required. A list. The expression describes the source field that the feature depends on. |
default_value | Optional. The default value to use if the input feature is empty. If not specified, the default is |
value_type | Required. Specifies the type of the output feature. |
sequence_length | The maximum length of the sequence. If the length exceeds this value, the sequence is truncated. |
sequence_delim | The separator between sequence elements. Set this parameter only if the input is a string. |
separator | Optional. The separator for multi-value inputs. The default is "\u001D". Only a single character is allowed. |
value_dimension | Optional. The default value is 0. This parameter can be used in an offline task to truncate the output. |
normalizer | Optional. The normalization method. This is valid only for numerical features. For more information, see RawFeature. |
stub_type | Optional. The default value is false. If set to true, the configured feature transformation is used only as an intermediate result in the pipeline and is not included in the final output to the model. |
placeholder | A special value used in sequence features to fill empty positions and pad dimensions. The default value for floating-point numbers is |
Supports binning operations. For more information, see Feature binning (discretization).
Supports multi-value inputs using array and nested array types.
Example
When sequence_delim="," and value_dimension=1, the inputs and outputs are as follows:
Input | slice | Output |
"123,456,90,80" | 0 | "123" |
"123,456,90,80" | 2 | "90" |
"123,456,90,80" | 1:3 | ["456", "90"] |
[1, 2, 3, 4] | :2 | [1, 2] |
[1, 2, 3, 4] | 2: | [3, 4] |
[1, 2, 3, 4] | 1:4:2 | [2, 4] |
[1, 2, 3, 4] | ::-1 | [4, 3, 2, 1] |
[1, 2, 3, 4] | 2:-1:-1 | [3, 2, 1] |
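Most rows of the table behave exactly like Python's own slicing, which the sketch below uses directly. The helper name apply_slice is hypothetical. Note that in plain Python the spec 2:-1:-1 yields an empty list, so the last row of the table evidently follows a different rule for a negative stop index combined with a negative step; the sketch does not reproduce that row.

```python
def apply_slice(values, spec, sequence_delim=","):
    """Apply an index ("2") or a slice spec ("1:4:2") to a sequence input."""
    seq = values.split(sequence_delim) if isinstance(values, str) else list(values)
    if ":" not in spec:
        # A bare integer selects a single element.
        return seq[int(spec)]
    # Empty parts such as the start in ":2" become None, as in Python slices.
    parts = [int(part) if part.strip() else None for part in spec.split(":")]
    return seq[slice(*parts)]


print(apply_slice("123,456,90,80", "1:3"))   # ['456', '90']
print(apply_slice([1, 2, 3, 4], "1:4:2"))    # [2, 4]
print(apply_slice([1, 2, 3, 4], "::-1"))     # [4, 3, 2, 1]
```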