EasyRec helps you build models based on your business requirements. This way, you can quickly validate algorithm ideas, reuse core algorithm components across scenarios, and assemble new models by combining existing components.
Limits
Componentization requires EasyRec 0.8.0 or later.
Why is componentization required?
1. Build models in a flexible manner
You can build models by using public components that are dynamically pluggable. The EasyRec framework provides the "glue" syntax to achieve seamless connections between components.
2. Reuse components
Most models are called new models because they introduce one or more special sub-components. You can assemble the sub-components to build new models in an efficient manner.
In the past, to add a new optional public component, such as a dense feature embedding layer or SENet, to an existing model, you had to modify the model code itself to apply the new feature. This process is cumbersome and error-prone. With a large number of models and public components, adding every optional component to every model would require modifying an immense amount of code.
Componentization helps decouple underlying public components from upper-level models.
3. Improve the iteration efficiency of experiments
You can add new features to a model in a convenient manner. To develop a new model, you need only create its unique components and assemble the rest from existing components in the component library.
To add a new feature, you now only need to develop a Keras Layer class for it and add an import statement in the specified package. The EasyRec framework automatically recognizes the new component and adds it to the component library without additional operations. New developers no longer need to be familiar with all aspects of EasyRec before contributing to the framework, which greatly improves development efficiency.
Objective of componentization
The objective of componentization is to turn model development into the creation and assembly of components, which makes your work much easier.
Each component focuses on implementing its own feature; the code within a component is highly cohesive and serves a single purpose. This is commonly referred to as the single responsibility principle.
Backbone network
A componentized EasyRec model uses a configurable backbone network as its core. A backbone network is a directed acyclic graph (DAG) that consists of multiple blocks. The EasyRec framework executes the logic of each block in topological order of the DAG to build a subgraph of the TensorFlow graph. The output nodes of the DAG are specified by the concat_blocks parameter; the output tensors of these blocks are concatenated and fed to an optional top Multi-Layer Perceptron (MLP) layer, or connected directly to the final prediction layer.
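The execution model can be sketched in plain Python. This is a toy illustration of topological-order execution, not the actual EasyRec implementation; all names here are hypothetical:

```python
from graphlib import TopologicalSorter

# Toy blocks: each name maps to (input block names, layer function).
blocks = {
    'wide':  ([], lambda inputs: 1.0),             # stands in for a feature group
    'deep':  ([], lambda inputs: 2.0),
    'logit': (['wide', 'deep'], lambda inputs: sum(inputs)),
}

def run_backbone(blocks, concat_blocks):
    graph = {name: set(deps) for name, (deps, _) in blocks.items()}
    outputs = {}
    # Execute every block in topological order, feeding it its inputs' outputs.
    for name in TopologicalSorter(graph).static_order():
        deps, layer = blocks[name]
        outputs[name] = layer([outputs[d] for d in deps])
    # The outputs of the blocks named in concat_blocks form the final output.
    return [outputs[name] for name in concat_blocks]

print(run_backbone(blocks, ['logit']))  # [3.0]
```

Here the 'logit' block only runs after both of its inputs, mirroring how the framework resolves block dependencies before building the TensorFlow subgraph.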

Case 1: Wide&Deep model
Configuration file: wide_and_deep_backbone_on_movielens.config
model_config: {
model_name: "WideAndDeep"
model_class: "RankModel"
feature_groups: {
group_name: 'wide'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
wide_deep: WIDE
}
feature_groups: {
group_name: 'deep'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
wide_deep: DEEP
}
backbone {
blocks {
name: 'wide'
inputs {
feature_group_name: 'wide'
}
input_layer {
only_output_feature_list: true
wide_output_dim: 1
}
}
blocks {
name: 'deep_logit'
inputs {
feature_group_name: 'deep'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [256, 256, 256, 1]
use_final_bn: false
final_activation: 'linear'
}
}
}
blocks {
name: 'final_logit'
inputs {
block_name: 'wide'
input_fn: 'lambda x: tf.add_n(x)'
}
inputs {
block_name: 'deep_logit'
}
merge_inputs_into_list: true
keras_layer {
class_name: 'Add'
}
}
concat_blocks: 'final_logit'
}
model_params {
l2_regularization: 1e-4
}
embedding_regularization: 1e-4
}
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| Wide&Deep | 1 | 0.8558 |
| Wide&Deep (Backbone) | 1 | 0.8854 |
Note: The model built from components outperforms the built-in model because its MLP layer can use better initialization methods.
A backbone network is defined by using a protobuf message and consists of multiple blocks. Each block is a reusable component.
- Each block has a unique name and one or more inputs and outputs.
- Each input can only be the name of a feature group, another block, or a block package. If a block has multiple inputs, the inputs are automatically merged: inputs in list format are merged into one list, and tensor inputs are concatenated.
- All blocks form a DAG based on their input-output relationships. The EasyRec framework automatically parses the topology of the DAG and executes the components represented by the blocks in topological order.
- If a block has multiple outputs, a Python tuple is returned. A downstream block can configure input_slice to obtain an element of the tuple by using the Python slice syntax, or configure input_fn with a custom lambda function to extract an element.
- The components represented by blocks are commonly Keras layers, which are reusable subnetwork components. The EasyRec framework can load custom Keras layers as well as all built-in Keras layers.
- Each block can be associated with an input layer that performs additional operations on the input feature group, such as batch normalization, layer normalization, or feature dropout. You can also specify the format of the output tensor, such as 2D, 3D, or list. Note: If a block is associated with an input_layer, you must set feature_group_name to the name of a feature group. If a block is not associated with an input layer, its name must not duplicate the name of a feature group.
- Special blocks can be associated with special components: the Lambda layer (custom expressions), sequential layers (execute multiple layers in sequence), the repeat layer (repeatedly execute a specific layer), and the recurrent layer (loop over a specific layer).
- The concat_blocks parameter specifies the output nodes of the DAG. If multiple output nodes are configured, their output tensors are automatically concatenated.
- If the concat_blocks parameter is not configured, the EasyRec framework automatically concatenates all leaf nodes of the DAG as the output.
- An optional MLP layer can be configured on top of the backbone network.
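The input-merging rule described above can be sketched as follows. This is a plain-Python stand-in, with tuples playing the role of tensors; the function name is illustrative, not an EasyRec API:

```python
def merge_inputs(inputs, merge_inputs_into_list=False):
    # If merging into a list is requested, or any input already is a list,
    # flatten everything into one list, preserving input order.
    if merge_inputs_into_list or any(isinstance(x, list) for x in inputs):
        merged = []
        for x in inputs:
            merged.extend(x if isinstance(x, list) else [x])
        return merged
    # Otherwise all inputs are "tensors": concatenate along the last dimension.
    return tuple(v for t in inputs for v in t)

print(merge_inputs([(1, 2), (3, 4)]))        # (1, 2, 3, 4)
print(merge_inputs([(1, 2), [(3,), (4,)]]))  # [(1, 2), (3,), (4,)]
```

The second call mirrors the case where one input is a tensor and another is a list: the tensor is inserted into the merged list.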
Case 2: DeepFM model
Configuration file: deepfm_backbone_on_movielens.config
This case includes two special blocks. One block has a custom lambda function configured. The other block has the tf.keras.layers.Add built-in Keras layer loaded.
model_config: {
model_name: 'DeepFM'
model_class: 'RankModel'
feature_groups: {
group_name: 'wide'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
wide_deep: WIDE
}
feature_groups: {
group_name: 'features'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
feature_names: 'title'
wide_deep: DEEP
}
backbone {
blocks {
name: 'wide_logit'
inputs {
feature_group_name: 'wide'
}
input_layer {
wide_output_dim: 1
}
}
blocks {
name: 'features'
inputs {
feature_group_name: 'features'
}
input_layer {
output_2d_tensor_and_feature_list: true
}
}
blocks {
name: 'fm'
inputs {
block_name: 'features'
input_slice: '[1]'
}
keras_layer {
class_name: 'FM'
}
}
blocks {
name: 'deep'
inputs {
block_name: 'features'
input_slice: '[0]'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [256, 128, 64, 1]
use_final_bn: false
final_activation: 'linear'
}
}
}
blocks {
name: 'add'
inputs {
block_name: 'wide_logit'
input_fn: 'lambda x: tf.reduce_sum(x, axis=1, keepdims=True)'
}
inputs {
block_name: 'fm'
}
inputs {
block_name: 'deep'
}
merge_inputs_into_list: true
keras_layer {
class_name: 'Add'
}
}
concat_blocks: 'add'
}
model_params {
l2_regularization: 1e-4
}
embedding_regularization: 1e-4
}
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DeepFM | 1 | 0.8867 |
| DeepFM (Backbone) | 1 | 0.8872 |
Case 3: DCN model
Configuration file: dcn_backbone_on_movielens.config
This case includes a special Deep & Cross Network (DCN) block that uses the recurrent layer to loop over a component multiple times. This case also has an MLP layer added to the DAG.
model_config: {
model_name: 'DCN V2'
model_class: 'RankModel'
feature_groups: {
group_name: 'all'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
wide_deep: DEEP
}
backbone {
blocks {
name: "deep"
inputs {
feature_group_name: 'all'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [256, 128, 64]
}
}
}
blocks {
name: "dcn"
inputs {
feature_group_name: 'all'
input_fn: 'lambda x: [x, x]'
}
recurrent {
num_steps: 3
fixed_input_index: 0
keras_layer {
class_name: 'Cross'
}
}
}
concat_blocks: ['deep', 'dcn']
top_mlp {
hidden_units: [64, 32, 16]
}
}
model_params {
l2_regularization: 1e-4
}
embedding_regularization: 1e-4
}
In the preceding configuration, the cross layer is looped over three times, which is logically equivalent to executing the following statements:
x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
x3 = Cross()(x0, x2)
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DCN (built-in) | 1 | 0.8576 |
| DCN_v2 (backbone) | 1 | 0.8770 |
Note: The backbone version uses the new Cross component from DCN v2, which has more parameters. The built-in DCN model implements the v1 version.
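The recurrent block's semantics can be sketched in plain Python. The toy layer below stands in for Cross; the function and its signature are illustrative, not EasyRec APIs:

```python
def recurrent(layer, inputs, num_steps, fixed_input_index=0):
    """Loop `layer` num_steps times over two inputs: the input at
    fixed_input_index stays fixed, while the other slot is replaced
    by the previous step's output (two-input sketch only)."""
    state = list(inputs)
    moving = 1 - fixed_input_index  # the slot updated each step
    for _ in range(num_steps):
        state[moving] = layer(state[fixed_input_index], state[moving])
    return state[moving]

# Toy "cross" layer: with x0 fixed, three steps reproduce x1, x2, x3 above.
cross = lambda x0, x: x0 + x
print(recurrent(cross, [1, 1], num_steps=3))  # 1+1=2, 1+2=3, 1+3=4 -> 4
```

The `input_fn: 'lambda x: [x, x]'` configuration supplies the initial pair [x0, x0], and fixed_input_index: 0 pins x0 across all three steps.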
Case 4: DLRM model
Configuration file: dlrm_backbone_on_criteo.config
model_config: {
model_name: 'DLRM'
model_class: 'RankModel'
feature_groups: {
group_name: "dense"
feature_names: "F1"
feature_names: "F2"
...
wide_deep:DEEP
}
feature_groups: {
group_name: "sparse"
feature_names: "C1"
feature_names: "C2"
feature_names: "C3"
...
wide_deep:DEEP
}
backbone {
blocks {
name: 'bottom_mlp'
inputs {
feature_group_name: 'dense'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [64, 32, 16]
}
}
}
blocks {
name: 'sparse'
inputs {
feature_group_name: 'sparse'
}
input_layer {
output_2d_tensor_and_feature_list: true
}
}
blocks {
name: 'dot'
inputs {
block_name: 'bottom_mlp'
}
inputs {
block_name: 'sparse'
input_slice: '[1]'
}
keras_layer {
class_name: 'DotInteraction'
}
}
blocks {
name: 'sparse_2d'
inputs {
block_name: 'sparse'
input_slice: '[0]'
}
}
concat_blocks: ['sparse_2d', 'dot']
top_mlp {
hidden_units: [256, 128, 64]
}
}
model_params {
l2_regularization: 1e-5
}
embedding_regularization: 1e-5
}
Performance comparison based on the Criteo dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DLRM | 1 | 0.79785 |
| DLRM (backbone) | 1 | 0.7993 |
Note: DotInteraction is a newly developed component for performing dot product operations on pairwise feature interactions.
In this model, the first input of the dot block is a tensor and the second input is a list. In this case, the tensor is inserted into the list, and the merged list is used as the input of the block.
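What DotInteraction computes can be illustrated in plain Python. This is a naive sketch of pairwise dot products over feature embeddings, not the optimized EasyRec implementation:

```python
def dot_interaction(features):
    """features: list of equal-length embedding vectors, one per feature.
    Returns the dot product of every unordered pair of embeddings."""
    dots = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            dots.append(sum(a * b for a, b in zip(features[i], features[j])))
    return dots

# Three 2-dim embeddings -> 3 pairwise interactions.
print(dot_interaction([[1, 0], [0, 1], [1, 1]]))  # [0, 1, 1]
```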
Case 5: Add a numerical feature embedding component to a Deep Learning Recommendation Model (DLRM)
Configuration file: dlrm_on_criteo_with_periodic.config
Compared with Case 4, this case has an added PeriodicEmbedding layer, which shows the flexibility and scalability of componentized programming.
Pay attention to how the parameters of the PeriodicEmbedding layer are configured: instead of a custom protobuf message, the built-in google.protobuf.Struct type is used to pass parameters to the custom layer. Custom layers also support parameter passing through custom protobuf messages; the framework provides a unified parameter API that supports both methods.
model_config: {
model_class: 'RankModel'
feature_groups: {
group_name: "dense"
feature_names: "F1"
feature_names: "F2"
...
wide_deep:DEEP
}
feature_groups: {
group_name: "sparse"
feature_names: "C1"
feature_names: "C2"
...
wide_deep:DEEP
}
backbone {
blocks {
name: 'num_emb'
inputs {
feature_group_name: 'dense'
}
keras_layer {
class_name: 'PeriodicEmbedding'
st_params {
fields {
key: "output_tensor_list"
value { bool_value: true }
}
fields {
key: "embedding_dim"
value { number_value: 16 }
}
fields {
key: "sigma"
value { number_value: 0.005 }
}
}
}
}
blocks {
name: 'sparse'
inputs {
feature_group_name: 'sparse'
}
input_layer {
output_2d_tensor_and_feature_list: true
}
}
blocks {
name: 'dot'
inputs {
block_name: 'num_emb'
input_slice: '[1]'
}
inputs {
block_name: 'sparse'
input_slice: '[1]'
}
keras_layer {
class_name: 'DotInteraction'
}
}
blocks {
name: 'sparse_2d'
inputs {
block_name: 'sparse'
input_slice: '[0]'
}
}
blocks {
name: 'num_emb_2d'
inputs {
block_name: 'num_emb'
input_slice: '[0]'
}
}
concat_blocks: ['num_emb_2d', 'dot', 'sparse_2d']
top_mlp {
hidden_units: [256, 128, 64]
}
}
model_params {
l2_regularization: 1e-5
}
embedding_regularization: 1e-5
}
Performance comparison based on the Criteo dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DLRM | 1 | 0.79785 |
| DLRM (backbone) | 1 | 0.7993 |
| DLRM (periodic) | 1 | 0.7998 |
Case 6: Use a built-in Keras layer to build a Deep Neural Network (DNN) model
Configuration file: mlp_on_movielens.config
This case is provided only to demonstrate that the componentized EasyRec framework can use TensorFlow's built-in atomic Keras layers as common components. EasyRec already provides a custom MLP layer, which is more convenient to use.
This case includes a special sequential block, which defines multiple layers connected in series: the output of each layer is used as the input of the next layer. Defining one sequential block is more convenient than defining multiple ordinary blocks.
Note: To call a built-in Keras layer, parameters must be passed in the google.protobuf.Struct format.
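The dataflow of the sequential block described above can be sketched in plain Python (the function name is illustrative, not an EasyRec API):

```python
def sequential(layers, x):
    # Each layer's output feeds the next layer, as in the `layers` list below.
    for layer in layers:
        x = layer(x)
    return x

print(sequential([lambda x: x + 1, lambda x: x * 2], 3))  # (3+1)*2 = 8
```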
model_config: {
model_class: "RankModel"
feature_groups: {
group_name: 'features'
feature_names: 'user_id'
feature_names: 'movie_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
feature_names: 'year'
feature_names: 'genres'
wide_deep: DEEP
}
backbone {
blocks {
name: 'mlp'
inputs {
feature_group_name: 'features'
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 256 }
}
fields {
key: 'activation'
value: { string_value: 'relu' }
}
}
}
}
layers {
keras_layer {
class_name: 'Dropout'
st_params {
fields {
key: 'rate'
value: { number_value: 0.5 }
}
}
}
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 256 }
}
fields {
key: 'activation'
value: { string_value: 'relu' }
}
}
}
}
layers {
keras_layer {
class_name: 'Dropout'
st_params {
fields {
key: 'rate'
value: { number_value: 0.5 }
}
}
}
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 1 }
}
}
}
}
}
concat_blocks: 'mlp'
}
model_params {
l2_regularization: 1e-4
}
embedding_regularization: 1e-4
}
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MLP | 1 | 0.8616 |
Case 7: Contrastive learning by using block packages
Configuration file: contrastive_learning_on_movielens.config
This case demonstrates how to use a block package. You can group a set of blocks into a package to build a reusable subnetwork that can be called multiple times within the same model with shared parameters. In contrast, an unpackaged block cannot be called multiple times, although its output can be reused multiple times.
Block packages are designed for scenarios such as self-supervised learning and contrastive learning.
model_config: {
model_name: "ContrastiveLearning"
model_class: "RankModel"
feature_groups: {
group_name: 'user'
feature_names: 'user_id'
feature_names: 'job_id'
feature_names: 'age'
feature_names: 'gender'
wide_deep: DEEP
}
feature_groups: {
group_name: 'item'
feature_names: 'movie_id'
feature_names: 'year'
feature_names: 'genres'
wide_deep: DEEP
}
backbone {
blocks {
name: 'user_tower'
inputs {
feature_group_name: 'user'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [256, 128]
}
}
}
packages {
name: 'item_tower'
blocks {
name: 'item'
inputs {
feature_group_name: 'item'
}
input_layer {
dropout_rate: 0.2
}
}
blocks {
name: 'item_encoder'
inputs {
block_name: 'item'
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [256, 128]
}
}
}
}
blocks {
name: 'contrastive_learning'
inputs {
package_name: 'item_tower'
}
inputs {
package_name: 'item_tower'
}
merge_inputs_into_list: true
keras_layer {
class_name: 'AuxiliaryLoss'
st_params {
fields {
key: 'loss_type'
value: { string_value: 'info_nce' }
}
fields {
key: 'loss_weight'
value: { number_value: 0.1 }
}
fields {
key: 'temperature'
value: { number_value: 0.2 }
}
}
}
}
blocks {
name: 'top_mlp'
inputs {
block_name: 'contrastive_learning'
ignore_input: true
}
inputs {
block_name: 'user_tower'
}
inputs {
package_name: 'item_tower'
reset_input {}
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [128, 64]
}
}
}
concat_blocks: 'top_mlp'
}
model_params {
l2_regularization: 1e-4
}
embedding_regularization: 1e-4
}
AuxiliaryLoss is a layer used to calculate the contrastive learning loss. For more information, see Component parameters.
Additional input configurations:

- ignore_input: true indicates that the current input is ignored. The input is added only to control the execution order in the topology.
- reset_input resets the parameters of the input layer when the package is called. You can configure parameters that differ from those defined by the package.
Note that concat_blocks is not configured for the item_tower package in this case. The framework automatically uses the leaf node of the package's DAG as its output.
In this case, the item_tower package is called three times. In the first two calls, which compute the contrastive learning loss, the dropout configured at the input layer takes effect. In the last call, the input layer configuration is reset and no dropout is performed. The item_tower in the main model shares parameters with the item_tower in the contrastive learning auxiliary task. The auxiliary task generates augmented samples by applying dropout to the input feature embeddings, while the main model's item_tower performs no data augmentation.
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MultiTower | 1 | 0.8814 |
| ContrastiveLearning | 1 | 0.8728 |
For information about a complex case of a contrastive learning model, see CL4SRec.
Case 8: Multi-objective model MMoE
The model_class parameter of a multi-objective model is commonly set to MultiTaskModel, and a tower must be configured for each objective in model_params. model_name is a custom string used only for description.
model_config {
model_name: "MMoE"
model_class: "MultiTaskModel"
feature_groups {
group_name: "all"
feature_names: "user_id"
feature_names: "cms_segid"
...
feature_names: "tag_brand_list"
wide_deep: DEEP
}
backbone {
blocks {
name: 'all'
inputs {
feature_group_name: 'all'
}
input_layer {
only_output_feature_list: true
}
}
blocks {
name: "senet"
inputs {
block_name: "all"
}
keras_layer {
class_name: 'SENet'
senet {
reduction_ratio: 4
}
}
}
blocks {
name: "mmoe"
inputs {
block_name: "senet"
}
keras_layer {
class_name: 'MMoE'
mmoe {
num_task: 2
num_expert: 3
expert_mlp {
hidden_units: [256, 128]
}
}
}
}
}
model_params {
task_towers {
tower_name: "ctr"
label_name: "clk"
dnn {
hidden_units: [128, 64]
}
num_class: 1
weight: 1.0
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
}
task_towers {
tower_name: "cvr"
label_name: "buy"
dnn {
hidden_units: [128, 64]
}
num_class: 1
weight: 1.0
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
}
l2_regularization: 1e-06
}
embedding_regularization: 5e-05
}
Note that concat_blocks is not configured for the backbone in this case. The framework automatically concatenates the leaf nodes of the DAG as the output.
Case 9: Multi-objective model DBMTL
As in Case 8, the model_class parameter is set to MultiTaskModel, and a tower is configured for each objective in model_params.
model_config {
model_name: "DBMTL"
model_class: "MultiTaskModel"
feature_groups {
group_name: "all"
feature_names: "user_id"
feature_names: "cms_segid"
...
feature_names: "tag_brand_list"
wide_deep: DEEP
}
backbone {
blocks {
name: "mask_net"
inputs {
feature_group_name: "all"
}
keras_layer {
class_name: 'MaskNet'
masknet {
mask_blocks {
aggregation_size: 512
output_size: 256
}
mask_blocks {
aggregation_size: 512
output_size: 256
}
mask_blocks {
aggregation_size: 512
output_size: 256
}
mlp {
hidden_units: [512, 256]
}
}
}
}
}
model_params {
task_towers {
tower_name: "ctr"
label_name: "clk"
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128, 64]
}
relation_dnn {
hidden_units: [32]
}
weight: 1.0
}
task_towers {
tower_name: "cvr"
label_name: "buy"
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128, 64]
}
relation_tower_names: ["ctr"]
relation_dnn {
hidden_units: [32]
}
weight: 1.0
}
l2_regularization: 1e-6
}
embedding_regularization: 5e-6
}
The Deep Bayesian Multi-Task Learning (DBMTL) model requires a relation_dnn to be configured for each subtask tower in model_params. The dependencies between tasks must also be specified by using relation_tower_names.
Note that concat_blocks is not configured for the backbone in this case. The framework automatically concatenates the leaf nodes of the DAG as the output.
Case 10: MaskNet + PPNet + MMoE
model_config: {
model_name: 'MaskNet + PPNet + MMoE'
model_class: 'RankModel'
feature_groups: {
group_name: 'memorize'
feature_names: 'user_id'
feature_names: 'adgroup_id'
feature_names: 'pid'
wide_deep: DEEP
}
feature_groups: {
group_name: 'general'
feature_names: 'age_level'
feature_names: 'shopping_level'
...
wide_deep: DEEP
}
backbone {
blocks {
name: "mask_net"
inputs {
feature_group_name: "general"
}
repeat {
num_repeat: 3
keras_layer {
class_name: "MaskBlock"
mask_block {
output_size: 512
aggregation_size: 1024
}
}
}
}
blocks {
name: "ppnet"
inputs {
block_name: "mask_net"
}
inputs {
feature_group_name: "memorize"
}
merge_inputs_into_list: true
repeat {
num_repeat: 3
input_fn: "lambda x, i: [x[0][i], x[1]]"
keras_layer {
class_name: "PPNet"
ppnet {
mlp {
hidden_units: [256, 128, 64]
}
gate_params {
output_dim: 512
}
mode: "eager"
full_gate_input: false
}
}
}
}
blocks {
name: "mmoe"
inputs {
block_name: "ppnet"
}
inputs {
feature_group_name: "general"
}
keras_layer {
class_name: "MMoE"
mmoe {
num_task: 2
num_expert: 3
}
}
}
}
model_params {
l2_regularization: 0.0
task_towers {
tower_name: "ctr"
label_name: "is_click"
metrics_set {
auc {
num_thresholds: 20000
}
}
loss_type: CLASSIFICATION
num_class: 1
dnn {
hidden_units: 64
hidden_units: 32
}
weight: 1.0
}
task_towers {
tower_name: "cvr"
label_name: "is_train"
metrics_set {
auc {
num_thresholds: 20000
}
}
loss_type: CLASSIFICATION
num_class: 1
dnn {
hidden_units: 64
hidden_units: 32
}
weight: 1.0
}
}
}
This case shows how to use a reusable block.
More cases
New models:
Configuration file of the FiBiNet model: fibinet_on_movielens.config
Configuration file of the MaskNet model: masknet_on_movielens.config
Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MaskNet | 1 | 0.8872 |
| FiBiNet | 1 | 0.8893 |
Sequential models:
Configuration file of the Deep Interest Network (DIN) model: DIN_backbone.config
Configuration file of the Behavior Sequence Transformer (BST) model: BST_backbone.config
CL4SRec model: CL4SRec
Other models:
Highway Network: Highway Network
Cross Decoupling Network: CDN
DLRM+SENet: dlrm_senet_on_criteo.config
Introduction to the component library
1. Basic components
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| MLP | Multi-layer perceptron | Customizable activation functions, initializers, dropout, and batch normalization. | |
| Highway | Residual-like connection | Allows incremental fine-tuning of pre-trained embeddings. | |
| Gate | Gating mechanism | Weighted summation of multiple inputs. | |
| PeriodicEmbedding | Periodic activation functions | Embeds numerical features. | |
| AutoDisEmbedding | Automatic discretization | Embeds numerical features. | |
Note: The first input of the Gate component is the weight vector; the remaining inputs are merged into a list. The length of the weight vector must equal the length of that list.
2. Feature crossing components
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| FM | Second-order feature interaction | A component of the DeepFM model. | |
| DotInteraction | Second-order feature interaction | A component of the DLRM model. | |
| Cross | Bit-wise feature crossing | A component of the DCN v2 model. | |
| BiLinear | Bilinear interaction | A component of the FiBiNet model. | |
| FiBiNet | SENet & BiLinear | The FiBiNet model. | |
3. Feature importance learning components
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| SENet | Feature importance modeling | A component of the FiBiNet model. | |
| MaskBlock | Feature importance modeling | A component of the MaskNet model. | |
| MaskNet | Serial or parallel MaskBlocks | The MaskNet model. | |
| PPNet | Parameter personalization network | The PPNet model. | |
4. Sequential feature encoding components
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| DIN | Target attention | A component of the DIN model. | |
| BST | Transformer | A component of the BST model. | |
| SeqAugment | Sequence data augmentation | Crop, mask, and reorder operations. | |
5. Multi-objective learning component
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| MMoE | Multi-gate Mixture-of-Experts | A component of the MMoE model. | |
6. Auxiliary loss function component
| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| AuxiliaryLoss | Auxiliary loss computation | Commonly used in self-supervised learning. | |
For more information about component parameters, see Component parameters.
Note that the preceding reference is in Chinese.
Configure custom components
Create a Python file in the easy_rec/python/layers/keras directory, or add the component to an existing file there. We recommend that you define components with similar purposes in the same file to reduce the number of files. For example, components related to feature interaction can be stored in the interaction.py file.
Define a component class that inherits tf.keras.layers.Layer and implements at least two methods: __init__ and call.
def __init__(self, params, name='xxx', reuse=None, **kwargs):
pass
def call(self, inputs, training=None, **kwargs):
pass
The first parameter, params, of the __init__ method receives the parameters that the framework passes to the current component. Two parameter configuration methods are supported: google.protobuf.Struct and custom protobuf messages. The params object provides a unified read interface for both formats.
The params object supports the following operations:

- Check required parameters; an error is raised if any are missing: params.check_required(['embedding_dim', 'sigma'])
- Read parameters with the dot operator: sigma = params.sigma. Chained dot access is supported, such as params.a.b.
- All numerical parameters configured by using the Struct method are of the FLOAT type. If a parameter is logically an integer, convert it explicitly: embedding_dim = int(params.embedding_dim)
- Convert ARRAY-type parameters to Python lists: units = list(params.hidden_units)
- Read a parameter with a default value; the return value is coerced to the type of the default: activation = params.get_or_default('activation', 'relu')
- Reading a default value from a nested substructure is supported: params.field.get_or_default('key', def_val)
- Check whether a parameter exists: params.has_field(key)
- [Not recommended, because it restricts the parameter passing method] Obtain the custom proto object directly: params.get_pb_config()
- Read or write the l2_regularizer attribute: params.l2_regularizer can be passed to a dense layer or dense function.
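The conventions above can be illustrated with a minimal mock of the params object. This is a hypothetical stand-in for illustration only, not the real EasyRec class:

```python
class MockParams:
    """Toy stand-in mimicking the read conventions described above."""
    def __init__(self, fields):
        self._fields = fields

    def __getattr__(self, key):
        value = self._fields[key]
        # Struct-style numbers always arrive as floats.
        return float(value) if isinstance(value, int) else value

    def check_required(self, keys):
        missing = [k for k in keys if k not in self._fields]
        if missing:
            raise ValueError('missing required params: %s' % missing)

    def has_field(self, key):
        return key in self._fields

    def get_or_default(self, key, default):
        # The return value is coerced to the type of the default.
        if key in self._fields:
            return type(default)(self._fields[key])
        return default

params = MockParams({'embedding_dim': 16, 'sigma': 0.005})
params.check_required(['embedding_dim', 'sigma'])
embedding_dim = int(params.embedding_dim)   # Struct numbers are FLOAT
print(embedding_dim, params.get_or_default('activation', 'relu'))  # 16 relu
```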
[Optional] If you need to configure a custom protobuf message parameter, first add the message definition to the layer.proto file in the easy_rec/python/protos/ directory, and then register the parameter in the KerasLayer.params message body defined in the easy_rec/python/protos/keras_layer.proto file.
The reuse parameter of the __init__ method indicates whether the layer's weight parameters should be reused. During development, implement the layer so that its weights can be shared. We recommend that you strictly follow the Keras layer specifications and declare the Keras layers your component depends on in the __init__ method. Alternatively, you can use tf.layers.* functions and pass the reuse parameter through as needed.
Tip: To implement the layer, try to use the native tf.keras.layers.* object, and declare all of them in the __init__ method in advance.
The call method implements the main logic of the component. The inputs parameter can be a tensor or a list of tensors. The optional training parameter indicates whether the model is in training mode.
A newly developed layer needs to be exported from the easy_rec.python.layers.keras.__init__.py file so that the layer can be recognized by the framework as a member of the component library. For example, if you want to export the MLP class in the blocks.py file, you need to add from .blocks import MLP.
Sample code for the FM layer:
class FM(tf.keras.layers.Layer):
"""Factorization Machine models pairwise (order-2) feature interactions without linear term and bias.
References
- [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf)
Input shape.
- List of 2D tensor with shape: ``(batch_size,embedding_size)``.
- Or a 3D tensor with shape: ``(batch_size,field_size,embedding_size)``
Output shape
- 2D tensor with shape: ``(batch_size, 1)``.
"""
def __init__(self, params, name='fm', reuse=None, **kwargs):
super(FM, self).__init__(name, **kwargs)
self.reuse = reuse
self.use_variant = params.get_or_default('use_variant', False)
def call(self, inputs, **kwargs):
if type(inputs) == list:
emb_dims = set(map(lambda x: int(x.shape[-1]), inputs))
if len(emb_dims) != 1:
dims = ','.join([str(d) for d in emb_dims])
raise ValueError('all embedding dim must be equal in FM layer:' + dims)
with tf.name_scope(self.name):
fea = tf.stack(inputs, axis=1)
else:
assert inputs.shape.ndims == 3, 'input of FM layer must be a 3D tensor or a list of 2D tensors'
fea = inputs
with tf.name_scope(self.name):
square_of_sum = tf.square(tf.reduce_sum(fea, axis=1))
sum_of_square = tf.reduce_sum(tf.square(fea), axis=1)
cross_term = tf.subtract(square_of_sum, sum_of_square)
if self.use_variant:
cross_term = 0.5 * cross_term
else:
cross_term = 0.5 * tf.reduce_sum(cross_term, axis=-1, keepdims=True)
return cross_term
Build a model
Blocks and block packages are core components for building a backbone network. This section describes the types, features, and configuration parameters of a block. This section also describes a block package that is specially designed for parameter-sharing subnetworks.
For information about how to build a model by using blocks and block packages, see Cases.
The following sample code provides the protobuf definition of a block:
message Block {
required string name = 1;
// the input names of feature groups or other blocks
repeated Input inputs = 2;
optional int32 input_concat_axis = 3 [default = -1];
optional bool merge_inputs_into_list = 4;
optional string extra_input_fn = 5;
// sequential layers
repeated Layer layers = 6;
// only take effect when there are no layers
oneof layer {
InputLayer input_layer = 101;
Lambda lambda = 102;
KerasLayer keras_layer = 103;
RecurrentLayer recurrent = 104;
RepeatLayer repeat = 105;
}
}
A block automatically merges multiple inputs:

If the type of one input among multiple inputs is list, the final result is a merged list that retains the input order.
If every input is a tensor, the input tensors are automatically concatenated along the last dimension.

You can change the default behavior by using the following parameters:

input_concat_axis: the axis along which the input tensors are concatenated.
merge_inputs_into_list: if you set this parameter to true, the inputs are merged into a list without performing the concat operation.
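The merge rules above can be sketched in plain Python. This is an illustrative sketch only, not the framework's actual code; `merge_block_inputs` is a hypothetical helper, with NumPy arrays standing in for tensors:

```python
import numpy as np

def merge_block_inputs(inputs, axis=-1, merge_inputs_into_list=False):
    """Illustrative sketch of how a block merges multiple inputs.

    If any input is a list, or merging into a list is requested, the
    inputs are flattened into one list that retains the original order.
    Otherwise, the input tensors are concatenated along `axis`.
    """
    if merge_inputs_into_list or any(isinstance(x, list) for x in inputs):
        merged = []
        for x in inputs:
            merged.extend(x if isinstance(x, list) else [x])
        return merged
    return np.concatenate(inputs, axis=axis)
```

For example, merging two tensors of shapes (2, 3) and (2, 5) produces a (2, 8) tensor, while setting merge_inputs_into_list keeps them as a two-element list.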
message Input {
oneof name {
string feature_group_name = 1;
string block_name = 2;
string package_name = 3;
}
optional string input_fn = 11;
optional string input_slice = 12;
}
Each input can be configured with the following optional parameters:

input_fn: specifies a lambda function that transforms the input. For example, the input_fn: 'lambda x: [x]' configuration converts an input into the list format.
input_slice: obtains a slice of a tuple or list. For example, if an input is in the list format, the input_slice: '[1]' configuration selects the second element of the list as the input.

In addition, a block supports the optional extra_input_fn parameter, which transforms the result merged from multiple inputs. It must be configured as a lambda function.
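Conceptually, these configuration strings are turned into Python callables and applied to the input. A minimal sketch of that idea, assuming the framework evaluates the configured strings with eval (the helper names here are hypothetical, not framework APIs):

```python
def apply_input_fn(input_fn_expr, value):
    # Evaluate the configured lambda string and apply it to the input,
    # e.g. input_fn: 'lambda x: [x]' wraps the input in a list.
    fn = eval(input_fn_expr)
    return fn(value)

def apply_input_slice(slice_expr, value):
    # A slice expression such as '[1]' is appended to the input
    # reference and evaluated, selecting one element of a list/tuple.
    return eval('value' + slice_expr, {'value': value})
```

For example, apply_input_fn("lambda x: [x]", t) returns [t], and apply_input_slice("[1]", outputs) picks the second element of a list-valued input.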
The following blocks are supported: empty block, input block, Lambda block, KerasLayer block, recurrent block, repeated block, and sequential block.
1. Empty block
An empty block is not configured with any layer and can only be used to merge multiple inputs.
2. Input block
An input block is associated with an input layer to obtain, process, and return the original feature input.
Input blocks are unique because they can have only one input and the input is the name of a feature group specified by the feature_group_name parameter.
The name of an input block is the same as the name of the feature group that is configured as the input of the input block.
Sample configuration:
blocks {
name: 'all'
inputs {
feature_group_name: 'all'
}
input_layer {
only_output_feature_list: true
}
}
An input layer can be configured to receive inputs in different formats and can perform additional operations such as dropout. The following sample code provides the parameter definition of an input layer in the protobuf format:
message InputLayer {
optional bool do_batch_norm = 1;
optional bool do_layer_norm = 2;
optional float dropout_rate = 3;
optional float feature_dropout_rate = 4;
optional bool only_output_feature_list = 5;
optional bool only_output_3d_tensor = 6;
optional bool output_2d_tensor_and_feature_list = 7;
optional bool output_seq_and_normal_feature = 8;
}
The following section describes the parameters of an input layer:
do_batch_norm: specifies whether to perform batch normalization on the feature inputs.
do_layer_norm: specifies whether to perform layer normalization on the feature inputs.
dropout_rate: the probability at which the input layer performs the dropout operation. By default, dropout is not performed.
feature_dropout_rate: the probability at which the input layer drops entire features. By default, feature dropout is not performed.
only_output_feature_list: returns the features in the list format.
only_output_3d_tensor: returns a 3D tensor that corresponds to the feature group. This parameter can be configured only when all embedding dimensions are the same.
output_2d_tensor_and_feature_list: specifies whether to output a 2D tensor and the feature list at the same time.
output_seq_and_normal_feature: specifies whether to output a tuple of sequence features and common features.
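Note that feature_dropout_rate differs from element-wise dropout: it drops whole feature embeddings at once. A NumPy sketch of this behavior, offered purely for illustration (not the framework's actual implementation):

```python
import numpy as np

def feature_dropout(feature_list, rate, rng=None):
    """Drop entire feature embeddings with probability `rate`.

    feature_list: list of (batch_size, embedding_size) arrays.
    Unlike element-wise dropout, a whole feature is zeroed at once,
    and kept features are rescaled to preserve the expected magnitude.
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(len(feature_list)) >= rate
    scale = 1.0 / (1.0 - rate) if rate < 1.0 else 0.0
    return [f * scale if k else np.zeros_like(f)
            for f, k in zip(feature_list, keep)]
```

With rate=0 the features pass through unchanged; with rate=1 every feature is zeroed.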
3. Lambda block
A Lambda block allows you to configure a lambda function to perform simple operations. Sample configuration:
blocks {
name: 'wide_logit'
inputs {
feature_group_name: 'wide'
}
lambda {
expression: 'lambda x: tf.reduce_sum(x, axis=1, keepdims=True)'
}
}
4. KerasLayer block
A KerasLayer block is the most fundamental type of block. It loads a Keras layer class and executes its logic.

class_name: the class name of the Keras layer to be loaded. Both custom and built-in layer classes can be loaded.
st_params: the parameters of the layer, configured in the google.protobuf.Struct format. Alternatively, you can pass parameters to the loaded layer in a custom protobuf message format.
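The class_name lookup can be thought of as a registry lookup followed by instantiation. A simplified sketch under that assumption (the real framework resolves names against the component library and the built-in Keras layers; the registry and helper below are hypothetical):

```python
class MLP:
    """Stand-in for a layer class exported by the component library."""
    def __init__(self, hidden_units):
        self.hidden_units = hidden_units

# Hypothetical registry: custom layers first, then built-in Keras layers.
LAYER_REGISTRY = {'MLP': MLP}

def load_keras_layer(class_name, **params):
    # Resolve the configured class_name and instantiate it with its params.
    cls = LAYER_REGISTRY.get(class_name)
    if cls is None:
        raise ValueError('unknown layer class: ' + class_name)
    return cls(**params)
```

This is why exporting a new layer class from the component library's __init__.py is enough to make it configurable by name.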
Sample configuration:
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [64, 32, 16]
}
}
keras_layer {
class_name: 'Dropout'
st_params {
fields {
key: 'rate'
value: { number_value: 0.5 }
}
}
}
5. Recurrent block
A recurrent block can implement an RNN-like loop structure and execute a layer multiple times. The input of each execution contains the output of the previous execution. The following sample code provides an example of a recurrent block. For more information, see DCN.
recurrent {
num_steps: 3
fixed_input_index: 0
keras_layer {
class_name: 'Cross'
}
}
In the preceding configuration, the Cross layer is looped over three times, which is logically equivalent to executing the following statements:
x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
x3 = Cross()(x0, x2)

num_steps: the number of loop iterations.
fixed_input_index: the index of the element in the input list that stays fixed in each iteration, such as x0 in the preceding example.
keras_layer: the component to be executed.
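The loop semantics can be sketched as follows. This is an illustrative sketch with `layer` standing for any two-input component such as Cross, and `run_recurrent` is a hypothetical helper, not a framework API:

```python
def run_recurrent(layer, x0, num_steps, fixed_input_index=0):
    """Execute `layer` num_steps times.

    The element at fixed_input_index (x0 here) is passed unchanged to
    every iteration; the other input is the previous iteration's output.
    """
    x = x0
    for _ in range(num_steps):
        x = layer(x0, x)
    return x
```

For instance, with layer = lambda a, b: a + b and x0 = 1, three steps compute 1+1=2, 1+2=3, 1+3=4 and return 4.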
6. Repeated block
A repeated block executes a component multiple times by using the same inputs to implement the multi-head logic. Sample configuration:
repeat {
num_repeat: 2
keras_layer {
class_name: "MaskBlock"
mask_block {
output_size: 512
aggregation_size: 2048
input_layer_norm: false
}
}
}

num_repeat: the number of repeated executions.
output_concat_axis: the axis along which the tensors returned by the repeated executions are concatenated. If you do not configure this parameter, a list of the execution results is returned.
keras_layer: the component to be executed.
input_slice: the input slice of each execution. For example, [i] obtains the i-th element of the input list as the input of the i-th execution. If you do not configure this parameter, all inputs are used.
input_fn: the input function of each execution. Example: input_fn: "lambda x, i: [x[0][i], x[1]]".
For more information, see MaskNet+PPNet+MMoE.
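The execution pattern of a repeated block can be sketched in plain Python. This is an illustrative sketch with NumPy standing in for tensors; `run_repeat` and `make_layer` are hypothetical names, not framework APIs:

```python
import numpy as np

def run_repeat(make_layer, x, num_repeat, output_concat_axis=None):
    """Execute a freshly built layer num_repeat times on the same input.

    Each repetition gets its own layer instance (its own parameters),
    which is what implements the multi-head logic.
    """
    outputs = [make_layer(i)(x) for i in range(num_repeat)]
    if output_concat_axis is None:
        return outputs  # a list of per-head results
    return np.concatenate(outputs, axis=output_concat_axis)
```

Without output_concat_axis the per-head results are returned as a list; with output_concat_axis: -1, two heads that each output (batch, d) yield one (batch, 2d) tensor.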
7. Sequential block
A sequential block can execute multiple layers in sequence. The output of the former layer is the input of the later layer. A sequential block is simpler than multiple common blocks that are connected end to end. Sample configuration:
blocks {
name: 'mlp'
inputs {
feature_group_name: 'features'
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 256 }
}
fields {
key: 'activation'
value: { string_value: 'relu' }
}
}
}
}
layers {
keras_layer {
class_name: 'Dropout'
st_params {
fields {
key: 'rate'
value: { number_value: 0.5 }
}
}
}
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 1 }
}
}
}
}
}
Implement parameter-sharing subnetworks by using block packages
A block package encapsulates a DAG that consists of multiple blocks and can be called multiple times by using shared parameters. A block package is commonly used in self-supervised learning models.
The following sample code provides the protobuf message definition of a block package:
message BlockPackage {
// package name
required string name = 1;
// a few blocks generating a DAG
repeated Block blocks = 2;
// the names of output blocks
repeated string concat_blocks = 3;
}
A block uses the package_name parameter to specify a block package as its input.
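The key property of a block package is that its layers are built once and reused on every call, so repeated invocations share parameters. A minimal Python sketch of this idea (the Package class below is hypothetical, not a framework API):

```python
class Package:
    """Sketch: a package builds its layers once and reuses them on every call."""

    def __init__(self, make_layers):
        # Layer objects (and therefore parameters) are created exactly once.
        self.layers = make_layers()

    def __call__(self, x):
        # Every invocation runs the SAME layer objects: shared parameters.
        for layer in self.layers:
            x = layer(x)
        return x
```

This mirrors the contrastive-learning pattern in the example below, where two augmented views of the features pass through the same feature_encoder package.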
The following sample code shows how to use a block package to implement contrastive learning:
model_config {
model_class: "RankModel"
feature_groups {
group_name: "all"
feature_names: "adgroup_id"
feature_names: "user"
...
feature_names: "pid"
wide_deep: DEEP
}
backbone {
packages {
name: 'feature_encoder'
blocks {
name: "fea_dropout"
inputs {
feature_group_name: "all"
}
input_layer {
dropout_rate: 0.5
only_output_3d_tensor: true
}
}
blocks {
name: "encode"
inputs {
block_name: "fea_dropout"
}
layers {
keras_layer {
class_name: 'BSTCTR'
bst {
hidden_size: 128
num_attention_heads: 4
num_hidden_layers: 3
intermediate_size: 128
hidden_act: 'gelu'
max_position_embeddings: 50
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0
}
}
}
layers {
keras_layer {
class_name: 'Dense'
st_params {
fields {
key: 'units'
value: { number_value: 128 }
}
fields {
key: 'kernel_initializer'
value: { string_value: 'zeros' }
}
}
}
}
}
}
blocks {
name: "all"
inputs {
feature_group_name: "all"
}
input_layer {
only_output_3d_tensor: true
}
}
blocks {
name: "loss_ctr"
merge_inputs_into_list: true
inputs {
package_name: 'feature_encoder'
}
inputs {
package_name: 'feature_encoder'
}
inputs {
package_name: 'all'
}
keras_layer {
class_name: 'LOSSCTR'
st_params{
fields {
key: 'cl_weight'
value: { number_value: 1 }
}
fields {
key: 'au_weight'
value: { number_value: 0.01 }
}
}
}
}
}
model_params {
l2_regularization: 1e-5
}
embedding_regularization: 1e-5
}
Case
In industrial-grade recommendation systems, user feedback behaviors on items commonly follow a long-tail distribution. A small number of head items receive the majority of user interactions, such as clicks, favorites, and conversions, while the numerous remaining mid-tail and long-tail items receive very little feedback. Recommendation models trained on such long-tail-distributed user behavior logs tend to increasingly favor head items. This reduces the exposure opportunities of mid-tail and long-tail items and affects user satisfaction.
The following phenomena occur during business performance optimization:
Recall expansion by increasing the number of recalls or introducing new types of recalls often fails to improve the overall metrics.
Excluding items outside the "premium pool" from the recall results often enhances performance metrics.
Adding a coarse ranking model improves the coverage metric, but it does not necessarily lead to an overall improvement in business metrics.
Optimizations that closely adhere to the "independent and identically distributed" assumption tend to improve overall metrics, whereas those that stray from this assumption often fail to deliver the desired outcomes. The "independent and identically distributed" assumption means that the training and serving data of the fine ranking model conform to the same distribution. At serving time, the input of the fine ranking model consists of the truncated recall or coarse ranking results from the production environment, while the model is trained on highly skewed long-tail behavioral data. This mismatch can lead to overfitting on head items and underfitting on mid-tail and long-tail items.
"Premium pool" filtering enhances the alignment of recall items with the long-tail distribution of user behavior. This meets the "independent and identically distributed" assumption of a fine ranking model and improves business metrics.
Recall expansion and coarse ranking models aim to smooth the long-tail distribution by increasing the number of mid-tail and long-tail items. However, this deviates from the "independent and identically distributed" assumption of a fine ranking model and often fails to achieve the desired performance improvements.
Based on the analysis of feature importance of a fine ranking model, "memory" features are high-importance features, whereas the importance of numerous mid-tail and long-tail features is extremely low. "Memory" features are those that do not enhance the generalization capabilities of a model, such as item IDs or user behavior statistics on specific item IDs over time. These features do not enable the model to acquire knowledge that can be applied to other items. Conventional model structures tend to produce a long-tail distribution of feature importance, which ultimately leads to a long-tail distribution of items preferred by a model.
Based on the preceding analysis, it is necessary to design a more reasonable model structure that allows a model to learn "generalization" capabilities in addition to "memory" capabilities. CDN provides a feasible solution to this issue by introducing a gating mechanism based on item distribution. This mechanism lets head items rely mainly on "memory" features, while mid-tail and long-tail items rely mainly on "generalization" features. The model then combines the representations learned from the two groups of features by weighted summation to fit the final business objectives.
The following model structure is designed based on a real business scenario and a model is built by using the componentized EasyRec framework.

For more information about the configurations in this case, see Build a deep recommendation algorithm model based on the componentized EasyRec framework.
For more information about how to use EasyRec, see https://easyrec.readthedocs.io/en/latest/component/backbone.html.
Note that the preceding references are in Chinese.