
Platform for AI: Quickly build a deep learning-based recommendation algorithm model based on the componentized EasyRec framework

Last Updated: Dec 25, 2024

EasyRec helps you build models based on your business requirements. This way, you can quickly validate various algorithm ideas, reuse core algorithm components across models in different scenarios, and build new models by recombining existing components.

Limits

Componentization requires EasyRec 0.8.0 or later.

Why is componentization required?

1. Build models in a flexible manner

You can build models by using public components that are dynamically pluggable. The EasyRec framework provides the "glue" syntax to achieve seamless connections between components.

2. Reuse components

Most models are called new models because they introduce one or more special sub-components. You can assemble the sub-components to build new models in an efficient manner.

In the past, to add a new optional public component, such as a dense feature embedding layer or SENet, to an existing model, you needed to modify the code of the entire model to apply the new feature. The process was cumbersome and error-prone. Given a large number of models and public components, adding all optional public components to all models would require the modification of an immense amount of code.

Componentization helps decouple underlying public components from upper-level models.

3. Improve the iteration efficiency of experiments

You can add new features to a model in a convenient manner. To develop a new model, you need to only create its unique components, and then assemble the rest by using existing components in the component library.

Now, you need to only develop a Keras layer class for a new feature and add an import statement to the corresponding package. The EasyRec framework automatically recognizes the new layer and adds it to the component library without additional operations. New developers no longer need to be familiar with all aspects of EasyRec to add new features to the framework. This greatly improves development efficiency.

Objective of componentization

Instead of building whole new models, you create components and assemble them. This makes your work much easier.

Each component focuses on the implementation of its own feature, and the code in a component is highly cohesive and serves a single objective. This is commonly referred to as the single responsibility principle.

Backbone network

A componentized EasyRec model uses a configurable backbone network as its core component. A backbone network is a directed acyclic graph (DAG) that consists of multiple blocks. The EasyRec framework executes the code logic of the blocks in the topological order of the DAG to build a subgraph of the TensorFlow graph. The output nodes of the DAG are specified by the concat_blocks parameter. The output tensors of the specified blocks are concatenated and used as the input of an optional top Multi-Layer Perceptron (MLP) layer, or directly connected to the final prediction layer.


Case 1: Wide&Deep model

Configuration file: wide_and_deep_backbone_on_movielens.config

model_config: {
  model_name: "WideAndDeep"
  model_class: "RankModel"
  feature_groups: {
    group_name: 'wide'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: WIDE
  }
  feature_groups: {
    group_name: 'deep'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'wide'
      inputs {
        feature_group_name: 'wide'
      }
      input_layer {
        only_output_feature_list: true
        wide_output_dim: 1
      }
    }
    blocks {
      name: 'deep_logit'
      inputs {
        feature_group_name: 'deep'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [256, 256, 256, 1]
          use_final_bn: false
          final_activation: 'linear'
        }
      }
    }
    blocks {
      name: 'final_logit'
      inputs {
        block_name: 'wide'
        input_fn: 'lambda x: tf.add_n(x)'
      }
      inputs {
        block_name: 'deep_logit'
      }
      merge_inputs_into_list: true
      keras_layer {
        class_name: 'Add'
      }
    }
    concat_blocks: 'final_logit'
  }
  model_params {
    l2_regularization: 1e-4
  }
  embedding_regularization: 1e-4
}

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| Wide&Deep | 1 | 0.8558 |
| Wide&Deep (Backbone) | 1 | 0.8854 |

Note: The model built by using components performs better than the built-in model because its MLP layer can use better initialization methods.

A backbone network is defined by using a protobuf message and consists of multiple blocks. Each block is a reusable component.

  • Each block has a unique name and one or more inputs and outputs.

  • Each input can only be the name of a feature group, another block, or a block package. If a block has multiple inputs, the inputs are automatically merged: inputs in list format are merged into one list, and tensor inputs are concatenated into one tensor.

  • All blocks form a DAG based on the relationships between the inputs and outputs. The EasyRec framework automatically parses the topological relationships of the DAG and executes the components represented by the blocks in topological order.

  • If a block has multiple outputs, a Python tuple is returned. You can configure input_slice for a downstream block to obtain an element of the tuple to use as the input of the downstream block by using the Python slice syntax, or use input_fn to configure a custom lambda function to obtain an element of the tuple.

  • The components represented by blocks are commonly Keras layers, which are reusable subnetwork components. The EasyRec framework allows you to load a custom Keras layer and all built-in Keras layers.

  • Each block can be associated with an input layer to perform additional operations on the input feature group, such as batch normalization, layer normalization, or feature dropout. You can also specify the format of an output tensor, such as 2D, 3D, list, or other formats. Note: If a block is associated with an input_layer, you must configure feature_group_name as the name of a feature group. If a block is not associated with an input layer, the name of the block cannot be the same as the name of a feature group.

  • Some special blocks are associated with special components: the Lambda layer (custom expressions), sequential layers (executing multiple layers in sequence), the repeat layer (repeatedly executing a specific layer), and the recurrent layer (looping over a specific layer).

  • You can configure the concat_blocks parameter to specify the name of an output node in a DAG. If you configure multiple output nodes, multiple output tensors are automatically concatenated.

  • If you do not configure the concat_blocks parameter, the EasyRec framework automatically concatenates all leaf nodes in a DAG and returns the output.

  • You can configure an optional MLP layer for a backbone network.
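The execution model described by the preceding rules can be sketched in plain Python. The following framework-free illustration (all names are hypothetical, and Python lists stand in for tensors) resolves block inputs recursively, which visits the blocks in topological order, and then concatenates the outputs of the configured concat blocks:

```python
def run_backbone(blocks, feature_groups, concat_blocks):
    """Execute the blocks of a backbone DAG in topological order.

    blocks maps a block name to (list of input names, callable);
    input names may refer to feature groups or to other blocks.
    """
    cache = dict(feature_groups)  # feature groups are the DAG sources

    def run(name):
        if name not in cache:
            input_names, fn = blocks[name]
            # resolving inputs first yields a topological-order traversal
            cache[name] = fn([run(n) for n in input_names])
        return cache[name]

    for name in blocks:
        run(name)
    merged = []  # outputs of the configured blocks are concatenated
    for name in concat_blocks:
        merged.extend(run(name))
    return merged

# a toy DAG: two source blocks feeding a final block
blocks = {
    'wide': (['wide_fg'], lambda xs: [sum(xs[0])]),
    'deep': (['deep_fg'], lambda xs: [v * 2 for v in xs[0]]),
    'final': (['wide', 'deep'], lambda xs: xs[0] + xs[1]),
}
out = run_backbone(blocks, {'wide_fg': [1.0, 2.0], 'deep_fg': [3.0]},
                   ['final'])  # [3.0, 6.0]
```

This is only a sketch of the scheduling semantics; the real framework operates on TensorFlow tensors and applies the merge rules described above.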

Case 2: DeepFM model

Configuration file: deepfm_backbone_on_movielens.config

This case includes two special blocks. One block has a custom lambda function configured. The other block has the tf.keras.layers.Add built-in Keras layer loaded.

model_config: {
  model_name: 'DeepFM'
  model_class: 'RankModel'
  feature_groups: {
    group_name: 'wide'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: WIDE
  }
  feature_groups: {
    group_name: 'features'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    feature_names: 'title'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'wide_logit'
      inputs {
        feature_group_name: 'wide'
      }
      input_layer {
        wide_output_dim: 1
      }
    }
    blocks {
      name: 'features'
      inputs {
        feature_group_name: 'features'
      }
      input_layer {
        output_2d_tensor_and_feature_list: true
      }
    }
    blocks {
      name: 'fm'
      inputs {
        block_name: 'features'
        input_slice: '[1]'
      }
      keras_layer {
        class_name: 'FM'
      }
    }
    blocks {
      name: 'deep'
      inputs {
        block_name: 'features'
        input_slice: '[0]'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [256, 128, 64, 1]
          use_final_bn: false
          final_activation: 'linear'
        }
      }
    }
    blocks {
      name: 'add'
      inputs {
        block_name: 'wide_logit'
        input_fn: 'lambda x: tf.reduce_sum(x, axis=1, keepdims=True)'
      }
      inputs {
        block_name: 'fm'
      }
      inputs {
        block_name: 'deep'
      }
      merge_inputs_into_list: true
      keras_layer {
        class_name: 'Add'
      }
    }
    concat_blocks: 'add'
  }
  model_params {
    l2_regularization: 1e-4
  }
  embedding_regularization: 1e-4
}

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DeepFM | 1 | 0.8867 |
| DeepFM (Backbone) | 1 | 0.8872 |

Case 3: DCN model

Configuration file: dcn_backbone_on_movielens.config

This case includes a special Deep & Cross Network (DCN) block that uses the recurrent layer to loop over a component multiple times. This case also adds a top MLP layer to the backbone.

model_config: {
  model_name: 'DCN V2'
  model_class: 'RankModel'
  feature_groups: {
    group_name: 'all'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "deep"
      inputs {
        feature_group_name: 'all'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [256, 128, 64]
        }
      }
    }
    blocks {
      name: "dcn"
      inputs {
        feature_group_name: 'all'
        input_fn: 'lambda x: [x, x]'
      }
      recurrent {
        num_steps: 3
        fixed_input_index: 0
        keras_layer {
          class_name: 'Cross'
        }
      }
    }
    concat_blocks: ['deep', 'dcn']
    top_mlp {
      hidden_units: [64, 32, 16]
    }
  }
  model_params {
    l2_regularization: 1e-4
  }
  embedding_regularization: 1e-4
}

In the preceding configurations, the cross layer is looped over three times, which is logically equivalent to executing the following statements:

x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
x3 = Cross()(x0, x2)
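The loop above can also be expressed as a small framework-free sketch. The helper below is a schematic stand-in for the recurrent block, and toy_cross (simple addition on numbers) stands in for the Cross layer; the input at fixed_input_index stays constant across steps, while the other input slot is replaced by the previous step's output:

```python
def run_recurrent(layer_fn, inputs, num_steps, fixed_input_index=0):
    """Schematic of the recurrent block: loop a layer num_steps times."""
    state = list(inputs)
    output = None
    for _ in range(num_steps):
        output = layer_fn(state)
        # keep the fixed input; feed the output back into the other slot
        state = [x if i == fixed_input_index else output
                 for i, x in enumerate(state)]
    return output

toy_cross = lambda xs: xs[0] + xs[1]  # toy stand-in for Cross
result = run_recurrent(toy_cross, [1, 1], num_steps=3)  # 1+1=2, 1+2=3, 1+3=4
```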

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DCN (built-in) | 1 | 0.8576 |
| DCN_v2 (backbone) | 1 | 0.8770 |

Note: The backbone model uses the new Cross component from DCN v2, which has more parameters. The built-in DCN model implements the v1 version.

Case 4: DLRM model

Configuration file: dlrm_backbone_on_criteo.config

model_config: {
  model_name: 'DLRM'
  model_class: 'RankModel'
  feature_groups: {
    group_name: "dense"
    feature_names: "F1"
    feature_names: "F2"
    ...
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "sparse"
    feature_names: "C1"
    feature_names: "C2"
    feature_names: "C3"
    ...
    wide_deep:DEEP
  }
  backbone {
    blocks {
      name: 'bottom_mlp'
      inputs {
        feature_group_name: 'dense'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [64, 32, 16]
        }
      }
    }
    blocks {
      name: 'sparse'
      inputs {
        feature_group_name: 'sparse'
      }
      input_layer {
        output_2d_tensor_and_feature_list: true
      }
    }
    blocks {
      name: 'dot'
      inputs {
        block_name: 'bottom_mlp'
      }
      inputs {
        block_name: 'sparse'
        input_slice: '[1]'
      }
      keras_layer {
        class_name: 'DotInteraction'
      }
    }
    blocks {
      name: 'sparse_2d'
      inputs {
        block_name: 'sparse'
        input_slice: '[0]'
      }
    }
    concat_blocks: ['sparse_2d', 'dot']
    top_mlp {
      hidden_units: [256, 128, 64]
    }
  }
  model_params {
    l2_regularization: 1e-5
  }
  embedding_regularization: 1e-5
}

Performance comparison based on the Criteo dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DLRM | 1 | 0.79785 |
| DLRM (backbone) | 1 | 0.7993 |

Note: DotInteraction is a newly developed component for performing dot product operations on pairwise feature interactions.

In this model, the first input of the dot block is a tensor and the second input is a list. In this case, the tensor is inserted into the list, and the result is one large list that serves as the input of the block.
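That merge rule can be sketched as follows (a hypothetical helper, with strings standing in for tensors):

```python
def merge_block_inputs(inputs):
    """Merge a block's resolved inputs into one large list.

    Inputs that are already lists are spliced in element by element;
    single tensors are inserted as individual elements.
    """
    merged = []
    for x in inputs:
        if isinstance(x, list):
            merged.extend(x)
        else:
            merged.append(x)
    return merged

# mirrors the dot block: a tensor from bottom_mlp plus a feature list
merged = merge_block_inputs(['t_bottom_mlp', ['e_C1', 'e_C2']])
```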

Case 5: Add a numerical feature embedding component to a Deep Learning Recommendation Model (DLRM)

Configuration file: dlrm_on_criteo_with_periodic.config

Compared with Case 4, this case has an added PeriodicEmbedding layer, which shows the flexibility and scalability of componentized programming.

You need to focus on the parameter configuration method of the PeriodicEmbedding layer. Instead of using a custom protobuf message for passing parameters, the built-in google.protobuf.Struct is used as the parameter of the custom layer. Actually, the custom layer also supports parameter passing by using a custom protobuf message. The framework provides a common parameter API to support the two methods for passing parameters.

model_config: {
  model_class: 'RankModel'
  feature_groups: {
    group_name: "dense"
    feature_names: "F1"
    feature_names: "F2"
    ...
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "sparse"
    feature_names: "C1"
    feature_names: "C2"
    ...
    wide_deep:DEEP
  }
  backbone {
    blocks {
      name: 'num_emb'
      inputs {
        feature_group_name: 'dense'
      }
      keras_layer {
        class_name: 'PeriodicEmbedding'
        st_params {
          fields {
            key: "output_tensor_list"
            value { bool_value: true }
          }
          fields {
            key: "embedding_dim"
            value { number_value: 16 }
          }
          fields {
            key: "sigma"
            value { number_value: 0.005 }
          }
        }
      }
    }
    blocks {
      name: 'sparse'
      inputs {
        feature_group_name: 'sparse'
      }
      input_layer {
        output_2d_tensor_and_feature_list: true
      }
    }
    blocks {
      name: 'dot'
      inputs {
        block_name: 'num_emb'
        input_slice: '[1]'
      }
      inputs {
        block_name: 'sparse'
        input_slice: '[1]'
      }
      keras_layer {
        class_name: 'DotInteraction'
      }
    }
    blocks {
      name: 'sparse_2d'
      inputs {
        block_name: 'sparse'
        input_slice: '[0]'
      }
    }
    blocks {
      name: 'num_emb_2d'
      inputs {
        block_name: 'num_emb'
        input_slice: '[0]'
      }
    }
    concat_blocks: ['num_emb_2d', 'dot', 'sparse_2d']
    top_mlp {
      hidden_units: [256, 128, 64]
    }
  }
  model_params {
    l2_regularization: 1e-5
  }
  embedding_regularization: 1e-5
}

Performance comparison based on the Criteo dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| DLRM | 1 | 0.79785 |
| DLRM (backbone) | 1 | 0.7993 |
| DLRM (periodic) | 1 | 0.7998 |

Case 6: Use a built-in Keras layer to build a Deep Neural Network (DNN) model

Configuration file: mlp_on_movielens.config

This case is provided only to demonstrate that the componentized EasyRec framework can use built-in atomic Keras layers of TensorFlow as common components. EasyRec already provides a custom MLP component, which is more convenient to use.

This case includes a special sequential block, which defines multiple layers connected in series: the output of each layer is used as the input of the next layer. Defining one sequential block is more convenient than defining multiple common blocks.

Note: To call a built-in Keras layer, parameters must be passed by using the google.protobuf.Struct format.

model_config: {
  model_class: "RankModel"
  feature_groups: {
    group_name: 'features'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'mlp'
      inputs {
        feature_group_name: 'features'
      }
      layers {
        keras_layer {
          class_name: 'Dense'
          st_params {
            fields {
              key: 'units'
              value: { number_value: 256 }
            }
            fields {
              key: 'activation'
              value: { string_value: 'relu' }
            }
          }
        }
      }
      layers {
        keras_layer {
          class_name: 'Dropout'
          st_params {
            fields {
              key: 'rate'
              value: { number_value: 0.5 }
            }
          }
        }
      }
      layers {
        keras_layer {
          class_name: 'Dense'
          st_params {
            fields {
              key: 'units'
              value: { number_value: 256 }
            }
            fields {
              key: 'activation'
              value: { string_value: 'relu' }
            }
          }
        }
      }
      layers {
        keras_layer {
          class_name: 'Dropout'
          st_params {
            fields {
              key: 'rate'
              value: { number_value: 0.5 }
            }
          }
        }
      }
      layers {
        keras_layer {
          class_name: 'Dense'
          st_params {
            fields {
              key: 'units'
              value: { number_value: 1 }
            }
          }
        }
      }
    }
    concat_blocks: 'mlp'
  }
  model_params {
    l2_regularization: 1e-4
  }
  embedding_regularization: 1e-4
}

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MLP | 1 | 0.8616 |

Case 7: Contrastive learning by using block packages

Configuration file: contrastive_learning_on_movielens.config

This case demonstrates how to use a block package. You can package a set of blocks into a block package to build a reusable subnetwork that can be called multiple times within the same model with shared parameters. In contrast, a block that is not packaged is executed only once, although its result can be reused by multiple downstream blocks.

Block packages are designed for scenarios such as self-supervised learning and contrastive learning.

model_config: {
  model_name: "ContrastiveLearning"
  model_class: "RankModel"
  feature_groups: {
    group_name: 'user'
    feature_names: 'user_id'
    feature_names: 'job_id'
    feature_names: 'age'
    feature_names: 'gender'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'item'
    feature_names: 'movie_id'
    feature_names: 'year'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'user_tower'
      inputs {
        feature_group_name: 'user'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [256, 128]
        }
      }
    }
    packages {
      name: 'item_tower'
      blocks {
        name: 'item'
        inputs {
          feature_group_name: 'item'
        }
        input_layer {
          dropout_rate: 0.2
        }
      }
      blocks {
        name: 'item_encoder'
        inputs {
          block_name: 'item'
        }
        keras_layer {
          class_name: 'MLP'
          mlp {
            hidden_units: [256, 128]
          }
        }
      }
    }
    blocks {
      name: 'contrastive_learning'
      inputs {
        package_name: 'item_tower'
      }
      inputs {
        package_name: 'item_tower'
      }
      merge_inputs_into_list: true
      keras_layer {
        class_name: 'AuxiliaryLoss'
        st_params {
          fields {
            key: 'loss_type'
            value: { string_value: 'info_nce' }
          }
          fields {
            key: 'loss_weight'
            value: { number_value: 0.1 }
          }
          fields {
            key: 'temperature'
            value: { number_value: 0.2 }
          }
        }
      }
    }
    blocks {
      name: 'top_mlp'
      inputs {
        block_name: 'contrastive_learning'
        ignore_input: true
      }
      inputs {
        block_name: 'user_tower'
      }
      inputs {
        package_name: 'item_tower'
        reset_input {}
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [128, 64]
        }
      }
    }
    concat_blocks: 'top_mlp'
  }
  model_params {
    l2_regularization: 1e-4
  }
  embedding_regularization: 1e-4
}

AuxiliaryLoss is a layer used to calculate contrastive learning loss. For more information, see Component parameters.

Additional input configurations:

  • ignore_input: true indicates that the current input is ignored. The input is added only to control the execution order of the topology.

  • reset_input: resets the parameters of the input layer when the package is called. You can configure parameters that are different from the parameters defined by the package.

Note that no concat_blocks is configured for the package named item_tower in this case. The framework automatically concatenates the leaf nodes of the package's DAG and uses the result as the output.

In this case, the item_tower package is called three times. The dropout configured at the input layer takes effect in the first two calls, which are used to compute the contrastive learning loss. In the last call, the configuration of the input layer is reset and dropout is not performed. The item_tower of the main model shares parameters with the item_tower in the auxiliary contrastive learning task. The item_tower in the auxiliary task generates augmented samples by applying dropout to the input feature embeddings, whereas the item_tower of the main model performs no data augmentation.

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MultiTower | 1 | 0.8814 |
| ContrastiveLearning | 1 | 0.8728 |

For information about a complex case of a contrastive learning model, see CL4SRec.

Case 8: Multi-objective model MMoE

The model_class parameter of a multi-objective model is commonly set to MultiTaskModel, and you need to configure multiple towers for each objective in model_params. model_name is a custom string that is only used for comments.

model_config {
  model_name: "MMoE"
  model_class: "MultiTaskModel"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'all'
      inputs {
        feature_group_name: 'all'
      }
      input_layer {
        only_output_feature_list: true
      }
    }
    blocks {
      name: "senet"
      inputs {
        block_name: "all"
      }
      keras_layer {
        class_name: 'SENet'
        senet {
          reduction_ratio: 4
        }
      }
    }
    blocks {
      name: "mmoe"
      inputs {
        block_name: "senet"
      }
      keras_layer {
        class_name: 'MMoE'
        mmoe {
          num_task: 2
          num_expert: 3
          expert_mlp {
            hidden_units: [256, 128]
          }
        }
      }
    }
  }
  model_params {
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      dnn {
        hidden_units: [128, 64]
      }
      num_class: 1
      weight: 1.0
      loss_type: CLASSIFICATION
      metrics_set: {
       auc {}
      }
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      dnn {
        hidden_units: [128, 64]
      }
      num_class: 1
      weight: 1.0
      loss_type: CLASSIFICATION
      metrics_set: {
       auc {}
      }
    }
    l2_regularization: 1e-06
  }
  embedding_regularization: 5e-05
}

Note that no concat_blocks is configured for the backbone in this case. The framework automatically concatenates the leaf nodes of the DAG and uses the result as the backbone output.

Case 9: Multi-objective model DBMTL

As in Case 8, the model_class parameter is set to MultiTaskModel, and a tower is configured for each objective in model_params.

model_config {
  model_name: "DBMTL"
  model_class: "MultiTaskModel"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "mask_net"
      inputs {
        feature_group_name: "all"
      }
      keras_layer {
        class_name: 'MaskNet'
        masknet {
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mlp {
            hidden_units: [512, 256]
          }
        }
      }
    }
  }
  model_params {
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}

The Deep Bayesian Multi-Task Learning (DBMTL) model requires relation_dnn to be configured for the tower of each subtask in model_params, and the dependencies between tasks to be configured by using relation_tower_names.

Note that no concat_blocks is configured for the backbone in this case. The framework automatically concatenates the leaf nodes of the DAG and uses the result as the backbone output.

Case 10: MaskNet + PPNet + MMoE

model_config: {
  model_name: 'MaskNet + PPNet + MMoE'
  model_class: 'RankModel'
  feature_groups: {
    group_name: 'memorize'
    feature_names: 'user_id'
    feature_names: 'adgroup_id'
    feature_names: 'pid'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'age_level'
    feature_names: 'shopping_level'
    ...
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "mask_net"
      inputs {
        feature_group_name: "general"
      }
      repeat {
        num_repeat: 3
        keras_layer {
          class_name: "MaskBlock"
          mask_block {
            output_size: 512
            aggregation_size: 1024
          }
        }
      }
    }
    blocks {
      name: "ppnet"
      inputs {
        block_name: "mask_net"
      }
      inputs {
        feature_group_name: "memorize"
      }
      merge_inputs_into_list: true
      repeat {
        num_repeat: 3
        input_fn: "lambda x, i: [x[0][i], x[1]]"
        keras_layer {
          class_name: "PPNet"
          ppnet {
            mlp {
              hidden_units: [256, 128, 64]
            }
            gate_params {
              output_dim: 512
            }
            mode: "eager"
            full_gate_input: false
          }
        }
      }
    }
    blocks {
      name: "mmoe"
      inputs {
        block_name: "ppnet"
      }
      inputs {
        feature_group_name: "general"
      }
      keras_layer {
        class_name: "MMoE"
        mmoe {
          num_task: 2
          num_expert: 3
        }
      }
    }
  }
  model_params {
    l2_regularization: 0.0
    task_towers {
      tower_name: "ctr"
      label_name: "is_click"
      metrics_set {
        auc {
          num_thresholds: 20000
        }
      }
      loss_type: CLASSIFICATION
      num_class: 1
      dnn {
        hidden_units: 64
        hidden_units: 32
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "is_train"
      metrics_set {
        auc {
          num_thresholds: 20000
        }
      }
      loss_type: CLASSIFICATION
      num_class: 1
      dnn {
        hidden_units: 64
        hidden_units: 32
      }
      weight: 1.0
    }
  }
}

This case shows how to use the repeat layer to execute the same component multiple times, with input_fn deriving the input of each repetition.
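The repeat semantics used in this case can be sketched in plain Python (a schematic illustration; whether parameters are shared across repetitions is a framework detail that is not modeled here). Each repetition i derives its input from the merged block input x by calling input_fn(x, i), and the per-repetition outputs are collected into a list:

```python
def run_repeat(layer_fn, x, num_repeat, input_fn=None):
    """Schematic of the repeat block: run a component num_repeat times."""
    outputs = []
    for i in range(num_repeat):
        inp = input_fn(x, i) if input_fn is not None else x
        outputs.append(layer_fn(inp))
    return outputs

# mirrors the ppnet block of Case 10: x[0] is the list of mask_net
# outputs and x[1] is the shared 'memorize' input
select = lambda x, i: [x[0][i], x[1]]
outs = run_repeat(lambda inp: inp[0] + inp[1], [[1, 2, 3], 10],
                  num_repeat=3, input_fn=select)  # [11, 12, 13]
```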

More cases

New models:

Performance comparison based on the MovieLens-1M dataset:

| Model | Epoch | AUC |
| --- | --- | --- |
| MaskNet | 1 | 0.8872 |
| FibiNet | 1 | 0.8893 |


Introduction to the component library

1. Basic components

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| MLP | Multi-layer perceptron | Allows customization of activation functions, initializers, dropout, and BN. | Case 1 |
| Highway | Residual-like connection | Allows incremental fine-tuning of pre-trained embeddings. | Highway Network |
| Gate | Gate | Allows weighted summation of multiple inputs. | CDN |
| PeriodicEmbedding | Periodic activation function | Allows numerical feature embeddings. | Case 5 |
| AutoDisEmbedding | Automatic discretization | Allows numerical feature embeddings. | dlrm_on_criteo_with_autodis.config |

Note: The first input of the gate component is a weight vector, and the subsequent inputs are merged into a list. The length of the weight vector must be equal to the length of the list.
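Under that contract, the gate computation can be sketched as follows (a hypothetical stand-in with numbers in place of tensors):

```python
def gate(weights, candidates):
    """Weighted sum of candidate inputs, as the gate component computes.

    weights is the first input (one weight per candidate); candidates
    holds the subsequent inputs, merged into a list.
    """
    if len(weights) != len(candidates):
        raise ValueError('weight vector length must equal the number '
                         'of candidate inputs')
    return sum(w * c for w, c in zip(weights, candidates))

value = gate([1.0, 2.0], [3.0, 4.0])  # 1.0*3.0 + 2.0*4.0 = 11.0
```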

2. Feature crossing components

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| FM | Second-order interaction | A component of the DeepFM model. | Case 2 |
| DotInteraction | Second-order feature interaction | A component of the DLRM model. | Case 4 |
| Cross | Bit-wise interaction | A component of the DCN v2 model. | Case 3 |
| BiLinear | Bilinearity | A component of the FiBiNet model. | fibinet_on_movielens.config |
| FiBiNet | SENet & BiLinear | The FiBiNet model. | fibinet_on_movielens.config |

3. Feature importance learning components

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| SENet | Modeling feature importance | A component of the FiBiNet model. | MMoE |
| MaskBlock | Modeling feature importance | A component of the MaskNet model. | CDN |
| MaskNet | Multiple serial or parallel MaskBlocks | The MaskNet model. | DBMTL |
| PPNet | Parameter personalization network | The PPNet model. | PPNet |

4. Sequential feature encoding components

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| DIN | Target attention | A component of the DIN model. | DIN_backbone.config |
| BST | Transformer | A component of the BST model. | BST_backbone.config |
| SeqAugment | Sequence data augmentation | Crop, mask, and reorder operations. | CL4SRec |

5. Multi-objective learning component

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| MMoE | Multiple Mixture of Experts | A component of the MMoE model. | Case 8 |

6. Auxiliary loss function component

| Class | Feature | Description | Example |
| --- | --- | --- | --- |
| AuxiliaryLoss | Auxiliary loss function | Commonly used in self-supervised learning. | Case 7 |

For more information about component parameters, see Component parameters.

Note: The preceding reference is in Chinese.

Configure custom components

Create a .py file in the easy_rec/python/layers/keras directory, or directly add the component to an existing file. We recommend that you define components with similar objectives in the same file to reduce the number of files. For example, multiple components related to feature interaction can be stored in the interaction.py file.

Define a component class that inherits tf.keras.layers.Layer and implements at least two methods: __init__ and call.

class XxxLayer(tf.keras.layers.Layer):
  def __init__(self, params, name='xxx', reuse=None, **kwargs):
    super(XxxLayer, self).__init__(name=name, **kwargs)

  def call(self, inputs, training=None, **kwargs):
    pass

The first parameter params of the __init__ method receives the parameters that the framework sends to the current component. The following parameter configuration methods are supported: google.protobuf.Struct and custom protobuf messages. The params object provides a unified read interface for the parameters in the two formats.

  • Check the required parameters. If required parameters are missing, return an error and exit: params.check_required(['embedding_dim', 'sigma'])

  • Use dot operators to read parameters: sigma = params.sigma. Continuous dot operators are supported, such as params.a.b.

  • All numeric parameters configured by using the Struct method are parsed as the FLOAT type. If a parameter is logically an integer, convert it explicitly: embedding_dim = int(params.embedding_dim)

  • Convert the ARRAY type to the LIST type: units = list(params.hidden_units)

  • Read a parameter with a fallback default value. The return value is coerced to the type of the default value: activation = params.get_or_default('activation', 'relu')

  • Support reading the default value in a nested substructure: params.field.get_or_default('key', def_val)

  • Check whether a parameter exists: params.has_field(key)

  • [Not recommended because the parameter passing method is restricted] Obtain a custom proto object: params.get_pb_config()

  • Read or write the l2_regularizer attribute: params.l2_regularizer is passed to a dense layer or dense function.
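As an illustration of this read interface, the following standalone sketch mimics the behavior with a dict-backed stand-in. The real params object is backed by protobuf; MockParams and all parameter values below are hypothetical and for illustration only.

```python
class MockParams(object):
    """Dict-backed stand-in that mimics the params read interface (illustrative)."""

    def __init__(self, data):
        self._data = data

    def check_required(self, keys):
        missing = [k for k in keys if k not in self._data]
        if missing:
            raise KeyError('missing required params: %s' % ','.join(missing))

    def has_field(self, key):
        return key in self._data

    def get_or_default(self, key, default):
        # the return value is coerced to the type of the default value
        if key in self._data:
            return type(default)(self._data[key])
        return default

    def __getattr__(self, key):
        try:
            value = self._data[key]
        except KeyError:
            raise AttributeError(key)
        if isinstance(value, dict):
            return MockParams(value)  # supports chained dots: params.a.b
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)       # Struct numbers always read as FLOAT
        return value

# hypothetical parameter values for a component
params = MockParams({'embedding_dim': 8, 'sigma': 0.1, 'hidden_units': [64, 32]})
params.check_required(['embedding_dim', 'sigma'])
embedding_dim = int(params.embedding_dim)                 # 8
units = list(params.hidden_units)                         # [64, 32]
activation = params.get_or_default('activation', 'relu')  # 'relu'
```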

[Optional] If you need to configure a custom protobuf message parameter, first add the message definition of the parameter to the layer.proto file in the easy_rec/python/protos/ directory and then register the parameter in the KerasLayer.params message body defined in the easy_rec/python/protos/keras_layer.proto directory.

The reuse parameter of the __init__ method indicates whether the weights of the layer are to be reused. During development, implement the layer so that its weights can be reused. We recommend that you strictly follow the Keras layer conventions and declare all the Keras layers that your component depends on in the __init__ method. If you use tf.layers.* functions, pass the reuse parameter through based on your business requirements.

Tip: Prefer the native tf.keras.layers.* objects when you implement the layer, and declare all of them in advance in the __init__ method.

The call method is used to implement the main component logic, and the inputs parameter can be a tensor or a list of tensors. The optional training parameter is used to identify whether the model is being trained.

A newly developed layer needs to be exported from the easy_rec/python/layers/keras/__init__.py file so that the framework can recognize the layer as a member of the component library. For example, to export the MLP class defined in the blocks.py file, add from .blocks import MLP.

Sample code for the FM layer:

class FM(tf.keras.layers.Layer):
  """Factorization Machine models pairwise (order-2) feature interactions without linear term and bias.

  References
    - [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf)
  Input shape
    - List of 2D tensors with shape: ``(batch_size, embedding_size)``.
    - Or a 3D tensor with shape: ``(batch_size, field_size, embedding_size)``.
  Output shape
    - 2D tensor with shape: ``(batch_size, 1)``.
  """

  def __init__(self, params, name='fm', reuse=None, **kwargs):
    super(FM, self).__init__(name, **kwargs)
    self.reuse = reuse
    self.use_variant = params.get_or_default('use_variant', False)

  def call(self, inputs, **kwargs):
    if type(inputs) == list:
      emb_dims = set(map(lambda x: int(x.shape[-1]), inputs))
      if len(emb_dims) != 1:
        dims = ','.join([str(d) for d in emb_dims])
        raise ValueError('all embedding dim must be equal in FM layer:' + dims)
      with tf.name_scope(self.name):
        fea = tf.stack(inputs, axis=1)
    else:
      assert inputs.shape.ndims == 3, 'input of FM layer must be a 3D tensor or a list of 2D tensors'
      fea = inputs

    with tf.name_scope(self.name):
      square_of_sum = tf.square(tf.reduce_sum(fea, axis=1))
      sum_of_square = tf.reduce_sum(tf.square(fea), axis=1)
      cross_term = tf.subtract(square_of_sum, sum_of_square)
      if self.use_variant:
        cross_term = 0.5 * cross_term
      else:
        cross_term = 0.5 * tf.reduce_sum(cross_term, axis=-1, keepdims=True)
      return cross_term
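The call method relies on the classic FM identity: the sum of all pairwise inner products equals 0.5 * (square_of_sum - sum_of_square). The following pure-Python check (no TensorFlow; toy values) verifies the identity that the code above computes for each batch row:

```python
# Three toy "field embedding" vectors of dimension 2.
fields = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
dim = len(fields[0])

# Explicit FM second-order term: sum over all pairs i < j of <v_i, v_j>.
explicit = 0.0
for i in range(len(fields)):
    for j in range(i + 1, len(fields)):
        explicit += sum(a * b for a, b in zip(fields[i], fields[j]))

# Square-of-sum trick, computed dimension by dimension as in the layer above.
square_of_sum = [sum(v[d] for v in fields) ** 2 for d in range(dim)]
sum_of_square = [sum(v[d] ** 2 for v in fields) for d in range(dim)]
trick = 0.5 * sum(s - q for s, q in zip(square_of_sum, sum_of_square))
# trick equals explicit (here both are 7.0)
```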

Build a model

Blocks and block packages are core components for building a backbone network. This section describes the types, features, and configuration parameters of a block. This section also describes a block package that is specially designed for parameter-sharing subnetworks.

For information about how to build a model by using blocks and block packages, see Cases.

The following sample code provides the protobuf definition of a block:

message Block {
  required string name = 1;
  // the input names of feature groups or other blocks
  repeated Input inputs = 2;
  optional int32 input_concat_axis = 3 [default = -1];
  optional bool merge_inputs_into_list = 4;
  optional string extra_input_fn = 5;

  // sequential layers
  repeated Layer layers = 6;
  // only take effect when there are no layers
  oneof layer {
    InputLayer input_layer = 101;
    Lambda lambda = 102;
    KerasLayer keras_layer = 103;
    RecurrentLayer recurrent = 104;
    RepeatLayer repeat = 105;
  }
}

A block automatically merges multiple inputs.

  1. If the type of one input among multiple inputs is list, the final result is a merged list that retains the order.

  2. If each input is a single tensor, the input tensors are automatically concatenated along the last axis. You can change this default behavior with the following parameters:

  • input_concat_axis indicates the axis along which the input tensors are concatenated.

  • merge_inputs_into_list: If you set this parameter to true, the inputs are merged into a list without being concatenated.
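The merging rules above can be sketched in plain Python. This is an illustrative stand-in, not the actual EasyRec implementation; tensors are modeled as tuples of floats so that concatenation along the last axis reduces to tuple concatenation:

```python
def merge_block_inputs(inputs, merge_inputs_into_list=False):
    """Illustrative sketch of the block input-merging rules (not EasyRec code)."""
    if merge_inputs_into_list or any(isinstance(x, list) for x in inputs):
        # Rule 1: any list input (or merge_inputs_into_list=true) produces
        # one merged list that preserves the input order.
        merged = []
        for x in inputs:
            merged.extend(x if isinstance(x, list) else [x])
        return merged
    # Rule 2: plain tensors are concatenated along the last axis.
    concat = ()
    for x in inputs:
        concat += x
    return concat

# two plain "tensors" -> concatenated along the last axis
flat = merge_block_inputs([(1.0, 2.0), (3.0,)])          # (1.0, 2.0, 3.0)
# one list input -> everything merged into a list, order preserved
as_list = merge_block_inputs([[(1.0,)], (2.0,)])         # [(1.0,), (2.0,)]
```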

message Input {
  oneof name {
    string feature_group_name = 1;
    string block_name = 2;
    string package_name = 3;
  }
  optional string input_fn = 11;
  optional string input_slice = 12;
}

  • Each input can be configured with the optional input_fn parameter. This parameter is used to specify a lambda function to perform transformations on the input. For example, the input_fn: 'lambda x: [x]' configuration can change an input to the list format.

  • input_slice can be used to obtain a slice of a tuple or list. For example, if an input is in the list format, you can use the input_slice: '[1]' configuration to select the second element of the list as the input.

  • extra_input_fn is an optional parameter that applies a transformation to the merged result of all inputs. It must be configured as a lambda function.
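Because input_fn, input_slice, and extra_input_fn are plain lambda expressions, their effects can be reproduced directly in Python. The snippet below illustrates the semantics with toy values; it is not EasyRec code:

```python
# input_fn: 'lambda x: [x]' wraps a single input into a list
to_list = eval('lambda x: [x]')
wrapped = to_list('emb')                       # ['emb']

# input_slice: '[1]' selects the second element of a list input
slice_fn = eval('lambda x: x' + '[1]')
second = slice_fn(['emb_a', 'emb_b'])          # 'emb_b'

# extra_input_fn transforms the merged result of all inputs
extra_fn = eval('lambda x: list(reversed(x))')
reordered = extra_fn(['a', 'b', 'c'])          # ['c', 'b', 'a']
```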

The following blocks are supported: empty block, input block, Lambda block, KerasLayer block, recurrent block, repeated block, and sequential block.

1. Empty block

An empty block is not configured with any layer and can only be used to merge multiple inputs.

2. Input block

An input block is associated with an input layer to obtain, process, and return the original feature input.

Input blocks are unique because they can have only one input and the input is the name of a feature group specified by the feature_group_name parameter.

The name of an input block is the same as the name of the feature group that is configured as the input of the input block.

Sample configuration:

blocks {
  name: 'all'
  inputs {
    feature_group_name: 'all'
  }
  input_layer {
    only_output_feature_list: true
  }
}

An input layer can be configured to receive inputs in different formats, and can perform additional operations such as dropout. The following sample code provides an example of the parameter definition of an input layer in the protobuf format:

message InputLayer {
  optional bool do_batch_norm = 1;
  optional bool do_layer_norm = 2;
  optional float dropout_rate = 3;
  optional float feature_dropout_rate = 4;
  optional bool only_output_feature_list = 5;
  optional bool only_output_3d_tensor = 6;
  optional bool output_2d_tensor_and_feature_list = 7;
  optional bool output_seq_and_normal_feature = 8;
}

The following section describes the configurations of an input layer:

  • do_batch_norm indicates whether to perform the batch normalization operation on a feature input.

  • do_layer_norm indicates whether to perform the layer normalization operation on a feature input.

  • dropout_rate indicates the probability that the input layer performs the dropout operation. By default, the dropout operation is not performed.

  • feature_dropout_rate indicates the probability that the input layer performs the dropout operation for all feature inputs. By default, the dropout operation is not performed.

  • only_output_feature_list returns features in the list format.

  • only_output_3d_tensor returns a 3D tensor corresponding to a feature group. This parameter can be configured only when embedding dimensions are the same.

  • output_2d_tensor_and_feature_list indicates whether to output 2D tensors and a feature list at the same time.

  • output_seq_and_normal_feature indicates whether to output a tuple including sequence and common features.

3. Lambda block

A Lambda block allows you to configure a lambda function to perform simple operations. Sample configuration:

blocks {
  name: 'wide_logit'
  inputs {
    feature_group_name: 'wide'
  }
  lambda {
    expression: 'lambda x: tf.reduce_sum(x, axis=1, keepdims=True)'
  }
}

4. KerasLayer block

A KerasLayer block is the most fundamental type of block. It loads and executes the code logic of a specified Keras layer.

  • class_name indicates the class name of the Keras layer to be loaded. Custom and built-in layer classes can be loaded.

  • st_params indicates parameters that are configured in google.protobuf.Struct format.

  • You can also pass parameters to a loaded layer in the custom protobuf message format.

Sample configuration:

keras_layer {
  class_name: 'MLP'
  mlp {
    hidden_units: [64, 32, 16]
  }
}

keras_layer {
  class_name: 'Dropout'
  st_params {
    fields {
      key: 'rate'
      value: { number_value: 0.5 }
    }
  }
}

5. Recurrent block

A recurrent block can implement an RNN-like loop structure and execute a layer multiple times. The input of each execution contains the output of the previous execution. The following sample code provides an example of a recurrent block. For more information, see DCN.

recurrent {
  num_steps: 3
  fixed_input_index: 0
  keras_layer {
    class_name: 'Cross'
  }
}

In the preceding configuration, the Cross layer is executed three times in a loop, which is logically equivalent to executing the following statements:

x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
x3 = Cross()(x0, x2)

  • num_steps indicates the number of times for recurrent execution.

  • fixed_input_index indicates the element of the input list that stays fixed across executions, such as x0 in the preceding example.

  • keras_layer indicates the component to be executed.
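The recurrent semantics can be sketched in plain Python: the element at fixed_input_index stays pinned to the original input, while the other slot receives the previous step's output. The cross function below is a toy stand-in for the Cross layer, not its actual implementation:

```python
def cross(x0, x):
    # toy stand-in for a Cross layer: elementwise x0 * x + x
    return [a * b + b for a, b in zip(x0, x)]

def recurrent(x0, num_steps, layer):
    x = x0
    for _ in range(num_steps):
        x = layer(x0, x)  # fixed_input_index = 0 keeps x0 pinned every step
    return x

x0 = [1.0, 2.0]
x3 = recurrent(x0, 3, cross)
# same result as: x1 = cross(x0, x0); x2 = cross(x0, x1); x3 = cross(x0, x2)
```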

6. Repeated block

A repeated block executes a component multiple times by using the same inputs to implement the multi-head logic. Sample configuration:

repeat {
  num_repeat: 2
  keras_layer {
    class_name: "MaskBlock"
    mask_block {
      output_size: 512
      aggregation_size: 2048
      input_layer_norm: false
    }
  }
}

  • num_repeat indicates the number of repeated executions.

  • output_concat_axis indicates the concatenation dimension of tensors returned for multiple executions. If you do not configure this parameter, a list of multiple execution results is returned.

  • keras_layer indicates the component to be executed.

  • input_slice indicates the input slice of each component to be executed. For example, [i] is used to obtain the i-th element of the input list as the input for the i-th execution. If you do not configure this parameter, all inputs are obtained.

  • input_fn indicates the input function for each component to be executed. Example: input_fn: "lambda x, i: [x[0][i], x[1]]".
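A minimal sketch of the repeat semantics, using the input_fn example above with toy values. This is an illustration, not the actual EasyRec implementation:

```python
def repeat(inputs, num_repeat, layer, input_fn=None):
    """Illustrative sketch of the repeat block (not EasyRec code)."""
    outputs = []
    for i in range(num_repeat):
        # each execution sees the same inputs, optionally transformed per index
        x = input_fn(inputs, i) if input_fn else inputs
        outputs.append(layer(x))
    # with output_concat_axis set, these results would be concatenated instead
    return outputs

# input_fn from the example above: head i reads x[0][i] plus the shared x[1]
input_fn = eval('lambda x, i: [x[0][i], x[1]]')
heads = repeat([['emb_a', 'emb_b'], 'shared'], 2, lambda x: x, input_fn)
# heads == [['emb_a', 'shared'], ['emb_b', 'shared']]
```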

For more information, see MaskNet+PPNet+MMoE.

7. Sequential block

A sequential block executes multiple layers in sequence. The output of each layer is the input of the next layer. A sequential block is simpler to configure than multiple ordinary blocks chained end to end. Sample configuration:

blocks {
  name: 'mlp'
  inputs {
    feature_group_name: 'features'
  }
  layers {
    keras_layer {
      class_name: 'Dense'
      st_params {
        fields {
          key: 'units'
          value: { number_value: 256 }
        }
        fields {
          key: 'activation'
          value: { string_value: 'relu' }
        }
      }
    }
  }
  layers {
    keras_layer {
      class_name: 'Dropout'
      st_params {
        fields {
          key: 'rate'
          value: { number_value: 0.5 }
        }
      }
    }
  }
  layers {
    keras_layer {
      class_name: 'Dense'
      st_params {
        fields {
          key: 'units'
          value: { number_value: 1 }
        }
      }
    }
  }
}
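Semantically, a sequential block reduces to function composition: each layer consumes the previous layer's output. The following sketch uses toy stand-ins for the Dense and Dropout layers configured above; it illustrates the chaining only:

```python
from functools import reduce

# toy stand-ins for Dense(256, relu) -> Dropout(0.5) -> Dense(1)
layers = [
    lambda x: [max(0.0, v + 1.0) for v in x],  # "dense + relu" stand-in
    lambda x: x,                               # dropout acts as identity at inference
    lambda x: [sum(x)],                        # final one-unit projection stand-in
]

def run_sequential(inputs, layers):
    # the output of the former layer is the input of the later layer
    return reduce(lambda x, layer: layer(x), layers, inputs)

logit = run_sequential([-2.0, 0.5], layers)  # [1.5]
```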

Implement parameter sharing subnets by using block packages

A block package encapsulates a DAG that consists of multiple blocks and can be called multiple times by using shared parameters. A block package is commonly used in self-supervised learning models.

The following sample code provides the protobuf message definition of a block package:

message BlockPackage {
  // package name
  required string name = 1;
  // a few blocks generating a DAG
  repeated Block blocks = 2;
  // the names of output blocks
  repeated string concat_blocks = 3;
}

A block uses the package_name parameter to specify a block package.

The following sample code shows how to use a block package to implement contrastive learning:

model_config {
  model_class: "RankModel"
  feature_groups {
    group_name: "all"
    feature_names: "adgroup_id"
    feature_names: "user"
    ...
    feature_names: "pid"
    wide_deep: DEEP
  }

  backbone {
    packages {
      name: 'feature_encoder'
      blocks {
        name: "fea_dropout"
        inputs {
          feature_group_name: "all"
        }
        input_layer {
          dropout_rate: 0.5
          only_output_3d_tensor: true
        }
      }
      blocks {
        name: "encode"
        inputs {
          block_name: "fea_dropout"
        }
        layers {
          keras_layer {
            class_name: 'BSTCTR'
            bst {
              hidden_size: 128
              num_attention_heads: 4
              num_hidden_layers: 3
              intermediate_size: 128
              hidden_act: 'gelu'
              max_position_embeddings: 50
              hidden_dropout_prob: 0.1
              attention_probs_dropout_prob: 0
            }
          }
        }
        layers {
          keras_layer {
            class_name: 'Dense'
            st_params {
              fields {
                key: 'units'
                value: { number_value: 128 }
              }
              fields {
                key: 'kernel_initializer'
                value: { string_value: 'zeros' }
              }
            }
          }
        }
      }
    }
    blocks {
      name: "all"
      inputs {
        feature_group_name: "all"
      }
      input_layer {
        only_output_3d_tensor: true
      }
    }
    blocks {
      name: "loss_ctr"
      merge_inputs_into_list: true
      inputs {
        package_name: 'feature_encoder'
      }
      inputs {
        package_name: 'feature_encoder'
      }
      inputs {
        package_name: 'all'
      }
      keras_layer {
        class_name: 'LOSSCTR'
        st_params {
          fields {
            key: 'cl_weight'
            value: { number_value: 1 }
          }
          fields {
            key: 'au_weight'
            value: { number_value: 0.01 }
          }
        }
      }
    }
  }
  model_params {
    l2_regularization: 1e-5
  }
  embedding_regularization: 1e-5
}
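The key property of a block package is weight sharing: both feature_encoder inputs of the loss_ctr block run the same encoder, so the two randomly augmented views of a sample are embedded with identical parameters. The following toy sketch shows that contract; ToyEncoder is purely illustrative and not an EasyRec class:

```python
import random

class ToyEncoder:
    """Toy stand-in for a block package: one weight set, called many times."""

    def __init__(self, seed=7):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]

    def __call__(self, features, dropout_rate=0.0, rng=None):
        rng = rng or random.Random()
        # feature dropout creates a stochastic "view" of the sample
        kept = [0.0 if rng.random() < dropout_rate else f for f in features]
        return [w * k for w, k in zip(self.weights, kept)]

encoder = ToyEncoder()
features = [1.0, 2.0, 3.0, 4.0]
# two augmented views of the same sample are encoded with shared weights
view_a = encoder(features, dropout_rate=0.5, rng=random.Random(1))
view_b = encoder(features, dropout_rate=0.5, rng=random.Random(2))
```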

Case

In industrial-grade recommendation systems, user feedback behaviors on items commonly follow a long-tail distribution. A small number of head items receive the majority of user interactions, such as clicks, favorites, and conversions, while the remaining numerous mid-tail and long-tail items receive very little feedback. Recommendation models trained on such long-tail user behavior logs tend to increasingly favor head items. This reduces the exposure opportunities for mid-tail and long-tail items and hurts user satisfaction.

The following phenomena occur during business performance optimization:

  1. Recall expansion by increasing the number of recalls or introducing new types of recalls often fails to improve the overall metrics.

  2. Excluding items outside the "premium pool" from the recall results often enhances performance metrics.

  3. Adding a coarse ranking model improves the coverage metric, but it does not necessarily lead to an overall improvement in business metrics.

Optimizations that closely adhere to the "independent and identically distributed" assumption tend to improve overall metrics, whereas those that stray from this assumption often fail to deliver the desired outcomes. The "independent and identically distributed" assumption states that the training and test datasets of the fine ranking model follow the same data distribution. In a production environment, the test data of a fine ranking model is typically the truncated recall or coarse ranking results. Because a fine ranking model is trained on highly skewed long-tail behavioral data, it tends to "overfit" head items and "underfit" mid-tail and long-tail items.

  • "Premium pool" filtering enhances the alignment of recall items with the long-tail distribution of user behavior. This meets the "independent and identically distributed" assumption of a fine ranking model and improves business metrics.

  • Recall expansion and coarse ranking models aim to smooth the long-tail distribution by increasing the number of mid-tail and long-tail items. However, this deviates from the "independent and identically distributed" assumption of a fine ranking model and often fails to achieve the desired performance improvements.

Based on the analysis of feature importance of a fine ranking model, "memory" features are high-importance features, whereas the importance of numerous mid-tail and long-tail features is extremely low. "Memory" features are those that do not enhance the generalization capabilities of a model, such as item IDs or user behavior statistics on specific item IDs over time. These features do not enable the model to acquire knowledge that can be applied to other items. Conventional model structures tend to produce a long-tail distribution of feature importance, which ultimately leads to a long-tail distribution of items preferred by a model.

Based on the preceding analysis, it is necessary to design a more reasonable model structure that allows a model to learn "generalization" capabilities in addition to "memory" capabilities. CDN provides a feasible solution to this issue by introducing a gating mechanism based on item distribution. This mechanism enables head items to fit "memory features," while mid-tail and long-tail items fit "generalization features." The model learns representations from both feature types and combines them by weighted summation to fit the final business objective.

The following model structure is designed based on a real business scenario and a model is built by using the componentized EasyRec framework.

[Model architecture diagram]

For more information about the configurations in this case, see Build a deep recommendation algorithm model based on the componentized EasyRec framework.

For more information about how to use EasyRec, see https://easyrec.readthedocs.io/en/latest/component/backbone.html.

Note

Note that the preceding references are in Chinese.