Introduction and Implementation of the DBMTL Multi-Task Learning Model

Multi-Task Learning Background
The recommendation algorithms currently used in industry are not limited to a single target (e.g., CTR); they also need to pay attention to downstream conversion steps, such as whether the user comments, bookmarks, adds to cart, purchases, and how long they view the content.
A common family of multi-objective models builds a separate network branch for each optimization objective and balances the independence and correlation of the objectives by letting these branches share parameters in the bottom layers. Regardless of how the bottom-layer parameters are shared, each objective keeps an independent branch in the last few layers to predict its final value. The probability model of such a network can be described by the following formula:

p(l, m | x, H) = p(l | x, H) · p(m | x, H)

where l and m are the targets, x is the sample feature, and H is the model. The assumption here is that the targets are independent of each other given the features, which often does not hold for related user behaviors.
Introduction to DBMTL
One of the motivations of DBMTL (Deep Bayesian Multi-Target Learning) is to remove this independence assumption. Applying the Bayes rule, the probability model can instead be written as:

p(l, m | x, H) = p(l | x, H) · p(m | l, x, H)

The main difference between DBMTL and the traditional MTL structure (which treats the targets as independent) is that DBMTL builds a Bayesian network between the target nodes and thus explicitly models the possible causal relationships between targets. In real businesses, user behaviors often have clear sequential dependencies; in an information-feed scenario, for example, a user must first click into the article detail page before browsing, commenting, forwarding, or bookmarking it. Because DBMTL encodes these relationships in its structure, it tends to learn better results.
The concrete implementation of the DBMTL model consists of an input layer, a shared embedding layer, a shared layer, a specific layer, and a Bayesian layer.
• The shared embedding layer is an embedding lookup table shared by all targets during training.
• The shared layer and the specific layer are ordinary multilayer perceptrons (MLPs) that model the shared and target-specific representations, respectively.
• The Bayesian layer is the most important part of DBMTL. For three targets t1, t2, t3 it implements the following probabilistic model:

p(t1, t2, t3 | x) = p(t1 | x) · p(t2 | t1, x) · p(t3 | t1, t2, x)

Its corresponding log-likelihood loss function is:

L = -log p(t1, t2, t3 | x) = -[log p(t1 | x) + log p(t2 | t1, x) + log p(t3 | t1, t2, x)]

In practical applications, adjusting the weights of the different targets still has a large practical effect. Assigning a weight to each target is equivalent to rewriting the loss function as:

L = -[w1 · log p(t1 | x) + w2 · log p(t2 | t1, x) + w3 · log p(t3 | t1, t2, x)]

where wi is the weight of target ti.
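As a minimal, hypothetical sketch (not the EasyRec implementation; all tensor names below are illustrative placeholders), this weighted loss for two binary targets can be written directly with TensorFlow 1.x loss helpers, which is exactly what the EasyRec loss code shown later does per tower:

import tensorflow as tf

# Sketch of the weighted negative log-likelihood for two binary targets.
# t1_logits models p(t1 | x); t2_logits models p(t2 | t1, x), because the t2
# tower also consumes the t1 embedding (see the Bayesian layer sketch below).
t1_labels = tf.placeholder(tf.float32, [None, 1])
t2_labels = tf.placeholder(tf.float32, [None, 1])
t1_logits = tf.placeholder(tf.float32, [None, 1])  # placeholders stand in for tower outputs
t2_logits = tf.placeholder(tf.float32, [None, 1])

w1, w2 = 1.0, 0.5  # per-target weights, the `weight` fields in the config below
loss_t1 = tf.losses.sigmoid_cross_entropy(t1_labels, logits=t1_logits, weights=w1)
loss_t2 = tf.losses.sigmoid_cross_entropy(t2_labels, logits=t2_logits, weights=w2)
total_loss = loss_t1 + loss_t2  # = -(w1·log p(t1|x) + w2·log p(t2|t1,x)), averaged over the batch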

In the Bayesian layer of the network, the functions f1, f2, f3 are implemented as fully connected MLPs that learn the implicit causal relationships among targets. Each function takes as input the concatenation of the embeddings of its input variables and outputs an embedding representing its output variable. The embedding of each target finally passes through one more MLP layer to output the probability of that target.
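To make this concrete, here is a minimal sketch of a Bayesian layer for three targets, written with the same TensorFlow 1.x layers API as the EasyRec code below; it is a simplification, and all layer sizes and names are illustrative:

import tensorflow as tf

def _mlp(x, hidden_units, name):
  # Small fully connected stack standing in for f1 / f2 / f3.
  for i, units in enumerate(hidden_units):
    x = tf.layers.dense(x, units, activation=tf.nn.relu, name='%s_fc%d' % (name, i))
  return x

def bayesian_layer(shared_fea):
  """Sketch of p(t1 | x) * p(t2 | t1, x) * p(t3 | t1, t2, x).

  shared_fea: batch of representations coming out of the shared/specific layers.
  """
  t1_emb = _mlp(shared_fea, [32], 'f1')                                        # embedding of t1
  t2_emb = _mlp(tf.concat([shared_fea, t1_emb], axis=-1), [32], 'f2')          # f2 sees t1
  t3_emb = _mlp(tf.concat([shared_fea, t1_emb, t2_emb], axis=-1), [32], 'f3')  # f3 sees t1 and t2
  # One final dense layer per target maps its embedding to a logit (probability after sigmoid).
  t1_logit = tf.layers.dense(t1_emb, 1, name='t1_output')
  t2_logit = tf.layers.dense(t2_emb, 1, name='t2_output')
  t3_logit = tf.layers.dense(t3_emb, 1, name='t3_output')
  return t1_logit, t2_logit, t3_logit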
Code
Based on the EasyRec recommendation algorithm framework, we have implemented the DBMTL algorithm; the full implementation is available on GitHub: EasyRec-DBMTL.
Introduction to EasyRec: EasyRec is a large-scale distributed recommendation algorithm framework open-sourced by the machine learning PAI team of the Alibaba Cloud Computing Platform. It integrates feature engineering methods that have proven effective in practice with training, evaluation, and deployment, and connects seamlessly with Alibaba Cloud products, so a state-of-the-art recommendation system can be built with EasyRec in a short time. As a flagship product of Alibaba Cloud PAI, it has been stably serving hundreds of enterprise customers.
Model Feedforward Network
def build_predict_graph(self):
  """Forward function.

  Returns:
    self._prediction_dict: Prediction result of two tasks.
  """
  # Here we start from the tensor after the shared embedding layer
  # (self._features); its generation logic is omitted.

  # shared layer
  if self._model_config.HasField('bottom_dnn'):
    bottom_dnn = dnn.DNN(
        self._model_config.bottom_dnn,
        self._l2_reg,
        name='bottom_dnn',
        is_training=self._is_training)
    bottom_fea = bottom_dnn(self._features)
  else:
    bottom_fea = self._features

  # MMoE block
  if self._model_config.HasField('expert_dnn'):
    mmoe_layer = mmoe.MMOE(
        self._model_config.expert_dnn,
        l2_reg=self._l2_reg,
        num_task=self._task_num,
        num_expert=self._model_config.num_expert)
    task_input_list = mmoe_layer(bottom_fea)
  else:
    task_input_list = [bottom_fea] * self._task_num

  tower_features = {}
  # specific layer
  for i, task_tower_cfg in enumerate(self._model_config.task_towers):
    tower_name = task_tower_cfg.tower_name
    if task_tower_cfg.HasField('dnn'):
      tower_dnn = dnn.DNN(
          task_tower_cfg.dnn,
          self._l2_reg,
          name=tower_name + '/dnn',
          is_training=self._is_training)
      tower_fea = tower_dnn(task_input_list[i])
      tower_features[tower_name] = tower_fea
    else:
      tower_features[tower_name] = task_input_list[i]

  tower_outputs = {}
  relation_features = {}
  # bayesian network
  for task_tower_cfg in self._model_config.task_towers:
    tower_name = task_tower_cfg.tower_name
    relation_dnn = dnn.DNN(
        task_tower_cfg.relation_dnn,
        self._l2_reg,
        name=tower_name + '/relation_dnn',
        is_training=self._is_training)
    tower_inputs = [tower_features[tower_name]]
    for relation_tower_name in task_tower_cfg.relation_tower_names:
      tower_inputs.append(relation_features[relation_tower_name])
    relation_input = tf.concat(
        tower_inputs, axis=-1, name=tower_name + '/relation_input')
    relation_fea = relation_dnn(relation_input)
    relation_features[tower_name] = relation_fea
    output_logits = tf.layers.dense(
        relation_fea,
        task_tower_cfg.num_class,
        kernel_regularizer=self._l2_reg,
        name=tower_name + '/output')
    tower_outputs[tower_name] = output_logits
  self._add_to_prediction_dict(tower_outputs)
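Note that the Bayesian-network loop above reads relation_features[relation_tower_name] while the dictionary is still being filled, so the task_towers must be configured in topological order: a tower's relation_tower_names may only reference towers that appear earlier in the list (as the configuration example below does).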
Loss calculation
def build(loss_type, label, pred, loss_weight=1.0, num_class=1, **kwargs):
  if loss_type == LossType.CLASSIFICATION:
    if num_class == 1:
      return tf.losses.sigmoid_cross_entropy(
          label, logits=pred, weights=loss_weight, **kwargs)
    else:
      return tf.losses.sparse_softmax_cross_entropy(
          labels=label, logits=pred, weights=loss_weight, **kwargs)
  elif loss_type == LossType.CROSS_ENTROPY_LOSS:
    return tf.losses.log_loss(label, pred, weights=loss_weight, **kwargs)
  elif loss_type in [LossType.L2_LOSS, LossType.SIGMOID_L2_LOSS]:
    logging.info('%s is used' % LossType.Name(loss_type))
    return tf.losses.mean_squared_error(
        labels=label, predictions=pred, weights=loss_weight, **kwargs)
  elif loss_type == LossType.PAIR_WISE_LOSS:
    return pairwise_loss(pred, label)
  else:
    raise ValueError('unsupported loss type: %s' % LossType.Name(loss_type))
def _build_loss_impl(self,
                     loss_type,
                     label_name,
                     loss_weight=1.0,
                     num_class=1,
                     suffix=''):
  loss_dict = {}
  if loss_type == LossType.CLASSIFICATION:
    loss_name = 'cross_entropy_loss' + suffix
    pred = self._prediction_dict['logits' + suffix]
  elif loss_type in [LossType.L2_LOSS, LossType.SIGMOID_L2_LOSS]:
    loss_name = 'l2_loss' + suffix
    pred = self._prediction_dict['y' + suffix]
  else:
    raise ValueError('invalid loss type: %s' % LossType.Name(loss_type))
  loss_dict[loss_name] = build(loss_type, self._labels[label_name], pred,
                               loss_weight, num_class)
  return loss_dict
def build_loss_graph(self):
  """Build loss graph for multi task model."""
  for task_tower_cfg in self._task_towers:
    tower_name = task_tower_cfg.tower_name
    loss_weight = task_tower_cfg.weight * self._sample_weight
    if hasattr(task_tower_cfg, 'task_space_indicator_label') and \
        task_tower_cfg.HasField('task_space_indicator_label'):
      in_task_space = tf.to_float(
          self._labels[task_tower_cfg.task_space_indicator_label] > 0)
      loss_weight = loss_weight * (
          task_tower_cfg.in_task_space_weight * in_task_space +
          task_tower_cfg.out_task_space_weight * (1 - in_task_space))
    # The EasyRec framework will automatically add up the losses in self._loss_dict.
    self._loss_dict.update(
        self._build_loss_impl(
            task_tower_cfg.loss_type,
            label_name=self._label_name_dict[tower_name],
            loss_weight=loss_weight,
            num_class=task_tower_cfg.num_class,
            suffix='_%s' % tower_name))
  return self._loss_dict
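For intuition, a single build() call for a binary classification tower (the names click_label and click_logits here are hypothetical placeholders, not part of the EasyRec code) reduces to a weighted sigmoid cross-entropy, i.e. the -w · log p(t | ·) term of the loss derived earlier:

# Hypothetical usage of build() for one binary classification tower.
# click_label / click_logits stand in for the label tensor and the tower's
# output logits produced by build_predict_graph above.
click_loss = build(
    LossType.CLASSIFICATION,
    label=click_label,
    pred=click_logits,
    loss_weight=1.0,  # corresponds to the tower's `weight` in the config
    num_class=1)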
Application
Thanks to its strong results, DBMTL is widely used on PAI.
Take a live-streaming recommendation business as an example. The scenario has five objectives: is_click, is_view, view_costtime, is_on_mic, and on_mic_duration. Among them, is_click, is_view, and is_on_mic are binary classification tasks, while view_costtime and on_mic_duration are regression tasks that predict durations. The dependencies between user behaviors are:
• is_click => is_view
• is_click+is_view=> view_costtime
• is_click => is_on_mic
• is_click+is_on_mic => on_mic_duration
So the configuration is as follows:
model_config {
  dbmtl {
    bottom_dnn {
      hidden_units: [512, 256]
    }
    task_towers {
      tower_name: "is_click"
      label_name: "is_click"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [128, 96, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "is_view"
      label_name: "is_view"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [128, 96, 64]
      }
      relation_tower_names: ["is_click"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "view_costtime"
      label_name: "view_costtime"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [128, 96, 64]
      }
      relation_tower_names: ["is_click", "is_view"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "is_on_mic"
      label_name: "is_on_mic"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [128, 96, 64]
      }
      relation_tower_names: ["is_click"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "on_mic_duration"
      label_name: "on_mic_duration"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [128, 96, 64]
      }
      relation_tower_names: ["is_click", "is_on_mic"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
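Note how relation_tower_names in this configuration encodes exactly the behavior dependencies listed above: is_view and is_on_mic depend on is_click, view_costtime on is_click and is_view, and on_mic_duration on is_click and is_on_mic.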
It is worth mentioning that after the DBMTL model was launched, the online onlooker (view) rate increased by 18% and the on-mic rate by 14% compared with the previous GBDT+FM model, which optimized the single onlooker target.
