This topic describes the DeepFM algorithm.


DeepFM combines deep neural network (DNN) and factorization machine (FM) models, and can learn both lower-order explicit feature combinations and higher-order implicit feature combinations. It does not require manual feature engineering and is often used in recommendation systems or advertising systems.
  • You can input the following types of features:
    • Categorical features: string values, such as gender (male or female) or commodity category (such as clothing, toys, or electronics).
    • Numerical features: integer or floating-point values, such as user activity or commodity prices.
  • The model returns floating-point numbers between 0 and 1 that indicate the probability that the label is 1. The output can be used for ranking or binary classification.
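Conceptually, DeepFM feeds the same embedded features into an FM component and a DNN component and sums their outputs before a sigmoid. The following sketch uses the notation of the original DeepFM formulation and illustrates the idea only; it is not the exact implementation of this algorithm:

  y_hat = sigmoid(y_FM + y_DNN)
  y_FM  = w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j

The first-order and pairwise terms of y_FM capture low-order explicit feature combinations, while y_DNN, a feed-forward network over the concatenated feature embeddings, captures high-order implicit combinations.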


DeepFM is typically used in classification and ranking scenarios. It is particularly effective when manually built features do not directly reflect the prediction target. In recommendation scenarios, both low-order and high-order feature combinations affect the final behaviors of users, but you may not be able to identify these combinations manually. DeepFM can learn them automatically.

For example, personalized commodity recommendation usually requires a click estimation model. Historical user behaviors such as clicks, non-clicks, and purchases can be used as training data to predict the probability that a user will click or purchase. When users have many historical behaviors that do not directly indicate future clicks or purchases, DeepFM can combine these behaviors and convert sparse features into dense feature representations.


The parameters in the following table are the values that you can specify for the model_parameter parameter in the CREATE MODEL statement. Select values based on your needs.

metrics: The metrics used to evaluate the model. Default value: accuracy. Valid values:
  • accuracy: the accuracy. It is used to evaluate classification models.
  • binary_crossentropy: the cross entropy. It is used to evaluate binary classification problems.
  • mse: the mean squared error. It is used to evaluate regression models.
loss: The learning task and its learning objective. Default value: binary_crossentropy. Valid values:
  • binary_crossentropy: the cross entropy. It is used for binary classification problems.
  • mean_squared_error: the mean squared error. It is used for regression models.
optimizer: The optimizer. Default value: adam. Valid values:
  • adam: This algorithm combines the advantages of AdaGrad and gradient descent with momentum. It can adapt to sparse gradients (common in natural language and computer vision problems) and alleviate gradient oscillation.
  • sgd: stochastic gradient descent.
  • rmsprop: This algorithm improves on AdaGrad by introducing a decay weight, so that the accumulated squared gradients become an exponentially weighted sum.
validation_split: The ratio of data reserved for validation. Default value: 0.2.
epochs: The number of training iterations. Default value: 6.
batch_size: The batch size. A small batch size is prone to overfitting. Default value: 64.
task: The type of the task. Default value: binary. Valid values:
  • binary: the binary classification model.
  • regression: the regression model.
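For example, the following statement sketches how these parameters might be passed when you create a model. The table name, column names, and the exact model_parameter syntax are assumptions for illustration; adjust them to your environment:

CREATE MODEL my_deepfm WITH
(model_class = 'deepfm',
 x_cols = 'feature_a,feature_b',   -- hypothetical feature columns
 y_cols = 'label',                 -- hypothetical target column
 model_parameter = (loss = 'binary_crossentropy',
                    optimizer = 'adam',
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2,
                    task = 'binary'))
AS (SELECT * FROM my_schema.my_table);   -- hypothetical training table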


Create a model and an offline training task.
CREATE MODEL airline_deepfm WITH
(model_class = 'deepfm',
 x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
 y_cols = 'Delay')   -- 'Delay' is assumed to be the target column of db4ai.airlines
AS (SELECT * FROM db4ai.airlines);
Use the model for prediction.
SELECT Airline FROM PREDICT(MODEL airline_deepfm,
 SELECT * FROM db4ai.airlines LIMIT 20) WITH
(x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
 y_cols = 'Delay');  -- same assumed target column as in training