This topic describes how to recommend products based on object features.

Background information

In the experiment described in this topic, a prediction model is trained on e-commerce shopping data from April and May and evaluated on shopping data from June. The optimal model is then deployed as a RESTful API that can be called in business scenarios.
Notice The experiment is based on real data that was collected from an e-commerce platform and then masked. The data is not intended for commercial purposes.

The data and the entire workflow of the experiment are preset in the Recommendation Based on Object Characteristics template of Machine Learning Studio. You can quickly implement recommendation based on object characteristics by dragging the components that Machine Learning Studio provides. In addition, Machine Learning Studio supports automatic parameter tuning and allows you to deploy a model as a RESTful API with ease.

Dataset

The experiment described in this topic is based on a dataset that is provided by the Tianchi Big Data Competition. The dataset includes user shopping behavior from April, May, and June. The following table describes the fields in the dataset.
Field Meaning Data type Description
user_id User ID STRING The ID of the user.
item_id Item ID STRING The ID of the item.
active_type Shopping behavior STRING The type of the shopping behavior. Valid values:
  • 0: click
  • 1: purchase
  • 2: add to favorites
  • 3: add to shopping cart
active_month Active month STRING The month in which the shopping behavior was performed.
The following figure shows the sample data that is used in the experiment.
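
The following Python sketch shows one way to load and split a local copy of such a dataset for inspection. The file name ecommerce_behavior.csv and the two-digit month format are assumptions made for this sketch; in Machine Learning Studio, the data is read from the preset table instead.

import pandas as pd

# Load the four-field dataset. The file name and the two-digit month format are
# assumptions for this local sketch; Machine Learning Studio reads a preset table.
df = pd.read_csv(
    "ecommerce_behavior.csv",
    dtype={"user_id": str, "item_id": str, "active_type": str, "active_month": str},
)

# Mirror the split described above: April and May for training, June for evaluation.
train_df = df[df["active_month"].isin(["04", "05"])]
eval_df = df[df["active_month"] == "06"]
print(train_df.head())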

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Recommendation Based on Object Characteristics.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.
      Parameter Description
      Name The name of the experiment. Default value: Recommendation Based on Object Characteristics. The name must be 1 to 32 characters in length. The default value exceeds this limit, so enter a name that meets the requirement, for example, RecommendBasedonCharacteristics.
      Project The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description The description of the experiment. Default value: Recommendation based on object characteristics.
      Save To The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Click RecommendBasedonCharacteristics_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created, and RecommendBasedonCharacteristics_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically generates for the experiment. The following figure shows the canvas of the experiment. The numbered areas in the figure are described in the following table.
      Area No. Description
      1 The components in this area perform feature engineering. The raw data includes only four fields. Feature engineering is performed to expand the dimensions of the raw data. The features in this experiment include the features of the recommendation targets and objects.
      • The recommendation targets are the users to whom items are recommended. The dimensions that are expanded for each user include the number of purchases by the user, the number of clicks by the user, and the click-to-purchase ratio of the user. The click-to-purchase ratio is calculated by dividing the number of clicks by the number of purchases and describes the purchase intention of the user.
      • The recommendation objects are the items that are recommended to users. The dimensions that are expanded for each item include the number of purchases of the item, the number of clicks on the item, and the purchase-to-click ratio of the item. The purchase-to-click ratio is calculated by dividing the number of purchases by the number of clicks.
      After feature engineering, the dataset is expanded from 4 fields to 10 fields, as shown in the following figure. A conceptual sketch of this expansion is provided after this table.
      2 The components in this area train a model. In this experiment, a logistic regression algorithm is used to train a model. You can use the AutoML engine that is preset in Machine Learning Studio to automatically adjust the parameters of the logistic regression component. This way, an optimal model can be obtained.
      3 The components in this area evaluate the quality of the model. The reserved data that is not used to train the model is used for the evaluation. Recommendation experiments of this type are binary classification experiments, so you can use a confusion matrix and a binary classification evaluation component to evaluate the quality of the model.
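      The following pandas sketch gives a concrete picture of the feature expansion that the components in area 1 perform. It is only a conceptual illustration: the column names are made up for this sketch and are not the exact field names that the template produces.

      import pandas as pd

      def add_behavior_features(df: pd.DataFrame) -> pd.DataFrame:
          """Expand the four raw fields with per-user and per-item behavior statistics."""
          flags = df.assign(
              clicks=(df["active_type"] == "0").astype(int),
              purchases=(df["active_type"] == "1").astype(int),
          )

          # Per-user dimensions: number of clicks, number of purchases, and the
          # click-to-purchase ratio that reflects the purchase intention of the user.
          user_stats = flags.groupby("user_id")[["clicks", "purchases"]].sum()
          user_stats["click_purchase_ratio"] = (
              user_stats["clicks"] / user_stats["purchases"].clip(lower=1)
          )
          user_stats = user_stats.add_prefix("user_")

          # Per-item dimensions: number of clicks, number of purchases, and the
          # purchase-to-click ratio of the item.
          item_stats = flags.groupby("item_id")[["clicks", "purchases"]].sum()
          item_stats["purchase_click_ratio"] = (
              item_stats["purchases"] / item_stats["clicks"].clip(lower=1)
          )
          item_stats = item_stats.add_prefix("item_")

          # Join the expanded dimensions back onto the raw records: 4 fields grow to 10.
          out = df.merge(user_stats, left_on="user_id", right_index=True)
          return out.merge(item_stats, left_on="item_id", right_index=True)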
  3. Use the AutoML engine to adjust the parameters of the logistic regression component.
    1. In the top toolbar of the canvas, choose AutoML > Auto Parameter Tuning.
    2. In the Auto Parameter Tuning dialog box, keep Logistic Regression for Binary Classification-1 selected in the Select Algorithm step and click Next.
    3. In the Configure Parameter Tuning step, set the parameters for parameter tuning, as described in the following table. Then, click Next.
      Parameter Description
      Data Splitting Ratio The split ratio of data. Select 0.7 from the drop-down list.
      Parameter Tuning Method The parameter tuning method. Select EVOLUTIONARY_OPTIMIZER from the drop-down list.
      Exploration Samples The number of exploration samples. Select 5 from the drop-down list.
      Explorations The number of explorations. Select 2 from the drop-down list.
      Convergence Coefficient The convergence coefficient. Select 0.5 from the drop-down list.
      Logistic Regression for Binary Classification parameters:
      Regularization Type The regularization type. Select None from the drop-down list in the Custom Range column.
      Regularization Coefficient The regularization coefficient. Specify the range 0.1-2 in the Custom Range column.
      Minimum Convergence Deviance The minimum convergence deviance. Specify the range 0.00000001-0.00001 in the Custom Range column.
      Maximum Iterations The maximum number of iterations. Specify the range 50-500 in the Custom Range column.
    4. In the Configure Model Output step, set the parameters for model output and click Next.
      Parameter Description
      Generated Models The number of models to be generated. The system automatically calculates the number based on the values of the parameters in the Configure Parameter Tuning step. You do not need to set this parameter.
      Algorithm Type The type of the algorithm to be used. Select Binary Classification Evaluation from the drop-down list.
      Evaluation Criteria The criterion for evaluation. Select AUC from the drop-down list.
      Saved Models The number of models to be saved. Select 5 from the drop-down list.
      Pass Down Model Specifies whether to pass down the optimal model to subsequent components. Turn on the Pass Down Model switch.
    5. In the Auto Parameter Tuning message, click OK.
    6. In the top toolbar of the canvas, click Run.
    7. After the Logistic Regression for Binary Classification-1 component is run, right-click this component and select Parameter Tuning Details. In the AutoML-Parameter Tuning Details dialog box, view the parameter tuning details.
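    For readers who want a rough offline analog of the search that is configured in this step, the following scikit-learn sketch tries comparable logistic regression settings and scores them by AUC. It does not reproduce the EVOLUTIONARY_OPTIMIZER strategy or the AutoML engine; the parameter names are scikit-learn's, and X_train and y_train are placeholders for the 70% training split.

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # Candidate settings that roughly mirror the custom ranges configured above.
    param_grid = {
        "C": [0.1, 0.5, 1.0, 2.0],   # stand-in for the 0.1-2 regularization coefficient range
        "tol": [1e-8, 1e-6, 1e-5],   # the 0.00000001-0.00001 minimum convergence deviance range
        "max_iter": [50, 200, 500],  # the 50-500 maximum iterations range
    }

    search = GridSearchCV(
        LogisticRegression(),
        param_grid,
        scoring="roc_auc",  # AUC, matching the evaluation criterion selected above
    )
    # search.fit(X_train, y_train)   # X_train, y_train: placeholders for the training split
    # search.best_estimator_         # analogous to the optimal model that is passed down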
  4. View the model evaluation report.
    1. After the experiment is run, right-click Binary Classification Evaluation-1 on the canvas and select View Evaluation Report.
    2. In the Evaluation Report dialog box, click the Charts tab and view the receiver operating characteristic (ROC) curve.
      In the ROC curve chart, the blue area represents the area under the curve (AUC) value. A larger blue area indicates higher model quality.
    3. Right-click Confusion Matrix-1 on the canvas and select View Evaluation Report.
    4. In the Confusion Matrix dialog box, view the evaluation result in the confusion matrix.
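    If you export the predictions, the same two evaluation views can be reproduced offline with scikit-learn, as in the following sketch. The y_true and y_score arrays are placeholder values; in practice they would hold the actual labels of the reserved data and the predicted purchase probabilities.

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    # Placeholder labels and scores, purely to make the snippet runnable.
    y_true = np.array([0, 0, 1, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

    auc = roc_auc_score(y_true, y_score)                         # area under the ROC curve
    cm = confusion_matrix(y_true, (y_score >= 0.5).astype(int))  # 0.5 decision threshold
    print(f"AUC = {auc:.4f}")
    print(cm)  # rows: actual class (0, 1); columns: predicted class (0, 1)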
  5. Deploy the model as a RESTful API.
    If the model meets your expectation, click Deploy in the top toolbar of the canvas to deploy the model as a RESTful API. For more information, see Deploy a model.
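    After deployment, the service can be called over HTTP. The following sketch is generic: the endpoint URL, token, and request body are placeholders, and the exact request format of your deployed service is described in Deploy a model.

    import requests

    ENDPOINT = "https://example.com/api/predict/recommend_model"  # placeholder endpoint URL
    TOKEN = "<your-service-token>"                                # placeholder authorization token

    # Illustrative request body; the deployed service defines the real input schema.
    payload = {"user_id": "10001", "item_id": "20002"}

    response = requests.post(
        ENDPOINT,
        json=payload,
        headers={"Authorization": TOKEN},
        timeout=10,
    )
    print(response.status_code, response.text)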