This experiment uses anonymized data from real-life e-commerce scenarios. The data is for learning only and must not be used for commercial purposes.
The previous issue describes how to use Machine Learning Platform for AI (PAI) to build a recommendation system based on collaborative filtering. This topic describes recommendation methods based on the features of recommendation objects and targets.
The following figure shows the general flowchart of recommendation based on object features.
- Import the supervised, structured data to MaxCompute.
- Perform feature engineering, including data preprocessing and feature derivation. Feature derivation aims to expand data dimensions so that data can reflect business features to the maximum extent.
- Split the data into two parts. One part is used to train a binary classification model with a classification algorithm. The other part is used to test how well the model performs.
- Determine the model effect by using the evaluation component.
Create a prediction model by training on the April and May data of a real-life e-commerce scenario. Evaluate the prediction model against the shopping statistics for June to determine the optimal model. Deploy the optimal model as an online HTTP service that can be called in business scenarios.
This experiment is conducted in PAI Studio to build a recommendation system based on object features simply by dragging and dropping components. The data and complete business flow in this experiment are built in the corresponding template on the homepage. The template is ready for use.
This experiment uses data provided by Tianchi Competition, including the shopping behavior statistics before July and the data since July. The fields are as follows.
| Field | Meaning | Type | Description |
| --- | --- | --- | --- |
| user_id | User ID | string | The ID of a buyer. |
| item_id | Item ID | string | The ID of the purchased item. |
| active_type | Shopping behavior | string | 0: Click; 1: Buy; 2: Add to Favorites; 3: Add to Shopping Cart. |
| active_date | The time of shopping | string | The time when the shopping occurs. |
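As an illustration of the field definitions above, the following minimal sketch maps one raw row to a typed record. The class and function names are hypothetical, the sample row is made up, and `active_date` is kept as a plain string because its format is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class ActiveType(Enum):
    """Shopping behavior codes of the active_type field (stored as strings)."""
    CLICK = "0"
    BUY = "1"
    FAVORITE = "2"
    CART = "3"


@dataclass(frozen=True)
class ShoppingRecord:
    user_id: str
    item_id: str
    active_type: ActiveType
    active_date: str  # left unparsed: the date format is an unknown here


def parse_record(row):
    """Convert a 4-tuple of strings into a typed ShoppingRecord."""
    user_id, item_id, active_type, active_date = row
    # ActiveType(...) raises ValueError for codes outside "0".."3".
    return ShoppingRecord(user_id, item_id, ActiveType(active_type), active_date)


# Made-up sample row, for illustration only.
record = parse_record(("u1", "i9", "1", "sample-date"))
print(record.active_type.name)
```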
The experiment is conducted in PAI Studio, where you build the recommendation system based on object features by dragging and dropping components. PAI Studio also supports automatic parameter tuning and one-click model deployment.
The following figure shows the experiment flowchart.
Perform feature engineering to expand the dimensions of the raw data, which has only four fields. The recommendation scenario includes two types of features: the features of the targets to which items are recommended and the features of the items that are recommended.
In the case of item recommendation:
- The recommendation object is an item. The expanded dimensions include the number of purchases of this item, the number of clicks on this item, and the purchase-to-click ratio of this item, which is calculated by dividing the purchase quantity by the click quantity.
- The recommendation target is a user. The expanded dimensions include the total number of purchases made by this user, the total number of clicks by this user, and the click-to-purchase ratio of this user, which is calculated by dividing the click quantity by the purchase quantity. This ratio indicates the number of times that the user clicks before buying an item. It describes the user’s purchase intention.
The data is expanded from 4 fields to 10 fields.
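The feature derivation described above can be sketched in plain Python as follows. This is not the PAI Studio component logic, only an illustration under stated assumptions: the output field names are hypothetical, a ratio with a zero denominator is set to 0.0, and the sample rows are made up. The item ratio is purchases divided by clicks and the user ratio is clicks divided by purchases, following the calculations stated in the text.

```python
from collections import Counter

CLICK, BUY = "0", "1"  # active_type codes from the field table


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 if the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def derive_features(records):
    """Expand (user_id, item_id, active_type, active_date) rows to 10 fields."""
    records = list(records)
    item_buys, item_clicks = Counter(), Counter()
    user_buys, user_clicks = Counter(), Counter()
    for user_id, item_id, active_type, _ in records:
        if active_type == BUY:
            item_buys[item_id] += 1
            user_buys[user_id] += 1
        elif active_type == CLICK:
            item_clicks[item_id] += 1
            user_clicks[user_id] += 1

    wide = []
    for user_id, item_id, active_type, active_date in records:
        wide.append({
            "user_id": user_id,
            "item_id": item_id,
            "active_type": active_type,
            "active_date": active_date,
            # Item features: purchases, clicks, purchases / clicks.
            "item_buys": item_buys[item_id],
            "item_clicks": item_clicks[item_id],
            "item_buy_click_ratio": safe_ratio(item_buys[item_id], item_clicks[item_id]),
            # User features: purchases, clicks, clicks / purchases.
            "user_buys": user_buys[user_id],
            "user_clicks": user_clicks[user_id],
            "user_clicks_per_buy": safe_ratio(user_clicks[user_id], user_buys[user_id]),
        })
    return wide


# Made-up sample rows, for illustration only.
sample = [
    ("u1", "i1", "0", "d1"),
    ("u1", "i1", "0", "d2"),
    ("u1", "i1", "1", "d3"),
    ("u2", "i1", "0", "d4"),
]
wide = derive_features(sample)
print(len(wide[0]))  # number of fields per row
```

Each output row carries the 4 original fields plus 3 item features and 3 user features, which matches the 10 fields mentioned above.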
Feature engineering produces a large, wide table of structured data that can be used for model training. This experiment uses the logistic regression algorithm. For the best training results, the following logistic regression parameters must be set properly.
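To show why parameter settings matter for logistic regression, here is a minimal, dependency-free sketch that trains a model by batch gradient descent. It is a stand-in for illustration, not the PAI logistic regression component. The parameter names `learning_rate`, `epochs`, and `l2` are assumptions and may not match the component's own parameters, and the training data is a toy example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=500, l2=0.0):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1|x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus an L2 penalty on the weights (not the bias).
        w = [wj - learning_rate * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(model, x):
    """Return the predicted probability of the positive class."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: one feature, label 1 when the feature value is large.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
model = train_logistic_regression(X, y, learning_rate=0.5, epochs=2000)
print(predict_proba(model, [1.0]) > 0.5)
```

A learning rate that is too small leaves the model undertrained after the same number of epochs, and a large `l2` value shrinks the weights toward zero, which is why such settings are usually tuned rather than guessed.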
PAI provides the AutoML engine for parameter tuning. Open AutoML and set the parameter value range and evaluation criteria of the algorithm that requires parameter tuning. Then, the engine finds the most suitable parameter settings with minimum resource consumption. See the following figure.
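The idea of searching a parameter value range against an evaluation criterion can be sketched with an exhaustive grid search. This is only the simplest possible strategy and is not a description of how the AutoML engine searches internally. The parameter names and the `toy_evaluate` function are hypothetical stand-ins for "train a model with these settings and return its evaluation metric".

```python
from itertools import product


def grid_search(param_ranges, evaluate):
    """Try every combination in param_ranges; return (best_params, best_score)."""
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Hypothetical metric that peaks at learning_rate=0.1 and l2=0.01."""
    return -(params["learning_rate"] - 0.1) ** 2 - (params["l2"] - 0.01) ** 2


best_params, best_score = grid_search(
    {"learning_rate": [0.01, 0.1, 1.0], "l2": [0.0, 0.01, 0.1]},
    toy_evaluate,
)
print(best_params)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter. Tuning engines typically use cheaper search strategies to reduce that cost.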
The model evaluation module uses the reserved data that was not used for model training to evaluate the model quality. This recommendation experiment is a binary classification problem. You can use the confusion matrix and the Binary Classification Evaluation component to evaluate the model quality.
Binary classification evaluation: Choose Components and click the Charts tab. The ROC curve shown in the following figure appears. The blue area indicates the AUC value. The larger the area, the higher the model quality.
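For reference, the AUC value can also be computed directly from labels and predicted scores. The following sketch uses the pairwise definition: AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name and the sample values are made up for illustration, and this is not the implementation used by the evaluation component.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores: 3 of the 4 positive/negative pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic, so a sort-based method is preferable for large datasets.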
The confusion matrix can be used to determine specific metrics such as the prediction accuracy, recall rate, and F1 score.
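The relationship between the confusion matrix and these metrics can be sketched as follows. The function names and the sample labels are hypothetical. Precision is included because the F1 score is the harmonic mean of precision and recall.

```python
def confusion_matrix(labels, predictions):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for label, pred in zip(labels, predictions):
        if label == 1 and pred == 1:
            tp += 1
        elif label == 0 and pred == 1:
            fp += 1
        elif label == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall, and F1 from confusion matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: 3 true positives, 1 false positive, 3 true negatives, 1 false negative.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_matrix(labels, predictions)
metrics = classification_metrics(*counts)
print(counts, metrics)
```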
If the model performance meets expectations, deploy the model as an online service with one click through Elastic Algorithm Service (EAS) of PAI. Then, the model can be accessed over HTTP. On the canvas, click Deploy, select Deploy Model Online, and select the target model.
After the model is deployed as an online service, it can be accessed through HTTP requests in business scenarios. This streamlines the process from model training through PAI to business application.
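A call to the deployed service from business code might look like the following sketch, which only builds the HTTP request and does not send it. Everything specific here is an assumption: the endpoint URL and token are placeholders, the JSON body layout and feature names are hypothetical, and passing the token in an `Authorization` header should be checked against the invocation details shown for the actual service.

```python
import json
import urllib.request


def build_prediction_request(endpoint, token, features):
    """Build (but do not send) a POST request carrying feature rows as JSON."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": token,  # assumption: token-based authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint, token, and feature names, for illustration only.
request = build_prediction_request(
    "http://example.invalid/api/predict/hypothetical_service",
    "hypothetical-token",
    [{"item_buy_click_ratio": 0.33, "user_clicks_per_buy": 2.0}],
)
print(request.get_method(), request.full_url)

# To send the request against a real endpoint:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```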