Recommend products based on user and item features - Platform For AI

This topic describes how to create a product recommendation model based on user and item features.

Prerequisites

A workspace is created. For more information, see Create a workspace.

Background information

In this sample pipeline, a model is trained by using real-world e-commerce data from April and May and evaluated by using data from June. After the model performance is verified, the model is deployed in Elastic Algorithm Service (EAS) as an online service.

Important

The sample pipeline uses anonymized real data from an e-commerce platform. The data is not intended for commercial purposes.

The sample pipeline and related data are included in a preset template provided by Machine Learning Designer. You can use the components in the template to create a recommendation model based on user and item features. Then, you can deploy the model that you train in Machine Learning Designer to EAS with a few clicks.

General workflow

Import data to MaxCompute to generate supervised, structured data.
Perform feature engineering operations such as data preprocessing and feature derivation. Feature derivation generates new data from existing data to better capture business-specific characteristics.
Split the data into two datasets. Use one dataset to train a binary classification model. Use the other dataset to evaluate the performance of the model.
Evaluate the performance of the model.
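
The time-based split in the workflow above (April and May for training, June for evaluation) can be sketched in a few lines of Python. This is only an illustration and not what the Designer components run internally. The row layout follows the four dataset fields, but the sample rows, the ISO date format, the year, and the function name `split_by_month` are all assumptions.

```python
from datetime import date

# Hypothetical rows: (user_id, item_id, active_type, active_date).
# The ISO date format and the year are assumptions for illustration only.
rows = [
    ("u1", "i1", "0", "2014-04-03"),
    ("u1", "i1", "1", "2014-05-10"),
    ("u2", "i2", "0", "2014-06-01"),
]


def split_by_month(rows, train_months=(4, 5), eval_months=(6,)):
    """Return (train, evaluation) lists split on the month of active_date."""
    train, evaluation = [], []
    for row in rows:
        month = date.fromisoformat(row[3]).month
        if month in train_months:
            train.append(row)
        elif month in eval_months:
            evaluation.append(row)
    return train, evaluation


train_rows, eval_rows = split_by_month(rows)
```

Splitting on time rather than at random keeps the evaluation data strictly later than the training data, which mirrors how the deployed model would see new behavior.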

Dataset

The sample pipeline uses a dataset from Tianchi Big Data Competition. The dataset contains the shopping data of an e-commerce platform from April to June. The following table describes the fields in the dataset.

Field	Short description	Type	Full description
user_id	User ID	STRING	The ID of the user who purchased an item.
item_id	Item ID	STRING	The ID of the purchased item.
active_type	Shopping behavior	STRING	0: Click an item. 1: Purchase an item. 2: Add an item to favorites. 3: Add an item to the shopping cart.
active_date	Purchase date	STRING	The date on which the user purchased the item.
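
As a reading aid, the fields in the table can be modeled as a small typed record. This is a minimal sketch: the class names `ActiveType` and `Behavior` and the helper `parse_row` are hypothetical and are not part of PAI or the dataset. Only the four field names and the four `active_type` codes come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class ActiveType(Enum):
    """Shopping behavior codes as listed in the field table."""
    CLICK = "0"
    PURCHASE = "1"
    FAVORITE = "2"
    ADD_TO_CART = "3"


@dataclass(frozen=True)
class Behavior:
    """One row of the dataset; every field is stored as a string in the source."""
    user_id: str
    item_id: str
    active_type: ActiveType
    active_date: str


def parse_row(fields):
    """Convert a 4-element sequence of strings into a Behavior record."""
    user_id, item_id, active_type, active_date = fields
    # ActiveType(...) raises ValueError for a code outside 0-3.
    return Behavior(user_id, item_id, ActiveType(active_type), active_date)
```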

The following figure shows the raw data that is used by the sample pipeline. [Figure: raw sample data]

Procedure

Go to the Machine Learning Designer page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
3. In the left-side navigation pane of the workspace page, choose Model Development and Training > Visual Modeling (Designer) to go to the Machine Learning Designer page.

Create a pipeline.

On the Visualized Modeling (Designer) page, click the Preset Templates tab.
On the Preset Templates tab, find Recommendation Based on Object Characteristics and click Create.
In the Create Pipeline dialog box, configure the parameters. You can use their default values.
The Pipeline Data Path parameter specifies the Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.
Click OK.
Creating the pipeline takes about 10 seconds.
On the Pipelines tab, double-click Recommendation Based on Object Characteristics to open the pipeline.

View the components of the pipeline on the canvas. The following figure shows the pipeline that is automatically created based on the preset template.

Section	Description
①	The components in this section perform feature engineering, which derives the following features from the original dataset. User features: the number of purchases, number of clicks, and click-to-purchase ratio of each user. The click-to-purchase ratio is calculated by dividing the number of clicks by the number of purchases. The ratio reflects how decisive a user is when shopping: a lower ratio means fewer clicks per purchase. Item features: the number of purchases, number of clicks, and purchase-to-click ratio of each item. The purchase-to-click ratio is calculated by dividing the number of purchases by the number of clicks. After feature engineering, the dataset is expanded from 4 fields to 10 fields, as shown in the following figure.
②	The components in this section use a logistic regression algorithm to train a model. To save the trained model, click the Logistic Regression for Binary Classification component, click the Fields Setting tab in the right-side pane, and then select Whether To Generate PMML.
③	The components in this section evaluate the performance of the model by using the data that is not used to train the model. In most cases, you can use the Confusion Matrix and Binary Classification Evaluation components to evaluate the performance of a recommendation model.
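
The feature derivation described for section ① can be sketched in plain Python. This is an illustration under stated assumptions, not the SQL or components that the template actually uses. The output field names, the sample rows, and the choice to return 0.0 when a denominator is zero are all assumptions. The two ratio definitions follow the table above.

```python
from collections import Counter

CLICK, PURCHASE = "0", "1"  # active_type codes from the dataset table


def ratio(numerator, denominator):
    """Divide safely; returning 0.0 for a zero denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def derive_features(rows):
    """Expand 4-field rows into 10-field dicts with user and item features."""
    user_clicks, user_purchases = Counter(), Counter()
    item_clicks, item_purchases = Counter(), Counter()
    for user_id, item_id, active_type, _ in rows:
        if active_type == CLICK:
            user_clicks[user_id] += 1
            item_clicks[item_id] += 1
        elif active_type == PURCHASE:
            user_purchases[user_id] += 1
            item_purchases[item_id] += 1

    expanded = []
    for user_id, item_id, active_type, active_date in rows:
        expanded.append({
            "user_id": user_id,
            "item_id": item_id,
            "active_type": active_type,
            "active_date": active_date,
            # User features: clicks divided by purchases.
            "user_purchases": user_purchases[user_id],
            "user_clicks": user_clicks[user_id],
            "user_click_to_purchase": ratio(user_clicks[user_id], user_purchases[user_id]),
            # Item features: purchases divided by clicks.
            "item_purchases": item_purchases[item_id],
            "item_clicks": item_clicks[item_id],
            "item_purchase_to_click": ratio(item_purchases[item_id], item_clicks[item_id]),
        })
    return expanded


# Hypothetical sample: u1 clicks i1 three times and buys it once; u2 only clicks.
sample_rows = [
    ("u1", "i1", "0", "d1"),
    ("u1", "i1", "0", "d2"),
    ("u1", "i1", "0", "d3"),
    ("u1", "i1", "1", "d4"),
    ("u2", "i1", "0", "d5"),
]
features = derive_features(sample_rows)
```

Each output row carries the 4 original fields plus 3 user features and 3 item features, which matches the expansion from 4 fields to 10 fields.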

Run the pipeline and view the prediction results.
1. In the upper-left corner of the canvas, click the Run icon.
2. After the pipeline finishes running, right-click the Logistic Regression for Binary Classification component on the canvas and choose Model Options > Export to PMML Files to export the trained model.
3. Right-click the Prediction component on the canvas and choose View Data > Prediction Result Output to view the prediction results of the model.
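
To make the prediction output easier to interpret, the following sketch shows how a logistic regression model turns feature values into a purchase probability. It is a generic batch gradient descent implementation written for illustration. It is not the implementation behind the Logistic Regression for Binary Classification component, and the function names, learning rate, epoch count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class (for example, a purchase)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            err = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Toy data: one feature, with larger values labeled as positive.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
low = predict_proba(weights, bias, [0.0])
high = predict_proba(weights, bias, [3.0])
```

In this toy run, `high` is above 0.5 and `low` is below 0.5, so a 0.5 threshold separates the two ends of the feature range.
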
View the evaluation results of the model.
1. Right-click the Binary Classification Evaluation component on the canvas and select Visual Analysis.
2. In the Binary Classification Evaluation section, click the Evaluation chart tab to view the receiver operating characteristic (ROC) curve.
  The blue area represents the area under the curve (AUC). A larger blue area indicates a better model.
3. Right-click the Confusion Matrix component on the canvas and select Visual Analysis.
4. In the Confusion Matrix section, click the Confusion matrix tab to view the evaluation results.
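
The two evaluation views above can be reproduced on a small scale to see what the numbers mean. The sketch below computes a confusion matrix at a fixed threshold and the AUC as the fraction of positive-negative pairs that are ranked correctly. It is not the code behind the Confusion Matrix or Binary Classification Evaluation components. The function names, the 0.5 threshold, and the sample scores are assumptions.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at the given score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = purchased) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
matrix = confusion_matrix(labels, scores)
area = auc(labels, scores)
```

The pairwise loop is quadratic in the number of rows, so it suits small illustrations only. A rank-based computation is the usual choice for large datasets.
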
Deploy the model.
If the model performance meets your expectations, click Models in the upper part of the canvas to deploy the model as an online service. For more information, see Deploy a model as an online service.