Platform For AI: Recommend products based on object features

Last Updated: Oct 16, 2023

This topic describes how to recommend products based on object features.

Prerequisites

A workspace is created. For more information, see Create a workspace.

Background information

In the pipeline that is used in this topic, a prediction model is trained based on e-commerce data from April and May and evaluated based on shopping data from June. An optimal model is deployed as a RESTful API to be called in business scenarios.

Important

The pipeline is based on masked real data that is collected from an e-commerce platform. The data is not intended for commercial purposes.

The data and the entire workflow of the pipeline are preset in the pipeline template of Machine Learning Designer. You can quickly implement recommendations based on object features by dragging components onto the canvas in Machine Learning Designer. Machine Learning Designer also supports model deployment with a few clicks, so you can easily deploy a model as a RESTful API.

General process of product recommendation based on object features

Figure: Process of feature-based recommendation
  1. Import data to MaxCompute to generate supervised, structured data.

  2. Perform feature engineering, including operations such as data preprocessing and feature derivation. Feature derivation expands dimensions for data, so that the data can better demonstrate business characteristics.

  3. Split the data into two datasets. Use a classification algorithm to train a binary classification model on one dataset. Use a prediction component to apply the trained model to the other dataset and generate prediction results.

  4. Use an evaluation component to evaluate the quality of the model.
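The split in step 3 can be sketched as a time-based split that mirrors the background information: April and May for training, June for evaluation. This is a minimal illustration, not the Designer implementation. The date format `YYYY-MM-DD`, the year, and the sample rows are assumptions made for the example.

```python
from datetime import datetime

# Assumption: active_date is formatted as YYYY-MM-DD. Adjust DATE_FORMAT otherwise.
DATE_FORMAT = "%Y-%m-%d"
TRAIN_MONTHS = {4, 5}  # April and May
EVAL_MONTHS = {6}      # June


def split_by_month(rows):
    """Return (train_rows, eval_rows) based on the month of active_date."""
    train, evaluation = [], []
    for row in rows:
        month = datetime.strptime(row["active_date"], DATE_FORMAT).month
        if month in TRAIN_MONTHS:
            train.append(row)
        elif month in EVAL_MONTHS:
            evaluation.append(row)
    return train, evaluation


# Hypothetical rows; the values are made up for illustration.
rows = [
    {"user_id": "u1", "item_id": "i1", "active_type": "0", "active_date": "2014-04-03"},
    {"user_id": "u1", "item_id": "i1", "active_type": "1", "active_date": "2014-05-12"},
    {"user_id": "u2", "item_id": "i2", "active_type": "0", "active_date": "2014-06-20"},
]
train_rows, eval_rows = split_by_month(rows)
print(len(train_rows), len(eval_rows))
```

Splitting by time rather than at random keeps later behavior out of the training data, which matches how the model would be used in practice.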

Datasets

The pipeline that is used in this topic uses a dataset that is provided by the Tianchi Big Data Competition. The dataset contains shopping behavior data from April and May and from June. The following table describes the fields in the dataset.

| Field | Name | Type | Description |
|---|---|---|---|
| user_id | User ID | STRING | The ID of the user. |
| item_id | Item ID | STRING | The ID of the item. |
| active_type | Shopping behavior | STRING | 0: click; 1: purchase; 2: add to favorites; 3: add to shopping cart |
| active_date | Purchase date | STRING | The date on which the user purchased the item. |
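As an illustration of the fields above, the following sketch models one row of the dataset as a typed record. The class and function names are hypothetical, and the sample values are made up. Only the field names and the `active_type` codes come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class ActiveType(Enum):
    """Shopping behavior codes of the active_type field."""
    CLICK = "0"
    PURCHASE = "1"
    FAVORITE = "2"
    CART = "3"


@dataclass(frozen=True)
class Behavior:
    user_id: str
    item_id: str
    active_type: ActiveType
    active_date: str  # kept as a string, as in the dataset


def parse_row(row: dict) -> Behavior:
    """Convert one raw row (all values are strings) into a Behavior record."""
    return Behavior(
        user_id=row["user_id"],
        item_id=row["item_id"],
        active_type=ActiveType(row["active_type"]),
        active_date=row["active_date"],
    )


# Hypothetical sample row; the date format is an assumption.
sample = parse_row(
    {"user_id": "u1", "item_id": "i9", "active_type": "1", "active_date": "2014-04-01"}
)
print(sample.active_type.name)
```

An unknown code such as `"7"` raises `ValueError` in `ActiveType(...)`, which surfaces malformed rows early.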

The following figure shows the sample raw data that is used in the pipeline.

Figure: Sample raw data

Procedure

  1. Go to the Machine Learning Designer page.

    1. Log on to the Machine Learning Platform for AI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

  2. Create a pipeline.

    1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.

    2. On the Preset Templates tab, click Create in the Recommendation Based on Object Characteristics section.

    3. In the Create Pipeline dialog box, configure the parameters. You can use their default values.

      The value specified for the Pipeline Data Path parameter is the Object Storage Service (OSS) bucket path of the temporary data and models generated during the runtime of the pipeline.

    4. Click OK.

      It requires about 10 seconds to create the pipeline.

    5. In the Pipelines tab, double-click Recommendation Based on Object Characteristics to enter the pipeline.

    6. View the components of the pipeline on the canvas as shown in the following figure. The system automatically creates the pipeline based on the preset template.

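      Figure: Components of the pipeline on the canvas

      The pipeline consists of three sections, described in order below: feature engineering, model training, and model evaluation.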

      The components in this section perform feature engineering. The raw data includes only four fields. Feature engineering is performed to expand the dimensions of the raw data. The features in this pipeline include the features of the items and the users to whom these items are recommended.

      • The dimensions that are expanded for each user include the number of purchases, the number of clicks, and the click-to-purchase ratio of the user. The click-to-purchase ratio is calculated by dividing the number of purchases by the number of clicks. The ratio describes the purchase intention of the user.

      • The dimensions that are expanded for each item include the number of purchases, the number of clicks, and the purchase-to-click ratio of the item. The purchase-to-click ratio is calculated by dividing the number of purchases by the number of clicks.

      After feature engineering, the dataset is expanded from 4 fields to 10 fields, as shown in the following figure.

      Figure: Data after feature engineering
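The expansion from 4 fields to 10 fields can be sketched as follows. This is a minimal illustration, not the preset components. The derived field names are hypothetical, the sample rows are made up, and both ratios are assumed to be the number of purchases divided by the number of clicks.

```python
from collections import Counter

CLICK, PURCHASE = "0", "1"  # active_type codes from the dataset


def ratio(purchases: int, clicks: int) -> float:
    """Purchases divided by clicks; 0.0 when there are no clicks."""
    return purchases / clicks if clicks else 0.0


def add_features(rows):
    """Append three user-level and three item-level features to every row."""
    user_clicks, user_purchases = Counter(), Counter()
    item_clicks, item_purchases = Counter(), Counter()
    for r in rows:
        if r["active_type"] == CLICK:
            user_clicks[r["user_id"]] += 1
            item_clicks[r["item_id"]] += 1
        elif r["active_type"] == PURCHASE:
            user_purchases[r["user_id"]] += 1
            item_purchases[r["item_id"]] += 1

    enriched = []
    for r in rows:
        u, i = r["user_id"], r["item_id"]
        enriched.append({
            **r,  # the 4 original fields
            "user_purchases": user_purchases[u],
            "user_clicks": user_clicks[u],
            "user_ratio": ratio(user_purchases[u], user_clicks[u]),
            "item_purchases": item_purchases[i],
            "item_clicks": item_clicks[i],
            "item_ratio": ratio(item_purchases[i], item_clicks[i]),
        })
    return enriched


# Hypothetical rows; the values are made up for illustration.
rows = [
    {"user_id": "u1", "item_id": "i1", "active_type": "0", "active_date": "2014-04-01"},
    {"user_id": "u1", "item_id": "i1", "active_type": "0", "active_date": "2014-04-02"},
    {"user_id": "u1", "item_id": "i1", "active_type": "1", "active_date": "2014-04-02"},
    {"user_id": "u2", "item_id": "i1", "active_type": "0", "active_date": "2014-05-09"},
]
enriched = add_features(rows)
print(len(enriched[0]))  # number of fields per row after feature engineering
```

The zero-click guard in `ratio` avoids a division error for users or items that have purchases but no recorded clicks.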

      The pipeline uses a logistic regression algorithm for model training.

      To generate a Predictive Model Markup Language (PMML) model, click the Logistic Regression for Binary Classification component, select the Fields Setting tab in the upper-right corner of the canvas, and then select Whether To Generate PMML.
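To show what a logistic regression model for binary classification does, the following sketch trains one with plain batch gradient descent. It is not the algorithm implementation used by the Logistic Regression for Binary Classification component. The function names, hyperparameters, and toy data (two ratio features per sample, label 1 for a purchase) are assumptions for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that sample x belongs to the positive class."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data, made up for illustration: [user ratio, item ratio] -> purchased or not.
X = [[0.1, 0.1], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
print(predict_proba(weights, bias, [0.9, 0.8]) > 0.5)
```

The model outputs a probability between 0 and 1, which is why a threshold is needed later to turn scores into purchase or no-purchase predictions.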

      The components in this section evaluate the quality of the model by using the reserved data that was not used for training. Recommendation pipelines such as this one are binary classification pipelines, so you can use the Confusion Matrix and Binary Classification Evaluation components to evaluate the model.
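For reference, a confusion matrix for a binary classifier can be computed as in the following sketch. It is an illustration only and does not reproduce the output format of the Confusion Matrix component. The 0.5 threshold, the function names, and the sample labels and scores are assumptions.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at the given score threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


def precision_recall(cm):
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    predicted_pos = cm["TP"] + cm["FP"]
    actual_pos = cm["TP"] + cm["FN"]
    precision = cm["TP"] / predicted_pos if predicted_pos else 0.0
    recall = cm["TP"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up labels (1 = purchased) and model scores for illustration.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.8, 0.6, 0.1]
cm = confusion_matrix(labels, scores)
print(cm, precision_recall(cm))
```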

  3. Run the pipeline and view the results.

    1. In the upper-left corner of the canvas, click Run.

    2. After the pipeline stops running, right-click Logistic Regression for Binary Classification on the canvas and choose Model Options > Export to PMML Files to export the trained model.

    3. Right-click Prediction on the canvas and choose View Data > Prediction Result Output to view the prediction results of the model.

  4. View the evaluation report of the model.

    1. Right-click Binary Classification Evaluation on the canvas and click Visual Analysis.

    2. In the Evaluate section, click the Evaluation Chart tab to view the receiver operating characteristic (ROC) curve.

      Figure: ROC curve

      The blue area represents the area under the curve (AUC). A larger blue area indicates higher model quality.

    3. Right-click Confusion Matrix on the canvas and click Visual Analysis.

    4. In the Confusion Matrix section, click the Confusion Matrix tab to view the evaluation results.

      Figure: Confusion matrix evaluation results
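The AUC value behind the ROC curve in step 4 can be understood as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The following sketch computes it directly from that definition. It is an illustration, not the computation used by the Binary Classification Evaluation component, and the sample labels and scores are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise. Runs in O(P * N) time, which is fine
    for small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = purchased) and model scores for illustration.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.8, 0.6, 0.1]
print(round(auc(labels, scores), 3))
```

A value of 0.5 corresponds to random ranking, and a value of 1.0 means every positive sample is ranked above every negative sample.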
  5. Deploy the model.

    If the model meets your expectations, you can click Models in the upper part of the canvas to deploy the model as an online service. For more information, see Deploy a model as an online service.