This topic describes how to use collaborative filtering to recommend products.

Background information

The correlation between diapers and beer is a classic example of data mining. Diapers and beer seem to be unrelated. However, when they are placed next to each other on supermarket shelves, the sales of both products increase significantly. You can use collaborative filtering, an algorithm that is commonly used in data mining, to explore the hidden correlations between different types of products and use these correlations to boost product sales.

Collaborative filtering is based on association rules. For example, if both User 1 and User 2 purchased Products A and B, you can assume that User 1 and User 2 have similar shopping interests. If User 1 also purchased Product C but User 2 did not, you can recommend Product C to User 2. This is a typical example of user-based collaborative filtering, in which users are correlated based on the similarity of their behavior.
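
The User 1 and User 2 example above can be sketched in a few lines of Python. This is a minimal illustration only, not the algorithm that the experiment uses: the purchase data is made up, and the Jaccard overlap of purchase sets is an assumed similarity measure.

```python
# Toy purchase history: user -> set of purchased products (illustrative data).
purchases = {
    "user1": {"A", "B", "C"},
    "user2": {"A", "B"},
    "user3": {"D"},
}


def jaccard(a, b):
    """Overlap between two purchase sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(user, purchases):
    """Recommend items bought by the most similar other user but not by `user`."""
    own = purchases[user]
    others = [u for u in purchases if u != user]
    if not others:
        return set()
    nearest = max(others, key=lambda u: jaccard(own, purchases[u]))
    # Do not recommend anything if no other user shares a purchase.
    if jaccard(own, purchases[nearest]) == 0.0:
        return set()
    return purchases[nearest] - own


print(recommend_for("user2", purchases))  # {'C'}
```

User 2 shares Products A and B with User 1, so Product C is recommended to User 2. User 3 shares nothing with the other users and receives no recommendation.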

In the experiment that is described in this topic, the system obtains the correlations between products based on the shopping behavior of users before July. Then, the system recommends products to users based on the correlations and evaluates the recommendation results. For example, if User 1 purchased Product A before July and Product A is strongly correlated with Product B, the system recommends Product B to User 1 in July or later and checks whether the recommendation is a hit.
Note: The dataset that is used in this topic is for experimental use only.
Note the following points about the experiment:
  • The experiment only shows how to use collaborative filtering in a shopping scenario. Many key factors, such as time series, are not considered.
  • The experiment considers only the correlations between products, not the properties of the products, such as whether a product is frequently purchased by the same user. For example, mobile phones are not frequently purchased by the same user. If a user buys a mobile phone in June, the user is unlikely to buy another mobile phone in July. The experiment does not take this repurchase probability into account.
  • We recommend that you use product recommendation based on collaborative filtering as an add-on to your service. To increase the accuracy of the prediction, we recommend that you use a model that is trained by using a machine learning algorithm.
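
The flow of the experiment (learn product correlations from behavior before July, recommend, then count hits against behavior in July or later) can be sketched as follows. This is a simplified illustration under stated assumptions: the data is made up, Jaccard similarity over buyers stands in for whatever measure the Collaborative Filtering (etrec) component applies, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_similarity(baskets):
    """Jaccard similarity between items, based on which users bought them."""
    buyers = defaultdict(set)
    for user, items in baskets.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:
            score = common / len(buyers[a] | buyers[b])
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend(baskets, sims):
    """For each purchased item, recommend its most similar item the user lacks."""
    recs = set()
    for user, items in baskets.items():
        for item in items:
            candidates = {i: s for i, s in sims.get(item, {}).items() if i not in items}
            if candidates:
                # sorted() makes tie-breaking deterministic.
                best = max(sorted(candidates), key=candidates.get)
                recs.add((user, best))
    return recs


# Made-up data: purchases before July, and (user, item) purchases in July or later.
before_july = {"u1": {"A"}, "u2": {"A", "B"}, "u3": {"A", "B"}}
july_onward = {("u1", "B")}

sims = item_similarity(before_july)
recs = recommend(before_july, sims)
hits = recs & july_onward
print(len(recs), len(hits))  # 1 1
```

Here Product B is recommended to u1 because B is often bought together with A, and the recommendation counts as a hit because u1 buys B in the later period.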

Dataset

The experiment described in this topic is based on a dataset that is provided by the Tianchi Big Data Competition. The dataset includes the shopping behavior before July and the shopping behavior in July and later. The following table describes the fields in the dataset.
Field        Meaning            Data type  Description
user_id      User ID            STRING     The ID of the user.
item_id      Item ID            STRING     The ID of the item.
active_type  Shopping behavior  STRING     The type of shopping behavior. Valid values: 0 (click), 1 (purchase), 2 (add to favorites), and 3 (add to shopping cart).
active_date  Purchase date      STRING     The date on which the user purchased the item.
The following figure shows the sample data that is used in the experiment.
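
To make the schema concrete, the following sketch reads records with the four fields above, keeps only purchases (active_type 1), and splits them at July. The CSV layout, the sample rows, the 'YYYY-MM-DD' date format, and the choice to keep only purchases are all assumptions made for illustration; the source does not specify them.

```python
import csv
import io

PURCHASE = "1"  # active_type value for a purchase, per the field table

# Hypothetical sample rows; the CSV layout and date format are assumptions.
SAMPLE = """user_id,item_id,active_type,active_date
u1,A,1,2023-06-15
u1,B,0,2023-06-20
u2,A,1,2023-05-02
u1,B,1,2023-07-03
"""


def month_of(active_date):
    """Extract the month, assuming a 'YYYY-MM-DD' string (an assumption)."""
    return int(active_date.split("-")[1])


def split_purchases(rows, cutoff_month=7):
    """Keep purchase rows only and split them into before/after the cutoff month."""
    before, after = [], []
    for row in rows:
        if row["active_type"] != PURCHASE:
            continue  # skip clicks, favorites, and cart additions
        pair = (row["user_id"], row["item_id"])
        if month_of(row["active_date"]) < cutoff_month:
            before.append(pair)
        else:
            after.append(pair)
    return before, after


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
before, after = split_purchases(rows)
print(before)  # [('u1', 'A'), ('u2', 'A')]
print(after)   # [('u1', 'B')]
```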

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below [Recommended Algorithms] Product Recommendation.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.
      Parameter    Description
      Name         The name of the experiment. Default value: [Recommended Algorithms] Product Recommendation. The name must be 1 to 32 characters in length. Enter a name that meets this requirement, for example, Product Recommendation.
      Project      The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description  The description of the experiment. Default value: Use collaborative filtering to recommend products.
      Save To      The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Product Recommendation_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created and Product Recommendation_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically creates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Figure: Collaborative filtering experiment
      Area No.  Description
      1         The components in this area generate a recommendation list based on association rules. The component named Use SQL statements to retrieve shopping behavior executes SQL statements to obtain the shopping behavior of users before July. Then, the Collaborative Filtering (etrec)-1 component calculates the item that is most similar to each item. This way, you can predict the products that each user may purchase together.

                After the experiment is run, right-click Collaborative Filtering (etrec)-1 and select View Data. In the dialog box that appears, you can view the correlations between products, as shown in the following figure.
                Figure: Collaborative filtering result
                Parameter description:
                • itemid: the ID of the product on which a shopping behavior is performed.
                • similarity: the similarity between this product and the product that is most similar to it. In each value, the number to the left of the colon (:) is the ID of the most similar product, and the number to the right of the colon (:) is the correlation probability between the two products.
      2         The components in this area process the shopping behavior data of July and later. In this experiment, simple recommendation rules are used. For example, if a user purchased Product A before July and Product A is strongly correlated with Product B, the system recommends Product B to the user in July or later.
      3         The components in this area calculate the total number of recommendations and the number of recommendations that are hit. The Whole Table Statistics-1 component provides the information about the recommendation list that is generated based on the shopping behavior before July. The Whole Table Statistics-2 component provides the information about the recommendations that are hit.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click Whole Table Statistics-1 on the canvas and select View Data. In the dialog box that appears, view the information about the generated recommendation list.
    3. Right-click Whole Table Statistics-2 on the canvas and select View Data. In the dialog box that appears, view the information about the recommendations that are hit.
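
If you copy the results out of the console, two small helpers can make them easier to interpret. The following is a sketch under assumptions: it assumes that a similarity value holds one or more whitespace-separated id:score pairs, and that you read the recommendation count and the hit count from Whole Table Statistics-1 and Whole Table Statistics-2 yourself. The function names and all numbers are hypothetical.

```python
def parse_similarity(value):
    """Parse 'item_id:score' pairs; whitespace separation is an assumption."""
    pairs = []
    for token in value.split():
        item_id, score = token.rsplit(":", 1)
        pairs.append((item_id, float(score)))
    return pairs


def hit_rate(total_recommendations, hit_recommendations):
    """Share of recommendations that were hit; 0.0 when nothing was recommended."""
    if total_recommendations == 0:
        return 0.0
    return hit_recommendations / total_recommendations


# Hypothetical values, not results from the experiment.
print(parse_similarity("1001:0.5 1002:0.25"))  # [('1001', 0.5), ('1002', 0.25)]
print(hit_rate(200, 5))  # 0.025
```

A low hit rate is expected with rules this simple, which is consistent with the advice above to treat this kind of recommendation as an add-on rather than as a standalone predictor.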