This topic describes how to combine the collaborative filtering matching (recall) strategy with the matching algorithms of Machine Learning Studio to implement a complete recall process.

Data description

To use the collaborative filtering strategy, you must import a User-item table and an Item-item table to Tablestore (OTS).
  • The User-item table stores user behaviors, such as purchasing, clicking, and adding to favorites. The table contains the following columns:
    • user_id: the ID of the user.
    • item_id: the ID of the product.
    • active_type: the user behavior. 0 represents clicking, 1 represents purchasing, and 2 represents adding to favorites.
    You must make sure that the table stored in OTS is in the format supported by AutoLearning: user_id is the primary key, and item_ids lists the items that correspond to each user_id, separated with commas (,). For a sample of this format, see the example after this list.
  • The Item-item table stores the similarity between items, which is calculated by using the collaborative filtering strategy. The table contains the following columns:
    • item_id: the matched product.
    • similar_item_ids: stores key:value pairs. key represents the ID of a source product, and value indicates the similarity between the source product and the matched product. A larger value indicates a higher similarity. key:value pairs are separated with commas (,).
    You must make sure that the table stored in OTS is in the format supported by AutoLearning: item_id is the primary key, and similar_item_ids lists the items that are similar to the item, separated with commas (,). Items without weights are also supported. For a sample of this format, see the example after this list.
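The following rows are a hypothetical example of the two table formats. All IDs and similarity values are made up for illustration only.
  user_item table (user_id is the primary key):
    user_id: 1001
    item_ids: 2001,2002,2003
  item_item table (item_id is the primary key):
    item_id: 2001
    similar_item_ids: 2002:0.8,2003:0.5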

Step 1: Generate training data

  1. Visit the PAI Visualization Modeling page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, click Machine Learning.
  2. In the left-side navigation pane, click Home.
  3. In the [Recommended Algorithms] Product Recommendation section, click Create.
  4. In the canvas, keep the following components and delete the other components: the cf_training_data component, which corresponds to the User-item data, and the Collaborative Filtering (etrec) component, which corresponds to the Item-item data. For a conceptual sketch of the item-to-item similarity that the etrec component computes, see the example after this list.
  5. Click the Collaborative Filtering (etrec) component. On the Parameter Setting tab of the right-side panel, set Top N to 5. This way, five similar items are returned for each specified item.
  6. In the left-side navigation pane, click Components.
  7. In the Components list, click the Data Source/Target folder, and drag and drop the Write MaxCompute Table component into the canvas twice. Then, rename the two components as user_item_data and item_item_data respectively.
  8. Connect the output port of cf_training_data to the input port of user_item_data, and connect the output port of Collaborative Filtering (etrec) to the input port of item_item_data.
  9. At the top of the canvas, click Run.
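The Collaborative Filtering (etrec) component calculates item-to-item similarity from the user behavior data. The following Python sketch only illustrates the general idea (co-occurrence normalized by item popularity) and how the Top N parameter and the similar_item_ids format relate to it. It is not the PAI implementation, and all user and item IDs in it are hypothetical.
  # Illustration only: item-to-item collaborative filtering based on
  # co-occurrence counts. This is not the PAI etrec implementation.
  from collections import defaultdict
  from itertools import combinations
  import math

  # Hypothetical behavior records: (user_id, item_id) pairs.
  behaviors = [
      ("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
      ("u2", "i1"), ("u2", "i2"),
      ("u3", "i2"), ("u3", "i3"),
  ]
  TOP_N = 5  # corresponds to the Top N parameter of the etrec component

  items_per_user = defaultdict(set)
  for user, item in behaviors:
      items_per_user[user].add(item)

  item_count = defaultdict(int)  # number of users who interacted with each item
  co_count = defaultdict(int)    # number of users who interacted with both items
  for items in items_per_user.values():
      for item in items:
          item_count[item] += 1
      for a, b in combinations(sorted(items), 2):
          co_count[(a, b)] += 1
          co_count[(b, a)] += 1

  # Cosine-style similarity: co-occurrence normalized by item popularity.
  similar = defaultdict(list)
  for (a, b), c in co_count.items():
      similar[a].append((b, c / math.sqrt(item_count[a] * item_count[b])))

  # Keep the Top N most similar items per item and format them as the
  # comma-separated key:value pairs expected in similar_item_ids.
  for item, sims in sorted(similar.items()):
      sims.sort(key=lambda x: x[1], reverse=True)
      print(item, ",".join(f"{i}:{s:.4f}" for i, s in sims[:TOP_N]))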

Step 2: Import the training data to OTS

The generated training data is stored in MaxCompute. You must import the data to OTS before you can use the data in AutoLearning.

  1. Create an OTS instance, and create a User-item table and an Item-item table in the instance. For more information, see Create tables.
    The column names and primary keys of the tables must be the same as those described in the Data description section.
  2. Use DataWorks to import the training data to OTS.
    1. Add an OTS connection to DataWorks.
    2. Create a batch sync node.
    3. Specify the source and target of the connection. For more information, see Create a sync node by using the codeless UI.
      Wizard mode does not support OTS connections. Click Switch to the code editor and paste the following script.
      {
          "type": "job",
          "steps": [
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": [],
                      "datasource": "odps_first",
                      "column": [
                          "user_id", //A column name of the MaxCompute table.
                          "item_id" //A column name of the MaxCompute table.
                      ],
                      "table": "user_item_data" //The name of the MaxCompute table.
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "ots",
                  "parameter": {
                      "datasource": "otc_data", //The name of the OTS resource configured in the Data Integration module.
                      "column": [
                          {
                              "name": "item_ids", //The name of the OTS field.
                              "type": "STRING"
                          }
                      ],
                      "writeMode": "UpdateRow",
                      "table": "user_item", //The name of the OTS table.
                      "primaryKey": [
                          {
                              "name": "user_id", //The name of the primary key.
                              "type": "STRING"
                          }
                      ]
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": ""
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          }
      }
      Delete the comments (the // annotations) before you use the preceding script.
  3. View the imported data in OTS.
    1. Log on to the OTS console. On the Overview page, click the instance in the Instance Name column, or click Manage Instance in the Actions column.
    2. On the Instance Details tab, navigate to the Tables section, and click Data Editor in the Actions column.
      Machine Learning Studio separates the generated key:value pairs in the similarity column with space characters, but AutoLearning supports only the comma (,) delimiter. You must preprocess the data by using the SQL tool of DataWorks before you import the data to OTS. The sketch after this list illustrates the required conversion.
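The actual preprocessing is done with the SQL tool of DataWorks, for example by replacing the space delimiter in the similarity column with a comma. The following Python sketch only illustrates the transformation on a single value; the sample value is hypothetical.
  # Illustration only: AutoLearning expects comma-separated key:value pairs,
  # but Machine Learning Studio writes them separated by spaces. The actual
  # conversion is done in DataWorks SQL before the data is imported to OTS.

  def to_autolearning_format(similarity: str) -> str:
      """Convert 'item:score item:score ...' to 'item:score,item:score,...'."""
      return ",".join(similarity.split())

  raw = "2002:0.8 2003:0.5 2004:0.3"          # hypothetical etrec output
  print(to_autolearning_format(raw))          # -> 2002:0.8,2003:0.5,2004:0.3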

Step 3: Configure matching strategies

  1. Perform the following steps to navigate to the AutoLearning page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side navigation pane, choose AutoLearning > General Purpose Model Training.
  2. On the AutoLearning page, click Create Instance.
  3. On the Create Instance page, set the Instance type to Rec-Matching System.
  4. Enter test in the Instance Name field. Then, click Confirm.
  5. On the AutoLearning page, click Open in the Operation column.
  6. In the Matching strategy configuration wizard, select the imported tables, and set Matching number to 100. For a conceptual sketch of how a matching strategy uses the two tables, see the example after this list.
  7. Click Add to strategy list.
  8. Click Next step.
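The following Python sketch is only a conceptual illustration of what such a matching strategy does: for a user, it collects the items from the user_item table, expands them to similar items by using the item_item table, and returns the highest-scoring candidates. It is not the AutoLearning implementation; the two dictionaries stand in for the OTS tables, and all IDs and scores are hypothetical.
  # Conceptual sketch of a collaborative filtering matching (recall) strategy.
  # The dictionaries stand in for the OTS user_item and item_item tables.
  from collections import defaultdict

  user_item = {"1001": "2001,2002"}          # user_id -> item_ids
  item_item = {                              # item_id -> similar_item_ids
      "2001": "2002:0.8,2003:0.5",
      "2002": "2003:0.25,2004:0.4",
  }
  MATCHING_NUMBER = 100  # corresponds to the Matching number setting

  def match(user_id: str, top_k: int = MATCHING_NUMBER):
      """Return up to top_k (item_id, score) candidates for the user."""
      scores = defaultdict(float)
      for trigger in user_item.get(user_id, "").split(","):
          for pair in item_item.get(trigger, "").split(","):
              if pair:
                  item_id, score = pair.split(":")
                  scores[item_id] += float(score)  # accumulate similarity
      return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]

  print(match("1001"))  # [('2002', 0.8), ('2003', 0.75), ('2004', 0.4)]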

Step 4: Deploy and test the model

  1. In the Data filter strategy configuration wizard, click Deploy and test.
  2. In the Model deployment and testing wizard, specify a user ID, and set Recommendation results to 10.
  3. Click Send test request.
  4. In the Debug information section, view the recommendation list. If the list meets your requirements, click Deploy to EAS to deploy the model to Elastic Algorithm Service (EAS) as a RESTful API.
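After the model is deployed, the EAS service is called over HTTP. The following Python sketch shows the general pattern of calling an EAS online service (an HTTP POST to the service URL with the service token in the Authorization header). The URL, token, and request body in the sketch are hypothetical placeholders; obtain the actual service URL, token, and request format from the EAS console.
  # Sketch only: the URL, token, and payload below are placeholders, and the
  # actual request format is shown on the service details page in the EAS console.
  import json
  import requests

  SERVICE_URL = "http://<your-endpoint>/api/predict/<your-service-name>"  # placeholder
  TOKEN = "<your-service-token>"                                          # placeholder

  payload = {"user_id": "1001", "rec_num": 10}  # hypothetical request body

  response = requests.post(
      SERVICE_URL,
      headers={"Authorization": TOKEN},
      data=json.dumps(payload),
      timeout=10,
  )
  print(response.status_code, response.text)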