The general-purpose model training plug-in allows you to automatically train models, tune hyperparameters, and evaluate models in scenarios where matching recall is required. This topic describes how to combine the collaborative filtering strategy of matching recall with the matching algorithms of Machine Learning Studio to streamline a complete recall process.

Data description

To use the collaborative filtering strategy, you must import a User-Item table and an Item-Item table to Tablestore. The tables must meet the following requirements:
  • The User-Item table stores information about the historical behavior of users, such as purchase, click, and add-to-favorites behavior. The table contains the following columns:
    • user_id: the ID of the user.
    • item_id: the ID of the commodity.
    • active_type: the behavior of the user. Values of 0, 1, and 2 indicate the click, purchase, and add-to-favorites behavior, respectively.
    The data must be stored in Tablestore in the format shown in the following figure. user_id is the primary key, and item_ids lists the items that correspond to each user_id. Items are separated by commas (,).
  • The Item-Item table stores information about item similarities that are calculated based on the collaborative filtering strategy. The table contains the following columns:
    • item_id: the matched commodity.
    • similar_item_ids: stores key:value pairs. The key indicates the ID of a source commodity, and the value indicates the similarity between the source commodity and the matched commodity. A larger value indicates a higher similarity. key:value pairs are separated by commas (,).
    The data must be stored in Tablestore in the format shown in the following figure. item_id is the primary key, and similar_item_ids lists the items that are similar to each item. Items are separated by commas (,). Items without weights are supported.
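The two cell formats described above can be sketched as simple parsing routines. The following Python sketch uses hypothetical item IDs and similarity values for illustration; it only demonstrates the expected delimiters, not an official SDK API:

```python
def parse_item_ids(cell: str):
    """Split a comma-separated item_ids cell from the User-Item table."""
    return [item for item in cell.split(",") if item]

def parse_similar_item_ids(cell: str):
    """Parse a similar_item_ids cell from the Item-Item table.

    Each comma-separated entry is either "item_id:similarity" or a bare
    item_id, because items without weights are supported."""
    result = {}
    for entry in cell.split(","):
        if not entry:
            continue
        if ":" in entry:
            item_id, similarity = entry.split(":", 1)
            result[item_id] = float(similarity)
        else:
            result[entry] = None  # item without a weight
    return result

# Hypothetical sample cells:
print(parse_item_ids("1002,1003,1004"))
print(parse_similar_item_ids("1001:0.8,1005:0.3,1009"))
```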

Procedure

To combine the collaborative filtering strategy of matching recall and the matching algorithms of Machine Learning Studio to build a recall process, perform the following steps:
  1. Step 1: Generate training data

    Use Machine Learning Studio to create an experiment and generate training data.

  2. Step 2: Import the training data to Tablestore

    Import the training data to Tablestore in the required format. For more information, see the Data description section of this topic.

  3. Step 3: Create an instance

    Create a model instance for matching recall.

  4. Step 4: Configure matching strategies

    Configure a matching strategy to reduce the number of recommendation candidates.

  5. Step 5: Deploy and test the model

    AutoLearning automatically deploys the matching recall solution as an online service based on the matching strategy. After the service is tested, the solution can be deployed to EAS.

Step 1: Generate training data

  1. Go to a Machine Learning Studio project.
    1. Log on to the Machine Learning Platform for AI (PAI) console.
    2. In the left-side navigation pane, choose Model Training > Visualized Modeling (Machine Learning Studio).
    3. In the upper-left corner of the page, select the region that you want to manage.
    4. Optional: In the search box on the PAI Visualization Modeling page, enter the name of a project to search for the project.
    5. Find the project and click Machine Learning in the Actions column.
  2. In the left-side navigation submenu, click Home.
  3. Click Create below [Recommended Algorithms] Product Recommendation.
  4. On the canvas, retain the following components and delete the other components. The cf_training_data component corresponds to the data in the User-Item table. The Collaborative Filtering (etrec) component corresponds to the data in the Item-Item table.
  5. Click the Collaborative Filtering (etrec) component. On the Parameters Setting tab of the right-side panel, set the Top N parameter to 5. This way, five similar items are returned for each specified item.
  6. In the left-side navigation submenu, click Components.
  7. In the components list, click the Data Source/Target folder and drag the Write MaxCompute Table component to the canvas twice. Then, rename the two components as user_item_data and item_item_data respectively.
  8. Connect the output port of the cf_training_data component to the input port of the user_item_data component. Connect the output port of the Collaborative Filtering (etrec) component to the input port of the item_item_data component.
  9. At the top of the canvas, click Run.
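The Collaborative Filtering (etrec) component produces item-to-item similarities from user behavior data. The following Python sketch illustrates the general idea with a co-occurrence-based cosine similarity over a toy interaction set; it is not the etrec implementation, and the user and item IDs are hypothetical:

```python
import math
from collections import defaultdict
from itertools import combinations

# Toy user -> item-set interactions standing in for cf_training_data.
interactions = {
    "u1": {"1001", "1002", "1003"},
    "u2": {"1001", "1002"},
    "u3": {"1002", "1003"},
}

def item_similarities(interactions, top_n=5):
    """Compute item-to-item cosine similarity over user co-occurrence and
    keep the top_n most similar items per item (like the Top N parameter)."""
    co_count = defaultdict(float)
    item_count = defaultdict(int)
    for items in interactions.values():
        for item in items:
            item_count[item] += 1
        for i, j in combinations(sorted(items), 2):
            co_count[(i, j)] += 1.0
    sims = defaultdict(dict)
    for (i, j), c in co_count.items():
        s = c / math.sqrt(item_count[i] * item_count[j])
        sims[i][j] = s
        sims[j][i] = s
    return {item: sorted(nbrs.items(), key=lambda kv: -kv[1])[:top_n]
            for item, nbrs in sims.items()}

print(item_similarities(interactions, top_n=5))
```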

Step 2: Import the training data to Tablestore

The training data that is generated in Machine Learning Studio is stored in MaxCompute. You must import the training data to Tablestore before you can use the training data in AutoLearning.

  1. Create a User-Item table and an Item-Item table in Tablestore. For more information, see Create tables.
    The column names and primary keys of the tables must be the same as those described in the Data description section of this topic. The following figure shows an example of the User-Item table.
  2. Use DataWorks to import the training data to Tablestore.
    1. Add a Tablestore connection to DataWorks. For more information, see Configure a Tablestore connection.
    2. Create a batch sync node. For more information, see Create a batch sync node.
    3. Specify the Source and Target connections. For more information, see Configure a sync node by using the codeless UI.
      Wizard mode does not support Tablestore connections. Click Switch to Code Editor and import the following script:
      {
          "type": "job",
          "steps": [
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": [],
                      "datasource": "odps_first",
                      "column": [
                          "user_id", // A column name of the MaxCompute table. 
                          "item_id"  // A column name of the MaxCompute table. 
                      ],
                      "table": "user_item_data" // The name of the MaxCompute table. 
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "ots",
                  "parameter": {
                      "datasource": "otc_data", // The name of the Tablestore resource configured in Data Integration. 
                      "column": [
                          {
                              "name": "item_ids", // The names of Tablestore fields. 
                              "type": "STRING"
                          }
                      ],
                      "writeMode": "UpdateRow",
                      "table": "user_item",// The name of the Tablestore table. 
                      "primaryKey": [
                          {
                              "name": "user_id",  // The primary key of the Tablestore table. 
                              "type": "STRING"
                          }
                      ]
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": ""
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          }
      }
      Note If you use the preceding script, delete the comments because JSON does not support them.
  3. View the imported data in Tablestore.
    1. Log on to the Tablestore console.
    2. On the Overview page, click the instance in the Instance Name column or click Manage Instance in the Actions column.
    3. On the Instance Details tab of the Instance Management page, click the name of the table in the Tables section.
    4. On the Manage Table page, click the Query Data tab to view the imported data.
      Values generated in Machine Learning Studio are separated by spaces, as shown in the following figure. However, AutoLearning supports only the comma (,) delimiter. Therefore, you must preprocess the data by using the SQL tool of DataWorks before you import the data to Tablestore.
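The required preprocessing amounts to replacing the space delimiter with a comma. The following Python sketch shows the transformation on a hypothetical cell; in practice you would express the same logic in DataWorks SQL before import:

```python
def to_comma_delimited(cell: str) -> str:
    """Convert a space-separated similar_item_ids cell, as generated by
    Machine Learning Studio, to the comma-separated format that
    AutoLearning requires."""
    # split() with no arguments also collapses repeated whitespace.
    return ",".join(cell.split())

# Hypothetical sample cell:
print(to_comma_delimited("1001:0.8 1005:0.3 1009:0.1"))
```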

Step 3: Create an instance

  1. Go to the General Purpose Model Training page.
    1. Log on to the Machine Learning Platform for AI (PAI) console.
    2. In the left-side navigation pane, choose AI Industry Plug-In > General Purpose Model Training.
  2. On the General Purpose Model Training page, click Create Instance.
  3. In the Create Instance panel, set the parameters.
    The following list describes the parameters:
    • Instance type: the type of the instance. Valid values: Image Classification and Rec-Matching System. Set this parameter to Rec-Matching System.
    • Instance name: the name of the instance. Set the instance name to test.
    • Example Description: the description of the instance. Enter Perform matching recall by using the matching algorithms of Machine Learning Studio in the Example Description field.
    • Storage dependency: the storage service that the matching recall feature uses. To use this feature, you must store your training data in Tablestore. For more information, see Create tables. If AutoLearning is not authorized to access Tablestore within your Alibaba Cloud account, click Authorize Now below the field.
    • Instance binding: the Tablestore instance that stores your training data. Select the Tablestore instance that you created.
  4. Click Confirm.

Step 4: Configure matching strategies

  1. On the General Purpose Model Training page, find the model instance that you created and click Open in the Operation column.
  2. On the Collaborative Filtering Recall tab, set the parameters as required.
    The following list describes the parameters:
    • Strategy name: the name of the strategy. Enter pai_rec in the Strategy name field.
    • User-Item table: the Tablestore table that stores user-item behavior data for the matching strategy.
    • Item-Item table: the Tablestore table that stores item-to-item similarity data for the matching strategy.
    • Matching number: the number of items to return based on the matching strategy. Set this parameter to 100.
  3. Click Add to strategy list.
  4. Click Next step.

Step 5: Deploy and test the model

  1. In the Data filter strategy configuration step, click Deploy and test.
  2. In the Deployment confirmation message, check the configured matching and filtering strategies. Then, click OK.
  3. In the Test Module section, specify a user ID and set the Number of matching results parameter to 10.
  4. Click Send test request.
  5. In the Debug information section, view the return results. If the results meet your requirements, click Deploy to EAS to deploy the model instance to EAS as a RESTful service. For more information about how to deploy a model instance to EAS, see Upload and deploy models in the console.
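A service deployed to EAS is invoked over HTTP with an authorization token. The following Python sketch assembles such a request; the endpoint URL, the token, and the payload fields (user_id, match_count) are placeholders for illustration only — the actual endpoint, token, and request format are shown in the EAS console after deployment:

```python
import json
import urllib.request

# Placeholders: replace with the values shown in the EAS console.
ENDPOINT = "http://example-eas-endpoint/api/predict/test"  # hypothetical URL
TOKEN = "<your-eas-token>"

def build_request(user_id: str, match_count: int) -> urllib.request.Request:
    """Assemble an HTTP request for the deployed matching-recall service.

    The payload field names are assumptions for illustration; check the
    request format that the EAS console documents for your service."""
    body = json.dumps({"user_id": user_id, "match_count": match_count})
    return urllib.request.Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Authorization": TOKEN, "Content-Type": "application/json"},
    )

req = build_request("10001", 10)
print(req.get_full_url())
# urllib.request.urlopen(req) sends the request once the real endpoint and token are set.
```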