AIRec Deployment on Alibaba Cloud: Step-by-Step Instructions and Troubleshooting

By Sunny Jovita, Solution Architect Intern

For those who want to build a recommendation engine on the cloud, there are two approaches. The first is to build it on our data platform, which is built on top of DataWorks, PAI, and MaxCompute as the data warehouse (an open implementation of collaborative filtering).

The second, more recommended approach is to use AIRec. AIRec is an out-of-the-box solution for anyone who wants to create their own recommendation engine. This public cloud service provides pre-built templates and an SDK that follow the best practices learned from implementing this solution for Alibaba Group's business units.

Step 1: Prepare business data, item data, and behavioral data for model training based on your industry-specific data specifications.

Industry types:
‒ News Industry
‒ Content Industry
‒ E-Commerce Industry

For this lab, let's use some dummy content data that can be found here. This contains transactional data from Taobao, and this training data is suitable for content sharing platforms. You can use this template to recommend content that has sharing attributes, such as liking and forwarding. The recommended content can be short text, articles, images, or a combination of them.

Note: We recommend that you deduplicate items before you upload the table.
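
One way to deduplicate before upload is to keep a single row per (item_id, item_type) pair, because AIRec uses those two fields together to identify an item. The following is a minimal sketch that uses only the standard library. The function name dedupe_items, the sample rows, and the policy of keeping the most recent row are assumptions for illustration, not something AIRec prescribes.

```python
import csv
import io


def dedupe_items(rows):
    """Keep one row per (item_id, item_type); a later row replaces an earlier one."""
    unique = {}
    for row in rows:
        key = (row["item_id"], row["item_type"])
        unique[key] = row  # assumption: the most recent row is the one to keep
    return list(unique.values())


# Illustrative data only; a real file would carry the full item table columns.
sample_csv = """item_id,item_type,title
120,image,spiderman
121,article,batman
120,image,spiderman (updated)
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
deduped = dedupe_items(rows)
print(len(rows), "->", len(deduped))
```

If you would rather keep the first occurrence, skip the assignment when the key is already present.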

Data description

Link for table schema: https://www.alibabacloud.com/help/en/airec/latest/content-industry

For the content scene, you must prepare three tables:

1) Item table: This table contains the items that you want to recommend to users. The item_id and item_type fields are used together to uniquely identify an item. AIRec performs model training and recommends appropriate item data to each user.

Note: We recommend that you provide an item table that contains only valid data. This avoids noise caused by invalid data and improves the recommendation effect.

2) User table: A user table contains information related to users (for example, users who have recently logged on to the system). You can use the imei field or a combination of the user_id and imei fields to identify users. For instance, you can use user_id to identify users who have logged on and imei to identify users who have not logged on.

Note: AIRec performs data training based on user preferences and recommends the content in which users are most interested.

3) Behavior (events) table: A behavior table contains behavioral data related to recommendations. This is the core for model training in AIRec.

Note: If behavioral data cannot be provided for technical reasons or no historical data is available, you can use the test data provided by AIRec. In this case, the recommendation model may return unsatisfactory results for about two weeks. The results gradually improve and stabilize as more data is accumulated.
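
The user identification rule described for the user table can be sketched as a small helper. This assumes that rows are plain dictionaries and that an empty user_id means the user has not logged on. The name user_key is hypothetical.

```python
def user_key(row):
    """Return (field_name, value) identifying the user in this row.

    Logged-on users are identified by user_id; users who have not
    logged on fall back to the device ID stored in imei.
    """
    user_id = (row.get("user_id") or "").strip()
    imei = (row.get("imei") or "").strip()
    if user_id:
        return ("user_id", user_id)
    if imei:
        return ("imei", imei)
    raise ValueError("row has neither user_id nor imei")


# Illustrative values only.
print(user_key({"user_id": "u1001", "imei": "860000000000001"}))
print(user_key({"user_id": "", "imei": "860000000000002"}))
```

Rows that raise ValueError cannot be tied to any user, so it is probably worth filtering them out before upload.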

Step 2: Once you have your training data ready, import it into MaxCompute

AIRec supports two data upload methods. You can select a method based on your data needs. When data changes, make sure that the changes can be synchronized to AIRec in real time.

For this lab, we will start AIRec by using the historical data method.

There are three methods to upload data to MaxCompute:
Method 1: Use MaxCompute DDL to import the data from your local machine.

Method 2: Use OSS -> DataWorks Data Integration -> Batch Synchronization (upload the data to the table) -> Data Analytics (to create the table).

Method 3: Use the MaxCompute client and the tunnel command.

In this tutorial, we will use MaxCompute DDL to upload the historical data.

Create MaxCompute Project

Create the MaxCompute project using the standard version, and make sure that its region is the same as the region of your AIRec instance.

  1. Go to the MaxCompute page.
  2. Log on to the MaxCompute console.
  3. In the left-side navigation pane, click Create project.

Method 1: Create MaxCompute Table

1) After creating the project, click Data Development.

2) Click Create Table.

3) Fill in the required specifications.

4) Click DDL.

5) Paste the SQL statement and click Generate Table Schema.

You can find the SQL statements for the content industry here.

Item table

CREATE TABLE IF NOT EXISTS itemtable
(
    item_id          STRING COMMENT 'Unique ID of the item',
    item_type        STRING COMMENT 'Item type',
    title            STRING COMMENT 'Item title',
    content          STRING COMMENT 'Body part of the item',
    user_id          STRING COMMENT 'ID of the user who published the item',
    pub_time         STRING COMMENT 'Publish time',
    status           STRING COMMENT 'Whether the item can be recommended',
    expire_time      STRING COMMENT 'Time at which the item expires, accurate to the second',
    last_modify_time STRING COMMENT 'Last modification time of item information',
    scene_id         STRING COMMENT 'Scene ID',
    duration         STRING COMMENT 'Duration, in seconds',
    category_level   STRING COMMENT 'Category level',
    category_path    STRING COMMENT 'Category path',
    tags             STRING COMMENT 'Tags',
    channel          STRING COMMENT 'Channels',
    organization     STRING COMMENT 'Organizations',
    author           STRING COMMENT 'Authors',
    pv_cnt           STRING COMMENT 'Exposures',
    click_cnt        STRING COMMENT 'Clicks',
    like_cnt         STRING COMMENT 'Likes',
    unlike_cnt       STRING COMMENT 'Dislikes',
    comment_cnt      STRING COMMENT 'Comments',
    collect_cnt      STRING COMMENT 'Favorites',
    share_cnt        STRING COMMENT 'Shares',
    download_cnt     STRING COMMENT 'Downloads',
    tip_cnt          STRING COMMENT 'Rewards',
    subscribe_cnt    STRING COMMENT 'Follows',
    source_id        STRING COMMENT 'Item source',
    country          STRING COMMENT 'Country',
    city             STRING COMMENT 'City',
    features         STRING COMMENT 'Additional features',
    num_features     STRING COMMENT 'Additional features of a numeric type',
    weight           STRING COMMENT 'Weight of the item, default value: 1'
) 
LIFECYCLE 30;

User table

CREATE TABLE IF NOT EXISTS usertable
(
    user_id          STRING COMMENT 'Unique user ID',
    user_id_type     STRING COMMENT 'Registration type of the user',
    third_user_name  STRING COMMENT 'Third-party user name',
    third_user_type  STRING COMMENT 'Third-party platform name',
    phone_md5        STRING COMMENT 'MD5 hash of the mobile phone number of the user.',
    imei             STRING COMMENT 'Device ID of the user',
    content          STRING COMMENT 'User content',
    gender           STRING COMMENT 'Gender',
    age              STRING COMMENT 'Age',
    age_group        STRING COMMENT 'Age group',
    country          STRING COMMENT 'Country',
    city             STRING COMMENT 'City',
    ip               STRING COMMENT 'Last logon IP address',
    device_model     STRING COMMENT 'Device model',
    register_time    STRING COMMENT 'Registration time',
    last_login_time  STRING COMMENT 'Last logon time',
    last_modify_time STRING COMMENT 'Last modification time of user information',
    tags             STRING COMMENT 'User tags',
    source           STRING COMMENT 'Source of the user',
    features         STRING COMMENT 'Additional user features of the STRING type',
    num_features     STRING COMMENT 'Additional user features of a numeric type'
) 
LIFECYCLE 30;

Behavior table

CREATE TABLE IF NOT EXISTS behaviortable
(
    trace_id     STRING COMMENT 'Request tracking ID',
    trace_info   STRING COMMENT 'Request tracking information',
    platform     STRING COMMENT 'Client platform',
    device_model STRING COMMENT 'Device model',
    imei         STRING COMMENT 'Device ID',
    app_version  STRING COMMENT 'App version number',
    net_type     STRING COMMENT 'Network type',
    longitude    STRING COMMENT 'Longitude',
    latitude     STRING COMMENT 'Latitude',
    ip           STRING COMMENT 'Client IP address',
    login        STRING COMMENT 'Whether the user has logged on',
    report_src   STRING COMMENT 'Report source',
    scene_id     STRING COMMENT 'Scene ID',
    user_id      STRING COMMENT 'User ID',
    item_id      STRING COMMENT 'Item ID',
    item_type    STRING COMMENT 'Item type',
    module_id    STRING COMMENT 'Module ID',
    page_id      STRING COMMENT 'Page ID',
    position     STRING COMMENT 'Position of the item',
    bhv_type     STRING COMMENT 'Behavior type',
    bhv_value    STRING COMMENT 'Behavior details',
    bhv_time     STRING COMMENT 'Time at which the behavior occurs'
) ;

6) Click Commit to Development Environment.

Import Data to MaxCompute Table
1) Click Import Data.

2) Enter the name of the MaxCompute table into which you want to import the data, and click Next.

3) Click Browse to upload the data, and click Next.

4) Click Import Data.
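
Before importing, it can help to generate the local file with its columns in the same order as the table definition. The following is a minimal sketch for the behavior table that uses only the standard library. The file layout (comma-separated, no header row) and the sample values, including the bhv_type value, are assumptions, so match them to the options you choose in the import dialog and to the data specification.

```python
import csv
import io

# Column order copied from the behaviortable DDL in this article.
BEHAVIOR_COLUMNS = [
    "trace_id", "trace_info", "platform", "device_model", "imei",
    "app_version", "net_type", "longitude", "latitude", "ip", "login",
    "report_src", "scene_id", "user_id", "item_id", "item_type",
    "module_id", "page_id", "position", "bhv_type", "bhv_value", "bhv_time",
]


def to_csv(records, columns=BEHAVIOR_COLUMNS):
    """Serialize dict records to CSV text; missing fields become empty strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # fill columns that a record does not provide
        extrasaction="raise",   # reject keys that are not table columns
        lineterminator="\n",
    )
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative record; "click" is an assumed bhv_type value.
csv_text = to_csv([{
    "user_id": "u1001", "item_id": "120", "item_type": "image",
    "bhv_type": "click", "bhv_time": "1590327038",
}])
print(csv_text)
```

Because extrasaction is set to "raise", a misspelled column name fails immediately instead of silently producing a shifted row.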

Create Ad Hoc Query
To make sure that the data has been imported, we can create an ad hoc query.

1) On the Ad Hoc Query page, click ODPS SQL.

2) Create an SQL statement.

3) Run the statement. If it returns the rows that you uploaded, the data has been imported successfully.

Step 3: Use the MaxCompute client to grant permissions to the AIRec account.

Download and Configure MaxCompute Client
- Link to install Java 8 or later: https://www.java.com/download/ie_manual.jsp
- Link to use odpscmd/MaxCompute Client: https://www.alibabacloud.com/help/en/maxcompute/latest/maxcompute-client
1) Inside the conf folder, edit the odps_config file. Fill in the project name, AccessKey ID, AccessKey secret, and endpoint.
2) Run the ./bin/odpscmd command to enter the MaxCompute environment.
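
As a rough illustration of what the edited configuration file contains, the sketch below renders the four settings mentioned above as key=value lines. The key names (project_name, access_id, access_key, end_point) are what I understand the client's sample file to use, so verify them against the odps_config file shipped with your download. The function name and all values are placeholders.

```python
def render_odps_config(project_name, access_id, access_key, end_point):
    """Build the text for the four settings edited in conf/odps_config."""
    settings = {
        "project_name": project_name,
        "access_id": access_id,
        "access_key": access_key,
        "end_point": end_point,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


config_text = render_odps_config(
    project_name="my_airec_project",                     # hypothetical project name
    access_id="<your AccessKey ID>",                     # placeholder
    access_key="<your AccessKey secret>",                # placeholder
    end_point="<MaxCompute endpoint for your region>",   # placeholder
)
print(config_text)
```

Avoid committing the rendered file to version control, because it contains your AccessKey secret.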

Grant Permissions on MaxCompute to AIRec

Perform the following operations to authorize your 1619920497425387 account.
https://www.alibabacloud.com/help/en/airec/latest/grant-permissions-on-offline-storage-to-airec

Step 4. Create an AIRec instance in the Alibaba Cloud console (you cannot create the instance until the permissions have been granted).

Create an instance
1. Select an industry template.
2. Select the Historical Data-Based Start method to start an AIRec instance.

Note: After you use baseline data in MaxCompute to start an AIRec instance, you no longer need to maintain data in MaxCompute. You can use server SDKs to upload incremental data.

3. Configure data sources to start an AIRec instance.

Note: Before you configure data sources, you must grant permissions to AIRec in MaxCompute.

4. Click Next.
5. Start the instance. It takes a while to start the instance.
6. Wait until the status changes to Running.
7. You can view the related information immediately after the instance is started.

Step 5. Use Server SDKs to upload incremental user data, item data, and behavioral data.

The Alibaba Cloud SDK for Python allows you to access Alibaba Cloud services such as ECS, OSS, and Resource Access Management. You can access these services without handling API-related tasks such as signing and constructing your requests.

1. Prerequisites

- You have an AccessKey pair
- MaxCompute is activated
- AIRec is activated
- Data has been uploaded to MaxCompute

2. Install the Python SDK

The Alibaba Cloud SDK for Python supports Python 2.7.x and Python 3.x.

Check your Python version:
python --version

Install pip.
If pip is not installed, see the pip user guide to install pip.

‒ Install the individual libraries.
Install the core library:
Python 2.x:
pip install aliyun-python-sdk-core
Python 3.x:
pip install aliyun-python-sdk-core-v3
Install the AIRec SDK for Python:
pip install aliyun-python-sdk-airec

3. Download the source code of the AIRec SDK for Python.
https://github.com/aliyun/aliyun-openapi-python-sdk/tree/master/aliyun-python-sdk-airec?spm=a2c63.p38356.0.0.1ad02734SsmqQY

4. Use the Alibaba Cloud SDK for Python

You can perform the following operations to use the Alibaba Cloud SDK for Python:
‒ After you download the SDK, create a new file named push-data.py.
‒ Paste the following code into the new Python file.

#!/usr/bin/python
# coding=utf-8

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkairec.request.v20181012.PushDocumentRequest import PushDocumentRequest

# Create a client of the AcsClient class.
client = AcsClient(
    ak="Alibaba Cloud AccessKey ID",
    secret="Alibaba Cloud AccessKey secret",
    # Enter the region. If the region is China (Beijing), enter cn-beijing.
    region_id="cn-hangzhou"
)
# Configure the endpoint.
# Specify the region ID, service name, and endpoint.
client.add_endpoint("cn-hangzhou", "Airec", "airec.cn-hangzhou.aliyuncs.com")
# Create a request and configure parameters.
# Create a request for a specific API operation. The class of the request is named by adding Request to the end of the API operation name.
# For example, the name of the API operation that is used to obtain pushed documents is PushDocument. In this case, the name of the request class is PushDocumentRequest.
request = PushDocumentRequest()
request.set_instanceId("Instance ID")
request.set_tableName("item")
# Configure parameters for the request.
content = "JSON-formatted data"
request.set_content(content)
request.set_content_type("application/json")
# Initiate the request by using the method that is supported by the client, obtain the response, and handle the exception.
try:
    response = client.do_action_with_exception(request)
    print(response)
except ClientException as e:
    print(e)
except ServerException as e:
    print(e)

‒ Change the aliyunsdkairec API version in the import (line 7) to the latest version. You can check the available versions on GitHub:
https://github.com/aliyun/aliyun-openapi-python-sdk/tree/master/aliyun-python-sdk-airec/aliyunsdkairec/request
from aliyunsdkairec.request.v20181012.PushDocumentRequest import PushDocumentRequest
change to
from aliyunsdkairec.request.v20201126.PushDocumentRequest import PushDocumentRequest
‒ ak: your AccessKey ID (line 11)
‒ secret: your AccessKey secret (line 12)
‒ region_id: the region of your instance, for example ap-southeast-1 for Singapore (line 14)
‒ endpoint: "ap-southeast-1", "Airec", "airec.ap-southeast-1.aliyuncs.com" (line 18)
‒ "Instance ID": your AIRec instance ID (line 23)
‒ content: replace with the data that you want to push (line 26). For sample data, you can get the data specification from this link.
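
Because the region appears in both region_id and the endpoint, it is easy to update one and forget the other. The sketch below derives the endpoint arguments from a single region value. It assumes the pattern airec.<region>.aliyuncs.com, which is inferred only from the two examples in this article (cn-hangzhou and ap-southeast-1), so check the AIRec endpoint list for other regions. The helper name airec_endpoint is hypothetical.

```python
def airec_endpoint(region_id):
    """Return (region_id, service_name, endpoint) for one region.

    Assumption: the endpoint follows airec.<region>.aliyuncs.com,
    as in the two examples shown in this article.
    """
    if not region_id:
        raise ValueError("region_id must not be empty")
    return (region_id, "Airec", f"airec.{region_id}.aliyuncs.com")


REGION_ID = "ap-southeast-1"  # Singapore
endpoint_args = airec_endpoint(REGION_ID)
print(endpoint_args)
```

In the script, you would pass REGION_ID as region_id when creating the client and call client.add_endpoint(*endpoint_args), so that both always refer to the same region.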

Final Code

#! /usr/bin/python
# coding=utf-8

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkairec.request.v20201126.PushDocumentRequest import PushDocumentRequest

# Create a client of the AcsClient class.
client = AcsClient(
    ak="Alibaba Cloud AccessKey ID",
    secret="Alibaba Cloud AccessKey secret",
    # Enter the region of your AIRec instance. It must match the region used in add_endpoint below.
    region_id="ap-southeast-1"
)
# Configure the endpoint.
# Specify the region ID, service name, and endpoint.
client.add_endpoint("ap-southeast-1", "Airec", "airec.ap-southeast-1.aliyuncs.com")
# Create a request and configure parameters.
# Create a request for a specific API operation. The class of the request is named by adding Request to the end of the API operation name.
# For example, the name of the API operation that is used to obtain pushed documents is PushDocument. In this case, the name of the request class is PushDocumentRequest.
request = PushDocumentRequest()
request.set_instanceId("Instance ID")
request.set_tableName("item")
# Configure parameters for the request.
content = "[{ \
        \"cmd\": \"add\", \
        \"fields\": { \
            \"item_id\": \"120\", \
            \"item_type\": \"image\", \
            \"title\": \"spiderman\", \
            \"weight\": \"1\", \
            \"tags\": \"Action, Family\", \
            \"status\": \"1\", \
            \"scene_id\": \"1\", \
            \"content\": \"Content\", \
            \"pub_time\": \"1590327038\" \
        } \
    }]"

request.set_content(content)
request.set_content_type("application/json")
# Initiate the request by using the method that is supported by the client, obtain the response, and handle the exception.
try:
    response = client.do_action_with_exception(request)
    print(response)
except ClientException as e:
    print(e)
except ServerException as e:
    print(e)
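
The hand-escaped content string in the script above is easy to break when you add or change fields. As an alternative sketch, the same payload can be built from a dictionary with the standard json module. The helper name build_item_content is hypothetical, and the field values are the ones used in this article.

```python
import json


def build_item_content(fields, cmd="add"):
    """Return the JSON string for a single item record."""
    return json.dumps([{"cmd": cmd, "fields": fields}])


content = build_item_content({
    "item_id": "120",
    "item_type": "image",
    "title": "spiderman",
    "weight": "1",
    "tags": "Action, Family",
    "status": "1",
    "scene_id": "1",
    "content": "Content",
    "pub_time": "1590327038",
})
print(content)
```

The resulting string can be passed to request.set_content(content) in place of the escaped literal.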

5. Run the Python file.

6. View the incremental data.

- In the AIRec instance, open the Data Query page and select Item data.
- Specify the item ID that has been pushed, and click Search.

7. You will see that the data has been uploaded successfully to the console.

For further information about the Python SDK, see this link.

Step 6. Create a Recommendation Scene.

Scene
A scene refers to a distinct section of recommendations created under the feature's policy, such as "You might also like" based on all the products on the homepage, "Products you might like" by category on each product page, and related recommendations based on the products ordered on the details page.
1) On the Experiment Parameter Settings page, click Create Recommendation Scene.

2) Click Create Scene.

3) Fill in the required details.

Note: If you are not sure what a Scene ID is, you can find the information through this link.

4) Select Use Field scene_id in Table item, and click Next.

5) Click Publish.

Step 7. Download and use the Python SDK to push a new row into the recommendation engine pipeline.

This is the record that we are trying to get recommendations for.
1) On the scenario building page, click Test to get the recommendation.

2) Specify the user ID and the number of recommendations that you want to get as a result. Click Request Results.

3) You will see the related recommendation content below.
The following figure shows an example of behavior related to recommendations. After a user performs the operations shown in the preceding figure, five recommendation-related behavior entries are generated. The five behavior entries can be recorded and uploaded to AIRec as required by clicking the Add Behavior button.
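
Step 5 mentions that behavioral data can also be uploaded through the server SDK. As a sketch of what such a payload might look like, the code below builds one "add" command per behavior entry, using field names from the behavior table. The helper name, the sample IDs, and the bhv_type values ("expose", "click") are assumptions, and a real record will likely need more fields (for example trace_id and scene_id), so follow the data specification for your industry.

```python
import json
import time


def build_behavior_content(user_id, item_id, item_type, bhv_types, now=None):
    """Return a JSON string with one 'add' command per behavior type."""
    # bhv_time is a Unix timestamp in seconds, passed as a string.
    bhv_time = str(int(now if now is not None else time.time()))
    commands = [
        {
            "cmd": "add",
            "fields": {
                "user_id": user_id,
                "item_id": item_id,
                "item_type": item_type,
                "bhv_type": bhv_type,
                "bhv_time": bhv_time,
            },
        }
        for bhv_type in bhv_types
    ]
    return json.dumps(commands)


behavior_content = build_behavior_content(
    user_id="u1001",                 # illustrative user ID
    item_id="120",
    item_type="image",
    bhv_types=["expose", "click"],   # assumed behavior type values
    now=1590327038,
)
print(behavior_content)
```

To send it with the script from Step 5, you would set the request table name to the behavior table instead of "item" and pass behavior_content to request.set_content.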

Note 1: Modify Data Source

If you want to change a table that has been uploaded to the AIRec instance, perform the following operations:
Prerequisite
- Make sure that permissions on the new table have been granted to AIRec by using the MaxCompute client.
1) In the Data Source section, click Modify Data Source for the table that you want to modify.

2) Enter the table name.
Do not forget to grant permissions by using the MaxCompute client.

3) Wait for the instance to restart.


Alibaba Cloud Indonesia
