Before you use PAI-Rec to build a recommendation system, you must prepare basic data and analyze user features for model training and calibration. This topic describes data specifications for typical scenarios.
Background information
Feature data generally includes the following basic tables:
User table:
Contains feature data related to users. This table is used to describe personal information, preferences, and behavioral habits of users.
User IDs in a user table are unique, and the user table is associated with the behavior table through this ID. A user table must contain basic user information such as age, gender, city, points, registration time, and user tags. The table is partitioned by day, and each partition records the information of all users.
Item table:
Contains information about recommended items. This table is used to describe the properties and characteristics of items.
Item IDs in an item table are unique, and the item table is associated with the behavior table through this ID. An item table must contain basic item information such as level 1 category, level 2 category, price, title, color, specifications, listing time, author ID, and number of followers. The table is partitioned by day, and each partition records the information of all items.
Behavior table:
Records interactions between users and items: which action a user took on which item, and when.
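The association between the three tables can be sketched as a join on the user ID and item ID. The following query is a minimal illustration, not a required step. It assumes the demo tables that are cloned in the Procedure section, and the partition value 20230101 is only an example date within the cloned range. In the demo schema, the behavior table stores user_id and item_id as STRING while the user and item tables store them as BIGINT, so the sketch casts the keys explicitly instead of relying on implicit type conversion.

```sql
-- Sketch: attach user and item features to each behavior record for one day.
-- Assumption: the demo tables exist in the current project and contain the
-- partition ds = '20230101' (an illustrative date, replace it as needed).
SELECT  b.request_id,
        b.user_id,
        b.item_id,
        b.event,
        b.event_time,
        u.gender,
        u.age,
        i.category,
        i.duration
FROM (
    SELECT * FROM rec_sln_demo_behavior_table WHERE ds = '20230101'
) b
LEFT JOIN (
    SELECT * FROM rec_sln_demo_user_table WHERE ds = '20230101'
) u
ON  CAST(b.user_id AS BIGINT) = u.user_id
LEFT JOIN (
    SELECT * FROM rec_sln_demo_item_table WHERE ds = '20230101'
) i
ON  CAST(b.item_id AS BIGINT) = i.item_id;
```

The LEFT JOIN keeps behavior records even when no matching user or item row exists for that day, which makes missing feature rows visible as NULL values.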
Prerequisites
You have created a MaxCompute project and associated it with a DataWorks workspace.
Procedure
To help you quickly get started, PAI-Rec has prepared three tables in the pai_online_project project in MaxCompute. You can clone the data to your own project for use.
Log on to the DataWorks console. In the left-side navigation pane, choose Data Development & O&M > Data Development.
On the Data Development page, select the DataWorks workspace that you created and click Go To Data Development.
On the page that appears, move the pointer over Create and choose Create Node > MaxCompute > ODPS SQL. In the dialog box that appears, configure the node parameters and click Confirm.
On the tab of the node that you created, run the following SQL statements to synchronize the user table, item table, and behavior table from the pai_online_project project to your MaxCompute project.
-- User table
CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table(
    user_id BIGINT COMMENT 'The unique ID of the user',
    gender STRING COMMENT 'The gender',
    age BIGINT COMMENT 'The age',
    city STRING COMMENT 'The city',
    item_cnt BIGINT COMMENT 'The number of created content items',
    follow_cnt BIGINT COMMENT 'The cumulative number of follows',
    follower_cnt BIGINT COMMENT 'The cumulative number of followers',
    register_time BIGINT COMMENT 'The registration time',
    tags STRING COMMENT 'The user tags'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC;

INSERT OVERWRITE TABLE rec_sln_demo_user_table PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_user_table
WHERE ds > "20221231" AND ds < "20230217";

-- Item table
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table(
    item_id BIGINT COMMENT 'The content ID',
    duration DOUBLE COMMENT 'The video duration',
    title STRING COMMENT 'The title',
    category STRING COMMENT 'The level 1 tag',
    author BIGINT COMMENT 'The author',
    click_count BIGINT COMMENT 'The cumulative number of clicks',
    praise_count BIGINT COMMENT 'The cumulative number of likes',
    pub_time BIGINT COMMENT 'The publication time'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC;

INSERT OVERWRITE TABLE rec_sln_demo_item_table PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_item_table
WHERE ds > "20221231" AND ds < "20230217";

-- Behavior table
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table(
    request_id STRING COMMENT 'The request tracking ID/request ID',
    user_id STRING COMMENT 'The unique ID of the user',
    exp_id STRING COMMENT 'The experiment ID',
    page STRING COMMENT 'The page',
    net_type STRING COMMENT 'The network type',
    event_time BIGINT COMMENT 'The behavior time',
    item_id STRING COMMENT 'The content ID',
    event STRING COMMENT 'The behavior type',
    playtime DOUBLE COMMENT 'The playback duration/reading duration'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC;

INSERT OVERWRITE TABLE rec_sln_demo_behavior_table PARTITION(ds)
SELECT *
FROM pai_online_project.rec_sln_demo_behavior_table
WHERE ds > "20221231" AND ds < "20230217";
You can also refer to Appendix: Data specifications for common scenarios to prepare your own user table, item table, and behavior table.
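After the tables are populated, a quick sanity check can confirm that the daily partitions exist and that IDs are unique within each partition. The queries below are a sketch under the assumption that the demo user table was cloned with the date range used above. The same pattern should apply to the item table with item_id in place of user_id. The LIMIT values are arbitrary.

```sql
-- 1. Number of rows in each daily partition of the cloned user table.
--    A LIMIT is included because MaxCompute by default expects one with ORDER BY.
SELECT  ds,
        COUNT(*) AS row_cnt
FROM    rec_sln_demo_user_table
WHERE   ds > '20221231' AND ds < '20230217'
GROUP BY ds
ORDER BY ds
LIMIT   100;

-- 2. user_id values that occur more than once within the same partition.
--    An empty result means user IDs are unique per partition.
SELECT  ds,
        user_id,
        COUNT(*) AS dup_cnt
FROM    rec_sln_demo_user_table
WHERE   ds > '20221231' AND ds < '20230217'
GROUP BY ds, user_id
HAVING  COUNT(*) > 1
LIMIT   100;
```

If the second query returns rows, deduplicate the source data before you use the table, because the user table is expected to hold one row per user per day.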
Appendix: Data specifications for common scenarios
E-commerce recommendation scenario
The following tables list the recommended fields for the user table, item table, and behavior table in e-commerce recommendation scenarios. You can add feature fields that are not listed in these tables. In general, more complete fields lead to better recommendation results. Your field names do not need to match the names in these tables exactly.
User table
Item table
Behavior table
Content recommendation scenario
The following tables list the recommended fields for the user table, item table, and behavior table in content recommendation scenarios. You can add feature fields that are not listed in these tables. In general, more complete fields lead to better recommendation results. Your field names do not need to match the names in these tables exactly.
User table
Item table
Behavior table
Video recommendation
The following tables describe recommended fields in user, item, and behavior tables in video recommendation scenarios. Configuring more fields will give you better recommendation results. You can also provide additional fields that are not listed in the following tables to further improve the results. The names of fields do not need to be the same as the ones in the following tables.
User table
Item table
Behavior table
Live streaming recommendation
The following tables describe recommended fields in user, item, and behavior tables in live streaming recommendation scenarios. Configuring more fields will give you better recommendation results. You can also append fields that are not listed in the following tables to further improve the results. The names of fields do not need to be the same as the ones in the following tables.