Overview
This topic describes the data specifications for the e-commerce industry. You must comply with these specifications when you upload historical data and real-time data.
Data description
If your business belongs to the e-commerce industry, you must prepare three data tables when you use Artificial Intelligence Recommendation (AIRec).
Item table: This table refers to a commodity table in this topic.
This table contains all the recent commodities that can be recommended for the current scene. The number of items is limited by quota. We recommend that you deduplicate items before you upload the table. The item_id and item_type fields are used together to uniquely identify an item.
User table: This table contains all users who recently logged on to the system.
The number of users is limited by quota. We recommend that you deduplicate users before you upload the table. You can use the imei field or a combination of the user_id and imei fields to uniquely identify a user. For example, in the latter case, you can use user_id to identify users who log on and use imei to identify users who do not log on.
Make sure that all users are unique. When you request recommendation results, you must specify the unique identifiers of users. Otherwise, personalized recommendations cannot be achieved.
Behavior table: This table contains recent behavioral data in the current scene. We recommend that you provide behavioral data of the last one to two weeks.
If behavioral data cannot be provided due to technical reasons or no historical data is available because the scene is new, you can use the test data provided by AIRec. In this case, the results of the first two weeks returned by the recommendation model may not meet your expectations. The recommendation effect gradually becomes better and more stable as more data is accumulated.
We recommend that you specify as many optional fields in the tables as possible. The more valid optional fields you specify, the better the recommendation effect is. If an optional field is not specified, the system uses its default value.
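The user identification rule described above (user_id for users who log on, imei for users who do not) can be sketched in Python as follows. This is only an illustration, not part of AIRec: the helper names user_key and dedupe_users are hypothetical, and each row is assumed to be a plain dictionary keyed by the user table field names.

```python
from typing import Dict, List, Optional


def user_key(user_id: Optional[str], imei: Optional[str]) -> str:
    """Pick the identifier for one user (hypothetical helper).

    Users who log on are identified by user_id; users who do not
    log on are identified by imei.
    """
    if user_id:
        return f"user_id:{user_id}"
    if imei:
        return f"imei:{imei}"
    raise ValueError("either user_id or imei is required")


def dedupe_users(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first row seen for each unique user key."""
    seen = set()
    unique = []
    for row in rows:
        key = user_key(row.get("user_id"), row.get("imei"))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

The same key can then be reused when you request recommendation results, so that the identifier in the request matches the identifier in the uploaded user table.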
Table schema
1. You must specify the fields that are marked Yes in the "Required" column of the following tables. The fields that are marked Yes or Recommended in the "Required" column have a significant impact on the recommendation results. The "Value description" column describes how to set each field.
2. When you use MaxCompute to upload historical data, you must create a MaxCompute table. In this case, you can leave optional fields empty in the table. However, the table must contain all fields. For more information about the statements to create a table, see the "CREATE TABLE statements" section of this topic.
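As a rough sketch of the rule that a MaxCompute table must contain all fields even when optional ones are empty, the following Python snippet expands a partial user record into a full row. The column order is copied from the user table CREATE TABLE statement in this topic. The helper name to_full_row is hypothetical, and using an empty string as the "empty" value is an assumption; your upload tool may expect NULL instead.

```python
from typing import Dict, List

# Column order taken from the user_table CREATE TABLE statement in this topic.
USER_COLUMNS = [
    "user_id", "user_id_type", "third_user_name", "third_user_type",
    "phone_md5", "imei", "content", "gender", "age", "age_group",
    "country", "city", "ip", "device_model", "register_time",
    "last_login_time", "last_modify_time", "tags", "source",
    "features", "num_features",
]


def to_full_row(record: Dict[str, str]) -> List[str]:
    """Return one value per column, leaving unspecified fields empty.

    Assumption: an empty string stands for an empty optional field.
    """
    unknown = set(record) - set(USER_COLUMNS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return [record.get(column, "") for column in USER_COLUMNS]
```

For example, to_full_row({"user_id": "1234567", "gender": "male"}) yields a 21-element list in which only the user_id and gender positions are filled.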
item
Field | Data type | Required | Field description | Valid value | Value description | Example |
item_id | string | Yes | The unique ID of an item. The value can contain only letters and digits. | Custom | 1. The item_id and item_type fields uniquely identify an item. 2. The item_id field can be up to 50 characters in length. Important The reported item IDs must be recorded for later use. | 34513 |
item_type | string | Yes | The type of the item. | image, article, video, shortvideo, item, recipe, and audio. If the enumerated values do not meet your business requirements, contact technical support. | | article |
status | string | Yes | Specifies whether the item can be recommended. | 0/1 | 1. When the value of this field is 1, the item can be recommended. 2. When the value of this field is 0, the item cannot be recommended. Note: 1. If you change the value from 0 to 1, the item is added to the recommendation list after a scheduling period of about 1 hour. 2. If you change the value from 1 to 0, the item is immediately removed from the list. | 1 |
scene_id | string | Yes | The ID of the scene. Items are released to different scenes. Scenes are designed based on the types of users that visit the web pages. | Custom | 1. We recommend that you use an acronym or a combination of letters and digits. 2. Do not use colons (:). 3. Do not set this field to -102. This value is an internally reserved value of the system. 4. If only one scene is available, set this field to 1. 5. You can set this field to multiple scene IDs and separate them with commas (,). The scene IDs match different web pages to which commodities are launched. For more information, see Use scene IDs. | sy101, gwc102. The value sy101 indicates the homepage. The value gwc102 indicates the shopping cart page. |
pub_time | string | Yes | The time at which the item is released. The value is a UNIX timestamp that is accurate to the second. This field is used to determine whether the item is the latest item. | Custom | If you have high requirements for timeliness, this field is required. This field is used for the recommendation of new items. | 1520327038 |
expire_time | string | No | The time at which the item expires. The value is a UNIX timestamp that is accurate to the second. | Custom | 1. If the current system time of your server is later than the value of this field, the item expires and is no longer recommended. 2. If all items in the table expire, the service cannot be started. 3. If this field is left empty, the item never expires. | 1520327038 |
last_modify_time | string | No | The time at which the item information was last modified. The value is a UNIX timestamp that is accurate to the second. | Custom | If you make major updates to a published item and have high requirements for timeliness, you can update this field. This field functions similarly to pub_time. Both of the fields are used to identify new items. | 1520327038 |
title | string | Recommended | The title of the item. | Custom | This field is used for in-depth semantic analysis. If this field is left empty, some effects of the algorithm may be lost. We recommend that you set this field. | CHANEL-Style Lady Dress |
weight | string | Recommended | Specifies whether the item is weighted. | Custom | Note: 1. For a weighted item, set this field to 100. For an unweighted item, set this field to 1. 2. You must set this field to 100 or 1. Other values are invalid. 3. We recommend that you keep the number of weighted items less than or equal to 10% of the total number of items. 4. A weighted item is more likely to be recommended. 5. Weighting affects the performance of the training model. We recommend that you set the weight of items with caution. | 1 |
category_level | string | Recommended | The category level, such as level 3. | Custom | If this value does not match the value of the category_path field, the discretization feature is affected. | 3 |
category_path | string | Recommended | The category path, with categories separated by underscores (_). | Custom | 1. A category path can contain multiple categories. You must separate the categories with underscores (_). 2. Commas (,) or colons (:) are not allowed. Category paths are used in discretization policies. | 12_1024_56 |
tags | string | Recommended | The tags of the item. Separate multiple tags with commas (,). | Custom | 1. Tags are used to describe the features of items. You must manage your own tag library. 2. The algorithm model may perform feature analysis based on tags. 3. You can create up to 100 tags for a single commodity. We recommend that you create up to 50,000 tags in each tag pool. 4. If tags are sensitive business data, we recommend that you convert the tags into digits based on the related mapping rules and upload the desensitized data. | famous actor, sports |
share_cnt | string | No | The number of shares in one month. | Custom | During service start, if behavioral data in the current scene is sparse, you can add behavioral data of other scenes to this field and the following fields whose names end with _cnt. Non-real-time data is acceptable. If the maintenance cost of these fields is high after the model becomes stable, you can process them at a low priority. | 156 |
collect_cnt | string | No | The number of favorites in one month. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 566 |
pv_cnt | string | No | The number of exposures in one month. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 10292 |
origin_price | string | No | The original price of the item, in USD. If the price is in another currency, convert it to USD. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 1000. The value 1000 indicates USD 1,000. |
cur_price | string | No | The price of the item after a discount, in USD. If the price is in another currency, convert it to USD. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 900. The value 900 indicates USD 900. |
buy_cnt | string | No | The monthly sales volume of transactions on the platform. | Custom | During service start, if behavioral data in the current scene is sparse, you can add behavioral data of other scenes to this field. Non-real-time data is acceptable. If the maintenance cost of these fields is high after the model becomes stable, you can process them at a low priority. | 10 |
source_buy_cnt | string | No | The monthly sales volume of transactions on Taobao. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 10000 |
comment_cnt | string | No | The number of reviews. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | 1000 |
brand_id | string | Recommended | The brand ID. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | |
shop_id | string | Recommended | The store ID. | Custom | Non-real-time data is acceptable. You can process this field at a low priority. | |
source_id | string | No | The platform by using which the item is launched to the scene. | Custom | For example, you can use 1 to indicate Taobao and 2 to indicate Tmall. | 1 |
add_fee | string | No | The additional fees for the item. | Custom | For example, you can use 0 to indicate free shipping, and use 1 to indicate chargeable shipping. You can also use a specific amount of money, which is accurate to the cent, to represent postage. | 0 |
features | string | No | The item features of the STRING type. | Custom | Separate item features with commas (,). The features must be descriptive. | |
num_features | string | No | The item features of a numeric type. | Custom | Separate item features with commas (,). Make sure that the number of commas (,) in this field is the same for all items. | |
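Several of the item table rules above are mechanical enough to check before upload. The following Python sketch covers a subset of them (item_id format, item_type, status, scene_id, pub_time, weight, category_path, and expire_time). The function names validate_item and is_expired are hypothetical, the checks run on your side only, and AIRec may apply additional validation that is not shown here.

```python
import re
import time
from typing import Dict, List, Optional

ITEM_TYPES = {"image", "article", "video", "shortvideo", "item", "recipe", "audio"}
ITEM_ID_RE = re.compile(r"[A-Za-z0-9]{1,50}")  # letters and digits, up to 50 characters
SECONDS_RE = re.compile(r"[0-9]+")              # UNIX timestamp in seconds


def validate_item(item: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one item record (empty if none)."""
    problems = []
    if not ITEM_ID_RE.fullmatch(item.get("item_id", "")):
        problems.append("item_id must be 1 to 50 letters or digits")
    if item.get("item_type") not in ITEM_TYPES:
        problems.append("item_type is not one of the enumerated values")
    if item.get("status") not in ("0", "1"):
        problems.append("status must be 0 or 1")
    # scene_id may hold several IDs separated by commas.
    for scene_id in item.get("scene_id", "").split(","):
        scene_id = scene_id.strip()
        if not scene_id or ":" in scene_id or scene_id == "-102":
            problems.append(f"invalid scene_id: {scene_id!r}")
    if not SECONDS_RE.fullmatch(item.get("pub_time", "")):
        problems.append("pub_time must be a UNIX timestamp in seconds")
    if item.get("weight", "") not in ("", "1", "100"):
        problems.append("weight must be 1 or 100")
    path = item.get("category_path", "")
    if "," in path or ":" in path:
        problems.append("category_path must not contain commas or colons")
    return problems


def is_expired(item: Dict[str, str], now: Optional[int] = None) -> bool:
    """An item expires once the current time is later than expire_time."""
    expire_time = item.get("expire_time", "")
    if not expire_time:
        return False  # an empty expire_time means the item never expires
    current = int(time.time()) if now is None else now
    return current > int(expire_time)
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a batch before you upload the table.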
user
Field | Data type | Required | Field description | Valid value | Value description | Example |
user_id | string | Required for users who log on | The unique ID of a user. | Custom | 1. This field is required for users who log on. 2. This field uniquely identifies a user. | 1234567 |
user_id_type | string | No | The registration type of the user. | 1/2/3/4 | 1: app. 2: mobile phone number. 3: WeChat account. 4: other. | 2 |
imei | string | Required for users who do not log on | For an Android user, set this field to the MD5 hash value of the International Mobile Equipment Identity (IMEI). For an iOS user, set this field to the MD5 hash value of the Identifier for Advertisers (IDFA). | Custom | 1. This field is required for users who do not log on. 2. If the MAC address or the device number is invalid, internal user portrait information cannot be used. Only the exposure blocking feature is retained. 3. The value must be an MD5 hash value with 32 characters in length. | MD5 hash value of IMEI imei358800091015835: 74f25e604e1a9dde7471fe2e25ae54d0, MD5 hash value of IDFA 41B2FD07-695A-4A27-8D26-C30ECE6F7EAD: 06e1565409c9fc4887036b974421**** |
third_user_name | string | No | The name of a third-party user. | Custom | | jack |
third_user_type | string | No | The name of a third-party platform. | Custom | ||
phone_md5 | string | No | The MD5 hash value of a mobile phone number. | Custom | | d41d8cd98f00b204e9800998ecf8**** |
gender | string | No | The gender of the user. | male/female/unknown | | male |
age | string | No | The age of the user. | Custom | | 22 |
age_group | string | No | The age group. | Custom | | 20-25 |
country | string | No | The country code. | Custom | Set this field to an ISO 3166-1 alpha-3 code. | CHN |
city | string | No | The name of the city. | Custom | | China (Hangzhou) |
ip | string | No | The last logon IP address. | Custom | | 202.113.XX.XX |
device_model | string | No | The device model. | Custom | | iphoneX |
tags | string | No | User tags. Separate multiple tags with commas (,). | Custom | Use tags to describe the user. | football, fitness, outdoor |
source | string | No | The source of the user. | Custom | | Toutiao |
content | string | No | The description of the user. | Custom | ||
features | string | No | The user characteristic, which is a string. | Custom | Separate the user characteristics, such as user portrait, with commas (,). | |
num_features | string | No | The user characteristic, which is a numerical value. | Custom | Separate user characteristics with commas (,). Make sure that the number of commas (,) in this field is the same for all users. | |
register_time | string | No | The registration time. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520007038 |
last_login_time | string | No | The last logon time. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520017038 |
last_modify_time | string | No | The time at which the user information was last modified. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520327038 |
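The imei field expects a 32-character MD5 hash of the device identifier (IMEI on Android, IDFA on iOS). A minimal Python sketch of producing and checking such a value is shown below. The function names are hypothetical. The sketch assumes that the raw identifier is hashed exactly as given, encoded as UTF-8, and rendered as lowercase hexadecimal; confirm any normalization rules (case, separators, prefixes) with technical support before relying on it.

```python
import hashlib
import re

MD5_HEX_RE = re.compile(r"[0-9a-f]{32}")


def device_id_md5(raw_device_id: str) -> str:
    """Return the MD5 hex digest of a raw IMEI or IDFA string.

    Assumption: the identifier is hashed as-is, with UTF-8 encoding.
    """
    if not raw_device_id:
        raise ValueError("raw device ID must not be empty")
    return hashlib.md5(raw_device_id.encode("utf-8")).hexdigest()


def is_valid_imei_field(value: str) -> bool:
    """Check that a value looks like a 32-character lowercase MD5 hash."""
    return MD5_HEX_RE.fullmatch(value) is not None
```

Hashing the identifier in one shared helper helps keep the imei value identical in the user table, the behavior table, and recommendation requests.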
behavior
Field | Data type | Required | Field description | Valid value | Value description | Example |
item_id | string | Yes | The ID of the item. | Custom | The value must be consistent with the value of the item_id field in the item table. | 34513 |
item_type | string | Yes | The type of the item. | image; article; video; shortvideo; item; recipe; audio; | The value must be consistent with the value of the item_type field in the item table. | article |
bhv_type | string | Yes | The behavior type, such as expose, stay, click, favorite, download, buy, cart, and review. For more information, see the Behavior types table in this topic. | expose/click/buy/cart/evaluate | The number of click entries must be less than the number of expose entries. Otherwise, the system may determine that the data is abnormal, and the service cannot be started. | expose |
trace_id | string | Yes | The request tracking ID. This field is used in A/B testing to determine whether an Alibaba recommendation engine is used. | Alibaba/selfhold | 1. If the behavior data is generated based on an Alibaba recommendation engine, set this field to Alibaba. If the behavior data is generated based on a self-developed or self-operated recommendation system, set this field to selfhold. 2. This field is used to generate analytical reports and compare the results in the console. | Alibaba |
trace_info | string | Yes | The request tracking information. The information is returned when the Recommend API operation is called. You need only to put the information in logs. | Information returned from response parameters of the Recommend API operation | 1. If the trace_id field is set to selfhold, set the trace_info field to 1. 2. If the trace_id field is set to Alibaba, trace_info is returned in the recommendation result. The value Alibaba indicates that the behavior is performed on an item that is recommended by AIRec. When you upload behavior data, you can retain the value of the trace_info field for this item. | 1007.5911.12351.1002000::::: |
scene_id | string | Yes | The ID of a scene. | Custom | 1. The ID of the scene where the behavior entry is generated. The value must be one of the scene IDs for the item that corresponds to the behavior. Only one scene ID is allowed. 2. The value of scene_id for the behavior table must be included in the value of scene_id for the item table. 3. If you do not need to distinguish between scenes, use the default value 1. If the scene ID of the behavior cannot be traced, set this field to -102. For more information, see Use scene IDs. | a1001 |
bhv_time | string | No | The time at which the behavior occurs. The value is a UNIX timestamp that is accurate to the second. | Custom | Set this field to the time at which the user performs the behavior. | 1520327038 |
bhv_value | string | Yes | Behavior details, such as the number of clicks, stay duration, number of purchased items, and monetary amount. For more information, see the Behavior types table in this topic. | Custom | 1. For clicks, set this field to 1. 2. For exposures, specify this field or leave it empty based on your business requirements. 3. For other behavior, contact technical support. | 1 |
user_id | string | Required for users who log on | The ID of a user. | Custom | 1. The value must be the same as that in the user table. 2. If the user does not log on, you can leave this field empty. | 1234567 |
platform | string | No | The client platform. | Custom | iOS/Android/H5 | iOS |
imei | string | Required for users who do not log on | The device ID. For an Android user, set this field to the MD5 hash value of the IMEI. For an iOS user, set this field to the MD5 hash value of the IDFA. | Custom | 1. This field is required for users who do not log on. 2. If the MAC address or the device number is invalid, internal user portrait information cannot be used. Only the exposure blocking feature is retained. 3. The value must be an MD5 hash value with 32 characters in length. | e2fcdb0f4dce45e35fe2823d7973**** |
app_version | string | No | The version number of the app. | Custom | | 4.1.10 |
net_type | string | No | The type of the network. | Custom | 2G/3G/4G/WIFI | 4G |
ip | string | No | The IP address of the client. | Custom | | 234.45.XX.XX |
login | string | No | Specifies whether the user logs on. | 0/1 | 0: The user does not log on. 1: The user logs on. | 1 |
report_src | string | No | The source of the report. | 1/2 | 1: the server. 2: the client. | 2 |
device_model | string | No | The device model. | Custom | | iphoneX |
longitude | string | No | The longitude. | Custom | | 128.4 |
latitude | string | No | The latitude. | Custom | | 78.1 |
module_id | string | No | The ID of the module. | Custom | | 114 |
page_id | string | No | The page ID. | Custom | | 4 |
position | string | No | The position of the item. | Custom | | 5 |
message_id | string | No | The unique identifier of a behavior entry. | Custom | If you do not set this field, the system uses the item_id, item_type, user_id, imei, bhv_type, and bhv_time fields to deduplicate behavior entries. | 5 |
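Two of the behavior table rules above lend themselves to a local check before upload: the deduplication key that is used when message_id is not set, and the requirement that click entries be fewer than expose entries. The Python sketch below mirrors those rules on the client side. The function names are hypothetical, and treating message_id as the sole key when it is present is an assumption; the actual deduplication is performed by the system.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Fields used for deduplication when message_id is not set.
DEDUP_FIELDS = ("item_id", "item_type", "user_id", "imei", "bhv_type", "bhv_time")


def dedup_key(entry: Dict[str, str]) -> Tuple[str, ...]:
    """Build the key that identifies one behavior entry."""
    message_id = entry.get("message_id")
    if message_id:
        # Assumption: a message_id, when present, identifies the entry by itself.
        return ("message_id", message_id)
    return tuple(entry.get(field, "") for field in DEDUP_FIELDS)


def deduplicate(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop entries whose key has already been seen, keeping the first."""
    seen = set()
    unique = []
    for entry in entries:
        key = dedup_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


def clicks_below_exposes(entries: List[Dict[str, str]]) -> bool:
    """Return True if there are fewer click entries than expose entries."""
    counts = Counter(entry.get("bhv_type") for entry in entries)
    return counts["click"] < counts["expose"]
```

Running clicks_below_exposes on the deduplicated entries gives an early warning for data that the system might treat as abnormal.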
Behavior types
The following table describes the behavior types that are supported by the e-commerce industry. If other behavior types are required, contact technical support.
The behavior types that you upload determine what the model optimizes. If you upload only click-related behavior entries, click-through rate (CTR) is the main optimization objective.
If you also upload consumption-related data, such as entries for adding items to a shopping cart and purchasing items, the model takes these behaviors into account as well, but CTR remains the main optimization objective. If you cannot identify the scene where the add-to-cart or buy behavior occurs, set scene_id to -102.
For more information about scenes, see Use scene IDs.
No. | Description | bhv_type | bhv_value | Remarks |
1 | The behavior to expose an item. | expose | Leave this field empty. | The behavior table must contain expose entries. The number of expose entries must be greater than the number of click entries. |
2 | The behavior to click an item. | click | 1 | The behavior table must contain click entries. |
3 | The "like" behavior on an item. | like | Leave this field empty. | / |
4 | The "dislike" behavior on an item. | unlike | Leave this field empty. | / |
5 | The "review" behavior on an item. | comment | Leave this field empty. | / |
6 | The "favorite" behavior on an item. | collect | Leave this field empty. | / |
7 | The "stay for" behavior on an item. | stay | A duration. | You can use any unit of time, but the unit must be the same for all data entries. |
8 | The behavior to add an item to the shopping cart. | cart | Number of items,unit price. The item number is separated from the price with a comma (,). Example: 1,10000. | Unit price: USD, accurate to the cent. |
9 | The behavior to purchase an item. | buy | Number of items,unit price. The item number is separated from the price with a comma (,). Example: 1,10000. | Unit price: USD, accurate to the cent. One buy behavior entry corresponds only to one item ID specified by the item_id field. If an order contains multiple item IDs, you must split them. |
10 | The behavior to review an item. | evaluate | Discrete integers in ascending or descending order. | For example, if star rating is used, you can use integers 1 to 5 in ascending order for positive reviews. One star is indicated by the value 1, two stars are indicated by the value 2, and five stars are indicated by the value 5. Make sure that an ascending order is positively correlated to the trend of positive reviews. |
11 | The behavior to provide negative feedback on an item. | dislike | For more information, see Negative feedback. | / |
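The rule that one buy entry corresponds to exactly one item_id means that a multi-item order must be split before upload. The following Python sketch shows one way to do this and to build bhv_value in the "number of items,unit price" form. The function name buy_entries and the order-line dictionary layout are hypothetical. The unit price is passed through as a string that the caller has already formatted, and the defaults assume a self-operated recommendation source (trace_id selfhold, trace_info 1) and an untraceable scene (scene_id -102).

```python
from typing import Dict, List


def buy_entries(
    order_lines: List[Dict[str, object]],
    user_id: str,
    bhv_time: int,
    scene_id: str = "-102",      # use -102 when the scene cannot be identified
    trace_id: str = "selfhold",  # behavior not generated from AIRec results
    trace_info: str = "1",       # required value when trace_id is selfhold
    item_type: str = "item",
) -> List[Dict[str, str]]:
    """Split one order into one buy entry per item_id."""
    entries = []
    for line in order_lines:
        entries.append({
            "item_id": str(line["item_id"]),
            "item_type": item_type,
            "bhv_type": "buy",
            # bhv_value format: number of items, comma, unit price
            "bhv_value": f'{line["quantity"]},{line["unit_price"]}',
            "bhv_time": str(bhv_time),
            "user_id": user_id,
            "scene_id": scene_id,
            "trace_id": trace_id,
            "trace_info": trace_info,
        })
    return entries
```

The same shape works for cart entries if bhv_type is changed to cart, because both behavior types use the same bhv_value format.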
CREATE TABLE statements
If you use MaxCompute to upload the historical data that is used to start the service, you can refer to the following CREATE TABLE statements:
--- Create a behavior table for the e-commerce industry.
DROP TABLE IF EXISTS behavior_table;
CREATE TABLE IF NOT EXISTS `behavior_table`
(
trace_id STRING COMMENT "Request tracking ID"
,trace_info STRING COMMENT "Request tracking information"
,platform STRING COMMENT "Client platform"
,device_model STRING COMMENT "Device model"
,imei STRING COMMENT "Device ID"
,app_version STRING COMMENT "App version number"
,net_type STRING COMMENT "Network type"
,longitude STRING COMMENT "Longitude"
,latitude STRING COMMENT "Latitude"
,ip STRING COMMENT "Client IP address"
,login STRING COMMENT "Whether the user logs on"
,report_src STRING COMMENT "Source of the report"
,scene_id STRING COMMENT "Scene ID"
,user_id STRING COMMENT "User ID"
,item_id STRING COMMENT "Item ID"
,item_type STRING COMMENT "Type of the item"
,module_id STRING COMMENT "Module ID"
,page_id STRING COMMENT "Page ID"
,position STRING COMMENT "Position of the item"
,bhv_type STRING COMMENT "Behavior type"
,bhv_value STRING COMMENT "Behavior details"
,bhv_time STRING COMMENT "Time at which the behavior occurs"
)
PARTITIONED BY
(
ds STRING
)
LIFECYCLE 30
;
--- Create a user table for the e-commerce industry.
DROP TABLE IF EXISTS user_table;
CREATE TABLE IF NOT EXISTS `user_table`
(
user_id STRING COMMENT "Unique ID of the user"
,user_id_type STRING COMMENT "Registration type of the user"
,third_user_name STRING COMMENT "Name of the third-party user"
,third_user_type STRING COMMENT "Name of the third-party platform"
,phone_md5 STRING COMMENT "MD5 hash value of the mobile phone number of the user"
,imei STRING COMMENT "Device ID of the user"
,content STRING COMMENT "User content"
,gender STRING COMMENT "Gender"
,age STRING COMMENT "Age"
,age_group STRING COMMENT "Age group"
,country STRING COMMENT "Country"
,city STRING COMMENT "City"
,ip STRING COMMENT "Last logon IP address"
,device_model STRING COMMENT "Device model"
,register_time STRING COMMENT "Registration time"
,last_login_time STRING COMMENT "Last logon time"
,last_modify_time STRING COMMENT "Time at which the user information was last modified"
,tags STRING COMMENT "User tags"
,source STRING COMMENT "Source of the user"
,features STRING COMMENT "Additional user features of the STRING type"
,num_features STRING COMMENT "Additional user features of a numeric type"
)
PARTITIONED BY
(
ds STRING
)
LIFECYCLE 30
;
--- Create an item table for the e-commerce industry.
DROP TABLE IF EXISTS item_table;
CREATE TABLE IF NOT EXISTS `item_table`
(
scene_id STRING COMMENT "Scene ID"
,item_id STRING COMMENT "Unique ID of the item"
,item_type STRING COMMENT "Type of the item"
,category_level STRING COMMENT "Category level"
,category_path STRING COMMENT "Category path"
,title STRING COMMENT "Item title"
,content STRING COMMENT "Body part of the item"
,pub_time STRING COMMENT "Publish time"
,tags STRING COMMENT "Tags"
,share_cnt STRING COMMENT "Number of shares"
,collect_cnt STRING COMMENT "Number of favorites"
,pv_cnt STRING COMMENT "Number of exposures"
,status STRING COMMENT "Whether the item can be recommended"
,expire_time STRING COMMENT "Time at which the item expires"
,last_modify_time STRING COMMENT "Time at which the item information was last modified"
,origin_price STRING COMMENT "Original price"
,cur_price STRING COMMENT "Price after a discount"
,buy_cnt STRING COMMENT "Monthly sales volume of transactions on the platform"
,source_buy_cnt STRING COMMENT "Monthly sales volume of transactions on Taobao"
,comment_cnt STRING COMMENT "Number of reviews"
,brand_id STRING COMMENT "Brand ID"
,shop_id STRING COMMENT "Store ID"
,source_id STRING COMMENT "Source of the item"
,add_fee STRING COMMENT "Additional fees"
,features STRING COMMENT "Additional item features of the STRING type"
,num_features STRING COMMENT "Additional item features of a numeric type"
,weight STRING COMMENT "Weight of the item, default value: 1"
)
PARTITIONED BY
(
ds STRING
)
LIFECYCLE 30
;