E-commerce industry

Last Updated: Jul 28, 2021

Data description

If your business belongs to the e-commerce industry, you must prepare three data tables when you use Artificial Intelligence Recommendation (AIRec).

  1. Item table: In this topic, an item is a commodity, so this table is a commodity table. It contains all the recent commodities that can be recommended in the current scene. The number of items is limited by quota. We recommend that you deduplicate items before you upload the table. The item_id and item_type fields are used together to uniquely identify an item.

  2. User table: This table contains all users who have recently logged onto the system. The number of users is limited by quota. We recommend that you deduplicate users before you upload the table. You can use the imei field or a combination of the user_id and imei fields to uniquely identify a user. For example, in the latter case, you can use user_id to identify users who have logged on and use imei to identify users who have not logged on. Make sure that all users are unique. When you request recommendation results, you must specify the unique identifiers of users. Otherwise, personalized recommendations cannot be achieved.

  3. Behavior table: This table contains recent behavioral data in the current scene. We recommend that you provide behavioral data generated in the last one to two weeks. If you cannot provide behavioral data for technical reasons, or no historical data is available because the scene is new, you can use the test data provided by AIRec. In this case, the recommendation model may return unsatisfactory results for about two weeks. The recommendation effect gradually improves and then stabilizes as more data accumulates.

We recommend that you specify as many optional fields in the tables as possible. The more valid optional fields you specify, the better the recommendation effect is. If an optional field is not specified, the system uses its default value.
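The uniqueness rules for the item and user tables described above can be sketched in Python. This is a minimal illustration and not part of AIRec: the function names dedupe_items and user_key are hypothetical, and keeping the last occurrence of a duplicate item is an assumption made for the example.

```python
def dedupe_items(rows):
    """Keep one row per (item_id, item_type); a later row replaces an earlier one."""
    unique = {}
    for row in rows:
        unique[(row["item_id"], row["item_type"])] = row
    return list(unique.values())


def user_key(user_id, imei):
    """Pick the identifier for a user: user_id if logged on, otherwise imei."""
    if user_id:
        return ("user_id", user_id)
    if imei:
        return ("imei", imei)
    raise ValueError("either user_id or imei is required to identify a user")


# Two rows share item_id 34513 and item_type item; only the later one is kept.
items = [
    {"item_id": "34513", "item_type": "item", "title": "old title"},
    {"item_id": "34513", "item_type": "article", "title": "an article"},
    {"item_id": "34513", "item_type": "item", "title": "new title"},
]
deduped = dedupe_items(items)
```

The same item_id with a different item_type is treated as a separate item, which matches the rule that the two fields together identify an item.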

Table schema

You must specify the fields that are marked as "Required" in the Required column of the following tables. The fields that are marked as "Required" and "Recommended" in the Required column have significant impacts on the recommendation effect. The fields are described in the "Value description" column.

Item table

| Field name | Data type | Required | Field description | Valid value | Value description | Example |
| --- | --- | --- | --- | --- | --- | --- |
| item_id | String | Required | The ID of an item. | Custom | 1. The item_id and item_type fields uniquely identify an item. 2. The item_id field can be a maximum of 512 bytes in length. | 34513 |
| item_type | String | Required | The type of the item. | image, article, video, shortvideo, item, recipe, audio (If the enumerated values do not meet your business requirements, contact technical support.) | 1. The uploaded data must match the specified item type. 2. For more information about the arrangement of multiple item types, see mixed sorting policies. | article |
| status | String | Required | Specifies whether the item can be recommended. | 0, 1 | 1: The item can be recommended. 0: The item cannot be recommended. Note: 1. If you change the value from 0 to 1, the item is listed after a scheduling period of about 1 hour. 2. If you change the value from 1 to 0, the listing of the item is immediately removed. | 1 |
| scene_id | String | Required | The ID of the scene. Items are launched to different scenes. Scenes are designed based on the access types of users on different web pages. | Custom | 1. We recommend that you use an acronym or a combination of letters and digits. 2. Do not use colons (:). 3. Do not set this field to -102. This value is an internal reserved value of the system. 4. If only a single scene is available, set this field to 1. 5. You can set this field to multiple scene IDs and separate them with commas (,). The scene IDs match different web pages to which commodities are launched. For more information, see Use event tracking. | sy101,gwc102 (sy101 indicates the homepage. gwc102 indicates the shopping cart page.) |
| pub_time | String | Required | The time at which the item is published. The value is a UNIX timestamp that is accurate to the second. This field is used to determine whether the item is the latest item. | Custom | If you have high requirements for timeliness, this field is required. This field is used for recommendation of new items. | 1520327038 |
| expire_time | String | Optional | The time at which the item expires. The value is a UNIX timestamp that is accurate to the second. | Custom | 1. If the current system time of your server is later than the value of this field, the item expires and is no longer recommended. 2. If all items in the table have expired, the service cannot be started. 3. If this field is left empty, the item never expires. | 1520327038 |
| last_modify_time | String | Optional | The last modification time of item information. The value is a UNIX timestamp that is accurate to the second. | Custom | If you have made major updates to a published item and have high requirements for timeliness, you can update this field. This field functions similarly to pub_time. Both of the fields are used to identify new items. | 1520327038 |
| title | String | Recommended | The title of the item. | Custom | This field is used for in-depth semantic analysis. If this field is left empty, partial effects of the recommendation algorithm may be lost. We recommend that you specify this field. | CHANEL-Style Lady Dress |
| weight | String | Recommended | Specifies whether the item is weighted. | Custom | Note: 1. For a weighted item, set this field to 100. For an unweighted item, set this field to 1. 2. You must set this field to 100 or 1. Other values are invalid. 3. We recommend that you keep the number of weighted items less than or equal to 10% of the total number of items. 4. A weighted item is more likely to be recommended. 5. Weighting affects the performance of the training model. We recommend that you set the weight of items with caution. | 1 |
| category_level | String | Recommended | The category level, such as level 3. | Custom | If this value does not match the value of the category_path field, the discretization feature is affected. | 3 |
| category_path | String | Recommended | The category path, with categories combined by underscores (_). | Custom | 1. A category path can contain multiple categories. You must combine the categories with underscores (_). 2. Commas (,) and colons (:) are not allowed. Category paths are used in discretization policies. | 12_1024_56 |
| tags | String | Recommended | The tags of the item. Separate multiple tags with commas (,). | Custom | 1. Tags are used to describe the features of items. You must manage your own tag library. 2. The algorithm model performs feature analysis based on tags and trains the distribution of hot items based on behavioral data. 3. The number of tags for a single commodity cannot exceed 100. We recommend that the total number of tags in a tag pool does not exceed 50,000. 4. If tags are sensitive business data, we recommend that you convert the tags into digits based on the related mapping rules and upload the desensitized data. | Goddess style,Warm spring,Street fashion,Sports |
| share_cnt | String | Optional | The number of shares in one month. | Custom | During service start, if behavioral data in the current scene is sparse, you can add behavioral data of other scenes to this field and the following fields whose names end with _cnt. Timeliness is not required. If the maintenance cost of these fields is high after the model becomes stable, you can process them at a low priority. | 156 |
| collect_cnt | String | Optional | The number of favorites in one month. | Custom | Timeliness is not required. You can process this field at a low priority. | 566 |
| pv_cnt | String | Optional | The number of exposures in one month. | Custom | Timeliness is not required. You can process this field at a low priority. | 10292 |
| origin_price | String | Optional | The original price of the item. Unit: USD. If the price is in a different currency, you must convert it into USD. | Custom | Timeliness is not required. You can process this field at a low priority. | 1000 |
| cur_price | String | Optional | The price of the item after a discount. Unit: USD. If the price is in a different currency, you must convert it into USD. | Custom | Timeliness is not required. You can process this field at a low priority. | 900 |
| buy_cnt | String | Optional | The monthly sales volume of the platform. | Custom | During service start, if behavioral data in the current scene is sparse, you can add behavioral data of other scenes to this field and the following fields whose names end with _cnt. Timeliness is not required. If the maintenance cost of these fields is high after the model becomes stable, you can process them at a low priority. | 10 |
| source_buy_cnt | String | Optional | The monthly sales volume of Taobao. | Custom | Timeliness is not required. You can process this field at a low priority. | 10000 |
| comment_cnt | String | Optional | The number of comments. | Custom | Timeliness is not required. You can process this field at a low priority. | 1000 |
| brand_id | String | Optional | The ID of the brand. | Custom | Timeliness is not required. You can process this field at a low priority. | |
| shop_id | String | Optional | The ID of the store. | Custom | Timeliness is not required. You can process this field at a low priority. | |
| source_id | String | Optional | The platform by which the item is launched to the scene. | Custom | For example, you can use 1 to indicate Taobao and 2 to indicate Tmall. | 1 |
| add_fee | String | Optional | The additional fee for the item. | Custom | For example, you can use 0 to indicate free shipping, and use 1 to indicate chargeable shipping. You can also use a specific amount of money, which is accurate to the cent, to represent postage. | 0 |
| features | String | Optional | The item features of the STRING type. | Custom | Separate item features with commas (,). The features must be descriptive. | |
| num_features | String | Optional | The item features of a numeric type. | Custom | Separate item features with commas (,). Make sure that the number of commas (,) in this field is the same for all items. | |
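Several item table constraints can be checked before upload. The following Python sketch shows one way to do so. The function validate_item is hypothetical and not an AIRec API. It also makes two assumptions that this topic does not state: the 512-byte limit of item_id is measured on the UTF-8 encoding, and category_level matches category_path when it equals the number of categories in the path.

```python
REQUIRED_ITEM_FIELDS = ("item_id", "item_type", "status", "scene_id", "pub_time")
ITEM_TYPES = {"image", "article", "video", "shortvideo", "item", "recipe", "audio"}


def validate_item(row):
    """Return a list of problems found in one item row (all values are strings)."""
    problems = []
    for field in REQUIRED_ITEM_FIELDS:
        if not row.get(field):
            problems.append(f"missing required field: {field}")
    # Assumption: the byte length is measured on the UTF-8 encoding.
    if len(row.get("item_id", "").encode("utf-8")) > 512:
        problems.append("item_id is longer than 512 bytes")
    if row.get("item_type") and row["item_type"] not in ITEM_TYPES:
        problems.append("item_type is not an enumerated value")
    if row.get("status") and row["status"] not in ("0", "1"):
        problems.append("status must be 0 or 1")
    for scene in row.get("scene_id", "").split(","):
        if ":" in scene:
            problems.append("scene_id must not contain a colon")
        if scene == "-102":
            problems.append("scene_id -102 is reserved")
    if row.get("weight") and row["weight"] not in ("1", "100"):
        problems.append("weight must be 1 or 100")
    path = row.get("category_path", "")
    if "," in path or ":" in path:
        problems.append("category_path must not contain commas or colons")
    level = row.get("category_level")
    # Assumption: the level equals the number of underscore-separated categories.
    if path and level and str(len(path.split("_"))) != level:
        problems.append("category_level does not match category_path")
    tags = row.get("tags", "")
    if tags and len(tags.split(",")) > 100:
        problems.append("more than 100 tags")
    return problems


good = {
    "item_id": "34513", "item_type": "item", "status": "1",
    "scene_id": "sy101,gwc102", "pub_time": "1520327038", "weight": "1",
    "category_level": "3", "category_path": "12_1024_56",
    "tags": "Goddess style,Warm spring",
}
bad = dict(good, scene_id="-102", weight="5", category_level="2")
```

Here validate_item(good) returns an empty list, and validate_item(bad) reports the reserved scene ID, the invalid weight, and the category mismatch.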

User table

| Field name | Data type | Required | Field description | Valid value | Value description | Example |
| --- | --- | --- | --- | --- | --- | --- |
| user_id | String | Required for users who have logged on | The unique ID of a user. | Custom | 1. This field is required for users who have logged on. 2. This field uniquely identifies a user. | 1234567 |
| user_id_type | String | Optional | The registration type of the user. | 1, 2, 3, 4 | 1: app. 2: mobile phone number. 3: WeChat. 4: other. | 2 |
| imei | String | Required for users who have not logged on | For an Android user, set this field to the MD5 hash of the IMEI. For an iOS user, set this field to the MD5 hash of the IDFA. | Custom | 1. This field is required for users who have not logged on. 2. If the MAC address or the device number is invalid, internal user persona information cannot be used. Only the recommendation blocking feature is retained. 3. The value must be 32 characters in length. | MD5 hash of IMEI 358800091015835: 74f25e604e1a9dde7471fe2e25ae54d0, MD5 hash of IDFA 41B2FD07-695A-4A27-8D26-C30ECE6F7EAD: 06e1565409c9fc4887036b97442135ee |
| third_user_name | String | Optional | The name of a third-party user. | Custom | | jack |
| third_user_type | String | Optional | The name of a third-party platform. | Custom | | wechat |
| phone_md5 | String | Optional | The MD5 hash of a mobile phone number. | Custom | | d41d8cd98f00b204e9800998ecf8427e |
| gender | String | Optional | The gender of the user. | male, female, unknown | | male |
| age | String | Optional | The age of the user. | Custom | | 22 |
| age_group | String | Optional | The age group. | Custom | | 20-25 |
| country | String | Optional | The country code. | Custom | Set this field to an ISO 3166-1 alpha-3 code. | CHN |
| city | String | Optional | The city name. | Custom | | Hangzhou |
| ip | String | Optional | The last logon IP address. | Custom | | 202.113.34.16 |
| device_model | String | Optional | The device model. | Custom | | iphoneX |
| tags | String | Optional | User tags. Separate multiple tags with commas (,). | Custom | Use tags to describe the user. | football,fitness,outdoor |
| source | String | Optional | The source of the user. | Custom | | Toutiao |
| content | String | Optional | The description of the user. | Custom | | |
| features | String | Optional | The user features of the STRING type. | Custom | Separate the user features, such as user persona, with commas (,). | |
| num_features | String | Optional | The user features of a numeric type. | Custom | Separate user features with commas (,). Make sure that the number of commas (,) in this field is the same for all users. | |
| register_time | String | Optional | The registration time. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520007038 |
| last_login_time | String | Optional | The last logon time. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520017038 |
| last_modify_time | String | Optional | The last modification time of user information. The value is a UNIX timestamp that is accurate to the second. | Custom | | 1520327038 |
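The imei and phone_md5 fields hold MD5 hashes rather than raw identifiers. The following Python sketch uses the standard hashlib module to produce such a value and to check its shape. The helper names md5_hex and is_valid_imei_field are hypothetical. This topic does not say whether the raw IMEI or IDFA must be normalized (for example, letter case or hyphens) before hashing, so the sketch hashes the string as given, UTF-8 encoded. The topic only requires 32 characters; checking for hexadecimal digits is an additional assumption.

```python
import hashlib
import re

# Shape of an MD5 hash in hexadecimal form: exactly 32 hex digits.
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def md5_hex(value):
    """Return the lowercase hexadecimal MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def is_valid_imei_field(value):
    """Check that a value looks like an MD5 hash: 32 hexadecimal characters."""
    return MD5_HEX.fullmatch(value) is not None


# Hash the sample IMEI that appears in the Example column of the imei field.
device_hash = md5_hex("358800091015835")
```

A raw IMEI such as 358800091015835 fails is_valid_imei_field because it is not 32 characters long, which is a quick way to catch unhashed values before upload.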

Behavior table

| Field name | Data type | Required | Field description | Valid value | Value description | Example |
| --- | --- | --- | --- | --- | --- | --- |
| item_id | String | Required | The ID of an item. | Custom | The value must be the same as the value of the item_id field in the item table. | 34513 |
| item_type | String | Required | The type of the item. | image, article, video, shortvideo, item, recipe, audio | The value must be the same as the value of the item_type field in the item table. | article |
| bhv_type | String | Required | The behavior type, such as expose, stay, click, collect, download, buy, cart, and evaluate. For more information, see "Behavior types" in this topic. | expose, click, buy, cart, evaluate | The number of click entries must be less than the number of expose entries. Otherwise, the system may determine that the data is abnormal, and the service cannot be started. | expose |
| trace_id | String | Required | The request tracking ID. This field is used in A/B testing to determine whether AIRec is used. | Alibaba, selfhold | 1. If the behavior entry is generated by using AIRec, set this field to Alibaba. If the behavior entry is generated by using a self-developed or self-operated recommendation system, set this field to selfhold. 2. This field is used for report analysis and effect comparison in the console. | Alibaba |
| trace_info | String | Required | The request tracking information. The information is returned when the Recommend API operation is called. You need only to put the information in logs. | Information returned from response parameters of the Recommend API operation | 1. If trace_id is set to selfhold, set trace_info to 1. 2. If trace_id is set to Alibaba, trace_info is returned in the recommendation result. The value Alibaba indicates that the behavior is performed on an item that is recommended by AIRec. When you upload behavioral data, you need only to specify the original trace_info for the item. | 1007.5911.12351.1002000::::: |
| scene_id | String | Required | The scene ID. | Custom | 1. The ID of the scene where the behavior entry is generated. The value must be one of the scene IDs for the item that corresponds to the behavior. Only one scene ID is allowed. 2. The value of scene_id for the behavior table must be included in the value of scene_id for the item table. 3. If you do not need to distinguish scenes, use the default value 1. If the scene ID of the behavior cannot be traced, set this field to -102. For more information, see Use event tracking. | a1001 |
| bhv_time | String | Optional | The time at which the behavior occurs. The value is a UNIX timestamp that is accurate to the second. | Custom | Set this field to the time when the user performs the behavior. | 1520327038 |
| bhv_value | String | Required | Behavior details, such as the number of clicks, stay duration, number of purchased items, and monetary amount. For more information, see "Behavior types" in this topic. | Custom | 1. For clicks, set this field to 1. 2. For exposures, specify this field or leave it empty based on your business requirements. 3. For other behavior, contact technical support. | 1 |
| user_id | String | Required for users who have logged on | The user ID. | Custom | 1. The value must be the same as the value in the user table. 2. If the user has not logged on, you can leave this field empty. | 1234567 |
| platform | String | Optional | The client platform. | Custom | ios/android/h5 | ios |
| imei | String | Required for users who have not logged on | The device ID. For an Android user, set this field to the MD5 hash of the IMEI. For an iOS user, set this field to the MD5 hash of the IDFA. | Custom | 1. This field is required for users who have not logged on. 2. If the MAC address or the device number is invalid, internal user persona information cannot be used. Only the recommendation blocking feature is retained. 3. The value must be 32 characters in length. | e2fcdb0f4dce45e35fe2823d797333ec |
| app_version | String | Optional | The app version number. | Custom | | 4.1.10 |
| net_type | String | Optional | The network type. | Custom | 2G/3G/4G/WIFI | 4G |
| ip | String | Optional | The client IP address. | Custom | | 234.45.13.14 |
| login | String | Optional | Specifies whether the user has logged on. | 0, 1 | 0: The user has not logged on. 1: The user has logged on. | 1 |
| report_src | String | Optional | The report source. | 1, 2 | 1: server. 2: client. | 2 |
| device_model | String | Optional | The device model. | Custom | | iphoneX |
| longitude | String | Optional | The longitude. | Custom | | 128.4 |
| latitude | String | Optional | The latitude. | Custom | | 78.1 |
| module_id | String | Optional | The module ID. | Custom | | 114 |
| page_id | String | Optional | The page ID. | Custom | | 4 |
| position | String | Optional | The position of the item. | Custom | | 5 |
| message_id | String | Optional | The unique identifier of a behavior entry. | Custom | If you do not specify this field, the system uses the item_id, item_type, user_id, imei, bhv_type, and bhv_time fields to deduplicate behavior entries. | 5 |
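Three behavior table rules lend themselves to a quick local check: deduplication by message_id with a fallback to six fields, the requirement that click entries are fewer than expose entries, and the pairing of trace_id with trace_info. The following Python sketch illustrates them. All function names are hypothetical, and the sketch only mirrors the rules stated in this topic. It does not reproduce how AIRec itself deduplicates or validates data.

```python
from collections import Counter

DEDUP_FIELDS = ("item_id", "item_type", "user_id", "imei", "bhv_type", "bhv_time")


def dedup_key(entry):
    """Use message_id if present; otherwise fall back to the six listed fields."""
    if entry.get("message_id"):
        return ("message_id", entry["message_id"])
    return tuple(entry.get(field, "") for field in DEDUP_FIELDS)


def dedupe_behaviors(entries):
    """Drop entries whose deduplication key was already seen, keeping the first."""
    seen = set()
    kept = []
    for entry in entries:
        key = dedup_key(entry)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


def clicks_below_exposures(entries):
    """Check the rule that click entries must be fewer than expose entries."""
    counts = Counter(entry["bhv_type"] for entry in entries)
    return counts["click"] < counts["expose"]


def trace_fields(recommended_by_airec, trace_info=None):
    """Return trace_id and trace_info for an AIRec or a self-operated result."""
    if recommended_by_airec:
        if not trace_info:
            raise ValueError("trace_info from the Recommend API operation is required")
        return {"trace_id": "Alibaba", "trace_info": trace_info}
    return {"trace_id": "selfhold", "trace_info": "1"}


base = {"item_id": "34513", "item_type": "item", "user_id": "1234567", "imei": ""}
log = [
    dict(base, bhv_type="expose", bhv_time="1520327038"),
    dict(base, bhv_type="expose", bhv_time="1520327038"),  # exact duplicate
    dict(base, bhv_type="expose", bhv_time="1520327040"),
    dict(base, bhv_type="click", bhv_time="1520327045"),
]
clean = dedupe_behaviors(log)
```

After deduplication, clean holds two expose entries and one click entry, so clicks_below_exposures(clean) is true.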

Behavior types

The following table describes the behavior types that are supported in the e-commerce industry. If other behavior types are required, contact technical support.

Note:

The behavior types that you upload determine what the model optimizes. If you upload only click-related behavior entries, click-through rate (CTR) is the main optimization objective.

If you also upload consumption-related data, such as add-to-cart and purchase behavior entries, the model also optimizes for those behaviors, but CTR remains the core optimization objective. If you cannot identify the scene where an add-to-cart or buy behavior occurs, set scene_id to -102.

For more information about scenes, see Use event tracking.

| No. | Description | bhv_type | bhv_value | Remarks |
| --- | --- | --- | --- | --- |
| 1 | The behavior to expose an item. | expose | Leave this field empty. | The behavior table must contain expose entries. The number of expose entries must be greater than the number of click entries. |
| 2 | The behavior to click an item. | click | 1 | The behavior table must contain click entries. |
| 3 | The behavior to like an item. | like | Leave this field empty. | / |
| 4 | The behavior to dislike an item. | unlike | Leave this field empty. | / |
| 5 | The behavior to comment on an item. | comment | Leave this field empty. | / |
| 6 | The behavior to favorite an item. | collect | Leave this field empty. | / |
| 7 | The behavior to stay on an item. | stay | A duration. | Any unit of time is allowed, but the unit must be the same for all data entries. |
| 8 | The behavior to add an item to the shopping cart. | cart | Number of items,Unit price, separated by a comma (,). Example: 1,10000. | Unit price: USD, accurate to the cent. |
| 9 | The behavior to purchase an item. | buy | Number of items,Unit price, separated by a comma (,). Example: 1,10000. | Unit price: USD, accurate to the cent. One buy behavior entry corresponds to only one item ID specified by item_id. If an order contains multiple item IDs, split the order into one entry per item ID. |
| 10 | The behavior to evaluate an item. | evaluate | Discrete integers in ascending or descending order. | For example, if star rating is used, you can use integers 1 to 5 in ascending order for positive reviews. Make sure that each rating level corresponds to exactly one integer. |
| 11 | The behavior to provide negative feedback on an item. | dislike | For more information, see negative feedback. | |
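The bhv_value format for cart and buy entries, and the rule that an order with several item IDs is split into one buy entry per item ID, can be sketched in Python as follows. The helpers purchase_value and split_order and the shape of the order records are hypothetical. The unit price is passed through as a string that the caller has already formatted, because this topic gives only the example 1,10000 and does not spell out how the amount is expressed.

```python
def purchase_value(quantity, unit_price):
    """Format bhv_value for cart and buy entries as "<number of items>,<unit price>"."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return f"{quantity},{unit_price}"


def split_order(order_lines, base_entry):
    """Turn one order into one buy entry per item ID."""
    entries = []
    for line in order_lines:
        entry = dict(base_entry)  # copy the shared fields; base_entry is not modified
        entry.update(
            item_id=line["item_id"],
            bhv_type="buy",
            bhv_value=purchase_value(line["quantity"], line["unit_price"]),
        )
        entries.append(entry)
    return entries


# Hypothetical order with two item IDs; the scene cannot be traced, so scene_id is -102.
order = [
    {"item_id": "34513", "quantity": 1, "unit_price": "10000"},
    {"item_id": "34514", "quantity": 2, "unit_price": "2500"},
]
base = {"user_id": "1234567", "item_type": "item", "scene_id": "-102", "bhv_time": "1520327038"}
buy_entries = split_order(order, base)
```

The result is two buy entries that share the user, scene, and time fields and differ only in item_id and bhv_value.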

CREATE TABLE statements

If you use MaxCompute to upload the data that is used to start the service, you can refer to the following CREATE TABLE statements:

--- Create a behavior table in the e-commerce industry.
DROP TABLE IF EXISTS behavior_table;
CREATE TABLE IF NOT EXISTS `behavior_table`
(
    trace_id STRING COMMENT "Request tracking ID"
    ,trace_info STRING COMMENT "Request tracking information"
    ,platform STRING COMMENT "Client platform"
    ,device_model STRING COMMENT "Device model"
    ,imei STRING COMMENT "Device ID"
    ,app_version STRING COMMENT "App version number"
    ,net_type STRING COMMENT "Network type"
    ,longitude STRING COMMENT "Longitude"
    ,latitude STRING COMMENT "Latitude"
    ,ip STRING COMMENT "Client IP address"
    ,login STRING COMMENT "Whether the user has logged on"
    ,report_src STRING COMMENT "Report source"
    ,scene_id STRING COMMENT "Scene ID"
    ,user_id STRING COMMENT "User ID"
    ,item_id STRING COMMENT "Item ID"
    ,item_type STRING COMMENT "Item type"
    ,module_id STRING COMMENT "Module ID"
    ,page_id STRING COMMENT "Page ID"
    ,position STRING COMMENT "Position of the item"
    ,bhv_type STRING COMMENT "Behavior type"
    ,bhv_value STRING COMMENT "Behavior details"
    ,bhv_time STRING COMMENT "Time at which the behavior occurs"
)
PARTITIONED BY 
(
    ds STRING
)
LIFECYCLE 30
;


--- Create a user table in the e-commerce industry.
DROP TABLE IF EXISTS user_table;
CREATE TABLE IF NOT EXISTS `user_table`
(
    user_id STRING COMMENT "Unique user ID"
    ,user_id_type STRING COMMENT "Registration type of the user"
    ,third_user_name STRING COMMENT "Third-party user name"
    ,third_user_type STRING COMMENT "Third-party platform name"
    ,phone_md5 STRING COMMENT "MD5 hash of the mobile phone number of the user"
    ,imei STRING COMMENT "Device ID of the user"
    ,content STRING COMMENT "User content"
    ,gender STRING COMMENT "Gender"
    ,age STRING COMMENT "Age"
    ,age_group STRING COMMENT "Age group"
    ,country STRING COMMENT "Country"
    ,city STRING COMMENT "City"
    ,ip STRING COMMENT "Last logon IP address"
    ,device_model STRING COMMENT "Device model"
    ,register_time STRING COMMENT "Registration time"
    ,last_login_time STRING COMMENT "Last logon time"
    ,last_modify_time STRING COMMENT "Last modification time of user information"
    ,tags STRING COMMENT "User tags"
    ,source STRING COMMENT "Source of the user"
    ,features STRING COMMENT "Additional user features of the STRING type"
    ,num_features STRING COMMENT "Additional user features of a numeric type"
)
PARTITIONED BY
(
    ds STRING
)
LIFECYCLE 30
;


--- Create an item table in the e-commerce industry.
DROP TABLE IF EXISTS item_table;
CREATE TABLE IF NOT EXISTS `item_table`
(
    scene_id STRING COMMENT "Scene ID"
    ,item_id STRING COMMENT "Unique ID of the item"
    ,item_type STRING COMMENT "Item type"
    ,category_level STRING COMMENT "Category level"
    ,category_path STRING COMMENT "Category path"
    ,title STRING COMMENT "Item title"
    ,content STRING COMMENT "Body part of the item"
    ,pub_time STRING COMMENT "Publish time"
    ,tags STRING COMMENT "Tags"
    ,share_cnt STRING COMMENT "Number of shares"
    ,collect_cnt STRING COMMENT "Number of favorites"
    ,pv_cnt STRING COMMENT "Number of exposures"
    ,status STRING COMMENT "Whether the item can be recommended"
    ,expire_time STRING COMMENT "Time at which the item expires"
    ,last_modify_time STRING COMMENT "Last modification time of the item information"
    ,origin_price STRING COMMENT "Original price"
    ,cur_price STRING COMMENT "Price after a discount"
    ,buy_cnt STRING COMMENT "Monthly sales volume of the platform"
    ,source_buy_cnt STRING COMMENT "Monthly sales volume of Taobao"
    ,comment_cnt STRING COMMENT "Number of comments"
    ,brand_id STRING COMMENT "Brand ID"
    ,shop_id STRING COMMENT "Store ID"
    ,source_id STRING COMMENT "Item source"
    ,add_fee STRING COMMENT "Additional fee"
    ,features STRING COMMENT "Additional item features of the STRING type"
    ,num_features STRING COMMENT "Additional item features of a numeric type"
    ,weight STRING COMMENT "Weight of the item, default value: 1"
)
PARTITIONED BY
(
    ds STRING
)
LIFECYCLE 30
;