Artificial Intelligence Recommendation: PAI-Rec Recommendation Platform - E-commerce recommendation scenario data format

Last Updated: Apr 01, 2026

PAI-Rec requires three datasets to generate recommendations: a user table, an item table, and a behavior log table. The more fields you provide, the better the recommendation quality. Field names in your data do not need to match the names in the tables below.

Data requirements at a glance

The following table summarizes the required and recommended fields across all three datasets.

| Dataset | Required fields | Recommended fields |
| --- | --- | --- |
| User table | user_id, imei (required for non-logged-in users) | |
| Item table | item_id, pub_time, price | title, category_level, cate_id_path, cate_name_path, cate1_id, cate2_id, cate_id, cate1_name, cate2_name, cate_name, brand_id, properties, spu_id |
| Behavior log table | user_id, item_id, event, event_time, scene | request_id, exp_id |
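
As a quick pre-upload check, the required fields in the summary table can be encoded as a lookup and tested against each record. The following is a minimal sketch, not part of PAI-Rec: the `REQUIRED_FIELDS` mapping and the `missing_required` helper are hypothetical names, and the field names are the ones used in this document, so substitute your own column names if they differ.

```python
# Required fields per dataset, copied from the summary table.
# imei is left out because it is required only for users who have not logged in.
REQUIRED_FIELDS = {
    "user": ("user_id",),
    "item": ("item_id", "pub_time", "price"),
    "behavior": ("user_id", "item_id", "event", "event_time", "scene"),
}


def missing_required(dataset, record):
    """Return the required fields that are absent or empty in one record."""
    return [
        name
        for name in REQUIRED_FIELDS[dataset]
        if record.get(name) in (None, "")
    ]


# An item record without pub_time is reported as incomplete.
print(missing_required("item", {"item_id": "i_42", "price": 9.9}))
```

A value of 0 (for example, a free item with price 0) counts as present; only missing keys, None, and empty strings are flagged.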

User table

The user table contains information about all users registered in your system. Create one partition per day with a full snapshot of all users for that day.

| Field | Required | Description |
| --- | --- | --- |
| user_id | Yes | Unique identifier of a user |
| user_id_type | No | Registration type: App, Mobile phone number, WeChat, Other |
| imei | Yes, for users who haven't logged in | International Mobile Equipment Identity (IMEI), used as the device ID |
| gender | No | Valid values: male, female, unknown |
| age / birthday | No | Age or date of birth |
| purchasing | No | Purchasing power derived from historical data or a predictive model |
| country | No | Country |
| province | No | Region, state, or province |
| city | No | City |
| register_time | No | Registration timestamp in seconds (example: 1520017038) |
| education | No | Education background |
| career | No | Occupation |
| last_login_time | No | Last login timestamp in seconds (example: 1520017038) |
| source | No | Acquisition channel (example: TouTiao, WeChat) |
| content | No | Free-text description of the user |
| tags | No | Tags describing the user's interests (example: Football, fitness, outdoor) |
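
To illustrate the field formats in the user table, here is a minimal sketch that builds one user row. `build_user_row` is a hypothetical helper, not a PAI-Rec API. Two choices in it are assumptions rather than documented behavior: unrecognized gender values are mapped to unknown, and tags are joined with commas.

```python
from datetime import datetime, timezone

VALID_GENDERS = {"male", "female", "unknown"}


def build_user_row(user_id, registered_at, gender=None, tags=None):
    """Build one user-table row using the field names from the user table."""
    gender = (gender or "unknown").lower()
    if gender not in VALID_GENDERS:
        # Assumption: fall back to "unknown" instead of rejecting the row.
        gender = "unknown"
    return {
        "user_id": user_id,
        "gender": gender,
        # register_time is a Unix timestamp in seconds, not milliseconds.
        "register_time": int(registered_at.timestamp()),
        # Assumption: tags are stored as one comma-separated string.
        "tags": ",".join(tags or []),
    }


row = build_user_row(
    "u_1001",
    datetime.fromtimestamp(1520017038, tz=timezone.utc),
    gender="Female",
    tags=["Football", "fitness"],
)
print(row)
```

Passing a timezone-aware datetime avoids depending on the local time zone of the machine that exports the data.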

Item table

The item table contains information about all items in your system. Create one partition per day with a full snapshot of all items for that day.

Core fields

| Field | Required | Description |
| --- | --- | --- |
| item_id | Yes | Unique identifier of an item |
| pub_time | Yes | Publication timestamp in seconds |
| price | Yes | Actual sales price (float) |
| title | Recommended | Item title, used for semantic analysis. If blank, the recommendation algorithm may not perform as expected. |
| brand_id | Recommended | Brand ID |
| spu_id | Recommended | Standard Product Unit (SPU) ID |
| properties | Recommended | Item properties in JSON format. Example: {"material": "cotton", "style": "commuting"} |
| item_type | No | Type of the item |
| source_id | No | Source e-commerce platform (example: Taobao, Tmall, JD) |
| sub_title | No | Subtitle of the item |
| expire_time | No | Expiration timestamp in seconds |
| description | No | Item details |
| origin_price | No | Original price before discount |
| discount | No | Discount ratio: price / origin_price |
| tags | No | Tags attached by business operations staff, such as a promotion activity ID |
| color | No | Color category |
| postage | No | Shipping cost; set to 0 for free shipping |
| image_url | No | URL to download the item image |
| video_url | No | URL to download the item video |
| shop_dsr | No | Detailed Seller Ratings (DSR): item description accuracy, customer service, and delivery quality |
| sku_id | No | Stock Keeping Unit (SKU) ID |
| shop_id | No | Store ID |
| prov | No | Region, state, or province where the item is located |
| city | No | City where the item is located |
| rate | No | Positive feedback rate |
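
The discount and properties fields are derived values, so a short sketch may help. `build_item_row` below is a hypothetical helper, not a PAI-Rec API; rounding the discount to four decimal places is an assumption made for readability.

```python
import json


def build_item_row(item_id, pub_time, price, origin_price=None, properties=None):
    """Build one item-table row with the derived discount and JSON properties."""
    row = {"item_id": item_id, "pub_time": int(pub_time), "price": float(price)}
    if origin_price:
        row["origin_price"] = float(origin_price)
        # discount is defined in the table as price / origin_price.
        # Assumption: four decimal places is enough precision.
        row["discount"] = round(row["price"] / row["origin_price"], 4)
    if properties is not None:
        # properties is stored as a JSON string, not as a nested object.
        row["properties"] = json.dumps(properties, ensure_ascii=False)
    return row


row = build_item_row(
    "i_42",
    1520017038,
    80,
    origin_price=100,
    properties={"material": "cotton", "style": "commuting"},
)
print(row["discount"], row["properties"])
```

The `if origin_price:` guard skips the discount when the original price is missing or zero, which avoids a division by zero.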

Category fields

Category fields power the recommendation algorithm's understanding of your product taxonomy. The hierarchy must follow the Mutually Exclusive, Collectively Exhaustive (MECE) principle: categories at the same level must not overlap semantically, and together they must cover every item.

| Field | Required | Description |
| --- | --- | --- |
| category_level | Recommended | Depth of the category hierarchy (example: 3 for a three-level hierarchy) |
| cate_id_path | Recommended | Full category path as IDs, separated by underscores (example: 100_200_300) |
| cate_name_path | Recommended | Full category path as names, separated by underscores (example: Electronics_Phones_Smartphones) |
| cate1_id | Recommended | Level-1 category ID |
| cate2_id | Recommended | Level-2 category ID |
| cate_id | Recommended | Leaf-level category ID (the deepest level in the hierarchy) |
| cate1_name | Recommended | Level-1 category name |
| cate2_name | Recommended | Level-2 category name |
| cate_name | Recommended | Leaf-level category name |
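
All nine category fields can be derived from one ordered list of category levels. The sketch below shows one way to do that; `build_category_fields` is a hypothetical helper, and filling cate2_id with the leaf when the hierarchy has only two levels is an assumption.

```python
def build_category_fields(levels):
    """Derive the category fields from (cate_id, cate_name) tuples.

    levels must be ordered from level 1 down to the leaf category.
    """
    if not levels:
        raise ValueError("at least one category level is required")
    ids = [str(cate_id) for cate_id, _ in levels]
    names = [name for _, name in levels]
    fields = {
        "category_level": len(levels),
        # Paths join every level with underscores, from level 1 to the leaf.
        "cate_id_path": "_".join(ids),
        "cate_name_path": "_".join(names),
        # cate_id and cate_name always describe the deepest level.
        "cate_id": ids[-1],
        "cate_name": names[-1],
    }
    # cate1_* and cate2_* hold the first two levels when they exist.
    for depth in (1, 2):
        if len(levels) >= depth:
            fields[f"cate{depth}_id"] = ids[depth - 1]
            fields[f"cate{depth}_name"] = names[depth - 1]
    return fields


fields = build_category_fields(
    [(100, "Electronics"), (200, "Phones"), (300, "Smartphones")]
)
print(fields["cate_id_path"], fields["cate_name_path"])
```

Because the path separator is an underscore, category names that themselves contain underscores cannot be split back unambiguously, so it may be worth normalizing such names first.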

Behavior log table

The behavior log table records user interactions with items. Provide at least 30 days of data; 60 days is preferable.

Include behaviors from across the entire site, not just recommendation scenarios. Covering search and popular-item scenarios gives the algorithm a more complete view of user intent.

| Field | Required | Description |
| --- | --- | --- |
| user_id | Yes | User ID |
| item_id | Yes | Item ID |
| event | Yes | Behavior type (see Event types) |
| event_time | Yes | Timestamp of the behavior in seconds |
| scene | Yes | Scenario ID: home_feed (home feed), hot_items (popular items), or search (search results). In search scenarios, also populate the query field. |
| imei | No | Device ID |
| event_value | No | Behavior value; the format varies by event type (see Event types) |
| request_id | Recommended | Unique identifier of the recommendation request returned by PAI-Rec. If blank, sample accuracy is reduced and real-time features cannot be enabled. This field can be left blank when creating a recommendation solution, but adding it later requires modifying the training sample code, re-preparing samples, and retraining the model. |
| exp_id | Recommended | Experiment bucket ID returned by the PAI-Rec Recommend API. If the result was not generated by PAI-Rec, set this field to default or another value. |
| request_info | No | Tracking information returned by the Recommend API; log it as-is |
| query | No | Search query; required when scene is search |
| page | No | Page ID (for item detail pages, the item ID) |
| source_page | No | Previous page, used to attribute conversions by traffic source |
| position | No | Position of the item in the recommendation list |
| app_version | No | App version number |
| net_type | No | Network type: 3G, 4G, 5G, Wi-Fi |
| login | No | Whether the user is logged in |
| device_platform | No | Client platform: iOS, Android, H5, Msite |
| device_system | No | Device operating system: iOS, Android, PC |
| device_model | No | Device model (example: iPhone 5) |
| device_brand | No | Device manufacturer (example: Apple, Xiaomi, Huawei) |
| longitude | No | Longitude |
| latitude | No | Latitude |
| ip | No | Client IP address, used to derive country and city features |
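
Two rules in the behavior log table are conditional: query must be populated when scene is search, and exp_id falls back to a placeholder when the result did not come from PAI-Rec. The following sketch applies both rules when building a log row. `build_behavior_row` is a hypothetical helper, not a PAI-Rec API, and the choice of "default" as the fallback follows the exp_id description in the table.

```python
def build_behavior_row(user_id, item_id, event, event_time, scene,
                       query=None, request_id=None, exp_id=None):
    """Build one behavior log row, applying the conditional field rules."""
    if scene == "search" and not query:
        raise ValueError("query is required when scene is 'search'")
    row = {
        "user_id": user_id,
        "item_id": item_id,
        "event": event,
        # event_time is a Unix timestamp in seconds.
        "event_time": int(event_time),
        "scene": scene,
        # request_id may be left blank, at the cost described in the table.
        "request_id": request_id or "",
        # Results not generated by PAI-Rec have no experiment bucket.
        "exp_id": exp_id or "default",
    }
    if query:
        row["query"] = query
    return row


row = build_behavior_row("u_1001", "i_42", "click", 1520017038, "home_feed")
print(row["exp_id"])
```

When a recommendation was served by PAI-Rec, pass the request_id and exp_id values exactly as they were returned with that recommendation rather than generating new ones.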

Event types

The behavior log supports the following event types. Each event maps to a specific value for the event field and a specific format for event_value.

| Behavior | Value of event | Format of event_value | Notes |
| --- | --- | --- | --- |
| Expose an item | expose | Leave blank | |
| Click an item | click | Leave blank | |
| Like an item | like | Leave blank | |
| Dislike an item | unlike | Leave blank | |
| Comment on an item | comment | Comment text | Comment content is used to analyze shopping experience and item quality. |
| Add to favorites | collect | Leave blank | |
| Dwell on an item | stay | Duration | Any time unit is acceptable, but it must be consistent across all entries. |
| Add to cart | cart | `<quantity>,<unit_price>` | Example: 1,10000. Unit price is in USD, accurate to the cent. |
| Purchase | buy | `<quantity>,<unit_price>` | Example: 1,10000. Unit price is in USD, accurate to the cent. One buy entry corresponds to one item_id only, so split orders with multiple items into separate entries. |
| Rate an item | evaluate | Integer | Use discrete integers in a consistent ascending or descending order. For example, for star ratings, use integers 1–5 in ascending order so that higher values indicate more positive reviews. |
| Negative feedback | dislike | Leave blank | |
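
Because one buy entry may reference only one item_id, an order with several items has to be expanded before logging. The sketch below shows one possible way to do that and to read the event_value back. `buy_entries` and `parse_purchase_value` are hypothetical helpers, and the sample prices are illustrative values passed through in whatever numeric form your logs already use.

```python
from decimal import Decimal


def buy_entries(user_id, order_lines, event_time, scene):
    """Expand one order into one buy entry per item_id.

    order_lines is a list of (item_id, quantity, unit_price) tuples.
    """
    return [
        {
            "user_id": user_id,
            "item_id": item_id,
            "event": "buy",
            "event_time": int(event_time),
            "scene": scene,
            # event_value for cart and buy is "<quantity>,<unit_price>".
            "event_value": f"{quantity},{unit_price}",
        }
        for item_id, quantity, unit_price in order_lines
    ]


def parse_purchase_value(event_value):
    """Split a cart or buy event_value into (quantity, unit_price)."""
    quantity, unit_price = event_value.split(",")
    # Decimal keeps the price exact whether or not it has a fraction.
    return int(quantity), Decimal(unit_price)


entries = buy_entries(
    "u_1001", [("i_1", 1, 10000), ("i_2", 2, 2599)], 1520017038, "home_feed"
)
print([entry["event_value"] for entry in entries])
print(parse_purchase_value("1,10000"))
```

The same event_value format applies to cart events, so `parse_purchase_value` can be reused for them.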