Artificial Intelligence Recommendation: Data preparation

Last Updated: Jun 12, 2025

Before you use PAI-Rec to build a recommendation system, you must prepare basic data and analyze user features for model training and calibration. This topic describes data specifications for typical scenarios.

Background information

Feature data generally includes the following basic tables:

  • User table:

    Contains feature data related to users. This table is used to describe personal information, preferences, and behavioral habits of users.

    User IDs in a user table are unique. A user table and a behavior table can be associated by using the unique ID. A user table must contain basic user information such as age, gender, city, points, registration time, and user tags. Each partition represents one day and records all user information.

  • Item table:

    Contains information about recommended items. This table is used to describe the properties and characteristics of items.

    Item IDs in an item table are unique. An item table and a behavior table can be associated by using the unique ID. An item table must contain basic item information such as level 1 category, level 2 category, price, title, color, specifications, listing time, author ID, and number of followers. Each partition represents one day and records all item information.

  • Behavior table:

    Contains behaviors between users and items, detailing what actions a user took on an item and when.
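As a minimal sketch of how the three tables relate, the following Python snippet attaches user and item features to behavior rows by ID. The sample rows and the helper name join_behavior are hypothetical. In the demo tables later in this topic, the IDs are BIGINT in the user and item tables but STRING in the behavior table, so the sketch casts the behavior IDs before the lookup.

```python
# Hypothetical sample rows; real tables hold one full snapshot per daily partition.
users = {1001: {"gender": "female", "city": "Hangzhou"}}
items = {2001: {"category": "sports", "title": "Morning run"}}
behaviors = [
    {"user_id": "1001", "item_id": "2001", "event": "click", "event_time": 1672560000},
    {"user_id": "9999", "item_id": "2001", "event": "expose", "event_time": 1672560100},
]


def join_behavior(behaviors, users, items):
    """Attach user and item features to each behavior row by unique ID."""
    joined = []
    for row in behaviors:
        # Behavior IDs are strings here, so cast them to match the integer keys.
        user = users.get(int(row["user_id"]))
        item = items.get(int(row["item_id"]))
        if user is None or item is None:
            continue  # Skip behaviors whose user or item is not in the snapshot.
        joined.append({**row, **user, **item})
    return joined


samples = join_behavior(behaviors, users, items)
print(samples)
```

Only the first behavior row survives the join, because user 9999 does not exist in the hypothetical user snapshot.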

Prerequisites

You have created a MaxCompute project and associated it with a DataWorks workspace.

Procedure

To help you quickly get started, PAI-Rec has prepared three tables in the pai_online_project project in MaxCompute. You can clone the data to your own project for use.

  1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Development & O&M > Data Development.

  2. On the Data Development page, select the DataWorks workspace that you created and click Go To Data Development.

  3. On the page that appears, move the pointer over Create and choose Create Node > MaxCompute > ODPS SQL. In the dialog box that appears, configure the node parameters and click Confirm.

  4. On the tab of the node that you created, run the following SQL statements to synchronize the user table, item table, and behavior table from the pai_online_project project to your MaxCompute project.

    -- User table
    CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table(
      user_id BIGINT COMMENT 'The unique ID of the user',
      gender STRING COMMENT 'The gender',
      age BIGINT COMMENT 'The age',
      city STRING COMMENT 'The city',
      item_cnt BIGINT COMMENT 'The number of created content items',
      follow_cnt BIGINT COMMENT 'The cumulative number of follows',
      follower_cnt BIGINT COMMENT 'The cumulative number of followers',
      register_time BIGINT COMMENT 'The registration time',
      tags STRING COMMENT 'The user tags'
    ) PARTITIONED BY (ds STRING) STORED AS ALIORC;
    INSERT OVERWRITE TABLE rec_sln_demo_user_table PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_user_table
    WHERE ds > "20221231" and ds < "20230217";
    -- Item table
    CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table(
      item_id BIGINT COMMENT 'The content ID',
      duration DOUBLE COMMENT 'The video duration',
      title STRING COMMENT 'The title',
      category STRING COMMENT 'The level 1 tag',
      author BIGINT COMMENT 'The author',
      click_count BIGINT COMMENT 'The cumulative number of clicks',
      praise_count BIGINT COMMENT 'The cumulative number of likes',
      pub_time BIGINT COMMENT 'The publication time'
    ) PARTITIONED BY (ds STRING) STORED AS ALIORC;
    INSERT OVERWRITE TABLE rec_sln_demo_item_table PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_item_table
    WHERE ds > "20221231" and ds < "20230217";
    -- Behavior table
    CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table(
      request_id STRING COMMENT 'The request tracking ID/request ID',
      user_id STRING COMMENT 'The unique ID of the user',
      exp_id STRING COMMENT 'The experiment ID',
      page STRING COMMENT 'The page',
      net_type STRING COMMENT 'The network type',
      event_time BIGINT COMMENT 'The behavior time',
      item_id STRING COMMENT 'The content ID',
      event STRING COMMENT 'The behavior type',
      playtime DOUBLE COMMENT 'The playback duration/reading duration'
    ) PARTITIONED BY (ds STRING) STORED AS ALIORC;
    INSERT OVERWRITE TABLE rec_sln_demo_behavior_table PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_behavior_table
    WHERE ds > "20221231" and ds < "20230217";
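The filters in the preceding statements compare ds as strings. Because yyyymmdd strings sort in date order, both bounds are exclusive date limits. The following Python sketch lists the partitions that such a filter selects. The helper name partitions_between is hypothetical and is not part of MaxCompute or PAI-Rec.

```python
from datetime import date, timedelta


def partitions_between(lower, upper):
    """Return yyyymmdd partition values with lower < ds < upper (both exclusive)."""
    start = date(int(lower[:4]), int(lower[4:6]), int(lower[6:8])) + timedelta(days=1)
    end = date(int(upper[:4]), int(upper[4:6]), int(upper[6:8]))
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


ds_values = partitions_between("20221231", "20230217")
# First partition, last partition, and number of daily partitions copied.
print(ds_values[0], ds_values[-1], len(ds_values))
```

Under these bounds, the copied data spans the daily partitions from 20230101 to 20230216.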

You can also refer to Appendix: Data specifications for common scenarios to prepare your own user table, item table, and behavior table.

Appendix: Data specifications for common scenarios

E-commerce recommendation scenario

The following tables list the recommended fields for the user table, item table, and behavior table in the e-commerce scenario. You can add feature fields that are not listed here. More complete fields generally lead to better recommendation results. Field names do not need to exactly match those in the tables.

User table

We recommend that the user table contains information about all users registered in your system. We recommend that you create a partition every day and synchronize daily data of all users to the partition.

Field

Required

Description

user_id

Required for logged-on users

The user ID.

user_id_type

Optional

The user registration type, including App registered account, phone number, WeChat account, and others.

device_id

Optional

The device ID.

gender

Optional

The gender.

age/birthday

Optional

The age or date of birth.

purchasing

Optional

The purchasing power, obtained through statistical analysis or modeling based on historical data.

country

Optional

The country.

province

Optional

The province.

city

Optional

The city.

register_time

Optional

The registration timestamp.

Unit: seconds. Example: 1520017038.

education

Optional

The educational background of the user.

career

Optional

The occupation of the user.

last_login_time

Optional

The last logon timestamp.

Unit: seconds. Example: 1520017038.

source

Optional

The source of the user, such as Toutiao and WeChat.

content

Optional

The user description.

tags

Optional

The description of the user tag, such as football, fitness, and outdoor activities.
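Timestamp fields such as register_time and last_login_time are expected in seconds. The following Python sketch shows one way to normalize a value that might have been logged in milliseconds. The threshold and the helper name to_epoch_seconds are assumptions made for illustration, not a PAI-Rec requirement.

```python
from datetime import datetime, timezone

# Assumed heuristic: values at or above 1e11 are treated as milliseconds.
MILLISECOND_THRESHOLD = 100_000_000_000


def to_epoch_seconds(value):
    """Return a timestamp in whole seconds, converting from milliseconds if needed."""
    ts = int(value)
    if ts >= MILLISECOND_THRESHOLD:
        ts //= 1000
    return ts


register_time = to_epoch_seconds(1520017038)
# Print the value as a UTC date to check that it looks plausible.
print(datetime.fromtimestamp(register_time, tz=timezone.utc).isoformat())
```

Printing the converted value as a date is a quick sanity check: a timestamp that lands decades in the future usually means that the unit is wrong.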

Item table

We recommend that the item table contains information about all the items in your system. We recommend that you create a partition every day and synchronize daily data of all items to the partition.

Field

Required

Description

item_id

Yes

The unique ID of the item.

item_type

Optional

The type of the item.

source_id

Optional

The source of the item. For guided e-commerce, fill in the source platform of the item.

Example: Taobao, Tmall, and JD.com.

title

Optional. We recommend that you specify this parameter.

The title of the item. This field is used for in-depth semantic analysis. If this field is left empty, the recommendation algorithm may not function as expected.

sub_title

Optional

The subtitle of the item.

pub_time

Required

The timestamp when the item was published.

Unit: seconds.

expire_time

Optional

The timestamp when the content expires.

Unit: seconds.

category_level

Optional. We recommend that you specify this parameter.

The number of category levels.

cate_id_path

Optional. We recommend that you specify this parameter.

The full category path composed of IDs. A category path can contain multiple categories. You must separate the categories with underscores (_).

cate_name_path

Optional. We recommend that you specify this parameter.

The full category path composed of names. A category path can contain multiple categories. You must separate the categories with underscores (_).

cate1_id

Optional. We recommend that you specify this parameter.

The ID of the level-1 category. The category hierarchy tree must follow the Mutually Exclusive, Collectively Exhaustive (MECE) principle. All categories in the tree do not semantically overlap.

cate2_id

Optional. We recommend that you specify this parameter.

The ID of the level-2 category. The category hierarchy tree must follow the MECE principle. All categories in the tree do not semantically overlap.

cate_id

Optional. We recommend that you specify this parameter.

The ID of the last-level leaf category in the category hierarchy tree.

cate1_name

Optional. We recommend that you specify this parameter.

The name of the level-1 category.

cate2_name

Optional. We recommend that you specify this parameter.

The name of the level-2 category.

cate_name

Optional. We recommend that you specify this parameter.

The name of the leaf category.

brand_id

Optional. We recommend that you specify this parameter.

The brand ID.

shop_id

Optional

The store ID.

description

Optional

The item details.

price

Required

The actual sales price of the item.

origin_price

Optional

The original price of the item.

discount

Optional

The discount (price/origin_price).

tags

Optional

The tags that are attached to the item by business operation personnel, such as the ID of a promotion activity.

color

Optional

The color category.

properties

Optional. We recommend that you specify this parameter.

The item properties specified by the merchant in JSON format.

Example: {"material": "cotton", "style": "commuting"}.

postage

Optional

The shipping cost. The value is 0 for free shipping items.

image_url

Optional

The URL of the item image. The URL can be used to download the item image over the Internet.

video_url

Optional

The URL of the item video. The URL can be used to download the item video over the Internet.

shop_dsr

Optional

The detailed seller ratings.

spu_id

Optional. We recommend that you specify this parameter.

The ID of the standard product unit.

sku_id

Optional

The ID of the stock keeping unit.

prov

Optional

The province where the item is located.

city

Optional

The city where the item is located.

rate

Optional

The positive feedback rating.
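Several item fields can be derived from one another. The following Python sketch, which uses a hypothetical item row and a hypothetical helper named derive_item_features, splits cate_id_path on underscores, computes discount as price/origin_price, and parses the JSON properties field.

```python
import json


def derive_item_features(item):
    """Derive category levels, discount, and properties from one item row."""
    cate_ids = item["cate_id_path"].split("_")
    origin_price = item.get("origin_price")
    # discount = price / origin_price; None when origin_price is missing or 0.
    discount = item["price"] / origin_price if origin_price else None
    properties = json.loads(item.get("properties") or "{}")
    return {
        "category_level": len(cate_ids),
        "cate1_id": cate_ids[0],
        "cate_id": cate_ids[-1],  # Last-level leaf category.
        "discount": discount,
        "properties": properties,
    }


# Hypothetical item row.
item = {
    "item_id": "2001",
    "cate_id_path": "10_1020_102030",
    "price": 80.0,
    "origin_price": 100.0,
    "properties": '{"material": "cotton", "style": "commuting"}',
}
features = derive_item_features(item)
print(features)
```

The sketch assumes that category IDs do not contain underscores themselves. If they do, the path cannot be split unambiguously.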

Behavior table

The behavior table contains the recent behavioral data of the app or of specific scenarios. We recommend that the table cover at least the most recent 30 to 60 days.

To obtain a comprehensive view of user behaviors, we recommend that you report user behaviors across the entire site. Collect exposures, clicks, and other behaviors not only from recommendation scenarios (home_feed) but also from popular-item scenarios (hot_items) and search scenarios (search). In search scenarios, also record the search terms.

Field

Required

Description

user_id

Required for logged-on users

The ID of the user.

device_id

Optional

The device ID of the user.

item_id

Required

The item ID.

item_type

Optional

The type of the item.

event

Required

The behavior type, including exposure, stay, click, and rating.

event_time

Required

The behavior timestamp.

Unit: seconds.

event_value

Optional

The behavior value, including stay duration, number of items purchased, and purchase amount.

request_id

Optional. We recommend that you specify this parameter.

The ID of the request, which is the unique identifier of each recommendation request.

If you leave this field empty, the accuracy of samples is affected and real-time features cannot be added. This field is optional when you create a recommendation solution. After the solution is created, however, you must configure this field, modify the training sample code, regenerate the training samples, and then train the model.

exp_id

Optional. We recommend that you specify this parameter.

The ID of the experiment bucket, which is the experiment ID returned by the PAI-Rec recommendation interface. If the result was not recommended by PAI-Rec, set this field to default or another value.

request_info

Optional

The request tracking information, which is returned when the Recommend API operation is called. You only need to record this information in logs.

scene

Required

The ID of a scenario.

For example, home_feed indicates the home feed, hot_items indicates popular items, and search indicates the search scenario. This field is required in all scenarios, not only in recommendation scenarios. If you set this field to search, you must also configure the query field.

query

Optional

The search term.

page

Optional

The page ID.

source_page

Optional

The previous page, which is used to analyze effects by source page.

position

Optional

The position of the item, which is the position in the recommendation list.

app_version

Optional

The version of the app.

net_type

Optional

The network type.

Example: 3G, 4G, 5G, and Wi-Fi.

ip

Optional

The client IP information used to extract features such as country and city.

login

Optional

Specifies whether the user has logged on.

device_platform

Optional

The client platform.

Example: iOS, Android, H5, and Msite.

device_system

Optional

The operating system of the device.

Example: iOS, Android, and PC.

device_model

Optional

The device model.

device_brand

Optional

The brand or manufacturer of the device.

longitude

Optional

The longitude of the location.

latitude

Optional

The latitude of the location.

country

Optional

The country.

province

Optional

The province.

city

Optional. We recommend that you specify this parameter.

The city.
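The following Python sketch checks one behavior row against the required fields in the preceding table, including the rule that a search scene needs a query. The helper name validate_behavior is hypothetical, and falling back to device_id for users who are not logged on is an assumption rather than a documented rule.

```python
REQUIRED_FIELDS = ("item_id", "event", "event_time", "scene")


def validate_behavior(row):
    """Return a list of problems found in one e-commerce behavior row."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not row.get(name)]
    # Assumption: a row without user_id must at least carry a device_id.
    if not row.get("user_id") and not row.get("device_id"):
        problems.append("missing user_id or device_id")
    # A search scene is only useful if the search term was recorded.
    if row.get("scene") == "search" and not row.get("query"):
        problems.append("scene is search but query is empty")
    return problems


# Hypothetical row: anonymous search click without a recorded query.
row = {
    "device_id": "d1",
    "item_id": "2001",
    "event": "click",
    "event_time": 1672560000,
    "scene": "search",
}
print(validate_behavior(row))
```

A check like this can run before the data is written to the daily partition so that incomplete rows are caught early.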

Behavior type table

The following table describes several common behavior types in the e-commerce recommendation scenarios.

Event

Event value

Description

expose

Leave this field empty.

The behavior to expose an item.

click

Leave this field empty.

The behavior to click an item.

like

Leave this field empty.

The behavior to like an item.

unlike

Leave this field empty.

The behavior to dislike an item.

comment

The comment content.

The behavior to comment on an item.

The review content can be used to mine the shopping experience of the user and item quality.

collect

Leave this field empty.

The behavior to add an item to favorites.

stay

A duration.

The behavior to stay on an item.

Any unit can be used, but the unit must be the same for all data entries.

cart

Number of items,unit price. The number of items is separated from the unit price with a comma (,).

Example: 1,10000.

The behavior to add an item to the shopping cart.

Unit price: USD, accurate to the cent.

buy

Number of items,unit price. The number of items is separated from the unit price with a comma (,).

Example: 1,10000.

The behavior to purchase an item.

Unit price: USD, accurate to the cent. One buy behavior entry corresponds only to one item ID specified by the item_id field. If an order contains multiple item IDs, you must split them.

evaluate

Discrete integers in ascending or descending order.

The behavior to evaluate an item.

For example, if star ratings are used, you can use the integers 1 to 5 in ascending order, where a higher value indicates a more positive review. Make sure that the order of the values is positively correlated with how positive the reviews are.

dislike

The behavior to provide negative feedback on an item.
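The event_value format for cart and buy events, and the rule that one buy entry maps to one item ID, can be sketched in Python as follows. The helper names parse_quantity_price and split_order and the sample order are hypothetical. The unit price is kept as a Decimal exactly as logged, without interpreting its unit.

```python
from decimal import Decimal


def parse_quantity_price(event_value):
    """Split a cart or buy event_value such as "1,10000" into its two parts."""
    quantity, unit_price = event_value.split(",")
    return int(quantity), Decimal(unit_price)


def split_order(user_id, event_time, order_lines):
    """Emit one buy row per item ID for an order that contains several items."""
    return [
        {
            "user_id": user_id,
            "item_id": item_id,
            "event": "buy",
            "event_time": event_time,
            "event_value": f"{quantity},{unit_price}",
        }
        for item_id, quantity, unit_price in order_lines
    ]


# Hypothetical order with two different items.
rows = split_order("1001", 1672560000, [("2001", 1, "10000"), ("2002", 2, "2550")])
for row in rows:
    print(row["item_id"], parse_quantity_price(row["event_value"]))
```

Splitting an order this way keeps every buy row tied to exactly one item_id, which is what the buy entry in the preceding table requires.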

Content recommendation scenario

The following tables list the recommended fields for the user table, item table, and behavior table in the content recommendation scenario. You can add feature fields that are not listed here. More complete fields generally lead to better recommendation results. Field names do not need to exactly match those in the tables.

User table

We recommend that the user table contains information about all users registered in your system. We recommend that you create a partition every day and synchronize daily data of all users to the partition.

Field

Required

Description

user_id

Required for logged-on users

The ID of the user.

device_id

Optional

The device ID.

register_time

Optional. We recommend that you specify this parameter.

The registration time.

Unit: seconds. Example: 1520017038.

gender

Optional

The gender.

age

Optional

The age.

country

Optional

The country.

province

Optional

The province.

city

Optional

The city.

ip

Optional

The IP address of the last logon.

education

Optional

The educational background of the user.

career

Optional

The occupation of the user.

item_cnt

Optional

The number of content pieces that the user created from the time when the account was registered.

favorite_cnt

Optional

The number of favorites.

follow_cnt

Optional

The number of users that the user follows.

follower_cnt

Optional

The number of followers.

last_login_time

Optional

The last logon time.

tags

Optional

The tags of the user.

Example: football, fitness, and outdoor activities.

Item table

We recommend that the item table contains information about all the content in your system. We recommend that you create a partition every day and synchronize daily data of all the content to the partition.

Field

Required

Description

item_id

Required

The content ID.

item_type

Required for multiple content types

The content type, such as article and video.

status

Required

Indicates whether the item can be recommended.

duration

Optional (Required for videos)

The video duration.

pub_time

Required

The time when the item is published.

title

Optional. We recommend that you specify this parameter.

The title.

category

Optional. We recommend that you specify this parameter.

The level-1 tag.

tags

Optional

The tags. You can configure multiple tags and separate them with semicolons (;).

author

Optional. We recommend that you specify this parameter.

The author.

abstract

Optional

The abstract of the content.

content

Optional

The body part of the content.

image_url

Optional

The image URL used for extracting image features.

video_url

Optional

The video URL used for extracting video features.

pv_count

Optional

The total number of times the content is exposed.

click_count

Optional

The total number of times the content is clicked.

praise_count

Optional

The total number of times the content is liked.

comment_count

Optional

The total number of times the content is commented on.

collect_count

Optional

The total number of times the content is added to favorites.

share_count

Optional

The total number of times the content is shared.

download_count

Optional

The total number of times the content is downloaded.

tip_count

Optional

The total number of times the content is rewarded.
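The conditional requirements in the preceding item table (item_type for multiple content types, duration for videos) and the semicolon-separated tags field can be checked with a short Python sketch. The helper names check_content_item and split_tags and the sample values are hypothetical.

```python
def check_content_item(item, multiple_types=True):
    """Return problems for one content item row; an empty list means it passes."""
    problems = []
    for name in ("item_id", "status", "pub_time"):
        if item.get(name) in (None, ""):
            problems.append(f"missing {name}")
    # item_type is required only when the system serves several content types.
    if multiple_types and not item.get("item_type"):
        problems.append("missing item_type")
    # duration is required for videos.
    if item.get("item_type") == "video" and item.get("duration") is None:
        problems.append("video without duration")
    return problems


def split_tags(tags):
    """Split a semicolon-separated tags string and drop empty entries."""
    return [tag.strip() for tag in (tags or "").split(";") if tag.strip()]


# Hypothetical video row that lacks a duration.
video = {"item_id": "3001", "status": 1, "pub_time": 1672560000, "item_type": "video"}
print(check_content_item(video))
print(split_tags("football; fitness;;outdoor"))
```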

Behavior table

The behavior table contains the recent behavioral data of the app or of specific scenarios. We recommend that the table cover at least the most recent 30 to 60 days.

Field

Required

Description

user_id

Required for logged-on users

The unique ID of the user.

device_id

Optional

The device ID of the user.

item_id

Required

The content ID.

item_type

Required for multiple content types

The content type, such as article and video.

request_id

Optional

The ID of the request, which is the unique identifier of each recommendation request.

If you leave this field empty, the accuracy of samples is affected and real-time features cannot be added. This field is optional when you create a recommendation solution. After the solution is created, however, you must configure this field, modify the training sample code, regenerate the training samples, and then train the model.

request_info

Optional. We recommend that you specify this parameter.

The request tracking information, such as retrieval ID.

exp_id

Required

The experiment ID returned by the PAI-Rec recommendation interface. If the result was not recommended by PAI-Rec, set this field to default or another value.

scene

Required for multiple scenarios

The scenario.

page

Optional (Recommended for multiple pages)

The page.

source_page

Optional (Recommended for multiple pages)

The previous page.

position

Optional

The position of the content.

event

Required

The behavior type, such as exposure, stay, click, and rating.

event_time

Required

The time when the behavior occurred.

playtime

Optional. We recommend that you specify this parameter.

The playback duration or the reading duration.

Unit: seconds.

comment

Optional

The content of the comment.

net_type

Optional

The network type.

device_platform

Optional. We recommend that you specify this parameter.

The client platform.

device_brand

Optional

The client brand.

device_model

Optional

The client model.

device_system

Optional

The client operating system.

app_version

Optional

The version of the app.

longitude

Optional

The longitude of the location.

latitude

Optional

The latitude of the location.

country

Optional

The country.

province

Optional

The province.

city

Optional. We recommend that you specify this parameter.

The city.

ip

Optional

The IP address of the last logon.
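To illustrate why request_id matters for sample accuracy, the following Python sketch labels each exposure by whether a click shares the same request, user, and item. This is an illustrative assumption about how samples could be built, not the sample code that PAI-Rec generates. The event names expose and click and the helper name build_click_samples are also assumptions.

```python
def build_click_samples(behaviors):
    """Label each exposure 1 if a click shares its (request_id, user_id, item_id)."""
    clicked = {
        (row["request_id"], row["user_id"], row["item_id"])
        for row in behaviors
        if row["event"] == "click"
    }
    samples = []
    for row in behaviors:
        if row["event"] != "expose":
            continue
        key = (row["request_id"], row["user_id"], row["item_id"])
        samples.append({"key": key, "label": 1 if key in clicked else 0})
    return samples


# Hypothetical log: the same item is exposed in two requests, clicked in the second.
behaviors = [
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "expose"},
    {"request_id": "r2", "user_id": "u1", "item_id": "i1", "event": "expose"},
    {"request_id": "r2", "user_id": "u1", "item_id": "i1", "event": "click"},
]
samples = build_click_samples(behaviors)
print([sample["label"] for sample in samples])
```

Without request_id, both exposures would match the single click, and the first exposure would be mislabeled as positive.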

Video recommendation

The following tables describe recommended fields in user, item, and behavior tables in video recommendation scenarios. Configuring more fields will give you better recommendation results. You can also provide additional fields that are not listed in the following tables to further improve the results. The names of fields do not need to be the same as the ones in the following tables.

User table

Field

Required

Description

user_id

Required

The ID of the user.

age

Optional

The age of the user, which can be segmented.

User age can be categorized into non-overlapping segments, such as 0 to 12, 13 to 18, 19 to 24, and 25 to 34, which converts the numerical feature into a categorical feature by discretization.

gender

Optional

The gender of the user.

For example, male, female, and other genders can be used as categorical features. You can also use integers 0, 1, and 2 to indicate the gender of a user.

occupation

Optional

The occupation of the user.

For example, student, teacher, engineer, and other occupations can be used as categorical features.

education

Optional

The educational background of the user.

For example, senior high school, undergraduate, and master's degree can be used as categorical features.

income

Optional

The income level of the user.

For example, low, medium, and high income levels can be used as categorical features.

user_level

Optional

The level or membership level of the user on the platform.

register_time

Optional

The time when the user registers the account. Unit: seconds. The time can be used as numerical features after being segmented by year, month, and day. It can be converted into categorical features after discretization.

country

Optional

The country in which the user is located, which can be used as a categorical feature.

province

Optional

The province in which the user is located, which can be used as a categorical feature.

city

Optional

The city in which the user is located, which can be used as a categorical feature.

active_time

Optional

The period of time during which the user is active on the platform.

For example, the morning, afternoon, evening, and other periods of time can be used as categorical features.

device_type

Optional

The type of the device used by the user.

For example, PC, mobile phone, tablet, and other devices can be used as categorical features.

os

Optional

The operating system of the user device.

For example, iOS, Android, Windows, and other operating systems can be used as categorical features.

browser

Optional

The type of the browser used by the user.

For example, Google Chrome, Firefox, Safari, and other browsers can be used as categorical features.

language

Optional

The language preferred by the user.

For example, English, Chinese, Spanish, and other languages can be used as categorical features.

interests

Optional

The interests of the user.

For example, sports, music, travel, and other interests can be used as tag features.
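The age discretization described in the preceding table can be sketched in Python as follows. The segment boundaries, the labels, and the helper name age_segment are assumptions chosen for illustration. Choose segments that fit your own user base.

```python
from bisect import bisect_left

# Assumed non-overlapping segments: 0-12, 13-18, 19-24, 25-34, and 35 or older.
AGE_UPPER_BOUNDS = [12, 18, 24, 34]
AGE_LABELS = ["0-12", "13-18", "19-24", "25-34", "35+"]


def age_segment(age):
    """Map a numeric age to a categorical segment label."""
    if age is None or age < 0:
        return "unknown"
    # bisect_left treats each upper bound as inclusive, so 12 maps to "0-12".
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


for age in (12, 13, 24, 25, 40, None):
    print(age, age_segment(age))
```

Keeping an explicit unknown segment avoids silently mixing missing ages into a real age group.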

Item table

Field

Required

Description

item_id

Required

The ID of the item, which is the unique identifier of the video.

category

Optional

The main category to which the video belongs, which can be used as a categorical feature.

leaf_category

Optional

The sub-category to which the video belongs, which can be used as a categorical feature.

brand

Optional

The brand or producer of the video, which can be used as a categorical feature.

video_type

Optional

The type of the video.

For example, movie, TV series, documentary, short film, and other types can be used as categorical features.

duration

Optional

The duration of the video. The duration of the video can be discretized into the following categories: shorter than 10 minutes, 10 to 30 minutes, and longer than 30 minutes. These categories can be used as categorical features.

title

Optional

The title of the video.

series_name

Optional

The series title of the video, such as Journey to the West.

series_total_number

Optional

The total number of episodes for the video series.

series_number

Optional

The current episode number of the video series.

For example, 1 indicates the first episode.

release_date

Optional

The release date of the video. The release date can be used as a numerical feature.

Unit: seconds.

director

Optional

The director of the video.

actors

Optional

The main actors of the video, which are separated with commas (,). Multiple values can be used as tag features.

rating

Optional

The rating of the video.

For example, IMDb, Douban, and other ratings can be used as numerical features.

language

Optional

The original language of the video.

For example, English, Chinese, Japanese, and other languages can be used as categorical features.

has_subtitle

Optional

Specifies whether the subtitle service is provided.

region

Optional

The region in which the video is produced.

For example, Hollywood, Bollywood, Chinese mainland, and other regions can be used as categorical features.

tags

Optional

The tag of the video, such as comedy, action, love, and other tags. Multiple values can be used as tag features.
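The duration categories and the comma-separated actors field in the preceding table can be turned into features with a short Python sketch. The table does not state the unit of duration, so the sketch assumes seconds. The category labels and the helper names are hypothetical.

```python
def duration_bucket(duration_seconds):
    """Discretize a video duration (assumed to be in seconds) into three categories."""
    minutes = duration_seconds / 60
    if minutes < 10:
        return "lt_10min"
    if minutes <= 30:
        return "10_30min"
    return "gt_30min"


def split_actors(actors):
    """Split a comma-separated actors string into a list of tag values."""
    return [name.strip() for name in actors.split(",") if name.strip()]


print(duration_bucket(300), duration_bucket(1200), duration_bucket(5400))
print(split_actors("Actor A, Actor B,Actor C"))
```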

Behavior table

To capture all types of user behaviors, we recommend that you collect behaviors such as exposures and clicks across the entire site, including the recommendation, hot items, and search scenarios. In search scenarios, also record the search queries.

User clicks and viewing behaviors in non-recommendation scenarios can also serve as sources of insights into user preferences.

Field

Required

Description

request_id

Optional

The ID of the request, which is the unique ID of each recommendation request. Without the request_id field, sample accuracy is affected and real-time features cannot be added. The request_id field is not required when you create a recommendation scenario. However, after you create the scenario, you must add the request_id field and modify the training sample code before model training.

user_id

Required

The ID of the user, which is the unique identifier of the user.

item_id

Required

The ID of the item, which is the unique identifier of the video.

event

Required

The behavior the user performs on the video.

For example, exposure, click, like, and other types of behaviors can be used as categorical features.

event_value

Required

If you set the event field to watch, this field indicates the watch duration in seconds.

timestamp

Required

The time when the user performs the behavior. Unit: seconds. The time can be segmented by hour, day of the week, or holiday and used as categorical features.

scene

Required

The scenario.

For example, home_feed indicates homepage recommendation, and hot_items indicates popular items. search indicates the search scenario, in which you must also configure the query field.

Note that this field is required in all scenarios.

query

Optional

The search term.

device_type

Optional

The type of device used by the user.

For example, PC, mobile phone, tablet, and other devices can be used as categorical features.

browser

Optional

The type of the browser used by the user.

For example, Google Chrome, Firefox, Safari, and other browsers can be used as categorical features.

mobile_brand

Optional

The brand of the mobile phone used by the user, which can be used as a categorical feature.

os

Optional

The operating system of the user device.

For example, iOS, Android, Windows, and other operating systems can be used as categorical features.

ip

Optional

The IP address of the user, which can be used to infer the province and city of the user. The inferred location can be used as a categorical feature.

rating

Optional

The average user rating on the video.

For example, the video scores 8.5 of 10.

weather

Optional

The weather condition of the region in which the user lives.

For example, sunny, rainy, snowy, and other weather conditions can be used as categorical features.

holiday

Optional

Specifies whether the user behavior takes place during a holiday.

For example, Spring Festival, National Day, and other holidays can be used as categorical features.

season

Optional

The season.

For example, spring, summer, autumn, and winter can be used as categorical features.

longitude

Optional

The longitude of the location of the user, which can be used as a numerical feature, and can be used as a categorical feature after discretization.

latitude

Optional

The latitude of the location of the user, which can be used as a numerical feature, and can be used as a categorical feature after discretization.
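The timestamp field can be segmented by hour, day of the week, or holiday, as described in the preceding table. The following Python sketch shows one way to do this. It interprets the timestamp in UTC for simplicity, and the holiday set and the helper name time_features are hypothetical. In practice, you would likely use the local time zone of the user and a real holiday calendar.

```python
from datetime import datetime, timezone


def time_features(timestamp, holidays=frozenset()):
    """Derive categorical time features from a timestamp in seconds (UTC)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "hour": moment.hour,
        "day_of_week": moment.weekday(),  # 0 = Monday, 6 = Sunday
        "is_weekend": moment.weekday() >= 5,
        "is_holiday": moment.strftime("%Y%m%d") in holidays,
    }


# Hypothetical holiday calendar that contains a single date.
features = time_features(1520017038, holidays={"20180302"})
print(features)
```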

Live streaming recommendation

The following tables describe recommended fields in user, item, and behavior tables in live streaming recommendation scenarios. Configuring more fields will give you better recommendation results. You can also append fields that are not listed in the following tables to further improve the results. The names of fields do not need to be the same as the ones in the following tables.

User table

Field

Required

Description

user_id

Required

The user ID.

age

Optional

The age of the user, which can be segmented.

User age can be categorized into non-overlapping segments, such as 0 to 12, 13 to 18, 19 to 24, and 25 to 34, which converts the numerical feature into a categorical feature by discretization.

gender

Optional

The gender of the user.

For example, male, female, and other genders can be used as categorical features.

occupation

Optional

The occupation of the user, which can be used as a categorical feature.

education

Optional

The educational background of the user.

For example, senior high school, undergraduate, and master's degree can be used as categorical features.

income

Optional

The income level of the user.

For example, low, medium, and high income levels can be used as categorical features.

user_level

Optional

The level or membership level of the user on the platform.

register_time

Optional

The time when the user registers the account. Unit: seconds. The time can be used as numerical features after being segmented by year, month, and day. It can be converted into categorical features after discretization.

country

Optional

The country in which the user is located, which can be used as a categorical feature.

province

Optional

The province in which the user is located, which can be used as a categorical feature.

city

Optional

The city in which the user is located, which can be used as a categorical feature.

active_time

Optional

The period of time during which the user is active on the platform.

For example, the morning, afternoon, evening, and other periods of time can be used as categorical features.

device_type

Optional

The type of device used by the user.

For example, PC, mobile phone, tablet, and other devices can be used as categorical features.

os

Optional

The operating system of the user device, which can be used as a categorical feature.

browser

Optional

The type of the browser used by the user, which can be used as a categorical feature.

language

Optional

The language preferred by the user, which can be used as a categorical feature.

interests

Optional

The interests of the user, which can be used as tag features.
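
The discretization described for the age and register_time fields can be sketched as follows. This is a minimal illustration, not part of PAI-Rec: the segment boundaries, the "35+" and "unknown" labels, the function names, and the use of UTC are all assumptions that you should adapt to your own data.

```python
from datetime import datetime, timezone

# Hypothetical, non-overlapping age segments (both bounds inclusive).
AGE_SEGMENTS = [
    (0, 12, "0-12"),
    (13, 17, "13-17"),
    (18, 24, "18-24"),
    (25, 34, "25-34"),
]


def age_bucket(age):
    """Map a numeric age to a categorical segment label."""
    if age is None or age < 0:
        return "unknown"
    for low, high, label in AGE_SEGMENTS:
        if low <= age <= high:
            return label
    # Assumption: every age above the last segment falls into one bucket.
    return "35+"


def register_time_features(ts_seconds):
    """Split a registration timestamp (seconds) into year, month, and day."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return {
        "register_year": dt.year,
        "register_month": dt.month,
        "register_day": dt.day,
    }
```

For example, age_bucket(20) yields the label "18-24", which can then be stored in the user table as a categorical feature in place of the raw age.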

Item table

Field

Required

Description

anchor_id

Required

The ID of the item, which is the unique identifier of the streamer.

name

Optional

The name of the streamer.

nickname

Optional

The nickname of the streamer, which is usually displayed on the live streaming page.

anchor_gender

Optional

The gender of the streamer.

For example, male, female, and other genders can be used as categorical features.

language

Optional

The language used by the streamer during live streaming.

The language can be Chinese, English, Japanese, or other languages.

level

Optional

The level of the streamer on the platform.

category

Optional

The main category to which the streamer belongs.

For example, talent show, game commentary, and other categories can be used as categorical features.

leaf_category

Optional

The sub-category to which the streamer belongs, which can be used as a categorical feature.

rating

Optional

The overall evaluation score for the streamer, which can be converted into positive comments, neutral comments, and negative comments. The comments are used as categorical features.

status

Optional

The status of the streamer, such as whether the streamer is live streaming.

review_count

Optional

The total number of comments on the streamer.

video_type

Optional

The specific form of live streaming, such as live broadcast or recorded playback.

duration

Optional

The duration of a single live stream.

release_date

Optional

The exact date when live streaming begins.

director

Optional

The name of the director for a prerecorded program. Leave this field empty if there is none.

actors

Optional

The list of live streaming guests.

subtitles

Optional

Specifies whether the subtitle service is provided.

region

Optional

The region in which the streamer is located.

tags

Optional

The list of keywords related to the live streaming topic.

follow_count

Optional

The number of followers the streamer has.
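
The conversion of the rating field into positive, neutral, and negative categories can be sketched as follows. This is only an illustration: it assumes a score on a 1 to 5 scale, and the thresholds and the function name are hypothetical rather than values defined by PAI-Rec.

```python
def rating_category(score, low=2.5, high=4.0):
    """Convert a numeric rating into a categorical label.

    Assumption: scores are on a 1 to 5 scale; the thresholds are illustrative.
    """
    if score is None:
        return "unknown"
    if score >= high:
        return "positive"
    if score < low:
        return "negative"
    return "neutral"
```

Choose the thresholds based on the actual distribution of scores on your platform so that no category is nearly empty.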

Behavior table

Field

Required

Description

request_id

Optional

The ID of the request, which uniquely identifies each recommendation request. If the request_id field is missing, sample accuracy and the addition of real-time features are affected. A new recommendation scenario does not require the request_id field. However, after you create the recommendation scenario, you must add the request_id field and modify the training sample code before you train models.

user_id

Required

The ID of the user who performs a specific behavior.

item_id

Required

The ID of the item with which the user interacts.

event

Required

The type of user behavior on live streaming content, such as exposure, click, like, gift, and comment.

event_value

Required

If the value of the event field is gift, this field indicates the specific amount of a gift.

If the value of the event field is like, this field indicates the number of likes.

event_time

Optional

The time when the behavior takes place, accurate to the second.

ip

Optional

The IP address of the user, which can be used to find the city and province where the user lives.

rating

Optional

The star rating or other forms of feedback users give to the streamer.

scene

Optional

The access portal, such as homepage or search page.

device_type

Optional

The device used by the user during interaction.

browser

Optional

The browser used by the user for access.

mobile_brand

Optional

The brand of the mobile device used by the user.

os

Optional

The operating system of the user device.

weather

Optional

The real-time weather conditions obtained based on IP positioning.

holiday

Optional

Specifies whether the user behavior takes place during a holiday.

season

Optional

The season in which the behavior takes place.

longitude

Optional

The longitude of the location of the user.

latitude

Optional

The latitude of the location of the user.
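
The following sketch illustrates why the request_id field matters for sample accuracy. It checks the required fields of a behavior record and labels each exposure as positive only if a click shares the same request, user, and item. The function names and the labeling rule are assumptions for illustration and are not the sample logic that PAI-Rec generates.

```python
# Fields marked as Required in the behavior table above.
REQUIRED_FIELDS = ("user_id", "item_id", "event", "event_value")


def missing_required(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def build_click_samples(records):
    """Label each exposure 1 if a click shares its key, else 0.

    The key is (request_id, user_id, item_id). Without request_id, a click
    cannot be tied to the specific exposure that produced it.
    """
    clicked = {
        (r.get("request_id"), r["user_id"], r["item_id"])
        for r in records
        if r["event"] == "click"
    }
    samples = []
    for r in records:
        if r["event"] != "exposure":
            continue
        key = (r.get("request_id"), r["user_id"], r["item_id"])
        samples.append({
            "request_id": key[0],
            "user_id": key[1],
            "item_id": key[2],
            "label": int(key in clicked),
        })
    return samples
```

If request_id is missing from every record, the key degrades to (None, user_id, item_id), so a single click marks all exposures of that item to that user as positive, regardless of which recommendation request produced the click.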