You can use FeatureStore to generate features in Platform for AI (PAI). FeatureStore standardizes commonly used generation operations. You need to only perform a few configurations to generate features. This efficiently simplifies feature generation. Feature generation is widely used in various scenarios, such as recommendation, advertising, risk control, and machine learning. This topic describes how to use source tables to generate features and sample tables in the recommendation scenario.
Prerequisites
Make sure that preparations described in the following table are complete.
Service | Operation |
PAI |
|
MaxCompute |
|
DataWorks |
|
1. Make preparations
Prepare source tables
In recommendation scenarios, feature generation requires the following three source tables. These tables can be used to generate hundreds of features. This facilitates model building to align with the objectives.
rec_sln_demo_user_table_preprocess_v1: the user table that contains basic user features, such as gender, age, city, and number of followers.
rec_sln_demo_item_table_preprocess_v1: the item table that contains basic item features, such as category, author, cumulative number of clicks, and cumulative number of praises.
rec_sln_demo_behavior_table_preprocess_v1: the behavior table that contains behavior features. For example, a user clicks an item at a specific point in time.
Data tables are stored in the pai_online_project workspace which is visible to all users. The data tables store only simulation data. You must execute SQL statements in DataWorks to synchronize data in the preceding tables from the pai_online_project workspace to your MaxCompute project. Perform the following steps:
Log on to the DataWorks console.
In the navigation pane on the left, click Data Development and O&M > Data Development.
Select the DataWorks workspace that you created and click Go to Data Studio.
Hover over Create, and choose Create Node > MaxCompute > ODPS SQL. In the page that appears, configure the node parameters.
Parameter
Suggested value
Node Type
ODPS SQL
Path
Business Flow/Workflow/MaxCompute
Name
Enter a custom name.
Click Confirm.
In the SQL editor of the new node, execute the following SQL statements to synchronize data in the user table, item table, and behavior table from the pai_online_project workspace to your MaxCompute project. Select an existing exclusive resource group to synchronize data.
Execute the following SQL statements to synchronize data in the user table rec_sln_demo_user_table_preprocess_v1:
CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1 like pai_online_project.rec_sln_demo_user_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';Execute the following SQL statements to synchronize data in the item table rec_sln_demo_item_table_preprocess_v1:
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1 like pai_online_project.rec_sln_demo_item_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';Execute the following SQL statements to synchronize data in the behavior table rec_sln_demo_behavior_table_preprocess_v1:
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1 like pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';
Install FeatureStore SDK for Python
We recommend that you use Jupyter Notebook to run the following code.
Install FeatureStore SDK for Python in the Python3 environment.
%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whlImport the required functional modules.
import os from feature_store_py import FeatureStoreClient from feature_store_py.fs_datasource import MaxComputeDataSource from feature_store_py.feature_engineering import TableTransform, Condition, DayOf, ComboTransform, Feature, AggregationTransform, auto_count_feature_transform, WindowTransform, auto_window_feature_transform
2. Transform source tables
After you complete the preparations, you can view the user table rec_sln_demo_user_table_preprocess_v1, item table rec_sln_demo_item_table_preprocess_v1, and behavior table rec_sln_demo_behavior_table_preprocess_v1 in your workspace.
Before you generate features by using the tables, preprocess the tables to facilitate feature generation. You can execute the following SQL statements by using the ODPS SQL node that you created to preprocess the tables. For more information, see Prepare source tables.
Merge the user table, item table, and behavior table.
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1 ( request_id bigint ,user_id string ,page string ,net_type string ,day_h bigint COMMENT 'The hour at which the behavior occurred.' ,week_day bigint COMMENT 'The day of the week on which the behavior occurred.' ,event_unix_time bigint ,item_id string ,event string ,playtime double ,gender string ,age bigint ,city string ,item_cnt bigint ,follow_cnt bigint ,follower_cnt bigint ,is_new_user bigint ,tags string ,duration double ,category string ,author bigint ,click_count bigint ,praise_count bigint ,is_new_item bigint ) PARTITIONED BY ( ds string ) LIFECYCLE 90 ; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}') SELECT sq0.request_id ,sq0.user_id ,sq0.page ,sq0.net_type ,sq0.day_h ,sq0.week_day ,sq0.event_unix_time ,sq0.item_id ,sq0.event ,sq0.playtime ,sq1.gender ,sq1.age ,sq1.city ,sq1.item_cnt ,sq1.follow_cnt ,sq1.follower_cnt ,sq1.is_new_user ,sq1.tags ,sq2.duration ,sq2.category ,sq2.author ,sq2.click_count ,sq2.praise_count ,sq2.is_new_item FROM ( SELECT * FROM rec_sln_demo_behavior_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq0 LEFT JOIN ( SELECT * FROM rec_sln_demo_user_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq1 ON sq0.user_id = sq1.user_id LEFT JOIN ( SELECT * FROM rec_sln_demo_item_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq2 ON sq0.item_id = sq2.item_id ;Preprocess the behavior table to transform it into a label table. Labels are required for model training. In this example, click, playtime, and praise are used as labels.
CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1 ( request_id bigint ,user_id string ,page string ,net_type string ,day_h bigint COMMENT 'The hour at which the behavior occurred.' ,week_day bigint COMMENT 'The day of the week on which the behavior occurred.' ,day_min string ,event_unix_time bigint ,item_id string ,playtime double ,is_click BIGINT ,ln_playtime DOUBLE ,is_praise BIGINT ) PARTITIONED BY ( ds string ) LIFECYCLE 90 ; INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}') SELECT request_id ,user_id ,MAX(page) page ,MAX(net_type) net_type ,MAX(day_h) day_h ,MAX(week_day) week_day ,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min ,MAX(event_unix_time) event_unix_time ,item_id ,MAX(playtime) playtime ,max(if(event='click', 1, 0)) is_click ,ln(sum(playtime) + 1) ln_playtime ,max(if(event='praise', 1, 0)) is_praise FROM rec_sln_demo_behavior_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' GROUP BY request_id ,user_id ,item_id ;
After you execute the preceding SQL statements, you can obtain the following tables:
rec_sln_demo_behavior_table_preprocess_wide_v1: the new behavior table that is used to generate features.
rec_sln_demo_fs_label_table_v1: the label table that is used to generate a sample table.
3. Generate features
You can call the automatic extension functions to generate features. Hundreds of features can be generated by using only a few lines of code.
Feature generation requires generation of user features and item features. Perform the following steps:
Generate user features
We recommend that you use Jupyter Notebook to run the following code.
Initialize the client.
access_key_id=os.environ.get ("ALIBABA_CLOUD_ACCESS_KEY_ID") # Enter your AccessKey ID. access_key_secret=os.environ.get ("ALIBABA_CLOUD_ACCESS_KEY_SECRET") # Enter your AccessKey secret. project='project_name' # Enter your project name. region='cn-hangzhou' # Enter the region in which your project resides. For example, if your project resides in China (Hangzhou), enter cn-hangzhou. fs_client = FeatureStoreClient(access_key_id=access_key_id, access_key_secret=access_key_secret, region=region)Specify source tables on which you want to perform feature transformation.
input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1" ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project) input_user_table_name = "rec_sln_demo_user_table_preprocess_v1" ds_user = MaxComputeDataSource(table=input_user_table_name, project=project) input_item_table_name = "rec_sln_demo_item_table_preprocess_v1" ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)Specify a name for the user feature table to be generated by performing the JoinTransform and AggregationTransform operations.
agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'Implement automatic extension for statistical features by using automatic extension functions.
name_prefix = "user" input_list = ["playtime", "duration", "click_count", "praise_count"] event_name = 'event' event_type = 'expr' group_by_key = "user_id" window_size = [3,7,15] user_count_feature_list = auto_count_feature_transform(name_prefix, input_list, event_name, event_type, group_by_key, window_size) print("len_count_feature_list = ", len(user_count_feature_list)) print("count_feature_list = ", user_count_feature_list)Run a pipeline and view the table generation result.
Create a pipeline.
agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list) agg_user_pipeline =fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(agg_user_bhv_pipeline, keep_input_columns=False)Run the pipeline.
execute_date = '20240605' output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)View the table generation result.
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20) agg_user_ret
Specify a name for the user feature table to be generated by performing the WindowTransform operation.
win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'Use built-in automatic extension functions to perform the WindowTransform operation.
name_prefix = 'user' input_list = ['day_h', 'category'] agg_field = ['duration', 'click_count'] event_name = 'event' event_type = 'expr' group_by_key = 'user_id' window_size = [7, 15, 30, 45] user_win_feature_list = auto_window_feature_transform(name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size) print("len_user_win_feature_list = ", len(user_win_feature_list)) print("user_win_feature_list = ", user_win_feature_list)Run a pipeline and view the table generation result.
Create a pipeline.
win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list) win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(win_user_bhv_pipeline, keep_input_columns=False)Run the pipeline.
execute_date = '20240605' output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True) # For more information, see the related documentation. The operation involves intermediate tables. The first time you run the code, you must backfill data, which may require a long period of time. You can run the following code to backfill data: # output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)View the table generation result.
win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20) win_user_ret
Generate item features
After you generate user features, you can generate item features based on the item type. Perform the following steps:
Specify a name for the item feature table to be generated by performing the AggregationTransform operation.
agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'Implement automatic extension for statistical features by using automatic extension functions.
name_prefix = "item" input_list = ["item_cnt", "follow_cnt", "follower_cnt"] event_name = 'event' event_type = 'expr' group_by_key = "item_id" window_size = [3,7,15] item_count_feature_list = auto_count_feature_transform(name_prefix, input_list, event_name, event_type, group_by_key, window_size) print("len_count_feature_list = ", len(item_count_feature_list)) print("count_feature_list = ", item_count_feature_list)Run a pipeline and view the table generation result.
Create a pipeline.
agg_item_bhv_pipeline =fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list) agg_item_pipeline =fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(agg_item_bhv_pipeline, keep_input_columns=False)Run the pipeline.
execute_date = '20240605' output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)View the table generation result.
agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20) agg_item_ret
Specify a name for the item feature table to be generated by performing the WindowTransform operation.
win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'Use built-in automatic extension functions to perform the WindowTransform operation.
name_prefix = 'item' input_list = ['day_h', 'category'] agg_field = ['click_count', 'praise_count'] event_name = 'event' event_type = 'expr' group_by_key = 'item_id' window_size = [7, 15, 30, 45] item_win_feature_list = auto_window_feature_transform(name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size) print("len_item_win_feature_list = ", len(item_win_feature_list)) print("item_win_feature_list = ", item_win_feature_list)Run a pipeline and view the table generation result.
Create a pipeline.
win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list) win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(win_item_bhv_pipeline, keep_input_columns=False)Run the pipeline.
execute_date = '20240605' output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True) # For more information, see the related documentation. The operation involves intermediate tables. The first time you run the code, you must backfill data, which may require a long period of time. You can run the following code to backfill data: # output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)View the table generation result.
win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20) win_item_ret
4. Generate a sample table
After you perform the preceding operations, you can obtain a label table, three user feature tables, and three item feature tables. The three feature tables include a source table and two feature tables that are generated after feature processing. Then, you can execute the following SQL statements by using the ODPS SQL node that you created to merge the seven tables into a complete sample table. For more information, see Prepare source tables.
CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
request_id BIGINT,
user_id STRING,
page STRING,
net_type STRING,
day_h BIGINT,
week_day BIGINT,
day_min STRING,
event_unix_time BIGINT,
item_id STRING,
playtime DOUBLE,
is_click BIGINT,
ln_playtime DOUBLE,
is_praise BIGINT,
duration DOUBLE,
category STRING,
author BIGINT,
click_count BIGINT,
praise_count BIGINT,
is_new_item BIGINT,
item__sum_item_cnt_3d BIGINT,
item__sum_follow_cnt_3d BIGINT,
item__sum_follower_cnt_3d BIGINT,
item__max_item_cnt_3d BIGINT,
item__max_follow_cnt_3d BIGINT,
item__max_follower_cnt_3d BIGINT,
item__min_item_cnt_3d BIGINT,
item__min_follow_cnt_3d BIGINT,
item__min_follower_cnt_3d BIGINT,
item__avg_item_cnt_3d DOUBLE,
item__avg_follow_cnt_3d DOUBLE,
item__avg_follower_cnt_3d DOUBLE,
item__sum_item_cnt_7d BIGINT,
item__sum_follow_cnt_7d BIGINT,
item__sum_follower_cnt_7d BIGINT,
item__max_item_cnt_7d BIGINT,
item__max_follow_cnt_7d BIGINT,
item__max_follower_cnt_7d BIGINT,
item__min_item_cnt_7d BIGINT,
item__min_follow_cnt_7d BIGINT,
item__min_follower_cnt_7d BIGINT,
item__avg_item_cnt_7d DOUBLE,
item__avg_follow_cnt_7d DOUBLE,
item__avg_follower_cnt_7d DOUBLE,
item__sum_item_cnt_15d BIGINT,
item__sum_follow_cnt_15d BIGINT,
item__sum_follower_cnt_15d BIGINT,
item__max_item_cnt_15d BIGINT,
item__max_follow_cnt_15d BIGINT,
item__max_follower_cnt_15d BIGINT,
item__min_item_cnt_15d BIGINT,
item__min_follow_cnt_15d BIGINT,
item__min_follower_cnt_15d BIGINT,
item__avg_item_cnt_15d DOUBLE,
item__avg_follow_cnt_15d DOUBLE,
item__avg_follower_cnt_15d DOUBLE,
item__kv_day_h_click_count_sum_7d STRING,
item__kv_category_click_count_sum_7d STRING,
item__kv_day_h_praise_count_sum_7d STRING,
item__kv_category_praise_count_sum_7d STRING,
item__kv_day_h_click_count_max_7d STRING,
item__kv_category_click_count_max_7d STRING,
item__kv_day_h_praise_count_max_7d STRING,
item__kv_category_praise_count_max_7d STRING,
item__kv_day_h_click_count_min_7d STRING,
item__kv_category_click_count_min_7d STRING,
item__kv_day_h_praise_count_min_7d STRING,
item__kv_category_praise_count_min_7d STRING,
item__kv_day_h_click_count_avg_7d STRING,
item__kv_category_click_count_avg_7d STRING,
item__kv_day_h_praise_count_avg_7d STRING,
item__kv_category_praise_count_avg_7d STRING,
item__kv_day_h_click_count_sum_15d STRING,
item__kv_category_click_count_sum_15d STRING,
item__kv_day_h_praise_count_sum_15d STRING,
item__kv_category_praise_count_sum_15d STRING,
item__kv_day_h_click_count_max_15d STRING,
item__kv_category_click_count_max_15d STRING,
item__kv_day_h_praise_count_max_15d STRING,
item__kv_category_praise_count_max_15d STRING,
item__kv_day_h_click_count_min_15d STRING,
item__kv_category_click_count_min_15d STRING,
item__kv_day_h_praise_count_min_15d STRING,
item__kv_category_praise_count_min_15d STRING,
item__kv_day_h_click_count_avg_15d STRING,
item__kv_category_click_count_avg_15d STRING,
item__kv_day_h_praise_count_avg_15d STRING,
item__kv_category_praise_count_avg_15d STRING,
item__kv_day_h_click_count_sum_30d STRING,
item__kv_category_click_count_sum_30d STRING,
item__kv_day_h_praise_count_sum_30d STRING,
item__kv_category_praise_count_sum_30d STRING,
item__kv_day_h_click_count_max_30d STRING,
item__kv_category_click_count_max_30d STRING,
item__kv_day_h_praise_count_max_30d STRING,
item__kv_category_praise_count_max_30d STRING,
item__kv_day_h_click_count_min_30d STRING,
item__kv_category_click_count_min_30d STRING,
item__kv_day_h_praise_count_min_30d STRING,
item__kv_category_praise_count_min_30d STRING,
item__kv_day_h_click_count_avg_30d STRING,
item__kv_category_click_count_avg_30d STRING,
item__kv_day_h_praise_count_avg_30d STRING,
item__kv_category_praise_count_avg_30d STRING,
item__kv_day_h_click_count_sum_45d STRING,
item__kv_category_click_count_sum_45d STRING,
item__kv_day_h_praise_count_sum_45d STRING,
item__kv_category_praise_count_sum_45d STRING,
item__kv_day_h_click_count_max_45d STRING,
item__kv_category_click_count_max_45d STRING,
item__kv_day_h_praise_count_max_45d STRING,
item__kv_category_praise_count_max_45d STRING,
item__kv_day_h_click_count_min_45d STRING,
item__kv_category_click_count_min_45d STRING,
item__kv_day_h_praise_count_min_45d STRING,
item__kv_category_praise_count_min_45d STRING,
item__kv_day_h_click_count_avg_45d STRING,
item__kv_category_click_count_avg_45d STRING,
item__kv_day_h_praise_count_avg_45d STRING,
item__kv_category_praise_count_avg_45d STRING,
gender STRING,
age BIGINT,
city STRING,
item_cnt BIGINT,
follow_cnt BIGINT,
follower_cnt BIGINT,
is_new_user BIGINT,
tags STRING,
user__sum_playtime_3d DOUBLE,
user__sum_duration_3d DOUBLE,
user__sum_click_count_3d BIGINT,
user__sum_praise_count_3d BIGINT,
user__max_playtime_3d DOUBLE,
user__max_duration_3d DOUBLE,
user__max_click_count_3d BIGINT,
user__max_praise_count_3d BIGINT,
user__min_playtime_3d DOUBLE,
user__min_duration_3d DOUBLE,
user__min_click_count_3d BIGINT,
user__min_praise_count_3d BIGINT,
user__avg_playtime_3d DOUBLE,
user__avg_duration_3d DOUBLE,
user__avg_click_count_3d DOUBLE,
user__avg_praise_count_3d DOUBLE,
user__sum_playtime_7d DOUBLE,
user__sum_duration_7d DOUBLE,
user__sum_click_count_7d BIGINT,
user__sum_praise_count_7d BIGINT,
user__max_playtime_7d DOUBLE,
user__max_duration_7d DOUBLE,
user__max_click_count_7d BIGINT,
user__max_praise_count_7d BIGINT,
user__min_playtime_7d DOUBLE,
user__min_duration_7d DOUBLE,
user__min_click_count_7d BIGINT,
user__min_praise_count_7d BIGINT,
user__avg_playtime_7d DOUBLE,
user__avg_duration_7d DOUBLE,
user__avg_click_count_7d DOUBLE,
user__avg_praise_count_7d DOUBLE,
user__sum_playtime_15d DOUBLE,
user__sum_duration_15d DOUBLE,
user__sum_click_count_15d BIGINT,
user__sum_praise_count_15d BIGINT,
user__max_playtime_15d DOUBLE,
user__max_duration_15d DOUBLE,
user__max_click_count_15d BIGINT,
user__max_praise_count_15d BIGINT,
user__min_playtime_15d DOUBLE,
user__min_duration_15d DOUBLE,
user__min_click_count_15d BIGINT,
user__min_praise_count_15d BIGINT,
user__avg_playtime_15d DOUBLE,
user__avg_duration_15d DOUBLE,
user__avg_click_count_15d DOUBLE,
user__avg_praise_count_15d DOUBLE,
user__kv_day_h_duration_sum_7d STRING,
user__kv_category_duration_sum_7d STRING,
user__kv_day_h_click_count_sum_7d STRING,
user__kv_category_click_count_sum_7d STRING,
user__kv_day_h_duration_max_7d STRING,
user__kv_category_duration_max_7d STRING,
user__kv_day_h_click_count_max_7d STRING,
user__kv_category_click_count_max_7d STRING,
user__kv_day_h_duration_min_7d STRING,
user__kv_category_duration_min_7d STRING,
user__kv_day_h_click_count_min_7d STRING,
user__kv_category_click_count_min_7d STRING,
user__kv_day_h_duration_avg_7d STRING,
user__kv_category_duration_avg_7d STRING,
user__kv_day_h_click_count_avg_7d STRING,
user__kv_category_click_count_avg_7d STRING,
user__kv_day_h_duration_sum_15d STRING,
user__kv_category_duration_sum_15d STRING,
user__kv_day_h_click_count_sum_15d STRING,
user__kv_category_click_count_sum_15d STRING,
user__kv_day_h_duration_max_15d STRING,
user__kv_category_duration_max_15d STRING,
user__kv_day_h_click_count_max_15d STRING,
user__kv_category_click_count_max_15d STRING,
user__kv_day_h_duration_min_15d STRING,
user__kv_category_duration_min_15d STRING,
user__kv_day_h_click_count_min_15d STRING,
user__kv_category_click_count_min_15d STRING,
user__kv_day_h_duration_avg_15d STRING,
user__kv_category_duration_avg_15d STRING,
user__kv_day_h_click_count_avg_15d STRING,
user__kv_category_click_count_avg_15d STRING,
user__kv_day_h_duration_sum_30d STRING,
user__kv_category_duration_sum_30d STRING,
user__kv_day_h_click_count_sum_30d STRING,
user__kv_category_click_count_sum_30d STRING,
user__kv_day_h_duration_max_30d STRING,
user__kv_category_duration_max_30d STRING,
user__kv_day_h_click_count_max_30d STRING,
user__kv_category_click_count_max_30d STRING,
user__kv_day_h_duration_min_30d STRING,
user__kv_category_duration_min_30d STRING,
user__kv_day_h_click_count_min_30d STRING,
user__kv_category_click_count_min_30d STRING,
user__kv_day_h_duration_avg_30d STRING,
user__kv_category_duration_avg_30d STRING,
user__kv_day_h_click_count_avg_30d STRING,
user__kv_category_click_count_avg_30d STRING,
user__kv_day_h_duration_sum_45d STRING,
user__kv_category_duration_sum_45d STRING,
user__kv_day_h_click_count_sum_45d STRING,
user__kv_category_click_count_sum_45d STRING,
user__kv_day_h_duration_max_45d STRING,
user__kv_category_duration_max_45d STRING,
user__kv_day_h_click_count_max_45d STRING,
user__kv_category_click_count_max_45d STRING,
user__kv_day_h_duration_min_45d STRING,
user__kv_category_duration_min_45d STRING,
user__kv_day_h_click_count_min_45d STRING,
user__kv_category_click_count_min_45d STRING,
user__kv_day_h_duration_avg_45d STRING,
user__kv_category_duration_avg_45d STRING,
user__kv_day_h_click_count_avg_45d STRING,
user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;
insert overwrite table fs_demo_fs_engineering_v1_training_set partition (ds = '${bdp.system.bizdate}')
select
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
from
(
select *
from rec_sln_demo_fs_label_table_v1
where ds = '${bdp.system.bizdate}'
) sq0
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq1 on sq0.user_id = sq1.user_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq2 on sq0.item_id = sq2.item_id
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_win_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq4 on sq0.user_id = sq4.user_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq5 on sq0.item_id = sq5.item_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_win_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq6 on sq0.item_id = sq6.item_id
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq3 on sq0.user_id = sq3.user_id
;In the preceding SQL statements, you must replace <project_name> with the actual project name.
References
For more information about how to generate features, see Feature platform and feature engineering.
FeatureStore is suitable for all scenarios that require features, such as recommendation, financial risk control, and user growth scenarios. FeatureStore is integrated with common data source engines and recommendation service engines of Alibaba Cloud. FeatureStore provides an efficient and convenient end-to-end platform from feature registration to model development and application. For more information about FeatureStore, see Overview.
If you have questions when you use FeatureStore, you can join the DingTalk group 34415007523 for technical assistance.