特徵平台當前提供的特徵生產功能旨在簡化特徵建立過程,通過固化常用的和普遍的生產步驟,您僅需進行簡單配置就能輕鬆產生特徵,從而有效降低了特徵生產的複雜性。特徵生產在多個領域(包括推薦、廣告、風控以及機器學習等)都有廣泛應用,本文將以推薦情境為例,為您介紹從原始表到特徵生產加工產生樣本表的完整過程。
前提條件
在開始執行操作前,請確認您已完成以下準備工作:
依賴產品 | 具體操作 |
人工智慧平台PAI |
|
雲原生MaxCompute |
|
巨量資料開發治理平台DataWorks |
|
一、準備工作
準備未經處理資料
一般對於推薦情境,特徵生產通常需要以下三張原始表,以這三張原始表為基礎,特徵生產可以生產出成百上千個特徵,方便建立模型來擬合目標。
使用者表(rec_sln_demo_user_table_preprocess_v1):包含一些基礎的使用者特徵,例如性別、年齡、城市和關注數等。
物品表(rec_sln_demo_item_table_preprocess_v1):包含一些基礎的物品特徵,例如類別、作者、累計點擊數和累計點贊數等。
行為表(rec_sln_demo_behavior_table_preprocess_v1):包含一些行為特徵,例如某時使用者點擊某個物品等。
資料表存放在有公開讀取許可權的pai_online_project中,其資料均為類比資料產生。您需要在DataWorks中執行SQL命令,將上表資料從pai_online_project專案同步到您的MaxCompute專案中。具體操作步驟如下:
登入DataWorks控制台。
在左側導覽列單擊資料開發與營運 > 資料開發。
選擇已建立的DataWorks工作空間後,單擊進入資料開發。
滑鼠移至上方至建立,選擇建立節點 > MaxCompute > ODPS SQL,在彈出的頁面中配置節點參數。
參數
取值建議
引擎執行個體
選擇已建立的MaxCompute引擎。
節點類型
ODPS SQL
路徑
商務程序/Workflow/MaxCompute
名稱
可自訂名稱。
單擊確認。
在建立節點地區運行以下SQL命令,將使用者表、物品表、行為表從pai_online_project專案同步到您自己的MaxCompute中。資源群組選擇已建立的獨享資源群組。
同步處理的使用者表:rec_sln_demo_user_table_preprocess_v1
CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1 like pai_online_project.rec_sln_demo_user_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';同步物品表:rec_sln_demo_item_table_preprocess_v1
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1 like pai_online_project.rec_sln_demo_item_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';同步行為表:rec_sln_demo_behavior_table_preprocess_v1
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1 like pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 WHERE ds >= '20240530' and ds <='20240605';
安裝FeatureStore Python SDK
以下代碼均建議在Jupyter Notebook環境下運行。
安裝特徵平台Python SDK,要求在Python3環境下運行。
%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whl匯入需要的功能模組:
import os from feature_store_py import FeatureStoreClient from feature_store_py.fs_datasource import MaxComputeDataSource from feature_store_py.feature_engineering import TableTransform, Condition, DayOf, ComboTransform, Feature, AggregationTransform, auto_count_feature_transform, WindowTransform, auto_window_feature_transform
二、原始表初步變換
準備流程完成後,您可以在自己的專案空間中查看已準備就緒的三張表:使用者表(rec_sln_demo_user_table_preprocess_v1)、物品表(rec_sln_demo_item_table_preprocess_v1)以及行為表(rec_sln_demo_behavior_table_preprocess_v1),您可以以這三張表為基礎進行特徵生產。
在進行後續的特徵生產加工之前,為了方便您後續做特徵統計,您需要對資料表做以下預先處理操作。您可以將以下SQL命令粘貼到已建立的ODPS SQL節點中執行,具體操作,請參見準備未經處理資料。
將使用者表、物品表的特徵和行為表進行串連。
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1 ( request_id bigint ,user_id string ,page string ,net_type string ,day_h bigint COMMENT '行為發生在當天的第幾小時' ,week_day bigint COMMENT '行為發生在當前周的第幾天' ,event_unix_time bigint ,item_id string ,event string ,playtime double ,gender string ,age bigint ,city string ,item_cnt bigint ,follow_cnt bigint ,follower_cnt bigint ,is_new_user bigint ,tags string ,duration double ,category string ,author bigint ,click_count bigint ,praise_count bigint ,is_new_item bigint ) PARTITIONED BY ( ds string ) LIFECYCLE 90 ; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}') SELECT sq0.request_id ,sq0.user_id ,sq0.page ,sq0.net_type ,sq0.day_h ,sq0.week_day ,sq0.event_unix_time ,sq0.item_id ,sq0.event ,sq0.playtime ,sq1.gender ,sq1.age ,sq1.city ,sq1.item_cnt ,sq1.follow_cnt ,sq1.follower_cnt ,sq1.is_new_user ,sq1.tags ,sq2.duration ,sq2.category ,sq2.author ,sq2.click_count ,sq2.praise_count ,sq2.is_new_item FROM ( SELECT * FROM rec_sln_demo_behavior_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq0 LEFT JOIN ( SELECT * FROM rec_sln_demo_user_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq1 ON sq0.user_id = sq1.user_id LEFT JOIN ( SELECT * FROM rec_sln_demo_item_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' ) sq2 ON sq0.item_id = sq2.item_id ;因訓練模型需要有標籤,您需要對行為表進行預先處理,將其轉換成Label表。本樣本將點擊、播放時間、點贊作為標籤。
CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1 ( request_id bigint ,user_id string ,page string ,net_type string ,day_h bigint COMMENT '行為發生在當天的第幾小時' ,week_day bigint COMMENT '行為發生在當前周的第幾天' ,day_min string ,event_unix_time bigint ,item_id string ,playtime double ,is_click BIGINT ,ln_playtime DOUBLE ,is_praise BIGINT ) PARTITIONED BY ( ds string ) LIFECYCLE 90 ; INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}') SELECT request_id ,user_id ,MAX(page) page ,MAX(net_type) net_type ,MAX(day_h) day_h ,MAX(week_day) week_day ,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min ,MAX(event_unix_time) event_unix_time ,item_id ,MAX(playtime) playtime ,max(if(event='click', 1, 0)) is_click ,ln(sum(playtime) + 1) ln_playtime ,max(if(event='praise', 1, 0)) is_praise FROM rec_sln_demo_behavior_table_preprocess_v1 WHERE ds = '${bdp.system.bizdate}' GROUP BY request_id ,user_id ,item_id ;
經過上面的初步變換後,您將擁有以下兩種表:
新的行為表:rec_sln_demo_behavior_table_preprocess_wide_v1,後續的統計特徵將以此表為基礎進行變換。
Label表:rec_sln_demo_fs_label_table_v1,後續構建樣本表時需要用到此表。
三、特徵生產加工
您可以調用特徵生產中的自動擴充函數進行特徵生產,僅需幾行代碼,就可以生產出上百種特徵。
特徵生產加工分別需要加工使用者側特徵和物品側特徵。具體操作步驟如下:
使用者側特徵的特徵生產
以下代碼均建議在Jupyter Notebook環境下運行。
初始化Client。
access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID") # 填入您的Access Key ID access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET") # 填入您的Access Key Secret project = 'project_name' # 填入您的專案名 region = 'cn-hangzhou' # 填入您的專案所在地區,例如華東1(杭州)為cn-hangzhou fs_client = FeatureStoreClient(access_key_id=access_key_id, access_key_secret=access_key_secret, region=region)指定要進行特徵變換的資料來源。
input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1" ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project) input_user_table_name = "rec_sln_demo_user_table_preprocess_v1" ds_user = MaxComputeDataSource(table=input_user_table_name, project=project) input_item_table_name = "rec_sln_demo_item_table_preprocess_v1" ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)指定JoinTransform和AggregationTransform後輸出的使用者側特徵表名稱。
agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'利用自動擴充函數,對統計特徵進行自動擴充。
name_prefix = "user" input_list = ["playtime", "duration", "click_count", "praise_count"] event_name = 'event' event_type = 'expr' group_by_key = "user_id" window_size = [3,7,15] user_count_feature_list = auto_count_feature_transform(name_prefix, input_list, event_name, event_type, group_by_key, window_size) print("len_count_feature_list = ", len(user_count_feature_list)) print("count_feature_list = ", user_count_feature_list)執行生產過程,並查看錶運行結果。
建立Pipeline。
agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list) agg_user_pipeline =fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(agg_user_bhv_pipeline, keep_input_columns=False)執行Pipeline生產過程。
execute_date = '20240605' output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)查看錶運行結果。
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20) agg_user_ret
進行WindowTransform變換,指定WindowTransform後輸出的使用者側特徵表名稱。
win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'利用內建的自動擴充函數,自動產生WindowTransform特徵定義。
name_prefix = 'user' input_list = ['day_h', 'category'] agg_field = ['duration', 'click_count'] event_name = 'event' event_type = 'expr' group_by_key = 'user_id' window_size = [7, 15, 30, 45] user_win_feature_list = auto_window_feature_transform(name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size) print("len_user_win_feature_list = ", len(user_win_feature_list)) print("user_win_feature_list = ", user_win_feature_list)執行生產過程,並查看錶運行結果。
建立pipeline。
win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list) win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(win_user_bhv_pipeline, keep_input_columns=False)執行pipeline生產過程。
execute_date = '20240605' output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True) # 詳細實現見功能文檔。因為有中間表存在,第一次執行時需要補資料,可能需要較長時間,運行補資料可以執行下面的命令 # output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)查看錶運行結果。
win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20) win_user_ret
物品側特徵的特徵生產
在完成使用者側特徵的提取和加工之後,您可以繼續對物品表進行特徵生產加工,產生基於物品類型的特徵。具體操作步驟如下:
指定AggregationTransform後輸出的物品側特徵表名稱。
agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'利用自動擴充函數,對統計特徵進行自動擴充。
name_prefix = "item" input_list = ["item_cnt", "follow_cnt", "follower_cnt"] event_name = 'event' event_type = 'expr' group_by_key = "item_id" window_size = [3,7,15] item_count_feature_list = auto_count_feature_transform(name_prefix, input_list, event_name, event_type, group_by_key, window_size) print("len_count_feature_list = ", len(item_count_feature_list)) print("count_feature_list = ", item_count_feature_list)執行生產過程,並查看錶運行結果。
建立pipeline。
agg_item_bhv_pipeline =fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list) agg_item_pipeline =fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(agg_item_bhv_pipeline, keep_input_columns=False)執行pipeline生產過程。
execute_date = '20240605' output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)查看錶運行結果。
agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20) agg_item_ret
指定WindowTransform後輸出的使用者側特徵表名稱。
win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'利用內建的自動擴充函數,自動產生WindowTransform特徵定義。
name_prefix = 'item' input_list = ['day_h', 'category'] agg_field = ['click_count', 'praise_count'] event_name = 'event' event_type = 'expr' group_by_key = 'item_id' window_size = [7, 15, 30, 45] item_win_feature_list = auto_window_feature_transform(name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size) print("len_item_win_feature_list = ", len(item_win_feature_list)) print("item_win_feature_list = ", item_win_feature_list)執行生產過程,並查看錶運行結果。
建立pipeline。
win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list) win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(win_item_bhv_pipeline, keep_input_columns=False)執行pipeline生產過程。
execute_date = '20240605' output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True) # 詳細實現見功能文檔。因為有中間表存在,第一次執行時需要補資料,可能需要較長時間,運行補資料可以執行下面的命令 # output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)查看錶運行結果。
win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20) win_item_ret
四、產生樣本表
經過上述流程,您已擷取了Label表,並在物品側和使用者側分別整理出三張特徵表(包括一張原始表及兩張經過特徵生產加工出的表)。隨後,您可以將以下SQL命令粘貼到已建立的ODPS SQL節點中執行,將這七張表進行合并,以構建完整的樣本表。具體操作,請參見準備未經處理資料。
CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
request_id BIGINT,
user_id STRING,
page STRING,
net_type STRING,
day_h BIGINT,
week_day BIGINT,
day_min STRING,
event_unix_time BIGINT,
item_id STRING,
playtime DOUBLE,
is_click BIGINT,
ln_playtime DOUBLE,
is_praise BIGINT,
duration DOUBLE,
category STRING,
author BIGINT,
click_count BIGINT,
praise_count BIGINT,
is_new_item BIGINT,
item__sum_item_cnt_3d BIGINT,
item__sum_follow_cnt_3d BIGINT,
item__sum_follower_cnt_3d BIGINT,
item__max_item_cnt_3d BIGINT,
item__max_follow_cnt_3d BIGINT,
item__max_follower_cnt_3d BIGINT,
item__min_item_cnt_3d BIGINT,
item__min_follow_cnt_3d BIGINT,
item__min_follower_cnt_3d BIGINT,
item__avg_item_cnt_3d DOUBLE,
item__avg_follow_cnt_3d DOUBLE,
item__avg_follower_cnt_3d DOUBLE,
item__sum_item_cnt_7d BIGINT,
item__sum_follow_cnt_7d BIGINT,
item__sum_follower_cnt_7d BIGINT,
item__max_item_cnt_7d BIGINT,
item__max_follow_cnt_7d BIGINT,
item__max_follower_cnt_7d BIGINT,
item__min_item_cnt_7d BIGINT,
item__min_follow_cnt_7d BIGINT,
item__min_follower_cnt_7d BIGINT,
item__avg_item_cnt_7d DOUBLE,
item__avg_follow_cnt_7d DOUBLE,
item__avg_follower_cnt_7d DOUBLE,
item__sum_item_cnt_15d BIGINT,
item__sum_follow_cnt_15d BIGINT,
item__sum_follower_cnt_15d BIGINT,
item__max_item_cnt_15d BIGINT,
item__max_follow_cnt_15d BIGINT,
item__max_follower_cnt_15d BIGINT,
item__min_item_cnt_15d BIGINT,
item__min_follow_cnt_15d BIGINT,
item__min_follower_cnt_15d BIGINT,
item__avg_item_cnt_15d DOUBLE,
item__avg_follow_cnt_15d DOUBLE,
item__avg_follower_cnt_15d DOUBLE,
item__kv_day_h_click_count_sum_7d STRING,
item__kv_category_click_count_sum_7d STRING,
item__kv_day_h_praise_count_sum_7d STRING,
item__kv_category_praise_count_sum_7d STRING,
item__kv_day_h_click_count_max_7d STRING,
item__kv_category_click_count_max_7d STRING,
item__kv_day_h_praise_count_max_7d STRING,
item__kv_category_praise_count_max_7d STRING,
item__kv_day_h_click_count_min_7d STRING,
item__kv_category_click_count_min_7d STRING,
item__kv_day_h_praise_count_min_7d STRING,
item__kv_category_praise_count_min_7d STRING,
item__kv_day_h_click_count_avg_7d STRING,
item__kv_category_click_count_avg_7d STRING,
item__kv_day_h_praise_count_avg_7d STRING,
item__kv_category_praise_count_avg_7d STRING,
item__kv_day_h_click_count_sum_15d STRING,
item__kv_category_click_count_sum_15d STRING,
item__kv_day_h_praise_count_sum_15d STRING,
item__kv_category_praise_count_sum_15d STRING,
item__kv_day_h_click_count_max_15d STRING,
item__kv_category_click_count_max_15d STRING,
item__kv_day_h_praise_count_max_15d STRING,
item__kv_category_praise_count_max_15d STRING,
item__kv_day_h_click_count_min_15d STRING,
item__kv_category_click_count_min_15d STRING,
item__kv_day_h_praise_count_min_15d STRING,
item__kv_category_praise_count_min_15d STRING,
item__kv_day_h_click_count_avg_15d STRING,
item__kv_category_click_count_avg_15d STRING,
item__kv_day_h_praise_count_avg_15d STRING,
item__kv_category_praise_count_avg_15d STRING,
item__kv_day_h_click_count_sum_30d STRING,
item__kv_category_click_count_sum_30d STRING,
item__kv_day_h_praise_count_sum_30d STRING,
item__kv_category_praise_count_sum_30d STRING,
item__kv_day_h_click_count_max_30d STRING,
item__kv_category_click_count_max_30d STRING,
item__kv_day_h_praise_count_max_30d STRING,
item__kv_category_praise_count_max_30d STRING,
item__kv_day_h_click_count_min_30d STRING,
item__kv_category_click_count_min_30d STRING,
item__kv_day_h_praise_count_min_30d STRING,
item__kv_category_praise_count_min_30d STRING,
item__kv_day_h_click_count_avg_30d STRING,
item__kv_category_click_count_avg_30d STRING,
item__kv_day_h_praise_count_avg_30d STRING,
item__kv_category_praise_count_avg_30d STRING,
item__kv_day_h_click_count_sum_45d STRING,
item__kv_category_click_count_sum_45d STRING,
item__kv_day_h_praise_count_sum_45d STRING,
item__kv_category_praise_count_sum_45d STRING,
item__kv_day_h_click_count_max_45d STRING,
item__kv_category_click_count_max_45d STRING,
item__kv_day_h_praise_count_max_45d STRING,
item__kv_category_praise_count_max_45d STRING,
item__kv_day_h_click_count_min_45d STRING,
item__kv_category_click_count_min_45d STRING,
item__kv_day_h_praise_count_min_45d STRING,
item__kv_category_praise_count_min_45d STRING,
item__kv_day_h_click_count_avg_45d STRING,
item__kv_category_click_count_avg_45d STRING,
item__kv_day_h_praise_count_avg_45d STRING,
item__kv_category_praise_count_avg_45d STRING,
gender STRING,
age BIGINT,
city STRING,
item_cnt BIGINT,
follow_cnt BIGINT,
follower_cnt BIGINT,
is_new_user BIGINT,
tags STRING,
user__sum_playtime_3d DOUBLE,
user__sum_duration_3d DOUBLE,
user__sum_click_count_3d BIGINT,
user__sum_praise_count_3d BIGINT,
user__max_playtime_3d DOUBLE,
user__max_duration_3d DOUBLE,
user__max_click_count_3d BIGINT,
user__max_praise_count_3d BIGINT,
user__min_playtime_3d DOUBLE,
user__min_duration_3d DOUBLE,
user__min_click_count_3d BIGINT,
user__min_praise_count_3d BIGINT,
user__avg_playtime_3d DOUBLE,
user__avg_duration_3d DOUBLE,
user__avg_click_count_3d DOUBLE,
user__avg_praise_count_3d DOUBLE,
user__sum_playtime_7d DOUBLE,
user__sum_duration_7d DOUBLE,
user__sum_click_count_7d BIGINT,
user__sum_praise_count_7d BIGINT,
user__max_playtime_7d DOUBLE,
user__max_duration_7d DOUBLE,
user__max_click_count_7d BIGINT,
user__max_praise_count_7d BIGINT,
user__min_playtime_7d DOUBLE,
user__min_duration_7d DOUBLE,
user__min_click_count_7d BIGINT,
user__min_praise_count_7d BIGINT,
user__avg_playtime_7d DOUBLE,
user__avg_duration_7d DOUBLE,
user__avg_click_count_7d DOUBLE,
user__avg_praise_count_7d DOUBLE,
user__sum_playtime_15d DOUBLE,
user__sum_duration_15d DOUBLE,
user__sum_click_count_15d BIGINT,
user__sum_praise_count_15d BIGINT,
user__max_playtime_15d DOUBLE,
user__max_duration_15d DOUBLE,
user__max_click_count_15d BIGINT,
user__max_praise_count_15d BIGINT,
user__min_playtime_15d DOUBLE,
user__min_duration_15d DOUBLE,
user__min_click_count_15d BIGINT,
user__min_praise_count_15d BIGINT,
user__avg_playtime_15d DOUBLE,
user__avg_duration_15d DOUBLE,
user__avg_click_count_15d DOUBLE,
user__avg_praise_count_15d DOUBLE,
user__kv_day_h_duration_sum_7d STRING,
user__kv_category_duration_sum_7d STRING,
user__kv_day_h_click_count_sum_7d STRING,
user__kv_category_click_count_sum_7d STRING,
user__kv_day_h_duration_max_7d STRING,
user__kv_category_duration_max_7d STRING,
user__kv_day_h_click_count_max_7d STRING,
user__kv_category_click_count_max_7d STRING,
user__kv_day_h_duration_min_7d STRING,
user__kv_category_duration_min_7d STRING,
user__kv_day_h_click_count_min_7d STRING,
user__kv_category_click_count_min_7d STRING,
user__kv_day_h_duration_avg_7d STRING,
user__kv_category_duration_avg_7d STRING,
user__kv_day_h_click_count_avg_7d STRING,
user__kv_category_click_count_avg_7d STRING,
user__kv_day_h_duration_sum_15d STRING,
user__kv_category_duration_sum_15d STRING,
user__kv_day_h_click_count_sum_15d STRING,
user__kv_category_click_count_sum_15d STRING,
user__kv_day_h_duration_max_15d STRING,
user__kv_category_duration_max_15d STRING,
user__kv_day_h_click_count_max_15d STRING,
user__kv_category_click_count_max_15d STRING,
user__kv_day_h_duration_min_15d STRING,
user__kv_category_duration_min_15d STRING,
user__kv_day_h_click_count_min_15d STRING,
user__kv_category_click_count_min_15d STRING,
user__kv_day_h_duration_avg_15d STRING,
user__kv_category_duration_avg_15d STRING,
user__kv_day_h_click_count_avg_15d STRING,
user__kv_category_click_count_avg_15d STRING,
user__kv_day_h_duration_sum_30d STRING,
user__kv_category_duration_sum_30d STRING,
user__kv_day_h_click_count_sum_30d STRING,
user__kv_category_click_count_sum_30d STRING,
user__kv_day_h_duration_max_30d STRING,
user__kv_category_duration_max_30d STRING,
user__kv_day_h_click_count_max_30d STRING,
user__kv_category_click_count_max_30d STRING,
user__kv_day_h_duration_min_30d STRING,
user__kv_category_duration_min_30d STRING,
user__kv_day_h_click_count_min_30d STRING,
user__kv_category_click_count_min_30d STRING,
user__kv_day_h_duration_avg_30d STRING,
user__kv_category_duration_avg_30d STRING,
user__kv_day_h_click_count_avg_30d STRING,
user__kv_category_click_count_avg_30d STRING,
user__kv_day_h_duration_sum_45d STRING,
user__kv_category_duration_sum_45d STRING,
user__kv_day_h_click_count_sum_45d STRING,
user__kv_category_click_count_sum_45d STRING,
user__kv_day_h_duration_max_45d STRING,
user__kv_category_duration_max_45d STRING,
user__kv_day_h_click_count_max_45d STRING,
user__kv_category_click_count_max_45d STRING,
user__kv_day_h_duration_min_45d STRING,
user__kv_category_duration_min_45d STRING,
user__kv_day_h_click_count_min_45d STRING,
user__kv_category_click_count_min_45d STRING,
user__kv_day_h_duration_avg_45d STRING,
user__kv_category_duration_avg_45d STRING,
user__kv_day_h_click_count_avg_45d STRING,
user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;
insert overwrite table fs_demo_fs_engineering_v1_training_set partition (ds = '${bdp.system.bizdate}')
select
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
from
(
select *
from rec_sln_demo_fs_label_table_v1
where ds = '${bdp.system.bizdate}'
) sq0
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq1 on sq0.user_id = sq1.user_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq2 on sq0.item_id = sq2.item_id
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_win_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq4 on sq0.user_id = sq4.user_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq5 on sq0.item_id = sq5.item_id
left join (
select
*
from <project_name>.rec_sln_demo_item_table_preprocess_win_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq6 on sq0.item_id = sq6.item_id
left join (
select
*
from <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
where ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), - 1,'dd'),'yyyymmdd')
) sq3 on sq0.user_id = sq3.user_id
;其中<project_name>需要替換為您的專案名稱。
相關文檔
特徵生產的詳細過程,請參見特徵平台與特徵生產。
阿里雲特徵平台(FeatureStore)基本適用於所有需要特徵的情境,比如推薦情境、金融風控情境、使用者增長情境。同時,FeatureStore與阿里雲常用資料來源引擎、建議服務引擎完成對接,可為您提供端到端高效便捷的一站式從特徵註冊管理到模型開發應用的全流程操作平台。更多關於FeatureStore的資訊,請參見FeatureStore概述。
如果您在配置或使用過程中有任何問題,可以搜尋DingTalk群號:34415007523,進入答疑群聯絡技術人員進行諮詢。