FeatureStore, sebuah platform fitur dalam Platform for AI (PAI), menstandarkan operasi rekayasa fitur umum sehingga Anda dapat menghasilkan ratusan fitur dari tabel sumber mentah hanya dengan beberapa baris kode. Panduan ini memandu Anda melalui skenario rekomendasi: mulai dari tiga tabel sumber, Anda akan menghasilkan tabel fitur pengguna dan item, lalu menggabungkannya menjadi satu set data pelatihan lengkap yang siap untuk Pelatihan model.
Apa yang akan Anda buat
Pada akhir panduan ini, Anda akan memiliki:
Tiga tabel sumber yang telah diproses sebelumnya (pengguna, item, dan perilaku) yang disinkronkan ke proyek MaxCompute Anda
Satu tabel perilaku lebar dan satu tabel label yang diturunkan dari sumber-sumber tersebut
Dua tabel fitur pengguna yang dihasilkan menggunakan
AggregationTransformdanWindowTransformDua tabel fitur item yang dihasilkan dengan cara yang sama
Satu set data pelatihan akhir (
fs_demo_fs_engineering_v1_training_set) yang menggabungkan ketujuh tabel tersebut
Prasyarat
Sebelum memulai, pastikan hal-hal berikut telah tersedia:
PAI
Siapkan ID AccessKey dan Rahasia AccessKey Anda. Simpan sebagai Variabel lingkungan, jangan hardcoding di dalam kode Anda. Untuk petunjuk penyiapan, lihat Buat Pasangan Kunci Akses dan Konfigurasikan AccessKey otentikasi dalam variabel lingkungan
MaxCompute
DataWorks
Aktifkan DataWorks
Beli grup sumber daya eksklusif untuk penjadwalan dan sambungkan ke ruang kerja DataWorks Anda. Lihat Gunakan grup sumber daya eksklusif untuk penjadwalan
Tambahkan sumber data MaxCompute ke ruang kerja DataWorks Anda. Lihat Ikatan sumber daya komputasi MaxCompute
Cara kerja
Panduan ini mencakup empat tahap:
Persiapan — Sinkronkan tiga tabel sumber demo dari ruang kerja
pai_online_projectke proyek MaxCompute Anda, dan instal FeatureStore Python SDK.Transformasi — Gabungkan dan proses sebelumnya tabel-tabel sumber untuk menghasilkan tabel perilaku lebar dan tabel label.
Hasilkan fitur — Jalankan pipeline FeatureStore untuk menghasilkan tabel fitur agregasi dan jendela (window) baik untuk pengguna maupun item.
Buat set data pelatihan — Gabungkan semua tabel fitur menjadi satu tabel sampel.
Langkah 1: Persiapkan tabel sumber dan SDK
Sinkronkan tabel sumber
Ketiga tabel sumber untuk tutorial ini disimpan di ruang kerja bersama pai_online_project. Tabel-tabel tersebut hanya berisi data simulasi. Gunakan DataWorks untuk menyalinnya ke proyek MaxCompute Anda sendiri.
| Tabel | Isi |
|---|---|
rec_sln_demo_user_table_preprocess_v1 | Fitur dasar pengguna: jenis kelamin, usia, kota, jumlah pengikut |
rec_sln_demo_item_table_preprocess_v1 | Fitur dasar item: kategori, penulis, jumlah klik kumulatif, jumlah pujian kumulatif |
rec_sln_demo_behavior_table_preprocess_v1 | Event perilaku: pengguna mana yang mengklik item mana dan kapan |
Untuk menyinkronkan tabel:
Masuk ke Konsol DataWorks.
Di panel navigasi kiri, klik Data Development and O&M > Data Development.
Pilih ruang kerja DataWorks Anda dan klik Go to Data Studio.
Arahkan kursor ke Create, lalu pilih Create Node > MaxCompute > ODPS SQL. Konfigurasikan parameter node:
Parameter Nilai Node Type ODPS SQL Path Business Flow/Workflow/MaxCompute Name Masukkan nama kustom Klik Confirm.
Di editor SQL, jalankan pernyataan berikut. Setiap blok membuat struktur tabel dan memuat data dari rentang tanggal
20240530hingga20240605. Tabel pengguna:CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_user_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';Tabel item:
CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_item_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';Tabel perilaku:
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1 LIKE pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 STORED AS ALIORC LIFECYCLE 90; INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds) SELECT * FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1 WHERE ds >= '20240530' AND ds <= '20240605';
Instal FeatureStore SDK
Jalankan perintah berikut di Jupyter Notebook (Python 3):
%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whlKemudian impor modul-modul yang akan Anda perlukan sepanjang panduan ini:
import os
from feature_store_py import FeatureStoreClient
from feature_store_py.fs_datasource import MaxComputeDataSource
from feature_store_py.feature_engineering import (
TableTransform, Condition, DayOf, ComboTransform, Feature,
AggregationTransform, auto_count_feature_transform,
WindowTransform, auto_window_feature_transform
)Langkah 2: Transformasi tabel sumber
Sebelum pembuatan fitur dimulai, ketiga tabel sumber perlu diubah bentuknya menjadi dua tabel kerja: tabel perilaku lebar yang menggabungkan ketiga sumber, dan tabel label yang mengekstraksi label pelatihan dari data perilaku.
Gunakan node ODPS SQL yang Anda buat di Langkah 1 untuk menjalankan kedua blok SQL di bawah ini.
Buat tabel perilaku lebar
Pernyataan ini menggabungkan atribut pengguna, atribut item, dan event perilaku ke dalam satu tabel denormalisasi tunggal yang digunakan oleh FeatureStore SDK untuk komputasi fitur.
CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1
(
request_id bigint
,user_id string
,page string
,net_type string
,day_h bigint COMMENT 'Jam terjadinya perilaku.'
,week_day bigint COMMENT 'Hari dalam seminggu terjadinya perilaku.'
,event_unix_time bigint
,item_id string
,event string
,playtime double
,gender string
,age bigint
,city string
,item_cnt bigint
,follow_cnt bigint
,follower_cnt bigint
,is_new_user bigint
,tags string
,duration double
,category string
,author bigint
,click_count bigint
,praise_count bigint
,is_new_item bigint
)
PARTITIONED BY
(
ds string
)
LIFECYCLE 90;
INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT sq0.request_id
,sq0.user_id
,sq0.page
,sq0.net_type
,sq0.day_h
,sq0.week_day
,sq0.event_unix_time
,sq0.item_id
,sq0.event
,sq0.playtime
,sq1.gender
,sq1.age
,sq1.city
,sq1.item_cnt
,sq1.follow_cnt
,sq1.follower_cnt
,sq1.is_new_user
,sq1.tags
,sq2.duration
,sq2.category
,sq2.author
,sq2.click_count
,sq2.praise_count
,sq2.is_new_item
FROM (
SELECT *
FROM rec_sln_demo_behavior_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
SELECT *
FROM rec_sln_demo_user_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq1
ON sq0.user_id = sq1.user_id
LEFT JOIN (
SELECT *
FROM rec_sln_demo_item_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
) sq2
ON sq0.item_id = sq2.item_id;Buat tabel label
Pernyataan ini menurunkan label pelatihan dari tabel perilaku. Label-label tersebut adalah is_click (apakah pengguna mengklik), ln_playtime (playtime total yang telah ditransformasi logaritmik), dan is_praise (apakah pengguna memberi pujian pada item).
CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1
(
request_id bigint
,user_id string
,page string
,net_type string
,day_h bigint COMMENT 'Jam terjadinya perilaku.'
,week_day bigint COMMENT 'Hari dalam seminggu terjadinya perilaku.'
,day_min string
,event_unix_time bigint
,item_id string
,playtime double
,is_click BIGINT
,ln_playtime DOUBLE
,is_praise BIGINT
)
PARTITIONED BY
(
ds string
)
LIFECYCLE 90;
INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT request_id
,user_id
,MAX(page) page
,MAX(net_type) net_type
,MAX(day_h) day_h
,MAX(week_day) week_day
,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min
,MAX(event_unix_time) event_unix_time
,item_id
,MAX(playtime) playtime
,MAX(IF(event='click', 1, 0)) is_click
,LN(SUM(playtime) + 1) ln_playtime
,MAX(IF(event='praise', 1, 0)) is_praise
FROM rec_sln_demo_behavior_table_preprocess_v1
WHERE ds = '${bdp.system.bizdate}'
GROUP BY request_id, user_id, item_id;Setelah kedua pernyataan selesai dijalankan, Anda akan memiliki:
rec_sln_demo_behavior_table_preprocess_wide_v1— input untuk pembuatan fiturrec_sln_demo_fs_label_table_v1— input untuk tabel sampel akhir
Langkah 3: Hasilkan fitur
Fungsi ekstensi otomatis FeatureStore memungkinkan Anda menghasilkan ratusan fitur statistik dan jendela (window) hanya dari satu pemanggilan fungsi, tanpa perlu menulis SQL agregasi secara manual. Langkah-langkah di bawah ini menghasilkan fitur secara terpisah untuk pengguna dan item.
Semua contoh menggunakan API pipeline FeatureStore SDK: definisikan sumber data dan transformasi, lalu panggil execute() untuk menjalankan komputasi di MaxCompute.
Inisialisasi klien
# Muat kredensial dari variabel lingkungan — jangan pernah hardcode kunci di kode sumber
access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
project = 'project_name' # Nama proyek MaxCompute Anda
region = 'cn-hangzhou' # Wilayah tempat proyek Anda berada
fs_client = FeatureStoreClient(
access_key_id=access_key_id,
access_key_secret=access_key_secret,
region=region
)Ikat tabel sumber ke sumber data
input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1"
ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project)
input_user_table_name = "rec_sln_demo_user_table_preprocess_v1"
ds_user = MaxComputeDataSource(table=input_user_table_name, project=project)
input_item_table_name = "rec_sln_demo_item_table_preprocess_v1"
ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)Hasilkan fitur pengguna
AggregationTransform
auto_count_feature_transform menghasilkan fitur agregasi (jumlah, maksimum, minimum, rata-rata) untuk setiap metrik di berbagai jendela waktu — dalam kasus ini, jendela 3, 7, dan 15 hari yang dikelompokkan berdasarkan user_id.
Definisikan fitur:
agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'
name_prefix = "user"
input_list = ["playtime", "duration", "click_count", "praise_count"]
event_name = 'event'
event_type = 'expr'
group_by_key = "user_id"
window_size = [3, 7, 15]
user_count_feature_list = auto_count_feature_transform(
name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(user_count_feature_list))
print("count_feature_list =", user_count_feature_list)Buat pipeline:
agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list)
agg_user_pipeline = fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(
agg_user_bhv_pipeline, keep_input_columns=False
)Jalankan dan verifikasi:
execute_date = '20240605'
output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)
# Pratinjau 20 baris pertama
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20)
agg_user_retWindowTransform
auto_window_feature_transform menghasilkan fitur jendela lintas dimensi — misalnya, bagaimana jumlah klik pengguna bervariasi berdasarkan jam dalam sehari atau kategori konten di berbagai jendela waktu.
Definisikan fitur:
win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'
name_prefix = 'user'
input_list = ['day_h', 'category']
agg_field = ['duration', 'click_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'user_id'
window_size = [7, 15, 30, 45]
user_win_feature_list = auto_window_feature_transform(
name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_user_win_feature_list =", len(user_win_feature_list))
print("user_win_feature_list =", user_win_feature_list)Buat pipeline:
win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list)
win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(
win_user_bhv_pipeline, keep_input_columns=False
)Jalankan dan verifikasi:
execute_date = '20240605'
output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True)
# Catatan: WindowTransform menggunakan tabel antara. Pada eksekusi pertama, isi ulang semua
# partisi historis dengan menambahkan backfill_partitions=True — ini mungkin memakan waktu:
# output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)
win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20)
win_user_retHasilkan fitur item
Pembuatan fitur item mengikuti pola yang sama seperti fitur pengguna, dengan item_id sebagai kunci pengelompokan.
AggregationTransform
Definisikan fitur:
agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'
name_prefix = "item"
input_list = ["item_cnt", "follow_cnt", "follower_cnt"]
event_name = 'event'
event_type = 'expr'
group_by_key = "item_id"
window_size = [3, 7, 15]
item_count_feature_list = auto_count_feature_transform(
name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(item_count_feature_list))
print("count_feature_list =", item_count_feature_list)Buat pipeline:
agg_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list)
agg_item_pipeline = fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(
agg_item_bhv_pipeline, keep_input_columns=False
)Jalankan dan verifikasi:
execute_date = '20240605'
output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)
agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20)
agg_item_retWindowTransform
Definisikan fitur:
win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'
name_prefix = 'item'
input_list = ['day_h', 'category']
agg_field = ['click_count', 'praise_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'item_id'
window_size = [7, 15, 30, 45]
item_win_feature_list = auto_window_feature_transform(
name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_item_win_feature_list =", len(item_win_feature_list))
print("item_win_feature_list =", item_win_feature_list)Buat pipeline:
win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list)
win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(
win_item_bhv_pipeline, keep_input_columns=False
)Jalankan dan verifikasi:
execute_date = '20240605'
output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True)
# Pada eksekusi pertama, tambahkan backfill_partitions=True jika partisi antara tidak tersedia:
# output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)
win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20)
win_item_retLangkah 4: Bangun set data pelatihan
Pada tahap ini Anda memiliki tujuh tabel: satu tabel label, tiga tabel sisi pengguna (sumber, agregasi, jendela), dan tiga tabel sisi item (sumber, agregasi, jendela). SQL di bawah ini menggabungkan ketujuh tabel tersebut menjadi satu set data pelatihan menggunakan LEFT JOIN dengan kunci user_id dan item_id.
Ganti <project_name> dengan nama proyek MaxCompute Anda yang sebenarnya sebelum menjalankan.
CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
request_id BIGINT,
user_id STRING,
page STRING,
net_type STRING,
day_h BIGINT,
week_day BIGINT,
day_min STRING,
event_unix_time BIGINT,
item_id STRING,
playtime DOUBLE,
is_click BIGINT,
ln_playtime DOUBLE,
is_praise BIGINT,
duration DOUBLE,
category STRING,
author BIGINT,
click_count BIGINT,
praise_count BIGINT,
is_new_item BIGINT,
item__sum_item_cnt_3d BIGINT,
item__sum_follow_cnt_3d BIGINT,
item__sum_follower_cnt_3d BIGINT,
item__max_item_cnt_3d BIGINT,
item__max_follow_cnt_3d BIGINT,
item__max_follower_cnt_3d BIGINT,
item__min_item_cnt_3d BIGINT,
item__min_follow_cnt_3d BIGINT,
item__min_follower_cnt_3d BIGINT,
item__avg_item_cnt_3d DOUBLE,
item__avg_follow_cnt_3d DOUBLE,
item__avg_follower_cnt_3d DOUBLE,
item__sum_item_cnt_7d BIGINT,
item__sum_follow_cnt_7d BIGINT,
item__sum_follower_cnt_7d BIGINT,
item__max_item_cnt_7d BIGINT,
item__max_follow_cnt_7d BIGINT,
item__max_follower_cnt_7d BIGINT,
item__min_item_cnt_7d BIGINT,
item__min_follow_cnt_7d BIGINT,
item__min_follower_cnt_7d BIGINT,
item__avg_item_cnt_7d DOUBLE,
item__avg_follow_cnt_7d DOUBLE,
item__avg_follower_cnt_7d DOUBLE,
item__sum_item_cnt_15d BIGINT,
item__sum_follow_cnt_15d BIGINT,
item__sum_follower_cnt_15d BIGINT,
item__max_item_cnt_15d BIGINT,
item__max_follow_cnt_15d BIGINT,
item__max_follower_cnt_15d BIGINT,
item__min_item_cnt_15d BIGINT,
item__min_follow_cnt_15d BIGINT,
item__min_follower_cnt_15d BIGINT,
item__avg_item_cnt_15d DOUBLE,
item__avg_follow_cnt_15d DOUBLE,
item__avg_follower_cnt_15d DOUBLE,
item__kv_day_h_click_count_sum_7d STRING,
item__kv_category_click_count_sum_7d STRING,
item__kv_day_h_praise_count_sum_7d STRING,
item__kv_category_praise_count_sum_7d STRING,
item__kv_day_h_click_count_max_7d STRING,
item__kv_category_click_count_max_7d STRING,
item__kv_day_h_praise_count_max_7d STRING,
item__kv_category_praise_count_max_7d STRING,
item__kv_day_h_click_count_min_7d STRING,
item__kv_category_click_count_min_7d STRING,
item__kv_day_h_praise_count_min_7d STRING,
item__kv_category_praise_count_min_7d STRING,
item__kv_day_h_click_count_avg_7d STRING,
item__kv_category_click_count_avg_7d STRING,
item__kv_day_h_praise_count_avg_7d STRING,
item__kv_category_praise_count_avg_7d STRING,
item__kv_day_h_click_count_sum_15d STRING,
item__kv_category_click_count_sum_15d STRING,
item__kv_day_h_praise_count_sum_15d STRING,
item__kv_category_praise_count_sum_15d STRING,
item__kv_day_h_click_count_max_15d STRING,
item__kv_category_click_count_max_15d STRING,
item__kv_day_h_praise_count_max_15d STRING,
item__kv_category_praise_count_max_15d STRING,
item__kv_day_h_click_count_min_15d STRING,
item__kv_category_click_count_min_15d STRING,
item__kv_day_h_praise_count_min_15d STRING,
item__kv_category_praise_count_min_15d STRING,
item__kv_day_h_click_count_avg_15d STRING,
item__kv_category_click_count_avg_15d STRING,
item__kv_day_h_praise_count_avg_15d STRING,
item__kv_category_praise_count_avg_15d STRING,
item__kv_day_h_click_count_sum_30d STRING,
item__kv_category_click_count_sum_30d STRING,
item__kv_day_h_praise_count_sum_30d STRING,
item__kv_category_praise_count_sum_30d STRING,
item__kv_day_h_click_count_max_30d STRING,
item__kv_category_click_count_max_30d STRING,
item__kv_day_h_praise_count_max_30d STRING,
item__kv_category_praise_count_max_30d STRING,
item__kv_day_h_click_count_min_30d STRING,
item__kv_category_click_count_min_30d STRING,
item__kv_day_h_praise_count_min_30d STRING,
item__kv_category_praise_count_min_30d STRING,
item__kv_day_h_click_count_avg_30d STRING,
item__kv_category_click_count_avg_30d STRING,
item__kv_day_h_praise_count_avg_30d STRING,
item__kv_category_praise_count_avg_30d STRING,
item__kv_day_h_click_count_sum_45d STRING,
item__kv_category_click_count_sum_45d STRING,
item__kv_day_h_praise_count_sum_45d STRING,
item__kv_category_praise_count_sum_45d STRING,
item__kv_day_h_click_count_max_45d STRING,
item__kv_category_click_count_max_45d STRING,
item__kv_day_h_praise_count_max_45d STRING,
item__kv_category_praise_count_max_45d STRING,
item__kv_day_h_click_count_min_45d STRING,
item__kv_category_click_count_min_45d STRING,
item__kv_day_h_praise_count_min_45d STRING,
item__kv_category_praise_count_min_45d STRING,
item__kv_day_h_click_count_avg_45d STRING,
item__kv_category_click_count_avg_45d STRING,
item__kv_day_h_praise_count_avg_45d STRING,
item__kv_category_praise_count_avg_45d STRING,
gender STRING,
age BIGINT,
city STRING,
item_cnt BIGINT,
follow_cnt BIGINT,
follower_cnt BIGINT,
is_new_user BIGINT,
tags STRING,
user__sum_playtime_3d DOUBLE,
user__sum_duration_3d DOUBLE,
user__sum_click_count_3d BIGINT,
user__sum_praise_count_3d BIGINT,
user__max_playtime_3d DOUBLE,
user__max_duration_3d DOUBLE,
user__max_click_count_3d BIGINT,
user__max_praise_count_3d BIGINT,
user__min_playtime_3d DOUBLE,
user__min_duration_3d DOUBLE,
user__min_click_count_3d BIGINT,
user__min_praise_count_3d BIGINT,
user__avg_playtime_3d DOUBLE,
user__avg_duration_3d DOUBLE,
user__avg_click_count_3d DOUBLE,
user__avg_praise_count_3d DOUBLE,
user__sum_playtime_7d DOUBLE,
user__sum_duration_7d DOUBLE,
user__sum_click_count_7d BIGINT,
user__sum_praise_count_7d BIGINT,
user__max_playtime_7d DOUBLE,
user__max_duration_7d DOUBLE,
user__max_click_count_7d BIGINT,
user__max_praise_count_7d BIGINT,
user__min_playtime_7d DOUBLE,
user__min_duration_7d DOUBLE,
user__min_click_count_7d BIGINT,
user__min_praise_count_7d BIGINT,
user__avg_playtime_7d DOUBLE,
user__avg_duration_7d DOUBLE,
user__avg_click_count_7d DOUBLE,
user__avg_praise_count_7d DOUBLE,
user__sum_playtime_15d DOUBLE,
user__sum_duration_15d DOUBLE,
user__sum_click_count_15d BIGINT,
user__sum_praise_count_15d BIGINT,
user__max_playtime_15d DOUBLE,
user__max_duration_15d DOUBLE,
user__max_click_count_15d BIGINT,
user__max_praise_count_15d BIGINT,
user__min_playtime_15d DOUBLE,
user__min_duration_15d DOUBLE,
user__min_click_count_15d BIGINT,
user__min_praise_count_15d BIGINT,
user__avg_playtime_15d DOUBLE,
user__avg_duration_15d DOUBLE,
user__avg_click_count_15d DOUBLE,
user__avg_praise_count_15d DOUBLE,
user__kv_day_h_duration_sum_7d STRING,
user__kv_category_duration_sum_7d STRING,
user__kv_day_h_click_count_sum_7d STRING,
user__kv_category_click_count_sum_7d STRING,
user__kv_day_h_duration_max_7d STRING,
user__kv_category_duration_max_7d STRING,
user__kv_day_h_click_count_max_7d STRING,
user__kv_category_click_count_max_7d STRING,
user__kv_day_h_duration_min_7d STRING,
user__kv_category_duration_min_7d STRING,
user__kv_day_h_click_count_min_7d STRING,
user__kv_category_click_count_min_7d STRING,
user__kv_day_h_duration_avg_7d STRING,
user__kv_category_duration_avg_7d STRING,
user__kv_day_h_click_count_avg_7d STRING,
user__kv_category_click_count_avg_7d STRING,
user__kv_day_h_duration_sum_15d STRING,
user__kv_category_duration_sum_15d STRING,
user__kv_day_h_click_count_sum_15d STRING,
user__kv_category_click_count_sum_15d STRING,
user__kv_day_h_duration_max_15d STRING,
user__kv_category_duration_max_15d STRING,
user__kv_day_h_click_count_max_15d STRING,
user__kv_category_click_count_max_15d STRING,
user__kv_day_h_duration_min_15d STRING,
user__kv_category_duration_min_15d STRING,
user__kv_day_h_click_count_min_15d STRING,
user__kv_category_click_count_min_15d STRING,
user__kv_day_h_duration_avg_15d STRING,
user__kv_category_duration_avg_15d STRING,
user__kv_day_h_click_count_avg_15d STRING,
user__kv_category_click_count_avg_15d STRING,
user__kv_day_h_duration_sum_30d STRING,
user__kv_category_duration_sum_30d STRING,
user__kv_day_h_click_count_sum_30d STRING,
user__kv_category_click_count_sum_30d STRING,
user__kv_day_h_duration_max_30d STRING,
user__kv_category_duration_max_30d STRING,
user__kv_day_h_click_count_max_30d STRING,
user__kv_category_click_count_max_30d STRING,
user__kv_day_h_duration_min_30d STRING,
user__kv_category_duration_min_30d STRING,
user__kv_day_h_click_count_min_30d STRING,
user__kv_category_click_count_min_30d STRING,
user__kv_day_h_duration_avg_30d STRING,
user__kv_category_duration_avg_30d STRING,
user__kv_day_h_click_count_avg_30d STRING,
user__kv_category_click_count_avg_30d STRING,
user__kv_day_h_duration_sum_45d STRING,
user__kv_category_duration_sum_45d STRING,
user__kv_day_h_click_count_sum_45d STRING,
user__kv_category_click_count_sum_45d STRING,
user__kv_day_h_duration_max_45d STRING,
user__kv_category_duration_max_45d STRING,
user__kv_day_h_click_count_max_45d STRING,
user__kv_category_click_count_max_45d STRING,
user__kv_day_h_duration_min_45d STRING,
user__kv_category_duration_min_45d STRING,
user__kv_day_h_click_count_min_45d STRING,
user__kv_category_click_count_min_45d STRING,
user__kv_day_h_duration_avg_45d STRING,
user__kv_category_duration_avg_45d STRING,
user__kv_day_h_click_count_avg_45d STRING,
user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;
INSERT OVERWRITE TABLE fs_demo_fs_engineering_v1_training_set PARTITION (ds = '${bdp.system.bizdate}')
SELECT
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
FROM
(
SELECT *
FROM rec_sln_demo_fs_label_table_v1
WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq1 ON sq0.user_id = sq1.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq2 ON sq0.item_id = sq2.item_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq3 ON sq0.user_id = sq3.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_user_table_preprocess_win_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq4 ON sq0.user_id = sq4.user_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq5 ON sq0.item_id = sq5.item_id
LEFT JOIN (
SELECT *
FROM <project_name>.rec_sln_demo_item_table_preprocess_win_v1
WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq6 ON sq0.item_id = sq6.item_id;Hasilnya adalah fs_demo_fs_engineering_v1_training_set, sebuah tabel partisi dengan siklus hidup 90 hari yang menggabungkan semua fitur pengguna, fitur item, dan label pelatihan dalam satu baris per permintaan.
Langkah selanjutnya
Untuk memahami platform fitur FeatureStore secara lengkap, termasuk penyajian online dan pendaftaran fitur, lihat Ikhtisar FeatureStore.
Untuk referensi API lengkap tentang transformasi rekayasa fitur, lihat Platform fitur dan rekayasa fitur.
Untuk dukungan teknis, bergabunglah dengan grup DingTalk 34415007523.