All Products
Search
Document Center

Platform For AI:Praktik terbaik produksi fitur

Last Updated:Apr 01, 2026

FeatureStore, sebuah platform fitur dalam Platform for AI (PAI), menstandarkan operasi rekayasa fitur umum sehingga Anda dapat menghasilkan ratusan fitur dari tabel sumber mentah hanya dengan beberapa baris kode. Panduan ini memandu Anda melalui skenario rekomendasi: mulai dari tiga tabel sumber, Anda akan menghasilkan tabel fitur pengguna dan item, lalu menggabungkannya menjadi satu set data pelatihan lengkap yang siap untuk Pelatihan model.

Apa yang akan Anda buat

Pada akhir panduan ini, Anda akan memiliki:

  • Tiga tabel sumber yang telah diproses sebelumnya (pengguna, item, dan perilaku) yang disinkronkan ke proyek MaxCompute Anda

  • Satu tabel perilaku lebar dan satu tabel label yang diturunkan dari sumber-sumber tersebut

  • Dua tabel fitur pengguna yang dihasilkan menggunakan AggregationTransform dan WindowTransform

  • Dua tabel fitur item yang dihasilkan dengan cara yang sama

  • Satu set data pelatihan akhir (fs_demo_fs_engineering_v1_training_set) yang menggabungkan ketujuh tabel tersebut

Prasyarat

Sebelum memulai, pastikan hal-hal berikut telah tersedia:

PAI

MaxCompute

DataWorks

Cara kerja

Panduan ini mencakup empat tahap:

  1. Persiapan — Sinkronkan tiga tabel sumber demo dari ruang kerja pai_online_project ke proyek MaxCompute Anda, dan instal FeatureStore Python SDK.

  2. Transformasi — Gabungkan dan proses sebelumnya tabel-tabel sumber untuk menghasilkan tabel perilaku lebar dan tabel label.

  3. Hasilkan fitur — Jalankan pipeline FeatureStore untuk menghasilkan tabel fitur agregasi dan jendela (window) baik untuk pengguna maupun item.

  4. Buat set data pelatihan — Gabungkan semua tabel fitur menjadi satu tabel sampel.

Langkah 1: Persiapkan tabel sumber dan SDK

Sinkronkan tabel sumber

Ketiga tabel sumber untuk tutorial ini disimpan di ruang kerja bersama pai_online_project. Tabel-tabel tersebut hanya berisi data simulasi. Gunakan DataWorks untuk menyalinnya ke proyek MaxCompute Anda sendiri.

TabelIsi
rec_sln_demo_user_table_preprocess_v1Fitur dasar pengguna: jenis kelamin, usia, kota, jumlah pengikut
rec_sln_demo_item_table_preprocess_v1Fitur dasar item: kategori, penulis, jumlah klik kumulatif, jumlah pujian kumulatif
rec_sln_demo_behavior_table_preprocess_v1Event perilaku: pengguna mana yang mengklik item mana dan kapan

Untuk menyinkronkan tabel:

  1. Masuk ke Konsol DataWorks.

  2. Di panel navigasi kiri, klik Data Development and O&M > Data Development.

  3. Pilih ruang kerja DataWorks Anda dan klik Go to Data Studio.

  4. Arahkan kursor ke Create, lalu pilih Create Node > MaxCompute > ODPS SQL. Konfigurasikan parameter node:

    ParameterNilai
    Node TypeODPS SQL
    PathBusiness Flow/Workflow/MaxCompute
    NameMasukkan nama kustom
  5. Klik Confirm.

  6. Di editor SQL, jalankan pernyataan berikut. Setiap blok membuat struktur tabel dan memuat data dari rentang tanggal 20240530 hingga 20240605. Tabel pengguna:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_user_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_user_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_user_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_user_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

    Tabel item:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_item_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_item_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_item_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_item_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

    Tabel perilaku:

    CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_v1
    LIKE pai_online_project.rec_sln_demo_behavior_table_preprocess_v1
    STORED AS ALIORC
    LIFECYCLE 90;
    
    INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_v1 PARTITION(ds)
    SELECT *
    FROM pai_online_project.rec_sln_demo_behavior_table_preprocess_v1
    WHERE ds >= '20240530' AND ds <= '20240605';

Instal FeatureStore SDK

Jalankan perintah berikut di Jupyter Notebook (Python 3):

%pip install https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whl

Kemudian impor modul-modul yang akan Anda perlukan sepanjang panduan ini:

import os
from feature_store_py import FeatureStoreClient
from feature_store_py.fs_datasource import MaxComputeDataSource
from feature_store_py.feature_engineering import (
    TableTransform, Condition, DayOf, ComboTransform, Feature,
    AggregationTransform, auto_count_feature_transform,
    WindowTransform, auto_window_feature_transform
)

Langkah 2: Transformasi tabel sumber

Sebelum pembuatan fitur dimulai, ketiga tabel sumber perlu diubah bentuknya menjadi dua tabel kerja: tabel perilaku lebar yang menggabungkan ketiga sumber, dan tabel label yang mengekstraksi label pelatihan dari data perilaku.

Gunakan node ODPS SQL yang Anda buat di Langkah 1 untuk menjalankan kedua blok SQL di bawah ini.

Buat tabel perilaku lebar

Pernyataan ini menggabungkan atribut pengguna, atribut item, dan event perilaku ke dalam satu tabel denormalisasi tunggal yang digunakan oleh FeatureStore SDK untuk komputasi fitur.

CREATE TABLE IF NOT EXISTS rec_sln_demo_behavior_table_preprocess_wide_v1
(
    request_id bigint
    ,user_id string
    ,page string
    ,net_type string
    ,day_h bigint COMMENT 'Jam terjadinya perilaku.'
    ,week_day bigint COMMENT 'Hari dalam seminggu terjadinya perilaku.'
    ,event_unix_time bigint
    ,item_id string
    ,event string
    ,playtime double
    ,gender string
    ,age bigint
    ,city string
    ,item_cnt bigint
    ,follow_cnt bigint
    ,follower_cnt bigint
    ,is_new_user bigint
    ,tags string
    ,duration double
    ,category string
    ,author bigint
    ,click_count bigint
    ,praise_count bigint
    ,is_new_item bigint
)
PARTITIONED BY
(
    ds string
)
LIFECYCLE 90;

INSERT OVERWRITE TABLE rec_sln_demo_behavior_table_preprocess_wide_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT  sq0.request_id
        ,sq0.user_id
        ,sq0.page
        ,sq0.net_type
        ,sq0.day_h
        ,sq0.week_day
        ,sq0.event_unix_time
        ,sq0.item_id
        ,sq0.event
        ,sq0.playtime
        ,sq1.gender
        ,sq1.age
        ,sq1.city
        ,sq1.item_cnt
        ,sq1.follow_cnt
        ,sq1.follower_cnt
        ,sq1.is_new_user
        ,sq1.tags
        ,sq2.duration
        ,sq2.category
        ,sq2.author
        ,sq2.click_count
        ,sq2.praise_count
        ,sq2.is_new_item
FROM    (
            SELECT  *
            FROM    rec_sln_demo_behavior_table_preprocess_v1
            WHERE   ds = '${bdp.system.bizdate}'
        ) sq0
LEFT JOIN (
              SELECT  *
              FROM    rec_sln_demo_user_table_preprocess_v1
              WHERE   ds = '${bdp.system.bizdate}'
          ) sq1
ON      sq0.user_id = sq1.user_id
LEFT JOIN (
              SELECT  *
              FROM    rec_sln_demo_item_table_preprocess_v1
              WHERE   ds = '${bdp.system.bizdate}'
          ) sq2
ON      sq0.item_id = sq2.item_id;

Buat tabel label

Pernyataan ini menurunkan label pelatihan dari tabel perilaku. Label-label tersebut adalah is_click (apakah pengguna mengklik), ln_playtime (playtime total yang telah ditransformasi logaritmik), dan is_praise (apakah pengguna memberi pujian pada item).

CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_label_table_v1
(
    request_id bigint
    ,user_id string
    ,page string
    ,net_type string
    ,day_h bigint COMMENT 'Jam terjadinya perilaku.'
    ,week_day bigint COMMENT 'Hari dalam seminggu terjadinya perilaku.'
    ,day_min string
    ,event_unix_time bigint
    ,item_id string
    ,playtime double
    ,is_click BIGINT
    ,ln_playtime DOUBLE
    ,is_praise BIGINT
)
PARTITIONED BY
(
    ds string
)
LIFECYCLE 90;

INSERT OVERWRITE TABLE rec_sln_demo_fs_label_table_v1 PARTITION(ds='${bdp.system.bizdate}')
SELECT  request_id
        ,user_id
        ,MAX(page) page
        ,MAX(net_type) net_type
        ,MAX(day_h) day_h
        ,MAX(week_day) week_day
        ,TO_CHAR(FROM_UNIXTIME(MIN(event_unix_time)),'yyyymmddhhmi') day_min
        ,MAX(event_unix_time) event_unix_time
        ,item_id
        ,MAX(playtime) playtime
        ,MAX(IF(event='click', 1, 0)) is_click
        ,LN(SUM(playtime) + 1) ln_playtime
        ,MAX(IF(event='praise', 1, 0)) is_praise
FROM    rec_sln_demo_behavior_table_preprocess_v1
WHERE   ds = '${bdp.system.bizdate}'
GROUP BY request_id, user_id, item_id;

Setelah kedua pernyataan selesai dijalankan, Anda akan memiliki:

  • rec_sln_demo_behavior_table_preprocess_wide_v1 — input untuk pembuatan fitur

  • rec_sln_demo_fs_label_table_v1 — input untuk tabel sampel akhir

Langkah 3: Hasilkan fitur

Fungsi ekstensi otomatis FeatureStore memungkinkan Anda menghasilkan ratusan fitur statistik dan jendela (window) hanya dari satu pemanggilan fungsi, tanpa perlu menulis SQL agregasi secara manual. Langkah-langkah di bawah ini menghasilkan fitur secara terpisah untuk pengguna dan item.

Semua contoh menggunakan API pipeline FeatureStore SDK: definisikan sumber data dan transformasi, lalu panggil execute() untuk menjalankan komputasi di MaxCompute.

Inisialisasi klien

# Muat kredensial dari variabel lingkungan — jangan pernah hardcode kunci di kode sumber
access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")

project = 'project_name'   # Nama proyek MaxCompute Anda
region = 'cn-hangzhou'     # Wilayah tempat proyek Anda berada

fs_client = FeatureStoreClient(
    access_key_id=access_key_id,
    access_key_secret=access_key_secret,
    region=region
)

Ikat tabel sumber ke sumber data

input_bhv_table_name = "rec_sln_demo_behavior_table_preprocess_wide_v1"
ds_bhv = MaxComputeDataSource(table=input_bhv_table_name, project=project)

input_user_table_name = "rec_sln_demo_user_table_preprocess_v1"
ds_user = MaxComputeDataSource(table=input_user_table_name, project=project)

input_item_table_name = "rec_sln_demo_item_table_preprocess_v1"
ds_item = MaxComputeDataSource(table=input_item_table_name, project=project)

Hasilkan fitur pengguna

AggregationTransform

auto_count_feature_transform menghasilkan fitur agregasi (jumlah, maksimum, minimum, rata-rata) untuk setiap metrik di berbagai jendela waktu — dalam kasus ini, jendela 3, 7, dan 15 hari yang dikelompokkan berdasarkan user_id.

Definisikan fitur:

agg_user_table_v1 = 'rec_sln_demo_user_table_preprocess_agg_v1'

name_prefix = "user"
input_list = ["playtime", "duration", "click_count", "praise_count"]
event_name = 'event'
event_type = 'expr'
group_by_key = "user_id"
window_size = [3, 7, 15]

user_count_feature_list = auto_count_feature_transform(
    name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(user_count_feature_list))
print("count_feature_list =", user_count_feature_list)

Buat pipeline:

agg_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_count_feature_list)
agg_user_pipeline = fs_client.create_pipeline(ds_user, agg_user_table_v1).merge(
    agg_user_bhv_pipeline, keep_input_columns=False
)

Jalankan dan verifikasi:

execute_date = '20240605'
output_agg_user_table = agg_user_pipeline.execute(execute_date, drop_table=True)

# Pratinjau 20 baris pertama
agg_user_ret = output_agg_user_table.to_pandas(execute_date, limit=20)
agg_user_ret

WindowTransform

auto_window_feature_transform menghasilkan fitur jendela lintas dimensi — misalnya, bagaimana jumlah klik pengguna bervariasi berdasarkan jam dalam sehari atau kategori konten di berbagai jendela waktu.

Definisikan fitur:

win_user_table_v1 = 'rec_sln_demo_user_table_preprocess_win_v1'

name_prefix = 'user'
input_list = ['day_h', 'category']
agg_field = ['duration', 'click_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'user_id'
window_size = [7, 15, 30, 45]

user_win_feature_list = auto_window_feature_transform(
    name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_user_win_feature_list =", len(user_win_feature_list))
print("user_win_feature_list =", user_win_feature_list)

Buat pipeline:

win_user_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(user_win_feature_list)
win_user_pipeline = fs_client.create_pipeline(ds_user, win_user_table_v1).merge(
    win_user_bhv_pipeline, keep_input_columns=False
)

Jalankan dan verifikasi:

execute_date = '20240605'
output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True)

# Catatan: WindowTransform menggunakan tabel antara. Pada eksekusi pertama, isi ulang semua
# partisi historis dengan menambahkan backfill_partitions=True — ini mungkin memakan waktu:
# output_win_user_table = win_user_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)

win_user_ret = output_win_user_table.to_pandas(execute_date, limit=20)
win_user_ret

Hasilkan fitur item

Pembuatan fitur item mengikuti pola yang sama seperti fitur pengguna, dengan item_id sebagai kunci pengelompokan.

AggregationTransform

Definisikan fitur:

agg_item_table_v1 = 'rec_sln_demo_item_table_preprocess_agg_v1'

name_prefix = "item"
input_list = ["item_cnt", "follow_cnt", "follower_cnt"]
event_name = 'event'
event_type = 'expr'
group_by_key = "item_id"
window_size = [3, 7, 15]

item_count_feature_list = auto_count_feature_transform(
    name_prefix, input_list, event_name, event_type, group_by_key, window_size
)
print("len_count_feature_list =", len(item_count_feature_list))
print("count_feature_list =", item_count_feature_list)

Buat pipeline:

agg_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_count_feature_list)
agg_item_pipeline = fs_client.create_pipeline(ds_item, agg_item_table_v1).merge(
    agg_item_bhv_pipeline, keep_input_columns=False
)

Jalankan dan verifikasi:

execute_date = '20240605'
output_agg_item_table = agg_item_pipeline.execute(execute_date, drop_table=True)

agg_item_ret = output_agg_item_table.to_pandas(execute_date, limit=20)
agg_item_ret

WindowTransform

Definisikan fitur:

win_item_table_v1 = 'rec_sln_demo_item_table_preprocess_win_v1'

name_prefix = 'item'
input_list = ['day_h', 'category']
agg_field = ['click_count', 'praise_count']
event_name = 'event'
event_type = 'expr'
group_by_key = 'item_id'
window_size = [7, 15, 30, 45]

item_win_feature_list = auto_window_feature_transform(
    name_prefix, input_list, agg_field, event_name, event_type, group_by_key, window_size
)
print("len_item_win_feature_list =", len(item_win_feature_list))
print("item_win_feature_list =", item_win_feature_list)

Buat pipeline:

win_item_bhv_pipeline = fs_client.create_pipeline(ds_bhv).add_feature_transform(item_win_feature_list)
win_item_pipeline = fs_client.create_pipeline(ds_item, win_item_table_v1).merge(
    win_item_bhv_pipeline, keep_input_columns=False
)

Jalankan dan verifikasi:

execute_date = '20240605'
output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True)

# Pada eksekusi pertama, tambahkan backfill_partitions=True jika partisi antara tidak tersedia:
# output_win_item_table = win_item_pipeline.execute(execute_date, drop_table=True, backfill_partitions=True)

win_item_ret = output_win_item_table.to_pandas(execute_date, limit=20)
win_item_ret

Langkah 4: Bangun set data pelatihan

Pada tahap ini Anda memiliki tujuh tabel: satu tabel label, tiga tabel sisi pengguna (sumber, agregasi, jendela), dan tiga tabel sisi item (sumber, agregasi, jendela). SQL di bawah ini menggabungkan ketujuh tabel tersebut menjadi satu set data pelatihan menggunakan LEFT JOIN dengan kunci user_id dan item_id.

Ganti <project_name> dengan nama proyek MaxCompute Anda yang sebenarnya sebelum menjalankan.

CREATE TABLE IF NOT EXISTS fs_demo_fs_engineering_v1_training_set(
     request_id BIGINT,
     user_id STRING,
     page STRING,
     net_type STRING,
     day_h BIGINT,
     week_day BIGINT,
     day_min STRING,
     event_unix_time BIGINT,
     item_id STRING,
     playtime DOUBLE,
     is_click BIGINT,
     ln_playtime DOUBLE,
     is_praise BIGINT,
     duration DOUBLE,
     category STRING,
     author BIGINT,
     click_count BIGINT,
     praise_count BIGINT,
     is_new_item BIGINT,
     item__sum_item_cnt_3d BIGINT,
     item__sum_follow_cnt_3d BIGINT,
     item__sum_follower_cnt_3d BIGINT,
     item__max_item_cnt_3d BIGINT,
     item__max_follow_cnt_3d BIGINT,
     item__max_follower_cnt_3d BIGINT,
     item__min_item_cnt_3d BIGINT,
     item__min_follow_cnt_3d BIGINT,
     item__min_follower_cnt_3d BIGINT,
     item__avg_item_cnt_3d DOUBLE,
     item__avg_follow_cnt_3d DOUBLE,
     item__avg_follower_cnt_3d DOUBLE,
     item__sum_item_cnt_7d BIGINT,
     item__sum_follow_cnt_7d BIGINT,
     item__sum_follower_cnt_7d BIGINT,
     item__max_item_cnt_7d BIGINT,
     item__max_follow_cnt_7d BIGINT,
     item__max_follower_cnt_7d BIGINT,
     item__min_item_cnt_7d BIGINT,
     item__min_follow_cnt_7d BIGINT,
     item__min_follower_cnt_7d BIGINT,
     item__avg_item_cnt_7d DOUBLE,
     item__avg_follow_cnt_7d DOUBLE,
     item__avg_follower_cnt_7d DOUBLE,
     item__sum_item_cnt_15d BIGINT,
     item__sum_follow_cnt_15d BIGINT,
     item__sum_follower_cnt_15d BIGINT,
     item__max_item_cnt_15d BIGINT,
     item__max_follow_cnt_15d BIGINT,
     item__max_follower_cnt_15d BIGINT,
     item__min_item_cnt_15d BIGINT,
     item__min_follow_cnt_15d BIGINT,
     item__min_follower_cnt_15d BIGINT,
     item__avg_item_cnt_15d DOUBLE,
     item__avg_follow_cnt_15d DOUBLE,
     item__avg_follower_cnt_15d DOUBLE,
     item__kv_day_h_click_count_sum_7d STRING,
     item__kv_category_click_count_sum_7d STRING,
     item__kv_day_h_praise_count_sum_7d STRING,
     item__kv_category_praise_count_sum_7d STRING,
     item__kv_day_h_click_count_max_7d STRING,
     item__kv_category_click_count_max_7d STRING,
     item__kv_day_h_praise_count_max_7d STRING,
     item__kv_category_praise_count_max_7d STRING,
     item__kv_day_h_click_count_min_7d STRING,
     item__kv_category_click_count_min_7d STRING,
     item__kv_day_h_praise_count_min_7d STRING,
     item__kv_category_praise_count_min_7d STRING,
     item__kv_day_h_click_count_avg_7d STRING,
     item__kv_category_click_count_avg_7d STRING,
     item__kv_day_h_praise_count_avg_7d STRING,
     item__kv_category_praise_count_avg_7d STRING,
     item__kv_day_h_click_count_sum_15d STRING,
     item__kv_category_click_count_sum_15d STRING,
     item__kv_day_h_praise_count_sum_15d STRING,
     item__kv_category_praise_count_sum_15d STRING,
     item__kv_day_h_click_count_max_15d STRING,
     item__kv_category_click_count_max_15d STRING,
     item__kv_day_h_praise_count_max_15d STRING,
     item__kv_category_praise_count_max_15d STRING,
     item__kv_day_h_click_count_min_15d STRING,
     item__kv_category_click_count_min_15d STRING,
     item__kv_day_h_praise_count_min_15d STRING,
     item__kv_category_praise_count_min_15d STRING,
     item__kv_day_h_click_count_avg_15d STRING,
     item__kv_category_click_count_avg_15d STRING,
     item__kv_day_h_praise_count_avg_15d STRING,
     item__kv_category_praise_count_avg_15d STRING,
     item__kv_day_h_click_count_sum_30d STRING,
     item__kv_category_click_count_sum_30d STRING,
     item__kv_day_h_praise_count_sum_30d STRING,
     item__kv_category_praise_count_sum_30d STRING,
     item__kv_day_h_click_count_max_30d STRING,
     item__kv_category_click_count_max_30d STRING,
     item__kv_day_h_praise_count_max_30d STRING,
     item__kv_category_praise_count_max_30d STRING,
     item__kv_day_h_click_count_min_30d STRING,
     item__kv_category_click_count_min_30d STRING,
     item__kv_day_h_praise_count_min_30d STRING,
     item__kv_category_praise_count_min_30d STRING,
     item__kv_day_h_click_count_avg_30d STRING,
     item__kv_category_click_count_avg_30d STRING,
     item__kv_day_h_praise_count_avg_30d STRING,
     item__kv_category_praise_count_avg_30d STRING,
     item__kv_day_h_click_count_sum_45d STRING,
     item__kv_category_click_count_sum_45d STRING,
     item__kv_day_h_praise_count_sum_45d STRING,
     item__kv_category_praise_count_sum_45d STRING,
     item__kv_day_h_click_count_max_45d STRING,
     item__kv_category_click_count_max_45d STRING,
     item__kv_day_h_praise_count_max_45d STRING,
     item__kv_category_praise_count_max_45d STRING,
     item__kv_day_h_click_count_min_45d STRING,
     item__kv_category_click_count_min_45d STRING,
     item__kv_day_h_praise_count_min_45d STRING,
     item__kv_category_praise_count_min_45d STRING,
     item__kv_day_h_click_count_avg_45d STRING,
     item__kv_category_click_count_avg_45d STRING,
     item__kv_day_h_praise_count_avg_45d STRING,
     item__kv_category_praise_count_avg_45d STRING,
     gender STRING,
     age BIGINT,
     city STRING,
     item_cnt BIGINT,
     follow_cnt BIGINT,
     follower_cnt BIGINT,
     is_new_user BIGINT,
     tags STRING,
     user__sum_playtime_3d DOUBLE,
     user__sum_duration_3d DOUBLE,
     user__sum_click_count_3d BIGINT,
     user__sum_praise_count_3d BIGINT,
     user__max_playtime_3d DOUBLE,
     user__max_duration_3d DOUBLE,
     user__max_click_count_3d BIGINT,
     user__max_praise_count_3d BIGINT,
     user__min_playtime_3d DOUBLE,
     user__min_duration_3d DOUBLE,
     user__min_click_count_3d BIGINT,
     user__min_praise_count_3d BIGINT,
     user__avg_playtime_3d DOUBLE,
     user__avg_duration_3d DOUBLE,
     user__avg_click_count_3d DOUBLE,
     user__avg_praise_count_3d DOUBLE,
     user__sum_playtime_7d DOUBLE,
     user__sum_duration_7d DOUBLE,
     user__sum_click_count_7d BIGINT,
     user__sum_praise_count_7d BIGINT,
     user__max_playtime_7d DOUBLE,
     user__max_duration_7d DOUBLE,
     user__max_click_count_7d BIGINT,
     user__max_praise_count_7d BIGINT,
     user__min_playtime_7d DOUBLE,
     user__min_duration_7d DOUBLE,
     user__min_click_count_7d BIGINT,
     user__min_praise_count_7d BIGINT,
     user__avg_playtime_7d DOUBLE,
     user__avg_duration_7d DOUBLE,
     user__avg_click_count_7d DOUBLE,
     user__avg_praise_count_7d DOUBLE,
     user__sum_playtime_15d DOUBLE,
     user__sum_duration_15d DOUBLE,
     user__sum_click_count_15d BIGINT,
     user__sum_praise_count_15d BIGINT,
     user__max_playtime_15d DOUBLE,
     user__max_duration_15d DOUBLE,
     user__max_click_count_15d BIGINT,
     user__max_praise_count_15d BIGINT,
     user__min_playtime_15d DOUBLE,
     user__min_duration_15d DOUBLE,
     user__min_click_count_15d BIGINT,
     user__min_praise_count_15d BIGINT,
     user__avg_playtime_15d DOUBLE,
     user__avg_duration_15d DOUBLE,
     user__avg_click_count_15d DOUBLE,
     user__avg_praise_count_15d DOUBLE,
     user__kv_day_h_duration_sum_7d STRING,
     user__kv_category_duration_sum_7d STRING,
     user__kv_day_h_click_count_sum_7d STRING,
     user__kv_category_click_count_sum_7d STRING,
     user__kv_day_h_duration_max_7d STRING,
     user__kv_category_duration_max_7d STRING,
     user__kv_day_h_click_count_max_7d STRING,
     user__kv_category_click_count_max_7d STRING,
     user__kv_day_h_duration_min_7d STRING,
     user__kv_category_duration_min_7d STRING,
     user__kv_day_h_click_count_min_7d STRING,
     user__kv_category_click_count_min_7d STRING,
     user__kv_day_h_duration_avg_7d STRING,
     user__kv_category_duration_avg_7d STRING,
     user__kv_day_h_click_count_avg_7d STRING,
     user__kv_category_click_count_avg_7d STRING,
     user__kv_day_h_duration_sum_15d STRING,
     user__kv_category_duration_sum_15d STRING,
     user__kv_day_h_click_count_sum_15d STRING,
     user__kv_category_click_count_sum_15d STRING,
     user__kv_day_h_duration_max_15d STRING,
     user__kv_category_duration_max_15d STRING,
     user__kv_day_h_click_count_max_15d STRING,
     user__kv_category_click_count_max_15d STRING,
     user__kv_day_h_duration_min_15d STRING,
     user__kv_category_duration_min_15d STRING,
     user__kv_day_h_click_count_min_15d STRING,
     user__kv_category_click_count_min_15d STRING,
     user__kv_day_h_duration_avg_15d STRING,
     user__kv_category_duration_avg_15d STRING,
     user__kv_day_h_click_count_avg_15d STRING,
     user__kv_category_click_count_avg_15d STRING,
     user__kv_day_h_duration_sum_30d STRING,
     user__kv_category_duration_sum_30d STRING,
     user__kv_day_h_click_count_sum_30d STRING,
     user__kv_category_click_count_sum_30d STRING,
     user__kv_day_h_duration_max_30d STRING,
     user__kv_category_duration_max_30d STRING,
     user__kv_day_h_click_count_max_30d STRING,
     user__kv_category_click_count_max_30d STRING,
     user__kv_day_h_duration_min_30d STRING,
     user__kv_category_duration_min_30d STRING,
     user__kv_day_h_click_count_min_30d STRING,
     user__kv_category_click_count_min_30d STRING,
     user__kv_day_h_duration_avg_30d STRING,
     user__kv_category_duration_avg_30d STRING,
     user__kv_day_h_click_count_avg_30d STRING,
     user__kv_category_click_count_avg_30d STRING,
     user__kv_day_h_duration_sum_45d STRING,
     user__kv_category_duration_sum_45d STRING,
     user__kv_day_h_click_count_sum_45d STRING,
     user__kv_category_click_count_sum_45d STRING,
     user__kv_day_h_duration_max_45d STRING,
     user__kv_category_duration_max_45d STRING,
     user__kv_day_h_click_count_max_45d STRING,
     user__kv_category_click_count_max_45d STRING,
     user__kv_day_h_duration_min_45d STRING,
     user__kv_category_duration_min_45d STRING,
     user__kv_day_h_click_count_min_45d STRING,
     user__kv_category_click_count_min_45d STRING,
     user__kv_day_h_duration_avg_45d STRING,
     user__kv_category_duration_avg_45d STRING,
     user__kv_day_h_click_count_avg_45d STRING,
     user__kv_category_click_count_avg_45d STRING
)
PARTITIONED BY (ds STRING)
LIFECYCLE 90;

INSERT OVERWRITE TABLE fs_demo_fs_engineering_v1_training_set PARTITION (ds = '${bdp.system.bizdate}')
SELECT
sq0.request_id,
sq0.user_id,
sq0.page,
sq0.net_type,
sq0.day_h,
sq0.week_day,
sq0.day_min,
sq0.event_unix_time,
sq0.item_id,
sq0.playtime,
sq0.is_click,
sq0.ln_playtime,
sq0.is_praise,
sq2.duration,
sq2.category,
sq2.author,
sq2.click_count,
sq2.praise_count,
sq2.is_new_item,
sq5.item__sum_item_cnt_3d,
sq5.item__sum_follow_cnt_3d,
sq5.item__sum_follower_cnt_3d,
sq5.item__max_item_cnt_3d,
sq5.item__max_follow_cnt_3d,
sq5.item__max_follower_cnt_3d,
sq5.item__min_item_cnt_3d,
sq5.item__min_follow_cnt_3d,
sq5.item__min_follower_cnt_3d,
sq5.item__avg_item_cnt_3d,
sq5.item__avg_follow_cnt_3d,
sq5.item__avg_follower_cnt_3d,
sq5.item__sum_item_cnt_7d,
sq5.item__sum_follow_cnt_7d,
sq5.item__sum_follower_cnt_7d,
sq5.item__max_item_cnt_7d,
sq5.item__max_follow_cnt_7d,
sq5.item__max_follower_cnt_7d,
sq5.item__min_item_cnt_7d,
sq5.item__min_follow_cnt_7d,
sq5.item__min_follower_cnt_7d,
sq5.item__avg_item_cnt_7d,
sq5.item__avg_follow_cnt_7d,
sq5.item__avg_follower_cnt_7d,
sq5.item__sum_item_cnt_15d,
sq5.item__sum_follow_cnt_15d,
sq5.item__sum_follower_cnt_15d,
sq5.item__max_item_cnt_15d,
sq5.item__max_follow_cnt_15d,
sq5.item__max_follower_cnt_15d,
sq5.item__min_item_cnt_15d,
sq5.item__min_follow_cnt_15d,
sq5.item__min_follower_cnt_15d,
sq5.item__avg_item_cnt_15d,
sq5.item__avg_follow_cnt_15d,
sq5.item__avg_follower_cnt_15d,
sq6.item__kv_day_h_click_count_sum_7d,
sq6.item__kv_category_click_count_sum_7d,
sq6.item__kv_day_h_praise_count_sum_7d,
sq6.item__kv_category_praise_count_sum_7d,
sq6.item__kv_day_h_click_count_max_7d,
sq6.item__kv_category_click_count_max_7d,
sq6.item__kv_day_h_praise_count_max_7d,
sq6.item__kv_category_praise_count_max_7d,
sq6.item__kv_day_h_click_count_min_7d,
sq6.item__kv_category_click_count_min_7d,
sq6.item__kv_day_h_praise_count_min_7d,
sq6.item__kv_category_praise_count_min_7d,
sq6.item__kv_day_h_click_count_avg_7d,
sq6.item__kv_category_click_count_avg_7d,
sq6.item__kv_day_h_praise_count_avg_7d,
sq6.item__kv_category_praise_count_avg_7d,
sq6.item__kv_day_h_click_count_sum_15d,
sq6.item__kv_category_click_count_sum_15d,
sq6.item__kv_day_h_praise_count_sum_15d,
sq6.item__kv_category_praise_count_sum_15d,
sq6.item__kv_day_h_click_count_max_15d,
sq6.item__kv_category_click_count_max_15d,
sq6.item__kv_day_h_praise_count_max_15d,
sq6.item__kv_category_praise_count_max_15d,
sq6.item__kv_day_h_click_count_min_15d,
sq6.item__kv_category_click_count_min_15d,
sq6.item__kv_day_h_praise_count_min_15d,
sq6.item__kv_category_praise_count_min_15d,
sq6.item__kv_day_h_click_count_avg_15d,
sq6.item__kv_category_click_count_avg_15d,
sq6.item__kv_day_h_praise_count_avg_15d,
sq6.item__kv_category_praise_count_avg_15d,
sq6.item__kv_day_h_click_count_sum_30d,
sq6.item__kv_category_click_count_sum_30d,
sq6.item__kv_day_h_praise_count_sum_30d,
sq6.item__kv_category_praise_count_sum_30d,
sq6.item__kv_day_h_click_count_max_30d,
sq6.item__kv_category_click_count_max_30d,
sq6.item__kv_day_h_praise_count_max_30d,
sq6.item__kv_category_praise_count_max_30d,
sq6.item__kv_day_h_click_count_min_30d,
sq6.item__kv_category_click_count_min_30d,
sq6.item__kv_day_h_praise_count_min_30d,
sq6.item__kv_category_praise_count_min_30d,
sq6.item__kv_day_h_click_count_avg_30d,
sq6.item__kv_category_click_count_avg_30d,
sq6.item__kv_day_h_praise_count_avg_30d,
sq6.item__kv_category_praise_count_avg_30d,
sq6.item__kv_day_h_click_count_sum_45d,
sq6.item__kv_category_click_count_sum_45d,
sq6.item__kv_day_h_praise_count_sum_45d,
sq6.item__kv_category_praise_count_sum_45d,
sq6.item__kv_day_h_click_count_max_45d,
sq6.item__kv_category_click_count_max_45d,
sq6.item__kv_day_h_praise_count_max_45d,
sq6.item__kv_category_praise_count_max_45d,
sq6.item__kv_day_h_click_count_min_45d,
sq6.item__kv_category_click_count_min_45d,
sq6.item__kv_day_h_praise_count_min_45d,
sq6.item__kv_category_praise_count_min_45d,
sq6.item__kv_day_h_click_count_avg_45d,
sq6.item__kv_category_click_count_avg_45d,
sq6.item__kv_day_h_praise_count_avg_45d,
sq6.item__kv_category_praise_count_avg_45d,
sq1.gender,
sq1.age,
sq1.city,
sq1.item_cnt,
sq1.follow_cnt,
sq1.follower_cnt,
sq1.is_new_user,
sq1.tags,
sq3.user__sum_playtime_3d,
sq3.user__sum_duration_3d,
sq3.user__sum_click_count_3d,
sq3.user__sum_praise_count_3d,
sq3.user__max_playtime_3d,
sq3.user__max_duration_3d,
sq3.user__max_click_count_3d,
sq3.user__max_praise_count_3d,
sq3.user__min_playtime_3d,
sq3.user__min_duration_3d,
sq3.user__min_click_count_3d,
sq3.user__min_praise_count_3d,
sq3.user__avg_playtime_3d,
sq3.user__avg_duration_3d,
sq3.user__avg_click_count_3d,
sq3.user__avg_praise_count_3d,
sq3.user__sum_playtime_7d,
sq3.user__sum_duration_7d,
sq3.user__sum_click_count_7d,
sq3.user__sum_praise_count_7d,
sq3.user__max_playtime_7d,
sq3.user__max_duration_7d,
sq3.user__max_click_count_7d,
sq3.user__max_praise_count_7d,
sq3.user__min_playtime_7d,
sq3.user__min_duration_7d,
sq3.user__min_click_count_7d,
sq3.user__min_praise_count_7d,
sq3.user__avg_playtime_7d,
sq3.user__avg_duration_7d,
sq3.user__avg_click_count_7d,
sq3.user__avg_praise_count_7d,
sq3.user__sum_playtime_15d,
sq3.user__sum_duration_15d,
sq3.user__sum_click_count_15d,
sq3.user__sum_praise_count_15d,
sq3.user__max_playtime_15d,
sq3.user__max_duration_15d,
sq3.user__max_click_count_15d,
sq3.user__max_praise_count_15d,
sq3.user__min_playtime_15d,
sq3.user__min_duration_15d,
sq3.user__min_click_count_15d,
sq3.user__min_praise_count_15d,
sq3.user__avg_playtime_15d,
sq3.user__avg_duration_15d,
sq3.user__avg_click_count_15d,
sq3.user__avg_praise_count_15d,
sq4.user__kv_day_h_duration_sum_7d,
sq4.user__kv_category_duration_sum_7d,
sq4.user__kv_day_h_click_count_sum_7d,
sq4.user__kv_category_click_count_sum_7d,
sq4.user__kv_day_h_duration_max_7d,
sq4.user__kv_category_duration_max_7d,
sq4.user__kv_day_h_click_count_max_7d,
sq4.user__kv_category_click_count_max_7d,
sq4.user__kv_day_h_duration_min_7d,
sq4.user__kv_category_duration_min_7d,
sq4.user__kv_day_h_click_count_min_7d,
sq4.user__kv_category_click_count_min_7d,
sq4.user__kv_day_h_duration_avg_7d,
sq4.user__kv_category_duration_avg_7d,
sq4.user__kv_day_h_click_count_avg_7d,
sq4.user__kv_category_click_count_avg_7d,
sq4.user__kv_day_h_duration_sum_15d,
sq4.user__kv_category_duration_sum_15d,
sq4.user__kv_day_h_click_count_sum_15d,
sq4.user__kv_category_click_count_sum_15d,
sq4.user__kv_day_h_duration_max_15d,
sq4.user__kv_category_duration_max_15d,
sq4.user__kv_day_h_click_count_max_15d,
sq4.user__kv_category_click_count_max_15d,
sq4.user__kv_day_h_duration_min_15d,
sq4.user__kv_category_duration_min_15d,
sq4.user__kv_day_h_click_count_min_15d,
sq4.user__kv_category_click_count_min_15d,
sq4.user__kv_day_h_duration_avg_15d,
sq4.user__kv_category_duration_avg_15d,
sq4.user__kv_day_h_click_count_avg_15d,
sq4.user__kv_category_click_count_avg_15d,
sq4.user__kv_day_h_duration_sum_30d,
sq4.user__kv_category_duration_sum_30d,
sq4.user__kv_day_h_click_count_sum_30d,
sq4.user__kv_category_click_count_sum_30d,
sq4.user__kv_day_h_duration_max_30d,
sq4.user__kv_category_duration_max_30d,
sq4.user__kv_day_h_click_count_max_30d,
sq4.user__kv_category_click_count_max_30d,
sq4.user__kv_day_h_duration_min_30d,
sq4.user__kv_category_duration_min_30d,
sq4.user__kv_day_h_click_count_min_30d,
sq4.user__kv_category_click_count_min_30d,
sq4.user__kv_day_h_duration_avg_30d,
sq4.user__kv_category_duration_avg_30d,
sq4.user__kv_day_h_click_count_avg_30d,
sq4.user__kv_category_click_count_avg_30d,
sq4.user__kv_day_h_duration_sum_45d,
sq4.user__kv_category_duration_sum_45d,
sq4.user__kv_day_h_click_count_sum_45d,
sq4.user__kv_category_click_count_sum_45d,
sq4.user__kv_day_h_duration_max_45d,
sq4.user__kv_category_duration_max_45d,
sq4.user__kv_day_h_click_count_max_45d,
sq4.user__kv_category_click_count_max_45d,
sq4.user__kv_day_h_duration_min_45d,
sq4.user__kv_category_duration_min_45d,
sq4.user__kv_day_h_click_count_min_45d,
sq4.user__kv_category_click_count_min_45d,
sq4.user__kv_day_h_duration_avg_45d,
sq4.user__kv_category_duration_avg_45d,
sq4.user__kv_day_h_click_count_avg_45d,
sq4.user__kv_category_click_count_avg_45d
FROM
(
    SELECT *
    FROM rec_sln_demo_fs_label_table_v1
    WHERE ds = '${bdp.system.bizdate}'
) sq0
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq1 ON sq0.user_id = sq1.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq2 ON sq0.item_id = sq2.item_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_agg_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq3 ON sq0.user_id = sq3.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_user_table_preprocess_win_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq4 ON sq0.user_id = sq4.user_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_agg_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq5 ON sq0.item_id = sq5.item_id
LEFT JOIN (
    SELECT *
    FROM <project_name>.rec_sln_demo_item_table_preprocess_win_v1
    WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bdp.system.bizdate}','yyyymmdd'), -1,'dd'),'yyyymmdd')
) sq6 ON sq0.item_id = sq6.item_id;

Hasilnya adalah fs_demo_fs_engineering_v1_training_set, sebuah tabel partisi dengan siklus hidup 90 hari yang menggabungkan semua fitur pengguna, fitur item, dan label pelatihan dalam satu baris per permintaan.

Langkah selanjutnya

  • Untuk memahami platform fitur FeatureStore secara lengkap, termasuk penyajian online dan pendaftaran fitur, lihat Ikhtisar FeatureStore.

  • Untuk referensi API lengkap tentang transformasi rekayasa fitur, lihat Platform fitur dan rekayasa fitur.

  • Untuk dukungan teknis, bergabunglah dengan grup DingTalk 34415007523.