Artificial Intelligence Recommendation: Get started with Feature Generator

Last Updated:Apr 01, 2026

Feature Generator (FG) transforms raw user and item data into features that a model can consume at inference time. This guide walks you through testing an FG configuration locally on Linux before you deploy it in a production pipeline.

By the end of this guide, you will have:

  1. Installed pyfg in your Python environment

  2. Defined a feature configuration and run FG locally

  3. Verified the inputs and outputs for both the EasyRec (TensorFlow) and TorchEasyRec (PyTorch) processors

  4. Submitted an offline FG task to MaxCompute

Prerequisites

Before you begin, make sure that you have:

  • A Linux machine (local FG testing requires Linux)

  • Python 3.10, 3.11, or 3.12 installed

  • (For MaxCompute tasks) Python 3.7 and valid Alibaba Cloud credentials stored in the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables
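
The prerequisites above can be checked with a short script before you install anything. This is a minimal sketch: check_prerequisites is a hypothetical helper, not part of pyfg, and it takes the platform, version, and environment as arguments so that it can be exercised without touching the machine.

```python
import os
import sys

# Python versions listed in this guide for local pyfg testing.
SUPPORTED_LOCAL = {(3, 10), (3, 11), (3, 12)}
# Credential variables that the MaxCompute step reads.
REQUIRED_ENV = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def check_prerequisites(platform, version, env):
    """Hypothetical helper: return a list of problems (empty if all checks pass)."""
    problems = []
    if not platform.startswith("linux"):
        problems.append(f"local FG testing requires Linux, found {platform!r}")
    if tuple(version[:2]) not in SUPPORTED_LOCAL:
        problems.append(f"Python {version[0]}.{version[1]} is not 3.10, 3.11, or 3.12")
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"{name} is not set (needed only for MaxCompute tasks)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(sys.platform, sys.version_info, os.environ):
        print("WARNING:", problem)
```

An empty result means the local machine matches the list above; the credential warnings can be ignored if you only plan to run FG locally.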

Install pyfg

Install pyfg from the wheel file that matches your Python version.

Python 3.11 (recommended):

pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp311-cp311-linux_x86_64.whl

Python 3.10:

pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp310-cp310-linux_x86_64.whl

Python 3.12:

pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp312-cp312-linux_x86_64.whl

Version 1.0.1 improves serialization for feature operators and adds support for using any feature operator as a sub-feature of a sequence feature.
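
The three install commands differ only in the CPython tag, so the matching wheel URL can be derived from the running interpreter. The sketch below assumes the URL pattern seen in the commands above; pyfg_wheel_url is a hypothetical helper, and it deliberately rejects versions for which this guide lists no wheel.

```python
import sys

BASE_URL = "http://tzrec.oss-cn-beijing.aliyuncs.com/third_party"
PYFG_VERSION = "1.0.1"
SUPPORTED_MINORS = (10, 11, 12)  # wheels listed in this guide


def pyfg_wheel_url(major, minor):
    """Hypothetical helper: build the wheel URL for a given Python version."""
    if major != 3 or minor not in SUPPORTED_MINORS:
        raise ValueError(f"no pyfg wheel listed for Python {major}.{minor}")
    tag = f"cp{major}{minor}"
    return f"{BASE_URL}/pyfg-{PYFG_VERSION}-{tag}-{tag}-linux_x86_64.whl"


if __name__ == "__main__":
    try:
        print("pip install", pyfg_wheel_url(sys.version_info.major, sys.version_info.minor))
    except ValueError as err:
        print(err)
```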

Run FG locally

Choose the section that matches your model's inference processor.

EasyRec processor (TensorFlow)

The following script defines a feature configuration and runs FG with the EasyRec processor. The main differences from the PyTorch version: numeric features use value_type double, and the handler is invoked as handler(inputs) instead of handler.process(inputs).

#!/usr/bin/env python
import os
import pyfg

config = {
  "features": [
    {
      "feature_name": "goods_id",
      "feature_type": "id_feature",       # ID feature: maps a raw field to a hash bucket or a string ID
      "value_type": "string",
      "expression": "item:goods_id",      # Format: {side}:{field_name}, where side is user, item, or context
      "default_value": "-1024",
      "need_prefix": False,
      "value_dimension": 1               # Output dimension; controls how many values are produced per sample
    },
    {
      "feature_name": "color_pair",
      "feature_type": "combo_feature",   # Combo feature: crosses several fields into a single feature
      "value_type": "string",
      "expression": ["user:query_color", "item:color"],
      "default_value": "",
      "need_prefix": False,
      "value_dimension": 1
    },
    {
      "feature_name": "current_price",
      "feature_type": "raw_feature",     # Raw feature: passes a numeric field through without transformation
      "value_type": "double",
      "expression": "item:current_price",
      "default_value": "0",
      "need_prefix": False
    },
    {
      "feature_name": "usr_cate1_clk_cnt_1d",
      "feature_type": "lookup_feature",  # Lookup feature: looks up a value in a map field using a key field
      "map": "user:usr_cate1_clk_cnt_1d",
      "key": "item:cate1",
      "need_discrete": False,
      "need_key": False,
      "default_value": "0",
      "combiner": "max",                 # Aggregation method when several keys match; options: max, min, sum, mean
      "need_prefix": False,
      "value_type": "double"
    },
    {
      "feature_name": "recommend_match",
      "feature_type": "overlap_feature", # Overlap feature: measures term overlap between a query field and an item field
      "method": "is_contain",            # Returns 1 if any query term appears in the item field, 0 otherwise
      "query": "user:query_recommend",
      "title": "item:recommend",
      "default_value": "0"
    },
    {
      "feature_name": "query_title_match_ratio",
      "feature_type": "overlap_feature",
      "method": "query_common_ratio",    # Proportion of query terms found in the item title
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    },
    {
      "feature_name": "title_term_match_ratio",
      "feature_type": "overlap_feature",
      "method": "title_common_ratio",    # Proportion of item title terms found in the query
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    },
    {
      "feature_name": "term_proximity_min_cover",
      "feature_type": "overlap_feature",
      "method": "proximity_min_cover",   # Smallest window in the item title that covers all query terms
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    }
  ]
}


if __name__ == '__main__':
    handler = pyfg.FgHandler(config)
    print("------------------------ meta info ---------------------------")
    print("user side inputs:", handler.user_inputs())
    print("item side inputs:", handler.item_inputs())
    print("context side inputs:", handler.context_inputs())
    print("offline table schema:", handler.table_schema())
    features = handler.all_feature_names()
    print("all generated features:", features)

    inputs = {
      "goods_id": ["110", "111", "112"],
      "query_color": ["red", "pink", "gray"],
      "color": ["white", "black", "pink"],
      "current_price": [0.5, 0.25, 0.78],
      "usr_cate1_clk_cnt_1d": [
        {"c1": 1, "c2": 13, "c3": 5},
        {"c1": 5, "c2": 3, "c4": 4.5},
        {"c7": 7, "c5": 9, "c3": 5}
      ],
      "cate1": ["c1", "c2", "c3"],
      "query_recommend": ["High-quality", "Brand", "Premium"],
      "recommend": ["High-quality", "Brand", "Carefully-selected"],
      "title_terms": [
        "Clear\035Men\035Shampoo\035Anti-dandruff\035Refreshing\035Shampoo-cream\035Men\035Vitality\035Sport\035100G",
        "Master-Kong\035Jasmine\035Honey-Tea\035330ml*12\035bottles",
        "Diao-Brand\035Detergent\035Household\035Large-barrel\035Food-grade\035Dishwashing-liquid\035Fruit-and-vegetable\035Cleaner\035Value-pack\035Dish-soap"
      ],
      "query_terms": [
        "Clear\035Shampoo",
        "Jasmine\035Green-Tea",
        "Detergent\035Household"
      ]
    }
    outputs, status = handler(inputs)      # EasyRec: call the handler directly
    print("status:", status.ok())
    print("outputs:", outputs)
    # Debug log: input data & generated features
    print("------------------------ debug log ---------------------------")
    input_str = handler.to_input_str(inputs)
    print("input data:", input_str)
    print()
    generated_str = handler.to_debug_str_v2(outputs)   # EasyRec uses to_debug_str_v2
    print("generated feature:", generated_str)

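The expected output is shown below. Note that this sample also contains edit_distance, query, title, and an inputs line, none of which the script above defines or prints; expect those entries to be absent when you run the script exactly as written: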

------------------------ meta info ---------------------------
user side inputs: {'usr_cate1_clk_cnt_1d', 'query_terms', 'query_recommend', 'query_color', 'query'}
item side inputs: {'title_terms', 'title', 'recommend', 'goods_id', 'color', 'cate1', 'current_price'}
context side inputs: set()
offline table schema: {'usr_cate1_clk_cnt_1d': 'double', 'query_title_match_ratio': 'float', 'current_price': 'double', 'term_proximity_min_cover': 'float', 'title_term_match_ratio': 'float', 'edit_distance': 'int', 'recommend_match': 'float', 'goods_id': 'string', 'color_pair': 'string'}
all generated features: ['goods_id', 'color_pair', 'current_price', 'usr_cate1_clk_cnt_1d', 'recommend_match', 'query_title_match_ratio', 'title_term_match_ratio', 'term_proximity_min_cover', 'edit_distance']
inputs {'goods_id': ['110', '111', '112'], 'query_color': ['red', 'pink', 'gray'], 'color': ['white', 'black', 'pink'], 'current_price': [0.5, 0.25, 0.78], 'usr_cate1_clk_cnt_1d': [{'c1': 1, 'c2': 13, 'c3': 5}, {'c1': 5, 'c2': 3, 'c4': 4.5}, {'c7': 7, 'c5': 9, 'c3': 5}], 'cate1': ['c1', 'c2', 'c3'], 'query_recommend': ['High-quality', 'Brand', 'Premium'], 'recommend': ['High-quality', 'Brand', 'Carefully-selected'], 'title_terms': ['Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G', 'Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles', 'Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap'], 'query_terms': ['Clear\x1dShampoo', 'Jasmine\x1dGreen-Tea', 'Detergent\x1dHousehold'], 'query': ['Republic of China', 'Feature|Generation', 'The tool is very useful'], 'title': ['China', 'Feature|Transformation', 'The tool is useful']}
status: True
outputs: {'title_term_match_ratio': [0.20000000298023224, 0.20000000298023224, 0.20000000298023224], 'term_proximity_min_cover': [3.0, 0.0, 2.0], 'edit_distance': [12, 12, 11], 'query_title_match_ratio': [1.0, 0.5, 1.0], 'color_pair': ['red_white', 'pink_black', 'gray_pink'], 'goods_id': ['110', '111', '112'], 'current_price': [0.5, 0.25, 0.7799999713897705], 'usr_cate1_clk_cnt_1d': [1.0, 3.0, 5.0], 'recommend_match': [1.0, 1.0, 0.0]}
------------------------  debug log ---------------------------
input data: ['cate1:c1 | color:white | current_price:0.5 | goods_id:110 | query:Republic of China | query_color:red | query_recommend:High-quality | query_terms:Clear\x1dShampoo | recommend:High-quality | title:China | title_terms:Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G | usr_cate1_clk_cnt_1d:c1:1\x1dc2:13\x1dc3:5', 'cate1:c2 | color:black | current_price:0.25 | goods_id:111 | query:Feature|Generation | query_color:pink | query_recommend:Brand | query_terms:Jasmine\x1dGreen-Tea | recommend:Brand | title:Feature|Transformation | title_terms:Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles | usr_cate1_clk_cnt_1d:c1:5\x1dc2:3\x1dc4:4.5', 'cate1:c3 | color:pink | current_price:0.78 | goods_id:112 | query:The tool is very useful | query_color:gray | query_recommend:Premium | query_terms:Detergent\x1dHousehold | recommend:Carefully-selected | title:The tool is useful | title_terms:Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap | usr_cate1_clk_cnt_1d:c3:5\x1dc5:9\x1dc7:7']

generated feature: ['goods_id:110 | color_pair:red_white | current_price:0.5 | usr_cate1_clk_cnt_1d:1 | recommend_match:1 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:3 | edit_distance:12', 'goods_id:111 | color_pair:pink_black | current_price:0.25 | usr_cate1_clk_cnt_1d:3 | recommend_match:1 | query_title_match_ratio:0.5 | title_term_match_ratio:0.2 | term_proximity_min_cover:0 | edit_distance:12', 'goods_id:112 | color_pair:gray_pink | current_price:0.78 | usr_cate1_clk_cnt_1d:5 | recommend_match:0 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:2 | edit_distance:11']

How to verify the output:

  • `meta info` block: lists the detected input columns by side (user, item, context) and the offline table schema. Make sure that every field your features reference appears here.

  • `status: True`: confirms that FG ran without errors. If it is False, check the debug log for the input that failed.

  • `outputs` dict: contains the generated feature values. The value_dimension parameter in each feature's configuration controls how many values are produced per sample.

  • `generated feature`: a readable per-sample view of the results, useful for quickly inspecting individual records.
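
The overlap values in the sample output can also be cross-checked by hand. The sketch below reimplements two of the overlap methods in plain Python, based only on the descriptions in the configuration comments. It is an illustration rather than pyfg's implementation, the function names are hypothetical, and it assumes that terms are separated by the \x1d character, as in the sample inputs.

```python
SEP = "\x1d"  # term separator in the sample inputs (written as \035 in the script)


def query_common_ratio(query, title):
    """Proportion of query terms that also occur in the title."""
    q_terms = query.split(SEP)
    t_terms = set(title.split(SEP))
    return sum(term in t_terms for term in q_terms) / len(q_terms)


def title_common_ratio(query, title):
    """Proportion of title terms that also occur in the query."""
    q_terms = set(query.split(SEP))
    t_terms = title.split(SEP)
    return sum(term in q_terms for term in t_terms) / len(t_terms)


if __name__ == "__main__":
    # Second sample record from the script above.
    query = SEP.join(["Jasmine", "Green-Tea"])
    title = SEP.join(["Master-Kong", "Jasmine", "Honey-Tea", "330ml*12", "bottles"])
    print("query_title_match_ratio:", query_common_ratio(query, title))
    print("title_term_match_ratio:", title_common_ratio(query, title))
```

For the second sample record, one of the two query terms occurs in the title and one of the five title terms occurs in the query, which agrees with the 0.5 and 0.2 reported for that record in the sample output.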

TorchEasyRec processor (PyTorch)

The TorchEasyRec processor differs from EasyRec in three ways:

  • Numeric value_type uses float instead of double

  • hash_bucket_size is required for ID and combo features

  • An additional custom feature, edit_distance, is included, backed by a shared library
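
The first two differences are mechanical, so an EasyRec-style configuration can be adapted programmatically. The following is a minimal sketch that assumes the same dict layout as the scripts in this guide. to_torch_config is a hypothetical helper, and its default bucket size of 100000 is simply the value used in this guide's example, not a recommendation.

```python
import copy

# Feature types that need hash_bucket_size according to the list above.
HASHED_TYPES = {"id_feature", "combo_feature"}


def to_torch_config(config, hash_bucket_size=100000):
    """Hypothetical helper: return a copy of config adapted for TorchEasyRec."""
    converted = copy.deepcopy(config)  # leave the caller's config untouched
    for feature in converted["features"]:
        if feature.get("value_type") == "double":
            feature["value_type"] = "float"
        if feature.get("feature_type") in HASHED_TYPES:
            # Keep an explicit bucket size if the feature already has one.
            feature.setdefault("hash_bucket_size", hash_bucket_size)
    return converted


if __name__ == "__main__":
    easyrec = {"features": [
        {"feature_name": "goods_id", "feature_type": "id_feature", "value_type": "string"},
        {"feature_name": "current_price", "feature_type": "raw_feature", "value_type": "double"},
    ]}
    print(to_torch_config(easyrec))
```

The custom edit_distance feature is not covered here because it depends on a shared library and has to be added by hand.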

The invocation also differs: use handler.process(inputs) instead of handler(inputs), and to_debug_str instead of to_debug_str_v2.

#!/usr/bin/env python
import pyfg

config = {
  "features": [
    {
      "feature_name": "goods_id",
      "feature_type": "id_feature",
      "value_type": "string",
      "expression": "item:goods_id",
      "default_value": "-1024",
      "need_prefix": False,
      "hash_bucket_size": 100000,        # Required for PyTorch: size of the hash space for the ID feature
      "value_dimension": 1
    },
    {
      "feature_name": "color_pair",
      "feature_type": "combo_feature",
      "value_type": "string",
      "expression": ["user:query_color", "item:color"],
      "default_value": "",
      "need_prefix": False,
      "hash_bucket_size": 100000,        # Required for PyTorch: size of the hash space for the combo feature
      "value_dimension": 1
    },
    {
      "feature_name": "current_price",
      "feature_type": "raw_feature",
      "value_type": "float",             # PyTorch uses float (not double)
      "expression": "item:current_price",
      "default_value": "0",
      "need_prefix": False
    },
    {
      "feature_name": "usr_cate1_clk_cnt_1d",
      "feature_type": "lookup_feature",
      "map": "user:usr_cate1_clk_cnt_1d",
      "key": "item:cate1",
      "need_discrete": False,
      "need_key": False,
      "default_value": "0",
      "combiner": "max",
      "need_prefix": False,
      "value_type": "float"              # PyTorch uses float (not double)
    },
    {
      "feature_name": "recommend_match",
      "feature_type": "overlap_feature",
      "method": "is_contain",
      "query": "user:query_recommend",
      "title": "item:recommend",
      "default_value": "0"
    },
    {
      "feature_name": "query_title_match_ratio",
      "feature_type": "overlap_feature",
      "method": "query_common_ratio",
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    },
    {
      "feature_name": "title_term_match_ratio",
      "feature_type": "overlap_feature",
      "method": "title_common_ratio",
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    },
    {
      "feature_name": "term_proximity_min_cover",
      "feature_type": "overlap_feature",
      "method": "proximity_min_cover",
      "query": "user:query_terms",
      "title": "item:title_terms",
      "default_value": "0"
    },
    {
      "feature_name": "edit_distance",
      "feature_type": "custom_feature",          # Custom feature: uses an operator from an external library
      "operator_name": "EditDistance",
      "operator_lib_file": "pyfg/lib/libedit_distance.so",
      "expression": ["user:query", "item:title"],
      "default_value": "0",
      "value_type": "int32",
      "value_dimension": 1,
      "encoding": "utf-8",
      "normalizer": "method=expression,expr=x+10"  # Post-processing expression applied to the raw operator output
    }
  ]
}


if __name__ == '__main__':
    handler = pyfg.FgHandler(config)
    print("------------------------ meta info ---------------------------")
    print("user side inputs:", handler.user_inputs())
    print("item side inputs:", handler.item_inputs())
    print("context side inputs:", handler.context_inputs())
    print("offline table schema:", handler.table_schema())
    features = handler.all_feature_names()
    print("all generated features:", features)

    inputs = {
      "goods_id": ["110", "111", "112"],
      "query_color": ["red", "pink", "gray"],
      "color": ["white", "black", "pink"],
      "current_price": [0.5, 0.25, 0.78],
      "usr_cate1_clk_cnt_1d": [
        {"c1": 1, "c2": 13, "c3": 5},
        {"c1": 5, "c2": 3, "c4": 4.5},
        {"c7": 7, "c5": 9, "c3": 5}
      ],
      "cate1": ["c1", "c2", "c3"],
      "query_recommend": ["High-quality", "Brand", "Premium"],
      "recommend": ["High-quality", "Brand", "Carefully-selected"],
      "title_terms": [
        "Clear\035Men\035Shampoo\035Anti-dandruff\035Refreshing\035Shampoo-cream\035Men\035Vitality\035Sport\035100G",
        "Master-Kong\035Jasmine\035Honey-Tea\035330ml*12\035bottles",
        "Diao-Brand\035Detergent\035Household\035Large-barrel\035Food-grade\035Dishwashing-liquid\035Fruit-and-vegetable\035Cleaner\035Value-pack\035Dish-soap"
      ],
      "query_terms": [
        "Clear\035Shampoo",
        "Jasmine\035Green-Tea",
        "Detergent\035Household"
      ],
      "query": ["Republic of China", "Feature|Generation", "The tool is very useful"],
      "title": ["China", "Feature|Transformation", "The tool is useful"]
    }
    print("inputs", inputs)
    outputs, status = handler.process(inputs)  # TorchEasyRec: use handler.process()
    print("status:", status.ok())
    print("------------------------ outputs ---------------------------")
    for feature in features:
      feat = outputs[feature]
      if feat.feat_mode in (pyfg.FeatMode.Sparse, pyfg.FeatMode.SeqSparse):
        print(feature, "values:", feat.values)
        print(feature, "lengths:", feat.lengths)
      elif feat.feat_mode in (pyfg.FeatMode.Dense, pyfg.FeatMode.SeqDense):
        print(feature, "values:", feat.dense_values)

    # Debug log: input data & generated features
    print("------------------------ debug log ---------------------------")
    input_str = handler.to_input_str(inputs)
    print("input data:", input_str)
    print()
    generated_str = handler.to_debug_str(outputs, ',')   # TorchEasyRec uses to_debug_str (not v2)
    print("generated feature:", generated_str)

The expected output is:

------------------------ meta info ---------------------------
user side inputs: {'query_terms', 'usr_cate1_clk_cnt_1d', 'query_color', 'query', 'query_recommend'}
item side inputs: {'cate1', 'color', 'goods_id', 'title_terms', 'title', 'recommend', 'current_price'}
context side inputs: set()
offline table schema: {'goods_id': 'bigint', 'color_pair': 'bigint', 'title_term_match_ratio': 'float', 'usr_cate1_clk_cnt_1d': 'float', 'term_proximity_min_cover': 'float', 'recommend_match': 'float', 'query_title_match_ratio': 'float', 'current_price': 'float', 'edit_distance': 'int'}
all generated features: ['goods_id', 'color_pair', 'current_price', 'usr_cate1_clk_cnt_1d', 'recommend_match', 'query_title_match_ratio', 'title_term_match_ratio', 'term_proximity_min_cover', 'edit_distance']
inputs {'goods_id': ['110', '111', '112'], 'query_color': ['red', 'pink', 'gray'], 'color': ['white', 'black', 'pink'], 'current_price': [0.5, 0.25, 0.78], 'usr_cate1_clk_cnt_1d': [{'c1': 1, 'c2': 13, 'c3': 5}, {'c1': 5, 'c2': 3, 'c4': 4.5}, {'c7': 7, 'c5': 9, 'c3': 5}], 'cate1': ['c1', 'c2', 'c3'], 'query_recommend': ['High-quality', 'Brand', 'Premium'], 'recommend': ['High-quality', 'Brand', 'Carefully-selected'], 'title_terms': ['Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G', 'Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles', 'Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap'], 'query_terms': ['Clear\x1dShampoo', 'Jasmine\x1dGreen-Tea', 'Detergent\x1dHousehold'], 'query': ['Republic of China', 'Feature|Generation', 'The tool is very useful'], 'title': ['China', 'Feature|Transformation', 'The tool is useful']}
status: True
------------------------ outputs ---------------------------
goods_id values: [89031, 84826, 50041]
goods_id lengths: [1, 1, 1]
color_pair values: [82277, 85822, 86290]
color_pair lengths: [1, 1, 1]
current_price values: [[0.5 ]
 [0.25]
 [0.78]]
usr_cate1_clk_cnt_1d values: [[1.]
 [3.]
 [5.]]
recommend_match values: [[1.]
 [1.]
 [0.]]
query_title_match_ratio values: [[1. ]
 [0.5]
 [1. ]]
title_term_match_ratio values: [[0.2]
 [0.2]
 [0.2]]
term_proximity_min_cover values: [[3.]
 [0.]
 [2.]]
edit_distance values: [[12.]
 [12.]
 [11.]]
------------------------  debug log ---------------------------
input data: ['cate1:c1 | color:white | current_price:0.5 | goods_id:110 | query:Republic of China | query_color:red | query_recommend:High-quality | query_terms:Clear\x1dShampoo | recommend:High-quality | title:China | title_terms:Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G | usr_cate1_clk_cnt_1d:c1:1\x1dc2:13\x1dc3:5', 'cate1:c2 | color:black | current_price:0.25 | goods_id:111 | query:Feature|Generation | query_color:pink | query_recommend:Brand | query_terms:Jasmine\x1dGreen-Tea | recommend:Brand | title:Feature|Transformation | title_terms:Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles | usr_cate1_clk_cnt_1d:c1:5\x1dc2:3\x1dc4:4.5', 'cate1:c3 | color:pink | current_price:0.78 | goods_id:112 | query:The tool is very useful | query_color:gray | query_recommend:Premium | query_terms:Detergent\x1dHousehold | recommend:Carefully-selected | title:The tool is useful | title_terms:Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap | usr_cate1_clk_cnt_1d:c3:5\x1dc5:9\x1dc7:7']
generated feature: ['goods_id:89031 | color_pair:82277 | current_price:0.5 | usr_cate1_clk_cnt_1d:1 | recommend_match:1 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:3 | edit_distance:12', 'goods_id:84826 | color_pair:85822 | current_price:0.25 | usr_cate1_clk_cnt_1d:3 | recommend_match:1 | query_title_match_ratio:0.5 | title_term_match_ratio:0.2 | term_proximity_min_cover:0 | edit_distance:12', 'goods_id:50041 | color_pair:86290 | current_price:0.78 | usr_cate1_clk_cnt_1d:5 | recommend_match:0 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:2 | edit_distance:11']

How to verify the output:

  • `meta info` block: lists the detected input columns by side (user, item, context) and the offline table schema. Make sure that every field your features reference appears here.

  • `status: True`: confirms that FG ran without errors. If it is False, check the debug log for the input that failed.

  • `outputs` block: sparse features (ID and combo) produce values and lengths arrays; dense features (raw, lookup, overlap) produce a dense_values matrix.

  • `generated feature`: a readable per-sample view of the results, useful for quickly inspecting individual records.
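
For sparse features, the sample output suggests that values is a flat array and that lengths records how many of those values belong to each sample. The sketch below regroups the two into one list per sample under that assumption. split_by_lengths is a hypothetical helper that works on plain Python lists; it is not a pyfg function.

```python
def split_by_lengths(values, lengths):
    """Hypothetical helper: regroup a flat value list into one list per sample."""
    if sum(lengths) != len(values):
        raise ValueError("lengths must sum to the number of values")
    rows, start = [], 0
    for n in lengths:
        rows.append(list(values[start:start + n]))
        start += n
    return rows


if __name__ == "__main__":
    # goods_id values and lengths from the sample output above.
    print(split_by_lengths([89031, 84826, 50041], [1, 1, 1]))
```

With the goods_id values from the sample output, every sample receives exactly one hashed ID. A length of 0 would yield an empty list for that sample.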

Run pyfg on MaxCompute

For a guide to running FG in a DataWorks offline task, see Use FG in offline tasks.

To submit a task to MaxCompute from your local machine:

  1. Install pyfg101 in a Python 3.7 environment:

    pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg101-1.0.1-cp37-cp37m-linux_x86_64.whl

  2. Run pyfg:

    #!/usr/bin/env python
    from pyfg101 import run_on_odps
    
    fg_task = run_on_odps.FgTask(
        '${input_table}',             # Name of the MaxCompute input table
        '${output_table}',            # Name of the MaxCompute output table
        'fg.json',                    # Path to your FG configuration file
        batch_size=128,
        force_delete_output_table=True,
        force_update_resource=False,
        output_merged_str=False,
        debug=False)
    fg_task.add_sql_setting('odps.stage.mapper.split.size', 256)
    
    
    if __name__ == '__main__':
        import os
        from odps import ODPS
        odps = ODPS(
            os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
            os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
            project='pai_rec_test_dev',
            endpoint='http://service.cn-beijing.maxcompute.aliyun.com/api',
        )
        fg_task.run(odps)
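
The FgTask call above reads the feature configuration from a file, here fg.json. The following is a minimal sketch of producing that file from a Python dict with the standard json module. It assumes that the file uses the same "features" layout as the config dict in the local scripts, which this guide does not show explicitly; write_fg_config is a hypothetical helper.

```python
import json


def write_fg_config(config, path="fg.json"):
    """Hypothetical helper: serialize a feature config dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # Python False/True become JSON false/true on output.
        json.dump(config, f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    config = {"features": [{
        "feature_name": "goods_id",
        "feature_type": "id_feature",
        "value_type": "string",
        "expression": "item:goods_id",
        "default_value": "-1024",
        "need_prefix": False,
    }]}
    print("wrote", write_fg_config(config))
```

Writing the file from the same dict that you tested locally helps keep the offline configuration from drifting away from the one you verified.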

Next steps