Feature Generator (FG) transforms raw user and item data into features that a model can consume at inference time. This guide walks you through testing an FG configuration locally on Linux before you deploy it in a production pipeline.
By the end of this guide, you will:
Install pyfg in your Python environment
Define a feature configuration and run FG locally
Verify the inputs and outputs for both the EasyRec (TensorFlow) and TorchEasyRec (PyTorch) processors
Submit an offline FG task to MaxCompute
Prerequisites
Before you begin, make sure you have:
A Linux machine (local FG testing requires Linux)
Python 3.10, 3.11, or 3.12 installed
(For MaxCompute tasks) Python 3.7 and valid Alibaba Cloud credentials stored in the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET
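The prerequisites above can be checked with a short script before anything is installed. This is a minimal sketch that uses only the standard library; the function name check_prerequisites is hypothetical and is not part of pyfg.

```python
import os
import platform
import sys

SUPPORTED_LOCAL = {(3, 10), (3, 11), (3, 12)}  # versions listed for local FG
REQUIRED_ENV = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def check_prerequisites(system, version, env, for_maxcompute=False):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if system != "Linux":
        problems.append("Linux is required, found %s" % system)
    major_minor = tuple(version[:2])
    if for_maxcompute:
        # MaxCompute submission: Python 3.7 plus credentials in the environment.
        if major_minor != (3, 7):
            problems.append("MaxCompute tasks need Python 3.7")
        for name in REQUIRED_ENV:
            if not env.get(name):
                problems.append("environment variable %s is not set" % name)
    elif major_minor not in SUPPORTED_LOCAL:
        problems.append("local FG needs Python 3.10, 3.11, or 3.12")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(platform.system(), sys.version_info, os.environ)
    print("OK" if not issues else "\n".join(issues))
```

Pass for_maxcompute=True when checking the Python 3.7 environment used for task submission.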
Install pyfg
Install pyfg from the wheel file that matches your Python version.
Python 3.11 (recommended):
pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp311-cp311-linux_x86_64.whl
Python 3.10:
pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp310-cp310-linux_x86_64.whl
Python 3.12:
pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-1.0.1-cp312-cp312-linux_x86_64.whl
Version 1.0.1 introduces serialization improvements for feature operators and adds support for using any feature operator as a sub-feature of a sequence feature.
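The three wheel URLs differ only in the CPython tag, so the right one can be derived from the running interpreter. The sketch below assumes that naming pattern holds for exactly the three versions listed; pyfg_wheel_url is a hypothetical helper, not something pyfg ships.

```python
import sys

BASE_URL = "http://tzrec.oss-cn-beijing.aliyuncs.com/third_party"
PYFG_VERSION = "1.0.1"
SUPPORTED = {(3, 10), (3, 11), (3, 12)}


def pyfg_wheel_url(major, minor):
    """Build the wheel URL for a supported CPython version."""
    if (major, minor) not in SUPPORTED:
        raise ValueError("no pyfg wheel listed for Python %d.%d" % (major, minor))
    tag = "cp%d%d" % (major, minor)  # e.g. cp311
    return "%s/pyfg-%s-%s-%s-linux_x86_64.whl" % (BASE_URL, PYFG_VERSION, tag, tag)


if __name__ == "__main__":
    try:
        print("pip install " + pyfg_wheel_url(*sys.version_info[:2]))
    except ValueError as exc:
        print(exc)
```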
Run FG locally
Choose the section that matches your model's inference processor.
EasyRec processor (TensorFlow)
The following script defines a feature configuration and runs FG with the EasyRec processor. The main differences from the PyTorch version: value_type uses double for numeric features, and the handler is invoked as handler(inputs) instead of handler.process(inputs).
#!/usr/bin/env python
import os
import pyfg
config = {
    "features": [
        {
            "feature_name": "goods_id",
            "feature_type": "id_feature",  # ID feature: maps a raw field to a hash bucket or a string ID
            "value_type": "string",
            "expression": "item:goods_id",  # Format: {side}:{field_name}; side is user, item, or context
            "default_value": "-1024",
            "need_prefix": False,
            "value_dimension": 1  # Output dimension; controls the number of values generated per sample
        },
        {
            "feature_name": "color_pair",
            "feature_type": "combo_feature",  # Combo feature: crosses several fields into a single feature
            "value_type": "string",
            "expression": ["user:query_color", "item:color"],
            "default_value": "",
            "need_prefix": False,
            "value_dimension": 1
        },
        {
            "feature_name": "current_price",
            "feature_type": "raw_feature",  # Raw feature: passes a numeric field through without transformation
            "value_type": "double",
            "expression": "item:current_price",
            "default_value": "0",
            "need_prefix": False
        },
        {
            "feature_name": "usr_cate1_clk_cnt_1d",
            "feature_type": "lookup_feature",  # Lookup feature: looks up a value in a map field using a key field
            "map": "user:usr_cate1_clk_cnt_1d",
            "key": "item:cate1",
            "need_discrete": False,
            "need_key": False,
            "default_value": "0",
            "combiner": "max",  # Aggregation method when several keys match; options: max, min, sum, mean
            "need_prefix": False,
            "value_type": "double"
        },
        {
            "feature_name": "recommend_match",
            "feature_type": "overlap_feature",  # Overlap feature: measures term overlap between a query field and an item field
            "method": "is_contain",  # Returns 1 if any query term appears in the item field, 0 otherwise
            "query": "user:query_recommend",
            "title": "item:recommend",
            "default_value": "0"
        },
        {
            "feature_name": "query_title_match_ratio",
            "feature_type": "overlap_feature",
            "method": "query_common_ratio",  # Share of query terms found in the item title
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        },
        {
            "feature_name": "title_term_match_ratio",
            "feature_type": "overlap_feature",
            "method": "title_common_ratio",  # Share of item title terms found in the query
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        },
        {
            "feature_name": "term_proximity_min_cover",
            "feature_type": "overlap_feature",
            "method": "proximity_min_cover",  # Minimum window size that covers all query terms in the item title
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        }
    ]
}
if __name__ == '__main__':
    handler = pyfg.FgHandler(config)
    print("------------------------ meta info ---------------------------")
    print("user side inputs:", handler.user_inputs())
    print("item side inputs:", handler.item_inputs())
    print("context side inputs:", handler.context_inputs())
    print("offline table schema:", handler.table_schema())
    features = handler.all_feature_names()
    print("all generated features:", features)
    inputs = {
        "goods_id": ["110", "111", "112"],
        "query_color": ["red", "pink", "gray"],
        "color": ["white", "black", "pink"],
        "current_price": [0.5, 0.25, 0.78],
        "usr_cate1_clk_cnt_1d": [
            {"c1": 1, "c2": 13, "c3": 5},
            {"c1": 5, "c2": 3, "c4": 4.5},
            {"c7": 7, "c5": 9, "c3": 5}
        ],
        "cate1": ["c1", "c2", "c3"],
        "query_recommend": ["High-quality", "Brand", "Premium"],
        "recommend": ["High-quality", "Brand", "Carefully-selected"],
        "title_terms": [
            "Clear\035Men\035Shampoo\035Anti-dandruff\035Refreshing\035Shampoo-cream\035Men\035Vitality\035Sport\035100G",
            "Master-Kong\035Jasmine\035Honey-Tea\035330ml*12\035bottles",
            "Diao-Brand\035Detergent\035Household\035Large-barrel\035Food-grade\035Dishwashing-liquid\035Fruit-and-vegetable\035Cleaner\035Value-pack\035Dish-soap"
        ],
        "query_terms": [
            "Clear\035Shampoo",
            "Jasmine\035Green-Tea",
            "Detergent\035Household"
        ]
    }
    outputs, status = handler(inputs)  # EasyRec: call the handler directly
    print("status:", status.ok())
    print("outputs:", outputs)
    # Debug log: input data & generated features
    print("------------------------ debug log ---------------------------")
    input_str = handler.to_input_str(inputs)
    print("input data:", input_str)
    print()
    generated_str = handler.to_debug_str_v2(outputs)  # EasyRec uses to_debug_str_v2
    print("generated feature:", generated_str)
The expected output is:
------------------------ meta info ---------------------------
user side inputs: {'usr_cate1_clk_cnt_1d', 'query_terms', 'query_recommend', 'query_color', 'query'}
item side inputs: {'title_terms', 'title', 'recommend', 'goods_id', 'color', 'cate1', 'current_price'}
context side inputs: set()
offline table schema: {'usr_cate1_clk_cnt_1d': 'double', 'query_title_match_ratio': 'float', 'current_price': 'double', 'term_proximity_min_cover': 'float', 'title_term_match_ratio': 'float', 'edit_distance': 'int', 'recommend_match': 'float', 'goods_id': 'string', 'color_pair': 'string'}
all generated features: ['goods_id', 'color_pair', 'current_price', 'usr_cate1_clk_cnt_1d', 'recommend_match', 'query_title_match_ratio', 'title_term_match_ratio', 'term_proximity_min_cover', 'edit_distance']
inputs {'goods_id': ['110', '111', '112'], 'query_color': ['red', 'pink', 'gray'], 'color': ['white', 'black', 'pink'], 'current_price': [0.5, 0.25, 0.78], 'usr_cate1_clk_cnt_1d': [{'c1': 1, 'c2': 13, 'c3': 5}, {'c1': 5, 'c2': 3, 'c4': 4.5}, {'c7': 7, 'c5': 9, 'c3': 5}], 'cate1': ['c1', 'c2', 'c3'], 'query_recommend': ['High-quality', 'Brand', 'Premium'], 'recommend': ['High-quality', 'Brand', 'Carefully-selected'], 'title_terms': ['Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G', 'Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles', 'Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap'], 'query_terms': ['Clear\x1dShampoo', 'Jasmine\x1dGreen-Tea', 'Detergent\x1dHousehold'], 'query': ['Republic of China', 'Feature|Generation', 'The tool is very useful'], 'title': ['China', 'Feature|Transformation', 'The tool is useful']}
status: True
outputs: {'title_term_match_ratio': [0.20000000298023224, 0.20000000298023224, 0.20000000298023224], 'term_proximity_min_cover': [3.0, 0.0, 2.0], 'edit_distance': [12, 12, 11], 'query_title_match_ratio': [1.0, 0.5, 1.0], 'color_pair': ['red_white', 'pink_black', 'gray_pink'], 'goods_id': ['110', '111', '112'], 'current_price': [0.5, 0.25, 0.7799999713897705], 'usr_cate1_clk_cnt_1d': [1.0, 3.0, 5.0], 'recommend_match': [1.0, 1.0, 0.0]}
------------------------ debug log ---------------------------
input data: ['cate1:c1 | color:white | current_price:0.5 | goods_id:110 | query:Republic of China | query_color:red | query_recommend:High-quality | query_terms:Clear\x1dShampoo | recommend:High-quality | title:China | title_terms:Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G | usr_cate1_clk_cnt_1d:c1:1\x1dc2:13\x1dc3:5', 'cate1:c2 | color:black | current_price:0.25 | goods_id:111 | query:Feature|Generation | query_color:pink | query_recommend:Brand | query_terms:Jasmine\x1dGreen-Tea | recommend:Brand | title:Feature|Transformation | title_terms:Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles | usr_cate1_clk_cnt_1d:c1:5\x1dc2:3\x1dc4:4.5', 'cate1:c3 | color:pink | current_price:0.78 | goods_id:112 | query:The tool is very useful | query_color:gray | query_recommend:Premium | query_terms:Detergent\x1dHousehold | recommend:Carefully-selected | title:The tool is useful | title_terms:Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap | usr_cate1_clk_cnt_1d:c3:5\x1dc5:9\x1dc7:7']
generated feature: ['goods_id:110 | color_pair:red_white | current_price:0.5 | usr_cate1_clk_cnt_1d:1 | recommend_match:1 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:3 | edit_distance:12', 'goods_id:111 | color_pair:pink_black | current_price:0.25 | usr_cate1_clk_cnt_1d:3 | recommend_match:1 | query_title_match_ratio:0.5 | title_term_match_ratio:0.2 | term_proximity_min_cover:0 | edit_distance:12', 'goods_id:112 | color_pair:gray_pink | current_price:0.78 | usr_cate1_clk_cnt_1d:5 | recommend_match:0 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:2 | edit_distance:11']
How to verify the output:
The `meta info` block: lists the input columns detected for each side (user, item, context) and the offline table schema. Make sure all of your feature fields appear here.
`status: True`: confirms that FG ran without errors. If it is False, check the debug log for the inputs that failed.
The `outputs` dict: contains the generated feature values. The value_dimension parameter in each feature's configuration controls the number of values generated per sample.
`generated feature`: a readable per-sample view of the results, useful for quickly inspecting individual records.
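Several of the expected values can be audited by hand. The sketch below recomputes them in plain Python without pyfg. Its semantics are inferred from the config comments and the sample output above, not from pyfg's implementation, so every function here is a hypothetical reference. In particular, the `_` joiner for the combo feature and counting repeated title terms separately are assumptions chosen because they reproduce the sample values.

```python
SEP = "\035"  # multi-value separator used in the sample inputs

QUERIES = ["Clear\035Shampoo", "Jasmine\035Green-Tea", "Detergent\035Household"]
TITLES = [
    "Clear\035Men\035Shampoo\035Anti-dandruff\035Refreshing\035Shampoo-cream\035Men\035Vitality\035Sport\035100G",
    "Master-Kong\035Jasmine\035Honey-Tea\035330ml*12\035bottles",
    "Diao-Brand\035Detergent\035Household\035Large-barrel\035Food-grade\035Dishwashing-liquid\035Fruit-and-vegetable\035Cleaner\035Value-pack\035Dish-soap",
]


def combo(*fields):
    """Cross several fields into one string (assumed joiner: '_')."""
    return "_".join(fields)


def lookup(mapping, key, default=0.0):
    """Look up a single key in a map-valued field."""
    return float(mapping.get(key, default))


def query_common_ratio(query, title):
    """Share of query terms that also occur in the title."""
    q, t = query.split(SEP), set(title.split(SEP))
    return sum(term in t for term in q) / len(q)


def title_common_ratio(query, title):
    """Share of title terms (repeats counted) that also occur in the query."""
    q, t = set(query.split(SEP)), title.split(SEP)
    return sum(term in q for term in t) / len(t)


def proximity_min_cover(query, title):
    """Length of the shortest title window holding every query term, else 0."""
    need, t = set(query.split(SEP)), title.split(SEP)
    best = 0
    for start in range(len(t)):
        seen = set()
        for end in range(start, len(t)):
            if t[end] in need:
                seen.add(t[end])
            if seen == need:
                width = end - start + 1
                best = width if best == 0 else min(best, width)
                break
    return best


if __name__ == "__main__":
    print(combo("red", "white"), lookup({"c1": 1, "c2": 13, "c3": 5}, "c1"))
    for q, t in zip(QUERIES, TITLES):
        print(query_common_ratio(q, t), title_common_ratio(q, t), proximity_min_cover(q, t))
```

With the sample inputs, the three ratios and window sizes agree with query_title_match_ratio, title_term_match_ratio, and term_proximity_min_cover in the expected output. A mismatch on your own data would more likely point to a difference in these assumed semantics than to a bug in pyfg.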
TorchEasyRec processor (PyTorch)
The TorchEasyRec processor differs from EasyRec in three ways:
Numeric value_type uses float instead of double
hash_bucket_size must be specified for ID and combo features
An additional custom feature, edit_distance, is included, backed by a shared library
The invocation also differs: use handler.process(inputs) instead of handler(inputs), and to_debug_str instead of to_debug_str_v2.
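The first two differences listed above amount to a mechanical change to the config dict. As a rough sketch, an EasyRec-style config could be adapted as follows; to_torch_easyrec is a hypothetical helper (not a pyfg API), and the default bucket size of 100000 is only the value used in this guide's sample, not a recommendation.

```python
import copy

NEEDS_HASH_BUCKET = {"id_feature", "combo_feature"}


def to_torch_easyrec(config, hash_bucket_size=100000):
    """Return an adjusted copy of an EasyRec-style config; the input is not modified."""
    converted = copy.deepcopy(config)
    for feature in converted["features"]:
        # Numeric features: double becomes float.
        if feature.get("value_type") == "double":
            feature["value_type"] = "float"
        # ID and combo features: add a hash space unless one is already set.
        if feature.get("feature_type") in NEEDS_HASH_BUCKET:
            feature.setdefault("hash_bucket_size", hash_bucket_size)
    return converted


if __name__ == "__main__":
    easyrec_config = {
        "features": [
            {"feature_name": "goods_id", "feature_type": "id_feature", "value_type": "string"},
            {"feature_name": "current_price", "feature_type": "raw_feature", "value_type": "double"},
        ]
    }
    for feature in to_torch_easyrec(easyrec_config)["features"]:
        print(feature)
```

The helper does not add the edit_distance custom feature; that entry has to be written by hand because it depends on the operator library.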
#!/usr/bin/env python
import pyfg
config = {
    "features": [
        {
            "feature_name": "goods_id",
            "feature_type": "id_feature",
            "value_type": "string",
            "expression": "item:goods_id",
            "default_value": "-1024",
            "need_prefix": False,
            "hash_bucket_size": 100000,  # Required for PyTorch: size of the hash space for the ID feature
            "value_dimension": 1
        },
        {
            "feature_name": "color_pair",
            "feature_type": "combo_feature",
            "value_type": "string",
            "expression": ["user:query_color", "item:color"],
            "default_value": "",
            "need_prefix": False,
            "hash_bucket_size": 100000,  # Required for PyTorch: size of the hash space for the combo feature
            "value_dimension": 1
        },
        {
            "feature_name": "current_price",
            "feature_type": "raw_feature",
            "value_type": "float",  # PyTorch uses float (not double)
            "expression": "item:current_price",
            "default_value": "0",
            "need_prefix": False
        },
        {
            "feature_name": "usr_cate1_clk_cnt_1d",
            "feature_type": "lookup_feature",
            "map": "user:usr_cate1_clk_cnt_1d",
            "key": "item:cate1",
            "need_discrete": False,
            "need_key": False,
            "default_value": "0",
            "combiner": "max",
            "need_prefix": False,
            "value_type": "float"  # PyTorch uses float (not double)
        },
        {
            "feature_name": "recommend_match",
            "feature_type": "overlap_feature",
            "method": "is_contain",
            "query": "user:query_recommend",
            "title": "item:recommend",
            "default_value": "0"
        },
        {
            "feature_name": "query_title_match_ratio",
            "feature_type": "overlap_feature",
            "method": "query_common_ratio",
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        },
        {
            "feature_name": "title_term_match_ratio",
            "feature_type": "overlap_feature",
            "method": "title_common_ratio",
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        },
        {
            "feature_name": "term_proximity_min_cover",
            "feature_type": "overlap_feature",
            "method": "proximity_min_cover",
            "query": "user:query_terms",
            "title": "item:title_terms",
            "default_value": "0"
        },
        {
            "feature_name": "edit_distance",
            "feature_type": "custom_feature",  # Custom feature: uses an operator from an external library
            "operator_name": "EditDistance",
            "operator_lib_file": "pyfg/lib/libedit_distance.so",
            "expression": ["user:query", "item:title"],
            "default_value": "0",
            "value_type": "int32",
            "value_dimension": 1,
            "encoding": "utf-8",
            "normalizer": "method=expression,expr=x+10"  # Post-processing expression applied to the raw operator output
        }
    ]
}
if __name__ == '__main__':
    handler = pyfg.FgHandler(config)
    print("------------------------ meta info ---------------------------")
    print("user side inputs:", handler.user_inputs())
    print("item side inputs:", handler.item_inputs())
    print("context side inputs:", handler.context_inputs())
    print("offline table schema:", handler.table_schema())
    features = handler.all_feature_names()
    print("all generated features:", features)
    inputs = {
        "goods_id": ["110", "111", "112"],
        "query_color": ["red", "pink", "gray"],
        "color": ["white", "black", "pink"],
        "current_price": [0.5, 0.25, 0.78],
        "usr_cate1_clk_cnt_1d": [
            {"c1": 1, "c2": 13, "c3": 5},
            {"c1": 5, "c2": 3, "c4": 4.5},
            {"c7": 7, "c5": 9, "c3": 5}
        ],
        "cate1": ["c1", "c2", "c3"],
        "query_recommend": ["High-quality", "Brand", "Premium"],
        "recommend": ["High-quality", "Brand", "Carefully-selected"],
        "title_terms": [
            "Clear\035Men\035Shampoo\035Anti-dandruff\035Refreshing\035Shampoo-cream\035Men\035Vitality\035Sport\035100G",
            "Master-Kong\035Jasmine\035Honey-Tea\035330ml*12\035bottles",
            "Diao-Brand\035Detergent\035Household\035Large-barrel\035Food-grade\035Dishwashing-liquid\035Fruit-and-vegetable\035Cleaner\035Value-pack\035Dish-soap"
        ],
        "query_terms": [
            "Clear\035Shampoo",
            "Jasmine\035Green-Tea",
            "Detergent\035Household"
        ],
        "query": ["Republic of China", "Feature|Generation", "The tool is very useful"],
        "title": ["China", "Feature|Transformation", "The tool is useful"]
    }
    print("inputs", inputs)
    outputs, status = handler.process(inputs)  # TorchEasyRec: use handler.process()
    print("status:", status.ok())
    print("------------------------ outputs ---------------------------")
    for feature in features:
        feat = outputs[feature]
        if feat.feat_mode in (pyfg.FeatMode.Sparse, pyfg.FeatMode.SeqSparse):
            print(feature, "values:", feat.values)
            print(feature, "lengths:", feat.lengths)
        elif feat.feat_mode in (pyfg.FeatMode.Dense, pyfg.FeatMode.SeqDense):
            print(feature, "values:", feat.dense_values)
    # Debug log: input data & generated features
    print("------------------------ debug log ---------------------------")
    input_str = handler.to_input_str(inputs)
    print("input data:", input_str)
    print()
    generated_str = handler.to_debug_str(outputs, ',')  # TorchEasyRec uses to_debug_str (not v2)
    print("generated feature:", generated_str)
The expected output is:
------------------------ meta info ---------------------------
user side inputs: {'query_terms', 'usr_cate1_clk_cnt_1d', 'query_color', 'query', 'query_recommend'}
item side inputs: {'cate1', 'color', 'goods_id', 'title_terms', 'title', 'recommend', 'current_price'}
context side inputs: set()
offline table schema: {'goods_id': 'bigint', 'color_pair': 'bigint', 'title_term_match_ratio': 'float', 'usr_cate1_clk_cnt_1d': 'float', 'term_proximity_min_cover': 'float', 'recommend_match': 'float', 'query_title_match_ratio': 'float', 'current_price': 'float', 'edit_distance': 'int'}
all generated features: ['goods_id', 'color_pair', 'current_price', 'usr_cate1_clk_cnt_1d', 'recommend_match', 'query_title_match_ratio', 'title_term_match_ratio', 'term_proximity_min_cover', 'edit_distance']
inputs {'goods_id': ['110', '111', '112'], 'query_color': ['red', 'pink', 'gray'], 'color': ['white', 'black', 'pink'], 'current_price': [0.5, 0.25, 0.78], 'usr_cate1_clk_cnt_1d': [{'c1': 1, 'c2': 13, 'c3': 5}, {'c1': 5, 'c2': 3, 'c4': 4.5}, {'c7': 7, 'c5': 9, 'c3': 5}], 'cate1': ['c1', 'c2', 'c3'], 'query_recommend': ['High-quality', 'Brand', 'Premium'], 'recommend': ['High-quality', 'Brand', 'Carefully-selected'], 'title_terms': ['Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G', 'Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles', 'Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap'], 'query_terms': ['Clear\x1dShampoo', 'Jasmine\x1dGreen-Tea', 'Detergent\x1dHousehold'], 'query': ['Republic of China', 'Feature|Generation', 'The tool is very useful'], 'title': ['China', 'Feature|Transformation', 'The tool is useful']}
status: True
------------------------ outputs ---------------------------
goods_id values: [89031, 84826, 50041]
goods_id lengths: [1, 1, 1]
color_pair values: [82277, 85822, 86290]
color_pair lengths: [1, 1, 1]
current_price values: [[0.5 ]
 [0.25]
 [0.78]]
usr_cate1_clk_cnt_1d values: [[1.]
 [3.]
 [5.]]
recommend_match values: [[1.]
 [1.]
 [0.]]
query_title_match_ratio values: [[1. ]
 [0.5]
 [1. ]]
title_term_match_ratio values: [[0.2]
 [0.2]
 [0.2]]
term_proximity_min_cover values: [[3.]
 [0.]
 [2.]]
edit_distance values: [[12.]
 [12.]
 [11.]]
------------------------ debug log ---------------------------
input data: ['cate1:c1 | color:white | current_price:0.5 | goods_id:110 | query:Republic of China | query_color:red | query_recommend:High-quality | query_terms:Clear\x1dShampoo | recommend:High-quality | title:China | title_terms:Clear\x1dMen\x1dShampoo\x1dAnti-dandruff\x1dRefreshing\x1dShampoo-cream\x1dMen\x1dVitality\x1dSport\x1d100G | usr_cate1_clk_cnt_1d:c1:1\x1dc2:13\x1dc3:5', 'cate1:c2 | color:black | current_price:0.25 | goods_id:111 | query:Feature|Generation | query_color:pink | query_recommend:Brand | query_terms:Jasmine\x1dGreen-Tea | recommend:Brand | title:Feature|Transformation | title_terms:Master-Kong\x1dJasmine\x1dHoney-Tea\x1d330ml*12\x1dbottles | usr_cate1_clk_cnt_1d:c1:5\x1dc2:3\x1dc4:4.5', 'cate1:c3 | color:pink | current_price:0.78 | goods_id:112 | query:The tool is very useful | query_color:gray | query_recommend:Premium | query_terms:Detergent\x1dHousehold | recommend:Carefully-selected | title:The tool is useful | title_terms:Diao-Brand\x1dDetergent\x1dHousehold\x1dLarge-barrel\x1dFood-grade\x1dDishwashing-liquid\x1dFruit-and-vegetable\x1dCleaner\x1dValue-pack\x1dDish-soap | usr_cate1_clk_cnt_1d:c3:5\x1dc5:9\x1dc7:7']
generated feature: ['goods_id:89031 | color_pair:82277 | current_price:0.5 | usr_cate1_clk_cnt_1d:1 | recommend_match:1 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:3 | edit_distance:12', 'goods_id:84826 | color_pair:85822 | current_price:0.25 | usr_cate1_clk_cnt_1d:3 | recommend_match:1 | query_title_match_ratio:0.5 | title_term_match_ratio:0.2 | term_proximity_min_cover:0 | edit_distance:12', 'goods_id:50041 | color_pair:86290 | current_price:0.78 | usr_cate1_clk_cnt_1d:5 | recommend_match:0 | query_title_match_ratio:1 | title_term_match_ratio:0.2 | term_proximity_min_cover:2 | edit_distance:11']
How to verify the output:
The `meta info` block: lists the input columns detected for each side (user, item, context) and the offline table schema. Make sure all of your feature fields appear here.
`status: True`: confirms that FG ran without errors. If it is False, check the debug log for the inputs that failed.
The `outputs` block: sparse features (ID and combo) produce values and lengths arrays; dense features (raw, lookup, overlap) produce a dense_values matrix.
`generated feature`: a readable per-sample view of the results, useful for quickly inspecting individual records.
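For sparse features, the values and lengths arrays read most naturally once they are regrouped per sample. The sketch below assumes the usual jagged layout, where values is a flat concatenation and lengths gives the number of values per sample. The sample output is consistent with that layout but does not state it outright, and split_by_lengths is a hypothetical helper rather than a pyfg function.

```python
def split_by_lengths(values, lengths):
    """Cut a flat list of values into one list per sample."""
    if sum(lengths) != len(values):
        raise ValueError("lengths do not add up to len(values)")
    rows, offset = [], 0
    for n in lengths:
        rows.append(list(values[offset:offset + n]))
        offset += n
    return rows


if __name__ == "__main__":
    # goods_id from the expected output: one hashed ID per sample.
    print(split_by_lengths([89031, 84826, 50041], [1, 1, 1]))
    # A made-up multi-value feature: 2 values, then none, then 2 values.
    print(split_by_lengths([11, 12, 13, 14], [2, 0, 2]))
```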
Run pyfg on MaxCompute
For instructions on running FG in a DataWorks offline task, see Use FG in offline tasks.
To submit a task to MaxCompute from your local machine:
Install pyfg101 in a Python 3.7 environment:
pip install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg101-1.0.1-cp37-cp37m-linux_x86_64.whl
Run pyfg:
#!/usr/bin/env python
from pyfg101 import run_on_odps
fg_task = run_on_odps.FgTask(
    '${input_table}',   # Name of the MaxCompute input table
    '${output_table}',  # Name of the MaxCompute output table
    'fg.json',          # Path to your FG configuration file
    batch_size=128,
    force_delete_output_table=True,
    force_update_resource=False,
    output_merged_str=False,
    debug=False)
fg_task.add_sql_setting('odps.stage.mapper.split.size', 256)
if __name__ == '__main__':
    import os
    from odps import ODPS
    odps = ODPS(
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='pai_rec_test_dev',
        endpoint='http://service.cn-beijing.maxcompute.aliyun.com/api',
    )
    fg_task.run(odps)
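The task above reads its feature definitions from fg.json, while the local tests keep them in a Python dict. One possible way to bridge the two is to dump the dict with the standard json module. This assumes fg.json uses the same top-level structure as the config dict in the local scripts, which this guide does not confirm; write_fg_config is a hypothetical helper.

```python
import json
import os


def write_fg_config(config, path="fg.json"):
    """Serialize a config dict to JSON (Python False/True become false/true)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return os.path.abspath(path)


if __name__ == "__main__":
    sample = {
        "features": [
            {
                "feature_name": "goods_id",
                "feature_type": "id_feature",
                "value_type": "string",
                "expression": "item:goods_id",
                "default_value": "-1024",
                "need_prefix": False,
            }
        ]
    }
    print("wrote", write_fg_config(sample))
```

Running the same config locally first, then exporting it, keeps the offline task and the local test on identical feature definitions.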
Next steps
Use FG in offline tasks: run FG at scale in DataWorks
Built-in feature operators: a complete reference for the feature types and the serialization improvements in version 1.0.1