This topic uses a FeatureStore feature table as an example. It describes the entire process, from creating and registering a feature table to publishing it online. This helps you understand how to build and launch a complete recommendation system from scratch.
Background information
A recommendation system suggests personalized content or products to users based on their interests and preferences. Extracting and configuring feature information for users and items is a critical part of any recommendation system. This solution shows you how to build a recommendation system using FeatureStore. It also shows how FeatureStore interacts with other recommendation system products through various software development kits (SDKs). The process includes creating a project in FeatureStore, registering a feature table, creating a model feature, exporting a training sample table, synchronizing features from an offline store to an online store, training a model with the sample table, deploying an Elastic Algorithm Service (EAS) model service, and using FeatureStore configurations in PAI-REC.
If you are familiar with code, you can run the Python Notebook to view the configuration process. For more information, see DSW Gallery.
For more information about FeatureStore, see FeatureStore overview.
If you have any questions during configuration or use, you can search for the DingTalk group number 34415007523 to join the group and consult with our technical staff.
Prerequisites
Before you begin, make sure that you have completed the following preparations.
| Required product | Action |
| --- | --- |
| Platform for AI (PAI) | Activate PAI and create a PAI workspace. For more information, see Activate PAI and create a default workspace. |
| MaxCompute | Activate MaxCompute and create a MaxCompute project to serve as the offline store. |
| FeatureDB | Activate FeatureDB to serve as the online store. You can also activate it when you configure the FeatureDB data source in Step 1. |
| DataWorks | Activate DataWorks, create a workspace, and create an exclusive resource group for scheduling. |
| Object Storage Service (OSS) | Activate OSS. For more information, see Quick Start in the console. |
Step 1: Prepare data
Sync data tables
For a typical recommendation scenario, you need to prepare three data tables: a user feature table, an item feature table, and a label table.
To make it easier to follow this topic, we have prepared simulated user, item, and label tables in the pai_online_project project in MaxCompute. The user and item tables each have about 100,000 data entries per partition and occupy about 70 MB in MaxCompute. The label table has about 450,000 data entries per partition and occupies about 5 MB in MaxCompute.
You need to run SQL commands in DataWorks to sync the user, item, and label tables from the pai_online_project project to your own MaxCompute project. The procedure is as follows:
Log on to the DataWorks console.
In the navigation pane on the left, click Data Development & O&M > Data Development.
Select the DataWorks workspace that you created and click Go to Data Studio.
Hover over Create, and choose Create Node > MaxCompute > ODPS SQL. In the page that appears, configure the node parameters.
| Parameter | Suggested value |
| --- | --- |
| Node Type | ODPS SQL |
| Path | Business Flow/Workflow/MaxCompute |
| Name | Enter a custom name. |
Click Confirm.
In the new node area, run the following SQL commands to sync the user, item, and label tables from the pai_online_project project to your MaxCompute project. For Resource Group, select the exclusive resource group that you created.
Sync the user table: rec_sln_demo_user_table_preprocess_all_feature_v1
Sync the item table: rec_sln_demo_item_table_preprocess_all_feature_v1
Sync the label table: rec_sln_demo_label_table
A sketch of the sync pattern is shown after this list.
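For orientation, the following is a minimal sketch of the copy pattern for the user table; the item and label tables follow the same pattern. The CREATE TABLE ... LIKE and dynamic-partition INSERT statements are our illustration, not the verbatim SQL of the original scripts. In the ODPS SQL node you paste the plain SQL; the PyODPS form is shown here for consistency with the PyODPS 3 nodes used later in this topic.

```python
# Hedged sketch: copies one partition of the shared demo table into your
# own MaxCompute project. Repeat for the item and label tables.
src = 'pai_online_project.rec_sln_demo_user_table_preprocess_all_feature_v1'
dst = 'rec_sln_demo_user_table_preprocess_all_feature_v1'
ds = '20231022'  # partition to copy; the backfill step below covers 20231022 to 20231024

# Create the target table with the same schema if it does not exist yet.
o.execute_sql(f'CREATE TABLE IF NOT EXISTS {dst} LIKE {src}')
# Copy the partition. SELECT * includes the ds partition column, which the
# dynamic partition spec consumes.
o.execute_sql(f"INSERT OVERWRITE TABLE {dst} PARTITION (ds) SELECT * FROM {src} WHERE ds = '{ds}'")
```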
Perform data backfill for the synced tables.
In the DataWorks console, in the navigation pane on the left, click Data Development & O&M > Operation Center. Select the corresponding workspace from the drop-down list and click Enter Operation Center.
In the navigation pane on the left, click Auto Triggered Task O&M > Auto Triggered Task to go to the Auto Triggered Task page.
In the list of auto triggered tasks, click the target task to open its directed acyclic graph (DAG).
Right-click the target node and choose Data Backfill > Current Node. Select a data backfill mode.
Set Data Timestamp to a range from 2023-10-22 to 2023-10-24 and click Submit.
After you complete these steps, you can view the user table rec_sln_demo_user_table_preprocess_all_feature_v1, the item table rec_sln_demo_item_table_preprocess_all_feature_v1, and the label table rec_sln_demo_label_table in your workspace. The following operations use these three tables as examples.
Configure data sources
FeatureStore typically requires two data sources: an offline store (MaxCompute) and an online store (FeatureDB, Hologres, or TableStore). This topic uses MaxCompute and FeatureDB as examples.
Log on to the PAI console. In the navigation pane on the left, click Data Preparation > FeatureStore.
Select a workspace and click Enter FeatureStore.
Configure the MaxCompute data source.
On the Data Source tab, click New Data Source. In the page that appears, configure the MaxCompute data source parameters.
| Parameter | Suggested value |
| --- | --- |
| Type | MaxCompute |
| Name | Enter a custom name. |
| MaxCompute Project Name | Select the MaxCompute project that you created. |
Configure the FeatureDB data source.
If you have already created a FeatureDB data source, you can skip this step.
On the Data Source tab, click New Data Source. In the page that appears, configure the FeatureDB data source parameters.
| Parameter | Suggested value |
| --- | --- |
| Type | FeatureDB. If this is your first time using FeatureDB, follow the on-screen instructions to activate it. |
| Name | The name cannot be customized. The default value is feature_db. |
| Username | Set a username. |
| Password | Set a password. |
| VPC high-speed connection (Optional) | After a successful configuration, the FeatureStore SDK in a VPC can access FeatureDB directly over a PrivateLink connection. This improves data read and write performance and reduces access latency. |
| VPC | Select the VPC where your online FeatureStore service is located. |
| Zone and vSwitch | Select the vSwitch in the zone where your online service machines are located. We recommend that you select vSwitches in at least two zones to ensure high availability and stability for your business. |
Click Submit.
Install the FeatureStore Python SDK
Log on to the DataWorks console.
In the navigation pane on the left, click Resource Groups.
On the Exclusive Resource Groups tab, find the resource group whose Purpose is Data Scheduling, click the icon that corresponds to the resource group, and choose O&M Assistant.
Click Create Command. In the page that appears, configure the command parameters.
| Parameter | Suggested value |
| --- | --- |
| Command Name | Enter a custom name. This topic uses install as an example. |
| Command Type | Manual Input (pip Command Cannot Be Used To Install Third-party Packages) |
| Command Content | /home/tops/bin/pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple https://feature-store-py.oss-cn-beijing.aliyuncs.com/package/feature_store_py-2.0.2-py3-none-any.whl |
| Timeout | Enter a custom time. |
Click OK to create the command.
Click Run Command. In the page that appears, click Run.
You can click the refresh icon to view the latest execution status. When the status changes to Success, the installation is complete.
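To confirm the installation from code as well, you can run a quick import check in a PyODPS 3 node on the same resource group. This snippet is an illustrative addition, not part of the original procedure.

```python
# Verify that the FeatureStore SDK is importable on this resource group.
import feature_store_py
from feature_store_py.fs_client import FeatureStoreClient  # used throughout this topic

print('feature_store_py location:', feature_store_py.__file__)
```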
Step 2: Create and register a FeatureStore project
You can create and register a FeatureStore project using either the console or an SDK. Because you need the SDK to export the training set and sync data later, you must use the FeatureStore Python SDK even if you choose to perform the initial configuration in the console.
Method 1: Use the console
Create a FeatureStore project
Log on to the PAI console. In the navigation pane on the left, click Data Preparation > FeatureStore.
Select a workspace and click Enter FeatureStore.
On the Projects tab, click New Project. In the page that appears, configure the project parameters.
| Parameter | Suggested value |
| --- | --- |
| Name | Enter a custom name. This topic uses fs_demo as an example. |
| Description | Enter a custom description. |
| Offline Store | Select the MaxCompute data source that you created. |
| Online Store | Select the FeatureDB data source that you created. |
Click Submit to create the FeatureStore project.
Create a feature entity
On the Projects page of FeatureStore, click the project name to go to the project details page.
On the Feature Entities tab, click New Feature Entity. In the dialog box that appears, configure the parameters for the user feature entity.
| Parameter | Suggested value |
| --- | --- |
| Feature Entity Name | Enter a custom name. This topic uses user as an example. |
| Join Id | user_id |
Click Submit.
Click New Feature Entity. In the dialog box that appears, configure the parameters for the item feature entity.
| Parameter | Suggested value |
| --- | --- |
| Feature Entity Name | Enter a custom name. This topic uses item as an example. |
| Join Id | item_id |
Click Submit to create the feature entity.
Create a feature view
On the project details page, on the Feature Views tab, click New Feature View. In the dialog box that appears, configure the parameters for the user feature view.
| Parameter | Suggested value |
| --- | --- |
| View Name | Enter a custom name. This topic uses user_table_preprocess_all_feature_v1 as an example. |
| Type | Offline |
| Write Method | Use Offline Table |
| Data Source | Select the MaxCompute data source that you created. |
| Feature Table | Select the user table that you prepared: rec_sln_demo_user_table_preprocess_all_feature_v1. |
| Feature Fields | Select the user_id primary key. |
| Sync To Online Feature Table | Yes |
| Feature Entity | user |
| Feature Lifecycle | Keep the default value. |
Click Submit.
Click New Feature View. In the dialog box that appears, configure the parameters for the item feature view.
| Parameter | Suggested value |
| --- | --- |
| View Name | Enter a custom name. This topic uses item_table_preprocess_all_feature_v1 as an example. |
| Type | Offline |
| Write Method | Use Offline Table |
| Data Source | Select the MaxCompute data source that you created. |
| Feature Table | Select the item table that you prepared: rec_sln_demo_item_table_preprocess_all_feature_v1. |
| Feature Fields | Select the item_id primary key. |
| Sync To Online Feature Table | Yes |
| Feature Entity | item |
| Feature Lifecycle | Keep the default value. |
Click Submit to create the feature view.
Create a label table
On the project details page, on the Label Tables tab, click New Label Table. In the page that appears, configure the label table information.
| Parameter | Suggested value |
| --- | --- |
| Data Source | Select the MaxCompute data source that you created. |
| Table Name | Select the label table that you prepared: rec_sln_demo_label_table. |
Click Submit.
Create a model feature
On the project details page, on the Model Features tab, click New Model Feature. In the page that appears, configure the model feature parameters.
| Parameter | Suggested value |
| --- | --- |
| Model Feature Name | Enter a custom name. This topic uses fs_rank_v1 as an example. |
| Select Features | Select the user feature view and item feature view that you created. |
| Label Table Name | Select the label table that you created: rec_sln_demo_label_table. |
Click Submit to create the model feature.
On the model feature list page, click Details in the row of the model you created.
On the Basic Information tab of the Model Feature Details page that appears, you can view the Export Table Name. The name is fs_demo_fs_rank_v1_trainning_set. This table is used for subsequent feature generation and model training.
Install the FeatureStore Python SDK. For more information, see Method 2: Use the FeatureStore Python SDK.
Method 2: Use the FeatureStore Python SDK
For the specific steps to use the SDK, see DSW Gallery.
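The gallery notebook covers the full registration flow. As a quick orientation, the following minimal sketch connects to FeatureStore with the same client calls that the export script in Step 3 uses. The placeholder credentials and the cn-beijing endpoint are assumptions to replace with your own values.

```python
from feature_store_py.fs_client import FeatureStoreClient

# Connect to FeatureStore. Replace the placeholders with your credentials
# and the endpoint of your region (cn-beijing is this topic's example).
fs = FeatureStoreClient(
    access_key_id='<AccessKey ID>',
    access_key_secret='<AccessKey Secret>',
    endpoint='paifeaturestore-vpc.cn-beijing.aliyuncs.com',
)

# Look up the project and model feature registered in this topic.
project = fs.get_project('fs_demo')
model = project.get_model('fs_rank_v1')
```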
Step 3: Export the training set and train the model
Export the training set.
Log on to the DataWorks console.
In the navigation pane on the left, click Data Development & O&M > Data Development.
Select the DataWorks workspace that you created and click Go to Data Studio.
Hover over Create, and choose Create Node > MaxCompute > PyODPS 3. In the page that appears, configure the node parameters.
| Parameter | Suggested value |
| --- | --- |
| Engine Instance | Select the MaxCompute engine that you created. |
| Node Type | PyODPS 3 |
| Path | Business Flow/Workflow/MaxCompute |
| Name | Enter a custom name. |
Click Confirm.
Copy the following content to the script.
```python
from feature_store_py.fs_client import FeatureStoreClient
from feature_store_py.fs_project import FeatureStoreProject
from feature_store_py.fs_datasource import LabelInput, MaxComputeDataSource, TrainingSetOutput
from feature_store_py.fs_features import FeatureSelector
from feature_store_py.fs_config import LabelInputConfig, PartitionConfig, FeatureViewConfig
from feature_store_py.fs_config import TrainSetOutputConfig, EASDeployConfig
import datetime
import sys
from odps.accounts import StsAccount

# Resolve the current and previous partition dates from the scheduling parameter.
cur_day = args['dt']
print('cur_day = ', cur_day)
offset = datetime.timedelta(days=-1)
pre_day = (datetime.datetime.strptime(cur_day, "%Y%m%d") + offset).strftime('%Y%m%d')
print('pre_day = ', pre_day)

# Reuse the credentials of the PyODPS node to create the FeatureStore client.
access_key_id = o.account.access_id
access_key_secret = o.account.secret_access_key
sts_token = None
endpoint = 'paifeaturestore-vpc.cn-beijing.aliyuncs.com'
if isinstance(o.account, StsAccount):
    sts_token = o.account.sts_token
fs = FeatureStoreClient(access_key_id=access_key_id, access_key_secret=access_key_secret,
                        security_token=sts_token, endpoint=endpoint)
cur_project_name = 'fs_demo'
project = fs.get_project(cur_project_name)

# The label table uses the current day's partition; the feature views use the previous day's.
label_partitions = PartitionConfig(name='ds', value=cur_day)
label_input_config = LabelInputConfig(partition_config=label_partitions)

user_partitions = PartitionConfig(name='ds', value=pre_day)
feature_view_user_config = FeatureViewConfig(name='user_table_preprocess_all_feature_v1',
                                             partition_config=user_partitions)

item_partitions = PartitionConfig(name='ds', value=pre_day)
feature_view_item_config = FeatureViewConfig(name='item_table_preprocess_all_feature_v1',
                                             partition_config=item_partitions)

feature_view_config_list = [feature_view_user_config, feature_view_item_config]

# Write the training set to the current day's partition of the export table.
train_set_partitions = PartitionConfig(name='ds', value=cur_day)
train_set_output_config = TrainSetOutputConfig(partition_config=train_set_partitions)

# Export the training set for the fs_rank_v1 model feature and wait for completion.
model_name = 'fs_rank_v1'
cur_model = project.get_model(model_name)
task = cur_model.export_train_set(label_input_config, feature_view_config_list, train_set_output_config)
task.wait()
print("task_summary = ", task.task_summary)
```

In the right-side pane, click Scheduling Configuration. In the page that appears, configure the scheduling parameters.
| Parameter | Suggested value |
| --- | --- |
| Scheduling Parameters: Parameter Name | dt |
| Scheduling Parameters: Parameter Value | $[yyyymmdd-1] |
| Resource Properties: Scheduling Resource Group | Select the exclusive resource group that you created. |
| Scheduling Dependencies | Select the user table and item table that you created. |
After you configure and test the node, save and submit the node configuration.
Perform data backfill. For more information, see Sync data tables.
(Optional) View the export task.
On the FeatureStore Projects page, click a project name to open its details page.
On the Feature Entities tab, click Task Hub.
Click Details in the row of the target task to view its basic information, run configuration, and task logs.
Train the model
EasyRec is an open-source recommendation system framework that seamlessly integrates with FeatureStore to train, export, and publish models. We recommend that you use the fs_demo_fs_rank_v1_trainning_set table as input to train a model with EasyRec.
For the EasyRec open source code, see EasyRec.
For the EasyRec documentation, see EasyRec Introduction.
For the EasyRec training documentation, see EasyRec Training.
For more questions about EasyRec, you can join the Alibaba Cloud Platform for AI (PAI) consultation group on DingTalk (Group ID: 32260796) to contact us.
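To make the training step concrete, the following is a hedged sketch of launching an EasyRec training run with its standard train_eval entry point. The pipeline config file name dbmtl_on_fs.config is hypothetical; you would point its data input at the exported fs_demo_fs_rank_v1_trainning_set table, and on MaxCompute you would typically submit the job through PAI commands instead of a local run.

```python
import subprocess

# Hypothetical pipeline config whose data input references the exported
# training table fs_demo_fs_rank_v1_trainning_set.
pipeline_config = 'dbmtl_on_fs.config'

# Launch EasyRec's standard train/eval entry point.
subprocess.run(
    ['python', '-m', 'easy_rec.python.train_eval',
     '--pipeline_config_path', pipeline_config],
    check=True,
)
```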
Step 4: Publish the model
After you train and export the model, you can deploy and publish it. If you have a self-built recommendation system, FeatureStore provides Python, Go, C++, and Java SDKs to connect with various systems. You can also contact us through the DingTalk group (ID: 32260796) to discuss specific solutions. If you use Alibaba Cloud products, they can seamlessly integrate with FeatureStore to help you quickly build and launch a recommendation system.
This topic uses Alibaba Cloud products as an example to describe how to publish a model.
Step 1: Schedule data synchronization nodes
Before publishing, you must schedule the data synchronization nodes to regularly synchronize features from the offline store to the online store, which is read in real time. In this example, you schedule synchronization for the user feature table and the item feature table. The procedure is as follows.
Log on to the DataWorks console.
In the navigation pane on the left, click Data Development & O&M > Data Development.
Select the DataWorks workspace that you created and click Go to Data Studio.
Schedule synchronization for the user table.
Hover over Create, and choose Create Node > MaxCompute > PyODPS 3.
Copy the following content to the script to schedule synchronization for user_table_preprocess_all_feature_v1.
The script schedules synchronization for user_table_preprocess_all_feature_v1. A sketch of its core logic is shown below.
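The following is a minimal sketch of what such a sync script does with the FeatureStore Python SDK. The get_feature_view and publish_table calls and their arguments are our assumptions based on the SDK's naming pattern; refer to the original script or the DSW Gallery for the exact code.

```python
from feature_store_py.fs_client import FeatureStoreClient

# Same client setup as the training-set export script in Step 3.
fs = FeatureStoreClient(
    access_key_id=o.account.access_id,
    access_key_secret=o.account.secret_access_key,
    endpoint='paifeaturestore-vpc.cn-beijing.aliyuncs.com',
)
project = fs.get_project('fs_demo')

cur_day = args['dt']  # scheduling parameter, for example 20231022

# Assumption: publish_table pushes one offline partition to the online store.
feature_view = project.get_feature_view('user_table_preprocess_all_feature_v1')
task = feature_view.publish_table(partitions={'ds': cur_day})
task.wait()
print('task_summary = ', task.task_summary)
```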
In the right-side pane, click Scheduling Configuration. In the page that appears, configure the scheduling parameters.
| Parameter | Suggested value |
| --- | --- |
| Scheduling Parameters: Parameter Name | dt |
| Scheduling Parameters: Parameter Value | $[yyyymmdd-1] |
| Resource Properties: Scheduling Resource Group | Select the exclusive resource group that you created. |
| Scheduling Dependencies | Select the user table that you created. |
After you configure and test the node, save and submit the node configuration.
Perform data backfill. For more information, see Sync data tables.
Schedule synchronization for the item table.
Hover over Create, and choose Create Node > MaxCompute > PyODPS 3. In the page that appears, configure the node parameters.
Click Confirm.
Copy the following content to the script.
The script schedules synchronization for item_table_preprocess_all_feature_v1. It follows the same pattern as the user table script, with the item feature view name instead.
In the right-side pane, click Scheduling Configuration. In the page that appears, configure the scheduling parameters.
| Parameter | Suggested value |
| --- | --- |
| Scheduling Parameters: Parameter Name | dt |
| Scheduling Parameters: Parameter Value | $[yyyymmdd-1] |
| Resource Properties: Scheduling Resource Group | Select the exclusive resource group that you created. |
| Scheduling Dependencies | Select the item table that you created. |
After you configure and test the node, save and submit the node configuration.
Perform data backfill. For more information, see Sync data tables.
After the synchronization is complete, you can view the latest synchronized features in the online store (FeatureDB in this example).
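To spot-check the synchronized data from code, you can read a few keys back through the SDK. The get_online_features call and the sample user_id are assumptions for illustration; the exact online-read API may differ in your SDK version.

```python
from feature_store_py.fs_client import FeatureStoreClient

# Read one user's features from the online store to verify the sync.
fs = FeatureStoreClient(
    access_key_id='<AccessKey ID>',
    access_key_secret='<AccessKey Secret>',
    endpoint='paifeaturestore-vpc.cn-beijing.aliyuncs.com',
)
project = fs.get_project('fs_demo')
feature_view = project.get_feature_view('user_table_preprocess_all_feature_v1')

# Hypothetical join id value; use a user_id that exists in your data.
features = feature_view.get_online_features(join_ids={'user_id': ['100000142']})
print(features)
```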
Step 2: Create and deploy an EAS model service
A model service receives requests from the recommendation engine, scores the corresponding items based on the request, and returns the scores. The EasyRec Processor includes the FeatureStore C++ SDK, which allows for low-latency, high-performance feature fetching. After the EasyRec Processor fetches features using the FeatureStore C++ SDK, it sends them to the model for inference. The resulting scores are then returned to the recommendation engine.
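For reference, a direct call to the deployed service from Python looks roughly like the following, using the eas_prediction client. The endpoint and token are placeholders from the service's invocation information, and the EasyRec Processor actually expects its own protobuf request format, so treat this as a connectivity sketch only; in production, PAI-REC issues these requests for you.

```python
from eas_prediction import PredictClient, StringRequest

# Placeholders: copy the real endpoint and token from the invocation
# information page of the fs_demo_v1 service.
client = PredictClient('http://<your-endpoint>.cn-beijing.pai-eas.aliyuncs.com', 'fs_demo_v1')
client.set_token('<service token>')
client.init()

# The EasyRec Processor expects a protobuf request in production; a plain
# StringRequest is shown here only to illustrate the call pattern.
response = client.predict(StringRequest('<request payload>'))
print(response)
```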
The procedure to deploy the model service is as follows.
Log on to the DataWorks console.
In the navigation pane on the left, click Data Development & O&M > Data Development.
Select the DataWorks workspace that you created and click Go to Data Studio.
Hover over Create, and choose Create Node > MaxCompute > PyODPS 3.
Copy the following content to the script.
```python
import os
import json

config = {
    "name": "fs_demo_v1",
    "metadata": {
        "cpu": 4,
        "rpc.max_queue_size": 256,
        "rpc.enable_jemalloc": 1,
        "gateway": "default",
        "memory": 16000
    },
    # Path of the trained model. You can customize the path.
    "model_path": f"oss://beijing0009/EasyRec/deploy/rec_sln_demo_dbmtl_v1/{args['ymd']}/export/final_with_fg",
    "model_config": {
        "access_key_id": f'{o.account.access_id}',
        "access_key_secret": f'{o.account.secret_access_key}',
        "region": "cn-beijing",  # Replace this with the region where PAI is deployed. This topic uses cn-beijing as an example.
        "fs_project": "fs_demo",  # Replace this with the name of your FeatureStore project. This topic uses fs_demo as an example.
        "fs_model": "fs_rank_v1",  # Replace this with the name of your FeatureStore model feature. This topic uses fs_rank_v1 as an example.
        "fs_entity": "item",
        "load_feature_from_offlinestore": True,
        "steady_mode": True,
        "period": 2880,
        "outputs": "probs_is_click,y_ln_playtime,probs_is_praise",
        "fg_mode": "tf"
    },
    "processor": "easyrec-1.9",
    "processor_type": "cpp"
}

with open("echo.json", "w") as output_file:
    json.dump(config, output_file)

# Run the following line for the first deployment.
os.system(f"/home/admin/usertools/tools/eascmd -i {o.account.access_id} -k {o.account.secret_access_key} -e pai-eas.cn-beijing.aliyuncs.com create echo.json")
# Run the following line for scheduled updates.
# os.system(f"/home/admin/usertools/tools/eascmd -i {o.account.access_id} -k {o.account.secret_access_key} -e pai-eas.cn-beijing.aliyuncs.com modify fs_demo_v1 -s echo.json")
```

In the right-side pane, click Scheduling Configuration. In the page that appears, configure the scheduling parameters.
| Parameter | Suggested value |
| --- | --- |
| Scheduling Parameters: Parameter Name | ymd (the script reads args['ymd']) |
| Scheduling Parameters: Parameter Value | $[yyyymmdd-1] |
| Resource Properties: Scheduling Resource Group | Select the exclusive resource group that you created. |
| Scheduling Dependencies | Select the corresponding training task and item_table_preprocess_all_feature_v1. |
After you configure and test the node, run it and check the deployment status.
After the deployment is complete, comment out the eascmd create line, uncomment the eascmd modify line, and submit the task for scheduled execution.
(Optional) You can view the deployed service on the Inference Service tab of the Elastic Algorithm Service (EAS) page. For more information, see Custom deployment.
(Optional) When you use a data source that can be accessed only through a specific VPC, such as Hologres, you must connect the networks of EAS and the data source's VPC. For example, when you use Hologres, you can find the VPC ID and vSwitch ID on the Network Information page of the Hologres instance. On the EAS service page, click Configure High-Speed Connection in the upper-right corner and enter the corresponding VPC ID and vSwitch ID. You also need to enter a Security Group Name. You can select an existing security group or create a new one. Make sure that the security group allows traffic on the port that is required by Hologres. Hologres connections typically use port 80. Therefore, the selected security group must allow traffic on port 80 for the connection to work. After you enter all the information, click OK. You can use the service after it is updated.
Step 3: Configure PAI-REC
PAI-REC is a recommendation engine service that integrates the FeatureStore Go SDK. It can seamlessly connect with FeatureStore and EAS.
The configuration procedure is as follows.
Configure FeatureStoreConfs.
RegionId: Change this to the region where your product is located. This topic uses cn-beijing as an example.
ProjectName: The name of the FeatureStore project that you created, which is fs_demo.
"FeatureStoreConfs": { "pairec-fs": { "RegionId": "cn-beijing", "AccessId": "${AccessKey}", "AccessKey": "${AccessSecret}", "ProjectName": "fs_demo" } },Configure FeatureConfs.
FeatureStoreName: Must be the same as the pairec-fs setting in FeatureStoreConfs from the previous step.
FeatureStoreModelName: The name of the model feature that you created, which is fs_rank_v1.
FeatureStoreEntityName: The name of the feature entity that you created, which is user. This instructs the FeatureStore Go SDK in PAI-REC to fetch features from the user entity in the fs_rank_v1 model.
"FeatureConfs": { "recreation_rec": { "AsynLoadFeature": true, "FeatureLoadConfs": [ { "FeatureDaoConf": { "AdapterType": "featurestore", "FeatureStoreName": "pairec-fs", "FeatureKey": "user:uid", "FeatureStoreModelName": "fs_rank_v1", "FeatureStoreEntityName": "user", "FeatureStore": "user" } } ] } },Configure AlgoConfs.
This configuration tells PAI-REC which EAS model scoring service to connect to.
Name: Must be the same as the name of the deployed EAS service.
Url and Auth: Information provided by the EAS service. You can obtain the URL and token by clicking the service name on the EAS model service page. Then, on the Overview tab, in the Basic Information section, click View Invocation Information. For more information about detailed configurations, see FAQ about EAS.
"AlgoConfs": [ { "Name": "fs_demo_v1", "Type": "EAS", "EasConf": { "Processor": "EasyRec", "Timeout": 300, "ResponseFuncName": "easyrecMutValResponseFunc", "Url": "eas_url_xxx", "EndpointType": "DIRECT", "Auth": "eas_token" } } ],