The Update Easyrec Config component of Platform for AI (PAI) runs the add_feature_info_to_config.py script of EasyRec to add feature information to the template.config file that is generated by PAIREC. PAIREC is an end-to-end platform for customizing and developing recommendation systems. This topic describes how to configure the Update Easyrec Config component.
Prerequisites
Object Storage Service (OSS) is activated, and Machine Learning Designer is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.
Configure the component
To configure the parameters of the Update Easyrec Config component, you can use one of the following methods.
Method 1: Configure the component in the PAI console
Input ports
| Input port (left to right) | Recommended upstream component | Parameter of PAI commands | Required |
| --- | --- | --- | --- |
| config table input | A MaxCompute table. Upstream components: SQL Script and Read Table. Note: The table is a statistical table that records the number of occurrences of each feature and the bucket values of numeric features. | config_table | Yes |
Component parameters
| Tab | Parameter | Required | Description | Parameter of PAI commands | Default value |
| --- | --- | --- | --- | --- | --- |
| Parameters Setting | rectemplate produce template.config | Yes | The OSS path of the template configuration file (template.config) that is generated by the recommendation template. | template_config_path | N/A |
| Parameters Setting | easyrec.config output path | Yes | The OSS directory to which the EasyRec configuration file is written. | Passed in together with easyrec.config filename as output_config_path | N/A |
| Parameters Setting | easyrec.config filename | Yes | The name of the EasyRec configuration file. | Passed in together with easyrec.config output path as output_config_path | N/A |
| Parameters Setting | Specify the algorithm version | Yes | Select an algorithm package: 1. Generate a TAR package of EasyRec. For more information, see Release & Upgrade. 2. Upload the TAR package to an OSS path. For more information, see Upload objects. 3. Select the TAR package that you uploaded. | script | N/A |
| Model Tuning | Worker Count | No | The number of worker nodes. | cluster (all parameters on the Model Tuning tab are passed in as the cluster parameter) | 1 |
| Model Tuning | Worker CPU | No | The number of vCPUs for each worker node. A value of 1 indicates one vCPU. | cluster | 8 |
| Model Tuning | Worker Memory | No | The memory size of each worker node, in MB. For example, the value 100 specifies 100 MB. | cluster | 40000 |
| Model Tuning | Worker GPU | No | The number of GPUs for each worker node. GPUs are not required in most EasyRec training jobs. | cluster | 0 |
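The Model Tuning values end up in the -Dcluster JSON of the PAI command. In the sample command in Method 2, the console default of 8 vCPUs appears as "cpu": 800, which suggests that the command-line value is expressed in hundredths of a vCPU, while memory is in MB in both places. The following Python sketch converts console values into a cluster string under that assumption. The function name console_to_cluster is hypothetical, and because the sample only uses a GPU value of 0, the sketch passes the GPU count through unchanged; verify the GPU unit before you request GPUs.

```python
import json


def console_to_cluster(worker_count=1, worker_cpu=8, worker_memory_mb=40000, worker_gpu=0):
    """Build the -Dcluster value from Model Tuning tab settings.

    Assumption (inferred from the sample command, not from a spec):
    the command-line "cpu" field is in hundredths of a vCPU, so 8 vCPUs -> 800.
    Memory is in MB on both sides. GPU is passed through unchanged because
    the sample only shows 0.
    """
    cluster = {
        "worker": {
            "count": worker_count,
            "cpu": worker_cpu * 100,  # console vCPUs -> assumed 1/100-vCPU units
            "gpu": worker_gpu,
            "memory": worker_memory_mb,
        }
    }
    return json.dumps(cluster)


if __name__ == "__main__":
    # With the console defaults this reproduces the cluster JSON of the sample.
    print(console_to_cluster())
```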
Output ports
| Output port (left to right) | Data type | Parameter of PAI commands | Required |
| --- | --- | --- | --- |
| easyrec config output | OSS path. Recommended downstream component: model training. | output_config_path | Yes |
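In the sample command, output_config_path is the output directory followed by the configuration file name, and the Examples section notes that a bucket-level path cannot be used as the output path. The following Python sketch joins the two console values under that assumption and rejects a bucket-level path. The function name build_output_config_path is hypothetical and is not part of EasyRec or PAI.

```python
from urllib.parse import urlparse


def build_output_config_path(output_dir: str, filename: str) -> str:
    """Join the easyrec.config output path and file name into one OSS path.

    Assumption: output_config_path = <output directory>/<file name>,
    as in the sample command.
    """
    parsed = urlparse(output_dir)
    if parsed.scheme != "oss" or not parsed.netloc:
        raise ValueError(f"not an oss:// path: {output_dir!r}")
    # A bucket-level path such as oss://bucket/ has an empty object path.
    if not parsed.path.strip("/"):
        raise ValueError("select a directory inside the bucket, not the bucket itself")
    # Join the directory and the file name with exactly one slash.
    return output_dir.rstrip("/") + "/" + filename.lstrip("/")
```

For example, build_output_config_path("oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/", "rec_sln_demo_dssm_recall_v1.config") returns oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/rec_sln_demo_dssm_recall_v1.config.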
Method 2: Use PAI commands
To configure the Update Easyrec Config component by using PAI commands, run the commands in the SQL Script component. For more information, see SQL Script.
PAI -project algo_public -name easy_rec_ext
-Darn="acs:ram::xxx:role/aliyunodpspaidefaultrole"
-Dbuckets="oss://rec_sln_demo/"
-Dcluster="{\"worker\": {\"count\": 1, \"cpu\": 800, \"gpu\": 0, \"memory\": 40000}}"
-Dcmd="custom"
-DentryFile="easy_rec/python/tools/add_feature_info_to_config.py"
-Dextra_params="--template_config_path=oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/rec_sln_demo_dssm_recall_v1_template.config --output_config_path=oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/rec_sln_demo_dssm_recall_v1.config --config_table=odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_2m6yfr7q3a69m9jv7n_outputTable"
-Dlifecycle="28"
-DossHost="oss-cn-hangzhou-internal.aliyuncs.com"
-Dscript="oss://rec_sln_demo/easy_rec_ext_0.6.1_res.tar.gz"
-Dtables="odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_2m6yfr7q3a69m9jv7n_outputTable";
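The sample command follows a simple pattern: one -D<name>="<value>" argument per parameter, with double quotes inside JSON values (such as -Dcluster) escaped as \". The following Python sketch renders such a command from a dictionary, for example to generate it from a script. The function name build_pai_command is hypothetical; it only reproduces the quoting pattern visible in the sample and does not escape backslashes or other special characters.

```python
def build_pai_command(params: dict) -> str:
    """Render an easy_rec_ext PAI command from a {name: value} mapping.

    Hypothetical helper: mirrors the quoting used in the sample command
    and escapes double quotes only.
    """
    lines = ["PAI -project algo_public -name easy_rec_ext"]
    for name, value in params.items():
        # JSON values such as the cluster spec contain double quotes,
        # which are written as \" inside the quoted argument.
        escaped = str(value).replace('"', '\\"')
        lines.append(f'-D{name}="{escaped}"')
    # The statement ends with a semicolon, as in the sample.
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_pai_command({
        "cmd": "custom",
        "entryFile": "easy_rec/python/tools/add_feature_info_to_config.py",
        "cluster": '{"worker": {"count": 1, "cpu": 800, "gpu": 0, "memory": 40000}}',
    }))
```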
| Parameter | Required | Description |
| --- | --- | --- |
| entryFile | Yes | The entry file to run. In this example, the add_feature_info_to_config.py script is run. |
| cmd | Yes | The command type. If this parameter is set to custom, a custom EasyRec script (the one specified by entryFile) is run. |
| arn | Yes | The Alibaba Cloud Resource Name (ARN) of the RAM role that is used for authorization. To obtain the ARN, perform the following steps: Log on to the PAI console. In the left-side navigation pane, choose Activation & Authorization > Dependent Services. In the Designer section, click View Authorization in the Actions column. |
| ossHost | Yes | The OSS endpoint. For more information, see Regions and endpoints. |
| buckets | Yes | The OSS buckets in which the EasyRec TAR package and the model are stored. Separate multiple buckets with commas (,). For an example, see the -Dbuckets value in the preceding command. |
| extra_params | Yes | Additional parameters for the script that are not specified elsewhere in the command: --template_config_path (the path of the template.config temporary file), --output_config_path (the output path of the EasyRec configuration file), and --config_table (the feature information table). |
| script | No | The OSS path of the EasyRec TAR package. To generate the TAR package, see Release & Upgrade. Upload the TAR package to OSS and specify its OSS path. Sample TAR package: easy_rec_ext_0.6.1_res.tar.gz. |
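The value of extra_params is a space-separated list of --name=value flags, and in the sample command the feature information table is written as an odps://<project>/tables/<table> URI that is also passed to -Dtables. The following Python sketch assembles these strings. The helper names odps_table_uri and build_extra_params are hypothetical, and the sketch assumes that no path contains spaces, because spaces would split a value into separate flags.

```python
def odps_table_uri(project: str, table: str) -> str:
    """Build a MaxCompute table URI in the form used by the sample command."""
    return f"odps://{project}/tables/{table}"


def build_extra_params(template_config_path: str,
                       output_config_path: str,
                       config_table: str) -> str:
    """Assemble the space-separated value of -Dextra_params.

    Assumes none of the values contains spaces.
    """
    return " ".join([
        f"--template_config_path={template_config_path}",
        f"--output_config_path={output_config_path}",
        f"--config_table={config_table}",
    ])
```

For example, with a hypothetical project named pai_hangzhou that contains the dssm_recall_30d_config_v1 table, odps_table_uri("pai_hangzhou", "dssm_recall_30d_config_v1") yields odps://pai_hangzhou/tables/dssm_recall_30d_config_v1, which could be used as both the --config_table value and the -Dtables value.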
Examples
Download the dssm_recall_30d_config_v1.csv feature information data file and the template.config temporary file.
The feature information data file and the template.config temporary file are generated by PAIREC. They are provided in this example so that you can quickly try out the Update Easyrec Config component.
Create a table for the feature information on the MaxCompute client. For more information about how to use the MaxCompute client, see MaxCompute client (odpscmd).
CREATE TABLE IF NOT EXISTS dssm_recall_30d_config_v1(feature STRING,feature_info STRING,message STRING);
Upload the downloaded dataset dssm_recall_30d_config_v1.csv to the created MaxCompute table. For more information about how to use the MaxCompute client to upload data, see Tunnel commands.
tunnel upload dssm_recall_30d_config_v1.csv dssm_recall_30d_config_v1 -fd \t;
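The tunnel upload command splits each line of the file on the tab delimiter (-fd \t), and the dssm_recall_30d_config_v1 table has three columns (feature, feature_info, and message). The following Python sketch checks a file for lines that do not split into exactly three fields before you upload it. The function name find_malformed_rows is hypothetical, and the sketch assumes that the file has no header row, that fields never contain tab characters, and that blank lines can be ignored.

```python
from typing import List, Tuple

# Columns of the table created by the CREATE TABLE statement above.
EXPECTED_COLUMNS = ("feature", "feature_info", "message")


def find_malformed_rows(text: str) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for each non-blank line whose
    tab-separated field count differs from the number of table columns."""
    bad = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # assumption: blank lines are not data rows
        field_count = len(line.split("\t"))
        if field_count != len(EXPECTED_COLUMNS):
            bad.append((line_no, field_count))
    return bad
```

To check the downloaded file, read it with open("dssm_recall_30d_config_v1.csv", encoding="utf-8") and pass its contents to find_malformed_rows. An empty result means that every line maps to the three table columns.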
Upload the template.config temporary file to OSS. For more information, see Upload objects.
Create a pipeline. The following figure shows a pipeline.
| Section | Description |
| --- | --- |
| 1 | Set the Table Name parameter of the Read Table-51 component to the dssm_recall_30d_config_v1 table that you created. |
| 2 | On the Parameters Setting tab of the Update Easyrec Config-1 component, configure the parameters that are listed after this table. |
rectemplate produce template.config: Select the OSS path in which the template.config temporary file is stored.
easyrec.config output path: Select the OSS directory to which the generated EasyRec configuration file is written. You cannot select a bucket-level path. You must select a directory in a bucket.
easyrec.config filename: Enter a custom file name.
Specify the algorithm version: Select the OSS path in which the EasyRec TAR package is stored. For more information, see Release & Upgrade in the EasyRec documentation. Sample TAR package: easy_rec_ext_0.6.1_res.tar.gz.
Click the run icon to run the pipeline.
After the pipeline is run, you can view the EasyRec configuration file generated in the OSS path that is specified by the easyrec.config output path parameter.
References
In the DSSM vector recall solution, the Update Easyrec Config component runs the 11_rec_sln_demo_dssm_recall_v1_update_config node. For more information about how to use the component, see DSSM vector recall.