Platform for AI: Update Easyrec Config

Last Updated: May 06, 2024

The Update Easyrec Config component of Platform for AI (PAI) uses the add_feature_info_to_config.py script in EasyRec to add feature information to the template.config file that is generated by PAIREC. PAIREC is an end-to-end in-depth customization and development platform. This topic describes how to configure the Update Easyrec Config component.

Prerequisites

Object Storage Service (OSS) is activated, and Machine Learning Designer is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.

Configure the component

To configure the parameters of the Update Easyrec Config component, you can use one of the following methods.

Method 1: Configure the component in the PAI console

  • Input ports

    | Input port (left to right) | Recommended upstream component | Parameter of PAI commands | Required |
    | --- | --- | --- | --- |
    | config table input | MaxCompute tables. Upstream components: SQL Script and Read Table. | config_table | Yes |

    Note: The config table is a statistical table that collects the number of occurrences of each feature and the bucket values of numeric features.

  • Component parameters

    | Tab | Parameter | Required | Description | Parameter of PAI commands | Default value |
    | --- | --- | --- | --- | --- | --- |
    | Parameters Setting | rectemplate produce template.config | Yes | The OSS path in which the template configuration file generated by the recommendation template is stored. | template_config_path | N/A |
    | Parameters Setting | easyrec.config output path | Yes | The output path of the EasyRec configuration file. | output_config_path (combined with easyrec.config filename) | N/A |
    | Parameters Setting | easyrec.config filename | Yes | The name of the EasyRec configuration file. | output_config_path (combined with easyrec.config output path) | N/A |
    | Parameters Setting | Specify the algorithm version | Yes | Select an algorithm package: 1. Generate a TAR package of EasyRec. For more information, see Release & Upgrade. 2. Upload the TAR package to an OSS path. For more information, see Upload objects. 3. Select the TAR package that you uploaded. | script | N/A |
    | Model Tuning | Worker Count | No | The number of worker nodes. | cluster | 1 |
    | Model Tuning | Worker CPU | No | The number of vCPUs for each worker node. A value of 1 indicates one vCPU. | cluster | 8 |
    | Model Tuning | Worker Memory | No | The memory size of each worker node, in MB. A value of 100 indicates 100 MB. | cluster | 40000 |
    | Model Tuning | Worker GPU | No | The number of GPUs for each worker node. GPUs are not required in most EasyRec training jobs. | cluster | 0 |

    All parameters on the Model Tuning tab are passed in together as the cluster parameter.

  • Output ports

    | Output port (left to right) | Data type | Parameter of PAI commands | Required |
    | --- | --- | --- | --- |
    | easyrec config output | OSS path. Downstream component: model training. | output_config_path | Yes |
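
The sample command in Method 2 passes a single output_config_path value that contains both the output directory and the file name. The following Python sketch illustrates one way to join the two console fields into such a value. The function name build_output_config_path is hypothetical and is not part of PAI or EasyRec, and the component itself may join the values differently. The check for a bucket-level path reflects the restriction stated in the Examples section of this topic.

```python
def build_output_config_path(output_dir: str, filename: str) -> str:
    """Join an OSS directory and a file name into one output_config_path value.

    Hypothetical helper for illustration only; not a PAI or EasyRec API.
    """
    prefix = "oss://"
    if not output_dir.startswith(prefix):
        raise ValueError("output_dir must start with oss://")

    # Split "bucket/dir/sub/" into the bucket name and the directory part.
    bucket, _, directory = output_dir[len(prefix):].partition("/")
    directory = directory.strip("/")

    # A bucket-level path such as oss://bucket/ has no directory part.
    if not bucket or not directory:
        raise ValueError("output_dir must be a directory inside a bucket")

    # Use exactly one slash between the directory and the file name.
    return f"{prefix}{bucket}/{directory}/{filename.lstrip('/')}"


path = build_output_config_path(
    "oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/",
    "rec_sln_demo_dssm_recall_v1.config",
)
print(path)
```

Normalizing the slash before joining avoids an accidental empty path segment when the directory already ends with a slash.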

Method 2: Use PAI commands

To configure the Update Easyrec Config component by using PAI commands, run the commands in the SQL Script component. For more information, see SQL Script.

PAI -project algo_public -name easy_rec_ext 
    -Darn="acs:ram::xxx:role/aliyunodpspaidefaultrole" 
    -Dbuckets="oss://rec_sln_demo/" 
    -Dcluster="{\"worker\": {\"count\": 1, \"cpu\": 800, \"gpu\": 0, \"memory\": 40000}}" 
    -Dcmd="custom" 
    -DentryFile="easy_rec/python/tools/add_feature_info_to_config.py" 
    -Dextra_params="--template_config_path=oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/rec_sln_demo_dssm_recall_v1_template.config --output_config_path=oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1//rec_sln_demo_dssm_recall_v1.config --config_table=odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_2m6yfr7q3a69m9jv7n_outputTable" 
    -Dlifecycle="28" 
    -DossHost="oss-cn-hangzhou-internal.aliyuncs.com" 
    -Dscript="oss://rec_sln_demo/easy_rec_ext_0.6.1_res.tar.gz" 
    -Dtables="odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_2m6yfr7q3a69m9jv7n_outputTable";
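
The -Dcluster value in the preceding command is a JSON string whose double quotes are escaped with backslashes. The following Python sketch shows how such a value could be assembled from the Worker Count, Worker CPU, Worker Memory, and Worker GPU settings. The helper names build_cluster_param and as_pai_option are hypothetical. The factor of 100 between the Worker CPU default (8) and the cpu value in the command (800) is an assumption inferred from this topic, not a rule that this topic states.

```python
import json


def build_cluster_param(worker_count=1, worker_cpu=8, worker_memory_mb=40000, worker_gpu=0):
    """Return the JSON text for the cluster parameter (hypothetical helper)."""
    cluster = {
        "worker": {
            "count": worker_count,
            # Assumption: cpu is expressed in hundredths of a vCPU (8 vCPUs -> 800).
            "cpu": worker_cpu * 100,
            # Only the value 0 appears in this topic; the unit of nonzero values is not confirmed here.
            "gpu": worker_gpu,
            # Memory is in MB, as described for Worker Memory.
            "memory": worker_memory_mb,
        }
    }
    return json.dumps(cluster)


def as_pai_option(name, value):
    """Format -Dname="value" and escape embedded double quotes (hypothetical helper)."""
    escaped = value.replace('"', '\\"')
    return f'-D{name}="{escaped}"'


print(as_pai_option("cluster", build_cluster_param()))
```

With the default arguments, the printed option has the same form as the -Dcluster line in the sample command.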

| Parameter | Required | Description |
| --- | --- | --- |
| entryFile | Yes | The entry file. For this component, the add_feature_info_to_config.py script is run. |
| cmd | Yes | If the cmd parameter is set to custom, the custom script in EasyRec is run. |
| arn | Yes | The resource group authorization information. To obtain the Alibaba Cloud Resource Name (ARN), perform the following steps: Log on to the PAI console. In the left-side navigation pane, choose Activation & Authorization > Dependent Services. In the Designer section, click View Authorization in the Actions column. |
| ossHost | Yes | The OSS endpoint. For more information, see Regions and endpoints. |
| buckets | Yes | The OSS bucket in which the TAR package of EasyRec and the model are stored. Separate multiple buckets with commas (,). Example: oss://xxxx/,oss://xxxx/. |
| extra_params | Yes | Additional parameters that are not specified in the pipeline. These include template_config_path (the template.config temporary file), output_config_path (the output path), and config_table (the feature information table). |
| script | No | The OSS path of the EasyRec TAR package. For information about how to generate the package, see Release & Upgrade. Upload the TAR package to OSS and specify its OSS path. Sample TAR package: easy_rec_ext_0.6.1_res.tar.gz. |
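
In the sample command, extra_params is a single string of space-separated --name=value pairs, and buckets is a comma-separated list. The following Python sketch shows one way to assemble both values. The function names build_extra_params and build_buckets are hypothetical, and the whitespace check is an assumption based on the space-separated form of the sample, not a documented rule.

```python
def build_extra_params(template_config_path, output_config_path, config_table):
    """Assemble the extra_params string in the form used by the sample command."""
    params = {
        "template_config_path": template_config_path,
        "output_config_path": output_config_path,
        "config_table": config_table,
    }
    for name, value in params.items():
        # Assumption: the pairs are split on spaces, so a value must not contain whitespace.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must not contain whitespace")
    return " ".join(f"--{name}={value}" for name, value in params.items())


def build_buckets(*buckets):
    """Join one or more OSS bucket paths with commas, as described for the buckets parameter."""
    return ",".join(buckets)


print(build_extra_params(
    "oss://rec_sln_demo/template.config",          # illustrative path
    "oss://rec_sln_demo/output/easyrec.config",    # illustrative path
    "odps://pai_hangzhou/tables/dssm_recall_30d_config_v1",
))
print(build_buckets("oss://rec_sln_demo/"))
```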

Examples

  1. Download the dssm_recall_30d_config_v1.csv feature information data file and the template.config temporary file.

    Note: The feature information data file and the template.config temporary file are generated by PAIREC. They are provided so that you can try the Update Easyrec Config component in this example.

  2. Create a table for the feature information on the MaxCompute client. For more information about how to use the MaxCompute client, see MaxCompute client (odpscmd).

    CREATE TABLE IF NOT EXISTS dssm_recall_30d_config_v1(feature STRING, feature_info STRING, message STRING);

  3. Upload the downloaded dssm_recall_30d_config_v1.csv file to the MaxCompute table that you created. For more information about how to use the MaxCompute client to upload data, see Tunnel commands.

    tunnel upload dssm_recall_30d_config_v1.csv dssm_recall_30d_config_v1 -fd \t;

  4. Upload the template.config temporary file to OSS. For more information, see Upload objects.

  5. Create a pipeline. The following figure shows a pipeline.

    [Figure: pipeline in which section 1 is the Read Table-51 component and section 2 is the Update Easyrec Config-1 component]

    Section 1: Set the Table Name parameter of the Read Table-51 component to the dssm_recall_30d_config_v1 table that you created.

    Section 2: On the Parameters Setting tab of the Update Easyrec Config-1 component, configure the following parameters:

    • rectemplate produce template.config: Select the OSS path in which the template.config temporary file is stored.

    • easyrec.config output path: Select the OSS path in which the generated EasyRec configuration file is to be stored. You cannot select a bucket-level path as the output path. You must select a directory in a bucket.

    • easyrec.config filename: Enter a custom file name.

    • Specify the algorithm version: Select the OSS path in which the TAR package of EasyRec is stored. For more information, see Release & Upgrade in the EasyRec documentation. Sample TAR package: easy_rec_ext_0.6.1_res.tar.gz.

  6. Click the Run icon to run the pipeline.

    After the pipeline is run, you can view the generated EasyRec configuration file in the OSS path that is specified by the easyrec.config output path parameter.
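
Before you run the tunnel upload command in this example, it can help to confirm that every row of the data file splits into the three columns of the table (feature, feature_info, message) when the tab delimiter is used. The following Python sketch performs that check. The function name find_bad_rows is hypothetical, and treating quotation marks as literal characters is an assumption about how the file is delimited.

```python
import csv

EXPECTED_COLUMNS = 3  # feature, feature_info, message


def find_bad_rows(lines, delimiter="\t"):
    """Return (line_number, column_count) for each row that does not have three columns."""
    # QUOTE_NONE treats quotation marks as ordinary characters (an assumption about the file).
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != EXPECTED_COLUMNS
    ]


# Illustrative in-memory rows; the real file contents are not shown in this topic.
sample = [
    "user_id\tsome feature info\tok\n",
    "row with a missing tab\n",
]
print(find_bad_rows(sample))
```

To check the downloaded file, pass an open file object instead of the sample list, for example open("dssm_recall_30d_config_v1.csv", newline="", encoding="utf-8").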

References

The Update Easyrec Config component is used to run the 11_rec_sln_demo_dssm_recall_v1_update_config node. For more information about how to use the component, see DSSM vector recall.