Platform For AI: Model Train

Last Updated: Feb 06, 2024

You can use the Model Train component to train EasyRec models.

Prerequisites

Object Storage Service (OSS) is activated, and Machine Learning Designer is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.

Configure the component

You can use one of the following methods to configure the Model Train component:

Configure the component in Machine Learning Designer

  • Input ports

| Port (from left to right) | Recommended upstream component | Parameters of PAI commands | Required |
| --- | --- | --- | --- |
| Negative Sample Item Feature Table | Note: In most cases, negative sampling is used in specific algorithms such as DSSM. | data_config.negative_sampler.input_path | No |
| easyrec.config | Note: You need to specify the full OSS path. | config | No |
| Train Table | | train_tables | Yes |
| Evaluation Table | | eval_tables | Yes |
| fine_tune_checkpoint | Note: The model is trained based on this checkpoint. | train_config.fine_tune_checkpoint in edit_config_json | No |
| Boundary Table | | boundary_table | No |

  • Component parameters

| Tab | Parameter | Required | Description | Parameters of PAI commands | Default value |
| --- | --- | --- | --- | --- | --- |
| Parameters Setting | Model Dir | No | The path in which the model is stored. | model_dir | Data Storage |
| Parameters Setting | EasyRec Configuration | No | If the easyrec.config input port is not connected, specify the OSS path of the EasyRec configuration file here. For more information, see model_config. | config | None |
| Parameters Setting | Selected label columns for training and evaluation | No | This parameter is available when RTP FG is selected. The label (target) columns that are used for training and evaluation. | Part of the selected_cols parameter | None |
| Parameters Setting | Selected weight columns for training and evaluation | No | This parameter is available when RTP FG is selected. The weight columns that are used for training and evaluation. | Part of the selected_cols parameter | None |
| Parameters Setting | Selected feature columns for training and evaluation | No | This parameter is available when RTP FG is selected. The feature columns that are used for training and evaluation. | Part of the selected_cols parameter | None |
| Parameters Setting | Specify the algorithm version | No | In the Advanced Options section, select an algorithm version to train the model: (1) Generate a TAR package. For more information, see Update EasyRec. (2) Upload the TAR package to an OSS path. For more information, see Upload objects. (3) Select the uploaded file in the console. | script | None |
| Parameters Setting | Hyperparameter edit_config_json | No | In the Advanced Options section, specify the content that you want to add to the EasyRec configuration file. The component merges this hyperparameter configuration into the EasyRec configuration file. | edit_config_json | None |
| Model Tuning | ps Count | No | The number of parameter server (PS) nodes. | Integrated into the cluster parameter | 2 |
| Model Tuning | ps CPU | No | The number of vCPUs for each PS node. A value of 1 indicates one vCPU. | Integrated into the cluster parameter | 10 |
| Model Tuning | ps Memory | No | The memory size of each PS node. Unit: MB. | Integrated into the cluster parameter | 40000 |
| Model Tuning | Worker Count | No | The number of workers. | Integrated into the cluster parameter | 6 |
| Model Tuning | Worker CPU | No | The number of vCPUs for each worker. A value of 1 indicates one vCPU. | Integrated into the cluster parameter | 8 |
| Model Tuning | Worker Memory | No | The memory size of each worker. Unit: MB. | Integrated into the cluster parameter | 40000 |
| Model Tuning | Worker GPU | No | The number of GPUs for each worker. GPUs are not required for most EasyRec training jobs. | Integrated into the cluster parameter | 0 |
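The Model Tuning values reach the PAI command as a single JSON string in the cluster parameter. The following Python sketch shows one way to build that string from the component defaults. It assumes that the command expresses CPU in hundredths of a vCPU, which is only inferred from the sample command on this page (component value 10 appears as cpu 1000, and 8 as 800). The function name build_cluster is hypothetical and is not part of EasyRec or PAI.

```python
import json


def build_cluster(ps_count=2, ps_cpu=10, ps_memory=40000,
                  worker_count=6, worker_cpu=8, worker_memory=40000,
                  worker_gpu=0):
    """Return a JSON string for the cluster parameter (hypothetical helper).

    Assumption: the command expects CPU in units of 1/100 vCPU, inferred
    from the sample command (component value 10 -> "cpu": 1000).
    Memory is in MB, as in the component. The GPU value is passed through
    unchanged because this page does not state its unit.
    """
    cluster = {
        "ps": {"count": ps_count, "cpu": ps_cpu * 100, "memory": ps_memory},
        "worker": {
            "count": worker_count,
            "cpu": worker_cpu * 100,
            "gpu": worker_gpu,
            "memory": worker_memory,
        },
    }
    return json.dumps(cluster)


if __name__ == "__main__":
    # Prints the cluster JSON for the default component values.
    print(build_cluster())
```

With the default values, the result has the same fields and values as the cluster string in the sample command, before the double quotes are escaped for the command line.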

PAI command and parameters

PAI -project algo_public -name easy_rec_ext 
    -Darn="acs:ram::xxxx:role/aliyunodpspaidefaultrole" 
    -Dbuckets="oss://rec_sln_demo/" 
    -Dcluster="{\"ps\": {\"count\": 2, \"cpu\": 1000, \"memory\": 40000}, \"worker\": {\"count\": 6, \"cpu\": 800, \"gpu\": 0, \"memory\": 40000}}" 
    -Dcmd="train" 
    -Dconfig="oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/rec_sln_demo_dssm_recall_v1.config" 
    -Deval_tables="odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_39w13qw9osm9rdbu0h_outputTable" 
    -Dlifecycle="28" 
    -Dmodel_dir="oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/20230425" 
    -DossHost="oss-cn-hangzhou-internal.aliyuncs.com" 
    -Dscript="oss://rec_sln_demo/easy_rec_ext_0.6.1_res.tar.gz" 
    -Dselected_cols="is_click,features" 
    -Dtables="odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_4ijqwcg7upzteu5036_outputTable,odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_39w13qw9osm9rdbu0h_outputTable,odps://pai_hangzhou/tables/pai_temp_flow_fty24i21e9dzvzj6a0_node_svxd0bqu2x7ep8furu_outputTable" 
    -Dtrain_tables="odps://pai_hangzhou/tables/pai_temp_flow_26un8zq7v4goadi373_node_4ijqwcg7upzteu5036_outputTable"
    -Dedit_config_json="{\"train_config.fine_tune_checkpoint\": \"oss://rec_sln_demo/EasyRec/deploy/rec_sln_demo_dssm_recall_v1/20230405/\", \"data_config.negative_sampler.input_path\": \"odps://pai_hangzhou/tables/pai_temp_flow_fty24i21e9dzvzj6a0_node_svxd0bqu2x7ep8furu_outputTable\"}" ;
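In the command above, the edit_config_json value is a JSON object whose inner double quotes are escaped, because the whole value is itself wrapped in double quotes. A minimal Python sketch of producing such an argument follows. The helper name to_d_param is hypothetical, the paths are placeholders, and the sketch assumes that values contain no backslashes.

```python
import json


def to_d_param(name, value):
    """Format one -Dname="value" argument, escaping inner double quotes.

    Hypothetical helper; assumes the value contains no backslashes.
    """
    escaped = value.replace('"', '\\"')
    return f'-D{name}="{escaped}"'


# Fields to override in the EasyRec configuration file (placeholder paths).
edits = {
    "train_config.fine_tune_checkpoint": "oss://xxx/",
    "data_config.negative_sampler.input_path": "odps://{project}/tables/{table name}",
}

# json.dumps produces the JSON object; to_d_param escapes it for the command.
print(to_d_param("edit_config_json", json.dumps(edits)))
```

Serializing with json.dumps first and escaping afterwards keeps the JSON itself valid, so quoting mistakes are limited to a single step.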

| Parameter | Required | Description |
| --- | --- | --- |
| cmd | Yes | Set cmd to train to perform model training. |
| config | Yes | The full OSS path of the EasyRec configuration file used for training. |
| train_tables | Yes | The training tables. Specify each table in the format odps://{project}/tables/{table name}. Separate multiple training tables with commas (,). |
| eval_tables | Yes | The evaluation tables. Specify each table in the format odps://{project}/tables/{table name}. Separate multiple evaluation tables with commas (,). |
| arn | Yes | The resource group authorization information. To obtain the ARN, log on to the PAI console and choose Activation & Authorization > Dependent Services. In the Designer section, click View Authorization in the Actions column. |
| ossHost | Yes | The OSS endpoint. For more information about endpoints, see Regions and endpoints. |
| buckets | Yes | The bucket that contains the configuration file and the bucket in which the model is stored. Separate multiple buckets with commas (,). Example: oss://xxxx/,oss://xxxx/. |
| model_dir | Yes | The path in which the model is stored. If you specify model_dir, the model path in the EasyRec configuration file is ignored. This parameter is used for periodic scheduling. |
| edit_config_json | No | The fields of the configuration file to modify, specified as JSON. Example: edit_config_json="{\"train_config.fine_tune_checkpoint\": \"oss://xxx/\"}". |
| script | No | The TAR package used for EasyRec training. |
| selected_cols | No | The columns selected for training and evaluation. Selecting only the required columns accelerates training. |
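To make the required parameters concrete, the following Python sketch assembles a training command as a string and rejects input that lacks any parameter marked Required above. The function names odps_path and build_command are hypothetical, the OSS paths and the ARN are placeholders, and the sketch only produces text: it does not submit anything to PAI.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = ("cmd", "config", "train_tables", "eval_tables",
            "arn", "ossHost", "buckets", "model_dir")


def odps_path(project, table):
    """Return a table path in the odps://{project}/tables/{table name} format."""
    return f"odps://{project}/tables/{table}"


def build_command(params):
    """Assemble a PAI command string; raise if a required parameter is missing."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -project algo_public -name easy_rec_ext"]
    for name, value in params.items():
        # Escape inner double quotes because each value is wrapped in quotes.
        escaped = str(value).replace('"', '\\"')
        lines.append(f'    -D{name}="{escaped}"')
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    params = {
        "cmd": "train",
        # Placeholders: replace with your own role ARN, endpoint, and OSS paths.
        "arn": "acs:ram::xxxx:role/aliyunodpspaidefaultrole",
        "ossHost": "oss-cn-hangzhou-internal.aliyuncs.com",
        "buckets": "oss://xxxx/",
        "config": "oss://xxxx/easyrec.config",
        "model_dir": "oss://xxxx/model/",
        # Join several odps_path(...) results with "," to pass multiple tables.
        "train_tables": odps_path("pai_online_project", "easyrec_demo_taobao_train_data"),
        "eval_tables": odps_path("pai_online_project", "easyrec_demo_taobao_test_data"),
    }
    print(build_command(params))
```

Checking the required parameters before building the string surfaces a missing value locally, rather than after the job has been submitted.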

Examples

  1. Prepare the following datasets:

    • train: pai_online_project.easyrec_demo_taobao_train_data

    • test: pai_online_project.easyrec_demo_taobao_test_data

    Note

    In your own projects, use your own MaxCompute (ODPS) tables. The two tables above are publicly accessible and are provided to facilitate testing.

  2. Create a pipeline and configure its components as follows:

    • In the Read Table-1 component, set the Table Name parameter to the training table pai_online_project.easyrec_demo_taobao_train_data.

    • In the Read Table-2 component, set the Table Name parameter to the test table pai_online_project.easyrec_demo_taobao_test_data.

    • Upload the EasyRec configuration file to OSS and select the uploaded file in the EasyRec Configuration section.

    • Specify the path where the model is stored in the Model Dir section.

  3. View the output model.

    After the pipeline run is complete, you can view the exported model in the OSS path specified by the Model Dir parameter.

  4. Use Logview to analyze logs.

    When you run the command to perform model training, the system prints the Logview URL. Right-click the Model Train component and click View Logs in the shortcut menu. You can use Logview to view the model training result and locate errors.

On the page that shows the running status of workers, view the tasks and workers.

Where:

  • Worker 0 is a training worker. Click the icon in the StdErr column to view the training process.

  • Worker 1 is an evaluation worker. Click the icon in the StdErr column to view the metrics of the model on the evaluation set.

For more information, see 8_rec_sln_demo_rec_sln_demo_sorting_v2_train in Rank and 12 _rec_sln_demo_dssm_recall_v1_train in DSSM vector recall.