Build a CTR prediction pipeline that maintains consistent feature transformations between offline training and online serving.
Consistency challenge
Prediction errors often stem from mismatched feature transformations between training and serving. When normalization parameters or encoding mappings are rebuilt at serving time, the same raw record is transformed into a different feature vector from the one the model was trained on.
PAI Designer packages preprocessing, feature engineering, and prediction into one deployable unit. Normalization parameters and one-hot encoding mappings from training are preserved and reused at inference.
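The mismatch described above can be illustrated with a minimal, library-free sketch. It is not how PAI Designer implements packaging; the helper names `fit_preprocessor` and `transform` and the toy rows are assumptions made for illustration. The idea is that parameters are learned once from training data, stored, and only applied at serving time.

```python
import json

def fit_preprocessor(rows, numeric_cols, categorical_cols):
    """Learn min/max per numeric column and a value-to-index map per categorical column."""
    params = {"numeric": {}, "categorical": {}}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        params["numeric"][col] = {"min": min(values), "max": max(values)}
    for col in categorical_cols:
        # Sort so the mapping is deterministic across runs.
        categories = sorted({row[col] for row in rows})
        params["categorical"][col] = {cat: i for i, cat in enumerate(categories)}
    return params

def transform(row, params):
    """Apply stored parameters to one row. Nothing is refit here."""
    features = []
    for col, p in params["numeric"].items():
        span = p["max"] - p["min"]
        features.append((row[col] - p["min"]) / span if span else 0.0)
    for col, mapping in params["categorical"].items():
        one_hot = [0.0] * len(mapping)
        idx = mapping.get(row[col])  # unseen categories stay all zeros
        if idx is not None:
            one_hot[idx] = 1.0
        features.extend(one_hot)
    return features

# Toy training rows (hypothetical values, column names borrowed from the dataset).
train_rows = [
    {"c15": 320.0, "site_category": "28905ebd"},
    {"c15": 300.0, "site_category": "f028772b"},
    {"c15": 728.0, "site_category": "28905ebd"},
]
params = fit_preprocessor(train_rows, ["c15"], ["site_category"])

# Persist and reload the parameters, as a packaged pipeline would.
serving_params = json.loads(json.dumps(params))

request_row = {"c15": 320.0, "site_category": "f028772b"}
offline = transform(request_row, params)
online = transform(request_row, serving_params)

# Anti-pattern: rebuilding parameters from serving data changes the features.
refit = transform(request_row, fit_preprocessor([request_row], ["c15"], ["site_category"]))

print(offline == online)  # stored parameters give identical features
print(offline == refit)   # refit parameters do not
```

In this sketch the refit vector even has a different length, because only one category is visible at serving time.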
Prerequisites
Prepare the following resources:
- A PAI workspace. See Create and manage workspaces.
- MaxCompute resources associated with the workspace.
Dataset
This tutorial uses a 200,000-sample subset of the Avazu CTR prediction dataset: 160,000 training samples and 40,000 test samples.
| Column name | Type | Description |
|---|---|---|
| id | STRING | Advertisement ID |
| click | DOUBLE | Click indicator (1 = clicked, 0 = not clicked) |
| dt_year | INT | Year |
| dt_month | INT | Month |
| dt_day | INT | Day |
| dt_hour | INT | Hour |
| c1 | STRING | Anonymized categorical variable |
| banner_pos | INT | Banner position |
| site_id | STRING | Site ID |
| site_domain | STRING | Site domain |
| site_category | STRING | Site category |
| app_id | STRING | Application ID |
| app_domain | STRING | Application domain |
| app_category | STRING | Application category |
| device_id | STRING | Device ID |
| device_ip | STRING | Device IP address |
| device_model | STRING | Device model |
| device_type | STRING | Device type |
| device_conn_type | STRING | Device connection type |
| c14 - c21 | DOUBLE | Anonymized categorical variables (8 columns) |
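The schema above can be written down as a small lookup table and used to check a record before sending it to a service. This is a sketch under stated assumptions: the mapping of STRING, INT, and DOUBLE to Python types, the case-insensitive comparison of column names, and the helper `check_record` are all choices made for this example, not behavior of PAI.

```python
# Column name -> expected Python type (STRING -> str, INT -> int, DOUBLE -> float).
SCHEMA = {
    "id": str, "click": float,
    "dt_year": int, "dt_month": int, "dt_day": int, "dt_hour": int,
    "c1": str, "banner_pos": int,
    "site_id": str, "site_domain": str, "site_category": str,
    "app_id": str, "app_domain": str, "app_category": str,
    "device_id": str, "device_ip": str, "device_model": str,
    "device_type": str, "device_conn_type": str,
}
# c14 through c21 are the eight DOUBLE columns.
SCHEMA.update({f"c{i}": float for i in range(14, 22)})

def check_record(record):
    """Return a list of human-readable problems; an empty list means the record fits."""
    # Assumption: column names are compared case-insensitively.
    lowered = {key.lower(): value for key, value in record.items()}
    problems = []
    for name, expected in SCHEMA.items():
        if name not in lowered:
            problems.append(f"missing column: {name}")
            continue
        value = lowered[name]
        if expected is float:
            # Accept whole numbers for DOUBLE columns, but never booleans.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected is int:
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, str)
        if not ok:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems

# A placeholder record with the right types, then one deliberate mistake.
record = {name: ("x" if kind is str else kind(0)) for name, kind in SCHEMA.items()}
print(check_record(record))
record["banner_pos"] = "0"
print(check_record(record))
```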
Open Designer
- Log on to the PAI console.
- In the left-side navigation pane, click Workspaces. On the Workspaces page, click the workspace name.
- In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer).
Create a pipeline from a template
- On the Designer page, click the Preset Templates tab.
- Find Click-Through Rate Prediction and click Create.
- In the Create Pipeline dialog box, set Data Storage to an OSS bucket path for temporary data and models. Keep default values for other parameters.
- Click OK. Pipeline creation takes approximately 10 seconds.
- In the pipeline list, select Click-Through Rate Prediction and click Open.
Workflow structure
The template processes features in two parallel paths before combining them for training:
- Numerical features: Normalized to a common range.
- Categorical features: One-hot encoded into binary vectors, then combined with normalized numerical features using Vector Assembler.
The combined feature vector feeds into the Factorization Machine algorithm for training and prediction.
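To make the final step concrete, the sketch below scores one assembled feature vector with a second-order factorization machine. It uses the standard FM formulation; the sigmoid output, the parameter names `w0`, `w`, `v`, and the toy values are assumptions for illustration and are not taken from the PAI component.

```python
import math

def fm_raw_score(x, w0, w, v):
    """Second-order FM score: bias + linear terms + factorized pairwise interactions.

    x: feature vector, w0: bias, w: linear weights, v: one latent vector per feature.
    The pairwise term uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_raw_score_naive(x, w0, w, v):
    """Same score computed pair by pair, kept only as a readable cross-check."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy vector: one normalized numeric feature followed by a two-slot one-hot block.
x = [0.5, 1.0, 0.0]
w0 = 0.1
w = [0.2, -0.3, 0.4]
v = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.5]]

raw = fm_raw_score(x, w0, w, v)
print(raw, sigmoid(raw))
```

The factorized form costs time proportional to the number of features times the latent dimension, which matters once one-hot encoding makes the vector wide and sparse.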
Run and evaluate
- At the top of the canvas, click the run button to start execution.
- After completion, right-click Binary Classification Evaluation-1 and select Visual Analytics. Alternatively, click the evaluation icon at the top.
- In the Binary Classification Evaluation-1 dialog box, view prediction accuracy on the Metrics Data tab.
The evaluation displays AUC, KS (Kolmogorov-Smirnov statistic), and F1 Score. AUC is the primary metric for CTR prediction. An AUC above 0.70 indicates reasonable performance on this dataset subset.
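For readers who want to see what AUC and KS measure, here is a minimal sketch that computes both from labels and scores. It follows the usual definitions (rank-based AUC with tie averaging, and KS as the largest gap between true-positive and false-positive rates); the function names and the six toy records are assumptions, and the evaluation component may compute the metrics differently in detail.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied scores share the mean of their ranks
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ks(labels, scores):
    """Largest gap between true-positive rate and false-positive rate over all thresholds."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    order = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(order):
        j = i
        # Consume every record tied at this score before measuring the gap.
        while j < len(order) and order[j][0] == order[i][0]:
            if order[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best

# Toy labels and scores (hypothetical, not from the Avazu subset).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc(labels, scores), ks(labels, scores))
```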
Package and deploy
When the evaluation metrics meet your requirements, package the entire pipeline (preprocessing, feature engineering, and prediction) and deploy it to EAS.
Package model
- At the top of the canvas, click Create Pipeline Model to start packaging.
- Select Normalization Batch Prediction-2. The downstream pipeline is automatically selected. Click Next to package the selected pipeline and models.
- Confirm the packaging information and click Next. Packaging takes 3-5 minutes.
Deploy service
Deploy the packaged model using either method:
- Method 1: After Run Status shows Successful, click Deploy to EAS. Configure Service Name and Resource Deployment Information, then click Deploy. See Deploy a pipeline as an online service.
- Method 2: If you closed the dialog box, click View All Tasks in the upper-right corner. In Historical Tasks, wait for Status to show Success:
  - Click Actions > Model > Deploy.
  - Alternatively, click Model List at the top. Select the packaged model and click Deploy to EAS.
Test service
- In the EAS console, find your service and click Online Debugging in the Actions column. See Debug a service online.
- In Request Body, enter test data matching the dataset schema:
  [{"id":"10000169349117863715","click":0.0,"dt_year":14,"dt_month":10,"dt_day":21,"dt_hour":0,"C1":"1005","banner_pos":0,"site_id":"1fbe01fe","site_domain":"f3845767","site_category":"28905ebd","app_id":"ecad2386","app_domain":"7801e8d9","app_category":"07d7df22","device_id":"a99f214a","device_ip":"96809ac8","device_model":"711ee120","device_type":"1","device_conn_type":"0","c14":15704.0,"c15":320.0,"c16":50.0,"c17":1722.0,"c18":0,"c19":35.0,"c20":100084.0,"c21":79.0}]
- Click Send Request. The service processes data through the inference pipeline: Normalization Prediction > One-Hot Encoding Prediction > Vector Assembler > FM Prediction.
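The response contains a prediction score for each input record. Scores closer to 1.0 indicate higher click probability; scores closer to 0.0 indicate lower probability. Because the service applies the same transformations and model that were evaluated offline, a record scored online should receive the same score as in offline batch prediction, so the offline metrics remain a fair estimate of online behavior.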
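The same request can also be prepared from code. The sketch below only builds the HTTP request and does not send it. The endpoint URL and token are hypothetical placeholders, and passing the token in an `Authorization` header is an assumption to verify against the invocation details shown for your service in the EAS console. The response format is not assumed here.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values shown for your service.
ENDPOINT = "http://example.invalid/api/predict/ctr_service"
TOKEN = "YOUR_SERVICE_TOKEN"

def build_request(records, endpoint=ENDPOINT, token=TOKEN):
    """Wrap a list of records as a JSON POST request without sending it."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

# The test record from the Online Debugging step.
record = {
    "id": "10000169349117863715", "click": 0.0,
    "dt_year": 14, "dt_month": 10, "dt_day": 21, "dt_hour": 0,
    "C1": "1005", "banner_pos": 0,
    "site_id": "1fbe01fe", "site_domain": "f3845767", "site_category": "28905ebd",
    "app_id": "ecad2386", "app_domain": "7801e8d9", "app_category": "07d7df22",
    "device_id": "a99f214a", "device_ip": "96809ac8", "device_model": "711ee120",
    "device_type": "1", "device_conn_type": "0",
    "c14": 15704.0, "c15": 320.0, "c16": 50.0, "c17": 1722.0,
    "c18": 0, "c19": 35.0, "c20": 100084.0, "c21": 79.0,
}

request = build_request([record])
print(request.get_method(), request.full_url)

# To actually call the service, send the request and inspect the raw body:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```

Sending raw feature values, rather than pre-normalized ones, is the point of the packaged pipeline: the service applies the stored transformations itself.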
Clean up
Remove resources to avoid ongoing charges:
- EAS service: In the EAS console, stop or delete the deployed service.
- Pipeline model: In Designer, click Model List and delete the packaged model.
- Workflow data: Remove temporary data stored in the OSS bucket path specified in Data Storage.
- Workspace resources: If this workspace was created solely for this tutorial, delete the workspace and associated MaxCompute resources.