Deploy a complete CTR prediction pipeline for consistent offline training and online inference - Platform For AI

Consistency challenge

Prediction errors often stem from mismatched feature transformations between training and serving. When normalization parameters or encoding mappings are rebuilt at serving time, the model receives different inputs than during training.

PAI Designer packages preprocessing, feature engineering, and prediction into one deployable unit. Normalization parameters and one-hot encoding mappings from training are preserved and reused at inference.
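The fit-once, reuse-at-serving idea can be sketched in plain Python. This is a minimal illustration, not Designer's implementation. The class and function names are hypothetical, normalization is assumed to be min-max scaling, and a categorical value not seen in training is assumed to map to an all-zero one-hot block.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FittedTransforms:
    """Parameters learned once on training data, then reused unchanged at serving."""
    min_max: dict[str, tuple[float, float]]  # numeric column -> (min, max)
    vocab: dict[str, dict[str, int]]         # categorical column -> value -> slot

    def normalize(self, row: dict) -> list[float]:
        out = []
        for col, (lo, hi) in self.min_max.items():
            x = float(row[col])
            # A constant training column has no range; map it to 0.0.
            out.append(0.0 if hi == lo else (x - lo) / (hi - lo))
        return out

    def one_hot(self, row: dict) -> list[float]:
        out: list[float] = []
        for col, mapping in self.vocab.items():
            vec = [0.0] * len(mapping)
            slot = mapping.get(str(row[col]))
            if slot is not None:  # values unseen in training stay all-zero (assumption)
                vec[slot] = 1.0
            out.extend(vec)
        return out

    def assemble(self, row: dict) -> list[float]:
        # Vector Assembler analogue: concatenate the numeric and one-hot parts.
        return self.normalize(row) + self.one_hot(row)


def fit(rows: list[dict], numeric: list[str], categorical: list[str]) -> FittedTransforms:
    """Learn min/max per numeric column and a sorted vocabulary per categorical column."""
    min_max = {}
    for c in numeric:
        values = [float(r[c]) for r in rows]
        min_max[c] = (min(values), max(values))
    vocab = {}
    for c in categorical:
        values = sorted({str(r[c]) for r in rows})
        vocab[c] = {v: i for i, v in enumerate(values)}
    return FittedTransforms(min_max, vocab)
```

If fit were called on serving traffic instead (for example, on a single request), it would learn a different min, max, and vocabulary. The same record would then produce a different and even shorter vector. That is exactly the train/serve skew described above.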

Prerequisites

Prepare the following resources:

A PAI workspace. See Create and manage workspaces.
MaxCompute resources associated with the workspace.

Dataset

This tutorial uses a 200,000-sample subset of the Avazu CTR prediction dataset: 160,000 training samples and 40,000 test samples.

Column name	Type	Description
id	STRING	Advertisement ID
click	DOUBLE	Click indicator (1 = clicked, 0 = not clicked)
dt_year	INT	Year
dt_month	INT	Month
dt_day	INT	Day
dt_hour	INT	Hour
c1	STRING	Anonymized categorical variable
banner_pos	INT	Banner position
site_id	STRING	Site ID
site_domain	STRING	Site domain
site_category	STRING	Site category
app_id	STRING	Application ID
app_domain	STRING	Application domain
app_category	STRING	Application category
device_id	STRING	Device ID
device_ip	STRING	Device IP address
device_model	STRING	Device model
device_type	STRING	Device type
device_conn_type	STRING	Device connection type
c14 - c21	DOUBLE	Anonymized categorical variables (8 columns)
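Before sending records to the deployed service, it can help to check them against this schema on the client side. The sketch below is a hypothetical helper, not part of PAI. It maps STRING, INT, and DOUBLE to Python str, int, and float. It also compares column names case-insensitively, because this table lists c1 while the sample request body in Test service uses C1.

```python
from __future__ import annotations

# Column types copied from the dataset table.
SCHEMA: dict[str, str] = {
    "id": "STRING", "click": "DOUBLE",
    "dt_year": "INT", "dt_month": "INT", "dt_day": "INT", "dt_hour": "INT",
    "c1": "STRING", "banner_pos": "INT",
    "site_id": "STRING", "site_domain": "STRING", "site_category": "STRING",
    "app_id": "STRING", "app_domain": "STRING", "app_category": "STRING",
    "device_id": "STRING", "device_ip": "STRING", "device_model": "STRING",
    "device_type": "STRING", "device_conn_type": "STRING",
    **{f"c{i}": "DOUBLE" for i in range(14, 22)},  # c14 to c21
}


def _matches(value: object, sql_type: str) -> bool:
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        return False
    if sql_type == "STRING":
        return isinstance(value, str)
    if sql_type == "INT":
        return isinstance(value, int)
    if sql_type == "DOUBLE":
        return isinstance(value, (int, float))  # JSON may encode 0.0 as 0
    raise ValueError(f"unknown column type: {sql_type}")


def validate_record(record: dict[str, object]) -> list[str]:
    """Return a list of problems; an empty list means the record fits SCHEMA."""
    lowered = {key.lower(): value for key, value in record.items()}
    problems = []
    for name, sql_type in SCHEMA.items():
        if name not in lowered:
            problems.append(f"missing column: {name}")
        elif not _matches(lowered[name], sql_type):
            actual = type(lowered[name]).__name__
            problems.append(f"{name}: expected {sql_type}, got {actual}")
    for name in sorted(lowered.keys() - SCHEMA.keys()):
        problems.append(f"unexpected column: {name}")
    return problems
```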

Open Designer

Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the workspace name.
In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer).

Create a pipeline from a template

On the Designer page, click the Preset Templates tab.
Find Click-Through Rate Prediction and click Create.
In the Create Pipeline dialog box, set Data Storage to an OSS bucket path where the pipeline stores temporary data and models. Keep the default values for other parameters.
Click OK. Pipeline creation takes approximately 10 seconds.
In the pipeline list, select Click-Through Rate Prediction and click Open.

Workflow structure

The template processes features in two parallel paths before combining them for training:

Numerical features: Normalized to a common range.
Categorical features: One-hot encoded into binary vectors, then combined with normalized numerical features using Vector Assembler.

The combined feature vector feeds into the Factorization Machine (FM) algorithm for training and prediction.
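For reference, a second-order factorization machine scores a feature vector x as ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. Here, each feature i has a weight w_i and a k-dimensional latent vector v_i. The sketch below computes the pairwise term with the standard O(kn) identity instead of looping over all pairs. It then applies a sigmoid to turn the score into a click probability. The function names are hypothetical. The sigmoid output is an assumption about how the binary-classification FM in Designer reports scores.

```python
from __future__ import annotations

import math


def fm_score(x: list[float], w0: float, w: list[float],
             v: list[list[float]]) -> float:
    """Second-order FM raw score; v[i] is the k-dimensional latent vector of feature i."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def click_probability(x: list[float], w0: float, w: list[float],
                      v: list[list[float]]) -> float:
    """Map the raw score to (0, 1) with a numerically stable sigmoid (assumed link)."""
    z = fm_score(x, w0, w, v)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Latent vectors are shared across every pair a feature takes part in. This lets FM estimate interactions between one-hot columns that rarely co-occur in the training data, which is why it suits sparse CTR features. A production implementation would iterate only over the nonzero entries of x rather than the full dense vector.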

Run and evaluate

At the top of the canvas, click the run button to start execution.
After completion, right-click Binary Classification Evaluation-1 and select Visual Analytics. Alternatively, click the evaluation icon at the top.
In the Binary Classification Evaluation-1 dialog box, view prediction accuracy on the Metrics Data tab.

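The evaluation displays AUC (area under the ROC curve), KS (Kolmogorov-Smirnov statistic), and F1 Score. AUC is the primary metric for CTR prediction because it measures how well the model ranks clicked impressions above non-clicked ones, independent of any classification threshold. An AUC above 0.70 indicates reasonable performance on this dataset subset.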
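To sanity-check the values Designer reports, the three metrics can be recomputed from (label, score) pairs. In this minimal sketch, AUC uses the rank-sum (Mann-Whitney) formulation, with tied scores sharing an average rank. KS is the largest gap between the true positive rate and the false positive rate over all score thresholds. F1 is computed at a threshold of 0.5, which is an assumption, because Designer may report F1 at a different threshold.

```python
from __future__ import annotations


def auc(labels: list[int], scores: list[float]) -> float:
    """Rank-sum AUC: P(score of a random positive > score of a random negative)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ks(labels: list[int], scores: list[float]) -> float:
    """Max |TPR - FPR| over all thresholds, sweeping scores from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move every record with this score past the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def f1(labels: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """F1 of the rule 'predict click if score >= threshold'."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
```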

Package and deploy

When the evaluation metrics meet your requirements, package the entire pipeline (preprocessing, feature engineering, and prediction) and deploy it to EAS (Elastic Algorithm Service).

Package model

At the top of the canvas, click Create Pipeline Model to start packaging.
Select Normalization Batch Prediction-2. The downstream pipeline is automatically selected. Click Next to package the selected pipeline and models.
Confirm the packaging information and click Next. Packaging takes 3-5 minutes.

Deploy service

Deploy the packaged model using either method:

Method 1: After Run Status shows Successful, click Deploy to EAS. Configure Service Name and Resource Deployment Information, then click Deploy. See Deploy a pipeline as an online service.
Method 2: If you closed the dialog box, click View All Tasks in the upper-right corner. In Historical Tasks, wait until Status shows Success, and then do either of the following:
- Click Actions > Model > Deploy.
- Alternatively, click Model List at the top. Select the packaged model and click Deploy to EAS.

Test service

In the EAS console, find your service and click Online Debugging in the Actions column. See Debug a service online.

In Request Body, enter test data matching the dataset schema:

[{"id":"10000169349117863715","click":0.0,"dt_year":14,"dt_month":10,"dt_day":21,"dt_hour":0,"C1":"1005","banner_pos":0,"site_id":"1fbe01fe","site_domain":"f3845767","site_category":"28905ebd","app_id":"ecad2386","app_domain":"7801e8d9","app_category":"07d7df22","device_id":"a99f214a","device_ip":"96809ac8","device_model":"711ee120","device_type":"1","device_conn_type":"0","c14":15704.0,"c15":320.0,"c16":50.0,"c17":1722.0,"c18":0,"c19":35.0,"c20":100084.0,"c21":79.0}]

Click Send Request. The service processes data through the inference pipeline: Normalization Prediction > One-Hot Encoding Prediction > Vector Assembler > FM Prediction.

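The response contains a prediction score for each input record. Scores closer to 1.0 indicate a higher click probability, and scores closer to 0.0 indicate a lower one. The service runs the same packaged pipeline that was evaluated offline and reuses the normalization parameters and one-hot encoding mappings learned during training. Each record is therefore transformed exactly as it was during offline evaluation, so online scores are consistent with the offline metrics.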
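Outside the console, the same request can be sent with any HTTP client. The sketch below uses only the Python standard library. The endpoint and token are hypothetical placeholders that you copy from the service's invocation information in the EAS console. Passing the token in the Authorization header follows common EAS usage, but confirm it against Debug a service online. The response body is returned as raw text because its exact format is not specified here.

```python
from __future__ import annotations

import json
import urllib.request

# Placeholders (hypothetical): copy the real endpoint and token from the
# service's invocation information in the EAS console.
ENDPOINT = "http://<service-endpoint>"
TOKEN = "<service-token>"


def build_request(records: list[dict], endpoint: str = ENDPOINT,
                  token: str = TOKEN) -> urllib.request.Request:
    """Build a POST request whose body is the same JSON array used in Online Debugging."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def predict(records: list[dict], endpoint: str = ENDPOINT,
            token: str = TOKEN, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(records, endpoint, token),
                                timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling predict with a list containing the Request Body record above sends the same payload as Online Debugging, so it should return the same scores.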

Clean up

Remove resources to avoid ongoing charges:

EAS service: In the EAS console, stop or delete the deployed service.
Pipeline model: In Designer, click Model List and delete the packaged model.
Workflow data: Remove temporary data stored in the OSS bucket path specified in Data Storage.
Workspace resources: If this workspace was created solely for this tutorial, delete the workspace and associated MaxCompute resources.
