This topic describes how to use the data mining components that Machine Learning Platform for AI (PAI) provides to perform offline scheduling in the scenario of ad click-through rate (CTR) prediction.
Background information
The experiment in this topic follows this procedure:
- Train a model in PAI based on historical data.
- Schedule the model in DataWorks.
- In the early morning of each day, the CTRs of ads are predicted and ads are delivered based on the predicted CTRs.
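Each early-morning run in the procedure above needs to know which dt partitions feed the model and which partition to predict. The following Python sketch shows that date arithmetic under stated assumptions. The helper name `partitions_for_run` and the two-day training window are hypothetical choices that mirror the dataset in this topic (train on 20160919 and 20160920, predict 20160921). They are not PAI or DataWorks APIs.

```python
from datetime import date, timedelta


# Hypothetical helper, not a PAI or DataWorks API. The default two-day window
# mirrors this topic's dataset: train on the two preceding days, predict run_day.
def partitions_for_run(run_day: date, train_days: int = 2) -> tuple[list[str], str]:
    """Return (training dt partitions, prediction dt partition) as YYYYMMDD strings."""
    fmt = "%Y%m%d"
    train = [
        (run_day - timedelta(days=offset)).strftime(fmt)
        for offset in range(train_days, 0, -1)  # oldest day first
    ]
    return train, run_day.strftime(fmt)


train_dts, predict_dt = partitions_for_run(date(2016, 9, 21))
print(train_dts, predict_dt)  # ['20160919', '20160920'] 20160921
```

In a scheduled job, `run_day` would come from the scheduler rather than being hard-coded as it is here.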
The dataset used in this experiment is generated by using a random number generator. Therefore, the experiment result is not evaluated. This topic describes only how to create an experiment and perform offline scheduling in DataWorks.
Dataset
The training dataset used in this experiment includes historical data about the ads delivered on September 19, 2016 and September 20, 2016. This experiment predicts the CTRs of the ads delivered on September 21, 2016. The dataset is stored in a MaxCompute partitioned table. The following figure shows the sample table ad that this experiment uses. The following table describes the fields in the dataset.
Field | Data type | Description |
---|---|---|
id | STRING | The unique ID of the ad. |
age | DOUBLE | The age of the person to whom the ad is delivered. |
sex | DOUBLE | The gender of the person to whom the ad is delivered. Valid values: 1 (male) and 0 (female). |
duration | DOUBLE | How long the ad is displayed. Unit: seconds. |
place | DOUBLE | The position where the ad is displayed. Valid values: 0 to 4. |
ctr | DOUBLE | The CTR label of the ad. Valid values: 1 and 0. The value is 1 if the number of clicks divided by the number of views for the ad is greater than 0.03. Otherwise, the value is 0. |
dt | STRING | The date when the ad is delivered. Format: YYYYMMDD. |

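The labeling rule for the ctr field and the split between training and prediction partitions can be sketched in Python. This topic does not say how its random dataset was produced, so the generator `make_rows` and its value ranges (for example, ages from 18 to 60 and at most a 10% click rate) are hypothetical. Only the field names, the data types, the 0.03 threshold, and the dt values come from this topic.

```python
import random

CTR_THRESHOLD = 0.03  # from the ctr field description
TRAIN_DTS = ("20160919", "20160920")
PREDICT_DT = "20160921"


def ctr_label(clicks: int, views: int) -> float:
    """Return 1.0 if clicks / views > 0.03, else 0.0 (DOUBLE, as in the table)."""
    if views <= 0:
        return 0.0  # assumption: an ad with no views is labeled 0
    return 1.0 if clicks / views > CTR_THRESHOLD else 0.0


def make_rows(dt: str, n: int, rng: random.Random) -> list[dict]:
    """Hypothetical generator of n random rows shaped like the ad table."""
    rows = []
    for i in range(n):
        views = rng.randint(1, 1000)
        clicks = rng.randint(0, views // 10)  # at most a 10% click rate
        rows.append({
            "id": f"{dt}-{i}",                      # unique STRING ID
            "age": float(rng.randint(18, 60)),
            "sex": float(rng.randint(0, 1)),        # 1 = male, 0 = female
            "duration": float(rng.randint(1, 60)),  # seconds
            "place": float(rng.randint(0, 4)),      # positions 0 to 4
            "ctr": ctr_label(clicks, views),
            "dt": dt,                               # YYYYMMDD
        })
    return rows


rng = random.Random(42)  # fixed seed so the sketch is reproducible
rows = [r for dt in (*TRAIN_DTS, PREDICT_DT) for r in make_rows(dt, 100, rng)]
train = [r for r in rows if r["dt"] in TRAIN_DTS]
predict = [r for r in rows if r["dt"] == PREDICT_DT]
print(len(train), len(predict))  # 200 100
```

The comparison is strict: an ad with exactly 3 clicks in 100 views gets a ctr value of 0.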
Step 1: Create an experiment
- Create and configure an experiment.
- Set the component parameters.
- In the top toolbar of the canvas, click Run.
- After the experiment finishes running, right-click ad_result-1 on the canvas and select View Data to view the table that stores the prediction results. The following figure shows the table.
In the table, the prediction_result field indicates whether the ad is predicted to be clicked. Valid values: 1 (clicked) and 0 (not clicked). The prediction_score field indicates the predicted probability that the ad is clicked.
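Because ads are delivered based on the predicted CTRs, a downstream job might pick ads from this result table. The following Python sketch shows one possible policy: keep the ads whose prediction_result is 1 and rank them by prediction_score. The function `ads_to_deliver`, the top-k cutoff, and the assumption that each result row also carries the id column are hypothetical and not part of the PAI output specification.

```python
# Hypothetical delivery policy, not a PAI feature. Assumes each result row is a
# dict holding the ad id plus the prediction_result and prediction_score columns.
def ads_to_deliver(results: list[dict], k: int) -> list[str]:
    """Return up to k ad IDs predicted to be clicked, highest prediction_score first."""
    clicked = [r for r in results if r["prediction_result"] == 1]
    clicked.sort(key=lambda r: r["prediction_score"], reverse=True)
    return [r["id"] for r in clicked[:k]]


sample = [
    {"id": "a", "prediction_result": 1, "prediction_score": 0.62},
    {"id": "b", "prediction_result": 0, "prediction_score": 0.12},
    {"id": "c", "prediction_result": 1, "prediction_score": 0.88},
]
print(ads_to_deliver(sample, k=2))  # ['c', 'a']
```

Ranking by prediction_score rather than delivering every ad labeled 1 lets the job respect a fixed number of ad slots.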