
Alibaba Cloud Machine Learning Platform for AI: Offline Scheduling Instructions

This article illustrates how to implement ad click-through rate (CTR) prediction with Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI).

By Garvin Li

This article implements an ad click-through rate (CTR) prediction scenario, a typical application in the advertising industry. A prediction model is trained on historical data and then applied to each day's incremental data to find the ads that meet the CTR standard so that they can be pushed.

The experiment uses Alibaba Cloud Machine Learning for data mining and DataWorks for scheduling and pushing. The business scenario is as follows:

  1. Use historical data to train a model on the Alibaba Cloud machine learning platform.
  2. Use DataWorks to schedule the model.
  3. At midnight every day, run CTR prediction on the ads to find and push the ads that meet the standard.

Dataset Introduction

The detailed fields are as follows:


The data shown in the following screenshot is randomly generated, so this experiment does not evaluate the prediction results. Instead, it focuses on how to build the experiment and how to schedule it with DataWorks. Historical data from 20160919 and 20160920 is used to predict the data for 20160921. The data is stored in a MaxCompute partitioned table.
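
The partition layout above can be sketched in a few lines of Python. The helper `partition_plan` is hypothetical: it assumes that partitions are named by a `dt` value in yyyyMMdd form and that training always uses the days immediately before the prediction day, generalizing the example of predicting 20160921 from 20160919 and 20160920. It only computes partition names and does not touch MaxCompute.

```python
from datetime import date, timedelta
from typing import List, Tuple


def partition_plan(predict_day: date, history_days: int = 2) -> Tuple[List[str], str]:
    """Return (training dt partitions, prediction dt partition) as yyyyMMdd strings.

    Hypothetical helper: assumes training uses the `history_days` days
    immediately before the prediction day.
    """
    fmt = "%Y%m%d"
    train = [
        (predict_day - timedelta(days=k)).strftime(fmt)
        for k in range(history_days, 0, -1)  # oldest day first
    ]
    return train, predict_day.strftime(fmt)


train_parts, predict_part = partition_plan(date(2016, 9, 21))
print(train_parts, predict_part)  # ['20160919', '20160920'] 20160921
```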


Experiment Procedure

The following diagram shows the experiment process.


The experiment can be roughly divided into four modules: importing the data source (ads), data preprocessing (normalization), model training (binary logistic regression), and prediction (prediction).

1. Importing Data Source

  1. "ad-2" is the data source for training.
  2. "ad-1" is the data source for predicting.
  3. For the partitioned table, set the partition to dt=@@{yyyyMMdd} so that the prediction data is always the daily incremental data, as shown in the following screenshot. (For more information about using partitions, see https://help.aliyun.com/document_detail/30281.html?spm=5176.doc30276.6.126.3kX7OU)
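
A minimal sketch of what the dt=@@{yyyyMMdd} setting amounts to: at run time the placeholder is replaced with a date string in yyyyMMdd form, which selects a single day's partition. The function name `resolve_partition` is hypothetical, and the date is passed in explicitly because this sketch does not claim which date DataWorks actually substitutes (for example, the run date or the business date).

```python
from datetime import date

PLACEHOLDER = "@@{yyyyMMdd}"


def resolve_partition(spec: str, run_date: date) -> str:
    """Replace the @@{yyyyMMdd} placeholder with run_date formatted as yyyyMMdd."""
    return spec.replace(PLACEHOLDER, run_date.strftime("%Y%m%d"))


print(resolve_partition("dt=@@{yyyyMMdd}", date(2016, 9, 21)))  # dt=20160921
```

Because the placeholder is resolved on every run, the same experiment reads a different partition each day without being edited.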


2. Intermediate Processing

The intermediate processing consists of two steps: data normalization and model training. Model training uses the historical data to generate the prediction model. (For details on the underlying principles, see the Heart disease prediction case.)
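
As an illustration of these two steps, the sketch below applies min-max normalization and then fits a binary logistic regression model with batch gradient descent in plain Python. It is not PAI's implementation: the normalization method, learning rate, epoch count, and the toy data are all assumptions, and it assumes numeric features with 0/1 click labels.

```python
import math
from typing import List, Sequence, Tuple


def min_max_normalize(rows: List[List[float]]) -> Tuple[List[List[float]], List[Tuple[float, float]]]:
    """Scale each column to [0, 1]; return the scaled rows and per-column (min, max)."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    scaled = [
        # Constant columns carry no information, so map them to 0.0.
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]
    return scaled, bounds


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: List[List[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Minimize the mean log loss with batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: two numeric features per ad and a click label.
rows = [[1.0, 200.0], [2.0, 180.0], [8.0, 40.0], [9.0, 20.0]]
labels = [0, 0, 1, 1]
X, bounds = min_max_normalize(rows)
weights, bias = train_logistic_regression(X, labels)
print(bounds, weights, bias)
```

The per-column (min, max) bounds are returned so that the same scaling can be applied to the prediction data before it is scored.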

3. Data Prediction

The prediction results are stored in the "ad_result-1" table, as shown below.


  1. prediction_result: Indicates whether the ad is predicted to be clicked. 1 means that the ad is clicked, and 0 means that it is not clicked.
  2. prediction_score: Indicates the probability of being clicked.
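
The two output columns can be illustrated with a small scoring sketch. It assumes a logistic regression model whose click probability is reported as prediction_score (following the description above), a hypothetical threshold of 0.5 for prediction_result, and that "ads that meet the standard" are those with prediction_result = 1. The weights, ad IDs, and feature values are made up.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Prediction:
    ad_id: str
    prediction_result: int   # 1 = predicted to be clicked, 0 = predicted not to be clicked
    prediction_score: float  # modeled probability that the ad is clicked


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(ad_id: str, features: Sequence[float], weights: Sequence[float],
            bias: float, threshold: float = 0.5) -> Prediction:
    """Score one (already normalized) feature vector with a logistic regression model."""
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return Prediction(ad_id, 1 if p >= threshold else 0, p)


def ads_to_push(predictions: List[Prediction]) -> List[str]:
    """Keep the ads predicted to be clicked, highest score first."""
    hits = [p for p in predictions if p.prediction_result == 1]
    return [p.ad_id for p in sorted(hits, key=lambda p: p.prediction_score, reverse=True)]


# Hypothetical weights and normalized features, for illustration only.
weights, bias = [4.0, -4.0], 0.0
preds = [
    predict("ad_a", [0.9, 0.1], weights, bias),
    predict("ad_b", [0.2, 0.8], weights, bias),
    predict("ad_c", [0.6, 0.3], weights, bias),
]
print(ads_to_push(preds))  # ['ad_a', 'ad_c']
```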

Module Scheduling

1. Go to the DataWorks Workspace

Go to the console homepage and click DataWorks to open the Data IDE workspace.


DataWorks and the machine learning platform share the same set of projects. Select the project that contains the experiment to be scheduled, and click Start Data Modeling.


2. Create a New Node Scheduling Task

Click New and select New Task.


In the configuration section of the created task, select Node Task for Task Type and Machine Learning for Type.


3. Configure the Scheduling Task

After the node task is created, select the machine learning task to schedule and set the scheduling time in the configuration pane on the right. In this experiment, training and pushing run at 00:00 every day.


Click Submit. Submitted jobs take effect the next day.
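
The two scheduling facts above (a daily run at 00:00, and jobs taking effect the day after submission) can be combined into a small sketch that lists the first few runs. It only mirrors those two statements; it does not reproduce how DataWorks actually generates task instances, and the function names are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import List

RUN_AT = time(0, 0)  # daily run time chosen in this experiment (00:00)


def first_run(submitted_at: datetime) -> datetime:
    """First scheduled run: RUN_AT on the day after submission."""
    return datetime.combine(submitted_at.date() + timedelta(days=1), RUN_AT)


def upcoming_runs(submitted_at: datetime, count: int) -> List[datetime]:
    """The first `count` daily runs after a submission."""
    start = first_run(submitted_at)
    return [start + timedelta(days=i) for i in range(count)]


for run in upcoming_runs(datetime(2016, 9, 20, 15, 30), 3):
    print(run.isoformat())
```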


4. Query Task Logs

After the scheduling task is submitted, click Maintain to view its logs.


To learn more about Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI), visit www.alibabacloud.com/product/machine-learning
