This topic describes how to use the user feature algorithm that is provided by Machine Learning Platform for AI (PAI) to build a model that monitors user churn.
Background information
Growing the user base and retaining existing users are key to business growth. You can use risk control models to identify users who are likely to churn and take measures to retain them.
Mainstream solutions for monitoring user churn are rule-based and cannot accurately identify users who are likely to churn.
Solutions
PAI provides a comprehensive solution to implement feature encoding, classification model training, and model evaluation based on labeled data. The following conditions must be met before you can use this solution:
You have basic modeling knowledge.
You can dedicate one to two days to development.
You have more than 1,000 labeled data records that capture the characteristics of users who churn.
Datasets
The pipeline described in this topic uses masked real data that is collected from a telecommunications platform. The entire dataset contains 7,043 data records, including the basic information and churn status of each user. The following figure shows the sample data that is used in the pipeline.
The following table describes the fields in the dataset.
Field | Description |
customerid | The ID of the user. |
gender | The gender of the user. |
SeniorCitizen | Specifies whether the user is a senior citizen. |
Partner | Specifies whether the user has a partner. |
Dependents | Specifies whether the user has dependents. |
tenure | The duration for which the user is served by the service provider. |
PhoneService | Specifies whether the user subscribes to mobile phone services. |
MultipleLine | Specifies whether the user uses multiple lines of services. |
InternetService | The Internet service to which the user subscribes, such as DSL or Fiber optic. |
OnlineSecurity | Specifies whether the user faces Internet security issues. |
OnlineBackup | Specifies whether the user has access to online backup. |
DeviceProtection | Specifies whether the user has access to device protection. |
TechSupport | Specifies whether the user has applied for technical support. |
StreamingTV | Specifies whether the user has access to streaming TV programs. |
StreamingMovies | Specifies whether the user has access to streaming movies. |
Contract | The contract period, such as Month-to-month or Two year. |
PaperlessBilling | Specifies whether the user receives electronic bills. |
PaymentMethod | The payment method used by the user. |
MonthlyCharges | The monthly expenses of the user. |
TotalCharges | The total expenses of the user. |
The following table describes the label field.
Field | Description |
churn | Specifies whether the user churns. |
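Before you build the pipeline, it can help to confirm that the labeled data meets the prerequisite of more than 1,000 records. The following is a minimal sketch in plain Python, not a PAI API. It assumes the data is available as CSV text that uses the field names in the preceding tables. The two sample records, the function names, and the subset of fields are hypothetical and are used only for illustration.

```python
import csv
import io
from collections import Counter

# Two hypothetical records that follow the field names in the tables above.
# The values are invented for illustration and are not taken from the dataset.
SAMPLE_CSV = """customerid,gender,tenure,Contract,MonthlyCharges,churn
0001-DEMO,Female,1,Month-to-month,29.85,No
0002-DEMO,Male,34,Two year,56.95,Yes
"""

MIN_LABELED_RECORDS = 1000  # threshold stated in the prerequisites


def summarize_labels(csv_text, label_field="churn"):
    """Count records per label value, ignoring rows without a label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        label = (row.get(label_field) or "").strip()
        if label:
            counts[label] += 1
    return counts


def has_enough_labeled_data(counts, minimum=MIN_LABELED_RECORDS):
    """Return True when more than `minimum` labeled records are available."""
    return sum(counts.values()) > minimum


label_counts = summarize_labels(SAMPLE_CSV)
print(label_counts)
print(has_enough_labeled_data(label_counts))
```

The label counts also show how balanced the two classes are, which is worth knowing before you interpret the evaluation metrics.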
Procedure
Go to the Machine Learning Designer page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, go to the Visualized Modeling (Designer) page.
Create a pipeline.
On the Visualized Modeling (Designer) page, click the Preset Templates tab.
On this tab, find the Churn User Monitoring template and click Create.
In the Create Pipeline dialog box, configure the parameters. You can use their default values.
The Data Storage parameter specifies the Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.
Click OK.
The pipeline is created in about 10 seconds.
On the Pipelines tab, double-click the created Churn User Monitoring pipeline to open it.
View the components of the pipeline on the canvas, as shown in the following figure. The system automatically creates the pipeline based on the built-in template.

Section | Description |
① | The component displayed in this section imports the dataset used by the pipeline. |
② | The One Hot Encoding-1 and SQL Script-1 components displayed in this section perform feature engineering to convert STRING feature data to numeric feature data. For example, the original value of the churn field is Yes or No. The SQL Script-1 component executes the following SQL statement to convert Yes to 1 and No to 0: select (case churn when 'Yes' then 1 else 0 end) as churn from ${t1}; |
③ | The components displayed in this section split the dataset into a training dataset and a prediction dataset. Because a user either churns or does not churn, you can use a binary classification algorithm to predict user churn. |
④ | The Evaluate component evaluates the quality of the model by using metrics such as the area under the curve (AUC), the Kolmogorov-Smirnov (KS) value, and the F1 score. |
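The data preparation performed in sections ② and ③ can be sketched in plain Python as follows. This is an illustration of the technique only and does not show how the PAI components are implemented. The function names, the sample records, the 80/20 split ratio, and the random seed are assumptions made for this example.

```python
import random


def one_hot_encode(rows, field):
    """Replace a string-valued field with one 0/1 column per distinct value."""
    categories = sorted({row[field] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != field}
        for category in categories:
            new_row[f"{field}_{category}"] = 1 if row[field] == category else 0
        encoded.append(new_row)
    return encoded


def encode_label(rows, field="churn"):
    """Mirror the SQL CASE expression: 'Yes' becomes 1, anything else 0."""
    return [{**row, field: 1 if row[field] == "Yes" else 0} for row in rows]


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and prediction sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records; the values are invented for illustration.
rows = [
    {"Contract": "Month-to-month", "tenure": 1, "churn": "Yes"},
    {"Contract": "Two year", "tenure": 60, "churn": "No"},
    {"Contract": "Month-to-month", "tenure": 5, "churn": "No"},
    {"Contract": "Two year", "tenure": 48, "churn": "No"},
    {"Contract": "Month-to-month", "tenure": 2, "churn": "Yes"},
]

prepared = encode_label(one_hot_encode(rows, "Contract"))
train_rows, predict_rows = split_rows(prepared)
print(prepared[0])
print(len(train_rows), len(predict_rows))
```

After encoding, every feature and the label are numeric, which is the form that a binary classification algorithm expects as input.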
Run the pipeline and view the results.
In the upper-left corner of the canvas, click the Run icon.
After the pipeline is run, right-click the Binary Classification Evaluation-1 component. In the shortcut menu that appears, click Visual Analysis.
In the Evaluate section, click the Index data tab to view the metrics that are used to evaluate the model.
The closer the AUC value is to 1, the better the model distinguishes users who churn from users who do not. In the preceding figure, the AUC value is greater than 0.8, which indicates that the model predicts churn well.
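To make the evaluation metrics concrete, the following sketch computes the AUC, the KS value, and the F1 score in plain Python. It is not the implementation used by the evaluation component. The labels, the predicted scores, the 0.5 decision threshold, and the function names are assumptions made for this example, and the code assumes that both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per distinct score threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks(points):
    """KS value: the largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


def f1_score(labels, scores, threshold=0.5):
    """F1 score when scores at or above `threshold` are predicted as churn."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical labels (1 = churn) and predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_points(labels, scores)
print(round(auc(points), 3), round(ks(points), 3), round(f1_score(labels, scores), 3))
```

The AUC and the KS value depend only on how the scores rank the users, whereas the F1 score also depends on the chosen threshold.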