This topic describes how to use the user feature algorithm that is provided by Machine Learning Platform for AI (PAI) to create a model to monitor user churn.

Background information

Growing the user base and retaining existing users are key to business growth. You can use risk control models to identify likely-to-churn users and take measures to prevent these users from churning.

Mainstream solutions for monitoring user churn are rule-based and cannot accurately identify likely-to-churn users.

Solutions

PAI provides a comprehensive solution to implement feature encoding, classification model training, and model evaluation based on labeled data. The following conditions must be met before you can use this solution:
  • You have basic modeling knowledge.
  • You can dedicate one to two days to development.
  • You have more than 1,000 labeled data records that capture the characteristics of users who churn.

Datasets

The pipeline described in this topic uses masked real data that is collected from a telecommunications platform. The entire dataset contains 7,043 data records, including the basic information and churn status of each user. The following figure shows the sample data that is used in the pipeline.
Dataset for the pipeline on monitoring user churn
The following table describes the fields in the dataset.
Field Description
customerid The ID of the user.
gender The gender of the user.
SeniorCitizen Specifies whether the user is a senior citizen. Valid values:
  • 1: The user is a senior citizen.
  • 0: The user is not a senior citizen.
Partner Specifies whether the user has a partner.
Dependents Specifies whether the user has dependents.
tenure The duration for which the user is served by the service provider.
PhoneService Specifies whether the user subscribes to mobile phone services.
MultipleLine Specifies whether the user uses multiple lines of services.
InternetService The Internet service to which the user subscribes, such as DSL or Fiber optic.
OnlineSecurity Specifies whether the user subscribes to the online security service.
OnlineBackup Specifies whether the user subscribes to the online backup service.
DeviceProtection Specifies whether the user subscribes to the device protection service.
TechSupport Specifies whether the user subscribes to the technical support service.
StreamingTV Specifies whether the user has access to streaming TV programs.
StreamingMovies Specifies whether the user has access to streaming movies.
Contract The contract period, such as Month-to-month or Two year.
PaperlessBilling Specifies whether the user receives electronic bills.
PaymentMethod The payment method used by the user.
MonthlyCharges The monthly expenses of the user.
TotalCharges The total expenses of the user.
The following table describes the label field in the labeled data.
Field Description
churn Specifies whether the user churns.
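
Many of the fields above, such as Contract, InternetService, and churn, hold STRING values that must be converted to numbers before a classification model can be trained. The following Python sketch illustrates the idea on a few made-up records. It is not the PAI implementation: the sample values and the helper names one_hot_encode and encode_label are hypothetical and are used only for illustration.

```python
# Hypothetical sample records. The values are illustrative and are not taken
# from the actual dataset.
records = [
    {"Contract": "Month-to-month", "InternetService": "DSL", "tenure": 1, "churn": "Yes"},
    {"Contract": "Two year", "InternetService": "Fiber optic", "tenure": 34, "churn": "No"},
    {"Contract": "Month-to-month", "InternetService": "Fiber optic", "tenure": 2, "churn": "Yes"},
]


def one_hot_encode(rows, columns):
    """Replace each listed STRING column with one 0/1 column per distinct value."""
    # Collect the distinct values of each column in a stable (sorted) order.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        # Keep every column that is not being encoded.
        out = {key: value for key, value in row.items() if key not in columns}
        for col in columns:
            for value in categories[col]:
                out[f"{col}={value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


def encode_label(rows, label="churn"):
    """Map 'Yes' to 1 and any other value to 0."""
    return [{**row, label: 1 if row[label] == "Yes" else 0} for row in rows]


features = encode_label(one_hot_encode(records, ["Contract", "InternetService"]))
for row in features:
    print(row)
```

After encoding, every value in a record is numeric: each STRING column is replaced by one indicator column per distinct value, and the churn label becomes 1 or 0.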

Procedure

  1. Go to the Machine Learning Designer page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side navigation pane, click Workspaces. On the Workspace list page, click the name of the workspace that you want to manage.
    3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.
  2. Create a pipeline.
    1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.
    2. On this tab, find the Churn User Monitoring template and click Create.
    3. In the Create Pipeline dialog box, configure the parameters. You can use their default values.
      The value specified for the Pipeline Data Path parameter is the Object Storage Service (OSS) bucket path of the temporary data and models generated during the runtime of the pipeline.
    4. Click OK.
      It takes about 10 seconds to create the pipeline.
    5. On the Pipelines tab, double-click the created Churn User Monitoring pipeline to open it.
    6. View the components of the pipeline on the canvas, as shown in the following figure. The system automatically creates the pipeline based on the built-in template.
      Churn user monitoring
      Section Description
      The component displayed in this section imports the dataset used by the pipeline.
      The One Hot Encoding-1 and SQL Script-1 components displayed in this section perform feature engineering to convert feature data of the STRING type to numeric data. For example, the original value of the churn field is Yes or No. The SQL Script-1 component executes the following SQL statement to convert Yes to 1 and No to 0.
      select (case churn when 'Yes' then 1 else 0 end) as churn from ${t1};
      The components displayed in this section divide the dataset into a training dataset and a prediction dataset. Because a user either churns or does not churn, you can use a binary classification algorithm to predict user churn.
      The Binary Classification Evaluation-1 component evaluates the quality of the model by using metrics such as the area under the curve (AUC), the Kolmogorov-Smirnov (KS) value, and the F1 score.
  3. Run the pipeline and view the results.
    1. In the upper-left corner of the canvas, click the Run icon.
    2. After the pipeline is run, right-click the Binary Classification Evaluation-1 component. In the shortcut menu that appears, click Visual Analysis.
    3. In the Evaluate section, click the Index data tab to view the metrics that are used to evaluate the model.
      Model effects
      The closer the AUC value is to 1, the better the model distinguishes users who churn from users who do not. In the preceding figure, the AUC value is greater than 0.8, which indicates that the model predicts churn well.
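
To make the evaluation metrics concrete, the following Python sketch computes the AUC, the KS value, and the F1 score for a small set of labels and predicted scores. It is a minimal illustration under stated assumptions, not the way the Binary Classification Evaluation component computes its results: the labels and scores are made up, the function names are hypothetical, the F1 score assumes a threshold of 0.5, and both classes are assumed to be present in the data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the threshold from high to low scores."""
    pos = sum(labels)
    neg = len(labels) - pos  # Assumes both classes are present (pos > 0 and neg > 0).
    points = [(0.0, 0.0)]
    tp = fp = 0
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    i = 0
    while i < len(pairs):
        j = i
        # Records with the same score are processed together and yield one point.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def ks(labels, scores):
    """KS value: the largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in roc_points(labels, scores))


def f1(labels, scores, threshold=0.5):
    """F1 score when scores at or above the threshold are predicted as churn."""
    preds = [1 if score >= threshold else 0 for score in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Made-up example: 1 means the user churned, and each score is a predicted churn probability.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(round(auc(labels, scores), 3))  # 0.889
print(round(ks(labels, scores), 3))   # 0.667
print(round(f1(labels, scores), 3))   # 0.857
```

In this example, eight of the nine possible (churned, not churned) user pairs are ranked correctly, which gives an AUC of 8/9. An AUC of 0.5 corresponds to random ranking, and an AUC of 1 corresponds to perfect ranking.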