This topic describes how to use the user feature algorithm that is provided by Machine Learning Platform for AI (PAI) to create a model to monitor user churn.

Background information

How to increase the user base and retain existing users is key to business growth. You can use risk control models to identify likely-to-churn users and take measures to prevent these users from churning.

Mainstream solutions for monitoring user churn are rule-based and often fail to accurately identify users who are likely to churn.

Solution

PAI provides a comprehensive solution to implement feature encoding, classification model training, and model evaluation based on labeled data. The following conditions must be met before you can use this solution:
  • You have basic modeling knowledge.
  • You can dedicate one to two days to development.
  • You have more than 1,000 labeled data records that show the characteristics of situations in which users churn.

Dataset

The experiment described in this topic is based on real data that is collected from a telecommunications platform after data masking. The entire dataset contains 7,043 data records, including the basic information and churn status of each user. The following figure shows the sample data that is used in the experiment.
Dataset for the experiment on monitoring user churn
The following table describes the fields in the dataset.
Parameter Description
customerid The ID of the user.
gender The gender of the user.
SeniorCitizen Indicates whether the user is a senior citizen. Valid values:
  • 1: The user is a senior citizen.
  • 0: The user is not a senior citizen.
Partner Indicates whether the user has a partner.
Dependents Indicates whether the user has dependents.
tenure The duration for which the user is served by the service provider.
PhoneService Indicates whether the user subscribes to mobile phone services.
MultipleLine Indicates whether the user uses multiple lines of services.
InternetService The Internet service that the user subscribes to, for example, DSL or Fiber optic.
OnlineSecurity Indicates whether the user subscribes to an online security service.
OnlineBackup Indicates whether the user subscribes to an online backup service.
DeviceProtection Indicates whether the user subscribes to a device protection service.
TechSupport Indicates whether the user subscribes to a technical support service.
StreamingTV Indicates whether the user has access to streaming TV programs.
StreamingMovies Indicates whether the user has access to streaming movies.
Contract The contract period, for example, Month-to-month or Two year.
PaperlessBilling Indicates whether the user receives electronic bills.
PaymentMethod The payment method used by the user.
MonthlyCharges The monthly expenses of the user.
TotalCharges The total expenses of the user.
The following table describes the label field.
Parameter Description
churn Indicates whether the user churns.
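
Many of the fields above hold string values such as Yes, No, or Month-to-month, so they must be converted to numbers before a classifier can use them. The following minimal Python sketch illustrates the idea outside PAI: each string field is expanded into one indicator column per distinct value (one-hot encoding), and the churn label is mapped from Yes to 1 and from any other value to 0. The sample rows and the helper names build_vocabulary and encode_row are hypothetical. They only illustrate the technique and are not PAI components or real records.

```python
# Hypothetical sample rows; the values are illustrative, not taken from the dataset.
rows = [
    {"gender": "Female", "Contract": "Month-to-month", "MonthlyCharges": 29.85, "churn": "No"},
    {"gender": "Male", "Contract": "Two year", "MonthlyCharges": 56.95, "churn": "Yes"},
    {"gender": "Male", "Contract": "Month-to-month", "MonthlyCharges": 70.70, "churn": "Yes"},
]

CATEGORICAL = ["gender", "Contract"]  # string fields to one-hot encode
NUMERIC = ["MonthlyCharges"]          # numeric fields passed through unchanged


def build_vocabulary(rows, fields):
    """Collect the sorted distinct values of each categorical field."""
    return {f: sorted({r[f] for r in rows}) for f in fields}


def encode_row(row, vocab):
    """Return (feature_vector, label) for one record."""
    features = [float(row[f]) for f in NUMERIC]
    for field, values in vocab.items():
        # One indicator column per distinct value of the field.
        features.extend(1.0 if row[field] == v else 0.0 for v in values)
    # Label mapping: 'Yes' becomes 1, any other value becomes 0.
    label = 1 if row["churn"] == "Yes" else 0
    return features, label


vocab = build_vocabulary(rows, CATEGORICAL)
encoded = [encode_row(r, vocab) for r in rows]
```

In this sketch, the column order is the numeric fields followed by the indicator columns of each categorical field in sorted order. A fixed order matters because the training data and the prediction data must share the same feature layout.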

Procedure for monitoring user churn

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Churn user monitoring.
    3. In the New Experiment dialog box, set the parameters. You can use the default values of the parameters.
      Parameter Description
      Name The name of the experiment. Default value: Churn user monitoring.
      Project The name of the project to which the experiment belongs. You cannot change the value of this parameter.
      Description The description of the experiment. Default value: Algorithm to mine potential churn users.
      Save To The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Churn user monitoring_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created and Churn user monitoring_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically creates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Experiment on monitoring user churn
      Area No. Description
      1 The component in this area imports data from the dataset of the experiment.
      2 The One Hot Encoding-1 and SQL Script-1 components in this area perform feature engineering to convert string-type feature data to numeric-type feature data. For example, the original value of the churn field is Yes or No. The SQL Script-1 component executes an SQL statement to convert Yes to 1 and No to 0.
      select (case churn when 'Yes' then 1 else 0 end) as churn from ${t1};
      3 The components in this area divide the dataset into a training dataset and a prediction dataset. A user may churn or not. Therefore, you can use a binary classification algorithm to predict user churn.
      4 The Binary Classification Evaluation-1 component evaluates the quality of the model by using indexes such as the area under the curve (AUC), the Kolmogorov-Smirnov (KS) value, and the F1 score.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click the Binary Classification Evaluation-1 component on the canvas and select View Evaluation Report.
    3. In the Evaluation Report dialog box, click the Indexes tab to view the indexes that are used to evaluate the model.
      Model evaluation report
      The closer the AUC value is to 1, the better the model distinguishes users who churn from users who do not. In the preceding figure, the AUC value is greater than 0.8, which indicates that the model predicts churn well.
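
To make the evaluation indexes more concrete, the following Python sketch computes AUC, the KS value, and the F1 score from a list of true labels and predicted churn scores. It uses the standard textbook definitions and is only an illustration. How the Binary Classification Evaluation component computes these values internally is not described in this topic, and the labels, scores, and the 0.5 threshold below are made-up assumptions.

```python
def auc(labels, scores):
    """AUC: probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(labels, scores):
    """KS value: largest gap between true positive rate and false positive rate."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):  # try every observed score as a threshold
        tpr = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t) / n_neg
        best = max(best, tpr - fpr)
    return best


def f1(labels, scores, threshold=0.5):
    """F1 score: harmonic mean of precision and recall at a fixed threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 1 = churned, 0 = retained; scores are predicted churn probabilities.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

auc_value = auc(labels, scores)
ks_value = ks(labels, scores)
f1_value = f1(labels, scores)
```

AUC and KS depend only on how the scores rank users, whereas F1 depends on the chosen threshold, so the same model can show a high AUC and a modest F1 score if the threshold is poorly chosen.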