Predicting Potential Customers Using Alibaba Cloud's Machine Learning Platform

In this article, we will show you how you can find potential customers using Alibaba Cloud's Machine Learning Platform.

By Muaaz Bin Sarfaraz, data engineer and Alibaba Cloud Community Blog contributor.

Consider this. In several developing countries in Asia and Africa, although smartphone adoption is growing quickly, the vast majority of Telecom customers are non-data users: they only make phone calls and do not use mobile data. Even though services like WhatsApp, WeChat, Facebook, and other internet-based messaging and calling platforms have taken much of the market share originally enjoyed by Telecom companies, these companies are still interested in increasing the number of data users, especially in developing countries, where this segment has considerable potential for revenue growth.

Therefore, finding a way to identify customers who are likely to start using mobile data is quite valuable. Most of these potential customers currently rely either on Wi-Fi hotspots or on competing Telecom companies for data. This means that they probably either use dual-SIM phones or have another handset for their data usage.

Once such customers are identified, targeted campaigns and promotions can be delivered to them to upsell data packages and convert them into data users. Also, by promoting 4G to 3G users, Telecom companies can improve their customers' experience with better speeds, and potentially increase their data usage.

So, what exactly can be done to find this information? In this article, we will discuss how you can leverage Telecom data to predict which customers currently on 2G/3G SIMs have the potential to move to a 4G SIM and 4G data usage. We are going to do that by using the machine learning tools provided on Alibaba Cloud's Machine Learning Platform for AI. In this tutorial we will also be using this platform with other Alibaba Cloud products, including the data warehouse solution MaxCompute, the cost-effective storage solution Object Storage Service, and the fully hosted database solution ApsaraDB RDS for MySQL. Based on the information presented in this tutorial, you can easily use a combination of these products for machine learning applications on Alibaba Cloud in the future.

Our Approach

The relevant audience can be identified either by relying on domain knowledge or a human oracle, or by performing data analysis to devise a marketing plan. In this article, we will focus on the Machine Learning side of things, where a preprocessed dataset is fed to a Machine Learning algorithm that ingests and learns from the data. The dataset used in this tutorial is already preprocessed, so we won't need to preprocess it ourselves. The trained model can then identify relevant customers based on what it has learned. Please note that this tutorial was written for learning purposes only.

In reality, a machine learning model needs a training set in which the outcome, known as the "target", is already identified. In the case of the Telecom companies discussed earlier, this target label would be the customers who are currently on a 4G SIM with 4G usage, and those who have actually converted from 2G/3G data usage recently. Naturally, the definition of "recency" could be agreed upon based on how swiftly the dynamics change in a specific market or, alternatively, it could be based on a statistical approach. This is a bit complex, and we won't be discussing it in this tutorial.


Before we begin this tutorial, it is important that you note the following:

  1. The ideas presented in this article are entirely my own and do not represent any company's strategy, predictive model, dataset, algorithm, or method.
  2. The data set used in the article is a dummy dataset provided within the Alibaba Cloud Machine Learning sample data sources.
  3. Telecom regulations in certain countries might not allow you to host data outside the country. In that situation, use an Alibaba Cloud data center located within the country. If that is not an option, consult the regulator before uploading the data outside the country's geographical boundary.
  4. To ensure data privacy, it is recommended to remove all of the customers' personally identifiable data points and anonymize the dataset.
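
As a minimal sketch of point 4, direct identifiers can be dropped and the customer ID replaced with a keyed hash before the data is uploaded. The field names (`msisdn`, `name`, `national_id`), the secret key, and the helper `pseudonymize` are hypothetical and only serve as an illustration. Note that keyed hashing is pseudonymization rather than full anonymization, so it may not be sufficient on its own for a given regulator.

```python
import hashlib
import hmac

# Hypothetical secret; in practice keep it outside the dataset and the code.
SECRET_KEY = b"replace-with-a-private-key"

# Hypothetical column names for direct identifiers that should not be uploaded.
PII_FIELDS = {"name", "national_id"}


def pseudonymize(record, id_field="msisdn"):
    """Return a copy of record with PII dropped and the ID replaced by a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    digest = hmac.new(SECRET_KEY, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = digest.hexdigest()
    return cleaned


row = {"msisdn": "923001234567", "name": "A. Customer", "national_id": "X1", "data_3g_mb": 120}
safe_row = pseudonymize(row)
print(safe_row)
```

The same input always maps to the same hash, so records for one customer can still be joined across tables without exposing the original number.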


All Telecom operators collect a large amount of data from various systems, and certain high-level aggregates are stored in their analytical warehouse for answering business questions. Certain progressive Telecom companies have even started to leverage the concept of a data lake for advanced analytics needs (such as the one discussed in this article). To answer the problem statement, it's crucial to build a 360-degree view of the customer. Below are some important data points that are recommended for modeling:


  1. Date
  2. Handset Category (Basic Smartphone, LTE Smart Phone, Feature Phone)
  3. 3G Data Usage
  4. 2G Data Usage
  5. 4G Data Usage
  6. Top 10 Websites browsed
  7. Top 10 Apps used
  8. Calls made within the network
  9. Calls made outside the network
  10. Calls received from within the network
  11. Calls received from outside the network
  12. SMS sent outside the network
  13. SMS sent within the network
  14. SMS received from outside the network
  15. International Incoming Calls
  16. International Outgoing Calls
  17. International SMS sent
  18. International SMS received
  19. Short Code Count (using a manual catalog of SMS short codes; for example, SMS received from 5678 is for specific bank customers)
  20. Most Used Cell Site
  21. Most Used Cell Site Type (2G, 3G, 4G)
  22. Recharge Amount
  23. Current Balance
  24. Customer Psychographic Details
  25. Customer Demographic Details

Also, it is recommended to derive various other features from the ones mentioned above. For example, "call count" can be further broken down and divided into "peak" and "off peak" call count, or "weekday" and "weekend" call count to get a more fine-grained analysis of customer calling and data usage.
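
The kind of derived feature described above can be sketched as follows. This is only an illustration: the peak window (09:00 to 20:59), the Saturday/Sunday weekend, and the function name `call_count_features` are assumptions, and the real definitions depend on the operator and the market.

```python
from collections import Counter
from datetime import datetime

# Assumed peak window (09:00-20:59); the real definition depends on the operator.
PEAK_HOURS = range(9, 21)


def call_count_features(call_timestamps):
    """Split one customer's call timestamps into peak/off-peak and weekday/weekend counts."""
    counts = Counter(peak=0, off_peak=0, weekday=0, weekend=0)
    for ts in call_timestamps:
        counts["peak" if ts.hour in PEAK_HOURS else "off_peak"] += 1
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        counts["weekend" if ts.weekday() >= 5 else "weekday"] += 1
    return dict(counts)


calls = [
    datetime(2024, 1, 1, 10, 30),  # Monday, peak
    datetime(2024, 1, 2, 23, 5),   # Tuesday, off-peak
    datetime(2024, 1, 6, 14, 0),   # Saturday, peak
]
features = call_count_features(calls)
print(features)
```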

In our data, any 4G users who have moved from 2G or 3G over the last three months are positive samples, whereas any individuals who are still using 2G or 3G data are negative samples. This definition will be used by our Machine Learning algorithm for learning and training purposes. With this training, the algorithm should be able to make predictions for the next three months.

Customer ID | Criteria: moved from 2G/3G data usage to 4G data usage over the last 3 months? | Target Variable (4G User)
1 | Yes | +ve
2 | No | -ve
3 | No | -ve
4 | No | -ve
5 | Yes | +ve
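
The labeling rule above could be expressed roughly as follows. The function `target_label` and the snapshot structure are hypothetical; a real pipeline would derive the "before" and "now" technologies from usage records.

```python
def target_label(tech_three_months_ago, tech_now):
    """Return 1 (+ve) if the customer moved from 2G/3G to 4G in the window, else 0 (-ve)."""
    moved_to_4g = tech_three_months_ago in {"2G", "3G"} and tech_now == "4G"
    return 1 if moved_to_4g else 0


# Hypothetical snapshot: customer ID -> (data technology 3 months ago, data technology now).
snapshots = {
    1: ("3G", "4G"),
    2: ("3G", "3G"),
    3: ("2G", "2G"),
    4: ("2G", "3G"),
    5: ("2G", "4G"),
}
labels = {cid: target_label(before, now) for cid, (before, now) in snapshots.items()}
print(labels)
```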

Alibaba Cloud Machine Learning Platform

Product Selection

Select the Machine Learning product from the list of product offerings of Alibaba Cloud. To keep the experiment simple and avoid coding, use PAI Visualization Modeling. With this product you can use Machine Learning intuitively through a drag-and-drop interface. However, note that this product does not offer full flexibility and customization. Other products on the platform offer more flexibility.

Step 1: Creating a Project


You will need to fulfill the three dependencies mentioned below before creating the project:

  1. Name verification
  2. Access key token
  3. MaxCompute (for machine learning exercises like this, I recommend purchasing this product with pay-as-you-go billing, which is more cost-effective because you only pay for what you use)

Next, fill in the project name, alias, and a project description, and press OK. Doing so will create the project.

Step 2: Starting up the Model


Once the project is created, press the Machine Learning button in the Operation column.

Step 3: Creating a New Experiment


In the console, click New, which is located on the top right portion of your screen, and create a new Experiment.

Dragging the Components

In this tutorial, we will be using some open-source Telecom data, which is already available in the Alibaba Cloud Machine Learning Platform. For an actual enterprise-level application, data would most probably be read from a database. If your database is not in the cloud, a snapshot of the dataset can be made available on Alibaba Cloud Object Storage Service, which is a good choice because it is one of the cheapest options available. Use the experiment window to drag and drop the components (listed in the paragraphs that follow) and join them as shown in the diagram below.


The following components will be picked for a basic prediction model:

1. Data Source.

Alibaba Cloud is compatible with a large variety of data sources, including OSS Storage, File Data, a MaxCompute Table, or a MySQL Database.

For this tutorial, however, we will be using a public sample dataset, as we discussed before.


2. Data Preprocessing

  1. For this, you'll need to choose the Split option to split the data into a test and a train data set.


In order to preprocess your dataset, Alibaba Cloud's Machine Learning Platform offers a range of preprocessing components. For this tutorial, splitting the data into test and train data sets is the only action needed here, as the data we are using is already preprocessed. The split setting used is shown above, where 80% of the data was used for training the model and 20% was a hold-out set used for final model performance testing.
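
For readers who want to see what an 80/20 split amounts to outside the visual tool, here is a minimal sketch in plain Python. It is not the Split component itself; the function name `train_test_split` and the fixed seed are assumptions made for reproducibility.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and cut it into a training part and a hold-out part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


customers = list(range(100))  # stand-in for 100 customer records
train, test = train_test_split(customers)
print(len(train), len(test))
```

Shuffling before cutting matters when the source rows are ordered (for example by date), since otherwise the hold-out set would not be representative.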

3. Machine Learning

  1. We will choose GBDT Binary Classification as our machine learning model, since in our case the target classes are binary (+ve or -ve).

For binary classification, Alibaba Cloud offers the following Machine Learning models: Gradient Boosted Decision Trees, AdaBoost Binary Classification, Linear Support Vector Machines, and Logistic Regression.

For us, GBDT Binary Classification is the most suitable model. It offers the best performance for what we're looking for. The default GBDT settings are good enough for us.


Ensure that the relevant features are selected in feature column and the relevant Target label is selected in the label column.
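
To give an intuition for what a GBDT binary classifier does with the feature and label columns, here is a deliberately simplified sketch in plain Python. It is not the platform's implementation: it boosts one-split trees (stumps) on the gradient of the log loss, and the two features (share of time on a 4G cell site, recharge amount) and all values are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single feature/threshold split that best fits the residuals (squared error)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def fit_gbdt(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to (label - predicted probability) and adds it to the score."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no usable split (all feature values identical)
            break
        j, threshold, left_value, right_value = stump
        # Store leaf values already scaled by the learning rate.
        stumps.append((j, threshold, learning_rate * left_value, learning_rate * right_value))
        for i, row in enumerate(X):
            scores[i] += stumps[-1][2] if row[j] <= threshold else stumps[-1][3]
    return stumps


def predict_proba(stumps, row):
    """Probability of the positive class for one feature row."""
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))


# Made-up rows: [share of time on a 4G cell site, recharge amount]; 1 = moved to 4G.
X = [[0.9, 30], [0.8, 25], [0.7, 40], [0.2, 10], [0.1, 5], [0.3, 35]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gbdt(X, y)
print(predict_proba(model, [0.85, 20]), predict_proba(model, [0.15, 20]))
```

Production GBDT implementations use deeper trees, regularization, and subsampling, which is why the platform component exposes more settings than this sketch.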


4. Prediction

This part is relatively self-explanatory.

5. Evaluation

a. Binary Classification Evaluation


The Binary Classification Evaluation block was used with its default settings here. You may need to adjust the positive sample label based on how you have marked your target label.

Moreover, the following points regarding preprocessing need to be noted:

  1. The Target class is expected to be imbalanced, so ensure that the final dataset is balanced or that the model is not biased by the class imbalance.
  2. Categorical columns need to be one-hot encoded or otherwise categorized for model ingestion.
  3. The Target label should be calculated based on a definition suitable for your own audience and market dynamics.
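
Points 1 and 2 can be illustrated with a small sketch. The helpers `one_hot` and `balanced_class_weights` are hypothetical names, and whether a particular model component accepts per-class weights is an assumption to verify; resampling the training set is the alternative when it does not.

```python
from collections import Counter


def one_hot(values):
    """Encode a categorical column as 0/1 columns, one per category (sorted by name)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency so both classes contribute equally."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


handsets = ["Feature Phone", "LTE Smart Phone", "Basic Smartphone", "Feature Phone"]
encoded, columns = one_hot(handsets)
print(columns)
print(encoded)

# Four negative samples and one positive sample: the rare class gets the larger weight.
weights = balanced_class_weights([0, 0, 0, 0, 1])
print(weights)
```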

Now press the Run button on the top panel.


Wait until all the blocks are ticked green. Once the experiment is complete, you can view the results by right-clicking the Binary Classification Evaluation block and selecting View Evaluation Report.

A sample evaluation report is shown below.


Note that a dummy dataset was used here, which is why the results shown are quite optimistic compared with a real-world setting. In a real-world setting, we would expect noise and many variables at play that could swing the target label either positive or negative. The F1-Score should be around 70-85% for a real-world dataset where relevant variables and derived variables are fed to an appropriate model with proper preprocessing.
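
For reference, the F1-Score combines precision and recall into a single number. The sketch below computes all three from true and predicted labels; the sample labels are made up and the function name `precision_recall_f1` is an assumption, not part of the evaluation component.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 1 = moved to 4G, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Unlike plain accuracy, F1 is not inflated by a large number of correctly predicted negative samples, which makes it a more informative metric when the target class is rare.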

As a general rule of thumb, if you can predict the binary classes (in a balanced dataset) with an accuracy higher than 50%, you are doing better than random chance. Once the predictions are ready, the identified customers can be pitched 4G upsell offers through their preferred channel of communication, such as an email promotion or online advertisement.
