By Muaaz Bin Sarfaraz, data engineer and Alibaba Cloud Community Blog contributor.
Consider this. In several developing countries in Asia and Africa, smartphone adoption is growing quickly, yet the vast majority of Telecom customers in these regions are non-data users: they only make phone calls and do not use mobile data. Even though WhatsApp, WeChat, Facebook, and other internet-based messaging and calling platforms have taken much of the market share originally enjoyed by Telecom companies, these companies are still interested in increasing the number of data users, especially in developing countries, where this segment has considerable potential for revenue growth.
Therefore, finding a way to identify customers who are likely to start using mobile data is quite valuable. Most of these potential customers currently rely either on Wi-Fi hotspots or on competing Telecom companies for data. This means that these users probably either use dual-SIM phones or have a second handset for their data usage.
Once such customers are identified, targeted campaigns and promotions can be delivered to them to upsell data packages and convert them into data users. Also, by promoting 4G to 3G users, Telecom companies can improve the customer experience with better speeds and potentially increase each customer's data usage.
So, what exactly can be done to find this information? In this article, we will discuss how you can leverage Telecom data to predict which customers are currently on 2G/3G SIMs but have the potential to move to a 4G SIM and 4G data usage. We are going to do that by using the machine learning tools provided on Alibaba Cloud's Machine Learning Platform for AI. In this tutorial we will also be using this platform with other Alibaba Cloud products, including the data warehouse solution MaxCompute, the cost-effective storage solution Object Storage Service, and the fully hosted database solution ApsaraDB RDS for MySQL. Based on the information presented in this tutorial, you can easily use a combination of these products for machine learning applications on Alibaba Cloud in the future.
To devise a marketing plan, your relevant audience can be identified either by relying on domain knowledge or a human oracle, or by performing data analysis. In this article, we will focus on the Machine Learning side of things, where a preprocessed dataset is fed to a Machine Learning algorithm that ingests and learns from the data. Our dataset is already preprocessed in this tutorial, so we won't need to consider preprocessing it. The trained model can then identify relevant customers based on what it has learned. Please note that this tutorial was written for learning purposes only.
In reality, a machine learning model needs a training set in which the outcome, known as the "target", is already identified. For the Telecom companies we discussed earlier, this "target label" would be the specific customers who are currently on a 4G SIM with 4G usage, and those customers who have actually converted from 2G/3G data usage recently. Naturally, the definition of "recency" could either be agreed upon based on how swiftly the dynamics change in a specific market or be based on a statistical approach. This is a bit complex, and we won't be discussing it in this tutorial.
Before we begin this tutorial, it is important that you note the following:
All Telecom operators collect a large amount of data from various systems, and certain high-level aggregates get stored in their analytical warehouse for answering business questions. Certain progressive Telecom companies have even started to leverage the concept of a data lake for advanced analytics needs, such as the one discussed in this article. To answer the problem statement, it's crucial to build a 360-degree view of the customer. Below are some important data points that are recommended to be brought forward for modeling:
Also, it is recommended to derive various other features from the ones mentioned above. For example, "call count" can be further broken down and divided into "peak" and "off peak" call count, or "weekday" and "weekend" call count to get a more fine-grained analysis of customer calling and data usage.
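As a rough illustration of such derived features, the sketch below splits one customer's call timestamps into peak/off-peak and weekday/weekend counts. The peak window (09:00 to 20:59), the function name, and the feature names are assumptions made for this example; they are not part of the dataset used in this tutorial.

```python
from datetime import datetime

# Hypothetical peak window (09:00-20:59); a real operator would define its own.
PEAK_HOURS = range(9, 21)


def derive_call_features(call_times):
    """Break a customer's total call count into finer-grained counts."""
    features = {
        "call_count": 0,
        "peak_calls": 0,
        "off_peak_calls": 0,
        "weekday_calls": 0,
        "weekend_calls": 0,
    }
    for ts in call_times:
        features["call_count"] += 1
        if ts.hour in PEAK_HOURS:
            features["peak_calls"] += 1
        else:
            features["off_peak_calls"] += 1
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        if ts.weekday() >= 5:
            features["weekend_calls"] += 1
        else:
            features["weekday_calls"] += 1
    return features


calls = [
    datetime(2018, 6, 11, 10, 30),  # Monday, inside the peak window
    datetime(2018, 6, 16, 23, 5),   # Saturday, outside the peak window
    datetime(2018, 6, 17, 14, 0),   # Sunday, inside the peak window
]
features = derive_call_features(calls)
print(features)
```

The same pattern could be applied to data sessions or SMS counts if those timestamps are available.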
In our data, any 4G users who have moved from 2G or 3G over the last three months are positive samples, whereas any individuals who are still using 2G or 3G data are negative samples. This definition will be used by our Machine Learning algorithm for learning and training purposes. With this training, the algorithm should be able to make predictions for the next three months.
| Customer ID | Criteria: Did they move from 2G/3G data usage to 4G data usage over the last 3 months? | Target Variable (4G User) |
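The labeling rule described above could be sketched as follows. This is only an illustration: the field names, the approximation of "three months" as 90 days, and the choice to exclude long-standing 4G users from the training set are all assumptions, not something the dataset in this tutorial prescribes.

```python
from datetime import date


def label_customer(current_network, switched_to_4g_on, as_of, window_days=90):
    """Return 1 (positive), 0 (negative), or None (left out of training).

    window_days=90 is an assumed stand-in for "the last three months".
    """
    if current_network in ("2G", "3G"):
        return 0  # still a 2G/3G data user: negative sample
    if current_network == "4G" and switched_to_4g_on is not None:
        days_since_switch = (as_of - switched_to_4g_on).days
        if 0 <= days_since_switch <= window_days:
            return 1  # recently converted to 4G: positive sample
    # Assumption: anyone else (e.g. long-standing 4G users) is excluded.
    return None


as_of = date(2018, 6, 30)
recent_convert = label_customer("4G", date(2018, 5, 1), as_of)
still_on_3g = label_customer("3G", None, as_of)
long_time_4g = label_customer("4G", date(2017, 1, 1), as_of)
print(recent_convert, still_on_3g, long_time_4g)
```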
Select the Machine Learning product from the list of product offerings of Alibaba Cloud. To avoid coding and keep the experiment simple, use PAI Visualization Modeling. With this product you can use Machine Learning intuitively through a drag-and-drop interface. However, note that this product does not offer full flexibility and customization. Other products on the platform offer more flexibility.
You will need to fulfill the three dependencies mentioned below before creating the project:
Next, fill in the project name, alias, and project description, and press OK. Doing so will create the project.
Once the project is created, press the Machine Learning button in the Operation column.
In the console, click New, which is located on the top right portion of your screen, and create a new Experiment.
In this tutorial, we will be using some open-source Telecom data, which is already available in the Alibaba Cloud Machine Learning Platform. For an actual enterprise-level application, data would most probably be read from a database. If your database is not in the cloud, a snapshot of the dataset can be made available on Alibaba Cloud Object Storage Service, which is a good choice because it is one of the cheapest options available. Use the experiment window to drag and drop the components listed below and join them as shown in the diagram below.
The following components will be picked for a basic prediction model:
1. Data Source.
Alibaba Cloud is compatible with a large variety of data sources, including OSS Storage, File Data, a MaxCompute Table, or a MySQL Database.
For this tutorial, however, we will be using a public sample dataset, as we discussed before.
2. Data Preprocessing
To preprocess your dataset, Alibaba Cloud's Machine Learning Platform offers a range of preprocessing components. For this tutorial, splitting the data into training and test sets is the only action needed, as the data we are using is already preprocessed. The split setting used is shown above: 80% of the data was used to train the model, and 20% was a hold-out set used for final model performance testing.
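For readers who want to see what the split component does conceptually, here is a minimal sketch of an 80/20 random split in plain Python. It is not how the platform implements its split; the function name and the fixed seed are assumptions chosen so the example is reproducible.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and hold-out parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so reruns give the same split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 customer records
train, test = split_train_test(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the source table is ordered (for example by sign-up date), since an unshuffled cut would give the model a hold-out set that looks systematically different from its training data.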
3. Machine Learning
For binary classification, Alibaba Cloud offers the following Machine Learning models: Gradient Boosted Decision Trees (GBDT), AdaBoost Binary Classification, Linear Support Vector Machines, and Logistic Regression.
For us, GBDT Binary Classification is the most suitable model. It offers the best performance for what we're looking for. The default GBDT settings are good enough for us.
Ensure that the relevant features are selected in the feature column and the relevant target label is selected in the label column.
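To give a feel for what a gradient boosted model does with the feature columns and the label column, here is a deliberately tiny sketch in plain Python. It is not the platform's GBDT component: it uses depth-1 trees (stumps), fits each one to the log-loss residuals by least squares, and skips the refinements a production implementation would have. The toy feature values (monthly data in MB, call count), the labels, and all function names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the one-feature threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]  # (feature, threshold, left, right)


def fit_gbdt(X, y, n_trees=20, learning_rate=0.5):
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to the score: y - p
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no valid split left
        j, t, lv, rv = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return stumps


def predict_proba(stumps, row, learning_rate=0.5):
    score = sum(learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)


# Toy feature columns: [monthly_data_mb, call_count]; label column: 4G user (1/0)
X = [[50, 30], [80, 10], [900, 25], [1200, 40], [30, 60], [1500, 5]]
y = [0, 0, 1, 1, 0, 1]
model = fit_gbdt(X, y)
print(predict_proba(model, [1000, 20]), predict_proba(model, [40, 20]))
```

Each round adds a small correction in the direction that reduces the remaining error, which is the core idea behind boosting; the learning rate must be the same at training and prediction time.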
This part is relatively self-explanatory.
a. Binary Classification Evaluation
The Binary Classification Evaluation block was used with its default settings here. You may need to adjust the positive sample label based on how you have marked your target label.
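The role of the positive sample label can be seen in a small sketch of how precision, recall, and F1-Score are computed. This is a generic illustration rather than the evaluation block's actual code; the function name and the "yes"/"no" labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive_label=1):
    """Compute precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# The target label here is marked "yes"/"no" instead of 1/0,
# so the positive label has to be set accordingly.
y_true = ["yes", "no", "yes", "yes", "no", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]
metrics = binary_metrics(y_true, y_pred, positive_label="yes")
print(metrics)
```

If the positive label is left at a default that does not appear in the data, every count is zero and the metrics collapse to 0, which is why this setting is worth checking.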
Moreover, the following points regarding preprocessing need to be noted:
Now press the Run button on the top panel and wait until all the blocks are ticked green.
Once the experiment is complete, you can view the results by right-clicking the Binary Classification Evaluation block and selecting View Evaluation Report.
A sample evaluation report is shown below.
Note that a dummy dataset was used here, which is why the results shown are quite optimistic compared with a real-world setting. In a real-world setting, we would expect noise and many variables at play that could swing the target label either way. The F1-Score should be around 70-85% for a real-world dataset where relevant variables and derived variables are fed to an appropriate model with proper preprocessing.
As a general rule of thumb, if one can predict the binary classes in a balanced dataset with an accuracy higher than 50%, one is doing better than random chance. Once the predictions are ready, the identified customers can be pitched 4G upsell offers through their preferred channel of communication, such as an email promotion or online advertisement.
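The 50% baseline can be checked with a quick simulation. The sketch below builds a synthetic balanced label set and scores a coin-flip "classifier" against it; the sample size, seed, and function name are arbitrary choices for illustration.

```python
import random


def coin_flip_accuracy(n=10_000, seed=0):
    """Accuracy of random guessing on a balanced binary dataset."""
    rng = random.Random(seed)
    y_true = [i % 2 for i in range(n)]             # exactly half 0s, half 1s
    y_guess = [rng.randint(0, 1) for _ in range(n)]
    return sum(t == g for t, g in zip(y_true, y_guess)) / n


accuracy = coin_flip_accuracy()
print(f"coin-flip accuracy: {accuracy:.3f}")
```

On an imbalanced dataset this baseline no longer applies: always predicting the majority class already beats 50% accuracy, which is one reason the F1-Score is often the more informative number.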