Practice of User Growth Scheme Based on MaxCompute + PAI

This article explains how to use PAI + MaxCompute to support every stage of the AARRR user growth model: acquisition, activation, retention, revenue, and referral.

By Li Bo, Alibaba Cloud Intelligent Senior Product Expert

Over the past year, the Alibaba Cloud PAI Team has worked on many business-oriented projects. One of them uses MaxCompute + PAI to solve the user growth problems that customers encounter. This article shares some of the team's explorations and practices in the field of user growth. I hope it offers some help to everyone working on user growth.

I. User Growth Model

AARRR

User growth matters most to Internet companies, whose business is essentially about solving the user growth problem. From a business perspective, there are many user growth models. Today, we will focus on the AARRR model.

Internet app operators should be very familiar with the AARRR model, which treats the user growth of an Internet product as a ring structure. The first stage is acquisition, which is important to the business. The corresponding business indicators are visits, downloads, registrations, and follows. Acquisition received the most attention over the past few years because the number of Internet users was still growing, but the number of Chinese Internet users has now reached a ceiling. This makes the later stages particularly important for improving our products.

The second stage is activation (promotion). For example, novel apps are relatively popular now. When the number of users is difficult to grow, it is particularly important to increase the time users spend, and novel content helps extend how long users stay in the app. The indicators of activation are logins, clicks, browsing, and stay duration. The next stage is retention. When we cannot obtain new users, we try to recall our inactive and lost users. MaxCompute + PAI has many classic cases in retention. For revenue, there is also a lot of AI work to be done on how Internet apps generate income from traffic and user behavior. For referral, viral (fission) apps pay more attention to sharing indicators.

Across the AARRR user growth model, what work can MaxCompute + PAI do in each module, and what value can it bring to customers?

MaxCompute + PAI Business Support Architecture

MaxCompute + PAI are used as the base to support user growth. The following figure shows the product architecture.

The computing engine layer is MaxCompute, and the scenario it serves here is AI. We focus on business scenarios that enable user growth based on the AI capabilities of PAI (Machine Learning Platform for AI). First, we provide open frameworks, so you can work in SQL, PySpark, or Spark and develop algorithm models based on TensorFlow or PyTorch.

The product layer above is the PAI product system, which also supports our business. With PAI-DLC (cloud-native deep learning environment), you can package training code and scripts into a container image and run it in DLC. PAI-Studio (visual modeling) packages the operators related to user growth as modules, so you can train models for the whole user growth flow through simple drag and drop. With PAI-DSW (interactive modeling), developers with strong technical capabilities can write their own scripts instead of using our packaged components. PAI-EAS (online model service) can deploy the models generated by Studio and DSW as RESTful APIs, which are then called through HTTP requests.

These RESTful services support solutions that include advertising RTA, advertising DSP, Artificial Intelligence Recommendation, user recall, and LTV computing. The solutions ultimately solve the problem of user growth, including acquisition, activation, retention, revenue, and referral.

II. MaxCompute + PAI Specifies the User Growth Category

User Growth – Acquisition

Currently, acquisition through advertising is still a core means for Internet customers. One popular solution in the advertising industry is RTA (Real-Time API). What is the role of MaxCompute + PAI in the RTA solution? First, look at the principle of RTA. In the past, if an app wanted to acquire users, it would put money into a DSP advertising platform, and the platform would select the audience and bid. Advertisers had no way to control which users the DSP selected. With RTA, the advertising platform opens an interface: every time the platform selects a user, it sends a request to a model on the advertiser's side. The function of this model is to tell the platform whether the advertiser wants this user or not. MaxCompute + PAI can help customers generate such a model.

You can use MaxCompute to clean the data, use PAI to train bidding models, and use the models to filter for users who are worth delivering ads to.
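The RTA decision described above can be sketched as a small function that answers "bid or skip" for each user the ad platform proposes. This is only an illustration: `BidRequest`, `predict_value`, the feature names, and the 0.5 threshold are all hypothetical, and the toy scoring rule stands in for a model trained on PAI.

```python
from dataclasses import dataclass, field


@dataclass
class BidRequest:
    """Hypothetical RTA request sent by the ad platform for one candidate user."""
    device_id: str
    features: dict = field(default_factory=dict)


def predict_value(features: dict) -> float:
    """Stand-in for the trained model: returns a score in [0, 1].

    A real deployment would call the model trained on PAI instead of this toy rule.
    """
    score = 0.2
    if features.get("installed_similar_app"):
        score += 0.5
    if features.get("active_days_30d", 0) >= 10:
        score += 0.2
    return min(score, 1.0)


def rta_decision(request: BidRequest, threshold: float = 0.5) -> bool:
    """Tell the ad platform whether the advertiser wants to bid on this user."""
    return predict_value(request.features) >= threshold


if __name__ == "__main__":
    wanted = BidRequest("device-1", {"installed_similar_app": True})
    unwanted = BidRequest("device-2", {})
    print(rta_decision(wanted), rta_decision(unwanted))
```

Keeping the threshold as a parameter lets operations trade reach against cost without retraining the model.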

Core Advantages

Powerful Data Computing Capabilities: MaxCompute provides PB-level data computing capabilities.
Rich Algorithms: PAI provides classic machine learning algorithms, such as LR and GBDT, as well as deep learning algorithms, such as DeepFM and MultiTower.

User Growth – Promotion

When there are few new users, we hope existing users will browse our platform longer and click more often. More than 70% of Internet apps have a feed recommendation, which can also be called correlation recommendation. The accuracy of this recommendation system affects how active users are on the platform. If the recommended content is what users like to watch and browse, the number of clicks on the platform naturally increases, and so does the stay duration. For example, the popular short video apps in the industry have better personalized recommendation systems. Then, how can you build a recommendation system based on MaxCompute + PAI? As shown in the following figure, you can build a correlation recommendation system based on MaxCompute + PAI + DataWorks + Hologres + Flink.

To build a good recommendation system, you first need an online service module. The service module can be divided into multi-channel recall, filtering, sorting, and cold start. The recall module performs a coarse screening. For example, suppose the platform has 10 million commodities in stock when a user arrives. Comparing and matching this user against all 10 million commodities would require a very large amount of computation. Recall narrows the candidates first, for example, to a few hundred commodities. Sorting then only has to rank these few hundred commodities for the user, so the complexity of the whole calculation becomes very low.
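The two-stage idea above can be sketched as follows. The category index, the item scores, and the function names are hypothetical toy data; the point is only that the scoring step touches the recalled candidates rather than the whole catalog.

```python
import heapq


def recall(interests, category_index, limit=100):
    """Coarse screen: look up candidates from a precomputed index, never scan the catalog."""
    candidates = []
    for category in interests:
        candidates.extend(category_index.get(category, []))
    return candidates[:limit]


def rank(candidates, score_fn, top_n):
    """Fine ranking: apply the (expensive) scoring function to the recalled items only."""
    return heapq.nlargest(top_n, candidates, key=score_fn)


if __name__ == "__main__":
    # Hypothetical precomputed recall table: category -> item ids.
    category_index = {"novel": ["n1", "n2", "n3"], "video": ["v1", "v2"], "game": ["g1"]}
    # Hypothetical per-item scores standing in for a sorting model.
    item_score = {"n1": 0.3, "n2": 0.9, "n3": 0.5, "v1": 0.7, "v2": 0.1, "g1": 0.8}

    candidates = recall(["novel", "video"], category_index)
    print(rank(candidates, item_score.get, top_n=3))
```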

How can the recall and sorting models be trained with MaxCompute + PAI? As shown in the architecture diagram, we upload three core tables to MaxCompute: user behavior logs, user persona data, and material (item) attribute data. We then use DataWorks for feature engineering on these tables to produce training samples, user feature data, and material feature data. Next comes PAI-Studio, a modeling platform with a large number of built-in algorithms for recommendation, such as PAI-EasyRec, GraphLearn, and Alink. We use the recall algorithms in PAI-Studio to produce basic recall tables, such as u2i, i2i, and c2i, and write these results to Hologres. The multi-channel recall service then reads from Hologres, which solves the recall part of the problem.

For the sorting service, you select a sorting algorithm in PAI-Studio and produce the sorting model. The sorting model can be deployed to PAI-EAS as a RESTful API. The sorting module then requests the RESTful API of the sorting model and returns sorting results in real time.

After multi-channel recall, we filter out duplicate products and, after sorting, get a TopN recommendation list that can be displayed in the feed of the app. The value of MaxCompute + PAI is that it completes the data processing and model training for the entire recommendation business. This recommendation system improves the CTR and CVR of the feed in our app and helps the app improve user activity and stay duration.
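A minimal sketch of the merge, filter, and TopN steps described above, assuming each recall channel returns a list of item IDs and the sorting model returns one score per item. The channel contents, scores, and function names are hypothetical.

```python
def merge_recall_channels(channels: dict, exclude: set) -> list:
    """Merge the results of several recall channels, dropping duplicates and excluded items."""
    seen = set(exclude)
    merged = []
    for items in channels.values():
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def top_n(candidates: list, scores: dict, n: int) -> list:
    """Order candidates by sorting-model score (highest first) and keep the first n."""
    return sorted(candidates, key=lambda item: scores.get(item, 0.0), reverse=True)[:n]


if __name__ == "__main__":
    channels = {"u2i": ["a", "b", "c"], "i2i": ["b", "d"], "c2i": ["e", "a"]}
    already_shown = {"c"}  # hypothetical filter: items the user has already seen
    scores = {"a": 0.4, "b": 0.9, "d": 0.6, "e": 0.1}

    candidates = merge_recall_channels(channels, already_shown)
    print(top_n(candidates, scores, n=3))
```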

User Growth – Retention

When the existing users of an app reach millions, tens of millions, or hundreds of millions, a large number of historical users are stored in the database, but some of them have not used the app for a period of time. Given how difficult growth currently is on the Internet, we need to recall these dormant and lost users. The most popular solution in the Internet industry is to recall users through SMS, because SMS is not restricted the way phone calls are and is not intercepted the way push notifications can be. The probability that an SMS reaches the user, and its effect, are still relatively high.

Based on MaxCompute + PAI, SMS recall solutions for lost users have been built for many industries, such as novels, social networking, and games.

The general method is to store the user's event tracking data in MaxCompute, do feature processing through DataWorks, and use PAI to train a lost user recall model. Then, for each existing user, we can predict whether the user has a high probability of returning to the app after being reached by SMS. This way, we can focus SMS recall on high-probability users, which saves recall cost and improves the recall rate.
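The selection step above can be sketched as: score every dormant user, then send SMS only to the highest-probability users within a budget. The user IDs, probabilities, and function names below are hypothetical; in practice the probabilities would come from the recall model.

```python
def select_sms_targets(scores: dict, budget: int, min_probability: float = 0.0) -> list:
    """Return up to `budget` user IDs, highest predicted return probability first."""
    eligible = [(p, user_id) for user_id, p in scores.items() if p >= min_probability]
    eligible.sort(reverse=True)
    return [user_id for _, user_id in eligible[:budget]]


def expected_recalls(scores: dict, targets: list) -> float:
    """Expected number of returning users if the predicted probabilities are accurate."""
    return sum(scores[user_id] for user_id in targets)


if __name__ == "__main__":
    # Hypothetical model output: user ID -> probability of returning after an SMS.
    scores = {"u1": 0.08, "u2": 0.35, "u3": 0.02, "u4": 0.21}
    targets = select_sms_targets(scores, budget=2)
    print(targets, expected_recalls(scores, targets))
```

Sending to every user would cost twice as many messages here while adding little expected return, which is the cost argument made above.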

Customer Case

The customer is a social app with nearly 10 million dormant users in its database. It recalls lost users through SMS.

PAI Core Value:

After using PAI, the recall rate across millions of SMS messages increased from 3% to 8%, which is about 267% of the original rate, and the cost was cut roughly in half.

User Growth – LTV Score Calculation & Share Score Calculation

You can predict LTV scores and share probability scores using PAI + MaxCompute to build a score prediction model.

When an app acquires a user through an advertisement, it cares whether the user will pay and how much value the user will generate. Some customers need to estimate how much a user will spend in the app in the future as soon as the new user arrives. If the user is a high-value user, the user can be activated through coupons or subsidies. We provide an LTV solution for this. For example, how do we calculate the LTV score of a new app user?

We need a third-party data source because the new user has not generated any behavior logs in the app yet. MaxCompute + PAI provides a joint modeling solution that meets trusted computing standards. In other words, the app's user data and the third-party data are never exposed to each other. The data of the two parties can be federated, and a model can be generated within PAI. This model can produce an LTV score for each new user and guide subsequent LTV-based operation activities.

Customer Case

The customer is a novel platform that needs to predict whether new users will purchase VIP services within 30 days. Predicting future VIP purchases while users still have little behavior data makes the operation of new users more targeted and improves operational efficiency.

The accuracy of judging whether new users will purchase VIP improved markedly. By selecting about 40% of users with the model generated by federated modeling, the platform can identify 67% of the members who will purchase VIP. Compared with randomly selecting the same share of users, which would cover about 40% of purchasers, operational efficiency improves by 67.5% (67/40 - 1).
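One way to read the figures above: if the 40% of users that the model scores highest contain 67% of the eventual VIP purchasers, then relative to random selection the gain is 67/40 - 1 = 67.5%. The sketch below computes both quantities under that assumption. The function names and the toy scores and labels are hypothetical.

```python
def capture_rate(scores, labels, fraction):
    """Share of all positives that fall in the top `fraction` of users ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = round(len(ranked) * fraction)
    captured = sum(label for _, label in ranked[:k])
    return captured / sum(labels)


def lift_over_random(capture, fraction):
    """Relative gain over random selection, which captures `fraction` of positives on average."""
    return capture / fraction - 1


if __name__ == "__main__":
    # Toy data: model scores for 10 users and whether each one later purchased VIP.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(capture_rate(scores, labels, 0.4))

    # Figures quoted in the customer case.
    print(lift_over_random(0.67, 0.40))
```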

III. Introduction to Practical Operations – Recall of Lost Users

Upload Data to MaxCompute

Run the Tunnel command of MaxCompute to upload data to the project: tunnel upload {file} {table};

Link: https://www.alibabacloud.com/help/doc-detail/137663.htm

Build a Workflow

Go to the PAI-Studio to build the workflow:

Build Training Samples: Users That Do Not Log in for Seven Days Are Lost

You can filter on the registration date and the last login time to find users who have not logged in for seven days. These users are labeled as lost.
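As a rough sketch of this labeling rule outside PAI-Studio, the function below marks a user as lost when the last login is at least seven days before the reference date, and skips users who registered too recently to be judged. The record layout (`user_id`, `registered`, `last_login`) and the rule for skipping new users are assumptions for illustration.

```python
from datetime import date, timedelta


def label_lost_users(users, as_of, inactive_days=7):
    """Return {user_id: 1 if lost else 0} for users old enough to be judged."""
    window = timedelta(days=inactive_days)
    labels = {}
    for user in users:
        if as_of - user["registered"] < window:
            continue  # registered less than `inactive_days` ago: too new to label
        labels[user["user_id"]] = int(as_of - user["last_login"] >= window)
    return labels


if __name__ == "__main__":
    users = [
        {"user_id": "u1", "registered": date(2021, 1, 5), "last_login": date(2021, 6, 29)},
        {"user_id": "u2", "registered": date(2021, 2, 10), "last_login": date(2021, 6, 1)},
        {"user_id": "u3", "registered": date(2021, 6, 28), "last_login": date(2021, 6, 28)},
    ]
    print(label_lost_users(users, as_of=date(2021, 6, 30)))
```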

Feature Processing

Turn data into structured data through processing:

One-Hot Encoding

One-Hot Encoding converts categorical variables into a form that machine learning algorithms can use easily. The format after One-Hot conversion is shown in the following figure:
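To make the conversion concrete, here is a minimal pure-Python sketch of one-hot encoding. It is not the PAI-Studio component itself, and the `channel` values are hypothetical. Each category becomes one column, and exactly one column is 1 for each row.

```python
def fit_categories(values):
    """Collect the distinct categories in a stable (sorted) order."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector with one position per category."""
    return [1 if value == category else 0 for category in categories]


if __name__ == "__main__":
    channel = ["sms", "push", "email", "sms"]  # hypothetical categorical column
    categories = fit_categories(channel)
    print(categories)
    for value in channel:
        print(value, one_hot(value, categories))
```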

Model Training and Evaluation

The PAI platform has dozens of classification models. Judging whether an SMS can recall a user can be defined as a binary classification problem (yes/no), so you can use a binary classification algorithm to train the model. Here, we train a logistic regression model. After the logistic regression model is trained, we can measure the model effect by using some of the data as test data. We generate a model evaluation report with the binary classification evaluation component; the larger the area under the ROC curve (AUC), the better the model works.
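The training and evaluation steps can be sketched in plain Python as follows. This is not the PAI component: it is a small logistic regression fitted by batch gradient descent, with AUC computed as the share of (positive, negative) pairs that the model orders correctly. The single feature and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(row, weights, bias):
    """Predicted probability that the user is recalled (label 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            error = predict_proba(row, weights, bias) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def auc(scores, labels):
    """Area under the ROC curve: fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: one scaled feature per user; label 1 means the SMS recalled the user.
    train_x = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    train_y = [0, 0, 0, 1, 1, 1]
    test_x = [[0.15], [0.4], [0.6], [0.85]]
    test_y = [0, 0, 1, 1]

    weights, bias = train_logistic_regression(train_x, train_y)
    test_scores = [predict_proba(row, weights, bias) for row in test_x]
    print("test AUC:", auc(test_scores, test_y))
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a larger area indicates a better model.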

Model Prediction

After the model is generated, we can deploy the model as a RESTful service for business parties or operations personnel to call. The following figure shows the calling format:
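As an illustration of what such a call might look like from a client, the sketch below builds an HTTP POST request with a JSON body and a token header. The endpoint URL, the token, and the feature names are placeholders, and the request and response schema of a real service depends on how the model was deployed, so treat every value here as an assumption.

```python
import json
import urllib.request


def build_predict_request(endpoint: str, token: str, features: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one user's features as JSON."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_predict_request(
        endpoint="http://example.com/api/predict/lost_user_recall",  # placeholder URL
        token="YOUR_SERVICE_TOKEN",  # placeholder token
        features={"days_since_last_login": 12, "channel": "sms"},  # hypothetical features
    )
    print(request.get_method(), request.full_url)
    # To actually send it against a real service:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```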
