The Serverless Exploration Path of an Algorithm Platform
Serverless Exploration of the TapTap Algorithm Platform
Serverless has saved us a great deal of O&M and development manpower in building applications. Without any investment in infrastructure staff, it took our originally very primitive infrastructure and resource management straight to a fairly cutting-edge level for the industry. The most tangible evidence is that our team, with fewer than ten people, provides a full set of AI and big data support for all of TapTap's search, advertising, and recommendation businesses. (Chen Xinhao)
Introduction
XD Inc. (Xindong), founded in 2003, is a global game developer and publisher with rich experience in R&D, publishing, and licensed operation. By mid-2022, XD had operated 38 free and paid games with 50 million active users worldwide, mainly in Greater China, Southeast Asia, North America, and South America. In 2016, XD launched TapTap, a mobile game community and app store. Players can download free games or buy paid games through official channels, and can also interact with other players in the community. By June 2022, TapTap had more than 50 million active users worldwide.
Business background
Unlike the revenue-sharing model of traditional app stores, TapTap has always taken a zero channel revenue share, which is why its commercialization is currently driven mainly by advertising. TapTap's ads are native in-app ads that closely match the surrounding non-commercial content, giving users a better experience. Examples include game recommendations on the home page, content recommendations on the discovery page, placeholder keywords on the search guide page, query suggestions shown while typing, and the final search results page. Ads are interspersed among this strategy-driven content.
Our Serverless practice grew out of the real needs of these business scenarios: for example, automatic update and deployment of the deep learning models that search, advertising, and recommendation all depend on; the model experiment tracking platform our team's algorithm engineers rely on; and NLP analysis and processing of new content posted on the site.
In the early days, most of our back-end services were deployed on ECS and managed and deployed through Rundeck, which was far from ideal in terms of efficiency and manageability. I summarized our requirements for the infrastructure upgrade in four points:
● Significantly improve development and O&M efficiency
● Meet business needs at lower labor cost
● Be reliable and deliver good performance
● Support Go well, because our projects are written mainly in Go
Solution comparison
We considered two mainstream architectures: virtual machines with a self-built Kubernetes (K8s) cluster, and a Serverless architecture built on Serverless App Engine (SAE) and Function Compute (FC).
After comparing them, we chose the latter. On the one hand, Serverless removes the machine procurement step, so there is no need to buy ECS instances in advance, and it comes with optional default environments; unless there are special requirements, this essentially eliminates the complexity of setting up environments. On the other hand, Serverless integrates many basic components, so services can go live with essentially no O&M work.
For ongoing maintenance, Serverless products also bill at a finer granularity than ECS: per minute or even per second, and only while the business is actually using resources. Compared with the K8s + ECS model, this saves a great deal of labor in both initial development and later O&M.
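To make the billing-granularity point concrete, here is a small worked calculation. It is only a sketch: the unit price (3600 cost units per instance-hour) is a made-up number, not a real Alibaba Cloud price, and the workload of 100 thirty-second jobs per day is hypothetical. Each run is billed for its duration rounded up to whole billing units, while a pre-purchased instance is paid for every hour it exists.

```python
import math

# Hypothetical price for illustration only: 3600 cost units per instance-hour.
PRICE_PER_HOUR = 3600


def reserved_cost(hours_provisioned):
    """A pre-purchased instance is paid for every provisioned hour, busy or idle."""
    return hours_provisioned * PRICE_PER_HOUR


def pay_per_use_cost(run_durations_s, billing_unit_s):
    """Each run is billed for its duration rounded up to whole billing units."""
    price_per_unit = PRICE_PER_HOUR * billing_unit_s / 3600
    units = sum(math.ceil(d / billing_unit_s) for d in run_durations_s)
    return units * price_per_unit


runs = [30] * 100                   # hypothetical day: 100 jobs of 30 s each
print(pay_per_use_cost(runs, 1))    # per-second billing -> 3000.0
print(pay_per_use_cost(runs, 60))   # per-minute billing -> 6000.0
print(reserved_cost(24))            # always-on for 24 h -> 86400
```

Under these assumptions, per-second billing charges only for the 3,000 seconds of real work, per-minute billing doubles that because each 30-second run is rounded up to a full minute, and an always-on instance pays for the whole day regardless of load.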
Our hands-on experiments gave us the following understanding of the two Serverless products.
Function Compute (FC) decouples a business's scheduling and triggering logic from the business logic itself. Developers and algorithm engineers can configure how the whole workflow is triggered and scheduled in the Function Compute console, with no extra development, and concentrate on designing the business logic. This also makes Function Compute best suited to event-driven scenarios, where resources are requested to run business logic only when an event actually occurs.
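The split between trigger configuration and business logic can be sketched as follows. This is a minimal illustration, assuming a Python event function with the `handler(event, context)` entry point used by Function Compute's Python runtime for event triggers; `process_item` and the `item_id` field are hypothetical. The trigger type (OSS, timer, Kafka, and so on) is configured in the console, so nothing in the code refers to it.

```python
import json


def process_item(item_id):
    """Business logic only: it does not know or care which trigger invoked it."""
    return {"item_id": item_id, "status": "processed"}


def handler(event, context):
    """Event-function entry point; the event payload arrives as bytes or str."""
    payload = json.loads(event)
    return json.dumps(process_item(payload["item_id"]))
```

Calling `handler(b'{"item_id": "a1"}', None)` returns `'{"item_id": "a1", "status": "processed"}'`; switching from, say, a timer trigger to an OSS trigger changes only the console configuration and the payload parsing, not `process_item`.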
In our view, SAE is like an enhanced Kubernetes with richer features and a full set of microservice capabilities. It greatly reduces maintenance costs and truly works out of the box, which makes it well suited to microservice transformation: by migrating existing services directly from ECS, you get a complete containerized O&M solution without investing O&M staff.
By combining the two, we can cover most of our business scenarios and run all application services on Serverless ("All on Serverless").
Business Practice
Function Compute (FC)
1) Fully automated model deployment and hourly updates, triggered by OSS
We built an OSS-triggered service that automatically deploys and updates models. After training a model, whether in TensorFlow, PyTorch, or another machine learning framework, algorithm engineers only need to export it to a designated OSS bucket. The upload triggers the model update and deployment service, which takes the model from export to deployment end to end. This way, algorithm engineers can deploy, update, and scale their models themselves without relying on other engineering staff.
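A handler for this kind of OSS-triggered deployment might look roughly like the sketch below. It assumes the OSS trigger delivers a JSON payload with an `events` list whose items carry `eventName`, `oss.bucket.name`, and `oss.object.key`. The `models/<model>/<version>/DONE` layout, the `DONE` marker, and `deploy_model` are hypothetical conventions invented for illustration, not the team's actual setup. Because an export writes many objects and each upload fires the trigger, the sketch deploys only when the marker object, written last, appears.

```python
import json

# Hypothetical export layout: models/<model_name>/<version>/..., with an empty
# "DONE" marker object written last once every model file has been uploaded.
MODEL_PREFIX = "models/"
MARKER = "DONE"


def parse_marker_key(key):
    """Return (model_name, version) if key is an export's DONE marker, else None."""
    if not key.startswith(MODEL_PREFIX):
        return None
    parts = key[len(MODEL_PREFIX):].split("/")
    if len(parts) != 3 or parts[2] != MARKER:
        return None
    return parts[0], parts[1]


def deploy_model(bucket, model_name, version):
    """Hypothetical placeholder for the real step that loads the model into serving."""
    return {
        "model": model_name,
        "version": version,
        "source": f"oss://{bucket}/{MODEL_PREFIX}{model_name}/{version}/",
    }


def handler(event, context):
    """OSS-trigger entry point; the payload shape is an assumption."""
    deployments = []
    for record in json.loads(event)["events"]:
        if not record["eventName"].startswith("ObjectCreated"):
            continue  # ignore deletions and other event types
        parsed = parse_marker_key(record["oss"]["object"]["key"])
        if parsed is None:
            continue  # an ordinary model file, not the completion marker
        bucket = record["oss"]["bucket"]["name"]
        deployments.append(deploy_model(bucket, *parsed))
    return json.dumps(deployments)
```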
2) Model experiment management platform triggered by HTTP (web service)
Algorithm engineers submit model training tasks through our internal model experiment management and parameter platform, which is served through an HTTP trigger. The platform automatically records each task's training parameters, log locations, and log instances, so every experiment is traceable and manageable. It is a web service with its own front end, but it is used only internally and has modest QPS and performance requirements, so we run it on Function Compute. This gives it considerable advantages in management cost, and with Function Compute's recent free quota it is essentially free to run.
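As an illustration of an HTTP-triggered experiment recorder, here is a minimal WSGI-style handler. It assumes the WSGI `handler(environ, start_response)` convention that Function Compute's Python runtime uses for HTTP triggers; the field names (`name`, `params`, `log_url`, `instance_id`) and the in-memory `EXPERIMENTS` store are hypothetical stand-ins for a real schema and database.

```python
import json
import uuid

# In-memory store for illustration; a real platform would persist to a database.
EXPERIMENTS = {}
REQUIRED_FIELDS = ("name", "params", "log_url", "instance_id")  # hypothetical schema
JSON_HEADERS = [("Content-Type", "application/json")]


def record_experiment(payload):
    """Validate a training-run record and store it under a new experiment id."""
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    exp_id = uuid.uuid4().hex
    EXPERIMENTS[exp_id] = {f: payload[f] for f in REQUIRED_FIELDS}
    return {"id": exp_id, **EXPERIMENTS[exp_id]}


def handler(environ, start_response):
    """WSGI-style HTTP entry point: POST a JSON body to record one training run."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", JSON_HEADERS)
        return [b'{"error": "use POST"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = record_experiment(json.loads(environ["wsgi.input"].read(length)))
        status = "201 Created"
    except ValueError as exc:  # bad length, invalid JSON, or failed validation
        body, status = {"error": str(exc)}, "400 Bad Request"
    start_response(status, JSON_HEADERS)
    return [json.dumps(body).encode("utf-8")]
```

Keeping validation in `record_experiment` and HTTP details in `handler` means the same recording logic could be reused if the platform later moves off the HTTP trigger.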
3) NLP processing and parsing of new content, triggered through Kafka
When a user publishes a new post on our site, we push it through Kafka to the NLP analysis service for processing and analysis, and store the results for later search. Each new piece of content triggers one service invocation, which lets us control costs precisely.
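The per-post flow can be sketched as a handler that processes one batch of Kafka records. The record shape is an assumption, not the trigger's documented format: a JSON list whose items have a `value` string containing the post as JSON with `post_id` and `text`. `analyze` is a toy keyword extractor standing in for the real NLP analysis service, and `SEARCH_INDEX` is a hypothetical stand-in for the store that later serves search.

```python
import json
import re
from collections import Counter

# Hypothetical stand-in for the store that later serves search queries.
SEARCH_INDEX = {}


def analyze(text, top_k=3):
    """Toy NLP step (most frequent lowercase word tokens); the real pipeline
    calls the NLP analysis service here instead."""
    tokens = re.findall(r"\w+", text.lower())
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def handler(event, context):
    """Process one batch of Kafka records; returns how many posts were indexed."""
    indexed = 0
    for record in json.loads(event):
        post = json.loads(record["value"])
        # Keyed by post_id, so a redelivered message overwrites instead of duplicating.
        SEARCH_INDEX[post["post_id"]] = {"keywords": analyze(post["text"])}
        indexed += 1
    return indexed
```

Keying results by `post_id` keeps the handler idempotent: Kafka consumers typically get at-least-once delivery, so a redelivered post simply overwrites its earlier result.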
4) Weekly/daily resource consumption statistics
For MaxCompute and EAS resource consumption, a job triggered on a weekly/daily schedule automatically pulls the unstructured billing data from the Alibaba Cloud console, aggregates it by team member, task, and model, and pushes the results to the team. This helps team members become more cost-aware and helps each business line manage costs better.
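The aggregation step can be sketched as below. It assumes the raw bill has already been parsed into records carrying a `cost` plus hypothetical `owner`, `task`, and `model` tags; fetching the bill and pushing the report to the team are omitted. Records missing a tag are grouped under `untagged` so unattributed spend stays visible.

```python
from collections import defaultdict


def aggregate_costs(bill_lines, dimension):
    """Sum cost per value of one tag ("owner", "task" or "model"), largest first."""
    totals = defaultdict(float)
    for line in bill_lines:
        totals[line.get(dimension) or "untagged"] += line["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def format_report(bill_lines, dimensions=("owner", "task", "model")):
    """Plain-text summary that could then be pushed to the team chat."""
    out = []
    for dim in dimensions:
        out.append(f"== cost by {dim} ==")
        out.extend(f"{key}: {cost:.2f}" for key, cost in aggregate_costs(bill_lines, dim))
    return "\n".join(out)
```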
Serverless App Engine (SAE)
For our first SAE rollout, we chose our team's prediction service. This service integrates the model inference, feature engineering, and training-sample feedback capabilities that search, recommendation, and advertising require. It is a mid-tier microservice through which every business line can access the team's most mature online prediction capability at very low cost, for example click-through rate (CTR) prediction for recommendations on the search page and CTR prediction for games in the international version.
With SAE, our service quickly gained Serverless capabilities, because SAE shields us from much of the work of managing resources, environments, and basic O&M components. This lets us quickly stand up an independent prediction service for each new scenario and new business, both in China and overseas.
We have also integrated SAE's alerting platform, event center, and log service, and receive alerts through DingTalk, so we can monitor the state of online services in real time, for example OOM events, restarts, and error logs.
In addition, the service is built on the Dubbo-go framework, which gives it microservice capabilities such as service registration and discovery, direct IP connection, and graceful startup and shutdown. Compared with our previous ECS-based approach, this has clear advantages in O&M management, development and release, and ongoing cost control. It covers essentially the whole lifecycle from development and release through later O&M, and greatly reduces development costs for the team.
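Graceful startup and shutdown come down to ordering: register only when ready, and on shutdown deregister first and then drain in-flight requests. The toy sketch below shows that ordering with an in-memory `Registry` class, a hypothetical stand-in for the real registry (such as Nacos or ZooKeeper) a framework like Dubbo-go would use. It is not Dubbo-go's API.

```python
import threading


class Registry:
    """Toy in-memory registry; a hypothetical stand-in for a real one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}

    def register(self, service, addr):
        with self._lock:
            self._instances.setdefault(service, set()).add(addr)

    def deregister(self, service, addr):
        with self._lock:
            self._instances.get(service, set()).discard(addr)

    def discover(self, service):
        with self._lock:
            return sorted(self._instances.get(service, set()))


class ServiceInstance:
    def __init__(self, registry, service, addr):
        self.registry, self.service, self.addr = registry, service, addr
        self._inflight = 0
        self._cond = threading.Condition()

    def go_online(self):
        """Graceful online: register only once the instance is ready to serve."""
        self.registry.register(self.service, self.addr)

    def begin_request(self):
        with self._cond:
            self._inflight += 1

    def end_request(self):
        with self._cond:
            self._inflight -= 1
            self._cond.notify_all()

    def go_offline(self, timeout_s=10.0):
        """Graceful offline: deregister first so callers stop picking this address,
        then wait for in-flight requests. Returns False if the drain timed out."""
        self.registry.deregister(self.service, self.addr)
        with self._cond:
            return self._cond.wait_for(lambda: self._inflight == 0, timeout=timeout_s)
```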
Business value
Simple O&M that saves time and effort: developers can easily handle the whole process of application development, deployment, and management, stay focused on the business, and greatly reduce O&M investment and cost.
Zero-downtime releases and minute-level launches: SAE supports canary (grayscale) releases and rolling releases, and provides a fairly complete OpenAPI that can be integrated with Git workflows for rapid deployment. This gives our service minute-level releases, which is especially attractive for new businesses.
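A canary-then-rolling release can be sketched as a release plan plus a loop that stops at the first unhealthy batch. This is a generic illustration, not SAE's implementation or API; `deploy` and `healthy` are hypothetical callbacks supplied by the caller.

```python
def release_plan(instances, gray_count, batch_count):
    """Canary (gray) batch first, then the remaining instances split as evenly
    as possible into batch_count rolling batches."""
    if gray_count < 0 or batch_count < 1:
        raise ValueError("gray_count must be >= 0 and batch_count >= 1")
    gray, rest = list(instances[:gray_count]), list(instances[gray_count:])
    plan = [gray] if gray else []
    for i in range(batch_count):
        batch = rest[i * len(rest) // batch_count:(i + 1) * len(rest) // batch_count]
        if batch:
            plan.append(batch)
    return plan


def rollout(plan, deploy, healthy):
    """Upgrade batch by batch; stop at the first unhealthy batch so later
    batches keep serving the old version. Returns (finished_batches, failed_batch)."""
    finished = []
    for batch in plan:
        for instance in batch:
            deploy(instance)
        if not healthy(batch):
            return finished, batch
        finished.append(batch)
    return finished, None
```

For seven instances with one canary and three rolling batches, `release_plan` yields batches of 1, 2, 2, and 2, so a bad version reaches only the canary before the health check can stop it.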
Second-level elastic scaling: SAE supports scaling policies based on different dimensions such as CPU, memory, QPS, response time (RT), and schedules, which helps improve resource utilization. Especially once the business reaches a large scale, finer-grained scaling policies can significantly reduce machine costs.
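Multi-metric scaling is commonly computed with target tracking. The sketch below uses the same rule as the Kubernetes Horizontal Pod Autoscaler: per metric, `ceil(current * observed / target)`, then the largest proposal, clamped to the replica bounds. It adds a timed floor for known peak hours. It is not necessarily how SAE computes its decisions, and the metric names and targets are hypothetical. The real HPA also ignores changes within a tolerance band (10% by default), which is omitted here.

```python
import math


def desired_replicas(current, observed, targets, min_replicas, max_replicas):
    """Target tracking in the style of the Kubernetes HPA: each metric proposes
    ceil(current * observed / target); the largest proposal wins, clamped to bounds."""
    if current < 1:
        raise ValueError("current must be >= 1")
    proposals = [
        math.ceil(current * observed[name] / target)
        for name, target in targets.items()
        if name in observed
    ]
    desired = max(proposals, default=current)
    return max(min_replicas, min(max_replicas, desired))


def scheduled_min(hour, schedule, default_min):
    """Timed policy: raise the replica floor during known peak windows [start, end)."""
    for start, end, floor in schedule:
        if start <= hour < end:
            return max(default_min, floor)
    return default_min
```

Taking the maximum across metrics means whichever resource is under the most pressure (CPU, QPS, or RT) drives the scale-out, while the scheduled floor pre-warms capacity before predictable peaks.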
Multi-language microservice capability: SAE provides PHP, Python, Go, and other runtimes, and enables low-cost microservices in Go through multi-language service registration and discovery based on Kubernetes Services.