Alibaba Cloud Function Compute Helps to Upgrade the Architecture of Amap RTA Advertising System

By Qingjie Zhao (Alibaba Cloud Function Compute), Xueqing Lin (Alibaba Cloud Function Compute), Lingling Du (Amap), and Bicheng Wang (Amap)

Background

What is RTA?

RTA (Real-Time API) is a real-time advertising interface that enables real-time optimization of ad delivery by combining the data and model capabilities of both media and advertisers. It is not just an interface technology but also a strategy-oriented delivery capability.

Advertising media can use Amap's RTA interface to ask whether an advertisement should be placed. The RTA service queries Amap's crowd information and returns a placement decision, which makes media advertising more accurate.

Original System Architecture and Issues

The original system used a large number of servers and had a long dependency chain. Whenever the system was scaled out, the services it depended on had to be scaled out as well, which consumed a considerable amount of resources.

Technology Selection

Crowd Hit Function

The crowd hit function is essentially a problem of determining whether an element is in a set. The industry commonly uses Bloom filters to solve this problem. Bloom filters combine hash algorithms and bitmaps to achieve high query efficiency while requiring minimal space. Redis Extended Edition provides a Bloom filter (bf) feature. Because Go is used for reading and Java is used for writing, implementing the Bloom filter on the Redis server, rather than separately in each client, ensures that the data stays consistent across the two languages.
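To make the "hash algorithms plus bitmap" idea concrete, the following is a minimal sketch of a Bloom filter. It is not the Redis implementation: the class name, the double-hashing scheme built on SHA-256, and the sizes are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit bitmap."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # the bitmap

    def _positions(self, item: str):
        # Derive k positions from two 64-bit halves of one SHA-256 digest
        # (double hashing). This hash choice is an assumption of the sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical device IDs standing in for a crowd.
crowd = BloomFilter(num_bits=10_000, num_hashes=7)
for device_id in ("device-001", "device-002", "device-003"):
    crowd.add(device_id)
```

A lookup never returns a false negative, so every device that was added is reported as present; the price of the compact bitmap is a small, tunable false-positive rate.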

Redis is used to implement the crowd hit function, eliminating the need for an algorithm gateway and saving resources on the data middle platform.

Data Synchronization

The audience selection platform currently has four types of data updates: online, real-time, offline single-time, and offline periodic.

Currently, audience selection is based on offline crowds. Online and real-time updates may be used in the future, but the real-time requirements are low for three reasons: the crowds delineated for RTA advertisements are large, the proportion of a crowd that changes in real time is small, and the media side caches results. If fresher data is necessary, offline updates at hourly granularity can be considered (the achievable freshness depends on the UDF update frequency and trigger method). Offline periodic updates of Redis can also be considered.

Serverless

Why Serverless

By separating the interface between applications and platforms, Serverless allows businesses to focus on their core business logic. It enables businesses to quickly develop and deploy stable, secure, elastic, and scalable distributed applications.

How to Implement Serverless

In the new technology selection, the engine service needs to access Redis, which raises the question of how to make a system serverless with high-frequency storage access.

Serverless is commonly considered to be FaaS + BaaS. FaaS (Function as a Service) generally refers to various backend microservices, while BaaS (Backend as a Service) refers to backend services that are not suitable for FaaS, such as storage services.

A serverless system architecture places high demands on cloud storage. In terms of scalability, latency, and IOPS, cloud storage must scale automatically at a level equal or close to that of the applications.

Alibaba Cloud provides ApsaraDB for Redis Enterprise Edition, which offers a variety of instance types supporting a maximum bandwidth of 2 GB and 60 million QPS. The architecture and specifications of an instance can be adjusted to meet different performance and capacity requirements, and scaling is imperceptible to the application. This satisfies the storage requirements of the engine service once it becomes serverless.

FaaS is the most common technology for serverless backend microservices.

High Performance Requirements

The RTA advertising delivery system provides services for external media, so it handles large traffic and has strict latency requirements. It is a typical high-performance scenario in which the timeout period set by the client is generally very short. Once the timeout period is reached, the interface call fails.

After adopting a serverless architecture, request traffic is first sent to the Alibaba Cloud Function Compute system and then forwarded to a function instance for processing. In this scenario, Function Compute must keep both the average system time (excluding the execution time of the function itself) and the P99 value of request processing very low in order to meet the request success rate SLA under multi-tenancy and high traffic.

Implementation Scheme

Architecture

In the new architecture, after the middle platform generates a crowd, it calls Redis commands such as BF.INSERT to build the Bloom filter (bf). After the engine obtains a device ID, it uses the BF.EXISTS command to determine whether the device is in the corresponding crowd.
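The write and read paths can be sketched as the command tokens each side would send. This is only an illustration: the key name, device IDs, capacity, and error rate are hypothetical, and the CAPACITY, ERROR, and ITEMS options follow the commonly documented BF.INSERT syntax, which should be checked against the Redis edition actually in use.

```python
from typing import List, Sequence


def bf_insert_args(key: str, device_ids: Sequence[str],
                   capacity: int, error_rate: float) -> List[str]:
    """Tokens for BF.INSERT: create the filter if needed, then add items."""
    if not device_ids:
        raise ValueError("BF.INSERT needs at least one item")
    return ["BF.INSERT", key,
            "CAPACITY", str(capacity),
            "ERROR", str(error_rate),
            "ITEMS", *device_ids]


def bf_exists_args(key: str, device_id: str) -> List[str]:
    """Tokens for BF.EXISTS: the server replies 1 if the item may be present."""
    return ["BF.EXISTS", key, device_id]


# Hypothetical crowd key and device IDs.
write_cmd = bf_insert_args("crowd:1001", ["device-001", "device-002"],
                           capacity=1_000_000, error_rate=0.001)
read_cmd = bf_exists_args("crowd:1001", "device-001")
```

With a client library that accepts raw commands, for example `execute_command(*write_cmd)` in redis-py, the writer and the reader share one server-side filter regardless of the language each is written in.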

Features:

Eliminates the gateway and reduces link length.
Establishes a cache to decouple offline and online systems, thereby improving performance.
Uses data compression to reduce memory usage.
Enables serverless computing, which provides real-time elasticity, removes O&M work, and accelerates application iteration.

Request Scheduling

Earlier, we mentioned that the RTA advertising system is a high-performance scenario because of its large traffic and strict latency requirements. Alibaba Cloud Function Compute is a multi-tenant system that supports not only the RTA advertising function but also many other business functions. This places the following demands on request scheduling in Function Compute:

The QPS of a single function must have no upper limit, and the large number of long-tail functions must not consume resources.
Scheduling services must be highly available, and single points of failure must not impact services.
The system time required for request processing must be kept to an average of less than 2 ms and a P99 of less than 10 ms.
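To show why the average and the P99 are tracked separately, the following sketch checks a set of system-time samples against the two targets. The nearest-rank percentile method and the sample values are assumptions for illustration, not the way Function Compute measures its SLA.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct% of n)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(system_ms: Sequence[float],
                         avg_limit_ms: float = 2.0,
                         p99_limit_ms: float = 10.0) -> bool:
    """True only if both the average and the P99 are under their limits."""
    average = sum(system_ms) / len(system_ms)
    return average < avg_limit_ms and percentile(system_ms, 99) < p99_limit_ms


# Hypothetical samples: 98 fast requests and 2 slow ones, in milliseconds.
samples = [1.0] * 98 + [12.0] * 2
ok = meets_latency_target(samples)
```

In this example the average is 1.22 ms, well under the 2 ms limit, yet the check fails because the P99 is 12 ms. A short client timeout is sensitive to exactly this kind of tail, which is why the P99 target is stated alongside the average.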

Let's see how Function Compute was able to meet these demands.

To realize real-time elasticity, when a function request reaches a front-end machine of Function Compute, the front-end machine finds the scheduling node (Partitionworker) responsible for the request and forwards the request to it. When the scheduling node receives the request, it selects an available instance according to the load balancing policy and returns it to the front-end machine; if no instance is available, it creates one in real time and returns that instead. Instance creation can take hundreds of milliseconds.
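The decision made by the scheduling node can be sketched as follows. The class names, the per-instance concurrency limit, and the "least in-flight requests" policy are assumptions for illustration; the article does not state which load balancing policy Function Compute uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    instance_id: int
    in_flight: int = 0  # requests currently running on this instance


@dataclass
class Scheduler:
    max_concurrency: int = 1  # hypothetical per-instance concurrency limit
    instances: Dict[str, List[Instance]] = field(default_factory=dict)
    created: int = 0

    def acquire(self, function_name: str) -> Instance:
        pool = self.instances.setdefault(function_name, [])
        available = [i for i in pool if i.in_flight < self.max_concurrency]
        if available:
            # Assumed policy: pick the instance with the fewest requests.
            chosen = min(available, key=lambda i: i.in_flight)
        else:
            # No free instance: create one on demand (the cold-start path).
            self.created += 1
            chosen = Instance(instance_id=self.created)
            pool.append(chosen)
        chosen.in_flight += 1
        return chosen

    def release(self, instance: Instance) -> None:
        instance.in_flight -= 1


scheduler = Scheduler()
first = scheduler.acquire("rta-engine")   # no instance yet, so one is created
second = scheduler.acquire("rta-engine")  # first is busy, so another is created
scheduler.release(first)
third = scheduler.acquire("rta-engine")   # first is free again and is reused
```

Only the first two calls pay the creation cost; the third reuses an existing instance, which is the common case once traffic is steady.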

To ensure high availability and horizontal scalability, the scheduling node uses a partitioned architecture:
Requests of the same user/function are mapped to contiguous shard regions.
Requests for a single function can span multiple shards, which allows the function to scale out.
The scheduling node (Partitionworker) reports shard and node status to the shard manager (Partitionmaster) through heartbeats.
The Partitionmaster performs load balancing by moving, splitting, and merging shards.
The scheduler supports 1 million functions, a peak of 200,000 TPS for a single function, and a scheduling latency of less than 1 ms.
If any node fails, requests are routed to other Partitionworker nodes without affecting availability.
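The partitioning ideas above can be sketched in a few functions. Everything here is an assumption made for illustration: the hash used to place a function, the ring-style wrap-around of shard numbers, the fixed span, and the round-robin reassignment on failure. The article does not describe the actual algorithms.

```python
import hashlib
from typing import Dict, List


def start_shard(function_key: str, num_shards: int) -> int:
    """Stable hash of 'user/function' to the first shard of its region."""
    digest = hashlib.sha256(function_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_region(function_key: str, num_shards: int, span: int) -> List[int]:
    """Contiguous shards serving one function; a larger span scales it out."""
    first = start_shard(function_key, num_shards)
    return [(first + offset) % num_shards for offset in range(span)]


def reassign_failed(owner: Dict[int, str], failed: str,
                    healthy: List[str]) -> Dict[int, str]:
    """Move each shard of the failed worker to healthy workers, round-robin."""
    updated = dict(owner)
    moved = 0
    for shard in sorted(owner):
        if owner[shard] == failed:
            updated[shard] = healthy[moved % len(healthy)]
            moved += 1
    return updated


NUM_SHARDS = 8
# Hypothetical initial assignment of shards to four workers.
owner = {shard: f"worker-{shard % 4}" for shard in range(NUM_SHARDS)}
region = shard_region("user-1/rta-engine", NUM_SHARDS, span=3)
after_failure = reassign_failed(owner, "worker-1",
                                ["worker-0", "worker-2", "worker-3"])
```

Because the shard-to-worker map is the only thing that changes when a worker fails, the mapping from a function to its shard region stays stable and requests simply follow the shards to their new owners.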

As we can see, a request passes through the front-end machine and the scheduling node before being forwarded to a specific function instance. The system time of request processing therefore consists of the processing time of the front-end machine, the processing time of the scheduling node, the communication time between the front-end machine and the scheduling node, and the communication time between the front-end machine and the function instance.

Over the past year, we have made many optimizations to the front-end machine and scheduling system of Function Compute, ensuring that the system can handle large traffic while keeping the system time required for request processing to an average of less than 2 ms and a P99 of less than 10 ms.

Resource Delivery

In serverless scenarios, businesses no longer need to manage resources as the platform is responsible for resource management and scheduling. As business traffic increases, the platform needs to quickly and reliably deliver the necessary computing resources, and when traffic decreases, the platform must automatically release idle resources.

To ensure the reliable delivery of resources for functions, including the Amap RTA advertising function, Alibaba Cloud Function Compute has continuously optimized resource management implementation.

New Serverless Base: Bare Metal + Security Container

Initially, Function Compute instances were delivered in the form of Docker containers, which carried security risks such as container escape. To ensure security, each host could only run functions for one tenant. However, Function Compute has a large number of long-tail functions, and the specifications of function instances are often relatively small (for example, 128 MB of memory and 0.1 cores), so this restriction limited resource utilization.

To address this issue, Alibaba Cloud Function Compute worked with relevant teams to upgrade the resource base to ECS Bare Metal + Secure Container. This upgrade improved virtualization efficiency and security assurance by using hardware integration technology and secure containers, enabling multi-tenant high-density hybrid deployment and greatly improving resource utilization.

Independent Resource Control

In Kubernetes clusters, pods cannot be produced fast enough to meet the Serverless requirement of creating tens of thousands of instances per minute. Therefore, Function Compute works with relevant teams to further subdivide the computing resources within pods. Function Compute directly controls the containers in pods, which enables high-density deployment and high-frequency creation.

Millisecond-level Resource Delivery Speed

In Serverless scenarios, resource delivery needs to be accelerated to the level of seconds or even milliseconds. To resolve the conflict between the time-consuming startup of Kubernetes infrastructure and the extreme elasticity that Function Compute (FC) demands, FC implements technologies such as pod pooling, image acceleration, image preheating, and computing instance recycling to ensure extremely fast resource delivery.
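Two of these techniques, pooling and instance recycling, can be illustrated with a small warm pool. This is a sketch under assumptions: the class, the target pool size, and the inline replenishment are hypothetical, and image acceleration and preheating are not modelled.

```python
from collections import deque
from typing import Deque


class WarmPool:
    """Keeps pre-created instances so most requests avoid a cold start."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self.next_id = 0
        self.cold_starts = 0
        self.idle: Deque[int] = deque()
        self.replenish()

    def _create(self) -> int:
        self.next_id += 1
        return self.next_id

    def replenish(self) -> None:
        # A real system would top up in the background; done inline here.
        while len(self.idle) < self.target_size:
            self.idle.append(self._create())

    def acquire(self) -> int:
        if self.idle:
            return self.idle.popleft()  # fast path: already warm
        self.cold_starts += 1           # slow path: create on demand
        return self._create()

    def recycle(self, instance_id: int) -> None:
        # Return a finished instance to the pool instead of destroying it.
        self.idle.append(instance_id)


pool = WarmPool(target_size=2)
a, b = pool.acquire(), pool.acquire()  # both served from the warm pool
c = pool.acquire()                     # pool is empty, so this is a cold start
pool.recycle(a)
d = pool.acquire()                     # the recycled instance is reused
```

Only the third request pays the creation cost. The rest are served from instances that already exist, which is what moves delivery time from the startup time of the infrastructure toward milliseconds.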

High Availability

To achieve high availability, Alibaba Cloud Function Compute distributes FC resources across multiple Kubernetes clusters in each region, so that traffic automatically switches to a healthy cluster if any cluster has a problem. Each cluster has exclusive, hybrid, and preemptible resource pool types, and Function Compute performs unified scheduling across them based on business characteristics, which further reduces costs.

Delivery SLA

In terms of the total amount of resources delivered, Alibaba Cloud Function Compute has delivered tens of thousands of instances for a single function. With dynamic resource pool supplementation, a single function of FC can theoretically receive far more instances than that. In terms of resource delivery speed, Function Compute can create instances at the 100-millisecond level. In cases of traffic spikes, FC controls resource delivery speed along the following two dimensions:

Bursting instances: the number of instances that can be created immediately (default 300)
Instance growth rate: the number of instances that can be added per minute after the number of bursting instances is exceeded (default 300 per minute)

These parameters are adjustable. The figure below shows the flow control behavior of FC in scenarios of rapidly increasing call numbers.
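Separately from the figure, the two limits can be modelled as a simple upper bound on the number of instances over time. This is a sketch assuming that the limit grows linearly once the burst quota is used up; the function names and the way time is measured are hypothetical, and the handling of throttled requests is not modelled.

```python
def instance_limit(minutes_since_burst: float,
                   burst_instances: int = 300,
                   growth_per_minute: int = 300) -> int:
    """Upper bound on instances: burst quota plus linear growth afterwards."""
    if minutes_since_burst < 0:
        raise ValueError("minutes_since_burst must be non-negative")
    return burst_instances + int(growth_per_minute * minutes_since_burst)


def granted_instances(demand: int, minutes_since_burst: float) -> int:
    """Instances actually delivered; demand above the limit is throttled."""
    return min(demand, instance_limit(minutes_since_burst))
```

With the default values, a spike that needs 1,000 instances would be capped at 300 immediately, 600 after one minute, and 900 after two minutes, and would be fully served shortly after that.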

Multi-data Center Deployment

The system is deployed in three units so that external media can access the nearest one, which reduces network latency.

Business Effects

After upgrading the system architecture, thousands of machine resources were saved, and full serverless was implemented. Call links were shortened, and the system became more elastic, robust, and easy to maintain. As a result, good business outcomes were achieved.
