An Atypical FaaS Architecture with Typical FaaS Capabilities

This article provides a detailed description of the FaaS container architecture, service release, service routing, and powerful serverless capabilities of NBF-FaaS.

By Feng Weifeng, nicknamed Zhugejin at Alibaba.

As early as 2014, function as a service (FaaS) was popularized by AWS Lambda. FaaS has since become an important part of the new-retail business framework (NBF). NBF has an atypical FaaS architecture with all the typical FaaS capabilities. This article describes in detail the FaaS container architecture, service release, service routing, and serverless capabilities of NBF-FaaS. It also shares practical experience we gained from using NBF-FaaS during major promotion events on Alibaba's e-commerce platforms.

NBF

The new-retail business framework (NBF) is an open framework developed by the supply chain infrastructure technology team for new-retail services. It provides a standardized service definition, rapid service development capabilities, and an open ecosystem. It aims to provide partners with a complete set of new-retail platform as a service (PaaS) and software as a service (SaaS) solutions.

FaaS

Definition

Function as a service (FaaS) is the typical service form of a serverless system. A serverless platform implements industry-standard practices in areas such as load balancing, high availability, automatic scaling, and service governance. Because these practices are completely transparent to developers, serverless computing shortens the time needed to take a product from planning to deployment, reduces development costs, and helps ensure the reliability of services. Developers use events such as HTTP requests and messages to trigger functions.

The Typical Architecture

Event Sources: Functions are event-driven, and event sources are the set of events that trigger them.
Function Instances: Provide service functions or microservices.
FaaS Controller: Manages function control services, such as API Gateway or Backend for Frontend (BFF).
Platform Services: Functions rely on platform services, such as privilege management APIs and OSS.
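
To make the relationship between these parts concrete, here is a minimal sketch of a controller that dispatches events to function instances. The class name SimpleFaasController and the use of plain strings for events and payloads are assumptions made for illustration only; they do not come from NBF or any specific FaaS product.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a controller that maps event types to function instances.
final class SimpleFaasController {
    // Registered function instances, keyed by the event type that triggers them.
    private final Map<String, Function<String, String>> functions = new HashMap<>();

    void register(String eventType, Function<String, String> function) {
        functions.put(eventType, function);
    }

    // Dispatches an event coming from an event source to the matching function.
    String dispatch(String eventType, String payload) {
        Function<String, String> function = functions.get(eventType);
        if (function == null) {
            throw new IllegalArgumentException("No function registered for event: " + eventType);
        }
        return function.apply(payload);
    }
}
```

A caller would register a function for an event type such as "http" and then call dispatch with that type and a payload. Platform services would be whatever the registered functions call internally.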

The NBF-FaaS Architecture

NBF - Platform Services

The new-retail business framework (NBF) platform capabilities are mainly divided into three layers:

1. Serverless platform - Cloud Service Engine (CSE): Serverless is instrumental to a FaaS platform. This is why NBF relies heavily on CSE to provide rapid dynamic scaling, which adapts quickly to different resource needs during peak and off-peak hours. The infrastructure technology team worked with the CSE team to optimize container cold and hot start performance and to develop serverless O&M tools such as logging, monitoring, and link tracking.
2. NBF containers: NBF containers use the Open Services Gateway initiative (OSGi) architecture and provide complete bundle lifecycle management, including load, start, unmount, and delete, as well as bundle isolation and communication.
3. Platform capabilities:

Service release: Allows a function or bundle to be quickly released as a service.
Service routing: Includes service polymorphism, degradation, and mock routing.
Service management: Includes service version control and service start/stop.
Service O&M: Includes serverless services, hybrid-deployment, phased release, disaster recovery, and service degradation.

NBF - Function Instances

NBF Function Instances correspond to the bundle implementations behind every service in the NBF service market.

NBF - Event Sources

NBF Event Sources provide service orchestration capabilities using EventMap from the Process Center.

NBF - FaaS Controller

There are three types of NBF FaaS Controllers:

1. Process Center service scheduling: Process Center provides event-driven capabilities for scheduling SPI services, including process events, message events, and scheduled events.
2. RPC service calls using Broker: Broker can invoke remote procedure call (RPC) services from bundles in several ways, including polymorphic routing, degradation routing, and mock routing.
3. HTTP service calls using NBF-Rest: NBF-Rest can invoke HTTP services provided by bundles.

NBF FaaS Capabilities

Bundle Lifecycle Management - NBF Containers

NBF Container Architecture

NBF containers manage the entire lifecycle of bundles, from loading and starting to unmounting and deletion. They also use OSGi to implement bundle isolation and communication.
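
The lifecycle described above can be thought of as a small state machine. The following is a minimal sketch under that assumption. The names BundleState and BundleLifecycle are hypothetical, and the strictly linear order of transitions is a simplification that may not match what the NBF container actually allows.

```java
// Hypothetical lifecycle states that mirror the load, start, unmount, and delete steps.
enum BundleState { NEW, LOADED, STARTED, UNMOUNTED, DELETED }

final class BundleLifecycle {
    private BundleState state = BundleState.NEW;

    BundleState state() { return state; }

    void load()    { transition(BundleState.NEW, BundleState.LOADED); }
    void start()   { transition(BundleState.LOADED, BundleState.STARTED); }
    void unmount() { transition(BundleState.STARTED, BundleState.UNMOUNTED); }
    void delete()  { transition(BundleState.UNMOUNTED, BundleState.DELETED); }

    // Rejects any step that is attempted out of order.
    private void transition(BundleState expected, BundleState next) {
        if (state != expected) {
            throw new IllegalStateException("Cannot move from " + state + " to " + next);
        }
        state = next;
    }
}
```

Keeping the transitions in one place makes it easy for a container to reject invalid operations, such as starting a bundle that was never loaded.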

The following figure shows the container architecture design.

The NBF container architecture is divided into three layers:

1. The Serverless layer: This layer is jointly built by the NBF and CSE teams. CSE is responsible for fast auto-scaling, which has proved itself during major promotion events on Alibaba's e-commerce platforms, including the Double 12 Festival and the Queen's Festival. NBF implements Fast Cold Start and Fast Hot Start. Fast Cold Start optimizes the cold start time of bundle service releases, and Fast Hot Start optimizes the time it takes for bundle services to become available after scaling. The underlying containers as a service (CaaS) layer is also migrating from Sigma 3.0 to ACK-EE and will fully support Alibaba Cloud units in the future.

2. NBF-OSGI Framework: By using the OSGi mechanism, the NBF-OSGI Framework manages the complete lifecycle of bundles, from load and start to unmount and deletion. Currently, we use Pandora for the majority of our FaaS needs and use ModuleClassLoader to load middleware. Therefore, NBF containers also use the Pandora loading mechanism to load bundles. The NBF-OSGI Framework provides a complete set of mechanisms for bundle isolation, bundle-to-container communication, and bundle-to-bundle communication.

i. Bundle-to-container communication: Bundles are able to utilize container capabilities, such as Spring context hosting and AOP, by using the import mechanism provided by containers.

<imported>
    <packages>
        <package>org.springframework</package>
        <package>org.apache.commons.logging</package>
        <package>org.aopalliance</package>
        <package>org.aspectj</package>
    </packages>
</imported>

ii. Bundle isolation: the NBF container creates an independent sandbox for each bundle, which ensures code-level bundle isolation from the time of loading, thus preventing class and resource conflicts among bundles.

iii. Bundle-to-bundle communication: NBF containers manage bundle context globally, which means bundles are able to share context globally.

3. Bundle and Plugin hosting: A bundle is a collection of code written by developers to provide a service. Plugins are provided by the NBF engine and are loaded to implement specific capabilities, such as the crucial service release feature of NBF-FaaS.
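
The import mechanism and the per-bundle sandbox described in the NBF-OSGI Framework layer can be illustrated with a class loader. This is only a sketch of the general technique and not NBF's or Pandora's actual loader. BundleClassLoader is a hypothetical name, and the rule that everything outside java.* and the imported packages must come from the bundle's own JARs is a deliberate simplification.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical sketch: one class loader per bundle acts as its sandbox.
final class BundleClassLoader extends URLClassLoader {
    private final List<String> importedPackages;

    BundleClassLoader(URL[] bundleJars, ClassLoader container, List<String> importedPackages) {
        super(bundleJars, container);
        this.importedPackages = importedPackages;
    }

    // True if the class belongs to a package the bundle imports from the container.
    boolean isImported(String className) {
        for (String pkg : importedPackages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // JDK classes and imported packages are delegated to the container loader.
        if (name.startsWith("java.") || isImported(name)) {
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                // Everything else is resolved only from the bundle's own JARs,
                // so two bundles never see each other's classes.
                loaded = findClass(name);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With the packages from the imported configuration shown earlier, a class such as org.springframework.context.ApplicationContext would be served by the container, while a bundle's own classes would stay private to that bundle.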

A Future NBF Container Architecture

We use Pandora for most of our FaaS needs. For that reason, the current NBF container architecture is based on the Pandora loading mechanism. At their core, NBF containers are essentially Pandora containers. In the future, we will use the NBF-OSGI Framework to host external containers. These external containers can be Pandora containers or any other type of container, which removes our dependency on Pandora containers and allows a greater variety of bundles to run on the NBF-FaaS platform.

New NBF container architecture:

Bundle Service Release

The following diagram illustrates how NBF releases bundles as RPC services. There are three phases:

Load bundles by using the routing table.
Load and start bundles by using the NBF Framework.
Load and start the service release plugin by using the NBF Framework.

Service Routing and Control - Broker

Broker Architecture

The Broker architecture includes:

Broker Agent

Broker Agent separates the Broker SPI from its implementation. It uses BrokerBundleLoader to load the implementation dynamically, which removes the need for code changes and service re-releases when the Broker is updated. It must feel good to be freed from the headaches caused by second-party library updates. SPI Proxy implements non-invasive service calling by using annotations, so it is easy to switch from traditional service calling methods to NBF service calling methods. For example:

1. Traditional service calling methods:

@Autowired
ServiceA serviceA;
serviceA.invoke(params);

2. Broker SPI call method:

@Autowired
BundleBroker bundleBroker;
bundleBroker.get(ServiceA.class).invoke(params);

3. Service calls through annotation:

@DynamicInject
ServiceA serviceA;
serviceA.invoke(params);

@DynamicInject makes service calls just as easy as traditional calls.
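
For readers curious how annotation-based injection of this kind can work, here is a minimal sketch that uses reflection and a JDK dynamic proxy. It is not the NBF implementation. DynamicInjectSketch, SketchInjector, and OrderController are hypothetical names, the ServiceA signature is assumed, and the routing step is reduced to a simple map lookup.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.Map;

// Stand-in for the real annotation; routing attributes are omitted.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DynamicInjectSketch {}

// Assumed shape of the SPI used in the examples above.
interface ServiceA {
    String invoke(String params);
}

final class SketchInjector {
    // Maps an SPI interface to the implementation that calls are routed to.
    private final Map<Class<?>, Object> implementations;

    SketchInjector(Map<Class<?>, Object> implementations) {
        this.implementations = implementations;
    }

    // Fills every annotated field with a proxy that resolves the target per call.
    void inject(Object target) throws IllegalAccessException {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(DynamicInjectSketch.class)) {
                continue;
            }
            Class<?> spi = field.getType();
            Object proxy = Proxy.newProxyInstance(
                spi.getClassLoader(),
                new Class<?>[] {spi},
                (p, method, args) -> method.invoke(implementations.get(spi), args));
            field.setAccessible(true);
            field.set(target, proxy);
        }
    }
}

final class OrderController {
    @DynamicInjectSketch
    ServiceA serviceA;

    String handle(String params) {
        return serviceA.invoke(params);
    }
}
```

Because the proxy looks up the implementation on every call, the target behind the field can change at runtime without touching the calling code, which is the property that makes the annotation style non-invasive.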

Broker Bundle

The core functions of the Broker Bundle are as follows:

BundleProxy

BundleProxy acts as the proxy for running bundles. For example, bundle circuit breaking and passive degradation are all implemented through BundleProxy. Therefore, all these features are the same for every bundle.

Service Discovery

Service discovery is responsible for finding the necessary service and generating the URI needed for invoking the service. Let's use HSF as an example. HSF uses the following URI: Proxy://IP:port/service/version/method. For Broker service discovery, container network information such as the IP and port can be found based on APPName or Armory and, after the implementation of serverless, GroupId. The serviceName and version can be found based on SPI and bundle metadata. With these, we can generate the NBF service discovery URI.
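
As a small illustration of the URI format above, the following sketch assembles the URI from network information and service metadata. ServiceUriSketch and its parameters are hypothetical. In NBF, the IP and port would come from the APPName, Armory, or GroupId lookup, and the service name and version from SPI and bundle metadata.

```java
// Hypothetical helper that follows the Proxy://IP:port/service/version/method format.
final class ServiceUriSketch {
    static String build(String ip, int port, String serviceName, String version, String method) {
        return "Proxy://" + ip + ":" + port + "/" + serviceName + "/" + version + "/" + method;
    }
}
```

For example, build("10.0.0.1", 12200, "com.example.ConfigReadService", "1.0.0", "queryConfig") produces Proxy://10.0.0.1:12200/com.example.ConfigReadService/1.0.0/queryConfig. All of these values are made up.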

Route Computing

Before we go into route computing, let's take a look at the three routing modes @DynamicInject supports: default mode, policy mode, and dynamic mode.

1. Default mode: The default mode does not define any routing parameters. This mode is suitable for single bundle SPI. The bundle implementation is the default implementation.

@DynamicInject
private ConfigReadService configReadService;
ResultDO<List<ConfigDTO>> result = configReadService.queryConfig(new ConfigQuery());

2. Policy mode: This mode supports three routing parameters: Id (service identifier), Expression (regular expression), and Rule (policy definition).

// Specify a bundleId; type defaults to ID
@DynamicInject(pattern = "drf")
// Specify a regular expression
@DynamicInject(pattern = "^drf-hz.*$", type = "REG")
// Specify a rule
@DynamicInject(pattern = "{\"wareHouseId\":\"2001\"}", type = "RULE")

3. Dynamic mode: Sometimes we do not know which parameters will be useful during runtime and need to pass the necessary parameters when the method is called. This is where Dynamic mode comes in handy.

@DynamicInject
private DynamicInvoker<ConfigReadService> configReadServiceDynamic;

ResultDO<List<ConfigDTO>> result;
// Pass the bundleId at call time
result = configReadServiceDynamic.getService(bundleId).queryConfig(new ConfigQuery());

// Pass rule parameters at call time
Map<String, Object> params = new HashMap<>();
params.put("merchant", merchant);
result = configReadServiceDynamic.getService(params).queryConfig(new ConfigQuery());

Route computing also involves SpiProxy, which has two main functions:

A. Obtain SPIInfo, including ClassName, SpiVersion, and SpiCode.
B. Obtain the BundleId through route computing. Combined with the addressing policy described under Service Discovery, this gives us everything needed to generate the URI for invoking the NBF service. This is the core principle of NBF polymorphic routing.
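
The three pattern types in policy mode (ID, REG, and RULE) can each be seen as a different way of resolving a BundleId. The sketch below shows one possible shape of that resolution. RouteSketch is a hypothetical class, bundle metadata is modeled as a plain map, and the rule is passed as a map instead of being parsed from JSON. How NBF actually matches rules is not described in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of route computing: every method resolves a bundleId or returns null.
final class RouteSketch {
    // bundleId -> bundle metadata attributes (assumed shape).
    private final Map<String, Map<String, String>> bundles = new LinkedHashMap<>();

    void register(String bundleId, Map<String, String> attributes) {
        bundles.put(bundleId, attributes);
    }

    // ID type: the pattern is the bundleId itself.
    String byId(String bundleId) {
        return bundles.containsKey(bundleId) ? bundleId : null;
    }

    // REG type: the first registered bundleId that matches the regular expression.
    String byRegex(String regex) {
        Pattern pattern = Pattern.compile(regex);
        for (String bundleId : bundles.keySet()) {
            if (pattern.matcher(bundleId).matches()) {
                return bundleId;
            }
        }
        return null;
    }

    // RULE type: the first bundle whose metadata contains every key/value pair of the rule.
    String byRule(Map<String, String> rule) {
        for (Map.Entry<String, Map<String, String>> entry : bundles.entrySet()) {
            if (entry.getValue().entrySet().containsAll(rule.entrySet())) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

Dynamic mode would call the same kind of resolution, only at invocation time with a bundleId or parameter map supplied by the caller instead of values fixed in the annotation.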

Circuit Breaking and Degradation

This includes two components: active circuit breaking and passive degradation.

1. Passive degradation: Passive degradation is triggered by one of the following errors: service not found, service timeout, or service exception. When this is triggered, service calls to the bundles are automatically routed to the corresponding degradation bundles. The following table explains the degradation:

2. Active circuit breaking: NBF uses baseline metrics to decide if active circuit breaking is in effect. If the baseline is exceeded, then traffic is routed to degradation bundles.

In the example above, we selected a bundle (Supply Chain - Wholesale - RT-Mart) as the degradation bundle of Supply Chain - Wholesale - Hema Fresh. If the baseline value of 100 milliseconds is exceeded, the degradation bundle is used. At the heart of the active circuit breaking process is the BundleProxy we mentioned earlier in the article. BundleProxy decides if the conditions for active circuit breaking or passive degradation are met and executes the action on behalf of the bundles.
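
The decision logic of such a proxy can be sketched as follows. This is an illustration of the idea only and not the real BundleProxy. BundleProxySketch is a hypothetical class, bundles are modeled as plain functions, and latency is reported explicitly through reportLatency so that the example stays deterministic. A real proxy would measure it and would also need a way to close the circuit again.

```java
import java.util.function.Function;

// Hypothetical sketch of a proxy that applies passive degradation and active circuit breaking.
final class BundleProxySketch {
    private final Function<String, String> primary;
    private final Function<String, String> degradation;
    private final long baselineMillis;
    private boolean circuitOpen = false;

    BundleProxySketch(Function<String, String> primary,
                      Function<String, String> degradation,
                      long baselineMillis) {
        this.primary = primary;
        this.degradation = degradation;
        this.baselineMillis = baselineMillis;
    }

    // Active circuit breaking: open the circuit once observed latency exceeds the baseline.
    void reportLatency(long observedMillis) {
        if (observedMillis > baselineMillis) {
            circuitOpen = true;
        }
    }

    String invoke(String params) {
        if (circuitOpen) {
            return degradation.apply(params);
        }
        try {
            return primary.apply(params);
        } catch (RuntimeException e) {
            // Passive degradation: a failed call falls back to the degradation bundle.
            return degradation.apply(params);
        }
    }
}
```

Putting both checks in the proxy is what makes the behavior identical for every bundle, because neither the primary bundle nor the degradation bundle needs to know about the other.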

Throttling

Throttling provides software load balancing capabilities that allow traffic balancing between bundles and their corresponding degradation bundles. Let's take the bundle Supply Chain - Wholesale - Hema Fresh as an example:

High-availability O&M of Services

NBF-Serverless Capabilities

NBF-Serverless is an important cornerstone of the NBF container architecture mentioned earlier. Only when Serverless is capable of elastic scaling in milliseconds can it truly support peak/off-peak scenarios, which saves computing resources to the maximum extent.

NBF service deployment can only be truly isolated when Serverless centralizes elastic resource scheduling, replacing the current solution of hybrid deployment of different container specs (such as 1Core2G, 2Core4G, and 4Core8G) and bundles. Speaking of which, we want to give our sincere thanks to our partner, the CSE team. Without the capacity to auto-scale within milliseconds, none of this would be possible.

Of course, that's not to say there's no more work to be done. We are still working on Serverless O&M tools, such as logging, monitoring and alarms, and link tracking. We also plan to move our Serverless to ACK-EE cloud units. What role does NBF play in the implementation of NBF-Serverless capabilities? The following diagram illustrates how Serverless works and the responsibilities of CSE and NBF.

Fast Auto-Scaling

Fast Auto-Scaling is the core infrastructure capability provided by CSE, which includes the following steps:

1. Seed server startup is a cold start process. This is also how the current apps in Alibaba start, namely container start, image loading, and service exposure. This process usually takes several minutes.
2. Seed distribution uses Fork2 to replicate seed machine memory. The process from memory replication to server scaling is very quick. Therefore, CSE Auto-Scaling can implement elastic auto-scaling in milliseconds.
3. Service registration is performed using ConfigServer. This is to ensure that the replicated Service Bean can be called.

Fast Cold Start

The first thing we did when implementing NBF-Serverless was to optimize cold start performance. We wanted to reduce the start time from minutes to seconds and that was why we changed how NBF bundle cold start worked.

Deploy the Engine in advance when you create or expand the server group for a bundle.
Bundles are dynamically loaded with NBF-FaaS. Originally, the cold start time was the sum of the Pandora container start time, engine start time, bundle install time, and bundle start time. After optimization, the cold start time is reduced to only the bundle install and start time.

Fast Hot Start

The current scaling mechanism is implemented through memory replication, but replicating machine-specific in-memory variables, such as a universally unique identifier (UUID), is a problem because every replica would carry the same value. Therefore, the NBF Hot Start optimization mainly provides a memory variable refresh mechanism. The bundle lifecycle is managed by NBF Framework through hooks. The same hooks also solve the UUID problem.
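
A refresh mechanism of this kind could look like the sketch below. The interface HotStartHook and the classes InstanceIdentity and HotStartHooks are hypothetical. The article does not describe the actual hook API, so this only shows the general idea of regenerating machine-specific state after a replica comes up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical hook that runs on a replica after its memory was copied from the seed.
interface HotStartHook {
    void afterReplication();
}

// Example of machine-specific state that must not be shared between replicas.
final class InstanceIdentity implements HotStartHook {
    private volatile String instanceId = UUID.randomUUID().toString();

    String instanceId() {
        return instanceId;
    }

    @Override
    public void afterReplication() {
        // Regenerate the identifier so the replica does not reuse the seed's value.
        instanceId = UUID.randomUUID().toString();
    }
}

// Hypothetical registry that a framework would trigger once per hot start.
final class HotStartHooks {
    private final List<HotStartHook> hooks = new ArrayList<>();

    void register(HotStartHook hook) {
        hooks.add(hook);
    }

    void fire() {
        for (HotStartHook hook : hooks) {
            hook.afterReplication();
        }
    }
}
```

Any other replicated state that is tied to a single machine, such as cached host information, could be refreshed by registering another hook in the same way.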

Serverless Best Practices

Although Serverless O&M capabilities are not yet perfect, we launched several P0 services during last year's Double 12 Festival and this year's Queen's Festival to test Serverless stability and millisecond-level auto-scaling during major promotion events. Of course, we didn't go into this without any insurance. NBF's circuit breaking, service degradation, and traffic throttling capabilities were our aces in the hole.

The load on the profiling service soared from more than 4,000 to 120,000 queries per second (QPS) on the day of the Queen's Festival, and Serverless quickly expanded to 10 units, providing the necessary support for the business peak. The resource savings are quite obvious. Originally, the profiling service was deployed on 10 servers at all times. Now, it only runs on two servers. In fact, only one server is really needed, but two servers provide fault tolerance. Ultimately, Serverless should be able to solve the long-tail service issue and scale down to zero servers.

The following figure shows a comparison of the indicator systems before and after Serverless implementation on the day of the Queen's Festival:

The data in the figure shows that the profiling service remained stable at both the system and service level throughout the promotion, which verifies the feasibility of NBF-Serverless.

Fast Rollback

Fast rollback is a very effective way to achieve high-availability NBF O&M. The traditional app rollback method involves recompiling, building, packaging, and deployment. However, with true FaaS capabilities, bundle rollback on NBF only needs to load the JAR file of a certain version. In addition, NBF Engine is a resident container, making bundle rollback a lightning-fast affair.
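
The idea that a rollback is just a switch to an already published version can be sketched as follows. BundleVersionsSketch is a hypothetical class and the JAR paths are placeholders. Actually loading the JAR into the resident container is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry of published bundle versions and the one currently active.
final class BundleVersionsSketch {
    private final Map<String, String> jarByVersion = new LinkedHashMap<>();
    private String activeVersion;

    // Publishing a new version records its JAR and makes it the active one.
    void publish(String version, String jarPath) {
        jarByVersion.put(version, jarPath);
        activeVersion = version;
    }

    // Rolling back only switches the active version; nothing is rebuilt.
    void rollbackTo(String version) {
        if (!jarByVersion.containsKey(version)) {
            throw new IllegalArgumentException("Unknown version: " + version);
        }
        activeVersion = version;
    }

    String activeVersion() {
        return activeVersion;
    }

    String activeJar() {
        return jarByVersion.get(activeVersion);
    }
}
```

Since every published JAR stays available, a rollback skips the compile, build, and package steps and only needs the container to load the JAR of the chosen version.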

Summary

This article describes the FaaS capability of NBF in detail. To sum up, NBF uses an atypical FaaS architecture to achieve typical FaaS capabilities.

At the start, we introduced FaaS and compared the typical FaaS architecture and the architecture of NBF-FaaS. That was followed by a focus on the FaaS capabilities of NBF, including the container architecture of NBF, the service release of bundles, and the core implementation principles of bundle routing and control. Finally, we described the high-availability O&M capabilities of NBF, focusing on the implementation principles of and our practical experiences with NBF-Serverless. The NBF team originated in Hema Fresh and then moved back to the supply chain mid-end, providing an open ecosystem to 25 business units and partners, including Hema Fresh.