A serverless platform can provide ultrahigh elasticity, but that elasticity places heavy demands on the underlying infrastructure. To address this problem, Alibaba Cloud provides a solution that combines Container Service for Kubernetes (ACK) with other cloud services to optimize data access based on Elastic Container Instance. This topic describes the challenges of accessing data in serverless cloud computing and the solution that addresses them.

Challenges of accessing data in serverless cloud computing

A serverless platform provides scaling capabilities that can quickly scale resources or workloads for applications. An application is ready for use only a few seconds after it starts to scale out, and compute resources are scaled within a few seconds or even milliseconds. As a result, the infrastructure faces great challenges. Storage resources are the most commonly used infrastructure resources. If the I/O throughput of a storage system cannot keep up with the rate of instance scaling activities, the system cannot meet the requirement for second-granularity scaling. For example, the system can scale container instances within 2 seconds but needs tens of seconds or even several minutes to download data from the storage system.

Serverless containerization imposes the following requirements on traditional storage systems:
  • High-density access: Compute resources are used only to process data, and all data is stored in the storage system. As a result, the overhead of data access increases in high concurrency scenarios. This not only adversely affects system stability but can also drive the bandwidth usage of the storage system to 100%.
  • Low network latency: An architecture that decouples computing and storage lengthens the path to the storage system. The latency of accessing business data and metadata across networks increases.
  • Elastic I/O throughput: The bandwidth and throughput of traditional distributed storage systems increase with the storage capacity. However, application-oriented resource scaling may create large numbers of containers in high concurrency scenarios. When these containers concurrently access the storage system, the storage system triggers access throttling. The problem is how to balance the ultrahigh elasticity of compute resources against the limited bandwidth of the storage system.
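
The gap between compute elasticity and storage bandwidth can be illustrated with a rough back-of-the-envelope model. The following Python sketch assumes that aggregate bandwidth grows linearly with capacity and is shared evenly among containers. The function names and all numbers (10 TiB, 50 MiB/s per TiB, 1 GiB read per container) are hypothetical and do not describe any specific storage service.

```python
def per_container_bandwidth(capacity_tib: float, mibps_per_tib: float, containers: int) -> float:
    """Even bandwidth share (MiB/s) per container.

    Assumption: aggregate bandwidth scales linearly with storage capacity.
    """
    if containers <= 0:
        raise ValueError("containers must be positive")
    aggregate_mibps = capacity_tib * mibps_per_tib
    return aggregate_mibps / containers


def load_seconds(dataset_gib: float, bandwidth_mibps: float) -> float:
    """Seconds needed to read dataset_gib at bandwidth_mibps."""
    return dataset_gib * 1024 / bandwidth_mibps


# Hypothetical volume: 10 TiB at 50 MiB/s per TiB; each container reads 1 GiB.
for n in (1, 10, 100):
    share = per_container_bandwidth(10, 50, n)
    print(n, round(load_seconds(1, share), 1))
# prints:
# 1 2.0
# 10 20.5
# 100 204.8
```

Under these assumptions, scaling out from 1 to 100 containers stretches the data loading time from about 2 seconds to more than 3 minutes, even though the containers themselves can start within seconds.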

Solution for optimizing data access

To better support serverless cloud computing, the ACK team works with the basic software and operating system team, Elastic Container Instance team, and Data Lake team to provide a solution for optimizing data access based on Elastic Container Instance. The solution conforms to the following rules:
  • Complies with the existing standards to ensure a consistent user experience. For example, Sidecar and Device Plugin in Kubernetes are used as standards to expose APIs and user interfaces.
  • Supports fine-grained Linux privilege control.
  • Synchronizes kernel updates and underlying updates from open source Kubernetes. The entire design stays consistent with open source Kubernetes.
The architecture used by Fluid in serverless cloud computing consists of a data plane and a control plane.
  • Data plane: The data plane is made up of FUSE containers that correspond to different runtimes. FUSE containers are deployed together with applications as sidecar containers. The sidecar manages the data access of the application.
  • Control plane: The control plane consists of the injector, cache runtime controller, and application controller.
    • Injector: The injector transforms data access and runtime implementation information into information that the sidecar can read, and injects the information into applications. The injector also controls the sequence in which the containers of a workload are launched. The workload can be a pod or a big data or AI computing workload, such as a Spark Job, TensorFlow Job, or MPI Job.
    • Cache runtime controller: The cache runtime controller controls the elasticity of data caches based on the throughput of the FUSE sidecars, and also manages data access permissions.
    • Application controller: The application controller terminates the FUSE containers in a pod when the containers of a batch Job, TensorFlow Job, or Spark Job in the same pod are terminated.
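
The roles of the injector and the application controller can be sketched in a few lines of Python. This is a simplified illustration that operates on plain dictionaries shaped like a pod spec, not Fluid's actual implementation. The container name `fuse-sidecar`, the function names, and the state map are hypothetical, and listing the sidecar first is only a stand-in for launch ordering.

```python
import copy

FUSE_SIDECAR_NAME = "fuse-sidecar"  # hypothetical container name


def inject_fuse_sidecar(pod: dict, fuse_image: str) -> dict:
    """Return a copy of pod with a FUSE sidecar listed before the app containers."""
    mutated = copy.deepcopy(pod)  # leave the caller's pod untouched
    containers = mutated.setdefault("spec", {}).setdefault("containers", [])
    if any(c.get("name") == FUSE_SIDECAR_NAME for c in containers):
        return mutated  # already injected, so the operation is idempotent
    # Assumption: position 0 models "launch the sidecar before the application".
    containers.insert(0, {"name": FUSE_SIDECAR_NAME, "image": fuse_image})
    return mutated


def sidecars_to_terminate(container_states: dict) -> list:
    """Decide which sidecars to stop.

    container_states maps a container name to True if it has terminated.
    The sidecar is stopped only after every application container has finished.
    """
    app_done = [done for name, done in container_states.items()
                if name != FUSE_SIDECAR_NAME]
    sidecar_running = container_states.get(FUSE_SIDECAR_NAME) is False
    if app_done and all(app_done) and sidecar_running:
        return [FUSE_SIDECAR_NAME]
    return []


pod = {"spec": {"containers": [{"name": "app", "image": "app:latest"}]}}
injected = inject_fuse_sidecar(pod, "fuse:latest")
print([c["name"] for c in injected["spec"]["containers"]])
print(sidecars_to_terminate({"app": True, FUSE_SIDECAR_NAME: False}))
```

The second function reflects why an application controller is needed for batch-style workloads: without it, a long-running FUSE sidecar would keep the pod alive after the job's own containers have exited.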