How Do Object Storage and Data Lakes Follow the Cloud Native Trend?

Preface:

Cloud native has become a prominent concept in the digital era. It is not a single technology but a design philosophy that reshapes how software is developed and how business applications are operated, backed by a set of technologies and methodologies. In "cloud native", "cloud" refers to the cloud platform, while "native" means that applications are designed for the cloud from the very beginning and make full use of the platform's elasticity and distributed architecture. Gartner predicts that the share of digital workloads deployed on cloud native platforms will grow from 30% in 2021 to 95% in 2025. Object storage, one of the earliest cloud services, underpins a wide range of other cloud services and is the leading choice for unified storage in the data lake field.

1. What is Cloud Native

1.1 Cloud Native Definition

As technology develops, the definition of cloud native is also evolving. The following is the current definition from the Cloud Native Computing Foundation (CNCF):

"Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach."

The first sentence describes the application scenarios and goals of cloud native; the second lists its representative technologies. Cloud native technologies such as containers, microservices, and DevOps have already been practiced and validated across finance, telecommunications, the Internet, and other industries, giving enterprises flexible, resilient, and scalable services.

1.2 Cloud Native Technology Features

The cloud native technical architecture can be deployed broadly across public cloud, private cloud, hybrid cloud, and other cloud environments.

Agile application development is realized through CI/CD (continuous integration and continuous delivery).

Containers provide a lightweight running environment that reduces resource overhead and optimizes cost. A service mesh (such as Istio) controls each module of the application and enables flexible scheduling. The microservice architecture divides the application into modular services so that, following a divide-and-conquer approach, multiple teams can develop and iterate rapidly.

Immutable infrastructure delivers software as Docker images, packaging the software together with its running environment and reducing the complexity of environment adaptation. By contrast, shipping a software package and then deploying, debugging, and running it in the customer's environment is cumbersome. In some scenarios, image-based deployment takes only one tenth of the time of package-based deployment, greatly streamlining software delivery.

With a declarative API, as in Kubernetes (K8s), the caller simply submits a definition that "declares" the expected final state, and a single call completes the operation. In package-based deployment, by contrast, commands must be executed step by step in an interactive fashion until the release is finished. This "imperative API" style is inefficient compared with the "declarative API" of K8s, which keeps hand-offs between systems simple because callers need not care about process details.
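The contrast between the two styles can be sketched in a few lines. This is a minimal, illustrative reconcile loop in the spirit of Kubernetes controllers, not a real Kubernetes client; all names are assumptions made for the example.

```python
# Minimal sketch of the declarative model: the caller submits a desired
# state once, and a reconcile loop drives the actual state toward it.
# (Illustrative only -- not a real Kubernetes API.)

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the new actual state after one reconciliation pass."""
    new_state = dict(actual)
    for key, value in desired.items():
        if new_state.get(key) != value:
            new_state[key] = value        # converge each declared field
    for key in list(new_state):
        if key not in desired:
            del new_state[key]            # drop fields no longer declared
    return new_state

# Declarative: one call states the end state; no step-by-step commands.
desired = {"replicas": 3, "image": "app:v2"}
actual = {"replicas": 1, "image": "app:v1", "debug": True}
print(reconcile(desired, actual))   # {'replicas': 3, 'image': 'app:v2'}
```

The imperative equivalent would be a sequence of commands ("scale up", "swap image", "disable debug") that the caller must order correctly; here the system itself works out the steps.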

1.3 Cloud Native's Demand for Storage

The cloud native technology architecture described above places many requirements on storage, including:

• Container access security

Whether for CI/CD, containers, or microservices, workloads usually run inside a Virtual Private Cloud (VPC). Secure data access for containers inside the VPC is therefore a basic requirement.

• Microservice isolation

Under the service mesh and microservice architecture, applications are divided into many sub-services. Reducing interference between sub-services and isolating their data access from one another is essential.

• Elastic scalability

The sub-service modules of a microservice architecture introduce burst traffic: for example, 10,000+ containers reading data concurrently produce a flood of access, and batch container-image startup storms under immutable infrastructure likewise generate concentrated, instantaneous traffic. Storage must therefore scale elastically.

• High availability and reliability

A microservice architecture produces a large number of sub-services, all of which require highly available and reliable underlying storage in order to achieve the five-nines availability that enterprise applications demand.

• Performance per unit of storage density: predictable bandwidth, latency, and OPS

The fine-grained container runtime enables second-level billing on the public cloud, far more refined than the hourly billing of elastic compute servers. If storage provides bandwidth specifications per unit of density (Gbit/s of bandwidth per TB of capacity), stable request latency, and predictable OPS (for example, 99.99% of requests completing within a specified time T), microservices can estimate how long they will need storage, release containers on schedule, and obtain the best cost-performance ratio.
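The estimate described above is simple arithmetic once the per-TB bandwidth specification is known. The following sketch shows the calculation; the numbers and the function name are illustrative assumptions, not a published specification.

```python
# Hedged sketch: given a storage class's bandwidth-per-TB specification,
# estimate how long a microservice needs its containers before they can
# be released. All figures are illustrative.

def transfer_seconds(data_tb: float, stored_tb: float, gbps_per_tb: float) -> float:
    """Estimated time to read `data_tb` of data from a bucket holding
    `stored_tb`, where the class grants `gbps_per_tb` Gbit/s per stored TB."""
    bandwidth_gbps = stored_tb * gbps_per_tb   # total provisioned bandwidth
    data_gbit = data_tb * 1000 * 8             # TB -> Gbit (decimal units)
    return data_gbit / bandwidth_gbps

# Reading 2 TB out of a 100 TB bucket at 0.1 Gbit/s per stored TB:
print(round(transfer_seconds(2, 100, 0.1)))   # 1600 seconds
```

Because the result is predictable, the application can schedule container release at a known time instead of over-provisioning for an unknown worst case.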

2. How Does Object Storage Support Cloud Native

2.1 Object Storage Conforms to the Cloud Native Definition

As a data storage platform, object storage naturally supports building and running elastic, scalable applications. Containers, service meshes, and microservices have used object storage from their earliest designs, and it is also the common choice for storing container image data. Its RESTful API matches the declarative API model well, making object storage an ideal home for cloud native data.

2.2 Challenges of Cloud Native to Object Storage

Applying object storage to cloud native yields a typical storage-compute separation architecture. At the same time, as the data foundation, object storage has been widely recognized on the cloud for its high reliability, high availability, and elastic scalability.

Cloud native is leading many application fields toward cloud native architectures while profoundly changing every aspect of application services. As the cornerstone of application operation, object storage faces new challenges in this evolution: security, isolation, and per-unit-density performance all need to be addressed.

2.3 What Object Storage Should Do

• Strengthen the security of container access to object storage in VPC environments

The VPC runtime environment can be bound to specific object buckets for secure access: only designated buckets are reachable from inside the VPC, preventing an "insider" from copying the enterprise's object data out of the enterprise VPC into a personal bucket. This requires the object store to provide a VPC endpoint capability. Conversely, access to a bucket should be restricted to designated VPCs so that data cannot leak even if an external attacker steals a key; this requires the object store to provide a bucket policy capability.
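As one concrete form of the bucket-policy side of this, AWS-style S3 policies support a `aws:SourceVpce` condition key that pins a bucket to one VPC endpoint. The sketch below builds such a policy as a plain dict; the bucket name and endpoint ID are placeholders, and other object stores express the same idea with their own policy syntax.

```python
# Sketch of an AWS-S3-style bucket policy that denies all access to a
# bucket unless the request arrives through a specific VPC endpoint.
# Bucket name and endpoint ID below are placeholders.
import json

def vpc_only_policy(bucket: str, vpce_id: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Deny any request that did not come through the bound endpoint:
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }

policy = vpc_only_policy("corp-data", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

With this policy attached, a stolen long-term key is useless from outside the VPC, because every request outside the endpoint matches the Deny statement.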

• Pod-level flexible access to object buckets for microservices

Just as a cloud compute server binds an access token to reach storage, a pod should bind its own access token. Because the token is temporary, it avoids the serious security risk of a long-term key such as an AccessKey being stolen.
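The value of a temporary token is that it expires on its own. The following is an in-memory illustration in the spirit of STS-style temporary credentials, not a real STS client; the function names and TTL are assumptions.

```python
# Minimal sketch of temporary, expiring pod credentials: a short-lived
# token replaces a long-term AccessKey. (Illustrative only.)
import secrets
import time

def issue_token(ttl_seconds: int) -> dict:
    return {
        "token": secrets.token_hex(16),            # opaque temporary credential
        "expires_at": time.time() + ttl_seconds,   # hard expiry time
    }

def is_valid(tok: dict) -> bool:
    return time.time() < tok["expires_at"]

tok = issue_token(ttl_seconds=900)   # e.g. a 15-minute token for one pod
print(is_valid(tok))                 # True while the token is fresh
```

Even if such a token leaks, the attacker's window is minutes rather than the indefinite lifetime of a static key.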

• Microservice oriented isolation

Access isolation: different sub-services of an application can share one data bucket, but each sub-service should get its own access domain name, be bound to its own access policy, and have its access paths and permissions controlled, so that access is isolated. The object store therefore needs to provide an Access Point capability.
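The per-sub-service isolation described above can be sketched as a lookup from access point to an allowed prefix and action set. The names and structure here are illustrative assumptions; real Access Point implementations attach full policies to per-access-point domain names.

```python
# Sketch of per-sub-service isolation over one shared bucket: each
# "access point" pins a sub-service to its own prefix and permissions.
# (Illustrative names, not a real object-store API.)

ACCESS_POINTS = {
    "billing-ap":   {"prefix": "billing/",   "actions": {"GET", "PUT"}},
    "analytics-ap": {"prefix": "analytics/", "actions": {"GET"}},  # read-only
}

def allowed(access_point: str, action: str, key: str) -> bool:
    ap = ACCESS_POINTS.get(access_point)
    if ap is None:
        return False
    return action in ap["actions"] and key.startswith(ap["prefix"])

print(allowed("analytics-ap", "GET", "analytics/2024/report.parquet"))  # True
print(allowed("analytics-ap", "PUT", "analytics/2024/report.parquet"))  # False
print(allowed("billing-ap", "GET", "analytics/2024/report.parquet"))    # False
```

Each sub-service only ever sees its own access point's domain name, so neither its paths nor its permissions can drift into another sub-service's data.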

• Flow control isolation

Provide user-level, sub-user-level, and bucket-level flow control for application sub-services, capping the bandwidth each sub-service may use and preventing abnormal traffic from overwhelming the application.
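One common way to implement such per-bucket or per-sub-service caps is a token bucket. This is a hedged sketch of the technique, with illustrative rates, rather than any vendor's actual throttling implementation.

```python
# Sketch of bucket-level flow control via a token bucket: each sub-service
# draws from its own limiter, capping sustained bandwidth while allowing
# short bursts. (Illustrative rates.)
import time

class TokenBucket:
    def __init__(self, rate_mbps: float, burst_mb: float):
        self.rate = rate_mbps        # refill rate, MB per second
        self.capacity = burst_mb     # maximum burst size, MB
        self.tokens = burst_mb
        self.last = time.monotonic()

    def try_consume(self, mb: float) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity:
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= mb:
            self.tokens -= mb
            return True
        return False                 # request throttled

limiter = TokenBucket(rate_mbps=100, burst_mb=50)
print(limiter.try_consume(40))   # True: within the burst allowance
print(limiter.try_consume(40))   # False: burst exhausted, must wait for refill
```

Giving each sub-service its own limiter means one misbehaving service exhausts only its own tokens, never its neighbors' bandwidth.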

• Provide performance specifications per unit of storage density

Offer differentiated per-unit-density performance through different storage classes: for example, a high-performance class provides a certain bandwidth per TB of capacity, with stronger bandwidth specifications priced higher. Applications can then choose a storage class according to their actual performance requirements.

• Ensure stable, predictable latency and OPS

Data access latency fluctuates. Even when latency is low most of the time, a small amount of long-tail high latency is enough to make a cloud native application's running time unpredictable, forcing designs around the worst-case tail. All object storage classes should therefore provide stable latency and OPS rather than chase the lowest possible latency.
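The long-tail effect is easy to see numerically: a handful of slow requests barely moves the mean but completely determines the p99 that the application must plan around. A small sketch with invented latency samples:

```python
# Sketch: why the tail, not the average, sets the worst case an
# application must design for. Latency samples below are invented.

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 98 fast requests at 10 ms, plus 2 long-tail requests at 500 ms:
latencies_ms = [10] * 98 + [500, 500]
print(sum(latencies_ms) / len(latencies_ms))   # mean = 19.8 ms
print(percentile(latencies_ms, 99))            # p99  = 500 ms
```

A storage class that trades a slightly higher median for a bounded tail gives the application a usable planning number; an occasionally-fast, occasionally-terrible class does not.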

• Hot data performance acceleration

Container batch-startup storms and parallel computing frameworks repeatedly read the same hot data. An object storage hotspot accelerator deployed near the container availability zone (AZ) improves container loading speed and helps parallel computing tasks finish quickly.
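The accelerator's core idea is a cache close to the containers that absorbs repeated reads. The sketch below uses a plain LRU cache with an invented backend fetch to illustrate the hit/miss behavior; it is not a real accelerator product.

```python
# Minimal sketch of a hot-data accelerator: an LRU cache in front of the
# object store absorbs repeated reads of the same hot objects, as happens
# during container batch startups. `fetch` is a stand-in for an object GET.
from collections import OrderedDict

class HotDataCache:
    def __init__(self, capacity: int, fetch):
        self.capacity = capacity
        self.fetch = fetch            # backend read on a cache miss
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key: str):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.fetch(key)              # slow path: hit the object store
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return value

cache = HotDataCache(capacity=2, fetch=lambda k: f"data:{k}")
for key in ["img-a", "img-a", "img-b", "img-a"]:
    cache.get(key)
print(cache.hits, cache.misses)   # 2 hits, 2 misses
```

During a startup storm the same few image layers are requested thousands of times, so the hit rate approaches 100% after the first wave and the object store behind the cache sees only the misses.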

3. How to Build a Good Data Lake under Cloud Native

A data lake is an effective means of storing and utilizing big data. As shown in the figure below, to better serve cloud native applications, the data lake needs, beyond stronger reliability, availability, and elastic scaling, stronger security and isolation and richer interfaces and functions for computing engines. To exploit the fine-grained container runtime of cloud native, it must also deliver per-unit-density performance with predictable bandwidth, latency, OPS, and acceleration, so that microservices can model their data access duration, apply for and release containers precisely, and optimize cost.

In addition to these changes, the following points also deserve attention:

• Interface adaptation for cloud native computing on the data lake

The cloud native concept has been widely accepted, and data analysis and computing engines built on cloud native architectures are flourishing. A data lake based on object storage should support a rich set of engines, including established ones (such as the Hadoop ecosystem) and newer ones (such as Spark, Iceberg, Hudi, and Delta Lake). Supporting these engines requires adapting the data access interface, especially the HDFS interface that the Hadoop ecosystem has historically used.
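At its simplest, the adaptation boils down to translating HDFS-style paths into bucket/key addressing. The toy function below illustrates only that mapping step; real connectors such as Hadoop's s3a adapter also translate the full filesystem API (listing, renames, permissions), and the path layout here is an assumption for the example.

```python
# Illustrative sketch of interface adaptation: mapping an HDFS-style path
# to a (bucket, key) pair so Hadoop-ecosystem engines can address objects.
# Real adapters (e.g. Hadoop's s3a connector) do far more than this.
from urllib.parse import urlparse

def hdfs_path_to_object(path: str) -> tuple:
    """Map 'hdfs://<bucket>/<key>' to (bucket, key)."""
    parsed = urlparse(path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected hdfs:// path, got {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")

print(hdfs_path_to_object("hdfs://warehouse/db/events/part-0001.parquet"))
# ('warehouse', 'db/events/part-0001.parquet')
```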

• Observable operation and maintenance

Because cloud native uses a microservice architecture, an application is usually composed of multiple microservices, which simplifies deployment. But as data links multiply, troubleshooting, performance analysis, and measuring system operations all become harder. Cloud native therefore requires every module of the architecture to be observable, and object storage, as the data layer, must likewise expose Log/Metric/Trace capabilities that cloud native applications can integrate.
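The Log/Metric/Trace triplet for a single object request can be pictured as one structured record. The field names below are illustrative assumptions, not a specific vendor's schema.

```python
# Sketch of the Log/Metric/Trace triplet for one object request as a
# single structured record a cloud native stack could ingest.
# Field names are illustrative, not a vendor schema.
import json
import time
import uuid

def request_record(bucket: str, op: str, latency_ms: float, status: int) -> dict:
    return {
        "trace_id": uuid.uuid4().hex,   # trace: correlates spans across services
        "timestamp": time.time(),       # log: when it happened
        "bucket": bucket, "op": op,     # log: what happened, and where
        "latency_ms": latency_ms,       # metric: how long it took
        "status": status,               # metric: success / error code
    }

rec = request_record("corp-data", "GetObject", 12.4, 200)
print(json.dumps({k: rec[k] for k in ("bucket", "op", "status")}))
```

Because the trace ID travels with the request, a slow end-to-end call can be attributed to the storage hop (or ruled out) without guesswork.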

• Orchestratable and schedulable resources

Cloud native achieves flexible deployment and management of applications through orchestration and scheduling. Object storage needs to offer interfaces for orchestration and scheduling and integrate into the cloud native platform, so that storage resources are quickly available on demand.

Cloud native has fully unleashed the dividends of cloud computing. In the future, more business applications will be born in the cloud and grow on the cloud. As cloud native architectures expand into more fields, new needs will keep springing up, and against this background the data lake will continue to evolve to better meet real business needs.
