What Is CloudOps

DevOps and the cloud go hand in hand, but we have found that many companies that move DevOps to the cloud do not take full advantage of the cloud's benefits. To better leverage the agility of both DevOps and the cloud, the industry needs a more mature and systematic philosophy. We therefore propose a new idea: CloudOps (automated operation and maintenance on the cloud).
CloudOps is an extension of traditional IT operation and maintenance and of DevOps. It evolves operation and maintenance once more through cloud-native architecture, helping enterprises reduce IT operation and maintenance costs, improve delivery speed, system flexibility, and agility, enhance system reliability, and build a more secure and reliable open business platform. CloudOps is not simply Cloud + DevOps or DevOps on Cloud; it requires an organic combination of DevOps and the cloud to gain greater value.

First. The next stop in DevOps evolution: CloudOps

In the move from the traditional model, where R&D hands off to operation and maintenance, to DevOps, efficiency has improved greatly at every level, from organizational culture to application delivery and deployment.

Today, as more and more enterprises use cloud resources and delegate responsibility for infrastructure operation and maintenance to cloud vendors, we believe a new era has arrived: cloud-centric DevOps, which will redefine DevOps. By fully combining the advantages and capabilities of cloud computing and DevOps, we define a new term, CloudOps, which emphasizes how to better practice DevOps on the cloud platform and evolve operation and maintenance once again.

Second. CloudOps Maturity Model

The report shows that almost all enterprises now fully recognize the products, services, and capabilities that the public cloud brings, and most of them already practice DevOps in the public cloud, though not all have realized its full potential.
We believe that the cloud needs to be properly managed to achieve the best performance and benefits. To that end, the cloud provides many automation and self-service capabilities to help enterprises. When practicing CloudOps, we need to think about the following questions:
(1) The cloud provides a large number of automation tools and self-service capabilities. How can these tools be used to better achieve automation?
(2) The cloud platform naturally provides ample elasticity. How should that elasticity be used?
(3) How should high reliability and availability be achieved on the cloud?
(4) The challenges of network management, security, and auditability on the cloud are far greater than those in offline environments. How should they be managed?
(5) Poorly managed cloud resources, for example with badly designed thresholds or no quantitative resource management, bring huge waste. How should they be optimized?

Combining the challenges above, we summarize five dimensions for building and measuring CloudOps:

1. Automation capability
Automation is one of the core capabilities of DevOps, and it is equally a core capability of the cloud. To improve automation and programmability, cloud platforms expose a large number of open APIs and provide many automation products and capabilities. By relying on the automation capabilities of the cloud platform, enterprises can reduce their need to hire more DevOps experts.
The main automation capabilities provided by the cloud platform fall into three major parts:
The first is the Infrastructure as Code (IaC) capability. With IaC tools and OpenAPI, repeatable deployments and version management of deployment scripts can be automated quickly. Standardized policies are used wherever possible to reduce differences between environments, and application delivery and operations can be audited. To better support automation, Alibaba Cloud offers several ways to orchestrate basic resources, such as resource orchestration and Terraform.
The second is Pipeline (Ops) as Code. Once basic resources and applications have been delivered, daily operation and maintenance mainly means operating on existing resources. As more and more tasks are automated, operation and maintenance tasks become more complex. Complex tasks need to be broken down, and operation and maintenance automated by combining more atomic tasks, so more and more enterprises are adopting Pipeline (Ops) as Code. Clearly laying out the context in which tasks run makes dependencies visible; making each job unit atomic lets unit tasks complete efficiently and reduces the complexity of any single task; and abstracting tasks makes functions easier to maintain and extend.
The third is a set of auxiliary capabilities. Beyond the infrastructure automation and automated operation and maintenance of basic resources described above, the cloud platform makes a large number of resources programmable and exposes many other capabilities through OpenAPI to manage the full life cycle of resources. As business systems grow more complex, the platform needs to expose still more capabilities. For example, changes to underlying resources are sent in real time through the event system to improve transparency, and more metrics are exposed through the monitoring system. When an application has problems, simple self-diagnosis services can shorten the time needed to find the problem, and the problem can even be repaired with one click through Cloud Assistant, our operation and maintenance channel.
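The idea of composing an operation and maintenance job from atomic tasks with explicit dependencies can be sketched in a few lines of Python. This is only an illustration, not any cloud product's API: the function `run_pipeline`, the task names, and the patching scenario are all hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run each atomic task once, after everything it depends on.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of task names it needs.
    Returns the task names in the order they were executed.
    """
    for name, needs in dependencies.items():
        unknown = ({name} | set(needs)) - set(tasks)
        if unknown:
            raise ValueError(f"unknown task(s): {sorted(unknown)}")

    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    executed = []
    # static_order() raises graphlib.CycleError if the dependencies loop.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()  # each unit does one small, well-defined job
        executed.append(name)
    return executed


# Hypothetical job: patch a server, split into atomic steps.
log = []
tasks = {
    "snapshot_disk": lambda: log.append("snapshot taken"),
    "stop_service": lambda: log.append("service stopped"),
    "apply_patch": lambda: log.append("patch applied"),
    "start_service": lambda: log.append("service started"),
}
dependencies = {
    "stop_service": {"snapshot_disk"},
    "apply_patch": {"stop_service"},
    "start_service": {"apply_patch"},
}
executed = run_pipeline(tasks, dependencies)
```

Because the dependencies are data rather than control flow, they can be visualized, version-controlled, and extended with new atomic steps without touching the existing ones.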

2. Elasticity
Elasticity is one of the most important capabilities of cloud computing. Drawing on very large resource pools, the cloud can supply the resources a workload needs within minutes and meet the elasticity requirements of scenarios at different scales. Flexible elasticity helps enterprises reduce costs and improve availability, and using it on the cloud improves the overall flexibility and stability of an enterprise's business.
Depending on business requirements, elasticity can be divided into two directions: vertical elasticity and horizontal elasticity.
Vertical elasticity suits scenarios where applications cannot be scaled horizontally. In common cases such as monolithic applications, standalone applications, and stateful applications, the instance configuration must be upgraded or downgraded quickly to cope with business changes.
Horizontal elasticity is better suited to distributed and stateless applications. Thousands of computing resources can be scaled in minutes through the console, the API, and our automation tools.
To reduce the cost of using elastic scaling, the service supports automatic resource scaling in several configurable modes and can even predict resource demand intelligently based on historical records.
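As a rough illustration of how an automatic horizontal scaling mode might decide on a new instance count, the sketch below uses a simple target-tracking rule: scale the current count in proportion to how far an observed metric is from its target, then clamp the result to configured bounds. The function name, parameters, and the rule itself are assumptions chosen for illustration; the source does not describe the actual algorithm used by any elastic scaling product.

```python
import math


def desired_instances(current, observed_metric, target_metric, min_size, max_size):
    """Target-tracking sketch: keep the per-instance metric near its target.

    current: number of instances running now (must be positive).
    observed_metric: average metric per instance, e.g. CPU utilization in %.
    target_metric: the value we would like that average to be.
    min_size / max_size: bounds of the scaling group.
    """
    if current <= 0 or target_metric <= 0:
        raise ValueError("current and target_metric must be positive")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")

    # Proportional rule, rounded up so we never under-provision.
    raw = math.ceil(current * observed_metric / target_metric)
    return max(min_size, min(max_size, raw))


# Hypothetical group of 4 instances at 90% CPU with a 60% target.
scale_out = desired_instances(4, 90.0, 60.0, min_size=2, max_size=20)
# The same group when load drops to 20% CPU across 10 instances.
scale_in = desired_instances(10, 20.0, 60.0, min_size=2, max_size=20)
```

Rounding up and clamping are deliberate choices here: the first favors availability over cost, and the second keeps a misbehaving metric from scaling the group to zero or without limit.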

3. Reliability
The cloud platform provides capabilities for building reliability at multiple levels: data centers, hardware, data, and self-service.
With very large data centers and support for multiple availability zones, cloud computing lets users quickly build highly available solutions on the cloud, such as intra-city and remote disaster recovery, at low cost and with high scalability and reliability. When planning and deploying applications, the design and deployment of a disaster recovery architecture should be given priority to improve reliability.
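The benefit of deploying across several availability zones can be estimated with a simple calculation. The sketch below assumes, and this is a strong assumption that rarely holds exactly in practice, that zones fail independently and that the application stays up as long as at least one zone is up. The function name and the 99% figure are hypothetical.

```python
def combined_availability(zone_availability, zones):
    """Availability of a service replicated across independent zones.

    Assumes zone failures are independent and one healthy zone is enough,
    so the service is down only when every zone is down at once.
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return 1.0 - (1.0 - zone_availability) ** zones


# Hypothetical single-zone availability of 99%.
single_zone = combined_availability(0.99, 1)
two_zones = combined_availability(0.99, 2)
```

Under these assumptions a second zone turns roughly 1% expected downtime into roughly 0.01%, which is why disaster recovery architecture is worth designing early.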
In terms of data reliability, the scale of cloud platforms is also a natural advantage. This shows not only in multi-copy storage and SLA guarantees of extremely high data reliability, but also in the OpenAPI that the cloud platform exposes to users as a service. Users can use the snapshot and image capabilities provided by cloud vendors to build data backup and disaster recovery capabilities.
In recent years, observability has received a great deal of attention in DevOps. To support user needs at different levels, cloud platforms usually provide the following monitoring services: cloud resource monitoring, application-layer APM, and monitoring of the user's business layer.
In addition to fault tolerance in infrastructure and data, cloud service providers usually provide fault tolerance for application services to help users build resilient, fault-tolerant distributed systems. For example, users can run network disconnection drills using security groups, and with AHAS (Application High Availability Service) they can implement automatic traffic control, service degradation, and plan execution for applications through traffic protection, fault drills, multi-active disaster recovery, and switch plans.

4. Security and compliance capabilities
According to the Flexera 2021 State of the Cloud Report, 81% of enterprises name cloud security as a concern, ranking it first, and 75% are very concerned about cloud compliance. Security and compliance are therefore top priorities on the cloud.
Cloud platforms provide numerous policies, controls, and technologies that work together to help users secure data, infrastructure, and applications, and to protect cloud computing environments from external and internal cybersecurity threats and vulnerabilities. In terms of security and compliance, the cloud platform is responsible for the security, trustworthiness, and auditability of its infrastructure and products, including identity and access management, monitoring, and operation, so that it can provide customers with highly available and highly secure cloud services. Customers, in turn, need to properly configure and use the capabilities of the platform and its products to build their own cloud applications.
The network is the only entrance to all cloud services, and network attacks are among the most diverse, most harmful, and most difficult risks to protect against. The cloud computing platform provides a mature network security architecture to deal with threats from the Internet. Communication and isolation within the intranet are enforced through security groups, subnet ACLs, and routing policies, while the cloud firewall, application firewall, and DDoS protection provided by the cloud security center protect the system's network security.
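To make the isolation role of security groups concrete, here is a small sketch of how a rule set might be evaluated for an incoming connection. It is a simplified model, not the behavior of any specific cloud product: the `Rule` fields, the "lowest priority number wins" ordering, and the default-deny fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str    # "allow" or "deny"
    cidr: str      # source address range, e.g. "10.0.0.0/8"
    port_min: int  # first port in the range (inclusive)
    port_max: int  # last port in the range (inclusive)
    priority: int  # assumption: lower number is evaluated first


def is_allowed(rules, source_ip, port):
    """Return True if the first matching rule allows the connection.

    Traffic that matches no rule is denied (default deny).
    """
    addr = ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_range = addr in ip_network(rule.cidr)
        if in_range and rule.port_min <= port <= rule.port_max:
            return rule.action == "allow"
    return False


# Hypothetical rule set: SSH only from the intranet, HTTPS from anywhere.
rules = [
    Rule("allow", "10.0.0.0/8", 22, 22, priority=1),
    Rule("deny", "0.0.0.0/0", 22, 22, priority=2),
    Rule("allow", "0.0.0.0/0", 443, 443, priority=3),
]
intranet_ssh = is_allowed(rules, "10.1.2.3", 22)
internet_ssh = is_allowed(rules, "203.0.113.7", 22)
internet_https = is_allowed(rules, "203.0.113.7", 443)
```

Keeping the default at deny means that a forgotten rule results in blocked traffic rather than an exposed port, which is the safer failure mode.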
Operational auditing and tracing are an important part of the security life cycle. They identify potential security misconfigurations, threats, or unexpected behavior, and they support quality processes, legal and compliance obligations, and threat identification and response. Auditing and change tracking are provided through a log-based auditing service, which makes it easy to quickly trace the scope and source of a change.
Traditional operation and maintenance channels rely on SSH, which requires managing keys and opening the corresponding network ports. Improper key management and exposed network ports bring great security risks to cloud resources. Cloud Assistant, Alibaba Cloud's native automated O&M channel on the cloud, helps customers operate and maintain cloud resources safely and efficiently.

5. Quantitative management of costs and resources
One of the biggest differences between cloud services and an IDC is that customers use resources instead of holding assets. Resources can be created and released quickly on the cloud, and the cost of use can be much lower than in an IDC. According to the same Flexera 2021 State of the Cloud Report, the second-biggest concern of cloud customers is cloud cost and its management. Taking a cloud server as an example, its resource cost consists mainly of three parts: computing, storage, and network. On the cloud, the billing method directly determines the price of resources, so choosing an appropriate billing method saves costs directly. For example, choosing preemptible instances can save up to 90% of the cost compared to pay-as-you-go billing. Different products also offer a rich range of specifications and billing methods, and selecting the appropriate specification can effectively reduce resource costs. Improving resource utilization can also save a lot of money.
To achieve cost optimization and quantitative resource management, we also provide a series of products, covering cost analysis, resource optimization, resource specification, resource usage insight, and automation tools, to help enterprises reduce unnecessary spending on cloud resources.
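The effect of the billing method on cost is simple arithmetic, sketched below. Only the "up to 90%" discount comes from the text above; the hourly price, the number of hours, and the function name are hypothetical values chosen for illustration, and real preemptible pricing varies and carries the risk of the instance being reclaimed.

```python
def monthly_cost(hourly_price, hours, discount=0.0):
    """Cost of running one instance for `hours` at a discounted hourly price.

    discount is a fraction between 0 and 1, e.g. 0.90 for 90% off.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    if hourly_price < 0 or hours < 0:
        raise ValueError("hourly_price and hours must be non-negative")
    return hourly_price * hours * (1.0 - discount)


# Hypothetical figures: a 0.10 USD/hour instance running a 720-hour month.
PAYG_HOURLY = 0.10
HOURS = 720

pay_as_you_go = monthly_cost(PAYG_HOURLY, HOURS)
preemptible_best_case = monthly_cost(PAYG_HOURLY, HOURS, discount=0.90)
savings = pay_as_you_go - preemptible_best_case
```

With these assumed figures the monthly bill falls from about 72 USD to about 7.20 USD in the best case, which shows why the billing method deserves attention before any other optimization.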
