CloudOps makes operation and maintenance easier

1. Pain Points in DevOps Implementation Practice

It has been 12 years since DevOps was proposed, and many enterprises have begun to practice it with great success. However, enterprises encounter different challenges at each stage of implementation:

◾ Before the DevOps transformation: many enterprises find that they lack DevOps experts; the initial investment is heavy and requires organizational change and adjustment; and internal tooling is weak, so that as the business grows, many DevOps tools can no longer meet the enterprise's needs.

◾ During DevOps practice, the focus shifts: in terms of organizational effectiveness, more attention is paid to efficient and agile delivery; in terms of architecture design, the focus is on clarifying the dependencies between components, delivering applications quickly, and performing cross-region or multi-active migration; in terms of self-service, more and more enterprises are adopting it. According to Gartner's "China DevOps Research Report (2021)", by 2025 75% of large enterprises will regard self-service as the most important trend in DevOps applications.

◾ As DevOps evolves, more and more enterprises are adopting intelligent decision-making capabilities, including evaluating the maturity of their own DevOps capabilities.

2. DevOps in Cloud Trends

As enterprises move to the cloud, more and more of them have begun to practice DevOps on the public cloud. This requires transforming and adapting applications for the cloud, and combining cloud-native tools with task-flow orchestration to improve delivery efficiency.

While practicing DevOps on the cloud, many enterprises have completed the move to microservice architectures and the upgrade to distributed applications. At the same time, service governance is becoming increasingly mature. However, the proliferation of applications and the growing complexity of dependencies that come with these architectures also pose great challenges to application observability and system stability.

During the cloud transformation of DevOps, many enterprises have also made a service-oriented transformation of their own monolithic applications. Almost all enterprises believe that open APIs and as-a-service offerings are the core of their competitiveness in openness and service.

3. CloudOps: A New Trend in Cloud Operation and Maintenance

Based on the above trends of DevOps in the cloud, Alibaba Cloud elastic computing has defined a model for CloudOps. The combined advantages of DevOps and the cloud can be seen along four dimensions: cost, delivery speed, flexibility, and system reliability:

◾ Cost reduction: DevOps can significantly reduce costs through improved organizational effectiveness and the construction of digital tools, while the cloud reduces resource and labor costs through on-demand elasticity and a choice of resource types and payment methods.

◾ Delivery efficiency: DevOps brings CI/CD, while the cloud delivers resources in seconds or minutes.

◾ Flexibility: Users have increasingly demanding requirements for application R&D and launch cycles, such as delivering an app within 7 days, from zero to launch in the app store; the cloud helps customers achieve rapid delivery of diverse infrastructure resources.

◾ Reliability: DevOps puts automation into practice, while the cloud natively provides highly available infrastructure.

From the high availability of applications to the high availability of technical resources, as well as system monitoring and insight, DevOps and the cloud are a very good combination. A new concept, CloudOps, has therefore been proposed on the cloud, fully combining the advantages of the cloud and DevOps to achieve a 1+1>2 effect.

02 Application-centric automated operation and maintenance

The core concept of CloudOps is to be application-centric, because applications are what matter most to customers.

Over the entire lifecycle of an application, from construction to delivery, customers' concerns change. First, they care about how to build and deliver the application in an automated, agile way. After delivery, they pay attention to the reliability of the system; one strategy that quickly improves availability is elasticity, and combining elasticity with high-availability solutions completes the upgrade of the system architecture. Once the application is online, customers gradually turn to security, compliance, and auditing. When the application grows larger, customers focus on cost, which closes a cycle of continuous iteration and upgrading.

1. Trilogy of Application Automation

Automation is the foundation of system upgrading and transformation. Application automation includes several major parts, the most important of which are: infrastructure automation, operation and maintenance automation, and service automation.

1. Infrastructure automation: In the past year, Alibaba Cloud has released many products to simplify infrastructure automation. Many enterprises have begun to implement automation, but the problem is that customers have to run the automation templates themselves. Today, these templates can be handed directly to our engine and executed without any modification. At the same time, more and more enterprises are reluctant to define their infrastructure in JSON or YAML, and our new product ROS CDK, released today, solves this problem well.

In addition, to simplify automated delivery, we provide resource migration tools and automated image building. Customers can build an ECS image just as they would build a container image. We also define image families, which let users automatically select the latest image version, as with container images, without updating configuration files.
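The image family idea can be illustrated with a small sketch. It assumes a hypothetical in-memory list of image records; the `Image` class, its fields, and `latest_in_family` are illustrative names and not the ECS API. The point is only that a deployment refers to a family name and the newest available image is resolved at launch time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Image:
    image_id: str          # hypothetical identifier
    family: str            # family name that deployments refer to
    created: date
    available: bool = True


def latest_in_family(images: List[Image], family: str) -> Optional[Image]:
    """Return the newest available image in the family, or None if there is none."""
    candidates = [i for i in images if i.family == family and i.available]
    return max(candidates, key=lambda i: i.created, default=None)


images = [
    Image("img-web-001", "web-base", date(2021, 3, 1)),
    Image("img-web-002", "web-base", date(2021, 6, 1)),
    Image("img-web-003", "web-base", date(2021, 9, 1), available=False),
    Image("img-db-001", "db-base", date(2021, 8, 1)),
]

chosen = latest_in_family(images, "web-base")
print(chosen.image_id if chosen else "no image available")
```

Because the configuration names only the family, publishing a newer image changes what is launched without any configuration file being edited.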

2. Operation and maintenance automation: Our operation and maintenance orchestration service, OOS, has opened a task market in which a large number of accumulated best practices and tools are released for free, so that users can integrate and use them. To make it convenient to build and associate multiple applications, we have also released application management.

3. Service automation: Our main direction of effort has always been to enable customers to discover, troubleshoot, and solve problems by themselves.

2. New product: ROS Resource Migration

The first product is ROS Resource Migration. Many people believe that IaC (Infrastructure as Code) is a very good idea, but in practice the challenges are considerable. First, writing an IaC template is difficult: it requires a lot of complex domain knowledge and an understanding of the template language. Second, once the template is written, it has to be updated continuously as the application architecture evolves so that it reflects the latest infrastructure.

To solve this problem, Alibaba Cloud provides a new solution. Users tag their resources with Alibaba Cloud's tagging feature; ROS then automatically analyzes the dependencies of the tagged resources and builds a set of IaC templates. In other words, users do not need to understand IaC or write JSON or YAML at all, because the templates are generated automatically. Once a template is generated, users can easily deploy it across multiple availability zones, and even multiple accounts and regions, which greatly reduces the complexity that building a set of infrastructure templates used to involve. Intelligent template configuration and definition also help ensure a high success rate for template deployment.
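The tag-to-template flow can be sketched as follows. This is a minimal illustration under stated assumptions, not how ROS Resource Migration is implemented: the `inventory` dictionary, its `refs` field, and the functions `collect` and `build_template` are hypothetical, and resource properties are omitted. It selects the resources carrying a tag, follows their references so that dependencies are included, and emits a template skeleton in the ROS template format.

```python
import json

# Hypothetical inventory: logical name -> resource type, tags, and referenced resources.
inventory = {
    "vpc": {"type": "ALIYUN::ECS::VPC", "tags": {"app": "shop"}, "refs": []},
    "vswitch": {"type": "ALIYUN::ECS::VSwitch", "tags": {}, "refs": ["vpc"]},
    "web": {"type": "ALIYUN::ECS::Instance", "tags": {"app": "shop"}, "refs": ["vswitch"]},
    "other": {"type": "ALIYUN::ECS::Instance", "tags": {"app": "blog"}, "refs": []},
}


def collect(inventory, tag_key, tag_value):
    """Pick the tagged resources, then pull in everything they depend on."""
    selected = set()
    stack = [n for n, r in inventory.items() if r["tags"].get(tag_key) == tag_value]
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        selected.add(name)
        stack.extend(inventory[name]["refs"])
    return selected


def build_template(inventory, tag_key, tag_value):
    """Emit a template skeleton (properties omitted) for the selected resources."""
    resources = {}
    for name in sorted(collect(inventory, tag_key, tag_value)):
        entry = {"Type": inventory[name]["type"]}
        if inventory[name]["refs"]:
            entry["DependsOn"] = sorted(inventory[name]["refs"])
        resources[name] = entry
    return {"ROSTemplateFormatVersion": "2015-09-01", "Resources": resources}


print(json.dumps(build_template(inventory, "app", "shop"), indent=2))
```

Note that `vswitch` carries no tag but still ends up in the template because `web` depends on it, which is the kind of dependency analysis the text describes.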

3. New capability: ROS Cloud Development Kit (ROS CDK)

In recent years, we have found that many enterprises are eager to embrace CloudOps but do not like JSON and YAML. Alibaba Cloud has therefore released a new capability this year: the ROS Cloud Development Kit (ROS CDK).

With ROS CDK, users write code in a high-level language (such as Java or Python), much like writing a script, to generate ROS templates, and the infrastructure is then created from those templates. In short, you can choose your own development language and a familiar programming model to implement Infrastructure as Code efficiently.
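The idea of generating a template from ordinary code can be sketched in plain Python. The `Stack` class below is a hypothetical stand-in written for this illustration and is not the ROS CDK API; the zone IDs are placeholders, and the resource types and property names should be checked against the ROS resource reference before use.

```python
import json


class Stack:
    """Hypothetical stand-in for a CDK-style stack; not the ROS CDK API."""

    def __init__(self, description):
        self.description = description
        self.resources = {}

    def add(self, logical_id, resource_type, **properties):
        """Register a resource and return a reference other resources can use."""
        self.resources[logical_id] = {"Type": resource_type, "Properties": properties}
        return {"Ref": logical_id}

    def synth(self):
        """Produce the template as a plain dictionary."""
        return {
            "ROSTemplateFormatVersion": "2015-09-01",
            "Description": self.description,
            "Resources": self.resources,
        }


stack = Stack("Two-zone network sketch")
vpc = stack.add("Vpc", "ALIYUN::ECS::VPC", CidrBlock="10.0.0.0/16")

# A loop replaces what would be repeated JSON or YAML blocks.
for index, zone in enumerate(["zone-a", "zone-b"]):  # placeholder zone IDs
    stack.add(
        f"VSwitch{index}",
        "ALIYUN::ECS::VSwitch",
        VpcId=vpc,
        ZoneId=zone,
        CidrBlock=f"10.0.{index}.0/24",
    )

print(json.dumps(stack.synth(), indent=2))
```

The benefit shown here is the one the text claims: loops, variables, and functions from a familiar language take the place of hand-written template repetition.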

4. New tool: application management

To simplify application construction, Alibaba Cloud has released application management. It is very simple to use: selecting a tag or importing existing resources is enough to build an application quickly. The application view spans multiple products and helps users with automated O&M, monitoring, publishing, and CI/CD, which greatly simplifies the entire O&M process and reduces costs.

In addition, the biggest challenge in running applications is upgrading them, including patch management and operating system configuration management. We help users carry out these tasks by application group, which greatly lowers the barrier to managing applications.

◾ Application reliability: After an application is built, the biggest challenge is reliability. Alibaba Cloud provides strong reliability capabilities at the infrastructure level, such as multi-region and multi-availability-zone deployment.

◾ Elasticity and fault tolerance: We have built intelligent prediction, which dynamically recommends the required resources based on users' past usage of and operations on those resources. For transparency, we have also opened the ECS event system, which can simulate infrastructure faults such as a physical machine outage or a disk I/O hang for fault-tolerance drills. In addition, application high availability services support traffic protection, fault drills, and more, greatly improving fault tolerance across systems.

◾ Observability: Products such as cloud monitoring, SLS, ARMS, and Xtrace provide full-link observation, from basic resources to applications to logs, to ensure system reliability.

◾ Data backup and recovery: We provide an extremely fast snapshot capability that creates a snapshot in seconds. Users can therefore make operational changes safely, without waiting a long time for a snapshot as before. Because snapshots incur costs, we have also created a new service, the snapshot retention cycle, which automatically archives or deletes unused snapshots to reduce those costs.
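A retention cycle of this kind can be sketched as a simple age-based policy. The thresholds, the `in_use` flag, and the `plan_retention` function are assumptions made for illustration; the source does not describe the actual rules of the service.

```python
from datetime import date


def plan_retention(snapshots, today, archive_after_days=30, delete_after_days=90):
    """Sort snapshots into keep / archive / delete buckets by age and usage."""
    plan = {"keep": [], "archive": [], "delete": []}
    for snap in snapshots:
        age_days = (today - snap["created"]).days
        if snap["in_use"] or age_days < archive_after_days:
            action = "keep"      # recent or still referenced
        elif age_days < delete_after_days:
            action = "archive"   # unused and moderately old
        else:
            action = "delete"    # unused and past the retention window
        plan[action].append(snap["id"])
    return plan


snapshots = [
    {"id": "snap-a", "created": date(2021, 11, 20), "in_use": False},
    {"id": "snap-b", "created": date(2021, 10, 1), "in_use": False},
    {"id": "snap-c", "created": date(2021, 6, 1), "in_use": False},
    {"id": "snap-d", "created": date(2021, 1, 1), "in_use": True},
]

print(plan_retention(snapshots, today=date(2021, 12, 1)))
```

Snapshots that are still in use are kept regardless of age, so the policy only ever reclaims storage that nothing depends on.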

5. Building security and compliance capabilities

Security and compliance are also basic capabilities of Alibaba Cloud and elastic computing. In addition to platform fundamentals (such as network security and system auditing) and application security, we now provide more capabilities.

When a user changes a security group and the change opens non-compliant ports, the system automatically warns the user, which helps monitor such unreasonable changes and avoid system risks. In application security, in addition to the cloud security center, the security of the operating system's control channel has also been our focus.
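A compliance check on security group changes might look like the following sketch. The policy (`ALLOWED_PUBLIC_PORTS`), the rule fields, and `find_violations` are hypothetical names chosen for illustration; the real service's rule format and checks are not described in this article.

```python
ALLOWED_PUBLIC_PORTS = {80, 443}  # assumed compliance policy


def find_violations(rules):
    """Return warnings for rules that expose non-approved ports to any source."""
    warnings = []
    for rule in rules:
        if rule["source_cidr"] != "0.0.0.0/0":
            continue  # only world-open rules are checked in this sketch
        first, last = rule["port_range"]
        exposed = [p for p in range(first, last + 1) if p not in ALLOWED_PUBLIC_PORTS]
        if exposed:
            warnings.append(f"{rule['id']}: ports {first}-{last} open to 0.0.0.0/0")
    return warnings


proposed_rules = [
    {"id": "r1", "port_range": (443, 443), "source_cidr": "0.0.0.0/0"},
    {"id": "r2", "port_range": (22, 22), "source_cidr": "0.0.0.0/0"},
    {"id": "r3", "port_range": (22, 22), "source_cidr": "10.0.0.0/8"},
]

for warning in find_violations(proposed_rules):
    print("WARNING", warning)
```

Running the check whenever a rule is added or modified turns a silent misconfiguration into an immediate warning.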

When operating and maintaining ECS, many people prefer to log in to the server over SSH or RDP. With the cloud assistant provided by Alibaba Cloud, we opened a basic API that works like a browser request and lets users perform host-side operations directly from a client. Many users reported that this was not as convenient or friendly as SSH, so we have released a new feature: Session Manager.

With Session Manager, you can control the host directly without a username and password, and integrate it into existing systems to complete keyless login, authentication, operations, and auditing.

In addition, this year we released another new feature: high-risk command interception. When a user executes a high-risk command, the command can be intercepted and the action added to the playback log. When users perform high-risk operations, the screen is recorded through the Workbench and the recording is transmitted to OSS, which greatly improves security and the reliability of the audit trail.
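The interception step can be sketched as a pattern check placed in front of command execution. The patterns, the log format, and `check_command` are assumptions made for this illustration and do not reflect the rules the actual feature uses; simple pattern matching like this is also easy to bypass and is shown only to make the flow concrete.

```python
import re

# Assumed examples of high-risk command patterns.
HIGH_RISK_PATTERNS = [
    re.compile(r"^\s*rm\s+(-\w+\s+)*/\s*$"),        # rm -rf /
    re.compile(r"^\s*mkfs(\.\w+)?\s"),               # formatting a filesystem
    re.compile(r"^\s*dd\s.*\bof=/dev/"),             # overwriting a block device
    re.compile(r"^\s*(shutdown|reboot|halt)\b"),     # stopping the machine
]


def check_command(command, audit_log):
    """Return True if the command may run; record every decision for auditing."""
    blocked = any(p.search(command) for p in HIGH_RISK_PATTERNS)
    audit_log.append({"command": command, "blocked": blocked})
    return not blocked


audit_log = []
for cmd in ["ls -l /var/log", "rm -rf /", "rm -rf /tmp/build", "dd if=/dev/zero of=/dev/vda"]:
    verdict = "allowed" if check_command(cmd, audit_log) else "intercepted"
    print(f"{verdict}: {cmd}")
```

Every decision is written to the log whether or not the command is blocked, which is what makes the channel auditable afterwards.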

From the application perspective, the user's biggest headache is determining how the configurations of two ECS instances differ, and why some machines have problems while others do not. Previously, this was very difficult to analyze. With the ECS instance configuration list, we take a snapshot of configuration information, such as the Windows registry and other settings, and then automatically analyze the snapshots to identify the differences between the two machines. Users can thus find the differences quickly, which greatly reduces troubleshooting time.
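The comparison step can be sketched as a diff over two flattened configuration snapshots. The snapshot keys and `diff_config` are hypothetical; the sketch assumes each snapshot has already been collected into a flat mapping of setting names to values.

```python
_MISSING = "<missing>"


def diff_config(left, right):
    """Return {key: (left_value, right_value)} for every setting that differs."""
    diffs = {}
    for key in sorted(set(left) | set(right)):
        left_value = left.get(key, _MISSING)
        right_value = right.get(key, _MISSING)
        if left_value != right_value:
            diffs[key] = (left_value, right_value)
    return diffs


# Hypothetical snapshots of a healthy and a faulty instance.
healthy = {"kernel": "5.10.60", "openssl": "1.1.1k", "sysctl.vm.swappiness": "10"}
faulty = {"kernel": "5.10.60", "openssl": "1.1.1g", "ntp.enabled": "false"}

for key, (a, b) in diff_config(healthy, faulty).items():
    print(f"{key}: healthy={a} faulty={b}")
```

Settings present on only one machine are reported as well, since a missing package or key is often the cause of the difference in behavior.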

We have long pursued centralized configuration management. We have released parameter management for ECS, which lets customers consolidate application parameters in the Parameter Store. It natively supports multiple products, such as resource orchestration, cloud assistant, and operation and maintenance orchestration, which avoids parameters being scattered and unmanaged across configurations. The Parameter Store also supports auditing of parameter usage.
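The combination of centralized parameters and auditing can be sketched with a small in-memory store. The `ParameterStore` class below, its methods, and the parameter names are hypothetical and are not the API of the Alibaba Cloud Parameter Store; the sketch only shows why routing all reads and writes through one place makes them auditable.

```python
class ParameterStore:
    """Hypothetical in-memory parameter store with versioning and an audit log."""

    def __init__(self):
        self._values = {}      # name -> (version, value)
        self.audit_log = []    # (action, name, version, user)

    def put(self, name, value, user):
        """Create or update a parameter and record who changed it."""
        version = self._values.get(name, (0, None))[0] + 1
        self._values[name] = (version, value)
        self.audit_log.append(("put", name, version, user))
        return version

    def get(self, name, user):
        """Read a parameter and record who read which version."""
        version, value = self._values[name]
        self.audit_log.append(("get", name, version, user))
        return value


store = ParameterStore()
store.put("shop/db/endpoint", "db.internal.example:3306", user="alice")
store.put("shop/db/endpoint", "db2.internal.example:3306", user="bob")

# Different tools would read the same name instead of keeping private copies.
print(store.get("shop/db/endpoint", user="deploy-job"))
print(store.audit_log)
```

Because each consumer looks the value up by name, changing the endpoint once updates every tool that uses it, and the log shows who changed or read it.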

Together, these new capabilities greatly reduce the complexity of operating and maintaining ECS, provide secure channels, and enable centralized configuration management.

03 CloudOps (Cloud Automated Operation and Maintenance) White Paper Release

1. DevOps in Cloud ≠ CloudOps

Is using DevOps on the cloud the same as CloudOps? Not necessarily. According to the latest DevOps report for 2021, only 20% of enterprises have fully utilized the advantages of DevOps on the cloud, because operating on the cloud differs significantly from operating off the cloud.

◾ First, there are differences in operation methods. Many free automated O&M and integration tools are available on the cloud, which can greatly reduce costs, but users have to integrate them with their existing tools.

◾ Second, there is the shift from assets to resources. What is managed as an asset off the cloud is managed as a resource on the cloud. For example, on the cloud it is common to release the original machine and launch a new one, so that configuration and application upgrades are completed without caring about the asset itself. This is the difference between operating on the cloud and off the cloud.

◾ Third, there are differences in unification and scale. The scale of the cloud is very large, and many machines can be started or released at any time. A mistake in operation may therefore bring relatively large costs or technical risks to the enterprise.

◾ Finally, the real-time requirements for security and auditing on the cloud are very high.

2. CloudOps Key Maturity Models and White Papers

We believe that CloudOps is not just about using DevOps on the cloud; it requires users to pay attention to the characteristics of the cloud. These characteristics can be summarized in five dimensions: automation, elasticity, reliability, security and compliance, and cost and resource quantification. We have broken down these five areas of DevOps on the cloud in detail and defined levels for each of them, forming the CloudOps maturity model.

Taking automation as an example, the popular goal today is unattended operation, and the CloudOps maturity model defines this as one of its levels. We hope this maturity model helps customers measure whether their DevOps practice on the cloud is mature enough and how they can improve it.

To help customers better understand the CloudOps maturity model, we have released a CloudOps white paper. It presents the CARES model, jointly written by more than 10 technical experts in Alibaba Cloud elastic computing, and shows how to find appropriate operation and maintenance methods and tools on the cloud from five aspects: cost management, automation, reliability, elastic capacity management, and security compliance.

3. Overview of the Alibaba Cloud CloudOps product family

Many people say that the essence of cloud computing is the automation of operation and maintenance. Over the past decade, Alibaba Cloud elastic computing has built many tools and invested heavily in simplifying operation and maintenance, aiming to comprehensively improve how DevOps performs on the cloud, and in the process has formed a complete CloudOps product family.

◾ In terms of cost management, cost optimization plans and a choice of payment modes can greatly reduce user costs.

◾ In terms of automated services, hosted O&M is provided, including O&M orchestration, patch management, configuration lists, and the Parameter Store.

◾ In terms of batch delivery, tools such as OpenAPI and elastic scaling are provided, which can greatly reduce the complexity of automated delivery.

◾ In terms of instance operation and maintenance channels, users can integrate in many ways, through our web console, the cloud assistant, or the most recently released tools, which greatly lowers the barrier to automated operation and maintenance.

◾ In terms of reliability, which is the focus of all cloud users, we have released application management capabilities.

◾ In terms of observability, self-help troubleshooting, and event services, a complete suite has also been released, and most services are free.

◾ In terms of security and compliance, which covers the security of the application environment and the convenience of compliance audits, we have integrated many products to improve our overall capabilities, helping customers identify and eliminate security and compliance risks in a timely manner.

From the early days of moving to the cloud to today's era of using and managing the cloud well, Alibaba Cloud elastic computing has been committed to providing customers with rich, secure, and convenient cloud O&M products and capabilities. In the future, we hope to work with everyone to build more efficient and intelligent cloud O&M.
