Hujiang's Container Orchestration-Based Dev/Ops Process - Alibaba Cloud Developer Community

[Editor's note] Our entire DevOps process is based on container orchestration, with the goal of simplifying the process and implementing automated CI/CD and automated O&M. It still has many rough edges and may not suit complex scenarios. This article describes the Dev/Ops process based on container orchestration at Hujiang. As the DevOps and SRE concepts grow in popularity, more and more developers and operators are abandoning the traditional development and deployment process and turning to the infinite-loop model shown in the following illustration. In my understanding, DevOps consists of three parts: Agile development, continuous integration and delivery (CI/CD), and automated O&M (ITSM). In the containerization era, how do we implement DevOps or SRE? Next, I will share the DevOps process of Hujiang's learning product line team, built on container orchestration.

Agile development

The greatest truths are the simplest. All our hard-won lessons tell us not to complicate simple things — in other words, do not use complex methods to handle simple matters. My understanding of agility boils down to "fast" and "micro". Fast means fast iteration, fast development, fast launch, and fast performance. Micro means microservices and micro (slim) images. Around these two points, we need to do the following in the development phase:

Splitting applications into microservices

This is a relatively large topic and will not be discussed here; if you are interested, please refer to my other articles. The point is simple: only a small application can be fast.

Slimming Docker images

To make Docker containers start and run quickly, you must first slim down the image. All our applications are developed in Java, so we take Java as the example and use jre-alpine as the base image. The following is a sample Dockerfile:
FROM java:8-jre-alpine
RUN apk --update add --no-cache tzdata
ENV TZ=Asia/Shanghai
RUN mkdir -p /app/log
COPY ./target/xxx.jar /app/xxx.jar
EXPOSE 9999
VOLUME ["/app/log"]
WORKDIR /app/
ENTRYPOINT ["java","-Xms2048m", "-Xmx2048m", "-Xss512k", "-jar","xxx.jar"]
CMD []

Images generated from this Dockerfile average only a little over 80 MB, and startup time is around 5 seconds. Although Alpine images reduce the size, some tool commands, such as curl, are missing; install them as needed. Another pitfall is the time zone: the time zone inside a Docker image is UTC, which is inconsistent with the host's UTC+8. You must therefore install the tzdata package and set TZ so that the time in the container matches the host's — this matters when writing timestamps to the database and when producing logs.

Include all environment configurations in the image

As early as the virtual machine era, we already accelerated deployment by using VM images that contained the dependencies. Why stop there? We can go one step further and include the service itself in the image — Docker achieves this in a much lighter way. The idea is an image that can run on any server with Docker installed, regardless of the host's operating system and environment. To borrow a phrase from Java: build once, run anywhere. Therefore, we also put the configuration files of all environments into the image under different file names, and select the environment's configuration file through a parameter when the container starts. Note that if the application is built on the Spring Framework, this feature is very easy to implement; if it is developed in another language, a certain amount of development work is required. This article assumes Java development throughout. After all development work is complete, the recommended project directory structure is as follows:
├── src
│   ├── main
│   │   ├── java
│   │   ├── resources
│   │   │   ├── application.yaml
│   │   │   ├── application-dev.yaml
│   │   │   ├── application-qa.yaml
│   │   │   ├── application-yz.yaml
│   │   │   ├── application-prod.yaml
│   │   │   ├── logback.xml
│   ├── test
├── scripts
│   ├── Dockerfile
│   ├── InitDB.sql
├── pom.xml 
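Spring Boot implements this selection natively: starting the jar with `--spring.profiles.active=qa` makes `application-qa.yaml` override the defaults in `application.yaml`. For teams working in other languages, the mechanism can be sketched roughly as follows — a simplified illustration only, where the config keys and values are made up and real configs would be parsed from the YAML files above:

```python
# Simplified sketch of profile-based configuration resolution,
# mimicking what Spring Boot does with application-{profile}.yaml.
# All keys and values below are illustrative, not from the article.

BASE_CONFIG = {            # application.yaml: defaults shared by all environments
    "server.port": 9999,
    "logging.level": "INFO",
}

PROFILE_CONFIGS = {        # application-{profile}.yaml: per-environment overrides
    "dev":  {"db.url": "jdbc:mysql://localhost/app", "logging.level": "DEBUG"},
    "qa":   {"db.url": "jdbc:mysql://qa-db/app"},
    "yz":   {"db.url": "jdbc:mysql://yz-db/app"},
    "prod": {"db.url": "jdbc:mysql://prod-db/app"},
}

def resolve_config(active_profile: str) -> dict:
    """Profile-specific values override base values, as in Spring Boot."""
    if active_profile not in PROFILE_CONFIGS:
        raise ValueError(f"unknown profile: {active_profile}")
    merged = dict(BASE_CONFIG)
    merged.update(PROFILE_CONFIGS[active_profile])
    return merged

# The same image serves every environment; only the startup
# parameter (e.g. --spring.profiles.active=qa) differs.
print(resolve_config("qa")["db.url"])
print(resolve_config("dev")["logging.level"])  # DEBUG overrides INFO
```

The key point is that the image is immutable across environments; only the startup parameter changes.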

Continuous integration and delivery

Automated continuous integration and delivery plays a critical role in the entire DevOps flow, serving as the bridge between development and O&M. Done poorly, it cannot support fast iteration and efficient O&M across a large number of microservices. In this phase we need to use tools flexibly and minimize manual involvement, while of course still focusing on "fast" and "micro". How do we reduce manual participation in CI/CD? The most desirable development process would be: tell the computer the desired feature, and the computer automatically writes the code, publishes it to the test environment, runs the test scripts, and goes online. Of course, automatic coding in the current era would first require inventing the robot cat (Doraemon). But as long as we have enough confidence in our tests, we can reach this state: on a hot afternoon, you casually submit your code, wander to the lounge for a cup of coffee, and by the time you come back, your code is already running in production. In the container age, we can quickly make this dream come true. The following figure shows the specific steps:

Gitflow and anti-Gitflow

The first step in continuous integration is code commit. VCS has evolved from CVS and SVN to today's Git, so naturally we have to talk about Gitflow. Whenever the well-known Gitflow comes up, everyone cites its advantages: it supports multiple teams, even developers in multiple countries, developing in parallel, and it reduces the probability of code conflicts or dirty code going online. The general process is as follows: Gitflow offers an elegant solution for complex teams struggling to manage code versions. It requires feature, develop, release, hotfix, and master branches to handle parallel development across different time periods. But is this really suitable for a co-located team of fewer than 20 people? Our development team has fewer than 6 people, each responsible for more than 3 microservices, so it is almost impossible to have two or more employees developing the same project in parallel. At the beginning, we followed the book and used the standard Gitflow process. Developers immediately found that they had to merge code back and forth across at least three branches, with no code conflicts at all (since each project had a single developer), which only reduced development efficiency. This made me realize that Gitflow may not suit the world of small-team microservices, and the idea of an anti-Gitflow mode took shape in my mind. I decided to slim Gitflow down and keep it simple: we reduced the 5 branches to 3. The Master branch only holds the latest online version; the Dev branch is the main development branch, and all images are generated from its code. The development process is as follows:
  • A developer checks out a new feature branch from the Dev branch and develops on that feature branch.
  • When development is complete, the feature branch is merged back into Dev; an image is built from the Dev branch code, deployed to the QA environment, and handed over to QA for testing.
  • If testing finds a bug, fix it on a new branch and repeat from step 2.
  • After the tests pass, merge Dev back into the Master branch.
In this way, code is merged only once — from the feature branch into Dev — which greatly improves development efficiency.

Use Jenkins Pipeline

Jenkins, as a veteran CI/CD tool, can help us automate code compilation, static code analysis, image building, test-environment deployment, smoke testing, production deployment, and other steps. Especially since Jenkins 2.0 introduced the concept of Pipeline, these steps flow together naturally, allowing us to complete the entire integration and release process starting from step 3 above. As the saying goes, to do a good job you must first sharpen your tools; install the following plug-ins on Jenkins first:
  1. Pipeline Plugin (installed by default in Jenkins 2.0 and later)
  2. Git
  3. SonarQube Scanner
  4. Docker Pipeline Plugin
  5. Marathon
If this is your first contact with Jenkins Pipeline, you can find help at https://github.com/jenkinsci/p ... AL.md. Now let's start writing the Groovy code. The container-orchestration-based Pipeline is divided into the following steps:
1. Check out the code. In this step, the Git plug-in checks out the development code.
stage('Check out')
gitUrl = "git@gitlab.xxxx.com:xxx.git"
git branch: "dev", changelog: false, credentialsId: "deploy-key", url: gitUrl 
2. Build the Java code with Maven. Because we use the Spring Boot framework, the build artifact is an executable jar package.
stage('Build')
sh "${mvnHome}/bin/mvn -U clean install"
3. Static code analysis. The SonarQube Scanner plug-in asks SonarQube to perform a static scan of the code base.
stage('SonarQube analysis')
// requires SonarQube Scanner 2.8+
def scannerHome = tool 'SonarQube.Scanner-2.8';
withSonarQubeEnv('SonarQube-Prod') {
 sh "${scannerHome}/bin/sonar-scanner -e -Dsonar.links.scm=${gitUrl} -Dsonar.sources=. -Dsonar.test.exclusions=file:**/src/test/java/** -Dsonar.exclusions=file:**/src/test/java/** -Dsonar.language=java -Dsonar.projectVersion=1.${BUILD_NUMBER} -Dsonar.projectKey=lms-barrages -Dsonar.projectDescription=0000000-00000 -Dsonar.java.source=8 -Dsonar.projectName=xxx"
} 
4. Build a Docker image. In this step, the Docker Pipeline plug-in packages the jar, the configuration files, and the third-party dependencies into a Docker image according to the pre-written Dockerfile, and pushes it to the private Docker registry.
stage('Build image')
docker.withRegistry('https://dockerhub.xxx.com', 'dockerhub-login') {
docker.build('dockerhub.xxx.com/xxxx').push('test') // "test" is the tag name
} 
5. Deploy to the test environment. The Marathon plug-in sends the prepared deployment file to the Marathon cluster, which deploys the newly built image into the test environment.
stage('Deploy on Test')
sh "mkdir -pv deploy"
dir("./deploy") {
    git branch: 'dev', changelog: false, credentialsId: 'deploy-key', url: 'git@gitlab.xxx.com:lms/xxx-deploy.git'
    //Get the right marathon url
    marathon_url="http://marathon-qa"
    marathon docker: imageName, dockerForcePull: true, forceUpdate: true, url: marathon_url, filename: "qa-deploy.json"
} 
6. Automated testing. Run the automated test scripts written in advance by the testers to check whether the program runs normally.
stage('Test')
// Download the test case code
git branch: 'dev', changelog: false, credentialsId: 'deploy-key', url: 'git@gitlab.xxx.com:lms/xxx-test.git'
parallel(autoTests: {
    // Run the test cases with nosetests
    sh "docker run -it --rm -v $PWD:/code nosetests nosetests -s -v -c conf/run/api_test.cfg --attr safeControl=1"
},manualTests:{
    sleep 30000
}) 
7. Manual testing. If you are not fully confident in the automated tests, you can end the Pipeline here and test manually. To illustrate the entire process, we skip manual testing here.
8. Deploy to production. After all tests pass, the Pipeline automatically publishes to the production environment.
stage('Deploy on Prod')
input "Do tests OK?"
dir("./deploy") {
    //Get the right marathon url
    marathon_url="http://marathon-prod"
    marathon docker: imageName, dockerForcePull: true, forceUpdate: true, url: marathon_url, filename: "prod-deploy.json"
} 
Finally, let's look at the entire Pipeline flow:

Container orchestration configuration files

When introducing agile development, we mentioned deploying to different environments according to environment-specific configuration parameters. How do we tell the deployment program which configuration file to start the service with, and how much CPU and memory, and how many instances, to use in each environment? This is where the container orchestration configuration file comes in. Because we use Mesos + Marathon for container orchestration, the heavy lifting of deployment has shifted from writing a deployment script to writing a Marathon configuration, with content as follows:
{
  "id": "/appName",
  "cpus": 2,
  "mem": 2048.0,
  "instances": 2,
  "args": [
    "--spring.profiles.active=qa"
  ],
  "labels": {
    "HAPROXY_GROUP": "external",
    "HAPROXY_0_VHOST": "xxx.hujiang.com"
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "imageName",
      "network": "USER",
      "forcePullImage": true,
      "portMappings": [
        {
          "containerPort": 12345,
          "hostPort": 0,
          "protocol": "tcp",
          "servicePort": 12345
        }
      ]
    },
    "volumes": [
      {
        "containerPath": "/app/log",
        "hostPath": "/home/logs/appName",
        "mode": "RW"
      }
    ]
  },
  "ipAddress": {
    "networkName": "calico-net"
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "ignoreHttp1xx": true,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3,
      "path": "/health_check",
      "portIndex": 0,
      "protocol": "HTTP",
      "timeoutSeconds": 20
    }
  ],
  "uris": [
    "file:///etc/docker.tar.gz"
  ]
}
We save this configuration content as separate JSON files, one set per environment — for example, Marathon-qa.json and Marathon-prod.json. During Pipeline deployment, the Jenkins Marathon plug-in invokes the deployment configuration for the corresponding environment.
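Maintaining one full JSON file per environment duplicates most of the content. As an optional refinement (not part of the setup described above), the per-environment files could be rendered from a shared template; a minimal sketch, with illustrative sizing values:

```python
import json

# A trimmed-down version of the Marathon config above; only the
# fields that vary per environment are parameterized here.
TEMPLATE = {
    "id": "/appName",
    "container": {
        "type": "DOCKER",
        "docker": {"image": "imageName", "forcePullImage": True},
    },
}

# Per-environment sizing and Spring profile (illustrative values).
ENVIRONMENTS = {
    "qa":   {"cpus": 1, "mem": 1024.0, "instances": 1},
    "prod": {"cpus": 2, "mem": 2048.0, "instances": 2},
}

def render(env: str) -> dict:
    """Merge the shared template with one environment's settings."""
    cfg = json.loads(json.dumps(TEMPLATE))  # deep copy via JSON round-trip
    cfg.update(ENVIRONMENTS[env])
    cfg["args"] = [f"--spring.profiles.active={env}"]
    return cfg

# Write Marathon-qa.json, Marathon-prod.json, ... for the plug-in to pick up.
for env in ENVIRONMENTS:
    with open(f"Marathon-{env}.json", "w") as f:
        json.dump(render(env), f, indent=2)
```

This keeps the shared fields (health checks, volumes, labels) in one place so an edit cannot silently diverge between environments.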

Separating the automated process from production deployment

Development and deployment are now simple and fast — but can everyone use them freely? The answer is no, not because of technical difficulty, but because of permissions. Ideally, you submit code, have a cup of coffee, and your code is already serving tens of millions of users. In reality the risk is too high: not everyone can ship bugs as boldly as Rambo. In most cases, we need specifications and procedures as constraints. Just as automated testing cannot fully replace manual black-box testing, a passing test deployment must not flow straight into production; after the tests pass, a human still needs to confirm and trigger the production deployment. Therefore, we split the automated process and the final production deployment into two Jobs, and grant permissions on the latter separately, so that only authorized people can perform the final deployment. That person can be a team leader, a development manager, or an O&M partner, depending on the company's organizational structure. What exactly does this deployment Job do? In the container orchestration era, the deployment Job does not start from code compilation; it takes a fully tested image version and deploys it to the production environment through the Marathon plug-in. Here is an example of the Deploy_only script:
node('docker-qa'){
if (ReleaseVersion ==""){
    echo "The release version cannot be empty"
    return
}
stage "Prepare image"
    def moduleName = "${ApplicationModule}".toLowerCase()
    def resDockerImage = imageName + ":latest"
    def desDockerImage = imageName + ":${ReleaseVersion}"
    if (GenDockerVersion =="true"){
        sh "docker pull ${resDockerImage}"
        sh "docker tag ${resDockerImage} ${desDockerImage}"
        sh "docker push ${desDockerImage}"
        sh "docker rmi -f ${resDockerImage} ${desDockerImage}"
    }

stage "Deploy on Mesos"
    git branch: 'dev', changelog: false, credentialsId: 'deploy-key', url: 'git@gitlab.xxx.com:lms/xxx-test.git'  
    //Get the right marathon url
    echo "DeployDC: " + DeployDC
    marathon_url = ""
    if (DeployDC=="AA") {
        if (DeployEnv == "prod"){
          input "Are you sure to deploy to production?"
          marathon_url = "${marathon_AA_prod}"
        }else if (DeployEnv == "yz") {
          marathon_url = "${marathon_AA_yz}"
        }
    }else if ("${DeployDC}"=="BB"){
      if ("${DeployEnv}" == "prod"){
        input "Are you sure to deploy to production?"
        marathon_url = "${marathon_BB_prod}"
      }else if ("${DeployEnv}" == "yz") {
        marathon_url = "${marathon_BB_yz}"
      }
    }
    marathon docker: imageName, dockerForcePull: true, forceUpdate: true, url: marathon_url, filename: "${DeployEnv}-deploy.json"
} 
Why not put this file under scripts, together with the application project? Because once deployment and application are separated, two different groups of people can maintain the deployment and the application respectively, which matches the company's organizational structure.

Automated O&M

The final phase of DevOps is O&M. In the container age, how do we operate and maintain a large fleet of images and containers? Our goal is to automate O&M as much as possible. Here I focus on two points:

Container monitoring

Container monitoring can be done in two ways: install a service on each physical server to monitor all containers on that machine, or monitor container status through the Mesos or Kubernetes APIs. Both approaches require the corresponding monitoring software or agent to be installed on the physical machine. Our team currently uses the combination of cAdvisor + InfluxDB + Grafana to monitor containers. cAdvisor must be installed on every agent in the Mesos cluster; it is responsible for collecting the data of all containers running on its host into data points and sending them to the time-series database (InfluxDB). The following are some of the data points cAdvisor monitors. These data points are aggregated by Grafana and displayed on dashboards, so that we can track the performance indicators of specific containers. The following is a Grafana screenshot. Besides monitoring the containers themselves, host monitoring is also essential; since there are many monitoring points, I will not give examples here.
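To give a feel for what lands in the time-series database, here is a rough sketch of formatting one container metric as an InfluxDB line-protocol point (the measurement and tag names are assumptions for illustration, not cAdvisor's exact schema):

```python
import time

def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build one InfluxDB line-protocol point: measurement,tags fields timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

# Illustrative data point: cumulative CPU usage of one container.
point = to_line_protocol(
    "cpu_usage_total",
    {"container_name": "xxx", "machine": "mesos-agent-1"},
    {"value": 1234567890},
    int(time.time() * 1e9),  # InfluxDB timestamps are in nanoseconds
)
print(point)
```

Grafana then queries these series back out of InfluxDB to draw the per-container dashboards mentioned above.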

Auto Scaling

Collecting monitoring metrics is only the first step of automated O&M. When business requests surge or drop sharply, manual monitoring cannot respond in time — and few teams have enough people for 7x24 monitoring anyway. There must be an automatic scaling mechanism driven by the monitoring data. In the learning product line, we developed a microservice that automatically scales applications out on top of the Mesos + Marathon orchestration framework. The principle is as follows:
  • The application services to be monitored are registered with AutoScaler through its RESTful interface.
  • The AutoScaler program then reads the metrics of the applications deployed on each agent, including CPU and memory usage.
  • When an application is found to be too busy (mostly manifested as high CPU or memory usage), AutoScaler calls the Marathon API to scale it out.
  • On receiving the request, Marathon immediately notifies the Mesos cluster to launch new application instances to relieve the current load.
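The scale-out decision in the steps above can be sketched as follows; the thresholds, the step size, and the metric field names are illustrative assumptions, and the Marathon API call is reduced to a callback:

```python
CPU_THRESHOLD = 0.8   # assumed "too busy" threshold
MEM_THRESHOLD = 0.8
MAX_INSTANCES = 10    # safety cap so a runaway metric cannot scale forever

def desired_instances(current: int, cpu: float, mem: float) -> int:
    """Return the instance count to request from Marathon."""
    if (cpu > CPU_THRESHOLD or mem > MEM_THRESHOLD) and current < MAX_INSTANCES:
        return current + 1   # scale out one instance at a time
    return current

def autoscale(app: dict, update_marathon) -> None:
    """app: one metrics snapshot read from the agents;
    update_marathon: callback that PUTs the new instance count
    to Marathon's /v2/apps/{id} endpoint."""
    target = desired_instances(app["instances"], app["cpu"], app["mem"])
    if target != app["instances"]:
        update_marathon(app["id"], target)

# Example: an app at 90% CPU gets one more instance.
calls = []
autoscale({"id": "/appName", "instances": 2, "cpu": 0.9, "mem": 0.5},
          lambda app_id, n: calls.append((app_id, n)))
print(calls)  # [('/appName', 3)]
```

A real implementation would also need cool-down periods and scale-in logic, which are omitted here.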

Concluding Remarks

DevOps and SRE are not unattainable concepts; they need to be tailored to each environment as they are put into practice. Our entire DevOps process is built on container orchestration, with the goal of simplifying the process and achieving automated CI/CD and automated O&M. It still has many rough edges and may not fit complex scenarios. In addition, the examples in this article have been sanitized for privacy and may not work if used directly. I hope that, drawing on the successes and problems we encountered in practice, you can refine a DevOps process that suits your own team.

Original link: The Dev/Ops process based on container orchestration (by Huang Kai)

Originally published: 2017-10-01

Author: Huang Kai

This article is from Dockerone.io, a partner of the Yunqi Community. For more information, see Dockerone.io.

Original title: the Dev/Ops process based on container orchestration in Hujiang
