Container Compute Service: Use CRaC to accelerate the startup of Java applications

Last Updated: Feb 12, 2025

Java applications usually consume a large amount of resources at startup to load classes and perform just-in-time (JIT) compilation. If the application crashes after it starts, for example when a pod enters the CrashLoopBackOff state, it must restart and reload its data, which can interrupt business in a production environment. In addition to accelerating Java startup through in-place scaling, Container Compute Service (ACS) supports Coordinated Restore at Checkpoint (CRaC). This topic describes how CRaC accelerates the startup of Java applications and the scenarios in which you can use it.

Background information

CRaC is an open source technology developed to accelerate the startup of Java applications. It is suitable for large applications and microservice applications. CRaC saves the data and state of a program at a checkpoint and restores them later, which reduces the time required to reload and recreate the application.

How CRaC works

CRaC restores applications from checkpoints. A checkpoint is a process snapshot of a JVM that CRaC creates by persisting the current state of the JVM, including its context and memory. If the business process exits unexpectedly, it can be restored from the snapshot to the state it was in when the checkpoint was created. Restoring an application from a checkpoint is much faster than a cold start.

CRaC creates checkpoints and restores applications by using Checkpoint/Restore in Userspace (CRIU), a checkpoint and restore tool for Linux. CRaC builds on CRIU with adjustments and enhancements, and can be combined with in-place scaling of containers to further accelerate the startup of Java applications. The following figure shows the architecture.

[Figure: CRaC architecture]
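
As a rough illustration of the checkpoint and restore flow described above, the following shell sketch chooses between a cold start and a restore. It assumes the -XX:CRaCCheckpointTo and -XX:CRaCRestoreFrom options of the OpenJDK CRaC project, which you should confirm against the JDK that you use. The function name, the JAR path, and the rule that a non-empty directory means a checkpoint exists are assumptions made for this example, not the behavior of the demo image used later in this topic.

```shell
# Print the java command line to use for a given checkpoint directory.
# Assumption: -XX:CRaCCheckpointTo / -XX:CRaCRestoreFrom are the OpenJDK CRaC
# options; check the documentation of your JDK before relying on them.
crac_java_cmd() {
  image_dir=$1   # directory that holds the checkpoint image files
  app_jar=$2     # hypothetical path of the application JAR

  # Assumption for this sketch: a non-empty directory means a checkpoint exists.
  if [ -d "$image_dir" ] && [ -n "$(ls -A "$image_dir" 2>/dev/null)" ]; then
    # Warm path: resume the JVM from the persisted snapshot.
    echo "java -XX:CRaCRestoreFrom=$image_dir"
  else
    # Cold path: start normally and tell the JVM where to write a checkpoint.
    echo "java -XX:CRaCCheckpointTo=$image_dir -jar $app_jar"
  fi
}
```

A container entrypoint could run the printed command with exec, so that the first start is a cold start and later restarts in the same pod resume from the snapshot.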

Benefits

Using CRaC in ACS provides the following benefits:

  • Simplicity: CRaC greatly simplifies the procedure for launching ACS applications.

  • Standardization: CRaC ensures the compatibility and extensibility of ACS applications.

  • Elasticity: Intelligent scheduling and resource optimization are used to ensure the stability and performance of applications with different loads.

Usage notes

  • JDK version limit: You must replace the JDK with the latest version of Alibaba Dragonwell 11 or another JDK that supports CRaC.

  • Checkpoint setting: Make sure that your development team is familiar with CRaC, because checkpoints must be set in the application code or in a script. For more information, see CRaC library.

    • No action is needed if the application state that exists before the checkpoint can be reused after a restore.

    • If that state cannot be reused after a restore, use callbacks to restore the state data or business logic.

  • To work with in-place scaling, you can configure JVM parameters for the application to maximize startup acceleration. For more information, see Configure JVM parameters to accelerate the startup of Java applications.
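
Where the checkpoint is set from a script rather than from application code, the script typically waits until the application has warmed up and then asks the JVM to create the checkpoint. The following is a minimal sketch, assuming the jcmd JDK.checkpoint command that CRaC-enabled JDKs provide. The function names, the retry limits, and the readiness command are hypothetical.

```shell
# Name of the jcmd binary; overridable so the sketch can be dry-run without a JVM.
JCMD="${JCMD:-jcmd}"

# Run the given command until it succeeds: at most $1 tries, $2 seconds apart.
wait_until() {
  tries=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Wait for the readiness command, then ask the JVM with the given PID to checkpoint.
# Usage: checkpoint_when_ready <pid> <readiness command...>
checkpoint_when_ready() {
  pid=$1; shift
  wait_until 30 2 "$@" || { echo "application never became ready" >&2; return 1; }
  # Assumption: the target JVM was started with a checkpoint directory configured.
  "$JCMD" "$pid" JDK.checkpoint
}
```

For example, checkpoint_when_ready "$(pgrep java)" nc -z localhost 8080 would wait until port 8080 accepts connections before it requests the checkpoint. The port and the use of nc are assumptions for illustration.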

Scenarios

Note

To run the following scenarios, you must submit a ticket to enable privileged mode for your ACS pods.

Scenario 1: Launch applications with and without CRaC

In this scenario, the Java application named Spring-Petclinic is launched without in-place scaling. This example compares the results of application startups before and after CRaC is used.

  1. Create a file named spring-petclinic-crac.yaml based on the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spring-petclinic-crac
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: spring-petclinic-crac
      template:
        metadata:
          # In-place scaling is disabled because the following annotation is commented out.
          # annotations:
          #   scaling.alibabacloud.com/enable-inplace-resource-resize: "true"
          creationTimestamp: null
          labels:
            alibabacloud.com/compute-class: general-purpose
            alibabacloud.com/compute-qos: default
            app: spring-petclinic-crac
        spec:
          containers:
          - env:
            # Enable CRaC checkpoints.
            - name: DO_CRAC_CHECKPOINT
              value: "true"
            # Specify the checkpoint path. We recommend that you specify an emptyDir if you use a container environment.
            - name: CRAC_IMAGE_DIR
              value: /home/crac
            image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/quickstart-acs-petclinic-demo:alpha.3
            imagePullPolicy: IfNotPresent
            name: crac-container
            securityContext:
              privileged: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            resources:
              requests:
                cpu: "500m"          
                memory: "1Gi"        
              limits:
                cpu: "500m"          
                memory: "1Gi"
            volumeMounts:
            - mountPath: /home/crac
              name: crac-cache-volume
          restartPolicy: Always
          schedulerName: default-scheduler
          volumes:
          - emptyDir: {}
            name: crac-cache-volume
  2. Create a file named spring-petclinic.yaml based on the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spring-petclinic
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: spring-petclinic
      template:
        metadata:
          # In-place scaling is disabled because the following annotation is commented out.
          # annotations:
          #   scaling.alibabacloud.com/enable-inplace-resource-resize: "true"
          creationTimestamp: null
          labels:
            alibabacloud.com/compute-class: general-purpose
            alibabacloud.com/compute-qos: default
            app: spring-petclinic
        spec:
          containers:
          - name: crac-container
            image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/quickstart-acs-petclinic-demo:alpha.3
            imagePullPolicy: IfNotPresent
            securityContext:
              privileged: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            resources:
              requests:
                cpu: "500m"          
                memory: "1Gi"        
              limits:
                cpu: "500m"          
                memory: "1Gi"
          restartPolicy: Always
          schedulerName: default-scheduler
  3. Deploy the application.

    kubectl apply -f spring-petclinic-crac.yaml && kubectl apply -f spring-petclinic.yaml
  4. Verify the startup speed.

    1. View the status of the Deployments.

      kubectl get pod | grep spring-petclinic

      Expected results:

      spring-petclinic-64cb7xxxxx-xxxxx        1/1     Running   0   110m
      spring-petclinic-crac-574cdxxxxx-xxxxx   1/1     Running   0   47m
    2. View the startup logs of the Deployments.

      kubectl logs spring-petclinic-64cb7xxxxx-xxxxx --tail=5 && \
      echo -e "\033[31m↑↑↑ No crac ↑↑↑\033[0m-------------------\033[32m↓↓↓ With crac ↓↓↓\033[0m" && \
      kubectl logs spring-petclinic-crac-574cdxxxxx-xxxxx --tail=5

      Expected results:

      2025-01-21 06:50:34.521  INFO 9 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
      2025-01-21 06:50:35.035  INFO 9 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
      2025-01-21 06:50:35.036  INFO 9 --- [           main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
      2025-01-21 06:50:37.022  INFO 9 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
      2025-01-21 06:50:37.098  INFO 9 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 26.587 seconds (JVM running for 28.57)
      ↑↑↑ No crac ↑↑↑-------------------↓↓↓ With crac ↓↓↓
      2025-01-21 06:50:38.312  INFO 109 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
      2025-01-21 06:50:38.628  INFO 109 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
      2025-01-21 06:50:38.629  INFO 109 --- [           main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
      2025-01-21 06:50:40.700  INFO 109 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
      2025-01-21 06:50:40.792  INFO 109 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 27.941 seconds (JVM running for 31.305)
    3. Create a checkpoint for the Deployment that uses CRaC.

      kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx  -- sh -c './checkpoint.sh'
      Note

      Checkpoint creation is shown as a separate manual step only for demonstration. In a production environment, automate checkpoint creation for your application pods.

    4. Simulate an exceptional exit.

      kubectl exec -it spring-petclinic-64cb7xxxxx-xxxxx -- sh -c 'pid=$(pgrep java) && kill -9 $pid'
      kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'pid=$(pgrep java) && kill -9 $pid'
    5. View the logs of the Deployments.

      kubectl logs spring-petclinic-64cb7xxxxx-xxxxx --tail=5 && \
      echo -e "\033[31m↑↑↑ No crac ↑↑↑\033[0m-------------------\033[32m↓↓↓ With crac ↓↓↓\033[0m" && \
      kubectl exec -it  spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log'
      Note

      No log is generated for the application that is restored from the checkpoint. Therefore, a built-in function is provided in the container image to calculate the restore time. In this example, the log that stores the output of the function is printed.

      Expected results:

      2025-01-21 02:32:36.254  INFO 9 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
      2025-01-21 02:32:36.821  INFO 9 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
      2025-01-21 02:32:36.822  INFO 9 --- [           main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
      2025-01-21 02:32:38.858  INFO 9 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
      2025-01-21 02:32:38.950  INFO 9 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 26.558 seconds (JVM running for 28.644)
      ↑↑↑ No crac ↑↑↑-------------------↓↓↓ With crac ↓↓↓
      Checking application start at Thu Jan 21 02:32:54 UTC 2025
      Start PetClinic Cost : 417 ms
      Application started successfully at Thu Jan 21 02:32:54 UTC 2025
      ===========================================

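      The first startups of the two Deployments take almost the same amount of time (26.587 seconds and 27.941 seconds). After the exit, however, the Deployment that uses CRaC is restored from the checkpoint in 417 ms, whereas the other Deployment needs 26.558 seconds for another cold start.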
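
The comparison above can be reproduced from the log lines themselves. The following sketch assumes the two log formats shown in the expected results. It extracts the durations and computes how many times faster the restore is. The function names are made up for this example.

```shell
# Read log lines on stdin and print the cold-start duration in seconds from
# Spring Boot "Started <App> in <N> seconds" lines.
startup_seconds() {
  sed -n 's/.*Started [A-Za-z]* in \([0-9.]*\) seconds.*/\1/p'
}

# Read log lines on stdin and print the restore duration in milliseconds from
# "Start PetClinic Cost : <N> ms" lines.
restore_millis() {
  sed -n 's/.*Cost : \([0-9]*\) ms.*/\1/p'
}

# Ratio of a cold start (seconds) to a restore (milliseconds), one decimal place.
speedup() {
  awk -v s="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", (s * 1000) / ms }'
}
```

With the values from this example, speedup 26.558 417 prints 63.7, which means the restore is roughly 64 times faster than the cold restart.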

Scenario 2: Use CRaC to launch applications with and without in-place scaling

In this scenario, CRaC is used to accelerate the startup of a Java application. This example compares the results of application startups before and after in-place scaling is used.
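
The commands in both scenarios use placeholder pod names such as spring-petclinic-crac-574cdxxxxx-xxxxx. A helper along the following lines could look up the real name from the app label defined in each Deployment. The function name and the KUBECTL override are assumptions for this sketch, and it simply takes the first matching pod.

```shell
# Name of the kubectl binary; overridable so the helper can be tested with a stub.
KUBECTL="${KUBECTL:-kubectl}"

# Print the name of the first pod that carries the label app=<name>.
pod_for_app() {
  "$KUBECTL" get pod -l "app=$1" -o jsonpath='{.items[0].metadata.name}'
}
```

For example, kubectl logs "$(pod_for_app spring-petclinic-crac)" --tail=5 shows the startup log without the generated pod name having to be copied by hand.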

  1. Create a file named spring-petclinic-crac-resize.yaml based on the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spring-petclinic-crac-resize
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: spring-petclinic-crac-resize
      template:
        metadata:
          annotations:  
            scaling.alibabacloud.com/enable-inplace-resource-resize: "true" # Enable in-place scaling.
            alibabacloud.com/startup-cpu-burst-factor: '2' #Set the CPU Burst factor to 2.
            alibabacloud.com/startup-cpu-burst-duration-seconds: "30" #If you do not specify this annotation, the pod is scaled down 30 seconds after the pod is ready.
          creationTimestamp: null
          labels:
            alibabacloud.com/compute-class: general-purpose
            alibabacloud.com/compute-qos: default
            app: spring-petclinic-crac-resize
        spec:
          containers:
          - env:
            # Enable CRaC checkpoints.
            - name: DO_CRAC_CHECKPOINT
              value: "true"
            # Specify the checkpoint path. We recommend that you specify an emptyDir if you use a container environment.
            - name: CRAC_IMAGE_DIR
              value: /home/crac
            image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/quickstart-acs-petclinic-demo:alpha.3
            imagePullPolicy: IfNotPresent
            name: crac-container
            securityContext:
              privileged: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            resources:
              requests:
                cpu: "500m"          
                memory: "1Gi"        
              limits:
                cpu: "500m"          
                memory: "1Gi"
            volumeMounts:
            - mountPath: /home/crac
              name: crac-cache-volume
            readinessProbe:
                tcpSocket:
                  port: 8080
                initialDelaySeconds: 20
                periodSeconds: 10
          restartPolicy: Always
          schedulerName: default-scheduler
          volumes:
          - emptyDir: {}
            name: crac-cache-volume

    The spring-petclinic-crac.yaml file in Scenario 1 is used in this example for comparison.

  2. Deploy the application.

    kubectl apply -f spring-petclinic-crac-resize.yaml
  3. Verify the startup speed.

    1. View the status of the Deployments.

      kubectl get pod | grep spring-petclinic-crac

      Expected results:

      spring-petclinic-crac-574cdxxxxx-xxxxx          1/1     Running   0          29m
      spring-petclinic-crac-resize-6474cxxxxx-xxxxx   1/1     Running   0          32m
    2. View the startup logs of the Deployments.

      kubectl logs spring-petclinic-crac-574cdxxxxx-xxxxx --tail=5 && \
      echo -e "\033[31m↑↑↑ No resize ↑↑↑\033[0m-------------------\033[32m↓↓↓ With resize ↓↓↓\033[0m" && \
      kubectl logs spring-petclinic-crac-resize-6474cxxxxx-xxxxx --tail=5

      Expected results:

      2025-01-23 05:50:16.564  INFO 109 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
      2025-01-23 05:50:17.346  INFO 109 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
      2025-01-23 05:50:17.347  INFO 109 --- [           main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
      2025-01-23 05:50:19.848  INFO 109 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
      2025-01-23 05:50:19.936  INFO 109 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 38.912 seconds (JVM running for 43.614)
      ↑↑↑ No resize ↑↑↑-------------------↓↓↓ With resize ↓↓↓
      2025-01-23 05:48:28.334  INFO 108 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
      2025-01-23 05:48:28.793  INFO 108 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
      2025-01-23 05:48:28.794  INFO 108 --- [           main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
      2025-01-23 05:48:29.940  INFO 108 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
      2025-01-23 05:48:29.981  INFO 108 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 19.449 seconds (JVM running for 22.339)

      The output indicates that in-place scaling reduces the first startup time of the application from 38.912 seconds to 19.449 seconds, which is about half.

    3. Create a checkpoint for the Deployments.

      kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx  -- sh -c './checkpoint.sh' 
      kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx  -- sh -c './checkpoint.sh'
    4. Simulate an exceptional exit.

      kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'pid=$(pgrep java) && kill -9 $pid'
      kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx -- sh -c 'pid=$(pgrep java) && kill -9 $pid'
    5. View the logs of the Deployments.

      kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log' && \
      echo -e "\033[31m↑↑↑ No resize ↑↑↑\033[0m-------------------\033[32m↓↓↓ With resize ↓↓↓\033[0m" && \
      kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log'

      Expected results:

      Checking application start at Thu Jan 23 05:56:34 UTC 2025
      Start PetClinic Cost : 440 ms
      Application started successfully at Thu Jan 23 05:56:34 UTC 2025
      ===========================================
      ↑↑↑ No resize ↑↑↑-------------------↓↓↓ With resize ↓↓↓
      Checking application start at Thu Jan 23 05:56:45 UTC 2025
      Start PetClinic Cost : 349 ms
      Application started successfully at Thu Jan 23 05:56:46 UTC 2025
      ===========================================

      After the exit, the Deployment that has in-place scaling enabled is restored in 349 ms, compared with 440 ms for the Deployment that does not.
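
To put the Scenario 2 results on a common scale, the relative reduction can be computed as (before - after) / before. The following is a small sketch with a hypothetical helper name that uses awk for the floating-point arithmetic.

```shell
# Percentage by which "after" is smaller than "before", one decimal place.
# Both arguments must use the same unit (for example, seconds or milliseconds).
percent_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}
```

With the values from this example, percent_reduction 38.912 19.449 prints 50.0 for the first startup, and percent_reduction 440 349 prints 20.7 for the restore from the checkpoint.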