Java applications typically consume a large amount of resources at startup to load classes and perform just-in-time (JIT) compilation. If a CrashLoopBackOff event occurs after the application starts, the application must restart and reload its data, which can interrupt business in a production environment. In addition to accelerating the startup of Java applications through in-place scaling, Container Compute Service (ACS) supports Coordinated Restore at Checkpoint (CRaC). This topic describes how CRaC accelerates the startup of Java applications and walks through two usage scenarios.
Background information
CRaC is an open-source technology developed to accelerate the startup of Java applications. It is suited to large applications and microservices. CRaC saves the data and state of a program at a checkpoint and restores them later, which reduces the time spent reloading and reinitializing the application.
How CRaC works
CRaC restores applications from checkpoints. For each JVM, CRaC creates a process snapshot by persisting the current state of the JVM, including its execution context and memory. This snapshot is called a checkpoint. If the business process exits unexpectedly, it can be restored from the snapshot to the state it had when the checkpoint was created. Restoring an application from a checkpoint is much faster than a cold start.
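As a rough illustration of this checkpoint-and-restore cycle, the following sketch wraps the JVM options and jcmd command that OpenJDK CRaC builds document. The function names, the checkpoint directory, and the JAR name are placeholders chosen for this example. This topic does not show what the demo image's own scripts run, so treat the sketch as an assumption about a generic CRaC-enabled JDK rather than a description of the image.

```shell
#!/bin/sh
# Sketch of the checkpoint/restore cycle on a CRaC-enabled JDK.
# All arguments (checkpoint directory, JAR, PID) are placeholders.

# 1. Start the application and tell the JVM where to write the checkpoint image.
start_app() {
  java -XX:CRaCCheckpointTo="$1" -jar "$2"
}

# 2. After the application has warmed up, ask the running JVM (by PID) to checkpoint.
take_checkpoint() {
  jcmd "$1" JDK.checkpoint
}

# 3. After an unexpected exit, restore the process from the image
#    instead of performing a cold start.
restore_app() {
  java -XX:CRaCRestoreFrom="$1"
}
```

The three steps are kept as separate functions because they run at different times: the first at deployment, the second after warm-up, and the third only after a failure.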
CRaC creates checkpoints and restores applications by using Checkpoint and Restore in Userspace (CRIU), a Linux utility. CRaC adapts and extends CRIU, and can be combined with container in-place scaling to further accelerate the startup of Java applications. The following figure shows the architecture.
Benefits
Using CRaC in ACS provides the following benefits:
Simplicity: CRaC greatly simplifies the procedure for launching ACS applications.
Standardization: CRaC ensures the compatibility and extensibility of ACS applications.
Elasticity: Intelligent scheduling and resource optimization are used to ensure the stability and performance of applications with different loads.
Usage notes
JDK version limit: You need to replace the JDK with the latest version of Alibaba Dragonwell 11 or a JDK that supports CRaC.
Checkpoint setting: Make sure that your development team has deep knowledge about CRaC because they need to set checkpoints in the application code or script. For more information, see CRaC library.
No operation is needed if the application states earlier than the specified checkpoint are reusable.
If the application states earlier than the specified checkpoint are not reusable, you need to use callbacks to restore the state data or business logic.
To work with in-place scaling, you can configure JVM parameters for the application to maximize startup acceleration. For more information, see Configure JVM parameters to accelerate the startup of Java applications.
Scenarios
Before you run the following scenarios, you must submit a ticket to enable the privileged mode for the ACS pod.
Scenario 1: Launch applications with and without CRaC
In this scenario, the Java application named Spring-Petclinic is launched without in-place scaling. This example compares the results of application startups before and after CRaC is used.
Create a file named spring-petclinic-crac.yaml based on the following content.
Create a file named spring-petclinic.yaml based on the following content.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-petclinic
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: spring-petclinic
  template:
    metadata:
      # annotations: Disable in-place scaling.
      #   scaling.alibabacloud.com/enable-inplace-resource-resize: "true"
      creationTimestamp: null
      labels:
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
        app: spring-petclinic
    spec:
      containers:
      - name: crac-container
        image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/quickstart-acs-petclinic-demo:alpha.3
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "500m"
            memory: "1Gi"
      restartPolicy: Always
      schedulerName: default-scheduler
Deploy the application.
kubectl apply -f spring-petclinic-crac.yaml && kubectl apply -f spring-petclinic.yaml
Verify the startup speed.
View the status of the Deployments.
kubectl get pod | grep spring-petclinic
Expected results:
spring-petclinic-64cb7xxxxx-xxxxx        1/1   Running   0   110m
spring-petclinic-crac-574cdxxxxx-xxxxx   1/1   Running   0   47m
View the startup logs of the Deployments.
kubectl logs spring-petclinic-64cb7xxxxx-xxxxx --tail=5 && \
echo -e "\033[31m↑↑↑ No crac ↑↑↑\033[0m-------------------\033[32m↓↓↓ With crac ↓↓↓\033[0m" && \
kubectl logs spring-petclinic-crac-574cdxxxxx-xxxxx --tail=5
Expected results:
2025-01-21 06:50:34.521 INFO 9 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 13 endpoint(s) beneath base path '/actuator'
2025-01-21 06:50:35.035 INFO 9 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2025-01-21 06:50:35.036 INFO 9 --- [ main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
2025-01-21 06:50:37.022 INFO 9 --- [ main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2025-01-21 06:50:37.098 INFO 9 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 26.587 seconds (JVM running for 28.57)
↑↑↑ No crac ↑↑↑-------------------↓↓↓ With crac ↓↓↓
2025-01-21 06:50:38.312 INFO 109 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 13 endpoint(s) beneath base path '/actuator'
2025-01-21 06:50:38.628 INFO 109 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2025-01-21 06:50:38.629 INFO 109 --- [ main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
2025-01-21 06:50:40.700 INFO 109 --- [ main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2025-01-21 06:50:40.792 INFO 109 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 27.941 seconds (JVM running for 31.305)
Create a checkpoint for the Deployment that uses CRaC.
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c './checkpoint.sh'
Note: The operation for creating checkpoints is listed separately only for demonstration. In the actual production environment, you need to automate checkpoint creation for your application pods.
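One possible way to automate this step is a small wrapper that waits for the pods to become ready and then runs the checkpoint script in each of them. The following is a minimal sketch, not a production implementation. The function name checkpoint_pods is hypothetical, and the label selector you pass in is an assumption, because the labels of the CRaC Deployment are not shown in this topic. Also note that the Ready condition reflects application startup only if the container defines a readiness probe.

```shell
#!/bin/sh
# Hypothetical helper: run ./checkpoint.sh in every pod that matches a label selector.
# Assumes each container ships ./checkpoint.sh in its working directory,
# as the demo image in this topic does.
checkpoint_pods() {
  selector="$1"          # for example "app=<your-app-label>" (assumed label)
  timeout="${2:-300s}"   # how long to wait for the pods to become Ready

  # Wait until the matching pods report Ready before taking the checkpoint.
  kubectl wait --for=condition=Ready pod -l "$selector" --timeout="$timeout" >/dev/null || return 1

  # "-o name" prints one "pod/<name>" entry per line; strip the "pod/" prefix.
  for pod in $(kubectl get pods -l "$selector" -o name); do
    kubectl exec "${pod#pod/}" -- sh -c './checkpoint.sh' || return 1
  done
}

# Example call (the selector value is an assumption):
#   checkpoint_pods app=spring-petclinic-crac
```

The function stops at the first failure so that a pod without a valid checkpoint does not go unnoticed.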
Simulate an exceptional exit.
kubectl exec -it spring-petclinic-64cb7xxxxx-xxxxx -- sh -c 'pid=`ps -ef | pgrep java` && kill -9 $pid'
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'pid=`ps -ef | pgrep java` && kill -9 $pid'
View the logs of the Deployments.
kubectl logs spring-petclinic-64cb7xxxxx-xxxxx --tail=5 && \
echo -e "\033[31m↑↑↑ No crac ↑↑↑\033[0m-------------------\033[32m↓↓↓ With crac ↓↓↓\033[0m" && \
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log'
Note: No log is generated for the application that is restored from the checkpoint. Therefore, a built-in function is provided in the container image to calculate the restore time. In this example, the log that stores the output of the function is printed.
Expected results:
2025-01-21 02:32:36.254 INFO 9 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 13 endpoint(s) beneath base path '/actuator'
2025-01-21 02:32:36.821 INFO 9 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2025-01-21 02:32:36.822 INFO 9 --- [ main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
2025-01-21 02:32:38.858 INFO 9 --- [ main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2025-01-21 02:32:38.950 INFO 9 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 26.558 seconds (JVM running for 28.644)
↑↑↑ No crac ↑↑↑-------------------↓↓↓ With crac ↓↓↓
Checking application start at Thu Jan 21 02:32:54 UTC 2025
Start PetClinic Cost : 417 ms
Application started successfully at Thu Jan 21 02:32:54 UTC 2025
===========================================
The first startups of the two Deployments take about the same amount of time (26.587 seconds and 27.941 seconds). After the simulated exit, however, the Deployment that uses CRaC is restored in 417 ms, whereas the Deployment without CRaC needs 26.558 seconds to restart.
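To put a number on the difference, the two timings can be extracted from the logs and divided. The following sketch assumes the log formats shown in the expected results. The helper names cold_start_ms, restore_ms, and speedup are illustrative and are not part of the demo image.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical and not shipped with the demo image.

# Read Spring Boot logs on stdin and print the last reported startup time in ms,
# taken from lines such as "Started PetClinicApplication in 26.558 seconds".
cold_start_ms() {
  sed -n 's/.*Started [A-Za-z]* in \([0-9.]*\) seconds.*/\1/p' |
    tail -n 1 |
    awk '{ printf "%d\n", int($1 * 1000 + 0.5) }'
}

# Read app_start.log on stdin and print the last restore time in ms,
# taken from lines such as "Start PetClinic Cost : 417 ms".
restore_ms() {
  sed -n 's/.*Cost : \([0-9]*\) ms.*/\1/p' | tail -n 1
}

# Print how many times faster the restore is than the cold start.
speedup() {
  awk -v cold="$1" -v restore="$2" 'BEGIN { printf "%.1f\n", cold / restore }'
}

# With the values from the expected results in this scenario:
#   speedup 26558 417   # prints 63.7
```

In this run, restoring from the checkpoint is roughly 64 times faster than the cold restart. The ratio depends on the application and the resources of the pod, so it should not be read as a general figure.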
Scenario 2: Use CRaC to launch applications with and without in-place scaling
In this scenario, CRaC is used to accelerate the startup of a Java application. This example compares the results of application startups before and after in-place scaling is used.
Create a file named spring-petclinic-crac-resize.yaml based on the following content.
The spring-petclinic-crac.yaml file in Scenario 1 is used in this example for comparison.
Deploy the application.
kubectl apply -f spring-petclinic-crac-resize.yaml
Verify the startup speed.
View the status of the Deployments.
kubectl get pod | grep spring-petclinic-crac
Expected results:
spring-petclinic-crac-574cdxxxxx-xxxxx          1/1   Running   0   29m
spring-petclinic-crac-resize-6474cxxxxx-xxxxx   1/1   Running   0   32m
View the startup logs of the Deployments.
kubectl logs spring-petclinic-crac-574cdxxxxx-xxxxx --tail=5 && \
echo -e "\033[31m↑↑↑ No resize ↑↑↑\033[0m-------------------\033[32m↓↓↓ With resize ↓↓↓\033[0m" && \
kubectl logs spring-petclinic-crac-resize-6474cxxxxx-xxxxx --tail=5
Expected results:
2025-01-23 05:50:16.564 INFO 109 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 13 endpoint(s) beneath base path '/actuator'
2025-01-23 05:50:17.346 INFO 109 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2025-01-23 05:50:17.347 INFO 109 --- [ main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
2025-01-23 05:50:19.848 INFO 109 --- [ main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2025-01-23 05:50:19.936 INFO 109 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 38.912 seconds (JVM running for 43.614)
↑↑↑ No resize ↑↑↑-------------------↓↓↓ With resize ↓↓↓
2025-01-23 05:48:28.334 INFO 108 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 13 endpoint(s) beneath base path '/actuator'
2025-01-23 05:48:28.793 INFO 108 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2025-01-23 05:48:28.794 INFO 108 --- [ main] DeferredRepositoryInitializationListener : Triggering deferred initialization of Spring Data repositories…
2025-01-23 05:48:29.940 INFO 108 --- [ main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2025-01-23 05:48:29.981 INFO 108 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 19.449 seconds (JVM running for 22.339)
The output indicates that, in this run, in-place scaling cuts the first startup time roughly in half, from 38.912 seconds to 19.449 seconds.
Create a checkpoint for the Deployments.
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c './checkpoint.sh'
kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx -- sh -c './checkpoint.sh'
Simulate an exceptional exit.
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'pid=`ps -ef | pgrep java` && kill -9 $pid'
kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx -- sh -c 'pid=`ps -ef | pgrep java` && kill -9 $pid'
View the logs of the Deployments.
kubectl exec -it spring-petclinic-crac-574cdxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log' && \
echo -e "\033[31m↑↑↑ No resize ↑↑↑\033[0m-------------------\033[32m↓↓↓ With resize ↓↓↓\033[0m" && \
kubectl exec -it spring-petclinic-crac-resize-6474cxxxxx-xxxxx -- sh -c 'cat /home/app/app_start.log'
Expected results:
Checking application start at Thu Jan 23 05:56:34 UTC 2025
Start PetClinic Cost : 440 ms
Application started successfully at Thu Jan 23 05:56:34 UTC 2025
===========================================
↑↑↑ No resize ↑↑↑-------------------↓↓↓ With resize ↓↓↓
Checking application start at Thu Jan 23 05:56:45 UTC 2025
Start PetClinic Cost : 349 ms
Application started successfully at Thu Jan 23 05:56:46 UTC 2025
===========================================
The Deployment that has in-place scaling enabled is restored in 349 ms, compared with 440 ms for the other Deployment, which is about 20% less time in this run.