Configure liveness, readiness, and startup probes using HTTP GET, TCP socket, or custom commands to monitor container health and route traffic away from failed instances automatically.
Limitations
Health checks are available only for services deployed using custom images that include health check logic.
How it works
EAS uses the Kubernetes health check mechanism. Each probe monitors a specific aspect of container health and takes a distinct action when it fails.
| Probe | What it checks | What happens when it fails |
|---|---|---|
| Liveness probe | Whether the container is running | kubelet kills the container and applies the restart policy. Without a liveness probe, kubelet assumes the container is always healthy. |
| Readiness probe | Whether the container is ready to serve requests | The pod's IP address is removed from the Endpoint list, stopping traffic to it. When the container recovers, the IP is added back. |
| Startup probe | Whether the container application has finished starting | Liveness and readiness checks are blocked until this probe succeeds, preventing premature termination of slow-starting containers. |
Three check methods determine how each probe tests the container:
| Method | How it works | Success condition |
|---|---|---|
| http_get | Sends an HTTP GET request to the container | Response status code ≥ 200 and < 400 |
| tcp_socket | Opens a TCP connection to the container | TCP connection is established |
| exec | Runs a command inside the container | Command exits with code 0 |
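The success conditions in the table can be sketched in Python as follows. This is a rough approximation of what a prober does, not the kubelet's or EAS's actual implementation; the function names and the one-second default timeout are assumptions made for illustration.

```python
import socket
import subprocess
import sys
import urllib.error
import urllib.request
from typing import Sequence


def http_status_is_healthy(status: int) -> bool:
    """http_get rule from the table: 200 <= status < 400."""
    return 200 <= status < 400


def http_get_check(port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """Send GET http://localhost:<port><path> and apply the status rule."""
    url = f"http://localhost:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return http_status_is_healthy(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return http_status_is_healthy(err.code)
    except OSError:
        # Connection refused, timeout, and similar network failures.
        return False


def tcp_socket_check(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """tcp_socket rule: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command: Sequence[str], timeout: float = 1.0) -> bool:
    """exec rule: healthy if the command exits with code 0 before the timeout."""
    try:
        return subprocess.run(list(command), timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


if __name__ == "__main__":
    # A command that exits with code 0 counts as healthy.
    print(exec_check([sys.executable, "-c", "pass"], timeout=10.0))
```

Running a function such as `tcp_socket_check(8000)` inside the container before deployment is a quick way to confirm that the application answers on the port the probe will use.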
Prerequisites
Before you begin, ensure that you have:
A custom image with health check logic built in
The image pushed to a VPC internal registry (EAS has no public network access), for example:
registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz
Prepare a custom image
Use a web framework to encapsulate your prediction logic. The following example uses Flask:
import json

from flask import Flask, request, make_response

app = Flask(__name__)


@app.route('/', methods=['GET', 'POST'])
def process_handle_func():
    # An http_get probe sends a GET request with no body. Answer it directly
    # so that json.loads() is never called on an empty string.
    if request.method == 'GET':
        return make_response('ok', 200)

    # Parse the request body based on your requirements.
    data = request.get_data().decode('utf-8')
    body = json.loads(data)
    res = process(body)

    # Set the response based on your requirements.
    response = make_response(res)
    response.status_code = 200
    return response


def process(data):
    """Your prediction logic."""
    return 'result'


if __name__ == '__main__':
    # Set host to '0.0.0.0'. Otherwise, the health check fails during service deployment.
    # The port must match the port specified in the JSON deployment configuration.
    app.run(host='0.0.0.0', port=8000)

Write a Dockerfile to copy the prediction code and install required packages:
# This example uses Python.
FROM registry.cn-shanghai.aliyuncs.com/eas/bashbase-amd64:0.0.1
COPY ./process_code /eas
RUN /xxx/pip install required_packages
CMD ["/xxx/python", "/eas/xxx/app.py"]

For steps on building a custom image, see Build images on a Container Registry Enterprise Edition instance and Custom images. Alternatively, store your code in a NAS file system or Git repository and attach it via storage mount. For more information, see Storage mount.
Configure health checks
Console deployment
Log on to the PAI console. Select a region, then select your workspace and click Elastic Algorithm Service (EAS).
On the Inference Service tab, click Deploy Service. Under Custom Model Deployment, click Custom Deployment.
In the Environment Information section, configure the following key parameters:
| Parameter | Description |
|---|---|
| Image Configuration | Select Image Address and enter your custom image address, for example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz |
| Command | Entry command for the image. Only a single command is supported; complex scripts are not. The command must match the Dockerfile CMD, for example: /data/eas/ENV/bin/python /data/eas/app.py. Enter the port your container listens on (for example, 8000). The EAS engine reserves ports 8080 and 9090, so do not use them. |
| Health Check | Turn on the Health Check switch. Configure the parameters and click OK. You can add up to three health checks, each with a unique probe type. |

Configure health check parameters:
| Parameter | Description | Default |
|---|---|---|
| Probe Type | Select Liveness Probe, Readiness Probe, or Startup Probe. See How it works for when to use each. | None |
| Check Method | http_get: Sends an HTTP GET request. Healthy if the status code is ≥ 200 and < 400. tcp_socket: Performs a TCP check. Healthy if a connection can be established. exec (Custom health check): Runs a command. Healthy if the command exits with code 0. | None |
| Call Path | Available for http_get only. The URL suffix after http://localhost. | / |
| Port Number | Available for http_get and tcp_socket. The port to check. | None |
| Command | Available for exec only. The console converts your input to the required format automatically. | None |
| Latency for Check Initialization | Seconds to wait after the container starts before the first check runs. | 15 |
| Check Interval | Seconds between checks. A short interval increases pod overhead; a long interval delays failure detection. | 10 |
| Check Timeout Period | Seconds before a check times out. A timeout counts as a failure. | 1 |
| Check Success Threshold | Consecutive successes required after a failure to mark the container as healthy. | 1 |
| Check Failure Threshold | Consecutive failures required after a success to mark the container as failed. | 3 (readiness), 1 (liveness/startup) |

Click Deploy.
JSON deployment
Create a file named service.json. The following example configures a liveness probe using HTTP GET:
{
    "metadata": {
        "name": "test",
        "instance": 1,
        "enable_webservice": true
    },
    "cloud": {
        "computing": {
            "instance_type": "ml.gu7i.c16m60.1-gu30"
        }
    },
    "containers": [
        {
            "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
            "env": [
                {
                    "name": "VAR_NAME",
                    "value": "var_value"
                }
            ],
            "liveness_check": {
                "http_get": {
                    "path": "/",
                    "port": 8000
                },
                "initial_delay_seconds": 3,
                "period_seconds": 3,
                "timeout_seconds": 1,
                "success_threshold": 2,
                "failure_threshold": 4
            },
            "command": "/data/eas/ENV/bin/python /data/eas/app1.py",
            "port": 8000
        }
    ]
}

This example uses liveness_check for a liveness probe. Use health_check for a readiness probe, or startup_check for a startup probe.
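The three probe keys can also be generated rather than written by hand. The following Python sketch builds a service.json body with one probe of each type. It assumes that all three keys may appear together in one container, which is inferred from the console allowing up to three health checks; the helper name `make_probe` and the timing values are hypothetical and chosen only for illustration.

```python
import json

ALLOWED_METHODS = {"http_get", "tcp_socket", "exec"}


def make_probe(method: str, method_args: dict, **timing: int) -> dict:
    """Build one probe block: a single check method plus optional timing fields."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown check method: {method}")
    probe = {method: method_args}
    probe.update(timing)
    return probe


container = {
    "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
    "command": "/data/eas/ENV/bin/python /data/eas/app.py",
    "port": 8000,
    # Startup probe: illustrative values that give a slow application time to come up.
    "startup_check": make_probe(
        "tcp_socket", {"port": 8000}, period_seconds=5, failure_threshold=30
    ),
    # Liveness probe: HTTP GET against the root path.
    "liveness_check": make_probe(
        "http_get", {"path": "/", "port": 8000}, period_seconds=10, timeout_seconds=1
    ),
    # Readiness probe: placeholder command taken from the exec.command example.
    "health_check": make_probe(
        "exec", {"command": ["your_script", "with_args"]}, period_seconds=10
    ),
}

service = {
    "metadata": {"name": "test", "instance": 1, "enable_webservice": True},
    "cloud": {"computing": {"instance_type": "ml.gu7i.c16m60.1-gu30"}},
    "containers": [container],
}

service_json = json.dumps(service, indent=4)
print(service_json)
```

Redirecting the printed output to service.json gives a file with the same shape as the hand-written example.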
Container and image parameters
| Parameter | Description |
|---|---|
| image | VPC internal registry address of the custom image. EAS has no public network access, so use a VPC address, for example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz |
| env.name | Name of the environment variable |
| env.value | Value of the environment variable |
| command | Entry command for the image. Single commands only, for example: /data/eas/ENV/bin/python /data/eas/app.py |
| port | Port the process listens on. Must match the port in the file referenced by command. |
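A few of these constraints can be checked before deployment. The sketch below is a hypothetical pre-flight helper, not part of any EAS tooling. It flags the reserved ports 8080 and 9090, an image address that does not start with `registry-vpc.` (a heuristic based on the example address), and a probe that does not name exactly one check method (an assumption carried over from Kubernetes probe semantics).

```python
RESERVED_PORTS = {8080, 9090}
PROBE_KEYS = ("liveness_check", "health_check", "startup_check")
CHECK_METHODS = ("http_get", "tcp_socket", "exec")


def find_config_problems(service: dict) -> list:
    """Return human-readable problems found in a parsed service.json dict."""
    problems = []
    for index, container in enumerate(service.get("containers", [])):
        where = f"containers[{index}]"

        # Ports 8080 and 9090 are reserved by the EAS engine.
        port = container.get("port")
        if port in RESERVED_PORTS:
            problems.append(f"{where}: port {port} is reserved by the EAS engine")

        # Heuristic: VPC registry addresses in this document start with "registry-vpc.".
        image = container.get("image", "")
        if not image.startswith("registry-vpc."):
            problems.append(f"{where}: image {image!r} does not look like a VPC address")

        # Assumption: each probe names exactly one check method.
        for key in PROBE_KEYS:
            probe = container.get(key)
            if probe is None:
                continue
            methods = [m for m in CHECK_METHODS if m in probe]
            if len(methods) != 1:
                problems.append(f"{where}.{key}: expected one check method, found {methods}")
    return problems
```

Calling `find_config_problems(json.load(open("service.json")))` returns an empty list when none of the three checks fire.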
Probe configuration parameters
All three probe types (liveness_check, health_check, startup_check) share the same timing parameters.
| Parameter | Description | Default |
|---|---|---|
| http_get.path | Access path for the HTTP check, prefixed with http://localhost. | / |
| http_get.port | Port for the HTTP check. | None |
| tcp_socket.port | Port for the TCP check. Example: {"port": 8000} | None |
| exec.command | Command to run in the container. Example: {"command": ["your_script", "with_args"]} | None |
| initial_delay_seconds | Seconds to wait after the container starts before the first check runs. | 0 |
| period_seconds | Seconds between checks. | 10 |
| timeout_seconds | Seconds before a check times out. A timeout counts as a failure. | 1 |
| failure_threshold | Consecutive failures required after a success to mark the container as failed. | 3 (readiness), 1 (liveness/startup) |
| success_threshold | Consecutive successes required after a failure to mark the container as healthy. | 1 |
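The interaction between failure_threshold and success_threshold can be illustrated with a small simulation. This is a simplified model of the consecutive-count rule described in the table, not the kubelet's actual code; the function names are hypothetical, and the detection-time estimate ignores timeout_seconds and scheduling jitter.

```python
from typing import Iterable, List


def track_health(
    results: Iterable[bool],
    success_threshold: int = 1,
    failure_threshold: int = 3,
    initially_healthy: bool = True,
) -> List[bool]:
    """Return the health state after each check result (True means the check passed)."""
    healthy = initially_healthy
    successes = 0
    failures = 0
    states = []
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # a success breaks any run of failures
            if not healthy and successes >= success_threshold:
                healthy = True
        else:
            failures += 1
            successes = 0  # a failure breaks any run of successes
            if healthy and failures >= failure_threshold:
                healthy = False
        states.append(healthy)
    return states


def seconds_to_mark_failed(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time of continuous failure before the container is marked failed."""
    return period_seconds * failure_threshold


# Two failures followed by a success do not trip a threshold of 3;
# three failures in a row do, and one success then restores the state.
checks = [True, False, False, True, False, False, False, True]
print(track_health(checks, success_threshold=1, failure_threshold=3))
# [True, True, True, True, True, True, False, True]

# With period_seconds=3 and failure_threshold=4, as in the JSON example:
print(seconds_to_mark_failed(3, 4))  # 12
```

A larger failure_threshold tolerates brief hiccups at the cost of slower detection, which is the same trade-off noted for the check interval.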
For other parameters, see JSON deployment parameters and Deploy a custom inference service.