Configure health checks for EAS services - Platform For AI

EAS health checks use Kubernetes probe mechanisms to automatically detect and recover unhealthy containers, ensuring that only healthy instances receive traffic.

Limitations

Health checks are available only when you deploy a service using a custom image that includes health check logic.

How it works

EAS supports the following probe types and health check methods.

Supported probe types:

Probe type	Description
Liveness probe	Determines whether a container is running. If the probe detects an unhealthy container, the kubelet kills the container and applies the restart policy. If a container has no liveness probe configured, the kubelet treats its liveness probe result as always Success.
Readiness probe	Determines whether a container is ready to serve requests. Only pods in the Ready state receive traffic. The association between a Service and its Endpoints is managed based on pod readiness: When a pod's Ready state is False, Kubernetes removes the pod IP from the Endpoint list associated with the Service. When the pod's Ready state changes to True, Kubernetes adds the pod IP back to the Endpoint list.
Startup probe	Determines when a container has finished starting up. Use this probe for slow-starting containers to prevent liveness and readiness checks from running before initialization completes, which would cause the container to be killed prematurely.

Supported health check methods:

Health check method	Description
`http_get`	Sends an HTTP GET request to check service health and availability. The check succeeds when the response status code is in the 2xx or 3xx range.
`tcp_socket`	Attempts to open a TCP connection to check service health and availability.
`exec`	Runs a specified command inside the container. The check result is determined by the command's exit code.
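The success criteria of the three methods can be sketched in plain Python, which is also handy for trying a check against a locally running container before deployment. This is only an illustration using the standard library, not the code that EAS or the kubelet runs; the function names and the `timeout` parameter are assumptions made for this example.

```python
import http.client
import socket
import subprocess


def http_get_check(host, port, path="/", timeout=1.0):
    """Return True when GET <path> answers with a 2xx or 3xx status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or malformed response: the check fails.
        return False
    finally:
        conn.close()


def tcp_socket_check(host, port, timeout=1.0):
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command, timeout=1.0):
    """Return True when the command exits with code 0 within the timeout."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False
```

For example, `http_get_check("127.0.0.1", 8000)` probes a service listening locally on port 8000, and `exec_check(["your_script", "with_args"])` mirrors an `exec` check.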

Prepare a custom image

Wrap your prediction logic with a web framework. This example uses Flask:

import json
from flask import Flask, request, make_response

app = Flask(__name__)

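@app.route('/', methods=['GET', 'POST'])
def process_handle_func():
    # An http_get health check sends a GET request without a body. Answer it
    # directly so that json.loads() is not called on an empty string, which
    # would raise an error and return HTTP 500.
    if request.method == 'GET':
        return make_response('ok', 200)
    # Parse the request body based on actual requirements
    data = request.get_data().decode('utf-8')
    body = json.loads(data)
    res = process(body)
    # Set the response based on actual requirements
    response = make_response(res)
    response.status_code = 200
    return response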

def process(data):
    """
       Your prediction logic
    """
    return 'result'

if __name__ == '__main__':
    """
    Note: host must be set to 0.0.0.0, otherwise the health check will fail during service deployment.
    port must match the port specified in the JSON configuration file for the deployed service.
    """
    app.run(host='0.0.0.0', port=8000)

Write a Dockerfile to copy your prediction code and install the required packages:

# Python example
FROM registry.cn-shanghai.aliyuncs.com/eas/bashbase-amd64:0.0.1
COPY ./process_code  /eas
RUN /xxx/pip install <required packages>
CMD ["/xxx/python", "/eas/xxx/app.py"]

For the steps to build a custom image, see Build images on an Enterprise Edition instance. For image building guidelines, see Deploy model services with custom images. Alternatively, save your code in a NAS file system or Git repository and mount the storage when you deploy the service (see Storage mounts). The deployment steps in Configure health checks when deploying a service use the first approach, a custom image.

Configure health checks when deploying a service

Configure health checks in custom deployment

Log in to the PAI console, select the target region at the top of the page, select the target workspace on the right, and then click Go to EAS.
On the Inference Service tab, click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

In the Environment Information section, configure the following key parameters. For other parameters, see Custom deployment.

Parameter	Description
Image Configuration	Select Image Address, then enter the address of your custom image in the text field, for example, registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz.
Command	The container entrypoint command. Only a single command is supported; complex scripts aren't allowed. The command must match what's defined in your Dockerfile. For example: /data/eas/ENV/bin/python /data/eas/app.py.
Port Number	Enter the port number that the container listens on after startup, for example, 8000. Important: The EAS engine listens on fixed ports 8080 and 9090. Avoid using these ports for your container. The port must match the one configured in the xxx.py file referenced in the run command.

In the Features section, expand the Stability guarantee panel, turn on the Health Check toggle, configure the parameters described below, and then click OK.

Health check parameters

Parameter	Description
Probe Type	Three probe types are supported: Liveness Probe: Checks whether the container is running normally. Readiness Probe: Verifies that the container has finished initialization and is ready to handle requests. Startup Probe: Designed for applications that need extra time to initialize, preventing slow-starting containers from being incorrectly marked as failed. For a description of how each probe works, see How it works.
Check Method	Three health check methods are supported: http_get: Calls the HTTP GET method using the container's IP address, port, and path. The check succeeds when the response status code is between 200 (inclusive) and 400 (exclusive). tcp_socket: Runs a TCP check using the container's IP address and port. The check succeeds when a TCP connection can be established. exec (custom health check): Runs a specified command inside the container. The check succeeds when the command exits with code 0.
Call Path	Available only when Check Method is set to http_get. The HTTP server URL to check. The prefix is `http://localhost`. The path suffix is customizable and defaults to `/`.
Port Number	Available only when Check Method is set to http_get or tcp_socket. The port to check, for example, 8000.
Command	Available only when Check Method is set to exec. Enter the command to run. The console automatically converts the command into the required format and writes it to the deployment JSON.
Latency for Check Initialization	The delay before the first health check runs after the container starts. Defaults to 15 seconds.
Check Interval	How frequently health checks run. Defaults to 10 seconds. A high frequency adds overhead to the pod; a low frequency delays detection of container errors.
Check Timeout Period	The timeout for each health check. Defaults to 1 second. A check that exceeds this duration is considered failed.
Check Success Threshold	The number of consecutive successful checks required to mark a previously unhealthy container as healthy. Defaults to 1.
Check Failure Threshold	The number of consecutive failed checks required to mark a previously healthy container as unhealthy. Defaults to 3 for readiness probes and 1 for liveness and startup probes.
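How the two thresholds interact can be illustrated with a small simulation. The sketch below assumes the usual Kubernetes probe semantics: consecutive failures count toward the failure threshold, consecutive successes count toward the success threshold, and any opposite result resets the counter. The function name `track_health` is hypothetical and is not part of EAS.

```python
def track_health(results, *, success_threshold, failure_threshold, healthy=True):
    """Yield the health state after each probe result (True = check passed)."""
    successes = failures = 0
    for passed in results:
        if passed:
            successes, failures = successes + 1, 0
            # Recover only after enough consecutive successes.
            if not healthy and successes >= success_threshold:
                healthy = True
        else:
            failures, successes = failures + 1, 0
            # Mark as unhealthy only after enough consecutive failures.
            if healthy and failures >= failure_threshold:
                healthy = False
        yield healthy


# One pass, three failures in a row, then one pass.
states = list(track_health([True, False, False, False, True],
                           success_threshold=1, failure_threshold=3))
print(states)  # [True, True, True, False, True]
```

Under these assumptions, a container that stops responding is flagged after roughly the check interval multiplied by the failure threshold, so a higher threshold tolerates transient errors at the cost of slower detection.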

Click Deploy.

Configure health checks in JSON deployment

Create a JSON file named service.json. A sample file is shown below.

{
    "metadata": {
        "name": "test",
        "instance": 1,
        "enable_webservice": true
    },
    "cloud": {
        "computing": {
            "instance_type": "ml.gu7i.c16m60.1-gu30"
        }
    },
    "containers": [
        {
            "image":"registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
            "env":[
                {
                    "name":"VAR_NAME",
                    "value":"var_value"
                }
            ],
            "liveness_check":{
                "http_get":{
                    "path":"/",
                    "port":8000
                },
                "initial_delay_seconds":3,
                "period_seconds":3,
                "timeout_seconds":1,
                "success_threshold":2,
                "failure_threshold":4
            },
            "command":"/data/eas/ENV/bin/python /data/eas/app1.py",
            "port":8000
        }
    ]
}
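If you generate deployment files from a script, the same structure can be built and sanity-checked in Python before it is written to service.json. The following is a minimal sketch: the helper `build_service_config` and the constant `RESERVED_PORTS` are hypothetical names introduced here, and the only rule enforced is the reserved-port restriction described in this topic.

```python
import json

# Ports that the EAS engine listens on, according to this topic.
RESERVED_PORTS = {8080, 9090}


def build_service_config(name, image, command, port, liveness_check):
    """Assemble a service configuration shaped like the sample service.json."""
    if port in RESERVED_PORTS:
        raise ValueError(f"port {port} is reserved by the EAS engine")
    return {
        "metadata": {"name": name, "instance": 1, "enable_webservice": True},
        "cloud": {"computing": {"instance_type": "ml.gu7i.c16m60.1-gu30"}},
        "containers": [
            {
                "image": image,
                "liveness_check": liveness_check,
                "command": command,
                "port": port,
            }
        ],
    }


config = build_service_config(
    name="test",
    image="registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
    command="/data/eas/ENV/bin/python /data/eas/app1.py",
    port=8000,
    liveness_check={
        "http_get": {"path": "/", "port": 8000},
        "initial_delay_seconds": 3,
        "period_seconds": 3,
        "timeout_seconds": 1,
        "success_threshold": 2,
        "failure_threshold": 4,
    },
)

# Serialize with the same indentation as the sample; write this string to service.json.
service_json = json.dumps(config, indent=4)
print(service_json)
```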

The key parameters are described in the following table. For other parameters, see JSON deployment.

Parameter		Description
image		The address of the custom image used to deploy the model service. EAS doesn't allow public network access. Use a VPC-internal registry address instead, for example: `registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz`.
env	name	The name of an environment variable passed to the container at runtime.
env	value	The value of the environment variable.
command		The container entrypoint command. Only a single command is supported — complex scripts aren't allowed. For example: `/data/eas/ENV/bin/python /data/eas/app.py`.
port		The network port that the process inside the container listens on, for example, 8000. Important: The port must match the port configured in the xxx.py file that the command field runs.
liveness_check (Note: this key specifies the liveness probe as the health check probe type. To use a readiness probe or a startup probe instead, name the key health_check or startup_check.)	http_get	Sends an HTTP GET request to port 8000. Sub-parameters: http_get.path: The HTTP server URL to check. The prefix is `http://localhost`. The path suffix is customizable and defaults to `/`. http_get.port: The HTTP server port to check. Two additional health check methods are supported. tcp_socket: Runs a TCP check using the container's IP address and port. The check succeeds when a TCP connection can be established. Configuration: `"tcp_socket":{ "port":8000 }`. exec: Runs a specified command inside the container. The check succeeds when the command exits with code 0. Configuration: `"exec":{ "command":[ "your_script", "with_args" ] }`.
	initial_delay_seconds	The delay before the first health check runs after the container starts. Defaults to 0 seconds.
	period_seconds	How frequently health checks run. Defaults to 10 seconds. A high frequency adds overhead to the pod; a low frequency delays detection of container errors.
	timeout_seconds	The timeout for each health check. Defaults to 1 second. A check that exceeds this duration is considered failed.
	success_threshold	The number of consecutive successful checks required to mark a previously unhealthy container as healthy. Defaults to 1.
	failure_threshold	The number of consecutive failed checks required to mark a previously healthy container as unhealthy. Defaults to 3 for readiness probes and 1 for liveness and startup probes.
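The probe key and the check method can be varied independently. As a rough illustration, the helpers below assemble the JSON fragment for any combination of the keys and methods listed in the table. The helper names (`http_get`, `tcp_socket`, `exec_command`, `probe`) and the `PROBE_KEYS` mapping are hypothetical conveniences for this example, not an EAS SDK.

```python
import json

# Probe type -> JSON key, as listed in the parameter table.
PROBE_KEYS = {
    "liveness": "liveness_check",
    "readiness": "health_check",
    "startup": "startup_check",
}


def http_get(path, port):
    return {"http_get": {"path": path, "port": port}}


def tcp_socket(port):
    return {"tcp_socket": {"port": port}}


def exec_command(*argv):
    return {"exec": {"command": list(argv)}}


def probe(kind, method, **timing):
    """Return a one-key dict that can be merged into a container entry."""
    return {PROBE_KEYS[kind]: {**method, **timing}}


# A readiness probe that uses a TCP check every 5 seconds.
readiness = probe("readiness", tcp_socket(8000), period_seconds=5, timeout_seconds=1)
print(json.dumps(readiness))

# A startup probe that runs a command inside the container.
startup = probe("startup", exec_command("your_script", "with_args"), failure_threshold=4)
print(json.dumps(startup))
```

Merging one of these fragments into a container entry, for example with `container.update(readiness)`, yields the same shape as the `liveness_check` block in the sample file.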