Platform For AI: Configure health checks

Last Updated: Apr 01, 2026

Configure liveness, readiness, and startup probes using HTTP GET, TCP socket, or custom commands to monitor container health and route traffic away from failed instances automatically.

Limitations

Health checks are available only for services deployed using custom images that include health check logic.

How it works

EAS uses the Kubernetes health check mechanism. Each probe monitors a specific aspect of container health and takes a distinct action when it fails.

| Probe | What it checks | What happens when it fails |
| --- | --- | --- |
| Liveness probe | Whether the container is running | The kubelet kills the container and applies the restart policy. Without a liveness probe, the kubelet assumes the container is always healthy. |
| Readiness probe | Whether the container is ready to serve requests | The pod's IP address is removed from the Endpoint list, stopping traffic to it. When the container recovers, the IP address is added back. |
| Startup probe | Whether the container application has finished starting | Liveness and readiness checks are blocked until this probe succeeds, preventing premature termination of slow-starting containers. |

Three check methods determine how each probe tests the container:

| Method | How it works | Success condition |
| --- | --- | --- |
| http_get | Sends an HTTP GET request to the container | Response status code ≥ 200 and < 400 |
| tcp_socket | Opens a TCP connection to the container | TCP connection is established |
| exec | Runs a command inside the container | Command exits with code 0 |
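To check locally that an image would satisfy each method before deploying it, the three success conditions can be approximated with the Python standard library. This is a minimal sketch, not the EAS or kubelet implementation: the function names and the 1-second default timeout are illustrative assumptions, and `urllib` follows redirects, so the final status code is the one evaluated.

```python
import socket
import subprocess
import urllib.error
import urllib.request


def http_get_healthy(url, timeout=1.0):
    """Approximate http_get: healthy when the status code is >= 200 and < 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses, refused connections, and timeouts count as failures.
        return False


def tcp_socket_healthy(host, port, timeout=1.0):
    """Approximate tcp_socket: healthy when a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_healthy(command, timeout=1.0):
    """Approximate exec: healthy when the command exits with code 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a timeout counts as a failure.
        return False
```

For example, with the container running locally, `http_get_healthy("http://localhost:8000/")` mirrors an http_get probe on path / and port 8000.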

Prerequisites

Before you begin, ensure that you have:

  • A custom image with health check logic built in

  • The image pushed to a VPC internal registry (EAS has no public network access), for example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz

Prepare a custom image

Use a web framework to encapsulate your prediction logic. The following example uses Flask:

import json
from flask import Flask, request, make_response

app = Flask(__name__)

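@app.route('/', methods=['GET', 'POST'])
def process_handle_func():
    # An http_get probe sends a GET request with an empty body. Answer it
    # before parsing JSON; otherwise json.loads('') raises and the probe fails.
    if request.method == 'GET':
        return make_response('ok', 200)
    # Parse the request body based on your requirements.
    data = request.get_data().decode('utf-8')
    body = json.loads(data)
    res = process(body)
    # Set the response based on your requirements.
    response = make_response(res)
    response.status_code = 200
    return response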

def process(data):
    """
    Your prediction logic
    """
    return 'result'

if __name__ == '__main__':
    # Set host to '0.0.0.0'. Otherwise, the health check fails during service deployment.
    # The port must match the port specified in the JSON deployment configuration.
    app.run(host='0.0.0.0', port=8000)

Write a Dockerfile to copy the prediction code and install required packages:

# This example uses Python.
FROM registry.cn-shanghai.aliyuncs.com/eas/bashbase-amd64:0.0.1
COPY ./process_code  /eas
RUN /xxx/pip install required_packages
CMD ["/xxx/python", "/eas/xxx/app.py"]

For steps on building a custom image, see Build images on a Container Registry Enterprise Edition instance and Custom images. Alternatively, store your code in a NAS file system or Git repository and attach it via storage mount. For more information, see Storage mount.

Configure health checks

Console deployment

  1. Log on to the PAI console. Select a region, then select your workspace and click Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. Under Custom Model Deployment, click Custom Deployment.

  3. In the Environment Information section, configure the following key parameters:

    | Parameter | Description |
    | --- | --- |
    | Image Configuration | Select Image Address and enter your custom image address, for example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz |
    | Command | Entry command for the image. Only a single command is supported; complex scripts are not. The command must match the Dockerfile CMD, for example: /data/eas/ENV/bin/python /data/eas/app.py. Also enter the port your container listens on (for example, 8000). The EAS engine reserves ports 8080 and 9090, so do not use them. |
    | Health Check | Turn on the Health Check switch. Configure the parameters and click OK. You can add up to three health checks, each with a unique probe type. |

  4. Configure health check parameters:

    | Parameter | Description | Default |
    | --- | --- | --- |
    | Probe Type | Select Liveness Probe, Readiness Probe, or Startup Probe. See How it works for when to use each. | |
    | Check Method | http_get: Sends an HTTP GET request. Healthy if the status code is ≥ 200 and < 400. tcp_socket: Performs a TCP check. Healthy if a connection can be established. exec (Custom health check): Runs a command. Healthy if the command exits with code 0. | |
    | Call Path | Available for http_get only. The URL suffix after http://localhost. | / |
    | Port Number | Available for http_get and tcp_socket. The port to check. | |
    | Command | Available for exec only. The console converts your input to the required format automatically. | |
    | Latency for Check Initialization | Seconds to wait after the container starts before the first check runs. | 15 |
    | Check Interval | Seconds between checks. A short interval increases pod overhead; a long interval delays failure detection. | 10 |
    | Check Timeout Period | Seconds before a check times out. A timeout counts as a failure. | 1 |
    | Check Success Threshold | Consecutive successes required after a failure to mark the container as healthy. | 1 |
    | Check Failure Threshold | Consecutive failures required after a success to mark the container as failed. | 3 (readiness), 1 (liveness/startup) |

  5. Click Deploy.

JSON deployment

Create a file named service.json. The following example configures a liveness probe using HTTP GET:

{
    "metadata": {
        "name": "test",
        "instance": 1,
        "enable_webservice": true
    },
    "cloud": {
        "computing": {
            "instance_type": "ml.gu7i.c16m60.1-gu30"
        }
    },
    "containers": [
        {
            "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
            "env": [
                {
                    "name": "VAR_NAME",
                    "value": "var_value"
                }
            ],
            "liveness_check": {
                "http_get": {
                    "path": "/",
                    "port": 8000
                },
                "initial_delay_seconds": 3,
                "period_seconds": 3,
                "timeout_seconds": 1,
                "success_threshold": 2,
                "failure_threshold": 4
            },
            "command": "/data/eas/ENV/bin/python /data/eas/app1.py",
            "port": 8000
        }
    ]
}

This example uses liveness_check for a liveness probe. Use health_check for a readiness probe, or startup_check for a startup probe.
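When a service needs more than one probe, generating the containers entry from code can help keep the three probe blocks consistent. The following is a rough sketch only: `build_probe` and `build_container` are hypothetical helper names, the timing values are illustrative rather than recommended, and the output is a fragment to merge into a complete service.json rather than a full configuration.

```python
import json

RESERVED_PORTS = {8080, 9090}  # reserved by the EAS engine, per this document
CHECK_METHODS = ("http_get", "tcp_socket", "exec")


def build_probe(method, target, **timing):
    """Return one probe block, for example {"http_get": {...}, "period_seconds": 10}."""
    if method not in CHECK_METHODS:
        raise ValueError(f"unknown check method: {method}")
    probe = {method: target}
    probe.update(timing)  # optional timing parameters such as timeout_seconds
    return probe


def build_container(image, command, port, probes):
    """Return one entry for the "containers" list, rejecting reserved ports."""
    if port in RESERVED_PORTS:
        raise ValueError(f"port {port} is reserved by the EAS engine")
    container = {"image": image, "command": command, "port": port}
    container.update(probes)
    return container


container = build_container(
    image="registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
    command="/data/eas/ENV/bin/python /data/eas/app.py",
    port=8000,
    probes={
        # Illustrative values: allow a slow start before the other probes begin.
        "startup_check": build_probe(
            "tcp_socket", {"port": 8000}, period_seconds=10, failure_threshold=30
        ),
        "liveness_check": build_probe("http_get", {"path": "/", "port": 8000}),
        # "your_script" is the placeholder command used in this document.
        "health_check": build_probe("exec", {"command": ["your_script", "with_args"]}),
    },
)

service_fragment = {"containers": [container]}
print(json.dumps(service_fragment, indent=4))
```

Each probe key appears at most once per container, which matches the console limit of one health check per probe type.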

Container and image parameters

| Parameter | Description |
| --- | --- |
| image | VPC internal registry address of the custom image. EAS has no public network access, so use a VPC address, for example: registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz |
| env.name | Name of the environment variable |
| env.value | Value of the environment variable |
| command | Entry command for the image. Single commands only, for example: /data/eas/ENV/bin/python /data/eas/app.py |
| port | Port the process listens on. Must match the port in the file referenced by command. |

Probe configuration parameters

All three probe types (liveness_check, health_check, startup_check) share the same timing parameters.

| Parameter | Description | Default |
| --- | --- | --- |
| http_get.path | Access path for the HTTP check, prefixed with http://localhost. | / |
| http_get.port | Port for the HTTP check. | |
| tcp_socket.port | Port for the TCP check. Example: {"port": 8000} | |
| exec.command | Command to run in the container. Example: {"command": ["your_script", "with_args"]} | |
| initial_delay_seconds | Seconds to wait after the container starts before the first check runs. | 0 |
| period_seconds | Seconds between checks. | 10 |
| timeout_seconds | Seconds before a check times out. A timeout counts as a failure. | 1 |
| failure_threshold | Consecutive failures required after a success to mark the container as failed. | 3 (readiness), 1 (liveness/startup) |
| success_threshold | Consecutive successes required after a failure to mark the container as healthy. | 1 |
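The interaction between success_threshold and failure_threshold is easier to see as a small state machine. The sketch below only models the behavior as the table describes it, using a hypothetical `ProbeTracker` class; it is not how EAS or the kubelet is implemented.

```python
class ProbeTracker:
    """Track one probe's healthy/failed state from a stream of check results."""

    def __init__(self, success_threshold=1, failure_threshold=3, healthy=True):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.healthy = healthy
        self._streak = 0   # length of the current run of identical results
        self._last = None  # result of the previous check

    def record(self, ok):
        """Record one check result and return the resulting state."""
        self._streak = self._streak + 1 if ok == self._last else 1
        self._last = ok
        if ok and not self.healthy and self._streak >= self.success_threshold:
            self.healthy = True
        elif not ok and self.healthy and self._streak >= self.failure_threshold:
            self.healthy = False
        return self.healthy


# Thresholds taken from the service.json example in this document.
tracker = ProbeTracker(success_threshold=2, failure_threshold=4)
results = [False, False, False, False, True, True]
states = [tracker.record(r) for r in results]
print(states)  # [True, True, True, False, False, True]
```

With these values, the container is marked as failed only on the fourth consecutive failure and recovers on the second consecutive success. Because checks run every period_seconds, detecting a failure takes roughly failure_threshold × period_seconds, which is about 12 seconds for the example's period of 3.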

For other parameters, see JSON deployment parameters and Deploy a custom inference service.