
Platform for AI: EAS FAQ

Last Updated: Aug 20, 2025

This topic answers frequently asked questions (FAQs) about online prediction services.

Abnormal service status

1. The service remains in the Waiting state for a long time. How do I resolve this?

After deployment, a service enters the Waiting state while it awaits resource scheduling and service instance startup. When all service instances have successfully started, the service enters the Running state. If a service remains in the Waiting state for an extended period, you can typically identify the cause by viewing the status and logs of the service instances in the Service Instance list on the Overview page. Possible causes include the following:

1. Insufficient resources: All or some instances in the service instance list are in the Pending state.

This issue usually occurs because the dedicated resource group has insufficient idle resources, which prevents instances from being scheduled.

In this case, check whether the machine nodes in the dedicated resource group have enough idle resources, including CPU, memory, and GPU resources. For example, if an instance requires 3 CPU cores and 4 GB of memory, at least one machine node in the dedicated resource group must have 3 idle cores and 4 GB of idle memory.

Important

To prevent system failures during high-load periods, each machine node reserves one CPU core for system components. The schedulable resources are the total resources minus this reserved core.

You can view the node list of a dedicated resource group in the console. For more information, see Use EAS resource groups.

2. Instance health check is not complete: The service instance is in the Running state, but the container status is typically [0/1] or [1/2].

The number before the forward slash (/) indicates the number of containers that have started successfully, and the number after it indicates the total number of containers. When you deploy a service using a custom image, a sidecar container is automatically injected into the instance for traffic shaping, monitoring, and other tasks. You do not need to manage this container. In the console, the total number of containers is therefore 2: your custom container and the engine's sidecar container. The service instance is considered started and begins to receive traffic only after both containers are in the Ready state.

3. Instance health check failed: The port configured for the EAS service is inconsistent with the port set in the code.

Problem description: An EAS service is deployed that uses Flask (or other web frameworks such as FastAPI, Sanic, or Django) to provide API operations. The log shows Running on http://127.0.0.1:7000:


However, the PAI console shows that the EAS service is still in the Waiting state:


Cause: The worker of the EAS service failed the health check. The port exposed by the worker is 8089, but Flask is providing services on port 7000.


Solution: Change the port configured for the EAS service to match the port used in the code, and then restart the service.
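As an illustration, a minimal Flask app that avoids this mismatch binds to the same port configured for the EAS service (8089 in the example above). The route and handler below are hypothetical, not part of any specific EAS template:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Must match the port configured for the EAS service (8089 in this example).
SERVICE_PORT = 8089

@app.route("/", methods=["POST"])
def predict():
    # Echo handler for illustration only.
    return jsonify({"received": request.get_json(silent=True)})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the health check can reach the process.
    app.run(host="0.0.0.0", port=SERVICE_PORT)
```

The same principle applies to FastAPI, Sanic, and Django: whatever port the framework listens on must be the port declared in the EAS service configuration.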

2. The service is in the Failed state. How do I resolve this?

A service enters the Failed state in the following two scenarios:

  • During service deployment: If a specified resource, such as a model address, does not exist during deployment, the cause of the error is displayed in the service's current status information. You can usually determine the cause of the deployment failure from this error message.

  • During service startup: The service fails to start after it is deployed and scheduled to resources. In this case, the status message Instance <network-test-5ff76448fd-h9dsn> not healthy: Instance crashed, please inspect instance log. appears.

    This status message indicates that a service instance failed to start. You must check the failed status in the Service Instances list on the service's Overview page to determine the specific cause. The following are possible causes for instance failure:

    • The service instance was terminated by the system due to an out-of-memory (OOM) error during startup. In this case, you must increase the service memory and redeploy the service.

    • A service may crash because of a code error during startup. In this case, the Last Status is Error (error code). Click Log in the Actions column for the instance to check the service logs and identify the cause of the failure.

    • The service image failed to be pulled. For more information, see What do I do if an image fails to be pulled (ImagePullBackOff)?.

3. The EAS service automatically restarts after being stopped

Problem description: An EAS service automatically restarts some time after it is stopped.

Cause:

This occurs because Auto Scaling is configured for the service, and the minimum number of instances is set to 0. After a period with no traffic, the number of instances is automatically scaled in to 0. If a request arrives when no instances are available, a scale-out is automatically triggered, even if the configured scale-out metric threshold is not met.

You can determine whether a scale-out was automatically triggered based on the auto scaling description in the deployment events.

Solution:

  • If you no longer need the service, you can delete it.

  • If you do not want to delete the service, you can manually stop it by clicking Stop in the console or by calling the StopService API operation. A manually stopped service cannot be scaled out by traffic.

  • If you do not want the service to be automatically stopped due to elastic scaling, do not set the minimum number of instances to 0.

  • You can also disable Auto Scaling as needed to prevent unexpected traffic from triggering a scale-out.

4. PAI-EAS startup error: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })

Problem description:

PAI-EAS reports the following error upon startup:

[2024-10-21 20:59:33] serialize_file(_flatten(tensors), filename, metadata=metadata)

[2024-10-21 20:59:33] safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })

[2024-10-21 20:59:35] time="2024-10-21T12:59:35Z" level=info msg="program stopped with status:exit status 1" program=/bin/sh

Cause: The system disk of the EAS instance is full because of too many model files, which prevents the service from starting properly.

Solution:

Solution 1: Scale out the system disk for the EAS instance.

Solution 2: If the model files are too large, you can store them in external storage, such as OSS or NAS, and read them using storage mounts.

5. Deployment error: fail to start program with error: fork/exec /bin/sh: exec format error

The exec format error indicates that the operating system cannot execute the target program file. The most common cause is that the CPU architecture of the executable file or container image is incompatible with the host system architecture.

To resolve this issue, rebuild the image or executable for the CPU architecture of the target machine, or switch to a resource specification whose architecture matches your program.

6. Error: Invalid GPU count 6, only supported: [0 1 2 4 8 16]

To maximize communication efficiency between multiple GPUs, the number of GPUs specified for a single service must be 0 or a power of 2.

You can allocate 0, 1, 2, 4, 8, or 16 GPUs.

Image issues

1. What do I do if an image fails to be pulled (ImagePullBackOff)?

If you see ImagePullBackOff as the Last Exit Reason in the service instance list, the image pull has likely failed. If the following icon appears in the Status column, you can click it to view the specific cause.


Common causes for image pull failures include the following:

  • Insufficient system disk space

    Possible error: no space left on device

    Solution: Scale out the system disk.

  • ACR access control not configured

    Possible error: no such host

    Solution:

    • If you use the public endpoint of the image, enable public access for ACR.

    • If you use the internal endpoint of the image:

      1. Add a VPC, such as eas_vpc, for EAS.

      2. In the access control settings of the ACR Enterprise instance, add eas_vpc. For more information, see Configure access control for a VPC for ACR.

  • EAS network configuration issue

    Possible error: dial tcp ***** timeout

    Solution: If you use the public endpoint of the image, configure Internet access for EAS.

  • Missing or incorrect authentication information

    Possible errors: 401 Unauthorized, authorization failed

    Solution: If an ACR Enterprise instance is not configured for public anonymous pulls and you pull the image from another region over the Internet, configure the username and password for the image repository during deployment. For more information about how to obtain the credentials, see Configure access credentials.

The following are recommendations based on the regions of the image service and the EAS service:

  • Same region: We recommend that you use the internal endpoint of the image to pull it.

  • Different regions: For ACR Personal Edition, you can only use the public endpoint of the image. For ACR Enterprise Edition, you can choose one of the following options:

    • If you have high requirements for security and stability, you can use the internal endpoint of the image. You must connect the VPCs using CEN. For more information, see Access an ACR Enterprise instance from another region or an IDC.

    • If the business scenario is simple or you cannot connect to the internal network, you can use the public endpoint of the image as a temporary solution. The download speed over the public internet is slow.

Note the following points about ACR Enterprise instances:

  • You must configure access control for VPCs and the public internet as needed.

  • If the repository is not configured for public anonymous pulls, you must configure the username and password for the image repository in the EAS service when you pull the image from another region using the public endpoint.

2. Can I download official EAS images from the Internet?

No. PAI official images are internal platform images. You can use them only on the PAI platform. You cannot download them outside the platform's containers.

Computing resource usage

1. The dedicated resource group is always scaling out

This is usually because resources are insufficient in the current region.

For subscription machine instances, if creation fails due to insufficient resources, the system automatically creates a refund order. The paid amount is returned to your account.

2. How do I delete a subscription instance from a dedicated resource group?

You can go to the Alibaba Cloud Unsubscribe page to unsubscribe from unused EAS subscription dedicated machines. On the page, perform the following steps:

  • For Type, select Partial Unsubscribe.

  • For Product Name, select EAS Dedicated Machine Subscription.

Click Search to find the resource you want to unsubscribe from. Then, click Unsubscribe Resource in the Actions column and follow the instructions in the console to complete the process.

3. Will service instance data be retained after I unsubscribe from an EAS resource group machine?

No, service instance data is not retained.

4. Why can't I select a 1-core, 2 GB resource configuration when deploying an EAS service?

To prevent issues, the 1-core, 2 GB resource specification is no longer available. EAS deploys system components on each machine node, and these components consume some machine resources. If the machine specification is too small, the proportion of resources occupied by system components is too high, which reduces the proportion of resources available for your use.

5. What is the maximum number of services that can be deployed in EAS?

The maximum number of service instances that you can deploy in EAS depends on the remaining resource usage. You can view the remaining usage in the machine list of the resource group in the console. For more information, see Use EAS resource groups.

If you allocate instances based on the number of CPU cores, the maximum number of instances that a single node can host is (Number of CPU cores on the node - 1) / Number of cores required by each instance, rounded down. One core is subtracted because each node reserves it for system components.
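A minimal sketch of this calculation, using a hypothetical node size and per-instance requirement:

```python
def max_instances(node_cores: int, cores_per_instance: int) -> int:
    """Schedulable cores are the node total minus the one core reserved
    for system components; instances must fit in whole units."""
    return (node_cores - 1) // cores_per_instance

# A 16-core node running instances that each request 3 cores:
print(max_instances(16, 3))  # 5 instances
```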

6. What EAS specification has computing power similar to a 4090 graphics card?

The ecs.gn8ia-2x.8xlarge specification has performance similar to a 4090 graphics card.

7. What is the maximum concurrency supported after a model is deployed with a specific resource specification?

The maximum concurrency of a model service is related to multiple factors, such as the model, scenario, and resource configuration. We recommend that you use automatic stress testing to determine the service performance.

Service management

1. Can I connect to an EAS instance using SSH?

No. EAS does not support remote SSH connections, and you cannot enter the container to debug. If you need to execute commands, we recommend that you configure them in the run command.

2. What are the EAS service statuses?

EAS services can have the following statuses. You can also go to the Elastic Algorithm Service (EAS) page and view the Service Status column.

  • Creating: The service is being created.

  • Waiting: The service is waiting for instances to start.

  • Stopped: The service is stopped.

  • Failed: The service has failed.

  • Updating: The service is being updated. Instances will be updated.

  • Stopping: The service is being stopped.

  • HotUpdate: The service is being updated. This is a hot update and instances are not updated.

  • Starting: The service is starting.

  • DeleteFailed: The service failed to be deleted.

  • Running: The service is running.

  • Scaling: The service is being updated. Instances are being scaled.

  • Pending: The service is waiting for a specific action.

  • Deleting: The service is being deleted.

  • Completed: The task is complete.

  • Preparing: The service is being prepared.

Service invocation

1. A service invocation error occurs, and a status code such as 404, 401, or 504 is returned.

404 Not Found

A 404 error is usually caused by an invalid request path, an incorrect request body, or the service not supporting the requested API operation. You can troubleshoot the issue based on the specific error message you receive.

Error type 1: {"object":"error","message":"The model `` does not exist.","type":"NotFoundError","param":null,"code":404}

Cause: When you call the /v1/chat/completions API operation of a service deployed with vLLM, the value of the model parameter in the request body is empty or invalid.


Solution: The value of the model parameter must be a valid model name. You can query the available model names by calling the v1/models API operation.
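As a sketch, the request body for an OpenAI-compatible /v1/chat/completions operation can be validated before it is sent. The model name below is hypothetical; use a name returned by your service's v1/models operation:

```python
def build_chat_payload(model_name: str, prompt: str, stream: bool = False) -> dict:
    """Build a /v1/chat/completions request body. The model field must be
    a valid name returned by the v1/models operation; it must not be empty."""
    if not model_name:
        raise ValueError("model must not be empty; query v1/models for valid names")
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = build_chat_payload("qwen2-7b-instruct", "Hello")  # hypothetical model name
```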

Error type 2: {"detail":"Not Found"}

Cause: The request path is incomplete or incorrect. For example, when you call the chat API operation of an LLM service, you do not append v1/chat/completions to the base endpoint.


Solution: Make sure the API request path is complete and correct. For LLM services, see LLM invocation.

Error type 3: Calling the /v1/models API operation of BladeLLM returns 404: Not Found.

Cause: The service deployed with BladeLLM does not support the v1/models API operation.


Solution: For a list of supported API operations, see BladeLLM service invocation parameter configuration.

Error type 4: The online debugging page returns 404 with no other information.

Cause: The request path is incorrect. For example, when you debug online, the base endpoint is typically http://123***.cn-hangzhou.pai-eas.aliyuncs.com/predict/service_name. A 404 error occurs if you incorrectly modify or delete the service name.


Solution: When you debug online, you typically do not need to delete or modify the default endpoint. Just append the specific API path that you want to call.

Error type 5: An API call to a ComfyUI service returns a 404 Not Found page.

Cause: You are calling a Serverless version of a ComfyUI service using an API call. This version does not support API calls.

Solution: Deploy the Standard Edition or API Edition. For more information, see AI video generation - ComfyUI deployment.

Returns 401 Authorization Failed

This error occurs if the token is not specified, is incorrect, or is used incorrectly when you access the service. You can check the following:

  • Check whether the token is correct. On the service Overview page, click View Invocation Information in the Basic Information section.

    Note

    The authentication token is automatically generated by default. You can also specify a token using custom authentication and modify the token when you update the service.

  • Check whether the token is set correctly.

    • If you use the curl command, you can add the token to the Authorization field of the HTTP request header. For example: curl -H 'Authorization: NWMyN2UzNjBiZmI2YT***' http://xxx.cn-shanghai.aliyuncs.com/api/predict/echo.

    • If you use an SDK to access the service, you can call the corresponding SetToken() function. For more information, see Java SDK usage instructions.

The request has no response for a long time and eventually times out (timeout / 504)

This error occurs when the server is acting as a gateway or proxy but does not receive a timely response from the upstream server. This usually means that model inference is taking too long. To resolve this issue, you can perform the following steps:

  1. In the calling code, you can increase the timeout period for the HTTP request.

  2. For long-running tasks, we recommend that you use the EAS queue service (asynchronous invocation) mode. This mode can handle batch or long-running inference tasks.
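A minimal sketch of step 1 using the standard library: build the request with the token in the Authorization header, then pass a longer timeout when opening it. The URL and token are placeholders:

```python
import json
import urllib.request

def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with the EAS token in the Authorization header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
    )

req = build_request("http://<service_url>", "<service_token>", {"prompt": "..."})
# Allow up to 5 minutes for long-running inference instead of the client default:
# resp = urllib.request.urlopen(req, timeout=300)
```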

For more information about status codes, see Appendix: Service status codes and common errors.

2. Does the PAI-EAS service support HTTPS calls?

Yes, it does. You can simply replace http with https for more secure, encrypted transmission. Make sure that the calling environment supports HTTPS certificate validation. If you encounter an SSL Certificate validation error in the client, such as with Python requests, this is usually a client environment configuration issue and not an issue with the EAS service.

3. How do I block HTTP access and allow only HTTPS access?

You cannot block HTTP access on a shared gateway. For a dedicated gateway, you can enable HTTPS Redirection. This redirects all HTTP requests to the HTTPS protocol.


4. Can I use my own domain name for invocation?

Yes, you can. You need to use a dedicated gateway. For more information, see Invoke a service using a dedicated gateway.

5. Does the token expire or change?

No, it does not. After a service is deployed, the token is valid for a long time. Restarting the service does not change the token. The token changes only if you manually reset it by modifying it with custom authentication or if you delete the service.

6. Can I create multiple tokens for one service?

No, you cannot. An EAS service instance supports only one fixed authentication token. You cannot create multiple tokens for a single service for access control or separate metering. To implement multi-user authentication management, we recommend that you use a more complex solution, such as Alibaba Cloud RAM authentication.

7. How do I enable streaming responses for a deployed LLM service?

The EAS service itself cannot be configured for default streaming responses during deployment. You must explicitly specify that you want streaming output in the body of each API call request.

For example, when you call an LLM service, you can add the "stream": true parameter to the JSON request body.
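OpenAI-compatible LLM services, such as vLLM deployments, return streamed output as server-sent events, one "data: ..." line per chunk. A minimal parsing sketch; the line format shown is an assumption based on the OpenAI-compatible protocol, not EAS-specific behavior:

```python
import json

def parse_sse_lines(lines):
    """Collect JSON chunks from 'data: ...' lines, stopping at [DONE]."""
    chunks = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break
        chunks.append(json.loads(data))
    return chunks

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(sample))
print(text)  # Hello
```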

8. What is the difference between VPC endpoint invocation and VPC direct connection invocation?

  • VPC endpoint invocation: This method uses an internal-facing SLB and a gateway. This is a classic request model. In this model, requests are forwarded through Layer 4 of SLB and Layer 7 of the gateway to reach the service instance. In high-traffic and high-concurrency scenarios, this forwarding causes some performance overhead. The gateway also has a bandwidth limit, which is 1 Gbps by default.

  • VPC direct connection: EAS provides a high-speed direct connection access mode that resolves performance and scalability issues at no extra cost. After you enable VPC direct connection, a network path is established between your VPC and the EAS service VPC. Your requests use the service discovery feature provided by EAS to locate the service and then initiate load-balanced service requests in the client code. This process requires you to use the SDK provided by EAS and set endpoint_type to DIRECT.

    For example, in the scenario described in Python SDK usage instructions, you can add the following line of code to switch from gateway invocation to direct connection:

    from eas_prediction import PredictClient, ENDPOINT_TYPE_DIRECT

    client = PredictClient('http://pai-eas-vpc.cn-hangzhou.aliyuncs.com', 'mnist_saved_model_example')
    client.set_token('M2FhNjJlZDBmMzBmMzE4NjFiNzZhMmUxY2IxZjkyMDczNzAzYjFi****')
    client.set_endpoint_type(ENDPOINT_TYPE_DIRECT)  # switch to direct connection
    client.init()

9. How do I use the curl command to call an EAS online service?

After an EAS online service is successfully deployed, you can use the curl command to call the service using its public or VPC endpoint. The procedure is as follows:

  1. Obtain the service endpoint and token.

    1. On the Elastic Algorithm Service (EAS) page, click the target service to go to its overview page.

    2. In the Basic Information section, click View Invocation Information.

    3. In the Invocation Information dialog box, on the Public Endpoint Invocation or VPC Endpoint Invocation tab, obtain the service endpoint and token.

  2. Use the curl command to call the service.

    Example:

    $ curl <service_url> -H 'Authorization: <service_token>' -d '[{"sex":0,"cp":0,"fbs":0,"restecg":0,"exang":0,"slop":0,"thal":0,"age":0,"trestbps":0,"chol":0,"thalach":0,"oldpeak":0,"ca":0}]'

    Where:

    • <service_url>: Replace this with the service endpoint you obtained.

    • <service_token>: Replace this with the token you obtained.

    • -d: Use this to configure the service request data.

Others

System disk

How do I scale out the system disk?

  • You can configure it in Resource Information > Extra System Disk, or you can directly modify the service JSON configuration as follows.


    "features": { "eas.aliyun.com/extra-ephemeral-storage": "40GB" }
  • When you use dedicated resources, if the extra system disk you want to configure is larger than the remaining system disk size of the machine, you must delete the current resource and purchase a new one. You can adjust the system disk size during the purchase.

Network issues

How does an EAS service access the Internet from within the service?

By default, an EAS service cannot access the public internet. To access the public internet, you must configure a VPC with internet access capabilities for the EAS service. For more information, see Network configuration.

Service storage mounts

Why can't I select an OSS bucket when deploying an EAS service?

When you deploy an EAS service, you can configure models and code using mounts. Make sure that the OSS bucket and NAS file system you use are in the same region as the EAS service. Otherwise, you cannot select them.

Permissions

Why can't a RAM user automatically create or delete the EAS service-linked role?

Only users that have been granted the required permissions can automatically create or delete the AliyunServiceRoleForPaiEas service-linked role. If a RAM user cannot automatically create or delete AliyunServiceRoleForPaiEas, add the corresponding access policy to the RAM user. The procedure is as follows:

  1. Create a custom policy using the following policy script. For more information, see Create a custom access policy.

    Access policy for creating or deleting a service-linked role

    {
      "Statement": [
        {
          "Action": "ram:CreateServiceLinkedRole",
          "Resource": "*",
          "Effect": "Allow",
          "Condition": {
            "StringEquals": {
              "ram:ServiceName": "eas.pai.aliyuncs.com"
            }
          }
        }
      ],
      "Version": "1"
    }
  2. Grant the custom policy that you created in the previous step to the target RAM user. For more information, see Method 1: Grant permissions to a RAM user on the RAM user page.

TensorFlow issues

For more information, see TensorFlow FAQ.