Tablestore: Why does an HTTP 5XX status code occur when I use Tablestore?

Last Updated: Apr 01, 2025

Problem description

When I use Tablestore, a 5XX HTTP status code is occasionally returned. The following table describes the 5XX HTTP status codes.

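| HTTP status code | Error code              | Error message                   |
|------------------|-------------------------|---------------------------------|
| 503              | OTSPartitionUnavailable | The partition is not available. |
| 503              | OTSServerUnavailable    | Server is not available.        |
| 503              | OTSServerBusy           | Server is busy.                 |
| 503              | OTSTimeout              | Operation timeout.              |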

Cause

Tablestore is a table-based, distributed, serverless storage service. The Tablestore server automatically performs load balancing based on the data volume and access status of each partition. This way, data storage and access capabilities scale seamlessly beyond the capacity limit of a single server.

Tablestore distributes data to different partitions based on the order of the first primary key column. Different partitions are distributed to different service nodes, which provide data reading and writing services.

The dynamic load balancing mechanism of Tablestore detects when a partition, such as P1, stores a large amount of data or receives too many requests. In this case, the partition is split into two partitions, such as P1 and P5, and the two partitions are distributed to service nodes that have lower loads.
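The range partitioning and split behavior described above can be illustrated with a small toy model. This is only a sketch, not Tablestore's implementation: the class `RangePartitionedTable`, its methods, and the partition naming scheme are hypothetical, and it assumes that each partition owns a contiguous range of first-primary-key values.

```python
import bisect


class RangePartitionedTable:
    """Toy model: each partition owns a contiguous range of first-primary-key values."""

    def __init__(self):
        # A new table starts with one partition that covers the whole key range.
        self.split_points = []  # sorted lower bounds of every partition except the first
        self.names = ["P1"]     # names[i] owns keys in [split_points[i-1], split_points[i])
        self._next_id = 2

    def locate(self, first_pk):
        """Return the name of the partition that owns this first-primary-key value."""
        return self.names[bisect.bisect_right(self.split_points, first_pk)]

    def split(self, name, split_key):
        """Split partition `name` at split_key; keys >= split_key move to a new partition."""
        idx = self.names.index(name)
        at_lower_bound = idx > 0 and self.split_points[idx - 1] == split_key
        if self.locate(split_key) != name or at_lower_bound:
            raise ValueError("split_key must fall strictly inside the partition's range")
        self.split_points.insert(idx, split_key)
        new_name = f"P{self._next_id}"
        self._next_id += 1
        self.names.insert(idx + 1, new_name)
        return new_name


table = RangePartitionedTable()
table.split("P1", "user_5000")          # P1 is split; the upper half becomes P2
print(table.locate("user_0001"))        # lower half stays in P1
print(table.locate("user_7500"))        # upper half is now served by P2
```

In this model a split only changes the routing metadata (`split_points` and `names`); no rows are copied, which mirrors the metadata-only change that the article describes.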

Note

When a data table is created, it has only one partition, which provides limited concurrent read and write capabilities. In addition, automatic load balancing takes effect with some delay. To address these issues, contact Tablestore technical support to split the data table into multiple partitions in advance.

With this automatic load balancing mechanism, Tablestore scales storage and concurrent access capabilities without human intervention at any point in the process.

Tablestore adopts a shared storage mechanism that uses partitions as logical units. Therefore, load balancing does not migrate data; it only changes the metadata of the data table. To ensure data consistency, specific partitions may become unavailable while the metadata is being changed. In normal cases, the unavailability lasts hundreds of milliseconds. When the partitions are under heavy load, it can last several seconds. If you perform read or write operations on these partitions during this period, one of the preceding errors may occur.

Solution

Retry the operation. Tablestore SDKs provide retry policies. You can specify retry policies when you initialize a Tablestore client.

Tablestore provides an API that follows the RESTful style. Because the network environment is not controllable, we recommend that you add a retry policy to all read and write operations so that they tolerate network errors.
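If you need retry logic outside the SDK-provided policies, the idea can be sketched as follows. This is a minimal illustration, not SDK code: `ServiceError` and `call_with_retry` are hypothetical names, and the backoff parameters are arbitrary assumptions. Only the four error codes come from the table above.

```python
import random
import time

# Error codes from the table above; all are returned with HTTP status code 503.
RETRYABLE_ERROR_CODES = {
    "OTSPartitionUnavailable",
    "OTSServerUnavailable",
    "OTSServerBusy",
    "OTSTimeout",
}


class ServiceError(Exception):
    """Hypothetical stand-in for the service exception that an SDK raises."""

    def __init__(self, http_status, error_code, message=""):
        super().__init__(f"{http_status} {error_code}: {message}")
        self.http_status = http_status
        self.error_code = error_code


def call_with_retry(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Run operation(), retrying transient 503 errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServiceError as err:
            retryable = (err.http_status == 503
                         and err.error_code in RETRYABLE_ERROR_CODES)
            if not retryable or attempt == max_attempts:
                raise  # not transient, or the retry budget is exhausted
            # The delay cap doubles on each attempt: 0.1 s, 0.2 s, 0.4 s, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # random jitter spreads out concurrent retries
```

A caller would wrap each request in a function, for example `call_with_retry(lambda: read_row(...))`, where `read_row` stands for whatever read or write call the application makes. Because a partition split normally completes within hundreds of milliseconds, a few short retries are usually enough in this sketch.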

Note

When you call the BatchWriteRow or BatchGetRow operation to write or read data in a batch, the data may be distributed to multiple partitions of one or more tables. One of these partitions may be in the middle of a split when you call the operation. Therefore, the batch operation is regarded as successful if at least one single-row operation succeeds. When HTTP status code 200 is returned, you still need to check getFailedRows() in the response for failed single-row operations.
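The per-row check can be combined with a retry that resends only the failed rows. The following sketch shows the pattern under stated assumptions: `RowResult`, `send_batch`, and `batch_write_with_row_retry` are hypothetical names used for illustration, not part of any Tablestore SDK, and a real response object exposes failed rows through its own accessors such as getFailedRows().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowResult:
    """Hypothetical per-row outcome inside a batch response."""
    row: dict
    ok: bool
    error_code: Optional[str] = None


def batch_write_with_row_retry(rows, send_batch, max_rounds=3):
    """Send rows in a batch and resend only the rows that failed.

    send_batch(rows) is a hypothetical callable that returns one RowResult per row,
    even when the batch as a whole is reported as successful.
    Returns the rows that still failed after max_rounds.
    """
    pending = list(rows)
    for _ in range(max_rounds):
        if not pending:
            break
        results = send_batch(pending)
        # Keep only the rows whose single-row operation failed.
        pending = [result.row for result in results if not result.ok]
    return pending
```

Rows returned by this function have failed in every round and need separate handling, such as logging or a later retry. In practice you would likely also wait briefly between rounds so that an in-progress partition split has time to finish.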