
Why is Error 500 returned occasionally when Table Store is used?

Last Updated: Apr 11, 2018

Symptom

When you use Table Store, the following 5xx errors may be reported occasionally:

HTTPStatus  ErrorCode                ErrorMsg
503         OTSPartitionUnavailable  The partition is not available.
503         OTSServerUnavailable     Server is not available.
503         OTSServerBusy            Server is busy.
503         OTSTimeout               Operation timeout.

Table Store is a fully distributed NoSQL service. The server automatically balances load based on the data volume and access pattern of each data partition, so data size and access concurrency can scale seamlessly beyond the limits of a single-host service.

Table Store divides data into data partitions based on the sorted order of the first primary key, and schedules the data partitions to different service nodes that serve reads and writes.
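As a rough illustration of partitioning by the order of the first primary key, the following sketch maps a key to the partition whose range contains it. The split points, partition names, and the `locate_partition` function are hypothetical and chosen only for this example. They do not reflect how Table Store stores or exposes its partition metadata.

```python
import bisect

# Hypothetical split points: each partition owns a contiguous range of
# first-primary-key values. Names and values are illustrative only.
SPLIT_POINTS = ["g", "n", "t"]          # boundaries between partitions
PARTITIONS = ["P1", "P2", "P3", "P4"]   # one more partition than split points


def locate_partition(first_pk: str) -> str:
    """Return the partition whose key range contains first_pk."""
    # bisect_right counts how many split points are <= first_pk,
    # which is the index of the owning partition.
    return PARTITIONS[bisect.bisect_right(SPLIT_POINTS, first_pk)]
```

Because rows are placed by key order, keys that share a prefix or increase monotonically tend to land in the same partition, which is why a single partition can become a hotspot.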

If the data or access volume of a data partition grows too large, Table Store detects the condition, splits the data partition into two data partitions, and then uses its dynamic load balancing capability to schedule the two data partitions to a service node under lighter load. In the following figure, Table Store splits P1 into P1’ and P5:

[Figure: LoadBalance]

As the preceding example shows, Table Store scales table data size and access concurrency automatically, with no manual intervention required. However, a newly created data table has only one data partition, which provides limited read/write concurrency, and automatic load balancing takes some time to react to increased load. To avoid this delay, you can open a ticket to request that the data table be divided into multiple data partitions in advance.
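The split-and-reschedule step described above can be pictured with a small model. This is a sketch under stated assumptions, not Table Store's actual algorithm: the `Partition` and `Node` classes are hypothetical, the split point is assumed to be the midpoint of an integer key range, and only the new half is moved to the least loaded node.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    start: int   # inclusive lower bound of the key range
    end: int     # exclusive upper bound of the key range
    load: int    # abstract load figure, e.g. requests per second


@dataclass
class Node:
    name: str
    partitions: list = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(p.load for p in self.partitions)


def split_and_rebalance(nodes, hot, new_name):
    """Split `hot` at the midpoint of its key range (an assumption) and
    place the new half on the least loaded node. Only ranges and
    placement change; no row data is copied in this model."""
    mid = (hot.start + hot.end) // 2
    new = Partition(new_name, mid, hot.end, hot.load // 2)
    hot.end = mid                 # the original keeps the lower half
    hot.load -= new.load          # total load is preserved
    target = min(nodes, key=lambda n: n.load)
    target.partitions.append(new)
    return new, target
```

For example, if node A holds P1 with load 60 and node B holds P2 with load 10, splitting P1 leaves A with load 30 and places the new partition on B.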

Table Store uses shared storage, and data partitions are logical units. During load balancing, only the data table metadata changes; no data is migrated. While the metadata changes, the data partitions involved may be briefly unavailable in order to preserve data consistency. This period normally lasts a few hundred milliseconds and may last several seconds when the data partitions are under heavy load. During this period, reads and writes to the data partition may return a 5xx error such as those listed above. Retrying the request resolves the problem. The official SDK provides some retry policies by default, and you can specify a retry policy when initializing the client.
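If you need a retry wrapper of your own, a minimal sketch might look like the following. It is not the SDK's built-in retry policy. `ServiceError`, `call_with_retry`, and the default delays are hypothetical names and values chosen for illustration, with the delays sized to cover an unavailability window of a few hundred milliseconds to several seconds.

```python
import random
import time


class ServiceError(Exception):
    """Hypothetical error type carrying the HTTP status and error code."""

    def __init__(self, http_status: int, code: str):
        super().__init__(f"{http_status} {code}")
        self.http_status = http_status
        self.code = code


def call_with_retry(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Run operation(), retrying 5xx errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServiceError as err:
            # Errors below 500 are treated as request problems and are not
            # retried. The last attempt re-raises whatever error occurred.
            if err.http_status < 500 or attempt == max_attempts:
                raise
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from many clients.
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so that the waiting behavior can be replaced in tests. In normal use you would pass just the operation, for example a function that performs a single read or write.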

Table Store provides standard RESTful APIs. Because the network environment is beyond your control, we recommend that you add retry policies to all read/write operations so that they tolerate network errors to a certain extent.

Note: A data partition may be splitting while BatchWriteRow or BatchGetRow writes or reads data in batches across multiple tables or multiple data partitions of a table, so the batch operation as a whole is not atomic. Atomicity is guaranteed for a single row only. Even if a 200 status is returned, you must check getFailedRows() in the response for rows that failed.
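One way to handle partial failures is to resubmit only the rows that failed. The sketch below assumes a hypothetical `batch_write` callable that takes a list of rows and returns the rows that failed. It stands in for sending a batch request and inspecting getFailedRows() on the response, and is not an SDK function.

```python
def write_all(batch_write, rows, max_rounds=3):
    """Call batch_write repeatedly, resubmitting only the rows that failed.

    batch_write is a hypothetical callable: it accepts a list of rows and
    returns the list of rows that failed in that batch.
    """
    pending = list(rows)
    for _ in range(max_rounds):
        if not pending:
            break
        # Rows that succeeded are dropped; only failures are sent again.
        pending = batch_write(pending)
    return pending  # rows that still failed after all rounds
```

A non-empty return value means some rows were never written, so the caller should log or surface them rather than treat the batch as successful. In practice you would probably also wait briefly between rounds.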
