If you upload a large number of objects that use sequential prefixes such as timestamps and letters in the object names, multiple object indexes may be stored in a single partition. If an excessive number of requests are sent to query these objects, the response time is increased. To resolve this issue, we recommend that you add random prefixes to the names of objects.

Background information

OSS partitions objects based on the object names that are encoded by using UTF-8. This way, a large number of objects can be processed and the response time is reduced for the requests. OSS supports only up to 2,000 queries per second (QPS) in the sequential read and write mode. If you use sequential prefixes such as timestamps and letters in object names when you upload a large number of objects, multiple object indexes may be stored in a single partition. For more information about the QPS of a single account, see Usage notes.

If the names of a large number of uploaded objects use sequential prefixes, the following issues occur when you call operations such as GET, PUT, DELETE, COPY, and HEAD more than 2,000 times per second, perform batch operations (such as batch delete) that include more than 2,000 operations per second, or list more than 2,000 objects per second:

  • The partition becomes a hotspot. The I/O capacity is exhausted, or the system automatically limits the request rate.
  • OSS repartitions the data to rebalance its distribution among partitions and reduce the number of hotspots. This process may increase the request processing time.
    Note: Repartitioning is performed based on an analysis of the system status and processing capability. Objects that use sequential prefixes in object names may still be stored in hotspots after repartitioning.

To resolve these issues, you can change the sequential prefixes in the object names to random prefixes so that object indexes and I/O loads are evenly distributed among different partitions.
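The effect described above can be illustrated with a toy model. The sketch below assumes a range-partitioned index with three made-up split points. The split points, the `partition_of` and `spread` helpers, and the sample names are hypothetical and do not reflect how OSS actually places partition boundaries.

```python
import bisect
import hashlib

# Hypothetical split points of a toy range-partitioned index (an assumption
# for illustration only; real OSS partition boundaries are internal).
SPLIT_POINTS = ["4", "8", "c"]  # four toy partitions


def partition_of(object_name: str) -> int:
    """Return the index of the toy partition whose key range holds the name."""
    return bisect.bisect_right(SPLIT_POINTS, object_name)


def spread(names):
    """Count how many names land in each toy partition."""
    counts = [0] * (len(SPLIT_POINTS) + 1)
    for name in names:
        counts[partition_of(name)] += 1
    return counts


# Names with a sequential (date) prefix, and the same names with a hash prefix.
sequential = [f"2017-11-11/customer-{i}/file{i}" for i in range(1000)]
hashed = [
    hashlib.md5(name.encode("utf-8")).hexdigest()[:4] + "/" + name
    for name in sequential
]

print(spread(sequential))  # all 1,000 names fall into the first toy partition
print(spread(hashed))      # the names are spread over all four toy partitions
```

In this model, every name that starts with the same date sorts into the same key range, whereas names that start with a hash are scattered across the key space.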

Solution

You can use the following methods to change sequential prefixes in object names to random prefixes:

  • Specify a hexadecimal hash as the prefix in an object name

    If you use dates and customer IDs to generate object names, sequential prefixes that use timestamps are included in object names as shown in the following examples:

    sample-bucket-01/2017-11-11/customer-1/file1
    sample-bucket-01/2017-11-11/customer-2/file2
    sample-bucket-01/2017-11-11/customer-3/file3
    ...
    sample-bucket-01/2017-11-12/customer-2/file4
    sample-bucket-01/2017-11-12/customer-5/file5
    sample-bucket-01/2017-11-12/customer-7/file6
    ...

    In this case, you can calculate the MD5 hash of the customer ID and use the first several characters of the hexadecimal hash as the object name prefix. If you use the first four characters of the hash as the prefix, the names of the objects are generated as shown in the following examples:

    sample-bucket-01/9b11/2017-11-11/customer-1/file1
    sample-bucket-01/9fc2/2017-11-11/customer-2/file2
    sample-bucket-01/d1b3/2017-11-11/customer-3/file3
    ...
    sample-bucket-01/9fc2/2017-11-12/customer-2/file4
    sample-bucket-01/f1ed/2017-11-12/customer-5/file5
    sample-bucket-01/0ddc/2017-11-12/customer-7/file6
    ...

    If you use the first four characters of the hexadecimal hash as the object name prefix, each character can take one of 16 values (0-9, a-f), so the four characters have 16^4 = 65,536 possible combinations. The data can therefore be distributed to up to 65,536 partitions in the storage system, and you can perform up to 2,000 operations per second on each partition. Compare your request rate against these figures to determine whether the number of hash buckets meets your business requirements.

    Because the hash prefix precedes the date, you cannot directly list only the objects in the sample-bucket-01 bucket whose names contain a specified date, such as 2017-11-11. Instead, you need to call the ListObject operation multiple times to obtain all objects in sample-bucket-01, and then select the objects whose names contain the specified date.

  • Reverse the order of digits that indicate milliseconds in object names

    If you use the UNIX timestamps that are accurate to milliseconds to generate object names, sequential prefixes are included in object names as shown in the following examples:

    sample-bucket-02/1513160001245.log
    sample-bucket-02/1513160001722.log
    sample-bucket-02/1513160001836.log
    sample-bucket-02/1513160001956.log
    ...
    sample-bucket-02/1513160002153.log
    sample-bucket-02/1513160002556.log
    sample-bucket-02/1513160002859.log
    ...

    In this case, you can reverse the order of the digits in the UNIX timestamp so that the object names no longer contain sequential prefixes. After you reverse the order of the digits, the object names look like the following examples:

    sample-bucket-02/5421000613151.log
    sample-bucket-02/2271000613151.log
    sample-bucket-02/6381000613151.log
    sample-bucket-02/6591000613151.log
    ...
    sample-bucket-02/3512000613151.log
    sample-bucket-02/6552000613151.log
    sample-bucket-02/9582000613151.log
    ...

    After the reversal, the first three digits indicate milliseconds, so 1,000 values are available. The fourth digit changes every second, and the fifth digit changes every 10 seconds. Reversing the digits increases the randomness of the prefixes, so requests are distributed evenly across partitions and performance bottlenecks are avoided.
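The two renaming methods above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation. The function names, the key layout (without the bucket name), and the sizing helper are assumptions made for this example, and the helper reflects only the best case in which every prefix maps to its own partition.

```python
import hashlib


def hashed_key(customer_id: str, date: str, filename: str, prefix_len: int = 4) -> str:
    """Prepend the first prefix_len hex characters of MD5(customer_id) to the key."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{date}/{customer_id}/{filename}"


def reversed_timestamp_key(timestamp_ms: int, suffix: str = ".log") -> str:
    """Reverse the decimal digits of a millisecond UNIX timestamp."""
    return str(timestamp_ms)[::-1] + suffix


def keys_for_date(keys, date: str):
    """Select keys that contain the date from a full listing of object names."""
    return [key for key in keys if f"/{date}/" in key]


def min_prefix_len(target_ops_per_sec: int, per_partition_ops: int = 2000) -> int:
    """Smallest hex prefix length whose 16**n buckets could, in the best case,
    carry the target rate at per_partition_ops operations per bucket."""
    length = 1
    while 16 ** length * per_partition_ops < target_ops_per_sec:
        length += 1
    return length


# The same customer ID always yields the same prefix, whatever the date.
print(hashed_key("customer-2", "2017-11-11", "file2"))
print(hashed_key("customer-2", "2017-11-12", "file4"))

print(reversed_timestamp_key(1513160001245))  # 5421000613151.log
print(min_prefix_len(100_000))                # 2
```

The prefixes printed by `hashed_key` depend on the exact string that is hashed, so they need not match the illustrative prefixes in the examples above. `keys_for_date` operates on names that were already retrieved, which mirrors the need to list the whole bucket before filtering by date.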