If you upload a large number of objects whose names use sequential prefixes, such as timestamps or alphabetical letters, the indexes of many objects may be stored in a single partition. If a large number of requests are then sent to query these objects, response latency may increase. To avoid this, we recommend that you add random prefixes to the names of your objects.

Background information

OSS stores objects in partitions based on the UTF-8-encoded object names so that it can handle a large number of objects and high request rates. However, if you use sequential prefixes such as timestamps and letters in object names when you upload a large number of objects, many object indexes may be stored in a single partition. In this case, if you send more than 2,000 PUT, COPY, POST, DELETE, or HEAD requests per second (the number of requests is counted as the number of objects on which the operations are performed), the following problems may occur:

  • The partition becomes a hotspot. The I/O capacity is exhausted, or the system automatically limits the request rate.
  • OSS repartitions the data to rebalance it across partitions and reduce hotspots. This process may increase request processing time.
    Note: Repartitioning and rebalancing are performed based on an analysis of the system status and processing capability, not on a fixed rule. Therefore, objects with sequential prefixes may still be stored in hotspots after repartitioning and rebalancing are performed.

Both effects limit the horizontal scalability of OSS and reduce the request rate that it can sustain.

To maintain the horizontal scalability and request rates, we recommend that you do not use sequential prefixes in object names. You can randomize prefix naming to evenly distribute object indexes and I/O loads to multiple partitions.

Solution

Two methods are provided to change sequential prefixes in object names to random prefixes:

  • Add a hex hash as the prefix to an object name

    If you use dates and customer IDs to generate object names, sequential prefixes with timestamps are included in object names as follows:

    sample-bucket-01/2017-11-11/customer-1/file1
    sample-bucket-01/2017-11-11/customer-2/file2
    sample-bucket-01/2017-11-11/customer-3/file3
    ...
    sample-bucket-01/2017-11-12/customer-2/file4
    sample-bucket-01/2017-11-12/customer-5/file5
    sample-bucket-01/2017-11-12/customer-7/file6
    ...

    In this case, you can calculate the MD5 hash of the customer ID and use the first several characters of the hash as the object name prefix. If a four-character hexadecimal hash prefix is used, the names of the objects are as follows:

    sample-bucket-01/2c99/2017-11-11/customer-1/file1
    sample-bucket-01/7a01/2017-11-11/customer-2/file2
    sample-bucket-01/1dbd/2017-11-11/customer-3/file3
    ...
    sample-bucket-01/7a01/2017-11-12/customer-2/file4
    sample-bucket-01/b1fc/2017-11-12/customer-5/file5
    sample-bucket-01/2bb7/2017-11-12/customer-7/file6
    ...

    The prefix consists of four hexadecimal characters taken from the hash. Each character can be any one of 16 values (0-9, a-f), so there are 16^4 = 65,536 possible combinations. In the storage system, the data can therefore be distributed to a maximum of 65,536 partitions, and a maximum of 2,000 operations can be performed on each partition per second. Based on the request rate of your business, you can determine whether this number of hash prefixes is sufficient.

    To list objects in the sample-bucket-01 bucket whose names contain a specified date such as 2017-11-11, you must list all objects in sample-bucket-01, because the date is no longer at the start of the object name. In other words, call the ListObject operation multiple times to obtain all objects in sample-bucket-01, and then select the objects whose names contain the specified date.

  • Reverse the order of the timestamp digits in object names

    If you use UNIX timestamps accurate to the millisecond to generate object names, the object names contain sequential prefixes as follows:

    sample-bucket-02/1513160001245.log
    sample-bucket-02/1513160001722.log
    sample-bucket-02/1513160001836.log
    sample-bucket-02/1513160001956.log
    ...
    sample-bucket-02/1513160002153.log
    sample-bucket-02/1513160002556.log
    sample-bucket-02/1513160002859.log
    ...

    In this case, you can reverse the order of the digits in the UNIX timestamp so that the object names contain no sequential prefixes. The object names after reversal are as follows:

    sample-bucket-02/5421000613151.log
    sample-bucket-02/2271000613151.log
    sample-bucket-02/6381000613151.log
    sample-bucket-02/6591000613151.log
    ...
    sample-bucket-02/3512000613151.log
    sample-bucket-02/6552000613151.log
    sample-bucket-02/9582000613151.log
    ...

    After reversal, the first three digits indicate milliseconds and can take 1,000 values. The fourth digit changes every second, and the fifth digit changes every 10 seconds. Reversal therefore greatly increases the randomness of prefixes, which distributes requests evenly across partitions, balances the load, and avoids performance bottlenecks.
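
The two naming methods above can be sketched in Python as follows. This is a minimal illustration, not an OSS SDK feature: the function names hashed_key, reversed_timestamp_key, and max_prefix_rate are hypothetical, and the sketch assumes that the hash prefix is taken from the start of the MD5 hex digest of the full customer ID. The prefixes it produces will not necessarily match the illustrative values in the listings above. The last function only restates the upper bound from the text (number of prefixes multiplied by 2,000 operations per second), under the assumption that every prefix is placed in its own partition.

```python
import hashlib


def hashed_key(date: str, customer_id: str, filename: str, prefix_len: int = 4) -> str:
    """Prepend the first prefix_len hex characters of the MD5 hash of the customer ID."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{date}/{customer_id}/{filename}"


def reversed_timestamp_key(timestamp_ms: int, suffix: str = ".log") -> str:
    """Reverse the digits of a millisecond UNIX timestamp so the fastest-changing digits come first."""
    return f"{str(timestamp_ms)[::-1]}{suffix}"


def max_prefix_rate(prefix_len: int, per_partition_rate: int = 2000) -> int:
    """Upper bound on operations per second, assuming one partition per hex prefix."""
    return (16 ** prefix_len) * per_partition_rate


# The same customer ID always yields the same prefix, so a customer's objects stay grouped.
print(hashed_key("2017-11-11", "customer-1", "file1"))
print(reversed_timestamp_key(1513160001245))  # 5421000613151.log
print(max_prefix_rate(4))                     # 131072000
```

Because the hash depends only on the customer ID, all objects of one customer share a prefix and can still be listed by that prefix. Hashing a different part of the name would change this property.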