If you upload a large number of objects whose names use sequential prefixes, such as timestamps or alphabetical sequences, the indexes of many objects may be stored in a single partition. If too many requests are then sent to query these objects, response times may increase.
Partitions and naming conventions
OSS partitions objects by their UTF-8-encoded names so that it can handle a large number of objects and high request rates. However, if you use sequential prefixes such as timestamps or alphabetical sequences in object names when you upload a large number of objects, the indexes of many objects may be stored in a single partition. In this case, if you send more than 2,000 PUT, COPY, POST, DELETE, or HEAD requests per second (each object that an operation acts on counts as one request), the following effects occur:
- The partition becomes a hotspot. The I/O capacity is exhausted, or the system automatically limits the request rate.
- The system repartitions the data to rebalance it across partitions and reduce hotspots. This process may increase request processing time.
Both effects limit the horizontal scalability of OSS and lower the request rate that it can sustain.
To maintain the horizontal scalability and request rates, we recommend that you do not use sequential prefixes in object names. You can randomize prefix naming to evenly distribute object indexes and I/O loads to multiple partitions.
The following examples show how to replace sequential prefixes with randomized prefixes.
- Example 1: Add a hex hash as the prefix to an object name
The following object names are generated from dates and customer IDs, so they begin with a sequential timestamp prefix:
sample-bucket-01/2017-11-11/customer-1/file1
sample-bucket-01/2017-11-11/customer-2/file2
sample-bucket-01/2017-11-11/customer-3/file3
...
sample-bucket-01/2017-11-12/customer-2/file4
sample-bucket-01/2017-11-12/customer-5/file5
sample-bucket-01/2017-11-12/customer-7/file6
...
In this case, you can calculate the MD5 hash of the customer ID and use the first several characters of the hash as the object name prefix. If a four-character hexadecimal hash prefix is used, the object names look similar to the following:
sample-bucket-01/2c99/2017-11-11/customer-1/file1
sample-bucket-01/7a01/2017-11-11/customer-2/file2
sample-bucket-01/1dbd/2017-11-11/customer-3/file3
...
sample-bucket-01/7a01/2017-11-12/customer-2/file4
sample-bucket-01/b1fc/2017-11-12/customer-5/file5
sample-bucket-01/2bb7/2017-11-12/customer-7/file6
...
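The prefixing step described above can be sketched in Python as follows. This is a minimal illustration, not OSS SDK code: the function name `add_hash_prefix` and its parameters are hypothetical, and the prefixes it prints depend on the exact string that is hashed, so they are not expected to match the illustrative values shown above.

```python
import hashlib


def add_hash_prefix(key: str, customer_id: str, prefix_len: int = 4) -> str:
    """Prepend the first `prefix_len` hex characters of MD5(customer_id) to `key`.

    Hypothetical helper for illustration; the key layout is an assumption.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Object keys without the bucket name, paired with the customer ID to hash.
samples = [
    ("2017-11-11/customer-1/file1", "customer-1"),
    ("2017-11-11/customer-2/file2", "customer-2"),
    ("2017-11-12/customer-2/file4", "customer-2"),
]

for key, customer_id in samples:
    # The same customer ID always yields the same prefix.
    print(add_hash_prefix(key, customer_id))
```

Because the prefix is derived only from the customer ID, all objects of one customer share a prefix, while different customers are spread across the prefix space.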
The prefix consists of four hexadecimal characters. Each character can take any of 16 values (0-9, a-f), so there are 16^4 = 65,536 possible prefixes. In theory, the data can therefore be distributed across up to 65,536 partitions. Each partition can handle up to 2,000 operations per second. Based on your request rate, you can determine whether this number of hash prefixes meets your business requirements.
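The arithmetic above can be turned into a rough sizing check. The sketch below assumes that requests are spread uniformly over the prefixes and that each prefix can map to its own partition, which the text describes only as a theoretical upper bound. The function names are hypothetical; the figure of 2,000 operations per second per partition is taken from the text.

```python
import math

PER_PARTITION_OPS = 2000  # per-partition limit cited in the text


def max_theoretical_ops(prefix_chars: int) -> int:
    """Upper bound on operations per second for a hex prefix of this length."""
    return (16 ** prefix_chars) * PER_PARTITION_OPS


def prefix_chars_needed(target_ops_per_second: int) -> int:
    """Smallest hex prefix length whose prefix count covers the target rate.

    Assumes a uniform spread of requests over prefixes (an idealization).
    """
    partitions = math.ceil(target_ops_per_second / PER_PARTITION_OPS)
    chars = 1
    while 16 ** chars < partitions:
        chars += 1
    return chars


print(16 ** 4)                       # number of four-character hex prefixes
print(max_theoretical_ops(4))        # theoretical ceiling with four characters
print(prefix_chars_needed(100_000))  # prefix length for 100,000 ops per second
```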
To list the objects in sample-bucket-01 whose names contain a specific date, such as 2017-11-11, you can list all objects in the bucket and filter the results. In other words, call the ListObject operation repeatedly until all objects in sample-bucket-01 are returned, and then keep the objects whose names contain the specified date.
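The filtering step can be sketched as follows. To stay self-contained, the example works on any iterable of object keys rather than calling OSS; in practice the keys would come from the repeated list calls described above. The function name `keys_for_date` and the assumed key layout `<hash>/<date>/<customer>/<file>` (without the bucket name) are illustrative assumptions.

```python
from typing import Iterable, Iterator


def keys_for_date(all_keys: Iterable[str], date: str) -> Iterator[str]:
    """Yield keys whose second path segment equals `date`.

    Assumes the layout <hash>/<date>/<customer>/<file>; hypothetical helper.
    """
    for key in all_keys:
        parts = key.split("/")
        if len(parts) >= 2 and parts[1] == date:
            yield key


# Stand-in for the full listing of the bucket.
listing = [
    "2c99/2017-11-11/customer-1/file1",
    "7a01/2017-11-11/customer-2/file2",
    "7a01/2017-11-12/customer-2/file4",
    "b1fc/2017-11-12/customer-5/file5",
]

for key in keys_for_date(listing, "2017-11-11"):
    print(key)
```

The trade-off is that a date query now scans the whole bucket listing, because the date is no longer the leading part of the key.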
- Example 2: Reverse the order of digits that indicate seconds in object names
The following object names use the UNIX time, accurate to the millisecond, as the prefix. A time-based prefix is also a sequential prefix.
sample-bucket-02/1513160001245.log
sample-bucket-02/1513160001722.log
sample-bucket-02/1513160001836.log
sample-bucket-02/1513160001956.log
...
sample-bucket-02/1513160002153.log
sample-bucket-02/1513160002556.log
sample-bucket-02/1513160002859.log
...
You can reverse the order of the digits in the UNIX time so that the object names no longer contain sequential prefixes. After reversal, the object names are as follows:
sample-bucket-02/5421000613151.log
sample-bucket-02/2271000613151.log
sample-bucket-02/6381000613151.log
sample-bucket-02/6591000613151.log
...
sample-bucket-02/3512000613151.log
sample-bucket-02/6552000613151.log
sample-bucket-02/9582000613151.log
...
After reversal, the first three digits represent milliseconds and can take 1,000 values. The fourth digit changes every second, and the fifth digit changes every 10 seconds. Reversal therefore greatly increases the randomness of the prefixes, which distributes requests more evenly across partitions, balances the load, and avoids performance bottlenecks.
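The reversal can be sketched in a few lines of Python. The function names and the `.log` suffix default are assumptions made for illustration; the sample timestamp is taken from the listing above. Treating the timestamp as a string keeps trailing zeros, so the original value can always be recovered.

```python
def reversed_timestamp_key(timestamp_ms: int, suffix: str = ".log") -> str:
    """Build an object key from a millisecond UNIX time with its digits reversed."""
    return str(timestamp_ms)[::-1] + suffix


def original_timestamp(key: str, suffix: str = ".log") -> int:
    """Recover the millisecond UNIX time from a key built by the function above."""
    stem = key[: len(key) - len(suffix)]
    return int(stem[::-1])


key = reversed_timestamp_key(1513160001245)
print(key)                      # 5421000613151.log
print(original_timestamp(key))  # 1513160001245
```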