This topic provides best practices for table operations.
Primary key design
Tablestore automatically partitions table data by the partition key, which is the first primary key column, and assigns each partition to a service node. All rows that share a partition key value form the smallest partitioning unit and cannot be split further.
To avoid creating a hot spot on a single partition key value and hitting the service capacity limit of a single node, applications should distribute data and access volume evenly across partition key values.
Tablestore organizes rows in a table lexicographically by primary key. An optimized primary key design promotes even data distribution across partitions, leveraging Tablestore's horizontal scalability.
When choosing a partition key, consider the following guidelines:
- Keep the data volume for a single partition key value below 10 GB, if possible.
  Note: This recommendation aims to prevent hot spots and is not a limit on data storage capacity.
- Ensure logical independence between data under different partition key values within a table.
- Avoid concentrating access pressure on a narrow range of consecutive partition key values.
Partition key design
Using concatenation for partition keys
If the data volume for a single partition key value might exceed 10 GB, concatenate multiple primary key columns to form the partition key so that the data is spread across more partition key values. When doing so, keep the following in mind (see the sketch after this list):
- Choose primary key columns to concatenate so that records sharing the same original partition key value are spread across different partition key values.
- When concatenating Integer primary key columns into the key string, pad them with leading zeros so that the lexicographic order of the resulting strings matches the original numeric order.
- When choosing a delimiter, consider its impact on the lexicographic order. A delimiter that sorts before every character that can appear in the key values is a safe choice.
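The following minimal sketch (plain Python, independent of any Tablestore SDK; the build_partition_key helper, the six-digit width, and the ',' delimiter are illustrative assumptions) shows how zero-padding and a small delimiter can be combined when building a concatenated partition key:

```python
# Hypothetical helper: build a concatenated partition key from several
# primary key columns. Integer columns are zero-padded so that string
# order matches numeric order; ',' sorts before digits and letters in ASCII.
def build_partition_key(*columns, width: int = 6, delimiter: str = ",") -> str:
    parts = []
    for col in columns:
        parts.append(f"{col:0{width}d}" if isinstance(col, int) else str(col))
    return delimiter.join(parts)

print(build_partition_key(54, "a100", "6777"))  # 000054,a100,6777
```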
Adding a hash prefix to the partition key
Avoid using a sequentially increasing column as the partition key. If such a column must be used, concatenate a hash prefix to the partition key so that data is distributed randomly and access pressure is even.
However, a hash prefix scatters originally contiguous data, so the GetRange operation can no longer be used to read logically continuous records within a range.
Data operations
Parallel data writing
Tablestore distributes table partitions across multiple servers.
Writing data in primary key order can overload a single partition while the other partitions stay idle. This leaves reserved read/write throughput underutilized and slows the import. To increase the import rate, consider the following strategies (see the sketch after this list):
- Shuffle the data before importing it so that the load is balanced across all partitions.
- Use multiple worker threads to import data in parallel: split a large dataset into smaller chunks and let the threads pick chunks at random and import them.
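A minimal sketch of this shuffle-and-parallelize pattern, assuming a write_batch function that wraps whatever Tablestore write operation you use (for example BatchWriteRow); the chunk size and thread count are illustrative:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def write_batch(rows):
    """Placeholder for the actual Tablestore write call (e.g. BatchWriteRow)."""
    ...

def parallel_import(rows, chunk_size=200, workers=8):
    # Shuffle first so that consecutive primary keys do not all land on one partition.
    random.shuffle(rows)
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Multiple worker threads import the chunks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(write_batch, chunks))  # consume the iterator to surface errors
```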
Distinguishing cold data and hot data
Data has a lifecycle: recent data is accessed frequently, while older data is accessed rarely, gradually becomes cold, and mainly occupies storage space.
A large volume of cold data can lead to uneven access pressure and underutilization of reserved read/write throughput.
To address this, separate cold and hot data into different tables with distinct reserved read/write throughputs.
Scenario cases
The following scenario illustrates the table design process.
Scenario introduction
Consider a table that logs student card transactions at a university, with primary key columns including CardID, SellerID, DeviceID, and OrderNumber. The following considerations apply:
- Each student card is linked to a unique CardID, and each merchant to a SellerID.
- Each terminal has a globally unique DeviceID.
- Transaction records on each terminal have a unique OrderNumber, but OrderNumbers are not unique across the system. For instance, different terminals may generate distinct transactions with the same OrderNumber.
- OrderNumbers from the same terminal are time-ordered, with newer transactions having higher OrderNumbers.
- Transactions are recorded in real time.
Design process
- Choose suitable primary key columns as the partition key.
  To use Tablestore efficiently, consider the partition key when designing the table's primary key. The table below analyzes the candidate partitioning methods. Based on this analysis, CardID and DeviceID are suitable partition keys, while SellerID and OrderNumber are not recommended. Design the remaining primary key columns based on business requirements.
| Partition Method | Analysis | Analysis Conclusion |
| --- | --- | --- |
| Use CardID as the table's partition key | The number of daily transactions per card is generally consistent, so access pressure is spread evenly across partitions. Using CardID as the partition key makes good use of reserved read/write throughput. | CardID is a suitable choice for the partition key. |
| Use SellerID as the table's partition key | The number of campus merchants is limited, and some merchants generate a high volume of transactions, creating hot spots and uneven access pressure. | SellerID is not an ideal choice for the partition key. |
| Use DeviceID as the table's partition key | Although transaction volumes vary across merchants, each terminal generates a predictable number of daily transactions, so using DeviceID as the partition key keeps access pressure relatively even. | DeviceID is a suitable choice for the partition key. |
| Use OrderNumber as the table's partition key | Sequential OrderNumbers concentrate writes within a narrow range, which underutilizes reserved read/write throughput. If OrderNumber must be used, hash it and use the hash value as a prefix so that data and access pressure are distributed evenly. | OrderNumber is not an ideal choice for the partition key. |
- If the total data volume for a single partition key value may exceed 10 GB, concatenate multiple primary key columns as the partition key.
  For the student card transaction table with DeviceID, SellerID, CardID, and OrderNumber as primary key columns and DeviceID as the partition key, concatenate DeviceID, SellerID, and CardID into the first primary key column if the data volume for a single DeviceID exceeds 10 GB.
The original table is as follows:
| DeviceID | SellerID | CardID | OrderNumber | attrs |
| --- | --- | --- | --- | --- |
| 16 | 'a100' | 66661 | 200001 | ... |
| 54 | 'a100' | 6777 | 200003 | ... |
| 54 | 'a1001' | 6777 | 200004 | ... |
| 167 | 'a101' | 283408 | 200002 | ... |
After concatenating DeviceID, SellerID, and CardID as the partition key, the table is as follows:
| CombineDeviceIDSellerIDCardID | OrderNumber | attrs |
| --- | --- | --- |
| '16:a100:66661' | 200001 | ... |
| '167:a101:283408' | 200002 | ... |
| '54:a1001:6777' | 200004 | ... |
| '54:a100:6777' | 200003 | ... |
In the original table, the two records with DeviceID=54 fall under the same partition key value. In the new table, they have different partition key values, which reduces the data volume under a single partition key value.
In the student card transaction table, since all records with the same DeviceID share the same SellerID, concatenating only DeviceID and SellerID does not address the issue of excessive data volume under a single partition key value. Therefore, concatenate DeviceID, SellerID, and CardID as the partition key, rather than just DeviceID and SellerID.
Directly concatenating primary key columns can cause issues. For example, in the original table, records with DeviceID=54 precede those with DeviceID=167. After concatenation, the order is reversed. If the application needs to read all records with DeviceID in the range [15, 100), this design does not meet the requirements.
To resolve this issue, pad DeviceID with leading zeros so that all values have the same length, equal to the maximum possible number of digits of DeviceID. For instance, if DeviceID ranges over [0, 999999], pad it with leading zeros to a fixed length of six digits before concatenation. The resulting table is as follows:
| CombineDeviceIDSellerIDCardID | OrderNumber | attrs |
| --- | --- | --- |
| '000016:a100:66661' | 200001 | ... |
| '000054:a1001:6777' | 200004 | ... |
| '000054:a100:6777' | 200003 | ... |
| '000167:a101:283408' | 200002 | ... |
After adding leading zeros, the table still has an ordering issue. In the original table, the row with DeviceID=54 and SellerID='a1001' follows the row with SellerID='a100', but in the concatenated table '000054:a1001:6777' sorts before '000054:a100:6777'. This happens because '000054:a1001' is lexicographically less than '000054:a100:' (the digit '1' sorts before the delimiter ':'), even though 'a1001' is lexicographically greater than 'a100'. The delimiter therefore influences the ordering.
When selecting a delimiter, choose a character whose ASCII code is smaller than that of every character that can appear in the key values. In this table, the SellerID values consist of digits and uppercase and lowercase English letters, and the comma ',' has a smaller ASCII code than all of these characters, so ',' can be used as the delimiter.
The table concatenated with ',' as the delimiter is as follows:

| CombineDeviceIDSellerIDCardID | OrderNumber | attrs |
| --- | --- | --- |
| '000016,a100,66661' | 200001 | ... |
| '000054,a100,6777' | 200003 | ... |
| '000054,a1001,6777' | 200004 | ... |
| '000167,a101,283408' | 200002 | ... |
The order of records in this table, keyed by the concatenated partition key, now matches the order of records in the original table.
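This ordering claim can be checked with a small script. The sketch below is plain Python, independent of any Tablestore SDK; the combine_key helper is illustrative, and the last component (CardID) is left unpadded to match the sample table:

```python
# Zero-pad the integer DeviceID and join the columns with ',', which sorts
# before digits and letters in ASCII.
def combine_key(device_id: int, seller_id: str, card_id: int) -> str:
    return f"{device_id:06d},{seller_id},{card_id}"

rows = [
    (16, "a100", 66661),
    (54, "a100", 6777),
    (54, "a1001", 6777),
    (167, "a101", 283408),
]

keys = [combine_key(*row) for row in rows]
# Lexicographic order of the combined keys matches the original row order.
assert keys == sorted(keys)
print("\n".join(sorted(keys)))
# 000016,a100,66661
# 000054,a100,6777
# 000054,a1001,6777
# 000167,a101,283408
```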
- If a sequentially increasing primary key column must be used as the partition key, concatenate a hash prefix to it.
  Because OrderNumber increases sequentially, new consumption records are always written within the range of the latest OrderNumbers, while older OrderNumbers receive no write pressure. This causes uneven access pressure and inefficient use of reserved read/write throughput, so it is best not to use OrderNumber as the table's partition key.
  If OrderNumber must be the partition key, concatenate a hash prefix so that consecutive OrderNumbers are scattered randomly across the table and access pressure is distributed evenly.
The consumption record table with OrderNumber as the partition key is as follows:
| OrderNumber | DeviceID | SellerID | CardID | attrs |
| --- | --- | --- | --- | --- |
| 200001 | 16 | 'a100' | 66661 | ... |
| 200002 | 167 | 'a101' | 283408 | ... |
| 200003 | 54 | 'a100' | 6777 | ... |
| 200004 | 54 | 'a1001' | 6777 | ... |
| 200005 | 66 | 'b304' | 178994 | ... |
Use the MD5 algorithm (or another hash algorithm) to calculate a prefix for OrderNumber and concatenate it to form HashOrderNumber. Because the full MD5 hash string is long, taking only its first few characters is enough to scatter records with consecutive OrderNumbers across the table.
This example uses the first 4 characters of the hash string as the prefix. The consumption record table with the concatenated HashOrderNumber as the partition key is as follows:
| HashOrderNumber | DeviceID | SellerID | CardID | attrs |
| --- | --- | --- | --- | --- |
| '2e38200004' | 54 | 'a1001' | 6777 | ... |
| 'a5a9200003' | 54 | 'a100' | 6777 | ... |
| 'c335200005' | 66 | 'b304' | 178994 | ... |
| 'db6e200002' | 167 | 'a101' | 283408 | ... |
| 'ddba200001' | 16 | 'a100' | 66661 | ... |
When subsequently accessing consumption records, use the same algorithm to calculate the hash prefix for OrderNumber to obtain the corresponding HashOrderNumber of the consumption record. However, using HashOrderNumber as the partition key makes it impossible to use the GetRange operation to read logically continuous records within a range.
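A minimal sketch of this prefixing scheme, using Python's standard hashlib; the make_hash_key helper and the 4-character prefix length are illustrative choices, not part of any Tablestore API:

```python
import hashlib

def make_hash_key(order_number: int, prefix_len: int = 4) -> str:
    """Prepend the first few characters of the MD5 hash of OrderNumber.

    Consecutive OrderNumbers get unrelated prefixes, so rows are spread
    across the key space instead of clustering at the newest values.
    """
    prefix = hashlib.md5(str(order_number).encode("utf-8")).hexdigest()[:prefix_len]
    return prefix + str(order_number)

# The write path and the read path must use the same function so that a
# record written under make_hash_key(200004) can later be read back by key.
hash_order_number = make_hash_key(200004)
```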
- Distinguish between cold and hot data storage based on data timeliness.
  Applications need to process and analyze consumption records in a timely manner or query recent consumption records, so recently generated records in the student card consumption record table are accessed frequently. Earlier records are queried rarely and gradually become cold data. In addition, cards of graduated students no longer generate consumption records.
  Assume CardID increases with the card application time. If CardID is used as the partition key, the CardIDs of graduated students receive no access pressure but are still allocated reserved read/write throughput, which is wasteful. In this case, partition the consumption records by month and create a new table for each calendar month (see the routing sketch after the list below). Maintain each table according to the timeliness of its data:
- Consumption record table for the current month: New consumption records are continuously written and queried. Set a larger reserved read/write throughput to meet access needs.
- Consumption record tables for previous months: Little or no new data is written, but query requests are more frequent. Set a smaller reserved write throughput and a larger reserved read throughput.
- Consumption record tables from more than a year ago: Unlikely to be used again. Set a smaller reserved read/write throughput.
- Consumption record tables beyond the maintenance period: No longer accessed. Export the data to OSS for archiving, or delete the data.
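The month-based routing described above can be sketched as follows. The consume_record_YYYYMM naming pattern and the route_table_name helper are illustrative assumptions; the actual write and table-management calls depend on the Tablestore SDK you use:

```python
from datetime import date

TABLE_PREFIX = "consume_record_"  # hypothetical per-month naming convention

def route_table_name(transaction_date: date) -> str:
    """Return the per-month table that a consumption record belongs to."""
    return f"{TABLE_PREFIX}{transaction_date.year}{transaction_date.month:02d}"

# New records go to the current month's table (larger reserved throughput);
# older tables keep smaller reserved throughput or are archived to OSS.
print(route_table_name(date(2024, 7, 15)))  # consume_record_202407
```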