By Chengbi
Many platform applications or systems (such as e-commerce CRM platforms and warehouse order platforms) build their service models around a user dimension (which can be a seller, a brand, a warehouse, and so on). Therefore, the business databases of such platforms are usually horizontally partitioned along that user dimension to keep the business system scalable.
However, when some users of a platform application grow into large users (such as big brands, big sellers, or big warehouses), their data volume and traffic differ so much from those of other users that they can easily cause stability problems for the business system and create demands for customized resources.
As a solution to these business problems, the SaaS multi-tenancy model is commonly used. In this model, the business system serves multiple users, and each user (or batch of users) is treated as a tenant.
These tenants run on the same infrastructure and platform within the business system, but their data remains isolated, so tenants usually have their own physical resources to store and manage data separately. The core idea of the SaaS multi-tenancy model is to ensure resource isolation and data isolation among tenants, which solves the stability issues of the business system and allows tenant resources to be customized flexibly.
There are two common SaaS multi-tenancy solutions in different application scenarios:
Schema-level SaaS multi-tenancy means that a tenant corresponds to a schema that contains multiple tables (a schema is equivalent to a database in MySQL). The schemas of different tenants are distributed across different machines (as shown in Figure 1) to implement resource isolation. This solution is suitable for scenarios where different tenants need to operate on independent schemas.
Partition-level SaaS multi-tenancy means that a tenant corresponds to one or more partitions of a table (or a subset of the rows of a table). The partitions of different tenants are distributed across different machines (as shown in Figure 2) to implement resource isolation. This solution is suitable for scenarios where different tenants need to operate on a unified schema.
In terms of the degree of isolation, Schema-level SaaS multi-tenancy is higher than Partition-level SaaS multi-tenancy. However, because the former has to maintain a large number of schemas, its O&M costs and its query and analysis costs are higher than those of the latter.
On the other hand, Partition-level SaaS multi-tenancy usually depends on middleware sharding or the partitioning features of a distributed database (otherwise, a single-server database cannot achieve resource isolation), whereas Schema-level SaaS multi-tenancy has no such dependency: users can simply build several single-server MySQL databases, which lowers the barrier to entry.
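To make the Schema-level approach concrete, the following is a minimal sketch built on plain MySQL; the schema names, host placements, and table definition are hypothetical and only for illustration:
/* Each tenant gets its own schema, typically deployed on its own MySQL instance. */
CREATE DATABASE tenant_brand_a; /* for example, hosted on mysql-host-1 */
CREATE DATABASE tenant_brand_b; /* for example, hosted on mysql-host-2 */
/* The same table definition must be created (and kept consistent) in every tenant schema. */
CREATE TABLE tenant_brand_a.t_order (
  id bigint not null auto_increment,
  sellerId bigint not null,
  buyerId bigint not null,
  primary key(id)
);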
The architecture described above may seem abstract. To make it easier to understand how SaaS multi-tenancy helps a business solve its problems, this article uses a real case as an illustration.
Company B is a leading seller order management platform that provides full-cycle customer service for e-commerce businesses in China. Its business system needs to serve numerous sellers across many different brands. A brand usually has multiple sellers (for example, a brand may open multiple online stores), so the relationship between brands and sellers is one-to-many.
At present, Company B's order management platform manages more than 50 TB of order data, with a daily QPS of nearly 30,000. Order volumes vary greatly across brands (a big brand's orders may be nearly 100 times those of a small brand, or more). In addition to having far more orders, some big brands purchase more advanced paid VIP services. For example, they require exclusive resources so that their order data is isolated from ordinary data, and they want to run statistical analysis on their own brand's order data independently.
To address the differences in resource usage and services across brands, Company B groups its sellers by brand (a brand is equivalent to a tenant).
The demands of big brands:
• The order volume and data storage are large (for example, the order data of a single brand can exceed 1 TB or even 2 TB).
• They need to monopolize a group of storage resources and to access and analyze their data independently.
• All sellers of a brand must be placed on the same set of storage resources.
• There are already more than 150 big sellers across the big brands, and the number keeps increasing.
The demands of small brands:
• The order tables are small, but the number of sellers is large (more than 60,000 sellers).
• They share a set of storage resources.
• All seller data must be evenly distributed across the storage resources.
The core problem is how the database of Company B's order management platform should be designed to meet the demands of different brands and their major sellers for different resource usage and data isolation.
For the preceding business scenario, Company B could, instead of using a distributed database, simply use standalone MySQL plus some open-source sharding middleware to build a SaaS multi-tenancy scheme (for example, dividing brands and their sellers into tenants) and isolate tenant resources. This seems feasible, but in practice the business would face several harder problems.
The first is cross-machine distributed transactions. Most sharding middleware cannot provide strongly consistent distributed transactions, or can only provide eventual-consistency-based transaction compensation. This means the business needs a lot of additional application changes to avoid, as much as possible, errors caused by cross-machine transactions.
Next is schema consistency. With middleware-based sharding, whether the solution is Schema-level or Partition-level multi-tenancy, Company B's order platform has to keep the metadata of every tenant's schema or table consistent. For common MySQL DDL operations such as creating and dropping tables or adding and dropping columns and indexes, the middleware solution obviously cannot ensure that the change takes effect on the tables of all tenants at the same time. Once execution is interrupted, manual intervention is required to correct it, resulting in high labor costs.
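To make this concrete, consider a routine schema change (the column below is hypothetical). Under middleware-based sharding, such a statement has to be applied to every tenant's physical schema or table shard separately, and a partial failure leaves the tenants' metadata inconsistent:
/* Hypothetical DDL that must take effect for all tenants at the same time. */
ALTER TABLE t_order ADD COLUMN order_status tinyint not null default 0;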
Then there is tenant data migration. With a SaaS multi-tenancy solution, if Company B wants to allocate new independent resources to a big brand, it inevitably has to migrate the tenant's data from the original machine to the new machine. This migration can only be completed by building synchronization links with additional synchronization tools, and the cutover may even require business downtime. In other words, even the basic operation of adding a new tenant brings very high O&M costs.
Based on the above analysis, a SaaS multi-tenancy solution built on standalone MySQL and middleware is not cost-effective for Company B.
In fact, with Alibaba Cloud PolarDB Distributed Edition 2.0 (PolarDB for Xscale, also referred to as PolarDB-X), Company B was able to solve the above business problems by combining the non-modularized subpartition and Locality features. The following sections describe these two features of PolarDB-X to make the solution easier to understand.
PolarDB Distributed Edition has allowed you to create partitioned tables that use subpartitions since V5.4.17. Different from other distributed databases, the subpartitions of PolarDB-X are compatible with the syntax of native MySQL subpartitions, and many additional subpartition features are provided on top of that. For example, PolarDB-X supports user-defined non-modularized subpartitions (native MySQL only supports modularized subpartitions).
The so-called non-modularized subpartition allows the number of subpartitions under each level-1 partition and their boundary value definitions to be inconsistent, as shown in the following code:
/* Level-1 partitions: LIST. Subpartitions: non-modularized HASH subpartitions. */
CREATE TABLE t_order /* Order table */ (
  id bigint not null auto_increment,
  sellerId bigint not null,
  buyerId bigint not null,
  primary key(id)
)
PARTITION BY LIST(sellerId /* Seller ID */)
SUBPARTITION BY HASH(sellerId)
(
  PARTITION pa VALUES IN (108,109)
    SUBPARTITIONS 1 /* One hash subpartition under the level-1 partition pa, which stores all seller data of big brand a */,
  PARTITION pb VALUES IN (208,209)
    SUBPARTITIONS 1 /* One hash subpartition under the level-1 partition pb, which stores all seller data of big brand b */,
  PARTITION pc VALUES IN (308,309,310)
    SUBPARTITIONS 2 /* Two hash subpartitions under the level-1 partition pc, which store all seller data of big brand c */,
  PARTITION pDefault VALUES IN (DEFAULT)
    SUBPARTITIONS 64 /* 64 hash subpartitions under the level-1 partition pDefault, which store the seller data of many small brands */
);
Based on the preceding LIST + HASH non-modularized subpartition definition, this design directly brings the following benefits to the application (a point-query sketch follows the list):
• For the sellers of a large brand (which is equivalent to a tenant), data can be routed to a dedicated set of partitions.
• For small and medium-sized brands, data is automatically balanced across multiple partitions by the hash algorithm, avoiding hot spots.
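As a quick sketch of the routing effect, a point query on the partition key only touches the owning partition. The statements below reuse the seller IDs from the example table, with the small-brand seller ID 9527 made up purely for illustration:
/* Routed only to partition pa, because sellerId 108 is in its LIST values (big brand a). */
EXPLAIN SELECT * FROM t_order WHERE sellerId = 108;
/* Routed to exactly one of the 64 hash subpartitions under pDefault (a small-brand seller). */
EXPLAIN SELECT * FROM t_order WHERE sellerId = 9527;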
Once LIST partitioning isolates the seller data of big brands and of small and medium-sized brands at the partition level, physically isolating their storage resources follows naturally. In PolarDB-X, users can use the Locality capability to easily isolate resources between different partitions.
PolarDB-X allows you to use the LOCALITY keyword to specify the actual storage location of a partition. In PolarDB-X, storage resources consist of multiple data nodes (DNs), and you can allocate storage resources by specifying DN IDs. This way, data can be isolated or evenly distributed. The syntax is as follows:
ALTER TABLE #tableName
MODIFY (SUB)PARTITION #(sub)partName
SET LOCALITY='dn=dn1[, dn2,...]'
For example, Company B can use the following SQL statement to move all the data of the big brand pa in t_order to the storage node dn4:
ALTER TABLE t_order MODIFY PARTITION pa SET LOCALITY='dn=dn4'
You can use the SHOW STORAGE statement to query the IDs of all PolarDB-X DNs. For example:
mysql> show storage;
+----------------------------+----------------+------------+-----------+----------+-------------+--------+-----------+------------+--------+
| STORAGE_INST_ID            | LEADER_NODE    | IS_HEALTHY | INST_KIND | DB_COUNT | GROUP_COUNT | STATUS | DELETABLE | DELAY      | ACTIVE |
+----------------------------+----------------+------------+-----------+----------+-------------+--------+-----------+------------+--------+
| polardbx-storage-0-master  | 10.0.x.1:3306  | true       | MASTER    |       41 |          66 |      0 | false     | null       | null   |
| polardbx-storage-1-master  | 10.0.x.1:3307  | true       | MASTER    |       41 |          53 |      0 | true      | null       | null   |
| ......                     | ......         | true       | META_DB   |        2 |           2 |      0 | false     | null       | null   |
+----------------------------+----------------+------------+-----------+----------+-------------+--------+-----------+------------+--------+
Let us go back to the example of Company B. Its core requirement is to isolate the seller data and storage resources of big brands from those of small and medium-sized brands. Based on the subpartitioned table described above, Company B can add a LOCALITY definition to each level-1 partition to specify the storage resources that the partition and all of its subpartitions are allowed to use. In this way, the business can implement SaaS-level multi-tenant (that is, per-brand) storage resource isolation directly at table creation time, as shown in the following code:
/* Level-1 partitions: LIST. Subpartitions: non-modularized HASH subpartitions. */
CREATE TABLE t_orders /* Order table */ (
  id bigint not null auto_increment,
  sellerId bigint not null,
  buyerId bigint not null,
  primary key(id)
)
PARTITION BY LIST(sellerId /* Seller ID */)
SUBPARTITION BY HASH(sellerId)
(
  PARTITION pa VALUES IN (108,109,...)
    LOCALITY='dn=dn16' /* Big brand pa monopolizes the DN dn16 */
    SUBPARTITIONS 1,
  PARTITION pb VALUES IN (208,209,...)
    LOCALITY='dn=dn17' /* Big brand pb monopolizes the DN dn17 */
    SUBPARTITIONS 1,
  PARTITION pc VALUES IN (308,309,310,...)
    LOCALITY='dn=dn18,dn19' /* Big brand pc monopolizes two DNs: dn18 and dn19 */
    SUBPARTITIONS 2,
  PARTITION pDefault VALUES IN (DEFAULT)
    /* The level-1 partition pDefault occupies 16 DNs from dn0 to dn15, which are shared by small and medium-sized brands. */
    LOCALITY='dn=dn0,dn1,dn2,...,dn15'
    SUBPARTITIONS 64
);
As shown above, by binding each level-1 partition to DN resources through Locality, the tenants of the three big brands pa, pb, and pc are allocated the node resources dn16, dn17, and dn18~dn19 respectively, while the pDefault partition that pools the small and medium-sized sellers is bound to 16 DNs (dn0 to dn15).
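To verify that each partition actually lands on the intended data nodes, Company B can inspect the physical topology of the table; the following is a minimal sketch that assumes the SHOW TOPOLOGY statement of PolarDB-X:
/* Lists the storage node (DN) and physical table that back each partition of t_orders. */
SHOW TOPOLOGY FROM t_orders;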
After Company B uses subpartitions and Locality to isolate the resources of tenants of different brands, the next problem to address is how users can manage these tenants effectively and conveniently. The answer lies in the partition management capabilities of PolarDB-X.
PolarDB-X provides a series of comprehensive, flexible, and powerful partition management commands for partitioned tables (as shown in the following figure), allowing users to implement the various O&M changes of multi-tenancy scenarios with simple SQL commands.
Next, we will still use the example of Company B to introduce the common O&M changes in the scenario of supporting SaaS multi-tenancy based on partition management.
Take Company B as an example. A tenant of Company B corresponds to a brand, and a brand usually has multiple sellers on Company B's platform. Therefore, when a brand party opens a new store, a new seller ID needs to be added to the tenant resource corresponding to the brand party.
With the MODIFY PARTITION ADD/DROP VALUES features of PolarDB-X, you can easily add a new seller ID to the LIST partition, as shown in the following code:
/* Add a new seller 205 to brand pb */
ALTER TABLE t_orders MODIFY PARTITION pb ADD VALUES (205);
During the execution of this DDL statement, PolarDB-X automatically extracts all data with sellerId = 205 from the DEFAULT partition of the LIST partitioning (if a DEFAULT partition is explicitly defined) and migrates it to the partition pb. The whole DDL process is online, and business applications are almost unaware of it.
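The reverse operation is also possible. The following is a hedged sketch of removing a seller ID from a brand's LIST partition with DROP VALUES (for example, when a store is closed); assuming the table keeps its explicit DEFAULT partition, the seller's rows are expected to fall back into pDefault:
/* Remove seller 205 from brand pb again; its rows should move back to the DEFAULT partition. */
ALTER TABLE t_orders MODIFY PARTITION pb DROP VALUES (205);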
On order management platforms such as Company B's, a seller usually grows from nothing into a small seller and then into a large seller. When a small seller of a brand develops into a large seller, the brand may ask Company B to extract that seller from the pool of small and medium-sized brands (that is, the DEFAULT partition), turn it into an independent VIP tenant, and allocate separate storage resources for it.
With the ADD/DROP PARTITION and Locality features of PolarDB-X, Company B can easily perform such changes online. For example, suppose Company B wants to extract the big seller 301 of a new big brand from the DEFAULT partition as a new partition pe and let it monopolize the new storage node new_dn, as shown in the following code:
/* 1. Company B first purchases new CN/DN resources... */
/* 2. Add the new big seller by creating a new partition and placing it on the specified DN */
ALTER TABLE t_orders ADD PARTITION (
  /* Extract the new big seller 301 from the pDefault partition, name the partition pe, and place its data on the new node new_dn. */
  PARTITION pe VALUES IN (301) LOCALITY='dn=new_dn' SUBPARTITIONS 1
);
Similar to MODIFY PARTITION, these ADD/DROP PARTITION changes are online DDL operations, and the data migration involved is almost transparent to business applications.
In multi-tenancy scenarios, the LIST + HASH non-modularized subpartitions of PolarDB-X provide an important capability: different tenants can have different numbers of hash subpartitions. This means different tenants can be given different amounts of storage resources by defining different numbers of subpartitions.
For example, in the LIST partition definition of the t_orders table of Company B, the level-1 LIST partition of the big brand pc has two subpartitions and monopolizes two DNs to store order data (that is, each of the two DNs of the pc partition holds one subpartition). Meanwhile, the sellers of small and medium-sized brands share the 64 subpartitions under the DEFAULT partition and the 16 DNs from dn0 to dn15 (as shown in the following code):
PARTITION pDefault VALUES IN (DEFAULT)
  /* The level-1 partition pDefault occupies 16 DNs from dn0 to dn15, which are shared by small and medium-sized brands. */
  LOCALITY='dn=dn0,dn1,dn2,...,dn15'
  SUBPARTITIONS 64
However, among the many small and medium-sized sellers in the DEFAULT partition, there may also be hot spots (for example, the top 20% of sellers may account for 80% of the order volume). If these hot sellers are not properly distributed, they may cause a load imbalance among the 16 DNs of the DEFAULT partition.
Therefore, the problem Company B needs to address is how to manage the order data of the many small and medium-sized sellers in these 64 subpartitions so that the data is evenly distributed across the 16 DNs and the overall load of the system remains balanced. This is where the partition-level rebalancing feature of PolarDB-X comes in.
The partition-level rebalancing feature of PolarDB-X automatically schedules the subpartitions under a level-1 partition based on the Locality of that level-1 partition, so that the subpartitions are evenly distributed among the nodes defined by the Locality. You only need to execute one SQL statement (as shown in the following code) to complete the balancing change:
REBALANCE TABLE t_orders PARTITIONS=pDefault;
In addition to meeting the demands for data isolation and resource isolation, the partitioned tables of PolarDB-X and the Locality-based SaaS-level multi-tenancy capability can provide order management platforms such as Company B's with more data query capabilities.
For example, big brands on Company B's platform occasionally need to use VIP services such as independent queries and analysis of their own order data. These brands will directly query and analyze their order data (for example, query the number of orders from important customers) through some Web SQL tools provided by Company B.
However, as a platform system, Company B needs to ensure data security and isolation among different tenants: when querying order data, a tenant can only see its own data and must not see any data of other tenants.
So how does Company B isolate data between different tenants based on the PolarDB-X partitioned table? The answer is the partition selection and view definition features.
For example, if Company B wants to authorize its tenant pb to query and analyze its own order data separately, its Web SQL tool will automatically use an SQL command similar to the following to create a corresponding read-only view t_order_pb_view for the tenant pb on PolarDB-X in advance:
CREATE VIEW t_order_pb_view AS
SELECT *
FROM t_orders PARTITION(pb); /* Only data from partition pb (and all of its subpartitions) of the t_orders table will be returned. */
Then, after the platform grants the tenant pb's account the corresponding permissions, the tenant pb will only be able to see the read-only view t_order_pb_view after logging in to the Web SQL tool provided by the platform.
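The authorization step itself is not shown here. The following is a minimal sketch of what it might look like, assuming standard MySQL account and privilege syntax (the account name tenant_pb and the password placeholder are hypothetical):
/* Create a dedicated account for tenant pb and allow it to read only the tenant view. */
CREATE USER 'tenant_pb'@'%' IDENTIFIED BY '<password>';
GRANT SELECT ON t_order_pb_view TO 'tenant_pb'@'%';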
Now, suppose the tenant wants to query the view to count its total number of orders, as shown in the following code:
/* The SQL statement that is used to query order data for a large tenant pb: count the number of orders. */
SELECT COUNT(1) FROM t_order_pb_view;
PolarDB-X automatically replaces the view t_order_pb_view with the corresponding subquery:
/* The SQL statement that is used to query order data for a large tenant pb: count the number of orders. */
SELECT COUNT(1) FROM
(
SELECT *
FROM
t_orders PARTITION(pb)
) as t_order_pb_view;
In this way, restricted by the partition selection syntax, the view t_order_pb_view only returns data from partition pb. Tenant pb therefore cannot query the seller order data of other tenants, which achieves data isolation.
The partitioned tables and flexible management syntax of PolarDB-X can be used to support various business models in different scenarios. The SaaS multi-tenancy solution described in this article, built on non-modularized subpartitions and the Locality capability, is a classic example.
In fact, the real case described in this article, the seller order management system of Company B, has been successfully launched on the PolarDB-X SaaS multi-tenancy solution described above (as shown in the figure below). Currently, the platform provided by Company B manages over 50 TB of order data.
The case of Company B is clearly replicable and generalizable. Its tenant dimension, the brand, can easily be mapped to other business dimensions to build similar practices, such as logistics management for major warehouses, gift data management for live streaming rooms on live streaming platforms, and traffic monitoring data management for major cities.
In summary, if the following business scenarios exist:
• You need to divide data into multiple business units based on dimensions such as region, warehouse, seller, and brand.
• You also need to allocate different resources and physically isolate data for each business unit.
If you encounter the above scenarios, you can refer to the SaaS multi-tenancy solution in this article for your database design.