Ten Years of Tablestore Development: A Summary

Foreword
Development of Tablestore began in 2009, the year Alibaba Cloud was founded. Inspired by Google Bigtable, we decided to build a similar distributed table store. Because it was built on the Feitian kernel, we chose an architecture that provides high scalability on top of the Feitian platform, highly reliable data storage on top of Pangu, and, most importantly, delivery as a cloud service. The first version was released in 2010, and the service's first customer was Cloud Mailbox, which used it to store massive volumes of mail metadata. After Alibaba Cloud's official website went live in mid-2011, the Tablestore cloud service entered public beta half a year later, in January 2012. This year therefore marks the 13th year since Tablestore (OTS for short) was born, and the 10th since it launched as a cloud service.
Over these ten years as a cloud service, product functionality has kept improving and business scale has kept growing. Tablestore is now available in 24 public-cloud regions worldwide, serving thousands of enterprise customers. We also serve many important BUs within the group, including DingTalk, Cainiao, Taote, Tmall, AutoNavi, Hema, Ant, Youku, and Fliggy. Within Cloud Intelligence it serves as the core underlying storage for many cloud services, including cloud monitoring, MaxCompute, Function Compute, intelligent storage, and video cloud. The clusters we operate now total tens of thousands of machines, storing hundreds of petabytes of data.
Business within the Alibaba Group is a good proving ground for a product: its scenarios are more complex, its data scales larger, and its cost and performance requirements higher. Over the past few years we have honed the product against these workloads, tempering the core engine and evolving differentiated features for certain vertical scenarios. As business scale has grown, so have the stability challenges, and we have experienced several major failures over the past ten years. Stability has always been the highest priority within the storage team, so we continuously optimize the architecture to improve availability.
This article first introduces Tablestore as a whole, then shares, at the technical level, the product's functional evolution, technical architecture evolution, and stability work over the past few years, and, at the business level, the core application scenarios we have defined along with some typical cases.
Product introduction and development history
Functional overview
Let's introduce the product with a big-picture view of its functionality from the data-flow perspective. The basic capability Tablestore provides is distributed table storage, which has the following characteristics:
• A partitioned table model supporting horizontal scale-out: the data model follows Bigtable's definition, now usually called WideColumn in the open-source community. Simply put, it is a large wide table that can store an almost unlimited number of rows, each identified by a unique row primary key. Physically, the table is split into partitions by primary-key range, and scalability comes from distributed scheduling of these partitions.
• A serverless service form that is easier to use: you only need to activate the service to use it; there is no ECS instance to purchase or deploy. Users see only data storage and requests; the underlying physical resources are invisible and scale automatically and elastically as the table's storage size and access volume change.
• Flexible data indexing to accelerate query and retrieval: the table model defined by Bigtable indexes only the row primary key, so it offers very fast single-row lookups and primary-key range queries. In practice, however, users also need conditional queries on non-primary-key columns, or more complex real-time queries such as multi-field combination queries and full-text search. We therefore extended the Bigtable model in later functional evolution to automatically build indexes over table data, providing multiple index structures (secondary indexes and multi-indexes, introduced later), each optimized for different query patterns.
• Convenient data management covering lifecycle and data flow: as data grows, management gets harder. Tablestore provides convenient data management functions, including TTL to automatically expire stale data, automatic hot/cold tiering to reduce cost, and distributed CDC (change data capture) over table data via the Tunnel Service so data can flow in real time.
• Complete enterprise-grade features for safer use: Tablestore grew up on the cloud, so its resource management, networking, and account systems are built on the cloud's base services. It supports RAM-based account authorization, SLR-based service-to-service authorization, VPC and other private networks, and KMS-based BYOK transparent encryption.
An example scenario
Let's look at a real example of how these functions combine, taking IoT device status storage as the scenario. Status data for hundreds of millions of devices must be stored, and each device periodically reports its latest status. The table design is very simple: each row stores one device's data. Creating such a table in Tablestore covers the basic needs of the scenario — status updates, queries, and storage for hundreds of millions of devices — and scales out automatically as the device fleet grows. More advanced needs can then be met as follows (a code sketch follows this list):
• Combined conditional retrieval over multiple device status attributes: simply create a multi-index on the table through the API, which supports combined multi-condition retrieval on any field in the table.
• Capturing the latest device state changes for real-time computation: simply create a distributed CDC tunnel for the table through the API, then connect it to Flink for real-time computation.
• Periodically analyzing the status of all devices to generate statistical reports: simply create an external table via SQL in MaxCompute/Spark, then run full analysis over the table data with the MaxCompute/Spark compute engine.
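As a rough, end-to-end illustration of this scenario, here is a minimal sketch using the Tablestore Python SDK. The endpoint, instance, table, and column names are placeholders, and exact signatures may differ across SDK versions:

```python
# Minimal sketch of the device-status scenario (Tablestore Python SDK;
# endpoint/instance/table/column names are placeholders, and exact
# signatures may differ across SDK versions).
from tablestore import (OTSClient, TableMeta, TableOptions,
                        ReservedThroughput, CapacityUnit, Row)

client = OTSClient('https://my-instance.cn-hangzhou.ots.aliyuncs.com',
                   '<access_key_id>', '<access_key_secret>', 'my-instance')

# One row per device, keyed by device_id; the table is range-partitioned
# on this primary key and splits automatically as the fleet grows.
meta = TableMeta('device_status', [('device_id', 'STRING')])
options = TableOptions(time_to_live=-1, max_version=1)  # keep latest state only
client.create_table(meta, options, ReservedThroughput(CapacityUnit(0, 0)))

# Each periodic report upserts the device's latest state.
row = Row([('device_id', 'dev-000001')],
          [('status', 'online'), ('temperature', 23.5),
           ('reported_at', 1700000000)])
client.put_row('device_status', row)
```

The multi-index, tunnel, and external-table steps mentioned above are likewise single API or SQL calls; sketches of the first two appear in the corresponding engine sections below.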
Data Serving
Tablestore provides structured data storage services. The most common structured store in application data is the relational database, so we are often asked how Tablestore differs from one. From the perspective of data applications, relational databases are positioned for Transaction Processing, while Tablestore is positioned for Data Serving. Transaction Processing operates on the full state of the data: any change must preserve the integrity and consistency of that state. Data Serving instead emphasizes keeping data online: providing highly reliable, highly available data services at any scale. The two are really an upstream/downstream relationship: data generation in an application system is usually a Transaction Processing process, and once data is generated it needs Data Serving. Take a practical scenario: generating an order in an e-commerce system is Transaction Processing, because deciding whether an order can be created requires joining state such as product, price, inventory, and account — transaction processing over the complete data entity. Once the order is generated, it transitions to Data Serving, where the workload is mainly queries and simple updates.
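To make the division of labor concrete, here is a small sketch. SQLite stands in for any relational TP database, and the order and inventory schema is hypothetical: order creation is a transaction over the full state, while serving the finished order is a single-row read.

```python
# Sketch: Transaction Processing vs. Data Serving (hypothetical schema;
# SQLite stands in for any relational TP database).
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE inventory (item_id TEXT PRIMARY KEY, stock INTEGER);
    CREATE TABLE orders    (order_id TEXT PRIMARY KEY, item_id TEXT);
    INSERT INTO inventory VALUES ('item-1', 10);
""")

# Transaction Processing: creating the order must atomically check and
# update the related state (just inventory here; price, account, etc.
# in a real system).
with db:
    cur = db.execute("UPDATE inventory SET stock = stock - 1 "
                     "WHERE item_id = 'item-1' AND stock > 0")
    if cur.rowcount == 1:
        db.execute("INSERT INTO orders VALUES ('order-001', 'item-1')")

# Data Serving: once generated, the order is mostly read by key, at any
# scale -- with Tablestore this is a single-row get, not a transaction:
#   consumed, row, _ = client.get_row('orders', [('order_id', 'order-001')])
```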
Development path

Over Tablestore's more than ten years of development, we divide its functional evolution into three stages:
• Stage 1.0 (a stable table storage service): Tablestore's core capability is a serverless table storage service. For the first five years after its launch as a cloud service, we continuously polished the table engine, stability, and serverless productization.
• Stage 2.0 (more complete data service capabilities): once basic storage needs were met, customers raised a series of higher requirements: index support for more flexible query and retrieval; real-time data flow for real-time computation or real-time ETL into the data warehouse; and direct data analysis through open-source compute engines. We began planning a series of major features in 2017 and released version 2.0 in 2019, shipping the Tunnel Service (distributed CDC), data indexes (secondary indexes and multi-indexes), and compute-ecosystem integration (MaxCompute/Spark/Flink/DLA, etc.).
• Stage 3.0 (security, cost, performance, and user experience): with basic data service needs met, customers now ask for more on security, cost, performance, and experience. We released a unified SQL query capability to simplify API calls, added data backup through HBR and BYOK data encryption, and upgraded to the latest table engine kernel for a large performance boost. Stage 3.0 is not yet complete; some functions are still being polished.
In terms of evolution, we positioned ourselves from the very beginning to provide serverless services on the cloud, and this form has extended as the business developed. In 2015, with the delivery of the proprietary cloud, we entered projects in postal services, meteorology, the State Grid, national taxation, medical insurance, and public security. The proprietary cloud has its own service form and a service model different from the public cloud's, which is another extension of the product's capabilities. In 2016 we began expanding overseas to support the global deployment of Alibaba Cloud's business.
Technical Architecture and Core Components
Overall architecture

• Storage layer: the bottom layer relies on Pangu and OSS. Online we deploy both all-flash and hybrid-flash Pangu clusters, corresponding to the high-performance and capacity storage classes currently offered. Cold data that needs low-cost storage and is accessed infrequently is stored in OSS.
• Engine layer: there are currently two data engines, the table engine and the index engine. The index engine's data is synchronized in real time from the table engine, indexing table data as it changes. Synchronization between the two engines uses the CDC engine, which captures data updates in tables in real time and provides subscription capability.
• Service layer:
• Data interface: the unified entry point through which applications access Tablestore; all functions are exposed externally through APIs. Besides wrapping the underlying control and data interfaces, this layer also handles user authentication, permissions, flow control, and so on.
• Unified query: SQL serves as the query engine, connecting underneath to the table engine and the index engine.
Tablestore reuses a large amount of in-house base technology, including Kuafu, Nuwa, and the Feitian base libraries. These components are continuously honed across other storage product lines, so we directly enjoy the dividends of their improvement. Tablestore's self-developed core lies in the engine layer and the service layer; the following sections introduce these important components in turn.
Distributed table engine

The table engine is the core of Tablestore: data indexing, update subscription, and data management are all built around it, so it must deliver strong performance, high scalability, and stability. Its core technologies can be summarized as follows (a partition-routing sketch follows the list):
• Storage-compute separation: the underlying layer relies on the Pangu distributed file system for highly reliable data storage. Separating storage from compute lets each scale flexibly and elastically, which is the foundation for offering a serverless service.
• Dynamic partitioning and automatic load balancing: partitioning is what makes tables distributed, and dynamic partitioning is what makes them elastic. Partitions split automatically, and the unavailable window during a split has been optimized to under 100 milliseconds, with almost no impact on user requests. Automatic load balancing schedules partitions flexibly, automatically detecting and dissolving access hotspots to keep resources balanced and reduce access jitter.
• High-performance data engine: the table engine applies many techniques to optimize read/write performance, including a user-space communication framework (Luna), reduced thread switching and memory copying, lock-free data structures, and hybrid row-column data-block layouts. Performance is several times that of comparable open-source engines.
• High availability and stability optimization: availability is mainly threatened by burst traffic, unbalanced load, and worker-node failover. We combine global and single-machine flow control to absorb burst traffic, use dynamic load balancing to rebalance resources and dissolve hotspots, and have shortened worker failover time to reduce its impact. Single-cluster availability currently reaches 99.99%. The failures over the years, and online problems in an increasingly complex environment, have also surfaced many corner cases, which has greatly helped improve stability.
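To illustrate the partitioning idea, routing a primary key to its range partition is essentially a binary search over the current split points. This is a conceptual simplification, not the actual engine code:

```python
# Conceptual sketch of range-based partition routing (a simplification,
# not the actual engine code): each partition owns a half-open
# primary-key range, and a row is routed by binary search over the
# current split points. A split inserts a new split point; load
# balancing then moves whole partitions between worker nodes.
import bisect

split_points = ['device-3000000', 'device-6000000']  # 3 partitions

def route(primary_key: str) -> int:
    """Index of the partition owning this key:
    partition i owns [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, primary_key)

assert route('device-0000001') == 0
assert route('device-4500000') == 1
assert route('device-9999999') == 2
```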
Distributed CDC engine
The Tunnel Service is Tablestore's CDC capability. CDC stands for Change Data Capture: capturing data updates in tables in real time. Many database products offer similar capabilities, such as MySQL binlog, DynamoDB Streams, and Cosmos DB change feed. CDC is arguably a must-have capability for core data components in modern data systems, because it has several very important uses:
• Real-time synchronization across heterogeneous stores: modern application architectures contain many heterogeneous storage components — databases, caches, search engines, message queues — among which data flows in real time. Keeping these heterogeneous components consistent is a technical challenge. A better architectural practice is to pick a single DataSource as the system's "single source of truth" and achieve real-time synchronization by capturing that DataSource's changes.
• Event-driven architecture: event-driven design is a best practice for decoupling application systems. Complex systems composed of many microservices apply it widely, for example delivering messages between services in real time and asynchronously through message queues to achieve full decoupling. Message queues implement event-driven communication between applications; CDC is what implements it between data storage and applications.
• Real-time computation: many application scenarios need real-time statistics over data, such as the Double Eleven live dashboard. Computation is only as fresh as the data flow, and connecting storage to real-time computation relies on CDC to speed that flow up.

The figure above shows the architecture of the Tunnel Service. A table consists of multiple partitions, and each partition has a commit-log queue storing all of that partition's update records in write order. When a partition splits, the old partition's queue stops accepting writes and the two new partitions write to their own new queues. The Tunnel Service essentially exposes queries over the partitions' log queues, but an open query interface alone is not enough; the service implements the following key technologies (a consumer-side sketch follows the list):
• Integrated full and incremental data: real-time synchronization between heterogeneous stores must preserve consistency, so the current full dataset must be synchronized before the incremental data. The log queue retains only recent data, not the full dataset, so a full synchronization must read the table directly. The Tunnel Service encapsulates both full and incremental queries underneath and exposes one consistent interface, which greatly simplifies use.
• Distributed order-preserving consumption: the key to data consistency is replaying change records in strict order. With a single queue, ordering is easy; but because partitions split dynamically into new partitions, change records for the same row may land in different queues. Old and new partitions form a parent-child relationship, and order preservation requires consuming the parent partition before its children. The Tunnel Service manages these parent-child relationships internally and strictly guarantees the parent-then-child consumption order.
• Consumption state management: the Tunnel Service manages the production side and can enforce correct consumption behavior (such as ordering). The consumer side, however, is hosted in the client application and is hard to manage; an abnormal consumer causes consumption lag. The Tunnel Service therefore provides server-side consumption state management: consumers report their status and consumption statistics in real time, and this state data makes it easy to locate lagging, abnormal nodes.
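From the client side, order-preserving consumption looks roughly like the sketch below. The real Tunnel client SDKs use a similar register-a-callback model, but the worker class and method names here are hypothetical placeholders:

```python
# Hypothetical sketch of a tunnel consumer (the real Tunnel client SDKs
# use a similar callback model; the names below are placeholders).
from dataclasses import dataclass

@dataclass
class StreamRecord:
    record_type: str   # 'PUT' | 'UPDATE' | 'DELETE'
    primary_key: dict  # e.g. {'device_id': 'dev-000001'}
    columns: dict      # the changed attribute columns

def process_records(records: list) -> None:
    # Called per channel in commit-log order; the service starts a child
    # partition's channel only after the parent channel is fully consumed.
    for r in records:
        print(r.record_type, r.primary_key, r.columns)
        # e.g. forward to Flink, refresh a cache, update a search index...

# With a real client, the callback would be registered on a worker:
#   worker = TunnelWorker(tunnel_id, process_records)  # placeholder API
#   worker.start()  # consumers heartbeat status and offsets, so lagging
#                   # nodes are visible from the server-side state
```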
Distributed Index Engine
The primary table provides only the basic primary-key index, supporting simple lookups by primary key or by primary-key range. Many scenarios, however, need conditional queries on non-primary-key columns, and that requires indexes for acceleration. Indexes can be built in, external, or a combination of both. For example, MySQL's composite indexes cover query acceleration in most scenarios — a purely built-in solution. But composite indexes are limited: query conditions must follow the leftmost-prefix rule of the index columns, so where flexible combinations of query conditions are needed, Elasticsearch is often used as an external index — a typical and very common built-in-plus-external combination. The advantage of built-in indexes is that table-index consistency is easier to manage and real-time synchronization is easier to guarantee; external indexes are flexible but face serious challenges in data consistency and synchronization latency. In the early days, before Tablestore offered built-in indexes, most users chose external index solutions and ran into exactly these problems. So we decided to develop built-in indexes, providing both a MySQL-like composite index and an Elasticsearch-like inverted index to cover most scenarios.
Tablestore's MySQL-like composite index capability is called the secondary index, offering a strongly consistent LocalIndex within a partition and a globally eventually consistent GlobalIndex. A secondary index shares the data distribution and index structure of the primary-key index; logically it is just another table that reorders the primary table's data along another dimension, and it is provided directly by the table engine. A creation sketch follows.
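As a usage-level sketch (the SDK calls and the order-table schema here are assumptions for illustration; signatures may differ across versions), creating a global secondary index that reorders an order table by user might look like this, reusing the `client` from the earlier sketch:

```python
# Sketch: a global secondary index reordering the order table by user
# (assumed SDK calls; schema and names are illustrative).
from tablestore import SecondaryIndexMeta

index_meta = SecondaryIndexMeta(
    'orders_by_user',                           # logically another table
    primary_key_names=['user_id', 'order_id'],  # the new leading sort order
    defined_column_names=['amount', 'created_at'])

# include_base_data asks the engine to backfill existing rows.
client.create_secondary_index('orders', index_meta, include_base_data=True)

# Range reads on 'orders_by_user' can now fix user_id and scan one
# user's orders, much like a MySQL composite index's leftmost prefix.
```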

The rest of this chapter introduces the other index, the multi-index, which provides Elasticsearch-like capabilities. The figure above shows its structure; its core technologies can be summarized as follows (a combined usage sketch follows the list):
• Storage-compute separation: the bottom layer likewise relies on the Pangu distributed file system. The multi-index needs more dynamically allocated compute resources, so it depends even more on storage-compute separation.
• Scalable query capability: a multi-index data partition can serve multiple read replicas, quite unlike a table engine partition, because the multi-index is optimized mainly for queries. Read replicas scale dynamically: when the system detects excessive query load, it automatically adds replicas, completing in seconds thanks to the shared-storage architecture.
• Data organization optimization:
• Hash partitioning: because the multi-index serves unpredictable multi-field combination queries, in many scenarios the partition holding the results cannot be determined in advance; queries usually fan out to all partitions in a scatter-gather execution. To avoid long-tail partition queries, data should be balanced across partitions, so multi-index partitions use a hash partition strategy, unlike the table's range strategy.
• Routing key: in some scenarios queries always carry a fixed condition field — order-index queries, for example, usually specify a userid. If partitions can be filtered in advance on that fixed condition, queries become far more efficient. We therefore provide a routing key strategy, used together with hash partitioning: when the query condition supplies the routing key's value, partitions are filtered up front.
• Pre-sorting: if the stored data order matches the query's result order, there is no need to fetch all results and sort them; you simply take the first rows that satisfy the requested count and return them, greatly improving query efficiency. The multi-index therefore supports pre-sorting: if queries always return results in a certain order, define that order as the pre-sort.
• Multiple index structures: there are many index structures — B-tree, hash, bitmap, BKD-tree, inverted index — each optimized for a specific query pattern. For our common query scenarios, the multi-index implements the BKD-tree, optimized for numeric range queries, and the inverted index, optimized for multi-condition combination retrieval.
• Adaptive optimization and index rebuilding: the routing key and pre-sorting strategies above must be chosen according to actual query patterns, and many businesses cannot anticipate every optimization when first designing an index. The multi-index therefore provides adaptive optimization: it automatically analyzes common query patterns, decides whether to apply routing keys or pre-sorting, and rebuilds the index automatically, all completely transparent to the upper layer.
• Dynamic index columns and grayscale switching: many applications need new index columns after going live, and the multi-index offers two ways to add them. If the index must be rebuilt over existing data, we provide a grayscale switching strategy: the new index is rebuilt in the background, and once its data catches up, query traffic can be shifted gradually; if problems appear, traffic can be switched back, keeping the new index's rollout safe. If existing data does not need reindexing, the new index column simply takes effect for new data only.
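Putting several of these pieces together, here is a sketch of creating and querying a multi-index with the Python SDK, where the feature surfaces through the search-index APIs. Field names are illustrative and exact signatures vary across SDK versions; `client` is the one created in the earlier sketch:

```python
# Sketch: a multi-field search index over the device table (signatures
# may differ across SDK versions; field names are illustrative).
from tablestore import (SearchIndexMeta, FieldSchema, FieldType,
                        SearchQuery, BoolQuery, TermQuery, RangeQuery,
                        ColumnsToGet, ColumnReturnType)

fields = [FieldSchema('status', FieldType.KEYWORD),
          FieldSchema('region', FieldType.KEYWORD),
          FieldSchema('temperature', FieldType.DOUBLE)]
client.create_search_index('device_status', 'device_status_index',
                           SearchIndexMeta(fields))

# An arbitrary multi-field combination query: online devices in one
# region whose temperature exceeds a threshold.
query = BoolQuery(must_queries=[
    TermQuery('status', 'online'),
    TermQuery('region', 'cn-hangzhou'),
    RangeQuery('temperature', range_from=30.0)])
results = client.search('device_status', 'device_status_index',
                        SearchQuery(query, limit=20, get_total_count=True),
                        ColumnsToGet(return_type=ColumnReturnType.ALL))
```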
Unified query engine
The earliest Tablestore exposed only APIs, each corresponding to an atomic function of an underlying module. Pure APIs let users flexibly compose the underlying functions — strong DIY capability — but they bring many inconveniences (a side-by-side sketch follows the list):
• Application-layer code becomes extremely complex: a simple single-row query expressed in SQL needs only one line for the main logic, while the bare API needs several times that amount of code. A statistical aggregation over the multi-index — a complex GROUP BY plus multi-level aggregation code — can take dozens of times more.
• Simple data processing cannot be attached to a query: in some scenarios the raw results fetched from the storage layer are not what is wanted, but rather aggregates such as Sum and Avg. Bare APIs cannot express computation, whereas SQL easily supports lightweight statistical aggregation.
• Hard to select indexes independently and look back up the main table efficiently: index selection relies on RBO or CBO optimization algorithms. The application layer can barely implement a few simple RBO rules itself, and CBO is out of reach. Index selection is also best integrated inside the service, so that each application need not implement its own. Moreover, looking up the main table after querying an index requires calling the index query API and then the main table API, which is complicated to implement.
• No interactive query: implementing interactive query logic over bare API calls is difficult; it usually requires SQL.
• High migration cost for applications built on or moving from MySQL: in many scenarios Tablestore serves as tiered storage for MySQL's historical data. Since the table structures are roughly the same, the query logic is too; without SQL support, the application layer pays a large query-code rewrite cost to adapt to the API.
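The contrast is easy to see in miniature: below, the same single-row lookup expressed as SQL and as a bare API call. This is a hedged sketch — the SQL entry point and the get_row signature may vary across SDK versions — reusing the `client` from the earlier sketch:

```python
# Sketch: one lookup, two ways (entry-point names and signatures may
# vary across SDK versions).

# With SQL, the main logic is a single statement:
#   SELECT status, temperature FROM device_status
#   WHERE device_id = 'dev-000001';

# The bare API spells out key encoding, projection, and unpacking:
primary_key = [('device_id', 'dev-000001')]
consumed, row, _ = client.get_row('device_status', primary_key,
                                  columns_to_get=['status', 'temperature'],
                                  max_version=1)
if row is not None:
    print({name: value for name, value, *_ in row.attribute_columns})
```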
All of the above came from real scenarios and real customer feedback, so after long consideration we decided to introduce a SQL query engine in the Tablestore 3.0 plan. Before introducing SQL, though, several key points of the architecture design had to be settled:
• Does SQL provide a relational model or federated query? We position SQL as federated query rather than an independent model, for two main reasons: first, Tablestore's underlying model expresses richer semantics and is not fully compatible with the relational model; second, there is genuine demand for federated queries.
• What is the relationship between API and SQL? With the positioning above settled, this is clear too: SQL is built on top of the API. More precisely, the API defines the abstraction of the underlying data model, and SQL adds another layer of abstraction on top of it.
• What compute capability does the Data Serving scenario require? In concrete technical terms, we want high-concurrency, high-QPS, low-latency queries — requirements that lead to different SQL execution engine architecture choices and optimization goals.
• What are the SQL engine's optimization goals? Given the above, our main optimization goals in the Data Serving scenario are: first, a horizontally scalable SQL execution engine; second, optimizing performance per unit of compute, rather than scheduling ever more compute.

The figure above shows the architecture of the SQL engine; its core technical points include (a usage sketch follows the list):
• Federated query capability: built on the APIs of the underlying storage engines, so data written through the API can be queried with SQL.
• Adaptive index selection and main-table lookback: index selection is optimized through RBO and CBO, with automatic lookback to the main table.
• Horizontally scalable execution engine: like the API layer, it uses stateless peer nodes and scales horizontally to support higher QPS.
• Performance optimization per unit of compute: a series of optimizations target compute performance per unit of capacity, including: 1. optimizing the data exchange protocol with the storage layer; 2. pushing more computation down to the storage layer for execution; 3. column-encoding the compute engine's input data to enable vectorized execution.
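As a usage-level illustration — the exe_sql_query entry point and its result shape are assumptions that may vary by SDK version — a lightweight aggregation that once required multi-level API aggregation code becomes one statement routed through this engine:

```python
# Sketch: a lightweight aggregation through the unified SQL engine
# (assumed entry point; result shape varies across SDK versions).
sql = """
    SELECT region, COUNT(*) AS devices, AVG(temperature) AS avg_temp
    FROM device_status
    WHERE status = 'online'
    GROUP BY region
"""
result = client.exe_sql_query(sql)  # index selection, pushdown, and
print(result)                       # vectorized execution happen inside
```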
Core Application Scenarios
This chapter introduces several of Tablestore's core application scenarios. We define a "core" scenario as one where:
1. Tablestore is genuinely competitive on core capabilities such as cost, performance, and scale.
2. Large, mature, demanding businesses already run on it, have done so stably for years, and have withstood heavy traffic.
3. Tablestore offers differentiated features that bring substantial benefits to the application architecture.
Classification of application scenarios

We classify the application architectures built on Tablestore into roughly the following categories:
• Internet application architecture
• Database-tiered architecture: scenarios needing Transaction Processing plus Data Serving adopt a tiered architecture. A typical application is order storage.
• Distributed table storage architecture: scenarios needing only Data Serving, with high demands on storage-layer scalability and concurrent access, build directly on Tablestore. Typical applications include IM/feeds message storage, cloud-drive metadata storage, and health-code storage.
• Data lake architecture: a classic use of Bigtable is real-time structured data storage within a big data architecture. A hallmark of the data lake is that each component of the big data stack becomes modular and serverless. In this architecture, Tablestore serves as the source table, dimension table, and result table of stream and batch compute engines, providing serverless structured data storage.
• Internet of Things architecture: IoT is a vertical business scenario that must manage huge device fleets and therefore needs scalable storage. In concrete applications, Tablestore stores device messages, device metadata, and device time-series data.
With data products in full bloom, selection is often developers' biggest headache. Our experience with selection:
1. The best test of whether a product meets your requirements is actual testing — not just simple functional tests, but scale and performance tests as well.
2. Look for an application architecture similar to yours to learn from — provided that business has achieved a sound architecture, has run stably for years, and has withstood heavy traffic.
3. Don't fear choosing wrong, but retain the ability to replace components and keep the architecture flexible.
4. At the current state of the art, distributed systems are still hard for a business to operate and maintain on its own, so selection should also weigh whether a stable technical support team stands behind the product.
Order Storage Architecture
Order storage is the typical database-tiered architecture. In 2020 we helped Alibaba Cloud's billing and accounting system transform its storage layer. The goals were to support the scale growth of the next 5-10 years and to split bill retrieval and export away from the TP database so they no longer affected its queries — a typical separation of Transaction Processing from Data Serving. Phase one migrated 40 billion bill records in total: the average latency of slow interfaces dropped from 7 s to 100 ms, bill retrieval performance improved 55.3x, and bill export speed for large customers improved 35x. We then generalized this storage architecture, replicated it inside and outside the group, and attracted more order-style businesses to transform their storage. After phase one, the cumulative data scale of phases two and three has reached the trillions.
We summarized this tiered storage architecture technically; see the article "Data Storage Architecture Evolution of Cloud Application Systems" on our official account. We also shared the architecture for this large-scale order scenario in "Large-scale order system practice based on a MySQL + Tablestore tiered storage architecture".
Metadata Storage Architecture
Metadata is closer to state data in kind and usually requires transaction processing. In some scenarios, though, the metadata layer above all needs horizontal scalability, with slightly weaker transaction requirements: global transactions are unnecessary and local transactions suffice. Several kinds of businesses within Alibaba use Tablestore for metadata storage, chiefly cloud-drive metadata, media-asset metadata, and big data metadata. As the underlying storage of the PDS service, we indirectly support several personal cloud-drive businesses within Alibaba; as the underlying metadata store of MaxCompute and DLF, we support big data metadata storage inside and outside Alibaba.
Message Storage Architecture
Typical message storage scenarios are IM and feeds streams on the Internet side and device messages on the IoT side. The most important messaging applications within Alibaba are DingTalk, Mobile Taobao, and the application messaging system ACCS/Agoo. We have worked with DingTalk since 2016, weathered the peak of the pandemic together in 2020, and in 2021 first deployed the latest version of the table engine in the DingTalk scenario, greatly improving performance and stability. On its messaging base, DingTalk launched the IMPaaS platform, which unifies Mobile Taobao messaging and other messaging services. As the underlying message storage of DingTalk and IMPaaS, Tablestore supports the various messaging businesses across the group.
We have summarized the technical architecture of Tablestore in the IM and feeds scenarios in three articles: "Revealed! How to design the message architecture of a modern IM system?", "How to build a 'DingTalk'? On implementing a message system architecture", and "How to easily design a billion-scale feeds stream system?". DingTalk has also shared its long-accumulated messaging system architecture in "How to communicate efficiently with 500 million users? DingTalk reveals its instant messaging service DTIM for the first time".
Big Data Storage Architecture
The traditional big data architecture is an on-premises deployment model; after moving to the cloud it evolves into a data lake architecture in which each component gradually becomes serverless, the most typical example being OSS replacing HDFS. Traditional big data has a classic architecture called Lambda, in which data must be written separately into batch storage and stream storage to serve batch and stream computation. Later architectures such as Kappa and Kappa+ simplified Lambda. Maintaining two sets of storage and compute engines is complex, so the technology has since evolved toward stream-batch unification — at the compute layer, as in Flink and Spark, and at the storage layer. Tablestore provides both batch queries and CDC-based stream queries, so we judged that we could serve as unified stream-batch storage, began promoting the Lambda Plus architecture externally, and attracted businesses to rebuild on it.
For an introduction to the Lambda Plus architecture, see "How does a big data architecture unify stream and batch?". On positioning Tablestore as structured data storage within the big data architecture, see "Structured data storage: how should it be designed to meet the needs?". As the dimension table and result table of compute engines in big data architectures, Tablestore offers the advantages of serverless operation and large scale; for a detailed comparison, see "Selection practice for real-time compute dimension tables and result tables in cloud-native big data architectures".
IoT storage architecture
In IoT scenarios, Tablestore mainly stores device metadata, device message data, and device time-series data. The core link is metadata and message storage, whose architecture resembles the Internet scenarios above; the extensive experience we accumulated there with metadata and message storage carries over well to IoT. Time-series data is a special type with obvious characteristics: every record carries a timestamp, hot data feeds real-time computation, and cold data serves batch queries. Time-series data is large in scale with a sharp hot/cold divide, so hot/cold tiering is usually needed to reduce cost. Seeing how much special-purpose optimization time-series scenarios permit, we built IoTStore, optimized for time-series storage, in Tablestore 3.0. For an introduction to IoTStore's usage scenarios, see the article "A One-stop IoT Storage Solution".
Finally
Thank you for reading; we hope this has given you a fuller picture of Tablestore's evolution. We still have a long way to go: every component in the architecture above has room to improve in cost, performance, and stability, and we will keep striving for perfection.
That Tablestore has come this far owes not only to the perseverance of the core R&D team but, more importantly, to the trust and support of our customers. Thanks to every customer standing behind Tablestore.
