Table Store is Alibaba Cloud's first distributed multi-model database and belongs to the NoSQL family. Many application systems today no longer rely solely on a relational database in the underlying layer; instead, they use different databases for different business scenarios. For example, cached key-value data is stored in Redis, document data in MongoDB, and graph data in Neo4j.
Looking back at the development of NoSQL: it emerged in the Web 2.0 era, when the rapid growth of the Internet brought an explosion of Internet data. Traditional relational databases could not handle data at that scale, so a highly scalable distributed database was needed. However, building a highly available, scalable distributed database on top of the traditional relational data model is very challenging. Most Internet data has a simple data model and does not need to be modeled relationally. If data is modeled with something simpler than the relational model, transactions and constraints are weakened, and high availability and scalability become the design goals, the resulting database meets these business requirements better. This idea drove the development of NoSQL.
In summary, NoSQL grew out of the new business challenges and the new database requirements of the Internet era. As a result, NoSQL databases have distinctive features:
The trends on DB-Engines indicate that various NoSQL databases have developed significantly in recent years. As a distributed NoSQL database, Alibaba Cloud's TableStore uses a multi-model architecture for its data model and supports both the Wide Column model and the time series model.
The Wide Column model is a classic model proposed by BigTable and widely adopted by other systems of the same type. At present, most of the semi-structured and structured data in the world is stored using this model. In addition to the Wide Column model, we have introduced another data model: time series, a new-generation model for message data. It is suited to storing and synchronizing messages in messaging systems such as IM, feeds, and IoT device push, and is now widely used. The two models are described in detail below.
The above is a schematic diagram of the Wide Column model. To better understand this model, we compare it with the relational model. Leaving transactions and constraints aside, a relational model can be understood as a two-dimensional model of rows and columns in which every row has the same fixed schema. Its features are therefore two dimensions and a fixed schema. The Wide Column model is a three-dimensional model that adds a time dimension to the row and column dimensions. The time dimension appears in the attribute columns: an attribute column can hold multiple values, and each value carries a timestamp as its version. Each row is also schema free, with no strong schema definition. So the Wide Column model differs from the relational model in three ways: it is three-dimensional, it is schema free, and it simplifies transactions and constraints.
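The three dimensions described above can be sketched as a nested mapping of row key, column name, and timestamp. This is a minimal in-memory illustration, not the TableStore API: the class name `WideColumnTable`, its methods, and the `max_versions` and `ttl_ms` settings are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model: row key -> column name -> {timestamp_ms: value}."""

    def __init__(self, max_versions=3, ttl_ms=None):
        self.max_versions = max_versions  # hypothetical setting: versions kept per column
        self.ttl_ms = ttl_ms              # hypothetical setting: None means never expire
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp_ms):
        """Write one version of one column; rows need not share columns."""
        versions = self.rows[row_key][column]
        versions[timestamp_ms] = value
        # Drop everything older than the newest `max_versions` timestamps.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row_key, column, now_ms, max_versions=1):
        """Return up to `max_versions` unexpired (timestamp, value) pairs, newest first."""
        versions = self.rows.get(row_key, {}).get(column, {})
        live = [
            (ts, value)
            for ts, value in sorted(versions.items(), reverse=True)
            if self.ttl_ms is None or now_ms - ts < self.ttl_ms
        ]
        return live[:max_versions]


table = WideColumnTable(max_versions=2, ttl_ms=1000)
table.put("user#1", "name", "Ann", 100)
table.put("user#1", "name", "Anna", 200)
table.put("user#1", "name", "Anne", 300)               # oldest version (100) is dropped
table.put("user#2", "email", "bob@example.com", 300)   # schema free: a different column
print(table.get("user#1", "name", now_ms=400, max_versions=5))
```

In this sketch a relational row would correspond to the special case of one version per column and the same column set in every row.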
This model consists of:
The features of the Wide Column model can be summarized as: a three-dimensional structure (row, column, and time), wide rows, multi-version data, and time-to-live management. For data operations, the Wide Column model provides two data access APIs: the Data API and the Stream API.
The Data API is a standard data API that provides online data read/write, including:
Relational databases have no standard API for accessing incremental data, even though the incremental data (binlog) plays an important role in many application scenarios of traditional relational databases. Incremental data is widely used inside Alibaba, which provides the DRC middleware to make full use of it. Once incremental data is fully utilized, it enables many things in the technical architecture:
However, useful as the incremental data of a relational database is, the industry has no standard API definition for obtaining it. TableStore recognized the value of this data early on and provides a standard API for it: the Stream API (documentation).
The Stream API generally includes:
The implementation of TableStore Stream is much more complicated than MySQL Binlog, because TableStore has a distributed architecture and Stream therefore needs a distributed framework for consuming incremental data. Stream data must be consumed in an order-preserving manner. The shards of a Stream correspond to the partitions of the table inside TableStore, and those partitions may be split and merged. To keep consumption order-preserving across the old and new shards after a split or merge, we designed a fairly sophisticated mechanism. The design of TableStore Stream is not described here; we will provide more detailed design documents later.
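Since the actual design is not described here, the following is only a generic illustration of the ordering problem, not TableStore's mechanism. It assumes each shard records which shards it was split or merged from, and it consumes a shard only after all of its parents have been drained. The function name `replay`, the shard ids, and the record layout are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def replay(shards):
    """Consume shards so that every parent is drained before its children.

    `shards` maps shard id -> {"parents": [...], "records": [...]},
    where each record is a (row_key, value) pair in write order.
    """
    # TopologicalSorter expects node -> predecessors, which matches "parents".
    graph = {shard_id: set(shard["parents"]) for shard_id, shard in shards.items()}
    consumed = []
    for shard_id in TopologicalSorter(graph).static_order():
        consumed.extend(shards[shard_id]["records"])
    return consumed


# Partition A splits into B and C; later B and C merge into D.
shards = {
    "D": {"parents": ["B", "C"], "records": [("k1", "v3")]},
    "B": {"parents": ["A"], "records": [("k1", "v2")]},
    "C": {"parents": ["A"], "records": [("k9", "x1")]},
    "A": {"parents": [], "records": [("k1", "v1")]},
}
updates_to_k1 = [value for key, value in replay(shards) if key == "k1"]
print(updates_to_k1)
```

Under these assumptions, updates to the same row key are seen in write order even though they were spread over three generations of shards. Sibling shards such as B and C hold disjoint key ranges, so their relative order does not matter for per-row ordering.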
Because the complexity of Stream's internal architecture also affects the consumption side, the Stream API is not easy for users to work with. A new data consumption channel service, planned for this year, is coming soon; it will simplify Stream data consumption and provide a simpler, easier-to-use API.
The time series model is a new data model that we created for message data scenarios. It meets their special requirements: message order preservation, massive message storage, and real-time synchronization.
The above is a schematic diagram of the time series model, which abstracts the data in a large table into multiple time series. There is no limit on the number of time series in a large table.
A time series consists of:
Logically, the time series model is similar to a message queue, and a single time series is similar to a topic in a message queue. The difference is that TableStore time series focuses on the scale of topics. In the instant messaging scenario, each user's inbox and outbox is a topic. In the IoT message scenario, each device corresponds to a topic, so the number of topics can reach the order of 10 million or even 100 million. TableStore time series is built on the underlying distributed engine, and a single table can theoretically support an unlimited number of time series (topics), which simplifies the queue's Pub/Sub model. It also supports message order preservation, random positioning, and scans in ascending and descending order. This makes it a better fit for scenarios with massive message data, such as instant messaging (IM), feeds, and IoT messaging systems.
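The behavior described above (one ordered sequence per topic, random positioning, and scans in both directions) can be sketched in a few lines. This is an in-memory illustration under assumptions, not the TableStore API: the class `TimeSeriesStore`, its methods, and the use of a per-series auto-incrementing sequence id as the position are hypothetical.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Toy model: one append-only, ordered message list per series (topic)."""

    def __init__(self):
        # series id -> messages; the message at index i has sequence id i + 1.
        self._series = defaultdict(list)

    def append(self, series_id, message):
        """Append a message and return its sequence id within the series."""
        messages = self._series[series_id]
        messages.append(message)
        return len(messages)

    def scan(self, series_id, start_seq, limit, descending=False):
        """Read up to `limit` messages starting at `start_seq`, in either direction."""
        messages = self._series.get(series_id, [])
        if descending:
            seqs = range(min(start_seq, len(messages)), 0, -1)
        else:
            seqs = range(max(start_seq, 1), len(messages) + 1)
        return [(seq, messages[seq - 1]) for seq in seqs[:limit]]


store = TimeSeriesStore()
store.append("inbox:alice", "hi")
store.append("inbox:alice", "how are you")
store.append("inbox:alice", "bye")
store.append("device:42", "reboot")  # each user inbox or device is its own series

print(store.scan("inbox:alice", start_seq=2, limit=10))
print(store.scan("inbox:alice", start_seq=3, limit=2, descending=True))
```

A client that remembers the last sequence id it received can resume with `scan(series_id, last_seq + 1, limit)`, which is how this sketch models message synchronization.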
Time series is a new data model that was launched last year and is still being optimized. Based on this model, we have helped DingTalk, Cainiao Smart Customer Service, Taopiaopiao Xiaojuchang, Smart Device Management, and other services build messaging systems for instant messaging, feeds, and IoT messages.
To learn more about Alibaba Cloud Table Store, visit www.alibabacloud.com/product/table-store