The Wide Column model is similar to the data model of Bigtable or HBase and is applicable to various scenarios such as metadata and big data. The Wide Column model stores data in data tables. A single data table supports petabyte-level data storage and tens of millions of queries per second (QPS). The data tables are schema-free and support wide columns, multi-version data, and time-to-live (TTL) management. The data tables also support auto-increment primary key columns, local transactions, atomic counters, filters, and conditional updates.
Introduction
The Wide Column model of Tablestore is similar to the data model of Bigtable or HBase. The Wide Column model stores data in data tables in a three-dimensional structure, which is defined by rows, columns, and time. Each row of a data table can have different columns. The attribute columns of a data table can be dynamically added or removed. When you create a data table, you do not need to define a strict schema for the attribute columns of the data table.
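As a minimal sketch of the three-dimensional structure described above, the following Python snippet models a data table in memory as nested dictionaries keyed by row, column, and version timestamp. It is an illustration only and does not use the Tablestore SDK. The helper names `put` and `get_latest`, the sample primary keys, and the sample columns are hypothetical.

```python
# table[primary_key][column_name][version_timestamp_ms] = value
# Illustrative in-memory model only; this is not the Tablestore SDK.
table = {}


def put(primary_key, column, value, version_ms):
    """Write one version of an attribute column value for a row."""
    row = table.setdefault(primary_key, {})
    row.setdefault(column, {})[version_ms] = value


def get_latest(primary_key, column):
    """Return the value with the largest version number, or None if absent."""
    versions = table.get(primary_key, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


# Rows do not share a schema: each row carries only the columns written to it.
put(("user_1",), "name", "Alice", 1000)
put(("user_1",), "name", "Alicia", 2000)  # a second version of the same column
put(("user_2",), "age", 30, 1500)         # a different column in another row

latest_name = get_latest(("user_1",), "name")
columns_user_1 = sorted(table[("user_1",)])
columns_user_2 = sorted(table[("user_2",)])
```

Here the two rows hold different attribute columns, and the `name` column of the first row keeps both versions, which is the row, column, and time layout in miniature.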
Components
The preceding figure shows the components of the Wide Column model. The following table describes the components.
Component | Description |
Primary key | Primary keys uniquely identify each row in data tables. A primary key consists of one to four primary key columns. |
Partition key | The first primary key column is called the partition key. Tablestore partitions data in a data table based on the partition key values to balance data access requests. Rows that share the same partition key value are allocated to the same partition. |
Attribute column | All columns in a row other than the primary key columns are called attribute columns. Each attribute column can contain multiple versions of its value. Tablestore does not limit the number of attribute columns in a row. |
Version | Each value in an attribute column has a unique version number. The version number is a timestamp based on which you can manage the TTL of attribute column values. For more information, see the Version number section of the "Data versions and TTL" topic. |
Data type | Tablestore supports the following data types: STRING, BINARY, DOUBLE, INTEGER, and BOOLEAN. For more information, see the Data types section of the "Naming conventions and data types" topic. |
TTL | You can specify a TTL value for each data table. For example, if you set the TTL value of a data table to one month, Tablestore deletes data in the table that is older than one month. For more information, see the TTL section of the "Data versions and TTL" topic. |
Max versions | You can set the maximum number of versions retained for the value in each attribute column of a data table. When the number of versions in an attribute column exceeds the max versions value, Tablestore asynchronously deletes the earliest versions. For more information, see the Max versions section of the "Data versions and TTL" topic. |
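The interaction between TTL and max versions in the table above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Tablestore's implementation: the function name `retained_versions` is hypothetical, version numbers are treated as millisecond timestamps, `ttl_ms=None` stands for "never expires", and cleanup is applied immediately rather than asynchronously.

```python
def retained_versions(versions, now_ms, ttl_ms=None, max_versions=1):
    """Return the versions that survive TTL and max-versions cleanup.

    versions: dict mapping version timestamp (ms) -> value.
    ttl_ms:   maximum age in ms; None means no expiry (sketch assumption).
    """
    # Step 1: drop versions that are older than the TTL.
    live = {
        ts: value
        for ts, value in versions.items()
        if ttl_ms is None or now_ms - ts <= ttl_ms
    }
    # Step 2: keep only the newest `max_versions` of what remains.
    newest = sorted(live, reverse=True)[:max_versions]
    return {ts: live[ts] for ts in newest}


history = {1000: "a", 2000: "b", 3000: "c", 4000: "d"}

# Version 1000 is older than the TTL; of the remaining three, two are kept.
kept = retained_versions(history, now_ms=5000, ttl_ms=3500, max_versions=2)

# Without a TTL, only the max-versions limit applies.
kept_no_ttl = retained_versions(history, now_ms=5000, max_versions=3)
```

In this sketch the two settings act as independent filters: TTL removes values by age, and max versions caps how many of the surviving values are kept.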
Core components
Data tables, rows, primary keys, and attributes are the core components of the Wide Column model of Tablestore. A data table consists of rows. Each row consists of a primary key and one or more attributes. The first primary key column is called the partition key.
For more information about the data types supported by primary key columns and attribute columns, see the Data types section of the "Naming conventions and data types" topic.
The following table describes the primary key, partition key, and attribute.
Component | Description |
Primary key | Primary keys uniquely identify each row in data tables. A primary key consists of one to four primary key columns. When you create a data table, you must specify primary key columns, including the name, data type, and sequence of the primary key columns. Tablestore indexes data in a data table based on the primary key values of the data table. By default, rows in a data table are sorted in ascending order based on the primary key values. |
Partition key | The first primary key column is called the partition key. To balance load, Tablestore automatically distributes each row to a partition, and to the machine that serves that partition, based on the range to which the partition key value of the row belongs. Rows that share the same partition key value belong to the same partition. A partition may store rows that have different partition key values. Tablestore splits and merges partitions based on specified rules. Note: A partition key value is the smallest unit of partitioning. Rows that share the same partition key value cannot be split across partitions. To prevent a partition from becoming too large to split, we recommend that the total size of all rows that share the same partition key value not exceed 10 GB. For more information about how to select a partition key, see Table operations. |
Attribute | A row can have multiple attribute columns. The number of attribute columns in each row is unlimited, and the attribute columns in each row can be different. The value of an attribute column in a row can be null. The values in the same attribute column of multiple rows can be of different data types. An attribute column stores multiple versions of its value. You can specify the number of versions that can be retained for an attribute column value for query and use. You can also specify a TTL value for attribute column values. For more information, see Data versions and TTL. |
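The following Python sketch illustrates two behaviors from the table above: rows sorted in ascending order by primary key, and rows routed to a partition based on the range that contains the partition key value. It is a simplified model, not Tablestore's implementation. The split points, the function `partition_for`, and the sample rows are all hypothetical; in Tablestore, partition boundaries are managed by the service.

```python
from bisect import bisect_right

# Hypothetical split points. Partition 0 holds partition key values below "g",
# partition 1 holds values in ["g", "p"), and partition 2 holds values from "p" up.
split_points = ["g", "p"]


def partition_for(partition_key):
    """Return the index of the range that contains the partition key value."""
    return bisect_right(split_points, partition_key)


# Each primary key is (partition key, second primary key column).
rows = [
    ("m_user", 2),
    ("a_user", 9),
    ("m_user", 1),
    ("z_user", 5),
]

# Tuples compare column by column, which mirrors ascending primary key order.
sorted_rows = sorted(rows)

# Rows with the same partition key value always map to the same partition.
partitions = {pk: partition_for(pk[0]) for pk in sorted_rows}
```

Both `("m_user", 1)` and `("m_user", 2)` map to the same partition because routing looks only at the first primary key column. This is also why a single partition key value with too much data cannot be split further.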
Differences between the Wide Column model and the relational model
The following table describes the differences between the Wide Column model and the relational model.
Model | Feature |
Wide Column model | Three-dimensional structure (row, column, and time), schema-free data, wide columns, max versions, and TTL management |
Relational model | Two-dimensional structure (row and column) and fixed schema |
Limits
For more information about the general limits on the Wide Column model, see General limits.
If you use secondary indexes or search indexes to accelerate data queries, take note of the limits on the indexes. For more information, see Secondary index limits and Search index limits.
If you use SQL to query and analyze data, take note of the limits on SQL queries. For more information, see SQL limits.
Procedure
The following table describes the steps.
Step | Operation | Description |
1 | Grant permissions on Tablestore resources to a Resource Access Management (RAM) user | After you create a RAM user, grant the RAM user only the minimum permissions that are required to access Tablestore resources. You can use system default policies or custom policies to grant the permissions. If you want to use an Alibaba Cloud account or a RAM user that already has the required permissions, skip this step. Important: By default, an Alibaba Cloud account has permissions on all cloud resources. To ensure the security of your resources, we recommend that you create RAM users for your Alibaba Cloud account and authorize them to access different resources. |
2 | Activate Tablestore | Before you use the features of Tablestore, you must activate Tablestore. You need to activate Tablestore only once, and you are not charged for activation. If Tablestore is already activated, skip this step. |
3 | Create an instance | Create a Tablestore instance in the selected region based on the selected billing method and instance type. If an existing Tablestore instance meets your business requirements, skip this step. |
4 | Create a data table | Create a data table to store business-related data. When you create a data table, you can configure table features based on your business requirements. Note: Proper design of the primary key and partition key can effectively prevent data hotspot issues. We recommend that you design tables by referring to Table operations. |
5 | Read and write data | You can write, update, read, and delete data in the data table. To delete data, you can manually delete the data or enable automatic deletion by setting the TTL value of the data. For more information, see Delete data or Data versions and TTL. Note: Proper attribute column settings can improve the efficiency of business data usage. We recommend that you specify attribute columns by referring to Data operations. |
6 | Use indexes to accelerate queries | If data reading based on the primary key of a data table cannot meet your business requirements, you can use indexes to accelerate data queries. Tablestore provides secondary indexes and search indexes to meet data query requirements in different scenarios. |
7 | Analyze data | Use the SQL query feature or search indexes to aggregate and analyze data in the data table. Note: You can also use compute engines such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, and Realtime Compute for Apache Flink to analyze data in Tablestore. For more information, see Overview. |
Billing
The billable items include read throughput, write throughput, storage usage, and outbound traffic over the Internet. For more information, see Billing overview.
References
You can use the Wide Column model by using the Tablestore console or the Tablestore CLI. For more information, see the Use the Wide Column model section of the "Use Tablestore" topic.
To implement data center-level disaster recovery for instance data, you can create an instance of the zone-redundant storage (ZRS) type. For more information, see ZRS.
To ensure data storage security and network access security, you can encrypt data tables or associate a virtual private cloud (VPC) with your Tablestore instance to allow access only over the VPC. For more information, see Data encryption and Network security management.
To prevent important data from being accidentally deleted, you can use the data backup feature to back up important data on a regular basis. For more information, see Back up data in Tablestore.
To consume historical and incremental data in a data table, you can use Tunnel Service. For more information, see Overview.
To configure alert notifications for monitoring metrics, you can use CloudMonitor. For more information, see Overview.
To visualize data such as displaying data in charts, you can use DataV or Grafana. For more information, see Data visualization tools.