How to create data tables by using Tablestore SDK for Python

This topic describes how to create a data table by calling the CreateTable operation. When you call the CreateTable operation, you must specify the schema information and configuration information for the data table. If the data table belongs to a high-performance instance, you can configure the reserved read throughput and reserved write throughput based on your business requirements. You can also create one or more index tables when you create the data table.

Usage notes

It takes several seconds to load a data table after the data table is created. During this period, all read and write operations on the data table fail. Perform operations on the data table after the data table is loaded.
You must specify the primary key when you create a data table. A primary key consists of one to four primary key columns. Specify a name and data type for each primary key column.
Tablestore provides the auto-increment primary key column feature. This feature is suitable for system design scenarios that require an auto-increment primary key column, such as item IDs on e-commerce websites, user IDs on large websites, post IDs in forums, and message IDs in chat tools. For more information, see Configure an auto-increment primary key column.
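The primary key rules above can be checked on the client before you call create_table. The following sketch is a hypothetical helper written for this topic, not part of Tablestore SDK for Python. It assumes the schema uses the same list of (name, type) tuples as schema_of_primary_key in the examples in this topic, and that INTEGER, STRING, and BINARY are the allowed primary key types.

```python
# Hypothetical client-side check; not part of Tablestore SDK for Python.
ALLOWED_PK_TYPES = {"INTEGER", "STRING", "BINARY"}


def validate_primary_key_schema(schema):
    """Check a list of (name, type) tuples against the documented primary key rules.

    Returns a list of problem descriptions. An empty list means no problem was found.
    """
    problems = []
    if not 1 <= len(schema) <= 4:
        problems.append("a primary key must have 1 to 4 columns, got %d" % len(schema))
    seen = set()
    for column in schema:
        # Only the name and type are checked; any extra tuple elements are ignored.
        name, col_type = column[0], column[1]
        if not name:
            problems.append("every primary key column needs a name")
        if name in seen:
            problems.append("duplicate primary key column name: %s" % name)
        seen.add(name)
        if col_type not in ALLOWED_PK_TYPES:
            problems.append("unsupported type %s for column %s" % (col_type, name))
    return problems


print(validate_primary_key_schema([("pk0", "INTEGER"), ("pk1", "INTEGER")]))
# → []
```

Returning a list of problems instead of raising on the first one lets you report every issue in a schema at once.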

Prerequisites

A Tablestore instance is created in the Tablestore console. For more information, see Create instances.
An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.

API operation

def create_table(self, table_meta, table_options, reserved_throughput, secondary_indexes=[]):
    """
    Description: You can call this operation to create a data table based on the specified schema information.

    table_meta is an instance of the tablestore.metadata.TableMeta class. table_meta specifies the name of the data table and the schema of the primary key.
    For more information, see the documentation of the TableMeta class. After you create a data table, its partitions take several seconds to load. You can perform operations on the data table only after the partitions are loaded.
    table_options is an instance of the tablestore.metadata.TableOptions class. table_options contains the time_to_live, max_version, and max_time_deviation parameters.
    reserved_throughput is an instance of the tablestore.metadata.ReservedThroughput class. reserved_throughput specifies the reserved read throughput and reserved write throughput.
    secondary_indexes is an optional list that can contain one or more instances of the tablestore.metadata.SecondaryIndexMeta class. secondary_indexes specifies the global or local secondary indexes that you want to create.

    Return value: none.
    """
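Because read and write operations fail for several seconds after create_table returns, a common pattern is to retry the first operation on a new table with a short delay. The following sketch is a generic retry helper written for this topic, not an SDK feature. The name retry_until_loaded, the number of attempts, and the delay are assumptions to tune for your workload, and in production code you would narrow the except clause to the SDK's error classes.

```python
import time


def retry_until_loaded(operation, attempts=5, delay_seconds=1.0):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: the first operations on a new data table can fail
    while its partitions are loading, so this retries with a fixed delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # Narrow this to the SDK's error types in real code.
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    # All attempts failed: surface the last error to the caller.
    raise last_error


# Usage with a stand-in operation that fails twice before it succeeds.
calls = {"count": 0}


def flaky_put_row():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("table not loaded yet")
    return "ok"


print(retry_until_loaded(flaky_put_row, attempts=5, delay_seconds=0.01))
# → ok
```

A fixed delay keeps the sketch simple; exponential backoff is a common alternative when many clients hit a new table at the same time.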

Parameters

Configure the parameters in the code based on the parameter description in the following table and the "Request syntax" section of the CreateTable topic.

Parameter	Description
table_meta	The schema information about the data table, which includes the following parameters:
	table_name: the name of the data table.
	schema_of_primary_key: the schema of the primary key. For more information, see Primary keys and attributes.
	Note: You do not need to specify a schema for attribute columns. Different rows in a Tablestore table can have different attribute columns, and you specify attribute column names when you write data to the data table.
	The primary key of a data table consists of one to four primary key columns, which are ordered in the sequence in which they are added. For example, PRIMARY KEY (A, B, C) and PRIMARY KEY (A, C, B) are different schemas. Tablestore sorts rows based on the values of all primary key columns.
	The first primary key column is the partition key. Data that has the same partition key value is stored in the same partition. We recommend that you keep the size of data that has the same partition key value at or below 10 GB. Otherwise, a single partition may become too large to split. We also recommend that you distribute read and write operations evenly across partition key values to facilitate load balancing.
	defined_columns: the predefined columns of the data table and their data types. Primary key columns cannot be predefined columns. You can use predefined columns as the index columns or attribute columns of index tables.
table_options	The configuration information about the data table, which includes the following parameters. For more information, see Data versions and TTL.
	time_to_live: the retention period of data in the data table, in seconds. Tablestore automatically deletes data that is older than this period. The value must be greater than or equal to 86400 (one day), or -1, which means that the data never expires.
	Important: If you want to create an index table for the data table, time_to_live must meet one of the following requirements: it is set to -1, so that data in the data table never expires; or it is set to a value other than -1 and update operations on the data table are prohibited.
	max_version: the maximum number of data versions that can be retained for a single attribute column. If an attribute column has more versions than this value, Tablestore deletes the earliest versions.
	Important: If you want to create an index table for the data table, you must set max_version to 1.
	max_time_deviation: the max version offset, in seconds, which is the maximum allowed difference between the version number (timestamp) of written data and the time at which the data is written. If the difference exceeds this value, the write fails. Default value: 86400. The valid version range of data in an attribute column is calculated by using the following formula: Valid version range = `[max{Data written time - Max version offset, Data written time - TTL value}, Data written time + Max version offset)`.
	After the data table is created, you can call the UpdateTable operation to modify time_to_live, max_version, and max_time_deviation. For more information, see UpdateTable.
reserved_throughput	The reserved read throughput and reserved write throughput for the data table, in capacity units (CUs). Default value: 0.
	For data tables in capacity instances, you can set the reserved read throughput and reserved write throughput only to 0, because reserved throughput does not apply to capacity instances.
	If you set the reserved read throughput or reserved write throughput to a value greater than 0, Tablestore reserves the corresponding resources for the data table, and you are charged for the reserved throughput after the data table is created. Throughput that exceeds the reserved amount is charged on a pay-as-you-go basis. For more information, see Billing overview.
	If you set both values to 0, Tablestore does not reserve resources for the data table, and you are charged for all throughput on a pay-as-you-go basis.
secondary_indexes	The schema information about the index tables, which includes the following parameters for each index table:
	index_name: the name of the index table.
	primary_key_names: the primary key columns of the index table, which can be any combination of the primary key columns and predefined columns of the data table. If you want to create a local secondary index, the first primary key column of the index table must be the same as the first primary key column of the data table.
	defined_column_names: the attribute columns of the index table, which can be any combination of the predefined columns of the data table.
	index_type: the type of the index table. Valid values: SecondaryIndexType.GLOBAL_INDEX (IT_GLOBAL_INDEX) and SecondaryIndexType.LOCAL_INDEX (IT_LOCAL_INDEX).
	If you do not specify index_type or you set it to SecondaryIndexType.GLOBAL_INDEX, a global secondary index is created. Tablestore asynchronously synchronizes data from the indexed columns and primary key columns of the data table to the index table, typically with millisecond-level latency.
	If you set index_type to SecondaryIndexType.LOCAL_INDEX, a local secondary index is created. Tablestore synchronously synchronizes data from the indexed columns and primary key columns of the data table to the index table, so you can query the data from the index table immediately after it is written to the data table.
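The valid version range formula in the max_time_deviation description can be evaluated directly. The following sketch is a plain-Python illustration, not SDK code: the function names are hypothetical, all values are assumed to be in seconds like time_to_live and max_time_deviation, and a time_to_live of -1 is assumed to drop the TTL bound because such data never expires.

```python
def valid_version_range(write_time, max_time_deviation=86400, time_to_live=-1):
    """Return (low, high); a version v is accepted when low <= v < high.

    Implements: [max{write_time - max_time_deviation, write_time - time_to_live},
                 write_time + max_time_deviation)
    All values are in seconds. A time_to_live of -1 means the data never
    expires, so only the max_time_deviation bound applies to the lower end.
    """
    low = write_time - max_time_deviation
    if time_to_live != -1:
        low = max(low, write_time - time_to_live)
    high = write_time + max_time_deviation
    return low, high


def is_valid_version(version, write_time, max_time_deviation=86400, time_to_live=-1):
    """Check whether a version number falls inside the half-open valid range."""
    low, high = valid_version_range(write_time, max_time_deviation, time_to_live)
    return low <= version < high


# TTL of one year: the max version offset determines both ends of the range.
print(valid_version_range(1_000_000, 86400, 31536000))
# → (913600, 1086400)
# TTL (one day) shorter than the offset (two days): the TTL sets the lower bound.
print(valid_version_range(1_000_000, 172800, 86400))
# → (913600, 1172800)
```

The upper bound is exclusive, so a version exactly max_time_deviation seconds after the write time is rejected.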

Examples

Create a data table without creating an index table for the data table

The following sample code creates a data table that has two primary key columns. In this example, time_to_live is set to 31536000 seconds (one year), max_version is set to 3, max_time_deviation is set to 86400 seconds (one day), and the reserved read throughput and reserved write throughput are both set to 0.

from tablestore import TableMeta, TableOptions, ReservedThroughput, CapacityUnit, OTSClientError, OTSServiceError

# ots_client is an initialized OTSClient instance. For more information, see Initialize an OTSClient instance.

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns.
# The first primary key column is named pk0 and requires an INTEGER value. The first primary key column is also the partition key.
# The second primary key column is named pk1 and requires an INTEGER value. Primary key columns can also use the STRING or BINARY type.
schema_of_primary_key = [('pk0', 'INTEGER'), ('pk1', 'INTEGER')]

# Create a TableMeta instance based on the name of the data table and the schema of the primary key columns.
table_meta = TableMeta('<table_name>', schema_of_primary_key)

# Create a TableOptions instance. The positional arguments are time_to_live, max_version, and max_time_deviation.
# Data expires and is automatically deleted after 31536000 seconds (one year), up to 3 versions are retained per attribute column, and the max version offset is 86400 seconds (one day).
table_options = TableOptions(31536000, 3, 86400)

# Set the reserved read throughput and reserved write throughput to 0.
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Call the create_table method of the client. If no exception is thrown, the data table is created.
try:
    ots_client.create_table(table_meta, table_options, reserved_throughput)
    print("create table succeeded.")
# If an exception is thrown, the data table fails to be created. Handle the exception.
except OTSClientError as e:
    # Error raised on the client side, such as a network failure.
    print("create table failed, error_message: %s" % e.get_error_message())
except OTSServiceError as e:
    # Error returned by the Tablestore server. Record the request ID for troubleshooting.
    print("create table failed, error_code: %s, error_message: %s, request_id: %s"
          % (e.get_error_code(), e.get_error_message(), e.get_request_id()))

For more information about the sample code, see CreateTable at GitHub.
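The option values in the preceding example can be checked against the rules in the table_options description, including the extra requirements that apply when you create an index table. The following sketch is a hypothetical plain-Python helper, not SDK code; updates_allowed is an illustrative flag that stands in for whether update operations on the data table are prohibited.

```python
# Hypothetical client-side check; not part of Tablestore SDK for Python.
SECONDS_PER_DAY = 86400


def check_table_options(time_to_live, max_version, with_index=False, updates_allowed=True):
    """Return a list of problems with the given options (empty if none were found)."""
    problems = []
    if time_to_live != -1 and time_to_live < SECONDS_PER_DAY:
        problems.append("time_to_live must be -1 or at least 86400 seconds")
    if max_version < 1:
        problems.append("max_version must be at least 1")
    if with_index:
        # Index tables require a single version per attribute column.
        if max_version != 1:
            problems.append("an index table requires max_version == 1")
        # Index tables require non-expiring data or a table that prohibits updates.
        if time_to_live != -1 and updates_allowed:
            problems.append("an index table requires time_to_live == -1 or no updates")
    return problems


one_year = 365 * SECONDS_PER_DAY  # 31536000 seconds, as in the example
print(check_table_options(one_year, 3))
# → []
print(len(check_table_options(one_year, 3, with_index=True)))
# → 2
```

The second call shows why the options in this example would need to change before you could add an index table: both max_version and time_to_live violate the index requirements.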

Create a data table and a global secondary index

The following sample code creates a data table and a global secondary index for the data table at the same time:

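from tablestore import TableMeta, TableOptions, ReservedThroughput, CapacityUnit, SecondaryIndexMeta

# ots_client is an initialized OTSClient instance. For more information, see Initialize an OTSClient instance.

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns.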
schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]

# Specify the predefined columns of the data table. 
defined_columns = [('i', 'INTEGER'), ('bool', 'BOOLEAN'), ('d', 'DOUBLE'), ('s', 'STRING'), ('b', 'BINARY')]

# Create a TableMeta instance based on the name of the data table, the schema of the primary key columns, and the predefined columns.
table_meta = TableMeta('<table_name>', schema_of_primary_key, defined_columns)

# Create a TableOptions instance. Set the time_to_live parameter to -1, which specifies that the data does not expire. Then, set the max_versions parameter to 1. 
table_option = TableOptions(-1, 1)

# Set the reserved read throughput and reserved write throughput to 0. 
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Specify the name, primary key columns, and attribute columns of the secondary index. Do not specify the index_type parameter. If you do not specify this parameter, a global secondary index is created. 
secondary_indexes = [
    SecondaryIndexMeta('index1', ['i', 's'], ['bool', 'b', 'd']),
    ]

# Call the create_table method of the client. If no exception is thrown, the data table and secondary index are created. 
ots_client.create_table(table_meta, table_option, reserved_throughput, secondary_indexes)
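Before you create an index, you can confirm that its columns come from the data table as the secondary_indexes description requires: index primary key columns from the primary key or predefined columns, and index attribute columns from the predefined columns only. The following sketch is a hypothetical plain-Python helper, not SDK code, and reuses the column definitions from the preceding example.

```python
# Hypothetical client-side check; not part of Tablestore SDK for Python.
def check_index_columns(schema_of_primary_key, defined_columns, index_pk_names, index_attr_names):
    """Check an index definition against the data table's columns.

    schema_of_primary_key and defined_columns use the (name, type) tuple
    format from the example. Returns a list of problems (empty if none).
    """
    table_pk_names = [column[0] for column in schema_of_primary_key]
    predefined_names = [column[0] for column in defined_columns]
    problems = []
    for name in index_pk_names:
        # Index primary key: any mix of table primary key columns and predefined columns.
        if name not in table_pk_names and name not in predefined_names:
            problems.append("index primary key column %r is not a column of the data table" % name)
    for name in index_attr_names:
        # Index attribute columns: predefined columns only.
        if name not in predefined_names:
            problems.append("index attribute column %r is not a predefined column" % name)
    return problems


# Column definitions copied from the global secondary index example.
schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]
defined_columns = [('i', 'INTEGER'), ('bool', 'BOOLEAN'), ('d', 'DOUBLE'), ('s', 'STRING'), ('b', 'BINARY')]

print(check_index_columns(schema_of_primary_key, defined_columns, ['i', 's'], ['bool', 'b', 'd']))
# → []
# 'uid' is a primary key column of the data table, so it cannot be an index attribute column.
print(check_index_columns(schema_of_primary_key, defined_columns, ['i'], ['uid']))
```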

Create a data table and a local secondary index

The following sample code creates a data table and a local secondary index for the data table at the same time:

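from tablestore import TableMeta, TableOptions, ReservedThroughput, CapacityUnit, SecondaryIndexMeta, SecondaryIndexType

# ots_client is an initialized OTSClient instance. For more information, see Initialize an OTSClient instance.

# Create a schema for the primary key columns of the data table, including the number, names, and types of the primary key columns.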
schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]

# Specify the predefined columns of the data table. 
defined_columns = [('i', 'INTEGER'), ('bool', 'BOOLEAN'), ('d', 'DOUBLE'), ('s', 'STRING'), ('b', 'BINARY')]

# Create a TableMeta instance based on the name of the data table, the schema of the primary key columns, and the predefined columns.
table_meta = TableMeta('<table_name>', schema_of_primary_key, defined_columns)

# Create a TableOptions instance. Set the time_to_live parameter to -1, which specifies that the data does not expire. Then, set the max_versions parameter to 1. 
table_option = TableOptions(-1, 1)

# Set the reserved read throughput and reserved write throughput to 0. 
reserved_throughput = ReservedThroughput(CapacityUnit(0, 0))

# Specify the name, primary key columns, attribute columns, and index type of the secondary index. Set index_type to SecondaryIndexType.LOCAL_INDEX to create a local secondary index.
# The first primary key column of a local secondary index must be the first primary key column of the data table (gid in this example).
secondary_indexes = [
    SecondaryIndexMeta('index1', ['gid', 's'], ['bool', 'b', 'd'], index_type=SecondaryIndexType.LOCAL_INDEX),
    ]
    ]

# Call the create_table method of the client. If no exception is thrown, the data table and secondary index are created. 
ots_client.create_table(table_meta, table_option, reserved_throughput, secondary_indexes)
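The local secondary index rule from the secondary_indexes description, that the first primary key column of the index must match the first primary key column (the partition key) of the data table, can be checked before the call. The following sketch is a hypothetical plain-Python helper, not SDK code.

```python
# Hypothetical client-side check; not part of Tablestore SDK for Python.
def check_local_index_key(schema_of_primary_key, index_pk_names):
    """Return a list of problems with a local secondary index's primary key."""
    if not index_pk_names:
        return ["a local secondary index needs at least one primary key column"]
    # The first primary key column of the data table is its partition key.
    partition_key = schema_of_primary_key[0][0]
    if index_pk_names[0] != partition_key:
        return ["the first primary key column must be %r (the partition key), got %r"
                % (partition_key, index_pk_names[0])]
    return []


schema_of_primary_key = [('gid', 'INTEGER'), ('uid', 'STRING')]
print(check_local_index_key(schema_of_primary_key, ['gid', 's']))
# → []
# Putting 's' first would violate the local secondary index rule.
print(check_local_index_key(schema_of_primary_key, ['s', 'gid']))
```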

References

You can call API operations to read and write data in a table. For more information, see Basic operations on data.
You can delete a data table that you no longer require. For more information, see Delete tables.