All Products
Search
Document Center

Lindorm:Design a time series table

Last Updated:Feb 23, 2023

You can use time series tables to improve query performance and storage performance. This topic describes how to design a time series table based on the source of time series data and the business scenario in which the time series data is used.

Sample time series data

Modeling is critical for time series data. You need to analyze query scenarios and the characteristics of data sources to create appropriate data tables. This way, you can obtain optimal storage and query performance. The following figure provides an example. In the example, air quality monitoring (AQM) data is used.

2023-02-20_15-32-24

The data record that is generated at each point in time can be divided into the following parts:

  • Table: indicates a collection of time series data that is of the same type.

  • Tags: indicate the attributes of an object for which metrics are monitored. A tag consists of a tag key and a tag value. By default, LindormTSDB creates an index that maps tags to time series for the key-value pair of each tag. In specific scenarios, tags may be referred to as labels or dimensions.

  • Timestamp: indicates the time when the data record is generated.

  • Field: indicates an aspect of metrics. A data record can contain values of multiple fields. LindormTSDB does not create indexes for fields.

Best practice for designing time series tables

You can use measurements for the time series data of the same type as table names based on the preceding data model and then execute the CREATE TABLE SQL statement to design tables. For example, the measurement AQM that is shown in the preceding figure can be used as a table name.

The following sample code provides an example on how to define a table. In the example, the preceding AQM time series model is used.

CREATE TABLE aqm (
    city        VARCHAR TAG,
    district    VARCHAR TAG,
    id          VARCHAR TAG,
    time        TIMESTAMP,
    pm2_5       DOUBLE,
    pm10        DOUBLE,
    so2         DOUBLE,
    no2         DOUBLE,
    PRIMARY KEY(id)  //// Single-node Lindorm instances do not support primary keys. If your Lindorm instance is a single-node Lindorm instance, delete this row.
);

After the table is defined, execute INSERT statements to write data into the table.

INSERT INTO aqm (city, district, id, time, pm2_5, pm10, so2, no2) 
VALUES ('hangzhou', 'yuhang', 'HY00001', '2019-04-18 10:00:00', 31.0, 66.0, 10.0, 43.0);

INSERT INTO aqm (city, district, id, time, pm2_5, pm10, so2, no2) 
VALUES ('hangzhou', 'yuhang', 'HY00001', '2019-04-18 10:01:00', 31.2, 66.0, 10.5, 43.1);

You can also execute the following INSERT statement to write multiple records into the table at a time:

INSERT INTO aqm (city, district, id, time, pm2_5, pm10, so2, no2) 
VALUES ('hangzhou', 'yuhang', 'HY00001', '2019-04-18 10:02:00', 31.3, 66.0, 10.0, 42.9),  
('hangzhou', 'yuhang', 'HY00001', '2019-04-18 10:03:00', 31.2, 66.4, 10.3, 43.0);

Note

The TAG keyword is a syntax extension of SQL. This keyword is used to specify a column in a table as a tag. When you create a time series table, we recommend that you define columns as tags based on the following rules:

  • Make sure that the column that is defined as a tag can uniquely identify an attribute of the data source.

  • We recommend that you do not use attributes whose values frequently change as tags. For example, we recommend that you do not use process IDs and time-related attributes as tags. We recommend that you define these attributes as fields even if the data type of these attributes is STRING. If these attributes are used as tags, the number of time series increases. As a result, the size of the time series index increases. This affects data queries in a negative manner.

Best practice for designing primary keys

Databases perform storage sharding and query optimization based on primary keys. If primary keys are specified in queries, the query efficiency is increased. We recommend that you specify a primary key when you create a table.

Note

Single-node Lindorm instances do not support primary keys.

In most cases, we recommend that you use the unique identifier of the data source as the primary key based on actual scenarios.

  • In IoT and Industrial Internet of Things (IIoT) scenarios, you can use the device ID as the primary key.

  • In Internet of Vehicles (IoV) scenarios, you can use the unique identifier of vehicles as the primary key.

  • In monitoring scenarios, you can use an identifier such as the app ID or host:port as the primary key.

In the time series table aqm that is provided in the preceding example, you can use the ID as the primary key to obtain optimal query performance if a large number of queries are performed based on IDs.

CREATE TABLE aqm (
    city        VARCHAR TAG,
    district    VARCHAR TAG,
    id          VARCHAR TAG,
    time        TIMESTAMP,
    pm2_5       DOUBLE,
    pm10        DOUBLE,
    so2         DOUBLE,
    no2         DOUBLE,
    PRIMARY KEY (id)   // Single-node Lindorm instances do not support primary keys. If your Lindorm instance is a single-node Lindorm instance, delete this row.
);
Important

Time series tables are different from tables in traditional relational databases. Time series tables do not require the values in primary key columns to be unique. If you want to specify a column as the primary key column, make sure that the column is labeled as a tag.

Best practice for selecting data types

For more information about the data types supported by LindormTSDB, see Data types. When you create a time series table, we recommend that you select data types based on the following rules:

  • Tag columns

    Values in columns that are declared as tags must be of the VARCHAR data type. The data type of these columns cannot be changed.

  • Timestamp columns

    LindormTSDB stores timestamps as BIGINT values in the UNIX time format in LindormDFS. This improves the performance of timestamp compression. When you create a table, you can specify the BIGINT or TIMESTAMP type based on your calculation requirements for timestamps.

  • Field columns

    Field columns support all data types. To ensure excellent data compression and optimal query performance, we recommend that you do not use VARCHAR as the data type for the values in field columns.