All Products
Search
Document Center

Syntax and features

Last Updated: Sep 17, 2020

1. Data types

The following table lists the data types that are supported by TSQL.

Data type Description Valid value
BOOLEAN The BOOLEAN data type is a data type that has one of the two possible values: true and false. true or false
SMALLINT The SMALLINT data type stores signed integer values and occupies two bytes of storage space. Value range: -32,768 to 32,767
INT The INT data type stores signed integer values and occupies four bytes of storage space. Value range: -2,147,483,648 to 2,147,483,647
BIGINT The BIGINT data type stores signed integer values and occupies eight bytes of storage space. Value range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
FLOAT The FLOAT data type stores single-precision floating-point numbers and occupies four bytes of storage space.
DOUBLE The DOUBLE data type stores double-precision floating-point numbers and occupies eight bytes of storage space. By default, if a metric value in TSDB is a floating-point number, the metric value is processed as a double-precision floating-point number.
VARCHAR The VARCHAR data type stores UTF-8 encoded characters.
TIMESTAMP The format for the values of the TIMESTAMP data type is yyyy-MM-dd HH:mm:ss.SSS. Each timestamp consists of the year, month, day, hour, minute, second, and millisecond components. The number of seconds or milliseconds that have elapsed since 00:00:00 on January 1, 1970
DATE The values of the DATE data type consist of the year, month, and day components. The format is YYYY-MM-DD. 2019-03-01
TIME The values of the TIME data type consist of the hour, minute, and second components, and use the 24-hour clock. The format is hh:mm:ss. 10:05:03

2. TSQL query syntax

TSQL queries are divided into metadata queries and time series data queries.

2.1 Metadata queries

2.1.1 Query TSDB tables or metrics

Execute the SHOW TABLES statement to query tables.

  1. SHOW TABLES FROM TSDB

Note that if your TSDB instance has a large number of metrics and you execute the preceding statement, a long response delay occurs. In this scenario, we recommend that you use the TSDB /api/suggest API endpoint instead of TSQL to query the values of a specific metric.

2.1.2 Query TSDB tables or metrics that have the specified names

The following example queries the tables whose names start with the cpu prefix.

  1. SHOW TABLES FROM TSDB WHERE TABLE_NAME like 'cpu%'

2.1.3 Query the column details of a table

Execute the DESCRIBE statement to query the column details of a table, including the column names and data types.

  1. DESCRIBE TSDB.`cpu.usage_system`
  2. +----------------------+--------------------+--------------+
  3. | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
  4. +----------------------+--------------------+--------------+
  5. | hostname | CHARACTER VARYING | YES |
  6. | rack | CHARACTER VARYING | YES |
  7. | service_environment | CHARACTER VARYING | YES |
  8. | os | CHARACTER VARYING | YES |
  9. | service | CHARACTER VARYING | YES |
  10. | datacenter | CHARACTER VARYING | YES |
  11. | arch | CHARACTER VARYING | YES |
  12. | service_version | CHARACTER VARYING | YES |
  13. | team | CHARACTER VARYING | YES |
  14. | region | CHARACTER VARYING | YES |
  15. | timestamp | TIMESTAMP | NO |
  16. | value | DOUBLE | YES |
  17. +----------------------+--------------------+--------------+

2.2 Time series data queries

You can use SELECT statements to query time series data in TSQL. Each SELECT statement contains query expressions and clauses, such as FROM, WHERE, GROUPBY, and ORDERBY. For more information about the standard SQL statements that are supported by TSQL, see the TSQL syntax topic. The structure of a SELECT statement in TSQL is described as follows:

  1. select_stmt =
  2. SELECT exp_list
  3. FROM from_clause
  4. [where_clause]
  5. [groupby_clause]
  6. [having_clause]
  7. [orderby_clause]
  8. [limit_clause]

2.2.1 TSQL reserved words

For more information, see the reserved words of the standard SQL. If the identifiers of table, column, or function names in a query contain special characters or reserved words, you must use the escape characters () to format the identifiers. This processing method is the same as that in the MySQL lexical analysis. For example, if you use the timestamp and value reserved words in a query, you must use the escape characters () to enclose these two reserved words.

  1. SELECT `timestamp`, `value` from tsdb.table_name

Another example is provided to explain when to use escape characters. In this example, the cpu.usage_system metric contains the period (.) special character. You must use the escape characters (`) to enclose the metric.

  1. SELECT * from tsdb.`cpu.usage_system`

3. TSQL time series data queries

3.1 Conditional filtering

3.1.1 Time-based filtering

You can perform time-based filtering based on the timestamp column. The following table describes the available time filter conditions.

Filter Description Additional information
timestamp between ‘2019-03-01 00:00:00’ and ‘2019-03-01 00:00:10’ Specifies the start time and end time of a time range. The start time and the end time use the yyyy-MM-dd HH:mm:ss.SSS format. By default, timestamp strings use the UTC+0 time zone.
timestamp between 1551398400000 and 1551398410000 Specifies the start time and end time of a time range. The start time and end time of the time range are specified by the number of milliseconds that have elapsed since 00:00:00, January 1, 1970. The number of milliseconds that have elapsed since 00:00:00, January 1, 1970, must be an integer of the LONG data type, and cannot be a string.
For more information, see the descriptions of the start and end parameters in the /api/query API endpoint.
timestamp >= ‘2019-03-01 00:00:00’ Specifies the start time of a time range. The format is yyyy-MM-dd HH:mm:ss.SSS.
timestamp >= 1551398400000 Specifies the start time of a time range. The start time is specified by the number of milliseconds that have elapsed since 00:00:00, January 1, 1970.
timestamp <= ‘2019-03-01 00:00:10’ Specifies the end time of a time range. The format is yyyy-MM-dd HH:mm:ss.SSS.
timestamp < 1551398410000 Specifies the end time of a time range. The end time is specified by the number of milliseconds that have elapsed since 00:00:00, January 1, 1970.
timestamp >= ‘2019-03-01 00:00:00’ and timestamp <= ‘2019-03-01 00:00:10’ Specifies the start time and end time of a time range. The format is yyyy-MM-dd HH:mm:ss.SSS.
timestamp >=1551398400000 and timestamp <=1551398410000 Specifies the start time and end time of a time range. The start time and end time are specified by the number of milliseconds that have elapsed since 00:00:00, January 1, 1970.
timestamp between date_diff(now(), interval ‘5’ minute) and now() Specifies the start time and end time of a time range. The start time uses the timestamp format, and is 5 minutes earlier than the current time. The start time is calculated based on the date_diff() function. The end time of the time range is the current time that is returned by the now() function.
timestamp >= date_diff(now(), interval ‘5’ minute) and timestamp <= now() Specifies the start time and end time of a time range. The start time uses the timestamp format, and is 5 minutes earlier than the current time. The start time is calculated based on the date_diff() function. The end time of the time range is the current time that is returned by the now() function.

The considerations are described as follows:

  • By default, timestamp strings use the yyyy-MM-dd HH:mm:ss.SSS format and use the UTC+0 time zone.
  • The start timestamp and end timestamp can be specified by the number of milliseconds that have elapsed since 00:00:00, January 1, 1970. The number of milliseconds that have elapsed since 00:00:00, January 1, 1970, must be an integer of the LONG data type, cannot be a string. Note that the time unit is milliseconds.
  • TSQL queries require time-based filtering. If you do not specify a time range in a query, TSQL may scan large amounts of data, which compromises system performance.
  • In a specified time range, the start and end timestamp constants must use the yyyy-MM-dd HH:mm:ss.SSS format. If the format is invalid, TSQL cannot parse the timestamps into correct constants and processes the timestamps as values of the VARCHAR data type. In this case, the system returns unexpected query results.

3.1.2 TSQL timestamps and time zones

3.1.2.1 Time zones supported by the versions earlier than TSDB V2.5.9

You can query a TSDB database version by using the /api/version API endpoint. The following rules apply to the versions that are earlier than TSDB V2.5.9:

  • Input timestamp strings: In TSQL queries, Input timestamp strings use the yyyy-MM-dd HH:mm:ss.SSS format. By default, the strings use the UTC+0 time zone. For example, if the input timestamp string is 2019-03-01 00:00:00.000, the string is parsed into 2019-03-01 00:00:00.00 (UTC+0). The UTC+0 time zone is also known as Greenwich Mean Time (GMT). If you use the UTC+8 time zone, the time is 2019-03-01 08:00:00.00 (UTC+8).
  • Output data of the TIMESTAMP type: In query results, the values in the column of the TIMESTAMP data type also use UTC+0 as the default time zone.

Example

  1. select hostname, `timestamp`, `value`
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  4. limit 2;
  5. +-----------+------------------------+---------------------+
  6. | hostname | timestamp | value |
  7. +-----------+------------------------+---------------------+
  8. | host_7 | 2019-03-01 00:00:00.0 | 14.568605774857241 |
  9. | host_7 | 2019-03-01 00:00:10.0 | 14.722472083963408 |
  10. +-----------+------------------------+---------------------+

This example queries the records whose timestamps fall within the range of 2019-03-01 00:00:00 (UTC+0) to 2019-03-01 00:00:10 (UTC+0). The values of the timestamp column in the query result also use the UTC+0 time zone.

3.1.2.2 Time zones supported by TSDB V2.5.9

In TSDB V2.5.9, you can specify a time zone offset to use all the time zones.

  • Input timestamp strings: In TSQL queries, input timestamp strings use the yyyy-MM-dd HH:mm:ss.SSS ZZZ format. You can specify a time zone offset based on your business needs. For example, if you want to use the UTC+8 time zone, you can specify the time zone offset as +0800. The value 2019-03-01 08:00.00 + 0800 indicates 8:00 in the morning on March 1, 2019 (UTC+8).
  • Output data of the TIMESTAMP type: TSDB introduces the function localtime(timestamp, timezoneOffset). This function returns the local time in the specified time zone. For example, you can use the localtime(timestamp, ‘+ 0800’) function to obtain the local time in the UTC+8 time zone.

Example

  1. select hostname, localtime(`timestamp`, '+0800') as ltime, `value`
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 08:00:00 +0800' and '2019-03-01 08:00:10 +0800'
  4. limit 2;
  5. +-----------+------------------------+---------------------+
  6. | hostname | ltime | value |
  7. +-----------+------------------------+---------------------+
  8. | host_7 | 2019-03-01 08:00:00.0 | 14.568605774857241 |
  9. | host_7 | 2019-03-01 08:00:10.0 | 14.722472083963408 |
  10. +-----------+------------------------+---------------------+

This example uses the localtime() function and adds ‘+0800’ to the timestamp string to query the records. The timestamps of the records fall within the range of 8:00:00 to 8:00:10 in the morning on March 1, 2019 (UTC+8). The UTC+8 time zone is used in the query result. In the preceding two examples, the system returns the same set of records. The difference is that the timestamp strings use different time zones.

Note: TSDB V2.5.9 and later support the query semantics that are used in the versions earlier than TSDB V2.5.9. Therefore, if your application uses the UTC+0 time zone to query records, you can continue to use the UTC+0 time zone in TSDB V2.5.9 and later.

You can use the localtime() function to query the local time based on the result that is returned by the tumble() function. An example is provided as follows:

  1. select
  2. count(*) as cnt,
  3. localtime(tumble(`timestamp`, interval '5' second), '+0800') as ltime
  4. from tsdb.`cpu.usage_system`
  5. where `timestamp` between '2019-03-01 08:00:00 +0800' and '2019-03-01 08:00:10 +0800'
  6. group by ltime;
  7. +------+------------------------+
  8. | cnt | ltime |
  9. +------+------------------------+
  10. | 10 | 2019-03-01 08:00:00.0 |
  11. | 10 | 2019-03-01 08:00:10.0 |
  12. +------+------------------------+

3.1.3 Conditional filtering based on tags

TSQL allows you to perform conditional filtering based on tag columns. You can filter data based on various range comparison operators and regular comparison operators, such as equal to and not equal to. You can also filter data by retrieving a list of multiple values. In addition, you can filter data by comparing the values returned by the functions that are applied to the columns. You can combine these filter conditions by using AND and OR logical operators to perform advanced conditional filtering.

Comparison condition Example Description
Equal to operator hostname = ‘host_0’ Queries the timeline data where the host name is host_0.
A list of multiple values hostname in (‘host_0’, ‘host_1’, ‘host_2’) Queries the timeline data where the host name is host_0, host_1, or host_2.
Range comparison operators hostname >= ‘host_0’ Queries the timeline data where the ASCII values of the host name strings are larger than the ASCII value of the host_0 string.
String functions substr(hostname, 2, 4) = ‘ost’ Queries the timelines where the second to fourth character of the host name is ost.
String functions upper(substr(hostname, 1,4))=’HOST’ Queries the timelines where the uppercase versions for the first four characters of the host name are HOST. For example, the first four characters can be host, Host, HOST, and hOST.
Integer functions length(hostname) = 5 Queries the timelines where the host name consists of five characters.

3.1.4 Conditional filtering based on metric values

TSQL allows you to perform conditional filtering based on metric value columns. You can filter metric values by using the following conditions.

Comparison condition Example Description
Equal to operator value = 100.0 Queries the data points where the metric values are equal to 100.
Range comparison operators value >= 5.0 and value < = 60.0 Queries the data points where the metric values range from 5.0 to 60.0.
Computing results of functions sqrt(value) < 15.0 Queries the data points where the square roots of the metric values are smaller than 15.0.

3.2 Data aggregation based on groups

You can use GROUP BY clauses in SQL queries to divide data into groups, and use the supported aggregate functions to aggregate data. You can divide data into groups based on the following factors: tag columns or tag column expressions, timestamp intervals, and the combination of the preceding two factors. For more information about aggregate functions, see the aggregate functions in the “TSQL functions” topic.

3.2.1 Data aggregation based on tag columns

You can specify tag columns or the expressions of tag columns in GROUP BY clauses to perform data aggregation.

The following example divides the metric values into groups based on the data center and the host name prefix. The prefix consists of the first three characters of the host name. The example also calculates the maximum, minimum, and average metric values in each group.

  1. select
  2. datacenter,
  3. substr(hostname, 1,3) preHost,
  4. max(`value`) maxV,
  5. min(`value`) minV,
  6. avg(`value`) avgV
  7. from tsdb.`cpu.usage_system`
  8. where `timestamp` >='2016-01-01' and
  9. `timestamp` <= '2016-01-02'
  10. group by datacenter, prehost

3.2.2 Downsampling: Data aggregation based on timestamp intervals

TSQL introduces the tumble() function to divide a series of timestamps into multiple non-overlapping windows that have the same interval. You can use this function to divide data into groups based on a specified timestamp interval.

  • Syntax of the tumble() function
  1. tumble(timestamp_column, interval_expression)

Examples:
tumble(`timestamp`, interval ‘1’ hour): divides a series of timestamps into windows based on an interval of one hour.
tumble(`timestamp`, interval ‘5’ minute): divides a series of timestamps into windows based on an interval of five minutes.

The following example queries the timeline data points that meet the specified conditions. In this example, the timestamp range of the data points is one day, and the data points are divided into groups based on a timestamp interval of one hour. This example calculates the maximum, minimum, and average values of data points in each group.

  1. select
  2. tumble(`timestamp`, interval '1' hour) ts,
  3. max(`value`) maxV,
  4. min(`value`) minV,
  5. avg(`value`) avgV
  6. from tsdb.`cpu.usage_system`
  7. where `timestamp` >='2016-01-01' and
  8. `timestamp` <= '2016-01-02'
  9. group by ts

TSQL data aggregation based on timestamp intervals is similar to the TSDB downsampling feature. The differences are described as follows:

  • The TSDB downsampling feature is supported by the /api/query API endpoint in the TSDB protocol. This feature allows you to aggregate data from a single timeline. TSQL data aggregation based on timestamp intervals allows you to aggregate data from multiple timelines.
  • The TSDB downsampling feature can use the aggregate functions that are different from those for aggregating multiple timelines. TSQL data aggregation based on timestamp intervals must use the same aggregation functions as those for aggregating multiple timelines.

3.2.3 Data aggregation based on the combination of tag columns or tag column expressions and timestamp intervals

You can group and aggregate data based on the two factors: tag columns or tag column expressions, and timestamp intervals.
The following example divides the metric values into groups based on the timestamp interval of one hour, the data center, and the host name prefix. The prefix consists of the first three characters of the host name. The example also calculates the maximum, minimum, and average metric values in each group.

  1. select
  2. datacenter,
  3. substr(hostname, 1,3) preHost,
  4. tumble(`timestamp`, interval '1' hour) ts,
  5. max(`value`) maxV,
  6. min(`value`) minV,
  7. avg(`value`) avgV
  8. from tsdb.`cpu.usage_system`
  9. where `timestamp` >='2016-01-01' and
  10. `timestamp` <= '2016-01-02'
  11. group by datacenter, preHost, ts

3.2.4 Query the latest or the earliest data point on each timeline

To query the latest data point on a timeline, you can use the ts_last() time series data aggregation function. To query the earliest data point on a timeline, you can use the ts_first() time series data aggregation function.

  1. select
  2. hostname,
  3. ts_last(`value`, `timestamp`) as v_last,
  4. max(`timestamp`) maxT
  5. from tsdb.`cpu.usage_system`
  6. where `timestamp` >='2016-01-01 00:00:00' and
  7. `timestamp` <= '2016-01-01 01:00:00'
  8. group by hostname
  9. order by hostname

This example queries the latest data point in the specified time range and the timestamp of the latest data point for each host.

3.2.5 Aggregate data based on tag columns and timestamp intervals, and calculate the latest or the earliest data point in each group

You can divide data into groups for aggregation based on tag key columns and timestamp intervals, and query the latest or earliest data point for each group. To divide data into groups based on tag key columns, you can specify the tag key columns in GROUP BY clauses. When you divide data into groups based on timestamp intervals, you can use the tumble() function to obtain the intervals.

  1. select
  2. hostname,
  3. tumble(`timestamp`, interval '10' minute) ts,
  4. ts_last(`value`, `timestamp`) as v_last,
  5. max(`timestamp`) maxT
  6. from tsdb.`cpu.usage_system`
  7. where `timestamp` >='2016-01-01 00:00:00' and
  8. `timestamp` <= '2016-01-01 01:00:00'
  9. group by hostname, ts
  10. order by hostname, ts

3.3 Join metrics for association analysis

The TSQL query engine allows you to perform JOIN operations on multiple timelines. This allows you to analyze metrics that have different names. To join multiple metrics of different timelines, you must specify the conditions where the operator is equal to (=).

  • The timestamps for the timelines must be the same.
  • The tags for the timelines must be the same.

The following example joins the cpu.usage_system and cpu.usage_idle metrics of two timelines for association analysis. This example queries the metric values where the host name and the timestamp on timeline 1 are the same as those in timeline 2.

  1. select t1.`timestamp`, t1.`value` as value1, t2.`value` as value2, t1.hostname
  2. from tsdb.`cpu.usage_system` t1, tsdb.`cpu.usage_idle` t2
  3. where t1.`timestamp` >='2016-01-01' and t1.`timestamp` <= '2016-01-02'
  4. and t1.hostname = t2.hostname
  5. and t1.`timestamp`= t2.`timestamp`
  6. limit 100

The following example uses the same join conditions as those in the preceding example. The tags and timestamps are aligned, and metric values are divided into groups for aggregation based on the host name. This example calculates the average values of the two metrics.

  1. select t1.hostname, avg(t1.`value`) as avg_sys, avg(t2.`value`) as avg_idle
  2. from tsdb.`cpu.usage_system` t1, tsdb.`cpu.usage_idle` t2
  3. where t1.`timestamp` >='2016-01-01' and t1.`timestamp` <= '2016-01-02'
  4. and t1.hostname = t2.hostname
  5. and t1.`timestamp`= t2.`timestamp`
  6. group by t1.hostname