This topic describes the Transact-SQL (TSQL) syntax and features.

Supported data types

Data type Description Valid value
Boolean A Boolean value. True and False
SMALLINT A 16-bit signed integer. -32,768 to 32,767
INTEGER or INT A 32-bit signed integer. -2,147,483,648 to 2,147,483,647
BIGINT A 64-bit signed integer. -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
FLOAT A 32-bit single-precision floating point number.
DOUBLE A 64-bit double-precision floating point number. If a metric value in Time Series Database (TSDB) is a floating point number, the metric value is processed as a double-precision floating point number by default.
VARCHAR A UTF-8 encoded string.
Timestamp A timestamp that is in the yyyy-MM-dd HH:mm:ss.SSS format. A timestamp that is not earlier than 1970-01-01 00:00:00.000
Date A date that is in the YYYY-MM-DD format. 2019-03-01
Time A point in time that is in the hh:mm:ss format. 10:05:03

2. Query syntax

TSQL queries are classified into two types: metadata queries and time series data queries.

2.1 Metadata queries

2.1.1 Query tables or metrics from TSDB

You can execute the SHOW TABLES statement to query tables.

SHOW TABLES FROM TSDB

If you execute the SHOW TABLES statement to query data tables from a TSDB instance that contains a large number of metrics, a long delay occurs. We recommend that you call the /api/suggest operation provided by TSDB to query the specific metrics.

2.1.2 Query tables or metrics that meet specific conditions from TSDB

For example, you can execute the following statement to query tables whose names start with cpu:

SHOW TABLES FROM TSDB WHERE TABLE_NAME like 'cpu%'

2.1.3 Query information about columns in a data table

You can execute the DESCRIBE statement to query information about all columns in a data table. The name and data type of each column are returned.

DESCRIBE TSDB.`cpu.usage_system`
+----------------------+--------------------+--------------+
|     COLUMN_NAME      |     DATA_TYPE      | IS_NULLABLE  |
+----------------------+--------------------+--------------+
| hostname             | CHARACTER VARYING  | YES          |
| rack                 | CHARACTER VARYING  | YES          |
| service_environment  | CHARACTER VARYING  | YES          |
| os                   | CHARACTER VARYING  | YES          |
| service              | CHARACTER VARYING  | YES          |
| datacenter           | CHARACTER VARYING  | YES          |
| arch                 | CHARACTER VARYING  | YES          |
| service_version      | CHARACTER VARYING  | YES          |
| team                 | CHARACTER VARYING  | YES          |
| region               | CHARACTER VARYING  | YES          |
| timestamp            | TIMESTAMP          | NO           |
| value                | DOUBLE             | YES          |
+----------------------+--------------------+--------------+

2.2 Query time series data

TSQL allows you to query time series data by executing the SELECT statement. The SELECT statement contains the query expression and SQL clauses such as FROM, WHERE, GROUP BY, and ORDER BY. The following code provides the syntax of the SELECT statement:

select_stmt = 
        SELECT exp_list 
        FROM from_clause 
        [where_clause]  
        [groupby_clause] 
        [having_clause] 
        [orderby_clause] 
        [limit_clause]

2.2.1 Reserved words

For information about all SQL reserved words, see the SQL standard. If the name of a table, a column, or a function that you want to query contains special characters or reserved words, you must use escape characters (`) to enclose the name. This processing method is the same as the MySQL lexical analysis. For example, if you want to use reserved words timestamp and value in your query, the SELECT statement to execute must be in the following format:

SELECT `timestamp`, `value` from tsdb.table_name

For example, if you want to query the cpu.usage_system metric whose name contains a period (.), use a pair of escape characters (`) in your statement to enclose the metric name. The following code provides the syntax:

SELECT * from tsdb.`cpu.usage_system`

3. Time series data queries

3.1 Filters

3.1.1 Time-based filters

A time-based filter is a filter condition for the timestamp column. The following table describes the supported time-based filters.

Filter Description Remarks
timestamp between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' Specifies the start time and end time to determine a time range. The start time and the end time must be in the yyyy-MM-dd HH:mm:ss.SSS format. By default, the specified timestamps are identified as UTC timestamps.
timestamp between 1551398400000 and 1551398410000 Specifies the start time and end time to determine a time range. The start time and the end time must be epoch timestamps expressed in milliseconds. The epoch timestamps must be of the LONG data type and cannot be strings. For more information, see the descriptions about the start and end parameters in the "Query single-value data points" topic.
timestamp >= '2019-03-01 00:00:00' Specifies the start time to determine a time range. The start time must be in the yyyy-MM-dd HH:mm:ss.SSS format.
timestamp >= 1551398400000 Specifies the start time to determine a time range. The start time must be an epoch timestamp expressed in milliseconds.
timestamp <= '2019-03-01 00:00:10' Specifies the end time to determine a time range. The end time must be in the yyyy-MM-dd HH:mm:ss.SSS format.
timestamp < 1551398410000 Specifies the end time to determine a time range. The end time must be an epoch timestamp expressed in milliseconds.
timestamp >= '2019-03-01 00:00:00' and 'timestamp' <= '2019-03-01 00:00:10' Specifies the start time and end time to determine a time range. The start time and the end time must be in the yyyy-MM-dd HH:mm:ss.SSS format.
timestamp >=1551398400000 and `timestamp` <=1551398410000 Specifies the start time and end time to determine a time range. The start time and the end time must be epoch timestamps expressed in milliseconds.
timestamp between date_diff(now(), interval '5' minute) and now() Specifies the start time and end time to determine a time range. The date_diff() function returns the timestamp that is 5 minutes earlier than the current timestamp as the start time of the time range. The now() function returns the current timestamp as the end time of the time range.
timestamp >= date_diff(now(), interval '5' minute) and timestamp <= now() Specifies the start time and end time to determine a time range. The date_diff() function returns the timestamp that is 5 minutes earlier than the current timestamp as the start time of the time range. The now() function returns the current timestamp as the end time of the time range.

Before you use the preceding filters, take note of the following items:

  • By default, the timestamps that are in the yyyy-MM-dd HH:mm:ss.SSS format are identified as UTC timestamps.
  • Epoch timestamps in milliseconds can be specified as the start time and end time of a time range. The epoch timestamps must be of the LONG data type and cannot be strings.
  • Time-based filters are required for TSQL queries. If you do not specify time-based filters when you perform a TSQL query, a large volume of data is scanned. This reduces the system performance.
  • When you specify the start time or the end time of a time range in the date and time format, you must specify a timestamp in the yyyy-MM-dd HH:mm:ss.SSS format. Otherwise, TSQL cannot parse the timestamp. As a result, TSQL processes the timestamp as a varchar and returns an unexpected result.

3.1.2 Supported time zones

3.1.2.1 Time zones supported by TSDB versions earlier than V2.5.9

You can call the /api/version operation to check the TSDB version. The following rules apply to TSDB versions earlier than V2.5.9:

  • Timestamps specified in TSQL queries must be in the yyyy-MM-dd HH:mm:ss.SSS format. By default, the timestamps are identified as UTC timestamps. For example, if the timestamp that you specify is 2019-03-01 00:00:00:00.000, the string is parsed into 2019-03-01 00:00:00:00.00 (UTC+0). The UTC+0 time zone is known as Greenwich Mean Time (GMT). If you use the UTC+8 time zone, the time is 2019-03-01 08:00:00.00 (UTC+8).
  • By default, timestamps in query results are UTC timestamps.

Example:

select hostname, `timestamp`, `value` 
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' 
limit 2;
+-----------+------------------------+---------------------+
| hostname  |       timestamp        |        value        |
+-----------+------------------------+---------------------+
| host_7    | 2019-03-01 00:00:00.0  | 14.568605774857241  |
| host_7    | 2019-03-01 00:00:10.0  | 14.722472083963408  |
+-----------+------------------------+---------------------+

In this example, the records generated during GMT 2019-03-01 00:00:00 and GMT 2019-03-01 00:00:10 are queried. The timestamps returned in the timestamp column are GMT timestamps.

3.1.2.2 Time zones supported by TSDB V2.5.9 and later

Non-UTC time zones are supported by TSDB V2.5.9 and later.

  • Timestamps in queries are in the yyyy-MM-dd HH:mm:ss.SSS ZZZ format. ZZZ specifies the offset of the time zone and is optional. If you want to use UTC+8 as the time zone, set ZZZ to +0800. For example, 2019-03-01 08:00.00 +0800 indicates 08:00 on March 1, 2019 (UTC+8).
  • The localtime(timestamp, timezoneOffset) function is used to convert a specific timestamp to the local time in the specified time zone. The timestamp parameter specifies the timestamp. The timezoneOffset parameter specifies the offset for the time zone. For example, if you want to obtain the local time in UTC+8, call the localtime(timestamp, '+0800') function.

Example:

select hostname, localtime(`timestamp`, '+0800') as ltime, `value` 
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 08:00:00 +0800' and '2019-03-01 08:00:10 +0800' 
limit 2;
+-----------+------------------------+---------------------+
| hostname  |         ltime          |        value        |
+-----------+------------------------+---------------------+
| host_7    | 2019-03-01 08:00:00.0  | 14.568605774857241  |
| host_7    | 2019-03-01 08:00:10.0  | 14.722472083963408  |
+-----------+------------------------+---------------------+

In the preceding example, the localtime() returns the records generated during 2019-03-01 00:00:00 UTC+8 and 2019-03-01 00:00:10 UTC+8. The timestamps in the ltime column are in UTC+8. The returned records in this example are the same as the returned records in the example provided in the 3.1.2.1 section, though the timestamps are displayed based on different time zones.

Versions of TSDB later than V2.5.9 are compatible with the query syntax that is used in versions earlier than V2.5.9. This way, the UTC timestamps used in your applications are supported by all versions.

The localtime() function can be used on the results of the tumble() function. The following code provides an example:

select 
    count(*) as cnt,   
    localtime(tumble(`timestamp`, interval '5' second),  '+0800') as ltime 
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 08:00:00 +0800' and '2019-03-01 08:00:10 +0800' 
group by ltime;
+------+------------------------+
| cnt  |         ltime          |
+------+------------------------+
| 10   | 2019-03-01 08:00:00.0  |
| 10   | 2019-03-01 08:00:10.0  |
+------+------------------------+

3.1.3 Tag-based filters

TSQL supports various simple tag-based filters such as Equal To and Not Equal To. You can also filter data by retrieving a list of multiple values. In addition, you can filter data by comparing the values returned by the functions that are applied to the columns. Tag-based filters can be combined by using AND and OR logical operators to perform advanced conditional filtering.

Filter Example Description
Equal To operator hostname = 'host_0' Retrieves the data points in which hostname is host_0.
A list of multiple values hostname in ('host_0', 'host_1', 'host_2') Retrieves the data points in which hostname is host_0, host_1, or host_2.
Range comparison operators hostname >= 'host_0' Retrieves the data points in which the ASCII value of hostname is larger than the ASCII value of host_0.
String functions substr(hostname, 2, 4) = 'ost' Retrieves the data points in which the substring of the hostname that starts from the second character and ends at the fourth character is ost.
String functions upper(substr(hostname, 1, 4)) = 'HOST' Retrieves the data points in which hostname starts with a string that is HOST after converting to uppercase letters.
Integer functions length(hostname) = 5 Retrieve the data points in which hostname consists of five characters.

3.1.4 Metric value-based filters

TSQL allows you to filter data points based on the metric value. The following table describes the supported metric value-based filters.

Filter Example Description
Equal To operator value = 100.0 Retrieves the data points whose metric value is 100.
Range comparison operators value >= 5.0 and value < = 60.0 Retrieves the data points in which the metric value is in the range from 5.0 to 60.0.
Computing results of functions sqrt(value ) < 15.0 Retrieves the data points in which the square root of the metric value is less than 15.0.

3.2 Grouping and aggregation

You can use the GROUP BY clause in SQL queries to specify the grouping rules and then use the aggregate functions listed in the preceding table to aggregate the data in each group. Rules for grouping data points can be classified into three types: tag-based, timestamp-based, and tag- and timestamp-based. For more information about aggregate functions, see the list of aggregate functions described in the "TSQL functions" topic.

3.2.1 Tag-based grouping and aggregation

TSDB allows you to add a tag or a tag expression in the GROUP BY clause to group and aggregate data points.

The following code provides an example. After you execute the following statement, the system groups data points based on the first three characters in the values of datacenter and hostname, and then calculates the maximum value, minimum value, and average value of the metric values for each group.

select
  datacenter,
  substr(hostname, 1,3) preHost,
  max(`value`) maxV,
  min(`value`) minV,
  avg(`value`) avgV
from tsdb.`cpu.usage_system`
where `timestamp` >='2016-01-01' and
  `timestamp` <= '2016-01-02'
group by datacenter, prehost

3.2.2 Downsampling: timestamp-based grouping and aggregation

TSQL provides the tumple() function. The tumple() function can be used to divide a series of timestamps into multiple non-overlapping time windows based on a specified interval. You can call this function to divide data points into groups based on a specified timestamp interval.

  • tumble
tumble(timestamp_column, interval_expression)

If you call the tumble(`timestamp`, interval '1' hour) function, the function divides the timestamps into windows based on an interval of 1 hour. If you call the tumble(`timestamp`, interval '5' minute) function, the function divides the timestamps into windows based on an interval of 5 minutes.

For example, you can execute the following statement to query the data points in the timeline of one day. In this example, timestamps are divided into windows based on an interval of 1 hour. The maximum value, minimum value, and average value of the data points in each group are returned.

select
  tumble(`timestamp`, interval '1' hour) ts,
  max(`value`) maxV,
  min(`value`) minV,
  avg(`value`) avgV
from tsdb.`cpu.usage_system`
where `timestamp` >='2016-01-01' and
  `timestamp` <= '2016-01-02'
group by  ts

The timestamp-based grouping and aggregation feature provides functionality similar to the downsampling feature. However, the timestamp-based grouping and aggregation feature and the downsampling feature have multiple differences. The following information details the differences.

  • The downsampling feature can be used by calling the /api/query operation for a single timeline. Timestamp-based grouping and aggregation can be performed on multiple timelines.
  • TSQL uses the same aggregate function for timestamp-based aggregation on a single timeline and on multiple timelines. The aggregate function provided by TSDB for timestamp-based aggregation is different from the aggregate function used for downsampling.

3.2.3 Group and aggregate data points based on the tag column and the timestamp interval

TSQL allows you to group and aggregate data points based on the tag column and the timestamp interval. To specify the tag column, you can include the column name or the expression on the column in the GROUP BY clause. The following code provides an example. After you execute the following statement, the system groups data points based on the first three characters in the hostnames of the data points and the timestamps at an interval of 1 hour. Then, the system calculates the maximum value, minimum value, and mean value of the metric values for each group.

select
  datacenter,
  substr(hostname, 1,3) preHost,
  tumble(`timestamp`, interval '1' hour) ts,
  max(`value`) maxV,
  min(`value`) minV,
  avg(`value`) avgV
from tsdb.`cpu.usage_system`
where `timestamp` >='2016-01-01' and
  `timestamp` <= '2016-01-02'
group by datacenter, preHost, ts

3.2.4 Query the first and most recent data points in a timeline

You can call the ts_last() function to obtain the most recent data point in a timeline and the ts_first() function to obtain the first data point in a timeline.

select
  hostname,
  ts_last(`value`, `timestamp`) as v_last,
  max(`timestamp`) maxT
from tsdb.`cpu.usage_system`
where `timestamp` >='2016-01-01 00:00:00' and
  `timestamp` <= '2016-01-01 01:00:00'
group by hostname
order by hostname

After you execute the preceding statement, the metric value and the timestamp of the most recent data point generated in the specified time range are returned.

3.2.5 Query the first and most recent data points in each group that is divided based on the tag key column and the timestamp interval

You can divide data points into groups for aggregation based on the tag key column and the timestamp interval and query the first and most recent data points in each group. To specify the tag key column, include the column name in the GROUP BY clause. To obtain the timestamp interval, call the tumble() function.

select
  hostname, 
  tumble(`timestamp`, interval '10' minute) ts,
  ts_last(`value`, `timestamp`) as v_last,
  max(`timestamp`) maxT
from tsdb.`cpu.usage_system`
where `timestamp` >='2016-01-01 00:00:00' and
  `timestamp` <= '2016-01-01 01:00:00'
group by hostname, ts
order by hostname, ts

3.3 Join multiple metrics for association analysis

The TSQL query engine allows you to join multiple timelines for association analysis. You can use the TSQL query engine to analyze metrics whose names are different. Before you join multiple metrics, you must specify the attribute based on which the metrics must be the same. The attribute can be one of the following items:

  • The timestamp
  • Specific tags

For example, execute the following statement to join the cpu.usage_system and cpu.usage_idle metrics for association analysis. Then, the system returns the metric values that have the same hostname and the same timestamp.

select t1.`timestamp`, t1.`value` as value1, t2.`value` as value2, t1.hostname
from tsdb.`cpu.usage_system` t1, tsdb.`cpu.usage_idle` t2
where t1.`timestamp` >='2016-01-01' and t1.`timestamp` <= '2016-01-02' 
 and t1.hostname = t2.hostname
 and t1.`timestamp`= t2.`timestamp`
 limit 100

The following code provides another example. In this example, the join conditions are the same as those specified in the previous example. After the following statement is executed, the tags and the timestamps are aligned, and metric values are divided into groups for aggregation based on hostname. Then, the average values of the metrics are calculated and returned.

select t1.hostname, avg(t1.`value`) as avg_sys, avg(t2.`value`) as avg_idle
from tsdb.`cpu.usage_system` t1, tsdb.`cpu.usage_idle` t2
where t1.`timestamp` >='2016-01-01' and t1.`timestamp` <= '2016-01-02' 
 and t1.hostname = t2.hostname
 and t1.`timestamp`= t2.`timestamp`
group by t1.hostname