All Products
Search
Document Center

Lindorm:Downsampling queries

Last Updated:Feb 18, 2024

A downsampling query is an aggregate query that is performed based on a specified time interval. Downsampling queries are performed to decrease the sample rate in time series scenarios.

Applicable engines and versions

Downsampling queries are supported only by LindormTSDB 3.4.15 and later versions.

Note

For more information about how to view and upgrade the LindormTSDB version of the Lindorm instance, see Release notes of LindormTSDB and Upgrade the minor engine version of a Lindorm instance.

Syntax

select_sample_by_statement ::=  SELECT ( select_clause | '*' )
                                FROM table_identifier
                                WHERE where_clause
                                SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause              ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector                   ::=  tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause               ::=  relation ( AND relation )* (OR relation)*
relation                   ::=  ( field_identifier | tag_identifier ) operator term
operator                   ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval              ::= interval units | 0
offset_interval            ::= interval units

Aggregate functions supported by downsampling queries

The SAMPLE BY clause is executed to perform downsampling operations based on independent time series. For more information, see Data model.

The following table lists the functions supported by SAMPLE BY.

Function

Description

SUM

Returns the sum of values that are collected within each time window. For more information, see SUM.

AVG

Returns the average value that is calculated based on values that are collected within each time window. For more information, see AVG.

COUNT

Returns the number of values that are collected within each time window. For more information, see COUNT.

MIN

Returns the minimum value that is collected within each time window. For more information, see MIN.

MAX

Returns the maximum value that is collected within each time window. For more information, see MAX.

FIRST

Returns the first value that is collected within each time window. For more information, see FIRST.

LAST

Returns the last value that is collected within each time window. For more information, see LAST.

PERCENTILE

Returns the specified percentile of values that are collected within each time window. For more information, see PERCENTILE.

LATEST

Returns the latest values that are collected within a time window. For more information, see LATEST.

RATE

Returns the change rate of the value in the current data point compared with the value of the previous data point. For more information, see RATE.

DELTA

Returns the difference between the value of the current data point and the value in the previous data point. For more information, see DELTA.

Examples

You do not need to specify a function for downsampling in the SELECT statement to query data from the tag column of a table. If you want to query data from non-tag columns, you must specify a function for downsampling.

The following statement is executed to query data from a table named sensor and the sample results that are returned:

SELECT * FROM sensor;

The following result is returned:

+-----------+----------+---------------------------+-------------+-----------+
| device_id |  region  |           time            | temperature | humidity  |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260  | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000    | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000    | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000    | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000   | 47.000000 |
| F07A1261  | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000    | 40.000000 |
| F07A1261  | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000    | 41.000000 |
| F07A1261  | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000    | 42.000000 |
| F07A1261  | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000    | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+

Sample downsampling queries and downsampling queries that include subqueries

Note

Subqueries are not supported in downsampling queries. However, downsampling queries can be used as subqueries in other types of queries.

  • Example 1: Downsampling is performed at an interval that is specified based on the UTC time. The following statement is executed to aggregate time series based on an 8-hour interval and return the number of values that are collected in the humidity column within each time window:

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 3              |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • Example 2: Downsampling is performed at an interval that is specified based on the UTC time and a specified time offset. The following statement specifies that time series are aggregated based on an 8-hour interval, the offset of the start time of each time window is 3 hours, and the number of values that are collected in the humidity column within each time window is calculated:

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T03:00:00+08:00 | 1              |
    | F07A1260  | north-cn | 2021-01-01T11:00:00+08:00 | 2              |
    | F07A1260  | north-cn | 2021-01-01T19:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • Example 3: Downsampling is performed at an interval that is specified based on the UTC time. The following statement specifies that time series are aggregated based on a 24-hour interval, the offset of the start time of each time window is 16 hours (aligned to 00:00 in GMT+08:00), and the number of values that are collected in the humidity column within each time window is calculated:

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h

    The following result is returned:

    +-----------+----------+---------------------------+----------------+
    | device_id | region   | time                      | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T00:00:00+08:00 | 4              |
    +-----------+----------+---------------------------+----------------+
  • Example 4: The SAMPLE BY clause cannot be used together with the GROUP BY clause, LIMIT OFFSET clause, or ORDER BY clause. You can include the GROUP BY clause, LIMIT OFFSET clause, and ORDER BY clause in SQL statements to specify subqueries.

    SELECT device_id, max(avg_humidity) AS max_humidity  FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;

    The following result is returned:

    +-----------+--------------+
    | device_id | max_humidity |
    +-----------+--------------+
    | F07A1261  | 43.000000    |
    | F07A1260  | 47.000000    |
    +-----------+--------------+
  • Example 5: Use the LIMIT OFFSET clause to specify the number of results that you want to obtain.

    SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;

    The following result is returned:

    +-----------+----------+--------------+
    | device_id |  region  | avg_humidity |
    +-----------+----------+--------------+
    | F07A1261  | north-cn | 40.000000    |
    +-----------+----------+--------------+

Sample interpolation queries

  • Example 1: Interpolate a specified value based on a specified time interval.

    SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+
  • Example 2: Interpolate the value that is collected within the previous time window.

    SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+

Examples of aggregation after downsampling

  • Example 1: Use the AVG function to perform downsampling at an interval of 2 hours and then use the RATE function to calculate the change rate of values between two data points.

    SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    The following result is returned:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | rate_humidity |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500      |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139      |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046      |
    +-----------+----------+---------------------------+---------------+
  • Example 2: Perform downsampling at an interval of 2 hours and then use the DELTA function to calculate the difference between the values of two data points.

    SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    The following result is returned:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000  |
    +-----------+----------+---------------------------+-----------+

Interpolation

During downsampling, all time series are split based on a specified time interval, and values of data points in each time window are aggregated. If no value is collected within a time window, a specified fill policy can be used to interpolate a value for the data point. For example, the timestamps of values that are collected in a time series after downsampling is completed are: t + 0, t + 20, and t + 30. If you do not specify conditions for interpolation in the query statement, only three values are returned. If you specify 1 for interpolation, four values are returned. The value 1 is interpolated as the value of the data point at t + 10.

Functions for interpolation

Fill Policy

Interpolated value

none

No values are interpolated. This is the default value.

zero

Interpolates the value 0.

linear

Performs linear interpolation.

previous

Interpolates the previous value.

near

Interpolates the adjacent value.

after

Interpolates the next value.

fixed

Interpolates a specified value.