A downsampling query is an aggregate query on the time dimension that is commonly used to reduce the sample rate of time series data.
Engine and version
Downsampling queries are supported only by LindormTSDB. The database engine version must be 3.4.15 or later.
For more information about how to view and upgrade the LindormTSDB version of the Lindorm instance, see Release notes of LindormTSDB and Upgrade the minor engine version of a Lindorm instance.
Syntax
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval unitsList of supported aggregate functions
The SAMPLE BY clause performs a downsampling operation on each individual time series. For more information about time series, see Data model.
The following table lists the functions supported by SAMPLE BY.
Function | Description |
Calculates the sum of values within each specified time window. | |
Calculates the mean of values within each specified time window. | |
Counts the number of values within each specified time window. | |
Finds the minimum value within each specified time window. | |
Finds the maximum value within each specified time window. | |
Finds the first value within each specified time window. | |
Finds the last value within each specified time window. | |
Calculates the percentile within each specified time window. | |
Finds the latest value within the entire time range. | |
Calculates the rate of change from the value in the previous row. | |
Calculates the difference from the value in the previous row. |
Downsampling queries support filtering only by tag columns, not by field columns. If you specify a filter condition for a field column, the condition is ignored. This may cause the returned data to differ from your expectations. To filter by a field column, you must filter the data outside the downsampling query.
Take the LATEST function latest(field, n) as an example, where temperature is a field column:
SELECT device_id,region,time,temperature FROM (SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') SAMPLE BY 0) WHERE temperature > 45.0;Examples
A downsampling function is not required for tag columns specified in a SELECT query. However, a downsampling function must be specified for all other field columns.
The following examples use a sample table named sensor that has the following structure and data:
-- Create the sample table sensor
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time)
);
-- Insert data
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 09:00:00',0,9);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 12:01:00',1,45);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 14:03:00',2,46);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 20:00:00',10,47);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-02-10 12:00:30',3,40);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-01 12:01:00',4,41);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-08 12:08:00',5,42);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-05-01 13:00:00',6,43);Downsampling and subquery examples
Downsampling queries do not support nested subqueries, but they can be nested as subqueries within other queries.
Example 1: Calculate the count for each time series by aggregating data into 8 hour time windows with default UTC snapping.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+Example 2: Calculate the count for each time series by aggregating data into 8 hour time windows. This query uses default UTC snapping and a 3 hour offset for the start of the window.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+Example 3: Calculate the count for each time series by aggregating data into 24 hour time windows. This query snaps to 00:00 local time (for example, in the UTC+8 time zone) using a 16 hour offset with default UTC snapping.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16hThe following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+Example 4: You cannot use `SAMPLE BY` with `GROUP BY`, `LIMIT OFFSET`, or `ORDER BY`. To use these clauses, you must nest the `SAMPLE BY` query as a subquery.
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;The following result is returned:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+Example 5: Use `LIMIT OFFSET` to limit the number of results.
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;The following result is returned:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
Downsampling window interpolation examples
Example 1: Interpolate with a static value.
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+Example 2: Interpolate with the next value.
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
Transform after downsampling examples
Example 1: First, downsample the data by calculating the average value in 2 hour windows. Then, calculate the
rateof change for the downsampled data.SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;The following result is returned:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+Example 2: First, downsample the data into 2 hour windows. Then, calculate the delta (difference) for the downsampled data.
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
Interpolation
Downsampling divides each time series into specified time windows and performs an operation on the data within each window. After downsampling, if a time window contains no data points, you can use interpolation to fill that time point with a specified value. For example, a time series has data points at timestamps t+0, t+20, and t+30. If you downsample by a 10 unit time window, the window starting at t+10 is empty. Without interpolation, only three points are returned. If you use interpolation to fill empty windows with the value 1, a new data point with the value 1 is added at timestamp t+10.
The following table describes the interpolation functions.
Fill Policy | Fill value |
none | Default behavior. Does not fill values. |
zero | Fills with 0. |
linear | Fills with a linearly interpolated value. |
previous | The previous value. |
near | An adjacent value. |
after | The next value. |
fixed | Fills with a specified static value. |