降採樣查詢是在時間維度的彙總查詢,是時序資料應用情境常用的降低採樣率的查詢。
引擎與版本
降採樣查詢僅支援時序引擎,且引擎版本需為3.4.15及以上版本。
文法
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units降採樣支援的彙總函式列表
SAMPLE BY是針對每條獨立的時間軸(關於時間軸的概念,請參見資料模型)上的降採樣操作。
SAMPLE BY支援的函數列表:
函數 | 說明 |
每個指定時間視窗內求和。 | |
每個指定時間視窗內求均值。 | |
每個指定時間視窗內值個數。 | |
每個指定時間視窗內最小值。 | |
每個指定時間視窗內最大值。 | |
每個指定時間視窗內第一個值。 | |
每個指定時間視窗內最後值。 | |
每個指定時間視窗內求百分位。 | |
整個時間區間最新值。 | |
與前一行對應值的變動率。 | |
與前一行對應值的差。 |
降採樣查詢中只支援按照tag列進行過濾,不支援按照field列進行過濾。如果指定了field列過濾條件會被忽略掉,這可能導致實際返回的資料結果與預期不一致。因此,如果需要對 field 列進行過濾建議您在降採樣查詢外過濾。
以LATEST函數latest(field, n) 為例,其中temperature為field列:
SELECT device_id,region,time,temperature FROM (SELECT device_id,region,time,latest(temperature, 2) AS temperature FROM sensor WHERE device_id in ('F07A1260', 'F07A1261') SAMPLE BY 0) WHERE temperature > 45.0;樣本
SELECT查詢指定的tag列無需指定降採樣函數,其他field列必須指定降採樣函數。
假設樣本表sensor的結構及表中資料如下:
-- 建立樣本表sensor
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time)
);
-- 插入資料
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 09:00:00',0,9);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 12:01:00',1,45);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 14:03:00',2,46);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1260','north-cn','2021-01-01 20:00:00',10,47);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-02-10 12:00:30',3,40);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-01 12:01:00',4,41);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-03-08 12:08:00',5,42);
UPSERT INTO sensor(device_id, region, time, temperature, humidity) VALUES('F07A1261','north-cn','2021-05-01 13:00:00',6,43);降採樣與子查詢樣本
降採樣查詢中不支援嵌套子查詢,但可以作為子查詢被其他查詢嵌套。
樣本1:預設UTC對齊降採樣,時間軸按照8h時間區間彙總分別求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;返回結果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+樣本2:預設UTC對齊降採樣,指定視窗offset位移,時間軸按照8h時間區間彙總,開始視窗位移3h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;返回結果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+樣本3: 預設UTC對齊降採樣,對齊到當地時間0點(例如東8區時間),時間軸按照24h時間區間彙總,開始視窗位移16h,求count。
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h返回結果如下:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+樣本4:sample by目前不支援與group by、limit offset、order by配合,但可以配合子查詢使用。
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;返回結果如下:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+樣本5:配合limit offset限制結果條數。
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;返回結果如下:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
降採樣視窗插值樣本
樣本1:固定值插值。
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;返回結果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+樣本2:後值插值。
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;返回結果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
降採樣後再轉換樣本
樣本1:先按照2h視窗avg降採樣,然後對降採樣後的資料計算
rate斜率。SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;返回結果如下:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+樣本2:先按照2h視窗降採樣,然後對降採樣後的資料計算差值delta。
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;返回結果如下:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
插值
降採樣先把所有時間軸按照指定時間視窗切分,並把每個降採樣區間內的資料做一次運算,降採樣後如果某個精度區間沒有值,插值可以指定在這個時間點填充具體的值。比如某條時間軸降採樣後的時間戳記為:t+0, t+20, t+30,此時如果不指定插值,只有3個值,如果指定了插值為1,此時間軸會有4個值,其中t+10時刻的值為1。
插值函數表:
Fill Policy | 填儲值 |
none | 預設行為,不填值。 |
zero | 固定填入0。 |
linear | 線性填儲值。 |
previous | 之前的一個值。 |
near | 鄰近的一個值。 |
after | 之後的一個值。 |
fixed | 用指定的一個固定填儲值。 |