All Products
Search
Document Center

Time Series Database:10-minute quick start

Last Updated:Nov 25, 2021

This topic describes how to use TSQL to query time series data generated based on the performance monitoring of a data center.

Sample data

A benchmarking tool is used to test the time series performance and generate sample data. For more information, visit this link: [https://github.com/influxdata/influxdb-comparisons]. After you install and compile the benchmark tool, you can take the following steps to generate and load data to TSDB.

Generate data

cd influxdb-comparisons/cmds
bulk_data_gen/bulk_data_gen --seed=123 --use-case=devops --scale-var=10 --format=opentsdb --timestamp-start="2019-03-01T00:00:00Z" --timestamp-end="2019-03-01T00:10:00Z" > tsdb_devops_sf10_10m_seed123.json

The sample data is shown as follows.

{"metric":"redis.evicted_keys","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2951}
{"metric":"redis.keyspace_hits","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2945}
{"metric":"redis.keyspace_misses","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2944}
{"metric":"redis.instantaneous_ops_per_sec","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":65}
{"metric":"redis.instantaneous_input_kbps","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":58}

Load data

cat tsdb_devops_sf10_10m_seed123.json | bulk_load_opentsdb/bulk_load_opentsdb --urls=http://your_tsdb_host:port_num -workers=5

Queries

  • Query based on a time range: The following example queries all columns related to a metric within a specified time range. The columns include the value, the timestamp, and the tag values.

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  • Query based on a time range: The following example queries the specified columns related to a metric within a specified time range. The columns include the value, the timestamp, and the specified tag values.

select `value`, `timestamp`, hostname,  datacenter
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  • Query based on a time range and tag keys: The following example queries the values and the timestamps of a metric within a specified time range. Metric data related to the listed host names is returned.

select `value`, `timestamp`, hostname,  datacenter
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
      hostname in ('host_0', 'host_2', 'host_4')
  • Query results sorted by timestamp: The following example queries the value, timestamp, and tag values of a metric within a specified time range.

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
order by `timestamp`
  • Query based on a condition that uses a mathematical expression to filter the metric value: The following example queries the value, timestamp, and tag values of a metric within the specified time range. This query returns only values of which the square roots are greater than 1.5.

select *
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
    sqrt(`value`) > 1.5
  • Query based on group-based data aggregation: The following example groups data by host name and data center and calculates the maximum, minimum, and average values for each group.

select 
    hostname, 
    datacenter, 
    max(`value`) as maxV, 
    min(`value`) as minV, 
    avg(`value`) as avgV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
group by hostname, datacenter
  • Query based on group- and time-based data aggregation: The following example groups data based on the host name, the data center, and the timestamp interval of two minutes. This example also calculates the maximum, minimum, and average metric values in each group.

select 
   hostname, 
   datacenter, 
   tumble(`timestamp`, interval '2' minute) as ts, 
   max(`value`) as maxV, 
   min(`value`) as minV, 
   avg(`value`) as avgV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts
  • Query based on group- and time-based aggregation: The following example groups data and aggregates data by time and then returns the output of the expression max(``value``) - min(``value``) + 0.5* avg(``value).

select 
    hostname, 
    datacenter, 
    tumble(`timestamp`, interval '2' minute) as ts, 
    max(`value`) - min(`value`) + 0.5* avg(`value`) as compV
from tsdb.`cpu.usage_system` 
where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
group by hostname, datacenter, ts
  • Calculation of the difference between the values that are recorded on each host at two adjacent timestamps

The following example uses the window function lag(). This function returns the value of the record before the current record in the same frame. In the frame, data is grouped by host name and sorted by timestamp. Then, the lag() function calculates the difference between the two record values. In this way, you can obtain the difference between the values that are recorded on each host at two adjacent timestamps.

select hostname, `timestamp`, `value`, 
       `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
from tsdb.`cpu.usage_system` 
  where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  • Calculation of the difference between the values that are recorded on each host at two adjacent timestamps. If the difference exceeds a specified value, the system sets the difference to 0.

The following example includes the preceding query as a subquery and adds a case statement to the subquery. The case statement indicates that if the difference exceeds 50.0, the system sets the difference to 0.0.

select hostname, `timestamp`, `value`, 
        case when diff > 50.0 then 0.0
                else diff
        end
from (
    select hostname, `timestamp`, `value`, 
       `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
    from tsdb.`cpu.usage_system` 
      where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
);
  • Calculation of the maximum value recorded on each host within the last minute, and the difference between the maximum values that are recorded for two adjacent minutes.

Different from the preceding query, the following example calculates the difference between the maximum values that are recorded for two adjacent minutes. You can calculate the maximum value recorded for each minute in a subquery, and use the window function lag() to calculate the difference between the maximum values that are recorded for two adjacent minutes.

select hostname, ts, maxValue, 
             maxValue - lag(maxValue) over(partition by hostname order by ts) as diff
from (
  select hostname, 
              tumble(`timestamp`, interval '1' minute) ts, max(`value`) maxValue
  from tsdb.`cpu.usage_system` 
  where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  group by hostname, ts)