TSDB is designed to manage time series data efficiently. To that end, it follows the basic principle of online databases: time series data can be queried immediately after it is written into the database. In most cases, open features of TSDB do not change the semantics or accuracy of raw data.
As a result, the following issues may occur as the data volume increases.
- It takes a long time for TSDB to respond to queries on historical data or cold data within a wide time span or queries based on a large number of tags. Queries may even be terminated due to resource insufficiency.
- Some complex queries consume a lot of computing resources. When you perform complex online queries, the response time of the TSDB instance becomes unstable. As a result, the performance of other queries is affected.
TSDB integrates the time series preprocessing feature to fix the preceding issues by referring to the mechanism of Online Analytical Processing (OLAP). Similar to Multidimensional Online Analytical Processing (MOLAP), time series preprocessing enables real-time stream processing based on time granularity or custom rules in complex computing scenarios. In addition, TSDB can directly query the stream processing results. This allows TSDB to respond to time series queries on large amounts of data in seconds.
Time series preprocessing includes timeline processing and time series analysis. Timeline processing integrates with the internal query process of TSDB, and time series analysis enables rule-based data computing and analysis.
In time series preprocessing, a rollup is defined as a single timeline aggregated over time. It may also be called a time-based aggregation. Rollups improve the performance of queries over wide time spans by aggregating time series data into lower-resolution data. For example, rollups can store one data point every minute instead of one every second. This significantly reduces the amount of data to scan in a query.
If you are already skilled in using TSDB, you may be familiar with the downsampling feature in TSDB. Downsampling scans data points in each timeline and groups and aggregates data at a specified time interval, for example, one minute or one hour, to generate data points of a lower resolution.
A rollup is essentially the downsampling result of data before the data is stored. The rollups can be queried at any time. Therefore, the rollup feature is also known as pre-downsampling.
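The relationship between downsampling and rollups can be sketched in a few lines of Python. This is a minimal illustration and not the TSDB implementation: the function name `rollup`, the bucket rule, and the sample data are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, interval_s, aggregate=mean):
    """Aggregate the (timestamp, value) pairs of ONE timeline into fixed buckets.

    `points` is an iterable of (unix_seconds, value) pairs. Each point is
    assigned to the bucket that starts at the largest multiple of
    `interval_s` not greater than its timestamp.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % interval_s
        buckets[bucket_start].append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw timeline: one data point per second for three minutes.
raw = [(ts, float(ts)) for ts in range(0, 180)]

# Pre-downsample to 1-minute resolution: 180 raw points become 3 points.
per_minute = rollup(raw, 60)
print(per_minute)  # [(0, 29.5), (60, 89.5), (120, 149.5)]
```

A query over a wide time span can then read `per_minute` instead of `raw`, while `raw` remains available for drill-down.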
For example, if you write a data point every 60 seconds and query for one year of data, a single time series returns more than 525,000 data points. Such a large number of data points can be difficult to plot. Instead, you may want to view lower-resolution data. If the same data is rolled up to one data point per hour, a one-year query returns only about 8,760 data points. You can then quickly plot your business diagrams and easily identify anomalous time ranges in the timeline. The rollup feature does not delete raw data points. You can still drill down for finer-resolution data in specified time ranges.
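The point counts in this example follow from simple arithmetic. The sketch below assumes a 365-day year; the helper name `points_per_year` is made up for the illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def points_per_year(interval_s):
    """Number of data points in one timeline for one year at a given interval."""
    return SECONDS_PER_YEAR // interval_s


per_minute_points = points_per_year(60)    # one point every 60 seconds
per_hour_points = points_per_year(3600)    # one point every hour

print(per_minute_points)  # 525600
print(per_hour_points)    # 8760
```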
The following are the key features of a rollup:
- Rollups tremendously reduce the number of data points to be scanned during a query. This shortens the response time.
- The rollup feature can generate data points at different intervals. For example, the data points at 1-minute, 5-minute, and 15-minute intervals can be generated at the same time. This enables TSDB to respond within seconds to queries on both higher-resolution and lower-resolution data.
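Generating several resolutions at the same time can be pictured as a single pass over the raw data that updates one set of buckets per interval. The following is a sketch under assumptions, not the TSDB implementation: the function name `multi_rollup`, the choice of the average as the aggregator, and the sample data are all hypothetical.

```python
from collections import defaultdict


def multi_rollup(points, intervals_s):
    """Compute average rollups of one timeline at several intervals in one pass."""
    sums = {i: defaultdict(float) for i in intervals_s}
    counts = {i: defaultdict(int) for i in intervals_s}
    for ts, value in points:
        for i in intervals_s:
            start = ts - ts % i  # start of the bucket that contains ts
            sums[i][start] += value
            counts[i][start] += 1
    return {
        i: {start: sums[i][start] / counts[i][start] for start in sorted(sums[i])}
        for i in intervals_s
    }


# Hypothetical raw timeline: one point every 10 seconds for 30 minutes.
raw = [(ts, 1.0) for ts in range(0, 1800, 10)]

# 1-minute, 5-minute, and 15-minute rollups generated together.
rollups = multi_rollup(raw, [60, 300, 900])
print({i: len(buckets) for i, buckets in rollups.items()})  # {60: 30, 300: 6, 900: 2}
```

A query can then be served from the coarsest rollup whose interval still satisfies the requested resolution.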
If TSDB stores a large amount of data, it takes a long time for TSDB to filter and process data according to the rules you specify. Query performance may fluctuate because of an undersized instance type, overload, or other issues. A query may even fail because the number of data points to be scanned exceeds the upper limit or because the query times out.
The rollup feature can reduce the number of data points to be scanned for queries by generating lower-resolution data. However, this feature is subject to the following limits:
- The rollup feature can only downsample a single timeline based on time.
- The rollup feature is not applicable to queries on high-cardinality metrics. Even if rollups can reduce the number of data points to be scanned in a single timeline, TSDB has to scan data points related to the same metric in different timelines for complex queries.
For example, assume that the metric is CPU and you want to query the number of CPUs of multiple servers. Each server has its own tag value, so each server generates a separate timeline. The cost of calculating the average number of CPUs depends on how many servers exist:
- If only four web servers exist, you can use rollups to reduce the number of data points to be scanned in the four timelines, and then load the data into memory to calculate the average number of CPUs.
- If 10,000 servers exist, TSDB has to scan the related data points in 10,000 timelines. In this case, rollups in single timelines hardly reduce the response time.
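The effect of cardinality on scan volume can be estimated with a short calculation. The query span (one day), the raw write interval (one second), and the rollup interval (one minute) are assumptions chosen only for illustration; they do not come from TSDB.

```python
def scanned_points(timelines, span_s, interval_s):
    """Data points a query must scan: points per timeline times number of timelines."""
    return timelines * (span_s // interval_s)


ONE_DAY = 24 * 60 * 60

# Four servers: a 1-minute rollup shrinks the scan to a few thousand points.
four_raw = scanned_points(4, ONE_DAY, 1)
four_rolled = scanned_points(4, ONE_DAY, 60)

# 10,000 servers: even with the same rollup, millions of points remain.
many_rolled = scanned_points(10_000, ONE_DAY, 60)

print(four_raw, four_rolled, many_rolled)  # 345600 5760 14400000
```

Under these assumptions, the rollup reduces the work per timeline, but the total still grows linearly with the number of timelines.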
In the latter case, you can enable the pre-aggregate feature so that TSDB filters and aggregates data related to the metric in advance and writes the aggregation results to a new metric. In this way, TSDB can quickly return the results when you query the new metric.
By default, the pre-aggregate feature supports common aggregation and statistical functions.
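The idea behind pre-aggregation can be sketched as follows. This is an illustration only and not the TSDB API: the function name `pre_aggregate`, the tag layout, the metric values, and the choice of the average as the aggregator are assumptions.

```python
from collections import defaultdict
from statistics import mean


def pre_aggregate(timelines, aggregate=mean):
    """Combine all timelines of one metric into a single new timeline.

    `timelines` maps a tag tuple to a list of (timestamp, value) pairs.
    Values that share a timestamp are aggregated across timelines.
    """
    by_ts = defaultdict(list)
    for points in timelines.values():
        for ts, value in points:
            by_ts[ts].append(value)
    return [(ts, aggregate(values)) for ts, values in sorted(by_ts.items())]


# Hypothetical metric with one timeline per host tag.
metric = {
    ("host", "web-1"): [(0, 10.0), (60, 20.0)],
    ("host", "web-2"): [(0, 30.0), (60, 40.0)],
}

# The result would be written to a new metric and queried directly later.
metric_avg = pre_aggregate(metric)
print(metric_avg)  # [(0, 20.0), (60, 30.0)]
```

A later query reads the two points of `metric_avg` instead of scanning every per-host timeline.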
The pre-aggregate feature is in internal testing and will be released in the future.
The rollup feature reduces the number of data points in each timeline. The pre-aggregate feature aggregates data points across multiple timelines and stores the aggregation results as a new timeline for quick queries. Both features focus on data aggregation. The timeline analysis feature, by contrast, analyzes data points in different timelines. For example, this feature aligns the timestamps in different timelines, calculates the difference between the values of a metric at two adjacent timestamps, and analyzes the trend of the metric value in each timeline.
The timeline analysis feature is in internal testing and will be released in the future.
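Because the feature is not yet released, the following is only a conceptual sketch of two of the operations mentioned above: timestamp alignment and the difference between adjacent values. The function names `align` and `deltas`, the alignment rule (snap to the start of the interval and keep the latest value), and the sample data are assumptions.

```python
def align(points, interval_s):
    """Snap timestamps to the start of their interval, keeping the latest value per slot."""
    aligned = {}
    for ts, value in sorted(points):
        aligned[ts - ts % interval_s] = value
    return sorted(aligned.items())


def deltas(points):
    """Difference between the values at each pair of adjacent timestamps."""
    return [(t2, v2 - v1) for (_t1, v1), (t2, v2) in zip(points, points[1:])]


# Two hypothetical timelines whose timestamps do not line up.
a = align([(1, 5.0), (62, 8.0), (121, 6.0)], 60)
b = align([(3, 1.0), (59, 2.0), (65, 4.0)], 60)

print(a)          # [(0, 5.0), (60, 8.0), (120, 6.0)]
print(b)          # [(0, 2.0), (60, 4.0)]
print(deltas(a))  # [(60, 3.0), (120, -2.0)]
```

After alignment, both timelines share the timestamps 0 and 60, so their values can be compared directly. The sign of each delta gives a simple indication of the trend.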