Design a timeline structure to improve query efficiency - Time Series Database

This topic describes how to design a timeline structure to improve query efficiency.

What is a timeline?

In Time Series Database (TSDB), a timeline is defined by a metric and a group of tags. Time series data is collected at specific intervals on the timeline. For example, in {"metric":"cpu","tags":{"site":"et2","ip":"1.1.1.1","app":"TSDB"}}, the cpu metric and the tags are combined to define a timeline. Consecutive data points are generated based on the timeline. A data point consists of a timestamp and a value of the metric. The following figure shows three timelines.
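The definition above can be sketched in a few lines of Python. This is only an illustration of the idea that a metric plus a set of tags identifies a timeline; the helper name `timeline_key` and the in-memory `points` store are hypothetical and do not reflect how TSDB stores data internally.

```python
from typing import Dict, List, Tuple

TimelineKey = Tuple[str, Tuple[Tuple[str, str], ...]]

def timeline_key(metric: str, tags: Dict[str, str]) -> TimelineKey:
    """Return a hashable identity for a timeline: the metric plus its sorted tag pairs."""
    return (metric, tuple(sorted(tags.items())))

# The same metric and tags always map to the same timeline, regardless of tag order.
a = timeline_key("cpu", {"site": "et2", "ip": "1.1.1.1", "app": "TSDB"})
b = timeline_key("cpu", {"app": "TSDB", "ip": "1.1.1.1", "site": "et2"})
# Changing a single tag value yields a different timeline.
c = timeline_key("cpu", {"site": "et2", "ip": "1.1.1.2", "app": "TSDB"})

# Each data point is a (timestamp, value) pair appended to its timeline.
points: Dict[TimelineKey, List[Tuple[int, float]]] = {}
points.setdefault(a, []).append((1700000000, 0.42))
points.setdefault(b, []).append((1700000060, 0.40))
points.setdefault(c, []).append((1700000000, 0.10))
```

In this sketch, `a` and `b` refer to the same timeline, so `points` ends up with two timelines rather than three.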

Inverted indexes for timelines

To accelerate queries, TSDB generates inverted indexes for each timeline: one index for the metric and one index for each tag of the timeline. Each index maps a metric or a tag to the timelines that contain it, so the system can quickly locate the timelines that match the metric and the tags in a query. For example, TSDB generates the following inverted indexes for the timelines that are shown in the preceding figure.
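As a rough sketch of the idea, an inverted index can be modeled as a mapping from each metric or tag to the set of timeline IDs that carry it. The three timelines, their IDs, and the function name `build_inverted_index` below are assumptions made for illustration; they are not taken from the figure and do not describe TSDB's actual index layout.

```python
from collections import defaultdict

# Hypothetical timelines; IDs and tag values are illustrative only.
timelines = {
    1: {"metric": "cpu", "tags": {"site": "et2", "ip": "1.1.1.1", "app": "TSDB"}},
    2: {"metric": "cpu", "tags": {"site": "et2", "ip": "1.1.1.2", "app": "TSDB"}},
    3: {"metric": "cpu", "tags": {"site": "et2", "ip": "1.1.1.3", "app": "TSDB"}},
}

def build_inverted_index(timelines):
    """Map each metric and each tag key-value pair to the timeline IDs that contain it."""
    index = defaultdict(set)
    for timeline_id, timeline in timelines.items():
        index[("metric", timeline["metric"])].add(timeline_id)
        for key, value in timeline["tags"].items():
            index[("tag", key, value)].add(timeline_id)
    return dict(index)

index = build_inverted_index(timelines)
for entry in sorted(index):
    print(entry, sorted(index[entry]))
```

With this sample data, the `site=et2` and `app=TSDB` entries each point to all three timelines, whereas each `ip` entry points to exactly one.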

Best practices

Reduce the number of timelines
TSDB uses the following factors to determine a timeline:
- Metric
- The number of tags
- The key and value of each tag
  Note
  We recommend that you do not use process IDs or timestamps as tag values when you design tags for timelines, and that you minimize changes to tag values. If improper tag values are specified, the number of timelines can increase rapidly even if the number of metrics and the number of tag keys remain almost unchanged.
- Fields (Fields must be specified only when the metric consists of multiple fields.)

The maximum number of timelines is equal to the size of the Cartesian product of the metrics and the possible values of each tag key, that is, the number of metrics multiplied by the number of distinct values of each tag key. If metrics consist of multiple fields, the number of fields is an additional factor in this product. When you design timelines for a database, we recommend that you minimize the number of distinct values of metrics and tags. This way, you can minimize the number of timelines and ensure that the number of timelines does not exceed the upper limit.
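To make the upper bound concrete, the following sketch reads it as a plain multiplication: the number of metrics, times the number of fields, times the number of distinct values of each tag key. This reading, the function name `max_timelines`, and the sample counts are assumptions for illustration, not figures from TSDB.

```python
from math import prod

def max_timelines(num_metrics, tag_value_counts, num_fields=1):
    """Upper bound on timelines: metrics x fields x distinct values of every tag key."""
    return num_metrics * num_fields * prod(tag_value_counts.values())

# Stable tags: 10 metrics, 5 sites, 20 apps.
stable = max_timelines(10, {"site": 5, "app": 20})

# Same design plus a process ID tag that takes 30,000 distinct values over time.
with_pid = max_timelines(10, {"site": 5, "app": 20, "pid": 30000})

print(stable)    # 1000
print(with_pid)  # 30000000
```

Adding a single tag key with many distinct values multiplies the bound, even though the number of metrics and the other tag keys stay the same.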

Reduce the number of timelines that use the same tag
We recommend that you do not configure the same tag for every timeline. If a single tag indexes a large number of timelines, a query that uses the tag must scan more timelines, which reduces query efficiency.
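One way to spot such tags is to measure what fraction of timelines each tag key-value pair covers. The sketch below is a hypothetical offline check on sample tag sets; the function name `tag_coverage` and the data are assumptions, and TSDB itself is not known to expose such a function.

```python
from collections import Counter

# Hypothetical tag sets, one dictionary per timeline.
timelines = [
    {"site": "et2", "ip": "1.1.1.1", "app": "TSDB"},
    {"site": "et2", "ip": "1.1.1.2", "app": "TSDB"},
    {"site": "et2", "ip": "1.1.1.3", "app": "TSDB"},
]

def tag_coverage(timelines):
    """Return, for each tag key-value pair, the fraction of timelines that carry it."""
    counts = Counter(pair for tags in timelines for pair in tags.items())
    total = len(timelines)
    return {pair: count / total for pair, count in counts.items()}

coverage = tag_coverage(timelines)

# Tags carried by every timeline do not narrow down a query at all.
shared_by_all = sorted(pair for pair, fraction in coverage.items() if fraction == 1.0)
print(shared_by_all)
```

In this sample, `site=et2` and `app=TSDB` are carried by every timeline, so filtering on them alone does not reduce the number of timelines to scan.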

Reduce the number of timelines that you want to scan in a query
For example, assume that the timelines indexed by Tag 1 are a subset of the timelines indexed by Tag 2. A query that uses Tag 1 scans fewer timelines than a query that uses Tag 2 and is therefore more efficient. You can use either of the following methods to query data on Timeline 1 that is shown in the preceding figure:
- Method 1: {"metric":"cpu","tags":{"site":"et2","ip":"1.1.1.1","app":"TSDB"}}
- Method 2: {"metric":"cpu","tags":{"ip":"1.1.1.1"}}
  The second method specifies only the ip=1.1.1.1 tag, which indexes only one timeline. Therefore, the second method offers higher query performance.
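The difference between the two methods can be illustrated with a toy inverted index. The index contents, the function name `run_query`, and the cost model (the total number of index entries read) are assumptions chosen for illustration; they do not describe how TSDB executes or measures queries.

```python
# Hypothetical inverted index: each entry maps to the IDs of the timelines it indexes.
index = {
    ("metric", "cpu"): {1, 2, 3},
    ("tag", "site", "et2"): {1, 2, 3},
    ("tag", "app", "TSDB"): {1, 2, 3},
    ("tag", "ip", "1.1.1.1"): {1},
    ("tag", "ip", "1.1.1.2"): {2},
    ("tag", "ip", "1.1.1.3"): {3},
}

def run_query(index, query):
    """Intersect the index entries named in the query; also count the IDs read."""
    keys = [("metric", query["metric"])]
    keys += [("tag", key, value) for key, value in query["tags"].items()]
    postings = [index.get(key, set()) for key in keys]
    scanned = sum(len(posting) for posting in postings)  # toy cost model
    matched = set.intersection(*postings)
    return matched, scanned

method_1 = {"metric": "cpu", "tags": {"site": "et2", "ip": "1.1.1.1", "app": "TSDB"}}
method_2 = {"metric": "cpu", "tags": {"ip": "1.1.1.1"}}

print(run_query(index, method_1))  # ({1}, 10)
print(run_query(index, method_2))  # ({1}, 4)
```

Both queries return the same timeline, but under this cost model the second query reads fewer index entries because it skips the tags that are shared by all timelines.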