Time Series Database: Key concepts

TSDB for InfluxDB® organizes time series data around a small set of core concepts. Understanding these concepts before designing your schema or writing queries will save you significant debugging time later.

This document walks through each concept using a single sample dataset, so you can see exactly how the pieces fit together. For a complete glossary, see Terms.

Concepts covered: database · field key · field set · field value · measurement · point · retention policy · series · tag key · tag set · tag value · timestamp

Sample data

All examples in this document use the following dataset. It records the number of butterflies and honeybees counted by two scientists—langstroth and perpetua—at two locations between 00:00 and 06:12 on August 18, 2015. The data is stored in the my_database database under the autogen retention policy.

name: census

time	butterflies	honeybees	location	scientist
2015-08-18T00:00:00Z	12	23	1	langstroth
2015-08-18T00:00:00Z	1	30	1	perpetua
2015-08-18T00:06:00Z	11	28	1	langstroth
2015-08-18T00:06:00Z	3	28	1	perpetua
2015-08-18T05:54:00Z	2	11	2	langstroth
2015-08-18T06:00:00Z	1	10	2	langstroth
2015-08-18T06:06:00Z	8	23	2	perpetua
2015-08-18T06:12:00Z	7	22	2	perpetua

Timestamp

Every data point in TSDB for InfluxDB® has a time column that stores its timestamp. Timestamps are displayed in RFC 3339 format and are always in UTC, for example 2015-08-18T00:00:00Z.
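The following is a minimal Python sketch of reading one of the sample timestamps. It assumes timestamps always end in Z with whole-second precision, as in the sample data. The conversion to nanoseconds since the Unix epoch is included because InfluxDB-style write formats commonly use that representation; this page itself only shows the RFC 3339 form, so treat that part as an assumption. The function names are illustrative.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(text: str) -> datetime:
    """Parse a whole-second UTC timestamp such as 2015-08-18T00:00:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    # strptime returns a naive datetime; attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


def to_epoch_ns(moment: datetime) -> int:
    """Whole seconds since the Unix epoch, scaled to nanoseconds."""
    return int(moment.timestamp()) * 1_000_000_000


first = parse_rfc3339_utc("2015-08-18T00:00:00Z")
print(first.isoformat())   # 2015-08-18T00:00:00+00:00
print(to_epoch_ns(first))  # 1439856000000000000
```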

Fields

Fields hold your actual measured values. They are a required part of every data point.

Field key

A field key is the name of a field, stored as a string. In the sample data, butterflies and honeybees are field keys.

Field value

A field value is the data recorded for a field key at a given timestamp. Field values can be strings, floating-point numbers, integers, or Boolean values.
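To make the four value types concrete, here is a hedged Python sketch that renders a value the way InfluxDB 1.x line protocol writes it: integers carry an i suffix, strings are double-quoted, and Booleans are written as true or false. Whether your client writes line protocol directly is an assumption, and format_field_value is an illustrative helper, not part of any library.

```python
def format_field_value(value) -> str:
    """Render a Python value as a line-protocol field value (sketch only)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"          # integer: trailing i
    if isinstance(value, float):
        return repr(value)          # float: plain decimal notation
    if isinstance(value, str):
        # Strings are double-quoted; backslashes and quotes are escaped.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field value type: {type(value).__name__}")


print(format_field_value(12))     # 12i
print(format_field_value(23.5))   # 23.5
print(format_field_value(True))   # true
print(format_field_value("ok"))   # "ok"
```

The sketch does not handle NaN or infinite floats.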

The sample data contains these field values:

butterflies	honeybees
12	23
1	30
11	28
3	28
2	11
1	10
8	23
7	22

Field set

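A field set is the collection of all field key-value pairs on a single data point. Two points can share a timestamp and still have different field sets, as the first two rows of the sample data show. The sample data contains eight field sets: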

butterflies = 12, honeybees = 23
butterflies = 1,  honeybees = 30
butterflies = 11, honeybees = 28
butterflies = 3,  honeybees = 28
butterflies = 2,  honeybees = 11
butterflies = 1,  honeybees = 10
butterflies = 8,  honeybees = 23
butterflies = 7,  honeybees = 22

Fields are not indexed

TSDB for InfluxDB® does not index field values. A query that filters on a field value must scan every stored value of that field before it returns results, and the cost grows with the size of the dataset. If you frequently filter on a particular value, store it as a tag instead.
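The cost of an unindexed filter can be sketched in a few lines of Python. This is a toy model of a linear scan over the sample rows, not a description of the storage engine; scan_field and the column constants are illustrative names.

```python
# Sample rows: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]
BUTTERFLIES, HONEYBEES = 1, 2  # column positions of the two fields


def scan_field(rows, column, wanted):
    """Return matching rows and the number of values that were examined."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1  # every stored value is inspected, match or not
        if row[column] == wanted:
            matches.append(row)
    return matches, examined


matches, examined = scan_field(ROWS, BUTTERFLIES, 1)
print(len(matches), examined)  # 2 8
```

Two rows match, but all eight values are examined to find them.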

Tags

Tags store metadata about your data points. Unlike fields, tags are optional—but because they are indexed, queries that filter on tags run significantly faster than those that filter on field values.

Tag key

A tag key is the name of a tag, stored as a string. In the sample data, location and scientist are tag keys.

Tag value

A tag value is the string associated with a tag key. In the sample data, location has values 1 and 2, and scientist has values langstroth and perpetua.

Tag set

A tag set is a unique combination of tag key-value pairs. The sample data contains four tag sets:

location = 1, scientist = langstroth
location = 1, scientist = perpetua
location = 2, scientist = langstroth
location = 2, scientist = perpetua
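The four tag sets can be derived mechanically from the sample rows. The Python sketch below collects each distinct (location, scientist) combination; the tuple layout is an assumption made for this example.

```python
# Sample rows: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]

# A set keeps one copy of each (location, scientist) combination.
tag_sets = sorted({(location, scientist) for _, _, _, location, scientist in ROWS})

for location, scientist in tag_sets:
    print(f"location = {location}, scientist = {scientist}")
print(len(tag_sets))  # 4
```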

Tags are indexed

Because tags are indexed, the query engine locates matching tag values without scanning every record. Store metadata you query frequently—such as host names, sensor IDs, or regions—as tags rather than fields.
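As a rough illustration of why an index helps, the Python sketch below builds a small inverted index from tag key-value pairs to row positions. It is a simplified model written for this page; a real engine's index is more elaborate, and build_tag_index is an illustrative name.

```python
from collections import defaultdict

# Sample rows: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]


def build_tag_index(rows):
    """Map each (tag key, tag value) pair to the positions of matching rows."""
    index = defaultdict(list)
    for position, (_, _, _, location, scientist) in enumerate(rows):
        index[("location", location)].append(position)
        index[("scientist", scientist)].append(position)
    return index


index = build_tag_index(ROWS)

# A single-tag filter is one dictionary lookup, with no scan of the rows.
print(index[("scientist", "perpetua")])  # [1, 3, 6, 7]

# A filter on two tags intersects two lookups.
both = sorted(set(index[("location", "2")]) & set(index[("scientist", "perpetua")]))
print(both)  # [6, 7]
```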

Why indexing matters: the schema case study

Consider a schema where most queries filter on field values:

SELECT * FROM "census" WHERE "butterflies" = 1
SELECT * FROM "census" WHERE "honeybees" = 23

Because fields are not indexed, each query scans every value of the respective field before returning results. At scale, this significantly increases query latency.

To optimize these queries, swap the schema: make butterflies and honeybees tags, and make location and scientist fields.

name: census

time	location	scientist	butterflies	honeybees
2015-08-18T00:00:00Z	1	langstroth	12	23
2015-08-18T00:00:00Z	1	perpetua	1	30
2015-08-18T00:06:00Z	1	langstroth	11	28
2015-08-18T00:06:00Z	1	perpetua	3	28
2015-08-18T05:54:00Z	2	langstroth	2	11
2015-08-18T06:00:00Z	2	langstroth	1	10
2015-08-18T06:06:00Z	2	perpetua	8	23
2015-08-18T06:12:00Z	2	perpetua	7	22

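With butterflies and honeybees indexed as tags, these filters no longer require a full scan. Because tag values are always strings, the comparison value must be written in single quotes, for example WHERE "butterflies" = '1'.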

Measurement

A measurement is a named container for tags, fields, and timestamps. Think of it as the equivalent of a SQL table—the name describes what the data represents.

In the sample data, the measurement is census. The name signals that field values record counts of insects, not sizes, directions, or other attributes.

A single measurement can belong to more than one retention policy.

Retention policy

A retention policy controls two things: how long TSDB for InfluxDB® retains data (set with the DURATION clause) and how many copies of each data point exist in a cluster (set with the REPLICATION clause).

Note

Replication factors apply to cluster deployments only, not standalone instances.

TSDB for InfluxDB® automatically creates the autogen retention policy, which stores data indefinitely with a replication factor of 1. All data in the sample dataset uses this policy.
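For illustration, the Python sketch below assembles an InfluxQL CREATE RETENTION POLICY statement from the two settings described above. It assumes the service accepts the standard InfluxDB 1.x form of this statement; the policy name one_week and the duration 7d are hypothetical values, and the helper only builds a string without sending it anywhere.

```python
def create_retention_policy(name, database, duration, replication=1, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement (string only)."""
    statement = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        # DEFAULT makes this policy the one used when none is specified.
        statement += " DEFAULT"
    return statement


# Hypothetical policy: keep data for seven days, one copy of each point.
print(create_retention_policy("one_week", "my_database", "7d"))
# CREATE RETENTION POLICY "one_week" ON "my_database" DURATION 7d REPLICATION 1
```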

Series

A series is a collection of data points that share the same retention policy, measurement, and tag set. The sample data contains four series:

Series	Retention policy	Measurement	Tag set
series 1	autogen	census	location = 1, scientist = langstroth
series 2	autogen	census	location = 2, scientist = langstroth
series 3	autogen	census	location = 1, scientist = perpetua
series 4	autogen	census	location = 2, scientist = perpetua
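The grouping in this table can be reproduced with a short Python sketch that keys every sample row by retention policy, measurement, and tag set. The data layout and variable names are assumptions made for the example.

```python
from collections import defaultdict

RETENTION_POLICY = "autogen"
MEASUREMENT = "census"

# Sample rows: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]

# Series key: retention policy + measurement + tag set. Fields are not part of it.
series = defaultdict(list)
for time, butterflies, honeybees, location, scientist in ROWS:
    tag_set = (("location", location), ("scientist", scientist))
    key = (RETENTION_POLICY, MEASUREMENT, tag_set)
    series[key].append((time, {"butterflies": butterflies, "honeybees": honeybees}))

print(len(series))  # 4
for (policy, measurement, tag_set), points in series.items():
    print(policy, measurement, dict(tag_set), len(points))
```

Each of the four series holds two of the eight sample points.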

Understanding series before designing your schema helps you predict query patterns and avoid performance issues from high series cardinality.
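Series cardinality depends on which columns you choose as tags. The Python sketch below counts distinct tag sets in the sample data under two schemas: the original one (location and scientist as tags) and one where butterflies and honeybees are tags instead. It is a counting exercise on eight rows, not a measurement of real performance.

```python
# Sample rows: (time, butterflies, honeybees, location, scientist).
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]
BUTTERFLIES, HONEYBEES, LOCATION, SCIENTIST = 1, 2, 3, 4  # column positions


def series_cardinality(rows, tag_columns):
    """Count distinct tag sets, which equals the number of series
    within one measurement and one retention policy."""
    return len({tuple(row[column] for column in tag_columns) for row in rows})


print(series_cardinality(ROWS, (LOCATION, SCIENTIST)))     # 4
print(series_cardinality(ROWS, (BUTTERFLIES, HONEYBEES)))  # 8
```

Using the counts as tags doubles the number of series here, because every row has a distinct combination of counts. Values that vary widely tend to produce many series when used as tags.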

Point

A data point is a set of field key-value pairs that share the same timestamp within the same series. For example:

name: census
-----------------
time                    butterflies  honeybees  location  scientist
2015-08-18T00:00:00Z    1            30         1         perpetua

This data point belongs to series 3 (retention policy: autogen, measurement: census, tag set: location = 1, scientist = perpetua).
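For reference, the same point can be written as a single line of InfluxDB 1.x line protocol: measurement and tag set, then field set, then timestamp. The Python sketch below assumes integer field values and names that need no escaping, and it assumes nanosecond epoch timestamps; to_line_protocol is an illustrative helper, not a library function.

```python
def to_line_protocol(measurement, tags, fields, epoch_ns):
    """Serialize one point with integer fields as a line-protocol string.

    Sketch only: no escaping of spaces, commas, or equals signs.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    # Integer field values carry a trailing i.
    field_part = ",".join(f"{key}={value}i" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {epoch_ns}"


line = to_line_protocol(
    "census",
    {"location": "1", "scientist": "perpetua"},
    {"butterflies": 1, "honeybees": 30},
    1439856000000000000,  # 2015-08-18T00:00:00Z in nanoseconds since the epoch
)
print(line)
# census,location=1,scientist=perpetua butterflies=1i,honeybees=30i 1439856000000000000
```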

Database

A database is the top-level logical container in TSDB for InfluxDB®. It holds users, retention policies, continuous queries, and measurements. TSDB for InfluxDB® databases are schemaless—you can add new measurements, tags, and fields at any time.

The sample data is stored in the my_database database.

InfluxDB® is a trademark registered by InfluxData, which is not affiliated with, and does not endorse, TSDB for InfluxDB®.