Lindorm: Time series anomaly detection

Last Updated: Mar 19, 2024

This topic describes the algorithms and syntax that are used to detect time series anomalies.

Applicable engines and versions

The time series anomaly detection syntax applies only to LindormTSDB and is supported by all versions of LindormTSDB.

Limits

The time series anomaly detection syntax must be used together with the SAMPLE BY clause.

Overview

Time series anomaly detection uses the online anomaly detection algorithms developed by DAMO Academy to detect abnormal points in the specified time series. During detection, these algorithms continuously learn the characteristics of the time series data, such as trends and periods, and use them to evaluate newly inserted time series points. For example, if the value of a newly added time series point is significantly different from the values of other points, the algorithm assumes that this point may be abnormal.

You can use time series anomaly detection with the SAMPLE BY clause by using the following methods:

  • Use the SAMPLE BY 0 clause to detect each data point in all time series. For more information about how to use the clause, see Example 1, Example 2, and Example 3.

  • Use the SAMPLE BY INTERVAL clause to specify the downsampling interval and use nested downsampling operators, such as MIN, MAX, AVG, COUNT, and SUM.

    Important

    The value of INTERVAL cannot be 0.

    For more information about how to use the clause, see Example 4.

  • Use the SAMPLE BY 0 clause and nested downsampling operators, such as LATEST, DELTA, and RATE, to query different data. For more information about how to use the clause, see Example 5.

Syntax

select_sample_by_statement ::=  SELECT ( select_clause )
                                FROM table_identifier
                                WHERE where_clause
                                SAMPLE BY 0
select_clause              ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )*
selector                   ::=  tag_identifier | time | anomaly_detect '(' field_identifier ',' ( algo_identifier | model_identifier ) [ ',' options ] ')'
where_clause               ::=  relation ( AND relation )* ( OR relation )*
relation                   ::=  ( field_identifier | tag_identifier ) operator term
operator                   ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY

In the syntax, anomaly_detect indicates the anomaly detection function. The following table describes the parameters that you can configure.

Parameter

Description

field_identifier

The name of the field column.

Note

Data in the specified field column cannot be of the VARCHAR or BOOLEAN type.

algo_identifier

The name of the algorithm used to detect anomalies. The online anomaly detection algorithms developed by DAMO Academy are supported.

  • esd: an algorithm suited to spike anomalies, such as spikes in monitoring curves, in which a small number of data points are significantly different from other data points.

  • nsigma: a simple algorithm whose results make it easy to analyze the causes of anomalies.

  • ttest: an algorithm that identifies whether the metrics related to time series data are abnormal because of a change in the average value.

  • istl-esd: an algorithm that detects anomalies in periodic data by applying the esd algorithm to the residual terms of the data.

  • istl-nsigma: an algorithm that detects anomalies in periodic data by applying the nsigma algorithm to the residual terms of the data.

Note

Use the algo_identifier parameter to detect anomalies in time series data when in-database machine learning is not enabled.

model_identifier

The name of the model used to detect anomalies.

Note
  • The value of the model_identifier parameter is of the VARCHAR type.

  • Use the model_identifier parameter to detect anomalies in time series data when in-database machine learning is enabled. For more information, see In-database machine learning.

options

The options used to adjust the detection effect. This parameter is optional. Configure the options in the key1=value1 key2=value2 format.

Category

The following table describes the anomaly detection algorithms supported by LindormTSDB and scenarios to which the algorithms are applicable.

Algorithm

Scenario

esd

  • This algorithm can detect single abnormal points.

  • This algorithm is suitable for detecting abnormal data spikes. It can accurately detect a small number of abnormal points whose values are significantly different from other points.

nsigma

  • This algorithm can detect single abnormal points.

  • This algorithm is simple and easy to use. You can use the results returned by this algorithm to easily identify the root causes of the detected anomalies.

Note

We recommend that you do not use this algorithm to detect a small number of abnormal points whose values are significantly different from other points. In this case, the detection results returned by this algorithm may be inaccurate.

ttest

  • This algorithm detects anomalies in time series data within a specific time window.

  • This algorithm detects anomalies that are caused by a change in the average value of the specified metrics within a specific time window. For example, the algorithm can detect whether the average value of the time series data within the detection time window is significantly different from the average value of the time series data within the reference time window.

    • The length of the detection time window is specified by lenDetectWindow.

    • The length of the reference time window is specified by lenHistoryWindow.

Incremental STL with ESD (istl-esd)

This algorithm is suitable for detecting anomalies in periodic data. The istl-esd algorithm combines the Incremental STL algorithm developed by DAMO Academy with the esd algorithm. The Incremental STL algorithm decomposes periodic incremental data into periodic terms, trend terms, and residual terms. The esd algorithm then detects non-periodic spikes in the residual terms.

Incremental STL with Nsigma (istl-nsigma)

This algorithm is suitable for detecting anomalies in periodic data. The istl-nsigma algorithm combines the Incremental STL algorithm with the nsigma algorithm. The Incremental STL algorithm decomposes periodic incremental data into periodic terms, trend terms, and residual terms. The nsigma algorithm then detects non-periodic spikes in the residual terms.
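The decompose-then-detect idea behind istl-esd and istl-nsigma can be illustrated with a much simpler decomposition. The following Python sketch is not the Incremental STL algorithm. It assumes a constant trend (the overall mean) and a periodic term equal to the per-phase mean, and the decompose function and its period argument are hypothetical names used only for illustration. The sketch shows that a spike hidden in periodic data stands out after the periodic and trend terms are removed.

```python
from statistics import mean


def decompose(series, period):
    """Toy decomposition so that value = trend + season + residual.

    Assumptions (not Incremental STL): the trend is the overall mean and
    the seasonal term is the mean of each phase minus that trend.
    """
    trend_level = mean(series)
    phase_offsets = [mean(series[p::period]) - trend_level for p in range(period)]
    trend = [trend_level] * len(series)
    season = [phase_offsets[i % period] for i in range(len(series))]
    residual = [v - t - s for v, t, s in zip(series, trend, season)]
    return trend, season, residual


series = [0.0, 5.0, 10.0, 5.0] * 6  # periodic data with a period of 4 points
series[13] += 8.0                   # inject one non-periodic spike
trend, season, residual = decompose(series, period=4)

# The point with the largest absolute residual is the candidate anomaly.
spike_index = max(range(len(series)), key=lambda i: abs(residual[i]))
print(spike_index)
```

A point detector such as esd or nsigma would then run on the residual values instead of the original values.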

The following figures show the application scenarios of different algorithms.

  • esd: This algorithm checks each data point in a time series and is applicable to scenarios in which a small number of abnormal data points exist among a large number of normal data points with stable values.

    [Figure: esd application scenario]

  • nsigma: This algorithm checks each data point in a time series and is applicable to scenarios in which the values of abnormal data points are significantly different from the historical average value. You can configure the n parameter of this algorithm to adjust the allowed difference between the value of a data point and the historical average value.

    [Figure: nsigma application scenario]

  • ttest: This algorithm detects anomalies in time series data within a time window and is applicable to scenarios in which the average value of the specified metrics varies significantly between two consecutive time windows.

    [Figures: ttest application scenarios]

  • istl-esd: This algorithm detects anomalies in periodic time series data. It removes the periodic terms of the original data and then uses the esd algorithm to detect anomalies. The istl-esd algorithm is applicable to scenarios in which a small number of abnormal data points exist among a large number of normal periodic data points with stable values.

    [Figure: istl-esd application scenario]

  • istl-nsigma: This algorithm detects anomalies in periodic time series data. It removes the periodic terms of the original data and then uses the nsigma algorithm to detect anomalies. The istl-nsigma algorithm is applicable to scenarios in which the values of abnormal data points are significantly different from the historical average value.

    [Figure: istl-nsigma application scenario]
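The window comparison that ttest performs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LindormTSDB implementation: it assumes Welch's two-sample t statistic, approximates the p-value with a normal distribution so that only the standard library is needed, and always tests in both directions. The ttest_window function is a hypothetical name. Its len_detect_window, len_history_window, and alpha arguments mirror the lenDetectWindow, lenHistoryWindow, and alpha parameters.

```python
from statistics import NormalDist, mean, variance


def ttest_window(series, len_detect_window=10, len_history_window=100, alpha=0.05):
    """Compare the mean of the newest window with the mean of the window before it.

    series must be a list and len_detect_window must be at least 2.
    Returns (is_anomaly, p_value, mean_shift), or None if there is not
    enough data to fill both windows.
    """
    needed = len_detect_window + len_history_window
    if len(series) < needed:
        return None
    detect = series[-len_detect_window:]
    history = series[-needed:-len_detect_window]
    mean_shift = mean(detect) - mean(history)
    # Standard error of the difference of means (Welch's formula).
    std_err = (variance(detect) / len(detect) + variance(history) / len(history)) ** 0.5
    if std_err == 0:
        p_value = 1.0 if mean_shift == 0 else 0.0
    else:
        # Two-sided p-value, using a normal approximation (an assumption).
        p_value = 2 * (1 - NormalDist().cdf(abs(mean_shift) / std_err))
    return p_value < alpha, p_value, mean_shift


history = [10.0 + 0.2 * (i % 2) for i in range(100)]
shifted = [15.0 + 0.2 * (i % 2) for i in range(10)]
print(ttest_window(history + shifted))
```

In this sketch a smaller p-value means a more significant change in the average value, so the window is reported as abnormal when the p-value falls below alpha.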

Parameters

You can configure parameters for the anomaly detection algorithms that you use. These parameters can be categorized into common parameters, training parameters, and inference parameters. You can specify the optional parameter options to adjust the detection performance of the anomaly detection algorithm.

Common parameters

You can configure common parameters to control debugging, diagnosis, and other behaviors of the algorithms during anomaly detection. Common parameters apply to all supported anomaly detection algorithms. The following table describes the common parameters that you can configure.

Parameter

Type

Default value

Description

verbose

BOOLEAN

FALSE

Specifies whether to return detailed information and identify the detection result of the specified columns. The returned information varies with the algorithm that you use. Valid values:

  • TRUE

  • FALSE

If you set this parameter to TRUE, additional columns are displayed in the returned results to show the detailed information. For more information, see the "Detailed information returned for the verbose parameter" section.

adhoc_state

BOOLEAN

FALSE

Specifies whether the anomaly detection status of the algorithm is available only in the current query. For more information about the anomaly detection status, see Exception detection status.

direction

VARCHAR

Up

The types of anomalies that you want to detect. Valid values:

  • Up: Only abnormal increases in the time series data are detected as anomalies.

  • Down: Only abnormal decreases in the time series data are detected as anomalies.

  • Both: Both abnormal increases and abnormal decreases in the time series data are detected as anomalies.

Detailed information returned for the verbose parameter

Algorithm

Additional column

Type

Valid value

Description

esd

anomaly

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The current data point is abnormal.

  • FALSE: The current data point is normal.

anomalyLevel

STRING

  • NORMAL

  • UNKNOW

  • NORMAL: No anomalies are detected for the current data point.

  • UNKNOW: Anomalies are detected for the current data point and are not classified.

detectedDirection

STRING

  • UP

  • DOWN

  • NONE

  • UP: The value of the current data point is larger than the statistical value within the window.

  • DOWN: The value of the current data point is less than the statistical value within the window.

  • NONE: The current data point is normal and the value in the anomaly column is FALSE.

anomalyScore

DOUBLE

[0, Double.MAX_VALUE]

The score of the detected anomaly. A larger value indicates that the anomaly of the current data point is more obvious.

threshold

DOUBLE

[0, Double.MAX_VALUE]

The threshold based on which the algorithm determines whether the current data point is abnormal.

If the value in the anomalyScore column is larger than the value in the threshold column, the current data point is abnormal. If the value in the anomalyScore column is less than the value in the threshold column, the current data point is normal.

The threshold is calculated based on the alpha and lenHistoryWindow parameters. The threshold increases when the value of alpha decreases or the value of lenHistoryWindow increases.

upperBound

DOUBLE

All values of the DOUBLE type

The upper boundary for anomaly detection.

For example, if you set the maxAnomalyRatio parameter to 0.3, the value in the upperBound column is the 70th percentile (1 - maxAnomalyRatio) of the sorted data within the window. In this case, data points whose values are smaller than the upperBound value are not checked for anomalies.

Note

The length of the window is specified by the lenHistoryWindow parameter.

If the value of a data point is within the [lowerBound, upperBound] range, the algorithm determines that the data point is normal. If the value of a data point is not within the range, the algorithm calculates the value of anomalyScore for the data point. If the anomalyScore of the data point is larger than the value in the threshold column, the data point is determined to be abnormal.

lowerBound

DOUBLE

All values of the DOUBLE type

The lower boundary for anomaly detection.

For example, if you set the maxAnomalyRatio parameter to 0.3, the value in the lowerBound column is the 30th percentile (maxAnomalyRatio) of the sorted data within the window. In this case, data points whose values are larger than the lowerBound value are not checked for anomalies.

Note

The length of the window is specified by the lenHistoryWindow parameter.

If the value of a data point is within the [lowerBound, upperBound] range, the algorithm determines that the data point is normal. If the value of a data point is not within the range, the algorithm calculates the value of anomalyScore for the data point. If the anomalyScore of the data point is larger than the value in the threshold column, the data point is determined to be abnormal.

mean

DOUBLE

All values of the DOUBLE type

The average value of the data points within the window.

median

DOUBLE

All values of the DOUBLE type

The median of the data points within the window.

std

DOUBLE

All values of the DOUBLE type

The standard deviation of the data points within the window.

latestTimestamp

LONG

Positive integer

The timestamp of the latest data point within the window.

warmup

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The algorithm is being initialized and does not detect anomalies.

  • FALSE: The algorithm is initialized.

ttest

anomaly

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The current data point is abnormal.

  • FALSE: The current data point is normal.

anomalyLevel

STRING

  • NORMAL

  • UNKNOW

  • NORMAL: No anomalies are detected for the current data point.

  • UNKNOW: Anomalies are detected for the current data point and are not classified.

detectedDirection

STRING

  • UP

  • DOWN

  • NONE

  • UP: The value of the current data point is larger than the statistical value within the window.

  • DOWN: The value of the current data point is less than the statistical value within the window.

  • NONE: The current data point is normal and the value in the anomaly column is FALSE.

pValue

DOUBLE

(0, 1)

The value that indicates how the value of the current data point deviates from the statistical value within the window. A smaller value indicates that the value of the current data point deviates from the statistical value more significantly, which is why a data point is abnormal when its pValue is less than the threshold.

threshold

DOUBLE

(0, 1)

The threshold based on which the algorithm determines whether the current data point is abnormal.

  • If the value in the pValue column is less than the value in the threshold column, the current data point is abnormal.

  • If the value in the pValue column is larger than the value in the threshold column, the current data point is normal.

trendScore

DOUBLE

All values of the DOUBLE type

The degree of change in the trend of the data points. The larger the absolute value, the more obvious the trend changes.

  • If the value in the trendScore column is larger than zero, the trend of the data points is upward.

  • If the value in the trendScore column is less than zero, the trend of the data points is downward.

mean

DOUBLE

All values of the DOUBLE type

The average value of the data points within the window. The length of the window is specified by the lenHistoryWindow parameter.

std

DOUBLE

All values of the DOUBLE type

The standard deviation of the data points within the window.

latestTimestamp

LONG

Positive integer

The timestamp of the latest data point within the window.

warmup

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The algorithm is being initialized and does not detect anomalies.

  • FALSE: The algorithm is initialized.

nsigma

anomaly

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The current data point is abnormal.

  • FALSE: The current data point is normal.

anomalyLevel

STRING

  • NORMAL

  • UNKNOW

  • NORMAL: No anomalies are detected for the current data point.

  • UNKNOW: Anomalies are detected for the current data point and are not classified.

detectedDirection

STRING

  • UP

  • DOWN

  • NONE

  • UP: The value of the current data point is larger than the statistical value within the window.

  • DOWN: The value of the current data point is less than the statistical value within the window.

  • NONE: The current data point is normal and the value in the anomaly column is FALSE.

anomalyScore

DOUBLE

[0, Double.MAX_VALUE]

The score of the detected anomaly. A larger value indicates that the anomaly of the current data point is more obvious.

threshold

DOUBLE

[0, Double.MAX_VALUE]

The threshold based on which the algorithm determines whether the current data point is abnormal.

If the value in the anomalyScore column is larger than the value in the threshold column, the current data point is abnormal. If the value in the anomalyScore column is less than the value in the threshold column, the current data point is normal.

mean

DOUBLE

All values of the DOUBLE type

The average value of the data points within the window.

std

DOUBLE

All values of the DOUBLE type

The standard deviation of the data points within the window.

latestTimestamp

LONG

Positive integer

The timestamp of the latest data point within the window.

warmup

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The algorithm is being initialized and does not detect anomalies.

  • FALSE: The algorithm is initialized.

istl-esd

anomaly

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The current data point is abnormal.

  • FALSE: The current data point is normal.

anomalyLevel

STRING

  • NORMAL

  • UNKNOW

  • NORMAL: No anomalies are detected for the current data point.

  • UNKNOW: Anomalies are detected for the current data point and are not classified.

residual

DOUBLE

All values of the DOUBLE type

The residual value of the original data after the periodic component and the trend component are removed.

In the ISTL algorithm, each data point is decomposed into three components whose sum is the original value: residual + trend + season.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

trend

DOUBLE

All values of the DOUBLE type

The trend component in the original data.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

season

DOUBLE

All values of the DOUBLE type

The periodic component in the original data.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

warmup

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The algorithm is being initialized and does not detect anomalies.

    Note

    Four cycles of data points are required to initialize the algorithm. During initialization, values returned in the residual, trend, and season columns are invalid. The default value 0 is returned in these columns.

  • FALSE: The algorithm is initialized.

Other additional columns (same as the additional columns returned for the esd algorithm)

Same as the data types of the columns returned for the esd algorithm.

Same as the valid values of the columns returned for the esd algorithm.

If you specify esd.verbose=true when you call the anomaly detection function, the esd verbose mode is enabled.

In this case, all columns that the esd algorithm returns in verbose mode (excluding the anomaly, warmup, and anomalyLevel columns) are also returned.

istl-nsigma

anomaly

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The current data point is abnormal.

  • FALSE: The current data point is normal.

anomalyLevel

STRING

  • NORMAL

  • UNKNOW

  • NORMAL: No anomalies are detected for the current data point.

  • UNKNOW: Anomalies are detected for the current data point and are not classified.

trend

DOUBLE

All values of the DOUBLE type

The trend component in the original data.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

season

DOUBLE

All values of the DOUBLE type

The periodic component in the original data.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

residual

DOUBLE

All values of the DOUBLE type

The residual value of the original data after the periodic component and the trend component are removed.

If the algorithm is being initialized (value in the warmup column is TRUE), only the default value 0 is returned in this column.

warmup

BOOLEAN

  • TRUE

  • FALSE

  • TRUE: The algorithm is being initialized and does not detect anomalies.

    Note

    Four cycles of data points are required to initialize the algorithm. During initialization, values returned in the residual, trend, and season columns are invalid. The default value 0 is returned in these columns.

  • FALSE: The algorithm is initialized.

Other additional columns (same as the additional columns returned for the nsigma algorithm)

Same as the data types of the columns returned for the nsigma algorithm.

Same as the valid values of the columns returned for the nsigma algorithm.

If you specify nsigma.verbose=true when you call an anomaly detection function, the nsigma verbose mode is enabled.
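The two-step decision described in the upperBound, lowerBound, anomalyScore, and threshold rows for the esd algorithm can be sketched in Python as follows. This is an illustration only. The classify_point function is a hypothetical name, and the bounds, score, and threshold are passed in as inputs because this topic does not specify the formulas that produce them.

```python
def classify_point(value, lower_bound, upper_bound, anomaly_score, threshold):
    """Apply the bounds check first, then compare the score with the threshold.

    Returns a dict that mimics the anomaly and detectedDirection columns.
    """
    # Step 1: values inside [lowerBound, upperBound] are always normal.
    if lower_bound <= value <= upper_bound:
        return {"anomaly": False, "detectedDirection": "NONE"}
    # Step 2: outside the range, the score decides.
    if anomaly_score > threshold:
        direction = "UP" if value > upper_bound else "DOWN"
        return {"anomaly": True, "detectedDirection": direction}
    return {"anomaly": False, "detectedDirection": "NONE"}


# A value above the upper bound whose score exceeds the threshold.
print(classify_point(20.0, 4.0, 7.0, anomaly_score=9.0, threshold=1.0))
```

A value outside the range is therefore not automatically abnormal: it is reported only if its score also exceeds the threshold.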

Training parameters

You can specify an algorithm and configure its training parameters to determine the model used to detect anomalies. The values of training parameters are cleared after you restart LindormTSDB. In this case, you must configure the training parameters again to train the model. The model is trained in real time during detection to learn and adapt to the characteristics of the time series data.

Note

Take note of the following items when you configure training parameters:

  • The names of the parameters are not case-sensitive.

  • The values of training parameters can be numbers or strings and cannot be NULL.

  • The values of the parameters must be within the specific ranges.

Algorithm

Parameter

Type

Valid value

Description

esd

compression

INTEGER

A positive integer. Valid values: (10,1000). Default value: 100.

The space complexity of the data structure used by the algorithm. A larger value of this parameter indicates that the algorithm uses more memory during detection and returns more accurate results.

lenHistoryWindow

INTEGER

Valid values: positive integers that are equal to or larger than 20. Default value: null.

The length of the reference time window. If you specify a short reference time window, only the recent data points within the time window are used as references during the detection. If you set this parameter to null, all data points that are inserted after the first detection are used as references.

nsigma

lenHistoryWindow

INTEGER

Valid values: positive integers that are equal to or larger than 20. Default value: null.

The length of the reference time window. If you specify a short reference time window, only the recent data points within the time window are used as references during the detection. If you set this parameter to null, all data points that are inserted after the first detection are used as references.

ttest

lenDetectWindow

INTEGER

A positive integer. Default value: 10.

The length of the most recent time window within which you want to detect anomalies.

lenHistoryWindow

INTEGER

Valid values: positive integers that are equal to or larger than 20. Default value: 100.

The length of the reference time window. If you specify a short reference time window, only the recent data points within the time window are used as references during the detection. If you set this parameter to null, all data points that are inserted after the first detection are used as references.

Note

The value of this parameter must be larger than the value of lenDetectWindow.

istl-esd

frequency

VARCHAR

A string that consists of a digit and a time unit. Examples: 5M, 24H, and 1D.

Valid time units:

  • n/ns: nanosecond.

  • u/us: microsecond.

  • m/ms: millisecond.

  • s/S: second.

  • M/min: minute.

  • H/h: hour.

  • D/d: day.

The frequency at which the time series data is collected. For example, if one time series data point is collected per hour, set this parameter to 1H.

Important
  • If this parameter is not specified, the algorithm automatically calculates the frequency at which the time series data is collected. However, if many values are missing in the time series data, the calculated frequency may be inaccurate.

  • If you specify the frequency parameter, the value of this parameter must be the same as that of the INTERVAL parameter specified in the SAMPLE BY INTERVAL statement.

periods

VARCHAR

A string that consists of a digit and a time unit. Examples: 5M, 24H, and 1D.

Valid time units:

  • n/ns: nanosecond.

  • u/us: microsecond.

  • m/ms: millisecond.

  • s/S: second.

  • M/min: minute.

  • H/h: hour.

  • D/d: day.

The total period length of the periodic data. You can use indexers to specify multiple period lengths. Example: periods[0]=1440;periods[1]=1880.

Note

If this parameter is not specified, the algorithm automatically calculates the period.

esd.*

N/A

The training parameters that are required to define the esd algorithm. These parameters are the same as the training parameters described in the esd section of this table. You can add the esd. prefix to the training parameters of the esd algorithm to configure these parameters. Example: esd.lenHistoryWindow=10.

istl-nsigma

frequency

VARCHAR

A string that consists of a digit and a time unit. Examples: 5M, 24H, and 1D.

Valid time units:

  • n/ns: nanosecond.

  • u/us: microsecond.

  • m/ms: millisecond.

  • s/S: second.

  • M/min: minute.

  • H/h: hour.

  • D/d: day.

The frequency at which the time series data is collected. For example, if one time series data point is collected per hour, set this parameter to 1H.

Important
  • If this parameter is not specified, the algorithm automatically calculates the frequency at which the time series data is collected. However, if many values are missing in the time series data, the calculated frequency may be inaccurate.

  • If you specify the frequency parameter, the value of this parameter must be the same as that of the INTERVAL parameter specified in the SAMPLE BY INTERVAL statement.

periods

VARCHAR

A string that consists of a digit and a time unit. Examples: 5M, 24H, and 1D.

Valid time units:

  • n/ns: nanosecond.

  • u/us: microsecond.

  • m/ms: millisecond.

  • s/S: second.

  • M/min: minute.

  • H/h: hour.

  • D/d: day.

The total period length of the periodic data. You can use indexers to specify multiple period lengths. Example: periods[0]=1440;periods[1]=1880.

Note

If this parameter is not specified, the algorithm automatically calculates the period.

nsigma.*

N/A

The training parameters that are required to define the nsigma algorithm. These parameters are the same as the training parameters described in the nsigma section of this table. You can add the nsigma. prefix to the training parameters of the nsigma algorithm to configure these parameters. Example: nsigma.lenHistoryWindow=10.
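The time unit suffixes listed for the frequency and periods parameters are case-sensitive for two pairs: m means millisecond while M means minute, and both s and S mean second. The following Python sketch shows one way to validate such strings on the client side before they are passed to a query. The parse_duration_ns function is a hypothetical helper, not part of LindormTSDB, and it assumes that the number is a non-negative integer immediately followed by the unit.

```python
import re

# Unit suffixes from the table above, expressed in nanoseconds.
# Suffixes are case-sensitive: "m" is millisecond, "M" is minute.
_UNIT_NS = {
    "n": 1, "ns": 1,
    "u": 1_000, "us": 1_000,
    "m": 1_000_000, "ms": 1_000_000,
    "s": 1_000_000_000, "S": 1_000_000_000,
    "M": 60 * 10**9, "min": 60 * 10**9,
    "H": 3_600 * 10**9, "h": 3_600 * 10**9,
    "D": 86_400 * 10**9, "d": 86_400 * 10**9,
}


def parse_duration_ns(text):
    """Convert a string such as '5M', '24H', or '1D' to nanoseconds."""
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", text.strip())
    if match is None or match.group(2) not in _UNIT_NS:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_NS[match.group(2)]


# With one point per hour and a daily period, each period spans 24 points.
points_per_period = parse_duration_ns("1D") // parse_duration_ns("1H")
print(points_per_period)
```

A check like this can catch a frequency of 5m (5 milliseconds) that was meant to be 5M (5 minutes) before the value reaches the detection function.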

Inference parameters

Inference parameters take effect only during anomaly detection and are not case-sensitive.

Algorithm

Parameter

Type

Valid value

Description

esd

alpha

DOUBLE

Default value: 0.1. Valid values: (0,1).

The sensitivity of anomaly detection. A larger value of this parameter indicates that the algorithm is more sensitive to anomalies and reports more anomalies.

direction

VARCHAR

Default value: Up.

The types of anomalies that you want to detect.

  • Up: Only abnormal increases in the time series data are detected as anomalies.

  • Down: Only abnormal decreases in the time series data are detected as anomalies.

  • Both: Both abnormal increases and abnormal decreases in the time series data are detected as anomalies.

maxAnomalyRatio

DOUBLE

Default value: 0.3. Valid values: (0,1]. If you set this parameter to 1, no anomalies are returned.

The maximum ratio based on which anomalies are detected. For example, if you set maxAnomalyRatio to 0.3 and direction to Up, data points whose values are less than the 70th percentile are not detected as anomalies.

  • If you set direction to Up, you can configure this parameter to prevent data points with smaller values from being detected as anomalies.

  • If you set direction to Down, you can configure this parameter to prevent data points with larger values from being detected as anomalies.

warmupCount

INTEGER

A positive integer. Default value: 20.

The minimum number of data points that are required before the algorithm starts to report anomalies. For example, if you set this parameter to 20, the algorithm does not report anomalies when fewer than 20 data points are available for detection.

nsigma

n

DOUBLE

A non-zero floating-point number. Default value: 3.0.

  • If you set n to a positive number, the algorithm reports an anomaly when the current value exceeds the average value by more than n times the standard deviation.

  • If you set n to a negative number, the algorithm reports an anomaly when the current value falls below the average value by more than the absolute value of n times the standard deviation.

warmupCount

INTEGER

A positive integer. Default value: 20.

The minimum number of data points that are required before the algorithm starts to report anomalies. For example, if you set this parameter to 20, the algorithm does not report anomalies when fewer than 20 data points are available for detection.

ttest

alpha

DOUBLE

Default value: 0.05. Valid values: (0,1).

The sensitivity of anomaly detection. A larger value of this parameter indicates that the algorithm is more sensitive to anomalies and reports more anomalies.

direction

VARCHAR

Default value: Up.

The types of anomalies that you want to detect.

  • Up: Only abnormal increases in the time series data are detected as anomalies.

  • Down: Only abnormal decreases in the time series data are detected as anomalies.

  • Both: Both abnormal increases and abnormal decreases in the time series data are detected as anomalies.

istl-esd

esd.*

N/A

The inference parameters that are required to define the esd algorithm. These parameters are the same as the inference parameters described in the esd section of this table. You can add the esd. prefix to the inference parameters of the esd algorithm to configure these parameters. Example: esd.direction=Both.

istl-nsigma

nsigma.*

N/A

The inference parameters that are required to define the nsigma algorithm. These parameters are the same as the inference parameters described in the nsigma section of this table. You can add the nsigma. prefix to the inference parameters of the nsigma algorithm to configure these parameters. Example: nsigma.n=5.
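The rule controlled by the n and warmupCount parameters of the nsigma algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LindormTSDB implementation: it assumes the population standard deviation and interprets a negative n by its absolute value. The nsigma_detect function is a hypothetical name, and its history argument stands for the data points in the reference window.

```python
from statistics import mean, pstdev


def nsigma_detect(history, value, n=3.0, warmup_count=20):
    """Return True if value is abnormal relative to history under the n-sigma rule."""
    if n == 0:
        raise ValueError("n must be a non-zero number")
    if len(history) < warmup_count:
        return False  # warming up: anomalies are not reported yet
    average = mean(history)
    sigma = pstdev(history)  # population standard deviation (an assumption)
    deviation = value - average
    if n > 0:
        # Positive n: report abnormal increases only.
        return deviation > n * sigma
    # Negative n: report abnormal decreases only.
    return -deviation > abs(n) * sigma


history = [10.0, 10.2] * 10  # average 10.1, standard deviation about 0.1
print(nsigma_detect(history, 10.5))          # far above the average
print(nsigma_detect(history, 9.7, n=-3.0))   # far below the average
```

Increasing the absolute value of n widens the allowed band around the average, so fewer points are reported.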

Examples

  • Example 1: Use the esd algorithm to detect anomalies in the temperature data within a specific time range in a time series table named sensor.

    SELECT device_id, region, time, anomaly_detect(temperature, 'esd') AS detect_result FROM sensor WHERE time >= '2022-01-01 00:00:00' AND time < '2022-01-01 00:01:00' SAMPLE BY 0;

    The following result is returned:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | detect_result |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2022-01-01T00:00:00+08:00 | true          |
    | F07A1260  | north-cn | 2022-01-01T00:00:01+08:00 | false         |
    | F07A1260  | north-cn | 2022-01-01T00:00:02+08:00 | true          |
    | F07A1261  | south-cn | 2022-01-01T00:00:00+08:00 | false         |
    | F07A1261  | south-cn | 2022-01-01T00:00:01+08:00 | false         |
    | F07A1261  | south-cn | 2022-01-01T00:00:02+08:00 | false         |
    | F07A1261  | south-cn | 2022-01-01T00:00:03+08:00 | false         |
    +-----------+----------+---------------------------+---------------+
  • Example 2: Use the esd algorithm to detect anomalies in the temperature data of the F07A1260 device within a specific time range in the sensor table.

    SELECT device_id, region, time, anomaly_detect(temperature, 'esd') AS detect_result FROM sensor WHERE device_id IN ('F07A1260') AND time >= '2022-01-01 00:00:00' AND time < '2022-01-01 00:01:00' SAMPLE BY 0;

    The following result is returned:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | detect_result |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2022-01-01T00:00:00+08:00 | true          |
    | F07A1260  | north-cn | 2022-01-01T00:00:01+08:00 | false         |
    | F07A1260  | north-cn | 2022-01-01T00:00:02+08:00 | true          |
    +-----------+----------+---------------------------+---------------+
  • Example 3: Use the esd algorithm with the lenHistoryWindow and maxAnomalyRatio parameters configured to detect anomalies in the temperature data of the F07A1260 device within a specific time range in the sensor table.

    SELECT device_id, region, time, anomaly_detect(temperature, 'esd', 'lenHistoryWindow=30,maxAnomalyRatio=0.1') AS detect_result FROM sensor WHERE device_id IN ('F07A1260') AND time >= '2022-01-01 00:00:00' AND time < '2022-01-01 00:01:00' SAMPLE BY 0;

    The following result is returned:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | detect_result |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2022-01-01T00:00:00+08:00 | false         |
    | F07A1260  | north-cn | 2022-01-01T00:00:01+08:00 | false         |
    | F07A1260  | north-cn | 2022-01-01T00:00:02+08:00 | true          |
    +-----------+----------+---------------------------+---------------+
  • Example 4: Nest the downsampling operator MAX inside anomaly_detect and set the downsampling interval to 1 minute.

    SELECT time, anomaly_detect(max(temperature), 'esd') AS ad_result, max(temperature) AS rawVal FROM sensor SAMPLE BY 1m;

    The following result is returned:

    +---------------------------+-----------+-------------+
    |           time            | ad_result |   rawVal    |
    +---------------------------+-----------+-------------+
    | 2022-04-12T06:00:00+08:00 | null      | 923091.3175 |
    | 2022-04-11T08:00:00+08:00 | null      | 8035700     |
    | 2022-04-11T09:00:00+08:00 | null      | 8035690.25  |
    | 2022-04-11T10:00:00+08:00 | null      | 3306277.545 |
    | 2022-04-11T11:00:00+08:00 | null      | 5921167.787 |
    | 2022-04-11T12:00:00+08:00 | null      | 833541.304  |
    +---------------------------+-----------+-------------+
  • Example 5: Nest the LATEST operator inside anomaly_detect and use the SAMPLE BY 0 clause.

    SELECT time, anomaly_detect(latest(temperature), 'esd') AS ad_result, latest(temperature) AS latestVal FROM sensor SAMPLE BY 0;

    The following result is returned:

    +---------------------------+-----------+-------------+
    |           time            | ad_result |  latestVal  |
    +---------------------------+-----------+-------------+
    | 2022-04-12T06:00:00+08:00 | false     | 923091.3175 |
    | 2022-04-13T07:00:00+08:00 | false     | 8037506.75  |
    | 2022-04-13T07:00:00+08:00 | false     | 50490.2     |
    +---------------------------+-----------+-------------+
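The examples above pass algorithm parameters to anomaly_detect as a comma-separated string of key=value pairs. When a client application generates such statements, a small helper can keep that string consistent. The following Python sketch is illustrative only: build_options and build_detect_call are hypothetical names, not part of any Lindorm SDK, and the sketch assumes that prefixed parameters such as esd.direction=Both go into the same comma-separated string as other parameters.

```python
def build_options(params=None, nested=None):
    """Render parameters as a comma-separated key=value string.

    params: parameters of the algorithm itself, e.g. {"lenHistoryWindow": 30}.
    nested: parameters of an inner algorithm keyed by its prefix,
            e.g. {"esd": {"direction": "Both"}} becomes "esd.direction=Both".
    """
    parts = [f"{key}={value}" for key, value in (params or {}).items()]
    for prefix, inner in (nested or {}).items():
        parts.extend(f"{prefix}.{key}={value}" for key, value in inner.items())
    return ",".join(parts)


def build_detect_call(column, algorithm, params=None, nested=None):
    """Build the anomaly_detect(...) expression used in a SELECT clause.

    Only pass trusted, validated names and values: the result is plain
    string concatenation and is not protected against SQL injection.
    """
    options = build_options(params, nested)
    if options:
        return f"anomaly_detect({column}, '{algorithm}', '{options}')"
    return f"anomaly_detect({column}, '{algorithm}')"


if __name__ == "__main__":
    print(build_detect_call("temperature", "esd",
                            {"lenHistoryWindow": 30, "maxAnomalyRatio": 0.1}))
    print(build_detect_call("temperature", "istl-esd",
                            nested={"esd": {"direction": "Both"}}))
```

The first call reproduces the expression used in Example 3. The second call shows how the esd. prefix described in the parameter table would be applied for the istl-esd algorithm.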