Prediction and anomaly detection functions predict time series curves and detect anomalies by computing the Ksigma and quantiles of the errors between the predicted curve and the actual curve.
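
As a rough illustration of this idea, the following Python sketch applies a Ksigma rule and a quantile rule to the errors between an actual curve and a predicted curve. The function names, the default k and q values, and the nearest-rank quantile are assumptions made for this example. The source does not describe how the functions compute these statistics internally.

```python
import statistics


def ksigma_flags(actual, predicted, k=3.0):
    """Flag points whose error is more than k standard deviations from the mean error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mean = statistics.fmean(errors)
    sigma = statistics.pstdev(errors)
    if sigma == 0:
        # All errors are identical, so no point stands out.
        return [False] * len(errors)
    return [abs(e - mean) > k * sigma for e in errors]


def quantile_flags(actual, predicted, q=0.95):
    """Flag points whose absolute error exceeds the q-quantile of all absolute errors."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    ranked = sorted(abs_errors)
    # Nearest-rank quantile (an assumption, other definitions exist).
    threshold = ranked[int(q * (len(ranked) - 1))]
    return [e > threshold for e in abs_errors]
```

For example, with nine points that match the prediction and one that is 40 units above it, `ksigma_flags(actual, predicted, k=2.0)` and `quantile_flags(actual, predicted, q=0.9)` both flag only the last point.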

Functions

| Function | Description |
| --- | --- |
| ts_predicate_simple | Uses default parameters to model time series data and performs simple time series prediction and anomaly detection. |
| ts_predicate_ar | Uses an autoregressive (AR) model to model time series data and performs simple time series prediction and anomaly detection. |
| ts_predicate_arma | Uses an autoregressive moving average (ARMA) model to model time series data and performs simple time series prediction and anomaly detection. |
| ts_predicate_arima | Uses an autoregressive integrated moving average (ARIMA) model to model time series data and performs simple time series prediction and anomaly detection. |
| ts_regression_predict | Accurately predicts the long-run trend for a single periodic time series with a certain tendency. Scenario: This function can be used to predict metering data, network traffic, financial data, and different business data that follows certain rules. |
| ts_anomaly_filter | Filters the anomalies detected during time series anomaly detection on multiple curves based on the custom anomaly mode. This function helps you quickly find abnormal instance curves. |

Note: The display items for all prediction and anomaly detection functions are the same. For more information about the output result and relevant description, see the output result and display item description of the ts_predicate_simple function.

ts_predicate_simple

Function format:
select ts_predicate_simple(x, y, nPred, isSmooth, samplePeriod, sampleMethod) 
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence. The time points along the x axis are sorted in ascending order. | Each time point is a Unix timestamp. Unit: second. |
| y | The sequence of the numeric values of the property under observation, corresponding to the specified time points. | N/A |
| nPred | The number of points for prediction. | The value is of the Long type. Valid values: [1, 5 × p]. |
| isSmooth | Specifies whether to filter the raw data. If you do not set this parameter, the default value true is used, indicating that the raw data will be filtered. | The value is of the Boolean type. Valid values: true (filters the raw data), false (does not filter the raw data). Default value: true. |
| samplePeriod | The period during which the current time series data is sampled. | The value is of the Long type. Valid values: [1, 86399]. |
| sampleMethod | The method for sampling the data in the sampling window. | Valid values: avg (samples the average value of the data in the window), max (samples the maximum value), min (samples the minimum value), sum (samples the sum). |
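
One way to picture the samplePeriod and sampleMethod parameters is the Python sketch below. It groups points into windows aligned to multiples of the period, mirroring the `__time__ - __time__ % 60` bucketing used in this document's example queries, and aggregates each window. The function name `resample` and the alignment rule are assumptions for illustration, not a description of the service's internals.

```python
from collections import defaultdict

# One aggregator per documented sampleMethod value.
AGGREGATORS = {
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
    "sum": sum,
}


def resample(timestamps, values, sample_period, sample_method):
    """Aggregate (timestamp, value) pairs into windows of sample_period seconds."""
    buckets = defaultdict(list)
    for ts, value in zip(timestamps, values):
        # Align each timestamp to the start of its window.
        buckets[ts - ts % sample_period].append(value)
    aggregate = AGGREGATORS[sample_method]
    return [(start, aggregate(buckets[start])) for start in sorted(buckets)]
```

With timestamps `[0, 30, 60, 90, 120]`, values `[1, 3, 5, 7, 9]`, and a period of 60, the avg method yields one averaged point per minute.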
Example:
  • The statement for query and analysis is as follows:
    * | select ts_predicate_simple(stamp, value, 6, 1, 'avg') from (select __time__ - __time__ % 60 as stamp, avg(v) as value from log GROUP BY stamp order by stamp)
  • The following figure shows the output result:



The following table describes the display items.
| Axis | Display item | Description |
| --- | --- | --- |
| Horizontal axis | unixtime | The Unix timestamp of the data. Unit: second. |
| Vertical axis | src | The raw data. |
| Vertical axis | predict | The data after filtering. |
| Vertical axis | upper | The upper limit of the prediction. The confidence level is fixed at 0.85 and cannot be modified. |
| Vertical axis | lower | The lower limit of the prediction. The confidence level is fixed at 0.85 and cannot be modified. |
| Vertical axis | anomaly_prob | The probability that the point is an anomaly. Valid values: [0, 1]. |
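
The display items can be post-processed on the client side. The Python sketch below assumes each output row is a tuple in the order unixtime, src, predict, upper, lower, anomaly_prob, and returns the timestamps where the raw value leaves the prediction band or the anomaly probability reaches a threshold. The function name and the 0.5 threshold are hypothetical choices for this example.

```python
def suspicious_points(rows, prob_threshold=0.5):
    """Return timestamps whose raw value is out of band or likely anomalous."""
    flagged = []
    for unixtime, src, predict, upper, lower, anomaly_prob in rows:
        outside_band = src > upper or src < lower
        if outside_band or anomaly_prob >= prob_threshold:
            flagged.append(unixtime)
    return flagged
```

The threshold is a tuning knob: a lower value reports more points, a higher value reports only the most likely anomalies.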

ts_predicate_ar

Function format:
select ts_predicate_ar(x, y, p, nPred, isSmooth, samplePeriod, sampleMethod) 
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence. The time points along the x axis are sorted in ascending order. | Each time point is a Unix timestamp. Unit: second. |
| y | The sequence of the numeric values of the property under observation, corresponding to the specified time points. | N/A |
| p | The order of the AR model. | The value is of the Long type. Valid values: [2, 8]. |
| nPred | The number of points for prediction. | The value is of the Long type. Valid values: [1, 5 × p]. |
| isSmooth | Specifies whether to filter the raw data. If you do not set this parameter, the default value true is used, indicating that the raw data will be filtered. | The value is of the Boolean type. Valid values: true (filters the raw data), false (does not filter the raw data). Default value: true. |
| samplePeriod | The period during which the current time series data is sampled. | The value is of the Long type. Valid values: [1, 86399]. |
| sampleMethod | The method for sampling the data in the sampling window. | Valid values: avg (samples the average value of the data in the window), max (samples the maximum value), min (samples the minimum value), sum (samples the sum). |
For example, the statement for query and analysis is as follows:
* | select ts_predicate_ar(stamp, value, 3, 4, 1, 'avg') from (select __time__ - __time__ % 60 as stamp, avg(v) as value from log GROUP BY stamp order by stamp)
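
The roles of p and nPred can be illustrated with a minimal Python sketch of recursive AR forecasting. It assumes the coefficients and intercept have already been fitted (the numbers in the usage note are made up), and it is not the implementation behind ts_predicate_ar.

```python
def ar_forecast(history, coefficients, intercept, n_pred):
    """Forecast n_pred points with an AR(p) model, where p = len(coefficients)."""
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("need at least p observations")
    series = list(history)
    for _ in range(n_pred):
        # Most recent value first: y[t-1], y[t-2], ..., y[t-p].
        recent = series[-p:][::-1]
        series.append(intercept + sum(c * y for c, y in zip(coefficients, recent)))
    # Return only the newly predicted points.
    return series[len(history):]
```

Calling `ar_forecast([1.0, 2.0], [0.5, 0.25], 1.0, 2)` rolls an AR(2) model forward two steps. Each prediction is fed back in as input for the next one, which is why errors tend to grow with the number of predicted points.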

ts_predicate_arma

Function format:
select ts_predicate_arma(x, y, p, q, nPred, isSmooth, samplePeriod, sampleMethod) 
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence. The time points along the x axis are sorted in ascending order. | Each time point is a Unix timestamp. Unit: second. |
| y | The sequence of the numeric values of the property under observation, corresponding to the specified time points. | N/A |
| p | The order of the autoregressive (AR) part of the ARMA model. | The value is of the Long type. Valid values: [2, 100]. |
| q | The order of the moving average (MA) part of the ARMA model. | The value is of the Long type. Valid values: [2, 8]. |
| nPred | The number of points for prediction. | The value is of the Long type. Valid values: [1, 5 × p]. |
| isSmooth | Specifies whether to filter the raw data. If you do not set this parameter, the default value true is used, indicating that the raw data will be filtered. | The value is of the Boolean type. Valid values: true (filters the raw data), false (does not filter the raw data). Default value: true. |
| samplePeriod | The period during which the current time series data is sampled. | The value is of the Long type. Valid values: [1, 86399]. |
| sampleMethod | The method for sampling the data in the sampling window. | Valid values: avg (samples the average value of the data in the window), max (samples the maximum value), min (samples the minimum value), sum (samples the sum). |
For example, the statement for query and analysis is as follows:
* | select ts_predicate_arma(stamp, value, 3, 2, 4, 1, 'avg') from (select __time__ - __time__ % 60 as stamp, avg(v) as value from log GROUP BY stamp order by stamp) 

ts_predicate_arima

Function format:
select ts_predicate_arima(x, y, p, d, q, nPred, isSmooth, samplePeriod, sampleMethod)
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence. The time points along the x axis are sorted in ascending order. | Each time point is a Unix timestamp. Unit: second. |
| y | The sequence of the numeric values of the property under observation, corresponding to the specified time points. | N/A |
| p | The order of the autoregressive (AR) part of the ARIMA model. | The value is of the Long type. Valid values: [2, 8]. |
| d | The order of differencing (the integrated part) of the ARIMA model. | The value is of the Long type. Valid values: [1, 3]. |
| q | The order of the moving average (MA) part of the ARIMA model. | The value is of the Long type. Valid values: [2, 8]. |
| nPred | The number of points for prediction. | The value is of the Long type. Valid values: [1, 5 × p]. |
| isSmooth | Specifies whether to filter the raw data. If you do not set this parameter, the default value true is used, indicating that the raw data will be filtered. | The value is of the Boolean type. Valid values: true (filters the raw data), false (does not filter the raw data). Default value: true. |
| samplePeriod | The period during which the current time series data is sampled. | The value is of the Long type. Valid values: [1, 86399]. |
| sampleMethod | The method for sampling the data in the sampling window. | Valid values: avg (samples the average value of the data in the window), max (samples the maximum value), min (samples the minimum value), sum (samples the sum). |
For example, the statement for query and analysis is as follows:
* | select ts_predicate_arima(stamp, value, 3, 1, 2, 4, 1, 'avg') from (select __time__ - __time__ % 60 as stamp, avg(v) as value from log GROUP BY stamp order by stamp)
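
In a standard ARIMA model, the series is differenced d times before an ARMA model is fitted, and forecasts made on the differenced series are summed back to the original scale. The Python sketch below shows only that step. The helper names are hypothetical, the reverse step covers d = 1 only, and nothing here is taken from the ts_predicate_arima implementation.

```python
from itertools import accumulate


def difference(series, d):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undo_first_difference(last_observed, predicted_diffs):
    """Turn predicted first differences back into predicted values (d = 1)."""
    # Running sum that starts from the last observed value.
    return list(accumulate(predicted_diffs, initial=last_observed))[1:]
```

For the series `[1, 4, 9, 16, 25]`, one round of differencing removes the curvature partly and a second round leaves a constant series, which is the kind of stationary input an ARMA model expects.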

ts_regression_predict

Function format:
select ts_regression_predict(x, y, nPred, algo_type, processType, samplePeriod, sampleMethod)
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence. The time points along the x axis are sorted in ascending order. | Each time point is a Unix timestamp. Unit: second. |
| y | The sequence of the numeric values of the property under observation, corresponding to the specified time points. | N/A |
| nPred | The number of points for prediction. | The value is of the Long type. Valid values: [1, 500]. |
| algo_type | The algorithm type for prediction. | Valid values: origin (uses the Gradient Boosted Regression Tree (GBRT) algorithm for prediction), forest (uses the GBRT algorithm for prediction based on the trend components decomposed by Seasonal and Trend decomposition using Loess (STL), and then uses the additive model to sum up the decomposed components and obtain the predicted data), linear (uses the Linear Regression algorithm for prediction based on the trend components decomposed by STL, and then uses the additive model to sum up the decomposed components and obtain the predicted data). |
| processType | Specifies whether to preprocess the data. | Valid values: 0 (does not preprocess the data before the data is used for prediction), 1 (removes abnormal data before the data is used for prediction). |
| samplePeriod | The period during which the current time series data is sampled. | The value is of the Long type. Valid values: [1, 86399]. |
| sampleMethod | The method for sampling the data in the sampling window. | Valid values: avg (samples the average value of the data in the window), max (samples the maximum value), min (samples the minimum value), sum (samples the sum). |
Example:
  • The statement for query and analysis is as follows:
    * and h: nu2h05202.nu8 and m: NET | select ts_regression_predict(stamp, value, 200, 'origin', 1, 'avg') from (select __time__ - __time__ % 60 as stamp, avg(v) as value from log GROUP BY stamp order by stamp)
  • The following figure shows the output result:



The following table describes the display items.
| Axis | Display item | Description |
| --- | --- | --- |
| Horizontal axis | unixtime | The Unix timestamp of the data. Unit: second. |
| Vertical axis | src | The raw data. |
| Vertical axis | predict | The predicted data. |
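
To give a feel for trend extrapolation, the Python sketch below fits an ordinary least squares line to the observed points and extends it by n_pred evenly spaced points. It is a deliberately reduced illustration: it omits the STL decomposition and the additive recombination that the linear algorithm type is described as using, and the function name is hypothetical.

```python
def linear_trend_predict(timestamps, values, n_pred):
    """Fit value = intercept + slope * t and extrapolate n_pred future points."""
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in timestamps)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    # Assume future points keep the spacing of the last two observations.
    step = timestamps[-1] - timestamps[-2]
    future = [timestamps[-1] + step * i for i in range(1, n_pred + 1)]
    return [(t, intercept + slope * t) for t in future]
```

A straight line cannot capture periodic behavior on its own, which is why a real pipeline would handle the seasonal component separately before adding the parts back together.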

ts_anomaly_filter

Function format:
select ts_anomaly_filter(lineName, ts, ds, preds, probs, nWatch, anomalyType)
The following table describes the parameters.
| Parameter | Description | Value |
| --- | --- | --- |
| lineName | The name of each curve. The value is of the Varchar type. | N/A |
| ts | The time sequence of the curve. The value is an array of the Double type. The time points are sorted in ascending order. | N/A |
| ds | The actual value sequence of the curve. The value is an array of the Double type. The actual values correspond to the time points specified by the ts parameter in one-to-one mode. | N/A |
| preds | The predicted value sequence of the curve. The value is an array of the Double type. The predicted values correspond to the time points specified by the ts parameter in one-to-one mode. | N/A |
| probs | The anomaly detection result sequence of the curve. The value is an array of the Double type. The anomaly detection results correspond to the time points specified by the ts parameter in one-to-one mode. | N/A |
| nWatch | The number of the recently observed actual values on the curve. The value is of the Long type. The value must be smaller than the number of time points on the curve. | N/A |
| anomalyType | The type of anomaly to be filtered. The value is of the Long type. | Valid values: 0 (all anomalies), 1 (positive anomalies), -1 (negative anomalies). |
Example:
  • The statement for query and analysis is as follows:
    * | select res.name, res.ts, res.ds, res.preds, res.probs
        from (
          select ts_anomaly_filter(name, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint)) as res
          from (
            select name, res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
            from (
              select name, array_transpose(ts_predicate_ar(stamp, value, 10)) as res
              from (
                select name, stamp, value from log where name like '%asg-%') group by name)) );
  • The output result is as follows:
    | name                     | ts                                                               | ds          | preds     | probs       |
    | ------------------------ | ---------------------------------------------------------------- | ----------- | --------- | ----------- |
    | asg-bp1hylzdi2wx7civ0ivk | [1.5513696E9, 1.5513732E9, 1.5513768E9, 1.5513804E9] | [1,2,3,NaN] | [1,2,3,4] | [0,0,1,NaN] |
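
The source does not spell out the filtering rule, so the Python sketch below is only one plausible reading of the nWatch and anomalyType parameters. It assumes that a curve is kept when any of its last nWatch points has an anomaly probability at or above a threshold, that a positive anomaly means the actual value is at or above the predicted value, and that NaN entries are skipped. The function name and the 0.5 threshold are hypothetical.

```python
import math


def filter_anomalous_curves(curves, n_watch, anomaly_type, prob_threshold=0.5):
    """Return names of curves with a matching anomaly in their last n_watch points.

    curves: iterable of (name, ds, preds, probs) tuples.
    anomaly_type: 0 = all, 1 = positive, -1 = negative (assumed meaning).
    """
    selected = []
    for name, ds, preds, probs in curves:
        recent = list(zip(ds, preds, probs))[-n_watch:]
        for actual, predicted, prob in recent:
            if math.isnan(actual) or math.isnan(prob) or prob < prob_threshold:
                continue
            # Assumption: actual >= predicted counts as a positive anomaly.
            direction = 1 if actual >= predicted else -1
            if anomaly_type == 0 or anomaly_type == direction:
                selected.append(name)
                break
    return selected
```

Restricting the check to the last few points keeps the focus on curves that are misbehaving now, instead of curves that had an isolated spike long ago.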