Simple Log Service uses the anomaly detection feature to automatically identify abnormal states in service systems and their root causes. The feature combines machine learning with a metric's current pattern to detect deviations from normal behavior. The multivariate pattern identification functions support multi-dimensional anomaly detection across correlated metrics.
Multivariate pattern recognition function list
|
Function name |
Syntax |
Description |
Data type of the return value |
|
Identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix. |
varchar |
|
|
Merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function. |
varchar |
|
|
normalize_vector(varchar summary, array(double) x_vector) |
Normalizes a new sample vector specified by the |
array(double) |
|
|
standardize_vector(varchar summary, array(double) x_vector) |
Standardizes a new sample vector specified by the |
array(double) |
|
|
mah_distance(varchar summary, array(double) x_vector) |
Calculates the Mahalanobis distance for a new sample vector specified by the |
double |
|
|
standard_distance(varchar summary, double metric_value, int element_index) |
Calculates the standardized distance for a metric specified by the |
double |
|
|
Calculates the Mahalanobis distance for a new sample vector specified by the If |
array(double) |
summarize function
The summarize function identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.
varchar summarize(array(array(double)) data_samples)
Or
varchar summarize(array(array(double)) data_samples, array(double) weights)
|
Parameter |
Description |
|
|
A two-dimensional array. The array can be used as a two-dimensional table. Each column specifies a variable. Each row specifies the variable values of a sample. |
|
|
Optional. A one-dimensional array with the same length as the first dimension of |
Example
-
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ) select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group -
Return value
entity_group
statistical_summary
A
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Response parameters
Parameter
Description
sampleCountThe number of samples.
vectorSizeThe length of the vector.
meansThe average value of each component across all vectors.
stdDevsThe standard deviation of each component across all vectors.
variancesThe variance of each component across all vectors.
minsThe minimum value of each component across all vectors.
maxsThe maximum value of each component across all vectors.
covarianceThe covariance matrix between the components of all vectors.
correlationsThe correlation coefficient matrix between the components of all vectors.
sumsThe sum of each component across all vectors.
weightSumThe sum of all sample weights.
sumProductsAn intermediate result used when merging statistical patterns.
isSummarizedIndicates whether the statistical pattern calculation returned normally.
-
true: The request was successful.
-
false: The request failed.
-
merge_summary function
You can use the summarize function to merge patterns learned at different stages, such as patterns learned from the same dataset at different times or patterns from two independent datasets.
varchar merge_summary(varchar summary1, varchar summary2)
Or
varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)
|
Parameter |
Description |
|
|
A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
|
|
The overall weight for the summary1 pattern. |
|
|
The pattern is derived from the summarize function. |
|
|
Specifies the overall weight for the summary2 pattern. |
Example
-
Query statement
* | with data_table_01 as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features ), summaries_01 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_01 group by entity_group ), data_table_02 as ( select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries_02 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_02 group by entity_group ) select s1.entity_group, merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary from summaries_01 as s1 join summaries_02 as s2 on s1.entity_group = s2.entity_group -
Query and analysis results
statistical_summaryis the aggregation pattern.entity_group
statistical_summary
2
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Return parameters:
Parameter
Description
sampleCountThe number of samples.
vectorSizeThe length of the vector.
meansThe average value of each component across all vectors.
stdDevsThe standard deviation of each component across all vectors.
variancesThe variance of each component across all vectors.
minsThe minimum value of each component across all vectors.
maxsThe maximum value of each component across all vectors.
covarianceThe covariance matrix between the components of all vectors.
correlationsThe correlation coefficient matrix between the components of all vectors.
sumsThe sum of each component across all vectors.
weightSumThe sum of all sample weights.
sumProductsAn intermediate result used when merging statistical patterns.
isSummarizedIndicates whether the statistical pattern calculation returned normally.
-
true: The request was successful.
-
false: The request failed.
-
normalize_vector function
You can use the multivariate pattern summary obtained from the summarize function to normalize the new sample vector x_vector, which maps each of its components to the interval [0, 1].
array(double) normalize_vector(varchar summary, array(double) x_vector)
|
Parameter |
Description |
|
|
The pattern is derived from the learning process of the summarize function. |
|
|
The new sample data. |
Example
-
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, normalize_vector(t2.statistical_summary, t1.features) as normalized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group -
Query and analysis results
The
normalized_featuresparameter indicates the normalization results of the sample vector specified by thex_vectorparameter.entity_id
entity_group
normalized_features
2
A
[0.14285714285714286,0.14285714285714286,0.25,0.5]
4
A
[0.42857142857142857,0.42857142857142857,0.0,0.5]
3
A
[0.2857142857142857,0.2857142857142857,0.4,0.5]
...
...
...
standardize_vector function
Use the multivariate pattern summary from the summarize function to standardize the new sample vector x_vector so that its components have a mean of 0 and a standard deviation of 1.
array(double) standardize_vector(varchar summary, array(double) x_vector)
|
Parameter |
Description |
|
|
The pattern is derived from the learning process of the summarize function. |
|
|
The new sample data. |
Example
-
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standardize_vector(t2.statistical_summary, t1.features) as standardized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group -
Query and analysis results
The
standardized_featuresparameter indicates the standardization results of the sample vector specified by thex_vectorparameter.entity_id
entity_group
standardized_features
2
A
[-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]
4
A
[-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]
3
A
[-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]
...
...
...
mah_distance function
The mah_distance function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. The Mahalanobis distance accounts for scale differences between variables and measures the distance between the standardized sample vector and the centroid. A Mahalanobis distance of 1 means the sample vector is at the average distance from the centroid across all vectors.
double mah_distance(varchar summary, array(double) x_vector)
|
Parameter |
Description |
|
|
The pattern is derived from the learning process of the summarize function. |
|
|
The new sample data. |
Example
-
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, mah_distance(t2.statistical_summary, t1.features) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group -
Return value
The
std_distanceparameter indicates the Mahalanobis distance of the sample vector specified by thex_vectorparameter.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
standard_distance function
The standard_distance function calculates the standardized distance for a metric specified by the metric_value parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. Unlike the Mahalanobis distance, which measures the standardized distance between a multi-metric vector and its centroid, the standardized distance measures the distance for a single metric within the vector. The element_index parameter specifies the index of the metric (starting from 0). The metric_value parameter specifies the value of the metric.
double standard_distance(varchar summary, double metric_value, int element_index)
|
Parameter |
Description |
|
|
A pattern learned by the summarize function. |
|
|
New sample data |
|
|
The index of a specified element in the |
Example
-
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standard_distance(t2.statistical_summary, 30, 1) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group -
Query and analysis results
std_distanceis the standardized distance of the input samplemetric_valueat the specified index.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
anomaly_level function
The anomaly_level function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different anomaly probability levels. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. A return value of 0.1 indicates a 10% abnormal probability (first-level anomaly). A return value of 0.01 indicates a 1% probability (second-level). A return value of 0.001 indicates 0.1% (third-level). A return value of 0.0001 indicates 0.01% (fourth-level). Higher anomaly levels correspond to lower probabilities and greater suspicion of anomalies. You can configure a threshold to filter results, for example, retaining only fourth-level anomalies and above.
If you specify the element_index parameter, the function calculates the anomaly probability only for the component at that index. Otherwise, it calculates the anomaly probability for the entire vector.
double anomaly_level(varchar summary, array(double) x_vector)
Or
double anomaly_level(varchar summary, array(double) x_vector, int element_index)
|
Parameter |
Description |
|
|
The summarize function uses a learning process to generate a pattern. |
|
|
The new sample data. |
|
|
Optional. The element at the specified index in the |
Example
-
Query statement
* | with dummy as ( select sequence(1, 1000) as seq_data, count(*) as record_count from log ), sample_data as ( select 'G1' as group_id, s.seq_num, -- Generate 1,000 two-dimensional random vectors that are distributed around the (100, 5000) range. The standard deviation values of two components are 20 and 500. inverse_normal_cdf(100, 20, random()) as x1, inverse_normal_cdf(5000, 500, rand()) as x2 from dummy, unnest(seq_data) as s(seq_num) ), data_summary as ( select group_id, summarize(array_agg(array[x1, x2])) as metric_summary from sample_data group by group_id ), new_data as ( select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2 union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2 union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2 union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2 union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2 union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2 union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2 ) select n.group_id, json_extract(s.metric_summary, '$.means') as metric_vector_mean, json_extract(s.metric_summary, '$.covariance') as metric_covariance, n.object_id, n.x1, n.x2, anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level from data_summary as s join new_data as n on s.group_id = n.group_id order by n.group_id, n.object_id limit 100000 -
Query and analysis results
The
anomaly_levelparameter indicates the abnormal probability of the sample vector specified by thex_vectorparameter.group_id
object_id
anomaly_level
G1
1007
13.0
G1
1006
5.0
G1
1005
4.0
...
...
...