Simple Log Service provides the anomaly detection feature to identify anomalies in service systems and the root causes of the anomalies. The anomaly detection feature can automatically identify abnormal metric changes based on the current pattern of a metric and machine learning. You can use multivariate pattern identification functions to identify metric patterns. This topic describes the multivariate pattern identification functions.
Functions
Function | Syntax | Description | Data type of the return value |
| Identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix. | varchar | |
| Merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function. | varchar | |
normalize_vector(varchar summary, array(double) x_vector) | Normalizes a new sample vector specified by the | array(double) | |
standardize_vector(varchar summary, array(double) x_vector) | Standardizes a new sample vector specified by the | array(double) | |
mah_distance(varchar summary, array(double) x_vector) | Calculates the Mahalanobis distance for a new sample vector specified by the | double | |
standard_distance(varchar summary, double metric_value, int element_index) | Calculates the standardized distance for a metric specified by the | double | |
| Calculates the Mahalanobis distance for a new sample vector specified by the If you specify a value for the | array(double) |
summarize function
The summarize function identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.
varchar summarize(array(array(double)) data_samples)Or
varchar summarize(array(array(double)) data_samples, array(double) weights)Parameter | Description |
| A two-dimensional array. The array can be used as a two-dimensional table. Each column specifies a variable. Each row specifies the variable values of a sample. |
| Optional. The sample weight. The value is a one-dimensional array in the same length as one of the one-dimensional arrays that constitute the two-dimensional array specified by the |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ) select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_groupQuery and analysis results
entity_group
statistical_summary
A
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Parameters in the query and analysis results
Parameter
Description
sampleCountThe number of samples.
vectorSizeThe length of the vector.
meansThe average values of all vector components.
stdDevsThe standard deviation values of all vector components.
variancesThe variance values of all vector components.
minsThe minimum values of all vector components.
maxsThe maximum values of all vector components.
covarianceThe covariance matrix of all vector components.
correlationsThe correlation coefficient matrix of all vector components.
sumsThe sum of all vector components.
weightSumThe sum of all sample weights.
sumProductsThe intermediate results that are used to merge statistical patterns.
isSummarizedIndicates whether a normal multivariate pattern is returned. Valid values:
true
false
merge_summary function
The merge_summary function merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.
varchar merge_summary(varchar summary1, varchar summary2)Or
varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The weight of the multivariate pattern specified by the summary1 parameter. |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The weight of the multivariate pattern specified by the summary2 parameter. |
Example
Query statement
* | with data_table_01 as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features ), summaries_01 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_01 group by entity_group ), data_table_02 as ( select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries_02 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_02 group by entity_group ) select s1.entity_group, merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary from summaries_01 as s1 join summaries_02 as s2 on s1.entity_group = s2.entity_groupQuery and analysis results
The
statistical_summaryparameter indicates the multivariate pattern that is obtained after the merge operation.entity_group
statistical_summary
2
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Parameters in the query and analysis results
Parameter
Description
sampleCountThe number of samples.
vectorSizeThe length of the vector.
meansThe average values of all vector components.
stdDevsThe standard deviation values of all vector components.
variancesThe variance values of all vector components.
minsThe minimum values of all vector components.
maxsThe maximum values of all vector components.
covarianceThe covariance matrix of all vector components.
correlationsThe correlation coefficient matrix of all vector components.
sumsThe sum of all vector components.
weightSumThe sum of all sample weights.
sumProductsThe intermediate results that are used to merge statistical patterns.
isSummarizedIndicates whether a normal multivariate pattern is returned. Valid values:
true
false
normalize_vector function
The normalize_vector function normalizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.
array(double) normalize_vector(varchar summary, array(double) x_vector)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, normalize_vector(t2.statistical_summary, t1.features) as normalized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupQuery and analysis results
The
normalized_featuresparameter indicates the normalization results of the sample vector specified by thex_vectorparameter.entity_id
entity_group
normalized_features
2
A
[0.14285714285714286,0.14285714285714286,0.25,0.5]
4
A
[0.42857142857142857,0.42857142857142857,0.0,0.5]
3
A
[0.2857142857142857,0.2857142857142857,0.4,0.5]
...
...
...
standardize_vector function
The standardize_vector function standardizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.
array(double) standardize_vector(varchar summary, array(double) x_vector)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standardize_vector(t2.statistical_summary, t1.features) as standardized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupQuery and analysis results
The
standardized_featuresparameter indicates the standardization results of the sample vector specified by thex_vectorparameter.entity_id
entity_group
standardized_features
2
A
[-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]
4
A
[-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]
3
A
[-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]
...
...
...
mah_distance function
The mah_distance function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the x_vector parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.
double mah_distance(varchar summary, array(double) x_vector)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, mah_distance(t2.statistical_summary, t1.features) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupQuery and analysis results
The
std_distanceparameter indicates the Mahalanobis distance of the sample vector specified by thex_vectorparameter.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
standard_distance function
The standard_distance function calculates the standardized distance for a new sample vector specified by the metric_value parameter based on a pattern sepcified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. summarize function The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The element_index parameter specifies the index of the metric. The index starts from 0. The metric_value parameter specifies the value of the metric.
double standard_distance(varchar summary, double metric_value, int element_index)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The metric. |
| The index of the metric in the multivariate pattern specified by the |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standard_distance(t2.statistical_summary, 30, 1) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupQuery and analysis results
The
std_distanceparameter indicates the standardized distance of the metric specified by themetric_valueparameter.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
anomaly_level function
The anomaly_level function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher.
If you specify a value for the element_index parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.
double anomaly_level(varchar summary, array(double) x_vector)Or
double anomaly_level(varchar summary, array(double) x_vector, int element_index)Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
| Optional. The index. |
Example
Query statement
* | with dummy as ( select sequence(1, 1000) as seq_data, count(*) as record_count from log ), sample_data as ( select 'G1' as group_id, s.seq_num, -- Generate 1,000 two-dimensional random vectors that are distributed around the (100, 5000) range. The standard deviation values of two components are 20 and 500. inverse_normal_cdf(100, 20, random()) as x1, inverse_normal_cdf(5000, 500, rand()) as x2 from dummy, unnest(seq_data) as s(seq_num) ), data_summary as ( select group_id, summarize(array_agg(array[x1, x2])) as metric_summary from sample_data group by group_id ), new_data as ( select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2 union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2 union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2 union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2 union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2 union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2 union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2 ) select n.group_id, json_extract(s.metric_summary, '$.means') as metric_vector_mean, json_extract(s.metric_summary, '$.covariance') as metric_covariance, n.object_id, n.x1, n.x2, anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level from data_summary as s join new_data as n on s.group_id = n.group_id order by n.group_id, n.object_id limit 100000Query and analysis results
The
anomaly_levelparameter indicates the abnormal probability of the sample vector specified by thex_vectorparameter.group_id
object_id
anomaly_level
G1
1007
13.0
G1
1006
5.0
G1
1005
4.0
...
...
...