Simple Log Service menyediakan fitur deteksi anomali untuk mengidentifikasi anomali dalam sistem layanan serta penyebab akar dari anomali tersebut. Fitur ini secara otomatis mendeteksi perubahan metrik yang abnormal berdasarkan pola saat ini dan pembelajaran mesin. Anda dapat menggunakan fungsi identifikasi pola multivariat untuk menganalisis pola metrik. Topik ini menjelaskan fungsi identifikasi pola multivariat.
Fungsi
Fungsi | Sintaksis | Deskripsi | Tipe data nilai kembali |
| Mengidentifikasi dan mengembalikan pola multivariat berdasarkan sampel yang ditentukan dan bobot sampel. Bobot sampel bersifat opsional. Pola statistik mencakup berbagai statistik dan statistik gabungan, seperti rata-rata, deviasi standar, dan matriks kovariansi. | varchar | |
| Menggabungkan pola multivariat yang dikembalikan oleh fungsi summarize. Pola multivariat dapat berupa pola yang diperoleh dengan mempelajari dataset yang sama pada tahap yang berbeda atau pola yang diperoleh dengan mempelajari dua dataset independen. Untuk informasi lebih lanjut, lihat fungsi summarize. | varchar | |
normalize_vector(varchar summary, array(double) x_vector) | Menormalisasi vektor sampel baru yang ditentukan oleh parameter | array(double) | |
standardize_vector(varchar summary, array(double) x_vector) | Menstandarisasi vektor sampel baru yang ditentukan oleh parameter | array(double) | |
mah_distance(varchar summary, array(double) x_vector) | Menghitung jarak Mahalanobis untuk vektor sampel baru yang ditentukan oleh parameter | double | |
standard_distance(varchar summary, double metric_value, int element_index) | Menghitung jarak terstandarisasi untuk metrik yang ditentukan oleh parameter | double | |
| Menghitung jarak Mahalanobis untuk vektor sampel baru yang ditentukan oleh parameter Jika Anda menentukan nilai untuk parameter | array(double) |
fungsi summarize
Fungsi summarize mengidentifikasi dan mengembalikan pola multivariat berdasarkan sampel dan bobot sampel yang ditentukan. Bobot sampel bersifat opsional. Pola statistik mencakup berbagai statistik dan statistik gabungan seperti rata-rata, deviasi standar, dan matriks kovariansi.
varchar summarize(array(array(double)) data_samples)Atau
varchar summarize(array(array(double)) data_samples, array(double) weights)Parameter | Deskripsi |
| Array dua dimensi. Array ini dapat digunakan sebagai tabel dua dimensi. Setiap kolom menentukan variabel. Setiap baris menentukan nilai variabel dari sampel. |
| Opsional. Bobot sampel. Nilainya adalah array satu dimensi dengan panjang yang sama dengan salah satu array satu dimensi yang membentuk array dua dimensi yang ditentukan oleh parameter |
Contoh
Pernyataan Kueri
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ) select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_groupHasil Kueri dan Analisis
entity_group
statistical_summary
A
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Parameter dalam Hasil Kueri dan Analisis
Parameter
Deskripsi
sampleCountJumlah sampel.
vectorSizePanjang vektor.
meansNilai rata-rata semua komponen vektor.
stdDevsNilai deviasi standar semua komponen vektor.
variancesNilai varians semua komponen vektor.
minsNilai minimum semua komponen vektor.
maxsNilai maksimum semua komponen vektor.
covarianceMatriks kovariansi semua komponen vektor.
correlationsMatriks koefisien korelasi semua komponen vektor.
sumsJumlah semua komponen vektor.
weightSumJumlah semua bobot sampel.
sumProductsHasil antara yang digunakan untuk menggabungkan pola statistik.
isSummarizedMenunjukkan apakah pola multivariat normal dikembalikan. Nilai valid:
true
false
fungsi merge_summary
Fungsi merge_summary menggabungkan pola multivariat yang dikembalikan oleh fungsi summarize. Pola multivariat dapat berasal dari dataset yang sama pada tahap berbeda atau dari dua dataset independen. Untuk informasi lebih lanjut, lihat fungsi summarize.
varchar merge_summary(varchar summary1, varchar summary2)Atau
varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)Parameter | Deskripsi |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Bobot pola multivariat yang ditentukan oleh parameter summary1. |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Bobot pola multivariat yang ditentukan oleh parameter summary2. |
Contoh
Pernyataan Kueri
* | with data_table_01 as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features ), summaries_01 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_01 group by entity_group ), data_table_02 as ( select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries_02 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_02 group by entity_group ) select s1.entity_group, merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary from summaries_01 as s1 join summaries_02 as s2 on s1.entity_group = s2.entity_groupHasil Kueri dan Analisis
Parameter
statistical_summarymenunjukkan pola multivariat hasil operasi penggabungan.entity_group
statistical_summary
2
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }Parameter dalam Hasil Kueri dan Analisis
Parameter
Deskripsi
sampleCountJumlah sampel.
vectorSizePanjang vektor.
meansNilai rata-rata semua komponen vektor.
stdDevsNilai deviasi standar semua komponen vektor.
variancesNilai varians semua komponen vektor.
minsNilai minimum semua komponen vektor.
maxsNilai maksimum semua komponen vektor.
covarianceMatriks kovariansi semua komponen vektor.
correlationsMatriks koefisien korelasi semua komponen vektor.
sumsJumlah semua komponen vektor.
weightSumJumlah semua bobot sampel.
sumProductsHasil antara yang digunakan untuk menggabungkan pola statistik.
isSummarizedMenunjukkan apakah pola multivariat normal dikembalikan. Nilai valid:
true
false
fungsi normalize_vector
Fungsi normalize_vector menormalisasi vektor sampel baru yang ditentukan oleh parameter x_vector berdasarkan pola yang ditentukan oleh parameter summary. Setiap komponen vektor dipetakan ke interval [0, 1]. Parameter summary dapat disetel ke pola yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize.
array(double) normalize_vector(varchar summary, array(double) x_vector)Parameter | Deskripsi |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Data sampel baru. |
Contoh
Pernyataan Kueri
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, normalize_vector(t2.statistical_summary, t1.features) as normalized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupHasil Kueri dan Analisis
Parameter
normalized_featuresmenunjukkan hasil normalisasi vektor sampel yang ditentukan oleh parameterx_vector.entity_id
entity_group
normalized_features
2
A
[0.14285714285714286,0.14285714285714286,0.25,0.5]
4
A
[0.42857142857142857,0.42857142857142857,0.0,0.5]
3
A
[0.2857142857142857,0.2857142857142857,0.4,0.5]
...
...
...
fungsi standardize_vector
Fungsi standardize_vector menstandarisasi vektor sampel baru yang ditentukan oleh parameter x_vector berdasarkan pola yang ditentukan oleh parameter summary. Komponen-komponen vektor memiliki rata-rata 0 dan deviasi standar 1. Parameter summary dapat disetel ke pola yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize.
array(double) standardize_vector(varchar summary, array(double) x_vector)Parameter | Deskripsi |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Data sampel baru. |
Contoh
Pernyataan Kueri
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standardize_vector(t2.statistical_summary, t1.features) as standardized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupHasil Kueri dan Analisis
Parameter
standardized_featuresmenunjukkan hasil standarisasi vektor sampel yang ditentukan oleh parameterx_vector.entity_id
entity_group
standardized_features
2
A
[-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]
4
A
[-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]
3
A
[-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]
...
...
...
fungsi mah_distance
Fungsi mah_distance menghitung jarak Mahalanobis untuk vektor sampel baru yang ditentukan oleh parameter x_vector, berdasarkan pola dari parameter summary. Anda dapat menyetel parameter summary ke pola yang dihasilkan oleh fungsi summarize. Jarak Mahalanobis efektif digunakan untuk menangani perbedaan skala antara variabel yang berbeda. Metrik ini mengukur jarak antara vektor sampel yang telah distandarisasi, yang ditentukan oleh parameter x_vector, dan pusat vektor. Jika nilai jarak Mahalanobis adalah 1, maka jarak antara vektor sampel dan pusat vektor sama dengan jarak rata-rata antara semua vektor dan pusat vektor.
double mah_distance(varchar summary, array(double) x_vector)Parameter | Deskripsi |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Data sampel baru. |
Contoh
Pernyataan Kueri
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, mah_distance(t2.statistical_summary, t1.features) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupHasil Kueri dan Analisis
Parameter
std_distancemenunjukkan jarak Mahalanobis vektor sampel yang ditentukan oleh parameterx_vector.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
fungsi standard_distance
Fungsi standard_distance menghitung jarak terstandarisasi untuk vektor sampel baru berdasarkan pola yang ditentukan oleh parameter summary. Parameter ini dapat diatur ke pola yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat dokumentasi tentang fungsi summarize. Jarak terstandarisasi mirip dengan jarak Mahalanobis. Jarak Mahalanobis mengukur jarak terstandarisasi antara vektor yang terdiri dari beberapa metrik dan pusat vektor, sedangkan jarak terstandarisasi mengukur jarak antara nilai metrik vektor dan pusat vektor. Parameter element_index menentukan indeks metrik, dengan indeks dimulai dari 0. Parameter metric_value menentukan nilai metrik.
double standard_distance(varchar summary, double metric_value, int element_index)Parameter | Deskripsi |
| Sebuah pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat summarize function. |
| Metrik tersebut. |
| Indeks dari data deret waktu dalam pola multivariat yang ditentukan oleh parameter |
Contoh
Pernyataan Kueri
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standard_distance(t2.statistical_summary, 30, 1) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_groupHasil Kueri dan Analisis
Parameter
std_distancemenunjukkan jarak standar dari data deret waktu yang ditentukan oleh parametermetric_value.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
Fungsi anomaly_level
Fungsi anomaly_level menghitung jarak Mahalanobis untuk vektor sampel baru yang ditentukan oleh parameter x_vector berdasarkan pola yang ditentukan oleh parameter summary dan membulatkan setiap nilai jarak ke bawah untuk mendapatkan tingkat probabilitas abnormal yang berbeda. Parameter summary dapat disetel ke pola yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. Jika nilai pengembalian adalah 0,1, probabilitas bahwa vektor sampel baru abnormal adalah 10%, yang menunjukkan anomali tingkat pertama. Nilai 0,01 menunjukkan probabilitas 1% (tingkat kedua), nilai 0,001 menunjukkan probabilitas 0,1% (tingkat ketiga), dan nilai 0,0001 menunjukkan probabilitas 0,01% (tingkat keempat). Peningkatan tingkat anomali menunjukkan penurunan probabilitas abnormal dan peningkatan kecurigaan terhadap vektor tersebut. Dalam banyak kasus, Anda dapat mengonfigurasi ambang batas untuk menyaring anomali berdasarkan hasil deteksi anomali, misalnya hanya menyimpan anomali tingkat keempat atau lebih tinggi.
Jika Anda menentukan nilai untuk parameter element_index, fungsi akan mengembalikan probabilitas hanya untuk metrik dengan indeks tertentu dari vektor yang ditentukan. Jika tidak, fungsi akan mengembalikan probabilitas untuk setiap metrik dari vektor tersebut.
double anomaly_level(varchar summary, array(double) x_vector)Atau
double anomaly_level(varchar summary, array(double) x_vector, int element_index)Parameter | Deskripsi |
| Pola multivariat yang dikembalikan oleh fungsi summarize. Untuk informasi lebih lanjut, lihat fungsi summarize. |
| Data sampel baru. |
| Opsional. Indeks. |
Contoh
Pernyataan Kueri
* | with dummy as ( select sequence(1, 1000) as seq_data, count(*) as record_count from log ), sample_data as ( select 'G1' as group_id, s.seq_num, -- Hasilkan 1.000 vektor acak dua dimensi yang didistribusikan di sekitar rentang (100, 5000). Nilai deviasi standar dari dua komponen adalah 20 dan 500. inverse_normal_cdf(100, 20, random()) as x1, inverse_normal_cdf(5000, 500, rand()) as x2 from dummy, unnest(seq_data) as s(seq_num) ), data_summary as ( select group_id, summarize(array_agg(array[x1, x2])) as metric_summary from sample_data group by group_id ), new_data as ( select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2 union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2 union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2 union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2 union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2 union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2 union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2 ) select n.group_id, json_extract(s.metric_summary, '$.means') as metric_vector_mean, json_extract(s.metric_summary, '$.covariance') as metric_covariance, n.object_id, n.x1, n.x2, anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level from data_summary as s join new_data as n on s.group_id = n.group_id order by n.group_id, n.object_id limit 100000Hasil Kueri dan Analisis
Parameter
anomaly_levelmenunjukkan probabilitas abnormal dari vektor sampel yang ditentukan oleh parameterx_vector.group_id
object_id
anomaly_level
G1
1007
13.0
G1
1006
5.0
G1
1005
4.0
...
...
...