Simple Log Service: Time series clustering functions

Last Updated: Jun 07, 2024

This topic describes the time series clustering functions. You can use these functions to cluster multiple time series by the shapes of their curves, quickly find the center of each cluster, and identify curves whose shapes differ from those of the other curves in the cluster.

Function list

| Function | Description |
| --- | --- |
| ts_density_cluster | Uses a density-based clustering method to cluster multiple time series. |
| ts_hierarchical_cluster | Uses a hierarchical clustering method to cluster multiple time series. |
| ts_similar_instance | Queries the curves that are similar to a specified curve. |

ts_density_cluster

Function format:

select ts_density_cluster(x, y, z) 

The following table lists the parameters of the function.

| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence: points in time on the horizontal axis, sorted in ascending order. | Each point in time is a Unix timestamp, in seconds. |
| y | The sequence of numeric values, one for each point in time. | N/A |
| z | The name of the curve that each data point belongs to. | A string. Example: machine01.cpu_usr. |

Example

  • The query statement is as follows:

    * and (h: "machine_01" OR h: "machine_02" OR h: "machine_03") | select ts_density_cluster(stamp, metric_value, metric_name) from ( select ("__time__" - ("__time__" % 600)) as stamp, avg(v) as metric_value, h as metric_name from log GROUP BY stamp, metric_name ORDER BY metric_name, stamp )
  • Output result: (screenshot not included)
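The inner subquery in this example aligns each raw `__time__` value to the start of its 600-second (10-minute) bucket with `__time__ - __time__ % 600`, averages `v` per bucket and host, and sorts by curve name and then by time, so that x, y, and z arrive as aligned sequences. The following Python sketch reproduces that preprocessing locally for illustration only. It assumes the raw points are available as (timestamp, name, value) tuples; the helper `bucket_and_average` and the sample data are hypothetical and are not part of Simple Log Service.

```python
from collections import defaultdict


def bucket_and_average(points, window=600):
    """Mimic the subquery: stamp = t - t % window, then avg(v) per (name, stamp).

    `points` is an iterable of (unix_seconds, metric_name, value) tuples.
    Returns three parallel lists (x, y, z) ordered by metric_name, then stamp,
    like the GROUP BY / ORDER BY clauses in the example query.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, name, value in points:
        stamp = ts - ts % window  # start of the 600-second bucket
        sums[(name, stamp)] += value
        counts[(name, stamp)] += 1

    x, y, z = [], [], []
    for name, stamp in sorted(sums):  # ORDER BY metric_name, stamp
        x.append(stamp)
        y.append(sums[(name, stamp)] / counts[(name, stamp)])
        z.append(name)
    return x, y, z


# Hypothetical raw points: (timestamp in seconds, host, value).
raw = [
    (1717718405, "machine_01", 10.0),
    (1717718590, "machine_01", 20.0),
    (1717719010, "machine_01", 30.0),
    (1717718450, "machine_02", 5.0),
]
x, y, z = bucket_and_average(raw)
print(x)  # [1717718400, 1717719000, 1717718400]
print(y)  # [15.0, 30.0, 5.0]
print(z)  # ['machine_01', 'machine_01', 'machine_02']
```

The first two machine_01 points fall into the same bucket and are averaged, which is what `avg(v)` with `GROUP BY stamp, metric_name` does in the query.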

The following table lists the display items.

| Display item | Description |
| --- | --- |
| cluster_id | The ID of the cluster. A value of -1 indicates curves that are not assigned to any cluster center. |
| rate | The proportion of instances that belong to the cluster. |
| time_series | The timestamp sequence of the cluster center. |
| data_series | The data sequence of the cluster center. |
| instance_names | The set of instances in the cluster. |
| sim_instance | The name of an instance in the cluster. |
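This topic does not state which density-based algorithm ts_density_cluster uses. As an illustration only, the following sketch applies DBSCAN, a common density-based method, to z-normalized curves and then builds rows that resemble the output fields (cluster_id, rate, data_series, instance_names). The parameters `eps` and `min_samples`, all helper names, and the sample curves are assumptions; ts_density_cluster itself takes only x, y, and z.

```python
import math


def znorm(series):
    """Scale a curve to zero mean and unit variance so that only its shape matters."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / std if std > 0 else 0.0 for v in series]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dbscan(curves, eps, min_samples):
    """Minimal DBSCAN over named curves. Returns {name: label}; -1 marks noise."""
    names = list(curves)
    labels = {}
    cluster = 0

    def neighbors(n):
        # Includes n itself, as in standard DBSCAN.
        return [m for m in names if euclidean(curves[n], curves[m]) <= eps]

    for name in names:
        if name in labels:
            continue
        seeds = neighbors(name)
        if len(seeds) < min_samples:
            labels[name] = -1  # noise for now; may become a border point later
            continue
        labels[name] = cluster  # name is a core point: start a new cluster
        queue = [m for m in seeds if m != name]
        while queue:
            m = queue.pop()
            if labels.get(m) == -1:
                labels[m] = cluster  # border point reached from a core point
            if m in labels:
                continue
            labels[m] = cluster
            m_seeds = neighbors(m)
            if len(m_seeds) >= min_samples:
                queue.extend(m_seeds)  # m is also a core point: keep expanding
        cluster += 1
    return labels


def summarize(curves, labels):
    """Build one row per cluster: cluster_id, rate, mean curve, member names."""
    rows = []
    total = len(labels)
    for cid in sorted(set(labels.values())):
        members = sorted(n for n, lab in labels.items() if lab == cid)
        length = len(curves[members[0]])
        center = [sum(curves[n][i] for n in members) / len(members) for i in range(length)]
        rows.append({"cluster_id": cid, "rate": len(members) / total,
                     "data_series": center, "instance_names": members})
    return rows


raw = {
    "machine_01": [1, 2, 3, 4],
    "machine_02": [2, 4, 6, 8],  # same shape as machine_01, twice the scale
    "machine_03": [4, 3, 2, 1],  # reversed shape
}
curves = {name: znorm(s) for name, s in raw.items()}
labels = dbscan(curves, eps=0.5, min_samples=2)
for row in summarize(curves, labels):
    print(row["cluster_id"], round(row["rate"], 3), row["instance_names"])
# -1 0.333 ['machine_03']
# 0 0.667 ['machine_01', 'machine_02']
```

Because of z-normalization, machine_01 and machine_02 have identical shapes and form cluster 0, while machine_03 has too few neighbors and is labeled -1, which mirrors the meaning of cluster_id = -1 described above.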

ts_hierarchical_cluster

Function format:

select ts_hierarchical_cluster(x, y, z) 

The following table lists the parameters of the function.

| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence: points in time on the horizontal axis, sorted in ascending order. | Each point in time is a Unix timestamp, in seconds. |
| y | The sequence of numeric values, one for each point in time. | N/A |
| z | The name of the curve that each data point belongs to. | A string. Example: machine01.cpu_usr. |

Example

  • The query statement is as follows:

    * and (h: "machine_01" OR h: "machine_02" OR h: "machine_03") | select ts_hierarchical_cluster(stamp, metric_value, metric_name) from ( select ("__time__" - ("__time__" % 600)) as stamp, avg(v) as metric_value, h as metric_name from log GROUP BY stamp, metric_name ORDER BY metric_name, stamp )
  • Output result: (screenshot not included)

The following table lists the display items.

| Display item | Description |
| --- | --- |
| cluster_id | The ID of the cluster. A value of -1 indicates curves that are not assigned to any cluster center. |
| rate | The proportion of instances that belong to the cluster. |
| time_series | The timestamp sequence of the cluster center. |
| data_series | The data sequence of the cluster center. |
| instance_names | The set of instances in the cluster. |
| sim_instance | The name of an instance in the cluster. |
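This topic likewise does not specify the hierarchical method behind ts_hierarchical_cluster. The following is a minimal sketch of one common variant, bottom-up (agglomerative) clustering with average linkage that stops merging once the closest pair of clusters is farther apart than a threshold. The threshold, the linkage choice, the z-normalization, and all names are assumptions made for illustration.

```python
import math


def znorm(series):
    """Scale a curve to zero mean and unit variance so that only its shape matters."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / std if std > 0 else 0.0 for v in series]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def agglomerative(curves, threshold):
    """Average-linkage agglomerative clustering over named curves.

    Starts with one cluster per curve and repeatedly merges the closest pair
    until that pair's distance exceeds `threshold`. Returns lists of names.
    """
    clusters = [[name] for name in curves]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        return sum(euclidean(curves[p], curves[q]) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        dist, i, j = min(pairs)
        if dist > threshold:
            break  # the closest clusters are too far apart: stop merging
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


raw = {
    "machine_01": [1, 2, 3, 4],
    "machine_02": [2, 4, 6, 8],  # rising, like machine_01
    "machine_03": [4, 3, 2, 1],
    "machine_04": [8, 6, 4, 2],  # falling, like machine_03
}
curves = {name: znorm(s) for name, s in raw.items()}
print(agglomerative(curves, threshold=1.0))
# [['machine_01', 'machine_02'], ['machine_03', 'machine_04']]
```

Unlike the density-based sketch, this approach always assigns every curve to some cluster; how tightly curves are grouped depends on where merging stops, here the assumed `threshold`.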

ts_similar_instance

Function format:

select ts_similar_instance(x, y, z, instance_name, topK, metricType) 

The following table lists the parameters of the function.

| Parameter | Description | Value |
| --- | --- | --- |
| x | The time sequence: points in time on the horizontal axis, sorted in ascending order. | Each point in time is a Unix timestamp, in seconds. |
| y | The sequence of numeric values, one for each point in time. | N/A |
| z | The name of the curve that each data point belongs to. | A string. Example: machine01.cpu_usr. |
| instance_name | The name of the curve to query, taken from the curve names in z. Note: The curve to be queried must already exist. | A string. Example: machine01.cpu_usr. |
| topK | The maximum number of curves similar to the specified curve to return. At most K curves are returned. | N/A |
| metricType | The metric used to measure the similarity between time series curves. Valid values: 'shape', 'manhattan', and 'euclidean'. | N/A |
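The metricType values name distance measures between curves. The following sketch shows how such measures can rank curves for a topK query. Manhattan and Euclidean distances are computed on raw values. For 'shape', this sketch assumes Euclidean distance on z-normalized curves, which ignores offset and scale; the shape metric that ts_similar_instance actually uses is not documented here. The function `similar_instances` and the sample curves are hypothetical.

```python
import math


def znorm(series):
    """Scale a curve to zero mean and unit variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / std if std > 0 else 0.0 for v in series]


def distance(a, b, metric_type):
    if metric_type == "manhattan":
        return sum(abs(p - q) for p, q in zip(a, b))
    if metric_type == "euclidean":
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    if metric_type == "shape":
        # ASSUMPTION: shape = Euclidean distance between z-normalized curves.
        return distance(znorm(a), znorm(b), "euclidean")
    raise ValueError(f"unknown metric_type: {metric_type}")


def similar_instances(curves, instance_name, top_k, metric_type="shape"):
    """Return up to top_k curve names closest to instance_name, nearest first."""
    if instance_name not in curves:
        # Mirrors the note that the queried curve must already exist.
        raise KeyError(f"{instance_name} is not among the curves in z")
    target = curves[instance_name]
    others = [n for n in curves if n != instance_name]
    others.sort(key=lambda n: (distance(target, curves[n], metric_type), n))
    return others[:top_k]


curves = {
    "host_a": [1, 2, 3, 4],
    "host_b": [11, 12, 13, 14],  # same shape as host_a, shifted up by 10
    "host_c": [1, 2, 3, 10],     # close values, different shape at the end
    "host_d": [4, 3, 2, 1],      # reversed shape
}
print(similar_instances(curves, "host_a", 2, "shape"))      # ['host_b', 'host_c']
print(similar_instances(curves, "host_a", 2, "euclidean"))  # ['host_d', 'host_c']
print(similar_instances(curves, "host_a", 2, "manhattan"))  # ['host_c', 'host_d']
```

With 'shape', host_b ranks first despite its constant offset, while with 'euclidean' the reversed curve host_d ranks first because its raw values happen to lie close to those of host_a. The choice of metricType therefore decides whether "similar" means similar shape or similar values.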

The query statement is as follows:

* and m: NET and m: Tcp and (h: "nu4e01524.nu8" OR h: "nu2i10267.nu8" OR h: "nu4q10466.nu8") | select ts_similar_instance(stamp, metric_value, metric_name, 'nu4e01524.nu8') from ( select ("__time__" - ("__time__" % 600)) as stamp, sum(v) as metric_value, h as metric_name from log GROUP BY stamp, metric_name ORDER BY metric_name, stamp )

The following table lists the display items.

| Display item | Description |
| --- | --- |
| instance_name | The list of curves (metrics) that are similar to the specified curve. |
| time_series | The timestamp sequence of the cluster center. |
| data_series | The data sequence of the cluster center. |