Log Service provides a machine learning feature that supports multiple algorithms and calling methods. You can call machine learning functions in analytic statements to analyze the characteristics of one or more fields over a period of time.

Log Service offers various time series analysis algorithms. You can call these algorithms to solve problems that are related to time series data. For example, you can predict time series, detect time series anomalies, decompose time series, and cluster multiple time series. The algorithms are also compatible with standard SQL functions, which simplifies their use and makes troubleshooting more efficient.

Features

  • Supports various smoothing operations on a single time series.
  • Supports prediction, anomaly detection, change point detection, inflection point detection, and multi-period estimation for a single time series.
  • Supports decomposition of a single time series.
  • Supports various clustering algorithms for multiple time series.
  • Supports multi-field pattern mining (based on sequences of numeric data or text).

Limits

When you use the machine learning feature of Log Service, note the following limits:

  • The specified time series data must be sampled at a uniform interval.
  • The specified time series data cannot contain more than one sample for the same point in time.
  • The processing capacity cannot exceed the maximum capacity. The following table describes the limits.

    | Item | Limit |
    | --- | --- |
    | Capacity of the time-series data processing | Data can be collected from a maximum of 150,000 consecutive points in time. If the data volume exceeds the processing capacity, you must aggregate the data or reduce the sampling amount. |
    | Capacity of the density-based clustering algorithm | A maximum of 5,000 time series curves can be clustered at a time. Each curve cannot contain more than 1,440 points in time. |
    | Capacity of the hierarchical clustering algorithm | A maximum of 2,000 time series curves can be clustered at a time. Each curve cannot contain more than 1,440 points in time. |
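
The limits above can be checked on the client side before a query is submitted. The following Python sketch is one way to do that. It is not part of Log Service: the helper names (`check_series`, `aggregate`, `check_cluster_input`) and the choice of averaging as the aggregation method are assumptions made for illustration. Only the numeric limits come from the table above.

```python
from collections import defaultdict

# Numeric limits taken from the limits table; everything else is hypothetical.
MAX_POINTS = 150_000
CLUSTER_LIMITS = {"density": 5_000, "hierarchical": 2_000}  # max curves per call
MAX_POINTS_PER_CURVE = 1_440


def check_series(points):
    """Return the limit violations found in [(timestamp, value), ...]."""
    problems = []
    stamps = sorted(t for t, _ in points)
    if len(set(stamps)) != len(stamps):
        problems.append("duplicate timestamps")
    # Collect the distinct gaps between neighbouring, non-identical timestamps.
    gaps = {b - a for a, b in zip(stamps, stamps[1:]) if b != a}
    if len(gaps) > 1:
        problems.append("uneven sampling interval")
    if len(stamps) > MAX_POINTS:
        problems.append("more than 150,000 points")
    return problems


def aggregate(points, interval):
    """Average all values that fall into the same bucket of `interval` seconds."""
    buckets = defaultdict(list)
    for t, v in points:
        buckets[t - t % interval].append(v)
    return [(t, sum(vs) / len(vs)) for t, vs in sorted(buckets.items())]


def check_cluster_input(curves, algorithm):
    """Return the clustering limit violations for a list of curves."""
    problems = []
    if len(curves) > CLUSTER_LIMITS[algorithm]:
        problems.append("too many curves")
    if any(len(curve) > MAX_POINTS_PER_CURVE for curve in curves):
        problems.append("curve longer than 1,440 points")
    return problems


raw = [(0, 1.0), (30, 3.0), (30, 5.0), (60, 2.0), (150, 4.0)]
print(check_series(raw))                 # violations in the raw data
print(check_series(aggregate(raw, 60)))  # violations after 60-second averaging
```

Aggregating into fixed buckets removes duplicate timestamps and reduces the number of points, but this sketch does not fill buckets that contain no data, so a series with long gaps can still fail the uniform-interval check.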

Machine learning functions

| Category | Function type | Function | Description |
| --- | --- | --- | --- |
| Time series | Smooth function | ts_smooth_simple | Uses the Holt-Winters algorithm to smooth time series data. |
| Time series | Smooth function | ts_smooth_fir | Uses a finite impulse response (FIR) filter to smooth time series data. |
| Time series | Smooth function | ts_smooth_iir | Uses an infinite impulse response (IIR) filter to smooth time series data. |
| Time series | Multi-period estimation function | ts_period_detect | Estimates the periods of time series data. |
| Time series | Change point detection function | ts_cp_detect | Detects the intervals in which data has different statistical features. The interval endpoints are change points. |
| Time series | Change point detection function | ts_breakout_detect | Detects the points in time at which data changes dramatically. |
| Time series | Maximum value detection function | ts_find_peaks | Detects the local maximum values of time series data in a specified window. |
| Time series | Prediction and anomaly detection function | ts_predicate_simple | Uses default parameters to model time series data, predict time series data, and detect anomalies. |
| Time series | Prediction and anomaly detection function | ts_predicate_ar | Uses an autoregressive (AR) model to model time series data, predict time series data, and detect anomalies. |
| Time series | Prediction and anomaly detection function | ts_predicate_arma | Uses an autoregressive moving average (ARMA) model to model time series data, predict time series data, and detect anomalies. |
| Time series | Prediction and anomaly detection function | ts_predicate_arima | Uses an autoregressive integrated moving average (ARIMA) model to model time series data, predict time series data, and detect anomalies. |
| Time series | Prediction and anomaly detection function | ts_regression_predict | Predicts the long-run trend of a single periodic time series. |
| Time series | Sequence decomposition function | ts_decompose | Uses the Seasonal and Trend decomposition using Loess (STL) algorithm to decompose time series data. |
| Time series | Time series clustering function | ts_density_cluster | Uses a density-based clustering method to cluster multiple time series. |
| Time series | Time series clustering function | ts_hierarchical_cluster | Uses a hierarchical clustering method to cluster multiple time series. |
| Time series | Time series clustering function | ts_similar_instance | Queries time series curves that are similar to a specified time series curve. |
| Time series | Kernel density estimation function | kernel_density_estimation | Uses a smooth peak function to fit the observed data points and thereby approximate the real probability distribution curve. |
| Time series | Time series padding function | series_padding | Pads data points that are missing in a time series. |
| Time series | Anomaly comparison function | anomaly_compare | Compares how much an observed object differs between two periods of time. |
| Pattern mining | Frequent pattern statistical function | pattern_stat | Mines representative combinations of attributes from the given multi-attribute field samples to obtain frequent patterns. |
| Pattern mining | Differential pattern statistical function | pattern_diff | Identifies the pattern that causes differences between two collections under specified conditions. |
| Pattern mining | Root cause analysis function | rca_kpi_search | Analyzes the subdimension attributes that cause anomalies in a monitored metric. |
| Pattern mining | Correlation analysis function | ts_association_analysis | Identifies the metrics that are correlated with a specified metric among multiple observed metrics in the system. |
| Pattern mining | Correlation analysis function | ts_similar | Identifies the metrics that are correlated with specified time series data among multiple observed metrics in the system. |
| Pattern mining | Request URL classification function | url_classify | Classifies a request URL and attaches a tag to the URL. The function also provides the regular expression that defines the pattern of the tag. |
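
To make the idea behind FIR smoothing concrete, the following Python sketch applies a causal FIR filter to a short series. It only illustrates the general technique and is not the Log Service implementation of ts_smooth_fir. The function name `fir_smooth`, the equal weights, and the way the first few points are handled are all assumptions chosen for this example.

```python
def fir_smooth(values, weights):
    """Apply a causal FIR filter: each output is a weighted average of the
    current input and the preceding len(weights) - 1 inputs.

    Near the start of the series, where fewer inputs are available, only the
    weights that have a matching input are used (an assumed edge policy).
    """
    smoothed = []
    for n in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:
                acc += w * values[n - k]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


# A spike of 10 in an otherwise flat series is spread over three outputs.
series = [1, 1, 10, 1, 1]
print(fir_smooth(series, [1, 1, 1]))  # [1.0, 1.0, 4.0, 4.0, 4.0]
```

Equal weights give a simple moving average. Unequal weights let recent points count more or less, which trades responsiveness against smoothness.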