Machine Learning Platform for AI (PAI) provides comprehensive features such as feature encoding, model training, and model evaluation. You can create a model by extracting and labeling the features of system anomalies. Then, you can use the model to monitor system metrics and predict system anomalies.

Background information

A user system may encounter anomalies. For example, the CPU utilization of the operations and maintenance system suddenly increases, or a system is flooded with illegal information. If you monitor the metrics of the user system in real time, raise real-time alerts for abnormal metrics, and take preventive measures, you can significantly reduce the exposure of the system to risks.

Solution

PAI provides a set of classification algorithms that are based on metric monitoring. You can use these algorithms to create binary classification models to monitor metrics and further detect system anomalies. Then, you can deploy the models to online systems to implement nearline risk control. The following conditions must be met before you can use these classification algorithms of PAI to create models:
  • You are familiar with the classic algorithms in machine learning, especially feature engineering and binary classification algorithms.
  • You can dedicate one to two days to the development.
  • You have more than 1,000 data records that are labeled with anomaly or normal.

Dataset

The experiment described in this topic is based on system-level monitoring logs, including 22,544 data records. Among these data records, 9,711 record system anomalies. The following figure shows the sample data that is used in the experiment.
Sample data of the experiment
Parameter      Description
protocol_type  The protocol that is used for the network connection. Example: tcp, icmp, or udp.
service        The service protocol. Example: http, finger, pop, private, or smtp.
flage          The connection status. Example: SF, RSTO, or REJ.
a2~a38         Different system metrics.
class          The label field. Valid values: normal and anomaly. The value normal marks a normal system event, and the value anomaly marks a system anomaly.
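The record layout in the preceding table can be sketched in Python as follows. This is only an illustration: the MonitoringRecord class, its field names, and the placeholder row are assumptions made for this sketch and are not part of PAI. The record counts are the ones quoted in this topic.

```python
from dataclasses import dataclass
from typing import List

VALID_LABELS = {"normal", "anomaly"}
METRIC_COUNT = 37  # a2 through a38, inclusive


@dataclass
class MonitoringRecord:
    """Hypothetical in-memory form of one row of the monitoring log."""
    protocol_type: str    # for example "tcp"
    service: str          # for example "http"
    flage: str            # connection status, for example "SF"
    metrics: List[float]  # a2..a38, in order
    label: str            # the class column ("class" is a reserved word in Python)

    def __post_init__(self) -> None:
        if self.label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")
        if len(self.metrics) != METRIC_COUNT:
            raise ValueError(f"expected {METRIC_COUNT} metrics, got {len(self.metrics)}")


# Record counts quoted in this topic.
TOTAL_RECORDS = 22544
ANOMALY_RECORDS = 9711
normal_records = TOTAL_RECORDS - ANOMALY_RECORDS
anomaly_share = ANOMALY_RECORDS / TOTAL_RECORDS

# Made-up example row; the metric values are placeholders, not real data.
example = MonitoringRecord("tcp", "http", "SF", [0.0] * METRIC_COUNT, "normal")
```

With these counts, roughly 43% of the records are labeled anomaly, so neither class is rare in this dataset.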

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Abnormal behavior risk control.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.
      Parameter    Description
      Name         The name of the experiment. Default value: Abnormal behavior risk control.
      Project      The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description  The description of the experiment. Default value: Identify abnormal behaviors in the system through algorithms.
      Save To      The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Abnormal behavior risk control_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created and Abnormal behavior risk control_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically creates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Experiment on predicting system anomalies by monitoring system metrics
      Area No.  Description
      1         The pai_online_project.operation_detection-1 component imports data from the source dataset.
      2         The components in this area perform feature engineering.
                1. The One Hot Encoding-1 component converts string-type feature data to numeric-type data.
                2. The Normalization-1 component rescales every feature to the range of 0 to 1 so that differences in scale between metrics do not affect the model. The following figure shows the normalized data.
                   Normalized data
                3. The SQL Script-1 component runs an SQL statement that converts the class label to a numeric value: anomaly becomes 1 and normal becomes 0. In this example, the SQL Script-1 component executes the following SQL statement:
                  select (case class when 'anomaly' then 1 else 0 end) as class from ${t1};
      3         The components in this area use the logistic regression for binary classification algorithm to train a monitoring model based on data about normal system events and system anomalies.
      4         The Binary Classification Evaluation-1 component evaluates the quality of the model by using metrics such as the area under the curve (AUC), the Kolmogorov-Smirnov (KS) value, and the F1 score.
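To make the roles of areas 2 and 3 more concrete, the following Python sketch reproduces the same kinds of transformations on four made-up rows: one-hot encoding, min-max normalization, the anomaly-to-1 label mapping from the SQL statement, and a small logistic regression trained with gradient descent. It is not the implementation behind the PAI components. All function names, the toy values, the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
import math

# Made-up toy rows. Column names follow the dataset table; values are illustrative.
rows = [
    {"protocol_type": "tcp", "a2": 0.0, "a3": 491.0, "class": "normal"},
    {"protocol_type": "udp", "a2": 0.0, "a3": 146.0, "class": "normal"},
    {"protocol_type": "tcp", "a2": 2.0, "a3": 0.0, "class": "anomaly"},
    {"protocol_type": "icmp", "a2": 5.0, "a3": 1032.0, "class": "anomaly"},
]


def one_hot(rows, column):
    """Replace a string column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


def min_max_normalize(rows, columns):
    """Rescale each listed column to [0, 1]; a constant column becomes 0."""
    scaled = [dict(row) for row in rows]
    for column in columns:
        values = [row[column] for row in rows]
        low, span = min(values), max(values) - min(values)
        for row in scaled:
            row[column] = (row[column] - low) / span if span else 0.0
    return scaled


def encode_labels(rows):
    """Same mapping as the SQL CASE expression: anomaly -> 1, anything else -> 0."""
    return [1 if row["class"] == "anomaly" else 0 for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, target in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) - target
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


def predict_proba(weights, bias, features):
    """Estimated probability that a row is an anomaly."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


labels = encode_labels(rows)
table = min_max_normalize(one_hot(rows, "protocol_type"), ["a2", "a3"])
feature_names = sorted(name for name in table[0] if name != "class")
X = [[row[name] for name in feature_names] for row in table]

weights, bias = train_logistic_regression(X, labels)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
```

In this sketch, the probability returned by predict_proba plays the role of the score that the evaluation component in area 4 would consume. The 0.5 cutoff is only a common default, not a value taken from the template.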
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click Binary Classification Evaluation-1 on the canvas and select View Evaluation Report.
    3. In the Evaluation Report dialog box, click the Indexes tab to view the indexes that are used to evaluate the model.
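      Model evaluation report
      In the evaluation report, the value of AUC is greater than 0.9. An AUC close to 1 indicates that the model assigns higher scores to system anomalies than to normal events in most cases, so the model distinguishes the two classes well. AUC measures ranking quality and is not the same as prediction accuracy.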
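The three metrics named in the evaluation report can be computed by hand on a small example, which helps when you interpret the report. The following Python sketch uses made-up labels and scores. The function names and the 0.5 threshold are assumptions for illustration and do not describe how the Binary Classification Evaluation component is implemented.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for descending thresholds.

    Assumes that both classes occur at least once in labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every record tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def ks(points):
    """KS value: the largest gap between the true and false positive rates."""
    return max(tpr - fpr for fpr, tpr in points)


def confusion(labels, scores, threshold=0.5):
    """Return (tp, fp, fn, tn) when scores at or above the threshold count as anomalies."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 1 = anomaly, 0 = normal; scores are predicted anomaly probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]

points = roc_points(labels, scores)
tp, fp, fn, tn = confusion(labels, scores)

auc_value = auc(points)              # 0.9375
ks_value = ks(points)                # 0.75
f1_value = f1_score(tp, fp, fn)      # about 0.889
accuracy = (tp + tn) / len(labels)   # 0.875
```

In this toy example the AUC is 0.9375 while the accuracy at the 0.5 threshold is 0.875, which shows that the two numbers answer different questions: AUC does not depend on a threshold, whereas accuracy and the F1 score do.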