The Anomaly Detection component detects anomalous values in data that has continuous or enumerated features. It helps you locate exceptions in the data.

Background information

Anomalous features in data can be detected by using the Box Plot or Attribute Value Frequency (AVF) method.
  • Box Plot is used to detect data with continuous features. The detection is performed based on the maximum and minimum values of the box plot.
  • AVF is used to detect data with enumerated features. The detection is performed based on the frequency and threshold of enumerated features.
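
The two methods above can be sketched in a few lines of Python. This is only an illustration, not the component's implementation: it assumes the box plot bounds are Tukey's fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR) and that AVF flags enumerated values whose relative frequency falls below a threshold. The function names, the `k` and `min_frequency` parameters, and the sample data are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return values outside the box plot bounds.

    Assumption: the bounds are Tukey's fences, [Q1 - k*IQR, Q3 + k*IQR].
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def avf_outliers(values, min_frequency=0.1):
    """Return enumerated values whose relative frequency is below min_frequency.

    Assumption: a simple per-value frequency threshold stands in for AVF.
    """
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < min_frequency)


# Hypothetical sample data: one continuous column and one enumerated column.
rates = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.9]
jobs = ["admin"] * 6 + ["technician"] * 5 + ["astronaut"]

print(box_plot_outliers(rates))  # the value far above the upper bound
print(avf_outliers(jobs))        # the value that occurs only once in twelve rows
```

In this sketch a continuous value is anomalous because of where it sits relative to the rest of the distribution, whereas an enumerated value is anomalous because it is rare.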

Configure the component

  • Machine Learning Platform for AI (PAI) console
    Tab Parameter Description
    Fields Setting Feature Columns The fields that you want to analyze.
    Anomaly Detection Method The method used to detect anomalous data. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumerated features.
  • PAI command
    PAI -name fe_detect_runner -project algo_public
         -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
         -Dlifecycle="28"
         -DdetectStrategy="boxPlot"
         -DmodelTable="pai_temp_2458_23565_2"
         -DinputTable="pai_bank_data"
         -DoutputTable="pai_temp_2458_23565_1";
    Parameter | Description | Required
    inputTable | The name of the input table. | Yes
    inputTablePartitions | The partitions selected from the input table for training. By default, all partitions in the input table are selected. Supported formats: a single partition (partition_name=value); multiple partitions, separated by commas (name1=value1,name2=value2); multi-level partitions (name1=value1/name2=value2). | No
    selectedCols | The input features. The data types of the features are not limited. | Yes
    detectStrategy | The detection method. Box Plot and AVF are supported. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumerated features. | Yes
    outputTable | The output table that contains data with anomalous features. | Yes
    modelTable | The anomaly detection model. | Yes
    lifecycle | The lifecycle of the output table. Default value: 7. | No
    coreNum | The number of cores. This parameter is used with memSizePerCore. The value must be a positive integer. Valid values: [1,9999]. | No
    memSizePerCore | The memory size of each core. Unit: MB. Valid values: [2048,64 × 1024]. | No
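
When the command is generated from a script, a small helper can assemble it and check the documented value ranges before submission. The following Python sketch is one possible approach. `build_pai_command` and its argument names are hypothetical; the parameter names, the `boxPlot` strategy value, and the ranges come from the command and parameter descriptions above. The helper only builds the command text and does not submit it.

```python
def build_pai_command(input_table, output_table, model_table, selected_cols,
                      detect_strategy="boxPlot", partitions=None,
                      lifecycle=None, core_num=None, mem_size_per_core=None):
    """Build the text of a fe_detect_runner PAI command (hypothetical helper)."""
    # Documented range for coreNum: [1, 9999].
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    # Documented range for memSizePerCore in MB: [2048, 64 * 1024].
    if mem_size_per_core is not None and not 2048 <= mem_size_per_core <= 64 * 1024:
        raise ValueError("memSizePerCore must be in [2048, 65536]")

    # Required parameters. detect_strategy is passed through unchanged;
    # only "boxPlot" appears in the sample command.
    params = {
        "selectedCols": ",".join(selected_cols),
        "detectStrategy": detect_strategy,
        "modelTable": model_table,
        "inputTable": input_table,
        "outputTable": output_table,
    }
    # Optional parameters are added only when a value is supplied.
    optional = {
        "inputTablePartitions": partitions,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    params.update({k: v for k, v in optional.items() if v is not None})

    lines = ["PAI -name fe_detect_runner -project algo_public"]
    lines += [f'     -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


command = build_pai_command(
    input_table="pai_bank_data",
    output_table="pai_temp_2458_23565_1",
    model_table="pai_temp_2458_23565_2",
    selected_cols=["emp_var_rate", "cons_price_rate"],
    lifecycle=28,
)
print(command)
```

Validating coreNum and memSizePerCore locally surfaces an out-of-range value before the job is submitted.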