The Anomaly Detection component detects anomalies in data that has continuous or enumeration features.

Background information

Anomalous features in data can be detected by using the Box Plot or Attribute Value Frequency (AVF) method.
  • Box Plot detects anomalies in continuous features. Detection is based on the maximum and minimum values of the box plot.
  • AVF detects anomalies in enumeration features. Detection is based on the frequency of each enumeration value and a threshold.
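The two methods can be illustrated with a minimal Python sketch. It is not the component's implementation. It assumes the conventional box plot rule, in which the maximum and minimum are the whiskers at 1.5 times the interquartile range, and it assumes that AVF flags values whose relative frequency falls below a threshold. The function names, the factor k, and the parameter min_frequency are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return the indices of values outside the box plot whiskers.

    Assumption: whiskers sit at Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def avf_outliers(values, min_frequency=0.05):
    """Return the indices of values whose relative frequency is below the threshold.

    Assumption: the threshold is a minimum relative frequency in [0, 1].
    """
    counts = Counter(values)
    total = len(values)
    return [i for i, v in enumerate(values) if counts[v] / total < min_frequency]


if __name__ == "__main__":
    continuous = [10, 11, 12, 11, 10, 12, 11, 100]
    print(box_plot_outliers(continuous))

    enumeration = ["married"] * 19 + ["unknown"]
    print(avf_outliers(enumeration, min_frequency=0.1))
```

In this sketch the continuous value 100 lies above the upper whisker, and the enumeration value that appears once in 20 rows falls below the 10% frequency threshold.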

Configure the component

You can use one of the following methods to configure the Anomaly Detection component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Anomaly Detection component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description
Fields Setting | Feature Columns | The fields to be analyzed.
 | Anomaly Detection Method | The method used to detect anomalous data. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumeration features.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name fe_detect_runner -project algo_public
     -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
     -Dlifecycle="28"
     -DdetectStrategy="boxPlot"
     -DmodelTable="pai_temp_2458_23565_2"
     -DinputTable="pai_bank_data"
     -DoutputTable="pai_temp_2458_23565_1";
Parameter | Description | Required
inputTable | The name of the input table. | Yes
inputTablePartitions | The partitions in the input table. By default, all partitions are selected. Single partition: partition_name=value. Multiple partitions, separated by commas (,): name1=value1,name2=value2. Multi-level partitions: name1=value1/name2=value2. | No
selectedCols | The input features. Features of any data type are supported. | Yes
detectStrategy | The detection method. Box Plot and AVF are supported. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumeration features. | Yes
outputTable | The output table that contains data with anomalous features. | Yes
modelTable | The anomaly detection model. | Yes
lifecycle | The lifecycle of the output table. Default value: 7. | No
coreNum | The number of cores. This parameter must be used with the memSizePerCore parameter. The value must be a positive integer. Valid values: 1 to 9999. | No
memSizePerCore | The memory size of each core. Unit: MB. Valid values: [2048, 64 × 1024]. | No
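
The parameter rules above can be checked before a command is submitted. The following Python sketch assembles the command text and validates the documented ranges. It only builds a string and does not call PAI. The helper names format_partitions and build_command are hypothetical; the command name, project, and parameter names are taken from the sample command and the table above.

```python
def format_partitions(partitions):
    """Join partition specs with commas and partition levels with slashes.

    `partitions` is a list of specs; each spec is a list of (name, value)
    pairs, one pair per partition level.
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in spec) for spec in partitions
    )


def build_command(input_table, output_table, model_table, selected_cols,
                  detect_strategy, partitions=None, lifecycle=None,
                  core_num=None, mem_size_per_core=None):
    """Assemble the PAI command text (hypothetical helper, submits nothing)."""
    if core_num is not None:
        if mem_size_per_core is None:
            raise ValueError("coreNum must be used with memSizePerCore")
        if not 1 <= core_num <= 9999:
            raise ValueError("coreNum must be an integer from 1 to 9999")
    if mem_size_per_core is not None and not 2048 <= mem_size_per_core <= 64 * 1024:
        raise ValueError("memSizePerCore must be from 2048 to 65536 (MB)")

    # Required parameters, as listed in the parameter table.
    params = {
        "inputTable": input_table,
        "selectedCols": ",".join(selected_cols),
        "detectStrategy": detect_strategy,
        "modelTable": model_table,
        "outputTable": output_table,
    }
    # Optional parameters are emitted only when a value is supplied.
    optional = {
        "inputTablePartitions": format_partitions(partitions) if partitions else None,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    params.update({k: v for k, v in optional.items() if v is not None})

    lines = ["PAI -name fe_detect_runner -project algo_public"]
    lines += [f'     -D{key}="{value}"' for key, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_command(
        input_table="pai_bank_data",
        output_table="pai_temp_2458_23565_1",
        model_table="pai_temp_2458_23565_2",
        selected_cols=["emp_var_rate", "euribor3m"],
        detect_strategy="boxPlot",  # value taken from the sample command
        lifecycle=28,
    ))
```

The resulting text can be pasted into the SQL Script component. The partition values passed to format_partitions are placeholders that you replace with the partitions of your own table.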