The Anomaly Detection component detects anomalous values in data with continuous or enumerated features. It helps you locate exceptions in the data.
Background information
Anomalous features in data can be detected by using the Box Plot or Attribute Value Frequency (AVF) method.
- Box Plot is used to detect data with continuous features. The detection is performed based on the maximum and minimum values of the box plot.
- AVF is used to detect data with enumerated features. The detection is performed based on the frequency and threshold of enumerated features.
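The two methods above can be illustrated with a minimal Python sketch. It is not the component's implementation. It assumes that the box plot minimum and maximum are the conventional whisker bounds Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and that AVF flags an enumerated value when its relative frequency falls below a threshold. The function names, the factor `k`, and the default `threshold` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return the values that fall outside the assumed box plot bounds.

    Assumption: the bounds are Q1 - k*IQR and Q3 + k*IQR (Tukey's rule).
    The component may define its minimum and maximum differently.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def avf_outliers(values, threshold=0.1):
    """Return the enumerated values whose relative frequency is below threshold.

    Assumption: a simplified per-column frequency check, not the component's
    exact scoring.
    """
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < threshold)
```

For example, `box_plot_outliers([1, 2, 3, 4, 5, 100])` flags only `100`, and `avf_outliers` flags a category that makes up less than 10% of the rows when the default threshold is used.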
Configure the component
- Machine Learning Platform for AI (PAI) console
  | Tab | Parameter | Description |
  | --- | --- | --- |
  | Fields Setting | Feature Columns | The fields that you want to analyze. |
  | Fields Setting | Anomaly Detection Method | The method used to detect anomalous data. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumerated features. |
- PAI command
  PAI -name fe_detect_runner -project algo_public -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed" \
      -Dlifecycle="28" -DdetectStrategy="boxPlot" -DmodelTable="pai_temp_2458_23565_2" -DinputTable="pai_bank_data" -DoutputTable="pai_temp_2458_23565_1";
  | Parameter | Description | Required |
  | --- | --- | --- |
  | inputTable | The name of the input table. | Yes |
  | inputTablePartitions | The partitions selected from the input table for training. By default, all partitions in the input table are selected. Specify this parameter in one of the following formats. A single partition: partition_name=value. Multiple partitions: name1=value1,name2=value2. Multi-level partitions: name1=value1/name2=value2. Note: Multiple partitions are separated by commas (,). | No |
  | selectedCols | The input features. The data types of the features are not limited. | Yes |
  | detectStrategy | The detection method. Box Plot and AVF are supported. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumerated features. | Yes |
  | outputTable | The output table that contains data with anomalous features. | Yes |
  | modelTable | The anomaly detection model. | Yes |
  | lifecycle | The lifecycle of the output table. Default value: 7. | No |
  | coreNum | The number of cores. This parameter is used with memSizePerCore. Note: The value of this parameter must be a positive integer. Valid values: [1,9999]. | No |
  | memSizePerCore | The memory size of each core. Unit: MB. Valid values: [2048,64 × 1024]. | No |
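The parameter rules above can be checked before a command is submitted. The following Python sketch assembles a PAI command string from the documented parameters and validates the documented ranges for coreNum and memSizePerCore. The helper name `build_pai_command` and its keyword arguments are hypothetical and not part of PAI. Only the `-D` parameter names, the algorithm name, and the project name come from this topic.

```python
def build_pai_command(input_table, output_table, model_table, selected_cols,
                      detect_strategy, partitions=None, lifecycle=None,
                      core_num=None, mem_size_per_core=None):
    """Assemble a fe_detect_runner command string (hypothetical helper)."""
    # Documented range for coreNum: a positive integer in [1, 9999].
    if core_num is not None:
        if not isinstance(core_num, int) or not 1 <= core_num <= 9999:
            raise ValueError("coreNum must be an integer in [1, 9999]")
    # Documented range for memSizePerCore, in MB: [2048, 64 * 1024].
    if mem_size_per_core is not None:
        if not 2048 <= mem_size_per_core <= 64 * 1024:
            raise ValueError("memSizePerCore must be in [2048, 65536] MB")

    # Required parameters.
    params = {
        "selectedCols": ",".join(selected_cols),
        "detectStrategy": detect_strategy,
        "modelTable": model_table,
        "inputTable": input_table,
        "outputTable": output_table,
    }
    # Optional parameters are added only when a value is supplied.
    optional = {
        "inputTablePartitions": partitions,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    params.update({k: v for k, v in optional.items() if v is not None})

    args = " ".join(f'-D{name}="{value}"' for name, value in params.items())
    return f"PAI -name fe_detect_runner -project algo_public {args};"
```

Calling the helper with `detect_strategy="boxPlot"` and `lifecycle=28` produces a command with the same parameters as the sample command in this topic. Passing `core_num=0` raises `ValueError` before anything is submitted.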