Platform for AI: One-Class SVM Outlier

Last Updated: Mar 21, 2024

One-Class Support Vector Machine (SVM) is an unsupervised machine learning algorithm. Unlike traditional SVM algorithms, which are supervised classifiers, One-Class SVM learns a decision boundary around the training data and treats data points that fall outside the boundary as outliers. You can use the One-Class SVM Outlier component to detect outliers in this way. This topic describes how to configure the One-Class SVM Outlier component in Platform for AI (PAI).

Limits

The One-Class SVM Outlier component runs only on MaxCompute computing resources.

Configure the component

You can use one of the following methods to configure the parameters of the One-Class SVM Outlier component:

Method 1: Configure the component in the PAI console

Configure the component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | featureCols | An array of the names of feature columns. |
| Field Setting | groupCols | An array of the names of group columns. |
| Field Setting | tensorCol | The name of the tensor column. |
| Field Setting | vectorCol | The name of the vector column. |
| Parameter Setting | Prediction Result Column | The name of the prediction result column. |
| Parameter Setting | coef0 | The coef0 parameter of the kernel function. Default value: 0.0. Note: This parameter takes effect only if the kernel type is POLY or SIGMOID. |
| Parameter Setting | degree | The degree of the polynomial kernel. |
| Parameter Setting | epsilon | The convergence threshold. Iteration stops when the training result converges to within this value. Default value: 1.0E-6. |
| Parameter Setting | gamma | The gamma parameter of the kernel function. Default value: -1.0. Note: This parameter takes effect only if the kernel type is RBF, POLY, or SIGMOID. If you do not configure this parameter, 1/(data dimension) is used. |
| Parameter Setting | kernelType | The type of the kernel function. Valid values: RBF, POLY, SIGMOID, and LINEAR. |
| Parameter Setting | maxOutlierNumPerGroup | The maximum number of outliers per group. |
| Parameter Setting | maxOutlierRatio | The maximum ratio of outliers that are detected by the algorithm. |
| Parameter Setting | maxSampleNumPerGroup | The maximum number of samples per group. |
| Parameter Setting | nu | The nu parameter of the One-Class SVM algorithm. This parameter is positively correlated with the number of support vectors. Valid values: (0,1). Default value: 0.01. |
| Parameter Setting | outlierThreshold | If the score of a data point exceeds this threshold, the data point is considered an outlier. |
| Parameter Setting | Column name of detail prediction information | The name of the prediction details column. |
| Parameter Setting | numThreads | The number of threads of the component. |
| Execute Tuning | Number of Workers | The number of worker nodes. This parameter must be used together with the Memory per worker parameter. Valid values: positive integers from 1 to 9999. |
| Execute Tuning | Memory per worker | The memory size of each worker node. Unit: MB. Valid values: positive integers from 1024 to 65536. |
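
The value ranges stated above can be checked before you submit a pipeline. The following is a minimal sketch in plain Python that only restates the documented constraints (nu in the open interval (0,1), 1 to 9999 workers, 1024 to 65536 MB per worker, and Number of Workers used together with Memory per worker). The function name `validate_settings` and its argument names are hypothetical and are not part of PAI or Alink.

```python
def validate_settings(nu=0.01, num_workers=None, memory_per_worker_mb=None):
    """Return a list of messages for values outside the documented ranges.

    Hypothetical helper: it mirrors the ranges in the parameter table and
    does not call any PAI or Alink API.
    """
    errors = []

    # nu must lie strictly between 0 and 1.
    if not (0.0 < nu < 1.0):
        errors.append("nu must be in the open interval (0, 1)")

    if num_workers is not None:
        # Number of Workers: positive integer from 1 to 9999.
        if not (isinstance(num_workers, int) and 1 <= num_workers <= 9999):
            errors.append("Number of Workers must be an integer from 1 to 9999")
        # Number of Workers must be used together with Memory per worker.
        if memory_per_worker_mb is None:
            errors.append("Number of Workers requires Memory per worker")

    if memory_per_worker_mb is not None:
        # Memory per worker: positive integer from 1024 to 65536 (MB).
        if not (isinstance(memory_per_worker_mb, int)
                and 1024 <= memory_per_worker_mb <= 65536):
            errors.append("Memory per worker must be an integer from 1024 to 65536 (MB)")

    return errors


if __name__ == "__main__":
    print(validate_settings(nu=0.1, num_workers=4, memory_per_worker_mb=8192))
    print(validate_settings(nu=1.5, num_workers=4))
```

An empty list means that all checked values fall inside the documented ranges.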

Method 2: Configure the component by using Python code

You can configure the One-Class SVM Outlier component by using the PyAlink Script component to call Python code. For more information, see PyAlink script.

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| predictionCol | Yes | The name of the prediction result column. | N/A |
| degree | No | The degree of the polynomial kernel. | 2 |
| epsilon | No | The convergence threshold. Iteration stops when the training result converges to within this value. | 1.0E-6 |
| featureCols | No | An array of the names of feature columns. | All columns |
| groupCols | No | An array of the names of group columns. | N/A |
| maxOutlierNumPerGroup | No | The maximum number of outliers per group. | N/A |
| maxOutlierRatio | No | The maximum ratio of outliers that are detected by the algorithm. | N/A |
| maxSampleNumPerGroup | No | The maximum number of samples per group. | N/A |
| outlierThreshold | No | If the score of a data point exceeds this threshold, the data point is considered an outlier. | N/A |
| predictionDetailCol | No | The name of the prediction details column. | N/A |
| tensorCol | No | The name of the tensor column. | N/A |
| vectorCol | No | The name of the vector column. | N/A |
| kernelType | No | The type of the kernel function. Valid values: RBF, POLY, SIGMOID, and LINEAR. | RBF |
| coef0 | No | The coef0 parameter of the kernel function. Note: This parameter takes effect only if the kernel type is POLY or SIGMOID. | 0.0 |
| gamma | No | The gamma parameter of the kernel function. Note: This parameter takes effect only if the kernel type is RBF, POLY, or SIGMOID. If you do not configure this parameter, 1/(data dimension) is used. | -1.0 |
| nu | No | The nu parameter of the One-Class SVM algorithm. This parameter is positively correlated with the number of support vectors. Valid values: (0,1). | 0.01 |
| numThreads | No | The number of threads of the component. | 1 |
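
To make the roles of kernelType, gamma, coef0, and degree more concrete, the following sketch evaluates the four kernel types in plain Python. It assumes the conventional LIBSVM-style kernel definitions, which match the notes above about which parameters take effect for which kernel; the component's internal implementation is not shown in this topic and may differ. It also assumes that a non-positive gamma, such as the default -1.0, means "not configured" and falls back to 1/(data dimension). The names `resolve_gamma` and `kernel` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def resolve_gamma(gamma, dimension):
    # Assumption: a non-positive gamma (for example the default -1.0)
    # means "not configured", so 1/(data dimension) is used instead.
    return gamma if gamma > 0 else 1.0 / dimension


def kernel(u, v, kernel_type="RBF", gamma=-1.0, coef0=0.0, degree=2):
    """Evaluate a LIBSVM-style kernel (assumed convention, see lead-in)."""
    g = resolve_gamma(gamma, len(u))
    kind = kernel_type.upper()
    if kind == "LINEAR":
        # No gamma, coef0, or degree involved.
        return dot(u, v)
    if kind == "POLY":
        # Uses gamma, coef0, and degree.
        return (g * dot(u, v) + coef0) ** degree
    if kind == "RBF":
        # Uses gamma only.
        squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-g * squared_distance)
    if kind == "SIGMOID":
        # Uses gamma and coef0.
        return math.tanh(g * dot(u, v) + coef0)
    raise ValueError("kernel_type must be RBF, POLY, SIGMOID, or LINEAR")


if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    for kind in ("LINEAR", "POLY", "RBF", "SIGMOID"):
        print(kind, kernel(a, b, kernel_type=kind, gamma=0.5, coef0=1.0))
```

Under these assumptions, changing coef0 has no effect on the RBF or LINEAR results, and changing gamma has no effect on the LINEAR result, which is consistent with the notes in the table.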

Sample Python code:

import pandas as pd
from pyalink.alink import *

# Sample data: eight rows, each with four numeric features
df = pd.DataFrame([
[0.730967787376657,0.24053641567148587,0.6374174253501083,0.5504370051176339],
[0.7308781907032909,0.41008081149220166,0.20771484130971707,0.3327170559595112],
[0.7311469360199058,0.9014476240300544,0.49682259343089075,0.9858769332362016],
[0.731057369148862,0.07099203475193139,0.06712000939049956,0.768156984078079],
[0.7306094602878371,0.9187140138555101,0.9186071189908658,0.6795571637816596],
[0.730519863614471,0.08825840967622589,0.4889045498516358,0.461837214623537],
[0.7307886238322471,0.5796252073129174,0.7780122870716483,0.11499709190022733],
[0.7306990420600421,0.7491696031336331,0.34830970303125697,0.8972771427421047]])

# Load the DataFrame as an Alink batch source with named, typed columns
data = BatchOperator.fromDataframe(df, schemaStr="x1 double, x2 double, x3 double, x4 double")

# Run outlier detection on the four feature columns with an RBF kernel
# and print the results, including the "pred" prediction column
OcsvmOutlierBatchOp() \
    .setFeatureCols(["x1", "x2", "x3", "x4"]) \
    .setGamma(0.5) \
    .setNu(0.1) \
    .setKernelType("RBF") \
    .setPredictionCol("pred") \
    .linkFrom(data) \
    .print()
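
The parameter tables describe outlierThreshold, maxOutlierRatio, and maxOutlierNumPerGroup separately but do not state how they interact. The following plain-Python sketch illustrates one plausible reading for a single group: points whose score exceeds the threshold are candidates, and the two caps then keep only the highest-scoring candidates. This interaction is an assumption made for illustration and is not confirmed by this topic. The function `flag_outliers` and the sample scores are hypothetical.

```python
def flag_outliers(scores, outlier_threshold,
                  max_outlier_ratio=None, max_outlier_num=None):
    """Return one boolean per score: True if the point is flagged.

    Illustrative assumption, not the component's confirmed logic:
    threshold first, then cap the count by ratio and by absolute number.
    """
    # Candidates are the points whose score exceeds the threshold.
    candidates = [i for i, s in enumerate(scores) if s > outlier_threshold]

    # Start with all candidates, then apply each configured cap.
    limit = len(candidates)
    if max_outlier_ratio is not None:
        limit = min(limit, int(max_outlier_ratio * len(scores)))
    if max_outlier_num is not None:
        limit = min(limit, max_outlier_num)

    # Keep the highest-scoring candidates up to the limit.
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:limit])
    return [i in kept for i in range(len(scores))]


if __name__ == "__main__":
    sample_scores = [0.1, 0.9, 0.8, 0.2, 0.95]
    print(flag_outliers(sample_scores, 0.5))
    print(flag_outliers(sample_scores, 0.5, max_outlier_ratio=0.4))
```

If the component's results differ from this sketch, treat the component's output as authoritative.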