A support vector machine (SVM) is a machine learning model based on statistical learning theory. It improves the generalization capability of a learning machine by minimizing structural risk, which keeps both the empirical risk and the confidence interval small. The Linear SVM component supports only binary classification.
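
As a hedged illustration of what "empirical risk" and "confidence interval" refer to, the classical generalization bound from statistical learning theory can be written as follows. The symbols are assumptions introduced here and do not come from the component: n is the number of training samples, h is the VC dimension of the hypothesis class, and the bound holds with probability at least 1 - η.

```latex
R(w) \;\le\; R_{\mathrm{emp}}(w) \;+\;
\sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The first term is the empirical risk measured on the training data, and the square-root term is the confidence interval. Structural risk minimization seeks a model for which the sum of the two is small.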

Background information

The Linear SVM component does not use kernel functions. For implementation details, see the "Trust region method for L2-SVM" section in Trust Region Newton Method for Large-Scale Logistic Regression.
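
The following Python sketch illustrates the kind of problem a linear L2-SVM solves. It is not the component's implementation: it replaces the trust region Newton method with plain gradient descent and a backtracking line search, and it assumes a class-weighted L2-loss objective with no bias term. The function names `train_linear_l2svm` and `predict`, the `max_iter` parameter, and the relative-gradient stopping rule are hypothetical. The `positive_cost`, `negative_cost`, and `epsilon` arguments are meant to mirror the positiveCost, negativeCost, and epsilon parameters only loosely.

```python
from math import sqrt
from typing import List, Sequence


def train_linear_l2svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    positive_cost: float = 1.0,
    negative_cost: float = 1.0,
    epsilon: float = 0.001,
    max_iter: int = 10000,
) -> List[float]:
    """Minimize 0.5*||w||^2 + sum_i C_i * max(0, 1 - y_i * w.x_i)^2.

    Labels must be +1 or -1. C_i is positive_cost for positive examples
    and negative_cost for negative examples (an assumed objective).
    """
    costs = [positive_cost if label > 0 else negative_cost for label in y]

    def objective_and_gradient(w: List[float]):
        obj = 0.5 * sum(v * v for v in w)
        grad = list(w)
        for xi, yi, ci in zip(X, y, costs):
            margin = 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin > 0:  # only margin violators contribute to the loss
                obj += ci * margin * margin
                coeff = -2.0 * ci * margin * yi
                for j, xj in enumerate(xi):
                    grad[j] += coeff * xj
        return obj, grad

    w = [0.0] * len(X[0])
    obj, grad = objective_and_gradient(w)
    initial_norm = sqrt(sum(g * g for g in grad))
    if initial_norm == 0.0:
        return w

    for _ in range(max_iter):
        norm = sqrt(sum(g * g for g in grad))
        # Assumed stopping rule: gradient norm relative to its initial value.
        if norm <= epsilon * initial_norm:
            break
        step = 1.0
        while step > 1e-12:  # backtracking (Armijo) line search
            candidate = [wj - step * gj for wj, gj in zip(w, grad)]
            cand_obj, cand_grad = objective_and_gradient(candidate)
            if cand_obj <= obj - 1e-4 * step * norm * norm:
                w, obj, grad = candidate, cand_obj, cand_grad
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop early
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 according to the sign of w.x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In this sketch, raising `positive_cost` relative to `negative_cost` makes margin violations on positive examples more expensive, which is one way to compensate for an imbalanced label distribution.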

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    | Tab | Parameter | Description |
    | --- | --- | --- |
    | Fields Setting | Feature Column | The input columns. Columns of the BIGINT and DOUBLE types are supported. |
    | Fields Setting | Label Column | The label column. Columns of the BIGINT, DOUBLE, and STRING types are supported. |
    | Parameters Setting | Positive Sample Label | The label value that is treated as the positive class. If you do not specify this parameter, the system randomly selects one of the label values. If the numbers of positive and negative examples differ greatly, we recommend that you specify this parameter. |
    | Parameters Setting | Positive Penalty Factor | The weight of the positive example. Valid values: (0,+∞). Default value: 1.0. |
    | Parameters Setting | Negative Penalty Factor | The weight of the negative example. Valid values: (0,+∞). Default value: 1.0. |
    | Parameters Setting | Convergence Coefficient | The convergence tolerance. Valid values: (0,1). Default value: 0.001. |
    | Tuning | Number of Computing Cores | The number of cores that are used in computing. If you do not specify this parameter, the system automatically allocates cores. |
    | Tuning | Memory Size per Core | The memory size of each core. Unit: MB. If you do not specify this parameter, the system automatically allocates memory. |
  • Use commands
    PAI -name LinearSVM -project algo_public
        -DinputTableName="bank_data"
        -DmodelName="xlab_m_LinearSVM_6143"
        -DfeatureColNames="pdays,emp_var_rate,cons_conf_idx"
        -DlabelColName="y"
        -DpositiveLabel="0"
        -DpositiveCost="1.0"
        -DnegativeCost="1.0"
        -Depsilon="0.001";
    | Parameter | Required | Description | Default value |
    | --- | --- | --- | --- |
    | inputTableName | Yes | The name of the input table. | N/A |
    | inputTablePartitions | No | The partitions that are selected from the input table for training. Supported formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | All partitions of the input table |
    | modelName | Yes | The name of the output model. | N/A |
    | featureColNames | Yes | The feature columns that are selected from the input table for training. | N/A |
    | labelColName | Yes | The name of the label column in the input table. | N/A |
    | positiveLabel | No | The label value of the positive example. | A value randomly selected from the label values |
    | positiveCost | No | The weight of the positive example, also called the positive penalty factor. Valid values: (0,+∞). | 1.0 |
    | negativeCost | No | The weight of the negative example, also called the negative penalty factor. Valid values: (0,+∞). | 1.0 |
    | epsilon | No | The convergence coefficient. Valid values: (0,1). | 0.001 |
    | enableSparse | No | Specifies whether the input data is in the sparse format. Valid values: true and false. | false |
    | itemDelimiter | No | The delimiter that separates key-value pairs if the input data is in the sparse format. | , |
    | kvDelimiter | No | The delimiter that separates keys and values if the input data is in the sparse format. | : |
    | coreNum | No | The number of cores that are used in computing. The value must be a positive integer. | Determined by the system |
    | memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system |
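
To make the roles of itemDelimiter and kvDelimiter concrete, the following Python sketch splits one sparse-format cell into key-value pairs. It only illustrates the delimiter semantics described above and is not how the component parses its input. The function name `parse_sparse_row` is hypothetical, and keys are kept as strings because the key type is not specified here.

```python
from typing import Dict


def parse_sparse_row(
    row: str,
    item_delimiter: str = ",",
    kv_delimiter: str = ":",
) -> Dict[str, float]:
    """Split a sparse-format string such as "1:0.5,3:1.2" into {key: value}."""
    features: Dict[str, float] = {}
    for item in row.split(item_delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, for example a trailing delimiter
        key, value = item.split(kv_delimiter, 1)
        features[key.strip()] = float(value)
    return features


# Default delimiters: "," between pairs and ":" between key and value.
print(parse_sparse_row("1:0.5,3:1.2"))
# Custom delimiters passed explicitly.
print(parse_sparse_row("1=0.5;3=1.2", item_delimiter=";", kv_delimiter="="))
```

Features that do not appear in a row are treated as absent in this sketch. How the component handles missing keys is not covered here.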

Example

  1. Use the following training data as the input.
    id y f0 f1 f2 f3 f4 f5 f6 f7
    1 -1 -0.294118 0.487437 0.180328 -0.292929 -1 0.00149028 -0.53117 -0.0333333
    2 +1 -0.882353 -0.145729 0.0819672 -0.414141 -1 -0.207153 -0.766866 -0.666667
    3 -1 -0.0588235 0.839196 0.0491803 -1 -1 -0.305514 -0.492741 -0.633333
    4 +1 -0.882353 -0.105528 0.0819672 -0.535354 -0.777778 -0.162444 -0.923997 -1
    5 -1 -1 0.376884 -0.344262 -0.292929 -0.602837 0.28465 0.887276 -0.6
    6 +1 -0.411765 0.165829 0.213115 -1 -1 -0.23696 -0.894962 -0.7
    7 -1 -0.647059 -0.21608 -0.180328 -0.353535 -0.791962 -0.0760059 -0.854825 -0.833333
    8 +1 0.176471 0.155779 -1 -1 -1 0.052161 -0.952178 -0.733333
    9 -1 -0.764706 0.979899 0.147541 -0.0909091 0.283688 -0.0909091 -0.931682 0.0666667
    10 -1 -0.0588235 0.256281 0.57377 -1 -1 -1 -0.868488 0.1
  2. Use the following test data as the input.
    id y f0 f1 f2 f3 f4 f5 f6 f7
    1 +1 -0.882353 0.0854271 0.442623 -0.616162 -1 -0.19225 -0.725021 -0.9
    2 +1 -0.294118 -0.0351759 -1 -1 -1 -0.293592 -0.904355 -0.766667
    3 +1 -0.882353 0.246231 0.213115 -0.272727 -1 -0.171386 -0.981213 -0.7
    4 -1 -0.176471 0.507538 0.278689 -0.414141 -0.702128 0.0491804 -0.475662 0.1
    5 -1 -0.529412 0.839196 -1 -1 -1 -0.153502 -0.885568 -0.5
    6 +1 -0.882353 0.246231 -0.0163934 -0.353535 -1 0.0670641 -0.627669 -1
    7 -1 -0.882353 0.819095 0.278689 -0.151515 -0.307329 0.19225 0.00768574 -0.966667
    8 +1 -0.882353 -0.0753769 0.0163934 -0.494949 -0.903073 -0.418778 -0.654996 -0.866667
    9 +1 -1 0.527638 0.344262 -0.212121 -0.356974 0.23696 -0.836038 -0.8
    10 +1 -0.882353 0.115578 0.0163934 -0.737374 -0.56974 -0.28465 -0.948762 -0.933333
  3. Create the experiment shown in the following figure. For more information, see Generate a model by using an algorithm.
    [Figure: Linear SVM experiment]
  4. Configure the parameters listed in the following table for the Linear SVM component. Retain the default values of the parameters that are not listed in the table.
    | Tab | Parameter | Description |
    | --- | --- | --- |
    | Fields Setting | Feature Column | Select the f0, f1, f2, f3, f4, f5, f6, and f7 columns. |
    | Fields Setting | Label Column | Select the y column. |
  5. Run the experiment and view the prediction results.
    [Figure: Prediction results]
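
If you want to inspect the sample data outside the platform, the following Python sketch loads rows in the whitespace-separated layout used in the preceding steps. The helper name `parse_table` is hypothetical, and only the first two training rows are reproduced. The sketch assumes that the label column is named y and that feature columns start with f, as in the example tables.

```python
from typing import List, Tuple

# First two rows of the training data from step 1.
SAMPLE_ROWS = """\
id y f0 f1 f2 f3 f4 f5 f6 f7
1 -1 -0.294118 0.487437 0.180328 -0.292929 -1 0.00149028 -0.53117 -0.0333333
2 +1 -0.882353 -0.145729 0.0819672 -0.414141 -1 -0.207153 -0.766866 -0.666667
"""


def parse_table(text: str) -> Tuple[List[int], List[List[float]]]:
    """Return (labels, features) from a whitespace-separated table with a header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    label_idx = header.index("y")
    feature_idx = [i for i, name in enumerate(header) if name.startswith("f")]
    labels: List[int] = []
    features: List[List[float]] = []
    for line in lines[1:]:
        cells = line.split()
        labels.append(int(cells[label_idx]))  # int() accepts "+1" and "-1"
        features.append([float(cells[i]) for i in feature_idx])
    return labels, features


labels, features = parse_table(SAMPLE_ROWS)
print(labels, len(features[0]))
```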