Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent of each other given the class label.
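
For a class y and features x_1, ..., x_n, the independence assumption lets the classifier pick the y that maximizes P(y) * P(x_1 | y) * ... * P(x_n | y). The following Python sketch shows this for discrete features only, using frequency counts with Laplace smoothing (the alpha parameter). It is a generic textbook version for illustration, not the implementation used by the PAI component; the function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_categorical_nb(rows, labels):
    """Count class frequencies and per-class frequencies of each feature value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values


def predict_categorical_nb(model, row, alpha=1.0):
    """Return the label that maximizes log P(y) + sum_j log P(x_j | y)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            # Laplace smoothing keeps an unseen value from zeroing out the product.
            numerator = value_counts[(j, label)][value] + alpha
            denominator = count + alpha * len(feature_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical): two discrete features per row.
weather = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
play = ["no", "yes", "yes", "no"]
model = train_categorical_nb(weather, play)
print(predict_categorical_nb(model, ("sunny", "cool")))  # prints yes
```

Summing log-probabilities instead of multiplying raw probabilities gives the same ranking of classes while avoiding floating-point underflow when there are many features.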

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    Fields Setting tab:
    • Feature Column: The columns that are used for training. By default, all columns except the label column are selected. Columns of the DOUBLE, STRING, and BIGINT types are supported.
    • Excluded Columns: The columns that are not used for training. These columns cannot be used as feature columns.
    • Forced Conversion Column: The columns that are forcibly parsed as categorical (discrete) columns. By default, columns are parsed based on the following rules:
      • Columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns.
      • Columns of the DOUBLE or BIGINT type are parsed as continuous columns.
      Note: To parse a BIGINT column as a categorical column, you must specify the column in this parameter.
    • Label Column: The label column in the input table. The label column cannot be used as a feature column. Columns of the DOUBLE, STRING, and BIGINT types are supported.
    • Input Sparse Format Data: Specifies whether the input data is in the sparse format. Data in the sparse format is represented as key-value pairs.
    • Separator between K:V when input is sparse: The delimiter that separates key-value pairs from each other. Default value: comma (,).
    • The separator of key and value when the input is sparse: The delimiter that separates the key from the value within a pair. Default value: colon (:).
    Tuning tab:
    • Number of cores: The number of cores. By default, the system determines the value.
    • Memory Size of Core(MB): The memory size of each core. By default, the system determines the value.
  • Use commands
    PAI -name NaiveBayes -project algo_public
        -DinputTablePartitions="pt=20150501"
        -DmodelName="xlab_m_NaiveBayes_23772"
        -DlabelColName="poutcome"
        -DfeatureColNames="age,previous,cons_conf_idx,euribor3m"
        -DinputTableName="bank_data_partition";
    Parameters:
    • inputTableName: Required. The name of the input table. No default value.
    • inputTablePartitions: Optional. The partitions that are selected from the input table for training. Default value: all partitions.
    • modelName: Required. The name of the output model. No default value.
    • labelColName: Required. The name of the label column in the input table. No default value.
    • featureColNames: Optional. The feature columns that are selected from the input table for training. Default value: all columns except the label column.
    • excludedColNames: Optional. The columns that are not used for training. Columns specified in this parameter cannot also be specified in featureColNames. Default value: an empty string.
    • forceCategorical: Optional. The columns that are forcibly parsed as categorical (discrete) columns. By default, columns are parsed based on the following rules:
      • Columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns.
      • Columns of the DOUBLE or BIGINT type are parsed as continuous columns.
      Note: INT is a continuous type. To parse a BIGINT column as a categorical column, you must specify the column in forceCategorical.
    • coreNum: Optional. The number of cores that are used in computing. Default value: determined by the system.
    • memSizePerCore: Optional. The memory size of each core, in MB. Valid values: 1 to 65536. Default value: determined by the system.
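
The default parsing rules and the forced-conversion override can be expressed as a small lookup. The Python sketch below is a hypothetical helper (parse_column_kinds is not a PAI function) that assumes the table schema is available as a dictionary that maps column names to type names. It treats INT as continuous, following the note on INT, and applies the override to any listed column; that last point is an assumption, because the documentation only states it for BIGINT.

```python
# Default parsing rules described on this page; INT follows the note on INT.
DISCRETE_TYPES = {"STRING", "BOOLEAN", "DATETIME"}
CONTINUOUS_TYPES = {"DOUBLE", "BIGINT", "INT"}


def parse_column_kinds(schema, force_categorical=()):
    """Map each column to "discrete" or "continuous" (hypothetical helper).

    schema: dict of column name -> type name, for example {"age": "BIGINT"}.
    force_categorical: columns to parse as discrete regardless of type
    (assumption: the override applies to any listed column).
    """
    forced = set(force_categorical)
    kinds = {}
    for name, col_type in schema.items():
        type_name = col_type.upper()
        if name in forced or type_name in DISCRETE_TYPES:
            kinds[name] = "discrete"
        elif type_name in CONTINUOUS_TYPES:
            kinds[name] = "continuous"
        else:
            raise ValueError(f"unsupported type {col_type!r} for column {name!r}")
    return kinds


schema = {"age": "BIGINT", "job": "STRING", "rate": "DOUBLE"}  # hypothetical schema
print(parse_column_kinds(schema))
# {'age': 'continuous', 'job': 'discrete', 'rate': 'continuous'}
print(parse_column_kinds(schema, force_categorical=["age"]))
# {'age': 'discrete', 'job': 'discrete', 'rate': 'continuous'}
```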

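When Input Sparse Format Data is enabled, each value is a string of key-value pairs. With the default delimiters, a row looks like 1:0.5,3:-1,7:2, where commas separate the pairs and colons separate each key from its value. The hypothetical Python function below parses such a string; keeping keys as strings and converting values to floats are assumptions made for illustration.

```python
def parse_sparse_row(text, pair_sep=",", kv_sep=":"):
    """Parse a sparse row such as "1:0.5,3:-1" into a {key: value} dict.

    The defaults mirror the documented delimiters: commas between pairs and
    colons between a key and its value. Keys stay strings and values become
    floats; both choices are assumptions for this sketch.
    """
    features = {}
    for pair in text.split(pair_sep):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries, for example a trailing separator
        key, value = pair.split(kv_sep, 1)
        features[key.strip()] = float(value)
    return features


print(parse_sparse_row("1:0.5,3:-1,7:2"))
# {'1': 0.5, '3': -1.0, '7': 2.0}
print(parse_sparse_row("a=1;b=2", pair_sep=";", kv_sep="="))
# {'a': 1.0, 'b': 2.0}
```

Keys that do not appear in a row are simply absent from the returned dictionary, which is what makes the sparse format compact for high-dimensional data.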
Example

  1. Prepare the following training data.
    id y f0 f1 f2 f3 f4 f5 f6 f7
    1 -1 -0.294118 0.487437 0.180328 -0.292929 -1 0.00149028 -0.53117 -0.0333333
    2 +1 -0.882353 -0.145729 0.0819672 -0.414141 -1 -0.207153 -0.766866 -0.666667
    3 -1 -0.0588235 0.839196 0.0491803 -1 -1 -0.305514 -0.492741 -0.633333
    4 +1 -0.882353 -0.105528 0.0819672 -0.535354 -0.777778 -0.162444 -0.923997 -1
    5 -1 -1 0.376884 -0.344262 -0.292929 -0.602837 0.28465 0.887276 -0.6
    6 +1 -0.411765 0.165829 0.213115 -1 -1 -0.23696 -0.894962 -0.7
    7 -1 -0.647059 -0.21608 -0.180328 -0.353535 -0.791962 -0.0760059 -0.854825 -0.833333
    8 +1 0.176471 0.155779 -1 -1 -1 0.052161 -0.952178 -0.733333
    9 -1 -0.764706 0.979899 0.147541 -0.0909091 0.283688 -0.0909091 -0.931682 0.0666667
    10 -1 -0.0588235 0.256281 0.57377 -1 -1 -1 -0.868488 0.1
  2. Prepare the following test data.
    id y f0 f1 f2 f3 f4 f5 f6 f7
    1 +1 -0.882353 0.0854271 0.442623 -0.616162 -1 -0.19225 -0.725021 -0.9
    2 +1 -0.294118 -0.0351759 -1 -1 -1 -0.293592 -0.904355 -0.766667
    3 +1 -0.882353 0.246231 0.213115 -0.272727 -1 -0.171386 -0.981213 -0.7
    4 -1 -0.176471 0.507538 0.278689 -0.414141 -0.702128 0.0491804 -0.475662 0.1
    5 -1 -0.529412 0.839196 -1 -1 -1 -0.153502 -0.885568 -0.5
    6 +1 -0.882353 0.246231 -0.0163934 -0.353535 -1 0.0670641 -0.627669 -1
    7 -1 -0.882353 0.819095 0.278689 -0.151515 -0.307329 0.19225 0.00768574 -0.966667
    8 +1 -0.882353 -0.0753769 0.0163934 -0.494949 -0.903073 -0.418778 -0.654996 -0.866667
    9 +1 -1 0.527638 0.344262 -0.212121 -0.356974 0.23696 -0.836038 -0.8
    10 +1 -0.882353 0.115578 0.0163934 -0.737374 -0.56974 -0.28465 -0.948762 -0.933333
  3. Create an experiment. For more information, see Generate a model by using an algorithm.
    [Figure: Experiment of Naive Bayes]
  4. Configure the parameters listed in the following table for the Naive Bayes component. Retain the default values of the parameters that are not listed in the table.
    Fields Setting tab:
    • Feature Column: Select the f0, f1, f2, f3, f4, f5, f6, and f7 columns from the training table.
    • Label Column: Select the y column from the training table.
  5. Run the experiment and view the prediction results.
    [Figure: Prediction results of Naive Bayes]
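
To experiment with the example data outside PAI, the following Python sketch fits a Gaussian Naive Bayes model on the training rows and predicts the test rows (the id column is dropped). Modeling each continuous feature with a per-class normal distribution is an assumption: this page states that DOUBLE columns are parsed as continuous but does not say how the component models them, so these predictions may differ from the PAI results. The function names are hypothetical.

```python
import math

# (y, [f0..f7]) rows copied from the example training and test tables.
TRAIN = [
    (-1, [-0.294118, 0.487437, 0.180328, -0.292929, -1, 0.00149028, -0.53117, -0.0333333]),
    (1, [-0.882353, -0.145729, 0.0819672, -0.414141, -1, -0.207153, -0.766866, -0.666667]),
    (-1, [-0.0588235, 0.839196, 0.0491803, -1, -1, -0.305514, -0.492741, -0.633333]),
    (1, [-0.882353, -0.105528, 0.0819672, -0.535354, -0.777778, -0.162444, -0.923997, -1]),
    (-1, [-1, 0.376884, -0.344262, -0.292929, -0.602837, 0.28465, 0.887276, -0.6]),
    (1, [-0.411765, 0.165829, 0.213115, -1, -1, -0.23696, -0.894962, -0.7]),
    (-1, [-0.647059, -0.21608, -0.180328, -0.353535, -0.791962, -0.0760059, -0.854825, -0.833333]),
    (1, [0.176471, 0.155779, -1, -1, -1, 0.052161, -0.952178, -0.733333]),
    (-1, [-0.764706, 0.979899, 0.147541, -0.0909091, 0.283688, -0.0909091, -0.931682, 0.0666667]),
    (-1, [-0.0588235, 0.256281, 0.57377, -1, -1, -1, -0.868488, 0.1]),
]
TEST = [
    (1, [-0.882353, 0.0854271, 0.442623, -0.616162, -1, -0.19225, -0.725021, -0.9]),
    (1, [-0.294118, -0.0351759, -1, -1, -1, -0.293592, -0.904355, -0.766667]),
    (1, [-0.882353, 0.246231, 0.213115, -0.272727, -1, -0.171386, -0.981213, -0.7]),
    (-1, [-0.176471, 0.507538, 0.278689, -0.414141, -0.702128, 0.0491804, -0.475662, 0.1]),
    (-1, [-0.529412, 0.839196, -1, -1, -1, -0.153502, -0.885568, -0.5]),
    (1, [-0.882353, 0.246231, -0.0163934, -0.353535, -1, 0.0670641, -0.627669, -1]),
    (-1, [-0.882353, 0.819095, 0.278689, -0.151515, -0.307329, 0.19225, 0.00768574, -0.966667]),
    (1, [-0.882353, -0.0753769, 0.0163934, -0.494949, -0.903073, -0.418778, -0.654996, -0.866667]),
    (1, [-1, 0.527638, 0.344262, -0.212121, -0.356974, 0.23696, -0.836038, -0.8]),
    (1, [-0.882353, 0.115578, 0.0163934, -0.737374, -0.56974, -0.28465, -0.948762, -0.933333]),
]


def fit_gaussian_nb(rows, var_floor=1e-9):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    by_class = {}
    for label, x in rows:
        by_class.setdefault(label, []).append(x)
    model = {}
    for label, xs in by_class.items():
        stats = []
        for column in zip(*xs):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, var_floor)))  # floor avoids division by zero
        model[label] = (math.log(len(xs) / len(rows)), stats)
    return model


def log_posterior(model, label, x):
    """Unnormalized log P(y | x): log prior plus Gaussian log-likelihoods."""
    log_prior, stats = model[label]
    total = log_prior
    for v, (mean, var) in zip(x, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
    return total


def predict(model, x):
    return max(model, key=lambda label: log_posterior(model, label, x))


model = fit_gaussian_nb(TRAIN)
predictions = [predict(model, x) for _, x in TEST]
accuracy = sum(p == y for p, (y, _) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```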