Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent of each other given the class label.
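Written out, the independence assumption lets the posterior probability of a class c, given features x_1 to x_n, factor into one likelihood term per feature. The denominator is the same for every class, so prediction only needs to compare the numerators, usually in log form to avoid underflow. This is the standard textbook decision rule, shown for illustration; this topic does not describe how the component estimates the individual likelihoods or applies smoothing.

```latex
P(c \mid x_1, \dots, x_n) = \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)}
\qquad
\hat{y} = \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c) \Big]
```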

Configure the component

You can use one of the following methods to configure the Naive Bayes component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Naive Bayes component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description
--- | --- | ---
Fields Setting | Feature Column | The feature columns that are used for training. By default, all columns except the label column are selected. Columns of the DOUBLE, STRING, and BIGINT types are supported.
Fields Setting | Excluded Columns | The columns that are not used for training. These columns cannot be selected as feature columns.
Fields Setting | Forced Conversion Column | The columns to be parsed as categorical (discrete) features. Columns that are not specified are parsed by the following rules: columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete, and columns of the DOUBLE or BIGINT type are parsed as continuous. Note: To parse a BIGINT column as categorical, you must specify it in this parameter.
Fields Setting | Label Column | The label column in the input table. The label column cannot be used as a feature column. Columns of the DOUBLE, STRING, and BIGINT types are supported.
Fields Setting | Input Sparse Format Data | Specifies whether the input data is in the sparse format. Sparse data is represented as key-value pairs.
Fields Setting | Separator between K:V when input is sparse | The delimiter that separates key-value pairs from each other. Default value: comma (,).
Fields Setting | The separator of key and value when the input is sparse | The delimiter that separates a key from its value. Default value: colon (:).
Tuning | Number of cores | The number of cores. By default, the system determines the value.
Tuning | Memory Size of Core(MB) | The memory size of each core, in MB. By default, the system determines the value.
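To make the two separator parameters concrete, the following sketch parses one sparse row, such as `0:0.5,3:-1`, into a dictionary by using the default delimiters (a comma between pairs and a colon between a key and its value). The helper name `parse_sparse_row`, and the assumption that keys are integer feature indexes and values are floats, are illustrative only and are not part of PAI.

```python
from typing import Dict


def parse_sparse_row(row: str, pair_sep: str = ",", kv_sep: str = ":") -> Dict[int, float]:
    """Parse a sparse key-value string into {feature_index: value}.

    Hypothetical helper; assumes integer keys and float values.
    pair_sep plays the role of "Separator between K:V when input is sparse" (default ",").
    kv_sep plays the role of "The separator of key and value when the input is sparse" (default ":").
    """
    features: Dict[int, float] = {}
    for pair in row.split(pair_sep):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty fields, for example a trailing separator
        key, value = pair.split(kv_sep, 1)  # split only on the first key-value separator
        features[int(key)] = float(value)
    return features


print(parse_sparse_row("0:0.5,3:-1"))             # {0: 0.5, 3: -1.0}
print(parse_sparse_row("0=0.5;3=-1", ";", "="))   # {0: 0.5, 3: -1.0}
```

If your data uses other delimiters, the same two parameters must be set to match them, as in the second call.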

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name NaiveBayes -project algo_public
    -DinputTablePartitions="pt=20150501"
    -DmodelName="xlab_m_NaiveBayes_23772"
    -DlabelColName="poutcome"
    -DfeatureColNames="age,previous,cons_conf_idx,euribor3m"
    -DinputTableName="bank_data_partition";
Parameter | Required | Description | Default value
--- | --- | --- | ---
inputTableName | Yes | The name of the input table. | N/A
inputTablePartitions | No | The partitions that are selected from the input table for training. | All partitions
modelName | Yes | The name of the output model. | N/A
labelColName | Yes | The name of the label column in the input table. | N/A
featureColNames | No | The feature columns that are selected from the input table for training. | All columns except the label column
excludedColNames | No | The columns that are excluded from training. Columns specified in this parameter cannot also be specified in featureColNames. | Empty string
forceCategorical | No | The columns to be parsed as categorical (discrete) features. Columns that are not specified are parsed by the following rules: columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete, and columns of the DOUBLE or BIGINT type are parsed as continuous. Note: To parse a BIGINT column as categorical, you must specify it in forceCategorical. | Columns of the INT type are parsed as continuous.
coreNum | No | The number of cores that are used in computing. | Determined by the system
memSizePerCore | No | The memory size of each core, in MB. Valid values: 1 to 65536. | Determined by the system
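The type rules for forceCategorical can be summarized as a small lookup. The following sketch is a hypothetical illustration, not PAI code: the helper name `feature_kind` is invented, INT is grouped with the continuous types because of the default value listed for forceCategorical, and the sketch assumes that any column listed in forceCategorical becomes discrete, although the documentation only calls this out for BIGINT columns.

```python
DISCRETE_TYPES = {"STRING", "BOOLEAN", "DATETIME"}
CONTINUOUS_TYPES = {"DOUBLE", "BIGINT", "INT"}  # INT: assumed continuous, per the default value


def feature_kind(col_name, col_type, force_categorical=()):
    """Return "discrete" or "continuous" for a feature column (hypothetical helper).

    - Columns listed in force_categorical are treated as discrete (assumption:
      applies to any listed column; the docs only mention BIGINT).
    - STRING, BOOLEAN, and DATETIME columns are discrete.
    - DOUBLE, BIGINT, and INT columns are continuous.
    """
    if col_name in force_categorical:
        return "discrete"
    t = col_type.upper()
    if t in DISCRETE_TYPES:
        return "discrete"
    if t in CONTINUOUS_TYPES:
        return "continuous"
    raise ValueError(f"unsupported column type: {col_type}")


# Assuming "age" is a BIGINT column, as it might be in bank_data_partition:
print(feature_kind("age", "BIGINT"))                            # continuous
print(feature_kind("age", "BIGINT", force_categorical={"age"}))  # discrete
print(feature_kind("poutcome", "STRING"))                       # discrete
```

The distinction matters because a discrete feature is typically modeled by per-class value frequencies, while a continuous feature needs a density estimate, so forcing a BIGINT code column to categorical changes how its likelihood is computed.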

Example

  1. Prepare the following training data.
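    id | y | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7
    --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
    1 | -1 | -0.294118 | 0.487437 | 0.180328 | -0.292929 | -1 | 0.00149028 | -0.53117 | -0.0333333
    2 | +1 | -0.882353 | -0.145729 | 0.0819672 | -0.414141 | -1 | -0.207153 | -0.766866 | -0.666667
    3 | -1 | -0.0588235 | 0.839196 | 0.0491803 | -1 | -1 | -0.305514 | -0.492741 | -0.633333
    4 | +1 | -0.882353 | -0.105528 | 0.0819672 | -0.535354 | -0.777778 | -0.162444 | -0.923997 | -1
    5 | -1 | -1 | 0.376884 | -0.344262 | -0.292929 | -0.602837 | 0.28465 | 0.887276 | -0.6
    6 | +1 | -0.411765 | 0.165829 | 0.213115 | -1 | -1 | -0.23696 | -0.894962 | -0.7
    7 | -1 | -0.647059 | -0.21608 | -0.180328 | -0.353535 | -0.791962 | -0.0760059 | -0.854825 | -0.833333
    8 | +1 | 0.176471 | 0.155779 | -1 | -1 | -1 | 0.052161 | -0.952178 | -0.733333
    9 | -1 | -0.764706 | 0.979899 | 0.147541 | -0.0909091 | 0.283688 | -0.0909091 | -0.931682 | 0.0666667
    10 | -1 | -0.0588235 | 0.256281 | 0.57377 | -1 | -1 | -1 | -0.868488 | 0.1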
  2. Prepare the following test data.
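    id | y | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7
    --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
    1 | +1 | -0.882353 | 0.0854271 | 0.442623 | -0.616162 | -1 | -0.19225 | -0.725021 | -0.9
    2 | +1 | -0.294118 | -0.0351759 | -1 | -1 | -1 | -0.293592 | -0.904355 | -0.766667
    3 | +1 | -0.882353 | 0.246231 | 0.213115 | -0.272727 | -1 | -0.171386 | -0.981213 | -0.7
    4 | -1 | -0.176471 | 0.507538 | 0.278689 | -0.414141 | -0.702128 | 0.0491804 | -0.475662 | 0.1
    5 | -1 | -0.529412 | 0.839196 | -1 | -1 | -1 | -0.153502 | -0.885568 | -0.5
    6 | +1 | -0.882353 | 0.246231 | -0.0163934 | -0.353535 | -1 | 0.0670641 | -0.627669 | -1
    7 | -1 | -0.882353 | 0.819095 | 0.278689 | -0.151515 | -0.307329 | 0.19225 | 0.00768574 | -0.966667
    8 | +1 | -0.882353 | -0.0753769 | 0.0163934 | -0.494949 | -0.903073 | -0.418778 | -0.654996 | -0.866667
    9 | +1 | -1 | 0.527638 | 0.344262 | -0.212121 | -0.356974 | 0.23696 | -0.836038 | -0.8
    10 | +1 | -0.882353 | 0.115578 | 0.0163934 | -0.737374 | -0.56974 | -0.28465 | -0.948762 | -0.933333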
  3. Create an experiment. For more information, see Algorithm modeling.
    (Figure: Experiment of Naive Bayes)
  4. Configure the parameters listed in the following table for the Naive Bayes component. Retain the default values of the parameters that are not listed in the table.
    Tab | Parameter | Description
    --- | --- | ---
    Fields Setting | Feature Column | Select the f0, f1, f2, f3, f4, f5, f6, and f7 columns from the training table.
    Fields Setting | Label Column | Select the y column from the training table.
  5. Run the experiment and view the prediction results.
    (Figure: Prediction results of Naive Bayes)
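To see what the model computes on this example, the following pure-Python sketch trains a Naive Bayes classifier on the ten training rows (label y, features f0 to f7) and predicts the ten test rows. It treats every feature as continuous and models it with a per-class Gaussian (mean and variance), which is one common choice for continuous features. This topic does not state how the PAI component estimates continuous likelihoods, so the sketch's predictions may differ from the component's output. The names `fit`, `predict`, `TRAIN`, and `TEST` are illustrative.

```python
import math
from collections import defaultdict

# (y, [f0, ..., f7]) rows copied from the training and test tables above.
TRAIN = [
    (-1, [-0.294118, 0.487437, 0.180328, -0.292929, -1, 0.00149028, -0.53117, -0.0333333]),
    (+1, [-0.882353, -0.145729, 0.0819672, -0.414141, -1, -0.207153, -0.766866, -0.666667]),
    (-1, [-0.0588235, 0.839196, 0.0491803, -1, -1, -0.305514, -0.492741, -0.633333]),
    (+1, [-0.882353, -0.105528, 0.0819672, -0.535354, -0.777778, -0.162444, -0.923997, -1]),
    (-1, [-1, 0.376884, -0.344262, -0.292929, -0.602837, 0.28465, 0.887276, -0.6]),
    (+1, [-0.411765, 0.165829, 0.213115, -1, -1, -0.23696, -0.894962, -0.7]),
    (-1, [-0.647059, -0.21608, -0.180328, -0.353535, -0.791962, -0.0760059, -0.854825, -0.833333]),
    (+1, [0.176471, 0.155779, -1, -1, -1, 0.052161, -0.952178, -0.733333]),
    (-1, [-0.764706, 0.979899, 0.147541, -0.0909091, 0.283688, -0.0909091, -0.931682, 0.0666667]),
    (-1, [-0.0588235, 0.256281, 0.57377, -1, -1, -1, -0.868488, 0.1]),
]
TEST = [
    (+1, [-0.882353, 0.0854271, 0.442623, -0.616162, -1, -0.19225, -0.725021, -0.9]),
    (+1, [-0.294118, -0.0351759, -1, -1, -1, -0.293592, -0.904355, -0.766667]),
    (+1, [-0.882353, 0.246231, 0.213115, -0.272727, -1, -0.171386, -0.981213, -0.7]),
    (-1, [-0.176471, 0.507538, 0.278689, -0.414141, -0.702128, 0.0491804, -0.475662, 0.1]),
    (-1, [-0.529412, 0.839196, -1, -1, -1, -0.153502, -0.885568, -0.5]),
    (+1, [-0.882353, 0.246231, -0.0163934, -0.353535, -1, 0.0670641, -0.627669, -1]),
    (-1, [-0.882353, 0.819095, 0.278689, -0.151515, -0.307329, 0.19225, 0.00768574, -0.966667]),
    (+1, [-0.882353, -0.0753769, 0.0163934, -0.494949, -0.903073, -0.418778, -0.654996, -0.866667]),
    (+1, [-1, 0.527638, 0.344262, -0.212121, -0.356974, 0.23696, -0.836038, -0.8]),
    (+1, [-0.882353, 0.115578, 0.0163934, -0.737374, -0.56974, -0.28465, -0.948762, -0.933333]),
]


def fit(rows, var_eps=1e-9):
    """Estimate log class priors and per-class, per-feature Gaussian mean and variance."""
    by_class = defaultdict(list)
    for y, x in rows:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        cols = list(zip(*xs))  # transpose: one tuple of values per feature
        means = [sum(c) / len(c) for c in cols]
        # var_eps keeps a constant feature from producing a zero variance.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_eps
                     for c, m in zip(cols, means)]
        model[y] = (math.log(len(xs) / len(rows)), means, variances)
    return model


def predict(model, x):
    """Return the class with the largest log prior plus summed log likelihoods."""
    def log_joint(y):
        log_prior, means, variances = model[y]
        total = log_prior
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=log_joint)


model = fit(TRAIN)
predictions = [predict(model, x) for _, x in TEST]
accuracy = sum(p == y for p, (y, _) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```

Summing log terms instead of multiplying probabilities avoids numeric underflow when there are many features, and the shared denominator P(x) is skipped because it does not change which class scores highest.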