Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent of one another given the class.
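As a brief illustration of what the independence assumption buys, the standard naive Bayes decision rule can be written as follows. The symbols are introduced here for illustration and do not come from the component's documentation: $c$ is a candidate class, $x_1, \dots, x_n$ are the feature values of one row, and $\hat{y}$ is the predicted label.

```latex
% Bayes' theorem: the posterior is proportional to prior times likelihood.
P(c \mid x_1, \dots, x_n) \propto P(c)\, P(x_1, \dots, x_n \mid c)

% Independence assumption: the joint likelihood factorizes per feature.
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

% Decision rule: choose the class with the largest posterior score.
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

How each $P(x_i \mid c)$ is estimated depends on whether the column is treated as discrete or continuous, which is what the column-parsing rules of the component control.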
Configure the component
You can configure the component by using one of the following methods:
- Use the Machine Learning Platform for AI console
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Column | By default, all columns except the label column are selected. The columns of the DOUBLE, STRING, and BIGINT types are supported. |
| Fields Setting | Excluded Columns | The columns that are not used for training. These columns cannot be used as feature columns. |
| Fields Setting | Forced Conversion Column | Columns are parsed based on the following rules: columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns; columns of the DOUBLE or BIGINT type are parsed as continuous columns. Note: To parse a column of the BIGINT type as a categorical column, you must use this parameter to specify the type. |
| Fields Setting | Label Column | The label column in the input table. The label column cannot be used as a feature column. The columns of the DOUBLE, STRING, and BIGINT types are supported. |
| Fields Setting | Input Sparse Format Data | Specifies whether the input data is in the sparse format. Data in the sparse format is presented by using key-value pairs. |
| Fields Setting | Separator between K:V when input is sparse | The delimiter that is used to separate key-value pairs. Commas (,) are used by default. |
| Fields Setting | The separator of key and value when the input is sparse | The delimiter that is used to separate keys and values. Colons (:) are used by default. |
| Tuning | Number of cores | The number of cores. By default, the system determines the value. |
| Tuning | Memory Size of Core(MB) | The memory size of each core. By default, the system determines the value. |

- Use commands
PAI -name NaiveBayes
    -project algo_public
    -DinputTablePartitions="pt=20150501"
    -DmodelName="xlab_m_NaiveBayes_23772"
    -DlabelColName="poutcome"
    -DfeatureColNames="age,previous,cons_conf_idx,euribor3m"
    -DinputTableName="bank_data_partition";
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | N/A |
| inputTablePartitions | No | The partitions that are selected from the input table for training. | All partitions |
| modelName | Yes | The name of the output model. | N/A |
| labelColName | Yes | The name of the label column in the input table. | N/A |
| featureColNames | No | The feature columns that are selected from the input table for training. | All columns except the label column |
| excludedColNames | No | The columns that are excluded from training. A column specified by this parameter cannot also be specified by the featureColNames parameter. | Empty string |
| forceCategorical | No | Columns are parsed based on the following rules: columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns; columns of the DOUBLE or BIGINT type are parsed as continuous columns. Note: To parse a column of the BIGINT type as a categorical column, you must use the forceCategorical parameter to specify the type. | INT is treated as a continuous type |
| coreNum | No | The number of cores that are used in computing. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system |
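The command format and the required parameters above can be combined into a small helper that assembles the command string and rejects a call that omits a required parameter. This is a minimal sketch, not part of the platform: the function name `build_naive_bayes_command` and the `REQUIRED` tuple are hypothetical, and the only things taken from the documentation are the parameter names, the `-Dname="value"` flag style, and the `algo_public` project shown in the sample command.

```python
# Hypothetical helper: builds a PAI NaiveBayes command string.
# Parameter names come from the parameter table; everything else is illustrative.
REQUIRED = ("inputTableName", "modelName", "labelColName")


def build_naive_bayes_command(params, project="algo_public"):
    """Return a PAI command string for the given parameter dict."""
    # The table marks these three parameters as required.
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Each parameter is passed as -Dname="value", as in the sample command.
    flags = " ".join(f'-D{name}="{value}"' for name, value in params.items())
    return f"PAI -name NaiveBayes -project {project} {flags};"


command = build_naive_bayes_command({
    "inputTableName": "bank_data_partition",
    "modelName": "xlab_m_NaiveBayes_23772",
    "labelColName": "poutcome",
})
print(command)
```

Optional parameters such as featureColNames or inputTablePartitions can be added to the dict in the same way; parameters that are left out fall back to the defaults listed in the table.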
Example
- Prepare the following training data.
| id | y | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | -1 | -0.294118 | 0.487437 | 0.180328 | -0.292929 | -1 | 0.00149028 | -0.53117 | -0.0333333 |
| 2 | +1 | -0.882353 | -0.145729 | 0.0819672 | -0.414141 | -1 | -0.207153 | -0.766866 | -0.666667 |
| 3 | -1 | -0.0588235 | 0.839196 | 0.0491803 | -1 | -1 | -0.305514 | -0.492741 | -0.633333 |
| 4 | +1 | -0.882353 | -0.105528 | 0.0819672 | -0.535354 | -0.777778 | -0.162444 | -0.923997 | -1 |
| 5 | -1 | -1 | 0.376884 | -0.344262 | -0.292929 | -0.602837 | 0.28465 | 0.887276 | -0.6 |
| 6 | +1 | -0.411765 | 0.165829 | 0.213115 | -1 | -1 | -0.23696 | -0.894962 | -0.7 |
| 7 | -1 | -0.647059 | -0.21608 | -0.180328 | -0.353535 | -0.791962 | -0.0760059 | -0.854825 | -0.833333 |
| 8 | +1 | 0.176471 | 0.155779 | -1 | -1 | -1 | 0.052161 | -0.952178 | -0.733333 |
| 9 | -1 | -0.764706 | 0.979899 | 0.147541 | -0.0909091 | 0.283688 | -0.0909091 | -0.931682 | 0.0666667 |
| 10 | -1 | -0.0588235 | 0.256281 | 0.57377 | -1 | -1 | -1 | -0.868488 | 0.1 |

- Prepare the following test data.

| id | y | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | +1 | -0.882353 | 0.0854271 | 0.442623 | -0.616162 | -1 | -0.19225 | -0.725021 | -0.9 |
| 2 | +1 | -0.294118 | -0.0351759 | -1 | -1 | -1 | -0.293592 | -0.904355 | -0.766667 |
| 3 | +1 | -0.882353 | 0.246231 | 0.213115 | -0.272727 | -1 | -0.171386 | -0.981213 | -0.7 |
| 4 | -1 | -0.176471 | 0.507538 | 0.278689 | -0.414141 | -0.702128 | 0.0491804 | -0.475662 | 0.1 |
| 5 | -1 | -0.529412 | 0.839196 | -1 | -1 | -1 | -0.153502 | -0.885568 | -0.5 |
| 6 | +1 | -0.882353 | 0.246231 | -0.0163934 | -0.353535 | -1 | 0.0670641 | -0.627669 | -1 |
| 7 | -1 | -0.882353 | 0.819095 | 0.278689 | -0.151515 | -0.307329 | 0.19225 | 0.00768574 | -0.966667 |
| 8 | +1 | -0.882353 | -0.0753769 | 0.0163934 | -0.494949 | -0.903073 | -0.418778 | -0.654996 | -0.866667 |
| 9 | +1 | -1 | 0.527638 | 0.344262 | -0.212121 | -0.356974 | 0.23696 | -0.836038 | -0.8 |
| 10 | +1 | -0.882353 | 0.115578 | 0.0163934 | -0.737374 | -0.56974 | -0.28465 | -0.948762 | -0.933333 |

- Create an experiment. For more information, see Generate a model by using an algorithm.
- Configure the parameters listed in the following table for the Naive Bayes component.
Retain the default values of the parameters that are not listed in the table.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Column | Select the f0, f1, f2, f3, f4, f5, f6, and f7 columns from the training table. |
| Fields Setting | Label Column | Select the y column from the training table. |

- Run the experiment and view the prediction results.
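To get a feel for what the experiment computes, the example can be approximated locally with a small Gaussian naive Bayes written in plain Python. This is a sketch under stated assumptions, not the component's implementation: the documentation does not say how continuous columns are modeled, so the per-class Gaussian likelihood, the variance floor `eps`, and the names `fit`, `predict`, `TRAIN`, and `TEST_FEATURES` are all assumptions made for illustration. The rows are the training and test data from this example, and the local predictions are not guaranteed to match the platform's output.

```python
import math

# Training rows from the example: (label y, [f0..f7]).
TRAIN = [
    (-1, [-0.294118, 0.487437, 0.180328, -0.292929, -1, 0.00149028, -0.53117, -0.0333333]),
    (+1, [-0.882353, -0.145729, 0.0819672, -0.414141, -1, -0.207153, -0.766866, -0.666667]),
    (-1, [-0.0588235, 0.839196, 0.0491803, -1, -1, -0.305514, -0.492741, -0.633333]),
    (+1, [-0.882353, -0.105528, 0.0819672, -0.535354, -0.777778, -0.162444, -0.923997, -1]),
    (-1, [-1, 0.376884, -0.344262, -0.292929, -0.602837, 0.28465, 0.887276, -0.6]),
    (+1, [-0.411765, 0.165829, 0.213115, -1, -1, -0.23696, -0.894962, -0.7]),
    (-1, [-0.647059, -0.21608, -0.180328, -0.353535, -0.791962, -0.0760059, -0.854825, -0.833333]),
    (+1, [0.176471, 0.155779, -1, -1, -1, 0.052161, -0.952178, -0.733333]),
    (-1, [-0.764706, 0.979899, 0.147541, -0.0909091, 0.283688, -0.0909091, -0.931682, 0.0666667]),
    (-1, [-0.0588235, 0.256281, 0.57377, -1, -1, -1, -0.868488, 0.1]),
]

# Feature columns f0..f7 of the test rows, in id order.
TEST_FEATURES = [
    [-0.882353, 0.0854271, 0.442623, -0.616162, -1, -0.19225, -0.725021, -0.9],
    [-0.294118, -0.0351759, -1, -1, -1, -0.293592, -0.904355, -0.766667],
    [-0.882353, 0.246231, 0.213115, -0.272727, -1, -0.171386, -0.981213, -0.7],
    [-0.176471, 0.507538, 0.278689, -0.414141, -0.702128, 0.0491804, -0.475662, 0.1],
    [-0.529412, 0.839196, -1, -1, -1, -0.153502, -0.885568, -0.5],
    [-0.882353, 0.246231, -0.0163934, -0.353535, -1, 0.0670641, -0.627669, -1],
    [-0.882353, 0.819095, 0.278689, -0.151515, -0.307329, 0.19225, 0.00768574, -0.966667],
    [-0.882353, -0.0753769, 0.0163934, -0.494949, -0.903073, -0.418778, -0.654996, -0.866667],
    [-1, 0.527638, 0.344262, -0.212121, -0.356974, 0.23696, -0.836038, -0.8],
    [-0.882353, 0.115578, 0.0163934, -0.737374, -0.56974, -0.28465, -0.948762, -0.933333],
]


def fit(rows, eps=1e-9):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = {}
    for label, feats in rows:
        by_class.setdefault(label, []).append(feats)
    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        columns = list(zip(*samples))
        means = [sum(col) / n for col in columns]
        # eps is an assumed floor that keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predict(model, feats):
    """Return the class with the largest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(feats, means, variances))
    return max(scores, key=scores.get)


model = fit(TRAIN)
predictions = [predict(model, feats) for feats in TEST_FEATURES]
print(predictions)
```

Working in log space avoids the numerical underflow that multiplying eight small densities can cause. Comparing `predictions` with the y column of the test data gives a rough local accuracy figure to set beside the results shown by the experiment.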