Naive Bayes

Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent of each other given the class label.
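The independence assumption can be written out as a short worked formula. The notation here is a standard textbook convention and is not taken from the PAI documentation: c is a class label, x_1 through x_n are the feature values of one sample, and \hat{y} is the predicted label.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\qquad\Longrightarrow\qquad
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The model therefore only needs the prior of each class and one per-feature conditional distribution per class, which is what keeps training inexpensive.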

Configure the component

You can use one of the following methods to configure the Naive Bayes component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Naive Bayes component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.


Tab	Parameter	Description
Fields Setting	Feature Column	The feature columns used for training. By default, all columns except the label column are selected. Columns of the DOUBLE, STRING, and BIGINT types are supported.
Fields Setting	Excluded Columns	The columns that are not used for training. These columns cannot be used as feature columns.
Fields Setting	Forced Conversion Column	The columns whose type is forcibly converted. By default, columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns, and columns of the DOUBLE or BIGINT type are parsed as continuous columns. Note: To parse a BIGINT column as a categorical column, you must specify the column in this parameter.
Fields Setting	Label Column	The label column in the input table. The label column cannot be used as a feature column. Columns of the DOUBLE, STRING, and BIGINT types are supported.
Fields Setting	Input Sparse Format Data	Specifies whether the input data is in the sparse format. Data in the sparse format is represented as key-value pairs.
Fields Setting	Separator between K:V when input is sparse	The delimiter that separates one key-value pair from the next. Default: comma (,).
Fields Setting	The separator of key and value when the input is sparse	The delimiter that separates a key from its value within a pair. Default: colon (:).
Tuning	Number of cores	The number of cores. By default, the system determines the value.
Tuning	Memory Size of Core(MB)	The memory size of each core. By default, the system determines the value.
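To make the two sparse-format delimiters concrete, the following minimal Python sketch splits one sparse string into key-value pairs by using the default comma and colon. The function name parse_sparse_row and the sample string are hypothetical illustrations; this is not the parser that PAI uses internally.

```python
def parse_sparse_row(row, pair_sep=",", kv_sep=":"):
    """Parse one sparse-format string into a {key: value} dict.

    pair_sep separates key-value pairs from each other (default: comma).
    kv_sep separates a key from its value inside a pair (default: colon).
    """
    features = {}
    for pair in row.split(pair_sep):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing separator
        key, value = pair.split(kv_sep, 1)
        features[key.strip()] = float(value)
    return features


# Hypothetical sample: feature "1" has value 0.3 and feature "3" has value 0.9.
print(parse_sparse_row("1:0.3,3:0.9"))  # {'1': 0.3, '3': 0.9}
```

Features that do not appear in the string are simply absent from the result, which is the point of the sparse format.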

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name NaiveBayes -project algo_public
    -DinputTablePartitions="pt=20150501"
    -DmodelName="xlab_m_NaiveBayes_23772"
    -DlabelColName="poutcome"
    -DfeatureColNames="age,previous,cons_conf_idx,euribor3m"
    -DinputTableName="bank_data_partition";


Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table.	N/A
inputTablePartitions	No	The partitions that are selected from the input table for training.	All partitions
modelName	Yes	The name of the output model.	N/A
labelColName	Yes	The name of the label column in the input table.	N/A
featureColNames	No	The feature columns that are selected from the input table for training.	All columns except the label column
excludedColNames	No	The columns that are excluded from training. These columns cannot also be specified in the featureColNames parameter.	Empty string
forceCategorical	No	The columns whose type is forcibly converted. By default, columns of the STRING, BOOLEAN, or DATETIME type are parsed as discrete columns, and columns of the DOUBLE or BIGINT type are parsed as continuous columns. Note: To parse a BIGINT column as a categorical column, you must specify the column in the forceCategorical parameter.	By default, INT columns are parsed as continuous.
coreNum	No	The number of cores that are used in computing.	Determined by the system
memSizePerCore	No	The memory size of each core. Valid values: 1 to 65536. Unit: MB.	Determined by the system
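When the command is generated from a script, it can help to check the required parameters before the command is submitted. The following Python sketch only assembles the command text in the same layout as the example above. The helper build_pai_command is hypothetical and is not part of any PAI SDK, and the list of required parameters is taken from the table above.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = ("inputTableName", "modelName", "labelColName")


def build_pai_command(name, project, params):
    """Render a PAI command string from a dict of -D parameters."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = [f"PAI -name {name} -project {project}"]
    for key, value in params.items():
        lines.append(f'    -D{key}="{value}"')
    return "\n".join(lines) + ";"


command = build_pai_command(
    "NaiveBayes",
    "algo_public",
    {
        "inputTableName": "bank_data_partition",
        "inputTablePartitions": "pt=20150501",
        "modelName": "xlab_m_NaiveBayes_23772",
        "labelColName": "poutcome",
        "featureColNames": "age,previous,cons_conf_idx,euribor3m",
    },
)
print(command)
```

The resulting text can be pasted into the SQL Script component. Optional parameters such as coreNum are added to the dict in the same way.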

Example

Prepare the following training data.


id	y	f0	f1	f2	f3	f4	f5	f6	f7
1	-1	-0.294118	0.487437	0.180328	-0.292929	-1	0.00149028	-0.53117	-0.0333333
2	+1	-0.882353	-0.145729	0.0819672	-0.414141	-1	-0.207153	-0.766866	-0.666667
3	-1	-0.0588235	0.839196	0.0491803	-1	-1	-0.305514	-0.492741	-0.633333
4	+1	-0.882353	-0.105528	0.0819672	-0.535354	-0.777778	-0.162444	-0.923997	-1
5	-1	-1	0.376884	-0.344262	-0.292929	-0.602837	0.28465	0.887276	-0.6
6	+1	-0.411765	0.165829	0.213115	-1	-1	-0.23696	-0.894962	-0.7
7	-1	-0.647059	-0.21608	-0.180328	-0.353535	-0.791962	-0.0760059	-0.854825	-0.833333
8	+1	0.176471	0.155779	-1	-1	-1	0.052161	-0.952178	-0.733333
9	-1	-0.764706	0.979899	0.147541	-0.0909091	0.283688	-0.0909091	-0.931682	0.0666667
10	-1	-0.0588235	0.256281	0.57377	-1	-1	-1	-0.868488	0.1

Prepare the following test data.


id	y	f0	f1	f2	f3	f4	f5	f6	f7
1	+1	-0.882353	0.0854271	0.442623	-0.616162	-1	-0.19225	-0.725021	-0.9
2	+1	-0.294118	-0.0351759	-1	-1	-1	-0.293592	-0.904355	-0.766667
3	+1	-0.882353	0.246231	0.213115	-0.272727	-1	-0.171386	-0.981213	-0.7
4	-1	-0.176471	0.507538	0.278689	-0.414141	-0.702128	0.0491804	-0.475662	0.1
5	-1	-0.529412	0.839196	-1	-1	-1	-0.153502	-0.885568	-0.5
6	+1	-0.882353	0.246231	-0.0163934	-0.353535	-1	0.0670641	-0.627669	-1
7	-1	-0.882353	0.819095	0.278689	-0.151515	-0.307329	0.19225	0.00768574	-0.966667
8	+1	-0.882353	-0.0753769	0.0163934	-0.494949	-0.903073	-0.418778	-0.654996	-0.866667
9	+1	-1	0.527638	0.344262	-0.212121	-0.356974	0.23696	-0.836038	-0.8
10	+1	-0.882353	0.115578	0.0163934	-0.737374	-0.56974	-0.28465	-0.948762	-0.933333

Create an experiment. For more information, see Algorithm modeling.
Configure the parameters listed in the following table for the Naive Bayes component. Retain the default values of the parameters that are not listed in the table.

Tab	Parameter	Description
Fields Setting	Feature Column	Select the f0, f1, f2, f3, f4, f5, f6, and f7 columns from the training table.
Fields Setting	Label Column	Select the y column from the training table.

Run the experiment and view the prediction results.
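To see what training on this data involves, the following pure-Python sketch fits a Gaussian Naive Bayes model to the ten training rows above and classifies the first test row. It rests on assumptions that the source does not confirm: continuous features are modeled with one normal distribution per class, and variances are floored at a small constant (var_floor) for numerical stability. The component in PAI may estimate the distributions differently, so the predictions from this sketch are not the component's output.

```python
import math
from collections import defaultdict

# Training rows from the example above: (y, [f0, ..., f7]).
TRAIN = [
    ("-1", [-0.294118, 0.487437, 0.180328, -0.292929, -1, 0.00149028, -0.53117, -0.0333333]),
    ("+1", [-0.882353, -0.145729, 0.0819672, -0.414141, -1, -0.207153, -0.766866, -0.666667]),
    ("-1", [-0.0588235, 0.839196, 0.0491803, -1, -1, -0.305514, -0.492741, -0.633333]),
    ("+1", [-0.882353, -0.105528, 0.0819672, -0.535354, -0.777778, -0.162444, -0.923997, -1]),
    ("-1", [-1, 0.376884, -0.344262, -0.292929, -0.602837, 0.28465, 0.887276, -0.6]),
    ("+1", [-0.411765, 0.165829, 0.213115, -1, -1, -0.23696, -0.894962, -0.7]),
    ("-1", [-0.647059, -0.21608, -0.180328, -0.353535, -0.791962, -0.0760059, -0.854825, -0.833333]),
    ("+1", [0.176471, 0.155779, -1, -1, -1, 0.052161, -0.952178, -0.733333]),
    ("-1", [-0.764706, 0.979899, 0.147541, -0.0909091, 0.283688, -0.0909091, -0.931682, 0.0666667]),
    ("-1", [-0.0588235, 0.256281, 0.57377, -1, -1, -1, -0.868488, 0.1]),
]


def fit(rows, var_floor=1e-3):
    """Estimate the log prior, per-feature means, and variances for each class."""
    by_label = defaultdict(list)
    for label, features in rows:
        by_label[label].append(features)
    model = {}
    for label, samples in by_label.items():
        n = len(samples)
        columns = list(zip(*samples))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)  # assumed floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, features):
    """Return the class with the highest log prior plus Gaussian log likelihood."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(features, means, variances)
        )
        scores[label] = log_prior + log_likelihood
    return max(scores, key=scores.get)


model = fit(TRAIN)
# First row of the test data above (its y value is +1).
first_test_row = [-0.882353, 0.0854271, 0.442623, -0.616162, -1, -0.19225, -0.725021, -0.9]
print(predict(model, first_test_row))
```

Working in log space avoids the underflow that multiplying many small probabilities would cause.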