
Understanding the Bayesian Formula and Naive Bayes Classifiers with Scikit-learn

This article explains Bayes' theorem, introduces the Bayesian classifier, and details three Naive Bayes algorithms (Gaussian, Multinomial, and Bernoulli) along with their scikit-learn implementations, key parameters, attributes, methods, and typical text-classification applications such as spam filtering.

Python Programming Learning Circle

Bayes' theorem relates prior probabilities, likelihoods, evidence, and posterior probabilities. Given mutually exclusive and exhaustive events F₁, F₂, …, Fₙ and new evidence E, the probability of Fₖ after observing E is

P(Fₖ|E) = P(E|Fₖ)·P(Fₖ) / Σᵢ P(E|Fᵢ)·P(Fᵢ)

where the denominator is the total probability of the evidence, P(E).

In this formula:

Prior Probability (P(Fₖ)) : the initial belief before seeing evidence.

Likelihood (P(E|Fₖ)) : the probability of observing the evidence if Fₖ is true.

Evidence (P(E)) : the normalizing factor ensuring probabilities sum to one.

Posterior Probability (P(Fₖ|E)) : the updated belief after incorporating the evidence.

The Bayesian classifier applies this theorem to classification problems. For an instance x and classes c₁, c₂, …, cₙ, it computes the posterior probability for each class and assigns x to the class with the highest posterior.
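The decision rule above can be sketched by hand in a few lines. This is a minimal illustration, not library code; the class names, priors, and likelihoods below are made-up numbers chosen for the example.

```python
# Applying Bayes' theorem as a classifier by hand.
# Priors and likelihoods are illustrative, not from real data.
priors = {"spam": 0.4, "ham": 0.6}        # P(F_k)
likelihoods = {"spam": 0.8, "ham": 0.1}   # P(E | F_k) for observed evidence E

# Evidence: P(E) = sum over classes of P(E | F_k) * P(F_k)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(F_k | E) for each class
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Assign the instance to the class with the highest posterior
prediction = max(posteriors, key=posteriors.get)
print(prediction)  # spam (posterior ~0.842 vs ~0.158)
```

Note how the evidence term only rescales the posteriors; the argmax decision depends on the numerators alone, which is why many implementations skip the normalization.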

Scikit-learn implements three common Naive Bayes algorithms:

1. Gaussian Naive Bayes (sklearn.naive_bayes.GaussianNB)

Designed for continuous features that are assumed to follow a Gaussian distribution within each class. Important parameters, attributes, and methods include:

priors : optional prior probabilities (default None). Example to set custom priors:

>>> gnb.set_params(priors=[0.6, 0.4])

Resulting estimator:

GaussianNB(priors=[0.6, 0.4], var_smoothing=1e-09)

class_prior_ : array of learned prior probabilities.

class_count_ : number of training samples per class.

theta_ : mean of each feature per class.

sigma_ : variance of each feature per class (newer scikit-learn releases expose this as var_ ).

>>> gnb.class_prior_
array([0.625, 0.375])
>>> gnb.class_count_
array([5., 3.])
>>> gnb.theta_
array([[-3., -3.],
 [ 2.,  2.]])
>>> gnb.sigma_
array([[2.00000001, 2.00000001],
 [0.66666667, 0.66666667]])

Key methods:

get_params(deep=True) – returns a dictionary of parameters.

set_params(...) – modifies parameters such as priors.

fit(X, y) – trains the model.

partial_fit(...) – incremental learning for large datasets.

predict(X), predict_proba(X), predict_log_proba(X) – inference utilities.

score(X, y) – returns classification accuracy.
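The methods above combine into the usual fit/predict workflow. The following is a minimal sketch with a tiny made-up dataset (two well-separated 2-D clusters), so the specific numbers are for illustration only:

```python
# Minimal GaussianNB workflow on illustrative data:
# class 0 clusters near (-3, -3), class 1 near (2, 2).
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-3.2, -2.8], [-2.9, -3.1], [-3.0, -3.0],
              [2.1, 1.9], [1.9, 2.1], [2.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB()
gnb.fit(X, y)

print(gnb.class_count_)           # samples per class: [3. 3.]
print(gnb.theta_)                 # per-class feature means
print(gnb.predict([[2.0, 2.2]]))  # point near class 1's mean -> [1]
print(gnb.score(X, y))            # training accuracy
```

With clusters this well separated, the training accuracy is 1.0; on realistic data you would evaluate score on a held-out test set instead.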

2. Multinomial Naive Bayes (sklearn.naive_bayes.MultinomialNB)

Suited for discrete features such as word counts in text classification. Key parameters include:

alpha : smoothing parameter (default 1.0).

fit_prior : whether to learn class priors (default True).

class_prior : explicit prior probabilities (default None).

Additional attributes:

class_log_prior_ : log of smoothed class priors.

intercept_ : same as class_log_prior_ when expressed as a linear model.

coef_ : same as feature_log_prob_ in linear‑model form.

feature_log_prob_ : log probability of each feature per class.

>>> mnb.feature_log_prob_
array([[-2.15948425, -1.46633707, -1.178655  , -1.06087196],
 [-1.89711998, -1.60943791, -1.04982212, -1.2039728 ],
 [-1.02961942, -1.94591015, -1.54044504, -1.25276297],
 [-1.89711998, -1.38629436, -1.2039728 , -1.2039728 ]])

Feature counts per class are stored in feature_count_ :

>>> mnb.feature_count_
array([[2., 5., 7., 8.],
 [2., 3., 6., 5.],
 [9., 3., 5., 7.],
 [2., 4., 5., 5.]])
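A typical MultinomialNB workflow feeds it rows of word counts. Here is a minimal sketch; the four-word vocabulary, counts, and class labels are invented for illustration:

```python
# MultinomialNB on tiny word-count vectors (data is made up).
# Rows = documents, columns = counts of four vocabulary words.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1, 0],
              [2, 0, 2, 0],
              [0, 4, 0, 1],
              [0, 3, 0, 2]])
y = np.array(["sports", "sports", "finance", "finance"])

mnb = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing
mnb.fit(X, y)

print(mnb.classes_)                  # ['finance' 'sports']
print(mnb.predict([[1, 0, 2, 0]]))  # counts match the sports docs -> ['sports']
```

The alpha smoothing term keeps feature_log_prob_ finite even for words a class never saw, which matters with sparse text counts.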

3. Bernoulli Naive Bayes (sklearn.naive_bayes.BernoulliNB)

Also for discrete data but uses binary/boolean features instead of counts, making it suitable for presence/absence representations.
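Usage mirrors MultinomialNB, except each feature is a 0/1 presence flag rather than a count. A minimal sketch with invented binary data:

```python
# BernoulliNB on binary presence/absence features (data is made up).
# 1 = word present in the document, 0 = absent.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
y = np.array([1, 1, 0, 0])

bnb = BernoulliNB(alpha=1.0)
bnb.fit(X, y)
print(bnb.predict([[1, 1, 0, 0]]))  # -> [1]
```

Unlike MultinomialNB, BernoulliNB also penalizes the *absence* of features, so a word that is missing from a document actively counts as evidence against classes where it is common.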

Application Scenarios

Naive Bayes classifiers are frequently applied to text classification tasks such as categorizing online news articles, detecting spam emails, and other natural‑language processing problems.
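For spam filtering, the usual pattern is to convert raw text to count vectors and feed them to MultinomialNB. The sketch below chains the two with a scikit-learn pipeline; the four-message corpus and its labels are fabricated for illustration:

```python
# Minimal spam-filter sketch: CountVectorizer + MultinomialNB.
# The corpus and labels are toy data, not a real spam dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "free money click now",
         "meeting agenda attached", "lunch tomorrow at noon"]
labels = ["spam", "spam", "ham", "ham"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["claim your free prize"]))  # -> ['spam']
```

Words unseen during training ("claim", "your") simply fall outside the learned vocabulary and are ignored; the prediction is driven by "free" and "prize", which appeared only in spam messages.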

Disclaimer: Content adapted from https://www.jianshu.com/p/48ee3eb0820c .

Written by Python Programming Learning Circle, a global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full-stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.