
Understanding the Bayesian Formula and Naive Bayes Classifiers with Scikit-learn

This article explains Bayes' theorem, introduces the Bayesian classifier, and details three Naive Bayes algorithms (Gaussian, Multinomial, and Bernoulli) along with their scikit-learn implementations, key parameters, attributes, methods, and typical text-classification applications such as spam filtering.

Python Programming Learning Circle

Bayes' theorem relates prior probabilities, likelihoods, evidence, and posterior probabilities. Given mutually exclusive and exhaustive events F₁, F₂, …, Fₙ and new evidence E, the probability of Fₖ after observing E is

P(Fₖ|E) = P(E|Fₖ)·P(Fₖ) / Σᵢ P(E|Fᵢ)·P(Fᵢ)

where the denominator is the total probability of the evidence, P(E).

In this formula:

Prior Probability (P(Fₖ)) : the initial belief before seeing evidence.

Likelihood (P(E|Fₖ)) : the probability of observing the evidence if Fₖ is true.

Evidence (P(E)) : the normalizing factor ensuring probabilities sum to one.

Posterior Probability (P(Fₖ|E)) : the updated belief after incorporating the evidence.

The Bayesian classifier applies this theorem to classification problems. For an instance x and classes c₁, c₂, …, cₙ, it computes the posterior probability for each class and assigns x to the class with the highest posterior.
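The decision rule above can be sketched by hand in a few lines. This is a minimal illustration, not library code; the class names, priors, and likelihoods below are made-up numbers chosen for the example.

```python
# Applying Bayes' theorem as a classifier by hand.
# Priors and likelihoods are illustrative, not from real data.
priors = {"spam": 0.4, "ham": 0.6}        # P(F_k)
likelihoods = {"spam": 0.8, "ham": 0.1}   # P(E | F_k) for observed evidence E

# Evidence: P(E) = sum over classes of P(E | F_k) * P(F_k)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(F_k | E) for each class
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Assign the instance to the class with the highest posterior
prediction = max(posteriors, key=posteriors.get)
print(prediction)  # spam (posterior ~0.842 vs ~0.158)
```

Note how the evidence term only rescales the posteriors; the argmax decision depends on the numerators alone, which is why many implementations skip the normalization.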

Scikit-learn implements three common Naive Bayes algorithms:

1. Gaussian Naive Bayes (sklearn.naive_bayes.GaussianNB)

Designed for continuous features that are assumed to follow a Gaussian distribution within each class. Important parameters, attributes, and methods include:

priors : optional prior probabilities (default None). Example to set custom priors:

>>> gnb.set_params(priors=[0.6, 0.4])

Resulting estimator:

GaussianNB(priors=[0.6, 0.4], var_smoothing=1e-09)

class_prior_ : array of learned prior probabilities.

class_count_ : number of training samples per class.

theta_ : mean of each feature per class.

sigma_ : variance of each feature per class (newer scikit-learn releases expose this as var_ ).

>>> gnb.class_prior_
array([0.625, 0.375])
>>> gnb.class_count_
array([5., 3.])
>>> gnb.theta_
array([[-3., -3.],
 [ 2.,  2.]])
>>> gnb.sigma_
array([[2.00000001, 2.00000001],
 [0.66666667, 0.66666667]])

Key methods:

get_params(deep=True) – returns a dictionary of parameters.

set_params(...) – modifies parameters such as priors.

fit(X, y) – trains the model.

partial_fit(...) – incremental learning for large datasets.

predict(X), predict_proba(X), predict_log_proba(X) – inference utilities.

score(X, y) – returns classification accuracy.
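The methods above combine into the usual fit/predict workflow. The following is a minimal sketch with a tiny made-up dataset (two well-separated 2-D clusters), so the specific numbers are for illustration only:

```python
# Minimal GaussianNB workflow on illustrative data:
# class 0 clusters near (-3, -3), class 1 near (2, 2).
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-3.2, -2.8], [-2.9, -3.1], [-3.0, -3.0],
              [2.1, 1.9], [1.9, 2.1], [2.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB()
gnb.fit(X, y)

print(gnb.class_count_)           # samples per class: [3. 3.]
print(gnb.theta_)                 # per-class feature means
print(gnb.predict([[2.0, 2.2]]))  # point near class 1's mean -> [1]
print(gnb.score(X, y))            # training accuracy
```

With clusters this well separated, the training accuracy is 1.0; on realistic data you would evaluate score on a held-out test set instead.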

2. Multinomial Naive Bayes (sklearn.naive_bayes.MultinomialNB)

Suited for discrete features such as word counts in text classification. Key parameters include:

alpha : smoothing parameter (default 1.0).

fit_prior : whether to learn class priors (default True).

class_prior : explicit prior probabilities (default None).

Additional attributes:

class_log_prior_ : log of smoothed class priors.

intercept_ : same as class_log_prior_ when expressed as a linear model.

coef_ : same as feature_log_prob_ in linear‑model form.

feature_log_prob_ : log probability of each feature per class.

>>> mnb.feature_log_prob_
array([[-2.15948425, -1.46633707, -1.178655  , -1.06087196],
 [-1.89711998, -1.60943791, -1.04982212, -1.2039728 ],
 [-1.02961942, -1.94591015, -1.54044504, -1.25276297],
 [-1.89711998, -1.38629436, -1.2039728 , -1.2039728 ]])

Feature counts per class are stored in feature_count_ :

>>> mnb.feature_count_
array([[2., 5., 7., 8.],
 [2., 3., 6., 5.],
 [9., 3., 5., 7.],
 [2., 4., 5., 5.]])
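A typical MultinomialNB workflow feeds it rows of word counts. Here is a minimal sketch; the four-word vocabulary, counts, and class labels are invented for illustration:

```python
# MultinomialNB on tiny word-count vectors (data is made up).
# Rows = documents, columns = counts of four vocabulary words.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1, 0],
              [2, 0, 2, 0],
              [0, 4, 0, 1],
              [0, 3, 0, 2]])
y = np.array(["sports", "sports", "finance", "finance"])

mnb = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing
mnb.fit(X, y)

print(mnb.classes_)                  # ['finance' 'sports']
print(mnb.predict([[1, 0, 2, 0]]))  # counts match the sports docs -> ['sports']
```

The alpha smoothing term keeps feature_log_prob_ finite even for words a class never saw, which matters with sparse text counts.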

3. Bernoulli Naive Bayes (sklearn.naive_bayes.BernoulliNB)

Also for discrete data but uses binary/boolean features instead of counts, making it suitable for presence/absence representations.
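Usage mirrors MultinomialNB, except each feature is a 0/1 presence flag rather than a count. A minimal sketch with invented binary data:

```python
# BernoulliNB on binary presence/absence features (data is made up).
# 1 = word present in the document, 0 = absent.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
y = np.array([1, 1, 0, 0])

bnb = BernoulliNB(alpha=1.0)
bnb.fit(X, y)
print(bnb.predict([[1, 1, 0, 0]]))  # -> [1]
```

Unlike MultinomialNB, BernoulliNB also penalizes the *absence* of features, so a word that is missing from a document actively counts as evidence against classes where it is common.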

Application Scenarios

Naive Bayes classifiers are frequently applied to text classification tasks such as categorizing online news articles, detecting spam emails, and other natural‑language processing problems.
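For spam filtering, the usual pattern is to convert raw text to count vectors and feed them to MultinomialNB. The sketch below chains the two with a scikit-learn pipeline; the four-message corpus and its labels are fabricated for illustration:

```python
# Minimal spam-filter sketch: CountVectorizer + MultinomialNB.
# The corpus and labels are toy data, not a real spam dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "free money click now",
         "meeting agenda attached", "lunch tomorrow at noon"]
labels = ["spam", "spam", "ham", "ham"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["claim your free prize"]))  # -> ['spam']
```

Words unseen during training ("claim", "your") simply fall outside the learned vocabulary and are ignored; the prediction is driven by "free" and "prize", which appeared only in spam messages.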

Disclaimer: Content adapted from https://www.jianshu.com/p/48ee3eb0820c .

Written by Python Programming Learning Circle, a global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full-stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.