Common Probability Distributions and Their Visualization with Python

This article explains the fundamentals of several common probability distributions—including uniform, normal, log‑normal, Poisson, exponential, binomial, Student's t, and chi‑squared—and provides complete Python code to generate and plot each distribution for data‑science and machine‑learning applications.

Python Programming Learning Circle

Mar 27, 2024

Probability and statistics are essential foundations for data science and machine learning; understanding data distributions helps model real‑world phenomena and assess variability.

Uniform Distribution

The uniform distribution assigns equal probability to all outcomes. A discrete example is a fair die (probability 1/6 for each face), while the continuous uniform distribution on an interval [a, b] has constant density 1/(b − a) inside the interval and 0 outside it.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# continuous uniform
a = 0
b = 50
size = 5000
X_continuous = np.linspace(a, b, size)
continuous_uniform = stats.uniform(loc=a, scale=b - a)  # SciPy covers [loc, loc + scale], so scale is the width
continuous_uniform_pdf = continuous_uniform.pdf(X_continuous)

# discrete uniform
X_discrete = np.arange(1, 7)
discrete_uniform = stats.randint(1, 7)
discrete_uniform_pmf = discrete_uniform.pmf(X_discrete)

# plot both
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax[0].bar(X_discrete, discrete_uniform_pmf)
ax[0].set_xlabel("X")
ax[0].set_ylabel("Probability")
ax[0].set_title("Discrete Uniform Distribution")
ax[1].plot(X_continuous, continuous_uniform_pdf)
ax[1].set_xlabel("X")
ax[1].set_ylabel("Probability density")
ax[1].set_title("Continuous Uniform Distribution")
plt.show()
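
As a quick sanity check on the two definitions, the sketch below evaluates the density and the mass function directly instead of plotting them. It relies on SciPy's convention that `stats.uniform(loc, scale)` covers the interval [loc, loc + scale]; the endpoints and variable names are illustrative assumptions, not values from the article.

```python
import numpy as np
from scipy import stats

a, b = 10, 50  # illustrative endpoints (assumed values)

# SciPy's continuous uniform covers [loc, loc + scale], so scale is the width.
continuous_uniform = stats.uniform(loc=a, scale=b - a)
density_inside = continuous_uniform.pdf(30)    # 1 / (b - a) inside the interval
density_outside = continuous_uniform.pdf(60)   # 0 outside the interval

# stats.randint(low, high) excludes `high`, so this models a six-sided die.
die = stats.randint(1, 7)
die_pmf = die.pmf(np.arange(1, 7))

print(density_inside, density_outside)
print(die_pmf)
```

With a nonzero lower endpoint, passing `scale=b` instead of `scale=b - a` would silently stretch the interval to [a, a + b], which is why the width is computed explicitly here.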

Normal (Gaussian) Distribution

The normal distribution, also known as the Gaussian or bell curve, is defined by its mean μ and standard deviation σ. The mean, median, and mode coincide, and the total area under the curve equals 1.

mu = 0
variance = 1
sigma = np.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
plt.subplots(figsize=(8, 5))
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.title("Normal Distribution")
plt.show()

The empirical rule states that about 68% of data fall within ±1σ, 95% within ±2σ, and 99.7% within ±3σ of the mean.
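
The empirical rule can be checked numerically from the normal CDF. The following is a minimal sketch for a standard normal variable; the `coverage` dictionary is a name introduced here for illustration.

```python
from scipy import stats

# P(|Z| <= k) for a standard normal, computed as CDF(k) - CDF(-k).
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k}σ: {p:.4f}")
```

Because any normal variable can be standardized as (X − μ)/σ, the same three probabilities hold for every choice of μ and σ.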

Log‑Normal Distribution

A log‑normal distribution arises when the logarithm of a variable follows a normal distribution. It is defined only for positive values and produces a right‑skewed curve.

X = np.linspace(0.01, 6, 500)  # the log-normal is defined only for x > 0
fig, ax = plt.subplots(figsize=(8, 5))
# (mean, std) of the underlying normal distribution of log(X)
for mean, std in [(0, 1), (0, 0.5), (1, 1.5)]:
    # SciPy parametrization: s = σ and scale = exp(μ); loc only shifts the curve
    lognorm_distribution = stats.lognorm(s=std, scale=np.exp(mean))
    lognorm_distribution_pdf = lognorm_distribution.pdf(X)
    ax.plot(X, lognorm_distribution_pdf, label=f"μ={mean}, σ={std}")
ax.set_title("Lognormal Distribution")
ax.legend()
plt.show()
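
To make the link to the normal distribution concrete, the sketch below draws log-normal samples and checks that their logarithms have the expected mean and standard deviation. It assumes SciPy's parametrization, in which the shape `s` is σ and `scale` is exp(μ), while `loc` is a plain shift that stays at 0; the parameter values and the seed are illustrative.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.5  # parameters of the underlying normal (illustrative values)

# SciPy: shape s = sigma, scale = exp(mu); loc is a shift and is left at 0.
lognorm_distribution = stats.lognorm(s=sigma, scale=np.exp(mu))

samples = lognorm_distribution.rvs(size=100_000, random_state=0)
log_samples = np.log(samples)

print(log_samples.mean(), log_samples.std())       # close to mu and sigma
print(lognorm_distribution.median(), np.exp(mu))   # the median equals exp(mu)
```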

Poisson Distribution

The Poisson distribution models the number of events occurring in a fixed interval when events happen independently of one another at a constant average rate λ. It is a discrete count distribution.

from scipy import stats
print(stats.poisson.pmf(k=9, mu=3))

0.002700503931560479
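
The same value follows from the Poisson mass function P(X = k) = λ^k e^(−λ) / k!. A short sketch comparing the hand-computed formula with SciPy, using the same rate and count as the call above (the variable names are illustrative):

```python
import math
from scipy import stats

lam, k = 3, 9  # rate λ and observed count k

# Evaluate λ^k · e^(−λ) / k! directly.
manual_pmf = lam**k * math.exp(-lam) / math.factorial(k)
scipy_pmf = stats.poisson.pmf(k=k, mu=lam)

print(manual_pmf, scipy_pmf)
```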

Visualization of a Poisson sample:

X = stats.poisson.rvs(mu=3, size=500)
plt.subplots(figsize=(8, 5))
plt.hist(X, bins=np.arange(X.max() + 2) - 0.5, density=True, edgecolor="black")  # one bin per integer count
plt.title("Poisson Distribution")
plt.show()

Exponential Distribution

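The exponential distribution describes the waiting time between successive events in a Poisson process. Its PDF, f(x) = λe^(−λx) for x ≥ 0, is determined by the rate parameter λ. SciPy parametrizes it through scale = 1/λ, so scale=1 below corresponds to λ = 1.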

X = np.linspace(0, 5, 5000)
exponential_distribution = stats.expon.pdf(X, loc=0, scale=1)
plt.subplots(figsize=(8, 5))
plt.plot(X, exponential_distribution)
plt.title("Exponential Distribution")
plt.show()
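
The relation between the rate λ and SciPy's `scale` argument is easy to get backwards, so here is a small sketch that checks it numerically. It also illustrates the memoryless property, P(T > s + t | T > s) = P(T > t), a standard property of the exponential distribution that is not covered in the article; all values are illustrative.

```python
import numpy as np
from scipy import stats

lam = 2.0  # rate: events per unit time (illustrative value)
exponential = stats.expon(scale=1 / lam)  # SciPy takes scale = 1 / lambda

# The density at x matches lambda * exp(-lambda * x).
x = 0.7
manual_pdf = lam * np.exp(-lam * x)

# Memorylessness: P(T > s + t | T > s) equals P(T > t).
s, t = 1.0, 0.5
conditional = exponential.sf(s + t) / exponential.sf(s)

print(exponential.pdf(x), manual_pdf)
print(conditional, exponential.sf(t))
print(exponential.mean())  # mean waiting time is 1 / lambda
```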

Binomial Distribution

The binomial distribution gives the probability of obtaining exactly x successes in n independent Bernoulli trials, each with success probability p.

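X = np.random.binomial(n=10, p=0.5, size=1000)  # n=1 would reduce to a single Bernoulli trial
plt.subplots(figsize=(8, 5))
plt.hist(X, bins=np.arange(12) - 0.5, edgecolor="black")  # one bin per success count 0..10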
plt.title("Binomial Distribution")
plt.show()
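
Exact probabilities come from the binomial mass function P(X = x) = C(n, x) p^x (1 − p)^(n − x) rather than from a sample histogram. A minimal sketch comparing that formula with `stats.binom.pmf`, using illustrative values for n, p, and x:

```python
from math import comb
from scipy import stats

n, p, x = 10, 0.5, 3  # trials, success probability, successes (illustrative)

# C(n, x) * p^x * (1 - p)^(n - x), evaluated by hand.
manual_pmf = comb(n, x) * p**x * (1 - p) ** (n - x)
scipy_pmf = stats.binom.pmf(x, n, p)

# The probabilities over all possible success counts sum to 1.
total = sum(stats.binom.pmf(i, n, p) for i in range(n + 1))

print(manual_pmf, scipy_pmf, total)
```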

Student's t Distribution

The t‑distribution is used when estimating the mean of a normally distributed population with unknown variance, especially for small sample sizes. Its shape approaches the normal distribution as degrees of freedom increase.

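from scipy import stats
# Plot the exact densities: a kernel density estimate of four samples is too noisy to show the shape.
X = np.linspace(-5, 5, 500)
plt.subplots(figsize=(8, 5))
plt.plot(X, stats.t.pdf(X, df=1), label="1 d.o.f")
plt.plot(X, stats.t.pdf(X, df=3), label="3 d.o.f")
plt.plot(X, stats.t.pdf(X, df=9), label="9 d.o.f")
plt.plot(X, stats.norm.pdf(X), linestyle="--", label="standard normal")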
plt.title("Student's t distribution")
plt.legend()
plt.show()
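
The convergence toward the normal distribution can also be seen in the quantiles. The sketch below compares the 97.5th percentile (the critical value of a two-sided 95% interval) for several degrees of freedom; the chosen values of `df` are illustrative.

```python
from scipy import stats

normal_quantile = stats.norm.ppf(0.975)

# Critical values shrink toward the normal one as degrees of freedom grow.
t_quantiles = {df: stats.t.ppf(0.975, df) for df in (1, 3, 9, 30, 1000)}

for df, q in t_quantiles.items():
    print(f"df={df:>4}: {q:.3f}")
print(f"normal : {normal_quantile:.3f}")
```

The heavier tails at low degrees of freedom are what make t-based intervals wider for small samples.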

Chi‑Squared Distribution

The chi‑squared distribution is a special case of the gamma distribution and is widely used for hypothesis testing and confidence interval construction. It is defined by the number of degrees of freedom k.

X = np.linspace(0.05, 6, 500)  # start just above 0: the density for 1 d.o.f. diverges at x = 0
plt.subplots(figsize=(8, 5))
plt.plot(X, stats.chi2.pdf(X, df=1), label="1 d.o.f")
plt.plot(X, stats.chi2.pdf(X, df=2), label="2 d.o.f")
plt.plot(X, stats.chi2.pdf(X, df=3), label="3 d.o.f")
plt.title("Chi-squared Distribution")
plt.legend()
plt.show()
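
The gamma connection can be verified directly: a chi-squared distribution with k degrees of freedom matches a gamma distribution with shape k/2 and scale 2. The sketch below checks this and also simulates the sum of k squared standard normal variables, which is the usual construction of the chi-squared distribution (a standard fact, not taken from the article); the seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

k = 3  # degrees of freedom (illustrative)
x = np.linspace(0.1, 6, 50)

# chi2(k) has the same density as gamma(shape=k/2, scale=2).
chi2_pdf = stats.chi2.pdf(x, df=k)
gamma_pdf = stats.gamma.pdf(x, a=k / 2, scale=2)

# The sum of k squared standard normals follows chi2(k): mean k, variance 2k.
rng = np.random.default_rng(0)
sum_of_squares = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)

print(np.allclose(chi2_pdf, gamma_pdf))
print(sum_of_squares.mean(), sum_of_squares.var())
```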

Mastering these distributions equips data‑science practitioners with the tools needed to model, simulate, and analyze real‑world data effectively.
