Common Probability Distributions and Their Visualization with Python
This article explains the fundamentals of several common probability distributions—including uniform, normal, log‑normal, Poisson, exponential, binomial, Student's t, and chi‑squared—and provides complete Python code to generate and plot each distribution for data‑science and machine‑learning applications.
Probability and statistics are essential foundations for data science and machine learning; understanding data distributions helps model real‑world phenomena and assess variability.
Uniform Distribution
The uniform distribution assigns equal probability to all outcomes. A discrete example is a fair die (probability 1/6 for each face), while a continuous uniform distribution spreads its probability evenly over an interval [a, b], with constant density 1/(b − a).
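A minimal sketch, assuming SciPy's `stats.uniform` and `stats.randint` (the same objects the plotting code uses): a continuous uniform on [a, b] has mean (a + b)/2 and variance (b − a)²/12. The snippet compares these closed forms with SciPy's values for the interval [0, 50] used in the plot, and shows that `stats.randint(1, 7)` models a fair die because its upper bound is exclusive.

```python
from scipy import stats

a, b = 0, 50  # interval endpoints, same as in the plotting example

# SciPy's uniform is supported on [loc, loc + scale], so scale is the width b - a
continuous = stats.uniform(loc=a, scale=b - a)
print(continuous.mean(), (a + b) / 2)        # mean (a + b) / 2
print(continuous.var(), (b - a) ** 2 / 12)   # variance (b - a)^2 / 12
print(continuous.pdf(10), 1 / (b - a))       # constant density inside [a, b]
print(continuous.pdf(60))                    # zero density outside [a, b]

# Fair die: stats.randint(low, high) excludes high, so this covers faces 1..6
die = stats.randint(1, 7)
print(die.pmf(3), die.mean())                # 1/6 per face, mean 3.5
```

Passing `scale=b` instead of `scale=b - a` only gives the right interval when a = 0.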
<code>import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# continuous uniform
a = 0
b = 50
size = 5000
X_continuous = np.linspace(a, b, size)
continuous_uniform = stats.uniform(loc=a, scale=b - a)  # SciPy's uniform spans [loc, loc + scale]
continuous_uniform_pdf = continuous_uniform.pdf(X_continuous)
# discrete uniform
X_discrete = np.arange(1, 7)
discrete_uniform = stats.randint(1, 7)
discrete_uniform_pmf = discrete_uniform.pmf(X_discrete)
# plot both
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax[0].bar(X_discrete, discrete_uniform_pmf)
ax[0].set_xlabel("X")
ax[0].set_ylabel("Probability")
ax[0].set_title("Discrete Uniform Distribution")
ax[1].plot(X_continuous, continuous_uniform_pdf)
ax[1].set_xlabel("X")
ax[1].set_ylabel("Probability density")
ax[1].set_title("Continuous Uniform Distribution")
plt.show()</code>
Normal (Gaussian) Distribution
The normal distribution, also known as the Gaussian or bell curve, is defined by its mean μ and standard deviation σ. It is symmetric about μ, so the mean, median, and mode coincide, and the total area under the curve equals 1.
<code>mu = 0
variance = 1
sigma = np.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
plt.subplots(figsize=(8, 5))
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.title("Normal Distribution")
plt.show()</code>
The empirical rule states that roughly 68% of values from a normal distribution fall within ±1σ of the mean, 95% within ±2σ, and 99.7% within ±3σ.
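The empirical rule follows directly from the normal CDF. This short sketch computes the exact probability mass within ±k standard deviations using `stats.norm.cdf`; because the normal is a location-scale family, the same fractions hold for any μ and σ.

```python
from scipy import stats

# Probability that a standard normal value lies within ±k standard deviations of the mean
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k}σ: {p:.4f}")
```

This prints approximately 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.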
Log‑Normal Distribution
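A log‑normal distribution arises when the logarithm of a variable follows a normal distribution. It is defined only for positive values and produces a right‑skewed curve. Its parameters μ and σ are the mean and standard deviation of that underlying normal distribution, not of the log‑normal variable itself.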
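The definition can be checked by simulation: exponentiating normal draws produces log-normal data. The sketch below assumes μ = 1 and σ = 0.5 (with an arbitrary seed) and shows how they map onto SciPy's `stats.lognorm`, where the shape `s` is σ and `scale` is e^μ.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.5  # parameters of the underlying normal (assumed values)
rng = np.random.default_rng(0)

# If Y ~ Normal(mu, sigma), then X = exp(Y) is log-normal
y = rng.normal(loc=mu, scale=sigma, size=100_000)
x = np.exp(y)

# Equivalent SciPy object: shape s = sigma, scale = exp(mu), loc left at 0
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

print("median:", np.median(x), "vs", dist.median())  # theoretical median exp(mu)
print("mean:  ", x.mean(), "vs", dist.mean())        # theoretical mean exp(mu + sigma**2 / 2)
print("log of samples: mean", np.log(x).mean(), "std", np.log(x).std())
```

Setting `loc=mu` instead would only shift the curve sideways; it does not set the mean of the underlying normal.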
<code>X = np.linspace(0, 6, 500)
fig, ax = plt.subplots(figsize=(8, 5))
# SciPy parameterization: shape s = σ, scale = exp(μ), loc = 0
# (loc shifts the whole curve sideways; it is not μ)
for mean, std in [(0, 1), (0, 0.5), (1, 1.5)]:
    lognorm_distribution = stats.lognorm(s=std, scale=np.exp(mean))
    ax.plot(X, lognorm_distribution.pdf(X), label=f"μ={mean}, σ={std}")
ax.set_title("Lognormal Distribution")
ax.legend()
plt.show()</code>
Poisson Distribution
The Poisson distribution models the number of events occurring in a fixed interval when events happen independently at a constant average rate λ. It is a discrete distribution over the counts 0, 1, 2, …
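The PMF has the closed form P(X = k) = e^(−λ) λ^k / k!. Writing it out by hand reproduces SciPy's value; `poisson_pmf` below is a hypothetical helper for illustration, not part of SciPy. The sketch also confirms that the mean and variance both equal λ.

```python
import math
from scipy import stats

def poisson_pmf(k: int, lam: float) -> float:
    """Illustrative helper: P(X = k) = exp(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3
manual = poisson_pmf(9, lam)
dist = stats.poisson(mu=lam)
print(manual, dist.pmf(9))      # hand-written formula vs SciPy

# A Poisson distribution's mean and variance both equal lambda
print(dist.mean(), dist.var())
```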
<code>from scipy import stats
print(stats.poisson.pmf(k=9, mu=3))
</code>
<code>0.002700503931560479
</code>
Histogram of 500 random draws from a Poisson distribution with λ = 3:
<code>X = stats.poisson.rvs(mu=3, size=500)
plt.subplots(figsize=(8, 5))
plt.hist(X, bins=np.arange(X.max() + 2) - 0.5, density=True, edgecolor="black")  # one bin per integer count
plt.title("Poisson Distribution")
plt.show()</code>
Exponential Distribution
The exponential distribution describes the time between successive events in a Poisson process. Its PDF is f(x) = λe^(−λx) for x ≥ 0, where λ > 0 is the rate parameter.
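Two practical points are worth checking: SciPy's `stats.expon` takes `scale = 1/λ` rather than the rate itself, and the distribution is memoryless, so the chance of waiting another t units does not depend on how long you have already waited. The sketch assumes λ = 2 and an arbitrary seed.

```python
import numpy as np
from scipy import stats

lam = 2.0                          # assumed rate: 2 events per unit time on average
dist = stats.expon(scale=1 / lam)  # SciPy parameterizes by scale = 1/λ

x = np.linspace(0, 3, 7)
# SciPy's density matches λ·exp(−λx)
print(np.allclose(dist.pdf(x), lam * np.exp(-lam * x)))

print(dist.mean())                 # mean waiting time is 1/λ

# Memorylessness: P(X > s + t | X > s) equals P(X > t)
s, t = 1.0, 0.5
conditional = dist.sf(s + t) / dist.sf(s)
print(conditional, dist.sf(t))

# Simulated gaps between events average close to 1/λ
rng = np.random.default_rng(42)
gaps = rng.exponential(scale=1 / lam, size=100_000)
print(gaps.mean())
```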
<code>X = np.linspace(0, 5, 5000)
exponential_distribution = stats.expon.pdf(X, loc=0, scale=1)  # scale = 1/λ, so this curve has rate λ = 1
plt.subplots(figsize=(8, 5))
plt.plot(X, exponential_distribution)
plt.title("Exponential Distribution")
plt.show()</code>
Binomial Distribution
The binomial distribution gives the probability of obtaining exactly x successes in n independent Bernoulli trials, each with success probability p.
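The binomial PMF is P(X = x) = C(n, x) p^x (1 − p)^(n − x). This sketch evaluates it with Python's `math.comb` for n = 10 fair coin flips (assumed values) and compares it with SciPy's `stats.binom`; `binomial_pmf` is a hypothetical helper for illustration. With n = 1 the binomial reduces to a single Bernoulli trial.

```python
import math
import numpy as np
from scipy import stats

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Illustrative helper: C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5                  # e.g. 10 flips of a fair coin
manual = [binomial_pmf(x, n, p) for x in range(n + 1)]
dist = stats.binom(n, p)

print(manual[5])                # P(exactly 5 heads)
print(np.allclose(manual, dist.pmf(np.arange(n + 1))))
print(sum(manual), dist.mean()) # probabilities sum to 1; mean is n * p
```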
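<code>n, p = 10, 0.5
X = np.random.binomial(n=n, p=p, size=1000)  # number of successes in n trials, repeated 1000 times
plt.subplots(figsize=(8, 5))
plt.hist(X, bins=np.arange(n + 2) - 0.5, edgecolor="black")  # one bin per possible count 0..n
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.show()</code>
Student's t Distribution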
The t‑distribution is used when estimating the mean of a normally distributed population whose variance is unknown, especially with small samples. Its tails are heavier than those of the normal distribution, and its shape approaches the standard normal as the degrees of freedom increase.
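A typical use is a confidence interval for a mean when σ has to be estimated from a small sample. The sketch below simulates eight observations (the true mean 5.0, standard deviation 0.4, and seed are arbitrary assumptions), builds a 95% interval from `stats.t.ppf`, and cross-checks it with `stats.t.interval`.

```python
import numpy as np
from scipy import stats

# Simulated small sample; true mean 5.0 and sd 0.4 are assumed for illustration
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=0.4, size=8)

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

# 95% interval: mean ± t critical value (n - 1 degrees of freedom) × standard error
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (mean - t_crit * sem, mean + t_crit * sem)
print(f"mean = {mean:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# SciPy's helper returns the same interval
print(stats.t.interval(0.95, n - 1, loc=mean, scale=sem))

# With few degrees of freedom the t critical value exceeds the normal one
print(t_crit, stats.norm.ppf(0.975))
```

Using the normal critical value (about 1.96) instead would give an interval that is too narrow for a sample this small.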
<code>from scipy import stats
X = np.linspace(-5, 5, 500)
plt.subplots(figsize=(8, 5))
# exact densities for increasing degrees of freedom
for df in [1, 3, 9]:
    plt.plot(X, stats.t.pdf(X, df=df), label=f"{df} d.o.f")
# standard normal for comparison: the t curves approach it as df grows
plt.plot(X, stats.norm.pdf(X), "k--", label="standard normal")
plt.title("Student's t distribution")
plt.legend()
plt.show()</code>
Chi‑Squared Distribution
The chi‑squared distribution is a special case of the gamma distribution and is widely used for hypothesis testing and confidence interval construction. It is defined by its number of degrees of freedom k.
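Two properties can be checked numerically: a chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal variables, and its density matches a gamma distribution with shape k/2 and scale 2. The sketch assumes k = 3 and an arbitrary seed.

```python
import numpy as np
from scipy import stats

k = 3                                   # degrees of freedom (assumed for illustration)
rng = np.random.default_rng(0)

# Sum of k squared independent standard normal variables
z = rng.standard_normal(size=(100_000, k))
q = (z ** 2).sum(axis=1)

chi2 = stats.chi2(df=k)
print(q.mean(), chi2.mean())            # theoretical mean is k
print(q.var(), chi2.var())              # theoretical variance is 2k

# Same density as a gamma distribution with shape k/2 and scale 2
x = np.linspace(0.1, 10, 50)
gamma_match = np.allclose(chi2.pdf(x), stats.gamma.pdf(x, a=k / 2, scale=2))
print(gamma_match)
```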
<code>X = np.arange(0, 6, 0.25)
plt.subplots(figsize=(8, 5))
plt.plot(X, stats.chi2.pdf(X, df=1), label="1 d.o.f")
plt.plot(X, stats.chi2.pdf(X, df=2), label="2 d.o.f")
plt.plot(X, stats.chi2.pdf(X, df=3), label="3 d.o.f")
plt.title("Chi-squared Distribution")
plt.legend()
plt.show()</code>
Mastering these distributions equips data‑science practitioners with the tools needed to model, simulate, and analyze real‑world data effectively.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.