
Common Statistical Methods for Data Analysis with Python Code Examples

This article introduces ten common statistical techniques used in data analysis—including descriptive statistics, correlation, t‑test, ANOVA, linear regression, PCA, outlier detection, frequency distribution, time‑series analysis, and non‑parametric tests—providing concise explanations and Python code snippets for each method.

Test Development Learning Exchange

In data analysis, various statistical methods help reveal trends, relationships, and distributions within datasets.

1. Descriptive Statistics: Computes basic summary metrics such as the mean, median, and standard deviation to give an overall picture of the data.

import numpy as np
data = [1, 2, 3, 4, 5]
mean = np.mean(data)  # calculate mean
median = np.median(data)  # calculate median
std = np.std(data)  # population standard deviation (ddof=0); use np.std(data, ddof=1) for the sample version
print("Mean:", mean)
print("Median:", median)
print("Std:", std)
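To make these functions less of a black box, the following sketch recomputes the same statistics by hand. It assumes the odd-length list above, so the median is simply the middle sorted value, and it contrasts NumPy's default population standard deviation (ddof=0) with the sample version (ddof=1).

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5], dtype=float)
n = data.size

# Mean: sum of the values divided by their count
mean = data.sum() / n

# Median: middle element of the sorted values (valid here because n is odd)
median = np.sort(data)[n // 2]

# Squared deviations from the mean, shared by both standard deviations below
sq_dev = (data - mean) ** 2

# Population standard deviation divides by n (NumPy's default, ddof=0)
pop_std = np.sqrt(sq_dev.sum() / n)

# Sample standard deviation divides by n - 1 (ddof=1)
sample_std = np.sqrt(sq_dev.sum() / (n - 1))

print("Mean:", mean)
print("Median:", median)
print("Population std:", pop_std, "vs np.std:", np.std(data))
print("Sample std:", sample_std, "vs np.std(ddof=1):", np.std(data, ddof=1))
```

When the data is a sample from a larger population, the ddof=1 version is usually the appropriate estimate.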

2. Correlation Analysis: Measures the strength and direction of the linear relationship between two variables using the Pearson correlation coefficient, which ranges from -1 to 1.

import numpy as np
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
correlation = np.corrcoef(x, y)[0, 1]  # calculate correlation coefficient
print("Correlation:", correlation)
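The Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it directly with a small helper (pearson_r, a hypothetical name), checks it against np.corrcoef, and adds an illustrative cubic variable y_cubic to show that a perfectly monotonic but non-linear relationship gives r below 1.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    da = a - a.mean()
    db = b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())


x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

# Illustrative non-linear (cubic) relationship: monotonic, but not a straight line
y_cubic = x ** 3
r_cubic = pearson_r(x, y_cubic)

print("Manual r:", r_manual)
print("np.corrcoef r:", r_numpy)
print("r for y = x**3:", r_cubic)
```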

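3. t‑Test: Compares the means of two independent samples to determine whether they differ significantly. By default, scipy.stats.ttest_ind assumes both groups have equal variances (Student's t-test); passing equal_var=False performs Welch's t-test instead.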

from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
t_statistic, p_value = stats.ttest_ind(group1, group2)
print("t statistic:", t_statistic)
print("p value:", p_value)
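The default ttest_ind call above is Student's t-test with a pooled variance. The sketch below recomputes the statistic and its two-sided p-value from the t distribution, then runs Welch's variant for comparison.

```python
import numpy as np
from scipy import stats

group1 = np.array([1, 2, 3, 4, 5], dtype=float)
group2 = np.array([2, 4, 6, 8, 10], dtype=float)
n1, n2 = group1.size, group2.size

# Pooled variance: weighted average of the two sample variances (assumes equal variances)
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)

# Standard error of the difference in means
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (group1.mean() - group2.mean()) / se
df = n1 + n2 - 2

# Two-sided p-value: probability in both tails beyond |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(group1, group2)

# Welch's t-test: no equal-variance assumption, fewer effective degrees of freedom
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print("Manual t, p:", t_manual, p_manual)
print("SciPy t, p:", t_scipy, p_scipy)
print("Welch t, p:", t_welch, p_welch)
```

With equal group sizes, as here, the Welch statistic equals the Student statistic; only the degrees of freedom, and therefore the p-value, change.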

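4. ANOVA (Analysis of Variance): Extends the t‑test to compare means across three or more groups. One-way ANOVA, as run by scipy.stats.f_oneway, tests whether at least one group mean differs from the others.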

from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
group3 = [3, 6, 9, 12, 15]
f_statistic, p_value = stats.f_oneway(group1, group2, group3)
print("F statistic:", f_statistic)
print("p value:", p_value)
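One-way ANOVA compares the variation between group means with the variation within groups. This sketch computes the F statistic as the ratio of the between-group and within-group mean squares and takes the p-value from the F distribution with (k - 1, n - k) degrees of freedom, where k is the number of groups and n the total number of observations.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([1, 2, 3, 4, 5], dtype=float),
    np.array([2, 4, 6, 8, 10], dtype=float),
    np.array([3, 6, 9, 12, 15], dtype=float),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)
n = all_values.size

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

print("Manual F, p:", f_manual, p_manual)
print("SciPy F, p:", f_scipy, p_scipy)
```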

5. Linear Regression: Fits a straight line that predicts a dependent variable from an independent variable by ordinary least squares, that is, by minimizing the sum of squared residuals.

import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
regression = LinearRegression()
regression.fit(x, y)
intercept = regression.intercept_  # intercept
slope = regression.coef_[0]        # slope
print("Intercept:", intercept)
print("Slope:", slope)
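For a single predictor, least squares has a closed-form solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below applies it, repeats the fit with NumPy's general least-squares solver on a design matrix that includes an intercept column, and computes R squared, the share of variance in y explained by the line.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

# Closed-form simple regression: slope = Sxy / Sxx
dx = x - x.mean()
slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Same fit via a design matrix [1, x] and NumPy's least-squares solver
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # coef = [intercept, slope]

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
y_hat = intercept + slope * x
r_squared = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print("Closed form:", intercept, slope)
print("lstsq:", coef)
print("R^2:", r_squared)
```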

6. Principal Component Analysis (PCA): Reduces dimensionality by projecting the data onto a small number of new, uncorrelated axes (principal components) that capture as much of the variance as possible.

import numpy as np
from sklearn.decomposition import PCA
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(data)
print("Reduced data:")
print(reduced_data)
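PCA can be computed by centering each column and taking a singular value decomposition (SVD); the right singular vectors are the principal directions. The sketch below does this by hand and compares the result with scikit-learn. Component signs are arbitrary, so the comparison uses absolute values. In this particular dataset the centered rows all lie on one line, so the first component carries essentially all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# 1. Center each column: PCA works on deviations from the column means
centered = data - data.mean(axis=0)

# 2. SVD: rows of vt are the principal directions, s holds the singular values
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# 3. Project the centered data onto the first two directions
scores_manual = centered @ vt[:2].T

# Share of total variance captured by each component
explained_ratio_manual = s ** 2 / (s ** 2).sum()

pca = PCA(n_components=2)
scores_sklearn = pca.fit_transform(data)

print("Manual scores:\n", scores_manual)
print("Explained variance ratio (manual):", explained_ratio_manual)
print("Explained variance ratio (sklearn):", pca.explained_variance_ratio_)
```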

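7. Outlier Detection: Identifies anomalous observations. A box plot is one simple approach: by default, matplotlib draws points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as individual markers.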

import matplotlib.pyplot as plt
data = [1, 2, 3, 4, 5, 10]
plt.boxplot(data)
plt.show()
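The box plot applies Tukey's rule visually. The sketch below applies the same 1.5 times IQR rule numerically, so the outliers can be listed rather than only seen. It assumes NumPy's default linear percentile interpolation; other quartile conventions can shift the fences slightly.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 10], dtype=float)

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Tukey's fences: values beyond 1.5 * IQR from the quartiles are flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1, Q3:", q1, q3)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```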

8. Frequency Distribution: Counts how many values fall into each bin, converts the counts to relative frequencies, and visualizes them with a histogram.

import numpy as np
import matplotlib.pyplot as plt
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
counts, bins, _ = plt.hist(data, bins=5)
plt.show()
print("Counts:", counts)
print("Relative frequencies:", counts / len(data))
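When only the numbers are needed, the counts can be computed without drawing anything. np.unique with return_counts=True gives exact counts per distinct value, while np.histogram uses the same equal-width binning as plt.hist. For this dataset, with five distinct integers and five bins, both happen to produce the same counts.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Exact counts, one entry per distinct value
values, value_counts = np.unique(data, return_counts=True)
relative_freq = value_counts / data.size

# Binned counts without plotting: 5 equal-width bins, as in plt.hist(data, bins=5)
bin_counts, bin_edges = np.histogram(data, bins=5)

print("Values:", values)
print("Counts:", value_counts)
print("Relative frequencies:", relative_freq)
print("Binned counts:", bin_counts)
print("Bin edges:", bin_edges)
```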

9. Time‑Series Analysis: Examines trends and seasonality in data indexed by time. Plotting the series is usually the first step.

import pandas as pd
import matplotlib.pyplot as plt
data = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2021-01-01', periods=5))
data.plot()
plt.show()
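To quantify a trend rather than only plot it, a rolling mean smooths short-term fluctuations and first differences show the change between consecutive periods; a constant difference indicates a steady linear trend. The 3-day window below is an arbitrary illustrative choice, and detecting seasonality would need a much longer series than this five-point example.

```python
import pandas as pd

# Daily series with a steady upward trend (same values as the example above)
series = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2021-01-01", periods=5))

# 3-day rolling mean: the first two entries are NaN until the window fills
rolling_mean = series.rolling(window=3).mean()

# First differences: change from the previous day (the first entry is NaN)
first_diff = series.diff()

print("Rolling mean:\n", rolling_mean)
print("First differences:\n", first_diff)
```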

10. Non‑Parametric Tests: Perform statistical inference without assuming a specific data distribution. For example, the Mann‑Whitney U test compares two independent samples using ranks instead of raw values.

from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
u_statistic, p_value = stats.mannwhitneyu(group1, group2, alternative="two-sided")  # two-sided test
print("U statistic:", u_statistic)
print("p value:", p_value)
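The U statistic for the first group counts, over every pair of observations taken one from each group, how often the first-group value is larger, with ties counting as one half. The sketch below computes it by pair counting and again from ranks, then compares with SciPy, which (in SciPy 1.7 and later) reports U for the first sample.

```python
import numpy as np
from scipy import stats

group1 = np.array([1, 2, 3, 4, 5], dtype=float)
group2 = np.array([2, 4, 6, 8, 10], dtype=float)

# Pair counting: compare every group1 value with every group2 value
wins = (group1[:, None] > group2[None, :]).sum()
ties = (group1[:, None] == group2[None, :]).sum()
u_manual = wins + 0.5 * ties

# Rank formula: U1 = R1 - n1 * (n1 + 1) / 2, with tied values sharing an average rank
ranks = stats.rankdata(np.concatenate([group1, group2]))
n1 = group1.size
r1 = ranks[:n1].sum()
u_from_ranks = r1 - n1 * (n1 + 1) / 2

u_scipy, p_scipy = stats.mannwhitneyu(group1, group2, alternative="two-sided")

print("U (pair counting):", u_manual)
print("U (ranks):", u_from_ranks)
print("U, p (SciPy):", u_scipy, p_scipy)
```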

These examples cover ten commonly used statistical methods. Choose among them based on your data's characteristics, such as its distribution, sample size, and number of groups, and on your analytical goals.
