
Top 10 Essential Python Libraries for Data Analysis with Code Examples

This article introduces ten practical Python libraries for data analysis: Pandas and NumPy for data manipulation; Matplotlib, Seaborn, Plotly, and Bokeh for visualization; and Scikit-learn, Prophet, Dask, and PySpark for machine learning, forecasting, and big-data processing. Each is illustrated with a concise code snippet.

Python Programming Learning Circle

In the field of Python data analysis, mastering core libraries can dramatically boost productivity. This guide selects ten highly useful libraries and provides code examples that walk through the full workflow from data handling to machine learning.

1. Pandas: The all‑rounder for structured data processing

Pandas excels at handling tabular data, offering efficient cleaning and transformation capabilities.

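<code># Read an Excel file and handle missing values
import pandas as pd

df = pd.read_excel('customer_data.xlsx')  # .xlsx files need the openpyxl package installed
df['age'] = df['age'].fillna(df['age'].median())  # fill missing ages with the median age

df['register_date'] = pd.to_datetime(df['register_date'])  # convert date strings to datetime values</code>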
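Because `customer_data.xlsx` is not included, here is a self-contained sketch of the same two steps on a small in-memory DataFrame. The column names mirror the example above; the values are made up purely for illustration, and the `days_since_first` column is a hypothetical extra showing why the datetime conversion matters.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for customer_data.xlsx
df = pd.DataFrame({
    'age': [25, np.nan, 40, 31, np.nan],
    'register_date': ['2023-01-05', '2023-02-17', '2023-03-02', '2023-03-28', '2023-04-10'],
})

# median() skips NaN, so it is computed from 25, 40, 31 -> 31.0
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age)

# Parse the strings into datetime64 values so date arithmetic works
df['register_date'] = pd.to_datetime(df['register_date'])

# Hypothetical derived column: days since the earliest registration
df['days_since_first'] = (df['register_date'] - df['register_date'].min()).dt.days

print(df)
```

Assigning the result back (`df['age'] = ...`) instead of calling `fillna(..., inplace=True)` on a selected column avoids chained-assignment warnings in recent pandas versions.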

2. NumPy: The acceleration engine for multi‑dimensional array operations

NumPy provides high-performance numerical computation on multi-dimensional arrays, which makes it suitable for large-scale data.

<code>import numpy as np

sales = np.array([1200, 1500, 800, 2000])
commission = sales * 0.05  # 5% commission on each sale, computed for the whole array at once

total = np.sum(sales)  # total sales: 5500</code>
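The speed comes from vectorization: arithmetic runs element-wise over whole arrays, and broadcasting stretches a smaller array across a larger one without explicit loops. A minimal sketch, assuming a hypothetical 2×4 sales table (two quarters, four products) and made-up per-product commission rates:

```python
import numpy as np

# Hypothetical sales table: rows = quarters, columns = products
sales = np.array([[1200, 1500, 800, 2000],
                  [1100, 1700, 900, 1800]])

# Hypothetical commission rate for each of the four products, shape (4,)
rates = np.array([0.05, 0.04, 0.06, 0.05])

# Broadcasting: the (4,) rate vector is applied to every (4,) row of the (2, 4) table
commission = sales * rates

per_quarter = commission.sum(axis=1)  # total commission per quarter (sum across columns)
per_product = sales.sum(axis=0)       # total sales per product (sum down the rows)

print(commission)
print(per_quarter, per_product)
```

The `axis` argument selects which dimension is collapsed: `axis=1` sums across each row, `axis=0` sums down each column.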

3. Matplotlib: The Swiss‑army knife for basic charting

Matplotlib can quickly generate line charts, bar charts, scatter plots, and other basic visualizations.

<code>import matplotlib.pyplot as plt

products = ['A', 'B', 'C']
sales = [120, 150, 90]
plt.bar(products, sales, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Product Sales Comparison')
plt.show()</code>
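The `plt.*` calls above use Matplotlib's implicit, state-based interface. In scripts and reports, the explicit figure/axes interface is often easier to manage. A sketch of the same chart that also labels each bar and writes the result to a file instead of opening a window; the `Agg` backend and the output path `product_sales.png` are assumptions for a headless environment:

```python
from pathlib import Path

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no GUI window
import matplotlib.pyplot as plt

products = ['A', 'B', 'C']
sales = [120, 150, 90]

fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(products, sales, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
ax.bar_label(bars)  # print each bar's value above it (needs Matplotlib 3.4+)
ax.set_title('Product Sales Comparison')
ax.set_ylabel('Sales')
fig.tight_layout()

out = Path('product_sales.png')  # hypothetical output path
fig.savefig(out, dpi=100)
plt.close(fig)  # release the figure once it is saved
```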

4. Seaborn: The stylish choice for statistical visualizations

Built on top of Matplotlib, Seaborn offers higher-level functions for statistical charts, with more polished default styles.

<code>import matplotlib.pyplot as plt
import seaborn as sns

corr_matrix = df.corr(numeric_only=True)  # correlations between numeric columns only
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Feature Correlation Heatmap')
plt.show()</code>
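The heatmap displays pairwise Pearson correlation coefficients, which range from -1 to 1 and are only defined for numeric columns. Since pandas 2.0, `DataFrame.corr()` raises an error when non-numeric columns are present unless `numeric_only=True` is passed. A small sketch with made-up, deliberately linear data so the coefficients come out as exactly 1 or -1:

```python
import pandas as pd

# Hypothetical data: three numeric columns plus a text column
df = pd.DataFrame({
    'price':       [10, 12, 14, 16, 18],
    'advertising': [1, 2, 3, 4, 5],
    'sales':       [50, 45, 40, 35, 30],
    'city':        ['NY', 'LA', 'SF', 'NY', 'LA'],
})

# numeric_only=True drops 'city' before computing Pearson correlations
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix.round(2))
```

Here `price` and `advertising` rise together (+1), while `sales` falls as both rise (-1); real data will show values in between.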

5. Plotly: The dynamic expert for interactive charts

Plotly supports interactive visualizations, ideal for dynamic reports.

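<code>import plotly.express as px

# Assumes 'state' holds two-letter US state codes such as 'CA' or 'NY';
# without locationmode, Plotly expects ISO-3 country codes instead
fig = px.choropleth(df, locations='state', locationmode='USA-states', scope='usa',
                    color='sales',
                    hover_data=['city', 'revenue'],
                    color_continuous_scale='Viridis')
fig.show()</code>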
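A choropleth colors each region by a single value, so row-level data is usually aggregated to one row per location before plotting. A pandas sketch of that preparation step, assuming hypothetical columns `state` (two-letter US codes), `city`, and `sales` with made-up values; the aggregated frame is what you would pass to `px.choropleth`:

```python
import pandas as pd

# Hypothetical row-level sales records
df = pd.DataFrame({
    'state': ['CA', 'CA', 'TX', 'NY', 'TX'],
    'city':  ['LA', 'SF', 'Austin', 'NYC', 'Dallas'],
    'sales': [300, 200, 150, 400, 250],
})

# One row per state: total sales plus the number of distinct cities
by_state = (
    df.groupby('state', as_index=False)
      .agg(sales=('sales', 'sum'), n_cities=('city', 'nunique'))
)
print(by_state)

# Then, for example:
# fig = px.choropleth(by_state, locations='state', locationmode='USA-states',
#                     color='sales', scope='usa', hover_data=['n_cities'])
```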

6. Scikit‑learn: The Swiss‑army knife for machine‑learning preprocessing

Scikit‑learn offers tools for data preprocessing and model training.

<code>from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Rescale each column to zero mean and unit variance; the result is a NumPy array
X_scaled = scaler.fit_transform(df[['price', 'advertising']])</code>
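`StandardScaler` subtracts each column's mean and divides by its standard deviation, z = (x - mean) / std, using the population standard deviation (ddof=0). In a modeling workflow, those statistics should be learned from the training rows only and then reused on the test rows, so no information leaks from the test set. A sketch with made-up `price`/`advertising` values and an arbitrary 75/25 split:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature data
df = pd.DataFrame({
    'price':       [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    'advertising': [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0],
})

X_train, X_test = train_test_split(df[['price', 'advertising']],
                                   test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# The same transformation by hand, with the population std (ddof=0)
manual = (X_train - X_train.mean()) / X_train.std(ddof=0)
print(np.allclose(X_train_scaled, manual))
```

The scaled training columns have mean 0 and standard deviation 1; the scaled test columns generally will not, which is expected.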

7. Dask: The parallel pioneer for distributed computing

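Dask scales pandas-style workflows to datasets larger than memory by splitting them into partitions and evaluating lazily; the same code can run on a single machine or a distributed cluster.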

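<code>import dask.dataframe as dd

ddf = dd.read_csv('large_sales.csv')  # lazy: only samples the file to infer column types
average = ddf.groupby('category')['sales'].mean().compute()  # compute() runs the work and returns a pandas Series</code>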
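A grouped mean cannot simply be averaged across partitions, because partitions hold different numbers of rows per group. The usual approach is to compute partial sums and counts per partition and combine them at the end. The pandas-only sketch below illustrates that idea with chunked reading; it is a conceptual illustration, not Dask's actual implementation, and the inline CSV is a made-up stand-in for `large_sales.csv`:

```python
import io

import pandas as pd

# Made-up stand-in for large_sales.csv
csv_text = """category,sales
A,10
A,20
A,60
B,40
B,20
C,60
"""

partials = []
# Read two rows at a time to mimic separate partitions
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Partial results per chunk: sum and row count for each category
    partials.append(chunk.groupby('category')['sales'].agg(['sum', 'count']))

# Combine: total sum / total count per category
combined = pd.concat(partials).groupby(level=0).sum()
average = combined['sum'] / combined['count']

# Averaging the per-chunk means for A would give (15 + 60) / 2 = 37.5, not the true 30
full_mean = pd.read_csv(io.StringIO(csv_text)).groupby('category')['sales'].mean()
print(average)
```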

8. PySpark: The distributed engine for big‑data analysis

PySpark, the Python API for Apache Spark, processes very large datasets by distributing the work across a cluster.

<code>from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SalesAnalysis").getOrCreate()

df_spark = spark.read.csv('sales_data.csv', header=True, inferSchema=True)

df_spark.orderBy(df_spark['sales'].desc()).show(5)</code>
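When checking Spark logic on a small sample, it can help to run the same "top 5 by sales" query in pandas and compare. A sketch with made-up rows; Spark distributes the sort across executors, while pandas sorts in memory on one machine. (In PySpark, `df_spark.orderBy(...).limit(5).toPandas()` returns the top rows as a pandas DataFrame instead of printing them.)

```python
import pandas as pd

# Hypothetical sample of sales_data.csv
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'sales':   [120, 340, 90, 560, 210, 480, 300],
})

# pandas counterpart of df_spark.orderBy(df_spark['sales'].desc()).show(5)
top5 = df.sort_values('sales', ascending=False).head(5)

# nlargest returns the same rows in the same order when there are no ties
top5_alt = df.nlargest(5, 'sales')
print(top5)
```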

9. Bokeh: The lightweight option for interactive visualizations

Bokeh creates interactive charts that integrate well with web applications.

<code>from bokeh.plotting import figure, show

p = figure(title="Sales vs. Price", x_axis_label='Price', y_axis_label='Sales')
# scatter() draws circle markers by default; circle() with size= is deprecated since Bokeh 3.4
p.scatter(df['price'], df['sales'], size=10, color='blue', alpha=0.5)
show(p)</code>
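Much of Bokeh's interactivity comes from tools such as hover tooltips, which read their values from a `ColumnDataSource`. A sketch that adds a tooltip and writes a standalone HTML file instead of opening a browser; the sample data and the filename `sales_vs_price.html` are made up for illustration:

```python
from pathlib import Path

import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, save
from bokeh.resources import CDN

# Hypothetical data
df = pd.DataFrame({'price': [10, 12, 15, 18, 20],
                   'sales': [300, 280, 240, 200, 180]})

# DataFrame columns become fields that tooltips reference as '@price', '@sales'
source = ColumnDataSource(df)

p = figure(title="Sales vs. Price", x_axis_label='Price', y_axis_label='Sales')
p.scatter('price', 'sales', source=source, size=10, color='blue', alpha=0.5)
p.add_tools(HoverTool(tooltips=[('Price', '@price'), ('Sales', '@sales')]))

out = Path('sales_vs_price.html')  # hypothetical output path
# CDN resources keep the file small; BokehJS is loaded from the web when opened
save(p, filename=str(out), resources=CDN, title='Sales vs. Price')
```

The resulting HTML file can be opened directly in a browser or embedded in a web page.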

10. Prophet: The powerhouse for time‑series forecasting

Prophet fits an additive model of trend, seasonality, and holiday effects, which makes it well suited to business time series with strong seasonal patterns.

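<code>import matplotlib.pyplot as plt
from prophet import Prophet

# Prophet requires a DataFrame with a 'ds' (date) column and a 'y' (value) column
df_prophet = df[['register_date', 'sales']].rename(columns={'register_date': 'ds', 'sales': 'y'})
model = Prophet()
model.fit(df_prophet)
future = model.make_future_dataframe(periods=365)  # history plus 365 future days
forecast = model.predict(future)
fig = model.plot(forecast)
plt.show()</code>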
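Prophet expects a DataFrame with a `ds` date column and a numeric `y` column. When the source table has several rows per day, as a customer table typically does, summing them into one value per day usually gives a cleaner series. A pandas-only sketch of that preparation step with made-up rows:

```python
import pandas as pd

# Hypothetical row-level records: several customers may register on the same day
df = pd.DataFrame({
    'register_date': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02',
                                     '2024-01-04', '2024-01-04']),
    'sales': [100, 50, 80, 40, 60],
})

# One row per calendar day; days with no records sum to 0
daily = df.set_index('register_date')['sales'].resample('D').sum()

# Rename to the column names Prophet requires
df_prophet = daily.reset_index().rename(columns={'register_date': 'ds', 'sales': 'y'})
print(df_prophet)
```

Filling empty days with 0 assumes "no records" really means "no sales". If a gap means the data is missing instead, it is safer to leave those dates out, since Prophet can fit a series with missing dates.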

