Comprehensive Guide to Pandas: Series, DataFrames, Aggregation, and Visualization with Matplotlib

This tutorial introduces Pandas as a core Python library for data processing, demonstrates environment setup, shows how to create and manipulate Series and DataFrames, performs data aggregation and grouping on the Iris dataset, and visualizes results using Matplotlib with extensive code examples.

Python Programming Learning Circle

May 5, 2025

Pandas is a core Python library for data processing and analysis. It offers fast, flexible data structures, chiefly the one-dimensional Series and the two-dimensional DataFrame, and can import data from CSV files, Excel workbooks, and SQL databases. It also provides functions for cleaning, merging, reshaping, grouping, and time-series analysis, which makes it useful across finance, statistics, social science, and engineering.

Environment configuration: check whether Pandas is installed with pip show pandas; if it is not, install it with pip install pandas.
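The same check can also be made from inside Python. The sketch below is one possible approach using the standard library's importlib.metadata; the helper name installed_version is an illustrative assumption, not part of Pandas or pip.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pandas", "matplotlib"):
    version = installed_version(name)
    if version is None:
        print(name, "is not installed; run: pip install", name)
    else:
        print(name, "version", version)
```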

Series and DataFrame operations: initialize a Series with explicit index labels and a name:

import pandas as pd
A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)

Retrieve the values with print("Values:", A.values) and the index with print("Index:", A.index), select several labels at once with print(A[["A", "C"]]), and modify values by label:

A[["A", "C"]] = [11, 12]
print(A)
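Label-based selection and assignment can also be written with .loc, which makes the intent explicit. The following is a minimal sketch on the same kind of data; the variable names s, values, labels, and subset are illustrative.

```python
import pandas as pd

# Same shape of data as the Series above; the name "s" is illustrative.
s = pd.Series([1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")

values = s.to_numpy()          # the underlying data as a NumPy array
labels = list(s.index)         # the index labels as a plain list

subset = s.loc[["A", "C"]]     # select several labels at once
s.loc[["A", "C"]] = [11, 12]   # overwrite the values stored under those labels
s.loc["F"] = 6                 # assigning to a new label appends an element

print(subset)
print(s)
```

Here subset is taken before the assignment, so it still holds the original values for "A" and "C".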

Create a Series from a dictionary:

A = pd.Series({"A":1, "B":2, "C":3, "D":4})
print(A)

Count how often each value occurs with print(A.value_counts()).
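Counting is easier to follow when the values repeat. The sketch below uses made-up data (the names and grades are hypothetical) to show absolute counts and relative frequencies.

```python
import pandas as pd

# Hypothetical data: a dictionary whose values contain duplicates.
grades = pd.Series({"Ann": "B", "Bob": "A", "Cy": "B", "Di": "C", "Ed": "B"})

counts = grades.value_counts()                # frequency of each distinct value, most common first
shares = grades.value_counts(normalize=True)  # the same counts as fractions of the total

print(counts)
print(shares)
```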

Data aggregation and grouping: load the Iris dataset:

iris = pd.read_csv("D:/iris.csv")
print(iris.head())

Compute column means, minima, maxima, and sizes using NumPy functions, then group by species to calculate the mean, skewness, and custom aggregations. For example, the per-species mean of each measurement:

res = iris.drop("Id", axis=1).groupby("Species").mean()
print(res)
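The source mentions column statistics computed with NumPy functions but does not show the code. One way this might look is sketched below on a few stand-in rows; the DataFrame df and its values are made up for illustration and are not the real Iris data.

```python
import numpy as np
import pandas as pd

# Stand-in rows that reuse the column names above; not the real Iris data.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

numeric = df.drop("Species", axis=1)   # keep only the numeric columns
values = numeric.to_numpy()            # 2-D array: rows are samples, columns are measurements

# axis=0 applies each NumPy function down the rows, giving one result per column.
summary = pd.DataFrame(
    {
        "mean": np.mean(values, axis=0),
        "min": np.min(values, axis=0),
        "max": np.max(values, axis=0),
        "size": np.size(values, axis=0),
    },
    index=numeric.columns,
)
print(summary)
```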

A custom aggregation can apply different functions to each column:

res = iris.drop("Id", axis=1).groupby("Species").agg({"SepalLengthCm":["min","max","mean"], "SepalWidthCm":["min"], "PetalLengthCm":["skew"]})
print(res)
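A dictionary-based aggregation returns two-level column labels, which can be awkward to reference later. The sketch below groups a small made-up frame by species and flattens the labels; the data and the joined column names are illustrative assumptions.

```python
import pandas as pd

# Made-up rows with the same column names; not the real Iris data.
df = pd.DataFrame({
    "Species": ["setosa"] * 3 + ["virginica"] * 3,
    "SepalLengthCm": [5.0, 5.2, 5.4, 6.0, 6.5, 7.6],
    "PetalLengthCm": [1.4, 1.4, 1.7, 5.0, 5.5, 6.0],
})

res = df.groupby("Species").agg(
    {"SepalLengthCm": ["min", "max", "mean"], "PetalLengthCm": ["skew"]}
)

# Columns are pairs such as ("SepalLengthCm", "min");
# joining the two levels gives flat names like "SepalLengthCm_min".
res.columns = ["_".join(col) for col in res.columns]
print(res)
```

Skewness needs at least three observations per group, so very small groups produce NaN in that column.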

Data visualization with Matplotlib: verify the installation with pip show matplotlib. The examples cover a boxplot of the four flower measurements by species, a scatter plot colored by species, a hexbin heatmap, and a line plot of the dataset. The boxplot and the scatter plot:

import matplotlib.pyplot as plt
# One boxplot panel per measurement, with one box per species
iris.iloc[:,1:6].boxplot(column=["SepalLengthCm","SepalWidthCm","PetalLengthCm","PetalWidthCm"], by="Species", figsize=(10,10))
plt.show()

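# The dictionary keys must match the values in the Species column exactly; unmatched rows map to NaN
color = iris.Species.map({"setosa":"blue","versicolor":"green","virginica":"red"})
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", s=30, c=color, figsize=(10,10))
plt.show()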
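Because unmapped species silently become NaN, it can help to check the color mapping before plotting. The sketch below does this on a few stand-in rows; the DataFrame df, the palette dictionary, the Agg backend, and the output file name are assumptions made so the example runs without the CSV file or a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Stand-in rows; not the real Iris data.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

palette = {"setosa": "blue", "versicolor": "green", "virginica": "red"}
colors = df["Species"].map(palette)

# Fail early if any species has no color assigned.
missing = df.loc[colors.isna(), "Species"].unique()
if len(missing) > 0:
    raise ValueError(f"No color defined for: {list(missing)}")

ax = df.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm",
             s=30, c=colors.tolist(), figsize=(6, 6))
ax.set_title("Sepal length vs. width by species")

# Hypothetical output location; in an interactive session plt.show() could be used instead.
out_path = os.path.join(tempfile.gettempdir(), "iris_scatter.png")
ax.figure.savefig(out_path)
```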

A hexbin heatmap of the same two columns:

iris.plot(kind="hexbin", x="SepalLengthCm", y="SepalWidthCm", gridsize=15, figsize=(10,7), sharex=False)
plt.show()
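The line plot mentioned in this section is not shown in the source. One plausible form, assuming it draws each numeric column against the row index, is sketched below on stand-in rows; the data, axis labels, and the Agg backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Stand-in rows with the four measurement columns; not the real Iris data.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5],
})

# kind="line" draws one line per numeric column, using the row index as the x-axis.
ax = df.plot(kind="line", figsize=(10, 5))
ax.set_xlabel("Row index")
ax.set_ylabel("Measurement (cm)")
```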

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
