
Comprehensive Guide to Pandas: Series, DataFrames, Aggregation, and Visualization with Matplotlib

This tutorial introduces Pandas as a core Python library for data processing, demonstrates environment setup, shows how to create and manipulate Series and DataFrames, performs data aggregation and grouping on the Iris dataset, and visualizes results using Matplotlib with extensive code examples.

Python Programming Learning Circle

Pandas is a core Python library for data processing and analysis. It offers fast, flexible data structures, chiefly the one-dimensional Series and the two-dimensional DataFrame; it can import data from CSV files, Excel workbooks, and SQL databases, and it provides functions for cleaning, merging, reshaping, grouping, and time-series analysis, which makes it useful in finance, statistics, social science, and engineering.

Environment configuration: check whether Pandas is installed with pip show pandas; if it is not, install it with pip install pandas.
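The same check can be made from inside Python without calling pip, which helps in notebooks where the pip on the PATH may belong to a different interpreter. A minimal standard-library sketch; the helper name is_installed is hypothetical:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` is importable by the running interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("pandas"):
        import pandas as pd
        print("pandas", pd.__version__)  # version string of the installed package
    else:
        print("pandas not found; install it with: pip install pandas")
```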

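Series and DataFrame operations: create a labeled Series, read back its values and index, select and modify elements by label, build a Series from a dictionary, and count how often each value occurs:

```python
import pandas as pd

# A one-dimensional Series with string labels and a name
A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)

# Underlying values (a NumPy array) and the index labels
print("Values:", A.values)
print("Index:", A.index)

# Select several elements by passing a list of labels
print(A[["A", "C"]])

# Assign new values to the selected labels
A[["A", "C"]] = [11, 12]
print(A)

# Build a Series from a dictionary: the keys become the index
A = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4})
print(A)

# Count how often each distinct value occurs
print(A.value_counts())
```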
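Two Series details are easy to miss: a label slice written with a colon includes its end label, whereas a positional slice excludes its end position, and value_counts() is most informative when values repeat (in a Series whose values are all distinct, every count is 1). A short sketch with illustrative data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"])

# Label slice: both endpoints are included -> labels A, B, C
by_label = s.loc["A":"C"]

# Positional slice: the end position is excluded -> labels A, B
by_position = s.iloc[0:2]

# value_counts() counts each distinct value, most frequent first
grades = pd.Series(["B", "A", "B", "C", "B", "A"])
counts = grades.value_counts()

print(by_label.tolist())        # [1, 2, 3]
print(by_position.tolist())     # [1, 2]
print(counts.index.tolist())    # ['B', 'A', 'C']
print(counts.tolist())          # [3, 2, 1]
```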

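Data aggregation and grouping: load the Iris dataset and preview its first rows, then compute column means, minima, maxima, and sizes with NumPy functions. Grouping by species yields per-class statistics such as the mean; agg() also accepts a dictionary that maps each column to the statistics to compute for it, here the minimum, maximum, and mean of sepal length, the minimum of sepal width, and the skewness of petal length:

```python
import pandas as pd

# Adjust the path to wherever iris.csv is stored
iris = pd.read_csv("D:/iris.csv")
print(iris.head())

# Mean of every measurement per species (the Id column is dropped first)
res = iris.drop("Id", axis=1).groupby("Species").mean()
print(res)

# Different statistics for different columns, computed per species
res = iris.drop("Id", axis=1).groupby("Species").agg({
    "SepalLengthCm": ["min", "max", "mean"],
    "SepalWidthCm": ["min"],
    "PetalLengthCm": ["skew"],
})
print(res)
```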
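Column-wise statistics with NumPy can be sketched as follows. To stay self-contained, this version builds a nine-row DataFrame in the same column layout as the Kaggle-style iris.csv instead of reading D:/iris.csv; the rows are a small sample meant only to make the snippet runnable, so the numbers will not match the full 150-row dataset:

```python
import numpy as np
import pandas as pd

# Nine sample rows in the same column layout as iris.csv (illustration only)
iris = pd.DataFrame({
    "Id": [1, 2, 3, 51, 52, 53, 101, 102, 103],
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "PetalWidthCm": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# The four numeric measurement columns as a (rows x 4) NumPy array
values = iris.drop(columns=["Id", "Species"]).to_numpy()

# NumPy reductions along axis 0 give one value per column
col_mean = np.mean(values, axis=0)
col_min = np.min(values, axis=0)
col_max = np.max(values, axis=0)
n_rows = np.size(values, axis=0)  # number of rows (observations)

print(np.round(col_mean, 2), col_min, col_max, n_rows, sep="\n")

# Per-species aggregation, as in the pandas example
by_species_mean = iris.drop("Id", axis=1).groupby("Species").mean()
custom = iris.drop("Id", axis=1).groupby("Species").agg({
    "SepalLengthCm": ["min", "max", "mean"],
    "SepalWidthCm": ["min"],
    "PetalLengthCm": ["skew"],  # needs at least 3 rows per group, else NaN
})
print(by_species_mean)
print(custom)
```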

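Data visualization with Matplotlib: verify the installation with pip show matplotlib. The section draws a boxplot of the four flower measurements grouped by species, a scatter plot whose point colors encode the species, a hexagonal-bin (hexbin) density plot, and a line plot of the dataset. The code for the first three plots, which reuses the iris DataFrame loaded earlier, is:

```python
import matplotlib.pyplot as plt

# Boxplots of the four measurements grouped by species;
# iloc[:, 1:6] drops the Id column and keeps the measurements plus Species
iris.iloc[:, 1:6].boxplot(
    column=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    by="Species",
    figsize=(10, 10),
)
plt.show()

# Scatter plot colored by species; the keys must match the Species labels
color = iris.Species.map({"setosa": "blue", "versicolor": "green", "virginica": "red"})
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", s=30, c=color, figsize=(10, 10))
plt.show()

# Hexbin plot: point density over the sepal length/width plane
iris.plot(kind="hexbin", x="SepalLengthCm", y="SepalWidthCm", gridsize=15, figsize=(10, 7), sharex=False)
plt.show()
```

The color dictionary's keys must match the labels in the Species column, because map() returns NaN for any label it does not find. The Kaggle version of iris.csv (the one with an Id column and Cm-suffixed column names) labels the species Iris-setosa, Iris-versicolor, and Iris-virginica, so with that file the keys need the Iris- prefix.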
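A line plot of the measurements can be sketched in the same style. This version rests on a few assumptions: it uses the non-interactive Agg backend so it runs without a display, builds a small stand-in for iris (sample rows only), strips an optional Iris- prefix so one color dictionary works with either labeling, and saves figures to a temporary directory instead of calling plt.show(); the file names are hypothetical:

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the iris DataFrame (same columns, a few sample rows)
iris = pd.DataFrame({
    "Id": [1, 2, 51, 52, 101, 102],
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2,
})

# Accept both "setosa" and "Iris-setosa" style labels before mapping to colors
palette = {"setosa": "blue", "versicolor": "green", "virginica": "red"}
color = iris["Species"].str.replace("Iris-", "", regex=False).map(palette)

out_dir = tempfile.mkdtemp()

# Line plot: one line per measurement column, x axis is the row index
ax_line = iris.drop(columns=["Id", "Species"]).plot(kind="line", figsize=(10, 6))
ax_line.set_xlabel("Row")
ax_line.set_ylabel("cm")
line_path = os.path.join(out_dir, "iris_line.png")
ax_line.figure.savefig(line_path)

# Scatter plot colored by species, using the normalized color Series
ax_scatter = iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", s=30, c=color)
scatter_path = os.path.join(out_dir, "iris_scatter.png")
ax_scatter.figure.savefig(scatter_path)

plt.close("all")  # free the figures once they are saved
```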

Written by Python Programming Learning Circle, a global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full-stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
