Comprehensive Guide to Pandas: Series, DataFrames, Aggregation, and Visualization with Matplotlib
This tutorial introduces Pandas as a core Python library for data processing, demonstrates environment setup, shows how to create and manipulate Series and DataFrames, performs data aggregation and grouping on the Iris dataset, and visualizes results using Matplotlib with extensive code examples.
Pandas is a core Python library for data processing and analysis. It offers fast, flexible data structures, chiefly the one‑dimensional Series and the two‑dimensional DataFrame, and it can import data from CSV files, Excel workbooks, and SQL databases. It also provides functions for cleaning, merging, reshaping, grouping, and time‑series analysis, which makes it useful in finance, statistics, social science, and engineering.
Environment configuration: check whether Pandas is installed with pip show pandas; if it is not, install it with pip install pandas.
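The same check can also be made from inside Python, which is handy when several interpreters are installed and it is unclear which one pip is attached to. The following is a minimal sketch; the helper name is_installed is hypothetical and not part of Pandas or pip.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found on the current import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(package_name) is not None


if is_installed("pandas"):
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    print("pandas is not installed; run: pip install pandas")
```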
Series and DataFrame operations: the statements below assume import pandas as pd.
Initialize a Series with explicit labels and a name: A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1"); print(A)
Retrieve the values: print("Values:", A.values)
Get the index: print("Index:", A.index)
Select elements by label: print(A[["A", "C"]])
Modify values by label: A[["A", "C"]] = [11, 12]; print(A)
Create a Series from a dictionary, whose keys become the index: A = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4}); print(A)
Count how often each value occurs: print(A.value_counts())
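The Series operations above can be collected into one runnable sketch. The variable names other than A are chosen here for illustration, and the Series with repeated values is an added assumption so that value_counts returns something other than all ones.

```python
import pandas as pd

# Build a labelled Series, as in the text above.
A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")

original_values = list(A.values)   # copy of the data before any modification
labels = list(A.index)             # the index labels
subset = A[["A", "C"]]             # selecting with a list of labels returns a new Series

# Assign new values to the elements labelled "A" and "C".
A[["A", "C"]] = [11, 12]

# A dictionary's keys become the index, its values the data.
from_dict = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4})

# Hypothetical Series with repeated values, to make the counts informative.
repeated = pd.Series([1, 1, 2])
counts = repeated.value_counts()

print(A)
print(from_dict)
print(counts)
```

Note that value_counts returns a new Series indexed by the distinct values, sorted by descending frequency by default.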
Data aggregation and grouping: load the Iris dataset with iris = pd.read_csv("D:/iris.csv"); print(iris.head()). Column means, minima, maxima, and sizes can be computed with NumPy functions. Group by species to calculate the mean, skewness, and custom aggregations:
res = iris.drop("Id", axis=1).groupby("Species").mean(); print(res)
res = iris.drop("Id", axis=1).groupby("Species").agg({"SepalLengthCm": ["min", "max", "mean"], "SepalWidthCm": ["min"], "PetalLengthCm": ["skew"]}); print(res)
The dictionary passed to agg maps each column to the list of statistics computed for it within each species.
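The sketch below illustrates the aggregation steps without needing the CSV file. The small DataFrame is a hand-typed stand-in with the same column names, not the real dataset; the variable names are illustrative assumptions as well.

```python
import numpy as np
import pandas as pd

# Hand-typed stand-in for the Iris file (assumption: same column names, six rows only).
iris = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1],
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9],
    "PetalWidthCm": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,
})

# Column-wise statistics with NumPy on the numeric measurements only.
measurements = iris.drop(["Id", "Species"], axis=1).to_numpy()
col_means = np.mean(measurements, axis=0)
col_mins = np.min(measurements, axis=0)
col_maxs = np.max(measurements, axis=0)
n_values = np.size(measurements)      # total number of cells (rows * columns)

# Per-species mean of every measurement column.
by_species = iris.drop("Id", axis=1).groupby("Species")
means = by_species.mean()

# Different statistics per column; the result has two-level column labels.
custom = by_species.agg({
    "SepalLengthCm": ["min", "max", "mean"],
    "SepalWidthCm": ["min"],
    "PetalLengthCm": ["skew"],
})

print(means)
print(custom)
```

Individual results can be read from the two-level columns with a tuple, for example custom[("SepalLengthCm", "max")].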
Data visualization with Matplotlib: verify the installation with pip show matplotlib. The statements below assume import matplotlib.pyplot as plt. The tutorial draws a boxplot of the four flower measurements by species, a scatter plot colored by species, a hexbin heatmap, and a line plot of the dataset.
Boxplot by species: iris.iloc[:, 1:6].boxplot(column=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"], by="Species", figsize=(10, 10)); plt.show()
Scatter plot with one color per species: color = iris.Species.map({"setosa": "blue", "versicolor": "green", "virginica": "red"}); iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", s=30, c=color, figsize=(10, 10)); plt.show()
Hexbin heatmap: iris.plot(kind="hexbin", x="SepalLengthCm", y="SepalWidthCm", gridsize=15, figsize=(10, 7), sharex=False); plt.show()
The keys of the color dictionary must match the labels actually stored in the Species column; some copies of the dataset use labels such as Iris-setosa rather than setosa, in which case unmatched rows map to missing values.
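The plotting calls can be tried without the CSV file using the sketch below. It rests on several assumptions that are not in the original: the tiny hand-typed DataFrame stands in for the real data, the non-interactive Agg backend is selected so the code runs without a display, and the line plot (which the text mentions but does not show) is one plausible form of it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in for the Iris file (assumption: same column names, six rows only).
iris = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1],
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9],
    "PetalWidthCm": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,
})
measure_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Boxplot: one panel per measurement, one box per species.
iris.iloc[:, 1:6].boxplot(column=measure_cols, by="Species", figsize=(10, 10))

# Scatter plot: map each species label to a color name.
color = iris["Species"].map(
    {"setosa": "blue", "versicolor": "green", "virginica": "red"}
)
scatter_ax = iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm",
                       s=30, c=color, figsize=(10, 10))

# Hexbin heatmap: point density over a hexagonal grid.
hexbin_ax = iris.plot(kind="hexbin", x="SepalLengthCm", y="SepalWidthCm",
                      gridsize=15, figsize=(10, 7), sharex=False)

# Line plot of the four measurements against the row position (assumed form).
line_ax = iris[measure_cols].plot(kind="line", figsize=(10, 7))

n_lines = len(line_ax.get_lines())
plt.close("all")  # in an interactive session, call plt.show() instead
```

With an interactive backend, removing the matplotlib.use line and calling plt.show() after each plot reproduces the behavior described in the text.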
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.