Comprehensive Pandas Tutorial: Installation, Core Concepts, Data I/O, Manipulation, and Visualization
This tutorial introduces Pandas, covering installation, core data structures like Series and DataFrame, data input/output, viewing, selection, filtering, sorting, grouping, aggregation, handling missing values, merging, advanced features such as time series and multi‑index, performance tips, and basic visualization techniques.
Introduction
Pandas is an open-source Python library widely used for data analysis, cleaning, and manipulation.
1. Install Pandas
To install the library, run:
pip install pandas
2. Basic Concepts
2.1 Data Structures
Pandas provides two main structures:
Series – a one‑dimensional labeled array, similar to a list whose items each carry an index label.
DataFrame – a two‑dimensional table similar to an Excel sheet or SQL table.
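A minimal sketch of how the two structures relate, assuming made-up labels and values (the names `ages`, `names`, and `people` are illustrative only): a Series pairs each value with an index label, and each column of a DataFrame is itself a Series.

```python
import pandas as pd

# Illustrative data; the labels 'a', 'b', 'c' are made up for this sketch.
ages = pd.Series([25, 30, 35], index=['a', 'b', 'c'], name='Age')
names = pd.Series(['Alice', 'Bob', 'Charlie'], index=['a', 'b', 'c'], name='Name')

# Unlike a plain list, a Series can be addressed by label as well as by position.
first_by_label = ages['a']
first_by_position = ages.iloc[0]

# A DataFrame is a set of Series that share one index.
people = pd.DataFrame({'Name': names, 'Age': ages})
age_column = people['Age']

print(type(age_column).__name__)
print(people.shape)
```

Selecting a single column returns a Series again, which is why most Series methods also work on DataFrame columns.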
2.2 Creating Series and DataFrames
Series
import numpy as np
import pandas as pd
# Create a Series (np.nan marks a missing value)
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
DataFrame
# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
3. Reading and Writing Data
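The read and write calls in this section assume that files such as data.csv already exist. As a self-contained sketch that needs no files on disk, a CSV round trip can be tried with an in-memory buffer (the names `df_out`, `df_in`, and `buffer` are illustrative):

```python
import io

import pandas as pd

df_out = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back.
buffer.seek(0)
df_in = pd.read_csv(buffer)

print(df_in)
```

Passing index=False keeps the row index out of the CSV text, so the frame read back has the same two columns as the one written.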
3.1 Reading
# Read CSV file
df = pd.read_csv('data.csv')
# Read Excel file (needs an Excel engine such as openpyxl installed)
df = pd.read_excel('data.xlsx')
# Read SQL data
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT * FROM table_name", conn)
3.2 Writing
# Write CSV file
df.to_csv('output.csv', index=False)
# Write Excel file
df.to_excel('output.xlsx', index=False)
# Write SQL data
df.to_sql('table_name', conn, if_exists='replace', index=False)
4. Data Inspection
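Alongside the methods covered in this section, a few attributes give a quick first look at a frame. The sketch below uses a small made-up frame named `df_demo` (an illustrative name) so it runs on its own:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df_demo.shape
column_names = list(df_demo.columns)

print(n_rows, n_cols)
print(column_names)
print(df_demo.dtypes)   # data type of each column
print(df_demo.head(2))  # head() and tail() accept a row count
```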
4.1 Head and Tail
# First 5 rows
print(df.head())
# Last 5 rows
print(df.tail())
4.2 Basic Info
# DataFrame info (info() prints its report itself and returns None)
df.info()
# Descriptive statistics
print(df.describe())
5. Selection and Filtering
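Besides the bracket syntax used in this section, pandas offers the `loc` (label-based) and `iloc` (position-based) indexers, which select rows and columns in one step. A minimal sketch on a made-up frame named `df_demo` (an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# loc selects by label: rows where Age > 25, columns Name and City.
older = df_demo.loc[df_demo['Age'] > 25, ['Name', 'City']]

# iloc selects by integer position: first two rows, first two columns.
top_left = df_demo.iloc[:2, :2]

# isin() tests membership in a list of values.
coastal = df_demo[df_demo['City'].isin(['New York', 'Los Angeles'])]

print(older)
print(top_left)
print(coastal)
```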
5.1 Column Selection
# Single column
age = df['Age']
# Multiple columns
subset = df[['Name', 'Age']]
5.2 Row Filtering
# Filter rows where Age > 30
filtered_df = df[df['Age'] > 30]
# Multiple conditions
filtered_df = df[(df['Age'] > 30) & (df['City'] == 'Los Angeles')]
6. Sorting
# Ascending sort by Age
sorted_df = df.sort_values(by='Age')
# Descending sort by Age
sorted_df = df.sort_values(by='Age', ascending=False)
7. Grouping and Aggregation
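Grouping follows a split-apply-combine pattern: rows are split by key, a function is applied to each group, and the results are combined into one table. A minimal sketch with made-up data, using named aggregation to give the output columns readable names (`df_demo` and the output column names are illustrative):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'City': ['Chicago', 'Chicago', 'New York'],
    'Age': [20, 40, 30],
})

# Split rows by City, aggregate Age three ways, combine into one table.
summary = df_demo.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    count=('Age', 'size'),
)

print(summary)
```

Named aggregation produces flat column names, whereas passing a dict of lists to agg() yields a two-level column index.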
7.1 Grouping
# Group by City
grouped = df.groupby('City')
# Mean age per city
mean_age = grouped['Age'].mean()
print(mean_age)
7.2 Aggregation
# Multiple aggregation functions
aggregated = df.groupby('City').agg({'Age': ['mean', 'max', 'min']})
print(aggregated)
8. Data Processing
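As a self-contained preview of the missing-value handling in this section, the sketch below applies dropping and filling to the same made-up frame so the two results can be compared (`df_demo`, `dropped`, and `filled` are illustrative names):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [20.0, np.nan, 40.0],
})

# Count missing values per column.
missing_per_column = df_demo.isnull().sum()

# Alternative 1: drop every row that has a missing value.
dropped = df_demo.dropna()

# Alternative 2: fill the missing Age with the mean of the known ages.
filled = df_demo.fillna({'Age': df_demo['Age'].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Both calls return new frames and leave df_demo unchanged, so the alternatives can be compared side by side before choosing one.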
8.1 Handling Missing Values
# Check missing values
print(df.isnull().sum())
# The options below are alternatives; pick one rather than running them in sequence
# Option 1: drop rows with missing values
df = df.dropna()
# Option 2: fill missing values with a constant
df = df.fillna(0)
# Option 3: fill missing Age with the mean age
df['Age'] = df['Age'].fillna(df['Age'].mean())
8.2 Data Transformation
# Add new column
df['IsAdult'] = df['Age'] >= 18
# Apply function to column
df['AgeCategory'] = df['Age'].apply(lambda x: 'Child' if x < 18 else 'Adult')
9. Merging Data
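The `how` argument of pd.merge decides which keys survive the join. A minimal sketch comparing three join types on made-up frames (`left` and `right` are illustrative names):

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
right = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'], 'Salary': [50000, 60000, 70000]})

# inner: only names present in both frames.
inner = pd.merge(left, right, on='Name', how='inner')

# left: every row of the left frame; Salary is NaN where there is no match.
left_join = pd.merge(left, right, on='Name', how='left')

# outer: names from either frame; indicator=True adds a _merge column
# recording where each row came from.
outer = pd.merge(left, right, on='Name', how='outer', indicator=True)

print(inner)
print(left_join)
print(outer)
```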
# Create another DataFrame
data2 = {
'Name': ['Alice', 'Bob', 'David'],
'Salary': [50000, 60000, 70000]
}
df2 = pd.DataFrame(data2)
# Merge on Name
merged_df = pd.merge(df, df2, on='Name', how='inner')
print(merged_df)
10. Advanced Features
10.1 Time Series
# Create time series
dates = pd.date_range('20230101', periods=10)
ts = pd.Series(np.random.randn(10), index=dates)
print(ts)
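# Resample to 3-day bins and take the mean of each bin
# (the series is already daily, so resampling to 'D' would change nothing)
resampled = ts.resample('3D').mean()
print(resampled)
10.2 Multi‑Index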
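# Create a two‑level index
index = pd.MultiIndex.from_tuples([(i, j) for i in range(5) for j in range(5)])
# Use a separate name so the earlier df (with Name, Age, City) is not overwritten
mi_df = pd.DataFrame(np.random.randn(25, 2), index=index, columns=['A', 'B'])
# Select all rows whose first index level equals 0
print(mi_df.loc[0])
# Swap the two index levels
mi_df = mi_df.swaplevel(0, 1)
print(mi_df)
11. Performance Optimization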
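Whether an optimization pays off is best checked by measuring. As a rough sketch, assuming a made-up column with only three distinct values repeated many times, memory_usage(deep=True) shows the effect of converting to the categorical type (`cities` and `as_category` are illustrative names):

```python
import pandas as pd

# 30,000 strings, but only three distinct values.
cities = pd.Series(['New York', 'Los Angeles', 'Chicago'] * 10_000)
as_category = cities.astype('category')

# deep=True includes the memory held by the string values themselves.
plain_bytes = cities.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(plain_bytes, category_bytes)
```

The categorical version stores each distinct string once plus a small integer code per row, so the saving grows with the amount of repetition; a column of mostly unique strings gains little.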
11.1 Categorical Type
# Convert string column to categorical
df['City'] = df['City'].astype('category')
11.2 Using NumPy Arrays
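# Operate on the column's underlying NumPy array
arr = df['Age'].to_numpy()
# Build a new array rather than modifying arr in place, which could
# silently alter df or fail if the array is read-only
df['Age'] = arr * 2
12. Visualization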
import matplotlib.pyplot as plt
# Bar chart
df.plot(kind='bar', x='Name', y='Age')
plt.show()
# Line chart
df.plot(kind='line', x='Name', y='Age')
plt.show()
Conclusion
Pandas offers a rich set of features for data handling and analysis; the examples above provide a solid foundation for mastering its basic usage and progressing to more advanced capabilities.
Test Development Learning Exchange