Getting Started with Pandas: Installation, DataFrames, and Basic Data Analysis in Python
This tutorial introduces Pandas, a powerful Python data‑analysis library, covering installation, importing, creating DataFrames from various sources, basic inspection, selection, filtering, sorting, grouping, handling missing values, and a practical stock‑price analysis example with code snippets.
Pandas is widely used for data cleaning, processing, and analysis. It provides data structures and tools that simplify working with tabular data.
Installation
Install Pandas via pip:
<code>pip install pandas</code>
Importing Pandas
After installation, import the library using the conventional alias pd:
<code>import pandas as pd</code>
Creating a DataFrame
A DataFrame is Pandas' primary data structure, similar to an Excel sheet or SQL table. It can be created from lists, dictionaries, CSV files, etc.
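As a minimal sketch of the list case mentioned above, a DataFrame can be built from a list of rows or a list of dictionaries. The variable names here are illustrative, and the values are this tutorial's sample people.

```python
import pandas as pd

# Each inner list is one row; column names are supplied separately.
rows = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

# A list of dictionaries also works; the keys become column names.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
]
df_from_records = pd.DataFrame(records)

print(df_from_rows.shape)             # (3, 3)
print(list(df_from_records.columns))  # ['Name', 'Age', 'City']
```

The list-of-rows form is convenient when data arrives row by row, for example from a loop that parses log lines.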
From a Dictionary
<code>data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
</code>
Output:
<code>      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
</code>
From a CSV File
Assume a file named data.csv with the following content:
<code>Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
</code>
Read it with:
<code>df = pd.read_csv('data.csv')
print(df)
</code>
Output:
<code>      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
</code>
Basic DataFrame Inspection
df.head() – view the first five rows (pass a number, as in df.head(10), to change the count).
df.tail() – view the last five rows; it accepts the same argument.
df.columns – list the column names.
df.dtypes – show the data type of each column.
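The inspection calls above can be tried on a small in-memory table. This is a sketch that rebuilds the tutorial's sample data so it runs without a CSV file.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

first_two = df.head(2)   # first two rows instead of the default five
last_one = df.tail(1)    # last row only

print(first_two)
print(last_one)
print(list(df.columns))  # ['Name', 'Age', 'City']
print(df.dtypes)         # one dtype per column; Age is an integer type
```

Checking df.dtypes right after loading is a quick way to catch numeric columns that were read as text.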
Data Selection and Filtering
Select a single column: df['Name']
Select multiple columns: df[['Name', 'Age']]
Conditional filtering: filtered_df = df[df['Age'] > 30]
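A short sketch of the selection patterns above, again on the tutorial's sample data. The combined condition and the use of .loc are additions for illustration; they are one common way to filter rows and pick columns in a single step.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

names = df['Name']            # single brackets return a Series
subset = df[['Name', 'Age']]  # a list of names returns a DataFrame
older = df[df['Age'] > 30]    # a boolean mask keeps only matching rows

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (df['Age'] >= 30) & (df['City'] != 'Chicago')
result = df.loc[mask, ['Name', 'City']]
print(result)
```

Python's plain and/or keywords do not work on whole columns, which is why the element-wise operators & and | are used here.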
Sorting
Sort by a column using sort_values; ascending=False puts the largest values first:
<code>sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
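</code>
Grouping
Group rows and compute aggregates with groupby. Select the numeric column before aggregating: on recent pandas versions, calling mean() on groups that still contain text columns such as Name raises a TypeError.
<code>grouped_df = df.groupby('City')[['Age']].mean()
print(grouped_df)
</code>
Missing‑Value Handling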
Check for missing values: df.isnull()
Fill missing values: df['Age'] = df['Age'].fillna(0)
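To see these calls in action, the sketch below uses a small table with one deliberately missing age (a made-up gap, not data from the tutorial). Filling with the column mean and dropping incomplete rows are shown as alternatives to filling with 0, which can distort averages.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, float('nan'), 35],  # hypothetical missing value for Bob
})

# isnull() returns a True/False table; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Option 1: fill the gap with the mean of the known ages (25 and 35, so 30).
filled = df['Age'].fillna(df['Age'].mean())

# Option 2: drop rows where Age is missing.
dropped = df.dropna(subset=['Age'])

print(filled.tolist())  # [25.0, 30.0, 35.0]
print(len(dropped))     # 2
```

Which option fits depends on the analysis; neither call modifies df unless the result is assigned back.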
Practical Example: Stock Data Analysis
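Given a file stock_data.csv of daily stock prices with at least Date and Close columns, read it and parse the dates so they plot on a proper time axis:
<code>df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
print(df)
</code>
Calculate the daily percentage change (the first row is NaN because it has no previous day to compare against):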
<code>df['Change'] = df['Close'].pct_change() * 100
print(df)
</code>
Plot the closing‑price trend (requires matplotlib):
<code>import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(df['Date'], df['Close'], marker='o')
plt.title('Stock Closing Price Trend')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
</code>
Conclusion
The article demonstrated how to use Pandas for data analysis in Python, covering installation, DataFrame creation, basic inspection, selection, filtering, sorting, grouping, missing‑value handling, and a real‑world stock‑price analysis case.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.