Getting Started with Pandas: Installation, DataFrames, and Basic Data Analysis in Python

This tutorial introduces Pandas, a powerful Python data‑analysis library, covering installation, importing, creating DataFrames from various sources, basic inspection, selection, filtering, sorting, grouping, handling missing values, and a practical stock‑price analysis example with code snippets.

Python Programming Learning Circle

Nov 21, 2024

Pandas is a powerful Python data‑analysis library widely used for data cleaning, processing, and analysis. It provides convenient data structures and tools that simplify handling tabular data.

Installation

Install Pandas via pip:

pip install pandas

Importing Pandas

After installation, import the library using the common alias pd:

import pandas as pd
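A quick way to confirm that the installation and import worked is to print the library version. This is only a sanity check; the version string you see depends on your environment.

```python
import pandas as pd

# If this prints a version string, Pandas is installed and importable.
print(pd.__version__)
```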

Creating a DataFrame

A DataFrame is Pandas' primary data structure, similar to an Excel sheet or SQL table. It can be created from lists, dictionaries, CSV files, etc.

From a Dictionary

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
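The same table can also be built from plain lists. The sketch below shows two common variants, a list of rows with column names passed separately and a list of dictionaries. The variable names (rows, records, df_from_rows, df_from_records) are illustrative only.

```python
import pandas as pd

# Each inner list is one row; column names are supplied separately.
rows = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

# A list of dictionaries also works; the keys become column names.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
]
df_from_records = pd.DataFrame(records)

print(df_from_rows)
print(df_from_records)
```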

From a CSV File

Assuming a file data.csv with the following content:

Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago

Read it with:

df = pd.read_csv('data.csv')
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
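To try read_csv without creating a file, the CSV text can be wrapped in io.StringIO, which behaves like an open file. The sketch below also shows the usecols parameter for loading only some columns. The variable names are illustrative.

```python
import io
import pandas as pd

csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# StringIO stands in for a file on disk, so no data.csv is needed.
df = pd.read_csv(io.StringIO(csv_text))

# usecols restricts the load to the listed columns.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'])

print(df)
print(df_small)
```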

Basic DataFrame Inspection

df.head(): view the first few rows.

df.tail(): view the last few rows.

df.columns: list the column names.

df.dtypes: show the data type of each column.
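The inspection calls can be tried together on the small sample table. This is a minimal sketch; df.shape is added here as one more commonly used attribute that returns the row and column counts.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

first_two = df.head(2)   # head and tail accept the number of rows to return
last_one = df.tail(1)

print(first_two)
print(last_one)
print(list(df.columns))  # column names as a plain list
print(df.dtypes)         # one data type per column
print(df.shape)          # (rows, columns)
```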

Data Selection and Filtering

Select a single column: df['Name']

Select multiple columns: df[['Name', 'Age']]

Conditional filtering:

filtered_df = df[df['Age'] > 30]
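Conditions can be combined, and .loc can filter rows and pick columns in one step. The sketch below uses the sample table from earlier; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
older_not_chicago = df[(df['Age'] > 25) & (df['City'] != 'Chicago')]

# .loc takes a row condition and a column label together.
names_over_30 = df.loc[df['Age'] > 30, 'Name']

print(older_not_chicago)
print(names_over_30.tolist())
```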

Sorting

Sort by a column using sort_values:

sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
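sort_values also accepts a list of columns, which is useful for breaking ties. The sketch below adds a hypothetical fourth row (Dana) so that two people share an age, then sorts by Age descending and Name ascending.

```python
import pandas as pd

# Dana is a made-up extra row so that two ages tie.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 30],
})

# One ascending flag per sort column: Age descending, then Name ascending.
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

# sort_values keeps the original index labels; reset_index renumbers from 0.
sorted_df = sorted_df.reset_index(drop=True)

print(sorted_df)
```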

Grouping

Group data and compute aggregates with groupby:

# numeric_only=True skips text columns such as Name, which cannot be averaged
grouped_df = df.groupby('City').mean(numeric_only=True)
print(grouped_df)
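Grouping is easier to see when a group has more than one row. The sketch below uses made-up data in which New York appears twice, and computes the mean age and the row count per city with agg.

```python
import pandas as pd

# Hypothetical data: two rows share the city New York.
df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'Age': [25, 30, 35, 40],
})

# Select the numeric column first, then compute several aggregates at once.
summary = df.groupby('City')['Age'].agg(['mean', 'count'])

print(summary)
```

The result is indexed by city, so a single value can be looked up with summary.loc['New York', 'mean'].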

Missing‑Value Handling

Check for missing values: df.isnull()

Fill missing values:

df['Age'] = df['Age'].fillna(0)
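Filling with 0 is simple but can distort statistics such as the average age. As a sketch of two alternatives, the example below counts missing values per column, fills with the column mean, and drops incomplete rows. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
})

# isnull() returns True/False per cell; sum() counts the True values per column.
missing_per_column = df.isnull().sum()

# Fill with the mean of the known values (mean() ignores NaN by default).
filled = df['Age'].fillna(df['Age'].mean())

# Alternatively, drop rows where Age is missing.
dropped = df.dropna(subset=['Age'])

print(missing_per_column)
print(filled)
print(dropped)
```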

Practical Example: Stock Data Analysis

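Given a CSV file stock_data.csv containing daily stock prices (the code below assumes it has Date and Close columns), read and analyze it:

# parse_dates converts the Date column to datetime values so it sorts and plots chronologically
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])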
print(df)

Calculate daily percentage change:

df['Change'] = df['Close'].pct_change() * 100
print(df)
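pct_change computes (current - previous) / previous for each row, so the first row has no previous value and comes out as NaN. The sketch below checks this on three made-up closing prices; the dates and values are hypothetical, not real market data.

```python
import pandas as pd

# Hypothetical prices chosen so the changes are easy to verify by hand.
prices = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'Close': [100.0, 110.0, 99.0],
})

# 100 -> 110 is a rise of about 10%; 110 -> 99 is a fall of about 10%.
# Rounding hides tiny floating-point errors in the result.
prices['Change'] = (prices['Close'].pct_change() * 100).round(2)

print(prices)
```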

Plot the closing‑price trend (requires matplotlib):

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot(df['Date'], df['Close'], marker='o')
plt.title('Stock Closing Price Trend')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Conclusion

The article demonstrated how to use Pandas for data analysis in Python, covering installation, DataFrame creation, basic inspection, selection, filtering, sorting, grouping, missing‑value handling, and a real‑world stock‑price analysis case.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
