
Master Pandas: Install, Import Data, and Perform Powerful Data Analysis

This tutorial introduces the Pandas library, covering installation, data import from CSV and Excel, DataFrame creation, descriptive statistics, indexing with loc/iloc, and applying custom functions to clean and transform column values, all illustrated with code snippets and images.

Model Perspective

Pandas (the name derives from "panel data"; its core data structures are the Series and the DataFrame) is a powerful data processing and analysis library that supports importing data, statistical analysis, plotting, and export.

Installing and Importing Pandas

You can install Pandas via the command line with:

<code>pip install pandas</code>

If you use the Anaconda distribution, Pandas is already installed. It is conventionally imported as pd.

<code>import pandas as pd</code>

Pandas is often used together with Matplotlib for plotting.

Data Import

External CSV and Excel files can be loaded with pd.read_csv and pd.read_excel. For example:

<code>car = pd.read_csv('data/car-sales.csv')</code>

A relative file path is resolved against the current working directory (often, but not always, the folder that contains the script), and the data is loaded into a DataFrame.
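To try the loading step without a file on disk, here is a minimal sketch that feeds pd.read_csv an in-memory text buffer. The CSV text is a hypothetical stand-in for data/car-sales.csv, modeled on the columns used later in this tutorial; the real file's contents are not shown in the article.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data/car-sales.csv'.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
"""

# read_csv accepts a path or any file-like object; StringIO avoids touching disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # Price is read as text because of the '$' and ','
```

The quoted Price values keep their embedded commas, which is why that column arrives as text rather than as numbers.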

Creating a DataFrame

You can create a DataFrame from a dictionary that maps column names to lists of values:

<code>make = ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan']
color = ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White']
odometer = [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600]
doors = [4,4,3,5,4,4,4,4,4,4]
price = ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00','$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00']
car = pd.DataFrame({'Make':make,'Colour':color,'Odometer (KM)':odometer,'Doors':doors,'Price':price})</code>

The resulting DataFrame is displayed in Jupyter Notebook.
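Outside a notebook, a few attributes give a quick look at what was built. The sketch below rebuilds only the first three rows of the data above so that it runs on its own; shape, head, and dtypes are standard DataFrame members.

```python
import pandas as pd

# First three rows of the tutorial's car data, rebuilt so the block is self-contained.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Colour': ['White', 'Red', 'Blue'],
    'Odometer (KM)': [150043, 87899, 32549],
    'Doors': [4, 4, 3],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

print(car.shape)     # (number of rows, number of columns)
print(car.head(2))   # first two rows
print(car.dtypes)    # one dtype per column
```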

Data Description

Pandas provides a describe method for quick statistical summaries:

<code>car.describe()</code>

By default it summarizes only numeric columns; to include text columns as well, pass include='all' or a list of dtypes such as include=['object','float','int']. Statistics that do not apply to a column's type appear as NaN.
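As a rough illustration of the difference, the following sketch compares the default summary with include='all' on a three-row subset of the car data (the subset is an assumption made to keep the example short).

```python
import pandas as pd

# Small subset of the tutorial's car data.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Odometer (KM)': [150043, 87899, 32549],
    'Doors': [4, 4, 3],
})

numeric_summary = car.describe()            # numeric columns only
full_summary = car.describe(include='all')  # every column

print(numeric_summary)
print(full_summary)  # 'mean' for Make is NaN; 'unique' for Odometer (KM) is NaN
```

With include='all', the text column gains rows such as unique, top, and freq, while numeric-only statistics are left as NaN for it.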

Data Indexing

Selecting Single or Multiple Columns

Column selection works like dictionary indexing:

<code>car['Price']  # single column
car[['Make','Colour']]  # multiple columns</code>
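One detail worth seeing in code: single brackets return a Series, while a list of names returns a DataFrame, even when the list holds only one name. A short sketch on a reduced version of the car data:

```python
import pandas as pd

# Reduced version of the tutorial's car data.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Colour': ['White', 'Red', 'Blue'],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

single = car['Price']            # one name -> Series
one_col_frame = car[['Price']]   # list with one name -> DataFrame
subset = car[['Make', 'Colour']] # list of names -> DataFrame

print(type(single).__name__, type(one_col_frame).__name__)
print(subset.shape)
```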

Using loc and iloc

loc indexes by label, while iloc indexes by integer position. Example retrieving the value at row index 1 and column 'Odometer (KM)':

<code>car.loc[1,'Odometer (KM)']</code>

Or using integer positions:

<code>car.iloc[1,2]</code>

Slice syntax with : selects all rows or columns, e.g., all values in the 'Odometer (KM)' column:

<code>car.loc[:, 'Odometer (KM)']
car.loc[:, 2]</code>
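The distinction between labels and positions is easiest to see when the row labels are not 0, 1, 2. The sketch below assumes a hypothetical index of 10, 20, 30 (not part of the original data) and also shows that loc slices include the end label while iloc slices exclude the end position.

```python
import pandas as pd

# Hypothetical row labels 10, 20, 30 make labels and positions differ.
car = pd.DataFrame(
    {'Make': ['Toyota', 'Honda', 'BMW'],
     'Odometer (KM)': [150043, 87899, 11179]},
    index=[10, 20, 30],
)

by_label = car.loc[20, 'Odometer (KM)']  # row labelled 20
by_position = car.iloc[1, 1]             # second row, second column

label_slice = car.loc[10:20, 'Make']     # end label 20 is included
position_slice = car.iloc[0:1, 0]        # end position 1 is excluded

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```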

Using apply

The apply method calls a function on each value of a column and returns a new Series. To clean the Price column by removing dollar signs and commas and converting each value to float:

<code>def to_num(x):
    x_new = x.replace('$','')
    x_new = x_new.replace(',','')
    return float(x_new)</code>
<code>car['Price'].apply(to_num)</code>
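As an alternative to apply, the same cleanup can be sketched with Pandas' vectorized string methods. This is a swapped-in technique, not the approach used above; the three prices are taken from the car data, and regex=False is passed so that '$' is treated as a literal character.

```python
import pandas as pd

# Three prices from the tutorial's car data.
car = pd.DataFrame({'Price': ['$4,000.00', '$22,000.00', '$3,500.00']})

# Remove '$' and ',' as literal characters, then convert the column to float.
car['Price'] = (
    car['Price']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(float)
)

print(car['Price'].tolist())
```

Both approaches produce a float column; the string-method version avoids writing a helper function.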
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
