Master Pandas: Install, Import Data, and Perform Powerful Data Analysis
This tutorial introduces the Pandas library, covering installation, data import from CSV and Excel, DataFrame creation, descriptive statistics, indexing with loc/iloc, and applying custom functions to clean and transform column values, all illustrated with code snippets and images.
Pandas (the name derives from "panel data") is a powerful Python library for data processing and analysis. Its core data structures are the Series and the DataFrame, and it supports importing data, statistical analysis, basic plotting, and export.
Installing and Importing Pandas
You can install Pandas via the command line with:
<code>pip install pandas</code>If you use Anaconda, Pandas is already installed. It is commonly imported as pd:
<code>import pandas as pd</code>Pandas is often used together with Matplotlib for plotting.
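As a quick sanity check after installation, something like the following confirms that the import succeeds and shows which version is in use. This is only a minimal sketch; the version string printed depends on your environment.

```python
import pandas as pd

# Print the installed version to confirm that the import works.
print(pd.__version__)
```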
Data Import
External CSV and Excel files can be loaded with pd.read_csv and pd.read_excel. For example:
<code>car = pd.read_csv('data/car-sales.csv')</code>A relative file path is resolved against the current working directory (usually the folder the script or notebook is run from), and the data is loaded into a DataFrame.
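To try read_csv without the actual file, the sketch below feeds it a small in-memory CSV. The two rows and the name car_sample are illustrative stand-ins, not the real contents of data/car-sales.csv. pd.read_excel is called the same way with a path to a workbook, but it typically needs an Excel engine such as openpyxl installed, so it is not run here.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for 'data/car-sales.csv'.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
"""

# read_csv accepts a file path or any file-like object.
car_sample = pd.read_csv(io.StringIO(csv_text))

print(type(car_sample).__name__)    # the result is a DataFrame
print(car_sample.shape)             # (rows, columns)
print(car_sample.columns.tolist())  # column names come from the header row
```

Note that the quoted Price values keep their dollar signs and commas, so that column is read as text rather than as numbers.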
Creating a DataFrame
You can create a DataFrame from a dictionary that maps column names to lists of values:
<code>make = ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan']
color = ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White']
odometer = [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600]
doors = [4,4,3,5,4,4,4,4,4,4]
price = ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00','$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00']
car = pd.DataFrame({'Make':make,'Colour':color,'Odometer (KM)':odometer,'Doors':doors,'Price':price})</code>The resulting DataFrame is displayed in Jupyter Notebook.
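Outside a notebook, or simply to check that the frame was built as intended, a few attributes are handy. The sketch below uses a three-row subset of the data above under the illustrative name car_small.

```python
import pandas as pd

# A three-row subset of the car data, for illustration only.
car_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Odometer (KM)': [150043, 87899, 32549],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

print(car_small.shape)    # number of rows and columns
print(car_small.dtypes)   # data type inferred for each column
print(car_small.head(2))  # first two rows
```

The Price column is stored as text because of the dollar signs and commas, so it needs cleaning before any arithmetic.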
Data Description
Pandas provides a describe method for quick statistical summaries:
<code>car.describe()</code>By default it summarizes only the numeric columns. To include categorical data, pass include=['object','float','int'], or include='all' to summarize every column. Statistics that do not apply to a column appear as NaN.
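The difference between the default summary and one that includes text columns can be seen on a small frame. This is a sketch on a four-row subset (car_small is an illustrative name), using include='all' as one way to cover every column.

```python
import pandas as pd

# A four-row subset with one text column and one numeric column.
car_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Doors': [4, 4, 3, 5],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_summary = car_small.describe()

# include='all': text columns also get count, unique, top, and freq.
full_summary = car_small.describe(include='all')

print(numeric_summary)
print(full_summary)
```

In the full summary, rows such as mean are NaN for Make, and rows such as top are NaN for Doors, because those statistics are not defined for that kind of column.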
Data Indexing
Selecting Single or Multiple Columns
Column selection works like dictionary indexing:
<code>car['Price'] # single column
car[['Make','Colour']] # multiple columns</code>
Using loc and iloc
loc indexes by label, while iloc indexes by integer position. For example, to retrieve the value at row label 1 in the 'Odometer (KM)' column:
<code>car.loc[1,'Odometer (KM)']</code>Or using integer positions:
<code>car.iloc[1,2]</code>A bare colon (:) selects all rows or all columns. For example, to get every value in the 'Odometer (KM)' column, by label or by position:
<code>car.loc[:, 'Odometer (KM)'] # by column label
car.iloc[:, 2] # by column position</code>
Using apply
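Calling apply on a column applies a function to each of its values and returns a new Series, so assign the result back to the column if you want to keep it. To clean the Price column by removing dollar signs and commas and converting to float: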
<code>def to_num(x):
    # Strip the dollar sign and thousands separators, then convert to float
    x_new = x.replace('$','')
    x_new = x_new.replace(',','')
    return float(x_new)</code>
<code>car['Price'].apply(to_num)</code>
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".