Reading CSV and Excel Files and Merging Data with pandas in Python
This tutorial demonstrates how to import CSV and Excel stock‑listing files using pandas, adjust parsing parameters, view data summaries, and combine multiple sheets into a single DataFrame through concatenation, providing a practical example of data import and merging in Python.
The article uses real stock‑listing data to teach how to read common data files in Python with the pandas library.
Reading CSV data
CSV files are comma‑separated and can be loaded with import pandas as pd nasdaq = pd.read_csv('nasdaq-listings.csv') print(nasdaq.head(10)) . The df.head() , df.info() methods display the first rows and a summary of the DataFrame.
To handle missing values and parse dates, the call can be refined as nasdaq = pd.read_csv('nasdaq-listings.csv', na_values='NAN', parse_dates=['Last Update']) print(nasdaq.head()) nasdaq.info() , which shows proper dtypes such as datetime64[ns] for the "Last Update" column.
Reading Excel data
Excel files can be read with nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a') print(nyse.head()) nyse.info() . The pd.ExcelFile() object provides access to all sheet names: xls = pd.ExcelFile('listings.xlsx') exchanges = xls.sheet_names print(exchanges) # ['amex', 'nasdaq', 'nyse'] .
All sheets can be loaded at once: listings = pd.read_excel(xls, sheet_name=exchanges, na_values='n/a') listings['nasdaq'].info() .
Merging multiple tables
Because the sheets share the same column structure, they can be stacked with # read two sheets nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a') nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a') # add source label nyse['Exchange'] = 'NYSE' nasdaq['Exchange'] = 'NASDAQ' # concatenate combined_listings = pd.concat([nyse, nasdaq]) combined_listings.info() . For all sheets, a loop can collect each DataFrame into a list and then concatenate: listings = [] for exchange in exchanges: df = pd.read_excel(xls, sheet_name=exchange, na_values='n/a') df['Exchange'] = exchange listings.append(df) listing_data = pd.concat(listings) listing_data.info() .
The final merged DataFrame contains 8 columns, including an "Exchange" column that identifies the original sheet for each row.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.