Processing Excel Files in Python with openpyxl and pandas: Reading, Writing, Styling, and Merging
This article explains how to use Python's openpyxl and pandas libraries to read, write, style, and merge Excel files, covering installation, basic operations, advanced cell formatting, data manipulation, and techniques for combining multiple sheets or workbooks into a single DataFrame.
Introduction Handling Excel files is a common task in Python for data analysis, reporting, and automation. The two most popular libraries are openpyxl for low‑level Excel manipulation and pandas for high‑level data processing.
1. Using openpyxl
Installation
pip install openpyxlReading an Excel file
from openpyxl import load_workbook
# Load an existing workbook
wb = load_workbook('example.xlsx')
# Select the active worksheet
sheet = wb.active # or sheet = wb['Sheet1']
# Read a single cell
cell_value = sheet['A1'].value
print(f"Cell A1: {cell_value}")
# Iterate over rows and columns
for row in sheet.iter_rows(min_row=1, max_col=3, max_row=2):
for cell in row:
print(cell.value)Writing to an Excel file
from openpyxl import Workbook
wb = Workbook() # Create a new workbook
sheet = wb.active # Get the default sheet
sheet['A1'] = "Hello"
sheet['B1'] = "World"
sheet.append([1, 2, 3]) # Add a new row
wb.save('new_example.xlsx') # Save the workbookSetting cell styles
from openpyxl.styles import Font, Alignment, PatternFill
sheet['A1'].font = Font(bold=True, color="FF0000")
sheet['A1'].alignment = Alignment(horizontal="center", vertical="center")
sheet['A1'].fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")2. Using pandas
Installation
pip install pandas
pip install openpyxl # required for xlsx supportReading Excel files
import pandas as pd
# Read the whole workbook
df = pd.read_excel('example.xlsx')
# Read a specific sheet
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
# Read multiple sheets
dfs = pd.read_excel('example.xlsx', sheet_name=['Sheet1', 'Sheet2'])
# Read all sheets into a dict
dfs = pd.read_excel('example.xlsx', sheet_name=None)
print(df.head())Writing Excel files
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Write to a single sheet
df.to_excel('new_example.xlsx', index=False)
# Write to multiple sheets
with pd.ExcelWriter('new_example.xlsx') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)
df.to_excel(writer, sheet_name='Sheet2', index=False)Data processing examples
# Filter rows
filtered_df = df[df['Age'] > 28]
# Sort rows
sorted_df = df.sort_values(by='Age', ascending=False)
# Group and aggregate
grouped_df = df.groupby('City').agg({'Age': 'mean'})
# Concatenate DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
merged_df = pd.concat([df1, df2], ignore_index=True)Excel data file merging: multiple sheets and multiple files
Merge multiple sheets from a single workbook
import pandas as pd
file_path = 'example.xlsx'
xls = pd.ExcelFile(file_path)
sheet_names = xls.sheet_names
combined_df = pd.DataFrame()
for sheet_name in sheet_names:
df = pd.read_excel(file_path, sheet_name=sheet_name)
combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)Merge multiple Excel files (each may contain multiple sheets)
import pandas as pd, glob
file_paths = glob.glob('data/*.xlsx')
combined_df = pd.DataFrame()
for file_path in file_paths:
xls = pd.ExcelFile(file_path)
sheet_names = xls.sheet_names
for sheet_name in sheet_names:
df = pd.read_excel(file_path, sheet_name=sheet_name)
combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)Handling different column names
import pandas as pd, glob
file_paths = glob.glob('data/*.xlsx')
combined_df = pd.DataFrame()
for file_path in file_paths:
xls = pd.ExcelFile(file_path)
sheet_names = xls.sheet_names
for sheet_name in sheet_names:
df = pd.read_excel(file_path, sheet_name=sheet_name)
# Standardize column names (example keeps only Name and Age)
df.columns = ['Name', 'Age', 'City']
df = df[['Name', 'Age']]
combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)Conclusion
openpyxl is ideal when fine‑grained control over Excel files is needed, such as custom cell styles or inserting charts. pandas excels at data analysis and manipulation, offering powerful filtering, sorting, grouping, and merging capabilities for large datasets.
Test Development Learning Exchange
Test Development Learning Exchange
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.