Automating Excel Column Sorting and Filtering with Python
This guide demonstrates how to use Python's openpyxl and pandas libraries to automatically sort Excel columns by sales amount, filter rows exceeding a threshold, and extend functionality with regex, providing complete code examples for efficient data processing.
Introduction
In everyday work we often need to handle large amounts of Excel data for analysis, reporting, or customer management. Manual operations become inefficient and error-prone, so Python can serve as a powerful assistant for automating column sorting and filtering in Excel.
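Theoretical Basis
To manipulate Excel files with Python, install the openpyxl library, which supports reading and writing .xlsx files: pip install openpyxl. The advanced version later in this guide also imports pandas, which can be installed the same way: pip install pandas. The tutorial then proceeds with concrete examples.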
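Before working with a real file, it can help to see how openpyxl addresses cells. The following is a minimal sketch on an in-memory workbook; the English header names and the figures are invented for illustration and are not taken from the article's data.

```python
from openpyxl import Workbook

# Build a tiny in-memory workbook (layout and values are assumptions).
wb = Workbook()
ws = wb.active
ws.append(["Product", "Units", "Sales"])  # row 1: header
ws.append(["Widget", 3, 1200])            # row 2: first data row
ws.append(["Gadget", 5, 15000])           # row 3: second data row

# Rows and columns are 1-based, so column 3 is column "C".
header_cell = ws.cell(row=1, column=3).value

# ws["C"] returns every cell in column C, header included,
# so slicing off the first element leaves only the data values.
sales_values = [cell.value for cell in ws["C"][1:]]

print(header_cell, sales_values)
```

This is why the script in the Code Example section slices the column with [1:] before sorting: the first cell of the column holds the header text rather than a number.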
Use Case
Imagine a sales report where products need to be sorted by sales amount in descending order and only those with sales exceeding 10,000 should be retained. This can be achieved with a short Python script. The code below reads the sales amount from column C and treats row 1 as the header.
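To try the scripts without a real report, a small test workbook can be generated first. The sketch below is only an assumption about what sales_data.xlsx might look like: the helper name write_sample_sales, the middle column '数量' (units), and every row of data are hypothetical, while the headers '产品名称' (product name) and '销售额' (sales amount) are the column names the pandas example refers to.

```python
from openpyxl import Workbook


def write_sample_sales(path):
    """Create a small workbook with the layout the examples expect.

    Assumed layout: header in row 1, product name in column A,
    units in column B, sales amount in column C.
    """
    wb = Workbook()
    ws = wb.active
    # '产品名称' = product name, '数量' = units (filler), '销售额' = sales amount
    ws.append(["产品名称", "数量", "销售额"])
    # Invented rows, chosen to fall on both sides of the 10,000 threshold.
    ws.append(["iPhone 15", 12, 84000])
    ws.append(["USB cable", 300, 4500])
    ws.append(["Galaxy S24", 9, 58500])
    ws.append(["Phone case", 150, 10000])  # exactly 10,000: not "exceeding"
    wb.save(path)
```

Calling write_sample_sales('sales_data.xlsx') once creates a file that both the openpyxl script and the pandas script can read. The last row sits exactly on the threshold, which makes it easy to confirm that the filter is strict.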
Code Example
import openpyxl


def copy_rows(source_sheet, target_sheet, source_rows):
    """Copy the listed source rows into target_sheet, stacked from row 1 down."""
    for target_row, source_row in enumerate(source_rows, start=1):
        for col_index in range(1, source_sheet.max_column + 1):
            value = source_sheet.cell(row=source_row, column=col_index).value
            target_sheet.cell(row=target_row, column=col_index, value=value)


def process_excel(file_path, output_path, threshold=10000):
    # Load the workbook and take the active sheet
    wb = openpyxl.load_workbook(file_path)
    sheet = wb.active

    # Read the sales column (column C), keyed by sheet row number.
    # Row 1 is the header; rows without a numeric sales value are skipped.
    sales = {cell.row: cell.value for cell in sheet['C'][1:]
             if isinstance(cell.value, (int, float))}

    # Sort the data rows by sales amount, highest first
    sorted_rows = sorted(sales, key=sales.get, reverse=True)

    # Create a new sheet and write the header followed by the sorted rows
    new_sheet = wb.create_sheet(title='Sorted_Sales')
    copy_rows(sheet, new_sheet, [1] + sorted_rows)

    # Keep the rows whose sales amount exceeds the threshold, in original order
    filtered_rows = [row for row in sorted(sales) if sales[row] > threshold]

    # Create the filter-result sheet and write the header and matching rows
    filter_sheet = wb.create_sheet(title='Filtered_Sales')
    copy_rows(sheet, filter_sheet, [1] + filtered_rows)

    # Save the modified workbook
    wb.save(output_path)

    # Read the filtered results back as tuples of values
    return list(filter_sheet.iter_rows(values_only=True))


# Run all of the steps above in one call and print the filtered rows
filtered_data = process_excel('sales_data.xlsx', 'processed_sales_data.xlsx')
print(filtered_data)
Advanced Version
Using pandas and openpyxl utilities, the same tasks can be performed more concisely, and additional features such as regular‑expression filtering of product names are demonstrated.
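The regular-expression filter can be tried in isolation before it is applied to a workbook. This is a minimal sketch on an in-memory DataFrame; the product names and amounts are invented, and None stands in for an empty cell in the '产品名称' (product name) column.

```python
import pandas as pd

# Invented rows for illustration; None simulates a blank product name.
df = pd.DataFrame({
    "产品名称": ["iPhone 15", "USB cable", "Galaxy S24", None],
    "销售额": [84000, 4500, 58500, 1200],
})

# A plain alternation needs no capture group. na=False treats missing
# names as non-matches, so the mask contains only True and False.
mask = df["产品名称"].str.contains(r"iPhone|Galaxy", regex=True, na=False)
matched = df[mask]

print(matched["产品名称"].tolist())
```

Passing the pattern as a plain string avoids the warning pandas emits for patterns that contain capture groups, and na=False keeps a blank product name from breaking the boolean indexing.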
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_dataframe(ws, df):
    """Write df to worksheet ws, header row first."""
    for row in dataframe_to_rows(df, index=False, header=True):
        ws.append(row)


# Read the Excel file
df = pd.read_excel('sales_data.xlsx')

# Sort by sales amount ('销售额'), highest first
df_sorted = df.sort_values(by='销售额', ascending=False)

# Keep products whose sales amount exceeds 10000
df_filtered = df[df['销售额'] > 10000]

# Use a regular expression to select specific product names ('产品名称').
# na=False treats empty cells as non-matches instead of raising an error.
pattern = r'iPhone|Galaxy'
df_pattern_filtered = df[df['产品名称'].str.contains(pattern, regex=True, na=False)]

# Create a new workbook with one sheet per result
wb = Workbook()
write_dataframe(wb.create_sheet(title="Sorted_Sales"), df_sorted)
write_dataframe(wb.create_sheet(title="Filtered_Sales"), df_filtered)
write_dataframe(wb.create_sheet(title="Pattern_Filtered"), df_pattern_filtered)

# Remove the default sheet, then save once all result sheets are written
wb.remove(wb['Sheet'])
wb.save('processed_sales_data.xlsx')
Conclusion
The scripts above show how Python can automate Excel sorting and filtering, and how techniques such as regex filtering extend what the same pipeline can do, improving efficiency, accuracy, and flexibility.
Test Development Learning Exchange