Using Python pandas to Replicate Excel Functions: VLOOKUP, Pivot Tables, and Plotting
This article demonstrates how to replace common Excel operations such as VLOOKUP, data pivot tables, and charting with Python's pandas and plotly libraries, providing code examples, explanations of data import/export settings, and performance considerations for large‑scale data analysis.
Historically business analysis relied on Excel, but as data volumes grew, practitioners turned to Python and its pandas library for more scalable analytics; pandas now offers a complete ecosystem comparable to traditional spreadsheet tools.
Before any analysis, pandas is imported and display options are set to show a limited number of rows and columns without scientific notation:
<code>import numpy as np
import pandas as pd
pd.set_option('max_columns', 10)
pd.set_option('max_rows', 20)
pd.set_option('display.float_format', lambda x: '%.2f' % x) # disable scientific notation</code>Four primary pandas I/O functions— read_csv , read_excel , to_csv , and to_excel —allow flexible reading and writing of Excel data, e.g., reading the top‑left part of a sheet:
<code>df = pd.read_excel("test.xlsx", header=1, nrows=17, usecols=3)</code>To emulate Excel's VLOOKUP , the article shows a simple lookup using pandas' apply with a grading function, followed by a merge‑based solution for cross‑sheet lookups:
<code># Example 1: grade conversion
df['等级'] = df['语文'].apply(grade_to_point)
# Example 2: merge for depreciation lookup
df1 = pd.read_excel("test.xlsx", sheet_name='折旧明细表')
df2 = pd.read_excel("test.xlsx", sheet_name=1) # Sheet1
result = df2.merge(df1[['编号', '月折旧额']], how='left', on='编号')
</code>For approximate matches, a combination of pandas filtering and looping is used to assign monthly depreciation values based on partial string matches.
Data pivot tables are recreated with pandas: the groupby approach aggregates sales, cost, and profit by month and region, while pivot_table provides an alternative that mirrors Excel's row/column/value layout.
<code># Groupby example
df['订购月份'] = df['订购日期'].apply(lambda x: x.month)
agg = df.groupby(['订购月份', '所属区域'])[['销售额', '成本']].sum()
agg['利润'] = agg['销售额'] - agg['成本']
# Pivot table example
pivot = pd.pivot_table(df, values=['销售额', '成本'], index=['订购月份', '所属区域'], aggfunc='sum')
pivot['利润'] = pivot['销售额'] - pivot['成本']
</code>For visualization, plotly is used to create interactive bar charts and radar charts that replicate Excel graphics but require more code:
<code>import plotly.offline as off
import plotly.graph_objs as go
off.init_notebook_mode()
# Bar chart
trace1 = go.Bar(x=df['国家'], y=df[1995], name='1995', opacity=0.6, marker=dict(color='powderblue'))
trace2 = go.Bar(x=df['国家'], y=df[2005], name='2005', opacity=0.6, marker=dict(color='aliceblue'))
trace3 = go.Bar(x=df['国家'], y=df[2014], name='2014', opacity=0.6, marker=dict(color='royalblue'))
layout = go.Layout(barmode='group')
fig = go.Figure(data=[trace1, trace2, trace3], layout=layout)
off.plot(fig)
# Radar chart (scatterpolar)
trace1 = go.Scatterpolar(r=df[0], theta=theta, name=names[0])
# ... additional traces ...
layout = go.Layout(polar=dict(radialaxis=dict(visible=True, range=[0,1])), showlegend=True)
fig = go.Figure(data=[trace1, trace2, trace3, trace4], layout=layout)
off.plot(fig)
</code>Overall, for simple charts Excel remains the quickest tool, but for advanced, customizable visualizations and large‑scale data manipulation, Python with pandas and plotly offers greater flexibility and performance.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.