Fundamentals 9 min read

Using Python (pandas) to Perform Common Excel Data Processing Tasks

This article demonstrates how to replace typical Excel operations such as VLOOKUP, pivot tables, duplicate removal, missing‑value handling, multi‑condition filtering, fuzzy matching, column splitting, outlier replacement, grouping and labeling with concise Python pandas code to streamline data analysis workflows.

Python Programming Learning Circle

Dec 10, 2022

Using Python (pandas) to Perform Common Excel Data Processing Tasks

After learning Python, the author uses it to handle Excel data, aiming to reinforce Python skills and improve data‑processing capability.

1. VLOOKUP Equivalent

Split the original table into two DataFrames and merge them to retrieve profit for each order using df_c = pd.merge(df1, df2, on="订单明细号", how="left").

2. Pivot Table

Calculate total and average profit per salesperson in each region with

pd.pivot_table(sale, index="地区名称", columns="业务员名称", values="利润", aggfunc=[np.sum, np.mean])

3. Compare Two Columns

Create a modified order‑detail column and find differences using

result = sale.loc[sale["订单明细号"].isin(sale["订单明细号2"]) == False]

4. Remove Duplicates

Drop duplicate salesperson codes: sale.drop_duplicates("业务员编码", inplace=True).

5. Missing‑Value Handling

Identify columns with missing values via sale.info(), then fill or drop them: sale["客户名称"] = sale["客户名称"].fillna(0) and sale.dropna(subset=["客户编码"]).

6. Multi‑Condition Filtering

Retrieve orders where Beijing salesperson "张爱" sold items over 6000:

sale.loc[(sale["地区名称"]=="北京") & (sale["业务员名称"]=="张爱") & (sale["订单金额"]>5000)]

7. Fuzzy Filtering

Select rows with product names containing "三星" or "索尼": sale.loc[sale["存货名称"].str.contains("三星|索尼")].

8. Group Summary

Summarize total profit per salesperson in Beijing:

sale.groupby(["地区名称", "业务员名称"])["利润"].sum()

9. Conditional Statistics

For products with "三星" in the name and tax > 1000, compute count, sum and average profit using

sale.loc[sale["存货名称"].str.contains("三星") & (sale["税费"]>=1000)][["订单明细号", "利润"]].describe()

10. Trim Spaces

Remove surrounding spaces from product names:

sale["存货名称"] = sale["存货名称"].map(lambda s: s.strip())

11. Split Columns

Separate date and time from the "单据日期" column:

sale = pd.merge(sale, pd.DataFrame(sale["单据日期"].str.split(" ", expand=True)), how="inner", left_index=True, right_index=True)

12. Outlier Replacement

Replace negative tax values (considered outliers) with zero:

sale["订单金额"] = sale["订单金额"].replace(min(sale["订单金额"]), 0)

13. Profit‑Based Grouping

Classify regions into "较差", "中等", "较好", "非常好" based on total profit quartiles using pd.cut (bins: [-10,7091,10952,17656,37556]).

14. Business‑Logic Labeling

Label products with profit margin >30% as "优质商品" and <5% as "一般商品":

sale.loc[(sale["利润"]/sale["订单金额"])>0.3, "label"] = "优质商品"

and

sale.loc[(sale["利润"]/sale["订单金额"])<0.05, "label"] = "一般商品"

In conclusion, both Excel and Python are valuable tools; the choice depends on the specific task, and a good data analyst should use whichever tool solves the problem efficiently.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

data-processing Data cleaning data-analysis VLOOKUP

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.