Processing Graduate Admission Data: Converting PDF to Excel, Analyzing and Visualizing with Python
This tutorial demonstrates how to convert PDF admission lists into Excel, extract and filter candidate information using Python, compute score segment statistics, and generate bar and pie charts with pyecharts for clear visual insight into graduate enrollment data.
When preparing for graduate entrance exams, collecting accurate admission information is crucial; many schools publish their admission lists as PDFs, so the first step is to convert these PDFs into Excel files for further analysis.
Using Python libraries such as xlrd , numpy , and pyecharts , the Excel file is opened and the first worksheet is loaded. (Note that xlrd 2.0 and later can only read legacy .xls files; to open an .xlsx file as below, install xlrd < 2.0 or switch to openpyxl / pandas .)
<code>#coding=utf8
import xlrd
import numpy as np
from pyecharts.charts import Bar, Pie, Grid
from pyecharts import options as opts
data = xlrd.open_workbook("C:/深圳大学电子与信息工程学院2020年电子信息硕士生拟录取名单.xlsx")
table = data.sheets()[0]  # sheets() is a method, not an attribute
</code>The script then iterates from the fourth row to extract each candidate's ID, name, status, remarks, preliminary score, retest score, total score, and ranking, storing only those with status "拟录取" and remarks "普通计划":
<code>tables = []

def Read_Excel(excel):
    # Rows 0-2 hold the header; candidate data starts on the fourth row.
    for row in range(3, excel.nrows - 1):
        dict_ = {"id": "", "name": "", "status": "", "remarks": "",
                 "preliminary_score": "", "retest_score": "",
                 "total_score": "", "ranking": ""}
        dict_["id"] = excel.cell_value(row, 1)
        dict_["name"] = excel.cell_value(row, 2)
        dict_["status"] = excel.cell_value(row, 3)
        dict_["remarks"] = excel.cell_value(row, 4)
        dict_["preliminary_score"] = excel.cell_value(row, 5)
        dict_["retest_score"] = excel.cell_value(row, 6)
        dict_["total_score"] = excel.cell_value(row, 7)
        dict_["ranking"] = excel.cell_value(row, 8)
        # Keep only regular-plan candidates marked for admission.
        if dict_["status"] == "拟录取" and dict_["remarks"] == "普通计划":
            tables.append(dict_)

Read_Excel(table)
for i in tables:
    print(i)
</code>After confirming the data extraction, the script counts the number of candidates falling into each 10‑point score interval (300‑310, 310‑320, …, 400‑410) while tracking the minimum and maximum scores:
<code>num_score_300_310 = 0
num_score_310_320 = 0
# ... (other interval counters) ...
min_score = 999
max_score = 0
for i in tables:
    score = i["preliminary_score"]
    if score > max_score:
        max_score = score
    if score < min_score:
        min_score = score
    # xlrd returns numeric cells as floats, so test membership with
    # comparisons rather than `score in range(...)`, which is always
    # False for non-integer values.
    if 300 <= score < 310:
        num_score_300_310 += 1
    elif 310 <= score < 320:
        num_score_310_320 += 1
    # ... (other intervals) ...
    elif 400 <= score < 410:
        num_score_400_410 += 1

bar_x_axis_data = ["300-310", "310-320", "320-330", "330-340", "340-350",
                   "350-360", "360-370", "370-380", "380-390", "390-400",
                   "400-410"]
bar_y_axis_data = [num_score_300_310, num_score_310_320, num_score_320_330,
                   num_score_330_340, num_score_340_350, num_score_350_360,
                   num_score_360_370, num_score_370_380, num_score_380_390,
                   num_score_390_400, num_score_400_410]
</code>With the aggregated data, a bar chart is created to show the number of admitted candidates per score range:
<code>c = (
    Bar()
    .add_xaxis(bar_x_axis_data)
    .add_yaxis("录取考生", bar_y_axis_data, color="#af00ff")  # series: "admitted candidates"
    .set_global_opts(title_opts=opts.TitleOpts(title="数量"))  # chart title: "Count"
    .render("C:/录取数据图.html")
)
</code>A pie chart is also generated to give an overview of the score distribution, with customized styling and a subtitle indicating the author:
<code>c = (
    Pie(init_opts=opts.InitOpts(height="800px", width="1200px"))
    .add(
        "录取分数概览",  # series: "admission score overview"
        [list(z) for z in zip(bar_x_axis_data, bar_y_axis_data)],
        center=["35%", "38%"],
        radius="40%",
        label_opts=opts.LabelOpts(
            formatter="{b|{b}: }{c} {per|{d}%} ",
            rich={
                "b": {"fontSize": 16, "lineHeight": 33},
                "per": {"color": "#eee", "backgroundColor": "#334455",
                        "padding": [2, 4], "borderRadius": 2},
            },
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="录取", subtitle="Made by 王昊"),
        legend_opts=opts.LegendOpts(pos_left="0%", pos_top="65%"),
    )
    .render("C:/录取饼图.html")
)
</code>The resulting visualizations provide an intuitive overview of admission statistics, making it easy to understand the distribution of scores among the selected candidates.
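As an aside on the tallying step: the eleven hand-written interval counters can be collapsed into a single loop over a collections.Counter keyed by the 10-point bucket, which also makes min and max fall out of the same list. This is a minimal sketch; the function name and the sample scores below are made up for illustration:

```python
from collections import Counter

def bin_scores(scores, low=300, high=410, width=10):
    """Count scores per [low + k*width, low + (k+1)*width) bucket."""
    counts = Counter()
    for s in scores:
        if low <= s < high:
            bucket = int((s - low) // width)
            label = f"{low + bucket * width}-{low + (bucket + 1) * width}"
            counts[label] += 1
    return counts

# Made-up sample scores, only to illustrate the binning:
sample = [305.0, 308.0, 312.0, 355.0, 401.0]
print(bin_scores(sample))       # Counter({'300-310': 2, '310-320': 1, ...})
print(min(sample), max(sample))  # 305.0 401.0
```

The bucket labels produced this way match the bar_x_axis_data labels used above, so the same Counter could feed the charts directly.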