Processing Graduate Admission Data: Converting PDF to Excel, Analyzing and Visualizing with Python
This tutorial demonstrates how to convert PDF admission lists into Excel, extract and filter candidate information using Python, compute score segment statistics, and generate bar and pie charts with pyecharts for clear visual insight into graduate enrollment data.
When preparing for graduate entrance exams, collecting accurate admission information is crucial; many schools publish their admission lists as PDFs, so the first step is to convert these PDFs into Excel files for further analysis.
Using Python libraries such as xlrd , numpy , and pyecharts , the Excel file is opened and the first worksheet is loaded. (Note that xlrd 2.0 and later can only read legacy .xls files; to open an .xlsx file as below, install xlrd < 2.0 or switch to openpyxl / pandas .)
<code>#coding=utf8
import xlrd
import numpy as np
from pyecharts.charts import Bar, Pie, Grid
from pyecharts import options as opts
data = xlrd.open_workbook("C:/深圳大学电子与信息工程学院2020年电子信息硕士生拟录取名单.xlsx")
table = data.sheets()[0]  # sheets() is a method, not an attribute
</code>The script then iterates from the fourth row to extract each candidate's ID, name, status, remarks, preliminary score, retest score, total score, and ranking, storing only those with status "拟录取" and remarks "普通计划":
<code>tables = []

def Read_Excel(excel):
    # Rows 0-2 hold the header; candidate data starts on the fourth row.
    for row in range(3, excel.nrows - 1):
        dict_ = {"id": "", "name": "", "status": "", "remarks": "",
                 "preliminary_score": "", "retest_score": "",
                 "total_score": "", "ranking": ""}
        dict_["id"] = excel.cell_value(row, 1)
        dict_["name"] = excel.cell_value(row, 2)
        dict_["status"] = excel.cell_value(row, 3)
        dict_["remarks"] = excel.cell_value(row, 4)
        dict_["preliminary_score"] = excel.cell_value(row, 5)
        dict_["retest_score"] = excel.cell_value(row, 6)
        dict_["total_score"] = excel.cell_value(row, 7)
        dict_["ranking"] = excel.cell_value(row, 8)
        # Keep only regular-plan candidates marked for admission.
        if dict_["status"] == "拟录取" and dict_["remarks"] == "普通计划":
            tables.append(dict_)

Read_Excel(table)
for i in tables:
    print(i)
</code>After confirming the data extraction, the script counts the number of candidates falling into each 10‑point score interval (300‑310, 310‑320, …, 400‑410) while tracking the minimum and maximum scores:
<code>num_score_300_310 = 0
num_score_310_320 = 0
# ... (other interval counters) ...
min_score = 999
max_score = 0
for i in tables:
    score = i["preliminary_score"]
    if score > max_score:
        max_score = score
    if score < min_score:
        min_score = score
    # xlrd returns numeric cells as floats, so test membership with
    # comparisons rather than `score in range(...)`, which is always
    # False for non-integer values.
    if 300 <= score < 310:
        num_score_300_310 += 1
    elif 310 <= score < 320:
        num_score_310_320 += 1
    # ... (other intervals) ...
    elif 400 <= score < 410:
        num_score_400_410 += 1

bar_x_axis_data = ["300-310", "310-320", "320-330", "330-340", "340-350",
                   "350-360", "360-370", "370-380", "380-390", "390-400",
                   "400-410"]
bar_y_axis_data = [num_score_300_310, num_score_310_320, num_score_320_330,
                   num_score_330_340, num_score_340_350, num_score_350_360,
                   num_score_360_370, num_score_370_380, num_score_380_390,
                   num_score_390_400, num_score_400_410]
</code>With the aggregated data, a bar chart is created to show the number of admitted candidates per score range:
<code>c = (
    Bar()
    .add_xaxis(bar_x_axis_data)
    .add_yaxis("录取考生", bar_y_axis_data, color="#af00ff")  # series: "admitted candidates"
    .set_global_opts(title_opts=opts.TitleOpts(title="数量"))  # chart title: "Count"
    .render("C:/录取数据图.html")
)
</code>A pie chart is also generated to give an overview of the score distribution, with customized styling and a subtitle indicating the author:
<code>c = (
    Pie(init_opts=opts.InitOpts(height="800px", width="1200px"))
    .add(
        "录取分数概览",  # series: "admission score overview"
        [list(z) for z in zip(bar_x_axis_data, bar_y_axis_data)],
        center=["35%", "38%"],
        radius="40%",
        label_opts=opts.LabelOpts(
            formatter="{b|{b}: }{c} {per|{d}%} ",
            rich={
                "b": {"fontSize": 16, "lineHeight": 33},
                "per": {"color": "#eee", "backgroundColor": "#334455",
                        "padding": [2, 4], "borderRadius": 2},
            },
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="录取", subtitle="Made by 王昊"),
        legend_opts=opts.LegendOpts(pos_left="0%", pos_top="65%"),
    )
    .render("C:/录取饼图.html")
)
</code>The resulting visualizations provide an intuitive overview of admission statistics, making it easy to understand the distribution of scores among the selected candidates.
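As an aside on the tallying step: the eleven hand-written interval counters can be collapsed into a single loop over a collections.Counter keyed by the 10-point bucket, which also makes min and max fall out of the same list. This is a minimal sketch; the function name and the sample scores below are made up for illustration:

```python
from collections import Counter

def bin_scores(scores, low=300, high=410, width=10):
    """Count scores per [low + k*width, low + (k+1)*width) bucket."""
    counts = Counter()
    for s in scores:
        if low <= s < high:
            bucket = int((s - low) // width)
            label = f"{low + bucket * width}-{low + (bucket + 1) * width}"
            counts[label] += 1
    return counts

# Made-up sample scores, only to illustrate the binning:
sample = [305.0, 308.0, 312.0, 355.0, 401.0]
print(bin_scores(sample))       # Counter({'300-310': 2, '310-320': 1, ...})
print(min(sample), max(sample))  # 305.0 401.0
```

The bucket labels produced this way match the bar_x_axis_data labels used above, so the same Counter could feed the charts directly.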