Building an Automatic Math Grading System with Python: Data Generation, CNN Training, Image Segmentation, and Result Feedback
This tutorial explains how to create an automatic math‑grading tool in Python by generating synthetic digit images, training a small CNN on the data, segmenting handwritten equations with projection techniques, recognizing characters, evaluating the expressions, and overlaying the results back onto the original image.
Overview
The article walks through the complete pipeline for an automatic math‑grading application, covering data creation, model building, training, prediction, image segmentation, calculation, and visual feedback.
1. Data Preparation
Instead of using the standard MNIST set, the guide generates its own dataset by rendering characters with multiple fonts, sizes, and rotations. The label dictionary maps indices to characters (0‑9, '=', '+', '-', '×', '÷').
<code>from __future__ import print_function
from PIL import Image, ImageFont, ImageDraw
import os, shutil, time

label_dict = {0:'0', 1:'1', 2:'2', 3:'3', 4:'4', 5:'5', 6:'6', 7:'7', 8:'8', 9:'9',
              10:'=', 11:'+', 12:'-', 13:'×', 14:'÷'}

# Recreate one output folder per class.
for value, char in label_dict.items():
    train_images_dir = "dataset/" + str(value)
    if os.path.isdir(train_images_dir):
        shutil.rmtree(train_images_dir)
    os.makedirs(train_images_dir)

def makeImage(label_dict, font_path, width=24, height=24, rotate=0):
    """Render every character once with the given font and rotation."""
    for value, char in label_dict.items():
        img = Image.new("RGB", (width, height), "black")
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(font_path, int(width * 0.9))
        # Note: textsize() and getoffset() were removed in Pillow 10;
        # this code requires Pillow < 10 (newer versions use textbbox()).
        font_width, font_height = draw.textsize(char, font)
        x = (width - font_width - font.getoffset(char)[0]) / 2
        y = (height - font_height - font.getoffset(char)[1]) / 2
        draw.text((x, y), char, (255, 255, 255), font)
        img = img.rotate(rotate)
        time_value = int(round(time.time() * 1000))
        img_path = f"dataset/{value}/img-{value}_r-{rotate}_{time_value}.png"
        img.save(img_path)

font_dir = "./fonts"
for font_name in os.listdir(font_dir):
    path_font_file = os.path.join(font_dir, font_name)
    for k in range(-10, 10, 1):  # 20 rotation angles
        makeImage(label_dict, path_font_file, rotate=k)
</code>
This process creates 15 folders (one per class), each containing 260 images (13 fonts × 20 rotations), for 3,900 images in total.
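The dataset size follows directly from the generation loops; a quick arithmetic check (assuming 13 font files in ./fonts, as the article states):

```python
num_classes = 15                     # digits 0-9 plus '=', '+', '-', '×', '÷'
num_fonts = 13                       # assumption: 13 .ttf files in ./fonts
num_rotations = len(range(-10, 10))  # 20 rotation angles
per_class = num_fonts * num_rotations
print(per_class, num_classes * per_class)  # → 260 3900
```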
2. Model Construction
A simple convolutional neural network is defined using TensorFlow/Keras. The architecture consists of a rescaling layer, two Conv2D‑MaxPooling blocks, flattening, and two dense layers ending with 15 output units.
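Before the full definition below, the layer output shapes can be checked with simple 'valid'-convolution arithmetic (a sanity check, not part of the article's code):

```python
def conv_out(n, k):
    """Output side length of a 'valid' Conv2D with stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """Output side length of a MaxPooling2D with pool size p."""
    return n // p

s = 24               # input images are 24x24 grayscale
s = conv_out(s, 3)   # Conv2D(24, 3)  -> 22
s = pool_out(s)      # MaxPooling2D   -> 11
s = conv_out(s, 3)   # Conv2D(64, 3)  -> 9
s = pool_out(s)      # MaxPooling2D   -> 4
print(s, s * s * 64)  # → 4 1024
```

So Flatten feeds 4 × 4 × 64 = 1024 units into the Dense(128) layer.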
<code>import tensorflow as tf
from tensorflow.keras import layers, Sequential

def create_model():
    model = Sequential([
        layers.experimental.preprocessing.Rescaling(1./255, input_shape=(24, 24, 1)),
        layers.Conv2D(24, 3, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(15)  # one logit per class
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
</code>
3. Training
The generated dataset is loaded with image_dataset_from_directory, cached, shuffled, and fed to the model for 10 epochs. After training, the weights are saved to checkpoint/char_checkpoint.
<code>import pathlib
import numpy as np
import tensorflow as tf

data_dir = pathlib.Path('dataset')
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, color_mode='grayscale', image_size=(24, 24), batch_size=32)

# Save the class order so prediction can map indices back to characters.
class_names = train_ds.class_names
np.save('class_name.npy', class_names)

AUTOTUNE = tf.data.experimental.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)

model = create_model()
model.fit(train_ds, epochs=10)
model.save_weights('checkpoint/char_checkpoint')
</code>
4. Prediction on Single Images
Two example images (img1.png and img2.png) are read with OpenCV in grayscale, stacked into a batch, and passed to the trained model. The class with the highest logit is selected using np.argmax.
<code>import cv2
import numpy as np

img1 = cv2.imread('img1.png', 0)    # grayscale, 24×24
img2 = cv2.imread('img2.png', 0)
imgs = np.array([img1, img2])
imgs = imgs.reshape(-1, 24, 24, 1)  # add the channel dimension the model expects

model = create_model()
model.load_weights('checkpoint/char_checkpoint')
class_name = np.load('class_name.npy')

predicts = model.predict(imgs)
results = []
for predict in predicts:
    index = np.argmax(predict)
    results.append(class_name[index])
print(results)  # e.g., ['6', '8']
</code>
5. Image Segmentation (Projection Method)
To handle full‑page worksheets, the article uses vertical and horizontal projection (shadow) profiles to locate rows and then characters. Functions img_y_shadow, img2rows, img_x_shadow, and row2blocks compute white‑pixel counts per line/column and return bounding boxes.
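The per-row counting in img_y_shadow below can also be written as a single NumPy expression; a vectorized equivalent (assuming a 0/255 binary image):

```python
import numpy as np

def img_y_shadow_np(img_b):
    """Vectorized equivalent of the loop-based img_y_shadow:
    count the white (255) pixels in every row at once."""
    return (img_b == 255).sum(axis=1).tolist()

# Tiny 4x6 binary image: row 1 has 3 white pixels, row 2 is fully white.
img = np.zeros((4, 6), dtype=np.uint8)
img[1, :3] = 255
img[2, :] = 255
print(img_y_shadow_np(img))  # → [0, 3, 6, 0]
```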
<code>def img_y_shadow(img_b):
    """Horizontal projection: count white pixels in each row of a binary image."""
    h, w = img_b.shape
    a = [0 for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if img_b[i, j] == 255:
                a[i] += 1
    return a

def img2rows(a, w, h):
    """Scan the projection profile and return one bounding box per text row."""
    inLine = False
    start = 0
    mark_boxs = []
    for i in range(len(a)):
        if not inLine and a[i] > 10:        # enough white pixels: a row begins
            inLine = True
            start = i
        elif inLine and i - start > 5 and a[i] < 10:  # profile drops: row ends
            inLine = False
            if i - start > 10:              # discard rows that are too thin
                top = max(start - 1, 0)
                bottom = min(h, i + 1)
                mark_boxs.append([0, top, w, bottom])
    return mark_boxs
</code>
After obtaining row boxes, each row image is further processed: binary thresholding, dilation, horizontal projection, block detection, and finally character cropping with cut_img.
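The column-wise split inside a row uses the same projection idea. The article's img_x_shadow, row2blocks, and cut_img helpers are not shown in full; the following simplified sketch (which skips the dilation step, and whose function and parameter names are assumptions) shows the core logic:

```python
import numpy as np

def row_to_char_boxes(row_bin, min_width=2):
    """Split one binarized row image into per-character column ranges
    using an x-axis projection, mirroring the row split above."""
    col_counts = (row_bin == 255).sum(axis=0)   # white pixels per column
    boxes, start, in_block = [], 0, False
    for x, c in enumerate(col_counts):
        if not in_block and c > 0:              # a character block begins
            in_block, start = True, x
        elif in_block and c == 0:               # the block ends
            in_block = False
            if x - start >= min_width:
                boxes.append((start, x))
    if in_block and len(col_counts) - start >= min_width:
        boxes.append((start, len(col_counts)))  # block runs to the right edge
    return boxes

# Two "characters": white columns 3-5 and 12-14 of a 10x20 binary image.
row = np.zeros((10, 20), dtype=np.uint8)
row[2:8, 3:6] = 255
row[2:8, 12:15] = 255
print(row_to_char_boxes(row))  # → [(3, 6), (12, 15)]
```

Each returned (left, right) range would then be cropped and resized to 24×24 before recognition.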
6. Recognition of Cropped Characters
Each cropped character image is fed to the same CNN model. The predictions for a block are collected, forming the arithmetic expression.
<code># cnn.predict is the article's helper: it batches the cropped 24×24
# character images and maps each argmax index back through class_name.
results = cnn.predict(model, block_imgs, class_name)
print('recognize result:', results)
</code>
7. Calculation and Feedback
The recognized characters are joined into a string and evaluated with Python's eval after replacing the '×' and '÷' symbols with '*' and '/'. The function returns a check mark ('√') for a correct answer, a cross ('×') for a wrong one, or the computed numeric result when no answer was written.
<code>def calculation(chars):
    """Evaluate a recognized expression; '√' = correct, '×' = wrong."""
    cstr = ''.join(chars)
    if "=" in cstr:
        left, right = cstr.split('=')
        left = left.replace('×', '*').replace('÷', '/')
        try:
            calc = int(eval(left))  # safe enough here: input is digits/operators only
        except Exception as e:
            print('Exception', e)
            return ''
        if right == '':
            return calc             # no answer written: return the correct result
        return '√' if str(calc) == right else '×'
    return ''

# e.g. calculation(['6','+','8','=','14']) returns '√'
</code>
The result is drawn onto the original image using OpenCV and Pillow, with green for correct, red for incorrect, and gray for missing answers.
<code>def cv2ImgAddText(img, text, left, top, textColor=(255, 0, 0), textSize=20):
    """Draw Unicode text (e.g. '√') on an OpenCV image via Pillow."""
    if isinstance(img, np.ndarray):
        # OpenCV is BGR; Pillow expects RGB.
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img)
    fontStyle = ImageFont.truetype('fonts/fangzheng_shusong.ttf', textSize, encoding='utf-8')
    draw.text((left, top), text, textColor, font=fontStyle)
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)
</code>
Finally, the annotated image is saved as result.jpg, showing check marks, crosses, or computed answers directly on the worksheet.
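The color selection described above can be sketched as a small helper; the exact RGB values are assumptions, not from the article:

```python
def feedback_color(result):
    """Map a grading result to an RGB annotation color:
    green for correct, red for wrong, gray when only the
    computed answer is drawn (nothing was written)."""
    if result == '√':
        return (0, 255, 0)      # correct: green
    if result == '×':
        return (255, 0, 0)      # wrong: red
    return (128, 128, 128)      # missing answer: gray

print(feedback_color('√'))  # → (0, 255, 0)
```

The (left, top) coordinates passed to cv2ImgAddText would come from each block's bounding box found during segmentation.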
8. Practical Tips
Create a fonts folder and copy several TrueType fonts from C:\Windows\Fonts to increase data diversity.
Run get_character_pic.py to generate the synthetic dataset.
Use cnn.py to train the model.
Run main.py to segment a worksheet image (imgs/question.png) and produce imgs/result.png.
If recognition fails, ensure the worksheet font matches one of the fonts used during data generation.