
Implementing an Automatic Math Expression Grading System with Python and Convolutional Neural Networks

This tutorial walks through building a self‑trained OCR pipeline that generates synthetic digit images, trains a CNN model, segments handwritten math expressions, predicts each character, evaluates the arithmetic result, and overlays checkmarks, crosses or answers onto the original image.

Python Programming Learning Circle

The article describes how to create an automatic grading tool for handwritten arithmetic problems using Python. It starts by explaining the two essential tasks: recognizing digits and segmenting them from an image.

Data Generation

Instead of using the MNIST dataset, the author generates custom images by drawing characters with various fonts, sizes and rotation angles. A small script creates 24×24 pixel PNGs for each of the 15 symbols (0‑9, =, +, -, ×, ÷) across 13 fonts and 20 rotation angles, yielding 260 images per class, or 3,900 images in total.

<code>from PIL import Image, ImageFont, ImageDraw
import os

# label dictionary: class index -> character
label_dict = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7',
              8: '8', 9: '9', 10: '=', 11: '+', 12: '-', 13: '×', 14: '÷'}

# generate images for each font and rotation angle; makeImage (defined in
# the full article) renders every character in label_dict with the given
# font file and rotation
for font_name in os.listdir('./fonts'):
    font_path = os.path.join('./fonts', font_name)
    for angle in range(-10, 10):   # 20 rotation angles
        makeImage(label_dict, font_path, rotate=angle)
</code>
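The makeImage helper itself appears only in the full article. As a minimal sketch of the same idea, the function below (the name render_char and the use of Pillow's default bitmap font in place of the TrueType fonts are assumptions) draws one character on a white 24×24 canvas and rotates it:

```python
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, angle, size=24):
    """Draw one character on a white canvas, rotate it, and return a
    size x size grayscale image (a sketch of the article's makeImage)."""
    img = Image.new('L', (size, size), 255)     # white background, mode 'L'
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()             # stand-in for a TrueType font
    draw.text((size // 4, size // 4), ch, fill=0, font=font)
    # rotate around the centre; exposed corners are filled with white
    return img.rotate(angle, fillcolor=255)

img = render_char('7', angle=-10)
print(img.size)  # (24, 24)
```

Keeping expand=False on rotate (the default) means the canvas stays 24×24, so every generated sample already matches the model's input shape.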

Model Construction

A simple CNN is built with TensorFlow/Keras: input rescaling, two Conv2D‑MaxPooling blocks, flattening, a dense layer of 128 units, and a final dense layer with 15 outputs. The model is compiled with Adam optimizer and sparse categorical cross‑entropy loss.

<code>import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

def create_model():
    model = Sequential([
        layers.experimental.preprocessing.Rescaling(1./255, input_shape=(24,24,1)),
        layers.Conv2D(24,3,activation='relu'),
        layers.MaxPooling2D((2,2)),
        layers.Conv2D(64,3,activation='relu'),
        layers.MaxPooling2D((2,2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(15)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
</code>

Training

The generated dataset is loaded with image_dataset_from_directory, cached, shuffled and prefetched. The model is trained for 10 epochs, reaching near‑100% accuracy. Weights are saved to checkpoint/char_checkpoint.

<code># load the generated images as a labeled grayscale dataset
# ('dataset' is a placeholder for the output folder of the generation step)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset', color_mode='grayscale', image_size=(24, 24))
model = create_model()
model.fit(train_ds, epochs=10)
model.save_weights('checkpoint/char_checkpoint')
</code>

Prediction

Two sample images (e.g., a 6 and an 8) are read with OpenCV, converted to grayscale, and fed to the trained model. Since the final Dense layer emits raw logits (the loss is configured with from_logits=True), np.argmax over each output vector yields the predicted character.

<code>imgs = np.array([img1, img2])       # batch of shape (N, 24, 24, 1)
predicts = model.predict(imgs)           # one logit vector per image
results = [class_name[np.argmax(p)] for p in predicts]  # index -> character
print(results)
</code>
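The model's Rescaling layer declares an input shape of (24, 24, 1), so single grayscale arrays need a batch and a channel axis before prediction. A small helper (the name to_batch is an assumption, not from the article) makes the reshaping explicit:

```python
import numpy as np

def to_batch(gray_images):
    """Stack 24x24 grayscale arrays into a (N, 24, 24, 1) float batch,
    the input shape declared by the model's Rescaling layer."""
    batch = np.stack(gray_images).astype('float32')
    return batch[..., np.newaxis]   # append the single channel axis

batch = to_batch([np.zeros((24, 24)), np.ones((24, 24))])
print(batch.shape)  # (2, 24, 24, 1)
```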

Image Segmentation

To handle full‑page worksheets, the author uses projection profiles. Vertical (Y‑axis) projection identifies rows, while horizontal (X‑axis) projection after dilation isolates individual characters. Functions img_y_shadow, img2rows, img_x_shadow, and block2chars perform these calculations, returning bounding boxes for each character.

<code># Example: compute the Y-axis projection (ink pixels per row)
def img_y_shadow(img_b):
    # img_b: binarized image where foreground (ink) pixels are 255
    h, w = img_b.shape
    a = [0] * h
    for i in range(h):
        for j in range(w):
            if img_b[i, j] == 255:
                a[i] += 1
    return a
</code>
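To show how such a profile turns into bounding rows, here is a minimal sketch of the second half of the idea (the article's img2rows plays this role; the helper name projection_to_segments and the zero threshold are assumptions): consecutive positions whose count exceeds the threshold form one segment.

```python
def projection_to_segments(profile, threshold=0):
    """Turn a 1-D projection profile into (start, end) intervals of
    consecutive positions whose count exceeds the threshold."""
    segments = []
    start = None
    for i, count in enumerate(profile):
        if count > threshold and start is None:
            start = i                      # entering an ink region
        elif count <= threshold and start is not None:
            segments.append((start, i))    # leaving an ink region
            start = None
    if start is not None:                  # region runs to the image edge
        segments.append((start, len(profile)))
    return segments

# a profile with two ink bands: rows 1-2 and rows 5-6
print(projection_to_segments([0, 3, 5, 0, 0, 2, 4, 0]))  # [(1, 3), (5, 7)]
```

Applied to a Y-axis profile this yields row intervals; applied to an X-axis profile within a row, it yields the per-character bounding columns.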

Recognition of Segmented Characters

Each cropped character image is resized to 24×24, stacked into a batch, and passed through the CNN. The predictions are collected per block, forming the original arithmetic expression.

Evaluation and Feedback

The recognized string is evaluated with Python's eval (after replacing × and ÷ with * and /). The result is compared to the provided answer; a checkmark (green), cross (red) or placeholder (gray) is drawn onto the original image using Pillow.
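The grading step can be sketched as follows; the helper name grade_expression and the assumption that each recognized string has the form "expression=answer" are mine, and eval is applied only to the substituted arithmetic part:

```python
def grade_expression(expr):
    """Return 'correct', 'wrong', or 'missing' for a recognized string
    such as '3×4=12' or '9-2=' (no answer written)."""
    left, _, answer = expr.partition('=')
    # map the handwritten operators to Python operators before eval
    left = left.replace('×', '*').replace('÷', '/')
    if answer == '':
        return 'missing'            # nothing after '=' to grade
    expected = eval(left)           # arithmetic characters only
    return 'correct' if expected == float(answer) else 'wrong'

print(grade_expression('3×4=12'))   # correct
print(grade_expression('7+5=13'))   # wrong
print(grade_expression('9-2='))     # missing
```

The three return values map directly onto the green checkmark, red cross, and gray placeholder drawn in the next step.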

<code>import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

# Draw result on original image
def cv2ImgAddText(img, text, left, top, textColor=(255,0,0), textSize=20):
    if isinstance(img, np.ndarray):
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype('fonts/fangzheng_shusong.ttf', textSize)
    draw.text((left, top), text, textColor, font=font)
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)
</code>

The final output image shows each expression annotated with a green check for correct answers, a red cross for wrong ones, and a gray placeholder where the answer is missing.

Tags: CNN, machine learning, Python, automation, image processing, OCR
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
