Checkbox Detection and State Classification Using YOLOv5
This article describes a complete pipeline for detecting checkboxes in document images and classifying each one as selected or unselected. The approach combines YOLOv5 object detection, synthetic and semi‑synthetic data generation, specialized post‑processing, and association logic to handle varied checkbox shapes, positions, and markings.
Checkboxes are common elements in documents used to capture user input, and accurate processing requires locating each box, recognizing its state, and linking it to its descriptive text.
Initial attempts using OCR alone proved insufficient due to the variability of box shapes, sizes, and markings. The problem is abstracted into two tasks: checkbox position detection and state determination.
The technical approach treats both checkboxes and their markings as detection targets and uses YOLOv5 to identify them. Targets measure roughly 20–40 px after images are resized to 1280 × 1280, and because the targets are small and visually simple, a compact, fast model variant suffices.
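To keep small targets detectable, resizing should preserve aspect ratio rather than stretch the page. The sketch below shows a letterbox resize of the kind YOLOv5 pipelines commonly use; the function name, fill value, and the A4 page dimensions are illustrative, not from the article.

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 1280, fill: int = 114) -> Image.Image:
    """Resize to size x size, preserving aspect ratio with gray padding."""
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new(img.mode, (size, size), fill)
    # Center the resized image on the padded square canvas.
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

# A hypothetical 300-dpi A4 scan: a 60-px checkbox shrinks to ~22 px,
# consistent with the 20-40 px target range mentioned above.
page = Image.new("L", (2480, 3508), 255)
out = letterbox(page)
print(out.size)  # (1280, 1280)
```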
Data preparation includes a large real dataset (≈30 scenes, 1,200 images, 45 k annotations) and synthetic data generated by compositing checkbox and marking assets onto varied backgrounds. Two synthetic strategies are employed: fully synthetic generation by randomly placing and transforming assets, and semi‑synthetic augmentation that inserts synthetic markings into real, empty checkbox regions to balance class distribution.
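The fully synthetic strategy can be sketched as follows: draw (or load) checkbox and marking assets, randomly transform them, and composite them onto a background while recording bounding-box labels. All helper names, asset shapes, size ranges, and class IDs below are assumptions for illustration; the real system composites scanned assets rather than drawn ones.

```python
import random
from PIL import Image, ImageDraw

def make_checkbox_asset(size: int = 48) -> Image.Image:
    """Draw a hollow square as a stand-in for a real checkbox asset."""
    box = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    d = ImageDraw.Draw(box)
    d.rectangle([2, 2, size - 3, size - 3], outline=(0, 0, 0, 255), width=3)
    return box

def make_check_mark(size: int = 48) -> Image.Image:
    """Draw a simple tick as a stand-in for a real marking asset."""
    mark = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    d = ImageDraw.Draw(mark)
    d.line([(8, size // 2), (size // 2 - 4, size - 10), (size - 6, 6)],
           fill=(0, 0, 0, 255), width=4)
    return mark

def synthesize(background: Image.Image, n_boxes: int = 5, p_marked: float = 0.5):
    """Paste randomly placed, lightly rotated assets onto a background and
    return the composite plus (class, x1, y1, x2, y2) labels."""
    canvas = background.convert("RGBA")
    labels = []
    for _ in range(n_boxes):
        size = random.randint(32, 64)
        box = make_checkbox_asset(size).rotate(random.uniform(-5, 5), expand=True)
        x = random.randint(0, canvas.width - box.width)
        y = random.randint(0, canvas.height - box.height)
        canvas.alpha_composite(box, (x, y))
        labels.append((0, x, y, x + box.width, y + box.height))  # class 0: checkbox
        if random.random() < p_marked:  # sometimes add a marking on top
            canvas.alpha_composite(make_check_mark(size), (x, y))
            labels.append((1, x, y, x + size, y + size))  # class 1: marking
    return canvas.convert("RGB"), labels

bg = Image.new("RGB", (640, 640), (255, 255, 255))
img, labels = synthesize(bg)
```

The semi-synthetic variant follows the same compositing step, but anchors the marking inside a real, empty checkbox region instead of a random position.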
Image augmentation (small rotations, gamma correction, blur, noise, and color shift) further diversifies training data.
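The listed augmentations can be combined into one randomized pass per training image. The parameter ranges below are plausible guesses, not the article's values.

```python
import numpy as np
from PIL import Image, ImageFilter

def augment(img: Image.Image, rng: np.random.Generator) -> Image.Image:
    """Small rotation, gamma correction, color shift, noise, and blur
    (ranges are illustrative)."""
    # Small rotation, padding with white like a scanned page.
    img = img.rotate(rng.uniform(-3, 3), fillcolor=(255, 255, 255))
    arr = np.asarray(img).astype(np.float32) / 255.0
    # Gamma correction brightens or darkens mid-tones.
    arr = arr ** rng.uniform(0.7, 1.4)
    # Mild per-channel color shift.
    arr = np.clip(arr + rng.uniform(-0.05, 0.05, size=3), 0, 1)
    # Additive Gaussian noise.
    arr = np.clip(arr + rng.normal(0, 0.02, arr.shape), 0, 1)
    img = Image.fromarray((arr * 255).astype(np.uint8))
    # Light Gaussian blur simulates scan softness.
    return img.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0, 1)))

rng = np.random.default_rng(0)
out = augment(Image.new("RGB", (256, 256), (255, 255, 255)), rng)
```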
Model training proceeds in two stages: pre‑training on abundant synthetic data (with optional pretrained weights) followed by fine‑tuning on real data.
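With the YOLOv5 repository, the two stages might look like the commands below. Dataset YAML names, epoch counts, and run names are placeholders; only the general `train.py` flags come from the YOLOv5 tooling.

```shell
# Stage 1: pre-train on the abundant synthetic set, optionally from COCO weights.
python train.py --img 1280 --epochs 100 \
    --data synthetic_checkbox.yaml --weights yolov5s.pt --name synth_pretrain

# Stage 2: fine-tune on the smaller real dataset from the stage-1 checkpoint.
python train.py --img 1280 --epochs 50 \
    --data real_checkbox.yaml \
    --weights runs/train/synth_pretrain/weights/best.pt --name real_finetune
```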
Post‑processing includes class‑aware non‑maximum suppression to prevent merging of boxes and markings, a modified IoU calculation (intersection over the smaller box area), and an association algorithm that expands each checkbox region, groups nearby markings, and resolves ambiguous matches using IoU and distance metrics.
The final system achieves approximately 94 % accuracy and over 98 % recall on a test set of ~100 images, with identified areas for improvement such as expanding real‑world data, handling high overlap cases, and moving toward end‑to‑end solutions.
Laiye Technology Team
Official account of Laiye Technology, featuring its best tech innovations, practical implementations, and cutting‑edge industry insights.