How YOLO Transforms Medical Report Screening and Occlusion Detection
Leveraging the YOLO family of deep‑learning models, this study demonstrates efficient filtering of irrelevant medical images, accurate classification of textual reports, and robust detection of occluding objects. It achieves high precision and speed on both CPU and GPU, and covers training details, performance metrics, and directions for future improvement.
Introduction
In the medical domain, data is growing explosively, and medical reports of all kinds (patient records, examination reports, surgical notes) carry abundant critical information. However, large datasets also include redundant or meaningless data (e.g., unrelated lifestyle photos, blank pages), and reports are often partially occluded by hands, receipts, or everyday objects, leaving them incomplete. Processing every image without filtering wastes computational resources and slows workflows, so efficient screening of medical text reports is essential.
YOLO Overview
What is YOLO? YOLO (You Only Look Once) is a popular deep‑learning model for object detection. It divides an image into a grid and predicts bounding boxes and class probabilities for each cell in a single forward pass, achieving high speed and accuracy. Since YOLOv1 (2015), the series has evolved through eight major versions.
YOLOv1
YOLOv1 uses a single convolutional network with 24 convolutional layers followed by 2 fully‑connected layers. It predicts bounding boxes and class probabilities for each grid cell, then applies confidence thresholds and non‑maximum suppression to remove low‑confidence and redundant boxes.
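The confidence filtering and non‑maximum suppression described above can be sketched in a few lines of plain Python. This is an illustrative re‑implementation, not YOLOv1's actual code; the box format (x1, y1, x2, y2) and the threshold values are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily suppress boxes that
    overlap an already-kept, higher-scoring box. Returns kept indices."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Here the second box below is suppressed (it overlaps the first heavily) and the last box is dropped for low confidence:

```python
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
nms(boxes, scores)  # → [0, 2]
```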
YOLOv3
Improvements over YOLOv1 include multi‑scale detection, a deeper Darknet‑53 backbone, anchor boxes for bounding‑box prediction, and a separate classifier head, which together boost accuracy while maintaining speed.
YOLOv8
YOLOv8 introduces an advanced backbone and neck architecture, anchor‑free Ultralytics head, and balanced precision‑speed trade‑offs. It supports image classification, anchor‑free object detection, and instance segmentation, achieving state‑of‑the‑art performance.
Text Report Recognition Practice
We applied YOLOv8n‑cls, a lightweight classification model, to filter medical images and identify those containing textual reports. The model efficiently discards irrelevant images (e.g., blank pages, lifestyle photos) and focuses on valuable reports.
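The screening step itself is simple once a classifier is available: run it over each image and keep only those predicted to be reports. A minimal sketch in plain Python, with the classifier injected as a function (in a real pipeline this would return the top‑1 class name from the YOLOv8n‑cls model; the file names and labels below are made up for illustration):

```python
def filter_report_images(paths, classify, report_label="report"):
    """Keep only images the classifier labels as textual reports."""
    return [p for p in paths if classify(p) == report_label]

# Stand-in classifier for illustration; a real pipeline would call the
# YOLOv8n-cls model's predict() and take the top-1 class name.
labels = {"scan_01.jpg": "report", "selfie.jpg": "other", "lab_02.jpg": "report"}
kept = filter_report_images(labels, labels.get)  # → ["scan_01.jpg", "lab_02.jpg"]
```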
Training Details
Dataset: mixed medical report images with many redundant samples.
Model: YOLOv8n‑cls (lightweight, fast, high‑accuracy).
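A training run like the one above can be launched with the Ultralytics CLI. This is a hedged sketch: the dataset path, epoch count, and image size are placeholders, not the study's actual settings.

```shell
# Train the lightweight classification model on a folder-per-class dataset.
# data path and hyperparameters below are placeholders.
yolo classify train model=yolov8n-cls.pt data=path/to/dataset epochs=100 imgsz=224

# Run the trained weights over a directory of images to screen them.
yolo classify predict model=runs/classify/train/weights/best.pt source=path/to/images
```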
Results
Test set: 116 images (60 non‑report, 56 report).
Report‑only accuracy: 100%.
Overall accuracy (report + non‑report): 98.33%.
Inference time: 2.9 ms per image on CPU, 1.9 ms on RTX 3060 GPU.
Occlusion Detection Practice
To address occluded reports (hands, receipts, objects), we trained a detection model to recognize two classes: report and other. Occlusion is determined by computing the Intersection over Union (IoU) between detected boxes; an IoU above a threshold indicates overlap.
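The IoU-based occlusion check can be sketched directly. This is a minimal illustration; the box format (x1, y1, x2, y2) and the 0.1 threshold are assumptions, not the study's actual settings.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_occluded(report_boxes, other_boxes, iou_thresh=0.1):
    """Flag a report as occluded if any 'other' box overlaps
    any 'report' box by more than the threshold."""
    return any(box_iou(r, o) > iou_thresh
               for r in report_boxes for o in other_boxes)
```

For example, a hand box overlapping the corner of a report box triggers the flag, while a box elsewhere in the frame does not:

```python
is_occluded([(0, 0, 10, 10)], [(5, 5, 15, 15)])    # → True
is_occluded([(0, 0, 10, 10)], [(20, 20, 30, 30)])  # → False
```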
Evaluation
mAP@0.5: 0.888.
Overall occlusion detection accuracy: 93.5%.
Occluded images accuracy: 87.5%.
Non‑occluded images accuracy: 95%.
Errors stem from splitting a single report into multiple boxes, missing very small occluders, and handling mosaicked images.
Performance and Limitations
Inference time for occlusion detection: 50.3 ms per image on CPU, 13 ms on RTX 3060 GPU. The current dataset is small and concentrated, limiting generalization. Future work includes expanding dataset diversity, improving annotation quality, refining IoU‑based occlusion logic with multi‑scale features, applying data augmentation, handling class imbalance, and leveraging transfer learning.
Conclusion and Outlook
Integrating text‑report classification and occlusion detection with YOLO dramatically improves the efficiency and accuracy of medical report processing. YOLO’s speed and adaptability make it suitable for broader medical imaging tasks such as automated blood‑cell counting, diagnosis assistance, and surgical support. Continued advances in deep‑learning techniques—attention mechanisms, multi‑scale processing, and richer feature extraction—are expected to further enhance detection of small and occluded targets, supporting personalized and remote healthcare.
References
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO‑NAS
You Only Look Once: Unified, Real‑Time Object Detection
YOLO9000: Better, Faster, Stronger
YOLOv3: An Incremental Improvement
YOLOv4: Optimal Speed and Accuracy of Object Detection
GitHub – ultralytics/yolov5
GitHub – meituan/YOLOv6
GitHub – WongKinYiu/yolov7
https://github.com/ultralytics/ultralytics
https://docs.ultralytics.com/
160 Technical Team