How YOLO Transforms Medical Report Screening and Occlusion Detection
Leveraging the YOLO family of deep‑learning models, this study demonstrates efficient filtering of irrelevant medical images, accurate classification of textual reports, and robust detection of occluding objects. It achieves high precision and speed on both CPU and GPU, and covers training details, performance metrics, and directions for future improvement.
Introduction
In the medical domain, data is growing explosively, and medical reports of all kinds (patient records, examination reports, surgical notes) carry abundant critical information. However, large datasets also include redundant or meaningless data (e.g., unrelated lifestyle photos, blank pages), and reports are often partially occluded by hands, receipts, or everyday objects, leaving them incomplete. Processing every image without filtering wastes computational resources and slows workflows, so efficient screening of medical text reports is essential.
YOLO Overview
What is YOLO? YOLO (You Only Look Once) is a popular deep‑learning model for object detection. It divides an image into a grid and predicts bounding boxes and class probabilities for each cell in a single forward pass, achieving high speed and accuracy. Since YOLOv1 (2015), the series has evolved through eight major versions.
YOLOv1
YOLOv1 uses a single convolutional network with 24 convolutional layers followed by 2 fully‑connected layers. It predicts bounding boxes and class probabilities for each grid cell, then applies confidence thresholds and non‑maximum suppression to remove low‑confidence and redundant boxes.
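The confidence filtering and non‑maximum suppression described above can be sketched in a few lines of plain Python. This is an illustrative re‑implementation, not YOLOv1's actual code; the box format (x1, y1, x2, y2) and the threshold values are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily suppress boxes that
    overlap an already-kept, higher-scoring box. Returns kept indices."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Here the second box below is suppressed (it overlaps the first heavily) and the last box is dropped for low confidence:

```python
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
nms(boxes, scores)  # → [0, 2]
```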
YOLOv3
Improvements over YOLOv1 include multi‑scale detection, a deeper Darknet‑53 backbone, anchor boxes for bounding‑box prediction, and a separate classifier head, which together boost accuracy while maintaining speed.
YOLOv8
YOLOv8 introduces an advanced backbone and neck architecture, anchor‑free Ultralytics head, and balanced precision‑speed trade‑offs. It supports image classification, anchor‑free object detection, and instance segmentation, achieving state‑of‑the‑art performance.
Text Report Recognition Practice
We applied YOLOv8n‑cls, a lightweight classification model, to filter medical images and identify those containing textual reports. The model efficiently discards irrelevant images (e.g., blank pages, lifestyle photos) and focuses on valuable reports.
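The screening step itself is simple once a classifier is available: run it over each image and keep only those predicted to be reports. A minimal sketch in plain Python, with the classifier injected as a function (in a real pipeline this would return the top‑1 class name from the YOLOv8n‑cls model; the file names and labels below are made up for illustration):

```python
def filter_report_images(paths, classify, report_label="report"):
    """Keep only images the classifier labels as textual reports."""
    return [p for p in paths if classify(p) == report_label]

# Stand-in classifier for illustration; a real pipeline would call the
# YOLOv8n-cls model's predict() and take the top-1 class name.
labels = {"scan_01.jpg": "report", "selfie.jpg": "other", "lab_02.jpg": "report"}
kept = filter_report_images(labels, labels.get)  # → ["scan_01.jpg", "lab_02.jpg"]
```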
Training Details
Dataset: mixed medical report images with many redundant samples.
Model: YOLOv8n‑cls (lightweight, fast, high‑accuracy).
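A training run like the one above can be launched with the Ultralytics CLI. This is a hedged sketch: the dataset path, epoch count, and image size are placeholders, not the study's actual settings.

```shell
# Train the lightweight classification model on a folder-per-class dataset.
# data path and hyperparameters below are placeholders.
yolo classify train model=yolov8n-cls.pt data=path/to/dataset epochs=100 imgsz=224

# Run the trained weights over a directory of images to screen them.
yolo classify predict model=runs/classify/train/weights/best.pt source=path/to/images
```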
Results
Test set: 116 images (60 non‑report, 56 report).
Report‑only accuracy: 100%.
Overall accuracy (report + non‑report): 98.33%.
Inference time: 2.9 ms per image on CPU, 1.9 ms on RTX 3060 GPU.
Occlusion Detection Practice
To address occluded reports (hands, receipts, objects), we trained a detection model to recognize two classes: report and other. Occlusion is determined by computing the Intersection over Union (IoU) between detected boxes; an IoU above a threshold indicates overlap.
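The IoU-based occlusion check can be sketched directly. This is a minimal illustration; the box format (x1, y1, x2, y2) and the 0.1 threshold are assumptions, not the study's actual settings.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_occluded(report_boxes, other_boxes, iou_thresh=0.1):
    """Flag a report as occluded if any 'other' box overlaps
    any 'report' box by more than the threshold."""
    return any(box_iou(r, o) > iou_thresh
               for r in report_boxes for o in other_boxes)
```

For example, a hand box overlapping the corner of a report box triggers the flag, while a box elsewhere in the frame does not:

```python
is_occluded([(0, 0, 10, 10)], [(5, 5, 15, 15)])    # → True
is_occluded([(0, 0, 10, 10)], [(20, 20, 30, 30)])  # → False
```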
Evaluation
mAP@0.5: 0.888.
Overall occlusion detection accuracy: 93.5%.
Occluded images accuracy: 87.5%.
Non‑occluded images accuracy: 95%.
Errors stem from splitting a single report into multiple boxes, missing very small occluders, and handling mosaicked images.
Performance and Limitations
Inference time for occlusion detection: 50.3 ms per image on CPU, 13 ms on RTX 3060 GPU. The current dataset is small and concentrated, limiting generalization. Future work includes expanding dataset diversity, improving annotation quality, refining IoU‑based occlusion logic with multi‑scale features, applying data augmentation, handling class imbalance, and leveraging transfer learning.
Conclusion and Outlook
Integrating text‑report classification and occlusion detection with YOLO dramatically improves the efficiency and accuracy of medical report processing. YOLO’s speed and adaptability make it suitable for broader medical imaging tasks such as automated blood‑cell counting, diagnosis assistance, and surgical support. Continued advances in deep‑learning techniques—attention mechanisms, multi‑scale processing, and richer feature extraction—are expected to further enhance detection of small and occluded targets, supporting personalized and remote healthcare.
References
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO‑NAS
You Only Look Once: Unified, Real‑Time Object Detection
YOLO9000: Better, Faster, Stronger
YOLOv3: An Incremental Improvement
YOLOv4: Optimal Speed and Accuracy of Object Detection
GitHub – ultralytics/yolov5
GitHub – meituan/YOLOv6
GitHub – WongKinYiu/yolov7
https://github.com/ultralytics/ultralytics
https://docs.ultralytics.com/
160 Technical Team