Applying YOLOv5 Object Detection for Black, Color, and Normal Screen Classification in Video Frames
This article replaces traditional manual video frame quality checks with an automated YOLOv5-based object detection pipeline. It details data labeling, model training, loss computation, and inference code, and reports experiments in which the detector classifies black, color-screen, and normal frames more accurately than a ResNet classifier.
Black-screen and color-screen detection is a crucial part of video quality assessment, but manual inspection is labor-intensive and inefficient. The article proposes an automated solution that recasts the classification problem as object detection, using YOLOv5.
The workflow starts with a simplified labeling strategy where each whole image is treated as a single target, assigning class 0 to normal screens, 1 to colorful screens, and 2 to black screens. The labeling code is shown below:
import os
import cv2

OBJECT_DICT = {"Normalscreen": 0, "Colorfulscreen": 1, "Blackscreen": 2}

def parse_json_file(image_path):
    # Derive the image name and read the frame to obtain its dimensions.
    imageName = os.path.basename(image_path).split('.')[0]
    img = cv2.imread(image_path)
    size = img.shape  # (height, width, channels)
    # The class name is taken from the directory containing the image.
    label = image_path.split('/')[-1].split('\\')[0]
    label = OBJECT_DICT.get(label)
    imageHeight, imageWidth = size[0], size[1]
    label_dict = {}
    # The whole frame is treated as a single target covering the full image.
    xmin, ymin = (0, 0)
    xmax, ymax = (imageWidth, imageHeight)
    # Convert to YOLO format: normalized center coordinates, width, and height.
    xcenter = (xmin + xmax) / 2 / float(imageWidth)
    ycenter = (ymin + ymax) / 2 / float(imageHeight)
    width = (xmax - xmin) / float(imageWidth)
    height = (ymax - ymin) / float(imageHeight)
    label_dict.update({label: [str(xcenter), str(ycenter), str(width), str(height)]})
    label_dict = sorted(label_dict.items(), key=lambda x: x[0])
    return imageName, label_dict

The training pipeline follows the standard YOLOv5 workflow with minor adjustments for the custom dataset. Key steps include data loading, model creation, cosine learning-rate scheduling, and the training loop.
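The `opt.data` YAML consumed by the training script defines the dataset paths, class count, and class names. For this three-class task it would look roughly like the sketch below (the filename and paths are hypothetical):

```yaml
# screens.yaml (hypothetical name and paths)
train: datasets/screens/images/train
val: datasets/screens/images/val
nc: 3  # number of classes
names: ["Normalscreen", "Colorfulscreen", "Blackscreen"]
```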
# Load data, get train and test paths
with open(opt.data) as f:
    data_dict = yaml.load(f, Loader=yaml.FullLoader)
with torch_distributed_zero_first(rank):
    check_dataset(data_dict)
train_path = data_dict['train']
test_path = data_dict['val']
Number_class, names = (1, ['item']) if opt.single_cls else (int(data_dict['nc']), data_dict['names'])

# Create model
model = Model(opt.cfg, ch=3, nc=Number_class).to(device)

# Cosine learning-rate schedule, annealing from the base LR down to hyp['lrf']
lf = lambda x: ((1 + math.cos(x * math.pi / epochs)) / 2) * (1 - hyp['lrf']) + hyp['lrf']
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

# Training loop
for epoch in range(start_epoch, epochs):
    model.train()

The loss function combines bounding-box regression (the code actually computes CIoU, although the hyperparameter is still named giou), objectness, and classification components.
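The loss code below calls smooth_BCE to obtain its positive and negative classification targets. In YOLOv5 this helper implements label smoothing, returning 1 - 0.5*eps and 0.5*eps; a standalone sketch, assuming that definition:

```python
def smooth_BCE(eps: float = 0.1):
    # Positive/negative BCE targets used for label smoothing.
    # With eps = 0.0, as in the article, the targets stay at the hard values 1.0 and 0.0.
    return 1.0 - 0.5 * eps, 0.5 * eps

cp, cn = smooth_BCE(eps=0.0)
print(cp, cn)  # -> 1.0 0.0
```

With a nonzero eps, ground-truth classes would be trained toward 0.95/0.05 instead of 1/0, which slightly regularizes the classifier.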
def compute_loss(p, targets, model):
    device = targets.device
    loss_cls = torch.zeros(1, device=device)
    loss_box = torch.zeros(1, device=device)
    loss_obj = torch.zeros(1, device=device)
    tcls, tbox, indices, anchors = build_targets(p, targets, model)
    h = model.hyp
    # Class and objectness losses use binary cross-entropy with positive weights.
    BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device)
    BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device)
    cp, cn = smooth_BCE(eps=0.0)  # label-smoothing targets (hard 1.0/0.0 here)
    nt = 0  # number of targets
    np = len(p)  # number of detection layers
    balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1]
    for i, pi in enumerate(p):  # per detection layer
        image, anchor, gridy, gridx = indices[i]
        tobj = torch.zeros_like(pi[..., 0], device=device)  # objectness targets
        n = image.shape[0]
        if n:
            nt += n
            ps = pi[image, anchor, gridy, gridx]  # predictions matching the targets
            # Decode the predicted box: center offsets plus anchor-scaled width/height.
            pxy = ps[:, :2].sigmoid() * 2 - 0.5
            pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
            predicted_box = torch.cat((pxy, pwh), 1).to(device)
            # Despite the variable name, CIoU is computed here.
            giou = bbox_iou(predicted_box.T, tbox[i], x1y1x2y2=False, CIoU=True)
            loss_box += (1.0 - giou).mean()
            # The objectness target blends 1.0 with the IoU score via model.gr.
            tobj[image, anchor, gridy, gridx] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)
            if model.nc > 1:  # classification loss only for multi-class models
                t = torch.full_like(ps[:, 5:], cn, device=device)
                t[range(n), tcls[i]] = cp
                loss_cls += BCEcls(ps[:, 5:], t)
        loss_obj += BCEobj(pi[..., 4], tobj) * balance[i]
    # Scale each component by its hyperparameter gain.
    s = 3 / np
    loss_box *= h['giou'] * s
    loss_obj *= h['obj'] * s * (1.4 if np == 4 else 1.0)
    loss_cls *= h['cls'] * s
    loss = loss_box + loss_obj + loss_cls
    bs = tobj.shape[0]  # batch size
    return loss * bs, torch.cat((loss_box, loss_obj, loss_cls, loss)).detach()

During inference, the detection results are post-processed to keep the class with the highest confidence, effectively turning the object detector into a classifier.
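Before the model runs, each frame passes through letterbox, which resizes it while preserving aspect ratio and pads to a stride multiple. The resulting shape can be sketched standalone; this is a simplified sketch of the shape arithmetic, assuming YOLOv5's default behavior (target size 640, stride 32), not the library function itself:

```python
def letterbox_shape(h, w, new=640, stride=32):
    # Scale so the longer side fits `new`, then pad the short side
    # up to the next multiple of `stride` (YOLOv5 letterbox, auto=True).
    r = min(new / h, new / w)
    nh, nw = round(h * r), round(w * r)
    pad_h = (stride - nh % stride) % stride
    pad_w = (stride - nw % stride) % stride
    return nh + pad_h, nw + pad_w

print(letterbox_shape(1080, 1920))  # -> (384, 640)
```

A 1080p frame is therefore shrunk to 360x640 and padded to 384x640 rather than stretched to a square, which is why the detected boxes must be rescaled back to the original frame afterwards.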
def detect(opt, img):
    out, source, weights, view_img, save_txt, imgsz = (opt.output, img, opt.weights,
                                                       opt.view_img, opt.save_txt, opt.img_size)
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only on CUDA
    model = experimental.attempt_load(weights, map_location=device)
    imgsz = check_img_size(imgsz, s=model.stride.max())
    if half:
        model.half()
    img0 = img  # keep the original frame for coordinate rescaling
    # Letterbox resize, BGR -> RGB, HWC -> CHW
    img = letterbox(img)[0]
    img = img[:, :, ::-1].transpose(2, 0, 1)
    img = np.ascontiguousarray(img)
    # Warm-up pass on GPU
    img_warm = torch.zeros((1, 3, imgsz, imgsz), device=device)
    _ = model(img_warm.half() if half else img_warm) if device.type != 'cpu' else None
    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()
    img /= 255.0  # normalize to [0, 1]
    if img.ndimension() == 3:
        img = img.unsqueeze(0)  # add batch dimension
    pred = model(img, augment=opt.augment)[0]
    pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres,
                               classes=opt.classes, agnostic=opt.agnostic_nms)
    detect_class = None  # returned when nothing is detected
    for i, det in enumerate(pred):
        if det is not None and len(det):
            # Rescale boxes from the letterboxed size back to the original frame.
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img0.shape).round()
            all_conf = det[:, 4]
            if len(det[:, -1]) > 1:
                # Several detections: keep the class of the most confident one.
                ind = torch.max(all_conf, 0)[1]
                c = torch.take(det[:, -1], ind)
                detect_class = int(c)
            else:
                for c in det[:, -1]:
                    detect_class = int(c)
    return detect_class

Experimental results on a dataset of 600 labeled frames (200 normal, 200 colorful, and 200 black) show that the YOLOv5-based classifier achieves 97% accuracy, outperforming a ResNet-based classifier that reaches only 88% and often confuses normal and colorful screens.
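The class-selection step inside detect() can be isolated from the model entirely: given rows of [x1, y1, x2, y2, confidence, class_id] as produced by non-max suppression, keep the class of the single most confident detection. A minimal sketch with dummy data (pick_class is a hypothetical helper, not part of YOLOv5):

```python
def pick_class(detections):
    # detections: list of [x1, y1, x2, y2, confidence, class_id] rows.
    if not detections:
        return None  # no detection -> no class
    # Keep the class of the highest-confidence detection.
    best = max(detections, key=lambda row: row[4])
    return int(best[5])

# Dummy detections: a colorful-screen box (class 1) outscores a black-screen box (class 2).
dets = [[0, 0, 100, 100, 0.91, 1.0],
        [0, 0, 100, 100, 0.40, 2.0]]
print(pick_class(dets))  # -> 1
```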
The conclusion recommends using object‑detection frameworks such as YOLOv5 for classification tasks when the dataset is small or when pure classification models struggle, noting that the approach can be adapted to other detection architectures.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.