Accelerating OpenCV Image Matching with CUDA: CPU vs GPU Performance
This article explains how to compile OpenCV-Python with CUDA support to speed up image template matching, compares CPU and GPU execution times, and shows a practical Python example in which GPU acceleration cuts the average matching time by 39.4%, using OpenCV 3.2.0 and CUDA 8.0.
Background
In automated testing, using image recognition to locate UI controls is a common requirement, and the response speed of the HTTP API that provides image recognition is critical.
Problem
This article focuses on accelerating the relevant OpenCV APIs; other server-side optimization techniques are out of scope. The prebuilt opencv-python packages do not include GPU (CUDA) support, so you must build OpenCV from source with CUDA enabled. The article uses OpenCV 3.2.0 with CUDA 8.0, as recommended by the official documentation; other version combinations may fail to compile.
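After building, it is worth confirming that the cv2 module Python actually imports is the CUDA-enabled build and not a prebuilt package that happens to be earlier on the path. The sketch below assumes that the summary returned by cv2.getBuildInformation() contains a line such as `NVIDIA CUDA: YES (ver 8.0, ...)`; the exact layout varies between OpenCV versions, and the helper name `cuda_enabled` is hypothetical.

```python
import re


def cuda_enabled(build_info):
    """Return True if an OpenCV build summary reports CUDA as enabled.

    Assumes a line like "  NVIDIA CUDA:   YES (ver 8.0, CUFFT CUBLAS)";
    the exact format differs between OpenCV versions.
    """
    for line in build_info.splitlines():
        match = re.match(r"\s*NVIDIA CUDA\s*:\s*(\S+)", line)
        if match:
            return match.group(1).upper() == "YES"
    # No CUDA line at all: treat the build as CPU-only.
    return False


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not installed")
    else:
        print("CUDA enabled:", cuda_enabled(cv2.getBuildInformation()))
```

If this prints `False` after a source build, the interpreter is most likely still loading an older, CPU-only cv2 module.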
Example – Image Matching
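The benchmark below runs template matching 100 times and prints the average time per call in seconds (image loading included).

```python
# coding=utf-8
import cv2
import time


def match_test():
    # Load the screenshot to search in and the UI-control template to find.
    target = cv2.imread("./target.png")
    template = cv2.imread("./template.jpg")
    if target is None or template is None:
        raise IOError("could not read target.png or template.jpg")
    # Normalized correlation-coefficient matching: scores range from -1 to 1.
    result = cv2.matchTemplate(target, template, cv2.TM_CCOEFF_NORMED)
    minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(result)
    # maxLoc is the (x, y) of the best match's top-left corner.
    h, w = template.shape[:2]
    if maxVal > 0.5:
        # Return the center of the matched region (e.g. as a click target).
        middle_point = (int(maxLoc[0] + w / 2), int(maxLoc[1] + h / 2))
        return middle_point
    else:
        return None


if __name__ == '__main__':
    num = 100
    begin = time.time()
    for i in range(num):
        match_test()
    # Average seconds per match_test() call.
    print((time.time() - begin) / num)
```

Result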
CPU: 0.299 seconds per call
GPU: 0.181 seconds per call
Improvement: GPU acceleration reduces the average time per call by 39.4%.
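To reproduce the comparison, a slightly more careful timing loop than the one in the example helps: a few untimed warm-up calls keep one-off costs (such as CUDA context creation on the first GPU call) out of the average, and `time.perf_counter()` is the clock Python recommends for short intervals. The sketch below is one way to do this under those assumptions; `mean_seconds_per_call` and `compare` are hypothetical helpers, not part of OpenCV.

```python
import time


def mean_seconds_per_call(fn, runs=100, warmup=3):
    """Average wall-clock seconds per fn() call, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def compare(cpu_seconds, gpu_seconds):
    """Return (fractional time reduction, speedup factor) of GPU vs. CPU."""
    reduction = (cpu_seconds - gpu_seconds) / cpu_seconds
    speedup = cpu_seconds / gpu_seconds
    return reduction, speedup


if __name__ == "__main__":
    # Per-call timings reported above.
    reduction, speedup = compare(0.299, 0.181)
    print("time reduction: {:.1%}, speedup: {:.2f}x".format(reduction, speedup))
```

With the rounded timings this prints `time reduction: 39.5%, speedup: 1.65x`; the reported 39.4% lies within the range allowed by rounding the raw timings to three decimals. A real measurement would pass `match_test` (or a GPU variant of it) as `fn`.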
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.