Automating Security Detection with Image Recognition: Workflow and Techniques
This article explains why security detection needs automation, compares static and dynamic analysis, and details an image‑recognition‑based pipeline—including grayscale conversion, edge detection, contour extraction, and OCR—to automatically identify risky app pop‑up warnings.
Why automation is needed for security detection
In app content distribution, malicious behaviors such as stealing private user data, bundling plugins, silently sending SMS messages in the background, and unauthorized network communication have become the primary threats to users and to product reputation.
Current app‑store detection techniques fall into two groups: static analysis, which is based on source code and permissions, and dynamic analysis.
Static detection supports high concurrency and is very efficient, but because it relies on security‑scan APIs and virus signature updates, it may miss some risks, requiring a second manual verification on a real device. It is therefore mostly used for an initial security screening.
Dynamic detection workflow
Test/operations personnel download and install the app to be tested.
Manually operate the app to traverse its functionality.
Review the running UI to ensure installation, launch, and operation are free of malicious behavior.
Drawbacks: low review efficiency and a cumbersome process.
Introducing image‑recognition technology
When a risk or virus is identified during runtime, security software will pop up a warning dialog.
Proposed solution: simulate the user's perspective and rely on multiple security engines to scan for threats.
Pre‑install several mainstream security apps (e.g., Baidu Security Guard, Tencent Guard, 360 Security Guard) and automatically recognize risk‑alert dialogs via image recognition.
Processing the pop‑up dialog consists of four steps:
Grayscale and edge processing to obtain image contours.
Filter contours to select target outlines.
Obtain target contour coordinates and generate a valid “alert box” image.
Perform OCR on the alert box and match the extracted text against a risk‑feature database.
Key point – Grayscale
Convert the screenshot to a grayscale image to eliminate color interference.
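As a minimal sketch of this step, the function below converts RGB pixels to gray levels using the common luminance weights 0.299, 0.587, and 0.114. The article does not say which formula or library the author used, so the weights, the nested-list image representation, and the name `to_grayscale` are assumptions made for illustration.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of 0-255 gray levels."""
    gray = []
    for row in rgb_image:
        # Weighted sum of the three channels; weights are an assumption.
        gray.append([int(round(0.299 * r + 0.587 * g + 0.114 * b))
                     for (r, g, b) in row])
    return gray


# Tiny 2x2 "screenshot": red, green, blue, and white pixels.
screenshot = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(screenshot)
print(gray)  # [[76, 150], [29, 255]]
```

After conversion, each pixel carries a single intensity value, so later steps only have to compare brightness rather than three color channels.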
[Figure: grayscale before‑and‑after comparison]
Key point – Edge detection
Apply the Canny operator to extract image edges; the result is stored as a binary image in which every pixel is either black or white.
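To illustrate how a grayscale image becomes a black-or-white edge map, here is a simplified stand-in for this step: Sobel gradient magnitude followed by a single threshold. It is not the Canny operator, which additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name `sobel_edges` and the threshold of 200 are hypothetical choices for this sketch.

```python
def sobel_edges(gray, threshold=200):
    """Return a binary matrix: 255 where the gradient is strong, else 0."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are skipped because the 3x3 kernel does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 255
    return edges


# Left half dark, right half bright: the edge is the vertical boundary.
gray_image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(gray_image)
print(edge_map[2])  # [0, 0, 255, 255, 0, 0]
```

Only the two columns on either side of the brightness jump are marked, which is the kind of outline a dialog border produces against its background.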
[Figure: binary conversion after Canny edge extraction (no obvious change because the image is already grayscale)]
Key point – Contour extraction
In the binary matrix, locate each contour by its minimum bounding rectangle.
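The sketch below shows one way to get a rectangle per outline from a binary matrix: group connected non-zero pixels with a breadth-first search and record the upright (axis-aligned) bounding box of each group. This is an illustrative approximation of contour extraction, not the author's code. In an OpenCV-based pipeline the same step is commonly done with `cv2.findContours` and `cv2.boundingRect`, but the article does not state which calls were used. The name `bounding_rects` is hypothetical.

```python
from collections import deque


def bounding_rects(binary):
    """Return (x, y, w, h) for each 8-connected group of non-zero pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y0 in range(height):
        for x0 in range(width):
            if binary[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Breadth-first search over one connected group of edge pixels.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] != 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return rects


binary = [
    [0,   0,   0,   0, 0,   0,   0],
    [0, 255, 255, 255, 0,   0,   0],
    [0, 255,   0, 255, 0,   0,   0],
    [0, 255, 255, 255, 0, 255, 255],
    [0,   0,   0,   0, 0, 255, 255],
]
rects = bounding_rects(binary)
print(rects)  # [(1, 1, 3, 3), (5, 3, 2, 2)]
```

Each tuple gives the top-left corner plus width and height, which is the form the later filtering and cropping steps need.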
Key point – Target contour selection
Since screenshots contain UI elements like status bars and app icons, and contours may be nested, custom rules (e.g., size too small, too thin, off‑center, irregular shape) are applied to filter out noise.
Analysis of target features: security‑software pop‑ups are usually centered and occupy the largest area. The following rules are used to extract them.
Selected target contours and their coordinates are recorded.
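A hedged sketch of such filtering follows. The rule categories (too small, too thin, off-center, pick the largest) come from the text above, but every threshold, the parameter names, and the function name `select_alert_box` are hypothetical, since the article does not list the author's actual values.

```python
def select_alert_box(rects, screen_w, screen_h,
                     min_area_ratio=0.05, max_aspect=5.0, center_tolerance=0.2):
    """Pick the largest centered rectangle that survives the filters, or None."""
    candidates = []
    for (x, y, w, h) in rects:
        if w <= 0 or h <= 0:
            continue  # degenerate rectangle
        if w * h < min_area_ratio * screen_w * screen_h:
            continue  # too small (icons, buttons, status bar)
        if max(w, h) / min(w, h) > max_aspect:
            continue  # too thin (separators, bars)
        center_x = x + w / 2
        if abs(center_x - screen_w / 2) > center_tolerance * screen_w:
            continue  # horizontally off-center
        candidates.append((x, y, w, h))
    # Among the survivors, the dialog is assumed to cover the largest area.
    return max(candidates, key=lambda r: r[2] * r[3], default=None)


# Hypothetical rectangles found on a 1080x1920 screenshot.
found = [
    (0, 0, 1080, 60),       # status bar
    (40, 1700, 120, 120),   # app icon
    (90, 660, 900, 600),    # security pop-up
    (140, 1100, 380, 120),  # button nested inside the pop-up
    (0, 300, 400, 500),     # large but off-center panel
]
alert_rect = select_alert_box(found, 1080, 1920)
print(alert_rect)  # (90, 660, 900, 600)
```

Returning `None` when nothing passes lets the caller treat "no dialog on screen" as a normal outcome rather than an error.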
Key point – Determining the alert box
Using the contour coordinates, width, and height, the original image can be cropped to isolate the alert box.
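Cropping by coordinates amounts to slicing rows and columns. The sketch below does this on a nested-list image; the name `crop` and the `(x, y, w, h)` tuple layout are assumptions for illustration. If the image is held as a NumPy array, the equivalent is the slice `image[y:y + h, x:x + w]`.

```python
def crop(image, rect):
    """Return the sub-image covered by rect = (x, y, w, h)."""
    x, y, w, h = rect
    # Slice rows y..y+h-1 first, then columns x..x+w-1 within each row.
    return [row[x:x + w] for row in image[y:y + h]]


# 4x5 test image whose pixel value encodes its position: row * 10 + column.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
alert_box = crop(image, (1, 1, 3, 2))
print(alert_box)  # [[11, 12, 13], [21, 22, 23]]
```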
[Figure: comparison of the image before and after cropping]
Key point – Risk judgment
Call Baidu OCR (or an open‑source OCR library) to extract the text from the cropped alert box.
If the extracted text matches keywords in the risk‑feature database (e.g., “risk detected”), the screenshot is considered flagged by the security software.
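The matching step can be sketched as a normalized substring search. The OCR call itself is left out because its API is not described in the article; the function below takes the OCR output as a plain string. The keyword list stands in for the risk-feature database and is hypothetical apart from "risk detected", which is the article's own example.

```python
# Hypothetical stand-in for the risk-feature database.
RISK_KEYWORDS = ("risk detected", "virus", "trojan", "malicious")


def match_risk_keywords(ocr_text, keywords=RISK_KEYWORDS):
    """Return the keywords found in the OCR text, ignoring case and spacing."""
    # OCR output often contains line breaks and uneven spacing, so collapse
    # all whitespace runs into single spaces before matching.
    normalized = " ".join(ocr_text.lower().split())
    return [kw for kw in keywords if kw in normalized]


hits = match_risk_keywords("Warning:  Risk\nDetected in this app. Uninstall now?")
print(hits)        # ['risk detected']
print(bool(hits))  # True: the screenshot would be treated as flagged

clean = match_risk_keywords("Installation complete.")
print(clean)       # []
```

Returning the matched keywords rather than a bare boolean keeps a record of why a screenshot was flagged, which helps when reviewing false positives.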
Engineering implementation
[Figure: overall automated detection workflow diagram]
Engineering the pipeline end to end makes it possible to continuously discover potentially risky apps and to solve the problem once, rather than case by case.
Author introduction
Wang Hui
Joined Baidu in 2013 and is currently a senior test development engineer in the Baidu Content Ecology Quality Department. Previously worked on quality assurance for Baidu Maps, Baidu Waimai, and Baidu Mobile Assistant. Skilled at problem‑driven engineering to solve top product issues.