
UI2CODE: Layout Analysis and Background/Foreground Extraction for UI Images

The UI2CODE system tackles UI layout analysis by first extracting backgrounds with Sobel, Laplacian and Canny edge detection plus a flood‑fill algorithm, then isolating foreground components through connected‑component analysis and a Faster R‑CNN classifier, and finally fusing both pipelines to achieve superior precision, recall and IoU on Xianyu app screenshots.

Xianyu Technology

This article presents the UI2CODE project, focusing on the challenging step of layout analysis when converting complex UI screenshots into GUI elements.

The system is divided into two main modules: background analysis and foreground analysis.

Background analysis extracts the UI's background by applying edge‑detection algorithms such as Sobel, Laplacian and Canny. Gradient direction is used to distinguish solid‑color regions from gradient regions, and a discrete Laplacian template is employed to locate flat background areas.

After identifying background blocks, a flood‑fill (diffuse‑water) algorithm removes gradient backgrounds. The core implementation is shown below:

import cv2
import numpy as np

def fill_color_diffuse_water_from_img(task_out_dir, image, x, y, thres_up=(10, 10, 10), thres_down=(10, 10, 10), fill_color=(255, 255, 255)):
    # image height and width
    h, w = image.shape[:2]
    # floodFill requires a single-channel uint8 mask of shape (h+2, w+2)
    mask = np.zeros((h + 2, w + 2), np.uint8)
    # flood fill from seed (x, y) in fixed-range mode: pixels whose color lies
    # within thres_down/thres_up of the seed color are repainted with fill_color
    cv2.floodFill(image, mask, (x, y), fill_color, thres_down, thres_up, cv2.FLOODFILL_FIXED_RANGE)
    cv2.imwrite(task_out_dir + "/ui/tmp2.png", image)
    return image, mask

With the background cleaned, foreground analysis proceeds to extract GUI fragments. Connected‑component analysis prevents fragmentation, while a deep‑learning classifier identifies component types and merges fragments iteratively until no residual pieces remain.

A concrete use case is the detection of waterfall‑flow cards in the Xianyu app. Traditional image‑processing steps include CLAHE contrast enhancement, Canny edge detection, morphological dilation, contour extraction, Douglas‑Peucker rectangle approximation, and horizontal/vertical projection to obtain smooth contours.

For higher recall, a deep‑learning pipeline based on Faster R‑CNN is employed. The network extracts features with a backbone (e.g., ResNet), generates region proposals, performs RoI pooling, and classifies & regresses bounding boxes.

The two streams are fused: both methods run in parallel, their boxes are filtered by IoU thresholds, and the remaining boxes are refined by snapping edges to the nearest detected lines (within a pixel tolerance). This yields a final set of boxes that combine the high localization accuracy of traditional methods with the high recall of deep learning.
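The IoU-based fusion can be sketched in plain Python. The exact fusion rule below (keep every learned box, but substitute a strongly overlapping traditional box for its tighter localization) and the 0.8 threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def fuse_boxes(traditional, learned, iou_thresh=0.8):
    """Hypothetical fusion rule: keep every learned box, but where a
    traditional box overlaps it strongly, prefer the traditional box."""
    fused = []
    for lb in learned:
        match = next((tb for tb in traditional if iou(tb, lb) >= iou_thresh), None)
        fused.append(match if match is not None else lb)
    return fused

trad = [(10, 10, 110, 210)]
learn = [(12, 8, 108, 206), (300, 10, 400, 210)]
print(fuse_boxes(trad, learn))  # [(10, 10, 110, 210), (300, 10, 400, 210)]
```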

Experiments on 50 Xianyu screenshots (96 cards) show that the traditional pipeline detects 65 cards, the deep‑learning pipeline 97 cards, and the fused approach 98 cards, achieving superior precision, recall and IoU as illustrated in the result tables.

In conclusion, the hybrid approach demonstrates that integrating classic computer‑vision techniques with modern deep‑learning models can produce robust UI element extraction, while acknowledging remaining challenges such as edge‑case refinement.

Tags: Computer Vision, Image Processing, Deep Learning, Layout Analysis, Faster R-CNN, Flood Fill, UI2CODE
Written by Xianyu Technology, the official account of the Xianyu technology team.
