
Decision Forests for Pixel-Level Classification in Computer Vision

This article traces the evolution of computer vision from its 1960s origins, explains the challenges of image classification and semantic segmentation, and introduces pixel-level decision forest algorithms as an efficient solution for large‑scale pixel classification tasks.

Architects Research Society

Computer vision originated in the 1960s from artificial intelligence and cognitive neuroscience, aiming to design algorithms that enable computers to automatically understand image content. MIT formally proposed it as a summer project in 1966, but it quickly became clear that solving it would take much longer. Fifty years later, general image understanding tasks are still not perfectly solved, but significant progress has been made. Commercial successes have broadened attention and led to breakthroughs such as interactive segmentation (e.g., background removal in Microsoft Office), image search, face recognition, autofocus, and Kinect body motion capture. The recent rapid advances are largely due to the fast development of machine learning over the past 15–20 years.

The first part of this series will mainly discuss the challenges faced on the road to computer vision and will introduce a very important machine‑learning technique – the pixel‑level classification decision forest algorithm.

Image Classification

Imagine asking your computer a question about image classification – "Is there a car in this picture?"

For a computer, an image is merely a grid of pixels, each described by three primary-color channels (red, green, blue) with values from 0 to 255. Those values vary enormously with whether and where the object appears, the camera viewpoint, lighting conditions, background clutter, and the object's shape. Additionally, different kinds of cars (sedan, SUV, truck) produce distinct pixel patterns, adding further variability.
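The raw representation is easy to see in code. A minimal sketch (the array values are invented, and `numpy` is assumed available) builds a tiny RGB grid and shows how a simple lighting change shifts every channel value even though the scene content is unchanged:

```python
import numpy as np

# A tiny "image": a 4x4 grid of RGB pixels, each channel in 0..255.
# (Illustrative values only -- not from any real photo.)
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = [200, 40, 40]   # a small red "object" on a black background

# The same scene under brighter lighting: every channel value shifts,
# even though nothing about the object itself has changed.
brighter = np.clip(image.astype(np.int16) + 50, 0, 255).astype(np.uint8)

print(image[1, 1])     # [200  40  40]
print(brighter[1, 1])  # [250  90  90]
```

The same kind of shift happens with viewpoint and background changes, which is why hand-coding rules over raw pixel values breaks down so quickly.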

Fortunately, supervised machine learning algorithms provide an alternative to manually coding solutions for these many possibilities. By collecting a training dataset of images and manually labeling them with appropriate tags, we can use state‑of‑the‑art ML algorithms to discover which pixel patterns correspond to the target objects and which are distractions. The goal is for the algorithm to recognize new, unseen samples and to generalize invariant features despite various interfering factors. Today, we have made great strides both in developing new computer‑vision ML algorithms and in dataset collection and annotation.

Pixel Classification Decision Forest Algorithm

An image often contains multiple layers of information. As mentioned earlier, we can ask whether a specific object class exists in the whole image, such as a car. A more complex problem is to determine which objects are present throughout the image, which becomes a semantic image segmentation problem. Below is an example of street‑scene segmentation:

This kind of segmentation allows selective processing of photos or the synthesis of new images; we will see more applications later.

There are many ways to solve semantic segmentation, but pixel classification is a powerful foundational component: training a classifier to predict, for each pixel, a distribution over object classes (e.g., car, road, tree, wall). This brings computational challenges, however, because images contain a huge number of pixels (the camera in the Nokia Lumia 1020 smartphone, for example, captures 41-megapixel images), so the amount of training and test data required is orders of magnitude larger than for image-level classification.
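A back-of-the-envelope calculation makes the scale gap concrete (the dataset size below is a made-up assumption for illustration):

```python
# Back-of-the-envelope scale comparison (illustrative numbers only).
pixels_per_image = 41_000_000          # e.g., one 41-megapixel photo
images = 1_000                         # hypothetical labeled dataset size

image_level_examples = images                      # one label per image
pixel_level_examples = images * pixels_per_image   # one label per pixel

print(image_level_examples)   # 1000
print(pixel_level_examples)   # 41000000000 -- tens of billions
```

Even a modest dataset yields billions of pixel-level examples, which is why per-pixel test-time efficiency matters so much.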

This scale motivated the development of a more efficient classification model – decision forests (also known as random forests or random decision forests). As shown below, a decision forest consists of a collection of independently trained decision trees.

Each decision tree comprises a root node, multiple internal "branch" nodes, and leaf nodes. Classification starts at the root, evaluating a binary branch function (e.g., "Is this pixel redder than its neighbors?"). Based on the binary outcome, the test proceeds left or right to the next branch function, and so on, until reaching a leaf node that stores a prediction—typically a histogram of class labels.
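This traversal can be sketched in a few lines. The tree layout, the branch test, and the leaf histograms below are illustrative placeholders, not a trained model; real forests learn both the tests and the histograms from labeled data:

```python
import numpy as np

def classify_pixel(node, image, y, x):
    """Walk from the root to a leaf, evaluating one binary branch
    function per level, and return the leaf's class histogram."""
    while "histogram" not in node:            # still at a branch node
        node = node["left"] if node["test"](image, y, x) else node["right"]
    return node["histogram"]

def redder_than(dx):
    """Branch function: is this pixel's red channel larger than that of
    the pixel `dx` columns to the right (clamped at the image border)?"""
    def test(image, y, x):
        nx = min(x + dx, image.shape[1] - 1)
        return int(image[y, x, 0]) > int(image[y, nx, 0])
    return test

# A depth-1 toy tree: one branch test, two leaf histograms.
tree = {
    "test": redder_than(3),
    "left":  {"histogram": {"car": 0.7, "road": 0.2, "tree": 0.1}},
    "right": {"histogram": {"car": 0.1, "road": 0.6, "tree": 0.3}},
}

image = np.zeros((8, 8, 3), dtype=np.uint8)   # all-black test image
image[4, 4] = [220, 30, 30]                    # one strongly red pixel

print(classify_pixel(tree, image, 4, 4))       # redder -> left leaf
```

Note that each branch function here compares the pixel against a spatial offset, in the spirit of the "Is this pixel redder than its neighbors?" test described above.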

The advantage of decision trees lies in testing efficiency: although the number of distinct root-to-leaf paths grows exponentially with tree depth, classifying a single pixel follows only one of them, so the cost is linear in the depth. Moreover, each branch function is evaluated in light of the answers to the previous ones, much like a game of twenty questions in which each answer narrows the next question, quickly homing in on the correct result.
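Combining the trees of a forest is then straightforward: each tree independently routes the pixel to one of its leaves, and the forest averages the resulting class histograms. A minimal sketch (the per-tree histograms below are made-up numbers standing in for real leaf outputs):

```python
def forest_predict(per_tree_histograms):
    """Average the class histograms produced by a forest's
    independently trained trees for one pixel."""
    classes = per_tree_histograms[0].keys()
    n = len(per_tree_histograms)
    return {c: sum(h[c] for h in per_tree_histograms) / n for c in classes}

# Hypothetical leaf histograms from three trees for the same pixel.
histograms = [
    {"car": 0.7, "road": 0.2, "tree": 0.1},
    {"car": 0.5, "road": 0.4, "tree": 0.1},
    {"car": 0.6, "road": 0.3, "tree": 0.1},
]

pred = forest_predict(histograms)
print(max(pred, key=pred.get))   # car
```

Because each tree is trained and evaluated independently, the per-tree work parallelizes naturally, which is what makes GPU implementations such as the Kinect pipeline feasible.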

With this technology we have achieved breakthroughs in several segmentation tasks: semantic segmentation of photos, street‑scene segmentation, 3D medical scan segmentation, camera relocalization, and body‑part segmentation using Kinect depth cameras. For Kinect, the testing speed of decision forests is critical due to strict computational limits, but the parallel processing capability of Xbox GPUs meets these requirements, enabling the approach.

In the second article of this series we will discuss the hot topic of deep‑learning‑based image classification and look ahead to the future of image‑classification technology. If you wish to start machine‑learning research on our cloud platform, you are welcome to visit our Machine Learning Center.

machine learning · computer vision · semantic segmentation · decision forest · pixel classification
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
