
How Decision Trees Predict House Locations: From Intuition to Overfitting

This article explains machine learning fundamentals using a house‑location classification example, illustrating how decision trees create split points from features like elevation and price, grow recursively, achieve high training accuracy, and reveal overfitting when evaluated on unseen test data.


Meaning of Machine Learning

Machine learning is a branch of computer science in which algorithms use statistical techniques to learn patterns from data automatically and make accurate predictions.

Case Study – Predicting House Location

Starting with Intuition

Suppose we need to predict whether a house is in New York City or San Francisco. This is a classification task. Since San Francisco is famously hilly, we begin by looking at the house’s elevation.

From the elevation chart, houses above 73 m are likely in San Francisco – an initial intuitive guess.

Adding Common Sense

Adding another dimension, such as price per square meter, helps distinguish houses further because New York houses are generally more expensive.

Plotting price per square meter against elevation turns the data into a scatter plot, allowing us to separate the low‑elevation houses by price.

Using these features, we can predict that a house below 73 m elevation with a price above $19,117 per m² is in New York.
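These two hand‑made rules can be sketched as a tiny classifier. The thresholds come from the article; the function name and the "uncertain" fallback for houses that are low in both elevation and price are illustrative assumptions.

```python
def classify_house(elevation_m: float, price_per_m2: float) -> str:
    """Classify a house with the two hand-made rules described above."""
    if elevation_m > 73:        # hilly terrain: likely San Francisco
        return "San Francisco"
    if price_per_m2 > 19117:    # expensive low-lying housing: likely New York
        return "New York"
    return "uncertain"          # low elevation and low price: need more features
```

For example, `classify_house(10, 25000)` returns `"New York"`, while a house at 10 m elevation priced at $5,000 per m² falls through both rules, which is exactly why more features are needed.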

In machine‑learning terminology, these dimensions are called features, predictors, or variables.

Defining the Decision Boundary

We plot elevation and price on a scatter diagram. The green region (elevation > 73 m) corresponds to San Francisco, while the blue region (elevation ≤ 73 m and price > $19,117 per m²) corresponds to New York.

This process of finding reasonable boundaries from data is the essence of statistical and machine learning.

For houses that are low in both elevation and price, we need additional dimensions.

Our dataset contains seven features; building a model from them is called model training.

Below is a scatter‑matrix showing pairwise relationships among the features.

The matrix hints at underlying patterns, which we must turn into precise decision rules.

Introducing Decision Trees

Finding patterns in data is the foundation of machine learning. Its algorithms use statistical learning to locate good decision boundaries.

We now introduce the decision‑tree algorithm, which examines features one by one to find split points.

Finding Better Splits

Re‑examining the elevation data, we re‑plot it as a finer‑grained histogram that reveals more detail about how the houses in each city are distributed.

Although the highest New York house sits at 73 m, most New York houses are at lower elevations.

First Split Node

Decision trees use binary if‑else conditions as split nodes. For example, if a house’s elevation exceeds a threshold, the tree classifies it as San Francisco.

These split nodes (forks) divide the data into two branches based on a numeric threshold, called the split point.
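A split node can be sketched as a function that partitions samples into two branches at one numeric threshold. The dict‑based sample format is an assumption for illustration, not the article's data.

```python
def split(samples, feature, threshold):
    """Divide samples into two branches at a numeric split point."""
    left = [s for s in samples if s[feature] <= threshold]
    right = [s for s in samples if s[feature] > threshold]
    return left, right

houses = [{"elevation": 10}, {"elevation": 50},
          {"elevation": 90}, {"elevation": 150}]
low, high = split(houses, "elevation", 73)  # two houses land in each branch
```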

Balancing Trade‑offs

Choosing split nodes requires careful trade‑offs. Our initial split at 73 m misclassifies many San Francisco houses as New York (false negatives).

Optimal Split

At the optimal split, each resulting group should be as homogeneous as possible. Statistical algorithms help locate such split points.
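One common way to locate such split points is to scan candidate thresholds for a feature and keep the one that minimizes weighted Gini impurity (a standard homogeneity measure). The article does not name a specific criterion, so Gini and the toy data below are assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a group: 0 when the group is perfectly homogeneous."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_sf = labels.count("SF") / n
    return 1.0 - p_sf**2 - (1.0 - p_sf)**2

def best_split(values, labels):
    """Return (threshold, impurity): the split minimizing weighted Gini."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

# Toy elevations: three NY houses sit low, five SF houses sit high.
elev = [5, 12, 20, 60, 80, 120, 150, 200]
city = ["NY", "NY", "NY", "SF", "SF", "SF", "SF", "SF"]
thr, score = best_split(elev, city)  # clean split midway between 20 and 60
```

On this toy data the search lands on a threshold of 40.0 with impurity 0.0; on real data, as the article notes, even the best split usually leaves some misclassifications.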

Even with the best split, some misclassifications remain.

Recursive Splitting

We apply recursion: each branch is further split using the same procedure, a common technique in model training.

The series of histograms shows the distribution of each feature within the two branches.

New split points differ per branch.

In the lower‑elevation branch, the best discriminating feature is price per m² with a threshold of $11,420. In the higher‑elevation branch, the best feature is total house value with a threshold of $514,500.

Growing the Tree

Adding more branches provides the tree with additional information, increasing prediction accuracy.

With one extra layer, accuracy rises to 84%.

Adding several more layers pushes accuracy to 96%.

Continuing until the tree perfectly separates the two cities yields 100% training accuracy; leaf nodes contain only one class.

Leaf nodes determine the predicted class based on the majority of samples they contain.
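The majority rule at a leaf can be sketched in one line with `collections.Counter` (an implementation choice, not something the article prescribes):

```python
from collections import Counter

def leaf_class(labels):
    """Predicted class of a leaf: the majority label among its samples."""
    return Counter(labels).most_common(1)[0][0]

leaf_class(["NY", "NY", "SF"])  # the majority label here is "NY"
```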

Making Predictions

To predict, a sample starts at the root and follows the tree’s branches until reaching a leaf, whose class is the prediction. Each path represents a decision rule.
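Prediction by walking the tree can be sketched with a nested‑tuple representation. The thresholds (73 m elevation, $11,420 per m², $514,500 total value) follow the article; the tuple layout and which class each leaf receives are assumptions for illustration.

```python
# Internal node: (feature_index, threshold, left_subtree, right_subtree).
# A leaf is just a class name. Features: x = (elevation_m, price_per_m2, total_value).
tree = (0, 73,
        (1, 11420, "San Francisco", "New York"),   # low-elevation branch: split on price per m²
        (2, 514500, "San Francisco", "New York"))  # high-elevation branch: split on total value

def predict(tree, x):
    node = tree
    while isinstance(node, tuple):  # descend until a leaf is reached
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node
```

For example, `predict(tree, (10, 25000, 0))` follows the path root → low elevation → high price and lands on a New York leaf; that path is one decision rule.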

The data used to build the tree are called training data.

Because the tree was built from these data, its training accuracy is 100%.

Facing Reality

Perfect training accuracy does not guarantee performance on unseen data. We evaluate the tree with separate test data.

The tree’s performance on test data reveals overfitting: it has memorized tiny variations in the training set, including details that do not generalize, so test accuracy falls well below training accuracy.
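The gap between training and test accuracy can be made concrete with a deliberately overfit "model" that simply memorizes its training points, much like a tree grown until every leaf is pure. The data points here are made up for illustration.

```python
# Toy data: (elevation_m, price_per_m2) -> city label.
train = {(10, 20000): "NY", (15, 18000): "NY", (90, 9000): "SF", (120, 8000): "SF"}
test = {(12, 21000): "NY", (100, 8500): "SF"}

def memorizer(x):
    """An overfit model: exact lookup of training points, blind guess otherwise."""
    return train.get(x, "NY")

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
test_acc = sum(memorizer(x) == y for x, y in test.items()) / len(test)
```

Training accuracy is a perfect 1.0, but on the two unseen test points the model falls back to guessing and scores only 0.5.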

Summary

Machine learning applies statistical learning to uncover classification boundaries and make precise predictions.

Decision trees use a series of if‑else rules to find these boundaries.

Not every tiny detail matters; overfitting must be avoided by testing on unseen data.

Reference: Illustrated Machine Learning, http://www.r2d3.us/图解机器学习/

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
