
Bayesian Optimal Discretization of Continuous Attributes (MODL) – Theory and Practice

This article explains why and how to discretize continuous features for categorical targets, compares three common discretization strategies, and details a Bayesian optimal method (MODL) with its mathematical formulation and implementation references.

360 Tech Engineering

In many machine‑learning tasks, features are continuous while the target variable is categorical, making direct linear or monotonic modeling difficult; discretizing the continuous features into ordered categories helps bridge this gap.

The article first motivates discretization, noting that binning preserves the relative ordering of the original values while letting the feature be treated as a categorical variable, which aligns better with typical modeling goals.

Three broad discretization approaches are described: (1) empirical methods such as equal‑width or equal‑frequency binning, which are simple but may distort the data distribution; (2) distribution‑based methods that rely on strong assumptions about the feature’s underlying distribution; and (3) supervised methods that avoid strong distribution assumptions while still linking the discretized feature to the target, effectively framing the problem as a single‑feature classification task.
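The two empirical methods from approach (1) can be sketched in a few lines of NumPy (a minimal illustration, not from the article; the bin counts here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # a continuous feature

# Equal-width binning: edges split the value range into equal spans,
# so skewed data can leave some bins nearly empty.
width_edges = np.linspace(x.min(), x.max(), num=5)   # 4 bins
width_bins = np.digitize(x, width_edges[1:-1])

# Equal-frequency binning: edges are quantiles, so each bin holds
# roughly the same number of points regardless of the distribution.
freq_edges = np.quantile(x, [0.25, 0.5, 0.75])       # 4 bins
freq_bins = np.digitize(x, freq_edges)

print(np.bincount(width_bins))  # uneven counts for normal data
print(np.bincount(freq_bins))   # ~250 points per bin
```

Neither method looks at the target labels, which is why the article classes them as "simple but may distort the data distribution"; the supervised approach below uses the labels directly.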

The article focuses on the supervised approach (method 3), specifically the Bayesian optimal discretization method known as MODL, originally presented in the paper “MODL: a Bayes Optimal Discretization Method for Continuous Attributes”.

Using Bayesian reasoning, the problem is cast as finding a model M (a particular discretization) that maximizes the posterior probability P(M|D), where D is the data consisting of the sorted feature values and corresponding target labels. Since P(D) is constant, the objective reduces to maximizing P(D|M)·P(M).
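In symbols, the reduction described above (a restatement of the article's reasoning, not a quote from the paper) is:

```latex
P(M \mid D) \;=\; \frac{P(M)\,P(D \mid M)}{P(D)}
\;\propto\; P(M)\,P(D \mid M),
\qquad
M^{*} \;=\; \operatorname*{arg\,max}_{M}\; P(M)\,P(D \mid M)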

The prior P(M) is expanded to reflect preferences over discretization schemes, while the likelihood P(D|M) measures how probable the observed target labels are under a given discretization. Maximizing this product is equivalent to minimizing the negative log-posterior, a cost L(M) = −log P(M) − log P(D|M); the discretization with the smallest cost is the Bayes-optimal one.
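To make the prior and likelihood terms concrete, here is a sketch of the MODL cost of a candidate discretization, assuming the criterion as stated in Boullé's paper (with n instances, I intervals, J classes, and per-interval class counts n_ij; verify the exact formula against the paper before relying on it):

```python
from math import lgamma, log

def log_binom(a, b):
    # log of the binomial coefficient C(a, b), computed with log-gamma
    # so it stays numerically stable for large counts
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def modl_cost(interval_class_counts, n_classes):
    """Cost (negative log-posterior, up to a constant) of a discretization.

    interval_class_counts: one list of class counts per interval,
    e.g. [[5, 0], [1, 4]] for 2 intervals over 2 classes.
    Lower cost means a more probable discretization.
    """
    n = sum(sum(c) for c in interval_class_counts)
    I = len(interval_class_counts)
    J = n_classes
    # Prior terms: choice of the number of intervals I, then of the
    # interval sizes (compositions of n into I parts)
    cost = log(n) + log_binom(n + I - 1, I - 1)
    for counts in interval_class_counts:
        n_i = sum(counts)
        # Prior on the class distribution inside the interval
        cost += log_binom(n_i + J - 1, J - 1)
        # Likelihood: multinomial coefficient n_i! / prod_j(n_ij!)
        cost += lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)
    return cost
```

The likelihood term vanishes for a class-pure interval and grows for a mixed one, so the criterion rewards intervals that separate the classes, while the prior terms penalize using many intervals; searching over cut points for the minimum-cost partition is what the MODL algorithm implements.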

For practical implementation, the article points to the MODL source code (modl.c and modl.h) hosted on GitHub and recommends reading the original MODL paper (http://www.marc-boulle.fr/publications/BoulleML06.pdf) to fully understand the algorithmic details.

Machine Learning, feature engineering, bayesian, continuous attributes, discretization, MODL
Written by

360 Tech Engineering

Official tech channel of 360, building the most professional technology aggregation platform for the brand.
