
Unlocking Bayesian Sampling: How MCMC and Hamiltonian Monte Carlo Work

This article explains the principles behind Markov Chain Monte Carlo methods, including Monte Carlo sampling, the Metropolis‑Hastings algorithm, and the Hamiltonian Monte Carlo (HMC) approach, illustrating how they efficiently approximate posterior distributions in Bayesian analysis.


Markov Methods

Many related techniques fall under the umbrella of MCMC methods. Grid approximation must evaluate the prior and likelihood at every grid point to approximate the entire posterior, whereas MCMC concentrates computational effort on high‑probability regions, drawing samples from them in proportion to their probability.

MCMC can be split into two parts: the Monte Carlo part and the Markov chain part.

Monte Carlo

The Monte Carlo component uses random sampling to estimate a quantity or simulate a process. The method was conceived by Stanislaw Ulam and named after the Monte Carlo casino in Monaco; it was first applied to nuclear physics problems over 70 years ago. Today it is used to solve problems across science, industry, and art.

A classic example estimates the area of a circle inscribed in a square:

Randomly place points inside a square of side length L.

Draw a circle of radius R inside the square.

Count how many points fall inside the circle.

The proportion of points inside the circle approximates the ratio of the circle's area to the square's area, so multiplying that proportion by the square's area L² estimates the circle's area. (With R = L/2, the proportion approximates π/4.)

Markov Chain

A Markov chain is a mathematical object consisting of states and transition probabilities that depend only on the current state.

Given a chain whose stationary distribution is the target distribution (e.g., a Bayesian posterior), a sufficiently long random walk on the chain produces samples from that distribution.

To construct such a chain without knowing the posterior's normalizing constant, we enforce the detailed balance condition: for any two states A and B, p(A)·q(A→B) = p(B)·q(B→A), i.e., the probability flow from A to B equals the flow from B to A under the target distribution.
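A small numerical check of this idea, using a hypothetical three-state chain (the target distribution and transition matrix below are made up for illustration): when the transition matrix satisfies detailed balance with respect to the target, running the chain forward from any starting distribution settles on that target.

```python
# Hypothetical 3-state chain with target distribution pi = (0.2, 0.3, 0.5).
# The transition matrix P is chosen so that pi[i] * P[i][j] == pi[j] * P[j][i]
# for every pair of states -- the detailed balance condition.
pi = [0.2, 0.3, 0.5]
P = [
    [0.55, 0.15, 0.30],
    [0.10, 0.60, 0.30],
    [0.12, 0.18, 0.70],
]

# Verify detailed balance holds for every pair of states.
for i in range(3):
    for j in range(3):
        assert abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12

def stationary(P, steps=1000):
    """Evolve an arbitrary starting distribution under P until it stabilises."""
    dist = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist

limit = stationary(P)  # converges to pi regardless of the starting point
```

Detailed balance is a sufficient (not necessary) condition for the chain to have the target as its stationary distribution, which is why it is the standard construction tool.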

Metropolis‑Hastings Algorithm

An intuitive analogy: estimating a lake’s volume by randomly selecting points (using a boat and a pole) and deciding whether to accept new measurements based on a comparison rule.

If the new point is deeper, record it and continue; if shallower, accept it with a certain probability or reject it and stay at the previous point.

The acceptance probability follows the Metropolis‑Hastings criterion: the new point is accepted with probability min(1, p_new / p_old), the ratio of the new and old probabilities capped at one.

Iterating this process yields an approximation of the posterior distribution, with more iterations improving accuracy.

Formal steps of the Metropolis‑Hastings algorithm:

Initialize the parameter with a random or heuristic value.

Propose a new value from a simple proposal distribution (e.g., Gaussian or uniform).

Compute the acceptance probability using the Metropolis‑Hastings rule.

Draw a uniform random number; if it is less than the acceptance probability, accept the new value, otherwise keep the current one.

Repeat steps 2–4 until a sufficient number of samples are collected.
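The steps above can be sketched in plain Python for an unnormalised one-dimensional target (a standard normal here; the function names, step size, and sample count are illustrative assumptions):

```python
import math
import random

def metropolis(log_target, n_samples=20_000, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0                      # step 1: initialise the parameter
    logp = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Step 2: propose a new value from a symmetric Gaussian proposal.
        x_new = x + rng.gauss(0.0, step)
        logp_new = log_target(x_new)
        # Step 3: acceptance probability min(1, p_new / p_old), in log space.
        accept_prob = math.exp(min(0.0, logp_new - logp))
        # Step 4: accept or keep the current value.
        if rng.random() < accept_prob:
            x, logp = x_new, logp_new
        samples.append(x)       # on rejection the current value is recorded again
    return samples

# Unnormalised log-density of a standard normal; the constant is not needed,
# because only ratios of probabilities enter the acceptance rule.
samples = metropolis(lambda x: -0.5 * x * x)
mean = sum(samples) / len(samples)
```

Working with log-probabilities, as above, avoids numerical underflow when densities are tiny; note that only the ratio p_new/p_old is ever needed, which is exactly why the posterior's normalizing constant can be ignored.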

Key notes:

If the proposal distribution is symmetric, the acceptance rule reduces to the original Metropolis criterion.

Moves to higher‑probability states are always accepted; moves to lower‑probability states are accepted with probability equal to the ratio of their probabilities.

The recorded samples form a chain that approximates the target (posterior) distribution; the most frequently visited value approximates the mode.

Hamiltonian Monte Carlo / No‑U‑Turn Sampler

Standard MCMC can be slow and produce highly correlated samples. Hamiltonian Monte Carlo (HMC), also called Hybrid Monte Carlo, combines Metropolis‑Hastings with concepts from molecular dynamics, using gradient information to propose distant yet probable states, thus improving acceptance rates.

Imagine throwing a frictionless ball on the lake bottom to guide the boat; the ball’s trajectory, informed by the gradient, leads to more efficient exploration.

HMC requires computing the gradient of the target density, which adds complexity but yields higher efficiency for many problems. A modern variant, the No‑U‑Turn Sampler (NUTS) implemented in PyMC3, automatically tunes the step size and trajectory length (stopping a trajectory when it begins to double back on itself), reducing the need for manual tuning.
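A deliberately simplified sketch of HMC for a one-dimensional target (again a standard normal; the leapfrog step count, step size, and sample count are illustrative assumptions — real implementations such as PyMC3's tune these automatically):

```python
import math
import random

def hmc_sample(log_p, grad_log_p, n_samples=5_000, step=0.2, n_leapfrog=20, seed=0):
    """Simplified 1-D Hamiltonian Monte Carlo sampler (illustrative sketch)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        # Give the "ball" a random momentum, then let it roll along the surface.
        p = rng.gauss(0.0, 1.0)
        x_new, p_new = x, p
        # Leapfrog integration of Hamiltonian dynamics, driven by the gradient.
        p_new += 0.5 * step * grad_log_p(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step * p_new
            p_new += step * grad_log_p(x_new)
        x_new += step * p_new
        p_new += 0.5 * step * grad_log_p(x_new)
        # Metropolis correction on the total energy (potential + kinetic),
        # which fixes the small error introduced by the discrete integrator.
        h_old = -log_p(x) + 0.5 * p * p
        h_new = -log_p(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples

# Standard normal target: log p(x) = -x^2/2 (up to a constant), gradient -x.
samples = hmc_sample(log_p=lambda x: -0.5 * x * x, grad_log_p=lambda x: -x)
mean = sum(samples) / len(samples)
```

Because each proposal travels a long distance through parameter space before the accept/reject step, consecutive samples are far less correlated than in random-walk Metropolis, which is the source of HMC's efficiency.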

Reference:

Osvaldo Martin, "Bayesian Analysis with Python"

Tags: Bayesian inference, Monte Carlo, MCMC, Metropolis-Hastings, Hamiltonian Monte Carlo
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
