
Markov Chain Monte Carlo (MCMC)

A family of algorithms that samples from complex probability distributions by constructing a Markov chain whose stationary distribution is the target.

What Is MCMC?

MCMC is how modern Bayesian inference works when the posterior can't be derived analytically. For any posterior, however complex (hierarchical priors, latent variables, nonlinear likelihoods), MCMC produces samples that approximate it. With enough samples, any question you want to ask of the posterior (mean, credible interval, P(B > A)) becomes a counting exercise.
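That "counting exercise" is literal. A minimal sketch, using simulated stand-in draws rather than real MCMC output (the numbers below are illustrative, not from a fitted model):

```python
import random

random.seed(0)

# Hypothetical posterior draws for the conversion rates of variants A and B.
# In practice these would come from your sampler; here they are simulated.
a = [random.gauss(0.030, 0.004) for _ in range(8000)]
b = [random.gauss(0.034, 0.004) for _ in range(8000)]

mean_b = sum(b) / len(b)                       # posterior mean of B

bs = sorted(b)                                 # 95% credible interval = central 95% of draws
lo, hi = bs[int(0.025 * len(bs))], bs[int(0.975 * len(bs))]

# P(B > A): the fraction of draw pairs where B beats A.
p_b_gt_a = sum(bi > ai for ai, bi in zip(a, b)) / len(a)
```

Every summary is a sum, a sort, or a count over the same array of samples.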

Also Known As

  • Data science: MCMC, Gibbs sampling, Metropolis-Hastings, HMC, NUTS
  • Growth: Bayesian sampling, Stan/PyMC fitting
  • Marketing: probabilistic model inference
  • Engineering: stochastic posterior approximation

How It Works

You have a hierarchical model of conversion across 50 marketing channels with shrinkage priors to stabilize small-sample channels. The posterior has hundreds of parameters and no closed form. You fit in Stan or PyMC using the No-U-Turn Sampler (NUTS, a Hamiltonian MCMC variant). After 2,000 warmup iterations and 2,000 sampling iterations across 4 chains, you have 8,000 posterior samples for every parameter. Credible intervals, channel rankings, and probability-of-being-best queries all come from counting operations over those samples.
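In practice you would use NUTS through Stan or PyMC, but the core mechanism is easiest to see in the simplest MCMC variant, random-walk Metropolis. A toy sketch targeting a standard normal (illustration only, not something to use on real models):

```python
import math
import random

random.seed(1)

def log_target(x):
    # Log density of the target, up to an additive constant: standard normal.
    return -0.5 * x * x

def metropolis(n_samples, step=1.0, x0=0.0):
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)      # symmetric random-walk proposal
        log_accept = log_target(proposal) - log_target(x)
        if math.log(random.random()) < log_accept:  # Metropolis acceptance rule
            x = proposal
        samples.append(x)                           # record the current state either way
    return samples

draws = metropolis(20000)[5000:]                    # discard warmup iterations
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The chain wanders the parameter space, accepting moves toward higher density always and moves toward lower density with probability equal to the density ratio; the long-run distribution of visited states is the target. HMC and NUTS replace the blind random walk with gradient-informed trajectories, which is why they scale to hundreds of parameters.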

Modern MCMC (HMC, NUTS) explores high-dimensional posteriors efficiently. Older methods (random-walk Metropolis) mix slowly and often fail to converge on complex models.

Best Practices

  • Run multiple chains (4+) with different starting points; check R-hat < 1.01 for convergence.
  • Check effective sample size per parameter; below ~400 means your estimates are noisy.
  • Use weakly informative priors rather than flat priors; flat priors can cause sampling problems.
  • Inspect traceplots and sampler diagnostics for pathological behavior: divergent transitions, low-energy (E-BFMI) warnings, funnel geometries.
  • Use NUTS / HMC in Stan, PyMC, or NumPyro for most applied problems; skip hand-rolled Metropolis.
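The R-hat check compares between-chain and within-chain variance: if the chains disagree, R-hat rises above 1. A minimal sketch of the classic (non-split) Gelman-Rubin statistic, assuming equal-length chains (real libraries use the split-chain, rank-normalized version):

```python
import random

random.seed(2)

def rhat(chains):
    # chains: list of equal-length lists of draws for one parameter.
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance: how much the chain means disagree.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Within-chain variance: average sample variance inside each chain.
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n  # pooled variance estimate
    return (var_plus / w) ** 0.5

# Four well-mixed chains sampling the same distribution: R-hat should be ~1.
chains = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
r = rhat(chains)
```

Chains stuck in different regions inflate the between-chain term, pushing R-hat above the 1.01 threshold.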

Common Mistakes

  • Ignoring divergent transitions in HMC output. They indicate the sampler couldn't fully explore parts of the posterior, so estimates may be biased.
  • Under-warmup. Too few warmup iterations leave you sampling in a transient regime.
  • Interpreting autocorrelated samples as independent. Effective sample size matters, not raw sample count.
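The last point is quantifiable: for a chain with lag-1 autocorrelation rho, a rough approximation is ESS ≈ N(1-rho)/(1+rho). A sketch applying that formula to a simulated AR(1) chain (a simplification of the full autocorrelation-sum estimators that Stan and ArviZ actually use):

```python
import random

random.seed(3)

# Simulate an autocorrelated chain: AR(1) with coefficient rho.
rho, n = 0.9, 50000
x, chain = 0.0, []
for _ in range(n):
    x = rho * x + random.gauss(0.0, 1.0)
    chain.append(x)

mean = sum(chain) / n
lag1 = sum((chain[i] - mean) * (chain[i + 1] - mean) for i in range(n - 1))
var = sum((c - mean) ** 2 for c in chain)
rho_hat = lag1 / var                        # estimated lag-1 autocorrelation
ess = n * (1 - rho_hat) / (1 + rho_hat)     # AR(1) approximation to ESS
```

At rho = 0.9, 50,000 raw draws carry roughly the information of a few thousand independent ones, which is why effective sample size, not raw count, governs estimate precision.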

Industry Context

In SaaS/B2B, MCMC powers hierarchical models of cohort-level retention, plan-level churn, and multi-channel attribution where shrinkage is essential. In ecommerce, it fits media mix models and hierarchical merchandising models where products share information. In lead gen, MCMC fits multi-touch attribution models that require estimating latent decay parameters alongside channel effects.

The Behavioral Science Connection

MCMC makes thinking in full distributions practical. Point estimates feel certain; distributions are honest about uncertainty. The human tendency to collapse complexity into a single number is countered by the MCMC workflow, which rewards looking at full posteriors and understanding the shape of what we don't know.

Key Takeaway

MCMC is the engine room of modern Bayesian work. You may never write an MCMC sampler from scratch, but understanding what it does — and what can go wrong — is essential for any serious probabilistic modeling in experimentation.