When to Use Poisson vs. Negative Binomial Regression: A Practical Guide
Poisson regression models count data under the assumption that the mean equals the variance, while negative binomial regression handles overdispersion by adding a gamma‑distributed heterogeneity term. This article explains their assumptions and likelihoods, the interpretation of coefficients as incidence rate ratios, and how to choose between the two models, including when robust standard errors are needed.
Poisson Regression
When the dependent variable can only take non‑negative integer values (e.g., number of patents, Olympic gold medals, children, doctor visits), count data are often modeled with Poisson regression.
For individual \(i\), let the outcome \(Y_i\) follow a Poisson distribution with mean \(\lambda_i\). To ensure \(\lambda_i > 0\), the conditional mean (the "arrival rate") is specified as \(\lambda_i = \exp(X_i\beta)\), where \(X_i\) collects the explanatory variables. A defining property of the Poisson is that the variance equals the mean: \(\mathrm{Var}(Y_i \mid X_i) = \lambda_i\).
Assuming independent observations, the likelihood is \(L(\beta) = \prod_{i=1}^{n} \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}\) and the log‑likelihood is \(\ln L(\beta) = \sum_{i=1}^{n}\left(y_i X_i\beta - e^{X_i\beta} - \ln y_i!\right)\). Maximizing the log‑likelihood yields the first‑order conditions \(\sum_{i=1}^{n}\left(y_i - e^{X_i\beta}\right)X_i = 0\).
Under correct likelihood specification, the maximum‑likelihood estimator (MLE) is consistent. Even if the likelihood is misspecified, because the Poisson belongs to the exponential family, a correctly specified conditional mean still yields a consistent estimator, though standard errors must be robust (e.g., sandwich estimators).
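The estimator and its sandwich standard errors can be computed directly from the first‑order conditions. Below is a minimal numpy sketch on simulated data; the coefficient vector `beta_true` and the sample size are hypothetical choices for illustration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated count data: lambda_i = exp(X_i beta); beta_true is assumed.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the Poisson log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    lam = np.exp(X @ beta)
    score = X.T @ (y - lam)           # gradient = first-order conditions
    hess = -(X * lam[:, None]).T @ X  # Hessian (negative definite)
    beta -= np.linalg.solve(hess, score)

# Sandwich (robust) covariance A^{-1} B A^{-1}: valid even if the
# Poisson likelihood is wrong, as long as the conditional mean is right.
lam = np.exp(X @ beta)
A = (X * lam[:, None]).T @ X
B = (X * ((y - lam) ** 2)[:, None]).T @ X
cov_robust = np.linalg.inv(A) @ B @ np.linalg.inv(A)
se_robust = np.sqrt(np.diag(cov_robust))
print(beta, se_robust)
```

With equidispersed simulated data the robust and model-based standard errors are close; they diverge when the data are overdispersed.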
The coefficient \(\beta_j\) is not a marginal effect but a semi‑elasticity: a one‑unit increase in covariate \(x_j\) multiplies the expected count by \(\exp(\beta_j)\), i.e., a proportional change of \(\exp(\beta_j)-1\). The factor \(\exp(\beta_j)\) is commonly reported as the Incidence Rate Ratio (IRR).
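As a quick worked example with a hypothetical coefficient value:

```python
import numpy as np

beta_j = 0.25        # hypothetical estimated coefficient
irr = np.exp(beta_j)  # incidence rate ratio, approx. 1.284
pct = (irr - 1) * 100
# A one-unit increase in x_j raises the expected count by about 28.4%.
print(f"IRR = {irr:.3f}, proportional change = {pct:.1f}%")
```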
If observation periods or exposure levels \(t_i\) differ across units, the log of exposure can be entered as an offset, a regressor whose coefficient is fixed at 1, so the mean scales proportionally with exposure: \(\lambda_i = t_i \exp(X_i\beta)\).
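A small sketch of the offset mechanics, using hypothetical coefficients and follow‑up times (e.g., doctor visits observed over different numbers of years):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.2, 0.5])                # assumed coefficients
exposure = np.array([0.5, 1.0, 2.0, 4.0])  # years each unit was observed

# Mean scales with exposure: equivalent to adding log(exposure) to the
# linear index with its coefficient fixed at 1.
lam = exposure * np.exp(X @ beta)
print(lam)
```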
Negative Binomial Regression
The Poisson assumption of equal mean and variance (equidispersion) often fails; overdispersion occurs when the variance exceeds the mean. Adding an unobserved heterogeneity term to the log‑mean yields a negative binomial model.
Let the heterogeneity term follow a Gamma distribution; integrating it out leads to a negative binomial likelihood, which can be estimated by MLE. The resulting conditional mean remains \(\lambda_i = \exp(X_i\beta)\), while the conditional variance becomes \(\lambda_i + \alpha\lambda_i^2\), where \(\alpha\) captures overdispersion.
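The mixture can be checked by simulation: drawing gamma heterogeneity with mean 1 and variance \(\alpha\), then drawing Poisson counts, reproduces the negative binomial mean and variance. The values of \(\lambda\) and \(\alpha\) below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, alpha = 200_000, 3.0, 0.5  # assumed mean and overdispersion

# Gamma heterogeneity with mean 1, variance alpha (shape=1/alpha, scale=alpha).
nu = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y = rng.poisson(lam * nu)  # Poisson-gamma mixture = negative binomial

# Sample moments should be close to lam = 3 and lam + alpha*lam**2 = 7.5.
print(y.mean(), y.var())
```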
Two common parameterizations are NB1 (variance linear in the mean, \(\lambda_i + \alpha\lambda_i\)) and NB2 (variance quadratic in the mean, \(\lambda_i + \alpha\lambda_i^2\)). NB2 is the form implied by the gamma mixture, is the default in most software, and in practice often fits count data better. In either case, robust standard errors remain consistent as long as the conditional mean is correctly specified.
Model selection can be guided by a likelihood‑ratio test of the Poisson (\(H_0: \alpha = 0\)) against the negative binomial; because \(\alpha = 0\) lies on the boundary of the parameter space, the statistic should be compared to a 50:50 mixture of \(\chi^2_0\) and \(\chi^2_1\) rather than the usual \(\chi^2_1\). Directly assessing overdispersion (e.g., the Pearson statistic divided by its degrees of freedom) and out‑of‑sample predictive performance are also informative.
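A minimal sketch of the Pearson dispersion diagnostic. For simplicity it plugs in the true means where a real application would use fitted values \(\hat\lambda_i\) from a Poisson fit; the simulated data and gamma parameters are illustrative assumptions.

```python
import numpy as np

def pearson_dispersion(y, lam_hat, k):
    """Pearson statistic / degrees of freedom; values well above 1
    suggest overdispersion (y: counts, lam_hat: fitted means,
    k: number of estimated parameters)."""
    pearson = np.sum((y - lam_hat) ** 2 / lam_hat)
    return pearson / (len(y) - k)

rng = np.random.default_rng(3)
lam = np.full(10_000, 2.0)  # using true means in place of fitted values

y_equi = rng.poisson(lam)                                 # equidispersed
y_over = rng.poisson(lam * rng.gamma(2.0, 0.5, 10_000))   # overdispersed

d_equi = pearson_dispersion(y_equi, lam, 1)  # close to 1
d_over = pearson_dispersion(y_over, lam, 1)  # well above 1
print(d_equi, d_over)
```

A dispersion ratio near 1 is consistent with the Poisson; a ratio well above 1 points toward the negative binomial, or at minimum toward robust standard errors.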
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".