Estimating Overdispersion in Sparse Multinomial Data

Overdispersion arises when data are more variable than the fitted model predicts, and it is common when fitting Poisson or binomial models. Ignoring overdispersion can lead to misleading conclusions: standard errors are underestimated and overly complex models are selected. This work considers overdispersed multinomial data, which can arise in many research areas. Two approaches can be used to analyze such data: quasilikelihood, or explicit modelling of the overdispersion using, for example, a Dirichlet-multinomial or finite-mixture distribution. Quasilikelihood has the advantage of requiring specification of only the first two moments of the response variable. For sparse data, such as a contingency table with many low expected counts, using quasilikelihood to estimate the amount of overdispersion is particularly useful, because it may be difficult to obtain reliable estimates of the parameters in a Dirichlet-multinomial or finite-mixture model. Four estimators of the amount of overdispersion in sparse multinomial data are considered: their theoretical properties are discussed, and simulation results show their performance in terms of bias, variance and mean squared error.
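
As a rough illustration of the quasilikelihood idea, the sketch below assumes the common specification in which the multinomial covariance is inflated by a single factor, written here as phi (a symbol not used in the text above). It estimates phi by the familiar Pearson statistic divided by its residual degrees of freedom. This is a generic textbook estimator and is not necessarily one of the four estimators studied here. The function name, its arguments and the toy counts are all hypothetical.

```python
import numpy as np


def pearson_dispersion(counts, fitted_probs, n_params):
    """Pearson-based estimate of a scalar overdispersion factor phi.

    counts       : (n, k) array of observed multinomial counts, one row per unit
    fitted_probs : (n, k) array of fitted category probabilities (rows sum to 1)
    n_params     : number of free parameters estimated by the fitted model

    Assumes Var(Y_i) = phi * m_i * (diag(p_i) - p_i p_i^T), where m_i is the
    row total; this variance form is an assumption of the sketch.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(fitted_probs, dtype=float)

    # Expected counts under the fitted model: row total times fitted probability.
    totals = counts.sum(axis=1, keepdims=True)
    expected = totals * probs

    # Pearson chi-square statistic summed over all cells.
    pearson_x2 = np.sum((counts - expected) ** 2 / expected)

    # Residual degrees of freedom: n * (k - 1) free cells minus fitted parameters.
    n_rows, n_cats = counts.shape
    df = n_rows * (n_cats - 1) - n_params
    return float(pearson_x2 / df)


# Toy example (made-up counts): two units, two categories, and a common
# probability vector estimated from the pooled data (one free parameter).
toy_counts = np.array([[10, 0], [0, 10]])
pooled = toy_counts.sum(axis=0) / toy_counts.sum()
toy_probs = np.tile(pooled, (toy_counts.shape[0], 1))
phi_hat = pearson_dispersion(toy_counts, toy_probs, n_params=1)
print(phi_hat)  # 20.0, far above 1, the value expected without overdispersion
```

With many low expected counts, the individual terms of the Pearson statistic can be dominated by a few cells with tiny fitted values, which is one reason the behaviour of such estimators in sparse tables deserves study.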