Likelihood and Bayesian Methods

Introduction and Overview

Authored by: Michael G. Kenward, Geert Molenberghs, Geert Verbeke

Missing Data Methodology

Print publication date: November 2014
Online publication date: November 2014

Print ISBN: 9781439854617
eBook ISBN: 9781439854624

10.1201/9781439854624-6

 

3.1  Likelihood and Bayesian Inference and Ignorability

In Chapter 1, key concepts were set out that are relevant throughout this volume. First, missing data mechanisms were considered: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). Second, the choices of modeling framework for simultaneously modeling the outcomes and the missing-data mechanism were described: selection models, pattern-mixture models, and shared-parameter models. Third, the major routes of inference were reviewed: likelihood and Bayesian inference, methods based on inverse probability weighting, and multiple imputation. The latter classification also applies to Parts II, III, and IV of this volume.

In this part, we are concerned with the likelihood and Bayesian routes. A key concept, with both of these, is ignorability, as already discussed in Section 1.3.2. Likelihood and Bayesian inferences rest upon the specification of the full joint distribution of the outcomes and missing-data mechanism, regardless of which framework is chosen. However, the missing-data mechanism does not always need to be specified. To see this, consider the following. The full data likelihood contribution for unit i takes the form

$$L^{*}(\theta, \psi \mid X_i, Y_i, R_i) \propto f(Y_i, R_i \mid X_i, \theta, \psi).$$

Because inference is based on what is observed, the full data likelihood $L^{*}$ is replaced by the observed data likelihood $L$:

$$L(\theta, \psi \mid X_i, Y_i^{o}, R_i) \propto f(Y_i^{o}, R_i \mid X_i, \theta, \psi) \tag{3.1}$$

with

$$f(Y_i^{o}, R_i \mid X_i, \theta, \psi) = \int f(Y_i, R_i \mid X_i, \theta, \psi)\, dY_i^{m} = \int f(Y_i^{o}, Y_i^{m} \mid X_i, \theta)\, f(R_i \mid Y_i^{o}, Y_i^{m}, \psi)\, dY_i^{m}. \tag{3.2}$$

Under MAR, we obtain

$$f(Y_i^{o}, R_i \mid X_i, \theta, \psi) = \int f(Y_i^{o}, Y_i^{m} \mid X_i, \theta)\, f(R_i \mid Y_i^{o}, \psi)\, dY_i^{m} = f(Y_i^{o} \mid X_i, \theta)\, f(R_i \mid Y_i^{o}, \psi). \tag{3.3}$$
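
The step from (3.2) to (3.3) can be spelled out as follows. This is only a sketch of the intermediate algebra in the notation already introduced. It relies on the MAR assumption that the missingness factor does not depend on the unobserved outcomes, so that it is constant in the variable of integration and can be taken outside the integral:

```latex
\begin{aligned}
\int f(Y_i^{o}, Y_i^{m} \mid X_i, \theta)\, f(R_i \mid Y_i^{o}, \psi)\, dY_i^{m}
&= f(R_i \mid Y_i^{o}, \psi) \int f(Y_i^{o}, Y_i^{m} \mid X_i, \theta)\, dY_i^{m} \\
&= f(R_i \mid Y_i^{o}, \psi)\, f(Y_i^{o} \mid X_i, \theta).
\end{aligned}
```

The remaining integral is simply the marginal density of the observed outcomes.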

The likelihood then factors into two components. While this seems a trivial result at first sight, the crux is that the first factor depends on θ only, while the second one is a function of ψ only. Factoring also the left-hand side of (3.3), this can be written as:

$$f(Y_i^{o} \mid X_i, \theta, \psi)\, f(R_i \mid Y_i^{o}, \theta, \psi) = f(Y_i^{o} \mid X_i, \theta)\, f(R_i \mid Y_i^{o}, \psi).$$

Therefore, if, further, θ and ψ are disjoint (also termed variationally independent), in the sense that the parameter space of the full vector (θ′, ψ′)′ is the product of the parameter spaces of θ and ψ, then inference about θ can be based solely on the marginal observed data density. This is the so-called separability condition. A formal derivation is given in Rubin (1976). The practical implication is that an ignorable likelihood or ignorable Bayesian analysis is, essentially, computationally as simple as the corresponding analysis in a setting without missing data.
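
The practical meaning of ignorability can be illustrated with a small simulation. The sketch below is not taken from the chapter and rests on several assumptions made purely for illustration: a bivariate normal outcome (Y1, Y2) with zero means, Y1 always observed, and Y2 observed with a probability that follows a logistic model in Y1 only, which makes the mechanism MAR. The function names `simulate_mar` and `ignorable_ml_mean_y2` are hypothetical. The ignorable maximum likelihood estimate of the mean of Y2 uses the factorization f(Y1) f(Y2 | Y1) and never specifies the missingness model; it is compared with the complete-case mean.

```python
import numpy as np


def simulate_mar(n=20000, rho=0.6, seed=0):
    """Simulate a bivariate normal pair with MAR missingness in y2.

    Assumed setup (illustrative only): both outcomes have mean 0 and
    variance 1, with correlation rho. y2 is observed with probability
    expit(-1.5 * y1), which depends on the always-observed y1 only.
    """
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, 1.0, size=n)
    y2 = rho * y1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, size=n)
    p_obs = 1.0 / (1.0 + np.exp(1.5 * y1))  # equals expit(-1.5 * y1)
    r = rng.random(n) < p_obs               # True where y2 is observed
    return y1, y2, r


def ignorable_ml_mean_y2(y1, y2, r):
    """ML estimate of E[y2] from the observed-data likelihood alone.

    Regress y2 on y1 among units with y2 observed, then evaluate the
    fitted line at the mean of y1 over all units. The missingness
    model is never specified.
    """
    slope, intercept = np.polyfit(y1[r], y2[r], 1)
    return intercept + slope * y1.mean()


y1, y2, r = simulate_mar()
cc_mean = y2[r].mean()                      # complete-case estimate
ml_mean = ignorable_ml_mean_y2(y1, y2, r)   # ignorable ML estimate
print(f"complete-case mean of y2: {cc_mean:.3f}")
print(f"ignorable ML mean of y2:  {ml_mean:.3f}")
```

Under these assumptions the true mean of Y2 is zero. The complete-case mean is pulled away from zero because units with large Y1 are less likely to have Y2 observed, whereas the ignorable likelihood estimate stays close to zero even though the dropout model was never fitted.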

A few remarks apply. First, with a likelihood analysis, the observed information matrix should be used rather than the expected one (Kenward and Molenberghs 1998), even though the discrepancies are usually minor. Second, ignoring the missing-data mechanism assumes that there is no scientific interest attached to it. When this is untrue, the analyst can, in a straightforward way, fit appropriate models to the missing-data indicators. Third, regardless of the appeal of an ignorable analysis, NMAR can almost never be ruled out as a mechanism, and therefore one should also consider the possible impact of such mechanisms. In Chapters 4 and 5 such models are explored in the likelihood and Bayesian frameworks, respectively. Fourth, a particular NMAR model can never provide a definitive analysis, because of the necessary uncertainty about what is unobserved. Even when data are balanced by design, recording an incomplete version of them may induce imbalance. This, in turn, may lead to inferences that are more dependent on correctly specified modeling assumptions than is the case with balanced data. This is one of the main reasons why much research has been devoted to sensitivity analysis, the topic of Part V. The same issue has also stimulated important work in semi-parametric methods for missing data; these are studied in Part III. Fifth, while the interpretation of an MAR mechanism and ignorability is relatively straightforward in a monotone setting, this need not be the case in some non-monotone settings (Robins and Gill 1997; Molenberghs et al. 2008). In such settings, the predictive model for what is unobserved given what is observed will necessarily be pattern-specific; however, this is not an issue for all inferential targets. As stated above, should the need arise, an analysis in which the outcomes and the missing-data mechanism are jointly modeled is available (Chapters 4 and 5).
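
The second remark, that a model can be fitted to the missing-data indicators when the mechanism itself is of interest, can be sketched as follows. This is a minimal illustration under assumptions that are not part of the chapter: a single always-observed outcome `y1`, a logistic model for the probability that a later outcome is observed, and a hand-written Newton-Raphson fit. The function name `fit_logistic` and the parameter values in `true_psi` are hypothetical.

```python
import numpy as np


def fit_logistic(x, r, n_iter=25):
    """Newton-Raphson fit of logit P(R = 1 | x) = psi0 + psi1 * x.

    x: observed covariate or outcome; r: 0/1 missingness indicators
    (1 means the later outcome was observed).
    """
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    psi = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ psi)))    # fitted probabilities
        score = X.T @ (r - p)                   # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        psi = psi + np.linalg.solve(info, score)
    return psi


# Simulated data under an assumed MAR mechanism (illustrative values only).
rng = np.random.default_rng(1)
n = 20000
y1 = rng.normal(size=n)
true_psi = np.array([0.5, -1.0])
p_obs = 1.0 / (1.0 + np.exp(-(true_psi[0] + true_psi[1] * y1)))
r = (rng.random(n) < p_obs).astype(float)

psi_hat = fit_logistic(y1, r)
print("estimated psi:", np.round(psi_hat, 3))
```

Because ψ is fitted from the indicators and the observed outcomes alone, this step is separate from, and does not alter, the ignorable analysis of θ.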

3.2  Joint Models

The specific but rapidly growing field of joint models for longitudinal and time-to-event data is the topic of Chapter 6. Such models have strong connections with those that arise in the missing-data literature. For example, much of the work done in the area uses shared-parameter models, partly because a missingness process, especially when confined to dropout, can be seen as a discrete version of a survival process and the use of a latent process to link the two is particularly convenient. As will be seen in Chapter 6, joint modeling may place the main focus on the time-to-event process, the longitudinal process, or both. Of course, when the time-to-event process refers to dropout, either in continuous or in discrete time, inferences are usually, though not always, directed more towards the longitudinal process than to that describing dropout.

References

Kenward, M.G. and Molenberghs, G. (1998). Likelihood based frequentist inference when data are missing at random. Statistical Science 13, 236–247.
Molenberghs, G., Beunckens, C., Sotto, C., and Kenward, M.G. (2008). Every missing not at random model has got a missing at random counterpart with equal fit. Journal of the Royal Statistical Society, Series B 70, 371–388.
Robins, J.M. and Gill, R. (1997). Non-response models for the analysis of non-monotone ignorable missing data. Statistics in Medicine 16, 39–56.
Rubin, D.B. (1976). Inference and missing data. Biometrika 63, 581–592.