For statisticians working in the clinical trial field, the biggest challenge may not be statistical methodology. It may instead be communicating statistical concepts and terminology in plain language to non-statisticians (such as physicians, clinical team members, and corporate executives).
Missing data are very common in clinical trials, and the concept of missing data is easy to understand. However, the categories of missing data mechanisms (the taxonomy of missingness) are not. A formal taxonomy exists for classifying missing data mechanisms, including for longitudinal and event history data. The mechanisms are classified as MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random). As the definitions of MCAR, MAR, and MNAR below show, they are not easy for non-statisticians to understand.
For the dependent variable (conditional on the covariates in the model), if the probability of an observation being missing does not depend on observed or unobserved measurements, then the observation is Missing Completely At Random (MCAR).
In the case of MCAR, the missing data are unrelated to the study variables; thus, the participants with completely observed data are in effect a random sample of all the participants assigned a particular intervention. With MCAR, the random assignment of treatments is assumed to be preserved, but MCAR is usually an unrealistically strong assumption in practice.
Conditional on the covariates in the model, if the probability of an observation being missing depends only on observed measurements, then the observation is Missing At Random (MAR).
In the case of MAR, whether or not data are missing may depend on the values of the observed study variables. However, after conditioning on this information, whether or not data are missing does not depend on the values of the missing data.
When observations are neither MCAR nor MAR, they are classified as Missing Not At Random (MNAR); that is, the probability of an observation being missing depends on unobserved measurements. In this scenario, the unobserved responses depend on information not available for the analysis (that is, not on the previously observed values of the analysis variable or on the covariates being used), so the model cannot predict future observations without bias.
In the case of MNAR, whether or not data are missing depends on the values of the missing data.
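For readers comfortable with notation, the three definitions above can be written compactly. The symbols are introduced here for illustration and are not from the quoted definitions: R is a missingness indicator (R = 1 if the observation is missing), X the covariates in the model, Y_obs the observed and Y_mis the unobserved measurements of the dependent variable. Following the definitions above, every statement conditions on X.

```latex
\begin{aligned}
\text{MCAR:}\quad & P(R = 1 \mid X, Y_{\text{obs}}, Y_{\text{mis}}) = P(R = 1 \mid X) \\
\text{MAR:}\quad  & P(R = 1 \mid X, Y_{\text{obs}}, Y_{\text{mis}}) = P(R = 1 \mid X, Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R = 1 \mid X, Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } X \text{ and } Y_{\text{obs}}
\end{aligned}
```

Under this covariate-conditional form, missingness that depends only on a covariate would count as MCAR. The weight example below follows the other common convention, in which the covariate is treated as part of the observed data, so missingness that depends only on sex is called MAR.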
Thanks to Ziad Taib for the following example, which explains the three missingness mechanisms clearly and is easy for non-statisticians to understand.
Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure:
- There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing may have no relationship to X or Y. In this case, our data are missing completely at random (MCAR).
- One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X. Such data are missing at random (MAR).
- Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are missing not at random (MNAR).
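To make the weight example concrete, here is a minimal Python simulation sketch. All numbers (sample size, weight distributions by sex, and the missingness probabilities) are hypothetical assumptions chosen for illustration; they are not from any trial or from the original example. The sketch compares two simple estimates of mean weight: the complete-case mean, which ignores respondents with missing weight, and a sex-stratified mean, which averages observed weights within each sex and reweights by each sex's share of the full sample.

```python
import random
import statistics

# Hypothetical simulation parameters (illustration only, not trial data).
N = 100_000
SEED = 2024


def simulate_people(rng, n):
    """Draw (sex, weight) pairs; the weight distributions (kg) are hypothetical."""
    people = []
    for _ in range(n):
        sex = "F" if rng.random() < 0.5 else "M"
        weight = rng.gauss(65, 10) if sex == "F" else rng.gauss(80, 12)
        people.append((sex, weight))
    return people


# Each hypothetical rule gives the probability that a respondent withholds their weight.
MECHANISMS = {
    "MCAR": lambda sex, weight: 0.3,                          # unrelated to X or Y
    "MAR": lambda sex, weight: 0.5 if sex == "M" else 0.1,    # depends only on observed X
    "MNAR": lambda sex, weight: 0.6 if weight > 80 else 0.1,  # depends on Y itself
}


def estimate_mean(people, rng, prob_missing):
    """Apply a missingness rule; return (complete-case mean, sex-stratified mean)."""
    # Keep a weight with probability 1 - prob_missing(sex, weight).
    observed = [(s, w) for s, w in people if rng.random() >= prob_missing(s, w)]
    complete_case = statistics.fmean(w for _, w in observed)
    # Reweight each sex's observed mean by that sex's share of the full sample;
    # this is possible because sex (X) is never missing.
    stratified = 0.0
    for sex in ("F", "M"):
        share = sum(1 for s, _ in people if s == sex) / len(people)
        stratified += share * statistics.fmean(w for s, w in observed if s == sex)
    return complete_case, stratified


rng = random.Random(SEED)
people = simulate_people(rng, N)
true_mean = statistics.fmean(w for _, w in people)  # mean before any data go missing

results = {}
for name, rule in MECHANISMS.items():
    complete_case, stratified = estimate_mean(people, rng, rule)
    results[name] = {"complete_case": complete_case, "stratified": stratified}
    print(f"{name:<4}  full data: {true_mean:6.2f}  "
          f"complete case: {complete_case:6.2f}  stratified by sex: {stratified:6.2f}")
```

Under these assumptions, the complete-case mean stays close to the full-data mean under MCAR, falls below it under MAR (men, who are heavier in this simulation, withhold their weight more often), and falls below it under MNAR as well. Averaging within sex and reweighting corrects the MAR bias, because once sex is accounted for, missingness no longer depends on weight. It does not correct the MNAR bias, because heavier people within each sex are still under-represented among the observed values.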
Understanding the concept of missingness mechanisms is one thing; identifying the mechanism in practice is another story. In clinical trials, the reasons for missing data are often not collected, or are collected incompletely. Patients may not give the real reason for withdrawing from the study (discontinuing early). The Academy's suggestions below are reasonable; however, 'full and detailed documentation for each individual of the reasons for missing records or missing observations' is not the reality of current clinical trial practice.
"Reasons for missing data must be documented as much as possible. This includes full and detailed documentation for each individual of the reasons for missing records or missing observations. Knowing the reason for missingness permits formulation of sensible assumptions about observations that are missing, including whether those observations are well defined. Missing data in clinical trials can seriously undermine the benefits provided by randomization into control and treatment groups. Two approaches to the problem are to reduce the frequency of missing data in the first place and to use appropriate statistical techniques that account for the missing data. The former approach is preferred, since the choice of statistical method requires unverifiable assumptions concerning the mechanism that causes the missing data, and so always involves some degree of subjectivity."