| We aim to develop a general theory of statistical inference under misspecification, for use when all models under consideration are wrong (i.e., misspecified), yet some are useful. In practice, wrong-yet-useful models are employed all the time: we often pretend that nonlinear relationships between variables are linear, that dependent variables are independent, or that measurement error or noise is normally distributed even though it is not. Besides such misspecification of the data-generating machinery, we also target sampling plan misspecification, which arises when our assumptions about how the data are gathered or sampled are incorrect. A key novel insight is that these two types of misspecification, while usually viewed as intrinsically different, can be given a unified treatment by employing safe distributions: probability distributions accompanied by a specification of which aspects of a domain they can predict well. Using safe distributions, we plan to develop statistical methods that perform near-optimally whether the model under consideration is entirely correct, 'almost' correct (as in nonparametric settings), or entirely incorrect, without knowing in advance which of these situations applies. In practice, such methods would unify and generalize both Bayesian and worst-case approaches to statistical learning, and in many cases considerably outperform both, allowing us to do more with less data. Such a unification is a 'holy grail' of the field of statistical learning, with applications in classification and regression problems such as automated object or character recognition and time series prediction. 
But the same concept of 'safe distributions' also leads to improved, more robust versions of null hypothesis testing, the standard method of statistical inference in, for example, the medical sciences and experimental psychology. Relatedly, it yields new insights into the use of statistics in court cases, shedding light on controversial questions such as 'does it sometimes make sense to ignore part of the data?' |