KNAW

Research

Safe Statistics



Title: Safe Statistics
Period: 02/2010 - unknown
Status: Current
Research number: OND1339539
Data supplier: NWO

Abstract

We aim to develop a general theory of statistical inference under misspecification, to be used when all models under consideration are wrong (i.e. misspecified), yet some are useful. In practice, wrong-yet-useful models are employed all the time: we often pretend that nonlinear relationships are linear, that dependent variables are independent, or that measurement error or noise is normally distributed even though it isn't. Besides such misspecification of the data-generating machinery, we also target sampling plan misspecification, which arises when our assumptions about how the data are gathered or sampled are incorrect. A key novel insight is that these two types of misspecification, while usually viewed as intrinsically different, can be given a unified treatment by employing safe distributions. These are probability distributions accompanied by a specification of which aspects of a domain they can predict well. Using safe distributions, we plan to develop statistical methods that perform near-optimally whether the model under consideration is entirely correct, 'almost' correct (as in nonparametric settings) or entirely incorrect, without knowing in advance which of these situations holds. In practice, such methods would unify and generalize both Bayesian and worst-case approaches to statistical learning, and in many cases considerably outperform them both, allowing us to do more with less data. Such a unification is a 'holy grail' of the field of statistical learning, with applications in classification and regression problems such as automated object or character recognition and time series prediction.
But the same concept of 'safe distributions' also leads to improved, robustified versions of null hypothesis testing, the standard statistical method for inference in, for example, the medical sciences and experimental psychology; and, relatedly, to new insights on the use of statistics in court cases, shedding light on controversial issues such as 'does it sometimes make sense to ignore part of the data?'

Abstract (translated from Dutch)

Scientists make extensive use of models that are practically useful but clearly wrong. Nonlinear relationships, for example, are modelled as linear, or dependent variables as independent, as in DNA sequence analysis. Existing statistical methods, however, assume that the models are correct. The researchers are developing new methods that do not make this assumption. This allows us to do more with less data.

Related people

Project leader: Prof. dr. P.D. Grünwald
