In most databases, one can identify small partitions of the data where the observed behaviour differs notably from that of the database as a whole. A well-known approach to finding such exceptional subgroups, Subgroup Discovery, interprets 'behaviour' as the distribution of a single nominal variable. For example, one may find that the incidence of lung cancer (the variable) is higher among smokers (the subgroup) than in the population as a whole. This project proposes a thorough investigation of a framework called Exceptional Model Mining (EMM), which generalises Subgroup Discovery by allowing the target concept to be a model over a set of attributes, rather than a single nominal attribute. Such a model captures the multivariate dependencies between the target attributes, and specific properties of the model can serve as a measure of how exceptional the subgroup is. The EMM process searches through a space of candidate subgroups, repeatedly building a model for the subset of data covered by the candidate at hand. Finally, it reports the subgroup that optimises the measure of exceptionality. Because all manner of model classes can be inserted into this process, one can think of the EMM framework as a generalisation of Subgroup Discovery that embeds any of the existing Data Mining paradigms (Classification, Regression, Graphical Modelling, ...). In this project, we will investigate possible instances of EMM and propose appropriate quality measures. By studying the fundamental behaviour of these quality measures and proving their key properties, we can design efficient and effective implementations of EMM.
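To make the search process described above concrete, the following is a minimal sketch of one possible EMM instance, not the project's implementation. It assumes a correlation model over two numeric target attributes, candidate subgroups defined by a single `attribute = value` condition, and a quality measure equal to the absolute difference between the subgroup's correlation and that of the whole dataset. All names (`fit_model`, `quality`, `emm_search`) and the toy housing data are hypothetical and chosen for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_model(rows, targets):
    """Assumed model class: the correlation between the two target attributes."""
    x, y = targets
    return pearson([r[x] for r in rows], [r[y] for r in rows])


def quality(subgroup_model, global_model):
    """Assumed quality measure: how far the subgroup model is from the global one."""
    return abs(subgroup_model - global_model)


def emm_search(rows, descriptors, targets, min_size=3):
    """Exhaustively score every single-condition subgroup and return the best."""
    global_model = fit_model(rows, targets)
    best_description, best_quality = None, 0.0
    # Candidate space: one (attribute, value) condition per distinct value.
    candidates = sorted({(a, r[a]) for r in rows for a in descriptors})
    for attr, value in candidates:
        subset = [r for r in rows if r[attr] == value]
        if len(subset) < min_size:  # skip subgroups too small to model
            continue
        q = quality(fit_model(subset, targets), global_model)
        if q > best_quality:
            best_description, best_quality = (attr, value), q
    return best_description, best_quality


# Made-up toy data: price rises with size in the north, falls in the south.
north = [{"region": "north", "garden": i % 2 == 1, "size": i, "price": i}
         for i in range(1, 7)]
south = [{"region": "south", "garden": i % 2 == 1, "size": i, "price": 4 - i}
         for i in range(1, 4)]
rows = north + south

best_description, best_quality = emm_search(
    rows, descriptors=["region", "garden"], targets=("size", "price"))
print(best_description, round(best_quality, 3))
```

On this toy data the search singles out the `region = south` subgroup, where the size/price correlation is reversed relative to the dataset as a whole. Swapping `fit_model` and `quality` for another model class (a classifier, a regression model, a graphical model) leaves the search loop unchanged, which is the sense in which the framework is generic. A realistic implementation would also search over conjunctions of conditions rather than single conditions.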