# Machine Translation When Exact Pattern Match Fails

Summary (EN)

Translating between languages is a rare expertise acquired through training and experience. Apart from *ambiguity*, translation is complicated by the *divergence* among languages in morphology, word order, and the idiomatic ways of expressing concepts. The field of Machine Translation (MT) is concerned with building effective models of automatic translation. With *English* in mind as the prototypical language, current state-of-the-art statistical MT models employ a "translation dictionary/table" that consists of fixed pattern pairs (called phrase pairs) harvested from a bilingual corpus of example translations (a parallel corpus). A phrase pair is used during translation only if its source side *exactly matches* a contiguous subsequence of the input. However, by assuming English as the prototype language, *a major challenge is pushed aside*: the notorious *intra-language variation* in morphological forms and the freer word order typical of many languages, e.g., Polish, Greek, Arabic, and Finnish. This proposal addresses machine translation from and into languages with substantial morpho-syntactic variation. Instead of a mere dictionary, it proposes a model that works with a probabilistic synchronous grammar over morpho-syntactic representations extracted from a parallel corpus. In addition to the pattern pairs found in the training data, this grammar can generate morphological and syntactic variants that are currently unavailable to state-of-the-art models. The probability estimates over variants are such that the variants that deviate least are preferred. Instead of translating by mere exact match, our model translates by the most likely variant.
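The contrast between exact matching and variant-based matching can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-entry phrase table, the function names `exact_match` and `best_variant`, and the use of character-level string similarity as a stand-in for "least deviating". It is not the probabilistic synchronous grammar described above, which would also generate variants on the target side.

```python
from difflib import SequenceMatcher

# Hypothetical toy phrase table: source tokens -> target phrase.
# The entries are invented examples, not data from any real corpus.
PHRASE_TABLE = {
    ("nowy", "dom"): "new house",
    ("stary", "most"): "old bridge",
}


def exact_match(tokens):
    """Return a translation only if the tokens exactly match a source side."""
    return PHRASE_TABLE.get(tuple(tokens))


def best_variant(tokens):
    """Return (target, score) for the stored source side closest to the input.

    The score is a character-level similarity in [0, 1]; it is a crude
    stand-in for a probability over morpho-syntactic variants.
    """
    query = " ".join(tokens)
    scored = [
        (SequenceMatcher(None, query, " ".join(src)).ratio(), tgt)
        for src, tgt in PHRASE_TABLE.items()
    ]
    score, target = max(scored)
    return target, score


# An inflected form of a stored phrase is invisible to exact matching,
# but the similarity-based lookup still retrieves the closest stored pair.
inflected = ["nowego", "domu"]
print(exact_match(inflected))
print(best_variant(inflected))
```

In this sketch the inflected input finds no exact match, while the fallback retrieves the nearest stored phrase pair. Note that the toy returns the stored target side unchanged; adapting the target to the variant is exactly what a grammar over morpho-syntactic representations would add.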

