KNAW

Research

Machine Translation When Exact Pattern Match Fails

Title Machine Translation When Exact Pattern Match Fails
Duration 09/2010 - unknown
Status Completed
Research number OND1339122
Data supplier NWO

Summary (EN)

Translating between languages is a rare expertise acquired through training and experience. Apart from ambiguity, translation is complicated by the divergence among languages in morphology, word order, and the idiomatic ways of expressing concepts. The field of Machine Translation (MT) is concerned with building effective models of automatic translation. With English in mind as the prototypical language, current state-of-the-art statistical MT models employ a "translation dictionary/table" that consists of fixed pattern pairs (called phrase pairs), harvested from a bilingual corpus of example translations (a parallel corpus). A phrase pair is used during translation only if its source side exactly matches a contiguous subsequence of the input. However, by assuming English as the prototype language, a major challenge is set aside: the notorious intra-language variation in morphological forms and the freer word order that is typical of many languages, e.g., Polish, Greek, Arabic, and Finnish. This proposal addresses machine translation from and into languages with substantial morpho-syntactic variation. Instead of a mere dictionary, it proposes a model that works with a probabilistic synchronous grammar over morpho-syntactic representations extracted from a parallel corpus. In addition to the pattern pairs found in the training data, this grammar can generate morphological and syntactic variants that are currently not available to state-of-the-art models. The probability estimates over variants are such that the least-deviating variants are preferred. Instead of translating by mere exact match, our model translates by the most likely variant.
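
The contrast between exact matching and preferring the least-deviating variant can be illustrated with a small Python sketch. It is not the project's model: the proposal describes a probabilistic synchronous grammar over morpho-syntactic representations, whereas this toy only ranks stored source patterns by character edit distance. The phrase table, the function names, and the `sharpness` parameter are all assumptions made for illustration. The Finnish entries are glossed as talossa "in the house", talosta "from the house", and kadulla "on the street".

```python
import math

# Hypothetical toy phrase table (source pattern -> target pattern); not project data.
PHRASE_TABLE = {
    "talossa": "in the house",
    "talosta": "from the house",
    "kadulla": "on the street",
}

def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]

def exact_lookup(source, table):
    """Exact pattern match: succeeds only if the source is stored verbatim."""
    return table.get(source)

def variant_distribution(source, table, sharpness=1.0):
    """Give each stored pattern a probability that decays with its deviation
    (edit distance) from the input, so the least-deviating pattern scores highest."""
    weights = {s: math.exp(-sharpness * edit_distance(source, s)) for s in table}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def translate_by_variant(source, table):
    """Return the most probable stored pattern, its target side, and its probability."""
    dist = variant_distribution(source, table)
    best = max(dist, key=dist.get)
    return best, table[best], dist[best]

if __name__ == "__main__":
    unseen = "taloissa"  # plural form, absent from the toy table
    print(exact_lookup(unseen, PHRASE_TABLE))          # exact match fails
    print(translate_by_variant(unseen, PHRASE_TABLE))  # falls back to the closest pattern
```

For the unseen plural form, the exact lookup finds nothing, while the variant search falls back on the closest stored pattern. Note the limitation: the sketch reuses that pattern's target side unchanged, so the plural is lost. It does not generate new variants, which is the capability the summary attributes to the proposed grammar.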

Organisations involved

People involved

Project leader Dr. K. Sima'an
