| Statistical Machine Translation models are trained on large parallel corpora of sentences and their translations. Current models collect huge tables of strings of arbitrary length together with their translation equivalents, and estimate the statistics in these tables using heuristic methods. Mainly because of the heuristic estimator, obtaining good translation results often requires tedious tuning. In this project we explore an alternative, provably consistent statistical estimator developed within our group, aiming at a cleaner and more robust translation model. Under our model and estimator the training data is incomplete, which necessitates training with latent variables. A simplified yet effective version of this estimator works with held-out Expectation-Maximization. This kind of training demands large amounts of RAM and computing time. The project is conducted as a thesis requirement for the research master in Artificial Intelligence, track Natural Language Processing and Learning. |