| The project addresses two central problems: (i) the lack of large, rich, formalized lexicons of multi-word expressions (MWEs) for use in NLP; and (ii) the lack of adequate methods and tools for extending the MWE lexicon of an NLP system from a text corpus in a maximally automated manner. The project therefore aims to develop innovative methods and tools for the automatic identification and lexical representation of multi-word expressions. Alongside this, a corpus-based lexical database of 5,000 multi-word expression entries for Dutch will be developed. The database will be externally validated, and its usability will be evaluated in two independent NLP systems for Dutch. The project contributes to the development of electronic lexicons, in particular for Dutch, and the MWE database fills a gap in existing lexical resources for Dutch. The project carries out strategic research into generic methods and tools for MWE identification and lexical representation. Although the research focuses on Dutch, the tools will be largely language-independent and can be applied to other languages and new domains, also beyond this project. In this way the project contributes directly to strengthening the digital infrastructure for Dutch. |