Identification and Representation of Multi-word Expressions (IRME)



Title Identification and Representation of Multi-word Expressions (IRME)
Period 04 / 2005 - unknown
Status Completed
Research number OND1306160
Data Supplier Website Nederlandse Taalunie


The project addresses two central problems: (i) the lack of large, rich, formalized lexicons of multi-word expressions (MWEs) for use in NLP; and (ii) the lack of adequate methods and tools for extending the MWE lexicon of an NLP system from a text corpus in a maximally automated manner. The project therefore aims to develop innovative methods and tools for the automatic identification and lexical representation of multi-word expressions. In parallel, a 5,000-entry, corpus-based lexical database of Dutch multi-word expressions will be developed. The database will be externally validated, and its usability will be evaluated in two independent NLP systems for Dutch. The project contributes to the development of electronic lexicons, in particular for Dutch, and the MWE database fills a gap in existing lexical resources for Dutch. The project carries out strategic research into generic methods and tools for MWE identification and lexical representation. Although the focus is on Dutch, these tools will be largely language-independent and can be reused for other languages and new domains beyond this project. In this way the project contributes directly to strengthening the digital infrastructure for Dutch.

Related organisations

Other involved organisations

Van Dale Lexicografie BV, Utrecht: Dr. A. Schenk
ScanSoft Belgium BVBA, Merelbeke, Belgium

Related people

Related research (upper level)


D16400 Information systems, databases
D36300 Germanic language and literature studies
