Identification and Representation of Multi-word Expressions (IRME)



Title Identificatie en representatie van meer-woord uitdrukkingen (IRME)
Duration 04/2005 - unknown
Status Completed
Research number OND1306160
Data provider Nederlandse Taalunie website

Summary (EN)

The project addresses two central problems: (i) the lack of large, rich, formalized lexicons of multi-word expressions (MWEs) for use in NLP; and (ii) the lack of proper methods and tools for extending the lexicon of an NLP system with multi-word expressions from a text corpus in a maximally automated manner. The project therefore aims to develop innovative methods and tools for the automatic identification and lexical representation of multi-word expressions. Alongside this, a corpus-based lexical database of 5,000 Dutch multi-word expressions will be developed. The database will be externally validated, and its usability will be evaluated in two independent NLP systems for Dutch. The project contributes to the development of electronic lexicons, in particular for Dutch, and the MWE database fills a gap in existing lexical resources for the language. The project carries out strategic research into generic methods and tools for MWE identification and lexical representation. Although the focus is on Dutch, these tools will be largely language-independent and can be used for other languages and new domains, also beyond this project. In this way the project contributes directly to strengthening the digital infrastructure for Dutch.

Organisations involved

Other organisations involved

Van Dale Lexicografie BV, Utrecht: Dr. A. Schenk
ScanSoft Belgium BVBA, Merelbeke, Belgium

People involved

Parent research activity(ies)


D16400 Information systems, databases
D36300 Germanic language and literature studies
