KNAW

Research

Development and evaluation of language resources and tools

Title: Development and evaluation of language resources and tools
Period: 01/2006 - unknown
Status: Completed
Research number: OND1319925
Data supplier: Website CLS

Abstract

Each of the three other sub-programmes crucially depends on the availability of large corpora of spoken and written language, as well as on suitable tools for processing the corpus data. In the past, this sub-programme was heavily involved in the creation and maintenance of the Spoken Dutch Corpus, and the accumulated expertise in corpus building is now being applied to the development of further corpora. We intend to strengthen our position in the STEVIN Programme by taking a leading role in the creation of a very large and richly annotated corpus of contemporary written Dutch. In addition, we want to play a pivotal role in enlarging the Spoken Dutch Corpus and in building processing tools for spoken language, such as tools for orthographic and automatic phonetic transcription.

In parallel, this sub-programme is developing a range of corpus-based tools for novel and scientifically interesting applications of language technology, for example stylometry, authorship attribution, and the profiling of socio-geographic language variation. This will allow us to extend the domain in which our expertise in language resources and language technology tools and techniques can be deployed to other fields of inquiry, such as literary and historical studies.

During the last decade, the question of how applications of language and speech technology can be evaluated has become ever more urgent. We will respond by developing suitable evaluation procedures for the applications mentioned in the previous paragraph, as well as for language technology components in more conventional applications such as question answering.

In the coming years we will intensify our already close working relations with the MPI, and especially the Technical Support Group, in building an e-Humanities research infrastructure dedicated to the needs of linguistics and literary and cultural studies. For the development of tools for speech processing we will collaborate closely with the Katholieke Universiteit Leuven.

Classification

D36200 Germanic language and literature studies
