KNAW

Dutch Language Corpus Initiative (D-coi)

Title: Dutch Language Corpus Initiative (D-coi)
Period: 04/2005 - unknown
Status: Completed
URL: http://hmi.ewi.utwente.nl/project/STEVIN
Research number: OND1306158
Data Supplier: Website Nederlandse Taalunie

Abstract

The project is a preparatory project that aims to produce a blueprint for the construction of a 500-million-word corpus of contemporary written Dutch. This entails designing the corpus and developing (or adapting) the protocols, procedures and tools needed for sampling data, cleaning up, converting file formats, marking up, annotating, post-editing, and validating the data. To support these developments, a 50-million-word pilot corpus will be compiled, parts of which will be enriched with linguistic annotations. The pilot corpus is intended to demonstrate the feasibility of the approach. It will provide a testing ground for obtaining feedback on the adequacy and practicability of various annotation schemes and procedures, and on how successfully the tools can be applied. Moreover, it will serve to establish the usefulness of this type of resource and its annotations for different types of human language technology (HLT) research and for the development of applications. The Danish Center for Sprogteknologi (CST) will evaluate the protocols and procedures. At the end of the project, the pilot corpus, together with all other results obtained within the project, will be made available through the Flemish-Dutch HLT Agency (TST-centrale).

Related organisations

Other involved organisations

Katholieke Universiteit Leuven, Centrum voor Computerlinguïstiek, Leuven, Belgium: Prof. dr. F. van Eynde
Universiteit Antwerpen, Centrum voor Nederlandse Taal en Spraak (CNTS), Antwerpen, Belgium
Polderland Speech & Technology BV, Nijmegen: Drs. Th. van den Heuvel

Classification

A90000 Fundamental research
D16400 Information systems, databases
D36200 Germanic language and literature studies
