KNAW

Research

Corpus of the Dutch Language (D-coi)

Title Corpus of the Dutch Language (D-coi) (original: Corpus van de Nederlandse taal)
Duration 04 / 2005 - unknown
Status Completed
URL http://hmi.ewi.utwente.nl/project/STEVIN
Research number OND1306158
Data supplier Website of the Nederlandse Taalunie

Summary

The project can be characterized as a preparatory project and aims to produce a blueprint for the construction of a 500-million-word corpus of contemporary written Dutch. This will entail the design of the corpus and the development (or adaptation) of the protocols, procedures and tools needed for sampling data, cleaning up, converting file formats, marking up, annotating, post-editing, and validating the data. In order to support these developments, a 50-million-word pilot corpus will be compiled, parts of which will be enriched with linguistic annotations. The pilot corpus is intended to demonstrate the feasibility of the approach. It will provide the necessary testing ground on the basis of which feedback can be obtained about the adequacy and practicability of various annotation schemes and procedures, and the level of success with which tools can be applied. Moreover, it will serve to establish the usefulness of this type of resource and its annotations for different types of HLT research and the development of applications. The Danish Center for Sprogteknologi (CST) will undertake the evaluation of the protocols and procedures. At the end of the project, the pilot corpus together with all other results obtained within the project will be made available through the Flemish-Dutch HLT Agency (TST-centrale).

Organisations involved

Other organisations involved

Katholieke Universiteit Leuven, Centrum voor Computerlinguïstiek, Leuven, Belgium: Prof.dr. F. van Eynde
Universiteit Antwerpen, Centrum voor Nederlandse Taal en Spraak - CNTS, Antwerp, Belgium
Polderland Speech & Technology BV, Nijmegen: Drs. Th. van den Heuvel

Classification

A90000 Pure scientific research
D16400 Information systems, databases
D36200 Germanic language and literature studies