KNAW

Research




Title Extension of the Spoken Dutch Corpus (CGN) with speech of children, non-natives, elderly and human-machine interaction (JASMIN-CGN)
Period 04 / 2005 - unknown
Status Current
URL http://taalunieversum.org/taal/technologie/stevin/projectportfolio_1ste_stevin_oproep.pdf
Research number OND1306159
Data Supplier Website Nederlandse Taalunie

Abstract

Large speech corpora (LSC) are an indispensable resource for research in speech processing and for developing real-life speech applications. In 2004 the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN) became available, which constitutes a plausible sample of standard Dutch as spoken by adult natives in the Netherlands and Flanders. Owing to budget constraints, CGN does not include speech of children, non-natives or elderly people, nor recordings of speech produced in human-machine interaction. Because such recordings would be extremely useful for research and for developing human language technology (HLT) applications for these specific groups of Dutch speakers, the present proposal aims to extend CGN along three dimensions. First, by collecting a corpus of contemporary Dutch as spoken by children of different age groups, non-natives with different mother tongues and elderly people in the Netherlands and Flanders (JASMIN-CGN), we aim to extend the corpus along the age and mother-tongue dimensions. Second, we intend to collect speech material in a communication setting that was not envisaged in CGN: human-machine interaction. To this end, part of the speech material from the three speaker groups will be collected in a human-machine communication setting. We expect that the knowledge gathered from these data can be generalized to the development of appropriate systems for other speaker groups as well (e.g. adult natives). One third of the data will be collected in Flanders and two thirds in the Netherlands.

Related organisations

Other involved organisations

Katholieke Universiteit Leuven, ESAT/PSI Speech Group, Leuven, Belgium: Prof.dr. H. Vanhamme
TalkingHome, Enschede: Dr.ir. F.M.A. Smits

Related people

Project leader Dr. C. Cucchiarini

Related research (upper level)

Classification

A31100 ICT equipment
D16500 User interfaces, multimedia
D36200 Germanic language and literature studies
