| Summarization of text is often a necessity when searching for and selecting information in document repositories. However, summarization technology is largely restricted to the extraction of whole sentences. Current research focuses on creating compressions of salient sentences that convey the important content of those sentences. Such compressions can be generated by removing non-salient words, based on characteristics of parse trees, such as the dependency trees produced by the Dutch Alpino parser and grammar. Compression techniques can be developed (or adapted) to yield abstract dependency structures. This project aims to build a sentence generation module that produces actual grammatical sentences from such abstract representations, using the declarative grammar of Alpino as its key knowledge source. The wide-coverage Alpino grammar for Dutch will guide the generation process, so that syntactic constraints on word order, agreement and subcategorisation are properly taken into account. Although the Alpino grammar can be used to ensure that the generated sentences are well-formed, a fluency module will be developed to ensure that they are also natural and appropriate. Just as parsing needs a (statistical) disambiguation component to select the appropriate parse from a potentially large set of possible parses, generation needs a fluency component to select the most appropriate sentence from the set of possible sentences produced by the generator. For the fluency component, this project aims to develop a machine-learning method similar in approach to the disambiguation component of the Alpino parser, which contains a discriminative maximum-entropy model trained on the Alpino treebank. A similar discriminative maximum-entropy model could be developed for the statistical ranking of competing surface realizations of the same content. |
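To make the idea of a discriminative maximum-entropy fluency ranker concrete, here is a minimal sketch. It is not the Alpino implementation. It assumes that each candidate realization is a plain string, uses a hypothetical feature set (word bigram counts, whereas a real fluency model would use much richer features), and the function names `features`, `probabilities`, `train` and `best_realization` are illustrative only. The model is a conditional log-linear model normalized over the candidate realizations of the same input, trained by gradient ascent on the log-likelihood of the preferred realization with an L2 (Gaussian) penalty on the weights.

```python
import math
from collections import Counter


def features(sentence):
    """Hypothetical feature extractor: counts of word bigrams, with sentence boundary markers."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(words, words[1:]))


def score(weights, feats):
    """Linear score: dot product of feature weights and feature counts."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def probabilities(weights, candidates):
    """Maximum-entropy distribution over competing realizations of the same input."""
    scores = [score(weights, features(c)) for c in candidates]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(data, epochs=20, lr=0.5, l2=0.01):
    """Gradient ascent on the conditional log-likelihood of the preferred realization.

    data: list of (candidates, gold_index) pairs, where candidates are all
    realizations produced for one input and gold_index marks the preferred one.
    """
    weights = {}
    for _ in range(epochs):
        for candidates, gold in data:
            feats = [features(c) for c in candidates]
            probs = probabilities(weights, candidates)
            # Gradient = observed features of the gold candidate minus expected features.
            grad = Counter(feats[gold])
            for p, f in zip(probs, feats):
                for k, v in f.items():
                    grad[k] -= p * v
            # Update every weight, shrinking it towards zero via the L2 penalty.
            for k in set(weights) | set(grad):
                w = weights.get(k, 0.0)
                weights[k] = w + lr * (grad[k] - l2 * w)
    return weights


def best_realization(weights, candidates):
    """Fluency ranking: return the highest-scoring candidate realization."""
    return max(candidates, key=lambda c: score(weights, features(c)))


if __name__ == "__main__":
    # Toy training item: three orderings of the same content, the first preferred.
    train_data = [(["de kat slaapt", "slaapt de kat", "kat de slaapt"], 0)]
    w = train(train_data)
    print(best_realization(w, ["de hond slaapt", "slaapt de hond", "hond de slaapt"]))
```

In this sketch the generator is assumed to have already produced the candidate set, so the ranker only has to compare realizations of the same content. This mirrors how a parse disambiguation model compares parses of the same sentence. Normalizing over the candidate set, rather than over all possible strings, is what makes the model discriminative.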