Jan De Belder | Researchain

Publication

Featured research published by Jan De Belder.

Computer Speech & Language | 2012

The latent words language model

Koen Deschacht; Jan De Belder; Marie-Francine Moens

We present a new generative model of natural language, the latent words language model. This model uses a latent variable for every word in a text that represents synonyms or related words in the given context. We develop novel methods to train this model and to find the expected value of these latent variables for a given unseen text. The learned word similarities help to reduce the sparseness problems of traditional n-gram language models. We show that the model significantly outperforms interpolated Kneser-Ney smoothing and class-based language models on three different corpora. Furthermore, the latent variables are useful features for information extraction. We show that for both semantic role labeling and word sense disambiguation, the performance of a supervised classifier increases when these variables are incorporated as extra features. This improvement is especially large when only a small annotated corpus is used for training.

International Conference on Computational Linguistics | 2010

Integer linear programming for Dutch sentence compression

Jan De Belder; Marie-Francine Moens

Sentence compression is a valuable task in the framework of text summarization. In this paper we compress sentences from Dutch-language news articles published in Dutch and Flemish newspapers, using an integer linear programming approach. We rely on the Alpino parser available for Dutch and on the Latent Words Language Model. We demonstrate that the integer linear programming approach yields good results for compressing Dutch sentences, despite the large freedom in word order.
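
As a rough illustration of how sentence compression can be cast as a 0/1 integer program, the sketch below assigns one binary keep/drop variable to each word, maximizes a linear score, and enforces two linear constraints: a length limit and "a word may only be kept if its syntactic head is kept". It is not the formulation from the paper. The function name `compress`, the importance weights, and the dependency heads are hypothetical, and the program is solved by exhaustive enumeration rather than by an ILP solver, which is only practical for very short sentences.

```python
from itertools import product

def compress(words, scores, heads, max_len):
    """Pick the 0/1 keep-vector with the highest total score under linear constraints."""
    best, best_score = None, float("-inf")
    for keep in product((0, 1), repeat=len(words)):
        # Length constraint: sum(keep) <= max_len
        if sum(keep) > max_len:
            continue
        # Dependency constraint: keep[i] <= keep[heads[i]] for every non-root word
        if any(k and h is not None and not keep[h] for k, h in zip(keep, heads)):
            continue
        total = sum(s * k for s, k in zip(scores, keep))
        if total > best_score:
            best, best_score = keep, total
    return [w for w, k in zip(words, best) if k]

# Hypothetical example sentence, word weights, and dependency heads (None marks the root).
words = ["De", "minister", "heeft", "gisteren", "het", "voorstel", "verworpen"]
scores = [0.2, 1.0, 0.8, 0.1, 0.2, 1.0, 1.2]
heads = [1, 6, 6, 6, 5, 6, None]

print(" ".join(compress(words, scores, heads, max_len=6)))
```

In a realistic system the weights would come from a language model and the heads from a parser, and the enumeration would be replaced by a dedicated solver.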

Essential Speech and Language Technology for Dutch | 2013

Question Answering of Informative Web Pages: How Summarisation Technology Helps

Jan De Belder; Daniël de Kok; Gertjan van Noord; Fabrice Nauze; Leonoor van der Beek; Marie-Francine Moens

During the DAISY project we have developed essential technology for automatic summarisation of Dutch informative web pages. The project especially focuses on paraphrasing and compression of Dutch sentences, and on the rhetorical classification of content blocks and sentences in the web pages. For the paraphrasing and compression we rely on language models and syntactic constraints. In addition, the Alpino parser for Dutch was extended with a fluency component. Because the rhetorical role of a sentence depends on the roles of its surrounding sentences, we improve the rhetorical classification by finding a globally optimal assignment for all the sentences in a web page. Both the sentence compression and the rhetorical classification use an Integer Linear Programming optimization strategy.
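
To make the idea of a globally optimal assignment concrete, the toy sketch below scores every possible label sequence for a page by adding per-sentence classifier scores to pairwise bonuses for adjacent labels, and returns the best sequence. It is an illustration under stated assumptions, not the DAISY implementation: the label names, all scores, and the function `best_labeling` are hypothetical, and exhaustive search stands in for the Integer Linear Programming solver mentioned in the abstract.

```python
from itertools import product

def best_labeling(local, transition, labels):
    """Return the label sequence maximizing local scores plus adjacent-pair bonuses."""
    def score(seq):
        s = sum(local[i][lab] for i, lab in enumerate(seq))
        s += sum(transition.get((a, b), 0.0) for a, b in zip(seq, seq[1:]))
        return s
    return max(product(labels, repeat=len(local)), key=score)

# Hypothetical rhetorical labels and classifier scores for three sentences.
labels = ("definition", "example")
local = [
    {"definition": 0.9, "example": 0.1},
    {"definition": 0.5, "example": 0.4},
    {"definition": 0.2, "example": 0.8},
]
# Hypothetical bonuses for label pairs on neighbouring sentences.
transition = {("definition", "example"): 0.3, ("example", "example"): 0.2}

independent = tuple(max(scores, key=scores.get) for scores in local)
joint = best_labeling(local, transition, labels)
print(independent)
print(joint)
```

With these numbers, the second sentence is labelled "definition" when classified on its own but "example" under the joint objective, which is the kind of context effect the abstract describes.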

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010