Jan De Belder
Katholieke Universiteit Leuven
Publication
Featured research published by Jan De Belder.
Computer Speech & Language | 2012
Koen Deschacht; Jan De Belder; Marie-Francine Moens
We present a new generative model of natural language, the latent words language model. This model uses a latent variable for every word in a text that represents synonyms or related words in the given context. We develop novel methods to train this model and to find the expected value of these latent variables for a given unseen text. The learned word similarities help to reduce the sparseness problems of traditional n-gram language models. We show that the model significantly outperforms interpolated Kneser-Ney smoothing and class-based language models on three different corpora. Furthermore, the latent variables are useful features for information extraction. We show that for both semantic role labeling and word sense disambiguation, the performance of a supervised classifier increases when incorporating these variables as extra features. This improvement is especially large when using only a small annotated corpus for training.
International Conference on Computational Linguistics | 2010
Jan De Belder; Marie-Francine Moens
Sentence compression is a valuable task in the framework of text summarization. In this paper we compress sentences from news articles from Dutch and Flemish newspapers written in Dutch using an integer linear programming approach. We rely on the Alpino parser available for Dutch and on the Latent Words Language Model. We demonstrate that the integer linear programming approach yields good results for compressing Dutch sentences, despite the large freedom in word order.
Essential Speech and Language Technology for Dutch | 2013
Jan De Belder; Daniël de Kok; Gertjan van Noord; Fabrice Nauze; Leonoor van der Beek; Marie-Francine Moens
During the DAISY project we have developed essential technology for automatic summarisation of Dutch informative web pages. The project especially focuses on paraphrasing and compression of Dutch sentences, and on the rhetorical classification of content blocks and sentences in the web pages. For the paraphrasing and compression we rely on language models and syntactic constraints. In addition, the Alpino parser for Dutch was extended with a fluency component. Because the rhetorical role of a sentence is dependent on the role of its surrounding sentences, we improve the rhetorical classification by finding a globally optimal assignment for all the sentences in a web page. Both the sentence compression and rhetorical classification use an Integer Linear Programming optimization strategy.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010
Jan De Belder; Marie-Francine Moens
Proceedings of SIM 2009, joint conference: SRL ILP MLG | 2009
Jan De Belder; Wim De Smet; Raquel Mochales Palau; Marie-Francine Moens
Essential Speech and Language Technology for Dutch | 2012
Jan De Belder; Daniël de Kok; Gertjan van Noord; Fabrice Nauze; Leonoor van der Beek; Marie-Francine Moens
International Conference on Computational Linguistics | 2012
Jan De Belder; Marie-Francine Moens
Lecture Notes in Computer Science | 2012
Jan De Belder; Marie-Francine Moens
Proceedings of the 10th Dutch-Belgian information retrieval workshop (DIR 2010) | 2010
Jan De Belder; Marie-Francine Moens
Lecture Notes in Computer Science | 2010
Jan De Belder; Marie-Francine Moens