Laurens Bastiaan van der Werff
University of Twente
Publication
Featured research published by Laurens Bastiaan van der Werff.
Interdisciplinary Science Reviews | 2009
Willemijn Heeren; Laurens Bastiaan van der Werff; Franciska de Jong; Roeland Ordelman; Thijs Verschoor; Adrianus J. van Hessen; Mies Langelaar
Given the enormous backlog at audiovisual archives and the generally global level of item description, collection disclosure and item access are both at risk. At the same time, archival practice is seeking to evolve from the analogue to the digital world. CHoral investigates the role automatic annotation and search technology can play in improving disclosure and access of digitized spoken word collections during and after this transfer. The core business of the CHoral project is to design and build technology for spoken document retrieval for heritage collections. In this paper, we will argue that in addition to solving technological issues, closer attention is needed for the workflow and daily practice at audiovisual archives on the one hand, and the state of the art in technology on the other. Analysis of the interplay is needed to ensure that new developments are mutually beneficial and that continuing cooperation can indeed bring envisioned advancements.
ACM Multimedia | 2010
Laurens Bastiaan van der Werff
Information Retrieval systems determine relevance by comparing information needs with the content of potential retrieval units. Unlike most textual data, automatically generated speech transcripts cannot by default be easily divided into obvious retrieval units due to a lack of explicit structural markers. This problem can be addressed by automatically detecting topically cohesive segments, or stories. However, when the content collection consists of speech from less formal domains than broadcast news, most of the standard automatic boundary detection methods are potentially unsuitable due to their reliance on learned features. In particular for conversational speech, the lack of adequate training data can present a significant issue. In this paper, four methods for automatic segmentation of speech transcriptions are compared. These are selected because of their independence from collection-specific knowledge and implemented without the use of training data. Two of the four methods are based on existing algorithms; the others are novel approaches based on a dynamic segmentation algorithm (QDSA) that incorporates information about the query, and WordNet. Experiments were done on a task similar to the TREC SDR unknown boundaries condition. For the best performing system, QDSA, the retrieval scores for a tf-idf-type ranking function were equivalent to a reference segmentation, and improved through document length normalization using the BM25/Okapi method. For the task of automatically segmenting speech transcripts for use in information retrieval, we conclude that a training-poor processing paradigm, which can be crucial for handling surprise data, is feasible.
Cross-Language Evaluation Forum | 2006
Robin Aly; Djoerd Hiemstra; Roeland Ordelman; Laurens Bastiaan van der Werff; Franciska de Jong
In this paper, the XML Information Retrieval System PF/Tijah is applied to retrieval tasks on large spoken document collections. The example setting used is the English CLEF-2006 CL-SR collection, together with the given English topics and self-produced Dutch topics. The main finding presented in this paper is the ease with which queries can be adapted to use different kinds and combinations of metadata. Furthermore, simple ways of combining different kinds of metadata are shown to be beneficial in terms of mean average precision.
LOT Occasional Series | 2007
Laurens Bastiaan van der Werff; Willemijn Heeren; Roeland Ordelman; Franciska de Jong
Conference of the International Speech Communication Association | 2009
Marijn Huijbregts; Roeland Ordelman; Laurens Bastiaan van der Werff; Franciska de Jong
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007
Laurens Bastiaan van der Werff; Willemijn Heeren
Conference of the International Speech Communication Association | 2011
Laurens Bastiaan van der Werff; Wessel Kraaij; Franciska de Jong
Annals of Statistics | 2008
Willemijn Heeren; Franciska de Jong; Laurens Bastiaan van der Werff; Marijn Huijbregts; Roeland Ordelman
Conference of the International Speech Communication Association | 2015
Laurens Bastiaan van der Werff; Jón Guðnason; Kamilla R. Johannsdottir
Measurement | 2007
Willemijn Heeren; Laurens Bastiaan van der Werff; Roeland Ordelman; Arjan van Hessen; Franciska de Jong; Charles L. A. Clarke; Norbert Fuhr; Noriko Kando; Wessel Kraaij; Vries de E. F. A