Laurens Bastiaan van der Werff

Publication

Featured research published by Laurens Bastiaan van der Werff.

Interdisciplinary Science Reviews | 2009

Easy Listening: Spoken Document Retrieval in CHoral

Willemijn Heeren; Laurens Bastiaan van der Werff; Franciska de Jong; Roeland Ordelman; Thijs Verschoor; Adrianus J. van Hessen; Mies Langelaar

Given the enormous backlog at audiovisual archives and the generally global level of item description, collection disclosure and item access are both at risk. At the same time, archival practice is seeking to evolve from the analogue to the digital world. CHoral investigates the role automatic annotation and search technology can play in improving disclosure and access of digitized spoken word collections during and after this transfer. The core business of the CHoral project is to design and build technology for spoken document retrieval for heritage collections. In this paper, we will argue that in addition to solving technological issues, closer attention is needed for the work-flow and daily practice at audiovisual archives on the one hand, and the state-of-the-art in technology on the other. Analysis of the interplay is needed to ensure that new developments are mutually beneficial and that continuing cooperation can indeed bring envisioned advancements.

acm multimedia | 2010

Story segmentation for speech transcripts in sparse data conditions

Laurens Bastiaan van der Werff

Information Retrieval systems determine relevance by comparing information needs with the content of potential retrieval units. Unlike most textual data, automatically generated speech transcripts cannot by default be easily divided into obvious retrieval units due to a lack of explicit structural markers. This problem can be addressed by automatically detecting topically cohesive segments, or stories. However, when the content collection consists of speech from less formal domains than broadcast news, most of the standard automatic boundary detection methods are potentially unsuitable due to their reliance on learned features. In particular for conversational speech, the lack of adequate training data can present a significant issue. In this paper, four methods for automatic segmentation of speech transcriptions are compared. These are selected because of their independence from collection-specific knowledge and implemented without the use of training data. Two of the four methods are based on existing algorithms; the others are novel approaches based on a dynamic segmentation algorithm (QDSA) that incorporates information about the query, and WordNet. Experiments were done on a task similar to the TREC SDR unknown-boundaries condition. For the best performing system, QDSA, the retrieval scores for a tf-idf-type ranking function were equivalent to a reference segmentation, and improved through document length normalization using the BM25/Okapi method. For the task of automatically segmenting speech transcripts for use in information retrieval, we conclude that a training-poor processing paradigm, which can be crucial for handling surprise data, is feasible.

cross language evaluation forum | 2006

XML information retrieval from spoken word archives

Robin Aly; Djoerd Hiemstra; Roeland Ordelman; Laurens Bastiaan van der Werff; Franciska de Jong

In this paper, the XML Information Retrieval System PF/Tijah is applied to retrieval tasks on large spoken document collections. The example setting used is the English CLEF-2006 CL-SR collection, together with the given English topics and self-produced Dutch topics. The main finding presented in this paper is the ease with which queries can be adapted to use different kinds and combinations of metadata. Furthermore, simple ways of combining different kinds of metadata are shown to be beneficial in terms of mean average precision.

Lot Occasional Series | 2007

Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

Laurens Bastiaan van der Werff; Willemijn Heeren; Roeland Ordelman; Franciska de Jong

conference of the international speech communication association | 2009

SHoUT, the University of Twente submission to the N-Best 2008 speech recognition evaluation for Dutch

Marijn Huijbregts; Roeland Ordelman; Laurens Bastiaan van der Werff; Franciska de Jong

international acm sigir conference on research and development in information retrieval | 2007

Evaluating ASR Output for Information Retrieval

Laurens Bastiaan van der Werff; Willemijn Heeren

conference of the international speech communication association | 2011

Speech Transcript Evaluation for Information Retrieval

Laurens Bastiaan van der Werff; Wessel Kraaij; Franciska de Jong

Annals of Statistics | 2008

Evaluation of spoken document retrieval for historic speech collections

Willemijn Heeren; Franciska de Jong; Laurens Bastiaan van der Werff; Marijn Huijbregts; Roeland Ordelman

conference of the international speech communication association | 2015

Detection of cardiovascular reactivity in speech

Laurens Bastiaan van der Werff; Jón Guðnason; Kamilla R. Johannsdottir

Measurement | 2007

Radio Oranje: searching the queen's speech(es)

Willemijn Heeren; Laurens Bastiaan van der Werff; Roeland Ordelman; Arjan van Hessen; Franciska de Jong; Charles L. A. Clarke; Norbert Fuhr; Noriko Kando; Wessel Kraaij; Vries de E. F. A
