Publication


Featured research published by Heidi Christensen.


Computer Speech & Language | 2013

The PASCAL CHiME speech separation and recognition challenge

Jon Barker; Emmanuel Vincent; Ning Ma; Heidi Christensen; Phil D. Green

Distant microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment. The challenge was designed to model the essential difficulties of the multisource environment problem while remaining on a scale that would make it accessible to a wide audience. Compared to previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were provided in a continuous audio background rather than as pre-segmented utterances, allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, provides an overview of the systems that were entered, and compares them against both a baseline recognition system and human performance. The paper discusses insights gained from the challenge and lessons learnt for the design of future evaluations of this kind.


European Conference on Information Retrieval | 2004

From text summarisation to style-specific summarisation for broadcast news

Heidi Christensen; BalaKrishna Kolluru; Yoshihiko Gotoh; Steve Renals

In this paper we report on a series of experiments investigating the path from text-summarisation to style-specific summarisation of spoken news stories. We show that the portability of traditional text summarisation features to broadcast news is dependent on the diffusiveness of the information in the broadcast news story. An analysis of two categories of news stories (containing only read speech or some spontaneous speech) demonstrates the importance of the style and the quality of the transcript, when extracting the summary-worthy information content. Further experiments indicate the advantages of doing style-specific summarisation of broadcast news.


International Conference on Acoustics, Speech, and Signal Processing | 2009

A speech fragment approach to localising multiple speakers in reverberant environments

Heidi Christensen; Ning Ma; Stuart N. Wrigley; Jon Barker

Sound source localisation cues are severely degraded when multiple acoustic sources are active in the presence of reverberation. We present a binaural system for localising simultaneous speakers which exploits the fact that in a speech mixture there exist spectro-temporal regions or ‘fragments’, where the energy is dominated by just one of the speakers. A fragment-level localisation model is proposed that integrates the localisation cues within a fragment using a weighted mean. The weights are based on local estimates of the degree of reverberation in a given spectro-temporal cell. The paper investigates different weight estimation approaches, based variously on: i) an established model of the perceptual precedence effect; ii) a measure of interaural coherence between the left and right ear signals; and iii) a data-driven approach trained in matched acoustic conditions. Experiments with reverberant binaural data with two simultaneous speakers show that appropriate weighting can improve frame-based localisation performance by up to 24%.
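The weighted-mean combination described in the abstract can be sketched as follows; the function name and the example azimuths and weights are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Sketch of the fragment-level model: per-cell azimuth estimates within
# one fragment are combined with a weighted mean, where each weight
# reflects a local estimate of how reverberant that spectro-temporal
# cell is (e.g. derived from interaural coherence).

def fragment_azimuth(cell_azimuths, reverberation_weights):
    """Weighted-mean azimuth (degrees) for one speech fragment."""
    a = np.asarray(cell_azimuths, float)
    w = np.asarray(reverberation_weights, float)
    return float(np.sum(w * a) / np.sum(w))

# Cells judged less reverberant (higher weight) dominate the estimate.
est = fragment_azimuth([20.0, 25.0, -40.0], [0.9, 0.8, 0.1])
```

With these toy values the heavily down-weighted outlier cell barely shifts the estimate away from the two reliable cells.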


International Conference on Acoustics, Speech, and Signal Processing | 2000

Employing heterogeneous information in a multi-stream framework

Heidi Christensen; Børge Lindberg; Ove Kjeld Andersen

A multi-stream speech recogniser is based on the combination of multiple feature streams, each containing complementary information. In the past, multi-stream research has typically focused on systems that use a single feature extraction method. This heritage from conventional speech recognisers is an unnecessary restriction, and both psychoacoustic and phonetic knowledge strongly motivate the use of heterogeneous features. In this paper we investigate how heterogeneous processing can be used in two different multi-stream configurations: first, a system where each stream handles a different frequency region of the speech (a multi-band recogniser) and, second, a multi-stream recogniser where each stream handles the full frequency region. For each type of system we compare the performance using both homogeneous and heterogeneous processing. We demonstrate that the use of heterogeneous information significantly improves the clean speech recognition performance, motivating us to continue exploring more specifically designed stream processing.
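The abstract does not specify the combination rule, so the sketch below uses one common assumption: a weighted sum of per-stream log posteriors over the phone classes, letting each heterogeneous stream contribute complementary evidence to the final decision.

```python
import numpy as np

# Illustrative multi-stream combination (an assumption, not the paper's
# exact recogniser): weight and sum the log-posterior vector from each
# stream, then pick the class with the highest combined score.

def combine_streams(log_posteriors, stream_weights):
    """Return the winning class index given one log-posterior vector
    per stream and one weight per stream."""
    combined = sum(w * lp for w, lp in zip(stream_weights, log_posteriors))
    return int(np.argmax(combined))

# Two streams disagree; the more confident stream wins the combination.
lp_a = np.log([0.7, 0.2, 0.1])   # stream A favours class 0 mildly
lp_b = np.log([0.1, 0.8, 0.1])   # stream B favours class 1 strongly
winner = combine_streams([lp_a, lp_b], [0.5, 0.5])
```

In a multi-band configuration the streams would carry different frequency regions; in the full-band configuration each would carry a different feature representation of the whole signal.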


International Conference on Acoustics, Speech, and Signal Processing | 2005

Maximum entropy segmentation of broadcast news

Heidi Christensen; BalaKrishna Kolluru; Yoshihiko Gotoh; Steve Renals

The paper presents an automatic system for structuring and preparing a news broadcast for applications such as speech summarization, browsing, archiving and information retrieval. This process comprises transcribing the audio using an automatic speech recognizer and subsequently segmenting the text into utterances and topics. A maximum entropy approach is used to build statistical models for both utterance and topic segmentation. The experimental work addresses the effect on the performance of the topic boundary detector of three factors: the types of features used, the quality of the ASR transcripts, and the quality of the utterance boundary detector. The results show that the topic segmentation is not affected severely by transcript errors, whereas errors in utterance segmentation are more devastating.
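For two classes (boundary vs. no boundary) with binary features, a maximum entropy model reduces to a logistic model over the active features, which can be sketched as below; the feature names and weights are illustrative assumptions, not the paper's trained model.

```python
import math

# Hypothetical feature weights for a binary maxent boundary detector.
WEIGHTS = {
    "cue_phrase":     1.2,   # e.g. "coming up", "in other news"
    "long_pause":     0.9,   # silence longer than some threshold
    "speaker_change": 0.7,   # a different speaker starts talking
    "bias":          -1.5,   # boundaries are rare by default
}

def boundary_probability(active_features):
    """P(topic boundary | features) under the logistic/maxent model."""
    score = WEIGHTS["bias"] + sum(WEIGHTS[f] for f in active_features)
    return 1.0 / (1.0 + math.exp(-score))
```

The negative bias encodes that most candidate positions are not boundaries; several cues must fire together before the probability crosses 0.5.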


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

Knowledge transfer between speakers for personalised dialogue management

Iñigo Casanueva; Thomas Hain; Heidi Christensen; Ricard Marxer; Phil D. Green

Model-free reinforcement learning has been shown to be a promising data-driven approach for automatic dialogue policy optimization, but a relatively large number of dialogue interactions is needed before the system reaches reasonable performance. Recently, Gaussian process based reinforcement learning methods have been shown to reduce the number of dialogues needed to reach optimal performance, and pre-training the policy with data gathered from different dialogue systems has further reduced this amount. Following this idea, a dialogue system designed for a single speaker can be initialised with data from other speakers, but if the dynamics of the speakers are very different the model will perform poorly. When data gathered from different speakers is available, selecting the data from the most similar ones might improve performance. We propose a method which automatically selects the data to transfer by defining a similarity measure between speakers, and uses this measure to weight the influence of the data from each speaker in the policy model. The methods are tested by simulating users with different severities of dysarthria interacting with a voice-enabled environmental control system.
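The abstract leaves the similarity measure abstract, so the sketch below makes one up: a distance between (assumed) speaker feature profiles, turned into per-speaker weights with a softmax over negative distances, so data from dissimilar speakers influences the policy model less.

```python
import numpy as np

# Hypothetical similarity-weighted transfer: compare each source speaker
# to the target speaker and convert distances into normalised weights.

def transfer_weights(target_profile, source_profiles, temperature=1.0):
    """Softmax over negative distances: closer speakers get higher weight."""
    t = np.asarray(target_profile, float)
    d = np.array([np.linalg.norm(t - np.asarray(s, float))
                  for s in source_profiles])
    logits = -d / temperature
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

# A nearby speaker's data should dominate a distant speaker's data.
w = transfer_weights([0.0, 0.0], [[0.1, 0.0], [5.0, 5.0]])
```

The temperature controls how sharply the transfer concentrates on the most similar speakers.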


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Are extractive text summarisation techniques portable to broadcast news?

Heidi Christensen; Yoshihiko Gotoh; BalaKrishna Kolluru; Steve Renals

In this paper we report on a series of experiments which compare the effect of individual features on both text and speech summarisation, the effect of basing the speech summaries on automatic speech recognition transcripts with varying word error rates, and the effect of summarisation approach and transcript source on summary quality. We show that classical text summarisation features (based on stylistic and content information) are portable to broadcast news. However, the quality of the speech transcripts as well as the difference in information structure between broadcast and newspaper news affect the usability of the individual features.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Combining Speech Fragment Decoding and Adaptive Noise Floor Modeling

Ning Ma; Jon Barker; Heidi Christensen; Phil D. Green

This paper presents a novel noise-robust automatic speech recognition (ASR) system that combines aspects of the noise modeling and source separation approaches to the problem. The combined approach has been motivated by the observation that the noise backgrounds encountered in everyday listening situations can be roughly characterized as a slowly varying noise floor in which there are embedded a mixture of energetic but unpredictable acoustic events. Our solution combines two complementary techniques. First, an adaptive noise floor model estimates the degree to which high-energy acoustic events are masked by the noise floor (represented by a soft missing data mask). Second, a fragment decoding system attempts to interpret the high-energy regions that are not accounted for by the noise floor model. This component uses models of the target speech to decide whether fragments should be included in the target speech stream or not. Our experiments on the CHiME corpus task show that the combined approach performs significantly better than systems using either the noise model or fragment decoding approach alone, and substantially outperforms multicondition training.
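A soft missing-data mask of the kind described can be sketched as below; the sigmoid form and slope are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

# Each spectro-temporal cell gets a value in [0, 1] saying how strongly
# the observed energy rises above the estimated noise floor: ~1 where a
# foreground event dominates, ~0 where the noise floor accounts for it.

def soft_mask(observed_db, noise_floor_db, slope=0.5):
    """Sigmoid of the local SNR between observation and noise floor."""
    snr = np.asarray(observed_db, float) - np.asarray(noise_floor_db, float)
    return 1.0 / (1.0 + np.exp(-slope * snr))
```

Cells with mask values near 1 form the high-energy fragments that the fragment decoder then assigns to the target speech stream or to the background.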


IEEE Transactions on Audio, Speech, and Language Processing | 2008

A Cascaded Broadcast News Highlighter

Heidi Christensen; Yoshihiko Gotoh; Steve Renals

This paper presents a fully automatic news skimming system which takes a broadcast news audio stream and provides the user with a segmented, structured, and highlighted transcript. This constitutes a system with three different, cascading stages: converting the audio stream to text using an automatic speech recognizer, segmenting into utterances and stories, and finally determining which utterances should be highlighted using a saliency score. Each stage must operate on the erroneous output from the previous stage in the system, an effect which is naturally amplified as the data progresses through the processing stages. We present a large corpus of transcribed broadcast news data enabling us to investigate to what degree information worth highlighting survives this cascading of processes. Both extrinsic and intrinsic experimental results indicate that mistakes in story boundary detection have a strong impact on the quality of highlights, whereas erroneous utterance boundaries cause only minor problems. Further, the difference in transcription quality does not affect the overall performance greatly.
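The final stage of the cascade can be sketched as a simple selection step; the scoring here is a stand-in, not the paper's trained saliency model.

```python
# Each utterance arrives from the segmentation stage with a saliency
# score; the top-scoring utterances are returned as highlights,
# in their original document order.

def highlight(utterances, scores, top_n=2):
    """Indices of the top_n most salient utterances, in reading order."""
    ranked = sorted(range(len(utterances)),
                    key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_n])

summary = highlight(["intro", "key fact", "supporting detail"],
                    [0.1, 0.9, 0.5])
```

Because this stage consumes utterance and story boundaries produced upstream, a misplaced story boundary changes which utterances even compete for selection, which is why the paper finds story-boundary errors so damaging.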


International Conference on Acoustics, Speech, and Signal Processing | 2017

Multi-view representation learning via GCCA for multimodal analysis of Parkinson's disease

Juan Camilo Vásquez-Correa; Juan Rafael Orozco-Arroyave; Raman Arora; Elmar Nöth; Najim Dehak; Heidi Christensen; Frank Rudzicz; Tobias Bocklet; Milos Cernak; Hamidreza Chinaei; Julius Hannink; Phani Sankar Nidadavolu; Maria Yancheva; Alyssa Vann; Nikolai Vogler

Information from different bio-signals such as speech, handwriting, and gait has been used to monitor the state of Parkinson's disease (PD) patients; however, all of these multimodal bio-signals may not always be available. We propose a method based on multi-view representation learning via generalized canonical correlation analysis (GCCA) for learning a representation of features extracted from handwriting and gait that can be used as a complement to speech-based features. Three different problems are addressed: classification of PD patients vs. healthy controls, prediction of the neurological state of PD patients according to the UPDRS score, and prediction of a modified version of the Frenchay dysarthria assessment (m-FDA). According to the results, the proposed approach improves performance on the addressed problems, especially the prediction of the UPDRS and m-FDA scores.
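A minimal MAXVAR-style GCCA can be sketched with numpy as below; the regularisation and the synthetic views in the usage are assumptions for illustration, not the paper's implementation or data.

```python
import numpy as np

def gcca_shared(views, k=2, reg=1e-6):
    """MAXVAR-style GCCA: find a shared representation G that is maximally
    correlated with a linear projection of every view.

    views -- list of (n_samples, d_i) matrices, assumed column-centred.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X in views:
        # accumulate the (regularised) projection onto each view's column space
        M += X @ np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T)
    # the top-k eigenvectors of the summed projections form G
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -k:]

# Toy usage: two views (stand-ins for handwriting and gait features)
# generated from a common 2-D latent variable.
rng = np.random.default_rng(0)
z = rng.normal(size=(50, 2))
X1 = z @ rng.normal(size=(2, 5)); X1 -= X1.mean(axis=0)
X2 = z @ rng.normal(size=(2, 6)); X2 -= X2.mean(axis=0)
G = gcca_shared([X1, X2], k=2)
```

The shared matrix G (or per-view projections onto it) would then serve as the multimodal representation fed to the downstream classifier or regressor.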

Collaboration


Dive into Heidi Christensen's collaboration.

Top Co-Authors

Jon Barker, University of Sheffield
Ning Ma, University of Sheffield
Thomas Hain, University of Sheffield
Markus Reuber, Royal Hallamshire Hospital
Steve Renals, University of Edinburgh