Publication


Featured research published by Henk van den Heuvel.


International Journal of Speech Technology | 1997

A Spoken Dialog System for the Dutch Public Transport Information Service

Helmer Strik; A.J.M. Russel; Henk van den Heuvel; Catia Cucchiarini; Lou Boves

In the Netherlands there is a nationwide premium rate telephone number that can be dialed to obtain information about various forms of public transport. In 1996 this number was called more than twelve million times. Human operators managed to handle only about nine million of these calls. In order to answer more of these calls, a spoken dialog system was developed to automate part of this service. The automation component concerns information about journeys between two train stations. The starting point of our research was an existing German information system. This system was ported to Dutch. A bootstrapping method was used to collect the data, which in turn were used to improve the system itself.


International Journal of Speech Technology | 2001

Annotation in the SpeechDat Projects

Henk van den Heuvel; L.W.J. Boves; Asunción Moreno; Maurizio Omologo; Gaël Richard; Eric Sanders

A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper is focused on the annotation conventions applied for the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared to approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach for capturing specific phonological and phonetic phenomena is discussed. In the SpeechDat projects a number of tools have been developed to carry out the transcription of the speech. In this paper, a short description of these tools and their properties is provided. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the performed evaluations, and some of the results are presented.


Telemedicine Journal and E-health | 2010

E-Learning-Based Speech Therapy: A Web Application for Speech Training

Lilian J. Beijer; Toni Rietveld; Marijn M.A. van Beers; Robert M.L. Slangen; Henk van den Heuvel; Bert J.M. de Swart; A.C.H. Geurts

In the Netherlands, a web application for speech training, E-learning-based speech therapy (EST), has been developed for patients with dysarthria, a speech disorder resulting from acquired neurological impairments such as stroke or Parkinson's disease. In this report, the EST infrastructure and its potentials for both therapists and patients are elucidated. EST provides patients with dysarthria the opportunity to engage in intensive speech training in their own environment, in addition to undergoing the traditional face-to-face therapy. Moreover, patients with chronic dysarthria can use EST to independently maintain the quality of their speech once the face-to-face sessions with their speech therapist have been completed. This telerehabilitation application allows therapists to remotely compose speech training programs tailored to suit each individual patient. Moreover, therapists can remotely monitor and evaluate changes in the patient's speech. In addition to its value as a device for composing, monitoring, and carrying out web-based speech training, the EST system compiles a database of dysarthric speech. This database is vital for further scientific research in this area.


Procedia Computer Science | 2016

Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech

Emre Yilmaz; Henk van den Heuvel; David A. van Leeuwen

In this paper, a code-switching automatic speech recognition (ASR) system built for the Frisian language is described. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. In the scope of the FAME! Project, the influence of this unforeseen language switching on modern ASR systems will be investigated with the objective of building a robust recognizer that can handle this phenomenon. For this purpose, in this work, we design a bilingual deep neural network (DNN)-based ASR system and investigate the impact of bilingual DNN training in the context of code-switching speech.


Speech Communication | 2012

Improving proper name recognition by means of automatically learned pronunciation variants

Bert Réveil; Jean-Pierre Martens; Henk van den Heuvel

This paper introduces a novel lexical modeling approach that aims to improve large vocabulary proper name recognition for native and non-native speakers. The method uses one or more so-called phoneme-to-phoneme (P2P) converters to add useful pronunciation variants to a baseline lexicon. Each P2P converter is a stochastic automaton that applies context-dependent transformation rules to a baseline transcription that is generated by a standard grapheme-to-phoneme (G2P) converter. The paper focuses on the inclusion of different types of features to describe the rule context - ranging from the identities of neighboring phonemes to morphological and even semantic features such as the language of origin of the name - and on the development and assessment of methods that can cope with cross-lingual issues. Another aim is to ensure that the proposed solutions are applicable to new names (not seen during system development) and useful in the hands of product developers with good knowledge of their application domain but little expertise in automatic speech recognition (ASR) and speech corpus acquisition. The proposed method was evaluated on person name and geographical name recognition, two economically interesting domains in which non-native speakers as well as non-native names occur very frequently. For the recognition experiments a state-of-the-art commercial ASR engine was employed. The experimental results demonstrate that significant improvements of the recognition accuracy can be achieved: large gains (up to 40% relative) in case prior knowledge of the speaker tongue and the name origin is available, and still significant gains in case no such prior information is available.


Speech Communication | 2003

Modeling lexical stress in continuous speech recognition for Dutch

Henk van den Heuvel; David van Kuijk; Lou Boves

The acoustic realization of vowels with lexical stress generally differs substantially from their unstressed counterparts, which are more reduced in spectral quality, shorter in duration, weaker in intensity and tend to have a flatter spectral tilt. Therefore, in a continuous speech recognizer (CSR) it would appear profitable to train separate models for the stressed and unstressed variants of each vowel. In the experiments reported on here, we applied stress modeling in both training and testing of the recognizer. Recognition experiments on an independent test set showed that recognition rates did not improve by this use of stress in our CSR. However, if we swapped the stress markers in the recognition lexicon the recognition rates did significantly deteriorate. This demonstrated that the acoustic models for the stressed and unstressed variants of the vowels were different. A pitfall in this experiment was that lexical stress information and phonemic context were possibly confounded. In a follow-up experiment we controlled for context by using generalized context-dependent models. In this experiment the recognition results were not improved either, although the vowel models were better tailored to capture lexical stress-related information. We conclude that the mapping of lexical stress to the acoustic surface of fluent speech is not sufficiently straightforward to be of direct benefit for CSR, due to interaction of lexical stress with rhythm and sentence accent in real speech.


Language Resources and Evaluation | 2008

Validation of spoken language resources: an overview of basic aspects

Henk van den Heuvel; Dorota J. Iskra; Eric Sanders; Folkert de Vriend

Spoken language resources (SLRs) are essential for both research and application development. In this article we clarify the concept of SLR validation. We define validation and how it differs from evaluation. Further, relevant principles of SLR validation are outlined. We argue that the best way to validate SLRs is to implement validation throughout SLR production and have it carried out by an external and experienced institute. We address which tasks should be carried out by the validation institute, and which not. Further, we list the basic issues that validation criteria for SLR should address. A standard validation protocol is shown, illustrating how validation can prove its value throughout the production phase in terms of pre-validation, full validation and pre-release validation.


Speech Communication | 2018

Semi-supervised acoustic model training for speech with code-switching

Emre Yilmaz; Mitchell McLaren; Henk van den Heuvel; David A. van Leeuwen

In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches to enable the use of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of transcriptions provided by this bilingual ASR system with several other approaches that use a language recognition system for assigning language labels to raw speech segments at the front-end and use monolingual ASR resources for transcription. We further investigate automatic annotation of the speakers appearing in the raw broadcast data by first labeling with (pseudo) speaker tags using a speaker diarization system and then linking to the known speakers appearing in the reference data using a speaker recognition system. These speaker labels are essential for speaker-adaptive training in the proposed setting. We train acoustic models using the manually and automatically annotated data and run recognition experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic annotations. The ASR and CS detection results demonstrate the potential of using automatic language and speaker tagging in semi-supervised bilingual acoustic model training.


Spoken Language Technology Workshop | 2016

Code-switching detection using multilingual DNNs

Emre Yilmaz; Henk van den Heuvel; David A. van Leeuwen

Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNN) for the ASR of Frisian speech containing code-switches to Dutch with the aim of building a robust recognizer that can handle this phenomenon. For this purpose, we train several multilingual DNN models on Frisian and two closely related languages, namely English and Dutch, to compare the impact of single-step and two-step multilingual DNN training on the recognition and code-switching detection performance. We apply bilingual DNN retraining on both target languages by varying the amount of training data belonging to the higher-resourced target language (Dutch). The recognition results show that the multilingual DNN training scheme with an initial multilingual training step followed by bilingual retraining provides recognition performance comparable to an oracle baseline recognizer that can employ language-specific acoustic models. We further show that we can detect code-switches at the word level with an equal error rate of around 17% excluding the deletions due to ASR errors.


Spyns, P.; Odijk, J. (eds.), Essential Speech and Language Technology for Dutch | 2013

Lexical Modeling for Proper Name Recognition in Autonomata Too

Bert Réveil; Jean-Pierre Martens; Henk van den Heuvel; Gerrit Bloothooft; Marijn Schraagen

The research in Autonomata Too aimed at the development of new pronunciation modeling techniques that can bring the speech recognition component of a Dutch/Flemish POI (Points of Interest) information providing business service to the required level of accuracy. The automatic recognition of spoken POI is extremely difficult because of the existence of multiple pronunciations that are frequently used for the same POI and because of the presence of important cross-lingual effects one has to account for. In fact, the ASR (Automatic Speech Recognition) engine must be able to cope with pronunciations of (partly) foreign POI names spoken by native speakers and pronunciations of native POI names uttered by non-native speakers. In order to deal adequately with such pronunciations, one must model them at the level of the acoustic models as well as at the level of the recognition lexicon. This paper describes a novel lexical modeling approach that was developed and tested in the Autonomata Too project. The new method employs a G2P-P2P (grapheme-to-phoneme, phoneme-to-phoneme) tandem to generate suitable lexical pronunciation variants. It was shown to yield a significant improvement over a baseline system already embedding state-of-the-art acoustic and lexical models.

Collaboration


Dive into Henk van den Heuvel's collaborations.

Top Co-Authors

Eric Sanders

Radboud University Nijmegen

Lou Boves

Radboud University Nijmegen

Helmer Strik

Radboud University Nijmegen

Nelleke Oostdijk

Radboud University Nijmegen

Emre Yilmaz

Katholieke Universiteit Leuven

Catia Cucchiarini

Radboud University Nijmegen

A.J.M. Russel

Radboud University Nijmegen
