Marelie H. Davel
North-West University
Publication
Featured research published by Marelie H. Davel.
Computer Speech & Language | 2008
Marelie H. Davel; Etienne Barnard
The Default&Refine algorithm is a new rule-based learning algorithm that was developed as an accurate and efficient pronunciation prediction mechanism for speech processing systems. The algorithm exhibits a number of attractive properties including rapid generalisation from small training sets, good asymptotic accuracy, robustness to noise in the training data, and the production of compact rule sets. We describe the Default&Refine algorithm in detail and demonstrate its performance on two benchmarked pronunciation databases (the English OALD and Flemish FONILEX pronunciation dictionaries) as well as a newly-developed Afrikaans pronunciation dictionary. We find that the algorithm learns more efficiently (achieves higher accuracy on smaller data sets) than any of the alternative pronunciation prediction algorithms considered. In addition, we demonstrate the ability of the algorithm to generate an arbitrarily small rule set in such a way that the trade-off between rule set size and accuracy is well controlled. A conceptual comparison with alternative algorithms (including Dynamically Expanding Context, Transformation-Based Learning and Pronunciation by Analogy) clarifies the competitive performance obtained with Default&Refine.
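The core idea, learning a default rule per grapheme and then adding context-specific refinements for the cases the default gets wrong, can be sketched as a toy (the function and variable names are illustrative; the published algorithm also handles grapheme-phoneme alignment and chooses refinements greedily for compactness):

```python
from collections import Counter, defaultdict

def train_default_refine(pairs):
    """Toy default-then-refine learner.

    `pairs` holds pre-aligned (left_context, grapheme, right_context,
    phoneme) tuples; the real algorithm must also produce this
    alignment itself.
    """
    # Default rule: the most frequent phoneme for each grapheme.
    counts = defaultdict(Counter)
    for left, g, right, p in pairs:
        counts[g][p] += 1
    default = {g: c.most_common(1)[0][0] for g, c in counts.items()}

    # Refinement: record a context-specific exception wherever the
    # default rule mispredicts (Default&Refine selects refinements
    # greedily by coverage; this sketch just stores every exception).
    refine = {}
    for left, g, right, p in pairs:
        if default[g] != p:
            refine[(left, g, right)] = p
    return default, refine

def predict(default, refine, left, g, right):
    # Exceptions fire first; otherwise fall back to the default rule.
    return refine.get((left, g, right), default.get(g))
```

Because refinements only store exceptions, the rule set stays compact and can be truncated early, which is the mechanism behind the controlled size-accuracy trade-off described above.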
international conference on acoustics, speech, and signal processing | 2012
Florian Metze; Nitendra Rajput; Xavier Anguera; Marelie H. Davel; Guillaume Gravier; Charl Johannes van Heerden; Gautam Varma Mantena; Armando Muscariello; Kishore Prahallad; Igor Szöke; Javier Tejedor
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.
Speech Communication | 2014
Nic J. de Vries; Marelie H. Davel; Jaco Badenhorst; Willem D. Basson; Febe de Wet; Etienne Barnard; Alta de Waal
Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies the data collection process by providing process support to various role players during the data collection process, and performs on-device quality control in order to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones.
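The flavour of on-device quality control can be illustrated with two simple checks, silence and clipping detection, on raw 16-bit samples. These particular checks and thresholds are assumptions for illustration, not Woefzela's actual QC-on-the-go criteria, which are described in the paper:

```python
def qc_check(samples, clip_level=32767, min_rms=500.0, max_clip_frac=0.01):
    """Hypothetical on-device quality checks in the spirit of
    QC-on-the-go: flag recordings that are near-silent or heavily
    clipped so a fieldworker can re-record while the speaker is
    still present. Thresholds are illustrative assumptions."""
    if not samples:
        return False, "empty recording"
    # Root-mean-square energy: reject near-silent takes.
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    if rms < min_rms:
        return False, "too quiet"
    # Fraction of full-scale samples: reject badly clipped takes.
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    if clipped / len(samples) > max_clip_frac:
        return False, "clipped"
    return True, "ok"
```

Running checks like these immediately after each utterance is what lets recording opportunities be maximised in remote areas with no connectivity.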
international conference on acoustics, speech, and signal processing | 2013
Florian Metze; Xavier Anguera; Etienne Barnard; Marelie H. Davel; Guillaume Gravier
In this paper we describe the systems presented by Telefonica Research for the Spoken Web Search task of the MediaEval 2012 evaluation. This year we proposed two systems. The first one consists of a segmental DTW system, similar to the one presented in 2011, with a few improvements. The second system also uses a DTW-like approach, but allows all reference files to be searched at once using an information retrieval approach.
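The matching step behind DTW-based query-by-example search can be sketched as plain dynamic time warping between a query and windows of a reference utterance. This is a minimal, assumed-simple version; the systems in the task use segmental variants with score normalisation:

```python
import math

def dtw_distance(query, ref):
    """Plain DTW cost between two feature sequences (lists of
    equal-length feature vectors), with Euclidean frame distance."""
    INF = float("inf")
    n, m = len(query), len(ref)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], ref[j - 1])
            # Standard DTW recursion over insertion/deletion/match.
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def search(query, reference, window):
    """Slide a fixed-length window over the reference and return the
    (cost, start_index) of the best match (illustrative only)."""
    best = (float("inf"), -1)
    for start in range(0, len(reference) - window + 1):
        cost = dtw_distance(query, reference[start:start + window])
        if cost < best[0]:
            best = (cost, start)
    return best
```

In practice the features would be posteriorgrams or MFCCs rather than the toy one-dimensional vectors used in the example.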
Computer Speech & Language | 2014
Florian Metze; Xavier Anguera; Etienne Barnard; Marelie H. Davel; Guillaume Gravier
In this paper, we describe several approaches to language-independent spoken term detection and compare their performance on a common task, namely “Spoken Web Search”. The goal of this part of the MediaEval initiative is to perform low-resource language-independent audio search using audio as input. The data was taken from “spoken web” material collected over mobile phone connections by IBM India as well as from the LWAZI corpus of African languages. As part of the 2011 and 2012 MediaEval benchmark campaigns, a number of diverse systems were implemented by independent teams, and submitted to the “Spoken Web Search” task. This paper presents the 2011 and 2012 results, and compares the relative merits and weaknesses of approaches developed by participants, providing analysis and directions for future research, in order to improve voice access to spoken information in low-resource settings.
international conference on acoustics, speech, and signal processing | 2010
Charl Johannes van Heerden; Etienne Barnard; Marelie H. Davel; Christiaan van der Walt; Ewald van Dyk; Michael Feld; Christian A. Müller
We present a novel approach to automatic speaker age classification, which combines regression and classification to achieve competitive classification accuracy on telephone speech. Support vector machine regression is used to generate finer age estimates, which are combined with the posterior probabilities of well-trained discriminative gender classifiers to predict both the age and gender of a speaker. We show that this combination performs better than direct 7-class classifiers. The regressors and classifiers are trained using long-term features such as pitch and formants, as well as short-term (frame-based) features derived from MAP adaptation of GMMs that were trained on MFCCs.
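The decision structure of combining a fine-grained regression age estimate with a gender posterior into one of seven age-gender classes can be sketched as follows. The age bands, class names, and the simple thresholding rule are illustrative assumptions; the published system fuses the regression output and classifier posteriors differently:

```python
# Hypothetical age bands (name, lower bound, upper bound in years).
AGE_BANDS = [("child", 0, 13), ("young", 13, 25),
             ("adult", 25, 55), ("senior", 55, 200)]

def combine(age_estimate, p_female):
    """Map a regression age estimate plus a gender posterior to one
    of 7 classes, with children not split by gender (a common 7-class
    scheme; assumed here for illustration)."""
    band = next(name for name, lo, hi in AGE_BANDS
                if lo <= age_estimate < hi)
    if band == "child":
        return "child"
    gender = "female" if p_female >= 0.5 else "male"
    return f"{band} {gender}"
```

The point of the two-stage design is that the regressor resolves age more finely than a direct 7-way classifier, while the dedicated gender classifier supplies the second axis of the decision.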
spoken language technology workshop | 2008
Etienne Barnard; Madelaine Plauché; Marelie H. Davel
The commercial successes of spoken dialog systems in the developed world provide encouragement for their use in the developing world, where speech could play a role in the dissemination of relevant information in local languages. We investigate the evolution of spoken dialog system research in the developed world, and show that the utility of speech is based on user factors and application factors (amongst others). After adjusting the factors for the developing world context and plotting their interactions, we offer several predictions for the field. In particular, we show that the field of spoken dialog systems for the developing world is in a nascent stage and will likely take another decade to have an impact similar to that in the developed world.
Proceedings of the First Workshop on Language Technologies for African Languages | 2009
Jaco Badenhorst; Charl Johannes van Heerden; Marelie H. Davel; Etienne Barnard
We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data.
Multilingual Speech Processing | 2006
Silke Goronzy; Laura Mayfield Tomokiyo; Etienne Barnard; Marelie H. Davel
This chapter focuses on problems posed by non-native speech input as well as accent and dialect variation with respect to acoustic modeling, dictionaries, and language modeling in speech recognition systems. It begins with a description of the manifold characteristics of non-native speech. This includes theoretical models and descriptions obtained from corpus analysis along with a description of non-native databases. Almost all investigations of non-native speech require non-native speech data, and obtaining sufficient data is one of the biggest problems: while there are plenty of databases for many different languages, they usually contain only native speech, and very few databases containing non-native speech are publicly available. Speaker adaptation techniques have proven valuable for adapting acoustic models (AMs) to both native and non-native speakers. This chapter describes speaker adaptation techniques in the special context of non-native speakers, and discusses pragmatic strategies, such as handling code-switching, and the design of voice-based user interfaces for speakers with different cultural backgrounds.
south african institute of computer scientists and information technologists | 2010
Thipe Modipa; Marelie H. Davel; Febe de Wet
Automatic speech recognition (ASR) systems are increasingly being developed for under-resourced languages, especially for use in multilingual spoken dialogue systems. We investigate different approaches to the acoustic modelling of Sepedi affricates for ASR. We determine that it is possible to model several of these complex consonants as sequences of much simpler sounds. This approach reduces the Sepedi phoneme inventory from 45 to 32, resulting in simpler dictionary development and transcription processes, as well as more accurate acoustic modelling.
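The dictionary side of this idea is a simple rewrite: each affricate symbol in a pronunciation is expanded into a sequence of simpler phones, so the acoustic models only need to cover the reduced inventory. The mapping below uses made-up symbols for illustration, not Sepedi's actual phone set:

```python
# Hypothetical affricate-to-sequence mapping (symbols illustrative,
# not the actual Sepedi inventory).
AFFRICATE_MAP = {
    "tsh": ["t", "sh"],
    "tlh": ["t", "lh"],
    "psh": ["p", "sh"],
}

def decompose(pron):
    """Rewrite a pronunciation (list of phones) so each mapped
    affricate becomes a sequence of simpler phones, shrinking the
    phoneme inventory the acoustic models must cover."""
    out = []
    for phone in pron:
        out.extend(AFFRICATE_MAP.get(phone, [phone]))
    return out
```

Applying such a map across a whole dictionary is what turns a 45-phone inventory into a smaller one, at the cost of requiring the simpler phones to model the affricates' transitions well.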