Publication


Featured research published by Jaco Badenhorst.


Speech Communication | 2014

A smartphone-based ASR data collection tool for under-resourced languages

Nic J. de Vries; Marelie H. Davel; Jaco Badenhorst; Willem D. Basson; Febe de Wet; Etienne Barnard; Alta de Waal

Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies the data collection process by providing process support to various role players during the data collection process, and performs on-device quality control in order to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones.


Proceedings of the First Workshop on Language Technologies for African Languages | 2009

Collecting and Evaluating Speech Recognition Corpora for Nine Southern Bantu Languages

Jaco Badenhorst; Charl Johannes van Heerden; Marelie H. Davel; Etienne Barnard

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data.


Procedia Computer Science | 2016

Developing Speech Resources from Parliamentary Data for South African English

Febe de Wet; Jaco Badenhorst; Thipe Modipa

The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data.


Conference of the International Speech Communication Association | 2011

Woefzela - an open-source platform for ASR data collection in the developing world

Nic J. de Vries; Jaco Badenhorst; Marelie H. Davel; Etienne Barnard; Alta de Waal


SLTU | 2014

The NCHLT Speech Corpus of the South African languages

Etienne Barnard; Marelie H. Davel; Charl Johannes van Heerden; Febe de Wet; Jaco Badenhorst


SLTU | 2012

Quality measurements for mobile data collection in the developing world

Jaco Badenhorst; Alta de Waal; Febe de Wet


Language Resources and Evaluation | 2011

Collecting and evaluating speech recognition corpora for 11 South African languages

Jaco Badenhorst; Charl Johannes van Heerden; Marelie H. Davel; Etienne Barnard


Archive | 2011

Trajectory behaviour at different phonemic context sizes

Jaco Badenhorst; Marelie H. Davel; Etienne Barnard


Archive | 2010

Analysing co-articulation using frame-based feature trajectories

Jaco Badenhorst; Marelie H. Davel; Etienne Barnard


Archive | 2012

Improved transition models for cepstral trajectories

Jaco Badenhorst; Marelie H. Davel; Etienne Barnard

Collaboration


Dive into Jaco Badenhorst's collaborations.

Top Co-Authors

Charl Johannes van Heerden
Council for Scientific and Industrial Research

Alta de Waal
Council for Scientific and Industrial Research

Febe de Wet
Council for Scientific and Industrial Research

Nic J. de Vries
Council for Scientific and Industrial Research

Alfred Tshoane
Council for Scientific and Industrial Research