Publication


Featured research published by Etienne Barnard.


International Conference on Spoken Language Processing | 1996

Phone clustering using the Bhattacharyya distance

Brian Mak; Etienne Barnard

The authors study the use of the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian distributions that is equivalent to an upper bound on the optimal Bayesian classification error probability. It also has the desirable properties of being computationally simple and extensible to mixtures of Gaussians. Using the Bhattacharyya distance measure in a data-driven approach, together with a novel a-level agglomerative hierarchical biphone clustering algorithm, generalized left/right biphones (BGBs) are derived. A neural-net-based phone recognizer trained on the BGBs is found to have better frame-level phone recognition accuracy than one trained on broad-category generalized biphones (BCGBs) derived from a set of commonly used broad categories. The authors further evaluate the new BGBs on an isolated-word recognition task of perplexity 40 and obtain a 16.2% error reduction over the broad-category generalized biphones (BCGBs) and a 41.8% error reduction over the monophones.
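The closed-form Bhattacharyya distance between two Gaussians N(mu1, Sigma1) and N(mu2, Sigma2) that drives this kind of clustering can be sketched as follows. This is a minimal NumPy illustration; the function name is ours, not from the paper:

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians:
    1/8 (mu2-mu1)^T S^-1 (mu2-mu1) + 1/2 ln(det S / sqrt(det S1 det S2)),
    where S = (S1 + S2) / 2."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)
    cov = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2
```

The distance is symmetric and zero only when the two distributions coincide, which makes it a convenient merging criterion for agglomerative clustering.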


International Conference on Acoustics, Speech, and Signal Processing | 1995

An approach to automatic language identification based on language-dependent phone recognition

Yonghong Yan; Etienne Barnard

An approach to language identification (LID) based on language-dependent phone recognition is presented. A variety of features, and combinations of features, extracted by language-dependent recognizers were evaluated on the same database. Two novel information sources for LID were introduced: (1) forward and backward bigram-based language models, and (2) context-dependent duration models. An LID system using hidden Markov models and neural networks was developed. The system was trained and evaluated on the OGI-TS database. For a six-language task, the system performance (correct rate) for 45-second and 10-second utterances reached 91-96% and 81-08%, respectively. The experiments demonstrated the importance of detailed modeling and of the method by which these information sources are combined.
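As an illustration of the forward/backward bigram idea, here is a minimal sketch of a smoothed bigram phone model. The add-one smoothing and helper names are our assumptions; the paper's actual estimation details are not reproduced here. A backward model is obtained by training and scoring on reversed phone sequences:

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """MLE bigram model with add-one smoothing over a phone vocabulary."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        uni.update(seq)
        bi.update(zip(seq, seq[1:]))
    V = len(vocab)
    def logprob(a, b):
        return math.log((bi[(a, b)] + 1) / (uni[a] + V))
    return logprob

def score(seq, logprob):
    """Average bigram log-likelihood of a phone sequence."""
    pairs = list(zip(seq, seq[1:]))
    return sum(logprob(a, b) for a, b in pairs) / len(pairs)

# Backward model: train_bigram([s[::-1] for s in sequences], vocab)
# and score the reversed input sequence.
```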


International Conference on Acoustics, Speech, and Signal Processing | 1994

Analysis of phoneme-based features for language identification

Kay M. Berkling; Takayuki Arai; Etienne Barnard

This paper presents an analysis of the phonemic language identification system introduced previously (see Eurospeech, vol. 2, p. 1307, 1993), now extended to recognize German in addition to English and Japanese. In this system, language identification is based on features derived from a superset of the phonemes of all three languages. As the number of languages increases, the need to reduce the feature space becomes apparent. Practical analysis of single-feature statistics, in conjunction with linguistic knowledge, leads to a 90% reduction of the feature space with only a 5% loss in performance. Thus, the system discriminates between Japanese and English with 84.1% accuracy based on only 15 features, compared to 84.6% based on the complete set of 318 phonemic features (or 83.6% using 333 broad-category features). The results indicate that a language identification system may be designed based on linguistic knowledge and then implemented with a neural network of appropriate complexity.
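Single-feature selection of this kind could, for instance, be approximated by ranking features with a per-feature Fisher-style ratio and keeping the top few. This sketch is our simplified stand-in, not the authors' actual procedure:

```python
import numpy as np

def rank_features(X, y):
    """Rank features by a per-feature Fisher ratio:
    variance of class means / mean within-class variance.
    Returns feature indices, best-discriminating first."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    ratio = means.var(axis=0) / (within + 1e-12)
    return np.argsort(ratio)[::-1]
```

Keeping only the top-ranked indices (e.g. `rank_features(X, y)[:15]`) mirrors the 318-to-15 reduction described above, though the paper combines such statistics with linguistic knowledge rather than using the ratio alone.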


IEEE Transactions on Neural Networks | 1993

Backpropagation uses prior information efficiently

Etienne Barnard; Elizabeth Botha

The ability of neural-net classifiers to deal with a priori information is investigated. For this purpose, backpropagation classifiers are trained with data from known distributions with variable a priori probabilities, and their performance on separate test sets is evaluated. It is found that backpropagation employs a priori information in a slightly suboptimal fashion, but this does not have serious consequences for the performance of the classifier. Furthermore, it is found that the inferior generalization that results when an excessive number of network parameters is used can (partially) be ascribed to this suboptimality.
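For reference, the Bayes-optimal way to use a priori probabilities, against which the trained networks are implicitly compared, is to weight each class-conditional likelihood by its prior. A toy 1-D sketch, with illustrative class names and parameters:

```python
import math

def bayes_decide(x, likelihoods, priors):
    """Bayes-optimal decision: argmax over classes of p(x | c) * P(c)."""
    scores = {c: likelihoods[c](x) * priors[c] for c in priors}
    return max(scores, key=scores.get)

def gauss(mu, var):
    """1-D Gaussian density with mean mu and variance var."""
    return lambda x: math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Two equal-variance classes; with equal priors the boundary sits at x = 1,
# and a strong prior on "A" shifts it toward "B".
lik = {"A": gauss(0.0, 1.0), "B": gauss(2.0, 1.0)}
```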


International Conference on Acoustics, Speech, and Signal Processing | 1996

Creating speaker-specific phonetic templates with a speaker-independent phonetic recognizer: implications for voice dialing

Neena Jain; Ronald A. Cole; Etienne Barnard

We present a new approach to speaker-dependent template generation that uses dramatically less storage to represent a speaker's words, with minimal degradation in recognition accuracy. In this approach, the symbolic string produced by a speaker-independent phonetic recognizer is used to represent utterances. We investigate effective procedures for template generation, and compare the results of these procedures to templates represented by acoustic parameters for utterances produced with different telephone handsets. The use of speaker-specific templates led to a roughly 500-fold reduction in data-storage requirements with comparable recognition accuracy. We also compare recognition performance for speaker-specific and speaker-independent templates, and for combinations of the two. The results show that combining speaker-specific and speaker-independent templates produces better recognition performance than either alone. A voice-dialing system that incorporates the speaker-specific templates is described.
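A natural way to compare such symbolic phone strings against stored templates is an edit (Levenshtein) distance. The paper's exact matching procedure is not reproduced here, so the following is an illustrative sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone-symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

def best_template(phones, templates):
    """Pick the stored template closest to the recognized phone string."""
    return min(templates, key=lambda name: edit_distance(phones, templates[name]))
```

Storing each word as a short phone string rather than frames of acoustic parameters is what drives the large storage reduction reported above.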


Computer Speech & Language | 1996

Development of an approach to automatic language identification based on phone recognition

Yonghong Yan; Etienne Barnard; Ronald A. Cole

An automatic language identification (LID) approach is presented. The baseline LID system consists of three parts: (1) hidden Markov model (HMM) based context-independent phone recognizers, (2) language identification score generators, and (3) a linear language classifier. The system exploits language-dependent phonotactic constraints and prosodic information. Four methods are proposed to improve system performance. Two bigram-based interpolated N-gram language models (forward and backward) are used to model the phone-sequence constraints of different spoken languages. A context-dependent duration model interpolated with a context-independent duration model is used to capture duration information. Comparison experiments between the linear classifier and neural-network-based final classifiers were conducted. Finally, optimization of the language models based on backpropagation is proposed. The improved system was evaluated on an 11-language task, and error rates reached 13.3% and 26.2% for utterances averaging 45 s and 10 s in duration, respectively. Comparison with the baseline system shows the importance of the issues addressed in this paper for language identification.
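The interpolation idea, applied here to both the language models and the duration models, can be sketched in a few lines (the mixing weight `lam` is an illustrative assumption):

```python
def interpolated_bigram(bigram_p, unigram_p, lam=0.7):
    """Bigram probability backed off to a unigram model:
    P(b | a) = lam * P_bi(b | a) + (1 - lam) * P_uni(b).
    Unseen events fall back entirely on the unigram term."""
    def p(a, b):
        return lam * bigram_p.get((a, b), 0.0) + (1 - lam) * unigram_p.get(b, 0.0)
    return p
```

The same convex combination works for the duration models, with the context-dependent estimate in place of the bigram and the context-independent estimate in place of the unigram.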


International Conference on Spoken Language Processing | 1996

Speech recognition using syllable-like units

Zhihong Hu; Johan Schalkwyk; Etienne Barnard; Ronald A. Cole

It is well known that speech is dynamic and that frame-based systems lack the ability to realistically model the dynamics of speech. Segment-based systems offer the potential to integrate the dynamics of speech, at least within phoneme boundaries, although it is difficult to obtain accurate phonemic segmentation in fluent speech. In this paper, we propose a new approach that uses syllable-like units in recognition. In the proposed approach, syllable-like units are defined by rules and used as the basic units of recognition. The motivation for using syllable-like units is that, by modeling perceptually more meaningful units, better modeling of speech can be achieved, and this method provides a better framework for incorporating dynamic modeling techniques into the recognition system. On the task of recognizing the months of the year, the proposed approach achieves the same recognition performance as the best frame-based recognizer available.
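The paper's syllabification rules are not given in this abstract. As a toy illustration of rule-based syllable-like units, here is a simple onset-maximizing splitter in which every intervocalic consonant cluster attaches to the following vowel (the vowel set and the all-onsets-legal assumption are ours):

```python
def syllabify(phones, vowels=set("aeiou")):
    """Toy syllabifier: one vowel per syllable; intervocalic consonants
    attach to the following syllable (all onsets treated as legal)."""
    sylls, cur, seen_vowel = [], [], False
    for p in phones:
        if p in vowels and seen_vowel:
            # a new vowel starts a new syllable; pending consonants move with it
            k = len(cur)
            while k > 0 and cur[k - 1] not in vowels:
                k -= 1
            sylls.append(cur[:k])
            cur = cur[k:]
        cur.append(p)
        seen_vowel = seen_vowel or p in vowels
    sylls.append(cur)
    return ["".join(s) for s in sylls]
```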


International Conference on Acoustics, Speech, and Signal Processing | 1997

Explicit, N-best formant features for vowel classification

Philipp Schmid; Etienne Barnard

We demonstrate the use of explicit formant features for vowel and semi-vowel classification. The formant trajectories are approximated by either three line segments or Legendre polynomials. Together with formant amplitude, formant bandwidth, pitch, and segment duration, these formant features form a compact representation which performs as well (71.8%) as a cepstral-based feature representation (71.6%). The combination of the formant and cepstral features improves the accuracy further, to 73.4%. Additionally, we outline future experiments using our robust, N-best formant tracker.
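Approximating a formant trajectory with low-order Legendre polynomials can be sketched with NumPy's `legfit`; the function name and the degree are illustrative choices, not the paper's:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_formant_features(track, degree=3):
    """Fit a formant track (Hz per frame) with low-order Legendre polynomials
    over [-1, 1]; the coefficients give a compact, duration-normalized feature."""
    t = np.linspace(-1.0, 1.0, len(track))
    return legendre.legfit(t, track, degree)
```

For a purely linear trajectory the fit recovers the mean in the 0th coefficient and the slope in the 1st, with the higher-order terms near zero.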


International Conference on Acoustics, Speech, and Signal Processing | 1996

Experiments for an approach to language identification with conversational telephone speech

Yonghong Yan; Etienne Barnard

This paper presents work on language identification using conversational speech (the LDC Conversational Telephone Speech Database). The baseline system used in this study is based on language-dependent phone recognition and phonotactic constraints. The system was trained on monologue data and obtained an error rate of around 9% on a commonly used nine-language monologue test set. When the system was used to process conversational speech from the same nine-language task, a dramatic performance degradation (an error rate of 40%) was observed. Based on our analysis of conversational speech, two methods were proposed: (1) pre-processing and (2) post-processing. Without any training data from the conversational-speech database, the final system (the baseline system enhanced by the two proposed methods) obtained an error rate of 24%, a substantial improvement (a 41% error reduction) over the baseline system.


SPIE's 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics | 1995

Real-world speech recognition with neural networks

Etienne Barnard; Ronald A. Cole; Mark A. Fanty; Pieter Vermeulen

We describe a system based on neural networks that is designed to recognize speech transmitted through the telephone network. Context-dependent phonetic modeling is studied as a method of improving recognition accuracy, and a special training algorithm is introduced to make the training of these nets more manageable. Our system is designed for real-world applications, and we have therefore specialized our implementation for this goal; a pipelined DSP structure and a compact search algorithm are described as examples of this specialization. Preliminary results from a realistic test of the system (a field trial for the U.S. Census Bureau) are reported.

Collaboration


Dive into Etienne Barnard's collaborations.

Top Co-Authors

Ronald A. Cole

University of Colorado Boulder


Brian Mak

Hong Kong University of Science and Technology


Elizabeth Botha

Carnegie Mellon University
