Publication


Featured research published by Charles C. Broun.


IEEE Transactions on Speech and Audio Processing | 2002

Speaker recognition with polynomial classifiers

William M. Campbell; Khaled T. Assaleh; Charles C. Broun

Modern speaker recognition applications require high accuracy at low complexity. We propose the use of a polynomial-based classifier to achieve these objectives. This approach has several advantages. First, polynomial classifier scoring yields a system which is highly computationally scalable with the number of speakers. Second, a new training algorithm is proposed which is discriminative, handles large data sets, and has low memory usage. Third, the output of the polynomial classifier is easily incorporated into a statistical framework allowing it to be combined with other techniques such as hidden Markov models. Results are given for the application of the new methods to the YOHO speaker recognition database.
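As a rough illustration of the scoring step described in this abstract, the sketch below implements a second-order polynomial expansion and sequence scoring in NumPy. The expansion order, feature dimension, and all names are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch of polynomial-classifier scoring for speaker recognition.
import numpy as np

def poly_expand(x):
    """Second-order polynomial expansion of a feature vector x:
    [1, x_i, x_i * x_j] for i <= j."""
    outer = np.outer(x, x)
    iu = np.triu_indices(len(x))            # unique monomials x_i * x_j
    return np.concatenate([np.ones(1), x, outer[iu]])

def score_utterance(frames, w):
    """Score a sequence of feature frames against one speaker model w.
    Because the classifier is linear in the expanded features, averaging
    the expansions and taking a single dot product equals averaging the
    per-frame scores -- this is what keeps scoring cheap and scalable
    with the number of speakers."""
    p_bar = np.mean([poly_expand(f) for f in frames], axis=0)
    return float(w @ p_bar)

# Usage: 12-dimensional cepstral frames, one model per enrolled speaker.
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 12))
models = {"spk_a": rng.standard_normal(91), "spk_b": rng.standard_normal(91)}
best = max(models, key=lambda s: score_utterance(frames, models[s]))
```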


EURASIP Journal on Advances in Signal Processing | 2002

Automatic speechreading with applications to human-computer interfaces

Xiaozheng Zhang; Charles C. Broun; Russell M. Mersereau; Mark A. Clements

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
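The following simplified sketch conveys the flavor of fusing color and edge cues for lip segmentation. The paper combines these cues within a Markov random field; this pixel-wise approximation, with assumed weights and threshold, omits the spatial-consistency constraint an MRF enforces.

```python
# A simplified, pixel-wise stand-in for color + edge lip segmentation.
# Thresholds and the color model are assumptions for illustration only.
import numpy as np

def lip_evidence(rgb):
    """Return a per-pixel lip score from an HxWx3 float image in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Color cue: lips are redder than the surrounding skin.
    redness = r / (r + g + b + 1e-6)
    # Edge cue: gradient magnitude of the green channel (finite differences).
    gy, gx = np.gradient(g)
    edges = np.hypot(gx, gy)
    edges /= edges.max() + 1e-6
    # Fuse the two cues; an MRF would additionally enforce that neighboring
    # pixels take consistent labels.
    return 0.7 * redness + 0.3 * edges

frame = np.random.rand(120, 160, 3)     # stand-in for a mouth region of interest
mask = lip_evidence(frame) > 0.5        # crude segmentation
```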


International Conference on Acoustics, Speech, and Signal Processing | 2002

Automatic speechreading with application to speaker verification

Charles C. Broun; Xiaozheng Zhang; Russell M. Mersereau; Mark A. Clements

Speech not only conveys linguistic information, but also characterizes the talker's identity and can therefore be used in personal authentication. While most of the speech information is contained in the acoustic channel, lip movement during speech production also provides useful information. In this paper we investigate the effectiveness of visual speech features in a speaker verification task. We first present the visual front-end of the automatic speechreading system. We then develop a recognition engine to train and recognize sequences of visual parameters. The experimental results based on the XM2VTS database [1] demonstrate that visual information is highly effective in reducing both false acceptance and false rejection rates in speaker verification tasks.


Journal of the Acoustical Society of America | 2000

Speaker independent speech recognition system and method

William M. Campbell; John Eric Kleider; Charles C. Broun; Carl Steven Gifford; Khaled Assaleh

An improved method of training a speaker-independent speech recognition system (SISRS) uses less processing and memory by operating on vectors, rather than matrices, that represent spoken commands. Memory requirements are linearly proportional to the number of spoken commands for storing each command model. A spoken command is identified from the set of spoken commands by a command recognition procedure, which includes sampling the speaker's speech, deriving cepstral coefficients and delta-cepstral coefficients, and performing a polynomial expansion on the cepstral coefficients. The identified spoken command is selected using the dot product of the command model data and the average command structure representing the unidentified spoken command.
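A minimal sketch of the feature derivation mentioned above, computing delta-cepstral coefficients by the standard regression formula. The window length is an assumed common choice, not a detail taken from the patent.

```python
# Cepstral frames augmented with regression-based delta coefficients.
import numpy as np

def add_deltas(cepstra, K=2):
    """cepstra: (T, d) array of cepstral frames. Returns (T, 2d) with
    delta-cepstral coefficients appended to each frame."""
    T = len(cepstra)
    padded = np.pad(cepstra, ((K, K), (0, 0)), mode="edge")
    num = sum(k * (padded[K + k : K + k + T] - padded[K - k : K - k + T])
              for k in range(1, K + 1))
    den = 2 * sum(k * k for k in range(1, K + 1))
    return np.hstack([cepstra, num / den])

frames = np.random.randn(50, 12)      # 50 frames of 12 cepstral coefficients
features = add_deltas(frames)         # shape (50, 24)
```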


International Conference on Acoustics, Speech, and Signal Processing | 2000

Robust out-of-vocabulary rejection for low-complexity speaker independent speech recognition

Charles C. Broun; William M. Campbell

With the increased use of speech recognition outside of the lab environment, the need for better out-of-vocabulary (OOV) rejection techniques is critical for the continued success of this user interface. Not only must future speech recognition systems accurately reject OOV utterances, but they must also maintain their performance in mismatched (i.e., noisy) conditions. In this paper, we extend our work on low-complexity, high-accuracy speaker-independent speech recognition. We present a novel rejection criterion that is shown to be robust in mismatched conditions. This technique continues our emphasis on speech recognition for resource-limited applications by providing a solution that is highly scalable, requiring no additional memory and no significant increase in computation. The technique is based on the use of multiple garbage models (on the order of 100 or more) and a novel ranking method to achieve robust performance. This method allows for a data-dependent approach in order to optimize the performance over each class individually. Results are presented for a large database consisting of 166 speakers and 131 classes. Out-of-class rejection is based on 118 out-of-vocabulary phrases and 3 categories of spurious inputs (breath noise, coughs, and lip smacks). Performance is shown to be superior to the approximated optimal Bayes reject rule.
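One plausible reading of a rank-based rejection scheme is sketched below: the best in-vocabulary score is ranked against the scores of the many garbage models, and the utterance is rejected when too many garbage models outscore it. The ranking statistic and threshold are assumptions for illustration; the paper's exact criterion is not reproduced here.

```python
# A hedged sketch of OOV rejection with multiple garbage models.
import numpy as np

def recognize_with_rejection(scores_vocab, scores_garbage, max_rank=5):
    """scores_vocab: scores for each in-vocabulary command model.
    scores_garbage: scores for each of the ~100 garbage models.
    Returns the winning class index, or None to reject as OOV."""
    best_class = int(np.argmax(scores_vocab))
    best_score = scores_vocab[best_class]
    # Count how many garbage models outscore the winning command.
    rank = int(np.sum(scores_garbage > best_score))
    # A data-dependent system would tune max_rank per class.
    return best_class if rank < max_rank else None

rng = np.random.default_rng(1)
cls = recognize_with_rejection(rng.standard_normal(131),
                               rng.standard_normal(100))
```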


International Conference on Acoustics, Speech, and Signal Processing | 2000

CipherVOX: scalable low-complexity speaker verification

Bruce Alan Fette; Charles C. Broun; William M. Campbell; Cynthia Ann Jaskie

Biometrics is gaining strong support for access control in industry. It is not uncommon for individual users to be faced with a half-dozen or more passwords and personal identification numbers (PINs) controlling access to the systems required for them to do their job. The ubiquity of passwords actually relaxes system security, since many users tend to use the same password across all applications, or collect the various passwords in a single location (perhaps a password-protected spreadsheet, or a piece of paper in a desk drawer). The latter case is of extreme concern, since security around the collection is much more easily compromised than that of the system. The use of biometrics not only recovers the ability to secure sensitive systems and data, but also does so in a user-friendly manner. In order to use biometrics successfully in server-based environments, several key concerns must be addressed. First, the authentication strategy must maintain acceptable levels of security. Second, the user community must accept the chosen biometric (unless there is a captive audience). The last major consideration is the scalability of the solution. In this paper we introduce CipherVOX, a speaker verification access control solution for server and standalone computing environments. We discuss the use of our polynomial-based classifier that combines high accuracy and low complexity via discriminative techniques, and give performance results for both a proprietary performance database and the standard YOHO database. We also review the challenges in designing for user acceptance, including the design of the speaker verification user interface, as well as the application programming interface (API).


Proceedings of SPIE | 2001

Multimodal Fusion of Polynomial Classifiers for Automatic Person Recognition

Charles C. Broun; Xiaozheng Zhang

With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current-generation speaker verification systems. The first is the difficulty in acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN in the audio domain over a range of signal-to-noise ratios.
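A minimal sketch of late (decision-level) integration, combining per-modality scores after independent classification. The sigmoid mapping and the fusion weight are assumptions; the paper derives its combination from a probabilistic model rather than this fixed weighting.

```python
# Late fusion of audio and visual classifier outputs (illustrative only).
import numpy as np

def fuse_late(audio_score, visual_score, alpha=0.7):
    """Combine per-modality scores into one accept/reject statistic.
    alpha weights the audio modality; it would be lowered as the audio
    SNR drops (e.g., under the AWGN conditions tested in the paper)."""
    p_a = 1.0 / (1.0 + np.exp(-audio_score))   # map scores to (0, 1)
    p_v = 1.0 / (1.0 + np.exp(-visual_score))
    # Weighted log-likelihood combination of the two modalities.
    return alpha * np.log(p_a) + (1.0 - alpha) * np.log(p_v)

accept = fuse_late(audio_score=1.3, visual_score=0.4) > np.log(0.5)
```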


Proceedings of SPIE | 1998

Audio sensors and low-complexity recognition technologies for soldier systems

William M. Campbell; Khaled T. Assaleh; Charles C. Broun

The use of audio sensors for soldier systems is examined in detail. We propose two applications, speech recognition and speaker verification. These technologies provide two functions for a soldier system. First, the soldier has hands-free operation of his system via voice command and control. Second, the soldier's identity can be verified to maintain the security integrity of the soldier system. A low-complexity, high-accuracy technology based upon averaging a discriminative classifier over time is presented and applied to these processes. For speaker verification of the soldier, the classifier is used in a text-prompted mode to authenticate the identity of the soldier. Once the soldier has been authenticated, the interface can then be navigated via voice commands in a speaker-independent manner. By using an artificial neural network structure, a high degree of accuracy can be obtained with low complexity. We show the resulting accuracy of the speaker verification technology. We also detail the simulation of the speech recognition under various noise conditions.


Applications and Science of Computational Intelligence | 1999

Low-complexity speaker authentication techniques using polynomial classifiers

William M. Campbell; Charles C. Broun

Modern authentication systems require high-accuracy, low-complexity methods. High accuracy ensures secure access to sensitive data. Low computational requirements produce high transaction rates for large authentication populations. We propose a polynomial-based classification system that combines high accuracy and low complexity using discriminative techniques. Traditionally, polynomial classifiers have been difficult to use for authentication because of either low accuracy or problems associated with large training sets. We detail a new training method that solves these problems. The new method achieves high accuracy by implementing discriminative classification between in-class and out-of-class feature sets. A separable approach to the problem enables the method to be applied to large data sets. Storage is reduced by eliminating redundant correlations in the in-class and out-of-class sets. We also show several new techniques that can be applied to balance prior probabilities and facilitate low-complexity retraining. We illustrate the method by applying it to the problem of speaker authentication using voice. We demonstrate the technique on a multisession speaker verification database collected over a one-month period. Using a third-order polynomial-based scheme, the new system gives less than one percent average equal error rate using only one minute of training data and less than five seconds of testing data per speaker.
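The sketch below shows the least-squares core of such discriminative training, with targets of one for in-class expansions and zero for out-of-class expansions. The correlation-redundancy elimination and prior balancing described in the abstract are omitted for brevity, and all names and shapes are illustrative assumptions.

```python
# A hedged sketch of discriminative polynomial-classifier training.
import numpy as np

def train_speaker(expanded_in, expanded_out, ridge=1e-6):
    """expanded_in/out: (N, D) polynomial-expanded frames for the target
    speaker and for the impostor population. Returns the model w solving
    the least-squares problem with 1/0 targets."""
    D = expanded_in.shape[1]
    # Accumulate the correlation matrix; the data itself never needs to be
    # held in memory at once, which is what makes large training sets
    # tractable in this style of training.
    R = expanded_in.T @ expanded_in + expanded_out.T @ expanded_out
    # Only in-class frames contribute to the right-hand side (targets 1).
    b = expanded_in.sum(axis=0)
    return np.linalg.solve(R + ridge * np.eye(D), b)

rng = np.random.default_rng(2)
w = train_speaker(rng.standard_normal((500, 91)),
                  rng.standard_normal((5000, 91)))
```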


International Conference on Acoustics, Speech, and Signal Processing | 2002

Visual speech feature extraction for improved speech recognition

Xiaozheng Zhang; Russell M. Mersereau; Mark A. Clements; Charles C. Broun

Collaboration


Dive into Charles C. Broun's collaborations.

Top Co-Authors

William M. Campbell, Massachusetts Institute of Technology
Xiaozheng Zhang, Georgia Institute of Technology
Mark A. Clements, Georgia Institute of Technology
Russell M. Mersereau, Georgia Institute of Technology