Network


Latest external collaboration at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Kevin Brady is active.

Publication


Featured research published by Kevin Brady.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Exploiting Nonacoustic Sensors for Speech Encoding

Thomas F. Quatieri; Kevin Brady; Dave Messing; Joseph P. Campbell; William M. Campbell; Michael S. Brandstein; Clifford J. Weinstein; John D. Tardelli; Paul D. Gatewood

The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present in the environment. Recent advances in nonacoustic sensors, including microwave radar, skin vibration, and bone conduction sensors, provide the exciting possibility of both glottal excitation and, more generally, vocal tract measurements that are relatively immune to acoustic disturbances and can supplement the acoustic speech waveform. We are currently investigating methods of combining the output of these sensors for use in low-rate encoding according to their capability in representing specific speech characteristics in different frequency bands. Nonacoustic sensors have the ability to reveal certain speech attributes lost in the noisy acoustic signal; for example, low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility gains on the diagnostic rhyme test (DRT) across a variety of environments over the government standard 2400-bps MELPe coder. By fusing quantized high-band 4-to-8-kHz speech, requiring only an additional 116 bps, we obtain further DRT performance gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.
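The core band-fusion idea, taking low-frequency content from the noise-immune sensor and higher-frequency content from the acoustic microphone, can be sketched as a toy spectral splice (an illustration only, not the paper's actual coder; the function name and cutoff are assumptions):

```python
import numpy as np

def fuse_bands(acoustic, nonacoustic, fs, cutoff_hz):
    """Toy band fusion: spectral content below cutoff_hz comes from the
    noise-immune (nonacoustic) sensor; content above it comes from the
    acoustic microphone."""
    n = len(acoustic)
    A = np.fft.rfft(acoustic)
    N = np.fft.rfft(nonacoustic)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    fused = np.where(freqs < cutoff_hz, N, A)
    return np.fft.irfft(fused, n)
```

In the actual system the fusion is done on coder parameters rather than raw waveforms, and the sensors are weighted by how well each represents a given band.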


international conference on acoustics, speech, and signal processing | 2005

Estimating and evaluating confidence for forensic speaker recognition

William M. Campbell; Douglas A. Reynolds; Joseph P. Campbell; Kevin Brady

Estimating and evaluating confidence has become a key aspect of the speaker recognition problem because of the increased use of this technology in forensic applications. We discuss evaluation measures for speaker recognition and some of their properties. We then propose a framework for confidence estimation based upon scores and meta-information, such as utterance duration, channel type, and SNR. The framework uses regression techniques with multilayer perceptrons to estimate confidence with a data-driven methodology. As an application, we show the use of the framework in a speaker comparison task drawn from the NIST 2000 evaluation. A relative comparison of different types of meta-information is given. We demonstrate that the new framework can give substantial improvements over standard distribution methods of estimating confidence.
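The score-plus-meta-information idea can be sketched with a logistic-regression confidence estimator (a simplified stand-in for the paper's multilayer perceptrons; the features, data, and learning rate here are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trials: features are [score, duration, snr]; the target is
# whether the recognition decision was correct (illustrative data only).
n = 2000
X = rng.normal(size=(n, 3))
p_true = 1 / (1 + np.exp(-(2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.8 * X[:, 2])))
y = rng.random(n) < p_true

# Fit a logistic model by gradient descent (stand-in for the MLP regressor).
w = np.zeros(3)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / n)
    b -= 0.5 * np.mean(p - y)

def confidence(score, duration, snr):
    """Data-driven confidence in [0, 1] for one trial's score and meta-info."""
    return 1 / (1 + np.exp(-(np.dot(w, [score, duration, snr]) + b)))
```

Higher scores should map to higher confidence, with duration and SNR modulating the estimate, which is the qualitative behavior the framework is after.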


acm multimedia | 2016

Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction

Kevin Brady; Youngjune Gwon; Pooya Khorrami; Elizabeth Godoy; William M. Campbell; Charlie K. Dagli; Thomas S. Huang

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio/Visual Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set with an achieved Concordance Correlation Coefficient (CCC) for arousal of 0.770 vs. 0.702 (baseline) and for valence of 0.687 vs. 0.638. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
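The evaluation metric quoted above, the Concordance Correlation Coefficient, has a closed form that is easy to compute directly:

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient, the AVEC evaluation metric:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    It equals 1 only for perfect agreement and, unlike Pearson correlation,
    penalizes bias and scale mismatch between prediction and ground truth."""
    mx, my = np.mean(x), np.mean(y)
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (np.var(x) + np.var(y) + (mx - my) ** 2)
```

A constant offset between predicted and true valence, for instance, lowers CCC even though Pearson correlation stays at 1, which is why the challenge uses it for time-continuous arousal and valence.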


international conference on acoustics, speech, and signal processing | 2004

Multisensor MELPe using parameter substitution

Kevin Brady; Thomas F. Quatieri; Joseph P. Campbell; William M. Campbell; Michael S. Brandstein; Clifford J. Weinstein

The estimation of speech parameters and the intelligibility of speech transmitted through low-rate coders, such as MELP (mixed excitation linear prediction), are severely degraded when there are high levels of acoustic noise in the speaking environment. The application of nonacoustic and nontraditional sensors, which are less sensitive to acoustic noise than the standard microphone, is being investigated as a means to address this problem. Sensors being investigated include the general electromagnetic motion sensor (GEMS) and the physiological microphone (P-mic). As an initial effort in this direction, a multisensor MELPe coder (MELP coder with the addition of a noise preprocessor) using parameter substitution has been developed, where pitch and voicing parameters are obtained from GEMS and P-Mic sensors, respectively, and the remaining parameters are obtained as usual from a standard acoustic microphone. This parameter substitution technique is shown to produce significant and promising DRT (diagnostic rhyme test) intelligibility improvements over the standard 2400 bps MELPe coder in several high-noise military environments. Further work is in progress aimed at utilizing the nontraditional sensors for additional intelligibility improvements and for more effective lower-rate coding in noise.
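The parameter-substitution scheme is conceptually simple: pitch comes from the GEMS sensor, voicing from the P-mic, and the remaining coder parameters from the acoustic microphone. A toy sketch (the frame fields are hypothetical names, not the real MELPe coder's API):

```python
def substitute_parameters(acoustic_frame, gems_pitch, pmic_voicing):
    """Toy MELPe parameter substitution: replace the noise-sensitive pitch
    and voicing estimates with sensor-derived values, keeping the remaining
    acoustic analysis parameters (e.g. spectrum, gain) unchanged."""
    fused = dict(acoustic_frame)
    fused["pitch"] = gems_pitch
    fused["voicing"] = pmic_voicing
    return fused
```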


international conference on image processing | 2016

How deep neural networks can improve emotion recognition on video data

Pooya Khorrami; Tom Le Paine; Kevin Brady; Charlie K. Dagli; Thomas S. Huang

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual Emotion Challenge (AV+EC 2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.
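The CNN+RNN pipeline reduces to: a CNN maps each video frame to a feature vector, and a recurrent network integrates those vectors over time into a per-frame emotion estimate. A minimal Elman-style sketch of the recurrent half (an illustration only; the paper's actual architecture and weights are not reproduced here):

```python
import numpy as np

def elman_rnn(frame_features, W_xh, W_hh, W_hy):
    """Minimal Elman RNN: each frame's CNN feature vector x updates a hidden
    state h, and a linear readout predicts the per-frame emotion value
    (e.g. valence). Stands in for the temporal model in the pipeline."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in frame_features:
        h = np.tanh(W_xh @ x + W_hh @ h)   # recurrent state update
        outputs.append(W_hy @ h)           # per-frame prediction
    return np.array(outputs)
```

Because the hidden state carries information across frames, the prediction at frame t can depend on earlier frames, which is the property the paper attributes to the RNN component.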


international conference on acoustics, speech, and signal processing | 2008

Multisensor very low bit rate speech coding using segment quantization

Alan McCree; Kevin Brady; Thomas F. Quatieri

We present two approaches to noise robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously-presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. One coder uses a 600 bps scalable phonetic vocoder, with a phonetic speech recognizer followed by joint predictive vector quantization of the error in wideband MELP parameters. The second coder operates at 300 bps with fixed 80 ms segments, using novel variable-rate multistage matrix quantization techniques. Formal test results show that both coders achieve equivalent intelligibility to the 2.4 kbps NATO standard MELPe coder in harsh acoustic noise environments, at much lower bit rates, with only modest quality loss.
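The multistage quantization idea behind the 300-bps coder can be sketched in a few lines: each stage quantizes the residual left by the previous one, so total distortion shrinks stage by stage (a toy scalar/vector illustration with made-up codebooks, not the paper's trained matrix quantizer):

```python
import numpy as np

def multistage_vq(x, codebooks):
    """Toy multistage vector quantization: stage k picks the codeword
    closest to the current residual, then passes the new residual on.
    Returns the chosen indices and the overall reconstruction."""
    residual = np.asarray(x, dtype=float)
    indices = []
    for cb in codebooks:
        dists = np.sum((cb - residual) ** 2, axis=1)
        i = int(np.argmin(dists))
        indices.append(i)
        residual = residual - cb[i]
    reconstruction = np.asarray(x, dtype=float) - residual
    return indices, reconstruction
```

Only the per-stage indices need to be transmitted, which is what makes the fixed-segment, multistage structure attractive at very low rates.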


intelligent robots and systems | 2000

Internet based manufacturing technology: intelligent remote teleoperation

Kevin Brady; Tzyh Jong Tarn

Teleoperation permits humans to maneuver robots from a distance. Thus, a human's ability to intelligently manipulate and inspect can be exercised in an otherwise inaccessible environment. As a result, teleoperation is gaining acceptance as a cost-effective way to work in remote, often hazardous environments. However, two significant challenges in designing such systems remain: 1. The communication link between the teleoperator and the telerobot's location is usually bandwidth limited and has time-varying delays. Research shows that even a fraction of a second of delay between generating a command and observing the corresponding action can seriously degrade the human operator's intuition. This, in turn, diminishes his or her effectiveness. 2. The human teleoperator must rely on artificial means to gain sensory information from the remote environment. This observation is always incomplete due to current bandwidth and sensor limitations, and it is received in a delayed fashion. The communication channel also limits the fidelity with which the human teleoperator can intervene in the remote environment. The research presented here addresses human/machine cooperation over a bandwidth-limited communication channel with time-varying delays. This cooperation is crucial for taking advantage of the automation's efficiency and the human operator's intelligence.


The International Journal of Robotics Research | 1998

A Modular Telerobotic Architecture for Waste Retrieval and Remediation

Kevin Brady; Tzyh Jong Tarn; Ning Xi; Lonnie J. Love; Peter D. Lloyd; Barry L. Burks; Hurley Davis

The Oak Ridge National Laboratory (ORNL) has developed and deployed a telerobotic approach for the remote retrieval of hazardous and radioactive wastes from underground storage tanks. The telerobotic system, built by Spar Aerospace Ltd., is capable of dislodging and removing sludge and gravel-like wastes without endangering the human operators through direct contact with the environment. Working in partnership with Washington University, ORNL is implementing an event-based planner/Function-Based Sharing Controller (FBSC) as an integral part of their overall telerobotic architecture. These aspects of the system enable the seamless union of the human operator and an autonomous controller in such a way as to emphasize safety without loss of performance. The cooperation between ORNL, Spar, and Washington University requires an open and modular control software architecture to enable the parallel development of various components of the system. ControlShell has been used as the underlying software architecture to help meet these criteria of generality and modularity.


ieee international conference on technologies for homeland security | 2011

Face recognition despite missing information

Charlie K. Dagli; Kevin Brady; Daniel C. Halbert

Missing or degraded information continues to be a significant practical challenge facing automatic face representation and recognition. Generally, existing approaches seek either to generatively invert the degradation process or find discriminative representations that are immune to it. Ideally, the solution to this problem exists between these two perspectives. To this end, in this paper we show the efficacy of using probabilistic linear subspace models (in particular variational probabilistic PCA) for both modeling and recognizing facial data under disguise or occlusion. From a discriminative perspective, we verify the efficacy of this approach for attenuating the effect of missing data due to disguise and non-linear speculars in several verification experiments. From a generative view, we show its usefulness in not only estimating missing information, but also understanding facial covariates for image reconstruction. In addition, we present a least-squares connection to the maximum likelihood solution under missing data and show its intuitive connection to the geometry of the subspace learning problem.
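The least-squares connection mentioned above can be illustrated concretely: given a linear subspace model, fit the latent coefficients using only the observed pixels, then fill in the missing ones from the subspace (a minimal sketch under a noiseless linear model, not the full variational probabilistic PCA treatment):

```python
import numpy as np

def reconstruct_missing(x, observed_mask, W, mu):
    """Least-squares reconstruction from a linear subspace model:
    solve W[observed] @ z ~= x[observed] - mu[observed] for the latent
    coefficients z, then reconstruct the whole vector as mu + W @ z.
    Missing entries of x are never read."""
    Wo = W[observed_mask]
    z, *_ = np.linalg.lstsq(Wo, x[observed_mask] - mu[observed_mask], rcond=None)
    return mu + W @ z
```

When the data truly lie in the subspace and enough pixels are observed, this recovers the occluded entries exactly; probabilistic PCA adds a noise model and a prior on z, regularizing the same fit.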


international conference on acoustics, speech, and signal processing | 2007

An Evaluation of Audio-Visual Person Recognition on the XM2VTS Corpus using the Lausanne Protocols

Kevin Brady; Michael S. Brandstein; Thomas F. Quatieri; Bob Dunn

A multimodal person recognition architecture has been developed for the purpose of improving overall recognition performance and for addressing channel-specific performance shortfalls. This multimodal architecture includes the fusion of a face recognition system with the MIT/LL GMM/UBM speaker recognition architecture. This architecture exploits the complementary and redundant nature of the face and speech modalities. The resulting multimodal architecture has been evaluated on the XM2VTS corpus using the Lausanne open set verification protocols, and demonstrates excellent recognition performance. The multimodal architecture also exhibits strong recognition performance gains over the performance of the individual modalities.
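A common baseline for this kind of score-level fusion, normalizing each modality's scores before a weighted sum so the two scales are comparable, can be sketched as follows (the weight and normalization are illustrative assumptions; the paper does not publish its exact fusion rule):

```python
import numpy as np

def znorm(scores):
    """Normalize one modality's scores to zero mean, unit variance."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

def fuse(face_scores, speaker_scores, w_face=0.5):
    """Weighted sum fusion of z-normalized face and speaker scores."""
    return w_face * znorm(face_scores) + (1 - w_face) * znorm(speaker_scores)
```

Because the modalities fail under different conditions (channel, lighting, noise), even this simple fusion typically beats either recognizer alone, which is the complementarity the abstract describes.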

Collaboration


Dive into Kevin Brady's collaboration.

Top Co-Authors

Thomas F. Quatieri, Massachusetts Institute of Technology
William M. Campbell, Massachusetts Institute of Technology
Joseph P. Campbell, Massachusetts Institute of Technology
Michael S. Brandstein, Massachusetts Institute of Technology
Charlie K. Dagli, Massachusetts Institute of Technology
Clifford J. Weinstein, Massachusetts Institute of Technology
Tzyh Jong Tarn, Washington University in St. Louis
Alan McCree, Massachusetts Institute of Technology
Douglas A. Reynolds, Massachusetts Institute of Technology