Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michael A. Carlin is active.

Publication


Featured research published by Michael A. Carlin.


PLOS Computational Biology | 2013

Sustained firing of model central auditory neurons yields a discriminative spectro-temporal representation for natural sounds.

Michael A. Carlin; Mounya Elhilali

The processing characteristics of neurons in the central auditory system are directly shaped by and reflect the statistics of natural acoustic environments, but the principles that govern the relationship between natural sound ensembles and observed responses in neurophysiological studies remain unclear. In particular, accumulating evidence suggests the presence of a code based on sustained neural firing rates, where central auditory neurons exhibit strong, persistent responses to their preferred stimuli. Such a strategy can indicate the presence of ongoing sounds, is involved in parsing complex auditory scenes, and may play a role in matching neural dynamics to varying time scales in acoustic signals. In this paper, we describe a computational framework for exploring the influence of a code based on sustained firing rates on the shape of the spectro-temporal receptive field (STRF), a linear kernel that maps a spectro-temporal acoustic stimulus to the instantaneous firing rate of a central auditory neuron. We demonstrate the emergence of richly structured STRFs that capture the structure of natural sounds over a wide range of timescales, and show how the emergent ensembles resemble those commonly reported in physiological studies. Furthermore, we compare ensembles that optimize a sustained firing code with one that optimizes a sparse code, another widely considered coding strategy, and suggest how the resulting population responses are not mutually exclusive. Finally, we demonstrate how the emergent ensembles contour the high-energy spectro-temporal modulations of natural sounds, forming a discriminative representation that captures the full range of modulation statistics that characterize natural sound ensembles. These findings have direct implications for our understanding of how sensory systems encode the informative components of natural stimuli and potentially facilitate multi-sensory integration.
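
The abstract defines the STRF as a linear kernel mapping a spectro-temporal stimulus to an instantaneous firing rate. As a rough illustration of that mapping only (not the paper's sustained-firing optimization), the sketch below applies a hypothetical Gabor-like STRF to a spectrogram by linear filtering followed by half-wave rectification; all shapes, kernels, and inputs here are assumptions for demonstration.

```python
# Minimal sketch: predicting an instantaneous firing rate from a spectrogram with a
# linear STRF, i.e. r(t) = max(0, sum_{f,tau} STRF(f,tau) * S(f, t - tau)).
# The STRF below is an arbitrary Gabor-like kernel, NOT one learned by the paper's objective.
import numpy as np

def strf_response(spectrogram, strf):
    """spectrogram: (n_freq, n_time); strf: (n_freq, n_lags). Returns a rate per frame."""
    n_freq, n_time = spectrogram.shape
    _, n_lags = strf.shape
    rates = np.zeros(n_time)
    for t in range(n_lags, n_time):
        window = spectrogram[:, t - n_lags:t]      # recent spectro-temporal history
        rates[t] = np.sum(strf * window)           # linear filtering
    return np.maximum(rates, 0.0)                  # half-wave rectification

# Toy example with random inputs standing in for a real auditory spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((64, 500))                       # 64 frequency channels, 500 time frames
freqs = np.linspace(-1, 1, 64)[:, None]
lags = np.linspace(0, 1, 20)[None, :]
strf = np.exp(-freqs**2 / 0.1) * np.cos(2 * np.pi * 3 * lags)   # separable Gabor-like kernel
print(strf_response(spec, strf).shape)
```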


IEEE Aerospace Conference | 2007

Detection of Speaker Change Points in Conversational Speech

Michael A. Carlin; Brett Y. Smolenski

An important preprocessing step in many automatic speech segmentation and speaker clustering systems is the accurate detection of speaker change points, the times when one speaker stops talking and another begins. However, this becomes very difficult in conversational speech since utterance lengths can be extremely short, speaker changes occur frequently, speakers may talk over one another (co-channel interference), and the recording environment and/or communication channel is sub-optimal or degraded. Modern aviation systems can benefit from this research as a pre-processing stage in a variety of applications. Examples include automatic segmentation and clustering of pilot/air traffic controller communications, detection of a third or unauthorized speaker in commercial airline cockpits, and automatic transcription of cockpit audio recordings. This research presents an approach to detecting speaker change points using information obtained from voiced speech segments. This permits taking advantage of the facts that (1) speaker starting and stopping information should be contained between segments of voiced speech and (2) voiced speech contains the most useful speaker identifiable information. The technique presented here shows promise as an enhancement to currently available change point detection algorithms.
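
The abstract does not spell out the distance measure used between voiced segments, so the sketch below is only a generic illustration of the idea: summarize each voiced segment with simple feature statistics (hypothetical per-segment MFCC means) and flag a candidate speaker change where adjacent segments differ strongly. It is not the detector described in the paper.

```python
# Generic sketch of change-point scoring between adjacent voiced segments.
# Each segment is summarized by the mean of its feature vectors (e.g., MFCCs),
# and a large distance between neighboring segments marks a candidate speaker change.
import numpy as np

def change_point_scores(segments):
    """segments: list of (n_frames_i, n_features) arrays from consecutive voiced regions."""
    means = [seg.mean(axis=0) for seg in segments]
    return [float(np.linalg.norm(curr - prev)) for prev, curr in zip(means[:-1], means[1:])]

def detect_changes(segments, threshold):
    scores = change_point_scores(segments)
    # A change point is hypothesized at the boundary following segment index i.
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy data: three segments from "speaker A" and two from "speaker B" (random placeholders).
rng = np.random.default_rng(1)
spk_a = [rng.normal(0.0, 1.0, (50, 13)) for _ in range(3)]
spk_b = [rng.normal(3.0, 1.0, (50, 13)) for _ in range(2)]
print(detect_changes(spk_a + spk_b, threshold=5.0))   # expect a change after segment index 2
```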


IEEE Transactions on Audio, Speech, and Language Processing | 2015

A framework for speech activity detection using adaptive auditory receptive fields

Michael A. Carlin; Mounya Elhilali

One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectro-temporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
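
The paper detects speech activity from the responses of an ensemble of STRFs. The sketch below shows one plausible frame-level pipeline under that idea: filter a spectrogram with several kernels, rectify and pool the responses, and threshold. The random kernels, mean pooling over frequency, and threshold are placeholders, not the task-adapted ensemble from the paper.

```python
# Rough sketch of frame-level speech activity detection from an ensemble of
# spectro-temporal filters: filter, rectify, pool across the ensemble, threshold.
import numpy as np
from scipy.signal import correlate2d

def ensemble_activity(spectrogram, strfs):
    """spectrogram: (n_freq, n_time); strfs: iterable of (n_freq, n_lags) kernels."""
    pooled = np.zeros(spectrogram.shape[1])
    for strf in strfs:
        resp = correlate2d(spectrogram, strf, mode="same")   # (n_freq, n_time) response map
        pooled += np.maximum(resp, 0.0).mean(axis=0)         # rectify, average over frequency
    return pooled / len(strfs)

def detect_speech(spectrogram, strfs, threshold):
    return ensemble_activity(spectrogram, strfs) > threshold  # boolean speech/nonspeech per frame

# Toy usage with random inputs standing in for a spectrogram and an adapted ensemble.
rng = np.random.default_rng(2)
spec = rng.random((32, 200))
strfs = [rng.normal(size=(32, 10)) for _ in range(8)]
print(detect_speech(spec, strfs, threshold=ensemble_activity(spec, strfs).mean()).sum())
```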


Frontiers in Computational Neuroscience | 2015

Modeling attention-driven plasticity in auditory cortical receptive fields.

Michael A. Carlin; Mounya Elhilali

To navigate complex acoustic environments, listeners adapt neural processes to focus on behaviorally relevant sounds in the acoustic foreground while minimizing the impact of distractors in the background, an ability referred to as top-down selective attention. Particularly striking examples of attention-driven plasticity have been reported in primary auditory cortex via dynamic reshaping of spectro-temporal receptive fields (STRFs). By enhancing the neural response to features of the foreground while suppressing those to the background, STRFs can act as adaptive contrast matched filters that directly contribute to an improved cognitive segregation between behaviorally relevant and irrelevant sounds. In this study, we propose a novel discriminative framework for modeling attention-driven plasticity of STRFs in primary auditory cortex. The model describes a general strategy for cortical plasticity via an optimization that maximizes discriminability between the foreground and distractors while maintaining a degree of stability in the cortical representation. The first instantiation of the model describes a form of feature-based attention and yields STRF adaptation patterns consistent with a contrast matched filter previously reported in neurophysiological studies. An extension of the model captures a form of object-based attention, where top-down signals act on an abstracted representation of the sensory input characterized in the modulation domain. The object-based model makes explicit predictions in line with limited neurophysiological data currently available but can be readily evaluated experimentally. Finally, we draw parallels between the model and anatomical circuits reported to be engaged during active attention. The proposed model strongly suggests an interpretation of attention-driven plasticity as a discriminative adaptation operating at the level of sensory cortex, in line with similar strategies previously described across different sensory modalities.
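
The model is described as an optimization that maximizes discriminability between the foreground and distractors while maintaining stability of the cortical representation. A toy version of that trade-off (my formulation, not the paper's exact objective) is sketched below as gradient ascent on a single linear receptive field w: raise its response energy to foreground patches, lower it for background patches, and penalize drift from the pre-attentive field w0.

```python
# Toy gradient-ascent sketch of a discriminability-plus-stability trade-off:
#   J(w) = mean (w.x_fg)^2 - mean (w.x_bg)^2 - lam * ||w - w0||^2
# An illustrative stand-in for the paper's discriminative STRF adaptation model.
import numpy as np

def adapt_receptive_field(w0, fg, bg, lam=0.1, lr=1e-3, n_steps=200):
    """w0: (d,) initial field; fg, bg: (n, d) foreground/background stimulus patches."""
    w = w0.copy()
    for _ in range(n_steps):
        grad_fg = 2.0 * (fg @ w) @ fg / len(fg)   # pushes responses to the foreground up
        grad_bg = 2.0 * (bg @ w) @ bg / len(bg)   # pushes responses to distractors down
        grad_stab = 2.0 * lam * (w - w0)          # keeps the field near its original shape
        w += lr * (grad_fg - grad_bg - grad_stab)
    return w

# Toy usage: foreground patches lie along one direction, background along another.
rng = np.random.default_rng(3)
d = 16
fg = rng.normal(size=(500, d)) + 3.0 * np.eye(d)[0]
bg = rng.normal(size=(500, d)) + 3.0 * np.eye(d)[1]
w_adapted = adapt_receptive_field(0.1 * rng.normal(size=d), fg, bg)
print(np.round(w_adapted[:4], 2))                 # weight grows along the foreground direction
```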


Conference on Information Sciences and Systems | 2011

Exploiting temporal coherence in speech for data-driven feature extraction

Michael A. Carlin; Mounya Elhilali

It is well known that speech sounds evolve at multiple timescales over the course of tens to hundreds of milliseconds. Such temporal modulations are crucial for speech perception and are believed to directly influence the underlying code for representing acoustic stimuli. The present work seeks to explicitly quantify this relationship using the principle of temporal coherence. Here we show that by constraining the outputs of model linear neurons to be highly correlated over timescales relevant to speech, we observe the emergence of neural response fields that are bandpass, localized, and reflective of the rich spectro-temporal structure present in speech. The emergent response fields also appear to share qualitative similarities with those observed in auditory neurophysiology. Importantly, learning is accomplished using unlabeled speech data, and the emergent neural properties characterize the spectro-temporal statistics of the input well. We analyze the characteristics and coverage of ensembles of learned response fields for a variety of timescales, and suggest uses of such a coherence learning framework for common speech tasks.
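
The central constraint here is that the outputs of linear model neurons should be strongly correlated over speech-relevant timescales. One simple reading of that constraint (an assumption, not the paper's learning algorithm) is to score a linear projection by the lagged correlation of its own output over a chosen window of frames, as sketched below.

```python
# Minimal sketch of a temporal-coherence score for a linear "model neuron":
# project spectrogram frames through a weight vector and measure how correlated
# the output is with a lagged copy of itself. Higher scores mean slower, more
# coherent responses over the chosen timescale. Illustrative only.
import numpy as np

def coherence_score(frames, w, lag):
    """frames: (n_time, n_features); w: (n_features,); lag in frames (the timescale)."""
    y = frames @ w                                   # neuron output over time
    a, b = y[:-lag] - y[:-lag].mean(), y[lag:] - y[lag:].mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy comparison: a projection that reads slowly varying channels vs. noisy channels.
rng = np.random.default_rng(4)
n_time, n_feat, lag = 400, 48, 10
slow = np.cumsum(rng.normal(size=(n_time, 1)), axis=0)        # slowly varying latent signal
fast = rng.normal(size=(n_time, n_feat // 2))                 # fast, incoherent noise channels
frames = np.hstack([slow + 0.1 * rng.normal(size=(n_time, n_feat // 2)), fast])
w_slow = np.concatenate([np.ones(n_feat // 2), np.zeros(n_feat // 2)])   # reads slow channels
w_fast = np.concatenate([np.zeros(n_feat // 2), np.ones(n_feat // 2)])   # reads noise channels
print(coherence_score(frames, w_slow, lag), coherence_score(frames, w_fast, lag))
```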


International Conference on Acoustics, Speech, and Signal Processing | 2009

Perturbation and pitch normalization as enhancements to speaker recognition

Aaron Lawson; M. Linderman; Matthew R. Leonard; Allen Stauffer; B. B. Pokines; Michael A. Carlin

This study proposes an approach to improving speaker recognition through the process of minute vocal tract length perturbation of training files, coupled with pitch normalization for both train and test data. The notion of perturbation as a method for improving the robustness of training data for supervised classification is taken from the field of optical character recognition, where distorting characters within a certain range has shown strong improvements across disparate conditions. This paper demonstrates that acoustic perturbation, in this case analysis, distortion, and resynthesis of vocal tract length for a given speaker, significantly improves speaker recognition when the resulting files are used to augment or replace the training data. A pitch length normalization technique is also discussed, which is combined with perturbation to improve open-set speaker recognition from an EER of 20% to 6.7%.
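
The paper perturbs vocal tract length by analysis, distortion, and resynthesis of the training audio; those tools are not reproduced here. As a loose, simplified analogue only, the sketch below applies a linear frequency-axis warp to a magnitude spectrogram, a common way to approximate vocal tract length perturbation; the warp factors and spectrogram shape are assumptions.

```python
# Simplified stand-in for vocal tract length perturbation: warp the frequency axis of a
# magnitude spectrogram by a factor alpha. The paper instead performs perturbation by
# analysis/resynthesis of the waveform itself; this is only an illustrative approximation.
import numpy as np

def vtlp_warp(spectrogram, alpha):
    """spectrogram: (n_freq, n_time) magnitudes; returns a frequency-warped copy."""
    n_freq, n_time = spectrogram.shape
    src_bins = np.arange(n_freq)
    warped_bins = np.clip(src_bins * alpha, 0, n_freq - 1)   # where each output bin samples from
    warped = np.empty_like(spectrogram)
    for t in range(n_time):
        warped[:, t] = np.interp(warped_bins, src_bins, spectrogram[:, t])
    return warped

def perturbed_training_set(spectrogram, alphas=(0.95, 1.0, 1.05)):
    """Augment one training example with small warps around the original (alpha = 1)."""
    return [vtlp_warp(spectrogram, a) for a in alphas]

# Toy usage on a random "spectrogram".
rng = np.random.default_rng(5)
augmented = perturbed_training_set(rng.random((128, 300)))
print(len(augmented), augmented[0].shape)
```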


Journal of the Acoustical Society of America | 2008

Estimation of target‐to‐interferer ratio using the Auditory Image Model

Michael A. Carlin

If the target-to-interferer ratio (TIR) is large, then accurate recognition results can still be achieved. During phonation, estimation of TIR is especially critical since uncorrupted vowel sounds contain important speaker-discriminating information. This research investigates a method to estimate the relative intensity of interfering speech using the Auditory Image Model (AIM) of Patterson et al. (J. Acoust. Soc. Am., Vol. 98, pp. 1890-1894, 1995). The proposed TIR estimator attempts to exploit both the apparent high resolution in the simulated Neural Activity Pattern and variation in cross-channel Strobe Point correlation when observing overlapping vowel sounds. Experiments were conducted for five canonical male vowels which were perceptually scaled using the STRAIGHT algorithm (Chapter in Speech Separation by Humans and Machines, P. Divenyi, ed., Kluwer Academic Publishers, 2005) and superimposed at varying levels of TIR. Results suggest that the proposed approach is a promising step towards detecting both the presence and relative intensity of an interfering speaker.
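
For reference, the quantity being estimated, the target-to-interferer ratio, is the energy ratio of the target and interfering signals expressed in decibels. The sketch below computes it directly when the two signals are known separately, i.e. the oracle value the AIM-based estimator is meant to recover from the mixture alone; the sinusoidal signals are placeholders for the vowel stimuli.

```python
# Oracle target-to-interferer ratio (TIR) in dB, computable when the target and
# interferer are available separately. The paper's estimator sees only the mixture
# and tries to recover this value from AIM-derived auditory features.
import numpy as np

def tir_db(target, interferer, eps=1e-12):
    """target, interferer: 1-D waveform arrays of equal length."""
    target_energy = float(np.sum(target.astype(np.float64) ** 2))
    interferer_energy = float(np.sum(interferer.astype(np.float64) ** 2))
    return 10.0 * np.log10((target_energy + eps) / (interferer_energy + eps))

# Toy usage: two synthetic "vowels" (sinusoids) mixed at a chosen level.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 220 * t)              # stand-in for the target vowel
interferer = 0.5 * np.sin(2 * np.pi * 170 * t)    # stand-in for the interfering vowel
mixture = target + interferer                     # what the estimator would actually observe
print(round(tir_db(target, interferer), 1))       # about +6.0 dB
```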


Conference of the International Speech Communication Association | 2011

Rapid Evaluation of Speech Representations for Spoken Term Discovery.

Michael A. Carlin; Samuel Thomas; Aren Jansen; Hynek Hermansky


Conference of the International Speech Communication Association | 2012

Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation.

Michael A. Carlin; Nicolas Malyska; Thomas F. Quatieri


Conference of the International Speech Communication Association | 2006

Unsupervised detection of whispered speech in the presence of normal phonation.

Michael A. Carlin; Brett Y. Smolenski; Stanley J. Wenndt

Collaboration


Dive into Michael A. Carlin's collaboration.

Top Co-Authors

Allen Stauffer (University of Texas at Dallas)
Aren Jansen (Johns Hopkins University)
Kailash Patil (Johns Hopkins University)
M. Linderman (Air Force Research Laboratory)
Matthew R. Leonard (University of Texas at Dallas)