Karen Payton
University of Massachusetts Dartmouth
Publication
Featured research published by Karen Payton.
Journal of the Acoustical Society of America | 1994
Karen Payton; Rosalie M. Uchanski; Louis D. Braida
The effect of articulating clearly on speech intelligibility is analyzed for ten normal-hearing and two hearing-impaired listeners in noisy, reverberant, and combined environments. Clear speech is more intelligible than conversational speech for each listener in every environment. The difference in intelligibility due to speaking style increases as noise and/or reverberation increase. The average difference in intelligibility is 20 percentage points for the normal-hearing listeners and 26 percentage points for the hearing-impaired listeners. Two predictors of intelligibility are used to quantify the environmental degradations: the articulation index (AI) and the speech transmission index (STI). Both reliably predict performance levels within a speaking style for normal-hearing listeners. The AI is unable to represent the reduction in intelligibility scores due to reverberation for the hearing-impaired listeners. Neither predictor can account for the difference in intelligibility due to speaking style.
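The classic AI computation referenced above reduces to an importance-weighted average of clipped band audibilities. A minimal sketch, assuming equal placeholder band weights and the 30-dB dynamic-range rule (not the exact weights or band structure used in the study):

```python
# Illustrative, simplified Articulation Index (AI) calculation.
# The weights below are placeholders, not values from any standard.

def articulation_index(band_snrs_db, weights):
    """AI as an importance-weighted average of band audibilities,
    with each band SNR clipped to the 0-30 dB range."""
    assert len(band_snrs_db) == len(weights)
    ai = 0.0
    for snr, w in zip(band_snrs_db, weights):
        audibility = min(max(snr, 0.0), 30.0) / 30.0
        ai += w * audibility
    return ai

# Equal weights across five octave bands (placeholder values).
weights = [0.2] * 5
print(articulation_index([30, 15, 0, -10, 45], weights))
```

SNRs above 30 dB contribute no more than 30 dB, and inaudible bands contribute nothing, which is why the AI saturates and cannot register style-related gains once bands are fully audible.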
Journal of the Acoustical Society of America | 1999
Karen Payton; Louis D. Braida
A method for computing the speech transmission index (STI) using real speech stimuli is presented and evaluated. The method reduces the effects of some of the artifacts that can be encountered when speech waveforms are used as probe stimuli. Speech-based STIs are computed for conversational and clearly articulated speech in several noisy, reverberant, and noisy-reverberant environments and compared with speech intelligibility scores. The results indicate that, for each speaking style, the speech-based STI values are monotonically related to intelligibility scores for the degraded speech conditions tested. Therefore, the STI can be computed using speech probe waveforms and the values of the resulting indices are as good predictors of intelligibility scores as those derived from MTFs by theoretical methods.
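Whether the modulation transfer functions (MTFs) come from theory or from speech probes, the final step is the same MTF-to-STI conversion: each modulation index is mapped to an effective SNR, clipped, rescaled to a transmission index, and combined across bands. A sketch of that conversion, with illustrative band weights rather than the standardized values:

```python
# Sketch of the standard MTF-to-STI conversion.  Band-importance
# weights are illustrative placeholders, not values from the paper.
import math

def mtf_to_sti(band_mtfs, weights):
    """band_mtfs: one list of modulation indices m (0 < m < 1)
    per octave band."""
    mtis = []
    for mtf in band_mtfs:
        tis = []
        for m in mtf:
            snr = 10.0 * math.log10(m / (1.0 - m))  # effective SNR (dB)
            snr = min(max(snr, -15.0), 15.0)        # clip to +/-15 dB
            tis.append((snr + 15.0) / 30.0)         # transmission index
        mtis.append(sum(tis) / len(tis))
    return sum(w * mti for w, mti in zip(weights, mtis))

# Two bands: near-perfect transmission vs heavy envelope smearing.
print(mtf_to_sti([[0.99], [0.1]], [0.5, 0.5]))
```

A modulation index near 1 saturates at a transmission index of 1; an index of 0.1 contributes roughly 0.18, pulling the weighted STI down accordingly.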
Journal of the Acoustical Society of America | 1988
Karen Payton
A model of peripheral auditory processing that incorporates processing steps describing the conversion from the acoustic pressure‐wave signal at the eardrum to the time course activity in auditory neurons has been developed. It can process arbitrary time domain waveforms and yield the probability of neural firing. The model consists of a concatenation of modules, one for each anatomical section of the periphery. All modules are based on published algorithms and current experimental data, except that the basilar membrane is assumed to be linear. The responses of this model to vowels alone and vowels in noise are compared to neural population responses, as determined by the temporal and average rate response measures of Sachs and Young [J. Acoust. Soc. Am. 66, 470–479, (1979)] and Young and Sachs [J. Acoust. Soc. Am. 66, 1381–1403, (1979)]. Despite the exclusion of nonlinear membrane mechanics, the model accurately predicts the vowel formant representations in the average localized synchronized rate (ALSR) ...
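The modular structure described above, a cascade from eardrum pressure to firing probability, can be illustrated with toy stages. Every stage below is a crude placeholder standing in for the published per-module algorithms, kept only to show the concatenation pattern:

```python
# Conceptual sketch of a modular peripheral-auditory pipeline: an
# arbitrary time-domain waveform passes through a cascade of stages,
# ending in a per-sample firing probability.  All stages are toys.
import math

def middle_ear(x):                # placeholder: unity gain
    return list(x)

def basilar_membrane(x, b=0.9):   # placeholder: linear one-pole filter
    y, prev = [], 0.0
    for s in x:
        prev = s - b * prev       # crude high-frequency emphasis
        y.append(prev)
    return y

def hair_cell(x):                 # half-wave rectification + compression
    return [math.sqrt(s) if s > 0 else 0.0 for s in x]

def firing_probability(x):        # squash drive into [0, 1)
    return [1.0 - math.exp(-s) for s in x]

def periphery(x):
    for stage in (middle_ear, basilar_membrane, hair_cell,
                  firing_probability):
        x = stage(x)
    return x

probs = periphery([math.sin(2 * math.pi * 0.05 * n) for n in range(100)])
print(all(0.0 <= p < 1.0 for p in probs))
```

The point of the concatenation is that each anatomical section can be swapped independently, which is how the model isolates the effect of assuming a linear basilar membrane.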
Journal of the Acoustical Society of America | 2013
Karen Payton; Mona Shrestha
Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.
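The short-time Theoretical STI used as the comparison baseline rests on the well-known MTF model for a stationary-noise plus exponential-reverberation channel (Houtgast and Steeneken). A minimal sketch of that formula:

```python
# Theoretical MTF for stationary noise plus exponential reverberation:
#   m(F) = 1/sqrt(1 + (2*pi*F*T/13.8)^2) * 1/(1 + 10^(-SNR/10)),
# with F the modulation frequency (Hz), T the reverberation time (s),
# and SNR the octave-band signal-to-noise ratio (dB).
import math

def theoretical_mtf(mod_freq_hz, rt60_s, snr_db):
    reverb = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

# No degradation -> m approaches 1; noise + reverberation lowers m.
print(theoretical_mtf(1.0, 0.0, 60.0))   # near 1
print(theoretical_mtf(4.0, 0.6, 0.0))    # well below 1
```

Evaluating this per analysis window with short-time band SNRs yields the short-time Theoretical STI that the ER metric is shown to track for windows down to 0.3 s.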
Workshop on Applications of Signal Processing to Audio and Acoustics | 2009
Keith Gilbert; Karen Payton
This paper proposes a method to simultaneously estimate the number, pitches, and relative locations of individual speech sources within instantaneous and non-instantaneous linear mixtures containing additive white Gaussian noise. The algorithm makes no assumptions about the number of sources or the number of sensors, and is therefore applicable to over-, under-, and precisely-determined scenarios. The method is hypothesis-based and employs a power-spectrum-based FIR filter derived from probability distributions of speech pitch harmonics. This harmonic windowing function (HWF) dramatically improves time-difference of arrival (TDOA) estimates over standard cross-correlation for low SNR. The pitch estimation component of the algorithm implicitly performs voiced-region detection and does not require prior knowledge about voicing. Cumulative pitch and TDOA estimates from the HWF form the basis for robust source enumeration across a wide range of SNR.
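The standard cross-correlation baseline that the HWF improves on picks the lag maximizing the correlation between two sensor signals. A pure-Python sketch on synthetic signals (real use would add windowing or spectral weighting such as the HWF itself):

```python
# Time-difference-of-arrival (TDOA) by plain cross-correlation,
# the baseline the harmonic windowing function improves on.

def tdoa_xcorr(x, y, max_lag):
    """Return the lag (in samples) at which y best matches a delayed x."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[n] * y[n + lag]
                  for n in range(len(x))
                  if 0 <= n + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

# y is x delayed by 3 samples:
x = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0]
print(tdoa_xcorr(x, y, 5))
```

At low SNR the correlation peak of this bare estimator is easily swamped by noise, which is the failure mode the harmonic-weighted filter addresses.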
Journal of the Acoustical Society of America | 2008
Karen Payton; Mona Shrestha
Various methods have been shown to compute the Speech Transmission Index (STI) using speech as a probe stimulus (Goldsworthy & Greenberg, J. Acoust. Soc. Am., 116, 3679‐3689, 2004). Frequency‐domain methods, while accurate at predicting the long‐term STI, cannot predict short‐term changes due to fluctuating backgrounds. Time‐domain methods also work well on long speech segments and have the added potential to be used for short‐time analysis. This study investigates the accuracy of two time‐domain STI methods: envelope regression (ER) and normalized correlation (NC), as functions of window length, in various acoustically degraded environments with multiple talkers and speaking styles. Short‐time STIs are compared with a short‐time Theoretical STI, derived from octave‐band signal‐to‐noise ratios and reverberation times. For windows as short as 0.3 s, the ER and NC Methods track the short‐time Theoretical STI and both the Theoretical and ER Methods converge to the long‐term result for windows greater than 4 ...
IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis | 1998
Ashok Ramasubramanian; Karen Payton; Antonio Costa
Loudness recruitment is a symptom of sensorineural hearing loss, affecting the inner ear, in which the threshold of hearing is raised. The increase of the hearing threshold is often nonuniform across the range of audible frequencies, and loudness perceptions are distorted. One method used to compensate for this type of hearing disorder is amplitude compression followed by equalization. In the present study, we compared amplitude compression in the discrete domain via two methods: (1) conventional two-channel amplitude compression, as proposed by Villchur, and (2) a wavelet-based compression scheme using three levels of decomposition/reconstruction with the DB-9 wavelet. Both of these algorithms were tested on normal-hearing subjects with elevated thresholds simulated by masking noise. The subjects were tested using nonsense sentences, and the merit of each scheme was determined from the percentage of correctly identified keywords relative to linear amplification.
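The two-channel idea, splitting the signal into bands and compressing each band's level separately, can be sketched as follows. The filters, ratios, and static (no attack/release) gain law below are toy placeholders, not the parameters used in the study:

```python
# Sketch of two-channel amplitude compression: split into a low and
# a high band, compress each band's envelope with its own ratio,
# and recombine.  All components here are illustrative toys.
import math

def one_pole_lowpass(x, a=0.1):
    y, state = [], 0.0
    for s in x:
        state += a * (s - state)
        y.append(state)
    return y

def compress_band(band, ratio, eps=1e-6):
    """Static compression: output level grows as input^(1/ratio)."""
    env = one_pole_lowpass([abs(s) for s in band])
    out = []
    for s, e in zip(band, env):
        gain = (e + eps) ** (1.0 / ratio - 1.0)  # boosts quiet passages
        out.append(s * gain)
    return out

def two_channel_compressor(x, lo_ratio=2.0, hi_ratio=3.0):
    lo = one_pole_lowpass(x)
    hi = [s - l for s, l in zip(x, lo)]          # crude complementary split
    return [a + b for a, b in zip(compress_band(lo, lo_ratio),
                                  compress_band(hi, hi_ratio))]

y = two_channel_compressor([math.sin(0.2 * n) for n in range(200)])
print(len(y))
```

The wavelet variant replaces the two-band split with a multilevel decomposition/reconstruction, compressing each subband before resynthesis.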
Asilomar Conference on Signals, Systems and Computers | 2014
Keith Gilbert; Karen Payton
This paper proposes to enhance the blind source separation (BSS) solution by running multiple BSS algorithms in parallel and blending the outputs to produce a set of source estimates that is at least as good as any individual method, and potentially better. Although the method is applicable to more general BSS problems, the proposed blending method is described in the case of instantaneous mixtures of stationary, zero-mean, unit-variance, white sources. Experimental results show that the method is able to select a best set of sources with respect to minimum mutual information from an input consisting of source estimates.
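The selection step can be illustrated with a cheap stand-in for mutual information. The sketch below scores each candidate source set by its summed squared pairwise sample correlations and keeps the least-dependent set; this proxy and the helper names are assumptions for illustration, not the paper's actual criterion:

```python
# Sketch of the blending idea: given candidate source-estimate sets
# from different BSS algorithms, keep the set whose components look
# most statistically independent (correlation used as a rough proxy
# for mutual information).

def pairwise_dependence(sources):
    """Sum of squared correlation coefficients over all source pairs."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        da = sum((u - ma) ** 2 for u in a) ** 0.5
        db = sum((v - mb) ** 2 for v in b) ** 0.5
        return num / (da * db)
    total = 0.0
    for i in range(len(sources)):
        for j in range(i + 1, len(sources)):
            total += corr(sources[i], sources[j]) ** 2
    return total

def best_candidate(candidates):
    """Pick the candidate source set with least residual dependence."""
    return min(candidates, key=pairwise_dependence)

good = [[1, -1, 1, -1, 1, -1], [1, 1, -1, -1, 1, 1]]  # near-uncorrelated
bad = [[1, -1, 1, -1, 1, -1], [1, -1, 1, -1, -1, 1]]  # partly mixed
print(best_candidate([bad, good]) is good)
```

Because the selection only ever discards worse candidates, the blended output is at least as good as the best individual algorithm under the chosen criterion.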
Journal of the Acoustical Society of America | 2001
Peninah Fine Rosengard; Louis D. Braida; Karen Payton
Relations between objective intelligibility scores, subjective pleasantness ratings, and estimates of the STI for speech processed by multi‐band amplitude compression systems were studied in normal‐hearing listeners with simulated hearing loss. STI estimates were based on modulation spectrum changes in the processed speech signals [Payton and Braida, J. Acoust. Soc. Am. 106, 3637–3648 (1999)]. Linear amplification and two syllabic compression conditions were tested with and without two backgrounds: Speech‐spectrum noise and restaurant babble. Signals were compressed independently in four nonoverlapping frequency bands with compression ratios of two and three, and attack and release times of 20 and 200 ms, respectively. The NAL‐R formula determined output frequency‐gain characteristics. Flat, 50 dB, sensorineural hearing losses were simulated in normal‐hearing listeners via multiband expansion [Duchnowski and Zurek, J. Acoust. Soc. Am. 98, 3170–3181 (1995)]. Speech intelligibility and pleasantness ratings ...
Journal of the Acoustical Society of America | 1995
Karen Payton; Louis D. Braida
The speech transmission index (STI) is highly correlated with speech intelligibility scores when the environment is degraded by noise and/or reverberation and/or the listener’s hearing is impaired [e.g., Payton et al., J. Acoust. Soc. Am. 95, 1581–1592 (1994)]. The STI is typically computed from modulation transfer functions (MTFs) that are determined theoretically, based on effective SNR, or on measurements using the RASTI procedure. In principle, however, MTFs can be computed directly from speech envelope spectra. For the current study, envelope spectra were computed for both conversational and clearly articulated speech. Three environments were considered: quiet/anechoic, reverberant (0.6 s RT), and additive noise (0 dB SNR). Results indicate that reliable MTFs can be computed from speech envelope spectra if the coherence function is used to limit the range of modulation frequencies (to reduce the effects of computational artifacts). Also, while MTFs for the two speaking styles are very similar in addi...