
Publications


Featured research published by Douglas O’Shaughnessy.


Journal of the Acoustical Society of America | 1983

Linguistic modality effects on fundamental frequency in speech

Douglas O’Shaughnessy; Jonathan Allen

This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.
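The "norm" referenced above is the gradual F0 declination across an utterance. As a rough illustration only, not the authors' 1983 procedure, here is a minimal Python sketch of measuring a syllable's emphasis deviation as its peak F0 excursion from a fitted declination baseline; the function name and input conventions are our own assumptions:

```python
# Hypothetical sketch: emphasis deviation as F0 excursion from a
# linear declination fit (not the authors' actual measurement code).
import numpy as np

def emphasis_deviations(f0, syllable_spans):
    """Peak F0 deviation from a declination baseline, per syllable.

    f0: 1-D array of F0 values in Hz, with 0 marking unvoiced frames.
    syllable_spans: list of (start, end) frame-index pairs, one per syllable.
    """
    voiced = np.nonzero(f0 > 0)[0]
    # Fit the "norm": a gradually falling line over the voiced frames.
    slope, intercept = np.polyfit(voiced, f0[voiced], deg=1)
    baseline = slope * np.arange(len(f0)) + intercept
    deviations = []
    for start, end in syllable_spans:
        idx = np.arange(start, end)
        idx = idx[f0[idx] > 0]
        # Peak excursion above (rise) or below (sharp fall) the baseline.
        dev = float(np.max(np.abs(f0[idx] - baseline[idx]))) if idx.size else 0.0
        deviations.append(dev)
    return deviations
```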


Journal of the Acoustical Society of America | 1996

Critique: Speech perception: acoustic or articulatory?

Douglas O’Shaughnessy

In this review of John Ohala’s paper ("Speech perception is hearing sounds, not tongues"), we are sympathetic to his position that acoustic properties have largely driven the selection of sounds found in speech production. It is very reasonable that listeners need not refer to articulation during speech perception. However, like many theories of human perception, this position is difficult (if not impossible) to verify experimentally. The arguments Ohala uses to attack the articulatory theories of speech perception do not really disprove them; instead, they simply argue that an acoustically based theory is more plausible. When dealing with plausibility arguments, one must be careful to weigh the evidence for different positions. Thus this paper comments on the weight of Ohala’s arguments and considers whether any theory of the fundamental basis of speech perception can actually be proven or disproven. It also discusses speech perception from the communication point of view.


non-linear speech processing | 2013

Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition

Md. Jahangir Alam; Patrick Kenny; Douglas O’Shaughnessy

In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation features for a large vocabulary continuous speech recognition (LVCSR) task. The SNEO estimates the energy required to produce the AM-FM signal, and the estimated energy is then separated into its amplitude and frequency components using an energy separation algorithm (ESA). Similar to the PNCC (power normalized cepstral coefficients) front-end, medium duration power bias subtraction (MDPBS) is used to enhance the AM power spectrum. The performance of the proposed feature extractor is evaluated, in the context of speech recognition, on the AURORA-4 corpus, which represents additive noise and channel mismatch conditions. The ETSI advanced front-end (ETSI-AFE), power normalized cepstral coefficients (PNCC), cochlear filterbank cepstral coefficients (CFCC), and conventional MFCC and PLP features are used for comparison purposes. Experimental speech recognition results on the AURORA-4 task show that the proposed method is robust against both additive noise and microphone channel mismatch.
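To make the front-end's core operations concrete, here is a minimal sketch, under our own assumptions rather than the paper's exact implementation, of the Teager-Kaiser energy operator, its smoothed (SNEO) variant, and DESA-1 energy separation into amplitude and frequency components. The paper applies these per filterbank subband; a single band is shown here for brevity, and the window length is illustrative:

```python
# Sketch of SNEO + DESA-1 energy separation (illustrative, not the
# paper's exact front-end).
import numpy as np

def teager(x):
    """Teager-Kaiser energy operator: psi[x](n) = x(n)^2 - x(n-1)x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros(len(x))
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def sneo(x, win_len=5):
    """Smoothed nonlinear energy operator: Teager energy smoothed by a window."""
    w = np.hamming(win_len)
    return np.convolve(teager(x), w / w.sum(), mode="same")

def desa1(x, eps=1e-12):
    """DESA-1 separation: returns (amplitude envelope, inst. frequency in rad)."""
    psi_x = sneo(x)
    y = np.diff(x, prepend=x[0])                    # y(n) = x(n) - x(n-1)
    psi_y = sneo(y)
    ratio = 1.0 - (psi_y + np.roll(psi_y, -1)) / (4.0 * psi_x + eps)
    omega = np.arccos(np.clip(ratio, -1.0, 1.0))    # instantaneous frequency
    amp = np.sqrt(np.abs(psi_x) / (np.sin(omega) ** 2 + eps))  # AM envelope
    return amp, omega
```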


Soft Computing | 2011

Real-Time Bayesian Inference: A Soft Computing Approach to Environmental Learning for On-Line Robust Automatic Speech Recognition

Foezur Rahman Chowdhury; Sid-Ahmed Selouani; Douglas O’Shaughnessy

In this paper, we developed soft computing models for on-line automatic speech recognition (ASR) based on Bayesian on-line inference techniques. Bayesian on-line inference for change point detection (BOCPD) is tested for on-line environmental learning using highly non-stationary noisy speech samples from the Aurora2 speech database. Significant improvement in predicting and adapting to new acoustic conditions is obtained for highly non-stationary noises. The simulation results show that the Bayesian on-line inference-based soft computing approach is a viable solution for on-line ASR in real-time applications.
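The BOCPD recursion (Adams and MacKay, 2007) maintains a posterior over the "run length" since the last change point. A minimal sketch with a constant hazard rate and Gaussian observations of known variance, our simplification rather than the paper's full noise-tracking setup:

```python
# Sketch of Bayesian on-line change-point detection (BOCPD) with a
# constant hazard and Gaussian observations; parameters are illustrative.
import numpy as np

def bocpd(data, hazard=0.01, mu0=0.0, var0=1.0, var_x=1.0):
    """Return the run-length posterior R[t, r] after each observation."""
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, var = np.array([mu0]), np.array([var0])
    for t, x in enumerate(data):
        # Predictive probability of x under each run-length hypothesis.
        pred_var = var + var_x
        pred = np.exp(-0.5 * (x - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        growth = R[t, : t + 1] * pred * (1 - hazard)   # run continues
        cp = np.sum(R[t, : t + 1] * pred * hazard)     # change point occurs
        R[t + 1, 1 : t + 2] = growth
        R[t + 1, 0] = cp
        R[t + 1] /= R[t + 1].sum() + 1e-300            # normalize posterior
        # Conjugate Gaussian update for each surviving run length.
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R
```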


international conference on signal processing | 2009

A Comparative Study of Blind Speech Separation Using Subspace Methods and Higher Order Statistics

Yasmina Benabderrahmane; Sid-Ahmed Selouani; Douglas O’Shaughnessy; Habib Hamam

In this paper we report the results of a comparative study of blind speech signal separation approaches. Three algorithms, Oriented Principal Component Analysis (OPCA), Higher Order Statistics (HOS), and Fast Independent Component Analysis (Fast-ICA), are objectively compared in terms of the signal-to-interference ratio criterion. The results of experiments carried out using the TIMIT and AURORA speech databases show that OPCA outperforms the other techniques. It turns out that OPCA can blindly separate temporal signals from their linear mixtures without the need for a pre-whitening step.
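The signal-to-interference ratio used for such comparisons projects the separated estimate onto the true source and measures the residual. A minimal sketch in the spirit of BSS-Eval, simplified to a single interference term rather than the full artifact decomposition:

```python
# Sketch of the SIR criterion for evaluating source separation.
import numpy as np

def sir_db(estimate, source):
    """SIR in dB between a separated estimate and its true source."""
    source = source / (np.linalg.norm(source) + 1e-12)
    s_target = np.dot(estimate, source) * source   # part explained by the source
    e_interf = estimate - s_target                 # everything else
    return 10 * np.log10(np.sum(s_target**2) / (np.sum(e_interf**2) + 1e-12))
```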


Journal of the Acoustical Society of America | 1992

Acoustical analysis of false starts in spontaneous speech

Douglas O’Shaughnessy

In spontaneous speech many false starts occur, where a speaker interrupts the flow of speech to restart the utterance. The acoustic aspects of such restarts in a common database were examined for duration and fundamental frequency. Automatically identifying the type of restart could improve speech recognition performance by eliminating one version of any repeated words (or parts) and, in the case of changed words, suppressing the unwanted words, so that the recognizer operates only on desired words. In virtually all current recognizers, words in a restart either simply pass to the textual component of the recognizer or cause difficulties of interpretation in the language-model component (since the language model is invariably trained only on fluent text). The spoken data consist of 42 speakers, each speaking about 30 different utterances (median length of about 12 words). There were 60 occasions with simple repeated words (or portions), 30 cases of inserted words, and 25 occurrences of n...


international workshop on ambient assisted living | 2014

Automatic emotion recognition from cochlear implant-like spectrally reduced speech

Jahangir Alam; Yazid Attabi; Patrick Kenny; Pierre Dumouchel; Douglas O’Shaughnessy

In this paper we study the performance of emotion recognition from cochlear implant-like spectrally reduced speech (SRS) using conventional Mel-frequency cepstral coefficients and a Gaussian mixture model (GMM)-based classifier. Cochlear implant-like SRS of each utterance from the emotional speech corpus is obtained only from low-bandwidth subband temporal envelopes of the corresponding original utterance. The resulting utterances have less spectral information than the original utterances but contain the most relevant information for emotion recognition. The emotion classes are trained on Mel-frequency cepstral coefficient (MFCC) features extracted from the SRS signals, and classification is performed using MFCC features computed from the test SRS signals. In order to evaluate the performance of the SRS-MFCC features, emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. Conventional MFCC, Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC), PLP (perceptual linear prediction), and amplitude modulation cepstral coefficient (AMCC) features extracted from the original signals are used for comparison purposes. Experimental results show that the SRS-MFCC features outperformed all other features in terms of emotion recognition accuracy. Average relative improvements over all baseline systems are 1.5% and 11.6% in terms of unweighted average recall and weighted average recall, respectively.
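The MFCC + GMM pipeline described here is standard: one GMM per emotion class trained on pooled frames, with test utterances assigned to the class giving the highest average frame log-likelihood. A minimal sketch assuming librosa and scikit-learn; the file lists and parameter values are hypothetical, not the paper's configuration:

```python
# Sketch of a per-class GMM emotion classifier over MFCC frames
# (illustrative; not the paper's exact system or settings).
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, n_mfcc=13):
    """Load a wav file and return its MFCC matrix, shape (frames, coeffs)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_class_gmms(files_by_emotion, n_components=32):
    """Fit one diagonal-covariance GMM per emotion on pooled MFCC frames."""
    gmms = {}
    for emotion, files in files_by_emotion.items():
        X = np.vstack([mfcc_frames(f) for f in files])
        gmms[emotion] = GaussianMixture(n_components, covariance_type="diag").fit(X)
    return gmms

def classify(path, gmms):
    """Pick the class whose GMM gives the highest mean frame log-likelihood."""
    X = mfcc_frames(path)
    return max(gmms, key=lambda e: gmms[e].score(X))
```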


Archive | 2011

Convolutive Blind Separation of Speech Mixtures Using Auditory-Based Subband Model

Sid-Ahmed Selouani; Yasmina Benabderrahmane; Abderraouf Ben Salem; Habib Hamam; Douglas O’Shaughnessy

A new blind speech separation (BSS) method for convolutive mixtures is presented. This method uses a sample-by-sample algorithm to perform subband decomposition by mimicking the processing performed by the human ear. The unknown source signals are separated by maximizing the entropy of a transformed set of signal mixtures through the use of a gradient ascent algorithm. Experimental results show the efficiency of the proposed approach in terms of the signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ) criteria. Compared to the fullband method that uses the Infomax algorithm and to convolutive fast independent component analysis (C-FICA), our method achieves a better PESQ score and a significant improvement in SIR for different sensor locations.
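The entropy-maximization step is in the Infomax family. A minimal sketch of the natural-gradient Infomax update (Bell-Sejnowski with Amari's natural gradient), shown fullband for brevity where the paper operates within auditory-motivated subbands; the learning rate and epoch count are illustrative:

```python
# Sketch of natural-gradient Infomax for instantaneous BSS
# (the paper's method applies this per auditory subband).
import numpy as np

def infomax_separate(X, lr=1e-3, epochs=100):
    """X: (n_sources, n_samples) mixed signals; returns unmixing matrix W."""
    n = X.shape[0]
    W = np.eye(n)
    for _ in range(epochs):
        U = W @ X
        Y = 1.0 / (1.0 + np.exp(-U))   # logistic nonlinearity
        # Natural-gradient entropy-maximization update, averaged over samples:
        # dW = lr * (I + (1 - 2*y) u^T) W
        W += lr * (np.eye(n) + (1 - 2 * Y) @ U.T / X.shape[1]) @ W
    return W
```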


Journal of the Acoustical Society of America | 2010

Dealing with noise in automatic speech recognition.

Douglas O’Shaughnessy

While automatic speech recognition (ASR) can work very well for clean speech, recognition accuracy often degrades significantly when the speech signal is subject to corruption, as occurs in many communication channels. This paper will survey recent methods for handling various distortions in practical ASR. The problem is often presented as an issue of mismatch between the models that are created during prior training phases and unforeseen environmental acoustic conditions that occur during the normal test phase. As one can never anticipate all possible future conditions, ASR analysis must be able to adapt to a wide variety of distortions. Human listeners furnish a useful standard of comparison for ASR in that humans are much more flexible in handling unexpected acoustic distortions than current ASR is. Methods that adapt ASR features and models will be compared against ASR methods that enhance the noisy input speech. Other topics to be discussed will include estimation of noise and channel parameters, RAS...
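Among the feature-adaptation methods such a survey covers, cepstral mean and variance normalization (CMVN) is perhaps the simplest representative; it is our example here, not one the abstract names. It removes stationary channel and noise offsets from ASR features:

```python
# Sketch of per-utterance cepstral mean and variance normalization
# (a standard feature-compensation baseline, given as an illustration).
import numpy as np

def cmvn(features, eps=1e-8):
    """Normalize a (frames, coeffs) cepstral matrix to zero mean, unit variance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```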


international conference on signal processing | 2009

Robust Speech Enhancement Using Two-Stage Filtered Minima Controlled Recursive Averaging

Negar Ghourchian; Sid-Ahmed Selouani; Douglas O’Shaughnessy

In this paper we propose an algorithm for estimating noise in highly non-stationary noisy environments, a challenging problem in speech enhancement. The method is based on minima-controlled recursive averaging (MCRA), which provides accurate, robust, and efficient noise power spectrum estimation. We propose a two-stage technique to prevent the appearance of musical noise after enhancement. In the first stage, the algorithm filters the noisy speech to obtain a robust signal with minimal distortion. Subsequently, it estimates the residual noise using MCRA and removes it with spectral subtraction. The performance of the proposed filtered MCRA (FMCRA) is evaluated using objective tests on the Aurora database under various noisy environments. These measures indicate higher output SNR and lower residual noise and distortion.
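The MCRA idea is to update the noise estimate only where a speech-presence test, based on the ratio of the smoothed spectrum to its tracked minimum, judges speech absent. A minimal sketch of that recursion followed by spectral subtraction; the parameter values are illustrative, and a real MCRA tracks minima over a finite window so the floor can rise:

```python
# Sketch of MCRA noise estimation + spectral subtraction, per frequency
# bin (simplified; not the paper's two-stage FMCRA implementation).
import numpy as np

def mcra_denoise(power_spec, alpha=0.95, alpha_s=0.8, delta=5.0, beta=1.0):
    """power_spec: (frames, bins) noisy power spectra. Returns cleaned spectra."""
    noise = power_spec[0].copy()
    smoothed = power_spec[0].copy()
    spec_min = power_spec[0].copy()
    out = np.empty_like(power_spec)
    for t in range(power_spec.shape[0]):
        smoothed = alpha_s * smoothed + (1 - alpha_s) * power_spec[t]
        spec_min = np.minimum(spec_min, smoothed)        # track spectral minima
        speech = smoothed / (spec_min + 1e-12) > delta   # speech-presence decision
        # Recursively average into the noise estimate only in noise-only bins.
        noise = np.where(speech, noise, alpha * noise + (1 - alpha) * power_spec[t])
        # Spectral subtraction with a small spectral floor to limit musical noise.
        out[t] = np.maximum(power_spec[t] - beta * noise, 0.05 * power_spec[t])
    return out
```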

Collaboration

Top co-authors of Douglas O’Shaughnessy:

Habib Hamam (Université de Moncton)
Yasmina Benabderrahmane (Institut national de la recherche scientifique)
Pierre Dumouchel (École de technologie supérieure)