Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Seyed Omid Sadjadi is active.

Publication


Featured research published by Seyed Omid Sadjadi.


IEEE Signal Processing Letters | 2013

Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux

Seyed Omid Sadjadi; John H. L. Hansen

Effective speech activity detection (SAD) is a necessary first step for robust speech applications. In this letter, we propose a robust and unsupervised SAD solution for audio-based surveillance and monitoring applications that leverages four different speech voicing measures combined with a perceptual spectral flux feature. Effectiveness of the proposed technique is evaluated and compared against several commonly adopted unsupervised SAD methods under simulated and actual harsh acoustic conditions with varying distortion levels. Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
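The full voicing-plus-flux front-end is not specified here, but the spectral flux component can be sketched as a frame-to-frame distance between normalized magnitude spectra. This is an illustrative variant only; the perceptual weighting used in the paper is omitted.

```python
import numpy as np

def spectral_flux(frames, n_fft=512):
    """Frame-to-frame spectral flux: L2 distance between successive
    normalized magnitude spectra (illustrative, without the paper's
    perceptual weighting)."""
    mags = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Normalize each frame's spectrum so flux reflects spectral shape
    # change rather than energy change.
    mags /= np.maximum(mags.sum(axis=1, keepdims=True), 1e-12)
    diff = np.diff(mags, axis=0)
    return np.sqrt((diff ** 2).sum(axis=1))

# Toy check: identical tone frames give zero flux; a tone-to-noise
# transition produces a large value.
rng = np.random.default_rng(0)
t = np.arange(400) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
frames = np.stack([tone, tone, rng.standard_normal(400)])
flux = spectral_flux(frames)
```

In a SAD context, a sustained rise in flux combined with low voicing evidence would point toward non-speech noise, while voiced speech keeps the flux comparatively structured.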


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions

Seyed Omid Sadjadi; John H. L. Hansen

It is well known that MFCC based speaker identification (SID) systems easily break down under mismatched training and test conditions. One such mismatch occurs when a SID system is trained on anechoic speech data, while testing is carried out using reverberant data collected via a distant microphone. In this study, a new set of feature parameters based on the Hilbert envelope of Gammatone filterbank outputs is proposed to improve SID performance in the presence of room reverberation. Considering two distinct perceptual effects of reverberation on speech signals, i.e., coloration and long-term reverberation, two different compensation strategies are integrated within the feature extraction framework to effectively suppress the effects of reverberation. Experimental evaluation is performed using speech material from the TIMIT corpus, four different measured room impulse responses (RIR) from the Aachen impulse response (AIR) database, and a GMM-based SID system. Obtained results indicate significant improvement over the baseline system with MFCCs plus cepstral mean subtraction (CMS), confirming the effectiveness of the proposed feature parameters for SID under reverberant mismatched conditions.
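The core operation the abstract describes, the Hilbert envelope of a subband signal, can be sketched as the magnitude of the analytic signal of a band-pass filtered waveform. A Butterworth band-pass stands in for a Gammatone channel here, which is an assumption of this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_hilbert_envelope(x, fs, band):
    """Hilbert envelope of one subband. A Butterworth band-pass is a
    simple stand-in for a Gammatone channel."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    sub = sosfiltfilt(sos, x)
    return np.abs(hilbert(sub))  # magnitude of the analytic signal

# An amplitude-modulated tone: the envelope should track the modulator.
fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = modulator * np.sin(2 * np.pi * 1000 * t)
env = subband_hilbert_envelope(x, fs, (800, 1200))
```

The envelope discards the fast carrier and keeps the slow amplitude modulation, which is the part of the signal the abstract argues is robust to reverberation after suitable compensation.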


International Conference on Intelligent Transportation Systems (ITSC) | 2012

Leveraging sensor information from portable devices towards automatic driving maneuver recognition

Amardeep Sathyanarayana; Seyed Omid Sadjadi; John H. L. Hansen

With the proliferation of smart portable devices, more people have started using them within the vehicular environment while driving. Although these smart devices provide a variety of useful information, using them while driving significantly affects the driver's attention towards the road. This can in turn cause driver distraction and lead to increased risk of crashes. On the positive side, these devices are equipped with powerful sensors which can be effectively utilized towards driver behavior analysis and safety. This study evaluates the effectiveness of portable sensor information in driver assistance systems. Available signals from the CAN-bus are compared with those extracted from an off-the-shelf portable device for recognizing patterns in driving sub-tasks and maneuvers. Through our analysis, a qualitative feature set is identified with which portable devices could be employed to prune the search space in recognizing driving maneuvers and possible instances of driver distraction. An absolute improvement of 15% is achieved with portable sensor information compared to CAN-bus signals, which motivates further study of portable devices to build driver behavior models for driver assistance systems.


Speech Communication | 2015

Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification

Seyed Omid Sadjadi; John H. L. Hansen

Adverse noisy conditions pose great challenges to automatic speech applications including speaker and language identification (SID and LID), where mel-frequency cepstral coefficients (MFCC) are the most commonly adopted acoustic features. Although systems trained using MFCCs provide competitive performance under matched conditions, it is well-known that such systems are susceptible to acoustic mismatch between training and test conditions due to noise and channel degradations. Motivated by this fact, this study proposes an alternative noise-robust acoustic feature front-end that is capable of capturing speaker identity as well as language structure/content conveyed in the speech signal. Specifically, a feature extraction procedure inspired by human auditory processing is proposed. The proposed feature is based on the Hilbert envelope of Gammatone filterbank outputs that represent the envelope of the auditory nerve response. The subband amplitude modulations, which are captured through smoothed Hilbert envelopes (a.k.a. temporal envelopes), carry useful acoustic information and have been shown to be robust to signal degradations. Effectiveness of the proposed front-end, which is entitled mean Hilbert envelope coefficients (MHEC), is evaluated in the context of SID and LID tasks using degraded speech material from the DARPA Robust Automatic Transcription of Speech (RATS) program. In addition, we investigate the impact of the dynamic range compression stage in the MHEC feature extraction process on performance using logarithmic and power-law nonlinearities. Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law nonlinearity consistently yields the best performance across different conditions for both SID and LID tasks.
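The pipeline the abstract outlines can be sketched end to end: per-band Hilbert envelopes, temporal smoothing, frame-wise mean, dynamic range compression, and a DCT over bands. The filterbank, smoothing constant, frame sizes, and power-law exponent below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, lfilter
from scipy.fft import dct

def mhec_like(x, fs, bands, frame=200, hop=80, power=0.1):
    """Simplified MHEC-style features: per-band smoothed Hilbert
    envelopes -> frame means -> power-law compression -> DCT over
    bands. All constants here are illustrative assumptions."""
    feats = []
    b, a = [0.05], [1.0, -0.95]  # one-pole low-pass envelope smoother
    for lo, hi in bands:
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))
        env = lfilter(b, a, env)
        # Mean envelope per frame.
        n = (len(env) - frame) // hop + 1
        feats.append([env[i * hop:i * hop + frame].mean() for i in range(n)])
    E = np.array(feats).T                 # (frames, bands)
    E = np.maximum(E, 1e-12) ** power     # power-law compression
    return dct(E, type=2, axis=1, norm="ortho")

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 1500 * t)
C = mhec_like(x, fs, bands=[(300, 700), (700, 1100), (1100, 1900)])
```

Swapping the `** power` step for `np.log` gives the logarithmic compression variant the abstract compares against; the paper reports the power-law version as the consistently better choice.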


arXiv: Sound | 2016

The IBM 2016 Speaker Recognition System

Seyed Omid Sadjadi; Sriram Ganapathy; Jason W. Pelecanos

In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is formulated to alleviate some of the limitations associated with the conventional linear discriminant analysis (LDA) that assumes Gaussian class-conditional distributions, 2) the application of speaker- and channel-adapted features, which are derived from an automatic speech recognition (ASR) system, for speaker recognition, and 3) the use of a deep neural network (DNN) acoustic model with a large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 speaker recognition evaluation (SRE) extended core conditions involving telephone and microphone trials. Experimental results indicate that: 1) the NDA is more effective (up to 35% relative improvement in terms of EER) than the traditional parametric LDA for speaker recognition, 2) when compared to raw acoustic features (e.g., MFCCs), the ASR speaker-adapted features provide gains in speaker recognition performance, and 3) increasing the number of output units in the DNN acoustic model (i.e., increasing the senone set size from 2k to 10k) provides consistent improvements in performance (for example from 37% to 57% relative EER gains over our baseline GMM i-vector system). To our knowledge, results reported in this paper represent the best performances published to date on the NIST SRE 2010 extended core tasks.
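The relative EER gains quoted above follow the conventional formula: the fraction of the baseline error removed by the new system. The numbers below are hypothetical and only illustrate the arithmetic.

```python
def relative_gain(eer_baseline, eer_system):
    """Relative EER improvement as commonly reported: the fraction of
    the baseline error that the new system removes."""
    return (eer_baseline - eer_system) / eer_baseline

# Hypothetical example: a baseline EER of 2.0% reduced to 1.3%
# corresponds to a 35% relative improvement.
gain = relative_gain(2.0, 1.3)
```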


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Overlapped-speech detection with applications to driver assessment for in-vehicle active safety systems

Navid Shokouhi; Amardeep Sathyanarayana; Seyed Omid Sadjadi; John H. L. Hansen

In this study we propose a system for overlapped-speech detection. Spectral harmonicity and envelope features are extracted to represent overlapped and single-speaker speech using Gaussian mixture models (GMM). The system is shown to effectively discriminate the single and overlapped speech classes. We further increase the discrimination by proposing a phoneme selection scheme to generate more reliable artificial overlapped data for model training. Evaluations on artificially generated co-channel data show that the novelty in feature selection and phoneme omission results in a relative improvement of 10% in detection accuracy compared to the baseline. As an example application, we evaluate the effectiveness of overlapped-speech detection for vehicular environments and its potential in assessing driver alertness. Results indicate a good correlation between driver performance and the amount and location of overlapped-speech segments.
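The two-class GMM framing described above can be illustrated with a likelihood-ratio detector. The 2-D synthetic features below are not the paper's harmonicity/envelope features; they only show the modeling pattern.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic 2-D "features": single-speaker frames cluster near the
# origin, overlapped frames are shifted (purely illustrative data).
single = rng.normal(0.0, 1.0, size=(500, 2))
overlap = rng.normal(3.0, 1.0, size=(500, 2))

gmm_single = GaussianMixture(n_components=4, random_state=0).fit(single)
gmm_overlap = GaussianMixture(n_components=4, random_state=0).fit(overlap)

def is_overlapped(frames):
    # Per-frame log-likelihood ratio test between the two class models.
    return gmm_overlap.score_samples(frames) > gmm_single.score_samples(frames)

test = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                  rng.normal(3.0, 1.0, size=(100, 2))])
pred = is_overlapped(test)
acc = (pred == np.r_[np.zeros(100, bool), np.ones(100, bool)]).mean()
```

The paper's phoneme selection scheme would act one step earlier, choosing which segments are mixed to synthesize the overlapped training set for `gmm_overlap`.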


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Blind spectral weighting for robust speaker identification under reverberation mismatch

Seyed Omid Sadjadi; John H. L. Hansen

Room reverberation poses various deleterious effects on the performance of automatic speech systems. Speaker identification (SID) performance, in particular, degrades rapidly as reverberation time increases. Reverberation causes two forms of spectro-temporal distortions on speech signals: i) self-masking, which is due to early reflections, and ii) overlap-masking, which is due to late reverberation. The overlap-masking effect of reverberation has been shown to have a greater adverse impact on the performance of speech systems. Motivated by this fact, this study proposes a blind spectral weighting (BSW) technique for suppressing the reverberation overlap-masking effect on SID systems. The technique is blind in the sense that prior knowledge of neither the anechoic signal nor the room impulse response is required. Performance of the proposed technique is evaluated on speaker verification tasks under simulated and actual reverberant mismatched conditions. Evaluations are conducted in the context of the conventional GMM-UBM as well as the state-of-the-art i-vector based systems. The GMM-UBM experiments are performed using speech material from a new data corpus well suited for speaker verification experiments under actual reverberant mismatched conditions, entitled MultiRoom8. The i-vector experiments are carried out with microphone (interview and phonecall) data from the NIST SRE 2010 extended evaluation set which are digitally convolved with three different measured room impulse responses extracted from the Aachen impulse response (AIR) database. Experimental results show that incorporating the proposed blind technique into the standard MFCC feature extraction framework yields significant improvement in SID performance under reverberation mismatch.
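The abstract does not give the weighting rule itself. One common blind approach estimates late-reverberant energy as a delayed, decayed copy of the power spectrogram and applies a Wiener-like gain; the sketch below follows that assumption and is not necessarily the paper's exact rule.

```python
import numpy as np

def blind_spectral_weighting(P, delay=5, decay=0.6):
    """Wiener-like weighting of a power spectrogram P (frames x bins).
    Late-reverberant energy is estimated blindly as a delayed, decayed
    copy of P (an illustrative estimator; delay/decay are assumptions)."""
    late = np.zeros_like(P)
    late[delay:] = decay * P[:-delay]
    # More weight where direct-path energy dominates the late estimate.
    gain = P / (P + late + 1e-12)
    return gain * P

rng = np.random.default_rng(2)
P = rng.random((50, 257)) + 0.1
W = blind_spectral_weighting(P)
```

Because the estimator needs only the observed spectrogram, the scheme stays blind: no anechoic reference or room impulse response is required, matching the constraint stated in the abstract.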


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

A comparison of front-end compensation strategies for robust LVCSR under room reverberation and increased vocal effort

Seyed Omid Sadjadi; Hynek Boril; John H. L. Hansen

Automatic speech recognition is known to deteriorate in the presence of room reverberation and variation of vocal effort in speakers. This study considers the robustness of several state-of-the-art front-end feature extraction and normalization strategies to these sources of speech signal variability in the context of large vocabulary continuous speech recognition (LVCSR). A speech database recorded in an anechoic room, capturing modal speech and speech produced at different levels of vocal effort, is reverberated using measured room impulse responses and utilized in the evaluations. It is shown that the combination of the recently introduced mean Hilbert envelope coefficients (MHEC) and a normalization strategy combining cepstral gain normalization and modified RASTA filtering (CGN_RASTALP) provides considerable recognition performance gains for reverberant modal and high vocal effort speech.
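RASTALP is a low-pass variant of RASTA; the classic RASTA band-pass it derives from is well documented and can be sketched as a filter applied along time to each log-spectral trajectory, suppressing slowly varying (convolutive) components while passing speech-rate modulations.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spec):
    """Classic RASTA band-pass along the time axis of log-spectral
    trajectories: H(z) = 0.1*(2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1).
    (The paper's RASTALP is a low-pass variant; this shows the
    standard filter.)"""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    a = np.array([1.0, -0.98])
    return lfilter(b, a, log_spec, axis=0)

# A constant channel offset (convolutive distortion in the log domain)
# is suppressed, while faster modulations pass through.
t = np.arange(200)[:, None]
trajectory = 5.0 + np.sin(2 * np.pi * t / 20.0)   # offset + modulation
out = rasta_filter(trajectory)
```

The numerator coefficients sum to zero, so the filter has zero DC gain: any fixed log-spectral offset, such as a stationary channel, is removed after the initial transient.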


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Blind reverberation mitigation for robust speaker identification

Seyed Omid Sadjadi; John H. L. Hansen

Reverberation poses detrimental effects on the performance of automatic speaker identification (SID) systems. This paper proposes a blind spectral weighting technique for combating the late reverberation effect (a.k.a. the overlap-masking effect) on SID systems. The technique is blind in the sense that prior knowledge of neither the anechoic signal nor the room impulse response is required. Performance of the proposed technique is evaluated in terms of: 1) accuracy obtained from closed-set SID experiments, using speech material from the TIMIT corpus and four different measured room impulse responses from the Aachen impulse response (AIR) database, and 2) equal-error rate (EER) obtained from experiments on a new data corpus well suited for speaker verification experiments under actual reverberant mismatched conditions, entitled MultiRoom8. Results show that incorporating the proposed blind technique into the standard MFCC feature extraction framework yields significant improvement in SID performance.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Nearest neighbor based i-vector normalization for robust speaker recognition under unseen channel conditions

Weizhong Zhu; Seyed Omid Sadjadi; Jason W. Pelecanos

Many state-of-the-art speaker recognition engines use i-vectors to represent variable-length acoustic signals in a fixed low-dimensional total variability subspace. While such systems perform well under seen channel conditions, their performance greatly degrades under unseen channel scenarios. Accordingly, rapid adaptation of i-vector systems to unseen conditions has recently attracted significant research effort from the community. To mitigate this mismatch, in this paper we propose nearest neighbor based i-vector mean normalization (NN-IMN) and i-vector smoothing (IS) for unsupervised adaptation to unseen channel conditions within a state-of-the-art i-vector/PLDA speaker verification framework. A major advantage of the approach is its ability to handle multiple unseen channels without explicit retraining or clustering. Our observations on the DARPA Robust Automatic Transcription of Speech (RATS) speaker recognition task suggest that part of the distortion caused by an unseen channel may be modeled as an offset in the i-vector space. Hence, the proposed nearest neighbor based normalization technique is formulated to compensate for such a shift. Experimental results with the NN based normalized i-vectors indicate that, on average, we can recover 46% of the total performance degradation due to unseen channel conditions.
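The offset-compensation idea can be sketched directly: subtract from each i-vector the mean of its k nearest neighbors drawn from an unlabeled reference set, so a per-channel shift cancels without clustering or retraining. The parameters and synthetic data below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def nn_mean_normalize(ivectors, reference, k=10):
    """Nearest-neighbor i-vector mean normalization (sketch): subtract
    the mean of each vector's k nearest neighbors in an unlabeled
    reference set, cancelling channel-like offsets locally."""
    normalized = np.empty_like(ivectors)
    for i, v in enumerate(ivectors):
        d = np.linalg.norm(reference - v, axis=1)
        nn = reference[np.argsort(d)[:k]]
        normalized[i] = v - nn.mean(axis=0)
    return normalized

rng = np.random.default_rng(3)
# Two "channels" simulated as a shared +/-2 offset added to clean
# 20-dimensional i-vectors (purely illustrative data).
clean = rng.normal(size=(200, 20))
offset = np.where(rng.random(200)[:, None] < 0.5, 2.0, -2.0)
observed = clean + offset
norm = nn_mean_normalize(observed, observed, k=10)
```

Because neighbors are overwhelmingly drawn from the same channel, the subtracted local mean carries the channel offset, which matches the abstract's view of unseen-channel distortion as a shift in i-vector space.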

Collaboration


Dive into Seyed Omid Sadjadi's collaborations.

Top Co-Authors

John H. L. Hansen, University of Texas at Dallas
Navid Shokouhi, University of Texas at Dallas
Hynek Boril, University of Texas at Dallas
Gang Liu, University of Texas at Dallas
Taufiq Hasan, University of Texas at Dallas
Tomi Kinnunen, University of Eastern Finland
Elie Khoury, Idiap Research Institute