Navid Shokouhi
University of Texas at Dallas
Publications
Featured research published by Navid Shokouhi.
international conference on acoustics, speech, and signal processing | 2013
Navid Shokouhi; Amardeep Sathyanarayana; Seyed Omid Sadjadi; John H. L. Hansen
In this study, we propose a system for overlapped-speech detection. Spectral harmonicity and envelope features are extracted to represent overlapped and single-speaker speech using Gaussian mixture models (GMMs). The system is shown to effectively discriminate between the single-speaker and overlapped speech classes. We further increase the discrimination by proposing a phoneme selection scheme that generates more reliable artificial overlapped data for model training. Evaluations on artificially generated co-channel data show that the novelty in feature selection and phoneme omission yields a relative improvement of 10% in detection accuracy over the baseline. As an example application, we evaluate the effectiveness of overlapped-speech detection in vehicular environments and its potential for assessing driver alertness. Results indicate a good correlation between driver performance and the amount and location of overlapped-speech segments.
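A minimal sketch of the two-class GMM detector described in this abstract, assuming the harmonicity/envelope features have already been extracted into frame-level arrays (the feature extraction itself is not shown); the component count, threshold, and placeholder data are illustrative assumptions, not the authors' exact configuration:

```python
# Sketch: two-class GMM detector for overlapped vs. single-speaker speech.
# Assumes feat_single / feat_overlap are (n_frames, n_dims) arrays of
# harmonicity/envelope features; extraction is outside this sketch.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
feat_single = rng.normal(0.0, 1.0, size=(2000, 12))   # placeholder features
feat_overlap = rng.normal(0.5, 1.2, size=(2000, 12))  # placeholder features

gmm_single = GaussianMixture(n_components=16, covariance_type="diag").fit(feat_single)
gmm_overlap = GaussianMixture(n_components=16, covariance_type="diag").fit(feat_overlap)

def detect_overlap(frames, threshold=0.0):
    """Frame-level log-likelihood ratio test: positive favors overlap."""
    llr = gmm_overlap.score_samples(frames) - gmm_single.score_samples(frames)
    return llr > threshold

test_frames = rng.normal(0.4, 1.1, size=(100, 12))
print(detect_overlap(test_frames).mean())  # fraction of frames flagged as overlapped
```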
international conference on acoustics, speech, and signal processing | 2015
Navid Shokouhi; Ali Ziaei; Abhijeet Sangwan; John H. L. Hansen
The ability to estimate the number of words spoken by an individual over a certain period of time is valuable in second language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework to achieve high accuracy is non-trivial in realistic/naturalistic scenarios due to various factors such as different styles of conversation or types of noise that appear in audio recordings, especially in multi-party conversations. In this study, we propose a noise robust overlapped speech detection algorithm to estimate the likelihood of overlapping speech in a given audio file in the presence of environment noise. This information is embedded into a word-count estimator, which uses a linear minimum mean square estimator (LMMSE) to predict the number of words from the syllable rate. Syllables are detected using a modified version of the mrate algorithm. The proposed word-count estimator is tested on long duration files from the Prof-Life-Log corpus. Data is recorded using a LENA recording device, worn by a primary speaker in various environments and under different noise conditions. The overlap detection system significantly outperforms baseline performance in noisy conditions. Furthermore, applying overlap detection results to word-count estimation achieves 35% relative improvement over our previous efforts, which included speech enhancement using spectral subtraction and silence removal.
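One way to realize the LMMSE word-count predictor from a syllable-rate feature is a closed-form affine fit; the data below are synthetic placeholders, and the mrate-style syllable detector itself is not reproduced:

```python
# Sketch: linear MMSE word-count estimation from detected syllable counts.
# syl[i] = syllables detected in segment i (e.g., by an mrate-style detector),
# words[i] = reference word count; both are placeholder data here.
import numpy as np

rng = np.random.default_rng(1)
syl = rng.integers(50, 500, size=200).astype(float)
words = 0.7 * syl + rng.normal(0, 10, size=200)  # synthetic ground truth

# Closed-form LMMSE coefficients for the affine model words ~ a * syl + b:
a = np.cov(syl, words)[0, 1] / np.var(syl, ddof=1)
b = words.mean() - a * syl.mean()

def estimate_word_count(syllable_count):
    return a * syllable_count + b

print(round(estimate_word_count(300.0)))  # predicted words for 300 syllables
```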
spoken language technology workshop | 2014
Gang Liu; Chengzhu Yu; Navid Shokouhi; Abhinav Misra; Hua Xing; John H. L. Hansen
State-of-the-art speaker verification systems model speaker identity by mapping i-Vectors onto a probabilistic linear discriminant analysis (PLDA) space. Compared to other modeling approaches (such as cosine distance scoring), PLDA provides a more efficient mechanism for separating speaker information from other sources of undesired variability and offers superior speaker verification performance. Unfortunately, this efficiency comes at the cost of requiring a large corpus of labeled development data, which is too expensive or unrealistic to obtain in many cases. This study investigates a potential solution to this challenge by effectively utilizing unlabeled development data through universal imposter clustering. The proposed method offers relative gains of 21.9% and 34.6% over the baseline system on two publicly available corpora, respectively. This significant improvement demonstrates the effectiveness of the proposed method.
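A rough sketch of the general idea of exploiting unlabeled development i-vectors by clustering them into imposter cohorts; k-means, cosine scoring, and z-norm are illustrative stand-ins here, not the paper's exact universal imposter clustering recipe:

```python
# Sketch: cluster unlabeled development i-vectors into pseudo-speakers,
# then use the cluster centroids as an imposter cohort for score normalization.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(2)
dev_ivectors = normalize(rng.normal(size=(5000, 400)))  # unlabeled development data

kmeans = KMeans(n_clusters=256, n_init=4, random_state=0).fit(dev_ivectors)
cohort = normalize(kmeans.cluster_centers_)             # one "universal imposter" per cluster

def znorm_score(enroll_iv, test_iv):
    """Cosine score normalized against the imposter-cohort score distribution."""
    raw = float(enroll_iv @ test_iv)
    imp = cohort @ enroll_iv                            # enroll vs. every imposter
    return (raw - imp.mean()) / (imp.std() + 1e-8)

e = normalize(rng.normal(size=(1, 400)))[0]
t = normalize(rng.normal(size=(1, 400)))[0]
print(znorm_score(e, t))
```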
ieee intelligent vehicles symposium | 2013
Amardeep Sathyanarayana; Navid Shokouhi; Seyed Omid Sadjadi; John H. L. Hansen
In-vehicle conversations are typical when there is more than one person in the car. Although many conversations are beneficial in keeping the driver alert and active, there are also instances where a competitive conversation may adversely influence driving performance. Identifying such scenarios can improve vehicle safety systems by fusing the knowledge obtained from conversational speech analysis and vehicle dynamic signals. In this study we incorporate the use of smart portable devices to create a unified platform for recording in-vehicular conversations as well as the vehicle dynamic signals required to evaluate the driving performance. Results show that turn taking rate and overlapping speech segments under certain conditions correlate with deviations from normal driving patterns. The conversational speech analysis can thus be utilized as a component in driver assistance systems such that the impact of in-vehicle speech activity on driving performance is controlled or minimized.
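A tiny sketch of how turn-taking statistics from conversational segments might be correlated with a driving-deviation measure, as the abstract describes; the segment format, window length, and deviation scores are assumptions, with the vehicle dynamic signals taken as given:

```python
# Sketch: correlate per-window turn-taking rate with a driving-deviation score.
# Each window holds speech segments as (speaker_id, start_s, end_s) tuples; the
# deviation scores would come from vehicle dynamic signals (assumed available).
import numpy as np
from scipy.stats import pearsonr

def turn_taking_rate(segments, window_s):
    """Speaker changes per second within one analysis window."""
    speakers = [spk for spk, _, _ in sorted(segments, key=lambda s: s[1])]
    changes = sum(a != b for a, b in zip(speakers, speakers[1:]))
    return changes / window_s

demo = [("driver", 0.0, 2.1), ("passenger", 2.3, 4.0), ("driver", 4.2, 6.0)]
print(turn_taking_rate(demo, window_s=10.0))   # 2 speaker changes over 10 s

rng = np.random.default_rng(3)
rates = rng.uniform(0.0, 0.5, size=60)             # placeholder per-window turn rates
deviation = 2.0 * rates + rng.normal(0, 0.2, 60)   # placeholder lane-deviation scores

r, p = pearsonr(rates, deviation)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")
```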
conference of the international speech communication association | 2017
Chunlei Zhang; Fahimeh Bahmaninezhad; Shivesh Ranjan; Chengzhu Yu; Navid Shokouhi; John H. L. Hansen
In this study, we present systems submitted by the Center for Robust Speech Systems (CRSS) at UT Dallas to NIST SRE 2018 (SRE18). Three alternative front-end speaker embedding frameworks are investigated: (i) i-vector, (ii) x-vector, and (iii) a modified triplet speaker embedding system (t-vector). As in the previous SRE, language mismatch between training and enrollment/test data, the so-called domain mismatch, remains a major challenge in this evaluation. In addition, SRE18 introduces a small portion of audio from an unstructured video corpus, for which speaker detection/diarization must be effectively integrated into speaker recognition for system robustness. In our system development, we focused on: (i) building novel deep neural network based speaker-discriminative embedding systems as utterance-level feature representations; (ii) exploring alternative dimension reduction methods, back-end classifiers, and score normalization techniques that can incorporate unlabeled in-domain data for domain adaptation; (iii) finding improved dataset configurations for the speaker embedding network, LDA/PLDA, and score calibration training; and (iv) investigating effective score calibration and fusion strategies. The resulting systems are shown to be both complementary and effective in achieving overall improved speaker recognition performance.
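A compressed sketch of one back-end stage named in item (ii) above: LDA dimension reduction over speaker embeddings followed by cosine scoring. The embeddings, labels, and dimensions are synthetic placeholders, and the PLDA, calibration, and fusion stages from the paper are not reproduced:

```python
# Sketch: LDA dimension reduction over x-vector-like embeddings + cosine scoring.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import normalize

rng = np.random.default_rng(4)
n_spk, per_spk, dim = 50, 20, 512
labels = np.repeat(np.arange(n_spk), per_spk)
centers = rng.normal(size=(n_spk, dim))
emb = centers[labels] + 0.5 * rng.normal(size=(n_spk * per_spk, dim))

# Fit LDA on labeled development embeddings (n_components < n_classes).
lda = LinearDiscriminantAnalysis(n_components=40).fit(emb, labels)

def score(enroll, test):
    """Cosine similarity in the LDA-reduced space."""
    e, t = normalize(lda.transform([enroll, test]))
    return float(e @ t)

print(score(emb[0], emb[1]))    # same speaker: should score high
print(score(emb[0], emb[-1]))   # different speaker: should score lower
```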
conference of the international speech communication association | 2013
Rahim Saeidi; Tomi Kinnunen; Elie Khoury; P. L. Sordo Martinez; Hanwu Sun; Padmanabhan Rajan; Ville Hautamäki; Cemal Hanilçi; Rosa González Hautamäki; Seyed Omid Sadjadi; Navid Shokouhi; Driss Matrouf; L. El Shafey; John H. L. Hansen
international conference on acoustics, speech, and signal processing | 2013
Taufiq Hasan; Seyed Omid Sadjadi; Gang Liu; Navid Shokouhi; Hynek Boril; John H. L. Hansen
Archive | 2014
Gang Liu; Chengzhu Yu; Abhinav Misra; Navid Shokouhi; John H. L. Hansen
ieee intelligent vehicles symposium | 2015
Yang Zheng; Xian Shi; Amardeep Sathyanarayana; Navid Shokouhi; John H. L. Hansen
SAE 2016 World Congress and Exhibition | 2016
Yang Zheng; Navid Shokouhi; Nicolai Bæk Thomsen; Amardeep Sathyanarayana; John H. L. Hansen