Publications


Featured research published by Mahnoosh Mehrabani.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Dialect distance assessment method based on comparison of pitch pattern statistical models

Mahnoosh Mehrabani; Hynek Boril; John H. L. Hansen

Dialect variations of a language have a severe impact on the performance of speech systems. Therefore, knowing how close or diverse dialects are in a given language space provides useful information for predicting, or improving, system performance when there is a mismatch between training and test data. Distance measures have been used in several applications of speech processing. However, apart from phonetic measures, little if any work has been done on dialect distance measurement. This study explores differences in pitch movement microstructure among dialects. A method of dialect distance assessment based on pitch patterns, modeled progressively from pitch contour primitives, is proposed. The presented method does not require any manual labeling and is text-independent. The Kullback-Leibler (KL) divergence is employed to compare the resulting statistical models. The proposed scheme is evaluated on a corpus of Arabic dialects and shown to be consistent with the results from a spectral-based dialect classification system. Finally, a perceptual evaluation shows that the proposed objective approach correlates well with subjective distances.
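
The model comparison can be illustrated with a minimal sketch. Assuming each dialect's pitch-pattern features are summarized by a single multivariate Gaussian (the paper builds richer statistical models from pitch contour primitives), the symmetric KL divergence has a closed form; the function and variable names below are illustrative, not the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence D(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def dialect_distance(feats_a, feats_b):
    """Symmetric KL between Gaussians fit to pitch-pattern features of two dialects.

    feats_a, feats_b: (N, d) arrays of per-utterance pitch-pattern features
    (e.g., statistics derived from rise/fall/level contour primitives).
    """
    mu_a, cov_a = feats_a.mean(axis=0), np.cov(feats_a, rowvar=False)
    mu_b, cov_b = feats_b.mean(axis=0), np.cov(feats_b, rowvar=False)
    # Symmetrize, since KL divergence itself is not a true distance
    return 0.5 * (gaussian_kl(mu_a, cov_a, mu_b, cov_b)
                  + gaussian_kl(mu_b, cov_b, mu_a, cov_a))
```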


Speech Communication | 2013

Singing speaker clustering based on subspace learning in the GMM mean supervector space

Mahnoosh Mehrabani; John H. L. Hansen

Highlights:
- Mixed-style speech causes problems when training acoustic models for speech applications such as speaker ID and ASR.
- This study is a first attempt at speaker clustering under mixed speaking styles that include reading and singing.
- Two types of subspace learning strategies in the GMM mean supervector space are studied: unsupervised and supervised.
- Advanced clustering algorithms are evaluated on a database that includes each speaker both reading and singing the same lyrics.
- LPP subspace learning and a proposed cluster refinement based on PLDA significantly improve clustering accuracy.

In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve the performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker's voice. The purpose of this study is to introduce advancements for speech systems, such as speech indexing and retrieval, that improve robustness to intrinsic variations in speech production. Speaker clustering techniques such as k-means and hierarchical clustering are explored for analysis of acoustic space differences on a corpus consisting of reading and singing of lyrics by each speaker. Furthermore, a distance based on fuzzy c-means membership degrees is proposed to more accurately measure clustering difficulty or speaker confusability. Two categories of subspace learning methods are studied: unsupervised, based on LPP, and supervised, based on PLDA. Our proposed clustering method based on PLDA is a two-stage algorithm: first, initial clusters are obtained using full-dimension supervectors; next, each cluster is refined in a PLDA subspace, resulting in a more speaker-dependent representation that is less sensitive to speaking style. It is shown that LPP improves average clustering accuracy by 5.1% absolute versus a hierarchical baseline for a mixture of reading and singing, and PLDA-based clustering increases accuracy by 9.6% absolute versus a k-means baseline. These advancements offer novel techniques to improve model formulation for speech applications including speaker ID, audio search, and audio content analysis.
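
A minimal sketch of the supervector representation with a k-means baseline follows. It omits the paper's LPP/PLDA subspace learning and two-stage cluster refinement, and uses simple MAP mean adaptation of a universal background model (UBM); the names and parameter choices (e.g., the relevance factor) are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

def gmm_mean_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt UBM means to one utterance and stack them into a supervector.

    frames: (T, d) acoustic features (e.g., MFCCs) for a single utterance.
    """
    post = ubm.predict_proba(frames)            # (T, M) component posteriors
    n_k = post.sum(axis=0)                      # soft frame counts per component
    ex_k = post.T @ frames                      # (M, d) posterior-weighted sums
    alpha = (n_k / (n_k + relevance))[:, None]  # data-dependent adaptation weights
    means = alpha * (ex_k / np.maximum(n_k[:, None], 1e-8)) + (1 - alpha) * ubm.means_
    return means.ravel()                        # (M * d,) mean supervector

def cluster_speakers(utterances, n_speakers, n_components=64):
    """Cluster utterances (a list of (T_i, d) arrays of read and sung speech)."""
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag')
    ubm.fit(np.vstack(utterances))
    X = np.stack([gmm_mean_supervector(ubm, u) for u in utterances])
    return KMeans(n_clusters=n_speakers, n_init=10).fit_predict(X)
```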


International Conference on Acoustics, Speech, and Signal Processing | 2010

Automatic language analysis and identification based on speech production knowledge

Abhijeet Sangwan; Mahnoosh Mehrabani; John H. L. Hansen

A language analysis and classification system that leverages knowledge of speech production is proposed. The proposed scheme automatically extracts key production traits (or “hot-spots”) that are strongly tied to the underlying language structure. In particular, the speech utterance is first parsed into consonant and vowel clusters. Subsequently, the production traits for each cluster are represented by the corresponding temporal evolution of speech articulatory states. It is hypothesized that a selection of these production traits is strongly tied to the underlying language and can be exploited for language ID. The new scheme is evaluated on our South Indian Languages (SInL) corpus, which consists of 5 closely related languages spoken in India, namely Kannada, Tamil, Telugu, Malayalam, and Marathi. Good accuracy is achieved, with a rate of 65% obtained in a difficult 5-way classification task with about 4 seconds of training and test speech data per utterance. Furthermore, the proposed scheme is also able to automatically identify key production traits of each language (e.g., dominant vowels, stop consonants, fricatives, etc.).
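
A toy sketch of the cluster parsing and trait extraction is given below, assuming frame-level phone-class labels from a recognizer. The actual system operates on phonological-feature (articulatory) state sequences, so the vowel set and trait definition here are simplifications for illustration only.

```python
from itertools import groupby
from collections import Counter

# Hypothetical mapping: frame labels are already reduced to phone symbols
VOWELS = set("aeiou")

def cv_clusters(frame_labels):
    """Parse a per-frame phone sequence into consonant and vowel clusters.

    frame_labels: sequence like ['k', 'a', 'a', 'n', 'n', 'd'].
    Returns a list of (cluster_type, phones) pairs, e.g. ('V', ['a', 'a']).
    """
    clusters = []
    for is_vowel, run in groupby(frame_labels, key=lambda p: p in VOWELS):
        clusters.append(('V' if is_vowel else 'C', list(run)))
    return clusters

def production_traits(frame_labels):
    """Count temporal evolution patterns (state sequences) within each cluster.

    Each cluster's compressed state sequence is one candidate 'hot-spot' trait;
    trait frequencies form a histogram that can feed a language-ID classifier.
    """
    traits = Counter()
    for ctype, phones in cv_clusters(frame_labels):
        # Collapse repeats so the trait captures the order of states, not durations
        states = tuple(p for p, _ in groupby(phones))
        traits[(ctype, states)] += 1
    return traits
```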


International Conference on Acoustics, Speech, and Signal Processing | 2011

Language identification for singing

Mahnoosh Mehrabani; John H. L. Hansen

In spoken language processing, considerable research has been accomplished on language identification. Singing language identification is an important yet challenging area that has attracted only a few researchers in music processing. As one information source that can be extracted from music, the language of vocal music is useful for song classification, recognition, and retrieval based on the singing language, especially in unlabeled or mislabeled music collections. In addition, consideration of singing as a speaking style introduces new challenges to existing language identification systems. Our objective in this paper, as one of the first attempts at singing language identification, is to evaluate successful LID systems, specifically PPRLM, with singing speech. Furthermore, we propose a prosodic approach based on pitch contour approximation and compare the results to the PPRLM system. Language identification performance for singing and read speech is compared in both systems. Finally, we combine the PPRLM and prosodic systems, which achieves an average performance improvement of 4.7% for singing and 8.7% for read speech compared to the baseline PPRLM system. Our evaluations are based on a multilingual singing corpus that we collected for this study.
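
The prosodic features can be sketched as low-order polynomial approximations of segment-level pitch contours. The segmentation scheme, normalization, and polynomial order below are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def pitch_contour_features(f0, order=2, n_segments=4):
    """Approximate a pitch contour with low-order polynomials per segment.

    f0: 1-D array of voiced F0 values for one utterance (unvoiced frames removed).
    Returns a fixed-length vector of polynomial coefficients, one set per time
    segment, usable as prosodic features for language identification.
    """
    segments = np.array_split(np.asarray(f0, dtype=float), n_segments)
    coeffs = []
    for seg in segments:
        t = np.linspace(0.0, 1.0, num=len(seg))       # normalized time axis
        seg = seg / max(seg.mean(), 1e-8)             # per-segment F0 normalization
        coeffs.append(np.polyfit(t, seg, deg=order))  # order+1 shape coefficients
    return np.concatenate(coeffs)
```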


International Conference on Acoustics, Speech, and Signal Processing | 2011

Language identification using a combined articulatory prosody framework

Abhijeet Sangwan; Mahnoosh Mehrabani; John H. L. Hansen

This study presents new advancements in our articulatory-based language identification (LID) system. Our LID system automatically identifies language features (LFs) from a phonological-feature (PF) based representation of speech. While our baseline system uses a static PF representation for extracting LFs, the new system is based on a dynamic PF representation for feature extraction. Interestingly, the new LFs outperform our baseline system by 11.8% absolute in a difficult 5-way classification task of South Indian languages. Additionally, we incorporate pitch- and energy-based features in the new system to leverage prosody in classification. In particular, we employ a Legendre polynomial based contour estimation to capture shape parameters that are used in classification. The fusion of PF and prosody-based LFs further improves the overall classification result by 16.5% absolute over the baseline system. Finally, the proposed articulatory language ID system is combined with a parallel phone recognition followed by language modeling (PPRLM) system to obtain an overall classification accuracy of 86.6%.
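
The Legendre-based shape parameters can be sketched directly with NumPy's Legendre fitting. The polynomial order and per-contour handling below are assumptions, not the paper's exact configuration.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_shape_params(contour, order=4):
    """Fit Legendre polynomials to a pitch or energy contour.

    contour: 1-D array over one speech segment. The fitted coefficients act
    as shape parameters (roughly: mean level, slope, curvature, ...) that
    serve as prosodic features for LID classification.
    """
    contour = np.asarray(contour, dtype=float)
    t = np.linspace(-1.0, 1.0, num=len(contour))   # Legendre basis domain [-1, 1]
    return legendre.legfit(t, contour, deg=order)  # order+1 coefficients

# Example usage: concatenate pitch and energy shape parameters per segment
# feats = np.concatenate([legendre_shape_params(f0_seg),
#                         legendre_shape_params(energy_seg)])
```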


Conference of the International Speech Communication Association | 2013

Unsupervised prominence prediction for speech synthesis

Mahnoosh Mehrabani; Taniya Mishra; Alistair D. Conkie


Archive | 2011

UTD-CRSS SYSTEMS FOR NIST LANGUAGE RECOGNITION EVALUATION 2011

Gang Liu; Seyed Omid Sadjadi; Taufiq Hasan; Jun-Won Suh; Chi Zhang; Mahnoosh Mehrabani; John H. L. Hansen


International Journal of Speech Technology | 2015

Automatic analysis of dialect/language sets

Mahnoosh Mehrabani; John H. L. Hansen


Conference of the International Speech Communication Association | 2008

Dialect separation assessment using log-likelihood score distributions

Mahnoosh Mehrabani; John H. L. Hansen


Conference of the International Speech Communication Association | 2012

Nativeness Classification with Suprasegmental Features on the Accent Group Level

Mahnoosh Mehrabani; Joseph Tepperman; Emily Nava

Collaboration


Mahnoosh Mehrabani's top co-authors and their affiliations.

Top Co-Authors

John H. L. Hansen, University of Texas at Dallas
Abhijeet Sangwan, University of Texas at Dallas
Hynek Boril, University of Texas at Dallas
Chi Zhang, University of Texas at Dallas
Emily Nava, University of Southern California
Gang Liu, University of Texas at Dallas
Joseph Tepperman, University of Southern California
Jun-Won Suh, University of Texas at Dallas
Ozlem Kalinli, University of Southern California