Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ali Ziaei is active.

Publication


Featured research published by Ali Ziaei.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Prof-Life-Log: Personal interaction analysis for naturalistic audio streams

Ali Ziaei; Abhijeet Sangwan; John H. L. Hansen

Analysis of personal audio recordings is a challenging and interesting subject. Using contemporary speech and language processing techniques, it is possible to mine personal audio recordings for a wealth of information that can be used to measure a person's engagement with their environment as well as with other people. In this study, we propose an analysis system that uses personal audio recordings to automatically estimate the number of unique people and environments which encompass the total engagement within the recording. The proposed system uses speech activity detection (SAD), speaker diarization, and environmental sniffing techniques, and is evaluated on naturalistic audio streams from the Prof-Life-Log corpus. We report the performance of the individual systems and also present a combined analysis which reveals the interaction of the subject with both people and environments. Hence, this study establishes the efficacy and novelty of using contemporary speech technology for life-logging applications.


International Conference on Acoustics, Speech, and Signal Processing | 2012

ProfLifeLog: Environmental analysis and keyword recognition for naturalistic daily audio streams

Abhijeet Sangwan; Ali Ziaei; John H. L. Hansen

This study presents a keyword recognition evaluation on a new corpus named ProfLifeLog. ProfLifeLog is a collection of data captured on a portable audio recording device called the LENA unit. Each session in ProfLifeLog consists of 10+ hours of continuous audio recording that captures the workday of the speaker (the person wearing the LENA unit). This study presents a keyword spotting evaluation on the ProfLifeLog corpus using the PCN-KWS (phone confusion network keyword spotting) algorithm [2]. The ProfLifeLog corpus contains speech data in a variety of noise backgrounds, which is challenging for keyword recognition. To improve keyword recognition, this study also develops a front-end environment estimation strategy that uses knowledge of speech-pause decisions and SNR (signal-to-noise ratio) to provide noise robustness. The combination of PCN-KWS and the proposed front-end technique is evaluated on 1 hour of the ProfLifeLog corpus. Our evaluation experiments demonstrate the effectiveness of the proposed technique, as the number of false alarms in keyword recognition is reduced considerably.
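The front-end described above combines speech-pause decisions with an SNR estimate. As an illustration only (not the paper's implementation), per-frame energies together with SAD labels yield a segmental SNR estimate; the frame layout and function name here are hypothetical:

```python
import numpy as np

def estimate_snr_db(frames, is_speech):
    """Segmental SNR estimate (dB) from per-frame energies plus
    speech/pause decisions produced by a SAD front-end.
    frames: (n_frames, frame_len) samples; is_speech: boolean (n_frames,)."""
    energy = np.mean(np.asarray(frames, float) ** 2, axis=1)  # per-frame power
    mask = np.asarray(is_speech, bool)
    noise_power = np.mean(energy[~mask])    # pause frames give the noise floor
    speech_power = np.mean(energy[mask])    # speech frames still contain noise
    clean_power = max(speech_power - noise_power, 1e-12)  # subtract noise floor
    return 10.0 * np.log10(clean_power / noise_power)
```

On synthetic data where speech frames carry roughly 100x the noise power, this returns an estimate close to 20 dB.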


International Conference on Acoustics, Speech, and Signal Processing | 2015

Robust unsupervised detection of human screams in noisy acoustic environments

Mahesh Kumar Nandwana; Ali Ziaei; John H. L. Hansen

This study focuses on an unsupervised approach for the detection of human scream vocalizations from continuous recordings in noisy acoustic environments. The proposed detection solution is based on compound segmentation, which employs weighted mean distance, T2-statistics, and the Bayesian Information Criterion for the detection of screams. The solution also employs an unsupervised threshold-optimized Combo-SAD for the removal of non-vocal noisy segments in a preliminary stage. A total of five noisy environments were simulated at noise levels ranging from -20 dB to +20 dB. Performance of the proposed system was compared using two alternative acoustic front-end features: (i) Mel-frequency cepstral coefficients (MFCC) and (ii) perceptual minimum variance distortionless response (PMVDR). Evaluation results show that the new scream detection solution works well at clean, +20 dB, and +10 dB SNR levels, with performance declining as SNR decreases to -20 dB across a number of the noise sources considered.
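The Bayesian Information Criterion test used inside such compound segmentation can be sketched as a delta-BIC comparison: model two adjacent feature segments with one full-covariance Gaussian versus one Gaussian each. This is a minimal sketch of the standard formulation; the paper's exact weighting and penalty tuning may differ:

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Delta-BIC test for an acoustic change point between feature
    segments x and y (each n_frames x dim), modelled as full-covariance
    Gaussians. Positive values favour declaring a change point
    (e.g., a scream onset); lam weights the model-size penalty."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    d = z.shape[1]
    logdet = lambda a: np.linalg.slogdet(np.cov(a, rowvar=False, bias=True))[1]
    # Penalty: extra free parameters of the two-Gaussian hypothesis.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(z) - n1 * logdet(x) - n2 * logdet(y)) - lam * penalty
```

Segments drawn from clearly different distributions give a large positive score, while two halves of the same distribution fall below zero once the penalty is applied.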


International Conference on Acoustics, Speech, and Signal Processing | 2015

Robust overlapped speech detection and its application in word-count estimation for Prof-Life-Log data

Navid Shokouhi; Ali Ziaei; Abhijeet Sangwan; John H. L. Hansen

The ability to estimate the number of words spoken by an individual over a certain period of time is valuable in second language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework that achieves high accuracy is non-trivial in realistic/naturalistic scenarios due to factors such as different conversation styles or the types of noise that appear in audio recordings, especially in multi-party conversations. In this study, we propose a noise-robust overlapped speech detection algorithm to estimate the likelihood of overlapping speech in a given audio file in the presence of environmental noise. This information is embedded into a word-count estimator, which uses a linear minimum mean square error (LMMSE) estimator to predict the number of words from the syllable rate. Syllables are detected using a modified version of the mrate algorithm. The proposed word-count estimator is tested on long-duration files from the Prof-Life-Log corpus. Data are recorded using a LENA recording device, worn by a primary speaker in various environments and under different noise conditions. The overlap detection system significantly outperforms the baseline in noisy conditions. Furthermore, applying the overlap detection results to word-count estimation achieves a 35% relative improvement over our previous efforts, which included speech enhancement using spectral subtraction and silence removal.
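For the scalar case, the LMMSE mapping from syllable rate to word rate reduces to an affine predictor fitted from paired training data. A minimal sketch under that assumption (function names are illustrative, not from the paper):

```python
import numpy as np

def fit_lmmse(syllable_rate, word_rate):
    """Fit the scalar LMMSE (affine) predictor of word rate from
    syllable rate, w_hat = a * s + b, from paired training samples."""
    s = np.asarray(syllable_rate, float)
    w = np.asarray(word_rate, float)
    a = np.cov(s, w, bias=True)[0, 1] / np.var(s)  # cross-covariance / variance
    b = w.mean() - a * s.mean()                    # match the sample means
    return a, b

def predict_word_count(a, b, syllable_rates, duration_s):
    """Integrate the predicted word rate over segments of equal duration."""
    return float(np.sum((a * np.asarray(syllable_rates, float) + b) * duration_s))
```

Fitting on data where word rate is roughly 0.7 times syllable rate recovers a slope near 0.7, and the count prediction is the predicted rate integrated over segment durations.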


International Conference on Acoustics, Speech, and Signal Processing | 2015

Prof-Life-Log: Analysis and classification of activities in daily audio streams

Ali Ziaei; Abhijeet Sangwan; Lakshmish Kaushik; John H. L. Hansen

A new method to analyze and classify daily activities in personal audio recordings (PARs) is presented. The method employs speech activity detection (SAD) and speaker diarization systems to provide a high-level semantic segmentation of the audio file. Subsequently, a number of audio, speech, and lexical features are computed in order to characterize events in daily audio streams. The features are selected to capture the statistical properties of conversations, topics, and turn-taking behavior, which creates a classification space that allows us to capture the differences in interactions. The proposed system is evaluated on 9 days of data from the Prof-Life-Log corpus, which contains naturalistic long-duration audio recordings (each file is collected continuously and lasts between 8 and 16 hours). Our experimental results show that the proposed system achieves good classification accuracy on a difficult real-world dataset.
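Turn-taking statistics of the kind mentioned can be derived directly from a diarization label sequence. The feature set below is illustrative, not the paper's exact one:

```python
import math
from collections import Counter

def turn_taking_features(labels):
    """Simple turn-taking statistics from a per-segment speaker label
    sequence (e.g., diarization output). Assumes a non-empty sequence."""
    turns = [labels[0]]
    for lab in labels[1:]:
        if lab != turns[-1]:          # a new turn starts on each label change
            turns.append(lab)
    counts = Counter(labels)
    total = sum(counts.values())
    # Entropy of speaking time across speakers, in bits.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {"n_turns": len(turns), "n_speakers": len(counts),
            "speaker_entropy_bits": entropy}
```

For example, the sequence A, A, B, A contains three turns, two speakers, and a speaking-time entropy of about 0.81 bits.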


Journal of the Acoustical Society of America | 2017

The Lombard effect observed in speech produced by cochlear implant users in noisy environments: A naturalistic study

Jaewook Lee; Hussnain Ali; Ali Ziaei; Emily A. Tobey; John H. L. Hansen

The Lombard effect is an involuntary response speakers experience in the presence of noise during voice communication. This phenomenon is known to cause changes in speech production, such as increases in intensity, pitch structure, formant characteristics, etc., for enhanced audibility in noisy environments. Although well studied for normal-hearing listeners, the Lombard effect has received little, if any, attention in the field of cochlear implants (CIs). The objective of this study is to analyze the speech production of CI users who are postlingually deafened adults with respect to environmental context. A total of six adult CI users were recruited to produce spontaneous speech in various realistic environments. Acoustic-phonetic analysis was then carried out to characterize their speech production in these environments. The Lombard effect was observed in the speech production of all CI users who participated in this study in adverse listening environments. The results indicate that both suprasegmental (e.g., F0, glottal spectral tilt, and vocal intensity) and segmental (e.g., F1 for /i/ and /u/) features were altered in such environments. The analysis from this study suggests that the modification of speech production of CI users under the Lombard effect may contribute to some degree to intelligible communication in adverse noisy environments.
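Suprasegmental Lombard cues such as vocal intensity and F0 can be approximated with simple frame-level estimators. This is a rough sketch assuming a single voiced frame of samples, not the study's actual analysis pipeline:

```python
import numpy as np

def frame_intensity_db(x):
    """RMS intensity of a frame in dB (arbitrary reference level)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.asarray(x, float) ** 2)) + 1e-12)

def f0_autocorr(x, fs, fmin=70.0, fmax=400.0):
    """Rough F0 estimate (Hz) for a voiced frame via the autocorrelation
    peak inside the plausible pitch-lag range [fs/fmax, fs/fmin]."""
    x = np.asarray(x, float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

Applied to a pure 150 Hz tone at 16 kHz, the intensity estimator returns about -3 dB (RMS of a unit sine) and the F0 estimator returns a value close to 150 Hz.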


Speech Communication | 2016

Effective word count estimation for long duration daily naturalistic audio recordings

Ali Ziaei; Abhijeet Sangwan; John H. L. Hansen

The ability to count words in extended audio sequences allows researchers to explore characteristics of speakers (i.e., leading, following, task responsibility, personal engagement), as well as the dynamics of two-way or multi-subject conversation scenarios. As such, counting the number of words spoken by a person offers a rich information source for several applications such as health monitoring (e.g., autism, Parkinson's, Alzheimer's), second language learning, or language development studies. However, developing robust word-count systems that achieve high performance with low computational cost is very challenging due to the uncertain and dynamic behavior experienced in audio recordings. In this study, we address the problem for large-scale naturalistic audio recordings based on a 100-day audio collection entitled Prof-Life-Log. This corpus contains continuously recorded audio from one person using a mobile LENA audio recording device (LENA, 2015). The device captures audio for an entire workday, which can last up to 16 hours. Our proposed framework for word counting consists of five main components: (i) Speech Activity Detection (SAD) to remove non-speech parts of the signal, (ii) Speech Enhancement to suppress the effects of background noise, (iii) Primary vs. Secondary Speaker Detection to remove secondary-speaker segments, (iv) Syllable Rate Estimation to estimate the syllable rate for the primary speaker, and (v) Linear Minimum Mean Square Error (LMMSE) Estimation to find the linear mapping between syllable rate and word rate in spontaneous speech. Despite its simplicity, the framework proves very effective in real scenarios, with good performance on various datasets. As an indication of performance, the error of the framework for an entire 16-hour audio file can be as low as 1% in terms of cumulative Word Count Error.
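The cumulative Word Count Error cited at the end can be read as the relative deviation of the summed estimate from the summed reference over a whole recording. This is a hypothetical formulation; the paper's exact metric definition may differ:

```python
def cumulative_word_count_error(estimated, reference):
    """Relative deviation of the summed per-segment word-count estimate
    from the summed reference counts over a full recording."""
    est, ref = sum(estimated), sum(reference)
    return abs(est - ref) / ref
```

Under this reading, an estimate of 60 words against a reference of 62 yields an error of about 3.2%.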


International Conference on Acoustics, Speech, and Signal Processing | 2015

Analysis of speech and language communication for cochlear implant users in noisy Lombard conditions

Jaewook Lee; Hussnain Ali; Ali Ziaei; John H. L. Hansen

Acoustic/linguistic modification of speech production with respect to auditory feedback is an important research domain for robust human-to-human and human-to-machine communication. For instance, in the presence of environmental noise, a speaker experiences the well-known phenomenon termed the Lombard effect. The Lombard effect has been well studied for normal-hearing listeners as well as for automatic speech/speaker recognition systems. However, limited effort has been devoted to studying whether the speech production of cochlear implant (CI) users is influenced by auditory feedback. The purpose of this study is to analyze the speech production and natural language model of CI users with respect to environmental changes. Mobile personal audio recordings from continuous single-session audio streams collected over an individual's daily life were used for this study. The findings from this study will provide fundamental knowledge on the characteristics of speech production under the Lombard effect in CI users. These specific variations in speech production can be leveraged in new algorithm development and further applications in speech systems to benefit cochlear implant users.


Journal of the Acoustical Society of America | 2016

Prof-Life-Log: Monitoring and assessment of human speech and acoustics using daily naturalistic audio streams

John H. L. Hansen; Abhijeet Sangwan; Ali Ziaei; Harishchandra Dubey; Lakshmish Kaushik; Chengzhu Yu

Speech technology advancements have progressed significantly in the last decade, yet major research challenges continue to impact effective advancements for diarization in naturalistic environments. Traditional diarization efforts have focused on single audio streams based on telephone communications, broadcast news, and/or scripted speeches or lectures. Limited effort has focused on extended naturalistic data. Here, algorithm advancements are established for an extensive daily audio corpus called Prof-Life-Log, consisting of 80+ days of 8-16 hour recordings from an individual's daily life. Advancements include the formulation of (i) an improved threshold-optimized multiple-feature speech activity detector (TO-Combo-SAD), (ii) advanced primary vs. secondary speaker detection, (iii) an advanced word-count system using part-of-speech tagging and bag-of-words construction, (iv) environmental "sniffing" advancements to identify location based on properties of the acoustic space, and (v) diarization interaction ana...


Journal of the Acoustical Society of America | 2014

Lombard effect based speech analysis across noisy environments for voice communications with cochlear implant subjects

Jaewook Lee; Hussnain Ali; Ali Ziaei; John H. L. Hansen

Changes in speech production, including vocal effort based on auditory feedback, are an important research domain for improved human communication. For example, in the presence of environmental noise, a speaker experiences the well-known phenomenon known as the Lombard effect. The Lombard effect has been studied for normal-hearing listeners as well as for automatic speech/speaker recognition systems, but not for cochlear implant (CI) recipients. The objective of this study is to analyze the speech production of CI users with respect to environmental change. We observe and study this effect using mobile personal audio recordings from continuous single-session audio streams collected over an individual's daily life. Prior advancements in this domain include the "Prof-Life-Log" longitudinal study at UTDallas. Four CI speakers participated by producing read and spontaneous speech in six naturalistic noisy environments (e.g., office, car, outdoor, cafeteria, etc.). A number of speech production parameters (e.g., short-t...

Collaboration


Dive into Ali Ziaei's collaborations.

Top Co-Authors

John H. L. Hansen, University of Texas at Dallas
Abhijeet Sangwan, University of Texas at Dallas
Hussnain Ali, University of Texas at Dallas
Jaewook Lee, University of Texas at Dallas
Lakshmish Kaushik, University of Texas at Dallas
Emily A. Tobey, University of Texas at Dallas
Chengzhu Yu, University of Texas at Dallas
Dongxin Xu, University of Colorado Boulder
Harishchandra Dubey, University of Texas at Dallas
Hynek Boril, University of Texas at Dallas