Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Abhijeet Sangwan is active.

Publication


Featured research published by Abhijeet Sangwan.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Sentiment extraction from natural audio streams

Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Automatic sentiment extraction for natural audio streams containing spontaneous speech is a challenging area of research that has received little attention. In this study, we propose a system for automatic sentiment detection in natural audio streams such as those found on YouTube. The proposed technique uses POS (part-of-speech) tagging and Maximum Entropy (ME) modeling to develop a text-based sentiment detection model. Additionally, we propose a tuning technique which dramatically reduces the number of model parameters in ME while retaining classification capability. Finally, using decoded ASR (automatic speech recognition) transcripts and the ME sentiment model, the proposed system is able to estimate the sentiment in YouTube videos. In our experimental evaluation, we obtain encouraging classification accuracy given the challenging nature of the data. Our results show that it is possible to perform sentiment analysis on natural spontaneous speech data despite poor WER (word error rate).
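
As a rough illustration of the text-based component described above (not the authors' implementation), the sketch below combines word-plus-POS features with a logistic-regression classifier, a common maximum-entropy-style model; the training texts, labels, and feature choices are made up for illustration.

```python
# A rough sketch (not the authors' system): word+POS features feeding a
# logistic-regression classifier, a common maximum-entropy-style model.
# The training texts and labels below are made up for illustration.
# Requires NLTK's tokenizer/tagger models (download once via nltk.download).
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def word_pos_tokens(text):
    """Tokenize and append the POS tag to each word, e.g. 'great_JJ'."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return [f"{word.lower()}_{tag}" for word, tag in tagged]

# Hypothetical (ASR transcript, sentiment label) training pairs.
texts = ["this phone is really great", "the movie was a waste of time"]
labels = ["positive", "negative"]

model = make_pipeline(
    CountVectorizer(analyzer=word_pos_tokens),   # word+POS bag-of-features
    LogisticRegression(max_iter=1000),           # MaxEnt-style text classifier
)
model.fit(texts, labels)
print(model.predict(["what a great video"]))
```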


International Conference on Acoustics, Speech, and Signal Processing | 2013

Prof-Life-Log: Personal interaction analysis for naturalistic audio streams

Ali Ziaei; Abhijeet Sangwan; John H. L. Hansen

Analysis of personal audio recordings is a challenging and interesting subject. Using contemporary speech and language processing techniques, it is possible to mine personal audio recordings for a wealth of information that can be used to measure a person's engagement with their environment as well as with other people. In this study, we propose an analysis system that uses personal audio recordings to automatically estimate the number of unique people and environments which encompass the total engagement within the recording. The proposed system uses speech activity detection (SAD), speaker diarization, and environmental sniffing techniques, and is evaluated on naturalistic audio streams from the Prof-Life-Log corpus. We report the performance of the individual systems and also present a combined analysis which reveals the subject's interaction with both people and environments. Hence, this study establishes the efficacy and novelty of using contemporary speech technology for life-logging applications.
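
A minimal sketch of how such per-segment outputs could be combined into engagement counts, assuming a simple (start, end, speaker, environment) segment layout that is not taken from the paper:

```python
# Minimal sketch (assumed data layout, not the Prof-Life-Log pipeline): merge
# per-segment outputs from SAD, speaker diarization, and environment detection
# to summarize a day's engagement.
from collections import Counter

# Hypothetical segments: (start_sec, end_sec, speaker_id, environment_label)
segments = [
    (0.0, 12.5, "spk1", "office"),
    (12.5, 30.0, "spk2", "office"),
    (45.0, 60.0, "spk1", "hallway"),
]

speakers = {spk for _, _, spk, _ in segments}
environments = {env for _, _, _, env in segments}
speech_time = sum(end - start for start, end, _, _ in segments)
env_time = Counter()
for start, end, _, env in segments:
    env_time[env] += end - start

print(f"unique speakers: {len(speakers)}, environments: {len(environments)}")
print(f"total speech: {speech_time:.1f}s, per environment: {dict(env_time)}")
```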


Asilomar Conference on Signals, Systems and Computers | 2010

Keyword recognition with phone confusion networks and phonological features based keyword threshold detection

Abhijeet Sangwan; John H. L. Hansen

In this study, a new keyword spotting system (KWS) that utilizes phone confusion networks (PCNs) is presented. The new system exploits the compactness and accuracy of phone confusion networks to deliver fast and accurate results. Special design considerations are provided within the new algorithm to account for phone-recognizer-induced insertion and deletion errors. Furthermore, this study proposes a new threshold estimation technique that uses the keyword's constituent phones and phonological features (PFs) for threshold computation. The new threshold estimation technique is able to deliver thresholds that improve the overall F-score for keyword detection. The final integrated system is able to achieve a better balance between precision and recall.
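
The sketch below illustrates the general idea of scoring a keyword against a phone confusion network while tolerating insertions and deletions; the dynamic program, penalty values, and toy PCN are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (not the paper's algorithm): score a keyword's phone sequence
# against a phone confusion network (PCN). Each PCN slot holds phone posteriors;
# a simple dynamic program allows insertions/deletions with fixed log-penalties.
import math

INS_PEN = math.log(0.05)   # extra PCN slot not covered by the keyword
DEL_PEN = math.log(0.05)   # keyword phone missing from the PCN

def keyword_score(keyword, pcn):
    """Best log-score of aligning `keyword` (list of phones) to `pcn`
    (list of {phone: posterior} dicts)."""
    n, m = len(keyword), len(pcn)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:            # match keyword phone to PCN slot
                p = pcn[j].get(keyword[i], 1e-6)
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], dp[i][j] + math.log(p))
            if j < m:                      # skip a PCN slot (insertion)
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + INS_PEN)
            if i < n:                      # skip a keyword phone (deletion)
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + DEL_PEN)
    return dp[n][m]

pcn = [{"k": 0.7, "g": 0.3}, {"ae": 0.9}, {"t": 0.6, "d": 0.4}]
print(keyword_score(["k", "ae", "t"], pcn))   # keyword "cat"
```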


IEEE Signal Processing Letters | 2007

Theoretical Complex Cepstrum of DCT and Warped DCT Filters

R. Muralishankar; Abhijeet Sangwan; Douglas D. O'Shaughnessy

In this letter, we derive the theoretical complex cepstrum (TCC) of the discrete cosine transform (DCT) and warped DCT (WDCT) filters. Using these derivations, we intend to develop an analytic model of the warped discrete cosine transform cepstrum (WDCTC), which was recently introduced as a speech processing feature. In our derivation, we start with the filter bank structure for the DCT, where each basis is represented by a finite impulse response (FIR) filter. The WDCT filter bank is obtained by substituting z-1 in the DCT filter bank with a first-order all-pass filter. Using the filter bank structures, we first derive the transfer functions for the DCT and WDCT, and subsequently, the TCC for each filter is computed. We analyze the DCT and WDCT filter transfer functions and the TCC by illustrating the corresponding pole-zero maps and cepstral sequences. Moreover, we also use the derived TCC expressions to compute the cepstral sequence for a synthetic vowel /aa/, where the observations on the theoretical cepstrum corroborate well with our practical findings.
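
For reference, the standard textbook relations that underlie such a derivation are sketched below (this is not the letter's specific result): the first-order all-pass warping substitution and the complex cepstrum of a rational, minimum-phase transfer function with positive gain. The symbol alpha for the warping parameter is an assumption.

```latex
% Sketch of standard relations (not the letter's derivation).
% First-order all-pass warping assumed to replace z^{-1} in each DCT basis filter:
\[
  z^{-1} \;\longrightarrow\; \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}},
  \qquad |\alpha| < 1 .
\]
% Complex cepstrum of a rational, minimum-phase filter with gain A > 0,
% zeros a_k and poles b_k strictly inside the unit circle:
\[
  H(z) = A \,\frac{\prod_k \left(1 - a_k z^{-1}\right)}{\prod_k \left(1 - b_k z^{-1}\right)}
  \quad\Longrightarrow\quad
  \hat{h}[n] =
  \begin{cases}
    \log A, & n = 0,\\[4pt]
    \dfrac{1}{n}\left(\sum_k b_k^{\,n} - \sum_k a_k^{\,n}\right), & n > 0,\\[4pt]
    0, & n < 0 .
  \end{cases}
\]
```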


International Conference on Acoustics, Speech, and Signal Processing | 2012

ProfLifeLog: Environmental analysis and keyword recognition for naturalistic daily audio streams

Abhijeet Sangwan; Ali Ziaei; John H. L. Hansen

This study presents a keyword recognition evaluation on a new corpus named ProfLifeLog. ProfLifeLog is a collection of data captured on a portable audio recording device called the LENA unit. Each session in ProfLifeLog consists of 10+ hours of continuous audio recording that captures the work day of the speaker (the person wearing the LENA unit). This study presents a keyword spotting evaluation on the ProfLifeLog corpus using the PCN-KWS (phone confusion network keyword spotting) algorithm [2]. The ProfLifeLog corpus contains speech data in a variety of noise backgrounds, which is challenging for keyword recognition. In order to improve keyword recognition, this study also develops a front-end environment estimation strategy that uses the knowledge of speech-pause decisions and SNR (signal-to-noise ratio) to provide noise robustness. The combination of PCN-KWS and the proposed front-end technique is evaluated on 1 hour of the ProfLifeLog corpus. Our evaluation experiments demonstrate the effectiveness of the proposed technique, as the number of false alarms in keyword recognition is reduced considerably.
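
A minimal sketch of one way such a front end could gate keyword search by estimated SNR, using speech-pause decisions over frame energies; the threshold value and data layout are illustrative, not the paper's exact method.

```python
# Minimal sketch (assumed front end, not the paper's exact method): estimate a
# per-segment SNR from frame energies using speech/pause decisions, then run
# keyword spotting only on segments above an SNR threshold.
import numpy as np

def segment_snr_db(frame_energy, is_speech):
    """frame_energy: per-frame energies; is_speech: boolean frame decisions."""
    noise = frame_energy[~is_speech].mean() + 1e-12
    speech = frame_energy[is_speech].mean() + 1e-12
    return 10.0 * np.log10(speech / noise)

def keep_for_kws(frame_energy, is_speech, min_snr_db=10.0):
    """True if the segment's estimated SNR is high enough for keyword search."""
    return segment_snr_db(frame_energy, is_speech) >= min_snr_db

energy = np.array([1.0, 1.2, 8.0, 9.5, 7.8, 1.1])
speech = np.array([False, False, True, True, True, False])
print(segment_snr_db(energy, speech), keep_for_kws(energy, speech))
```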


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Automatic sentiment extraction from YouTube videos

Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Extracting speaker sentiment from natural audio streams such as YouTube is challenging. A number of factors contribute to the task difficulty, namely, Automatic Speech Recognition (ASR) of spontaneous speech, unknown background environments, variable source and channel characteristics, accents, diverse topics, etc. In this study, we build upon our previous work [5], where we had proposed a system for detecting sentiment in YouTube videos. In particular, we propose several enhancements, including (i) a better text-based sentiment model due to training on a larger and more diverse dataset, (ii) an iterative scheme to reduce sentiment model complexity with minimal impact on performance accuracy, (iii) better speech recognition due to superior acoustic modeling and focused (domain-dependent) vocabulary/language models, and (iv) a larger evaluation dataset. Collectively, our enhancements provide an absolute 10% improvement over our previous system in terms of sentiment detection accuracy. Additionally, we also present analysis that helps understand the impact of WER (word error rate) on sentiment detection accuracy. Finally, we investigate the relative importance of different Parts-of-Speech (POS) tag features towards sentiment detection. Our analysis reveals the practicality of this technology and also provides several potential directions for future work.
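
As an illustration of enhancement (ii) above, the sketch below shows one generic way to iteratively shrink a text-based sentiment model by dropping low-magnitude feature weights and refitting; it is a stand-in under assumed data, not the authors' scheme.

```python
# Illustrative stand-in (not the authors' scheme): shrink a text sentiment model
# by repeatedly dropping the lowest-magnitude feature weights and refitting,
# stopping when held-out accuracy falls by more than a small tolerance.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def prune_model(texts, labels, dev_texts, dev_labels, drop_frac=0.2, tol=0.01):
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    Xd = vec.transform(dev_texts)
    keep = np.arange(X.shape[1])                          # surviving feature columns
    clf = LogisticRegression(max_iter=1000).fit(X[:, keep], labels)
    best_acc = clf.score(Xd[:, keep], dev_labels)
    while len(keep) > 10:
        w = np.abs(clf.coef_).sum(axis=0)                 # importance per surviving feature
        order = np.argsort(w)
        trial = keep[order[int(len(keep) * drop_frac):]]  # drop the smallest weights
        trial_clf = LogisticRegression(max_iter=1000).fit(X[:, trial], labels)
        acc = trial_clf.score(Xd[:, trial], dev_labels)
        if acc < best_acc - tol:
            break                                         # too much accuracy lost
        keep, clf, best_acc = trial, trial_clf, max(best_acc, acc)
    return vec, keep, clf
```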


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Automatic Accent Assessment Using Phonetic Mismatch and Human Perception

Freddy William; Abhijeet Sangwan; John H. L. Hansen

In this study, a new algorithm for automatic accent evaluation of native and non-native speakers is presented. The proposed system consists of two main steps: alignment and scoring. In the alignment step, the speech utterance is processed using a Weighted Finite State Transducer (WFST) based technique to automatically estimate the pronunciation mismatches (substitutions, deletions, and insertions). Subsequently, in the scoring step, two scoring systems which utilize the pronunciation mismatches from the alignment phase are proposed: (i) a WFST-scoring system to measure the degree of accentedness on a scale from -1 (non-native like) to +1 (native like), and (ii) a Maximum Entropy (ME) based technique to assign perceptually motivated scores to pronunciation mismatches. The accent scores provided by the WFST-scoring system and the ME scoring system are termed the WFST and P-WFST (perceptual WFST) accent scores, respectively. The proposed systems are evaluated on American English (AE) spoken by native and non-native (native speakers of Mandarin Chinese) speakers from the CU-Accent corpus. A listener evaluation with 50 native speakers of American English (N-AE) was employed to assist in validating the performance of the proposed accent assessment systems. The proposed P-WFST algorithm shows higher and more consistent correlation with human-evaluated accent scores when compared to the Goodness Of Pronunciation (GOP) measure. The proposed solution for accent classification and assessment based on WFST and P-WFST scores shows that an effective advancement is possible which correlates well with human perception.
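
A highly simplified stand-in for the alignment-and-scoring idea is sketched below: a plain edit-distance alignment replaces the WFST, and a linear mapping from mismatch rate to a [-1, +1] score is assumed purely for illustration.

```python
# Illustrative stand-in for WFST alignment and scoring: align the canonical
# native phone sequence with the decoded phone sequence by edit distance,
# count mismatches, and map the mismatch rate to a score in [-1, +1]
# (+1 ~ native-like, -1 ~ non-native-like). The linear mapping is an assumption.
def edit_ops(ref, hyp):
    """Total substitutions + deletions + insertions from a Levenshtein alignment."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[n][m]

def accent_score(ref_phones, hyp_phones):
    mismatch_rate = edit_ops(ref_phones, hyp_phones) / max(len(ref_phones), 1)
    return 1.0 - 2.0 * min(mismatch_rate, 1.0)   # 0 mismatches -> +1, all -> -1

print(accent_score(["k", "ae", "t"], ["k", "aa", "t"]))   # one substitution
```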


International Conference on Acoustics, Speech, and Signal Processing | 2010

Automatic language analysis and identification based on speech production knowledge

Abhijeet Sangwan; Mahnoosh Mehrabani; John H. L. Hansen

A language analysis and classification system that leverages knowledge of speech production is proposed. The proposed scheme automatically extracts key production traits (or "hot-spots") that are strongly tied to the underlying language structure. Particularly, the speech utterance is first parsed into consonant and vowel clusters. Subsequently, the production traits for each cluster are represented by the corresponding temporal evolution of speech articulatory states. It is hypothesized that a selection of these production traits is strongly tied to the underlying language and can be exploited for language ID. The new scheme is evaluated on our South Indian Languages (SInL) corpus, which consists of 5 closely related languages spoken in India, namely, Kannada, Tamil, Telugu, Malayalam, and Marathi. Good accuracy is achieved, with a rate of 65% obtained in a difficult 5-way classification task with about 4 seconds of train and test speech data per utterance. Furthermore, the proposed scheme is also able to automatically identify key production traits of each language (e.g., dominant vowels, stop-consonants, fricatives, etc.).
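
A minimal sketch of the first step, grouping a decoded phone sequence into maximal consonant and vowel clusters; the phone inventory shown is an illustrative subset, not the paper's.

```python
# Minimal sketch (assumed phone inventory, not the paper's parser): group a
# decoded phone sequence into maximal consonant and vowel clusters, the units
# from which production traits would be extracted.
VOWELS = {"aa", "ae", "ah", "ao", "eh", "ih", "iy", "ow", "uw"}   # illustrative subset

def cv_clusters(phones):
    """Return a list of (cluster_type, phones) with type 'V' or 'C'."""
    clusters = []
    for ph in phones:
        kind = "V" if ph in VOWELS else "C"
        if clusters and clusters[-1][0] == kind:
            clusters[-1][1].append(ph)        # extend the current cluster
        else:
            clusters.append((kind, [ph]))     # start a new cluster
    return clusters

print(cv_clusters(["s", "t", "aa", "k", "iy"]))
# [('C', ['s', 't']), ('V', ['aa']), ('C', ['k']), ('V', ['iy'])]
```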


Conference of the International Speech Communication Association | 2016

A Speaker Diarization System for Studying Peer-Led Team Learning Groups.

Harishchandra Dubey; Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Peer-led team learning (PLTL) is a model for teaching STEM courses where small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers gain insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc. Towards this, speech and language technology can help, and speaker diarization technology will lay the foundation for analysis. In this study, a new corpus called CRSS-PLTL is established, which contains speech data from 5 PLTL teams over a semester (10 sessions per team, with 5-to-8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (a portable audio recorder), which provides multiple audio recordings of each event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed the G3 algorithm, used in conjunction with Hausdorff-distance based clustering to provide improved detection accuracy. Additionally, we also exploit cross-channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analysis, such as the number of conversational turns taken in a session and the speaking-time duration (participation) for each speaker.
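
A minimal sketch of Hausdorff-distance-based clustering of speech segments; the change detection, feature extraction, and cross-channel refinement are assumed to have happened upstream, and the toy features and cluster count are illustrative, not the CRSS-PLTL system.

```python
# Illustrative sketch (not the full CRSS-PLTL system): treat each speech segment
# as a set of feature frames and cluster segments with the symmetric Hausdorff
# distance, then cut the dendrogram into a fixed number of speaker clusters.
import numpy as np
from scipy.spatial.distance import directed_hausdorff, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two frame sets (2-D arrays)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

rng = np.random.default_rng(0)
segments = [rng.normal(loc=i % 2, size=(50, 13)) for i in range(6)]  # toy "MFCC" frames

n = len(segments)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = hausdorff(segments[i], segments[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)   # segment -> speaker-cluster assignment
```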


2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE) | 2015

Studying the relationship between physical and language environments of children: Who's speaking to whom and where?

Abhijeet Sangwan; John H. L. Hansen; Dwight W. Irvin; Stephen A. Crutchfield; Charles R. Greenwood

Understanding the language environments of early learners is critical in facilitating school success. Increasingly, large-scale projects (e.g., Providence Talks, Bridging the Word Gap) are investigating the language environments of young children in an attempt to better understand and facilitate language acquisition and development. The primary tool used to collect and analyze data related to the language environments of young learners is the LENA digital language processor (DLP). LENA allows for the continuous capture of language, primarily focused on a single child's interactions with adults, for up to 16 hours. Subsequent analysis of the audio using spoken language technology (SLT) provides meaningful metrics such as total adult word count and conversational turns. One shortcoming of collecting continuous audio alone is that the physical context of adult-to-child or child-to-child communication is lost. In this study, we describe our recent data collection effort, which combines the LENA and Ubisense sensors to allow for simultaneous capture of spatial information along with speech and time. We are particularly interested in researching the relationship between the physical and language environments of children. In this study, we describe our collection methodology, results from initial probe experiments, and our latest efforts in developing relevant SLT metrics. The new data and techniques described in this study can help in developing a richer understanding of how physical environments promote or encourage communication in early childhood classrooms. In theory, such speech and location technology can contribute to the design of future learning spaces specifically designed for typically developing children, or those with or at risk for disabilities.
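
A minimal sketch of how LENA speech segments could be joined with Ubisense location samples by timestamp, so each turn gets a "who/where" label; the zone names, timings, and data layout are hypothetical, not the study's pipeline.

```python
# Minimal sketch (assumed data layout, not the study's pipeline): attach a room
# zone to each speech segment by looking up the most recent location sample.
import bisect

# Hypothetical streams: Ubisense samples (time_sec, zone) and LENA segments
# (start_sec, end_sec, speaker_label).
locations = [(0.0, "block_area"), (120.0, "reading_corner"), (300.0, "snack_table")]
segments = [(30.0, 45.0, "child"), (150.0, 170.0, "adult"), (310.0, 330.0, "child")]

loc_times = [t for t, _ in locations]

def zone_at(t):
    """Zone whose timestamp most recently precedes time t."""
    i = bisect.bisect_right(loc_times, t) - 1
    return locations[max(i, 0)][1]

for start, end, who in segments:
    print(f"{who:5s} spoke {end - start:4.1f}s in {zone_at(start)}")
```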

Collaboration


Dive into Abhijeet Sangwan's collaborations.

Top Co-Authors

John H. L. Hansen, University of Texas at Dallas
Lakshmish Kaushik, University of Texas at Dallas
Ali Ziaei, University of Texas at Dallas
Harishchandra Dubey, University of Texas at Dallas
Chengzhu Yu, University of Texas at Dallas
Hynek Boril, University of Texas at Dallas
Taufiq Hasan, University of Texas at Dallas