Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Lakshmish Kaushik is active.

Publication


Featured research published by Lakshmish Kaushik.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Sentiment extraction from natural audio streams

Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Automatic sentiment extraction for natural audio streams containing spontaneous speech is a challenging area of research that has received little attention. In this study, we propose a system for automatic sentiment detection in natural audio streams such as those found on YouTube. The proposed technique uses POS (part of speech) tagging and Maximum Entropy (ME) modeling to develop a text-based sentiment detection model. Additionally, we propose a tuning technique which dramatically reduces the number of model parameters in ME while retaining classification capability. Finally, using decoded ASR (automatic speech recognition) transcripts and the ME sentiment model, the proposed system is able to estimate the sentiment in YouTube videos. In our experimental evaluation, we obtain encouraging classification accuracy given the challenging nature of the data. Our results show that it is possible to perform sentiment analysis on natural spontaneous speech data despite high word error rates (WER).
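As a rough illustration of the text-based stage described above, the sketch below trains a bag-of-words Maximum Entropy classifier (realized here with scikit-learn's logistic regression, which is a multinomial MaxEnt model) on POS-filtered tokens. The toy transcripts, labels, and the kept tag prefixes are hypothetical stand-ins, not the paper's data or feature definition.

```python
# Minimal sketch: POS-filtered bag-of-words + logistic regression (MaxEnt).
# Toy data below is hypothetical, not from the paper.
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Fetch tagger/tokenizer resources (names vary across NLTK versions).
for res in ("punkt", "punkt_tab", "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(res, quiet=True)

# Toy ASR-style transcripts standing in for decoded YouTube audio.
transcripts = [
    "i really love this phone the camera is great",
    "terrible battery life i hate the screen",
    "amazing sound quality works great",
    "awful product broke after one day",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

KEEP_PREFIXES = ("JJ", "RB", "VB", "NN")  # adjectives, adverbs, verbs, nouns

def pos_filter(text):
    """Keep only sentiment-bearing word classes before bag-of-words extraction."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return " ".join(w for w, t in tagged if t.startswith(KEEP_PREFIXES))

model = Pipeline([
    ("bow", CountVectorizer(preprocessor=pos_filter)),
    ("maxent", LogisticRegression(max_iter=1000)),
])
model.fit(transcripts, labels)
print(model.predict(["the sound is great but the battery is awful"]))
```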


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Automatic sentiment extraction from YouTube videos

Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Extracting speaker sentiment from natural audio streams such as YouTube is challenging. A number of factors contribute to the task difficulty, namely, Automatic Speech Recognition (ASR) of spontaneous speech, unknown background environments, variable source and channel characteristics, accents, diverse topics, etc. In this study, we build upon our previous work [5], where we had proposed a system for detecting sentiment in YouTube videos. In particular, we propose several enhancements including (i) a better text-based sentiment model due to training on a larger and more diverse dataset, (ii) an iterative scheme to reduce sentiment model complexity with minimal impact on performance accuracy, (iii) better speech recognition due to superior acoustic modeling and focused (domain-dependent) vocabulary/language models, and (iv) a larger evaluation dataset. Collectively, our enhancements provide an absolute 10% improvement over our previous system in terms of sentiment detection accuracy. Additionally, we present analysis that helps understand the impact of WER (word error rate) on sentiment detection accuracy. Finally, we investigate the relative importance of different Parts-of-Speech (POS) tag features towards sentiment detection. Our analysis reveals the practicality of this technology and also provides several potential directions for future work.
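One way such an iterative complexity-reduction scheme could look, assuming the same bag-of-words logistic-regression (MaxEnt) setup as above: repeatedly drop the lowest-weight vocabulary terms and retrain while held-out accuracy holds. The function name, drop fraction, stopping tolerance, and toy data are illustrative choices, not the paper's exact procedure.

```python
# Sketch of iterative vocabulary pruning for a MaxEnt-style text classifier.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def prune_maxent(train_texts, train_labels, dev_texts, dev_labels,
                 drop_frac=0.2, tol=0.01, min_terms=10):
    """Iteratively drop the lowest-|weight| terms while dev accuracy holds."""
    vocab, kept_vocab, best_acc = None, None, -1.0
    while True:
        vec = CountVectorizer(vocabulary=vocab)
        X = vec.fit_transform(train_texts)
        clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
        acc = clf.score(vec.transform(dev_texts), dev_labels)
        if acc < best_acc - tol:                    # pruning has started to hurt
            return kept_vocab
        best_acc = max(best_acc, acc)
        kept_vocab = list(vec.get_feature_names_out())
        weights = np.abs(clf.coef_).max(axis=0)     # per-term importance
        order = np.argsort(weights)                 # ascending |weight|
        drop = max(1, int(drop_frac * len(order)))
        vocab = sorted(kept_vocab[i] for i in order[drop:])
        if len(vocab) < min_terms:                  # floor on model size
            return vocab

# Toy usage with hypothetical train/dev splits.
train = ["love it great", "hate it awful", "great great sound", "awful awful battery"]
dev = ["love the sound", "hate the battery"]
print(prune_maxent(train, [1, 0, 1, 0], dev, [1, 0], min_terms=2))
```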


Conference of the International Speech Communication Association | 2016

A Speaker Diarization System for Studying Peer-Led Team Learning Groups

Harishchandra Dubey; Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Peer-led team learning (PLTL) is a model for teaching STEM courses where small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers gain insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc. Towards this, speech and language technology can help, and speaker diarization technology lays the foundation for analysis. In this study, a new corpus called CRSS-PLTL is established, containing speech data from 5 PLTL teams over a semester (10 sessions per team with 5 to 8 participants per team). In CRSS-PLTL, every participant wears a LENA device (a portable audio recorder) that provides multiple audio recordings of the event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed the G3 algorithm, used in conjunction with Hausdorff-distance based clustering to provide improved detection accuracy. Additionally, we exploit cross-channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analysis such as the number of conversational turns taken in a session and the speaking-time duration (participation) for each speaker.
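A minimal sketch of the Hausdorff-distance clustering step, assuming each speech segment has already been found (e.g., by a change detector) and is represented as a matrix of MFCC frames. The paper's online G3 change detection algorithm is not reproduced here, and the use of average-linkage agglomerative clustering with a known speaker count is an assumption; the random "MFCC" data is purely illustrative.

```python
# Sketch: cluster speech segments with a symmetric Hausdorff distance.
import numpy as np
from scipy.spatial.distance import directed_hausdorff, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of feature frames."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def cluster_segments(segment_features, n_speakers):
    """Assign a speaker label to each segment (list of [n_frames, n_mfcc] arrays)."""
    n = len(segment_features)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = hausdorff(segment_features[i], segment_features[j])
    Z = linkage(squareform(D), method="average")            # agglomerative clustering
    return fcluster(Z, t=n_speakers, criterion="maxclust")  # label per segment

# Hypothetical usage: 6 segments of random "MFCC" frames, 2 speakers expected.
rng = np.random.default_rng(0)
segs = [rng.normal(loc=i % 2, size=(50, 13)) for i in range(6)]
print(cluster_segments(segs, n_speakers=2))
```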


International Conference on Acoustics, Speech, and Signal Processing | 2015

Prof-Life-Log: Analysis and classification of activities in daily audio streams

Ali Ziaei; Abhijeet Sangwan; Lakshmish Kaushik; John H. L. Hansen

A new method to analyze and classify daily activities in personal audio recordings (PARs) is presented. The method employs speech activity detection (SAD) and speaker diarization systems to provide high-level semantic segmentation of the audio file. Subsequently, a number of audio, speech, and lexical features are computed in order to characterize events in daily audio streams. The features are selected to capture the statistical properties of conversations, topics, and turn-taking behavior, which creates a classification space that allows us to capture the differences in interactions. The proposed system is evaluated on 9 days of data from the Prof-Life-Log corpus, which contains naturalistic long-duration audio recordings (each file is collected continuously and lasts between 8 and 16 hours). Our experimental results show that the proposed system achieves good classification accuracy on a difficult real-world dataset.
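To make the turn-taking side of that feature space concrete, here is a small sketch that computes a few conversation descriptors from hypothetical diarization output, given as (speaker, start, end) tuples for one analysis window; the paper's actual audio, speech, and lexical feature set is considerably richer.

```python
# Sketch: simple turn-taking statistics for one analysis window.
from collections import defaultdict

def turn_taking_features(segments, window_dur):
    """Return basic conversation descriptors from (speaker, start_sec, end_sec) tuples."""
    segments = sorted(segments, key=lambda s: s[1])
    talk_time = defaultdict(float)
    turns, prev_spk = 0, None
    for spk, start, end in segments:
        talk_time[spk] += end - start
        if spk != prev_spk:
            turns += 1
        prev_spk = spk
    total_speech = sum(talk_time.values())
    return {
        "num_speakers": len(talk_time),
        "num_turns": turns,
        "speech_fraction": total_speech / window_dur,
        "mean_turn_dur": total_speech / max(turns, 1),
    }

print(turn_taking_features(
    [("A", 0.0, 4.0), ("B", 4.5, 6.0), ("A", 6.2, 9.0)], window_dur=10.0))
```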


International Conference on Signal Processing | 2004

Time-scaling of speech and music using independent subspace analysis

R. Muralishankar; Lakshmish Kaushik; A. G. Ramakrishnan

We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single-channel mixture signal is converted to a time-frequency representation such as a spectrogram. The spectrogram is generated by taking the Hartley or wavelet transform of overlapped frames of speech or music. We perform dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to obtain the unmixing matrix using the JADE ICA algorithm. It is assumed that the overall spectrogram results from the superposition of a number of unknown, statistically independent spectrograms. Using the unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with the time-scaled temporal amplitude envelopes. We sum these independent spectrograms and take the inverse Hartley or wavelet transform of the summed spectrogram. The reconstructed time-domain signal is overlap-added to obtain the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using the Modified Bark Spectral Distortion (MBSD). From the MBSD scores, one can infer that the time-scaled signal is less distorted.
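A rough sketch of the ISA time-scaling pipeline, with several substitutions relative to the paper: an STFT magnitude spectrogram stands in for the Hartley/wavelet spectrogram, scikit-learn's FastICA (which applies its own PCA-style reduction internally) stands in for SVD plus JADE, and Griffin-Lim phase reconstruction replaces direct inverse transformation with overlap-add. The parameters and synthetic test signal are illustrative.

```python
# Sketch: ISA-style time-scaling via envelope resampling (substituted components noted above).
import numpy as np
import librosa
from scipy.signal import resample
from sklearn.decomposition import FastICA

def isa_time_scale(y, sr, rate=1.5, n_components=8, n_fft=1024, hop=256):
    """Stretch (rate > 1) or compress (rate < 1) audio by resampling ISA temporal envelopes."""
    V = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))   # (freq, time) magnitude
    ica = FastICA(n_components=n_components, random_state=0)
    envelopes = ica.fit_transform(V.T)                         # (time, k) temporal envelopes
    weights = ica.mixing_                                      # (freq, k) frequency weights
    new_len = int(round(envelopes.shape[0] * rate))
    stretched = resample(envelopes, new_len, axis=0)           # time-scale each envelope
    V_scaled = weights @ stretched.T + ica.mean_[:, None]      # recombine the spectrogram
    V_scaled = np.maximum(V_scaled, 0.0)                       # magnitudes are non-negative
    return librosa.griffinlim(V_scaled, hop_length=hop, n_iter=32)

# Synthetic test signal (decaying 220 Hz tone) in place of real speech/music.
sr = 16000
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
y = np.sin(2 * np.pi * 220 * t) * np.exp(-t)
y_slow = isa_time_scale(y, sr, rate=1.5)
```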


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Automatic Sentiment Detection in Naturalistic Audio

Lakshmish Kaushik; Abhijeet Sangwan; John H. L. Hansen

Audio sentiment analysis using automatic speech recognition is an emerging research area where the opinion or sentiment exhibited by a speaker is detected from natural audio. It is relatively underexplored compared to text-based sentiment detection. Extracting speaker sentiment from natural audio sources is a challenging problem. Generic methods for sentiment extraction generally use transcripts from a speech recognition system and process the transcript using text-based sentiment classifiers. In this study, we show that this baseline system is suboptimal for audio sentiment extraction. Alternatively, a new architecture using keyword spotting (KWS) is proposed for sentiment detection. In the new architecture, a text-based sentiment classifier is utilized to automatically determine the most useful and discriminative sentiment-bearing keyword terms, which are then used as a term list for KWS. In order to obtain a compact yet discriminative sentiment term list, iterative feature optimization for the maximum entropy (ME) sentiment model is proposed to reduce model complexity while maintaining effective classification accuracy. A new hybrid ME-KWS joint scoring methodology is developed to model both text- and audio-based parameters in a single integrated formulation. For evaluation, two new databases are developed for audio-based sentiment detection, namely, a YouTube sentiment database and a newly developed corpus called the UT-Opinion audio archive. These databases contain naturalistic opinionated audio collected in real-world conditions. The proposed solution is evaluated on audio obtained from videos on youtube.com and the UT-Opinion corpus. Our experimental results show that the proposed KWS-based system significantly outperforms the traditional ASR architecture in detecting sentiment for challenging practical tasks.
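A minimal sketch of how a discriminative keyword term list might be pulled out of a MaxEnt (logistic-regression) sentiment model and combined with keyword-spotting confidences. The toy data, the term-list size, and the joint_score fusion are illustrative placeholders rather than the paper's ME-KWS formulation.

```python
# Sketch: derive a sentiment keyword list from MaxEnt weights, then fuse with KWS scores.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great phone love it", "awful battery hate it",
         "amazing camera great value", "terrible screen broke quickly"]
labels = [1, 0, 1, 0]  # toy positive/negative labels

vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)

terms = vec.get_feature_names_out()
weights = clf.coef_[0]                        # positive weight -> positive sentiment
top = np.argsort(np.abs(weights))[::-1][:6]   # most discriminative terms
term_list = [(terms[i], float(weights[i])) for i in top]
print("KWS term list:", term_list)

def joint_score(keyword_hits):
    """Toy fusion: sum ME weights of spotted keywords, scaled by KWS confidence."""
    w = dict(term_list)
    return sum(w.get(kw, 0.0) * conf for kw, conf in keyword_hits)

# Hypothetical KWS output for one clip: (keyword, detection confidence) pairs.
print("clip sentiment score:", joint_score([("great", 0.9), ("broke", 0.4)]))
```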


Journal of the Acoustical Society of America | 2018

UTDallas-PLTL: Advancing multi-stream speech processing for interaction assessment in peer-led team learning

John H. L. Hansen; Harishchandra Dubey; Abhijeet Sangwan; Lakshmish Kaushik; Vinay Kothapally

Robust speech processing for single-stream audio data has achieved significant progress in the last decade. However, multi-stream speech processing poses new challenges not present in single-stream data. Peer-led team learning (PLTL) is a teaching paradigm popular among US universities for undergraduate education in STEM courses. In collaboration with the UTDallas Student Success Center, we collected the CRSS-PLTL and CRSS-PLTL-II corpora for assessment of speech communications in PLTL sessions. Both corpora consist of longitudinal recordings of five teams studying undergraduate Chemistry and Calculus courses, amounting to 300 hours of speech data. The multi-stream audio data has unique challenges: (i) time-synchronization; (ii) multi-stream speech processing for speech activity detection, speaker diarization and linking, and speech recognition; and (iii) behavioral informatics. We used a 1 kHz tone at the start and end of each session for time-synchronization of the multi-stream audio. We leveraged an auto-encoder neural network to combine MFCC features from multiple streams into compact bottleneck features. After diarization, each speaker segment is analyzed for behavioral metrics such as (i) dominance; (ii) curiosity in terms of question inflections; (iii) speech rate; (iv) cohesion; and (v) turn-duration and turn-taking patterns. Results are presented on individual and team-based conversational interactions. This research suggests new emerging opportunities for wearable speech systems in education research.
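A small sketch of the 1 kHz tone-based synchronization idea, assuming each stream is a mono NumPy array at a common sample rate with a clean sync tone near its start; the band-pass width, frame size, and threshold are illustrative, not the corpus processing pipeline itself.

```python
# Sketch: detect the 1 kHz sync tone onset in each stream and trim streams to align.
import numpy as np
from scipy.signal import butter, sosfilt

def tone_onset(x, sr, freq=1000.0, frame=1024, threshold_db=-25.0):
    """Return the sample index where narrow-band energy around `freq` first exceeds a threshold."""
    sos = butter(4, [freq - 50, freq + 50], btype="bandpass", fs=sr, output="sos")
    band = sosfilt(sos, x)
    n_frames = len(band) // frame
    energy = (band[: n_frames * frame].reshape(n_frames, frame) ** 2).mean(axis=1)
    ref = (x ** 2).mean() + 1e-12                    # reference: overall signal power
    db = 10.0 * np.log10(energy / ref + 1e-12)
    hits = np.where(db > threshold_db)[0]
    return int(hits[0] * frame) if len(hits) else 0

def synchronize(streams, sr):
    """Trim every stream so that the detected sync tone starts at sample 0."""
    onsets = [tone_onset(x, sr) for x in streams]
    return [x[onset:] for x, onset in zip(streams, onsets)]
```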


Journal of the Acoustical Society of America | 2018

Fearless steps: Advancing speech and language processing for naturalistic audio streams from Earth to the Moon with Apollo

John H. L. Hansen; Abhijeet Sangwan; Lakshmish Kaushik; Chengzhu Yu

NASA’s Apollo program represents one of the greatest achievements of mankind in the 20th century. CRSS-UTDallas has completed an effort to digitize and establish an Apollo audio corpus. The entire Apollo mission speech data consists of well over 100,000 hours. The focus of this effort is to contribute to the development of Spoken Language Technology (SLT) based algorithms to analyze and understand various aspects of conversational speech. Towards achieving this goal, a new 30-track analog audio decoder was designed using the NASA Soundscriber. We have digitized 19,000 hours of data from the Apollo 11, 13, and 1 missions, named “Fearless Steps”. An automated diarization and transcript generation solution was developed based on deep neural network (DNN) automatic speech recognition (ASR) along with Apollo mission-specific language models. Demonstrations of speech technologies including speech activity detection (SAD), speaker identification (SID), and ASR are shown for segments of the corpus. We will release this corpus to the SLT community. The data provide an opportunity for challenging tasks in various SLT areas. We have also defined and proposed five tasks as part of a community-based SLT challenge: (1) automatic speech recognition, (2) speaker identification, (3) speech activity detection, (4) speaker diarization, and (5) keyword spotting and joint topic/sentiment detection. All data, transcripts, and guidelines for employing the Fearless Steps corpus will be made freely available to the community.


International Conference on Asian Digital Libraries | 2016

Toward Access to Multi-Perspective Archival Spoken Word Content

Douglas W. Oard; John H. L. Hansen; Abhijeet Sangwan; Bryan Toth; Lakshmish Kaushik; Chengzhu Yu

During the mid-twentieth century Apollo missions to the Moon, dozens of intercommunication and telecommunication voice channels were recorded for historical purposes in the Mission Control Center. These recordings are now being digitized. This paper describes initial experiments with integrating multi-channel audio into a mission reconstruction system, as well as work in progress on more advanced user experience designs.


Journal of the Acoustical Society of America | 2016

Prof-Life-Log: Monitoring and assessment of human speech and acoustics using daily naturalistic audio streams

John H. L. Hansen; Abhijeet Sangwan; Ali Ziaei; Harishchandra Dubey; Lakshmish Kaushik; Chengzhu Yu

Speech technology advancements have progressed significantly in the last decade, yet major research challenges continue to impact effective advancements for diarization in naturalistic environments. Traditional diarization efforts have focused on single audio streams based on telephone communications, broadcast news, and/or scripted speeches or lectures. Limited effort has focused on extended naturalistic data. Here, algorithm advancements are established for an extensive daily audio corpus called Prof-Life-Log, consisting of 80+ days of 8-16 hour recordings from an individual’s daily life. Advancements include the formulation of (i) an improved threshold-optimized multiple-feature speech activity detector (TO-Combo-SAD), (ii) advanced primary vs. secondary speaker detection, (iii) an advanced word-count system using part-of-speech tagging and bag-of-words construction, (iv) environmental “sniffing” advancements to identify location based on properties of the acoustic space, and (v) diarization interaction ana...
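As a concrete (and much simplified) picture of item (iii), the sketch below counts words per coarse POS class and builds a bag-of-words for a single transcript window using NLTK; the tag groupings and returned fields are illustrative, not the system's actual word-count features.

```python
# Sketch: POS-tag-based word counts and a bag-of-words for one transcript window.
from collections import Counter
import nltk

# Fetch tagger/tokenizer resources (names vary across NLTK versions).
for res in ("punkt", "punkt_tab", "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(res, quiet=True)

def word_count_features(transcript):
    """Count words per coarse POS class and build a bag-of-words for one window."""
    tagged = nltk.pos_tag(nltk.word_tokenize(transcript.lower()))
    pos_counts = Counter(tag[:2] for _, tag in tagged)         # NN, VB, JJ, RB, ...
    bow = Counter(word for word, tag in tagged if tag[0].isalpha())
    return {"total_words": len(tagged),
            "noun_count": pos_counts["NN"],
            "verb_count": pos_counts["VB"],
            "bag_of_words": bow}

print(word_count_features("the students discussed the calculus homework together"))
```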

Collaboration


Dive into Lakshmish Kaushik's collaboration.

Top Co-Authors

John H. L. Hansen
University of Texas at Dallas

Abhijeet Sangwan
University of Texas at Dallas

Chengzhu Yu
University of Texas at Dallas

Ali Ziaei
University of Texas at Dallas

Harishchandra Dubey
University of Texas at Dallas

A. G. Ramakrishnan
Indian Institute of Science

Douglas D. O'Shaughnessy
Institut national de la recherche scientifique

Ahmet Emin Bulut
University of Texas at Dallas

Vinay Kothapally
University of Texas at Dallas