
Publication


Featured research published by Sree Harsha Yella.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Improved overlap speech diarization of meeting recordings using long-term conversational features

Sree Harsha Yella

Overlapping speech is a source of significant errors in speaker diarization of spontaneous meeting recordings. Recent work on speaker diarization has attempted to solve the problem of overlap detection using classifiers trained on acoustic and spatial features. This paper proposes a method to improve a short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics. The statistics are obtained at segment level (around a few seconds) from the output of a diarization system. The approach is motivated by the observation that segments containing more speaker changes are more likely to contain overlaps. Experiments on the AMI meeting corpus reveal that the number of overlaps in a segment follows a Poisson distribution whose rate is directly proportional to the number of speaker changes in the segment. When this information is combined with acoustic information in an HMM/GMM overlap detector, improvements are observed in terms of F-measure and, consequently, the diarization error rate (DER) is reduced by 5% relative to the baseline overlap detector.
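The fusion of the Poisson prior with the acoustic detector described above can be sketched as follows; the proportionality constant `alpha` and the fusion weight `w` are illustrative placeholders, not values from the paper.

```python
import math

def poisson_log_pmf(k, lam):
    # log P(k overlaps in segment) under a Poisson distribution with rate lam
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def overlap_log_prior(num_speaker_changes, alpha=0.5):
    # Rate proportional to the number of speaker changes in the segment
    # (alpha is a hypothetical proportionality constant).
    lam = alpha * max(num_speaker_changes, 1)
    # Prior that the segment contains at least one overlap
    p_overlap = 1.0 - math.exp(poisson_log_pmf(0, lam))
    return math.log(p_overlap), math.log(1.0 - p_overlap)

def classify_segment(acoustic_ll_overlap, acoustic_ll_single, num_changes, w=1.0):
    # Fuse acoustic log-likelihoods with the conversational log-prior
    # (w weights the prior; a tunable, hypothetical parameter).
    lp_ov, lp_single = overlap_log_prior(num_changes)
    score_ov = acoustic_ll_overlap + w * lp_ov
    score_single = acoustic_ll_single + w * lp_single
    return "overlap" if score_ov > score_single else "single"
```

With many speaker changes the prior pushes a tie toward the overlap class; with few changes a small acoustic margin is enough to keep the single-speaker decision.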


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations

Sree Harsha Yella

Overlapping speech has been identified as one of the main sources of errors in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies in conversational analysis have shown that overlapping speech is more likely to occur at specific parts of a conversation, and that overlap occurrence is correlated with various conversational features such as speech/silence patterns and speaker turn changes. We use features capturing this higher-level information about the structure of a conversation, namely silence and speaker change statistics, to improve an acoustic feature based classifier of overlapping and single-speaker speech classes. The silence and speaker change statistics are computed over a long-term window (around 3-4 seconds) and are used to predict the probability of overlap in the window. These estimates are then incorporated into the acoustic feature based classifier as prior probabilities of the classes. Experiments conducted on three corpora (AMI, NIST-RT and ICSI) show that the proposed method improves the performance of the acoustic feature based overlap detector on all the corpora. They also reveal that the model based on long-term conversational features, learned on the AMI corpus to estimate the probability of overlap, generalizes to meetings from the other corpora (NIST-RT and ICSI). Moreover, experiments on the ICSI corpus reveal that the proposed method also improves laughter overlap detection. Consequently, applying overlap handling techniques to speaker diarization using the detected overlap reduces the diarization error rate (DER) on all three corpora.
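The long-term window statistics can be sketched as below, assuming 10 ms frames and per-frame diarization labels; the logistic mapping and its coefficients are hypothetical stand-ins for the trained model, not taken from the paper.

```python
import math

def window_stats(frame_labels, start, win=350):
    # frame_labels: per-frame diarization output, "sil" or a speaker id,
    # at an assumed 10 ms per frame, so win=350 frames covers ~3.5 s
    # (within the 3-4 s window length mentioned above).
    w = frame_labels[start:start + win]
    silence_frac = sum(1 for f in w if f == "sil") / len(w)
    speech = [f for f in w if f != "sil"]
    changes = sum(1 for a, b in zip(speech, speech[1:]) if a != b)
    return silence_frac, changes

def overlap_prior(silence_frac, changes, b0=-2.0, b1=-3.0, b2=0.8):
    # Logistic mapping from window statistics to P(overlap): more speaker
    # changes raise the prior, more silence lowers it. The coefficients
    # are illustrative placeholders, not trained values.
    z = b0 + b1 * silence_frac + b2 * changes
    return 1.0 / (1.0 + math.exp(-z))
```

The resulting probability would then be used as the class prior when decoding the acoustic overlap/single-speaker classifier over that window.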


Spoken Language Technology Workshop | 2010

Significance of anchor speaker segments for constructing extractive audio summaries of broadcast news

Sree Harsha Yella; Vasudeva Varma; Kishore Prahallad

Analysis of human reference summaries of broadcast news showed that humans give preference to anchor speaker segments while constructing a summary. Therefore, we exploit the role of the anchor speaker in a news show by tracking his/her speech to construct indicative/informative extractive audio summaries. Speaker tracking is done using the Bayesian information criterion (BIC). The proposed technique requires neither Automatic Speech Recognition (ASR) transcripts nor human reference summaries for training. Objective evaluation with ROUGE showed that summaries generated by the proposed technique are as good as those generated by a baseline text summarization system taking manual transcripts as input, and those generated by a supervised speech summarization system trained on human summaries. Subjective evaluation of the audio summaries showed that humans prefer summaries generated by the proposed technique to those generated by the supervised speech summarization system.
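BIC-based speaker tracking of this kind typically rests on a ΔBIC comparison between a candidate segment and the anchor's segments; a minimal sketch with full-covariance Gaussians follows (the penalty weight `lam` is a tunable, and the exact modelling choices in the paper may differ).

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    # ΔBIC for segments x and y (arrays of frames x feature dims):
    # one shared full-covariance Gaussian vs. one Gaussian per segment.
    # Negative values suggest the two segments come from the same speaker,
    # so a candidate segment close to the anchor model is tagged "anchor".
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet(a):
        _, ld = np.linalg.slogdet(np.cov(a, rowvar=False))
        return ld

    r = 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - penalty
```

Sweeping this comparison over the show's speaker segments yields the anchor track from which the extractive summary is assembled.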


International Conference on Acoustics, Speech, and Signal Processing | 2014

Information bottleneck based speaker diarization of meetings using non-speech as side information

Sree Harsha Yella

Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the NIST RT 06, 07 and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to the baseline speaker diarization system based on the information bottleneck framework. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly narrows the gap in performance between the information bottleneck system and the HMM/GMM system.
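The clustering objective behind information bottleneck with side information can be written as maximising I(C;Y) − γ I(C;Z) − I(C;X)/β, where Y is the relevant variable, Z the irrelevant (non-speech) side information, and X the segments being clustered. A small sketch of scoring a hard clustering under this objective, with illustrative γ and β values:

```python
import numpy as np

def mutual_info(p_joint):
    # I(A;B) in nats from a joint distribution table (rows: A, cols: B).
    p = np.asarray(p_joint, dtype=float)
    p = p / p.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (pa @ pb)[mask])).sum())

def ib_side_info_objective(p_cy, p_cz, p_cx, gamma=0.3, beta=10.0):
    # Score of a clustering C: preserve information about the relevant
    # variable Y, discard information about the irrelevant variable Z
    # (non-speech), and compress X. gamma and beta are tunables; the
    # values here are illustrative, not those used in the paper.
    return mutual_info(p_cy) - gamma * mutual_info(p_cz) - mutual_info(p_cx) / beta
```

A clustering that is informative about speech but carries no information about the non-speech side variable scores higher than one that also encodes the non-speech structure, which is exactly the behavior the side-information term enforces.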


Systems, Man and Cybernetics | 2011

Understanding social signals in multi-party conversations: Automatic recognition of socio-emotional roles in the AMI meeting corpus

Alessandro Vinciarelli; Fabio Valente; Sree Harsha Yella; Ashtosh Sapru

Any social interaction is characterized by roles: patterns of behavior recognized as such by the interacting participants and corresponding to shared expectations that people hold about their own behavior as well as the behavior of others. In this respect, social roles are a key aspect of social interaction because they are the basis for making reasonable guesses about human behavior. Recognizing roles is a crucial step towards understanding (possibly in an automatic way) any social exchange, whether this means identifying dominant individuals, detecting conflict, assessing engagement or spotting conversation highlights. This work presents an investigation of language-independent automatic social role recognition in AMI meetings, spontaneous multi-party conversations, based solely on turn organization and prosodic features. First, turn-taking statistics and prosodic features are integrated into a single generative conversation model, which achieves an accuracy of 59%. This model is then extended to explicitly account for dependencies (or influence) between speakers, achieving an accuracy of 65%. The last contribution investigates the statistical dependency between the formal and the social roles that participants have; integrating information related to the formal role into the recognition model achieves an accuracy of 68%. The paper concludes by highlighting some future directions.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Improving speaker diarization using social role information

Ashtosh Sapru; Sree Harsha Yella

Speaker diarization systems for meetings commonly model acoustic and spatial information, ignoring that meetings are instances of human interaction. Recent studies have shown that social roles influence the interaction patterns of speakers. This paper proposes a novel method to integrate social role information into the speaker diarization framework. First, we modify the minimum duration constraint in the baseline diarization system by using role information to model the expected duration of a speaker's turn. Furthermore, we propose a social role n-gram model as prior information on speaker interaction patterns. The proposed method is integrated into a state-of-the-art diarization system to reduce the speaker error. Experiments are performed on the AMI corpus, which is annotated in terms of social roles. The proposed method reduces the speaker error by 16% relative to the baseline HMM/GMM system. Furthermore, the paper investigates the performance of the proposed method in other meeting scenarios, such as those from the NIST Rich Transcription campaigns. Experiments on Rich Transcription meetings reveal that the speaker error can be reduced by 13% relative to the baseline system, demonstrating the potential of the proposed method.
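A role n-gram prior of the kind described above can be sketched as a smoothed bigram model over the sequence of social roles taken by successive turns; the role labels and add-one smoothing here are illustrative, not necessarily those used in the paper.

```python
from collections import Counter

def train_role_bigram(role_sequences, smoothing=1.0):
    # role_sequences: lists of social-role labels in turn order,
    # e.g. ["PM", "ME", "PM", "UI"] (labels illustrative).
    bigrams, unigrams = Counter(), Counter()
    roles = set()
    for seq in role_sequences:
        roles.update(seq)
        unigrams.update(seq[:-1])      # contexts: every turn but the last
        bigrams.update(zip(seq, seq[1:]))
    v = len(roles)

    def prob(prev, cur):
        # Add-one smoothed bigram probability P(cur | prev), used as a
        # prior on which role is likely to speak next.
        return (bigrams[(prev, cur)] + smoothing) / (unigrams[prev] + smoothing * v)

    return prob
```

During diarization this prior would be combined with the acoustic evidence when hypothesising the next speaker, favouring role transitions that are frequent in the training meetings.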


Conference of the International Speech Communication Association | 2012

Automatic detection of conflict escalation in spoken conversations

Samuel Kim; Sree Harsha Yella; Fabio Valente


Conference of the International Speech Communication Association | 2012

Speaker diarization of overlapping speech based on silence distribution in meeting recordings

Sree Harsha Yella; Fabio Valente


Conference of the International Speech Communication Association | 2015

A comparison of neural network feature transforms for speaker diarization

Sree Harsha Yella; Andreas Stolcke


Conference of the International Speech Communication Association | 2011

Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings

Sree Harsha Yella; Fabio Valente

Collaboration


Dive into Sree Harsha Yella's collaboration.

Top Co-Authors

Fabio Valente, Idiap Research Institute
Kishore Prahallad, International Institute of Information Technology
Vasudeva Varma, International Institute of Information Technology
Ashtosh Sapru, Idiap Research Institute
Petr Motlicek, Idiap Research Institute
Samuel Kim, Idiap Research Institute