Shammur Absar Chowdhury
University of Trento
Publication
Featured research published by Shammur Absar Chowdhury.
international conference on acoustics, speech, and signal processing | 2015
Shammur Absar Chowdhury; Morena Danieli; Giuseppe Riccardi
Overlapping speech is a common and relevant phenomenon in human conversations, reflecting many aspects of discourse dynamics. In this paper, we focus on the pragmatic role of overlaps in turn-in-progress, where an overlap can be categorized as competitive or non-competitive. Previous studies on these two categories have mostly relied on controlled scenarios and small datasets. In our study, we focus on call center data, with customers and operators engaged in problem-solving tasks. We propose and evaluate an annotation scheme for these two overlap categories in the context of spontaneous, in-vivo human conversations. We analyze the distinctive predictive characteristics of a very large set of high-dimensional acoustic features. We obtained a significant improvement in classification results as well as a significant reduction in the feature set size.
conference of the international speech communication association | 2016
Shammur Absar Chowdhury; Evgeny A. Stepanov; Giuseppe Riccardi
User satisfaction is an important aspect of the user experience while interacting with objects, systems, or people. Traditionally, user satisfaction is evaluated a posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis, we aim to measure the user's emotional states and their descriptions as they unfold during the interaction. In our approach, user satisfaction is modeled as the final state of a sequence of emotional states and takes one of three values: positive, negative, or neutral. In this paper, we investigate the discriminating power of turn-taking in predicting user satisfaction in spoken conversations. Turn-taking is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing. We train different characterizations of turn-taking, such as the competitiveness of speech overlaps. To extract turn-taking features, we design a turn segmentation and labeling system that incorporates lexical and acoustic information. Given a human-human spoken dialog, our system automatically infers one of the three values of the user satisfaction state. We evaluate the classification system on real-life call-center human-human dialogs. The comparative performance analysis shows that turn-taking features outperform both prosodic and lexical features.
international conference on acoustics, speech, and signal processing | 2016
Giuseppe Riccardi; Evgeny A. Stepanov; Shammur Absar Chowdhury
Discourse parsing is an important task in Language Understanding with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and discourse parsers rely heavily on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the semantic relations between discourse units in human-human spoken conversations. In particular, in this paper we focus on the detection of discourse connectives, which are the predicates of such relations. The discourse relations are drawn from the Penn Discourse Treebank annotation model and adapted to domain-specific Italian human-human spoken conversations. We study the relevance of lexical and acoustic context in predicting discourse connectives. We observe that both lexical and acoustic context have a mixed effect on the prediction of specific connectives. While the oracle of using lexical and acoustic contextual feature combinations is F1 = 68.53, the lexical context alone significantly outperforms the baseline by more than 10 points with F1 = 64.93.
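For readers unfamiliar with the metric, the F1 scores quoted above are the harmonic mean of precision and recall. The following is a minimal sketch of that computation for a binary connective/non-connective labeling. The function name `f1_score` and the toy label lists are hypothetical and are not taken from the paper; the abstract reports F1 on a 0-100 scale, whereas this sketch returns values in the range 0-1.

```python
def f1_score(gold, predicted):
    """Return (precision, recall, F1) for binary labels, where 1 marks a connective."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical token-level labels, for illustration only.
gold = [1, 0, 0, 1, 1, 0, 1, 0]
pred = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = f1_score(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```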
language resources and evaluation | 2018
Evgeny A. Stepanov; Shammur Absar Chowdhury; Ali Orkan Bayer; Arindam Ghosh; Ioannis Klasinas; Marcos Calvo; Emilio Sanchis; Giuseppe Riccardi
Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations are difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer that delegates to crowds a complex task, such as segmenting and labeling concepts taken from a domain ontology, and evaluates the result using source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer, we have considered the case of close and distant language pairs: Italian–Spanish and Italian–Greek. The corpora annotated via crowdsourcing are evaluated against source and target language expert annotations. We demonstrate that the two evaluation references (source and target) highly correlate with each other, thus drastically reducing the need for target language reference annotations.
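The claim that the two evaluation references correlate can be illustrated with a small sketch. The abstract does not say which correlation measure was used, so Pearson's r is an assumption here, and the function name `pearson` and the score lists are hypothetical illustration data, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical quality scores for the same crowd annotations, measured once
# against the source-language reference and once against the target-language one.
source_ref_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
target_ref_scores = [0.60, 0.74, 0.50, 0.83, 0.65]
r = pearson(source_ref_scores, target_ref_scores)
print(f"Pearson r = {r:.3f}")
```

If the two sets of scores track each other this closely, scoring against the source-language reference alone could stand in for the target-language reference, which is the practical point the abstract makes.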
computer and information technology | 2016
Firoj Alam; Shammur Absar Chowdhury; Sheak Rashed Haider Noori
Part-of-speech (POS) information is one of the fundamental components in the natural language processing pipeline, which helps in extracting higher-level information such as named entities, discourse, and the syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, it is considered a solved problem due to the high accuracy (97%) of the prediction systems. Significant efforts have been made for such languages in terms of making the data publicly accessible and organizing evaluation campaigns. By comparison, there have been far fewer efforts for Bangla (ethnonym: Bangla; exonym: Bengali). In this paper, we present a knowledge-poor approach for POS tagging, which we evaluated using a publicly accessible dataset from LDC. The motivation for our approach is that we did not want to rely on any existing resources, such as a lexicon or named entity recognizer, for designing the system, as they are not publicly available and are difficult to develop. We have not used any handcrafted features; rather, we employed distributed representations of words and characters. We designed the system using Long Short-Term Memory (LSTM) neural networks followed by Conditional Random Fields (CRFs), and included a pre-trained word embedding model. We obtained promising results with an accuracy of 86.0%.
conference of the international speech communication association | 2014
Shammur Absar Chowdhury; Giuseppe Riccardi; Firoj Alam
conference of the international speech communication association | 2015
Shammur Absar Chowdhury; Morena Danieli; Giuseppe Riccardi
conference of the international speech communication association | 2014
Shammur Absar Chowdhury; Arindam Ghosh; Evgeny A. Stepanov; Ali Orkan Bayer; Giuseppe Riccardi; Ioannis Klasinas
language resources and evaluation | 2016
Shammur Absar Chowdhury; Evgeny A. Stepanov; Giuseppe Riccardi
conference of the international speech communication association | 2013
Giuseppe Riccardi; Arindam Ghosh; Shammur Absar Chowdhury; Ali Orkan Bayer