Manjira Sinha | Researchain

Publication

Featured research published by Manjira Sinha.

acm conference on hypertext | 2017

Multi-part Representation Learning For Cross-domain Web Content Classification Using Neural Networks

Ganesh J; Himanshu Sharad Bhatt; Manjira Sinha; Shourya Roy

Owing to the tremendous increase in the volume and variety of user-generated content, train-once-apply-forever models are insufficient for supervised learning tasks. The need is to develop algorithms that can adapt across domains by leveraging labeled data from source domain(s) and efficiently perform the task in the unlabeled target domain. Towards this, we present a novel two-stage neural network learning algorithm for domain adaptation which learns a multi-part hidden layer where individual parts contribute differently to the tasks in source and target domains. The multiple parts of the representation (i.e. hidden layer) are learned while being cognizant of what characteristics to transfer across domains and what to preserve within domains for enhanced performance. The first stage revolves around learning a two-part representation, i.e. source-specific and common representations, in a manner such that the former does not detract from the ability of the latter to represent the target domain. In the second stage, the generalized common representation is further iteratively extended with discriminating target-specific characteristics to adapt to the target domain. We empirically demonstrate that the learned representations, in different arrangements, outperform existing domain adaptation algorithms in the source classification as well as the cross-domain classification tasks on user-generated content from different domains on the web.
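The idea of a hidden layer whose parts serve different domains can be illustrated with a minimal sketch. It assumes a hidden vector that has already been learned and is split into three hypothetical slices (source-specific, common, target-specific), and it only shows which slices each domain's classifier might read. The slice sizes, the names, and the particular arrangement are illustrative assumptions; the sketch does not reproduce the two-stage training procedure described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of a 6-unit hidden layer; sizes are illustrative only.
PARTS: Dict[str, Tuple[int, int]] = {
    "source_specific": (0, 2),
    "common": (2, 4),
    "target_specific": (4, 6),
}

# Assumed arrangement: each domain reads the common part plus its own part.
ARRANGEMENTS: Dict[str, List[str]] = {
    "source": ["source_specific", "common"],
    "target": ["common", "target_specific"],
}


def select_parts(hidden: List[float], domain: str) -> List[float]:
    """Return the slices of the hidden vector used for the given domain."""
    features: List[float] = []
    for part in ARRANGEMENTS[domain]:
        start, end = PARTS[part]
        features.extend(hidden[start:end])
    return features


# Toy hidden activations (made-up values).
hidden = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
source_features = select_parts(hidden, "source")
target_features = select_parts(hidden, "target")
```

Here both domains share the common slice, while the source-specific slice never reaches the target classifier, which is one way to read "what to transfer across domains and what to preserve within domains".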

pacific-asia conference on knowledge discovery and data mining | 2017

Multi-task Representation Learning for Enhanced Emotion Categorization in Short Text

Anirban Sen; Manjira Sinha; Sandya Mannarswamy; Shourya Roy

Embedding-based dense contextual representations of data have proven to be efficient in various NLP tasks as they alleviate the burden of heavy feature engineering. However, generalized representation learning approaches do not capture task-specific subtleties. In addition, the computational model for each task is often developed in isolation, overlooking the interrelation among certain NLP tasks. Given that representation learning typically requires a good amount of labeled data, which is scarce, it is essential to explore learning embeddings under the joint supervision of multiple related tasks while also incorporating task-specific attributes. Inspired by the basic premise of multi-task learning, which supposes that correlation between related tasks can be used to improve classification, we propose a novel technique for building jointly learnt task-specific embeddings for emotion and sentiment prediction tasks. Here, a sentiment prediction task acts as an auxiliary input to enhance the primary emotion prediction task. Our experimental results demonstrate that embeddings learnt under the supervised signals of two related tasks outperform embeddings learnt in a uni-tasked setup for the downstream task of emotion prediction.

conference on information and knowledge management | 2018

Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion

Shreshtha Mundra; Sachin Kumar; Manjira Sinha; Sandya Mannarswamy

Today electronic communications have become the prime medium for people to express their opinions and influence policy preferences. One such popular channel reflecting the voice of the masses is electronic petitions. To understand people's perspectives on various issues, it is important to know what petitions say. However, due to the sheer volume of petitions, it is difficult to process each petition manually. As each petition talks about a different issue, prioritizing one over another is difficult. To alleviate these challenges, we present an end-to-end system for generating comprehensive and concise summaries from e-petitions. A petition contains multiple aspects: the core problem, evidence in support of the problem, and potential solutions. Therefore, it is imperative that a useful summary contain information about all three aspects explicitly. To achieve this, our system generates three aspect-based summaries for each petition for better understanding. We also introduce a new annotated petition dataset, developed through crowd-sourcing, that served as the gold standard. Our model is tested through quantitative and qualitative evaluations.

pacific-asia conference on knowledge discovery and data mining | 2017

Fine-Grained Emotion Detection in Contact Center Chat Utterances

Shreshtha Mundra; Anirban Sen; Manjira Sinha; Sandya Mannarswamy; Sandipan Dandapat; Shourya Roy

Contact center chats are textual conversations involving customers and agents on queries, issues, grievances etc. about products and services. Contact centers conduct periodic analysis of these chats to measure customer satisfaction, of which the chat emotion forms one crucial component. Typically, these measures are computed at chat level. However, retrospective chat-level analysis is not sufficiently actionable for agents as it does not capture the variation in the emotion distribution across the chat. Towards that, we propose two novel weakly supervised approaches for detecting fine-grained emotions in contact center chat utterances in real time. In our first approach, we identify novel contextual and meta features and treat the task of emotion prediction as a sequence labeling problem. In the second approach, we propose a neural-net-based method for emotion prediction in call center chats that does not require extensive feature engineering. We establish the effectiveness of the proposed methods by empirically evaluating them on a real-life contact center chat dataset. We achieve average accuracies of 72.6% with our first approach and 74.38% with our second approach.
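Treating utterance-level emotion prediction as sequence labeling means each utterance is described by features that also look at its neighbours in the chat. The following is a minimal sketch of such a feature extractor. The feature names, the negation word list, and the sample chat are all hypothetical; the abstract does not enumerate the contextual and meta features actually used.

```python
from typing import Dict, List, Tuple

Utterance = Tuple[str, str]  # (speaker, text)

# Hypothetical cue list, used only for illustration.
NEGATION_WORDS = {"not", "no", "never"}


def utterance_features(chat: List[Utterance], i: int) -> Dict[str, object]:
    """Build a feature dict for utterance i, using its position and the
    previous speaker as simple contextual and meta features."""
    speaker, text = chat[i]
    tokens = text.lower().split()
    prev_speaker = chat[i - 1][0] if i > 0 else "<start>"
    return {
        "speaker": speaker,
        "prev_speaker": prev_speaker,
        "position": i / max(len(chat) - 1, 1),  # relative position in chat
        "num_tokens": len(tokens),
        "has_exclamation": "!" in text,
        "has_negation": any(t in NEGATION_WORDS for t in tokens),
    }


# Made-up chat, not taken from any dataset.
chat = [
    ("customer", "My order has not arrived"),
    ("agent", "I am sorry, let me check"),
    ("customer", "Thank you!"),
]
features = [utterance_features(chat, i) for i in range(len(chat))]
```

A list of per-utterance dictionaries like `features` is the typical input shape for a sequence labeler, which then predicts one emotion label per utterance while taking the order of the chat into account.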

international conference on management of data | 2017

Web and Social Media Analytics towards Enhancing Urban Transportations: A Case for Bangalore

Manjira Sinha; Preethy Varma; Tridib Mukherjee

Cities today are typically plagued by multiple issues such as traffic jams, garbage, transit overload, public safety, and drainage. Citizens today tend to discuss these issues widely in public forums, social media, and web blogs. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for collection, categorization, aggregation and visualization of urban public transportation issues. The primary challenges in deriving useful insights from web-based sources stem from: (a) the number of reports; (b) incomplete or implicit spatio-temporal context; and (c) the unstructured nature of text in these reports. This paper provides text categorization techniques that can be adopted to address these challenges specifically. The work starts from the formal complaint data of the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. An easy-to-navigate and well-organized dashboard is developed for efficient visualization. The dashboard is currently being piloted with the largest transportation agency in Bangalore.

conference on information and knowledge management | 2017

SMASC 2017: First International Workshop on Social Media Analytics for Smart Cities

Manjira Sinha; Xiangnan He; Alessandro Bozzon; Sandya Mannarswamy; Pradeep K. Murukannaiah; Tridib Mukherjee

In an increasingly digital urban setting, connected and concerned citizens typically voice their opinions on various civic topics via social media. Efficient and scalable analysis of these citizen voices on social media to derive actionable insights is essential to the development of smart cities. The very nature of the data (heterogeneity and dynamism), the scarcity of gold-standard annotated corpora, and the need for multi-dimensional analysis across space, time and semantics make urban social media analytics challenging. This workshop is dedicated to the theme of social media analytics for smart cities, with the aim of focusing the interest of the CIKM research community on the challenges in mining social media data for urban informatics. The workshop hopes to foster collaboration among researchers working in information retrieval, social media analytics, and linguistics, social scientists, and civic authorities, to develop scalable and practical systems for capturing and acting upon real-world issues of cities as voiced by their citizens in social media. The aim of this workshop is to encourage researchers to develop techniques for urban analytics of social media data, with a specific focus on applying these techniques to practical urban informatics applications for smart cities.

Proceedings of the Fourth ACM IKDD Conferences on Data Sciences | 2017

Embedding Learning of Figurative Phrases for Emotion Classification in Micro-Blog Texts

Shreshtha Mundra; Sandya Mannarswamy; Manjira Sinha; Anirban Sen

Figurative phrases such as idioms are a type of Multi-Word Expression (MWE) that possess a specialized meaning, which is independent of and different from the literal meaning of the constituent words. Figurative language is widely used to express emotions and is very prevalent in micro-blog data. Therefore, an efficient model of emotion categorization for micro-blogs should be able to correctly represent the instances of figurative phrases in the data. However, due to their non-compositional nature, the phrasal representation of figurative language cannot be directly obtained from the constituent words, and hence modeling figurative phrases in micro-blogs requires novel approaches. Most of the existing methods of modeling figurative idiomatic phrases in traditional text data use the broader textual context available for better results. However, in the case of micro-blog data, such large context is not available due to the very short length of the text, which poses an additional challenge. Given the need to model figurative language for emotion classification, this paper develops the novel idea of Emotion Sensitive Figurative Phrase Embedding (ESFPE) to model idiomatic phrases in micro-blog data and shows up to 14% improvement in emotion classification performance over the baseline. To the best of our knowledge, this is the first work towards figurative phrase modeling for emotion classification in micro-blog text.

meeting of the association for computational linguistics | 2016

Cross-domain Text Classification with Multiple Domains and Disparate Label Sets

Himanshu Sharad Bhatt; Manjira Sinha; Shourya Roy

Advances in transfer learning have relaxed the limitation of traditional supervised machine learning algorithms, namely their dependence on annotated training data for training a new model for every new domain. However, several applications encounter scenarios where models need to transfer or adapt across domains when the label sets vary both in the number of labels and in their connotations. This paper presents a first-of-its-kind transfer learning algorithm for cross-domain classification with multiple source domains and disparate label sets. It starts by identifying transferable knowledge from across multiple domains that can be useful for learning the target domain task. This knowledge, in the form of selected labeled instances from different domains, is collected to form an auxiliary training set which is used for learning the target domain task. Experimental results validate the efficacy of the proposed algorithm against strong baselines on a real-world social media dataset and the 20 Newsgroups dataset.
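The step of pooling selected labeled instances from several source domains into an auxiliary training set can be sketched as follows. This is only an illustration under a strong simplifying assumption: instances are ranked by bag-of-words cosine similarity to the unlabeled target texts, which is a stand-in criterion and not the selection method of the paper. The function names and the sample data are hypothetical, and the sketch keeps source labels unchanged rather than addressing how disparate label sets are reconciled.

```python
import math
from collections import Counter
from typing import List, Tuple

LabeledText = Tuple[str, str]  # (text, label)


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words counts for a text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_auxiliary_set(
    sources: List[List[LabeledText]], target_texts: List[str], k: int
) -> List[LabeledText]:
    """Pool all source domains and keep the k labeled instances that are
    most similar to the aggregate vocabulary of the unlabeled target."""
    target_profile: Counter = Counter()
    for text in target_texts:
        target_profile.update(bow(text))
    pooled = [pair for domain in sources for pair in domain]
    ranked = sorted(
        pooled, key=lambda p: cosine(bow(p[0]), target_profile), reverse=True
    )
    return ranked[:k]


# Made-up source domains with different label sets, and an unlabeled target.
sources = [
    [("battery drains fast", "negative"), ("great camera quality", "positive")],
    [("the plot was boring", "dislike"), ("phone screen cracked easily", "complaint")],
]
target_texts = ["phone battery died", "screen is too dim"]
auxiliary = build_auxiliary_set(sources, target_texts, k=2)
```

In this toy run the two instances sharing vocabulary with the target are kept and the unrelated ones are dropped; a classifier for the target task would then be trained on `auxiliary`.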

Proceedings of the 3rd IKDD Conference on Data Science, 2016 | 2016

Improving Urban Transportation through Social Media Analytics

Manjira Sinha; Preethy Varma; Gayatri Sivakumar; Mridula Singh; Tridib Mukherjee; Deepthi Chander; Koustuv Dasgupta

Citizens tend to discuss issues in public forums, social media, and web blogs. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for collection, categorization, aggregation and visualization of urban public transportation issues. The primary challenges in deriving useful insights from web-based sources stem from: (a) the number of reports; (b) incomplete or implicit spatio-temporal context; and (c) the unstructured nature of text in these reports. The work starts from the formal complaint data of the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. Text data is categorized into different transportation-related problems, and spatio-temporal context is added to the text data for geo-tagging and identifying persistent issues. A well-organized dashboard is developed for efficient visualization. The dashboard is currently being piloted with the largest transportation agency in Bangalore.
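The pipeline of categorizing reports and then using spatio-temporal context to find persistent issues can be sketched in a few lines. The sketch assumes keyword matching for categorization and defines "persistent" as the same category being reported at the same location on at least a given number of distinct days. The category names, keywords, threshold, and sample reports are all hypothetical and chosen only for illustration; the abstract does not specify the actual categorization technique.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "delay": ["late", "delay", "waiting"],
    "overcrowding": ["crowded", "overloaded", "packed"],
    "staff_behaviour": ["rude", "conductor", "driver"],
}

Report = Tuple[str, str, str]  # (date, location, text)


def categorize(text: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def persistent_issues(reports: List[Report], min_days: int) -> List[Tuple[str, str]]:
    """Return (location, category) pairs reported on >= min_days distinct dates."""
    days_seen: Dict[Tuple[str, str], Set[str]] = {}
    for date, location, text in reports:
        category = categorize(text)
        if category is not None:
            days_seen.setdefault((location, category), set()).add(date)
    return sorted(key for key, days in days_seen.items() if len(days) >= min_days)


# Made-up reports; dates, locations and texts are illustrative.
reports = [
    ("2016-01-04", "Majestic", "Bus was late again"),
    ("2016-01-05", "Majestic", "Long delay at the stop"),
    ("2016-01-05", "Whitefield", "Bus was packed"),
]
issues = persistent_issues(reports, min_days=2)
```

The resulting `(location, category)` pairs are the kind of aggregate a dashboard could plot on a map, with the day count serving as a simple persistence signal.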

FIRE (Working Notes) | 2016