Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Aditya Joshi is active.

Publication


Featured research published by Aditya Joshi.


International Joint Conference on Natural Language Processing | 2015

Harnessing Context Incongruity for Sarcasm Detection

Aditya Joshi; Vinita Sharma; Pushpak Bhattacharyya

The relationship between context incongruity and sarcasm has been studied in linguistics. We present a computational system that harnesses context incongruity as a basis for sarcasm detection. Our statistical sarcasm classifiers incorporate two kinds of incongruity features: explicit and implicit. We show the benefit of our incongruity features for two text forms: tweets and discussion forum posts. Our system also outperforms two past works (with an F-score improvement of 10-20%). We also show how our features can capture inter-sentential incongruity.
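As a rough illustration of the explicit-incongruity idea, the sketch below counts sentiment flips between consecutive sentiment-bearing words. The tiny lexicon and the feature set are simplified stand-ins, not the paper's actual resources or features.

```python
# A minimal sketch of explicit-incongruity features: counting sentiment
# flips between consecutive sentiment-bearing words. The lexicon here is a
# toy placeholder, not the paper's resource.

POSITIVE = {"love", "great", "wonderful", "enjoy"}
NEGATIVE = {"hate", "terrible", "awful", "stuck"}

def explicit_incongruity_features(text: str) -> dict:
    """Return simple incongruity counts for one tweet or forum post."""
    polarities = []
    for token in text.lower().split():
        word = token.strip(".,!?'\"")
        if word in POSITIVE:
            polarities.append(1)
        elif word in NEGATIVE:
            polarities.append(-1)
    # A "flip" is a change of sign between consecutive sentiment words,
    # e.g. "love" followed by "stuck" in "I love being stuck in traffic".
    flips = sum(1 for a, b in zip(polarities, polarities[1:]) if a != b)
    return {
        "num_positive": polarities.count(1),
        "num_negative": polarities.count(-1),
        "num_flips": flips,
        "has_incongruity": int(flips > 0),
    }

print(explicit_incongruity_features("I love being stuck in traffic!"))
# {'num_positive': 1, 'num_negative': 1, 'num_flips': 1, 'has_incongruity': 1}
```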


ACM Computing Surveys | 2017

Automatic Sarcasm Detection: A Survey

Aditya Joshi; Pushpak Bhattacharyya; Mark James Carman

Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step for sentiment analysis, considering the prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, automatic sarcasm detection has witnessed great interest from the sentiment analysis community. This article is a compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and incorporation of context beyond the target text. We describe datasets, approaches, trends, and issues in sarcasm detection. We also discuss representative performance values, describe shared tasks, and provide pointers to future work identified in prior works. As a resource for understanding the state of the art, the survey presents several useful illustrations, most prominently a table that summarizes past papers along different dimensions such as the types of features, annotation techniques, and datasets used.


Empirical Methods in Natural Language Processing | 2015

Your Sentiment Precedes You: Using an author’s historical tweets to predict sarcasm

Anupam Khattri; Aditya Joshi; Pushpak Bhattacharyya; Mark James Carman

Sarcasm understanding may require information beyond the text itself, as in the case of ‘I absolutely love this restaurant!’, which may or may not be sarcastic depending on the context. We present the first quantitative evidence that historical tweets by an author can provide additional context for sarcasm detection. Our sarcasm detection approach uses two components: a contrast-based predictor (which identifies whether there is a sentiment contrast within a target tweet), and a historical tweet-based predictor (which identifies whether the sentiment expressed towards an entity in the target tweet agrees with the sentiment expressed by the author towards that entity in the past).
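A hedged sketch of this two-component architecture follows; the predictor internals and the OR-style combination below are placeholder assumptions, not the paper's exact method.

```python
# Two-predictor sarcasm detection: a within-tweet contrast check plus a
# check against the author's historical sentiment toward the same entity.
# Both predictors are toy rules standing in for full sentiment analysis.

def contrast_predictor(tweet_polarities: list[int]) -> bool:
    """Flag a tweet if it mixes positive and negative words."""
    return 1 in tweet_polarities and -1 in tweet_polarities

def historical_predictor(target_sentiment: int, past_sentiments: list[int]) -> bool:
    """Flag sarcasm if sentiment toward an entity contradicts the author's
    historical sentiment toward that entity (majority vote over the past)."""
    if not past_sentiments:
        return False
    historical = 1 if sum(past_sentiments) > 0 else -1
    return target_sentiment != historical

def predict_sarcasm(tweet_polarities, target_sentiment, past_sentiments) -> bool:
    # An OR combination: either predictor firing marks the tweet sarcastic.
    # Whether the paper combines them this way is an assumption of this sketch.
    return contrast_predictor(tweet_polarities) or historical_predictor(
        target_sentiment, past_sentiments
    )

# 'I absolutely love this restaurant!' from an author who has mostly
# tweeted negatively about the restaurant before:
print(predict_sarcasm([1], target_sentiment=1, past_sentiments=[-1, -1, 1]))  # True
```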


Empirical Methods in Natural Language Processing | 2016

Are Word Embedding-based Features Useful for Sarcasm Detection?

Aditya Joshi; Vaibhav Tripathi; Kevin Patel; Pushpak Bhattacharyya; Mark James Carman

This paper makes a simple increment to the state of the art in sarcasm detection research. Existing approaches are unable to capture subtle forms of the context incongruity that lies at the heart of sarcasm. We explore whether prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment word embedding-based features to four feature sets reported in the past, and experiment with four types of word embeddings. We observe an improvement in sarcasm detection irrespective of the word embedding used or the original feature set to which our features are augmented. For example, when Word2Vec embeddings are used, this augmentation results in an F-score improvement of around 4% for three of the four feature sets, and a minor degradation for the fourth. Finally, a comparison of the four embeddings shows that Word2Vec and dependency weight-based features outperform LSA and GloVe in terms of their benefit to sarcasm detection.
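The similarity/discordance features can be pictured as pairwise cosine scores over a sentence's word vectors. The sketch below, with toy 3-dimensional embeddings standing in for pretrained Word2Vec/GloVe vectors, computes the maximum and minimum pairwise similarity, one plausible reading of such features.

```python
# Word-embedding-based incongruity features: the maximum and minimum
# pairwise cosine similarity between word vectors in a sentence. The toy
# vectors below stand in for any pretrained word -> vector mapping.

import itertools
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embedding_features(words: list[str], vectors: dict) -> dict:
    """Similarity/discordance scores over all word pairs in one sentence."""
    sims = [
        cosine(vectors[a], vectors[b])
        for a, b in itertools.combinations(words, 2)
        if a in vectors and b in vectors
    ]
    if not sims:
        return {"max_sim": 0.0, "min_sim": 0.0}
    # A low minimum similarity signals a semantically discordant word pair,
    # the kind of subtle incongruity the paper targets.
    return {"max_sim": max(sims), "min_sim": min(sims)}

# Toy 3-dimensional embeddings; real features would use pretrained vectors.
toy = {
    "love": np.array([0.9, 0.1, 0.0]),
    "delay": np.array([-0.8, 0.2, 0.1]),
    "flight": np.array([0.1, 0.9, 0.2]),
}
print(embedding_features(["love", "delay", "flight"], toy))
```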


Meeting of the Association for Computational Linguistics | 2014

Measuring Sentiment Annotation Complexity of Text

Aditya Joshi; Abhijit Mishra; Nivvedan Senthamilselvan; Pushpak Bhattacharyya

The effort required for a human annotator to detect sentiment is not uniform across texts, irrespective of his/her expertise. We aim to predict a score that quantifies this effort, using linguistic properties of the text. Our proposed metric is called Sentiment Annotation Complexity (SAC). For training data, since any direct judgment of complexity by a human annotator is fraught with subjectivity, we rely on cognitive evidence from eye-tracking. The sentences in our dataset are labeled with SAC scores derived from eye-fixation duration. Using linguistic features and annotated SACs, we train a regressor that predicts the SAC with a best mean error rate of 22.02% under five-fold cross-validation. We also study the correlation between a human annotator's perception of complexity and a machine's confidence in polarity determination. The merit of our work lies in (a) estimating the sentiment annotation cost in, for example, a crowdsourcing setting, and (b) choosing the right classifier for sentiment prediction.
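The regression setup can be sketched as follows. The two surface features, the synthetic SAC labels, and the choice of SVR are illustrative assumptions; only the five-fold cross-validation and mean-error-rate evaluation mirror what the abstract reports.

```python
# A hedged sketch: linguistic features -> SAC score, evaluated with
# five-fold cross-validation and a mean error rate. Features, labels, and
# the SVR regressor are assumptions; real labels come from eye-fixation
# durations and richer linguistic features.

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

def toy_features(sentence: str) -> list:
    """Two toy linguistic features: sentence length and mean word length."""
    words = sentence.split()
    return [len(words), sum(len(w) for w in words) / max(len(words), 1)]

# Ten synthetic sentences with made-up SAC labels.
sentences = ["word " * (i + 2) + "sentence." for i in range(10)]
X = np.array([toy_features(s) for s in sentences])
y = np.array([0.1 * (i + 1) for i in range(10)])

# Out-of-fold predictions under five-fold cross-validation.
pred = cross_val_predict(SVR(), X, y, cv=5)
mean_error_rate = np.mean(np.abs(pred - y) / y) * 100
print(f"mean error rate: {mean_error_rate:.2f}%")
```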


Conference on Computational Natural Language Learning | 2016

Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series ‘Friends’

Aditya Joshi; Vaibhav Tripathi; Pushpak Bhattacharyya; Mark James Carman

This paper is a novel study that views sarcasm detection in dialogue as a sequence labeling task, where a dialogue is made up of a sequence of utterances. We create a manually labeled dataset of dialogue from the TV series ‘Friends’ annotated with sarcasm. Our goal is to predict sarcasm in each utterance, using the sequential nature of a scene. We show a performance gain from sequence labeling as compared to classification-based approaches. Our experiments are based on three sets of features: one derived from information in our dataset, and two from past works. Sequence labeling algorithms (SVM-HMM and SEARN) outperform classification algorithms (SVM and Naive Bayes) for all these feature sets, with an increase in F-score of around 4%. Our observations highlight the viability of sequence labeling techniques for sarcasm detection in dialogue.
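To make the sequence framing concrete, the sketch below labels every utterance in a scene jointly with a linear-chain CRF (via the sklearn-crfsuite package) standing in for SVM-HMM and SEARN; the scenes, labels, and features are toys.

```python
# Sarcasm detection as sequence labeling: each scene is a sequence of
# utterances, each utterance a feature dict, each label 'sarc'/'non'.
# A CRF is used here only to show the data layout; the paper's algorithms
# are SVM-HMM and SEARN.

import sklearn_crfsuite

def utterance_features(utterance: str, position: int) -> dict:
    return {
        "num_words": len(utterance.split()),
        "has_exclamation": "!" in utterance,
        "position_in_scene": position,
    }

scenes = [
    ["Oh great, another surprise party.", "You really shouldn't have.", "Thanks."],
    ["Could this BE any more obvious?", "I guess so.", "Sure."],
]
labels = [["sarc", "sarc", "non"], ["sarc", "non", "non"]]

X = [[utterance_features(u, i) for i, u in enumerate(scene)] for scene in scenes]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

# Prediction assigns a label to every utterance, jointly over the scene:
print(crf.predict(X))
```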


Meeting of the Association for Computational Linguistics | 2014

A cognitive study of subjectivity extraction in sentiment annotation

Abhijit Mishra; Aditya Joshi; Pushpak Bhattacharyya

Existing sentiment analysers are weak AI systems: they try to capture the functionality of the human sentiment detection faculty without worrying about how that faculty is realized in human 'hardware'. These analysers are agnostic of the actual cognitive processes involved. This, however, does not suffice when applications demand an order-of-magnitude improvement in accuracy, as well as insight into the characteristics of the sentiment detection process. In this paper, we present a cognitive study of sentiment detection from the perspective of strong AI. We study the sentiment detection process of a set of human “sentiment readers”. Using eye-tracking, we show that on the way to sentiment detection, humans first extract subjectivity: they focus attention on a subset of sentences before arriving at the overall sentiment. They do this either through “anticipation”, where sentences are skipped during the first pass of reading, through “homing”, where a subset of the sentences is read over multiple passes, or through both. “Homing” behaviour is also observed at the sub-sentence level for complex sentiment phenomena such as sarcasm.
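One way such "homing" behaviour might be read off fixation data is sketched below; the input format (a list of sentence indices in fixation order) is an assumption, as real eye-tracking records are far richer.

```python
# Detecting "homing" from a fixation trace: count how many separate visits
# each sentence receives, where a visit is a maximal run of consecutive
# fixations on the same sentence. The trace format is an assumption.

def visits_per_sentence(fixation_sentence_ids: list[int]) -> dict:
    """Count maximal runs of consecutive fixations on the same sentence."""
    visits: dict = {}
    previous = None
    for sid in fixation_sentence_ids:
        if sid != previous:
            visits[sid] = visits.get(sid, 0) + 1
        previous = sid
    return visits

# Reader fixates sentences 0, 1, 2, then returns twice to sentence 1:
trace = [0, 0, 1, 2, 2, 1, 1, 3, 1]
counts = visits_per_sentence(trace)
homing = [sid for sid, n in counts.items() if n > 1]
print(counts, "homing on sentences:", homing)  # sentence 1 visited 3 times
```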


North American Chapter of the Association for Computational Linguistics | 2016

Political Issue Extraction Model: A Novel Hierarchical Topic Model That Uses Tweets by Political and Non-Political Authors

Aditya Joshi; Pushpak Bhattacharyya; Mark James Carman

People often use social media to discuss opinions, including political ones. We refer to relevant topics in these discussions as political issues, and the alternative stances towards these topics as political positions. We present a Political Issue Extraction (PIE) model that is capable of discovering political issues and positions from an unlabeled dataset of tweets. A strength of this model is that it uses Twitter timelines of both political and non-political authors, and affiliation information of only the political authors. The model estimates word-specific distributions (that denote political issues and positions) and hierarchical author/group-specific distributions (that show how these issues divide people). Our experiments using a dataset of 2.4 million tweets from the US show that this model effectively captures the desired properties (with respect to words and groups) of political discussions. We also evaluate the two components of the model by experimenting with (a) the use of alternate strategies to classify words, and (b) the value added by incorporating group membership information. The estimated distributions are then used to predict political affiliation with 68% accuracy.
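The final affiliation-prediction step can be sketched independently of the topic model itself: given per-author issue/position proportions, fit a standard classifier. The Dirichlet-sampled proportions below are synthetic stand-ins for the PIE model's inferred distributions, which this sketch does not reproduce.

```python
# Predicting political affiliation from per-author distributions. The data
# is synthetic: two author groups with different Dirichlet-sampled leanings
# over 5 issue/position dimensions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Group 0 leans toward the first dimensions, group 1 toward the last.
group0 = rng.dirichlet([5.0, 4.0, 1.0, 1.0, 1.0], size=50)
group1 = rng.dirichlet([1.0, 1.0, 1.0, 4.0, 5.0], size=50)
X = np.vstack([group0, group1])
y = np.array([0] * 50 + [1] * 50)

# Shuffle, then hold out 20 authors for evaluation.
order = rng.permutation(len(y))
X, y = X[order], y[order]

clf = LogisticRegression().fit(X[:80], y[:80])
print("held-out affiliation accuracy:", clf.score(X[80:], y[80:]))
```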


Archive | 2017

Sentiment Resources: Lexicons and Datasets

Aditya Joshi; Pushpak Bhattacharyya; Sagar Ahire

Sentiment lexicons and datasets represent the knowledge base that lies at the foundation of a sentiment analysis (SA) system. In its simplest form, a sentiment lexicon is a repository of words/phrases labelled with sentiment. Similarly, a sentiment-annotated dataset consists of documents (tweets, sentences or longer documents) labelled with one or more sentiment labels. This chapter explores the philosophy, execution and utility of popular sentiment lexicons and datasets. We describe the different labelling schemes that may be used. We then provide a detailed description of existing sentiment and emotion lexicons, and of the trends underlying research in lexicon generation. This is followed by a survey of sentiment-annotated datasets and the nuances of labelling involved. We then show how lexicons and datasets created for one language can be transferred to a new language. Finally, we place these sentiment resources in the perspective of their classic applications to sentiment analysis.
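As a minimal example of how a lexicon like those surveyed in the chapter is used, the sketch below scores a document by summing word polarities from a toy lexicon; real systems would use one of the full-scale lexicons the chapter describes.

```python
# The simplest use of a sentiment lexicon: sum the polarities of the words
# a document contains. The six-word lexicon is a placeholder.

LEXICON = {"good": 1, "happy": 1, "excellent": 1,
           "bad": -1, "sad": -1, "poor": -1}

def lexicon_score(document: str) -> int:
    """Positive score -> positive sentiment; negative -> negative."""
    return sum(
        LEXICON.get(token.strip(".,!?").lower(), 0)
        for token in document.split()
    )

print(lexicon_score("The movie was good and the music was excellent."))  # 2
```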


SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2016

How Do Cultural Differences Impact the Quality of Sarcasm Annotation? A Case Study of Indian Annotators and American Text

Aditya Joshi; Pushpak Bhattacharyya; Mark James Carman; Jaya Saraswati; Rajita Shukla

Sarcasm annotation extends beyond linguistic expertise and often involves cultural context. This paper presents a first-of-its-kind study of the impact of cultural differences on the quality of sarcasm annotation. For this study, we consider the case of American text and Indian annotators. For two sarcasm-labeled datasets of American tweets and discussion forum posts that were originally annotated by American annotators, we obtain annotations from Indian annotators. Our Indian annotators agree with each other more than their American counterparts do, and face difficulties in the case of unfamiliar situations and named entities. However, these difficulties in sarcasm annotation result in a statistically insignificant degradation in sarcasm classification. We also show that these disagreements between annotators can be predicted from textual properties. Although the current study is limited to two annotators and one culture pair, our paper opens up a novel direction in evaluating the quality of sarcasm annotation and its impact on sarcasm classification. This study forms a stepping stone towards the systematic evaluation of the quality of datasets annotated by non-native annotators, and can be extended to other culture combinations.
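Inter-annotator agreement in a study like this is commonly quantified with Cohen's kappa; the sketch below computes it for two hypothetical annotators' sarcasm labels (the labels are invented, not the paper's data).

```python
# Cohen's kappa: observed agreement corrected for the agreement expected
# by chance from each annotator's marginal label rates.

from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["sarc", "sarc", "non", "non", "sarc", "non"]
ann2 = ["sarc", "non", "non", "non", "sarc", "non"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```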

Collaboration


Dive into Aditya Joshi's collaborations.

Top Co-Authors

Pushpak Bhattacharyya (Indian Institute of Technology Bombay)
Vaibhav Tripathi (Indian Institute of Technology Bombay)
A. R. Balamurali (IITB-Monash Research Academy)
Abhijit Mishra (Indian Institute of Technology Bombay)
Diptesh Kanojia (Indian Institute of Technology Bombay)
Jaya Saraswati (Indian Institute of Technology Bombay)
Balamurali A.R. (Indian Institute of Technology Bombay)
Joe Cheri Ross (Indian Institute of Technology Bombay)