Pushpak Bhattacharyya

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Pushpak Bhattacharyya is active.

Explore More

Publication

Featured researches published by Pushpak Bhattacharyya.

Machine Translation | 2001

Interlingua-based English–Hindi Machine Translation and Language Divergence

Shachi Dave; Jignashu Parikh; Pushpak Bhattacharyya

Interlingua and transfer-based approaches tomachine translation have long been in use in competing and complementary ways. The former proves economical in situations where translation among multiple languages is involved, and can be used as a knowledge-representation scheme. But given a particular interlingua, its adoption depends on its ability (a) to capture the knowledge in texts precisely and accurately and (b) to handle cross-language divergences. This paper studies the language divergence between English and Hindi and its implication to machine translation between these languages using the Universal Networking Language (UNL). UNL has been introduced by the United Nations University, Tokyo, to facilitate the transfer and exchange of information over the internet. The representation works at the level of single sentences and defines a semantic net-like structure in which nodes are word concepts and arcs are semantic relations between these concepts. The language divergences between Hindi, an Indo-European language, and English can be considered as representing the divergences between the SOV and SVO classes of languages. The work presented here is the only one to our knowledge that describes language divergence phenomena in the framework of computational linguistics through a South Asian language.

international joint conference on natural language processing | 2015

Harnessing Context Incongruity for Sarcasm Detection

Aditya Joshi; Vinita Sharma; Pushpak Bhattacharyya

The relationship between context incongruity and sarcasm has been studied in linguistics. We present a computational system that harnesses context incongruity as a basis for sarcasm detection. Our statistical sarcasm classifiers incorporate two kinds of incongruity features: explicit and implicit. We show the benefit of our incongruity features for two text forms tweets and discussion forum posts. Our system also outperforms two past works (with Fscore improvement of 10-20%). We also show how our features can capture intersentential incongruity.

meeting of the association for computational linguistics | 2006

Morphological Richness Offsets Resource Demand -- Experiences in Constructing a POS Tagger for Hindi

Smriti Singh; Kuhoo Gupta; Manish Shrivastava; Pushpak Bhattacharyya

In this paper we report our work on building a POS tagger for a morphologically rich language- Hindi. The theme of the research is to vindicate the stand that- if morphology is strong and harnessable, then lack of training corpora is not debilitating. We establish a methodology of POS tagging which the resource disadvantaged (lacking annotated corpora) languages can make use of. The methodology makes use of locally annotated modestly-sized corpora (15,562 words), exhaustive morpohological analysis backed by high-coverage lexicon and a decision tree based learning algorithm (CN2). The evaluation of the system was done with 4-fold cross validation of the corpora in the news domain (www.bbc.co.uk/hindi). The current accuracy of POS tagging is 93.45% and can be further improved.

international conference on computational linguistics | 2012

Feature specific sentiment analysis for product reviews

Subhabrata Mukherjee; Pushpak Bhattacharyya

In this paper, we present a novel approach to identify feature specific expressions of opinion in product reviews with different features and mixed emotions. The objective is realized by identifying a set of potential features in the review and extracting opinion expressions about those features by exploiting their associations. Capitalizing on the view that more closely associated words come together to express an opinion about a certain feature, dependency parsing is used to identify relations between the opinion expressions. The system learns the set of significant relations to be used by dependency parsing and a threshold parameter which allows us to merge closely associated opinion expressions. The data requirement is minimal as this is a one time learning of the domain independent parameters. The associations are represented in the form of a graph which is partitioned to finally retrieve the opinion expression describing the user specified feature. We show that the system achieves a high accuracy across all domains and performs at par with state-of-the-art systems despite its data limitations.

meeting of the association for computational linguistics | 2003

Question Answering via Bayesian Inference on Lexical Relations

Ganesh Ramakrishnan; Apurva Jadhav; Ashutosh Joshi; Soumen Chakrabarti; Pushpak Bhattacharyya

Many researchers have used lexical networks and ontologies to mitigate synonymy and polysemy problems in Question Answering (QA), systems coupled with taggers, query classifiers, and answer extractors in complex and ad-hoc ways. We seek to make QA systems reproducible with shared and modest human effort, carefully separating knowledge from algorithms. To this end, we propose an aesthetically “clean” Bayesian inference scheme for exploiting lexical relations for passage-scoring for QA . The factors which contribute to the efficacy of Bayesian Inferencing on lexical relations are soft word sense disambiguation, parameter smoothing which ameliorates the data sparsity problem and estimation of joint probability over words which overcomes the deficiency of naive-bayes-like approaches.

international joint conference on natural language processing | 2009

Case markers and Morphology: Addressing the crux of the fluency problem in English-Hindi SMT

Ananthakrishnan Ramanathan; Hansraj Choudhary; Avishek Ghosh; Pushpak Bhattacharyya

We report in this paper our work on accurately generating case markers and suffixes in English-to-Hindi SMT. Hindi is a relatively free word-order language, and makes use of a comparatively richer set of case markers and morphological suffixes for correct meaning representation. From our experience of large-scale English-Hindi MT, we are convinced that fluency and fidelity in the Hindi output get an order of magnitude facelift if accurate case markers and suffixes are produced. Now, the moot question is: what entity on the English side encodes the information contained in case markers and suffixes on the Hindi side? Our studies of correspondences in the two languages show that case markers and suffixes in Hindi are predominantly determined by the combination of suffixes and semantic relations on the English side. We, therefore, augment the aligned corpus of the two languages, with the correspondence of English suffixes and semantic relations with Hindi suffixes and case markers. Our results on 400 test sentences, translated using an SMT system trained on around 13000 parallel sentences, show that suffix + semantic relation → case marker/suffix is a very useful translation factor, in the sense of making a significant difference to output quality as indicated by subjective evaluation as well as BLEU scores.

international conference on computational linguistics | 2008

Hindi Urdu Machine Transliteration using Finite-State Transducers

M. G. Abbas Malik; Christian Boitet; Pushpak Bhattacharyya

Finite-state Transducers (FST) can be very efficient to implement inter-dialectal transliteration. We illustrate this on the Hindi and Urdu language pair. FSTs can also be used for translation between surface-close languages. We introduce UIT (universal intermediate transcription) for the same pair on the basis of their common phonetic repository in such a way that it can be extended to other languages like Arabic, Chinese, English, French, etc. We describe a transliteration model based on FST and UIT, and evaluate it on Hindi and Urdu corpora.

conference on information and knowledge management | 2012

TwiSent: a multistage system for analyzing sentiment in twitter

Subhabrata Mukherjee; Akshat Malu; A. R. Balamurali; Pushpak Bhattacharyya

In this paper, we present TwiSent, a sentiment analysis system for Twitter. Based on the topic searched, TwiSent collects tweets pertaining to it and categorizes them into the different polarity classes positive, negative and objective. However, analyzing micro-blog posts have many inherent challenges compared to the other text genres. Through TwiSent, we address the problems of 1) Spams pertaining to sentiment analysis in Twitter, 2) Structural anomalies in the text in the form of incorrect spellings, nonstandard abbreviations, slangs etc., 3) Entity specificity in the context of the topic searched and 4) Pragmatics embedded in text. The system performance is evaluated on manually annotated gold standard data and on an automatically annotated tweet set based on hashtags. It is a common practise to show the efficacy of a supervised system on an automatically annotated dataset. However, we show that such a system achieves lesser classification accurcy when tested on generic twitter dataset. We also show that our system performs much better than an existing system.

ACM Computing Surveys | 2017

Automatic Sarcasm Detection: A Survey

Aditya Joshi; Pushpak Bhattacharyya; Mark James Carman

Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step to sentiment analysis, considering prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, automatic sarcasm detection has witnessed great interest from the sentiment analysis community. This article is a compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and incorporation of context beyond target text. In this article, we describe datasets, approaches, trends, and issues in sarcasm detection. We also discuss representative performance values, describe shared tasks, and provide pointers to future work, as given in prior works. In terms of resources to understand the state-of-the-art, the survey presents several useful illustrations—most prominently, a table that summarizes past papers along different dimensions such as the types of features, annotation techniques, and datasets used.

Iete Technical Review | 2001

Knowledge Extraction from Hindi text

Shachi Dave; Pushpak Bhattacharyya

In this paper, we present a unique approach to Knowledge Extraction from Hindi text which preserves the predicate till the end. This approach helps in highly accurate analysis of sentences. The analysis produces a semantic net like structure expressed by means of Universal Networking Language (UNL)—a recently proposed interlingua. Extremely varied and complex phenomena of Hindi language have been tackled as the case studies reveal. This representation can be used for developing web based applications in Hindi.

Explore More