Pradeep Dasigi | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Pradeep Dasigi is active.

Explore More

Publication

Featured researches published by Pradeep Dasigi.

meeting of the association for computational linguistics | 2012

Subgroup Detection in Ideological Discussions

Amjad Abu-Jbara; Pradeep Dasigi; Mona T. Diab; Dragomir R. Radev

The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon that the participants in such discussions split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and towards the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.

Natural Language Dialog Systems and Intelligent Assistants | 2015

Rapidly Scaling Dialog Systems with Interactive Learning

Jason D. Williams; Nobal B. Niraula; Pradeep Dasigi; Aparna Lakshmiratan; Carlos Garcia Jurado Suarez; Mouni Reddy; Geoffrey Zweig

In personal assistant dialog systems, intent models are classifiers that identify the intent of a user utterance, such as to add a meeting to a calendar or get the director of a stated movie. Rapidly adding intents is one of the main bottlenecks to scaling—adding functionality to—personal assistants. In this paper we show how interactive learning can be applied to the creation of statistical intent models. Interactive learning (Simard, ICE: enabling non-experts to build models interactively for large-scale lopsided problems, 2014) combines model definition, labeling, model building, active learning, model evaluation, and feature engineering in a way that allows a domain expert—who need not be a machine learning expert—to build classifiers. We apply interactive learning to build a handful of intent models in three different domains. In controlled lab experiments, we show that intent detectors can be built using interactive learning and then improved in a novel end-to-end visualization tool. We then applied this method to a publicly deployed personal assistant—Microsoft Cortana—where a non-machine learning expert built an intent model in just over 2 h, yielding excellent performance in the commercial service.

international joint conference on natural language processing | 2011

CODACT: Towards Identifying Orthographic Variants in Dialectal Arabic

Pradeep Dasigi; Mona T. Diab

Dialectal Arabic (DA) is the spoken vernacular for over 300M people worldwide. DA is emerging as the form of Arabic written in online communication: chats, emails, blogs, etc. However, most existing NLP tools for Arabic are designed for processing Modern Standard Arabic, a variety that is more formal and scripted. Apart from the genre variation that is a hindrance for any language processing, even in English, DA has no orthographic standard, compared to MSA that has a standard orthography and script. Accordingly, a word may be written in many possible inconsistent spellings rendering the processing of DA very challenging. To solve this problem, such inconsistencies have to be normalized. This work is the first step towards addressing this problem, as we attempt to identify spelling variants in a given textual document. We present an unsupervised clustering approach that addresses the problem of identifying orthographic variants in DA. We employ different similarity measures that exploit string similarity and contextual semantic similarity. To our knowledge this is the first attempt at solving the problem for DA. Our approaches are tested on data in two dialects of Arabic - Egyptian and Levantine. Our system achieves the highest Entropy of 0.19 for Egyptian (corresponding to 68% cluster precision) and Levantine (corresponding to 64% cluster precision) respectively. This constitutes a significant reduction in entropy (from 0.47 for Egyptian and 0.51 for Levantine) and improvement in cluster precision (from 29% for both) from the baseline.

meeting of the association for computational linguistics | 2012

Genre independent subgroup detection in online discussion threads: a pilot study of implicit attitude using latent textual semantics

Pradeep Dasigi; Weiwei Guo; Mona T. Diab

We describe an unsupervised approach to the problem of automatically detecting subgroups of people holding similar opinions in a discussion thread. An intuitive way of identifying this is to detect the attitudes of discussants towards each other or named entities or topics mentioned in the discussion. Sentiment tags play an important role in this detection, but we also note another dimension to the detection of peoples attitudes in a discussion: if two persons share the same opinion, they tend to use similar language content. We consider the latter to be an implicit attitude. In this paper, we investigate the impact of implicit and explicit attitude in two genres of social media discussion data, more formal wikipedia discussions and a debate discussion forum that is much more informal. Experimental results strongly suggest that implicit attitude is an important complement for explicit attitudes (expressed via sentiment) and it can improve the sub-group detection performance independent of genre.

Database | 2016

Automated detection of discourse segment and experimental types from the text of cancer pathway results sections

Gully A. P. C. Burns; Pradeep Dasigi; Anita de Waard; Eduard H. Hovy

Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within a article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles’ Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data’s meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic. Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases

meeting of the association for computational linguistics | 2017

Ontology-Aware Token Embeddings for Prepositional Phrase Attachment

Pradeep Dasigi; Waleed Ammar; Chris Dyer; Eduard H. Hovy

Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase(PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points, which amounts to a 34.4% relative reduction in errors.

international acm sigir conference on research and development in information retrieval | 2009

Experiments in CLIR using fuzzy string search based on surface similarity

Sethuramalingam Subramaniam; Anil Kumar Singh; Pradeep Dasigi; Vasudeva Varma

Cross Language Information Retrieval (CLIR) between languages of the same origin is an interesting topic of research. The similarity of the writing systems used for these languages can be used effectively to not only improve CLIR, but to overcome the problems of textual variations, textual errors, and even the lack of linguistic resources like stemmers to an extent. We have conducted CLIR experiments between three languages which use writing systems (scripts) of Brahmi-origin, namely Hindi, Bengali and Marathi. We found significant improvements for all the six language pairs using a method for fuzzy text search based on Surface Similarity. In this paper we report these results and compare them with a baseline CLIR system and a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching.

bioRxiv | 2017

Extracting Evidence Fragments for Distant Supervision of Molecular Interactions.

Gully A. P. C. Burns; Pradeep Dasigi; Eduard H. Hovy

We describe a methodology for automatically extracting ‘evidence fragments’ from a set of biomedical experimental research articles. These fragments provide the primary description of evidence that is presented in the papers’ figures. They elucidate the goals, methods, results and interpretations of experiments that support the original scientific contributions the study being reported. Within this paper, we describe our methodology and showcase an example data set based on the European Bioinformatics Institute’s INTACT database (http://www.ebi.ac.uk/intact/). Using figure codes as anchors, we linked evidence fragments to INTACT data records as an example of distant supervision so that we could use INTACT’s preexisting, manually-curated structured interaction data to act as a gold standard for machine reading experiments. We report preliminary baseline event extraction measures from this collection based on a publicly available, machine reading system (REACH). We use semantic web standards for our data and provide open access to all source code.

language resources and evaluation | 2014

Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon

Mona T. Diab; Mohamed Al-Badrashiny; Maryam Aminian; Mohammed Attia; Heba Elfardy; Nizar Habash; Abdelati Hawwari; Wael Salloum; Pradeep Dasigi; Ramy Eskander

meeting of the association for computational linguistics | 2012