Rahul Jha | Researchain

Publication

Featured research published by Rahul Jha.

Association for Information Science and Technology | 2016

Predicting the impact of scientific concepts using full-text features

Kathy McKeown; Hal Daumé; Snigdha Chaturvedi; John Paparrizos; Kapil Thadani; Pablo Barrio; Or Biran; Suvarna Bothe; Michael Collins; Kenneth R. Fleischmann; Luis Gravano; Rahul Jha; Ben King; Kevin McInerney; Taesun Moon; Arvind Neelakantan; Diarmuid O'Seaghdha; Dragomir R. Radev; Clay Templeton; Simone Teufel

New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.

Natural Language Engineering | 2017

NLP-driven citation analysis for scientometrics

Rahul Jha; Amjad Abu Jbara; Vahed Qazvinian; Dragomir R. Radev

This paper summarizes ongoing research in Natural-Language-Processing-driven citation analysis and describes experiments and motivating examples of how this work can be used to enhance traditional scientometrics analysis, which is based on simply treating a citation as a ‘vote’ from the citing paper to the cited paper. In particular, we describe our dataset for citation polarity and citation purpose, present experimental results on the automatic detection of these indicators, and demonstrate the use of such annotations for studying research dynamics and scientific summarization. We also look at two complementary problems that arise in Natural-Language-Processing-driven citation analysis for a specific target paper. The first problem is extracting citation context: the implicit citation sentences that do not contain explicit anchors to the target paper. The second problem is extracting reference scope: the segment relevant to the target paper within a complex citing sentence that cites multiple papers. We show how these tasks can help improve sentiment analysis and citation-based summarization.

International Joint Conference on Natural Language Processing | 2015

Content Models for Survey Generation: A Factoid-Based Evaluation

Rahul Jha; Catherine Finegan-Dollak; Ben King; Reed Coke; Dragomir R. Radev

We present a new factoid-annotated dataset for evaluating content models for scientific survey article generation; it contains 3,425 sentences from 7 topics in natural language processing. We also introduce HITSUM, a novel HITS-based content model for automated survey article generation that exploits the lexical network structure between sentences from citing and cited papers. Using the factoid-annotated data, we conduct a pyramid evaluation and compare HITSUM with two previous state-of-the-art content models: C-Lexrank, a network-based content model, and TOPICSUM, a Bayesian content model. Our experiments show that our new content model captures useful survey-worthy information and outperforms C-Lexrank by 4% and TOPICSUM by 7% in pyramid evaluation.

Large-Scale Visual Geo-Localization | 2016

Where the Photos Were Taken: Location Prediction by Learning from Flickr Photos

Li-Jia Li; Rahul Jha; Bart Thomee; David A. Shamma; Liangliang Cao; Yang Wang

In this chapter, we explore the characteristics of geographically tagged Internet photos and determine their locations based on their visual content. We develop a principled machine learning model that estimates the geographical locations of photos by modeling the relationship between location and photo content. To build reliable geographical estimators, it is important to find distinguishable geographical clusters in the world. These clusters cover general geographical regions, not just landmarks. Geographical clusters provide more training samples and hence lead to better recognition accuracy. We develop a framework for geographical cluster estimation and employ latent variables to estimate the geographical clusters. To solve this estimation problem, we propose an efficient solver that finds the latent clusters. We present detailed qualitative results on photos of beaches taken on different continents. In addition, we show significantly improved quantitative results over other approaches for recognizing different beaches, using the Flickr beach dataset for validation.

Meeting of the Association for Computational Linguistics | 2011

Identifying the Semantic Orientation of Foreign Words

Ahmed Hassan; Amjad Abu-Jbara; Rahul Jha; Dragomir R. Radev

Language Resources and Evaluation | 2015

Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest

Dragomir R. Radev; Amanda Stent; Joel R. Tetreault; Aasish Pappu; Aikaterini Iliakopoulou; Agustin Chanfreau; Paloma de Juan; Jordi Vallmitjana; Alejandro Jaimes; Rahul Jha; Robert Mankoff

Meeting of the Association for Computational Linguistics | 2013