Rahul Jha
University of Michigan
Publication
Featured research published by Rahul Jha.
Association for Information Science and Technology | 2016
Kathy McKeown; Hal Daumé; Snigdha Chaturvedi; John Paparrizos; Kapil Thadani; Pablo Barrio; Or Biran; Suvarna Bothe; Michael Collins; Kenneth R. Fleischmann; Luis Gravano; Rahul Jha; Ben King; Kevin McInerney; Taesun Moon; Arvind Neelakantan; Diarmuid O'Seaghdha; Dragomir R. Radev; Clay Templeton; Simone Teufel
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long‐term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time‐series analysis. The results from two large‐scale experiments with 3.8 million full‐text articles and 48 million metadata records support the conclusion that full‐text features are significantly more useful for prediction than metadata‐only features and that the most accurate predictions result from combining the metadata and full‐text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full‐text features.
Natural Language Engineering | 2017
Rahul Jha; Amjad Abu-Jbara; Vahed Qazvinian; Dragomir R. Radev
This paper summarizes ongoing research in Natural-Language-Processing-driven citation analysis and describes experiments and motivating examples of how this work can enhance traditional scientometric analysis, which simply treats each citation as a ‘vote’ from the citing paper to the cited paper. In particular, we describe our dataset for citation polarity and citation purpose, present experimental results on the automatic detection of these indicators, and demonstrate the use of such annotations for studying research dynamics and scientific summarization. We also look at two complementary problems that show up in Natural-Language-Processing-driven citation analysis for a specific target paper. The first problem is extracting citation context: the implicit citation sentences that do not contain explicit anchors to the target paper. The second problem is extracting reference scope: the target-relevant segment of a complicated citing sentence that cites multiple papers. We show how these tasks can be helpful in improving sentiment analysis and citation-based summarization.
International Joint Conference on Natural Language Processing | 2015
Rahul Jha; Catherine Finegan-Dollak; Ben King; Reed Coke; Dragomir R. Radev
We present a new factoid-annotated dataset for evaluating content models for scientific survey article generation, containing 3,425 sentences from 7 topics in natural language processing. We also introduce a novel HITS-based content model for automated survey article generation called HITSUM that exploits the lexical network structure between sentences from citing and cited papers. Using the factoid-annotated data, we conduct a pyramid evaluation and compare HITSUM with two previous state-of-the-art content models: C-Lexrank, a network-based content model, and TOPICSUM, a Bayesian content model. Our experiments show that our new content model captures useful survey-worthy information and outperforms C-Lexrank by 4% and TOPICSUM by 7% in pyramid evaluation.
Large-Scale Visual Geo-Localization | 2016
Li-Jia Li; Rahul Jha; Bart Thomee; David A. Shamma; Liangliang Cao; Yang Wang
In this chapter, we explore the characteristics of geographically tagged Internet photos and determine their location based on the visual content. We develop a principled machine learning model to estimate geographical locations of photos by modeling the relationship between location and the photo content. To build reliable geographical estimators, it is important to find distinguishable geographical clusters in the world. These clusters cover general geographical regions and are not limited to landmarks. Geographical clusters provide more training samples and hence lead to better recognition accuracy. We develop a framework for geographical cluster estimation and employ latent variables to estimate the geographical clusters. To solve this estimation problem, we propose an efficient solver to find the latent clusters. We illustrate detailed qualitative results obtained from beach photos taken on different continents. In addition, we show significantly improved quantitative results over other approaches for recognizing different beaches, using the Flickr beach dataset for validation.
Meeting of the Association for Computational Linguistics | 2011
Ahmed Hassan; Amjad Abu-Jbara; Rahul Jha; Dragomir R. Radev
Language Resources and Evaluation | 2015
Dragomir R. Radev; Amanda Stent; Joel R. Tetreault; Aasish Pappu; Aikaterini Iliakopoulou; Agustin Chanfreau; Paloma de Juan; Jordi Vallmitjana; Alejandro Jaimes; Rahul Jha; Robert Mankoff
Meeting of the Association for Computational Linguistics | 2013
Rahul Jha; Amjad Abu-Jbara; Dragomir R. Radev
National Conference on Artificial Intelligence | 2015
Rahul Jha; Reed Coke; Dragomir R. Radev
Transactions of the Association for Computational Linguistics | 2014
Ben King; Rahul Jha; Dragomir R. Radev
Workshop on Innovative Use of NLP for Building Educational Applications | 2013
Amjad Abu-Jbara; Rahul Jha; Eric Morley; Dragomir R. Radev