Dhruv Gupta
Max Planck Society
Publication
Featured research published by Dhruv Gupta.
european conference on information retrieval | 2016
Dhruv Gupta; Klaus Berberich
Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users benefit if the returned results cover diverse dates, thus giving an overview of what has happened throughout history. Such a method can be a building block for applications, for instance, in digital humanities. We describe an approach to diversify search results using temporal expressions (e.g., 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on The New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method is able to present search results diversified along time.
string processing and information retrieval | 2015
Dhruv Gupta; Klaus Berberich
In this work, we consider the problem of classifying time-sensitive queries at different temporal granularities: day, month, and year. Our approach involves performing Bayesian analysis on time intervals of interest obtained from pseudo-relevant documents. Based on the Bayesian analysis we derive several effective features which are used to train a supervised machine learning algorithm for classification. We evaluate our method on a large temporal query workload to show that we can determine the temporal class of a query with high precision.
web search and data mining | 2016
Dhruv Gupta
In this article, I present the questions that I seek to answer in my PhD research. I propose to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.
international conference on the theory of information retrieval | 2016
Dhruv Gupta; Jannik Strötgen; Klaus Berberich
Events are central in human history and thus also in Web queries, in particular if they relate to history or news. However, ambiguity issues arise as queries may refer to ambiguous events differing in time, geography, or participating entities. Thus, users would greatly benefit if search results were presented along different events. In this paper, we present EventMiner, an algorithm that mines events from top-k pseudo-relevant documents for a given query. It is a probabilistic framework that leverages semantic annotations in the form of temporal expressions, geographic locations, and named entities to analyze natural language text and determine important events. Using a large news corpus, we show that using semantic annotations, EventMiner detects important events and presents documents covering the identified events in the order of their importance.
Companion of The Web Conference 2018 (WWW '18) | 2018
Dhruv Gupta; Klaus Berberich
Knowledge graphs capture very little temporal information associated with facts. In this work, we address the problem of identifying time intervals of knowledge graph facts from large document collections annotated with temporal expressions. Prior approaches in this direction have leveraged limited metadata associated with documents in large collections (e.g., publication dates) or have limited techniques to model the uncertainty and dynamics of temporal expressions. Our approach to identify time intervals for time-sensitive facts in knowledge graphs leverages a time model that incorporates uncertainty and models them at different levels of granularity (i.e., day, month, and year). Evaluation on a temporal fact benchmark using two large news archives amounting to more than eleven million documents shows the quality of our results.
conference on information and knowledge management | 2018
Dhruv Gupta; Klaus Berberich
In this work, we describe GYANI (gyan stands for knowledge in Hindi), an indexing infrastructure for search and analysis of large semantically annotated document collections. To facilitate the search for sentences or text regions for many knowledge-centric tasks such as information extraction, question answering, and relationship extraction, it is required that one can query large annotated document collections interactively. However, currently such an indexing infrastructure that scales to millions of documents and provides fast query execution times does not exist. To alleviate this problem, we describe how we can effectively index layers of annotations (e.g., part-of-speech, named entities, temporal expressions, and numerical values) that can be attached to sequences of words. Furthermore, we describe a query language that provides the ability to express regular expressions between word sequences and semantic annotations to ease search for sentences and text regions for enabling knowledge acquisition at scale. We build our infrastructure on a state-of-the-art distributed extensible record store. We extensively evaluate GYANI over two large news archives and the entire Wikipedia amounting to more than fifteen million documents. We observe that using GYANI we can achieve significant speedups of more than 95x in information extraction, 53x on extracting answer candidates for questions, and 12x on the relationship extraction task.
acm ieee joint conference on digital libraries | 2018
Dhruv Gupta; Klaus Berberich; Jannik Strötgen; Demetrios Zeinalipour-Yazti
We present an approach to explore news archives by automatically generating semantic aspects for their navigation. Given a keyword query as an input, we utilize semantic annotations present in the pseudo-relevant set of documents for generating the aspects. Our approach to generate the aspects considers the salience of the annotations by modeling their semantics as well as considering their co-occurrence in the pseudo-relevant set of documents. The generated aspects are also beneficial for representing documents in a structured manner. We show preliminary results on two news archives demonstrating the quality of the generated aspects over a testbed of more than 5,000 aspects derived from Wikipedia.
acm ieee joint conference on digital libraries | 2018
Jannik Strötgen; Rosita Andrade; Dhruv Gupta
Street names are not only used across the world as part of addresses, but also reveal a lot about a country's identity. Thus, they are subject to analysis in the fields of geography and social science. There, typically, a manual analysis limited to a small region is performed, e.g., focusing on the renaming of streets in a city after a political change in a country. Surprisingly, there have been hardly any automatic, large-scale studies of street names so far, although this might lead to interesting insights regarding the distribution of particular street name phenomena. In this paper, we present an automated, world-wide analysis of street names with date references. Such temporal streets are frequently used to commemorate important events and are thus particularly interesting to study. After applying a multilingual temporal tagger to discover such street names, we analyze their temporal and geographic distributions on different levels of granularity. Furthermore, we present an approach to automatically harvest potential explanations why streets in specific regions refer to particular dates. Despite the challenges of the tasks, our evaluation demonstrates the feasibility of the street extraction and the explanation harvesting.
conference on information and knowledge management | 2014
Dhruv Gupta; Klaus Berberich
The 3rd HistoInformatics Workshop on Computational History | 2016
Dhruv Gupta; Jannik Strötgen; Klaus Berberich