Bhavana Dalvi | Researchain

Publication

Featured research published by Bhavana Dalvi.

web search and data mining | 2012

WebSets: extracting sets of entities from the web using unsupervised information extraction

Bhavana Dalvi; William W. Cohen; Jamie Callan

We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.
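As a rough illustration of what a Hearst pattern is, the sketch below applies a single lexical pattern ("X such as A, B and C") to plain text and returns concept-instance pairs. It is not the WebSets implementation: the regular expression, the function name `hearst_pairs`, and the single-word concept are simplifying assumptions, and the pattern is deliberately crude (it treats everything up to the next punctuation mark as the instance list).

```python
import re

# One classic Hearst pattern: "<concept> such as <inst>, <inst> and <inst>".
# Assumption: the concept is a single word and the instance list ends at
# the first character that is not a word character, space, or comma.
PATTERN = re.compile(r"(\w+) such as ([\w ,]+)")

def hearst_pairs(text):
    """Return (concept, instance) pairs found by the 'such as' pattern."""
    pairs = []
    for concept, tail in PATTERN.findall(text):
        # Split the instance list on commas and on the words "and" / "or".
        for inst in re.split(r",\s*(?:and |or )?|\s+(?:and|or)\s+", tail):
            inst = inst.strip()
            if inst:
                pairs.append((concept, inst))
    return pairs

pairs = hearst_pairs("We visited cities such as Paris, London and Rome.")
```

In the setting described above, pairs of this kind would be used to propose concept names for clusters of terms that were found separately in HTML tables.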

web search and data mining | 2015

Automatic Gloss Finding for a Knowledge Base using Ontological Constraints

Bhavana Dalvi; Einat Minkov; Partha Pratim Talukdar; William W. Cohen

While there has been much research on automatically constructing structured Knowledge Bases (KBs), most of it has focused on generating facts to populate a KB. However, a useful KB must go beyond facts. For example, glosses (short natural language definitions) have been found to be very useful in tasks such as Word Sense Disambiguation. However, the important problem of Automatic Gloss Finding, i.e., assigning glosses to entities in an initially gloss-free KB, is relatively unexplored. We address that gap in this paper. In particular, we propose GLOFIN, a hierarchical semi-supervised learning algorithm for this problem which makes effective use of limited amounts of supervision and available ontological constraints. To the best of our knowledge, GLOFIN is the first system for this task. Through extensive experiments on real-world datasets, we demonstrate GLOFIN's effectiveness. It is encouraging to see that GLOFIN outperforms other state-of-the-art SSL algorithms, especially in low-supervision settings. We also demonstrate GLOFIN's robustness to noise through experiments on a wide variety of KBs, ranging from user-contributed (e.g., Freebase) to automatically constructed (e.g., NELL). To facilitate further research in this area, we have made the datasets and code used in this paper publicly available.

mining and learning with graphs | 2010

Structure, tie persistence and event detection in large phone and SMS networks

Leman Akoglu; Bhavana Dalvi

The effect of the network structure on the dynamics of social and communication networks has been of interest in recent years. It has been observed that network properties such as neighborhood overlap and clustering coefficient influence the tie strengths and link persistence between individuals. In this paper we study the communication records (both phone call and SMS) of 2 million anonymized customers of a large mobile phone company with 50 million interactions over a period of 6 months. Our major contributions are the following: (a) we analyze several structural properties in these call/SMS networks and the correlations between them; (b) we formulate a learning problem to determine whether existing links between users will persist in the future, and experimental results show that our method performs better than existing rule-based methods; and (c) we propose a change-point detection method in user behaviors using eigenvalue analysis of various behavioral features extracted over time. Our analysis shows that change-points detected by our method coincide with the social events and festivals in our data.

international conference on future energy systems | 2015

Integrating Energy Storage in Electricity Distribution Networks

Aditya Mishra; Ramesh K. Sitaraman; David E. Irwin; Ting Zhu; Prashant J. Shenoy; Bhavana Dalvi; Stephen Lee

Electricity generation, combined with its transmission and distribution, forms the majority of an electric utility's recurring operating costs. These costs are determined not only by the aggregate energy generated, but also by the maximum instantaneous peak power demand required over time. Prior work proposes using energy storage devices to reduce these costs by periodically releasing energy to lower the electric grid's peak demand. However, prior work generally considers only a single storage technology employed at a single level of the electric grid's hierarchy. In this paper, we examine the efficacy of employing different combinations of storage technologies at different levels of the grid's distribution hierarchy. We present an optimization framework for modeling the primary characteristics that dictate the lifetime cost of many prominent energy storage technologies. Our framework captures the important tradeoffs in placing different technologies at different levels of the distribution hierarchy with the goal of minimizing a utility's operating costs. We evaluate our framework using real smart meter data from 5000 customers of a local electric utility. We show that by employing hybrid storage technologies at multiple levels of the distribution hierarchy, utilities can reduce their daily operating costs due to distributing electricity by up to 12%.

international acm sigir conference on research and development in information retrieval | 2014

A language modeling approach to entity recognition and disambiguation for search queries

Bhavana Dalvi; Chenyan Xiong; Jamie Callan

The Entity Recognition and Disambiguation (ERD) problem refers to the task of recognizing mentions of entities in a given query string, disambiguating them, and mapping them to entities in a given Knowledge Base (KB). If there are multiple ways to interpret the query, then an ERD system is supposed to group candidate entity annotations into consistent interpretations. In this paper, we propose a four-step solution to this problem. First, we generate candidate entity strings by segmenting queries in different ways. Second, we retrieve candidate entities by searching for these candidate entity strings in Freebase. Third, we rank the candidate entities using language-model-based query likelihood scores. Finally, we group the entity annotations into interpretations. We also present both quantitative and qualitative evaluation of our methods based on 91 training, 500 validation and 1000 test queries. Our system achieved an F1 score of 0.42 on the set of validation queries, whereas the NULL baseline, which returns no annotations for any query, achieved an F1 score of 0.3. Similarly, on the test queries, our method achieved an F1 score of 0.36 and outperformed the NULL baseline, which achieved an F1 score of 0.2.
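The first of the four steps, segmenting a query in different ways to obtain candidate entity strings, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: it assumes whitespace tokenization and exhaustive enumeration of all contiguous segmentations, and the function names `segmentations` and `candidate_strings` are hypothetical. The later steps (Freebase lookup, language-model ranking, grouping into interpretations) are not shown.

```python
from itertools import combinations

def segmentations(query):
    """Yield every way to split the query into contiguous word spans."""
    tokens = query.split()
    n = len(tokens)
    if n == 0:
        return
    # Any subset of the n-1 gaps between tokens can serve as split points.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]

def candidate_strings(query):
    """Collect the distinct spans that occur in any segmentation."""
    return {span for seg in segmentations(query) for span in seg}

all_segs = list(segmentations("new york times"))
candidates = candidate_strings("new york times")
```

A query of n tokens has 2^(n-1) segmentations, so exhaustive enumeration is only practical for short inputs such as search queries; a real system might cap the span length.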

web search and data mining | 2016

Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Bhavana Dalvi; Aditya Mishra; William W. Cohen

In an entity classification task, topic or concept hierarchies are often incomplete. Previous work by Dalvi et al. [12] has shown that in non-hierarchical semi-supervised classification tasks, the presence of such unanticipated classes can cause semantic drift for seeded classes. The Exploratory learning [12] method was proposed to solve this problem; however, it is limited to the flat classification task. This paper builds such exploratory learning methods for hierarchical classification tasks. We experimented with subsets of the NELL [8] ontology and text, and HTML table datasets derived from the ClueWeb09 corpus. Our method (OptDAC-ExploreEM) outperforms the existing Exploratory EM method, and its naive extension (DAC-ExploreEM), in terms of seed class F1 on average by 10% and 7% respectively.

european conference on machine learning | 2013

From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering

Ramnath Balasubramanyan; Bhavana Dalvi; William W. Cohen

We present methods to introduce different forms of supervision into mixed-membership latent variable models. Firstly, we introduce a technique to bias the models to exploit topic-indicative features, i.e., features which are a priori known to be good indicators of the latent topics that generated them. Next, we present methods to modify the Gibbs sampler used for approximate inference in such models to permit injection of stronger forms of supervision in the form of labels for features and documents, along with a description of the corresponding change in the underlying generative process. This ability allows us to span the range from unsupervised topic models to semi-supervised learning in the same mixed-membership model. Experimental results from an entity-clustering task demonstrate that the biasing technique and the introduction of feature and document labels provide a significant increase in clustering performance over baseline mixed-membership methods.

north american chapter of the association for computational linguistics | 2016

IKE - An Interactive Tool for Knowledge Extraction

Bhavana Dalvi; Sumithra Bhakthavatsalam; Christopher G. Clark; Peter Clark; Oren Etzioni; Anthony Fader; Dirk Groeneveld

Recent work on information extraction has suggested that fast, interactive tools can be highly effective; however, creating a usable system is challenging, and few publicly available tools exist. In this paper we present IKE, a new extraction tool that performs fast, interactive bootstrapping to develop high-quality extraction patterns for targeted relations. Central to IKE is the notion that an extraction pattern can be treated as a search query over a corpus. To operationalize this, IKE uses a novel query language that is expressive, easy to understand, and fast to execute, which are essential requirements for a practical system. It is also the first interactive extraction tool to seamlessly integrate symbolic (boolean) and distributional (similarity-based) methods for search. An initial evaluation suggests that relation tables can be populated substantially faster than by manual pattern authoring while retaining accuracy, and more reliably than fully automated tools, an important step towards practical KB construction. We are making IKE publicly available (http://allenai.org/software/interactive-knowledge-extraction).

conference on information and knowledge management | 2013

Classifying entities into an incomplete ontology

Bhavana Dalvi; William W. Cohen; Jamie Callan

Exponential growth of unlabeled web-scale datasets, and class hierarchies to represent them, has given rise to new challenges for hierarchical classification. It is costly and time-consuming to create a complete ontology of classes to represent entities on the Web. Hence, there is a need for techniques that can do hierarchical classification of entities into incomplete ontologies. In this paper we present the Hierarchical Exploratory EM algorithm (an extension of the Exploratory EM algorithm [7]) that takes a seed class hierarchy and seed class instances as input. Our method classifies relevant entities into some of the classes from the seed hierarchy and, along the way, adds newly discovered classes to the hierarchy. Experiments with subsets of the NELL ontology and text datasets derived from the ClueWeb09 corpus show that our Hierarchical Exploratory EM approach improves seed class F1 by up to 21% when compared to its semi-supervised counterpart.

siam international conference on data mining | 2013

Very Fast Similarity Queries on Semi-Structured Data from the Web

William W. Cohen; Bhavana Dalvi

In this paper, we propose a single low-dimensional representation for entities found in different datasets on the web. Our proposed PIC-D embeddings can represent large D-partite graphs using a small number of dimensions, enabling fast similarity queries. Our experiments show that this representation can be constructed in a small amount of time (linear in the number of dimensions). We demonstrate how it can be used for a variety of similarity queries such as set expansion, automatic set instance acquisition, and column classification. Our approach results in comparable precision with respect to task-specific baselines and up to two orders of magnitude improvement in terms of query response time.
