Piyush Bansal

International Institute of Information Technology, Hyderabad

Publication

Featured research published by Piyush Bansal.

european conference on information retrieval | 2015

Towards Deep Semantic Analysis of Hashtags

Piyush Bansal; Romil Bansal; Vasudeva Varma

Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic-specific discussion or classify a post into a desired category. Segmenting and linking the entities present within hashtags could therefore help in better understanding and extracting the information shared across social media. However, because hashtags lack space delimiters (e.g., #nsavssnowden), segmenting a hashtag into its constituent entities (“NSA” and “Edward Snowden” in this case) is not a trivial task. Most current state-of-the-art social media analytics systems, such as those for Sentiment Analysis and Entity Linking, tend to either ignore hashtags or treat them as a single word. In this paper, we present a context-aware approach to segment the entities in hashtags and link them to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets, and demonstrate the effectiveness of our technique in improving overall entity linking in tweets via the additional semantic information provided by segmenting and linking entities in a hashtag.
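
The abstract does not spell out the segmentation algorithm, so the following is only a minimal sketch of the candidate-generation step: a dictionary-based dynamic program that enumerates every way to split a hashtag into known words. The function name `segment_hashtag` and the toy vocabulary are assumptions made for illustration; the context-aware ranking and knowledge-base linking described above are not modeled here.

```python
def segment_hashtag(tag, vocab):
    """Return every way to split `tag` into words drawn from `vocab`."""
    text = tag.lstrip("#").lower()
    # splits[i] holds all segmentations of the prefix text[:i].
    splits = [[] for _ in range(len(text) + 1)]
    splits[0] = [[]]  # the empty prefix has one (empty) segmentation
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            # Extend each segmentation of text[:start] with a known word.
            if word in vocab and splits[start]:
                splits[end].extend(prefix + [word] for prefix in splits[start])
    return splits[len(text)]


# Hypothetical toy vocabulary; a real system would draw on a large lexicon
# or on knowledge-base entity names.
VOCAB = {"nsa", "vs", "snowden"}
candidates = segment_hashtag("#nsavssnowden", VOCAB)
```

With a larger vocabulary a hashtag usually has several candidate segmentations, which is where a context-based scoring step would be needed to choose among them.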

international symposium on distributed computing | 2011

Byzantine agreement using partial authentication

Piyush Bansal; Prasant Gopal; Anuj Gupta; Kannan Srinathan; Pranav K. Vasishta

Three decades ago, Pease et al. introduced the problem of Byzantine Agreement [PSL80], where nodes need to maintain a consistent view of the world in spite of the challenge posed by Byzantine faults. It is well known that Byzantine agreement over a completely connected synchronous network of n nodes tolerating up to t faults is (efficiently) possible if and only if t < n/3. Pease et al. further empowered the nodes with the ability to authenticate themselves and their messages and proved that agreement in this new model (popularly known as authenticated Byzantine agreement (ABA)) is possible if and only if t < n, which is a huge improvement over the bound of t < n/3 in the absence of authentication for the same functionality. To understand the utility, potential and limitations of using authentication in distributed protocols for agreement, Gupta et al. [GGBS10] studied ABA in a new light. They generalize the existing models and thus attempt to give a unified theory of agreement over the authenticated and non-authenticated domains. In this paper we extend their results to synchronous (undirected) networks and give a complete characterization of agreement protocols. As a corollary, we show that agreement can be strictly easier than all-pair point-to-point communication. It is well known that in a synchronous network over n nodes of which up to any t are corrupted by a Byzantine adversary, BA is possible only if all-pair point-to-point reliable communication is possible [Dol82, DDWY93]. Thus, a folklore belief in the area is that maintaining global consistency (agreement) is at least as hard as the problem of all-pair point-to-point communication. Equivalently, it is widely believed that protocols for BA over incomplete networks exist only if it is possible to simulate an overlaid complete network. Surprisingly, we show that the folklore is not always true. Thus, it seems that agreement protocols may be more fundamental to distributed computing than reliable communication.

international conference of distributed computing and networking | 2010

Authenticated Byzantine generals in dual failure model

Anuj Gupta; Prasant Gopal; Piyush Bansal; Kannan Srinathan

Pease et al. introduced the problem of Byzantine Generals (BGP) to study the effects of Byzantine faults in distributed protocols for reliable broadcast. It is well known that BGP among n players tolerating up to t faults is (efficiently) possible iff n > 3t. To overcome this severe limitation, Pease et al. introduced a variant of BGP, Authenticated Byzantine General (ABG). Here players are supplemented with digital signatures (or similar tools) to thwart the challenge posed by Byzantine faults. Subsequently, they proved that with the use of authentication, the fault tolerance of protocols for reliable broadcast can be increased to n > t (a huge improvement over n > 3t). Byzantine faults are the most generic form of faults, but not all faults in a network are malicious: some faulty nodes may only leak their data while others are malicious. Motivated by this, we study the problem of ABG in the (tb, tp)-mixed adversary model, where the adversary can corrupt up to any tb players actively and control up to any other tp players passively. We prove that in such a setting, ABG over a completely connected synchronous network of n nodes tolerating a (tb, tp)-adversary is possible iff n > 2tb+min(tb, tp) when tp > 0. Interestingly, our results can also be seen as an attempt to unify the extant literature on BGP and ABG.
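
The feasibility condition stated in the abstract, n > 2tb + min(tb, tp) for tp > 0, can be written as a small predicate. This is only a sketch of the arithmetic; the function name `abg_possible` is an assumption made for illustration, and the case tp = 0 is rejected because the stated bound does not cover it.

```python
def abg_possible(n, tb, tp):
    """Check the stated bound n > 2*tb + min(tb, tp), which holds for tp > 0.

    n  -- number of nodes in the completely connected synchronous network
    tb -- maximum number of actively (Byzantine) corrupted players
    tp -- maximum number of passively corrupted players
    """
    if tp <= 0:
        raise ValueError("the stated bound covers only tp > 0")
    return n > 2 * tb + min(tb, tp)


# One Byzantine and one passive fault need more than 3 nodes.
feasible_with_4 = abg_possible(4, tb=1, tp=1)
feasible_with_3 = abg_possible(3, tb=1, tp=1)
```

Note that whenever tp >= tb the condition reduces to n > 3tb, the same form as the unauthenticated BGP bound quoted above.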

conference on information and knowledge management | 2011

LSH based outlier detection and its application in distributed setting

Madhuchand Rushi Pillutla; Nisarg Raval; Piyush Bansal; Kannan Srinathan; C. V. Jawahar

In this paper, we give an approximate algorithm for distance-based outlier detection using the Locality Sensitive Hashing (LSH) technique. We propose an algorithm for the centralized case, wherein the entire dataset is locally available for processing. However, very large datasets collected from various input sources are often distributed across the network. Accordingly, we show that our algorithm can be effectively extended to a constant-round protocol with low communication costs in a distributed setting with horizontal partitioning.
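
To make the general idea concrete, here is a minimal sketch of distance-based outlier detection with LSH in the centralized case. It is not the paper's algorithm: it assumes a common definition (a point is an outlier if fewer than `min_neighbors` other points lie within `radius`) and a standard random-projection hash, and every name and parameter is hypothetical. Points are compared only with points that share a bucket in at least one hash table, so a true outlier is always reported, while an inlier can occasionally be reported by mistake.

```python
import math
import random
from collections import defaultdict


def lsh_outliers(points, radius, min_neighbors, num_tables=4,
                 bucket_width=4.0, seed=0):
    """Approximate distance-based outlier detection via random-projection LSH."""
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = [set() for _ in points]
    for _ in range(num_tables):
        # One hash function h(x) = floor((a . x + b) / w) per table.
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, bucket_width)
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            projection = sum(x * y for x, y in zip(a, p))
            buckets[math.floor((projection + b) / bucket_width)].append(i)
        # Points sharing a bucket become candidate neighbours of each other.
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)
    outliers = []
    for i, cands in enumerate(candidates):
        # Verify candidates with exact distances, so only true neighbours count.
        close = sum(1 for j in cands
                    if j != i and math.dist(points[i], points[j]) <= radius)
        if close < min_neighbors:
            outliers.append(i)
    return outliers


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (100.0, 100.0)]
# Index 5 is always reported: no other point lies within the radius of it.
found = lsh_outliers(points, radius=1.0, min_neighbors=2)
```

The saving comes from checking distances only within buckets rather than across all pairs; adding hash tables lowers the chance of wrongly flagging an inlier at the cost of more comparisons.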

international world wide web conferences | 2015

Towards Semantic Retrieval of Hashtags in Microblogs

Piyush Bansal; Somay Jain; Vasudeva Varma

On various microblogging platforms like Twitter, users post short text messages ranging from news and information to thoughts and daily chatter. These messages often contain keywords called hashtags, which are semantico-syntactic constructs that enable topical classification of microblog posts. In this poster, we propose and evaluate a novel method of semantic enrichment of microblogs for a particular type of entity search: retrieving a ranked list of the top-k hashtags relevant to a user's query Q. Such a list can help users track posts of general interest to them. We show that our technique significantly improved microblog retrieval as well. We tested our approach on the publicly available Stanford sentiment analysis tweet corpus. We observed an improvement of more than 10% in NDCG for the microblog retrieval task, and around 11% in mean average precision for the hashtag retrieval task.
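
For readers unfamiliar with the NDCG metric reported above, the sketch below shows one common way to compute it for a single ranked list. The abstract does not say which gain or discount variant was used; linear gain with a log2 discount is assumed here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain assumed)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of retrieved items, listed in ranked order.
perfect = ndcg_at_k([3, 2, 1], k=3)   # already ideally ordered
swapped = ndcg_at_k([1, 3], k=2)      # best item ranked second
```

A ranking that is already in ideal order scores 1.0, and any misordering of graded items lowers the score.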

conference on information and knowledge management | 2016

Active Content-Based Crowdsourcing Task Selection

Piyush Bansal; Carsten Eickhoff; Thomas Hofmann

Crowdsourcing has long established itself as a viable alternative to corpus annotation by domain experts for tasks such as document relevance assessment. The crowdsourcing process traditionally relies on high degrees of label redundancy in order to mitigate the detrimental effects of individually noisy worker submissions. Such redundancy comes at the cost of increased label volume, and, subsequently, monetary requirements. In practice, especially as the size of datasets increases, this is undesirable. In this paper, we focus on an alternate method that exploits document information instead, to infer relevance labels for unjudged documents. We present an active learning scheme for document selection that aims at maximising the overall relevance label prediction accuracy, for a given budget of available relevance judgements by exploiting system-wide estimates of label variance and mutual information. Our experiments are based on TREC 2011 Crowdsourcing Track data and show that our method is able to achieve state-of-the-art performance while requiring 17% - 25% less budget.

international conference on data mining | 2011

Privacy Preserving Outlier Detection Using Locality Sensitive Hashing

Nisarg Raval; Madhuchand Rushi Pillutla; Piyush Bansal; Kannan Srinathan; C. V. Jawahar

In this paper, we give approximate algorithms for privacy preserving distance based outlier detection for both horizontal and vertical distributions, which scale well to large datasets of high dimensionality in comparison with the existing techniques. In order to achieve efficient private algorithms, we introduce an approximate outlier detection scheme for the centralized setting which is based on the idea of Locality Sensitive Hashing. We also give theoretical and empirical bounds on the level of approximation of the proposed algorithms.

international acm sigir conference on research and development in information retrieval | 2014

CharBoxes: a system for automatic discovery of character infoboxes from books

Manish Gupta; Piyush Bansal; Vasudeva Varma

Entities are central to a large number of real-world applications. Wikipedia shows entity infoboxes for a large number of entities. However, not much structured information is available about character entities in books. Automatic discovery of characters from books can help in effective summarization. A structured summary that not only introduces the characters in a book but also provides a high-level view of the relationships between them can be of critical importance for buyers. This task involves the following challenging novel problems: 1. automatic discovery of important characters given a book; 2. automatic social graph construction relating the discovered characters; 3. automatic summarization of the text most related to each of the characters; and 4. automatic infobox extraction from such summarized text for each character. As part of this demo, we design mechanisms to address these challenges and experiment with publicly available books.

international conference on advanced computing | 2013

Allowing Multiple Rounds in the Shared Whiteboard Model: Some More Impossibility Results

Dharmeet Singh Hora; Piyush Bansal; Kishore Kothapalli; Kannan Srinathan

The shared whiteboard model for distributed computing is one of the recent interesting models to be proposed (see Becker et al. (SPAA 2012)). In its basic form, this model allows all nodes to write a message of no more than O(log n) bits on a whiteboard that every node can read. However, each node can write at most once. In this model, a variety of graph problems are shown to be either possible or impossible. In this paper we extend the work of Becker et al. to allow nodes to write on the shared whiteboard more than once. However, each node can write at most O(log n) bits at any one instant. Interestingly, we show that in this model, just two rounds of writing on the whiteboard suffice to color the vertices of a d-degenerate graph using d+1 colors. Similarly, we show that two rounds suffice to find a maximal independent set (MIS), whereas 2-ruling sets can be computed in one round in simultaneous synchronous models. Finally, we show that for finding connected components in a graph, even O(1) rounds are not enough in general. We show that any deterministic algorithm that follows certain rules requires at least Omega(log n / log log n) rounds to find the connected components of an n-vertex graph. At the same time, we show the existence of an O(log n)-round algorithm for the same problem. Thus, our results indicate that the multi-round shared whiteboard model has interesting consequences.
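
The fact that a d-degenerate graph admits a (d+1)-coloring can be illustrated with the classical sequential greedy argument sketched below. This is not the two-round whiteboard protocol from the paper, whose details the abstract does not give; it only shows why d+1 colors are enough. The function name and the adjacency-dictionary representation are assumptions made for illustration.

```python
def degeneracy_coloring(adj):
    """Color an undirected graph greedily along a degeneracy ordering.

    adj maps each vertex to the set of its neighbours.
    Returns (degeneracy, colors) where colors uses at most degeneracy + 1 colors.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    order = []
    degeneracy = 0
    while remaining:
        # Repeatedly remove a vertex of minimum degree in what is left.
        v = min(remaining, key=lambda u: len(remaining[u]))
        degeneracy = max(degeneracy, len(remaining[v]))
        order.append(v)
        for u in remaining.pop(v):
            remaining[u].discard(v)
    colors = {}
    # Color in reverse removal order: each vertex then has at most
    # `degeneracy` already-colored neighbours, so a color in
    # 0..degeneracy is always free.
    for v in reversed(order):
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
    return degeneracy, colors


# A 5-cycle is 2-degenerate, so at most 3 colors are used.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
d, colors = degeneracy_coloring(cycle)
```
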

principles of distributed computing | 2009

Brief announcement: global consistency can be easier than point-to-point communication

Prasant Gopal Anumanchipalli; Anuj Gupta; Pranav K. Vasishta; Piyush Bansal; Kannan Srinathan

Global consistency or Byzantine Agreement (BA) and reliable point-to-point communication are two of the most important and well-studied problems in distributed computing. Informally, BA is about maintaining a consistent view of the world among all the non-faulty players in the presence of faults. In a synchronous network over n nodes of which up to any t are corrupted by a Byzantine adversary, BA is possible only if all-pair point-to-point reliable communication is possible [Dol82, DDWY93]. Specifically, in the standard unauthenticated model, (2t + 1)-connectivity is necessary, whereas in the authenticated setting (t + 1)-connectivity is required. Thus, a folklore belief is that maintaining global consistency is at least as hard as the problem of all-pair point-to-point communication. Equivalently, it is widely believed that protocols for BA over incomplete graphs exist only if it is possible to simulate an overlaid complete graph. Surprisingly, we show that the folklore is far from true: achieving global consistency can be strictly easier than all-pair point-to-point communication. In the authenticated model, it is assumed that the adversary can forge the signatures of only those nodes under its control. In contrast, the unauthenticated model assumes that the adversary can forge the signatures of all the nodes (that is, secure signatures are not used). We initiate a study of the entire gamut of BAs in between, viz., the adversary can forge the signatures of up to any k nodes apart from the up to t nodes that it can actively corrupt. We completely characterize the possibility of BA across the spectrum. Thus, our work attempts to unify the extant literature on agreement. It is, however, more than a mere attempt towards unification, as it provides insights into the field. Specifically, apart from the extremes (k = 0 and k = n − t, where the aforementioned folklore is known to hold), for every intermediate k there are several networks over which BA is possible but all-pair point-to-point communication is not.
