Arun Shankar Iyer
Yahoo!
Publication
Featured research published by Arun Shankar Iyer.
international conference on management of data | 2009
Philip Bohannon; Srujana Merugu; Cong Yu; Vipul Agarwal; Pedro DeRose; Arun Shankar Iyer; Ankur Jain; Vinay Kakade; Mridul Muralidharan; Raghu Ramakrishnan; Warren Shen
We describe the Purple SOX (PSOX) EMS, a prototype Extraction Management System currently being built at Yahoo!. The goal of the PSOX EMS is to manage a large number of sophisticated extraction pipelines across different application domains, at the web scale and with minimum human involvement. Three key value propositions are described: extensibility, the ability to swap in and out extraction operators; explainability, the ability to track the provenance of extraction results; and social feedback support, the facility for gathering and reconciling multiple, potentially conflicting sources.
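As an illustration of the first two value propositions (swappable operators and provenance tracking), here is a minimal, hypothetical Python sketch of a pipeline whose extraction operators can be replaced by name and in which every result records the operators that produced it. The `Pipeline`, `Extraction`, and `swap` names are invented for illustration and are not the PSOX API. Provenance here is only a per-result operator trace, and social feedback is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical operator type: a named function mapping one text to zero or more outputs.
Operator = Tuple[str, Callable[[str], List[str]]]


@dataclass
class Extraction:
    value: str
    # Names of the operators that produced this value, in order (its provenance).
    provenance: List[str] = field(default_factory=list)


class Pipeline:
    def __init__(self, operators: List[Operator]):
        self.operators = list(operators)

    def swap(self, name: str, fn: Callable[[str], List[str]]) -> None:
        """Replace the operator called `name` without touching the rest of the pipeline."""
        for i, (op_name, _) in enumerate(self.operators):
            if op_name == name:
                self.operators[i] = (name, fn)
                return
        raise KeyError(name)

    def run(self, document: str) -> List[Extraction]:
        items = [Extraction(document, ["input"])]
        for name, fn in self.operators:
            # Each output inherits its parent's provenance plus this operator's name.
            items = [
                Extraction(out, item.provenance + [name])
                for item in items
                for out in fn(item.value)
            ]
        return items


pipeline = Pipeline([
    ("split", lambda s: s.split(";")),
    ("clean", lambda s: [s.strip()]),
])
results = pipeline.run("Acme Corp; Globex ")  # values: "Acme Corp", "Globex"
pipeline.swap("clean", lambda s: [s.strip().upper()])
swapped = pipeline.run("Acme Corp; Globex ")  # values: "ACME CORP", "GLOBEX"
```

Swapping the `clean` operator changes only that stage's output, while each result's trace (`input`, `split`, `clean`) still explains how it was derived.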
web search and data mining | 2011
Ashwin Machanavajjhala; Arun Shankar Iyer; Philip Bohannon; Srujana Merugu
Automatic extraction of structured records from inconsistently formatted lists on the web is challenging: different lists present disparate sets of attributes with variations in the ordering of attributes; many lists contain additional attributes and noise that can confuse the extraction process; and formatting within a list may be inconsistent due to missing attributes or manual formatting on some sites. We present a novel solution to this extraction problem that is based on i) collective extraction from multiple lists simultaneously and ii) careful exploitation of a small database of seed entities. Our approach addresses the layout homogeneity within the individual lists, content redundancy across some snippets from different sources, and the noisy attribute rendering process. We experimentally evaluate variants of this algorithm on real-world data sets and show that our approach is a promising direction for extraction from noisy lists, requiring mild and thus inexpensive supervision suitable for extraction from the tail of the web.
international conference on data mining | 2012
Namit Katariya; Arun Shankar Iyer; Sunita Sarawagi
The goal of this work is to estimate the accuracy of a classifier on a large unlabeled dataset based on a small labeled set and a human labeler. We seek to estimate accuracy and select instances for labeling in a loop via a continuously refined stratified sampling strategy. For stratifying data we develop a novel strategy of learning r-bit hash functions to preserve similarity in accuracy values. We show that our algorithm provides better accuracy estimates than existing methods for learning distance-preserving hash functions. Experiments on a wide spectrum of real datasets show that our estimates achieve between 15% and 62% relative reduction in error compared to existing approaches. We show how to perform stratified sampling on unlabeled data that is so large that in an interactive setting even a single sequential scan is impractical. We present an optimal algorithm for performing importance sampling on a static index over the data that achieves close to exact estimates while reading three orders of magnitude less data.
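A stratified accuracy estimate combines per-stratum sample accuracies, each weighted by the stratum's share of the data. The following is a minimal Python sketch of that standard estimator, assuming the strata are already given. It does not implement the paper's learned r-bit hash functions, its iterative instance selection, or its index-based importance sampling, and the function name and arguments are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List


def stratified_accuracy_estimate(
    strata: Dict[Hashable, List[int]],
    is_correct: Callable[[int], int],
    budget: int,
    seed: int = 0,
) -> float:
    """Estimate accuracy as sum_h (N_h / N) * (sample accuracy in stratum h).

    strata: stratum id -> instance ids (assumed given, e.g. hash buckets).
    is_correct: stands in for the human labeler; returns 1 if the classifier
        is right on the instance, else 0.
    budget: total labels to request, allocated in proportion to stratum size.
    """
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    estimate = 0.0
    for items in strata.values():
        if not items:
            continue  # empty strata contribute nothing
        weight = len(items) / total
        # Proportional allocation, with at least one label per non-empty stratum.
        n_h = min(len(items), max(1, round(budget * weight)))
        sample = rng.sample(items, n_h)
        estimate += weight * sum(is_correct(x) for x in sample) / n_h
    return estimate


# Toy data: the classifier is right on instances 0..79 and wrong on 80..99.
strata = {"all_right": list(range(80)), "all_wrong": list(range(80, 100))}
est = stratified_accuracy_estimate(strata, lambda x: 1 if x < 80 else 0, budget=10)
```

When each stratum is homogeneous in correctness, as in this toy example, the within-stratum variance is zero and ten labels recover the true accuracy of 0.8; this is the intuition behind stratifying by similarity in accuracy values rather than by generic feature distance.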
knowledge discovery and data mining | 2016
Arun Shankar Iyer; J. Saketha Nath; Sunita Sarawagi
In this paper we present learning models for the class-ratio estimation problem, which takes as input an unlabeled set of instances and predicts the proportions of instances in the set belonging to the different classes. This problem has applications in social and commercial data analysis. Existing models for class-ratio estimation, however, require instance-level supervision, whereas in domains like politics and demography, set-level supervision is more common. We present a new method for directly estimating class ratios using set-level supervision. Another serious limitation in applying these techniques to sensitive domains like health is data privacy. We propose a novel label privacy-preserving mechanism that is well-suited for supervised class-ratio estimation and has guarantees for achieving efficient differential privacy, provided the per-class counts are large enough. We derive learning bounds for the estimation with and without privacy constraints, which lead to important insights for the data publisher. Extensive empirical evaluation shows that our model is more accurate than existing methods and that the proposed privacy mechanism and learning model are well-suited for each other.
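Differentially private release of class proportions can be illustrated with the standard Laplace mechanism on per-class counts. This is a generic textbook mechanism swapped in for illustration, not the label privacy-preserving mechanism proposed in the paper, and the function name is hypothetical. Neighboring datasets are assumed to differ in one instance's label, which moves one unit of count between two classes, so the L1 sensitivity of the count vector is 2.

```python
import numpy as np


def private_class_ratios(labels, num_classes, epsilon, rng=None):
    """Release class proportions under epsilon-differential privacy (label privacy).

    Uses the Laplace mechanism on the per-class count vector. Changing one
    instance's label lowers one count by 1 and raises another by 1, so the
    L1 sensitivity is 2 and the noise scale is 2 / epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.bincount(np.asarray(labels), minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(loc=0.0, scale=2.0 / epsilon, size=num_classes)
    # Clipping and normalizing are post-processing, so the privacy guarantee is kept.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        return np.full(num_classes, 1.0 / num_classes)
    return noisy / total


labels = np.array([0] * 6000 + [1] * 3000 + [2] * 1000)
ratios = private_class_ratios(labels, num_classes=3, epsilon=1.0,
                              rng=np.random.default_rng(0))
```

Because the noise scale 2/ε does not depend on the counts, its effect on each proportion shrinks as the per-class counts grow, which gives some intuition for why privacy-accuracy guarantees of this kind are stated for large enough counts.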
Archive | 2010
Tom Gulik; Arun Shankar Iyer; Prasenjit Sarkar; Vinay Kakade; Erwin Tam
international conference on machine learning | 2014
Arun Shankar Iyer; J. Saketha Nath; Sunita Sarawagi
Archive | 2010
Sathiya Keerthi Selvaraj; Philip Bohannon; Mridul Muralidharan; Cong Yu; Ashwin Machanavajjhala; Arun Shankar Iyer; Sundararajan Sellamanickam
Archive | 2009
Cong Yu; Mridul Muralidharan; Arun Shankar Iyer; Philip Bohannon
Archive | 2009
Srujana Merugu; Arun Shankar Iyer; Ashwin Machanavajjhala; Sathiya Keerthi Selvaraj; Philip Bohannon
Archive | 2009
Nilesh N. Dalvi; Raghu Ramakrishnan; Vinay Kakade; Arup Kumar Choudhury; Sathiya Keerthi Selvaraj; Philip Bohannon; Mani Abrol; David M. Ciemiewicz; Arun Shankar Iyer; Vipul Agarwal; Alok S. Kirpal