Geetha Jagannathan
Rutgers University
Publications
Featured research published by Geetha Jagannathan.
Knowledge Discovery and Data Mining | 2005
Geetha Jagannathan; Rebecca N. Wright
Advances in computer networking and database technologies have enabled the collection and storage of vast quantities of data. Data mining can extract valuable knowledge from this data, and organizations have realized that they can often obtain better results by pooling their data together. However, the collected data may contain sensitive or private information about the organizations or their customers, and privacy concerns are exacerbated if data is shared between multiple organizations. Distributed data mining is concerned with the computation of models from data that is distributed among multiple participants. Privacy-preserving distributed data mining seeks to allow for the cooperative computation of such models without the cooperating parties revealing any of their individual data items. Our paper makes two contributions in privacy-preserving data mining. First, we introduce the concept of arbitrarily partitioned data, which is a generalization of both horizontally and vertically partitioned data. Second, we provide an efficient privacy-preserving protocol for k-means clustering in the setting of arbitrarily partitioned data.
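The kind of primitive such a protocol builds on can be illustrated with additive secret sharing, a standard building block for summing private values in two-party computation. The sketch below is a simplified illustration, not the paper's protocol; the function names and the modulus are assumptions. Each value is split into two shares, shares can be summed locally by each party, and only the final total is ever reconstructed, which is what lets a k-means centroid update over arbitrarily partitioned data proceed without revealing individual attribute values.

```python
import random

PRIME = 2**31 - 1  # public modulus, chosen here only for illustration

def share(value, modulus=PRIME):
    """Split an integer into two additive shares that sum to value mod modulus."""
    r = random.randrange(modulus)
    return r, (value - r) % modulus

def reconstruct(share_a, share_b, modulus=PRIME):
    """Recombine the two shares to recover the shared value."""
    return (share_a + share_b) % modulus

# In arbitrarily partitioned data, any attribute of any record may be held
# by either party. With each contribution secret-shared, the parties add
# their shares locally; only the aggregate is ever reconstructed.
a1, b1 = share(12)   # party A's attribute contribution
a2, b2 = share(30)   # party B's attribute contribution

sum_share_a = (a1 + a2) % PRIME
sum_share_b = (b1 + b2) % PRIME
assert reconstruct(sum_share_a, sum_share_b) == 42
```

Neither party alone learns anything from its shares, since each share is uniformly random; only the reconstructed aggregate is meaningful.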
International Conference on Data Mining | 2009
Geetha Jagannathan; Krishnan Pillaipakkamnatt; Rebecca N. Wright
In this paper, we study the problem of constructing private classifiers using decision trees, within the framework of differential privacy. We first construct privacy-preserving ID3 decision trees using differentially private sum queries. Our experiments show that for many datasets a reasonable privacy guarantee can only be obtained via this method at a steep cost of accuracy in predictions. We then present a differentially private decision tree ensemble algorithm using the random decision tree approach. We demonstrate experimentally that our approach yields good prediction accuracy even when the size of the datasets is small. We also present a differentially private algorithm for the situation in which new data is periodically appended to an existing database. Our experiments show that our differentially private random decision tree classifier handles data updates in a way that maintains the same level of privacy guarantee.
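A minimal sketch of the kind of differentially private count query that underlies this style of tree construction, assuming the standard Laplace mechanism (the function names are illustrative, not from the paper): a counting query has global sensitivity 1, so adding Laplace noise of scale 1/epsilon to the true count yields epsilon-differential privacy.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_count(records, predicate, epsilon):
    """Differentially private count: the true count plus Laplace(1/epsilon)
    noise. Counting queries have global sensitivity 1, so this satisfies
    epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative use: a noisy count over a tiny hypothetical dataset.
records = [{"age": a} for a in (25, 37, 41, 52, 63)]
noisy = private_count(records, lambda r: r["age"] > 40, epsilon=0.1)
# The true answer is 3; the noisy answer fluctuates around it with scale 10.
```

A tree-building algorithm issues many such queries (one per candidate split per node), so the privacy budget epsilon must be divided among them, which is one reason accuracy degrades for deep trees on small datasets.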
Data and Knowledge Engineering | 2008
Geetha Jagannathan; Rebecca N. Wright
Handling missing data is a critical step in ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning will be required. In this paper, we address the problem of privacy-preserving data imputation of missing data. Specifically, we present a privacy-preserving protocol for filling in missing values using a lazy decision tree imputation algorithm for data that is horizontally partitioned between two parties. The participants of the protocol learn only the imputed values; the computed decision tree is not learned by either party.
International Conference on Data Mining | 2007
Geetha Jagannathan; Rebecca N. Wright
We study private inference control for aggregate queries to a database, such as those provided by statistical databases or modern database languages, in a way that satisfies both privacy and inference control requirements. For each query, the client learns the value of the function for that query if and only if the query passes a specified inference control rule. The server learns nothing about the queries, and the client learns nothing other than the query output for passing queries. We present general protocols for aggregate queries with private inference control.
International Conference on Data Mining | 2007
Geetha Jagannathan; Rebecca N. Wright
The following topics are dealt with: data mining in Web 2.0 environment; knowledge-discovery from multimedia data and multimedia applications; mining and management of biological data; data mining in medicine; optimization-based data mining techniques; high performance data mining; mining graphs and complex structures; data mining on uncertain data; data streaming mining and management; spatial and spatio-temporal data mining.
Algorithmic Learning Theory | 2013
Anna Choromanska; Krzysztof Choromanski; Geetha Jagannathan; Claire Monteleoni
In this paper, we study the problem of differentially-private learning of low dimensional manifolds embedded in high dimensional spaces. The problems one faces in learning in high dimensional spaces are compounded in differentially-private learning. We achieve the dual goals of learning the manifold while maintaining the privacy of the dataset by constructing a differentially-private data structure that adapts to the doubling dimension of the dataset. Our differentially-private manifold learning algorithm extends random projection trees of Dasgupta and Freund. A naive construction of differentially-private random projection trees could involve queries with high global sensitivity that would affect the usefulness of the trees. Instead, we present an alternate way of constructing differentially-private random projection trees that uses low sensitivity queries that are precise enough for learning the low dimensional manifolds. We prove that the size of the tree depends only on the doubling dimension of the dataset and not its extrinsic dimension.
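A simplified, hypothetical sketch of one split of a random projection tree with a Laplace-noised split threshold. This is only an illustration of the general idea of replacing a high-sensitivity query (the exact median of the projections) with a noisy one; the paper's actual construction uses more carefully calibrated low-sensitivity queries, so all names and parameter choices below are assumptions.

```python
import math
import random

rng = random.Random(0)

def rp_split(points, epsilon):
    """Split a point set along a random unit direction at a noisy median.
    Returns the two halves. A sketch only: the paper's construction
    calibrates its queries differently."""
    dim = len(points[0])
    # Draw a random direction and normalize it to unit length.
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in direction))
    direction = [c / norm for c in direction]
    # Project every point onto the direction.
    proj = [sum(p[i] * direction[i] for i in range(dim)) for p in points]
    # Laplace noise (inverse-CDF sampling) added to the median threshold.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(math.log(1 - 2 * abs(u)), u)
    threshold = sorted(proj)[len(proj) // 2] + noise
    left = [p for p, v in zip(points, proj) if v <= threshold]
    right = [p for p, v in zip(points, proj) if v > threshold]
    return left, right

# Every point lands on exactly one side of the noisy hyperplane.
points = [(rng.random(), rng.random()) for _ in range(8)]
left, right = rp_split(points, epsilon=0.5)
assert len(left) + len(right) == len(points)
```

Recursing on the two halves yields the tree; the paper's contribution is choosing split queries whose sensitivity stays low enough that the added noise does not destroy the adaptivity to the doubling dimension.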
International Conference on Data Mining | 2007
Geetha Jagannathan; Krishnan Pillaipakkamnatt; Daryl Umano
We present a distributed privacy-preserving protocol for the clustering of data streams. The participants of the secure protocol learn cluster centers only on completion of the protocol. Our protocol does not reveal intermediate candidate cluster centers. It is also efficient in terms of communication. The protocol is based on a new memory-efficient clustering algorithm for data streams. Our experiments show that, on average, the accuracy of this algorithm is better than that of the well-known k-means algorithm, and compares well with BIRCH, but has far smaller memory requirements.
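One way streaming clustering algorithms keep memory small is by maintaining a constant-size summary per cluster rather than storing the raw points. The sketch below shows a generic count-and-sum summary, similar in spirit to BIRCH's clustering features; it is a hypothetical illustration, not the specific data structure of this paper's algorithm.

```python
class ClusterSummary:
    """Constant-size summary (count and coordinate-wise sum) of one cluster.
    Memory stays O(dim) no matter how many points stream in. A generic
    sketch, not the paper's exact structure."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim

    def add(self, point):
        """Absorb one streamed point into the summary."""
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x

    def centroid(self):
        """Current cluster center, recoverable from the summary alone."""
        return [s / self.n for s in self.linear_sum]

# Stream three points into one summary; only (n, linear_sum) is retained.
c = ClusterSummary(2)
for p in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]:
    c.add(p)
assert c.centroid() == [3.0, 4.0]
```

In the distributed, privacy-preserving setting, each party would hold or secret-share such summaries, so that only the final cluster centers, not intermediate candidates, are revealed.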
International Conference on Data Mining | 2006
Geetha Jagannathan; Rebecca N. Wright
In this paper, we investigate privacy-preserving data imputation on distributed databases. We present a privacy-preserving protocol for filling in missing values using a lazy decision tree imputation algorithm for data that is horizontally partitioned between two parties. The participants of the protocol learn only the imputed values; the computed decision tree is not learned by either party.
Theoretical Computer Science | 2016
Anna Choromanska; Krzysztof Choromanski; Geetha Jagannathan; Claire Monteleoni
In this paper, we study the problem of differentially-private learning of low dimensional manifolds embedded in high dimensional spaces. The problems one faces in learning in high dimensional spaces are compounded in differentially-private learning. We achieve the dual goals of learning the manifold while maintaining the privacy of the dataset by constructing a differentially-private data structure that adapts to the doubling dimension of the dataset. Our differentially-private manifold learning algorithm extends random projection trees of Dasgupta and Freund. A naive construction of differentially-private random projection trees could involve queries with high global sensitivity that would affect the usefulness of the trees. Instead, we present an alternate way of constructing differentially-private random projection trees that uses low sensitivity queries that are precise enough for learning the low dimensional manifolds. We prove that the size of the tree depends only on the doubling dimension of the dataset and not its extrinsic dimension.
ACM Journal of Experimental Algorithms | 2008
Michael A. Bender; Bryan Bradley; Geetha Jagannathan; Krishnan Pillaipakkamnatt
The sum-of-squares algorithm (SS) was introduced by Csirik, Johnson, Kenyon, Shor, and Weber for online bin packing of integral-sized items into integral-sized bins. First, we show the results of experiments from two new variants of the SS algorithm. The first variant, which runs in time O(n√B log B), appears to have almost identical expected waste as the sum-of-squares algorithm on all the distributions mentioned in the original papers on this topic. The other variant, which runs in O(n log B) time, performs well on most, but not on all of those distributions. We also apply SS to the online memory-allocation problem. Our experimental comparisons between SS and Best Fit indicate that neither algorithm is consistently better than the other. If the amount of randomness in item sizes is low, SS appears to have lower waste than Best Fit, whereas, if the amount of randomness is high, Best Fit appears to have lower waste than SS. Our experiments suggest that in both real and synthetic traces, SS does not seem to have an asymptotic advantage over Best Fit, in contrast with the bin-packing problem.
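The Sum-of-Squares rule itself is easy to state: let N(g) be the number of open bins whose remaining gap is g; each arriving item is placed, in an existing bin or a new one, so as to minimize the resulting sum of N(g)^2. Below is a straightforward, unoptimized Python sketch of that placement rule under these assumptions (the faster variants discussed in the paper speed up exactly this step); function and variable names are my own.

```python
def sum_of_squares_pack(items, B):
    """Online Sum-of-Squares bin packing.
    N[g] counts open bins whose remaining gap is g (1 <= g < B).
    Each item of size s goes where it minimizes sum_g N[g]^2.
    Returns the number of bins used."""
    N = [0] * B          # N[g] for g in 0..B-1 (N[0] unused: gap 0 = closed)
    bins_used = 0
    for s in items:
        best_delta, best_gap = None, None
        # Candidate placements: a new bin (treated as gap B), or any
        # existing open bin whose gap g can accommodate the item.
        candidates = [B] + [g for g in range(s, B) if N[g] > 0]
        for g in candidates:
            # Change in sum of N[g]^2 if the item lands in a gap-g bin.
            delta = 0
            if g < B:  # an existing bin leaves gap class g
                delta += (N[g] - 1) ** 2 - N[g] ** 2
            new_gap = g - s
            if new_gap > 0:  # the bin re-enters gap class new_gap
                delta += (N[new_gap] + 1) ** 2 - N[new_gap] ** 2
            if best_delta is None or delta < best_delta:
                best_delta, best_gap = delta, g
        if best_gap == B:
            bins_used += 1       # open a fresh bin
        else:
            N[best_gap] -= 1     # remove the chosen bin from its gap class
        if best_gap - s > 0:
            N[best_gap - s] += 1  # a gap of 0 means the bin is closed
    return bins_used

# Four items of size 5 into bins of capacity 10 pack perfectly in pairs.
assert sum_of_squares_pack([5, 5, 5, 5], 10) == 2
```

The intuition behind the objective is that squaring penalizes crowded gap classes, so SS keeps a balanced inventory of partially filled bins ready to absorb future items.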