Jaideep Vaidya | Researchain

Publication

Featured research published by Jaideep Vaidya.

knowledge discovery and data mining | 2002

Privacy preserving association rule mining in vertically partitioned data

Jaideep Vaidya; Chris Clifton

Privacy considerations often constrain data mining projects. This paper addresses the problem of association rule mining where transactions are distributed across sources. Each site holds some attributes of each transaction, and the sites wish to collaborate to identify globally valid association rules. However, the sites must not reveal individual transaction data. We present a two-party algorithm for efficiently discovering frequent itemsets with minimum support levels, without either site revealing individual transaction values.
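As a rough illustration of the quantity involved, the support of an itemset whose attributes are split across two sites can be written as the scalar product of two 0/1 vectors, one per site. The sketch below computes that product in the clear; it is not the paper's secure protocol, and the function name, the example data, and the threshold are assumptions made for illustration.

```python
def support_count(site_a_column, site_b_column):
    """Count transactions containing both sites' itemsets (dot product of 0/1 vectors)."""
    if len(site_a_column) != len(site_b_column):
        raise ValueError("both sites must describe the same transactions")
    return sum(a * b for a, b in zip(site_a_column, site_b_column))


# Hypothetical data: entry i is 1 if transaction i contains the site's local itemset.
site_a = [1, 0, 1, 1, 0]
site_b = [1, 1, 1, 0, 0]

count = support_count(site_a, site_b)
min_support_count = 2  # assumed threshold, expressed as an absolute count
is_frequent = count >= min_support_count
```

A privacy-preserving version would have to produce the same count without either site seeing the other's vector.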

SIGKDD Explorations | 2002

Tools for privacy preserving distributed data mining

Chris Clifton; Murat Kantarcioglu; Jaideep Vaidya; Xiaodong Lin; Michael Y. Zhu

Privacy preserving mining of distributed data has numerous applications. Each application poses different constraints: What is meant by privacy, what are the desired results, how is the data distributed, what are the constraints on collaboration and cooperative computing, etc. We suggest that the solution to this is a toolkit of components that can be combined for specific privacy-preserving data mining applications. This paper presents some components of such a toolkit, and shows how they can be used to solve several privacy-preserving data mining problems.
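One building block often discussed in this line of work is a secure sum, in which parties learn a total without revealing their individual inputs. The abstract does not list the toolkit's components, so treating secure sum as one of them is an assumption. The sketch below simulates the ring-based idea in a single process and assumes semi-honest, non-colluding parties; `secure_sum` and its parameters are hypothetical names.

```python
import random


def secure_sum(private_values, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum: the initiator masks the running total."""
    rng = rng or random.Random()
    mask = rng.randrange(modulus)  # known only to the initiating party
    running = mask
    for value in private_values:
        # Each party sees only a uniformly masked running total, then adds its value.
        running = (running + value) % modulus
    # The initiator removes its mask to recover the true total (mod modulus).
    return (running - mask) % modulus


total = secure_sum([3, 5, 7], rng=random.Random(0))
```

The result is correct only while the true sum stays below `modulus`.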

knowledge discovery and data mining | 2003

Privacy-preserving k-means clustering over vertically partitioned data

Jaideep Vaidya; Chris Clifton

Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
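With attributes split across sites, the squared Euclidean distance from an entity to a cluster center is the sum of per-site partial distances. The sketch below shows only that decomposition and adds the partial distances in the clear; the paper's method would compare them without revealing them. All names and data here are assumptions made for illustration.

```python
def partial_sq_distance(local_point, local_center):
    """Squared distance over the attributes held by one site."""
    return sum((x - c) ** 2 for x, c in zip(local_point, local_center))


def closest_cluster(per_site_points, per_site_centers):
    """per_site_points[s] holds site s's attributes of one entity;
    per_site_centers[s][k] holds site s's share of cluster k's center."""
    num_sites = len(per_site_points)
    num_clusters = len(per_site_centers[0])
    totals = [
        sum(
            partial_sq_distance(per_site_points[s], per_site_centers[s][k])
            for s in range(num_sites)
        )
        for k in range(num_clusters)
    ]
    return min(range(num_clusters), key=totals.__getitem__)


# Hypothetical entity: site 0 holds one attribute, site 1 holds two.
points = [(1.0,), (2.0, 2.0)]
centers = [
    [(0.0,), (5.0,)],            # site 0's share of clusters 0 and 1
    [(0.0, 0.0), (2.0, 3.0)],    # site 1's share of clusters 0 and 1
]
assigned = closest_cluster(points, centers)
```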

Journal of Computer Security | 2005

Secure set intersection cardinality with application to association rule mining

Jaideep Vaidya; Chris Clifton

There has been concern over the apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. The process of data mining may use private data, leading to the potential for privacy breaches. Secure Multiparty Computation shows that results can be produced without revealing the data used to generate them. The problem is that general techniques for secure multiparty computation do not scale to data-mining size computations. This paper presents an efficient protocol for securely determining the size of set intersection, and shows how this can be used to generate association rules where multiple parties have different (and private) information about the same set of individuals.
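One standard way to compute the size of a set intersection without exchanging the raw items is commutative encryption: each party encrypts both sets with its own key, and doubly encrypted values match exactly when the underlying items match. Whether this matches the paper's protocol in detail is an assumption. The sketch below uses modular exponentiation with a toy-sized prime, runs both parties in one process, and is not hardened for real use; every name in it is hypothetical.

```python
import hashlib
import math
import random

P = 2**127 - 1  # a Mersenne prime, toy-sized for illustration


def make_key(rng):
    """Pick an exponent coprime to P - 1 so that x -> x**key mod P is a bijection."""
    while True:
        key = rng.randrange(3, P - 1)
        if math.gcd(key, P - 1) == 1:
            return key


def hash_to_group(item):
    """Map a string to an integer in [1, P - 1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(values, key):
    return {pow(v, key, P) for v in values}


def intersection_size(items_a, items_b, rng=None):
    rng = rng or random.Random()
    key_a, key_b = make_key(rng), make_key(rng)
    a_once = encrypt({hash_to_group(i) for i in items_a}, key_a)  # done by party A
    b_once = encrypt({hash_to_group(i) for i in items_b}, key_b)  # done by party B
    a_twice = encrypt(a_once, key_b)  # B re-encrypts A's values
    b_twice = encrypt(b_once, key_a)  # A re-encrypts B's values
    # (x**a)**b == (x**b)**a mod P, so equal items collide after both encryptions.
    return len(a_twice & b_twice)


size = intersection_size({"t1", "t2", "t3"}, {"t2", "t3", "t4"})
```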

computer and communications security | 2006

RoleMiner: mining roles using subset enumeration

Jaideep Vaidya; Vijayalakshmi Atluri; Janice Warner

Role engineering, the task of defining roles and associating permissions with them, is essential to realize the full benefits of the role-based access control paradigm. Essentially, there are two basic approaches to accomplish this: the top-down and the bottom-up. The top-down approach relies on a careful analysis of the business processes to define job functions and then specify appropriate roles from them. While this approach can aid in defining roles more accurately, it is tedious and time-consuming since it requires that the semantics of the business processes be well understood. Moreover, it ignores existing permissions within an organization and does not utilize them. On the other hand, the bottom-up approach starts with existing permissions and attempts to derive roles from them, thus helping to automate role definition. In this paper, we present an unsupervised approach called RoleMiner that mines roles from existing user-permission assignments. Since a role is nothing but a set of permissions, when no semantics are available, the task of role mining is essentially that of clustering users that have the same (or similar) permissions. However, unlike the traditional applications of data mining that ideally require identification of non-overlapping clusters, roles will have overlapping permission needs and thus permission sets that define roles should be allowed to overlap. It is this distinction from traditional clustering that makes the problem of role mining non-trivial. Our experiments with real and simulated data sets indicate that our role mining process is quite accurate and efficient.
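To make the bottom-up idea concrete, the sketch below derives candidate roles by taking each user's distinct permission set and adding the non-empty pairwise intersections of those sets. This is a simplified reading of "subset enumeration" and is not claimed to be the RoleMiner algorithm itself; the function name and the example users are assumptions.

```python
from itertools import combinations


def candidate_roles(user_permissions):
    """Return candidate roles as frozensets of permissions (roles may overlap)."""
    initial = {frozenset(perms) for perms in user_permissions.values() if perms}
    candidates = set(initial)
    for first, second in combinations(initial, 2):
        common = first & second
        if common:
            candidates.add(common)
    return candidates


# Hypothetical user-permission assignments.
assignments = {
    "alice": {"read", "write"},
    "bob": {"read", "approve"},
    "carol": {"read", "write", "audit"},
}
roles = candidate_roles(assignments)
```

Here `{"read"}` emerges as a candidate shared by all three users, and `{"read", "write"}` overlaps with it, which is the kind of overlap that ordinary non-overlapping clustering would not produce.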

international conference on management of data | 2004

Privacy-preserving data integration and sharing

Chris Clifton; Murat Kantarcıoǧlu; AnHai Doan; Gunther Schadow; Jaideep Vaidya; Ahmed K. Elmagarmid; Dan Suciu

Integrating data from multiple sources has been a longstanding challenge in the database community. Techniques such as privacy-preserving data mining promise privacy, but assume data integration has already been accomplished. Data integration methods are seriously hampered by the inability to share the data to be integrated. This paper lays out a privacy framework for data integration. Challenges for data integration in the context of this framework are discussed, in light of existing accomplishments in data integration. Many of these challenges are opportunities for the data mining community.

acm symposium on applied computing | 2006

Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data

Hwanjo Yu; Xiaoqian Jiang; Jaideep Vaidya

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What we need is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the non-disclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from the data distributed at multiple parties, without disclosing the data of each party to others. We assume that data is horizontally partitioned -- each party collects the same features of information for different data objects. We quantify the security and efficiency of the proposed method, and highlight future challenges.

ACM Transactions on Knowledge Discovery From Data | 2008

Privacy-preserving decision trees over vertically partitioned data

Jaideep Vaidya; Chris Clifton; Murat Kantarcioglu; A. Scott Patterson

Privacy and security concerns can prevent sharing of data, derailing data-mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. We introduce a generalized privacy-preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with a proof of security, we discuss what would be necessary to make the protocols completely secure. We also provide experimental results, giving a first demonstration of the practical complexity of secure multiparty computation-based data mining.

international conference on data engineering | 2008

Optimal Boolean Matrix Decomposition: Application to Role Engineering

Haibing Lu; Jaideep Vaidya; Vijayalakshmi Atluri

A decomposition of a binary matrix into two matrices gives a set of basis vectors and their appropriate combination to form the original matrix. Such decomposition solutions are useful in a number of application domains including text mining, role engineering, and knowledge discovery. While a binary matrix can be decomposed in several ways, certain decompositions better characterize the semantics associated with the original matrix in a succinct but comprehensive way. Indeed, one can find different decompositions optimizing different criteria matching various semantics. In this paper, we first present a number of variants of the optimal Boolean matrix decomposition problem that have pragmatic implications. We then present a unified framework for modeling the optimal binary matrix decomposition and its variants using binary integer programming. Such modeling allows us to directly adopt the huge body of heuristic solutions and tools developed for binary integer programming. Although the proposed solutions are applicable to any domain of interest, to make the discussion and results more meaningful, we present the binary matrix decomposition problem in a role engineering context, whose goal is to discover an optimal and correct set of roles from existing permissions, referred to as the role mining problem (RMP). This problem has gained significant interest in recent years as role-based access control has become a popular means of enforcing security in databases. We consider several variants of the above basic RMP, including the min-noise RMP, delta-approximate RMP and edge-RMP. Solutions to each of them aid security administrators in specific scenarios. We then model these variants as Boolean matrix decomposition and present efficient heuristics to solve them.
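In the role engineering setting, a decomposition can be read as a user-role matrix and a role-permission matrix whose Boolean product reproduces the user-permission matrix. The sketch below only checks a given decomposition and counts mismatched cells, which is the kind of quantity an approximate variant would bound; it does not search for an optimal decomposition, and the matrix names and data are assumptions.

```python
def boolean_product(user_role, role_perm):
    """Boolean matrix product: cell (u, p) is 1 if some role links user u to permission p."""
    num_perms = len(role_perm[0])
    return [
        [
            int(any(row[r] and role_perm[r][p] for r in range(len(role_perm))))
            for p in range(num_perms)
        ]
        for row in user_role
    ]


def mismatches(user_perm, user_role, role_perm):
    """Count cells where the reconstruction differs from the original matrix."""
    rebuilt = boolean_product(user_role, role_perm)
    return sum(
        a != b
        for row_a, row_b in zip(user_perm, rebuilt)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical example: 3 users, 2 roles, 3 permissions.
UA = [[1, 0], [0, 1], [1, 1]]
PA = [[1, 1, 0], [0, 1, 1]]
UPA = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
exact = mismatches(UPA, UA, PA) == 0
```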

very large data bases | 2008

Privacy-preserving Naïve Bayes classification

Jaideep Vaidya; Murat Kantarcioglu; Chris Clifton

Privacy-preserving data mining, that is, developing models without seeing the data, is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model, but must not disclose their data to others. The problem of secure distributed classification is an important one. In many situations, data is split between multiple organizations. These organizations may want to utilize all of the data to create more accurate predictive models while revealing neither their training data/databases nor the instances to be classified. Naïve Bayes is often used as a baseline classifier, consistently providing reasonable classification performance. This paper brings privacy-preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.
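For horizontally partitioned data, the counts a Naïve Bayes model needs are sums of each party's local counts. The sketch below merges those counts in the clear to show what the global model depends on; a privacy-preserving protocol would obtain the same sums without exposing any party's local counts. The function names and the toy records are assumptions, and no smoothing is applied.

```python
from collections import Counter


def local_counts(records):
    """Count class labels and (attribute, value, label) triples at one party."""
    class_counts, feature_counts = Counter(), Counter()
    for features, label in records:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(name, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(all_local):
    """Add up the parties' counts (done in the clear here, for illustration only)."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in all_local:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def conditional_probability(total_class, total_feature, name, value, label):
    """Estimate P(attribute = value | label) from the merged counts."""
    return total_feature[(name, value, label)] / total_class[label]


# Hypothetical records held by two parties with the same attributes.
party_a = [({"outlook": "sunny"}, "yes"), ({"outlook": "rain"}, "no")]
party_b = [({"outlook": "sunny"}, "yes"), ({"outlook": "rain"}, "yes")]

classes, features = merge_counts([local_counts(party_a), local_counts(party_b)])
p_sunny_given_yes = conditional_probability(classes, features, "outlook", "sunny", "yes")
```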
