Hillol Kargupta

University of Maryland, Baltimore County

Publication

Featured research published by Hillol Kargupta.

IEEE Transactions on Knowledge and Data Engineering | 2006

Random projection-based multiplicative data perturbation for privacy preserving distributed data mining

Kun Liu; Hillol Kargupta; Jessica Ryan

This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data-mining problems such as clustering, principal component analysis, and classification. This paper makes primary contributions on two different grounds. First, it explores independent component analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
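The inner-product preservation that the abstract above relies on can be illustrated with a minimal sketch. It assumes a projection matrix with i.i.d. standard normal entries scaled by 1/sqrt(k); the function names, the vectors, and the value of k are hypothetical choices for illustration, not the paper's construction or parameters. Here k is set large only to make the approximation visibly tight; the privacy and accuracy trade-off in choosing k is not modeled.

```python
import math
import random


def random_projection_matrix(k, n, rng):
    """Return a k x n matrix with i.i.d. standard normal entries (assumed distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


def project(matrix, x):
    """Multiply x by the matrix and scale by 1/sqrt(k)."""
    scale = 1.0 / math.sqrt(len(matrix))
    return [scale * sum(r * xi for r, xi in zip(row, x)) for row in matrix]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)  # fixed seed so runs are repeatable
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical private record of party A
y = [5.0, 4.0, 3.0, 2.0, 1.0]  # hypothetical private record of party B
R = random_projection_matrix(4000, len(x), rng)

# Both parties apply the same projection and share only the projected vectors.
u, v = project(R, x), project(R, y)
print("true inner product:     ", dot(x, y))
print("estimated inner product:", dot(u, v))
```

The estimate is unbiased under these assumptions, and its spread shrinks as k grows, which is why the projected data can still support inner-product, correlation, and distance computations approximately.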

IEEE Internet Computing | 2006

Distributed Data Mining in Peer-to-Peer Networks

Souptik Datta; Kanishka Bhaduri; Chris Giannella; Ran Wolff; Hillol Kargupta

Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

IEEE International Conference on Evolutionary Computation | 1996

The Gene Expression Messy Genetic Algorithm

Hillol Kargupta

Introduces the Gene Expression Messy Genetic Algorithm (GEMGA), a new generation of messy genetic algorithms that directly search for relations among the members of the search space. GEMGA is an $O(\Lambda^k(\ell^2 + k))$ sample complexity algorithm for the class of order-k delineable problems (problems that can be solved by considering no higher than order-k relations). GEMGA is designed based on an alternate perspective of natural evolution, as proposed by the SEARCH (Search Envisioned As Relation and Class Hierarchizing) framework, that emphasizes the role of gene expression. GEMGA uses the transcription operator to search for relations. This paper also presents test results of the GEMGA for large multimodal order-k delineable problems.

Knowledge and Information Systems | 2005

Random-data perturbation techniques and privacy-preserving data mining

Hillol Kargupta; Souptik Datta; Qi Wang; Krishnamoorthy Sivakumar

Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then it develops a random matrix-based spectral-filtering technique to retrieve original data from the dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques. Examples include algorithms that explicitly guard against privacy breaches through linear transformations, exploiting multiplicative and colored noise for preserving privacy in data mining applications.

Knowledge and Information Systems | 2001

Distributed clustering using collective principal component analysis

Hillol Kargupta; Weiyun Huang; Krishnamoorthy Sivakumar; Erik L. Johnson

This paper considers distributed clustering of high-dimensional heterogeneous data using a distributed principal component analysis (PCA) technique called the collective PCA. It presents the collective PCA technique, which can be used independently of the clustering application. It shows a way to integrate the collective PCA with a given off-the-shelf clustering algorithm in order to develop a distributed clustering technique. It also presents experimental results using different test data sets, including an application for web mining.

Information Sciences | 2006

Clustering distributed data streams in peer-to-peer environments

Sanghamitra Bandyopadhyay; Chris Giannella; Ujjwal Maulik; Hillol Kargupta; Kun Liu; Souptik Datta

This paper describes a technique for clustering homogeneously distributed data in a peer-to-peer environment like sensor networks. The proposed technique is based on the principles of the K-Means algorithm. It works in a localized asynchronous manner by communicating with the neighboring nodes. The paper offers extensive theoretical analysis of the algorithm that bounds the error in the distributed clustering process compared to the centralized approach that requires downloading all the observed data to a single site. Experimental results show that, in contrast to the case when all the data is transmitted to a central location for application of the conventional clustering algorithm, the communication cost (an important consideration in sensor networks which are typically equipped with limited battery power) of the proposed approach is significantly smaller. At the same time, the accuracy of the obtained centroids is high and the number of samples which are incorrectly labeled is also small.
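The idea of updating centroids through neighbor-only communication can be sketched as follows. This is a simplified, synchronous illustration on one-dimensional data, not the paper's asynchronous algorithm; the ring topology, the data, the starting centroids, and all function names are hypothetical.

```python
def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def local_stats(points, centroids):
    """Per-cluster [sum, count] computed from one peer's local points only."""
    stats = [[0.0, 0] for _ in centroids]
    for p in points:
        c = nearest(p, centroids)
        stats[c][0] += p
        stats[c][1] += 1
    return stats


def local_round(peer_points, peer_centroids, neighbors):
    """One round: every peer merges its statistics with its direct neighbors'.

    Assumes all peers start from the same centroids, so cluster index c
    refers to the same cluster on every peer.
    """
    stats = [local_stats(pts, cen) for pts, cen in zip(peer_points, peer_centroids)]
    updated = []
    for i, old in enumerate(peer_centroids):
        group = [i] + neighbors[i]
        new = []
        for c in range(len(old)):
            total = sum(stats[j][c][0] for j in group)
            count = sum(stats[j][c][1] for j in group)
            # Keep the old centroid if no point in the neighborhood chose it.
            new.append(total / count if count else old[c])
        updated.append(new)
    return updated


n_peers = 6
# Hypothetical data: each peer observes one point near 0 and one near 10.
peer_points = [[0.1 * i, 10.0 - 0.1 * i] for i in range(n_peers)]
peer_centroids = [[1.0, 9.0] for _ in range(n_peers)]
# Ring topology: each peer talks only to its two adjacent peers.
neighbors = [[(i - 1) % n_peers, (i + 1) % n_peers] for i in range(n_peers)]

for _ in range(3):
    peer_centroids = local_round(peer_points, peer_centroids, neighbors)
print(peer_centroids)
```

No peer ever sees the full data set; each one exchanges only small per-cluster summaries with adjacent peers, which is the source of the communication savings the abstract describes.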

SIGKDD Explorations | 2002

MobiMine: monitoring the stock market from a PDA

Hillol Kargupta; Byung-Hoon Park; Sweta Pittie; Lei Liu; Deepali Kushraj; Kakali Sarkar

This paper describes an experimental mobile data mining system that allows intelligent monitoring of time-critical financial data from a hand-held PDA. It presents the overall system architecture and the philosophy behind the design. It explores one particular aspect of the system: automated construction of a personalized focus area that calls for the user's attention. This module works using data mining techniques. The paper describes the data mining component of the system that employs a novel Fourier analysis-based approach to efficiently represent, visualize, and communicate decision trees over limited bandwidth wireless networks. The paper also discusses a quadratic programming-based personalization module that runs on the PDAs and the multimedia-based user interfaces. It reports experimental results using an ad hoc peer-to-peer IEEE 802.11 wireless network.

IEEE Transactions on Knowledge and Data Engineering | 2009

Approximate Distributed K-Means Clustering over a Peer-to-Peer Network

Souptik Datta; Chris Giannella; Hillol Kargupta

Data-intensive peer-to-peer (P2P) networks are finding an increasing number of applications. Data mining in such P2P environments is a natural extension. However, common monolithic data mining architectures do not fit well in such environments since they typically require centralizing the distributed data, which is usually not practical in a large P2P network. Distributed data mining algorithms that avoid large-scale synchronization or data centralization offer an alternative. This paper considers the distributed K-means clustering problem where the data and computing resources are distributed over a large P2P network. It offers two algorithms which produce an approximation of the result produced by the standard centralized K-means clustering algorithm. The first is designed to operate in a dynamic P2P network and can produce clusterings by "local" synchronization only. The second algorithm uses uniformly sampled peers and provides analytical guarantees regarding the accuracy of clustering on a P2P network. Empirical results show that both algorithms demonstrate good performance compared to their centralized counterparts at a modest communication cost.

European Conference on Principles of Data Mining and Knowledge Discovery | 2006

An attacker's view of distance preserving maps for privacy preserving data mining

Kun Liu; Chris Giannella; Hillol Kargupta

We examine the effectiveness of distance preserving transformations in privacy preserving data mining. These techniques are potentially very useful in that some important data mining algorithms (e.g., distance-based clustering and k-nearest neighbor classification) can be efficiently applied to the transformed data and produce exactly the same results as if applied to the original data. However, the issue of how well the original data is hidden has, to our knowledge, not been carefully studied. We take a step in this direction by assuming the role of an attacker armed with two types of prior information regarding the original data. We examine how well the attacker can recover the original data from the transformed data and prior information. Our results offer insight into the vulnerabilities of distance preserving transformations.
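A small sketch can make both halves of this abstract concrete: a rotation keeps every pairwise distance intact, yet a little prior knowledge may undo it. The example assumes a two-dimensional rotation and an attacker who knows one original record together with its released image. That is one hypothetical form of prior information, not necessarily either of the two types the paper studies, and all names and values are illustrative.

```python
import math


def rotate(point, theta):
    """Rotate a 2-D point about the origin by theta radians."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def recover_angle(known_original, known_released):
    """Infer the rotation angle from one known original/released pair."""
    before = math.atan2(known_original[1], known_original[0])
    after = math.atan2(known_released[1], known_released[0])
    return after - before


secret_theta = 1.1  # hypothetical secret chosen by the data owner
original = [(1.0, 2.0), (4.0, -1.0), (-3.0, 0.5)]
released = [rotate(p, secret_theta) for p in original]

# Utility: every pairwise distance survives the transformation.
for i in range(len(original)):
    for j in range(i + 1, len(original)):
        assert math.isclose(distance(original[i], original[j]),
                            distance(released[i], released[j]))

# Attack: one known pair reveals the angle, and with it every other record.
theta_hat = recover_angle(original[0], released[0])
recovered = [rotate(p, -theta_hat) for p in released]
print(recovered)
```

The same property that makes the released data useful for distance-based mining leaves the transformation with very few degrees of freedom, which is what an attacker with side information can exploit.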

Engineering Applications of Artificial Intelligence | 2005

Distributed data mining and agents

Josenildo Costa da Silva; Chris Giannella; Ruchita Bhargava; Hillol Kargupta; Matthias Klusch

Multi-agent systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem solving tasks: the analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and future work accordingly. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
