Chris Clifton | Researchain

Publication

Featured research published by Chris Clifton.

knowledge discovery and data mining | 2002

Privacy preserving association rule mining in vertically partitioned data

Jaideep Vaidya; Chris Clifton

Privacy considerations often constrain data mining projects. This paper addresses the problem of association rule mining where transactions are distributed across sources. Each site holds some attributes of each transaction, and the sites wish to collaborate to identify globally valid association rules. However, the sites must not reveal individual transaction data. We present a two-party algorithm for efficiently discovering frequent itemsets with minimum support levels, without either site revealing individual transaction values.
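As a plain, non-private illustration of the quantity at stake: when each site holds some attributes of every transaction, the support of an itemset that spans both sites is the number of transactions in which both sites see their part of the itemset. If each site encodes this as a 0/1 vector, that count is the dot product of the two vectors. The sketch below computes it in the clear; it is not the paper's protocol, and the function and variable names are hypothetical. A privacy-preserving algorithm would need to obtain the same number without either site seeing the other's vector.

```python
def joint_support(site_a_column, site_b_column):
    """Count transactions in which both sites' parts of the itemset occur.

    Each argument is a 0/1 list with one entry per transaction, in the same
    transaction order at both sites (an assumption of this sketch).
    """
    if len(site_a_column) != len(site_b_column):
        raise ValueError("both sites must describe the same transactions")
    # Dot product of two indicator vectors = size of their joint occurrence.
    return sum(a * b for a, b in zip(site_a_column, site_b_column))


# Hypothetical data: five transactions, one indicator per site.
a_has_items = [1, 0, 1, 1, 0]
b_has_items = [1, 1, 1, 0, 0]

support = joint_support(a_has_items, b_has_items)
is_frequent = support >= 2  # minimum support threshold chosen for the example
```

The itemset is then reported as frequent only if the count meets the agreed minimum support level.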

IEEE Transactions on Knowledge and Data Engineering | 2004

Privacy-preserving distributed mining of association rules on horizontally partitioned data

Murat Kantarcioglu; Chris Clifton

Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. We address secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.

SIGKDD Explorations | 2002

Tools for privacy preserving distributed data mining

Chris Clifton; Murat Kantarcioglu; Jaideep Vaidya; Xiaodong Lin; Michael Y. Zhu

Privacy preserving mining of distributed data has numerous applications. Each application poses different constraints: What is meant by privacy, what are the desired results, how is the data distributed, what are the constraints on collaboration and cooperative computing, etc. We suggest that the solution to this is a toolkit of components that can be combined for specific privacy-preserving data mining applications. This paper presents some components of such a toolkit, and shows how they can be used to solve several privacy-preserving data mining problems.
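The abstract does not list the toolkit's components. A commonly cited building block of this kind is a ring-based secure sum, assumed here purely for illustration: the initiating site masks its value with a random number, each site adds its own value to the running total, and the initiator removes the mask at the end. The sketch below simulates that message flow in a single process; `secure_sum` and its parameters are hypothetical names, and the result is only correct when the true sum is smaller than the modulus.

```python
import random


def secure_sum(private_values, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum over the sites' private values.

    Every intermediate total passed between sites is offset by a random
    mask known only to the initiator, so no single message reveals a
    partial sum. Assumes sites do not collude (a simplification).
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)  # known only to the initiating site
    running = mask
    for value in private_values:   # each site adds its value and forwards
        running = (running + value) % modulus
    return (running - mask) % modulus  # initiator removes the mask


total = secure_sum([5, 12, 30])
```

Here `total` equals the ordinary sum of the three inputs, while each forwarded message looks uniformly random to the site receiving it.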

knowledge discovery and data mining | 2003

Privacy-preserving k-means clustering over vertically partitioned data

Jaideep Vaidya; Chris Clifton

Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.

international conference on management of data | 2001

Using unknowns to prevent discovery of association rules

Yücel Saygin; Vassilios S. Verykios; Chris Clifton

Data mining technology has given us new capabilities to identify correlations in large data sets. This introduces risks when the data is to be made public, but the correlations are private. We introduce a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. The efficacy and complexity of this method are discussed. We also present an experiment showing an example of this methodology.

international conference on management of data | 2007

Hiding the presence of individuals from shared databases

Mehmet Ercan Nergiz; Maurizio Atzori; Chris Clifton

Advances in information technology, and its use in research, are increasing both the need for anonymized data and the risks of poor anonymization. We present a metric, δ-presence, that clearly links the quality of anonymization to the risk posed by inadequate anonymization. We show that existing anonymization techniques are inappropriate for situations where δ-presence is a good metric (specifically, where knowing an individual is in the database poses a privacy risk), and present algorithms for effectively anonymizing to meet δ-presence. The algorithms are evaluated in the context of a real-world scenario, demonstrating practical applicability of the approach.

Journal of Computer Security | 2005

Secure set intersection cardinality with application to association rule mining

Jaideep Vaidya; Chris Clifton

There has been concern over the apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. The process of data mining may use private data, leading to the potential for privacy breaches. Secure Multiparty Computation shows that results can be produced without revealing the data used to generate them. The problem is that general techniques for secure multiparty computation do not scale to data-mining size computations. This paper presents an efficient protocol for securely determining the size of set intersection, and shows how this can be used to generate association rules where multiple parties have different (and private) information about the same set of individuals.
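One well-known way to compute the size of a set intersection without exchanging the items themselves uses commutative encryption: each party encrypts its own items, then the other party's already-encrypted items, and only doubly encrypted values are compared. The sketch below is a toy, single-process simulation under that assumption, using modular exponentiation as the commutative cipher. It is not presented as the paper's protocol, the names are hypothetical, and the modulus is far too small and unvetted for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, for illustration only


def make_key():
    """Pick an exponent coprime to P - 1 so x -> x^key mod P is a bijection."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def to_group(item):
    """Hash an item to an integer in [1, P - 1]."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(values, key):
    """Apply the commutative cipher; a set discards ordering information."""
    return {pow(v, key, P) for v in values}


def intersection_size(set_a, set_b):
    key_a, key_b = make_key(), make_key()
    a_once = encrypt({to_group(x) for x in set_a}, key_a)  # sent by A to B
    b_once = encrypt({to_group(x) for x in set_b}, key_b)  # sent by B to A
    a_twice = encrypt(a_once, key_b)  # B re-encrypts A's values
    b_twice = encrypt(b_once, key_a)  # A re-encrypts B's values
    # (x^a)^b == (x^b)^a mod P, so shared items collide after both layers.
    return len(a_twice & b_twice)


shared = intersection_size({"ann", "bob", "cy"}, {"bob", "cy", "dee"})
```

Because both layers of encryption commute, equal items end up with equal ciphertexts, so counting matches yields the intersection size without revealing which items matched.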

international conference on management of data | 1998

Query flocks: a generalization of association-rule mining

Dick Tsur; Jeffrey D. Ullman; Serge Abiteboul; Chris Clifton; Rajeev Motwani; Svetlozar Nestorov; Arnon Rosenthal

Association-rule mining has proved a highly successful technique for extracting useful information from very large databases. This success is attributed not only to the appropriateness of the objectives, but to the fact that a number of new query-optimization ideas, such as the “a-priori” trick, make association-rule mining run much faster than might be expected. In this paper we see that the same tricks can be extended to a much more general context, allowing efficient mining of very large databases for many different kinds of patterns. The general idea, called “query flocks,” is a generate-and-test model for data-mining problems. We show how the idea can be used either in a general-purpose mining system or in a next generation of conventional query optimizers.
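The "a-priori" trick mentioned above rests on one observation: an itemset can only be frequent if every one of its subsets is frequent, so candidates with an infrequent subset can be discarded before their support is ever counted. A minimal sketch of that generate-and-test loop for ordinary frequent itemsets follows; it illustrates the classic technique, not the query-flocks system itself, and the function name and sample baskets are made up for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Generate: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Test: count support only for the surviving candidates.
        current = {c for c in candidates if support(c) >= min_support}
    return result


baskets = [{"bread", "milk"}, {"bread", "beer"},
           {"bread", "milk", "beer"}, {"milk"}]
found = frequent_itemsets(baskets, min_support=2)
```

In this example the three-item candidate is pruned without a database scan, because its subset of milk and beer appears in only one basket.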

international conference on management of data | 2004

Privacy-preserving data integration and sharing

Chris Clifton; Murat Kantarcıoǧlu; AnHai Doan; Gunther Schadow; Jaideep Vaidya; Ahmed K. Elmagarmid; Dan Suciu

Integrating data from multiple sources has been a longstanding challenge in the database community. Techniques such as privacy-preserving data mining promise privacy, but assume data integration has been accomplished. Data integration methods are seriously hampered by the inability to share the data to be integrated. This paper lays out a privacy framework for data integration. Challenges for data integration under this framework are discussed in the context of existing accomplishments in data integration. Many of these challenges are opportunities for the data mining community.

IEEE Transactions on Knowledge and Data Engineering | 2004

TopCat: data mining for topic identification in a text corpus

Chris Clifton; Robert Cooley; Jason D. M. Rennie

TopCat (topic categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: Identifying related groups of items. We present a novel method for identifying related items based on traditional data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized ground truth news corpus; it shows this technique is effective in identifying topics in collections of news articles.
