Alina Campan
Northern Kentucky University
Publication
Featured researches published by Alina Campan.
Knowledge Discovery and Data Mining | 2009
Alina Campan; Traian Marius Truta
The advent of social network sites in recent years is a trend that will likely continue. What naive technology users may not realize is that the information they provide online is stored and may be used for various purposes. Researchers have pointed out for some time the privacy implications of massive data gathering, and effort has been made to protect the data from unauthorized disclosure. However, data privacy research has mostly targeted traditional data models such as microdata. Recently, social network data has begun to be analyzed from a specific privacy perspective, one that considers, besides the attribute values that characterize the individual entities in the networks, their relationships with other entities. Our main contributions in this paper are a greedy algorithm for anonymizing a social network and a measure that quantifies the information loss in the anonymization process due to edge generalization.
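The edge-generalization idea in the abstract above can be illustrated with a toy proxy: when a cluster of nodes is collapsed into a super-node that reveals only its size and its internal edge count, the uncertainty about any individual edge can be scored. The sketch below is a hedged illustration under that assumption; the function name and the normalized score are illustrative, not the paper's actual measure.

```python
from itertools import combinations

def cluster_edge_loss(nodes, edges):
    """Toy proxy for the information lost when a node cluster is collapsed
    into a super-node revealing only (size, internal edge count).
    Illustrative only; not the measure defined in the paper."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    internal = sum(1 for u, v in combinations(nodes, 2)
                   if (u, v) in edges or (v, u) in edges)
    p = internal / possible
    # Uncertainty about any specific pair is largest when about half of
    # the possible edges exist; the score is normalized to [0, 1].
    return 4 * p * (1 - p)

edges = {("a", "b"), ("b", "c")}
print(cluster_edge_loss(["a", "b", "c"], edges))  # 2 of 3 edges present -> 8/9
```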
ACM Symposium on Applied Computing | 2007
Traian Marius Truta; Alina Campan
New privacy regulations together with ever increasing data availability and computational power have created a huge interest in data privacy research. One major research direction is built around k-anonymity property, which is required for the released data. Although many k-anonymization algorithms exist for static data, a complete framework to cope with data evolution (a real world scenario) has not been proposed before. In this paper, we introduce algorithms for the maintenance of k-anonymized versions of large evolving datasets. These algorithms incrementally manage insert/delete/update dataset modifications. Our results showed that incremental maintenance is very efficient compared with existing techniques and preserves data quality. The second main contribution of this paper is an optimization algorithm that is able to improve the quality of the solutions attained by either the non-incremental or incremental algorithms.
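The incremental maintenance described above can be sketched in miniature: when a record arrives, place it into the existing QI-cluster whose generalization it distorts least, rather than re-anonymizing the whole dataset. The cost function and names below are illustrative assumptions for numerical quasi-identifiers, not the paper's algorithm.

```python
def range_cost(cluster, record):
    """Increase in per-attribute value ranges if `record` joins `cluster`
    (numerical quasi-identifiers only; an illustrative cost, not the
    paper's information-loss measure)."""
    cost = 0.0
    for i, value in enumerate(record):
        col = [r[i] for r in cluster]
        old_range = max(col) - min(col)
        new_range = max(max(col), value) - min(min(col), value)
        cost += new_range - old_range
    return cost

def incremental_insert(clusters, record):
    """Handle an insert by extending the cheapest existing QI-cluster."""
    best = min(clusters, key=lambda c: range_cost(c, record))
    best.append(record)
    return best

# Two clusters over (age, income); the new record fits the first cheaply.
clusters = [[(25, 40000), (30, 45000)], [(60, 90000), (65, 95000)]]
incremental_insert(clusters, (28, 42000))
print(len(clusters[0]))  # 3
```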
Very Large Data Bases | 2007
Traian Marius Truta; Alina Campan; Paul Meyer
Existing privacy regulations together with large amounts of available data have created a huge interest in data privacy research. A main research direction is built around the k-anonymity property. Several shortcomings of the k-anonymity model have been fixed by new privacy models such as p-sensitive k-anonymity, l-diversity, (α, k)-anonymity, and t-closeness. In this paper we introduce the Enhanced PK Clustering algorithm for generating p-sensitive k-anonymous microdata based on the frequency distribution of sensitive attribute values. The p-sensitive k-anonymity model and its enhancement, extended p-sensitive k-anonymity, are described, their properties are presented, and two diversity measures are introduced. Our experiments have shown that the proposed algorithm improves several cost measures over existing algorithms.
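The property this paper builds on is easy to state concretely: masked microdata is p-sensitive k-anonymous when every group of records sharing the same quasi-identifier values contains at least k records and at least p distinct sensitive values. A minimal checker, assuming a simple tuple representation of records:

```python
from collections import defaultdict

def is_p_sensitive_k_anonymous(records, qi_idx, sens_idx, k, p):
    """Check p-sensitive k-anonymity on masked microdata: every QI-group
    must hold >= k records and >= p distinct sensitive values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[i] for i in qi_idx)
        groups[key].append(rec[sens_idx])
    return all(len(vals) >= k and len(set(vals)) >= p
               for vals in groups.values())

data = [
    ("2*", "F", "flu"), ("2*", "F", "cold"),
    ("3*", "M", "flu"), ("3*", "M", "flu"),
]
# False: the ("3*", "M") group has 2 records but only 1 sensitive value.
print(is_p_sensitive_k_anonymous(data, (0, 1), 2, k=2, p=2))
```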
Very Large Data Bases | 2011
Alina Campan; Nicholas Cooper; Traian Marius Truta
Generalization hierarchies are frequently used in computer science, statistics, biology, bioinformatics, and other areas when less specific values are needed for data analysis. Generalization is also one of the most used disclosure control techniques for anonymizing data. For numerical attributes, generalization is performed either by using existing predefined generalization hierarchies or a hierarchy-free model. Because hierarchy-free generalization is not suitable for anonymization in all possible scenarios, generalization hierarchies are of particular interest for data anonymization. Traditionally, these hierarchies were created by the data owner with help from domain experts. But while it is feasible to construct a hierarchy of small size, the effort increases for hierarchies that have many levels. Therefore, new approaches to creating these numerical hierarchies involve their automatic/on-the-fly generation. In this paper we extend an existing method for creating on-the-fly generalization hierarchies, we present several existing information loss measures used to assess the quality of anonymized data, and we run a series of experiments showing that our new method improves over existing methods for automatically generating on-the-fly numerical generalization hierarchies.
Systems, Man and Cybernetics | 2010
Yi Hu; Alina Campan; James Walden; Irina Vorobyeva; Justin Shelton
Organizations spend a significant amount of resources securing their servers and network perimeters. However, these mechanisms are not sufficient for protecting databases. In this paper, we present a new technique for identifying malicious database transactions. Compared to many existing approaches which profile SQL query structures and database user activities to detect intrusions, the novelty of this approach is the automatic discovery and use of essential data dependencies, namely, multi-dimensional and multi-level data dependencies, for identifying anomalous database transactions. Since essential data dependencies reflect semantic relationships among data items and are less likely to change than SQL query structures or database user behaviors, they are ideal for profiling data correlations for identifying malicious database activities.
Data Mining | 2010
Traian Marius Truta; Alina Campan
Existing privacy regulations together with large amounts of available data have created a huge interest in data privacy research. A main research direction is built around the k-anonymity property. Several shortcomings of the k-anonymity model were addressed by new privacy models such as p-sensitive k-anonymity, l-diversity, (α, k)-anonymity, and t-closeness. In this chapter we describe two algorithms (GreedyPKClustering and EnhancedPKClustering) for generating (extended) p-sensitive k-anonymous microdata. In our experiments, we compare the quality of the microdata generated by these algorithms and by another existing anonymization algorithm (Incognito). Also, we present two new branches of p-sensitive k-anonymity: the constrained p-sensitive k-anonymity model and the p-sensitive k-anonymity model for social networks.
ACM Symposium on Applied Computing | 2010
Alina Campan; Traian Marius Truta; Nicholas Cooper
Numerous privacy models based on the k-anonymity property have been introduced in the last few years. While differing in their methods and quality of their results, they all focus first on masking the data, and then protecting the quality of the data as a whole. We consider a new approach, where requirements on the amount of distortion allowed on the initial data are imposed in order to preserve its usefulness. In this paper, the constrained p-sensitive k-anonymity model is introduced and an algorithm for generating constrained p-sensitive k-anonymous microdata is presented. Our experiments have shown that the proposed algorithm is comparable quality-wise with existing algorithms.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2012
Traian Marius Truta; Alina Campan; Xiaoxun Sun
In this paper, we present an overview of p-sensitive k-anonymity models including the basic model, the extended p-sensitive k-anonymity, the constrained p-sensitive k-anonymity, and the (p+, α)-sensitive k-anonymity. Existing properties of these models are reviewed and illustrated, and new properties regarding the maximum number of QI-clusters are discussed and proved. This paper includes a review of related anonymity models and a very brief summary of existing algorithms for the family of p-sensitive k-anonymity models.
Very Large Data Bases | 2010
Alina Campan; Nicholas Cooper
We present in this paper a method for dynamically creating hierarchies for quasi-identifier numerical attributes. The resulting hierarchies can be used for generalization in microdata k-anonymization, or for allowing users to define generalization boundaries for constrained k-anonymity. The construction of a new hierarchy for a numerical attribute is performed as a hierarchical agglomerative clustering of that attribute's values in the dataset to anonymize. Therefore, the resulting tree hierarchy reflects well the closeness and clustering tendency of the attribute's values in the dataset. Due to this characteristic of the hierarchies created on-the-fly for quasi-identifier numerical attributes, the quality of the microdata anonymized through generalization based on these hierarchies is well preserved, and the information loss in the anonymization process remains within reasonable bounds, as proved experimentally.
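The agglomerative construction described above can be sketched as follows: start with one leaf interval per distinct value and repeatedly merge the two adjacent intervals separated by the smallest gap, so that dense runs of values end up grouped low in the tree. This is a hedged sketch of the on-the-fly idea, not the paper's exact procedure.

```python
def build_numeric_hierarchy(values):
    """Build a generalization hierarchy for a numerical attribute by
    agglomerative merging of adjacent value intervals (illustrative
    sketch; not the exact algorithm of the paper)."""
    # Leaves: one [v, v] interval per distinct value, in sorted order.
    nodes = [(v, v) for v in sorted(set(values))]
    merges = []
    while len(nodes) > 1:
        # Merge the adjacent pair with the smallest boundary gap.
        i = min(range(len(nodes) - 1),
                key=lambda j: nodes[j + 1][0] - nodes[j][1])
        merged = (nodes[i][0], nodes[i + 1][1])
        merges.append(merged)
        nodes[i:i + 2] = [merged]
    return nodes[0], merges  # root interval, merge order (bottom-up levels)

root, merges = build_numeric_hierarchy([1, 2, 3, 10, 11, 30])
print(root)       # (1, 30)
print(merges[0])  # (1, 2): the two closest values are merged first
```

Because merges happen closest-first, tight value clusters (1..3 and 10..11 here) form subtrees before the outlier 30 is absorbed at the root, which is exactly the closeness-preserving behavior the abstract describes.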
IEEE Annual Computing and Communication Workshop and Conference | 2017
Matthew Beck; Wei Hao; Alina Campan
Current trends show a move away from desktop computing and toward mobile devices. Yet mobile devices suffer from limitations in memory, storage, computational power, and battery life. Many of these limitations can be addressed by offloading computation and storage to cloud-based platforms. E-commerce mobile applications designed to serve the global customer base of a retail outlet experience fluctuations in demand for resources based on the location of the users. Given a traditional client-server architecture, where the server application and database are deployed to a single geographic location, this can cause large disparities in response time between users close to the server location and those much further away. This could cause a loss of business or slow user growth in more distant regions. Using several Amazon Web Services (AWS), this paper tests a proxy system and a k-means-based data partitioning solution to this issue. The discussion of k-means database partitioning describes a preprocessing methodology for adapting raw AWS Mobile Analytics log data for use in the k-means algorithm. The paper also compares a few alternatives for distance measurements and centroid computations for use in the k-means algorithm. Experimental results confirm that this approach significantly reduces response time. It also shows that the approach significantly increases server-side throughput.
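The partitioning idea above rests on grouping user request locations into k regions, each of which can then be served by a nearby replica. A minimal Euclidean k-means over (latitude, longitude) pairs sketches this; the deterministic farthest-point initialization and all coordinates below are illustrative assumptions, not details from the paper (which also weighs haversine-style distance alternatives).

```python
import math

def kmeans(points, k, iters=20):
    """Plain Euclidean k-means over user coordinates, sketching how request
    locations could be partitioned into k serving regions."""
    # Deterministic farthest-point initialization (simpler than k-means++).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
            key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[i].append(p)
        # Recompute centroids as bucket means (keep old one if empty).
        centroids = [tuple(sum(x) / len(b) for x in zip(*b)) if b
                     else centroids[i] for i, b in enumerate(buckets)]
    return centroids, buckets

us = [(40.7, -74.0), (41.8, -87.6), (34.0, -118.2)]   # US cities
eu = [(51.5, -0.1), (48.8, 2.3), (52.5, 13.4)]        # EU cities
centroids, buckets = kmeans(us + eu, k=2)
print(sorted(len(b) for b in buckets))  # [3, 3]: one US and one EU region
```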