Anna Koufakou | Researchain

Publication

Featured research published by Anna Koufakou.

International Conference on Tools with Artificial Intelligence | 2007

A Scalable and Efficient Outlier Detection Strategy for Categorical Data

Anna Koufakou; Enrique Ortiz; Michael Georgiopoulos; Georgios C. Anagnostopoulos; Kenneth Reynolds

Outlier detection has received significant attention in many applications, such as detecting credit card fraud or network intrusions. Most existing research focuses on numerical datasets and cannot be applied directly to categorical datasets, where there is little sense in calculating distances among data points. Furthermore, a number of outlier detection methods require quadratic time with respect to the dataset size and usually multiple dataset scans. These characteristics are undesirable for large datasets, potentially scattered over multiple distributed sites. In this paper, we introduce Attribute Value Frequency (AVF), a fast and scalable outlier detection strategy for categorical data. AVF scales linearly with the number of data points and attributes, and relies on a single data scan. AVF is compared with a list of representative outlier detection approaches that have not been contrasted against each other. Our proposed solution is experimentally shown to be significantly faster, and as effective in discovering outliers.
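The abstract does not spell out the scoring rule, so the following is only a minimal sketch of a frequency-based score in the spirit of AVF. It assumes each point is scored by the mean frequency of its attribute values and that the lowest-scoring points are the outlier candidates. The function names and the toy rows are hypothetical, and this in-memory version loops over the rows twice (once to count, once to score) rather than reproducing the paper's single-scan implementation.

```python
from collections import Counter

def avf_scores(rows):
    """Return one score per row: the mean frequency of its attribute values."""
    if not rows:
        return []
    n_attrs = len(rows[0])
    # First loop: count how often each value occurs in each attribute (column).
    counts = [Counter() for _ in range(n_attrs)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1
    # Second loop: look up the counts, so total work is linear in rows * attributes.
    return [sum(counts[j][v] for j, v in enumerate(row)) / n_attrs for row in rows]

def lowest_scoring(rows, k):
    """Indices of the k rows with the lowest scores (candidate outliers)."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]

# Hypothetical categorical data: the last row uses only rare values.
rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "tiny"),
]
print(avf_scores(rows))
print(lowest_scoring(rows, 1))
```

No pairwise distances are computed, which is why the cost grows with the number of points times the number of attributes rather than quadratically.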

Neural Networks | 2001

Cross-validation in Fuzzy ARTMAP for large databases

Anna Koufakou; Michael Georgiopoulos; Georgios C. Anagnostopoulos; Takis Kasparis

In this paper we examine the issue of overtraining in Fuzzy ARTMAP. Overtraining in Fuzzy ARTMAP manifests itself in two different ways: (a) it degrades the generalization performance of Fuzzy ARTMAP as training progresses; and (b) it creates unnecessarily large Fuzzy ARTMAP neural network architectures. In this work, we demonstrate that overtraining happens in Fuzzy ARTMAP and we propose an old remedy for its cure: cross-validation. In our experiments, we compare the performance of Fuzzy ARTMAP that is trained (i) until the completion of training, (ii) for one epoch, and (iii) until its performance on a validation set is maximized. The experiments were performed on artificial and real databases. The conclusion derived from those experiments is that cross-validation is a useful procedure in Fuzzy ARTMAP, because it produces smaller Fuzzy ARTMAP architectures with improved generalization performance. The trade-off is that cross-validation introduces additional computational complexity in the training phase of Fuzzy ARTMAP.
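Training regime (iii), stopping when validation performance peaks, can be illustrated with a generic loop. This is a sketch under stated assumptions, not the authors' procedure: it does not model Fuzzy ARTMAP itself, the `train_one_epoch` and `validation_score` callables are hypothetical stand-ins for a real learner, and the `patience` parameter and the toy score sequence are invented for illustration.

```python
def train_until_validation_peak(train_one_epoch, validation_score, max_epochs, patience=1):
    """Train epoch by epoch and report the epoch with the best validation score.

    train_one_epoch: callable that runs one pass over the training set.
    validation_score: callable returning the current score on held-out data.
    Stops after `patience` consecutive epochs without improvement.
    """
    best_epoch, best_score, stale = 0, float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        score = validation_score()
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_score

# Toy stand-in for a learner whose validation accuracy rises, then degrades.
history = iter([0.70, 0.78, 0.81, 0.79, 0.75])
state = {"score": None}

def fake_epoch():
    state["score"] = next(history)

best = train_until_validation_peak(fake_epoch, lambda: state["score"], max_epochs=5)
print(best)
```

A real implementation would also snapshot the model at the best epoch so that the smaller, better-generalizing network can be restored. The extra validation pass after every epoch is the additional training cost the abstract mentions.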

International Symposium on Neural Networks | 2008

Fast parallel outlier detection for categorical datasets using MapReduce

Anna Koufakou; Jimmy Secretan; John Reeder; Kelvin Cardona; Michael Georgiopoulos

Outlier detection has received considerable attention in many applications, such as detecting network attacks or credit card fraud. The massive datasets currently available for mining in some of these outlier detection applications require large parallel systems, and consequently parallelizable outlier detection methods. Most existing outlier detection methods assume that all of the attributes of a dataset are numerical, usually have quadratic time complexity with respect to the number of points in the dataset, and quite often require multiple dataset scans. In this paper, we propose a fast parallel outlier detection strategy based on the Attribute Value Frequency (AVF) approach, a high-speed, scalable outlier detection method for categorical data that is inherently easy to parallelize. Our proposed solution, MR-AVF, is based on the MapReduce paradigm for parallel programming, which offers load balancing and fault tolerance. MR-AVF is particularly simple to develop and it is shown to be highly scalable with respect to the number of cluster nodes.
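To show why a frequency-based score parallelizes easily, here is a single-process sketch of the map and reduce steps. It is an illustration only and assumes a particular decomposition: each partition counts its own (attribute, value) pairs, the partial counts are merged, and each partition is then scored against the merged counts. No MapReduce framework is used, and the function names, the partitioning, and the use of a summed frequency as the score are all assumptions rather than details taken from the MR-AVF paper.

```python
from collections import Counter
from functools import reduce

def map_counts(partition):
    """Map step: count (attribute index, value) pairs within one partition."""
    counts = Counter()
    for row in partition:
        for j, value in enumerate(row):
            counts[(j, value)] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge the partial counts from two partitions."""
    return left + right

def score_partition(partition, global_counts):
    """Second map step: score each row by the summed global frequency of its values."""
    return [sum(global_counts[(j, v)] for j, v in enumerate(row)) for row in partition]

# Hypothetical data split across two partitions (stand-ins for cluster nodes).
partitions = [
    [("red", "small"), ("red", "small")],
    [("red", "large"), ("blue", "tiny")],
]
global_counts = reduce(reduce_counts, map(map_counts, partitions), Counter())
scores = [score_partition(p, global_counts) for p in partitions]
print(scores)
```

Because the partial counts combine by simple addition, partitions can be processed independently and merged in any order, which is the property that makes this kind of method a natural fit for MapReduce.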

International Symposium on Neural Networks | 2001

Overtraining in fuzzy ARTMAP: Myth or reality?

Michael Georgiopoulos; Anna Koufakou; Georgios C. Anagnostopoulos; Takis Kasparis

We examine the issue of overtraining in fuzzy ARTMAP. Overtraining in fuzzy ARTMAP manifests itself in two different ways: 1) it degrades the generalization performance of fuzzy ARTMAP as training progresses; and 2) it creates unnecessarily large fuzzy ARTMAP neural network architectures. In this work we demonstrate that overtraining happens in fuzzy ARTMAP and propose an old remedy for its cure: cross-validation. In our experiments we compare the performance of fuzzy ARTMAP that is trained: 1) until the completion of training, 2) for one epoch, and 3) until its performance on a validation set is maximized. The experiments were performed on artificial and real databases. The conclusion derived from these experiments is that cross-validation is a useful procedure in fuzzy ARTMAP, because it produces smaller fuzzy ARTMAP architectures with improved generalization performance. The trade-off is that cross-validation introduces additional computational complexity in the training phase of fuzzy ARTMAP.

Future Generation Computer Systems | 2010

APHID: An architecture for private, high-performance integrated data mining

Jimmy Secretan; Michael Georgiopoulos; Anna Koufakou; Kelvin Cardona

While the emerging field of privacy-preserving data mining (PPDM) will enable many new data mining applications, it suffers from several practical difficulties. PPDM algorithms are challenging to develop and computationally intensive to execute. Developers need convenient abstractions to simplify the engineering of PPDM applications. The individual parties involved in the data mining process need a way to bring high-performance, parallel computers to bear on the computationally intensive parts of the PPDM tasks. This paper discusses APHID (Architecture for Private and High-performance Integrated Data mining), a practical architecture and software framework for developing and executing large-scale PPDM applications. At one tier, the system supports simplified use of cluster and grid resources, and at another tier, the system abstracts communication for easy PPDM algorithm development. This paper offers a detailed analysis of the challenges in developing PPDM algorithms with existing frameworks, and motivates the design of a new infrastructure based on these challenges.

International Journal on Artificial Intelligence Tools | 2006

ANSWER: Approximate Name Search with Errors in Large Databases by a Novel Approach Based on Prefix-Dictionary

Olcay Kursun; Anna Koufakou; Abhijit Wakchaure; Michael Georgiopoulos; Kenneth Reynolds; Ronald Eaglin

The obvious need for using modern computer networking capabilities to enable the effective sharing of information has resulted in data-sharing systems, which store and manage large amounts of data. These data need to be effectively searched and analyzed. More specifically, in the presence of dirty data, a search for specific information by a standard query (e.g., a search for a name that is misspelled or mistyped) does not return all needed information, as required in homeland security, criminology, and medical applications, amongst others. Different techniques, such as soundex, phonix, n-grams, and edit-distance, have been used to improve the matching rate in these name-matching applications. These techniques have demonstrated varying levels of success, but there is a pressing need for name-matching approaches that provide high levels of accuracy in matching names, while at the same time maintaining low computational complexity. In this paper, such a technique, called ANSWER, is proposed and its characteristics are discussed. Our results demonstrate that ANSWER possesses high accuracy, as well as high speed, and is superior to other techniques of retrieving fuzzy name matches in large databases.
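For context, edit distance, one of the existing techniques the abstract lists, can be sketched as follows. This is the standard Levenshtein dynamic program followed by a brute-force scan over a name list; it is not the ANSWER algorithm, and the `fuzzy_matches` helper, its `max_distance` threshold, and the sample names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def fuzzy_matches(query, names, max_distance=1):
    """Return the names within max_distance edits of the query, ignoring case."""
    q = query.lower()
    return [n for n in names if edit_distance(q, n.lower()) <= max_distance]

# Hypothetical records containing a misspelled entry.
names = ["Jon Smith", "John Smith", "Joan Smyth", "Jane Doe"]
matches = fuzzy_matches("John Smith", names)
print(matches)
```

Comparing the query against every stored name costs time proportional to the database size times the product of the name lengths, which illustrates the computational burden that motivates faster name-matching approaches.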

Proceedings of SPIE | 2001

Cross-validation in fuzzy ARTMAP neural networks for large sample classification problems

Michael Georgiopoulos; Anna Koufakou; Georgios C. Anagnostopoulos; Takis Kasparis

In this paper we examine the issue of overtraining in Fuzzy ARTMAP. Overtraining in Fuzzy ARTMAP manifests itself in two different ways: (a) it degrades the generalization performance of Fuzzy ARTMAP as training progresses, and (b) it creates unnecessarily large Fuzzy ARTMAP neural network architectures. In this work we demonstrate that overtraining happens in Fuzzy ARTMAP and we propose an old remedy for its cure: cross-validation. In our experiments we compare the performance of Fuzzy ARTMAP that is trained (i) until the completion of training, (ii) for one epoch, and (iii) until its performance on a validation set is maximized. The experiments were performed on artificial and real databases. The conclusion derived from these experiments is that cross-validation is a useful procedure in Fuzzy ARTMAP, because it produces smaller Fuzzy ARTMAP architectures with improved generalization performance. The trade-off is that cross-validation introduces additional computational complexity in the training phase of Fuzzy ARTMAP.

DMIN | 2008