Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jimmy Secretan is active.

Publication


Featured research published by Jimmy Secretan.


international symposium on neural networks | 2007

A Privacy Preserving Probabilistic Neural Network for Horizontally Partitioned Databases

Jimmy Secretan; Michael Georgiopoulos; Jose Castro

In this paper, we present a version of the probabilistic neural network (PNN) that is capable of operating on a distributed database that is horizontally partitioned. It does so in a way that is privacy-preserving: that is, a test point can be evaluated by the algorithm without any party knowing the data owned by the other parties. We present an analysis of this algorithm from the standpoints of security and computational performance. Finally, we provide performance results of an implementation of this privacy-preserving, distributed PNN algorithm.
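
The core computation is easy to sketch: each party evaluates the Gaussian kernel sums of its own training rows against the test point, and only those per-class aggregates are combined, so raw records never leave a party. The Python sketch below illustrates this idea under stated assumptions; the toy additive-masking secure sum, the sigma value, and all function names are illustrative stand-ins, not the protocol described in the paper.

```python
import random
import numpy as np

def local_pnn_sums(X_local, y_local, x_test, sigma, classes):
    """Each party computes, from its own rows only, per-class sums of Gaussian
    kernel responses to the test point, plus its local class counts."""
    sums, counts = {}, {}
    for c in classes:
        Xc = X_local[y_local == c]
        d2 = np.sum((Xc - x_test) ** 2, axis=1)
        sums[c] = float(np.exp(-d2 / (2 * sigma ** 2)).sum())
        counts[c] = int(len(Xc))
    return sums, counts

def secure_sum(values, modulus=2**61 - 1, scale=10**6):
    """Toy additive-masking secure sum: each party splits its (scaled) value
    into random shares modulo a large number, so no single share reveals the
    value and only the reconstructed total is learned.  A stand-in for the
    paper's actual privacy-preserving protocol."""
    n = len(values)
    share_totals = [0] * n
    for v in values:
        parts = [random.randrange(modulus) for _ in range(n - 1)]
        parts.append((int(round(v * scale)) - sum(parts)) % modulus)
        share_totals = [(s + p) % modulus for s, p in zip(share_totals, parts)]
    return (sum(share_totals) % modulus) / scale

def distributed_pnn_classify(parties, x_test, sigma, classes):
    """Combine per-party kernel sums via the secure sum and return the class
    with the largest estimated density."""
    locals_ = [local_pnn_sums(X, y, x_test, sigma, classes) for X, y in parties]
    scores = {}
    for c in classes:
        kernel_total = secure_sum([s[c] for s, _ in locals_])
        count_total = secure_sum([float(n[c]) for _, n in locals_])
        scores[c] = kernel_total / max(count_total, 1.0)
    return max(scores, key=scores.get)
```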


international symposium on neural networks | 2008

Fast parallel outlier detection for categorical datasets using MapReduce

Anna Koufakou; Jimmy Secretan; John Reeder; Kelvin Cardona; Michael Georgiopoulos

Outlier detection has received considerable attention in many applications, such as detecting network attacks or credit card fraud. The massive datasets currently available for mining in some of these outlier detection applications require large parallel systems, and consequently parallelizable outlier detection methods. Most existing outlier detection methods assume that all of the attributes of a dataset are numerical, usually have a quadratic time complexity with respect to the number of points in the dataset, and quite often require multiple dataset scans. In this paper, we propose a fast parallel outlier detection strategy based on the Attribute Value Frequency (AVF) approach, a high-speed, scalable outlier detection method for categorical data that is inherently easy to parallelize. Our proposed solution, MR-AVF, is based on the MapReduce paradigm for parallel programming, which offers load balancing and fault tolerance. MR-AVF is particularly simple to develop and is shown to be highly scalable with respect to the number of cluster nodes.
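
For intuition, the AVF score of a record is simply the average frequency of its attribute values over the dataset, so records with rare value combinations score low. The sketch below computes it in two single-process passes that mirror the two MapReduce stages (count attribute-value frequencies, then score records); in MR-AVF these passes would run as distributed MapReduce jobs, and the example data and function names here are illustrative.

```python
from collections import Counter

def avf_scores(records):
    """AVF in two MapReduce-style passes.
    Pass 1 (map/reduce): count how often each (attribute, value) pair occurs.
    Pass 2 (map): score each record by the mean frequency of its values;
    the lowest-scoring records are the outlier candidates."""
    # Pass 1: frequency of every attribute value.
    freq = Counter()
    for rec in records:                       # map: emit ((attr, value), 1)
        for j, v in enumerate(rec):
            freq[(j, v)] += 1                 # reduce: sum the counts
    # Pass 2: AVF score per record.
    m = len(records[0])
    return [sum(freq[(j, v)] for j, v in enumerate(rec)) / m for rec in records]

data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),              # rare values, so a low AVF score
]
scores = avf_scores(data)
outlier = data[scores.index(min(scores))]
```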


Knowledge and Information Systems | 2011

Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data

Anna Koufakou; Jimmy Secretan; Michael Georgiopoulos

Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over the FI-based outlier detection methods. What is significant is that the NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well suited to the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs, is shown to exhibit accuracy similar to that of the NDI-based methods for various δ values, while providing further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values.
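
As a minimal sketch of the underlying scoring idea, assuming a simplified FPOF-style score: a record is scored by summing the supports of the mined sets it contains, so records that match few common patterns score low. The naive miner below stands in for the NDI/NADI mining the paper actually uses, and the thresholds and example data are illustrative.

```python
from itertools import combinations
from collections import Counter

def mine_frequent_itemsets(transactions, min_support, max_len=2):
    """Naive frequent-itemset miner (up to pairs), used here only to
    illustrate scoring; the paper mines the condensed NDI collection instead."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def outlier_scores(transactions, itemsets):
    """FPOF-style score: sum the supports of the mined sets contained in each
    record; records containing few common patterns score low (outliers)."""
    scores = []
    for t in transactions:
        items = set(t)
        scores.append(sum(sup for s, sup in itemsets.items() if items.issuperset(s)))
    return scores

transactions = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
    {"caviar", "truffles"},                       # unusual basket
]
itemsets = mine_frequent_itemsets(transactions, min_support=0.5)
scores = outlier_scores(transactions, itemsets)   # lowest score = outlier candidate
```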


IEEE Transactions on Education | 2009

A Sustainable Model for Integrating Current Topics in Machine Learning Research Into the Undergraduate Curriculum

Michael Georgiopoulos; Ronald F. DeMara; Avelino J. Gonzalez; Annie S. Wu; Mansooreh Mollaghasemi; Erol Gelenbe; Marcella Kysilka; Jimmy Secretan; Carthik A. Sharma; Ayman J. Alnsour

This paper presents an integrated research and teaching model that has resulted from an NSF-funded effort to introduce results of current machine learning research into the engineering and computer science curriculum at the University of Central Florida (UCF). While in-depth exposure to current topics in machine learning has traditionally occurred at the graduate level, the model developed affords an innovative and feasible approach to expanding the depth of coverage in research topics to undergraduate students. The model has been self-sustaining, as evidenced by its continued operation during the years after the NSF grant's expiration, and is transferable to other institutions due to its use of modular and faculty-specific technical content. This model offers a tightly coupled teaching and research approach to introducing current topics in machine learning research to undergraduates, while also involving them in the research process itself. The approach has provided new mechanisms to increase faculty participation in undergraduate research, has exposed approximately 15 undergraduates annually to research at UCF, and has effectively prepared a number of these students for graduate study through active involvement in the research process and coauthoring of publications.


human factors in computing systems | 2009

Computational creativity support: using algorithms and machine learning to help people be more creative

Dan Morris; Jimmy Secretan

The emergence of computers as a core component of creative processes, coupled with recent advances in machine-learning, signal-processing, and algorithmic techniques for manipulating creative media, offers tremendous potential for building end-user creativity-support tools. However, the scientific community making advances in relevant algorithmic techniques is not, in many cases, the same community that is currently making advances in the design, evaluation, and user-experience aspects of creativity support. The primary objective of this workshop is thus to bring together participants from diverse backgrounds in the HCI, design, art, machine-learning, and algorithms communities to facilitate the advancement of novel creativity support tools.


Future Generation Computer Systems | 2010

APHID: An architecture for private, high-performance integrated data mining

Jimmy Secretan; Michael Georgiopoulos; Anna Koufakou; Kelvin Cardona

While the emerging field of privacy preserving data mining (PPDM) will enable many new data mining applications, it suffers from several practical difficulties. PPDM algorithms are challenging to develop and computationally intensive to execute. Developers need convenient abstractions to simplify the engineering of PPDM applications. The individual parties involved in the data mining process need a way to bring high-performance, parallel computers to bear on the computationally intensive parts of the PPDM tasks. This paper discusses APHID (Architecture for Private and High-performance Integrated Data mining), a practical architecture and software framework for developing and executing large-scale PPDM applications. At one tier, the system supports simplified use of cluster and grid resources, and at another tier, the system abstracts communication for easy PPDM algorithm development. This paper offers a detailed analysis of the challenges in developing PPDM algorithms with existing frameworks, and motivates the design of a new infrastructure based on these challenges.
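
As a purely hypothetical illustration of the kind of abstraction such a framework could expose (not APHID's actual API), the sketch below shows a PPDM building block written once against a small channel interface, leaving it to the framework to decide whether an aggregate is computed locally on cluster or grid resources or across parties via a privacy-preserving protocol. All class and function names are invented for this example.

```python
from abc import ABC, abstractmethod

class PartyChannel(ABC):
    """Hypothetical developer-facing abstraction: the algorithm author asks
    for an aggregate, and the framework decides how it is produced, either
    locally on cluster nodes or across parties with a privacy-preserving
    protocol."""

    @abstractmethod
    def secure_aggregate(self, local_value: float) -> float:
        """Return the sum of local_value across all parties without exposing
        any single party's contribution."""

class LocalClusterChannel(PartyChannel):
    """Single-party case: the 'aggregate' is just the locally computed value,
    which the framework may itself have produced on cluster or grid nodes."""
    def secure_aggregate(self, local_value: float) -> float:
        return local_value

def ppdm_mean(channel: PartyChannel, local_sum: float, local_count: int) -> float:
    """A PPDM building block written once against the channel abstraction:
    a global mean computed from per-party sums and counts."""
    total = channel.secure_aggregate(local_sum)
    count = channel.secure_aggregate(float(local_count))
    return total / count
```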


Neural Networks | 2007

Pipelining of Fuzzy ARTMAP without matchtracking: Correctness, performance bound, and Beowulf evaluation

José Castro; Jimmy Secretan; Michael Georgiopoulos; Ronald F. DeMara; Georgios C. Anagnostopoulos; Avelino J. Gonzalez

Fuzzy ARTMAP neural networks have been proven to be good classifiers on a variety of classification problems. However, the time that Fuzzy ARTMAP takes to converge to a solution increases rapidly as the number of patterns used for training is increased. In this paper, we examine the time Fuzzy ARTMAP takes to converge to a solution and propose a coarse-grain parallelization technique, based on a pipeline approach, to speed up the training process. In particular, we have parallelized Fuzzy ARTMAP without the match-tracking mechanism. We provide a series of theorems and associated proofs that characterize the parallel implementation of Fuzzy ARTMAP without match tracking. Results obtained on a Beowulf cluster with three large databases show linear speedup as a function of the number of processors used in the pipeline. The databases used for our experiments are the Forest CoverType database from the UCI Machine Learning Repository and two artificial databases, where the generated data were 16-dimensional, Gaussian-distributed data belonging to two distinct classes with different amounts of overlap (5% and 15%).
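
To make the setting concrete, the sketch below shows a simplified serial Fuzzy ARTMAP training loop with match tracking removed, under one common reading of that variant: a winning category that passes the vigilance test but predicts the wrong label causes a new category to be created instead of raising vigilance and searching again. The rho, alpha, and beta values are illustrative, and the pipelined parallel organization that the paper analyzes is not shown.

```python
import numpy as np

def complement_code(a):
    """Fuzzy ART inputs are complement coded: I = (a, 1 - a)."""
    return np.concatenate([a, 1.0 - a])

def train_fam_no_matchtracking(X, labels, rho=0.7, alpha=0.001, beta=1.0):
    """Simplified Fuzzy ARTMAP training loop with match tracking removed.
    rho, alpha, and beta are illustrative parameter values."""
    templates, template_labels = [], []
    for a, label in zip(X, labels):
        I = complement_code(np.asarray(a, dtype=float))
        committed = False
        # Visit categories in order of the choice function
        # T_j = |min(I, w_j)| / (alpha + |w_j|).
        order = sorted(range(len(templates)),
                       key=lambda j: -np.minimum(I, templates[j]).sum()
                                     / (alpha + templates[j].sum()))
        for j in order:
            match = np.minimum(I, templates[j]).sum() / I.sum()
            if match < rho:
                continue                      # fails the vigilance test, keep searching
            if template_labels[j] == label:   # correct prediction: learn on the template
                templates[j] = beta * np.minimum(I, templates[j]) + (1 - beta) * templates[j]
                committed = True
            # Otherwise the prediction was wrong; without match tracking we stop
            # searching and fall through to recruiting a new category.
            break
        if not committed:
            templates.append(I.copy())        # recruit a new category
            template_labels.append(label)
    return templates, template_labels
```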


The Journal of Supercomputing | 2009

Efficient allocation and composition of distributed storage

Jimmy Secretan; Malachi Lawson; Ladislau Bölöni

In this paper, we investigate the composition of cheap network storage resources to meet specific availability and capacity requirements. We show that the problem of finding the optimal composition for availability and price requirements can be reduced to the knapsack problem, and propose three techniques for efficiently finding approximate solutions. The first algorithm uses a dynamic programming approach to find mirrored storage resources for high availability requirements, and runs in pseudo-polynomial O(n²c) time, where n is the number of sellers’ resources to choose from and c is a capacity function of the requested and minimum availability. The second technique is a heuristic that finds resources to be agglomerated into a larger coherent resource, with a complexity of O(n log n). The third technique finds a compromise between capacity and availability (which in our phrasing is a complex integer programming problem) using a genetic algorithm. The algorithms can be implemented on a broker that intermediates between buyers and sellers of storage resources. Finally, we show that a broker in an open storage market, using the combination of the three algorithms, can more frequently meet user requests and lower the cost of the requests that are met, compared to a broker that simply matches single resources to requests.
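
The mirrored-composition reduction can be illustrated briefly: mirroring resources with individual unavailabilities u_i gives a combined unavailability equal to the product of the u_i, so taking negative logarithms turns the availability requirement into an additive capacity and a 0/1 knapsack dynamic program applies. The sketch below is an illustrative formulation of that reduction, not the paper's exact algorithm; the discretization granularity and the example prices are assumptions.

```python
import math

def cheapest_mirror_price(resources, target_availability, granularity=100):
    """Knapsack-style DP sketch for the mirrored-composition case.
    resources: (price, availability) pairs with 0 < availability < 1.
    Mirroring gives combined unavailability prod(u_i), so -log(u_i) acts as
    an additive 'weight' and the requirement becomes a minimum total weight."""
    # Required total weight: -log of the allowed combined unavailability.
    need = math.ceil(-math.log(1.0 - target_availability) * granularity)
    # Per-resource weight, rounded down so the answer stays conservative.
    items = [(price, int(-math.log(1.0 - avail) * granularity))
             for price, avail in resources]
    INF = float("inf")
    best = [INF] * (need + 1)   # best[w] = min price reaching weight >= w
    best[0] = 0.0
    for price, weight in items:
        for w in range(need, 0, -1):
            prev = best[max(0, w - weight)]
            if prev + price < best[w]:
                best[w] = prev + price
    return None if best[need] == INF else best[need]

# e.g. two cheap 90%-available mirrors (~99% combined) beat one pricey 99.9% node
print(cheapest_mirror_price([(5.0, 0.9), (5.0, 0.9), (30.0, 0.999)], 0.98))
```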


international joint conference on neural networks | 2006

Methods for Parallelizing the Probabilistic Neural Network on a Beowulf Cluster Computer

Jimmy Secretan; Michael Georgiopoulos; Ian Maidhof; Philip Shibly; Joshua Hecker

In this paper, we present three different methods for implementing the probabilistic neural network on a Beowulf cluster computer. The three methods, parallel full training set (PFT-PNN), parallel split training set (PST-PNN), and pipelined PNN (PPNN), all present different performance tradeoffs for different applications. We present implementations for all three architectures that are fully equivalent to the serial version and analyze the tradeoffs governing their potential use in actual engineering applications. Finally, we provide performance results for all three methods on a Beowulf cluster.
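
As a rough illustration of the split-training-set scheme, each node can score the test point against only its own shard and return unnormalized per-class kernel sums and counts, which are then reduced and normalized. The plain Python reduction below stands in for the cluster's message-passing step; sigma, the sharding, and the function names are assumptions of this sketch.

```python
import numpy as np

def partial_kernel_sums(X_shard, y_shard, x_test, sigma, classes):
    """What each node computes from its shard: unnormalized per-class Gaussian
    kernel sums and per-class pattern counts for the test point."""
    sums, counts = {}, {}
    for c in classes:
        Xc = X_shard[y_shard == c]
        d2 = np.sum((Xc - x_test) ** 2, axis=1)
        sums[c] = float(np.exp(-d2 / (2 * sigma ** 2)).sum())
        counts[c] = int(len(Xc))
    return sums, counts

def pst_pnn_classify(shards, x_test, sigma, classes):
    """Split-training-set style evaluation: each node scores the test point
    against its own shard, then the partial sums are reduced and normalized
    before picking the class with the largest density estimate."""
    partials = [partial_kernel_sums(X, y, x_test, sigma, classes) for X, y in shards]
    densities = {
        c: sum(p[c] for p, _ in partials) / max(sum(n[c] for _, n in partials), 1)
        for c in classes
    }
    return max(densities, key=densities.get)
```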


international symposium on neural networks | 2005

Parallelizing the fuzzy ARTMAP algorithm on a Beowulf cluster

Jimmy Secretan; José Castro; Michael Georgiopoulos; J. Tapia; Amit Chadha; B. Huber; Georgios C. Anagnostopoulos; S.M. Richie

Fuzzy ARTMAP neural networks have been proven to be good classifiers on a variety of classification problems. However, the time that it takes fuzzy ARTMAP to converge to a solution increases rapidly as the number of patterns used for training increases. In this paper, we propose a coarse-grain parallelization technique, based on a pipeline approach, to speed up fuzzy ARTMAP's training process. In particular, we first parallelized fuzzy ARTMAP without the match-tracking mechanism, and then we parallelized fuzzy ARTMAP with the match-tracking mechanism. Results obtained on a Beowulf cluster with a well-known large database (the Forest CoverType database from the UCI repository) show linear speedup with respect to the number of processors used in the pipeline.

Collaboration


Dive into Jimmy Secretan's collaborations.

Top Co-Authors

Michael Georgiopoulos, University of Central Florida
Anna Koufakou, University of Central Florida
Avelino J. Gonzalez, University of Central Florida
Ronald F. DeMara, University of Central Florida
José Castro, University of Central Florida
Adam Campbell, University of Central Florida
Kenneth O. Stanley, University of Central Florida
Nicholas Beato, University of Central Florida
Annie S. Wu, University of Central Florida