Imola K. Fodor
Lawrence Livermore National Laboratory
Publication
Featured research published by Imola K. Fodor.
Bioinformatics | 2005
Imola K. Fodor; David O. Nelson; Michelle Alegria-Hartman; Kristin Robbins; Richard G. Langlois; Kenneth W. Turteltaub; Todd H. Corzett; Sandra L. McCutchen-Maloney
MOTIVATION: The DeCyder software (GE Healthcare) is the current state-of-the-art commercial product for the analysis of two-dimensional difference gel electrophoresis (2D DIGE) experiments. Analyses complementing DeCyder are suggested by incorporating recent advances from the microarray data analysis literature. A case study on the effect of smallpox vaccination is used to compare the results obtained from DeCyder with the results obtained by applying moderated t-tests adjusted for multiple comparisons to DeCyder output data that was additionally normalized. RESULTS: Applying the more stringent statistical tests to the normalized 2D DIGE data decreased the number of potentially differentially expressed proteins relative to the number obtained from DeCyder and increased the confidence in detecting differential expression in human clinical studies.
Computational Statistics & Data Analysis | 2002
Imola K. Fodor; Chandrika Kamath
As data mining gains acceptance in the analysis of massive data sets, it is becoming clear that there is a need for algorithms that can handle not only the massive size, but also the high dimensionality of the data. Certain pattern recognition algorithms can become computationally intractable when the number of features reaches hundreds or even thousands, while others can break down if there are large correlations among the features. A common solution to these problems is to reduce the dimension, either in conjunction with the pattern recognition algorithm or independent of it. We describe how dimension reduction techniques can be applied in the context of a specific data mining application, namely, the classification of radio-galaxies with a bent double morphology. We discuss certain statistical and exploratory data analysis methods to reduce the number of features, and the subsequent improvements in the performance of decision tree and generalized linear model classifiers. We show that a careful extraction and selection of features is necessary for the successful application of data mining techniques.
Computing in Science and Engineering | 2002
Chandrika Kamath; Erick Cantú-Paz; Imola K. Fodor; Nu Ai Tang
Astronomy data sets have led to interesting problems in mining scientific data. These problems will likely become more challenging as the astronomy community brings several surveys online as part of the National Virtual Observatory, giving rise to the possibility of mining data across many different surveys. In this article, we discuss the work we performed while using the catalog from the FIRST (Faint Images of the Radio Sky at Twenty centimetres) survey to classify galaxies with a bent-double morphology, meaning those galaxies that appear to be bent in shape. We describe the approach we took to mine this data, the issues we addressed in working with a real data set, and the lessons we learned in the process.
Archive | 2001
Chandrika Kamath; Erick Cantú-Paz; Imola K. Fodor; Nu Ai Tang
Data mining techniques are increasingly gaining popularity in various scientific domains as viable approaches to the analysis of massive data sets. In this chapter, we describe our experiences in applying data mining to a problem in astronomy, namely, the identification of radio-emitting galaxies with a bent-double morphology. Until recently, astronomers associated with the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through a visual inspection of images. While this manual approach has been very subjective and tedious, it is also becoming increasingly infeasible as the survey has grown in size. Upon completion, FIRST will include almost a million galaxies, making the use of semi-automated analysis methods necessary. We describe the FIRST data set and the problem of identifying bent-double galaxies. We discuss our solution approach, focusing on the challenges we face in the application of data mining to a scientific data set. We explain why, in contrast with most commercial data mining applications, data preprocessing requires a considerable effort in scientific applications. Using decision tree classifiers, we describe the work we are doing in the detection of bent-double galaxies. Our results indicate that data mining techniques, steered by proper domain knowledge, can greatly enhance the manual exploration of massive data sets.
Parallel and distributed methods for image processing. Conference | 2000
Chandrika Kamath; Chuck Baldwin; Imola K. Fodor; Nu Ai Tang
Advances in technology have enabled us to collect data from observations, experiments, and simulations at an ever increasing pace. As these data sets approach the terabyte and petabyte range, scientists are increasingly using semi-automated techniques from data mining and pattern recognition to find useful information in the data. In order for data mining to be successful, the raw data must first be processed into a form suitable for the detection of patterns. When the data is in the form of images, this can involve a substantial amount of processing on very large data sets. To help make this task more efficient, we are designing and implementing an object-oriented image processing toolkit that specifically targets massively-parallel, distributed-memory architectures. We first show that it is possible to use object-oriented technology to effectively address the diverse needs of image applications. Next, we describe how we abstract out the similarities in image processing algorithms to enable re-use in our software. We will also discuss the difficulties encountered in parallelizing image algorithms on the massively parallel machines as well as the bottlenecks to high performance. We will demonstrate our work using images from an astronomical data set, and illustrate how techniques such as filters and denoising through the thresholding of wavelet coefficients can be applied when a large image is distributed across several processors.
Independent Component Analyses, Wavelets, and Neural Networks | 2003
Imola K. Fodor; Chandrika Kamath
Observed and simulated global temperature series include the effects of many different sources, such as volcanic eruptions and El Niño Southern Oscillation (ENSO) variations. In order to compare the results of different models to each other, and to the observed data, it is necessary to first remove contributions from sources that are not commonly shared across the models considered. Such a separation of sources is also desired in order to assess the effect of human contributions on the global climate. Atmospheric scientists currently use parametric models and iterative techniques to remove the effects of volcanic eruptions and ENSO variations from global temperature trends. Drawbacks of the parametric approach include the non-robustness of the results to the estimated values of the parameters, and the possible lack of fit of the data to the model. In this paper, we investigate independent component analysis (ICA) as an alternative method for separating independent sources in global temperature series. Instead of fitting parametric models, we let the data guide the estimation, and automatically separate the effects of the underlying sources. We first assess ICA on simple artificial datasets to establish the conditions under which ICA is feasible in our context, then we study its results on climate data from the National Centers for Environmental Prediction.
conference on image and video communications and processing | 2003
Imola K. Fodor; Chandrika Kamath
Detecting and tracking objects in spatio-temporal datasets is an active research area with applications in many domains. A common approach is to segment the 2D frames in order to separate the objects of interest from the background, then estimate the motion of the objects and track them over time. Most existing algorithms assume that the objects to be tracked are rigid. In many scientific simulations, however, the objects of interest evolve over time and thus pose additional challenges for the segmentation and tracking tasks. We investigate efficient segmentation methods in the context of scientific simulation data. Instead of segmenting each frame separately, we propose an incremental approach which incorporates the segmentation result from the previous time frame when segmenting the data at the current time frame. We start with the simple K-means method, then we study more complicated segmentation techniques based on Markov random fields. We compare the incremental methods to the corresponding sequential ones in terms of both the quality of the results and the computational complexity.
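The incremental idea in this abstract can be illustrated with a minimal sketch. It assumes scalar pixel intensities and plain K-means (Lloyd's algorithm). The function names kmeans_1d and segment_sequence and the toy frames are hypothetical and are not taken from the paper. Each frame's clustering starts from the centroids found for the previous frame rather than from a fresh initialization.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], centroids: Sequence[float],
              max_iter: int = 100) -> Tuple[List[float], List[int], int]:
    """Lloyd's algorithm on scalar intensities (illustrative helper).

    Returns (final centroids, label per value, iterations used).
    """
    current = list(centroids)
    labels: List[int] = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(len(current)), key=lambda k: abs(v - current[k]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its old centroid.
        updated = []
        for k in range(len(current)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else current[k])
        if updated == current:  # no centroid moved: converged
            return current, labels, iteration
        current = updated
    return current, labels, max_iter


def segment_sequence(frames: Sequence[Sequence[float]],
                     initial_centroids: Sequence[float]
                     ) -> List[Tuple[List[float], List[int]]]:
    """Segment frames in time order, warm-starting each from the last result."""
    results = []
    centroids = list(initial_centroids)
    for frame in frames:
        # Assumption: the previous frame's centroids are a good starting
        # point because the objects change only slightly between frames.
        centroids, labels, _ = kmeans_1d(frame, centroids)
        results.append((centroids, labels))
    return results


if __name__ == "__main__":
    # Two toy "frames" of flattened pixel intensities (made-up values).
    frames = [[0.0, 0.1, 0.2, 0.9, 1.0, 1.1],
              [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]]
    for centroids, labels in segment_sequence(frames, [0.0, 1.0]):
        print(centroids, labels)
```

The sketch covers only the K-means case. The Markov random field methods mentioned in the abstract would need a different model and are not shown.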
Archive | 2002
Imola K. Fodor; Chandrika Kamath
Scientists are collecting data from observations and simulations at an ever increasing pace. In order to extract useful information from these massive datasets, they are turning to data mining techniques as an attractive solution approach. Data mining is an iterative and interactive process that consists of data pre-processing and pattern recognition. Pre-processing the raw data in order to transform it into a form suitable for pattern recognition is an important and time-consuming first step. In this paper, we discuss the crucial role multiresolution techniques can play in the pre-processing of massive datasets. Using both simulated and real images, we describe our work in de-noising image data using wavelet-based multiresolution techniques. Our initial experiences show that a judicious choice of wavelet transforms, threshold selection methods, and threshold application schemes can effectively reduce the noise in the data without a significant loss of the signal.
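A minimal sketch of wavelet-threshold de-noising follows. It fixes one choice for each of the three ingredients named in the abstract: a single-level Haar transform, a hand-picked threshold value, and soft thresholding of the detail coefficients. These choices and all function names are assumptions made for illustration; the abstract does not say which combinations the authors used, and the sketch works on a 1D signal rather than an image.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]  # smooth part
    detail = [(a - b) / SQRT2 for a, b in pairs]  # local differences
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert haar_forward."""
    signal: List[float] = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def soft_threshold(coeffs: Sequence[float], threshold: float) -> List[float]:
    """Shrink each coefficient toward zero by `threshold`, zeroing small ones."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal: Sequence[float], threshold: float) -> List[float]:
    """Transform, threshold the detail coefficients only, transform back."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    noisy = [4.1, 3.9, 4.1, 3.9]  # made-up flat signal with small noise
    print(denoise(noisy, threshold=0.2))
```

Small detail coefficients, which are assumed here to be mostly noise, are set to zero, while a large coefficient such as the one produced by a sharp edge is only shrunk slightly, so the edge survives.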
Proceedings of SPIE, the International Society for Optical Engineering | 2001
Chandrika Kamath; Erick Cantú-Paz; Imola K. Fodor; Nu Ai Tang
In this paper, we describe the use of data mining techniques to search for radio-emitting galaxies with a bent-double morphology. In the past, astronomers from the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through visual inspection. This was not only subjective but also tedious as the on-going survey now covers 8000 square degrees, with each square degree containing about 90 galaxies. In this paper, we describe how data mining can be used to automate the identification of these galaxies. We discuss the challenges faced in defining meaningful features that represent the shape of a galaxy and our experiences with ensembles of decision trees for the classification of bent-double galaxies.
Archive | 2002
Imola K. Fodor