Lori A. Dalton
Ohio State University
Publications
Featured research published by Lori A. Dalton.
IEEE Transactions on Signal Processing | 2011
Lori A. Dalton; Edward R. Dougherty
With the advent of high-throughput genomic and proteomic technologies, in conjunction with the difficulty in obtaining even moderately sized samples, small-sample classifier design has become a major issue in the biological and medical communities. With small samples, training-data error estimation becomes mandatory, yet none of the popular error estimation techniques have been rigorously designed based on statistical inference and optimization. In this investigation, we place classifier error estimation into the framework of optimal mean-square error (MSE) signal estimation in the presence of uncertainty, which results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions, with a prior distribution on the parameters governing the choice of feature-label distribution. These Bayesian error estimators are optimal when averaged over a given family of distributions, unbiased when averaged over a given family and all samples, and analytically address a trade-off between robustness (modeling assumptions) and accuracy (minimum mean-square error). In this paper, Part I of a two-part study, we define the minimum mean-square error (MMSE) error estimator, discuss its basic properties, provide closed-form analytic representations of the estimator for discrete classifiers with both non-informative and informative prior distributions, and examine the performance and robustness of the MMSE error estimator via simulations. In Part II, in this same issue of IEEE Transactions on Signal Processing, we address these issues for linear classification in the Gaussian model, deriving closed-form representations for both known and unknown covariance matrices. For both the discrete and Gaussian cases, the MMSE error estimator performs especially well for distributions having moderate true errors.
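To make the construction concrete, the following display states the estimator's defining property. The notation (parameter θ indexing the feature-label distribution, sample S_n, true error ε_n) is our own shorthand for illustration, not quoted from the paper.

```latex
% Defining property (sketch): the Bayesian MMSE error estimate is the
% posterior expectation of the true error. Here \theta indexes the
% feature-label distribution, S_n is the observed sample,
% \varepsilon_n(\theta) is the true error of the designed classifier, and
% \pi(\theta \mid S_n) is the posterior under the assumed prior.
\hat{\varepsilon}_n
  = \mathbb{E}\left[\varepsilon_n(\theta) \mid S_n\right]
  = \int \varepsilon_n(\theta)\, \pi(\theta \mid S_n)\, d\theta .
% This is the unique minimizer of \mathbb{E}[(\varepsilon_n - e)^2 \mid S_n]
% over constants e, which is what "minimum mean-square error" refers to.
```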
Pattern Recognition | 2013
Lori A. Dalton; Edward R. Dougherty
In recent years, biomedicine has faced a flood of difficult small-sample phenotype discrimination problems. A host of classification rules have been proposed to discriminate types of pathology, stages of disease and other diagnoses. Typically, these classification rules are heuristic algorithms, with very little understood about their performance. To give a concrete mathematical structure to the problem, recent work has utilized a Bayesian modeling framework based on an uncertainty class of feature-label distributions to both optimize and analyze error estimator performance. The current study uses the same Bayesian framework to also optimize classifier design. This completes a Bayesian theory of classification, where both the classifier error and the estimate of the error may be optimized and studied probabilistically within the model framework. This paper, the first of a two-part study, derives optimal classifiers in discrete and Gaussian models, demonstrates their superior performance over popular classifiers within the assumed model, and applies the method to real genomic data. The second part of the study discusses properties of these optimal Bayesian classifiers.
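The design criterion can be summarized in one line, in our own notation (a sketch of the idea, not the paper's full development):

```latex
% Optimal Bayesian classifier: among classifiers \psi in a given family,
% minimize the expected true error under the posterior \pi^* over the
% uncertainty class of feature-label distributions:
\psi_{\mathrm{OBC}}
  = \arg\min_{\psi}\; \mathbb{E}_{\pi^*}\!\left[\varepsilon(\psi,\theta)\right].
% Pointwise, this reduces to assigning x the class y that maximizes the
% expected class probability times the "effective density"
% \mathbb{E}_{\pi^*}\!\left[f_\theta(x \mid y)\right].
```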
IEEE Transactions on Signal Processing | 2011
Lori A. Dalton; Edward R. Dougherty
In this paper, Part II of a two-part study, we derive a closed-form analytic representation of the Bayesian minimum mean-square error (MMSE) error estimator for linear classification assuming Gaussian models. This is presented in a general framework permitting structure on the covariance matrices and a very flexible class of prior parameter distributions with four free parameters. Closed-form solutions are provided for known, scaled identity, and arbitrary covariance matrices. We examine performance in small-sample settings via simulations on both synthetic and real genomic data, and demonstrate the robustness of these error estimators to false Gaussian modeling assumptions by applying them to Johnson distributions.
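The closed forms themselves are lengthy; as a hedged illustration of the quantity they compute, here is a Monte Carlo sketch for a fixed linear classifier under a simplified Gaussian model with known spherical covariance and a normal prior on each class mean. All names and hyperparameters below are ours, chosen for the toy example; the paper's estimators evaluate the same posterior expectation exactly, without sampling.

```python
# Sketch (our simplified model): approximate the Bayesian MMSE error estimate
# for a fixed linear classifier sign(a.x + b) by averaging the exact Gaussian
# error over posterior draws of the class means, with KNOWN covariance.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def posterior_mean_draws(x, mu0, nu, sigma2, n_draws):
    """Posterior draws of a class mean under a N(mu0, (sigma2/nu) I) prior
    and N(mu, sigma2 I) likelihood (standard normal-normal conjugacy)."""
    n, d = x.shape
    xbar = x.mean(axis=0)
    post_mean = (nu * mu0 + n * xbar) / (nu + n)
    post_std = np.sqrt(sigma2 / (nu + n))
    return post_mean + post_std * rng.standard_normal((n_draws, d))

def true_error_linear(a, b, mu0, mu1, sigma2, c0=0.5):
    """Exact error of sign(a.x + b) on two spherical Gaussian classes."""
    s = np.sqrt(sigma2) * np.linalg.norm(a)
    e0 = norm.cdf((a @ mu0 + b) / s)      # class-0 points labeled class 1
    e1 = norm.cdf(-(a @ mu1 + b) / s)     # class-1 points labeled class 0
    return c0 * e0 + (1 - c0) * e1

# toy data and a fixed LDA-style linear classifier
x0 = rng.normal(0.0, 1.0, (10, 2)); x1 = rng.normal(1.0, 1.0, (10, 2))
a = x1.mean(0) - x0.mean(0); b = -a @ (x0.mean(0) + x1.mean(0)) / 2

draws0 = posterior_mean_draws(x0, np.zeros(2), nu=1.0, sigma2=1.0, n_draws=5000)
draws1 = posterior_mean_draws(x1, np.ones(2),  nu=1.0, sigma2=1.0, n_draws=5000)
errs = [true_error_linear(a, b, m0, m1, 1.0) for m0, m1 in zip(draws0, draws1)]
print("Bayesian MMSE error estimate ~", np.mean(errs))  # posterior-mean error
```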
Current Genomics | 2009
Lori A. Dalton; Virginia L. Ballarin; Marcel Brun
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique for analyzing microarrays. To illustrate its application to genomics: clustering applied to the genes of a microarray dataset groups together those genes whose expression levels exhibit similar behavior across the samples, and applied to the samples it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it remains highly problematic. The choice of a clustering algorithm and validation index is not trivial, all the more so when applying them to high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus the computational power available. In some cases a very simple algorithm may be appropriate, but many situations call for a more complex and powerful algorithm better suited to the job at hand. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
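The review is algorithm-agnostic, but the cluster-then-validate workflow it describes is easy to make concrete. The sketch below is one minimal instance; the choice of k-means and the silhouette index, via scikit-learn, is ours.

```python
# Minimal sketch of the cluster-then-validate workflow on synthetic
# expression-like data: cluster genes with k-means at several cluster counts,
# then pick the count with the best silhouette validation score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# rows = genes, columns = samples; three artificial expression patterns
genes = np.vstack([
    rng.normal(loc=m, scale=0.5, size=(50, 8)) for m in (-2.0, 0.0, 2.0)
])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(genes)
    scores[k] = silhouette_score(genes, labels)
best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, "-> chosen k =", best_k)
```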
IEEE Transactions on Wireless Communications | 2005
Lori A. Dalton; Costas N. Georghiades
We present a complex, full-rate quasi-orthogonal space-time block code for four transmit antennas. Using carefully tailored constellation phase rotations, we show that this code achieves full diversity for specialized PSK-based constellations. The optimal receiver for the new code decouples the symbol detection problem into pairs of symbols, thus greatly reducing complexity. Finally, we present and compare performance of the new code with several other codes in the literature. The new code is shown to perform as well as the best known code of its class.
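The abstract does not reproduce the code matrix, so as a point of reference the sketch below builds the widely known quasi-orthogonal (Jafarkhani-type) 4x4 structure with a phase rotation applied to half the symbols. It illustrates the general construction, not necessarily the specific code proposed in the paper.

```python
# Illustration of a quasi-orthogonal space-time block code for four transmit
# antennas with constellation rotation (Jafarkhani-type structure; shown to
# make the construction concrete, not claimed to be this paper's code).
import numpy as np

def qostbc_block(x1, x2, x3, x4):
    """4x4 codeword: rows are time slots, columns are transmit antennas."""
    c = np.conj
    return np.array([
        [ x1,     x2,     x3,     x4   ],
        [-c(x2),  c(x1), -c(x4),  c(x3)],
        [-c(x3), -c(x4),  c(x1),  c(x2)],
        [ x4,    -x3,    -x2,     x1   ],
    ])

# QPSK symbols; the third and fourth are drawn from a phase-rotated copy of
# the constellation, the standard trick for restoring full diversity.
qpsk = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))
theta = np.pi / 4                      # illustrative rotation angle (assumed)
s = [qpsk[0], qpsk[1], np.exp(1j * theta) * qpsk[2], np.exp(1j * theta) * qpsk[3]]
C = qostbc_block(*s)
print(np.round(C.conj().T @ C, 3))     # quasi-orthogonal Gram matrix
```

The structure of C^H C is what makes the code quasi-orthogonal: nonzero cross terms couple only pairs of symbols, which is why maximum-likelihood detection decouples pairwise, as the abstract notes.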
Pattern Recognition | 2013
Lori A. Dalton; Edward R. Dougherty
In part I of this two-part study, we introduced a new optimal Bayesian classification methodology that utilizes the same modeling framework proposed in Bayesian minimum-mean-square error (MMSE) error estimation. Optimal Bayesian classification thus completes a Bayesian theory of classification, where both the classifier error and our estimate of the error may be simultaneously optimized and studied probabilistically within the assumed model. Having developed optimal Bayesian classifiers in discrete and Gaussian models in part I, here we explore properties of optimal Bayesian classifiers, in particular, invariance to invertible transformations, convergence to the Bayes classifier, and a connection to Bayesian robust classifiers. We also explicitly derive optimal Bayesian classifiers with non-informative priors, and explore relationships to linear and quadratic discriminant analysis (LDA and QDA), which may be viewed as plug-in rules under Gaussian modeling assumptions. Finally, we present several simulations addressing the robustness of optimal Bayesian classifiers to false modeling assumptions. Companion website: http://gsp.tamu.edu/Publications/supplementary/dalton12a.
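For reference, the plug-in view mentioned above rests on the standard Gaussian discriminant (our notation):

```latex
% The Bayes classifier for Gaussian class-conditional densities assigns x
% to the class y maximizing the discriminant
g_y(x) = \log c_y
  - \tfrac{1}{2}\log\lvert\Sigma_y\rvert
  - \tfrac{1}{2}(x - \mu_y)^{\mathsf{T}} \Sigma_y^{-1} (x - \mu_y).
% QDA plugs sample estimates (\hat{\mu}_y, \hat{\Sigma}_y) into g_y; LDA
% additionally pools \hat{\Sigma}_0 = \hat{\Sigma}_1, which cancels the
% quadratic term and makes the rule linear in x. The optimal Bayesian
% classifier instead averages the class-conditional densities over the
% posterior before discriminating.
```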
IEEE Transactions on Signal Processing | 2012
Lori A. Dalton; Edward R. Dougherty
In recent years, biomedicine has been faced with difficult high-throughput small-sample classification problems. In such settings, classifier error estimation becomes a critical issue because training and testing must be done on the same data. A recently proposed error estimator places the problem in a signal estimation framework in the presence of uncertainty, permitting a rigorous solution optimal in a minimum-mean-square error sense. The uncertainty in this model is relative to the parameters of the feature-label distributions, resulting in a Bayesian approach to error estimation. Closed-form solutions are available for two important problems: discrete classification with Dirichlet priors and linear classification of Gaussian distributions with normal-inverse-Wishart priors. In this work, Part I of a two-part study, we introduce the theoretical mean-square error (MSE), conditioned on the observed sample, of any estimate of the classifier error, including the Bayesian error estimator, for both Bayesian models. Thus, Bayesian error estimation has a unique advantage in that its mathematical framework naturally gives rise to a practical expected measure of performance given an observed sample. In Part II of the study we examine consistency of the error estimator, demonstrate various MSE properties, and apply the conditional MSE to censored sampling.
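The key quantity can be stated compactly (our notation; a sketch of the definition via the standard bias-variance split):

```latex
% Conditional MSE of an arbitrary error estimate \hat{\varepsilon} given
% the sample S_n, where \varepsilon_n is the true error:
\mathrm{MSE}(\hat{\varepsilon} \mid S_n)
  = \mathbb{E}\!\left[(\varepsilon_n - \hat{\varepsilon})^2 \mid S_n\right]
  = \mathrm{Var}(\varepsilon_n \mid S_n)
  + \left(\mathbb{E}[\varepsilon_n \mid S_n] - \hat{\varepsilon}\right)^2 .
% For the Bayesian MMSE estimator, \hat{\varepsilon} = E[\varepsilon_n | S_n],
% the squared-bias term vanishes, so its conditional MSE is exactly the
% posterior variance of the true error.
```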
Intelligent Systems in Molecular Biology | 2011
Lori A. Dalton; Edward R. Dougherty
Motivation: With the development of high-throughput genomic and proteomic technologies, coupled with the inherent difficulty of obtaining large samples, biomedicine faces difficult small-sample classification issues, in particular error estimation. Most popular error estimation methods are motivated by intuition rather than mathematical inference. A recently proposed error estimator based on Bayesian minimum mean-square error estimation places error estimation in an optimal filtering framework. In this work, we examine the application of this error estimator to gene expression microarray data, including the suitability of the Gaussian model with normal-inverse-Wishart priors and how to find prior probabilities.
Results: We provide an implementation for non-linear classification, where closed-form solutions are not available. We propose a methodology for calibrating normal-inverse-Wishart priors based on discarded microarray data and examine performance on synthetic high-dimensional data and a real dataset from a breast cancer study. The calibrated Bayesian error estimator has superior root-mean-square performance, especially with moderate to high expected true errors and small feature sizes.
Availability: We have implemented the Bayesian error estimator in C for Gaussian distributions and normal-inverse-Wishart priors, covering both linear classifiers, with exact closed-form representations, and arbitrary classifiers, where we use a Monte Carlo approximation. Our code for the Bayesian error estimator and a toolbox of related utilities are available at http://gsp.tamu.edu/Publications/supplementary/dalton11a. Several supporting simulations are also included.
Contact: [email protected]
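As a hedged sketch of the Monte Carlo approach mentioned under Availability (not the released C implementation), the following draws class parameters from a normal-inverse-Wishart posterior and averages the classifier's error over the draws. Hyperparameter names and defaults are ours.

```python
# Monte Carlo Bayesian error estimate for an arbitrary classifier: draw
# feature-label distributions from the normal-inverse-Wishart (NIW)
# posterior, measure the classifier's error on each draw, and average.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

def niw_posterior_draw(x, mu0, kappa0, nu0, psi0):
    """One draw (mu, Sigma) from the NIW posterior given class data x,
    using the standard conjugate update formulas."""
    n, d = x.shape
    xbar = x.mean(axis=0)
    kappa_n, nu_n = kappa0 + n, nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    s = (x - xbar).T @ (x - xbar)
    psi_n = psi0 + s + (kappa0 * n / kappa_n) * np.outer(xbar - mu0, xbar - mu0)
    sigma = invwishart.rvs(df=nu_n, scale=psi_n, random_state=rng)
    mu = rng.multivariate_normal(mu_n, sigma / kappa_n)
    return mu, sigma

def mc_bayes_error(classify, x0, x1, prior0=0.5, n_draws=200, n_test=2000):
    """Average, over posterior draws, the classifier's error on data
    synthesized from the drawn parameters."""
    d = x0.shape[1]
    hyper = dict(mu0=np.zeros(d), kappa0=1.0, nu0=d + 2, psi0=np.eye(d))
    errs = []
    for _ in range(n_draws):
        m0, s0 = niw_posterior_draw(x0, **hyper)
        m1, s1 = niw_posterior_draw(x1, **hyper)
        t0 = rng.multivariate_normal(m0, s0, n_test)
        t1 = rng.multivariate_normal(m1, s1, n_test)
        e0 = np.mean(classify(t0) != 0)
        e1 = np.mean(classify(t1) != 1)
        errs.append(prior0 * e0 + (1 - prior0) * e1)
    return float(np.mean(errs))

# example usage with a nearest-mean rule (hypothetical classifier):
x0 = rng.multivariate_normal([0, 0], np.eye(2), 15)
x1 = rng.multivariate_normal([1, 1], np.eye(2), 15)
m0, m1 = x0.mean(0), x1.mean(0)
classify = lambda t: (np.linalg.norm(t - m1, axis=1)
                      < np.linalg.norm(t - m0, axis=1)).astype(int)
print("MC Bayesian error estimate ~", mc_bayes_error(classify, x0, x1))
```

Here `classify` can be any function mapping an (n, d) array to predicted labels, e.g. a fitted scikit-learn classifier's predict method, which is what makes the Monte Carlo route applicable to non-linear rules.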
IEEE Transactions on Signal Processing | 2012
Lori A. Dalton; Edward R. Dougherty
In Part I of this two-part study on the MSE performance of Bayesian error estimation, we derived analytical expressions for the MSE, conditioned on the sample, of both Bayesian error estimators and arbitrary error estimators in two Bayesian models: discrete classification with Dirichlet priors and linear classification of Gaussian distributions with normal-inverse-Wishart priors. Here, in Part II, we examine the consistency of Bayesian error estimation and provide several simulation studies that illustrate the concept of conditional MSE and how it may be used in practice. A salient application is censored sampling, where sample points are collected one at a time until the conditional MSE reaches a stopping criterion.
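A toy version of the censored-sampling loop makes the stopping rule concrete. The one-feature discrete model and Beta priors below are our simplification, not the paper's full Dirichlet setup.

```python
# Censored sampling driven by conditional MSE in a toy discrete model. The
# classifier predicts the class equal to the binary feature; its true error
# is e = 0.5*t0 + 0.5*(1 - t1), where t_y = P(X = 1 | class y) has a Beta
# posterior per class. The conditional MSE of the Bayesian error estimate
# equals the posterior variance of e, available here in closed form.
import numpy as np

rng = np.random.default_rng(3)
true_t = {0: 0.2, 1: 0.8}                    # ground-truth feature probabilities
a = {0: 1.0, 1: 1.0}; b = {0: 1.0, 1: 1.0}   # Beta(1,1) priors per class

def beta_mean_var(a, b):
    m = a / (a + b)
    return m, a * b / ((a + b) ** 2 * (a + b + 1))

n, tol = 0, 0.02                       # stop when sqrt(conditional MSE) < tol
while True:
    y = rng.integers(2)                # collect one sample point at a time
    x = rng.random() < true_t[y]
    a[y] += x; b[y] += 1 - x
    n += 1
    m0, v0 = beta_mean_var(a[0], b[0])
    m1, v1 = beta_mean_var(a[1], b[1])
    est = 0.5 * m0 + 0.5 * (1 - m1)    # Bayesian MMSE error estimate
    cmse = 0.25 * v0 + 0.25 * v1       # Var(e | sample): the conditional MSE
    if np.sqrt(cmse) < tol:
        break
print(f"stopped after n={n} points, estimate={est:.3f}, RMS={np.sqrt(cmse):.4f}")
```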
IEEE Transactions on Signal Processing | 2014
Lori A. Dalton; Edward R. Dougherty
When designing optimal filters it is often unrealistic to assume that the statistical model is known perfectly. The issue is then to design a robust filter that is optimal relative to an uncertainty class of processes. Robust filter design has been treated from minimax (best worst-case performance) and Bayesian (best average performance) perspectives. Heretofore, the Bayesian approach has involved finding a model-specific optimal filter, one that is optimal for some model in the uncertainty class. Lifting this constraint, we optimize over the full class from which the original optimal filters were obtained, for instance, over all linear filters. By extending the original characteristics determining the filter, such as the power spectral density, to “effective characteristics” that apply across the uncertainty class, we demonstrate, for both linear and morphological filtering, that an “intrinsically optimal” Bayesian robust filter can be represented in the same form as the standard solution to the optimal filter, except via the effective characteristics. Solutions for intrinsic Bayesian robust filters are more transparent and intuitive than solutions for model-specific filters, and also less tedious, because effective characteristics push through the spectral theory into the Bayesian setting, whereas solutions in the model-specific case depend on grinding out the optimization.
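As we read the "effective characteristics" idea in the linear (Wiener) setting, it can be sketched in two lines; the notation is ours, assuming wide-sense stationary signals and an uncertainty class indexed by θ with prior π:

```latex
% Classical non-causal Wiener filter for estimating a target x from an
% observation y: H(\omega) = S_{xy}(\omega)/S_{yy}(\omega). Over the
% uncertainty class, define effective power spectra as expectations:
\bar{S}_{xy}(\omega) = \mathbb{E}_\theta\!\left[S^{\theta}_{xy}(\omega)\right],
\qquad
\bar{S}_{yy}(\omega) = \mathbb{E}_\theta\!\left[S^{\theta}_{yy}(\omega)\right].
% The intrinsically optimal Bayesian robust filter then keeps the Wiener
% form, with effective characteristics in place of the known ones:
H_{\mathrm{IBR}}(\omega)
  = \frac{\bar{S}_{xy}(\omega)}{\bar{S}_{yy}(\omega)} .
```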