Publication


Featured research published by Marina Vannucci.


Bioinformatics | 2003

Gene selection: a Bayesian variable selection approach

Kyeong Eun Lee; Naijun Sha; Edward R. Dougherty; Marina Vannucci; Bani K. Mallick

Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data. SUPPLEMENTARY INFORMATION: http://stat.tamu.edu/people/faculty/bmallick.html.


Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 1998

Multivariate Bayesian variable selection and prediction

Philip J. Brown; Marina Vannucci; Tom Fearn

The multivariate regression model is considered with p regressors. A latent vector with p binary entries serves to identify one of two types of regression coefficients: those close to 0 and those not. Specializing our general distributional setting to the linear model with Gaussian errors and using natural conjugate prior distributions, we derive the marginal posterior distribution of the binary latent vector. Fast algorithms aid its direct computation, and in high dimensions these are supplemented by a Markov chain Monte Carlo approach to sampling from the known posterior distribution. Problems with hundreds of regressor variables become quite feasible. We give a simple method of assigning the hyperparameters of the prior distribution. The posterior predictive distribution is derived and the approach illustrated on compositional analysis of data involving three sugars with 160 near infra-red absorbances as regressors.


Journal of the American Statistical Association | 2005

Bayesian Variable Selection in Clustering High-Dimensional Data

Mahlet G. Tadesse; Naijun Sha; Marina Vannucci

Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p ≫ n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneously. Given a set of variables, we formulate the clustering problem in terms of a multivariate normal mixture model with an unknown number of components and use the reversible-jump Markov chain Monte Carlo technique to define a sampler that moves between different dimensional spaces. We handle the problem of selecting a few predictors among the prohibitively vast number of variable subsets by introducing a binary exclusion/inclusion latent vector, which gets updated via stochastic search techniques. We specify conjugate priors and exploit the conjugacy by integrating out some of the parameters. We describe strategies for posterior inference and explore the performance of the methodology with simulated and real datasets.


Journal of the American Statistical Association | 2001

Bayesian Wavelet Regression on Curves With Application to a Spectroscopic Calibration Problem

Philip J. Brown; Tom Fearn; Marina Vannucci

Motivated by calibration problems in near-infrared (NIR) spectroscopy, we consider the linear regression setting in which the many predictor variables arise from sampling an essentially continuous curve at equally spaced points and there may be multiple predictands. We tackle this regression problem by calculating the wavelet transforms of the discretized curves, then applying a Bayesian variable selection method using mixture priors to the multivariate regression of predictands on wavelet coefficients. For prediction purposes, we average over a set of likely models. Applied to a particular problem in NIR spectroscopy, this approach was able to find subsets of the wavelet coefficients with overall better predictive performance than the more usual approaches. In the application, the available predictors are measurements of the NIR reflectance spectrum of biscuit dough pieces at 256 equally spaced wavelengths. The aim is to predict the composition (i.e., the fat, flour, sugar, and water content) of the dough pieces using the spectral variables. Thus we have a multivariate regression of four predictands on 256 predictors with quite high intercorrelation among the predictors. A training set of 39 samples is available to fit this regression. Applying a wavelet transform replaces the 256 measurements on each spectrum with 256 wavelet coefficients that carry the same information. The variable selection method could use subsets of these coefficients that gave good predictions for all four compositional variables on a separate test set of samples. Selecting in the wavelet domain rather than from the original spectral variables is appealing in this application, because a single wavelet coefficient can carry information from a band of wavelengths in the original spectrum. This band can be narrow or wide, depending on the scale of the wavelet selected.
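The claim that 256 wavelet coefficients carry the same information as the 256 spectral measurements can be illustrated with a minimal sketch. It uses the Haar wavelet, chosen here only because it is the simplest orthonormal basis (the abstract does not say which wavelet family was used), and a made-up smooth curve standing in for a spectrum. The function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """Full orthonormal Haar transform of a list whose length is a power of two.

    Returns the single coarsest approximation coefficient (as a list) and the
    detail coefficients grouped by level, finest scale first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    current = list(approx)
    for det in reversed(details):
        rebuilt = []
        for a, d in zip(current, det):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current


# Hypothetical stand-in for one spectrum: a smooth curve at 256 equally spaced points.
spectrum = [
    math.sin(2 * math.pi * i / 256) + 0.3 * math.cos(2 * math.pi * 5 * i / 256)
    for i in range(256)
]

approx, details = haar_dwt(spectrum)
n_coeffs = len(approx) + sum(len(level) for level in details)

# The transform is invertible, so no information is lost.
recovered = haar_idwt(approx, details)
max_error = max(abs(x - y) for x, y in zip(spectrum, recovered))

# Number of original sample points that each coefficient at a given level summarizes.
band_widths = [2 ** (level + 1) for level in range(len(details))]
```

Each detail coefficient at the finest level depends on only 2 adjacent wavelengths, while the coarsest one depends on all 256, which is the "narrow or wide band" property the abstract refers to. A variable selection step would then operate on these coefficients rather than on the raw spectrum.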


Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 2002

Bayes model averaging with selection of regressors

Philip J. Brown; Marina Vannucci; Tom Fearn

When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution. Copyright 2002 Royal Statistical Society.


Bioinformatics | 2000

Finding pathogenicity islands and gene transfer events in genome data

Pietro Liò; Marina Vannucci

MOTIVATION: There is a growing literature on wavelet theory and wavelet methods showing improvements on more classical techniques, especially in the contexts of smoothing and extraction of fundamental components of signals. G+C patterns occur at different lengths (scales) and, for this reason, G+C plots are usually difficult to interpret. Current methods for genome analysis choose a window size and compute a χ² statistic of the average value for each window with respect to the whole genome. RESULTS: Firstly, wavelets are used to smooth G+C profiles to locate characteristic patterns in genome sequences. The method we use is based on computing a χ² statistic on the wavelet coefficients of a profile; thus we do not need to choose a fixed window size, in that the smoothing occurs at a set of different scales. Secondly, a wavelet scalogram is used as a measure for sequence profile comparison; this tool is very general and can be applied to other sequence profiles commonly used in genome analysis. We show applications to the analysis of Deinococcus radiodurans chromosome I, of two strains of Helicobacter pylori (26695, J99) and two of Neisseria meningitidis (serogroup B strain MC58 and serogroup A strain Z2491). We report a list of loci that have different G+C content with respect to the nearby regions; the analysis of N. meningitidis serogroup B shows two new large regions with low G+C content that are putative pathogenicity islands. AVAILABILITY: Software and numerical results (profiles, scalograms, high and low frequency components) for all the genome sequences analyzed are available upon request from the authors.
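The idea of viewing a G+C profile at several scales at once, rather than committing to one window size, can be sketched as follows. This is only an illustration under stated assumptions: it uses Haar-style block averages at dyadic scales as the simplest multiscale smoother, it does not reproduce the statistic applied to the wavelet coefficients, and the toy sequence and the function names `gc_indicator` and `block_means` are invented for the example.

```python
def gc_indicator(seq):
    """Map a DNA string to 1.0 where the base is G or C and 0.0 elsewhere."""
    return [1.0 if base in "GC" else 0.0 for base in seq.upper()]


def block_means(values, level):
    """Average over consecutive blocks of 2**level values.

    Up to a constant factor this is the Haar approximation at that level;
    a trailing partial block is dropped.
    """
    width = 2 ** level
    usable = len(values) - len(values) % width
    return [sum(values[i:i + width]) / width for i in range(0, usable, width)]


# Toy sequence (hypothetical): G+C-rich flanks around a 16-base A+T-rich island.
toy_sequence = "GCGCGGCC" * 4 + "ATATTAAT" * 2 + "GCCGGCGC" * 2

profile = gc_indicator(toy_sequence)

# The same profile summarized at block widths of 4, 8 and 16 bases.
scales = {level: block_means(profile, level) for level in (2, 3, 4)}
```

In this toy case the A+T-rich island shows up as a run of zeros at every scale, so its location does not depend on picking one particular window size in advance.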


Biological Psychiatry | 2008

Decomposing Intra-Subject Variability in Children with Attention-Deficit/Hyperactivity Disorder

Adriana Di Martino; Manely Ghaffari; Jocelyn Curchack; Philip T. Reiss; Christopher Hyde; Marina Vannucci; Eva Petkova; Donald F. Klein; F. Xavier Castellanos

BACKGROUND: Increased intra-subject response time standard deviations (RT-SD) discriminate children with attention-deficit/hyperactivity disorder (ADHD) from healthy control subjects. The RT-SD is averaged over time; thus it does not provide information about the temporal structure of RT variability. We previously hypothesized that such increased variability might be related to slow spontaneous fluctuations in brain activity occurring with periods between 15 sec and 40 sec. Here, we investigated whether these slow RT fluctuations add unique differentiating information beyond the global increase in RT-SD. METHODS: We recorded RT at 3-sec intervals for 15 min during an Eriksen flanker task for 29 children with ADHD and 26 age-matched typically developing control subjects (TDC) (mean ages 12.5 ± 2.4 and 11.6 ± 2.5; 26 and 12 boys, respectively). The primary outcome was the magnitude of the spectral component in the frequency range between .027 and .073 Hz measured with continuous Morlet wavelet transform. RESULTS: The magnitude of the low-frequency fluctuation was greater for children with ADHD compared with TDC (p = .02, d = .69). After modeling ADHD diagnosis as a function of RT-SD, adding this specific frequency range significantly improved the model fit (p = .03; odds ratio = 2.58). CONCLUSIONS: Fluctuations in low-frequency RT variability predict the diagnosis of ADHD beyond the effect associated with global differences in variability. Future studies will examine whether such spectrally specific fluctuations in behavioral responses are linked to intrinsic regional cerebral hemodynamic oscillations that occur at similar frequencies.
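The sampling figures in the abstract can be checked with a few lines of arithmetic. The sketch below only converts the stated numbers (3-sec intervals, 15 min, a .027 to .073 Hz band, 15 to 40 sec periods) between time and frequency; it assumes one response per interval and does not implement the Morlet wavelet analysis. All variable names are illustrative.

```python
sampling_interval_s = 3.0        # one response every 3 seconds (from the abstract)
duration_s = 15 * 60             # 15 minutes of recording

n_samples = int(duration_s / sampling_interval_s)
sampling_rate_hz = 1.0 / sampling_interval_s
nyquist_hz = sampling_rate_hz / 2.0          # highest resolvable frequency
frequency_resolution_hz = 1.0 / duration_s   # spacing of resolvable frequencies

# Analysed band, converted to periods in seconds (longest period first).
band_hz = (0.027, 0.073)
band_periods_s = (1.0 / band_hz[0], 1.0 / band_hz[1])

# Hypothesized fluctuation periods, converted to frequencies (lowest first).
hypothesized_periods_s = (15.0, 40.0)
hypothesized_band_hz = (1.0 / hypothesized_periods_s[1], 1.0 / hypothesized_periods_s[0])

# The whole analysed band lies below the Nyquist limit of the 3-sec sampling.
band_is_resolvable = band_hz[1] < nyquist_hz
```

With these numbers there are 300 samples per child, the Nyquist limit is about 0.167 Hz, and the analysed band corresponds to periods of roughly 37 sec down to roughly 14 sec, which overlaps most of the hypothesized 15 to 40 sec range (about 0.025 to 0.067 Hz).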


Journal of the American Statistical Association | 2003

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

Jeffrey S. Morris; Marina Vannucci; Philip J. Brown; Raymond J. Carroll

In this article we develop new methods for analyzing the data from an experiment using rodent models to investigate the effect of type of dietary fat on O6-methylguanine-DNA-methyltransferase (MGMT), an important biomarker in early colon carcinogenesis. The data consist of observed profiles over a spatial variable contained within a two-stage hierarchy, a structure that we dub hierarchical functional data. We present a new method providing a unified framework for modeling these data, simultaneously yielding estimates and posterior samples for mean, individual, and subsample-level profiles, as well as covariance parameters at the various hierarchical levels. Our method is nonparametric in that it does not require the prespecification of parametric forms for the functions and involves modeling in the wavelet space, which is especially effective for spatially heterogeneous functions as encountered in the MGMT data. Our approach is Bayesian; the only informative hyperparameters in our model are effectively smoothing parameters. Analysis of this dataset yields interesting new insights into how MGMT operates in early colon carcinogenesis, and how this may depend on diet. Our method is general, so it can be applied to other settings where hierarchical functional data are encountered.


The Annals of Applied Statistics | 2011

Incorporating Biological Information into Linear Models: A Bayesian Approach to the Selection of Pathways and Genes

Francesco C. Stingo; Yian A. Chen; Mahlet G. Tadesse; Marina Vannucci

The vast amount of biological knowledge accumulated over the years has allowed researchers to identify various biochemical interactions and define different families of pathways. There is an increased interest in identifying pathways and pathway elements involved in particular biological processes. Drug discovery efforts, for example, are focused on identifying biomarkers as well as pathways related to a disease. We propose a Bayesian model that addresses this question by incorporating information on pathways and gene networks in the analysis of DNA microarray data. Such information is used to define pathway summaries, specify prior distributions, and structure the MCMC moves to fit the model. We illustrate the method with an application to gene expression data with censored survival outcomes. In addition to identifying markers that would have been missed otherwise and improving prediction accuracy, the integration of existing biological knowledge into the analysis provides a better understanding of underlying molecular processes.


Bioinformatics | 2000

Wavelet change-point prediction of transmembrane proteins

Pietro Liò; Marina Vannucci

MOTIVATION: A non-parametric method, based on a wavelet data-dependent threshold technique for change-point analysis, is applied to predict location and topology of helices in transmembrane proteins. A new propensity scale generated from a transmembrane helix database is proposed. RESULTS: We show that the wavelet change-point method performs well for smoothing hydropathy and transmembrane profiles generated using different scales. We investigate which wavelet bases and threshold functions are overall most appropriate to detect transmembrane segments. Prediction accuracy is based on the analysis of two data sets used as standard benchmarks for transmembrane prediction algorithms. The analysis of a test set of 83 proteins results in accuracy per segment equal to 98.2%; the analysis of a blind-test set of 48 proteins, i.e. one containing proteins not used to generate the propensity scales, results in accuracy per segment equal to 97.4%. We believe that this method can also be applied to the detection of boundaries of other patterns such as G + C isochores and dot-plots. AVAILABILITY: The transmembrane database, TMALN and source code are available upon request from the authors.
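A minimal sketch of wavelet thresholding for change-point detection is given below. It is not the authors' method: it assumes a Haar basis, hard thresholding, and the standard universal threshold with a median-based noise estimate as one common data-dependent choice. The input is an artificial step profile with a small alternating perturbation standing in for noise, not a real hydropathy profile, and all names are hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Orthonormal Haar transform: (coarsest approximation, details finest-first)."""
    approx, details = list(x), []
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) / SQRT2 for a, b in zip(evens, odds)])
        approx = [(a + b) / SQRT2 for a, b in zip(evens, odds)]
    return approx, details


def haar_inverse(approx, details):
    """Rebuild the signal from the output of haar_forward."""
    out = list(approx)
    for det in reversed(details):
        out = [v for a, d in zip(out, det)
               for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return out


# Artificial profile: level 0 up to position 11, level 1 from position 12,
# plus a small deterministic perturbation standing in for noise.
n = 32
profile = [(0.0 if i < 12 else 1.0) + 0.05 * (-1) ** i for i in range(n)]

approx, details = haar_forward(profile)

# Data-dependent threshold: noise scale from the median absolute finest-level
# detail, multiplied by the universal factor sqrt(2 * log(n)).
sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
threshold = sigma * math.sqrt(2 * math.log(n))

# Hard thresholding: keep only detail coefficients larger than the threshold.
kept = [[d if abs(d) > threshold else 0.0 for d in level] for level in details]
smoothed = haar_inverse(approx, kept)

# Report positions where the smoothed profile jumps by more than half a unit.
change_points = [i for i in range(1, n) if abs(smoothed[i] - smoothed[i - 1]) > 0.5]
```

The small coefficients produced by the perturbation fall below the threshold and are removed, while the few large coefficients created by the jump survive, so the smoothed profile is a clean step whose boundary marks the change point.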

Collaboration


Dive into Marina Vannucci's collaborations.

Top Co-Authors

Christine B. Peterson
University of Texas MD Anderson Cancer Center

Kim-Anh Do
University of Texas MD Anderson Cancer Center

Mingyu Liang
Medical College of Wisconsin

Pengyuan Liu
Medical College of Wisconsin

Purushottam W. Laud
Medical College of Wisconsin