
Publication


Featured research published by Genevera I. Allen.


The Annals of Applied Statistics | 2010

Transposable regularized covariance models with an application to missing data imputation

Genevera I. Allen; Robert Tibshirani

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically, this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
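To make the multivariate EM-type imputation concrete, here is a minimal NumPy sketch under stated assumptions: rows are treated as i.i.d. multivariate normal, and the covariance is kept non-singular with a simple ridge term `rho * I` rather than the paper's additive penalties on the inverse covariance. It does not implement the transposable (matrix-variate) model, and the function name `em_impute` and parameter `rho` are hypothetical.

```python
import numpy as np

def em_impute(X, rho=0.1, n_iter=100, tol=1e-8):
    """EM-style imputation of NaN entries under a multivariate normal model.

    Assumptions (not from the paper): rows are i.i.d. N(mu, Sigma), and Sigma is
    kept non-singular by a ridge term rho * I instead of penalties on the inverse
    covariance. Each column must have at least one observed value.
    """
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    # Start from the column means of the observed entries.
    Xc = np.where(miss, np.nanmean(X, axis=0), X)
    mu = Xc.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False, bias=True) + rho * np.eye(p)
    for _ in range(n_iter):
        X_old = Xc.copy()
        C = np.zeros((p, p))  # summed conditional covariances of missing blocks
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed in this row: use the marginal
                Xc[i] = mu
                C += Sigma
                continue
            S_oo = Sigma[np.ix_(o, o)]
            S_mo = Sigma[np.ix_(m, o)]
            B = np.linalg.solve(S_oo, S_mo.T).T  # equals S_mo @ inv(S_oo)
            # E-step: conditional mean of the missing entries given the observed ones.
            Xc[i, m] = mu[m] + B @ (Xc[i, o] - mu[o])
            C[np.ix_(m, m)] += Sigma[np.ix_(m, m)] - B @ S_mo.T
        # M-step: update the mean and the ridge-regularized covariance.
        mu = Xc.mean(axis=0)
        D = Xc - mu
        Sigma = (D.T @ D + C) / n + rho * np.eye(p)
        if np.max(np.abs(Xc - X_old)) < tol:
            break
    return Xc
```

The ridge term plays the role the abstract assigns to the penalties: it guarantees an invertible covariance even when there are more columns than rows, so the conditional-mean step is always defined.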


Journal of the American Statistical Association | 2014

A Generalized Least-Square Matrix Decomposition

Genevera I. Allen; Logan Grosenick; Jonathan Taylor

Variables in many big-data settings are structured, arising, for example, from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships, often resulting in poor performance. We propose a generalization of principal components analysis (PCA) that is appropriate for massive datasets with structured variables or known two-way dependencies. By finding the best low-rank approximation of the data with respect to a transposable quadratic norm, our decomposition, entitled the generalized least-square matrix decomposition (GMD), directly accounts for structural relationships. As many variables in high-dimensional settings are often irrelevant, we also regularize our matrix decomposition by adding two-way penalties to encourage sparsity or smoothness. We develop fast computational algorithms using our methods to perform generalized PCA (GPCA), sparse GPCA, and functional GPCA on massive datasets. Through simulations and a whole brain functional MRI example, we demonstrate the utility of our methodology for dimension reduction, signal recovery, and feature selection with high-dimensional structured data. Supplementary materials for this article are available online.
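The "best low-rank approximation with respect to a transposable quadratic norm" can be computed one factor at a time with alternating power iterations. The sketch below assumes row and column weight matrices `Q` (n × n) and `R` (p × p) that are symmetric positive semi-definite (for instance smoothing kernels over a grid), and that factors are extracted by deflation; names such as `gmd` and `qnorm` are illustrative, not taken from the authors' software. With `Q` and `R` set to identity matrices it reduces to an ordinary truncated SVD.

```python
import numpy as np

def qnorm(x, Q):
    """Quadratic norm ||x||_Q = sqrt(x^T Q x) for a positive semi-definite Q."""
    return np.sqrt(x @ Q @ x)

def gmd(X, Q, R, k=2, n_iter=500, tol=1e-10, seed=0):
    """Rank-k generalized least-square matrix decomposition by power iterations.

    Approximates X by U diag(d) V^T in the (Q, R) quadratic norm, i.e. it targets
    tr(Q (X - U D V^T) R (X - U D V^T)^T) with ||u_j||_Q = ||v_j||_R = 1.
    """
    rng = np.random.default_rng(seed)
    Xhat = np.array(X, dtype=float)
    n, p = Xhat.shape
    U, V, d = np.zeros((n, k)), np.zeros((p, k)), np.zeros(k)
    for j in range(k):
        v = rng.normal(size=p)
        v /= qnorm(v, R)
        for _ in range(n_iter):
            u = Xhat @ R @ v
            u /= qnorm(u, Q)            # left factor, unit Q-norm
            v_new = Xhat.T @ Q @ u
            v_new /= qnorm(v_new, R)    # right factor, unit R-norm
            done = np.linalg.norm(v_new - v) < tol
            v = v_new
            if done:
                break
        d[j] = u @ Q @ Xhat @ R @ v
        U[:, j], V[:, j] = u, v
        Xhat = Xhat - d[j] * np.outer(u, v)  # deflate before the next factor
    return U, d, V
```

For example, with `Q = np.eye(n)` and `R = np.eye(p)` the returned `d` matches the leading singular values of `X`; replacing `R` with a spatial smoothing kernel yields loadings that respect the grid structure of the variables.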


IEEE Transactions on Nanobioscience | 2013

A Local Poisson Graphical Model for Inferring Networks From Sequencing Data

Genevera I. Allen; Zhandong Liu

Gaussian graphical models, a class of undirected graphs or Markov networks, are often used to infer gene networks based on microarray expression data. Many scientists, however, have begun using high-throughput sequencing technologies such as RNA-sequencing or next-generation sequencing to measure gene expression. As the resulting data consist of counts of sequencing reads for each gene, Gaussian graphical models are not optimal for this discrete data. In this paper, we propose a novel method for inferring gene networks from sequencing data: the Local Poisson Graphical Model. Our model assumes a local Markov property where each variable conditional on all other variables is Poisson distributed. We develop a neighborhood selection algorithm to fit our model locally by performing a series of ℓ1-penalized Poisson, or log-linear, regressions. This yields a fast parallel algorithm for estimating networks from next-generation sequencing data. In simulations, we illustrate the effectiveness of our methods for recovering network structure from count data. A case study on breast cancer microRNAs (miRNAs), a novel application of graphical models, finds known regulators of breast cancer genes and discovers novel miRNA clusters and hubs that are targets for future research.
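Neighborhood selection of the kind described above fits one ℓ1-penalized Poisson regression per gene and reads edges off the nonzero coefficients. The following is a minimal sketch assuming raw counts as predictors, a single shared penalty `lam`, proximal gradient descent as the solver, and an "and"/"or" rule to symmetrize the graph; the solver, the symmetrization rule and all function names are illustrative choices, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_poisson_regression(Z, y, lam, n_iter=500, tol=1e-8):
    """Minimize mean(exp(eta) - y * eta) + lam * ||beta||_1, eta = b0 + Z @ beta.

    Proximal gradient descent with backtracking; the intercept b0 is unpenalized.
    """
    n = Z.shape[0]
    b0, beta = np.log(y.mean() + 1e-8), np.zeros(Z.shape[1])
    step = 1.0

    def loss(c0, c):
        eta = c0 + Z @ c
        return np.mean(np.exp(eta) - y * eta)

    for _ in range(n_iter):
        r = np.exp(b0 + Z @ beta) - y  # derivative of the loss w.r.t. eta
        g0, g = r.mean(), Z.T @ r / n
        f_old = loss(b0, beta)
        while True:
            b0_new = b0 - step * g0
            beta_new = soft_threshold(beta - step * g, step * lam)
            d0, d = b0_new - b0, beta_new - beta
            # Accept the step once the quadratic upper bound holds.
            bound = f_old + g0 * d0 + g @ d + (d0 ** 2 + d @ d) / (2 * step)
            if loss(b0_new, beta_new) <= bound:
                break
            step *= 0.5
        b0, beta = b0_new, beta_new
        if max(abs(d0), np.max(np.abs(d), initial=0.0)) < tol:
            break
    return b0, beta

def lpgm_neighborhoods(X, lam, rule="and"):
    """Graph estimate: regress each node's counts on all others (l1-penalized Poisson)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    B = np.zeros((p, p))
    for s in range(p):  # independent fits, so this loop parallelizes trivially
        others = np.delete(np.arange(p), s)
        _, beta = l1_poisson_regression(X[:, others], X[:, s], lam)
        B[s, others] = beta
    nz = B != 0
    # "and": keep an edge only if both regressions select it; "or": if either does.
    return (nz & nz.T) if rule == "and" else (nz | nz.T)
```

Because each node's regression is independent of the others, the loop in `lpgm_neighborhoods` is the natural place to distribute work across cores, which is what makes this family of estimators fast in practice.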


Bioinformatics | 2011

Sparse non-negative generalized PCA with applications to metabolomics

Genevera I. Allen; Mirjana Maletic-Savatic

MOTIVATION Nuclear magnetic resonance (NMR) spectroscopy has been used to study mixtures of metabolites in biological samples. This technology produces a spectrum for each sample depicting the chemical shifts at which an unknown number of latent metabolites resonate. The interpretation of these data with common multivariate exploratory methods such as principal components analysis (PCA) is limited by high dimensionality, the non-negativity of the underlying spectra and dependencies at adjacent chemical shifts. RESULTS We develop a novel modification of PCA that is appropriate for analysis of NMR data, entitled Sparse Non-Negative Generalized PCA. This method yields interpretable principal components and loading vectors that select important features and directly account for both the non-negativity of the underlying spectra and dependencies at adjacent chemical shifts. Through the reanalysis of experimental NMR data on five purified neural cell types, we demonstrate the utility of our methods for dimension reduction, pattern recognition, sample exploration and feature selection. Our methods lead to the identification of novel metabolites that reflect the differences between these cell types. AVAILABILITY www.stat.rice.edu/~gallen/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
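The combination of sparsity and non-negativity in the loadings can be illustrated with a single-component sketch. To keep the loading update in closed form, this sketch assumes the column weight matrix is the identity, so it does not model dependencies between adjacent chemical shifts (which the method above handles through its generalized norm); only an optional row weight `Q` is kept. The function name and the penalized (rather than constrained) formulation are assumptions.

```python
import numpy as np

def sparse_nonneg_pc(X, lam, Q=None, n_iter=500, tol=1e-10, seed=0):
    """One sparse, non-negative principal component (simplified: column weight = I).

    Alternates u = X v / ||X v||_Q and v = max(X^T Q u - lam, 0) / ||.||_2, which
    maximizes u^T Q X v - lam * sum(v) subject to ||u||_Q <= 1, ||v||_2 <= 1, v >= 0.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Q = np.eye(n) if Q is None else Q
    v = np.abs(np.random.default_rng(seed).normal(size=p))
    v /= np.linalg.norm(v)
    u = np.zeros(n)
    for _ in range(n_iter):
        a = X @ v
        u = a / np.sqrt(a @ Q @ a)               # score vector, unit Q-norm
        c = np.maximum(X.T @ Q @ u - lam, 0.0)   # soft-threshold, then clip at zero
        if not c.any():                          # penalty too large: all loadings zero
            return u, np.zeros(p)
        v_new = c / np.linalg.norm(c)
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return u, v
```

Thresholding and clipping happen in one step because, for a non-negative loading, the ℓ1 penalty is simply `lam * sum(v)`; chemical shifts whose weighted correlation with the scores falls below `lam` receive an exact zero.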


Alzheimer's & Dementia | 2016

Crowdsourced estimation of cognitive decline and resilience in Alzheimer's disease

Genevera I. Allen; Nicola Amoroso; Catalina V Anghel; Venkat K. Balagurusamy; Christopher Bare; Derek Beaton; Roberto Bellotti; David A. Bennett; Kevin L. Boehme; Paul C. Boutros; Laura Caberlotto; Cristian Caloian; Frederick Campbell; Elias Chaibub Neto; Yu Chuan Chang; Beibei Chen; Chien Yu Chen; Ting Ying Chien; Timothy W.I. Clark; Sudeshna Das; Christos Davatzikos; Jieyao Deng; Donna N. Dillenberger; Richard Dobson; Qilin Dong; Jimit Doshi; Denise Duma; Rosangela Errico; Guray Erus; Evan Everett

Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state‐of‐the‐art in predicting cognitive outcomes in Alzheimer's disease based on high dimensional, publicly available genetic and structural imaging data. This meta‐analysis failed to identify a meaningful predictor developed from either data modality, suggesting that alternate approaches should be considered for prediction of cognitive performance.


Urologic Oncology: Seminars and Original Investigations | 2013

Surgical outcomes and complications associated with presurgical tyrosine kinase inhibition for advanced renal cell carcinoma (RCC)

Lauren C. Harshman; R. James Yu; Genevera I. Allen; Sandy Srinivas; Harcharan Gill; Benjamin I. Chung

BACKGROUND Tyrosine kinase inhibitors (TKI) have dramatically changed the management paradigm of advanced renal cell carcinoma (RCC) and are increasingly being used preoperatively to achieve cytoreduction. OBJECTIVE To review our case series of post-TKI surgical procedures to add to the current perioperative efficacy and complication profile. MATERIALS AND METHODS Between October 2006 and February 2010, 14 cytoreductive nephrectomies, radical nephrectomies, and metastasectomies were performed after neoadjuvant sunitinib or sorafenib for advanced RCC. During the same time frame, a control group of 73 consecutive patients underwent radical nephrectomy, cytoreductive nephrectomy, or metastasectomy in the absence of prior systemic therapy. We compared the incidence of perioperative complications and outcomes after surgical procedures between the two cohorts. RESULTS Median preoperative renal mass size was 11 cm (6.7-24.2 cm). Primary tumor shrinkage was seen in 57%; median shrinkage was 18% (8%-25%). The median treatment period was 17 weeks, and the median time from TKI discontinuation to surgery was 2 weeks. Compared with the control group and after adjusting for confounding covariates, presurgical TKI use was not associated with a significant increase in perioperative complications (50% vs. 40%, P = 0.25) or perioperative bleeding (36% vs. 34%, P = 0.97) but was associated with increased incidence and grade of intraoperative adhesions (86% vs. 58%, P = 0.001; grade 3 vs. 1, P = 0.002). CONCLUSIONS Compared with the published reports, we observed fewer hemorrhagic and wound-healing issues but a significant increase in the incidence and severity of intraoperative adhesions, which can present a formidable technical challenge. Potential reasons for our lower complication rate may include increased time from TKI discontinuation to surgery, longer time to postoperative TKI re-initiation, increased use of preoperative angioembolization, and the lack of preoperative bevacizumab administration.
Presurgical TKI therapy can permit effective surgical cytoreduction with a safety and complication profile equivalent to that of non-TKI nephrectomy; however, safety data continue to evolve, and preoperative TKI use requires further prospective investigation.


BMC Genomics | 2013

Molecular pathway identification using biological network-regularized logistic models

Wen Zhang; Ying-Wooi Wan; Genevera I. Allen; Kaifang Pang; Matthew L. Anderson; Zhandong Liu

BACKGROUND Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L1-norm based regularization and its extensions elastic net and fused lasso, have been introduced to deal with this challenge. However, these approaches tend to ignore the vast amount of a priori biological network information curated in the literature. RESULTS We propose the use of graph Laplacian regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies demonstrate that the performance of the proposed algorithm is superior to elastic net and lasso analyses. Utility of this algorithm is also validated by its ability to reliably differentiate breast cancer subtypes using a large breast cancer dataset recently generated by the Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet. CONCLUSION Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. With the rapid expansion of our knowledge of biological regulatory networks, this approach will become more accurate and increasingly useful for mining transcriptomic, epigenomic, and other types of genome-wide association studies.
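Graph Laplacian regularization adds a penalty `beta^T L beta` that is small when genes connected in the prior network receive similar coefficients. The sketch below assumes the unnormalized Laplacian `L = D - A`, an added ℓ1 term for sparsity, labels coded 0/1, and a fixed-step proximal gradient solver; the published method may use a different (for example, normalized) Laplacian and solver, so treat the names `laplacian_logistic`, `lam1` and `lam2` as hypothetical.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def laplacian_logistic(X, y, A, lam1, lam2, n_iter=2000, tol=1e-8):
    """Minimize mean(log(1 + e^eta) - y*eta) + lam1*||b||_1 + lam2 * b^T L b.

    eta = b0 + X @ b with y in {0, 1}; solved by proximal gradient descent.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    L = graph_laplacian(A)
    # Step size 1 / (Lipschitz constant of the smooth part), treating the
    # intercept as an extra column of ones.
    X1 = np.column_stack([np.ones(n), X])
    lip = np.linalg.norm(X1, 2) ** 2 / (4 * n) + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ b)))
        r = prob - y
        g0 = r.mean()
        g = X.T @ r / n + 2 * lam2 * (L @ b)   # logistic gradient + Laplacian gradient
        b0_new = b0 - step * g0                # intercept is unpenalized
        z = b - step * g
        b_new = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # l1 prox
        delta = max(abs(b0_new - b0), np.max(np.abs(b_new - b)))
        b0, b = b0_new, b_new
        if delta < tol:
            break
    return b0, b
```

Since `b^T L b` equals the sum of `(b_i - b_j)^2` over network edges, increasing `lam2` pulls the coefficients of connected genes toward each other, while `lam1` still sets unsupported genes exactly to zero.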


Bioinformatics | 2016

TCGA2STAT: Simple TCGA Data Access for Integrated Statistical Analysis in R

Ying-Wooi Wan; Genevera I. Allen; Zhandong Liu

MOTIVATION Massive amounts of high-throughput genomics data profiled from tumor samples were made publicly available by the Cancer Genome Atlas (TCGA). RESULTS We have developed an open source software package, TCGA2STAT, to obtain the TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. In a user-friendly format with a single function call, our package downloads and fully processes the desired TCGA data to be seamlessly integrated into a computational analysis pipeline. No further technical or biological knowledge is needed to utilize our software, thus making TCGA data easily accessible to data scientists without specific domain knowledge. AVAILABILITY AND IMPLEMENTATION TCGA2STAT is available from https://cran.r-project.org/web/packages/TCGA2STAT/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


International Symposium on Biomedical Imaging | 2013

Imaging genetics via sparse canonical correlation analysis

Eric C. Chi; Genevera I. Allen; Hua Zhou; Omid Kohannim; Kenneth Lange; Paul M. Thompson

The collection of brain images from populations of subjects who have been genotyped with genome-wide scans makes it feasible to search for genetic effects on the brain. Even so, multivariate methods are sorely needed that can search both images and the genome for relationships, making use of the correlation structure of both datasets. Here we investigate the use of sparse canonical correlation analysis (CCA) to home in on sets of genetic variants that explain variance in a set of images. We extend recent work on penalized matrix decomposition to account for the correlations in both datasets. Such methods show promise in imaging genetics as they exploit the natural covariance in the datasets. They also avoid an astronomically heavy statistical correction for searching the whole genome and the entire image for promising associations.
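As a baseline for the approach described above, sparse CCA in the penalized-matrix-decomposition style alternates soft-thresholded updates of the two canonical vectors. This sketch uses the Lagrangian (soft-thresholding) form and, as a simplification, treats the within-dataset covariances as the identity; it therefore does not include the extension that accounts for correlations within each dataset. `X` is assumed to hold image features and `Y` genetic variants for the same subjects; names and penalty values are illustrative.

```python
import numpy as np

def soft(z, t):
    """Elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def _unit(z):
    """Scale to unit Euclidean norm, leaving an all-zero vector unchanged."""
    nz = np.linalg.norm(z)
    return z / nz if nz > 0 else z

def sparse_cca(X, Y, lam_u, lam_v, n_iter=500, tol=1e-10):
    """One pair of sparse canonical vectors by alternating soft-thresholding.

    Maximizes u^T K v - lam_u*||u||_1 - lam_v*||v||_1 with ||u||_2, ||v||_2 <= 1,
    where K is the cross-covariance of X and Y (within-set covariances taken as I).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    K = X.T @ Y / X.shape[0]        # p x q cross-covariance
    v = np.linalg.svd(K)[2][0]      # start from the leading right singular vector
    u = np.zeros(K.shape[0])
    for _ in range(n_iter):
        u = _unit(soft(K @ v, lam_u))
        v_new = _unit(soft(K.T @ u, lam_v))
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return u, v
```

The nonzero entries of `u` and `v` pick out the image features and variants that co-vary, which is how sparse CCA avoids testing every image-by-genome pair separately.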


Bioinformatics and Biomedicine | 2012

A Log-Linear Graphical Model for inferring genetic networks from high-throughput sequencing data

Genevera I. Allen; Zhandong Liu

Gaussian graphical models are often used to infer gene networks based on microarray expression data. Many scientists, however, have begun using high-throughput sequencing technologies to measure gene expression. As the resulting high-dimensional data consist of counts of sequencing reads for each gene, Gaussian graphical models are not optimal for modeling gene networks based on this discrete data. We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes a pairwise Markov property: conditional on all other variables, each variable is Poisson. We estimate our model locally via neighborhood selection by fitting ℓ1-norm penalized log-linear models. Additionally, we develop a fast parallel algorithm permitting us to fit our graphical model to high-dimensional genomic data sets. We illustrate the effectiveness of our methods for recovering network structure from count data through simulations and a case study on breast cancer microRNA networks.
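The node-wise estimation described above can be written out as follows, in our own notation (the symbols $\theta_s$, $\theta_{st}$, $\lambda$ and the edge-combination rule are assumptions for illustration). Each gene $s$ is modeled conditionally on the others, and its neighborhood is estimated by an ℓ1-penalized log-linear (Poisson) regression; the $\log x_{is}!$ term of the likelihood is dropped because it does not depend on $\theta$.

```latex
X_s \mid X_{\setminus s} \sim \operatorname{Poisson}\!\Big(\exp\big(\theta_s + \textstyle\sum_{t \ne s} \theta_{st} X_t\big)\Big),
\qquad
\hat{\theta}_{s\cdot} = \arg\min_{\theta_s,\,\theta_{s\cdot}}
\frac{1}{n}\sum_{i=1}^{n}\Big[e^{\eta_{is}} - x_{is}\,\eta_{is}\Big]
+ \lambda \sum_{t \ne s} \lvert \theta_{st} \rvert,
\qquad
\eta_{is} = \theta_s + \sum_{t \ne s} \theta_{st}\, x_{it}.
```

A common way to assemble the graph is to declare an edge $(s,t)$ when $\hat{\theta}_{st} \ne 0$ and $\hat{\theta}_{ts} \ne 0$ (or, less conservatively, when either is nonzero). Because the $p$ regressions share no parameters, they can be solved in parallel.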

Collaboration


Top Co-Authors

Zhandong Liu

Baylor College of Medicine

Pradeep Ravikumar

Carnegie Mellon University

Ying-Wooi Wan

Baylor College of Medicine

Eric C. Chi

North Carolina State University
