
Publication


Featured research published by Tanya P. Garcia.


Bioinformatics | 2014

Identification of important regressor groups, subgroups and individuals via regularization methods: application to gut microbiome data.

Tanya P. Garcia; Samuel Müller; Raymond J. Carroll; Rosemary L. Walzem

Motivation: Gut microbiota can be classified at multiple taxonomy levels. Strategies that use changes in microbiota composition to effect health improvements require knowing at which taxonomy level interventions should be aimed. Identifying these important levels is difficult, however, because most statistical methods consider the microbiota classified at only one taxonomy level, not several.

Results: Using L1 and L2 regularizations, we developed a new variable selection method that identifies important features at multiple taxonomy levels. The regularization parameters are chosen by a new, data-adaptive, repeated cross-validation approach, which performed well. In simulation studies, our method outperformed competing methods: it more often selected significant variables, and it had small false discovery rates and acceptable false-positive rates. Applying our method to gut microbiota data, we found which taxonomic levels were most altered by specific interventions or physiological status.

Availability: The new approach is implemented in an R package, which is freely available from the corresponding author.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
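The paper's method ships as an R package; as a rough illustration of the general idea — combining L1 and L2 penalties with repeated cross-validation for tuning — one might sketch an elastic net on synthetic data with scikit-learn (this is not the authors' multi-level algorithm, and the data here are simulated):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                   # three truly active predictors
y = X @ beta + rng.standard_normal(n)

# Elastic net mixes L1 and L2 penalties; the penalty strength is chosen by
# repeated cross-validation, echoing the paper's repeated-CV tuning idea.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=cv).fit(X, y)
selected = np.flatnonzero(model.coef_)        # indices of selected variables
```

The `l1_ratio` grid controls the L1/L2 mix; variables with nonzero fitted coefficients are the selected set.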


Journal of the American Statistical Association | 2012

Nonparametric Estimation for Censored Mixture Data With Application to the Cooperative Huntington’s Observational Research Trial

Yuanjia Wang; Tanya P. Garcia; Yanyuan Ma

This work presents methods for estimating genotype-specific outcome distributions from genetic epidemiology studies where the event times are subject to right censoring, the genotypes are not directly observed, and the data arise from a mixture of scientifically meaningful subpopulations. Examples of such studies include kin-cohort studies and quantitative trait locus (QTL) studies. Current methods for analyzing censored mixture data include two types of nonparametric maximum likelihood estimators (NPMLEs; Type I and Type II) that do not make parametric assumptions on the genotype-specific density functions. Although both NPMLEs are commonly used, we show that one is inefficient and the other inconsistent. To overcome these deficiencies, we propose three classes of consistent nonparametric estimators that do not assume parametric density models and are easy to implement. They are based on inverse probability weighting (IPW), augmented IPW (AIPW), and nonparametric imputation (IMP). AIPW achieves the efficiency bound without additional modeling assumptions. Extensive simulation experiments demonstrate satisfactory performance of these estimators even when the data are heavily censored. We apply these estimators to the Cooperative Huntington’s Observational Research Trial (COHORT), and provide age-specific estimates of the effect of mutation in the Huntington gene on mortality using a sample of family members. The close approximation of the estimated noncarrier survival rates to those of the U.S. population indicates small ascertainment bias in the COHORT family sample. Our analyses underscore an elevated risk of death in Huntington gene mutation carriers compared with that in noncarriers for a wide age range, and suggest that the mutation equally affects survival rates in both genders. The estimated survival rates are useful in genetic counseling for providing guidelines on interpreting the risk of death associated with a positive genetic test, and in helping future subjects at risk to make informed decisions on whether to undergo genetic mutation testing. Technical details and additional numerical results are provided in the online supplementary materials.
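As a toy illustration of the inverse probability weighting idea under right censoring (a deliberately simplified setting with a known censoring distribution, not the paper's mixture estimators), uncensored observations are reweighted by the inverse probability of remaining uncensored:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20000
t = rng.exponential(1.0, n)                   # event times, E[T] = 1
c = rng.exponential(1 / 0.3, n)               # censoring times, rate 0.3
obs = np.minimum(t, c)                        # observed time
delta = (t <= c).astype(float)                # 1 if the event was observed

# IPW: reweight uncensored subjects by the inverse censoring survival
# G(t) = exp(-0.3 t) (known here); the weighted mean is unbiased for E[T].
G = np.exp(-0.3 * obs)
ipw_mean = np.mean(delta * obs / G)
naive_mean = obs[delta == 1].mean()           # biased low: drops censored subjects
```

The naive average of only the observed events is biased toward shorter times; the IPW average corrects for the subjects lost to censoring.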


Biostatistics | 2013

Structured variable selection with q-values.

Tanya P. Garcia; Samuel Müller; Raymond J. Carroll; Tamara N. Dunn; Anthony P. Thomas; Sean H. Adams; Suresh D. Pillai; Rosemary L. Walzem

When some of the regressors can act on both the response and other explanatory variables, the already challenging problem of selecting variables when the number of covariates exceeds the sample size becomes more difficult. A motivating example is a metabolic study in mice with diet groups and gut microbial percentages that may affect changes in multiple phenotypes related to body weight regulation. The data have more variables than observations, and diet is known to act directly on the phenotypes as well as on some, or potentially all, of the microbial percentages. Interest lies in determining which gut microflora influence the phenotypes while accounting for the direct relationship between diet and the other variables. A new methodology for variable selection in this context is presented that links the concept of q-values from multiple hypothesis testing to the recently developed weighted Lasso.
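A rough sketch of the idea of feeding multiple-testing output into a weighted Lasso, on synthetic data (this is not the paper's estimator; q-values are approximated here by Benjamini–Hochberg adjusted p-values, and the weighting scheme is illustrative):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n, p = 120, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [2.0, -2.0, 1.5, 1.0]              # only the first four matter
y = X @ beta + rng.standard_normal(n)

# Marginal test per column; BH-adjusted p-values stand in for q-values.
pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(p)])
qvals = multipletests(pvals, method="fdr_bh")[1]

# Weighted Lasso via column rescaling: penalizing w_j * |b_j| is equivalent
# to a plain Lasso on X_j / w_j. Small q-value => light penalty.
w = np.maximum(qvals, 1e-3)
fit = Lasso(alpha=0.05).fit(X / w, y)
coef = fit.coef_ / w
selected = np.flatnonzero(np.abs(coef) > 1e-8)
```

Variables with strong marginal evidence receive small weights and are penalized lightly, which is the structured-selection intuition behind the q-value weighting.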


Transportation Research Record | 2012

Extension of Negative Binomial GARCH Model: Analyzing Effects of Gasoline Price and Miles Traveled on Fatal Crashes Involving Intoxicated Drivers in Texas

Fan Ye; Tanya P. Garcia; Mohsen Pourahmadi; Dominique Lord

Traditional crash count models, such as the Poisson and negative binomial models, do not account for the temporal correlation of crash data. In reality, crashes that occur in the same time frame are likely to share unobserved effects that may have been excluded from the model. If the temporal correlation of crash data is ignored, the estimated parameters can be biased and less precise. Therefore, there is a need to extend the standard crash count data models by incorporating temporal dependence. Whereas the literature for modeling time series count data is well developed, its applications for traffic crash data are limited. A particularly flexible model for the time series of counts is the negative binomial integer-valued generalized autoregressive conditional heteroscedastic (NBINGARCH) model, which properly accounts for the overdispersion, nonnegativity, and integer-valued features of count data. In this paper, the NBINGARCH model is extended to incorporate covariates so that the relationship between a time series of counts and correlated external factors may be properly modeled. The improved performance of the NBINGARCH model is demonstrated through a simulation study and an application to monthly driving under the influence (DUI) fatal crashes in Texas between 2003 and 2009. In addition, the relationship between monthly vehicle miles traveled (VMT) and gasoline prices in Texas is also examined. Ultimately, gasoline prices had no significant effect on DUI fatal crashes in Texas during that time period, and VMT had a positive effect.
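A minimal simulation of an NBINGARCH(1,1)-type recursion may clarify the model class (a standard parameterization in which the conditional mean feeds back on past counts; the paper's covariate extension is not shown, and the parameter values are illustrative):

```python
import numpy as np

def simulate_nbingarch(T=500, omega=2.0, alpha=0.3, beta=0.4, r=5.0, seed=0):
    """Simulate a count series with Y_t | past ~ NegBin with conditional mean
    lam_t = omega + alpha * Y_{t-1} + beta * lam_{t-1} (NBINGARCH(1,1))."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T, dtype=int)
    lam = np.zeros(T)
    lam[0] = omega / (1 - alpha - beta)       # start at the stationary mean
    for t in range(T):
        if t > 0:
            lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
        prob = r / (r + lam[t])               # numpy NegBin: mean = r*(1-p)/p
        y[t] = rng.negative_binomial(r, prob)
    return y, lam

y_sim, lam_sim = simulate_nbingarch()
```

The recursion captures both overdispersion (through the negative binomial) and temporal dependence (through the lagged count and lagged mean), the two features the abstract argues standard Poisson/NB crash models miss.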


Current Neurology and Neuroscience Reports | 2017

Statistical Approaches to Longitudinal Data Analysis in Neurodegenerative Diseases: Huntington’s Disease as a Model

Tanya P. Garcia; Karen Marder

Understanding the overall progression of neurodegenerative diseases is critical to the timing of therapeutic interventions and the design of effective clinical trials. Disease progression can be assessed with longitudinal study designs in which outcomes are measured repeatedly over time and are assessed with respect to risk factors, measured either repeatedly or at baseline. Longitudinal data allow researchers to assess temporal aspects of disease, but the analysis is complicated by complex correlation structures, irregularly spaced visits, missing data, and mixtures of time-varying and static covariate effects. We review modern statistical methods designed for these challenges. Among these methods, the mixed-effects model most flexibly accommodates the challenges and is preferred by the FDA for observational and clinical studies. Examples from Huntington’s disease studies are used for clarification, but the methods apply to neurodegenerative diseases in general, particularly as sensitive biomarkers increasingly identify prodromal forms of neurodegenerative disease.
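A minimal random-intercept example of the mixed-effects modeling described above, using statsmodels on simulated longitudinal scores (the variable names and data are illustrative, not from any of the cited studies):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subj, n_visits = 60, 5
subj = np.repeat(np.arange(n_subj), n_visits)
time = np.tile(np.arange(n_visits, dtype=float), n_subj)
rand_int = rng.normal(0.0, 1.0, n_subj)               # subject-specific intercepts
score = 10 + 1.5 * time + rand_int[subj] + rng.normal(0.0, 0.5, subj.size)
df = pd.DataFrame({"subject": subj, "time": time, "score": score})

# Random-intercept mixed-effects model: a fixed slope for time, with a
# subject-level random intercept absorbing within-subject correlation.
fit = smf.mixedlm("score ~ time", df, groups=df["subject"]).fit()
```

The random intercept is what lets repeated measurements on the same subject be correlated while the fixed effect for time describes average progression.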


The Annals of Applied Statistics | 2014

Combining Isotonic Regression and EM Algorithm to Predict Genetic Risk Under Monotonicity Constraint

Jing Qin; Tanya P. Garcia; Yanyuan Ma; Ming-Xin Tang; Karen Marder; Yuanjia Wang

In certain genetic studies, clinicians and genetic counselors are interested in estimating the cumulative risk of a disease for individuals with and without a rare deleterious mutation. Estimating the cumulative risk is difficult, however, when the estimates are based on family history data. Often, the genetic mutation status of many family members is unknown; instead, only estimated probabilities of a patient having a certain mutation status are available. Also, ages of disease onset are subject to right censoring. Existing methods that estimate the cumulative risk from such family-based data only provide estimates at individual time points and are not guaranteed to be monotonic or non-negative. In this paper, we develop a novel method that combines Expectation-Maximization and isotonic regression to estimate the cumulative risk across the entire support. Our estimator is monotonic, satisfies self-consistent estimating equations, and has high power in detecting differences between the cumulative risks of different populations. Application of our estimator to a Parkinson’s disease (PD) study provides the age-at-onset distribution of PD in PARK2 mutation carriers and non-carriers, and reveals a significant difference between the distributions in compound heterozygous carriers and non-carriers, but not between heterozygous carriers and non-carriers.
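The EM step depends on family-specific carrier probabilities and is not reproduced here, but the isotonic-regression ingredient — a monotone, non-negative fit via the pool-adjacent-violators algorithm — can be sketched with scikit-learn on synthetic risk estimates:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(3)
ages = np.linspace(30, 80, 40)
true_risk = 1 / (1 + np.exp(-(ages - 60) / 6))        # increasing cumulative risk
noisy = np.clip(true_risk + rng.normal(0, 0.08, ages.size), 0, 1)

# Pool-adjacent-violators fit: the result is monotone non-decreasing
# and constrained to [0, 1], as a cumulative risk must be.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, increasing=True)
risk_hat = iso.fit_transform(ages, noisy)
```

Whatever pointwise estimates come out of an earlier step, the isotonic projection enforces the monotonicity and non-negativity that the abstract notes existing estimators lack.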


The American Statistician | 2012

Regressograms and Mean-Covariance Models for Incomplete Longitudinal Data

Tanya P. Garcia; Priya Kohli; Mohsen Pourahmadi

Longitudinal studies are prevalent in biological and social sciences where subjects are measured repeatedly over time. Modeling the correlations and handling missing data are among the most challenging problems in analyzing such data. There are various methods for handling missing data, but data-based and graphical methods for modeling the covariance matrix of longitudinal data are relatively new. We adopt an approach based on the modified Cholesky decomposition of the covariance matrix which handles both challenges. It amounts to formulating parametric models for the regression coefficients of the conditional mean and variance of each measurement given its predecessors. We demonstrate the roles of profile plots and regressograms in formulating joint mean-covariance models for incomplete longitudinal data. Applying these graphical tools to the Fruit Fly Mortality (FFM) data, which has 22% missing values, reveals a logistic curve for the mean function and two different models for the two factors of the modified Cholesky decomposition of the sample covariance matrix. An expectation-maximization algorithm is proposed for estimating the parameters of the mean-covariance models; it performs well for the FFM data and in a simulation study of incomplete longitudinal data.
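The modified Cholesky decomposition at the heart of this approach factors a covariance matrix as T Σ Tᵀ = D, where the rows of T carry the regression coefficients of each measurement on its predecessors and D carries the innovation variances. A small numerical sketch (the AR(1) example is illustrative, not the FFM data):

```python
import numpy as np

def modified_cholesky(sigma):
    """Modified Cholesky decomposition T @ sigma @ T.T = D, with T unit
    lower-triangular and D diagonal. Row t of (I - T) holds the coefficients
    of regressing measurement t on its predecessors; the diagonal of D holds
    the innovation variances."""
    L = np.linalg.cholesky(sigma)        # sigma = L @ L.T, L lower-triangular
    d = np.diag(L)
    C = L / d                            # unit lower-triangular: sigma = C D C^T
    D = np.diag(d ** 2)
    T = np.linalg.inv(C)
    return T, D

# AR(1) covariance as a demo: the lag-one regression coefficient is rho,
# so the first subdiagonal of T is -rho.
rho = 0.6
idx = np.arange(5)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
T, D = modified_cholesky(sigma)
```

The payoff is that the entries of T and D are unconstrained regression coefficients and positive variances, which is what makes them amenable to the parametric modeling and graphical diagnostics (regressograms) described above.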


The Annals of Applied Statistics | 2017

Robust mixed effects model for clustered failure time data: Application to Huntington’s disease event measures

Tanya P. Garcia; Yanyuan Ma; Karen Marder; Yuanjia Wang

An important goal in clinical and statistical research is properly modeling the distribution of clustered failure times, which have a natural intraclass dependency and are subject to censoring. We handle these challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Using a logit transformation, we relate the distribution of clustered failure times to covariates and a random, subject-specific effect. The covariates are modeled with unknown functional forms, and the random effect may depend on the covariates and have an unknown and unspecified distribution. We introduce pseudovalues to handle censoring and splines for functional covariate effects, and frame the problem as fitting an additive logistic mixed effects model. Unlike existing approaches for fitting such models, we develop semiparametric techniques that estimate the functional model parameters without specifying or estimating the random effect distribution. We show both theoretically and empirically that the resulting estimators are consistent for any choice of random effect distribution and any dependency structure between the random effect and covariates. Lastly, we illustrate the method’s utility in an application to a Huntington’s disease study, where our method provides new insights into differences between motor and cognitive impairment event times in at-risk subjects.
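The pseudovalue device replaces censored outcomes with jackknife pseudo-observations so that standard regression machinery applies. A minimal sketch for the survival function at a fixed time, with a deliberately crude Kaplan–Meier helper (this illustrates the general pseudovalue construction, not the paper's clustered estimator):

```python
import numpy as np

def km_survival(times, events, t):
    """Kaplan-Meier survival estimate at time t (no tie handling; a sketch)."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    s = 1.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if ti > t:
            break
        if di:
            s *= 1 - 1 / (n - i)          # n - i subjects still at risk
    return s

def pseudovalues(times, events, t):
    """Jackknife pseudo-observations for S(t): n*S - (n-1)*S_leave-one-out.
    With no censoring these reduce to the indicators 1{T_i > t}."""
    n = len(times)
    full = km_survival(times, events, t)
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        out[i] = n * full - (n - 1) * km_survival(times[mask], events[mask], t)
    return out
```

Each subject, censored or not, receives a pseudo-observation, which can then serve as the response in a (mixed-effects) regression.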


Journal of Econometrics | 2017

Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models

Tanya P. Garcia; Yanyuan Ma

We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.
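The semiparametric efficient score itself is involved; the following only illustrates the problem being solved — the classical attenuation bias that naive regression on a mismeasured covariate suffers, which the paper's estimators avoid (simulated data, known error variance):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.standard_normal(n)                    # true covariate (unobserved)
w = x + rng.normal(0.0, 1.0, n)               # error-prone measurement
y = 2.0 * x + rng.standard_normal(n)          # true slope is 2

# Naive least squares on the mismeasured covariate is attenuated by the
# reliability ratio var(x) / (var(x) + var(u)) = 1/2 here: it targets 1, not 2.
naive_slope = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
```

With equal signal and error variances the naive slope is biased toward zero by a factor of one half, regardless of sample size.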


Journal of Multivariate Analysis | 2016

Modeling the Cholesky factors of covariance matrices of multivariate longitudinal data

Priya Kohli; Tanya P. Garcia; Mohsen Pourahmadi

Modeling the covariance matrix of multivariate longitudinal data is more challenging than its univariate counterpart due to the presence of correlations among multiple responses. The modified Cholesky block decomposition reduces the task of covariance modeling to parsimonious modeling of its two matrix factors: the regression coefficient matrices and the innovation covariance matrices. These parameters are statistically interpretable; however, ensuring positive-definiteness of several (innovation) covariance matrices presents itself as a new challenge. We address this problem using a subclass of Anderson’s (1973) linear covariance models, modeling several covariance matrices as linear combinations of known positive-definite basis matrices with unknown non-negative scalar coefficients. A novelty of this approach is that positive-definiteness is guaranteed by construction; it removes a drawback of Anderson’s model and hence makes linear covariance models more realistic and viable in practice. Maximum likelihood estimates are computed using a simple iterative majorization-minimization algorithm. The estimators are shown to be asymptotically normal and consistent. A simulation and a data example illustrate the applicability of the proposed method in providing good models for the covariance structure of multivariate longitudinal data.
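The positive-definiteness-by-construction idea can be sketched directly: with non-negative coefficients and a strictly positive weight on at least one positive-definite basis matrix, the linear combination is automatically positive-definite (the basis matrices and coefficients below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
q = 4
# Known positive-semidefinite basis matrices: the identity plus rank-one terms.
basis = [np.eye(q)] + [np.outer(v, v) for v in rng.standard_normal((3, q))]
theta = np.array([0.5, 1.0, 0.2, 0.8])        # coefficients, constrained >= 0

# Linear covariance model: Sigma(theta) = sum_k theta_k * B_k. The positive
# weight on the identity bounds the smallest eigenvalue away from zero,
# so no extra positive-definiteness constraint is needed during estimation.
sigma = sum(t * B for t, B in zip(theta, basis))
eigvals = np.linalg.eigvalsh(sigma)
```

This is the drawback of Anderson's original formulation that the construction removes: estimation can proceed over non-negative scalars rather than over the cone of positive-definite matrices.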

Collaboration


Dive into Tanya P. Garcia's collaborations.

Top Co-Authors

Karen Marder

Columbia University Medical Center
