Trang T. Le | Researchain

Publication

Featured research published by Trang T. Le.

Bioinformatics | 2017

Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests

Trang T. Le; W. Kyle Simmons; Masaya Misaki; Jerzy Bodurka; Bill C. White; Jonathan Savitz; Brett A. McKinney

Motivation: Classification of individuals into disease or clinical categories from high‐dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross‐validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p ≫ n), these differential privacy methods are susceptible to overfitting. Methods: We introduce private Evaporative Cooling, a stochastic privacy‐preserving machine learning algorithm that uses Relief‐F for feature selection and random forest for privacy‐preserving classification, and that also prevents overfitting. We relate the privacy‐preserving threshold mechanism to a thermodynamic Maxwell‐Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy‐preserving feature selection. Results: On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy‐preserving methods: the reusable holdout, the reusable holdout with random forest, and private Evaporative Cooling, which uses Relief‐F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting, as judged on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting‐state fMRI data from a study of major depressive disorder. Availability and implementation: Code available at http://insilico.utulsa.edu/software/privateEC. Contact: brett‐[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
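The reusable holdout mentioned in this abstract is commonly implemented with a mechanism known as Thresholdout. The following is a minimal sketch of one simplified Thresholdout query, not the private Evaporative Cooling algorithm itself. The function names, the default threshold and noise scale, and the omission of a query budget are all assumptions made for illustration.

```python
import random


def laplace_noise(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def thresholdout(train_values, holdout_values, threshold=0.04, scale=0.01, rng=None):
    """Answer one query (here: a mean) while limiting holdout leakage.

    train_values / holdout_values are per-sample values of the query,
    e.g. 1.0 for a correct prediction and 0.0 for an incorrect one.
    threshold and scale are illustrative defaults, not values from the paper.
    """
    if rng is None:
        rng = random.Random(0)
    train_mean = sum(train_values) / len(train_values)
    holdout_mean = sum(holdout_values) / len(holdout_values)

    # Compare the train/holdout gap against a noisy threshold.
    noisy_threshold = threshold + laplace_noise(rng, scale)
    if abs(train_mean - holdout_mean) > noisy_threshold:
        # The training estimate looks overfit: reveal a noisy holdout value.
        return holdout_mean + laplace_noise(rng, scale)
    # The estimates agree: answer from the training set only.
    return train_mean


if __name__ == "__main__":
    train_correct = [1.0] * 95 + [0.0] * 5      # hypothetical 95% training accuracy
    holdout_correct = [1.0] * 70 + [0.0] * 30   # hypothetical 70% holdout accuracy
    print(thresholdout(train_correct, holdout_correct))
```

When the training and holdout estimates agree to within the threshold, the holdout set is never consulted directly, which is what allows it to be reused across many adaptive queries.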

Mathematical Biosciences | 2018

Structural and practical identifiability analysis of outbreak models

Necibe Tuncer; Trang T. Le

Estimating the reproduction number of an emerging infectious disease from epidemiological data is becoming increasingly essential for evaluating the current status of an outbreak. However, these studies lack a fundamental prerequisite of the parameter estimation problem, namely the structural identifiability of the epidemic model, which determines whether the model parameters can be uniquely determined from the epidemic data. In this paper, we perform both structural and practical identifiability analysis of classical epidemic models such as SIR (Susceptible-Infected-Recovered), SEIR (Susceptible-Exposed-Infected-Recovered) and an epidemic model with a treatment class (SITR). We perform structural identifiability analysis on these epidemic models using a differential algebra approach to investigate the well-posedness of the parameter estimation problem. Parameters of these models are estimated from different data types, namely prevalence, cumulative incidence and treated individuals. Furthermore, we carry out practical identifiability analysis on these models using Monte Carlo simulations and the Fisher Information Matrix. Our study shows that the SIR model is both structurally and practically identifiable from prevalence data. It is also structurally identifiable from cumulative incidence observations, but due to high correlations between the parameters, it is practically unidentifiable from cumulative incidence data. Furthermore, we found that none of these simple epidemic models is practically identifiable from cumulative incidence data, which is the standard type of epidemiological data provided by the CDC or WHO. Our analysis of the simple SIR model suggests that health agencies, if possible, should report prevalence rather than incidence data.
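To make the two observation types concrete, here is a minimal sketch of the standard SIR model integrated with a forward Euler scheme, recording both prevalence (the number currently infected) and cumulative incidence (the running total of new infections). The parameter values, the step size and the function name are assumptions chosen for illustration, and the sketch does not reproduce the identifiability analysis described in the abstract.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.

    Returns daily prevalence I(t), daily cumulative incidence C(t),
    and the final (S, I, R) state.
    """
    n = s0 + i0
    s, i, r, c = float(s0), float(i0), 0.0, 0.0
    prevalence, cumulative = [i], [c]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
            c += new_infections  # cumulative incidence only ever grows
        prevalence.append(i)
        cumulative.append(c)
    return prevalence, cumulative, (s, i, r)


if __name__ == "__main__":
    # Hypothetical parameters: beta = 0.5/day, gamma = 0.2/day, so R0 = beta/gamma = 2.5.
    prev, cum, final_state = simulate_sir(0.5, 0.2, s0=990, i0=10, days=60)
    for day in (0, 10, 20, 30, 60):
        print(day, round(prev[day], 1), round(cum[day], 1))
```

Prevalence rises and falls with the epidemic curve, while cumulative incidence is a monotone curve that saturates, so the two series carry information about the parameters in different ways.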

International Journal of Critical Infrastructure Protection | 2014

Effect of air travel on the spread of an avian influenza pandemic to the United States

Necibe Tuncer; Trang T. Le

The highly pathogenic avian influenza (HPAI) strain H5N1, which first appeared in Hong Kong in 1997, achieved bird-to-human transmission, causing a severe disease with high mortality in humans [18]. According to the World Health Organization (WHO), a total of 637 cases were reported in fifteen countries, including 378 deaths, corresponding to a case fatality rate of nearly 60% [19]. Avian influenza continues to be one of the deadliest diseases that jump from animals to humans. Epidemiologists believe that it is likely to cause the next major global pandemic, which could kill millions of people. The 2002 outbreak of severe acute respiratory syndrome (SARS) demonstrated that international air travel can significantly influence the global spread of an infectious disease. This paper studies the effects of air travel on the spread of avian influenza from Asian and Australian cities to the United States. A two-city mathematical model involving a pandemic strain is used to derive the basic reproduction number (R0), which determines whether the disease will spread and persist (R0 > 1) or go extinct (R0 < 1). Real air travel data are used to model the disease spread by individuals who are susceptible to or infected with pandemic avian influenza. Analysis of the two-city model helps to understand the dynamics of the spread of pandemic influenza when the cities are connected by air travel. Understanding these effects can help public health officials and policy-makers select appropriate disease control measures. It can also provide guidance to decision-makers on where to implement control measures while conserving precious resources.

bioRxiv | 2018

Labels of aberrant Clusters of Differentiation gene expression in a compendium of systemic lupus erythematosus patients

Trang T. Le; Nigel O. Blackwood; Matthew K. Breitenstein

This author manuscript serves as an extended annotation of gene expression for all known clusters of differentiation (CD) within a compendium of systemic lupus erythematosus (SLE) patients. The overarching goal for this line of research is to enrich the perspective of the CD transcriptome with upstream gene expression features.

Translational Psychiatry | 2018

Identification and replication of RNA-Seq gene network modules associated with depression severity

Trang T. Le; Jonathan Savitz; Hideo Suzuki; Masaya Misaki; T. Kent Teague; Bill C. White; Julie H. Marino; Graham B. Wiley; Patrick M. Gaffney; Wayne C. Drevets; Brett A. McKinney; Jerzy Bodurka

Genomic variation underlying major depressive disorder (MDD) likely involves the interaction and regulation of multiple genes in a network. Data-driven co-expression network module inference has the potential to account for variation within regulatory networks, reduce the dimensionality of RNA-Seq data, and detect significant gene-expression modules associated with depression severity. We performed an RNA-Seq gene co-expression network analysis of mRNA data obtained from the peripheral blood mononuclear cells of unmedicated MDD (n = 78) and healthy control (HC, n = 79) subjects. Across the combined MDD and HC groups, we assigned genes into modules using hierarchical clustering with a dynamic tree cut method and projected the expression data onto a lower-dimensional module space by computing the single-sample gene set enrichment score of each module. We tested the single-sample scores of each module for association with levels of depression severity measured by the Montgomery-Åsberg Depression Rating Scale (MADRS). Independent of MDD status, we identified 23 gene modules from the co-expression network. Two modules were significantly associated with the MADRS score after multiple comparison adjustment (adjusted p = 0.009, 0.028 at 0.05 FDR threshold), and one of these modules replicated in a previous RNA-Seq study of MDD (p = 0.03). The two MADRS-associated modules contain genes previously implicated in mood disorders and show enrichment of apoptosis and B cell receptor signaling. The genes in these modules show a correlation between network centrality and univariate association with depression, suggesting that intramodular hub genes are more likely to be related to MDD than other genes in a module.

Frontiers in Aging Neuroscience | 2018

A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE

Trang T. Le; Rayus Kuplicki; Brett A. McKinney; Hung-Wen Yeh; Wesley K. Thompson; Martin P. Paulus; Tulsa Investigators

Several imaging modalities, including T1-weighted structural imaging, diffusion tensor imaging, and functional MRI, can show chronological age-related changes. Employing machine learning algorithms, an individual's imaging data can predict their age with reasonable accuracy. While details vary according to modality, the general strategy is to: (1) extract image-related features, (2) build a model on a training set that uses those features to predict an individual's age, (3) validate the model on a test dataset, producing a predicted age for each individual, (4) define the “Brain Age Gap Estimate” (BrainAGE) as the difference between an individual's predicted age and his/her chronological age, (5) estimate the relationship between BrainAGE and other variables of interest, and (6) make inferences about those variables and accelerated or delayed brain aging. For example, a group of individuals with overall positive BrainAGE may show signs of accelerated aging in other variables as well. There is inevitably an overestimation of the age of younger individuals and an underestimation of the age of older individuals due to “regression to the mean.” The correlation between chronological age and BrainAGE may significantly impact the relationship between BrainAGE and other variables of interest when they are also related to age. In this study, we examine the detectability of variable effects under different assumptions. We use empirical results from two separate datasets [training = 475 healthy volunteers, aged 18–60 years (259 female); testing = 489 participants including people with mood/anxiety, substance use, eating disorders and healthy controls, aged 18–56 years (312 female)] to inform simulation parameter selection. Outcomes in simulated and empirical data strongly support the proposal that models incorporating BrainAGE should include chronological age as a covariate. We propose either including age as a covariate in step 5 of the above framework, or employing a multistep procedure where BrainAGE is regressed on age prior to step 5, producing BrainAGE Residualized (BrainAGER) scores.
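The residualization idea can be illustrated with a small simulation. The sketch below assumes a hypothetical age predictor that shrinks toward the mean age (the shrinkage factor 0.7, noise level and sample size are invented for illustration), computes BrainAGE, then removes the linear dependence on chronological age by ordinary least squares. It is an illustration of the concept, not the authors' simulation framework.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (slope, intercept) of the least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
age = [rng.uniform(18, 60) for _ in range(500)]

# Hypothetical predictor exhibiting regression to the mean:
# predictions are pulled toward age 39 and carry Gaussian error.
predicted_age = [39 + 0.7 * (a - 39) + rng.gauss(0, 5) for a in age]

# Step 4: BrainAGE = predicted age minus chronological age.
brain_age_gap = [p - a for p, a in zip(predicted_age, age)]

# Residualize BrainAGE with respect to age to obtain BrainAGER-style scores.
slope, intercept = ols(age, brain_age_gap)
brainager = [g - (intercept + slope * a) for g, a in zip(brain_age_gap, age)]

if __name__ == "__main__":
    print("corr(age, BrainAGE): ", round(pearson(age, brain_age_gap), 3))
    print("corr(age, BrainAGER):", round(pearson(age, brainager), 3))
```

In this toy setting, raw BrainAGE is negatively correlated with age purely because of the shrinkage in the predictor, while the residualized scores are uncorrelated with age by construction.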

Bioinformatics | 2018

STatistical Inference Relief (STIR) feature selection

Trang T. Le; Ryan J. Urbanowicz; Jason H. Moore; Brett A. McKinney

Motivation: Relief is a family of machine learning algorithms that uses nearest‐neighbors to select features whose association with an outcome may be due to epistasis or statistical interactions with other features in high‐dimensional data. Relief‐based estimators are non‐parametric in the statistical sense that they do not have a parameterized model with an underlying probability distribution for the estimator, making it difficult to determine the statistical significance of Relief‐based attribute estimates. Thus, a statistical inferential formalism is needed to avoid imposing arbitrary thresholds to select the most important features. We reconceptualize the Relief‐based feature selection algorithm to create a new family of STatistical Inference Relief (STIR) estimators that retains the ability to identify interactions while incorporating sample variance of the nearest neighbor distances into the attribute importance estimation. This variance permits the calculation of statistical significance of features and adjustment for multiple testing of Relief‐based scores. Specifically, we develop a pseudo t‐test version of Relief‐based algorithms for case‐control data. Results: We demonstrate the statistical power and control of type I error of the STIR family of feature selection methods on a panel of simulated data that exhibits properties reflected in real gene expression data, including main effects and network interaction effects. We compare the performance of STIR when the adaptive radius method is used as the nearest neighbor constructor with STIR when the fixed‐k nearest neighbor constructor is used. We apply STIR to real RNA‐Seq data from a study of major depressive disorder and discuss STIR's straightforward extension to genome‐wide association studies. Availability and implementation: Code and data available at http://insilico.utulsa.edu/software/STIR. Supplementary information: Supplementary data are available at Bioinformatics online.
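The idea of a pseudo t-test on nearest-neighbor differences can be sketched as follows. For each attribute, the absolute differences to the k nearest hits (same class) and k nearest misses (other class) are collected, and a two-sample t-statistic compares the miss and hit means. The pooled-variance form of the statistic, the fixed-k Manhattan-distance neighborhoods, the function names and the toy data are assumptions for illustration. The authors' implementation is at the URL given in the abstract.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def sample_var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pseudo_t_scores(X, y, k=3):
    """Return one t-like score per attribute from hit/miss neighbor differences."""
    n, p = len(X), len(X[0])
    hit_diffs = [[] for _ in range(p)]
    miss_diffs = [[] for _ in range(p)]
    for i in range(n):
        # Rank all other samples by Manhattan distance to sample i.
        ranked = sorted(
            (sum(abs(a - b) for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        hits = [j for _, j in ranked if y[j] == y[i]][:k]
        misses = [j for _, j in ranked if y[j] != y[i]][:k]
        for a in range(p):
            hit_diffs[a] += [abs(X[i][a] - X[j][a]) for j in hits]
            miss_diffs[a] += [abs(X[i][a] - X[j][a]) for j in misses]

    scores = []
    for a in range(p):
        H, M = hit_diffs[a], miss_diffs[a]
        # Pooled standard deviation of the hit and miss differences (assumed form).
        pooled = math.sqrt(
            ((len(M) - 1) * sample_var(M) + (len(H) - 1) * sample_var(H))
            / (len(M) + len(H) - 2)
        )
        scores.append((mean(M) - mean(H)) / (pooled * math.sqrt(1 / len(M) + 1 / len(H))))
    return scores


# Toy case-control data: attribute 0 tracks the class label, attribute 1 is noise.
rng = random.Random(0)
y = [0] * 20 + [1] * 20
X = [[label + rng.gauss(0, 0.1), rng.random()] for label in y]
scores = pseudo_t_scores(X, y, k=3)

if __name__ == "__main__":
    print([round(s, 2) for s in scores])
```

Because each score is a standardized difference rather than a raw mean difference, it can in principle be compared against a reference distribution and adjusted for multiple testing, which is the motivation the abstract gives for incorporating the sample variance.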

IEEE International Conference on Healthcare Informatics | 2018