Publication


Featured research published by Tin Nguyen.


Bioinformatics | 2016

A novel bi-level meta-analysis approach: applied to biological pathway analysis.

Tin Nguyen; Rebecca Tagett; Michele Donato; Cristina Mitrea; Sorin Draghici

MOTIVATION: The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenges in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. RESULTS: We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework with classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as with a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases: acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors in correctly identifying pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. AVAILABILITY AND IMPLEMENTATION: The R scripts are available on demand from the authors. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
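
The classical combination rules named in this abstract can be illustrated with a short sketch. This is not the authors' R implementation; the function names are hypothetical, and the code only assumes independent P-values that are uniform under the null hypothesis. Fisher's method uses the closed-form chi-square tail for even degrees of freedom, Stouffer's method sums z-scores, the additive method uses the exact distribution of a sum of uniforms (Irwin-Hall), and the last function is the Central Limit Theorem approximation to the additive method.

```python
import math
from statistics import NormalDist

def fisher(pvals):
    """Fisher's method: -2*sum(ln p) follows chi-square with 2k degrees of freedom."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    # Closed-form upper tail of a chi-square variable with even (2k) degrees of freedom.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def stouffer(pvals):
    """Stouffer's method: sum of z-scores scaled by sqrt(k). Requires 0 < p < 1."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)

def additive(pvals):
    """Additive method: exact lower tail of a sum of k uniforms (Irwin-Hall).
    The alternating sum is numerically reliable only for small k."""
    k, s = len(pvals), sum(pvals)
    total = sum((-1) ** j * math.comb(k, j) * (s - j) ** k
                for j in range(math.floor(s) + 1))
    return total / math.factorial(k)

def clt_additive(pvals):
    """CLT approximation: the mean of k uniforms is roughly N(0.5, 1/(12k))."""
    k = len(pvals)
    return NormalDist(0.5, math.sqrt(1.0 / (12 * k))).cdf(sum(pvals) / k)

if __name__ == "__main__":
    # Four unremarkable studies plus one extreme outlier (made-up values).
    pvals = [0.5, 0.5, 0.5, 0.5, 1e-10]
    for method in (fisher, stouffer, additive, clt_additive):
        print(method.__name__, method(pvals))
```

With the made-up input above, a single extreme P-value drives Fisher's combined value far below conventional thresholds, while the additive value stays large, which is the outlier sensitivity the abstract refers to.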


bioRxiv | 2017

Community assessment of cancer drug combination screens identifies strategies for synergy prediction

Michael P. Menden; Dennis Wang; Yuanfang Guan; Michael Mason; Bence Szalai; Krishna C Bulusu; Thomas Yu; Jaewoo Kang; Minji Jeon; Russ Wolfinger; Tin Nguyen; Mikhail Zaslavskiy; In Sock Jang; Zara Ghazoui; Mehmet Eren Ahsen; Robert Vogel; Elias Chaibub Neto; Thea Norman; Eric Tang; Mathew J. Garnett; Giovanni Y. Di Veroli; Stephen Fawell; Gustavo Stolovitzky; Justin Guinney; Jonathan R. Dry; Julio Saez-Rodriguez

In the last decade, advances in genomics, uptake of targeted therapies, and the advent of personalized treatments have fueled a dramatic change in cancer care. However, the effectiveness of most targeted therapies is short lived, as tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance. The space of possible combinations is vast, and significant advances are required to effectively find optimal treatment regimens tailored to a patient’s tumor. DREAM and AstraZeneca hosted a Challenge open to the scientific community aimed at computational prediction of synergistic drug combinations and predictive biomarkers associated with these combinations. We released a data set comprising ~11,500 experimentally tested drug combinations, coupled to deep molecular characterization of the respective 85 cancer cell lines. Among 150 submitted approaches, those that incorporated prior knowledge of putative drug targets showed superior performance predicting drug synergy across independent data. Genomic features of best-performing models revealed putative mechanisms of drug synergy for multiple drugs in combination with PI3K/AKT pathway inhibitors.

The effectiveness of most cancer targeted therapies is short lived since tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance; however, the number of possible combinations is vast, necessitating data-driven approaches to find optimal treatments tailored to a patient’s tumor. AstraZeneca carried out 11,576 experiments on 910 drug combinations across 85 cancer cell lines, recapitulating in vivo response profiles. These data, the largest openly available screen, were hosted by DREAM alongside deep molecular characterization from the Sanger Institute for a Challenge to computationally predict synergistic drug pairs and associated biomarkers. 160 teams participated to provide the most comprehensive methodological development and subsequent benchmarking to date. Winning methods incorporated prior knowledge of putative drug target interactions. For >60% of drug combinations, synergy was reproducibly predicted with an accuracy matching biological replicate experiments; however, 20% of drug combinations were poorly predicted by all methods. Genomic rationales for synergy predictions were identified, including antagonism unique to combined PIK3CB/D inhibition with the ADAM17 inhibitor where synergy is seen with other PI3K pathway inhibitors. All data, methods and code are freely available as a resource to the community.


Proceedings of the IEEE | 2017

DANUBE: Data-Driven Meta-ANalysis Using UnBiased Empirical Distributions—Applied to Biological Pathway Analysis

Tin Nguyen; Cristina Mitrea; Rebecca Tagett; Sorin Draghici

Identifying the pathways and mechanisms that are significantly impacted in a given phenotype is challenging. Issues include patient heterogeneity and noise. Many experiments do not have a large enough sample size to achieve the statistical power necessary to identify significantly impacted pathways. Meta-analysis based on combining p-values from individual experiments has been used to improve power. However, all classical meta-analysis approaches work under the assumption that the p-values produced by experiment-level statistical tests follow a uniform distribution under the null hypothesis. Here, we show that this assumption does not hold for three mainstream pathway analysis methods, and significant bias is likely to affect many, if not all, such meta-analysis studies. We introduce DANUBE, a novel and unbiased approach to combine statistics computed from individual studies. Our framework uses control samples to construct empirical null distributions, from which empirical p-values of individual studies are calculated and combined using either a Central Limit Theorem approach or the additive method. We assess the performance of DANUBE using four different pathway analysis methods. DANUBE is compared with five meta-analysis approaches, as well as with a pathway analysis approach that employs multiple datasets (MetaPath). The 25 approaches have been tested on 16 different datasets related to two human diseases, Alzheimer's disease (7 datasets) and acute myeloid leukemia (9 datasets). We demonstrate that DANUBE overcomes bias in order to consistently identify relevant pathways. We also show how the framework improves results in more general cases, compared to classical meta-analysis performed with common experiment-level statistical tests such as Wilcoxon and t-test.
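
The idea of deriving empirical p-values from a null distribution built on control samples can be sketched as follows. This is a minimal illustration rather than the DANUBE implementation: the function names, the random-split scheme, the mean-difference statistic and the add-one correction are all assumptions made for the example.

```python
import random

def null_from_controls(controls, statistic, n_draws=1000, seed=0):
    """Build an empirical null by repeatedly splitting control samples into two
    pseudo-groups and computing the statistic (hypothetical scheme)."""
    rng = random.Random(seed)
    half = len(controls) // 2
    null_stats = []
    for _ in range(n_draws):
        shuffled = rng.sample(controls, len(controls))  # random permutation
        null_stats.append(statistic(shuffled[:half], shuffled[half:]))
    return null_stats

def empirical_pvalue(observed, null_stats):
    """Fraction of null statistics at least as extreme as the observed one.
    The add-one correction (an assumed convention) keeps the p-value above zero."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)

def mean_difference(a, b):
    """Illustrative statistic: absolute difference between group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

if __name__ == "__main__":
    controls = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]  # made-up control values
    null_stats = null_from_controls(controls, mean_difference)
    print(empirical_pvalue(0.6, null_stats))
```

Empirical p-values obtained this way for each study could then be combined with the additive method or a Central Limit Theorem approximation, as the abstract describes.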


Scientific Reports | 2016

Overcoming the matched-sample bottleneck: an orthogonal approach to integrate omic data

Tin Nguyen; Diana Diaz; Rebecca Tagett; Sorin Draghici

MicroRNAs (miRNAs) are small non-coding RNA molecules whose primary function is to regulate the expression of gene products via hybridization to mRNA transcripts, resulting in suppression of translation or mRNA degradation. Although miRNAs have been implicated in complex diseases, including cancer, their impact on distinct biological pathways and phenotypes is largely unknown. Current integration approaches require sample-matched miRNA/mRNA datasets, resulting in limited applicability in practice. Since these approaches cannot integrate heterogeneous information available across independent experiments, they neither account for bias inherent in individual studies, nor do they benefit from increased sample size. Here we present a novel framework able to integrate miRNA and mRNA data (vertical data integration) available in independent studies (horizontal meta-analysis) allowing for a comprehensive analysis of the given phenotypes. To demonstrate the utility of our method, we conducted a meta-analysis of pancreatic and colorectal cancer, using 1,471 samples from 15 mRNA and 14 miRNA expression datasets. Our two-dimensional data integration approach greatly increases the power of statistical analysis and correctly identifies pathways known to be implicated in the phenotypes. The proposed framework is sufficiently general to integrate other types of data obtained from high-throughput assays.


Bioinformatics and Biomedicine | 2011

SPATA: A seeding and patching algorithm for de novo transcriptome assembly

Zhiyu Zhao; Tin Nguyen; Nan Deng; Kristen Johnson; Dongxiao Zhu

RNA-seq reads are sampled from the underlying human transcriptome sequence, consisting of hundreds of thousands of mRNA transcripts. De novo transcriptome reconstruction from RNA-seq reads is a promising approach but faces algorithmic and computational challenges arising from nonlinear transcript structures and ultra high-throughput read counts. To tackle this issue, we designed a divide-and-conquer strategy to perform read localization followed by a novel algorithm to assemble reads de novo. Using simulation studies, we have demonstrated high accuracy in reconstructing transcriptome structures.


Genome Research | 2017

A novel approach for data integration and disease subtyping

Tin Nguyen; Rebecca Tagett; Diana Diaz; Sorin Draghici

Advances in high-throughput technologies allow for measurements of many types of omics data, yet the meaningful integration of several different data types remains a significant challenge. Another important and difficult problem is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. Here we present a novel approach, called perturbation clustering for data integration and disease subtyping (PINS), which is able to address both challenges. The framework has been validated on thousands of cancer samples, using gene expression, DNA methylation, noncoding microRNA, and copy number variation data available from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. This simultaneous subtyping approach accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. The results were obtained from genome-scale molecular data without any other type of prior knowledge. The approach is sufficiently general to replace existing unsupervised clustering approaches outside the scope of bio-medical research, with the additional ability to integrate multiple types of data.


International Conference on Bioinformatics | 2013

MarkovBin: An Algorithm to Cluster Metagenomic Reads Using a Mixture Modeling of Hierarchical Distributions

Tin Nguyen; Dongxiao Zhu

Metagenomics is the study of genomic content of microorganisms from environmental samples without isolation and cultivation. Recently developed next generation sequencing (NGS) technologies efficiently generate vast amounts of metagenomic DNA sequences. However, the ultra-high throughput and short read lengths make the separation of reads from different species more challenging. Among the existing computational tools for NGS data, there are supervised methods that use reference databases to classify reads and unsupervised methods that use oligonucleotide patterns to cluster reads. The former may leave a large fraction of reads unclassified due to the absence of closely related references. The latter often rely on long oligonucleotide frequencies and are sensitive to species abundance levels. In this work, we present MarkovBin, a new unsupervised method that can accurately cluster metagenomic reads across various species abundance ratios. We first model the nucleotide sequences as a fixed-order Markov chain. We then propose a hierarchical distribution to model the dependency between paired-end reads. Finally, we employ the mixture model framework to separate reads from different genomes in a metagenomic dataset. Using extensive simulation data, we demonstrate high accuracy and precision in comparison with selected unsupervised read clustering tools. The software is freely available at http://orleans.cs.wayne.edu/MarkovBin.
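
To make the first modeling step concrete, the sketch below fits a fixed-order Markov chain to nucleotide sequences and scores a read by its log-likelihood. It is not the MarkovBin code: the function names, the pseudocount smoothing and the toy sequences are assumptions, and the hierarchical paired-end model and the mixture model are not shown.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def fit_markov(sequences, order=2, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from training sequences.
    Returns a function prob(context, base); the pseudocount is an assumed smoothing choice."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})  # unseen contexts fall back to a uniform estimate
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(read, prob, order=2):
    """Log-probability of a read under the chain, conditioning on its first `order` bases."""
    return sum(math.log(prob(read[i - order:i], read[i]))
               for i in range(order, len(read)))

if __name__ == "__main__":
    genome_a = fit_markov(["ACACACACACACAC"], order=1)  # toy "genome" rich in AC repeats
    genome_b = fit_markov(["GGGTGGGTGGGTGG"], order=1)  # toy "genome" rich in G
    read = "CACACA"
    # A read would be assigned to the model under which it is more likely.
    print(log_likelihood(read, genome_a, order=1), log_likelihood(read, genome_b, order=1))
```

In a mixture-model setting, per-genome likelihoods like these would feed the assignment of reads to clusters, with the model parameters re-estimated iteratively.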


Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine | 2012

QSEA for fuzzy subgraph querying of KEGG pathways

Thair Judeh; Tin Nguyen; Dongxiao Zhu

As biological pathway databases continually increase in size and availability, efficient tools and techniques to query these databases are needed to mine useful biological information. A plethora of existing techniques already allow for exact or approximate query matching. Despite initial success, powerful techniques used for XML and RDF query matching have yet to be sufficiently exploited for use in query matching in the bioinformatics domain. In this paper, we employ the transitive closure to focus on matching hierarchical queries, i.e., finding pathways or graphs that possess a query's overall hierarchical structure. This approach allows for a greater latitude in fuzzy matching by focusing on the overall hierarchies of queries and graphs. Since hierarchies are only inherent in directed acyclic graphs, we have also developed a robust heuristic for the minimum feedback arc set problem. Analyses of 53 H. sapiens and 23 S. cerevisiae cyclic KEGG pathways have shown that our heuristic performs quite favorably. We have implemented the techniques in an easy-to-use GUI application, QSEA (Query Structure Enrichment Analysis). Binaries are freely available at http://code.google.com/p/s-e-a/ for Windows and Mac.
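
The role of the transitive closure in hierarchical matching can be illustrated with a small sketch. This is not the QSEA implementation: the function names, the adjacency-list representation and the toy pathway are hypothetical, and the feedback arc set heuristic is not shown. A query edge is treated as matched when its target is reachable from its source in the pathway, even through intermediate nodes.

```python
def transitive_closure(graph):
    """Map each node to the set of nodes reachable from it.
    `graph` is an adjacency list: {node: [successor, ...]}."""
    closure = {}
    for start in graph:
        reachable = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node not in reachable:
                reachable.add(node)
                stack.extend(graph.get(node, ()))
        closure[start] = reachable
    return closure

def matches_hierarchy(query_edges, closure):
    """True if every query edge (src, dst) is preserved as reachability in the pathway."""
    return all(dst in closure.get(src, set()) for src, dst in query_edges)

if __name__ == "__main__":
    # Toy pathway (made-up gene names): A activates B, B activates C and D.
    pathway = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
    closure = transitive_closure(pathway)
    # The query skips the intermediate node B but keeps the same hierarchy.
    print(matches_hierarchy([("A", "C"), ("A", "D")], closure))
    print(matches_hierarchy([("C", "A")], closure))
```

Matching against the closure instead of the raw edges is what gives the fuzzy latitude described above: a query does not need to list every intermediate node of the pathway.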


Bioinformatics and Biomedicine | 2011

iQuant: A fast yet accurate GUI tool for transcript quantification

Tin Nguyen; Nan Deng; Guorong Xu; Zhansheng Duan; Dongxiao Zhu

Transcript quantification using RNA-seq is central to contemporary and future transcriptomics research. The existing tools are useful but have much room for improvement. We present a new statistical model, a fast yet accurate transcript quantification algorithm. Our tool takes RNA-seq reads in fasta or fastq format as input and outputs transcript abundance through a few mouse clicks. Our method compares favorably with the existing GUI tools in terms of both time complexity and accuracy. Availability: Both simulation data used for method comparisons and the GUI tool are freely available at http://asammate.sourceforge.net/.


Briefings in Bioinformatics | 2018

A survey of the approaches for identifying differential methylation using bisulfite sequencing data

Adib Shafi; Cristina Mitrea; Tin Nguyen; Sorin Draghici

DNA methylation is an important epigenetic mechanism that plays a crucial role in cellular regulatory systems. Recent advancements in sequencing technologies now enable us to generate high-throughput methylation data and to measure methylation up to single-base resolution. This wealth of data does not come without challenges, and one of the key challenges in DNA methylation studies is to identify the significant differences in the methylation levels of the base pairs across distinct biological conditions. Several computational methods have been developed to identify differential methylation using bisulfite sequencing data; however, there is no clear consensus among existing approaches. A comprehensive survey of these approaches would be of great benefit to potential users and researchers to get a complete picture of the available resources. In this article, we present a detailed survey of 22 such approaches focusing on their underlying statistical models, primary features, key advantages and major limitations. Importantly, the intrinsic drawbacks of the approaches pointed out in this survey could potentially be addressed by future research.

Collaboration


Dive into Tin Nguyen's collaborations.

Top Co-Authors

Diana Diaz
Wayne State University

Nan Deng
Wayne State University

Zhiyu Zhao
University of Texas Southwestern Medical Center

Adib Shafi
Wayne State University