Tianwei Yu
Emory University
Publication
Featured research published by Tianwei Yu.
BMC Genomics | 2008
Hui Ye; Tianwei Yu; Stéphane Temam; Barry L. Ziober; Jianguang Wang; Joel L. Schwartz; Li Mao; David T. Wong; Xiaofeng Zhou
Background: Head and neck/oral squamous cell carcinoma (HNOSCC) is a diverse group of cancers, which develop from many different anatomic sites and are associated with different risk factors and genetic characteristics. The oral tongue squamous cell carcinoma (OTSCC) is one of the most common types of HNOSCC. It is significantly more aggressive than other forms of HNOSCC, in terms of local invasion and spread. In this study, we aim to identify specific transcriptomic signatures that are associated with OTSCC. Results: Genome-wide transcriptomic profiles were obtained for 53 primary OTSCCs and 22 matching normal tissues. Genes that exhibit statistically significant differences in expression between OTSCCs and normal tissues were identified. These include up-regulated genes (MMP1, MMP10, MMP3, MMP12, PTHLH, INHBA, LAMC2, IL8, KRT17, COL1A2, IFI6, ISG15, PLAU, GREM1, MMP9, IFI44, CXCL1), and down-regulated genes (KRT4, MAL, CRNN, SCEL, CRISP3, SPINK5, CLCA4, ADH1B, P11, TGM3, RHCG, PPP1R3C, CEACAM7, HPGD, CFD, ABCA8, CLU, CYP3A5). The expression differences of IL8 and MMP9 were further validated by real-time quantitative RT-PCR and immunohistochemistry. The Gene Ontology analysis suggested a number of altered biological processes in OTSCCs, including enhancements in phosphate transport, collagen catabolism, I-kappaB kinase/NF-kappaB signaling cascade, extracellular matrix organization and biogenesis, and chemotaxis, as well as suppression of superoxide release, hydrogen peroxide metabolism, cellular response to hydrogen peroxide, keratinization, and keratinocyte differentiation. Conclusion: In summary, our study provided a transcriptomic signature for OTSCC that may lead to a diagnostic or screening tool and provide the foundation for further functional validation of these specific candidate genes for OTSCC.
BioMed Research International | 2015
Kai Wang; Qing Zhao; Jianwei Lu; Tianwei Yu
With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. Nonlinear relations, which have been largely underused compared with linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL) that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
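The abstract above does not spell out how DCOL is computed. The following is a minimal sketch, assuming DCOL is the mean absolute difference between consecutive y values after the pairs are sorted by x. The function name `dcol` and the simulated data are illustrative assumptions, not taken from the authors' software. It shows why such a measure can pick up a nonlinear dependence.

```python
import numpy as np

def dcol(x, y):
    """Assumed DCOL form: sort the pairs by x, then average |y[i+1] - y[i]|.

    A small value means y changes smoothly along the ordering induced by x,
    which suggests y depends on x (linearly or not).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")  # ordering of the samples by x
    return float(np.mean(np.abs(np.diff(y[order]))))

# Simulated illustration (hypothetical data).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y_dependent = np.sin(2 * x) + rng.normal(0, 0.1, 500)  # nonlinear function of x plus noise
y_independent = rng.normal(0, 1, 500)                  # unrelated to x

d_dep = dcol(x, y_dependent)
d_ind = dcol(x, y_independent)
print(d_dep, d_ind)
```

Under this assumed definition the measure is asymmetric (`dcol(x, y)` generally differs from `dcol(y, x)`), and in practice the variables would likely be standardized before comparison.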
Bioinformatics | 2009
Tianwei Yu; Youngja Park; Jennifer M. Johnson; Dean P. Jones
MOTIVATION: Liquid chromatography-mass spectrometry (LC/MS) profiling is a promising approach for the quantification of metabolites from complex biological samples. Significant challenges exist in the analysis of LC/MS data, including noise reduction, feature identification/quantification, feature alignment and computation efficiency. RESULT: Here we present a set of algorithms for the processing of high-resolution LC/MS data. The major technical improvements include the adaptive tolerance level searching rather than hard cutoff or binning, the use of non-parametric methods to fine-tune intensity grouping, the use of run filter to better preserve weak signals and the model-based estimation of peak intensities for absolute quantification. The algorithms are implemented in an R package apLCMS, which can efficiently process large LC/MS datasets. AVAILABILITY: The R package apLCMS is available at www.sph.emory.edu/apLCMS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Nucleic Acids Research | 2009
Wei Sun; Fred A. Wright; Zhengzheng Tang; Silje H. Nordgard; Peter Van Loo; Tianwei Yu; Vessela N. Kristensen; Charles M. Perou
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
BMC Bioinformatics | 2013
Karan Uppal; Quinlyn A. Soltow; Frederick H. Strobel; W. Stephen Pittard; Kim M. Gernert; Tianwei Yu; Dean P. Jones
Background: Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. Results: xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. Conclusions: xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
Nucleic Acids Research | 2006
Yi Xing; Tianwei Yu; Ying Nian Wu; Meenakshi Roy; Joseph Kim; Christopher Lee
Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms.
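The abstract above does not give the likelihood or update equations, so the following is only a simplified sketch of the general idea: the standard EM iteration for mixture proportions when each fragment is known to be compatible with a subset of candidate isoforms. It ignores the splice graph, fragment length and position, and is not the paper's algorithm. The function name `em_isoform_proportions` and the toy compatibility matrix are hypothetical.

```python
import numpy as np

def em_isoform_proportions(compat, n_iter=200, tol=1e-10):
    """Estimate isoform proportions by EM (simplified, assumed model).

    compat[i, j] is 1 if fragment i is consistent with isoform j, else 0.
    Every fragment must be consistent with at least one isoform.
    """
    compat = np.asarray(compat, dtype=float)
    n_frag, n_iso = compat.shape
    theta = np.full(n_iso, 1.0 / n_iso)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: posterior probability that fragment i came from isoform j.
        weighted = compat * theta
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities.
        new_theta = resp.sum(axis=0) / n_frag
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta

# Toy input (hypothetical): 6 fragments match only isoform 0, 2 match only
# isoform 1, and 2 are ambiguous between the two.
compat = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 2
theta = em_isoform_proportions(compat)
print(theta)
```

In this toy case the ambiguous fragments are split between the two isoforms in proportion to the current estimates, which is how EM handles fragments that cannot be assigned uniquely.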
PLOS Pathogens | 2012
Jessica L. Prince; Daniel T. Claiborne; Jonathan M. Carlson; Malinda Schaefer; Tianwei Yu; Shabir Lahki; Heather A. Prentice; Ling Yue; Sundaram A. Vishwanathan; William Kilembe; Paul A. Goepfert; Matthew Price; Jill Gilmour; Joseph Mulenga; Paul Farmer; Cynthia A. Derdeyn; Jiaming Tang; David Heckerman; Richard A. Kaslow; Susan Allen; Eric Hunter
Initial studies of 88 transmission pairs in the Zambia Emory HIV Research Project cohort demonstrated that the number of transmitted HLA-B associated polymorphisms in Gag, but not Nef, was negatively correlated to set point viral load (VL) in the newly infected partners. These results suggested that accumulation of CTL escape mutations in Gag might attenuate viral replication and provide a clinical benefit during early stages of infection. Using a novel approach, we have cloned gag sequences isolated from the earliest seroconversion plasma sample from the acutely infected recipient of 149 epidemiologically linked Zambian transmission pairs into a primary isolate, subtype C proviral vector, MJ4. We determined the replicative capacity (RC) of these Gag-MJ4 chimeras by infecting the GXR25 cell line and quantifying virion production in supernatants via a radiolabeled reverse transcriptase assay. We observed a statistically significant positive correlation between RC conferred by the transmitted Gag sequence and set point VL in newly infected individuals (p = 0.02). Furthermore, the RC of Gag-MJ4 chimeras also correlated with the VL of chronically infected donors near the estimated date of infection (p = 0.01), demonstrating that virus replication contributes to VL in both acute and chronic infection. These studies also allowed for the elucidation of novel sites in Gag associated with changes in RC, where rare mutations had the greatest effect on fitness. Although we observed both advantageous and deleterious rare mutations, the latter could point to vulnerable targets in the HIV-1 genome. Importantly, RC correlated significantly (p = 0.029) with the rate of CD4+ T cell decline over the first 3 years of infection in a manner that is partially independent of VL, suggesting that the replication capacity of HIV-1 during the earliest stages of infection is a determinant of pathogenesis beyond what might be expected based on set point VL alone.
Bioinformatics | 2005
Tianwei Yu; Ker-Chau Li
MOTIVATION: Microarray gene expression and cross-linking chromatin immunoprecipitation data contain voluminous information that can help the identification of transcriptional regulatory networks at the full genome scale. Such high-throughput data are noisy, however. In contrast, from the biomedical literature, we can find many evidenced transcription factor (TF)-target gene binding relationships that have been elucidated at the molecular level. But such sporadically generated knowledge only offers glimpses of limited patches of the network. How to incorporate this valuable knowledge resource to build more reliable network models remains a question. RESULTS: We present a modified factor analysis approach. Our algorithm starts with the evidenced TF-gene linkages. It iterates between the network configuration estimation step and the connection strength estimation step, using the high-throughput data, until convergence. We report two comprehensive regulatory networks obtained for Saccharomyces cerevisiae, one under the normal growth condition and the other under the environmental stress condition. SUPPLEMENTARY INFORMATION: http://kiefer.stat.ucla.edu/lap2/download/bti656_supplement.pdf.
PLOS ONE | 2012
Toidi Adekambi; Chris Ibegbu; Ameeta S. Kalokhe; Tianwei Yu; Susan M. Ray; Jyothi Rengarajan
Two billion people worldwide are estimated to be latently infected with Mycobacterium tuberculosis (Mtb) and are at risk for developing active tuberculosis since Mtb can reactivate to cause TB disease in immune-compromised hosts. Individuals with latent Mtb infection (LTBI) and BCG-vaccinated individuals who are uninfected with Mtb, harbor antigen-specific memory CD4+ T cells. However, the differences between long-lived memory CD4+ T cells induced by latent Mtb infection (LTBI) versus BCG vaccination are unclear. In this study, we characterized the immune phenotype and functionality of antigen-specific memory CD4+ T cells in healthy BCG-vaccinated individuals who were either infected (LTBI) or uninfected (BCG) with Mtb. Individuals were classified into LTBI and BCG groups based on IFN-γ ELISPOT using cell wall antigens and ESAT-6/CFP-10 peptides. We show that LTBI individuals harbored high frequencies of late-stage differentiated (CD45RA−CD27−) antigen-specific effector memory CD4+ T cells that expressed PD-1. In contrast, BCG individuals had primarily early-stage (CD45RA−CD27+) cells with low PD-1 expression. CD27+ and CD27− as well as PD-1+ and PD-1− antigen-specific subsets were polyfunctional, suggesting that loss of CD27 expression and up-regulation of PD-1 did not compromise their capacity to produce IFN-γ, TNF-α and IL-2. PD-1 was preferentially expressed on CD27− antigen-specific CD4+ T cells, indicating that PD-1 is associated with the stage of differentiation. Using statistical models, we determined that CD27 and PD-1 predicted LTBI versus BCG status in healthy individuals and distinguished LTBI individuals from those who had clinically resolved Mtb infection after anti-tuberculosis treatment. This study shows that CD4+ memory responses induced by latent Mtb infection, BCG vaccination and clinically resolved Mtb infection are immunologically distinct. 
Our data suggest that differentiation into CD27−PD-1+ subsets in LTBI is driven by Mtb antigenic stimulation in vivo and that CD27 and PD-1 have the potential to improve our ability to evaluate true LTBI status.
Analyst | 2010
Jennifer M. Johnson; Tianwei Yu; Frederick H. Strobel; Dean P. Jones
Information-rich technologies have advanced personalized medicine, yet obstacles limit measurement of large numbers of chemicals in human samples. Current laboratory tests measure hundreds of chemicals based upon existing knowledge of exposures, metabolism and disease mechanisms. Practical issues of cost and throughput preclude measurement of thousands of chemicals. Additionally, individuals are genetically diverse and have different exposures and response characteristics; some have disease mechanisms that have not yet been elucidated. Consequently, methods are needed to detect unique metabolic characteristics without presumption of known pathways, exposures or disease mechanisms, i.e., using a top-down approach. In this report, we describe profiling of human plasma with liquid chromatography (LC) coupled to Fourier-transform mass spectrometry (FTMS). FTMS is a high-resolution mass spectrometer providing mass accuracy and resolution to discriminate thousands of m/z features, which are peaks defined by m/z, retention time and intensity. We demonstrate that LC-FTMS detects 2000 m/z features in 10 min. These features include known and unidentified chemicals with m/z between 85 and 850, most with <10% coefficient of variation. Comparison of metabolic profiles for 4 healthy individuals showed that 62% of the m/z features were common while 10% were unique and 770 discriminated the individuals. Because the simple one-step extraction and automated analysis is rapid and cost-effective, the approach is practical for personalized medicine. This provides a basis to rapidly characterize novel metabolic patterns which can be linked to genetics, environment and/or lifestyle.
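The coefficient of variation quoted above is a standard reproducibility measure: the standard deviation of replicate intensities divided by their mean, expressed as a percentage. A minimal sketch follows, assuming the sample standard deviation is used. The abstract does not say which variant the authors applied, and the replicate values are made up for illustration.

```python
import numpy as np

def coefficient_of_variation(intensities):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    values = np.asarray(intensities, dtype=float)
    return float(np.std(values, ddof=1) / np.mean(values) * 100.0)

# Hypothetical replicate intensities for one m/z feature.
replicates = [100.0, 105.0, 95.0]
cv = coefficient_of_variation(replicates)
print(cv)  # below 10 would meet the threshold quoted in the abstract
```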