Zhenqiang Su
Food and Drug Administration
Publication
Featured research published by Zhenqiang Su.
Nature Biotechnology | 2014
Charles Wang; Binsheng Gong; Pierre R. Bushel; Jean Thierry-Mieg; Danielle Thierry-Mieg; Joshua Xu; Hong Fang; Huixiao Hong; Jie Shen; Zhenqiang Su; Joe Meehan; Xiaojin Li; Lu Yang; Haiqing Li; Paweł P. Łabaj; David P. Kreil; Dalila B. Megherbi; Stan Gaj; Florian Caiment; Joost H.M. van Delft; Jos Kleinjans; Andreas Scherer; Viswanath Devanarayan; Jian Wang; Yong Yang; Hui-Rong Qian; Lee Lancashire; Marina Bessarabova; Yuri Nikolsky; Cesare Furlanello
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.
BMC Bioinformatics | 2005
Leming Shi; Weida Tong; Hong Fang; Uwe Scherf; Jing Han; Raj K. Puri; Felix W. Frueh; Federico Goodsaid; Lei Guo; Zhenqiang Su; Tao Han; James C. Fuscoe; Z Alex Xu; Tucker A. Patterson; Huixiao Hong; Qian Xie; Roger Perkins; James J. Chen; Daniel A. Casciano
Background: The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630–631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676–5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology. Results: We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall. Conclusion: Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.
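As a rough illustration of the cross-platform concordance idea in this abstract, the sketch below ranks genes by absolute log fold change on two platforms and reports the percentage of genes shared by the two top-N lists. The function names, the toy log ratios, and the overlap definition are assumptions made for illustration; the abstract does not specify the authors' exact procedure.

```python
def top_genes_by_fold_change(log_ratios, n):
    """Return the n gene names with the largest absolute log2 fold change."""
    ranked = sorted(log_ratios, key=lambda gene: abs(log_ratios[gene]), reverse=True)
    return set(ranked[:n])


def percent_overlap(genes_a, genes_b):
    """Percentage of genes shared between two equally sized gene lists."""
    if not genes_a or len(genes_a) != len(genes_b):
        raise ValueError("gene lists must be non-empty and of equal size")
    return 100.0 * len(genes_a & genes_b) / len(genes_a)


# Hypothetical log2 ratios (treated vs. control) for the same genes on two platforms.
platform_a = {"g1": 2.1, "g2": -1.8, "g3": 0.2, "g4": 1.1, "g5": -0.1}
platform_b = {"g1": 1.9, "g2": -0.3, "g3": 0.1, "g4": 1.4, "g5": -1.6}

top_a = top_genes_by_fold_change(platform_a, 3)
top_b = top_genes_by_fold_change(platform_b, 3)
concordance = percent_overlap(top_a, top_b)
print(f"Cross-platform overlap of top-3 lists: {concordance:.1f}%")
```

Swapping the ranking key (for example, to a p-value) while keeping `percent_overlap` fixed is one way to compare how different gene selection methods affect the measured concordance.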
Expert Review of Molecular Diagnostics | 2011
Zhenqiang Su; Baitang Ning; Hong Fang; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi
DNA sequencing is a powerful approach for decoding a number of human diseases, including cancers. The advent of next-generation sequencing (NGS) technologies has reduced sequencing cost by orders of magnitude and significantly increased the throughput, making whole-genome sequencing a possible way for obtaining global genomic information about patients on whom clinical actions may be taken. However, the benefits offered by NGS technologies come with a number of challenges that must be adequately addressed before they can be transformed from research tools to routine clinical practices. This article provides an overview of four commonly used NGS technologies from Roche Applied Science/454 Life Sciences, Illumina, Life Technologies and Helicos Biosciences. The challenges in the analysis of NGS data and their potential applications in clinical diagnosis are also discussed.
Journal of Chemical Information and Modeling | 2008
Huixiao Hong; Qian Xie; Weigong Ge; Feng Qian; Hong Fang; Leming Shi; Zhenqiang Su; Roger Perkins; Weida Tong
Research applications in chemoinformatics and toxicoinformatics increasingly use representations of molecules in the form of numerical descriptors that capture the structural characteristics and properties of molecules. These representations are useful for ADME/toxicity prediction, diversity analysis, library design, QSAR/QSPR, virtual screening, and other purposes. Molecular descriptors have ranged from relatively simple forms calculated from simple two-dimensional (2D) chemical structures to more complex forms representing three-dimensional (3D) chemical structures or complex molecular fingerprints consisting of numerous bit positions to represent specific chemical information. The Mold2 software was developed to enable the rapid calculation of a large and diverse set of descriptors encoding two-dimensional chemical structure information. Comparative analysis of Mold2 descriptors with those calculated by Cerius2, Dragon, and Molconn-Z on several data sets using Shannon entropy analysis demonstrated that Mold2 descriptors convey a similar amount of information. In addition, using the same classification method, slightly better models were generated using Mold2 descriptors compared to those generated using descriptors from the compared commercial software packages. The low computing cost for Mold2 makes it suitable not only for small data sets, such as in QSAR, but also for large databases in virtual screening. High reproducibility and reliability are expected because Mold2 does not require 3D structures. Mold2 is freely available to the public (http://www.fda.gov/nctr/science/centers/toxicoinformatics/index.htm).
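To make the Shannon entropy comparison mentioned in this abstract more concrete, the sketch below estimates the entropy of a single descriptor by binning its values across a set of molecules. The equal-width binning, the bin count, and the function name are assumptions chosen for illustration; the abstract does not describe the binning scheme actually used.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=4):
    """Shannon entropy (in bits) of a descriptor after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant descriptor carries no information.
        return 0.0
    width = (hi - lo) / n_bins
    # Assign each value to a bin; the maximum value falls into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical descriptor values for a handful of molecules.
constant_descriptor = [1.0, 1.0, 1.0, 1.0]
spread_descriptor = [0.0, 1.0, 2.0, 3.0]

print(shannon_entropy(constant_descriptor))
print(shannon_entropy(spread_descriptor))
```

A descriptor whose values spread evenly across bins scores higher than one that is nearly constant, which is the sense in which entropy serves as a proxy for the information a descriptor conveys.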
Nature Communications | 2014
James C. Fuscoe; Chen Zhao; Chao Guo; Meiwen Jia; Tao Qing; Desmond I. Bannon; Lee Lancashire; Wenjun Bao; Tingting Du; Heng Luo; Zhenqiang Su; Wendell D. Jones; Carrie L. Moland; William S. Branham; Feng Qian; Baitang Ning; Yan Li; Huixiao Hong; Lei Guo; Nan Mei; Tieliu Shi; Kenneth Wang; Russell D. Wolfinger; Yuri Nikolsky; Stephen J. Walker; Penelope Jayne Duerksen-Hughes; Christopher E. Mason; Weida Tong; Jean Thierry-Mieg; Danielle Thierry-Mieg
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model.
BMC Bioinformatics | 2005
Leming Shi; Weida Tong; Zhenqiang Su; Tao Han; Jing Han; Raj K. Puri; Hong Fang; Felix W. Frueh; Federico Goodsaid; Lei Guo; William S. Branham; James J. Chen; Z Alex Xu; Stephen Harris; Huixiao Hong; Qian Xie; Roger Perkins; James C. Fuscoe
Background: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. Results: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. Conclusion: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve reproducibility of microarray measurements, they appear less effective in improving accuracy.
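The idea of computing ratios from concentrations estimated via calibration curves can be sketched as follows. This is a minimal illustration, assuming piecewise-linear interpolation between calibration points and entirely hypothetical calibration values for each channel; it is not the authors' implementation.

```python
from bisect import bisect_left


def concentration_from_intensity(intensity, calibration):
    """Interpolate a concentration from (concentration, intensity) calibration
    points, which must be sorted by strictly increasing intensity."""
    concs = [c for c, _ in calibration]
    ints = [i for _, i in calibration]
    if not ints[0] <= intensity <= ints[-1]:
        raise ValueError("intensity outside the calibrated range")
    k = bisect_left(ints, intensity)
    if ints[k] == intensity:
        return concs[k]
    # Linear interpolation between the two neighbouring calibration points.
    frac = (intensity - ints[k - 1]) / (ints[k] - ints[k - 1])
    return concs[k - 1] + frac * (concs[k] - concs[k - 1])


# Hypothetical saturating calibration curves, one per dye channel.
cy5_calibration = [(1, 100), (2, 190), (4, 340), (8, 560), (16, 800)]
cy3_calibration = [(1, 120), (2, 220), (4, 380), (8, 600), (16, 820)]

cy5_intensity, cy3_intensity = 560, 220

intensity_ratio = cy5_intensity / cy3_intensity
concentration_ratio = (
    concentration_from_intensity(cy5_intensity, cy5_calibration)
    / concentration_from_intensity(cy3_intensity, cy3_calibration)
)
print(f"intensity ratio: {intensity_ratio:.2f}")
print(f"concentration ratio: {concentration_ratio:.2f}")
```

With these toy curves the raw intensity ratio is smaller than the concentration-based ratio, mirroring the ratio underestimation the abstract attributes to nonlinear calibration.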
Chemical Research in Toxicology | 2011
Zhenqiang Su; Zhiguang Li; Tao Chen; Quan Zhen Li; Hong Fang; Don Ding; Weigong Ge; Baitang Ning; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi
RNA-Seq has been increasingly used for the quantification and characterization of transcriptomes. The ongoing development of the technology promises more accurate measurement of gene expression. However, its benefits over widely accepted microarray technologies have not been adequately assessed, especially in toxicogenomics studies. The goal of this study is to enhance the scientific community's understanding of the advantages and challenges of RNA-Seq in the quantification of gene expression by comparing analysis results from RNA-Seq and microarray data on a toxicogenomics study. A typical toxicogenomics study design was used to compare the performance of an RNA-Seq approach (Illumina Genome Analyzer II) to a microarray-based approach (Affymetrix Rat Genome 230 2.0 arrays) for detecting differentially expressed genes (DEGs) in the kidneys of rats treated with aristolochic acid (AA), a carcinogenic and nephrotoxic chemical most notably used for weight loss. We studied the comparability of the RNA-Seq and microarray data in terms of absolute gene expression, gene expression patterns, differentially expressed genes, and biological interpretation. We found that RNA-Seq was more sensitive in detecting genes with low expression levels, while similar gene expression patterns were observed for both platforms. Moreover, although the overlap of the DEGs was only 40-50%, the biological interpretation was largely consistent between the RNA-Seq and microarray data. RNA-Seq maintained a consistent biological interpretation with time-tested microarray platforms while generating more sensitive results. However, there is clearly a need for future investigations to better understand the advantages and limitations of RNA-Seq in toxicogenomics studies and environmental health research.
Genome Biology | 2015
Wenqian Zhang; Falk Hertwig; Jean Thierry-Mieg; Wenwei Zhang; Danielle Thierry-Mieg; Jian Wang; Cesare Furlanello; Viswanath Devanarayan; Jie Cheng; Youping Deng; Barbara Hero; Huixiao Hong; Meiwen Jia; Li Li; Simon Lin; Yuri Nikolsky; André Oberthuer; Tao Qing; Zhenqiang Su; Ruth Volland; Charles Wang; May D. Wang; Junmei Ai; Davide Albanese; Shahab Asgharzadeh; Smadar Avigad; Wenjun Bao; Marina Bessarabova; Murray H. Brilliant; Benedikt Brors
Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. Conclusions: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Toxicologic Pathology | 2009
Robert N. McBurney; Wade M. Hines; Linda S. Von Tungeln; Laura K. Schnackenberg; Richard D. Beger; Carrie L. Moland; Tao Han; James C. Fuscoe; Ching-Wei Chang; James J. Chen; Zhenqiang Su; Xiaohui Fan; Weida Tong; Shelagh A. Booth; Raji Balasubramanian; Paul Courchesne; Jennifer M. Campbell; Armin Graber; Yu Guo; Peter Juhasz; Tricin Y. Li; Moira Lynch; Nicole Morel; Thomas N. Plasterer; Edward J. Takach; Chenhui Zeng; Frederick A. Beland
Drug-induced liver injury (DILI) is the primary adverse event that results in withdrawal of drugs from the market and a frequent reason for the failure of drug candidates in development. The Liver Toxicity Biomarker Study (LTBS) is an innovative approach to investigate DILI because it compares molecular events produced in vivo by compound pairs that (a) are similar in structure and mechanism of action, (b) are associated with few or no signs of liver toxicity in preclinical studies, and (c) show marked differences in hepatotoxic potential. The LTBS is a collaborative preclinical research effort in molecular systems toxicology between the National Center for Toxicological Research and BG Medicine, Inc., and is supported by seven pharmaceutical companies and three technology providers. In phase I of the LTBS, entacapone and tolcapone were studied in rats to provide results and information that will form the foundation for the design and implementation of phase II. Molecular analysis of the rat liver and plasma samples combined with statistical analyses of the resulting datasets yielded marker analytes, illustrating the value of the broad-spectrum, molecular systems analysis approach to studying pharmacological or toxicological effects.
BMC Bioinformatics | 2008
Huixiao Hong; Zhenqiang Su; Weigong Ge; Leming M. Shi; Roger Perkins; Hong Fang; Joshua Xu; James J. Chen; Tao Han; Jim Kaput; James C. Fuscoe; Weida Tong
Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set. Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls. Conclusion: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.