Publication


Featured research published by Weigong Ge.


Journal of Chemical Information and Modeling | 2008

Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics.

Huixiao Hong; Qian Xie; Weigong Ge; Feng Qian; Hong Fang; Leming Shi; Zhenqiang Su; Roger Perkins; Weida Tong

Research applications in chemoinformatics and toxicoinformatics increasingly use representations of molecules in the form of numerical descriptors that capture the structural characteristics and properties of molecules. These representations are useful for ADME/toxicity prediction, diversity analysis, library design, QSAR/QSPR, virtual screening, and other purposes. Molecular descriptors have ranged from relatively simple forms calculated from simple two-dimensional (2D) chemical structures to more complex forms representing three-dimensional (3D) chemical structures or complex molecular fingerprints consisting of numerous bit positions to represent specific chemical information. The Mold2 software was developed to enable the rapid calculation of a large and diverse set of descriptors encoding two-dimensional chemical structure information. Comparative analysis of Mold2 descriptors with those calculated by Cerius2, Dragon, and Molconn-Z on several data sets using Shannon entropy analysis demonstrated that Mold2 descriptors convey a similar amount of information. In addition, using the same classification method, slightly better models were generated using Mold2 descriptors compared to those generated using descriptors from the compared commercial software packages. The low computing cost for Mold2 makes it suitable not only for small data sets, such as in QSAR, but also for large databases in virtual screening. High reproducibility and reliability are expected because Mold2 does not require 3D structures. Mold2 is freely available to the public (http://www.fda.gov/nctr/science/centers/toxicoinformatics/index.htm).
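As a rough illustration of the Shannon entropy comparison mentioned in the abstract above, the sketch below computes the entropy of a single descriptor column after equal-width binning. The binning scheme, the bin count, and the function name `shannon_entropy` are assumptions made for illustration; the abstract does not say how the authors discretized descriptor values.

```python
import math
from collections import Counter

def shannon_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of one descriptor after equal-width binning.

    The equal-width binning is an assumption of this sketch, not a detail
    taken from the paper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    # Map each value to a bin index; clamp the maximum into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical descriptor columns: one constant, one evenly spread.
constant = [1.0] * 8
spread = [0, 1, 2, 3, 4, 5, 6, 7]
entropy_constant = shannon_entropy(constant)
entropy_spread = shannon_entropy(spread, n_bins=8)
```

A descriptor set whose columns have higher entropy on the same molecules distinguishes those molecules more finely, which is the sense in which two descriptor packages can be said to convey a similar amount of information.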


Chemical Research in Toxicology | 2011

Comparing Next-Generation Sequencing and Microarray Technologies in a Toxicological Study of the Effects of Aristolochic Acid on Rat Kidneys

Zhenqiang Su; Zhiguang Li; Tao Chen; Quan Zhen Li; Hong Fang; Don Ding; Weigong Ge; Baitang Ning; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi

RNA-Seq has been increasingly used for the quantification and characterization of transcriptomes. The ongoing development of the technology promises the more accurate measurement of gene expression. However, its benefits over widely accepted microarray technologies have not been adequately assessed, especially in toxicogenomics studies. The goal of this study is to enhance the scientific community's understanding of the advantages and challenges of RNA-Seq in the quantification of gene expression by comparing analysis results from RNA-Seq and microarray data on a toxicogenomics study. A typical toxicogenomics study design was used to compare the performance of an RNA-Seq approach (Illumina Genome Analyzer II) to a microarray-based approach (Affymetrix Rat Genome 230 2.0 arrays) for detecting differentially expressed genes (DEGs) in the kidneys of rats treated with aristolochic acid (AA), a carcinogenic and nephrotoxic chemical most notably used for weight loss. We studied the comparability of the RNA-Seq and microarray data in terms of absolute gene expression, gene expression patterns, differentially expressed genes, and biological interpretation. We found that RNA-Seq was more sensitive in detecting genes with low expression levels, while similar gene expression patterns were observed for both platforms. Moreover, although the overlap of the DEGs was only 40-50%, the biological interpretation was largely consistent between the RNA-Seq and microarray data. RNA-Seq maintained a consistent biological interpretation with time-tested microarray platforms while generating more sensitive results. However, there is clearly a need for future investigations to better understand the advantages and limitations of RNA-Seq in toxicogenomics studies and environmental health research.
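The DEG overlap figure quoted in the abstract can be made concrete with a small sketch. The abstract does not state which denominator the authors used, so the version below simply reports the shared fraction relative to each platform's own DEG list; the gene identifiers and the function name `overlap_percent` are hypothetical.

```python
def overlap_percent(degs_a, degs_b):
    """Percent of each platform's DEGs that the other platform also calls.

    Returns (percent of A shared with B, percent of B shared with A).
    """
    a, b = set(degs_a), set(degs_b)
    if not a or not b:
        raise ValueError("both DEG lists must be non-empty")
    shared = a & b
    return (100.0 * len(shared) / len(a), 100.0 * len(shared) / len(b))

# Hypothetical DEG lists for illustration only.
rnaseq_degs = {"g1", "g2", "g3", "g4"}
microarray_degs = {"g3", "g4", "g5", "g6", "g7"}
pct_rnaseq, pct_microarray = overlap_percent(rnaseq_degs, microarray_degs)
```

Because the two lists usually differ in length, the two percentages differ as well, which is one reason an overlap is often reported as a range.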


BMC Bioinformatics | 2008

Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples.

Huixiao Hong; Zhenqiang Su; Weigong Ge; Leming M. Shi; Roger Perkins; Hong Fang; Joshua Xu; James J. Chen; Tao Han; Jim Kaput; James C. Fuscoe; Weida Tong

Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
Conclusion: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
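To make the two quantities in this abstract concrete, the sketch below computes a call rate for one sample and the discordance between two calling runs of the same sample, split into heterozygous and homozygous calls. It is a minimal illustration under assumptions: genotypes are two-letter strings, `None` marks a no-call, and the data are invented. It does not reproduce BRLMM or the authors' analysis.

```python
def call_rate(calls):
    """Fraction of SNPs that received a genotype call (None means no call)."""
    return sum(c is not None for c in calls) / len(calls)

def discordance_by_zygosity(run_a, run_b):
    """Discordance between two runs, grouped by the genotype seen in run_a."""
    totals = {"het": 0, "hom": 0}
    mismatches = {"het": 0, "hom": 0}
    for a, b in zip(run_a, run_b):
        if a is None or b is None:
            continue  # compare only SNPs called in both runs
        kind = "het" if a[0] != a[1] else "hom"
        totals[kind] += 1
        mismatches[kind] += a != b
    return {k: mismatches[k] / totals[k] if totals[k] else 0.0 for k in totals}

# Hypothetical calls for one sample genotyped in a small and a large batch.
small_batch = ["AA", "AB", "AB", None, "BB", "AB"]
large_batch = ["AA", "AB", "BB", "AB", "BB", "AA"]
rate_small = call_rate(small_batch)
discordance = discordance_by_zygosity(small_batch, large_batch)
```

In this toy data all disagreements fall on heterozygous calls, mirroring the direction of the effect the abstract reports.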


Genome Biology | 2014

An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era

Zhenqiang Su; Hong Fang; Huixiao Hong; Leming Shi; Wenqian Zhang; Wenwei Zhang; Yanyan Zhang; Zirui Dong; Lee Lancashire; Marina Bessarabova; Xi Yang; Baitang Ning; Binsheng Gong; Joe Meehan; Joshua Xu; Weigong Ge; Roger Perkins; Matthias Fischer; Weida Tong

Background: Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in an enormous accrual of data, predictive models, and biomarkers. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?
Results: We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.
Conclusions: Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples, while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.


Toxicological Sciences | 2013

EADB: An Estrogenic Activity Database for Assessing Potential Endocrine Activity

Jie Shen; Lei Xu; Hong Fang; Ann M. Richard; Jeffrey D Bray; Richard S. Judson; Guangxu Zhou; Thomas Colatsky; Jason Aungst; Christina T. Teng; Stephen Harris; Weigong Ge; Susie Y Dai; Zhenqiang Su; Abigail Jacobs; Wafa Harrouk; Roger Perkins; Weida Tong; Huixiao Hong

Endocrine-active chemicals can potentially have adverse effects on both humans and wildlife. They can interfere with the body's endocrine system through direct or indirect interactions with many protein targets. Estrogen receptors (ERs) are one of the major targets, and many endocrine disruptors are estrogenic and affect the normal estrogen signaling pathways. However, ERs can also serve as therapeutic targets for various medical conditions, such as menopausal symptoms, osteoporosis, and ER-positive breast cancer. Because of the decades-long interest in the safety and therapeutic utility of estrogenic chemicals, a large number of chemicals have been assayed for estrogenic activity, but these data exist in various sources and different formats that restrict the ability of regulatory and industry scientists to utilize them fully for assessing risk-benefit. To address this issue, we have developed an Estrogenic Activity Database (EADB; http://www.fda.gov/ScienceResearch/BioinformaticsTools/EstrogenicActivityDatabaseEADB/default.htm) and made it freely available to the public. EADB contains 18,114 estrogenic activity data points collected for 8212 chemicals tested in 1284 binding, reporter gene, cell proliferation, and in vivo assays in 11 different species. The chemicals cover a broad chemical structure space and the data span a wide range of activities. A set of tools allows users to access EADB and evaluate potential endocrine activity of chemicals. As a case study, a classification model was developed using EADB for predicting ER binding of chemicals.


PLOS ONE | 2012

Technical Reproducibility of Genotyping SNP Arrays Used in Genome-Wide Association Studies

Huixiao Hong; Lei Xu; Jie Liu; Wendell D. Jones; Zhenqiang Su; Baitang Ning; Roger Perkins; Weigong Ge; K Miclaus; Li Zhang; Kyung-Hee Park; Bridgett Green; Tao Han; Hong Fang; Christophe G. Lambert; Silvia C. Vega; Simon Lin; Nadereh Jafari; Wendy Czika; Russell D. Wolfinger; Federico Goodsaid; Weida Tong; Leming Shi

During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors’ quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
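Genotype concordance across technical replicates, the central metric in this abstract, can be sketched as the mean fraction of identical calls over every pair of replicate arrays. The definition below (restricting each comparison to SNPs called on both arrays) and the toy genotypes are assumptions; the abstract does not spell out the exact formula the authors used.

```python
from itertools import combinations

def concordance(calls_x, calls_y):
    """Fraction of identical calls, over SNPs called on both arrays."""
    pairs = [(x, y) for x, y in zip(calls_x, calls_y)
             if x is not None and y is not None]
    if not pairs:
        raise ValueError("no SNPs called on both arrays")
    return sum(x == y for x, y in pairs) / len(pairs)

def mean_pairwise_concordance(replicates):
    """Average concordance over every pair of technical replicates."""
    scores = [concordance(a, b) for a, b in combinations(replicates, 2)]
    return sum(scores) / len(scores)

# Four hypothetical technical replicates of one individual (None = no call).
replicates = [
    ["AA", "AB", "BB", "AB"],
    ["AA", "AB", "BB", "AB"],
    ["AA", "AB", "BB", "AA"],
    ["AA", None, "BB", "AB"],
]
mean_concordance = mean_pairwise_concordance(replicates)
```

A replicate whose concordance against the others is markedly lower than the rest is a candidate low-quality array, which is the kind of check the abstract argues vendor QC alone can miss.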


BMC Bioinformatics | 2015

A heuristic approach to determine an appropriate number of topics in topic modeling.

Weizhong Zhao; James J. Chen; Roger Perkins; Zhichao Liu; Weigong Ge; Yijun Ding; Wen Zou

Background: Topic modelling is an active research field in machine learning. While mainly used to build models from unstructured textual data, it offers an effective means of data mining where samples represent documents, and different biological endpoints or omics data represent words. Latent Dirichlet Allocation (LDA) is the most commonly used topic modelling method across a wide number of technical fields. However, model development can be arduous and tedious, and requires burdensome and systematic sensitivity studies in order to find the best set of model parameters. Often, time-consuming subjective evaluations are needed to compare models. Currently, research has yielded no easy way to choose the proper number of topics in a model beyond a major iterative approach.
Methods and results: Based on analysis of variation of statistical perplexity during topic modelling, a heuristic approach is proposed in this study to estimate the most appropriate number of topics. Specifically, the rate of perplexity change (RPC) as a function of numbers of topics is proposed as a suitable selector. We test the stability and effectiveness of the proposed method for three markedly different types of grounded-truth datasets: Salmonella next generation sequencing, pharmacological side effects, and textual abstracts on computational biology and bioinformatics (TCBB) from PubMed.
Conclusion: The proposed RPC-based method is demonstrated to choose the best number of topics in three numerical experiments of widely different data types, and for databases of very different sizes. The work required was markedly less arduous than if full systematic sensitivity studies had been carried out with number of topics as a parameter. We understand that additional investigation is needed to substantiate the method's theoretical basis, and to establish its generalizability in terms of dataset characteristics.
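A minimal sketch of an RPC-style selector follows. It assumes perplexity has already been measured for several candidate topic counts and defines RPC between consecutive candidates as the absolute perplexity change divided by the change in topic count. The selection rule used here (pick the first candidate whose RPC is smaller than the next one) is an assumption about how such a heuristic could be applied; the abstract does not state the rule, and the perplexity values are invented.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """Return [(t_i, RPC_i)] with RPC_i = |P_i - P_(i-1)| / (t_i - t_(i-1))."""
    pairs = sorted(zip(topic_counts, perplexities))
    return [
        (t1, abs(p1 - p0) / (t1 - t0))
        for (t0, p0), (t1, p1) in zip(pairs, pairs[1:])
    ]

def select_num_topics(topic_counts, perplexities):
    """Assumed rule: first topic count whose RPC is below the following RPC."""
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    for (t, r), (_, r_next) in zip(rpc, rpc[1:]):
        if r < r_next:
            return t
    return rpc[-1][0]  # fallback when RPC keeps decreasing

# Hypothetical perplexities for five candidate topic counts.
candidates = [10, 20, 30, 40, 50]
perplexity = [1000.0, 800.0, 700.0, 690.0, 650.0]
rpc_values = rate_of_perplexity_change(candidates, perplexity)
chosen = select_num_topics(candidates, perplexity)
```

The appeal of this kind of selector is that it needs only one perplexity value per candidate, rather than a full sensitivity study over every model parameter.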


BMC Genomics | 2012

atBioNet: an integrated network analysis tool for genomics and biomarker discovery

Yijun Ding; Minjun Chen; Zhichao Liu; Don Ding; Yanbin Ye; Min Zhang; Reagan Kelly; Li Guo; Zhenqiang Su; Stephen Harris; Feng Qian; Weigong Ge; Hong Fang; Xiaowei Xu; Weida Tong

Background: Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity.
Results: atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but also provide information that can be used to hypothesize novel biomarkers for future analysis.
Conclusion: atBioNet is a free web-based network analysis tool that provides a systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
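The SCAN algorithm named in this abstract groups nodes by structural similarity: two nodes are similar when their closed neighborhoods (neighbors plus the node itself) largely coincide. The sketch below computes only that similarity measure on a toy interaction network. It is not atBioNet's implementation; the protein names are hypothetical, and the clustering thresholds atBioNet uses are not given in the abstract.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def structural_similarity(adj, u, v):
    """SCAN similarity: shared closed neighborhood, normalized geometrically."""
    gu = adj[u] | {u}  # closed neighborhood of u
    gv = adj[v] | {v}  # closed neighborhood of v
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

# Toy PPI network: a triangle (P1, P2, P3) with one pendant protein (P4).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
adj = build_adjacency(edges)
sim_inside = structural_similarity(adj, "P1", "P2")
sim_pendant = structural_similarity(adj, "P3", "P4")
```

Pairs inside a densely connected group score close to 1, while an edge leading out to a loosely attached node scores lower, which is how a clustering step can separate functional modules from peripheral proteins.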


BMC Bioinformatics | 2014

Competitive molecular docking approach for predicting estrogen receptor subtype α agonists and antagonists

Hui Wen Ng; Wenqian Zhang; Mao Shu; Heng Luo; Weigong Ge; Roger Perkins; Weida Tong; Huixiao Hong

Background: Endocrine disrupting chemicals (EDCs) are exogenous compounds that interfere with the endocrine system of vertebrates, often through direct or indirect interactions with nuclear receptor proteins. Estrogen receptors (ERs) are particularly important protein targets and many EDCs are ER binders, capable of altering normal homeostatic transcription and signaling pathways. An estrogenic xenobiotic can bind ER as either an agonist or antagonist to increase or inhibit transcription, respectively. The receptor conformations in the complexes of ER bound with agonists and antagonists are different and dependent on interactions with co-regulator proteins that vary across tissue type. Assessment of chemical endocrine disruption potential depends not only on binding affinity to ERs, but also on changes that may alter the receptor conformation and its ability to subsequently bind DNA response elements and initiate transcription. Using both agonist and antagonist conformations of the ERα, we developed an in silico approach that can be used to differentiate agonist versus antagonist status of potential binders.
Methods: The approach combined separate molecular docking models for ER agonist and antagonist conformations. The ability of this approach to differentiate agonists and antagonists was first evaluated using true agonists and antagonists extracted from the crystal structures available in the protein data bank (PDB), and then further validated using a larger set of ligands from the literature. The usefulness of the approach was demonstrated with enrichment analysis in data sets with a large number of decoy ligands.
Results: The performance of individual agonist and antagonist docking models was found comparable to similar models in the literature. When combined in a competitive docking approach, they provided the ability to discriminate agonists from antagonists with good accuracy, as well as the ability to efficiently select true agonists and antagonists from decoys during enrichment analysis.
Conclusion: This approach enables evaluation of potential ER biological function changes caused by chemicals bound to the receptor which, in turn, allows the assessment of a chemical's endocrine disrupting potential. The approach can be used not only by regulatory authorities to perform risk assessments on potential EDCs but also by the industry in drug discovery projects to screen for potential agonists and antagonists.
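One way to picture a competitive docking decision is to dock each ligand into both receptor conformations and label it by whichever conformation gives the more favorable score. The sketch below assumes that lower (more negative) docking scores indicate better predicted binding and that ties are left undetermined. Both the scoring convention and the decision rule are assumptions for illustration, since the abstract does not specify them, and the ligand names and scores are invented.

```python
def classify_ligand(agonist_score, antagonist_score):
    """Label a ligand by the conformation it docks into more favorably.

    Assumes lower scores mean stronger predicted binding.
    """
    if agonist_score < antagonist_score:
        return "agonist"
    if antagonist_score < agonist_score:
        return "antagonist"
    return "undetermined"

# Hypothetical docking scores: (agonist conformation, antagonist conformation).
scores = {
    "ligand_A": (-9.5, -7.2),
    "ligand_B": (-6.0, -8.1),
    "ligand_C": (-7.0, -7.0),
}
labels = {name: classify_ligand(*pair) for name, pair in scores.items()}
```

Comparing the two models directly, rather than thresholding each score on its own, is what makes the approach competitive: the label depends on the difference between conformations, not on absolute score magnitude.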


Chemical Research in Toxicology | 2015

Estrogenic Activity Data Extraction and in Silico Prediction Show the Endocrine Disruption Potential of Bisphenol A Replacement Compounds

Hui Wen Ng; Mao Shu; Heng Luo; Hao Ye; Weigong Ge; Roger Perkins; Weida Tong; Huixiao Hong

Bisphenol A (BPA) replacement compounds are released to the environment and cause widespread human exposure. However, a lack of thorough safety evaluations on the BPA replacement compounds has raised public concerns. We assessed the endocrine disruption potential of BPA replacement compounds in the market to assist their safety evaluations. A literature search was conducted to ascertain the BPA replacement compounds in use. Available experimental estrogenic activity data of these compounds were extracted from the Estrogenic Activity Database (EADB) to assess their estrogenic potential. An in silico model was developed to predict the estrogenic activity of compounds lacking experimental data. Molecular dynamics (MD) simulations were performed to understand the mechanisms by which the estrogenic compounds bind to and activate the estrogen receptor (ER). Forty-five BPA replacement compounds were identified in the literature. Seven were more estrogenic and five less estrogenic than BPA, while six were nonestrogenic in EADB. A two-tier in silico model was developed based on molecular docking to predict the estrogenic activity of the 27 compounds lacking data. Eleven were predicted as ER binders and 16 as nonbinders. MD simulations revealed hydrophobic contacts and hydrogen bonds as the main interactions between ER and the estrogenic compounds.

Collaboration


Top Co-Authors

Weida Tong, Food and Drug Administration
Huixiao Hong, Food and Drug Administration
Roger Perkins, National Center for Toxicological Research
Zhenqiang Su, Food and Drug Administration
Hong Fang, Food and Drug Administration
Leming Shi, National Center for Toxicological Research
Heng Luo, Food and Drug Administration
Hui Wen Ng, Food and Drug Administration
Baitang Ning, National Center for Toxicological Research
Hao Ye, Food and Drug Administration