Publication


Featured research published by Wenjie Shu.


PLOS ONE | 2012

Comprehensive identification and annotation of cell type-specific and ubiquitous CTCF-binding sites in the human genome.

Hebing Chen; Yao Tian; Wenjie Shu; Xiaochen Bo; Shengqi Wang

Chromatin insulators are DNA elements that regulate the level of gene expression either by preventing gene silencing through the maintenance of heterochromatin boundaries or by preventing gene activation by blocking interactions between enhancers and promoters. CCCTC-binding factor (CTCF), a ubiquitously expressed 11-zinc-finger DNA-binding protein, is the only protein implicated in the establishment of insulators in vertebrates. While CTCF has been implicated in diverse regulatory functions, it has only been studied in a limited number of cell types across the human genome. Thus, it is not clear whether the identified cell type-specific differences in CTCF-binding sites are functionally significant. Here, we identify and characterize cell type-specific and ubiquitous CTCF-binding sites in the human genome across 38 cell types designated by the Encyclopedia of DNA Elements (ENCODE) consortium. These cell type-specific and ubiquitous CTCF-binding sites show uniquely versatile transcriptional functions and characteristic chromatin features. In addition, we confirm the insulator barrier function of CTCF binding and explore the novel function of CTCF in DNA replication. These results represent a critical step toward the comprehensive and systematic understanding of CTCF-dependent insulators and their versatile roles in the human genome.


Nucleic Acids Research | 2011

Genome-wide analysis of the relationships between DNaseI HS, histone modifications and gene expression reveals distinct modes of chromatin domains

Wenjie Shu; Hebing Chen; Xiaochen Bo; Shengqi Wang

To understand the molecular mechanisms that underlie global transcriptional regulation, it is essential to first identify all the transcriptional regulatory elements in the human genome. The advent of next-generation sequencing has provided a powerful platform for genome-wide analysis of different species and specific cell types; when combined with traditional techniques to identify regions of open chromatin [DNaseI hypersensitivity (DHS)] or specific binding locations of transcription factors [chromatin immunoprecipitation (ChIP)], and expression data from microarrays, we become uniquely poised to uncover the mysteries of the genome and its regulation. To this end, we have performed a global meta-analysis of the relationship among data from DNaseI-seq, ChIP-seq and expression arrays, and found that specific correlations exist among regulatory elements and gene expression across different cell types. These correlations revealed four distinct modes of chromatin domain structure reflecting different functions: repressive, active, primed and bivalent. Furthermore, CCCTC-binding factor (CTCF) binding sites were identified based on these integrative data. Our findings uncovered a complex regulatory process involving DNaseI HS sites and histone modifications, and suggest that these dynamic elements may be responsible for maintaining chromatin structure and integrity of the human genome. Our integrative approach provides an example by which data from diverse technology platforms may be integrated to provide more meaningful insights into global transcriptional regulation.


BMC Bioinformatics | 2006

RDMAS: a web server for RNA deleterious mutation analysis

Wenjie Shu; Xiaochen Bo; Rujia Liu; Dongsheng Zhao; Zhiqiang Zheng; Shengqi Wang

Background: The diverse functions of ncRNAs critically depend on their structures. Mutations in ncRNAs that disrupt the structures of functional sites are expected to be deleterious. RNA deleterious mutations have attracted wide attention because some of them result in serious disease in cells, while others influence the fitness of microbes. Results: The RDMAS web server we describe here is an online tool for evaluating the structural deleteriousness of single-nucleotide mutations in RNA genes. Several structure comparison methods have been integrated; predicted sub-optimal structures can optionally be included to mitigate the uncertainty of secondary structure prediction. With a user-friendly interface, the web application is easy to use. Intuitive illustrations are provided along with the original computational results to facilitate quick analysis. Conclusion: RDMAS can be used to explore the structural alterations that make mutations pathogenic, and to predict deleterious mutations, which may help to determine functionally critical regions. RDMAS is freely accessible at http://biosrv1.bmi.ac.cn/rdmas.


Scientific Reports | 2016

PEDLA: predicting enhancers with a deep learning-based algorithmic framework

Feng Liu; Hao Li; Chao Ren; Xiaochen Bo; Wenjie Shu

Transcriptional enhancers are non-coding segments of DNA that play a central role in the spatiotemporal regulation of gene expression programs. However, systematically and precisely predicting enhancers remains a major challenge. Although existing methods have achieved some success in enhancer prediction, they still suffer from many issues. We developed a deep learning-based algorithmic framework named PEDLA (https://github.com/wenjiegroup/PEDLA), which can directly learn an enhancer predictor from massively heterogeneous data and generalize in ways that are mostly consistent across various cell types/tissues. We first trained PEDLA with 1,114-dimensional heterogeneous features in H1 cells, and demonstrated that the PEDLA framework integrates diverse heterogeneous features and gives state-of-the-art performance relative to five existing methods for enhancer prediction. We further extended PEDLA to iteratively learn from 22 training cell types/tissues. Our results showed that PEDLA manifested superior performance consistency in both training and independent test sets. On average, PEDLA achieved 95.0% accuracy and a 96.8% geometric mean (GM) of sensitivity and specificity across 22 training cell types/tissues, as well as 95.7% accuracy and a 96.8% GM across 20 independent test cell types/tissues. Together, our work illustrates the power of harnessing state-of-the-art deep learning techniques to consistently identify regulatory elements at a genome-wide scale from massively heterogeneous data across diverse cell types/tissues.
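For readers unfamiliar with the GM metric quoted in this abstract, the following is a minimal sketch, assuming GM is the square root of sensitivity times specificity computed from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the PEDLA code or paper.

```python
import math


def gm_and_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity, GM, accuracy) from confusion-matrix counts.

    Assumption: GM = sqrt(sensitivity * specificity).
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)   # fraction of true enhancers recovered
    specificity = tn / (tn + fp)   # fraction of non-enhancers correctly rejected
    gm = math.sqrt(sensitivity * specificity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, gm, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, gm, acc = gm_and_accuracy(tp=90, fn=10, tn=160, fp=40)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} GM={gm:.3f} accuracy={acc:.3f}")
```

Unlike accuracy, GM drops sharply when either sensitivity or specificity is low, which is one reason it is often reported alongside accuracy on imbalanced data.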


Bioinformatics | 2016

De novo identification of replication-timing domains in the human genome by deep learning

Feng Liu; Chao Ren; Hao Li; Pingkun Zhou; Xiaochen Bo; Wenjie Shu

Motivation: The de novo identification of the initiation and termination zones (regions that replicate earlier or later than their upstream and downstream neighbours, respectively) remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid architecture combining a pre-trained deep neural network and a hidden Markov model (DNN-HMM) for the de novo identification of replication domains using replication timing profiles. Our results demonstrate that DNN-HMM can significantly outperform strong, discriminatively trained Gaussian mixture model–HMM (GMM-HMM) systems and six other reported methods that can be applied to this challenge. We applied our trained DNN-HMM to identify distinct replication domain types, namely the early replication domain (ERD), the down transition zone (DTZ), the late replication domain (LRD) and the up transition zone (UTZ), using newly replicated DNA sequencing (Repli-Seq) data across 15 human cells. A subsequent integrative analysis revealed that these replication domains harbour unique genomic and epigenetic patterns, transcriptional activity and higher-order chromosomal structure. Our findings support the ‘replication-domain’ model, which states (1) that ERDs and LRDs, connected by UTZs and DTZs, are spatially compartmentalized structural and functional units of higher-order chromosomal structure, (2) that the adjacent DTZ-UTZ pairs form chromatin loops and (3) that intra-interactions within ERDs and LRDs tend to be short-range and long-range, respectively. Our model reveals an important chromatin organizational principle of the human genome and represents a critical step towards understanding the mechanisms regulating replication timing. Availability and implementation: Our DNN-HMM method and three additional algorithms can be freely accessed at https://github.com/wenjiegroup/DNN-HMM. The replication domain regions identified in this study are available in GEO under the accession ID GSE53984. Supplementary information: Supplementary data are available at Bioinformatics online.


Scientific Reports | 2015

An integrative analysis of TFBS-clustered regions reveals new transcriptional regulation models on the accessible chromatin landscape

Hebing Chen; Hao Li; Feng Liu; Xiaofei Zheng; Shengqi Wang; Xiaochen Bo; Wenjie Shu

DNase I hypersensitive sites (DHSs) define the accessible chromatin landscape and have revolutionised the discovery of distinct cis-regulatory elements in diverse organisms. Here, we report the first comprehensive map of human transcription factor binding site (TFBS)-clustered regions using Gaussian kernel density estimation based on genome-wide mapping of the TFBSs in 133 human cell and tissue types. Approximately 1.6 million distinct TFBS-clustered regions, collectively spanning 27.7% of the human genome, were discovered. The TFBS complexity assigned to each TFBS-clustered region was highly correlated with genomic location, cell selectivity, evolutionary conservation, sequence features, and functional roles. An integrative analysis of these regions using ENCODE data revealed transcription factor occupancy, transcriptional activity, histone modification, DNA methylation, and chromatin structures that varied based on TFBS complexity. Furthermore, we found that we could recreate lineage-branching relationships by simple clustering of the TFBS-clustered regions from terminally differentiated cells. Based on these findings, a model of transcriptional regulation determined by TFBS complexity is proposed.


BMC Bioinformatics | 2008

A novel representation of RNA secondary structure based on element-contact graphs

Wenjie Shu; Xiaochen Bo; Zhiqiang Zheng; Shengqi Wang

Background: Depending on their specific structures, noncoding RNAs (ncRNAs) play important roles in many biological processes. Interest in developing new topological indices based on RNA graphs has been revived in recent years, as such indices can be used to compare, identify and classify RNAs. Although the topological indices presented before characterize the main topological features of RNA secondary structures, information on RNA structural details is ignored to some degree. Therefore, it is necessary to identify topological features with low degeneracy based on complete and fine-grained RNA graphical representations. Results: In this study, we present a complete and fine-grained scheme for RNA graph representation as a new basis for constructing RNA topological indices. We propose a combination of three vertex-weighted element-contact graphs (ECGs) to describe the RNA element details and their adjacency patterns in RNA secondary structure. Both the stem and loop topologies are encoded completely in the ECGs. The relationship among the three typical topological index families defined by their ECGs and RNA secondary structures was investigated using a dataset of 6,305 ncRNAs. The applicability of topological indices is illustrated by three application case studies. Based on the small dataset used, we find that the topological indices can distinguish true pre-miRNAs from pseudo pre-miRNAs with about 96% accuracy, and can cluster known types of ncRNAs with about 98% accuracy. Conclusion: The results indicate that the topological indices can characterize the details of RNA structures and may have a potential role in identifying and classifying ncRNAs. Moreover, these indices may lead to a new approach for discovering novel ncRNAs. However, further research is needed to fully resolve the challenging problem of predicting and classifying noncoding RNAs.


BMC Evolutionary Biology | 2007

In silico genetic robustness analysis of microRNA secondary structures: potential evidence of congruent evolution in microRNA

Wenjie Shu; Xiaochen Bo; Ming Ni; Zhiqiang Zheng; Shengqi Wang

Background: Robustness is a fundamental property of biological systems and is defined as the ability to maintain stable functioning in the face of various perturbations. Understanding how robustness has evolved has become one of the most attractive areas of research for evolutionary biologists, as it is still unclear whether genetic robustness evolved as a direct consequence of natural selection, as an intrinsic property of adaptations, or as a congruent correlate of environmental robustness. Recent studies have demonstrated that the stem-loop structures of microRNA (miRNA) are tolerant to some structural changes and show thermodynamic stability. We therefore hypothesize that genetic robustness may evolve as a correlated side effect of the evolution of environmental robustness. Results: We examine the robustness of 1,082 miRNA genes covering six species. Our data suggest that the stem-loop structures of miRNA precursors exhibit a significantly higher level of genetic robustness, which goes beyond the intrinsic robustness of the stem-loop structure and is not a byproduct of base composition bias. Furthermore, we demonstrate that the phenotype of miRNA buffers against genetic perturbations, and at the same time is also insensitive to environmental perturbations. Conclusion: The results suggest that the increased robustness of miRNA stem-loops may result from congruent evolution for environmental robustness. Potential applications of our findings are also discussed.


Scientific Reports | 2015

Functional annotation of HOT regions in the human genome: implications for human disease and cancer

Hao Li; Hebing Chen; Feng Liu; Chao Ren; Shengqi Wang; Xiaochen Bo; Wenjie Shu

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionately enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.


Bioinformatics | 2017

BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone

Bite Yang; Feng Liu; Chao Ren; Zhangyi Ouyang; Ziwei Xie; Xiaochen Bo; Wenjie Shu

Motivation: Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated sets of enhancers for humans or other species. Thus, there is an urgent need for computational methods that work from limited experimentally validated enhancers and decipher the transcriptional regulatory code encoded in the enhancer sequences. Results: We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Availability and Implementation: Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Supplementary information: Supplementary data are available at Bioinformatics online.

Collaboration


Dive into Wenjie Shu's collaborations.

Top Co-Authors

Xiaochen Bo

National University of Defense Technology

Shengqi Wang

National University of Defense Technology

Zhiqiang Zheng

National University of Defense Technology

Jing Yang

Academy of Military Medical Sciences

Ziwei Xie

Huazhong University of Science and Technology