Publication


Featured research published by Yu-Hang Zhang.


Scientific Reports | 2016

Gene expression profiling gut microbiota in different races of humans

Lei Chen; Yu-Hang Zhang; Tao Huang; Yu-Dong Cai

The gut microbiome is shaped and modified by the polymorphisms of microorganisms in the intestinal tract. Its composition shows strong individual specificity and may play a crucial role in the human digestive system and metabolism. Several factors can affect the composition of the gut microbiome, such as eating habits, living environment, and antibiotic usage. Thus, various races are characterized by different gut microbiome characteristics. In the present study, we examined the gut microbiomes of three different races, including individuals of Asian, European and American races. The gut microbiome and the expression levels of gut microbiome genes were analyzed in these individuals. Advanced feature selection methods (minimum redundancy maximum relevance and incremental feature selection) and four machine-learning algorithms (random forest, nearest neighbor algorithm, sequential minimal optimization, Dagging) were employed to capture key differentially expressed genes. As a result, sequential minimal optimization was found to yield the best performance using the 454 genes, which could effectively distinguish the gut microbiomes of different races. Our analyses of extracted genes support the widely accepted hypotheses that eating habits, living environments and metabolic levels in different races can influence the characteristics of the gut microbiome.


Journal of Biomolecular Structure & Dynamics | 2017

Analysis and prediction of drug–drug interaction by minimum redundancy maximum relevance and incremental feature selection

Lili Liu; Lei Chen; Yu-Hang Zhang; Lai Wei; Shiwen Cheng; Xiangyin Kong; Mingyue Zheng; Tao Huang; Yu-Dong Cai

Drug–drug interaction (DDI) defines a situation in which one drug affects the activity of another when both are administered together. DDI is a common cause of adverse drug reactions and sometimes also leads to improved therapeutic effects. Therefore, it is of great interest to discover novel DDIs according to their molecular properties and mechanisms in a robust and rigorous way. This paper attempts to predict effective DDIs using the following properties: (1) chemical interaction between drugs; (2) protein interactions between the targets of drugs; and (3) target enrichment of KEGG pathways. The data consisted of 7323 pairs of DDIs collected from the DrugBank and 36,615 pairs of drugs constructed by randomly combining two drugs. Each drug pair was represented by 465 features derived from the aforementioned three categories of properties. The random forest algorithm was adopted to train the prediction model. Some feature selection techniques, including minimum redundancy maximum relevance and incremental feature selection, were used to extract key features as the optimal input for the prediction model. The extracted key features may help to gain insights into the mechanisms of DDIs and provide some guidelines for the relevant clinical medication developments, and the prediction model can give new clues for identification of novel DDIs.


Artificial Intelligence in Medicine | 2017

Analysis of cancer-related lncRNAs using gene ontology and KEGG pathways

Lei Chen; Yu-Hang Zhang; Guohui Lu; Tao Huang; Yu-Dong Cai

BACKGROUND Cancer is a disease that involves abnormal cell growth and can invade or metastasize to other tissues. It is known that several factors are related to its initiation, proliferation, and invasiveness. Recently, it has been reported that long non-coding RNAs (lncRNAs) can participate in specific functional pathways and further regulate the biological function of cancer cells. Studies on lncRNAs are therefore helpful for uncovering the underlying mechanisms of cancer biological processes. METHODS We investigated cancer-related lncRNAs using gene ontology (GO) terms and KEGG pathway enrichment scores of neighboring genes that are co-expressed with the lncRNAs by extracting important GO terms and KEGG pathways that can help us identify cancer-related lncRNAs. The enrichment theory of GO terms and KEGG pathways was adopted to encode each lncRNA. Then, feature selection methods were employed to analyze these features and obtain the key GO terms and KEGG pathways. RESULTS The analysis indicated that the extracted GO terms and KEGG pathways are closely related to several cancer associated processes, such as hormone associated pathways, energy associated pathways, and ribosome associated pathways, and that they can accurately predict cancer-related lncRNAs. CONCLUSIONS This study provides novel insight into how lncRNAs may affect tumorigenesis and which pathways may play important roles in it. These results could help in understanding the biological mechanisms of lncRNAs and in treating cancer.
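As a rough illustration of how a gene set can be encoded by an enrichment score for one GO term or KEGG pathway, the sketch below uses the negative log10 of a hypergeometric p-value. This is a common definition of an enrichment score; whether the paper uses exactly this formula is an assumption, and the function name and parameters are hypothetical.

```python
from math import comb, log10


def enrichment_score(total_genes, term_genes, set_size, overlap):
    """Return -log10 of the hypergeometric tail probability P(X >= overlap).

    total_genes: number of genes in the background (assumed known)
    term_genes:  number of background genes annotated with the term
    set_size:    size of the gene set linked to one lncRNA
    overlap:     number of genes in the set annotated with the term
    """
    upper = min(term_genes, set_size)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., upper.
    numerator = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, set_size - k)
        for k in range(overlap, upper + 1)
    )
    p_value = numerator / comb(total_genes, set_size)
    return -log10(p_value)
```

Computing one such score per term for each lncRNA would give a numeric feature vector that feature selection methods can then rank.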


Molecular Genetics and Genomics | 2016

Identifying novel protein phenotype annotations by hybridizing protein–protein interactions and protein sequence similarities

Lei Chen; Yu-Hang Zhang; Tao Huang; Yu-Dong Cai

Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era because effective and accurate investigation of protein phenotypes is one of the most critical procedures for identifying functional biological processes at the microscale. Such investigation involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology. Therefore, we have developed a novel computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein–protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method called random walk with restart (RWR). The biological functions of proteins for each phenotype identified by our SP method and the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotations, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for the prediction of proteins involved in all eleven clustered yeast phenotypes, with a low false positive rate. Considering the universality and generalizability of our supporting materials and computing strategies, our method can further be applied to study other organisms, and the new functions we predicted can provide pertinent guidance for further experimental verification.


BioMed Research International | 2016

Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm

ShaoPeng Wang; Yu-Hang Zhang; Jing Lu; Weiren Cui; Jerry Hu; Yu-Dong Cai

The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching for effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon request.
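The combination of incremental feature selection (IFS) with a nearest neighbor classifier can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes a feature ranking is already available (for example from an mRMR step, which is not implemented here), uses leave-one-out accuracy as the score for simplicity, and all function names are hypothetical.

```python
def loo_accuracy(samples, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the features listed in feature_idx."""
    correct = 0
    for i, x in enumerate(samples):
        best_j, best_d = None, float("inf")
        for j, y in enumerate(samples):
            if i == j:
                continue  # never compare a sample with itself
            # Squared Euclidean distance over the chosen features only.
            d = sum((x[k] - y[k]) ** 2 for k in feature_idx)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-1, top-2, ... ranked features and keep the
    smallest prefix that reaches the best score."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        score = loo_accuracy(samples, labels, ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score
```

In this sketch ties are broken in favor of fewer features, which is one reasonable design choice; a real study might instead select by MCC or another metric.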


IEEE Access | 2017

Identify Key Sequence Features to Improve CRISPR sgRNA Efficacy

Lei Chen; ShaoPeng Wang; Yu-Hang Zhang; JiaRui Li; Zhihao Xing; Jialiang Yang; Tao Huang; Yu-Dong Cai

The CRISPR/Cas9 system is a creative and innovative gene editing biotechnology tool in genetic engineering. Although several achievements have been attained using the CRISPR/Cas9 system, it is still a challenge to avoid off-target effects and improve the editing efficacy. Previous efforts on evaluating the efficacy and designing the guide RNA mainly focused on DNA properties. However, some DNA features have not been characterized but can be reflected by protein properties, such as the disorder features and the sequence conservation. In this paper, we provided a computational framework to identify important features related to the efficacy of CRISPR/Cas9 focusing on the properties of the proteins encoded by the target DNA fragments. The feature selection method, maximal-relevance-minimal-redundancy, was adopted to analyze these features. Incremental feature selection, together with a support vector machine, was employed to extract optimal features, on which an optimal classifier can be constructed. As a result, 152 important features were extracted, with which an optimal classifier based on support vector machine was built. This classifier obtained the highest MCC value of 0.355. Finally, a series of detailed biological analyses were performed on the optimal features. From the results, we found that some key factors may differentially affect the binding activity of sgRNAs to their targets. Among them, the disorder status of the target protein sequences was found to be a major factor that is related to the efficacy of sgRNAs, suggesting the DNA features associated with the protein disorder status could also affect the CRISPR/Cas9 efficacy.
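For readers unfamiliar with the MCC value quoted above, the following sketch computes the Matthews correlation coefficient from confusion-matrix counts. It assumes a two-class setting, which the abstract does not state explicitly, and the function name is hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when
    one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction).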


Scientific Reports | 2016

Identification of novel candidate drivers connecting different dysfunctional levels for lung adenocarcinoma using protein-protein interactions and a shortest path approach

Lei Chen; Tao Huang; Yu-Hang Zhang; Yang Jiang; Mingyue Zheng; Yu-Dong Cai

Tumors are formed by the abnormal proliferation of somatic cells with disordered growth regulation under the influence of tumorigenic factors. Recently, the theory of “cancer drivers” connects tumor initiation with several specific mutations in the so-called cancer driver genes. According to the differentiation of four basic levels between tumor and adjacent normal tissues, the cancer drivers can be divided into the following: (1) Methylation level, (2) microRNA level, (3) mutation level, and (4) mRNA level. In this study, a computational method is proposed to identify novel lung adenocarcinoma drivers based on dysfunctional genes on the methylation, microRNA, mutation and mRNA levels. First, a large network was constructed using protein-protein interactions. Next, we searched all of the shortest paths connecting dysfunctional genes on different levels and extracted new candidate genes lying on these paths. Finally, the obtained candidate genes were filtered by a permutation test and an additional strict selection procedure involving a betweenness ratio and an interaction score. Several candidate genes remained, which are deemed to be related to two different levels of cancer. The analyses confirmed our assertions that some have the potential to contribute to the tumorigenesis process on multiple levels.
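The shortest-path step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the network is treated as unweighted, only one shortest path is recovered per gene pair (the abstract mentions all shortest paths), and the permutation test, betweenness ratio, and interaction score filters are omitted. The function names and the adjacency-dictionary format are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target using
    breadth-first search, or None if no path exists."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidates_between(graph, level_a, level_b):
    """Collect interior nodes of shortest paths that connect genes of
    one dysfunctional level to genes of another, skipping known genes."""
    known = set(level_a) | set(level_b)
    found = set()
    for a in level_a:
        for b in level_b:
            path = shortest_path(graph, a, b)
            if path:
                found.update(n for n in path[1:-1] if n not in known)
    return found
```

On a toy network where gene A reaches gene B through X in two steps or through Y and Z in three, only X is reported as a candidate.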


Journal of Cellular Biochemistry | 2018

Identification of gene expression signatures across different types of neural stem cells with the Monte-Carlo feature selection method

Lei Chen; JiaRui Li; Yu-Hang Zhang; Kai-Yan Feng; ShaoPeng Wang; YunHua Zhang; Tao Huang; Xiangyin Kong; Yu-Dong Cai

Adult neural stem cells (NSCs) are a group of multi‐potent, self‐renewing progenitor cells that contribute to the generation of new neurons and oligodendrocytes. Three subtypes of NSCs can be isolated based on the stages of the NSC lineage, including quiescent neural stem cells (qNSCs), activated neural stem cells (aNSCs) and neural progenitor cells (NPCs). Although it is widely accepted that these three groups of NSCs play different roles in the development of the nervous system, their molecular signatures are poorly understood. In this study, we applied the Monte‐Carlo Feature Selection (MCFS) method to identify the gene expression signatures, which can yield a Matthews correlation coefficient (MCC) value of 0.918 with a support vector machine evaluated by ten‐fold cross‐validation. In addition, some classification rules yielded by the MCFS program for distinguishing above three subtypes were reported. Our results not only demonstrate a high classification capacity and subtype‐specific gene expression patterns but also quantitatively reflect the pattern of the gene expression levels across the NSC lineage, providing insight into deciphering the molecular basis of NSC differentiation.


Journal of Proteome Research | 2017

Identification of Genes Associated with Breast Cancer Metastasis to Bone on a Protein–Protein Interaction Network with a Shortest Path Algorithm

Yu-Dong Cai; Qing Zhang; Yu-Hang Zhang; Lei Chen; Tao Huang

Tumor metastasis is defined as the spread of tumor cells from one organ or part to another that is not directly connected to it, which significantly contributes to the progression and aggravation of tumorigenesis. Because it always involves multiple organs, the metastatic process is difficult to study in its entirety. Complete identification of the genes related to this process is an alternative way to study metastasis. In this study, we developed a computational method to identify such genes. To test our method, we selected breast cancer bone metastasis. A large network was constructed using human protein-protein interactions. On the basis of the validated genes related to breast and bone cancer, a shortest path algorithm was applied to the network to search for novel genes that may mediate breast cancer metastasis to bone. In addition, further rules constructed using the permutation FDR, the betweenness ratio, and the max-min interaction score were also employed in the method to make the inferred genes more reliable. Eighteen putative genes were identified by the method and were extensively analyzed. The confirmation results indicate that these genes participate in metastasis.


International Journal of Molecular Sciences | 2017

Determination of Genes Related to Uveitis by Utilization of the Random Walk with Restart Algorithm on a Protein–Protein Interaction Network

Shiheng Lu; Yan Yan; Zhen Li; Lei Chen; Jing Yang; Yu-Hang Zhang; ShaoPeng Wang; Lin Liu

Uveitis, defined as inflammation of the uveal tract, may cause blindness in both young and middle-aged people. Approximately 10–15% of blindness in the West is caused by uveitis. Therefore, a comprehensive investigation to determine the disease pathogenesis is urgent, as it will make it possible to design effective treatments. Identification of the disease genes that cause uveitis is an important requirement to achieve this goal. To begin to address this need, in this study, a computational method was proposed to identify novel uveitis-related genes. This method was executed on a large protein–protein interaction network and employed a popular ranking algorithm, the Random Walk with Restart (RWR) algorithm. To improve the utility of the method, a permutation test and a procedure for selecting core genes were added, which helped to exclude false discoveries and select the most important candidate genes. Five-fold cross-validation was adopted to evaluate the method, yielding an average F1-measure of 0.189. In addition, we compared our method with a classic GBA-based method to further indicate its utility. Based on our method, 56 putative genes were chosen for further assessment. We have determined that several of these genes (e.g., CCL4, Jun, and MMP9) are likely to be important for the pathogenesis of uveitis.
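The Random Walk with Restart ranking can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary in which every node appears as a key, and the restart probability of 0.5 is an arbitrary placeholder. The permutation test and core-gene selection are not shown, and the function name is hypothetical.

```python
def random_walk_with_restart(graph, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W spreads each node's probability evenly over its neighbors, and p0
    puts equal mass on the seed (known disease) genes.
    """
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                continue  # isolated node: nothing to spread
            share = (1.0 - restart) * p[n] / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p
```

Non-seed genes with the highest final probabilities would be the candidates passed on to later filtering steps.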

Collaboration

Yu-Hang Zhang's top co-authors and their affiliations:

Tao Huang, Chinese Academy of Sciences
Lei Chen, Shanghai Maritime University
Xiangyin Kong, Chinese Academy of Sciences
Xiaoyong Pan, Erasmus University Rotterdam
Mingyue Zheng, Chinese Academy of Sciences
YunHua Zhang, Anhui Agricultural University