Zheng Rong Yang | Researchain

Publication

Featured research published by Zheng Rong Yang.

IEEE Transactions on Neural Networks | 2005

Bio-basis function neural network for prediction of protease cleavage sites in proteins

Zheng Rong Yang; Rebecca Thomson

The prediction of protease cleavage sites in proteins is critical to effective drug design. One of the important issues in constructing an accurate and efficient predictor is how to present nonnumerical amino acids to a model effectively. As this issue has not yet received full attention and is closely related to model efficiency and accuracy, we present a novel neural learning algorithm aimed at improving the prediction accuracy and reducing the time involved in training. The algorithm is developed based on the conventional radial basis function neural networks (RBFNNs) and is referred to as a bio-basis function neural network (BBFNN). The basic principle is to replace the radial basis function used in RBFNNs by a novel bio-basis function. Each bio-basis is a feature dimension in a numerical feature space, to which a nonnumerical sequence space is mapped for analysis. The bio-basis function is designed using an amino acid mutation matrix verified in biology. Thus, the biological content in protein sequences can be maximally utilized for accurate modeling. Mutual information (MI) is used to select the most informative bio-bases and an ensemble method is used to enhance the decision-making process, hence improving the prediction accuracy further. The algorithm has been successfully verified in two case studies, namely the prediction of Human Immunodeficiency Virus (HIV) protease cleavage sites and trypsin cleavage sites in proteins.
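
The mapping from a nonnumerical sequence space to a numerical feature space can be illustrated with a short sketch. The abstract does not give the functional form of the bio-basis function, so the exponential form below, the `gamma` parameter, the three-letter alphabet and the scores in `TOY_SIM` are all assumptions made for illustration; a real application would use a full amino acid mutation matrix.

```python
import math

# Hypothetical similarity scores for a toy three-letter alphabet.
# These stand in for a real amino acid mutation matrix.
TOY_SIM = {
    ("A", "A"): 4, ("A", "G"): 1, ("A", "L"): -1,
    ("G", "G"): 5, ("G", "L"): -3,
    ("L", "L"): 4,
}


def pair_score(a, b):
    """Look up the symmetric similarity score of two residues."""
    return TOY_SIM[(a, b)] if (a, b) in TOY_SIM else TOY_SIM[(b, a)]


def similarity(x, z):
    """Sum position-wise similarity scores of two equal-length sequences."""
    if len(x) != len(z):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(x, z))


def bio_basis(x, z, gamma=1.0):
    """Assumed bio-basis form: exp(gamma * (s(x, z) - s(z, z)) / s(z, z)).

    Requires s(z, z) > 0, which holds when all diagonal scores are positive.
    """
    s_zz = similarity(z, z)
    return math.exp(gamma * (similarity(x, z) - s_zz) / s_zz)


def to_features(x, bases, gamma=1.0):
    """Map a sequence to one numerical feature per bio-basis."""
    return [bio_basis(x, z, gamma) for z in bases]


features = to_features("AAL", ["AGL", "LLG"])
```

Under this assumed form, a sequence identical to a bio-basis scores exactly 1 on that dimension, and less similar sequences score closer to 0, so each bio-basis acts as one numerical feature dimension.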

Bioinformatics | 2004

Bio-support vector machines for computational proteomics

Zheng Rong Yang; Kuo-Chen Chou

MOTIVATION: One of the most important issues in computational proteomics is to produce a prediction model for the classification or annotation of the biological function of novel protein sequences. In order to improve the prediction accuracy, much attention has been paid to improving the performance of the algorithms used, but little to solving the fundamental issue, namely amino acid encoding, as most existing pattern recognition algorithms are unable to recognize amino acids in protein sequences. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias. RESULTS: By replacing the kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs to produce a new type of pattern recognition algorithm for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machines. When applied to the prediction of HIV protease cleavage sites, the new method has shown a remarkable advantage in reducing the model complexity and enhancing the model robustness.

Bioinformatics | 2003

Characterizing proteolytic cleavage site activity using bio-basis function neural networks

Rebecca Thomson; T. Charles Hodgman; Zheng Rong Yang; Austin K. Doyle

MOTIVATION: In protein chemistry, proteomics and biopharmaceutical development, there is a desire to know not only where a protein is cleaved by a protease, but also the susceptibility of its cleavage sites. The current tools for proteolytic cleavage prediction have often relied purely on regular expressions, or involve models that do not represent biological data well. RESULTS: A novel methodology for characterizing proteolytic cleavage site activities has been developed, which incorporates two fundamental features: activity class prediction and the use of an amino acid similarity matrix for (non-parametric) neural learning. The first solved the problem of predicting proteolytic efficiency. The second significantly improved the robustness in prediction and reduced the time complexity for learning. This study shows that activity class prediction is successful when applying this methodology to the prediction and characterization of trypsin cleavage sites and the prediction of HIV protease cleavage sites. AVAILABILITY: Requests for software and data should be made respectively to Dr Zheng Rong Yang and Miss Rebecca Thomson.

IEEE Transactions on Neural Networks | 2006

A novel radial basis function neural network for discriminant analysis

Zheng Rong Yang

A novel radial basis function neural network for discriminant analysis is presented in this paper. In contrast to many other studies, this work focuses on the exploitation of the weight structure of radial basis function neural networks using the Bayesian method. It is expected that the performance of a radial basis function neural network with a well-explored weight structure can be improved. As the weight structure of a radial basis function neural network is commonly unknown, the Bayesian method is used in this paper to study this a priori structure. Two weight structures are investigated in this study, i.e., a single-Gaussian structure and a two-Gaussian structure. An expectation-maximization learning algorithm is used to estimate the weights. The simulation results showed that the proposed radial basis function neural network with a weight structure of two Gaussians outperformed the other algorithms.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

Mutual information theory for adaptive mixture models

Zheng Rong Yang; Mark Zwolinski

Many pattern recognition systems need to estimate an underlying probability density function (pdf). Mixture models are commonly used for this purpose in which an underlying pdf is estimated by a finite mixing of distributions. The basic computational element of a density mixture model is a component with a nonlinear mapping function, which takes part in mixing. Selecting an optimal set of components for mixture models is important to ensure an efficient and accurate estimate of an underlying pdf. Previous work has commonly estimated an underlying pdf based on the information contained in patterns. In this paper, mutual information theory is employed to measure whether two components are statistically dependent. If a component has small mutual information, it is statistically independent of the other components. Hence, that component makes a significant contribution to the system pdf and should not be removed. However, if a particular component has large mutual information, it is unlikely to be statistically independent of the other components and may be removed without significant damage to the estimated pdf. Continuing to remove components with large and positive mutual information will give a density mixture model with an optimal structure, which is very close to the true pdf.
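
The dependence measure behind this pruning idea can be sketched as follows. The abstract does not say how mutual information between components is estimated, so this sketch uses the textbook definition for two discrete variables with a known joint probability table; the function name `mutual_information` and the example tables are assumptions for illustration, not the paper's procedure.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of two discrete variables.

    `joint` is a table of joint probabilities p(i, j) that sums to 1.
    """
    row = [sum(r) for r in joint]          # marginal of the first variable
    col = [sum(c) for c in zip(*joint)]    # marginal of the second variable
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:                      # terms with p = 0 contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

Here `independent` is zero because the joint table factorizes into its marginals, while `dependent` equals ln 2 because each variable fully determines the other. In the terms of the abstract, a component whose mutual information with the others is large is a candidate for removal.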

Computational Biology and Chemistry | 2004

Brief communication: Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms

Emily A. Berry; Andrew R. Dalby; Zheng Rong Yang

Protein phosphorylation is a post-translational modification performed by a group of enzymes known as the protein kinases or phosphotransferases (Enzyme Commission classification 2.7). It is essential to the correct functioning of both proteins and cells, being involved with enzyme control, cell signalling and apoptosis. The major problem when attempting prediction of these sites is the broad substrate specificity of the enzymes. This study employs back-propagation neural networks (BPNNs), the decision tree algorithm C4.5 and the reduced bio-basis function neural network (rBBFNN) to predict phosphorylation sites. The aim is to compare the prediction efficiency of the three algorithms for this problem, and to examine their knowledge extraction capability. All three algorithms are effective for phosphorylation site prediction. Results indicate that rBBFNN is the fastest and most sensitive of the algorithms. BPNN has the highest area under the ROC curve and is therefore the most robust, and C4.5 has the highest prediction accuracy. C4.5 also reveals that the amino acid two residues upstream from the phosphorylation site is important for serine/threonine phosphorylation, whilst the amino acid three residues upstream is important for tyrosine phosphorylation.

Archive | 2010

Machine Learning Approaches to Bioinformatics

Zheng Rong Yang

This book covers a wide range of subjects in applying machine learning approaches to bioinformatics projects. The book has two key features. First, it introduces the most widely used machine learning approaches in bioinformatics and discusses, with evaluations from real case studies, how they are used in individual bioinformatics projects. Second, it introduces state-of-the-art bioinformatics research methods. The theoretical parts and the practical parts are well integrated so that readers can follow the existing procedures in individual research. Unlike most of the bioinformatics books on the market, the content coverage is not limited to just one subject. A broad spectrum of relevant topics in bioinformatics, including systematic data mining and computational systems biology research, is brought together in this book, thereby offering an efficient and convenient platform for teaching purposes. An essential reference for both final year undergraduates and graduate students in universities, as well as a comprehensive handbook for new researchers, this book will also serve as a practical guide for software development in relevant bioinformatics projects.

Acta Crystallographica Section D-biological Crystallography | 2006

Honing the in silico toolkit for detecting protein disorder

Robert M. Esnouf; R. Hamer; Joel L. Sussman; Israel Silman; David C. Trudgian; Zheng Rong Yang; Jaime Prilusky

Not all proteins form well-defined three-dimensional structures in their native states. Some amino-acid sequences appear to strongly favour the disordered state, whereas some can apparently transition between disordered and ordered states under the influence of changes in the biological environment, thereby playing an important role in processes such as signalling. Although important biologically, disordered regions of proteins can be disastrous for the structural biologist, even preventing successful structure determination. The accurate prediction of disorder is therefore important, not least for directing the design of expression constructs so as to maximize the chances of successful structure determination. Such design criteria have become integral to the construct-design strategies of laboratories within the Structural Proteomics In Europe (SPINE) consortium. This paper assesses the current state of the art in disorder prediction in terms of prediction reliability and considers how best to use these methods to guide construct design. Finally, it presents a brief discussion as to how methods of prediction might be improved in the future.

Nucleic Acids Research | 2015

Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data

Yang Wu; Binbin Shi; Xinqiang Ding; Tong Liu; Xihao Hu; Kevin Y. Yip; Zheng Rong Yang; David H. Mathews; Zhi John Lu

Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.

Bioinformatics | 2004

Predicting the linkage sites in glycoproteins using bio-basis function neural network

Zheng Rong Yang; Kuo-Chen Chou

MOTIVATION: Although it is known that O-glycosidically linked oligosaccharides are commonly conjugated to a serine, threonine or hydroxylysine residue of the polypeptide, the chemical nature of the anchoring monosaccharide and the size of the oligosaccharide unit vary. Among different types, O-linked or mucin-type oligosaccharides are intimately involved in the secretion of proteins, be they enzymes, hormones or structural glycoproteins. Knowledge of the linkage sites in glycoproteins is critical to the design of specific and efficient inhibitors against the enzyme that catalyses the formation of the carbohydrate-peptide linkage. RESULTS: We present a method for predicting the linkage sites in O-linked glycoproteins using bio-basis function neural networks. The mean prediction accuracy of this method is 91.15 ± 2.75%, while it is 82.28 ± 6.45% using back-propagation neural networks. Importantly, this method has significantly reduced the CPU time for modelling.
