Debby D. Wang
City University of Hong Kong
Publication
Featured research published by Debby D. Wang.
IEEE Transactions on Systems, Man, and Cybernetics | 2014
Xi-Zhao Wang; Debby D. Wang
An important way to improve the performance of naive Bayesian classifiers (NBCs) is to remove or relax the fundamental assumption of independence among the attributes, which usually results in an estimation of joint probability density function (p.d.f.) instead of the estimation of marginal p.d.f. in the NBC design. This paper proposes a non-naive Bayesian classifier (NNBC) in which the independence assumption is removed and the marginal p.d.f. estimation is replaced by the joint p.d.f. estimation. A new technique of estimating the class-conditional p.d.f. based on the optimal bandwidth selection, which is the crucial part of the joint p.d.f. estimation, is applied in our NNBC. Three well-known indexes for measuring the performance of Bayesian classifiers, which are classification accuracy, area under receiver operating characteristic curve, and probability mean square error, are adopted to conduct a comparison among the four Bayesian models, i.e., normal naive Bayesian, flexible naive Bayesian (FNB), the homologous model of FNB (FNBROT), and our proposed NNBC. The comparative results show that NNBC is statistically superior to the other three models regarding the three indexes. In a comparison with support vector machines and four boosting-based classification methods, NNBC also achieves relatively favorable classification accuracy while significantly reducing the training time.
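The idea of replacing marginal density estimates with a joint estimate can be illustrated with a minimal sketch. It assumes a Gaussian product kernel with one fixed bandwidth `h`; the paper's optimal bandwidth selection is not reproduced here, and the names `joint_kde`, `predict`, and `TRAIN`, as well as the toy data, are hypothetical.

```python
import math

# Hypothetical toy training set: class label -> list of 2-D samples.
TRAIN = {
    "a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2)],
    "b": [(2.0, 2.0), (2.1, 1.8), (1.9, 2.2)],
}

def joint_kde(x, samples, h):
    """Joint density estimate at x using a Gaussian kernel of bandwidth h.

    Each kernel is centered on a whole sample vector, so the estimate
    does not factor into per-attribute marginals.
    """
    d = len(x)
    norm = (math.sqrt(2.0 * math.pi) * h) ** d
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)

def predict(x, train_by_class, h):
    """Return the class maximizing prior * joint class-conditional density."""
    n = sum(len(pts) for pts in train_by_class.values())
    scores = {
        label: (len(pts) / n) * joint_kde(x, pts, h)
        for label, pts in train_by_class.items()
    }
    return max(scores, key=scores.get)
```

A fixed bandwidth keeps the sketch short; in practice the bandwidth strongly affects the estimate, which is why the abstract calls its selection the crucial part.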
Neurocomputing | 2014
Debby D. Wang; Ran Wang; Hong Yan
Conventional machine learning methods can be used to identify protein-protein interaction sites and study the gene regulatory networks and functions. However, when applied to large datasets, the computational complexities of these methods become a major drawback. With a significantly reduced computational complexity, Extreme Learning Machines provide an attractive balance between computational time and generalization performance. In the method proposed in this paper, after searching for interfacial residues using a dynamic strategy and extracting spatially neighboring residue profiles for a set of 563 non-redundant protein chains, we implement the interface prediction either on multi-chain sets or on single-chain sets, using both Extreme Learning Machines and support vector machines for a comparative study. As a result, in both the multi-chain and single-chain cases Extreme Learning Machines tend to obtain higher Recall values than support vector machines, and in the multi-chain case Extreme Learning Machines also show a remarkable advantage in computational speed.
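The low training cost of an Extreme Learning Machine comes from fixing the hidden layer at random and solving only for the output weights in closed form. The following is a minimal sketch of that general scheme, not the configuration used in the paper: the sigmoid activation, the ridge term, the weight range, and all names (`train_elm`, `predict_elm`, `XS`, `TS`) are assumptions chosen for illustration.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def hidden(x, weights, biases):
    """Sigmoid hidden-layer outputs for one input vector."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
            for ws, b in zip(weights, biases)]

def train_elm(xs, ts, n_hidden=8, ridge=1e-3, seed=0):
    """Draw random hidden weights, then fit output weights by ridge regression."""
    rng = random.Random(seed)
    dim = len(xs[0])
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden(x, weights, biases) for x in xs]
    # Normal equations: (H^T H + ridge * I) beta = H^T t
    a = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    b = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    return weights, biases, solve(a, b)

def predict_elm(model, x):
    """Real-valued output; its sign can serve as a binary class label."""
    weights, biases, beta = model
    return sum(bk * hk for bk, hk in zip(beta, hidden(x, weights, biases)))

# Hypothetical 1-D toy data with targets -1 / +1.
XS = [(-2.0,), (-1.5,), (-1.0,), (1.0,), (1.5,), (2.0,)]
TS = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
```

No iterative weight updates are involved: training reduces to one linear solve whose size is set by the number of hidden nodes.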
Scientific Reports | 2013
Debby D. Wang; Weiqiang Zhou; Hong Yan; Maria Pik Wong; Victor C. S. Lee
EGFR mutation-induced drug resistance has significantly impaired the potency of small molecule tyrosine kinase inhibitors in lung cancer treatment. Computational approaches can provide powerful and efficient techniques in the investigation of drug resistance. In our work, the EGFR mutation feature is characterized by the energy components of binding free energy (concerning the mutant-inhibitor complex), and we combine it with specific personal features for 168 clinical subjects to construct a personalized drug resistance prediction model. The 3D structure of an EGFR mutant is computationally predicted from its protein sequence, after which the dynamics of the bound mutant-inhibitor complex is simulated via AMBER and the binding free energy of the complex is calculated based on the dynamics. The utilization of extreme learning machines and leave-one-out cross-validation promises a successful identification of resistant subjects with high accuracy. Overall, our study demonstrates advantages in the development of personalized medicine/therapy design and innovative drug discovery.
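Leave-one-out cross-validation, mentioned above, holds out each subject once and trains on all the others. The sketch below shows only that evaluation loop. A 1-nearest-neighbor rule stands in for the classifier (the study itself uses extreme learning machines), and the data in `TOY` are invented for illustration.

```python
def nearest_neighbor_label(x, train):
    """Stand-in classifier: label of the closest training sample."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
    return closest[1]

def leave_one_out_accuracy(data, classify):
    """Hold out each (features, label) pair once and classify it from the rest."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if classify(x, rest) == y:
            correct += 1
    return correct / len(data)

# Hypothetical data: (feature vector, label) pairs.
TOY = [
    ((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
    ((1.0,), 1), ((1.1,), 1), ((1.2,), 1),
]
```

With n subjects the classifier is trained n times, which is affordable for a cohort of 168 when each training run is cheap.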
BMC Bioinformatics | 2015
Lichun Ma; Debby D. Wang; Yiqing Huang; Hong Yan; Maria Pik Wong; Victor C. S. Lee
Background: Epidermal growth factor receptor (EGFR) mutation-induced drug resistance has caused great difficulties in the treatment of non-small-cell lung cancer (NSCLC). However, structural information is available for just a few EGFR mutants. In this study, we created an EGFR Mutant Structural Database (freely available at http://bcc.ee.cityu.edu.hk/data/EGFR.html), including the 3D EGFR mutant structures and their corresponding binding free energies with two commonly used inhibitors (gefitinib and erlotinib). Results: We collected the information of 942 NSCLC patients belonging to 112 mutation types. These mutation types are divided into five groups (insertion, deletion, duplication, modification and substitution), and substitution accounts for 61.61% of the mutation types and 54.14% of all the patients. Among all the 942 patients, 388 cases experienced a mutation at residue site 858 with leucine replaced by arginine (L858R), making it the most common mutation type. Moreover, 36 (32.14%) mutation types occur at exon 19, and 419 (44.48%) patients carried a mutation at exon 21. In this study, we predicted the EGFR mutant structures using Rosetta with the collected mutation types. In addition, Amber was employed to refine the structures followed by calculating the binding free energies of mutant-drug complexes. Conclusions: The EGFR Mutant Structural Database provides resources of 3D structures and the binding affinity with inhibitors, which can be used by other researchers to study NSCLC further and by medical doctors as reference for NSCLC treatment.
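The percentages quoted in the abstract can be checked directly from the stated counts. The helper `percent` is a hypothetical name; the L858R share at the end is derived here from the stated counts and is not a figure given in the abstract.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

exon19_types = percent(36, 112)      # mutation types at exon 19
exon21_patients = percent(419, 942)  # patients with an exon 21 mutation
l858r_patients = percent(388, 942)   # derived: share of patients with L858R

print(exon19_types, exon21_patients, l858r_patients)
```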
Computers in Biology and Medicine | 2015
Lichun Ma; Debby D. Wang; Yiqing Huang; Maria Pik Wong; Victor Ho Fun Lee; Hong Yan
Epidermal growth factor receptor (EGFR) mutation-induced drug resistance leads to a limited efficacy of tyrosine kinase inhibitors during lung cancer treatments. In this study, we explore the correlations between the local surface geometric properties of EGFR mutants and the progression-free survival (PFS). The geometric properties include local surface changes (four types) of the EGFR mutants compared with the wild-type EGFR, and the convex degrees of these local surfaces. Our analysis results show that the Spearman's rank correlation coefficients between the PFS and three types of local surface properties are all greater than 0.6 with small P-values, implying a high significance. Moreover, the number of atoms with solid angles in the ranges of [0.71, 1], [0.61, 1] or [0.5, 1], indicating the convex degree of a local EGFR surface, also shows a strong correlation with the PFS. Overall, these characteristics can be efficiently applied to the prediction of drug resistance in lung cancer treatments, and easily extended to other cancer treatments.
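Spearman's rank correlation, used above to relate surface properties to PFS, is the Pearson correlation computed on ranks. A minimal sketch follows, with ties given their average rank; the function names are hypothetical, and no data from the study are reproduced.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx)
    sy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sx * sy)
```

Because only ranks enter the computation, any monotonic relationship yields a coefficient of magnitude 1, whether or not it is linear.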
PLOS ONE | 2015
Debby D. Wang; Lichun Ma; Maria Pik Wong; Victor Ho Fun Lee; Hong Yan
EGFR mutation-induced drug resistance has become a major threat to the treatment of non-small-cell lung carcinoma. Essentially, the resistance mechanism involves modifications of the intracellular signaling pathways. In our work, we separately investigated the EGFR and ErbB-3 heterodimerization, regarded as the origin of intracellular signaling pathways. On one hand, we combined the molecular interaction in EGFR heterodimerization with that between the EGFR tyrosine kinase and its inhibitor. For 168 clinical subjects, we characterized their corresponding EGFR mutations using molecular interactions, with three potential dimerization partners (ErbB-2, IGF-1R and c-Met) of EGFR and two of its small molecule inhibitors (gefitinib and erlotinib). Based on molecular dynamics simulations and structural analysis, we modeled these mutant-partner or mutant-inhibitor interactions using binding free energy and its components. As a result, the mutant-partner interactions are amplified for mutants L858R and L858R_T790M, compared to the wild type EGFR. Mutant delL747_P753insS represents the largest difference between the mutant-IGF-1R interaction and the mutant-inhibitor interaction, which explains the shorter progression-free survival of an inhibitor to this mutant type. Besides, feature sets including different energy components were constructed, and efficient regression trees were applied to map these features to the progression-free survival of an inhibitor. On the other hand, we comparatively examined the interactions between ErbB-3 and its partners (EGFR mutants, IGF-1R, ErbB-2 and c-Met). Compared to others, c-Met shows a remarkably strong binding with ErbB-3, implying its significant role in regulating ErbB-3 signaling. Moreover, EGFR mutants corresponding to poor clinical outcomes, such as L858R_T790M, possess lower binding affinities with ErbB-3 than c-Met does. This may promote the communication between ErbB-3 and c-Met in these cancer cells.
The analysis verified the important contribution of IGF-1R or c-Met in the drug resistance mechanism developed in lung cancer treatments, which may bring many benefits to specialized therapy design and innovative drug discovery.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2013
Ran Wang; Sam Kwong; Debby D. Wang
It is experimentally observed that the approximate errors of extreme learning machine (ELM) are dependent on the uniformity of training samples after the network architecture is fixed, and the uniformity, which is usually measured by the variance of distances among samples, varies with the linear transformation induced by the random weight matrix. By analyzing the dimension increase process in ELM, this paper gives an approximate relation between the uniformities before and after the linear transformation. Furthermore, by restricting ELM with a two-dimensional space, it gives an upper bound of ELM approximate error which is dependent on the distributive uniformity of training samples. The analytic results provide some useful guidelines to make clear the impact of random weights on ELM approximate ability and improve ELM prediction accuracy.
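The uniformity measure described above, the variance of distances among samples, can be computed before and after a random linear map to see how the transformation changes it. This is a minimal sketch: the uniform weight range, the seed, and the names `pairwise_distance_variance` and `random_projection` are assumptions, and the paper's analytic relation is not reproduced.

```python
import math
import random

def pairwise_distance_variance(points):
    """Variance of Euclidean distances over all unordered pairs of points."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)

def random_projection(points, out_dim, seed=0):
    """Map each point through a random weight matrix (no bias, no activation)."""
    rng = random.Random(seed)
    in_dim = len(points[0])
    w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [[sum(wi * xi for wi, xi in zip(row, p)) for row in w] for p in points]

# Hypothetical samples: an equilateral triangle is perfectly uniform (variance near 0).
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
before = pairwise_distance_variance(TRIANGLE)
after = pairwise_distance_variance(random_projection(TRIANGLE, out_dim=5))
```

Comparing `before` and `after` across several seeds gives an empirical view of how much the random weights alone perturb the uniformity.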
Computational Intelligence in Bioinformatics and Computational Biology | 2013
Weiqiang Zhou; Debby D. Wang; Hong Yan; Maria Pik Wong; Victor C. S. Lee
Mutations in the EGFR kinase domain can cause non-small-cell lung cancer, which is one of the most lethal diseases in the world. However, current therapy is limited by drug resistance in different EGFR mutants. There is an urgent demand for computational methods to predict drug-resistant mutations. In this study, we use quantum mechanics and molecular mechanics models to generate EGFR mutants, and apply molecular dynamics to simulate EGFR-drug interactions. Hydrogen bonds and binding free energy are used to reveal the underlying principle of drug resistance in EGFR. The results show that drug-resistant mutants do not establish hydrogen bonds between the drug and the protein molecule while having a large binding free energy. These properties can be used to predict resistance to anti-EGFR drugs due to protein mutations.
Physical Biology | 2011
Debby D. Wang; Hong Yan
Nucleosomes, which contain DNA and proteins, are the basic unit of eukaryotic chromatins. Polymers such as DNA and proteins are dynamic, and their conformational changes can lead to functional changes. Periodic dinucleotide patterns exist in nucleosomal DNA chains and play an important role in the nucleosome structure. In this paper, we use normal mode analysis to detect significant structural deformations of nucleosomal DNA and investigate the relationship between periodic dinucleotides and DNA motions. We have found that periodic dinucleotides are usually located at the peaks or valleys of DNA and protein motions, revealing that they dominate the nucleosome dynamics. Also, a specific dinucleotide pattern CA/TG appears most frequently.
Fuzzy Sets and Systems | 2015
Debby D. Wang; Weiqiang Zhou; Hong Yan
It is a great challenge to process big data in bioinformatics. In this paper, we addressed the problem of identifying protein-protein interfacial residues from massive protein structural data. A protein set, comprising 154,993 residues, was analyzed. We applied the three-dimensional alpha shape modeling to the search of surface and interfacial residues in this set, and adopted the spatially neighboring residue profiles to characterize each residue. These residue profiles, which revealed the sequential and spatial information of proteins, translated the original data into a large matrix. After vertically and horizontally refining this matrix, we comparatively implemented a series of popular learning procedures, including neuro-fuzzy classifiers (NFCs), CART, neighborhood classifiers (NECs), extreme learning machines (ELMs) and naive Bayesian classifiers (NBCs), to predict the interfacial residues, aiming to investigate the sensitivity of these massive structural data to different learning mechanisms. As a result, ELMs, CART and NFCs performed better in terms of computational costs; NFCs, NBCs and ELMs provided favorable prediction accuracies. Overall, NFCs, NBCs and ELMs are favorable choices for handling this type of data quickly and accurately. More importantly, the marginal differences between the prediction performances of these methods imply the insensitivity of this type of data to different learning mechanisms.