Hailong Zhu
Hong Kong Polytechnic University
Publications
Featured research published by Hailong Zhu.
Pattern Recognition | 2010
Lin Zhang; Lei Zhang; David Zhang; Hailong Zhu
Biometric-based personal authentication is an effective method for automatically recognizing a person's identity with high confidence. Observing that the texture pattern produced by bending the finger knuckle is highly distinctive, in this paper we present a new biometric authentication system using finger-knuckle-print (FKP) imaging. A specific data acquisition device is constructed to capture the FKP images, and an efficient FKP recognition algorithm is then presented to process the acquired data in real time. The local convex direction map of the FKP image is extracted, based on which a local coordinate system is established to align the images and a region of interest is cropped for feature extraction. For matching two FKPs, a feature extraction scheme that combines orientation and magnitude information extracted by Gabor filtering is proposed. An FKP database consisting of 7920 images from 660 different fingers is established to verify the efficacy of the proposed system, and promising results are obtained. Compared with other existing finger-back-surface-based biometric systems, the proposed FKP system achieves a much higher recognition rate and works in real time. It provides a practical solution for finger-back-surface-based biometric systems and has great potential for commercial applications.
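The abstract above mentions orientation and magnitude information extracted by Gabor filtering. As a rough illustration only, the sketch below applies a small bank of real Gabor kernels to one image patch and keeps the index of the strongest-responding orientation together with its magnitude. The function names, the six orientations, the wavelength, the Gaussian width and the winner-take-all rule are all assumptions made for this example and are not taken from the paper.

```python
import math

def gabor_real(dx, dy, theta, wavelength=6.0, sigma=3.0):
    """Real part of an isotropic Gabor kernel at offset (dx, dy).

    Hypothetical parameter values; the paper's filter settings are not given here.
    """
    xr = dx * math.cos(theta) + dy * math.sin(theta)
    envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * xr / wavelength)

def orientation_code(patch, n_orient=6, wavelength=6.0, sigma=3.0):
    """Return (orientation index, response magnitude) at the patch center.

    `patch` is a square list of lists with odd side length. The rule used here
    (largest absolute response wins) is an assumed simplification.
    """
    half = len(patch) // 2
    best_j, best_mag = 0, -1.0
    for j in range(n_orient):
        theta = j * math.pi / n_orient  # orientations cover [0, pi)
        resp = 0.0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                resp += patch[dy + half][dx + half] * gabor_real(
                    dx, dy, theta, wavelength, sigma)
        if abs(resp) > best_mag:
            best_j, best_mag = j, abs(resp)
    return best_j, best_mag
```

Applied at every pixel of a region of interest, such a function would yield an orientation map and a magnitude map that could then be compared between two images.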
Pattern Recognition | 2011
Lin Zhang; Lei Zhang; David Zhang; Hailong Zhu
Biometric authentication is an effective method for automatically recognizing a person's identity. Recently, it has been found that the finger-knuckle-print (FKP), which refers to the inherent skin patterns of the outer surface around the phalangeal joint of one's finger, is highly capable of discriminating between individuals, making it an emerging biometric identifier. In this paper, based on results from psychophysics and neurophysiology studies showing that both local and global information is crucial for image perception, we present an effective FKP recognition scheme that extracts and assembles local and global features of FKP images. Specifically, the orientation information extracted by Gabor filters is coded as the local feature. As the scale of the Gabor filters increases to infinity, the filter response becomes the Fourier transform of the image, and hence the Fourier transform coefficients of the image can be taken as the global features. These local and global features are naturally linked via the framework of time-frequency analysis. The proposed scheme exploits both local and global information for FKP verification, and the global information is also used to refine the alignment of FKP images during matching. The final matching distance between two FKPs is a weighted average of the local and global matching distances. Experimental results on our FKP database demonstrate that the proposed local-global information combination scheme significantly improves on the recognition accuracy obtained with either local or global information alone and leads to promising performance for an FKP-based personal authentication system.
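The fusion step described above, a weighted average of the local and global matching distances, can be written in a few lines. This is a minimal sketch: the function name, the default weight and the example distances are hypothetical, and the abstract does not say how the weight is chosen.

```python
def fuse_distances(d_local, d_global, w_local=0.5):
    """Weighted average of a local and a global matching distance.

    `w_local` is an assumed weight in [0, 1]; the global distance
    receives the remaining weight 1 - w_local.
    """
    if not 0.0 <= w_local <= 1.0:
        raise ValueError("w_local must lie in [0, 1]")
    return w_local * d_local + (1.0 - w_local) * d_global

# Example with made-up distances: 0.75 * 0.2 + 0.25 * 0.6
fused = fuse_distances(0.2, 0.6, w_local=0.75)
```

A verification decision would then compare the fused distance with a threshold tuned on held-out data.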
BMC Bioinformatics | 2015
Guoxian Yu; Hailong Zhu; Carlotta Domeniconi
Background: Protein function prediction aims to assign biological or biochemical functions to proteins. It is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. Current predictive models often assume that the labels of the labeled proteins are complete, i.e., that no label is missing. In real scenarios, however, we may be aware of only some hierarchical labels of a protein, and we may not know whether additional ones are actually present. The scenario of incomplete hierarchical labels, a challenging and practical problem, is seldom studied in protein function prediction. Results: In this paper, we propose an algorithm to Predict protein functions using Incomplete hierarchical LabeLs (PILL for short). PILL takes into account the hierarchical and the flat taxonomy similarity between function labels, and defines a Combined Similarity (ComSim) to measure the correlation between labels. PILL estimates the missing labels of a protein based on ComSim and the known labels of the protein, and uses regularization to exploit the interactions between proteins for function prediction. PILL is shown to outperform other related techniques in replenishing missing labels and in predicting the functions of completely unlabeled proteins on publicly available PPI datasets annotated with MIPS Functional Catalogue and Gene Ontology labels. Conclusion: The empirical study shows that it is important to consider incomplete annotation in protein function prediction. The proposed method (PILL) can serve as a valuable tool for protein function prediction using incomplete labels. The Matlab code of PILL is available upon request.
Scientific Reports | 2015
C. George Priya Doss; B. Rajith; Chiranjib Chakraborty; N. Nagasundaram; Shabana Kouser Ali; Hailong Zhu
Some individuals with non-small-cell lung cancer (NSCLC) benefit from therapies targeting the epidermal growth factor receptor (EGFR), and the characterization of a new mechanism of resistance to the EGFR-specific inhibitor gefitinib will provide valuable insight into how therapeutic strategies might be designed to overcome this particular resistance mechanism. The G719S and T790M mutations and their combination have been implicated in causing different conformational redistributions of EGFR. In the present computational study, we analyzed the impact and structural influence of the G719S/T790M double mutation (DM) in EGFR bound to the ligand gefitinib through molecular dynamics simulation (50 ns) and docking analysis. We observed an increase in the distance between the functional loop and the activation loop for the T790M mutation compared to the G719S mutation. Furthermore, we confirmed that the G719S mutation causes the ligand to move closer to the hinge region, whereas T790M makes the ligand escape from the binding pocket. The results provide an explanation for the resistance induced by T790M and a vital clue for the design of drugs to combat gefitinib resistance.
Theranostics | 2014
Doss C. Priya George; Chiranjib Chakraborty; Sa Syed Haneef; Nagarajan NagaSundaram; Luonan Chen; Hailong Zhu
Heterozygous mutations in the central glycolytic enzyme glucokinase (GCK) can result in an autosomal dominant inherited disease, namely maturity-onset diabetes of the young, type 2 (MODY 2). MODY 2 is characterised by early onset: it usually appears before 25 years of age and presents as a mild form of hyperglycaemia. In recent years, the number of known GCK mutations has markedly increased. As a result, interpreting which mutations cause a disease or confer susceptibility to a disease and characterising these deleterious mutations can be a difficult task in large-scale analyses and may be impossible when using a structural perspective. The laborious and time-consuming nature of the experimental analysis led us to attempt to develop a cost-effective computational pipeline for diabetic research that is based on the fundamentals of protein biophysics and that facilitates our understanding of the relationship between phenotypic effects and evolutionary processes. In this study, we investigate missense mutations in the GCK gene by using a wide array of evolution- and structure-based computational methods, such as SIFT, PolyPhen2, PhD-SNP, SNAP, SNPs&GO, fathmm, and Align GVGD. Based on the computational prediction scores obtained using these methods, three mutations, namely E70K, A188T, and W257R, were identified as highly deleterious on the basis of their effects on protein structure and function. Using the evolutionary conservation predictors Consurf and Scorecons, we further demonstrated that most of the predicted deleterious mutations, including E70K, A188T, and W257R, occur in highly conserved regions of GCK. The effects of the mutations on protein stability were computed using PoPMusic 2.1, I-mutant 3.0, and Dmutant. We also conducted molecular dynamics (MD) simulation analysis through in silico modelling to investigate the conformational differences between the native and the mutant proteins and found that the identified deleterious mutations alter the stability, flexibility, and solvent-accessible surface area of the protein. Furthermore, the functional role of each SNP in GCK was identified and characterised using SNPeffect 4.0, F-SNP, and FASTSNP. We hope that the observed results aid in the identification of disease-associated mutations that affect protein structure and function. Our in silico findings provide a new perspective on the role of GCK mutations in MODY 2 from an evolution-based structure-centric point of view. The computational architecture described in this paper can be used to predict the most appropriate disease phenotypes for large-genome sequencing projects and to provide individualised drug therapy for complex diseases such as diabetes.
PLOS ONE | 2015
Nagasundaram N; Hailong Zhu; Jiming Liu; Karthick; George Priya Doss C; Chiranjib Chakraborty; Luonan Chen
The cyclin-dependent kinase 4 (CDK4)-cyclin D1 complex plays a crucial role in the transition from the G1 phase to the S phase of the cell cycle. Among the CDKs, CDK4 is one of the genes most frequently affected by somatic genetic variations that are associated with various forms of cancer. Thus, because the abnormal function of the CDK4-cyclin D1 protein complex might play a vital role in causing cancer, CDK4 can be considered a genetically validated therapeutic target. In this study, we used a systematic, integrated computational approach to identify deleterious nsSNPs and predict their effects on protein-protein (CDK4-cyclin D1) and protein-ligand (CDK4-flavopiridol) interactions. This analysis resulted in the identification of possible inhibitors of mutant CDK4 proteins that bind the conformations induced by deleterious nsSNPs. Using computational prediction methods, we identified five nsSNPs as highly deleterious: R24C, Y180H, A205T, R210P, and R246C. From molecular docking and molecular dynamics studies, we observed that these deleterious nsSNPs affected CDK4-cyclin D1 and CDK4-flavopiridol interactions. Furthermore, in a virtual screening approach, the drug 5,7-dihydroxy-2-(3,4,5-trihydroxyphenyl)-4H-chromen-4-one displayed good binding affinity for proteins with the mutations R24C or R246C, the drug diosmin displayed good binding affinity for the protein with the mutation Y180H, and the drug rutin displayed good binding affinity for proteins with the mutations A205T and R210P. Overall, this computational investigation of the CDK4 gene highlights the link between genetic variation and biological phenomena in human cancer and aids in the discovery of molecularly targeted therapies for personalized treatment.
BMC Systems Biology | 2015
Guoxian Yu; Hailong Zhu; Carlotta Domeniconi; Maozu Guo
Background: High-throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction task as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results: We address this issue by modeling the optimization of the composite network and the prediction problem within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network-based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, achieves performance superior (with respect to different evaluation criteria) to that of related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion: MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data are available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2016
Guoxian Yu; Guangyuan Fu; Jun Wang; Hailong Zhu
Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapid accumulation of large volumes of proteomic and genomic data drives the development of computational models for automatically predicting protein function at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often achieve better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., the Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first uses the GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on this similarity. Next, SimNet constructs a composite network, obtained as a weighted sum of individual networks, and aligns the network with the kernel to obtain the weights assigned to the individual networks. Then, it applies a network-based classifier to the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab code of SimNet is available at https://sites.google.com/site/guoxian85/simnet.
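To make the idea of aligning a weighted sum of networks with a target kernel more concrete, here is a small sketch. It is not SimNet's optimization: it uses the standard kernel alignment score (Frobenius inner product divided by the product of Frobenius norms) and an assumed heuristic that sets each network's weight proportional to its non-negative alignment with the target. All function names and the toy matrices are hypothetical.

```python
import math

def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def alignment(a, b):
    """Kernel alignment: <A, B>_F / (||A||_F * ||B||_F)."""
    denom = math.sqrt(frobenius_inner(a, a) * frobenius_inner(b, b))
    return frobenius_inner(a, b) / denom if denom > 0 else 0.0

def alignment_weights(networks, target):
    """Assumed heuristic: weight each network by its clipped alignment score."""
    scores = [max(0.0, alignment(net, target)) for net in networks]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(networks)] * len(networks)  # fall back to uniform
    return [s / total for s in scores]

def composite(networks, weights):
    """Weighted sum of the individual network matrices."""
    n = len(networks[0])
    return [[sum(w * net[i][j] for w, net in zip(weights, networks))
             for j in range(n)] for i in range(n)]

# Toy example: three proteins, a target kernel, and two candidate networks.
target = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
net_a = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]   # identical to the target
net_b = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]   # partly disagrees with it
weights = alignment_weights([net_a, net_b], target)
combined = composite([net_a, net_b], weights)
```

In this toy case the network that matches the target receives the larger weight, which is the qualitative behavior an alignment-based weighting is meant to produce.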
BMC Bioinformatics | 2015
Guoxian Yu; Hailong Zhu; Carlotta Domeniconi; Jiming Liu
Background: High-throughput bio-techniques accumulate an ever-increasing amount of genomic and proteomic data. These data are far from being functionally characterized, despite advances in the functional annotation of genes (or their protein products). Owing to the limits of experimental techniques and to research bias in biology, regularly updated functional annotation databases, such as the Gene Ontology (GO), are far from complete. Given the importance of protein functions for biological studies and drug design, proteins should be more comprehensively and precisely annotated. Results: We propose downward Random Walks (dRW) to predict missing (or new) functions of partially annotated proteins. In particular, we apply downward random walks with restart on the GO directed acyclic graph, along with the available functions of a protein, to estimate the probability of missing functions. To further boost the prediction accuracy, we extend dRW to dRW-kNN. dRW-kNN computes the semantic similarity between proteins based on their functional annotations; it then predicts functions based on the functions estimated by dRW, together with the functions associated with the k nearest proteins. Our proposed models can predict two kinds of missing functions: (i) those that are missing for a protein but associated with other proteins of interest; and (ii) those that are not available for any protein of interest but exist in the GO hierarchy. Experimental results on the proteins of Yeast and Human show that dRW and dRW-kNN can replenish functions more accurately than other related approaches, especially for sparse functions associated with no more than 10 proteins. Conclusion: The empirical study shows that the semantic similarity between GO terms and the ontology hierarchy play important roles in predicting protein function. The proposed dRW and dRW-kNN can serve as tools for replenishing functions of partially annotated proteins.
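The following sketch illustrates the general idea of a downward random walk with restart on a small directed acyclic graph. It is an assumption-laden toy, not the dRW algorithm as published: transitions to child terms are uniform, the walker returns to the seed terms when it reaches a leaf, and the term names, restart probability and iteration count are invented for the example.

```python
def downward_rwr(children, seeds, restart=0.5, iters=50):
    """Score terms reachable downward from a protein's known terms.

    `children` maps a term to its list of child (more specific) terms and
    `seeds` lists the protein's known terms. Uniform transitions and the
    leaf-restart rule are assumptions made for this sketch.
    """
    nodes = set(children) | {c for kids in children.values() for c in kids}
    nodes |= set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for n, mass in p.items():
            kids = children.get(n, [])
            if kids:
                # Move downward: split the walking mass evenly among children.
                share = (1.0 - restart) * mass / len(kids)
                for k in kids:
                    nxt[k] += share
            else:
                # Leaf term: the walker jumps back to the seed terms.
                for s in seeds:
                    nxt[s] += (1.0 - restart) * mass / len(seeds)
        p = nxt
    return p

# Toy hierarchy: root -> {a, b}, a -> {a1}; the protein is annotated with "a".
toy_children = {"root": ["a", "b"], "a": ["a1"], "b": []}
scores = downward_rwr(toy_children, ["a"])
```

In the toy hierarchy only the descendant `a1` of the known term receives a positive score, while the sibling `b` and the ancestor `root` stay at zero, which reflects the downward direction of the walk.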
Scientific Reports | 2016
N. Nagasundaram; C. George Priya Doss; Chiranjib Chakraborty; V. Karthick; D. Thirumal Kumar; Veeraraghavan Balaji; Ramamoorthy Siva; Aiping Lu; Zhang Ge; Hailong Zhu
Artemisinin resistance in Plasmodium falciparum threatens global efforts towards the elimination or eradication of malaria. Several studies have associated mutations in the PfATP6 gene with artemisinin resistance, but the underlying molecular mechanism of the resistance remains unexplored. Associated mutations act as biomarkers for measuring artemisinin efficacy. In this work, we analyzed the binding affinity and efficacy between PfATP6 and artemisinin in the presence of the L263D, L263E and L263K mutations. Furthermore, we performed virtual screening to identify potential compounds to inhibit the PfATP6 mutant proteins. We observed that the binding affinity of artemisinin for PfATP6 is affected by the L263D, L263E and L263K mutations. This in silico elucidation of artemisinin resistance supported the identification of novel compounds (CID: 10595058 and 10625452) that showed good binding affinity and efficacy with the L263D, L263E and L263K mutant proteins in molecular docking and molecular dynamics simulation studies. Owing to the high propensity of the parasite to develop drug resistance, the need for new antimalarial drugs will persist until the malarial parasites are eventually eradicated. The two compounds identified in this study can be tested in in vitro and in vivo experiments as possible candidates for the design of new antimalarial drugs.
