Jian-Yu Shi
Northwestern Polytechnical University
Publication
Featured research published by Jian-Yu Shi.
BMC Bioinformatics | 2016
Jian-Yu Shi; Jia-Xin Li; Hui-Meng Lu
Background: Supervised classification models have attracted increasing attention for predicting drug-target interactions (DTIs). However, in terms of classification, unavoidable missing DTIs in the data cause three issues that former approaches have not addressed appropriately. Directly labeled as negatives (non-DTIs), missing DTIs increase the confusion between positives (DTIs) and negatives, aggravate the imbalance between few positives and many negatives, and are usually scored as high-ranking false positives, which sharply distorts existing measures.
Results: Under the framework of the local classification model (LCM), this work focuses on the scenario of predicting how likely a new drug is to interact with known targets. To address the first two issues, two strategies, Spy and Super-target, are introduced and further integrated to form a two-layer LCM. In the bottom layer, Spy-based local classifiers for protein targets are built from positives as well as reliable negatives identified among unlabeled drug-target pairs. In the top layer, regular local classifiers specific to super-targets are built with more positives generated by grouping similar targets and their interactions. Furthermore, to handle the third issue, an additional performance measure, Coverage, is presented for assessing DTI prediction. Experiments on benchmark datasets are performed under five-fold cross-validation of drugs to evaluate this approach. The main findings are as follows. (1) Both individual strategies and their combination are effective against missing DTIs, and the combination performs best. (2) With a less confusing decision boundary at the bottom layer and a less biased decision boundary at the top layer, our two-layer LCM outperforms two former approaches. (3) Coverage is more robust to missing interactions than other measures and is able to evaluate how far one needs to go down the list of targets to cover all the proper targets of a drug.
Conclusions: By proposing two strategies and one performance measure, this work addresses the issues derived from missing interactions, which cause confusing and biased decision boundaries in classifiers as well as inappropriate measurement of prediction performance, in the scenario of predicting interactions between new drugs and known targets.
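The Coverage measure described in this abstract can be illustrated with a small sketch. It assumes the common multi-label definition: rank a drug's targets by predicted score and report how deep in the list the last true target appears. The paper's exact normalization is not given here, and the function names `coverage_depth` and `mean_coverage` are hypothetical.

```python
def coverage_depth(scores, true_targets):
    """Return how many top-ranked targets must be read to include every true target.

    scores: predicted interaction scores, one per target (higher = more likely).
    true_targets: indices of the targets the drug really interacts with.
    Assumption: 1-based depth; some definitions subtract one or normalize.
    """
    # Rank target indices from highest to lowest score (ties keep index order).
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    # 1-based position of each true target in the ranked list.
    positions = [ranked.index(j) + 1 for j in true_targets]
    return max(positions)


def mean_coverage(score_rows, true_target_rows):
    """Average the coverage depth over all drugs that have at least one true target."""
    depths = [
        coverage_depth(scores, targets)
        for scores, targets in zip(score_rows, true_target_rows)
        if targets
    ]
    return sum(depths) / len(depths)


# Two toy drugs scored against four targets.
score_rows = [[0.9, 0.2, 0.7, 0.4], [0.1, 0.8, 0.3, 0.6]]
true_target_rows = [[0, 3], [1]]
print(mean_coverage(score_rows, true_target_rows))
```

A lower value means the true targets sit nearer the top of each ranked list. Because the measure depends only on where the known positives fall, unlabeled pairs that score highly do not count directly as errors, although they can still push a known positive further down the list.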
BMC Medical Genomics | 2017
Jian-Yu Shi; Hua Huang; Yanning Zhang; Yu-Xi Long; Siu-Ming Yiu
Background: In human genomes, long non-coding RNAs (lncRNAs) have attracted increasing attention because their dysfunctions are involved in many diseases. However, the associations between lncRNAs and diseases (LDA) remain unknown in most cases. While identifying disease-related lncRNAs in vivo is costly, computational approaches promise not only to accelerate the identification of associations but also to provide clues to the underlying mechanisms of various lncRNA-caused diseases. Former computational approaches usually focus only on predicting new associations between lncRNAs having known associations with diseases and other lncRNA-associated diseases. They also work only on binary lncRNA-disease associations (whether the pair has an association or not), which cannot reflect or reveal other biological facts, such as the number of proteins involved in LDA or how strong the association is (i.e., the intensity of LDA).
Results: To address the abovementioned issues, we propose a graph regression-based unified framework (GRUF). In particular, our method can work on lncRNAs that have no previously known disease association and on diseases that have no known association with any lncRNA. Also, instead of giving only a binary answer for the association, our method tries to uncover more of the biological relationship between an lncRNA and a disease, which may provide better clues for researchers. We compared GRUF with three state-of-the-art approaches and demonstrated the superiority of GRUF, which achieves a 5%~16% improvement in terms of the area under the receiver operating characteristic curve (AUC). GRUF also provides a confidence score for each predicted LDA, and this score correlates significantly with the number of RNA-binding proteins involved in LDAs. Lastly, three of the top-5 LDA candidates generated by GRUF in novel prediction are verified indirectly by medical literature and known biological facts.
Conclusions: The proposed GRUF has two advantages over existing approaches. Firstly, it can work on lncRNAs that have no known disease association and on diseases that have no known association with any lncRNA. Secondly, instead of providing a binary answer (with or without association), GRUF works for both discrete and continuous LDA, which helps reveal the pathological implications between lncRNAs and diseases.
BMC Bioinformatics | 2017
Jian-Yu Shi; Jia-Xin Li; Ke Gao; Peng Lei; Siu-Ming Yiu
Background: Drug combination is one of the effective approaches for treating complex diseases. However, determining combinative drug pairs in clinical trials is still costly. Thus, computational approaches are used to identify potential drug pairs in advance. Existing computational approaches have the following shortcomings: (i) the lack of an effective integration of heterogeneous features leads to time-consuming training and can even result in an over-fitted classifier; and (ii) the narrow focus on predicting potential drug combinations only among known drugs having known combinations cannot meet the demand of realistic screenings, which pay more attention to potential combinative pairs among new drugs that have no approved combination with other drugs at all.
Results: In this paper, to tackle the above two problems, we propose a novel drug-driven approach for predicting potential combinative pairs on a large scale. We define four new features based on heterogeneous data and design an efficient fusion scheme to integrate these features. More importantly, we design appropriate cross-validations for realistic screening scenarios of drug combinations involving both known drugs and new drugs. In addition, we perform an extra investigation to show how each kind of heterogeneous feature is related to combinative drug pairs. The investigation inspires the design of our approach. Experiments on real data demonstrate the effectiveness of our fusion scheme for integrating heterogeneous features and its predictive power in three scenarios of realistic screening. In terms of AUC and AUPR respectively, the prediction among known drugs achieves 0.954 and 0.821, that between known drugs and new drugs achieves 0.909 and 0.635, and that among new drugs achieves 0.809 and 0.592.
Conclusions: Our approach provides not only an effective tool to integrate heterogeneous features but also the first tool to predict potential combinative pairs among new drugs.
international conference on bioinformatics and biomedical engineering | 2017
Jian-Yu Shi; Hua Huang; Jia-Xin Li; Peng Lei; Yanning Zhang; Siu-Ming Yiu
There is an urgent need to discover or deduce drug-drug interactions (DDIs), which can cause serious adverse drug reactions. However, preclinical detection of DDIs bears a high cost. Machine learning-based computational approaches can assist experimental approaches. Utilizing pre-market drug properties (e.g. side effects), they are able to predict DDIs on a large scale before drugs enter the market. However, none of them can predict comprehensive DDIs, including enhancive and degressive DDIs, though it is important to know whether the interaction increases or decreases the behavior of the interacting drugs before making a co-prescription. Furthermore, existing computational approaches focus on predicting DDIs for new drugs that have no existing interactions, yet none of them can predict DDIs among those new drugs. To address these issues, we first build a comprehensive dataset of DDIs, which contains both enhancive and degressive DDIs as well as the side effects of the drugs involved in DDIs. Then we propose an algorithm of Triple Matrix Factorization and design a Unified Framework of DDI prediction based on it (TMFUF). The proposed approach is able to predict not only conventional binary DDIs but also comprehensive DDIs. Moreover, it provides a unified solution for the scenario of predicting potential DDIs for newly given drugs (having no known interaction at all), as well as the scenario of predicting potential DDIs among these new drugs. Finally, the experiments demonstrate that TMFUF is significantly superior to three state-of-the-art approaches in conventional binary DDI prediction and also shows acceptable performance in comprehensive DDI prediction.
bioinformatics and biomedicine | 2014
Jian-Yu Shi; Siu-Ming Yiu; Yiming Li; Henry C. M. Leung; Francis Y. L. Chin
Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets. This idea is promising. However, there are two shortcomings that have not yet been addressed appropriately. Firstly, most of the methods only use 2D chemical structures and protein sequences to measure the similarity of drugs and targets respectively. However, this information may not fully capture the characteristics determining whether a drug will interact with a target. Secondly, there are very few known interactions, i.e. many interactions are “missing” in the database. Existing approaches are biased towards known interactions and have no good solutions to handle possibly missing interactions which affect the accuracy of the prediction. In this paper, we enhance the similarity measures to include non-structural (and non-sequence-based) information and introduce the concept of a “super-target” to handle the problem of possibly missing interactions. Based on evaluations on real data, we show that our similarity measure is better than the existing measures and our approach is able to achieve higher accuracy than the two best existing algorithms, WNN-GIP and KBMF2K.
bioinformatics and biomedicine | 2016
Jian-Yu Shi; Ke Gao; Xuequn Shang; Siu-Ming Yiu
There is an urgent need to discover or predict DDIs, which can cause serious adverse drug reactions. However, preclinical detection of DDIs bears a high cost. Similarity-based computational approaches can assist experimental approaches. Utilizing pre-market drug similarities, they are able to predict DDIs on a large scale. However, they either neglect the topological structure among DDIs and non-DDIs and suffer from slow training and high memory use, or they bear the bias that pairs between a newly given drug and drugs having many DDIs tend to obtain high ranks. More importantly, they lack an effective combination of multiple predictions. To address these issues, we develop a local classification-based model (LCM), which has the advantages of faster training and lower memory requirements and is free of that bias. We further design a novel supervised fusion algorithm based on the Dempster-Shafer (DS) theory of evidence to combine multiple predictions. Finally, the experiments demonstrate that our LCM-DS is significantly superior to three state-of-the-art approaches and outperforms both individual LCMs and classical fusion algorithms.
Current Protein & Peptide Science | 2016
Jian-Yu Shi; Jia-Xin Li; Bo-Lin Chen; Yong Zhang
Background: Experimental approaches to identify drug-target interactions (DTIs) among a large number of chemical compounds and proteins are still costly and time-consuming. As a complement, computational approaches are able to rapidly infer potential drug or target candidates for diverse screenings on a large scale. The most difficult screening scenario (S4) explores pairwise interacting candidates between newly designed chemical compounds (new potential drugs) and proteins (new target candidates). Few current computational approaches can be applied to the inference of potential DTIs in S4 because the new potential drugs have no known target and the new target candidates have no existing drug at all. In addition, due to inherent issues in DTI data, such as missing DTIs and the imbalance between few approved DTIs and many unknown drug-target pairs, existing metrics of DTI inference may not reflect the performance of inference approaches fairly.
Methods: To address these issues, this paper develops three instance neighborhood-based models: individual-to-individual (I2I), individual-to-group (I2G) and nearest-neighbor-zone (NNZ). In I2I, if a new drug tends to interact with individual targets similar to a new target of interest, it likely interacts with the new target. In I2G, the new drug possibly interacts with the new target if it tends to interact with a target group in which member targets are similar to each other and one or more of them are similar to the new target. In NNZ, the pair of the new drug and the new target is a potential DTI if it is similar to known existing DTIs. This paper also designs a topological dense index to guide the selection of the appropriate model for a given dataset. Moreover, an additional metric, Coverage, is introduced to enhance the assessment of DTI inference.
Results: On four benchmark datasets, our models demonstrate that the instance neighborhood can improve DTI inference significantly. Under the guidance of our topological dense index, the best models for the datasets are chosen and achieve encouraging performance: ~85%, ~81%, ~86% and ~81% in terms of AUC, and ~29%, ~32%, ~32% and ~33% in terms of AUPR, respectively. The superiority of our models is demonstrated both by the comparison with two state-of-the-art approaches and by novel DTI inference.
Conclusion: By leveraging the instance neighborhood, our models are able to infer DTIs in the most difficult scenario, S4. Moreover, our topological dense index can guide the selection of appropriate models for different datasets.
PLOS ONE | 2013
Jian-Yu Shi; Siu-Ming Yiu; Yanning Zhang; Francis Y. L. Chin
Image processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100–400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in three respects. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity.
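The distance matrix (DM) that this abstract starts from can be sketched in a few lines. The example assumes each residue is represented by a single 3D coordinate (for instance its C-alpha atom, which the abstract does not specify). The thresholding helper is a generic way to turn a DM into a binary image and is not the paper's four-image decomposition. The names `distance_matrix` and `binary_image` are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between all residue pairs."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def binary_image(dm, cutoff):
    """Mark residue pairs closer than `cutoff` with 1 (generic threshold, an assumption)."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dm]


# Three toy residues with made-up coordinates (in angstroms).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
dm = distance_matrix(coords)
for row in binary_image(dm, cutoff=6.0):
    print(row)
```

Treating the matrix as an image is what allows image-processing tools such as moments to be applied to it.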
Scientific Reports | 2018
Jian-Yu Shi; Xuequn Shang; Ke Gao; Shao-Wu Zhang; Siu-Ming Yiu
Drug-drug interactions (DDIs) may trigger adverse drug reactions, which endanger patients. Identifying DDIs before prescribing medications is critical but bears a high cost in clinics. Computational approaches, both global model-based and local model-based, are able to screen DDI candidates among a large number of drug pairs by utilizing preliminary characteristics of drugs (e.g. drug chemical structure). However, global model-based approaches are usually slow and do not consider the topological structure of the DDI network, while local model-based approaches have a degree-induced bias: a new drug tends to be linked to drugs having many DDIs. All of them lack an effective ensemble method to combine results from multiple predictors. To address the first two issues, we propose a local classification-based model (LCM), which considers the topology of the DDI network and relaxes the degree-induced bias. Furthermore, we design a novel supervised fusion rule based on the Dempster-Shafer theory of evidence (LCM-DS), which aggregates the results from multiple LCMs. To make the final prediction, LCM-DS integrates three aspects of multiple classifiers: the posterior probabilities output by individual classifiers, the proximity between their instance decision profiles and their reference profiles, and the quality of their reference profiles. Last, a substantial comparison with three state-of-the-art approaches demonstrates the effectiveness of our LCM, and the comparison with both individual LCM implementations and classical fusion algorithms exhibits the superiority of our LCM-DS.
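For readers unfamiliar with the Dempster-Shafer theory mentioned above, the following sketch shows the classical Dempster rule of combination for two predictors. It is not the supervised LCM-DS fusion rule, whose details are not given in the abstract. The frame of discernment {"ddi", "non_ddi"}, the mass values, and the function name `dempster_combine` are illustrative assumptions.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                # Agreeing evidence supports the shared hypotheses.
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Contradictory evidence is collected as conflict.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    # Renormalize so the remaining masses sum to one.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


YES = frozenset({"ddi"})
NO = frozenset({"non_ddi"})
EITHER = YES | NO  # mass left uncommitted by a predictor

# Made-up evidence from two hypothetical classifiers for one drug pair.
m1 = {YES: 0.6, NO: 0.1, EITHER: 0.3}
m2 = {YES: 0.5, NO: 0.2, EITHER: 0.3}
fused = dempster_combine(m1, m2)
print(fused[YES], fused[NO], fused[EITHER])
```

Mass assigned to `EITHER` lets a predictor express uncertainty instead of forcing a choice, which is the main difference from simply averaging posterior probabilities.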
BMC Bioinformatics | 2018
Jian-Yu Shi; Hua Huang; Yanning Zhang; Jiang-Bo Cao; Siu-Ming Yiu
Background: The Human Microbiome Project reveals the significant mutualistic influence between the human body and the microbes living in it. Such an influence leads to an interesting phenomenon: many noninfectious diseases are closely associated with diverse microbes. However, the identification of microbe-noninfectious disease associations (MDAs) is still a challenging task, because of both the high cost and the limitations of microbe cultivation. Thus, there is a need to develop fast approaches to screen potential MDAs. The growing number of validated MDAs enables us to meet this demand from a new perspective. Computational approaches, especially machine learning, promise to predict MDA candidates rapidly among a large number of microbe-disease pairs, with the advantage of not being limited by microbe cultivation. Nevertheless, few computational efforts at predicting MDAs have been made so far.
Results: In this paper, grouping a set of MDAs into a binary MDA matrix, we propose a novel predictive approach (BMCMDA) based on Binary Matrix Completion to predict potential MDAs. The proposed BMCMDA assumes that the incomplete observed MDA matrix is the summation of a latent parameterizing matrix and a noise matrix. It also assumes that the independently occurring subscripts of observed entries in the MDA matrix follow a binomial model. Adopting a standard mean-zero Gaussian distribution for the noise matrix, we model the relationship between the parameterizing matrix and the MDA matrix under the observed microbe-disease pairs as a probit regression. With the recovered parameterizing matrix, BMCMDA deduces how likely a microbe is to be associated with a particular disease. In the experiment under leave-one-out cross-validation, it exhibits encouraging performance (AUC = 0.906, AUPR = 0.526) and demonstrates its superiority with ~7% and ~5% improvements in terms of AUC and AUPR respectively over the pioneering approach KATZHMDA.
Conclusions: Our BMCMDA provides an effective approach for predicting MDAs and can also be extended to other similar tasks of predicting binary relationships (e.g. protein-protein interaction, drug-target interaction).
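The probit link described in this abstract can be sketched as follows. The sketch assumes the usual one-bit matrix completion reading, in which an observed entry equals 1 when the latent value plus standard Gaussian noise is non-negative, so the association probability is the standard normal CDF of the latent value. The abstract does not spell out this thresholding, and the function names below are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def association_probabilities(latent):
    """Map each latent entry to the probability that the microbe-disease pair is associated."""
    return [[std_normal_cdf(value) for value in row] for row in latent]


def negative_log_likelihood(latent, observed):
    """Probit loss over observed entries only.

    observed: dict mapping (microbe_index, disease_index) -> 0 or 1.
    Assumption: no regularization or rank constraint is included here.
    """
    total = 0.0
    for (i, j), label in observed.items():
        p = std_normal_cdf(latent[i][j])
        total -= math.log(p if label == 1 else 1.0 - p)
    return total


# Made-up latent matrix for two microbes and two diseases.
latent = [[1.2, -0.4], [0.0, 2.0]]
observed = {(0, 0): 1, (0, 1): 0, (1, 1): 1}
print(association_probabilities(latent))
print(negative_log_likelihood(latent, observed))
```

A matrix completion method would minimize such a loss over low-complexity latent matrices and then read predictions for the unobserved pairs from the recovered matrix.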