Dong-Jun Yu
Nanjing University of Science and Technology
Publication
Featured research published by Dong-Jun Yu.
Information Sciences | 2008
Xibei Yang; Jingyu Yang; Chen Wu; Dong-Jun Yu
Many rough-set-based methods for dealing with incomplete information systems have been proposed in recent years. However, they are suitable only for incomplete systems with regular attributes whose domains are not preference-ordered. This paper therefore focuses on a more complex case: the incomplete ordered information system. In such systems, all attributes are treated as criteria, where a criterion is an attribute with a preference-ordered domain. To conduct classification analysis in the incomplete ordered information system, the concept of a similarity dominance relation is first proposed. Two types of knowledge reduction are then formed to preserve two different notions of similarity dominance relations. By introducing the approximate distribution reduct into the incomplete ordered decision system, the judgment theorems and discernibility matrices associated with four novel approximate distribution reducts are obtained. A numerical example is employed to substantiate the conceptual arguments.
Analytical Biochemistry | 2016
Zi Liu; Xuan Xiao; Dong-Jun Yu; Jianhua Jia; Wang-Ren Qiu; Kuo-Chen Chou
Just like PTM or PTLM (post-translational modification) in proteins, PTCM (post-transcriptional modification) in RNA plays very important roles in biological processes. Occurring at adenine (A) within the genetic code motif (GAC), N6-methyladenosine (m6A) is one of the most common and abundant PTCMs in RNA found in viruses and most eukaryotes. Given an uncharacterized RNA sequence containing many GAC motifs, which of them can be methylated, and which cannot? Addressing this problem is important for both basic research and drug development. Particularly with the avalanche of RNA sequences generated in the postgenomic age, computational methods for the timely identification of N6-methyladenosine sites in RNA are in high demand. Here we propose a new predictor called pRNAm-PC, in which RNA sequence samples are expressed by a novel mode of pseudo dinucleotide composition (PseDNC) whose components are derived from a physical-chemical matrix via a series of auto-covariance and cross-covariance transformations. A rigorous jackknife test showed that, in comparison with the existing predictor for the same purpose, pRNAm-PC achieved remarkably higher success rates in both overall accuracy and stability, indicating that the new predictor will become a useful high-throughput tool for identifying methylation sites in RNA, and that the novel approach can also be used to study many other RNA-related problems and conduct genome analysis. A user-friendly Web server for pRNAm-PC has been established at http://www.jci-bioinfo.cn/pRNAm-PC, by which users can easily get their desired results without needing to go through the mathematical details.
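As a rough illustration of the auto-covariance transformation mentioned in the abstract above, the following Python sketch turns an RNA sequence into a numeric series over its overlapping dinucleotides and computes the auto-covariance at a few lags. It is not the pRNAm-PC implementation: the helper names are hypothetical, `gc_count` is a toy stand-in for the physical-chemical matrix, and the cross-covariance between two different properties is omitted.

```python
def dinucleotides(seq):
    """Return the overlapping dinucleotides of a sequence, in order."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def auto_covariance(values, lag):
    """Auto-covariance of a numeric series at the given lag.

    Averages the product of mean-centred values that are `lag` positions apart.
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mu = sum(values) / n
    total = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return total / (n - lag)


def gc_count(dinuc):
    """Toy property (assumption): number of G or C bases in the dinucleotide."""
    return sum(base in "GC" for base in dinuc)


seq = "GGACUAU"  # made-up example sequence
series = [gc_count(d) for d in dinucleotides(seq)]
features = [auto_covariance(series, lag) for lag in (1, 2)]
```

In a real predictor, each physicochemical property would contribute its own series, and the values for all lags and properties would be concatenated into one feature vector.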
Data and Knowledge Engineering | 2009
Xibei Yang; Dong-Jun Yu; Jingyu Yang; Lihua Wei
Since preference order is a crucial feature of data concerning decision situations, the classical rough set model has been generalized by replacing the indiscernibility relation with a dominance relation. The purpose of this paper is to further investigate the dominance-based rough set in the incomplete interval-valued information system, which contains both incomplete and imprecise evaluations of objects. By considering three types of unknown values in the incomplete interval-valued information system, a data complement method is used to transform the incomplete interval-valued information system into a traditional one. To generate the optimal decision rules from the incomplete interval-valued decision system, six types of relative reducts are proposed. Both the relationships between these reducts and practical approaches to computing them are then investigated. Some numerical examples are employed to substantiate the conceptual arguments.
International Conference of Fuzzy Information and Engineering | 2007
Xibei Yang; Dong-Jun Yu; Jingyu Yang; Chen Wu
The traditional soft set is a mapping from parameters to crisp subsets of the universe. However, real-world situations may be more complex because of the fuzzy nature of parameters. In this paper, traditional soft set theory is extended to a fuzzy one, in which fuzzy membership is used to describe the parameter-approximate elements of a fuzzy soft set. Furthermore, basic fuzzy logic operators are used to define generalized operators on fuzzy soft sets, and De Morgan's laws are then proved. Finally, the parametrization reduction of fuzzy soft sets is defined, and a decision-making problem is analyzed to demonstrate the validity of the fuzzy soft set.
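To make the idea of operators on fuzzy soft sets concrete, here is a minimal Python sketch. It assumes the standard max, min, and 1 - μ operators and two fuzzy soft sets defined over the same parameters and the same universe; the paper's generalized operators may be defined differently, and the example data (houses and parameters) is invented for illustration.

```python
# A fuzzy soft set is modelled as: parameter -> {object: membership in [0, 1]}.

def complement(f):
    """Standard fuzzy complement, applied to every membership value."""
    return {p: {x: 1.0 - m for x, m in fs.items()} for p, fs in f.items()}


def union(f, g):
    """Parameter-wise union using max (assumes identical parameters and objects)."""
    return {p: {x: max(f[p][x], g[p][x]) for x in f[p]} for p in f}


def intersection(f, g):
    """Parameter-wise intersection using min (same assumption as union)."""
    return {p: {x: min(f[p][x], g[p][x]) for x in f[p]} for p in f}


# Invented example: two evaluations of houses h1..h3 under two parameters.
F = {"cheap": {"h1": 0.2, "h2": 0.7, "h3": 1.0},
     "modern": {"h1": 0.9, "h2": 0.4, "h3": 0.0}}
G = {"cheap": {"h1": 0.5, "h2": 0.3, "h3": 0.6},
     "modern": {"h1": 0.1, "h2": 0.8, "h3": 0.5}}

# De Morgan's laws under these operators.
law_1 = complement(union(F, G)) == intersection(complement(F), complement(G))
law_2 = complement(intersection(F, G)) == union(complement(F), complement(G))
```

Both `law_1` and `law_2` hold here because subtracting from 1 reverses the ordering of membership values, so the complement of a maximum is the minimum of the complements.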
Journal of Computational Chemistry | 2013
Dong-Jun Yu; Jun Hu; Yan Huang; Hong-Bin Shen; Yong Qi; Zhenmin Tang; Jingyu Yang
Understanding the interactions between proteins and ligands is critical for protein function annotation and drug discovery. We report a new sequence-based template-free predictor (TargetATPsite) to identify adenosine-5′-triphosphate (ATP) binding sites with machine-learning approaches. Two steps are implemented in TargetATPsite: prediction of binding residues and prediction of binding pockets. To predict the binding residues, a novel image sparse representation technique is proposed to encode residue evolution information, which is used as the input features. An ensemble classifier built from support vector machines (SVMs) trained on multiple random under-samplings is used as the prediction model, which is effective in dealing with the imbalance between positive and negative training samples. Compared with existing ATP-specific sequence-based predictors, TargetATPsite is distinguished by its second step, which further identifies the binding pockets from the predicted binding residues through a spatial clustering algorithm. Experimental results on three benchmark datasets demonstrate the efficacy of TargetATPsite.
BMC Bioinformatics | 2012
Ya-Nan Zhang; Dong-Jun Yu; Shu-Sen Li; Yong-Xian Fan; Yan Huang; Hong-Bin Shen
Background: Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between proteins and ATP is important for understanding the functionality of the proteins and the mechanisms of protein-ATP complexes. Results: In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information, bi-profile sampling of multi-view sequential features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only part of the knowledge about the target, whereas multiple sources of descriptors can be complementary. Conclusions: Prediction performance evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets, consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of residues at real protein-ATP binding sites differ significantly from those of normal residues: for example, the binding residues do not show a propensity for high solvent accessibility, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, tests with multiple ratios of positive to negative samples show that performance is affected by imbalanced training datasets. Increasing the dataset scale is also shown to improve prediction performance.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013
Dong-Jun Yu; Jun Hu; Jing Yang; Hong-Bin Shen; Jinhui Tang; Jingyu Yang
Accurately identifying protein-ligand binding sites or pockets is of significant importance for both protein function analysis and drug design. Although much progress has been made, challenges remain, especially when the 3D structures of target proteins are not available or no homology templates can be found in the library, in which case template-based methods are hard to apply. In this paper, we report a new ligand-specific template-free predictor called TargetS for targeting protein-ligand binding sites from primary sequences. TargetS first predicts the binding residues along the sequence with a ligand-specific strategy and then further identifies the binding sites from the predicted binding residues through a recursive spatial clustering algorithm. Protein evolutionary information, predicted protein secondary structure, and ligand-specific binding propensities of residues are combined to construct discriminative features; an improved AdaBoost classifier ensemble scheme based on random undersampling is proposed to deal with the serious imbalance between positive (binding) and negative (nonbinding) samples. Experimental results demonstrate that TargetS achieves high performance and outperforms many existing predictors. The TargetS web server and data sets are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetS/ for academic use.
Information Sciences | 2015
Xibei Yang; Yong Qi; Dong-Jun Yu; Hualong Yu; Jingyu Yang
Though rough sets have been widely used to study systems characterized by insufficient and incomplete information, their performance on interval-valued data needs to be seriously considered to improve suitability and scalability. The aim of this paper is to present a parameterized dominance-based rough set approach to interval-valued information systems. First, by considering the degree to which one interval-valued datum dominates another, we propose the concept of the α-dominance relation. Second, we present the α-dominance-based rough set model in interval-valued decision systems. Finally, we introduce lower and upper approximate reducts into the α-dominance-based rough set for simplifying decision rules; we also present the judgement theorems and discernibility functions, which describe how lower and upper approximate reducts can be calculated. This study suggests potential application areas and new research trends concerning the rough set approach to interval-valued information systems.
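The abstract above does not state how the degree of dominance between two intervals is measured. The Python sketch below therefore assumes one commonly used possibility-degree formula, (a_hi - b_lo) / (len(a) + len(b)) clipped to [0, 1], purely for illustration; it may differ from the definition used in the paper, and the function names are hypothetical. An object is then taken to α-dominate another if the degree reaches α on every criterion.

```python
def dominance_degree(a, b):
    """Degree in [0, 1] to which interval a = (lo, hi) dominates interval b.

    Assumed formula: (a_hi - b_lo) / (len(a) + len(b)), clipped to [0, 1].
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    total = (a_hi - a_lo) + (b_hi - b_lo)
    if total == 0:  # both intervals are single points
        return 1.0 if a_lo >= b_lo else 0.0
    return max(0.0, min(1.0, (a_hi - b_lo) / total))


def alpha_dominates(x, y, alpha):
    """True if x dominates y to degree at least alpha on every criterion."""
    return all(dominance_degree(a, b) >= alpha for a, b in zip(x, y))


# Invented objects described by two interval-valued criteria.
x = [(3, 5), (1, 3)]
y = [(1, 2), (2, 4)]
loose = alpha_dominates(x, y, alpha=0.2)
strict = alpha_dominates(x, y, alpha=0.5)
```

Raising α makes the relation stricter, so fewer pairs of objects are related; this is the sense in which the approach is parameterized.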
Neurocomputing | 2013
Dong-Jun Yu; Jun Hu; Zhenmin Tang; Hong-Bin Shen; Jian Yang; Jingyu Yang
Correctly localizing protein-ATP binding residues is valuable for both basic experimental biology and drug discovery studies. Protein-ATP binding residue prediction is a typical imbalanced learning problem, as the minority class (binding residues) is far smaller than the majority class (non-binding residues) over the entire sequence. Directly applying traditional machine learning approaches to this task is not suitable, as the learning results will be severely biased towards the majority class. To circumvent this problem, a modified AdaBoost ensemble scheme based on random under-sampling is developed. In addition, the effectiveness of different features for protein-ATP binding residue prediction is systematically analyzed, and a method for objectively reporting evaluation results under the imbalanced learning scenario is also discussed. Experimental results on three benchmark datasets show that the proposed method achieves higher prediction accuracy. The proposed method, called TargetATP, has been implemented in the Java programming language and is distributed via Java Web Start technology. TargetATP and the datasets used are freely available at http://www.csbio.sjtu.edu.cn/bioinf/targetATP/ for academic use.
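The random under-sampling idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the base learner is a toy nearest-centroid classifier swapped in for brevity (not the modified AdaBoost used by TargetATP), the ensemble is combined by a plain majority vote, and all names and data points are hypothetical.

```python
import random
from statistics import mean


def undersample(pos, neg, rng):
    """Keep every minority sample and draw an equal number of majority samples."""
    return pos, rng.sample(neg, len(pos))


def train_centroid(pos, neg):
    """Toy base learner: predict the class whose centroid is nearer."""
    dim = len(pos[0])
    c_pos = [mean(x[i] for x in pos) for i in range(dim)]
    c_neg = [mean(x[i] for x in neg) for i in range(dim)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos < d_neg else 0

    return predict


def train_ensemble(pos, neg, rounds=5, seed=0):
    """Train one base learner per balanced random under-sample."""
    rng = random.Random(seed)
    return [train_centroid(*undersample(pos, neg, rng)) for _ in range(rounds)]


def ensemble_predict(ensemble, x):
    """Majority vote over the base learners."""
    votes = sum(model(x) for model in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


# Invented data: 3 binding (positive) vs 30 non-binding (negative) samples.
pos = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
neg = [(float(i % 3), float(i % 2)) for i in range(30)]
ensemble = train_ensemble(pos, neg)
label_near_pos = ensemble_predict(ensemble, (5.5, 5.5))
label_near_neg = ensemble_predict(ensemble, (0.5, 0.5))
```

Each base learner sees a balanced training set, so none of them is biased towards the majority class, while the ensemble as a whole still uses many different majority samples.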
Knowledge-Based Systems | 2016
Suping Xu; Xibei Yang; Hualong Yu; Dong-Jun Yu; Jingyu Yang; Eric C. C. Tsang
We propose two multi-label learning approaches with LIFT reduction. The idea of fuzzy rough set attribute reduction is adopted in our approaches. Sample selection improves the efficiency of feature dimension reduction. In multi-label learning, since different labels may have distinct characteristics of their own, a multi-label learning approach with label-specific features, named LIFT, has been proposed. However, the construction of label-specific features may increase the feature dimensionality, and a large amount of redundant information may exist in the feature space. To alleviate this problem, a multi-label learning approach, FRS-LIFT, is proposed, which implements label-specific feature reduction with fuzzy rough sets. Furthermore, using the idea of sample selection, another multi-label learning approach, FRS-SS-LIFT, is also presented, which effectively reduces the computational complexity of label-specific feature reduction. Experimental results on 10 real-world multi-label data sets show that our methods not only reduce the dimensionality of label-specific features compared with LIFT, but also achieve satisfactory performance among some popular multi-label learning approaches.