Zardad Khan
University of Essex
Publications
Featured research published by Zardad Khan.
BMC Bioinformatics | 2014
Osama Mahmoud; Andrew P. Harrison; Aris Perperoglou; Asma Gul; Zardad Khan; Metodi V. Metodiev; Berthold Lausen
Background: Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on an overlapping analysis of expression data across classes. This method yields a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed using the Random Forest, k-Nearest Neighbour and Support Vector Machine classifiers show that POS achieves better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that minimizes the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
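The overlap idea can be sketched in a few lines of numpy. Note that `overlap_score` below is an illustrative simplification, not the exact POS measure: it only computes the fraction of samples falling inside the intersection of the two class expression ranges, whereas POS additionally builds per-gene masks and weights the class-wise proportions of overlapping samples.

```python
import numpy as np

def overlap_score(x, y):
    # Proportion of samples (from either class) that fall inside the
    # overlap of the two class expression intervals; lower = more
    # discriminative gene. Disjoint intervals give a perfect score of 0.
    lo, hi = max(x.min(), y.min()), min(x.max(), y.max())
    if lo > hi:
        return 0.0
    both = np.concatenate([x, y])
    return float(np.mean((both >= lo) & (both <= hi)))

# Two toy "genes": one with well-separated class expression, one without.
rng = np.random.default_rng(0)
separated = overlap_score(rng.normal(0, 1, 50), rng.normal(6, 1, 50))
mixed = overlap_score(rng.normal(0, 1, 50), rng.normal(0.3, 1, 50))
print("separated gene:", separated, "| overlapping gene:", mixed)
```

Ranking genes by such a score (ascending) and keeping the top few is the basic selection step; the outlier-robust masks of the actual method guard against a single extreme sample stretching a class interval.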
Advanced Data Analysis and Classification | 2016
Asma Gul; Aris Perperoglou; Zardad Khan; Osama Mahmoud; Miftahuddin Miftahuddin; Werner Adler; Berthold Lausen
Combining multiple classifiers, known as ensemble learning, can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based on their individual out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We evaluate the method on benchmark data sets in their original form and with added non-informative features. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
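The two-step construction can be sketched with a toy numpy implementation. Everything below is illustrative rather than the paper's exact procedure: a plain majority-vote kNN stands in for the base learners, a held-out split stands in for the out-of-sample assessment, and all sizes (50 base models, 8 features, k = 3) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_predict(Xtr, ytr, Xte, k=3):
    # Plain k-nearest-neighbour majority vote (Euclidean distance).
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) >= 0.5).astype(int)

# Toy data: only the first 2 of 8 features are informative.
X = rng.normal(size=(400, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, y_tr = X[:200], y[:200]
X_sel, y_sel = X[200:300], y[200:300]   # rank base learners here
X_val, y_val = X[300:], y[300:]         # assess the growing ensemble here

# Step 1: base kNN models on bootstrap samples with random feature
# subsets, ranked by their individual out-of-sample accuracy.
models = []
for _ in range(50):
    rows = rng.integers(0, len(X_tr), len(X_tr))
    feats = rng.choice(8, size=4, replace=False)
    acc = np.mean(knn_predict(X_tr[rows][:, feats], y_tr[rows],
                              X_sel[:, feats]) == y_sel)
    models.append((acc, rows, feats))
models.sort(key=lambda m: -m[0])

# Step 2: add the ranked models one by one, keeping a model only if the
# combined majority vote on the validation set improves.
ensemble, best = [], -1.0
for acc, rows, feats in models[:25]:
    trial = ensemble + [(rows, feats)]
    votes = np.mean([knn_predict(X_tr[r][:, f], y_tr[r], X_val[:, f])
                     for r, f in trial], axis=0)
    score = np.mean((votes >= 0.5).astype(int) == y_val)
    if score > best:
        ensemble, best = trial, score
print(f"{len(ensemble)} of 50 base models kept, "
      f"validation accuracy {best:.2f}")
```

The point of the two stages is that individually strong models are not necessarily collectively useful; the second, sequential pass only admits models that add something to the vote.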
2nd European Conference on Data Analysis, ECDA 2014 | 2016
Zardad Khan; Asma Gul; Osama Mahmoud; Miftahuddin Miftahuddin; Aris Perperoglou; Werner Adler; Berthold Lausen
Machine learning methods can be used to estimate the class membership probability of an observation. We propose an ensemble of trees that are optimal in terms of their predictive performance. This ensemble is formed by selecting the best trees from a large initial set grown by random forest. A proportion of trees is selected on the basis of their individual predictive performance on out-of-bag observations. The selected trees are then assessed for their collective performance on an independent training data set, by adding the trees one by one starting from the most predictive one. A tree is kept for the final ensemble if it increases the predictive performance of the previously combined trees. The proposed method is compared with probability estimation trees, random forest and node harvest on a number of benchmark problems, using the Brier score as the performance measure. In addition to reducing the number of trees in the ensemble, our method gives better results in most cases. The results are supported by a simulation study.
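Assuming scikit-learn is available, the two phases might be sketched as below. A held-out ranking split stands in for the out-of-bag assessment described above, majority-vote accuracy stands in for the Brier-score assessment, and all sizes (200 trees, top 40, split proportions) are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=900, n_features=15,
                           n_informative=4, random_state=0)
X_grow, y_grow = X[:500], y[:500]         # grow the full forest here
X_rank, y_rank = X[500:700], y[500:700]   # stands in for out-of-bag ranking
X_val, y_val = X[700:], y[700:]           # assess the growing sub-ensemble

forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_grow, y_grow)

# Phase 1: rank trees by individual accuracy and keep the top fraction.
acc = [np.mean(t.predict(X_rank) == y_rank) for t in forest.estimators_]
top = [forest.estimators_[i] for i in np.argsort(acc)[::-1][:40]]

# Phase 2: starting from the best tree, add a tree to the final ensemble
# only if it improves the majority-vote accuracy on the validation split.
kept, best = [], -1.0
for tree in top:
    votes = np.mean([t.predict(X_val) for t in kept + [tree]], axis=0)
    score = np.mean((votes >= 0.5).astype(int) == y_val)
    if score > best:
        kept, best = kept + [tree], score
print(f"kept {len(kept)} of {len(forest.estimators_)} trees "
      f"(validation accuracy {best:.2f})")
```

The final ensemble is typically far smaller than the initial forest, which is the practical payoff alongside the accuracy gain reported above.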
Archive | 2016
Osama Mahmoud; Andrew P. Harrison; Asma Gul; Zardad Khan; Metodi V. Metodiev; Berthold Lausen
For many functional genomic experiments, identifying the most characterizing genes is a central challenge. Both the prediction accuracy and the interpretability of a classifier can be enhanced by performing the classification based only on a set of discriminative genes. Analyzing the overlap between gene expression across classes is an effective criterion for identifying relevant genes. However, genes selected by maximizing a relevance score can be highly redundant. We propose a scheme for minimizing selection redundancy, in which the Proportional Overlapping Score (POS) technique is extended with a recursive approach to assign a set of complementary discriminative genes. The proposed scheme exploits the gene masks defined by POS to identify genes that complement each other in terms of their classification patterns. The approach is validated by comparing its classification performance with other feature selection methods (Wilcoxon Rank Sum, mRMR, MaskedPainter and POS) on several benchmark gene expression datasets using three classifiers: Random Forest, k-Nearest Neighbour and Support Vector Machine. The resulting classification error rates show that our proposal achieves better performance.
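The mask-driven search for complementary genes has a set-cover flavour that a toy sketch can make concrete. The masks and the greedy rule below are illustrative simplifications, not the exact recursive POS extension: each round simply picks the gene whose mask covers the most samples not yet covered by the genes chosen so far.

```python
import numpy as np

# Toy gene "masks": mask[g, i] is True when gene g separates sample i
# cleanly, i.e. the sample lies outside the between-class overlap.
masks = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],   # gene 0
    [1, 1, 0, 0, 0, 0, 0, 0],   # gene 1: redundant with gene 0
    [0, 0, 0, 1, 1, 1, 0, 0],   # gene 2: complements gene 0
    [0, 0, 0, 0, 0, 0, 1, 1],   # gene 3: complements genes 0 and 2
], dtype=bool)

def select_complementary(masks, n_genes):
    # Greedy, set-cover-style pass: each round picks the gene whose mask
    # covers the most samples not yet covered by the chosen genes.
    covered = np.zeros(masks.shape[1], dtype=bool)
    chosen = []
    for _ in range(n_genes):
        gains = (masks & ~covered).sum(axis=1)
        g = int(np.argmax(gains))
        chosen.append(g)
        covered |= masks[g]
    return chosen

print(select_complementary(masks, 3))  # [0, 2, 3] -- gene 1 is skipped
```

A relevance-only ranking would have picked gene 1 second (its mask nearly duplicates gene 0's); the coverage criterion skips it in favour of genes whose masks cover new samples, which is the redundancy-minimizing idea.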
Methods of Information in Medicine | 2016
Werner Adler; Olaf Gefeller; Asma Gul; Folkert K. Horn; Zardad Khan; Berthold Lausen
BACKGROUND: Random forests are successful classifier ensemble methods, typically consisting of 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance, or even with improved performance of the sub-ensemble. Applying them to the early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background poses specific challenges. OBJECTIVES: We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. METHODS: The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC) and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty-weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. RESULTS: In glaucoma classification, all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees, compared to the classification results obtained with the full ensemble of 1000 trees. The simulation study shows that the prevalence of glaucoma is a critical factor: lower prevalence decreases the performance of the pruning strategies.
CONCLUSIONS: The memory demand of random-forest-based glaucoma classification in an unbalanced data situation can be reduced effectively by pruning strategies, without loss of performance, in a population with increased risk of glaucoma.
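Both evaluation measures are simple to compute from predicted probabilities. The sketch below uses made-up probabilities at roughly the study's 55-vs-254 prevalence; `auc` and `brier` are minimal hand-rolled implementations (the Mann-Whitney formulation for the AUC), not the study's evaluation code.

```python
import numpy as np

def auc(y, p):
    # Mann-Whitney formulation: probability that a random positive case
    # scores higher than a random negative case (ties count half).
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def brier(y, p):
    # Mean squared difference between predicted probability and outcome.
    return float(np.mean((p - y) ** 2))

# Toy unbalanced setting: ~18% "glaucoma" prevalence, echoing the
# 55-vs-254 split; the probabilities are fabricated for illustration.
rng = np.random.default_rng(4)
y = (rng.random(300) < 0.18).astype(int)
p = np.clip(0.6 * y + 0.2 + rng.normal(0, 0.15, 300), 0, 1)

print("AUC:", round(auc(y, p), 2),
      "| Brier overall:", round(brier(y, p), 3),
      "| majority class:", round(brier(y[y == 0], p[y == 0]), 3),
      "| minority class:", round(brier(y[y == 1], p[y == 1]), 3))
```

Reporting the Brier score separately per class, as the study does, matters in unbalanced data: an overall score can look good while the minority (patient) class is predicted poorly.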
ECDA | 2016
Osama Mahmoud; Andrew P. Harrison; Asma Gul; Zardad Khan; Metodi V. Metodiev; Berthold Lausen
2nd European Conference on Data Analysis, ECDA 2014 | 2016
Osama Mahmoud; Andrew P. Harrison; Asma Gul; Zardad Khan; Metodi V. Metodiev; Berthold Lausen
2nd European Conference on Data Analysis, ECDA 2014 | 2016
Asma Gul; Zardad Khan; Aris Perperoglou; Osama Mahmoud; Miftahuddin Miftahuddin; Werner Adler; Berthold Lausen
Combining multiple classifiers can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. The technique can also be used to estimate class membership probabilities. We propose an ensemble of k-Nearest Neighbour (kNN) classifiers for class membership probability estimation in the presence of non-informative features. This is done in two steps. First, we select classifiers based on their individual performance from a set of base kNN models, each generated on a bootstrap sample using a random feature subset from the feature space of the training data. Second, stepwise selection is applied to the chosen learners, and those models that maximize the ensemble's predictive performance are added to it. We evaluate the method on benchmark data sets with added non-informative features. Experimental comparison of the proposed method with the usual kNN, bagged kNN, random kNN and random forest shows that it achieves high predictive performance, in terms of minimum Brier score, on most of the data sets. The results are also verified by simulation studies.
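The probability-estimation part can be sketched minimally: each base kNN model contributes a class-1 vote proportion, the ensemble probability is their average, and the Brier score measures its quality. Data, parameters and the uniform averaging below are illustrative assumptions; the stepwise model-selection step is omitted for brevity.

```python
import numpy as np

def knn_proba(Xtr, ytr, Xte, k=5):
    # Class-1 vote proportion among the k nearest neighbours.
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return ytr[nn].mean(axis=1)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

# Ensemble probability: average the vote proportions of base kNN models,
# each built on a bootstrap sample with a random half of the features.
probs = []
for _ in range(20):
    rows = rng.integers(0, 200, 200)
    feats = rng.choice(6, size=3, replace=False)
    probs.append(knn_proba(X_tr[rows][:, feats], y_tr[rows], X_te[:, feats]))
p_ens = np.mean(probs, axis=0)

brier = float(np.mean((p_ens - y_te) ** 2))
print("ensemble Brier score:", round(brier, 3))
```

Averaging vote proportions across diverse base models gives smoother probability estimates than any single kNN model, whose estimates are restricted to multiples of 1/k.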
Archive | 2016
Zardad Khan; Asma Gul; Aris Perperoglou; Miftahuddin Miftahuddin; Osama Mahmoud; Werner Adler; Berthold Lausen
International Federation of Classification Societies | 2015
Berthold Lausen; Asma Gul; Zardad Khan; Osama Mahmoud