Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kashif Javed is active.

Publication


Featured research published by Kashif Javed.


Neurocomputing | 2015

A two-stage Markov blanket based feature selection algorithm for text classification

Kashif Javed; Sameen Maruf; Haroon Atique Babri

Designing a good feature selection (FS) algorithm is of utmost importance, especially for text classification (TC), where a large number of features representing terms or words poses serious challenges to the effectiveness and efficiency of classifiers. FS algorithms are divided into two broad categories: feature ranking (FR) and feature subset selection (FSS) algorithms. Unlike FSS, FR algorithms select those features that are individually highly relevant to the class or category without taking feature interactions into account. This makes FR algorithms simple and computationally more efficient than FSS and thus, mostly, the preferred choice for TC. Bi-normal separation (BNS) (Forman, 2003) and information gain (IG) (Yang and Pedersen, 1997) are well-known FR metrics. However, FR algorithms output a set of highly relevant features or terms which can be redundant and can thus deteriorate a classifier's performance. This paper suggests taking the interactions of words into account in order to eliminate redundant terms. Stand-alone FSS algorithms can be computationally expensive for high-dimensional text data. We therefore suggest a two-stage FS algorithm, which employs an FR metric such as BNS or IG in the first stage and an FSS algorithm such as the Markov blanket filter (MBF) (Koller and Sahami, 1996) in the second stage. Most of the two-stage algorithms proposed in the literature for TC combine feature ranking with feature transformation algorithms such as principal component analysis (PCA). To estimate the statistical significance of our two-stage algorithm, we carry out experiments on 10 different splits of training and test sets of each of three data sets (Reuters-21578, TREC, OHSUMED) with naive Bayes and support vector machines. Our results, based on a paired two-sided t-test, show that the macro F1 performance of BNS+MBF is significantly better than that of stand-alone BNS in 69% of the total experimental trials. The macro F1 values of IG are enhanced in 72% of the trials when MBF is used in the second stage. We also compare our two-stage algorithm against two recently proposed FS algorithms, namely the distinguishing feature selector (DFS) (Uysal and Gunal, 2012) and a two-stage algorithm consisting of IG and PCA (Uguz, 2011). BNS+MBF is found to be significantly better than DFS and IG+PCA in 74% and 78% of the trials, respectively. IG+MBF outperforms DFS and IG+PCA in 93% and 80% of the experimental trials, respectively. Similar results are observed for BNS+MBF and IG+MBF when the performances are evaluated in terms of balanced error rate.
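A minimal sketch of the two-stage idea described in this abstract: rank terms with an FR metric (here BNS, following Forman, 2003), then greedily prune terms that are redundant with higher-ranked ones. The greedy pruning is only a rough stand-in for the Markov blanket filter; the `redundancy` callback, the clamping constant, and the count layout `(tp, fn, fp, tn)` are assumptions of this sketch, not details from the paper.

```python
from statistics import NormalDist

def bns(tp, fn, fp, tn, eps=1e-4):
    """Bi-normal separation (Forman, 2003): |F^-1(tpr) - F^-1(fpr)|,
    where F^-1 is the inverse standard-normal CDF. Rates are clamped
    away from 0 and 1 so the inverse CDF stays finite."""
    tpr = min(max(tp / (tp + fn), eps), 1 - eps)
    fpr = min(max(fp / (fp + tn), eps), 1 - eps)
    inv = NormalDist().inv_cdf
    return abs(inv(tpr) - inv(fpr))

def two_stage_select(counts, n_first, n_final, redundancy, threshold=0.9):
    """Stage 1: keep the n_first terms with the highest BNS score.
    Stage 2: walk the ranked list and drop any term whose redundancy
    with an already-kept, higher-ranked term exceeds `threshold`
    (a crude proxy for Markov-blanket-style redundancy removal)."""
    ranked = sorted(counts, key=lambda t: bns(*counts[t]), reverse=True)[:n_first]
    kept = []
    for term in ranked:
        if all(redundancy(term, k) < threshold for k in kept):
            kept.append(term)
        if len(kept) == n_final:
            break
    return kept
```

In a real TC pipeline the redundancy callback would be estimated from the term-document matrix (e.g. a correlation between term occurrence vectors).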


Expert Systems With Applications | 2015

Relative discrimination criterion - A novel feature ranking method for text data

Abdur Rehman; Kashif Javed; Haroon Atique Babri; Mehreen Saeed

Highlights: discussed the characteristics of text data; indicated that term counts are ignored when calculating term rank; proposed a new feature ranking algorithm (RDC) that considers term counts; compared the performance of RDC with four feature ranking metrics on four datasets; RDC shows the highest performance in 65% of the classification cases.

The high dimensionality of text data hinders the performance of classifiers, making it necessary to apply feature selection for dimensionality reduction. Most feature ranking metrics for text classification are based on the document frequencies (df) of a term in the positive and negative classes. Considering only document frequencies to rank features favors terms frequently occurring in the larger classes of unbalanced datasets. In this paper we introduce a new feature ranking metric termed the relative discrimination criterion (RDC), which takes the document frequencies for each term count of a term into account while estimating the usefulness of the term. The performance of RDC is compared with four well-known feature ranking metrics, information gain (IG), chi squared (CHI), odds ratio (OR), and the distinguishing feature selector (DFS), using support vector machines (SVM) and multinomial naive Bayes (MNB) classifiers on four benchmark datasets, namely Reuters, 20 Newsgroups, and two subsets of the Ohsumed dataset. Our results, based on macro and micro F1 measures, show that the performance of RDC is superior to the other four metrics in 65% of our experimental trials. RDC also attains the highest macro and micro F1 values in 69% of the cases.


Information Processing and Management | 2017

Feature selection based on a normalized difference measure for text classification

Abdur Rehman; Kashif Javed; Haroon Atique Babri

Highlights: analyzed the balanced accuracy (ACC2) feature ranking metric and identified its drawbacks; proposed to normalize balanced accuracy by the minimum of the tpr and fpr values; compared the proposed feature ranking metric with seven well-known feature ranking metrics on seven datasets; the newly proposed metric outperforms them in more than 60% of our experimental trials.

The goal of feature selection in text classification is to choose highly distinguishing features for improving the performance of a classifier. The well-known text classification feature selection metric named the balanced accuracy measure (ACC2) (Forman, 2003) evaluates a term by taking the difference of its document frequency in the positive class (also known as true positives) and its document frequency in the negative class (also known as false positives). This, however, results in assigning equal ranks to terms having an equal difference, ignoring their relative document frequencies in the classes. In this paper we propose a new feature ranking (FR) metric, called the normalized difference measure (NDM), which takes the relative document frequencies into account. The performance of NDM is investigated against seven well-known feature ranking metrics, including odds ratio (OR), chi squared (CHI), information gain (IG), the distinguishing feature selector (DFS), the Gini index (GINI), the balanced accuracy measure (ACC2), and the Poisson ratio (POIS), on seven datasets, namely WebACE (WAP, K1a, K1b), Reuters (RE0, RE1), a spam email dataset, and 20 Newsgroups, using the multinomial naive Bayes (MNB) and support vector machines (SVM) classifiers. Our results show that the NDM metric outperforms the seven metrics in 66% of cases in terms of the macro-F1 measure and in 51% of cases in terms of the micro-F1 measure in our experimental trials on these datasets.
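The normalization the abstract describes, dividing the ACC2 difference by the minimum of the tpr and fpr values, fits in a few lines. The epsilon guard against a zero denominator is an assumption of this sketch, not a detail taken from the paper.

```python
def acc2(tpr, fpr):
    """Balanced accuracy measure (Forman, 2003): |tpr - fpr|."""
    return abs(tpr - fpr)

def ndm(tpr, fpr, eps=1e-6):
    """Normalized difference measure: ACC2 divided by min(tpr, fpr).
    Of two terms with the same ACC2 difference, the one whose rates
    are closer to zero (a rarer, more class-specific term) ranks higher."""
    return abs(tpr - fpr) / max(min(tpr, fpr), eps)

# Two terms ACC2 cannot distinguish, but NDM can:
# (tpr=0.5, fpr=0.3) and (tpr=0.3, fpr=0.1) share the same difference,
# yet NDM ranks the second, more class-specific term higher.
```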


Neurocomputing | 2013

Machine learning using Bernoulli mixture models: Clustering, rule extraction and dimensionality reduction

Mehreen Saeed; Kashif Javed; Haroon Atique Babri

Probabilistic models are common in the machine learning community for representing and modeling data. In this paper we focus on a probabilistic model based upon Bernoulli mixture models to solve different types of problems in pattern recognition, such as feature selection, classification, dimensionality reduction, and rule generation. We illustrate the effectiveness of Bernoulli mixture models by applying them to various real-life datasets taken from different domains and used as part of various machine learning challenges. Our algorithms, based upon Bernoulli mixture models, are not only simple and intuitive but have also proven to give accurate results.
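A hedged sketch of the core machinery behind such models: expectation-maximization for a mixture of multivariate Bernoulli distributions over binary vectors. The initialization scheme, iteration count, and probability clamping below are choices of this sketch, not details taken from the paper.

```python
import math
import random

def bmm_em(data, k, iters=50, seed=0):
    """Minimal EM for a mixture of k multivariate Bernoulli components.
    data: list of equal-length binary tuples. Returns (pis, mus):
    mixing weights and, per component, one Bernoulli parameter per dimension."""
    rng = random.Random(seed)
    d = len(data[0])
    pis = [1.0 / k] * k
    mus = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point (log-space).
        resp = []
        for x in data:
            logs = [math.log(pis[j]) + sum(
                        math.log(mus[j][i] if x[i] else 1 - mus[j][i])
                        for i in range(d))
                    for j in range(k)]
            m = max(logs)
            ws = [math.exp(l - m) for l in logs]
            s = sum(ws)
            resp.append([w / s for w in ws])
        # M-step: re-estimate weights and Bernoulli parameters,
        # clamping probabilities away from 0 and 1 for numerical safety.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pis[j] = nj / len(data)
            mus[j] = [min(max(sum(r[j] * x[i] for r, x in zip(resp, data)) / nj,
                              1e-3), 1 - 1e-3)
                      for i in range(d)]
    return pis, mus
```

On two well-separated binary clusters, the fitted components recover one cluster each, which is the behavior clustering with BMMs relies on.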


Neurocomputing | 2014

Impact of a metric of association between two variables on performance of filters for binary data

Kashif Javed; Haroon Atique Babri; Mehreen Saeed

In the feature selection community, filters are quite popular. The design of a filter depends on two parameters, namely the objective function and the metric it employs for estimating the feature-to-class (relevance) and feature-to-feature (redundancy) association. Filter designers pay relatively more attention to the objective function, but a poor metric can overshadow the goodness of an objective function. The metrics that have been proposed in the literature estimate relevance and redundancy differently, thus raising the question: can the metric estimating the association between two variables improve the feature selection capability of a given objective function, or in other words, of a filter? This paper investigates this question. Mutual information is the metric proposed for measuring the relevance and redundancy between features for the mRMR filter [1], while the MBF filter [2] employs the correlation coefficient. Symmetrical uncertainty, a variant of mutual information, is used by the fast correlation-based filter (FCBF) [3]. We carry out experiments on the mRMR, MBF, and FCBF filters with three different metrics (mutual information, correlation coefficient, and diff-criterion) using three binary data sets and four widely used classifiers. We find that MBF's performance is much better if it uses the diff-criterion rather than the correlation coefficient, while mRMR with the diff-criterion demonstrates performance better than or comparable to mRMR with mutual information. For the FCBF filter, the diff-criterion also exhibits results much better than mutual information.
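For binary data, all three kinds of metrics this paper compares can be computed from the 2x2 contingency table of a feature and the class. In the sketch below, mutual information and the phi (correlation) coefficient are the standard definitions; the diff-criterion is written as the absolute difference of the feature's conditional probabilities in the two classes, which is an assumption based on the authors' related work, not a formula quoted from this paper.

```python
import math

def metrics_from_table(n11, n10, n01, n00):
    """Association metrics for a binary feature f and binary class c from
    the 2x2 table: n11 = #(f=1,c=1), n10 = #(f=1,c=0),
    n01 = #(f=0,c=1), n00 = #(f=0,c=0).
    Returns (mutual information in nats, phi coefficient, diff-criterion)."""
    n = n11 + n10 + n01 + n00
    pf = (n11 + n10) / n          # P(f=1)
    pc = (n11 + n01) / n          # P(c=1)
    # Mutual information, skipping empty cells (0 * log 0 := 0).
    mi = 0.0
    for cell, p_f, p_c in ((n11, pf, pc), (n10, pf, 1 - pc),
                           (n01, 1 - pf, pc), (n00, 1 - pf, 1 - pc)):
        if cell:
            pj = cell / n
            mi += pj * math.log(pj / (p_f * p_c))
    # Pearson correlation of two binary variables is the phi coefficient.
    denom = math.sqrt(pf * (1 - pf) * pc * (1 - pc))
    phi = (n11 / n - pf * pc) / denom if denom else 0.0
    # Diff-criterion (assumed form): |P(f=1|c=1) - P(f=1|c=0)|.
    diff = abs(n11 / (n11 + n01) - n10 / (n10 + n00))
    return mi, phi, diff
```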


Knowledge and Information Systems | 2014

The correctness problem: evaluating the ordering of binary features in rankings

Kashif Javed; Mehreen Saeed; Haroon Atique Babri

In machine learning, feature ranking (FR) algorithms are used to rank features by relevance to the class variable. FR algorithms are mostly investigated for the feature selection problem and less studied for the problem of ranking. This paper focuses on the latter. A question about the problem of ranking, stated in the terminology of FR, is: as different FR criteria estimate the relationship between a feature and the class variable differently on given data, can we determine which criterion better captures the “true” feature-to-class relationship and thus generates the most “correct” order of individual features? This is termed the “correctness” problem. It requires a reference ordering against which the ranks assigned to features by an FR algorithm are directly compared. The reference ranking is generally unknown for real-life data. In this paper, we show through theoretical and empirical analysis that for two-class classification tasks represented with binary data, the ordering of binary features based on their individual predictive powers can be used as a benchmark, allowing us to test how correct the ordering of an FR algorithm is. Based on these ideas, an evaluation method termed the FR evaluation strategy (FRES) is proposed. Rankings of three different FR criteria (relief, mutual information, and the diff-criterion) are investigated on five artificially generated and four real-life binary data sets. The results indicate that FRES works equally well for synthetic and real-life data and that the diff-criterion generates the most correct orderings for binary data.
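The benchmark idea, ordering binary features by their individual predictive power, can be sketched as follows. Here predictive power is taken to be the accuracy of the best single-feature rule (predict the class from the feature or from its negation); the exact definition used in the paper may differ.

```python
def predictive_power(col, labels):
    """Individual predictive power of one binary feature: accuracy of the
    better of the two trivial rules 'predict c = f' and 'predict c = not f'."""
    n = len(labels)
    agree = sum(f == c for f, c in zip(col, labels))
    return max(agree, n - agree) / n

def reference_ranking(columns, labels):
    """Benchmark ordering of binary features (as column lists) by individual
    predictive power. An FR criterion's ranking can then be compared against
    this order to assess its 'correctness' in the sense of FRES."""
    scores = [predictive_power(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
```

A rank-correlation coefficient (e.g. Kendall's tau) between this reference order and an FR criterion's order would then quantify how correct that criterion is.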


PLOS ONE | 2012

Reverse Engineering Boolean Networks: From Bernoulli Mixture Models to Rule Based Systems

Mehreen Saeed; Maliha Ijaz; Kashif Javed; Haroon Atique Babri

A Boolean network is a graphical model for representing and analyzing the behavior of gene regulatory networks (GRN). In this context, the accurate and efficient reconstruction of a Boolean network is essential for understanding the gene regulation mechanism and the complex relations that exist therein. In this paper we introduce an elegant and efficient algorithm for the reverse engineering of Boolean networks from a time series of multivariate binary data corresponding to gene expression data. We call our method ReBMM, i.e., reverse engineering based on Bernoulli mixture models. The time complexity of most existing reverse engineering techniques is quite high and depends upon the indegree of a node in the network; due to this high complexity, they can only be applied to sparsely connected networks of small sizes. ReBMM has a time complexity factor that is independent of the indegree of a node and is quadratic in the number of nodes in the network, a big improvement over other techniques, with little or no compromise in accuracy. We have tested ReBMM on a number of artificial datasets along with simulated data derived from a plant signaling network. We also used this method to reconstruct a network from real experimental observations of microarray data of the yeast cell cycle. Our method provides a natural framework for generating rules from a probabilistic model. It is simple and intuitive, and it demonstrates excellent empirical results.


international conference on electrical engineering | 2008

The behavior of k-Means: An empirical study

Kashif Javed; Haroon Atique Babri; Mehreen Saeed

In this paper, we study the behavior of the typical k-Means clustering algorithm by investigating the distributions of the final centroids, the sum-of-squares error and the iterations to convergence. This behavior is observed on two different synthetic data sets. It is found that when the clusters are well isolated from each other, the spread of the solutions found by k-Means algorithm indicates a much larger number of local minima as compared to the data set in which clusters overlap.
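A small experiment in the spirit of this study: run plain Lloyd's k-Means from many random initializations and inspect the set of distinct final sum-of-squares errors, a rough proxy for the number of local minima reached. The 1-D data, restart count, and convergence test below are choices of this sketch, not the paper's experimental setup.

```python
import random

def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-Means on 1-D points.
    Returns (sorted centroids, final SSE, iterations used)."""
    cents = rng.sample(points, k)
    for it in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: (p - cents[j]) ** 2)].append(p)
        new = [sum(c) / len(c) if c else cents[j] for j, c in enumerate(clusters)]
        if new == cents:          # assignments stable: converged
            break
        cents = new
    sse = sum(min((p - c) ** 2 for c in cents) for p in points)
    return sorted(cents), sse, it + 1

def sse_spread(points, k, restarts=30, seed=0):
    """Distinct final SSE values over repeated random restarts.
    A wide spread suggests many local minima for this data set."""
    rng = random.Random(seed)
    return sorted({round(kmeans(points, k, rng)[1], 6) for _ in range(restarts)})
```

Comparing the spread for well-isolated versus overlapping synthetic clusters reproduces the kind of observation the paper reports.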


Expert Systems With Applications | 2018

Selection of the most relevant terms based on a max-min ratio metric for text classification

Abdur Rehman; Kashif Javed; Haroon Atique Babri; Muhammad Asim

Text classification automatically assigns text documents to one or more predefined categories based on their content. In text classification, data are characterized by a large number of highly sparse terms and highly skewed categories. Working with all the terms in the data has an adverse impact on the accuracy and efficiency of text classification tasks. A feature selection algorithm helps in selecting the most relevant terms. In this paper, we propose a new feature ranking metric called the max-min ratio (MMR). It is the product of the max-min ratio of the true positives and false positives and their difference, which allows MMR to select smaller subsets of more relevant terms even in the presence of highly skewed classes. This results in performing text classification with higher accuracy and more efficiency. To investigate the effectiveness of our newly proposed metric, we compare its performance against eight metrics (balanced accuracy measure, information gain, chi-squared, Poisson ratio, Gini index, odds ratio, distinguishing feature selector, and normalized difference measure) on six data sets, namely WebACE (WAP, K1a, K1b), Reuters (RE0, RE1), and 20 Newsgroups, using the multinomial naive Bayes (MNB) and support vector machines (SVM) classifiers. The statistical significance of MMR has been estimated on 5 different splits of the training and test data sets using the one-way analysis of variance (ANOVA) method and a multiple comparisons test based on the Tukey-Kramer method. We found that the performance of MMR is significantly better than that of the other 8 metrics in 76.2% of cases in terms of the macro-F1 measure and in 74.4% of cases in terms of the micro-F1 measure.
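Reading the abstract literally, MMR multiplies the max-min ratio of a term's true-positive and false-positive rates by their difference. The sketch below assumes rates rather than raw counts, and the epsilon guard against a zero denominator is an addition of this sketch.

```python
def mmr(tpr, fpr, eps=1e-6):
    """Max-min ratio metric as the abstract describes it:
    (max(tpr, fpr) / min(tpr, fpr)) * |tpr - fpr|.
    The ratio factor breaks the ties that a pure difference
    (ACC2-style) metric would assign equal ranks to."""
    hi, lo = max(tpr, fpr), max(min(tpr, fpr), eps)
    return (hi / lo) * abs(tpr - fpr)
```

As with NDM, two terms with equal rate differences are separated by MMR: the term whose smaller rate is closer to zero scores higher.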


IEEE Transactions on Knowledge and Data Engineering | 2012

Feature Selection Based on Class-Dependent Densities for High-Dimensional Binary Data

Kashif Javed; Haroon Atique Babri; Mehreen Saeed

Collaboration


Dive into Kashif Javed's collaboration.

Top Co-Authors


Mehreen Saeed

National University of Computer and Emerging Sciences


Aqsa Shabbir

Lahore College for Women University


Maliha Ijaz

National University of Computer and Emerging Sciences


Sameen Maruf

University of Central Punjab
