Amir Ahmad
King Abdulaziz University
Publication
Featured research published by Amir Ahmad.
Pattern Recognition Letters | 2004
Shehroz S. Khan; Amir Ahmad
The performance of iterative clustering algorithms, which converge to numerous local minima, depends heavily on the initial cluster centers. Generally, initial cluster centers are selected randomly. In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. The algorithm is based on two observations: some patterns are very similar to each other, which is why they have the same cluster membership irrespective of the choice of initial cluster centers; and an individual attribute may provide some information about an initial cluster center. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers of iterative clustering algorithms. The procedure is applicable to clustering algorithms for continuous data. We demonstrate the application of the proposed algorithm to the K-means clustering algorithm, and the experimental results show improved and consistent solutions.
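As a rough illustration of the role initialization plays, the sketch below runs scikit-learn's KMeans from deterministic seeds; the quantile-style seeding along the first principal direction is an illustrative stand-in, not the paper's initialization algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with 3 well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

k = 3
# Illustrative deterministic seeding (NOT the paper's method):
# sort points along the first principal direction and take quantile points.
v = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][0]
order = np.argsort(X @ v)
seeds = X[order[np.linspace(0, len(X) - 1, k).astype(int)]]

# K-means started from the deterministic seeds: n_init=1, so the result
# is repeatable, unlike repeated random initialization.
km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print("inertia with deterministic seeds:", km.inertia_)

km_rand = KMeans(n_clusters=k, init="random", n_init=1, random_state=42).fit(X)
print("inertia with one random init:   ", km_rand.inertia_)
```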
Data & Knowledge Engineering | 2007
Amir Ahmad; Lipika Dey
The use of traditional k-means-type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-means paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and a distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-means algorithm and to provide a better characterization of clusters. The performance of the algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
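The sketch below shows one simplified reading of the modified cluster center: the numeric part of a center is a mean, the categorical part is a frequency distribution over values, and an object's categorical distance to a center is one minus the frequency of its own value. The paper's full co-occurrence-based distance and attribute-significance weights are omitted here.

```python
import numpy as np

def cat_center(col):
    """Frequency distribution of values in a categorical column."""
    vals, counts = np.unique(col, return_counts=True)
    return dict(zip(vals, counts / len(col)))

def mixed_distance(x_num, x_cat, c_num, c_cat, w=1.0):
    # Squared Euclidean distance for the numeric part...
    d_num = np.sum((x_num - c_num) ** 2)
    # ...plus, per categorical attribute, 1 - frequency of the object's value.
    d_cat = sum(1.0 - c.get(v, 0.0) for v, c in zip(x_cat, c_cat))
    return d_num + w * d_cat

# Example: one mixed object against the center of a small cluster.
x_num, x_cat = np.array([1.0, 2.0]), ["red", "small"]
members_num = np.array([[0.9, 2.1], [1.1, 1.8], [5.0, 5.0]])
members_cat = [["red", "small"], ["red", "big"], ["blue", "big"]]
c_num = members_num.mean(axis=0)
c_cat = [cat_center(np.array([m[j] for m in members_cat])) for j in range(2)]
print(mixed_distance(x_num, x_cat, c_num, c_cat))
```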
Pattern Recognition Letters | 2005
Amir Ahmad; Lipika Dey
Patterns summarizing mutual associations between class decisions and attribute values in a pre-classified database provide insight into the significance of attributes as well as useful classificatory knowledge. In this paper, we propose an efficient, conditional-probability-based method to extract the significant attributes from a database. Reducing the feature set during pre-processing enhances the quality of the knowledge extracted and also increases the speed of computation. Our method supports easy visualization of classificatory knowledge. A likelihood-based classification algorithm that uses this classificatory knowledge is also proposed. We also show how the classification methodology can be used for cost-sensitive learning, where both accuracy and precision of prediction are important.
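A toy stand-in for this style of measure appears below: an attribute is scored by how far its class-conditional distributions P(class | value) deviate from the class prior, weighted by value frequency. The exact formulation in the paper is not reproduced.

```python
from collections import Counter

def attribute_significance(column, labels):
    """Toy conditional-probability score: higher means more discriminative."""
    n = len(labels)
    prior = Counter(labels)
    score = 0.0
    for v in set(column):
        idx = [i for i, x in enumerate(column) if x == v]
        cond = Counter(labels[i] for i in idx)
        # Total variation between P(class | value) and the class prior,
        # weighted by how often the value occurs.
        tv = 0.5 * sum(abs(cond.get(c, 0) / len(idx) - prior[c] / n)
                       for c in prior)
        score += (len(idx) / n) * tv
    return score

colour = ["red", "red", "blue", "blue", "green", "green"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
print("significance of 'colour':", attribute_significance(colour, labels))
```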
Pattern Recognition Letters | 2007
Amir Ahmad; Lipika Dey
Computing the similarity between categorical data objects in unsupervised learning is an important data mining problem. We propose a method to compute the distance between two values of the same attribute for unsupervised learning. The approach is based on the fact that the similarity of two attribute values depends on their relationship with the other attributes. The computational cost of the method is linear in the number of data objects in the data set. To assess the effectiveness of the proposed distance measure, we use it with the K-modes clustering algorithm to cluster various categorical data sets. A significant improvement in clustering accuracy is observed compared with the results obtained using the traditional K-modes clustering algorithm.
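One way to read the co-occurrence idea is sketched below: the distance between two values x and y of one attribute is the total variation distance between the conditional distributions they induce over another attribute, averaged over all other attributes. The averaging is an assumption of this sketch rather than a claim about the paper's exact definition.

```python
from collections import Counter

def value_distance(data, i, x, y):
    """Distance between values x, y of attribute i in a list-of-rows dataset."""
    n_attrs = len(data[0])
    total = 0.0
    for j in range(n_attrs):
        if j == i:
            continue
        # Conditional co-occurrence counts over attribute j.
        px = Counter(r[j] for r in data if r[i] == x)
        py = Counter(r[j] for r in data if r[i] == y)
        nx, ny = sum(px.values()), sum(py.values())
        support = set(px) | set(py)
        # Total variation distance between the two conditionals.
        total += 0.5 * sum(abs(px[v] / nx - py[v] / ny) for v in support)
    return total / (n_attrs - 1)

data = [["red", "round"], ["red", "round"], ["blue", "square"],
        ["blue", "round"], ["green", "square"]]
print(value_distance(data, 0, "red", "blue"))
```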
Expert Systems with Applications | 2013
Shehroz S. Khan; Amir Ahmad
Partitional clustering of categorical data is normally performed using the K-modes clustering algorithm, which works well for large datasets. Even though the design and implementation of the K-modes algorithm are simple and efficient, it has the pitfall of choosing its initial cluster centers randomly on every new execution, which may lead to non-repeatable clustering results. This paper addresses the random center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clusterings of the data based on the values of different attributes and yields deterministic modes that are used as initial cluster centers. We propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with an existing method for finding Significant attributes for unsupervised learning, and perform multiple clusterings of the data to find the initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. Its worst-case time complexity is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets against random initialization and two other initialization methods, and show that it performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the different datasets we tested, which leads to faster convergence of the K-modes clustering algorithm and better clustering results.
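In the same spirit, the sketch below picks deterministic K-modes seeds: each object is scored by the frequencies of its attribute values, and k high-density objects that differ from one another are chosen greedily. This is an illustrative stand-in, not the authors' multiple-clustering procedure over Prominent/Significant attributes.

```python
from collections import Counter

def deterministic_modes(data, k):
    """Deterministic seeding: dense, mutually distinct rows as initial modes."""
    n_attrs = len(data[0])
    freq = [Counter(row[j] for row in data) for j in range(n_attrs)]
    density = lambda row: sum(freq[j][row[j]] for j in range(n_attrs))
    ranked = sorted(data, key=density, reverse=True)   # stable, repeatable
    modes = []
    for row in ranked:
        # Require at least one mismatched attribute against chosen modes.
        if all(sum(a != b for a, b in zip(row, m)) >= 1 for m in modes):
            modes.append(row)
        if len(modes) == k:
            break
    return modes

data = [["a", "x"], ["a", "x"], ["a", "y"], ["b", "y"], ["b", "y"], ["b", "x"]]
print(deterministic_modes(data, 2))
```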
Pattern Recognition Letters | 2011
Amir Ahmad; Lipika Dey
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means-type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets. In this method, we compute the contribution of each attribute to the different clusters, and we propose a new cost function for a k-means-type algorithm. One advantage of this algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of the attributes' contributions to the different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness, the clustering results are explained using the attribute weights in the clusters, and the results are also compared with published results.
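A minimal sketch of per-cluster attribute weighting in a k-means-type subspace algorithm is given below: after assignment, each attribute receives a weight inversely related to its dispersion inside the cluster, so compact attributes dominate the distance. The paper's exact cost function and update rule are not reproduced.

```python
import numpy as np

def attribute_weights(cluster_points, beta=2.0, eps=1e-9):
    """Weights inversely related to within-cluster dispersion, summing to 1."""
    disp = cluster_points.var(axis=0) + eps        # per-attribute spread
    inv = (1.0 / disp) ** (1.0 / (beta - 1.0))     # smaller spread -> larger weight
    return inv / inv.sum()

cluster = np.array([[1.0, 10.0], [1.1, -3.0], [0.9, 7.0], [1.0, 0.0]])
w = attribute_weights(cluster)
print("attribute weights:", w)   # attribute 0 is compact, so it gets most weight
```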
IEEE Transactions on Knowledge and Data Engineering | 2014
Amir Ahmad; Gavin Brown
In this paper, we present a novel ensemble method, random projection random discretization ensembles (RPRDE), which creates ensembles of linear multivariate decision trees using a univariate decision tree algorithm. The method combines the better computational complexity of a univariate decision tree algorithm with the better representational power of linear multivariate decision trees. We develop a random discretization (RD) method that creates randomly discretized features from continuous features, while random projection (RP) is used to create new features that are linear combinations of the original features. A new dataset is created by augmenting the discretized features (created using RD) with the features created using RP. Each decision tree of an RPRD ensemble is trained on one dataset from this pool using a univariate decision tree algorithm. Because these multivariate decision trees (on account of the features created by RP) have more representational power than univariate decision trees, we expect accurate decision trees in the ensemble, while the diverse training datasets ensure diverse decision trees. We study the performance of RPRDE against other popular ensemble techniques using the C4.5 tree as the base classifier. RPRDE matches or outperforms the other popular ensemble methods, and the experimental results also suggest that the proposed method is quite robust to class noise.
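The sketch below follows the RPRDE recipe at a high level: each ensemble member sees the original features augmented with randomly discretized copies and random projections, and a univariate tree is grown on that extended space. The feature counts, bin counts, and discretization scheme are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def extend(Xa, Xb, rng, n_proj=3, n_bins=4):
    """Augment train/test with random discretizations and random projections."""
    lo, hi = Xa.min(axis=0), Xa.max(axis=0)
    # Random discretization: random cut points per feature, fit on train.
    cuts = [np.sort(rng.uniform(lo[j], hi[j], n_bins - 1))
            for j in range(Xa.shape[1])]
    rd = lambda Z: np.column_stack(
        [np.searchsorted(cuts[j], Z[:, j]) for j in range(Z.shape[1])])
    # Random projection: new features as random linear combinations.
    P = rng.normal(size=(Xa.shape[1], n_proj))
    return np.hstack([Xa, rd(Xa), Xa @ P]), np.hstack([Xb, rd(Xb), Xb @ P])

preds = []
for _ in range(25):                       # 25 trees, each on its own feature set
    Ztr, Zte = extend(Xtr, Xte, rng)
    tree = DecisionTreeClassifier(random_state=0).fit(Ztr, ytr)
    preds.append(tree.predict(Zte))
# Majority vote across the ensemble.
vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, np.array(preds))
print("ensemble accuracy:", (vote == yte).mean())
```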
Nanotechnology | 2006
Amir Ahmad; V. K. Tripathi
The field enhancement factor of a carbon nanotube (CNT) placed in a cluster of CNTs is smaller than that of an isolated CNT because the electric field on one tube is screened by the neighbouring tubes. This screening depends on the length of the CNTs and the spacing between them. We derive an expression to compute the field enhancement factor of CNTs under any positional distribution, using a model of a floating sphere between parallel anode and cathode plates. With this expression we can compute the field enhancement factor of a CNT in a cluster (non-uniformly distributed CNTs), and it is also used to compute the field enhancement factor of a CNT in an array (uniformly distributed CNTs). Comparisons are shown with experimental results and existing models.
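A toy numerical illustration of the screening effect follows. The aspect-ratio estimate h/r for an isolated tube and the exponential screening form are rough approximations quoted in the CNT field-emission literature; they are not the expression derived in this paper, which handles arbitrary positional distributions.

```python
import numpy as np

h, r = 1.0e-6, 5.0e-9            # tube height and tip radius (metres), assumed
beta_isolated = h / r            # crude aspect-ratio estimate for a lone tube

for s in (0.25 * h, 1.0 * h, 2.0 * h):      # inter-tube spacing
    # Empirical screening form (assumed): close packing suppresses beta.
    screening = 1.0 - np.exp(-2.3 * s / h)
    print(f"s/h = {s/h:.2f}  beta ~ {beta_isolated * screening:,.0f}")
```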
Applied Intelligence | 2014
Amir Ahmad
A classifier ensemble is a set of classifiers whose individual decisions are combined to classify new examples. Classifiers that can represent complex decision boundaries are accurate, and kernel functions can also represent complex decision boundaries. In this paper, we study the usefulness of kernel features for decision tree ensembles, as such features can improve the representational power of the individual classifiers. We first propose decision tree ensembles based on kernel features and find that the performance of these ensembles depends strongly on the kernel parameters, the selected kernel, and the dimension of the kernel feature space. To overcome this problem, we present another approach to creating ensembles that combines existing ensemble methods with the kernel machine philosophy: kernel features are created and concatenated with the original features, and the classifiers of an ensemble are trained on these extended feature spaces. Experimental results suggest that the approach is quite robust to the selection of parameters, and that different ensemble methods (Random Subspace, Bagging, AdaBoost.M1, and Random Forests) can be improved by using it.
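The sketch below shows the concatenation idea: an empirical kernel map is built against a few training points, stacked next to the original features, and a standard ensemble is trained on the extended space. The anchor count and gamma value are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
anchors = Xtr[rng.choice(len(Xtr), size=20, replace=False)]  # reference points
gamma = 1.0 / X.shape[1]

# Original features concatenated with RBF kernel features.
Ztr = np.hstack([Xtr, rbf_kernel(Xtr, anchors, gamma=gamma)])
Zte = np.hstack([Xte, rbf_kernel(Xte, anchors, gamma=gamma)])

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0).fit(Ztr, ytr)
print("bagging accuracy on extended features:", bag.score(Zte, yte))
```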
Information Sciences | 2012
Amir Ahmad; Lipika Dey; Sami M. Halawani
The analysis of customer satisfaction datasets has shown that product-related features fall into three categories (basic, performance, and excitement), which affect overall satisfaction differently. Because the relationship between product features and customer satisfaction is characterized by non-linearity and asymmetry, feature values are studied to understand the characteristics of a feature. However, existing methods are computationally expensive and work for ordinal features only. We propose a rule-based method that can be used to analyze data features with respect to the various characteristics of customer satisfaction. The inputs for these rules are derived using a probabilistic feature-selection technique in which the mutual associations between feature values and class decisions in a pre-classified database are computed to measure the significance of feature values. The proposed method can be used for both ordinal and categorical features and is more computationally efficient than previously recommended methods. We performed experiments on a synthetic dataset with known characteristics, and our method correctly predicted those characteristics. We also performed experiments with a real housing dataset, and the knowledge extracted from it using this method agrees with the domain knowledge.
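A toy version of the basic/performance/excitement distinction appears below, driven by P(satisfied | feature level): a feature whose absence hurts but whose presence adds little is "basic", the reverse is "excitement", and a roughly symmetric effect is "performance". The binary level framing and thresholds are assumptions of this sketch, not the paper's rules.

```python
def characterize(levels, satisfied, thresh=0.2):
    """levels: 'low'/'high' per customer; satisfied: bool per customer."""
    p = {}
    for lv in ("low", "high"):
        idx = [i for i, x in enumerate(levels) if x == lv]
        p[lv] = sum(satisfied[i] for i in idx) / len(idx)
    gain_when_high = p["high"] - 0.5     # lift above indifference
    loss_when_low = 0.5 - p["low"]       # drop below indifference
    if loss_when_low > thresh and gain_when_high <= thresh:
        return "basic"        # absence hurts, presence barely helps
    if gain_when_high > thresh and loss_when_low <= thresh:
        return "excitement"   # presence delights, absence barely hurts
    return "performance"      # roughly symmetric effect

levels    = ["low", "low", "low", "high", "high", "high"]
satisfied = [False, False, True, True,  True,  False]
print(characterize(levels, satisfied))
```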