Muhammad Rafi

National University of Computer and Emerging Sciences

Publication

Featured research published by Muhammad Rafi.

IEEE International Multitopic Conference | 2011

Comparing SVM and naïve Bayes classifiers for text categorization with Wikitology as knowledge enrichment

Sundus Hassan; Muhammad Rafi; Muhammad Shahid Shaikh

Text categorization is the task of labeling documents according to their content. Many experiments have been carried out to enhance text categorization by adding background knowledge to documents from knowledge repositories such as WordNet, the Open Directory Project (ODP), Wikipedia and Wikitology. In our previous work, we carried out extensive experiments that extracted knowledge from Wikitology and evaluated the results with a Support Vector Machine under 10-fold cross-validation. The results clearly indicated that Wikitology performs far better than the other knowledge bases. In this paper, we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers under text enrichment through Wikitology. We validated the results with 10-fold cross-validation and showed that, compared with the baseline results, NB gives an improvement of +28.78%, whereas SVM gives an improvement of +6.36%. The Naïve Bayes classifier is the better choice when text is enriched through any external knowledge base.
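The two ingredients compared above, enrichment with knowledge-base concepts and a Naïve Bayes classifier checked by k-fold cross-validation, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the `concept_map` dictionary is a hypothetical stand-in for Wikitology, documents are assumed to be pre-tokenized lists, and the classifier is a textbook multinomial Naïve Bayes with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


def enrich(tokens, concept_map):
    """Append related concepts for each token (concept_map is a hypothetical
    stand-in for a knowledge base such as Wikitology)."""
    extra = [c for t in tokens for c in concept_map.get(t, [])]
    return tokens + extra


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors, likelihoods = {}, {}
    for c, n_c in class_counts.items():
        priors[c] = math.log(n_c / len(labels))
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return priors, likelihoods, vocab


def predict_nb(model, tokens):
    priors, likelihoods, vocab = model

    def log_posterior(c):
        # Out-of-vocabulary tokens are ignored.
        return priors[c] + sum(likelihoods[c][w] for w in tokens if w in vocab)

    return max(priors, key=log_posterior)


def cross_validate(docs, labels, k=10):
    """Mean accuracy over k folds; fold f holds documents with index % k == f.
    Requires len(docs) >= k so that no test fold is empty."""
    accuracies = []
    for fold in range(k):
        test = [i for i in range(len(docs)) if i % k == fold]
        train = [i for i in range(len(docs)) if i % k != fold]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

To compare a baseline run with an enriched run, one would call `cross_validate` once on the raw token lists and once on `[enrich(d, concept_map) for d in docs]`; an SVM would take the classifier's place for the other arm of the comparison.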

International Journal of Computer Applications | 2010

Document Clustering based on Topic Maps

Muhammad Rafi; M. Shahid Shaikh; Amir Farooq

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic content of the documents. The problem of document clustering has two main components: (1) representing the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document; and (2) defining a similarity measure over this semantic representation that assigns higher numerical values to document pairs with a stronger semantic relationship. The feature space of documents makes clustering challenging: a document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. Given these features, traditional agglomerative clustering algorithms based on either the Document Vector Model (DVM) or the Suffix Tree model (STC) are less efficient at producing results with high cluster quality. This paper introduces a new approach to document clustering based on a Topic Map representation of the documents, in which each document is transformed into a compact form. A similarity measure is proposed based on the information inferred from topic map data and structures. The suggested method is implemented using agglomerative hierarchical clustering and tested on standard Information Retrieval (IR) datasets. The comparative experiment reveals that the proposed approach is effective in improving cluster quality.
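The clustering step described above, agglomerative hierarchical clustering driven by a pluggable similarity measure, can be sketched as follows. This is an illustration under simplifying assumptions: each document is reduced to a hypothetical set of topic labels standing in for its topic map, Jaccard overlap stands in for the paper's topic-map similarity, and clusters are merged by average linkage until `k` remain.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two topic sets; a stand-in for a topic-map similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(items, k, sim=jaccard):
    """Average-link agglomerative clustering; returns clusters as lists of item indices."""
    clusters = [[i] for i in range(len(items))]

    def avg_link(c1, c2):
        total = sum(sim(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        # Most similar pair of clusters; combinations() guarantees a < b.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters
```

Because the similarity is passed in as `sim`, a richer topic-map measure could replace `jaccard` without changing the merge loop.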

International Conference on Information and Emerging Technologies | 2010

A comparison of two suffix tree-based document clustering algorithms

Muhammad Rafi; Mehdi Maujood; Murtaza Munawar Fazal; Syed Muhammad Ali

Document clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large document repositories such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity to suffix tree-based document similarity, as the latter offers a more semantic representation of the text in a document. In this paper, we compare and contrast two recently introduced approaches to document clustering based on the suffix tree data model. The first is Efficient Phrase-based document clustering, which extracts phrases from documents to form a compact document representation and uses a similarity measure based on a common suffix tree to cluster the documents. The second is frequent word/word meaning sequence-based document clustering, which similarly extracts common word sequences from the documents, uses the common word sequences or common word meaning sequences to build a compact representation, and finally clusters the compact documents. Both algorithms use agglomerative hierarchical document clustering to perform the actual clustering step; they differ mainly in how phrases are extracted, how the compact document model is represented, and which similarity measure is used for clustering. This paper investigates the computational aspects of the two algorithms and the quality of the results they produce.
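Both approaches rest on finding word sequences that documents share. A minimal sketch of that idea, assuming pre-tokenized documents, is a generalized word-level suffix trie: every suffix of every document is inserted, and each node records which documents pass through it, so a node reached by two documents is a phrase they share. This is an uncompressed trie (quadratic in document length), not the compact suffix tree either paper uses, and the Jaccard-style `phrase_similarity` is an illustrative assumption rather than either paper's measure.

```python
def build_suffix_trie(docs):
    """Insert every word-level suffix of every document into one trie."""
    root = {"children": {}, "docs": set()}
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node["children"].setdefault(word, {"children": {}, "docs": set()})
                node["docs"].add(doc_id)
    return root


def iter_phrases(node, min_len=2, path=()):
    """Yield (phrase, doc_ids) for every trie node at depth >= min_len."""
    for word, child in node["children"].items():
        phrase = path + (word,)
        if len(phrase) >= min_len:
            yield phrase, child["docs"]
        yield from iter_phrases(child, min_len, phrase)


def phrase_similarity(root, i, j, min_len=2):
    """Phrases shared by documents i and j, divided by phrases found in either."""
    shared = either = 0
    for _, doc_ids in iter_phrases(root, min_len):
        if i in doc_ids or j in doc_ids:
            either += 1
            if i in doc_ids and j in doc_ids:
                shared += 1
    return shared / either if either else 0.0
```

Each trie node corresponds to exactly one distinct phrase, which is why counting nodes is enough to count phrases.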

International Conference on Machine Learning | 2017

Towards A Soft Computing Approach to Document Clustering

Muhammad Rafi; Sufyan Shahid; Junaid Aftab; Muhammad Faizan Uddin; Muhammad Shahid Shaikh

Soft computing refers to a partnership of methods that produce approximate, low-cost solutions to hard problems. We believe that document clustering is one such problem and that soft computing is a good candidate approach for it. In this paper, we propose a generalized document clustering approach using soft computing techniques. We define two methods for document clustering based on the k-means partitioning algorithm: (i) a Genetic Algorithm (GA) based k-means algorithm, optimized to find a locally optimal solution, and (ii) a Harmony Search (HS) based k-means algorithm, optimized to find a globally optimal solution. We also propose a novel soft computing partnership method (Hybrid) that uses the solution produced by either the GA k-means or the HS k-means method to seed the other for further improvement. We performed extensive experiments with the proposed methods on standard text mining datasets, (i) NEWS20, (ii) Reuters and (iii) WebKB-courses, and evaluated the results using Purity and Silhouette. In comparison, the proposed methods outperform basic k-means, and the hybrid approach performs exceptionally well.
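The common building block of the GA, HS, and hybrid methods is the k-means partition step, and Purity is one of the two evaluation measures named above. The sketch below shows only those two pieces, assuming documents have already been converted to numeric vectors. The random seeding in `kmeans` is exactly the step that the paper's GA and HS searches are meant to improve; those searches are not reproduced here.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with random seeding; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def purity(labels, classes):
    """Fraction of items that belong to the majority class of their cluster."""
    per_cluster = defaultdict(Counter)
    for lab, cls in zip(labels, classes):
        per_cluster[lab][cls] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values()) / len(labels)
```

A GA or HS wrapper would search over the initial `centroids` instead of drawing them once with `rng.sample`.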

International Journal of Advanced Computer Science and Applications | 2015

Study of Automatic Extraction, Classification, and Ranking of Product Aspects Based on Sentiment Analysis of Reviews

Muhammad Rafi; Muhammad Farooq; Usama Noman; Abdul Rehman Farooq; Umair Ali Khatri

It is very common for customers to read reviews of a product before deciding to buy it. Customers are always eager to get the best and most objective information about the product they wish to purchase, and reviews are the major source of this information. Although reviews are easily accessible on the web, most of them carry ambiguous opinions and differ in structure, so it is often very difficult for customers to filter out the information they actually need. This paper proposes a framework that provides a single user-interface solution to this problem based on sentiment analysis of reviews. First, it extracts the reviews from different websites with varying structures and gathers information about the relevant aspects of the product. Next, it performs sentiment analysis around those aspects and assigns them sentiment scores. Finally, it ranks all extracted aspects and clusters them into positive and negative classes. The final output is a graphical visualization of all positive and negative aspects, which gives the customer easy, comparable, visual information about the important aspects of the product. Experimental results on five different products with 5000 reviews show 78% accuracy. The paper also examines the effect of negation, valence shifters, and diminishers used with a sentiment lexicon on sentiment analysis, and concludes that they are all independent of the problem case and have no effect on the accuracy of sentiment analysis.
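The scoring and ranking steps above can be illustrated with a small lexicon-based sketch. Everything here is assumed for illustration: the toy `LEXICON`, the negation word list, the three-token window around each aspect mention, and the simple rule that a negation word immediately before a sentiment word flips its polarity (one of the modifier types the paper evaluated). The aspect list is given rather than extracted, and the framework's web extraction and visualization are out of scope.

```python
import re
from collections import defaultdict

# Hypothetical toy resources; a real system would use a full sentiment lexicon.
LEXICON = {"good": 1, "great": 2, "excellent": 2, "bad": -1, "poor": -2}
NEGATIONS = {"not", "no", "never"}


def aspect_scores(reviews, aspects, window=3):
    """Sum lexicon polarities of sentiment words within `window` tokens of each aspect mention."""
    scores = defaultdict(float)
    for review in reviews:
        tokens = re.findall(r"[a-z']+", review.lower())
        for i, tok in enumerate(tokens):
            if tok not in aspects:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                polarity = LEXICON.get(tokens[j], 0)
                if polarity and j > 0 and tokens[j - 1] in NEGATIONS:
                    polarity = -polarity  # naive negation handling
                scores[tok] += polarity
    return dict(scores)


def rank_aspects(scores):
    """Rank aspects by score and split them into positive and negative classes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [a for a in ranked if scores[a] > 0], [a for a in ranked if scores[a] < 0]
```

The two lists returned by `rank_aspects` correspond to the positive and negative classes that the framework visualizes.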

International Conference on Machine Learning | 2018

Solving document clustering problem through meta heuristic algorithm: black hole

Muhammad Rafi; Bilal Aamer; Mubashir Naseem; Muhammad Osama

This paper proposes a soft computing approach to the document clustering problem. Document clustering is a specialized clustering problem in which textual documents are autonomously segregated into a number of identifiable, subject-homogeneous, smaller sub-collections (also called clusters). Identifying implicit textual patterns within the documents is challenging, as there can be thousands of such textual features. Partitional clustering algorithms such as k-means are mainly used for this problem. The k-means algorithm has several drawbacks, such as (i) dependency on the initial seeds and (ii) getting trapped in locally optimal solutions. However, every k-means solution may contain some good partial arrangements for clustering. Meta-heuristic algorithms such as Black Hole (BH) use a trade-off between randomization and local search to find optimal and near-optimal solutions. Our motivation comes from the fact that meta-heuristic optimization can quickly produce a globally optimal solution starting from random k-means initial solutions. The contributions of this research are (i) an implementation of the black hole algorithm using k-means as an embedding and (ii) the use of global search and local search optimization as parameter adjustments. A series of experiments was performed with the proposed method on standard text mining datasets, (i) NEWS20, (ii) Reuters and (iii) WebKB, and the results were evaluated using Purity and the Silhouette Index. In comparison, the proposed method outperforms basic k-means and GA with k-means embedding, and quickly converges to a global or near-global optimal solution.
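A compact sketch of the Black Hole loop for clustering follows, assuming numeric document vectors and the sum of squared errors (SSE) as the objective. Each star is a candidate set of `k` centroids initialized the way k-means is (by sampling data points), the best star acts as the black hole, the other stars move a random fraction of the way toward it, and any star that falls inside the event horizon, with radius `f_BH / sum(f_i)` as in the standard Black Hole formulation, is replaced by a fresh random star. Parameter values are illustrative, and the paper's k-means embedding and parameter adjustments are not reproduced.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def star_distance(s1, s2):
    """Euclidean distance between two stars, each a list of k centroids."""
    return math.sqrt(sum(sq_dist(c1, c2) for c1, c2 in zip(s1, s2)))


def black_hole(points, k, n_stars=10, iters=50, seed=0):
    rng = random.Random(seed)

    def new_star():
        return [list(p) for p in rng.sample(points, k)]

    stars = [new_star() for _ in range(n_stars)]
    fitness = [sse(points, s) for s in stars]
    bh = min(range(n_stars), key=fitness.__getitem__)  # index of the black hole
    for _ in range(iters):
        # Move every other star a random fraction of the way toward the black hole.
        for i in range(n_stars):
            if i == bh:
                continue
            stars[i] = [
                [x + rng.random() * (b - x) for x, b in zip(c, cb)]
                for c, cb in zip(stars[i], stars[bh])
            ]
            fitness[i] = sse(points, stars[i])
            if fitness[i] < fitness[bh]:
                bh = i  # a better star becomes the new black hole
        # Stars inside the event horizon are swallowed and replaced at random.
        total = sum(fitness)
        radius = fitness[bh] / total if total else 0.0
        for i in range(n_stars):
            if i != bh and star_distance(stars[i], stars[bh]) < radius:
                stars[i] = new_star()
                fitness[i] = sse(points, stars[i])
    return stars[bh], fitness[bh]
```

The black-hole star is never moved or replaced, and it only changes when a strictly better star appears, so the returned SSE can only stay the same or improve as `iters` grows for a fixed seed.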

Web Intelligence, Mining and Semantics | 2016

Multi-Layer Semantics Based Document Clustering

Muhammad Rafi; Muhammad Sharif; Waleed Arshad; Sheharyar Mohsin; Habibullah Rafay

Document clustering is an unsupervised machine learning method that separates a large, subject-heterogeneous collection (document base or corpus) into smaller, more manageable, subject-homogeneous collections (clusters). Traditional methods of document clustering use features such as words, sequences, and phrases. These features are independent of each other and do not capture semantics. To perform semantically viable clustering, we believe that the problem of document clustering has two main components: (1) representing the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document; and (2) defining a similarity measure over this semantic representation that assigns higher numerical values to document pairs with a stronger semantic relationship. In this paper, we propose a document representation based on three distinct layers: lexical, syntactic and semantic. We believe that these three layers are essential to capture semantics in the document meta-descriptor. Using features from these three layers, we propose a similarity function for document clustering. We performed an extensive series of experiments on standard text mining datasets with external clustering evaluation measures such as F-Measure and Purity.
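One way to picture the three-layer representation is to give each document a feature set per layer and combine the per-layer overlaps with weights. In the sketch below, the layer features (for example, noun-phrase chunks for the syntactic layer and concept labels for the semantic layer), the Jaccard overlap, and the weights in `LAYER_WEIGHTS` are all hypothetical; the paper defines its own similarity function. The clustering F-Measure used for evaluation is included in its usual form: each class is matched to the cluster with the best F score, weighted by class size.

```python
from collections import Counter

# Hypothetical layer weights; they sum to 1 so that identical documents score 1.0.
LAYER_WEIGHTS = {"lexical": 0.3, "syntactic": 0.3, "semantic": 0.4}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def layered_similarity(doc_a, doc_b, weights=LAYER_WEIGHTS):
    """Weighted sum of per-layer Jaccard overlaps between two documents."""
    return sum(w * jaccard(doc_a[layer], doc_b[layer]) for layer, w in weights.items())


def clustering_f_measure(clusters, classes):
    """Class-size-weighted best F score of each class over all clusters."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(clusters, classes))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = joint[(clu, cls)]
            if n_ij == 0:
                continue
            precision, recall = n_ij / n_clu, n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best
    return total
```

Keeping the layers separate makes it easy to test how much each layer contributes by zeroing its weight.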

Journal of Independent Studies and Research - Computing | 2015

Extracting patterns from Global Terrorist Dataset (GTD) Using Co-Clustering approach

Muhammad Adnan (SZABIST, Karachi, Pakistan); Muhammad Rafi

The Global Terrorist Dataset (GTD) is a vast collection of terrorist activities reported around the globe. The database incorporates more than 27,000 terrorism incidents from 1968 to 2014. Every record has spatial data, a time stamp, and several other fields (e.g., tactics, weapon types, targets and injuries). There have been a few earlier studies seeking interesting patterns in this textual gamut of data. The author believes that the GTD still hides numerous interesting patterns and that the full potential of this resource has yet to be revealed. In this independent study, the author investigates the GTD through co-clustering for pattern discovery. The author extracted textual data from the GTD with the aim of clustering the data in space and time simultaneously through co-clustering. Co-clustering has become an important and powerful tool for data mining: it analyzes bilateral (two-way) data by describing the connections between two different entities. Many real-world applications, such as market basket analysis and recommendation systems, can benefit extensively from this approach. In this study, the effectiveness of the co-clustering model is described through experiments on a database of global terrorist events.

Keywords: Global Terrorism Dataset, GTD, Co-clustering, Bi-clustering, Two-Way Clustering
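Co-clustering groups the rows and the columns of a data matrix at the same time. As an illustration only, the sketch below implements a simple block-average co-clustering: given a matrix such as hypothetical incident counts with regions as rows and years as columns, it alternately reassigns rows and then columns so that each cell is as close as possible to the mean of its (row-cluster, column-cluster) block. The study may use a different co-clustering formulation; here the initial assignments are supplied by the caller (random in practice), and empty blocks fall back to the overall mean.

```python
def block_means(A, rows, cols, k, l):
    """Mean of A over each (row-cluster, column-cluster) block."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            sums[rows[i]][cols[j]] += value
            counts[rows[i]][cols[j]] += 1
    overall = sum(map(sum, A)) / (len(A) * len(A[0]))
    return [
        [sums[g][h] / counts[g][h] if counts[g][h] else overall for h in range(l)]
        for g in range(k)
    ]


def co_cluster(A, rows, cols, k, l, max_iter=20):
    """Alternate row and column reassignment to minimize squared error to block means."""
    m, n = len(A), len(A[0])
    for _ in range(max_iter):
        mu = block_means(A, rows, cols, k, l)
        new_rows = [
            min(range(k), key=lambda g: sum((A[i][j] - mu[g][cols[j]]) ** 2 for j in range(n)))
            for i in range(m)
        ]
        mu = block_means(A, new_rows, cols, k, l)
        new_cols = [
            min(range(l), key=lambda h: sum((A[i][j] - mu[new_rows[i]][h]) ** 2 for i in range(m)))
            for j in range(n)
        ]
        if new_rows == rows and new_cols == cols:
            break  # no assignment changed
        rows, cols = new_rows, new_cols
    return rows, cols
```

In a space-and-time setting, each resulting block would pair a group of regions with a group of periods, which is the kind of two-way pattern the study describes.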

arXiv: Information Retrieval | 2013