Mahamad Suhil
University of Mysore
Publications
Featured research published by Mahamad Suhil.
Procedia Computer Science | 2015
Sumithra R; Mahamad Suhil; D. S. Guru
In this paper, a novel approach for automatic segmentation and classification of skin lesions is proposed. Initially, skin images are filtered to remove unwanted hair and noise, and then segmentation is carried out to extract the lesion areas. For segmentation, a region-growing method with automatic initialization of seed points is applied. The segmentation performance is measured with several well-known measures, and the results are appreciable. Subsequently, the extracted lesion areas are represented by color and texture features. SVM and k-NN classifiers, along with their fusion, are used for classification with the extracted features. The performance of the system is tested on our own dataset of 726 samples from 141 images covering 5 different classes of diseases. The results are promising, with F-measures of 46.71% and 34% for the SVM and k-NN classifiers respectively, and 61% for the fusion of SVM and k-NN.
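Since the abstract does not spell out the fusion rule, the sketch below assumes a soft-voting fusion of the SVM and k-NN probability estimates over already-extracted color/texture feature vectors (stubbed with random placeholders); it illustrates the classification stage only, not the authors' implementation.

```python
# Minimal sketch of the classification stage with SVM, k-NN, and their fusion.
# Soft voting is an assumed fusion rule; features are random placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(726, 24))        # placeholder color + texture features
y = rng.integers(0, 5, size=726)      # 5 disease classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf", probability=True)
knn = KNeighborsClassifier(n_neighbors=5)
fusion = VotingClassifier([("svm", svm), ("knn", knn)], voting="soft")

for name, clf in [("SVM", svm), ("k-NN", knn), ("fusion", fusion)]:
    clf.fit(X_tr, y_tr)
    print(name, "F-measure:", f1_score(y_te, clf.predict(X_te), average="macro"))
```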
Procedia Computer Science | 2015
D. S. Guru; Mahamad Suhil
In this paper, we introduce a new measure called Term_Class relevance to compute the relevance of a term in classifying a document into a particular class. The proposed measure estimates the degree of relevance of a given term, in placing an unlabeled document into a known class, as the product of the Class_Term weight and the Class_Term density, where the Class_Term weight is the ratio of the number of documents of the class containing the term to the total number of documents containing the term, and the Class_Term density is the relative density of occurrence of the term in the class to the total occurrence of the term in the entire population. Unlike existing term-weighting schemes such as TF-IDF and its variants, the proposed relevance measure takes into account the degree of relative participation of the term across all documents of the class relative to the entire population. To demonstrate the significance of the proposed measure, experiments have been conducted on the 20 Newsgroups dataset. Further, the superiority of the measure is brought out through a comparative analysis.
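The measure is fully specified by the definition above, so a direct computation can be sketched; the document-term matrix `X`, labels `y`, and function name below are ours.

```python
# Sketch of the Term_Class relevance measure as described above:
# TCR(t, c) = Class_Term_weight(t, c) * Class_Term_density(t, c).
# X is a document-term count matrix, y the class label of each document.
import numpy as np

def term_class_relevance(X, y, term, cls):
    in_class = (y == cls)
    docs_with_term = X[:, term] > 0
    # Class_Term weight: documents of the class containing the term /
    # all documents containing the term.
    weight = docs_with_term[in_class].sum() / max(docs_with_term.sum(), 1)
    # Class_Term density: occurrences of the term in the class /
    # occurrences of the term in the whole collection.
    density = X[in_class, term].sum() / max(X[:, term].sum(), 1)
    return weight * density

X = np.array([[2, 0, 1], [1, 1, 0], [0, 3, 0], [0, 2, 1]])
y = np.array([0, 0, 1, 1])
print(term_class_relevance(X, y, term=1, cls=1))  # term 1 is concentrated in class 1
```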
arXiv: Computer Vision and Pattern Recognition | 2013
D. S. Guru; Mahamad Suhil; P. Lolika
This paper presents a novel approach for video shot boundary detection. The proposed approach is based on a split-and-merge strategy, with a Fisher linear discriminant criterion guiding both the splitting and the merging. To capture the between-class and within-class scatter, we employ the (2D)² FLD method, which works on texture features of regions in each frame of a video. Further, to reduce the complexity of the process, we employ spectral clustering to group related regions into a single region, thereby achieving a reduction in dimension. The proposed method is experimentally validated on a cricket video, and it is revealed that the shots obtained by the proposed approach are highly cohesive and loosely coupled.
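As a rough illustration of a Fisher-criterion-guided split, the sketch below scores candidate boundaries of a segment of frame feature vectors by the ratio of between-group to within-group scatter; the actual method operates on region texture features with (2D)² FLD, so this is only a simplified stand-in.

```python
# Simplified split test: given frame feature vectors of one segment, score a
# candidate boundary by the ratio of between-group to within-group scatter.
import numpy as np

def fisher_split_score(frames, boundary):
    left, right = frames[:boundary], frames[boundary:]
    m, ml, mr = frames.mean(0), left.mean(0), right.mean(0)
    between = len(left) * np.sum((ml - m) ** 2) + len(right) * np.sum((mr - m) ** 2)
    within = np.sum((left - ml) ** 2) + np.sum((right - mr) ** 2)
    return between / max(within, 1e-12)

rng = np.random.default_rng(1)
segment = np.vstack([rng.normal(0, 1, (40, 8)), rng.normal(3, 1, (30, 8))])
scores = [fisher_split_score(segment, b) for b in range(5, 65)]
print("best boundary near frame", 5 + int(np.argmax(scores)))  # near frame 40
```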
Advances in Computing and Communications | 2016
Mahamad Suhil; D. S. Guru; Lavanya Narayana Raju; Harsha S. Gowda
In this paper, the problem of skewness in text corpora for effective classification is addressed. A method of converting an imbalanced text corpus into a more or less balanced one is presented through the application of a class-wise clustering algorithm. Further, to avoid the curse of dimensionality, chi-squared feature selection is employed. Moreover, each cluster of documents is given a single vector representation using a vector of interval-valued data, which accomplishes a compact representation of the text data and thereby requires less memory for storage. A suitable symbolic classifier is used to match a query document against the stored interval-valued vectors. The superiority of the model is demonstrated by conducting a series of experiments on two benchmark imbalanced corpora, viz., Reuters-21578 and TDT2. In addition, a comparative analysis of the results of the proposed model against those of state-of-the-art models on the Reuters-21578 dataset indicates that the proposed model outperforms several contemporary models.
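A minimal sketch of the class-wise clustering and interval-valued representation, under our own simplifications: clusters are formed per class with ordinary K-means, each cluster is summarized by per-feature [min, max] intervals, and a query is matched by the fraction of its feature values falling inside a cluster's intervals (the paper's symbolic classifier may use a different similarity).

```python
# Sketch of the cluster-to-interval idea with an assumed interval similarity.
import numpy as np
from sklearn.cluster import KMeans

def interval_representation(X_class, n_clusters):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X_class)
    return [(X_class[labels == k].min(0), X_class[labels == k].max(0))
            for k in range(n_clusters)]

def symbolic_similarity(query, interval):
    low, high = interval
    return np.mean((query >= low) & (query <= high))   # fraction of features inside

rng = np.random.default_rng(0)
classes = {0: rng.normal(0, 1, (200, 10)), 1: rng.normal(2, 1, (40, 10))}
# More clusters for the majority class roughly balances the representation.
reps = {c: interval_representation(X, n_clusters=5 if c == 0 else 1)
        for c, X in classes.items()}

query = rng.normal(2, 1, 10)
pred = max(reps, key=lambda c: max(symbolic_similarity(query, iv) for iv in reps[c]))
print("predicted class:", pred)
```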
Archive | 2019
D. S. Guru; Mostafa Ali; Mahamad Suhil
In this paper, a new feature selection technique called Term-Class Weight-Inverse-Class Frequency is proposed for the purpose of text classification. The technique is based on selecting the most discriminating features with respect to each class; consequently, the number of features selected is a multiple of the number of classes present in the collection. Document vectors are built with varying numbers of selected features. The effectiveness of the technique is demonstrated by conducting a series of experiments on two benchmark text corpora, viz., Reuters-21578 and TDT2, using a KNN classifier. In addition, a comparative analysis of the results of the proposed technique with those of state-of-the-art techniques on the same datasets indicates that the proposed technique outperforms several of them.
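A sketch of the per-class selection scheme implied above; the chi-squared score stands in for the Term-Class Weight-Inverse-Class Frequency measure, whose exact definition is given in the paper.

```python
# Score every term against every class, keep the top-k terms per class, and
# take the union, so the selected set is (at most) k times the class count.
import numpy as np
from sklearn.feature_selection import chi2

def select_per_class(X, y, k):
    selected = set()
    for cls in np.unique(y):
        scores, _ = chi2(X, (y == cls).astype(int))   # one-vs-rest term scores
        selected.update(np.argsort(scores)[::-1][:k])
    return sorted(selected)

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(300, 500)).astype(float)   # toy term-count matrix
y = rng.integers(0, 4, size=300)                       # 4 classes
features = select_per_class(X, y, k=25)
print(len(features), "features selected for 4 classes")
```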
Pattern Recognition Letters | 2018
D. S. Guru; Mahamad Suhil; Lavanya Narayana Raju; N. Vinay Kumar
In this paper, we introduce an alternative framework for selecting a most relevant subset of the original set of features for the purpose of text categorization. Given a feature set and a local feature evaluation function (such as the chi-square measure or mutual information), the proposed framework ranks the features in groups instead of ranking individual features. The group of features with rank r is more discriminative than the group of features with rank r+1. Each group is made up of a subset of features which are capable of discriminating every class from every other class. An added advantage of the proposed framework is that it automatically eliminates redundant features during selection, without requiring features to be studied in combination. Further, the framework helps in handling overlapping classes effectively by selecting low-ranked yet powerful features. Extensive experiments have been conducted on three benchmark datasets using four different local feature evaluation functions with Support Vector Machine and Naive Bayes classifiers to bring out the effectiveness of the proposed framework over the respective conventional counterparts.
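The following sketch reflects one reading of the group construction: rank the features for each class with a local evaluation function and let group r collect the r-th best unused feature of every class; the function name and the choice of mutual information are ours.

```python
# Rank features in groups rather than individually: group r gathers the r-th
# best not-yet-used term of every class, so each group can discriminate every
# class from the others.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def grouped_ranking(X, y, n_groups):
    per_class = {c: np.argsort(mutual_info_classif(X, (y == c).astype(int),
                                                   random_state=0))[::-1]
                 for c in np.unique(y)}
    used, groups = set(), []
    for _ in range(n_groups):
        group = []
        for c, order in per_class.items():
            t = next(int(f) for f in order if f not in used)
            used.add(t)
            group.append(t)
        groups.append(group)
    return groups

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 100)).astype(float)
y = rng.integers(0, 3, size=200)
print(grouped_ranking(X, y, n_groups=2))   # two groups of 3 features each
```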
Intelligent Systems Design and Applications | 2017
D. S. Guru; Mahamad Suhil; S. K. Pavithra; G. R. Priya
In this paper, alternative models for ensembling feature selection methods for text classification are studied. An analytical study of three different models with various rank aggregation techniques is presented. The three models proposed for ensembling feature selection are the homogeneous ensemble, the heterogeneous ensemble, and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices. A common feature evaluation function (FEF) is applied to all the smaller training matrices so as to obtain multiple ranks for each feature, and a final score for each feature is computed by applying a suitable rank aggregation method. In the heterogeneous ensemble, instead of partitioning the training matrix, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature, and again a final score for each feature is computed by rank aggregation. The hybrid ensemble combines the ranks obtained by multiple homogeneous ensembles through multiple FEFs. It has been experimentally shown on two benchmark text collections that, in most cases, the proposed ensembling methods achieve better performance than any one of the feature selection methods applied individually.
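A sketch of the homogeneous ensemble described above, assuming mean-rank aggregation (one of several possible aggregation rules) and chi-squared as the common feature evaluation function.

```python
# Homogeneous ensemble: split the training documents into equal parts, score
# the features on each part with one common FEF, and aggregate the per-part
# rankings by mean rank (lower aggregated rank = better feature).
import numpy as np
from sklearn.feature_selection import chi2

def homogeneous_ensemble_ranks(X, y, n_parts):
    idx = np.array_split(np.random.default_rng(0).permutation(len(y)), n_parts)
    ranks = []
    for part in idx:
        scores, _ = chi2(X[part], y[part])
        ranks.append(np.argsort(np.argsort(-scores)))   # rank of each feature
    return np.mean(ranks, axis=0)

rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(600, 200)).astype(float)
y = rng.integers(0, 2, size=600)
final = homogeneous_ensemble_ranks(X, y, n_parts=3)
print("top 10 features:", np.argsort(final)[:10])
```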
International Journal of Computer Vision | 2017
D. S. Guru; N. Vinay Kumar; Mahamad Suhil
This paper introduces a novel feature selection model for supervised interval-valued data based on interval K-Means clustering. The proposed model explores two kinds of feature selection through feature clustering, viz., class-independent feature selection and class-dependent feature selection. The former clusters the features spread across all the samples of all the classes, whereas the latter clusters the features spread across only the samples of the respective classes. Both models demonstrate the usefulness of clustering in selecting interval-valued features. For clustering, the kernel of K-Means has been altered to operate on interval-valued data. For experimentation, four standard benchmark datasets and three symbolic classifiers are used. To corroborate the effectiveness of the proposed model, a comparative analysis against state-of-the-art models is given, and the results show the superiority of the proposed model.
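A sketch of K-Means adapted to interval-valued data, assuming the distance between two intervals is the squared difference of their lower and upper bounds summed over dimensions; the kernel actually used in the paper may differ.

```python
# K-Means over interval-valued points; each point has shape (n_dims, 2),
# storing the lower and upper bound of every dimension.
import numpy as np

def interval_kmeans(points, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        d = ((points[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
        labels = d.argmin(axis=1)
        centers = np.stack([points[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

rng = np.random.default_rng(0)
lows = np.concatenate([rng.normal(0, 1, (30, 4)), rng.normal(5, 1, (30, 4))])
points = np.stack([lows, lows + rng.uniform(0.1, 1.0, lows.shape)], axis=-1)
labels, _ = interval_kmeans(points, k=2)
print(np.bincount(labels))   # roughly two groups of 30
```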
Advances in Computing and Communications | 2016
D. S. Guru; Mahamad Suhil; Harsha S. Gowda; Lavanya Narayana Raju
This paper poses a new problem: detecting an unknown class in a text corpus that contains a huge amount of unlabeled samples but only a very small number of labeled samples. A simple yet efficient solution is proposed by modifying a conventional clustering technique, to demonstrate the scope of the problem for further research. A novel way of estimating the cluster diameter is proposed, which in turn is used as a measure of the degree of dissimilarity between two clusters. The main idea of the model is to arrive at a cluster of unlabeled text samples that is far away from all of the labeled clusters, guided by a few rules based on the diameter of a cluster and the dissimilarity between pairs of clusters. This work is the first of its kind in the literature and has numerous applications in text mining tasks. In fact, the proposed model is a general framework that can be applied to any application involving the identification of unseen classes in a semi-supervised learning environment. The model has been studied with extensive empirical analysis on different text datasets created from the benchmark 20 Newsgroups dataset. The results of the experimentation reveal the capabilities of the proposed approach and the possibilities for future research.
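A sketch of the core idea under our own simplifications: cluster the unlabeled documents, estimate each cluster's diameter as the largest centroid-to-member distance, and flag a cluster that lies farther from every labeled cluster than the sum of the two diameters; the paper's diameter estimate and decision rules may differ.

```python
# Flag an unlabeled cluster as a possible unknown class if it is far away
# from every labeled cluster relative to the cluster diameters.
import numpy as np
from sklearn.cluster import KMeans

def diameter(X, centroid):
    return np.linalg.norm(X - centroid, axis=1).max()

rng = np.random.default_rng(0)
labeled = {"c0": rng.normal(0, 1, (30, 5)), "c1": rng.normal(4, 1, (30, 5))}
unlabeled = np.vstack([rng.normal(0, 1, (100, 5)),   # known classes
                       rng.normal(4, 1, (100, 5)),
                       rng.normal(10, 1, (60, 5))])  # the unknown class

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(unlabeled)
for j, center in enumerate(km.cluster_centers_):
    members = unlabeled[km.labels_ == j]
    far = all(np.linalg.norm(center - Xl.mean(0)) >
              diameter(members, center) + diameter(Xl, Xl.mean(0))
              for Xl in labeled.values())
    print(f"cluster {j}: {'possible unknown class' if far else 'matches a known class'}")
```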
Advances in Computing and Communications | 2016
D. S. Guru; Mahamad Suhil
In this paper, a simple text categorization method using term-class relevance measures is proposed. Initially, text documents are processed to extract the significant terms present in them. For every term extracted from a document, its importance in preserving the content of a class is computed through a novel term-weighting scheme known as the Term_Class Relevance (TCR) measure proposed by Guru and Suhil (2015) [1]. In this way, the relevance of every term for all the classes present in the corpus is computed and stored in a knowledge base. During testing, the terms present in the test document are extracted and the term-class relevance of each term is retrieved from the stored knowledge base; to achieve quick lookup of term weights, a B-tree index is adopted. Finally, the class which receives the maximum support in terms of term-class relevance is declared the class of the given test document. The proposed method has logarithmic testing-time complexity and is simple to implement compared with other text categorization techniques available in the literature. Experiments conducted on various benchmark datasets reveal that the performance of the proposed method is satisfactory and encouraging.
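A sketch of the testing stage as described: term-class relevance weights computed at training time sit in an indexed knowledge base (a plain dictionary here; a B-tree or any sorted index provides the logarithmic lookup mentioned above), and the test document is assigned to the class with the largest accumulated relevance. The toy weights are illustrative only.

```python
# Classify a test document by summing stored term-class relevance weights and
# picking the class with maximum support.
from collections import defaultdict

# knowledge base: term -> {class: relevance weight}; toy values for illustration
knowledge_base = {
    "goal":     {"sports": 0.9, "politics": 0.1},
    "election": {"sports": 0.05, "politics": 0.8},
    "match":    {"sports": 0.7, "politics": 0.2},
}

def classify(terms, kb):
    support = defaultdict(float)
    for term in terms:
        for cls, weight in kb.get(term, {}).items():
            support[cls] += weight
    return max(support, key=support.get) if support else None

print(classify(["goal", "match", "referee"], knowledge_base))  # -> "sports"
```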