Saba Bashir

College of Electrical and Mechanical Engineering

Publication

Featured research published by Saba Bashir.

Applied Soft Computing | 2016

SentiMI: Introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Supervised learning has attracted much attention in recent years. As a consequence, many of the state-of-the-art algorithms are domain dependent as they require a labeled training corpus to learn the domain features. This requires the availability of labeled corpora, and building them is a cumbersome task in itself. However, for text sentiment detection SentiWordNet (SWN) may be used. It is a vocabulary where terms are arranged in synonym groups called synsets. This research makes use of SentiWordNet and treats it as the labeled corpus for training. A sentiment dictionary, SentiMI, builds upon the mutual information calculated from these terms. A complete framework is developed by using feature selection and extracting mutual information, from SentiMI, for the selected features. Training, testing and evaluation of the proposed framework are conducted on a large dataset of 50,000 movie reviews. A notable performance improvement of 7% in accuracy, 14% in specificity, and 8% in F-measure is achieved by the proposed framework as compared to the baseline SentiWordNet classifier. Comparison with the state-of-the-art classifiers is also performed on the widely used Cornell Movie Review dataset, which also proves the effectiveness of the proposed approach.
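The point-wise mutual information mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition PMI(t, c) = log2(P(t, c) / (P(t) P(c))) estimated from raw co-occurrence counts; the function name `pmi_table` and the toy observations are hypothetical and are not taken from the paper, which derives its scores from SentiWordNet.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return {(term, label): PMI in bits} for observed (term, label) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                        # counts of (term, label)
    term_counts = Counter(t for t, _ in pairs)    # counts of each term
    label_counts = Counter(l for _, l in pairs)   # counts of each label
    return {
        (t, l): math.log2((c / n) / ((term_counts[t] / n) * (label_counts[l] / n)))
        for (t, l), c in joint.items()
    }

# Hypothetical toy data: each tuple is one occurrence of a term under a label.
observations = [("good", "pos"), ("good", "pos"), ("good", "neg"), ("bad", "neg")]
scores = pmi_table(observations)
```

A positive score means the term co-occurs with the label more often than independence would predict, and a negative score means less often. Pairs that never co-occur are simply absent from the result rather than assigned negative infinity.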

Knowledge Based Systems | 2016

SWIMS: Semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Sentiment Analysis, also called Opinion Mining, is currently one of the most studied research fields. Its aim is to analyze the public’s sentiments, opinions, attitudes etc., towards different elements such as topics, products, individuals, organizations, or services. Sentiment classification can be achieved by machine learning or lexical based methodologies or a combination of both. In an effort to improve the performance of domain independent lexicons, this research incorporates machine learning with a lexical based approach introducing a new framework called SWIMS to determine the feature weight based on a well-known general-purpose sentiment lexicon, SentiWordNet. Support vector machine is used to learn the feature weights and an intelligent model selection approach is employed in order to enhance the classification performance. The features are selected based on their subjectivity and the effects of feature selection with respect to their part of speech information are studied extensively. Seven benchmark datasets have been used in this research including the large movie review dataset, the multi-domain sentiment dataset and the Cornell movie review dataset, all of which are available online. In-depth performance comparison is conducted with the state-of-the-art machine learning approaches and lexical based methodologies. The evaluation of performance measures proves that the proposed framework outperforms other techniques for sentiment analysis.

Information Sciences | 2016

eSAP: A decision support framework for enhanced sentiment analysis and polarity classification

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Sentiment analysis or opinion mining is an imperative research area of natural language processing. It is used to determine the writer’s attitude or speaker’s opinion towards a particular person, product or topic. Polarity or subjectivity classification is the process of categorizing a piece of text into positive or negative classes. In recent years, various supervised and unsupervised methods have been presented to accomplish sentiment polarity detection. SentiWordNet (SWN) has been extensively used as a lexical resource for opinion mining. This research incorporates SWN as the labeled training corpus where the sentiment scores are extracted based on the part of speech information. A vocabulary SWN-V with revised sentiment scores, generated from SWN, is then used for Support Vector Machines model learning and classification process. Based on this vocabulary, a framework named “Enhanced Sentiment Analysis and Polarity Classification (eSAP)” is proposed. Training, testing and evaluation of the proposed eSAP are conducted on seven benchmark datasets from various domains. 10-fold cross validated accuracy, precision, recall, and f-measure results averaged over seven datasets for the proposed framework are 80.82%, 80.83%, 80.94% and 80.81% respectively. A notable performance improvement of 13.4% in accuracy, 14.2% in precision, 6.9% in recall and 11.1% in f-measure is observed on average by evaluating the proposed eSAP against the baseline SWN classifier. A comparison with state-of-the-art approaches is also conducted, which verifies the superiority of the proposed eSAP framework.

Cognitive Computation | 2016

Multi-Objective Model Selection (MOMS)-based Semi-Supervised Framework for Sentiment Analysis

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Sentiment analysis has emerged as an active research field due to the rapid growth of user-generated content on the Internet. This research area analyzes the opinions and attitudes of masses toward products, movies, topics, individuals, and services. Various machine learning and text mining algorithms have been used for sentiment analysis and classification. The recent research concludes that domain-specific lexicons perform significantly better as compared to domain-independent lexicons. The proposed research aims at improving the performance of general-purpose lexicons utilizing machine learning algorithms. A semi-supervised framework based on “MOMS” is introduced in order to determine the feature weight by incorporating SentiWordNet, a well-known general-purpose sentiment lexicon. The feature weights are learned by support vector machine, and the classification performance is enhanced by using a Multi-Objective Model Selection procedure. A subjectivity criterion is used to select the desired features, and the effects of feature selection with respect to their part-of-speech information are studied comprehensively. Experimental evaluation is performed on seven different benchmark datasets, which include the Large movie review dataset, the Multi-domain sentiment dataset, and the Cornell movie review dataset. The proposed approach is compared with state-of-the-art techniques, lexicon-based approaches, and other methods for sentiment analysis. The proposed framework results in high performance when compared to other research in this field.

Journal of Computational Science | 2016

HMV: A medical decision support framework using multi-layer classifiers for disease prediction

Saba Bashir; Usman Qamar; Farhan Hassan Khan; Lubna Naseem

Decision support is a crucial function for decision makers in many industries. Typically, Decision Support Systems (DSS) help decision-makers to gather and interpret information and build a foundation for decision-making. Medical Decision Support Systems (MDSS) play an increasingly important role in medical practice. By assisting doctors with making clinical decisions, DSS are expected to improve the quality of medical care. Conventional clinical decision support systems are based on individual classifiers or a simple combination of these classifiers which tend to show moderate performance. In this research, a multi-layer classifier ensemble framework is proposed based on the optimal combination of heterogeneous classifiers. The proposed model named “HMV” overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on two different heart disease datasets, two breast cancer datasets, two diabetes datasets, two liver disease datasets, one Parkinson’s disease dataset and one hepatitis dataset obtained from public repositories. Effectiveness of the proposed ensemble is investigated by comparison of results with several well-known classifiers as well as ensemble techniques. The experimental evaluation shows that the proposed framework dealt with all types of attributes and achieved high diagnosis accuracy. A case study is also presented based on a real time medical dataset in order to show the high performance and effectiveness of the proposed model.

Knowledge and Information Systems | 2017

A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

Farhan Hassan Khan; Usman Qamar; Saba Bashir

An immense amount of data is available with the advent of social media in the last decade. This data can be used for sentiment analysis and decision making. The data present on blogs, news/review sites, social networks, etc., are so enormous that manual labeling is not feasible and an automatic approach is required for its analysis. The sentiment of the masses can be understood by analyzing this large-scale, opinion-rich data. The major issues in the application of automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that incorporates lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes the importance of nouns and employs them as semantic features with other parts of speech. The evaluation of performance measures and comparison with state-of-the-art techniques show that the proposed approach is superior.

Journal of Intelligent and Fuzzy Systems | 2015

Building Normalized SentiMI to enhance semi-supervised sentiment analysis

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Sentiment analysis and polarity detection is a type of text classification where natural language opinion is analyzed in order to classify it into either positive or negative categories. Classification of text into sentiment labels is a very difficult task as opinions expressed in natural language may contain abbreviations, slang, sarcasm, irony and/or idioms. The proposed research focuses on the use of SentiWordNet 3.0 as a labeled corpus for training purposes. We present a complete framework based on a dictionary named Normalized SentiMI (nSentiMI) which is created by calculating point-wise mutual information for each term/part-of-speech pair extracted from SentiWordNet. The proposed framework is applied on a dataset of 50,000 movie reviews to identify the value of a weight factor and then evaluated on an unseen test dataset of 2000 movie reviews. Comparison with state-of-the-art techniques also confirms the superiority of the proposed approach.

Artificial Intelligence Review | 2017

Lexicon based semantic detection of sentiments using expected likelihood estimate smoothed odds ratio

Farhan Hassan Khan; Usman Qamar; Saba Bashir

Sentiment analysis is an active research area in today’s era due to the abundance of opinionated data present on online social networks. Semantic detection is a sub-category of sentiment analysis which deals with the identification of sentiment orientation in any text. Many sentiment applications rely on lexicons to supply features to a model. Various machine learning algorithms and sentiment lexicons have been proposed in research in order to improve sentiment categorization. Supervised machine learning algorithms and domain specific sentiment lexicons generally perform better as compared to the unsupervised or semi-supervised domain independent lexicon based approaches. The core hindrance in the application of supervised algorithms or domain specific sentiment lexicons is the unavailability of sentiment labeled training datasets for every domain. On the other hand, the performance of algorithms based on general purpose sentiment lexicons needs improvement. This research is focused on building a general purpose sentiment lexicon in a semi-supervised manner. The proposed lexicon defines word semantics based on the Expected Likelihood Estimate Smoothed Odds Ratio, which is then incorporated with a supervised machine learning based model selection approach. A comprehensive performance comparison verifies the superiority of our proposed approach.
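As a rough illustration of the idea named in the abstract above, the sketch below combines the expected likelihood estimate (add-0.5 smoothing) with a log odds ratio between positive and negative documents. The abstract does not give the paper's exact formula, so the two-outcome setup (a document either contains the word or does not), the function names, and the counts are all assumptions.

```python
import math

def ele_probability(count, total, bins=2, lam=0.5):
    """Expected likelihood estimate: add-lambda smoothing with lambda = 0.5."""
    return (count + lam) / (total + lam * bins)

def smoothed_log_odds_ratio(pos_with, pos_total, neg_with, neg_total):
    """Log odds ratio of a word appearing in positive vs. negative documents."""
    p = ele_probability(pos_with, pos_total)  # smoothed P(word | positive)
    q = ele_probability(neg_with, neg_total)  # smoothed P(word | negative)
    return math.log((p * (1 - q)) / (q * (1 - p)))

# Hypothetical counts: the word occurs in 8 of 10 positive and 2 of 10 negative documents.
score = smoothed_log_odds_ratio(8, 10, 2, 10)
```

Because of the smoothing, the score stays finite even when a word never appears in one class. A positive value suggests positive orientation and a negative value suggests negative orientation.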

International Journal of Computer and Electrical Engineering | 2011

Entropy Based Data Hiding in Binary Document Images

Aihab Khan; Memoona Khanam; Saba Bashir; Malik Sikander Hayat Khiyal; Asima Iqbal; Farhan Hassan Khan

This research paper presents a data hiding technique for binary document images. Entropy measure method is used to minimize the perceptual distortion due to embedding. The watermark extraction is a blind system because neither the original image nor the watermark is required for extraction. The document image is similar to any other image. The proposed method discovers the specific regions where minimum distortion delay exists due to embedding. For embedding, the blocks that exist in the area of small font sizes are selected. Experimental results show that marked documents have excellent visual quality and less computational complexity.
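The entropy measure referred to above can be sketched for a single block of a binary image. This example assumes the ordinary Shannon entropy of the black/white pixel distribution within a block; the abstract does not specify how the paper computes or thresholds entropy, so the function name `block_entropy` and the sample blocks are hypothetical.

```python
import math

def block_entropy(block):
    """Shannon entropy in bits of a block of 0/1 pixels (list of rows)."""
    pixels = [px for row in block for px in row]
    p = sum(pixels) / len(pixels)      # fraction of pixels set to 1
    if p in (0.0, 1.0):
        return 0.0                     # a uniform block carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical 2x2 blocks: one uniform, one evenly mixed.
flat = block_entropy([[0, 0], [0, 0]])
mixed = block_entropy([[0, 1], [1, 0]])
```

Blocks with higher entropy are visually busier, so a flipped pixel there is plausibly less noticeable. Whether the paper ranks blocks this way is not stated in the abstract.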

Information Systems Frontiers | 2018

WebMAC: A web based clinical expert system

Saba Bashir; Usman Qamar; Farhan Hassan Khan

Disease diagnosis at early stages can enable the physicians to overcome the complications and treat them properly. The diagnosis method plays an important role in disease diagnosis and accuracy of its treatment. A diagnosis expert system can help a great deal in identifying those diseases and describing methods of treatment to be carried out, taking into account the user capability in order to deal and interact with the expert system easily and clearly. A good way to improve the diagnosis accuracy of expert systems is the use of ensemble classifiers. The proposed research presents an expert system using multi-layer classification with enhanced bagging and optimized weighting. The proposed method is named “M2-BagWeight” and overcomes the limitations of individual as well as other ensemble classifiers. Evaluation of the proposed model is performed on two different liver disease datasets, a chronic kidney disease dataset, a heart disease dataset, the Diabetic Retinopathy Debrecen dataset, a breast cancer dataset and a primary tumor dataset obtained from the UCI public repository. It is clear from the analysis of results that the proposed expert system has achieved high classification and prediction accuracy when compared with individual as well as ensemble classifiers. Moreover, an application named “WebMAC” is also developed for practical implementation of the proposed model in hospitals for diagnostic advice.
