Shubhamoy Dey
Indian Institute of Management Indore
Publications
Featured research published by Shubhamoy Dey.
research in applied computation symposium | 2012
Anuj Sharma; Shubhamoy Dey
Sentiment analysis is performed to extract opinion and subjectivity knowledge from user-generated text content. It differs from traditional topic-based text classification in that it classifies opinionated text according to the sentiment it conveys. Feature selection is a critical task in sentiment analysis, and well-chosen representative features from subjective text can improve sentiment-based classification. This paper explores the applicability of five feature selection methods commonly used in data mining research (DF, IG, GR, CHI and Relief-F) and seven machine learning based classification techniques (Naïve Bayes, Support Vector Machine, Maximum Entropy, Decision Tree, K-Nearest Neighbor, Winnow, Adaboost) for sentiment analysis on an online movie review dataset. The paper demonstrates that feature selection does improve the performance of sentiment-based classification, but the improvement depends on the method adopted and the number of features selected. The experimental results presented in this paper show that Gain Ratio gives the best performance for sentiment feature selection, and SVM performs better than the other techniques for sentiment-based classification.
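As a rough illustration of two of the feature selection scores named above, the sketch below computes Information Gain (IG) and Gain Ratio (GR) for a single term treated as a binary present/absent feature. The function names, the set-of-tokens document representation, and the toy reviews are assumptions made for this example; they are not taken from the paper.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain_and_ratio(docs, labels, term):
    """Score one term as a binary (present/absent) feature.

    docs   -- list of token sets, one per document (assumed representation)
    labels -- list of class labels aligned with docs
    Returns (information_gain, gain_ratio).
    """
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    n = len(docs)
    present = [i for i in range(n) if term in docs[i]]
    absent = [i for i in range(n) if term not in docs[i]]

    # Class entropy before the split, and weighted entropy after it.
    h_before = entropy(class_counts(range(n)))
    h_after = sum(len(part) / n * entropy(class_counts(part))
                  for part in (present, absent) if part)
    ig = h_before - h_after

    # Gain Ratio normalizes IG by the entropy of the split itself.
    split_info = entropy([len(present), len(absent)])
    gr = ig / split_info if split_info > 0 else 0.0
    return ig, gr

# Hypothetical toy corpus: four tiny "reviews".
docs = [{"great", "film"}, {"great", "acting"}, {"dull", "film"}, {"dull", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
for term in ("great", "film"):
    print(term, info_gain_and_ratio(docs, labels, term))
```

On this toy corpus "great" separates the classes perfectly and scores 1.0 on both measures, while "film" appears equally in both classes and scores 0.0. Ranking all terms by such a score and keeping the top N is one common way to vary the number of selected features.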
ACM Sigapp Applied Computing Review | 2012
Anuj Sharma; Shubhamoy Dey
The abundance of discussion forums, Weblogs, e-commerce portals, social networking, product review sites and content sharing sites has facilitated the flow of ideas and the expression of opinions. The user-generated text content on the Internet and Web 2.0 social media can be a rich source of sentiments, opinions, evaluations, and reviews. Sentiment analysis or opinion mining has become an open research domain that involves classifying text documents as positive or negative based on the opinion they express about a given topic. This paper proposes a sentiment classification model using a back-propagation artificial neural network (BPANN). Information Gain and three popular sentiment lexicons are used to extract sentiment-representing features that are then used to train and test the BPANN. This novel approach combines the strength of BPANN in classification accuracy with the intrinsic subjectivity knowledge available in the sentiment lexicons. The results obtained from experiments on the movie and hotel review corpora show that the proposed approach reduces dimensionality while producing accurate results for sentiment-based classification of text.
computational science and engineering | 2013
Shrawan Kumar Trivedi; Shubhamoy Dey
Identification of unsolicited emails (spam) is now a well-recognized research area within text classification. A good email classifier is evaluated not only by performance accuracy but also by the false positive rate. This research presents an Enhanced Genetic Programming (EGP) approach which works by building an ensemble of classifiers for detecting spam. The proposed classifier is tested on the most informative features of two publicly available corpora (Enron and SpamAssassin), found using the Greedy Stepwise search method. Thereafter, the proposed ensemble of classifiers is compared with various machine learning classifiers: Genetic Programming (GP), Bayesian, Naïve Bayes (NB), J48, Random Forest (RF), and SVM. Results of this study indicate that the proposed classifier (EGP) is the best classifier among those compared in terms of performance accuracy as well as false positive rate.
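The two evaluation measures mentioned above can be made concrete with a short sketch. It treats "spam" as the positive class, so a false positive is a legitimate email flagged as spam. The function name and the label strings are assumptions chosen for this example.

```python
def accuracy_and_fpr(y_true, y_pred, positive="spam"):
    """Return (accuracy, false_positive_rate) for a binary classifier.

    A false positive here is a non-spam message predicted as spam.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR is the share of legitimate mail that was wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr

# Hypothetical labels for five messages.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham"]
print(accuracy_and_fpr(y_true, y_pred))
```

In this toy case three of five messages are classified correctly (accuracy 0.6) and one of three legitimate messages is flagged (false positive rate of one third). Reporting both numbers matters because a classifier can reach high accuracy while still discarding legitimate mail.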
research in adaptive and convergent systems | 2013
Anuj Sharma; Shubhamoy Dey
The opinionated text available on the Internet and Web 2.0 social media has created ample research opportunities related to mining and analyzing public sentiments. At the same time, the large volume of such data poses severe challenges for data processing and sentiment extraction. Various contemporary solutions based on machine learning, dictionary, statistical, and semantic approaches have been proposed in the literature for sentiment analysis of online user-generated data. Recent research studies have shown that supervised machine learning techniques like Naive Bayes (NB) and Support Vector Machines (SVM) are very effective for sentiment-based classification of opinionated text. This paper proposes a hybrid sentiment classification model based on Boosted SVM. The proposed model exploits the classification performance of two techniques (Boosting and SVM) applied to the task of sentiment-based classification of online reviews. The results on movie and hotel review corpora of 2000 reviews show that the proposed approach improves the performance of SVM when it is used as a weak learner for sentiment-based classification. Specifically, the results show that an SVM ensemble with bagging or boosting significantly outperforms a single SVM in terms of accuracy of sentiment-based classification.
research in applied computation symposium | 2012
Anuj Sharma; Shubhamoy Dey
The Internet and Web 2.0 social media have emerged as an important medium for expressing sentiments, opinions, evaluations, and reviews. Sentiment analysis or opinion mining is becoming an open research domain due to the abundance of discussion forums, Weblogs, e-commerce portals, social networking and content sharing sites where people tend to express their opinions. Sentiment analysis involves classifying text documents as positive or negative based on the opinion they express about a given topic. This paper proposes a sentiment classification model using a back-propagation artificial neural network (BPANN). Information Gain and three popular sentiment lexicons are used to extract sentiment-representing features that are then used to train and test the BPANN. This novel approach combines the strength of BPANN in classification accuracy with the intrinsic domain knowledge available in the sentiment lexicons. The results obtained on the movie review corpora show that the proposed approach reduces dimensionality while producing accurate sentiment-based classification of text.
international conference on information and communication technology | 2016
Shrawan Kumar Trivedi; Shubhamoy Dey
Classifying spam within a collection of email files is a challenging research area in the text mining domain. Machine learning based approaches have been widely tested in the literature with enormous success. For classifiers to learn well, a small number of informative features is important. This research presents a comparative study of several supervised feature selection methods: Document Frequency (DF), Chi-Squared (χ2), Information Gain (IG), Gain Ratio (GR), Relief-F (RF), and OneR (OR). Two corpora (Enron and SpamAssassin) are selected for this study, where Enron is the main corpus and SpamAssassin is used to validate the results. A Bayesian classifier is used to classify the given corpora with the features selected by the above feature selection techniques. The results of this study show that RF is the best of the compared feature selection techniques in terms of classification accuracy and false positive rate, whereas DF and χ2 were less effective. The Bayesian classifier proved its worth in this study, with good performance accuracy and few false positives.
research in adaptive and convergent systems | 2015
Swapnajit Chakraborti; Shubhamoy Dey
With the proliferation of web content, a wide range of information about companies has become publicly available online. This information consists mostly of text documents, such as news and reports, which can provide useful insight into various aspects of corporations. To extract useful information from this huge and diverse collection of texts, appropriate state-of-the-art text mining techniques are necessary. This paper describes a novel multi-document extractive text summarization technique, based on topic identification and artificial bee colony optimization, which companies can use to extract important facts from product-specific news items about their competitors and subsequently use as one of the inputs for strategic business decision making. The results presented in this paper are based on a corpus created by collecting news items about a specific consumer electronics company from authentic news sites available on the internet. The quality of the summaries generated using this approach is found to be better in many respects than that of summaries generated by a well-known benchmark summarizer called MEAD.
research in adaptive and convergent systems | 2014
Shrawan Kumar Trivedi; Shubhamoy Dey
Identification of unsolicited emails or spam in a set of email files has become a challenging area of research. A robust classifier is appraised not only by performance accuracy but also by false positive rate. Recently, evolutionary algorithms and ensemble-of-classifiers methods have gained popularity in this domain. To develop an accurate and sensitive spam classifier, this research studies evolutionary algorithm based classifiers, i.e. Genetic Algorithm (GA) and Genetic Programming (GP), along with ensemble techniques. Two publicly available datasets (Enron and SpamAssassin) are used for testing, with the most informative features selected by the Greedy Stepwise Search algorithm. Results show that without an ensemble, GA performs better than GP, but once an ensemble of many weak classifiers is developed, GP overtakes GA with significantly higher accuracy. Also, Greedy Stepwise Feature Search is found to be a strong method for feature selection in this application domain. Ensemble-based GP turns out to be good not only in terms of classification accuracy but also in terms of low false positive rates, which are considered an important criterion for building a robust spam classifier.
2014 2nd International Symposium on Computational and Business Intelligence | 2014
Swapnajit Chakraborti; Shubhamoy Dey
With the increasing adoption of the Internet and various social media technologies by companies, the web has become a rich source of information about many aspects of organizational activities. Hence one input towards gathering competitive intelligence is to mine the text sources available on the web for any particular company and use a summarized version of that information for strategic decision making. This paper discusses a methodological approach and an architecture for such a summarization system, which companies can use to gather competitor intelligence.
research challenges in information science | 2016
Swapnajit Chakraborti; Shubhamoy Dey
The proliferation of the web as an easily accessible information resource has led many corporations to gather competitor intelligence from the internet. While collecting such information from the internet is easy, collating and structuring it for business decision makers is difficult. Text clustering based topic identification techniques are expected to be very useful for such applications. Using appropriate clustering technologies, the competitor intelligence corpus gathered from the web can be divided into topical groups, which makes the analysis of this information comparatively easier for managers. This paper presents a study on the effectiveness of the standard K-means text clustering algorithm applied at multiple levels, in a top-down, divide-and-conquer fashion, on a competitor intelligence corpus created from publicly available sources on the web, such as news, blogs, and research papers. The paper also demonstrates the capability of the Multi-level K-means (ML-KM) clustering technique to determine the optimal number of clusters as part of the clustering process. The cluster validity metric used to determine cluster quality is also explained, along with other user-controlled configuration parameters. It is empirically found that the ML-KM technique also addresses one problem of stand-alone standard K-means (S-KM), namely its bias towards convex, spherical clusters, which results in bigger clusters subsuming smaller ones. This ability of ML-KM to detect smaller clusters makes it more suitable than stand-alone S-KM for clustering competitor intelligence text corpora, where niche, smaller clusters can lead to important findings. The experimental results are presented for both ML-KM and stand-alone S-KM clustering techniques, based on the competitor intelligence corpus as well as the standard Reuters corpus.
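The top-down, divide-and-conquer idea described above can be sketched as a recursive application of plain K-means. This is not the authors' ML-KM implementation: the abstract does not specify the cluster validity metric, so the sketch substitutes a hypothetical stopping rule (mean squared distance to the centroid below `max_scatter`), uses a deterministic farthest-first initialization, and works on small numeric vectors rather than text features. All names and parameters are assumptions made for illustration.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def init_centers(points, k):
    """Farthest-first initialization (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Plain Lloyd-style K-means; returns the non-empty groups."""
    centers = init_centers(points, k)
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]

def scatter(points):
    """Mean squared distance to the centroid (stand-in validity measure)."""
    c = mean(points)
    return sum(dist2(p, c) for p in points) / len(points)

def multilevel_kmeans(points, max_scatter=1.0, min_size=2, k=2):
    """Recursively split clusters until each is small or compact enough."""
    if len(points) <= max(min_size, k) or scatter(points) <= max_scatter:
        return [points]
    parts = kmeans(points, k)
    if len(parts) < 2:  # K-means could not split this cluster further
        return [points]
    leaves = []
    for part in parts:
        leaves.extend(multilevel_kmeans(part, max_scatter, min_size, k))
    return leaves

# Hypothetical data: one larger group and two small "niche" groups.
big = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
small_a = [(10.0, 10.0), (10.0, 11.0)]
small_b = [(20.0, 0.0), (21.0, 0.0)]
for cluster in multilevel_kmeans(big + small_a + small_b):
    print(cluster)
```

On this toy data the first level separates one small group from the rest, and the second level splits the remainder into the larger group and the other small group, so the number of clusters (three) emerges from the recursion rather than being fixed in advance. The result depends on the stopping threshold, which plays the role of a user-controlled configuration parameter in this sketch.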