Shrawan Kumar Trivedi
Indian Institute of Management Ahmedabad
Publication
Featured research published by Shrawan Kumar Trivedi.
computational science and engineering | 2013
Shrawan Kumar Trivedi; Shubhamoy Dey
Identification of unsolicited emails (spam) is now a well-recognized research area within text classification. A good email classifier is evaluated not only by its accuracy but also by its false positive rate. This research presents an Enhanced Genetic Programming (EGP) approach that builds an ensemble of classifiers for detecting spam. The proposed classifier is tested on the most informative features of two publicly available corpora (Enron and SpamAssassin), selected using the Greedy Stepwise search method. The proposed ensemble is then compared with several machine learning classifiers: Genetic Programming (GP), Bayesian, Naïve Bayes (NB), J48, Random Forest (RF), and SVM. The results indicate that the proposed classifier (EGP) is the best among those compared in terms of both accuracy and false positive rate.
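The two evaluation measures named in this abstract, accuracy and false positive rate, can be computed from a confusion matrix. The following is a minimal sketch, not code from the paper; the function name, the label strings and the toy predictions are illustrative assumptions.

```python
def accuracy_and_fpr(y_true, y_pred, positive="spam"):
    """Return (accuracy, false positive rate) for binary labels.

    A false positive is a legitimate (ham) message flagged as spam.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR = FP / (FP + TN): the share of ham messages wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels, invented for illustration only.
y_true = ["spam", "ham", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
```

In this toy case one of four ham messages is flagged, so a classifier can score well on accuracy while still misfiling legitimate mail, which is why the abstract reports both measures.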
international conference on information and communication technology | 2016
Shrawan Kumar Trivedi; Shubhamoy Dey
Separating spam from a collection of email files is a challenging research area in the text mining domain, and machine learning based approaches have been widely tried in the literature with considerable success. Classifiers learn well when trained on a small number of informative features. This research presents a comparative study of several supervised feature selection methods: Document Frequency (DF), Chi-Squared (χ2), Information Gain (IG), Gain Ratio (GR), ReliefF (RF), and OneR (OR). Two corpora (Enron and SpamAssassin) are selected for this study, where Enron is the main corpus and SpamAssassin is used to validate the results. A Bayesian classifier is used to classify the corpora with the features selected by the above techniques. The results show that RF is the best feature selection technique among those compared in terms of classification accuracy and false positive rate, whereas DF and χ2 were less effective. The Bayesian classifier proved its worth in this study, with good accuracy and a low false positive rate.
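As an illustration of one of the feature selection methods listed above, Information Gain scores a term by how much knowing its presence reduces the entropy of the spam/ham label. This is a minimal sketch under assumptions: documents are represented as sets of tokens, and the function names and toy messages are hypothetical, not taken from the paper.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG of a binary 'term present / absent' feature with respect to labels."""
    classes = sorted(set(labels))
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]

    def class_counts(subset):
        return [subset.count(c) for c in classes]

    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(class_counts(with_term))
        + (len(without_term) / n) * entropy(class_counts(without_term))
    )
    return entropy(class_counts(labels)) - conditional


# Toy corpus, invented for illustration only.
docs = [{"free", "offer"}, {"meeting", "agenda"},
        {"free", "prize"}, {"project", "meeting"}]
labels = ["spam", "ham", "spam", "ham"]

vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

Keeping only the top-ranked terms is the kind of feature reduction the study compares across methods; the other methods differ in the scoring function used.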
research in adaptive and convergent systems | 2014
Shrawan Kumar Trivedi; Shubhamoy Dey
Identification of unsolicited emails, or spam, in a set of email files has become a challenging area of research. A robust classifier is judged not only by its accuracy but also by its false positive rate. Recently, evolutionary algorithms and ensemble-of-classifiers methods have gained popularity in this domain. To develop an accurate and sensitive spam classifier, this research studies evolutionary algorithm based classifiers, i.e. Genetic Algorithm (GA) and Genetic Programming (GP), along with ensemble techniques. Two publicly available datasets (Enron and SpamAssassin) are used for testing, with the most informative features selected by the Greedy Stepwise Search algorithm. Results show that without an ensemble, GA performs better than GP, but once an ensemble of many weak classifiers is built, GP outperforms GA with significantly higher accuracy. Greedy Stepwise feature search is also found to be a strong method for feature selection in this application domain. Ensemble-based GP turns out to be good not only in terms of classification accuracy but also in terms of a low false positive rate, which is considered an important criterion for building a robust spam classifier.
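The idea that an ensemble of weak classifiers can beat its individual members can be illustrated with simple majority voting. This sketch is an assumption made for illustration: the abstract does not state how the GP ensemble combines its members, and the function name and toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority.

    With an even number of classifiers a tie is possible; Counter then
    returns the label that was seen first among the tied ones.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Toy example: each weak classifier is wrong on a different message.
truth = ["spam", "ham", "spam", "ham"]
weak_1 = ["ham", "ham", "spam", "ham"]    # wrong on message 0
weak_2 = ["spam", "spam", "spam", "ham"]  # wrong on message 1
weak_3 = ["spam", "ham", "spam", "spam"]  # wrong on message 3

ensemble = majority_vote([weak_1, weak_2, weak_3])
```

Because the three members make their errors on different messages, the vote corrects each of them, which is the general mechanism ensembles rely on.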
Vine | 2016
Shrawan Kumar Trivedi; Shubhamoy Dey
Purpose: Email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers so that their individual decisions can be combined by a committee selection procedure for accurate detection of spam.
Design/methodology/approach: For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results compared with those of other classifiers.
Findings: For building the proposed combined classifier, three different studies have been performed. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naive Bayes. In that study, AdaBoost has been found to be the best algorithm for performance boosting. The second study was on the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study was on combining classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naive Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent accuracy with a low false positive rate.
Research limitations/implications: This research is focused on the classification of email spam written in the English language. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. The work is restricted to email messages; other types of messages, such as short message service or multimedia messaging service, were not part of this study.
Practical implications: This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.
Originality/value: The proposed combined classifier is a novel classifier designed for accurate classification of email spam.
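The abstract names committee members and a committee president but does not spell out how their decisions are combined. The sketch below shows one plausible rule, stated here purely as an assumption: the members' decision stands when they agree, and the president decides when they disagree. The function name and the labels are hypothetical.

```python
def committee_decision(member_votes, president_vote):
    """Return the members' label if they are unanimous, else the president's.

    ASSUMPTION: this tie-breaking rule is illustrative; the abstract does not
    define the actual committee selection procedure.
    """
    if member_votes and len(set(member_votes)) == 1:
        return member_votes[0]
    return president_vote


# Members stand in for the two boosted probabilistic classifiers,
# the president for the SVM; the votes themselves are made up.
agreed = committee_decision(["spam", "spam"], president_vote="ham")
disputed = committee_decision(["spam", "ham"], president_vote="ham")
```

Under this rule the president only matters on messages where the members split, so its false positive behaviour dominates exactly the uncertain cases.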
2016 4th International Symposium on Computational and Business Intelligence (ISCBI) | 2016
Shrawan Kumar Trivedi
Email communication is a present-day necessity, but unsolicited emails hamper it. This research aims to build a spam classification model both with and without ensemble-of-classifiers methods. The goal is to distinguish ham emails from spam emails with an efficient and sensitive classification model that gives good accuracy with a low false positive rate. The Greedy Stepwise feature search method is used to find informative features in the Enron email dataset. Several machine learning classifiers are compared: Bayesian, Naïve Bayes, SVM (support vector machine), J48 (decision tree), Bayesian with AdaBoost, and Naïve Bayes with AdaBoost. The classifiers are tested and evaluated on three metrics: F-measure (accuracy), false positive rate, and training time. Taking all of these together, SVM is found to be the best classifier, with high accuracy and a low false positive rate. Its training time is high, but given its results on the other measures, this is not considered a serious drawback.
Vine | 2018
Shrawan Kumar Trivedi; Mohit Yadav
Purpose: Shopping online is a fast-growing phenomenon. A look into the rapid exponential growth of the primary players in this sector shows huge market potential for e-commerce. Given the convenience of internet shopping, e-commerce is seen as an emerging trend among consumers, specifically the younger generation (Gen Y). The popularity of e-commerce and online shopping has captured the attention of e-retailers, encouraging researchers to focus on this area. The present empirical study examines the relationship between online repurchase intention and other variables such as security, privacy concerns, trust, and ease of use (EOU), mediated by e-satisfaction.
Design/methodology/approach: A self-administered survey method is employed, and students aged between 20 and 35 at universities in northern India are selected as subjects. To test the hypotheses of this study, an online questionnaire is distributed to participants, with 309 legitimate responses received. The data is analyzed using SPSS version 20.0 and ...
Knowledge and Information Systems | 2018
Shrawan Kumar Trivedi; Shubhamoy Dey
This computational research seeks to classify unsolicited versus legitimate emails. A modified version of an existing genetic programming (GP) classifier, called modified genetic programming (MGP), is implemented to build an ensemble of classifiers to identify unsolicited emails. The proposed classifier is assessed using informative features extracted from two corpora (Enron and SpamAssassin) with the help of the greedy stepwise feature search method. A comparative study is then performed with other popular classifiers, such as Bayesian network, naïve Bayes, decision tree, random forest (RF), support vector machine (SVM), and GP. The results are validated with 20-fold cross-validation and a paired t-test. They show that the proposed classifier performs better in terms of accuracy and false-positive detection than the other machine learning classifiers tested in this study. Using different training and testing sets of email files from the Enron corpus, ensemble-based classifiers, such as boosted SVM, boosted Bayesian, boosted naïve Bayesian, RF, and the proposed MGP classifier, are tested and compared on all metrics, including training and testing time. The findings suggest that the MGP classifier with the greedy stepwise feature search method offers an improvement over alternative methods in detecting unsolicited emails.
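The validation step mentioned above, a paired t-test over cross-validation folds, compares two classifiers on the same folds by testing whether the mean per-fold difference is zero. This is a minimal sketch of the test statistic only; the function name is hypothetical and the fold accuracies are made-up numbers, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n))."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Illustrative per-fold accuracies for two classifiers (invented values).
folds_a = [0.90, 0.92, 0.91, 0.93]
folds_b = [0.88, 0.91, 0.89, 0.90]
t_value = paired_t_statistic(folds_a, folds_b)
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom; with 20 folds, as in the study, that would be 19 degrees of freedom.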
International Journal of Computer Applications | 2014
Shrawan Kumar Trivedi; Kapil Kaushik
This research examines the impact of IT investment on the cost structure of the firm. It also explores the factors related to IT investment that can directly or indirectly affect the automation of the supply chain as well as the performance of the firm. After a brief literature survey, this paper identifies several independent variables: internal and external integration, physical and information integration, use of technology, external environmental condition, internal operational characteristics, and appropriate technology. These variables have been identified as important for automating supply chain integration with the use of information technology. The paper then uses these variables (dependent and independent) to build a robust model in which the independent variables improve the automation of supply chain integration, minimizing production and coordination costs and hence improving firm performance. The presented model can be validated empirically in future work.
Keywords: Investment, Supply Chain Integration, Production cost, Transaction Cost, Coordination cost, Cost function, Firm performance.
International Journal of Computer Applications | 2013
Shrawan Kumar Trivedi; Shubhamoy Dey
Journal of Advances in Computer Networks | 2013
Shrawan Kumar Trivedi; Shubhamoy Dey
Collaboration
Indian Institute of Information Technology and Management