Sanjay Kumar Sahay
Birla Institute of Technology and Science
Publication
Featured research published by Sanjay Kumar Sahay.
International Journal of Computer Applications | 2014
Ashu Sharma; Sanjay Kumar Sahay
Malware is a major threat to the digital world and is evolving with increasing complexity. It can penetrate networks, steal confidential information from computers, bring down servers, and cripple infrastructure. To combat these threats, anti-malware tools have been developed. Existing anti-malware is mostly based on the assumption that the malware structure does not change appreciably. However, second-generation malware can create variants and hence poses a challenge to anti-malware developers. To combat the threat from second-generation malware with a low false-alarm rate, we present our survey of malware and its detection techniques.
arXiv: Information Retrieval | 2015
Rajendra Kumar Roul; Saransh Varshneya; Ashu Kalra; Sanjay Kumar Sahay
The traditional Apriori algorithm can be used to cluster web documents based on the association technique of data mining. However, this algorithm has several limitations due to its repeated database scans and its weak association rule analysis. With today's large databases, the efficiency of the traditional Apriori algorithm drops many-fold. In this paper, we propose a modified Apriori approach that cuts down the repeated database scans and improves the association analysis of the traditional Apriori algorithm to cluster web documents. We further improve those clusters by applying Fuzzy C-Means (FCM), K-Means and Vector Space Model (VSM) techniques separately. We use the Classic3 and Classic4 datasets of Cornell University, which contain more than 10,000 documents, and run both the traditional Apriori and our modified Apriori approach on them. Experimental results show that our approach outperforms the traditional Apriori algorithm in terms of database scans and association analysis.
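For orientation, the traditional Apriori procedure that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classic algorithm only, not the authors' modified approach; the function name `apriori`, the toy documents, and the support threshold are assumptions made for the example. Note the one full database scan per itemset size, which is the cost the abstract says the modified approach reduces.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items with one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One database scan per level to count candidate support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [
    {"web", "mining", "cluster"},
    {"web", "mining"},
    {"web", "cluster"},
    {"mining", "cluster"},
    {"web", "mining", "cluster"},
]
frequent_sets = apriori(docs, min_support=3)
```

Documents that share a frequent term set can then be grouped together, which is one common way association mining is turned into a clustering step.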
Monthly Notices of the Royal Astronomical Society | 2002
D. C. Srivastava; Sanjay Kumar Sahay
We have studied the problem of an all-sky search for continuous gravitational waves, particularly for sources whose waveforms are known in advance. We have analysed the number of templates required for matched-filter analysis as applicable to these sources. We have employed the concept of the fitting factor (FF), treating the source location as the parameters of the signal manifold, and have studied the matching of the signal with templates corresponding to different source locations. We have investigated the variation of FF with source location and have noticed a symmetry in the template parameters θ_T and Φ_T. It has been found that two different template values in source location, each in θ_T and Φ_T, have the same FF. We have also computed the number of templates required, assuming the noise power spectral density S_n(f) to be flat. It is observed that a higher FF requires an exponentially increasing, large number of templates.
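The fitting factor can be illustrated with a toy calculation. The sketch below assumes flat noise, as the abstract does, so the matched-filter inner product reduces to a plain dot product in the time domain. It replaces the sky-position parameters with a single hypothetical frequency parameter, so it shows only the general idea (FF is the best normalized overlap over the template bank, and a denser bank gives a higher FF), not the paper's calculation. The names `waveform`, `overlap`, and `fitting_factor` are illustrative.

```python
import math

def waveform(freq, n=1000, dt=0.001):
    """Toy template: a sinusoid sampled n times at interval dt seconds."""
    return [math.sin(2 * math.pi * freq * i * dt) for i in range(n)]

def overlap(a, b):
    """Normalized inner product <a,b> / sqrt(<a,a><b,b>) for flat noise."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fitting_factor(signal, template_params):
    """Best overlap between the signal and any template in the bank."""
    return max(overlap(signal, waveform(p)) for p in template_params)

signal = waveform(50.2)                           # hypothetical source
coarse_bank = [49.0, 50.0, 51.0]                  # 3 templates
fine_bank = [49.0 + 0.1 * i for i in range(21)]   # 21 templates

ff_coarse = fitting_factor(signal, coarse_bank)
ff_fine = fitting_factor(signal, fine_bank)
```

Here the finer bank recovers the signal almost perfectly while the coarse bank loses a noticeable fraction of the overlap, which is the trade-off between FF and template count that the abstract quantifies.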
software engineering artificial intelligence networking and parallel distributed computing | 2015
Rajendra Kumar Roul; Ashish Nanda; Viraj Patel; Sanjay Kumar Sahay
The World Wide Web serves as a huge repository of information that is highly dynamic, diverse and growing at an exponential rate. To speed up and further improve tasks such as information search, retrieval and personalization, it is important to develop techniques that classify text documents more accurately and efficiently than before. This paper is an effort in that direction: the effectiveness of Extreme Learning Machines (ELM) in the domain of text classification is studied and compared with existing techniques such as Support Vector Machines (SVM), which are currently among the most popular and effective techniques for classifying text documents. Ours is one of the few works that highlight the high performance of ELM in text classification, by implementing classifiers based on different interpretations of ELM, analyzing their performance, and studying which feature selection techniques are best suited to improve their accuracy. For our multi-class classification problem, we study a single ELM classifier based on the one-against-all scheme and a multi-layer ELM classifier inspired by deep networks, and perform extensive experiments on different datasets to demonstrate the applicability and effectiveness of our approach. Results show that ELM-based classifiers can outperform many traditional classification techniques, including state-of-the-art techniques such as SVM.
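A single-layer ELM with one-against-all targets can be sketched in a few lines. The version below is a generic textbook-style ELM under stated assumptions (random tanh hidden layer, ridge-regularized least squares for the output weights, pure-Python linear solver); it is not the classifier or the feature pipeline used in the paper, and the class name `ELM`, the helper `solve`, and the toy data are hypothetical.

```python
import math
import random

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))]
            for i in range(n)]

class ELM:
    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, x):
        # Random, untrained hidden layer with tanh activation.
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.W, self.b)]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d, L = len(X[0]), self.n_hidden
        self.classes = sorted(set(y))
        self.W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # One-against-all targets: +1 for the true class, -1 otherwise.
        T = [[1.0 if label == c else -1.0 for c in self.classes]
             for label in y]
        # Output weights from (H^T H + ridge * I) beta = H^T T.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        B = [[sum(h[i] * t[k] for h, t in zip(H, T))
              for k in range(len(self.classes))] for i in range(L)]
        self.beta = solve(A, B)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            h = self._hidden(x)
            scores = [sum(h[i] * self.beta[i][k] for i in range(len(h)))
                      for k in range(len(self.classes))]
            preds.append(self.classes[scores.index(max(scores))])
        return preds

# Hypothetical two-feature toy data standing in for document vectors.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_toy = ["a", "b", "b", "a"]
model = ELM(n_hidden=10).fit(X_toy, y_toy)
toy_predictions = model.predict(X_toy)
```

Only the output weights are fitted, in a single linear solve, which is why ELM training is typically fast compared with iteratively trained networks.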
arXiv: Cryptography and Security | 2016
Ashu Sharma; Sanjay Kumar Sahay; Abhishek Kumar
Detecting unknown malware with high accuracy is always a challenging task. In this paper we therefore study the classification of unknown malware by two methods. In the first (regular) method, similar to the approaches of other authors (Mehdi et al. Proceedings of the 11th Annual conference on Genetic and evolutionary computation 2009, Moskovitch et al. Intelligence and Security Informatics 2008, Ravi and Manoharan Int J Comput Appl 43(17):12–16 2012), we select the features by taking the whole dataset as one group. In the second method, we select the features by partitioning the dataset into file-size ranges of 5 KB. We find that the second method detects malware with ~8.7% higher accuracy than the first (regular) method.
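The difference between the two feature-selection strategies can be sketched as follows. The sample records, the opcode-frequency features, and the ranking rule (absolute gap between mean malware and mean benign frequency) are all assumptions made for illustration; the abstract does not specify the features or the selection criterion. Only the idea of grouping files into 5 KB size ranges before selecting features comes from the text.

```python
from collections import defaultdict

BIN_BYTES = 5 * 1024  # width of each file-size range (5 KB)

def partition_by_size(samples, bin_bytes=BIN_BYTES):
    """Group samples by file-size range: bin 0 is [0, 5 KB), bin 1 is [5, 10 KB), ..."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["size"] // bin_bytes].append(s)
    return dict(groups)

def top_features(samples, k):
    """Rank features by |mean value in malware - mean value in benign| (assumed rule)."""
    sums = {True: defaultdict(float), False: defaultdict(float)}
    n = {True: 0, False: 0}
    for s in samples:
        n[s["malware"]] += 1
        for op, freq in s["opcodes"].items():
            sums[s["malware"]][op] += freq
    ops = set(sums[True]) | set(sums[False])

    def gap(op):
        m = sums[True][op] / n[True] if n[True] else 0.0
        b = sums[False][op] / n[False] if n[False] else 0.0
        return abs(m - b)

    # Sort by descending gap, breaking ties alphabetically.
    return sorted(ops, key=lambda op: (-gap(op), op))[:k]

# Hypothetical samples: size in bytes, label, normalized opcode frequencies.
samples = [
    {"size": 3000, "malware": True,  "opcodes": {"mov": 0.2, "xor": 0.6}},
    {"size": 4000, "malware": False, "opcodes": {"mov": 0.5, "xor": 0.1}},
    {"size": 7000, "malware": True,  "opcodes": {"mov": 0.1, "call": 0.7}},
    {"size": 9000, "malware": False, "opcodes": {"mov": 0.6, "call": 0.6}},
]

global_features = top_features(samples, k=1)               # method 1: one group
groups = partition_by_size(samples)                        # method 2: per size range
per_group_features = {b: top_features(g, k=1) for b, g in groups.items()}
```

In this toy data the most discriminative feature differs between size ranges, which is the kind of effect that per-range selection is meant to capture.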
Transactions on Machine Learning and Artificial Intelligence | 2014
Aruna Govada; Shree Ranjani; Aditi Viswanathan; Sanjay Kumar Sahay
With data sizes constantly expanding, and with classical machine learning algorithms that analyze such data requiring ever larger amounts of computation time and storage space, the need to distribute computation and memory requirements among several computers has become apparent. Although substantial work has been done in developing distributed binary SVM algorithms and multi-class SVM algorithms individually, the field of multi-class distributed SVMs remains largely unexplored. This research proposes a novel algorithm that implements the Support Vector Machine over a multi-class dataset and is efficient in a distributed environment (here, Hadoop). The idea is to divide the dataset in half recursively and compute the optimal Support Vector Machine for each half during the training phase, much like a divide-and-conquer approach. During testing, this structure is exploited to significantly reduce the prediction time. Our algorithm has shown better computation time during the prediction phase than the traditional sequential SVM methods (One vs. One, One vs. Rest) and outperforms them as the size of the dataset grows. This approach also classifies the data with higher accuracy than the traditional multi-class algorithms.
Monthly Notices of the Royal Astronomical Society | 2002
D. C. Srivastava; Sanjay Kumar Sahay
In this paper we obtain the Fourier Transform of a continuous gravitational wave. We have analysed the data set for (i) a 1-yr observation time and (ii) an arbitrary observation time, for an arbitrary location of detector and source, taking into account the effects arising due to the rotational as well as orbital motion of the Earth. As an application of the transform we considered spin-down and N-component signal analysis.
ieee india conference | 2016
Rajendra Kumar Roul; Sanjay Kumar Sahay
Extreme learning machine (ELM) is based on single-layer feed-forward neural networks (SLFNs) and has become a rapidly developing learning technology. The recently developed multilayer form of ELM, called ML-ELM, which is based on the architecture of deep learning, has become more popular than other traditional classifiers because of important qualities such as multiple non-linear transformations of the input data, higher-level abstraction of data, learning different forms of input data, and the capability to manage huge volumes of data. In addition, ML-ELM can map the input feature vector non-linearly to an extended-dimensional feature space for better performance. This paper proposes an approach in which unsupervised and semi-supervised clustering using kMeans and seeded-kMeans is done in the ML-ELM feature space. The empirical results of the proposed approach on two benchmark datasets outperform the results of clustering done in the TF-IDF vector space. It is also observed that, in the ML-ELM feature space, the results of seeded-kMeans are better than those of traditional kMeans.
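The difference between kMeans and seeded-kMeans lies in how the initial centroids are chosen. The sketch below is a minimal illustration of the usual seeded variant, in which each initial centroid is the mean of a few labeled seed points; it works on plain 2-D points rather than in an ML-ELM feature space, and the function names and toy data are assumptions.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))

def kmeans(points, centroids, iters=20):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def seeded_kmeans(points, seeds):
    """seeds maps cluster id -> labeled points used to set the initial centroid."""
    initial = [mean(seeds[c]) for c in sorted(seeds)]
    return kmeans(points, initial)

# Hypothetical data: two well-separated groups and one labeled seed per group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = {0: [(0, 0)], 1: [(10, 10)]}
centroids, labels = seeded_kmeans(points, seeds)
```

Plain kMeans would call `kmeans` with arbitrary (for example randomly chosen) starting centroids; the seeds simply give the iteration a better-informed starting point.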
machine learning and data mining in pattern recognition | 2015
Aruna Govada; Pravin Joshi; Sahil Mittal; Sanjay Kumar Sahay
Semi-supervised learning methods have gained importance because of the large expense and time involved in having human experts label unlabeled data. The proposed hybrid approach uses SVM and Label Propagation to label the unlabeled data. In the process, at each step the SVM is trained to minimize the error and thus improve the prediction quality. Experiments are conducted using SVM and logistic regression (Logreg). Results show that SVM performs much better than Logreg. The approach is tested using 12 datasets of different sizes, ranging from the order of 1000s to the order of 10000s. Results show that the proposed approach outperforms Label Propagation by a large margin, with an F-measure almost twice as high on average. A parallel version of the proposed approach is also designed and implemented; the analysis shows that the training time decreases significantly when the parallel version is used.
international symposium on women in computing and informatics | 2015
Aruna Govada; Bhavul Gauri; Sanjay Kumar Sahay
Data mining algorithms were originally designed on the assumption that the data is available at one centralized site. These algorithms also assume that the whole dataset fits into main memory while the algorithm runs. In today's scenario, however, the data to be handled is distributed, even geographically. Bringing the data to a centralized site is a bottleneck in terms of bandwidth relative to the size of the data. In this paper, for multiclass SVM, we propose an algorithm that builds a global SVM model by merging the local SVMs using a distributed approach (DSVM). The global SVM is then communicated to each site and made available for further classification. The experimental analysis has shown promising results, with better accuracy than both the centralized and ensemble methods. The time complexity is also reduced drastically because of the parallel construction of local SVMs. The experiments are conducted on data sets of size 100s to hundreds of 100s, which also addresses the issue of scalability.