Bhogeswar Borah
Tezpur University
Publication
Featured research published by Bhogeswar Borah.
The Computer Journal | 2011
Prasanta Gogoi; Dhruba K. Bhattacharyya; Bhogeswar Borah; Jugal K. Kalita
The detection of outliers has gained considerable interest in data mining with the realization that outliers can be the key discovery to be made from very large databases. Outliers arise due to various reasons such as mechanical faults, changes in system behavior, fraudulent behavior, human error and instrument error. Indeed, for many applications the discovery of outliers leads to more interesting and useful results than the discovery of inliers. Detection of outliers can lead to identification of system faults so that administrators can take preventive measures before they escalate. It is possible that anomaly detection may enable detection of new attacks. Outlier detection is an important anomaly detection approach. In this paper, we present a comprehensive survey of well-known distance-based, density-based and other techniques for outlier detection and compare them. We provide definitions of outliers and discuss their detection based on supervised and unsupervised learning in the context of network anomaly detection.
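As a rough illustration of the distance-based family mentioned in this abstract, the sketch below scores each point by the distance to its k-th nearest neighbour, so that isolated points receive large scores. It is a generic textbook-style example, not a method taken from the survey; the function name `knn_outlier_scores`, the parameter `k` and the sample points are assumptions made for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores suggest more isolated points. This brute-force version
    compares every pair of points, so it costs O(n^2) distance computations.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)

# Index of the point with the largest score, i.e. the strongest outlier candidate.
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

In practice a threshold or a top-n cut on the scores decides which points are reported as outliers; that choice is application dependent.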
Journal of Computers | 2008
Bhogeswar Borah; Dhruba K. Bhattacharyya
Finding clusters with widely differing sizes, shapes and densities in the presence of noise and outliers is a challenging job. DBSCAN is a versatile clustering algorithm that can find clusters with differing sizes and shapes in databases containing noise and outliers, but it cannot find clusters based on differences in density. We extend the DBSCAN algorithm so that it can also detect clusters that differ in density. Local densities within a cluster are reasonably homogeneous. Adjacent regions are separated into different clusters if there is a significant change in density. Thus the algorithm attempts to find density-based natural clusters that may not be separated by any sparse region. The computational complexity of the algorithm is O(n log n).
International Journal of Approximate Reasoning | 2014
Pritpal Singh; Bhogeswar Borah
In real-time settings, one observation always depends on several other observations. To improve forecasting accuracy, all these observations can be incorporated into forecasting models. Therefore, in this study, we introduce a new Type-2 fuzzy time series model that can utilize more observations in forecasting. This Type-2 model is then enhanced by employing the particle swarm optimization (PSO) technique. The main motive for using PSO with the Type-2 model is to adjust the lengths of the intervals in the universe of discourse that are employed in forecasting, without increasing the number of intervals. The daily stock index price data set of SBI (State Bank of India) is used to evaluate the performance of the proposed model. The proposed model is also validated by forecasting the daily stock index price of Google. Our experimental results demonstrate the effectiveness and robustness of the proposed model in comparison with existing fuzzy time series models and conventional time series models.
Engineering Applications of Artificial Intelligence | 2013
Pritpal Singh; Bhogeswar Borah
In this paper, we present a new model to handle four major issues of fuzzy time series forecasting, viz., determination of the effective length of intervals, handling of fuzzy logical relationships (FLRs), determination of the weight for each FLR, and defuzzification of fuzzified time series values. To resolve the problem associated with determining the length of intervals, this study suggests a new time series data discretization technique. After generating the intervals, the historical time series data set is fuzzified based on fuzzy time series theory. The fuzzified time series values are then used to create the FLRs. Most of the existing fuzzy time series models simply ignore the repeated FLRs without any proper justification. However, FLRs represent the patterns of historical events and reflect the possibility that such patterns will appear in the future, so simply discarding the repeated FLRs may lose information. Therefore, in this model, it is recommended to consider the repeated FLRs during forecasting. It is also suggested to assign weights to the FLRs based on their severity rather than their patterns of occurrence. For this purpose, a new technique is incorporated in the model. This technique determines the weight for each FLR based on the index of the fuzzy set associated with the current state of the FLR. To handle these weighted FLRs and to obtain the forecasted results, this study proposes a new defuzzification technique. The proposed model is verified and validated with three different time series data sets. Empirical analyses indicate that the proposed model is robust and handles one-factor time series data sets more efficiently than conventional fuzzy time series models. Experimental results show that the proposed model also outperforms conventional statistical models.
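The first steps described in this abstract (partitioning into intervals, fuzzification, and building FLRs while keeping repeats) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses equal-width intervals rather than the paper's own discretization technique, it stops before the weighting and defuzzification steps, and the names `fuzzify` and `build_flrs` as well as the sample series are hypothetical.

```python
from collections import Counter


def fuzzify(series, lower, upper, n_intervals):
    """Map each value to the 1-based index of the interval that contains it.

    Assumption: the universe of discourse [lower, upper] is split into
    equal-width intervals, and fuzzy set A_i corresponds to interval i.
    """
    width = (upper - lower) / n_intervals
    return [min(int((x - lower) / width), n_intervals - 1) + 1 for x in series]


def build_flrs(labels):
    """Count first-order FLRs A_i -> A_j, keeping repeated relationships."""
    return Counter(zip(labels, labels[1:]))


# Hypothetical one-factor time series.
series = [12, 18, 27, 19, 28, 35, 27, 38]
labels = fuzzify(series, lower=10, upper=40, n_intervals=3)
flrs = build_flrs(labels)

for (current, nxt), count in sorted(flrs.items()):
    print(f"A{current} -> A{nxt} occurs {count} time(s)")
```

Keeping the counts in a `Counter` preserves the repeated FLRs that, according to the abstract, many existing models discard.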
The Computer Journal | 2014
Prasanta Gogoi; D. K. Bhattacharyya; Bhogeswar Borah; Jugal K. Kalita
With the growth of networked computers and associated applications, intrusion detection has become essential to keeping networks secure. A number of intrusion detection methods have been developed for protecting computers and networks using conventional statistical methods as well as data mining methods. Data mining methods for misuse and anomaly-based intrusion detection usually encompass supervised, unsupervised and outlier methods. It is necessary that the capabilities of intrusion detection methods be updated with the creation of new attacks. This paper proposes a multi-level hybrid intrusion detection method that uses a combination of supervised, unsupervised and outlier-based methods for improving the efficiency of detection of new and old attacks. The method is evaluated with a captured real-time flow and packet dataset called the Tezpur University intrusion detection system (TUIDS) dataset, a distributed denial of service dataset, the benchmark intrusion dataset called the Knowledge Discovery and Data Mining (KDD) Cup 1999 dataset, and the new version of KDD (NSL-KDD) dataset. Experimental results are compared with existing multi-level intrusion detection methods and other classifiers. The performance of our method is very good.
International Conference on Signal Processing | 2007
Bhogeswar Borah; Dhruba K. Bhattacharyya
Finding clusters with widely differing sizes, shapes and densities in the presence of noise and outliers is a challenging job. The DBSCAN algorithm is a versatile clustering algorithm that can find clusters with differing sizes and shapes in databases containing noise and outliers, but it cannot find clusters with different densities. We extend the DBSCAN algorithm so that it can also detect clusters that differ in density. While expanding a cluster, local density is taken into consideration. Starting with a core object, a cluster is extended by expanding only those density-connected core objects whose neighbourhood sizes are within certain ranges as determined by their neighbours already existing in the cluster. Our algorithm detects clusters even if they are not separated by sparse regions. The computational complexity of the modified algorithm, O(n log n), remains the same as that of the original DBSCAN.
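The idea of expanding a cluster only through core objects of similar local density can be sketched roughly as below. This is not the authors' algorithm: the acceptance rule (the ratio of the two neighbourhood sizes must not exceed `max_ratio`) is a simplified stand-in for the ranges mentioned in the abstract, and the function names and sample data are hypothetical. The brute-force neighbourhood search is O(n^2); an O(n log n) bound would require a spatial index.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def density_aware_dbscan(points, eps, min_pts, max_ratio=2.0):
    """DBSCAN-style clustering with an extra density-similarity check.

    Assumption: a core point j reached from core point c joins and extends
    the cluster only if their neighbourhood sizes differ by at most a factor
    of max_ratio. Returns one label per point (NOISE for noise).
    """
    labels = [None] * len(points)
    neigh = [region_query(points, i, eps) for i in range(len(points))]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        frontier = [i]
        while frontier:
            c = frontier.pop()
            for j in neigh[c]:
                if labels[j] is not None and labels[j] != NOISE:
                    continue  # already assigned to a cluster
                size_c, size_j = len(neigh[c]), len(neigh[j])
                if size_j < min_pts:
                    labels[j] = cluster  # border point: joins, never expands
                elif max(size_c, size_j) / min(size_c, size_j) <= max_ratio:
                    labels[j] = cluster  # similar density: join and expand
                    frontier.append(j)
                # otherwise j is left to seed a cluster of its own density
        cluster += 1
    return labels


# Hypothetical 1-D data: a dense run touching a sparse run with no gap wider than eps.
dense = [(i / 10,) for i in range(11)]        # 0.0, 0.1, ..., 1.0
sparse = [(1.5 + 0.5 * k,) for k in range(8)]  # 1.5, 2.0, ..., 5.0
points = dense + sparse

labels = density_aware_dbscan(points, eps=0.55, min_pts=3, max_ratio=2.0)
merged = density_aware_dbscan(points, eps=0.55, min_pts=3, max_ratio=float("inf"))
```

With `max_ratio=float("inf")` the density check is disabled and the procedure behaves like ordinary DBSCAN, which merges the two runs into one cluster; with the check enabled they are separated even though no sparse region lies between them.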
Knowledge and Information Systems | 2014
Pritpal Singh; Bhogeswar Borah
Fuzzy time series forecasting methods have been applied in several domains, such as stock market prices, temperature, sales, crop production and academic enrollments. In this paper, we introduce a model to deal with forecasting problems of two factors. The proposed model is designed using fuzzy time series and an artificial neural network. In a fuzzy time series forecasting model, the length of intervals in the universe of discourse always affects the results of forecasting. Therefore, an artificial neural network-based technique is employed for determining the intervals of the historical time series data sets by clustering them into different groups. The historical time series data sets are then fuzzified, and the high-order fuzzy logical relationships are established among fuzzified values based on the fuzzy time series method. The paper also introduces some rules for interval weighing to defuzzify the fuzzified time series data sets. From experimental results, it is observed that the proposed model exhibits higher accuracy than existing two-factor fuzzy time series models.
2015 International Symposium on Advanced Computing and Communication (ISACC) | 2015
Nazreena Rahman; Bhogeswar Borah
Condensing document information according to a query has become a pressing concern owing to the rapid growth of information. There is usually not enough time to scan through the contents, understand each document and make decisions based on the queries. Hence, there is a great demand for query-based summarization of text documents. Text summarization is considered a challenging task and an important research area these days, and query-based text summarization in particular is a significant problem with many applications. It is increasingly important to study extensively the different mechanisms that can provide an effective and short summary. Therefore, in this paper, an effort has been made to survey the various extractive approaches to query-based text summarization, so that the time and effort needed to find a summary that meets users' needs can be reduced.
International Conference on Advanced Computing | 2007
Bhogeswar Borah; Dhruba K. Bhattacharyya
Biclustering algorithms simultaneously cluster both rows and columns. Algorithms of this type are applied to gene expression data analysis to find a subset of genes that exhibit similar expression patterns under a subset of conditions. Cheng and Church introduced the mean squared residue measure to capture the coherence of a subset of genes over a subset of conditions. They provided a set of heuristic algorithms based primarily on node deletion to find one bicluster or a set of biclusters after masking discovered biclusters with random values. Masking of discovered biclusters with random values interferes with the discovery of high-quality biclusters. We provide an efficient node addition algorithm to find a set of biclusters without the need to mask discovered biclusters. Initialized with a gene and a subset of conditions, a bicluster is extended by adding more genes and conditions. Thus it provides a facility to study individual genes, besides generating a large number of biclusters with different initializations. Biclusters with lower or higher scores within a specified limit can be generated by parameter setting. The use of an incremental method of computing the score makes the algorithm faster.
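For reference, the mean squared residue of Cheng and Church mentioned in this abstract can be computed as in the sketch below: each entry's residue is its value minus its row mean, minus its column mean, plus the overall mean of the submatrix, and the score is the average squared residue. The sketch recomputes the score from scratch and does not reproduce the incremental computation or the node addition procedure of the paper; the function name and the sample matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    A score of 0 means the selected genes (rows) vary perfectly coherently
    across the selected conditions (cols) in the additive sense.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_mean = [sum(r) / n_cols for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    all_mean = sum(row_mean) / n_rows
    total = sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: rows are genes, columns are conditions.
data = [
    [1, 2, 3, 7],
    [2, 3, 4, 1],
    [4, 5, 6, 9],
    [9, 0, 5, 2],
]

coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
with_extra_gene = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2])
```

A node addition scheme in the spirit of the abstract would accept a candidate gene or condition only if the resulting score stays within a chosen limit; here the first three genes shift together across the first three conditions, while adding the fourth gene raises the score.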
Pattern Recognition and Machine Intelligence | 2005
Bhogeswar Borah; Dhruba K. Bhattacharyya
This paper presents an efficient content-based image retrieval technique that uses a segmentation approach and considers the global distribution of color. To cope with significant appearance changes, the method uses a global size and shape histogram to represent the image regions obtained after segmenting the image based on color similarity. The indexing technique is found to compare favorably with its counterparts, such as the moment-based method [12], owing to its transformation invariance and effective retrieval performance over several application domains.