Jugal K. Kalita
University of Colorado Colorado Springs
Publications
Featured research published by Jugal K. Kalita.
IEEE Communications Surveys and Tutorials | 2014
Monowar H. Bhuyan; Dhruba K. Bhattacharyya; Jugal K. Kalita
Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. In this paper, we provide a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can quickly become familiar with every aspect of network anomaly detection. We present attacks normally encountered by network intrusion detection systems. We categorize existing network anomaly detection methods and systems based on the underlying computational techniques used. Within this framework, we briefly describe and compare a large number of network anomaly detection methods and systems. In addition, we discuss tools that can be used by network defenders and datasets that researchers in network anomaly detection can use. Finally, we highlight research directions in network anomaly detection.
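As a rough illustration of the simplest family of techniques such surveys cover, the sketch below flags traffic windows whose packet counts deviate sharply from a baseline. The monitored feature (packets per time window), the attack-free baseline, and the 3-sigma threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a statistical, threshold-based network anomaly detector.
# Assumptions (not from the paper): the feature is packets per time window,
# and the baseline is fitted on attack-free traffic.
from statistics import mean, stdev

def fit_baseline(train_counts):
    """Estimate mean and standard deviation from attack-free traffic."""
    return mean(train_counts), stdev(train_counts)

def is_anomalous(count, mu, sigma, k=3.0):
    """Flag a window whose count deviates from the baseline by more than k sigma."""
    return abs(count - mu) > k * sigma

baseline = [120, 130, 118, 125, 122, 127, 119, 124]   # packets per window, normal traffic
mu, sigma = fit_baseline(baseline)
print(is_anomalous(4000, mu, sigma))   # True: a sudden flood-like spike
print(is_anomalous(126, mu, sigma))    # False: ordinary fluctuation
```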
International Conference on Social Computing | 2010
Beaux Sharifi; Mark-Anthony Hutton; Jugal K. Kalita
This paper presents algorithms for summarizing microblog posts. In particular, our algorithms process collections of short posts on specific topics on the well-known site called Twitter and create short summaries from these collections of posts on a specific topic. The goal is to produce summaries that are similar to what a human would produce for the same collection of posts on a specific topic. We evaluate the summaries produced by the summarizing algorithms, compare them with human-produced summaries and obtain excellent results.

I. INTRODUCTION. Twitter, the microblogging site started in 2006, has become a social phenomenon, with more than 20 million visitors each month. While the majority of posts are conversational or not very meaningful, about 3.6% of the posts concern topics of mainstream news. At the end of 2009, Twitter had 75 million account holders, of which about 20% are active. There are approximately 2.5 million Twitter posts per day. To help people who read Twitter posts or tweets, Twitter provides a short list of popular topics called
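As a rough, hypothetical illustration of extractive microblog summarization (not the specific algorithms proposed in the paper), the sketch below picks the single post whose words best cover the collection's frequent terms; the stopword list is an assumption.

```python
# Hypothetical frequency-based extractive summarizer for a topic's tweets:
# return the single post whose distinct words have the highest total
# frequency across the collection. Illustration only.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "for", "rt"}

def tokenize(post):
    return [w for w in re.findall(r"[a-z']+", post.lower()) if w not in STOPWORDS]

def summarize(posts):
    """Pick the post with the highest total frequency of its distinct words."""
    freq = Counter(w for p in posts for w in tokenize(p))
    return max(posts, key=lambda p: sum(freq[w] for w in set(tokenize(p))))

posts = [
    "Apple unveils the new iPhone at its press event",
    "can't wait for the new iPhone!!",
    "Apple press event: new iPhone unveiled today",
]
print(summarize(posts))   # -> "Apple unveils the new iPhone at its press event"
```

This unnormalized score favors longer posts; a practical summarizer would also control for length and redundancy.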
Privacy, Security, Risk and Trust | 2011
David I. Inouye; Jugal K. Kalita
Due to the sheer volume of text generated by a microblog site like Twitter, it is often difficult to fully understand what is being said about various topics. In an attempt to understand microblogs better, this paper compares algorithms for extractive summarization of microblog posts. We present two algorithms that produce summaries by selecting several posts from a given set. We evaluate the generated summaries by comparing them to both manually produced summaries and summaries produced by several leading traditional summarization systems. In order to shed light on the special nature of Twitter posts, we include extensive analysis of our results, some of which are unexpected.
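In the same hedged spirit, a multi-post extractive summarizer can greedily pick k posts that add frequent terms not yet covered while penalizing overlap with posts already chosen. This maximal-marginal-relevance-style heuristic is only an illustration, not either of the two algorithms compared in the paper, and the penalty weight is an assumption.

```python
# Hypothetical multi-post extractive summarizer: greedily select k posts that
# cover frequent terms, discounting terms already covered by chosen posts.
from collections import Counter

def select_posts(posts, k=2, redundancy_penalty=0.7):
    freq = Counter(w for p in posts for w in p.lower().split())
    chosen, covered = [], set()
    for _ in range(min(k, len(posts))):
        def gain(post):
            words = set(post.lower().split())
            relevance = sum(freq[w] for w in words)
            overlap = sum(freq[w] for w in words & covered)
            return relevance - redundancy_penalty * overlap
        best = max((p for p in posts if p not in chosen), key=gain)
        chosen.append(best)
        covered |= set(best.lower().split())
    return chosen

print(select_posts([
    "earthquake reported near the coast this morning",
    "earthquake near the coast felt across the city",
    "tsunami warning issued after the coastal earthquake",
], k=2))
```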
The Computer Journal | 2011
Prasanta Gogoi; Dhruba K. Bhattacharyya; Bhogeswar Borah; Jugal K. Kalita
The detection of outliers has gained considerable interest in data mining with the realization that outliers can be the key discovery to be made from very large databases. Outliers arise due to various reasons such as mechanical faults, changes in system behavior, fraudulent behavior, human error and instrument error. Indeed, for many applications the discovery of outliers leads to more interesting and useful results than the discovery of inliers. Detection of outliers can lead to identification of system faults so that administrators can take preventive measures before they escalate. Anomaly detection may also enable the detection of new attacks. Outlier detection is an important anomaly detection approach. In this paper, we present a comprehensive survey of well-known distance-based, density-based and other techniques for outlier detection and compare them. We provide definitions of outliers and discuss their detection based on supervised and unsupervised learning in the context of network anomaly detection.
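A minimal sketch of a distance-based detector of the kind the survey compares: score each point by the distance to its k-th nearest neighbour and treat the highest-scoring points as outliers. The brute-force implementation and the choice k=2 are illustrative assumptions.

```python
# Sketch of a k-NN distance-based outlier score: isolated points have large
# distances to their k-th nearest neighbour.
import math

def knn_outlier_scores(points, k=2):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])   # distance to the k-th nearest neighbour
    return scores

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))   # -> 4, the isolated point
```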
Web Based Communities | 2013
James Benhardus; Jugal K. Kalita
As social media continue to grow, the zeitgeist of society is increasingly found not in the headlines of traditional media institutions, but in the activity of ordinary individuals. The identification of trending topics utilises social media (such as Twitter) to provide an overview of the topics and issues that are currently popular within the online community. In this paper, we outline methodologies for detecting and identifying trending topics from streaming data. Data from Twitter's streaming API was collected and put into documents of equal duration using data collection procedures that allow for analysis over multiple timespans, including those not currently associated with Twitter-identified trending topics. Term frequency-inverse document frequency analysis and relative normalised term frequency analysis were performed on the documents to identify the trending topics. Relative normalised term frequency analysis identified unigrams, bigrams, and trigrams as trending topics, while term frequency-inverse document frequency analysis identified unigrams as trending topics. Application of these methodologies to streaming data resulted in F-measures ranging from 0.1468 to 0.7508.
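A hedged sketch of relative normalised term frequency for trend detection: a term is flagged as trending when its rate in the current time window far exceeds its smoothed rate in earlier baseline windows. The threshold, minimum count and add-one smoothing are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of relative normalised term frequency over equal-duration windows.
from collections import Counter

def trending_terms(current_window, baseline_windows, ratio_threshold=5.0, min_count=3):
    """current_window: list of tokens; baseline_windows: list of token lists."""
    cur = Counter(current_window)
    cur_total = sum(cur.values()) or 1
    base = Counter(w for window in baseline_windows for w in window)
    base_total = sum(base.values()) or 1
    trending = []
    for term, count in cur.items():
        if count < min_count:
            continue
        cur_rate = count / cur_total
        base_rate = (base[term] + 1) / base_total   # add-one smoothing for unseen terms
        if cur_rate / base_rate >= ratio_threshold:
            trending.append(term)
    return trending

baseline = [["weather", "traffic", "coffee"] * 30] * 5
current = ["earthquake"] * 12 + ["weather", "coffee"] * 10
print(trending_terms(current, baseline))   # -> ['earthquake']
```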
Conference on Information and Knowledge Management | 2001
Aleksander Kolcz; Vidya Prabakarmurthi; Jugal K. Kalita
We address the problem of evaluating the effectiveness of summarization techniques for the task of document categorization. It is argued that for a large class of automatic categorization algorithms, extraction-based document categorization can be viewed as a particular form of feature selection performed on the full text of the document and, in this context, its impact can be compared with state-of-the-art feature selection techniques especially devised to provide good categorization performance. Such a framework provides for a better assessment of the expected performance of a categorizer if the compression rate of the summarizer is known.
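The framing can be made concrete with a small sketch: an extraction-based summary and a conventional feature selector both restrict the bag-of-words the categorizer sees, so their effect on categorization can be compared on equal terms. The selection rules below (lead sentences versus a precomputed term list) are illustrative stand-ins, not the techniques evaluated in the paper.

```python
# Two ways to reduce a document's features before categorization: keep the
# terms of an extracted summary, or keep a preselected term list.
from collections import Counter

def bag_of_words(text):
    return Counter(w.strip(".,!?") for w in text.lower().split())

def features_via_summary(document, n_sentences=2):
    """'Summarize' by keeping the lead sentences, then use only their terms."""
    summary = ". ".join(document.split(". ")[:n_sentences])
    return bag_of_words(summary)

def features_via_selection(document, selected_terms):
    """Keep only terms chosen beforehand by a corpus-level feature selector."""
    return Counter({t: c for t, c in bag_of_words(document).items() if t in selected_terms})

doc = "Markets rallied today. Tech stocks led the gains. Weather was mild in the region."
print(features_via_summary(doc))
print(features_via_selection(doc, {"markets", "stocks", "gains"}))
```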
The Computer Journal | 2014
Monowar H. Bhuyan; Hirak Kashyap; Dhruba K. Bhattacharyya; Jugal K. Kalita
The minimal processing and best-effort forwarding of any packet, malicious or not, was the prime concern when the Internet was designed. This architecture creates an unregulated network path, which can be exploited by any cyber attacker motivated by revenge, prestige, politics or money. Denial-of-service (DoS) attacks exploit this to target critical Web services [1, 2, 3, 4, 5]. This type of attack is intended to make a computer resource unavailable to its legitimate users. Denial-of-service attack programs have been around for many years. Old single-source attacks are now countered easily by many defense mechanisms, and the sources of these attacks can be easily rebuffed or shut down with improved tracking capabilities. However, with the astounding growth of the Internet during the last decade, an increasingly large number of vulnerable systems are now available to attackers. Attackers can now employ a large number of these vulnerable hosts to launch an attack instead of using a single server, an approach which is not very effective and is detected easily. A distributed denial-of-service (DDoS) attack [1, 6] is a large-scale, coordinated attack on the availability of services of a victim system or network resources, launched indirectly through many compromised computers on the Internet.

The first well-documented DDoS attack appears to have occurred in August 1999, when a DDoS tool called Trinoo was deployed in at least 227 systems to flood a single University of Minnesota computer, which was knocked down for more than two days. The first large-scale DDoS attack took place in February 2000. On February 7, Yahoo! was the victim of a DDoS attack during which its Internet portal was inaccessible for three hours. On February 8, Amazon, Buy.com, CNN and eBay were all hit by DDoS attacks that caused them to either stop functioning completely or slow down significantly.

DDoS attack networks follow two types of architectures: the Agent-Handler architecture and the Internet Relay Chat (IRC)-based architecture, as discussed by [7]. The Agent-Handler architecture for DDoS attacks is comprised of clients, handlers and agents (see Figure 6). The attacker communicates with the rest of the DDoS attack system at the client systems. The handlers are often software packages located throughout the Internet that are used by the client to communicate with the agents. Instances of the agent software are placed in the compromised systems that finally carry out the attack. The owners and users of the agent systems are generally unaware of the situation. In the IRC-based DDoS attack architecture, an IRC communication channel is used to connect the client(s) to the agents. IRC
Expert Systems With Applications | 2014
Nazrul Hoque; Dhruba K. Bhattacharyya; Jugal K. Kalita
Feature selection is used to choose a subset of relevant features for effective classification of data. In high-dimensional data classification, the performance of a classifier often depends on the feature subset used for classification. In this paper, we introduce a greedy feature selection method using mutual information. This method combines both feature–feature mutual information and feature–class mutual information to find an optimal subset of features that minimizes redundancy and maximizes relevance among features. The effectiveness of the selected feature subset is evaluated using multiple classifiers on multiple datasets. The performance of our method, in terms of both classification accuracy and execution time, has been found to be significantly better on twelve real-life datasets of varied dimensionality and number of instances when compared with several competing feature selection techniques.
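A hedged sketch of the kind of greedy criterion the abstract describes: at each step, select the feature whose feature-class mutual information (relevance) minus its average feature-feature mutual information with already-selected features (redundancy) is largest. The discretized inputs and the exact relevance-redundancy combination are assumptions; the paper's criterion may differ in detail.

```python
# Greedy mutual-information-based feature selection (mRMR-style), assuming
# discrete feature values and class labels.
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information between two discrete sequences, in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def greedy_mi_selection(features, labels, k):
    """features: dict name -> list of discrete values; labels: list of classes."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected

features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],   # duplicate of f1, so redundant
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],   # independent of f1, still informative
}
labels = [0, 0, 1, 1, 1, 1, 1, 1]     # roughly f1 OR f3
print(greedy_mi_selection(features, labels, k=2))   # the redundant duplicate is skipped
```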
Bioinformatics | 2014
Connor Clark; Jugal K. Kalita
MOTIVATION: As biological inquiry produces ever more network data, such as protein-protein interaction networks, gene regulatory networks and metabolic networks, many algorithms have been proposed for the purpose of pairwise network alignment: finding a mapping from the nodes of one network to the nodes of another in such a way that the mapped nodes can be considered to correspond with respect to both their place in the network topology and their biological attributes. This technique is helpful in identifying previously undiscovered homologies between proteins of different species and revealing functionally similar subnetworks. In the past few years, a wealth of different aligners has been published, but few of them have been compared with one another, and no comprehensive review of these algorithms has yet appeared.

RESULTS: We present the problem of biological network alignment, provide a guide to existing alignment algorithms and comprehensively benchmark existing algorithms on both synthetic and real-world biological data, finding dramatic differences between existing algorithms in the quality of the alignments they produce. Additionally, we find that many of these tools are inconvenient to use in practice, and there remains a need for easy-to-use cross-platform tools for performing network alignment.
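To show the shape of the problem rather than any benchmarked aligner, the sketch below builds an injective node mapping greedily, matching nodes of one network to unused nodes of the other by degree similarity only. Real aligners combine topology and biological attributes far more carefully; everything here is an illustrative assumption.

```python
# Toy pairwise network alignment: map each node of network A to the unused
# node of network B whose degree is closest, aligning hubs first.
def greedy_align(adj_a, adj_b):
    """adj_a, adj_b: dicts mapping node -> set of neighbour nodes."""
    deg_a = {n: len(nbrs) for n, nbrs in adj_a.items()}
    deg_b = {n: len(nbrs) for n, nbrs in adj_b.items()}
    mapping, used = {}, set()
    for a in sorted(deg_a, key=deg_a.get, reverse=True):   # high-degree nodes first
        candidates = [b for b in deg_b if b not in used]
        if not candidates:
            break
        b = min(candidates, key=lambda cand: abs(deg_a[a] - deg_b[cand]))
        mapping[a] = b
        used.add(b)
    return mapping

A = {"p1": {"p2", "p3"}, "p2": {"p1"}, "p3": {"p1"}}
B = {"q1": {"q2", "q3"}, "q2": {"q1"}, "q3": {"q1"}}
print(greedy_align(A, B))   # e.g. {'p1': 'q1', 'p2': 'q2', 'p3': 'q3'}
```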
Journal of Network and Computer Applications | 2014
Nazrul Hoque; Monowar H. Bhuyan; Ram Charan Baishya; D. K. Bhattacharyya; Jugal K. Kalita
To prevent attacks and defend networks against them, it is essential to have broad knowledge of the existing tools and systems available in the public domain. Based on their behavior and the possible impact or severity of the damage they cause, attacks are categorized into a number of distinct classes. In this survey, we provide a taxonomy of attack tools in a consistent way for the benefit of network security researchers. This paper also presents a comprehensive and structured survey of existing tools and systems that can support both attackers and network defenders. We discuss the pros and cons of such tools and systems for a better understanding of their capabilities. Finally, based on our hands-on experience, we include a list of observations and some research challenges that may help new researchers in this field.