Prakash Mandayam Comar

Publication

Featured research published by Prakash Mandayam Comar.

International Conference on Computer Communications | 2013

Combining supervised and unsupervised learning for zero-day malware detection

Prakash Mandayam Comar; Lei Liu; Sabyasachi Saha; Pang Ning Tan; Antonio Nucci

Malware is one of the most damaging security threats facing the Internet today. Despite the burgeoning literature, accurate detection of malware remains an elusive and challenging endeavor due to the increasing usage of payload encryption and sophisticated obfuscation methods. Also, the large variety of malware classes, coupled with their rapid proliferation, polymorphic capabilities, and the imperfections of real-world data (noise, missing values, etc.), continues to hinder the use of more sophisticated detection algorithms. This paper presents a novel machine learning based framework to detect known and newly emerging malware with high precision using layer 3 and layer 4 network traffic features. The framework combines the accuracy of supervised classification in detecting known classes with the adaptability of unsupervised learning in detecting new classes. It also introduces a tree-based feature transformation to overcome issues due to imperfections of the data and to construct more informative features for the malware detection task. We demonstrate the effectiveness of the framework using real network data from a large Internet service provider.

Neurocomputing | 2012

A framework for joint community detection across multiple related networks

Prakash Mandayam Comar; Pang Ning Tan; Anil K. Jain

Community detection in networks is an active area of research with many practical applications. However, most of the early work in this area has focused on partitioning a single network or a bipartite graph into clusters/communities. With the rapid proliferation of online social media, it has become increasingly common for web users to have a noticeable presence across multiple web sites. This raises the question of whether it is possible to combine information from several networks to improve community detection. In this paper, we present a framework that identifies communities simultaneously across different networks and learns the correspondences between them. The framework is applicable to networks generated from multiple web sites as well as to those derived from heterogeneous nodes of the same web site. It also allows the incorporation of prior information about the potential relationships between the communities in different networks. Extensive experiments have been performed on both synthetic and real-life data sets to evaluate the effectiveness of our framework. Our results show superior performance of simultaneous community detection over three alternative methods, including normalized cut and matrix factorization on a single network or a bipartite graph.

International Conference on Data Mining | 2011

LinkBoost: A Novel Cost-Sensitive Boosting Framework for Community-Level Network Link Prediction

Prakash Mandayam Comar; Pang Ning Tan; Anil K. Jain

Link prediction is a challenging task due to the inherent skewness of network data. Typical link prediction methods can be categorized as either local or global. Local methods consider the link structure in the immediate neighborhood of a node pair to determine the presence or absence of a link, whereas global methods utilize information from the whole network. This paper presents a community (cluster) level link prediction method without the need to explicitly identify the communities in a network. Specifically, a variable-cost loss function is defined to address the data skewness problem. We provide theoretical proof that shows the equivalence between maximizing the well-known modularity measure used in community detection and minimizing a special case of the proposed loss function. As a result, any link prediction method designed to optimize the loss function would result in more links being predicted within a community than between communities. We design a boosting algorithm to minimize the loss function and present an approach to scale up the algorithm by decomposing the network into smaller partitions and aggregating the weak learners constructed from each partition. Experimental results show that our proposed LinkBoost algorithm consistently performs as well as or better than many existing methods when evaluated on four real-world network datasets.
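
The modularity measure referenced in this abstract can be illustrated with a short sketch. It computes the standard Newman-Girvan modularity of a given partition and is not the LinkBoost loss function or the paper's algorithm. It assumes an undirected, unweighted graph supplied as an edge list; the function name `modularity` and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition (illustrative helper).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    degree = defaultdict(int)  # node -> degree
    intra = defaultdict(int)   # community -> number of internal edges
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1

    deg_sum = defaultdict(int)  # community -> total degree of its nodes
    for node, d in degree.items():
        deg_sum[community[node]] += d

    # Q = sum over communities c of  L_c / m - (D_c / (2m))^2
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Toy graph (assumed for illustration): two triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
```

On the toy graph, splitting the nodes into the two triangles scores higher than placing all nodes in one community, which matches the intuition that links are denser within communities than between them.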

Conference on Information and Knowledge Management | 2012

Weighted linear kernel with tree transformed features for malware detection

Prakash Mandayam Comar; Lei Liu; Sabyasachi Saha; Antonio Nucci; Pang Ning Tan

Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.
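
As a rough illustration of what a weighted linear kernel looks like, the sketch below computes k(x, y) = sum_i w_i * x_i * y_i for given non-negative feature weights. It does not reproduce the paper's algorithm for choosing the weights or its tree-based feature construction; the weights, feature vectors, and the function names `weighted_linear_kernel` and `gram_matrix` are assumptions made for illustration.

```python
def weighted_linear_kernel(x, y, weights):
    """Return sum_i weights[i] * x[i] * y[i] (illustrative helper).

    Non-negative weights keep the kernel positive semi-definite.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * a * b for w, a, b in zip(weights, x, y))


def gram_matrix(rows, weights):
    """Pairwise kernel values for a list of feature vectors."""
    return [[weighted_linear_kernel(a, b, weights) for b in rows] for a in rows]


# Hypothetical binary features and hand-picked weights.
features = [[1, 0, 1], [1, 1, 1], [0, 1, 0]]
feature_weights = [0.5, 2.0, 1.0]
K = gram_matrix(features, feature_weights)
```

A matrix like `K` is the kind of precomputed similarity matrix that a kernel classifier such as a one-class support vector machine can consume in place of raw features.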

Data Mining and Knowledge Discovery | 2012

Simultaneous classification and community detection on heterogeneous network data

Prakash Mandayam Comar; Pang Ning Tan; Anil K. Jain

Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that enables one to perform classification on one network and community detection in another related network. Multi-task learning is accomplished by introducing a joint objective function that must be optimized to ensure the classes in one network are consistent with the link structure, nodal attributes, as well as the communities detected in another network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments performed on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms on each network separately.

Advances in Social Networks Analysis and Mining | 2013

Community detection by popularity based models for authored networked data

Tianbao Yang; Prakash Mandayam Comar; Linli Xu

Community detection has emerged as an attractive topic due to the increasing need to understand and manage networked data of tremendous magnitude. Networked data usually consists of links between the entities and the attributes describing the entities. Various approaches have been proposed for detecting communities by utilizing the link information and/or attribute information. In this work, we study the problem of community detection for networked data with additional authorship information. By authorship, we mean that each entity in the network is authored by entities of another type (e.g., wiki pages are edited by users, products are purchased by customers), which we refer to as authors. Communities of entities are affected by their authors, e.g., two entities that are associated with the same author tend to belong to the same community. Therefore, leveraging the authorship information would help us better detect the communities in the networked data. However, it also brings new challenges to community detection. The foremost question is how to model the correlation between communities and authorships. In this work, we address this question by proposing probabilistic models based on the popularity link model [1], which has been demonstrated to yield encouraging results for community detection. We employ two methods for modeling the authorships: (i) the first one generates the authorships independently from links, using the community memberships and popularities of authors, by analogy with the popularity link model; (ii) the second one models the links between entities based on authorships together with community memberships and popularities of nodes, which is an analog of the previous author-topic model. Building on the basic models, we explore several extensions: (i) we model the community memberships of authors by those of their authored entities to reduce the number of redundant parameters; and (ii) we model the community memberships of entities and/or authors by their attributes using a discriminative approach. We demonstrate the effectiveness of the proposed models by empirical studies.

Conference on Information and Knowledge Management | 2010

Multi task learning on multiple related networks

Prakash Mandayam Comar; Pang Ning Tan; Anil K. Jain

With the rapid proliferation of online social networks, the need for a newer class of learning algorithms that can deal simultaneously with multiple related networks has become increasingly important. This paper proposes an approach for multi-task learning in multiple related networks, wherein we perform different tasks such as classification on one network and clustering on the other. We show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. We have performed experiments on real-world data sets to demonstrate the effectiveness of the proposed framework.

International Symposium on Neural Networks | 2016

Crowdsourcing of network data

Ding Wang; Prakash Mandayam Comar; Pang Ning Tan

A key requirement for supervised learning is the availability of a sufficient amount of labeled data to build an accurate prediction model. However, obtaining labeled data manually can be tedious and expensive. This paper examines the use of crowdsourcing technology to acquire labeled examples for classifying network data. Unfortunately, creating human intelligence tasks (HITs) to enable crowdsourcing is cumbersome for network data and may even be prohibitive for privacy reasons. To overcome this limitation, we present a novel framework called surrogate learning to transform the network data into a new representation (i.e., images) so that the labeling task can be completed even by non-domain experts. We analyze the reconstruction error of the transformation and use the theoretical insights to provide guidance on how to develop an effective surrogate learning approach for any given network and source image corpus. We also performed extensive experiments using Amazon Mechanical Turk to demonstrate the efficacy of our approach on node classification problems.

Conference on Information and Knowledge Management | 2017

Intent Based Relevance Estimation from Click Logs

Prakash Mandayam Comar; Srinivasan H. Sengamedu

Estimating the relevance of documents based on user feedback is an essential component of search, retrieval and ranking problems. User click modeling in search has focused primarily on factoring out the position bias. It is easy to see that the query type (generic queries vs specific queries) and user intent (purchase vs exploration) also introduce a bias in the click signal. In other words, results that do not match the user intent will not be clicked. In this paper, we outline a technique to model the interplay of query, user intent and position bias with respect to the relevance of the retrieved search results. In particular, we define two intents, namely purchase and explore, and estimate the relevance of the documents with respect to these two intents. We also relate them to the relevance estimates obtained by considering only the position bias. We empirically demonstrate the effectiveness of the proposed approach by comparing its performance against the well-known CoEC measure and the recently proposed factor model approach for relevance estimation.
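
The CoEC (clicks over expected clicks) baseline named in this abstract can be sketched as follows. This is only the position-bias-normalized baseline, not the intent-based model the paper proposes. It assumes a table of baseline click-through rates per rank position is available; the function name `coec` and the sample numbers are hypothetical.

```python
def coec(impressions, position_ctr):
    """Clicks over expected clicks for one query-document pair.

    impressions: iterable of (position, clicked) pairs, one per time the
        document was shown for the query.
    position_ctr: dict mapping rank position -> baseline click-through
        rate at that position (assumed to be estimated elsewhere).
    """
    clicks = 0
    expected = 0.0
    for position, clicked in impressions:
        if clicked:
            clicks += 1
        # Each impression contributes the clicks an average document
        # would be expected to receive at that position.
        expected += position_ctr[position]
    if expected == 0:
        raise ValueError("no expected clicks; cannot normalize")
    return clicks / expected


# Hypothetical baseline CTRs and click log for a single document.
baseline_ctr = {1: 0.5, 2: 0.25}
log = [(1, True), (1, False), (2, True), (2, False)]
score = coec(log, baseline_ctr)
```

A value above 1 suggests the document attracts more clicks than its display positions alone would explain, and a value below 1 suggests fewer.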

International Conference on Pattern Recognition | 2012