Swarup Chandra | Researchain

Publication

Featured research published by Swarup Chandra.

privacy security risk and trust | 2011

Estimating Twitter User Location Using Social Interactions--A Content Based Approach

Swarup Chandra; Latifur Khan; Fahad Bin Muhaya

Microblogging services such as Twitter allow users to interact with each other by forming a social network. The interaction between users in a social network group forms a dialogue or discussion. A typical dialogue between users involves a set of topics. We make the assumption that this set of topics remains constant throughout the conversation. Using this model of social interaction between users in the Twitter social network, along with content-derived location information, we employ a probabilistic framework to estimate the city-level location of a Twitter user, based on the content of the tweets in their dialogues, using reply-tweet messages. We estimate the city-level user location based purely on the content of the tweets, which may include reply-tweet information, without the use of any external information, such as a gazetteer, IP information, etc. The current framework for estimating user location does not consider the underlying social interaction, i.e. the structure of interactions between the users. In this paper, we calculate a baseline probability estimate of the distribution of words used by a user. This distribution is formed by using the fact that terms used in the tweets of a certain discussion may be related to the location information of the user initiating the discussion. We also estimate the top K probable cities for a given user and measure the accuracy. We find that our baseline estimation yields an accuracy higher than the 10% accuracy of the current state-of-the-art estimation.
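The idea of ranking the top K probable cities from tweet content alone can be illustrated with a minimal sketch. This is not the authors' framework: it assumes a simple bag-of-words model with Laplace smoothing and a uniform prior over cities, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(tweets_by_city):
    """Count word occurrences per city from {city: [tweet, ...]} (toy training data)."""
    counts = defaultdict(Counter)
    for city, tweets in tweets_by_city.items():
        for tweet in tweets:
            counts[city].update(tweet.lower().split())
    return counts


def top_k_cities(counts, user_tweets, k=2):
    """Rank cities by the smoothed log-likelihood of the user's words."""
    vocab = {word for c in counts.values() for word in c}
    words = [w for t in user_tweets for w in t.lower().split()]
    scores = {}
    for city, c in counts.items():
        total = sum(c.values())
        # Laplace smoothing keeps unseen words from zeroing out a city.
        scores[city] = sum(
            math.log((c[w] + 1) / (total + len(vocab))) for w in words
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy corpus, for illustration only.
data = {
    "Dallas": ["cowboys game tonight", "traffic on i35 again"],
    "Boston": ["red sox win", "snow on the charles"],
    "Seattle": ["rain and coffee", "seahawks game"],
}
ranking = top_k_cities(train(data), ["going to the cowboys game"], k=2)
```

Accuracy at K would then be the fraction of users whose true city appears in the returned list.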

international conference on security and privacy in communication systems | 2014

Towards a Systematic Study of the Covert Channel Attacks in Smartphones

Swarup Chandra; Zhiqiang Lin; Ashish Kundu; Latifur Khan

Smartphone security and privacy have recently received great attention due to the increasing number of users and the wide range of apps. Mobile operating systems, such as Android, provide mechanisms for data protection by restricting the communication between apps within the device. However, malicious apps can still overcome such restrictions via various means, such as exploiting software vulnerabilities in the system or using covert channels to transfer data. In this paper, we aim to systematically analyze various resources available on Android for the possible use of covert channels between two malicious apps. From our systematized analysis, we identify two new hardware resources, namely battery and phone call, that can also be used as covert channels. We also find new features that enrich the existing approaches for better covert channels, such as using the audio volume and screen brightness. Our experimental results show that high-throughput data transmission can be achieved using these resources for covert channel attacks.

european symposium on research in computer security | 2017

Securing Data Analytics on SGX with Randomization

Swarup Chandra; Vishal Karande; Zhiqiang Lin; Latifur Khan; Murat Kantarcioglu; Bhavani M. Thuraisingham

Protection of data privacy and prevention of unwarranted information disclosure is an enduring challenge in cloud computing when data analytics is performed on an untrusted third-party resource. Recent advances in trusted processor technology, such as Intel SGX, have rejuvenated the efforts of performing data analytics on a shared platform where data security and trustworthiness of computations are ensured by the hardware. However, a powerful adversary may still be able to infer private information in this setting from side channels such as cache access, CPU usage and other timing channels, thereby threatening data and user privacy. Though studies have proposed techniques to hide such information leaks through carefully designed data-independent access paths, such techniques can be prohibitively slow on models with a large number of parameters, especially when employed in a real-time analytics application. In this paper, we introduce a defense strategy that can achieve higher computational efficiency with a small trade-off in privacy protection. In particular, we study a strategy that adds noise to traces of memory access observed by an adversary, with the use of dummy data instances. We quantitatively measure the privacy guarantee, and empirically demonstrate the effectiveness and limitations of this randomization strategy, using classification and clustering algorithms. Our results show a significant reduction in execution time overhead on real-world data sets, when compared to a defense strategy using only data-oblivious mechanisms.

international conference on big data | 2014

Distributed Adaptive Importance Sampling on graphical models using MapReduce

Ahsanul Haque; Swarup Chandra; Latifur Khan; Charu C. Aggarwal

In the case of a graphical model, machine learning algorithms used to evaluate a query can be broadly classified into exact and approximate inference algorithms. Exact inference algorithms use only network parameters to evaluate a query. However, these algorithms are typically intractable on large networks due to exponential time and space complexity. Approximate inference algorithms are widely used in practice to overcome this constraint, with a trade-off in accuracy. These include sampling- and propagation-based algorithms. These approximate algorithms may also suffer from scalability issues when applied to large networks to achieve higher accuracy. To address this challenge, we have designed and implemented several MapReduce-based distributed versions of a specific type of approximate inference algorithm called Adaptive Importance Sampling (AIS). We compare and evaluate the proposed approaches using benchmark networks. Experimental results show that our proposed approaches achieve significant scaleup and speedup compared to the sequential method, while achieving similar accuracy asymptotically.
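To show why sampling-based inference splits naturally into map and reduce steps, here is a small single-machine sketch. It uses plain likelihood weighting (importance sampling with the prior as the proposal), not the adaptive variant from the paper, and it does not use a real MapReduce runtime. The three-node network and its probabilities are hypothetical.

```python
import random
from functools import reduce

# Hypothetical network: Rain -> Wet <- Sprinkler.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET = {  # P(Wet=True | Rain, Sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.05,
}


def map_chunk(seed, n):
    """Mapper: draw n weighted samples and return partial sums for P(Rain | Wet=True)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        rain = rng.random() < P_RAIN
        sprinkler = rng.random() < P_SPRINKLER
        weight = P_WET[(rain, sprinkler)]  # likelihood of the evidence Wet=True
        den += weight
        if rain:
            num += weight
    return num, den


def reduce_sums(a, b):
    """Reducer: partial sums are additive, so chunks can be merged in any order."""
    return a[0] + b[0], a[1] + b[1]


def estimate(n_chunks=8, n_per_chunk=5000):
    partials = [map_chunk(seed, n_per_chunk) for seed in range(n_chunks)]
    num, den = reduce(reduce_sums, partials)
    return num / den


posterior = estimate()
```

Because each mapper only returns two numbers, the chunks are independent and the work can be spread across machines; for this toy network the exact posterior is 0.1818 / 0.2818, roughly 0.645.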

conference on information and knowledge management | 2016

An Adaptive Framework for Multistream Classification

Swarup Chandra; Ahsanul Haque; Latifur Khan; Charu C. Aggarwal

A typical data stream classification involves predicting the labels of data instances generated from a non-stationary process. Studies in the past decade have focused on this problem setting to address various challenges such as concept drift and concept evolution. Most techniques assume availability of class labels associated with unlabeled data instances, soon after label prediction, for further training and drift detection. Moreover, training and test data distributions are assumed to be similar. These assumptions are not always true in practice. For instance, a semi-supervised setting that aims to utilize only a fraction of labels may induce bias during data selection. Consequently, the resulting data distribution of training and test instances may differ. In this paper, we present a novel stream classification problem setting involving two independent non-stationary data generating processes, relaxing the above assumptions. A source stream continuously generates labeled data instances whose distribution is biased compared to that of a target stream which generates unlabeled data instances from the same domain. The problem, which we call Multistream Classification, is to predict the class labels of data instances in the target stream, while utilizing labels available on the source stream. Since concept drift can occur asynchronously on these two streams, we design an adaptive framework that uses a technique for supervised concept drift detection in the biased source stream, and unsupervised concept drift detection in the target stream. A weighted ensemble of classifiers is updated after each drift detection on either stream, while utilizing a bias correction mechanism that leverages source information to predict labels of target instances whenever necessary. We empirically evaluate the multistream classifier's performance on both real-world and synthetic datasets, while comparing with various baseline methods and their variants.
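The weighted-ensemble prediction step can be sketched as follows. This is only an illustration of weighted voting in general: the drift detection, bias correction, and weight-update rules of the paper are not reproduced, and the classifiers and weights below are hypothetical stand-ins.

```python
from collections import defaultdict


def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the ensemble's votes."""
    totals = defaultdict(float)
    for classify, weight in zip(classifiers, weights):
        totals[classify(x)] += weight
    return max(totals, key=totals.get)


# Hypothetical ensemble of three threshold rules with assumed weights.
ensemble = [
    lambda x: "pos" if x > 0 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos",
]
ensemble_weights = [0.5, 0.3, 0.4]

label_for_3 = weighted_vote(ensemble, ensemble_weights, 3)
label_for_minus_1 = weighted_vote(ensemble, ensemble_weights, -1)
```

In a streaming setting the weights would be recomputed, and stale classifiers replaced, each time drift is detected.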

annual computer security applications conference | 2016

Adaptive encrypted traffic fingerprinting with bi-directional dependence

Khaled Al-Naami; Swarup Chandra; Ahmad M. Mustafa; Latifur Khan; Zhiqiang Lin; Kevin W. Hamlen; Bhavani M. Thuraisingham

Recently, network traffic analysis has been increasingly used in various applications including security, targeted advertisements, and network management. However, data encryption performed on network traffic poses a challenge to these analysis techniques. In this paper, we present a novel method to extract characteristics from encrypted traffic by utilizing data dependencies that occur over sequential transmissions of network packets. Furthermore, we explore the temporal nature of encrypted traffic and introduce an adaptive model that considers changes in data content over time. We evaluate our analysis on two packet-encrypted applications: website fingerprinting and mobile application (app) fingerprinting. Our evaluation shows that the proposed approach outperforms previous work, especially in the open-world scenario and when defense mechanisms are considered.

international conference on data mining | 2014

Stream Mining Using Statistical Relational Learning

Swarup Chandra; Justin Sahs; Latifur Khan; Bhavani M. Thuraisingham; Charu C. Aggarwal

Stream mining has gained popularity in recent years due to the availability of numerous data streams from sources such as social media and sensor networks. Data mining on such continuous streams poses a variety of challenges including concept drift and unbounded stream length. Traditional data mining approaches to these problems have difficulty incorporating relational domain knowledge and feature relationships, which can be used to improve the accuracy of a classifier. In this work, we model large data streams using statistical relational learning techniques for classification. In particular, we use a Markov Logic Network to capture relational features in structured data and show that this approach performs better for supervised learning than current state-of-the-art approaches. Additionally, we evaluate our approach with semi-supervised learning scenarios, where class labels are only partially available during training.

computational intelligence and data mining | 2014

MapReduce guided approximate inference over graphical models

Ahsanul Haque; Swarup Chandra; Latifur Khan; Michael Baron

A graphical model represents the data distribution of a data generating process and inherently captures its feature relationships. This stochastic model can be used to perform inference, to calculate posterior probabilities, in various applications such as classification. Exact inference algorithms are known to be intractable on large networks due to exponential time and space complexity. Approximate inference algorithms are instead widely used in practice to overcome this constraint, with a trade-off in accuracy. Stochastic sampling is one such method where an approximate probability distribution is empirically evaluated using various sampling techniques. However, these algorithms may still suffer from scalability issues on large and complex networks. To address this challenge, we have designed and implemented several MapReduce-based distributed versions of a specific type of approximate inference algorithm called Adaptive Importance Sampling (AIS). We compare and evaluate the proposed approaches using benchmark networks. Experimental results show that our approach achieves significant scaleup and speedup compared to the sequential algorithm, while achieving similar accuracy asymptotically.

computer and communications security | 2018

BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering

Vishal Karande; Swarup Chandra; Zhiqiang Lin; Juan Caballero; Latifur Khan; Kevin W. Hamlen

Complex software is built by composing components implementing largely independent blocks of functionality. However, once the sources are compiled into an executable, that modularity is lost. This is unfortunate for code recipients, for whom knowing the components has many potential benefits, such as improved program understanding for reverse-engineering, identifying shared code across different programs, binary code reuse, and authorship attribution. A novel approach for decomposing such source-free program executables into components is here proposed. Given an executable, the approach first statically builds a decomposition graph, where nodes are functions and edges capture three types of relationships: code locality, data references, and function calls. It then applies a graph-theoretic approach to partition the functions into disjoint components. A prototype implementation, BCD, demonstrates the approach's efficacy: evaluation of BCD with 25 C++ binary programs to recover the methods belonging to each class achieves high precision and recall scores for these tested programs.
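The shape of a decomposition graph and a partition into disjoint components can be sketched as below. The abstract does not name the graph-theoretic partitioning method, so this sketch swaps in a deliberately simple stand-in: sum per-relationship weights on each function pair, drop pairs below a threshold, and take connected components. The weights, threshold, and function names are all hypothetical.

```python
from collections import defaultdict


def decompose(functions, edges, weights, threshold):
    """Partition functions into disjoint components.

    edges: iterable of (f, g, kind), kind in {"locality", "data", "call"}.
    weights: assumed per-kind edge weights; threshold: assumed cut-off.
    """
    strength = defaultdict(float)
    for f, g, kind in edges:
        if f != g:  # ignore self-loops
            strength[frozenset((f, g))] += weights[kind]

    # Keep only sufficiently strong relationships.
    adjacency = defaultdict(set)
    for pair, total in strength.items():
        if total >= threshold:
            f, g = tuple(pair)
            adjacency[f].add(g)
            adjacency[g].add(f)

    # Connected components via depth-first search.
    seen, components = set(), []
    for start in functions:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical functions and relationships, for illustration only.
funcs = ["a", "b", "c", "d"]
rels = [
    ("a", "b", "call"), ("a", "b", "locality"),
    ("b", "c", "locality"),
    ("c", "d", "data"), ("c", "d", "call"),
]
parts = decompose(funcs, rels, {"locality": 0.3, "data": 0.5, "call": 0.5}, 0.6)
```

Here the weak locality-only link between `b` and `c` is cut, so the four functions fall into two components.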

international conference on data engineering | 2017

Efficient Multistream Classification Using Direct Density Ratio Estimation

Ahsanul Haque; Swarup Chandra; Latifur Khan; Kevin W. Hamlen; Charu C. Aggarwal

Traditional data stream classification techniques assume that the stream of data is generated from a single non-stationary process. On the contrary, a recently introduced problem setting, referred to as Multistream Classification, involves two independent non-stationary data generating processes. One of them is the source stream that continuously generates labeled data instances. The other one is the target stream that generates unlabeled test data instances from the same domain. The distribution represented by the source stream data is biased compared to that of the target stream. Moreover, these streams may have asynchronous concept drifts between them. The multistream classification problem is to predict the class labels of target stream instances, while utilizing labeled data available from the source stream. In this paper, we propose an efficient solution for multistream classification by fusing drift detection into online data shift adaptation. Experimental results on benchmark data sets indicate significantly improved performance over the only existing approach for multistream classification.
