Suprakash Datta | Researchain

Publication

Featured researches published by Suprakash Datta.

information processing in sensor networks | 2007

Localization in wireless sensor networks

Masoomeh Rudafshani; Suprakash Datta

A fundamental problem in wireless sensor networks is localization - the determination of the geographical locations of sensors. Most existing localization algorithms were designed to work well either in networks of static sensors or networks in which all sensors are mobile. In this paper, we propose two localization algorithms, MSL and MSL*, that work well when any number of sensors are static or mobile. MSL and MSL* are range-free algorithms - they do not require that sensors are equipped with hardware to measure signal strengths, angles of arrival of signals or distances to other sensors. We present simulation results to demonstrate that MSL and MSL* outperform existing algorithms in terms of localization error in very different mobility conditions. MSL* outperforms MSL in most scenarios, but incurs a higher communication cost. MSL outperforms MSL* when there is significant irregularity in the radio range. We also point out some problems with a well known lower bound for the error in any range-free localization algorithm in static sensor networks.

Cytometry Part A | 2014

SWIFT—scalable clustering for automated identification of rare cell populations in large, high‐dimensional flow cytometry datasets, Part 1: Algorithm design

Iftekhar Naim; Suprakash Datta; Jonathan Rebhahn; James S. Cavenaugh; Tim R. Mosmann; Gaurav Sharma

We present a model‐based clustering method, SWIFT (Scalable Weighted Iterative Flow‐clustering Technique), for digesting high‐dimensional large‐sized datasets obtained via modern flow cytometry into more compact representations that are well‐suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application‐driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems.

Cytometry Part A | 2014

SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 2: biological evaluation.

Tim R. Mosmann; Iftekhar Naim; Jonathan Rebhahn; Suprakash Datta; James S. Cavenaugh; Jason M. Weaver; Gaurav Sharma

A multistage clustering and data processing method, SWIFT (detailed in a companion manuscript), has been developed to detect rare subpopulations in large, high‐dimensional flow cytometry datasets. An iterative sampling procedure initially fits the data to multidimensional Gaussian distributions, then splitting and merging stages use a criterion of unimodality to optimize the detection of rare subpopulations, to converge on a consistent cluster number, and to describe non‐Gaussian distributions. Probabilistic assignment of cells to clusters, visualization, and manipulation of clusters by their cluster medians, facilitate application of expert knowledge using standard flow cytometry programs. The dual problems of rigorously comparing similar complex samples, and enumerating absent or very rare cell subpopulations in negative controls, were solved by assigning cells in multiple samples to a cluster template derived from a single or combined sample. Comparison of antigen‐stimulated and control human peripheral blood cell samples demonstrated that SWIFT could identify biologically significant subpopulations, such as rare cytokine‐producing influenza‐specific T cells. A sensitivity of better than one part per million was attained in very large samples. Results were highly consistent on biological replicates, yet the analysis was sensitive enough to show that multiple samples from the same subject were more similar than samples from different subjects. A companion manuscript (Part 1) details the algorithmic development of SWIFT.

wireless and mobile computing, networking and communications | 2006

Distributed localization in static and mobile sensor networks

Suprakash Datta; Chris Klinowski; Masoomeh Rudafshani; Shaker Khaleque

Sensor networks are expected to revolutionize information gathering, processing and dissemination in many diverse environments. In this paper, we address a fundamental problem in designing sensor networks: localization, or determining the locations of nodes. We assume that a small fraction of the sensor nodes (called seeds) know their locations. We propose an algorithm that enables other nodes to estimate their locations by exchanging information between nodes and seeds. Unlike most existing work, our algorithm lets a node use the location information of all its neighbors, not just the seed nodes. Unlike most existing algorithms, our algorithm works for both static and mobile sensor networks. Using simulation experiments, we demonstrate that our algorithm significantly outperforms comparable existing algorithms such as DV-hop [1] and MCL [2].
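To make the range-free idea concrete, here is a minimal sketch, not the algorithm from the paper. It assumes an idealized circular radio range and uses only connectivity: a node that hears a seed must lie within `radio_range` of it. Candidate positions are sampled at random, those violating any constraint are discarded, and the survivors are averaged. The function name, parameters, and the sampling box are all hypothetical, and unlike the paper's algorithm this sketch uses seeds only, not ordinary neighbors.

```python
import math
import random


def estimate_position(seeds, radio_range, box, n_samples=2000, rng=None):
    """Estimate a node's position from the seeds it can hear.

    seeds: list of (x, y) seed positions heard by the node.
    box:   ((xmin, xmax), (ymin, ymax)) region to sample candidates from.
    Returns the mean of the candidates consistent with every seed,
    or None if no candidate survives.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the example repeatable
    (xmin, xmax), (ymin, ymax) = box
    kept = []
    for _ in range(n_samples):
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Range-free constraint: hearing a seed implies distance <= radio_range.
        if all(math.dist(p, s) <= radio_range for s in seeds):
            kept.append(p)
    if not kept:
        return None
    mean_x = sum(x for x, _ in kept) / len(kept)
    mean_y = sum(y for _, y in kept) / len(kept)
    return (mean_x, mean_y)


heard_seeds = [(0.0, 0.0), (1.0, 0.0)]
estimate = estimate_position(
    heard_seeds, radio_range=1.0, box=((-1.0, 2.0), (-1.0, 1.0))
)
```

Because the intersection of disks is convex, the averaged estimate always satisfies the same constraints as the surviving samples.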

international symposium on multimedia | 2004

Prediction of protein coding regions in DNA sequences using Fourier spectral characteristics

Suprakash Datta; Amir Asif; Haoyuan Wang

Existing discrete Fourier transform (DFT)-based algorithms for identifying protein coding regions in DNA sequences (S. Tiwari et al., 1997, D. Anastassiou, 2001, D. Kotlar et al., 2003) exploit the empirical observation that the spectrum of protein coding regions of length N nucleotides has a peak at frequency k=N/3. In this paper, we prove the aforementioned and several other empirical observations attributed to DNA sequences. Our analytical results lead to faster and more accurate DFT-based algorithms for predicting coding regions.
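The period-3 observation can be illustrated with a small sketch. It assumes the common formulation in which each base gets a binary indicator sequence and the spectral power at frequency k is the sum of the squared DFT magnitudes of the four indicator sequences; the function name and the toy sequence are hypothetical and not taken from the paper.

```python
import cmath


def spectral_power(seq, k):
    """Sum over A, C, G, T of |U_b[k]|^2, where U_b is the DFT of the
    0/1 indicator sequence marking the positions of base b in seq."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # Only positions holding this base contribute to the indicator DFT.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, s in enumerate(seq)
            if s == base
        )
        total += abs(coeff) ** 2
    return total


# Toy sequence with a perfect codon-position bias (an extreme, artificial case).
coding_like = "ATG" * 30          # N = 90 nucleotides
k_peak = len(coding_like) // 3    # the frequency k = N/3
peak = spectral_power(coding_like, k_peak)
off_peak = spectral_power(coding_like, 7)
```

In this artificial case all the energy away from k = 0 sits at k = N/3 and its mirror frequency; real coding regions show a much weaker but still detectable peak.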

international conference on acoustics, speech, and signal processing | 2010

Swift: Scalable weighted iterative sampling for flow cytometry clustering

Iftekhar Naim; Suprakash Datta; Gaurav Sharma; James S. Cavenaugh; Tim R. Mosmann

Flow cytometry (FC) is a powerful technology for rapid multivariate analysis and functional discrimination of cells. Current FC platforms generate large, high-dimensional datasets which pose a significant challenge for traditional manual bivariate analysis. Automated multivariate clustering, though highly desirable, is also stymied by the critical requirement of identifying rare populations that form rather small clusters, in addition to the computational challenges posed by the large size and dimensionality of the datasets. In this paper, we address these twin challenges by developing a two-stage scalable multivariate parametric clustering algorithm. In the first stage, we model the data as a mixture of Gaussians and use an iterative weighted sampling technique to estimate the mixture components successively in order of decreasing size. In the second stage, we apply a graph-based hierarchical merging technique to combine Gaussian components with significant overlaps into the final number of desired clusters. The resulting algorithm offers a reduction in complexity over conventional mixture modeling while simultaneously allowing for better detection of small populations. We demonstrate the effectiveness of our method both on simulated data and actual flow cytometry datasets.

systems communications | 2005

Building multicast trees for multimedia streaming in heterogeneous P2P networks

Xiangrong Tan; Suprakash Datta

P2P networks have been proposed as a scalable, inexpensive solution to the problem of distributing multimedia content over the Internet. Since real P2P systems exhibit considerable heterogeneity in hardware, software and network connections, the design of P2P streaming networks must factor in this variation. There are two different sources of heterogeneity in P2P networks. Most existing work in the literature handles heterogeneity among receivers and requirements by the use of different multimedia encodings of the same content. In this paper we focus on the problems caused by heterogeneity in the network delays connecting receivers to the sender. We assume that there is a single multicast tree and a single video stream. We propose new algorithms for building multicast trees for multimedia streaming in heterogeneous P2P networks. Our algorithms differ in the amount of communication and computational resources they require. We compare the performance (using simulations) of our algorithms with an existing Zigzag algorithm. Our results show that two of our algorithms (the FollowTree-Landmark-II algorithm and the FollowTree algorithm) significantly outperform Zigzag.

IEEE Transactions on Evolutionary Computation | 2013

Evolved Features for DNA Sequence Classification and Their Fitness Landscapes

Wendy Ashlock; Suprakash Datta

A key problem in genomics is the classification and annotation of sequences in a genome. A major challenge is identifying good sequence features. Evolutionary algorithms have the potential to search a large space of features and automatically generate useful ones. This paper proposes a two-stage method that generates features using multiple replicates of a genetic algorithm operating on an augmented finite state machine, called a side effect machine (SEM), and then selects a small diverse feature set using several methods, including a novel method called dissimilarity clustering. We apply our method to three problems related to transposable elements and compare the results to those using k-mer features. We are able to produce a small set of interesting and comprehensible features that create random forest classifiers more accurate and less prone to overfitting than those created using k-mer features. We analyze the SEM fitness landscapes and discuss the use of different fitness functions.
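For context, the k-mer baseline mentioned above can be sketched as follows. This is a generic normalized k-mer frequency vector, assumed here for illustration; the paper's exact feature definition may differ, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=2):
    """Return the normalized frequency of every possible DNA k-mer in seq,
    ordered lexicographically over the alphabet A, C, G, T."""
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for very short input
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


features = kmer_features("ACGTACGT", k=2)
```

The vector length grows as 4^k, which is one reason a small, evolved feature set can be attractive by comparison.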

asilomar conference on signals, systems and computers | 2004

DFT based DNA splicing algorithms for prediction of protein coding regions

Suprakash Datta; Amir Asif

Identifying protein coding regions in DNA sequences is a fundamental step in computational recognition of genes. Traditional Discrete Fourier transform (DFT) based approaches exploit the empirical observation that the spectrum of protein coding DNA of length N nucleotides has a peak at frequency k=N/3 corresponding to the length of a DNA codon. In this paper, we prove the aforementioned and several other empirical observations attributed to DNA sequences. Our analytical results lead to faster and more accurate DFT-based algorithms for predicting coding regions. Further, our algorithm requires no prior training.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

Distinguishing Endogenous Retroviral LTRs from SINE Elements Using Features Extracted from Evolved Side Effect Machines

Wendy Ashlock; Suprakash Datta

Side effect machines produce features for classifiers that distinguish different types of DNA sequences. They have the as-yet-unexploited potential to give insight into biological features of the sequences. We introduce several innovations to the production and use of side effect machine sequence features. We compare the results of using consensus sequences and genomic sequences for training classifiers and find that more accurate results can be obtained using genomic sequences. Surprisingly, we were even able to build a classifier that distinguished consensus sequences from genomic sequences with high accuracy, suggesting that consensus sequences are not always representative of their genomic counterparts. We apply our techniques to the problem of distinguishing two types of transposable elements, solo LTRs and SINEs. Identifying these sequences is important because they affect gene expression, genome structure, and genetic diversity, and they serve as genetic markers. They are of similar length, neither codes for protein, and both have many nearly identical copies throughout the genome. Being able to efficiently and automatically distinguish them will aid efforts to improve annotations of genomes. Our approach reveals structural characteristics of the sequences of potential interest to biologists.