Publication


Featured research published by Debojyoti Dutta.


Mobile Computing and Communications Review | 2005

Data acquisition in multiple-sink sensor networks

Abhimanyu Das; Debojyoti Dutta

Scalable, energy-efficient data acquisition in large sensor network deployments such as habitat monitoring is an important research problem. In several papers [1, 2], sensor networks have been modeled as having a single sink (or base station) that acts as the data recipient for a large number of sensors (data sources) deployed over a sensor field. The sensor network might use simple querying and data collection trees for hop-by-hop query dissemination and routing of sensor responses [1] back towards the sink. Since sensors are energy-constrained devices, we wish to minimize the communication energy expenditure of these sensors.


Mobile Computing and Communications Review | 2008

Profile-cast: behavior-aware mobile networking

Wei-jen Hsu; Debojyoti Dutta; Ahmed Helmy

In this paper we advocate a service paradigm, profile-cast, within the communication framework of delay tolerant networks (DTN) (K. Fall, 2003). This novel approach leverages the behavioral patterns of mobile network users for delivering messages to a sub-group of users as defined by their profiles (e.g., interest, social affiliation, etc.). We study large data sets of user mobility profiles and present a case study of mobility profile-cast with a similarity-based forwarding protocol. We show that behavior-aware protocol design has great potential: we reduce the total number of transmissions to 45% of flooding under a 92% delivery success rate, or to only 3% of the transmissions of flooding under a 61% delivery success rate. It also leads to shorter delay (at least 30% less) compared to a random transmission protocol.


International Conference on Computer Communications | 2003

Oblivious AQM and Nash equilibria

Debojyoti Dutta; Ashish Goel; John S. Heidemann

An oblivious active queue management (AQM) scheme is one which does not differentiate between packets belonging to different flows. In this paper, we study the existence and the quality of Nash equilibria imposed by oblivious AQM schemes on selfish agents. Oblivious AQM schemes are of obvious importance because of the ease of implementation and deployment, and Nash equilibrium offers valuable clues into network performance under noncooperative user behavior. Specifically, we ask the following three questions: 1) Do there exist oblivious AQM schemes that impose Nash equilibria on selfish agents? 2) Are the imposed equilibria, if they exist, efficient in terms of the goodput obtained and the drop probability experienced at the equilibrium? 3) How easy is it for selfish users to reach the Nash equilibrium state? We assume that the traffic sources are Poisson but the users can control the average rate. We show that drop-tail and RED do not impose Nash equilibria. We modify RED slightly to obtain an oblivious scheme, VLRED, that imposes a Nash equilibrium, but is not efficient. We then present another AQM policy, EN-AQM, that can impose an efficient Nash equilibrium. Finally, we show that for any oblivious AQM, the Nash equilibrium imposed on selfish agents is highly sensitive as the number of agents increases, thus making it hard for the users to converge to the Nash equilibrium, and motivating the need for equilibria-aware protocols.


Journal of Chemical Information and Modeling | 2007

Ensemble Feature Selection: Consistent Descriptor Subsets for Multiple QSAR Models

Debojyoti Dutta; Rajarshi Guha; David J. Wild; Ting Chen

Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches employ deterministic or stochastic methods to obtain a descriptor subset that leads to an optimal model of a single type (such as linear regression or a neural network). With the development of ensemble modeling approaches, multiple models of differing types are individually developed, resulting in different descriptor subsets for each model type. However, it is advantageous, from the point of view of developing interpretable QSAR models, to have a single set of descriptors that can be used for different model types. In this paper, we describe an approach to the selection of a single, optimal subset of descriptors for multiple model types. We apply this approach to three data sets, covering both regression and classification, and show that the constraint of forcing different model types to use the same set of descriptors does not lead to a significant loss in predictive ability for the individual models considered. In addition, interpretations of the individual models developed using this approach indicate that they encode similar structure-activity trends.


Journal of Chemical Information and Modeling | 2006

Local lazy regression: making use of the neighborhood to improve QSAR predictions

Rajarshi Guha; Debojyoti Dutta; Peter C. Jurs; Ting Chen

Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
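
The local-neighborhood idea in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance over descriptor vectors and an ordinary least-squares fit on the k nearest neighbors; the abstract does not specify the distance measure or the regression details, so `local_lazy_predict`, `k`, and `ridge` are hypothetical names and parameters chosen for illustration only.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_lazy_predict(train_x, train_y, query, k=5, ridge=1e-8):
    """Predict for `query` from a linear model fitted only on its k nearest neighbors.

    No global model is built: all work happens at query time ("lazy").
    `ridge` is a tiny stabilizer added by this sketch, not part of the source.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = order[:k]
    rows = [[1.0] + list(train_x[i]) for i in neighbors]  # leading 1.0 = intercept
    ys = [train_y[i] for i in neighbors]
    d = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    w = solve_linear_system(xtx, xty)
    return w[0] + sum(wi * qi for wi, qi in zip(w[1:], query))
```

On a toy data set with two groups that follow different linear trends, a query inside one group is fitted only from that group's points, which is the kind of local relationship a single global model could blur.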


Bioinformatics | 2007

Speeding up tandem mass spectrometry database search: metric embeddings and fast near neighbor search

Debojyoti Dutta; Ting Chen

MOTIVATION: Due to recent advances in mass spectrometry technology, there has been an exponential increase in the amount of data being generated in the past few years. Database searches have not been able to keep up with this data explosion. Thus, speeding up the data searches becomes increasingly important in mass-spectrometry-based applications. Traditional database search methods use one-against-all comparisons of a query spectrum against a very large number of peptides generated from in silico digestion of protein sequences in a database, to filter potential candidates from this database followed by a detailed scoring and ranking of those filtered candidates. RESULTS: In this article, we show that we can avoid the one-against-all comparisons. The basic idea is to design a set of hash functions to pre-process peptides in the database such that for each query spectrum we can use the hash functions to find only a small subset of peptide sequences that are most likely to match the spectrum. The construction of each hash function is based on a random spectrum and the hash value of a peptide is the normalized shared peak counts score (cosine) between the random spectrum and the hypothetical spectrum of the peptide. To implement this idea, we first embed each peptide into a unit vector in a high-dimensional metric space. The random spectrum is represented by a random vector, and we use random vectors to construct a set of hash functions called locality sensitive hashing (LSH) for preprocessing. We demonstrate that our mapping is accurate. We show that our method can filter out >95.65% of the spectra without missing any correct sequences, or gain 111 times speedup by filtering out 99.64% of spectra while missing at most 0.19% (2 out of 1014) of the correct sequences. In addition, we show that our method can be effectively used for other mass spectra mining applications such as finding clusters of spectra efficiently and accurately.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
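
The hashing idea described above can be sketched roughly as follows. This is not the authors' implementation: it assumes spectra are already embedded as fixed-length numeric vectors, draws Gaussian random directions as stand-ins for the "random spectra", and uses a sorted list plus a tolerance window as a simple lookup structure. All names (`make_random_dirs`, `build_index`, `candidates`, `tol`) are hypothetical.

```python
import bisect
import math
import random


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_random_dirs(dim, n_hashes, seed=0):
    """Random unit vectors standing in for the random spectra."""
    rng = random.Random(seed)
    return [unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_hashes)]


def hash_values(vec, dirs):
    """One hash value per direction: the cosine between the vector and that direction."""
    u = unit(vec)
    return [sum(a * b for a, b in zip(u, r)) for r in dirs]


def build_index(peptides, dirs):
    """Pre-process peptides once; entries are sorted by their first hash value."""
    entries = sorted((hash_values(v, dirs), name) for name, v in peptides.items())
    keys = [h[0] for h, _ in entries]
    return keys, entries


def candidates(query, index, dirs, tol):
    """Return peptides whose hash values all lie within `tol` of the query's."""
    keys, entries = index
    q = hash_values(query, dirs)
    lo = bisect.bisect_left(keys, q[0] - tol)   # binary search on the first hash
    hi = bisect.bisect_right(keys, q[0] + tol)
    return [name for h, name in entries[lo:hi]
            if all(abs(a - b) <= tol for a, b in zip(q, h))]
```

Because each direction is a unit vector, the Cauchy-Schwarz inequality bounds the difference between two hash values by the Euclidean distance between the two embedded unit vectors. In this sketch, any peptide within distance `tol` of the query therefore always survives the filter, while distant peptides are unlikely to pass every hash.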


Ad Hoc Networks | 2012

CSI: A paradigm for behavior-oriented profile-cast services in mobile networks

Wei-jen Hsu; Debojyoti Dutta; Ahmed Helmy

We propose a new behavior-oriented communication paradigm in mobile networks, profile-cast, motivated by tight user-network coupling in mobile societies. In this novel paradigm, messages are sent to sender-specified target profiles, instead of machine IDs. We present a systematic framework for such services. First, we analyze the spatio-temporal stability of user mobility profiles constructed from empirical data sets, and they turn out to be surprisingly stable. The similarity of the current mobility profile of a user to its future mobility profile remains above 0.6 for five weeks, while the correlation coefficient of the similarity metrics between a user pair at different time instants is above 0.5 for two weeks. Second, we present a protocol for the profile-cast service, named CSI, and provide a fully distributed solution utilizing behavioral profile space gradients and small world structures to selectively diffuse information across the network towards the intended recipients. Leveraging stability in user behaviors, the two modes of CSI achieve good performance compared to the theoretical optimal protocols. Both CSI:Target mode and CSI:Dissemination mode achieve more than 94% delivery ratio. Compared with the delay-optimal protocol, they show no more than 47% and 32% more delay, respectively, with at most 10% more transmission overhead. Compared with the overhead-optimal protocol, they use no more than 7% more overhead while achieving dramatic improvement in delay (up to 150% less). Both CSI:T and CSI:D significantly outperform epidemic routing, using less than 7% overhead, and variants of random walk, where CSI:T doubles the delivery ratio using less overhead, and CSI:D shows at least 50% less delay under similar overhead. We believe the profile-cast paradigm would efficiently enable many behavior-oriented services, such as targeted announcements and profile-based alert notifications, in various mobile networks.


Wireless Communications and Networking Conference | 2008

Profile-Cast: Behavior-Aware Mobile Networking

Wei-jen Hsu; Debojyoti Dutta; Ahmed Helmy

In this paper we advocate a service paradigm, profile-cast, within the communication framework of delay tolerant networks (DTN) (K. Fall, 2003). This novel approach leverages the behavioral patterns of mobile network users for delivering messages to a sub-group of users as defined by their profiles (e.g., interest, social affiliation, etc.). We study large data sets of user mobility profiles and present a case study of mobility profile-cast with a similarity-based forwarding protocol. We show that behavior-aware protocol design has great potential: we reduce the total number of transmissions to 45% of flooding under a 92% delivery success rate, or to only 3% of the transmissions of flooding under a 61% delivery success rate. It also leads to shorter delay (at least 30% less) compared to a random transmission protocol.


Journal of Chemical Information and Modeling | 2006

Scalable partitioning and exploration of chemical spaces using geometric hashing

Debojyoti Dutta; Rajarshi Guha; Peter C. Jurs; Ting Chen

Virtual screening (VS) has become a preferred tool to augment high-throughput screening(1) and determine new leads in the drug discovery process. The core of a VS informatics pipeline includes several data mining algorithms that work on huge databases of chemical compounds containing millions of molecular structures and their associated data. Thus, scaling traditional applications such as classification, partitioning, and outlier detection for huge chemical data sets without a significant loss in accuracy is very important. In this paper, we introduce a data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm(2) called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources. The core LSH algorithm hashes chemical descriptors so that points close to each other in the descriptor space are also close to each other in the hashed space. Using this data structure, one can perform approximate nearest-neighbor searches very quickly, in sublinear time. We validate the accuracy and performance of our framework on three real data sets of sizes ranging from 4337 to 249 071 molecules. Results indicate that the identification of nearest neighbors using the LSH algorithm is at least 2 orders of magnitude faster than the traditional k-nearest-neighbor method and is over 94% accurate for most query parameters. Furthermore, when viewed as a data-partitioning procedure, the LSH algorithm lends itself to easy parallelization of nearest-neighbor classification or regression. We also apply our framework to detect outlying (diverse) compounds in a given chemical space; this algorithm is extremely rapid in determining whether a compound is located in a sparse region of chemical space or not, and it is quite accurate when compared to results obtained using principal-component-analysis-based heuristics.
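
As a rough illustration of the bucketing idea in this abstract, the sketch below hashes descriptor vectors with random projections quantized into fixed-width bins, a common Euclidean LSH construction. The abstract does not give the hash family or parameters used in the paper, so the class name `EuclideanLSH` and the settings `n_tables`, `n_projections`, and `width` are assumptions made for this example.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy locality-sensitive hash index for descriptor vectors (illustrative only)."""

    def __init__(self, dim, n_tables=8, n_projections=4, width=1.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(n_tables):
            # Each table uses its own random directions and offsets.
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                     for _ in range(n_projections)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, point):
        # Project onto each random direction, shift, and quantize into bins.
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, point)) + offset) / self.width)
            for direction, offset in funcs
        )

    def add(self, label, point):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, point)].append((label, point))

    def query(self, point, k=1):
        """Approximate k nearest neighbors: rank only points sharing a bucket."""
        seen = {}
        for funcs, buckets in self.tables:
            for label, other in buckets.get(self._key(funcs, point), []):
                seen[label] = math.dist(point, other)
        return sorted(seen, key=seen.get)[:k]
```

Only points that collide with the query in at least one table are ever compared exactly, which is where the speedup over a full scan would come from. In such a scheme, a query that collides with very few points could be treated as lying in a sparse region, though the criterion used in the paper is not stated in the abstract.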


International Conference on Communications | 2002

An early bandwidth notification (EBN) architecture for dynamic bandwidth environment

Debojyoti Dutta; Yongguang Zhang

In today's heterogeneous Internet, the bandwidth available to TCP flows is often variable. However, current TCP cannot perform optimally under such dynamically varying bandwidth conditions. This paper addresses this problem by introducing a new architecture to improve TCP performance with explicit bandwidth notification (EBN). It uses a normalized bandwidth feedback method to provide accurate and timely bandwidth estimations. Then, a new TCP control algorithm (TCP-EBN) is proposed to give a prompt response to any bandwidth changes. Our simulation results have shown that TCP-EBN performs much better than several other variations of TCP.

Collaboration

Top Co-Authors

Ashish Goel

University of California

John S. Heidemann

Information Sciences Institute