Indranil Gupta
University of Illinois at Urbana–Champaign
Publication
Featured research published by Indranil Gupta.
international workshop on peer to peer systems | 2003
Indranil Gupta; Kenneth P. Birman; Prakash Linga; Alan J. Demers; Robbert van Renesse
A peer-to-peer (p2p) distributed hash table (DHT) system allows hosts to join and fail silently (or leave), as well as to insert and retrieve files (objects). This paper explores a new point in design space in which increased memory usage and constant background communication overheads are tolerated to reduce file lookup times and increase stability to failures and churn. Our system, called Kelips, uses peer-to-peer gossip to partially replicate file index information. In Kelips, (a) under normal conditions, file lookups are resolved within 1 RPC, independent of system size, and (b) membership changes (e.g., even when a large number of nodes fail) are detected and disseminated to the system quickly. Per-node memory requirements are small in medium-sized systems. When there are failures, lookup success is ensured through query rerouting. Kelips achieves load balancing comparable to existing systems. Locality is supported by using topologically aware gossip mechanisms. Initial results of an ongoing experimental study are also discussed.
principles of distributed computing | 2001
Indranil Gupta; Tushar Deepak Chandra; Germán S. Goldszmidt
Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In this paper, we look at quantifying the optimal scalability, in terms of network load (in messages per second, with messages having a size limit), of distributed, complete failure detectors as a function of application-specified requirements. These requirements are 1) quick failure detection by some non-faulty process, and 2) accuracy of failure detection. We assume a crash-recovery (non-Byzantine) failure model, and a network model that is probabilistically unreliable (w.r.t. message deliveries and process failures). First, we characterize, under certain independence assumptions, the optimum worst-case network load imposed by any failure detector that achieves an application's requirements. We then discuss why traditional heartbeating schemes are inherently unscalable according to the optimal load. We also present a randomized, distributed failure detector algorithm that imposes an equal expected load per group member. This protocol satisfies the application-defined constraints of completeness and accuracy, and speed of detection on average. It imposes a network load that differs from the optimal by a sub-optimality factor that is much lower than that for traditional distributed heartbeating schemes. Moreover, this sub-optimality factor does not vary with group size (for large groups).
IEEE Computer | 2010
Arutyun Avetisyan; Roy H. Campbell; Indranil Gupta; Michael T. Heath; Steven Y. Ko; Gregory R. Ganger; Michael Kozuch; David R. O'Hallaron; M. Kunze; Thomas T. Kwan; Kevin Lai; Martha Lyons; Dejan S. Milojicic; Hing Yan Lee; Yeng Chai Soh; Ng Kwang Ming; Jing-Yuan Luke; Han Namgoong
Open Cirrus is a cloud computing testbed that, unlike existing alternatives, federates distributed data centers. It aims to spur innovation in systems and applications research and catalyze development of an open source service stack for the cloud.
dependable systems and networks | 2001
Indranil Gupta; R. van Renesse; Kenneth P. Birman
The paper discusses fault-tolerant, scalable solutions to the problem of accurately and scalably calculating global aggregate functions in large process groups communicating over unreliable networks. These groups could represent sensors or processes communicating over a network that is either fixed (e.g., the Internet) or dynamic (e.g., multihop ad-hoc). Group members are prone to failures. The ability to evaluate global aggregate properties (e.g., the average of sensor temperature readings) is important for higher-level coordination activities in such large groups. We first define the setting and problem, laying down metrics to evaluate different algorithms for the same. We discuss why the usual approaches to solve this problem are unviable and unscalable over an unreliable network prone to message delivery failures and crash failures. We then propose a technique to impose an abstract hierarchy on such large groups, describing how this hierarchy can be made to mirror the network topology. We discuss several alternatives to use this technique to solve the global aggregate function evaluation problem. Finally, we present a protocol based on gossiping that uses this hierarchical technique. We present mathematical analysis and performance results to validate the robustness, efficiency and accuracy of the Hierarchical Gossiping algorithm.
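As a rough illustration of why an abstract hierarchy helps with a global aggregate such as an average, the sketch below merges partial (sum, count) pairs level by level. It is not the Hierarchical Gossiping protocol from the paper: it ignores gossip, message loss, and crashes, and the names `combine`, `hierarchical_average`, and the `fanout` parameter are assumptions made for this example.

```python
from typing import List, Tuple

# A partial aggregate: (sum of readings, number of readings).
Partial = Tuple[float, int]


def combine(parts: List[Partial]) -> Partial:
    """Merge partial aggregates. The merge is associative, so it can be
    applied at any level of the hierarchy."""
    return (sum(s for s, _ in parts), sum(c for _, c in parts))


def hierarchical_average(readings: List[float], fanout: int = 4) -> float:
    """Average the readings by merging groups of `fanout` partials per level.

    `fanout` is a hypothetical knob for this sketch, not a parameter
    taken from the paper.
    """
    if not readings:
        raise ValueError("no readings")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: List[Partial] = [(r, 1) for r in readings]
    while len(level) > 1:
        level = [combine(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    total, count = level[0]
    return total / count
```

Carrying (sum, count) pairs rather than per-subgroup averages keeps the result exact even when subgroups have different sizes.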
international conference on computer communications | 2008
I-Hong Hou; Yu-En Tsai; Tarek F. Abdelzaher; Indranil Gupta
Code updates, such as those for debugging purposes, are frequent and expensive in the early development stages of wireless sensor network applications. We propose AdapCode, a reliable data dissemination protocol that uses adaptive network coding to reduce broadcast traffic in the process of code updates. Packets on every node are coded by linear combination and decoded by Gaussian elimination. The core idea in AdapCode is to adaptively change the coding scheme according to the link quality. Our evaluation shows that AdapCode uses up to 40% fewer packets than Deluge in large networks. In addition, AdapCode performs much better in terms of load balancing, which prolongs the system lifetime, and has a slightly shorter propagation delay. Finally, we show that network coding is doable on sensor networks in that (i) it imposes only a 3-byte header overhead, (ii) it is easy to find linearly independent packets, and (iii) Gaussian elimination needs only 1 KB of memory.
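To make "coded by linear combination and decoded by Gaussian elimination" concrete, here is a minimal sketch that works over GF(2), where a linear combination is an XOR of a subset of packets. The choice of GF(2), the integer packet representation, and the function names `encode` and `decode` are assumptions for illustration; the abstract does not specify AdapCode's field or packet format, and the adaptive selection of the coding scheme is not modeled.

```python
import random
from typing import Dict, List, Optional, Tuple

# A coded packet: (coefficient bitmask, XOR of the selected source packets).
Coded = Tuple[int, int]


def encode(packets: List[int], rng: random.Random) -> Coded:
    """Pick a random nonzero subset of the packets and XOR them together."""
    n = len(packets)
    mask = rng.randrange(1, 1 << n)
    payload = 0
    for i in range(n):
        if (mask >> i) & 1:
            payload ^= packets[i]
    return mask, payload


def decode(coded: List[Coded], n: int) -> Optional[List[int]]:
    """Gaussian elimination over GF(2).

    Returns the n source packets, or None if the received combinations
    do not yet span all n of them.
    """
    pivots: Dict[int, Coded] = {}  # pivot bit -> fully reduced row
    for mask, payload in coded:
        # Clear every existing pivot bit from the incoming row.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload ^= ppayload
        if mask == 0:
            continue  # linearly dependent on rows already held
        new_bit = mask.bit_length() - 1
        # Clear the new pivot bit from the rows already held.
        for bit, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> new_bit) & 1:
                pivots[bit] = (pmask ^ mask, ppayload ^ payload)
        pivots[new_bit] = (mask, payload)
    if len(pivots) < n:
        return None
    return [pivots[i][1] for i in range(n)]
```

A receiver in this sketch keeps collecting coded packets and calls `decode` until it returns a list instead of `None`; a dependent packet is simply dropped during elimination.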
symposium on cloud computing | 2010
Steven Y. Ko; Imranul Hoque; Brian Cho; Indranil Gupta
Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. Under no failure, the performance of Hadoop augmented with ISS (i.e., job completion time) turns out to be comparable to base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead compared to base no-failure Hadoop, depending on the testbed setup.
network computing and applications | 2005
Dionysios Kostoulas; Dimitrios Psaltoulis; Indranil Gupta; Kenneth P. Birman; Alan J. Demers
Large-scale and dynamically changing distributed systems such as the grid, peer-to-peer overlays, etc., need to collect several kinds of global statistics in a decentralized manner. In this paper, we tackle a specific statistic collection problem called group size estimation, for estimating the number of non-faulty processes present in the global group at any given point of time. We present two new decentralized algorithms for estimation in dynamic groups, analyze the algorithms, and experimentally evaluate them using real-life traces. One scheme is active: it spreads a gossip into the overlay first, and then samples the receipt times of this gossip at different processes. The second scheme is passive: it measures the density of processes when their identifiers are hashed into a real interval. Both schemes have low latency, scalable per-process overheads, and provide high levels of probabilistic accuracy for the estimate. They are implemented as part of a size estimation utility called PeerCounter that can be incorporated modularly into standard peer-to-peer overlays. We present experimental results from both the simulations and PeerCounter, running on a cluster of 33 Linux servers.
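The idea behind the passive scheme can be sketched as follows: hash each identifier to a point in [0, 1), count the points that land in a sub-interval, and divide by the width of that sub-interval. This is only an illustration of density-based estimation, not PeerCounter's algorithm; the use of SHA-1, the `window` parameter, and the assumption that the estimating process knows every member whose point falls inside the window are all choices made for this example.

```python
import hashlib
from typing import Iterable


def hash_to_unit(identifier: str) -> float:
    """Map an identifier to a pseudo-uniform point in [0, 1)."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_group_size(known_ids: Iterable[str], window: float) -> float:
    """Estimate the group size from the density of points in [0, window).

    `window` is a hypothetical parameter: a smaller window needs less
    membership knowledge but gives a noisier estimate.
    """
    if not 0.0 < window <= 1.0:
        raise ValueError("window must be in (0, 1]")
    inside = sum(1 for ident in known_ids if hash_to_unit(ident) < window)
    return inside / window
```

With `window=1.0` the estimate is just an exact count; the approach becomes useful when each process tracks only the identifiers that hash into a small window.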
dependable systems and networks | 2002
Abhinandan Das; Indranil Gupta; Ashish Motivala
Several distributed peer-to-peer applications require weakly-consistent knowledge of process group membership information at all participating processes. SWIM is a generic software module that offers this service for large-scale process groups. The SWIM effort is motivated by the unscalability of traditional heartbeating protocols, which either impose network loads that grow quadratically with group size, or compromise response times or false positive frequency w.r.t. detecting process crashes. This paper reports on the design, implementation and performance of the SWIM sub-system on a large cluster of commodity PCs. Unlike traditional heartbeating protocols, SWIM separates the failure detection and membership update dissemination functionalities of the membership protocol. Processes are monitored through an efficient peer-to-peer periodic randomized probing protocol. Both the expected time to first detection of each process failure and the expected message load per member do not vary with group size. Information about membership changes, such as process joins, drop-outs and failures, is propagated via piggybacking on ping messages and acknowledgments. This results in a robust and fast infection style (also epidemic or gossip-style) of dissemination. The rate of false failure detections in the SWIM system is reduced by modifying the protocol to allow group members to suspect a process before declaring it as failed; this allows the system to discover and rectify false failure detections. Finally, the protocol guarantees a deterministic time bound to detect failures. Experimental results from the SWIM prototype are presented. We discuss the extensibility of the design to a WAN-wide scale.
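Two points from this abstract lend themselves to a small sketch: the quadratic load of all-to-all heartbeating compared with per-member probing, and the suspect-before-failed rule. The code below is a simplified, single-observer illustration and not the SWIM implementation. The class `MemberView`, the `suspicion_periods` timeout counted in protocol periods, and the load formulas (one ping plus one acknowledgment per member per period) are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"
    FAILED = "failed"


class MemberView:
    """One observer's view of one target (hypothetical helper)."""

    def __init__(self, suspicion_periods: int = 3) -> None:
        self.state = State.ALIVE
        self.suspicion_periods = suspicion_periods
        self._periods_suspected = 0

    def on_probe_result(self, acked: bool) -> State:
        """Update the view after one probe of the target."""
        if self.state is State.FAILED:
            return self.state  # failure is final in this sketch
        if acked:
            # An acknowledgment refutes any outstanding suspicion.
            self.state = State.ALIVE
            self._periods_suspected = 0
        elif self.state is State.ALIVE:
            self.state = State.SUSPECT
            self._periods_suspected = 0
        else:
            self._periods_suspected += 1
            if self._periods_suspected >= self.suspicion_periods:
                self.state = State.FAILED
        return self.state


def all_to_all_heartbeat_load(n: int) -> int:
    """Messages per period if every member heartbeats every other member."""
    return n * (n - 1)


def probe_load(n: int) -> int:
    """Messages per period if each member sends one ping and gets one ack."""
    return 2 * n
```

For a group of 100 members, these formulas give 9900 messages per period for all-to-all heartbeating against 200 for probing, which is the gap between quadratic and linear growth that motivates the design.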
international conference on distributed computing systems | 2005
Matthew Miller; Cigdem Sengul; Indranil Gupta
Networking protocols for multi-hop wireless sensor networks (WSNs) are required to simultaneously minimize resource usage and optimize performance metrics such as latency and reliability. This paper explores the energy-latency-reliability trade-off for broadcast in multi-hop WSNs, by presenting a new protocol called PBBF (probability-based broadcast forwarding). PBBF works at the MAC layer and can be integrated into any sleep scheduling protocol. For a given application-defined level of reliability for broadcasts, the energy required and latency obtained are found to be inversely related to each other. Our analysis and simulation study quantify this relationship at the reliability boundary, as well as performance numbers to be expected from a deployment. PBBF essentially offers a WSN application designer considerable flexibility in choice of desired operation points.
grid computing | 2005
Adeep S. Cheema; Moosa Muhammad; Indranil Gupta
Grid applications need to discover computational resources quickly, efficiently and scalably, but most importantly in an expressive manner. An expressive query may specify a variety of required metrics for the job, e.g., the number of hosts required, the amount of free CPU required on these hosts, and the minimum amount of RAM required on these hosts, etc. We present a peer-to-peer (P2P) solution to this problem, using structured naming to enable both (1) publishing of information about available computational resources, as well as (2) expressive and efficient querying of such resources. Extensive traces collected from hosts within the Computer Science department at UIUC are used to evaluate our proposed solution. Finally, our solutions are based upon a well known P2P system called Pastry, albeit for Grid applications; this is another step towards the much-needed convergence of Grid and P2P computing.