Is this you? Create Your Porfile

David R. Karger

Massachusetts Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where David R. Karger is active.

Explore More

Publication

Featured researches published by David R. Karger.

IEEE ACM Transactions on Networking | 2003

Chord: a scalable peer-to-peer lookup protocol for Internet applications

Ion Stoica; Robert Tappan Morris; David Liben-Nowell; David R. Karger; M. Frans Kaashoek; Frank Dabek; Hari Balakrishnan

A fundamental problem that confronts peer-to-peer applications is the efficient location of the node that stores a desired data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.

acm special interest group on data communication | 2001

Chord: A scalable peer-to-peer lookup service for internet applications

Ion Stoica; Robert Tappan Morris; David R. Karger; M. Frans Kaashoek; Hari Balakrishnan

A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

IEEE Transactions on Information Theory | 2006

A Random Linear Network Coding Approach to Multicast

Tracey Ho; Muriel Médard; Ralf Koetter; David R. Karger; Michelle Effros; Jun Shi; Ben Leong

We present a distributed random linear network coding approach for transmission and compression of information in general multisource multicast networks. Network nodes independently and randomly select linear mappings from inputs onto output links over some field. We show that this achieves capacity with probability exponentially approaching 1 with the code length. We also demonstrate that random linear coding performs compression when necessary in a network, generalizing error exponents for linear Slepian-Wolf coding in a natural way. Benefits of this approach are decentralized operation and robustness to network changes or link failures. We show that this approach can take advantage of redundant network capacity for improved success probability and robustness. We illustrate some potential advantages of random linear network coding over routing in two examples of practical scenarios: distributed network operation and networks with dynamically varying connections. Our derivation of these results also yields a new bound on required field size for centralized network coding on general multicast networks

symposium on the theory of computing | 1997

Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web

David R. Karger; Eric Lehman; Tom Leighton; Rina Panigrahy; Matthew S. Levine; Daniel M. Lewin

We describe a family of caching protocols for distrib-uted networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.

acm/ieee international conference on mobile computing and networking | 2000

A scalable location service for geographic ad hoc routing

Jinyang Li; John Jannotti; Douglas S. J. De Couto; David R. Karger; Robert Tappan Morris

GLS is a new distributed location service which tracks mobile node locations. GLS combined with geographic forwarding allows the construction of ad hoc mobile networks that scale to a larger number of nodes than possible with previous work. GLS is decentralized and runs on the mobile nodes themselves, requiring no fixed infrastructure. Each mobile node periodically updates a small set of other nodes (its location servers) with its current location. A node sends its position updates to its location servers without knowing their actual identities, assisted by a predefined ordering of node identifiers and a predefined geographic hierarchy. Queries for a mobile nodes location also use the predefined identifier ordering and spatial hierarchy to find a location server for that node. Experiments using the ns simulator for up to 600 mobile nodes show that the storage and bandwidth requirements of GLS grow slowly with the size of the network. Furthermore, GLS tolerates node failures well: each failure has only a limited effect and query performance degrades gracefully as nodes fail and restart. The query performance of GLS is also relatively insensitive to node speeds. Simple geographic forwarding combined with GLS compares favorably with Dynamic Source Routing (DSR): in larger networks (over 200 nodes) our approach delivers more packets, but consumes fewer network resources.

international acm sigir conference on research and development in information retrieval | 1992

Scatter/Gather: a cluster-based approach to browsing large document collections

Douglass R Cutting; David R. Karger; Jan O. Pedersen; John W Tukey

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval. We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs docum-ent clustering as its primary operation. We also present fast (linear time) clustering algorithm.

international symposium on information theory | 2003

The benefits of coding over routing in a randomized setting

Tracey Ho; Ralf Koetter; Muriel Médard; David R. Karger; Michelle Effros

A novel randomized network coding approach for robust, distributed transmission and compression of information in networks is presented, and its advantages over routing-based approaches is demonstrated.

Communications of The ACM | 2003

Looking up data in P2P systems

Hari Balakrishnan; M. Frans Kaashoek; David R. Karger; Robert Tappan Morris; Ion Stoica

The main challenge in P2P computing is to design and implement a robust and scalable distributed system composed of inexpensive, individually unreliable computers in unrelated administrative domains. The participants in a typical P2P system might include computers at homes, schools, and businesses, and can grow to several million concurrent participants.

human factors in computing systems | 2004

The perfect search engine is not enough: a study of orienteering behavior in directed search

Jaime Teevan; Christine Alvarado; Mark S. Ackerman; David R. Karger

This paper presents a modified diary study that investigated how people performed personally motivated searches in their email, in their files, and on the Web. Although earlier studies of directed search focused on keyword search, most of the search behavior we observed did not involve keyword search. Instead of jumping directly to their information target using keywords, our participants navigated to their target with small, local steps using their contextual knowledge as a guide, even when they knew exactly what they were looking for in advance. This stepping behavior was especially common for participants with unstructured information organization. The observed advantages of searching by taking small steps include that it allowed users to specify less of their information need and provided a context in which to understand their results. We discuss the implications of such advantages for the design of personal information management tools.

international workshop on peer to peer systems | 2003

Koorde: A simple degree-optimal distributed hash table

M. Frans Kaashoek; David R. Karger

Koorde is a new distributed hash table (DHT) based on Chord 15 and the de Bruijn graphs 2. While inheriting the simplicity of Chord, Koorde meets various lower bounds, such as O(log n) hops per lookup request with only 2 neighbors per node (where n is the number of nodes in the DHT), and O(log n/log log n) hops per lookup request with O(log n) neighbors per node.

Explore More