An Analytical Study of a Structured Overlay in the Presence of Dynamic Membership

Abstract

In this paper we present an analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a fluid model approach to describe steady-state or transient phenomena, and apply it to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of the probability of network disconnection as well as the fraction of failed or incorrect successor and finger pointers. We show how we can use these quantities to predict both the performance and consistency of lookups under churn. All theoretical predictions match simulation results. The analysis includes both features that are generic to structured overlays deploying a ring as well as Chord-specific details, and opens the door to a systematic comparative analysis of, at least, ring-based structured overlay systems under churn.


arXiv preprint [cs.NI]

Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, and Seif Haridi
Swedish Institute of Computer Science (SICS), Sweden
Department of Physics, KTH-Royal Institute of Technology, Sweden
IMIT, KTH-Royal Institute of Technology, Sweden
{supriya,sameh,eaurell,seif}@sics.se

I. INTRODUCTION

An intrinsic property of Peer-to-Peer systems is the process of never-ceasing dynamic membership. Structured Peer-to-Peer Networks (aka Distributed Hash Tables (DHTs)) have the underlying principle of arranging nodes in an overlay graph of known topology and diameter. This knowledge results in the provision of performance guarantees. However, dynamic membership continuously “corrupts/churns” the overlay graph and every DHT strives to provide a technique to “correct/maintain” the graph in the face of this perturbation.

Both theoretical and empirical studies have been conducted to analyze the performance of DHTs undergoing “churn” and simultaneously performing “maintenance”. Liben-Nowell et al. [11] prove a lower bound on the maintenance rate required for a network to remain connected in the face of a given dynamic membership rate. Aspnes et al. [3] give upper and lower bounds on the number of messages needed to locate a node/data item in a DHT in the presence of node or link failures. The value of such theoretical studies is that they provide insights neutral to the details of any particular DHT. Empirical studies have also been conducted to complement these theoretical studies by showing how, within the asymptotic bounds, the performance of a DHT may vary substantially depending on different DHT designs and implementation decisions. Examples include the work of Li et al. [10], Rhea et al. [14], and Rowstron et al. [5].

This work is funded by the European 6th FP EVERGROW project. © IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.

In this paper, we present a fluid model of Chord [15], a specific DHT, under churn. Fluid models have been used to model data communication systems at least since the early 1980s [2], and in some sense since the work of Erlang [4]. More recently, in the context of P2P systems, they have been used to model the performance of BitTorrent [13] and the Squirrel caching system [6]. This technique has much in common with macroscopic and mesoscopic descriptions of physical and chemical phenomena (from where the term fluid has obviously been borrowed), and carries the same advantages of conciseness and computability relative to an underlying more exact description. Our analysis is directly based on the master equation approach of physical kinetics, see e.g. the textbook [12], which provides a scheme for taking the various dynamical processes involved systematically into account.

The fluid model requires the notion of a state of the system. This is just a listing of the quantities one would need to know for a description of the system at a given level of detail. For Chord, we use grosso modo a level of description which requires keeping track of how many nodes there are in the system and what the state (whether correct, incorrect or failed) of each of the pointers of those nodes is. This information is not enough to draw a unique graph of network connections because, for example, if we know that a given node has an 'incorrect' successor pointer, this still does not tell us which node it is pointing to. However, as we will see, beginning at this level of description is sufficient to keep track of most of the details of the Chord protocols. Having defined a state, the fluid model is simply a set of equations for the evolution of the probability of finding the system in this state, given the details of the dynamics. The master equation approach is useful for keeping track of the contribution of all the events which can bring about changes in the probability in a micro-instant of time, i.e., evaluating all the terms in the dynamics leading to a gain or loss of this probability.

Using this formalism we investigate a probabilistic model in which peers arrive independently, distributed as a Poisson process, and life-times are exponentially distributed. While this setup is not necessarily fully realistic (more realistic models can also be analyzed using master equation techniques), it is standard in modeling, as it typically brings out the salient features of the system with as few obscuring details from the probabilistic model as possible. We then derive the functional forms of the following: (i) Chord-specific inter-node distribution properties and (ii) for every outgoing pointer of a Chord node, the probability that it is in any one of its possible states. This probability is different for each of the successor and finger pointers. We then use this information to predict other quantities such as (iii) the probability that the network gets disconnected, (iv) lookup consistency (number of failed lookups), and (v) lookup performance (latency). All quantities are computed as a function of the parameters involved and all results are verified by simulations.

II. RELATED WORK

Closest in spirit to our work is the informal derivation in the original Chord paper [15] of the average number of timeouts encountered by a lookup. This quantity was approximated there by the product of the average number of fingers used in a lookup and the probability that a given finger points to a departed node. Our methodology not only allows us to derive the latter quantity systematically but also demonstrates how this probability depends on which finger (or successor) is involved. Further, we are able to derive a precise relation between this probability and lookup performance and consistency, accurate at any value of the system parameters.

In the works of Aberer et al. [1] and Wang et al. [16], DHTs are analyzed under churn and the results are compared with simulations. These analyses can also be classified as fluid models. However, the main parameter there is the probability that a randomly selected entry of a routing table is stale. In our analysis, we determine this quantity from system details and churn rates.

A brief announcement of the results presented in this paper has appeared earlier in [8].

III. OUR IMPLEMENTATION OF CHORD

The Chord Ring.

The general philosophy of DHTs is to map a set of data items onto a set of nodes, where the insertion and lookup of items is done using the unique keys that the items are given. Chord's realization of that philosophy is as follows. Peers and data items are given unique keys (usually obtained by a cryptographic hash of a unique attribute such as the IP address or public key for nodes, and filename or checksum for items) drawn from a circular key space of size $K$. The Chord system dictates that the right place for storing an item is at the first alive node whose key succeeds the key of the item. Since we refer to nodes and items by their keys, the insertion and lookup of items becomes a matter of locating the right “successor” of a key. All nodes have successor and predecessor pointers. For $N$ nodes, using only the successor pointers to look up items requires $N/2$ hops on average.

Fingers.

To reduce the average lookup path length, nodes keep $M = \log_2 K$ pointers known as the “fingers”. Using these fingers, a node can retrieve any key in $O(\log N)$ hops. The fingers of a node $n$ (where $n \in \{0, \dots, K-1\}$) point to exponentially increasing distances of keys away from $n$. That is, $\forall i \in 1..M$, $n$ points to a node whose key is equal to $n + 2^{i-1}$. We denote that key by $n.fin_i.start$. However, for a certain $i$, there might not be a node in the network whose key is equal to $n + 2^{i-1}$. Therefore, $n$ points to the first successor of $n + 2^{i-1}$, which we denote by $n.fin_i.node$.

The Successor List

Moreover, each node keeps a list of the $S = O(\log N)$ immediate successors as backups for its first successor. We use the notation $n.s$ to refer to this list and $n.s_i$ to refer to the $i$th element in the list. Finally, we use the notation $n.p$ to refer to the predecessor.

Stabilization, Churn & Steady State.

To keep the pointers up-to-date in the presence of churn, each node performs periodic stabilization of its successors and fingers. In our analysis, we define $\lambda_j$ as the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. The fraction of stabilizations which act on the successors is $\alpha$, such that the rate of successor stabilizations is $\alpha\lambda_s$, and the rate of finger stabilizations is $(1-\alpha)\lambda_s$. In all that follows, we impose the steady state condition $\lambda_j = \lambda_f$ unless otherwise stated. Further, it is useful to define $r \equiv \lambda_s/\lambda_f$, which is the relevant ratio on which all the quantities we are interested in will depend; e.g., $r = 50$ means that a join/fail event takes place every half an hour for a stabilization which takes place once every 36 seconds. Throughout the paper we will use the terms $\lambda_j N \Delta t$, $\lambda_f N \Delta t$, $\alpha\lambda_s N \Delta t$ and $(1-\alpha)\lambda_s N \Delta t$ to denote the respective probabilities that a join, a failure, a successor stabilization, or a finger stabilization takes place anywhere on the ring during a micro period of time of length $\Delta t$.

Parameters.
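As an illustration of how the rates $\lambda_j$, $\lambda_f$, $\lambda_s$ and the fraction $\alpha$ combine, the following minimal Python sketch collects them in one place and evaluates the per-micro-period event probabilities. The class name, field names and the numerical values (chosen so that $r = 50$) are illustrative assumptions, not part of the simulator described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChurnRates:
    """Per-node event rates in events per second (illustrative names)."""
    lambda_j: float  # rate of joins per node
    lambda_f: float  # rate of failures per node
    lambda_s: float  # rate of stabilizations per node
    alpha: float     # fraction of stabilizations acting on successors

    @property
    def r(self) -> float:
        """Ratio of stabilization rate to failure rate."""
        return self.lambda_s / self.lambda_f

    def event_probabilities(self, n_nodes: int, dt: float) -> dict:
        """Probability of each event type anywhere on the ring during dt."""
        probs = {
            "join": self.lambda_j * n_nodes * dt,
            "failure": self.lambda_f * n_nodes * dt,
            "successor_stabilization": self.alpha * self.lambda_s * n_nodes * dt,
            "finger_stabilization": (1 - self.alpha) * self.lambda_s * n_nodes * dt,
        }
        total = sum(probs.values())
        if total > 1.0:
            # dt is too coarse to be a micro period for this system size.
            raise ValueError("dt too large: event probabilities exceed 1")
        probs["no_event"] = 1.0 - total
        return probs


# Assumed example: one join/failure per node every 1800 s, one
# stabilization per node every 36 s, steady state lambda_j == lambda_f.
rates = ChurnRates(lambda_j=1 / 1800, lambda_f=1 / 1800, lambda_s=1 / 36, alpha=0.5)
probs = rates.event_probabilities(n_nodes=1000, dt=1e-3)
```

The explicit check on the total makes the micro-period assumption visible: $\Delta t$ must be small enough that at most one event occurs anywhere on the ring.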

The parameters of the problem are hence: $K$, $N$, $\alpha$ and $r$. All relevant measurable quantities should be entirely expressible in terms of these parameters.

Simulation

Since we are collecting statistics like the probability of a particular finger pointer to be wrong, we need to repeat each experiment times before obtaining well-averaged results. The total simulation sequential real time for obtaining the results of this paper was about hours that was parallelized on a cluster of nodes where we had N = 1000 , K = 2 , S = 6 , ≤ r ≤ and . ≤ α ≤ .

While the main outlines of the Chord protocol are provided by its authors in [15], an exact analysis necessitates the provision of a deeper level of detail and adopted assumptions, which we provide in the following subsections.

A. Joins, Failures & Ring Stabilization
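The node state introduced in this section (finger starts, finger nodes and the successor list) can be made concrete with a minimal Python sketch on a static, fully correct ring. The function names and the small example ring ($K = 64$, ten nodes) are illustrative assumptions and are unrelated to the simulation parameters used in the paper.

```python
from bisect import bisect_left


def successor(nodes, key, K):
    """First node whose key is >= key, wrapping around the ring of size K.

    `nodes` must be a sorted list of distinct node keys.
    """
    idx = bisect_left(nodes, key % K)
    return nodes[idx % len(nodes)]


def finger_table(n, nodes, K):
    """Return (i, fin_i.start, fin_i.node) for i = 1..M, with M = log2(K)."""
    M = K.bit_length() - 1  # exact for K a power of two
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % K
        table.append((i, start, successor(nodes, start, K)))
    return table


def successor_list(n, nodes, S):
    """The S immediate successors of node n on the ring."""
    idx = nodes.index(n)
    return [nodes[(idx + j) % len(nodes)] for j in range(1, S + 1)]


K = 64
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
fingers_of_8 = finger_table(8, nodes, K)
```

For node 8 the finger starts are 9, 10, 12, 16, 24 and 40, so several of the small fingers coincide on the same node; this repetition of entries is what the analysis of finger pointers has to account for.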

Initialization.

Initially, a node knows its key and at least one node with key $c$ that already exists in the network and is alive. The knowledge of such a node is assumed to be acquired through some out-of-band method. The predecessor $p$, successors ($s_{1..S}$) and fingers ($fin_{1..M}.node$) are all assigned to nil.

Joins (Fig. 1). A new node $n$ joins by looking up its successor using the initial random contact node $c$. It also starts its first stabilization of the successors and initializes its fingers.

Stabilization of Successors (Fig. 1). The function fixSuccessors is triggered periodically with rate $\alpha\lambda_s$. A node $n$ tells its first alive successor $y$ that it believes itself to be $y$'s predecessor and expects as an answer $y$'s predecessor $y.p$ and successors $y.s$. The response of $y$ can lead to three actions:

Case A. Some node exists between $n$ and $y$ (i.e., $n$'s belief is wrong), so $n$ prepends $y.p$ to its successor list as a first successor and retries fixSuccessors.

Case B. $y$ confirms $n$'s belief and informs $n$ of $y$'s old predecessor $y.p$. Therefore $n$ considers $y.p$ as an alternative/initial predecessor for $n$. Finally, $n$ reconciles its successor list with $y.s$.

Case C. $y$ agrees that $n$ is its predecessor and the only task of $n$ is to update its successor list by reconciling it with $y.s$.

    n.join(c)
        s_1 = c.findSuccessor(n)
        fixSuccessors()
        initFingers(s_1)

    n.fixSuccessors()
        y = firstAliveSuccessor()
        {y.p, y.s} = y.iThinkIamYourPred(n)
        if (y.p ∈ (me, y))              // Case A
            prepend(y.p)
            fixSuccessors()
        elsif (y.p ∈ (y, me))           // Case B
            considerANewPred(y.p)
            reconcile(y.s)
        else                            // Case C: y.p == me
            reconcile(y.s)

    n.firstAliveSuccessor()
        while (true)
            if (s_1 == nil)
                // Broken Ring!!
            if (isAlive(s_1))
                return (s_1)
            ∀ i ∈ 1..(S − 1): s_i = s_{i+1}
            s_S = nil

    n.iThinkIamYourPred(x)
        if (isNotAlive(p) or (p == nil))
            p = x
            return ({s, x})
        if (x ∈ (p, me))
            oldp = p
            p = x
            return ({s, oldp})
        else
            return ({s, p})

    n.considerANewPred(x)
        if (isNotAlive(p) or (p == nil) or (x ∈ (p, n)))
            p = x

    n.reconcile(s′)
        for i = 1..(S − 1)
            s_{i+1} = s′_i

    n.prepend(y)
        for i = S..2
            s_i = s_{i−1}
        s_1 = y

Fig. 1. Joins and ring stabilization algorithms.

By calling iThinkIamYourPred (Fig. 1), some node $x$ informs $n$ that it believes itself to be $n$'s predecessor. If $n$'s predecessor $p$ is not alive or nil, then $n$ accepts $x$ as a predecessor and informs $x$ about this agreement by returning $x$. Alternatively, if $n$'s predecessor $p$ is alive (how this is discovered is explained in Section III-C), then there are two possibilities. The first is that $x$ is in the region between $n$ and its current predecessor $p$; therefore $n$ should accept $x$ as a new predecessor and inform $x$ about its old predecessor. The second is that $p$ is already pointing to $x$, so the state is correct at both parties and $n$ confirms that to $x$ by informing it that $x$ is the predecessor of $n$. In all cases the function returns a predecessor and a successor list.

The function firstAliveSuccessor (Fig. 1) iterates through the successor list. In each iteration, if the first successor $s_1$ is alive, it is returned. Otherwise, the dead successor is dropped from the list and nil is appended to the end of the list. If the first successor is nil, this means that all immediate successors are dead and that the ring is disconnected.

    n.initFingers(s_1)
        f′ = s_1.f
        ∀ i ∈ 1..M s.th. (fin_i.start ∈ (n, s_1]):  fin_i.node = s_1
        ∀ j ∈ 1..M s.th. (fin_j.start ∉ (n, s_1]):  fin_j.node = localSuccessor(f′, fin_j.start)

    n.localSuccessor(f, k)
        for i = 1..M
            if (k ∈ (n, fin_i])
                return (fin_i)
        return (nil)

    n.fixFingers(k)
        i = random()                    // random finger index, i ≤ M
        fin_i.node = findSuccessor(fin_i.start)

Fig. 2. Initialization and stabilization of fingers.

B. Lookups and Stabilization of Fingers

Stabilization of Fingers (Fig. 2). Stabilization of fingers occurs at a rate $(1-\alpha)\lambda_s$. Each time the fixFingers function is triggered, a random finger $fin_i$ is chosen, a lookup for $fin_i.start$ is performed, and the result is used to update $fin_i.node$.

Initialization of Fingers (Fig. 2). After having initialized its first successor $s_1$, a node $n$ sets all fingers with starts between $n$ and $s_1$ to $s_1$. The rest of the fingers are initialized by taking a copy of the finger table of $s_1$ and finding an approximate successor to every finger from that finger table.

Lookups (Fig. 3). A lookup operation is a fundamental operation that is used to find the successor of a key. It is used by many other routines, and its performance and consistency are the main quantities of interest in the evaluation of any DHT. A node $n$ looking up the successor of $k$ runs the findSuccessor algorithm, which can lead to the following cases:

Case A. If $k$ is equal to $n$ then $n$ is trivially the successor of $k$.

Case B. If $k \in (n, s_1]$ then $n$ has found the successor of $k$, but it could be that $s_1$ has failed and $n$ has not yet discovered this. However, entries in the successor list can act as backups for the first successor. Therefore, the first alive successor of $n$ is the successor of $k$. Note that, in this case, while we try to find the first alive successor, we do not change the entries in the successor list. This is mainly because, to simplify the analysis, we want the successor list to be changed at a fixed rate $\alpha\lambda_s$ only by the fixSuccessors function.

Case C. The lookup should be forwarded to a node closer to $k$, namely the closest alive finger preceding $k$ in $n$'s finger table. The call to the function closestAlivePrecedingFinger returns such a node if possible and the lookup is forwarded to it. However, it could be the case that all fingers preceding $k$ are dead. In that case, we need to use the successor list as a last resort for the lookup. Therefore, we locate the first alive successor $y$, and if $k \in (n, y]$ then $y$ is the successor of $k$. Otherwise, we locate the closest alive preceding successor to $k$ and forward the lookup to it.

    n.findSuccessor(k)
        // Case A: k is exactly equal to n
        if (k == n)
            return (n)
        // Case B: k is between n and s_1
        if (k ∈ (n, s_1])
            return (firstAliveSuccessorNoChange());
        // Case C: Forward the lookup to
        // the closest preceding alive finger
        cpf = closestAlivePrecedingFinger(k);
        if (cpf == nil)
            y = firstAliveSuccessorNoChange();
            if (k ∈ (n, y])
                return (y);
            cpf = closestAlivePrecedingSucc(k);
            return (cpf.findSuccessor(k))
        else
            return (cpf.findSuccessor(k));

    n.firstAliveSuccessorNoChange()
        i = 1
        while (true)
            if (s_i == nil)
                // Broken Ring!!
            if (isAlive(s_i))
                return (s_i)
            i++

    n.closestAlivePrecedingFinger(k)
        for i = M..1
            if ((fin_i ∈ (n, k)) and (fin_i ≠ nil) and isAlive(fin_i))
                return (fin_i)
        return (nil)

    n.closestAlivePrecedingSucc(k)
        for i = S..1
            if ((s_i ∈ (n, k)) and (s_i ≠ nil) and isAlive(s_i))
                return (s_i)
        return (cpf)

Fig. 3. The lookup algorithm.

C. Failures
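The three lookup cases of Fig. 3 can be exercised with a small in-memory Python sketch. It is a simplified model under stated assumptions, not the simulator used in this paper: liveness is a plain set-membership test, pointers are built once for a correct ring and then left stale when a node is removed from the `alive` set, and all names (`build`, `find_successor`, `NODES`, and so on) are hypothetical.

```python
from bisect import bisect_left

K, M, S = 64, 6, 3  # illustrative key-space size, finger count, list length


def dist(a, b):
    """Clockwise distance from key a to key b on the ring."""
    return (b - a) % K


def in_open(x, a, b):
    """x in the ring interval (a, b); (a, a) means every key except a."""
    return 0 < dist(a, x) < (dist(a, b) or K)


def in_half_open(x, a, b):
    """x in the ring interval (a, b]."""
    return x == b or in_open(x, a, b)


def build(nodes):
    """Correct successor lists and finger tables for sorted node keys."""
    def succ(key):
        return nodes[bisect_left(nodes, key % K) % len(nodes)]
    return {
        n: {
            "s": [nodes[(idx + j) % len(nodes)] for j in range(1, S + 1)],
            "fin": [succ(n + 2 ** (i - 1)) for i in range(1, M + 1)],
        }
        for idx, n in enumerate(nodes)
    }


def first_alive_successor(state, alive, n):
    """First alive entry of n's successor list; the list is not modified."""
    for s in state[n]["s"]:
        if s in alive:
            return s
    raise RuntimeError("broken ring at node %d" % n)


def closest_alive_preceding(pointers, alive, n, k):
    """Scan from the farthest pointer down; first alive one inside (n, k)."""
    for q in reversed(pointers):
        if q in alive and in_open(q, n, k):
            return q
    return None


def find_successor(state, alive, n, k, hops=0):
    """Return (successor of k, number of forwarding hops)."""
    if k == n:                                            # Case A
        return n, hops
    if in_half_open(k, n, state[n]["s"][0]):              # Case B
        return first_alive_successor(state, alive, n), hops
    cpf = closest_alive_preceding(state[n]["fin"], alive, n, k)  # Case C
    if cpf is None:
        y = first_alive_successor(state, alive, n)
        if in_half_open(k, n, y):
            return y, hops
        cpf = closest_alive_preceding(state[n]["s"], alive, n, k)
    return find_successor(state, alive, cpf, k, hops + 1)


NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
STATE = build(NODES)
```

With node 21 removed from the alive set but still present in other nodes' pointers, a lookup for key 20 started at node 8 is forwarded once to node 14 and then resolved through the successor list of node 14, which is the backup role of the successor list described in Case B.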

Throughout the code we use the calls isAlive and isNotAlive. A simple interpretation of these routines would be to equate them with a ping. A more accurate implementation, however, discovers liveness by performing the required operation itself. For instance, a call to firstAliveSuccessor in Fig. 1 is performed to retrieve a node $y$ and then call y.iThinkIamYourPred, so alternatively the first alive successor could be discovered by iterating on the successor list and calling iThinkIamYourPred.

IV. THE ANALYSIS

A. Distributional Properties of Inter-Node Distances

In this section we will assume that all keys are populated by peers with independent and equal probability, and, furthermore, that this probability does not change with time. The first condition is a natural consequence of peers joining and leaving/failing independently. The last condition, on the other hand, does not hold strictly, since the number of peers present under churn is a fluctuating quantity. Nevertheless, it can be expected to hold to good accuracy in sufficiently large systems. A detailed analysis along these lines will be given elsewhere.
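A minimal Python sketch of this population model may help fix ideas. It assumes each of the $K$ keys is occupied independently with probability $N/K$ and returns the clockwise distances between consecutive occupied keys; the function name, the seed and the example sizes are illustrative assumptions.

```python
import random


def sample_gaps(K, N, seed=0):
    """Occupy each of K keys independently with probability N/K and
    return the ring distances between consecutive occupied keys."""
    rng = random.Random(seed)
    p = N / K
    occupied = [key for key in range(K) if rng.random() < p]
    # Pair every occupied key with the next one, wrapping around the ring.
    # A lone node is at distance K from itself.
    return [((b - a) % K) or K
            for a, b in zip(occupied, occupied[1:] + occupied[:1])]


gaps = sample_gaps(K=2 ** 16, N=1000, seed=1)
mean_gap = sum(gaps) / len(gaps)
```

Because the gaps tile the ring, they always sum to $K$, while the number of occupied keys fluctuates around $N$ from one sample to the next; this fluctuation is the reason the equal-probability condition can only hold approximately.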

Definition 4.1:

Given two keys $u, v \in \{0, \dots, K-1\}$, the “distance” between them is $u - v$ (with modulo-$K$ arithmetic). We interchangeably say that $u$ and $v$ form an “interval” of length $u - v$. Hence the number of keys inside an interval of length $\ell$ is $\ell - 1$ keys.

Property 4.1:

The probability $P(x)$ of finding an interval of length $x$ is $P(x) = \rho^{x-1}(1-\rho)$, where $\rho = \frac{K-N}{K}$. Under the stated conditions, each key will be populated with the same probability $\frac{N}{K} = 1 - \rho$, for $N \ll K$. An interval of length $x$ then involves $x - 1$ consecutive unpopulated keys, and then one populated key, which explains the formula.

Fig. 4. (a) Case when $n$ and $p$ have the same value of $fin_k.node$. (b) Case where a newly joined node $p$ copies the $k$th entry of its successor node $n$ as the best approximation for its own $k$th entry (by the join protocol). In this case, there could be a node $o$ which is the 'correct' entry for $p.fin_k.node$. However, since $p$ is newly joined, the only information it has access to is the finger table of $n$.

We now derive some properties of this distribution which will be used in the ensuing analysis.

Property 4.2:

For any two keys $u$ and $v$, where $v = u + x$, let $b_i$ be the probability that the first node encountered in between these two keys is at $u + i$ (where $0 \le i < x$). Then $b_i \equiv \rho^{i}(1-\rho)$. The probability that there is definitely at least one node between $u$ and $v$ is $a(x) \equiv 1 - \rho^{x}$. Hence the conditional probability that the first node is at a distance $i$, given that there is at least one node in the interval, is $bc(i, x) \equiv b_i / a(x)$.

Property 4.3:

The probability that a node and at least one of its immediate predecessors share the same $k$th finger is $p_1(k) \equiv \frac{\rho}{1+\rho}\left(1 - \rho^{2^{k}-2}\right)$. The explanation for this property goes as follows. If the distance between node $n$ and its predecessor $p$ is $x$, the distance between $n.fin_k.start$ and $p.fin_k.start$ is also $x$ (see Fig. 4(a)). If there is no node in between $n.fin_k.start$ and $p.fin_k.start$, then $n.fin_k.node$ and $p.fin_k.node$ will share the same value. From Property 4.1, the probability that the distance between $n$ and $p$ is $x$ is $\rho^{x-1}(1-\rho)$. However, $x$ has to be less than $2^{k-1}$, otherwise $p.fin_k.node$ will be equal to $n$. The probability that no node exists between $n.fin_k.start$ and $p.fin_k.start$ is $\rho^{x}$ (by Property 4.2). Therefore the probability that $n.fin_k.node$ and $p.fin_k.node$ share the same value is:

$$\sum_{x=1}^{2^{k-1}-1} \rho^{x-1}(1-\rho)\,\rho^{x} = \rho(1-\rho)\sum_{x=1}^{2^{k-1}-1}\rho^{2(x-1)} = \rho(1-\rho)\,\frac{1-\rho^{2^{k}-2}}{1-\rho^{2}} = \frac{\rho}{1+\rho}\left(1 - \rho^{2^{k}-2}\right).$$

It is straightforward (though tedious) to derive similar expressions for $p_2(k)$, the probability that a node and at least two of its immediate predecessors share the same $k$th finger, $p_3(k)$, and so on.

Property 4.4:

We can similarly assess the probability that the join protocol (see Section III-B) results in further replication of the $k$th pointer. Let us define $p_{join}(i, k)$ as the probability that a newly joined node chooses the $i$th entry of its successor's finger table for its own $k$th entry. Note that this is unambiguous even in the case that the successor's $i$th entry is repeated. All we are asking is: when is the $k$th entry of the new joinee the same as the $i$th entry of the successor? Clearly $i \le k$. In fact, for the larger fingers, we only need to consider $p_{join}(k, k)$, since $p_{join}(i, k) \sim 0$ for $i < k$. Using the interval distribution we find, for large $k$,

$$p_{join}(k, k) \sim \rho\left(1 - \rho^{2^{k-2}-2}\right) + (1-\rho)\left(1 - \rho^{2^{k-2}-2}\right) - (1-\rho)\,\rho\,(2^{k-2} - 2)\,\rho^{2^{k-2}-3}.$$

This function goes to 1 for large $k$.

We can also analogously compute $p_{join}(i, k)$ for any $i$. The only trick here is to estimate the probability that, starting from $i$, the last distinct entry of $n$'s finger table does not give $p$ a better choice for its $k$th entry. This can again readily be computed using Property 4.2, but we do not do the computation here since for our purposes $p_{join}(k, k)$ suffices.

Fig. 5. Changes in $W_1$, the number of wrong (failed or outdated) $s_1$ pointers, due to joins, failures and stabilizations.

B. Successor Pointers

We now turn to estimating various quantities of interest forChord. In all that follows we will evaluate various average quantities, as a function of the parameters. To do this weneed to understand how the dynamical evolution of the systemaffects these quantities.In the case of Chord, we only need to consider one of threekinds of events happening at any micro-instant: a join, a failureor a stabilization. One assumption made in the following isthat such a micro-instant of time exists, or in other words,that we can divide time till we have an interval small enoughthat in this interval, only one of these three processes occursanywhere in the system. Implicit in this is the assumption thata stabilization (either of successors or ﬁngers) is done fasterthan the time-scales over which joins and fails occur.Another aspect of this system which simpliﬁes analysis isthat successor pointers of adjacent nodes are independent ofeach other. That is, the state of the ﬁrst successor pointer ofa given node does not affect the state of the ﬁrst successorpointer of either its predecessor or its successor. The samelogic also works for the state of the second successor pointersof adjacent nodes and so on. On the other hand, the state of

TABLE IG

AIN AND LOSS TERMS FOR W ( r, α ) : THE NUMBER OF WRONG FIRSTSUCCESSORS AS A FUNCTION OF r AND α .Change in W ( r, α ) Probability of Occurrence W ( t + ∆ t ) = W ( t ) + 1 c . = ( λ j N ∆ t )(1 − w ) W ( t + ∆ t ) = W ( t ) + 1 c . = λ f N (1 − w ) ∆ tW ( t + ∆ t ) = W ( t ) − c . = λ f Nw ∆ tW ( t + ∆ t ) = W ( t ) − c . = αλ s Nw ∆ tW ( t + ∆ t ) = W ( t ) 1 − ( c . + c . + c . + c . ) the second successor pointer of a node is clearly related to thestate of its ﬁrst successor pointer as well the state of the ﬁrstsuccessor pointer of the successor. This is taken into accountin the analysis of second and higher successor pointers. Incharacterizing the states of higher successors, we look for theleading order behavior in terms of the parameter r .Consider ﬁrst the successor pointers. Let w k ( r, α ) denotethe fraction of nodes having a wrong k th successor pointerand d k ( r, α ) the fraction of nodes having a failed successorpointer. Also, let W k ( r, α ) be the number of nodes havinga wrong k th successor pointer and D k ( r, α ) the number ofnodes having a failed successor pointer. A failed pointer isone which points to a departed node while a wrong pointerpoints either to an incorrect node (alive but not correct) or adead one. As we will see, both these quantities play a role inpredicting lookup consistency and lookup length.By the protocol for stabilizing successors in Chord, a nodeperiodically contacts its ﬁrst successor, possibly correcting itand reconciling with its successor list. Therefore, the numberof wrong k th successor pointers are not independent quantitiesbut depend on the number of wrong ﬁrst successor pointers.We write an equation for W ( r, α ) by accounting for allthe events that can change it in a micro event of time ∆ t . Anillustration of the different cases in which changes in W takeplace due to joins, failures and stabilizations is provided inFig. 5. In some cases W increases/decreases while in others itstays unchanged. 
For each increase/decrease, Table I providesthe corresponding probabilities.By our implementation of the join protocol, a new node n y ,joining between two nodes n x and n z , always has a correct s pointer after the join. However the state of n x .s beforethe join makes a difference. If n x .s was correct (pointingto n z ) before the join, then after the join it will be wrongand therefore W increases by . If n x .s was wrong beforethe join, then it will remain wrong after the join and W isunaffected. Thus, we need to account for the former case only.The probability that n x .s is correct is − w and term c . follows from this.For failures, we have cases. To illustrate them we usenodes n x , n y , n z and assume that n y is going to fail. First,if both n x .s and n y .s were correct, then the failure of n y will make n x .s wrong and hence W increases by . Second,if n x .s and n y .s were both wrong, then the failure of n y will decrease W by one, since one wrong pointer disappears.Third, if n x .s was wrong and n y .s was correct, then W is unaffected. Fourth, if n x .s was correct and n y .s waswrong, then the wrong pointer of n y disappears and n x .s becomes wrong, therefore W is unaffected. For the ﬁrst case w (r , α ) , d (r , α ) Rate of Stabilisation /Rate of failure (r= λ s / λ f )w (r,0.25) Simulationw (r,0.5) Simulationw (r,0.75) Simulationw (r, ) Theoryw (r, ) Theoryw (r, ) Theoryd (r,0.75) Simulationd (r, 0.75) Theory Fig. 6T

HEORY AND SIMULATION FOR THE PROBABILITY OF WRONG st SUCCESSOR w ( r, α ) AND FAILED st SUCCESSOR d ( r, α ) . to happen, we need to pick two nodes with correct pointers,the probability of this is (1 − w ) . For the second case tohappen, we need to pick two nodes with wrong pointers, theprobability of this is w . From these probabilities follow theterms c . and c . .Finally, a successor stabilization does not affect W , unlessthe stabilizing node had a wrong pointer. The probability ofpicking such a node is w . From this follows the term c . .Hence the equation for W ( r, α ) is: dW N dt = λ j (1 − w ) + λ f (1 − w ) − λ f w − αλ s w Solving for w in the steady state and putting λ j = λ f , weget: w ( r, α ) = 23 + rα ≈ rα (1)This expression matches well with the simulation resultsas shown in Fig. 6. d ( r, α ) is then ≈ w ( r, α ) since when λ j = λ f , about half the number of wrong pointers are incorrectand about half point to dead nodes. Thus d ( r, α ) ≈ rα whichalso matches well the simulations as shown in Fig. 6.The fraction of wrong second successors can be estimatedin an analogous manner. Consider, for a node n , the possiblestates of the successor, n.s , the successor of the successor, ∗ ( n.s ) .s , and the second successor, n.s . In a fully correctstate, ∗ ( n.s ) .s and n.s of course point to the same node.If in such a state either n.s or ∗ ( n.s ) .s becomes incorrectthrough the action of a join or a failure, then n.s is alsoincorrect. On the other hand, n.s cannot be corrected bythe stabilization protocol unless both n.s and ∗ ( n.s ) .s areboth already corrected. Hence, n.s is wrong if either n.s or ∗ ( n.s ) .s are wrong, and also if both n.s and ∗ ( n.s ) .s are correct, but n.s has not yet been corrected. If the numberof such non-stabilized conﬁgurations is N and the fraction is n , we have w = 2 w − w + n (2)To estimate n we consider how these conﬁgurations mightbe gained or lost. 
The gain term arises from stabilizations of conﬁgurations where n.s is correct but ∗ ( n.s ) .s is wrong.A stabilization performed by node n.s then results in thegain of a N conﬁguration. On the other hand, non-stabilizedconﬁgurations are lost either by a stabilization performedby node n (when it gets the correct successor list from itssuccessor and hence corrects n.s ), or by corrupting either n.s or ∗ ( n.s ) .s (by a join or failure). The latter possibilitygives terms of order r and we can ignore it in the limit thatstabilizations happens on a much faster time scale than joinsand failures ( i.e. , r much larger than unity). The equation for N is hence dN dt ≈ αλ s w (1 − w ) − αλ s n (3)which implies n ≈ w to order r . Thus, we have w ≈ r .For higher successors we reason similarly by consideringthe state of the k − st successor pointer of node n , the suc-cessor pointer of the k − st successor, and the k th successorpointer of node n . We can write a recursion equation for w k the fraction of nodes with wrong k th successor pointer w k = w + w k − − w k − w + n k (4)where n k is the density of conﬁgurations where the k − st successor pointer of node n and the ﬁrst successor pointer ofthe k − st successor are both correct, but this informationhas not yet been used to correct the k th successor pointer ofnode n . If node n does not as yet have the correct informationabout its k th successor, that means that either all the nodesin between n and its k − st successor have the correctinformation but node n has not as yet stabilized, or that thestabilization has propagated back from the k − st successorto some node in between but not as yet to n.s . To elaborateon this further, there is the case where the second successorpointer of the k − nd successor has not been corrected, thenthe case where this has been done, but the third successorpointer of the k − rd successor has not been corrected, andso on. 
Each of these is analogous to $n_2$, and each occurs with density $(1-w_{k-1})w_1$ if joins and failures are neglected compared to stabilizations. Hence, if to leading order in $1/r$ we have $w_k \sim c_k/(\alpha r)$, then

$$c_k = c_{k-1} + k c_1 \qquad (5)$$

which leads to

$$w_k \approx \frac{k(k+1)}{\alpha r} \qquad (6)$$

We note that this expression obviously depends on the details of the stabilization scheme, and is in principle only valid up to $k \sim \sqrt{r}$. As shown in Fig. 7, the agreement between theory and simulation is still quite reasonable at $k = 5$ and $r = 100$.

C. Break-up (Network Disconnection) Probability

We demonstrate below how calculating $d_k(r,\alpha)$, the fraction of nodes with dead $k$th pointers, helps in estimating the probability that the network gets disconnected for any value of $r$ and $\alpha$. Let $P_{bu}(n,r,\alpha)$ be the probability that $n$ consecutive nodes fail. If $n = S$, the length of the successor list, then clearly the node whose successor list this is gets disconnected from the network and the network breaks up. For the range of $r$ considered in Fig. 6, $P_{bu}(S,r,\alpha) \sim 0$. However, should we go lower, this starts becoming finite. The master equation analysis introduced here can be used to estimate $P_{bu}(n,r,\alpha)$ for any $1 \le n \le S$. We indicate how this might be done by first considering the case $n = 2$. Let $N_{bu}(2,r,\alpha)$ be the number of configurations in which a node has both $s_1$ and $s_2$ dead, and $P_{bu}(2,r,\alpha)$ be the fraction of such configurations. Table II indicates how this is estimated within the present framework.

Fig. 7. Theory and simulation for the probability of a wrong $k$th successor $w_k(r,\alpha)$. (Vertical axis: fraction of nodes with wrong $k$th successor; horizontal axis: rate of stabilisation of successors / rate of failure, $\alpha r = \alpha\lambda_s/\lambda_f$; curves for $\alpha = 0.5$.)

TABLE II. Gain and loss terms for $N_{bu}(2,r,\alpha)$: the number of nodes with dead first and second successors.

| Change in $N_{bu}(r,\alpha)$ | Probability of occurrence |
|---|---|
| $N_{bu}(t+\Delta t) = N_{bu}(t) + 1$ | $c_{2.1} = (\lambda_f N\Delta t)\, d_1(r,\alpha)$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t) + 1$ | $c_{2.2} = \lambda_f N\Delta t\,(1-d_1)\, d_2$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t) - 1$ | $c_{2.3} = \alpha\lambda_s N\Delta t\, P_{bu}(2,r,\alpha)$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t)$ | $1 - (c_{2.1} + c_{2.2} + c_{2.3})$ |

A join event does not affect this probability in any way, so we only need to consider the effect of failures or stabilization events. The term $c_{2.1}$ accounts for the situation when the first successor of a node is dead (which happens with probability $d_1(r,\alpha)$, as explained above). A failure event can then kill its second successor as well, and this happens with probability $c_{2.1}$. The second term is the situation that the first successor is alive (with probability $1-d_1$) but the second successor is dead (with probability $d_2$). The logic used to estimate $d_2$ (or $d_k$ in general) is very similar to the reasoning we used to estimate the $w_k$'s. So we have

$$d_k = d_1 + (k-1)d_1 = k d_1 \qquad (7)$$

Thus the $k$th successor of a node is dead if the $(k-1)$st successor's successor is dead, or if the $(k-1)$st successor's successor is not dead but the intermediate nodes think it is because they haven't stabilized. Hence $d_2 \sim 2/(\alpha r)$. This estimate for $d_2$ matches the simulation results very well, as shown in Fig. 8.

Fig. 8. Theory and simulation for the probability of failure of the 2nd successor, $d_2(r,\alpha)$. (Horizontal axis: rate of stabilisation / rate of failure, $r = \lambda_s/\lambda_f$.)

Coming back to counting the gain and loss terms for $N_{bu}(2,r,\alpha)$, a stabilization event reduces the number of such configurations by one if the node doing the stabilization had such a configuration to begin with.

Solving the equation for $N_{bu}(2,r,\alpha)$, one hence obtains $P_{bu}(2,r,\alpha) \sim 3/(\alpha r)^2$. As Fig. 9 shows, this is a precise estimate.

We can similarly estimate the probabilities for three consecutive nodes failing, etc., and hence also the general disconnection probability $P_{bu}(S,r,\alpha)$. In fact $P_{bu}(S,r,\alpha)$ may be written in terms of the $d_k(r,\alpha)$ as:

$$P_{bu}(S) = \frac{(S-1)!\,\sum_{i=1}^{S} d_i(r,\alpha)}{(\alpha r)^{S-1}} \qquad (8)$$

The logic behind this equation is similar to that used for solving for $P_{bu}(2)$, namely that for $S$ consecutive nodes to fail, any $S-1$ of the $S$ nodes should have failed first, and then a failure event kills the remaining node. Equation (8) is readily solved by substituting the values of the $d_k$'s to get

$$P_{bu}(S) = \frac{(S+1)!}{2(\alpha r)^{S}} \qquad (9)$$

As mentioned above, this is again correct only to leading order. Namely, there will be correction terms of order $1/r^{S+1}$ which we haven't computed at this level of approximation. The master equation formalism thus affords the possibility of making a precise prediction for when the system runs the danger of getting disconnected, as a function of the parameters.

Lookup Consistency

By the lookup protocol, a lookup is inconsistent if the immediate predecessor of the sought key has a wrong $s_1$ pointer. However, we need only consider the case when the $s_1$ pointer is pointing to an alive (but incorrect) node, since our implementation of the protocol always requires the lookup to return an alive node as an answer to the query. The probability that a lookup is inconsistent, $I(r,\alpha)$, is hence $w_1(r,\alpha) - d_1(r,\alpha)$. This prediction matches the simulation results very well, as shown in Fig. 10.

Fig. 9. Theory and simulation for the break-up probability $P_{bu}(2,r,\alpha)$. (Horizontal axis: rate of stabilisation of successors / rate of failure, $\alpha r = \alpha\lambda_s/\lambda_f$; curves for $\alpha = 0.25$, $0.5$ and $0.75$.)

Fig. 10. Theory and simulation for inconsistent lookups $I(r,\alpha)$. (Horizontal axis: rate of stabilisation of successors / rate of failure, $\alpha r = \alpha\lambda_s/\lambda_f$; simulation curves for $\alpha = 0.25$, $0.5$ and $0.75$, with the corresponding theory curves.)

D. Failure of Fingers

We now turn to estimating the fraction of finger pointers which point to failed nodes. As we will see, this is an important quantity for predicting lookups, since failed fingers cause timeouts and increase the lookup length. However, we only need to consider fingers pointing to dead nodes. Unlike members of the successor list, alive fingers, even if outdated, always bring a query closer to the destination and do not affect consistency, nor do they substantially affect the lookup length. Therefore we consider fingers in only two states, alive or dead (failed). In our implementation of the stabilization protocol (see Sections III-A and III-B), fingers and successors are stabilized entirely independently of each other to simplify the analysis. Thus even though the first finger is also always the first successor, this information is not used by the node in updating the finger. Fingers of nodes far apart are independent of each other. Fingers of adjacent nodes can be correlated, and we take this into account. The only assumption in this section is in connection with the join protocol, as explained below.

Fig. 11. Changes in $F_k$, the number of failed $fin_k$ pointers, due to joins, failures and stabilizations.

TABLE III. The relevant gain and loss terms for $F_k$, the number of nodes whose $k$th fingers are pointing to a failed node, for $k > 1$.

| $F_k(t+\Delta t)$ | Probability of occurrence |
|---|---|
| $= F_k(t) + 1$ | $c_{3.1} = (\lambda_j N\Delta t)\sum_{i=1}^{k} p_{join}(i,k)\, f_i$ |
| $= F_k(t) - 1$ | $c_{3.2} = \frac{(1-\alpha)}{M}\, f_k\, (\lambda_s N\Delta t)$ |
| $= F_k(t) + 1$ | $c_{3.3} = (1-f_k)^2\,[1 - p_1(k)]\,(\lambda_f N\Delta t)$ |
| $= F_k(t) + 2$ | $c_{3.4} = (1-f_k)^2\,(p_1(k) - p_2(k))\,(\lambda_f N\Delta t)$ |
| $= F_k(t) + 3$ | $c_{3.5} = (1-f_k)^2\,(p_2(k) - p_3(k))\,(\lambda_f N\Delta t)$ |
| $= F_k(t)$ | $1 - (c_{3.1} + c_{3.2} + c_{3.3} + c_{3.4} + c_{3.5})$ |

Let $f_k(r,\alpha)$ denote the fraction of nodes whose $k$th finger points to a failed node and $F_k(r,\alpha)$ denote the respective number. For notational simplicity, we write these simply as $F_k$ and $f_k$. We can predict this function for any $k$ by again estimating the gain and loss terms for this quantity, caused by a join, failure or stabilization event, and keeping only the most relevant terms. These are listed in Table III and illustrated in Fig. 11.

A join event can play a role here by increasing the number of $F_k$ pointers if the successor of the joinee had a failed $i$th pointer (which occurs with probability $f_i$) and the joinee replicated this from the successor as the joinee's $k$th pointer (which occurs with probability $p_{join}(i,k)$, from property 4.4). For large enough $k$, this probability is one only for $p_{join}(k,k)$; that is, the new joinee mostly only replicates the successor's $k$th pointer as its own $k$th pointer. This is what we consider here.

A stabilization evicts a failed pointer if there was one to begin with. The stabilization rate is divided by $M$, since a node stabilizes any one finger randomly every time it decides to stabilize a finger, at rate $(1-\alpha)\lambda_s$.

Given a node $n$ with an alive $k$th finger (which occurs with probability $1-f_k$), when the node pointed to by that finger fails, the number of failed $k$th fingers ($F_k$) increases. The amount of this increase depends on the number of immediate predecessors of $n$ that were pointing to the failed node with their $k$th finger. That number of predecessors could be $0, 1, 2, \ldots$, etc.
Using property 4.3, the respective probabilities of those cases are $1 - p_1(k)$, $p_1(k) - p_2(k)$, $p_2(k) - p_3(k)$, etc.

Solving for $f_k$ in the steady state, we get:

$$f_k = \frac{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]}{2(1+\tilde{P}_{rep}(k))} - \frac{\sqrt{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]^2 - 4(1+\tilde{P}_{rep}(k))^2}}{2(1+\tilde{P}_{rep}(k))} \qquad (10)$$

where $\tilde{P}_{rep}(k) = \sum_i p_i(k)$. In practice, it is enough to keep the first three terms in this sum. To first order in $1/r$ we have, in analogy to (6),

$$f_k \approx \frac{\left[1 + \tilde{P}_{rep}(k)\right] M}{(1-\alpha)\, r} \qquad (11)$$

This expression simply says that the fraction of dead fingers is inversely proportional to the rate of finger stabilizations, $(1-\alpha)r$, and proportional to how many fingers there are to stabilize, $M$, with the proportionality factor $(1 + \tilde{P}_{rep}(k))$ depending only on $\rho$.

To sum up, the computation of the fraction of dead $k$th finger pointers is analogous to the calculation of the fraction of wrong first successor pointers, albeit a bit more involved. No recursion is involved, in contrast to the calculation of the fraction of wrong higher successor pointers. The above expression (10) matches the simulation results very well (Fig. 13).

E. Cost of Finger Stabilizations and Lookups

In this section, we demonstrate how the information about the failed fingers and successors can be used to predict the cost of stabilizations, lookups, or in general the cost of reaching any key in the id space. By cost we mean the number of hops needed to reach the destination, including the number of timeouts encountered en route. Timeouts occur every time a query is passed to a dead node. The node does not answer and the originator of the query has to use another finger instead. For this analysis, we consider timeouts and hops to add equally to the cost. We can easily generalize this analysis to investigate the case when a timeout costs some factor $\gamma$ times the cost of a hop.

Define $C_t(r,\alpha)$ (also denoted by $C_t$) to be the expected cost for a given node to reach some target key which is $t$ keys away from it (which means reaching the first successor of this key). For example, $C_1$ would then be the cost of looking up the adjacent key (1 key away). Since the adjacent key is always stored at the first alive successor, if the first successor is alive (which occurs with probability $1-d_1$), the cost will be 1 hop. If the first successor is dead but the second is alive (which occurs with probability $d_1(1-d_2)$), the cost will be 1 hop + 1 timeout $= 2$ and the expected cost is $2 \times d_1(1-d_2)$, and so forth. Therefore, we have

$$C_1 = 1 - d_1 + 2 \times d_1(1-d_2) + 3 \times d_1 d_2 (1-d_3) + \cdots \approx 1 + d_1 = 1 + 1/(\alpha r).$$

To find the expected cost for reaching a general distance $t$ we need to closely follow the Chord protocol, which would look up $t$ by first finding the closest preceding finger. For the purposes of the analysis, we will find it easier to think in terms of the closest preceding start. Let us hence define $\xi$ to be the start of the finger (say the $k$th) that most closely precedes $t$. Hence $\xi = 2^{k-1} + n$ and $t = \xi + m$, i.e., there are $m$ keys between the sought target $t$ and the start of the closest preceding finger.
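As a rough numerical check of the approximation $C_1 \approx 1 + 1/(\alpha r)$, the series for $C_1$ can be summed directly. The sketch below is only illustrative: the function names, the successor-list length `S = 8` and the sample values of `alpha` and `r` are assumptions made for the example, and the leading-order estimate $d_k \approx k/(\alpha r)$ from (7) is used for the dead-successor fractions.

```python
def dead_successor_fraction(k, alpha, r):
    """Leading-order estimate d_k ~ k / (alpha * r), capped at 1 (assumed form of eq. 7)."""
    return min(1.0, k / (alpha * r))


def adjacent_key_cost(alpha, r, S):
    """Sum the series C_1 = sum_k k * (d_1 ... d_{k-1}) * (1 - d_k) over S successors.

    The k-th term is the cost (k-1 timeouts plus 1 hop) weighted by the
    probability that the first k-1 successors are dead and the k-th is alive.
    """
    cost = 0.0
    all_previous_dead = 1.0
    for k in range(1, S + 1):
        d_k = dead_successor_fraction(k, alpha, r)
        cost += k * all_previous_dead * (1.0 - d_k)
        all_previous_dead *= d_k
    return cost


if __name__ == "__main__":
    alpha, r, S = 0.5, 200.0, 8  # hypothetical sample values
    series = adjacent_key_cost(alpha, r, S)
    approx = 1.0 + 1.0 / (alpha * r)
    print(f"series C_1 = {series:.6f}, approximation = {approx:.6f}")
```

For large $\alpha r$ the two values differ only at second order in $1/(\alpha r)$, which is the level at which the text truncates the series.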
With that, we can write a recursion relation for $C_{\xi+m}$ as follows:

$$
\begin{aligned}
C_{\xi+m} ={}& C_{\xi}\,[1 - a(m)] \\
&+ (1-f_k)\, a(m) \left[ 1 + \sum_{i=0}^{m-1} bc(i,m)\, C_{m-i} \right] \\
&+ f_k\, a(m) \left[ 1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=0}^{\xi/2^i - 1} bc(l, \xi/2^i)\,\bigl(1 + (i-1) + C_{\xi_i - l + m}\bigr) + O(h_k(k)) \right]
\end{aligned}
\qquad (12)
$$

where $\xi_i \equiv \sum_{m=1,i} \xi/2^m$ and $h_k(i)$ is the probability that a node is forced to use its $(k-i)$th finger owing to the death of its $k$th finger. The probabilities $a$, $b$, $bc$ have already been introduced in Section IV, and we define the probability $h_k(i)$ below.

The lookup equation, though rather complicated at first sight, merely accounts for all the possibilities that a Chord lookup will encounter, and deals with them exactly as the protocol dictates.

The first term (Fig. 12 (a)) accounts for the eventuality that there is no node intervening between $\xi$ and $\xi + m$ (which occurs with probability $1 - a(m)$). In this case, the cost of looking for $\xi + m$ is the same as the cost of looking for $\xi$.

The second term (Fig. 12 (b)) accounts for the situation when a node does intervene in between (with probability $a(m)$), and this node is alive (with probability $1 - f_k$). Then the query is passed on to this node (with 1 added to register the increase in the number of hops), and the cost then depends on the distance between this node and $t$.

The third term (Fig. 12 (c)) accounts for the case when the intervening node is dead (with probability $f_k$). Then the cost increases by 1 (for a timeout) and the query needs to find an alternative lower finger that most closely precedes the target. Let the $(k-i)$th finger (for some $i$, $1 \le i \le k-1$) be such a finger. This happens with probability $h_k(i)$; i.e., $h_k(i)$ denotes the probability that the lookup is passed back to the $(k-i)$th finger, either because the intervening fingers are dead or because they share the same finger table entry as the $k$th finger. The start of the $(k-i)$th finger is at $\xi/2^i$, and the distance between $\xi/2^i$ and $\xi$ is equal to $\sum_{m=1,i} \xi/2^m$, which we denote by $\xi_i$. Therefore, the distance from the start of the $(k-i)$th finger to the target is equal to $\xi_i + m$. However, note that $fin_{k-i}.node$ could be $l$ keys away (with probability $bc(l, \xi/2^i)$) from $fin_{k-i}.start$ (for some $l$, $0 \le l < \xi/2^i$).
Therefore, after making one hop to $fin_{k-i}.node$, the remaining distance to the target is $\xi_i + m - l$. The increase in cost for this operation is $1 + (i-1)$; the $1$ indicates the cost of taking up the query again by $fin_{k-i}.node$, and the $i-1$ indicates the cost of trying and discarding each of the $i-1$ intervening fingers. The probability $h_k(i)$ is easy to compute given property 4.2 and the expression for the $f_k$'s computed in the previous section:

$$
\begin{aligned}
h_k(i) &= a(\xi/2^i)\,(1 - f_{k-i}) \times \prod_{s=1}^{i-1} \bigl(1 - a(\xi/2^s) + a(\xi/2^s)\, f_{k-s}\bigr), \quad i < k \\
h_k(k) &= \prod_{s=1}^{k-1} \bigl(1 - a(\xi/2^s) + a(\xi/2^s)\, f_{k-s}\bigr)
\end{aligned}
\qquad (13)
$$

Fig. 12. Cases that a lookup can encounter, with the respective probabilities and costs.

Fig. 13. Theory and simulation for the probability of failure of the $k$th finger $f_k(r,\alpha)$, and the lookup length $L(r,\alpha)$. (First panel: $f_k(r,\alpha)$ against the rate of stabilisation of fingers / rate of failure, $(1-\alpha)r = (1-\alpha)\lambda_s/\lambda_f$, for $\alpha = 0.5$. Second panel: lookup latency in hops + timeouts, $L((1-\alpha)r)$, against $(1-\alpha)r$.)

In (13) we account for all the reasons that a node may have to use its $(k-i)$th finger instead of its $k$th finger. This could happen because the intervening fingers were either dead or not distinct. The probabilities $h_k(i)$ satisfy the constraint $\sum_{i=1}^{k} h_k(i) = 1$ since clearly, either a node uses any one of its fingers or it doesn't. This latter probability is $h_k(k)$, that is, the probability that a node cannot use any earlier entry in its finger table. In this case, $n$ proceeds to its successor list. The query is now passed on to the first alive successor, and the new cost is a function of the distance of this node from the target $t$. We indicate this case by the last term in (12), which is $O(h_k(k))$. This can again be computed from the inter-node distribution and from the functions $d_k(r,\alpha)$ computed earlier. However, in practice the probability for this is extremely small except for targets very close to $n$. Hence this does not significantly affect the value of general lookups and we ignore it in our analysis.

The cost for general lookups is hence

$$L(r,\alpha) = \frac{\sum_{i=1}^{K-1} C_i(r,\alpha)}{K}$$

The lookup equation is solved recursively and numerically, given the coefficients and $C_1$. In Fig. 13, we compare theoretical results with simulation for $N = 1000$. It is seen that the theory matches the simulation results very well.

In Fig. 14 we also show the theoretical predictions for some larger values of $N$. From the structure of Equation 12, it is clear that the dependence of the average lookup on churn comes entirely from the presence of the terms $f_k$. Since $f_k \sim f$ is independent of $k$ for large fingers, we can approximate the average lookup length by the functional form $L(r,\alpha) = A + Bf + Cf^2 + \cdots$. The coefficients $A$, $B$, $C$, etc. can be recursively computed by solving the lookup equation to the required order in $f$ and depend only on $N$, the number of nodes, $1-\rho$, the density of peers, and $b$, the base or equivalently the size of the finger table of each node. The advantage of writing the lookup length this way is that churn-specific details, such as how new joinees construct a finger table or how exactly stabilizations are done in the system, can be isolated in the expression for $f$. If we were to change our stabilization strategy, for example [9], we could immediately estimate the lookup length by plugging the new expression for $f$ into the above relation.

Fig. 14. Lookup cost, theoretical curve, for $N = 1000$, $2000$, $4000$, $8000$ and $16000$ peers. The rationale for the fits is explained in the text. (Lookup cost from the lookup equation plotted against $(1-\alpha)r$; the fitted curves in the legend have the form $A + A(f + 3f^2)$ with $A = 7.846$, $7.346$, $6.846$, $6.346$ and $5.846$.)

The coefficient $A$, which is the lookup cost without churn, can be obtained very precisely for any base $b$ from analyzing (12) in the zero-churn case. This analysis is rather laborious and will be presented elsewhere [9]. It confirms the well-known result $A = \frac{1}{2}\log_2 N$ and in addition reproduces small deviations from this behavior previously observed by us in numerical simulations [7]. The values of $A$ in Fig. 14 are taken from this analysis.

$B$ can be qualitatively estimated as follows: every sufficiently long finger is dead with some finite probability $f$ given by (10). If $A$ is the average value of the lookup length without churn, then each look-up encounters $fA$ dead fingers on average. This estimate predicts a look-up cost of approximately $A(1+f)$, giving $B = A$ and $C$ and all other coefficients equal to $0$.

In Fig. 14 we show that the best fit to the data is in fact obtained by taking $B = A$ and $C = 3A$. The expression for $f$ is taken from (10) for large $k$ (the expression for $f_k$ becomes independent of $k$ for sufficiently large $k$). In general, as mentioned earlier, $B$ and $C$ can be obtained accurately for any value of the system parameters by the numerical solution of Eq. 12 to the required order.

V. DISCUSSION AND CONCLUSION

In this paper we have presented a detailed theoretical analysis of a DHT-based P2P system, Chord, using a fluid model. The technique for deriving the fluid model has been borrowed from the master equation approach of physics, which helps in systematically taking different dynamical effects into account. This analysis differs from previous theoretical work done on DHTs in that it aims not at establishing bounds, but at a precise determination of the relevant quantities in this dynamically evolving system. From the match between our theory and the simulations, it can be seen that our predictions are highly accurate in most cases. Though this analysis is not exact, since it takes only some (and not all) correlations into account, it provides a methodology for keeping track of most of the relevant details of the system. We expect that a similar analysis can be done for most other DHTs, thus helping to establish quantitative guidelines for their comparison.

The main conclusions for the analysis of Chord in a statistically steady state are the following.

Property 5.1:

As a function of $r$, the ratio of the rate of stabilizations to the rate of failures, the fraction of wrong pointers of any kind (successors or fingers) is, to leading order and to a good approximation, $\mathrm{Const.}/r$, where the constant depends on the pointer.

Property 5.2:

The probability of break-up of a ring can be estimated from the knowledge of the fraction of wrong first successors, wrong second successors, etc. This probability is generally very low when every node has a sufficient number of successors, indicating that Chord is robust against ring break-up.
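To make Properties 5.1 and 5.2 concrete, the leading-order expressions (6), (7), (8) and (9) can be evaluated side by side. The following is a minimal sketch, not the authors' code: the function names and the sample values of `alpha`, `r` and `S` are hypothetical, and all formulas are valid only to leading order in $1/(\alpha r)$.

```python
from math import factorial, isclose


def wrong_successor_fraction(k, alpha, r):
    """Leading-order w_k ~ k (k + 1) / (alpha r), eq. (6)."""
    return k * (k + 1) / (alpha * r)


def dead_successor_fraction(k, alpha, r):
    """Leading-order d_k ~ k / (alpha r), eq. (7) with d_1 ~ 1 / (alpha r)."""
    return k / (alpha * r)


def breakup_probability(S, alpha, r):
    """Closed form P_bu(S) ~ (S + 1)! / (2 (alpha r)^S), eq. (9)."""
    return factorial(S + 1) / (2.0 * (alpha * r) ** S)


def breakup_probability_from_sum(S, alpha, r):
    """Same quantity written as in eq. (8): (S-1)! * sum_i d_i / (alpha r)^(S-1)."""
    total_dead = sum(dead_successor_fraction(i, alpha, r) for i in range(1, S + 1))
    return factorial(S - 1) * total_dead / (alpha * r) ** (S - 1)


if __name__ == "__main__":
    alpha, r = 0.5, 100.0  # hypothetical sample values, alpha * r = 50
    for S in (2, 4, 6):
        closed = breakup_probability(S, alpha, r)
        summed = breakup_probability_from_sum(S, alpha, r)
        assert isclose(closed, summed)  # (8) and (9) agree term by term
        print(f"S = {S}: P_bu ~ {closed:.3e}")
```

Because $P_{bu}(S)$ falls off like $(\alpha r)^{-S}$, each additional entry in the successor list lowers the estimate by roughly a factor of $\alpha r/(S+2)$, which is one way to read the robustness statement of Property 5.2.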

Property 5.3:

At a given value of $r$, the fraction of wrong successors, $w_k$, and the fraction of dead fingers, $f_k$, increase with $k$. The fraction of wrong successors increases indefinitely, and becomes of order one at $k$ about $\sqrt{r}$ for the particular stabilization strategy that we have used. The fraction of dead fingers, on the other hand, tends to a constant for sufficiently large $k$.

Property 5.4:

The look-up cost, which is the expected number of hops including time-outs, can be computed by numerical recursion. The fraction of incorrect finger pointers $f_k$ ($\sim f$ for large $k$) is a required input for this recursion. The lookup cost tends to the well-known average number of hops without churn when $f$ is small (or churn is low) and increases when $f$ is large. We show that it can be well described by the formula $A(1 + g(f))$, where $A$ is the value of the lookup cost without churn and $g(f)$ is well approximated by $f + 3f^2$ for $N \ll K$. In general $g(f)$ can be obtained accurately to any desired order by solving Eq. 12 recursively to the required order in $f$.

Property 5.5:

The preceding note brings out the following simple feature of Chord: under any state of churn, sufficiently long fingers are all dead with essentially the same probability. Hence, in a sufficiently large system, a look-up will almost surely encounter one or more dead fingers, leading to time-outs. For applications where time-outs should be the exception and not the norm, this paper helps in estimating how much stabilization is necessary under a given level of churn to achieve such a level of performance.
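The kind of estimate described in Properties 5.4 and 5.5 can be sketched with the leading-order expression (11) for $f$ and the fitted form $A(1 + f + 3f^2)$. Everything specific below is an assumption made for illustration: the function names, the value `p_rep_sum = 0.5` standing in for $\tilde{P}_{rep}(k)$, the finger-table size `M = 20`, and the choice of `A = 5.846` (one of the fit constants in the Fig. 14 legend). The sketch is meaningful only while $f$ stays small.

```python
from math import isclose


def dead_finger_fraction(r, alpha, M, p_rep_sum):
    """Leading-order f ~ (1 + P_rep) * M / ((1 - alpha) * r), as read from eq. (11)."""
    return (1.0 + p_rep_sum) * M / ((1.0 - alpha) * r)


def lookup_cost(A, f):
    """Fitted lookup cost A * (1 + g(f)) with g(f) = f + 3 f^2 (Property 5.4)."""
    return A * (1.0 + f + 3.0 * f * f)


def stabilization_ratio_for(f_target, alpha, M, p_rep_sum):
    """Invert eq. (11): the ratio r = lambda_s / lambda_f needed to reach f_target."""
    return (1.0 + p_rep_sum) * M / ((1.0 - alpha) * f_target)


if __name__ == "__main__":
    alpha, M, p_rep_sum, A = 0.5, 20, 0.5, 5.846  # hypothetical inputs
    r = 1000.0
    f = dead_finger_fraction(r, alpha, M, p_rep_sum)
    print(f"f = {f:.3f}, lookup cost = {lookup_cost(A, f):.3f}")
    # Round trip: asking for the same f should give back the same r.
    assert isclose(stabilization_ratio_for(f, alpha, M, p_rep_sum), r)
```

Used in the inverse direction, `stabilization_ratio_for` gives a first estimate of how much faster than the failure rate stabilization has to run for dead fingers, and hence time-outs, to remain rare.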

Property 5.6:

The preceding note also brings out the additional feature that by writing the lookup cost in the above simplified form, we can isolate the effects of churn-specific details in the expression for $f$. Changing details in the join protocol or changing the maintenance strategy [9] merely causes a change in the expression for $f$. The lookup cost with this new strategy can then be immediately assessed for any $r$ by plugging the new expression for $f$ into the expression for the lookup cost (as opposed to solving Eq. 12 each time for each value of $r$).

The impact of this work can be summarized as follows: given that periodic stabilization is a fundamental technique for topology maintenance in DHTs, the question "How often should a DHT node perform periodic stabilization?" is of great practical relevance. The answer to this question depends on several factors. First, we need to know where the DHT is deployed: in a LAN, in a cooperative milieu, or among public non-trusting partners; i.e., what is the expected join/failure rate (churn)? Secondly, since DHTs involve different types of stabilizations, we need to know which of these rates is of interest to optimize. For example, in the DHT studied in this paper, there is both ring stabilization and finger stabilization. Thirdly, we also need to know whether we have performance goals which require us to know how much stabilization is needed, or constraints on bandwidth which necessitate a knowledge of the expected performance. Previous analytical attempts (see Section II) have addressed these questions through the identification of general (algorithm/system-neutral) bounds on stabilization rates.

In this paper, we have taken another point of view. We have traded off generality for accuracy. That is, we have produced results that can describe to a very high degree of accuracy quantities like the probability of inconsistent look-ups and the expected look-up length as functions of the stabilization and churn rates.
Many of the insights we get from this analysis, such as most of the points listed above, would be very hard to come by from simulations alone. For instance, the formulae produced in this paper could be used directly by a system administrator, or the person in charge of deploying a DHT, as a guide for configuring stabilization rates. While the results are based on Chord, all analyses concerning the ring (break-up and inconsistency) are applicable to many other systems, since consistent hashing on a ring is a recurring component in many other DHTs.

VI. LIMITATIONS AND FUTURE WORK

The main limitation of this work stems from the fact that the results are inherently dependent on the intricate details of the analyzed algorithms. While some changes in the algorithms can be easily accommodated without redoing the analysis (as explained in 5.6), others, such as a different lookup strategy or a different placement of fingers, would necessitate recalculating all the quantities again. However, results concerning the ring-related aspects, like successor lists, break-up probability and inter-node distributions, are likely to be reusable in other variations of the Chord protocols as well as in other systems using a ring geometry.

For the future, the authors' research agenda includes the introduction of extensions to the current model to be able to account for locality-awareness and different topology maintenance techniques. Some work towards the latter goal has already been done in [9]. Relatedly, a useful application for this work is to enable systems to dynamically self-tune their stabilization rates and choose the best maintenance technique to achieve a desired hop count.

REFERENCES

[1] Karl Aberer, Anwitaman Datta, and Manfred Hauswirth, Efficient, self-contained handling of identity in peer-to-peer systems, IEEE Transactions on Knowledge and Data Engineering (2004), no. 7, 858–869.
[2] D. Anick, D. Mitra, and M.M. Sondhi, Stochastic theory of data-handling systems with multiple sources, Bell Systems Technical Journal (1982), 1871–1894.
[3] James Aspnes, Zoë Diamadi, and Gauri Shah, Fault-tolerant routing in peer-to-peer systems, Proceedings of the twenty-first annual symposium on Principles of distributed computing, ACM Press, 2002, pp. 223–232.
[4] E. Brockmeyer, H.L. Halstrom, and Arne Jensen, The life and works of A.K. Erlang, The Copenhagen Telephone Company, 1948.
[5] Miguel Castro, Manuel Costa, and Antony Rowstron, Performance and dependability of structured peer-to-peer overlays, Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04), IEEE Computer Society, 2004.
[6] Florence Clévenot and Philippe Nain, A simple fluid model for the analysis of the squirrel peer-to-peer caching system, IEEE INFOCOM 2004, 2004.
[7] Sameh El-Ansary, Erik Aurell, and Seif Haridi, A physics-inspired performance evaluation of a structured peer-to-peer overlay network, The International Conference on Parallel and Distributed Computing and Networks (PDCN 2005), 2005.
[8] Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, and Seif Haridi, A statistical theory of chord under churn, The 4th International Workshop on Peer-to-Peer Systems (IPTPS'05) (Ithaca, New York), February 2005.
[9] Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, and Seif Haridi, Comparing maintenance strategies for overlays, Tech. report, Swedish Institute of Computer Science, in preparation 2007.
[10] Jinyang Li, Jeremy Stribling, Robert Morris, M. Frans Kaashoek, and Thomer M. Gil, A performance vs. cost framework for evaluating dht design tradeoffs under churn, Proceedings of the 24th Infocom (Miami, FL), March 2005.
[11] David Liben-Nowell, Hari Balakrishnan, and David Karger, Analysis of the evolution of peer-to-peer systems, ACM Conf. on Principles of Distributed Computing (PODC) (Monterey, CA), July 2002.
[12] N.G. van Kampen, Stochastic Processes in Physics and Chemistry, North-Holland Publishing Company, 1981, ISBN 0-444-86200-5.
[13] Dongyu Qiu and R. Srikant, Modeling and performance analysis of bittorrent-like peer-to-peer networks, SIGCOMM'04 (Portland, Oregon), August 2004.
[14] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz, Handling churn in a DHT, Proceedings of the 2004 USENIX Annual Technical Conference (USENIX '04) (Boston, Massachusetts, USA), June 2004.
[15] Ion Stoica, Robert Morris, David Liben-Nowell, David Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan,