Slotless Protocols for Fast and Energy-Efficient Neighbor Discovery

Abstract

In mobile ad-hoc networks, neighbor discovery protocols are used to find surrounding devices and to establish a first contact between them. Since the clocks of the devices are not synchronized and their energy-budgets are limited, usually duty-cycled, asynchronous discovery protocols are applied. Only if two devices are awake at the same point in time, they can rendezvous. Currently, time-slotted protocols, which subdivide time into multiple intervals with equal lengths (slots), are considered to be the most efficient discovery schemes. In this paper, we break away from the assumption of slotted time. We propose a novel, continuous-time discovery protocol, which temporally decouples beaconing and listening. Each device periodically sends packets with a certain interval, and periodically listens for a given duration with a different interval. By optimizing these interval lengths, we show that this scheme can, to the best of our knowledge, outperform all known protocols such as DISCO, U-Connect or Searchlight significantly. For example, Searchlight takes up to 740 % longer than our proposed technique to discover a device with the same duty-cycle. Further, our proposed technique can also be applied in widely-used asymmetric purely interval-based protocols such as ANT or Bluetooth Low Energy.


SLOTLESS PROTOCOLS FOR FAST AND ENERGY-EFFICIENT NEIGHBOR DISCOVERY

PHILIPP H. KINDT, MARCO SAUR AND SAMARJIT CHAKRABORTY

Institute for Real-Time Computer Systems, Technische Universität München, 80290 Munich, Germany.

Abstract.

In mobile ad-hoc networks, neighbor discovery protocols are used to find surrounding devices and to establish a first contact between them. Since the clocks of the devices are not synchronized and their energy budgets are limited, duty-cycled, asynchronous discovery protocols are usually applied. Two devices can only rendezvous if they are awake at the same point in time. Currently, time-slotted protocols, which subdivide time into multiple intervals of equal length (slots), are considered to be the most efficient discovery schemes. In this paper, we break away from the assumption of slotted time. We propose a novel, continuous-time discovery protocol, which temporally decouples beaconing and listening. Time is continuous, which means that each device periodically sends packets with a certain interval that can be chosen freely in arbitrarily small steps. These points in time are independent of the time instances at which the device listens to the channel. Similarly, each device has a listening interval with which it repeatedly switches on its receiver for a certain amount of time. Unlike in slotted protocols, both interval lengths, their temporal offsets and the listening duration in each interval are independent of each other. By optimizing these interval lengths, we show that this scheme can, to the best of our knowledge, significantly outperform all known slotted protocols such as DISCO, U-Connect or Searchlight. For example, Searchlight takes up to 1020 % longer than our proposed technique to discover a device with the same duty-cycle and hence the same energy consumption. Further, our proposed technique can also be applied in widely-used asymmetric purely interval-based protocols such as ANT or Bluetooth Low Energy, thereby optimizing their energy consumption.

1. Introduction

Low-power mobile ad-hoc networks (MANETs), which provide wireless connectivity between multiple mobile devices without the need for stationary, grid-powered infrastructure, are widely used. Since the participating devices are battery-powered, energy-saving communication is a crucial requirement, which is usually achieved through duty-cycled protocols. Such protocols allow the hardware to remain in a sleep mode most of the time and to wake up only if packets need to be exchanged. Once a connection between two devices has been established, a common wakeup schedule that is known to both sides applies. Therefore, both devices always wake up simultaneously and sleep in the meantime. However, before a connection is initiated, the clocks of the devices are not synchronized and a common wakeup schedule cannot be realized. If duty-cycling is used, periods of time during which only one device is active whereas the other one is asleep cannot be avoided while establishing a first contact. To minimize these periods and to ensure a reasonably fast discovery procedure, multiple neighbor discovery protocols have been proposed. They define optimized wakeup schedules for both devices, thereby minimizing the worst-case discovery latency while reducing the duty-cycle and hence the energy consumption.

E-mail address: kindt OR saur OR [email protected]. arXiv [cs.NI], Jul.

Figure 1. Slot design of a) Disco [1], b) Searchlight [2] and c) PI-kM.

The large majority of recently proposed protocols assume time to be slotted: a certain period $T$ is subdivided into a number of $i$ intervals of equal length $d_{sl}$. In each of these slots, a device is either in a sleep mode or active. In each active slot, a device transmits a beacon at the beginning and at the end of the slot and listens for incoming beacons in between, as shown in Figure 1 a). Accordingly, whenever two active slots of a pair of devices overlap temporally for at least the duration $d_a$ of one beacon, the beacons are received successfully at both devices and the discovery is complete.

Figure 2 a) depicts the schedule of such a slotted protocol. Active slots are colored and passive slots are left white. The objective of neighbor discovery protocols is to determine a pattern of active slots for which a deterministic maximum latency (i.e., discovery is guaranteed within a certain number of slots) is achieved for every possible time offset between two devices following such a schedule. Multiple deterministic schedules have been proposed, which are based on coprime interval lengths [1], quorum patterns [3], systematic probing [2] or optimal difference codes [4]. We describe these techniques in more detail in Section 2.

An alternative to time-slotted protocols is continuous-time, periodic-interval (PI)-based discovery, as shown in Figure 2 b). Unlike the slotted protocols mentioned above, all known implementations are asymmetric, in the sense that they assign different roles to different devices. One device, which is called the advertiser, periodically sends out packets with a period of $T_a$ time units. The transmission duration $d_a$ of a packet is determined by the number of bytes sent. The other device is referred to as the scanner. It periodically switches on its receiver for a duration called the scan window $d_s$. The repetition period is called the scan interval $T_s$. Time is assumed to be continuous, beaconing and receiving occur independently of each other, and no slots exist.

Figure 2. Slotted (a) and purely interval-based (b) discovery.
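To make the PI-based scheme concrete, the following Python sketch simulates one-way discovery between an advertiser and a scanner. It is an illustration under simplifying assumptions, not the model used in this paper: time is counted in integer units (e.g., microseconds) so that modulo arithmetic is exact, scan windows are assumed to start at $k \cdot T_s$, advertising packets start at $\varphi + n \cdot T_a$, and a packet counts as received only if it lies completely inside a scan window. The function names `discovery_latency` and `worst_case_latency` are hypothetical.

```python
from typing import Optional


def discovery_latency(phi: int, T_a: int, T_s: int, d_s: int, d_a: int,
                      max_packets: int = 100_000) -> Optional[int]:
    """Time from t = 0 until the end of the first advertising packet that
    lies completely inside a scan window.

    Scan windows occupy [k*T_s, k*T_s + d_s]; packet n occupies
    [phi + n*T_a, phi + n*T_a + d_a]. All values are integer time units.
    Returns None if no packet is received within max_packets packets.
    """
    for n in range(max_packets):
        start = phi + n * T_a
        # Offset of the packet start relative to the most recent scan window.
        offset = start % T_s
        if offset + d_a <= d_s:
            return start + d_a
    return None


def worst_case_latency(T_a: int, T_s: int, d_s: int, d_a: int) -> Optional[int]:
    """Maximum of discovery_latency over all integer offsets phi in [0, T_s).

    Only an integer grid of offsets is checked, so this approximates the
    worst case over continuous offsets.
    """
    worst = 0
    for phi in range(T_s):
        latency = discovery_latency(phi, T_a, T_s, d_s, d_a)
        if latency is None:
            return None  # some offset never leads to discovery
        worst = max(worst, latency)
    return worst


if __name__ == "__main__":
    print(discovery_latency(75, T_a=100, T_s=290, d_s=30, d_a=5))  # 880
    print(worst_case_latency(T_a=100, T_s=290, d_s=30, d_a=5))
```

If $T_s$ is an integer multiple of $T_a$ (e.g., $T_a = 100$, $T_s = 200$), the packet offsets relative to the scan windows never change, so some initial offsets never lead to discovery; the simulation then returns `None`.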


Such protocols are also widely used in practice. They were first proposed in [5], with separate channels for beaconing and for responses to the received packets. The unidirectional mode of the ANT/ANT+ [6] protocol uses the scheme described without any modifications. The most popular protocol that relies on this technique is Bluetooth Low Energy (BLE), which applies two slight modifications to it. First, beacons are sent on up to three different channels in a row after each advertising interval $T_a$. The scanner cycles between these three channels after each scan interval. Second, a random delay $\rho \in [0\,\mathrm{s}, 10\,\mathrm{ms}]$ is added to $T_a$ in each period.

Whereas much research has been carried out to improve time-slotted protocols, significantly less work has been presented on the optimization of PI-based ones. The main reason is that until recently, the popular belief was that they cannot guarantee deterministic discovery. Attempts to find special configurations for BLE, e.g., coprime interval lengths [7] which provide upper latency bounds, did not turn out to be a promising alternative to time-slotted protocols. Except for some special cases, such protocols have not been well understood, and neither the mean discovery latencies nor any information on upper bounds could be derived with analytical methods. Recently, the first comprehensive analysis of purely interval-based protocols has been proposed in [8]. It revealed that such protocols provide upper latency bounds for all interval lengths except a finite number of singularities. Therefore, they can be used in applications in which slotted protocols are used nowadays to ensure deterministic discovery. However, their maximum performance, especially for optimized parametrizations, is still not clear.

The main contribution of this paper is an optimization technique for PI-based protocols, which can be used for computing optimized parameter values for $T_a$, $T_s$ and $d_s$. The resulting latencies for a given duty-cycle are very short. To the best of our knowledge, for reasonable slot lengths of a few milliseconds, the PI-kM protocol can significantly outperform all known time-slotted protocols. For instance, for certain choices of duty-cycles, Searchlight takes up to 10.2× longer to discover a device with the same duty-cycle and hence the same energy consumption. For no choice of duty-cycles does Searchlight outperform our protocol. Further, as already mentioned, our proposed theory can be used to optimize the parametrizations of existing PI-based protocols such as ANT/ANT+ and, to some extent, also BLE.
In addition, unlike slotted protocols, our proposed protocol can realize any specified duty-cycle. The main insight we have exploited is to temporally decouple beaconing from listening. This allows a) for sending additional beacons compared to slotted protocols and b) for sending the beacons at optimal points in time.

PI-based protocols have three degrees of freedom ($T_a$, $T_s$ and $d_s$), and the maximum latency is a very irregular function with a large number of minima and maxima, as can be seen in Figure 6. Evaluating all possible parametrizations exhaustively to find optimal ones cannot be done in a reasonable amount of time. Further, the models from the literature [8] for computing the upper latency bounds are in the form of recursive algorithms, which makes it impossible to apply differential methods. As a result, finding optimal parametrizations is a difficult task which has not yet been solved.

For computing the discovery latency of PI-based protocols, the shrinkage of the temporal offset between appropriate neighboring advertising packets and scan windows is tracked over time. The theory in [8] reveals that there are multiple orders of latency maxima and minima, as described in detail in Section 2. In brief, the order of a maximum/minimum refers to the number of advertising and scan intervals that lie in between a neighboring pair of an advertising packet and a scan window. For example, if $T_a < T_s$ and if only one instance of $T_a$ and $T_s$ is considered, the distance between each advertising packet and the next scan event to its right shrinks in multiples of $T_a$ for subsequent advertising packets (cf. Figure 5 a)). This is referred to as an order-0 process. Whereas for some initial offsets the order-0 process might lead to a match after a certain number of $T_a$ intervals, for other offsets, linear combinations of the same pair of intervals need to be accounted for.
The next higher neighborhood relationship that needs to be examined is between $T_s$ and multiples of $\lfloor T_s / T_a \rfloor T_a$ (or $\lceil T_s / T_a \rceil T_a$, depending on the situation). Such a situation is depicted in Figure 4, in which the temporal offset $\Phi[k]$ shrinks for increasing numbers of scan intervals $k$. This is referred to as an order-1 process. Processes of higher order exist, which require computations with more elaborate linear combinations.
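The order-1 relationship can be checked numerically. The sketch below assumes integer time units, an advertiser that has been running indefinitely (packets at $\varphi_0 + n T_a$ for all integers $n$), scan windows starting at $k \cdot T_s$, and $T_a < T_s$; `closest_right_offsets` and `order1_steps` are hypothetical helper names. For each scan window it computes the distance to the first advertising packet at or after the window start, and shows that this distance changes from one scan interval to the next by only one of two amounts, $\lceil T_s/T_a \rceil T_a - T_s$ or $\lfloor T_s/T_a \rfloor T_a - T_s$.

```python
from typing import List, Tuple


def closest_right_offsets(phi0: int, T_a: int, T_s: int, count: int) -> List[int]:
    """Distance from the start of scan window k (at k*T_s) to the first
    advertising packet at or after it, for k = 0 .. count-1.

    Packets are assumed at phi0 + n*T_a for every integer n, so the
    distance is simply (phi0 - k*T_s) mod T_a.
    """
    return [(phi0 - k * T_s) % T_a for k in range(count)]


def order1_steps(T_a: int, T_s: int) -> Tuple[int, int]:
    """The two possible changes of the offset per scan interval."""
    floor_multiple = (T_s // T_a) * T_a      # floor(T_s / T_a) * T_a
    ceil_multiple = -(-T_s // T_a) * T_a     # ceil(T_s / T_a) * T_a
    return ceil_multiple - T_s, floor_multiple - T_s


if __name__ == "__main__":
    offsets = closest_right_offsets(75, T_a=100, T_s=290, count=4)
    print(offsets)                                          # [75, 85, 95, 5]
    print(order1_steps(100, 290))                           # (10, -90)
    print([b - a for a, b in zip(offsets, offsets[1:])])   # [10, 10, -90]
```

The first value returned by `order1_steps` is the quantity $\lceil T_s/T_a \rceil T_a - T_s$, which appears as $\gamma$ in Equation 3.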

In this paper, we propose an optimization technique to compute parameter values which always lie in latency minima of orders 0 and 1. These minima provide Pareto-optimal discovery latencies (under the constraint that only maxima of orders 0 and 1 are considered). The parameter optimization works as follows. First, we derive a linear function of $T_s$ to compute all advertising intervals that lead to latency minima of order 0, as described in detail in Section 3. Based on this function, we deduce optimal values of $T_s$. After these steps, the interval lengths $T_a$ and $T_s$ are parametrized by two integers $k$ and $M$. Based on the desired target duty-cycle, $k$, $M$ and the scan window $d_s$ are optimized using differential methods. We call solutions using these optimizations PI-kM protocols and use them for the comparison against recent slotted protocols, such as DISCO [1], U-Connect [9] and Searchlight [2]. In this comparison, we also take into account optimal difference codes [4], even though they are currently not realizable except for some special configurations. In addition, we compare our proposed protocols against Lightning [10], even though no real-world implementation of Lightning has been presented yet. Finally, the recently presented protocol Nihao [11] is considered, since it applies a theory of pseudo slots, which can be seen as an intermediate step towards fully slotless protocols, as considered in this paper. As already mentioned, our proposed protocol can outperform all of these protocols in terms of discovery latency for any given duty-cycle.

Comparing the performance of slotless, PI-based protocols against slotted ones, as carried out in this paper, is challenging.
This is mainly because of the following reasons.

(1) PI-based and slotted protocols are not directly comparable, because all known purely interval-based protocols assign different roles to each device (viz., advertiser and scanner), whereas slotted protocols typically assume that every device implements both roles. To achieve comparability, we propose the first symmetric, two-way variant of such PI protocols, in which both devices perform both advertising and scanning. Known asymmetric protocols such as ANT/ANT+ can be described as special cases of this protocol by setting some of its parameters to infinity. Therefore, all our results can directly be applied to them, and we present, for the first time, a mathematical framework for efficient parametrizations of these protocols.
(2) PI-based protocols guarantee a maximum latency in terms of time units, whereas slotted solutions achieve deterministic discovery within a given number of slots. Clearly, the discovery latency of slotted protocols depends on the slot length $d_{sl}$. Towards a fair comparison, we determine a lower bound for reasonable slot lengths.

The rest of this paper is organized as follows. In the next section, we provide an overview of the state of the art of slotted and PI-based discovery protocols. Next, in Section 3, we describe our proposed protocol and its optimizations in detail, including a theoretical model of the maximum discovery latency. We evaluate our theory by measurements on a real-world implementation of the PI-kM protocol and present a performance comparison with time-slotted protocols in Section 4. Finally, we discuss the implications of our theory in Section 5.

2. Related Work

In this section, we first give a brief overview of time-slotted discovery protocols. Next, we present related work on purely interval-based discovery and compare the relevant related work with this paper.

2.1. Time-Slotted Discovery.

Within the last decade, a large number of slotted discovery protocols have been proposed. As already mentioned, they subdivide a certain time period $T$ into multiple equal-length slots. Each protocol defines a unique discovery schedule, which determines the set of active slots in each period. Since this paper focuses on symmetric asynchronous neighbor discovery, i.e., neighbor discovery in which all nodes follow the same schedule with independent clocks, we only present related work on symmetric neighbor discovery.

Figure 3. Discovery schedules of a) Disco [1], b) U-Connect [9] and c) Searchlight [2].

One of the first slotted protocols has been proposed in [3], which is based on a quorum schedule to achieve determinism. A different approach which achieves similar latencies is Disco [1]. Its schedule for two devices that attempt to discover each other is shown in Figure 3 a). Disco assumes that each device chooses two periods $T_1$ and $T_2$ (e.g., $T_1 = 5$ and $T_2 = 3$ in Figure 3), each consisting of a prime number of slots. As can be seen in the figure, such schedules are guaranteed to overlap after a limited amount of time. By the Chinese Remainder Theorem, the worst-case latency is bounded by the product of the coprime periods of both devices.

However, two devices which have chosen the same pair of prime numbers risk never finding each other. This problem can be overcome by U-Connect [9], which is depicted in Figure 3 b). Each device chooses a coprime interval length of $T_1$ or $T_2$ slots, respectively. In each period, the first slot is active. In addition, each device has a super-period of $T_1^2$ or $T_2^2$ slots. The first $\lfloor (T_1 + 1)/2 \rfloor$ (or $\lfloor (T_2 + 1)/2 \rfloor$) slots of each super-period are also active. As can easily be seen, mutual discovery is guaranteed even if $T_1 = T_2$. In symmetric settings (i.e., both devices have the same duty-cycle), U-Connect provides lower latency-duty-cycle products than Disco, which means that for a given duty-cycle, discovery can be carried out in less time. Unlike Disco (cf. Figure 1 a)), the implementation of U-Connect presented in [9] uses two different types of slots. During the regular slots of each period, the device is in reception mode (indicated by RC in Figure 3 b)), whereas the device continuously transmits within the active slots of the super-period (indicated by TR in the figure). Another significant improvement of the latency-duty-cycle product has been achieved by Searchlight [2].
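The coprime-period argument can be checked by brute force. The Python sketch below is a simplified model, not the published implementations: slots are assumed to be aligned (integer slot offsets only), each Disco-style device is active in every slot whose index is a multiple of one of its primes (the examples use one prime per device), and a U-Connect device with prime $p$ is active every $p$ slots plus the first $\lfloor (p+1)/2 \rfloor$ slots of every $p^2$-slot super-period. All function names are hypothetical.

```python
from typing import Callable, Optional, Sequence

Schedule = Callable[[int], bool]  # maps a slot index to "active?"


def disco_schedule(primes: Sequence[int]) -> Schedule:
    """Active in slot t if t is a multiple of any of the given primes."""
    return lambda t: any(t % p == 0 for p in primes)


def uconnect_schedule(p: int) -> Schedule:
    """Active every p slots and in the first floor((p+1)/2) slots of each
    super-period of p*p slots."""
    return lambda t: t % p == 0 or t % (p * p) < (p + 1) // 2


def first_overlap(a: Schedule, b: Schedule, offset: int, horizon: int) -> Optional[int]:
    """First slot t < horizon in which device A (local slot t) and device B
    (local slot t + offset) are both active, or None."""
    for t in range(horizon):
        if a(t) and b(t + offset):
            return t
    return None


def worst_overlap(a: Schedule, b: Schedule, horizon: int) -> Optional[int]:
    """Worst case of first_overlap over all offsets in [0, horizon)."""
    worst = 0
    for offset in range(horizon):
        t = first_overlap(a, b, offset, horizon)
        if t is None:
            return None
        worst = max(worst, t)
    return worst


if __name__ == "__main__":
    # Coprime periods 5 and 3: overlap within 5 * 3 = 15 slots.
    print(worst_overlap(disco_schedule([5]), disco_schedule([3]), 15))  # 10
    # Identical period 3 and offset 1: the devices never meet.
    print(first_overlap(disco_schedule([3]), disco_schedule([3]), 1, 1000))  # None
    # U-Connect with identical p = 5: overlap within p * p = 25 slots.
    print(worst_overlap(uconnect_schedule(5), uconnect_schedule(5), 25))
```

The first example stays within the product of the two periods, as predicted by the Chinese Remainder Theorem; the second shows why identical periods are a problem for a single-prime schedule, which the extra super-period slots of U-Connect resolve in the third example.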
In Searchlight, like in Disco, each active slot consists of a listening period, which is preceded and terminated by sending a beacon packet. However, as shown in Figure 1 b), active slots are longer than passive slots by a percentage $\delta$ (so-called over-length slots). The schedule of Searchlight is shown in Figure 3 c). Again, there is a period $T$ and a hyper-period. In this paper, we consider the symmetric case in which both devices have the same period. The first slot of each period is referred to as an anchor slot A. In addition, there is another active slot per period, the so-called probe slot P. In the first period, the second slot is a probe slot. In each succeeding period, the position of the probe slot is increased by one until reaching half of the interval $T$. With this scheme, an upper latency bound that grows quadratically with $T$ is achieved. With the over-length slots shown in Figure 1 b), every second probe slot can be skipped, which roughly halves this bound. This version of Searchlight is usually referred to as Searchlight-Striped or Searchlight-S.

Further work has been carried out recently to reduce the latencies of slotted discovery protocols, of which we briefly present a selection below. In [12], a framework called HELLO to construct slotted discovery protocols has been proposed. Disco, U-Connect and Searchlight can be constructed as special cases of HELLO. According to [12], Searchlight achieves the lowest discovery latencies for symmetric duty-cycles. [13] proposes to apply additional slots in protocols like Disco to exchange information on already known neighbors. In [14], two new slotted protocols called HEDIS and TODIS have been proposed, which outperform Searchlight in asymmetric cases. For symmetric discovery, Searchlight still performs better. Further, [4] combines overflowing slot lengths (as described for Searchlight) with optimal difference codes. Whereas optimal codes can only be realized for a few target duty-cycles, an algorithm to create approximations (with reduced performance) for every target duty-cycle has been presented, which provides better latency-duty-cycle products than Searchlight. In [14], a scheme has been proposed that constructs schedules exploiting over-length slots from any given schedule designed for non-over-length slots, such as Disco or U-Connect. Another recently proposed discovery protocol, which claims to achieve shorter latencies than Searchlight, is Lightning [10]. However, its results rely entirely on simulation, and it is therefore not yet clear how it performs in implementations.
To the best of our knowledge, together with optimal difference codes, Lightning provides the lowest latency bounds of all currently known protocols.

A work that has been presented recently is Nihao [11]. Unlike the previously known slotted protocols, it defines dedicated receive and transmit slots. In each receive slot, the radio listens to the channel during the whole slot length. In each transmit slot, one beacon is sent at the beginning of the slot, and the device goes back to the sleep mode afterwards. This can be seen as a pseudo-slot, which leads to a significantly different protocol behavior than known from slotted solutions. Therefore, Nihao can be seen as an intermediate step between slotted and fully slotless protocols, as studied in this paper. Hence, we provide a more detailed comparison in what follows.

Three different versions of Nihao have been proposed. For the first one, S-Nihao, the authors state that it requires the ratio of the packet transmission duration to the slot length to be smaller than its duty-cycle. Therefore, it is restricted to relatively large duty-cycles or to very long slot lengths [11], which is infeasible in practice.

The second version, G-Nihao, offers two parameters $m$ and $n$, which can be used to adjust the number of beacons and listening slots of each period. While the theory of pseudo-slots used in G-Nihao and the fully slotless theory in this paper result in schedules similar to one of the three protocol variants proposed in this paper (i.e., the PI-0M protocol), there are two main differences:
(1) G-Nihao does not contain any parameter optimizations of $m$ and $n$ for maximum performance. Therefore, it is not clear how to choose them to achieve the shortest discovery latency at a given duty-cycle. It has been shown that pseudo-slotted protocols can outperform existing slotted protocols when assuming the same channel utilization, and that the performance can be increased by sending additional beacons [11]. However, the optimal number of beacons for reaching the best discovery latencies and the corresponding performance is not clear. In contrast, our proposed theory contains built-in optimizations of all parameters, thereby minimizing the latency for a given duty-cycle. The discovery latencies achieved are significantly lower than those of all known protocols, including G-Nihao for the parametrizations presented in [11]. As we will show, the resulting channel occupancy is increased compared to existing protocols, but always below a reasonable bound of ≈

2.2. Purely Interval-Based Discovery.

As already mentioned, PI-based protocols are also widely used in practice, but significantly less research had been carried out on them until recently. The main reason is that their analysis is significantly more complex than that of slotted ones. Therefore, only Monte-Carlo simulations were available for over a decade. With Bluetooth Low Energy applying a PI-based scheme, its discovery latency became of high interest, and multiple simulation models and results have been presented [15], [16]. These simulation results have revealed that for varying parameter values, the mean latency is a complex curve with a large number of maxima and minima. Therefore, [15] concludes that modeling and optimizing these parameters is a crucial requirement for low-power discovery, since it makes a large difference whether parameter values lead to a peak or lie within a minimum. Further, a model for the special case of $T_a < d_s$ has been proposed in [5] and adapted to Bluetooth Low Energy in [17], [18], [19]. Since discrete-event simulations are not suitable for determining upper latency bounds, the popular belief was that PI-based protocols cannot guarantee any upper latency bounds in general. Therefore, special parametrizations for BLE which fulfill the Chinese Remainder Theorem have been proposed [7].

Figure 4. Modeling PI-based protocols [8].

Recently, a novel modeling technique has been presented, which is capable of describing the mean and worst-case discovery latencies of PI-based protocols [8]. Our paper is built upon this theory. The main idea of such a model is shown in Figure 4. The ruled squares show the scan windows, whereas the hatched vertical bars depict the advertising packets. When the advertiser comes into range of the scanner, the first advertising packet has a temporal offset of $\Phi[0]$ to the first scan event considered. The subsequent packets are either classified as closest right neighbors (marked with a * in the figure) or as remaining ones. In this example, only the closest right neighbors have a chance of a successful reception. The temporal distance $\Phi[i]$ between closest right neighbors and their (left) neighboring scan window shrinks for increasing values of $i$, where the index $i$ identifies the instance of the larger interval out of $(T_s, T_a)$. The amount of shrinkage per interval is constant, such that

$$\Phi[i+1] - \Phi[i] = \gamma. \tag{1}$$

A successful reception occurs once $\Phi[i]$ has shrunk below the length of one scan window $d_s$. For the worst-case latency, all possible values of the initial offset $\Phi[0]$ need to be taken into account. For the intervals exemplified in Figure 4, the maximum latency is

$$d_m = \underbrace{\left\lceil \frac{T_s - T_a}{T_a} \right\rceil T_a}_{A} + \underbrace{\left\lceil \frac{T_a - d_s + d_a}{\gamma} \right\rceil \cdot \left\lceil \frac{T_s}{T_a} \right\rceil T_a}_{B} + d_a, \tag{2}$$

with

$$\gamma = \left\lceil \frac{T_s}{T_a} \right\rceil \cdot T_a - T_s. \tag{3}$$

Term A accounts for cases in which $\Phi[0] > T_a$. In such cases, the distance $\Phi$ is increased in multiples of $T_a$ time units until an advertising packet is reached that lies temporally to the right of the second scan window. From there on, $\Phi[1]$ is measured against the second scan window and becomes smaller than $T_a$.
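Equations 2 and 3 translate directly into code. The sketch below evaluates them for the situation of Figure 4, i.e., it assumes $T_a < T_s$ and an order-1 process with $0 < \gamma < d_s$, and rejects other inputs; the function names are hypothetical and the example values (integer time units) are chosen only for illustration.

```python
import math


def gamma(T_a: float, T_s: float) -> float:
    """Equation 3: gamma = ceil(T_s / T_a) * T_a - T_s."""
    return math.ceil(T_s / T_a) * T_a - T_s


def max_latency_order1(T_a: float, T_s: float, d_s: float, d_a: float) -> float:
    """Equation 2: worst-case latency d_m = A + B + d_a for an order-1 process."""
    if not T_a < T_s:
        raise ValueError("this sketch assumes T_a < T_s, as in Figure 4")
    g = gamma(T_a, T_s)
    if not 0 < g < d_s:
        raise ValueError("an order-1 process requires 0 < gamma < d_s")
    term_a = math.ceil((T_s - T_a) / T_a) * T_a                              # term A
    term_b = math.ceil((T_a - d_s + d_a) / g) * math.ceil(T_s / T_a) * T_a   # term B
    return term_a + term_b + d_a


print(gamma(100, 290))                       # 10
print(max_latency_order1(100, 290, 30, 5))   # 2605
```

Here term A contributes 200 time units and term B contributes $\lceil 75/10 \rceil \cdot 3 \cdot 100 = 2400$. If $T_s$ is an integer multiple of $T_a$, then $\gamma = 0$ and the function raises an error, since the offset never changes.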
The remaining distance $\Phi[1] \le T_a$ shrinks in multiples of $\gamma$ until the next scan window is reached. The latency caused by this shrinkage is accounted for by term B. However, there are cases in which the amount of shrinkage $\gamma$ exceeds $d_s$, and accordingly, the advertising packet might temporally "overtake" the scan window. In such cases, appropriate linear combinations of both intervals need to be considered to compute another, effective parameter that is smaller than $\gamma$. This is called a higher-order $\gamma$-process. The example of Figure 4 is an order-1 process, which implies that $\gamma < d_s$, and therefore, "overtaking" the scan window cannot occur. Similarly, we refer to the growth of $\Phi[i]$ in multiples of $T_a$ as an order-0 process. Modeling higher-order processes is more elaborate, and we refer to [8] for more details. For this paper, only order-0 and order-1 processes are relevant. Further, there are situations in which the distance $\Phi[i]$ grows in multiples of $\gamma$ with increasing $i$. They can be handled similarly to shrinking situations by tracking the distance to the next (succeeding) scan window.

Although the theory presented in [8] provides a closed-form formulation of the discovery latency, it cannot be used for parameter optimizations, since its functions are not well-behaved. One of the main technical contributions of this paper is to develop an equivalent formulation of this theory. It covers only the relevant cases, but is amenable to systematic parameter optimizations. Based on this simplified model, we derive optimized parametrizations, as described in Section 3.

Our Contributions:

Compared to the literature, we make the following contributions:
• For comparing PI-based protocols against time-slotted ones, we propose a novel symmetric discovery protocol, which is based on continuous-time, periodic-interval-based discovery.
• We present a mathematical framework to optimize the parameter values of this protocol. This optimization can also be applied to widely-used protocols such as ANT/ANT+ or BLE.
• We compare the performance of our proposed protocol against time-slotted solutions and show that it can realize significantly lower latency-duty-cycle products than all known slotted protocols.
• In real-world measurements, we show that such a protocol can realize the discovery latencies predicted by the theory.

3. The PI-kM Protocol

In this section, we first briefly describe the overall scheme of our proposed protocol. Since it contains multiple tunable parameters, we then analyze the properties of this protocol with mathematical methods to derive optimal parameter values. This leads to three variants of the protocol, which are presented in detail.

3.1. Protocol Overview.

In Section 2, we have described commonly-used asymmetric PI-based protocols like BLE and ANT/ANT+. To become comparable to slotted protocols, we define a symmetric PI-based protocol in which both devices follow exactly the same scheme. Each device repeatedly broadcasts packets with an interval $T_a$ (advertising interval) and periodically scans the channel with an interval $T_s$ (scan interval) for a duration of $d_s$ time units, as shown in Figure 1 c). If a beacon needs to be sent while the device is scanning, the scanning is interrupted, the beacon is sent, and then the device continues scanning without extending $d_s$. We further assume that both devices have been configured with the same values of $T_a$, $T_s$ and $d_s$. However, the theory presented can easily be extended to one-way discovery (both with different device roles and different parameter values), thereby making it applicable to existing protocols such as ANT/ANT+. The duration $d_a$ of one beacon is defined by the number of bytes $N_b$ in each packet and the bitrate $R$. The initial offset between the first advertising packet and the first scan window is distributed randomly between 0 and $T_s$. Under the common assumption that the device draws the same current for transmission and for reception, the duty-cycle $\eta$ of a device using this protocol is

$$\eta = \frac{T_a d_s + T_s d_a}{T_a T_s}. \tag{4}$$

For the asymmetric case, in which one device only advertises whereas the other one only scans, this equation describes the joint duty-cycle of both devices. Therefore, the theory presented below remains valid for the asymmetric case, too. For optimizing $T_a$, $T_s$ and $d_s$, the maximum discovery latency $d_m$ needs to be derived. Existing models [8] rely on recursive schemes, which make parameter optimizations difficult.
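As a quick numerical check of Equation 4, the sketch below computes $\eta$ from the four protocol parameters. It relies on the assumption stated above (equal current draw for transmission and reception); the helper name `duty_cycle` and the example values are hypothetical.

```python
def duty_cycle(T_a: float, T_s: float, d_s: float, d_a: float) -> float:
    """Equation 4: eta = (T_a * d_s + T_s * d_a) / (T_a * T_s).

    Equivalently d_s / T_s (fraction of time spent scanning) plus
    d_a / T_a (fraction of time spent transmitting).
    """
    return (T_a * d_s + T_s * d_a) / (T_a * T_s)


eta = duty_cycle(T_a=10.0, T_s=100.0, d_s=5.0, d_a=1.0)
print(eta)  # 0.15
```

The decomposition in the docstring shows the trade-off exploited later: increasing $T_s$ only reduces the scanning share $d_s/T_s$, while the transmit share $d_a/T_a$ is fixed by $T_a$.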
In the following, we derive the worst-case discovery latency considering only maxima of order 0 (which means that $T_a < d_s - d_a$) and order 1 (which implies that $T_a > d_s - d_a$ and $\gamma < d_s - d_a$). For these cases, such a model can be formulated simply enough to derive optimal parametrizations. All relevant cases for computing the discovery latency are shown in Figure 5. Each case leads to a unique protocol variant. First, we consider a situation which has only minima of order 0, as shown in Figure 5 a).

Figure 5. Modeling the neighbor discovery latency for a) order-0 and b), c) order-1 problems.
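The conditions above can be summarized as a small classifier. The sketch is hypothetical and makes two assumptions that the text leaves open: the boundary case $T_a = d_s - d_a$ is treated as order 0 (this is the value chosen later by Equation 6), and $\gamma$ is computed as in Equation 3. Parameter sets that satisfy neither condition, including singular ones with $\gamma = 0$, are reported as higher order (`None`).

```python
import math
from typing import Optional


def latency_order(T_a: float, T_s: float, d_s: float, d_a: float) -> Optional[int]:
    """Return 0 or 1 for order-0 / order-1 parameter sets, None otherwise."""
    window = d_s - d_a  # largest packet-start offset that still fits in a scan window
    if T_a <= window:
        return 0
    g = math.ceil(T_s / T_a) * T_a - T_s  # gamma, Equation 3
    if 0 < g < window:
        return 1
    return None


print(latency_order(2, 9, 3, 1))         # 0
print(latency_order(100, 290, 30, 5))    # 1
print(latency_order(100, 250, 30, 5))    # None (gamma = 50 is too large)
```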

Case a): The PI-0M-Protocol.

Figure 5 a) shows the packet ﬂow for T a ≤ d s , which is the necessaryand suﬃcient condition to obtain order-0 maxima, only. The ruled boxes depict the scan windows, whereasthe hatched vertical bars show the packets. We refer to this protocol as periodic interval-0M-protocol , since ithas one parameter M , as described below.As can be seen in Figure 5 a), d m is bound to approximately one scan-interval. It can be observed easily thatit is (5) d m = (cid:24) T s − d s + d a T a (cid:25) T a + d a . Choosing T a . To minimize the ceiling-function in Equation 5, it is beneﬁcial to set T a to its largestpossible value. Since we consider order-0 minima, we set(6) T a = d s − d a , since larger values of T a would allow for order-1 minima, too. Figure 6 depicts the maximum discovery latency d m for sweeping values of T a and T s and a ﬁxed value of d s . It also visualizes the parameter values deﬁned byEquation 6. As can be seen, they provide low maximum latencies. Smaller values of T a would lead to similarmaximum latencies, but to higher duty-cycles. Therefore, the values chosen by Equation 6 are optimal fororder-0 minima. Figure 6.

Discovery latencies and valuations chosen by Equation 6 ($d_s$ = 2.…).

For our proposed protocols, we define the discovery latency beginning from the point in time after which both devices have been active for the first time, since this is the latency observed in practical problems. Most other protocols define the latency beginning from the point in time after which the first device has been active for the first time. However, the difference is small.

Optimizing $T_s$. Within certain bounds, the duty-cycle $\eta$ defined by Equation 4 becomes smaller for increasing values of $T_s$. However, if $T_s$ exceeds certain thresholds, the ceiling function in Equation 5 turns over to its next higher value. Therefore, we attempt to set $T_s$ to its largest possible value under the constraint that the ceiling function must not increase. Hence, we require

(7) $\frac{T_s - (d_s - d_a)}{d_s - d_a} \overset{!}{=} M - \epsilon', \quad M \in \mathbb{N}$.

We assume that $\epsilon'$ is an arbitrarily small value. By solving this, the optimal value of $T_s$ is defined as follows:

(8) $T_s = (M + 1)(d_s - d_a) - \epsilon$.

In Equation 8, we replaced $\epsilon'$ by $\epsilon$. The relation between $\epsilon$ and $\epsilon'$ is not relevant, as long as both values are small. To ease readability, we assume $\epsilon = 0$ in all subsequent equations and describe its valuations for practical implementations in Section 4. The maximum latency for parametrizations selected according to Equation 6 and Equation 8 is defined as follows:

(9) $d_m = M (d_s - d_a)$.

The desired target duty-cycle defines the value of $d_s$. From Equation 4, it follows that

(10) $d_s(\eta) = d_a + \frac{d_a (M + 2)}{\eta (M + 1) - 1}$.

The only parameter left is $M$. By differentiating the duty-cycle-latency product $\eta \cdot d_m$, it follows that there is a local minimum at

(11) $M_{opt} = \frac{\sqrt{1 - \eta^2} + 1}{\eta} - 1$.

In addition, there are multiple constraints on $M$ which need to be kept. First, $d_s > 0$ requires

(12) $M > \frac{1}{\eta} - 1 = M_{min}$.

Further, we assume that there is a minimum scan window $d_{s,l}$ the hardware supports. Hence, we require $d_s > d_{s,l}$. For $\eta < \frac{d_a}{d_{s,l} - d_a}$, this implies

(13) $M \ge \frac{d_{s,l}(\eta - 1) - d_a(\eta + 1)}{d_a(\eta + 1) - \eta d_{s,l}}$.

It can be shown that this is always fulfilled if Equation 12 is kept. For $\eta > \frac{d_a}{d_{s,l} - d_a}$, it is required that $M \le M_{max}$ with

(14) $M_{max} = \frac{d_{s,l}(\eta - 1) - d_a(\eta + 1)}{d_a(\eta + 1) - \eta d_{s,l}}$.

Therefore, we set

(15) $M = \begin{cases} \mathrm{round}(M_{opt}), & \text{if } M_{opt} \le M_{max}, \\ \lfloor M_{max} \rfloor, & \text{else.} \end{cases}$

Finally, it is required that $\lceil M_{min} \rceil \le \lfloor M_{max} \rfloor$.
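The PI-0M parametrization of Equations 6 and 8 to 15 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: times are in seconds, $\epsilon = 0$ when solving Equation 10, $d_{s,l} > d_a$, and feasibility is checked with the smallest integer strictly above $M_{min}$ (a safeguard added here, not stated in the text), which also keeps the denominator of Equation 10 positive. The names `pi_0m_params` and `Pi0MParams` are hypothetical.

```python
import math
from typing import NamedTuple


class Pi0MParams(NamedTuple):
    m: int        # integer parameter M
    d_s: float    # scan window (s)
    t_a: float    # advertising interval (s)
    t_s: float    # scan interval (s)
    d_m: float    # worst-case latency per Equation 9 (s)


def pi_0m_params(eta: float, d_a: float, d_sl_min: float, eps: float = 0.0) -> Pi0MParams:
    """Hypothetical helper: PI-0M parameters for a target duty-cycle eta."""
    m_min = 1.0 / eta - 1.0                                  # Eq. 12
    m_opt = (math.sqrt(1.0 - eta ** 2) + 1.0) / eta - 1.0    # Eq. 11
    if eta > d_a / (d_sl_min - d_a):                         # Eq. 14 applies
        m_max = (d_sl_min * (eta - 1) - d_a * (eta + 1)) / (d_a * (eta + 1) - eta * d_sl_min)
    else:                                                    # Eq. 13 is implied by Eq. 12
        m_max = math.inf
    m_lo = math.floor(m_min) + 1        # smallest integer strictly greater than M_min
    if m_max != math.inf and m_lo > math.floor(m_max):
        raise ValueError("target duty-cycle not realizable with this minimum scan window")
    m = round(m_opt) if m_opt <= m_max else math.floor(m_max)   # Eq. 15
    m = max(m, m_lo)                    # safeguard (assumption): keep d_s finite and positive
    d_s = d_a + d_a * (m + 2) / (eta * (m + 1) - 1)          # Eq. 10 (derived with eps = 0)
    t_a = d_s - d_a                                          # Eq. 6
    t_s = (m + 1) * (d_s - d_a) - eps                        # Eq. 8
    d_m = m * (d_s - d_a)                                    # Eq. 9
    return Pi0MParams(m, d_s, t_a, t_s, d_m)


params = pi_0m_params(eta=0.05, d_a=368e-6, d_sl_min=10e-3)
print(params)
```

For $\eta = 5\,\%$, $d_a = 368$ µs and $d_{s,l} = 10$ ms, this sketch yields $M = 39$, $d_s \approx 15.46$ ms, $T_a \approx 15.09$ ms and $T_s \approx 603.5$ ms; substituting these values back into Equation 4 reproduces the target duty-cycle. For $\eta = 25\,\%$ the sketch reports that no integer $M$ fits the constraints.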
The requirement $\lceil M_{min} \rceil \le \lfloor M_{max} \rfloor$ translates into the (conservative) condition $M_{max} - M_{min} \ge 1$, which limits the duty-cycles the PI-0M-protocol can realize to

(16) $\eta \le \eta_{max} = \frac{d_a + \sqrt{d_a d_{s,l}}}{d_{s,l} - d_a}$.

Case b) and c): The PI-kM+-Protocols.

The assumption $T_a < d_s - d_a$ is quite restrictive. We therefore allow larger values of $T_a$, as long as $\gamma < d_s - d_a$. Such valuations lead to order-1 processes, in which the distance $\Phi[i]$ shrinks or grows in multiples of $\gamma$ until reaching a scan event. Figure 5 b) shows the situation in which the offset shrinks in each scan interval, whereas Figure 5 c) depicts the case in which it grows. In this figure, the first advertising event and the first scan event start at the same point in time. Therefore, the advertising packets neighboring the second scan event determine which case applies: if the temporally left neighbor is closer than the right one, the process is of shrinking type (case b)). Otherwise, it is growing (case c)). In Figure 5 b) and c), the space between the scan events is subdivided into multiple parts. If an advertising packet starts within one of the colored rounded rectangles, the order-0 process causes a deterministic match after the number of $T_a$-intervals depicted in each box. These boxes are marked with a *. The other, non-colored rectangles contain the number of $T_a$-intervals until reaching the location temporally closest to the second scan event. From this location, the distance is further reduced in multiples of $\gamma$. The value of $\gamma$ is the distance from the second scan event to its closest neighboring advertising packet, as shown in the figure. In case b), the situation is shrinking, and hence the distance is reduced from the right side of the scan event until the advertising packet reaches it. For case c), the distance is reduced in multiples of $\gamma$ from the left side. For example, in case b), the transmission of the first advertising packet might start in box 2. Then, two advertising intervals take place until reaching the field marked with 0′ in the figure.
From there, the temporal distance to the second scan event is reduced in multiples of $\gamma$ until it is reached. Another example is an advertising packet that starts in the box marked with 0. From there, the distance directly shrinks in multiples of $\gamma$ towards the first scan event. The box marked with x can never be reached by any advertising packet. With these considerations, one can derive the worst-case latency for case b) as follows:

(17) $d_m = \left\lceil \frac{T_s - T_a}{T_a} \right\rceil T_a + \left\lceil \frac{T_a - (d_s - d_a)}{\gamma} \right\rceil \cdot \left\lfloor \frac{T_s}{T_a} \right\rfloor T_a + d_a$.

For case c), it is similarly:

(18) $d_m = \left\lceil \frac{T_s + d_s - d_a}{T_a} - 2 \right\rceil T_a + \left\lceil \frac{T_a - d_s + d_a}{\gamma} \right\rceil \cdot \left\lceil \frac{T_s}{T_a} \right\rceil T_a + d_a$.

3.3.1. Choosing $T_a$. When examining Equations 17 and 18, it is clear that the minimum worst-case latencies are achieved for maximum values of $\gamma$. The maximum value of $\gamma$ is $d_s - d_a$, since otherwise, processes of order 2 and higher would take place. Hence, the minimum worst-case latency is reached for $\gamma = d_s - d_a$. From Figure 5 b) and c), it can be seen that $\gamma$ is the difference between the second scan event and its closest neighboring advertising packet. It follows that this value can be realized by setting the advertising interval $d_s - d_a$ time units shorter or longer than one scan interval:

(19) $T_a = T_s \pm (d_s - d_a)$.

Figure 7 shows the worst-case discovery latencies for $d_s$ = 2.…, $d_a = 246$ µs and sweeping values of $T_s$ and $T_a$. Further, the values of $T_a$ defined by Equation 19 are highlighted. As can be seen, they all lie within local latency minima. We refer to the protocol that sets $T_a$ longer than $T_s$ by $d_s - d_a$ as the PI-kM+-protocol (because of the +-operation in Equation 19), and to the protocol that sets $T_a$ shorter than $T_s$ as PI-kM−. In the following, we analyze valuations following the PI-kM+-scheme in detail.
The optimization is more elaborate than for the PI-0M-protocol, since there is another parameter $k$, which is used to define additional valuations of $T_a$ for achieving even lower maximum latencies. It is introduced next.
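To make Equations 17 and 18 concrete, the sketch below evaluates them with exact integer arithmetic in microseconds, so that the ceiling and floor functions are not affected by floating-point rounding. It determines $\gamma$ and the case (shrinking or growing) from the two advertising packets neighboring the second scan event, assuming, as in Figure 5, that the first advertising event and the first scan event coincide. It also assumes that the first ceiling term of Equation 18 reads $\lceil (T_s + d_s - d_a)/T_a - 2 \rceil$. The function names are hypothetical.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Exact ceiling of a / b for integers with b > 0."""
    return -(-a // b)


def order1_worst_case(t_a: int, t_s: int, d_s: int, d_a: int) -> Tuple[str, int, int]:
    """Worst-case latency of an order-1 PI schedule via Equations 17 and 18.

    All times are integer microseconds. Returns (case, gamma, d_m), where case
    is "b" (shrinking offset) or "c" (growing offset).
    """
    w = d_s - d_a                  # largest beacon offset that still fits a scan window
    left = t_s % t_a               # gap to the beacon preceding the 2nd scan event
    right = t_a - left             # gap to the beacon following it
    case, gamma = ("b", left) if left <= right else ("c", right)
    if not (t_a > w and 0 < gamma <= w):
        raise ValueError("parameters do not describe an order-1 process")
    shifts = ceil_div(t_a - w, gamma)          # number of gamma-steps in the second term
    if case == "b":                            # Eq. 17
        d_m = ceil_div(t_s - t_a, t_a) * t_a + shifts * (t_s // t_a) * t_a + d_a
    else:                                      # Eq. 18, first term as ceil(...) - 2
        d_m = (ceil_div(t_s + w, t_a) - 2) * t_a + shifts * ceil_div(t_s, t_a) * t_a + d_a
    return case, gamma, d_m


# d_a = 368 us, d_s - d_a = 10 ms, epsilon = 30 us; T_s and T_a follow T_s = (k(M+1)-1)(d_s-d_a) - eps
# and T_a = (T_s + d_s - d_a)/k.
print(order1_worst_case(t_a=19_985, t_s=29_970, d_s=10_368, d_a=368))   # k = 2, M = 1
print(order1_worst_case(t_a=59_970, t_s=49_970, d_s=10_368, d_a=368))   # k = 1, M = 5
```

The first call is a shrinking (case b) valuation, the second a growing (case c) one. In both examples, the result equals $(k(M+1) - 2)\,T_a + d_a$, i.e. 40 338 µs and 240 248 µs, respectively.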

Figure 7.

Values and associated maximum discovery latencies chosen by Equation 19.

3.3.2. Optimizing $T_a$. Equations 17 and 18 can be further minimized by reducing $T_a$ without modifying the value of $\gamma$. The main idea is that dividing $T_a$ by an integer number $k$ does not change $\gamma$, as can be verified by inserting additional packets into the situations shown in Figure 5 b) and c). This measure reduces the maximum distance $\Phi[0]$ that needs to be shrunk in multiples of $\gamma$, and therefore reduces the maximum discovery latency. Instead of the interval length defined by Equation 19, we choose $T_a$ as follows:

(20) $T_a = \frac{1}{k}(T_s + d_s - d_a)$.

Figure 8 shows the values and maximum latencies when choosing $T_a$ according to Equation 20. Increasing values of $k$ reduce the maximum discovery latencies, but slightly increase the duty-cycles. Hence, the value of $k$ needs to be set such that an optimum is reached. The lower limit of $k$ is 1, and the upper limit is defined by requiring $T_a > d_a$, which we never reach in practice. Since $T_a$ is a function of $T_s$, the optimization of $T_s$ is the next step towards finding an optimal parametrization.

Figure 8.
Impact of choosing $k$ in Equation 20.

Figure 9.
The product $\eta \cdot d_m$ for $d_s$ = 3.… and $k = 1$ when choosing $T_a$ according to Equation 20.

3.3.3. Optimizing $T_s$. Figure 9 shows $\eta \cdot d_m$ for a fixed value of $d_s$ = 3.… and $k = 1$. Pareto-efficient values are the ones depicted by the circles. Since the duty-cycle $\eta$ decreases for increasing values of $T_s$, but $d_m$ remains constant as long as the ceiling functions in Equations 17 and 18 do not change their values, we need to find values of $T_s$ that maximize the terms within the ceiling functions without causing them to turn over. Hence, we require

(21) $\frac{T_s + (d_s - d_a)(1 - k)}{k(d_s - d_a)} \overset{!}{=} M - \epsilon', \quad M \in \mathbb{N}$,

where $\epsilon'$ is a number close to zero. Solving Equation 21 for $T_s$ leads to:

(22) $T_s = (k(M + 1) - 1)(d_s - d_a) - \epsilon$.

For convenience, we have introduced a parameter $\epsilon$ instead of $\epsilon'$, as described for the PI-0M-protocol. Equations 20 and 22 define the parametrizations of the PI-kM+-protocol. When combining Equations 20, 22, 17 and 18, its maximum discovery latency can be computed as follows:

(23) $d_m = \begin{cases} (M - 1)\left((M + 1)(d_s - d_a) - \epsilon\right) + d_a, & \text{if } k = 1, \\ \left(k(M + 1)(d_s - d_a) - \epsilon\right) \cdot \frac{k(M + 1) - 2}{k} + d_a, & \text{else.} \end{cases}$

When assuming $\epsilon = 0$ (the error introduced by this is negligible, since $\epsilon$ is small), the duty-cycle can be computed as

(24) $\eta = \frac{d_s - d_a + k d_a + M(d_s + k d_a)}{(d_s - d_a)(M + 1)(k + kM - 1)}$.

Compared to the initial situation, in which there were three real-valued parameters $T_s$, $T_a$ and $d_s$, there are now one real-valued parameter $d_s$ and two integer numbers $k$ and $M$. All valuations lead to local latency

minima. In the following, we describe how these values can be further optimized by selecting the best of these minima. A value of $M = 1$ leads to the situation depicted in Figure 5 b), in which the offset $\Phi[i]$ shrinks in multiples of $\gamma$. For values of $M > 1$, situation 5 c) occurs, in which $\Phi[i]$ grows in multiples of $\gamma$. Next, we consider $M = 1$.

3.3.4. Case b): The PI-k1+-Protocol ($M = 1$). In the following, we assume $M = 1$ and analyze the resulting properties. For a given target duty-cycle $\eta$, we can choose $d_s$ as follows:

(25) $d_s(\eta) = \frac{d_a (2k - 1)(1 + 2\eta)}{2(\eta(2k - 1) - 1)}$.

Still, there is one degree of freedom $k$ left. We determine its value by analyzing the constraints on $d_s$. First, $d_s(\eta)$ needs to be larger than zero, and Equation 25 implies that

(26) $k > \frac{1}{2\eta} + \frac{1}{2}$.

In addition, we assume that the hardware supports a minimum scan window $d_{s,l}$, and require $d_s(\eta) \ge d_{s,l}$. We can define a limiting value for $k$ as follows:

(27) $k_l = \frac{2 d_{s,l}(1 + \eta) - d_a(1 + 2\eta)}{2\left(2\eta d_{s,l} - d_a(1 + 2\eta)\right)}$.

Since $d_s$ decreases with increasing $k$, satisfying $d_s \ge d_{s,l}$ requires

(28) $\begin{cases} k \ge k_l, & \text{if } \eta < \frac{d_a}{2(d_{s,l} - d_a)}, \\ k \le k_l, & \text{if } \eta > \frac{d_a}{2(d_{s,l} - d_a)}. \end{cases}$

For $\eta = \frac{d_a}{2(d_{s,l} - d_a)}$, the hardware constraint is kept if $d_{s,l} \ge d_a$, which is always true in practice. By differential computations, it can be shown that the maximum-latency-duty-cycle product $\eta \cdot d_m$ has a local minimum at $k = k_{opt}$, which is defined as

(29) $k_{opt} = \frac{1 + \sqrt{(1 - \eta)(1 + 2\eta)}}{2\eta} + \frac{1}{2}$.

The strategy for choosing $k$ works as follows.
• We set $k \leftarrow \mathrm{round}(k_{opt})$, if no constraint is violated.
• If any of the constraints mentioned above are violated, we set $k$ to the value closest to $k_{opt}$ that is allowed by all constraints.

With this scheme, there are no degrees of freedom left, and a given target duty-cycle can be realized with optimal parameters. When comparing Equations 26 and 28, one can derive that the maximum duty-cycle which can be realized by the PI-k1+-protocol is given by

(30) $\eta \le \frac{3 d_a + \sqrt{d_a(d_a + 8 d_{s,l})}}{8(d_{s,l} - d_a)}$.

Next, we examine $M > 1$.

3.3.5. Case c): The PI-k2+-Protocol ($M > 1$). As we will show below, increasing the value of $M$ above 2 is not beneficial in terms of the latency-duty-cycle product achieved. Since we keep $M$ at 2, we refer to this protocol as the PI-k2+-protocol. Again, we can differentiate $\eta \cdot d_m$ with respect to $k$ to find a local minimum $k_{opt}$:

(31) $k_{opt} = \frac{1}{M + 1} + \frac{\sqrt{(1 - \eta)(\eta(M + 1) + 1)} + 1}{\eta(M + 1)}$.

Next, we observe that $\frac{d}{dM}(\eta \cdot d_m)$ at $k = k_{opt}$ is positive for all $M > 1$. Hence, the slope of the duty-cycle-maximum-latency product is always positive, and higher values of $M$ lead to higher values of $\eta \cdot d_m$. Therefore, we set $M$ to its minimal value 2. Further, there are multiple constraints which need to be kept. First, $d_s(\eta)$ needs to be positive, and therefore

(32) $k > \frac{\eta + 1}{\eta(M + 1)} = k_{min}$.

In addition, the protocol should guarantee that no limits of the hardware are exceeded. We again assume a minimum scan duration of $d_{s,l}$ and require $d_s(\eta) \ge d_{s,l}$. This imposes the following upper limit on $k$:

(33) $k < \frac{d_{s,l}}{3\eta d_{s,l} - (3\eta + 1) d_a} + \frac{1}{3} = k_{max}$.

Since $k$ is an integer value, $\lceil k_{min} \rceil < \lfloor k_{max} \rfloor$ must be kept. A conservative but analytically solvable form of this inequality is

(34) $k_{min} + 1 < k_{max} - 1$.

This leads to a general limit of duty-cycles the

PI-k2+-protocol can realize. Whereas (almost) arbitrarily small duty-cycles are feasible, the upper limit is

(35) $\eta < \frac{3 d_a + \sqrt{d_a(d_a + 8 d_{s,l})}}{12(d_{s,l} - d_a)}$.

We again set $k$ to $\mathrm{round}(k_{opt})$, or as close to it as the constraints allow.

3.4. One-Way Discovery.

In our proposed protocol, we have modified the scheme used in ANT/ANT+ or BLE to obtain a symmetric version of it. The goal was to make it comparable to symmetric slotted protocols. However, ANT/ANT+, for example, can be seen as a special case of the protocol described at the beginning of this section, obtained by setting its parameters as follows: on the advertising device, $T_s \leftarrow \infty$, and on the scanner, $T_a \leftarrow \infty$. Therefore, the equations presented in this section can be used for selecting optimized parameter values, e.g., for ANT/ANT+. The parameters chosen by our theory would then determine the advertising interval for one device and the scan interval and scan window for the other one, thereby optimizing the joint energy consumption of both devices. This is highly relevant, since there is currently no known theory for optimizing the parameters of these protocols. Currently, parameters are chosen based on empirical data [20] or "good guesses".

4. Evaluation

In this section, we evaluate our proposed technique. To this end, we first compare the performance of the three protocols described in the previous section with each other. In addition, we compare the theoretically achieved maximum latencies and channel utilizations of our proposed technique against slotted protocols. Further, we demonstrate that our proposed protocol can be implemented on a radio and achieves the predicted performance in real-world measurements.

4.1. Hardware Parameters.

In this section, we attempt to derive reasonable values for the packet length $d_a$, the slot size $d_{sl}$ and the scan window length $d_{s,l}$. We assume a Nordic nRF51822 [21] radio, which supports multiple protocols, such as Bluetooth Low Energy (BLE). It allows for fast switching between sleep and active modes and between transmission and reception. Therefore, it is well-suited for implementing our proposed protocols.

For the BLE protocol, our scheme needs to be adjusted to account for the random offset that is added to $T_a$.

4.1.1. Beacon Length.

In typical neighbor discovery applications, it is not only required to detect that another device generates electromagnetic energy on a channel. Typically, one would like to transmit some information about the device during the discovery procedure, e.g., a device address, a device type and some application data. For example, the BLE location beacon Estimote [22] typically transmits beacons with a length of 46 bytes (incl. all overheads). On an nRF51822 radio with an over-the-air symbol rate of 1 Mbit/s, this translates to a beacon length of $d_a = 368$ µs. We assume this value for our evaluation.

4.1.2. Slot Length.

For comparing the performance of slotted protocols against periodic-interval protocols, there is one major challenge: slotted solutions define a maximum discovery latency in terms of $N_m$ slots, whereas the PI-kM-protocols guarantee an upper latency limit in terms of time passed. $N_m$ is independent of the slot length $d_{sl}$, and therefore the maximum latency of slotted protocols is $N_m \cdot d_{sl}$. The duty-cycle of slotted protocols does not depend on the slot length. Clearly, it is beneficial to set the slot length as short as possible. In contrast, the performance of periodic-interval protocols is not proportional to the smallest possible scan window length $d_{s,l}$. $d_{s,l}$ influences the largest supported duty-cycle and sometimes affects the adjustment of $k$ and $M$ due to hardware constraints, such that the performance can be reduced. Therefore, it should be minimized as well, but its impact is not crucial.

For slotted protocols, the minimum slot length $d_{sl}$ is limited by multiple factors. First, there is a fundamental limit of $3 \cdot d_a$, which can be derived from Figure 1 a) and b): every slot contains two beacons with length $d_a$ and a reception phase in between. The radio must listen for at least $d_a$ time units, since otherwise the packet cannot be received entirely. Next, packet collisions are an issue. From Figure 1 a) and b), it can be derived that the collision probability $P_c$ for slotted protocols is

(36) $P_c = \frac{3 d_a}{d_a + d_s}$.

To obtain a reasonable collision rate of, e.g., 10 %, a slot length of $27 d_a$ is required, which translates to $d_{sl} = 9.936$ ms for $d_a = 368$ µs. Hence, $d_{sl} \approx 10$ ms is a reasonable slot length, which we assume throughout the rest of this paper. This value has been assumed in multiple previous studies, e.g., in [1] and [11]. Another limiting factor is clock skew: the clocks of both devices must not drift by more than one slot length per period. For $d_{sl} = 10$ ms, one can show that this is not an issue, even if very inaccurate clocks are considered.

4.1.3. Scan Window Length.

In Section 3, we have assumed that there is a lower limit $d_{s,l}$ on the scan window. Whereas all scan windows larger than $d_a$ time units could be realized by the hardware, clock skew is the limiting factor. In the first part of our evaluation, we assume that $d_{s,l} = d_{sl} = 10$ ms for comparing the latencies achieved against the latencies of slotted protocols. Next, we demonstrate that $d_{s,l} = 10$ ms is a reasonable scan window length for PI-protocols, since it is sufficient to compensate for clock skew on a real-world implementation.

4.2. Discovery Latencies of PI-Protocols.

In this section, we evaluate the maximum discovery latencies of our three proposed protocol variants (viz. PI-0M, PI-k1+ and PI-k2+) for a given duty-cycle $\eta$. We have set the minimum scan window length $d_{s,l}$ to 10 ms, as described above. The parameter $\epsilon$ is assumed to be $\frac{1}{32768\,\mathrm{Hz}}$, since this is the smallest step size a typical crystal used as a sleep clock would support. Figure 10 depicts the maximum discovery latencies $d_m$ of all three protocols for different duty-cycles $\eta$. The latencies have been computed by the equations in Section 3. Starting from 1 %, the evaluation has been carried out for duty-cycles up to 23.… % (for $d_{s,l} = 10$ ms). As can be seen, for low duty-cycles, the performances of all three protocols are similar. The PI-0M-protocol offers the best latencies for larger duty-cycles. For increasing values of $\eta$, the latency function has non-linearities, which are caused by adjusting the protocol parameters to meet the constraints described. For the PI-0M-protocol, these non-linearities can become quite significant at larger duty-cycles. Therefore, we propose not to increase the duty-cycle beyond $\eta_{adj}$, which is defined as follows:

(37) η adj = d s,l − d a · √ d a − d a d s,l + d s,l + d a − d s,l d s,l − d a ).

The limit from Equation 37 ensures that k is not adjusted by more than one, which keeps the performance degradation negligible. Since all three protocol variants lead to almost identical latencies for small duty-cycles, each of them can be used in practical applications.

Figure 10.
Worst-case latencies $d_m(\eta)$.

4.3. Comparison against Slotted Protocols.

In the following, we compare the PI-kM protocols against multiple time-slotted solutions. We have chosen the following protocols for our comparison:
• Disco [1], because it is one of the first slotted discovery protocols and the most popular concept that relies on the Chinese Remainder Theorem. For ease of comparison, we set the two adjacent primes $p_1$ and $p_2$ that Disco requires to $p_1 = p_2$ in the equations for computing the latency. Since such a schedule cannot be realized in practice, Disco performs slightly worse in real-world implementations.
• U-Connect [9], because it is frequently used as a baseline in many comparisons. [9] proposes dedicated receiving slots (1 in each period) and transmission slots ((p + 1)/… µs, but requires large transmission slot lengths. However, it can only be realized if no payload information is transmitted during the discovery. In addition, it leads to high channel utilizations. We therefore assume a slotting scheme as depicted in Figure 1 a), with $d_{sl} = 10$ ms.
• Searchlight [2], since, to the best of our knowledge, it is the most efficient slotted symmetric discovery protocol that has been proven to be realizable on real hardware. We consider the striped version of Searchlight, since it offers the lowest discovery latencies. We assume that the overflow $\delta$ is 0 in all computations, which means that Searchlight performs slightly worse in practice.
• Optimal Diffcodes [4] provide a theoretical limit that is, to the best of our knowledge, the lowest limit for slotted protocols known so far for most duty-cycles. Although this limit has not been reached in practice except for a few duty-cycles, we include it in our comparison. The overflow $\delta$ is again assumed to be 0.

• Lightning [10], since it claims to achieve the lowest latency bounds of all known protocols, but has not yet been evaluated on hardware. We assume the parameters proposed in [10], i.e., β = δ = 0.….
• G-Nihao [11], since it constructs schedules similar to those of the PI-0M-protocol. Since G-Nihao provides a parameter to adjust the number of beacons per period, but does not come with any mechanism to find the optimal number of beacons per slot, we assume γ = n/m = 2, as assumed in the comparison in [11]. This means that the number of beacons per period n is twice the number of slots per period m. It should be mentioned that optimized parametrizations of G-Nihao are expected to achieve higher performance. However, the optimal number of beacons per period is not clear.

Since this study focuses on symmetric asynchronous discovery, we restrict our evaluation to the symmetric variants of the protocols mentioned above. Under these assumptions, the worst-case discovery latencies for these protocols are as shown in Table 1.

Table 1.
Worst-case discovery latencies of slotted protocols [1], [9], [2], [4], [10].
Protocol | d_m(η)
Disco | η d_sl
U-Connect | (√ η + η + η) d_sl
Searchlight | ⌈ ⌊ η ⌋ ⌉ d_sl
Optimal DiffCodes | η d_sl
Lightning | n(1+δ)+(n − δβ+1+2δη − (1 − δ)β + δ n(n+1) − δ + β(1 − δ) 2(n+1), n ≈ . √η + 2109 + 0 . η
G-Nihao | ( d_sl + d_a γ γη d_sl + √ d_sl + d_a γ γη d_sl − d_a d_sl ) γ

Among the three proposed protocols PI-0M, PI-k1+ and PI-k2+, we always choose the lowest latency for a given duty-cycle and refer to the resulting duty-cycle-latency relation as the PI-kM Opt-protocol.

4.3.1. Discovery Latencies.

In this section, we compare the discovery latencies of the PI-kM Opt-protocol to those achieved by slotted protocols for a relevant range of duty-cycles between 1 % and 20 %. The results of this comparison are depicted in Figure 11. As can be seen, the PI-kM Opt-protocol provides significantly shorter discovery latencies than all slotted protocols for all duty-cycles. For larger duty-cycles, the performance of the PI-kM Opt-protocol deteriorates slightly due to the adjustments of the parameters for meeting the hardware constraints. Table 2 shows the maximum gains $G_m$ and the mean gains $G$ over slotted protocols, defined as $G = \frac{d_{m,\mathrm{slotted}}}{d_{m,\mathrm{PI\text{-}kM\,Opt}}}$. For example, for a given duty-cycle, in the worst case, Searchlight would take 10.2 times as long as the PI-kM Opt-protocol to discover a neighbor. As already mentioned, G-Nihao constructs schedules similar to those constructed by the PI-0M-protocol, using pseudo-slots. Whereas it has been shown in [11] that such protocols can guarantee a discovery that is faster by a factor of 1.65 than, e.g., Searchlight-S with the same duty-cycle, it has not been studied how such protocols perform in the optimal case, when allowing higher duty-cycles. Our results indicate that the PI-kM Opt-protocol outperforms G-Nihao (with γ = 2) by up to a factor of 3.1.

4.3.2. Channel Utilization.

Unlike in slotted protocols, beaconing and receiving are temporally decoupled in our proposed protocol. Since beacons are short, a large number of beacons does not increase the duty-cycle significantly, but reduces the maximum discovery latency. However, this could potentially cause large channel utilizations, which would lead to high collision rates and make the protocol impracticable for many situations. To show that the channel utilization is within reasonable bounds, we evaluate it and compare it to the utilizations of slotted protocols. The channel utilization is defined as the sum of time units in which a packet is transmitted on the channel divided by the total time, when considering only one device. As can be seen in Figure 12, it is indeed increased compared to the slotted protocols. However, the total utilization is always below 4 %. Between two devices discovering each other, this would lead to a collision rate of up to 8 %. For most duty-cycles, it is significantly lower. It should be highlighted that this is an improvement over slotted protocols: the collision rate of all slotted protocols considered is 10 % for $d_{sl} = 10$ ms and remains constant for all duty-cycles. This advantage is caused by distributing the packets equally over time, whereas in slotted protocols, all beacons are sent temporally compressed within the active slots only. It can be seen that the pseudo-slotted protocol G-Nihao with γ = 2 has a channel utilization similar to that of Searchlight-S, as its authors claim. Nevertheless, it outperforms all other protocols except Optimal Diffcodes and PI-kM+ Opt.

In our evaluation, the PI-0M-protocol always performed best.

Figure 11.
Worst-case latencies of slotted protocols against the PI-kM Opt-protocol.

Table 2.
Gain in theoretical worst-case discovery latencies over slotted protocols. Values marked with a * indicate the theoretical gains with the modifications described in Section 4.5.
Protocol | G (mean) | G_m (max) | G* (mean) | G*_m (max)
Disco | 23.5 | 40.0 | 22.1 | 40.0
U-Connect | 13.7 | 22.5 | 12.9 | 22.5
Searchlight-SR | 6.0 | 10.2 | 5.7 | 10.2
Opt. DiffCodes | 2.9 | 5.0 | 2.8 | 5.0
Lightning | 3.7 | 4.3 | 3.4 | 4.2
G-Nihao | 2.1 | 3.1 | 2.0 | 2.8

4.3.3. Average-Case Behavior.

Figure 12.
Comparison of channel utilizations.

Figure 13.
Cumulative distribution function (CDF) for duty-cycles η = 1 % and η = 3 %.

Figure 13 shows that all three protocol variants have nearly identical cumulative distribution functions (CDFs). The curves have been generated by computing the latencies for each possible initial temporal offset between the two devices. Since the offsets can be subdivided into multiple fields with constant latencies [8], the number of possible offsets is finite, and all possibilities can be evaluated exhaustively. As can be observed, the CDFs of the PI-kM+ protocol family are linear functions, which means that their average-case discovery latencies are half of the worst-case latencies.

Compared to the slotted protocols considered in this paper, the average-case performance gains of our proposed protocols are as follows. The non-randomized version of Searchlight, as well as U-Connect, have nearly linear CDFs [2], [7]. Therefore, the performance gains in terms of average latencies are similar to the ones presented in terms of the worst-case latencies. The CDFs of Disco, Lightning and Optimal Difference Codes [1], [10], [4] are concave downwards. Therefore, the performance gains of our proposed protocols regarding the average latencies will exceed the gains regarding the worst-case latencies presented in Section 4.3.1.
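The idea of evaluating every initial offset can be illustrated with a brute-force sketch. Unlike the field-based method of [8], it simply sweeps the advertiser's start offset φ over a 1 µs grid, and it only models the one-way case: one advertiser and one scanner whose windows are $[m T_s, m T_s + d_s]$ for $m = 0, 1, 2, \ldots$ The latency is taken from the advertiser's start until the end of the first beacon that lies completely within a scan window. These modeling choices and the function names are assumptions for illustration.

```python
from typing import List


def first_discovery(phi: int, t_a: int, t_s: int, d_s: int, d_a: int,
                    max_beacons: int = 100_000) -> int:
    """One-way latency (integer µs) for an advertiser that starts at phi."""
    for n in range(max_beacons):
        t = phi + n * t_a              # start of the n-th beacon
        if t % t_s <= d_s - d_a:       # beacon [t, t + d_a] fits into the current window
            return t + d_a - phi
    raise RuntimeError("no discovery within max_beacons beacons")


def sweep_offsets(t_a: int, t_s: int, d_s: int, d_a: int, step: int = 1) -> List[int]:
    """Latency for every start offset phi in [0, t_s) on a grid of `step` µs."""
    return [first_discovery(phi, t_a, t_s, d_s, d_a) for phi in range(0, t_s, step)]


# PI-0M valuation: d_a = 368 us, T_a = d_s - d_a = 10 ms, M = 3, epsilon = 30 us.
lat = sweep_offsets(t_a=10_000, t_s=39_970, d_s=10_368, d_a=368)
print("worst case:", max(lat), "us, mean:", sum(lat) / len(lat), "us")
```

For this valuation, the worst case found by the sweep is 30 368 µs, which equals the bound of Equation 5, $\lceil (T_s - d_s + d_a)/T_a \rceil T_a + d_a$. Sorting `lat` yields an empirical CDF of the one-way latency.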

Figure 14.

Realizable duty-cycles of PI-kM+ Opt and G-Nihao. Every circle depicts a realizable duty-cycle of G-Nihao, whereas every cross depicts a realizable duty-cycle of PI-kM+ Opt.

4.4. Duty-Cycle Granularity.

Besides shorter latencies, slotless protocols have a better duty-cycle granularity than slotted ones. In particular, slotless protocols can realize every possible duty-cycle in practice, and the only limit is the granularity supported by the timers of the radio. To verify this, we have conducted the following experiment. A large number of target duty-cycles $\eta_t$ have been defined by sweeping $\eta_t$ between 0.… … $d_{sl} = 10$ ms and $\gamma = n/m = 2$, as assumed in [11]. We have computed the value $m$ to realize these duty-cycles. In addition, we have parametrized the PI-kM+ Opt-protocol for these duty-cycles, as described in Section 3. Since G-Nihao is based on a pseudo-slotted theory, $m$ needs to be an integer value. Therefore, we have rounded the resulting value of $m$ to the nearest integer. This limits the duty-cycles that can be realized. Figure 14 shows the results of this experiment. Every realizable duty-cycle of G-Nihao is depicted by a circle, whereas every realizable duty-cycle of PI-kM+ Opt is depicted by a cross. As can be seen, the crosses overlap with each other, since literally every duty-cycle can be realized. In contrast, only a finite number of duty-cycles can be realized by G-Nihao. Especially for larger duty-cycles, the granularity decreases significantly. From the literature, it is known that other slotted protocols are even more restrictive. For example, Disco limits possible duty-cycles to the sum of reciprocals of two prime numbers [1]. Since slotless protocols do not have this limitation, they allow for continuous online duty-cycle controllers, which optimize the duty-cycle continuously during run-time, e.g., given the battery level as an input.
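To illustrate the granularity argument, the sketch below compares, for one target duty-cycle, the closest duty-cycle Disco can realize with a PI-0M valuation whose $d_s$, $T_a$ and $T_s$ are snapped to a timer grid. It assumes Disco duty-cycles of the form $1/p_1 + 1/p_2$ with distinct primes $p_1 < p_2 \le 1000$, a 1/32768 s sleep-clock tick, a fixed $M = 39$ and independent rounding of each interval; all of these choices are illustrative, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List

TICK = 1 / 32768  # assumed sleep-clock resolution in seconds


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, p in enumerate(is_prime) if p]


def disco_closest(target: float, max_prime: int = 1000) -> float:
    """Closest Disco duty-cycle 1/p1 + 1/p2 with distinct primes p1 < p2."""
    candidates = (1 / p1 + 1 / p2 for p1, p2 in combinations(primes_up_to(max_prime), 2))
    return min(candidates, key=lambda eta: abs(eta - target))


def pi_0m_quantized(target: float, m: int, d_a: float) -> float:
    """Duty-cycle of a PI-0M valuation after snapping its intervals to TICK."""
    d_s = d_a + d_a * (m + 2) / (target * (m + 1) - 1)   # Eq. 10
    t_a = d_s - d_a                                     # Eq. 6
    t_s = (m + 1) * t_a                                 # Eq. 8 with epsilon = 0

    def snap(x: float) -> float:
        return round(x / TICK) * TICK

    return snap(d_s) / snap(t_s) + d_a / snap(t_a)      # Eq. 4


target = 0.05
disco_err = abs(disco_closest(target) - target)
pi_err = abs(pi_0m_quantized(target, m=39, d_a=368e-6) - target)
print(f"Disco error: {disco_err:.2e}, quantized PI-0M error: {pi_err:.2e}")
```

For a 5 % target, the nearest Disco duty-cycle in this prime range (1/23 + 1/151) deviates by roughly $10^{-4}$, whereas the quantized PI-0M valuation deviates by only a few $10^{-6}$, an error that stems solely from snapping the intervals to the timer grid.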

Figure 15.

Measured against computed values.

4.5. Real-World Implementation.

Since the clock of any hardware is subject to skew, slight adjustments of our proposed protocols are required to compensate for it. These adjustments differ slightly for each of the three protocol variants. In the following, we describe them for the most efficient solution, the PI-0M-protocol. First, the ceiling function in Equation 5 might turn over for positive clock skews. Therefore, $\epsilon$ needs to be set such that it exceeds the largest possible skew $\psi_m$. For the valuations of $\eta$ we consider, the largest scan interval that occurs is around 15 s. When assuming a crystal with an accuracy of, e.g., 20 ppm, one would compute $\epsilon$ as $2 \cdot 20 \cdot 10^{-6} \cdot 15\,\mathrm{s} = 600$ µs, since the clocks of both devices might skew in opposite directions. In addition, the occurrence of higher-order processes needs to be prevented, and therefore $T_a + \psi_m < d_s - d_a$. This can be fulfilled by shortening $T_a$ by $\epsilon_{T_a}$, such that $\epsilon_{T_a}$ exceeds $\psi_m$. Therefore, $\epsilon_{T_a} = \epsilon = 600$ µs. These adjustments slightly reduce the performance of the PI-kM Opt-protocol. The gains over slotted protocols marked with * in Table 2 account for them. As can be seen, the differences to the original gains are small.

To demonstrate that it is realizable in practice, we have implemented the PI-kM Opt-protocol on a radio. Two nRF51822 USB dongles with custom firmware based on the open-source BLE stack blessed [23] have been connected to a laptop. In our experiments, each device repeatedly chose a random point in time between 0 and $T_s$, after which the advertising was started. Once a packet was received by a device, it was reported to the laptop, which recorded the points in time at which this occurred. After either both devices had discovered each other or a timeout of 30 seconds (which exceeded $d_m$ by more than a factor of 2 in the worst case) was reached, the advertising was stopped and another random offset was chosen for the next round. For each duty-cycle, the experiment was repeated 100 times. The measured discovery latencies are shown in Figure 15 together with the computed upper limits $d_m(\eta)$ (which also account for $\epsilon$ and $\epsilon_{T_a}$). Each small point represents the result of one measurement, whereas the solid line depicts the computed limit. Figure 15 a) shows the asymmetric, one-way case, in which one device advertises and the other one scans. As can be seen, the measurements confirm our theory. The measured points always lie below the theoretical upper bound. Small deviations are mainly caused by the latency of the USB connection to the laptop. These results not only show that the PI-kM Opt-protocol can be realized in practice, but also demonstrate that an outstanding performance can be reached by real implementations of it. Figure 15 b) shows the measurements of the symmetric, two-way case, in which both devices advertise and scan. As can be seen, the large majority of the measured latencies lie below the computed curve. However, the predicted maximum latencies are exceeded by a certain fraction of measurements. This behavior is caused by packet collisions. The collision rate has been around 7.… %.

5. Concluding Remarks

We have introduced a novel discovery protocol based on optimized parametrizations of purely interval-based schemes. It achieves significantly lower discovery latencies than all known slotted protocols. Whereas previous studies have shown that breaking away from the assumption of slotted time can increase performance while maintaining the same channel utilization, we demonstrate that if such protocols are optimized for performance, much lower latencies than those of all existing protocols can be guaranteed while still maintaining acceptable channel utilizations. In addition to these low latencies, and unlike all previously proposed deterministic protocols, slotless solutions can realize practically every duty-cycle within the range of interest. Given these results, we hope to motivate more researchers to work on slotless protocols in the future.

There is considerable potential for future optimizations. For example, starting from Disco [1], subsequent slotted protocols performed better than their predecessors because they added extra slots based on a hyper-period. This concept could be adopted for interval-based protocols as well. Further, while we focused on valuations with $T_s = T_a + d_s - d_a$, protocols with $T_s = T_a - (d_s - d_a)$ need to be evaluated, too.

Besides two-way discovery, the asymmetric one-way case is of high interest, since it is widely used in protocols like BLE and ANT/ANT+. Our proposed optimizations can be applied to these protocols as well. For ANT/ANT+, they can be used directly without any modifications; hence, we have also presented the first mathematical framework for optimizing its parameters in a systematic fashion. For BLE, small changes are needed to account for the random delay and for discovery on three channels. However, the basic concept remains the same.
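The two parametrizations of $T_s$ mentioned above can be explored numerically. The following Python sketch is a simplified, drift-free model, not the authors' implementation or analysis: one device advertises packets of duration $d_a$ every $T_a$, starting at an offset $\phi$, while the other scans for $d_s$ every $T_s$ starting at time 0, and a packet is assumed to be received only if it lies completely inside a scan window. The helper names `guard_time_us`, `one_way_latency_us` and `worst_case_us`, the integer-microsecond time base and the example values ($d_a$ = 100 µs, $d_s$ = 1100 µs, $T_a$ = 20 ms) are hypothetical; collisions, clock skew and BLE's random advertising delay are ignored. The guard-time helper merely reproduces the $\epsilon = 2 \cdot 20\,\mathrm{ppm} \cdot 15\,\mathrm{s}$ arithmetic of the implementation section.

```python
from typing import Optional


def guard_time_us(ppm: int, interval_s: int) -> int:
    """Hypothetical helper: worst-case relative drift (in µs) over one interval
    when both clocks may drift by `ppm` in opposite directions (factor 2).
    Multiplying ppm by seconds directly yields microseconds."""
    return 2 * ppm * interval_s


def one_way_latency_us(t_a: int, t_s: int, d_s: int, d_a: int, phi: int,
                       max_packets: int = 10_000) -> Optional[int]:
    """Hypothetical drift-free model (assumption): packet i occupies
    [phi + i*t_a, phi + i*t_a + d_a]; scan window k is [k*t_s, k*t_s + d_s].
    Returns the time from the start of the first packet until the end of the
    first packet lying completely inside a scan window, or None if no packet
    is received within max_packets. All times are integer microseconds."""
    for i in range(max_packets):
        start = phi + i * t_a
        offset = start % t_s          # packet position within its scan period
        if offset + d_a <= d_s:       # packet fully covered by the scan window
            return start + d_a - phi
    return None


def worst_case_us(t_a: int, t_s: int, d_s: int, d_a: int) -> Optional[int]:
    """Largest latency over all integer start offsets 0 <= phi < t_s,
    or None if some offset never leads to a discovery."""
    worst = 0
    for phi in range(t_s):
        latency = one_way_latency_us(t_a, t_s, d_s, d_a, phi)
        if latency is None:
            return None
        worst = max(worst, latency)
    return worst


if __name__ == "__main__":
    print(guard_time_us(20, 15))  # 600, i.e. epsilon = 600 µs

    d_a, d_s, t_a = 100, 1_100, 20_000  # hypothetical example values (µs)
    t_s = t_a + d_s - d_a               # T_s = T_a + d_s - d_a
    print(worst_case_us(t_a, t_s, d_s, d_a))  # 400100 (µs)
```

Sweeping every integer offset is a deterministic counterpart to the random start offsets drawn in the hardware experiment. Because each advertising period shifts the packet relative to the scan grid by exactly $d_s - d_a$, in this idealized model the packet cannot jump over the region in which it fits into a scan window, so every offset eventually leads to a discovery.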

References

[1] P. Dutta and D. Culler, “Practical asynchronous neighbor discovery and rendezvous for mobile sensing applications,” in ACM Conference on Embedded Network Sensor Systems (SenSys), 2008.
[2] M. Bakht, M. Trower, and R. Kravets, “Searchlight: won’t you be my neighbor?” in Annual International Conference on Mobile Computing and Networking (MOBICOM), 2012.
[3] Y. Tseng, C.-S. Hsu, and T.-Y. Hsieh, “Power-saving protocols for IEEE 802.11-based multi-hop ad hoc networks,” in IEEE Conference on Computer Communications (INFOCOM), 2002.
[4] T. Meng, F. Wu, and G. Chen, “On designing neighbor discovery protocols: A code-based approach,” in IEEE Conference on Computer Communications (INFOCOM), 2014.
[5] C. Schurgers, V. Tsiatsis, S. Ganeriwal, and M. Srivastava, “Optimizing sensor networks in the energy-latency-density design space,” IEEE Transactions on Mobile Computing (TMC), vol. 1, no. 1, pp. 70–80, 2002.
[6] Dynastream Innovations Inc., “ANT message protocol and usage,” 2014, revision 5.1, available via thisisant.com.
[7] A. Kandhalu, A. Xhafa, and S. Hosur, “Towards bounded-latency bluetooth low energy for in-vehicle network cable replacement,” in International Conference on Connected Vehicles and Expo (ICCVE), 2013.
[8] P. Kindt, M. Saur, and S. Chakraborty, “Neighbor discovery latency in BLE-like duty-cycled protocols,” CoRR, vol. abs/1509.04366, 2015.
[9] A. Kandhalu, K. Lakshmanan, and R. Rajkumar, “U-connect: a low-latency energy-efficient asynchronous neighbor discovery protocol,” in International Conference on Information Processing in Sensor Networks (IPSN), 2010.
[10] L. Wei, B. Zhou, X. Ma, D. Chen, J. Zhang, J. Peng, Q. Luo, L. Sun, D. Li, and L. Chen, “Lightning: A high-efficient neighbor discovery protocol for low duty cycle wsns,” IEEE Communications Letters, vol. PP, no. 99, pp. 1–1, 2016.
[11] Y. Qiu, S. Li, X. Xu, and Z. Li, “Talk more listen less: Energy-efficient neighbor discovery in wireless sensor networks,” in IEEE Conference on Computer Communications (INFOCOM), to appear, 2016.
[12] W. Sun, Z. Yang, W. Keyu, and L. Yunhao, “Hello: A generic flexible protocol for neighbor discovery,” in IEEE Conference on Computer Communications (INFOCOM), 2014.
[13] D. Zhang, T. He, Y. Liu, Y. Gu, F. Ye, R. K. Ganti, and H. Lei, “Acc: generic on-demand accelerations for neighbor discovery in mobile applications,” in ACM Conference on Embedded Network Sensor Systems (SenSys), 2012.
[14] L. Chen, R. Fan, L. Chen, M. Gerla, T. Wang, and X. Li, “On heterogeneous neighbor discovery in wireless sensor networks,” in IEEE Conference on Computer Communications (INFOCOM), 2015.
[15] P. Kindt, D. Yunge, R. Diemer, and S. Chakraborty, “Precise energy modeling for the bluetooth low energy protocol,” CoRR, vol. abs/1403.2919, 2014.
[16] G. Mokhtari, Q. Zhang, and M. Karunanithi, “Modeling of human movement monitoring using bluetooth low energy technology,” in International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015.
[17] J. Liu, C. Chen, and Y. Ma, “Modeling neighbor discovery in bluetooth low energy networks,” IEEE Communications Letters, vol. 16, no. 9, pp. 1439–1441, 2012.
[18] ——, “Modeling and performance analysis of device discovery in bluetooth low energy networks,” in 2012 IEEE Global Communications Conference (GLOBECOM)