Queueing in the Mist: Buffering and Scheduling with Limited Knowledge
Itamar Cohen and Gabriel Scalosub
Department of Communication Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
Email: [email protected], [email protected]
Abstract—Scheduling and managing queues with bounded buffers are among the most fundamental problems in computer networking. Traditionally, it is often assumed that all the properties of each packet are known immediately upon arrival. However, as traffic becomes increasingly heterogeneous and complex, such assumptions are in many cases invalid. In particular, in various scenarios information about packet characteristics becomes available only after the packet has undergone some initial processing. In this work, we study the problem of managing queues with limited knowledge. We start by showing lower bounds on the competitive ratio of any algorithm in such settings. Next, we use the insight obtained from these bounds to identify several algorithmic concepts appropriate for the problem, and use these guidelines to design a concrete algorithmic framework. We analyze the performance of our proposed algorithm, and further show how it can be implemented in various settings, which differ by the type and nature of the unknown information. We further validate our results and algorithmic approach by an extensive simulation study that provides further insights as to our algorithmic design principles in face of limited knowledge.
I. INTRODUCTION
Some of the most basic tasks in computer networks involve scheduling and managing queues equipped with finite buffers, where the primary goal in such settings is maximizing the throughput of the system. The ever-increasing heterogeneity and complexity of network traffic make the challenge of maximizing the throughput ever harder, as the packet processing required in such queues spans a plethora of tasks including various forms of DPI, MPLS and VLAN tagging, encryption/decryption, compression/decompression, and more.

The most prevalent assumption in most works studying these problems is that the various properties of any packet – e.g., its QoS characteristics, its required processing, its deadline – are known upon its arrival. However, this assumption is in many cases unrealistic. For instance, when a packet is recursively encapsulated a few times by MPLS, PBB, 802.1Q, GRE or IPSec, it is hard to determine in advance the total number of processing cycles that such a packet would require [1], [2]. Furthermore, the QoS features of a packet are commonly determined by its flow ID, which is in many cases known only after parsing [2].

In data center network architectures such as PortLand [3], ingress switches query a cache for an application-to-location address resolution. A cache miss, which is unpredictable by nature, results in forwarding of the packet to the switch software or to a central controller, which performs a few additional processing cycles before the packet can be transmitted. Similarly, in the realm of Software Defined Networks, ingress switches query a cache for obtaining rules for a packet [4], which may also depend on priorities [5]. In such a case, a cache miss results in additional processing until the rules are retrieved and the profit from the packet is known.

In spite of this increased heterogeneity, and the fact that the processing requirement of a packet might not be known in advance, these characteristics usually become known once some initial processing is performed.
This behavior is common in many of the applications just described. Furthermore, for traffic corresponding to the same flow, it is common for characteristics to be unknown when the first few packets of the flow arrive at a network element; once these properties are unraveled, they become known for all subsequent packets of this flow.

In this work we address such scenarios, where the characteristics of some arriving traffic are unknown upon arrival, and are only revealed when a packet has undergone some initial processing (parsing), “causing the mist to clear”. We model and analyze the performance of algorithms in such settings, and in particular we develop online scheduling and buffer management algorithms for the problem of maximizing the profit obtained from delivered packets, and provide guarantees on their expected performance using competitive analysis. We focus on the general case of heterogeneous processing requirements (work) and heterogeneous profits [6]. We assume priority queueing, where the exact priorities depend on the specifics of the model studied. We present both algorithms and lower bounds for the problem of dealing with unknown characteristics in these models. Furthermore, we highlight some design concepts for settings where algorithms have limited knowledge, which we believe might be applicable to additional scenarios as well.

As an illustration of the problem, assume we have a 3-slot buffer, equipped with a single processor, and consider the arrival sequence depicted in Figure 1. In the first cycle we have seven unit-size packets arriving, out of which three will provide a profit of 5 upon successful delivery, each requiring 5 processing cycles (work). The characteristics of these three packets are known immediately upon arrival. The characteristics of the remaining four packets (marked gray) are unknown upon arrival. We therefore dub such packets U-packets (i.e., unknown packets).
Each of these four U-packets may turn out to be either a “best” packet, requiring minimal work and having maximal profit; a “worst” packet, requiring maximal work and having minimal profit; or anything in between.

[Fig. 1: An illustrative example of an arrival sequence with known and unknown packets]

Thus, already at the very beginning of this simple scenario, any buffering algorithm encounters an admission control dilemma: how many U-packets to accept, if any? This dilemma can be addressed by various approaches including, e.g., allocating some buffer space for U-packets, or accepting U-packets only when the known packets currently in the buffer have poor characteristics, in terms of profit, or of profit-to-work ratio, etc. In case the algorithm accepts U-packets, an additional question arises: which of the U-packets to accept into the buffer? Obviously, for any deterministic online algorithm there exists a simple adversarial scenario which causes it to accept only the “worst” U-packets (namely, packets with maximal work and minimal profit), while an optimal offline algorithm would accept the best packets. This motivates our decision to focus our attention on randomized algorithms.

We now turn to consider another aspect of handling traffic with some unknown characteristics. Assume the scenario continues with 5 cycles without any arrivals, and then a cycle with an identical arrival pattern – namely, three known packets with both work and profit of 5 per packet, and four U-packets. This sheds light on a scheduling dilemma: which of the accepted packets should be processed first? Every scheduling policy impacts the buffer space available at the next burst. For instance, a run-to-completion approach would enable finishing the processing of one known packet by the next burst, thus allowing space for accepting a new packet without preemption.
However, one may consider the opposite approach – namely, parsing as many U-packets as possible, thus “causing the mist to clear” and allowing more educated decisions once there are new arrivals. In terms of priority queueing, this means over-prioritizing some U-packets, allowing them to be parsed immediately upon arrival. We further develop appropriate algorithmic concepts based on the insights from this illustrative example in Section III.

A. System Model
Our system model consists of four main modules, namely, (a) an input queue equipped with a finite buffer, (b) a buffer management module which performs admission control, (c) a scheduler module which decides which of the pending packets should be processed, and (d) a processing element (PE), which performs the processing of a packet.

We divide time into discrete cycles, where each cycle consists of three steps: (i) the transmission step, in which fully-processed packets leave the queue; (ii) the arrival step, in which new packets may arrive, and the buffer management module decides which of them should be retained in the queue, and which of the currently buffered packets should be pushed out and dropped; and finally (iii) the processing step, in which the scheduler assigns a single packet for processing by the PE, which in turn processes the packet.

We consider a sequence of unit-size packets arriving at the queue. Upon its arrival, the characteristics of each packet may be known – in which case we refer to the packet as a K-packet (i.e., known packet); or unknown – in which case we refer to the packet as a U-packet (i.e., unknown packet). We let M denote the maximum number of U-packets that may arrive in any single cycle.

Each arriving packet p has some (1) required number of processing cycles (work), w(p) ∈ {1, ..., W}, and (2) intrinsic benefit (profit), v(p) ∈ {1, ..., V}. To simplify the expressions throughout the paper, we assume that both V and W are powers of 2. We use the notation (w, v)-packet to denote a packet with work w and profit v.

In our model, similarly to [7], upon processing a U-packet for the first time, its properties become known. We therefore refer to such a first processing cycle of a U-packet as a parsing cycle. Non-parsing cycles where the processor is not idle are referred to as work cycles.

The queue buffer can contain at most B packets. We assume B ≥ 2, since the case where B = 1 is degenerate.
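To make the model concrete, the following minimal Python sketch simulates one cycle of the three-step model above (transmission, arrival, processing). It is an illustrative skeleton only: the greedy admission rule and the HoL choice are placeholders for the policies developed later, and all class and variable names are ours.

```python
class Packet:
    """Unit-size packet: w(p) remaining processing cycles, profit v(p)."""
    def __init__(self, work, value, known=True):
        self.work = work      # remaining processing cycles
        self.value = value    # profit gained upon successful delivery
        self.known = known    # False for U-packets until parsed

class Queue:
    def __init__(self, B):
        self.B = B            # buffer capacity (we assume B >= 2)
        self.buf = []
        self.throughput = 0   # total profit of delivered packets

    def cycle(self, arrivals):
        # (i) transmission: fully-processed packets leave the queue
        self.throughput += sum(p.value for p in self.buf if p.work == 0)
        self.buf = [p for p in self.buf if p.work > 0]
        # (ii) arrival: placeholder greedy admission while space remains
        for p in arrivals:
            if len(self.buf) < self.B:
                self.buf.append(p)
        # (iii) processing: work-conserving, one packet processed per cycle;
        # a U-packet's first processing cycle is its parsing cycle
        if self.buf:
            hol = self.buf[0]
            hol.work -= 1
            hol.known = True
```

A (1, v)-packet accepted in cycle t is thus processed in cycle t and transmitted in the transmission step of cycle t + 1.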
The head-of-line (HoL) packet at time t (for a given algorithm Alg) is the highest-priority packet stored in the buffer just prior to the processing step of cycle t, namely, the packet to be scheduled for processing in the processing step of t. We say the buffer is empty at cycle t if there are no packets in the buffer after the transmission step of cycle t.

We focus our attention on queue management algorithms, which are responsible for both the buffer management and the scheduling of packets for processing. In particular, we focus our attention on algorithms targeted at maximizing the throughput of the queue, i.e., the overall profit from all packets successfully transmitted out of the queue.

We evaluate the performance of online algorithms using competitive analysis [8], [9]. An algorithm Alg is said to be c-competitive if for every finite input sequence σ, the throughput of any algorithm for this sequence is at most c times the throughput of Alg (c ≥ 1). We let OPT denote any (possibly clairvoyant) algorithm attaining optimal throughput. An algorithm is said to be greedy if it accepts packets as long as there is available buffer space. We further focus our attention on work-conserving algorithms, i.e., algorithms which never leave the PE idle unnecessarily; our results degrade by a mere constant factor otherwise.

B. Related Work
Competitive algorithms for scheduling and management of bounded buffers have been extensively studied for the past two decades. The problem was first introduced in the context of differentiated services, where packets have uniform size and processing requirements, but some of the packets have higher priorities, represented by a higher profit associated with them [10]–[12]. The numerous variants of this problem include models where packets have deadlines or maximum lifetimes in the switch [11], environments involving multiple queues [13]–[16], and cases with packet dependencies [17], [18], to name but a few. An extensive survey of these models and their analysis can be found in [19]; for the most part these works assume full knowledge of packet characteristics. While traditionally it was assumed that packets have heterogeneous profits but uniform work (processing requirements), some recent work introduced the complementary problem of uniform profits with heterogeneous work [20]. This work presented an optimal algorithm for the fundamental problem, as well as online algorithms and bounds on the competitive ratio for numerous variants. Subsequent works investigated related problems with heterogeneous work combined with heterogeneous packet sizes [21], or with heterogeneous profits [6], [22]. In particular, the work [6] showed that the competitive ratio of some straightforward deterministic algorithms for the problem of heterogeneous work combined with heterogeneous profits is linear in either the maximal work W or the maximal profit V, even when the characteristics of all packets are known upon arrival. These results motivate our focus on randomized algorithms.

While most of the literature above assumed that all the characteristics of packets are known upon arrival, this assumption was put in question recently [7] by noting that it is often invalid.
However, the main problem addressed in [7] revolved around developing schemes for transmitting packets of the same flow in order, even when their required processing times are unknown upon arrival.

Maybe closest to our work are the recent works considering serving in the dark [23], [24], which investigate an extreme case, where the online algorithm learns the profit from a packet only after transmitting it. These works consider highly oblivious algorithms, whereas our model and our proposed algorithms dwell in a middle ground between the well-studied models with complete information and these recent oblivious settings. Our work further considers traffic with variable processing requirements, whereas [23], [24] focus on settings where all packets require only a single processing cycle, and differ only by their profit.

The problem of optimal buffering of packets with variable work is closely related to the problem of job scheduling in a multi-threaded processor, which was extensively studied. A comprehensive survey of online algorithms for this problem can be found in [25]. This body of work, however, differs significantly from our currently studied models. The major differences are that packet buffering has to deal with limited buffering capabilities, and is targeted at maximizing throughput. Processor job scheduling, however, usually has no strict buffering limitations, and is mostly concerned with minimizing the response time.

C. Our Contribution
We introduce the problem of buffering and scheduling which aims to maximize throughput when the characteristics of some of the packets are unknown upon arrival. We focus our attention on traffic where every packet has some required processing cycles, and some profit associated with successfully transmitting it. In Section II we present lower bounds on the performance of any randomized algorithm for the problem, which show that no algorithm can have a competitive ratio better than Ω(min{VW, M}).

In Section III we describe several algorithmic concepts tailored for dealing with unknown characteristics in such systems. We follow by presenting an algorithm that applies our suggested algorithmic concepts in Section IV. For the most general case we prove our algorithm has a competitive ratio of O(M log V log W).

In Sections V–VI we present some modifications and heuristics applicable to our algorithm that, while leaving the worst-case guarantees intact, are designed to improve performance compared to the baseline algorithmic design. The modified algorithm can cope with cases where the maximal amounts of work and profit are not known in advance, or when a characteristic is drawn from a small set of potential values.

We further validate and evaluate the performance of our proposed algorithms in Section VII via an extensive simulation study. Our results highlight the effect the various parameters have on the problem, well beyond the insights arising from our rigorous mathematical analysis.

We conclude in Section VIII with a discussion of our results, and also highlight several interesting open questions.

II. LOWER BOUNDS
In this section we present lower bounds on the competitive ratio of any randomized algorithm for our problem. We do so by first proving the following general bound.
Theorem 1. If V ≥ 1, M ≥ 1 and the work of each packet is w(p) ∈ {w, w + 1, ..., W} where W ≥ 2, then the competitive ratio of any randomized algorithm is at least

(V(W − 1)/w) · [1 − (1 − 1/(V(W − 1) + 1 − w))^{Mw}].

Proof.
We prove the theorem using Yao's method [26]. We will show that the claim is true even if the optimal offline algorithm uses a buffer that can hold only 2 packets. We define the following distribution over arrival sequences, where each arrival sequence has two phases: (i) Fill phase: for N cycles, where N is a large integer, we have M U-packets arriving per cycle, where each packet is a (w, V)-packet with probability p, and a (W, 1)-packet with probability (1 − p), for some constant p to be determined later. This phase is followed by (ii) Flush phase: BW cycles without arrivals.

To simplify our analysis, we define the SubOPT policy, which works as follows. During the fill phase, SubOPT operates in periods of w consecutive cycles each. During each period, SubOPT accepts at most one (w, V)-packet which has arrived during the period, if such exists. This packet is the one considered picked by SubOPT in that period. Starting from the second period, SubOPT processes the packet it picked during the previous period (if such a packet exists), and transmits it in the last cycle of the period. During the flush phase, SubOPT processes and finally transmits the packet it picked during the last period. It should be noted that SubOPT is neither greedy nor work-conserving. Moreover, the expected throughput of SubOPT clearly serves as a lower bound on the expected optimal throughput possible.

We thus have N/w periods, and the probability that SubOPT successfully picks a (w, V)-packet during a period is 1 − (1 − p)^{Mw}. The performance of SubOPT is therefore at least

(NV/w) · [1 − (1 − p)^{Mw}].    (1)

We now turn to consider the expected performance of any deterministic algorithm Alg for the problem. We first assume that Alg begins the flush phase with a buffer full of V-packets, all of them unparsed. This provides Alg with a profit of BV during the flush phase, while still having N processing cycles during the fill phase for processing additional packets. This profit is clearly an upper bound on the maximum possible throughput attainable by Alg from packets transmitted during the flush phase.
Consider now the profit of Alg from packets transmitted during the fill phase. Since Alg is assumed to be work-conserving, and we have arrivals throughout the fill phase, there exists some 0 < r ≤ 1 such that the numbers of parsing cycles and work cycles performed by Alg are Nr and N(1 − r), respectively. Consider a case where Alg reveals a V-packet q. Then, processing q and finally transmitting it would surely not decrease the throughput of Alg relative to dropping it. Thus, the best deterministic algorithm Alg would perform at least w − 1 work cycles for each parsing cycle in which a (w, V)-packet is parsed (recall our condition that, for successfully transmitting any additional packet, Alg must fully process it already during the fill phase). Therefore, the total number of work cycles is at least w − 1 times the expected number of parsing cycles in which a (w, V)-packet is revealed: N(1 − r) ≥ Nrp(w − 1).

If the total number of work cycles during the fill phase exceeds the number of cycles which are necessary for transmitting all the parsed V-packets, Alg may also work on (W, 1)-packets. Namely, if N(1 − r) > Nrp(w − 1), then Alg may work on (W, 1)-packets for N(1 − r) − Nrp(w − 1) cycles, transmitting at most one (W, 1)-packet per W − 1 such cycles. Combining the above reasoning, we conclude that the overall throughput of Alg is at most

NVrp + [N(1 − r) − Nrp(w − 1)]/(W − 1) + BV.    (2)

Considering the ratio between the lower bound on the expected performance of SubOPT (as captured by Equation 1) and the upper bound on the expected performance of Alg (as captured by Equation 2), and letting N → ∞, we conclude that no algorithm can have a competitive ratio better than

(V(W − 1)/w) · [1 − (1 − p)^{Mw}] / [Vrp(W − 1) + 1 − r − rp(w − 1)].    (3)

By choosing p* = [V(W − 1) + 1 − w]^{−1}, the result follows.

We now aim to relate the lower bound established in Theorem 1 to a simpler and more intuitive function of M, V and W. We do so by means of two propositions, which relate the bound to either Ω(M) or Ω(VW/w) for different ranges of M. In the propositions we use the notation p* = [V(W − 1) + 1 − w]^{−1} from the proof of Theorem 1. The following proposition shows that if M is relatively small, then the lower bound established in Theorem 1 is Ω(M).

Proposition 2. If V ≥ 1, w ≥ 1, W ≥ 2 and 1 ≤ M ≤ 1/(p*w) + 1 − 1/w, then

(V(W − 1)/w) · [1 − (1 − p*)^{Mw}] ≥ M/2.

Proof.
Denote n = Mw and note that n ≥ 1. Then, by simple algebraic manipulation, it suffices to show that

(1 − p*)^n ≤ 1 − (n/2) · 1/(1/p* + w − 1).    (4)

We now prove that Equation 4 holds true by induction over n. First, we note that when n = 1, the inequality is equivalent to 1/(2(1/p* + w − 1)) ≤ p*, which is true for every w ≥ 1. Assuming that Equation 4 holds for n, we show that it is true also for n + 1. By the induction hypothesis,

(1 − p*)^{n+1} ≤ (1 − p*) · [1 − (n/2) · 1/(1/p* + w − 1)].    (5)

It therefore suffices to prove that

(1 − p*) · [1 − (n/2) · 1/(1/p* + w − 1)] ≤ 1 − ((n + 1)/2) · 1/(1/p* + w − 1),    (6)

which is equivalent to requiring that

1/(2(1/p* + w − 1)) ≤ p* − (n/2) · p*/(1/p* + w − 1).    (7)

Recalling that w ≥ 1, we have that 1/p* + w − 1 ≥ 1/p*, or, equivalently,

1/(2(1/p* + w − 1)) ≤ p*/2,    (8)

and therefore it suffices to prove that

p*/2 ≤ p* − (n/2) · p*/(1/p* + w − 1),    (9)

which is satisfied for every M ≤ 1/(p*w) + 1 − 1/w.

The following proposition shows that if M is relatively large, then the lower bound established in Theorem 1 is Ω(VW/w).

Proposition 3. If V ≥ 1, w ≥ 1, W ≥ 2 and M > 1/(p*w) + 1 − 1/w, then

(V(W − 1)/w) · [1 − (1 − p*)^{Mw}] > ((e − 1)/(2e)) · (VW/w).    (10)

Proof. As M > 1/(p*w) + 1 − 1/w and w ≥ 1, we have Mw > 1/p* + w − 1 ≥ 1/p*. Therefore we can denote Mw = a/p* for some a > 1. Then,

(1 − p*)^{Mw} = [(1 − p*)^{1/p*}]^a ≤ e^{−a} ≤ e^{−1}.    (11)

Therefore

(V(W − 1)/w) · [1 − (1 − p*)^{Mw}] ≥ (V(W − 1)/w) · (1 − 1/e)    (12)
= (VW/w) · ((W − 1)/W) · (1 − 1/e) ≥ ((e − 1)/(2e)) · (VW/w).

Assigning w = 1 in Theorem 1 and Propositions 2 and 3 implies the following corollary:

Corollary 4. The competitive ratio of any randomized algorithm is Ω(min{VW, M}).

In the special case of uniform profits, we are essentially interested in maximizing the overall number of packets successfully transmitted. Therefore we may assign V = 1 in Corollary 4, implying the following corollary:

Corollary 5. In the case of uniform profits, the competitive ratio of any randomized algorithm is Ω(min{W, M}).

In the special case of uniform work, we can assign w = W in Propositions 2 and 3, implying the following corollary:

Corollary 6. In the case of uniform work, the competitive ratio of any randomized algorithm is Ω(min{V, M}).

III. ALGORITHMIC CONCEPTS
In this section we describe the algorithmic concepts underlying our proposed algorithms for dealing with scenarios of limited knowledge.
Random selection:
Ideally, we would like every arriving U-packet to have at least some minimal probability of being accepted and parsed, thus avoiding a scenario where OPT successfully transmits a bulk of “good” packets which the online algorithm discards. An intuitive way to do so is to pick the unknown packets at random.

Speculatively admit:
Competitive algorithms must ensure they retain throughput from both K-packets and U-packets. Furthermore, once a U-packet is accepted, there is a high motivation to reveal its characteristics as soon as possible, thus enabling educated decisions in subsequent cycles. We therefore propose to speculatively over-prioritize unknown packets over known packets in certain cycles. The act of making such a choice in some cycle t is referred to as admitting, in which case cycle t is referred to as an admittance cycle. A U-packet retained due to such a choice is referred to as an admitted packet.

Classify and randomly select:
Intuitively, as unknown packet characteristics are drawn from a wider range of values, the task of maximizing throughput becomes harder, especially when compared to the optimal throughput possible. To deal with this diversity, we implicitly partition incoming packets into classes, where intra-class variability is constrained. We then apply a classify-and-randomly-select scheme [27], which enables us to provide analytic guarantees on the expected performance of our algorithms.
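As a concrete sketch of such a partition (anticipating the power-of-2 classes defined in Section IV-B, and assuming W and V are known powers of 2), each packet can be mapped to a work-class and a profit-class index, and one combined class drawn uniformly at random; function names are ours:

```python
import math
import random

def work_class(w):
    # class i covers work values in [2**(i-1) + 1, 2**i]; class 1 covers {1, 2}
    return 1 if w <= 2 else math.ceil(math.log2(w))

def profit_class(v):
    # the same geometric partition, applied to profits
    return 1 if v <= 2 else math.ceil(math.log2(v))

def select_combined_class(W, V, rng=random):
    # classify and randomly select: one of log W * log V combined classes,
    # chosen uniformly at random (W and V assumed to be powers of 2)
    i_star = rng.randint(1, int(math.log2(W)))
    j_star = rng.randint(1, int(math.log2(V)))
    return (i_star, j_star)
```

Within any one class, work (or profit) values differ by at most a factor of 2, which is exactly the constrained intra-class variability the scheme relies on.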
Alternate between fill & flush:
This paradigm is especially crucial in cases of limited information. The main motivation for this approach is that whenever a “good” buffer state is identified, the algorithm should focus all its efforts on monetizing the current state, maybe even at the cost of dropping packets indiscriminately.

IV. COMPETITIVE ALGORITHMS
In this section we present a basic competitive online algorithm for the problem of buffering and scheduling with limited knowledge. In Sections V and VI we present several improved variants of this algorithm. We first provide a high-level description of our algorithm, and then turn to specify its details and analyze its performance.

For simplicity of analysis and algorithm presentation, we assume that the values of W and V – the maximal work and profit per packet, respectively – are known to the algorithm in advance. Later, in Section VI, we show how to remove this assumption without harming the performance of our algorithm. We further note that neither of our proposed solutions requires knowing the value of M – the maximum number of unknown packets arriving in a single cycle – in advance.

A. High-level Description of Proposed Algorithm
Our algorithm is designed according to the algorithmic concepts presented in Section III, as follows.
Randomly select and speculatively admit:
In every cycle t during which a U-packet arrives, the algorithm picks t as an admittance cycle with some probability r (to be determined in the sequel). In every cycle chosen as an admittance cycle, the algorithm picks exactly one of the U-packets arriving at t to serve as the admitted packet. This U-packet is chosen uniformly at random out of all U-packets arriving at t. At the end of the arrival step, the algorithm schedules the admitted U-packet (if one exists) for processing, hence parsing the packet. We note that if no such U-packet exists, or if t is not an admittance cycle, then the top-priority packet residing at the head of the line (HoL) is scheduled for processing. The exact notion of priority will be detailed later.

Classify and randomly select:
We implicitly partition the possible types of arriving packets into classes C_1, C_2, ..., C_m; the criteria for partitioning and the exact value of m will be specified later. Our algorithm picks a single selected class, uniformly at random from the m classes. Our goal is to provide guarantees on the performance of our proposed algorithm for packets belonging to the selected class, which is henceforth denoted G. Packets which belong to the selected class are referred to as G-packets. Following our previously introduced notation, known (unknown) packets that belong to the selected class, i.e., G-packets whose attributes are known (unknown), are denoted G_K-packets (G_U-packets).

Focusing solely on packets belonging to G may seem like a questionable choice, especially if few arriving packets belong to this class, or if the characteristics of packets belonging to this class are poor (e.g., they have low profit and require much work). However, this naive description is meant only to simplify the analysis. In Section V we show how to remedy this naive approach in order to deal with these apparent shortcomings, while keeping the analytic guarantees intact.

Alternate between fill & flush:
Our algorithm alternates between two states: the fill state and the flush state. We define an algorithm to be Hfull if its buffer is filled with known G-packets. Once becoming Hfull, our algorithm switches to the flush state, during which it discards all arriving packets and continuously processes queued packets. Once the buffer empties, the algorithm returns to the fill state. Again, in Section V we show how to remedy this naive simplified approach.

B. The Classify and Randomly Select Mechanism
We now turn to define the various classes considered by our algorithm. We say a packet p with w(p) > 1 is of work-class C^(W)_i if ⌈log w(p)⌉ = i. If w(p) = 1, we assign it to work-class C^(W)_1. Similarly, we say p with v(p) > 1 is of profit-class C^(P)_j if ⌈log v(p)⌉ = j, and we assign it to profit-class C^(P)_1 if v(p) = 1. Equivalently, we say p is of a certain class (either work- or profit-) i if its corresponding value is in the interval

X_i = [1, 2]               if i = 1,
X_i = [2^{i−1} + 1, 2^i]   if i > 1.    (13)

This yields a collection of log W work-classes and log V profit-classes. Lastly, we say a packet p is of combined-class C_(i,j) if it is of work-class C^(W)_i and of profit-class C^(P)_j.

We note that in terms of work, the class to which a packet p belongs is defined statically by the total work of p, and does not depend upon its remaining processing cycles, which may change over time.

Upon initialization, the algorithm selects a class by picking i* ∈ {1, ..., log W} and j* ∈ {1, ..., log V}, each chosen uniformly at random. Then, the selected combined-class is G = C_(i*,j*).

Algorithm 1 DecideAdmittance()
  return true w.p. r

Algorithm 2 UpdatePhase()
  if buffer is empty then
    phase = fill
  else if buffer is Hfull then
    phase = flush
  end if
  ▷ if buffer is neither empty nor Hfull, phase is unchanged

Algorithm 3 Admit(p)
  admit p w.p. 1/A_U(t_p)

C. The SA Algorithm
We now describe the specifics of our algorithm, Speculatively Admit (SA), depicted in Algorithm 4. The pseudo-code in Algorithm 4 uses the procedures DecideAdmittance(), UpdatePhase() and Admit(p), whose pseudo-code appears in Algorithms 1, 2 and 3, respectively. In Algorithm 3, A_U(t_p) denotes the number of U-packets which have arrived in cycle t_p up to (and including) the arrival of packet p. We note that Algorithm 3 essentially performs reservoir sampling [28].

In the arrival step, the algorithm updates its phase (line 1). If the phase is flush, the algorithm skips the while loop (lines 3-12), thus discarding all arriving packets. If the phase is fill, the algorithm greedily accepts every arriving packet as long as its buffer is not full (lines 4-5). If the buffer is full, however, the algorithm accepts an arriving packet only if it is either a known packet from the selected class (namely, a G_K-packet), or a U-packet which was randomly picked to be admitted (lines 6-8). In either of these cases, the last packet in the queue is dropped (line 7), so as to free space for the accepted packet.

In the processing step, if the algorithm is in the fill phase and there exists an admitted packet, the algorithm pushes it to the HoL, with the purpose of processing it immediately in the cycle of arrival, thus revealing its characteristics (lines 13-15). Finally, the algorithm updates its phase and sorts the queued packets in G_K-first order each time it either accepts or processes a packet (lines 10-11 and 17-18).

D. Performance Analysis
We now turn to show an upper bound on the performanceof our algorithm (for
W, V > ), captured by the followingtheorem: lgorithm 4 SA : at every time slot t after transmission Arrival Step: phase = UpdatePhase() admittance = DecideAdmittance() while phase == fill and exists arriving packet p do if buffer is not full then accept p else if p is a G K -packet or Admit( p ) then drop packet from tail accept p end if phase = UpdatePhase() sort queued packets in G K -first order, break ties byFIFO end while Processing Step: if phase == fill and there exists an admitted packet p then move p to the HoL end if process HoL-packet phase = UpdatePhase() sort queued packets in G K -first order, break ties by FIFO Theorem 7. SA is O ( Mr log W log V ) -competitive. The proof is found in appendix B.V. I
V. IMPROVED ALGORITHMS
Algorithm SA selects a single class uniformly at random, so that the characteristics of the packets on which it focuses differ by at most a constant factor. This gives a sense of "uniformity" of traffic, which in turn reduces the variability of the characteristics of the packets on which the algorithm focuses. However, there are various cases where the strict decisions made by SA can be relaxed without harming its competitive performance guarantees; in practice, such relaxations actually allow obtaining a throughput far superior to that of SA. In what follows we describe such modifications, which we incorporate into our improved algorithm, SA*, and prove that all our performance guarantees for SA still hold for SA*.

Class closure:
Given any partitioning of packets into classes as described in Section IV-B, {C_(i,j) | i = 1, . . . , log W, j = 1, . . . , log V}, we let the (i,j)-closure class be defined as C*_(i,j) = ∪_{i'≤i, j'≥j} C_(i',j'). This definition effectively assigns to the (i,j)-closure class any packet which is at least as good as some packet in C_(i,j). We emphasize that any such packet p must satisfy both w(p) ≤ 2^i and v(p) ≥ 2^{j-1}. We let SA* denote the algorithm in which the selected class G is chosen to be C*_(i,j), for values of i, j chosen uniformly at random from the appropriate sets. A simple swap argument shows that having SA* pick C*_(i,j), instead of selecting C_(i,j) as done in SA, leaves the analysis detailed in Section IV-D intact.

Fill during flush (pipelining):
Algorithm SA was defined such that no arriving packets are ever accepted during the flush phase. This enables partitioning time into disjoint intervals (determined by SA's buffer being empty at the end of each such interval), and comparing the performance of OPT and of SA independently for each interval. In practice, however, allowing the acceptance of packets during a flush phase harms neither the analysis nor the actual performance, if this is done prudently: packets which arrive during the flush phase are accepted according to the same priority used by the algorithm in the fill phase. Furthermore, packets which arrive during the flush phase are stored in the buffer, but are never scheduled for processing before all B packets that were stored in the buffer when it turned Hfull are transmitted.

Improved scheduling: SA sorts the queued packets in G_K-first order. For simplicity of presentation, we assumed in Section IV that within the set of G_K-packets, as well as within the set of non-G_K-packets, packets are internally ordered by FIFO. However, one may also consider other approaches to scheduling within each of these sets (while maintaining the G_K-first order between the sets). We specifically consider the following methods: (i) FIFO, (ii) W-then-V, which orders packets by increasing order of remaining work, and breaks ties by decreasing order of value, and (iii) decreasing order of packet effectiveness, where the effectiveness of a packet is defined as its profit-to-work ratio. We emphasize that the packet scheduled for processing during an admittance cycle remains a U-packet, selected uniformly at random from the U-packets arriving in this cycle. All the non-admitted U-packets, however, are placed at the tail of the queue, reflecting the fact that their priority is lower than that of every known packet. By applying different scheduling regimes, we obtain different flavors of SA*.
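The three intra-set orderings can be captured by sort keys; a minimal sketch, where the tuple layout (arrival sequence number, remaining work, profit) is our own illustrative assumption:

```python
# Hypothetical packet representation: (arrival_seq, remaining_work, profit).
packets = [(0, 4, 8), (1, 2, 3), (2, 2, 9), (3, 7, 7)]

# (i) FIFO: by arrival order.
fifo = sorted(packets, key=lambda p: p[0])

# (ii) W-then-V: increasing remaining work, ties broken by decreasing profit.
w_then_v = sorted(packets, key=lambda p: (p[1], -p[2]))

# (iii) Effectiveness: decreasing profit-to-work ratio.
effect = sorted(packets, key=lambda p: -(p[2] / p[1]))
```

In an actual implementation each key would be applied separately inside the set of G_K-packets and inside the set of non-G_K-packets, with the G_K-first order maintained between the two sets.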
The following theorem shows that the performance of all flavors of SA* is at least as good as the performance of SA.

Theorem 8. SA* is O((M/r) log W log V)-competitive.

For the proof, see Appendix C. We study the performance of the various flavors of SA* in Section VII.

VI. PRACTICAL IMPLEMENTATION
While presenting our basic algorithm in Section IV, we assumed for simplicity that the values of W and V – the maximal work and profit per packet, respectively – are known to the algorithm in advance. We further assumed that the work (resp., profit) assigned to a packet may take any value 1, . . . , W (resp., 1, . . . , V). In this section we show how to relax these assumptions without harming the performance of our algorithms, potentially even allowing for improved performance guarantees.

A. Adaptation of SA for a Case of Limited Possible Values

We now show that when a characteristic takes values from a small set of potential values, the logarithmic dependency of the competitive ratio of SA on the maximal value of the characteristic can be transformed into a linear dependency on the number of distinct values of this characteristic. Denote the number of distinct work values by ℓ_W, and the set of work values by L_W = {w_1, w_2, . . . , w_{ℓ_W}}. Similarly, denote the number of distinct profit values by ℓ_V, and the set of profit values by L_V = {v_1, v_2, . . . , v_{ℓ_V}}. We now consider the case where ℓ_W ≤ log W and ℓ_V ≤ log V, and show an improved upper bound for this case. We dub the adaptation of SA to this case of small sets SA_SS. Pick w*_i ∈ L_W and v*_j ∈ L_V, each uniformly at random, and let G = C_(w*_i, v*_j) be the selected combined class. We note that the selected work-class (resp., profit-class) now consists of a single concrete value of work (resp., profit), rather than a range of values. Then, using a proof similar to that of Theorem 7, we can show the following theorem:

Theorem 9. SA_SS is O((M/r) ℓ_W ℓ_V)-competitive.

For the proof, see Appendix D.
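The class selection of SA_SS amounts to one uniform draw per characteristic; a minimal sketch, where the value sets passed in are hypothetical:

```python
import random

def select_class_ss(work_values, profit_values, rng=None):
    """SA_SS class selection: draw one concrete work value and one concrete
    profit value, each uniformly at random, so every combined class
    C(w*, v*) is selected with probability 1/(len(work_values) *
    len(profit_values))."""
    rng = rng or random.Random()
    return rng.choice(sorted(work_values)), rng.choice(sorted(profit_values))
```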
B. Handling Unknown Maximal Profit and Work Values
So far we have assumed that our algorithms know in advance the values of W and V – the maximal work and profit per packet, respectively. We now show that we can implement the random class selection prescribed by our algorithms without knowing the values of W and V in advance. We refer to an algorithm implementation that does not know these values in advance as values-oblivious, and to an implementation that knows the values of W and V in advance as values-aware. We will show that a values-oblivious algorithm can obtain a performance no worse than that of a values-aware algorithm, even if the values-aware algorithm knows not only W and V, but also the concrete classes in which packets will arrive. Our implementation of a values-oblivious algorithm is based on applying reservoir sampling [28] to the classes revealed during packet arrivals, as we detail shortly. A new class is revealed either due to the arrival of a K-packet p, or due to a U-packet q being parsed, corresponding to a class previously unknown to the algorithm. We call such an event an uncovering of a new class. The values-oblivious algorithm implementation performs the following alongside all decisions made by the values-aware algorithm: before the arrival sequence begins, we initialize a counter N of known classes to N = 0.
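The standard reservoir-sampling update [28] over a stream of uncovered classes can be sketched as follows; the class objects and names here are our own illustrative assumptions:

```python
import random

class ClassReservoir:
    """Maintain a uniform selection over classes uncovered online,
    via the standard reservoir-sampling update [28]."""
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.n = 0            # N: number of classes uncovered so far
        self.selected = None  # currently selected class

    def uncover(self, cls):
        """Call upon uncovering a new class; keeps `selected` uniform
        over all classes uncovered so far."""
        self.n += 1
        if self.rng.random() < 1.0 / self.n:
            self.selected = cls
```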
Upon the uncovering of a new class at time t, the algorithm increments N by one (to reflect the updated number of known classes), and replaces the previously selected class with the new class with probability 1/N. As the above procedure essentially performs reservoir sampling on the collection of classes known to the algorithm, it implements the selection of a class uniformly at random among all a posteriori known classes [28]. It therefore follows that the distribution of the packets corresponding to the eventually selected class (after the sequence ends) handled by the values-oblivious algorithm is identical to the distribution of the packets handled by the values-aware algorithm. Therefore the expected performance of the values-oblivious algorithm is lower bounded by the expected performance of the values-aware algorithm. We note that the values-oblivious implementation can be applied to any of the variants described in the previous sections.

VII. SIMULATION STUDY
In this section we present the results of our simulation study, intended to validate our theoretical results and provide further insight into our algorithmic design.
A. Simulation Settings
We simulate a single queue in a gateway router which handles a bursty arrival sequence of packets with high work requirements (corresponding, e.g., to IPSec packets requiring AES encryption/decryption) as well as packets with low work requirements (such as simple IP packets requiring merely IPv4-trie processing). Arriving packets also have arbitrary profits, modeling various QoS levels. Our traffic is generated by a Markov modulated Poisson process (MMPP) with two states, LOW and HIGH, such that a burst during the HIGH state generates an average of 10 packets per cycle, while the LOW state generates an average of under one packet per cycle. The average duration of LOW-state periods is a factor W longer than the average duration of HIGH-state periods. This is targeted at allowing some of the traffic arriving during the HIGH state to be drained during the LOW state. In our simulations, we do not deterministically bound the maximum number, M, of U-packets arriving in a cycle, but rather control the expected intensity of U-packets by letting each arriving packet be a U-packet with some probability α ∈ [0, 1]. We thus obtain that the expected number of U-packets per cycle during the HIGH state is 10α. In real-life scenarios, the maximum work, W, required by a packet is highly implementation-dependent: it depends on the specific hardware, processing elements, and software modules. However, several works which investigated the work required by typical tasks [29]–[31] indicate that W is two orders of magnitude larger than the work required for an IPv4-trie search or for classification of a packet. We refer to an IPv4-trie search or classification of a packet as the baseline unit of work, captured by our notion of "parsing". We therefore set the maximum work required by a packet to W = 256 throughout this section. Determining the maximum profit, V, associated with a packet is a challenging task.
This value depends both on implementation details and on proprietary commercial and business considerations. In order to have a diverse set of values, which model distinct QoS requirements, we set the maximum profit associated with a packet to V = 16 throughout this section. The values W = 256 and V = 16 imply 8 work-classes and 4 profit-classes, i.e., a total of 8 · 4 = 32 potential classes for the algorithm to select from. The value of each characteristic of each packet is drawn from a Pareto distribution, with average and standard deviation of 17.97 and 22.22 for packet work, and 3.66 and 3.20 for packet profit. The probability distribution function of the characteristics values is depicted in Figure 2.

Fig. 2: Probability distribution function of the characteristics values

Note that, in order to disallow values above the maximum (256 for work values and 16 for profit values), all cases where the randomly generated value exceeded the maximum were truncated, namely, treated as if the generated value was exactly the maximal value. This is why the plot in Figure 2 shows a spike at its maximum. Unless stated otherwise, we assume that B = 10, r = 1, and each arriving packet is a U-packet with probability α = 0.3. We thus obtain that the expected number of U-packets arriving per cycle during the HIGH state is 0.3 ·
10 = 3 per cycle.

As a benchmark serving as an upper bound on the best performance possible, we consider a relaxation of the offline problem as a knapsack problem. Arriving packets are viewed as items, each with its size (corresponding to the packet's work) and value (corresponding to the packet's profit). The allocated knapsack size equals the number of time slots during which packets arrive. The goal is to choose a highest-value subset of items which fits within the given knapsack size. This is indeed a relaxation of the problem of maximizing throughput during the arrival sequence in the offline setting, since the knapsack problem is restricted neither by any finite buffer size during the arrival sequence, nor by the arrival times of packets (e.g., it may "pack" packets even before they arrive). We employ the classic 2-approximation greedy algorithm for the knapsack problem [32], and use its performance as an approximate upper bound on the performance of OPT. To account for the additional profit which OPT may gain from packets residing in its buffer at the end of the arrival sequence, we simply allow the offline approximation an additional throughput of BV for free, which is an upper bound on the benefit it may achieve after the arrival sequence ends. We compare the performance of the studied algorithms by evaluating their performance ratio, which is the ratio between an algorithm's performance and that of our approximate upper bound on the performance of OPT. We compare the performance of the following algorithms:
1) FIFO: A simple greedy non-preemptive FIFO discipline that accepts packets and processes each packet until completion, regardless of its required work or value.
2) SA: Algorithm SA, described in Section IV.
3) SA* FIFO: Algorithm SA* where packets are processed in FIFO order.
4) SA* W-then-V: Algorithm SA* where packets are processed in increasing order of remaining work, breaking ties in decreasing order of profit.
5) SA* EFFECT: Algorithm SA* where
the packets are processed in decreasing order of their profit-to-work ratio.
We recall that all the flavors of SA* listed above maintain a G_K-first order, and differ only in the internal ordering within each set (namely, within the set of G_K-packets, as well as within the set of non-G_K-packets). All flavors of SA* described above employ the class-closure and fill-during-flush optimizations defined in Section V. For each choice of parameters we show the average over 100 independently generated traces of 10K packets each. In all our simulations the standard deviation was below 0.035.

B. Simulation Results
Figures 3-6 show the results of our simulation study. First, we note that SA exhibits a very low performance ratio, similar to that of a simple FIFO (which disregards packet parameters altogether). This is due to the fact that SA focuses only on a specific class, which comprises a relatively small part of the input, and it thus spends processing cycles on packets that would not eventually be transmitted. For the variants of SA* we consider, in all simulations the best scheduling policy is by non-increasing effectiveness, followed by the W-then-V approach. FIFO scheduling, in spite of being simple and attractive, comes in last in all scenarios. This behavior is explained by the fact that the two former scheduling policies in SA* clear the buffer more effectively once it is Hfull, whereas FIFO scheduling clears the buffer in an oblivious manner, and therefore does not free up space for new arrivals fast enough. We now turn to discuss each of the scenarios considered in our study.
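The knapsack-based benchmark of Section VII-A can be sketched as follows; this is one standard variant of the greedy 2-approximation [32], taking the better of the density-ordered greedy prefix and the best single fitting item:

```python
def knapsack_2approx(items, capacity):
    """Greedy 2-approximation for the 0/1 knapsack problem.

    items: list of (size, value) pairs. Returns the larger of the value
    of the greedy density-ordered prefix that fits, and the value of the
    best single item that fits on its own.
    """
    by_density = sorted(items, key=lambda sv: sv[1] / sv[0], reverse=True)
    total_size = total_value = 0
    for size, value in by_density:
        if total_size + size > capacity:
            break
        total_size += size
        total_value += value
    best_single = max((v for s, v in items if s <= capacity), default=0)
    return max(total_value, best_single)
```

In the benchmark, `items` would be the whole arrival sequence (work as size, profit as value) and `capacity` the number of arrival slots, with an extra BV added to the result as described above.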
1) The Effect of Selected Class:
Our first set of results sheds light on the effect of the class selected by an algorithm on its performance. Figure 3 shows the results where the selected profit-class is 1, which makes SA* allow all profits, and the choice of work-class i* varies.

Fig. 3: Effect of chosen work-class i*

The most interesting phenomenon is exhibited by SA* FIFO. Its performance is very poor if the work-class may contain packets requiring very little work. This is due to the fact that only a small fraction of the traffic requires this little work, and the algorithm scarcely becomes Hfull. As a consequence, the algorithm handles many low-priority packets in FIFO order, giving rise to far-from-optimal decisions. The algorithm steadily improves up to some point, and then its performance deteriorates fast as it assigns high priority to packets with increasingly higher processing requirements. In this case the algorithm becomes Hfull too frequently, and allocates many processing cycles to low-effectiveness packets. The maximum performance is achieved for i* = 3, which implies that the algorithm flushes whenever its buffer is filled up with packets whose work is at most 2^{i*} = 8. This value suffices to allow the algorithm to prioritize a rather large portion of the arrivals (recalling the Pareto distribution governing packet work-values), while ensuring that the processing toll of high-priority packets is not too large. This strikes a (somewhat static) balance between the amount of work required by a packet and its expected potential profit. The other variants of SA* exhibit a gradually decreasing performance, due to their higher readiness to compromise on the required work of packets they deem high-priority traffic. SA shows a similar performance deterioration, for a similar reason, when the selected work-class i* is increased from 1 up to 6. However, when i* is increased above 6, SA's performance improves again.
This improvement is explained by the fact that, due to the Pareto distribution of the work values, the number of packets belonging to each work-class rapidly diminishes for work-class indices closest to the maximum of 8. In such a case, SA is coerced to also process packets which do not belong to the selected class – namely, packets with lower work – which somewhat compensates for the poor choice of work-class. We verified this explanation by additional simulations (not shown here), in which the work-class of packets was chosen from the uniform distribution. In such a case, where there is an abundance of packets from every possible work-class, the performance of SA consistently degrades as i* increases, reflecting an increasingly poor choice of work-class. Similar phenomena are exhibited in Figure 4, where we consider the effect of the profit-class j* selected by an algorithm on its performance. In this set of simulations all work values were allowed (i.e., the selected work-class is 8). In this scenario the performance of all algorithms improves as the selected profit-class index increases, as the algorithms are able to better restrict their focus to high-profit packets as the high-priority traffic. We note that SA* FIFO and regular FIFO have matching performance when the selected profit-class is 1, since in this case SA* FIFO is identical to plain FIFO (it simply accepts all incoming packets indiscriminately in FIFO order). In the subsequent results described hereafter, we fix both the work-class and the profit-class to be 3, which represents a mid-range class for both the profit and the work.

Fig. 4: Effect of chosen profit-class j*
2) The Effect of Missing Information:
Figure 5 illustrates the performance ratio of our algorithms as a function of the expected number of U-packets arriving during the HIGH state, where we vary the value of α from 0 to 1. This provides further insight into the performance of each algorithm as a function of the intensity of unknown packets. We recall that, for our choice of parameters, these values of α translate to an expected number of unknown packets per cycle during the HIGH state varying from 0 to 10. As one could expect, the performance ratio of SA and of all versions of SA* degrades as the amount of uncertainty increases. Finally, we study the intensity of exploring unknown packets, as governed by the choice of the parameter r, which determines whether a cycle is an admittance cycle or not. The results depicted in Figure 6 consider the case of high uncertainty, where M is essentially unbounded and all arriving packets are unknown. Observe first the special case of r = 0, which represents an extreme case in which, although all arriving packets are unknown, our algorithms do not explore any new packets; they thus degenerate to a simple FIFO, and therefore exhibit identical performance. Increasing the admittance probability r, however, yields a steady increase in performance, albeit with diminishing returns. Similar results were obtained also when some of the packets are known, but with smaller marginal benefits. These results coincide with our analytic results, and further validate our algorithmic approach.

VIII. CONCLUSIONS AND FUTURE WORK
We consider the problem of managing buffers where traffic has unknown characteristics, namely its required processing and profits. We define several algorithmic concepts targeted at such settings, and develop several algorithms that follow our suggested prescription. We analyze the performance of our algorithms theoretically using competitive analysis, and also validate their performance via simulations which further serve to elucidate our design criteria. Our work can be viewed as a first step in developing fine-grained algorithms for handling scenarios of limited knowledge in networking environments with highly heterogeneous traffic. Our work gives rise to a multitude of open questions, including: (i) closing the gap between our lower and upper bounds for the problem, (ii) applying our proposed approaches to other limited-knowledge networking environments, and (iii) devising additional algorithmic paradigms for handling limited knowledge in heterogeneous settings.

Fig. 5: Effect of expected number of U-packets during the HIGH state
Fig. 6: Effect of admittance probability of U-packets r

APPENDIX
A. Preliminaries
We now define some of the notation that will be used throughout the appendix. For every cycle t and packet type α, we denote by A_α(t) the number of α-packets that arrive in cycle t. For instance, A^(K)(t) (resp., A^(U)(t)) denotes the number of K-packets (resp., U-packets) which arrive in cycle t. This notation can be combined with the work and profit values of packets. For instance, A^(U)_(w,v)(t) denotes the number of U-packets with work w and profit v which arrive in cycle t. Our proofs involve a careful analysis of the expected profit of our algorithms from packets which arrive while the algorithm is in either the fill or the flush phase. We therefore now define the exact notion of cycles belonging to either phase. We say that an algorithm is in the flush phase in a specific cycle t if it is in the flush state at the end of the arrival step of cycle t. If it is not in the flush phase in cycle t, then we say it is in the fill phase in cycle t. Denote by P(fill) and P(flush) the sets of cycles in which our algorithm is in the fill and flush phases, respectively. For every packet type α, we denote by S_α(t) the expected profit of the algorithm from α-packets which arrive in cycle t, and by S_α = ∑_t S_α(t) the overall expected profit of the algorithm from α-packets. We denote by O_α the expected profit of OPT from α-packets. Again, these notations can be combined with the previous ones. For instance, O_{G_U}(t) denotes the expected profit of OPT from G_U-packets which arrive in cycle t. Furthermore, O^(fill)_{G_U} denotes the expected profit of OPT from G_U-packets which arrive during P(fill).

B. Proof Of Theorem 7
Our proof follows from a series of propositions. The first proposition shows that SA never drops a G_K-packet.

Proposition 10. SA successfully transmits every G_K-packet which resides in its buffer.

Proof. The only case in which SA may drop a packet (line 7) is when the buffer is full (due to the if clause in line 4), but not Hfull (due to the while condition in line 3, which ensures that the algorithm is in the fill phase). Since no G_K-packet in the buffer is ever dropped, all such packets are transmitted once the arrival sequence terminates.

The following proposition ensures a specific distribution over the admitted U-packet in an admittance cycle:

Proposition 11.
In every fill cycle t defined as an admittance cycle, SA's admitted packet is chosen uniformly at random out of all U-packets arriving at t.

Proposition 11 follows from the fact that the algorithm implements reservoir sampling [28]. The following lemma shows that the overall number of G-packets transmitted by SA is at least a significant fraction of the number of G-packets accepted by an optimal policy during a fill phase.

Lemma 12. S_G ≥ (r/M) O^(fill)_G.

Proof. In each cycle t ∈ P(fill) in which U-packets arrive, with probability r, SA admits one U-packet, denoted p. By Proposition 11, p is picked uniformly at random out of at most M unknown arrivals, and therefore the probability that p ∈ G_U is at least A_{G_U}(t)/M. As p is parsed in its cycle of arrival, in the subsequent cycle it is known. By Proposition 10, if p is a G_K-packet, then SA will eventually transmit p. Hence

S_{G_U}(t) ≥ (r/M) ∑_{w ∈ X_{i*}, v ∈ X_{j*}} v · A^(U)_(w,v)(t).    (14)

Therefore,

S_{G_U} ≥ (r/M) ∑_{t ∈ P(fill)} ∑_{w ∈ X_{i*}, v ∈ X_{j*}} v · A^(U)_(w,v)(t) ≥ (r/M) O^(fill)_{G_U}.    (15)

In addition, S_{G_K} ≥ O^(fill)_{G_K}, since SA does not discard arriving G_K-packets during P(fill). Therefore

S_G = S_{G_K} + S_{G_U} ≥ (r/M) (O^(fill)_{G_K} + O^(fill)_{G_U}) = (r/M) O^(fill)_G.    (16)

We are now in a position to prove Theorem 7.

Proof.
Every class C_(i,j) is the selected class with probability 1/(log W · log V). Using Lemma 12 we therefore have, for all i ∈ {1, 2, . . . , log W} and j ∈ {1, 2, . . . , log V},

S_(i,j) ≥ (r/(M log W log V)) O^(fill)_(i,j).

Summing over all the classes, we obtain that the expected throughput of our algorithm is

∑_{i=1}^{log W} ∑_{j=1}^{log V} S_(i,j) ≥ (r/(M log W log V)) ∑_{i=1}^{log W} ∑_{j=1}^{log V} O^(fill)_(i,j).    (17)

Note that if SA is never Hfull during an arrival sequence, then by Equation 17 the ratio between the performance of OPT and that of SA is at most (M/r) log W log V, as required. Assume next that SA becomes Hfull during an input sequence. In such a case we compare the overall throughput due to packets transmitted by SA until the first cycle in which its buffer is empty again, to the profit obtained by OPT due to packets accepted by OPT during the same interval. We note that our analysis also applies to subsequent such intervals, namely, until the subsequent cycle in which SA's buffer is empty again. Denote by δ^S_{i*} the upper bound on the remaining work of each G-packet found in the buffer of SA at the beginning of the flush phase, and let π^S_{j*} denote the lower bound on the profit of each such packet at this time. Similarly, denote by δ^O_{i*} the lower bound on the remaining work of each G-packet which OPT accepts and transmits during P(flush), and by π^O_{j*} the upper bound on the profit of each such packet. We note that in case SA becomes Hfull, SA holds in its buffer exactly B G-packets, and all these packets are transmitted by the time SA's buffer is empty again. Therefore SA gains at least π^S_{j*} B during the flush phase, which takes at most δ^S_{i*} B cycles. Since OPT is equipped with a buffer of size B, there can be no more than (δ^S_{i*}/δ^O_{i*}) B + B G-packets handled by OPT during the flush phase, which translates to a gain of at most ((δ^S_{i*}/δ^O_{i*}) B + B) π^O_{j*} from these packets.
This implies that

O^(flush)_G / S_G ≤ ((δ^S_{i*}/δ^O_{i*}) B + B) π^O_{j*} / (π^S_{j*} B) = (δ^S_{i*}/δ^O_{i*} + 1) · π^O_{j*}/π^S_{j*}.    (18)

By our definition of classes, it follows that δ^S_{i*}/δ^O_{i*} = 2^{i*}/2^{i*-1} = 2 and π^O_{j*}/π^S_{j*} = 2^{j*}/2^{j*-1} = 2. By substituting these values in Equation 18, we obtain O^(flush)_G / S_G ≤ 6. As every class C_(i,j) is the selected class with probability 1/(log W · log V), we have, for all i ∈ {1, . . . , log W} and j ∈ {1, . . . , log V},

S_(i,j) ≥ (1/(6 log W log V)) O^(flush)_(i,j).

Summing over all the classes, we obtain

∑_{i=1}^{log W} ∑_{j=1}^{log V} S_(i,j) ≥ (1/(6 log W log V)) ∑_{i=1}^{log W} ∑_{j=1}^{log V} O^(flush)_(i,j).    (19)

Combining Equations 17 and 19 implies that the competitive ratio of SA is at most

[∑_{i=1}^{log W} ∑_{j=1}^{log V} (O^(fill)_(i,j) + O^(flush)_(i,j))] / [∑_{i=1}^{log W} ∑_{j=1}^{log V} S_(i,j)] ≤ (M/r + 6) log W log V,    (20)

which completes the proof.

Our analysis shows that the best bound on the competitive ratio is attained for r = 1, i.e., every cycle in which U-packets arrive should be an admittance cycle. Randomization should be maintained only for choosing the specific U-packet to be admitted, and for the choice of the selected class. In practical scenarios, however, one might want to be more conservative in choosing admittance cycles; e.g., one might choose r < 1 so as to allow non-parsing cycles even when U-packets arrive. Our analysis provides a worst-case performance guarantee for such settings, and we further explore the effect of such choices in Section VII. In the special case of homogeneous profit values, we assign V = 2 in the upper bound implied by Theorem 7 and obtain the following corollary:

Corollary 13.
In the special case of homogeneous profit values, SA is O((M/r) log W)-competitive.

In the special case of homogeneous work values, we assign W = 2 in the upper bound implied by Theorem 7 and obtain the following corollary:

Corollary 14.
In the special case of homogeneous work values, SA is O((M/r) log V)-competitive.

C. Proof Of Theorem 8

Proof. We first consider the effect of selecting a class closure uniformly at random, instead of selecting a specific class. SA* satisfies Lemma 12, because S*_{G*} ≥ S_G ≥ (r/M) O^(fill)_G. Furthermore, the remaining work (resp., profit) of each packet which lies in the buffer of SA* during the flush phase is at most equal to (resp., at least equal to) that of each packet which lies in the buffer of SA during the flush phase. Therefore, the performance of SA* is at least as good as that of SA; namely, SA* satisfies Equation 20. Consider next the effect of performing fill during flush. SA* accepts packets also during the flush phase, but never processes any of these packets before all packets contributing to the algorithm being Hfull are transmitted, i.e., they are never processed before the flush phase is complete. We enumerate the fill phases and the subsequent flush phases as P^fill_1, P^flush_1, P^fill_2, P^flush_2, . . . , P^fill_n, P^flush_n, where n ≥ 1. It should be noted that each such phase corresponds to a series of disjoint time intervals defined by the first cycle of the sequence of phases. We further let P^flush_0 denote an empty set of cycles, and in case the sequence ends with a fill phase, we also let P^flush_n denote an empty set of cycles. We add a star superscript to all the above notations when they are applied to the fill and flush phases of SA*. We denote the profit accrued by SA and OPT from packets which arrive during the i-th fill phase by S(P^fill_i) and O(P^fill_i), respectively. Similarly, we denote the profit of SA and OPT obtained from packets which arrive during the i-th flush phase by S(P^flush_i) and O(P^flush_i), respectively. Finally, we again use the starred version of the notation when referring to SA*.
Namely, S*(P*^fill_i) and S*(P*^flush_i) denote the profit of SA* obtained from packets which arrive during its i-th fill and flush phase, respectively. Using this notation, we recall that, by the analysis of SA presented in Theorem 7,

O(P^fill_i) + O(P^flush_i) ≤ (M/r + 6) log W log V · S(P^fill_i)    (21)

for every i = 1, . . . , n. This induces an implicit mapping φ of the units of profit obtained from G-packets accepted by OPT during P^fill_i ∪ P^flush_i to the units of profit obtained from G-packets accepted by SA during P^fill_i (either known packets, or unknown packets that were parsed), such that every unit of profit obtained by SA has at most (M/r + 6) log W log V units of profit mapped to it. A key observation is that the image of the mapping φ is essentially the profit attained from the set of G-packets contributing to the algorithm being Hfull at the end of the corresponding fill phase. As SA* may accept packets during flush, at the beginning of the subsequent fill phase the buffer of SA* may not be empty. In particular, there could be G-packets accepted during the recent flush phase that are stored in the buffer. However, none of these packets have any OPT packets mapped to them. It follows that these packets can contribute to SA* becoming Hfull in the new fill phase, and any profit implicitly mapped by φ to the profit of these packets corresponds to packets arriving during the new fill phase, or its subsequent flush phase. The implicit mapping is depicted in Figure 7, along with the difference between the mapping arising from the behavior of SA (visualized above the time axis) and the mapping arising from the behavior of SA* (visualized below the time axis).
Note that the fill and flush phases of the two algorithms need not be synchronized, since SA* can potentially become Hfull "faster" than SA.

It follows that Equation (21) now translates to
$$O(P^{*\text{fill}}_i) + O(P^{*\text{flush}}_i) \leq \left(\tfrac{M}{r} + 6\right) \log W \log V \cdot \left( S^*(P^{*\text{flush}}_{i-1}) + S^*(P^{*\text{fill}}_i) \right) \qquad (22)$$
for every $i = 1, \ldots, n$. Summing over all $i = 1, \ldots, n$, the left-hand sides account for the entire profit of OPT, while on the right-hand side each fill phase and each flush phase of SA* appears at most once; we thus obtain that the competitive ratio guarantee for SA* is the same as that for SA.

Lastly, the analysis of SA does not assume any specific scheduling rule to be applied, as long as the $G(K)$-first order rule is maintained. Thus, our competitive ratio guarantee is independent of the specific scheduling regime applied in order to prioritize the handling of $G(K)$-packets.

D. Proof of Theorem 9

Proof.
We first consider Lemma 12. Observe that SA^SS satisfies Equation (14), which now degenerates to
$$S_{G(U)}(t) \geq \tfrac{r}{M} v^*_j \cdot A^{(U)}_{(w^*_i, v^*_j)}(t) \qquad (23)$$
Therefore, SA^SS also satisfies Equation (15), which degenerates to $S_{G(U)} \geq \frac{r}{M} v^*_j \sum_{t \in P^{(\text{fill})}} A^{(U)}_{(w^*_i, v^*_j)}(t) \geq \frac{r}{M} O^{(\text{fill})}_{G(U)}$. Therefore, SA^SS also satisfies Equation (16) and Lemma 12.

We now carefully revisit the proof of Theorem 7, and focus on the differences between SA and SA^SS. Every class $C_{(i,j)}$ is the selected class with probability $\frac{1}{\ell_W \cdot \ell_V}$. Using Lemma 12 we therefore have, for all $i \in L_W, j \in L_V$, $S_{(i,j)} \geq \frac{r}{M \cdot \ell_W \cdot \ell_V} O^{(\text{fill})}_{(i,j)}$. SA^SS therefore satisfies Equation (17), which is modified to
$$\sum_{i \in L_W} \sum_{j \in L_V} S_{(i,j)} \geq \frac{r}{M \cdot \ell_W \cdot \ell_V} \sum_{i \in L_W} \sum_{j \in L_V} O^{(\text{fill})}_{(i,j)} \qquad (24)$$
SA^SS satisfies Equation (18), where now $\delta^S_{i^*} = \delta^O_{i^*}$ and $\pi^S_{j^*} = \pi^O_{j^*}$, and therefore $O^{(\text{flush})}_G / S_G \leq 2$. Equation (19) is modified accordingly to
$$\sum_{i \in L_W} \sum_{j \in L_V} S_{(i,j)} \geq \frac{1}{2 \cdot \ell_W \cdot \ell_V} \sum_{i \in L_W} \sum_{j \in L_V} O^{(\text{flush})}_{(i,j)} \qquad (25)$$
Combining Equations (24) and (25) implies that the competitive ratio of SA^SS is at most
$$\frac{\sum_{i \in L_W} \sum_{j \in L_V} \left[ O^{(\text{fill})}_{(i,j)} + O^{(\text{flush})}_{(i,j)} \right]}{\sum_{i \in L_W} \sum_{j \in L_V} S_{(i,j)}} \leq \left(\tfrac{M}{r} + 2\right) \ell_W \cdot \ell_V, \qquad (26)$$
which completes the proof.

E. Bounding the Number of U-Packets in a Cycle

In this appendix we show how to bound the probability of having more than 10 U-packets arriving in a single cycle in the HIGH state, in the simulation settings described in Section VII-A. While the MMPP is in the HIGH state, the generation of new arrivals is governed by a Poisson process with parameter $\lambda = 10$. We can therefore denote the number of arrivals in a burst cycle by a random variable $X$, where $X \sim P(10)$.
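As a sanity check, the truncated tail bound developed in this appendix can be evaluated numerically with a short script. The sketch below is not part of the paper; it assumes a per-arrival probability p = 0.3 of being a U-packet (the value consistent with the 0.0003 figure quoted at the end of this appendix) and uses only the Python standard library:

```python
import math

def poisson_pmf(lam, n):
    # P(X = n) for X ~ Poisson(lam), computed in log-space for stability
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def binom_tail(n, p, k):
    # P(Y_n > k) for Y_n ~ Binomial(n, p)
    return sum(math.comb(n, y) * p**y * (1 - p)**(n - y)
               for y in range(k + 1, n + 1))

def u_packet_tail_bound(lam, p, k, N):
    # Upper bound in the spirit of Eq. (27):
    #   sum_{n=k}^{N} P(Y_n > k) P(X = n)  +  P(X > N)
    head = sum(binom_tail(n, p, k) * poisson_pmf(lam, n)
               for n in range(k, N + 1))
    # truncation term P(X > N); clamp against tiny floating-point negatives
    px_tail = max(0.0, 1.0 - sum(poisson_pmf(lam, n) for n in range(N + 1)))
    return head + px_tail

# lambda = 10 (HIGH state), k = 10 U-packets, truncation at N = 100
# (p = 0.3 is an assumption of this sketch, inferred from the text)
bound = u_packet_tail_bound(lam=10.0, p=0.3, k=10, N=100)
print(f"{bound:.6f}")
```

With N = 100 the truncation term Pr(X > N) is negligible, so the printed value is essentially the exact tail probability, just below 0.0003.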
We denote by the random variable $Y$ the number of U-packets arriving in a burst cycle, and by the (conditional) random variable $Y_n$ the number of U-packets arriving in a burst cycle, given that the total number of arrivals in this cycle is $n$; namely, $\Pr(Y_n = y) = \Pr(Y = y \mid X = n)$. As $Y_n$ is the result of $n$ Bernoulli trials, where the probability of success is $p = 0.3$, we have $Y_n \sim B(n, 0.3)$. Then, the probability of having more than $k$ U-packets arriving in a burst cycle is
$$
\Pr(Y > k) = \sum_{n=k}^{\infty} \Pr(Y_n > k) \Pr(X = n)
< \sum_{n=k}^{N} \Pr(Y_n > k) \Pr(X = n) + \sum_{n=N+1}^{\infty} \Pr(X = n)
= \sum_{n=k}^{N} \Pr(Y_n > k) \Pr(X = n) + \Pr(X > N), \qquad (27)
$$
where $N$ is a large integer.

Fig. 7: Visualization of the mappings induced by the analysis of SA and SA*, for the first 4 fill and flush phases. The fill and flush phases of SA are denoted $P^{\text{fill}}_i$ and $P^{\text{flush}}_i$, respectively, whereas the fill and flush phases of SA* are denoted $P^{*\text{fill}}_i$ and $P^{*\text{flush}}_i$, respectively. The top part shows the mapping of the profit corresponding to packets accepted by OPT along time to the profit corresponding to $G$-packets accepted by SA during the fill phase (since SA does not accept any packets during the flush phase). The bottom part shows the induced mapping of the profit obtained by packets accepted by OPT along time to the profit of $G$-packets accepted by SA* during both the preceding flush phase and the current fill phase.

Denote the cumulative distribution functions of $X$ and $Y_n$ by $\mathrm{CDF}_X$ and $\mathrm{CDF}_{Y_n}$, respectively. Then, we can assign in Eq.
(27) $\Pr(X > N) = 1 - \mathrm{CDF}_X(N)$ and $\Pr(Y_n > k) = 1 - \mathrm{CDF}_{Y_n}(k)$, and obtain an upper bound on the probability of having more than $k$ U-packets arriving in a burst cycle. We can make the bound as tight as we like by choosing a sufficiently large value of $N$. For instance, assigning $N = 100$ implies that the probability of having more than 10 U-packets arriving in a burst cycle is less than 0.0003.

REFERENCES

[1] K. Karras, T. Wild, and A. Herkersdorf, "A folded pipeline network processor architecture for 100 Gbit/s networks," in ANCS, 2010, p. 2.
[2] C. Kozanitis, J. Huber, S. Singh, and G. Varghese, "Leaping multiple headers in a single bound: wire-speed parsing using the Kangaroo system," in INFOCOM, 2010, pp. 830–838.
[3] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: a scalable fault-tolerant layer 2 data center network fabric," in ACM SIGCOMM Computer Communication Review, vol. 39, 2009, pp. 39–50.
[4] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, "Scalable flow-based networking with DIFANE," ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 351–362, 2011.
[5] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. Gude, N. McKeown, and S. Shenker, "Rethinking enterprise network control," IEEE/ACM Transactions on Networking (TON), vol. 17, no. 4, pp. 1270–1283, 2009.
[6] P. Chuprikov, S. Nikolenko, and K. Kogan, "Priority queueing with multiple packet characteristics," in INFOCOM, 2015, pp. 1418–1426.
[7] A. Shpiner, I. Keslassy, and R. Cohen, "Scaling multi-core network processors without the reordering bottleneck," in HPSR, 2014, pp. 146–153.
[8] D. D. Sleator and R. E. Tarjan, "Amortized efficiency of list update and paging rules," Communications of the ACM, vol. 28, no. 2, pp. 202–208, 1985.
[9] A. Borodin and R. El-Yaniv, Online Computation and Competitive Analysis. Cambridge University Press, 2005.
[10] W. A. Aiello, Y. Mansour, S. Rajagopolan, and A. Rosén, "Competitive queue policies for differentiated services," in INFOCOM, vol. 2, 2000, pp. 431–440.
[11] A. Kesselman, Z. Lotker, Y. Mansour, B. Patt-Shamir, B. Schieber, and M. Sviridenko, "Buffer overflow management in QoS switches," SIAM Journal on Computing, vol. 33, no. 3, pp. 563–583, 2004.
[12] Y. Mansour, B. Patt-Shamir, and O. Lapid, "Optimal smoothing schedules for real-time streams," in Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, 2000, pp. 21–29.
[13] S. Albers and M. Schmidt, "On the performance of greedy algorithms in packet buffering," SIAM Journal on Computing, vol. 35, no. 2, pp. 278–304, 2005.
[14] Y. Azar and Y. Richter, "An improved algorithm for CIOQ switches," in ESA, 2004, pp. 65–76.
[15] A. Kesselman, K. Kogan, and M. Segal, "Packet mode and QoS algorithms for buffered crossbar switches with FIFO queuing," Distributed Computing, vol. 23, no. 3, pp. 163–175, 2010.
[16] Y. Kanizo, D. Hay, and I. Keslassy, "The crosspoint-queued switch," in INFOCOM, 2009, pp. 729–737.
[17] A. Kesselman, B. Patt-Shamir, and G. Scalosub, "Competitive buffer management with packet dependencies," Theoretical Computer Science, vol. 489–490, pp. 75–87, 2013.
[18] Y. Mansour, B. Patt-Shamir, and D. Rawitz, "Overflow management with multipart packets," Computer Networks, vol. 56, no. 15, pp. 3456–3467, 2012.
[19] M. H. Goldwasser, "A survey of buffer management policies for packet switches," ACM SIGACT News, vol. 41, no. 1, pp. 100–128, 2010.
[20] I. Keslassy, K. Kogan, G. Scalosub, and M. Segal, "Providing performance guarantees in multipass network processors," IEEE/ACM Transactions on Networking (TON), vol. 20, no. 6, pp. 1895–1909, 2012.
[21] K. Kogan, A. López-Ortiz, S. Nikolenko, G. Scalosub, and M. Segal, "Balancing work and size with bounded buffers," in COMSNETS, 2014.
[22] Y. Azar and O. Gilon, "Buffer management for packets with processing times," in ESA, 2015, pp. 47–58.
[23] Y. Azar, I. R. Cohen, and I. Gamzu, "The loss of serving in the dark," in STOC, 2013, pp. 951–960.
[24] Y. Azar and I. R. Cohen, "Serving in the dark should be done non-uniformly," in ICALP, 2015, pp. 91–102.
[25] K. Pruhs, "Competitive online scheduling for server systems," ACM SIGMETRICS Performance Evaluation Review, vol. 34, no. 4, pp. 52–58, 2007.
[26] A. C.-C. Yao, "Probabilistic computations: Toward a unified measure of complexity," in FOCS, 1977, pp. 222–227.
[27] B. Awerbuch, Y. Bartal, A. Fiat, and A. Rosén, "Competitive non-preemptive call control," in SODA, 1994, pp. 312–320.
[28] J. S. Vitter, "Random sampling with a reservoir," ACM Transactions on Mathematical Software (TOMS), vol. 11, no. 1, pp. 37–57, 1985.
[29] R. Ramaswamy, N. Weng, and T. Wolf, "Analysis of network processing workloads," Journal of Systems Architecture, vol. 55, no. 10, pp. 421–433, 2009.
[30] M. E. Salehi and S. M. Fakhraie, "Quantitative analysis of packet-processing applications regarding architectural guidelines for network-processing-engine development," Journal of Systems Architecture, vol. 55, no. 7, pp. 373–386, 2009.
[31] M. E. Salehi, S. M. Fakhraie, and A. Yazdanbakhsh, "Instruction set architectural guidelines for embedded packet-processing engines," Journal of Systems Architecture, vol. 58, no. 3, pp. 112–125, 2012.
[32] D. P. Williamson and D. B. Shmoys,