Efficient Packet Transmission in Wireless Ad Hoc Networks with Partially Informed Nodes
Sara Berri · Samson Lasaulce · Mohammed Said Radjef
Abstract
One formal way of studying cooperation and incentive mechanisms in wireless ad hoc networks is to use game theory. In this respect, simple interaction models such as the forwarder’s dilemma have been proposed and used successfully. However, this type of model is not suited to account for possible fluctuations of the wireless links of the network. Additionally, it does not allow one to study the way a node transmits its own packets. Finally, the repeated game models used in the related literature do not allow the important scenario of nodes with partial information (about the link state and the nodes’ actions) to be studied. One of the contributions of the present work is precisely to provide a general approach to integrate all of these aspects. Second, the best performance the nodes can achieve under partial information is fully characterized for a general form of utilities. Third, we derive an equilibrium transmission strategy which allows a node to adapt its transmit power levels and packet forwarding rate to link fluctuations and other nodes’ actions. The derived results are illustrated through a detailed numerical analysis for a network model built from a generalized version of the forwarder’s dilemma. The analysis shows in particular that the proposed strategy is able to operate in the presence of channel fluctuations and to perform significantly better than existing transmission mechanisms (e.g., in terms of consumed network energy).

S. Berri
Research Unit LaMOS (Modeling and Optimization of Systems), Faculty of Exact Sciences, University of Bejaia, Bejaia, 06000, Algeria
L2S (CNRS-CentraleSupelec-Univ. Paris Saclay), Gif-sur-Yvette, France
E-mail: [email protected]
Present address: Telecom ParisTech-Univ. Paris Saclay, Paris, France

S. Lasaulce
L2S (CNRS-CentraleSupelec-Univ. Paris Saclay), Gif-sur-Yvette, France
E-mail: [email protected]

M. S. Radjef
Research Unit LaMOS (Modeling and Optimization of Systems), Faculty of Exact Sciences, University of Bejaia, Bejaia, 06000, Algeria
E-mail: [email protected]
Keywords
Packet transmission, Power control, Game theory, Repeated games, Incentive mechanisms, Wireless ad hoc networks.
In wireless ad hoc networks, nodes are interdependent. One node needs the assistance of neighboring nodes to relay the packets or messages it wants to send to the receiver(s). Therefore, nodes are in a situation where they have to relay packets, but at the same time have to manage the energy they spend helping other nodes. To capture the tradeoff between a cooperative behavior (which is necessary to convey information through an ad hoc network) and a selfish behavior (which is necessary to manage the node energy), the authors of [2] proposed a simple but efficient model. Their modeling has been found to be very important and insightful in the literature on ad hoc networks, as shown by the many papers in which it is exploited. The model consists of assuming, whatever the size of the network, that the local node interaction only involves two neighboring nodes having a decision-making role; one of the virtues of considering the interaction to be local is the possibility of designing distributed transmission strategies. In the original model of [2], a node has two possible choices, namely, forward or drop the packets it receives from the neighboring node. As shown in [2], modeling the problem at hand as a game appears to be natural and relevant; in the corresponding game, the node utility function consists of the sum of a data rate term (which is maximized when the other node forwards its packets) and an energy term (which is maximized when the node does not forward the packets of the other node). At the Nash equilibrium of the corresponding strategic-form static game (called the forwarder’s dilemma in the corresponding literature), nodes do not transmit at all [2]. To avoid this, cooperation has to be stimulated e.g., by studying the repeated interaction between the nodes [2], [3], [4] or by implementing incentive mechanisms [4], [5], [6], [7], [8], [9], [10].
The vast majority of incentive mechanisms rely either on the idea of reputation [4], [6], [11] or on the use of a credit system [5], [7], [12], [13]. While providing an efficient solution, all the corresponding models still have some limitations, especially regarding link quality fluctuations and partial knowledge at the nodes; indeed, they do not take into account the quality of the link between the transmitting and the receiving nodes, which may be an important issue since the link quality may strongly fluctuate if it is wireless. The solution in [8], referred to as ICARUS (for hybrId inCentive mechAnism for coopeRation stimUlation in ad hoc networkS), combines the two ideas, namely, reputation and credit system, but it is not suited to scenarios where the actions of the other nodes are not perfectly observed, which results e.g., in inappropriate punishment (a node is declared selfish while it is cooperative) and therefore in a loss of efficiency. Additionally, in [8], when a node is out of credit, the transmission is blocked and the node cannot send any packet anymore; this might not be practical in some wireless networks where a certain quality-of-service has to be provided. Also, [8] proposes a mechanism to regulate the credit when a node has an excessive number of credits, but the proposed mechanism may be too complex. Last but not least, in [8] no result is provided on the strategic stability property, which is important and even necessary to make the network robust against selfish deviations. The purpose of this paper is precisely to overcome the limitations of the aforementioned previous works. More precisely, the contributions of the present paper are as follows.
The first key technical difference with the closely related works is that the proposed formulation accounts for the possible presence of quality fluctuations of the different links that are involved in the considered local interaction model. In particular, this leads us to a game model which generalizes the existing models since the forwarding game now has a state and the discrete action sets are arbitrary, not just binary; additionally, the node does not only choose the cooperation power but also the power used to send its own packets.
An important and useful contribution of the paper is to characterize the feasible utility region of the considered problem, by exploiting implementability theorems provided by recent works [14], [15], [16]. This problem is known to be non-trivial in the presence of partial information and constitutes a determining element of folk theorems; this difficult problem turns out to be solvable in the proposed reasonable setting (the channel gains are i.i.d. and the observation structure is memoryless). The knowledge of the utility region is very useful since it allows one to measure the efficiency of any distributed algorithm relying on the assumed partial information.
A third contribution of the paper is that we provide a new transmission strategy whose main feature is to be able to deal with the presence of fluctuating link qualities and to be efficient. To design the proposed strategy, we show that the derived utility region can be used in a constructive manner to obtain efficient operating points, and propose a new incentive mechanism to ensure that these points are equilibrium points. The proposed incentive mechanism combines the ideas of credit and reputation. To our knowledge, the closest existing incentive mechanism to the one proposed in the present paper is given in [8]. Here, we go further by dealing with the problem of imperfect observation and that of credit outage or excess. Indeed, the credit evolution law we propose in this paper prevents, by construction, the number of credits from being too large; therefore one does not need to resort to an additional credit regulation mechanism, which may be too complex.
In addition to the above analytical contributions, we provide a numerical study which demonstrates the relevance of the proposed approach. Compared to the closest transmission strategies, significant gains are obtained in terms of packet forwarding rate, network consumed power, and combined utilities. As a sample result, the network consumed power is shown to be divided by more than two w.r.t. the state-of-the-art strategies [8] and [17].

The remainder of the paper is organized as follows. In Sec. 2, we present the system model; the assumed local interaction model involves two neighboring nodes of an ad hoc network with arbitrary size and generalizes the famous model introduced in [2]. The associated static game model is also provided in Sec. 2. In Sec. 3, the repeated game formulation of the generalized packet forwarding problem is provided; one salient feature of the proposed model is that partial information is assumed both for the network state and the nodes’ actions. In Sec. 4, the feasible utility region of the studied repeated game with partial observation is fully characterized. We also provide an algorithm to determine power control policies that are shown to be globally efficient in Sec. 6. The proposed incentive mechanism and equilibrium transmission strategy are provided in Sec. 5; the proposed transmission strategy allows both the packet forwarding rate and the transmit power to be adapted. A detailed numerical performance analysis is conducted in Sec. 6. Sec. 7 concludes the paper.
The present work concerns wireless ad hoc networks, namely, networks in which a source node needs the assistance of other nodes to communicate with the destination node(s). As well motivated in related papers such as [4], [18], [19], we will assume the interaction among nodes to be local i.e., it only involves neighboring nodes. This means that the network can have an arbitrary size and topology, but a node only considers local interactions to make its decision although it effectively interacts with more nodes. One of the virtues of such an interaction model is that it makes it possible to design distributed transmission strategies for every node. More specifically, we will assume the famous model of [2] in which local interactions take place in a pairwise manner, which not only allows us to design distributed strategies but also to easily compare the proposed transmission strategy with existing strategies. The key idea of this relevant model is to take advantage of the fact that the network is wireless to simplify the interaction model. For a given node, the dominant interaction will only involve its closest neighbors (see Fig. 1). If several neighboring nodes lie within the radio range of the considered node, then it is assumed to have several pairwise interactions in parallel, as explained in detail in the numerical part.

The nodes are assumed to be non-malicious i.e., each of them does not aim at damaging the communication of the other. Additionally, they are assumed to operate in an imperfect promiscuous mode, which means that each node imperfectly overhears all packets forwarded by its neighbors. The proposed model generalizes the model of [2] for at least four reasons.
First, the action of a node has two components instead of one: the transmit power used to help the other node, which is denoted by $p'_i$, and the transmit power used to send its own packets, which is denoted by $p_i$. Second, the transmit power levels are not assumed to be binary but to lie in a general discrete set $\mathcal{P}_i = \mathcal{P}'_i = \mathcal{P} = \{P_1, P_2, \ldots, P_L\} = \{P_{\min}, \ldots, P_{\max}\}$, with $|\mathcal{P}_i| = |\mathcal{P}'_i| = |\mathcal{P}| = L$ (the notation $|\cdot|$ stands for the cardinality of the considered set). Assuming that the sets are discrete is of practical interest, as there exist wireless communication standards in which the power can only be decreased or increased by step and in which quantized wireless channel state information (CSI) is used (see e.g., [20], [21]). Similarly, the channel may be quantized to define operating modes (e.g., MCS, modulation and coding scheme) used by the transmitter. Even when the effective channel is continuous, assuming it to be discrete in the model and algorithm part may be very relevant. Finally, note that from the limiting performance characterization point of view, the analysis of the continuous case follows from the discrete case but the converse is not true [22].

Fig. 1 In this example, the focus is on what Node 1 does to allow Node 0 (source) to route its packets to Node 4 (destination). The dashed circle represents the radio range for Node 1 and defines its neighbors. To ensure a distributed design, two key elements are exploited: a) Node 1 adapts its transmission behavior to each of its neighbors. Here, Node 1 interacts with Node 2 (indicated by the dotted box); b) Only the available knowledge of the quality of the most influential links is accounted for (denoted generically by $h_1$, $h'_1$, $h_2$, $h'_2$).

As a third new feature compared to [2], the considered model accounts for the possible fluctuations of the quality of each link. With each link a non-negative scalar is associated, which is called the channel gain of the considered link. For a node, the channel gains of the links used to send its own packets and to help the other node are denoted by $h_i$ and $h'_i$, respectively. These channel gains are assumed to lie in discrete sets (of states): $\mathcal{H}_i = \mathcal{H}'_i = \mathcal{H} = \{h_{\min}, \ldots, h_{\max}\}$ with $|\mathcal{H}_i| = |\mathcal{H}'_i| = |\mathcal{H}| = H$; the realizations of each channel gain will be assumed to be i.i.d. Technically, continuous channel gains might be assumed. But, as done in the information theory literature for establishing coding theorems, we address the discrete case in the first place, since the continuous case can be obtained by classical arguments (such as assuming standard probability spaces), whereas the converse is not true. Now, from the practical point of view, quantizing the channel gains typically induces a small performance loss compared to the continuous case; one figure assuming a typical scenario illustrates this. The corresponding channel gain model naturally applies to time-selective frequency flat fading single-input single-output channels. If the channel gain is interpreted as the combined effect of path loss and shadowing, our model can also be used to study more general channel models such as MIMO channels.

Fourth, the utility function of a node has a more general form than in the forwarder’s dilemma. The instantaneous utility function for Node $i \in \{1, 2\}$ is expressed as follows:
$$u_i(a_0, a_1, a_2) = \varphi(\mathrm{SNR}_i) - \alpha (p_i + p'_i),$$ (1)
where:
– $a_0 = (h_1, h'_1, h_2, h'_2)$ is the global channel or network state. The corresponding set will be denoted by $\mathcal{A}_0 = \mathcal{H}_1 \times \mathcal{H}'_1 \times \mathcal{H}_2 \times \mathcal{H}'_2 = \mathcal{H}^4$;
– $a_i = (p_i, p'_i)$ is the action of Node $i \in \{1, 2\}$;
– the function $\varphi$ is a communication efficiency function which represents the packet success rate. It is assumed to be increasing and to lie in $[0, 1]$. A typical choice for $\varphi$ is, for example, $\varphi(x) = (1 - e^{-x})^{\ell}$, $\ell$ being the number of symbols per packet (see e.g., [23], [24], [25]), or $\varphi(x) = e^{-c/x}$ with $c = 2^r - 1$, $r$ being the spectral efficiency in bit/s per Hz [26];
– for $i \in \{1, 2\}$, the quantity $\mathrm{SNR}_i$ is, for Node $i$, the equivalent signal-to-noise ratio (SNR) at the next node after the neighbor. It is assumed to be expressed as follows:
$$\mathrm{SNR}_i = \frac{p_i h_i \, p'_{-i} h'_{-i}}{\sigma^2},$$ (2)
$\sigma^2$ being the noise variance and the index notation $-i$ standing for the index of the other node.

Remark 1.
The results derived in Sec. 4 hold for any utility function of the form $u_i(a_0, a_1, a_2)$ (under some assumptions which only concern the observation structure) and not only for the specific choice made above. This choice is made to allow comparisons with existing results (and more specifically with the large set of contributions on the forwarder’s dilemma) to be conducted and discussed.

Remark 2.
The assumed expression of the SNR is also one possible pragmatic choice, but all the analytical results derived in this paper hold for an arbitrary SNR expression of the form $\mathrm{SNR}_i(a_0, a_1, a_2)$; this choice is sufficiently general to study the problem of channel fluctuations, which is the main feature to be accounted for. The proposed expression is relevant e.g., when nodes implement the amplify-and-forward protocol to relay the signals or packets [27]. This simple but reasonable model for the SNR may be seen as an approximation in which either the single-hop links dominate the multi-hop links or the talk/listen phases are scheduled appropriately. If another relaying protocol is implemented, such as decode-and-forward, other expressions for the equivalent SNR may be used (see e.g., [28]) without questioning the validity of the analytical results provided in this paper. Finally, the parameter $\alpha \geq 0$ in (1) weights the power cost term.

The pair of functions $(u_1, u_2)$ defines a strategic-form static game (see e.g., [29]) in which the players are Nodes 1 and 2 and the action sets are respectively $\mathcal{A}_1 = \mathcal{P}^2$ and $\mathcal{A}_2 = \mathcal{P}^2$. This game generalizes the forwarder’s dilemma introduced in [2]. The latter can be retrieved by assuming that $\varphi$ is a step function, $p'_i$ is binary, $p_i$ is constant, and all the channel gains are constant. In the next section, we describe mathematically the problem under investigation. It is shown how the problem can be modeled by a repeated game, which is precisely built on the stage or static game:
$$G = (\mathcal{N}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, \{u_i\}_{i \in \mathcal{N}}),$$ (3)
where $\mathcal{N} = \{1, 2\}$.

The unique Nash equilibrium of $G$ is $p_i^{\mathrm{NE}} = P_{\min}$ and $p_i'^{\,\mathrm{NE}} = P_{\min}$. If the minimum power $P_{\min}$ is taken to be zero, then the situation where the nodes do not transmit at all corresponds to the equilibrium (and thus $(u_1, u_2) = (0, 0)$).

The problem we want to solve in this paper is as follows. It is assumed that the nodes interact over an infinite number of stages.
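Before turning to the repeated interaction, the static game above can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the efficiency function $\varphi(x) = e^{-c/x}$ with $\varphi(0) = 0$, a small illustrative power grid with $P_{\min} = 0$, and hypothetical function names (`phi`, `utility`, `pure_nash_equilibria`) and parameter values (`alpha`, `sigma2`, `c`). It enumerates all pure action profiles and keeps those from which no node can profitably deviate.

```python
import itertools
import math

def phi(x, c=1.0):
    """Assumed efficiency function phi(x) = exp(-c / x), with phi(0) = 0."""
    return math.exp(-c / x) if x > 0 else 0.0

def utility(i, a0, a1, a2, alpha=0.1, sigma2=1.0):
    """Stage utility u_i = phi(SNR_i) - alpha * (p_i + p'_i), as in (1)-(2)."""
    gains = {1: (a0[0], a0[1]), 2: (a0[2], a0[3])}  # (h_i, h'_i) for each node
    acts = {1: a1, 2: a2}                           # (p_i, p'_i) for each node
    j = 2 if i == 1 else 1                          # index of the other node
    p_own, p_fwd = acts[i]
    snr = p_own * gains[i][0] * acts[j][1] * gains[j][1] / sigma2
    return phi(snr) - alpha * (p_own + p_fwd)

def pure_nash_equilibria(a0, powers, **kw):
    """Brute-force search for pure Nash equilibria of the static game."""
    actions = list(itertools.product(powers, repeat=2))
    equilibria = []
    for a1, a2 in itertools.product(actions, repeat=2):
        ok1 = all(utility(1, a0, a1, a2, **kw) >= utility(1, a0, b, a2, **kw)
                  for b in actions)
        ok2 = all(utility(2, a0, a1, a2, **kw) >= utility(2, a0, a1, b, **kw)
                  for b in actions)
        if ok1 and ok2:
            equilibria.append((a1, a2))
    return equilibria

# Illustrative setting (assumed values): unit channel gains, P_min = 0.
print(pure_nash_equilibria((1.0, 1.0, 1.0, 1.0), [0.0, 0.5, 1.0], alpha=0.1))
```

In this illustrative setting the only profile that survives is the one where both nodes use $P_{\min} = 0$ for both power components, which matches the no-transmission equilibrium discussed above, even though the profile where both nodes use full power gives each of them a strictly positive utility.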
Over stage $t \in \{1, 2, \ldots, T\}$, $T \to \infty$, the channel gains are assumed to be fixed, while the realizations of each channel gain are assumed to be i.i.d. from stage to stage. During a stage, a node typically exchanges many packets with its neighbors. At each stage, a node has to make a decision based on the knowledge it has. In full generality, the decision of a node consists in choosing a probability distribution over its set of possible actions. The knowledge of a node is in terms of global channel states and actions chosen by the other node. More precisely, it is assumed that Node $i \in \mathcal{N}$ has access to a signal which is associated with the state $a_0$ and is denoted by $s_i \in \mathcal{S}_i$, $|\mathcal{S}_i| < \infty$. At stage $t$, the observation $s_i(t) \in \mathcal{S}_i$ therefore corresponds to the image (i.e., the knowledge) that Node $i$ has about the global channel state $a_0(t)$. This signal is assumed to be the output of a memoryless observation structure [27] whose conditional probability is denoted by $\daleth_i$:
$$\daleth_i(s_i | a_0) = \Pr[S_i = s_i | A_0 = a_0],$$ (4)
where capital letters stand for random variables whereas small letters stand for realizations. Simple examples for $s_i$ are: $s_i = h_i$; $s_i = \hat{h}_i$, $\hat{h}_i$ being an estimate of $h_i$; $s_i = (h_i, h'_i)$; $s_i = a_0 = (h_1, h'_1, h_2, h'_2)$. Now, in terms of observed actions, it is assumed that Node $i \in \mathcal{N}$ has imperfect monitoring. In general, Node $i \in \mathcal{N}$ has access to a signal $y_i \in \mathcal{Y}_i$, $|\mathcal{Y}_i| < \infty$, which is assumed to be the output of a memoryless observation structure whose conditional probability is denoted by $\Gamma_i$:
$$\Gamma_i(y_i | a_0, a_1, a_2) = \Pr[Y_i = y_i | (A_0, A_1, A_2) = (a_0, a_1, a_2)].$$ (5)
The memoryless assumption means that for sequences of realizations of size $t$ ($t$ being arbitrary), $\Pr(y_i^t | a_0^t, a_1^t, a_2^t) = \prod_{t'=1}^{t} \Pr(y_i(t') | a_0(t'), a_1(t'), a_2(t'))$.

The reason why we distinguish between the observations $s_i$ and $y_i$ comes from the assumptions made in terms of causality. Indeed, practically speaking, it is relevant to assume that a node has access to the past realizations of $s_i$ in the wide sense, namely, to $s_i(1), \ldots, s_i(t)$ at stage $t$. However, only the past realizations in the strict sense $y_i(1), \ldots, y_i(t-1)$ are assumed to be known at stage $t$. Otherwise, it would mean that a node would have access to the image of its current action and that of the others before choosing the former.

At this point, it is possible to define completely the problem to be solved. The problem can be tackled by using a strategic-form game model, which is denoted by $\overline{G}$. As for the static game $G$ on which the repeated game model $\overline{G}$ is built, the set of players is the set of nodes $\mathcal{N} = \{1, 2\}$. The transmission strategy of Node $i$ is denoted by $\sigma_i$; it consists of a sequence of functions and is defined as follows:
$$\sigma_{i,t} : \mathcal{S}_i^t \times \mathcal{Y}_i^{t-1} \to \Delta(\mathcal{P}^2), \quad (s_i^t, y_i^{t-1}) \mapsto \pi_i(t),$$ (6)
where:
– $s_i^t = (s_i(1), \ldots, s_i(t))$, $y_i^{t-1} = (y_i(1), \ldots, y_i(t-1))$;
– $\Delta(\mathcal{P}^2)$ represents the unit simplex, namely, the set of probability distributions over the set $\mathcal{P}^2$;
– $\pi_i(t)$ is the probability distribution used by Node $i$ at stage $t$ to generate its action $(p_i(t), p'_i(t))$.

The type of strategies we are considering is referred to as a behavior strategy in the game theory literature, which means that at every game stage the strategy returns a probability distribution. The associated randomness not only allows one to consider strategies which are more general than pure strategies, but also to model effects such as node asynchronicity for packet transmissions. Finally, the performance of a node is measured over the long run and nodes are therefore assumed to implement transmission strategies which aim at maximizing their long-term utilities.
The long-term utility function of Node $i \in \mathcal{N}$ is defined as:
$$U_i(\sigma_1, \sigma_2) = \lim_{T \to \infty} \sum_{t=1}^{T} \theta_t \, \mathbb{E}[u_i(A_0(t), A_1(t), A_2(t))] = \lim_{T \to \infty} \sum_{t=1}^{T} \theta_t \sum_{a_0, a_1, a_2} P_t(a_0, a_1, a_2) \, u_i(a_0, a_1, a_2),$$ (7)
where:
– $\sigma_i$ stands for the transmission strategy of Node $i \in \mathcal{N}$;
– it is assumed that the limit in (7) exists;
– $\theta_t$ is a sequence of weights which corresponds to a convex combination, that is, $0 \leq \theta_t < 1$ and $\sum_{t=1}^{T} \theta_t = 1$. For a repeated game with discount, $\theta_t = (1 - \delta)\delta^t$, and for a classical infinitely repeated game, $\theta_t = \frac{1}{T}$;
– as already mentioned, capital letters stand for random variables whereas small letters stand for realizations. Here $A_0(t)$, $A_1(t)$, and $A_2(t)$ stand for the random processes corresponding to the network state and the nodes’ actions;
– the notation $P_t$ stands for the joint probability distribution induced by the strategy profile $(\sigma_1, \sigma_2)$ at stage $t$.

This general model thus encompasses the two well-known models for the sequence of weights, which are given by the model with discount and the infinite Cesaro mean. In the model with discount, note that the discount factor may model different phenomena, but in a wireless ad hoc network the most relevant effect to be modeled seems to be the uncertainty about whether there will be a subsequent iteration of the stage game; for example, connectivity to an access point can be lost. With this interpretation in mind, the discount factor represents e.g., the probability that the current round is not the last one or, in terms of mobility, it may also represent the probability that the nodes do not move for the current stage. Therefore it may model the departure or the death of a node (e.g., due to connectivity loss) for a given routing path.
More details about these interpretations can be found in [2], [29], while [30] provides a convincing technical analysis to sustain this probabilistic interpretation.

At this point, we have completely defined the strategic form of the repeated game, that is, the triplet
$$\overline{G} = (\mathcal{N}, \{\Sigma_i\}_{i \in \mathcal{N}}, \{U_i\}_{i \in \mathcal{N}}),$$ (8)
where $\Sigma_i$ is the set of all possible transmission strategies for Node $i \in \mathcal{N}$.

One of the main objectives of this paper is to exploit the above formulation to find a globally efficient transmission scheme for the nodes. For this purpose, we will characterize the long-term utility region for the problem under consideration. It is important to mention that the characterization of the feasible utility region of a dynamic game (which includes repeated games as a special case) with an arbitrary observation structure is still an open problem [31]. Remarkably, as shown recently in [14], [15], the problem can be solved for an interesting class of problems. It turns out that the problem under investigation belongs to this class provided that the channel gains evolve according to the classical model of block i.i.d. realizations.

In the next section, we show how to exploit [14], [15] to characterize the long-term utility region and construct a practical transmission strategy. In Sec. 5, we will show how to integrate the strategic stability property into this strategy (we refer to the stability of a point to single deviations as strategic stability), this property being important to ensure that selfish nodes effectively implement the efficient strategies.
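To make the role of the weights $\theta_t$ in the long-term utility concrete, the following is a minimal sketch that evaluates a finite-horizon weighted sum of expected stage utilities under the two weighting rules mentioned in the text. The function names (`discount_weights`, `cesaro_weights`, `long_term_utility`) and the numerical values are illustrative assumptions, and the limit $T \to \infty$ is simply replaced by a large finite $T$.

```python
def discount_weights(delta, T):
    """Weights theta_t = (1 - delta) * delta**t for t = 1..T, as written in the text.

    Summed over t = 1..T these weights give delta * (1 - delta**T).
    """
    return [(1.0 - delta) * delta ** t for t in range(1, T + 1)]

def cesaro_weights(T):
    """Uniform weights theta_t = 1 / T (Cesaro mean over T stages)."""
    return [1.0 / T] * T

def long_term_utility(expected_stage_utilities, weights):
    """Weighted sum of expected stage utilities over a finite horizon."""
    return sum(w * u for w, u in zip(weights, expected_stage_utilities))

# Illustrative use: a node whose expected stage utility is constant and equal to 0.3.
T = 200
stage = [0.3] * T
print(long_term_utility(stage, cesaro_weights(T)))
print(long_term_utility(stage, discount_weights(0.9, T)))
```

With a constant stage utility, the Cesaro mean returns that constant, whereas the discounted sum scales it by the total mass of the discount weights over the horizon.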
When the number of stages is assumed to be large, the random process associated with the network state $A_0(1), A_0(2), \ldots, A_0(T)$ is i.i.d., and the observation structure given by $(\daleth_1, \daleth_2, \Gamma_1, \Gamma_2)$ is memoryless, some recent results can be exploited to characterize the feasible utility region of the considered repeated game and to derive efficient transmission strategies. The main difficulty in determining the feasible utility region of the repeated game $\overline{G}$ is to find the set of possible average correlations between $a_0$, $a_1$, and $a_2$. Formally, the correlation averaged over $T$ stages is defined by:
$$\overline{P}_T(a_0, a_1, a_2) = \frac{1}{T} \sum_{t=1}^{T} P_t(a_0, a_1, a_2),$$ (9)
where $P_t$ is the joint probability at stage $t$. More precisely, a key notion to characterize the attainable long-term utilities is the notion of implementability, which is given as follows.

Definition 1
An average correlation $Q$ is said to be implementable if there exists a pair of transmission strategies $(\sigma_1, \sigma_2)$ such that the average correlation induced by these transmission strategies verifies:
$$\forall (a_0, a_1, a_2) \in \mathcal{A}_0 \times \mathcal{A}_1 \times \mathcal{A}_2, \quad \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} P_t(a_0, a_1, a_2) = Q(a_0, a_1, a_2).$$ (10)
Using the above definition, the following key result can be proved.

Proposition 1
The Pareto frontier of the achievable utility region of the repeated game $\overline{G}$ is given by all the points of the form $(\mathbb{E}_{Q_\lambda}(u_1), \mathbb{E}_{Q_\lambda}(u_2))$, $\lambda \in [0, 1]$, where $Q_\lambda$ is a maximum point of
$$W_\lambda = \lambda \, \mathbb{E}_Q(u_1) + (1 - \lambda) \, \mathbb{E}_Q(u_2),$$ (11)
and each maximum point is taken in the set of probability distributions which factorize as follows:
$$Q(a_0, a_1, a_2) = \sum_{v, s_1, s_2} \rho(a_0) P_V(v) \times \daleth(s_1, s_2 | a_0) \times P_{A_1|S_1,V}(a_1 | s_1, v) \, P_{A_2|S_2,V}(a_2 | s_2, v),$$ (12)
where:
– $\lambda$ denotes the relative weight assigned to the utility of the first player and can be chosen arbitrarily depending on some prescribed choice e.g., in terms of fairness or global efficiency;
– $\rho$ is the probability distribution of the network state $a_0$;
– $\daleth$ is the joint conditional probability which defines the assumed observation structure i.e., a probability which writes as
$$\daleth(s_1, s_2 | a_0) = \Pr[(S_1, S_2) = (s_1, s_2) | A_0 = a_0];$$ (13)
– $V \in \mathcal{V}$ is an auxiliary random variable or lottery.

(See the proof in the appendix.) One interesting comment to be made concerns the presence of the "parameter" or auxiliary variable $V$. The presence of the auxiliary variable is quite common in information-theoretic performance analyses and in game-theoretic analyses through the notion of external correlation devices (such as those assumed to implement correlated equilibria). Indeed, $V \in \mathcal{V}$ is an auxiliary random variable or lottery which can be proved to improve the performance in general (see [14] for more details). Such a lottery may be implemented by sampling a signal which is available to all the transmitters e.g., an FM or a GPS signal.

In (22), $\rho$ and $\daleth$ are given. Thus, $W_\lambda$ has to be maximized with respect to the triplet $(P_{A_1|S_1,V}, P_{A_2|S_2,V}, P_V)$.
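The factorization of $Q$ and the weighted objective $W_\lambda$ can be evaluated directly for small alphabets. The following is a minimal sketch under assumed data structures (dictionaries keyed by states, signals and lottery values); the names `joint_distribution` and `weighted_welfare`, as well as the toy two-state example, are hypothetical and only meant to illustrate the formulas.

```python
def joint_distribution(rho, p_v, obs, p1, p2):
    """Build Q(a0, a1, a2) from the factorization of Proposition 1.

    rho[a0]            : probability of network state a0
    p_v[v]             : probability of lottery value v
    obs[a0][(s1, s2)]  : joint observation probability given a0
    p1[(s1, v)][a1]    : decision rule of Node 1 (probability of a1)
    p2[(s2, v)][a2]    : decision rule of Node 2 (probability of a2)
    """
    q = {}
    for a0, r in rho.items():
        for v, pv in p_v.items():
            for (s1, s2), po in obs[a0].items():
                for a1, q1 in p1[(s1, v)].items():
                    for a2, q2 in p2[(s2, v)].items():
                        key = (a0, a1, a2)
                        q[key] = q.get(key, 0.0) + r * pv * po * q1 * q2
    return q

def weighted_welfare(q, u1, u2, lam):
    """W_lambda = lam * E_Q[u1] + (1 - lam) * E_Q[u2]."""
    return sum(prob * (lam * u1(*key) + (1.0 - lam) * u2(*key))
               for key, prob in q.items())

# Toy example (assumed values): two equiprobable states, perfect observation,
# a degenerate lottery, and rules that transmit (action 1) only in state "g".
rho = {"g": 0.5, "b": 0.5}
p_v = {0: 1.0}
obs = {"g": {("g", "g"): 1.0}, "b": {("b", "b"): 1.0}}
rule = {("g", 0): {1: 1.0}, ("b", 0): {0: 1.0}}
q = joint_distribution(rho, p_v, obs, rule, rule)
print(q, weighted_welfare(q, lambda a0, a1, a2: a1 + a2,
                          lambda a0, a1, a2: a1 * a2, 0.5))
```

Because $Q$ is bilinear in the two decision rules once $\rho$, $\daleth$ and $P_V$ are fixed, the objective computed by `weighted_welfare` is linear in each rule separately, which is the structure the optimization discussion relies on.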
In this paper, we restrict our attention to the optimization of $(P_{A_1|S_1,V}, P_{A_2|S_2,V})$ for a fixed lottery $P_V$ and leave the general case as an extension. (Note that $\daleth_1$ and $\daleth_2$ are directly obtained from $\daleth$ by a simple marginalization operation.)

The maximization problem of the functional $W_\lambda(P_{A_1|S_1,V}, P_{A_2|S_2,V})$ with respect to $P_{A_1|S_1,V}$ and $P_{A_2|S_2,V}$ amounts to solving a bilinear program. The corresponding bilinear program can be tackled numerically by using iterative techniques such as the one proposed in [32], but global convergence is not guaranteed and therefore some optimality loss may be observed. Two other relevant numerical techniques have also been proposed in [33]. The first technique is based on a cutting plane approach while the second one consists of an enlarging polytope approach. For both techniques, convergence may also be an issue since for the first technique no convergence result is provided, and for the second technique cycles may appear [34]. To guarantee convergence and manage the computational complexity issue, we propose here another numerical iterative technique, namely, to exploit the sequential best-response dynamics (see e.g., [35] for a reference in the game theory literature, [29] for application examples in the wireless area, [36] for a specific application to power control over interference channels). Here also, some efficiency loss may be observed, but it will be shown to be relatively small for the quite large set of scenarios we have considered in the numerical performance analysis. The sequential best-response dynamics applied to the considered problem translates into the following algorithm.

Algorithm 1.
Initialization. The arguments of the functional $W_\lambda$ are fixed to an initial value: $(P_{A_1|S_1,V}, P_{A_2|S_2,V}) = (P^{(0)}_{A_1|S_1,V}, P^{(0)}_{A_2|S_2,V})$.
Iteration. At iteration $n \geq 1$, $P^{(n)}_{A_i|S_i,V}$ is updated by being chosen in the argmax of $W_\lambda(P_{A_i|S_i,V}, P^{(n-1)}_{A_{-i}|S_{-i},V})$. If there are several maximum points, choose one of them randomly according to a uniform law.
Stopping criterion. $|W_\lambda(P^{(n)}_{A_i|S_i,V}, P^{(n)}_{A_{-i}|S_{-i},V}) - W_\lambda(P^{(n-1)}_{A_i|S_i,V}, P^{(n-1)}_{A_{-i}|S_{-i},V})| < \eta$ for some $\eta \geq 0$.

Although suboptimal in general (as are the available state-of-the-art techniques), the proposed technique is of particular interest for at least three reasons. First, convergence is unconditional. It can be proved to be guaranteed e.g., by induction or by identifying the proposed procedure as an instance of the sequential best-response dynamics for an exact potential game (any game with a common utility is an exact potential game). Second, convergence points are local maximum points, but in all the simulations performed those maxima were not too far from the global maximum. Last but not least, it allows us to build a practical transmission strategy which outperforms all the state-of-the-art transmission strategies, as explained next. Note that this is necessarily the case when Algorithm 1 is initialized with the state-of-the-art transmission strategy under consideration. Although we will not tackle the classical issue of the influence of initialization on the convergence point, it is worth mentioning that many simulations have shown that the impact of the initial point on the performance at convergence is typically small, at least for the utilities under consideration. Therefore, initializing Algorithm 1 with naive strategies such as transmitting at full power, $\forall t \in \{1, \ldots, T\}$, $(a_1(t), a_2(t)) = (P_{\max}, P_{\max}, P_{\max}, P_{\max})$, is well suited.

Remark 3.
Algorithm 1 would typically be implemented offline in practice. The purpose of Algorithm 1 is to generate decision functions which are exploited by the proposed transmission strategy. To implement Algorithm 1, only statistics need to be estimated in practice (namely, the channel distribution $\rho$ and the conditional distribution $\daleth$ which defines the observation structure); estimating statistics such as the channel distribution is a classical issue in the communications literature.

The main purpose of this section is to obtain globally efficient transmission strategies. Here, global efficiency is measured in terms of social welfare, namely in terms of the sum $U_1 + U_2$. This corresponds to choosing $\lambda = \frac{1}{2}$. This choice is pragmatic and follows common practice in the literature; it implicitly means that the network nodes have the same importance. Otherwise, this parameter can always be chosen to operate at the desired point of the utility region. Indeed, social welfare is a well-known measure of efficiency and it also allows one to build other well-known efficiency measures of a distributed network such as the price of anarchy [37]. The proposed approach holds for any other feasible point of the utility region, which is characterized in the preceding section.

The transmission strategy we propose comprises three ingredients: 1) a well-chosen operating point of the utility region; 2) the use of reputation [4], [6]; 3) the use of virtual credit [5], [7].
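To make the sequential best-response procedure of Algorithm 1 concrete, the following minimal sketch runs it on a small random instance. It is only an illustration under stated assumptions, not the simulator used in the paper: the case $|\mathcal{V}| = 1$ is assumed, each signal $s_i$ is assumed to be a deterministic function of the state $a_0$, and the names `expected_w`, `best_response`, `sequential_best_response`, `rho`, `w`, and `sig` are hypothetical. Since $W_\lambda$ is linear in each argument, the sketch searches directly over decision functions (arrays mapping a signal index to an action index).

```python
import numpy as np

def expected_w(rho, w, sig, f):
    """Expected weighted utility E[w(a0, f1(s1), f2(s2))] under rho."""
    states = np.arange(len(rho))
    return float(np.sum(rho * w[states, f[0][sig[0]], f[1][sig[1]]]))

def best_response(i, rho, w, sig, f, n_actions, rng):
    """Best decision function of node i against the other node's current one."""
    states = np.arange(len(rho))
    a_other = f[1 - i][sig[1 - i]]          # other node's action in every state
    new_f = f[i].copy()
    for s in np.unique(sig[i]):
        m = sig[i] == s                     # states compatible with signal s
        scores = np.empty(n_actions)
        for a in range(n_actions):
            a1, a2 = (a, a_other[m]) if i == 0 else (a_other[m], a)
            scores[a] = np.sum(rho[m] * w[states[m], a1, a2])
        ties = np.flatnonzero(scores == scores.max())
        new_f[s] = rng.choice(ties)         # uniform tie-breaking, as in Algorithm 1
    return new_f

def sequential_best_response(rho, w, sig, n_actions, f_init,
                             eta=1e-12, max_iter=100, seed=0):
    """Alternate best responses until the objective changes by less than eta."""
    rng = np.random.default_rng(seed)
    f = [np.array(f_init[0]), np.array(f_init[1])]
    history = [expected_w(rho, w, sig, f)]
    for _ in range(max_iter):
        for i in (0, 1):
            f[i] = best_response(i, rho, w, sig, f, n_actions, rng)
        history.append(expected_w(rho, w, sig, f))
        if abs(history[-1] - history[-2]) < eta:
            break
    return f, history

# Toy instance (all values are illustrative assumptions):
# 16 global states, 3 actions per node, local signals with 4 values each.
rng = np.random.default_rng(0)
n_states, n_actions = 16, 3
rho = rng.dirichlet(np.ones(n_states))              # state distribution
w = rng.random((n_states, n_actions, n_actions))    # weighted utility w(a0, a1, a2)
states = np.arange(n_states)
sig = [states // 4, states % 4]                     # s1 and s2 as functions of a0
f_init = [np.zeros(4, dtype=int), np.zeros(4, dtype=int)]
f_star, history = sequential_best_response(rho, w, sig, n_actions, f_init)
print(history)
```

Each update can only increase the common objective, which is the reason why the values stored in `history` are non-decreasing; the point reached is a local maximum in general.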
1. The proposed operating point is obtained by applying the sequential best-response dynamics procedure described in Sec. 4 and choosing $\lambda = \frac{1}{2}$ and $|\mathcal{V}| = 1$. Each individual maximization operation provides a probability distribution which is denoted by $\pi^\star_i$. Since $W_\lambda$ is linear in $P_{A_i|S_i,V}$, the maximization operation returns a point which is one of the vertices of the unit simplex. The corresponding probability distribution thus has a particular form, namely that of a decision function $f^\star_i(s_i)$. Therefore, when operating at this point, at stage $t$, Node $i$ chooses its action to be $a_i(t) = f^\star_i(s_i(t))$. This defines for Node $i$ a particular lottery over its possible actions; this lottery is denoted by $\pi^\star_i(t)$ and belongs to the unit simplex $\Delta(\mathcal{P}^2)$ over its $L^2$ possible actions. By convention, the possible actions for Node $i$ are ordered according to a lexicographic ordering. Having $\pi^\star_i(t) = (1, 0, 0, ..., 0) \in \Delta(\mathcal{P}^2)$ means that action $(P_1, P_1)$ is used with probability 1 (wp1); having $\pi^\star_i(t) = (0, 1, 0, ..., 0) \in \Delta(\mathcal{P}^2)$ means that action $(P_1, P_2)$ is used wp1; ...; having $\pi^\star_i(t) = (0, 0, 0, ..., 0, 1) \in \Delta(\mathcal{P}^2)$ means that action $(P_L, P_L)$ is used wp1.

2. The reputation (see e.g., [4], [8], [38]) of a neighboring node is evaluated as follows. Over each game stage duration, the nodes exchange a certain number of packets which is denoted by $K$. This number is typically large. Since each node has access to the realizations of the signal $y_i$ for each packet, it can exploit them to evaluate the reputation of the other node at stage $t$. In this section, we assume a particular observation structure $\Gamma_1$, $\Gamma_2$, which is tailored to the considered problem of packet forwarding in ad hoc networks. We assume that the signal $y_i$ is binary: $y_i \in \{\mathrm{D}, \mathrm{F}\}$. Let $\epsilon \in [0, 1]$ be a parameter which represents the probability of misdetection. If Node $i$ chooses the action $a_{\min} = (P_{\min}, P_{\min})$ (resp. any other action of $\mathcal{P}_i$), then with probability $1 - \epsilon$, Node $-i$ receives the signal D (resp. F). With probability $\epsilon$, Node $-i$ perceives the action Drop D (resp. Forward F) while the action Forward F (resp. Drop D) has been chosen by Node $i$. Thus, $\Gamma_i$ takes the following form:
\[
\Gamma_i(y_i \,|\, a_0, a_1, a_2) =
\begin{cases}
1 - \epsilon & \text{if } (y_i = \mathrm{F} \text{ and } a_{-i} \in \mathcal{A}^{\mathrm{F}}_i) \text{ or } (y_i = \mathrm{D} \text{ and } a_{-i} = a_{\min}), \\
\epsilon & \text{otherwise,}
\end{cases}
\tag{14}
\]
where $\mathcal{A}^{\mathrm{F}}_i = \mathcal{P}_i \times \mathcal{P}_i \setminus \{a_{\min}\}$.
Using these notations, Node $i$ can compute the reputation of Node $-i$ as follows:
\[
R_{-i}(t) = \frac{(1 - \epsilon)\,|\{y_i = \mathrm{F}\}| + \epsilon\,|\{y_i = \mathrm{D}\}|}{K},
\tag{15}
\]
where $|\{y_i = \mathrm{F}\}|$ and $|\{y_i = \mathrm{D}\}|$ are respectively the numbers of occurrences of the signals Forward and Drop among the $K$ packets for which Node $i$ needed the assistance of Node $-i$. The reputation $R_{-i}(t)$ is one of the tools we use to implement the transmission strategy which is described further. Note that one of the interesting features of the proposed mechanism is that the reputation (15) of Node $-i$ only exploits local observations (first-hand reputation information [38]); Node $i$ does not need any information about the behavior of its neighboring nodes. This contrasts with the closest existing reputation mechanisms such as [4], [8], for which the reputation estimation procedure exploits information obtained from other nodes (second-hand information [38]). The corresponding information exchange induces additional signaling [8] and additional energy consumption. Finally, with such second-hand techniques, selfish nodes may collude and disseminate false reputation values.

3. The idea of virtual credit is assumed to be implemented with an approach similar to previous works [7], [8]: we assume that the nodes have an initial amount of credits, that a node which wants to transmit through a neighbor at a certain frequency or probability pays a cost in terms of spent credits, and that nodes are rewarded when they forward their neighbors' packets. The reward and cost assumed in this paper are defined next.

The proposed transmission strategy is as follows. While the node does not have enough credit, it adopts a cooperative decision rule, which corresponds to operating at the point we have just described. Otherwise, it adopts a signal-based tit-for-tat decision rule. Existing tit-for-tat decision rules such as GTFT [17] do not take into account the possible existence of a state for the game, and therefore the existence of a signal associated with the realization of the state. Additionally, the proposed tit-for-tat decision rule takes into account the fact that action monitoring is not perfect. In contrast with the conventional setup assumed to implement tit-for-tat or its variants, the action set of a node is not binary. Therefore we have to give a meaning to tit-for-tat in the considered setup. The proposed meaning is as follows. When Node $i$ receives the signal Drop and Node $-i$ has effectively chosen the action Drop, it means that Node $-i$ has chosen $a_{-i} = a_{\min} = (P_{\min}, P_{\min})$. When Node $i$ receives the signal Forward and Node $-i$ has effectively chosen the action Forward, it means that Node $-i$ has chosen $a_{-i} = a^\star_{-i} = f^\star_{-i}(s_{-i})$. Implementing tit-for-tat for Node $i$ means choosing $a_i = a_{\min} = (P_{\min}, P_{\min})$ (which represents the counterpart of the action Drop) if Node $-i$ is believed to have chosen the action $a_{-i} = a_{\min} = (P_{\min}, P_{\min})$. On the other hand, Node $i$ chooses the best action $a^\star_i = f^\star_i(s_i)$ when it perceives that Node $-i$ has chosen the action $a^\star_{-i} = f^\star_{-i}(s_{-i})$.
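The reputation estimate (15) and the signal-based tit-for-tat response described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `reputation` and `tit_for_tat_action`, the encoding of signals as the strings `"F"` and `"D"`, and the numerical values in the example are hypothetical and not taken from the paper.

```python
def reputation(signals, eps):
    """Reputation (15) of the other node from K observed binary signals.

    signals: sequence of "F" (Forward) / "D" (Drop), one entry per packet.
    eps: probability of misdetection, in [0, 1].
    """
    k = len(signals)
    if k == 0:
        raise ValueError("at least one observed packet is needed")
    n_forward = sum(1 for y in signals if y == "F")
    n_drop = sum(1 for y in signals if y == "D")
    if n_forward + n_drop != k:
        raise ValueError("signals must be 'F' or 'D'")
    return ((1 - eps) * n_forward + eps * n_drop) / k

def tit_for_tat_action(perceived_signal, best_action, a_min):
    """Mirror a perceived Drop with a_min and a perceived Forward with f*(s_i)."""
    return best_action if perceived_signal == "F" else a_min

# Example: K = 100 packets, 90 perceived as forwarded, misdetection eps = 0.1.
signals = ["F"] * 90 + ["D"] * 10
r = reputation(signals, eps=0.1)   # (0.9 * 90 + 0.1 * 10) / 100
# Actions are (own power, forwarding power) pairs; values are illustrative.
action = tit_for_tat_action("D", best_action=(0.5, 0.2), a_min=(0.0, 0.0))
```

Note that the estimate only uses first-hand observations, in line with the discussion above: no information from other neighbors enters the computation.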
Note that, contrary to the conventional tit-for-tat decision rule, the actions $a^\star_i$ and $a^\star_{-i}$ differ in general. Denoting by $m_i(t) \geq 0$ the credit of Node $i$ at stage $t$, the proposed strategy is expressed formally as follows. We will refer to this transmission strategy as SARA (for State Aware tRAnsmission strategy).

Proposed transmission strategy (SARA).
\[
\sigma^\star_{i,t}(s_i^t, y_i^{t-1}) =
\begin{cases}
\pi^\star_i(t) & \text{if } t = 0 \text{ or } m_i(t) < \mu, \\
\widehat{\pi}_{-i}(t-1) & \text{otherwise,}
\end{cases}
\tag{16}
\]
where:

– the virtual credit $m_i(t)$ obeys the following evolution law:
\[
m_i(t) = m_i(t-1) + \beta \langle \pi_i(t-1), e_k \rangle - \beta \nu_i(t-1),
\tag{17}
\]
$\beta \nu_i(t-1)$ represents the virtual monetary cost for Node $i$ when its packet arrival rate is $\nu_i(t-1)$, with $\beta \geq 0$; $\langle \cdot , \cdot \rangle$ stands for the scalar product; $e_k$ is the $k$th vector of the canonical basis of $\mathbb{R}^{L^2}$, namely all components equal $0$ except the $k$th component which equals $1$. The index $k$ is given by the index of action $a^\star_i(t) = f^\star_i(s_i(t))$;

– $\mu \geq 0$ is a fixed parameter which represents the cooperation level of the nodes. A sufficient condition on $\mu$ and $\beta$ to guarantee that the nodes always have enough credits is that $\mu \geq \beta$;

– the distribution $\widehat{\pi}_{-i}(t-1)$ is constructed as follows:
\[
\widehat{\pi}_{-i}(t-1) = R_{-i}(t-1)\, \pi^\star_i(t) + [1 - R_{-i}(t-1)]\, \pi_{\min},
\tag{18}
\]
with $\pi_{\min} = (1, 0, 0, \ldots, 0) \in \mathbb{R}^{L^2}$ representing the pure action $a_{\min} = (P_{\min}, P_{\min})$.

Comment 1.
The second term of the dynamical equation which defines the credit evolution corresponds to the reward a node obtains when it forwards the packets of the other node. On the other hand, the third term is the cost paid by the node for asking the assistance of the other node to forward its packets. The same weight (namely, $\beta$) is applied to both terms to incite the node to cooperate. Additionally, such a choice prevents cooperative nodes from accumulating an excessive number of credits, and thus avoids using a mechanism such as the one in [8] to regulate the number of credits. In [8], the credit excess occurs because the reward in terms of credits only depends on the node action and the cost only depends on the relaying node action. Thus, when the node is cooperative and the relaying node is selfish, there will be a reward but no cost. As for the credit system, in practice it might be implemented either by assuming the existence of an external central trusted entity [8], [13] that stores and manages the nodes' credits, or through a credit counter located in the node and maintained by a tamper-resistant security module (see e.g., [5], [7] for more details). Thus, in practice the operation of this module would not be altered, because it would be designed so that information is accessible only by specific software containing appropriate security measures.

Comment 2.
Depending on whether packet forwarding rate maximization or energy minimization is sought, it is possible to tune the triplet of parameters $(m_i(0), \beta, \mu)$ accordingly. In this respect, it can be checked that the best packet forwarding rate is obtained by choosing any triplet of the form $(2\beta, \beta, \beta)$ for any $\beta > 0$. But the power (which is measured by the quantity ANP defined through (21)) is then at its maximum. On the other hand, if the triplet of parameters takes the form $(m_i(0), [\ldots])$ with $m_i(0) \geq [\ldots]$. The possible choices of $(m_i(0), \beta, \mu)$ therefore lead to various tradeoffs in terms of transmission rate and consumed power.

Comment 3.
The proposed strategy can always be used in practice whether or not it corresponds to an equilibrium point of $G$. However, if the strategic stability property is desired, some conditions have to be added to ensure that it corresponds to an equilibrium. Indeed, effectively operating at an efficient point in the presence of self-interested and autonomous nodes is possible if the latter have no interest in changing their transmission strategy. More formally, a point which possesses the property of strategic stability, or Nash equilibrium, is defined as follows.

Definition 2. A strategy profile $(\sigma^{\mathrm{NE}}_1, \sigma^{\mathrm{NE}}_2)$ is a Nash equilibrium point for $G$ if
\[
\forall i \in \mathcal{N}, \ \forall \sigma_i \in \Sigma_i, \quad U_i(\sigma^{\mathrm{NE}}_1, \sigma^{\mathrm{NE}}_2) \geq U_i(\sigma_i, \sigma^{\mathrm{NE}}_{-i}).
\tag{19}
\]

In order to obtain an explicit condition for the proposed strategy to be an equilibrium we consider, as in the closest related works [4], a repeated game with discount. This also allows some effects such as the loss of network connectivity to be captured. Remarkably, for the repeated game model with discount, the subgame perfection property is also available. This is useful in practice since it offers some robustness in terms of node behavior. Indeed, this property makes the equilibrium strategy robust against changes in node behavior which might occur during the transmission process; even if some deviations from equilibrium occurred in the past, players have an interest in coming back to equilibrium. A necessary and sufficient condition for a strategy profile to be a subgame perfect equilibrium is given by the following result.

Proposition 2
Assume that ∀ t ≥ , θ t = (1 − δ ) δ t , < δ < . The strategyprofile ( σ (cid:63) , σ (cid:63) ) defined by (16) is a subgame perfect equilibrium of G if andonly if: δ ≥ max (cid:26) c i (1 − (cid:15) ) r i , c − i (1 − (cid:15) ) r − i , (cid:27) , (20) where: c i = (cid:88) a ρ ( a )( u i − u ki ) , and r i = (cid:88) a ρ ( a )( u ki − u i ) . u i = u i ( a , a (cid:63) , a min ) , u ki = u i ( a , a (cid:63) , a (cid:63) ) , u i = u i ( a , a min , a (cid:63) ) , and a (cid:63)i = f (cid:63)i ( s i ) . (See the proof in the appendix) Comment 4.
The proposed transmission strategy is compatible with a packet delivery mechanism such as an ACK/NACK mechanism. Indeed, in the definition of the transmission strategies (6), the observed signal $y_i$ may correspond to a binary feedback such as an ACK/NACK feedback: $y_i$ corresponds to an image of $(a_0, a_1, a_2)$, and this image might be a binary version of the receive SNR or SINR (e.g., if the receive SNR is greater than a threshold, then the packet is well received and the corresponding feedback signal $y_i$ is ACK). More generally, a binary feedback of the form Forward/Drop is completely compatible with the presence of ACK/NACK feedback-type mechanisms. Simply, the signal Drop may combine the effects of a selfish behavior and bad channel conditions.

Footnote: A subgame of the repeated game is a game that starts at a stage $t$ with a given history.

Comment 5.
This section shows that the proposed transmission strategy has five salient features.

1. First of all, in contrast with the related works on the forwarder's dilemma, it is able to deal with the problem of time-varying link qualities.
2. Second, it not only deals with the adaptation of the cooperation power $p'_i$ of Node $i$ (which is the power to forward the packets of the other node) but also with the power $p_i$ used to transmit its own packets.
3. Third, the proposed strategy is built to exploit the available arbitrary knowledge about the global channel state ($a_0$) as well as possible. The key to this is to exploit the provided utility region characterization. Ideally, the nodes should operate on the Pareto frontier. This is possible if a suited optimal algorithm is used.
4. Fourth, the proposed transmission strategy is shown to possess the strategic stability property in games with discount under an explicit sufficient condition on the discount factor. Note that, here again, each node has only imperfect monitoring of the actions chosen by the other node. Additionally, the equilibrium strategy is subgame perfect.
5. Fifth, the proposed strategy does not induce any problem of credit outage or excess. Credit outage is avoided only if the conditions $\mu \geq \beta$ and (20) are satisfied. Therefore, if there is no credit outage problem, there is no need for assisting distant nodes as required in [8].

All simulations provided in this section have been obtained with an ad hoc simulator developed under Matlab. The simulation setup we consider in this paper is very close to those assumed in the closest works, [4], [7], [8] in particular. The setup we assume by default is provided in Sec. 6.1. When other values are considered for some parameters, this is explicitly mentioned. In addition to the simulation setup subsection, the simulation section comprises three subsections. The first subsection (Sec. 6.2) conducts a performance analysis in terms of the utility function (1), which captures the tradeoff between the transmission rate and the energy spent for transmitting. Sec. 6.3 focuses on the transmission rate aspect while Sec. 6.4 is dedicated to a performance analysis in terms of consumed network energy.

6.1 Simulation setup assumed by default

We consider a network of $N$ nodes. When $N$ is fixed, it is taken to be equal to 50. The $N$ nodes are randomly placed (according to a uniform probability distribution) over an area of 1000 m × [...]; only network topology draws which guarantee that every node has a neighbor (in the sense of its radio range) are kept. The assumed topology is thus a random topology, since the node locations are drawn from a given spatial distribution law (which is uniform for the simulations). Each node only considers the behavior of its neighbors to choose its own behavior. As assumed in the related literature, if a node has several neighbors, it is assumed to play a given game with each of its neighbors. Averaging the results over the network topology realizations has the advantage of making the conclusions less topology-dependent. The provided simulations are averaged over 1200 draws of the network topology. Routes are supposed to be fixed and known. Indeed, the proposed transmission strategy is compatible with any routing algorithm. One node can communicate with another node only if the inter-node distance is less than the radio range, which is taken to be 150 m.
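The topology draw described above (uniform placement, radio range of 150 m, rejection of draws in which some node has no neighbor) can be sketched as follows. This is a hedged illustration and not the Matlab simulator used in the paper: the function name `draw_topology` is hypothetical, and the deployment area is assumed here to be a square whose side `side_m` is a free parameter set to 1000 m.

```python
import numpy as np

def draw_topology(n_nodes=50, side_m=1000.0, radio_range_m=150.0,
                  rng=None, max_tries=10000):
    """Draw node positions uniformly and keep the first draw in which
    every node has at least one neighbor within the radio range."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        pos = rng.uniform(0.0, side_m, size=(n_nodes, 2))
        # Pairwise Euclidean distances between all nodes.
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        # Two distinct nodes are neighbors if closer than the radio range.
        adj = (dist < radio_range_m) & ~np.eye(n_nodes, dtype=bool)
        if adj.any(axis=1).all():
            return pos, adj
    raise RuntimeError("no topology with a neighbor for every node was found")

pos, adj = draw_topology(rng=np.random.default_rng(1))
print(adj.sum(axis=1))  # number of neighbors of each node
```

Averaging a performance metric over many such draws (1200 in the setup above) is what makes the reported results less dependent on a particular topology.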
When a node has several neighbors, it may be involved in several routing paths; it is then assumed to play several independent forwarding games in parallel, and to have a given initial credit $m_i(0)$ for each neighbor. The credits are updated separately based on the corresponding forwarding game. This means that the credits a node receives by cooperating with one of its neighbors can only be used for forwarding via that neighbor. As a node without neighbors does not need credits and the nodes do not obtain an initial credit, the problem of credit excess is avoided. By default, 50% of the nodes are assumed to be selfish but the network does not comprise any malicious node. The initial packet forwarding rates for cooperative nodes and selfish nodes are respectively set to 1 and 0.1. Each source node transmits at a constant bit rate of 2 packets/s. For each draw of the network topology, the simulation is run for 1000 s. This period of time is made of 20 frames of 50 s. A frame corresponds to a game stage and to a given draw of the channel vector $h$. The fact that channel gains are assumed to fluctuate over time is a way of accounting for mobility; in the simulations, they are assumed to evolve according to a (discrete version of the) Rayleigh fading law. Averaging over network topologies allows one to average the results over the path losses. Each channel gain is thus drawn according to an exponential law, which corresponds to a Rayleigh law for the amplitude; if one denotes by $h_i$ the considered channel gain, we have that $h_i \sim \frac{1}{\bar{h}_i} e^{-h_i / \bar{h}_i}$, where $\bar{h}_i = \mathbb{E}(h_i)$ represents the path loss effects. As mentioned above, the channel gain is discrete and the discrete realizations are obtained by quantizing the realizations given by a Rayleigh distribution. The effect of quantization on the performance is typically small. Simulations show that the loss induced by implementing Algorithm 1 with quantized channel gains, while the actual channel gains are continuous, is a few percent for the sizes of channel gain sets used in the simulations. If $d$ denotes the inter-node distance of the considered pair of nodes, then the path loss is assumed to depend on the distance according to $\bar{h}_i = \frac{\mathrm{const}}{d^2 + \kappa^2}$; $\kappa > 0$ is a parameter which avoids divergence at $d = 0$. In practice, $\kappa$ may typically represent the antenna height. During each frame, 100 packets are exchanged. Tab. 1 recaps the values chosen for the main network parameters.

Table 1 Simulation settings

Network parameter                        Value
Space                                    1000 m × [...]
Number of nodes                          N = 50
Radio range                              150 m
Const                                    10 [...]
κ                                        [...]
Initial credit (SARA)                    m_i(0) = 35
Initial credit of ICARUS [8]             m_i(0) = 220
Parameter cost                           β = 10
Cooperation degree                       µ = 20
Probability of misdetection              ε = 0
Packet arrival rate                      ν = 1
Simulation time                          1000 s
Frame/stage duration                     50 s
Generosity parameter of GTFT             0.[...]
Parameter [...]_th of ICARUS [8]         0.[...]
Parameter [...] a of ICARUS [8]          0.[...]
Parameter [...] b of ICARUS [8]          2.[...]

Concerning the game parameters, the following choices are made by default. The parameter $\alpha$ is set to $10^{-[\ldots]}$. The receive variance $\sigma^2$ is always set to 0.1. The sets of possible power levels are defined by: $\forall i \in \{1, 2\}$, $\mathcal{P}_i = \mathcal{P}'_i$, $L = 10$, $P_{\min} = 0$, $P_{\max} = 10$ W. The power increment is uniform over a dB scale, starting from the minimal positive power, which is taken to be equal to 10 mW. The sets of possible channel gains are defined by: $\forall i \in \{1, 2\}$, $\mathcal{H}_i = \mathcal{H}'_i$, $H = 10$, $h_{\min} = 0.[\ldots]$, $h_{\max} = 10$, and the channel gain increment equals [...]. The different means of the channel gains are given by: $(\bar{h}_i, \bar{h}'_i, \bar{h}_{-i}, \bar{h}'_{-i}) = (1, [\ldots])$. [...] $\varphi(x) = e^{-c/x}$ with $c = 2^r - 1$, $r$ being the spectral efficiency in bit/s per Hz [26]. Unless stated otherwise, the simulations use $r = 1$ bit/s per Hz; one simulation assumes a higher spectral efficiency, namely $r = 3$ bit/s per Hz.

6.2 Utility analysis

Here, to be able to easily represent the utility region for the considered problem and to compare our approach with previous models (with the well-known forwarder's dilemma model [2] in particular), we consider two neighboring nodes.

The first question we want to answer is to what extent the ability of a node to properly adapt to the link qualities, which affect the weighted utility $w_\lambda = \lambda u_1 + (1 - \lambda) u_2$, is related to its knowledge about these qualities, i.e., about the global channel state $a_0 = (h_1, h'_1, h_2, h'_2)$. To this end, we have represented in Fig. 2 the achievable utility region under various information assumptions. The top curve in solid line represents the Pareto frontier which is obtained when implementing the transmission strategy given by Algorithm 1 when $\forall i \in \mathcal{N}$, $s_i = a_0 = (h_1, h'_1, h_2, h'_2)$, that is, when each node has global CSI. The crosses correspond to the performance of the centralized transmission strategy, namely the best performance possible.
It is seen that for a typical scenario the proposed algorithm does not involve any optimality loss. The curve in dashed line is obtained with Algorithm 1 under local CSI, i.e., $s_i = (h_i, h'_i)$. Interestingly, the loss incurred when moving from global CSI to local CSI is relatively small. This shows that it is possible to implement a distributed transmission strategy without sacrificing too much global performance. This result is not obvious since the weighted utility $w_\lambda$ depends on the whole vector $a_0$. When no CSI is available (i.e., $s_i$ = constant), the incurred loss is more significant. Indeed, the curve with diamonds (which is obtained by choosing for each $\lambda \in [0, 1]$ the best action profile in terms of the expected weighted utility (11)) shows that the gain in terms of sum-utility or social welfare when moving from no CSI to global CSI is about 10%. The point marked by a star indicates the operating point for which transmitting at full power $a_i = (P_{\max}, P_{\max}, P_{\max}, P_{\max})$ is optimal under no CSI.

As a second step, we compare the performance of SARA with that of ICARUS [8] and GTFT [17], which do not take into account the channel fluctuations. The three corresponding equilibrium points are particular points of the achievable or feasible utility region represented in Fig. 3. The outer curve is the achieved utility region of Fig. 2 when Algorithm 1 is implemented under local CSI (it is the same as the dashed line curve of Fig. 2). The social optimum corresponds to the point indicated by the small disk. The point marked by a square corresponds to the performance of SARA, whereas the points marked by a star and a diamond respectively represent the equilibrium points obtained when using ICARUS [8] and GTFT [17]. Note that the strategies ICARUS and GTFT have been designed in such a way that they are able to adapt the packet forwarding rates but not the transmit power level. As a consequence they cannot exploit any available knowledge in terms of CSI, which induces a quite significant performance loss; it is assumed that GTFT and ICARUS use a pair of actions $(a_1, a_2)$ which maximizes the expected sum-utility (see the footnote below). The gain obtained by the proposed transmission strategy comes not only from the fact that the transmit power level can adapt to the wireless link quality fluctuations, but also from the proposed cooperation mechanism. The latter exploits both the idea of virtual credit and that of reputation, which allows one to obtain a better packet forwarding rate than ICARUS and GTFT. We elaborate more on this aspect in the next subsection. Finally, when implementing a transmission strategy built from the one-shot game model given in Sec. 2, the NE of [2] would be obtained, i.e., the operating point would be $(0, 0)$.

Footnote: We therefore assume that the corresponding statistical knowledge is available and exploited.

[Figure 2: axes $\mathbb{E}(u_1)$ and $\mathbb{E}(u_2)$; curves: centralized transmission strategy with global CSI; SARA with global CSI and $\pi_i = 1$, $i \in \{1, 2\}$; SARA with local CSI and $\pi_i = 1$; SARA without CSI and $\pi_i = 1$.]

Fig. 2 Achievable utility region under various scenarios of information for the nodes: global CSI, local CSI, and no CSI.

[Figure 3: axes $\mathbb{E}(u_1)$ and $\mathbb{E}(u_2)$; curves and points: SARA with local CSI and $\pi_i = 1$, $i \in \{1, 2\}$; social optimum of SARA with local CSI; SARA at equilibrium; ICARUS at equilibrium; GTFT at equilibrium; one-shot game equilibrium.]

Fig. 3 Achievable utility region with local CSI and the repeated game equilibrium for each strategy: SARA, ICARUS, and GTFT. The one-shot game Nash equilibrium is also represented. The strategies ICARUS and GTFT do not take into account the channel fluctuations.
Fig. 4 depicts the evolution of the packet forwarding rate for SARA, ICARUS, and GTFT for a network of 50 nodes. We look at the influence of the fraction of selfish nodes. SARA is very robust to selfishness: whatever the fraction of selfish nodes, SARA provides a high performance in terms of packet forwarding rate. We see that ICARUS is less efficient than SARA in terms of stimulating cooperation in the presence of selfish nodes, which shows that the proposed punishment mechanism is effectively relevant. The performance of the GTFT strategy decreases significantly with the number of selfish nodes. For the latter transmission strategy, it is seen that, when the network is purely selfish, the operating packet forwarding rate is about 50%; this shows the significant loss induced by using a cooperation scheme which is not very robust to selfishness.

The robustness to observation errors is assessed next. More precisely, we want to evaluate the impact of not observing the action Forward or Drop perfectly on the packet forwarding rate.
Fig. 5 depicts the packet forwarding rate as a function of the probability of misdetection $\epsilon$ (see (15)). When $\epsilon >$ [...] $\mu$, nodes keep on cooperating. Estimating the forwarding rate does not intervene in the decision process of a node. Note that we have only considered $\epsilon \leq 50\%$ since, for $\epsilon > 50\%$, it is always possible, by symmetry, to decrease the effective probability of misdetection to $\epsilon' = 1 - \epsilon$. For this, it suffices to declare the used action to be Forward when the action Drop was observed, and vice versa.

[Figure 4: vertical axis: packet forwarding rate; curves: SARA, ICARUS, GTFT.]

Fig. 4 Packet forwarding rate for different values of the proportion of selfish nodes. The results are averaged over 1200 executions, using the simulation setup assumed by default.

[Figure 5: vertical axis: packet forwarding rate; curves: SARA, ICARUS, GTFT.]

Fig. 5 Packet forwarding rate for different values of the probability of misdetection $\epsilon$. The results are averaged over 1200 executions, using the simulation setup assumed by default.

6.4 Consumed network energy analysis

Based on the preceding two sections, we know that SARA provides improvements in terms of utility and packet forwarding rate. But the most significant improvements are in fact obtained in terms of consumed energy, which is also observed in [39], where time-varying channels are exploited to find the optimal instants to communicate. Indeed, ICARUS and GTFT have not been designed to account for link quality fluctuations, whereas SARA adapts both the packet forwarding rate and the transmit power level. The results of this subsection are obtained with the parameters assumed by default, except for the path loss $\bar{h}_i = \frac{\mathrm{const}}{d^2 + \kappa^2}$, where const = 10 and $\kappa = 5$. In the current formulation of ICARUS and GTFT, the transmit power is fixed (as in Sec. 6.2) according to the best pair of actions in terms of expected sum-utility. In this subsection, the advantage of adapting the power to the quality of the wireless link is clearly observed. Since the consumed network energy is proportional to the network sum-power averaged over time, we will work with the average network power (ANP). Here we consider the total power which is effectively consumed by the node and not the radiated powers $p_i$ and $p'_i$ (the consumed power therefore includes the circuit power in particular). As explained e.g., in [40], [41], [42], a reasonable and simple model for relating the radiated power and the consumed power is the affine model: $p_{i,\mathrm{total}} = a(p_i + p'_i) + b$. The parameter $b$ is very important since it corresponds to the power consumed by the node when no packet is transmitted; in [41], [42] it represents the circuit power whereas in [40] it represents the node computation power. Here we assume as in [42] that $b$ is comparable to $P_{\max}$ and choose the same typical values as in [42], namely $b = P_{\max} = 1$ W. Finally, the ANP is obtained by averaging the quantity $\sum_{i=1}^{N} \{ a [p_i(t) + p'_i(t)] \pi_i(t) + b \}$ over all channel and network topology realizations, where $N$ is the number of nodes in the network and $\pi_i(t)$ the forwarding probability for stage or frame $t$:
\[
\mathrm{ANP} = \frac{1}{T'} \sum_{t=1}^{T'} \sum_{i=1}^{N} \left\{ a \left[ p_i(t) + p'_i(t) \right] \pi_i(t) + b \right\},
\tag{21}
\]
where $T'$ corresponds to the number of realizations used for averaging. Here, this quantity is averaged over 1200 × 20 stages, the number of network realizations being 1200 and the number of channel realizations being 20.
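As an illustration of how the ANP of (21) can be evaluated from simulation traces, the following sketch averages the per-realization network power and converts it to dBm. The function names `average_network_power` and `watts_to_dbm`, the array layout (one row per realization, one column per node), and the value `a = 1.0` are assumptions made for this example; only $b = 1$ W is taken from the text.

```python
import numpy as np

def average_network_power(p, p_fwd, pi, a=1.0, b=1.0):
    """ANP as in (21), in watts.

    p, p_fwd, pi: arrays of shape (T', N) holding, for each realization and
    node, the own-packet power p_i, the forwarding power p'_i, and the
    forwarding probability pi_i. a and b define the affine power model.
    """
    p, p_fwd, pi = (np.asarray(x, dtype=float) for x in (p, p_fwd, pi))
    per_realization = np.sum(a * (p + p_fwd) * pi + b, axis=1)
    return float(per_realization.mean())

def watts_to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * np.log10(power_w * 1000.0)

# Toy example: T' = 3 realizations, N = 4 nodes, all nodes always forwarding.
p = np.full((3, 4), 0.5)
p_fwd = np.full((3, 4), 0.5)
pi = np.ones((3, 4))
anp_w = average_network_power(p, p_fwd, pi)   # 4 * (1.0 * 1.0 + 1.0) watts
anp_dbm = watts_to_dbm(anp_w)
```

Because of the constant term $b$, the ANP never falls below $N b$, even when no packet is transmitted; this is why gains on the radiated power translate into smaller relative gains on the ANP.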
Fig. 6 shows how the ANP in dBm scales with the number of nodes for SARA, ICARUS, and GTFT. It is seen that the ANP, and therefore the total energy consumed by the network, can be divided by more than 2 (the gain is about 4 dB to be more precise), showing the importance of addressing the problem of packet forwarding and power control jointly.

[Figure 6: vertical axis: ANP (dBm); curves: ICARUS, GTFT, SARA.]

Fig. 6 The figure depicts the average total power (in dBm), or equivalently the total energy consumed by the network, against the total number of nodes in the network. It is seen that SARA allows the energy consumed by the network to be divided by more than 2 when compared to the state-of-the-art strategies ICARUS and GTFT, which do not take into account the channel fluctuations.

[...] case; the continuous case follows by using standard information-theoretic arguments. But, from a practical point of view, it matters to assess the loss induced by using an algorithm which exploits quantized channel gains instead of continuous ones. Fig. XXX represents the performance in terms of XXX as a function of XXX. It is seen that the performance loss obtained by using discrete channel gains in the proposed algorithm, whereas the actual channel gains are continuous, is typically small. Here the parameters have been chosen as follows: XXX....
One of the contributions of this work is to generalize the famous and insightful model of the forwarder's dilemma [2] by accounting for channel gain fluctuations. Therefore, the problem of knowledge about the global channel state appears, in addition to the problem of imperfect action monitoring when the interaction is repeated. In this paper, we have seen that it is possible to characterize the best performance of the studied system even in the presence of partial information; the corresponding observation structure is arbitrary, provided that the observations are generated by discrete observation structures, denoted by $\daleth$ and $\Gamma$. In terms of performance, designing power control policies which exploit the available knowledge as well as possible is shown to lead to significant gains. Since selfish nodes are present, we propose a mechanism to stimulate cooperation among nodes. The proposed mechanism is both reputation-based and credit-based. For the reputation aspect, one of the novel features of the proposed strategy is that it generalizes the concept of tit-for-tat to a context where actions are not necessarily binary. For the credit aspect, we propose an evolution law for the credit which is shown to be efficient and robust to selfishness and especially to imperfect action monitoring.

On the quantitative side, the proposed transmission strategy (referred to as SARA) Pareto-dominates ICARUS and GTFT in terms of utility, packet forwarding rate, and energy consumed by the network. Significant gains have been observed; one very convincing result is that the energy consumed by the network can be divided by 2 when the packet forwarding problem and the power control problem are addressed jointly.

This paper provides the characterization of the best performance in terms of transmission strategy under partial information.
Although all performed simulations show that the optimality loss appears to be small, there is no guarantee that the proposed algorithm provides an optimal solution of the optimization problem to be solved to operate on the Pareto frontier of the utility region. Providing such a guarantee would constitute a valuable extension of the present work. Another significant extension would be to relax the i.i.d. assumption on the network state. In this work, the network state corresponds to the global channel state and the i.i.d. assumption is known to be reasonable but, in other setups where the state represents e.g., a queue length, a buffer size, or a battery level, the framework would need to be extended since Markov decision processes would be involved.

Appendix A: Proof of Proposition 1
First, it has to be noticed that long-term utilities are linear images of the implementable distribution. Therefore, characterizing the achievable utility region amounts to characterizing the set of implementable distributions. Note that the characterization of the set of implementable distributions does not depend on the assumed choice for the infinite sequence of weights $(\theta_t)_t$, making the result valid for both considered models of repeated games (namely, the classical infinitely repeated game and the model with discount).

Second, to obtain the set of implementable distributions we exploit the implementability theorem derived in [14]. Therein, it is proved that under the main assumptions of the present paper (namely, the network state is i.i.d. and the observation structure is memoryless) a joint distribution is implementable if and only if it factorizes as in (22). That is, a joint probability distribution or correlation $Q(a_0, a_1, a_2)$ is implementable if and only if it factorizes as:

$$Q(a_0, a_1, a_2) = \sum_{v, s_1, s_2} \rho(a_0) \, P_V(v) \, \daleth(s_1, s_2 | a_0) \, P_{A_1|S_1,V}(a_1 | s_1, v) \, P_{A_2|S_2,V}(a_2 | s_2, v). \tag{22}$$

Third, a key observation to be made now is that if two probability distributions $Q_1$ and $Q_2$ are implementable, then the convex combination $\mu Q_1 + (1-\mu) Q_2$ is implementable. Indeed, if there is a transmission strategy to implement $Q_1$ and another to implement $Q_2$, then by using the first one $\frac{T_1}{T}$ of the time and the second one $\frac{T - T_1}{T}$ of the time, and making $T$ large such that $\frac{T_1}{T} \to \mu$, $\mu Q_1 + (1-\mu) Q_2$ becomes implementable. It follows that the long-term utility region is convex. Therefore, the Pareto frontier of the utility region, which characterizes the utility region, can be obtained by maximizing the weighted utility $W_\lambda$. This concludes the proof.

Appendix B: Proof of Proposition 2
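Before going through the proof, the equilibrium condition established in this appendix, namely that the discount factor $\delta$ must be at least $c/((1-\epsilon) r)$ for each of the two nodes, can be evaluated numerically. The sketch below is only illustrative: the function names and the numerical values are hypothetical, the quantities $c$ and $r$ are taken as given inputs rather than computed from the utilities, $r$ is assumed positive, and the lower clamp at 0 is an assumption reflecting that $\delta$ is a discount factor.

```python
def min_discount_factor(c_i, r_i, c_other, r_other, eps):
    """Smallest discount factor satisfying delta >= c / ((1 - eps) * r)
    for both nodes, clamped at 0 (clamp is an assumption of this sketch)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    if r_i <= 0 or r_other <= 0:
        raise ValueError("r must be positive for the ratio to be meaningful")
    bound_i = c_i / ((1.0 - eps) * r_i)
    bound_other = c_other / ((1.0 - eps) * r_other)
    return max(bound_i, bound_other, 0.0)


def condition_holds(delta, c_i, r_i, c_other, r_other, eps):
    """Check whether a given discount factor meets the condition."""
    return delta >= min_discount_factor(c_i, r_i, c_other, r_other, eps)


# Hypothetical values: deviation gains c, punishment losses r, and epsilon.
threshold = min_discount_factor(0.2, 0.5, 0.3, 0.6, 0.1)
```

With these hypothetical inputs the binding constraint comes from the second node, and any discount factor above the returned threshold (and below 1) satisfies the condition for both nodes.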
Proof
We want to prove the following result. The strategy profile $(\sigma_i^\star, \sigma_{-i}^\star)$ is a subgame perfect equilibrium of $\bar{G}$ if and only if:

$$\delta \ge \max\left\{ \frac{c_i}{(1-\epsilon) r_i}, \frac{c_{-i}}{(1-\epsilon) r_{-i}}, 0 \right\}, \tag{23}$$

where $c_i = \sum_{a_0} \rho(a_0)(u_i'' - u_i^k)$ and $r_i = \sum_{a_0} \rho(a_0)(u_i^k - u_i')$, with $u_i' = u_i(a_0, a^\star, a^{\min})$, $u_i^k = u_i(a_0, a^\star, a^\star)$, $u_i'' = u_i(a_0, a^{\min}, a^\star)$, and $a_i^\star = f_i^\star(s_i)$.

As a preliminary, we first review the one-shot deviation "principle" in the context of interest. This "principle" is one of the elements used to prove the desired result.

One-shot deviation principle:
For Node $i$, a one-shot deviation from strategy $\sigma_i$ is a strategy $\tilde{\sigma}_i$ such that:

$$\exists!\, \tau, \ \forall t \neq \tau, \quad \sigma_{i,t}(s_i^t, y_i^{t-1}) = \tilde{\sigma}_{i,t}(s_i^t, y_i^{t-1}). \tag{24}$$

The two strategies $\tilde{\sigma}_i$ and $\sigma_i$ therefore produce identical actions except at stage $\tau$.

Definition 3
For Node $i$, the one-shot deviation $\tilde{\sigma}_i$ from strategy $\sigma_i$ is not profitable if:

$$U_i(\sigma_i, \sigma_{-i}) \ge U_i(\tilde{\sigma}_i, \sigma_{-i}), \tag{25}$$

with $\tilde{\sigma}_i \neq \sigma_i$.

Let us exploit the one-shot deviation principle to prove the result, since it is well known that a strategy profile $\sigma$ is a subgame perfect equilibrium if and only if there are no profitable one-shot deviations. Assume that, for a given game history, the distributions used by Nodes $i$ and $-i$ are respectively $\pi_i$ and $\pi_{-i}$. Following the proposed strategy $\sigma_{i,t}^\star$ defined by (15), and by using (14) and (17), one can obtain the distribution of Node $i$, $\pi_i(t)$, from $\pi_{-i}$ for each stage $t$. As defined by relation (17), at each stage $t$, if $m_i(t) \ge \mu$, Node $i$ chooses a distribution $\pi_i(t) = \hat{\pi}_{-i}(t-1)$ stipulating that $a^{\min} = (P^{\min}, P^{\min})$ and $a_i^\star(t) = f_i^\star(s_i(t))$ are the only actions that can be chosen with a positive probability. Thus, it is sufficient to provide only the $k$-th component of $\pi_i(t)$, which is denoted by $\pi_i^k(t)$. Note that $k$ is the index of action $a_i^\star(t) = f_i^\star(s_i(t))$. Thus, for $t \ge 1$:

$$\pi_i^k(t) = \begin{cases} 0, & \text{if } m_i(t) < \mu; \\ (1-\epsilon)^t \pi_{-i} + \epsilon \sum_{k=0}^{t-1} (1-\epsilon)^k, & \text{if } \mathrm{mod}(t,2) = 1; \\ (1-\epsilon)^t \pi_i + \epsilon \sum_{k=0}^{t-1} (1-\epsilon)^k, & \text{if } \mathrm{mod}(t,2) = 0. \end{cases}$$

Now, we define a one-shot deviation. We consider that Node $i$ deviates unilaterally at one stage from the proposed strategy $\sigma_{i,t}^\star$, by choosing $\tilde{\sigma}_{i,t}$. If Node $i$ deviates, it does so in order to save energy; thus it chooses $a^{\min}$ with a higher probability than the one provided by the proposed strategy $\sigma_{i,t}^\star$. Therefore, we consider that $\tilde{\sigma}_{i,t}$ defines a distribution over the action set as follows:

$$\tilde{\pi}_i(t) = \pi_i(t) - d \, (\underbrace{-1}_{a^{\min}}, 0, \ldots, \underbrace{1}_{a^\star}, 0, \ldots), \tag{26}$$

where $d \in [0, 1]$, and $\pi_i(t)$ is the distribution given by $\sigma_{i,t}^\star$, whose $k$-th component is defined above. Under this one-shot deviation, we have for $t \ge 1$:

$$\tilde{\pi}_i^k(t) = \begin{cases} 0, & \text{if } m_i(t) < \mu; \\ \pi_i^k(t), & \text{if } \mathrm{mod}(t,2) = 0; \\ \pi_i^k(t) - d(1-\epsilon)^{t-1}, & \text{if } \mathrm{mod}(t,2) = 1. \end{cases}$$

$$\tilde{\pi}_{-i}^k(t) = \begin{cases} 0, & \text{if } m_i(t) < \mu; \\ \pi_{-i}^k(t) - d(1-\epsilon)^{t-1}, & \text{if } \mathrm{mod}(t,2) = 0; \\ \pi_{-i}^k(t), & \text{if } \mathrm{mod}(t,2) = 1. \end{cases}$$

Now, to complete the proof we need to define the expected utilities associated with $\pi_i(t)$ and $\tilde{\pi}_i(t)$ at each stage, which are denoted by $u_{i,t}^\star$ and $\tilde{u}_{i,t}$, respectively:

$$u_{i,t}^\star = \sum_{a_0, a_1, a_2} P_t(a_0, a_1, a_2) \, u_i(a_0, a_1, a_2), \tag{27}$$

where $P_t$ is the joint probability distribution and $u_i$ is the instantaneous utility (1). Denote by $u_i^k = u_i(a_0, a^\star, a^\star)$, $u_i' = u_i(a_0, a^\star, a^{\min})$, $u_i'' = u_i(a_0, a^{\min}, a^\star)$, and $u_i^{\min} = u_i(a_0, a^{\min}, a^{\min})$, with $a_i^\star = f_i^\star(s_i)$ and $a^{\min} = (P^{\min}, P^{\min})$. By means of these notations, we obtain:

$$u_{i,t}^\star = \sum_{a_0} \rho(a_0) \left[ \pi_i^k(t) \pi_{-i}^k(t) u_i^k + \pi_i^k(t)(1 - \pi_{-i}^k(t)) u_i' + (1 - \pi_i^k(t)) \pi_{-i}^k(t) u_i'' + (1 - \pi_i^k(t))(1 - \pi_{-i}^k(t)) u_i^{\min} \right]. \tag{28}$$

We now define the expected utility of the deviation for each stage, denoted by $\tilde{u}_{i,t}$:

$$\tilde{u}_{i,t} = \sum_{a_0} \rho(a_0) \left[ \tilde{\pi}_i^k(t) \tilde{\pi}_{-i}^k(t) u_i^k + \tilde{\pi}_i^k(t)(1 - \tilde{\pi}_{-i}^k(t)) u_i' + (1 - \tilde{\pi}_i^k(t)) \tilde{\pi}_{-i}^k(t) u_i'' + (1 - \tilde{\pi}_i^k(t))(1 - \tilde{\pi}_{-i}^k(t)) u_i^{\min} \right]. \tag{29}$$

As the deviation distribution $\tilde{\pi}_i$ depends on the distribution $\pi_i$ provided by the proposed strategy $\sigma_i^\star$, one can also express $\tilde{u}_{i,t}$ as a function of $u_{i,t}^\star$, by using the definitions of $\pi_i(t)$ and $\tilde{\pi}_i(t)$. Hence, we have the following result:

$$\tilde{u}_{i,t} = \begin{cases} u_{i,t}^\star, & \text{if } m_i(t) < \mu; \\ u_{i,t}^\star - (1-\epsilon)^{t-1} d \sum_{a_0} \rho(a_0) \breve{U}_1(t), & \text{if } \mathrm{mod}(t,2) = 0; \\ u_{i,t}^\star - (1-\epsilon)^{t-1} d \sum_{a_0} \rho(a_0) \breve{U}_2(t), & \text{if } \mathrm{mod}(t,2) = 1, \end{cases}$$

where $\breve{U}_1(t) = \pi_i^k(t)(u_i^k - u_i' + u_i^{\min} - u_i'') + u_i'' - u_i^{\min}$ and $\breve{U}_2(t) = \pi_{-i}^k(t)(u_i^k - u_i' + u_i^{\min} - u_i'') + u_i' - u_i^{\min}$. Thus, the deviation utility of Node $i$ in the repeated game $\bar{G}$ is:

$$U_i(\tilde{\sigma}_i, \sigma_{-i}^\star) = u_{i,0}^\star + \sum_{t=1}^{z} \delta^t \tilde{u}_{i,t} + \sum_{t=z+1}^{\infty} \delta^t u_{i,t}^\star,$$

where $z$ is the number of stages until the condition $m_i < \mu$ is satisfied. The unilateral deviation from the proposed strategy $\sigma_i^\star$ is not profitable if:

$$U_i(\tilde{\sigma}_i, \sigma_{-i}^\star) \le U_i(\sigma_i^\star, \sigma_{-i}^\star). \tag{30}$$

The equilibrium condition can be determined using relation (30). It reads:

$$u_{i,0}^\star + \sum_{t=1}^{z} \delta^t \tilde{u}_{i,t} + \sum_{t=z+1}^{\infty} \delta^t u_{i,t}^\star \le \sum_{t=0}^{\infty} \delta^t u_{i,t}^\star.$$

By substituting $\tilde{u}_{i,t}$ by its value, separating odd and even stages, and dividing by $\delta$, the equilibrium condition writes as:

$$\sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t} d \sum_{a_0} \rho(a_0) \breve{U}_2(2t+1) + \delta \sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t+1} d \sum_{a_0} \rho(a_0) \breve{U}_1(2t+2) \ge 0. \tag{31}$$

We have $\breve{U}_2(2t+1) = \pi_{-i}^k(2t+1)(u_i^k - u_i' + u_i^{\min} - u_i'') + u_i' - u_i^{\min}$ and $\breve{U}_1(2t+2) = \pi_i^k(2t+2)(u_i^k - u_i' + u_i^{\min} - u_i'') + u_i'' - u_i^{\min}$. We provide results for $\pi_{-i}^k(2t+1) = \pi_i^k(2t+2) = 1$, which implies that relation (31) is satisfied for each $\pi_{-i}^k(2t+1)$ and $\pi_i^k(2t+2)$. With this assumption, relation (31) becomes:

$$\sum_{a_0} \rho(a_0)(u_i^k - u_i'') \sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t} + \sum_{a_0} \rho(a_0)(u_i^k - u_i') \, \delta \sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t+1} \ge 0.$$

This is satisfied if and only if:

$$\delta \ge \frac{\sum_{a_0} \rho(a_0)(u_i'' - u_i^k) \sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t}}{\sum_{a_0} \rho(a_0)(u_i^k - u_i')(1-\epsilon) \sum_{t=0}^{\frac{z-1}{2}} \delta^{2t} (1-\epsilon)^{2t}}. \tag{32}$$

The equilibrium condition is thus:

$$\delta \ge \max\left\{ \frac{c_i}{(1-\epsilon) r_i}, 0 \right\}, \tag{33}$$

where $c_i = \sum_{a_0} \rho(a_0)(u_i'' - u_i^k)$, $r_i = \sum_{a_0} \rho(a_0)(u_i^k - u_i')$, $u_i' = u_i(a_0, a^\star, a^{\min})$, $u_i^k = u_i(a_0, a^\star, a^\star)$, $u_i'' = u_i(a_0, a^{\min}, a^\star)$, and $a_i^\star = f_i^\star(s_i)$.

Thus, the strategy profile $(\sigma_i^\star, \sigma_{-i}^\star)$ is a subgame perfect equilibrium if and only if:

$$\delta \ge \max\left\{ \frac{c_i}{(1-\epsilon) r_i}, \frac{c_{-i}}{(1-\epsilon) r_{-i}}, 0 \right\}. \tag{34}$$

References
1. S. Berri, S. Lasaulce, and M.S. Radjef, "Power Control with Partial Observation in Wireless Ad Hoc Networks," IEEE Proc. of the EUSIPCO conference, Budapest, Hungary, Aug. 2016.
2. M. Félegyházi and J.P. Hubaux, "Game Theory in Wireless Networks: A Tutorial," in EPFL technical report, LCA-REPORT-2006-002, Feb. 2006.
3. M. Félegyházi, J.P. Hubaux, and L. Buttyán, "Nash Equilibria of Packet Forwarding Strategies in Wireless Ad Hoc Networks," IEEE Transactions on Mobile Computing, vol. 5, no. 5, pp. 463-476, 2006.
4. J.J. Jaramillo and R. Srikant, "A Game Theory Based Reputation Mechanism to Incentivize Cooperation in Wireless Ad Hoc Networks," Ad Hoc Networks, vol. 8, no. 4, pp. 416-429, 2010.
5. L. Buttyán and J.P. Hubaux, "Enforcing Service Availability in Mobile Ad Hoc WANs," Proc. of IEEE/ACM Workshop on Mobile Ad Hoc Networking and Computing (MobiHOC), Boston, Aug. 2000.
6. P. Michiardi and R. Molva, "CORE: A COllaborative REputation Mechanism to Enforce Node Cooperation in Mobile Ad Hoc Networks," Proc. Communications and Multimedia Security Conference (CMS'02), Portoroz, Slovenia, 2002.
7. L. Buttyán and J.P. Hubaux, "Stimulating Cooperation in Self-Organizing Mobile Ad Hoc Networks," Mobile Networks and Applications, vol. 8, no. 5, pp. 579-592, 2003.
8. D.E. Charilas, K.D. Georgilakis, and A.D. Panagopoulos, "ICARUS: hybrId inCentive mechAnism for coopeRation stimUlation in ad hoc networkS," Ad Hoc Networks, vol. 10, no. 6, pp. 976-989, 2012.
9. X. Lin, J. Zhang, C. Hu, Y. Huang, B. Chen, N. Xie, and H. Wang, "Utility-Based Node Cooperation Mechanism in Wireless Sensor Networks," International Journal of Communications, Network and System Sciences, vol. 6, no. 5, pp. 236-243, 2013.
10. K. S. Rohini and S. Dhanasekar, "Survey on Quality Analysis of Cooperation Incentive Strategies in MANET," International Journal of Computer Science and Mobile Computing, vol. 3, no. 1, pp. 495-500, Jan. 2014.
11. C. A. Kamhoua, N. Pissinou, A. Busovaca, and K. Makki, "Belief-free Equilibrium of Packet Forwarding Game in Ad Hoc Networks under Imperfect Monitoring," Proc. Performance Computing and Communications Conference (IPCCC), Albuquerque, NM, USA, Dec. 2010.
12. A. Krzesinski, "Promoting Cooperation in Mobile Ad Hoc Networks," International Journal of Information, Communication Technology and Applications, vol. 2, no. 1, pp. 24-46, 2016.
13. Q. Xu, Z. Su, and S. Guo, "A Game Theoretical Incentive Scheme for Relay Selection Services in Mobile Social Networks," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6692-6702, Aug. 2016.
14. B. Larrousse, S. Lasaulce, and M. Wigger, "Coordinating Partially-Informed Agents over State-Dependent Networks," Proc. IEEE Information Theory Workshop (ITW), Jerusalem, Israel, Apr.-May 2015.
15. B. Larrousse and S. Lasaulce, "Coded Power Control: Performance Analysis," Proc. IEEE International Symposium on Information Theory, July 2013.
16. B. Larrousse, S. Lasaulce, and M. Bloch, "Coordination in Distributed Networks via Coded Actions with Application to Power Control," IEEE Transactions on Information Theory, submitted in Aug. 2015, available on arXiv.
17. R. Axelrod and J. Wu, "How to Cope with Noise in the Iterated Prisoner's Dilemma," The Journal of Conflict Resolution, vol. 39, pp. 183-189, 1995.
18. Z. Ji, W. Yu, and K.J. Ray Liu, "A Belief Evaluation Framework in Autonomous MANETs under Noisy and Imperfect Observation: Vulnerability Analysis and Cooperation Enforcement," IEEE Transactions on Mobile Computing, vol. 9, no. 9, pp. 1242-1254, 2010.
19. Z. Li and H. Shen, "Game-Theoretic Analysis of Cooperation Incentive Strategies in Mobile Ad Hoc Networks," IEEE Transactions on Mobile Computing, vol. 11, no. 8, pp. 1287-1303, 2012.
20. A. Gjendemsjø, D. Gesbert, G.E. Oien, and S.G. Kiani, "Binary Power Control for Sum Rate Maximization over Multiple Interfering Links," IEEE Transactions on Wireless Communications, vol. 7, no. 8, pp. 3164-3173, 2008.
21. S. Sesia, I. Toufik, and M. Baker, "LTE-The UMTS Long Term Evolution: From Theory to Practice," Wiley Publishing, [...]
22. [...] Springer, [...]
23. [...] IEEE Personal Communications, vol. 7, no. 2, pp. 48-54, Apr. 2000.
24. F. Meshkati, M. Chiang, H.V. Poor, and S.C. Schwartz, "A Game Theoretic Approach to Energy-Efficient Power Control in Multicarrier CDMA Systems," Journal on Selected Areas in Communications, vol. 24, no. 6, pp. 1115-1129, 2006.
25. S. Lasaulce, Y. Hayel, R. El Azouzi, and M. Debbah, "Introducing Hierarchy in Energy Games," IEEE Transactions on Wireless Communications, vol. 8, no. 7, pp. 3833-3843, July 2009.
26. E.V. Belmega and S. Lasaulce, "Energy-Efficient Precoding for Multiple-Antenna Terminals," IEEE Transactions on Signal Processing, vol. 59, no. 1, pp. 329-340, Jan. 2011.
27. A. El Gamal and Y. Kim, "Network Information Theory," Cambridge University Press, [...]
28. [...] Proc. IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 622-627, Vancouver, Canada, Aug. 2006.
29. S. Lasaulce and H. Tembine, "Game Theory and Learning for Wireless Networks: Fundamentals and Applications," Academic Press, Elsevier, pp. 1-336, Oct. 2011, ISBN 978-0123846983.
30. A. Neyman and S. Sorin, "Repeated Games with Public Uncertain Duration Process," International Journal of Game Theory, vol. 39, no. 1, pp. 29-52.
31. M. Maschler, E. Solan, and S. Zamir, "Game Theory," Cambridge University Press, [...]
32. [...] Mathematical Programming, vol. 11, no. 1, pp. 14-27, 1976.
33. G. Gallo and A. Ülkücü, "Bilinear Programming: An Exact Algorithm," Mathematical Programming, vol. 12, no. 1, pp. 173-194, 1977.
34. H. Vaish and C.M. Shetty, "The Bilinear Programming Problem," Naval Research Logistics Quarterly, vol. 23, no. 2, pp. 303-309, 1976.
35. D. Fudenberg and J. Tirole, "Game Theory," MIT Press, [...]
36. [...] Proc. IEEE Fifth International Conference on Communications and Networking (ComNet), Hammamet, Tunisia, Nov. 2015.
37. C.H. Papadimitriou, "Algorithms, Games, and the Internet," Proc. 33rd Annual ACM Symposium on Theory of Computing (STOC), July 2001.
38. G.F. Marias, P. Georgiadis, D. Flitzanis, and K. Mandalas, "Cooperation Enforcement Schemes for MANETs: A Survey," Wireless Communications and Mobile Computing, vol. 6, pp. 319-332, 2006.
39. M. I. Poulakis, A. D. Panagopoulos, and P. Constantinou, "Channel-Aware Opportunistic Transmission Scheduling for Energy-Efficient Wireless Links," IEEE Transactions on Vehicular Technology, vol. 62, no. 1, pp. 6692-6702, Jan. 2013.
40. S.M. Betz and H.V. Poor, "Energy Efficient Communications in CDMA Networks: A Game Theoretic Analysis Considering Operating Costs," IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 5181-5190, 2008.
41. F. Richter, A.J. Fehske, and G. Fettweis, "Energy Efficiency Aspects of Base Station Deployment Strategies for Cellular Networks," Proc. of IEEE 70th Vehicular Technology Conference Fall,