An Iterated Game of Uncoordinated Sharing of Licensed Spectrum Using Zero-Determinant Strategies
arXiv [cs.GT]
Ashraf Al Daoud*, George Kesidis†, Jörg Liebeherr*
* Department of Electrical and Computer Engineering, University of Toronto, Canada
† Department of Computer Science and Engineering, Pennsylvania State University, PA, USA
Abstract
We consider private commons for secondary sharing of licensed spectrum bands with no access coordination provided by the primary license holder. In such environments, heterogeneity in the demand patterns of the secondary users can lead to constant changes in interference levels, and can thus be a source of volatility in the users' utilities. In this paper, we consider secondary users to be service providers that provide downlink services. We formulate the spectrum sharing problem as a non-cooperative iterated game of power control in which service providers change their power levels to fix their long-term average rates at utility-maximizing values. First, we show that in any iterated $2 \times 2$ game, the structure of the single-stage game dictates the degree of control that a service provider can exert on the long-term outcome of the game. Then we show that if service providers use binary actions, either to access or not to access the channel in any round of the game, then the long-term rate can be fixed regardless of the strategy of the opponent. We identify these rates and show that they can be achieved using mixed Markovian strategies, which we characterize explicitly.

I. INTRODUCTION
Advancements in mobile broadband access technologies in recent years have resulted in an exponential surge in demand for wireless data services [10]. While demand is expected to keep growing, the wireless industry is trying to develop techniques that improve the utilization of available radio spectrum bands. This has led to the advent of cognitive radio technologies, which allow network users to adapt their system parameters to the dynamic environment and optimize spectrum utilization without necessarily cooperating to pursue such goals. Under this paradigm, where centralization and traditional spectrum sharing techniques no longer apply, game theory has emerged as a useful tool for modeling and analyzing user behavior in non-cooperative spectrum sharing environments. See [2] for a recent survey on games for cognitive radio networks.

Non-cooperative spectrum sharing can be studied for scenarios that involve secondary provision of licensed spectrum, where primary license holders lease out the surplus of their spectral capacity to secondary users. These scenarios have been made possible by recent regulatory models, including the FCC's private commons model [13] and the licensed shared access model of the EU [26]. Under these models, spectrum provision does not necessarily require license holders to coordinate spectrum access among secondary users. For example, under the private commons model, secondary users are granted spectrum access using peer-to-peer communications, without relying on the license holder's infrastructure. In fact, license holders may not even need to have a deployed network in order to be eligible for this model [8].
However, license holders can still authorize the use of certain communication devices or dictate the use of specific technical parameters.

This paper presents a framework for designing strategies for secondary sharing of licensed spectrum bands. The underlying communication system involves a number of service providers that share an interference channel to provide downlink services without access coordination. A distinctive feature of sharing licensed channels, as opposed to unlicensed ones, is that the utility a service provider derives from achieving some rate on the channel is discounted by the cost of utilizing the channel. This cost is paid to the primary license holder on a usage basis in the form of monetary compensation. Since it is plausible to assume that the marginal utility of each service provider decreases as the rate increases, transmitting at the maximum allowed power level can be sub-optimal from a utility maximization point of view. In this respect, operating at a utility-maximizing rate from the standpoint of any service provider is governed by the interference in the channel, which depends on the demand patterns of the service providers. Specifically, when demand is high, a service provider transmits at relatively higher power levels and causes more interference, and vice versa when demand is low, which leads to variations in the interference. Demand patterns are generally unknown, and the key problem is to design power control strategies that help service providers achieve their optimal rates and cope with fluctuations in the interference.

In this paper, secondary sharing of licensed spectrum is formulated as an iterated game of power control. Namely, in each round of the game, service providers choose their transmission power levels and consequently achieve downlink rates that also depend on the interference from other service providers.
We show that there exist strategies that allow service providers to fix their rates, in the long term, regardless of the strategies of their opponents. The key idea is that in iterated games with the same action space and the same payoff profiles in every round, players with longer memories have no advantage over players with shorter ones. Therefore, players can in any round condition their moves on the outcome of the game in the previous round only. This implies that iterated games lend themselves to Markovian analysis, where a player's strategy can be defined in terms of the state transition probabilities of the resulting Markov chain.

We use this insight in the power control game. First, we show that in any $2 \times 2$ iterated game, the structure of a player's payoff matrix in the single-stage game dictates whether or not that player can control its long-term payoff. We also show that this property can be realized in the power control game by transforming the action space of the service providers into a binary space: either to access or not to access the channel in each round of the game. The approach provides the players with full control over a range of rates that will be clearly characterized in the paper. In essence, a player can achieve any value in the valid range by iterating its actions using mixed (probabilistic) Markovian strategies. The intuition behind this approach is to allow players to maintain a certain rate, in the long term, by using reactive strategies such that whenever the average rate exceeds the targeted value, it can be lowered by not participating in the channel in some future rounds. The paper identifies these strategies and shows that any fixed outcome of the game can be achieved using multiple strategies that differ in their convergence rates.

Game theory for sharing wireless resources is a widely studied topic in communication networks, in particular using the classical theory of non-cooperative games [24].
Proposed strategic-game models for sharing interference channels have included pricing mechanisms [5], [15], [23], [27], medium access control [17], [18], and transmission power control of both the uplink (end-users to base-stations) and the downlink (base-stations to end-users) [3], [4], [11]. A common assumption in these models is that the game structure and the rationality of players are common knowledge in the game, i.e., players hold beliefs about each other's strategic choices [6]. Such assumptions are not limited to single-stage games but also extend to iterated games where players interact in multiple rounds.

Iterated games have been studied as a means to induce cooperation in self-organizing wireless ad-hoc networks. Most of the studies use the iterated Prisoners' Dilemma game to model packet forwarding between nodes [14], [16], [19]. The model is motivated by many experimental studies which show that Tit-For-Tat can be an efficient strategy in this regard [7]. An important feature of iterated games is that an action taken by a player in any round of the game has an impact on the future actions of the other players. This in turn leads to the concept of punishment for deviating from equilibrium strategies. Such techniques are applied in the area of sharing unlicensed bands, particularly in [12], where multiple systems coexist and interfere with each other. In essence, in [12], spectrum sharing is modeled as an iterated power control game to devise self-enforcing power control rules that lead to fair and efficient Nash equilibria.

Footnote: A $2 \times 2$ game has two players, each with a boolean action space, leading to a $2 \times 2$ payoff matrix.

In this work, we follow a different approach from [12] and seek power control strategies that allow service providers to share spectrum on the downlinks and maintain average rates that are robust to variations in the power control strategies of the opponents.
Our work is motivated by recent results in the theory of iterated Prisoners' Dilemma games, where Press and Dyson have shown in [25] that these games admit zero-determinant strategies, which in some cases allow players to control each other's long-term payoffs, and in other cases allow them to enforce a linear relationship between the payoffs. Such strategies emerge once the progression of the game is formulated as a one-step Markov process. The approach of [25], from which the earlier results of [9] can be obtained as a special case, more readily admits generalization to asymmetric games with different payoff structures that can involve more than two players.

The major contributions of our paper can be summarized as follows:

1) We present an extension of the approach in [25] and clearly identify the structures of $2 \times 2$ games that allow a player to control its own payoff or that of the opponent, thus implying a broader application that is not restricted to the Prisoners' Dilemma game. We follow a generalized approach in that we do not assume symmetric payoffs of the players, and thus do not assume symmetric control over the outcome of the game.

2) We identify the feasible set of payoffs that a player can guarantee from the game and identify mixed Markovian strategies (zero-determinant strategies) for each possible outcome of the game. Furthermore, we show that the notion of payoff control is not restricted to two-player games, but can be extended to games with multiple players.

3) We formulate secondary sharing of wireless spectrum as an iterated game of power control. We use an economic model for downlink data transmission to argue that, in interference channels with no access coordination, players can fix their long-term rates at utility-maximizing values by taking binary actions (e.g., either to transmit at maximum power or not to transmit). In this regard, we
In this regard, weidentify strategies for iterating these actions and study their convergence.The rest of the paper is organized as follows: In Section II, we address zero-determinant strategies PPPPPPPPPPPPP
Player X Player
Y n = 1 n = 2 n = 1 ( X , , Y , ) ( X , , Y , ) n = 2 ( X , , Y , ) ( X , , Y , ) Fig. 1. One round payoff matrix of the iterated two-player two-action game. for × iterated games and present our results for games of general payoff structures. We also extendour results to include games with more than two players. In Section III, we analyze secondary spectrumsharing as an iterated game of power control and devise strategies for the proposed × game. A numericalstudy to analyze convergence and power consumption of these strategies is provided in Section IV. Finally,the paper concludes in Section V.II. Z ERO -D ETERMINANT S TRATEGIES FOR I TERATED G AMES
In this section, we develop new results on iterated games where the action space and the payoff matrix do not change over the course of the game. Our analysis is based on the approach of [25], which shows that indefinitely iterated $2 \times 2$ games admit so-called "zero-determinant" strategies that allow players to control their own long-term payoffs or the payoffs of their opponents. The type of control that a player can exert hinges on the structure of the game. In this section, we identify these structures and the feasible set of payoffs that can be controlled. We also identify the strategies that lead to this control.

For this purpose, consider a $2 \times 2$ iterated game with the single-round payoff matrix given in Figure 1. In each round of the game, row player X and column player Y take binary actions $n_1, n_2 \in \{1, 2\}$, respectively, leading to payoffs $X_{\mathbf{n}}$ and $Y_{\mathbf{n}}$, respectively, where $\mathbf{n} = (n_1, n_2)$. A salient feature of iterated games is that players with longer memories of the history of the game have no advantage over those with shorter ones, i.e., a player does not gain from using a longer history of the game than the history used by its opponent. This is due to the iterative nature of the game, where actions and payoffs are indefinitely fixed (see the Appendix of [25]), and thus strategies can be designed by assuming that the players have memories of a single move.

A. Zero-Determinant Strategies for 2 × 2 Games
We describe the state of the game in any round by the actions of the players in that round. Specifically, let $\Omega$ denote the set of all possible states, i.e.,

$$\Omega = \{(1,1), (1,2), (2,1), (2,2)\}, \tag{1}$$

and let $\mathbf{n}(t) = (n_1(t), n_2(t))$ denote the state of the game in round $t \geq 0$. In each round, players choose their actions with probabilities that depend on the state of the game in the previous round, and thus the process $\{\mathbf{n}(t) : t = 0, 1, \cdots\}$ can be modeled as a Markov chain.

In this respect, consider player X and let

$$p^{\mathbf{k}}_1 = \Pr\big(n_1(t+1) = 1 \mid \mathbf{n}(t) = \mathbf{k}\big), \quad \forall \mathbf{k} \in \Omega,$$

denote the probability that player X takes action 1 in round $t+1$ if, in the previous round, player X took action $k_1$ and player Y took action $k_2$. For player Y, similarly let

$$p^{\mathbf{k}}_2 = \Pr\big(n_2(t+1) = 1 \mid \mathbf{n}(t) = \mathbf{k}\big), \quad \forall \mathbf{k} \in \Omega.$$

To simplify notation, we write $p_{\mathbf{k}} = p^{\mathbf{k}}_1$ and $q_{\mathbf{k}} = p^{\mathbf{k}}_2$. The set of these action probabilities is referred to as the strategy of a player, i.e., $\{p_{\mathbf{k}}, \forall \mathbf{k} \in \Omega\}$ is a strategy of player X and $\{q_{\mathbf{k}}, \forall \mathbf{k} \in \Omega\}$ is a strategy of player Y.

Footnote: In the application to spectrum sharing addressed later in the paper, playing action 1 will correspond to accessing the channel with maximum power, while playing action 2 will correspond to accessing the channel with lower power or not accessing the channel at all.

The state transition matrix of the Markov chain can be described as follows, assuming that the rows and the columns are in the same order as listed in (1):

$$M = \begin{pmatrix}
p_{1,1} q_{1,1} & p_{1,1}(1 - q_{1,1}) & (1 - p_{1,1}) q_{1,1} & (1 - p_{1,1})(1 - q_{1,1}) \\
p_{1,2} q_{1,2} & p_{1,2}(1 - q_{1,2}) & (1 - p_{1,2}) q_{1,2} & (1 - p_{1,2})(1 - q_{1,2}) \\
p_{2,1} q_{2,1} & p_{2,1}(1 - q_{2,1}) & (1 - p_{2,1}) q_{2,1} & (1 - p_{2,1})(1 - q_{2,1}) \\
p_{2,2} q_{2,2} & p_{2,2}(1 - q_{2,2}) & (1 - p_{2,2}) q_{2,2} & (1 - p_{2,2})(1 - q_{2,2})
\end{pmatrix}.$$

Let $\pi_{i,j}$ be the stationary probability that player X takes action $i$ and player Y takes action $j$. The Markov chain has a stationary distribution $\pi^T = (\pi_{1,1}, \pi_{1,2}, \pi_{2,1}, \pi_{2,2})$ that satisfies $\pi^T M = \pi^T$. The distribution $\pi$ is unique if and only if the chain has a unique closed communication class.
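The construction of $M$ and of a stationary distribution can be sketched numerically as follows. This is a minimal illustration rather than part of the analysis: the function names and the numerical values of `p`, `q`, and `X` are hypothetical, and the stationary distribution is approximated by power iteration on the lazy chain $(M + I)/2$, which has the same stationary distributions as $M$ but cannot oscillate.

```python
def transition_matrix(p, q):
    """Build the 4x4 matrix M with states ordered (1,1), (1,2), (2,1), (2,2).

    p[k] and q[k] are the probabilities that players X and Y, respectively,
    play action 1 in the next round given that the previous state was k.
    """
    return [
        [pk * qk, pk * (1 - qk), (1 - pk) * qk, (1 - pk) * (1 - qk)]
        for pk, qk in zip(p, q)
    ]


def stationary_distribution(M, iterations=2000):
    """Approximate a stationary distribution by iterating pi <- pi (M + I) / 2."""
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * (old + new) for old, new in zip(pi, stepped)]
    return pi


def long_term_payoff(pi, payoffs):
    """Inner product of the stationary distribution with a payoff vector."""
    return sum(weight * value for weight, value in zip(pi, payoffs))


# Illustrative (hypothetical) strategies and payoffs, not taken from the paper.
p = [0.9, 0.2, 0.6, 0.3]
q = [0.5, 0.7, 0.1, 0.4]
X = [3.0, 0.0, 5.0, 1.0]

M = transition_matrix(p, q)
pi = stationary_distribution(M)
u_X = long_term_payoff(pi, X)
```

Because all entries of this example matrix are strictly positive, the chain has a single closed communication class and the iteration converges to the unique stationary distribution.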
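When $\pi$ is unique, the long-term payoff for player X is given by

$$u_X = \pi^T \mathbf{X} \tag{2}$$

and for player Y is given by

$$u_Y = \pi^T \mathbf{Y}, \tag{3}$$

where $\mathbf{X} = (X_{1,1}, X_{1,2}, X_{2,1}, X_{2,2})^T$ and $\mathbf{Y} = (Y_{1,1}, Y_{1,2}, Y_{2,1}, Y_{2,2})^T$. Assuming a unique stationary $\pi$, we can write $\pi^T \tilde{M} = \mathbf{0}$, where $\tilde{M} = M - I$. Let $\mathrm{adj}(\tilde{M})$ be the adjugate matrix, i.e., the transposed matrix of signed minors. By Cramer's rule,

$$\mathrm{adj}(\tilde{M})\, \tilde{M} = \det(\tilde{M})\, I = 0,$$

where the second equality holds because $\tilde{M}$ is singular. This implies that each row of the matrix $\mathrm{adj}(\tilde{M})$ is proportional to $\pi^T$. Furthermore, for an arbitrary vector $\mathbf{f} = (f_1, f_2, f_3, f_4)^T$, $\pi^T \mathbf{f}$ is, up to a normalization factor that does not depend on $\mathbf{f}$, the determinant of a modified version of $\tilde{M}$ with its last column replaced by $\mathbf{f}$. The determinant does not change by adding the first column of this modified matrix to the second and third columns, so that

$$\pi^T \mathbf{f} = \det \begin{pmatrix}
-1 + p_{1,1} q_{1,1} & -1 + p_{1,1} & -1 + q_{1,1} & f_1 \\
p_{1,2} q_{1,2} & -1 + p_{1,2} & q_{1,2} & f_2 \\
p_{2,1} q_{2,1} & p_{2,1} & -1 + q_{2,1} & f_3 \\
p_{2,2} q_{2,2} & p_{2,2} & q_{2,2} & f_4
\end{pmatrix}. \tag{4}$$

A key observation of [25] is that the second and the third columns of the matrix in (4) depend purely on the strategies of player X and player Y, respectively. Specifically,

$$\tilde{\mathbf{m}}_X = \begin{pmatrix} -1 + p_{1,1} \\ -1 + p_{1,2} \\ p_{2,1} \\ p_{2,2} \end{pmatrix} \quad \text{and} \quad \tilde{\mathbf{m}}_Y = \begin{pmatrix} -1 + q_{1,1} \\ q_{1,2} \\ -1 + q_{2,1} \\ q_{2,2} \end{pmatrix}.$$

Without loss of generality, consider the game from the standpoint of player X. If $\tilde{\mathbf{m}}_X = \mathbf{f}$, the determinant in (4) is equal to 0 because two of its columns coincide. Thus, if $\mathbf{f} = a\mathbf{X} + b\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector, Equation (4) becomes

$$a u_X + b = 0, \tag{5}$$

where $u_X$ is defined in (2), and $a$ and $b$ are non-zero real numbers.

Player X can thus fix the value of $u_X$ regardless of the strategy of player Y. To achieve this, the values of $a$ and $b$ should be chosen such that $p_{1,1}$, $p_{1,2}$, $p_{2,1}$, and $p_{2,2}$ are probabilities, which in turn depends on the structure of the game via the equality $\tilde{\mathbf{m}}_X = a\mathbf{X} + b\mathbf{1}$.

In the following theorem, we state our first result, which defines the structures of $2 \times 2$ games where player X can control $u_X$ and defines the strategies that lead to such control.

Theorem 1.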
For $k = 1, 2$, let $X_{k,\min}$ and $X_{k,\max}$ denote, respectively, the minimum and maximum values of row $k$ in the payoff matrix of a $2 \times 2$ iterated game. Specifically,

$$X_{k,\min} = \min(X_{k,1}, X_{k,2}), \qquad X_{k,\max} = \max(X_{k,1}, X_{k,2}).$$

Player X can control its long-term payoff $u_X$ regardless of the actions of player Y if and only if there exist $k_{\max}, k_{\min} \in \{1, 2\}$ where

$$X_{k_{\max},\max} \leq X_{k_{\min},\min}. \tag{6}$$

If so, any value of $u_X$ from the interval $[X_{k_{\max},\max}, X_{k_{\min},\min}]$ can be achieved by using the following mixed (probabilistic) strategies:

$$p_{1,1} = 1 + \left(1 - \frac{X_{1,1}}{u_X}\right) b, \tag{7}$$

$$p_{1,2} = 1 + \left(1 - \frac{X_{1,2}}{u_X}\right) b, \tag{8}$$

$$p_{2,1} = \left(1 - \frac{X_{2,1}}{u_X}\right) b, \tag{9}$$

$$p_{2,2} = \left(1 - \frac{X_{2,2}}{u_X}\right) b, \tag{10}$$

where $b$ is chosen such that, if $k_{\min} = 1$ and $k_{\max} = 2$, then

$$0 < b \leq \min\left(\frac{-1}{1 - \frac{X_{1,\max}}{u_X}},\ \frac{1}{1 - \frac{X_{2,\min}}{u_X}}\right),$$

and if $k_{\min} = 2$ and $k_{\max} = 1$, then

$$\max\left(\frac{-1}{1 - \frac{X_{1,\min}}{u_X}},\ \frac{1}{1 - \frac{X_{2,\max}}{u_X}}\right) \leq b < 0.$$

Proof:
We need to obtain $a$ and $b$ that satisfy $\tilde{\mathbf{m}}_X = a\mathbf{X} + b\mathbf{1}$ and render $p_{1,1}$, $p_{1,2}$, $p_{2,1}$, and $p_{2,2}$ as probabilities. First note that, by (5), $a = -b/u_X$, and thus formulae (7)–(10) follow. Next, we obtain the range of valid values of the non-zero variable $b$ by dividing the search domain into two intervals, $b > 0$ and $b < 0$. The inequalities below are stated for $u_X > 0$.

Case 1 ($b > 0$): Consider (7) and (8) and notice that, for any value of $b > 0$ and a given value of $u_X$, the condition

$$X_{1,\min} \geq u_X$$

is a necessary and sufficient condition for $p_{1,1}$ and $p_{1,2}$ to be less than or equal to 1. Similarly, for $p_{2,1}$ and $p_{2,2}$ to be greater than or equal to 0, we obtain the following condition from (9) and (10):

$$X_{2,\max} \leq u_X.$$

Therefore, $u_X$ cannot be fixed at values outside the interval $[X_{2,\max}, X_{1,\min}]$. To show that $u_X$ can be fixed at any value in this interval, we need to show that there exists $b > 0$ such that $p_{1,1}$ and $p_{1,2}$ are greater than or equal to 0 and such that $p_{2,1}$ and $p_{2,2}$ are less than or equal to 1. In this regard, from (7) and (8) we obtain

$$0 < b \leq \frac{-1}{1 - \frac{X_{1,1}}{u_X}}, \qquad 0 < b \leq \frac{-1}{1 - \frac{X_{1,2}}{u_X}}.$$

Note that, since $X_{1,1}, X_{1,2} > u_X$, the tightest upper bound is $\frac{-1}{1 - X_{1,\max}/u_X}$. In the same way, from (9) and (10) we obtain

$$0 < b \leq \frac{1}{1 - \frac{X_{2,1}}{u_X}}, \qquad 0 < b \leq \frac{1}{1 - \frac{X_{2,2}}{u_X}},$$

and since $X_{2,1}, X_{2,2} < u_X$, $\frac{1}{1 - X_{2,\min}/u_X}$ is the tightest upper bound. Therefore, $u_X$ can be fixed at any value in the interval $[X_{2,\max}, X_{1,\min}]$ by choosing $b$ from the following feasible range:

$$0 < b \leq \min\left(\frac{-1}{1 - \frac{X_{1,\max}}{u_X}},\ \frac{1}{1 - \frac{X_{2,\min}}{u_X}}\right).$$

Case 2 ($b < 0$): We can follow the same steps as in the previous case. Namely, for $p_{1,1}$ and $p_{1,2}$ to be less than or equal to 1, it is required that

$$X_{1,\max} \leq u_X,$$

and for $p_{2,1}$ and $p_{2,2}$ to be greater than or equal to 0, it is required that

$$X_{2,\min} \geq u_X.$$

Combining the previous conditions yields the new condition

$$X_{1,\max} \leq X_{2,\min}.$$

Furthermore, for $p_{1,1}$ and $p_{1,2}$ to be greater than or equal to 0, we obtain the condition

$$\frac{-1}{1 - \frac{X_{1,\min}}{u_X}} \leq b < 0.$$

In a similar fashion, for $p_{2,1}$ and $p_{2,2}$ to be less than or equal to 1, we obtain

$$\frac{1}{1 - \frac{X_{2,\max}}{u_X}} \leq b < 0.$$
Therefore, $b$ can be chosen from the following feasible range:

$$\max\left(\frac{-1}{1 - \frac{X_{1,\min}}{u_X}},\ \frac{1}{1 - \frac{X_{2,\max}}{u_X}}\right) \leq b < 0.$$

Theorem 1 provides a framework for understanding payoff control in $2 \times 2$ iterated games. It states that the structure of the payoff matrix reveals whether players can control their long-term payoffs. In fact, row player X can set its long-term payoff $u_X$ if and only if the maximum payoff in one row is less than or equal to the minimum payoff in the other row, in which case $u_X$ can be set at any value between that maximum and that minimum, inclusive. For example, if $X_{1,1} = 1$, $X_{1,2}$ = 0.[…], and $X_{2,1} = X_{2,2}$ = 0.[…], then player X can set $u_X$ at any value in the interval [0.[…], 0.[…]].

The results in the theorem can be directly applied to player Y by considering the columns of that player's payoff matrix instead of the rows. In particular, let $Y_{k,\min} = \min(Y_{1,k}, Y_{2,k})$ and $Y_{k,\max} = \max(Y_{1,k}, Y_{2,k})$; then player Y can control its long-term payoff $u_Y$ if and only if there exist $k_{\max}, k_{\min} \in \{1, 2\}$, where $k_{\max} \neq k_{\min}$ and

$$Y_{k_{\max},\max} \leq Y_{k_{\min},\min}.$$

A simple example to verify strategies (7)–(10) in the theorem is to choose $u_X = X_{k_{\min},\min}$, meaning that player X sets $u_X$ at the maximum value possible. If we assume that $X_{k_{\min},\min} = X_{1,1}$, then regardless of the value of $b$, this always yields a strategy with $p_{1,1} = 1$, i.e., player X plays action 1 whenever both players played this action in the previous round. One way to understand this result is to consider a strategy of player Y playing action 1 in each round of the game. Player X will then be playing action 1 in each round, as there will be no opportunity to make up for losses that may result from not playing that action in any of the previous rounds.

In the same example, assume that the ratio $X_{1,1}/X_{1,2} = 0.5$ and assume that $X_{2,1} = X_{2,2} = 0$. If $b = 1$, then this yields the deterministic strategy $p_{1,1} = 1$, $p_{1,2} = 0$, $p_{2,1} = 1$, and $p_{2,2} = 1$.
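The deterministic example can be checked with a short simulation. The sketch below is illustrative only: the payoff values are hypothetical and merely chosen to satisfy $X_{1,1}/X_{1,2} = 0.5$ and $X_{2,1} = X_{2,2} = 0$, the function names are not from the paper, and the three opponent behaviours are arbitrary choices. It evaluates (7)–(10), computes the upper bound on $b$ for the case $k_{\min} = 1$, $k_{\max} = 2$, and plays the iterated game against each opponent.

```python
import random

# Hypothetical payoffs of player X, keyed by the state (n1, n2).
X = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 0.0, (2, 2): 0.0}


def zd_strategy(X, u, b):
    """Equations (7)-(10): probability that X plays action 1 after each state."""
    return {
        (n1, n2): (1.0 if n1 == 1 else 0.0) + (1.0 - X[(n1, n2)] / u) * b
        for (n1, n2) in X
    }


def b_upper_bound(X, u):
    """Largest feasible b > 0 for the case k_min = 1, k_max = 2."""
    x1_max = max(X[(1, 1)], X[(1, 2)])
    x2_min = min(X[(2, 1)], X[(2, 2)])
    bounds = []
    if x1_max != u:  # a zero coefficient imposes no constraint on b
        bounds.append(-1.0 / (1.0 - x1_max / u))
    if x2_min != u:
        bounds.append(1.0 / (1.0 - x2_min / u))
    return min(bounds) if bounds else float("inf")


def simulate(strategy, X, opponent, rounds, seed=0):
    """Average per-round payoff of X against an opponent rule."""
    rng = random.Random(seed)
    state = (1, 1)  # arbitrary initial state
    total = 0.0
    for _ in range(rounds):
        n1 = 1 if rng.random() < strategy[state] else 2
        n2 = opponent(state, rng)
        state = (n1, n2)
        total += X[state]
    return total / rounds


strategy = zd_strategy(X, u=1.0, b=1.0)
opponents = {
    "always_1": lambda state, rng: 1,
    "always_2": lambda state, rng: 2,
    "coin_flip": lambda state, rng: rng.choice((1, 2)),
}
averages = {
    name: simulate(strategy, X, rule, rounds=10_000)
    for name, rule in opponents.items()
}
```

For these payoffs the target $u_X = 1$ lies at the upper end of the controllable interval, and every opponent rule in `opponents` leaves the average payoff of player X at the target up to an end-of-horizon effect of order one over the number of rounds.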
This deterministic strategy is quite intuitive, since it tracks the payoff of player X such that, regardless of the strategy of player Y, whenever the payoff in any round is $X_{1,2}$, player X plays action 2 in the next round and gains zero payoff, so that the average of the two rounds is maintained at the targeted value $X_{1,1}$. In the following round, the player plays action 1 to gain at least $X_{1,1}$, and so on. The strategy is one of several strategies that can be obtained by changing the value of the variable $b$, which will be discussed in more detail in Section IV.

Two important observations can be drawn from Theorem 1. First, the players can design their strategies without an underlying assumption of knowledge of each other's payoffs. All that a player needs to know about the opponent in any round is the latter's action in the previous round. This leads us to the second observation, which highlights a more general perspective of this theorem. In essence, if the payoff matrix of the opponent admits the same structure described in the theorem, then a player can control the long-term payoff of the opponent. For example, player X can control the payoff of player Y if the payoff matrix of player Y satisfies condition (6) with $X_{i,j}$ replaced by $Y_{i,j}$.

Controlling the opponent's payoff can be realized in games such as the iterated Prisoners' Dilemma, where the four payoffs of player Y are strictly ordered (in the standard terminology, temptation, reward, punishment, and sucker's payoff, from largest to smallest). In this game, the row player can set the payoff of the column player at any value between the mutual-defection (punishment) payoff and the mutual-cooperation (reward) payoff. Strategies for opponents controlling each other's payoffs were previously studied in [9] and presented for a subset of games where players have symmetric payoffs, as in the case of the Prisoners' Dilemma game.

B. Iterated Games with Multiple Players
The results presented in the previous subsection can be extended to games with more than two players. Let $N \geq 2$ denote the number of players in the game and assume they are indexed $1, 2, \cdots, N$. Let the binary vector

$$\mathbf{n}(t) = (n_i(t) : i = 1, \cdots, N)$$

describe the state of the game in a given round $t$, where $n_i(t) \in \{1, 2\}$ for all $i, t$, so that at any given $t$, $\mathbf{n}(t) \in \{1, 2\}^N =: \Omega$. The process $\{\mathbf{n}(t) : t = 0, 1, \cdots\}$ can be described as a multi-dimensional Markov chain. In each round of the game, players take actions with probabilities that depend on the state of the game in the previous round.

Let $p^{\mathbf{k}}_i$ denote the probability that player $i$ plays action 1 in a certain round if the game was in state $\mathbf{k}$ in the previous round, and let

$$\mathbf{p}_i = (p^{\mathbf{k}}_i : \mathbf{k} \in \Omega)$$

denote the complete strategy profile of player $i$. The state transition matrix $M$ of the $N$-player game is a $2^N \times 2^N$ matrix. Similar to the game with $N = 2$ players, we can apply Cramer's rule to $\tilde{M} = M - I$ and replace the last column (corresponding to all players taking action 2) with a "reward" vector $\mathbf{f}$. For $\mathbf{l}, \mathbf{k} \in \Omega$, the entry of $M$ in the row of state $\mathbf{l}$ and the column of state $\mathbf{k}$ is, for all iterations $t \geq 0$,

$$\Pr\big(\mathbf{n}(t+1) = \mathbf{k} \mid \mathbf{n}(t) = \mathbf{l}\big) = \prod_{i \in \mathcal{K}_{\mathbf{k}}} p^{\mathbf{l}}_i \prod_{j \in \mathcal{L}_{\mathbf{k}}} (1 - p^{\mathbf{l}}_j),$$

where $\mathcal{K}_{\mathbf{k}}$ is the set of players playing action 1 in state $\mathbf{k}$ and $\mathcal{L}_{\mathbf{k}}$ is the set of players playing action 2 in state $\mathbf{k}$; the corresponding entry of $\tilde{M}$ differs only on the diagonal, where 1 is subtracted.

Consider adding all columns $\mathcal{C}_i \subset \Omega$ of $\tilde{M}$ which correspond to states where player $i$ and at least one other player play action 1 to the column where only player $i$ plays action 1. An entry of the resulting column $\tilde{\mathbf{m}}_i$ at row $\mathbf{k}$ is then given as

$$\begin{cases} -1 + p^{\mathbf{k}}_i \Gamma & \text{if a diagonal element of } \tilde{M} \text{ is added to this entry,} \\ p^{\mathbf{k}}_i \Gamma & \text{otherwise,} \end{cases}$$

where

$$\Gamma = \sum_{\mathbf{k} \in \mathcal{C}_i} \prod_{j \in \mathcal{K}_{\mathbf{k}} \setminus i} p^{\mathbf{k}}_j \prod_{l \in \mathcal{L}_{\mathbf{k}}} (1 - p^{\mathbf{k}}_l).$$

An important observation is that
$\Gamma = 1$, since each product in $\Gamma$ has elements that are either the probability or the complementary probability of a fixed set of events. Thus, the sum of these products over all possible combinations evaluates to 1. Therefore, by reasoning similar to that which led to (4), multiplying the stationary distribution $\pi$ of the game by an arbitrary vector $\mathbf{f}$ of size $|\Omega|$ leads to the following structure (also displaying $\tilde{\mathbf{m}}_i$):

$$\pi^T \mathbf{f} = \det \begin{pmatrix}
\cdots & -1 + p^{\mathbf{k}}_i & \cdots & f_1 \\
\cdots & -1 + p^{\mathbf{k}}_i & \cdots & f_2 \\
\ddots & \vdots & \ddots & \vdots \\
\cdots & p^{\mathbf{k}}_i & \cdots & f_{|\Omega|-1} \\
\cdots & p^{\mathbf{k}}_i & \cdots & f_{|\Omega|}
\end{pmatrix}.$$

So, the column that corresponds to the state where only player $i$ plays action 1 has elements that depend solely on the strategy of that player.

We follow the developments that led to Theorem 1 and let $U_{i,\mathbf{n}}$ denote the payoff of player $i$ in a round in which the state of the game is $\mathbf{n}$. We also let $\mathbf{U}_i$ denote the vector of all possible payoffs of player $i$. Let $u_i$ denote a generic value of the long-term payoff of player $i$. Thus, taking $\tilde{\mathbf{m}}_i = \mathbf{f} = a_i \mathbf{U}_i + b_i \mathbf{1}$, where $a_i$ and $b_i$ are non-zero real numbers, leads to zero-determinant strategies for own payoff control, which is formulated in the following proposition:

Proposition 1.
In the game with $N \geq 2$ players, for $k = 1, 2$, let

$$U_{i,k,\min} = \min(U_{i,\mathbf{n}} : n_i = k), \qquad U_{i,k,\max} = \max(U_{i,\mathbf{n}} : n_i = k),$$

where the first quantity is the minimum payoff of player $i$ when playing action $k$, and the second quantity is the maximum.

Player $i$ can control its long-term payoff $u_i$ regardless of the actions of the other players in the game if and only if there exist $k_{\max}, k_{\min} \in \{1, 2\}$ where

$$U_{i,k_{\max},\max} \leq U_{i,k_{\min},\min}.$$

If so, any value of $u_i$ from the interval $[U_{i,k_{\max},\max}, U_{i,k_{\min},\min}]$ can be achieved by using the following strategies:

$$p^{\mathbf{k}}_i = \begin{cases} 1 + \left(1 - \frac{U_{i,\mathbf{k}}}{u_i}\right) b_i, & \text{if player } i \text{ plays action 1 in state } \mathbf{k}, \\ \left(1 - \frac{U_{i,\mathbf{k}}}{u_i}\right) b_i, & \text{otherwise,} \end{cases}$$

where $b_i$ is chosen such that, if $k_{\min} = 1$ and $k_{\max} = 2$, then

$$0 < b_i \leq \min\left(\frac{-1}{1 - \frac{U_{i,1,\max}}{u_i}},\ \frac{1}{1 - \frac{U_{i,2,\min}}{u_i}}\right),$$

and if $k_{\min} = 2$ and $k_{\max} = 1$, then

$$\max\left(\frac{-1}{1 - \frac{U_{i,1,\min}}{u_i}},\ \frac{1}{1 - \frac{U_{i,2,\max}}{u_i}}\right) \leq b_i < 0.$$

III. A NON-COOPERATIVE GAME FOR SHARING LICENSED SPECTRUM
In this section, we apply our results to design strategies for sharing licensed spectrum bands. We consider a general model for spectrum sharing that involves $N$ service providers, indexed $i = 1, 2, \cdots, N$, sharing a channel of bandwidth $W$. We assume that the channel is primarily licensed to a single entity that we refer to as the license holder. We consider a cold leasing model where the license holder may not deploy any network or equipment, and thus offers the channel to the service providers without access coordination. This model is one of several models that have been suggested for treating spectrum bands as private commons, where the ultimate ownership of spectrum is preserved by the license holder (see for example [8]).

We model the underlying communication system as an interference channel where, at times when the channel is less congested, the service providers create less interference to each other and can thus achieve better throughput rates. We focus on the downlink and assume that the service providers have fixed pools of end-users co-located within a certain geographical area. See Figure 2 for a description of this model. Let $\mathcal{S}_i$ denote the set of end-users of service provider $i$. The license holder regulates channel access by imposing a limit on the maximum transmission power of each service provider. It also allocates the underlying code space for transmission to individual end-users. Power and code allocations are normally negotiated with the license holder and provided through "secondary provider" contracts.

We follow a simple model of common-channel interference under CDMA, where transmission of a service provider to a given end-user appears as noise to all other end-users, including those belonging to other service providers. While interference cancellation techniques could still be applied, they are precluded in this model due to practical limitations such as decoder complexity and delay constraints.
Similar assumptions have been widely used in the literature on interference channels; see for example [12].

The service providers use power control to maintain a certain throughput by controlling their transmission power levels on the downlinks. In interference channels, an increase in the transmission power on one of the downlinks causes interference on the other links, and thus a degradation in the Signal to Interference and Noise Ratio (SINR) at the receiving sides of those links. In this regard, let $\Lambda_{i,\max}$ denote the maximum transmission power allocated to service provider $i$. A power control scheme of a service provider specifies the transmission power allocated to each end-user on the downlink. Let the vector $\lambda_i = (\lambda_{i,1}, \lambda_{i,2}, \cdots, \lambda_{i,|\mathcal{S}_i|})$ denote the power control scheme of service provider $i$.

[Figure 2: service providers $i$ and $j$ transmit to end-user 1 and end-user 2 over path gains $h_{i,1}$, $h_{j,1}$, $h_{i,2}$, $h_{j,2}$.]

Fig. 2. A channel access model where a number of service providers share a channel and provide downlink services to groups of end-users spatially located in overlapping coverage areas. Each end-user receives service from its designated service provider, but also gets interference from other service providers.

If all service providers transmit at their maximum power levels, the SINR of end-user $k \in \mathcal{S}_i$ can be represented by

$$\gamma_{i,k}(\lambda_i) = \frac{\lambda_{i,k} h_{i,k}}{\sigma_k + h_{i,k}(\Lambda_{i,\max} - \lambda_{i,k}) + \sum_{j \neq i} h_{j,k} \Lambda_{j,\max}},$$

where $h_{i,k}$ is the path gain between the base station of service provider $i$ and end-user $k$, and $\sigma_k$ is the noise power at end-user $k$ [20]. The achievable throughput rate on the downlink of user $k \in \mathcal{S}_i$ can be obtained using Shannon's formula

$$r_{i,k}(\lambda_i) = W \log_2\big(1 + \gamma_{i,k}(\lambda_i)\big), \tag{11}$$

and the aggregate rate on the downlink of service provider $i$ is thus given by

$$R_i(\lambda_i) = \sum_{k \in \mathcal{S}_i} r_{i,k}(\lambda_i). \tag{12}$$

It is plausible to measure the utilities of service providers from sharing the channel by the quality of service they provide on the downlinks.
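The SINR expression and the rates in (11) and (12) can be evaluated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function names and all numerical values (bandwidth, powers, gains, noise) are hypothetical, the logarithm is taken in base 2 so that rates are in bits per second, and the interference from other providers is passed in as a precomputed sum of $h_{j,k} \Lambda_{j,\max}$ per end-user.

```python
import math


def sinr(power, gain, noise, max_power, interference):
    """SINR of one end-user.

    The denominator adds the noise power, the interference from the
    provider's own remaining power (max_power - power), and the
    interference from the other providers.
    """
    return (power * gain) / (noise + gain * (max_power - power) + interference)


def aggregate_rate(bandwidth, powers, gains, noises, max_power, interferences):
    """Sum of the per-user Shannon rates of one provider, as in (11)-(12)."""
    total = 0.0
    for lam, h, sigma, i_k in zip(powers, gains, noises, interferences):
        gamma = sinr(lam, h, sigma, max_power, i_k)
        total += bandwidth * math.log2(1.0 + gamma)  # base 2 is an assumption
    return total


# Hypothetical two-user example: the powers sum to the provider's maximum.
W = 1.0e6
powers = [0.6, 0.4]
gains = [0.5, 0.3]
noises = [0.01, 0.01]
max_power = 1.0

rate_alone = aggregate_rate(W, powers, gains, noises, max_power, [0.0, 0.0])
rate_shared = aggregate_rate(W, powers, gains, noises, max_power, [0.2, 0.1])
```

Comparing `rate_alone` with `rate_shared` shows how interference from another provider lowers the aggregate rate even when the provider's own power allocation is unchanged.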
One important measure of this quality of service is the average delay of packet delivery, which can be reduced by improving the rate on the downlink. Let $\bar{R}_i$ be the long-term average downlink rate of service provider $i$. We denote the utility of the service provider by the function $U_i(\cdot)$, which is strictly increasing in $\bar{R}_i$.

Footnote (on the path gain $h_{i,k}$): In fact, it is a path attenuation since $h_{i,k} < 1$. In practice, path attenuations may be obtained by use of pilot signals, or may not be explicitly discovered if the power control mechanism is performed by adaptive dithering.

A distinctive feature of secondary utilization of licensed spectrum bands is that the utility of secondary users, i.e., service providers, is discounted by some cost paid in the form of a fee to the license holder. See for example [1], [21], [22] for studies that involve economic models and pricing techniques for secondary spectrum utilization. Here, we consider a pricing scheme where the license holder charges the service providers on a usage basis per unit of data transmitted on the downlinks. Let $c_i$ denote the price charged to service provider $i$ per unit of data transmitted on the channel. Thus, the optimal aggregate rate $\bar{R}^*_i$ of the service provider is a solution of the optimization problem

$$\max_{\bar{R}_i} \; U_i(\bar{R}_i) - c_i \bar{R}_i,$$

which has a unique solution if $U_i$ is strictly concave.

From the standpoint of service provider $i$, achieving $\bar{R}^*_i$ requires the service provider to transmit at a certain power level, taking into consideration the interference created by other service providers. In the absence of central coordination, some service providers may unpredictably change their transmission power levels to adapt their rates to their demand, thus causing variations in the interference to the other service providers. Sharing an interference channel with users that transmit at varying power levels can be modeled as a non-cooperative iterated game, where it can be assumed that the channel is offered to the service providers in rounds.
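As a small worked instance of the utility maximization problem above, the following sketch assumes a logarithmic utility $U_i(\bar{R}_i) = w \log(1 + \bar{R}_i)$. This utility, the weight `w`, the price value, and the function names are hypothetical choices made only for illustration; the paper does not fix a particular $U_i$. For this choice the first-order condition gives $\bar{R}^*_i = w/c_i - 1$ when that value is non-negative, and a generic ternary search over a bounded interval is included as a cross-check for other strictly concave utilities.

```python
import math


def net_utility(rate, weight, price):
    """Hypothetical objective: log utility minus a usage-based fee."""
    return weight * math.log(1.0 + rate) - price * rate


def optimal_rate_closed_form(weight, price):
    """Maximizer of net_utility over rate >= 0 for the assumed log utility."""
    return max(weight / price - 1.0, 0.0)


def maximize_concave(objective, low, high, iterations=200):
    """Ternary search for the maximizer of a strictly concave function."""
    for _ in range(iterations):
        third = (high - low) / 3.0
        left, right = low + third, high - third
        if objective(left) < objective(right):
            low = left
        else:
            high = right
    return 0.5 * (low + high)


weight, price = 4.0, 2.0
r_closed = optimal_rate_closed_form(weight, price)
r_search = maximize_concave(lambda r: net_utility(r, weight, price), 0.0, 10.0)
```

The two values agree for this example; with a price high enough that `weight / price` falls below one, the closed form returns zero, meaning the provider prefers not to use the channel at all.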
In each round, the service providers choose their transmission power levels, which can vary from round to round according to their anticipated demand levels. Similar models for sharing interference channels have been considered, for example, in [12].

We refer to this game as the iterated power control game. In each round of the game, the service providers adapt their power levels based on some history of the game to maintain $\bar{R}_i^*$ regardless of the power control strategy of the opponents. The theory of zero-determinant strategies presented in Section II helps provide guidelines for power control in such environments that involve uncoordinated spectrum access. In the sequel, we present the iterated power control game for the case of two service providers, where the service providers fix their power allocation schemes $\lambda_i$, but take binary decisions in each round on whether or not to engage their users. We identify the range of values of $\bar{R}_i^*$ that can be achieved and characterize strategies for achieving these values.

                                 provider 2
    provider 1      Access                            No Access
    Access          $(\theta_1 R_1, \theta_2 R_2)$    $(R_1, 0)$
    No Access       $(0, R_2)$                        $(0, 0)$

Fig. 3. Payoff matrix of the power control game. Service providers 1 and 2 achieve rates $R_1$ and $R_2$, respectively, if they access the channel solely. If both providers access the channel simultaneously, then provider $i$ achieves $\theta_i R_i$.

A. Iterated Power Control Game with Two Service Providers and Binary Action Space
Consider an iterated power control game with two service providers labeled 1 and 2. The payoff matrix of the single-shot game is shown in Figure 3. For ease of exposition, we will assume that in each round of the game the service providers choose either to transmit at a certain power level or at zero power. That is, the service providers can choose between two actions: either to access or not to access the channel. We assume that if only one service provider accesses the channel, it achieves downlink rate $R_1$ if it is service provider 1 and rate $R_2$ if it is service provider 2. Both $R_1$ and $R_2$ are described by (12). If both service providers access the channel, then potentially both achieve lower rates $\theta_i R_i$, $i = 1, 2$, with $0 < \theta_i < 1$. Trivially, the service provider that does not access the channel in any given round achieves no rate. However, the game can be extended to include power values that are not necessarily limited to $0$ and $\Lambda_{i,\max}$. Furthermore, extensions to more than two service providers follow directly from the generalization in Proposition 1.

(Footnote: Here, we assume a stationary model of user path gains. Namely, mobility of users may result in time-varying path gains, as when a user enters a deep fade region or is handed off to a neighboring base station of the same provider or to another provider in the same region. To handle such transients, the power control iterations need to be performed on a faster time-scale.)

The payoff matrix of this game has a structure identified by Theorem 1. It allows any of the service providers to exert control on its rate. Specifically, let 1 denote access and 2 denote no access. In any round, from the perspective of service provider $i$, the game can be in one of four possible states given by the set

$$\Omega_i = \{(1,1), (1,2), (2,1), (2,2)\},$$

where the first element of a tuple refers to an action by service provider $i$ and the second element refers to an action by the other service provider. Let $n_i(t) \in \{1, 2\}$ denote the action of service provider $i$ at round $t$ of the game. Also let $n(t) = (n_i(t), n_j(t))$, where $j$ is the other service provider, and let

$$p_i^{k} = \Pr\big(n_i(t+1) = 1 \mid n(t) = k\big), \quad \forall k \in \Omega_i.$$

Therefore, following the results in Theorem 1, service provider $i$ can fix its long-term rate, $\bar{R}_i$, at any value in the interval $(0, \theta_i R_i]$ by accessing the channel in each round of the game according to the following policy

$$p_i^{1,1} = 1 + (1 - \theta_i R_i / \bar{R}_i)\, b_i, \qquad (13)$$
$$p_i^{1,2} = 1 + (1 - R_i / \bar{R}_i)\, b_i, \qquad (14)$$
$$p_i^{2,1} = b_i, \qquad (15)$$
$$p_i^{2,2} = b_i, \qquad (16)$$

where $b_i$ is chosen such that

$$0 < b_i \le \frac{1}{|1 - R_i / \bar{R}_i|}.$$

Obtaining values of $R_1$, $R_2$, $\theta_1$, $\theta_2$, which identify the range of possible fixations of the outcome of the game and the associated access strategies, hinges on the underlying power allocation scheme, $\lambda_i$, applied by the service providers on the downlinks. In the following, we derive lumped parameters for computing these values for the max-min power allocation scheme that maximizes the minimum rate on the downlinks. Specifically, for service provider $i$, the max-min scheme requires solving the following optimization problem

$$\max_{\lambda_i} \min_{k \in \mathcal{S}_i} \gamma_{i,k}(\lambda_i) \quad \text{subject to} \quad \sum_{k \in \mathcal{S}_i} \lambda_{i,k} = \Lambda_{i,\max},$$

where it is implied that the service providers transmit at the maximum allowed power $\Lambda_{i,\max}$. A solution of this problem results in equal rates $r_i$ on all the downlinks of service provider $i$.

To compute $R_i$, consider the case where only service provider $i$ accesses the channel. In this case, the rate achieved at any of the downlinks $k \in \mathcal{S}_i$ is given by

$$r_i = W \log\left(1 + \frac{\lambda_{i,k} h_{i,k}}{\sigma_k + h_{i,k}(\Lambda_{i,\max} - \lambda_{i,k})}\right),$$

and thus $R_i = |\mathcal{S}_i|\, r_i$. Equal rates can be maintained on all the downlinks by choosing $\lambda_{i,k}$ and $\lambda_{i,l}$ for all $k, l \in \mathcal{S}_i$ such that

$$\frac{h_{i,k} \lambda_{i,k}}{\sigma_k + h_{i,k}(\Lambda_{i,\max} - \lambda_{i,k})} = \frac{h_{i,l} \lambda_{i,l}}{\sigma_l + h_{i,l}(\Lambda_{i,\max} - \lambda_{i,l})},$$

which can be equivalently written as

$$\frac{h_{i,k} \lambda_{i,k}}{\sigma_k + h_{i,k} \Lambda_{i,\max}} = \frac{h_{i,l} \lambda_{i,l}}{\sigma_l + h_{i,l} \Lambda_{i,\max}} =: K, \quad \forall k, l \in \mathcal{S}_i.$$

Thus,

$$\lambda_{i,k} = K \left(\frac{\sigma_k}{h_{i,k}} + \Lambda_{i,\max}\right), \quad \forall k \in \mathcal{S}_i.$$
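The equal-SINR allocation for a provider that is alone on the channel can be sketched numerically: each power is proportional to $\sigma_k / h_{i,k} + \Lambda_{i,\max}$, and the constant $K$ is whatever makes the powers add up to the budget $\Lambda_{i,\max}$. The function names and channel values below are hypothetical, and the base-2 logarithm is an assumption of this sketch.

```python
import math
from typing import List, Sequence


def max_min_powers(gains: Sequence[float], noise: Sequence[float],
                   p_max: float) -> List[float]:
    """Equal-SINR power split for a provider that is alone on the channel.

    Uses lambda_{i,k} = K * (sigma_k / h_{i,k} + Lambda_{i,max}), with K
    chosen so that the powers add up to Lambda_{i,max}.
    """
    weights = [s / h + p_max for s, h in zip(noise, gains)]
    scale = p_max / sum(weights)  # the constant K fixed by the power budget
    return [scale * w for w in weights]


def solo_rate(gains: Sequence[float], noise: Sequence[float],
              p_max: float, bandwidth: float = 1.0) -> float:
    """R_i = |S_i| * r_i when only provider i accesses the channel.

    The base-2 logarithm is an assumption of this sketch.
    """
    powers = max_min_powers(gains, noise, p_max)
    # Every user sees the same SINR, so it is enough to evaluate user 0.
    sinr0 = powers[0] * gains[0] / (noise[0] + gains[0] * (p_max - powers[0]))
    return len(gains) * bandwidth * math.log2(1.0 + sinr0)


# Hypothetical channel: two end-users with different path gains.
example_powers = max_min_powers(gains=[0.5, 0.25], noise=[0.1, 0.1], p_max=2.0)
print(example_powers, solo_rate([0.5, 0.25], [0.1, 0.1], 2.0))
```

The user with the weaker path gain receives the larger share of the power, which is what equalizing the SINRs requires.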
Note that $\Lambda_{i,\max} = \sum_{k \in \mathcal{S}_i} \lambda_{i,k}$ and therefore, for maximum $R_i$,

$$K = \frac{\Lambda_{i,\max}}{\sum_{k \in \mathcal{S}_i} \frac{\sigma_k}{h_{i,k}} + |\mathcal{S}_i|\, \Lambda_{i,\max}}.$$

$\theta_i$ can be computed in a similar fashion by considering both service providers $i$ and $j$ simultaneously transmitting on the channel and taking into consideration the interference they create to each other. In this case, let $\tilde{r}_i$ denote the rate on each downlink of service provider $i$. Thus, $\theta_i R_i = |\mathcal{S}_i|\, \tilde{r}_i$ where, for all $k \in \mathcal{S}_i$,

$$\tilde{r}_i = W \log\left(1 + \frac{\lambda_{i,k} h_{i,k}}{\sigma_k + h_{i,k}(\Lambda_{i,\max} - \lambda_{i,k}) + h_{j,k} \Lambda_{j,\max}}\right).$$

The power distribution on the downlinks can be obtained by equalizing all the rates. Thus,

$$\lambda_{i,k} = \tilde{K} \left(\frac{\sigma_k}{h_{i,k}} + \Lambda_{i,\max} + \Lambda_{j,\max} \frac{h_{j,k}}{h_{i,k}}\right), \quad \forall k \in \mathcal{S}_i,$$

where

$$\tilde{K} = \frac{\Lambda_{i,\max}}{\sum_{k \in \mathcal{S}_i} \frac{\sigma_k + \Lambda_{j,\max} h_{j,k}}{h_{i,k}} + |\mathcal{S}_i|\, \Lambda_{i,\max}}.$$

IV. NUMERICAL STUDY
In this section, we provide insight into the zero-determinant strategies for the $2 \times 2$ game described in Figure 3. The structure of these strategies is given by formulae (13)-(16). Without loss of generality, we consider a symmetric game with $R_1 = R_2 = 1.0$ and $\theta_1 = \theta_2 = 0.5$. Here, each service provider can fix $\bar{R}_i$ to values in the range $(0, 0.5]$. From the standpoint of service provider 1, i.e., the row player, the zero-determinant strategies $(p^{1,1}, p^{1,2}, p^{2,1}, p^{2,2})$ for a given $\bar{R}_1$ have the following structure

$$\left(1 + \Big(1 - \frac{0.5}{\bar{R}_1}\Big) b, \;\; 1 + \Big(1 - \frac{1}{\bar{R}_1}\Big) b, \;\; b, \;\; b\right), \qquad (17)$$

where $0 < b \le 1 / |1 - 1/\bar{R}_1|$.

A. Convergence of the Zero-Determinant Strategies

Fig. 4. Rate convergence under zero-determinant strategies in a $2 \times 2$ channel sharing game with $R_1 = R_2 = 1.0$ and $\theta_1 = \theta_2 = 0.5$. Service provider 2 (column player) uses a random strategy with a fixed probability of access in each round. (Axes: average rate versus round. Curves: $\bar{R}_1^* = 0.5$, strategy (1, 0, 1, 1); $\bar{R}_1^* = 0.25$, strategy (2/3, 0, 1/3, 1/3); $\bar{R}_1^* = 0.1$, strategy (5/9, 0, 1/9, 1/9).)
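The kind of experiment summarized in Fig. 4 can be sketched in a few lines. This is an illustrative simulation, not the authors' code: the function names, the starting state, the random seed, and the opponent's access probability of 0.5 are assumptions. The extra cap $b \le 1$ is a safeguard added in this sketch so that $p^{2,1} = p^{2,2} = b$ is always a valid probability.

```python
import random
from typing import List, Tuple

Strategy = Tuple[float, float, float, float]  # (p11, p12, p21, p22)


def zd_strategy(target: float, solo_rate: float, theta: float,
                b: float) -> Strategy:
    """Access probabilities of equations (13)-(16) for a target rate."""
    if not 0.0 < target <= theta * solo_rate:
        raise ValueError("target must lie in (0, theta * solo_rate]")
    # Upper limit on b from the text, additionally capped at 1 in this
    # sketch so that p21 = p22 = b is itself a probability.
    b_max = min(1.0, 1.0 / abs(1.0 - solo_rate / target))
    if not 0.0 < b <= b_max + 1e-12:
        raise ValueError("b outside the feasible range")

    def clip(x: float) -> float:
        # Guard against tiny negative values caused by rounding.
        return min(1.0, max(0.0, x))

    p11 = clip(1.0 + (1.0 - theta * solo_rate / target) * b)
    p12 = clip(1.0 + (1.0 - solo_rate / target) * b)
    return (p11, p12, b, b)


def simulate(strategy: Strategy, solo_rate: float, theta: float,
             opponent_prob: float, rounds: int, seed: int = 0) -> List[float]:
    """Running average rate of provider 1 against a random opponent."""
    rng = random.Random(seed)
    access_prob = dict(zip([(1, 1), (1, 2), (2, 1), (2, 2)], strategy))
    state = (2, 2)  # arbitrary starting state (assumption)
    total, averages = 0.0, []
    for t in range(1, rounds + 1):
        own = 1 if rng.random() < access_prob[state] else 2
        other = 1 if rng.random() < opponent_prob else 2
        if own == 1:
            total += theta * solo_rate if other == 1 else solo_rate
        state = (own, other)
        averages.append(total / t)
    return averages


for target, b in [(0.5, 1.0), (0.25, 1 / 3), (0.1, 1 / 9)]:
    p = zd_strategy(target, solo_rate=1.0, theta=0.5, b=b)
    final = simulate(p, 1.0, 0.5, opponent_prob=0.5, rounds=20000)[-1]
    print(target, p, final)
```

Changing `opponent_prob` should leave the long-run average close to the chosen target, which is the defining property of these strategies; only the speed of convergence is expected to change.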
First, consider the deterministic strategy (1, 0, 1, 1), which corresponds to $b = 1$ and which allows the service provider to achieve a long-term average rate $\bar{R}_1 = 0.5$. Assume the strategy is played against service provider 2, which accesses the channel in each round with a fixed probability. Figure 4 shows the average rate of service provider 1 at different rounds of the game as it converges, in the long term, to the value 0.5. Convergence paths are also provided for the strategies (2/3, 0, 1/3, 1/3) and (5/9, 0, 1/9, 1/9), which lead, respectively, to $\bar{R}_1 = 0.25$ and $\bar{R}_1 = 0.1$.

A common factor of these strategies is that $p^{1,2} = 0$, which means that if the service provider ends up using the channel alone in any round, it will not access the channel in the next round. Strategies with this property can be obtained by setting $b$ at the highest possible value. These rectifying strategies guarantee that there will be no long periods of deviation from $\bar{R}_1$, which explains why all the previous strategies converge relatively quickly to $\bar{R}_1$.

Figure 5 shows convergence paths of different strategies that achieve $\bar{R}_1 = 0.5$, including the strategy (1, 0, 1, 1). All the strategies are played against the same service provider, which accesses the channel with a fixed probability in each round. Note that as $p^{1,2}$ increases, strategies take longer to converge. This is because, if the service provider accesses the channel in one round, then as $p^{1,2}$ increases, it is more likely that the service provider will access the channel in the next round, and thus it is more likely to deviate further from $\bar{R}_1$. Meanwhile, by formula (17), an increase in $p^{1,2}$ is accompanied by a decrease in $p^{2,1}$ and $p^{2,2}$. This leads to long periods of channel access that are followed by long periods of no channel access, and thus the strategy tends to converge relatively slowly.

Fig. 5. Convergence paths of different zero-determinant strategies that can be applied by service provider 1 (row player) in the power control game. Strategies with higher $p^{1,2}$ are slower to converge, since as $p^{1,2}$ increases, it becomes more likely for the service provider to access the channel in the next round if it already accessed the channel in the previous round. This leads to longer periods of deviation from $\bar{R}_i$, and thus, longer convergence times. (Axes: average rate versus round. Curves: strategies (1, 0, 1, 1), (1, 1/2, 1/2, 1/2), (1, 2/3, 1/3, 1/3), and (1, 9/10, 1/10, 1/10).)

A less obvious conclusion can be drawn from games of more than two service providers. For example, consider the game with three service providers $i, j, k$. Assume that in any round of the game, service provider $i$ achieves one of the following rates:

$$\begin{cases} R_i, & \text{if it accesses the channel alone,} \\ \alpha_1 R_i, & \text{if it accesses the channel with another provider,} \\ \alpha_2 R_i, & \text{if all the service providers access the channel,} \\ 0, & \text{otherwise,} \end{cases}$$

where $0 < \alpha_1, \alpha_2 < 1$. Let $x_i, x_j, x_k$ denote, respectively, the actions of players $i, j, k$ in any round, where $x_i, x_j, x_k \in \{1, 2\}$, such that 1 implies access and 2 implies no access. Let $p_i^{x_i, x_j, x_k}$ denote the probability that service provider $i$ accesses the channel if the state of the game was $(x_i, x_j, x_k)$ in the previous round.

Following Proposition 1, a zero-determinant strategy allows service provider $i$ to fix $\bar{R}_i$ at any value in the interval $(0, \alpha_2 R_i]$. The structure of the policy is given by:

$$p_i^{1,1,1} = 1 + \Big(1 - \frac{\alpha_2 R_i}{\bar{R}_i}\Big) b_i,$$
$$p_i^{1,1,2} = p_i^{1,2,1} = 1 + \Big(1 - \frac{\alpha_1 R_i}{\bar{R}_i}\Big) b_i,$$
$$p_i^{1,2,2} = 1 + \Big(1 - \frac{R_i}{\bar{R}_i}\Big) b_i,$$
$$p_i^{2,1,1} = p_i^{2,1,2} = p_i^{2,2,1} = p_i^{2,2,2} = b_i,$$

where $0 < b_i \le 1 / |1 - R_i / \bar{R}_i|$.

Figure 6 shows convergence paths of different strategies when $R_i = 1.0$, $\alpha_1 = 1/2$, and $\alpha_2 = 1/3$. All the strategies aim to fix $\bar{R}_i$ at the maximum possible value, 1/3, where service providers $j$ and $k$ each access the channel at each round with their own fixed probabilities. A strategy is displayed in the figure by an 8-element tuple where the first 4 elements correspond to $p_i^{1,1,1}$, $p_i^{1,1,2}$, $p_i^{1,2,1}$, and $p_i^{1,2,2}$, respectively. Note that, since $\bar{R}_i$ is fixed at the maximum value, $p_i^{1,1,1} = 1$ for all the strategies. The pattern observed in Figure 5 applies to Figure 6, where strategies that converge quickly are the strategies that have lower $p_i^{1,1,2}$, $p_i^{1,2,1}$, and $p_i^{1,2,2}$, i.e., these are the strategies that are less likely to access the channel if they achieved more than the targeted rate, 1/3, in the previous round.

Fig. 6. Convergence paths for multiple zero-determinant strategies in the power control game with three service providers. Strategies that are more likely to rectify, if exceeding the targeted rate, are the strategies that converge relatively quicker. (Axes: average rate versus round. Curves: strategies (1, 3/4, 3/4, 0, 1/2, 1/2, 1/2, 1/2), (1, 5/6, 5/6, 1/3, 1/3, 1/3, 1/3, 1/3), and (1, 19/20, 19/20, 4/5, 1/10, 1/10, 1/10, 1/10).)

B. Zero-Determinant Strategies and Power Consumption
Next, we investigate the impact of the zero-determinant strategies on the average power consumption of the service providers. In the considered power control game, the service providers take binary decisions in each round on whether or not to access the channel. If the channel is to be accessed, service provider $i$ transmits at the maximum allowed power level $\Lambda_{i,\max}$. Therefore, power consumption over the course of the game has a probability distribution derived from the stationary distribution of the state of the game, $\pi$. In particular, consider a $2 \times 2$ power control game and consider service provider 1, i.e., the row player. The average consumed power is given by

$$\Lambda_{1,\mathrm{avg}} = \Lambda_{1,\max}\, (\pi^{1,1} + \pi^{1,2}).$$

Here, $\pi^{1,1}$ is the proportion of rounds where both service providers transmit and $\pi^{1,2}$ is the proportion where only service provider 1 transmits.

Consider the game in Figure 3 and assume that $R_1 = R_2 = 1.0$ and $\theta_1 = \theta_2 = 0.5$. Assume that both service providers use zero-determinant strategies to achieve $\bar{R}_1 = 0.5$ and $\bar{R}_2 = 0.25$. The impact of the different strategies on power savings is shown in Figure 7. The horizontal axis displays possible strategies of service provider 1, with each strategy denoted by a different value of the variable $b$ defined in (17). All the values of $b$ are taken from the feasible range given by (17), i.e., $b \le 1$, and a common factor of all these strategies is that $p^{1,1} = 1$. We show the proportion of rounds in which service provider 1 accesses the channel, $(\pi^{1,1} + \pi^{1,2})$, where each curve corresponds to a different strategy of service provider 2.
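The proportion of rounds with channel access can be computed from the stationary distribution of the four-state Markov chain induced by the two strategies. The sketch below is only illustrative: the function names, the dictionary representation keyed by the pair (action of provider 1, action of provider 2), the iteration count, and $\Lambda_{1,\max} = 1$ are assumptions. Provider 2's strategy is built by applying (13)-(16) from its own viewpoint with target rate 0.25 and $b = 1/3$.

```python
from typing import Dict, Tuple

# State = (action of provider 1, action of provider 2); 1 = access, 2 = no access.
State = Tuple[int, int]
STATES = [(1, 1), (1, 2), (2, 1), (2, 2)]


def stationary_distribution(p: Dict[State, float], q: Dict[State, float],
                            iterations: int = 5000) -> Dict[State, float]:
    """Approximate pi by repeatedly applying the one-round transition.

    p[s] and q[s] are the probabilities that providers 1 and 2 access the
    channel after a round that ended in state s. Assumes the induced
    Markov chain is ergodic, so that the iteration converges.
    """
    pi = {s: 0.25 for s in STATES}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in STATES}
        for s, mass in pi.items():
            for a1, pr1 in ((1, p[s]), (2, 1.0 - p[s])):
                for a2, pr2 in ((1, q[s]), (2, 1.0 - q[s])):
                    nxt[(a1, a2)] += mass * pr1 * pr2
        pi = nxt
    return pi


def access_fraction(pi: Dict[State, float]) -> float:
    """pi^{1,1} + pi^{1,2}: share of rounds in which provider 1 transmits."""
    return pi[(1, 1)] + pi[(1, 2)]


# Provider 1 targets 0.5 with b = 0.5, so p^{1,1} = 1 and p^{1,2} = 1 - b.
b = 0.5
p = {(1, 1): 1.0, (1, 2): 1.0 - b, (2, 1): b, (2, 2): b}
# Provider 2 targets 0.25 with b = 1/3; (2, 1) is the state in which
# provider 2 used the channel alone.
q = {(1, 1): 2 / 3, (2, 1): 0.0, (1, 2): 1 / 3, (2, 2): 1 / 3}

pi = stationary_distribution(p, q)
avg_power = 1.0 * access_fraction(pi)  # Lambda_{1,max} = 1.0 is hypothetical
print(pi, avg_power)
```

Sweeping `b` and swapping in other strategies for `q` would trace out curves of the kind plotted in Figure 7.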
Here, the strategies of service provider 2 are denoted by the four-element vector $q$, whose entries, following the convention in Section II, are the probabilities $q^{x,y}$ that service provider 2 will access the channel given the actions $x$ and $y$ played by the two service providers in the previous round.

The figure shows that power consumption of service provider 1 is unimodal in the value of $b$, but can be increasing or decreasing according to the strategy of the opponent. The figure also shows that there exists a trend in the power savings that the service provider can achieve from playing against different opponent strategies. Namely, playing against strategies that have relatively low $q^{1,1}$ leads to more power savings. The intuition behind this observation is that when $q^{1,1}$ is low, service provider 2 is more likely to skip the channel in the next round if both service providers accessed the channel in the current round. Now since $p^{1,1} = 1$, service provider 1 will have the whole channel with probability $1 - q^{1,1}$ in the next round, i.e., transmitting with no interference and thus achieving a high rate.

Fig. 7. Proportion of rounds in which service provider 1 (row player) accesses the channel, displayed for different strategies of service provider 2. Playing against strategies that are more likely to skip the channel when the other service provider accesses the channel leads to power savings. (Horizontal axis: $b$, the strategy of service provider 1. Vertical axis: proportion of rounds with channel access, $\pi^{1,1} + \pi^{1,2}$. Curves: $q$ = (9/10, 7/10, 1/10, 1/10), (6/7, 4/7, 1/7, 1/7), (3/4, 1/4, 1/4, 1/4), (5/7, 1/7, 2/7, 2/7), and (2/3, 0, 1/3, 1/3).)

This argument can be solidified by comparing the access probabilities of the two service providers after rounds in which only one of them accessed the channel, i.e., $p^{1,2}$ and $p^{2,1}$ against the corresponding entries of $q$. Notice that by increasing $b$, $p^{1,2}$ decreases and $p^{2,1}$ increases, and thus, if playing against a strategy in which service provider 2 has a relatively high probability of access after staying off the channel (meaning a low probability of access after using the channel alone), such as $q = (2/3, 0, 1/3, 1/3)$, the gap between the compared values is going to increase.
This means that, if in any round only one service provider accessed the channel, it is more likely for the other service provider to access the channel in the next round, and vice versa, leading to more power savings. On the other hand, the gap decreases when playing against a strategy in which service provider 2 has a relatively low probability of access after staying off the channel and a high probability of access after using the channel alone, such as $q = (9/10, 7/10, 1/10, 1/10)$. In such a case, power consumption is going to increase.

V. CONCLUSION
In this paper, we considered private commons as a model for secondary sharing of licensed spectrum bands. The system involves multiple wireless service providers sharing an interference channel in an uncoordinated fashion and servicing their own populations of co-located end-users. The problem of aggregate downlink power control is formulated as a non-cooperative iterated game. In this regard, we considered a set of Markovian strategies known as "zero-determinant" strategies, which were primarily developed for the iterated Prisoner's Dilemma game and which have been shown to allow players to exert control on each other's score. We extended these strategies to any $2 \times 2$ game, in a way that depends on the structure of the game. We showed that the spectrum sharing game admits an appealing structure that allows service providers to employ power control strategies to set their own aggregate rates regardless of the strategies of other service providers. We provided numerical experiments to study the convergence behavior of these strategies and their impact on power consumption.

ACKNOWLEDGMENT
This work was funded by an NSERC Strategic Project grant and NSF CNS grant 1116626.

REFERENCES
[1] A. Al Daoud, M. Alanyali, and D. Starobinski. Pricing Strategies for Spectrum Lease in Secondary Markets. IEEE/ACM Transactions on Networking, Vol. 18, No. 2, pp. 462–475, April 2010.
[2] T. Alpcan, H. Boche, M.L. Honig, and H. Vincent Poor. Mechanisms and Games for Dynamic Spectrum Allocation, Cambridge Press, 2013.
[3] T. Alpcan, T. Basar, R. Srikant, and Eitan Altman. CDMA Uplink power control as a noncooperative game. in Proc. IEEE Conference on Decision and Control, pp. 197–202, 2001.
[4] E. Altman, K. Avrachenkov, G. Miller, and B. Prabhu. Discrete power control: Cooperative and non-cooperative optimization. in Proc. IEEE INFOCOM, pp. 37–45, 2007.
[5] A. Attar, M.R. Nakhai, A.H. Aghvami. Cognitive Radio game for secondary spectrum access problem. IEEE Transactions on Wireless Communications, Vol. 8, No. 4, pp. 2121–2131, April 2009.
[6] Robert Aumann and Adam Brandenburger. Epistemic Conditions for Nash Equilibrium. Econometrica, Vol. 63, No. 5, pp. 1161–1180, September 1995.
[7] R. Axelrod. The evolution of cooperation. Basic Books, New York, 1984.
[8] M.M. Buddhikot. Understanding dynamic spectrum access: Models, taxonomy and challenges. in Proc. of IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), pp. 649–663, 2007.
[9] M.C. Boerlijst, M.A. Nowak, and K. Sigmund. Equal pay for all prisoners. The American Mathematical Monthly, Vol. 104, No. 4, pp. 303–305, April 1997.
[10] Cisco Visual Networking Index. Global mobile data traffic forecast update, 2011 - 2016, http://tinyurl.com/VNI2012, May 2012.
[11] S.T. Chung, S. Kim, J. Lee, and J.M. Cioffi. A game-theoretic approach to power allocation in frequency-selective Gaussian interference channels. in Proc. of IEEE International Symposium on Information Theory, pp. 136–136, 2003.
[12] R. Etkin, A. Parekh and D. Tse. Spectrum sharing for unlicensed bands. IEEE Journal on Selected Areas in Communications, Vol. 25, No. 3, pp. 517–528, April 2007.
[13] Federal Communications Commission. Promoting efficient use of spectrum through elimination of barriers to the development of secondary markets. Second Report and Order on Reconsideration and Second Further Notice of Proposed Rule Making, 2004.
[14] M. Felegyhazi, J.P. Hubaux, and L. Buttyan. Nash equilibria of packet forwarding strategies in wireless ad hoc networks. IEEE Transactions on Mobile Computing, Vol. 5, No. 5, pp. 463–476, May 2006.
[15] J. Huang, R.A. Berry, and M.L. Honig. Distributed interference compensation for wireless networks. IEEE Journal on Selected Areas in Communications, Vol. 24, No. 5, pp. 1074–1084, May 2006.
[16] J.J. Jaramillo and R. Srikant. DARWIN: distributed and adaptive reputation mechanism for wireless ad-hoc networks. in Proc. of ACM International Conference on Mobile Computing and Networking (Mobicom), 2007.
[17] Y. Jin and G. Kesidis. Distributed Contention Window Control for Selfish Users in IEEE 802.11 Wireless LANs. IEEE Journal on Selected Areas in Communications, Vol. 25, No. 6, pp. 1113–1123, August 2007.
[18] Y. Jin and G. Kesidis. A channel-aware MAC protocol in an ALOHA network with selfish users. IEEE Journal on Selected Areas in Communications, Special Issue on Game Theory in Wireless Communications, Vol. 30, No. 1, pp. 128–137, January 2012.
[19] Y.A. Korilis, A.A. Lazar, A. Orda. Architecting noncooperative networks. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 7, pp. 1241–1251, September 1995.
[20] A.F. Molisch. Wireless Communications, Wiley, 2010.
[21] P.K. Muthuswamy, K. Kar, A. Gupta, S. Sarkar, and G. Kasbekar. Portfolio Optimization in Secondary Spectrum Markets. in Proc. WiOpt, pp. 249–256, 2011.
[22] H. Mutlu, M. Alanyali, and D. Starobinski. Spot Pricing of Secondary Spectrum Access in Wireless Cellular Networks. IEEE/ACM Transactions on Networking, Vol. 17, No. 6, pp. 1794–1804, December 2009.
[23] D. Niyato and E. Hossain. Competitive pricing for spectrum sharing in cognitive radio networks: dynamic game, inefficiency of Nash equilibrium, and collusion. IEEE Journal on Selected Areas in Communications, Vol. 26, No. 17, pp. 192–202, January 2008.
[24] M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press Books, 1999.
[25] W. H. Press and F. J. Dyson. Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences, Vol. 109, No. 26, pp. 10409–10413, June 2012.
[26] Radio Spectrum Policy Group. Report on collective use of spectrum and other sharing approaches. 2011.
[27] C.U. Saraydar, N.B. Mandayam and D. Goodman. Efficient power control via pricing in wireless data networks. IEEE Transactions on Communications, Vol. 50, No. 2, pp. 291–303, February 2002.
[28] B. Wang, Y. Wu, K.J. Ray Liu. Game theory for cognitive radio networks: An overview.