Auction-based Charging Scheduling with Deep Learning Framework for Multi-Drone Networks
MyungJae Shin, Joongheon Kim,
Senior Member, IEEE, and Marco Levorato,
Member, IEEE
Abstract—State-of-the-art drone technologies have severe flight time limitations due to weight constraints, which inevitably lead to a relatively small amount of available energy. Therefore, frequent battery replacement or recharging is necessary in applications such as delivery, exploration, or support to the wireless infrastructure. Mobile charging stations (i.e., mobile stations with charging equipment) for outdoor ad-hoc battery charging are one feasible solution to address this issue. However, the ability of these platforms to charge the drones is limited in terms of the number of drones served and the charging time. This paper designs an auction-based mechanism to control the charging schedule in a multi-drone setting. Charging time slots are auctioned, and their assignment is determined by a bidding process. The main challenge in developing this framework is the lack of prior knowledge of the distribution of the number of drones participating in the auction. Building on the optimal second-price auction, the proposed formulation relies on deep learning algorithms to learn this distribution online. Numerical results from extensive simulations show that the proposed deep learning-based approach provides effective battery charging control in multi-drone scenarios.
Index Terms—Auction, Deep learning, Charging, Drone networks, Unmanned aerial vehicle (UAV)
I. INTRODUCTION
The possibility to use commercial drones in a broad range of applications is being extensively studied by the research community, and drones are expected to support manned operations in remote locations [1]. In general, commercial drones have inherent limitations in the amount of energy available to support their operations. This is due to the energy/weight ratio of current energy storage technologies, where increasing the capacity of the battery beyond a certain point degrades flight time due to excessive weight. As a consequence, effective battery management is one of the main enablers of practical deployments of drone-based technologies and applications. Importantly, in applications requiring extensive flight time, the energy constraint problem can
This research was supported by the Chung-Ang University Graduate Research Scholarship in 2018 (for MyungJae Shin) and also by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00170, Virtual Presence in Moving Objects through 5G). Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. M. Shin and J. Kim are with the School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea (e-mails: [email protected], [email protected]). M. Levorato is with the Department of Computer Science, Donald Bren School of Information and Computer Sciences, University of California at Irvine, Irvine, CA 92697, USA (e-mail: [email protected]). J. Kim is the corresponding author.
Fig. 1: Multi-drone network model for mobile charging stations.
not be solved by only optimizing power consumption. Thus, charging during the completion of long-term tasks has been proposed to extend the operational range of the drones [1], [2]. Several ways of powering the drones have been proposed in the literature. We can divide them into two main classes: (i) harvesting energy directly from the surrounding environment, and (ii) taking energy from an electrical source such as a charging station [2]. Within the latter class of approaches, the charging stations can be either stationary or mobile. However, solutions based on stationary charging stations may constrain the geographical area of operations around specific locations. In order to deal with this issue, mobile charging stations can be used, although they face other challenges [2]. As they are mobile, the size of these charging stations needs to be comparably smaller than that of fixed stations. As a consequence, the capacity of the system has limitations, leading to relatively low charging speeds and a relatively smaller number of drones that can be charged simultaneously [2], [3]. Motivated by this compelling problem, we consider a scenario where multiple drones compete to access the services provided by a mobile charging station (see Fig. 1). The framework proposed in this paper controls the charging process of the drones, where the charging station takes the role of leader in the distributed drone-charging system, and coordination within the system is supported by Internet-of-Vehicle (IoV) networking functions [4]–[6]. We take an econometric approach, where the problem of controlling the scheduling is formulated as an auction, whose objective is to maximize the utility of the drones (i.e., the difference between payment and bid during auction computation) as well as the station's revenue (i.e., the payment received from the drones through charging scheduling).
In general auction problems, buyers (the drones in this system) bid to access services periodically auctioned by a seller (the mobile charging station in the considered setting). The value of the bid is individually, and privately, estimated by each drone based on the urgency of its charging needs. The auction approach is especially useful when there is no accurate estimation of the buyer's true valuation, and buyers are not aware of the private true values of other buyers. In the drone network model considered herein, the drones are assumed to be non-cooperative, that is, they operate independently and in a distributed manner. Furthermore, the mobile charging station is not assumed to know the exact true values associated with each drone, which become available only when the actual values are submitted. The auction approach we take is especially suitable to solve the problem of assigning time slots to drones in this information-limited system. Among the various auction formulations available (e.g., ascending auction, descending auction, first price auction, second price auction), we choose a second price auction formulation, where the highest bidder wins but the price paid is set to the second highest bid.
One of the main benefits of the second price auction is that it results in a truthful auction process. In the considered system model, the mobile station is the auctioneer and owner/seller of the resource (that is, the charging time slot) and each drone is considered as a buyer. The drones compete for scheduling battery charging with price bidding via their own private valuations for the auction. As auctioneer, the mobile station (i) receives all bids from the drones, (ii) calculates the charging time allocation probabilities and payments, (iii) assigns the charging time to the drone (i.e., the winner of the auction) who bids the highest value, corresponding to the largest allocation probability, (iv) announces the value which should be paid by the winner drone, and (v) receives the payment. During the auction, drones strategically submit bids to increase their profits, i.e., utility. Similarly, the resource-owning auctioneer is not a sacrificial seller; it must also consider its own revenue, i.e., remain profitable. Therefore, revenue-optimal auctions have been considered as one of the major objectives in auction design. Although there are many variants already available in the auction theory literature, the problem of simultaneously optimizing the auctioneer's revenue and the buyers' utility is still open [7]–[10]. Among various auction algorithms, the Myerson auction is one of the most efficient revenue-optimal single-item auctions [11]. The auction transforms the bid value, and then the winner and payment are determined based on the transformed bid. If the transformation function is monotonic, the revenue-optimal auction is obtained. Therefore, the proposed design builds a revenue-optimal auction based on the concept of the Myerson auction. However, it is difficult to apply the existing auction as-is in the distributed drone network environment considered in this paper.
The charging scheduling system of drones is still in the early stages of research; key properties of the system, such as the drones' location distribution and residual energy distribution, have not been fully characterized in the literature. Therefore, a system that can extract the desired data (i.e., the distribution of drones) from the actual system without prior knowledge or assumptions is desirable. This paper thus takes advantage of deep learning to learn important features on-the-fly from the operating environment. Recently, frameworks combining game theory and deep learning have been an active subject of research [12], [13]. Results illustrate applications of such an approach in various domains [14]–[16]. The key is that deep learning can automatically extract and learn important features from data, and it has been widely demonstrated that neural network structures can approximate complex non-linear functions [17]–[19]. In this paper, we use this capability to approximate some key, monotonic, functions governing the behavior of the system using relatively simple neural networks [16]. Specifically, we use deep learning to learn the features necessary for the virtual transformation step of Myerson auctions. Then, the proposed auction is configured by replacing the virtual transformation function with the trained deep learning network. We remark that the function to be learned by the deep learning layer is non-decreasing and monotonic [11]. The proposed deep learning network uses the
ReLU (activation function) and softmax (classification function), which are widely used in optimization procedures. In addition, due to the fact that the operations mostly amount to linear multiplications, the proposed approach has low complexity, and its execution takes a limited amount of time.
Contributions.
Our proposed auction-based charging scheduling algorithm makes the following contributions. First, the auctioneer's revenue is preserved even if the drones submit false/fake bids; that is, the proposed algorithm is self-configurable and truthful. Second, the proposed auction automatically learns environmental features. In distributed drone scenarios, various time-varying features exist that make a self-configurable nature essential to adapt to different scenarios and environments. Third, the proposed deep learning based auction structure is simple to implement and imposes a small computational burden.
Organization.
The rest of this paper is organized as follows. Sec. II discusses related work, and then Sec. III describes the auction-based mobile charging model. In Sec. IV, the deep learning based approach is presented. In Sec. V, performance evaluation results are presented. Sec. VI concludes the paper.
II. RELATED WORK
There have been several research results addressing limited-battery and limited-resource scheduling problems through auctions [1], [2], [20], [21]. The method in [1] aims at optimizing battery assignment and drone scheduling, assuming that the battery can be quickly replaced. The joint assignment and scheduling problem is formulated as a two-stage problem, where the assignment problem is solved by a heuristic and the scheduling problem is formulated as an integer-linear programming (ILP) problem. In contrast, this paper proposes a scheduling algorithm based on an auction. The proposed method uses information provided by drones capable of communicating with mobile charging stations to overcome the inability of a central service provider to acquire perfect state information in a distributed drone network. However, a solution based on battery assignment necessarily maps to a stationary service station. This imposes some limitations [2], which are mitigated when using mobile charging stations.
In [2], a system of mobile robots executing a transportation task supported by a charging station is considered. The location of the charging station is a major factor in determining the operations and performance of the robots, and the paper assumes that the mobile charging station is itself an autonomous robot that attempts to incrementally improve its location. Although this work considers a mobile charging station, the problem of charging scheduling is not considered. In a more general scenario, the resources of the charging station are limited and the number of robots to be charged may be larger than the actual charging capacity of the station. Therefore, the charging system will need to implement forms of prioritization to optimize the charging process. The method proposed herein incorporates a notion of priority using an auction formulation based on the valuation of the drones. In [20], an auction mechanism is proposed to solve a resource allocation problem in a distributed computing system. The inherently distributed nature of the system makes the resolution of the problem much harder. The paper proposes an auction-based solution to address this challenge. The mechanism consists of two auctions, used to compute optimal solutions at the single-unit level within the distributed scheduling problem in a computationally efficient manner. However, [20] assumes prior knowledge of the environment where the auction mechanism is executed, which may limit its application in real-world distributed scenarios. The method we propose herein uses an auction-based solution to solve the resource allocation problem, and employs deep learning to extract the required features automatically from the environment, so that prior knowledge is not necessary. The method in [21] addresses a distributed train scheduling problem using an auction method. The determination of the winner is formulated as a mixed-integer problem.
The bidding strategy of the buyers is solved via dynamic programming. In the proposed method, the auctioneer computes the set of bids that maximizes revenue. Both the method proposed in [21] and the one proposed in this paper are based on an auction formulation to effectively solve resource scheduling in distributed environments and maximize the revenue of the auctioneer. However, the method in [21] differs from the proposed deep learning based auction in terms of the prior information required to conduct the auction. The deep learning-based auction proposed in this paper only requires limited information, since it can learn environmental characteristics and parameters in real time.
III. CHARGING SCHEDULING MECHANISM DESIGN
Drone Network Model.
The system is composed of the mobile charging station S and U drones. The mobile station is governed by the charging service controller, and the service controller collects revenue by providing charging services. The revenue of the charging service controller is recorded and will be requested later to be paid by drone operators. This paper assumes that the mobile station can provide charging service to only one drone in each time slot; thus, drones compete to obtain charging opportunities. The notation used in this paper is summarized in Table I.
TABLE I: Notations
  U                 The number of drones
  S                 Mobile charging station
  B                 Bid profiles
  B_t               t-th bid profile
  u_i               i-th user
  c_i               Maximum battery capacity of u_i
  r_i               Remaining battery capacity of u_i
  e_i               Average amperage draw of u_i
  h_i               Battery discharge of u_i
  f                 Charging rate per unit time
  t_i               Scheduled charging time to u_i
  q_i               Amount of energy charged of u_i
  l_i               Flight time with current battery of u_i
  v_i               The valuation of u_i
  b_i               The bid of u_i
  b̄_i               The transformed bid of u_i
  g_i               Allocation probability of u_i
  p̄_i               The virtual payment of u_i
  p_i               Actual payment of u_i
  φ_i               The forward transformation function for u_i
  w^i_{g,n}         Weight of g-th group, n-th unit for u_i
  w^shared_{g,n}    Weight of g-th group, n-th unit of φ_shared
  β^i_{g,n}         Bias of g-th group, n-th unit for u_i
  β^shared_{g,n}    Bias of g-th group, n-th unit of φ_shared
  utility(u_i)      The utility of u_i
  G                 The number of groups in the network
  N                 The number of units in a group
  R                 The number of epochs
  T                 The number of bid sets
Note that we consider a short-range Internet of Vehicles (IoV) multi-drone network supporting short-distance communications among drones based on IEEE 802.11-based wireless local area network (WLAN) technologies. Therefore, the size of the network composed of one single mobile charging station and multiple drones is relatively small, and we assume that the flight time from the drones' current positions to the mobile charging station is negligible. Thus, unexpected operational problems due to the delay induced by long flight time toward the mobile charging station are not considered in this paper. Furthermore, we note that the specific design, system capabilities and state of the drones participating in the auction can vary in terms of battery capacity, residual battery, charging rates and so forth. Formally, each drone u_i is characterized by the battery capacity c_i, average amperage draw e_i, and battery residual charge h_i, which determines the mission lifetime.
The average amperage draw e_i denotes the amount of amperage required by the drone to operate on-board systems such as motors, embedded computers, sensors, etc. Each drone continuously monitors its own state and requests the scheduling of a charging slot to the mobile station if needed. The requests from multiple drones to the mobile station for charging services can be interpreted as a distributed competition for a limited resource, which here is modeled and solved using an auction-based approach. In the considered setting, the auctioneer is the mobile station, which is also the owner and provider of the resource, and the drones are the buyers. The mobile station and drones exchange information, i.e., bids and other auction variables, over wireless links. The mobile station announces the start of the auction to the drones when the charging system is ready to serve (i.e., idle).
Fig. 2: Auction procedure.
Upon reception of the announcement, each drone makes its own private, and independent, valuation for the use of the charging system. The private valuation v_i of drone u_i is used to compete for the charging service. Note that the charging resource is assigned at the granularity of individual time slots, as illustrated in Fig. 2. The mobile station S sells the charging service and obtains revenue p_i paid by the winner drone u_i via the auction.
Drone Scheduling Auction Design.
We use the second price auction (SPA) as a baseline to design the auction in the considered setting. In SPA, all buyers submit their bids privately. The auctioneer receives the sealed bids and selects as winner the buyer who made the highest bid. The amount paid by the winner is set to be equal to the second highest bid value. Herein, the problem of assigning slots to drones is formulated as a single item auction based on SPA. Therefore, drones compete for one item, i.e., the charging service. Since the proposed approach is based on SPA, it is guaranteed that the charging service will be assigned to the drone with the highest valuation of the service [22]–[24]. Myerson presents provable analytical results for single item auctions optimizing the auctioneer revenue where each buyer has its own private valuation of the resource [11], [18]. When an auction-based mechanism is designed, it is important to let the participants act truthfully to ensure system stability [11], [18], [19], [25]–[27]. Previous studies attempted to achieve this objective by enforcing truthfulness on individual participants. Concepts such as incentive compatibility (IC) and individual rationality (IR) are the characteristics of auctions inducing the truthful action of participants. Based on this approach, we use a Myerson auction, which guarantees dominant strategy incentive compatibility (DSIC) and IR, as the baseline mechanism.
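As a concrete illustration of the SPA baseline described above, the following minimal sketch (illustrative Python, not from the paper; the function name and reserve handling are our own) selects the winner and the second-price payment from a set of sealed bids:

```python
def second_price_auction(bids, reserve=0.0):
    """Single-item sealed-bid second-price auction (SPA).

    bids: dict mapping bidder id -> bid value.
    Returns (winner, payment): the highest bidder wins and pays
    the second-highest bid (or the reserve price if bidding alone).
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    payment = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, payment

# e.g., three drones bidding for one charging slot
print(second_price_auction({"u1": 5.0, "u2": 3.0, "u3": 4.0}))  # -> ('u1', 4.0)
```

Here u1 wins but pays only the runner-up's bid of 4.0, which is what makes truthful bidding a weakly dominant strategy in SPA.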
Definition 1. (Incentive Compatibility [28]) Incentive compatibility is defined by the following property: for every bidder j, every valuation v_j, all declarations of the other bidders v_{−j}, and all possible "false declarations" v'_j, bidder j's utility with bidding v'_j is no more than its utility with bidding the truth v_j. Formally, let λ_j and P_j be the mechanism's output with input (v_j, v_{−j}), and λ'_j and P'_j be the mechanism's output with input (v'_j, v_{−j}); then v_j(λ_j) − P_j ≥ v_j(λ'_j) − P'_j. Thus, IC, the weaker notion implied by DSIC, means that the utility a participant can obtain by acting truthfully is no less than that obtained by acting falsely, according to Definition 1.
Definition 2. (Dominant Strategy Incentive Compatibility) Dominant strategy incentive compatibility is defined as the following property: for each bidder i, and for every possible report of the other bidders' bids b_{−i}, bidder i weakly maximizes utility by reporting b_i = v_i. That is, for all possible reports b*_i, u_i(v_i, b_{−i}) ≥ u_i(b*_i, b_{−i}). Thus, DSIC is a stronger degree of IC, meaning that a truthful action is a weakly dominant strategy, that is, the action is guaranteed to be the best regardless of the actions of others, as shown in Definition 2.
Definition 3. (Individual Rationality) Individual rationality (IR) is defined by the following property: for every bidder i and for every v_i, we have v_i ≥ p_i; that is, no bidder is ever asked to pay more than its bid valuation. In a DSIC and IR auction, it is in the best interest of each bidder to report truthfully. Therefore, these characteristics make the overall auction truthful. The Myerson auction guarantees DSIC and IR, thus encouraging the bidders to report truthfully [11], [18]. Furthermore, the Myerson auction also guarantees the auctioneer's revenue optimality. In the considered drone network model, we remark that the charging service controller obtains revenue by providing charging services. The following subsections describe in detail the components of the Myerson auction mechanism, i.e., private valuation, allocation rule, payment rule, reserve price, utility, and auction design.
Private Valuation.
In the proposed auction, each drone u_i has its own individual private valuation v_i. Each drone u_i has a maximum battery capacity (denoted by c_i) and a current remaining battery (denoted by r_i). If the drone is assigned the mobile charging service time slot, the charged energy will be added to the residual energy r_i in its own battery. The amount of energy charged by the mobile charging station S can be expressed as q_i = min(f · t_i, c_i − r_i), where f is the charging rate per unit of time in the mobile charging station. The charging time scheduled to u_i via the auction is denoted by t_i; t denotes the item being sold by the auction, i.e., charging time. The higher f · t, the higher the valuation of t by drone u_i, and the more the drone is willing to pay for the charging service. The expected drone flight time with the current battery status is denoted as l_i, calculated as l_i = (r_i · h_i) / e_i. If l_i is larger, the drone will give a smaller valuation to t. Let v_i denote the private valuation of drone u_i. Then, v_i can be expressed as v_i = (f · t) / l_i.
Allocation Rule.
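Referring back to the Private Valuation paragraph above, the quantities q_i and v_i can be evaluated directly; the following is a minimal sketch (illustrative Python, with function and argument names of our own choosing):

```python
def charged_energy(f, t_i, c_i, r_i):
    """q_i = min(f * t_i, c_i - r_i): energy actually delivered,
    capped by the drone's free battery capacity."""
    return min(f * t_i, c_i - r_i)

def private_valuation(f, t, r_i, h_i, e_i):
    """v_i = (f * t) / l_i with l_i = (r_i * h_i) / e_i:
    the shorter the residual flight time l_i, the higher the
    drone's valuation of the auctioned charging time t."""
    l_i = r_i * h_i / e_i
    return f * t / l_i
```

For instance, a drone with r_i = 10, h_i = 1, and e_i = 5 has l_i = 2, so a slot worth f · t = 6 units of energy is valued at v_i = 3.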
The allocation rule g is used to determine the winner drone based on the valuations, i.e., to find which drone u_i should be scheduled for charging. In the Myerson auction, the allocation rule that awards the item to the highest bidder is monotone. Therefore, in the proposed auction model, the allocation rule g used to award the charging service to the highest bidder is monotone, and it can be expressed as follows:

  u* ∈ arg max_{u_i ∈ u} g(v_i). (1)

Payment Rule.
The payment rule p is used to determine the payment by the winner drone u_i based on the valuations. In the proposed auction, the payment rule p chooses a payment which is not higher than the private valuation, and it can be expressed as follows:

  p_i(v_i) ∈ [0, x_i · v_i], (2)

where x_i ∈ {0, 1}, ∀i ∈ {1, . . . , U}, stands for the variable representing the winning valuation in the auction, and u* is the winner drone of the auction.
Reserve Price.
The proposed auction sets a specific price called a reserve price. The reserve price is the minimum reward the seller accepts [11]. In this paper, the reserve price is set to 0. In the auction, the auctioneer solicits the private bids from the bidders and computes the allocation rule g = (g_1, . . . , g_U) and payment rule p = (p_1, . . . , p_U).
Utility.
The proposed auction guarantees DSIC and IR; thus each bidder reports truthfully to maximize its own utility. Note that x_i ∈ {0, 1}, ∀i ∈ {1, . . . , U}, stands for the variable corresponding to winning the auction: if the drone wins the auction, x_i is set to 1, whereas x_i is set to 0 otherwise. Thus, the utility of drone u_i can be calculated as utility(u_i) = g(v_i) − x_i · p_i(v_i), ∀i ∈ {1, . . . , U}.
Revenue Optimal Auction Design.
We define the virtual valuation and virtual surplus as in Myerson [11]. The virtual valuation φ_i(v_i) of a buyer u_i in the auction is a function used to calculate the expected revenue of the auctioneer from that buyer u_i. The virtual surplus is the expected revenue excluding the computing cost defined below. In Myerson auctions, each bidder i has its own individual private valuation v_i, which is drawn from the strictly increasing cumulative density function F_i(v_i), where the probability density function of v_i is denoted as f_i(v_i) [29], [30]. The virtual valuation of bidder i with private valuation v_i can be expressed as follows:

  φ_i(v_i) = v_i − (1 − F_i(v_i)) / f_i(v_i). (3)

There is a cost in computing the outcome, c(g), which must be paid by the auction [31]. Given valuation v_i, virtual valuation φ_i(v_i), and allocation rule g, the virtual surplus can be calculated as follows:

  Σ_{∀i} φ_i(v_i) · x_i − c(g). (4)

In the Myerson auction, the expected payment is proportional to the expected virtual surplus, and it can be computed as follows [11], [31]:

  E_{b_i}[p_i(b_i)] = E_{b_i}[φ_i(b_i) · x_i(b_i)]. (5)

Therefore, if the virtual valuations φ(b) are non-decreasing in the valuations b, the virtual surplus is non-decreasing in the valuations b. The bid b is drawn from the distribution F(b) with probability density function f(b). Then, the expected payment can be computed as follows:

  E_b[p(b)] = ∫_{b=0}^{h} b · g(b) · f(b) db − ∫_{b=0}^{h} g(b) · [1 − F(b)] db (6)
            = ∫_{b=0}^{h} [b − (1 − F(b)) / f(b)] · g(b) · f(b) db (7)
            = E_b[φ(b) · g(b)]. (8)

As a result, the proposed auction approach, which consists of a variant of the Myerson auction, is DSIC, IR, and revenue optimal.
Fig. 3: The proposed deep learning framework (revenue network) for revenue-optimal auction computation.
However, Myerson auctions require full knowledge of the distributions F_1, F_2, . . . , F_U according to Eq. (8).
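When the distribution is known, the transformation in Eq. (3) is directly computable. As an illustrative example (not from the paper), for valuations drawn uniformly from [0, h] we have F(v) = v/h and f(v) = 1/h, so Eq. (3) reduces to φ(v) = v − (h − v) = 2v − h:

```python
def virtual_valuation_uniform(v, h=1.0):
    """Myerson virtual valuation phi(v) = v - (1 - F(v)) / f(v)
    for v ~ Uniform(0, h): F(v) = v / h and f(v) = 1 / h,
    hence phi(v) = v - (h - v) = 2 * v - h."""
    return 2.0 * v - h
```

A bidder with valuation below h/2 then has a negative virtual value and would be excluded by a positive reserve price; it is precisely this closed-form transform that becomes unavailable when F is unknown.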
In the considered scenario, it is hard to obtain such information a priori, and we propose to use deep learning to estimate the distributions. Previous research has shown that deep learning with a limited structure can approximate specific functions [15]–[17]. Specifically, herein, we use neural networks and unsupervised learning [18], [19] to approximate the virtual valuation function φ(v). The strength of deep learning is that the approximated function can be continually updated as inputs are acquired. The use of unsupervised learning makes the learning process possible, as it does not require the true values as input. The resulting auction is not only easily applicable to the distributed multi-drone network problem, but is also capable of adapting to continuously changing environments.
IV. DEEP LEARNING BASED AUCTION DESIGN
In this section, a deep learning based method for single item auctions is introduced. The method defines the allocation rule g, the payment rule p, and the virtual valuation function φ for maximizing the revenue of the mobile charging station via deep learning. The deep learning model constitutes an auction that guarantees DSIC and IR and enables the revenue-optimal computation for the auctioneer [18], [19]. The revenue optimal auction can be configured through a relatively simple deep learning structure, i.e., composed of max/min operations and a loss function shaping the training process.

Theorem 1. (Myerson [11]). There exists a collection of monotonically increasing functions φ_i : V_i → R, referred to as the virtual valuation functions, for selling a single item in the DSIC mechanism, which assigns the item to the buyer i with the highest virtual value φ_i(v_i), assuming this quantity is positive, and charges the winning bidder the smallest bid that ensures that the bidder is winning.

As mentioned earlier, the proposed deep learning based auction is a variant of the Myerson auction; thus the bid set b is transformed to the virtual valuation b̄_i via the virtual valuation transformation. Specifically, as expressed in Theorem 1, the bids b_i, ∀i ∈ {1, . . . , U}, are converted to b̄_i = φ_mononet(b_i), ∀i ∈ {1, . . . , U}, where b̄_i denotes the transformed bid of u_i. In this procedure, the trained deep network φ_mononet (a monotonic network) is utilized to replace the virtual valuation function φ. The φ_mononet consists of two layers, and is composed of linear computation units and min/max operation units. Based on the transformed bid b̄_i, the SPA with reserve price 0 (SPA-0) is performed. The SPA-0 calculates the allocation probability and the payment of the winner drone based on the rules (i.e., payment rule p and allocation rule g) as follows:

Theorem 2. (Myerson [11]). For any set of strictly monotonically increasing functions φ_1, . . .
, φ_U : R_{≥0} → R_{≥0}, an auction defined by the allocation rule g_i = softmax(b̄_i) and the payment rule p_i = φ_i^{−1}(max_{j≠i} φ_j(b_j)) is DSIC and IR.

The φ_mononet should be non-decreasing and monotone when converting b into the transformed bid b̄. Therefore, the proposed deep learning network has a parameter constraint and a specific structure so that the deep learning network can approximate a monotonic function via the training process. The parameters used in the deep learning network, i.e., weights and biases, are positive. The structure of the network, shown in Fig. 3, is rather simple. The two-layer network φ_mononet is represented as φ_shared(φ_i(b_i)), ∀i ∈ {1, . . . , U} [16], [18]. The assignment rule consists of the softmax operation, which has been used in deep learning based multimodal classification. The payment rule is composed of the max operation and ReLU. The
ReLU sets a transformed bid b̄_i which is less than the reserve price of SPA-0 to 0. The max unit is used to make p̄_i the highest transformed bid other than that of bidder i. The results of the ReLU and max units are denoted by p̄; p̄_i is the value, before conversion to p_i, which should be paid by the winner drone u_i. Note that p̄_i can be larger than b_i; therefore, in an IR auction, p̄_i cannot be the payment. Thus p̄_i is converted to p_i via φ^{−1}_mononet. This process makes the result of the deep learning based auction IR when the revenue optimal auction is designed, as shown in Fig. 3. The φ^{−1}_mononet can be expressed as φ^{−1}_i(φ^{−1}_shared(p̄_i)), ∀i ∈ {1, . . . , U}, consistently with Eqs. (9) and (10). The computations of φ^{−1}_shared and φ^{−1}_i are described as follows:

  p'_i = max_{1≤g≤G} { min_{1≤n≤N} (w^shared_{g,n})^{−1} (p̄_i − β^shared_{g,n}) } (9)
  p_i = max_{1≤g≤G} { min_{1≤n≤N} (w^i_{g,n})^{−1} (p'_i − β^i_{g,n}) } (10)

The two-layer network φ_mononet constitutes the virtual valuation function φ of the Myerson auction, as shown in Fig. 3. In the φ^{−1}_mononet, it is important to reuse the weights from the φ_mononet network, as presented in (9) and (10). This forces b_i to be equal to φ^{−1}_mononet(φ_mononet(b_i)), as in the case in which the Myerson virtual valuation function is based on full knowledge of the distribution F. The result of φ^{−1}_mononet is the payment which should be paid by the winner drone u_i. Therefore, the result p_i is greater than the second highest bid of b and smaller than the winning bid b_i [11]. Additional networks are required to implement the rules in the overall auction process. The deep learning networks used in the proposed auction consist of three modular networks as follows: (i) a network that replaces the virtual valuation function φ_i of the Myerson auction, (ii) a network for the allocation rule g_i, and (iii) a network for the payment rule p_i. The above networks are optimized according to a loss function via a training process.
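A single max-min layer of the monotonic network and its inverse can be sketched as follows, in the spirit of Eqs. (9) and (10) (an illustrative NumPy sketch of one layer, not the paper's full two-layer φ_shared(φ_i(·)) composition); positive weights make the forward transform non-decreasing by construction:

```python
import numpy as np

def phi_layer(b, w, beta):
    """Forward max-min monotone layer:
    phi(b) = min over groups g of max over units n of (w[g,n]*b + beta[g,n]),
    with all w > 0, hence non-decreasing in the scalar bid b."""
    return np.min(np.max(w * b + beta, axis=1))

def phi_layer_inv(y, w, beta):
    """Inverse layer, mirroring the shape of Eqs. (9)-(10):
    phi^{-1}(y) = max over g of min over n of (y - beta[g,n]) / w[g,n],
    reusing the same weights and biases as the forward pass."""
    return np.max(np.min((y - beta) / w, axis=1))

w = np.array([[1.0, 2.0], [0.5, 1.5]])    # G=2 groups, N=2 units, all w > 0
beta = np.array([[0.0, -1.0], [0.5, 0.0]])
y = phi_layer(2.0, w, beta)                # transformed bid
b = phi_layer_inv(y, w, beta)              # recovers the original bid, b == 2.0
```

Because the inverse reuses the forward parameters, the round trip φ^{−1}(φ(b)) returns the original bid, which is exactly the property the paper relies on to keep the auction IR.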
The loss function is essential because it allows deep learning networks with the same structure to acquire different characteristics [32], [33], and it plays an important role in deep learning.

Loss Function.
In this paper, the negative expected virtual surplus is used as the loss function, where the virtual surplus is equivalent to the revenue of the mobile charging station, that is, the auctioneer and seller. The loss function is used to train the deep neural network parameters (weights and biases). The deep neural network that configures the revenue-optimal auction is composed of weights (denoted as w) and biases (denoted as β) and replaces the virtual valuation function of Myerson. Here, the deep neural network model automatically learns the distribution and fits its parameters to the actual distribution of the data during the training process; the trained neural networks approximate a virtual valuation function based on full knowledge of the distribution. The parameters w and β are trained through unsupervised learning, without ground-truth information, i.e., the winner (which drone will be scheduled for charging) and the payment (how much the winner drone will pay). Therefore, the results of the allocation and payment rules are used to train the parameters w⃗ and β⃗, which can be expressed as follows:

R(w⃗, β⃗) = − Σ_{i=1}^{U} g_i(b_i) · p_i(b_i)   (11)

where the loss function (11) stands for the expected negative revenue of the auctioneer; minimizing this loss during training therefore maximizes the expected revenue of the auctioneer. Based on this loss function, the benefit of the deep learning-based auction is realized in the training process: the networks that replace the virtual valuation function, as well as the auction rules, are optimized toward a DSIC, IR, and revenue-optimal auction.

Deep Learning Training.
The detailed training process of the three networks is summarized in Algorithm 1. Based on the bid set b, the payment and allocation probabilities are calculated in the forward pass. R(w, β) is the loss function that guides the training of the deep learning networks; the negative expected revenue is used as the loss, and it is calculated from the allocation probabilities g and the payments p. L(w, β) is an L2 regularization term used to regularize the deep learning parameters (weights and biases) and prevent them from becoming excessively large. The training process is based on unsupervised learning, and thus the allocation probabilities and payments are the only required information. This means that environmental information, such as the distribution of private valuations, is not required; as a result, the proposed deep learning network can be easily applied to mobile charging stations. The weight clipping range is determined by means of empirical experiments, as the payments of winners are sensitive to the weight range.

A. Deep Learning Networks
In this paper, the virtual valuation function is replaced by the two-layer network φ_mononet, composed of monotonic networks.

Algorithm 1: Deep Learning Training
  Input: k, U, B = {B_1, ..., B_T}, where each input set B_t ≜ (b_1, ..., b_U)
  Output: optimized weights w and biases β
  Initialize: the network weights w and β using Xavier initialization
  while epoch r : 1 → R do
    while t : 1 → T do
      Forward:
        b'_i = φ_i(b_i) = min_{1≤g≤G} { max_{1≤n≤N} (w^i_{g,n} b_i + β^i_{g,n}) };
        b̄_i = φ_shared(b'_i);
        g_i = softmax(b̄_1, ..., b̄_U; k) = e^{k b̄_i} / Σ_{j=1}^{U} e^{k b̄_j};
        p̄_i = ReLU{ max_{j≠i}(b̄_j) };
        p'_i = φ⁻¹_shared(p̄_i);
        p_i = φ⁻¹_i(p'_i) = max_{1≤g≤G} { min_{1≤n≤N} (w^i_{g,n})^{-1} (p'_i − β^i_{g,n}) };
        Compute the expected negative revenue R(w, β) = − Σ_{i=1}^{U} g_i(b_i) · p_i(b_i);
        Compute the L2 weight loss L(w, β) = Σ_{i=1}^{U} Σ_{g=1}^{G} Σ_{n=1}^{N} { (w^i_{g,n})² + (β^i_{g,n})² };
        Compute Loss(w, β) = R(w, β) + L(w, β);
      Optimize:
        Update w⃗ and β⃗ to minimize Loss(w, β);
        Clip w⃗ to [B, ∞);
        Clip β⃗ to [0, ∞);
    end
  end

Monotonic Network.
As shown in Fig. 3, the monotonic network is a three-layer deep neural network. The input layer is configured with multiple groups composed of sets of linear units. The maximum value within each group is calculated in the second layer, and the last layer selects the minimum over the outputs of the second layer. As the name suggests, the monotonic network is monotonic, and this characteristic is preserved regardless of the number of groups, the number of units, and the order of the min/max operations.
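As a quick numerical check of this property, the following sketch builds a min-of-max network with random positive weights (the sizes and weight ranges are arbitrary for the demo) and verifies that its output is non-decreasing in the input:

```python
import numpy as np

rng = np.random.default_rng(1)
G, N = 4, 3                             # number of groups / linear units per group
w = rng.uniform(0.1, 2.0, (G, N))       # positive weights keep every unit increasing
beta = rng.uniform(0.0, 1.0, (G, N))

def mono_net(x):
    # layer 1: linear units; layer 2: max within each group; layer 3: min over groups
    return np.min(np.max(w * x + beta, axis=1))

xs = np.linspace(-5.0, 5.0, 201)
ys = [mono_net(x) for x in xs]
# max and min of increasing functions are increasing, so ys must be sorted
assert all(y1 <= y2 for y1, y2 in zip(ys, ys[1:]))
```

The min and max operations preserve monotonicity of the underlying linear units, which is why the property holds for any group/unit configuration.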
Virtual Valuation Network.
The virtual valuation function in the Myerson auction is replaced with the monotonic network. The computations of φ_shared and φ_i are implemented as follows:

b̄_i = φ_shared(b'_i) = min_{1≤g≤G} { max_{1≤n≤N} (w^shared_{g,n} b'_i + β^shared_{g,n}) }   (12)

b'_i = φ_i(b_i) = min_{1≤g≤G} { max_{1≤n≤N} (w^i_{g,n} b_i + β^i_{g,n}) }   (13)

The bid b_i of the drone u_i is transformed into b̄_i via the virtual valuation network φ_mononet. In φ_mononet, all outcomes of φ_shared are calculated with the same weights, whereas φ_i calculates its outcome using different weights for each bid. The inverse computation of φ_mononet is denoted by φ⁻¹_mononet. The network φ⁻¹_mononet, which determines the payment of the winner drone u_i, is composed of two networks, and its computation reuses the weights of φ_mononet. Thus, the computations of its two layers can be expressed as follows:

p'_i = φ⁻¹_shared(p̄_i) = max_{1≤g≤G} { min_{1≤n≤N} (w^shared_{g,n})^{-1} (p̄_i − β^shared_{g,n}) }   (14)

p_i = φ⁻¹_i(p'_i) = max_{1≤g≤G} { min_{1≤n≤N} (w^i_{g,n})^{-1} (p'_i − β^i_{g,n}) }   (15)

The transformed payment p̄_i of the drone u_i is converted into the payment p_i via φ⁻¹_mononet, which consists of φ⁻¹_shared and φ⁻¹_i. The same weights are used to calculate all outcomes of φ⁻¹_shared, while the outcome of φ⁻¹_i is calculated with different weights for each bid, as shown in (15).
The monotonic network is responsible for the transformation of the bid into the virtual bid b̄_i in the auction. As mentioned above, the optimal revenue is equivalent to the optimal virtual surplus; thus, the monotonic network is a major component of the auction. However, in order to configure the revenue-optimal auction, additional networks implementing the allocation and payment rules are required.
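Equations (12)-(15) can be checked numerically. The sketch below composes a per-drone layer φ_i with a shared layer φ_shared (random positive weights, illustrative sizes) and verifies that reusing the weights in the inverse direction recovers the original bid:

```python
import numpy as np

rng = np.random.default_rng(2)
G, N = 3, 3
w_i, beta_i = rng.uniform(0.5, 1.5, (G, N)), rng.uniform(0.0, 1.0, (G, N))  # per-drone layer
w_s, beta_s = rng.uniform(0.5, 1.5, (G, N)), rng.uniform(0.0, 1.0, (G, N))  # shared layer

fwd = lambda x, w, b: np.min(np.max(w * x + b, axis=1))    # forward, as in (12)/(13)
inv = lambda y, w, b: np.max(np.min((y - b) / w, axis=1))  # inverse, as in (14)/(15)

def phi_mononet(bid):
    return fwd(fwd(bid, w_i, beta_i), w_s, beta_s)         # phi_shared(phi_i(bid))

def phi_mononet_inv(p_bar):
    return inv(inv(p_bar, w_s, beta_s), w_i, beta_i)       # phi_i^-1(phi_shared^-1(p_bar))

# weight reuse forces the round trip phi_mononet_inv(phi_mononet(b)) = b
for bid in (0.5, 2.0, 7.3):
    assert np.isclose(phi_mononet_inv(phi_mononet(bid)), bid)
```

The round-trip identity is exactly the property that lets the payment network invert the virtual valuation without learning a separate inverse model.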
In this paper, the payment and allocation rules are implemented with ReLU and softmax, which are widely used as activation functions in deep learning. This choice makes backpropagation straightforward during the training process.
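A minimal sketch of these two operations follows; the bid values are arbitrary examples, and k is the approximation-quality constant used in the allocation rule:

```python
import numpy as np

def softmax_alloc(tb, k):
    # softmax over transformed bids; larger k sharpens the winner's probability
    e = np.exp(k * np.asarray(tb, dtype=float))
    return e / e.sum()

tb = [1.0, 2.5, 2.0, 0.5]                     # example transformed bids
lo, hi = softmax_alloc(tb, 1.0), softmax_alloc(tb, 5.0)
assert np.argmax(hi) == np.argmax(tb)         # highest bidder gets highest probability
assert hi[np.argmax(tb)] > lo[np.argmax(tb)]  # larger k -> closer to a hard argmax

relu = lambda x: max(x, 0.0)                  # bids below the reserve price 0 are zeroed
assert relu(-1.2) == 0.0 and relu(3.3) == 3.3
```

Both operations are differentiable almost everywhere, which is what makes the auction rules trainable end to end.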
Allocation Rule Network (g_i). This section describes in more detail the structure of the allocation rule (g_i). Since the allocation rule is implemented using a deep neural network, the allocation probability is calculated using softmax, which converts an input vector into a probability vector. The allocation rule g awards the charging service to the highest-bidder drone, and thus the highest probability is assigned to the highest bidder. The allocation rule (2) of the traditional auction is approximated by a continuous function realized by the deep network, which converts the input vector b̄ into a probability vector. In the SPA with reserve price 0 (SPA-0), the allocation rule assigns the highest winning probability to the highest bidder whose transformed bid is positive, i.e., b̄_i > 0. The softmax-based assignment can be calculated as follows:

g_i = softmax(b̄_1, ..., b̄_U; k) = e^{k b̄_i} / Σ_{j=1}^{U} e^{k b̄_j}.   (16)

The parameter k is a constant that determines the quality of the approximation. As k increases, the quality of the approximation increases, whereas the smoothness of the allocation network decreases; put simply, a higher k creates a larger difference between the allocation probabilities of the users [18]. When the networks are trained to minimize (11), the value of g_i(b̄_i) increases. As a result, since the profit of the auctioneer is related to the second-highest g_i(b̄_i), it is also a function of the parameter k. Results in Sec. V show how larger values of k lead to higher profits.

Payment Rule Network (p_i). This section describes the structure of the payment rule (p_i). The ReLU is widely used in deep learning as an activation function. In the proposed auction, the payment p_i of drone u_i is calculated from the transformed bid b̄_i. Before the computation of φ⁻¹_mononet, the deep network excludes bids below the reserve price 0 via ReLU(b̄_i) ≜ max(b̄_i, 0).
The input to the payment rule is the second-highest transformed bid, i.e., the output of max_{j≠i}(b̄_j). The payment rule network can then be calculated as:

p̄_i = ReLU{ max_{j≠i}(b̄_j) }   (17)

and the result p̄_i is used as an input to (9), i.e., to compute the actual payment of the winner drone.

Algorithm 2: Deep Learning-Based Algorithm for the Auction Controlling the Charging Scheduling
  Input: t, f, bid set b ≜ (b_1, ..., b_U)
  Output: allocation probability set g ≜ (g_1, ..., g_U), payment set p ≜ (p_1, ..., p_U)
  while the mobile charging system is idle do
    Drones: compute the charging scheduling valuation v_i;
    Drones: submit the bid b_i;
    b'_i = φ_i(b_i) = min_{1≤g≤G} { max_{1≤n≤N} (w^i_{g,n} b_i + β^i_{g,n}) };
    b̄_i = φ_shared(b'_i);
    g_i = softmax(b̄_1, ..., b̄_U; k) = e^{k b̄_i} / Σ_{j=1}^{U} e^{k b̄_j};
    p̄_i = ReLU{ max_{j≠i}(b̄_j) };
    p'_i = φ⁻¹_shared(p̄_i);
    p_i = φ⁻¹_i(p'_i) = max_{1≤g≤G} { min_{1≤n≤N} (w^i_{g,n})^{-1} (p'_i − β^i_{g,n}) };
    Calculate the winner and payment (g_k, p_k);
    Winner drone: pay the payment;
    Allocate the charging system to the winner;
  end

TABLE II: Revenue changes by k, U (in Fig. 4a-4c)
  SPA | k = 1 | k = 3 | k = 5

B. Overall Auction Mechanism
The overall deep learning-based auction mechanism is summarized in Algorithm 2. When the mobile charging system becomes idle, the auction is initiated. The valuation v_i for the charging time is computed by each drone u_i based on its own private criteria, and each drone then submits its bid b_i based on that private valuation. The mobile charging station runs the auction using the pre-trained networks. If p̄ = 0, then all of the drones assigned a low valuation to the charging time, and the mobile charging system does not allocate the charging time to any user. If there exist bids larger than the reserve price 0, the corresponding allocation probabilities and payments are calculated using the proposed deep learning networks, i.e., the virtual valuation network, the allocation network, and the payment network. Because the proposed deep learning auction is a variant of SPA-0, any bid below the reserve price 0 is converted to 0. The mobile charging station then charges the payment p_i to the drone u_i with the highest g_i and allocates the charging time to this winner drone. The winner drone reaches the charging station and occupies it for the duration of the slot. After the winner drone leaves the charging station, the next iteration starts whenever the mobile station is idle.

V. PERFORMANCE EVALUATION
Software Prototype.
First, we describe the software developed to test the auction mechanism. The Xavier initializer was used for weight initialization, and the biases were initialized to 0. As mentioned earlier, L2 regularization is used to prevent excessive parameter growth during training and to reduce overfitting; the regularization factor was set to 0.001. During the training phase, the Adam optimizer [34] was used for iterative optimization. This choice is motivated by the need to keep separate learning rates for each weight; Adam maintains an exponentially decaying average of previous gradients for its iteration-based updates. In the experiments, different uniform distributions were used for data generation, as shown in Table III. A data-intensive evaluation was conducted with 100,000 generated bid sets, a portion of which was used for training and the remainder for testing. The proposed deep learning-based auction mechanism was implemented in Python/TensorFlow [35] and Keras [36]. A multi-GPU platform (equipped with 2 NVIDIA Titan XP GPUs with a 1405 MHz main clock and 12 GB of memory) was used for training and testing.

TABLE III: Parameters
  Variables                    | Descriptions
  Number of drones             | 5, 10, 15
  Learning rate                | 0.0001
  L2 regularization parameter  | 0.001
  Training set size            | 100,000 bid sets
  Simulation epoch             | 100
  Approximation quality k      | 1, 3, 5
  Distribution of l_i          | U[1:5], U[5:10], U[1:10]
  Weight range B               | 0.0001

Experimental Setting.
The test environment includes 5, 10, or 15 drones. During the performance evaluation, the parameter k is varied to control the quality of the softmax approximation. First, we compare the proposed model with an SPA-0 that has a priori knowledge of the valuation distribution, in order to demonstrate revenue-optimality. The results show the ability of the proposed deep learning-based approach to adapt to different scenarios. The valuations of the drones are generated from the distributions defined in Table III, which also summarizes the remaining parameters.

Revenue Analysis - Parameter k. The proposed framework is based on the Myerson optimal auction, which produces increased revenue for the mobile charging station compared to SPA-0 auctions. The experiments shown in Fig. 4 confirm this effect and illustrate the influence of the parameter k. Figs. 4a-4c compare the revenue of the mobile charging station, i.e., the auctioneer, as a function of the parameter k defined in (16). In the experiments, the bid sets are generated uniformly at random; each bid is calculated from the private valuation as discussed in Sec. III, and the system parameters, such as the battery consumption rate and weight, are also assumed to be uniformly distributed. The results in Figs. 4a-4c show that the revenue increases as k increases; the numerical results are presented in Table II. The revenue gap between SPA-0 and the proposed auction widens as k grows from 1 to 3 to 5, so the mobile charging station obtains the highest revenue in the case where k = 5.
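For reference, the SPA-0 baseline revenue in such a setting can be estimated by Monte Carlo simulation; uniform bids over an illustrative range stand in here for the valuation model of Sec. III:

```python
import numpy as np

# Monte Carlo estimate of SPA revenue (expected second-highest bid) for each
# test-environment size; U[0, 10] bids are an illustrative choice, not the
# paper's exact valuation model.
rng = np.random.default_rng(3)

def spa_revenue(num_drones, rounds=20000):
    bids = rng.uniform(0.0, 10.0, (rounds, num_drones))
    return np.sort(bids, axis=1)[:, -2].mean()  # mean second-highest bid

revenues = [spa_revenue(u) for u in (5, 10, 15)]
# more bidders -> higher second price -> higher auctioneer revenue
assert revenues[0] < revenues[1] < revenues[2]
```

This also previews the participation effect discussed next: the second-highest of more i.i.d. bids is stochastically larger, so the auctioneer's revenue grows with the number of drones.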
These results show that the revenue of the mobile charging station increases in the order of SPA-0, k = 1, k = 3, and k = 5.

Fig. 4: Revenue changes by k and U (revenue vs. training iteration): (a) revenue statistics, 5 drones; (b) revenue statistics, 10 drones; (c) revenue statistics, 15 drones; (d) revenue statistics, 5/10/15 drones, k = 3.

Fig. 5: Revenue analysis: (a) revenue statistics of the mobile charging station; (b) revenue of the mobile charging station.

The parameter k determines not only the approximation quality of the softmax function but also the revenue of the charging station. In Fig. 4, the number of drones participating in the proposed charging scheduling auction is varied. In general, the more drones participate in the auction, the higher the submitted bids are likely to be, and thus the second-highest bid tends to increase with the number of drones. Note that the revenue of the auctioneer is equivalent to the payment of the winning user; therefore, the payment of the winner drone increases, and the revenue of the mobile charging station becomes larger. In SPA-0, the revenue grows as the number of drones increases from 5 to 10, and the revenue of the proposed charging scheduling auction grows similarly for k = 1, k = 3, and k = 5; this tendency is maintained when the number of drones increases from 10 to 15, as shown in Figs. 4a-4c. Fig. 4d and Table II show the revenue of the k = 3 model as the number of drones increases from 5 to 15; in this evaluation, the training of the deep learning network starts from pre-trained weights. Based on these experimental results, we can confirm that the proposed deep learning auction provides higher revenue to the mobile charging system as the number of drones increases. In Fig. 4, the horizontal axis is the training iteration of the proposed deep learning networks. The convergence of the deep learning networks within a small number of iterations shows high adaptability to specific applications: Figs. 4a-4c show that the proposed deep learning-based auction achieves stability in approximately 300 iterations, and Fig. 4d shows that stability is reached much faster when a pre-trained network is used. These results mean that the proposed deep learning-based auction is highly adaptive, and thus it can be applied to various environments with partial knowledge of the valuation distribution, as presented in Sec. III. Overall, Fig. 4 shows that the proposed auction guarantees increased revenue for the mobile charging station over SPA-0 and remains highly adaptive under a partially known (i.e., not fully known) distribution.

Statistical Analysis (Parameter k). In this section, we study the penalty imposed on a participant who submits a false bid, confirm that the proposed method penalizes the false bidder, and show how the penalty varies with k. While Fig. 4 shows the effect of the parameter k as the number of drones varies, Fig. 5 shows the statistical analysis of the revenue for different k values in (16) and for SPA-0 when the number of drones is fixed. The experiments compare the average revenue, maximum revenue, minimum revenue, top-25 percentile, and top-75 percentile, using the deep learning networks with k = 1 and k = 3. As k increases, the gap between the average revenue of the model and the average revenue of SPA-0 becomes larger, as shown in Fig. 5a and Table IV. With k = 1, the average revenue is 8.0548, similar to the 8.0536 of SPA-0. With k = 3, however, the average revenue is 8.6032, i.e., roughly 7% higher than SPA-0. The percentile statistics follow the same pattern: the top-25 and top-75 percentile revenues of the k = 3 model are clearly larger than those of the k = 1 model and SPA-0 (Table IV). These results show that the proposed model with a large k obtains higher revenue; the revenue of the mobile charging station declines in the order of k = 3, k = 1, and SPA-0.

TABLE IV: Revenue statistics (in Fig. 5a)
                     | SPA    | k = 1  | k = 3
  Mean               | 8.0536 | 8.0548 | 8.6032
  Top 25 percentile  | 6.6003 | 6.6081 | 7.1733
  Top 75 percentile  | 9.0358 | 9.0372 | 9.3873

TABLE V: Revenue of the mobile charging station (in Fig. 5b)
  Case | (1)    | (2)    | (3)    | (4)    | (7)
  SPA  | 7.5585 | 7.7175 | 6.1124 | 6.4550 | 5.7769
  k = 1
  k = 3

In Fig. 5b, the graph shows the results of the validation experiments, in which several cases are considered; the numbers on the X-axis in Fig. 5b are the indices of the individual cases, and the bars represent the revenue of the mobile charging station under the deep learning auction. We can confirm that the revenue with k = 1 is always smaller than the one with k = 3, while the revenue with k = 1 is larger than, but close to, that of SPA-0. These experiments also show that the gaps between the k = 1 and k = 3 models and SPA-0 are not constant across cases. This is due to the fact that the transformation depends on the weights of φ_i, and thus the transformation via φ_mononet is not applied equally to identical bids: if b_1 = 2 and also b_2 = 2, these two bids can be transformed differently, and therefore the payments are not always identical. This means that the proposed deep learning auction adapts to the bid distribution at the time of the auction procedure, giving the mobile charging station high revenue, and that higher revenue is guaranteed by increasing the value of the parameter k.
The proposed deep learning-based auction algorithm also has a strength in terms of penalizing false bidders.

Fig. 6: Payment comparison among drones (vs. false rate): (a) payment changes due to false bidding; (b) increase of payment over SPA due to false bidding.

Fig. 7: Performance improvements under various bid distributions: (a) transfer learning by changing the bid distribution, U = 5 (revenue vs. training iteration); (b) number of discharged drones during the auction, U = 15 (random charging vs. proposed auction).

TABLE VI: Payment of a drone (in Fig. 6a/6b)
  False Rate  |         |         |         |
  SPA (6a)    | 8.6177  | 8.6177  | 8.6177  | 8.6177
  k = 1 (6a)  | 11.2161 | 17.8814 | 24.3196 | 32.4915
  k = 3 (6a)  | 11.3513 | 18.5424 | 25.1623 | 33.8989
  k = 1 (6b)  | 130.15% | 207.49% | 282.20% | 377.03%
  k = 3 (6b)  | 131.72% | 215.16% | 291.98% | 393.36%

In Fig. 6, the experimental results present the payment of a drone when the drone submits a false (fake) bid. This experiment is conducted with the k = 1 and k = 3 models and assumes that 5 drones participate in the auction: one submits a fake bid, and the other four submit truthful bids generated from a uniform distribution. In Fig. 6a, the second-highest bid is 8.6177, as shown in the SPA row of Table VI. A drone that moderately inflates its bid above its true valuation still cannot win the auction and is simply defeated; a sufficiently inflated fake bid does win, but it sharply increases the payment of the fake-bidding drone. For example, in the SPA-based auction the winner pays only the second-highest bid, 8.6177, whereas in the proposed auction the payment of a false bidder grows with the false rate, reaching 32.4915 for k = 1 and 33.8989 for k = 3 (Table VI). The payment also increases with k: as Table VI shows, the payment of the false bidder ranges from 130.15% to 377.03% of the SPA-0 payment for k = 1, and from 131.72% to 393.36% for k = 3.
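The penalty mechanism can be illustrated with a toy second-price computation, where a fixed rule stands in for the trained networks and all numbers are illustrative:

```python
# Toy illustration of why false bidding backfires in a second-price auction:
# an inflated bid can win, but the resulting payment exceeds the drone's
# true valuation, yielding negative utility. All values are illustrative.
others = [8.0, 6.0, 5.0, 4.0]   # truthful bids of the other four drones
true_value = 3.0                # the false bidder's actual valuation

def outcome(my_bid):
    wins = my_bid > max(others)
    payment = max(others) if wins else 0.0   # second price charged to the winner
    utility = (true_value - payment) if wins else 0.0
    return wins, payment, utility

# bidding truthfully: the drone loses, but its utility is not harmed
assert outcome(true_value) == (False, 0.0, 0.0)
# bidding falsely high: the drone wins but pays more than its valuation
wins, payment, utility = outcome(20.0)
assert wins and payment > true_value and utility < 0
```

In the proposed auction the learned transform amplifies this effect further, as the Table VI percentages show.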
This experiment shows that if a drone submits a false bid to win the auction, the drone suffers a loss in terms of its payment, and this loss discourages fake bidding.
Fig. 7a shows that the proposed deep learning-based auction can be trained through transfer learning when the distribution of the bid values changes. The dotted lines show the revenue when SPA-0 is executed, while the red and blue lines show the revenue when the proposed algorithm is used; the revenue is higher than SPA-0, as in the previous experiments. The proposed model stabilizes within 400 training iterations when training starts from an initialized model, and within approximately 100 training iterations when training starts from an already-trained model (i.e., transfer learning) after the bid distribution changes. This experiment shows that the proposed model adapts to changes in the bid distribution and can provide reasonable results under various distributions. In the experiment of Fig. 7b, we consider the flight energy consumption of the drones as follows [37], [38]:

E(t) = (β + αz) · t + P_max (z / s)   (18)

where α is a motor speed multiplier, β is the minimum power needed to hover just above the ground (when the altitude is almost zero), z is the altitude at time step t, s is the speed of the drone, and P_max is the maximum motor power in flight. The term P_max(z/s) therefore refers to the power consumed to lift the drone to altitude z at speed s [37], [38]. Fig. 7b shows the number of drones whose batteries are discharged. In this experiment, we assume that the charging service fully recharges the battery of the drone that wins the auction and lasts exactly one time slot; the drones consume energy every time step (1 hour) and are recharged only when they win. We assume that 15 drones exist and that they constantly wish to join the charging service scheduling.
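The energy model (18) can be evaluated directly; the parameter values below are illustrative placeholders, since the exact values used in the experiment are specific to the paper's setup:

```python
# Flight energy model of eq. (18): E(t) = (beta + alpha * z) * t + P_max * (z / s).
# The constants here are illustrative placeholders, not the paper's settings.
ALPHA = 0.5    # motor speed multiplier
BETA = 100.0   # minimum hover power near the ground [W]
P_MAX = 300.0  # maximum motor power [W]

def flight_energy(t, z, s):
    """Energy consumed after t time steps at altitude z, having climbed at speed s."""
    return (BETA + ALPHA * z) * t + P_MAX * (z / s)

# energy grows linearly with flight time, so longer missions drain the battery faster
assert flight_energy(2.0, z=50.0, s=5.0) > flight_energy(1.0, z=50.0, s=5.0)
```

The first term captures steady hovering/cruising cost over time, while the second is a one-off climb cost, which is why only the first term scales with t.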
In Fig. 7b, when the proposed deep learning auction is used to schedule the charging services, fewer drones are discharged than under random charging. This experiment shows that the proposed method can increase the drone flight time in multi-drone networks.

Deep Learning Model Characteristics.
These experimental results explain why the monotonic network is chosen as the baseline structure of the proposed deep learning auction architecture.

Fig. 8: φ_i(b_i) changes according to the drone's valuation (transformed value vs. bid, for the winning bid and the second-highest bid): (a) U = 5; (b) U = 10; (c) U = 15.

Fig. 9: φ_shared(b'_i) changes according to the drone's valuation (transformed value vs. bid, for the winning bid and the second-highest bid): (a) U = 5; (b) U = 10; (c) U = 15.

In the proposed auction, when bids are transformed into virtual values through the monotonic network, the order of the bids must be maintained: for example, if the bid b_1 of user u_1 is larger than the bid b_2 of user u_2, the converted value b̄_1 must be larger than b̄_2. The corresponding experimental results show that the monotonic network indeed performs an order-preserving transformation. Fig. 8 shows the transformed value, i.e., the result of the virtual valuation function φ_i, as the bid of the winner drone and the second-highest bid increase over the test range. Fixed weights of φ_i were used when the winning bid and the second-highest bid were transformed, because the weights are continuously updated during the training process; for this experiment, the networks were trained with fixed winning and second-highest bids, while the other bids were generated from the uniform distribution. In Fig. 8a, the transformed value of the second-highest bid is higher than that of the winning bid, and this tendency also appears in Figs. 8b and 8c. Comparing Fig. 8a and Fig. 8b, it can be observed that the transformed bid decreases as the number of drones increases. However, Figs. 8b and 8c confirm that the transformed value b'_i is not determined by the number of drones per se: the weights are instead shaped by the allocation probabilities and payments of the other bids, because the loss function is configured from the allocation and payment rules. Fig. 8 also shows that the φ_i network performs a non-decreasing monotonic transformation; therefore, the φ_i network is able to replace the φ function in the Myerson auction.
In Fig. 9, the changes of φ_shared(φ_i(b_i)) are presented as the bid of the winner drone and the second-highest bid increase over the same range, again with fixed weights of φ_shared and the same training setup as for Fig. 8. The result of φ_shared is larger than that of φ_i in Fig. 8 because the computation of φ_shared is performed on the output of φ_i, and φ_shared likewise performs a non-decreasing monotonic transformation. Comparing Fig. 9a and Fig. 9b, the result of φ_shared decreases when the number of drones increases, as in Fig. 8; however, in Fig. 9c, the result of φ_shared with 15 drones becomes larger than that with 10 drones. This experiment therefore shows that the weights are driven by the allocation probabilities and payments rather than directly by the number of drones. The transformed value of the second-highest bid remains larger than that of the winning bid, so the tendency observed in Fig. 8 is maintained. The φ_i network has independent weights per input: if there are U inputs, there are U φ_i networks, all with different weights. In contrast, there is a single φ_shared network regardless of the number of inputs. Through Fig. 8 and Fig. 9, it can be observed that the non-decreasing monotonic behavior of the φ_i networks and of φ_shared is obtained from the constrained network structure and the loss function, regardless of whether the weights are shared or not.

VI. CONCLUDING REMARKS AND FUTURE WORK
The proposed deep learning based auction is revenue-optimal for mobile charging scheduling in distributed multi-drone networks. In this paper, the mobile charging scheduling problem is interpreted as an auction problem in which each drone bids its own valuation and the charging station then schedules the drones based on those bids in terms of revenue optimality. Through the proposed deep learning based solution approach, the charging auction enables efficient scheduling by automatically learning the knowledge (i.e., the bid distribution) that conventional auction mechanisms require as prior input. Therefore, environmental information is no longer required in the auction computation, which makes effective troubleshooting possible in distributed multi-drone networks. The proposed algorithm only requires the payment and allocation probabilities of the participating drones. The loss function in the deep learning computation is an important factor that allows the proposed auction to be constructed from environment-independent information. As verified via software-prototype-based performance evaluation, the following facts are observed: (i) the mechanism guarantees optimal revenue while satisfying individual rationality and dominant-strategy incentive compatibility, (ii) it discourages false bids by increasing the payment charged to false-bidding drones, and (iii) it enables a revenue-optimal auction to be constructed without complex prior knowledge, i.e., the bid distribution.

As future research directions, advanced auction mechanism designs with multiple mobile charging stations are worth considering. In this case, the problem can be formulated as a multi-item auction, and the corresponding mathematical formulation, verification, and analysis are desired. Furthermore, the proposed deep learning-based auction mechanisms can be advantageous in various applications. For example, visual attention is worth considering because it can be reformulated as resource allocation [39]–[42].

REFERENCES

[1] S. Park, L. Zhang, and S. Chakraborty, "Battery assignment and scheduling for drone delivery businesses," in Proc. IEEE/ACM ISLPED, 2017.
[2] A. Couture-Beil and R. T. Vaughan, "Adaptive mobile charging stations for multi-robot systems," in Proc. IEEE IROS, 2009.
[3] H. Frankenberger, "Mobile charging station," October 2017, US Patent 9,780,579.
[4] J. Wang, C. Jiang, Z. Han, Y. Ren, and L. Hanzo, "Internet of Vehicles: sensing-aided transportation information collection and diffusion," IEEE Trans. Vehicular Technology, vol. 67, no. 5, pp. 3813–3825, 2018.
[5] X. Hou, Y. Li, M. Chen, D. Wu, D. Jin, and S. Chen, "Vehicular fog computing: A viewpoint of vehicles as the infrastructures," IEEE Trans. Vehicular Technology, vol. 65, no. 6, pp. 3860–3873, 2016.
[6] J. Chen, G. Mao, C. Li, W. Liang, and D.-g. Zhang, "Capacity of cooperative vehicular networks with infrastructure support: Multiuser case," IEEE Trans. Vehicular Technology, vol. 67, no. 2, pp. 1546–1560, 2018.
[7] W. Yong, Y. Li, L. Chao, C. Wang, and X. Yang, "Double-auction-based optimal user assignment for multisource–multirelay cellular networks," IEEE Trans. Vehicular Technology, vol. 64, no. 6, pp. 2627–2636, 2015.
[8] H.-B. Chang and K.-C. Chen, "Auction-based spectrum management of cognitive radio networks," IEEE Trans. Vehicular Technology, vol. 59, no. 4, pp. 1923–1935, 2010.
[9] Y. Wen, J. Shi, Q. Zhang, X. Tian, Z. Huang, H. Yu, Y. Cheng, and X. Shen, "Quality-driven auction-based incentive mechanism for mobile crowd sensing," IEEE Trans. Vehicular Technology, vol. 64, no. 9, pp. 4203–4214, 2015.
[10] C. Yi and J. Cai, "Multi-item spectrum auction for recall-based cognitive radio networks with multiple heterogeneous secondary users," IEEE Trans. Vehicular Technology, vol. 64, no. 2, pp. 781–792, 2015.
[11] R. B. Myerson, "Optimal auction design," INFORMS Mathematics of Operations Research, vol. 6, no. 1, pp. 58–73, 1981.
[12] B. Subba, S. Biswas, and S. Karmakar, "A game theory based multi layered intrusion detection framework for VANET," Future Generation Computer Systems, vol. 82, pp. 12–28, 2018.
[13] Y. Tian, S. Min, and Q. Wu, "Application of neural network to game algorithm," Journal of Computer and Communications, vol. 6, no. 2, pp. 1–12, 2018.
[14] X. Wang, L. Kong, F. Kong, F. Qiu, M. Xia, S. Arnon, and G. Chen, "Millimeter wave communication: A comprehensive survey," IEEE Communications Surveys & Tutorials, 2018.
[15] S. You, D. Ding, K. Canini, J. Pfeifer, and M. Gupta, "Deep lattice networks and partial monotonic functions," in Proc. NIPS, 2017.
[16] J. Sill, "Monotonic networks," in Proc. NIPS, 1998.
[17] B. C. Csáji, "Approximation with artificial neural networks," Faculty of Sciences, Eötvös Loránd University, Hungary, vol. 24, 2001.
[18] P. Dütting, Z. Feng, H. Narasimhan, and D. C. Parkes, "Optimal auctions through deep learning," arXiv preprint arXiv:1706.03459, 2017.
[19] N. C. Luong, Z. Xiong, P. Wang, and D. Niyato, "Optimal auction for edge computing resource management in mobile blockchain networks: A deep learning approach," arXiv preprint arXiv:1711.02844, 2017.
[20] M. P. Wellman, W. E. Walsh, P. R. Wurman, and J. K. MacKie-Mason, "Auction protocols for decentralized scheduling," Games and Economic Behavior, vol. 35, 2001.
[21] D. C. Parkes and L. H. Ungar, "An auction-based method for decentralized train scheduling," in Proc. ACM AGENTS, 2001.
[22] P. Sujit and R. Beard, "Distributed sequential auctions for multiple UAV task allocation," in Proc. American Control Conference (ACC), 2007.
[23] T. Lemaire, R. Alami, S. Lacroix et al., "A distributed tasks allocation scheme in multi-UAV context," in Proc. IEEE ICRA, 2004.
[24] L. Bertuccelli, H.-L. Choi, P. Cho, and J. How, "Real-time multi-UAV task assignment in dynamic and uncertain environments," in Proc. AIAA Guidance, Navigation, and Control Conference, 2009.
[25] Y. Jiao, P. Wang, D. Niyato, and K. Suankaewmanee, "Auction mechanisms in cloud/fog computing resource allocation for public blockchain networks," arXiv preprint arXiv:1804.09961, 2018.
[26] L.-H. Yen and G.-H. Sun, "Decentralized combinatorial auctions for multi-unit resource allocation," arXiv preprint arXiv:1804.05635, 2018.
[27] A. M. Khan, X. Vilaça, L. Rodrigues, and F. Freitag, "A distributed auctioneer for resource allocation in decentralized systems," arXiv preprint arXiv:1604.07259, 2016.
[28] Y. Bartal, R. Gonen, and N. Nisan, "Incentive compatible multi unit combinatorial auctions," in Proc. ACM Conference on Theoretical Aspects of Rationality and Knowledge (TARK), 2003.
[29] V. Krishna, Auction Theory. Academic Press, 2009.
[30] W. Vickrey, "Counterspeculation, auctions, and competitive sealed tenders," The Journal of Finance, vol. 16, no. 1, pp. 8–37, 1961.
[31] J. Hartline, "Lectures on optimal mechanism design," Lecture Notes, 2006. [Online]. Available: http://users.eecs.northwestern.edu/~hartline/omd.pdf
[32] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.
[33] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks," in Proc. ICCV, 2017.
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[35] M. Abadi et al., "Keras," https://keras.io, 2015.
[37] L. D. P. Pugliese, F. Guerriero, D. Zorbas, and T. Razafindralambo, "Modelling the mobile target covering problem using flying drones," Optimization Letters, vol. 10, no. 5, pp. 1021–1052, 2016.
[38] D. Zorbas, L. D. P. Pugliese, T. Razafindralambo, and F. Guerriero, "Optimal drone placement and cost-efficient target coverage," Journal of Network and Computer Applications, vol. 75, pp. 16–31, 2016.
[39] D. Zhang, J. Han, C. Li, J. Wang, and X. Li, "Detection of co-salient objects by looking deep and wide," International Journal of Computer Vision, vol. 120, no. 2, pp. 215–232, 2016.
[40] D. Zhang, D. Meng, and J. Han, "Co-saliency detection via a self-paced multiple-instance learning framework," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 39, no. 5, pp. 865–878, 2017.
[41] J. Han, K. N. Ngan, M. Li, and H.-J. Zhang, "Unsupervised extraction of visual attention objects in color images," IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 141–145, 2006.
[42] J. Han, D. Zhang, X. Hu, L. Guo, J. Ren, and F. Wu, "Background prior-based salient object detection via deep reconstruction residual," IEEE Trans. Circuits and Systems for Video Technology, vol. 25, no. 8, pp. 1309–1321, 2015.

MyungJae Shin is currently an M.S. student in Computer Science and Engineering, Chung-Ang University (CAU), Seoul, Korea. He received his B.S. in computer science and engineering from CAU, Seoul, Korea, with the second highest honor from the CAU College of Engineering. His research interests are in various econometric theories and their deep-learning-based computational solutions. He was a recipient of the National Science & Technology Scholarship (2016–2017).
Joongheon Kim (M'06–SM'18) has been an assistant professor with the Chung-Ang University School of Computer Science and Engineering, Seoul, Korea, since 2016. He received his B.S. (2004) and M.S. (2006) in computer science and engineering from Korea University, Seoul, Korea, and his Ph.D. (2014) in computer science from the University of Southern California (USC), Los Angeles, CA, USA. In industry, he worked for LG Electronics (Seoul, Korea, 2006–2009), InterDigital (San Diego, CA, USA, 2012), and Intel Corporation (Santa Clara, CA, USA, 2013–2016). He is a senior member of the IEEE. He was a recipient of the Annenberg Graduate Fellowship with his Ph.D. admission from USC (2009) and the Haedong Young Scholar Award (2018), which recognizes a young Korean researcher under the age of 40 who has made outstanding scholarly contributions to communications and information sciences research.