On the Visual-based Safe Landing of UAVs in Populated Areas: a Crucial Aspect for Urban Deployment
Javier González-Trejo, Diego Mercado-Ravell, Israel Becerra, Rafael Murrieta-Cid
Abstract — Autonomous landing of Unmanned Aerial Vehicles (UAVs) in crowded scenarios is crucial for the successful deployment of UAVs in populated areas, particularly in emergency landing situations where the highest priority is to avoid hurting people. In this work, a new visual-based algorithm for identifying Safe Landing Zones (SLZ) in crowded scenarios is proposed, considering a camera mounted on a UAV, where the people in the scene move with unknown dynamics. To do so, a density map is generated for each image frame using a Deep Neural Network, from which a binary occupancy map is obtained, aiming to overestimate the people's locations for safety reasons. Then, the occupancy map is projected to the head plane, and the SLZ candidates are obtained as circular regions in the head plane with a minimum security radius. Finally, to keep track of the SLZ candidates, a multiple-instance tracking algorithm is implemented using Kalman Filters along with the Hungarian algorithm for data association. Several scenarios were studied to prove the validity of the proposed strategy, including public datasets and real uncontrolled scenarios with people moving in public squares, taken from a UAV in flight. The study showed promising results toward preventing the UAV from hurting people during an emergency landing.
Index Terms — UAVs, Autonomous Landing, Crowds Detection, Visual-based Landing, Density Maps.
I. INTRODUCTION
Over the last decades, the world has witnessed great interest in the development and production of Unmanned Aerial Vehicles (UAVs), flooding the market with all kinds of drones for different budgets and with continuously increasing capabilities. As a direct consequence, the number of successful civilian applications has been steadily increasing, and every day it is more common to see UAVs integrated into our daily lives, interacting closely with people in a wide variety of tasks ranging from surveillance, photography, structure inspection, load transportation and fast response in disasters, to simple entertainment, among others [1], [2], [3].

Nevertheless, the increased use of UAVs has also raised several well-founded legal and public security concerns, in part due to the threat of violating privacy, but mainly derived from the inherent risk of an accident produced by a system failure or human error. Although big efforts have been devoted to increasing the robustness of these systems, e.g., safety protocols for returning home have helped to attenuate security issues, this is not enough to guarantee people's safety on the streets in case of a system failure. Consequently, this
has restricted the use of drones to rural environments, or wide urban areas clear of people, precluding some very interesting applications in populated zones such as surveillance of crowds, monitoring mass events, fast response in health emergencies, chasing criminals and law enforcement, etc. [4]. Accordingly, legislation generally prohibits any UAV activity over a crowd [5], but in practical terms it is unavoidable that a UAV will fly above an unaware crowd at low altitudes, putting people's safety at risk in case of any UAV malfunction, such as a communication loss or power shortage, or even a human error. Henceforth, it is of crucial relevance to further provide every UAV with safety protocols to at least avoid hurting people in case of an emergency, being able to autonomously find landing zones without putting people at risk.

This work was supported by the Mexican National Council of Science and Technology CONACyT, and the FORDECyT project 296737 "Consorcio en Inteligencia Artificial". The authors are with the Center for Research in Mathematics CIMAT AC, Mexico. D. Mercado-Ravell and I. Becerra are also with Cátedras CONACyT (*corresponding author email: [email protected]).

Fig. 1: Autonomous visual-based Safe Landing Zone (SLZ) proposals in populated areas. In the case of a system failure, UAVs must at least avoid hurting people. The red regions are places where people are detected using density maps. The green circles represent the regions free of people where the drone can land. Finally, the blue circles indicate landing zones tracked by KFs to ensure time consistency.
If every UAV were provided with such an emergency autonomous landing system, this would not only help to prevent hurting, or even killing, people during drone crashes, but would also considerably boost their potential for successful urban deployment, even in crowded scenarios.

The idea of visual-based autonomous landing of flying machines has been revisited in the past, with examples found mainly in locating predefined landing tags [6], [7], or in identifying and avoiding hazardous terrain [8]. Nevertheless, autonomous safe landing in crowded scenarios is an almost unexplored and very challenging task, due to the implied security constraints and the inherent difficulty of the problem, where people may cluster in dense groups producing severe occlusions, while freely moving in the scene with unknown dynamics. It is only thanks to recent advancements in modern deep learning techniques for computer vision that we can attempt to solve this complicated task, where just a few related works are to be found in the literature.

In [9], the authors propose a new neural network and regularization technique based on the graph embedding network to detect what they call no-fly zones, i.e., zones where a crowd is present in the scenario. Moreover, in [10] a new neural network is proposed for time consistency across the frames in a video, taking into account the perspective distortion caused by the projection transformation from the real world to the image; the Safe Landing Zones (SLZ) are a byproduct of the density consistency achieved in that work. More recently, in [11], a model with a MobileNet back-end is presented, specifically tailored for images coming from a UAV, and fast enough for real-time applications.
However, all these works are limited to segmenting the scene into areas free of people; although this is a necessary and important first step for autonomous safe landing in crowded areas, none of them attempts to solve the problem of indicating which regions are the most appropriate to safely land the vehicle. It is important to stress that we do not use classical methods to avoid hazardous areas, because in this work the priority is to avoid landing over people, not to ensure the drone's integrity. Furthermore, an important aspect that is not considered in works like [8] is that people move.

In that regard, this paper presents an algorithm to detect Safe Landing Zones (SLZ) in real populated and uncontrolled scenarios, considering a camera mounted on a UAV, where people move with unknown dynamics, prioritizing their integrity in emergency situations, as depicted in Fig. 1. To do so, we consider the use of a Deep Neural Network (DNN) density map generator to infer people's locations in the image; then, a binary occupancy mask is obtained, aiming to overestimate the people's locations for safety reasons. In our previous work [12], we proposed and implemented on a mobile phone a lightweight architecture for real-time density map generation, from which, as a first approach, the biggest circle free of people was found directly on the image plane using the Polylabel algorithm, without considering the camera movement. In contrast, in this work, the occupancy map is projected to the head plane in the real world, where multiple SLZ candidates are obtained as circular regions with a radius larger than a minimum value r. Furthermore, by means of Kalman Filters (KF) and the Hungarian algorithm for data association [13], a multiple-instance tracking technique is implemented using the SLZ candidates as measurements, smoothing the SLZ proposals' movement and providing time consistency along frames. Between two time instants k and k+1, a KF for each landing zone gives an estimate of the location and radius of that landing zone in the head plane. The Hungarian algorithm then finds the most similar landing zone at time k+1, namely the one that maximizes the area of intersection over the area of union (IoU) of the landing zones at times k and k+1. Indeed, the Hungarian algorithm orders the similarity of all possible matchings using that criterion.

Lastly, several scenarios were studied from available crowd datasets, as well as in real uncontrolled and challenging scenarios recorded from a flying drone in public squares with people in continuous movement. Although further work is still needed to obtain a fully reliable solution, the proposed strategy showed good performance in spite of the challenging scenarios, proving to be helpful to prevent hurting people in case of an emergency landing.

The remainder of the paper is structured as follows. In Section II, we briefly introduce density map generators and their utility for detecting crowds in images. Section III describes the SLZ proposal generator in the head plane. Section IV presents the multiple-instance tracking algorithm. In Section V, the experiments' results are presented. Finally, in Section VI, conclusions and future work are discussed.
II. CROWDS DETECTION USING DENSITY MAPS
In order to find suitable landing zones where no person could be harmed, it is necessary to distinguish the crowd from a safe landing zone. One of the most popular methods for crowd detection is the density map generator [14]. Density maps are obtained by means of DNNs trained over crowd images containing head annotations, and they provide spatial information on the crowd location even subject to large occlusions; furthermore, they provide an estimate of the number of persons in an image or scene as the integral of the corresponding density map [15]. These density maps achieve excellent results in the counting task for a wide variety of crowded scenes, ranging from sparse to highly dense crowds, where person-to-person occlusion can be so severe that the only visible part of a person is a portion of the head.

In contrast to the classical detect-and-count approaches [16], [17], which try to detect the whole body of each person, density map generators are trained to detect the heads of persons in a crowd [15], increasing the robustness against occlusion, at the cost of being more prone to overestimate the people's locations over richly textured backgrounds. The first works using density maps with deep learning algorithms are found in [15], with the Multi Column Convolutional Neural Network (MCNN) algorithm that used multiple columns to capture different perspectives and head scales. The latest works focus not only on obtaining more accurate architectures, but also on improving the manual annotations through learning [18], or creating new loss functions to better learn how to count and locate the crowd in an image [19].

The accuracy of people's locations is of great relevance for the SLZ detection task in order to avoid human accidents. In that sense, due to the inherent security constraints, it is preferable to overestimate the people's locations; hence, false positive detections are acceptable, but it is unacceptable to miss a detection that can result in hurting a person. Henceforth, the employed density map generator should be explicitly trained to overestimate the people's locations to increase safety.

Fig. 2: UAV moving in a populated environment. Three reference frames are defined: an inertial one fixed to the ground, a body-fixed frame attached to the UAV, and a camera frame. The camera pose can then be retrieved from the UAV's on-board sensor measurements.

Another important aspect to take into account when selecting the density map generator is its computational complexity and time response. It is of crucial importance to have the information at a high rate in order to perform the landing maneuver in real time, preferably embedded on the vehicle. Current state-of-the-art algorithms, especially the works using pre-trained DNN back-ends like VGG (Visual Geometry Group), are slow, mostly on embedded devices, which hinders the objective of implementing them in most UAVs. Accordingly, a good trade-off between accuracy and computational cost is required when selecting the density map generator, where recent advancements in lightweight architectures appear as a good alternative for the SLZ detection task. These lightweight architectures can be defined as DNNs with fewer than 1 million parameters, being suitable for real-time solutions on both high-end computers and embedded devices [12]. For this work, we have selected our own custom lightweight density map generator called Pruned BL CCNN [12], but any other state-of-the-art density map generator can be used instead, depending on the available hardware. The Pruned BL CCNN obtained good results in the crowd counting and detection tasks, while being light enough to be implemented in a real-time mobile phone application. This network was explicitly trained to overestimate the crowd for this particular task, in order to increase the safety of our algorithm.
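The count-as-integral property of density maps mentioned above can be illustrated with a toy synthetic map (the Gaussian blobs below are illustrative stand-ins for a network's output, not produced by any actual density map generator):

```python
import numpy as np

def crowd_count(density_map: np.ndarray) -> float:
    """Estimated number of people = integral (sum) of the density map."""
    return float(density_map.sum())

# Toy density map: three Gaussian blobs, each normalized to integrate to 1,
# mimicking how head annotations are spread into density values.
density = np.zeros((64, 64), dtype=np.float32)
yy, xx = np.mgrid[0:64, 0:64]
for cy, cx in [(10, 10), (30, 40), (50, 20)]:
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 2.0 ** 2))
    density += g / g.sum()  # each head contributes exactly 1 to the integral

print(round(crowd_count(density)))  # → 3
```

Because each head contributes a unit mass wherever it is spread, the sum remains a valid count even under heavy person-to-person occlusion.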
III. SAFE LANDING ZONE PROPOSALS
The objective of our algorithm is to propose SLZ in populated environments to prevent the drone from hurting people in case of a system failure. For that matter, we consider a camera attached to a UAV. Moreover, people may freely move in the scene with unknown dynamics. For now, we restrict our study to scenes where people primarily move on a horizontal main plane.

In the present modeling, three reference frames are used. The first one is the inertial or global reference frame, denoted F_W. The body reference frame is attached to the UAV's center of mass and is referred to as F_B. The homogeneous transformation from the UAV's body reference frame F_B to the global reference frame F_W is denoted by T_WB ∈ SE(3) and is assumed to be retrievable by means of the UAV's on-board sensors. The third reference frame uses the camera center as its origin and is denoted F_C, and its transformation with respect to the body-fixed frame, T_BC ∈ SE(3), is known a priori. Fig. 2 shows the three reference frames and their transformations.

An important element in the proposed solution is the so-called head plane, P, which is a plane in the scene parallel to the floor that crosses the average height of a person at h_h. More precisely, let P be the plane defined in F_W by the point P_0 = (0, 0, h_h) and the normal n = (0, 0, 1) in point-normal form. The head plane P is crucial since the computations of the SLZ proposals are performed on it. Additionally, let I be the image plane, as customarily defined in F_C.

The algorithm is divided into two main stages: the SLZ candidate proposal and the multiple SLZ tracking that ensures time consistency between frames. The former is described in this section, while the latter in Section IV. In the SLZ proposal stage, we obtain at most N_p SLZ candidates in the P plane, with a radius of at least r meters. Towards the generation of these SLZ proposals, we first calculate the density map D in the image plane I using a density map generator.
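As a concrete illustration of the frame setup just described, the following sketch composes a body-to-world transform T_WB and a camera-to-body transform T_BC, and projects a head-plane point into the image through a pinhole model. All numeric values (intrinsics, altitude, a straight-down camera) are hypothetical, chosen only to make the geometry easy to check:

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical setup: drone hovering 10 m above the world origin, camera at the
# body origin looking straight down, pinhole intrinsics with f = 500 px.
h_h = 1.7                                    # head-plane height above the floor
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 320.0],
              [  0.0,   0.0,   1.0]])
R_down = np.diag([1.0, -1.0, -1.0])          # camera z axis points down (-world z)
T_WB = se3(np.eye(3), np.array([0.0, 0.0, 10.0]))  # body -> world (from UAV sensors)
T_BC = se3(R_down.T, np.zeros(3))                  # camera -> body (known a priori)

# Inverses give the world -> body -> camera chain used by the projection.
T_BW = np.linalg.inv(T_WB)
T_CB = np.linalg.inv(T_BC)

def project(p_world):
    """Project a world point on the head plane into pixel coordinates."""
    p = np.append(p_world, 1.0)              # homogeneous coordinates
    p_cam = (T_CB @ T_BW @ p)[:3]
    uv = K @ p_cam
    return uv[:2] / uv[2]                    # divide out the scale factor lambda

u, v = project(np.array([0.0, 0.0, h_h]))    # head-plane point below the camera
print(u, v)  # → 320.0 320.0 (the principal point)
```

A point directly below a downward-looking camera lands on the principal point, which is a quick sanity check for the transform chain.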
For our experiments, we use the lightweight deep learning architecture Pruned BL CCNN [12].

The next step is to obtain an occupancy map O in the I plane, based on the density map D. The map O has the same size as D, where each pixel has either a value of 255 or 0, denoting whether there is a person or not according to the density values from D. For that, all the occupancy map pixel values O_{i,j} are set according to the following expression:

O_{i,j} = 255 if D_{i,j} − min{D_{i,j}} > 0, and O_{i,j} = 0 if D_{i,j} − min{D_{i,j}} = 0.

After that, the zones with values equal to 255 are dilated twice using a kernel of ones with shape R × R to further overestimate the crowd's location for security reasons. Finally, we perform a bitwise-not operation to set all values equal to 0 to 255 and vice versa.

Thereafter, the occupancy map O is projected to the head plane P based on the rigid transformations that relate the reference frames F_W, F_B and F_C, which are obtained from the UAV's sensors and the a priori known camera pose with respect to the UAV's body. These assumptions are consistent with the information commonly available from most UAVs. More precisely, let T_BW ∈ SE(3) be the rigid transformation from inertial frame coordinates to body frame coordinates, and T_CB ∈ SE(3) the transformation from body frame coordinates to camera frame coordinates (see Fig. 2). Thus, denoting the intrinsic camera matrix by K, the projection from a point (x_W, y_W, h_h, 1)^T in the P plane to a point (x_I, y_I, 1)^T in the image plane I is given as:

λ (x_I, y_I, 1)^T = K T_CB T_BW (x_W, y_W, h_h, 1)^T,   (1)

with λ a scale factor, which can be easily retrieved from the UAV altitude and the camera pose.

Fig. 3: Safe landing zone proposals in the image plane (left) and their corresponding projection in the head plane (right). The safe landing zones are found in the head plane using the projected occupancy map.

Additionally, to perform the projection of the occupancy map O to the head plane P, denoted O^W, a grid G ∈ P is defined. Each element G_{i,j} of the grid stores a coordinate (x_W, y_W) on the P plane; refer to them as (G^x_{i,j}, G^y_{i,j}). Then, the corresponding occupancy values for O^W are retrieved as follows. Utilizing Eq. (1) and given a grid element G_{i,j}, its stored inertial coordinates (G^x_{i,j}, G^y_{i,j}) are mapped to an image coordinate (x_I, y_I) as:

λ · (x_I, y_I, 1)^T = K T_CB T_BW · (G^x_{i,j}, G^y_{i,j}, h_h, 1)^T.   (2)

The image coordinate (x_I, y_I) is employed to query the occupancy map O. Thus, the respective occupancy value in O is used as the occupancy value for O^W_{i,j}. The implementation of this procedure is based on the sampler module of the Spatial Transformer Network [20].

In the final step, we obtain a distance map C from the occupancy map O^W in the head plane. The distance map C considers the binary map encoded in O^W, and stores the distance of each non-zero-valued pixel (people-free) to the nearest zero-valued pixel (occupied by people). By finding the position and value of the maximum pixel, we effectively find the center and radius of the biggest circular landing zone in the head plane P. Next, we mark the landing zone in the projected occupancy map O^W as occupied, and repeat the process until N_p SLZ candidates are encountered, or until a landing zone with radius less than r is obtained. The results of the SLZ detection algorithm in the image plane I and its homography projection to the head plane P are illustrated in Fig. 3.
IV. MULTIPLE LANDING ZONE TRACKING
By themselves, the safe landing zone proposals could be used to indicate where a UAV could land. Nonetheless, due to the difficulty of the task, where people in the scene move freely with unknown dynamics within the plane P, in addition to the density maps not being exempt from failure due to environmental variations, the SLZ candidates might change critically from a frame at time k to a frame at time k+1. This could be catastrophic in the context of an emergency landing mission. Therefore, as the second stage of our algorithm, we propose to use multiple-instance KF trackers to filter out outliers in the SLZ detection, smoothing out the movement of the landing regions, preventing abrupt jumps and ensuring temporal consistency along frames.

For a given instance i of a SLZ encountered using the procedure described in the previous section, let us consider a discrete-time KF, with state vector at time k equal to

x_k^(i) = (x_k^(i), y_k^(i), r_k^(i), ẋ_k^(i), ẏ_k^(i), ṙ_k^(i))^T,   (3)

where x_k^(i) and y_k^(i) represent the coordinates of the SLZ circle's center in the head plane P, r_k^(i) is the circle radius, while the velocity of the center coordinates and the rate of change of the radius are given by ẋ_k^(i), ẏ_k^(i) and ṙ_k^(i), respectively.

For the dynamic model, we use a constant velocity model with the acceleration considered as the process noise ω with normal distribution [21], that is,

x_{k+1}^(i) = [ I  Δt·I ; 0  I ] x_k^(i) + ω,  ω ∼ N(0, Q),   (4)

with 0 ∈ R^{3×3} a matrix of zeros, I ∈ R^{3×3} the identity matrix, and Δt the time increment. The process noise covariance matrix Q is obtained as:

Q = σ_a² [ (Δt⁴/4)·I  (Δt³/2)·I ; (Δt³/2)·I  Δt²·I ],   (5)

where σ_a is the acceleration uncertainty, which is selected empirically.

The measurement model considers the observation vector z_k^(i) as the coordinates of the center of the SLZ and its radius, as obtained by the algorithm described in Section III, with a measurement noise ν with normal distribution; therefore, the measurement model is given by:

z_k^(i) = [ I  0 ] x_k^(i) + ν,  ν ∼ N(0, R),   (6)

with R standing for the measurement covariance matrix.

To handle multiple-target tracking and associate the main SLZ candidates with their corresponding trackers, the Hungarian algorithm is employed, a combinatorial optimization algorithm that solves the assignment problem in polynomial time [13]. We associate a SLZ proposal with a KF by creating a cost matrix using the Intersection over Union (IoU) criterion between all combinations of the SLZ measurements and the KF estimates. For two circle areas A_i and A_j, the IoU is computed as

IoU = |A_i ∩ A_j| / |A_i ∪ A_j|.   (7)

Then, the cost matrix is fed to the Hungarian algorithm, which associates the "most similar" SLZ proposal to each KF tracker. If no match is found for a KF, it is assumed that no consistent observation is available for that filter, and a not-found counter is increased by one. When the counter is greater than a threshold μ, the respective KF is eliminated. If an unmatched SLZ proposal is found for a number of μ consecutive iterations, and the number of KFs is below N_p, a new KF is initialized.

At the beginning of a trial, a maximum of N_p KFs are created by initializing their states using the first SLZ proposals. In the following frames, the propagation step is performed, and when available, the predicted state is corrected through the corresponding observation.
V. EXPERIMENTAL VALIDATION
The proposed strategy was validated in a wide variety of real scenarios recorded from a drone in public squares, as well as with videos from the Venice dataset [22]. The former scenarios were recorded by our lab members using a Parrot Bebop 2. The drone was remotely controlled using ROS (Robot Operating System) from a ground computer, from where videos and sensor data were recorded. The video resolution was set to 480p at 15 to 30 fps, depending on the communication quality. We used the IMU (Inertial Measurement Unit), GPS (Global Positioning System) and altimeter filtered data, available at 5 Hz, to obtain the drone's pose and used it to project the image to inertial frame coordinates. The camera is mounted on the front of the drone with a wide-angle lens. Additionally, the camera allows selecting a region of interest with distortion correction by software, using virtual tilt and pan angles.

As shown in Fig. 4, the scenes come from different points of view of two distinct public plazas taken from a UAV. These recordings display a variety of characteristics that make them especially challenging for testing safe landing zone proposal algorithms. All of them contain moving crowds with changes in illumination, heterogeneous surfaces to land on, different drone altitudes and movements, and different backgrounds.

In the dataset experiments, we used three scenarios from the Venice dataset [22], recorded in San Marco square. These videos were recorded using a cellphone above the crowd at a resolution of 720p at 25 fps. Since no camera pose was available, the homographies were estimated from the image alone every 60 frames. Since the homography has an update rate of 0.5 Hz, we fixed the homography in place for short periods of time where there did not appear to be significant camera movement. All of the scenes present a dense crowd moving on a flat surface, which makes them suitable for testing the SLZ proposal algorithm despite being created for the crowd counting task.

Fig. 4 also shows the performance of our algorithm in different scenes. We show in green the current frame SLZ candidates, as computed by the algorithm described in Section III, and in blue the KF tracker estimates. In all figures, it can be seen that the algorithm, for that frame, locates suitable SLZ proposals and trackers in zones free of people. The challenge that each scene offers varies depending on the characteristics described above. For example, Fig. 4e shows one of the most challenging scenarios, where the camera is near the ground in a narrow passage, capturing people in constant flow, making it difficult to keep track of the landing zone proposals, while presenting high illumination contrasts due to the sun. Despite this, the oldest trackers (signaled by a lower number) keep track of umbrellas and a small green area where the drone could land without hurting people. Also, in Fig. 4f, we display a scene from the Venice dataset, showing good performance, since all of the trackers cover a valid SLZ proposal free of people.

Another interesting aspect can be observed in Figs. 4a, 4b and 4c, where sparser crowds are presented in a wide scene. We can observe how the rich textures on the floor may confuse the density map, appearing as false positive crowds. This is produced since the density map generators are explicitly trained to overestimate the crowds to increase the algorithm's safety. Nevertheless, the proposed strategy was able to find suitable SLZ in the scenes. A more robust density map generator, such as DM-Count [19], could help to mitigate this phenomenon, at a computational cost that also compromises the capacity to implement it in real-time embedded applications.

Finally, in order to better appreciate the performance of the strategy in dynamic scenarios, we provide a video in the multimedia material of this paper, containing the results of our algorithm in the various scenarios here presented.
VI. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a novel algorithm for visual-based autonomous safe landing of UAVs in populated environments. We believe that this kind of algorithm is a key aspect for the successful application of UAVs in urban areas, since it will help to prevent injuring people in case of an emergency landing.

The proposed algorithm, in conjunction with an adequate density map generator, is capable of finding Safe Landing Zone proposals in a variety of real, uncontrolled and challenging scenes where the crowds move with unknown dynamics, with illumination changes, different crowd densities and rich image features. Moreover, with the help of the KFs, the algorithm is able to keep track of Safe Landing Zone proposals even when the density map fails to correctly detect parts of the crowds in consecutive frames.

Although the present work showed promising results, it is not exempt from failure, and due to the hard security constraints involved and the wide diversity of challenging situations that may occur during an emergency landing mission, further work is still required to obtain a fully reliable solution. Future efforts will be devoted to defining a suitable criterion to choose the best SLZ available. Also, it is required to impose a horizontality condition on the main planes of the scene in order to discriminate regions unsuited for landing, such as walls or steep terrain. Another interesting approach to further increase people's safety would be to add a human body detector as the UAV comes closer to the ground. Also, at present we focus all our efforts on avoiding hurting people, but preventing the UAV from crashing into other obstacles would also be of interest. Finally, the real-time capabilities could be further improved with an end-to-end deep learning solution, trained using synthetic and real data.

Fig. 4: Experimental results. The red zones are where the crowd is located, the green circles are the current frame safe landing proposals, and the blue circles are the trackers. As shown in the images, the goal is to find suitable landing spots for the drone to land in a variety of scenarios, from sparse crowds (Fig. 4a) to low illumination near the ground (Fig. 4e).
REFERENCES
[1] F. Nex and F. Remondino, "UAV for 3D mapping applications: a review," Applied Geomatics, vol. 6, no. 1, pp. 1–15, 2013. [Online]. Available: https://doi.org/10.1007/s12518-013-0120-x
[2] A. Lucieer, S. M. de Jong, and D. Turner, "Mapping landslide displacements using structure from motion (SfM) and image correlation of multi-temporal UAV photography," Progress in Physical Geography: Earth and Environment, vol. 38, no. 1, pp. 97–116, 2013. [Online]. Available: https://doi.org/10.1177/0309133313515293
[3] H. Menouar, I. Guvenc, K. Akkaya, A. S. Uluagac, A. Kadri, and A. Tuncer, "UAV-enabled intelligent transportation systems for the smart city: Applications and challenges," IEEE Communications Magazine, vol. 55, no. 3, pp. 22–28, 2017. [Online]. Available: https://doi.org/10.1109/mcom.2017.1600238cm
[4] N. H. Motlagh, M. Bagaa, and T. Taleb, "UAV-based IoT platform: a crowd surveillance use case," IEEE Communications Magazine.
[6] Proceedings 2002 IEEE International Conference on Robotics and Automation (ICRA), 2002.
[7] P. J. Garcia-Pardo, G. S. Sukhatme, and J. F. Montgomery, "Towards vision-based safe landing for an autonomous helicopter," Robotics and Autonomous Systems, vol. 38, no. 1, pp. 19–29, 2002.
[8] A. Johnson, J. Montgomery, and L. Matthies, "Vision guided landing of an autonomous helicopter in hazardous terrain," in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005.
[9] M. Tzelepi and A. Tefas, "Graph embedded convolutional neural networks in human crowd detection for drone flight safety," IEEE Transactions on Emerging Topics in Computational Intelligence, pp. 1–14, 2019. [Online]. Available: https://doi.org/10.1109/tetci.2019.2897815
[10] W. Liu, K. Lis, M. Salzmann, and P. Fua, "Geometric and physical constraints for drone-based head plane crowd density estimation," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019. [Online]. Available: https://doi.org/10.1109/iros40897.2019.896785
[11] G. Castellano, C. Castiello, C. Mencar, and G. Vessio, "Crowd detection in aerial images using spatial graphs and fully-convolutional neural networks," IEEE Access, vol. 8, pp. 64534–64544, 2020. [Online]. Available: https://doi.org/10.1109/access.2020.2984768
[12] J. A. Gonzalez-Trejo and D. A. Mercado-Ravell, "Lightweight density map architecture for UAVs safe landing in crowded areas," under review, 2020.
[13] D. Bruff, "The assignment problem and the Hungarian method," Notes for Math, vol. 20, no. 29-47, p. 5, 2005.
[14] D. Kang, Z. Ma, and A. B. Chan, "Beyond counting: Comparisons of density maps for crowd analysis tasks—counting, detection, and tracking," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 5, pp. 1408–1422, 2018.
[15] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, "Single-image crowd counting via multi-column convolutional neural network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [Online]. Available: https://doi.org/10.1109/cvpr.2016.70
[16] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[17] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and efficient object detection," 2019.
[18] S. Bai, Z. He, Y. Qiao, H. Hu, W. Wu, and J. Yan, "Adaptive dilated network with self-correction supervision for counting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4594–4603.
[19] B. Wang, H. Liu, D. Samaras, and M. Hoai, "Distribution matching for crowd counting," in Advances in Neural Information Processing Systems, 2020.
[20] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," 2015.
[21] K. Saho, Kalman Filter for Moving Object Tracking: Performance Analysis and Filter Design, ser. Kalman Filters - Theory for Advanced Applications. InTech, 2018. [Online]. Available: https://doi.org/10.5772/intechopen.71731
[22] W. Liu, M. Salzmann, and P. Fua, "Context-aware crowd counting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.