Extracting the fundamental diagram from aerial footage
Rafael Makrigiorgis, Panayiotis Kolios, Stelios Timotheou, Theocharis Theocharides, Christos G. Panayiotou
Abstract—Efficient traffic monitoring plays a fundamental role in successfully tackling congestion in transportation networks. Congestion is strongly correlated with two measurable characteristics, the demand and the network density, that impact the overall system behaviour. At large, this system behaviour is characterized through the fundamental diagram of a road segment, a region or the network. In this paper we devise an innovative way to obtain the fundamental diagram through aerial footage obtained from drone platforms. The derived methodology consists of three phases: vehicle detection, vehicle tracking and traffic state estimation. We elaborate on the algorithms developed for each of the three phases and demonstrate the applicability of the results in a real-world setting.
I. INTRODUCTION
Drones or Unmanned Aerial Vehicles (UAVs) have a broad range of applications ranging from remote sensing to deliveries [1]. They have also become so affordable that they are on a course to transform domains where infrastructure inspection and monitoring is crucial, including of course road traffic monitoring.

The great advantage of UAVs in road traffic monitoring is that they can capture footage over large areas from which novel information can be extracted. Unlike localized information from loop detectors and static cameras, processed UAV footage can reveal mobility and speed patterns over distances and time periods long enough that the underlying speed-flow-density relationship of lanes, road segments and regions can be revealed. More specifically, this speed-flow-density information can be used to extract the fundamental diagram, i.e., the relationship between the traffic flux and the traffic density (vehicles per hour versus vehicles per kilometre) [2]. It is well known in the transportation research community that this diagram reflects the macroscopic effects of traffic flux, velocity and density, and it is often used for predicting the characteristics of the road system behaviour [3]. Moreover, using the fundamental diagram (FD), traffic control can be applied, such as expanding the road infrastructure at highly congested regions or, more favourably, applying intelligent traffic light policies and novel traffic management schemes as suggested in [4] and examined in [5], [6].

In this paper we elaborate on how the FD can be extracted from video footage collated by UAV platforms through a pipeline of image processing, vehicle tracking and, finally, traffic state estimation. Thereafter, an example case study is presented where the pipeline has been implemented and validated using collected GPS traces as well as OBD (On-Board Diagnostics) measurements.

The rest of the paper is structured as follows. Section II includes related work and demonstrates our contributions with respect to the state-of-the-art. Section III provides a detailed derivation of our proposed pipeline and Section IV provides an experimental evaluation of this pipeline. Finally, Section V concludes with key findings and future research avenues.

R. Makrigiorgis, P. Kolios, S. Timotheou, T. Theocharides, and C. G. Panayiotou are with the KIOS Research Center for Intelligent Systems and Networks, and the Department of Electrical and Computer Engineering, University of Cyprus, {makrigiorgis.rafael, pkolios, timotheou.stelios, christosp, ttheocharides}@ucy.ac.cy

II. RELATED WORK
A plethora of recent works have looked in detail at the problem of road traffic state estimation (using the fundamental diagram), since it largely affects traffic management performance. This is due to the fact that the FD provides a low-complexity modelling framework for characterizing the relationship between the three main mobility parameters (i.e., speed, flow, and density). Briefly speaking, the FD consists of two distinct regimes that are separated by the critical density of the road traffic infrastructure under investigation. The two regimes are the free-flow regime, where traffic flows at its maximum speed (i.e., at free-flow speed), and the congested regime, where traffic experiences speed reduction as density keeps increasing. The concept of the FD has been empirically validated using real traffic data [7] and used to accurately estimate the outflow rate across a road network [8].

Traffic control techniques, including Gating and Perimeter control, base their policies on the FD to maximize the outflow of a region by controlling its external inflow rate so that the network remains in the free-flow regime [9], [10], [11], [12]. At the same time, Route Guidance methods aim at balancing the traffic load across the network by selecting routes based on the FD characteristics [13].

It is therefore evident that an accurate FD model is an essential building block for traffic management. The seminal paper in [3] discusses how GPS traces from a fleet of taxis were used to extract the FD model, while the more recent work in [14] discusses how the FD can be extracted from scarce sensor data. Hereafter, we derive a new and novel approach to extract the FD from aerial video footage that has become both easy and cheap to acquire.
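The two-regime structure described above can be illustrated with a simple triangular FD model, a common low-complexity functional form; the free-flow speed, critical density and jam density below are illustrative values, not measurements from this work:

```python
def triangular_fd_flow(density, v_free=100.0, rho_crit=25.0, rho_jam=150.0):
    """Flow (veh/h) as a function of density (veh/km) for a triangular FD.

    Below the critical density traffic is in the free-flow regime
    (flow = v_free * density); above it, flow decays linearly to zero
    at the jam density (congested regime).
    """
    if density <= rho_crit:
        return v_free * density
    # capacity is reached exactly at the critical density
    q_max = v_free * rho_crit
    # linear decay from capacity down to zero flow at the jam density
    return q_max * (rho_jam - density) / (rho_jam - rho_crit)

print(triangular_fd_flow(25.0))   # capacity: 2500.0 veh/h
print(triangular_fd_flow(150.0))  # fully jammed: 0.0 veh/h
```

The critical density where the two branches meet is exactly the point separating the free-flow and congested regimes discussed above.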
Relevant datasets available to date include the Stanford Drone Dataset [15], which contains trajectories of multiple road users extracted from drone video data. Also, the NGSIM (Next Generation SIMulation) Dataset [16] is a large vehicle dataset with high-quality traffic data destined to be used in traffic flow research. Evaluations and further analyses of the NGSIM dataset in [16], [17] show, however, a large number of false positive trajectory collisions and illogical vehicle speeds and accelerations. Specifically for traffic monitoring, the HighD dataset [18] has recently become available, which includes naturalistic vehicle trajectories recorded on German highways, intended for scenario-based testing in the safety validation of highly automated vehicles. HighD also extracts each vehicle's trajectory, size and manoeuvres using machine learning and computer vision algorithms.

Machine learning for detection and tracking of vehicles has been extensively researched in the recent past, and our work in [19] is part of that research domain, where convolutional neural networks (CNNs) for aerial image processing have been investigated.

III. FD EXTRACTION PIPELINE
As emphasized above, the aim of this work is to provide an end-to-end pipeline for extracting the fundamental diagram from aerial video footage. The three main components of this pipeline are image processing, vehicle tracking and FD extraction, as elaborated below.
A. Image Processing
Top-down aerial video footage is used as input to a training dataset for vehicle detection. A first step in this procedure is to extract images from the collected videos and either manually annotate vehicles or use tools such as DronetV3 [20] to automatically annotate images using templates of the object. In our case a total of vehicles were annotated out of minutes of highway traffic data (captured at m height and covering a road segment of about m length). For evaluating vehicle detections, Darknet YoloV2 [21] and DronetV3 (based on Tiny-YoloV3) were used. As a note, YoloV2 runs in an offline mode while DronetV3 is light enough to run in real-time.

B. Vehicle Tracking
Using vehicle detection algorithms, each object is pointed out using a bounding box with IDs that change over time due to the lack of accurate data association. To address this problem and be able to track vehicle trajectories correctly, in this work the Hungarian Algorithm [22] in combination with Kalman filtering [23] is used.

By employing the Hungarian algorithm (also known as the Kuhn-Munkres Algorithm), an object in the current frame is matched to an object in the previous frame using a score function. To associate objects in consecutive frames, the IoU (Intersection over Union) is employed, where the percentage of overlap between frames is used, as exemplified in Fig. 1. When IoU scores above a certain threshold are found, matching the previous bounding box with the current detected box results in a good representation of bounding box trajectories for each vehicle.

Figure 1. IoU sample scores.

Evidently, the performance of this approach degrades when vehicles dynamically change speed, take sharp turns, or when an occlusion occurs (e.g., a vehicle passing under a tree). To address the aforementioned cases, on top of the matching between successive frames, Kalman filtering is also employed. Kalman filtering is applied on every bounding box after a box has been matched using the Hungarian algorithm. When the association is made, predictions and corrections (updating the Kalman equations with real measurements) are made. To calculate the mean and covariance values, OpenCV's Kalman Filter library is used. An example of what the Kalman filter actually calculates is shown in Fig. 2.

In essence, Kalman filtering is employed to keep track of every vehicle crossing the field-of-view. In those cases where a vehicle dynamically changes speed or position, the IoU of the boxes between two frames may differ in such a way that it cannot be matched as the same vehicle. Instead, using Kalman filtering, predictions of the detected boxes are made (as shown in Fig. 3) and vehicle tracking becomes much more accurate. A pseudocode of the proposed approach can be found in Alg. 1.
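The IoU-based association step can be sketched as follows; for brevity, the exhaustive search below stands in for the Hungarian algorithm (it gives the same optimal matching for the small numbers of vehicles per frame, but at O(n!) instead of O(n^3) cost), and the function names and threshold are ours:

```python
from itertools import permutations

def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_boxes(prev, curr, threshold=0.3):
    """Associate previous and current boxes by maximizing total IoU;
    pairs scoring below the threshold are left unmatched (new or lost vehicles)."""
    best, best_score = [], -1.0
    for perm in permutations(range(len(curr)), min(len(prev), len(curr))):
        score = sum(iou(prev[i], curr[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = list(enumerate(perm)), score
    return [(i, j) for i, j in best if iou(prev[i], curr[j]) >= threshold]

# Two vehicles that each moved ~1 px; detection order changed between frames
prev = [(0, 0, 10, 10), (20, 0, 30, 10)]
curr = [(21, 0, 31, 10), (1, 0, 11, 10)]
print(match_boxes(prev, curr))  # [(0, 1), (1, 0)]
```

Keeping the matched index pairs from frame to frame is what preserves a stable ID per vehicle despite the detector returning boxes in arbitrary order.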
Figure 2. Kalman filter explanation (drone camera geometry, with the labelled parameters used in the GSD computation and the optimal state estimate).

Figure 3. Harpy's Kalman filter usage preview (best viewed in colour).
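The predict/correct cycle illustrated in Figs. 2 and 3 can be mirrored by a small per-coordinate constant-velocity filter; this dependency-free sketch is not the paper's OpenCV-based implementation, and the class name and noise settings q and r are illustrative:

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a box centre.
    State is [position, velocity]; measurement is the position only (H = [1, 0])."""

    def __init__(self, q=1e-2, r=1.0):
        self.x = [0.0, 0.0]                # state estimate [pos, vel]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # estimate covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self, dt=1.0):
        x, v = self.x
        self.x = [x + dt * v, v]           # x' = F x with F = [[1, dt], [0, 1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P' = F P F^T + Q  (Q = q * I as a simplification)
        self.P = [[p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def correct(self, z):
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        s = p00 + self.r                   # innovation covariance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        y = z - self.x[0]                  # innovation (measurement residual)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]

# Box centre x moving ~4 px/frame: the filter learns the velocity and
# can extrapolate ahead of the last measurement.
kf = Kalman1D()
for z in [10.0, 14.0, 18.0, 22.0]:
    kf.predict()
    kf.correct(z)
print(kf.predict())  # extrapolated next position, a little beyond 22
```

Running one such filter per coordinate (or a single 4-state filter, as OpenCV's `KalmanFilter` allows) is what lets the tracker bridge frames where the IoU match fails.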
C. Addressing Occlusion
When an occlusion occurs, vehicles stop being observed by the camera. In this case, the Kalman filter can still predict the next position of the vehicle, use that information to track the vehicle trajectory, and eventually match up with the detected vehicle when it becomes visible again.

The main challenge here is the fact that, since the Kalman filter is not updating its measurements, a motion model needs to be introduced. Hereafter we use a simple linear motion model where the displacement between the last set of frames in which the vehicle was detected is used to estimate subsequent vehicle positions.

To aid understanding, Fig. 4 provides an illustrative example. In case the predicted vehicle trajectory does not match with a detected vehicle over a certain period of time, the estimates are discarded. Clearly, predictions made over extended periods of time will substantially deviate from reality due to model imperfections. A pseudocode of the occlusion algorithm can be found in Algorithm 2.
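The simple linear motion model used during occlusion can be sketched as follows; the window of recent positions and the coasting limit are illustrative parameters, not values taken from this work:

```python
def predict_next(positions, max_coast=15, coasted=0):
    """Extrapolate the next box centre from the average per-frame displacement
    over the last few detected positions.

    Returns None once the track has coasted for too many frames without a
    matching detection, mirroring the rule of discarding stale estimates.
    """
    if coasted >= max_coast or len(positions) < 2:
        return None  # too little history, or the prediction was never re-confirmed
    # average displacement per frame over the recent window
    dx = (positions[-1][0] - positions[0][0]) / (len(positions) - 1)
    dy = (positions[-1][1] - positions[0][1]) / (len(positions) - 1)
    return (positions[-1][0] + dx, positions[-1][1] + dy)

# Vehicle moving ~4 px/frame to the right before passing under a tree:
track = [(100, 50), (104, 50), (108, 50), (112, 50)]
print(predict_next(track))  # (116.0, 50.0)
```

Each predicted position is fed back as the next "detection" until a real detection with sufficient IoU reappears, at which point the original ID is reattached (as in Fig. 4).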
Result: Vehicle bounding box and statistics
while boxes detected previously do
    calculate IoU with current detection;
    if best IoU score exists and is bigger than threshold then
        match boxes - previous with current box;
    else
        previous box not found;
    end
end
if previous box not found then
    initialize box;
else
    calculate average Euclidean distance from previous frames to current (only for the latest frames);
    calculate velocity of the vehicle in km/h;
    Kalman Predict();
    Kalman Correct() using real measurements;
    calculate direction of the vehicle, depending on the box;
    display bounding box and trajectory;
    save all the statistics of the vehicle to an array;
end
Algorithm 1: Harpy Detection and Tracking Algorithm
Figure 4. Harpy's Kalman filter usage for occlusion from a tree (the vehicle's ID, 207, remains the same).
Result: Display vehicle bounding box on occlusion
if vehicle stops being detected for less than X frames then
    if it is a new detection then
        ignore it;
    else
        Kalman Predict();
        calculate the x, y difference from previous frames;
        Kalman Correct();
        display the box;
    end
else
    remove the box from being active;
end
Algorithm 2: Harpy Occlusion Prevention

D. Velocity Estimation

To calculate the velocity of moving vehicles from detections, a mapping from pixels to real distances is needed. To obtain that relationship, the Ground Sample Distance (GSD) is employed, as mentioned in [24]. The Ground Sample Distance is the distance between the centre points of each sample taken of the ground. In simpler terms, the GSD is the representation, in real size, of each pixel on the 2D plane. Calculating it requires a set of parameters such as the UAV height, the camera sensor's height and width, the focal length of the camera, and the width and height of the video taken. Of course, these parameters need to be adjusted whenever the image size, UAV height or camera lenses are changed. The latter parameters can be taken from the manufacturer's technical specifications. Then, by calculating the GSD for height and width separately, the worst case is picked as our GSD. The equations are as follows:
GSD_h = (D × B) / (A × E),   GSD_w = (D × C) / (A × F),   GSD_final = GSD_worst = max(GSD_h, GSD_w)   (cm/pixel)

where A is the focal length, B and C are the camera sensor's height and width respectively, D is the drone's height, and E and F are the image's height and width respectively. A showcase of these parameters is shown in Fig. 2.

Once the GSD is calculated, a correct representation of centimetres to pixels (cm/px) is obtained and used to calculate the average Euclidean distance over consecutive frames. To calculate this average, the difference between the last f detected frames is accounted for. Then, given the frame rate of the video, the velocity of each vehicle trajectory can be calculated as follows:

Eu = sqrt((|x2 − x1| × GSD)^2 + (|y2 − y1| × GSD)^2)

Velocity = (Σ Eu) × FR / FD   (km/h)

where FR and FD are the frame rate and frame difference respectively, and Eu is the Euclidean distance between consecutive positions.

To verify these estimates, a simple real-life experiment was conducted using a test vehicle. Aerial footage of our test vehicle was collected while driving over a particular road segment. Video recordings were made at different heights, between and meters. At the same time, an OBD (on-board diagnostics unit) was used to capture timestamped readings of the vehicle speed, while a GPS tracker was used to take measurements of the position, and hence the velocity, of the vehicle as well. As it turns out from the comparison of these three methods, the proposed tracking algorithm was able to achieve an accuracy of in the speed estimates as compared to the OBD. The experimental results for the case of the 150 m height data acquisition can be seen in Fig. 5. The straight lines connecting the traces acquired from the aerial footage account for the time periods where the vehicle went out of, and back into, the field-of-view of the UAV during the experiment.

Figure 5. Comparison of actual speed measurements captured by the three complementary approaches. A small offset in the timing of the measurements is related to matching the video frames to the timestamped data collected from the OBD and GPS trackers.
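Putting the GSD and velocity equations together, a minimal implementation might look as follows; the sensor and flight parameters in the example are illustrative values, not those of the UAV used in this work:

```python
import math

def gsd_cm_per_px(height_m, focal_mm, sensor_h_mm, sensor_w_mm, img_h_px, img_w_px):
    """Worst-case Ground Sample Distance in cm/pixel (GSD_h vs GSD_w)."""
    gsd_h = (height_m * 100.0 * sensor_h_mm) / (focal_mm * img_h_px)
    gsd_w = (height_m * 100.0 * sensor_w_mm) / (focal_mm * img_w_px)
    return max(gsd_h, gsd_w)  # pick the worst (coarsest) case

def speed_kmh(p1, p2, gsd_cm, fps, frame_diff):
    """Vehicle speed from pixel displacement between two frames frame_diff apart."""
    # Euclidean distance in centimetres, per the Eu equation above
    dist_cm = math.hypot((p2[0] - p1[0]) * gsd_cm, (p2[1] - p1[1]) * gsd_cm)
    dist_km = dist_cm / 1e5
    hours = frame_diff / fps / 3600.0
    return dist_km / hours

# Illustrative numbers: 150 m altitude, small-sensor camera, 24 fps footage,
# a 30 px displacement observed over 5 frames.
gsd = gsd_cm_per_px(150, 4.5, 4.55, 6.17, 2160, 3840)
print(round(speed_kmh((0, 0), (30, 0), gsd, 24, 5), 1))  # 36.4 km/h
```

Averaging `speed_kmh` over the last f detections (rather than a single pair of frames) smooths out per-frame detection jitter, as the pipeline above does.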
E. Traffic Monitoring Statistics
In addition to velocity estimates, vehicle detections can also provide a number of additional measurements, including per-frame vehicle density and inflow/outflow vehicle counters. In effect, this information can be used to extract the fundamental diagram of a road segment and to characterize the traffic state. In summary, the following set of data were extracted using the proposed pipeline:

• X, Y position and timestamp of every vehicle for every detection.
• Vehicle velocity for each detection.
• Vehicle direction for each detection (left, right, top/bottom-right/left) based on the box difference between frames.
• Density of vehicles in each frame.
• Inflow/outflow of vehicles.

IV. EXPERIMENTAL EVALUATION
To demonstrate the applicability of the Harpy FD pipeline, 3 hours of aerial video was captured using DJI Mavic Enterprise UAVs flying at 150 m altitude above a single road segment in Nicosia, Cyprus. The training was done using more than images with more than vehicle annotations obtained from various own and online sources. Furthermore, both YoloV2 and DronetV3 networks were employed for performance comparison. The main reason for using DronetV3 is to investigate the trade-off between processing time and detection accuracy. As a note, even though the data was processed offline, having a smaller network can significantly reduce the processing time, especially when dealing with long-duration and high-quality video footage. Another reason for choosing YoloV2 instead of YoloV3 for offline detection was that the system configuration could not, for example, handle detection on K (or higher) resolution footage using YoloV3, due to lack of memory resources. Using YoloV2, we could extract detections from K resolution footage (or a k video downscaled to K), and since results were obtained from that process it was also considered.

Looking at Table I, it is clear that in terms of mAP (mean Average Precision) the two networks are not far apart, since YoloV2 does not have many more layers than DronetV3. However, the IoU percentage against the ground truth of the detections is much higher using YoloV2. That is another reason why YoloV2 was chosen for our Harpy dataset example. Having a better IoU means that the detection boxes are much more accurate in terms of vehicle shapes, resulting in more precise trajectories. The training and detection tests were done using a desktop computer with an i7-7800X 12-core CPU @ 3.5 GHz, 64 GB of RAM and an NVIDIA RTX 2080 11GB. The evaluation was done on the collected 3-hour video, from which more than vehicles were extracted.

Table I. Comparison of IoU and mAP for DronetV3 and YoloV2 (values use decimal commas).

               |            IoU (%)            |            mAP (%)
Threshold (%)  |   15      25      50      75  |   15      25      50      75
YoloV2         |   ,46   75,38   73,34   63,51 | 44,89   44,88   40,54   33,
DronetV3       |   ,82   51,89   48,69   21,54 | 38,53   41,68   35,89   10,

Figure 6. Speed-density relationship extracted using the Harpy dataset.
Figure 7. Fundamental diagram: the relationship between traffic flux and traffic density. Where the FD curve decays, traffic congestion starts to occur.
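The FD points in Figs. 6 and 7 follow from the hydrodynamic relation q = ρ·v: each frame's vehicle count and mean speed yield one (density, flow) sample. A sketch of this aggregation, with made-up measurements over a hypothetical 0.5 km segment, is:

```python
def fd_points(frames, segment_km):
    """Turn per-frame (vehicle_count, mean_speed_kmh) samples into
    (density, flow) points via the relation q = rho * v."""
    points = []
    for count, mean_speed in frames:
        rho = count / segment_km   # density in veh/km
        q = rho * mean_speed       # flow in veh/h
        points.append((rho, q))
    return points

# Made-up samples: light, moderate, and congested traffic on a 0.5 km segment
samples = [(5, 90.0), (20, 60.0), (60, 15.0)]
print(fd_points(samples, 0.5))
# densities 10 / 40 / 120 veh/km -> flows 900 / 2400 / 1800 veh/h
```

Note how the congested sample has the highest density but not the highest flow; scatter-plotting many such points over the 3-hour recording traces out the two regimes of the FD.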
V. CONCLUSION AND FUTURE WORK
This work develops a detailed pipeline for traffic monitoring using aerial video data, through detection and tracking of vehicles and, finally, traffic state estimation. The proposed pipeline was empirically evaluated using OBD and GPS measurements. Thereafter, the proposed pipeline was used to extract the FD from video collected from a particular road section in Nicosia, Cyprus.

As future work, we will explore online solutions based on lightweight deep learning algorithms (e.g., [25]) that could provide adequate accuracy and run in real-time on resource-limited onboard UAV processors. In addition, we aim to evaluate our solution over complementary datasets with varying parameters.

ACKNOWLEDGEMENT
This work is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE) and from the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.

REFERENCES

[1] E. Commission, "Commission staff working document: Towards a European strategy for the development of civil applications of remotely piloted aircraft systems (RPAS)," 2012.
[2] G. Puppo, M. Semplice, A. Tosin, and G. Visconti, "Fundamental diagrams in traffic flow: the case of heterogeneous kinetic models," arXiv preprint arXiv:1411.4988, 2014.
[3] N. Geroliminis and C. F. Daganzo, "Existence of urban-scale macroscopic fundamental diagrams: Some experimental findings," Transportation Research Part B: Methodological, vol. 42, no. 9, pp. 759-770, 2008.
[4] M. Papageorgiou, C. Diakaki, V. Dinopoulou, A. Kotsialos, and Y. Wang, "Review of road traffic control strategies," Proceedings of the IEEE, vol. 91, no. 12, pp. 2043-2067, 2003.
[5] A. Kouvelas, M. Saeedmanesh, and N. Geroliminis, "Enhancing model-based feedback perimeter control with data-driven online adaptive optimization," Transportation Research Part B: Methodological, vol. 96, pp. 26-45, 2017.
[6] C. Menelaou, S. Timotheou, P. Kolios, and C. Panayiotou, "Joint route guidance and demand management for multi-region traffic networks," pp. 2183-2188, 2019.
[7] N. Geroliminis, C. F. Daganzo et al., "Macroscopic modeling of traffic in cities," no. 07-0413, 2007.
[8] C. Daganzo, "Urban gridlock: Macroscopic modeling and mitigation approaches," Transportation Research Part B: Methodological, vol. 41, pp. 49-62, 2007.
[9] A. Mazloumian, N. Geroliminis, and D. Helbing, "The spatial variability of vehicle densities as determinant of urban network capacity," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4627-4647, 2010.
[10] N. Geroliminis and J. Sun, "Properties of a well-defined macroscopic fundamental diagram for urban traffic," Transportation Research Part B: Methodological, vol. 45, no. 3, pp. 605-617, 2011.
[11] M. Keyvan-Ekbatani, A. Kouvelas, I. Papamichail, and M. Papageorgiou, "Exploiting the fundamental diagram of urban networks for feedback-based gating," Transportation Research Part B: Methodological, vol. 46, no. 10, pp. 1393-1403, 2012.
[12] J. Haddad and N. Geroliminis, "On the stability of traffic perimeter control in two-region urban cities," Transportation Research Part B: Methodological, vol. 46, no. 9, pp. 1159-1176, 2012.
[13] M. Yildirimoglu and N. Geroliminis, "Approximating dynamic equilibrium conditions with macroscopic fundamental diagrams," Transportation Research Part B: Methodological, vol. 70, pp. 186-200, 2014.
[14] O. Q. Montoya and C. Canudas-de-Wit, "Estimation of fundamental diagrams in large-scale traffic networks with scarce sensor measurements," pp. 3457-3462, 2018.
[15] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, "Learning social etiquette: Human trajectory understanding in crowded scenes," pp. 549-565, 2016.
[16] B. Coifman and L. Li, "A critical evaluation of the next generation simulation (NGSIM) vehicle trajectory dataset," Transportation Research Part B: Methodological, vol. 105, pp. 362-377, 2017.
[17] M. Montanino and V. Punzo, "Trajectory data reconstruction and simulation-based validation against macroscopic traffic patterns," Transportation Research Part B: Methodological, vol. 80, pp. 82-106, 2015.
[18] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, "The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems," pp. 2118-2125, 2018.
[19] C. Kyrkou, S. Timotheou, P. Kolios, T. Theocharides, and C. G. Panayiotou, "Optimized vision-directed deployment of UAVs for rapid traffic monitoring," pp. 1-6, 2018.
[20] C. Kyrkou, G. Plastiras, T. Theocharides, S. I. Venieris, and C.-S. Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," pp. 967-972, 2018.
[21] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[22] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83-97, 1955.
[23] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
[24] P. Aero, "What is ground sample distance (GSD) and how does it affect your drone data?" Propeller, February 2018.
[25] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey,"