Extracting the fundamental diagram from aerial footage
Rafael Makrigiorgis, Panayiotis Kolios, Stelios Timotheou, Theocharis Theocharides, Christos G. Panayiotou
Abstract—Efficient traffic monitoring plays a fundamental role in successfully tackling congestion in transportation networks. Congestion is strongly correlated with two measurable characteristics, the demand and the network density, that impact the overall system behaviour. At large, this system behaviour is characterized through the fundamental diagram of a road segment, a region or the network. In this paper we devise an innovative way to obtain the fundamental diagram through aerial footage obtained from drone platforms. The derived methodology consists of three phases: vehicle detection, vehicle tracking and traffic state estimation. We elaborate on the algorithms developed for each of the three phases and demonstrate the applicability of the results in a real-world setting.
I. INTRODUCTION
Drones or Unmanned Aerial Vehicles (UAVs) have a broad range of applications ranging from remote sensing to deliveries [1]. They have also become so affordable that they are on a course to transform domains where infrastructure inspection and monitoring is crucial, including of course road traffic monitoring.

The great advantage of UAVs in road traffic monitoring is that they can capture footage over large areas from which novel information can be extracted. Unlike localized information from loop detectors and static cameras, processed UAV footage can reveal mobility and speed patterns over distances and time periods long enough that the underlying speed-flow-density relationship of lanes, road segments and regions can be revealed. More specifically, this speed-flow-density information can be used to extract the fundamental diagram, i.e., the relationship between the traffic flux and the traffic density (vehicles per hour versus vehicles per kilometre) [2]. It is well known in the transportation research community that this diagram reflects the macroscopic effects of traffic flux, velocity and density, and it is often used for predicting the characteristics of the road system behaviour [3]. Moreover, using the fundamental diagram (FD), traffic control can be applied, such as expanding the road infrastructure at highly congested regions or, more favourably, applying intelligent traffic light policies and novel traffic management schemes as suggested in [4] and examined in [5], [6].

In this paper we elaborate on how the FD can be extracted from video footage collated by UAV platforms through a pipeline of image processing, vehicle tracking and, finally, traffic state estimation. Thereafter, an example case study is presented where the pipeline has been implemented and validated using collected GPS traces as well as OBD (On-Board Diagnostics) measurements.

The rest of the paper is structured as follows. Section II includes related work and demonstrates our contributions with respect to the state-of-the-art. Section III provides a detailed derivation of our proposed pipeline and Section IV provides an experimental evaluation of this pipeline. Finally, Section V concludes with key findings and future research avenues.

R. Makrigiorgis, P. Kolios, S. Timotheou, T. Theocharides, and C. G. Panayiotou are with the KIOS Research Center for Intelligent Systems and Networks, and the Department of Electrical and Computer Engineering, University of Cyprus, {makrigiorgis.rafael, pkolios, timotheou.stelios, christosp, ttheocharides}@ucy.ac.cy

II. RELATED WORK
A plethora of recent works have looked in detail at the problem of road traffic state estimation (using the fundamental diagram), since it largely affects traffic management performance. This is due to the fact that the FD provides a low-complexity modelling framework for characterizing the relationship between the three main mobility parameters (i.e., speed, flow, and density). Briefly speaking, the FD consists of two distinct regimes that are separated by the critical density of the road traffic infrastructure under investigation. The two regimes are the free-flow regime, where traffic flows at its maximum speed (i.e., at free-flow speed), and the congested regime, where traffic experiences speed reduction as density keeps increasing. The concept of the FD has been empirically validated using real traffic data [7] and used to accurately estimate the outflow rate across a road network [8].

Traffic control techniques, including Gating and Perimeter control, base their policies on the FD to maximize the outflow of a region by controlling its external inflow rate so that the network remains in the free-flow regime [9], [10], [11], [12]. At the same time, Route Guidance methods aim at balancing the traffic load across the network by selecting routes based on the FD characteristics [13].

It is therefore evident that an accurate FD model is an essential building block for traffic management. The seminal paper in [3] discusses how GPS traces from a fleet of taxis were used to extract the FD model, while the more recent work in [14] discusses how the FD can be extracted from scarce sensor data. Hereafter, we derive a new and novel approach to extract the FD from aerial video footage that has become both easy and cheap to acquire.
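The two-regime structure described above can be illustrated with a simple triangular FD model, a common low-complexity functional form; the free-flow speed, critical density and jam density below are illustrative values, not measurements from this work:

```python
def triangular_fd_flow(density, v_free=100.0, rho_crit=25.0, rho_jam=150.0):
    """Flow (veh/h) as a function of density (veh/km) for a triangular FD.

    Below the critical density traffic is in the free-flow regime
    (flow = v_free * density); above it, flow decays linearly to zero
    at the jam density (congested regime).
    """
    if density <= rho_crit:
        return v_free * density
    # capacity is reached exactly at the critical density
    q_max = v_free * rho_crit
    # linear decay from capacity down to zero flow at the jam density
    return q_max * (rho_jam - density) / (rho_jam - rho_crit)

print(triangular_fd_flow(25.0))   # capacity: 2500.0 veh/h
print(triangular_fd_flow(150.0))  # fully jammed: 0.0 veh/h
```

The critical density where the two branches meet is exactly the point separating the free-flow and congested regimes discussed above.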
Relevant datasets available to date include the Stanford Drone Dataset [15], which contains trajectories of multiple road users extracted from drone video data. Also, the NGSIM (Next Generation SIMulation) Dataset [16] is a large vehicle dataset with high-quality traffic data destined to be used in traffic flow research. Evaluations and further analyses of the NGSIM dataset in [16], [17] show, however, a large number of false positive trajectory collisions and illogical vehicle speeds and accelerations. Specifically for traffic monitoring, the HighD dataset [18] has recently become available, which includes naturalistic vehicle trajectories recorded on German highways, intended for scenario-based testing in the safety validation of highly automated vehicles. HighD also extracts each vehicle's trajectory, size and manoeuvres using machine learning and computer vision algorithms.

Machine learning for detection and tracking of vehicles has been extensively researched in the recent past, and our work in [19] is part of that research domain, where convolutional neural networks (CNNs) for aerial image processing have been investigated.

III. FD EXTRACTION PIPELINE
As emphasized above, the aim of this work is to provide an end-to-end pipeline for extracting the fundamental diagram from aerial video footage. The three main components of this pipeline are image processing, vehicle tracking and FD extraction, as elaborated below.
A. Image Processing
Top-down aerial video footage is used as input to a training dataset for vehicle detection. A first step in this procedure is to extract images from the collected videos and either manually annotate vehicles or use tools such as DronetV3 [20] to automatically annotate images using templates of the object. In our case a total of vehicles were annotated out of minutes of highway traffic data (captured at m height and covering a road segment of about m length). For evaluating vehicle detections, Darknet YoloV2 [21] and DronetV3 (based on Tiny-YoloV3) were used. As a note, YoloV2 runs in an offline mode while DronetV3 is light enough to run in real-time.

B. Vehicle Tracking
Using vehicle detection algorithms, each object is pointed out using a bounding box with IDs that change over time due to the lack of accurate data association. To address this problem and be able to track vehicle trajectories correctly, in this work the Hungarian Algorithm [22] in combination with Kalman filtering [23] is used.

By employing the Hungarian algorithm (also known as the Kuhn-Munkres Algorithm), an object in the current frame is matched to an object in the previous frame using a score function. To associate objects in consecutive frames, the IoU (Intersection over Union) is employed, where the percentage of overlap between frames is used, as exemplified in Fig. 1. When IoU scores above a certain threshold are found, matching the previous bounding box with the current detected box results in a good representation of bounding box trajectories for each vehicle.

Figure 1. IoU sample scores.

Evidently, the performance of this approach degrades when vehicles dynamically change speed, take sharp turns, or when an occlusion occurs (e.g., a vehicle passing under a tree). To address the aforementioned cases, on top of the matching between successive frames, Kalman filtering is also employed. Kalman filtering is applied on every bounding box after a box has been matched using the Hungarian algorithm. When the association is made, predictions and corrections (updating the Kalman equations with real measurements) are made. To calculate the mean and covariance values, OpenCV's Kalman Filter library is used. An example of what the Kalman filter actually calculates is shown in Fig. 2.

In essence, Kalman filtering is employed to keep track of every vehicle crossing the field-of-view. In those cases where a vehicle dynamically changes speed or position, the IoU of the boxes between two frames may differ in such a way that it cannot be matched as the same vehicle. Instead, using Kalman filtering, predictions of the detected boxes are made (as shown in Fig. 3) and vehicle tracking becomes much more accurate. A pseudocode of the proposed approach can be found in Alg. 1.
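The IoU-based association step can be sketched as follows; for brevity, the exhaustive search below stands in for the Hungarian algorithm (it gives the same optimal matching for the small numbers of vehicles per frame, but at O(n!) instead of O(n^3) cost), and the function names and threshold are ours:

```python
from itertools import permutations

def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_boxes(prev, curr, threshold=0.3):
    """Associate previous and current boxes by maximizing total IoU;
    pairs scoring below the threshold are left unmatched (new or lost vehicles)."""
    best, best_score = [], -1.0
    for perm in permutations(range(len(curr)), min(len(prev), len(curr))):
        score = sum(iou(prev[i], curr[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = list(enumerate(perm)), score
    return [(i, j) for i, j in best if iou(prev[i], curr[j]) >= threshold]

# Two vehicles that each moved ~1 px; detection order changed between frames
prev = [(0, 0, 10, 10), (20, 0, 30, 10)]
curr = [(21, 0, 31, 10), (1, 0, 11, 10)]
print(match_boxes(prev, curr))  # [(0, 1), (1, 0)]
```

Keeping the matched index pairs from frame to frame is what preserves a stable ID per vehicle despite the detector returning boxes in arbitrary order.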
Figure 2. Kalman filter explanation (drone camera geometry, with the labelled parameters used in the GSD computation and the optimal state estimate).

Figure 3. Harpy's Kalman filter usage preview (best viewed in colour).
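The predict/correct cycle illustrated in Figs. 2 and 3 can be mirrored by a small per-coordinate constant-velocity filter; this dependency-free sketch is not the paper's OpenCV-based implementation, and the class name and noise settings q and r are illustrative:

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a box centre.
    State is [position, velocity]; measurement is the position only (H = [1, 0])."""

    def __init__(self, q=1e-2, r=1.0):
        self.x = [0.0, 0.0]                # state estimate [pos, vel]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # estimate covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self, dt=1.0):
        x, v = self.x
        self.x = [x + dt * v, v]           # x' = F x with F = [[1, dt], [0, 1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P' = F P F^T + Q  (Q = q * I as a simplification)
        self.P = [[p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def correct(self, z):
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        s = p00 + self.r                   # innovation covariance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        y = z - self.x[0]                  # innovation (measurement residual)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]

# Box centre x moving ~4 px/frame: the filter learns the velocity and
# can extrapolate ahead of the last measurement.
kf = Kalman1D()
for z in [10.0, 14.0, 18.0, 22.0]:
    kf.predict()
    kf.correct(z)
print(kf.predict())  # extrapolated next position, a little beyond 22
```

Running one such filter per coordinate (or a single 4-state filter, as OpenCV's `KalmanFilter` allows) is what lets the tracker bridge frames where the IoU match fails.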
C. Addressing Occlusion
When an occlusion occurs, vehicles stop being observed by the camera. In this case, the Kalman filter can still predict the next position of the vehicle, use that information to track the vehicle trajectory, and eventually match up with the detected vehicle when it becomes visible again.

The main challenge here is the fact that, since the Kalman filter is not updating its measurements, a motion model needs to be introduced. Hereafter we use a simple linear motion model where the displacement between the last set of frames in which the vehicle was detected is used to estimate subsequent vehicle positions.

To aid understanding, Fig. 4 provides an illustrative example. In case the predicted vehicle trajectory does not match with a detected vehicle over a certain period of time, the estimates are discarded. Clearly, predictions made over extended periods of time will substantially deviate from reality due to model imperfections. A pseudocode of the occlusion algorithm can be found in Algorithm 2.
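The simple linear motion model used during occlusion can be sketched as follows; the window of recent positions and the coasting limit are illustrative parameters, not values taken from this work:

```python
def predict_next(positions, max_coast=15, coasted=0):
    """Extrapolate the next box centre from the average per-frame displacement
    over the last few detected positions.

    Returns None once the track has coasted for too many frames without a
    matching detection, mirroring the rule of discarding stale estimates.
    """
    if coasted >= max_coast or len(positions) < 2:
        return None  # too little history, or the prediction was never re-confirmed
    # average displacement per frame over the recent window
    dx = (positions[-1][0] - positions[0][0]) / (len(positions) - 1)
    dy = (positions[-1][1] - positions[0][1]) / (len(positions) - 1)
    return (positions[-1][0] + dx, positions[-1][1] + dy)

# Vehicle moving ~4 px/frame to the right before passing under a tree:
track = [(100, 50), (104, 50), (108, 50), (112, 50)]
print(predict_next(track))  # (116.0, 50.0)
```

Each predicted position is fed back as the next "detection" until a real detection with sufficient IoU reappears, at which point the original ID is reattached (as in Fig. 4).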
Result: Vehicle bounding box and statistics
while boxes detected previously do
    calculate IoU with current detection;
    if best IoU score exists and is bigger than threshold then
        match boxes - previous with current box;
    else
        previous box not found;
    end
end
if previous box not found then
    initialize box;
else
    calculate average Euclidean distance from previous frames to current (only for the latest frames);
    calculate velocity of the vehicle in km/h;
    Kalman Predict();
    Kalman Correct() using real measurements;
    calculate direction of the vehicle, depending on the box;
    display bounding box and trajectory;
    save all the statistics of the vehicle to an array;
end
Algorithm 1: Harpy Detection and Tracking Algorithm
Figure 4. Harpy's Kalman filter usage for occlusion from a tree (the vehicle's ID, 207, remains the same).
Result: Display vehicle bounding box on occlusion
if vehicle stops being detected for less than X frames then
    if it is a new detection then
        ignore it;
    else
        Kalman Predict();
        calculate the x, y difference from previous frames;
        Kalman Correct();
        display the box;
    end
else
    remove the box from being active;
end
Algorithm 2: Harpy Occlusion Prevention

D. Velocity Estimation

To calculate the velocity of moving vehicles from detections, a mapping from pixels to real distances is needed. To obtain that relationship, the Ground Sample Distance (GSD) is employed, as mentioned in [24]. The Ground Sample Distance is the distance between the centre points of each sample taken of the ground. In simpler terms, the GSD is the representation, in real size, of each pixel on the 2D plane. Calculating it requires a set of parameters such as the UAV height, the camera sensor's height and width, the focal length of the camera, and the width and height of the video taken. Of course, these parameters need to be adjusted whenever the image size, UAV height or camera lenses are changed. The latter parameters can be taken from the manufacturer's technical specifications. Then, by calculating the GSD for height and width separately, the worst case is picked as our GSD. The equations are as follows:
GSD_h = (D × B) / (A × E),   GSD_w = (D × C) / (A × F),   GSD_final = GSD_worst = max(GSD_h, GSD_w)   (cm/pixel)

where A is the focal length, B and C are the camera sensor's height and width respectively, D is the drone's height, and E and F are the image's height and width respectively. A showcase of these parameters is shown in Fig. 2.

Once the GSD is calculated, a correct representation of centimetres to pixels (cm/px) is obtained and used to calculate the average Euclidean distance over consecutive frames. To calculate this average, the difference between the last f detected frames is accounted for. Then, given the frame rate of the video, the velocity of each vehicle trajectory can be calculated as follows:

Eu = sqrt((|x2 − x1| × GSD)^2 + (|y2 − y1| × GSD)^2)

Velocity = (Σ Eu) × FR / FD   (km/h)

where FR and FD are the frame rate and frame difference respectively, and Eu is the Euclidean distance between consecutive positions.

To verify these estimates, a simple real-life experiment was conducted using a test vehicle. Aerial footage of our test vehicle was collected while driving over a particular road segment. Video recordings were made at different heights, between and meters. At the same time, an OBD (on-board diagnostics unit) was used to capture timestamped readings of the vehicle speed, while a GPS tracker was used to take measurements of the position, and hence the velocity, of the vehicle as well. As it turns out from the comparison of these three methods, the proposed tracking algorithm was able to achieve an accuracy of in the speed estimates as compared to the OBD. The experimental results for the case of the 150 m height data acquisition can be seen in Fig. 5. The straight lines connecting the traces acquired from the aerial footage account for the time periods where the vehicle went out of, and back into, the field-of-view of the UAV during the experiment.

Figure 5. Comparison of actual speed measurements captured by the three complementary approaches. A small offset in the timing of the measurements is related to matching the video frames to the timestamped data collected from the OBD and GPS trackers.
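Putting the GSD and velocity equations together, a minimal implementation might look as follows; the sensor and flight parameters in the example are illustrative values, not those of the UAV used in this work:

```python
import math

def gsd_cm_per_px(height_m, focal_mm, sensor_h_mm, sensor_w_mm, img_h_px, img_w_px):
    """Worst-case Ground Sample Distance in cm/pixel (GSD_h vs GSD_w)."""
    gsd_h = (height_m * 100.0 * sensor_h_mm) / (focal_mm * img_h_px)
    gsd_w = (height_m * 100.0 * sensor_w_mm) / (focal_mm * img_w_px)
    return max(gsd_h, gsd_w)  # pick the worst (coarsest) case

def speed_kmh(p1, p2, gsd_cm, fps, frame_diff):
    """Vehicle speed from pixel displacement between two frames frame_diff apart."""
    # Euclidean distance in centimetres, per the Eu equation above
    dist_cm = math.hypot((p2[0] - p1[0]) * gsd_cm, (p2[1] - p1[1]) * gsd_cm)
    dist_km = dist_cm / 1e5
    hours = frame_diff / fps / 3600.0
    return dist_km / hours

# Illustrative numbers: 150 m altitude, small-sensor camera, 24 fps footage,
# a 30 px displacement observed over 5 frames.
gsd = gsd_cm_per_px(150, 4.5, 4.55, 6.17, 2160, 3840)
print(round(speed_kmh((0, 0), (30, 0), gsd, 24, 5), 1))  # 36.4 km/h
```

Averaging `speed_kmh` over the last f detections (rather than a single pair of frames) smooths out per-frame detection jitter, as the pipeline above does.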
E. Traffic Monitoring Statistics
In addition to velocity estimates, vehicle detections can also provide a number of additional measurements, including per-frame vehicle density and inflow/outflow vehicle counters. In effect, this information can be used to extract the fundamental diagram of a road segment and to characterize the traffic state. In summary, the following set of data were extracted using the proposed pipeline:

• X, Y position and timestamp of every vehicle for every detection.
• Vehicle velocity for each detection.
• Vehicle direction for each detection (left, right, top/bottom-right/left) based on the box difference between frames.
• Density of vehicles in each frame.
• Inflow/outflow of vehicles.

IV. EXPERIMENTAL EVALUATION
To demonstrate the applicability of the Harpy FD pipeline, 3 hours of aerial video was captured using DJI Mavic Enterprise UAVs flying at 150 m altitude above a single road segment in Nicosia, Cyprus. The training was done using more than images with more than vehicle annotations obtained from various own and online sources. Furthermore, both YoloV2 and DronetV3 networks were employed for performance comparison. The main reason for using DronetV3 is to investigate the trade-off between processing time and detection accuracy. As a note, even though the data was processed offline, having a smaller network can significantly reduce the processing time, especially when dealing with long-duration and high-quality video footage. Another reason for choosing YoloV2 instead of YoloV3 for offline detection was that the system configuration could not, for example, handle detection on K (or higher) resolution footage using YoloV3, due to lack of memory resources. Using YoloV2, we could extract detections from K resolution footage (or a k video downscaled to K), and since results were obtained from that process it was also considered.

Looking at Table I, it is clear that in terms of mAP (mean Average Precision) the two networks are not far apart, since YoloV2 does not have many more layers than DronetV3. However, the IoU percentage against the ground truth of the detections is much higher using YoloV2. That is another reason why YoloV2 was chosen for our Harpy dataset example. Having a better IoU means that the detection boxes are much more accurate in terms of vehicle shapes, resulting in more precise trajectories. The training and detection tests were done using a desktop computer with an i7-7800X 12-core CPU @ 3.5 GHz, 64 GB of RAM and an NVIDIA RTX 2080 11GB. The evaluation was done on the collected 3-hour video, from which more than vehicles were extracted.

Table I. Comparison of IoU and mAP for DronetV3 and YoloV2 (values use decimal commas).

               |            IoU (%)            |            mAP (%)
Threshold (%)  |   15      25      50      75  |   15      25      50      75
YoloV2         |   ,46   75,38   73,34   63,51 | 44,89   44,88   40,54   33,
DronetV3       |   ,82   51,89   48,69   21,54 | 38,53   41,68   35,89   10,

Figure 6. Speed-density relationship extracted using the Harpy dataset.
Figure 7. Fundamental diagram: the relationship between traffic flux and traffic density. Where the FD curve decays, traffic congestion starts to occur.
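The FD points in Figs. 6 and 7 follow from the hydrodynamic relation q = ρ·v: each frame's vehicle count and mean speed yield one (density, flow) sample. A sketch of this aggregation, with made-up measurements over a hypothetical 0.5 km segment, is:

```python
def fd_points(frames, segment_km):
    """Turn per-frame (vehicle_count, mean_speed_kmh) samples into
    (density, flow) points via the relation q = rho * v."""
    points = []
    for count, mean_speed in frames:
        rho = count / segment_km   # density in veh/km
        q = rho * mean_speed       # flow in veh/h
        points.append((rho, q))
    return points

# Made-up samples: light, moderate, and congested traffic on a 0.5 km segment
samples = [(5, 90.0), (20, 60.0), (60, 15.0)]
print(fd_points(samples, 0.5))
# densities 10 / 40 / 120 veh/km -> flows 900 / 2400 / 1800 veh/h
```

Note how the congested sample has the highest density but not the highest flow; scatter-plotting many such points over the 3-hour recording traces out the two regimes of the FD.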
V. CONCLUSION AND FUTURE WORK
This work develops a detailed pipeline for traffic monitoring using aerial video data, through detection and tracking of vehicles and, finally, traffic state estimation. The proposed pipeline was empirically evaluated using OBD and GPS measurements. Thereafter, the proposed pipeline was used to extract the FD from video collected from a particular road section in Nicosia, Cyprus.

As future work, we will explore online solutions based on lightweight deep learning algorithms (e.g., [25]) that could provide adequate accuracy and run in real-time on resource-limited onboard UAV processors. In addition, we aim to evaluate our solution over complementary datasets with varying parameters.

ACKNOWLEDGEMENT
This work is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE) and from the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.

REFERENCES

[1] E. Commission, "Commission staff working document: Towards a European strategy for the development of civil applications of remotely piloted aircraft systems (RPAS)," 2012.
[2] G. Puppo, M. Semplice, A. Tosin, and G. Visconti, "Fundamental diagrams in traffic flow: the case of heterogeneous kinetic models," arXiv preprint arXiv:1411.4988, 2014.
[3] N. Geroliminis and C. F. Daganzo, "Existence of urban-scale macroscopic fundamental diagrams: Some experimental findings," Transportation Research Part B: Methodological, vol. 42, no. 9, pp. 759-770, 2008.
[4] M. Papageorgiou, C. Diakaki, V. Dinopoulou, A. Kotsialos, and Y. Wang, "Review of road traffic control strategies," Proceedings of the IEEE, vol. 91, no. 12, pp. 2043-2067, 2003.
[5] A. Kouvelas, M. Saeedmanesh, and N. Geroliminis, "Enhancing model-based feedback perimeter control with data-driven online adaptive optimization," Transportation Research Part B: Methodological, vol. 96, pp. 26-45, 2017.
[6] C. Menelaou, S. Timotheou, P. Kolios, and C. Panayiotou, "Joint route guidance and demand management for multi-region traffic networks," pp. 2183-2188, 2019.
[7] N. Geroliminis, C. F. Daganzo et al., "Macroscopic modeling of traffic in cities," no. 07-0413, 2007.
[8] C. Daganzo, "Urban gridlock: Macroscopic modeling and mitigation approaches," Transportation Research Part B: Methodological, vol. 41, pp. 49-62, 2007.
[9] A. Mazloumian, N. Geroliminis, and D. Helbing, "The spatial variability of vehicle densities as determinant of urban network capacity," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4627-4647, 2010.
[10] N. Geroliminis and J. Sun, "Properties of a well-defined macroscopic fundamental diagram for urban traffic," Transportation Research Part B: Methodological, vol. 45, no. 3, pp. 605-617, 2011.
[11] M. Keyvan-Ekbatani, A. Kouvelas, I. Papamichail, and M. Papageorgiou, "Exploiting the fundamental diagram of urban networks for feedback-based gating," Transportation Research Part B: Methodological, vol. 46, no. 10, pp. 1393-1403, 2012.
[12] J. Haddad and N. Geroliminis, "On the stability of traffic perimeter control in two-region urban cities," Transportation Research Part B: Methodological, vol. 46, no. 9, pp. 1159-1176, 2012.
[13] M. Yildirimoglu and N. Geroliminis, "Approximating dynamic equilibrium conditions with macroscopic fundamental diagrams," Transportation Research Part B: Methodological, vol. 70, pp. 186-200, 2014.
[14] O. Q. Montoya and C. Canudas-de-Wit, "Estimation of fundamental diagrams in large-scale traffic networks with scarce sensor measurements," pp. 3457-3462, 2018.
[15] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, "Learning social etiquette: Human trajectory understanding in crowded scenes," pp. 549-565, 2016.
[16] B. Coifman and L. Li, "A critical evaluation of the next generation simulation (NGSIM) vehicle trajectory dataset," Transportation Research Part B: Methodological, vol. 105, pp. 362-377, 2017.
[17] M. Montanino and V. Punzo, "Trajectory data reconstruction and simulation-based validation against macroscopic traffic patterns," Transportation Research Part B: Methodological, vol. 80, pp. 82-106, 2015.
[18] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, "The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems," pp. 2118-2125, 2018.
[19] C. Kyrkou, S. Timotheou, P. Kolios, T. Theocharides, and C. G. Panayiotou, "Optimized vision-directed deployment of UAVs for rapid traffic monitoring," pp. 1-6, 2018.
[20] C. Kyrkou, G. Plastiras, T. Theocharides, S. I. Venieris, and C.-S. Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," pp. 967-972, 2018.
[21] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[22] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83-97, 1955.
[23] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
[24] P. Aero, "What is ground sample distance (GSD) and how does it affect your drone data?" Propeller, February 2018.
[25] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey,"