A Simulation-based End-to-End Learning Framework for Evidential Occupancy Grid Mapping
Raphael van Kempen, Bastian Lampe, Timo Woopen, Lutz Eckstein
Abstract — Evidential occupancy grid maps (OGMs) are a popular representation of the environment of automated vehicles. Inverse sensor models (ISMs) are used to compute OGMs from sensor data such as lidar point clouds. Geometric ISMs show a limited performance when estimating states in unobserved but inferable areas and have difficulties dealing with ambiguous input. Deep learning-based ISMs face the challenge of limited training data and often do not provide uncertainty quantification yet. We propose a deep learning-based framework for learning an OGM algorithm which is both capable of quantifying uncertainty and does not rely on manually labeled data. Results on synthetic and on real-world data show superiority over other approaches.
I. INTRODUCTION

Automated vehicles rely on an accurate model of their environment for planning safe and efficient behavior. Depending on the chosen representation of the environment in this model, different perception algorithms and different sensor modalities are best suited, each coming with corresponding advantages and disadvantages. Often, several different approaches are combined to compensate for the disadvantages of one perception algorithm with the advantages of another.

One common representation of the dynamic environment is object lists. They contain the state of all objects detected and tracked by the vehicle. Several methods for detection and tracking of objects in camera, radar and lidar data have been published during the past years [1], [2], [3]. A drawback of object lists is that the perception algorithms generating them can often only detect a fixed set of predefined object classes. Objects of these classes need to be explicitly contained in the perception algorithms' training data. Since there exist extremely many classes of objects in the world, it is difficult to ensure that all relevant objects are accounted for.

Occupancy grid mapping algorithms, which are agnostic to the specific class of an object, can compensate for this disadvantage by reducing their task to simply assigning an occupancy state to each cell in a grid, which describes a defined area around a vehicle [4], [5]. They usually take distance measurements, e.g. from a lidar sensor, as input. The resulting occupancy grid maps (OGMs) are, for example, used in the automated vehicles developed in the UNICARagil project [6], [7], from which this paper also originates.

To determine cell occupancy states from sensor data, an inverse sensor model (ISM) is required.

*This research is accomplished within the project "UNICARagil" (FKZ 16EMO0289). We acknowledge the financial support for the project by the Federal Ministry of Education and Research of Germany (BMBF).

The authors are with the Institute for Automotive Engineering (ika), RWTH Aachen University, 52074 Aachen, Germany. {firstname.lastname}@ika.rwth-aachen.de
Fig. 1. A deep learning-based inverse sensor model predicts an evidential occupancy grid map (right) from a synthetic lidar point cloud (left), approximating the ground truth (middle). Grid cells with a large belief mass for the state Free are colored green, those with a large belief mass for the state Occupied are colored red. The bigger the belief mass, the larger the respective value in the color channel. Black represents the maximum of epistemic uncertainty. In the images, we marked a motorcyclist (yellow circle), a truck (blue circle) and a pedestrian located next to a parked car (red circle).

In the past, geometric models have mostly been used [5]. These approaches are often suitable for static and flat environments, but fail in dynamic and uneven environments. Recent works also propose deep neural networks for this task, so-called deep ISMs [8], [9]. As usual with supervised learning, the gathering of training data poses a challenge here. Both [8] and [9] use cross-modal training: they make use of a lidar-based geometric ISM to generate training data for a radar-based deep ISM. This approach enables the learned model to infer occupancy information which cannot be deduced with the geometric approach alone. Since the model is trained with data that does not constitute ground truth but only an estimate from the lidar-based geometric ISM, the trained models suffer from the restrictions of the lidar-based ISM.

If cross-modal training is discarded, one can make use of manually labeled data for training deep ISMs. This approach is infeasible for many, though, because of the associated amount of manual labeling work. Fortunately, simulation software for automated driving is rapidly evolving. Better sensor models and the possibility to automatically generate ground-truth data make this approach viable. With more realistic synthetic data, the reality gap is closing and generalization of neural networks from synthetic to real-world data becomes possible.

In any event, there is a need for the quantification of uncertainty in the estimates of perception algorithms. The capability of neural networks to output a measure of their confidence in a prediction is not yet reflected in many deep learning-based approaches.

Our work takes into account all of the aforementioned challenges and contributes a framework in which they are dealt with.

II. CONTRIBUTION

We present an end-to-end learning framework for training deep learning-based inverse lidar sensor models using synthetic data. First, a method for generating training samples consisting of lidar point clouds as input data and evidential ground-truth OGMs as labels is presented. Second, we propose a suitable neural network architecture that extends the popular PointPillars architecture [3] with an evidential prediction head (EPH). The EPH is capable of estimating an evidential OGM. Finally, we evaluate the performance of our approach both on synthetic data and on real-world data. We show that the trained model is able to generalize to different synthetic environments and also produces promising results on real-world data.

III. BACKGROUND

In the following, the concept of OGMs and an overview of current approaches for computing OGMs through inverse sensor models (ISMs) is presented. A geometric ISM will be the baseline for the evaluation of the deep ISM proposed in this work. Additionally, a brief introduction into evidence theory and subjective logic is given, as these are the basis for the loss function that is used with the proposed deep neural network.

A. Occupancy Grid Maps
OGMs as introduced in [4] divide the vehicle's environment into discrete cells containing occupancy information. The occupancy state of each cell at time $k$ can be represented as a binomial random variable $o_k \in \{O, F\}$ ($O$: Occupied; $F$: Free). The probability of each cell being occupied at time $k$ can be derived from the current measurement $z_k$ using an inverse sensor model (ISM) $p(o_k \mid z_k)$. Often, a binary Bayes filter is used to create an OGM from a set of distance measurements [5]. This approach has some deficiencies. First, the binary Bayes filter relies on the Markov assumption that there is no temporal correlation of the occupancy state and that the state does not change over time, which is not adequate for a dynamically changing environment. Additionally, by representing the occupancy as a probability value $p_k(o_k)$, it cannot be distinguished whether a cell is uncertain because it cannot be observed, e.g. due to occlusions, or because of conflicting evidence, as both cases are described with $p_k(o_k) \approx 0.5$. This challenge can be tackled with evidence theory as described in Section III-C.
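As an illustration, a minimal log-odds implementation of such a binary Bayes filter update is sketched below; the grid size, prior and example probabilities are arbitrary choices for demonstration, not values from our setup.

```python
import numpy as np

def logit(p):
    """Convert a probability to log-odds."""
    return np.log(p / (1.0 - p))

def bayes_filter_update(grid_logodds, ism_probs, prior=0.5):
    """One binary Bayes filter step in log-odds form.

    grid_logodds: (H, W) current map in log-odds.
    ism_probs:    (H, W) per-cell occupancy probabilities
                  p(o_k | z_k) from an inverse sensor model.
    """
    # Adding the measurement's log-odds (minus the prior's) fuses the
    # new evidence with the existing map.
    return grid_logodds + logit(ism_probs) - logit(prior)

# Usage: start from an all-unknown map and fuse one measurement.
grid = np.zeros((200, 200))               # log-odds 0 <=> p = 0.5
z = np.full((200, 200), 0.5)              # no information anywhere ...
z[100, 120] = 0.9                         # ... except for one cell
grid = bayes_filter_update(grid, z)
p_occ = 1.0 - 1.0 / (1.0 + np.exp(grid))  # back to probabilities
```

Note that a cell that receives conflicting measurements and a cell that is never observed both end up near $p \approx 0.5$, which is exactly the ambiguity discussed above.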
B. Inverse Sensor Models

While a sensor model describes how the environment is represented in sensor data, an inverse sensor model (ISM) is required to reconstruct information about the environment from sensor data, as is done in grid mapping, for example.
Geometric ISMs process distance measurements and calculate a probability distribution for the occupancy of cells at the location of the measurement as well as for cells in between the sensor and the measurement. These models consider the sensor's inaccuracy and are usually hand-crafted, but can also be learned [5]. Hand-designed methods assume a ground model to separate ground from obstacles in the measurement. By filtering techniques, such as Bayesian filtering, the information gathered from each data point in the measurement can be combined into one measurement OGM. By combining the latter with a previous estimate of the OGM, a new estimate with reduced uncertainty can be found [5].
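A minimal sketch of such a hand-designed geometric ISM is given below, assuming a simple height-threshold ground model and coarse ray sampling; all thresholds and names are illustrative and do not correspond to the baseline used in Section VI.

```python
import numpy as np

def geometric_ism(points, cell_size=0.1, grid_dim=200,
                  z_min=0.3, z_max=2.0):
    """Single-shot geometric ISM: cells holding an above-ground
    reflection become occupied, cells on the ray between sensor and
    reflection become free, everything else stays unknown (p = 0.5).

    points: (N, 3) lidar reflections in the sensor frame.
    """
    p = np.full((grid_dim, grid_dim), 0.5)        # unknown prior
    origin = grid_dim // 2                        # sensor at map center
    for x, y, z in points:
        if not (z_min <= z <= z_max):
            continue                              # crude ground model
        ci = int(x / cell_size) + origin
        cj = int(y / cell_size) + origin
        if not (0 <= ci < grid_dim and 0 <= cj < grid_dim):
            continue
        # Free space along the ray (coarse sampling instead of Bresenham).
        steps = int(np.hypot(ci - origin, cj - origin))
        for s in range(steps):
            fi = origin + (ci - origin) * s // max(steps, 1)
            fj = origin + (cj - origin) * s // max(steps, 1)
            p[fi, fj] = min(p[fi, fj], 0.3)       # evidence for "free"
        p[ci, cj] = 0.9                           # evidence for "occupied"
    return p
```

The per-measurement grids produced this way can then be fused over time with the Bayes filter update sketched in Section III-A.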
Deep ISMs are deep learning-based inverse sensor models which take distance measurements as input for a deep neural network that is used to predict an OGM. For this purpose, tensor-like representations for the sensor data as input and the OGM as output must be found such that they can be processed.

In [9], a deep radar ISM is introduced. Multiple radar measurements are combined into one bird's-eye-view image serving as input for a semantic segmentation task with the classes Occupied, Free and Unobserved. Training data is created from OGMs calculated from lidar measurements using a geometric ISM.

An end-to-end learning framework to predict future OGMs from lidar measurements using a recurrent neural network (RNN) is presented in [10]. The model is trained using unsupervised learning by creating training samples of OGMs with previous measurements. They use a naive model to create the ground-truth OGM by solely treating all reflection points at a height of … to … meters as obstacles. This only works in flat environments and with small pitch and roll angles of the ego-vehicle. They encode the lidar measurements into two matrices covering the vehicle's environment with binary values describing the visibility and the occupancy of each cell. The OGM is encoded as a binary matrix, which does not sufficiently allow describing uncertainty.

In [8], radar data is transformed into a two-channel bird's-eye-view image with one channel containing static and the other containing dynamic detections. The OGM is represented as a three-channel image containing belief masses (cf. Section III-C) for the states Occupied, Free and Unknown. The task is treated as an image segmentation problem. In [11], the authors combine a deep radar ISM with geometric lidar and radar ISMs to increase the perception field and to reduce the time needed to populate the occupancy grid.

The deep neural network presented in [12] predicts a bird's-eye-view image containing occupancy, object class and motion information in one shot. A sequence of lidar point clouds encoded as binary matrices, which are stacked along a third dimension, is used as input data. The model is trained using labeled data from the nuScenes data set [13].

A methodology capable of transforming segmented images from vehicle cameras to a semantic OGM is presented in [14]. They also use synthetic data and try to bridge the reality gap by using segmented images as an intermediate representation between real-world and synthetic sensor data.

C. Evidence Theory
Evidence theory as introduced by Dempster and Shafer (DST) [15] can be understood as a generalization of Bayesian probability theory [16]. It allows the explicit consideration of epistemic uncertainty and has also been used with OGMs [17]. Using DST, belief masses are assigned to all subsets of the frame of discernment $\Theta$. For cells in an evidential OGM, this can consist of all possible and mutually exclusive cell states Free ($F$) and Occupied ($O$): $\Theta = \{F, O\}$. The power set $2^\Theta = \{\emptyset, \{F\}, \{O\}, \Theta\}$ contains all possible subsets of $\Theta$ to which belief masses $m$ can be assigned:

$$m: 2^\Theta \rightarrow [0, 1] \tag{1}$$

$$m(\emptyset) = 0 \tag{2}$$

$$\sum_{A \in 2^\Theta} m(A) = 1 \tag{3}$$

In this example, $m(O)$ constitutes evidence for a cell being occupied and $m(F)$ for a cell being free. Additionally, a state for which no evidence is available can be addressed with $m(\Theta)$, while the empty set is no possible outcome.

Subjective Logic (SL) is a mathematical framework for reasoning under uncertainty. It explicitly distinguishes between epistemic and aleatoric opinions. A direct bijective mapping between the belief mass distribution $m(A)$ in DST and a subjective opinion, i.e. a belief mass distribution $b_A$ and an uncertainty mass $u$ in SL, is given by [18]:

$$b_A = m(A), \quad A \in \Theta \tag{4}$$

$$u = m(\Theta) \tag{5}$$

$$\sum_{A \in \Theta} b_A + u = 1 \tag{6}$$

Subjective opinions are equivalent to a Dirichlet probability density function (PDF)

$$\mathrm{Dir}(p, \alpha) = \frac{1}{B(\alpha)} \prod_{A \in \Theta} p_A^{\alpha_A - 1} \tag{7}$$

with prior probabilities $p$ and parameters $\alpha$

$$p = \left\{ p_A \,\middle|\, A \in \Theta, \; \sum_{A \in \Theta} p_A = 1, \; 0 \le p_A \le 1 \right\}, \quad \alpha = \{\alpha_A \mid A \in \Theta\} \tag{8}$$

and the multivariate beta function $B$ in terms of the gamma function $\Gamma$

$$B(\alpha) = \frac{\prod_{A \in \Theta} \Gamma(\alpha_A)}{\Gamma\!\left(\sum_{A \in \Theta} \alpha_A\right)}. \tag{9}$$

Evidence for the singletons in the frame of discernment, $e_A \ge 0, A \in \Theta$, can be converted to parameters of a Dirichlet PDF and to a subjective opinion $(b, u)$ with the number of classes $K = |\Theta|$ and the Dirichlet strength $S = \sum_{A \in \Theta} \alpha_A$:

$$\alpha_A = e_A + 1, \quad A \in \Theta \tag{10}$$

$$b_A = \frac{e_A}{S} \tag{11}$$

$$u = \frac{K}{S} \tag{12}$$

The authors of [19] show that a deep neural network can be trained to predict the parameters $\alpha$ of a Dirichlet PDF to express uncertainty in a classification task on the MNIST data set [20]. They propose three loss functions, of which the one using the sum of squares led to the best results and is used in this work. The loss for one grid cell $i$ with network parameters $w$, expected class probabilities $\hat{p}_i$ and the true state $y_i = \{y_{i,A} \in \{0, 1\} \mid A \in \Theta\}$, with $y_{i,A}$ being one if state $A$ is true and zero if it is false or unknown, is given by

$$L_i(w) = \int \left\| y_i - \hat{p}_i \right\|_2^2 \, \mathrm{Dir}(\hat{p}_i, \hat{\alpha}_i) \, \mathrm{d}\hat{p}_i \tag{13}$$

This motivates us to create an evidential deep neural network for the task of occupancy grid mapping. We train a model to predict the parameters of a Dirichlet PDF describing the states of cells in an OGM.
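For concreteness, the conversion from predicted evidence to Dirichlet parameters and a subjective opinion, Equations (10)-(12), can be written as the following small sketch for the two-state frame $\Theta = \{F, O\}$; function and variable names are illustrative.

```python
def evidence_to_opinion(e_free, e_occ):
    """Map non-negative evidence for the singletons F and O to
    Dirichlet parameters and a subjective opinion (b_F, b_O, u)."""
    K = 2.0                       # number of singletons |Theta|
    alpha_f = e_free + 1.0        # Eq. (10)
    alpha_o = e_occ + 1.0
    S = alpha_f + alpha_o         # Dirichlet strength
    b_free = e_free / S           # Eq. (11)
    b_occ = e_occ / S
    u = K / S                     # Eq. (12)
    # b_free + b_occ + u == 1 holds by construction, cf. Eq. (6).
    return (alpha_f, alpha_o), (b_free, b_occ, u)

# No evidence at all yields maximum epistemic uncertainty (u = 1):
print(evidence_to_opinion(0.0, 0.0))  # ((1.0, 1.0), (0.0, 0.0, 1.0))
```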
IV. LEARNING FRAMEWORK

Our learning framework processes lidar point clouds as input data and predicts evidential OGMs. In the following, the network architecture and a simulation-based method for generating and augmenting training data are presented.

A. Network Architecture

The architecture of our deep neural network is based on the popular PointPillars architecture [3], which is capable of accurately detecting objects in lidar point clouds while providing a relatively low execution time. The network is divided into three parts. First, there is the Pillar Feature Net, which encodes the reflection points from the lidar measurement into denser features. Second, there is a 2D CNN backbone which transforms the features into a high-level representation. Last, there is a Detection Head estimating bounding boxes and motion states for the measured objects.

In this work, the Detection Heads used in [3] are replaced with a 2D convolutional layer with two output channels and ReLU activation. Each pixel represents a cell $i$ in the predicted OGM, where the channels contain evidence $e_{i,A}$ for both singletons in the frame of discernment $\Theta = \{F, O\}$, i.e. the cell being free or occupied. The predicted evidence can be converted into estimated parameters of a Dirichlet PDF $\hat{\alpha}_{i,A}$ with Equation (10). The expected probability values of a cell being occupied or free can be derived from the Dirichlet PDF as $\hat{p}_{i,A} = \hat{\alpha}_{i,A} / S_i$. Thus, following the simplification presented in [19], the loss function from Equation (13) in our application reduces to

$$L_i(w) = \mathrm{E}\!\left[(y_{i,F} - p_{i,F})^2\right] + \mathrm{E}\!\left[(y_{i,O} - p_{i,O})^2\right] = (y_{i,F} - \hat{p}_{i,F})^2 + \frac{\hat{p}_{i,F}(1 - \hat{p}_{i,F})}{S_i + 1} + (y_{i,O} - \hat{p}_{i,O})^2 + \frac{\hat{p}_{i,O}(1 - \hat{p}_{i,O})}{S_i + 1}. \tag{14}$$

We extend the loss function with a Kullback-Leibler divergence term with an impact factor $\lambda_t = \min(1.0, t/10)$ that increases from zero to one with the epoch number $t$, as proposed in [19]. This regularization term penalizes a divergence of the Dirichlet parameters resulting from conflicting evidence, $\tilde{\alpha}_i = y_i + (1 - y_i) \odot \alpha_i$, from a distribution resulting from no evidence, i.e. $\alpha_i = \mathbf{1}$. Hence, it encourages the network to reduce conflicting evidence in its output. This leads to the total loss function for all $N$ cells of the OGM in one sample,

$$L(w) = \sum_{i=1}^{N} L_i(w) + \lambda_t \, \mathrm{KL}\!\left[\mathrm{Dir}(p_i \mid \tilde{\alpha}_i) \,\|\, \mathrm{Dir}(p_i \mid \mathbf{1})\right] \tag{15}$$

with the Kullback-Leibler divergence in terms of the gamma function $\Gamma$ and the digamma function $\psi$:

$$\mathrm{KL}\!\left[\mathrm{Dir}(p_i \mid \tilde{\alpha}_i) \,\|\, \mathrm{Dir}(p_i \mid \mathbf{1})\right] = \log\!\left(\frac{\Gamma(\tilde{\alpha}_{i,F} + \tilde{\alpha}_{i,O})}{\Gamma(2)\,\Gamma(\tilde{\alpha}_{i,F})\,\Gamma(\tilde{\alpha}_{i,O})}\right) + (\tilde{\alpha}_{i,F} - 1)\left[\psi(\tilde{\alpha}_{i,F}) - \psi(\tilde{\alpha}_{i,F} + \tilde{\alpha}_{i,O})\right] + (\tilde{\alpha}_{i,O} - 1)\left[\psi(\tilde{\alpha}_{i,O}) - \psi(\tilde{\alpha}_{i,F} + \tilde{\alpha}_{i,O})\right] \tag{16}$$

This loss function will be used to train the model in the experiments such that it approximates the evidence masses in the grid cells.
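A minimal PyTorch sketch of Equations (14)-(16) is given below, assuming the predicted evidence and the one-hot labels have been flattened to shape (N, 2); the function name and the ten-epoch annealing horizon are illustrative.

```python
import torch

def evidential_ogm_loss(evidence, y, epoch, anneal_epochs=10):
    """Sum-of-squares evidential loss with annealed KL regularizer.

    evidence: (N, 2) non-negative network outputs (e_F, e_O) per cell.
    y:        (N, 2) one-hot true cell states (all-zero if unknown).
    """
    alpha = evidence + 1.0                # Eq. (10)
    S = alpha.sum(dim=1, keepdim=True)    # Dirichlet strength
    p_hat = alpha / S                     # expected probabilities

    # Eq. (14): squared error plus the variance of the Dirichlet.
    sq_err = (y - p_hat) ** 2
    var = p_hat * (1.0 - p_hat) / (S + 1.0)
    loss_i = (sq_err + var).sum(dim=1)

    # Eqs. (15)/(16): KL(Dir(alpha_tilde) || Dir(1)) on misleading
    # evidence; log Gamma(2) = 0, so that term is omitted below.
    alpha_t = y + (1.0 - y) * alpha
    S_t = alpha_t.sum(dim=1, keepdim=True)
    kl = (torch.lgamma(S_t.squeeze(1))
          - torch.lgamma(alpha_t).sum(dim=1)
          + ((alpha_t - 1.0)
             * (torch.digamma(alpha_t) - torch.digamma(S_t))).sum(dim=1))

    lam = min(1.0, epoch / anneal_epochs)  # annealed impact factor
    return (loss_i + lam * kl).sum()
```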
Fig. 2. The left image shows the point cloud of a simulated VLP32C lidar sensor, where the point color indicates the intensity of reflection. In the middle, the corresponding high-definition (HD) point cloud with 3000 layers and points colored by the material causing the reflection is shown. This is used to create the ground-truth occupancy grid map that is shown in the right picture. Free cells are colored green, occupied cells are colored red and unknown cells are black. In addition to the information gained from the HD point cloud, all cells covered by other traffic participants are marked as occupied. A training sample consists of one point cloud as shown in the left image and a corresponding ground-truth OGM as shown in the right image.

B. Training Data

To create training data from the simulation, we use a method which we have already presented in [21]. The simulation environment [22] provides a lidar plugin that supports advanced ray tracing and physically-based rendering, i.e. it considers the physical properties of materials to generate more realistic point clouds. We have modeled the sensor setup of one of our research vehicles, which has a Velodyne VLP32C lidar sensor mounted in the middle of its roof. In addition to this 32-layer lidar sensor, another lidar sensor with 3000 layers, which will be called high-definition (HD) lidar from now on, was added to the simulated vehicle. As the lidar plugin provides information about the material type that caused a reflection, it is possible to derive a dense ground-truth OGM from the simulated HD lidar data. The HD lidar covers the same field of view and is exposed to the same occlusions as the real sensor, to ensure that the OGMs in the training data only contain information which can theoretically be derived from the input point cloud. In addition, we augment the ground-truth OGMs based on a ground-truth object list that is also provided by the simulation and contains information about the position and shape of other traffic participants. As we expect the deep ISM to learn to recognize the shape of, e.g., cars and trucks, we mark all cells covered by them as occupied in the ground-truth OGM if a minimum of 50 reflections on the vehicle is present in the input point cloud. Figure 2 shows that cells with reflections on obstacles such as cars and trees are marked as occupied in the ground-truth OGM, whereas the ground is marked as free. We decided to also mark sidewalks as free space, since precise information on the lane geometry can be obtained from a static map, and in the event of a serious obstruction of road traffic, the sidewalk could also be an option to pass by.
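The label-generation principle can be summarized by the following sketch, which makes strong simplifying assumptions (axis-aligned object footprints, one material id per HD point, occupied evidence overwriting free evidence); all names are hypothetical and the actual pipeline from [21] is more involved.

```python
import numpy as np

def ground_truth_ogm(hd_points, hd_materials, ground_ids,
                     boxes, points_per_box, cell_size=0.1, dim=200):
    """Derive a ground-truth OGM from a simulated HD point cloud with
    per-point material labels, then mark object footprints as occupied.

    hd_points:      (N, 3) HD lidar points in the vehicle frame.
    hd_materials:   (N,) integer material id per point.
    ground_ids:     material ids counted as free (road and sidewalks).
    boxes:          (x0, x1, y0, y1) footprints from the object list.
    points_per_box: reflections of the 32-layer sensor on each object.
    """
    m = np.zeros((dim, dim, 2))          # belief masses (free, occupied)
    o = dim // 2                         # sensor at map center
    idx = (hd_points[:, :2] / cell_size).astype(int) + o
    ok = (idx >= 0).all(axis=1) & (idx < dim).all(axis=1)
    for (i, j), mat in zip(idx[ok], hd_materials[ok]):
        if mat in ground_ids and m[i, j, 1] == 0.0:
            m[i, j] = (1.0, 0.0)         # observed ground -> free
        else:
            m[i, j] = (0.0, 1.0)         # any obstacle -> occupied
    # Mark full footprints of sufficiently visible traffic participants.
    for box, n_refl in zip(boxes, points_per_box):
        if n_refl < 50:
            continue
        x0, x1, y0, y1 = np.clip(
            (np.asarray(box) / cell_size).astype(int) + o, 0, dim - 1)
        m[x0:x1 + 1, y0:y1 + 1] = (0.0, 1.0)
    return m
```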
C. Augmentation
During the training, we augment the training data by applying a random rotation to the input point cloud and the label OGM. Both are rotated around a vertical axis at the origin of the lidar sensor. This turned out to be important: without augmentation, cells on the far left and right side of the OGM are almost always occupied, which makes it difficult for the deep ISM to classify those cells correctly.

V. EXPERIMENTAL SETUP AND RESEARCH QUESTION

As explained in Section IV-B, training data for the deep learning-based inverse lidar model is created using a simulation environment. We have selected a model of an urban area and created ten scenarios, each containing random variations of the dynamic environment. In half of the scenarios, the ego-vehicle takes a clockwise route; in the other half, it takes a counter-clockwise route. The randomly generated pulk traffic contains cars, trucks and motorcycles. The type of each of these objects is also chosen randomly from a larger catalogue of possibilities. Additionally, pedestrians are placed at various locations on the sidewalks. Parked vehicles are randomly put on parking lanes. With each scenario, … training samples were created at a sampling rate of one per second. In total, … samples were generated for training. For the validation data set, another … samples were created in the same static, but a different dynamic environment. Another test data set consists of … samples from a different scenario in another part of the simulated urban area and will be used for evaluation in Section VI-A.

The neural network is trained using the Adam optimizer. The Pillar Feature Net [3] creates … pillars with a maximum of … reflection points per pillar. The intensity of the reflection points in the simulated lidar point cloud is normalized such that a distribution similar to one observed in real-world lidar point clouds is achieved. The output grid maps have a length of … and a width of … meters. A cell's side length is … centimeters, resulting in a … by … cell grid map. The sensor origin is at the center of the grid map. The map dimensions and cell size correspond to the detection area and step size of the Pillar Feature Net. After training for … epochs with a batch size of …, the minimum cross-entropy and the maximum accuracy on the validation data were achieved.

In the following, we want to answer these research questions: How well does a deep convolutional neural network that is based on our presented methodology perform when predicting dense OGMs from lidar measurements? In particular, is it capable of capturing the epistemic uncertainty for cells where no reflection point is located? And how well does the network perform when presented with real-world sensor data?

VI. RESULTS AND DISCUSSION

First, we evaluate the performance of the trained model on a synthetic test data set to analyze how well the trained ISM predicts OGMs in a scenario which was not contained in the training data. Afterwards, we test our model on real-world sensor data that was recorded with one of our research vehicles.

Fig. 3. The geometric ISM (middle) is only able to determine occupied cells containing reflection points of the measurement (left). The deep learning-based ISM has learned to derive more information, e.g. the whole space occupied by vehicles.
A. Evaluation on Synthetic Data
We compare the OGMs that are created using our proposed deep ISM to OGMs created using a geometric ISM, where all cells containing reflection points at a height of … to … meters above ground are marked as occupied, while all cells between the sensor origin and these cells are marked as free. Figure 4 shows the mean belief masses in both OGMs during the test scenario. It is apparent, and confirmed by Figure 3, that the geometric ISM creates a considerably higher proportion of cells with an unknown state; hence, fewer cells are classified as free or occupied. Figure 5 shows the Kullback-Leibler divergence of both Dirichlet PDFs, the one predicted by the deep ISM, $\mathrm{Dir}(p \mid \hat{\alpha})$, and the one from the geometric ISM, $\mathrm{Dir}(p \mid \hat{\alpha}_G)$, from the true PDF $\mathrm{Dir}(p \mid \alpha)$. It is evident that the deep ISM estimates the cell states better than the geometric ISM at any time.

Fig. 4. Mean belief masses per OGM during the test scenario. $m(F)$ is the mean mass for a free, $m(O)$ for an occupied and $m(\Theta)$ for an unknown cell state. The dashed lines represent results based on a geometric ISM, whereas the continuous lines show results from our deep ISM.

Fig. 5. Comparison of the mean Kullback-Leibler divergence of both estimated Dirichlet PDFs from the true Dirichlet PDF per OGM during the test scenario. $\hat{\alpha}$ are the Dirichlet parameters estimated by our deep ISM and $\hat{\alpha}_G$ are the parameters estimated using a geometric ISM.
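The metric shown in Fig. 5 is the closed-form KL divergence between two Dirichlet PDFs, averaged over all cells of an OGM; a small SciPy-based sketch is given below (the function name is illustrative).

```python
import numpy as np
from scipy.special import digamma, gammaln

def dirichlet_kl(alpha, beta):
    """KL[Dir(p | alpha) || Dir(p | beta)] for (N, K) parameter arrays,
    e.g. estimated vs. ground-truth parameters of all N grid cells."""
    a0 = alpha.sum(axis=-1)
    b0 = beta.sum(axis=-1)
    return (gammaln(a0) - gammaln(alpha).sum(axis=-1)
            - gammaln(b0) + gammaln(beta).sum(axis=-1)
            + ((alpha - beta)
               * (digamma(alpha) - digamma(a0)[..., None])).sum(axis=-1))

# Mean divergence of an estimated OGM from the ground truth:
# dirichlet_kl(alpha_hat, alpha_true).mean()
```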
B. Evaluation on Real-World Data

Finally, we want to analyze whether the deep ISM trained with synthetic data can also be successfully applied to real-world data. Thus, we test the model on real lidar measurements from one of our research vehicles, which has a similar shape and sensor setup as the simulated vehicle. Figure 6 shows two representative samples of the model's performance on a data set recorded in an urban environment at the university's campus in Aachen. It is apparent that the performance cannot keep up with the evaluation on synthetic data. Nevertheless, the model creates promising results in an environment that is considerably different from the synthetic environment. In particular, the occupancy states of cells that do not contain a reflection point are mostly rated correctly. A higher real-world performance is expected if the simulation scenarios took place in a model of the actual static environment of the real-world scenarios. More samples and a higher diversity of the training data are relatively easy to achieve in the simulation and are also expected to further increase performance.

Fig. 6. Two representative samples showing the predicted OGMs (middle) using the deep ISM trained with synthetic data on real-world measurements (left) compared to a geometric ISM (right).

VII. CONCLUSION

We presented a new methodology to train neural networks for the task of occupancy grid mapping using lidar point clouds. The trained network performs considerably better than a classical approach when presented with synthetic data. It also shows advantages when presented with real-world data, even though many options to increase the network's generalization capabilities remain. In contrast to many other deep learning-based approaches, our network is trained such that it can give an estimate of its prediction uncertainty. Our methodology is especially promising because it does not rely on manually labeled data. Future research will investigate methods that further increase the network's performance on real-world data.
REFERENCES

[1] M. Aeberhard, "Object-level fusion for surround environment perception in automated driving applications," Ph.D. dissertation, TU Dortmund, Dortmund, 2017.
[2] Y. Zhou and O. Tuzel, "VoxelNet: End-to-end learning for point cloud based 3D object detection," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4490–4499.
[3] A. H. Lang, S. Vora et al., "PointPillars: Fast encoders for object detection from point clouds," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12689–12697.
[4] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46–57, Jun. 1989.
[5] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent Robotics and Autonomous Agents. Cambridge, Mass.: MIT Press, 2005.
[6] T. Woopen, B. Lampe et al., "UNICARagil - disruptive modular architectures for agile, automated vehicle concepts," in 27th Aachen Colloquium Automobile and Engine Technology, 2018, pp. 663–694.
[7] M. Buchholz, F. Gies et al., "Automation of the UNICARagil vehicles," in 29th Aachen Colloquium Sustainable Mobility, 2020, pp. 1531–1560.
[8] D. Bauer, L. Kuhnert, and L. Eckstein, "Deep, spatially coherent inverse sensor models with uncertainty incorporation using the evidential framework," 2019, pp. 2490–2495.
[9] L. Sless, B. E. Shlomo et al., "Road scene understanding by occupancy grid learning from sparse radar clusters using semantic segmentation," in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019, pp. 867–875.
[10] J. Dequaire, P. Ondrúška et al., "Deep tracking in the wild: End-to-end tracking using recurrent neural networks," The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 492–512, Apr. 2018.
[11] D. Bauer, L. Kuhnert, and L. Eckstein. (2020) Deep inverse sensor models as priors for evidential occupancy mapping. [Online]. Available: http://arxiv.org/pdf/2012.02111v1
[12] P. Wu, S. Chen, and D. N. Metaxas, "MotionNet: Joint perception and motion prediction for autonomous driving based on bird's eye view maps," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11382–11392.
[13] H. Caesar, V. Bankiti et al., "nuScenes: A multimodal dataset for autonomous driving," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11618–11628.
[14] L. Reiher, B. Lampe, and L. Eckstein, "A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird's eye view," in IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.
[15] G. Shafer, A Mathematical Theory of Evidence, ser. Limited Paperback Editions. Princeton, NJ: Princeton Univ. Press, 1976, vol. 42.
[16] A. P. Dempster, "A generalization of Bayesian inference," Journal of the Royal Statistical Society: Series B (Methodological), vol. 30, no. 2, pp. 205–232, Jul. 1968.
[17] D. Nuss, S. Reuter et al., "A random finite set approach for dynamic occupancy grid maps with real-time application," The International Journal of Robotics Research, vol. 37, no. 8, pp. 841–866, Jul. 2018.
[18] A. Jøsang, Subjective Logic. Cham: Springer International Publishing, 2016.
[19] M. Sensoy, L. Kaplan, and M. Kandemir, "Evidential deep learning to quantify classification uncertainty," in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018, pp. 3179–3189.
[20] Y. LeCun, C. Cortes, and C. J. Burges, "MNIST handwritten digit database," ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
[21] B. Lampe, R. van Kempen et al., "Reducing uncertainty by fusing dynamic occupancy grid maps in a cloud-based collective environment model," in IEEE Intelligent Vehicles Symposium (IV), 2020, pp. 837–843.
[22] K. von Neumann-Cosel, M. Dupuis, and C. Weiss, "Virtual test drive - provision of a consistent tool-set for [d,h,s,v]-in-the-loop," in Proceedings of the Driving Simulation Conference Europe, 2009.