Application of Neuroevolution in Autonomous Cars
Sainath G, Vignesh S, Siddarth S, and G Suganya
Vellore Institute of Technology, Chennai, India
chennai.vit.ac.in
Abstract.
With the onset of electric vehicles and their growing popularity, autonomous cars are the future of the travel and driving experience. The barrier to reaching level 5 autonomy is the difficulty of collecting data that incorporates good driving habits, and the lack thereof. The problem with current implementations of self-driving cars is the need for massively large datasets and the need to evaluate the driving in the dataset. We propose a system that requires no data for its training. An evolutionary model has the capability to optimize itself towards a fitness function. We have implemented Neuroevolution, a form of genetic algorithm, to train/evolve self-driving cars in a simulated virtual environment with the help of Unreal Engine 4, which utilizes Nvidia's PhysX physics engine to portray real-world vehicle dynamics accurately. We were able to observe the serendipitous nature of evolution and have exploited it to reach our optimal solution. We also demonstrate the ease of generalizing attributes brought about by genetic algorithms and how they may be used as a boilerplate upon which other machine learning techniques may be applied to improve the overall driving experience.
Keywords:
Neuroevolution · Neural Networks · Genetic Algorithm · Generation · Fitness · Selection · Crossover · Mutation
The Society of Automotive Engineers (SAE) has defined six different levels of autonomy, beginning at level 0, the absence of any autonomy, and ending at level 5, complete autonomy requiring no human intervention whatsoever. Currently, many luxury vehicles possess level 3 autonomy in terms of cruise control and active lane control, and a handful of vehicles possess level 4 autonomy. Level 5 autonomy in cars is still under research and development. The main barrier to attaining this level of autonomy is the task of collecting data, and the lack thereof. Although a deep model is extremely adept at generalizing features, it can only learn what it sees. There are only so many scenarios we as humans can drive through that the model can learn from. Essentially, even if it learns to navigate a busy street, it may not be able to correct oversteer or understeer caused by factors such as poor roads or tire wear leading to a loss of traction, which may not have been accounted for in the training dataset. That is why an evolutionary approach would solve these issues. What if the car could learn to drive on its own, via trial and error, over countless generations? It would have trained and evolved to overcome such edge cases and scenarios, and would know exactly what to do once it detects wheel spin or any other loss of traction and grip.
A neural network is an interconnected network of neurons, also called nodes. Each neuron has a set of output edges that activate based on the resultant value obtained from the weighted inputs it receives from the previous layer.
Fig. 1.
Topology of an Artificial Neural Network
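As a minimal illustration of the weighted-input behaviour described above, the forward pass of such a network can be sketched as follows (the layer sizes, random weights, and tanh activation are illustrative assumptions, not details taken from our implementation):

```python
import numpy as np

def forward(x, layers):
    """Propagate the input through the network: each layer computes a
    weighted sum of the previous layer's outputs plus a bias, and the
    tanh activation decides the value each neuron passes forward."""
    for W, b in layers:
        x = np.tanh(W @ x + b)
    return x

# Illustrative 3-input, 2-hidden, 1-output topology with random weights
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((2, 3)), np.zeros(2)),  # input -> hidden
    (rng.standard_normal((1, 2)), np.zeros(1)),  # hidden -> output
]
y = forward(np.array([0.5, 0.2, 0.9]), layers)
```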
In a supervised learning approach, we would have a list of attributes or features as our inputs and a list of targets as our outputs. We would then have to use back-propagation to train our neural network to correct its weights to suit our targets and increase its accuracy. Back-propagation is a way of propagating the total loss back into the neural network to determine how much of the loss each node is responsible for, and subsequently updating the weights in a way that minimizes the loss, by giving the nodes with higher error rates lower weights and the nodes with lower error greater weights.

So, in a situation where it is difficult to obtain a dataset large enough to train the neural network to a certain degree of accuracy, we will face problems arriving at our optimal solution. This is especially true in the scenario of self-driving cars, where large corporations like Nvidia use 1000 hours of driving data to train their vehicles to navigate the roads. In such scenarios, we could adopt an evolutionary technique that requires no datasets and train our model in a simulated environment.

Although deep neural networks utilizing convolutional layers have performed extraordinarily well in several scenarios, the problem remains that such a network can only generalize what it is shown or taught, and there are countless more possibilities of things that can happen on the road that cannot be accounted for in the driving data we gather.
Reinforcement Learning is another critical area of research in autonomous vehicles, where an agent learns to accomplish a task by gathering experience by itself rather than through a supervised dataset. The basic gist of the algorithm is that an agent is granted a reward when it performs an action that is desirable in the current scenario and is punished if it does something undesirable.
Fig. 2.
Basic flow of Reinforcement Learning
Although this form of carrot-and-stick approach seems to be how we, as individuals, learn, the key drawback of this algorithm is that the agent has no prior experience whatsoever. We humans learn quite quickly through this approach thanks to the generalization of a multitude of experiences that we have gathered from birth to date. This is not the case for the agent, and so it takes quite a while, depending on the complexity of the problem, for the agent to gather enough experience to determine whether a certain action is good or bad.
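The reward-and-punishment loop described above can be sketched with a toy agent-environment interaction (the environment, actions, and reward values here are purely hypothetical, not part of our system):

```python
import random

def run_episode(policy, env_step, steps=100):
    """Minimal RL loop: the agent acts, the environment returns a new
    state and a reward (positive when the action is desirable, negative
    as punishment), and the agent accumulates experience tuples."""
    state, experience, total_reward = 0, [], 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env_step(state, action)
        experience.append((state, action, reward))
        total_reward += reward
    return experience, total_reward

# Toy environment: moving right (+1) is rewarded, moving left is punished
def env_step(state, action):
    return state + action, (1.0 if action > 0 else -1.0)

random.seed(0)
untrained = lambda s: random.choice([-1, 1])  # no prior experience
experience, total = run_episode(untrained, env_step)
```

An agent with no prior experience, as noted above, starts out acting randomly and must collect many such episodes before the rewards become informative.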
Genetic Algorithms, also called Evolutionary Algorithms, are inspired by the process of natural selection and Darwinian evolution: they mimic species evolution to arrive at a solution. Each generation has a set of species that carry specific genes, and the best are selected to populate the next generation. This process continues until we arrive at our optimal solution. Genetic Algorithms are currently used to generate solutions for optimization and search problems by utilizing techniques such as selection, crossover, and mutation.
Neuroevolution is a genetic algorithm used to evolve artificial neural networks. In this model, each species of a generation has a brain (the neural network) that has a set of genes (the weights). In the beginning, all species of the population have random weights and hence perform random actions. It is through serendipitous discovery that a certain species gets closer to our solution. We select this species based on a fitness function and pick similarly performing species to perform crossover. After crossover, we mutate the genes and pass them on to the next generation. The entire genetic algorithm can thus be summarized by three key processes:
– Selection: We select the best species of the generation based on the fitness function.
– Crossover: We cross over the genes of the population to converge on our solution.
– Mutation: Following crossover, we mutate the genes of the selected species in the hope of finding a better solution.
Fig. 3.
Genetic Algorithm Flowchart
We can see that mutation and crossover seem somewhat opposed to one another: mutation randomizes the weights of a certain percentage of neurons, while crossover tries to converge them. There is a trade-off here between exploration and exploitation. Exploration via mutation means exploring new gene sets in the hope that something new can lead to promising results, while exploitation via crossover takes what has been learned and combines the best of it to inform newer decision-making. Typically, in genetic algorithms, the mutation rate is around 10–20%, while the crossover rate is around 80–90%.
We chose to simulate Neuroevolution using Unreal Engine 4, a game engine that utilizes Nvidia's PhysX physics engine to replicate real-world-like vehicle dynamics, which is essential if we plan to transfer the learning that has happened in this environment. Compared to using simulators such as CARLA, which was also built on the same engine, building the whole environment from the ground up gives us a lot more freedom in terms of level design, vehicle physics, and frame times (time dilation), and overall gives more power to the user.
FWD Layout
Since most vehicles these days in the low- to mid-tier range are front-engine, front-wheel drive (FF), we chose this as our vehicle layout, and for the differential, we went with a limited-slip differential (LSD) that prevents wheel spin, which is becoming more and more common these days. The transmission of the vehicle is set to automatic. The suspension settings have also been altered so that the car favours under-steer rather than over-steer, as most manufacturers do these days, since under-steer is easier to correct. Weight transfer and tyre traction, essential aspects that dictate the vehicle's physical handling, are also simulated accurately.
RWD Layout
We also wanted to observe how this approach would fare on a more difficult layout that is harder to control: the front-engine, rear-wheel drive (FR), the typical sports car layout, as such cars are more prone to over-steering and sliding through corners without proper throttle control and adequate counter-steering. The suspension of this layout has also been altered so that it favours over-steering behaviour rather than under-steer.
Each vehicle we simulate has a brain that controls the values for the throttle pedal, the brake pedal, and the steering angle directly. This brain is our deep neural network, which outputs a value from −1 to 1 for each control.
The inputs to the neural network are the distances, normalized to 0 to 1.

The genetic algorithm in this scenario is a higher-level entity that oversees the processes responsible for selection, crossover, and mutation. It controls the mutation and crossover rates and is responsible for spawning and tracking all the features of the entire vehicle species for each generation of the population.
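Assuming, as above, distance inputs normalized to [0, 1] and one bounded output per control, the brain's interface might look like the following sketch (the sensor count, hidden-layer size, and tanh output units are assumptions for illustration):

```python
import numpy as np

def drive(brain_layers, distances):
    """Map normalized sensor distances to (throttle, brake, steering),
    each in (-1, 1) via tanh output units."""
    x = np.asarray(distances)
    for W, b in brain_layers:
        x = np.tanh(W @ x + b)
    throttle, brake, steering = x
    return throttle, brake, steering

rng = np.random.default_rng(1)
# Hypothetical brain: 5 distance sensors -> 4 hidden -> 3 controls
brain = [
    (rng.standard_normal((4, 5)), np.zeros(4)),
    (rng.standard_normal((3, 4)), np.zeros(3)),
]
controls = drive(brain, [0.9, 0.7, 1.0, 0.4, 0.8])
```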
Fig. 4.
Simple Pipeline of Neuroevolution
Initially, in the first generation, all the weights of all the neural networks are initialized to random values, and it is through serendipitous discovery, aided by the fitness function, that we converge on a solution through selection and crossover, while also searching for a better solution through mutation.
In the first generations, all the weights of all the neural networks in the vehicles are initialized randomly, so the vehicles have no clue what to do when they are spawned and hence move randomly. To remove poorly performing agents, which is a crucial part of Darwinian evolution (survival of the fittest), we de-spawn vehicles that crash into obstacles and guard rails, or those that do not reach a certain threshold score within a predetermined period of time.

The whole model follows this pipeline:
1. The Genetic Algorithm agent spawns a population.
2. The top-performing vehicles are selected via the fitness function.
3. Their neural network weights are crossed over by taking a weighted average, obtained by multiplying each network's weights by its relative fitness with respect to the population.
4. Once crossover is done, we mutate a small percentage of the weights by setting them to random values.
5. We spawn the next generation of vehicles.

This process is repeated for several generations until we obtain satisfactory results.
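The five-step pipeline above can be sketched as a single loop over flattened weight vectors; the population size, mutation rate, genome length, and the stand-in fitness function below are illustrative assumptions, not our actual parameters:

```python
import numpy as np

POP, TOP_N, MUTATION_RATE = 50, 10, 0.2  # illustrative parameters

def evolve(genomes, scores, rng):
    """Steps 2-5: select the top performers, cross them over by a
    fitness-weighted average, mutate, and respawn the population."""
    top = np.argsort(scores)[-TOP_N:]                 # 2. selection
    fitness = scores[top] / scores[top].sum()         #    relative fitness
    child = (genomes[top] * fitness[:, None]).sum(axis=0)  # 3. crossover
    next_gen = np.tile(child, (POP, 1))               # 5. respawn copies
    mask = rng.random(next_gen.shape) < MUTATION_RATE
    next_gen[mask] = rng.standard_normal(mask.sum())  # 4. mutation
    return next_gen

rng = np.random.default_rng(0)
pop = rng.standard_normal((POP, 8))                   # 1. random spawn
# Stand-in fitness: genomes closer to all-ones weights score higher
score = lambda g: np.exp(-np.abs(g - 1.0).sum(axis=1))
for _ in range(100):
    pop = evolve(pop, score(pop), rng)
```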
Fig. 5.
Neuroevolution Architecture
Selection
For each vehicle, the distance travelled (in the direction of the course) each frame, which contributes to the score of the neural net, is calculated as:

Δd = v × Δt    (1)

where:
Δd = distance travelled that frame
v = instantaneous speed
Δt = frame time
In order to prevent over-correcting behaviour, and so that the car does not game the fitness score, we increment its score only when the angle between the velocity vector and the car's forward vector is less than a threshold value, which we set at 10°.

From this, we calculate the net score, which is the total distance travelled until the vehicle de-spawns:

score = Σ Δd    (2)

At the end of each generation, the relative fitness of each neural network is calculated as:

fitness_i = score_i / Σ_{j=1}^{p} score_j    (3)

where:
fitness_i = relative fitness of the current neural network
score_i = total distance travelled by the vehicle
p = total population of the generation

Now, for spawning the next generation of vehicles, we pick the top n vehicles with the greatest fitness (n can be selected arbitrarily; we chose it to be a fixed fraction of the population p).

Crossover
We now perform an arithmetic crossover of these n species by weighted addition of their weights with respect to their fitness. For each connection in the neural network, the weight after crossover is calculated as:

new_w = Σ_{i=1}^{n} (w_i × fitness_i)    (4)

where:
fitness_i = relative fitness of the i-th selected neural network
w_i = weight of a certain connection in that neural net
n = the number of top species selected from the generation

Mutation
Once we have performed crossover for about 80% of the weights in the neural network, we move on to mutation, which is typically applied to about 20% of the weights by setting them to random values.
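The selection score (Eqs. 1-3), the arithmetic crossover (Eq. 4), and the mutation step can be sketched together; the per-frame values, parent weights, and fitness numbers below are hypothetical illustrations:

```python
import random

ANGLE_THRESHOLD_DEG = 10.0  # gate described under Selection

def total_score(frames):
    """Eqs. (1)-(2): sum of delta_d = v * delta_t per frame, counted
    only while the velocity/forward angle is under the threshold."""
    return sum(v * dt for v, dt, angle in frames
               if abs(angle) < ANGLE_THRESHOLD_DEG)

def relative_fitness(scores):
    """Eq. (3): fitness_i = score_i / sum_j score_j."""
    total = sum(scores)
    return [s / total for s in scores]

def crossover(weight_sets, fitnesses):
    """Eq. (4): new_w = sum_i (w_i * fitness_i) per connection, where
    the relative fitness values sum to 1."""
    return [sum(ws[k] * f for ws, f in zip(weight_sets, fitnesses))
            for k in range(len(weight_sets[0]))]

def mutate(weights, rate=0.2, rng=random):
    """Re-randomize roughly `rate` (here 20%) of the weights."""
    return [rng.uniform(-1, 1) if rng.random() < rate else w
            for w in weights]

# Two hypothetical cars; the second drifts past 10 deg on one frame
car_a = [(10.0, 0.016, 2.0)] * 3   # (speed m/s, frame time s, angle deg)
car_b = [(10.0, 0.016, 2.0)] * 2 + [(10.0, 0.016, 15.0)]
scores = [total_score(car_a), total_score(car_b)]   # ~[0.48, 0.32]
fitness = relative_fitness(scores)                  # ~[0.6, 0.4]

parents = [[1.0, 0.0, 2.0], [-1.0, 4.0, 2.0]]       # flattened weights
child = crossover(parents, fitness)                 # fitness-weighted
random.seed(0)
offspring = mutate(child)
```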
Within the simulated environment, we have observed that the population, over several generations, has evolved not to crash into obstacles, and, through sheer randomness, has in certain simulations decided to stick to one side of a lane. The vehicles also perform advanced traffic management techniques such as zipper merges. A zipper merge is when a car continues to stay in its lane even after a blockade is located ahead, merging into the free lane only once it is close to the blockade, in order to prevent the traffic congestion that would occur if everyone stopped using that lane entirely.
Fig. 6.
The neural net switches lanes only on the very verge of colliding with the obstacle (white wall to the left)
A few things we observed over the course of running several simulations, iterating and varying different parameters: when the fitness function is simple, the model generalizes the course quite quickly and is able to navigate it well within very few generations. But when we alter it so that it favours a certain style of cornering or maintaining a certain amount of speed, it takes drastically more generations to achieve this sort of specialization. This interpretation is backed by the discrepancies seen in the number of generations it took for the front-engine, rear-wheel drive (FR) layout compared to the front-engine, front-wheel drive (FF) layout.
Fig. 7.
The neural net has decided to stick to the left lane of the road through sheer randomness, which can be nurtured by altering the fitness function
Fig. 8.
The neural net has learned how to counter-steer and control the car at the onset of over-steer
Table 1.
Generations taken by the neural nets to evolve enough to navigate the entire course without crashing
Layout  Crossover Rate  Mutation Rate  Generation  Population
FR      80%             20%            97          4850
FR      80%             10%            225         11250
FR      90%             20%            145         7250
FR      90%             10%            171         8550
FF      80%             20%            24          1200
FF      80%             10%            26          1300
FF      90%             20%            38          1900
FF      90%             10%            12          600
Fig. 9.
Once one of the neural nets hits a peak, it is able to consistently replicate that peak, which is an indication of evolution
Based on the results observed above, we can conclude that genetic algorithms such as Neuroevolution can speed up the initial phase of generalizing features severalfold compared to traditional techniques such as back-propagation, which carry the prerequisite of procuring a massive dataset for training so as not to over-fit the solution. Once the network attains the basic cognitive abilities for driving, it can be further improved through reinforcement learning techniques such as Deep Q-learning, since it has already gathered a plethora of experience over several generations, the lack of which is one of the key barriers slowing down reinforcement learning. We can then jump quickly to the phase where the focus is on obtaining as many rewards as possible, rather than the initial phase of gathering experience, in which the agent primarily tries simply not to get punished for its actions.