Data-Driven Crowd Simulation with Generative Adversarial Networks
Javad Amirian, Wouter van Toll, Jean-Bernard Hayet, Julien Pettré
Javad Amirian, Wouter van Toll, Julien Pettré — Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Jean-Bernard Hayet — Centro de Investigación en Matemáticas, Guanajuato, Mexico
ABSTRACT
This paper presents a novel data-driven crowd simulation method that can mimic the observed traffic of pedestrians in a given environment. Given a set of observed trajectories, we use a recent form of neural networks, Generative Adversarial Networks (GANs), to learn the properties of this set and generate new trajectories with similar properties. We define a way for simulated pedestrians (agents) to follow such a trajectory while handling local collision avoidance. As such, the system can generate a crowd that behaves similarly to observations, while still enabling real-time interactions between agents. Via experiments with real-world data, we show that our simulated trajectories preserve the statistical properties of their input. Our method simulates crowds in real time that resemble existing crowds, while also allowing insertion of extra agents, combination with other simulation methods, and user interaction.
CCS CONCEPTS
• Computing methodologies → Intelligent agents; Neural networks; Motion path planning; Real-time simulation.

KEYWORDS
crowd simulation, content generation, machine learning, intelligent agents, generative adversarial networks
ACM Reference Format:
Javad Amirian, Wouter van Toll, Jean-Bernard Hayet, and Julien Pettré. 2019. Data-Driven Crowd Simulation with Generative Adversarial Networks. In Computer Animation and Social Agents (CASA ’19), July 1–3, 2019, PARIS, France. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3328756.3328769
CASA ’19, July 1–3, 2019, PARIS, France
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
ACM ISBN 978-1-4503-7159-9/19/07...$15.00
https://doi.org/10.1145/3328756.3328769

1 INTRODUCTION
The realistic simulation of human crowd motion is a vast research topic that includes aspects of artificial intelligence, computer animation, motion planning, psychology, and more. Generally, the goal of a crowd simulation algorithm is to populate a virtual scene with a crowd that exhibits visually convincing behavior. The simulation should run in real time to be usable for interactive applications such as games, training software, and virtual-reality experiences. Many simulations are agent-based: they model each pedestrian as a separate intelligent agent with individual properties and goals.

To simulate complex behavior, data-driven crowd simulation methods use real-world input data (such as camera footage) to generate matching crowd motion. Usually, these methods cannot easily generate new behavior that is not literally part of the input. Also, they are often difficult to use for applications in which agents need to adjust their motion in real time, e.g. because the user is part of the crowd.

In this paper, we present a new data-driven crowd simulation method that largely avoids these limitations. Our system enables the real-time simulation of agents that behave similarly to observations, while allowing them to deviate from their trajectories when needed. More specifically, our method:

(1) learns the overall properties of input trajectories, and can generate new trajectories with similar properties;
(2) embeds these trajectories in a crowd simulation, in which agents follow a trajectory while allowing for local interactions.

For item 1, we use Generative Adversarial Networks (GANs) [4], a novel technique in machine learning for generating new content based on existing data. For item 2, we extend the concept of ‘route following’ [6] to trajectories with a temporal aspect, prescribing a speed that may change over time.

Using a real-world dataset as an example, we will show that our method generates new trajectories with matching styles.
Our system can (for example) reproduce an existing scenario with additional agents, and it can easily be combined with other crowd simulation methods.
2 RELATED WORK
Agent-based crowd simulation algorithms model pedestrians as individual intelligent agents. In this paradigm, many researchers focus on the local interactions between pedestrians (e.g. collision avoidance) using microscopic algorithms [2, 5]. In environments with obstacles, these need to be combined with global path planning into an overall framework [12]. A growing research topic lies in measuring the ‘realism’ of a simulation, by measuring the similarity between two fragments of (real or simulated) crowd motion [13].
Complex real-life behavior can hardly be described with simple local rules. This motivates data-driven simulation methods, which base the crowd motion directly on real-world trajectories, typically obtained from video footage. One category of such methods stores the input trajectories in a database, and then pastes the best-matching fragments into the simulation at run-time [7, 8]. Another technique is to create pre-computed patches with periodically repeating crowd motion, which can be copied and pasted throughout an environment [15]. Such simulations are computationally cheap, but difficult to adapt to interactive situations.

Researchers have also used input trajectories to train the parameters of (microscopic) simulation models [14], so as to adapt the agents’ local behavior parameters to match the input data. However, this cannot capture any complex (social) rules that are not part of the used simulation model.

To replicate how agents move through an environment at a higher level, some algorithms subdivide the environment into cells and learn how pedestrians move between them [11, 16]. Our goal is similar (reproducing pedestrian motion at the full trajectory level), but our approach is different: we learn the spatial and temporal properties of complete trajectories, generate new trajectories with similar properties, and let agents flexibly follow these trajectories.

Our work uses Generative Adversarial Networks (GANs) [4], a recent AI development for generating new data. GANs have been successful at generating creative content such as faces [3]. Recently, researchers have started to adopt GANs for short-term prediction of pedestrian motion [1]. To our knowledge, our work is the first to apply GANs in crowd simulation at the full trajectory level.
3 TRAJECTORY GENERATION
In this section, we describe our GAN-based method for generating trajectories that are similar to the examples in our training data.

As in most crowd-simulation research, we assume a planar environment and we model agents as disks. We define a trajectory as a mapping π : [0, T] → R² that describes how an agent moves through an environment during a time period of T seconds. Note that a trajectory encodes speed information: our system should capture when agents speed up, slow down, or stand still.

In practice, we will represent a trajectory π by a sequence of n_π points [p_0^π, …, p_{n_π−1}^π] separated by a fixed time interval ∆t; that is, each p_i^π has a corresponding timestamp i · ∆t. In our experiments, we use ∆t = 0.4 s because our input data uses this value as well. We will use the notation p_{j:k}^π to denote a sub-trajectory from p_j^π to p_k^π.

Given a dataset of trajectories Π = {π_0, …, π_{N−1}}, our generator should learn to produce new trajectories with properties similar to those in Π. We assume that all trajectories start and end on the boundary of a region of interest R, which can have any shape and can be different for each environment.

Overview of GANs.
A Generative Adversarial Network [4] consists of two components: a generator G that creates new samples and a discriminator D that judges whether a sample is real or generated. The training phase of a GAN is a two-player game in which G learns to ‘fool’ D, until (ideally) the generated samples are so convincing that D does not outperform blind guessing.

Internally, both G and D are artificial neural networks; let θ_G and θ_D be their respective weights. G acts as a function that converts an m-dimensional noise vector z to a fake sample x = G(z; θ_G). D acts as a function that converts a (real or fake) sample x to a value D(x; θ_D) ∈ [0, 1] indicating the probability of x being real. Training a GAN represents the following optimization problem:

min_{θ_G} max_{θ_D} V(θ_D, θ_G),
V(θ_D, θ_G) = E_{x∼p_x}[log D(x; θ_D)] + E_{z∼p_z}[log(1 − D(G(z; θ_G); θ_D))]   (1)

where V(θ_D, θ_G) is known as the loss function. Its first term denotes the expected output of D for a random real sample. This is higher when D correctly classifies more input samples as real. Conversely, the second term is higher when D classifies more generated samples as fake. Here, p_x and p_z are the probability distributions of (respectively) the real data and the noise vectors sent to G.

Overview of Our System.
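To make Eq. (1) concrete, the following is a toy numerical illustration of the value function V, not part of the paper's pipeline: given the discriminator's outputs on batches of real and generated samples, it computes the Monte-Carlo estimate of V and shows why the game's optimum occurs when D outputs 0.5 everywhere.

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(θ_D, θ_G) from Eq. (1): the mean of
    log D(x) over real samples plus the mean of log(1 - D(G(z)))
    over generated samples."""
    v_real = sum(math.log(p) for p in d_real) / len(d_real)
    v_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return v_real + v_fake

# A discriminator that cannot tell real from fake outputs 0.5 everywhere;
# V then equals 2 * log(0.5) ≈ -1.386, the value at the game's equilibrium.
v_opt = gan_value([0.5] * 100, [0.5] * 100)

# A discriminator that classifies well achieves a higher V, which the
# generator then tries to push back down by producing better fakes.
v_good = gan_value([0.9] * 100, [0.1] * 100)
```

The generator minimizes V while the discriminator maximizes it, which is exactly the min-max structure of Eq. (1).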
Figure 1 displays an overview of our GAN system. The generator and discriminator both have two tasks: generating or evaluating the entry points of a trajectory π (i.e. the first two points p_0^π and p_1^π), and generating or evaluating the continuation of a trajectory (i.e. the next point p_{k+1}^π after a sub-trajectory p_{0:k}^π). For the continuation tasks, we use concepts from so-called ‘conditional GANs’ because the generator and discriminator take extra data as input. We will now describe the system in more detail. Parameter settings will be mentioned in Section 5.

Generator.
To generate entry points, the generator G feeds a random vector z to a fully connected (FC) block of neurons. Its output is a 4D vector that contains the coordinates of p_0^π and p_1^π.

To generate the continuation of a trajectory p_{0:k}^π, the generator G feeds p_{0:k}^π and a noise vector z to a Long Short-Term Memory (LSTM) layer that should encode the relevant trajectory dynamics. LSTMs are common recurrent neural networks used for handling sequential data. The output of this LSTM block is sent to an FC block, which finally produces a 2D vector with the coordinates of p_{k+1}^π. Let g(z | p_{0:k}^π; θ_G) denote this result. Ideally, this point will be taken from the (unknown) distribution of likely follow-ups for p_{0:k}^π.

The continuation step is repeated iteratively. If the newly generated point p_{k+1}^π lies outside of the region of interest R, then the trajectory is considered to be finished. Otherwise, the process is repeated with inputs p_{0:k+1}^π and a new noise vector.

Discriminator.
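The generator's iterative rollout described above can be sketched as follows. This is an illustrative skeleton, not the paper's implementation: `entry_point_model` and `step_model` are placeholder stand-ins for the trained FC and LSTM+FC blocks, and `inside_roi` uses a toy rectangular region of interest.

```python
import random

def entry_point_model(z):
    # Placeholder for the FC block that outputs the entry pair (p0, p1).
    return [(0.0, 0.0), (0.4, 0.0)]

def step_model(sub_trajectory, z):
    # Placeholder for the LSTM+FC block: next point from the recent
    # sub-trajectory and a noise vector (here: keep walking right).
    x, y = sub_trajectory[-1]
    return (x + 0.4 + 0.1 * z[0], y)

def inside_roi(p, xmax=10.0):
    # Toy axis-aligned region of interest R.
    return 0.0 <= p[0] <= xmax

def generate_trajectory(max_len=100, noise_dim=3):
    """Iterative rollout: sample an entry pair, then repeatedly append the
    next generated point until it leaves R (or a length cap is reached)."""
    z = [random.uniform(-1, 1) for _ in range(noise_dim)]
    traj = entry_point_model(z)
    while len(traj) < max_len:
        z = [random.uniform(-1, 1) for _ in range(noise_dim)]
        nxt = step_model(traj, z)
        traj.append(nxt)
        if not inside_roi(nxt):
            break  # trajectory is considered finished
    return traj
```

In the actual system, the stand-in models would be the trained PyTorch networks, and a fresh noise vector is drawn at every step, as shown here.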
The discriminator D takes an entire (real or fake) trajectory π as input. It splits the discrimination into two tasks with a similar structure as in G. For the entry-point part, an FC block evaluates p_{0:1}^π to a scalar in [0, 1], which we denote by v_e(p_{0:1}^π; θ_D). For the continuation part, an LSTM+FC block separately evaluates each point p_k^π (for 2 ≤ k < n_π) given the sub-trajectory p_{0:k−1}^π. We denote the result for the k-th point by v_c(p_{0:k}^π; θ_D).

So, for a full trajectory π of n_π points, the discriminator computes n_π − 1 values that each estimate the probability of π being real. The training phase uses these numbers in its loss function.

Training.
Each training iteration lets G generate a set Π′ of N trajectories for different (sequences of) noise vectors. We then let D classify all trajectories (both real and fake). The loss function of our GAN is the sum of two components:

• the success rate for discriminating all entry points:
Σ_{π∈Π} log v_e(p_{0:1}^π; θ_D) + Σ_{π∈Π′} log(1 − v_e(p_{0:1}^π; θ_D)),

• the success rate for discriminating all other points:
Σ_{π∈Π} Σ_{k=2}^{n_π−1} log v_c(p_{0:k}^π; θ_D) + Σ_{π∈Π′} Σ_{k=2}^{n_π−1} log(1 − v_c(p_{0:k}^π; θ_D)).

To let our GAN train faster, we add a third component. For each real trajectory π ∈ Π, we take all valid sub-trajectories p_{k:k+4}^π of length 5 and let G generate its own version of p_{k+5}^π given p_{k:k+4}^π. We add to our loss function:

Σ_{π∈Π} Σ_k || p_{k+5}^π − g(z | p_{k:k+4}^π; θ_G) ||

i.e. we sum up the Euclidean distances between real and generated points. We observed that this additional component leads to much faster convergence and better back-propagation.

To reduce the chance of ‘mode collapse’ (i.e. convergence to a limited range of samples), we use an ‘unrolled’ GAN [9]. This is an extended GAN where each optimization step for θ_G uses an improved version of the discriminator that is u steps further ahead (where u is a parameter).

Figure 1: Our GAN architecture for learning and generating pedestrian trajectories.

4 CROWD SIMULATION
Recall that our goal is to use our trajectories in a real-time interactive crowd simulation, where agents should be free to deviate from their trajectories if needed. This section describes how we combine our trajectory generator with a crowd simulator.

Our approach fits in the paradigm of multi-level crowd simulation [12], in which global planning (i.e. computing trajectories) is detached from the simulation loop. This loop consists of discrete timesteps.
In each step, new agents might be added, and each agent tries to follow its trajectory while avoiding collisions.
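This per-timestep loop can be sketched as follows. The sketch is illustrative: `Agent`, `make_trajectory`, `follow`, and `avoid` are hypothetical names for the pieces described in the rest of this section (GAN-generated trajectories, the trajectory-following rule, and a collision-avoidance routine such as ORCA), and the timestep and λ values are arbitrary.

```python
import random

DT = 0.1      # simulation timestep in seconds (illustrative value)
LAMBDA = 2.0  # mean time between agent insertions (from data, or chosen)

class Agent:
    def __init__(self, trajectory, birth_time):
        self.trajectory = trajectory  # generated path with timestamps
        self.birth = birth_time
        self.position = trajectory[0]

def simulate(total_time, make_trajectory, follow, avoid):
    """Skeleton of the simulation loop: spawn agents at exponentially
    distributed intervals, compute each agent's preferred velocity from
    its trajectory, and hand it to a collision-avoidance routine."""
    agents, t = [], 0.0
    # random.expovariate takes a *rate*, so we pass 1/λ for mean λ.
    next_spawn = random.expovariate(1.0 / LAMBDA)
    while t < total_time:
        if t >= next_spawn:
            agents.append(Agent(make_trajectory(), t))
            next_spawn = t + random.expovariate(1.0 / LAMBDA)
        for a in agents:
            v_pref = follow(a, t)         # preferred velocity along path
            v = avoid(a, v_pref, agents)  # collision-free velocity
            a.position = (a.position[0] + v[0] * DT,
                          a.position[1] + v[1] * DT)
        t += DT
    return agents
```

The three callbacks decouple the loop from any particular generator or collision-avoidance method, which is what allows the system to be combined with other simulation techniques.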
Adding Agents.
To determine when a new agent should be added to the simulation, we use an exponential distribution whose parameter λ denotes the average time between two insertions. This parameter can be obtained from an input dataset (to produce similar crowdedness), but one may also choose another value deliberately. Each added agent follows its own trajectory produced by our GAN.

Trajectory Following.
In each frame of the simulation loop, each agent should try to proceed along its trajectory π while avoiding collisions. The main difference with classical ‘route following’ [6] is that our trajectories have a temporal component: they prescribe at what speed an agent should move, and this speed may change over time. Therefore, we present a way to let an agent flexibly follow π while respecting its spatial and temporal data. Our algorithm computes a preferred velocity v_pref that would send the agent farther along π. This v_pref can then be fed to any existing collision-avoidance algorithm, to compute a velocity that is close to v_pref while avoiding collisions with other agents.

Two parameters define how an agent follows π: the time window w and the maximum speed s_max. An agent always tries to move to a point that lies w seconds ahead along π, taking s_max into account. During the simulation, let t be the time that has passed since the agent’s insertion. Ideally, the agent should have reached π(t) by now. Our algorithm consists of the following steps:

(1) Compute the attraction point p_att = π(t_att), where t_att = min(t + w, T) and T is the end time of π. Thus, p_att is the point that lies w seconds ahead of π(t), clamped to the end of π if needed.
(2) Compute the preferred velocity v_pref as (p_att − p) / (t_att − t), where p is the agent’s current position. Thus, v_pref is the velocity that will send the agent to p_att, with a speed based on the difference between t and t_att.
(3) If ||v_pref|| > s_max, scale v_pref so that ||v_pref|| = s_max. This prevents the agent from receiving a very high speed after it has been blocked for a long time.

Collision Avoidance.
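Before v_pref is handed to the collision-avoidance routine, it is computed from the three steps above. A minimal sketch, assuming a trajectory stored as points sampled every `dt` seconds with linear interpolation between samples (function and parameter names are illustrative):

```python
def preferred_velocity(position, trajectory, t, dt, w, s_max):
    """Steps (1)-(3) of the trajectory-following rule. `trajectory` is a
    list of 2D points sampled every `dt` seconds; `t` is the time since
    the agent's insertion; `w` the time window; `s_max` the speed cap."""
    T = (len(trajectory) - 1) * dt
    # (1) Attraction point w seconds ahead of pi(t), clamped to the end.
    t_att = min(t + w, T)
    i = int(t_att / dt)
    frac = t_att / dt - i
    p0 = trajectory[i]
    p1 = trajectory[min(i + 1, len(trajectory) - 1)]
    p_att = (p0[0] + frac * (p1[0] - p0[0]),
             p0[1] + frac * (p1[1] - p0[1]))
    # (2) Velocity that reaches p_att in the remaining time t_att - t.
    dt_att = max(t_att - t, 1e-9)
    v = ((p_att[0] - position[0]) / dt_att,
         (p_att[1] - position[1]) / dt_att)
    # (3) Clamp the speed so a long-blocked agent does not jump ahead.
    speed = (v[0] ** 2 + v[1] ** 2) ** 0.5
    if speed > s_max:
        v = (v[0] * s_max / speed, v[1] * s_max / speed)
    return v
```

An agent that is exactly on schedule receives the trajectory's own speed; an agent that has fallen behind receives a faster (but capped) velocity that gradually catches up.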
The preferred velocity v_pref computed by our algorithm can be used as input for any collision-avoidance routine. In our implementation, we use the popular ORCA method [2]. In preliminary tests, other methods such as social forces [5] proved to be less suitable for our purpose.

5 EXPERIMENTS AND RESULTS

Set-up.
We have implemented our GAN using the PyTorch library (https://pytorch.org/). The input noise vectors are 3-dimensional and drawn from a uniformly random distribution. In both G and D, the entry-point FC blocks consist of 3 layers with 128, 64, and 32 hidden neurons, respectively. For the continuation part, the LSTM blocks consist of 62 cells, and the FC blocks contain 2 layers of 64 and 32 hidden neurons. To save time and memory, the LSTM blocks only consider the last 4 samples of a sub-trajectory.

For training the GAN, all FC layers use Leaky-ReLU activation functions (with negative slope 0.…). We trained for …,000 iterations, using an unrolling parameter u = ….
Figure 2: The distribution of entry points created by three different methods: (a) GMM, (b) vanilla GAN, (c) unrolled GAN.

Figure 3: Trajectory heatmaps: the input data, the generated trajectories, and the final simulated agent motion.
In the crowd simulation, we model agents as disks with a radius of 0.… m, and we use w = … s and s_max = … m/s. Our input is the ETH dataset [10], which contains recorded trajectories around the entrance of a university building. We have defined the region of interest R as an axis-aligned bounding box, and we use only the 241 trajectories that both enter and exit R.

Result 1: Entry Points.
To show the performance of our GAN in learning the distribution of entry points, we computed 500 (fake) entry points in the ETH scene, and we calculated the distribution of the samples over the boundary of R. We also compared these results against two other generative methods: a Gaussian Mixture Model (GMM) with 3 components, and a ‘vanilla’ GAN variant that does not use the unrolling mechanism. As shown in Fig. 2, the entry points of the unrolled GAN (right) are closer to the real data than those of the other two methods.

Result 2: Trajectories.
Next, we used our system to generate 352 new trajectories, and we used them to simulate a crowd. The first two heatmaps in Fig. 3 show that the generated trajectories (middle) are similarly distributed over the environment as the real data (left). The third heatmap shows the final motion of the simulated agents with route following and collision avoidance. In this scenario, agents are well capable of following their given trajectories.
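Heatmap comparisons like those in Fig. 3 can be made quantitative by binning trajectory points into a grid and comparing the normalized occupancy distributions. The following is a simple sketch of that idea (not the paper's evaluation code; grid sizes and the L1 metric are illustrative choices):

```python
def occupancy_heatmap(trajectories, xmax, ymax, nx, ny):
    """Count how often trajectory points fall into each cell of an
    nx-by-ny grid over [0, xmax] x [0, ymax] -- a discrete version of
    the trajectory heatmaps, usable for real-vs-generated comparison."""
    grid = [[0] * nx for _ in range(ny)]
    for traj in trajectories:
        for (x, y) in traj:
            if 0 <= x < xmax and 0 <= y < ymax:
                i = int(y / ymax * ny)
                j = int(x / xmax * nx)
                grid[i][j] += 1
    return grid

def l1_difference(a, b):
    """L1 distance between two normalized heatmaps: 0 means identical
    occupancy distributions, 2 means fully disjoint ones."""
    sa = sum(map(sum, a)) or 1
    sb = sum(map(sum, b)) or 1
    return sum(abs(a[i][j] / sa - b[i][j] / sb)
               for i in range(len(a)) for j in range(len(a[0])))
```

A low difference between the real and generated heatmaps supports the visual impression that the generated trajectories cover the environment similarly to the input data.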
Computation time.
We used CUDA to run our GAN on an NVIDIA Quadro M1200 GPU with 4GB of GDDR5 memory. With this set-up, generating a batch of 1024 trajectories (with a maximum length of 40 points) took 152 ms, meaning that the average generation time was about 0.15 ms per trajectory.

6 CONCLUSION
We have presented a data-driven crowd simulation method that uses GANs to learn the properties of input trajectories and then generate new trajectories with similar properties. Combined with flexible route following that takes temporal information into account, the trajectories can be used in a real-time crowd simulation. Our system can be used, for example, to create variants of a scenario with different densities. It can easily be combined with other simulation methods, and it allows interactive applications.

In the future, we will perform a thorough analysis of the trajectories produced by our system, and compare them to other algorithms. We will also investigate the exact requirements for reliable training. Furthermore, our system generates trajectories for individuals, assuming that agents do not influence each other’s choices. As such, it cannot yet model group behavior, and it performs worse in high-density scenarios where agents cannot act independently. We would like to handle these limitations in future work.
ACKNOWLEDGMENTS
This project was partly funded by EU project CROWDBOT (H2020-ICT-2017-779942).
REFERENCES
[1] Amirian, J., Hayet, J.-B., and Pettré, J. Social ways: Learning multi-modal distributions of pedestrian trajectories with GANs. In CVPR Workshops (2019).
[2] van den Berg, J., Guy, S., Lin, M., and Manocha, D. Reciprocal n-body collision avoidance. In Proc. Int. Symp. Robotics Research (2011), pp. 3–19.
[3] Di, X., and Patel, V. Face synthesis from visual attributes via sketch using conditional VAEs and GANs. CoRR abs/1801.00077 (2018).
[4] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Proc. Int. Conf. Neural Information Processing Systems (2014), vol. 2, pp. 2672–2680.
[5] Helbing, D., and Molnár, P. Social force model for pedestrian dynamics. Physical Review E 51, 5 (1995), 4282–4286.
[6] Jaklin, N., Cook IV, A., and Geraerts, R. Real-time path planning in heterogeneous environments. Computer Animation and Virtual Worlds 24, 3 (2013), 285–295.
[7] Lee, K., Choi, M., Hong, Q., and Lee, J. Group behavior from video: A data-driven approach to crowd simulation. In Proc. ACM SIGGRAPH/Eurographics Symp. Computer Animation (2007), pp. 109–118.
[8] Lerner, A., Chrysanthou, Y., and Lischinski, D. Crowds by example. Computer Graphics Forum 26, 3 (2007), 655–664.
[9] Metz, L., Poole, B., Pfau, D., and Sohl-Dickstein, J. Unrolled generative adversarial networks. CoRR abs/1611.02163 (2017).
[10] Pellegrini, S., Ess, A., Schindler, K., and van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proc. IEEE Int. Conf. Computer Vision (2009), pp. 261–268.
[11] Pellegrini, S., Gall, J., Sigal, L., and van Gool, L. Destination flow for crowd simulation. In Proc. European Conf. Computer Vision (2012), pp. 162–171.
[12] van Toll, W., Jaklin, N., and Geraerts, R. Towards believable crowds: A generic multi-level framework for agent navigation. In ASCI.OPEN / ICT.OPEN (ASCI track) (2015).
[13] Wang, H., Ondřej, J., and O’Sullivan, C. Path patterns: Analyzing and comparing real and simulated crowds. In Proc. 20th ACM SIGGRAPH Symp. Interactive 3D Graphics and Games (2016), pp. 49–57.
[14] Wolinski, D., Guy, S., Olivier, A.-H., Lin, M., Manocha, D., and Pettré, J. Parameter estimation and comparative evaluation of crowd simulations. Computer Graphics Forum 33, 2 (2014), 303–312.
[15] Yersin, B., Maïm, J., Pettré, J., and Thalmann, D. Crowd patches: Populating large-scale virtual environments for real-time applications. In Proc. Symp. Interactive 3D Graphics and Games (2009), pp. 207–214.
[16] Zhong, J., Cai, W., Luo, L., and Zhao, M. Learning behavior patterns from video for agent-based crowd modeling and simulation.