Grab the Reins of Crowds: Estimating the Effects of Crowd Movement Guidance Using Causal Inference

Abstract

Crowd movement guidance has been a fascinating problem in various fields, such as easing traffic congestion in unusual events and evacuating people from an emergency-affected area. To grab the reins of crowds, there has been considerable demand for a decision support system that can answer a typical question: “what will be the outcomes of each of the possible options in the current situation?” In this paper, we consider the problem of estimating the effects of crowd movement guidance from past data. To cope with the limited amount of available data biased by past decision-makers, we leverage two recent techniques in deep representation learning for spatial data analysis and causal inference. We use a spatial convolutional operator to extract effective spatial features of crowds from a small amount of data and use balanced representation learning based on integral probability metrics to mitigate the selection bias and missing counterfactual outcomes. To evaluate the performance on estimating the treatment effects of possible guidance, we use a multi-agent simulator to generate realistic data on evacuation scenarios in a crowded theater, since there are no available datasets recording outcomes of all possible crowd movement guidance. The results of three experiments demonstrate that our proposed method reduces the estimation error by up to 56% compared with state-of-the-art methods.



Koh Takeuchi

Kyoto University

Ryo Nishida

Tohoku University

Hisashi Kashima

Kyoto University

Masaki Onishi


KEYWORDS

Deep Learning, Causal Inference, Multi-Agent Simulator

ACM Reference Format:

Koh Takeuchi, Ryo Nishida, Hisashi Kashima, and Masaki Onishi. 2021. Grab the Reins of Crowds: Estimating the Effects of Crowd Movement Guidance Using Causal Inference. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 9 pages.

Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3–7, 2021, Online

Crowd movement guidance has been a fascinating problem in various fields, such as easing traffic congestion in normal or abnormal events [25, 26, 33] and evacuating people from emergency-affected areas such as a crowded building [7, 17, 32, 35, 43]. To grab the reins of crowds, given a situation, decision-makers need to devote extensive efforts to quickly select guidance that consists of a complex compound of actions. For example, a traffic network manager needs to control traffic signals, road capacities, and directions of crowd flow together while considering the interactions among them. We show an example of crowd movement guidance in Figure 1. Let us assume a situation in which theater managers need to decide guidance required for a crowd evacuation, and there are two guide plans: A and B. If they choose plan A, congestion of crowds does not occur, and thus people can quickly escape from the theater. In contrast, plan B disturbs the crowd movements, which results in longer evacuation times than in plan A. The choice of plan has a significant impact on the outcome, and therefore, there has been considerable demand for a decision support system that can answer a typical question posed by decision-makers: “what will be the outcomes of each of the possible options in the current situation?”

Figure 1: Example of crowd movement guidance: Evacuation events from an opera house. The green and red dots represent people who are moving or stopping, respectively.

However, due to the limited number of such events or evacuation situations, there are not enough data records for estimating such effects with sufficient accuracy. To make matters worse, these records contain biases introduced by the plans taken by past decision-makers, because they tried to make the best decisions for the situations at the time, and we cannot expect that randomized controlled trials (RCTs) were performed to obtain unbiased data, for ethical or other reasons. In addition, due to the counterfactual nature of the data acquisition process, we cannot observe the outcomes of guidance other than the one actually taken, and therefore, we cannot directly assess the effects of guidance from such datasets.

Figure 2: Framework of crowd movement guidance based on causal inferences.
In this paper, we focus on the problem of estimating the effects of crowd movement guidance from a relatively small amount of biased data. We propose a spatial convolutional counterfactual regression (SC-CFR), which takes advantage of two recent developments in deep representation learning in spatial data analysis and causal inference. Because the locational distribution of crowds has a significant impact on the effectiveness of movement guidance, SC-CFR aggregates crowd distribution information by means of the spatial convolutional layer [19, 41] and predicts the outcomes of movement guidance with high accuracy even with a limited number of data. On the other hand, to cancel the data biases made by past decision-makers, we resort to the causal inference framework that has been successfully applied to answer the what-if problems in various domains, such as healthcare [24], economics [13], and education [42]. In particular, SC-CFR employs balanced representation learning based on integral probability metrics, which mitigates selection bias and missing counterfactual outcomes [24].

Evaluating the performance of estimating the possible guidance's treatment effects requires datasets recording outcomes of all possible crowd movement guidance. Since there are no datasets available for our purpose, we used a multi-agent simulator [36] to generate realistic data on evacuation scenarios in a crowded theater. From these datasets, we conducted experiments for estimating the treatment effects on outcomes, such as the maximum, average, and standard deviation of evacuation times required by crowds. We demonstrate that our proposed method reduced the estimation error by at most 56% from state-of-the-art baseline methods that do not consider spatial attributes.

Our contributions are summarized as follows:
• We first address the what-if problem in crowd movement guidance as causal effect estimation from a relatively small amount of biased data.
• We propose SC-CFR, which takes advantage of two recent developments in deep representation learning in spatial data analysis and causal inference.
• We demonstrate that our method reduces the estimation error by at most 56% from state-of-the-art baselines by experiments with a multi-agent simulator.

We use an uppercase letter to denote a random variable (e.g., $X$), a lowercase letter to indicate a realization $x$ drawn from a particular distribution (i.e., $x \sim P(X)$), and a calligraphic letter $\mathcal{X}$ to represent a sample space (i.e., $x \in \mathcal{X}$).

Let us consider a place or area with crowds of freely moving people. When the place is crowded, a manager may want to carry out a crowd guidance plan to reduce the congestion caused by crowds moving toward other places. For example, if a park hosts a big festival with a sell-out crowd, a park manager has to make a decision on guidance to avoid traffic jams. Another example is a manager working in a stadium who needs to make a decision on guidance to safely and quickly evacuate crowds during a fire emergency. A crowd guidance plan can comprise various types of actions, such as route recommendations for crowds or a modification of road width, which could have complicated interactions.

Assume that a situation of crowds, such as their current locations, is observed as covariates. We denote a situation of crowds (covariates) as $\boldsymbol{X} \in \mathbb{R}^d$, where $d$ is the number of people. A guide (treatment) is $\boldsymbol{Z} \in \mathcal{Z} \subset \{0, 1\}^t$ that consists of $t$ possible actions, where $\mathcal{Z}$ is a set of possible guides. A guide can select single or multiple actions (i.e., a combination of actions); therefore, $|\mathcal{Z}|$ grows exponentially as $t$ increases. We denote a potential outcome as $Y \in \mathbb{R}$, such as an average moving cost or the maximum movement cost consumed by crowds. For example, let us suppose an evacuation scenario from a place. Given an initial situation of crowds $\boldsymbol{x} \sim P(\boldsymbol{X})$, a decision-maker, who needs to guide crowds to a safe place, selects a treatment $\boldsymbol{z} \sim P(\boldsymbol{Z} \mid \boldsymbol{X})$, after which crowds start to move. We can observe an outcome $y \sim P(Y \mid \boldsymbol{X}, \boldsymbol{Z})$, such as evacuation times, after finishing the scenario. This procedure builds on the Rubin-Neyman potential outcomes framework [22]. We show a framework for crowd movement guidance in Figure 2.
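The sampling procedure above ($\boldsymbol{x} \sim P(\boldsymbol{X})$, then $\boldsymbol{z} \sim P(\boldsymbol{Z} \mid \boldsymbol{X})$, then $y \sim P(Y \mid \boldsymbol{X}, \boldsymbol{Z})$) can be illustrated with a toy simulation. This is only a sketch: the function name `sample_observation`, the policy, and the outcome model are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def sample_observation(rng, d=20, t=3):
    """Draw one factual triple (x, z, y) from a toy observational process."""
    # Covariates x ~ P(X): binary occupancy of d positions (toy choice).
    x = rng.integers(0, 2, size=d)
    # Policy z ~ P(Z | X): every action is switched on more often when the
    # place is crowded. This dependence on x is the source of selection bias
    # (hypothetical rule, for illustration only).
    crowding = x.mean()
    z = (rng.random(t) < 0.2 + 0.6 * crowding).astype(int)
    # Outcome y ~ P(Y | X, Z): a toy cost that grows with crowding and
    # shrinks with each active action (hypothetical outcome model).
    y = 100.0 * crowding - 5.0 * z.sum() + rng.normal(0.0, 1.0)
    return x, z, y

rng = np.random.default_rng(0)
# Only the factual outcome of the chosen guide is recorded per situation.
dataset = [sample_observation(rng) for _ in range(5)]
num_possible_guides = 2 ** 3  # |Z| is at most 2^t, here with t = 3
```

Each record holds a single outcome, so the outcomes of the other candidate guides for the same situation are never part of the dataset.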
Suppose that there is a healthcare record (covariate) of a patient, and a doctor needs to choose whether to prescribe medicine, which we will call a treatment. We can observe a future health condition of the patient as a subsequent outcome. Causal effect estimation is the task of estimating outcomes under treatments that were not actually given and assessing the difference in outcomes between treatments. In this paper, we consider the situation of crowds as covariates, guidance as a treatment, and costs of crowd movements as outcomes. The effects of a guide differ from situation to situation since they depend on the crowd conditions and types of events. Thus, decision-making on guide planning is one of the most difficult tasks for managers, and they need a useful tool to help their decision-making. Surprisingly, the effect of guidance has not been considered in the existing literature.

We suppose that two arbitrary guide plans A and B correspond to guides $\boldsymbol{z}$ and $\boldsymbol{z}'$, respectively. Given covariates $\boldsymbol{x}$ and a guide $\boldsymbol{z}$, we denote the cost consumed by crowds as $Y_{\boldsymbol{x}}(\boldsymbol{z})$. The conditional average treatment effect (CATE) for covariates $\boldsymbol{x}$ can be defined as $\mathbb{E}[Y_{\boldsymbol{x}}(\boldsymbol{z})] - \mathbb{E}[Y_{\boldsymbol{x}}(\boldsymbol{z}')]$, where we use $\mathbb{E}$ to denote the expectation over $Y$. Intuitively, this quantity indicates how much of an advantage (or disadvantage) plan A has in guiding the crowd movements compared with plan B. Knowing this quantity helps a decision-maker compare the difference between treatments for a particular situation $\boldsymbol{x}$; however, we cannot directly calculate CATE from observed outcomes because we can only observe the outcome of one of the treatments.

Selection Bias and Missing Counterfactuals

Estimating the outcome of guidance requires a deep understanding of the complex relationships within covariates and guidance. One of the difficulties in estimating treatment effects on crowd movement is that the observed set of guidance is biased in the given dataset, because the manager usually selects guidance according to their policy.
Thus, even if the observed covariates are random, we cannot observe the outcome for a pair of covariates and guidance that managers do not prefer. Another obstacle is that we can only observe a factual outcome, but cannot observe counterfactual outcomes. Usually, we can only deploy a single guide in a situation, and thus the outcomes for the other possible guides remain missing.

Formally, in an observational study, we can obtain only one factual outcome $y^{\mathrm{f}}$ depending on a selected treatment $\boldsymbol{z}$, and thus never see the counterfactual outcomes $y^{\mathrm{cf}}$ corresponding to the other treatments that are not selected by the decision-maker. More formally, when a decision-maker chooses a treatment $\boldsymbol{z}$, the factual outcome $y^{\mathrm{f}}$ is equal to the outcome $y_{\boldsymbol{x}}(\boldsymbol{z})$, and the counterfactual outcomes $y^{\mathrm{cf}}$ are $y_{\boldsymbol{x}}(\boldsymbol{z}')$, $\forall \boldsymbol{z}' \in \mathcal{Z} \setminus \{\boldsymbol{z}\}$. We suppose that a decision-maker has a policy for choosing a treatment $\boldsymbol{z}$ based on a situation $\boldsymbol{x}$, and thus, observed treatments are not assigned at random and are always biased. This bias, called selection bias in the Rubin-Neyman framework, causes a simple outcome model to produce inaccurate and biased results. Thus, we need to develop a method that learns a model for predicting outcomes and can simultaneously cancel the effects of the selection bias.

In this paper, we make the common assumption called strong ignorability in the Rubin-Neyman framework.

Assumption 1 (Stable Unit Treatment Value Assumption). The potential outcomes for any situation do not vary with the treatment assigned to other situations, and, for each situation, there are no different forms or versions of each treatment level that lead to different potential outcomes.

This assumption emphasizes the independence of each situation and that there are no interactions between situations.

Assumption 2 (Ignorability). Given the covariates $\boldsymbol{X}$, the treatment assignment $\boldsymbol{Z}$ is independent of the potential outcomes, i.e., $\boldsymbol{Z} \perp Y_{\boldsymbol{X}}(\boldsymbol{z}) \mid \boldsymbol{X}, \ \forall \boldsymbol{z} \in \mathcal{Z}$.

The ignorability assumption is also referred to as the unconfoundedness assumption. Under this assumption, the treatment assignment for situations with the same covariates $\boldsymbol{X}$ can be viewed as random.

Assumption 3 (Positivity). For any value of $\boldsymbol{X}$, treatment assignment is not deterministic: $P(\boldsymbol{Z} = \boldsymbol{z} \mid \boldsymbol{X} = \boldsymbol{x}) > 0, \ \forall \boldsymbol{z}$ and $\boldsymbol{x}$.

Together, these assumptions imply that all the factors determining the outcome of each treatment are observed and that every treatment has a nonzero chance of being assigned in every situation. Combining the above, we adopt the strong ignorability assumption, which is also referred to as the no-hidden-confounders assumption. We formalize this assumption by using the standard strong ignorability condition:

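Assumption 4 (Strong Ignorability). $\boldsymbol{Z} \perp Y_{\boldsymbol{X}}(\boldsymbol{z}) \mid \boldsymbol{X}, \ \forall \boldsymbol{z} \in \{0, 1\}^t$ and $P(\boldsymbol{Z} = \boldsymbol{z} \mid \boldsymbol{X} = \boldsymbol{x}) > 0, \ \forall \boldsymbol{z}$ and $\boldsymbol{x}$.

The strong ignorability assumption is a sufficient condition for the CATE function $f$ to be identifiable [13].

Figure 3: Model architecture of spatial convolutional counterfactual regression (SC-CFR).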
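Why strong ignorability is sufficient can be seen from a short, standard derivation. The steps below are a sketch in the notation of the assumptions above and are not reproduced from the paper; the "consistency" step (the observed outcome equals the potential outcome of the assigned treatment) is taken here to follow from Assumption 1.

```latex
\begin{align*}
\mathbb{E}[Y_{\boldsymbol{x}}(\boldsymbol{z})]
  &= \mathbb{E}[Y_{\boldsymbol{X}}(\boldsymbol{z}) \mid \boldsymbol{X} = \boldsymbol{x}] \\
  &= \mathbb{E}[Y_{\boldsymbol{X}}(\boldsymbol{z}) \mid \boldsymbol{X} = \boldsymbol{x},\, \boldsymbol{Z} = \boldsymbol{z}]
     && \text{(ignorability; positivity makes the conditioning event possible)} \\
  &= \mathbb{E}[Y \mid \boldsymbol{X} = \boldsymbol{x},\, \boldsymbol{Z} = \boldsymbol{z}]
     && \text{(consistency)}
\end{align*}
```

The last line involves only observable quantities, so the CATE $\mathbb{E}[Y_{\boldsymbol{x}}(\boldsymbol{z})] - \mathbb{E}[Y_{\boldsymbol{x}}(\boldsymbol{z}')]$ can in principle be estimated from factual records by regressing $Y$ on $(\boldsymbol{X}, \boldsymbol{Z})$.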

To predict what the outcomes of each of the possible options in the current situation will be, we develop a model $y = f(\boldsymbol{x}, \boldsymbol{z})$ that predicts the outcome for any treatment $\boldsymbol{z}$. Our idea is twofold: a spatial convolutional operator extracting effective spatial features of crowds from a small amount of data, and balanced representation learning to mitigate the selection bias and missing counterfactuals. We assume that the biased observed dataset $\mathcal{D} = \{\boldsymbol{x}_n, \boldsymbol{z}_n, y^{\mathrm{f}}_n\}_{n=1}^{N}$ is available for learning our model, where $N$ is the number of situations. Note that our dataset contains only factual outcomes for assigned treatments; thus, we do not have access to the unobserved counterfactual outcomes $y^{\mathrm{cf}}$ for unassigned treatments.

We first give an overview of our proposed model, called spatial convolutional counterfactual regression (SC-CFR), shown in Figure 3. SC-CFR takes two inputs, covariates $\boldsymbol{x}$ and a treatment $\boldsymbol{z}$, and outputs the outcome estimate $y$. SC-CFR first extracts effective spatial features $\Phi_{\boldsymbol{x}}$ from covariates $\boldsymbol{x}$ with the spatial feature extractor $g$. Then, a multi-layer perceptron $h$ makes a prediction $y = h(\Phi_{\boldsymbol{x}}, \boldsymbol{z}) = h(g(\boldsymbol{x}), \boldsymbol{z})$. In the training phase of SC-CFR, to reduce the biases made by past decision-makers, we utilize the IPM regularizer that makes the distributions of the extracted representations $\Phi_{\boldsymbol{x}} = g(\boldsymbol{x})$ as similar as possible between different treatments.

Now, we explain the details of learning spatial features $\Phi_{\boldsymbol{x}}$. We utilize spatial information, such as GPS locations of crowds, for extracting spatial features. We split spatial attributes vertically and horizontally to make a discrete spatial grid. Then, from covariates $\boldsymbol{x}$, we construct a two-dimensional array $\boldsymbol{a}$ filled with values such as the aggregated numbers of people in each block. We employ a spatial convolution operator and an average pooling operator to extract spatial features $\Phi_{\boldsymbol{x}}$ from the two-dimensional array $\boldsymbol{a}$.
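The pipeline from locations to spatial features can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the kernels are fixed rather than learned, and no padding or nonlinearity is applied.

```python
import numpy as np

def positions_to_grid(positions, extent, shape):
    """Count people per grid cell.

    positions: (n, 2) array of (x, y) coordinates
    extent:    ((x_min, x_max), (y_min, y_max))
    shape:     (rows, cols) of the grid; rows follow y, columns follow x
    """
    counts, _, _ = np.histogram2d(
        positions[:, 1], positions[:, 0],
        bins=shape, range=(extent[1], extent[0]),
    )
    return counts

def cross_correlate2d(a, kernel):
    """Valid-mode cross-correlation: out[i, j] = sum_{m,n} a[i+m, j+n] * kernel[m, n]."""
    k_h, k_w = kernel.shape
    out_h, out_w = a.shape[0] - k_h + 1, a.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for m in range(k_h):
        for n in range(k_w):
            out += a[m:m + out_h, n:n + out_w] * kernel[m, n]
    return out

def average_pool2d(a, size=2):
    """Non-overlapping average pooling; rows/columns that do not fill a window are dropped."""
    h, w = (a.shape[0] // size) * size, (a.shape[1] // size) * size
    return a[:h, :w].reshape(h // size, size, w // size, size).mean(axis=(1, 3))

def extract_features(a, kernels):
    """Apply one convolution + average pooling block per kernel, then flatten."""
    for kernel in kernels:
        a = average_pool2d(cross_correlate2d(a, kernel))
    return a.ravel()

# Three people in a 4 m x 2 m area, binned into a 2 x 4 grid of 1 m cells.
positions = np.array([[0.5, 0.5], [0.6, 0.4], [3.5, 1.5]])
grid = positions_to_grid(positions, extent=((0.0, 4.0), (0.0, 2.0)), shape=(2, 4))
```

In a trained model the kernel values would be learned jointly with the outcome predictor; fixed kernels are used here only to keep the sketch self-contained.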
These operators have been widely adopted in spatial data analysis and have achieved significant improvements in various applications such as traffic forecasting [18, 19, 41]. They enable us to extract spatial distributions of the crowds by aggregating information from nearby grid cells, and the hierarchical structure of the operators captures both local and global interactions of the crowds. Our two-dimensional spatial convolution operation [8] takes the inner product, or measures the cross-correlation, between an input array $\boldsymbol{a}$ and a kernel $\boldsymbol{k} \in \mathbb{R}^{k \times k}$, defined as

$$(\boldsymbol{k} * \boldsymbol{a})(i, j) = \sum_{m=1}^{k} \sum_{n=1}^{k} \boldsymbol{a}_{i+m,\, j+n}\, \boldsymbol{k}_{m,n}. \tag{1}$$

The average pooling operator is the convolution operator whose kernel values are all set to 1 (up to normalization). We employ two repetitions of a convolution operator and an average pooling operator as $g(\boldsymbol{x})$ in this paper (see Figure 3).

If we do nothing to address the observation biases in data, the distribution of the extracted spatial feature $\Phi_{\boldsymbol{x}}$ is biased depending on $\boldsymbol{z}$, which could make the prediction accuracy of the model deteriorate. To mitigate the bias and make $\Phi_{\boldsymbol{x}}$ a balanced representation, we utilize the integral probability metrics (IPM) [20, 27] in this paper. Let us consider a binary treatment case $\mathcal{Z} = \{0, 1\}$. IPM measures the distance between two distributions. Given two probability density functions $p = P(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}_i)$ and $q = P(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}_j)$, for $\mathcal{S} \subset \mathbb{R}^d$ and $G$, a family of functions from $\mathcal{S}$ to $\mathbb{R}$, IPM is defined as

$$\mathrm{IPM}_G(p, q) = \sup_{g \in G} \left| \int_{\mathcal{S}} g(s)\,\bigl(p(s) - q(s)\bigr)\, \mathrm{d}s \right|. \tag{2}$$

Intuitively, minimizing IPM makes the distributions of the extracted spatial features for the factual and counterfactual treatments hard to distinguish, and thus the distribution of the training data becomes close to that of RCTs [10, 24].

In this paper, we consider a case where a guiding plan $\boldsymbol{z} \in \mathcal{Z} = \{0, 1\}^t$ consists of multiple actions, such as route recommendations for crowds and a modification of the road width.
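For intuition about Eq. (2), consider one simple function family: linear functions $g(s) = w^\top s$ with $\lVert w \rVert_2 \le 1$. For this family the supremum has a closed form, namely the Euclidean distance between the two mean representations. The sketch below estimates it from samples; the choice of $G$ and the name `linear_ipm` are assumptions made for illustration, and the text above does not commit to a particular $G$.

```python
import numpy as np

def linear_ipm(phi_a, phi_b):
    """Empirical IPM for G = {s -> w.s : ||w||_2 <= 1}.

    phi_a, phi_b: arrays of shape (n_a, dim) and (n_b, dim) holding the
    representations of two treatment groups. For this G the supremum in
    Eq. (2) equals the Euclidean distance between the group means.
    """
    return float(np.linalg.norm(phi_a.mean(axis=0) - phi_b.mean(axis=0)))

rng = np.random.default_rng(0)
phi_control = rng.normal(0.0, 1.0, size=(200, 4))
phi_shifted = rng.normal(1.0, 1.0, size=(200, 4))   # imbalanced representation
phi_balanced = rng.normal(0.0, 1.0, size=(200, 4))  # balanced representation

imbalanced_score = linear_ipm(phi_control, phi_shifted)
balanced_score = linear_ipm(phi_control, phi_balanced)
```

A representation whose groups share the same mean scores near zero, which is the property the regularizer rewards. Richer function families detect differences beyond the mean at a higher computational cost.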
However, the IPM compares only two distributions, and therefore is not directly applicable to multiple treatments [39]. One of the most natural extensions of IPM to multiple treatments is to measure IPM between all possible pairs of treatments; however, this requires on the order of $|\mathcal{Z}|^2$ IPM terms and is not tractable for a large number of treatments. To cope with this problem, we utilize another simple extension of IPM that considers the most frequent treatment in a mini-batch as the control treatment $\boldsymbol{z}$ and only uses the IPMs between the control and the others:

$$\sum_{\boldsymbol{z}' \in \mathcal{Z} \setminus \{\boldsymbol{z}\}} \mathrm{IPM}_G\bigl(p(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}),\, q(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}')\bigr). \tag{3}$$

Finally, we utilize the spatial representations balanced for all the treatments to construct a predictive model $h(\Phi_{\boldsymbol{x}}, \boldsymbol{z})$ to infer an outcome $\hat{y}$. The original counterfactual regression [24] and its variants [23] employ multi-head networks where each multi-layer network corresponds to a treatment $\boldsymbol{z}_i$ and shares common base layers to avoid the risk of losing treatment information. However, since the total number of treatments $|\mathcal{Z}|$ grows rapidly with the number of actions, and training all of the heads requires a considerable amount of observations that are not available in our setting, we instead employ a single-head network built on a multi-layer perceptron (MLP) that takes a concatenation of $\Phi_{\boldsymbol{x}}$ and $\boldsymbol{z}$ as the input:

$$h(\Phi_{\boldsymbol{x}}, \boldsymbol{z}) = \mathrm{MLP}(\Phi_{\boldsymbol{x}}, \boldsymbol{z}). \tag{4}$$

The single-head network architecture has shown better performance when the number of treatments is more than a dozen [30]. To train the SC-CFR model, we find the optimal parameters that minimize the loss function

$$\sum_{n=1}^{N} \bigl\lVert y_n - \mathrm{MLP}(\Phi_{\boldsymbol{x}_n}, \boldsymbol{z}_n) \bigr\rVert^2 + \lambda \sum_{\boldsymbol{z}' \in \mathcal{Z} \setminus \{\boldsymbol{z}\}} \mathrm{IPM}_G\bigl(p(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}_n),\, q(\Phi_{\boldsymbol{x}} \mid \boldsymbol{z}'_n)\bigr), \tag{5}$$

where $\lambda > 0$ weights the IPM regularizer.

Modeling and simulating large-scale crowd movements [3], including traffic network flows in cities [25, 33] and at sea [26], and evacuation from buildings [7, 17, 32, 43], have been actively studied in the field of multi-agent systems.
Researchers have exploited multi-agent simulators to find optimal decision-making policies for crowd management. Comparisons of the properties of various types of simulators are found in [21]. Most of the existing research focuses on bottom-up decision-making by individual agents in crowds, i.e., each agent making decisions based on individual or partial information to achieve its own goals. This study focuses on the indirect control of crowd movement through top-down decision-making by managers and the estimation of its effects.

A convolutional operator is a feature extraction method developed in computer vision for capturing low- and high-level image features [8]. Researchers in spatial data analysis have recently applied convolutional operators to handle the spatial correlations in urban areas [19]. They have extracted local and city-wide global features by stacking spatial convolutional operators and utilized them to solve their domain-specific problems [18]. With such rich features, many researchers have reported significant improvements in the predictive accuracy of forecasting future volumes and speeds in transportation networks [40, 41] and the number of rides in on-demand ride services [15], to name a few. Based on their results, we utilize a spatial convolutional operator for extracting effective local spatial features of crowds from a small amount of data.

In the field of causal inference, novel methods for estimating the causal effect at the individual and population levels have been proposed [12, 37]. The Bayesian additive regression tree (BART) [11] is one of the most popular methods built on a simple decision tree model [2, 34], and it shows state-of-the-art performance. Recently, in the field of machine learning, novel methods based on deep neural networks to remove biases in data introduced by past decision-makers have been proposed [1, 5, 14, 24, 38]; they use representation learning techniques to learn balanced representations of confounders.
In particular, the counterfactual regression (CFR) [24] and its various extensions [10, 23, 39] have received widespread interest due to their significant performance improvement in estimating treatment effects compared to existing methods, such as BART. However, most of the existing studies have not considered the use of spatial information to address data deficiencies and observation biases. In this paper, we use spatial convolution to extract effective spatial features for estimating treatment effects from the distribution of crowds given as images [19, 41]. The problem of estimating treatment effects is closely related to bandit feedback [4, 28] and off-policy reinforcement learning [29]. Some of these studies assume that the policies of the agents used to collect data are known [10, 12]. In contrast, we do not assume prior information about the data collection mechanisms.

Figure 4: An evacuation.
Figure 5: Route guidance.
Figure 6: A jam of crowds.
Figure 7: Building structures, seat layouts, and actions; a route guide for people (blue) and doors at six exits (red). Four blocks A, B, C, and D for data generation.

Evaluating the performance of estimating possible guidance treatment effects requires datasets recording the outcomes of all possible crowd movement guidance. Since such datasets are not available, we used an open-source multi-agent simulator [36] to generate realistic data on evacuation situations in a crowded theater. We utilized the actual structure of an opera house, namely the New National Theatre, Tokyo, in our simulator, whose seating capacity is denoted by $d$. The first type of action was route guidance to guide the audience to the evacuation exits. If no route guide is placed, each audience member evacuates from the nearest exit. We employed a route guide whose maximum evacuation time was smaller than that of the other guides when all seats were occupied and all doors were open. The second type of action was door operations that determine whether the door of an exit is fully open or half-open. In our scenario, we considered a risk, such as spreading fires, and allowed only two of the six doors to be fully open. We show snapshots of simulations having the same covariates but with and without route guidance in Figures 9 and 8, respectively, where the green and red dots represent agents that are moving and stopping, respectively. We can observe the positive effect of route guidance by comparing the snapshots at 160 sec; without route guidance, congestion occurred (Figure 8), while it is dissolved if we give route guidance (Figure 9). Our code and datasets are publicly available in a repository.

We generated crowd movement datasets using our simulator with the following settings. We first split the auditorium into four blocks (Blocks A, B, C, and D in Figure 7) and set the occupancy rate of each block from { . , . , . }.
We randomly set the occupation of each seat according to the occupancy rate and constructed covariates $\boldsymbol{x}$ by setting $x_i$ to 1 if the $i$-th seat was occupied and to 0 otherwise. We ran the seat selection ten times with different random seeds. Then, we put agents on the occupied seats and ran evacuation simulations for all possible treatments $\boldsymbol{z} \in \mathcal{Z}$, where $|\mathcal{Z}| = 2 \times (6 \times 5)/2 = 30$. We omitted simulations where the total number of people was less than 400, since the guidance was not effective in such situations. The total number of situations $N$ was 423, each with 30 possible treatments and their corresponding outcomes. We observed three outcomes for each run: the maximum, average, and standard deviation of evacuation times required by each person. Then, we added observation noise to the outcomes, randomly sampled from a normal distribution $\mathcal{N}(0, \sigma)$ where $\sigma = 2$. We show the distributions of the outcomes in Figures 10, 11, and 12. The red and blue lines correspond to the outcomes with and without route guidance, respectively.

To select a factual treatment $\boldsymbol{z}$ for guidance, we employed two independent probabilistic policies (the propensity scores). We set the first dimension of treatment $\boldsymbol{z} \in \{0, 1\}^t$ to represent the action of the route guide, and the others to indicate the actions of the six doors. We randomly determined whether to use the route guide using the distribution $p(z_{\mathrm{guide}} = 1)$, a logistic function of the overall occupancy rate $\sum_{i=1}^{d} x_i / d$ shifted by a constant offset. We set $z_{\mathrm{guide}}$ to 1 when the route guidance is given. To decide which exit doors to fully open, we calculated the occupancy of the seats within 8 meters of each door, $\sum_{i \in \mathcal{D}_j} x_i / |\mathcal{D}_j|$, where $\mathcal{D}_j$ is the set of seats within 8 meters of the $j$-th door. Finally, we randomly sampled two doors to be open by using a multinomial distribution whose parameters were set to those occupancies. We treated all the other treatments as counterfactual treatments.

https://github.com/koh-t/SC-CFR

Figure 8: Snapshots of a multi-agent simulation without route guidance.
Figure 9: Snapshots of a multi-agent simulation with route guidance.

We randomly sampled 90% of the seat assignments as a training dataset and used the rest as a test dataset. As the baseline methods, we employed nine supervised learning methods [6]: Lasso [31], the ridge regression (Ridge), the support vector regression (SVR), the random forest (RF), the gradient boosting regressor (GBR), the Bayesian additive regression tree (BART) [11], and the multi-layer perceptron (MLP) consisting of five layers. Since these methods cannot distinguish between covariates and a treatment, we simply used the concatenation $[\boldsymbol{x}, \boldsymbol{z}]$ as the input.
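The treatment set and the biased assignment policy from the data-generation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the simulator's actual code: the function names and the `offset` value in the logistic policy are hypothetical, and drawing the two open doors without replacement with probabilities proportional to nearby occupancy is one possible reading of the multinomial sampling step.

```python
import itertools
import numpy as np

NUM_DOORS = 6

def enumerate_treatments():
    """All guides z = (z_guide, door_1, ..., door_6) with exactly two doors fully open."""
    treatments = []
    for z_guide in (0, 1):
        for open_pair in itertools.combinations(range(NUM_DOORS), 2):
            doors = [1 if j in open_pair else 0 for j in range(NUM_DOORS)]
            treatments.append((z_guide, *doors))
    return treatments

def sample_factual_treatment(rng, x, door_seats, offset=0.5):
    """Sample one biased guide for a 0/1 seat-occupancy vector x.

    door_seats[j] holds the indices of the seats near door j.
    `offset` is a hypothetical constant in the logistic policy.
    Assumes at least two doors have occupied seats nearby.
    """
    # Route guide: logistic function of the overall occupancy rate.
    p_guide = 1.0 / (1.0 + np.exp(-(x.mean() - offset)))
    z_guide = int(rng.random() < p_guide)
    # Doors: crowded doors are more likely to be fully opened.
    crowd = np.array([x[idx].mean() for idx in door_seats])
    open_pair = rng.choice(NUM_DOORS, size=2, replace=False, p=crowd / crowd.sum())
    doors = [1 if j in open_pair else 0 for j in range(NUM_DOORS)]
    return (z_guide, *doors)

treatments = enumerate_treatments()  # 2 route-guide options x C(6, 2) door pairs
```

Because the sampled guide depends on the occupancy pattern, the factual records over-represent guides that suit crowded regions, which is the selection bias the method has to correct.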
The treatment-agnostic representation network (TARNET) and the counterfactual regression (CFR) are recently proposed neural network-based causal effect estimation methods in the machine learning field [24]. For TARNET and CFR, we used a two-layer MLP to learn the representation of $\boldsymbol{x}$ and a three-layer MLP, whose input was a concatenation of the extracted representation and the treatment, to predict the outcome. We employed two variants of our proposed method. SC-CFR is our proposed method using IPM, and a spatial convolutional treatment-agnostic representation network (SC-TARNET) is the proposed method without IPM ($\lambda = 0$). We constructed two-dimensional arrays $\boldsymbol{a}$ from covariates based on seat layouts. We show examples of the arrays in Figure 13. For our methods, we utilized two repetitions of a block that consisted of a convolutional operator and an average pooling operator, whose kernel size $k$ and paddings were set to 3, to extract the spatial feature $\Phi_{\boldsymbol{x}}$. Then, we used a three-layer MLP to predict the outcome. We selected model architectures and hyperparameters by minimizing the training errors.

To compare the performance of models, we employed several metrics that measure the accuracy of predicted outcomes $\hat{y}$ under the potential outcome framework. We evaluated models in two different settings. The within-sample setting, where the task is to estimate outcomes for all scenarios in the training dataset, evaluates what would have happened if a decision-maker had chosen other treatments in past evacuation events. The out-of-sample setting, where the task is to estimate outcomes for scenarios in the test dataset, corresponds to the case of a new evacuation event, and the goal is to estimate the outcomes of all possible treatments.

Root mean squared error (RMSE). RMSE is a widely used metric in supervised learning problems. We employed RMSE to measure the accuracy of each predicted outcome depending on the covariates and all possible treatments.
We define $\epsilon_{\mathrm{RMSE}}$ as

$$\epsilon_{\mathrm{RMSE}} = \left[ \frac{1}{|\mathcal{Z}|\, N} \sum_{\boldsymbol{z}_i \in \mathcal{Z}} \sum_{n=1}^{N} \bigl( \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_i) - \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_i)] \bigr)^2 \right]^{1/2}, \tag{6}$$

where $\mathcal{Z}'$ is the set of all pairs of possible guiding plans.

Precision in estimation of heterogeneous effect (PEHE). In the binary setting, PEHE measures the ability of a predictive model to estimate the difference in outcomes between two treatments for covariates $\boldsymbol{x}$. We define PEHE as the RMSE between the true difference of the outcomes of two treatments $\boldsymbol{z}_i$ and $\boldsymbol{z}_j$, that is, $\mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_i)] - \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_j)]$, and the predicted difference, $\hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_i) - \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_j)$. Obtaining a better PEHE requires accurate estimations of both the factual and counterfactual outcomes. We set $\epsilon_{\mathrm{PEHE}}$ as

$$\epsilon_{\mathrm{PEHE}} = \left[ \frac{1}{N} \sum_{n=1}^{N} \Bigl( \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_i) - \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_j) - \bigl( \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_i)] - \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_j)] \bigr) \Bigr)^2 \right]^{1/2}. \tag{7}$$

Figure 10: Maximum evacuation times.
Figure 11: Average evacuation times.
Figure 12: Standard deviation of the evacuation times.
Figure 13: Examples of two-dimensional covariate arrays.

Average treatment effect (ATE). We employed the error on estimating a conditional ATE. In contrast to PEHE, ATE considers only the average difference of treatment effects over the population. ATE is not as important as PEHE for models optimized for CATE estimation, but it can be a useful indicator of how well an estimator performs at comparing two treatments across the entire population. We define $\epsilon_{\mathrm{ATE}}$ as

$$\epsilon_{\mathrm{ATE}} = \frac{1}{N} \sum_{n=1}^{N} \bigl( \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_i) - \hat{y}_{\boldsymbol{x}_n}(\boldsymbol{z}_j) \bigr) - \frac{1}{N} \sum_{n=1}^{N} \bigl( \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_i)] - \mathbb{E}[Y_{\boldsymbol{x}_n}(\boldsymbol{z}_j)] \bigr). \tag{8}$$

Multiple Treatments. Let us consider a general case where combinations of multiple actions are available for making a guiding plan.
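The metrics in Eqs. (6) to (8) can be computed directly when the true expected outcomes of every treatment are known, as is the case with simulated data. The sketch below assumes the outcomes are stored as arrays of shape $(N, |\mathcal{Z}|)$; the function names and the toy numbers are hypothetical. `ate_error` returns the signed difference exactly as written in Eq. (8); its absolute value is what one would report as an error magnitude.

```python
import numpy as np

def rmse(y_hat, y_true):
    """Eq. (6): root mean squared error over all situations and treatments."""
    return float(np.sqrt(np.mean((y_hat - y_true) ** 2)))

def pehe(y_hat, y_true, i, j):
    """Eq. (7): RMSE between predicted and true effects of treatment i versus j."""
    effect_hat = y_hat[:, i] - y_hat[:, j]
    effect_true = y_true[:, i] - y_true[:, j]
    return float(np.sqrt(np.mean((effect_hat - effect_true) ** 2)))

def ate_error(y_hat, y_true, i, j):
    """Eq. (8): difference between predicted and true average effects (signed)."""
    return float(np.mean(y_hat[:, i] - y_hat[:, j])
                 - np.mean(y_true[:, i] - y_true[:, j]))

# Toy example: two situations (rows) and two treatments (columns).
y_true = np.array([[10.0, 12.0], [20.0, 26.0]])
y_hat = np.array([[11.0, 12.0], [19.0, 26.0]])

example_rmse = rmse(y_hat, y_true)
example_pehe = pehe(y_hat, y_true, 0, 1)
example_ate = ate_error(y_hat, y_true, 0, 1)
```

In this toy case the per-situation effect errors are +1 and -1, so they cancel in the ATE error while PEHE still penalizes them. This is why PEHE is the stricter metric for models that target CATE.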
By following [23], we extended PEHE and ATE to be applicable to multiple treatments by considering the average PEHE and ATE between every possible pair of guiding plans, defined as

\[
\epsilon_{\mathrm{mPEHE}} = \frac{1}{\binom{t}{2}} \sum_{(\boldsymbol{z}_i, \boldsymbol{z}_j) \in \mathcal{Z}'} \epsilon_{\mathrm{PEHE}},
\qquad
\epsilon_{\mathrm{mATE}} = \frac{1}{\binom{t}{2}} \sum_{(\boldsymbol{z}_i, \boldsymbol{z}_j) \in \mathcal{Z}'} \epsilon_{\mathrm{ATE}}.
\tag{9}
\]

Tables 1, 2, and 3 show the results of prediction of the maximum, average, and standard deviation of evacuation times, respectively. Each column represents the average and the standard deviation of each metric. A bold value indicates a method that achieved the smallest (best) error in each column. To assess the effect on bias cancellation, we employed the t-test on two related samples of the scores between TARNET and CFR, and between SC-TARNET and SC-CFR. The values with a single star or double stars indicate that the p-values between the two models were 𝑝 < 0.10 and 𝑝 < […] 𝜖_RMSE and 𝜖_mPEHE. SC-TARNET came in second place. GBR or BART placed third. In Table 1, SC-CFR reduced the estimation error of 𝜖_RMSE for within-sample and out-of-sample by 33% and 38% from GBR, and 55% and 56% from Ridge, respectively.

The improvements of SC-CFR on 𝜖_RMSE and 𝜖_mPEHE for both within-sample and out-of-sample settings indicated that our method successfully extracted spatial features and learned balanced representations, thus providing more accurate predictions of outcomes than the state-of-the-art baselines. Our methods achieved the best estimation results of 𝜖_mATE in Tables 1 and 3, while BART showed the best performance in Table 2. Since the distribution of the standard deviations had a relatively simpler shape than the others, BART, which is a simpler non-linear method than ours, was sufficient to estimate ATE.

Comparing MLP and TARNET, TARNET showed better scores than MLP because of its representation learning mechanism, which MLP does not have. The performance of SC-CFR was significantly better than that of CFR. With these results, we could argue that the spatial convolutional operators worked correctly and greatly contributed to this improvement. Regarding the effect of bias cancellation, we observed that IPM contributed to the performance improvements of CFR over TARNET and of SC-CFR over SC-TARNET. This result indicates that a balanced representation was effective in precisely estimating CATE of crowd movement guidance, which is consistent with existing literature [24].

In this paper, we focused on the problem of estimating the effects of crowd movement guidance from a relatively small amount of biased data to answer a typical question posed by decision-makers: “what will be the outcomes of each of the possible options in the current situation?”
We proposed a Spatial Convolutional Counterfactual Regression (SC-CFR), which takes advantage of two recent developments in deep representation learning in spatial data analysis and causal inference. We utilized spatial convolutional operators to extract effective spatial features of crowds from a small amount of data, and applied balanced representation learning using IPM to mitigate the selection bias and missing counterfactual outcomes. With datasets generated by a multi-agent simulator on evacuation scenarios in a crowded theater, we conducted experiments to estimate treatment effects on three types of outcomes of evacuation times required by crowds. We demonstrated that SC-CFR reduced the estimation error by up to 56% from the state-of-the-art baseline methods that did not consider spatial attributes.

Table 1: Results on estimating the maximum evacuation time.
Table 2: Results on estimating the average evacuation time.
Table 3: Results on estimating the standard deviation of evacuation time.
(Each table has rows Lasso, Ridge, SVR, RF, GBR, BART, MLP, TARNET, CFR, SC-TARNET, and SC-CFR, and columns 𝜖_RMSE, 𝜖_mPEHE, and 𝜖_mATE for the within-sample and out-of-sample settings; the numerical entries are not preserved.)

Acknowledgement

This work was supported by JST, PRESTO Grant Number JPMJPR20C5, Japan and JSPS KAKENHI Grant Number 20H00609, Japan.

REFERENCES

[1] Susan Athey and Guido Imbens. 2015. Machine Learning for Estimating Heterogeneous Causal Effects. Research Papers. Stanford University, Graduate School of Business.
[2] Susan Athey, Julie Tibshirani, Stefan Wager, et al. 2019. Generalized random forests. The Annals of Statistics 47, 2 (2019), 1148–1178.
[3] M. Balmer, N. Cetin, Kai Nagel, and B. Raney. 2004. Towards truly agent-based traffic and mobility simulations. In Proceedings of AAMAS. 60–67.
[4] Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert Schapire. 2011. Contextual Bandit Algorithms with Supervised Learning Guarantees. In Proceedings of AISTATS.
[5] Ioana Bica, Ahmed M. Alaa, James Jordon, and Mihaela van der Schaar. 2020. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. In Proceedings of ICLR.
[6] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer series in statistics New York.
[7] Daniele Gianni, Georgios Loukas, Erol Gelenbe, et al. 2008. A Simulation Framework for the Investigation of Adaptive Behaviours in Largely Populated Building Evacuation Scenarios. In Proceedings of AAMAS.
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. Vol. 1. MIT press Cambridge.
[9] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. 2012. A Kernel Two-Sample Test. Journal of Machine Learning Research 13, 25 (2012), 723–773.
[10] Negar Hassanpour and Russell Greiner. 2019. CounterFactual Regression with Importance Sampling Weights. In Proceedings of IJCAI.
[11] Jennifer L. Hill. 2011. Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics 20, 1 (2011), 217–240.
[12] Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
[13] Guido W. Imbens and Jeffrey M. Wooldridge. 2009. Recent Developments in the Econometrics of Program Evaluation. Journal of Economic Literature 47, 1 (2009), 5–86.
[14] Fredrik D. Johansson, Uri Shalit, and David Sontag. 2016. Learning Representations for Counterfactual Inference. In Proceedings of ICML.
[15] Jintao Ke, Hongyu Zheng, Hai Yang, and Xiqun Chen. 2017. Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach. Transportation Research Part C: Emerging Technologies 85 (2017), 591–608.
[16] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980
[17] Gregor Lammel, Marcel Rieser, and Kai Nagel. 2008. Bottlenecks and Congestion in Evacuation Scenarios: A Microscopic Evacuation Simulation for Large-Scale Disasters. In Proceedings of AAMAS.
[18] Yuxuan Liang, Kun Ouyang, Lin Jing, Sijie Ruan, Ye Liu, Junbo Zhang, David S. Rosenblum, and Yu Zheng. 2019. UrbanFM: Inferring Fine-Grained Urban Flows. In Proceedings of SIGKDD.
[19] Xiaolei Ma, Zhuang Dai, Zhengbing He, Jihui Ma, Yong Wang, and Yunpeng Wang. 2017. Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction. Sensors 17, 4 (2017), 818.
[20] Alfred Müller. 1997. Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29, 2 (1997), 429–443.
[21] Yara Rizk, Mariette Awad, and Edward W. Tunstel. 2018. Decision Making in Multiagent Systems: A Survey. IEEE Transactions on Cognitive and Developmental Systems 10, 3 (2018), 514–529.
[22] Donald B. Rubin. 2005. Causal Inference Using Potential Outcomes. J. Amer. Statist. Assoc. […]
[23] […]
[24] […] Proceedings of ICML.
[25] Guni Sharon, Josiah P. Hanna, Tarun Rambha, Michael W. Levin, Michael Albert, Stephen D. Boyles, and Peter Stone. 2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In Proceedings of AAMAS.
[26] Arambam James Singh, Duc Thien Nguyen, Akshat Kumar, and Hoong Chuin Lau. 2019. Multiagent Decision Making For Maritime Traffic Management. In Proceedings of AAAI.
[27] Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet, et al. 2012. On the empirical estimation of integral probability metrics. Electronic Journal of Statistics […]
[28] […] Proceedings of NeurIPS.
[29] Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement learning: An introduction. MIT press.
[30] Akira Tanimoto, Tomoya Sakai, Takashi Takenouchi, and Hisashi Kashima. 2021. Regret Minimization for Causal Inference on Large Treatment Space. In Proceedings of AISTATS.
[31] Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288.
[32] Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal A. Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, et al. 2011. ESCAPES - Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social comparison. In Proceedings of AAMAS.
[33] Pradeep Varakantham, Hala Mostafa, Na Fu, and Hoong Chuin Lau. 2015. DIRECT: A scalable approach for route guidance in Selfish Orienteering Problems. In Proceedings of AAMAS.
[34] Stefan Wager and Susan Athey. 2018. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. J. Amer. Statist. Assoc. […]
[35] […] Proceedings of AAMAS.
[36] Tomohisa Yamashita, Takashi Okada, and Itsuki Noda. 2013. Implementation of Simulation Environment for Exhaustive Analysis of Huge-Scale Pedestrian Flow. SICE Journal of Control, Measurement, and System Integration 6, 2 (2013), 137–146.
[37] Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, and Aidong Zhang. 2020. A Survey on Causal Inference. (2020). arXiv:2002.02770
[38] Liuyi Yao, Sheng Li, Yaliang Li, Mengdi Huai, Jing Gao, and Aidong Zhang. 2018. Representation Learning for Treatment Effect Estimation from Observational Data. In Proceedings of NeurIPS.
[39] Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets. In Proceedings of ICLR.
[40] Haiyang Yu, Zhihai Wu, Shuqin Wang, Yunpeng Wang, and Xiaolei Ma. 2017. Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks. Sensors 17, 7 (2017), 1501.
[41] Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of AAAI.
[42] Siyuan Zhao and Neil Heffernan. 2017. Estimating Individual Treatment Effect from Educational Studies with Residual Counterfactual Networks. In Proceedings of EDM.
[43] Min Zhou, Hairong Dong, Petros A. Ioannou, Yanbo Zhao, and Fei-Yue Wang. 2019. Guided Crowd Evacuation: Approaches and Challenges.