STUaNet: Understanding uncertainty in spatiotemporal collective human mobility

Abstract

The high dynamics and heterogeneous interactions in complicated urban systems have raised the issue of uncertainty quantification in spatiotemporal human mobility, which supports critical decision-making in risk-aware web applications such as urban event prediction, where fluctuations are of significant interest. Although uncertainty quantifies the potential variations around prediction results, traditional learning schemes lack uncertainty labels, and conventional uncertainty quantification approaches mostly rely on statistical estimation with Bayesian Neural Networks or ensemble methods. However, these approaches do not model the spatiotemporal evolution of uncertainty under various contexts, and they suffer from the poor efficiency of statistical uncertainty estimation, because models must be trained multiple times. To provide high-quality uncertainty quantification for spatiotemporal forecasting, we propose an uncertainty learning mechanism that simultaneously estimates internal data quality and quantifies external uncertainty arising from various contextual interactions. To address the lack of uncertainty labels, we propose a hierarchical data turbulence scheme in which we actively inject controllable uncertainty for guidance, and hence provide insights into both uncertainty quantification and weakly supervised learning. Finally, we re-calibrate and boost the prediction performance by devising a gate-based bridge that adaptively leverages the learned uncertainty in predictions. Extensive experiments on three real-world spatiotemporal mobility datasets corroborate the superiority of our proposed model in terms of both forecasting and uncertainty quantification.


Zhengyang Zhou
University of Science and Technology of China, Hefei

Yang Wang ∗
University of Science and Technology of China, Hefei

Xike Xie
University of Science and Technology of China, Hefei

Lei Qiao
Beijing Institute of Control Engineering, Beijing

Yuantao Li
University of Science and Technology of China, Hefei

∗ Prof. Yang Wang is the corresponding author.

CCS CONCEPTS

• Information systems → Web applications; Web mining; Spatial-temporal systems.

KEYWORDS

Uncertainty quantification, human mobility, spatiotemporal data mining, web of things

ACM Reference Format:

Zhengyang Zhou, Yang Wang, Xike Xie, Lei Qiao, and Yuantao Li. 2021. STUaNet: Understanding uncertainty in spatiotemporal collective human mobility. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Understanding human mobility is crucial for intelligent web applications. Although many studies have investigated the prediction of human mobility, including both individual trajectories and collective activities, few have attempted to quantify the spatiotemporal uncertainty of human mobility. However, uncertainty quantification, which quantifies the potential variations in prediction results, is important for applications such as epidemic forecasting, crowd management and commercial promotion, where extremes are of significant interest. For instance, given the unprecedented volume of crowds gathering in Chen Yi Square on New Year's Eve of 2015, the crowd monitoring system of Shanghai failed to accurately predict the abnormal variations, which led to a disastrous stampede that killed 36 people [35]. In this rare and challenging scenario, inaccurate prediction and negligent urban management ignored the potential uncertainty and risks caused by random human behaviors and complicated context influences.

A surge of works has focused on investigating the regularities and variations in individual mobility. They explore the limits of predictability in human dynamics by measuring different types of entropy for individual trajectories [13, 23, 26, 31]. However, due to the inherent randomness and sparsity of individual trajectories, it is more meaningful to emphasize the uncertainty in spatiotemporal collective human mobility, which benefits location-based applications ranging from dynamic public resource allocation and crowd-based public safety prediction to epidemic control. Recently, deep learning-based methods for collective human mobility prediction have been widely studied [3, 33, 35]; however, none of them is capable of capturing such uncertainties.

Uncertainty can be further divided into two categories, epistemic and aleatoric [4, 14, 16]. Epistemic uncertainty, which can be explained away with sufficient training data [25], estimates the uncertainty in model parameters, while aleatoric uncertainty captures the intrinsic randomness in data observations. To support uncertainty quantification in deep learning frameworks, dropout-based Bayesian Neural Networks (BNN) impose a probability distribution over learnable model parameters, and the variations in model parameters can be viewed as uncertainties [7, 8, 14, 21, 29]. Inspired by model ensembles and physical random systems, a few pioneering non-Bayesian methods, such as ensemble-based [17] and Brownian motion-based methods [16], were proposed to model the randomness of the learning process in vision-related tasks. Nevertheless, existing works mostly learn uncertainty passively from statistical estimation over test results; they fail to internalize uncertainty extraction into the model and do not consider the evolution of uncertainty over time and contexts.

Differing from existing efforts, we focus on spatiotemporal uncertainty quantification to achieve a comprehensive understanding of potential predictive fluctuations in collective human mobility, by internalizing active uncertainty extraction into our framework. Specifically, we observe that the spatiotemporal tendency of human mobility can be decomposed into two parts, i.e., long-term periodicity and irregular instantaneous fluctuations.
A case study of mobility distributions in Suzhou Industry Park (SIP) is illustrated in Figure 1(a)-(b) to reveal this observation. As highlighted, local drifts and fluctuations of different degrees may prevent accurate predictions of future mobility, so there is an urgent demand for quantifying spatiotemporal fluctuations. However, this is a difficult task because the correlations between uncertainties and spatiotemporal context factors are highly implicit and ambiguous. For instance, even the same contexts can have spatially heterogeneous influences on regions with different local functionalities, as all factors interact with each other in complex ways. Moreover, we consider the absolute error as an uncertainty indicator, and compare the spatial heatmaps of predictions and uncertainty quantification in Figure 1(c)-(d). From these figures, we observe an obvious task-related spatial misalignment between predictions and uncertainty quantification, and this misalignment may bring unexpected inefficiency to methods that share the same feature extractors in both the prediction and uncertainty learning stages [14, 17, 30].

Figure 1: Examples of uncertainty in human mobility distribution. Subfigures (a)-(b) reveal the temporal periodicities and spatial correlations in traffic flows of SIP. The circles highlight different degrees of local drifts and fluctuations of flows. Subfigures (c)-(d) illustrate the spatial heatmaps of both predictions and uncertainty quantification. Panels: (a) Flow distributions of different weekdays in SIP during March 13th to 17th, 2017; (b) Flow distributions of two pairs of neighboring regions on Jan 17th, 2017; (c) Heatmap of human mobility prediction in SIP on March 25th, 2017; (d) Heatmap of absolute errors of human mobility predictions in SIP on March 25th, 2017. Plot annotations: averaged Pearson correlation = 0.9506; averaged pair-wise Pearson correlation = 0.9283.

In this paper, we propose a SpatioTemporal Uncertainty-aware prediction Network (STUaNet) to address the spatial misalignment issue between the uncertainty learning and mobility prediction tasks. Specifically, STUaNet consists of two modules: a spatiotemporal mobility prediction module that captures temporal dependencies with a graph-based sequential learning structure, and a Content-Context Uncertainty Quantification module (C2UQ) that quantifies uncertainties considering both spatiotemporal dependencies and heterogeneous context influences. To actively extract uncertainty from multi-source spatiotemporal observations in a learnable manner, we first classify the uncertainty sources of spatiotemporal data into internal content and external context, and then resolve the problem of uncertainty quantification with the three main techniques of C2UQ.

(i) Neural data quality estimation. Given that data quality can significantly influence the uncertainty of predictions, we devise a similarity-based time-series quality estimation method to measure internal content consistencies and detect instantaneous pattern variations in spatiotemporal data.

(ii) Context interaction learning. To capture complicated interactions between multiple contextual factors and human mobility uncertainties, we propose a Factorization Machine-Graph Convolution Network (FM-GCN) to learn mapping functions from different contextual factors to region-level uncertainty, where context-level interactions and spatial dependencies are captured by FM and GCN, respectively.

(iii) Active uncertainty learning. To address the lack of uncertainty labels, during the training phase we propose two weakly supervised indicators and impose different spatiotemporal turbulences to imitate Out-Of-Distribution (OOD) data and random noise, which correspond to epistemic and aleatoric uncertainties, respectively, and thereby enable active weakly supervised uncertainty learning.

Further, we design a Gated Mobility-uncertainty Re-calibrate bridge (GMuR) to leverage the associated uncertainty, and finally boost both uncertainty learning and task-specific predictions. In summary, the main contributions of this paper are as follows.

• To the best of our knowledge, we are the first to focus on the quantification of spatiotemporal uncertainty with internalized uncertainty extraction. We further re-calibrate predictions by taking advantage of uncertainty quantification, which is an initial step in investigating how to make better use of uncertainty in optimizing predictions.

• By proposing two novel uncertainty indicators, our active hierarchical uncertainty learning enables implicit but quantifiable pseudo labels to guide unlabelled learning, which provides insights for designing indicators of weakly supervised information and for actively guided training schemes that deliberately learn specific characteristics.

Human mobility prediction.

Recent years have witnessed a surge of research on human mobility prediction, which can be divided into two categories: the prediction of collective mobility and of individual-level trajectories. The prediction of collective human mobility, including traffic flows [10, 35] as well as taxicab pick-ups and drop-offs [33], is traditionally addressed by extracting spatiotemporal correlations with advanced variants of Convolutional Neural Network (CNN) encoders. Further, by organizing grids into an urban graph, a graph-based deep learning method was introduced to forecast spatiotemporal mobility with fully convolutional blocks [34]. Based on [34], Bai et al. subsequently devised a graph sequence-based learning scheme to iteratively predict passenger demands [1]. Regarding individual trajectory prediction, given sparse and long-range trajectory sequences, [20] and [5] were both proposed to jointly predict human activity and location by carefully modeling sequential patterns. Besides, [27] and [37] investigated individual human mobility regularities and modeled personalized sequential patterns of users for next-POI recommendation with a Long Short-Term Memory (LSTM) network and a spatiotemporal gated network, respectively. Nevertheless, none of them associates its predictions with uncertainty.

Predictability and uncertainty in human mobility.

Previous works have been confined to the predictability and uncertainty of individual mobility based on entropy. [26] explored the limits of predictability by studying the mobility patterns of anonymous mobile phone users, and identified a potential 93% average predictability in user mobility based on three types of entropy. These entropies respectively characterize spatial location randomness, heterogeneity of user visitation patterns, and the spatiotemporal order present in personal mobility patterns. The follow-up work [23] derived a maximum predictability of 88% by considering both stationary and non-stationary trajectories. Different from the above works, [13] found that the low uncertainty reported there was highly dependent on the selected spatiotemporal scales, as people do not move within very short periods. They therefore predicted human mobility from two aspects, next location and stay time, and found an upper limit on predictability of 71% by using a natural length scale. Recently, [2] tried to incorporate exogenous factors as uncertain factors to estimate shared mobility availability, but did not actually perform uncertainty quantification. Subsequently, [9] observed remarkable heterogeneity in individual views and further uncovered an underlying consistency between spatial and temporal human mobility in the collective spatiotemporal view, which may be inherently related to the nature of human behavior. These studies only derived a time-invariant predictability and cannot be directly used to address the challenges we face. However, these preliminary efforts motivated us to explore the potential variations of collective human mobility under different contexts.

Uncertainty quantification in deep learning.

Emerging deep uncertainty quantification methods can be categorized into Bayesian and non-Bayesian lines. Bayesian methods quantify uncertainty by imposing a probability distribution over model parameters and approximating the posterior distribution [16]. To infer the Bayesian posterior for multi-layer perceptrons, an easy-to-implement variational approach was proposed by employing dropout and Monte-Carlo sampling [7, 8]. Following these works, [14] comprehensively analyzed how to model epistemic and aleatoric uncertainty, quantified the uncertainty of image-level classification and detection based on BNN, and improved the prediction reliability of risk-sensitive vision-related tasks. On the other hand, the representative non-Bayesian methods [17] train multiple neural networks with different initializations and quantify uncertainty based on the statistics of the prediction results. In addition, a state-of-the-art Stochastic Differential Equation (SDE) model imitated system diffusion and captured both epistemic and aleatoric uncertainty by involving Brownian motion [16]. For spatiotemporal uncertainty specifically, [28] first proposed the concept of uncertainty in spatiotemporal databases, after which researchers started to explore the uncertainties in large-scale climate datasets [24, 29], as well as the uncertainty in numerical weather forecasting [22, 30], with BNN methods. These works marked the introduction of uncertainty quantification into spatiotemporal data mining, but they failed to capture the spatial dependency and temporal evolution of uncertainties. The above-mentioned uncertainty learning methods are also limited in addressing our spatiotemporal uncertainty quantification problem for the following reasons: (i) they cannot capture the evolution of spatiotemporal uncertainty, (ii) they fail to map the various context influences to uncertainties, and (iii) they are inefficient because they require multiple rounds of training.

In summary, even though extensive efforts have been made to enhance prediction performance and to understand the nature of human mobility, the uncertainty issue in human mobility prediction has never been systematically considered, due to a lack of awareness of spatiotemporal evolutions and context interactions. Therefore, in this paper, we tackle this issue from a systematic perspective.
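As a rough illustration of the ensemble-style estimation discussed above, and of why it is costly, the following minimal sketch derives a per-region uncertainty from the spread of several independently trained models. The function name `ensemble_uncertainty` and the toy numbers are assumptions made for this example; they are not taken from [17] or from our framework.

```python
import numpy as np

def ensemble_uncertainty(predictions):
    """Summarize M model outputs for N regions (hypothetical helper).

    predictions has shape (M, N): one row per independently trained model.
    Returns the ensemble mean and the standard deviation across models,
    the latter serving as a statistical uncertainty estimate.
    """
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0)

# Toy example: three models (three separate training runs), two regions.
preds = np.array([[10.0, 20.0],
                  [12.0, 20.0],
                  [14.0, 20.0]])
mean_pred, spread = ensemble_uncertainty(preds)
```

Every row of `preds` requires a full training run, which is the inefficiency noted in limitation (iii); the spread is also computed after the fact and carries no notion of temporal evolution or context.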

Definition 1 (Urban graph and urban regions). The study area can be defined as an undirected graph, called the Urban Graph. Following previous works [12, 32], the whole city is discretized into a set of $N$ urban regions (e.g., POI locations, pick-up/drop-off locations and road intersections) and can be constructed as an urban graph $G(\mathcal{V}, \mathcal{E})$. Here, the urban regions compose the vertex set $\mathcal{V} = \{r_1, r_2, \cdots, r_N\}$. Given two urban regions $r_i$ and $r_j$, the edge $e_{ij} \in \mathcal{E}$ between these two regions can be instantiated as the geographical proximity and the potential mobility transitions.

Definition 2 (Region human mobility intensity). To describe the dynamic urban mobility intensity, we discretize the time domain into equal intervals (e.g., 30 min). For region $r_i$, the human mobility intensity $H_i^t$ represents the number of active persons in region $r_i$ at interval $t$.

Problem 1 (Collective human mobility prediction with uncertainty quantification). Given historical collective human mobility observations of region $i$, $H_i^1, \ldots, H_i^T$ where $i = 1, \ldots, N$, we aim to simultaneously perform the point estimation of human mobility and the associated uncertainty quantification $(\hat{H}_i^{T+1}, \hat{\sigma}_i^{T+1})$ in the next interval $T+1$, where the numerical uncertainty $\hat{\sigma}_i^{T+1}$ quantifies the potential variations around the prediction results. Namely, we aim to minimize the predicted uncertainty $\hat{\sigma}_i^{T+1}$ while optimizing the prediction interval $[\hat{H}_i^{T+1} - \hat{\sigma}_i^{T+1}, \hat{H}_i^{T+1} + \hat{\sigma}_i^{T+1}]$ so that it covers the ground truth as often as possible.

The proposed uncertainty-aware spatiotemporal prediction framework STUaNet is illustrated in Figure 2. To disentangle task-related spatial misalignments, we design a double-head network which consists of three components: a spatiotemporal prediction module, a content-context uncertainty quantification module, and a gated mobility-uncertainty re-calibration bridge. To actively learn the uncertainty, we propose a three-layer training architecture, which imitates pure data, noisy data and OOD data by imposing different quantifiable spatiotemporal turbulences on training samples. We elaborate on each module in the following sections.
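To make the objective of Problem 1 concrete, the sketch below measures the two competing quantities for a batch of predictions: how often the interval $[\hat{H} - \hat{\sigma}, \hat{H} + \hat{\sigma}]$ covers the ground truth, and how wide the interval is. The function names `interval_coverage` and `mean_interval_width` and the toy values are assumptions for illustration; the evaluation metrics actually used in the experiments are not specified in this part of the paper.

```python
import numpy as np

def interval_coverage(y_true, y_pred, sigma):
    """Fraction of ground-truth values inside [y_pred - sigma, y_pred + sigma]."""
    y_true, y_pred, sigma = (np.asarray(a, dtype=float) for a in (y_true, y_pred, sigma))
    inside = (y_true >= y_pred - sigma) & (y_true <= y_pred + sigma)
    return float(inside.mean())

def mean_interval_width(sigma):
    """Average width of the symmetric interval, i.e. 2 * sigma."""
    return float(2.0 * np.asarray(sigma, dtype=float).mean())

# Toy example with four regions at interval T + 1.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 18.0, 35.0, 40.0])
sigma = np.array([2.0, 1.0, 5.0, 0.5])

coverage = interval_coverage(y_true, y_pred, sigma)  # 3 of 4 regions are covered
width = mean_interval_width(sigma)
```

A trivially large $\hat{\sigma}$ would give full coverage but a useless interval, which is why the problem statement asks for small uncertainty and high coverage at the same time.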

The spatiotemporal prediction component serves as a predictor that jointly predicts the mobility intensity of all regions in the next time interval based on historical mobility observations. As spatiotemporal forecasting has been well investigated [1, 34, 36], we leverage a widely applied spatiotemporal model with a slight modification: we introduce the mobility transition proximity into the adjacency matrix to adapt the model to collective mobility prediction. This predictor combines a graph convolution module and an LSTM as the backbone to extract the spatial and temporal dependencies, respectively.

According to the closeness and multi-level periodicity in human mobility sequences, we first define a time period as $p$ consecutive intervals, and then retrieve a series of historical periods from the database for next-interval prediction. The selected periods are organized at three scales: the closeness period $\mathcal{P}_c$ consisting of the nearest $p$ intervals, the daily periodicity periods $\mathcal{P}_d$ consisting of the same intervals as $\mathcal{P}_c$ in the most recent $q$ days, and the summarized long-term weekly pattern $\mathcal{P}_l$ obtained by averaging the mobility intensity of the same intervals as $\mathcal{P}_c$ in each day of the last week. We thus have $q+2$ groups of periods and $(q+2) \times p$ intervals.

To characterize the transition patterns in urban human mobility and thus better capture the dynamic spatial dependencies, we borrow the idea of the cross-city migration model based on gravity systems, because the transition patterns of urban mobility flows resemble those of city-wise migrations [38]. This migration model states that transitions between two specific regions are proportional to the current flows in each region and inversely correlated with their spatial distance. Concisely, given time interval $t$, we instantiate the edges of the urban graph as a time-sensitive mobility-involved adjacency matrix $\boldsymbol{A}^t$, in which we simultaneously consider the potential transition pattern and the geographical proximity.
Each element $A^t_{i,j}$ of this adjacency matrix is formally given by

$$A^t_{i,j} = e^{-\mathrm{dist}(r_i, r_j)} + \rho \times \log\left(\frac{H_i^t \times H_j^t}{\mathrm{dist}(r_i, r_j)}\right) \tag{1}$$

where $\mathrm{dist}(r_i, r_j)$ is the Euclidean distance and $H_i^t$, $H_j^t$ are the mobility intensities in $r_i$ and $r_j$ in the corresponding interval $t$. The scalar factor $\rho$ adjusts the proportion of region-wise transitions in the overall adjacency matrix.

In what follows, we summarize the daily periodicity $\mathcal{P}_d$ and the long-term pattern $\mathcal{P}_l$ each into one interval by averaging, yielding $p+2$ intervals in total for sequence learning. We employ $p+2$ GCN blocks to extract the spatial correlations in parallel. Note that we share the same adjacency matrix $\boldsymbol{A}^{t*}$ in each GCN block for the same period, obtained by averaging the adjacency matrices of all intervals in the corresponding period. To learn the temporal dependencies, we feed the feature map sequences along with the corresponding context factors (e.g., timestamps and weather) into a mobility LSTM and finally obtain the citywide mobility intensity in interval $T+1$. We formulate the prediction task as

$$\hat{\boldsymbol{H}}^{T+1} = \mathrm{LSTM}\big(\mathrm{GC}(\boldsymbol{A}^{t*}, \boldsymbol{H}_{\mathcal{P}_c}, \boldsymbol{H}_{\mathcal{P}_d}, \boldsymbol{H}_{\mathcal{P}_l}; \boldsymbol{\theta}_{gc}); \boldsymbol{\theta}_L\big) \tag{2}$$

where $\boldsymbol{H}_{\mathcal{P}_c}$, $\boldsymbol{H}_{\mathcal{P}_d}$, $\boldsymbol{H}_{\mathcal{P}_l}$ are the organized citywide human mobility sequences during the corresponding periods. Here $\mathrm{GC}$ is the graph convolution neural network parameterized by learnable $\boldsymbol{\theta}_{gc}$, and the mobility LSTM is parameterized by learnable $\boldsymbol{\theta}_L$. We take one of the GCN blocks to demonstrate the graph convolution $\mathrm{GC}$, denoting the $k$-th hidden layer of the GCN as $\boldsymbol{H}_G^k$,

$$\boldsymbol{H}_G^k = \mathrm{ReLU}\Big(\big(\tilde{\boldsymbol{D}}_A^{t*}\big)^{-1/2}\, \tilde{\boldsymbol{A}}^{t*}\, \big(\tilde{\boldsymbol{D}}_A^{t*}\big)^{-1/2}\, \boldsymbol{H}_G^{k-1}\, \boldsymbol{W}_{gc}^{k-1}\Big) \tag{3}$$

where $\tilde{\boldsymbol{A}}^{t*} = \boldsymbol{A}^{t*} + \boldsymbol{I}_N$ and $\tilde{\boldsymbol{D}}_A^{t*}$ is the degree matrix of $\tilde{\boldsymbol{A}}^{t*}$. We initialize $\boldsymbol{H}_G^0$ as $\boldsymbol{H}_{\mathcal{P}_c}$, $\boldsymbol{H}_{\mathcal{P}_d}$ and $\boldsymbol{H}_{\mathcal{P}_l}$ in the respective GCN blocks. The matrices $\boldsymbol{W}_{gc}^k$ are a series of learnable parameters that constitute $\boldsymbol{\theta}_{gc}$, and we use ReLU as the non-linear activation function.
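A minimal NumPy sketch of Eq. (1) and Eq. (3) may help to fix the shapes involved. It is an illustration under stated assumptions rather than the implementation used in our experiments: the helper names, the small constant `eps` that keeps the logarithm finite, the clipping of negative entries to zero (so that node degrees stay positive), and the zeroed diagonal (self-loops are added later by $\boldsymbol{A} + \boldsymbol{I}_N$) are all choices made for this example.

```python
import numpy as np

def mobility_adjacency(coords, flows, rho=0.1, eps=1e-6):
    """Eq. (1): A_ij = exp(-dist_ij) + rho * log(H_i * H_j / dist_ij)."""
    coords = np.asarray(coords, dtype=float)
    flows = np.asarray(flows, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise Euclidean distance
    # Assumption: eps avoids log(0) and division by zero.
    gravity = np.log((np.outer(flows, flows) + eps) / (dist + eps))
    adj = np.exp(-dist) + rho * gravity
    adj = np.maximum(adj, 0.0)                          # assumption: clip negative weights
    np.fill_diagonal(adj, 0.0)                          # assumption: no self-loops here
    return adj

def gcn_layer(adj, features, weight):
    """Eq. (3): ReLU(D^-1/2 (A + I) D^-1/2 H W) for one hidden layer."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))     # degrees are >= 1 after clipping
    a_norm = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 5.0, size=(4, 2))             # 4 hypothetical region centroids
flows = rng.integers(10, 200, size=4)                   # mobility intensity H_i^t
adj = mobility_adjacency(coords, flows)
h0 = rng.normal(size=(4, 3))                            # N regions x 3 input features
w0 = rng.normal(size=(3, 8))                            # 8 hidden units
h1 = gcn_layer(adj, h0, w0)
```

In the full model one such adjacency matrix is averaged over the intervals of a period to obtain $\boldsymbol{A}^{t*}$ before it is shared across the GCN blocks of that period.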
We propose a new Content-Context Uncertainty Quantification network (C2UQ), which is tailored for spatiotemporal uncertainty learning. C2UQ shares the same inputs with the prediction component but has a different structure for uncertainty learning. Intuitively, spatiotemporal uncertainty can arise from two sources: the internal data noise, which stems from two aspects, collection and measurement, and the complicated and heterogeneous uncertainties produced by various external contexts, e.g., temporal evolution, weather and urban events. The proposed C2UQ consists of two dedicated modules, neural data quality estimation and context interaction learning, which are responsible for extracting uncertainties from internal data noise and external factor influences, respectively.

Figure 2: Framework overview of STUaNet. The diagram shows human mobility observations from web applications and context factors feeding two branches: spatiotemporal prediction (mobility transition patterns, parallel graph convolution blocks, mobility LSTM) and Content-Context Uncertainty Quantification (neural data quality estimation, context interaction learning with Factorization Machine-GCN, period-wise uncertainty aggregation, Content-Context LSTM). A gated mobility-uncertainty re-calibration combines both branches and outputs the uncertainty map and the calibrated mobility map.

Data quality estimation is a non-trivial task as it has no explicit quantifiable noise labels. In our research, we find that spatiotemporal data exhibits periodicity and closeness, and human mobility sequences usually reveal multi-level periodicity in both the long term and the short term [6]. This provides an opportunity to detect internal data noise and sequence pattern variations from the content perspective by computing period-wise sequence similarities, where we reuse the concept of period from Section 4.2. To this end, we propose the multi-scale similarity-based neural data quality estimation illustrated in Figure 3. The estimation is applied to each individual region to capture the internal consistency of data observations. We implement it as follows.

First, we organize $q+2$ periods as a human mobility sequence for each region, i.e., the closeness period $\mathcal{P}_c$, the daily periodicity $\mathcal{P}_{d_i}$ ($i \in 1, 2, \ldots, q$) and the long-term weekly pattern $\mathcal{P}_l$. For region $r_i$, we denote the selected $q+2$ historical periods of its human mobility intensity as the set $\mathcal{H}_i^O$,

$$\mathcal{H}_i^O = \{\boldsymbol{h}_i^{l}, \boldsymbol{h}_i^{d_1}, \ldots, \boldsymbol{h}_i^{d_q}, \boldsymbol{h}_i^{c}\} \tag{4}$$

and we correspondingly redefine the superscript of $\boldsymbol{h}_i^{*}$ in $\mathcal{H}_i^O$ as $\{0, 1, 2, \ldots, q+1\}$ for simplicity. We then develop our model at the period level, and each period still consists of $p$ intervals.

Figure 3: Structure of neural data quality estimation. The data content (multi-scale observation sequences: weekly summarized pattern, daily periodicity pattern and closeness sequence) and the context factors pass through a period-wise embedding layer, similarity learning and a data quality transformation.

Second, we embed the context factors of the respective periods and add them to the period sequence. A neural network (e.g., fully-connected layers) is then applied to each period sequence to learn period-level deep representations. The context-encapsulated mobility intensity $\boldsymbol{I}_i^j$ during period $j$ is formulated by

$$\boldsymbol{I}_i^j = \boldsymbol{W}_i^j\big((\boldsymbol{W}_{ex_i}^j\, \boldsymbol{ex}_i^j) \oplus \boldsymbol{h}_i^j\big) + \boldsymbol{b}_i^j \tag{5}$$

where $\boldsymbol{ex}_i^j$ denotes the concatenated $L_c$-dimensional contextual factors of the $j$-th period at region $i$, and $\boldsymbol{W}_{ex_i}^j \in \mathbb{R}^{N \times L_c}$ is the weight of the mapping function that aligns the dimension of the external factors with that of $\boldsymbol{h}_i^j$. Here, $\boldsymbol{W}_i^j \in \mathbb{R}^{L_e \times N}$ and $\boldsymbol{b}_i^j \in \mathbb{R}^{L_e \times 1}$ are the learnable weight and bias for period embedding, where $L_e$ denotes the embedded period dimension. Note that $\oplus$ is the element-wise addition operation.

Subsequently, we compute the period-wise similarities among all periods, and for period $m$ we have

$$s_i^m = \frac{1}{q+1} \sum_{j=1,\, j \neq m}^{q+2} \mathrm{sim}(\boldsymbol{I}_i^m, \boldsymbol{I}_i^j) \tag{6}$$

where $s_i^m$ is the $i$-th element of vector $\boldsymbol{s}^m$, and the similarity is measured by the Hadamard product between $\boldsymbol{I}_i^m$ and $\boldsymbol{I}_i^j$,

$$\mathrm{sim}(\boldsymbol{I}_i^m, \boldsymbol{I}_i^j) = \boldsymbol{I}_i^m \cdot \boldsymbol{I}_i^j \tag{7}$$

As data noise and the uncertainty of observations are inversely correlated with the period-wise similarity, we obtain the uncertainty in the internal content view by applying a negative exponential function and a linear transformation to $s_i^m$, where $\boldsymbol{W}_I^m \in \mathbb{R}^{N \times N}$ and $\boldsymbol{b}_I^m \in \mathbb{R}^{N \times 1}$ are the parameters of the learnable transformation. The citywide internal content uncertainty $\boldsymbol{U}_I^m$ of period $m$ is learned by

$$\boldsymbol{U}_I^m = \boldsymbol{W}_I^m\, e^{-\boldsymbol{s}^m} + \boldsymbol{b}_I^m \tag{8}$$

It is worth noting that the weekly period $\boldsymbol{I}_i^0$ (corresponding to $\boldsymbol{h}_i^l$) is the high-level summarized mobility pattern during the same period of different days, which can be viewed as a multi-scale temporal pattern, along with the consecutive mobility intensities from $\boldsymbol{I}_i^1$ to $\boldsymbol{I}_i^q$.
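The chain from Eq. (5) to Eq. (8) can be traced with the following NumPy sketch. It is a simplified illustration with random, untrained weights, and several details are assumptions of the example rather than statements about our implementation: each period vector is taken to have length $p$, the element-wise product in Eq. (7) is summed to a scalar so that $s_i^m$ is a single number, and the function names are hypothetical.

```python
import numpy as np

def embed_period(h, ex, w_ex, w, b):
    """Eq. (5): I = W((W_ex ex) + h) + b for one region and one period."""
    return w @ (w_ex @ ex + h) + b

def period_similarity(emb):
    """Eq. (6)-(7): mean similarity of each period to all other periods.

    emb has shape (N, q + 2, L_e); the result has shape (q + 2, N).
    Assumption: sim is the summed element-wise product (a dot product).
    """
    gram = np.einsum("nmd,njd->nmj", emb, emb)          # (N, q+2, q+2) pairwise sims
    self_sim = np.diagonal(gram, axis1=1, axis2=2)      # (N, q+2), the j == m terms
    n_periods = emb.shape[1]
    return ((gram.sum(axis=2) - self_sim) / (n_periods - 1)).T

def content_uncertainty(s_m, w_i, b_i):
    """Eq. (8): U_I^m = W_I exp(-s^m) + b_I, with s_m of shape (N,)."""
    return w_i @ np.exp(-s_m) + b_i

rng = np.random.default_rng(0)
n_regions, q, p, l_c, l_e = 3, 2, 5, 4, 6
emb = np.empty((n_regions, q + 2, l_e))
for i in range(n_regions):
    for j in range(q + 2):
        emb[i, j] = embed_period(
            h=rng.normal(size=p),                       # period observations
            ex=rng.normal(size=l_c),                    # context factors
            w_ex=0.1 * rng.normal(size=(p, l_c)),
            w=0.1 * rng.normal(size=(l_e, p)),
            b=np.zeros(l_e),
        )

s = period_similarity(emb)                              # (q + 2, N)
# Identity weights and zero bias reduce Eq. (8) to exp(-s) for inspection.
u_closeness = content_uncertainty(s[-1], np.eye(n_regions), np.zeros(n_regions))
```

A period that agrees with the other periods receives a large $s_i^m$ and therefore a small $e^{-s_i^m}$, which is the inverse relation between similarity and content uncertainty described above.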
As pointed out earlier, external contextual factors tend to interact with each other and contribute to the prediction uncertainty. For instance, mobility in regions of different functionalities has varying sensitivity to extreme weather. Mobility volumes also become difficult to quantify when urban events such as concerts and accidents occur, because urban travelers will randomly select unhindered segments heading for their destinations. This can lead to spatially increasing mobility uncertainty. With these intuitions, we propose a deep Factorization Machine-based Graph Convolutional Network (FM-GCN) to quantify context influences on uncertainty by simultaneously modeling context interactions and spatial dependencies among various external contextual factors.

Technically, the deep factorization machine was proposed in [19] to extract field-wise interactions by performing vector-level multiplication and learning feature interactions implicitly in recommendation systems. In our work, we take advantage of the vector-level interaction modeling in FM and the spatial uncertainty propagation learning in GCN, and seamlessly combine FM and GCN to map context interactions to spatiotemporal uncertainties. In detail, as shown in Figure 4, given period $m$ and region $i$, we first embed the $Q$ categories of context factors into $Q$ vectors $\boldsymbol{e}_i^{m,c_u}$ ($u = 1, 2, \ldots, Q$) with the fixed length $L_{ce}$ as different fields in our FM, where $c_u$ represents the $u$-th category of context factor. We then formulate the field-wise interaction learning as

$$\boldsymbol{e}_i^{m,(c_u,c_v)} = \big(\boldsymbol{e}_i^{m,c_u} \cdot \boldsymbol{e}_i^{m,c_v}\big)\, \boldsymbol{W}_{E_i}^{(c_u,c_v)} \tag{9}$$

where $\boldsymbol{W}_{E_i}^{(c_u,c_v)} \in \mathbb{R}^{L_{ce} \times L_{ie}}$ is the learnable weight that implements the interaction learning between the $u$-th and $v$-th factors and maps the interaction embeddings into $L_{ie}$ dimensions.
By concatenating the embeddings of all fields and their counterpart interactions, we obtain the region-wise period-level context interactions $\mathbf{E}_i^m \in \mathbb{R}^{1 \times (Q \times L_{ce} + \frac{Q(Q-1)}{2} \times L_{ie})}$.

Figure 4: Details of FM-GCN (labels: multiple context factors such as weather, time stamps and spatial; embedding; pairwise interactions such as spatial-weather and spatial-time; deep context factorization machine; learned context representations; spatial graph convolution)
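The field-wise interaction of Eq. 9 and the concatenation that yields the region-wise context vector can be sketched as follows. This is a minimal illustration, not the paper's code: the function name, the dictionary of per-pair weights, and the random toy values are hypothetical, and the sketch assumes that interactions are taken over unordered pairs of fields ($u < v$).

```python
import numpy as np
from itertools import combinations


def fm_context_interactions(fields, weights):
    """fields : (Q, L_ce) embeddings of the Q context categories for one region/period
    weights: dict mapping (u, v) with u < v to an (L_ce, L_ie) matrix (Eq. 9)
    """
    Q = fields.shape[0]
    parts = [fields.reshape(-1)]                    # Q * L_ce raw field embeddings
    for u, v in combinations(range(Q), 2):          # assumed: unordered pairs
        # Eq. 9: element-wise product of two fields, then a learnable projection
        parts.append((fields[u] * fields[v]) @ weights[(u, v)])
    return np.concatenate(parts)


rng = np.random.default_rng(0)
Q, L_ce, L_ie = 3, 4, 2                             # hypothetical sizes
fields = rng.normal(size=(Q, L_ce))
weights = {(u, v): rng.normal(size=(L_ce, L_ie))
           for u, v in combinations(range(Q), 2)}
E_i = fm_context_interactions(fields, weights)
```

For these sizes the output has $3 \times 4 + 3 \times 2 = 18$ entries: the raw field embeddings followed by one projected vector per pair of fields.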

Motivated by the capacity of GCN for non-Euclidean spatial propagation, we employ GCN blocks to carry out spatial influence aggregation with our mobility-involved adjacent matrix. By defining the compressed context interactions $\mathbf{E}_i^m$ as the features of node $i$ in the urban graph, where $\mathbf{E}_i^m$ is an element of the citywide context interaction tensor $\mathbf{E}^m$, we can perform spatial aggregation by Eq. 3. The citywide uncertainty regarding external context interactions $\mathbf{U}_E^m$ is learned by

$$\mathbf{U}_E^m = \mathrm{GC}(\mathbf{A}^m, \mathbf{E}^m; \boldsymbol{\theta}_{gc}^{F}) \tag{10}$$

where $\mathbf{A}^m$ denotes the average of the adjacent matrices over all intervals during period $m$, and $\boldsymbol{\theta}_{gc}^{F}$ denotes the graph convolution kernels that perform spatial context aggregation. The proposed FM-GCN jointly extracts context interactions and aggregates spatial influences. The graph convolution structure allows a flexible number of kernels to generate further feature interactions, and ultimately squeezes them into an $N$-dimensional vector that represents region-wise uncertainty.

We have described the period-level internal content uncertainty and the spatiotemporal external context uncertainty in the previous sections, where $U_I^{(m,i)}$ and $U_E^{(m,i)}$ are elements of the two respective tensors $\mathbf{U}_I$ and $\mathbf{U}_E$, referring to the two sources of uncertainty in region $i$ at period $m$. In this section, to jointly optimize these two kinds of uncertainty and incorporate temporal uncertainty evolution, we propose a period-wise LSTM. We realize this process in two steps: 1) combining the two kinds of uncertainty into the period-wise overall uncertainty $\mathbf{U}_o^m$ with an aggregation function $\mathrm{aggr}(\cdot)$, and 2) capturing the temporal evolution patterns of uncertainty among different periods with the Content-Context uncertainty LSTM (C2-LSTM).
The citywide unified uncertainty of interval $T+1$, $\mathbf{U}^{T+1}$, can be predicted by

$$\mathbf{U}_o^m = \mathrm{aggr}(\mathbf{U}_I^m, \mathbf{U}_E^m) \tag{11}$$

$$\mathbf{U}^{T+1} = \text{C2-LSTM}(\mathbf{U}_o^m; \boldsymbol{\theta}_{CL}) \tag{12}$$

where $\boldsymbol{\theta}_{CL}$ denotes the parameters of the C2-LSTM, and $\mathrm{aggr}(\cdot)$ can be instantiated as concatenation or matrix-based fusion.

Epistemic uncertainty can be explained as the distribution difference between the input samples and the samples the model has been trained on, while aleatoric uncertainty is explained as the inherent noise and random influences that cannot be explained explicitly. Based on this, instead of passively learning uncertainties through multiple rounds of training, we design a hierarchical data turbulence scheme that imitates OOD samples and slightly noisy samples in order to actively learn epistemic and aleatoric uncertainty. This data turbulence mechanism is implemented by adding different degrees of Gaussian noise to existing samples with a noise injection function $\mathrm{Noise}(\cdot)$. The noise injections can perform drastic turbulence and tiny drift to simulate OOD and noisy samples, respectively. For a specific region $r_i$ at interval $\Delta t$, the corrupted observation value $H_{C_i}^{\Delta t}$ is derived by

$$H_{C_i}^{\Delta t} = \mathrm{Noise}(H_i^{\Delta t}) \tag{13}$$

Even so, the core challenge of lacking definite labels in uncertainty quantification remains unresolved. To guide the uncertainty learning, we regard this task as weak supervised learning and propose two indicators, one for data quality and one for unified spatiotemporal uncertainty, so that we can actively quantify the uncertainty in cooperation with the above data turbulence scheme.

Firstly, based on the above analysis of the active hierarchical sample turbulence, our neural data-quality estimation is analogous to epistemic uncertainty modeling. Data-quality estimation therefore not only serves to quantify random noise from the aleatoric perspective, but also detects OOD samples from the epistemic perspective. The quantified degree of region-wise noise is directly measured by the absolute error between the original samples and the noisy samples, and can consequently be viewed as the weak supervised information in our data quality estimation. Formally, we present the weak supervised data quality indicator in interval $\Delta t$ as

$$\sigma_{qua}^{(i,\Delta t)} = \left| H_i^{\Delta t} - H_{C_i}^{\Delta t} \right| \tag{14}$$

For period-level uncertainty quantification, we also need to average the interval-level data quality into the period level, e.g. $\sigma_{qua}^{(i,m)}$ at the $m$-th period.

Secondly, to obtain the final predicted uncertainty, we need to find an informative indicator of unified period-level uncertainty. Here we refer to variance, which is a statistic for dispersion measurement and can be seen as the potential variation and uncertainty in regression tasks [11, 25].
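The data turbulence of Eq. 13 and the data-quality indicator of Eq. 14 can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the authors' code: the function names and the two noise scales (0.05 for tiny drift, 2.0 for drastic turbulence) are hypothetical, since the text does not state the noise levels, and the period-level value is assumed to be a plain mean over the $p$ intervals of a period.

```python
import numpy as np


def inject_noise(H, scale, rng):
    """Noise(.) of Eq. 13: add zero-mean Gaussian noise with the given scale."""
    return H + scale * rng.standard_normal(H.shape)


def data_quality_indicator(H, H_corrupted):
    """Eq. 14: absolute error between original and corrupted observations."""
    return np.abs(H - H_corrupted)


def to_period_level(sigma_interval, p):
    """Average interval-level indicators over the p intervals of each period."""
    n_intervals, n_regions = sigma_interval.shape
    return sigma_interval.reshape(n_intervals // p, p, n_regions).mean(axis=1)


p, n_periods, n_regions = 6, 4, 3                      # hypothetical sizes
H = np.full((p * n_periods, n_regions), 10.0)          # rows: intervals, cols: regions

# Same seed for both levels so that only the noise scale differs
tiny = inject_noise(H, scale=0.05, rng=np.random.default_rng(0))     # noisy samples
drastic = inject_noise(H, scale=2.0, rng=np.random.default_rng(0))   # OOD-like samples

sigma_tiny = to_period_level(data_quality_indicator(H, tiny), p)
sigma_drastic = to_period_level(data_quality_indicator(H, drastic), p)
```

Because the injected noise is controlled, the indicator is known exactly for every corrupted sample, which is what allows it to act as a weak label for the internal content uncertainty.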
Inspired by the spatial proximity, temporal periodicity and closeness in spatiotemporal data, we propose a spatiotemporal variance as a weak supervised loss to learn the uncertainty mapping functions from historical observations. We first define $\mathrm{stdv}(V^*)$ as the function that computes the standard deviation of a set of values $V^*$. Specifically, for region $r_i$ at period $m$, the proposed spatiotemporal variance is determined by three views.

1) Spatial view:

Associated with the adjacent matrix $\mathbf{A}^t$, we select a set of neighboring observations of $r_i$, calculate the standard deviation of these observations for each interval, and average the deviations of all intervals into the period-level value $var_S^{(m,i)}$.

2) Inter-period view:

We retrieve the observations of the same intervals in each of the $q+2$ periods, measure the observation deviations for these intervals, and compress these deviations into a period-level variance $var_{ep}^{(\cdot,i)}$.

3) Intra-period view:

We calculate the interval-wise standard deviation of the observations within each period as the intra-period variance $var_{ip}^{(m,i)}$.

By denoting the observation in the $j$-th interval of period $m$ at region $r_i$ as $\widetilde{H}_{r_i}^{j}(m)$, we formally have

$$var_S^{(m,i)} = \frac{1}{p} \sum_{j=1}^{p} \mathop{\mathrm{stdv}}_{r_k \in \mathcal{N}(r_i)} \big( \widetilde{H}_{r_k}^{j}(m) \big) \tag{15}$$

$$var_{ep}^{(\cdot,i)} = \frac{1}{p} \sum_{j=1}^{p} \mathop{\mathrm{stdv}}_{b \in \{0, ..., q+1\}} \big( \widetilde{H}_{r_i}^{j}(b) \big) \tag{16}$$

$$var_{ip}^{(m,i)} = \mathop{\mathrm{stdv}}_{j \in \{1, 2, ..., p\}} \big( \widetilde{H}_{r_i}^{j}(m) \big) \tag{17}$$

where $\mathcal{N}(r_i)$ is the neighboring region set of $r_i$; here we select the top 5% spatially nearest regions as its neighbors. For simplicity, we average the three types of variance into the spatiotemporal variance indicator $var_{ST}$ of the specific spatiotemporal domain, which is written as

$$var_{ST}^{(m,i)} = \mathrm{Avg}\big( var_S^{(m,i)}, var_{ep}^{(\cdot,i)}, var_{ip}^{(m,i)} \big) \tag{18}$$

where Avg is the average aggregation function. The data quality and spatiotemporal uncertainty indicators then change correspondingly with the data turbulence, which guides the uncertainty quantification.

The objective of our uncertainty quantification can be summarized in two aspects: to learn what the model does not know, and to maximally boost the prediction performance. Hence, it is of great significance to further capture the reciprocity and interactions between the uncertainty and the predicted results. We argue that uncertainties can be decomposed into an irreducible variation, which can be seen as the inherent randomness, and a part complementary to the prediction results, which may be reducible and helpful to the prediction task. Based on this intuition, we propose a Gated Mobility-uncertainty Re-calibration (GMuR) bridge to proactively learn the complementary parts and the interactions between point estimation and uncertainty quantification, so that both tasks benefit from each other. The idea of GMuR is to learn how uncertainty variations impact the prediction results and subsequently reduce the uncertainty itself. Formally, we denote the gate as $f_{gate}$; the calibrated mobility intensity and uncertainty can then be written as

$$f_{gate} = \tanh\big( \mathbf{W}_{gate} ( \widehat{\mathbf{U}}^{T+1} \circ \widehat{\mathbf{H}}^{T+1} ) \big) \tag{19}$$

$$\widehat{\mathbf{H}}^{T+1} = \widehat{\mathbf{H}}^{T+1} + \widehat{\mathbf{U}}^{T+1} \cdot f_{gate} \tag{20}$$

$$\widehat{\boldsymbol{\sigma}}^{T+1} = \widehat{\mathbf{U}}^{T+1} - \widehat{\mathbf{U}}^{T+1} \cdot f_{gate} \tag{21}$$

where $\circ$ is the concatenation operation, and $\mathbf{W}_{gate} \in \mathbb{R}^{N \times 2N}$ is the learnable weight that maps the concatenation of the learned uncertainty and the predicted results to an $N$-dimensional gate. In Eq. 20 the left-hand side is the calibrated prediction, which overwrites the raw one. We select tanh as the activation function to allow both positive and negative variations to transfer into the predictions.

With the two weak indicators, the GMuR bridge, and the hierarchical data turbulence, we can finally perform the active hierarchical uncertainty quantification, as briefly illustrated in Figure 5. To take advantage of the uncertainty indicators, we use the data quality and spatiotemporal variance for period-wise guidance, followed by a last-interval forecasting. Thus, we minimize the loss in Eq. 22, in which the former two terms capture uncertainty evolution on the aggregated period level, and the latter two terms predict mobility intensity and quantify uncertain fluctuations in the last interval.

$$\mathrm{Loss}(\Theta) = \sum_{m=0}^{q+1} \sum_{i=0}^{N-1} \Big( \big( U_I^{(m,i)} - \sigma_{qua}^{(m,i)} \big)^2 + \big( U_o^{(m,i)} - var_{ST}^{(m,i)} \big)^2 \Big) + \sum_{i=0}^{N-1} \Big( \big( \widehat{\sigma}_i^{T+1} - var_{ST}^{(T+1,i)} \big)^2 + \big( \widehat{H}_i^{T+1} - H_i^{T+1} \big)^2 \Big) + \big\lVert \widehat{\boldsymbol{\sigma}}^{T+1} \big\rVert_2 \tag{22}$$

where $\Theta$ denotes the set of learnable parameters including all $\boldsymbol{\theta}_*$ and $\mathbf{W}_*$, and $\lVert \cdot \rVert$ denotes the L2-norm that regularizes the uncertainties against explosion. The term $\sigma_{qua}^{(m,i)}$ is enabled when data turbulence is utilized; if turbulence is not utilized, we directly learn the inherent randomness in the existing observations. We use the Adam optimizer [15] to train our STUaNet.
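The GMuR re-calibration of Eqs. 19-21 can be sketched as follows. It is a minimal NumPy illustration with hypothetical names and random toy values, not the trained model; it assumes that the gate weight has shape $(N, 2N)$ so that it maps the concatenated uncertainty and prediction vectors to one gate value per region.

```python
import numpy as np


def gmur(U_hat, H_hat, W_gate):
    """Gated re-calibration for the last interval.

    U_hat, H_hat : (N,) predicted uncertainty and mobility intensity
    W_gate       : (N, 2N) learnable gate weight (shape is an assumption)
    """
    gate = np.tanh(W_gate @ np.concatenate([U_hat, H_hat]))   # Eq. 19
    H_cal = H_hat + U_hat * gate                              # Eq. 20
    sigma_hat = U_hat - U_hat * gate                          # Eq. 21
    return H_cal, sigma_hat, gate


rng = np.random.default_rng(0)
N = 4
U_hat = rng.uniform(0.1, 1.0, size=N)
H_hat = rng.uniform(5.0, 20.0, size=N)
W_gate = rng.normal(scale=0.1, size=(N, 2 * N))

H_cal, sigma_hat, gate = gmur(U_hat, H_hat, W_gate)

# A zero gate leaves both the prediction and the uncertainty unchanged
H_same, sigma_same, _ = gmur(U_hat, H_hat, np.zeros((N, 2 * N)))
```

Adding Eqs. 20 and 21 shows that the calibrated prediction plus the remaining uncertainty always equals the raw prediction plus the raw uncertainty, so the gate only decides how much of the learned uncertainty is moved into the point estimate.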

Figure 5: Illustration of active and hierarchical uncertainty learning in C2UQ

Table 1: Dataset statistics

We use three real-world datasets to verify the effectiveness of uncertainty quantification, and the statistics of these datasets are listed in Table 1.

NYC Taxi.

This dataset consists of approximately 7.5 million taxicab trip records, including both pick-up and drop-off events, from Jan 1st, 2017 to May 31st, 2017 in online ride-hailing services. It can typically serve as an indicator of human mobility, where pick-ups and drop-offs stand for departures and arrivals in a specific region. Here we utilize pick-up events for evaluation.

SIP Surveillance.

This dataset contains traffic volumes at 108 intersections in an intelligent transportation system, covering an urban area of 45.5 $km^2$ in Suzhou Industrial Park. We utilize the data from Jan 1st, 2017 to March 31st, 2017.

California Check-ins in Gowalla.

It is a widely used Location-Based Social Network dataset (http://snap.stanford.edu/data/loc-gowalla.html) and contains a total of 736k check-in records over the period from Feb 1st, 2009 to Oct 31st, 2010. We choose the state-wide check-ins of California (Califor) by filtering longitude and latitude, and then cluster the POIs into 1,200 disjoint regions.

Although prediction and uncertainty quantification are correlated with the spatiotemporal scale, the evaluation of a solution is orthogonal to the generality of our proposal, so we rely on common urban division settings and fair comparison mechanisms [35]. For simplicity, we fix the time interval at 30 minutes, except for the California check-ins where it is 1 hour.

We organize our datasets into training samples and divide the samples into 60%, 30% and 10% for training, testing and validation. The initial learning rate is set to 0.001 with a decay rate of 0.98 every 10 epochs. All methods are implemented in TensorFlow 1.15.0 and trained with two Tesla V100 GPUs. We stack 2 GCN layers and 2 LSTM layers in each spatial and sequential learning block, and instantiate $\rho = 0.6$, $p = 6$, $q = 3$. All these hyperparameters are set according to references [1, 35] and also carefully fine-tuned; we omit the process due to the limited space of this paper.

Given the predicted point estimation $\widehat{H}_i^t$, the uncertainty quantification $\widehat{\sigma}_i^t$, and the ground truth $H_i^t$ at region $i$ during interval $t$, we evaluate the effectiveness of our model in terms of both prediction accuracy and uncertainty quantification quality. Regarding prediction accuracy, we employ RMSE and MAPE for evaluation. To evaluate uncertainty learning quality and verify whether the predicted intervals that account for uncertainty can accurately capture the ground truth, we introduce the prediction interval coverage probability (PICP) metric according to [30], which is defined as

$$\mathrm{PICP} = \frac{C_{obj}}{NT} \tag{23}$$

$$C_{obj} = \sum_{t=0}^{T-1} \sum_{i=1}^{N} \mathbb{I}\big( \widehat{H}_i^t - \widehat{\sigma}_i^t < H_i^t < \widehat{H}_i^t + \widehat{\sigma}_i^t \big) \tag{24}$$

where $\mathbb{I}(\cdot)$ is an indicator function.

In terms of forecasting tasks, we compare our STUaNet against some representative spatiotemporal prediction methods.

(1) STG2Seq:

It uses a hierarchical graph convolution model to capture both spatial and temporal dependencies for passenger demand forecasting.

(2) STGCN:

This work designs a novel complete convolution structure for comprehensive spatiotemporal correlation modeling in human mobility.

(3) MDL:

It is a state-of-the-art collective human mobility forecasting method which is inherited from ST-ResNet and simultaneously models nodes and edges with multiple deep learning tasks.

The upper half of Table 2 reports the forecasting comparison results. By carefully considering uncertainty quantification and gravity model-based mobility transitions, our method can consistently outperform the baselines on all datasets. More excitingly, STUaNet surpasses the best baselines, DCRNN, STGCN and STG2Seq, by 13.11%, 26.59% and 47.21% on the metric of MAPE in SIP, NYC and Califor, respectively. We also replace the dynamic adjacent matrix with a static distance-based matrix in our STUaNet for an ablative evaluation, and the performance decrease of STUaNet-Static $\mathbf{A}$ demonstrates the necessity of mobility transitions. These promising results on all three datasets verify that our uncertainty quantification rests on a solid forecasting framework.

Next, we evaluate the capacities of uncertainty estimation and prediction calibration of different uncertainty learning baselines. We employ four popular uncertainty quantification mechanisms as baselines.

(1) NLL loss:

The negative log likelihood (NLL) loss is utilized to perform station-level Numerical Weather Forecasting (NWF) and the associated uncertainty quantification [30].

(2) Dropout-based BNN:

We realize this BNN method with dropout [7, 8]. This mechanism is widely applied in uncertainty quantification for numerous risk-sensitive tasks, ranging from computer vision [14, 18] to NWF [21, 29].

(3) DeepEnsembles:

We perform the uncertainty learning with the ensemble method, which trains a series of neural networks with different initializations [17]. The number of ensembled networks is set to 5, according to [17].

(4) SDE method:

This is a state-of-the-art uncertainty learning model with injections of noise and OOD samples, and we reproduce this method by referring to [16].

All numerical results on uncertainty quantification are reported in the bottom half of Table 2. Overall, the proposed STUaNet with C2UQ achieves the best performance on almost all metrics over the three datasets regarding both forecasting and uncertainty learning. Integrated with C2UQ, STUaNet improves the PICP in SIP from 68.83% of DeepEnsembles to 80.74%, a relative increase of 17.30%, and also obtains accuracy comparable with DeepEnsembles in both NYC and Califor. The slight decrease of PICP in NYC may be attributed to its imbalanced distribution of mobility. In addition, almost all uncertainty-aware forecasting methods perform better than methods without uncertainty on the MAPE metric, which demonstrates the necessity and superiority of spatiotemporal uncertainty quantification. The higher prediction accuracy of our C2UQ can be attributed to the GMuR bridge with its prediction re-calibration.

For a detailed analysis, SDE surpasses all other baselines on forecasting metrics. Its core idea can be viewed as a denoising autoencoder mechanism where noisy and pure observations are trained alternately. The superior results illustrate the effectiveness of alternate training between in-distribution, noisy and OOD samples. However, it has a relatively lower PICP because it lacks uncertainty labels for learning exactly quantified uncertainty. The uncertainty quantification with only negative log likelihood performs worse than the other methods on predictions and shows an unstable learning process. This further supports the intuition of separating the learning of uncertainty from the learning of prediction values. The relatively higher PICPs of NLL and DeepEnsembles are mostly because NLL and ensembles usually derive a larger uncertainty without any guidance. Monte-Carlo Dropout-based BNN and DeepEnsembles show a stable training process and achieve favorable performance, which benefits from the ensemble mechanisms where statistical moment estimations are employed for uncertainty quantification.

In summary, we argue that these methods are less effective than ours on uncertainty learning in two aspects. First, they are not tailored for spatiotemporal modeling, and thus fail to extract spatiotemporal evolutions and content-context interactions. Second, ensembles usually require multiple rounds of training, which costs much memory and computation, while our C2UQ enjoys the efficiency of one-time training. Finally, we also integrate the well-performing STG2Seq with our C2UQ for uncertainty quantification, and the results demonstrate the scalability and generality of C2UQ.
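The PICP metric of Eqs. 23-24 that underlies this comparison can be computed as in the sketch below. The function name and the toy arrays are hypothetical; the only logic taken from the text is the strict two-sided test of whether the ground truth lies within one predicted uncertainty of the point estimate.

```python
import numpy as np


def picp(H_true, H_pred, sigma_pred):
    """Eqs. 23-24: share of ground-truth values strictly inside
    (H_pred - sigma_pred, H_pred + sigma_pred), over all T x N entries."""
    inside = (H_pred - sigma_pred < H_true) & (H_true < H_pred + sigma_pred)
    return inside.mean()


# Hypothetical toy values: T = 2 intervals (rows), N = 2 regions (columns)
H_true = np.array([[1.0, 2.0], [3.0, 4.0]])
H_pred = np.array([[1.2, 2.0], [5.0, 4.4]])
sigma_pred = np.array([[0.5, 0.1], [1.0, 0.3]])

coverage = picp(H_true, H_pred, sigma_pred)   # 2 of 4 entries are covered
```

Widening every interval raises PICP whether or not the intervals are informative, which is why the discussion above reads PICP together with MAPE rather than on its own.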

To provide an intuitive visualization of our uncertainty quantification quality, we choose one typical region in each dataset to illustrate the interval-level prediction and uncertainty results in Figure 6. Focusing on the microscopic perspective of our results, we find that the predicted uncertainties are mostly consistent with the prediction errors. As observed, the prediction interval does not become wider or fluctuate heavily over time; instead, the widths at night are mostly narrower than those in the daytime. The main reason is that mobility is more stable at night. The circled inaccurate predictions and large uncertainty fluctuations are the evening and morning peak hours in SIP and NYC, both of which suffer from rain. We can infer that concurrent contextual scenarios like rain occur less frequently in the training sets and thus increase the uncertainty on out-of-distribution observations. Thanks to uncertainty prediction, the uncertainties in the predicted results can be effectively exhibited for more reliable decisions, so that citizens and administrations can prepare in advance for possible uncertain conditions. Hence, our uncertainty-aware spatiotemporal forecasting can provide a more informative and valuable quantified basis for decision-making in urban trip planning and city safety.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Zhengyang Zhou, Yang Wang, Xike Xie, Lei Qiao and Yuantao Li

Table 2: Performance comparisons on three datasets

| Group | Methods | SIP RMSE | SIP MAPE | SIP PICP | NYC RMSE | NYC MAPE | NYC PICP | California RMSE | California MAPE | California PICP |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline for spatiotemporal prediction | STG2Seq | 3.905 | 31.66% | - | 2.623 | 17.61% | - | 2.545 | 30.50% | - |
| | STGCN | 4.509 | 35.84% | - | 2.102 | 16.47% | - | 2.621 | 32.48% | - |
| | MDL | 2.967 | 34.45% | - | 2.984 | 18.62% | - | 1.917 | 35.39% | - |
| | DCRNN | 4.750 | 20.21% | - | 3.185 | 37.45% | - | 4.120 | 40.02% | - |
| | STUaNet-Static $\mathbf{A}$ | | | | | | | | | |
| | STUaNet (Ours) | 2.942 | 17.56% | 80.74% | 1.624 | 11.79% | 72.14% | 2.586 | 16.10% | 88.46% |
| | STG2Seq+C2UQ | 3.528 | 22.41% | 78.64% | 2.320 | 12.96% | 70.12% | 2.548 | 22.52% | 78.80% |

Figure 6: Quality visualization of uncertainty quantification in different datasets. Panels: (a) Jinji Lake of SIP on Jan 17th, 2017; (b) Art Museum of California on June 1st, 2010; (c) Chelsea Market of NYC on May 25th, 2017.

We conduct ablation studies to test the sensitivity of each component in our integrated STUaNet. We successively remove typical components to obtain variants of C2UQ.

C2UQ-1:

Remove the neural data-quality estimation.

C2UQ-2:

Omit the FM-GCN and spatiotemporal variance components.

C2UQ-3:

Remove the GMuR bridge.

C2UQ-4:

Omit the alternate training process and only train on the pure dataset.

Table 3 reports the performance of the four ablative variants on the three datasets. From the quantitative results, we can see that the integrated C2UQ outperforms all its variants. In particular, the SIP and NYC datasets are more sensitive to the data quality estimation and GMuR modules, while the performance on the Califor dataset is largely improved by the alternate training process. After removing the FM-GCN module, it becomes difficult to capture uncertainty, because the interactions of context factors and the variance-based weak supervised information are no longer considered; this shows as a prominent performance decrease on the three datasets. In contrast, our C2UQ encourages larger uncertainty for higher spatiotemporal variance and vice versa, which lets us actively learn uncertainties. The decreased performance of C2UQ-3 and C2UQ-4 demonstrates the benefit of re-calibrating spatiotemporal predictions with the uncertainty-aware mechanism, which eventually facilitates the uncertainty quantification task with weak supervised learning.

Table 3: Performances of ablative variants on three datasets

| Variants | SIP MAPE/PICP | NYC MAPE/PICP | Califor MAPE/PICP |
|---|---|---|---|
| C2UQ-1 | 33.13%/73.35% | 28.70%/58.07% | 9.24%/72.40% |
| C2UQ-2 | 21.29%/70.21% | 27.96%/60.59% | 14.39%/69.31% |
| C2UQ-3 | 28.20%/63.44% | 10.65%/63.39% | 22.93%/67.43% |
| C2UQ-4 | 24.86%/80.80% | 36.20%/71.50% | 20.78%/67.00% |
| C2UQ | 17.56%/80.74% | 11.79%/72.14% | 16.10%/88.46% |

A higher uncertainty indicates that the prediction model is not confident about the predicted value, or that there is a large dispersion among its historical observations under this context. In this section, we generate the mobility and uncertainty maps with STUaNet from the test sets to investigate how they can benefit diverse web applications.

(1) Urban event detection and prediction.

Figure 7(a) illustrates the urban situation during the morning peak hours on March 2nd. Under the context of a sunny morning, the three highlighted regions with both high uncertainty and high mobility intensity can be interpreted as urban events like congestion, which is further verified by the ground truth. These predictions motivate travelers to re-plan their routes and urge the traffic agency to proactively evacuate crowds to avoid urban safety concerns such as accidents and the spread of pandemics. With heavy rain, the increasing uncertainties across urban regions in subfigure (b) reveal that our model lacks confidence in such predictions, due to the rare weather event and complicated context interactions. This not only reflects the principle of epistemic uncertainty, but also agrees with the common experience that the higher probability of bursty events such as accidents on rainy days contributes to mobility uncertainty. Therefore, it is of great significance to provide uncertainty-aware predictions which actively prevent misleading decision-making.

(2) Mining potential commercial interests.

As shown in subfigure (c), there is an expansive area of higher uncertainty with moderate mobility intensity around Jinji Commercial Center, which implies potential mobility fluctuations during the following intervals. For businesses, crowds are profits; they can thus take advantage of these uncertainties and the favorable weather to maximally stimulate the buying desires of consumers and drive the uncertainty into a positive increase of intensities, by advertising sale promotions through online web applications.

Figure 7: Uncertainty studies in SIP. Solid circles denote the ground-truth urban events while dashed circles highlight regions with high uncertainties for comparison. Panels: (a) March 2nd, Thurs, 08:00-08:30; (b) March 17th, Fri, 17:00-17:30; (c) March 26th, Sun, 12:00-12:30. Rows: mobility & events, uncertainty. Annotations: congestions, accident, potential hot spot, arterial roads, commercial center, Jinji CBD & commercial center, residential area, regions A and B.

Figure 8: Performance on different parameter settings. Panels: (a) number of layers in GCN block; (b) number of layers in LSTM; (c) proportion of transition pattern $\rho$.

(3) Deeper understanding of the nature of human mobility.

We can also discover several interesting phenomena. Firstly, we find that commercial centers are more sensitive to weather changes while arterial roads are more stable with respect to context. In particular, the Jinji Circle mostly shows high uncertainties, as its integrated and complicated functionalities may cause quick and dynamic flow changes. Secondly, we can also suggest that urban planners build commercial complexes in regions A and B to attract mobility, as these regions currently show both low uncertainty and low volume intensity. Through uncertainty learning, we can dive deeper into human mobility, uncover the potential intentions behind it, and facilitate urban planning and human-centered computing for a better life.

To investigate how different hyperparameter values impact the prediction performance, we present the hyperparameter studies here. We consider three hyperparameters: the number of GCN layers, the number of LSTM layers, and $\rho$ in the adjacent matrix. We show the fine-tuning process in Figure 8 and, for simplicity, only compare the MAPE metric of the regression tasks, which is fairer and more intuitive across different datasets. Finally, we stack 2 GCN blocks and 2 LSTM layers, and set $\rho = 0.6$ on all three dataset learning tasks.

STUaNet, which internalizes the uncertainties into the model from the perspectives of internal content consistency, external context interactions and temporal evolutions, is a pioneering attempt at spatiotemporal uncertainty quantification in collective human mobility. In particular, to tackle the uncertainty quantification challenge, we transform it into weak supervised learning and active hierarchical uncertainty learning by proposing two implicit but quantifiable uncertainty indicators. Extensive experiments on three mobility-related datasets verify the effectiveness of our proposal.

Regardless of whether uncertainties are aleatoric or epistemic, internal or external, they are both data-dependent and model-dependent. Therefore, we put forward the bold idea that all these uncertainties can be summarized from an epistemic view, by considering unpredictable factors and highly complex interactions as high-level knowledge that should be deeply learned and understood from a long-term perspective. In the future, we will further explore both the quantified regularities and the uncertainties of spatiotemporal data with more fundamental and theoretical analysis, and hence explicitly optimize spatiotemporal predictions by identifying the sources of reducible uncertainties.

ACKNOWLEDGMENTS

This work is partially supported by Zhejiang Lab’s International Talent Fund for Young Professionals, Anhui Science Foundation for Distinguished Young Scholars (No.1908085J24), NSFC (No.62072427, No.61672487, No.61772492), and Jiangsu Natural Science Foundation (No.BK20191193).

REFERENCES [1] Lei Bai, Lina Yao, Salil S Kanhere, Xianzhi Wang, and Quan Z Sheng. 2019.STG2seq: spatial-temporal graph to sequence model for multi-step passengerdemand forecasting. In . International Joint Conferences on Artificial Intelligence, 1981–1987.[2] Bei Chen, Fabio Pinelli, Mathieu Sinn, Adi Botea, and Francesco Calabrese. 2013.Uncertainty in urban mobility: Predicting waiting times for shared bicycles andparking lots. In . IEEE, 53–58.[3] Huimin Chen, Zeyu Zhu, Fanchao Qi, Yining Ye, Zhiyuan Liu, Maosong Sun, andJianbin Jin. 2020. Country Image in COVID-19 Pandemic: A Case Study of China.

IEEE Transactions on Big Data (2020).[4] Armen Der Kiureghian and Ove Ditlevsen. 2009. Aleatory or epistemic? Does itmatter?

Structural safety

31, 2 (2009), 105–112.[5] Jie Feng, Yong Li, Zeyu Yang, Qiang Qiu, and Depeng Jin. 2020. PredictingHuman Mobility with Semantic Motivation via Multi-task Attentional RecurrentNetworks.

IEEE Transactions on Knowledge and Data Engineering (2020).[6] Jie Feng, Yong Li, Chao Zhang, Funing Sun, Fanchao Meng, Ang Guo, and DepengJin. 2018. Deepmove: Predicting human mobility with attentional recurrentnetworks. In

Proceedings of the 2018 world wide web conference . 1459–1468.[7] Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a bayesian approximation:Representing model uncertainty in deep learning. In international conference onmachine learning . 1050–1059.[8] Yarin Gal, Jiri Hron, and Alex Kendall. 2017. Concrete dropout. In

Advances inneural information processing systems . 3581–3590.[9] Wei Geng and Guang Yang. 2017. Partial correlation between spatial and temporalregularities of human mobility.

Scientific reports

7, 1 (2017), 1–9.[10] Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, and Huaiyu Wan. 2019.Attention based spatial-temporal graph convolutional networks for traffic flowforecasting. In

Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 922–929.

[11] Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, and James Davidson. 2020. Noise contrastive priors for functional uncertainty. In Uncertainty in Artificial Intelligence. PMLR, 905–914.

[12] Suining He and Kang G Shin. 2020. Dynamic Flow Distribution Prediction for Urban Dockless E-Scooter Sharing Reconfiguration. In Proceedings of The Web Conference 2020. 133–143.

[13] Edin Lind Ikanovic and Anders Mollgaard. 2017. An alternative approach to the limits of predictability in human mobility. EPJ Data Science 6, 1 (2017), 12.

[14] Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in neural information processing systems. 5574–5584.

[15] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference for Learning Representations.

[16] Lingkai Kong, Jimeng Sun, and Chao Zhang. 2020. SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates. arXiv preprint arXiv:2008.10546 (2020).

[17] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems. 6402–6413.

[18] Christian Leibig, Vaneeda Allken, Murat Seçkin Ayhan, Philipp Berens, and Siegfried Wahl. 2017. Leveraging uncertainty information from deep neural networks for disease detection. Scientific reports 7, 1 (2017), 1–14.

[19] Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1754–1763.

[20] Dongliang Liao, Weiqing Liu, Yuan Zhong, Jing Li, and Guowei Wang. 2018. Predicting Activity and Location with Multi-task Context Aware Recurrent Neural Network. In IJCAI. 3435–3441.

[21] Yongqi Liu, Hui Qin, Zhendong Zhang, Shaoqian Pei, Zhiqiang Jiang, Zhongkai Feng, and Jianzhong Zhou. 2020. Probabilistic spatiotemporal wind speed forecasting based on a variational Bayesian deep learning model. Applied Energy

253 (2019), 113596.

[23] Xin Lu, Erik Wetter, Nita Bharti, Andrew J Tatem, and Linus Bengtsson. 2013. Approaching the limit of predictability in human mobility. Scientific reports

Entropy 21, 2 (2019), 184.

[25] Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, and Federico Tombari. 2019. Sampling-free epistemic uncertainty estimation using approximated variance propagation. In Proceedings of the IEEE International Conference on Computer Vision. 2931–2940.

[26] Chaoming Song, Zehui Qu, Nicholas Blumm, and Albert-László Barabási. 2010. Limits of predictability in human mobility. Science

Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 214–221.

[28] Erlend Tossebro and Mads Nygård. 2002. Uncertainty in spatiotemporal databases. In International Conference on Advances in Information Systems. Springer, 43–53.

[29] Thomas Vandal, Evan Kodra, Jennifer Dy, Sangram Ganguly, Ramakrishna Nemani, and Auroop R Ganguly. 2018. Quantifying uncertainty in discrete-continuous and skewed data with Bayesian deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2377–2386.

[30] Bin Wang, Jie Lu, Zheng Yan, Huaishao Luo, Tianrui Li, Yu Zheng, and Guangquan Zhang. 2019. Deep uncertainty quantification: A machine learning approach for weather forecasting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2087–2095.

[31] Huandong Wang, Sihan Zeng, Yong Li, and Depeng Jin. 2020. Predictability and prediction of human mobility based on application-collected location data. IEEE Transactions on Mobile Computing (2020).

[32] Huaxiu Yao, Yiding Liu, Ying Wei, Xianfeng Tang, and Zhenhui Li. 2019. Learning from Multiple Cities: A Meta-Learning Approach for Spatial-Temporal Prediction. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 2181–2191.

[33] Junchen Ye, Leilei Sun, Bowen Du, Yanjie Fu, Xinran Tong, and Hui Xiong. 2019. Co-prediction of multiple transportation demands based on deep spatio-temporal neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 305–313.

[34] Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 3634–3640.

[35] Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 1655–1661.

[36] Junbo Zhang, Yu Zheng, Junkai Sun, and Dekang Qi. 2019. Flow prediction in spatio-temporal networks based on multitask deep learning. IEEE Transactions on Knowledge and Data Engineering 32, 3 (2019), 468–478.

[37] Pengpeng Zhao, Anjing Luo, Yanchi Liu, Fuzhen Zhuang, Jiajie Xu, Zhixu Li, Victor S Sheng, and Xiaofang Zhou. 2020. Where to go next: A spatio-temporal gated network for next POI recommendation. IEEE Transactions on Knowledge and Data Engineering (2020).

[38] George Kingsley Zipf. 1946. The P1 P2/D hypothesis: on the intercity movement of persons.