Assessing the contagiousness of mass shootings with nonparametric Hawkes processes

Abstract

Gun violence and mass shootings are high-profile epidemiological issues facing the United States with questions regarding their contagiousness gaining prevalence in news media. Through the use of nonparametric Hawkes processes, we examine the evidence for the existence of contagiousness within a catalog of mass shootings and highlight the broader benefits of using such nonparametric point process models in modeling the occurrence of such events.



Peter Boyd, James Molyneux
Oregon State University, Department of Statistics
September 18, 2020


Gun violence in the United States is a national public health crisis [Bauchner et al., 2017] with firearm homicide rates 19.5 times that of other high-income countries [Grinshteyn and Hemenway, 2011]. Mass shootings in particular represent a phenomenon of interest in that these high-profile events with multiple, and occasionally numerous, victims generate large amounts of media coverage. Such media coverage may lead to both a contagion effect that may incite others to carry out similar acts as well as an imitation effect that may allow mass shooters to learn from those that preceded them [Meindl and Ivy, 2017]. Though the term mass shooting lacks a specific, rigorous definition, the number of gun-related incidents with multiple victims has become so common in the past two decades that research on these events has become a necessary component of public health studies in the United States [Dzau and Leshner, 2018]. From 2000 to 2018, the US Federal Bureau of Investigation (FBI) recorded 277 active shooter incidents, in which an individual shoots and kills (or attempts to kill) others in a public space, resulting in 2430 casualties [Federal Bureau of Investigation, 2016]. The FBI further notes that the number of such incidents is on the rise, with 69% of these incidents occurring between 2010 and 2018.

The need to address the contagion factor of these events, whereby a single mass shooting event inspires or is correlated with future mass shooting events, represents a fundamental question in the underlying mass shooting phenomenon. Previous research by Jetter and Walker [2018] proposed that the ideation and implementation of mass shootings are linked to media coverage of such events. A contagion factor was also previously found by Towers et al. [2015], who used a self-excitation contagion model to quantify the degree to which previous events inspired future events. In their work, Towers et al. [2015] model the increased probability of a mass shooting event occurring on day $t_j$ given that a previous event occurred on day $t_i$, $t_i < t_j$, and the average duration of the contagion process, $T_{excite}$, using an exponential probability distribution. That is, the probability of a new mass shooting event occurring sometime in the 24 hours of day $t_j$ is expressed as

$$P(t_j \mid t_i, T_{excite}) = \int_{t_j - t_i - 1}^{t_j - t_i} dx \, \frac{e^{-x / T_{excite}}}{T_{excite}}.$$

Towers et al. [2015] then couple this probability model with a non-contagion related baseline number of events, $N(t)$, and a total number of expected secondary events, $N_{secondary}$, to compute an expected number of events, $N_{exp}$, on day $t_n$ expressed as

$$N_{exp}(t_n) = N(t_n) + N_{secondary} \sum_{i \,:\, t_i \,\cdots}$$

[...]
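The one-day excitation probability above has a simple closed form, since integrating the exponential density over a one-day window gives $e^{-(t_j - t_i - 1)/T_{excite}} - e^{-(t_j - t_i)/T_{excite}}$. The following is a minimal sketch of that calculation only; the function name `excitation_probability` and the value $T_{excite} = 13$ days are illustrative assumptions, not quantities taken from the Towers et al. [2015] analysis.

```python
import math


def excitation_probability(t_j, t_i, t_excite):
    """Probability mass of an exponential(t_excite) delay falling in day t_j.

    Integrates exp(-x / t_excite) / t_excite over the one-day window
    [t_j - t_i - 1, t_j - t_i], which has the closed form used below.
    """
    if t_j <= t_i:
        raise ValueError("t_j must be later than t_i")
    lag = t_j - t_i
    return math.exp(-(lag - 1) / t_excite) - math.exp(-lag / t_excite)


# Hypothetical setting: an event on day 0 and a 13-day mean excitation time.
T_EXCITE = 13.0
probs = [excitation_probability(day, 0, T_EXCITE) for day in range(1, 366)]

# The daily probabilities telescope, so a year of them sums to
# 1 - exp(-365 / T_EXCITE), which is very close to one.
total = sum(probs)
print(round(probs[0], 4), round(total, 6))
```

Because the daily probabilities telescope, they sum to one over an unbounded horizon, so in this formulation the overall size of the contagion effect is carried entirely by $N_{secondary}$.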

A point process is a random collection of points $\{\tau_1, \tau_2, \ldots\}$ occurring in some metric space [Daley and Vere-Jones, 2004]. These points often occur in some temporal or spatio-temporal window where $t_i \in \mathbb{R}$ represents the temporal dimension of the $i$th point and $s_i \in \mathbb{R}^n$ represents the spatial coordinates of the $i$th point. In practice, $\mathbb{R}^n$ is often taken to be $\mathbb{R}^2$ or $\mathbb{R}^3$ so that the spatial coordinates land on some two-dimensional plane or in some three-dimensional space where the third dimension can be taken to be the depth of the point. For our purposes, we consider the occurrence of mass shooting events to be a collection of $n$ marked spatio-temporal points, $\{(t_i, x_i, y_i, m_i) : i = 1, 2, \ldots, n\}$, such that $t_i \in [0, T]$ represents the time the event occurred, with $0$ and $T$ taken to be the start and end of the temporal window, respectively, and $(x_i, y_i) \in (-\infty, \infty) \times (-\infty, \infty)$ represents the spatial location of the event. The mark, $m_i$, of point $i$ is then some additional covariate information, which we take to be the number of victims, excluding the perpetrator, of the $i$th mass shooting event. For the marks of the process, we define the number of victims to be the number of individuals either killed or injured during the shooting. In defining the marks in this way, we intend to measure how the number of victims affects the ability of an event to incite future events.

In general, point processes are typically modeled via their conditional intensity function, $\lambda(t)$ or $\lambda(s, t)$ for time and space-time point processes, respectively. The conditional intensity is defined as the infinitesimal expected rate at which points occur given the history of the process, $\mathcal{H}_t$.
That is, we model the occurrence of points in time as

$$\lambda(t \mid \mathcal{H}_t) = \lim_{\Delta t \to 0} \frac{E[N([t, t + \Delta t)) \mid \mathcal{H}_t]}{\Delta t}$$

or in space-time as

$$\lambda(s, t \mid \mathcal{H}_t) = \lim_{\Delta s, \Delta t \to 0} \frac{E[N((s, s + \Delta s) \times (t, t + \Delta t)) \mid \mathcal{H}_t]}{\Delta s \, \Delta t}$$

where $N(\cdot)$ is taken to be a counting measure [Daley and Vere-Jones, 2004].

In what follows, we introduce the self-exciting, or Hawkes, point process and then elaborate further on the estimation and evaluation of the nonparametric version of the process.

When the occurrence of a point causes a temporary elevation in the occurrence of future points nearby in time, or in space and time, we refer to such a process as a self-exciting point process. Foundational work in self-exciting point processes was done by Hawkes [1971], who defined the conditional intensity as

$$\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{i \,:\, t_i \,\cdots}$$

[...]

$$p_{ij} = \begin{cases} \text{probability event } i \text{ was triggered by event } j, & i > j \\ \text{probability event } i \text{ is a background event}, & i = j \\ 0, & i < j. \end{cases}$$

These probabilities can then be displayed as a lower-triangular probability matrix $P$ with

$$P = \begin{bmatrix} p_{11} & 0 & 0 & \cdots & 0 \\ p_{21} & p_{22} & 0 & \cdots & 0 \\ p_{31} & p_{32} & p_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & p_{n3} & \cdots & p_{nn} \end{bmatrix}.$$

Each row, $i$, of the probability matrix then describes the probability that event $i$ was caused by event $j$, $i > j$, or is itself a background event, $i = j$. Thus, each row of the probability matrix must sum to one.

After initializing the probability matrix $P$ by setting each $p_{i,\cdot} = 1/i$, we iterate over the following sequence of steps until convergence has been achieved:

1. Update the stationary background rate of the process by computing the expected number of background events based on the estimated probabilities from the $P$ matrix.
2. Update the histogram estimators of the triggering functions for each disjoint space, time, or mark interval using the estimated probabilities from the $P$ matrix.
3. Use the now updated background rate and triggering functions to update the probabilities that each event was either a background event or a child of a previous event.

Convergence is achieved once the largest update to the entries of the probability matrix falls below some prescribed value $\varepsilon$. For a more detailed description of the algorithm, we refer readers to the article by Fox et al. [2015].

Standard errors for the histogram estimators of the triggering functions can also be computed to assess the variability of the estimates. For the case of the temporal triggering function, $g(t)$, let $S_\ell$ denote a binomial random variable that represents the number of offspring in bin $\ell$, defined by parameters $\eta_t$, the true number of offspring, and $\theta^g_\ell$, the true probability that a triggered event falls in bin $\ell$. We obtain estimates of these parameters via

$$\hat{\eta}_t = \sum_{i=1}^{n} \sum_{j=1}^{i-1} p_{ij} \quad \text{and} \quad \hat{\theta}^g_\ell = \sum_{B_\ell} p_{ij} / \hat{\eta}_t$$

for $p_{ij}$ equal to the triggering probability of the matrix $P$ upon convergence of the MISD algorithm and $B_\ell$ equal to the set of all event pairs whose time differences fall within bin $\ell$ [Fox et al., 2015]. As a result, we estimate the variance of the value $\hat{g}(t) = g_\ell = S_\ell / (\Delta t_\ell \, \eta_t)$ by

$$\widehat{Var}(g_\ell) = \frac{\hat{\theta}^g_\ell (1 - \hat{\theta}^g_\ell)}{\hat{\eta}_t \, \Delta t_\ell^2}.$$

Standard errors for $\hat{k}(m) = k_\ell$ can be found similarly as

$$\widehat{Var}(k_\ell) = \frac{\hat{\eta}_t \, \hat{\theta}^k_\ell (1 - \hat{\theta}^k_\ell)}{(N^{mark}_\ell)^2}$$

where $\hat{\theta}^k_\ell = \sum_{A_\ell} p_{ij} / \hat{\eta}_t$ for $A_\ell$ equal to the set of events whose marks fall within the $\ell$th marks bin and $N^{mark}_\ell$ equal to the number of events in bin $\ell$.

Super-thinning [Clements et al., 2012] is a hybrid of two model evaluation techniques for point processes: residual thinning and superpositioning.
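As a concrete illustration of the three-step MISD iteration and its stopping rule described above, the sketch below implements a simplified, temporal-only version in plain Python. It is not the authors' implementation: it assumes an intensity of the form $\mu + K \sum g(t - t_j)$ with a piecewise-constant density $g$ and no spatial or mark components, and the names `misd_temporal`, `bin_edges`, `tol`, and the toy event times are hypothetical.

```python
import bisect


def misd_temporal(times, T, bin_edges, tol=1e-6, max_iter=500):
    """Temporal-only MISD-style iteration (illustrative sketch).

    times:     sorted event times in [0, T]
    bin_edges: increasing edges of the histogram bins for g, starting at 0
    Returns (mu, K, g, P), where g holds one step height per bin.
    """
    n = len(times)
    n_bins = len(bin_edges) - 1
    widths = [bin_edges[b + 1] - bin_edges[b] for b in range(n_bins)]

    def bin_of(lag):
        # Index of the bin containing lag, or None if lag is past the last edge.
        b = bisect.bisect_right(bin_edges, lag) - 1
        return b if 0 <= b < n_bins else None

    # Initialise row i (0-based) uniformly over its i + 1 admissible entries.
    P = [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_iter):
        # Step 1: background rate from the expected number of background events.
        mu = sum(P[i][i] for i in range(n)) / T

        # Step 2: histogram estimate of g and the mean number of offspring K.
        mass = [0.0] * n_bins
        for i in range(n):
            for j in range(i):
                b = bin_of(times[i] - times[j])
                if b is not None:
                    mass[b] += P[i][j]
        triggered = sum(mass)
        K = triggered / n
        g = [mass[b] / (widths[b] * triggered) if triggered > 0 else 0.0
             for b in range(n_bins)]

        # Step 3: recompute background/triggering probabilities row by row.
        new_P = []
        for i in range(n):
            row = [0.0] * n
            for j in range(i):
                b = bin_of(times[i] - times[j])
                row[j] = K * g[b] if b is not None else 0.0
            row[i] = mu
            total = sum(row)
            new_P.append([v / total for v in row])

        # Stop once the largest change in any entry of P falls below tol.
        change = max(abs(new_P[i][j] - P[i][j])
                     for i in range(n) for j in range(i + 1))
        P = new_P
        if change < tol:
            break

    return mu, K, g, P


# Hypothetical toy catalog: three small clusters and one isolated event.
toy_times = [1.0, 2.0, 3.0, 10.0, 11.0, 30.0, 31.0, 32.0, 60.0]
toy_edges = [0.0, 5.0, 20.0, 70.0]
mu, K, g, P = misd_temporal(toy_times, T=70.0, bin_edges=toy_edges)
```

The first row of `P` always places all of its mass on the diagonal, so the estimated background rate stays strictly positive and every row can be normalized. The returned `g` integrates to one over the bins, which matches the reading of the temporal triggering function as a density with the expected number of offspring carried separately by `K`.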
For residual thinning, event $i$ is kept in the realized set of points, $\mathcal{S}$, with probability $b / \hat{\lambda}(s_i, t_i)$ for $b = \inf_{(s,t) \in \mathcal{S}} \{\hat{\lambda}(s, t)\}$ and removed from the data otherwise, where $\hat{\lambda}$ represents the estimated conditional intensity of a point in $\mathcal{S}$ [Schoenberg, 2003]. Superpositioning meanwhile first simulates a point process with intensity $b - \hat{\lambda}(s, t)$, where $b = \sup_{(s,t) \in \mathcal{S}} \{\hat{\lambda}(s, t)\}$, and then superposes these points onto the data [Matthes, 1988]. For both thinning and superpositioning, the resulting residual process, $R$, will be a homogeneous Poisson process if and only if the model for the conditional intensity, $\lambda$, is correct. Super-thinning combines the two: the point process is thinned in areas of high conditional intensity and superposed points are added in areas of low intensity to form the residual process, and the resulting set of points will have higher power and lower volatility.

The process for super-thinning a point process, $\mathcal{S}$, is then as follows:

1. Thin $\mathcal{S}$ by retaining each point $(s_i, t_i)$ with probability $\min\{b / \hat{\lambda}(s_i, t_i), 1\}$.
2. Simulate a point process with rate $\max\{b - \hat{\lambda}(s, t), 0\}$ at point $(s, t)$.
3. Combine the two resulting processes above to form the super-thinned residual process, $R$.

The value of $b$ is used to adjust how much thinning or superposing takes place. Once a residual process is obtained, it can be examined for uniformity. If the model specification is correct, then the residual process should have a uniform distribution throughout the time window.

Data availability on mass shootings is limited, with no definitive collection of incidents reported by a public entity, in part due to the 1996 Dickey Amendment mandating that the injury prevention funds at the Centers for Disease Control and Prevention (CDC) cannot be used to advocate or promote gun control [1996].
A 2018 spending bill clarified the language of the Dickey Amendment, allowing the CDC to research gun violence, which had been believed to be barred by the amendment, while stipulating that government funds may not be used for gun control advocacy [DeBonis and O’Keefe, 2018]. Additionally, the United States government has no definition for a mass shooting but does define a mass killing as an incident in which a single perpetrator kills at least three people in a public space; this definition is consequently extended to the definition of a mass shooting by various entities that compile data for the purpose of studying mass shootings.

Several private institutes and organizations have established publicly available data repositories that are used in this study. Four data sets of mass shootings in the United States were utilized, and only events occurring in the continental United States were considered in the analyses. The data sets differ in their observation periods as well as in their definitions of what constitutes a mass shooting. Differences in data compilation lead to large differences in the total number of observations. Further discrepancies are acknowledged below and summarized in Table 1. Plots of the number of mass shootings per month are displayed in Figure 1, with each plot displaying the same time window. Figure 2 displays the distribution of the number of victims from events in each data set.

The Brady Campaign is a nonprofit group advocating for gun control and striving to end gun violence. The organization is named after James Brady, the White House press secretary under President Ronald Reagan, who was shot during the assassination attempt on the president. Brady, left permanently disabled from the gunshot, became an advocate for gun control. The group has compiled data including incidents in which at least three people were shot or injured, but not necessarily killed. The data span February 2005 to January 2013 and contain a total of 477 incidents. The Brady Campaign data set used in this article is also used in the Towers et al. [2015] analysis to allow for comparison of results. Data can be accessed here: https://journals.plos.org/plosone/article/file?type=supplementary&id=info:doi/10.1371/journal.pone.0117259.s002 .

The Stanford Mass Shootings in America data was compiled in an effort to create a comprehensive collection of mass shooting data in the United States. Incidents included involve three or more people shot, but not necessarily killed. The data range from August 1966 to June 2016, when maintenance and updates to the database were halted. We utilize data beginning in January 1999, with Columbine happening months later on April 20, 1999, to study the occurrence of mass shootings as a more modern phenomenon. The data originally contained 335 observations but was reduced to 262 to reflect the altered starting date. Data can be accessed here: https://library.stanford.edu/projects/mass-shootings-america .

Gun Violence Archive (GVA) is a nonprofit group that compiles records of gun-related incidents in the United States. Incidents recorded involve four or more people shot, but not necessarily killed. New records are added in near real time, with data ranging from January 2012 until the present.
While some data sets exclude events such as gang violence, GVA does not set any limiting terms on its definition of a mass shooting other than the number of individuals shot or killed, leading to a data set that contains a greater number of events. For events in which the perpetrator is killed or commits suicide during the shooting, GVA also differs from the other data sets in that the perpetrator is included in the total number of victims.

Mother Jones is an investigative journalism organization that has compiled a collection of mass shootings under stricter criteria than others. With data ranging from 1982 until the present, Mother Jones initially recorded only incidents in which four or more people were killed. When the United States government redefined a mass killing to involve three or more people, Mother Jones followed suit, redefining the criterion for the database. For this analysis, the data are reduced to events taking place on or after January 1, 1999. Data can be accessed here: .

| Dataset | Beginning Date | End Date | Definition | Observations |
|---|---|---|---|---|
| Brady | February 2005 | January 2013 | 3+ killed | 477 |
| Stanford | January 1999 | June 2016 | 3+ shot | 262 |
| Mother Jones | January 1999 | February 2020 | 3+ killed | 92 |
| GVA | January 2012 | December 2019 | 4+ shot | 2024 |

Table 1: Summaries for each data set used in the analysis, including the time window of data used in the analyses, the definition of what constitutes a mass shooting, and the number of observations falling in the time window.

Figure 1: Monthly totals of the number of mass shootings for each data set. (Panels: Brady, GVA, Mother Jones, Stanford; x-axis: Year; y-axis: Number of Events per Month.)

Figure 2: Distribution of the number of victims for events in each data set. (Panels: Mother Jones, Stanford, Brady, GVA; x-axis: Number of victims; y-axis: Proportion of data set.)

Nonparametric Hawkes processes were fit to each data set listed in Section 3 using the MISD algorithm to estimate their conditional intensity functions. Initially, the spatial triggering component was included in the conditional intensity function, but it was later dropped as it was found to have little to no effect in triggering subsequent events. The remaining results focus on the triggering of the temporal and mark components, $g(t)$ and $k(m)$, respectively. Intervals for the temporal triggering function were chosen to reflect natural breaks in the inter-event time differences, i.e., 2 weeks, 3 months, 6 months, and more than 1 year, while the intervals for the marks triggering function were selected using quantiles to allocate a roughly equal number of events to each interval based on the number of victims. With discrete mark values, an exactly uniform division of events into quantiles could not be realized, as certain values accounted for a large proportion of the data that would otherwise have spanned several quantiles, specifically in the GVA data, in which roughly 55% of incidents involved four victims.

| Dataset | Diagonal Mass | Background Rate | Number Offspring | Number 13 Day Offspring |
|---|---|---|---|---|
| Brady | 10.39% | 0.0168 | 0.8980 | 0.1913 |
| Stanford | 28.42% | 0.0117 | 0.7186 | 0.4140 |
| GVA | 34.87% | 0.3484 | 0.6516 | 0.6146 |
| Mother Jones | 54.58% | 0.0065 | 0.4592 | 0.0043 |

Table 2: Numeric summaries of implementing the MISD algorithm for each data set. Diagonal mass indicates the percent of the probability matrix $P$ that lies on the main diagonal. Background rate is the estimated background rate of the data catalog. Number of offspring is the estimated number of events that are triggered offspring of previous events, and number of 13 day offspring is the estimated number of offspring occurring within 13 days of an event.

The diagonal mass of the probability matrix $P$, estimated background rate, average number of offspring events, and average number of offspring events occurring in the first two weeks are displayed in Table 2.
For most data sets, the majority of events are probabilistically treated as triggered events, with background events making up roughly 10% to 55% of observed mass shootings. The background rates for the Brady, Stanford, and Mother Jones data sets are estimated to be between 0.007 and 0.017 mass shooting events per day, while the background rate for GVA is substantially larger, with an estimated daily rate of 0.35 mass shootings.

For the Brady data set, the model estimated the expected number of offspring per mass shooting event to be roughly 0.90 events, with 0.19 of those events occurring in the first two weeks. This implies that for an event in the Brady data, 21% of the offspring events occur in the first two weeks, with the remaining 79% occurring sometime afterward. The Stanford data had an estimated expected number of offspring per event of 0.72, with just over half, 0.41, of these events occurring in the first two weeks. The GVA data set had a slightly smaller overall expected number of offspring per event than Brady or Stanford, with an expected number of 0.65 child events. However, the overwhelming majority, approximately 94%, of the offspring events occurred in the first two weeks. Meanwhile, the Mother Jones data had the smallest expected number of offspring per event, 0.46 child events per mass shooting, yet 99% of the child events occurred more than two weeks after the initial mass shooting.

The histogram estimators for the triggering functions of each data set are shown in Figures 3-6. For each plot, the estimated constants of the histogram estimator step functions are shown as a horizontal line spanning the time or mark sub-interval for which the constant was estimated. The grey vertical bars then represent ± standard errors for each estimated constant of the histogram estimator. The standard error bars are truncated at zero to reflect only values that plausibly represent the phenomenon of interest.
The temporal triggering functions, $g(t)$, are densities, and thus the areas underneath the step function represent the probabilities of a child event occurring over some time span. The marks triggering functions, $k(m)$, represent productivity multipliers, which increase or decrease the rate of triggered events based on the number of victims in prior mass shootings. The $x$-axes of the temporal triggering functions are truncated, as the functions tended towards zero as $t_j - t_i$, for $j > i$, grew larger; the $x$-axes of the marks triggering functions are truncated shortly after the final sub-interval, as shown graphically.

In general, with the exception of Mother Jones, the value of the temporal triggering function, $g(t)$, decreases monotonically as $t$ moves into each subsequent time bin. For the Brady data, the temporal triggering decreases more smoothly, from roughly 0.0152 to 0.0078 to 0.0018 and then down to 0. For the Stanford and GVA data, the decay is much more drastic: from 0.041 down to 0.0054 and then to zero over the first three time intervals in the Stanford data, and from 0.067 down to essentially zero over the first two time intervals in the GVA data. For the Mother Jones data, the temporal estimates of the triggering are more volatile, with estimates starting around 0.0007 and 0.008 for the first and second time intervals, rising to around 0.0034 in the third and fourth intervals, and finally falling to zero. The Mother Jones data is also unique in that the estimated constants of the triggering function are much smaller in value than those of the other data sets.

For the estimated triggering function of the marks for the Brady data, $k(m)$ had an estimated productivity of around 0.71 for the initial interval, which then increased to 1.43 for five victims before falling to 0.91 for 6-8 victims and 0.57 for nine or more victims.
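As a quick consistency check on the temporal estimates above, the area under the first step of $g(t)$, multiplied by the expected number of offspring per event, should roughly reproduce the short-horizon offspring counts in Table 2. The sketch below performs that arithmetic for the Brady and GVA values quoted in the text. The helper name `offspring_in_bin` is hypothetical, and the 14-day width of the first bin is an assumption based on the two-week interval described for the temporal bins.

```python
def offspring_in_bin(height, width_days, productivity):
    """Expected offspring per event that fall in one bin of the step function g.

    height * width_days is the probability mass of the bin (g is a density),
    and productivity is the expected total number of offspring per event.
    """
    return height * width_days * productivity


# Step heights and productivities quoted in the text and Table 2.
# The 14-day bin width is an assumption (the "2 weeks" interval).
brady_first_bin = offspring_in_bin(0.0152, 14, 0.8980)
gva_first_bin = offspring_in_bin(0.067, 14, 0.6516)

print(round(brady_first_bin, 3), round(gva_first_bin, 3))
```

Both products land close to the corresponding Table 2 entries (0.1913 for Brady and 0.6146 for GVA), with the small gaps attributable to rounding of the quoted step heights.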
The estimated mark triggering functions for Stanford and GVA share the same pattern of an initial increase followed by two descending values. For Stanford, the estimated $k(m)$ is therefore not monotone: it begins at 0.99 in the first bin, increases to 1.18 for five victims, then decreases to 0.41 for 6-7 victims and 0.14 beyond 7 victims. GVA begins at 0.55, increases to 0.83 for 5 victims, then falls to 0.80 and 0.21 for 6-9 and 10+ victims, respectively. For the Mother Jones data, $k(m)$ also followed a less consistent form, with the highest value of 1.24 in the first bin before falling to 0.19 for 7-10 victims and 0.0007 for 11-17 victims, and then rising to 0.28 for 18 or more victims.

Figure 3: Brady Campaign triggering functions. In the figure on the left, values of the temporal triggering function are plotted over time, with the time bins used in the analysis shown on the x-axis. In the figure on the right, values of the marks triggering function are plotted over the marks (number of people injured). Standard error regions are shown in gray, and latter time bins with $g(t) \approx 0$ and the final mark bin are truncated in the figure.

Figure 4: Stanford triggering functions. In the figure on the left, values of the temporal triggering function are plotted over time, with the time bins used in the analysis shown on the x-axis. In the figure on the right, values of the marks triggering function are plotted over the marks (number of people injured). Standard error regions are shown in gray, and latter time bins with $g(t) \approx 0$ and the final mark bin are truncated in the figure.

Figure 5: Mother Jones triggering functions. In the figure on the left, values of the temporal triggering function are plotted over time, with the time bins used in the analysis shown on the x-axis. In the figure on the right, values of the marks triggering function are plotted over the marks (number of people injured). Standard error regions are shown in gray, and latter time bins with $g(t) \approx 0$ and the final mark bin are truncated in the figure.

Figure 6: GVA triggering functions. In the figure on the left, values of the temporal triggering function are plotted over time, with the time bins used in the analysis shown on the x-axis. In the figure on the right, values of the marks triggering function are plotted over the marks (number of people injured). Standard error regions are shown in gray, and latter time bins with $g(t) \approx 0$ and the final mark bin are truncated in the figure.

(Figures 3-6 axes: left panels, $t$ (time in days) against $g(t)$; right panels, $m$ (number of victims) against $k(m)$.)

Figures 7-10 show the observed number of monthly mass shootings for each data source along with the estimated number of monthly shootings based on the models. The estimated values are computed by taking the median conditional intensity for each month and multiplying it by the length of the month. The models appear to fit the data fairly well in that the estimated number of monthly mass shootings tends to follow the trends in the observed values. The Mother Jones and Stanford data sets, Figures 9 and 8 respectively, contain instances where no mass shooting events occurred over a sequence of consecutive months. For these months, the models tended to over-estimate the number of events, as the model assumes a constant background rate.

Figure 7: Brady Campaign conditional intensity plot. The number of monthly mass shootings is plotted (solid line) over time. The median value of the estimated conditional intensity of the observed points is calculated for each month, multiplied by the number of days in each corresponding month, and plotted (dashed line) over time.

Figure 8: Stanford conditional intensity plot. The number of monthly mass shootings is plotted (solid line) over time. The median value of the estimated conditional intensity of the observed points is calculated for each month, multiplied by the number of days in each corresponding month, and plotted (dashed line) over time.

Figure 9: Mother Jones conditional intensity plot. The number of monthly mass shootings is plotted (solid line) over time. The median value of the estimated conditional intensity of the observed points is calculated for each month, multiplied by the number of days in each corresponding month, and plotted (dashed line) over time.

Figure 10: GVA conditional intensity plot. The number of monthly mass shootings is plotted (solid line) over time. The median value of the estimated conditional intensity of the observed points is calculated for each month, multiplied by the number of days in each corresponding month, and plotted (dashed line) over time.

(Figures 7-10 axes: Date against Number of Monthly Events; legend: Observed, Estimated.)

Super-thinning was implemented to evaluate each model’s fit to the individual data sets, with the tuning parameter, $b$, set to the median estimated conditional intensity for each source. To assess the overall fit of the model, the residual process for each data set is displayed as histograms in Figures 11-14. If the model fits the data well, then we would expect the histograms to demonstrate a roughly uniform distribution throughout the entire time window. Of the four data sets, the estimated model for the Mother Jones data appears the least uniform in shape, with substantial deviations throughout the time window. The residual process for the GVA data appears the most uniform overall, though also with some deviations. The residual processes for the Brady and Stanford data fall somewhere in the middle, with many time intervals appearing roughly uniform and some systematic deviations in certain time periods. The residual process for the Stanford data appears to have, in general, lower values prior to 2005 and slightly higher values in the years following, while the Brady residual process exhibits more of a unimodal distribution with a peak in values from 2008 to 2010.
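The three-step super-thinning procedure behind these residual histograms can be sketched for the purely temporal case as follows. This is an illustrative sketch rather than the authors' code: the function name `super_thin`, the toy step-function intensity, and the choice to simulate the superposed points by thinning a homogeneous Poisson process of rate $b$ are all assumptions made for the example.

```python
import random


def super_thin(times, intensity, b, T, rng=None):
    """Temporal super-thinning of a point pattern observed on [0, T].

    times:     observed event times
    intensity: function t -> estimated conditional intensity (must be > 0)
    b:         tuning rate; the target rate of the residual process
    Returns (retained, simulated, residual).
    """
    rng = rng or random.Random()

    # Step 1: keep each observed point with probability min(b / lambda, 1).
    retained = [t for t in times
                if rng.random() < min(b / intensity(t), 1.0)]

    # Step 2: simulate points with rate max(b - lambda(t), 0). Because this
    # rate never exceeds b, candidates from a rate-b Poisson process can be
    # accepted with probability max(b - lambda(t), 0) / b.
    simulated = []
    t = rng.expovariate(b)
    while t < T:
        if rng.random() < max(b - intensity(t), 0.0) / b:
            simulated.append(t)
        t += rng.expovariate(b)

    # Step 3: superpose the retained and simulated points.
    residual = sorted(retained + simulated)
    return retained, simulated, residual


# Hypothetical example: intensity 2.0 on [0, 50) and 0.5 on [50, 100), b = 1.
def toy_intensity(t):
    return 2.0 if t < 50 else 0.5


toy_rng = random.Random(1)
toy_times = [i + 0.5 for i in range(100)]
retained, simulated, residual = super_thin(
    toy_times, toy_intensity, b=1.0, T=100.0, rng=toy_rng)
```

In the toy example, points are only removed where the intensity exceeds `b` and only added where it falls below `b`, mirroring the description of thinning in high-intensity regions and superposing in low-intensity regions.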

In this article, we investigate the contagiousness of mass shootings by treating the data as a marked self-exciting point process and analyzing it through nonparametric Hawkes procedures. The contagiousness of mass shootings was previously studied by Towers et al. [2015], who reported that each mass shooting will incite at least 0.30 new events, brought on by an increase in the probability of events that lasts for 13 days after an event. The self-excitation contagion model utilized in the Towers analysis requires several parametric assumptions, including a distribution for the decay of contagiousness, a constant number of secondary events, and the duration of the contagion process. With little research on the contagiousness of mass shootings, circumventing the reliance on parametric assumptions through a nonparametric modeling framework is an important contribution to the study of this devastating phenomenon.

Figure 11: Brady Campaign histogram of super-thinned process. After super-thinning is implemented, the data are plotted over time, displaying the distribution of the super-thinned process.

Figure 12: Stanford histogram of super-thinned process. After super-thinning is implemented, the data are plotted over time, displaying the distribution of the super-thinned process.

Figure 13: Mother Jones histogram of super-thinned process. After super-thinning is implemented, the data are plotted over time, displaying the distribution of the super-thinned process.

Figure 14: GVA histogram of super-thinned process. After super-thinning is implemented, the data are plotted over time, displaying the distribution of the super-thinned process.

(Figures 11-14 axes: Date against frequency.)

Through our nonparametric approach, we see evidence that events may produce higher numbers of offspring than previous results suggest, with the estimated number of offspring ranging from 0.59 to 0.86, almost 3 times the value reported by Towers et al. [2015] when using the same Brady Campaign data set. We also note that a contagion effect exists within the first 13 days, with the expected number of offspring ranging from 0.03, based on the Mother Jones data, up to 0.60 with the GVA data set, and 0.18 events in the Brady data. The mean of these four values is 0.29, yielding an expected number of offspring within a 13 day period similar to the 0.30 reported by Towers et al. [2015]. Also, similar to the results found in the Towers article, we noted no substantial spatial effect using the nonparametric framework.

In Figures 3-6, the temporal histogram estimators tended to agree that the initial two-week period after a mass shooting event had larger contagion effects than later time periods, save for Mother Jones, whose temporal histogram estimator was much more volatile. This volatility might not be entirely unexpected given that the Mother Jones data set had slightly more than one-third of the number of observations of the next smallest data set but featured the longest time window of all the data sets. These factors imply that very few of the pairwise time differences between events in the Mother Jones data fall in the shorter time intervals. The GVA data, meanwhile, is by far the largest data source with the shortest time window and, as seen in Figure 6, shows that nearly all of the contagion factor occurs in the first two weeks.
This is likely due to many of the pairwise inter-event time differences occurring relatively quickly after previous events.

The triggering functions for the marks show much less consistency between the data sets but demonstrate the benefit of allowing the expected number of secondary events to vary depending on the size of the marks. The histogram estimator for the number of victims for the Brady Campaign, Figure 3, demonstrates that mass shootings with larger numbers of victims are more productive in spurring future events. The Stanford data meanwhile, Figure 4, shows that events with between four and seven victims were more productive than larger events with more than seven victims. A similar result was seen in the GVA data, Figure 6. Again, the histogram estimator for the Mother Jones data, Figure 5, is drastically different from the rest in that smaller events are more productive than larger events with high victim numbers. Furthermore, while the model appears to be finding some signal in regard to how the number of victims impacts the productivity of mass shooting events in spurring future events, it should be noted that there is a considerable amount of uncertainty in these estimates, as represented by the standard error bars, especially for the Stanford and GVA data.

Figures 11 - 14 show the results of super-thinning the point process models for the different data catalogs. By considering the uniformity of the super-thinned residual processes, we can evaluate the overall fit of the models, in that models that fit the data well should have a uniform appearance in the histograms. In Figure 11, we observe that the residual process for the Brady model has a unimodal appearance rather than the desired uniform distribution. Examining Figure 15, which shows the composition of the points for the super-thinned residual process for the Brady data, allows us to further investigate the unimodal distribution.
The simulated lines at the top of the plot show the points which were superposed, while the retained lines show the points of the original process which were retained after thinning. The points which were thinned are then shown at the bottom of the plot. From the figure, it is evident that the super-thinned process simulates events in areas of low intensity and removes events from areas of high intensity, but by simultaneously analyzing Figures 11 and 15, we see that a lack of sufficient thinning spurs departures from uniformity in the histogram. This lack of thinning then indicates that the model was not able to capture the full contagion effect present in the data.
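As a rough illustration of the super-thinning diagnostic described above, the following Python sketch applies it to a purely temporal process. It is a minimal sketch under stated assumptions, not the code used for the analyses in this article: the function name super_thin, the target rate k, the step-function demo_intensity, and the synthetic events are all hypothetical, and intensity is assumed to be a vectorized callable returning the fitted conditional intensity at any time. Observed points are retained with probability min(1, k / intensity), and new points are simulated at rate max(k - intensity, 0); if the fitted intensity is correct, the combined residual points should resemble a homogeneous Poisson process with rate k [Clements et al., 2012].

```python
import numpy as np


def super_thin(times, intensity, k, t_end, rng):
    """Super-thin event times on [0, t_end] toward a constant rate k."""
    times = np.asarray(times, dtype=float)
    lam = np.asarray(intensity(times), dtype=float)

    # Thinning: keep each observed event with probability min(1, k / lambda).
    keep_prob = np.minimum(1.0, k / np.maximum(lam, 1e-12))
    keep = rng.random(times.size) < keep_prob

    # Superposition: simulate a Poisson process with rate max(k - lambda, 0)
    # by accepting rate-k homogeneous candidates with probability (k - lambda)+ / k.
    candidates = rng.uniform(0.0, t_end, size=rng.poisson(k * t_end))
    add_prob = np.maximum(k - intensity(candidates), 0.0) / k
    simulated = candidates[rng.random(candidates.size) < add_prob]

    return {
        "retained": times[keep],
        "thinned": times[~keep],
        "simulated": np.sort(simulated),
    }


def demo_intensity(t):
    # Hypothetical fitted intensity: 0.5 events/day before day 50, then 3.0.
    return np.where(np.asarray(t, dtype=float) < 50.0, 0.5, 3.0)


rng = np.random.default_rng(1)
# Synthetic events drawn from the same step intensity (not a real catalog).
observed = np.sort(np.concatenate([
    rng.uniform(0.0, 50.0, size=rng.poisson(0.5 * 50.0)),
    rng.uniform(50.0, 100.0, size=rng.poisson(3.0 * 50.0)),
]))

k_demo = 1.0
result = super_thin(observed, demo_intensity, k_demo, t_end=100.0, rng=rng)
residual = np.sort(np.concatenate([result["retained"], result["simulated"]]))
counts, _ = np.histogram(residual, bins=10, range=(0.0, 100.0))
print({name: pts.size for name, pts in result.items()})
print("residual counts per 10-day bin:", counts)
```

A histogram of the combined retained and simulated times plays the role of the residual histograms in Figures 11 - 14. In this sketch, events are only ever removed where the intensity exceeds k and only ever added where it falls below k, which mirrors the thinned, retained, and simulated classes shown in Figure 15.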

Figure 15: Brady Campaign super-thinning plot. After super-thinning is implemented, events are plotted by their classification type over time, indicating events that were removed (thinned) from the data, events that were not removed (retained), and simulated events that were superposed into the data (simulated).

In Figure 12, we observe an approximately uniform distribution, save a few spikes and falls, most notably at the end of 2010. Throughout the Stanford catalog, super-thinning appears to be performing as expected, despite the abrupt increase in the number of shootings that can be seen in Figure 8. Figure 14 also displays an approximately uniform distribution after super-thinning the GVA catalog.

Figure 13 shows a non-uniform distribution of super-thinned residuals for the Mother Jones data. With so few events recorded in the Mother Jones data set, well-fitting models are more challenging to realize without adding further complexity to the model. In Figure 9, the frequency of observed events appears to vary considerably over time, with 40% of events occurring in only the last five years of the catalog. With such disparity in the frequency of events, fitting a single background rate for the entire process may oversimplify trends in the data; employing a nonconstant background rate may allow for a stronger representation of the data.

Varying data sets and definitions of mass shootings lead to seemingly inconsistent trends and results across analyses; more conclusive findings may be obtained with a more consistent definition of such events and better data collection methodologies. Comparisons of results across data sets can be difficult with data sources providing wildly different estimates; the Gun Violence Archive reports 2024 mass shootings over eight years, while the original Mother Jones data reports 118 incidents over nearly thirty-eight years.
Although Brady and Mother Jones both define mass shootings as events in which three or more individuals are killed, the numbers of events in the two data sets are starkly different. The Stanford data set offers a well-fitting model but excludes data after 2016. The GVA and Mother Jones data, as shown in Figure 1, have an upward trend in the number of mass shootings in later years; this trend may have also been evident in the Stanford data set had data collection been continued, potentially offering a broader understanding of mass shooting contagion, especially in later years.

Despite wildly different data and definitions, results are consistent in that a large percentage of mass shootings are probabilistically treated as triggered events through the application of the MISD algorithm. Such findings support previously studied assertions that mass shootings may be motivated by a contagion effect spread through media.
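The probabilistic treatment of events as background or triggered can be illustrated with a stripped-down version of the MISD idea of Marsan and Lengliné [2008]. The Python sketch below is a simplification under stated assumptions, not the model fitted in this article: it is temporal only (no spatial or victim-count components), it assumes a constant background rate mu and a piecewise-constant triggering function g on hypothetical bin edges given in days, and it runs on synthetic event times. The function name misd_temporal and all demo variables are illustrative.

```python
import numpy as np


def misd_temporal(times, bin_edges, t_end, n_iter=100):
    """Temporal-only MISD-style EM with a histogram triggering function."""
    times = np.sort(np.asarray(times, dtype=float))
    n = times.size
    widths = np.diff(bin_edges)

    # lag[j, i] = t_j - t_i; event i can only trigger a later event j.
    lag = times[:, None] - times[None, :]
    bin_idx = np.searchsorted(bin_edges, lag, side="right") - 1
    valid = (lag > 0) & (bin_idx >= 0) & (bin_idx < widths.size)
    safe_idx = np.clip(bin_idx, 0, widths.size - 1)

    # Arbitrary starting values (an assumption of this sketch).
    mu = 0.5 * n / t_end
    g = np.full(widths.size, 0.5 / widths.sum())

    for _ in range(n_iter):
        # E-step: probability that event j is background or triggered by i.
        trig = np.where(valid, g[safe_idx], 0.0)
        total = mu + trig.sum(axis=1)
        p_trig = trig / total[:, None]
        p_bg = mu / total

        # M-step: update the background rate and the histogram heights.
        mu = p_bg.sum() / t_end
        mass = np.bincount(bin_idx[valid], weights=p_trig[valid],
                           minlength=widths.size)
        g = mass / (n * widths)

    return mu, g, p_bg


rng = np.random.default_rng(0)
t_end_demo = 1000.0
# Synthetic catalog: uniform background events, each with Poisson(0.5)
# offspring at exponentially distributed delays (mean 5 days).
parents = rng.uniform(0.0, t_end_demo, size=150)
n_children = rng.poisson(0.5, size=parents.size)
children = np.repeat(parents, n_children) + rng.exponential(5.0, size=n_children.sum())
demo_times = np.concatenate([parents, children[children < t_end_demo]])

edges = np.array([0.0, 7.0, 14.0, 28.0, 56.0])
mu_hat, g_hat, p_bg = misd_temporal(demo_times, edges, t_end_demo)
print("background rate per day:", mu_hat)
print("expected offspring per event:", (g_hat * np.diff(edges)).sum())
print("estimated share of triggered events:", 1.0 - p_bg.mean())
```

In this simplified setting, the expected number of offspring per event (the sum of g times the bin widths) coincides with the estimated share of triggered events (one minus the mean background probability), which is the sense in which a large percentage of events can be treated as triggered.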

Conclusion

In this article, we assess the contagiousness of mass shootings using a nonparametric Hawkes process framework for a variety of data sources. This framework relies on fewer parametric assumptions than previous studies and detects a contagion effect which varies over both time and the number of victims. We also find that the level of contagion is contingent upon the data source used, as no definitive catalog of data for mass shootings yet exists.

Although the estimated conditional intensity for each process appears to closely mirror the true data process, more complex models with additional features may yield better-fitting models in the future. Specifically, adopting a nonconstant background rate over time and/or a productivity function which is allowed to vary over time would allow future models to capture temporal changes to these two components. More complex models might also allow for the incorporation of meaningful spatial attributes or additional relevant covariates. The models featured in this article then represent a baseline approach for the modeling of mass shootings, as the nonparametric framework we implemented is extensible and able to benefit from innovations made in other fields and applications.

References

Howard Bauchner, Frederick P. Rivara, Robert O. Bonow, Neil M. Bressler, Mary L. (Nora) Disis, Stephan Heckers, S. Andrew Josephson, Melina R. Kibbe, Jay F. Piccirillo, Rita F. Redberg, John S. Rhee, and June K. Robinson. Death by Gun Violence—A Public Health Crisis. JAMA Psychiatry, 74(12):1195–1196, 12 2017. ISSN 2168-622X. doi: 10.1001/jamapsychiatry.2017.3616. URL https://doi.org/10.1001/jamapsychiatry.2017.3616.

Erin Grinshteyn and David Hemenway. Homicide, suicide, and unintentional firearm fatality: Comparing the United States with other high-income countries, 2003. The Journal of Trauma, 70:238–43, 01 2011. doi: 10.1097/TA.0b013e3181dbaddf.

James N. Meindl and Jonathan W. Ivy. Mass shootings: The role of the media in promoting generalized imitation. American Journal of Public Health, 107(3):368–370, March 2017. ISSN 0090-0036. doi: 10.2105/AJPH.2016.303611.

Victor J. Dzau and Alan I. Leshner. Public health research on gun violence: Long overdue. Annals of Internal Medicine, 168(12):876–877, 2018. doi: 10.7326/M18-0579. PMID: 29554693.

Federal Bureau of Investigation. 2000 to 2018 active shooter incidents, Sep 2016.

Michael Jetter and Jay K. Walker. The Effect of Media Coverage on Mass Shootings. IZA Discussion Papers 11900, Institute of Labor Economics (IZA), October 2018. URL https://ideas.repec.org/p/iza/izadps/dp11900.html.

Sherry Towers, Andres Gomez-Lievano, Maryam Khan, Anuj Mubayi, and Carlos Castillo-Chavez. Contagion in mass killings and school shootings. PLOS ONE, 10(7):1–12, 07 2015. doi: 10.1371/journal.pone.0117259. URL https://doi.org/10.1371/journal.pone.0117259.

David Marsan and Olivier Lengliné. Extending earthquakes’ reach through cascading. Science, 319(5866):1076–1079, 2008. ISSN 0036-8075. doi: 10.1126/science.1148783. URL https://science.sciencemag.org/content/319/5866/1076.

Daryl J Daley and David Vere-Jones. An Introduction to the Theory of Point Processes Volume I: Elementary Theory and Methods. Springer Science & Business Media, 2004.

Alan G Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90, 1971.

Yosihiko Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401):9–27, 1988.

Yosihiko Ogata. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2):379–402, 1998.

Eric Warren Fox, Frederic Paik Schoenberg, and Joshua Seth Gordon. Spatially inhomogeneous background rate estimators and uncertainty quantification for nonparametric Hawkes point process models of earthquake occurrences. Ann. Appl. Stat., 10(3):1725–1756, 09 2016. doi: 10.1214/16-AOAS957. URL https://doi.org/10.1214/16-AOAS957.

Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, and Jure Leskovec. Seismic: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 1513–1522, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450336642. doi: 10.1145/2783258.2783401. URL https://doi.org/10.1145/2783258.2783401.

G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita. Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493):100–108, 2011. doi: 10.1198/jasa.2011.ap09546. URL https://doi.org/10.1198/jasa.2011.ap09546.

Michael D. Porter and Gentry White. Self-exciting hurdle models for terrorist activity. Ann. Appl. Stat., 6(1):106–124, 03 2012. doi: 10.1214/11-AOAS513. URL https://doi.org/10.1214/11-AOAS513.

Felipe Gerhard, Moritz Deger, and Wilson Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs. PLOS Computational Biology, 13(2):1–31, 02 2017. doi: 10.1371/journal.pcbi.1005390. URL https://doi.org/10.1371/journal.pcbi.1005390.

Sebastian Meyer, Johannes Elias, and Michael Hohle. A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68(2):607–616, 2012.

Fusakichi Omori. On the aftershocks of earthquakes. Journal of the College of Science, Imperial University of Tokyo, 7:111–120, 1894.

Tokuji Utsu. A statistical study on the occurrence of aftershocks. Geophys. Mag., 30:521–605, 1961.

Beno Gutenberg and Charles F Richter. Frequency of earthquakes in California. Bulletin of the Seismological Society of America, 34(4):185–188, 1944.

Eric Warren Fox, Frederic Paik Schoenberg, and Joshua Seth Gordon. A note on nonparametric estimates of space-time Hawkes point process models for earthquake occurrences. 2015.

Robert Alan Clements, Frederic Paik Schoenberg, and Alejandro Veen. Evaluation of space–time point process models using super-thinning. Environmetrics, 23(7):606–616, 2012. doi: 10.1002/env.2168. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/env.2168.

Frederic Paik Schoenberg. Multidimensional residual analysis of point process models for earthquake occurrences. Journal of the American Statistical Association, 98(464):789–795, 2003. doi: 10.1198/016214503000000710. URL https://doi.org/10.1198/016214503000000710.

K. Matthes. Brémaud, P.: Point Processes and Queues. Martingale Dynamics. Springer-Verlag, Berlin – Heidelberg – New York 1981, 373 S., 31 Abb., DM 88,–. Biometrical Journal, 30(2):248–249, 1988. doi: 10.1002/bimj.4710300220. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.4710300220.

Omnibus Consolidated Appropriations Act. Pub. L. No. 104-208, 1996.

Mike DeBonis and Ed O’Keefe. Here’s what Congress is stuffing into its $1.3 trillion spending bill.