WLAN-Log-Based Superspreader Detection in the COVID-19 Pandemic
Cheng Zhang, Yunze Pan, Yunqi Zhang, Adam C. Champion, Zhaohui Shen, Dong Xuan, Zhiqiang Lin, Ness B. Shroff
Department of Computer Science and Engineering, The Ohio State University; Department of Electrical and Computer Engineering, The Ohio State University; VirtualKare LLC
Abstract
Identifying “superspreaders” of disease is a pressing concern for society during pandemics such as COVID-19. Superspreaders represent a group of people who have many more social contacts than others. The widespread deployment of WLAN infrastructure enables non-invasive contact tracing via people’s ubiquitous mobile devices. This technology offers promise for detecting superspreaders. In this paper, we propose a general framework for WLAN-log-based superspreader detection. In our framework, we first use WLAN logs to construct contact graphs by jointly considering human symmetric and asymmetric interactions. Next, we adopt three vertex centrality measurements over the contact graphs to generate three groups of superspreader candidates. Finally, we leverage SEIR simulation to determine groups of superspreaders among these candidates, who are the most critical individuals for the spread of disease based on the simulation results. We have implemented our framework and evaluated it over a WLAN dataset with 41 million log entries from a large-scale university. Our evaluation shows superspreaders exist on university campuses. They change over the first few weeks of a semester, but stabilize throughout the rest of the term. The data also demonstrate that both symmetric and asymmetric contact tracing can discover superspreaders, but the latter performs better with daily contact graphs. Further, the evaluation shows no consistent differences among three vertex centrality measures for long-term (i.e., weekly) contact graphs, which necessitates the inclusion of SEIR simulation in our framework. We believe our proposed framework and these results may provide timely guidance for public health administrators regarding effective testing, intervention, and vaccination policies.
Keywords: Superspreader detection, WLAN logs, contact tracing, network analysis, COVID-19 pandemic
arXiv preprint [cs.SI]

Introduction
The COVID-19 pandemic has devastated many communities worldwide. The presence of the novel coronavirus (that causes COVID-19) in a community with high population density, such as a large public university, significantly increases the risk of contracting the disease. To fight COVID-19, contact tracing [8, 12, 17, 26, 27] is especially important to discover active individuals, known as superspreaders, who lead to numerous COVID-19 transmission cases. Tracing human contacts to understand superspreader events is vital for preventing the spread of disease in communities such as university campuses, and such tracing has thus attracted a flurry of research interest [20, 30, 35, 37].

Typically, contact tracing is conducted manually [7] (e.g., through questionnaires and interviews), initially collecting necessary information from infected patients (such as locations they visited and people with whom they had contact). Unfortunately, manual contact tracing can produce inaccurate results due to people’s unreliable memories and long delays. Hence, to fight the COVID-19 pandemic, researchers have developed numerous (partially) automated contact tracing systems. Recent efforts can be divided into two categories: client-based and infrastructure-based. Client-based approaches require pervasive deployment of apps on people’s mobile devices. Client-side apps leverage a wide variety of sources to track “encounters,” including records of credit card transactions [24], cryptographic tokens exchanged via Bluetooth Low Energy (BLE) [2, 22, 29, 33], or acoustic channels [20]. In contrast, infrastructure-based methods exploit existing infrastructure deployed worldwide, such as CCTV footage [28], locations measured using cellular networks [3], Wi-Fi hotspots [30], and GPS [4], without requiring client-side involvement. In this context, our paper presents an approach leveraging Wi-Fi local area network (WLAN) logs to identify potential superspreaders on the campus of a large public university.
However, leveraging WLAN logs for superspreader detection is nontrivial, with two major issues. First, conventional WLAN-based solutions (e.g., WiFiTrace [30]) infer whether students have been in contact with each other based on their associations with specific access points (APs) during time intervals longer than a minimum duration. Such symmetric contact detection neglects an important fact: the virus carried by people who have tested positive may infect others and replicate via pathogens in the environment. Therefore, others may be infected even if they linger in the environment over very short periods of time. Obviously, the current definition of human contact cannot handle this scenario. Second, existing Wi-Fi-based methods [30] quantify a superspreader by the number of associated devices from the same access point. However, the number of contacts may be unable to truly reflect how critical an individual is to spreading disease amidst the population. For example, previous work on vertex centrality measurement for social network analysis [18] demonstrates that the “importance” of a specific vertex in message-passing depends not only on the number of connected vertices, but also on the vertex’s location in the social network. Moreover, ground truth remains unknown in WLAN-based contact graphs, making it hard to understand how fast the disease propagates and progresses in order to determine superspreaders.

To tackle the first issue, we introduce asymmetric contact, a new type of human contact. Two persons in asymmetric contact are not necessarily associated with specific APs for the same period. For example, assume Persons A and B are in asymmetric contact. Person A may stay with one specific AP for only a short time, whereas Person B stays much longer. Due to Person B’s longer stay time, he generates a much stronger “environment” with his microbes than Person A does. If B tests positive, he may infect A even if the latter’s stay time is short. On the other hand, A will not infect B due to her short stay. Hence the contact between these two persons is asymmetric. When we count the contact number, B’s contact with A is counted, but A’s contact with B is not. The concept of asymmetric contact partially captures the notion of environmental infection [11].

As to the second issue, ideally, we can choose a vertex measure to determine superspreaders using either analytical solutions or prior experimental tests. Unfortunately, due to the diversity of contact graphs and the complexity of virus propagation, it is very difficult (if not impossible) to do so. In this paper, we propose an empirical approach. We include SEIR simulation, a necessary component in our solution, to ultimately determine superspreaders among the vertex-measure outputs. Specifically, we use the SEIR model to simulate the spread of the virus, followed by adaptive interventions on groups of superspreaders identified via different measures. We then finalize superspreaders who have the most crucial virus spread impacts, over the given contact graph, according to the simulations.

Incorporating the above two ideas, we propose a general framework for WLAN-log-based superspreader detection, which includes three key steps. First, we extract each individual’s trajectory from WLAN logs to construct contact graphs, where vertices correspond to individual students and edges indicate physical contacts. In particular, we include both symmetric and asymmetric contact tracing to reveal potential directional interactions. Second, we adopt three vertex centrality measurements to identify superspreader candidates. Finally, we use SEIR simulation to determine the superspreaders among these candidates.

Footnote: There is no scientific definition of a “superspreader”. We use a definition similar to that in [25]: superspreaders are people with far more social connections than others, are more likely to be infected, and, if infected, will infect many more people than the median.

Footnote: The physical environment represents an important source of pathogens that can cause infections or carry antibiotic resistance.

Figure 1: Campus buildings with AP deployment information (shaded). Other buildings include: 22 E. 16th Avenue, 53 W. 11th Avenue, Knight House, North Commons, Northwood-High Building, Raney Commons, Riverwatch Tower, and the Wexner Center for the Arts (not shown). We generate the map using Mapzen [21] with OpenStreetMap data [23].
The main findings of this work include the following: (1) We find that there is a group of students that is critical in spreading the virus throughout the university’s social contact networks. (2) We show the importance of symmetric and asymmetric contact tracing in superspreader detection. Specifically, we show that asymmetric contact tracing helps to discover hidden superspreaders in daily contact graphs and that proper interventions with identified superspreaders greatly boost efforts to contain the spread of disease. (3) We find that simple betweenness centrality better reveals the most critical individuals in daily contact graphs. We do not observe notable differences between vertex centrality measures in longer-term (i.e., weekly) contact graphs with epidemic control. (4) For resource-constrained quarantine, we observe that increasing the percentage of quarantined individuals to over 20% of the population yields limited extra benefits. (5) We find that superspreaders change heavily over the first few weeks, then remain stable during the rest of the semester. The similarity of superspreaders between the first 15 weeks and the first 20 weeks is around 0.8 using the rank-biased overlap metric [34], opening up opportunities to discover superspreaders as early as possible for efficient pandemic mitigation.
Practical significance for university/city administrators:
We believe our proposed contact tracing method will enable both proactive and reactive interventions. For the former, our method can help administrators rapidly identify superspreaders for health warnings and frequent testing, using data from just the first few weeks of the semester. For the latter, our method can assist efforts in contact tracing, quarantine, medical support, and prioritized patient care.

In summary, our main contributions are threefold:
• We propose a general framework for WLAN-log-based contact analysis and superspreader detection. The framework applies to a wide range of working scenarios based on users’ preferences, environmental dynamics, and resource availability.
• We present initial work using the WLAN-log-based superspreader detection framework, including asymmetric contact tracing, vertex centrality measurement, and simulation-based superspreader determination.
• We implement the framework and evaluate it on a large-scale real-world WLAN log dataset. Our empirical results show the efficacy of the proposed contact tracing approaches and uncover insightful findings for public health administrators.

The rest of this paper is organized as follows. Section 2 provides background on epidemic models. Section 3 presents our framework on WLAN-log-based superspreader detection. Section 4 illustrates our evaluation results and analyses. Section 5 reviews related work. Finally, Section 6 concludes the paper.
Figure 2: Illustration of the popular SEIR compartmental model in epidemiology. The population is assigned to one of several labeled compartments: Susceptible, Exposed, Infectious, or Recovered. The order of the labels usually shows flow patterns between the compartments with epidemiological parameters β, σ, and γ. Details are explained in the text.

In this section, we discuss the background of compartmental epidemic models, which are simplified mathematical models of infectious diseases [1, 13, 14, 15, 32]. Recently, the SEIR (Susceptible, Exposed, Infectious, Recovered) model has shown promise in combating COVID-19 through disease modeling [9, 25], forecasting [5, 31], and intervention [19]. In the SEIR model, the population is assigned to labeled compartments between which people move based on their health status.

Following the equivalent compartmental diagram shown in Figure 2, we can use the following differential equations to describe the SEIR model involving variables S, E, I, and R and their rates of change with respect to time t:

\[ \frac{dS}{dt} = -\frac{\beta I S}{N}, \qquad \frac{dE}{dt} = \frac{\beta I S}{N} - \sigma E, \qquad \frac{dI}{dt} = \sigma E - \gamma I, \qquad \frac{dR}{dt} = \gamma I, \tag{1} \]

where N = S + E + I + R is the total population size, β is the probability of transmitting disease between a susceptible and an infectious individual, σ is the inverse of the average incubation time (the rate of latent individuals becoming infectious), and γ is the recovery rate. In this model, recovered individuals are permanently immune to disease. In practice, all parameters are constant values that can be obtained via maximum likelihood estimation with real pandemic data. In this work, we leverage SEIR simulations [25] to model quarantine (self-isolation) of identified superspreaders (cf. subsection 4.1).

Figure 3 shows the overall framework of our WLAN-log-based superspreader detection. We describe each component as follows.
The WLAN logs often include the (dis)association of mobile devices with respect to APs. In this paper, we use the same dataset as in [6]. A sample log entry has the following format:
timestamp,process,ap-name,student-id,role,MAC,SSID,result
The fields in the log represent the event’s UNIX timestamp, the process that generated the log entry, the AP name, the encrypted student ID, the role assigned to the device, the anonymized MAC address (preserving the OUI), the SSID name, and the authentication result (success or failure), respectively.

Our WLAN dataset collection consists of three steps: (1) We first filter out students who use the university’s unsecured WLAN from the dataset. Some log entries lack the student ID or the AP name; we consider these entries invalid in this work. After removing invalid entries from the dataset, 39 million log entries remain. (2) Since WLAN logs only provide the association (arrival) time of the person at the corresponding AP, we need to estimate the disassociation (leave) time. We first sort the log entries of each student in ascending order (based on timestamps) to ensure sequential order. For APs within the same building, the stay time at each AP is the duration between the arrival time at the current AP and the arrival time at the next one. Following [6], we also calculate the estimated walking time between two buildings using the Google Maps API [10]. (3) In [6], the location granularity is building-level as that work focuses on human mobility measurement [16]. In contrast, we treat the AP as the base unit in the trajectory in order to study human proximity tracing. Therefore, after data processing, each user/MAC’s trajectory becomes a time series of APs and their corresponding stay times. A person’s trajectory T can be defined as:

\[ T = (AP_1, t_1, ST_1) \rightarrow (AP_2, t_2, ST_2) \rightarrow \cdots \rightarrow (AP_M, t_M, ST_M), \qquad t_1 < t_2 < \cdots < t_M, \]

where $AP_i$ is the i-th AP in trajectory T, $t_i$ is the arrival time of the person at $AP_i$, and $ST_i$ is the stay time of the person at $AP_i$. We refer to $(AP_i, t_i, ST_i)$ as a tracklet. Figure 4 (top) shows how we estimate stay time for intra- and inter-building AP connections, illustrated with Bob’s trajectory between two buildings.

Figure 3: Overview of WLAN-log-based superspreader detection. First, we extract contact graphs from WLAN logs via symmetric and asymmetric contact tracing. Second, we perform vertex centrality measurement to discover potential superspreaders. Finally, we simulate adaptive interventions using the SEIR model. Panels: (a) Data Resources; (b) WLAN Data Collection; (c) Contact Graph Construction; (d) Superspreader Detection; (e) SEIR Measurement.

Next, we describe the contact tracing method using persons’ trajectories. Given a student’s trajectory T with sequential tracklets, we take each tracklet as a query and apply beam search on all other students’ tracklets to determine if there is an overlapping duration for physical interaction between two persons.
Figure 4 shows an example where we consider two contact tracing methods, symmetric and asymmetric, to compute the overlapping duration between Bob and two other persons, Alice and Trudy.

Symmetric contact tracing.
Intuitively, if Bob and Alice connect to the same AP with a certain overlapping period, we assume there may be a potential physical interaction between them. Thus, given a tracklet $(AP_q, t_q, ST_q)$ from student q (e.g., Bob) and a tracklet $(AP_p, t_p, ST_p)$ from student p (e.g., Alice), we assign a bidirectional contact edge between q and p if $AP_q = AP_p$ and the following criterion is satisfied:

\[ ST_q + ST_p - \max\{t_q + ST_q,\; t_p + ST_p\} + \min\{t_q,\; t_p\} \geq d_{sym}, \tag{2} \]

where $d_{sym}$ is a constant value of symmetric contact duration. Empirically, we set $d_{sym}$ to 15 minutes in the experiments.

Figure 4: Contact tracing using persons’ trajectories. We show trajectories of three persons, i.e., Bob, Alice, and Trudy. At one AP, Bob’s stay time (red) is longer than the environmental infection duration (orange). There is a symmetric contact (blue) between Bob and Alice and an asymmetric contact (green) between Bob and Trudy. The top of the figure marks association events, stay times, and the estimated walking time between Building A and Building B.

Asymmetric contact tracing.
However, the above symmetric tracing method omits environmental infection (cf. section 1). In that situation, Bob may stay at an AP for a long enough period, making the environment infected. Thus, the virus will spread to another person, Trudy, even though the overlapping contact duration is short. To resolve this problem, we propose a new asymmetric contact tracing method that can discover such directional interactions. Concretely, we take Bob’s tracklet whose stay time $ST_q$ exceeds a certain duration $d_{env}$ and assign a directional contact edge between q (e.g., Bob) and p (e.g., Trudy) if $AP_q = AP_p$ and the following criterion is satisfied:

\[ (ST_q - d_{env}) + ST_p - \max\{t_q + ST_q,\; t_p + ST_p\} + \min\{t_q + d_{env},\; t_p\} \geq d_{asym}, \tag{3} \]

where $d_{env}$ and $d_{asym}$ are constant values of environmental infection time and asymmetric contact duration, respectively. Empirically, we set $d_{env}$ to 50 minutes and $d_{asym}$ to 5 minutes in the experiments.

Footnote (bidirectional contact edge): The virus can spread from person q to person p, and vice versa.

Footnote (directional contact edge): The virus may only spread from one person to another.

Figure 5: Symmetric, asymmetric, and hybrid contact graphs. We show different contact tracing results of a real case from a group of students in our WLAN dataset. (a) Contact graph only with symmetric tracing: the unfilled red nodes are overlooked due to short overlapped stay time with other blue nodes. (b) Contact graph with asymmetric tracing: we observe that filled red nodes are included if directional contact is considered. (c) Merging symmetric and asymmetric graphs to construct a hybrid graph: red nodes and edges indicate newly discovered information compared to the symmetric contact graph.
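The two contact criteria in Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Tracklet` class, the function names, and the choice of minutes as the time unit are assumptions made here, while the thresholds (15, 50, and 5 minutes) are the values reported in the text.

```python
from dataclasses import dataclass

# Thresholds reported in the text, in minutes.
D_SYM = 15.0   # symmetric contact duration
D_ENV = 50.0   # environmental infection time
D_ASYM = 5.0   # asymmetric contact duration


@dataclass(frozen=True)
class Tracklet:
    """Hypothetical container for one tracklet (AP_i, t_i, ST_i)."""
    ap: str      # access point name
    t: float     # arrival time, in minutes
    st: float    # stay time, in minutes


def symmetric_overlap(q: Tracklet, p: Tracklet) -> float:
    """Left-hand side of Eq. (2): overlap of the two stays (negative if disjoint)."""
    return q.st + p.st - max(q.t + q.st, p.t + p.st) + min(q.t, p.t)


def asymmetric_overlap(q: Tracklet, p: Tracklet, d_env: float = D_ENV) -> float:
    """Left-hand side of Eq. (3): overlap of p's stay with q's stay after d_env."""
    return (q.st - d_env) + p.st - max(q.t + q.st, p.t + p.st) + min(q.t + d_env, p.t)


def is_symmetric_contact(q: Tracklet, p: Tracklet, d_sym: float = D_SYM) -> bool:
    """Bidirectional edge between q and p if they share an AP long enough."""
    return q.ap == p.ap and symmetric_overlap(q, p) >= d_sym


def is_asymmetric_contact(q: Tracklet, p: Tracklet,
                          d_env: float = D_ENV, d_asym: float = D_ASYM) -> bool:
    """Directed edge q -> p if q stayed longer than d_env and p overlaps afterwards."""
    return q.ap == p.ap and q.st > d_env and asymmetric_overlap(q, p, d_env) >= d_asym


# Illustrative (made-up) tracklets: Bob stays 90 minutes, Alice 40, Trudy 8.
bob = Tracklet("AP-1", 0.0, 90.0)
alice = Tracklet("AP-1", 30.0, 40.0)
trudy = Tracklet("AP-1", 80.0, 8.0)
```

With these made-up tracklets, Bob and Alice overlap for 40 minutes and form a symmetric contact, whereas Bob and Trudy overlap for only 8 minutes: too short for Equation (2), but enough for a directed Bob-to-Trudy edge under Equation (3).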
Graph Merging.
Once both symmetric and asymmetric contact graphs are obtained, we merge the two graphs into one hybrid graph by aligning nodes and edges. The hybrid graph can reveal realistic contacts in social interactions as evidenced by WLAN logs. Figure 5 gives an example for each graph.
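One plausible reading of the merging step is sketched below. It assumes, as an interpretation rather than a detail stated in the text, that the hybrid graph is directed, that each symmetric (undirected) edge contributes both directions, and that each asymmetric edge contributes only its own direction. The function names are hypothetical.

```python
def merge_contact_graphs(symmetric_edges, asymmetric_edges):
    """Return the directed edge set of a hybrid contact graph.

    symmetric_edges: iterable of (u, v) pairs, treated as undirected.
    asymmetric_edges: iterable of (u, v) pairs, meaning u may infect v.
    """
    hybrid = set()
    for u, v in symmetric_edges:
        if u != v:                      # ignore self-contacts
            hybrid.add((u, v))
            hybrid.add((v, u))
    for u, v in asymmetric_edges:
        if u != v:
            hybrid.add((u, v))
    return hybrid


def to_adjacency(edges):
    """Convert a directed edge set into {vertex: set of out-neighbors}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # keep vertices with no outgoing edges
    return adjacency


# Made-up example: one symmetric contact and one asymmetric contact.
hybrid_edges = merge_contact_graphs([("bob", "alice")], [("bob", "trudy")])
hybrid_adjacency = to_adjacency(hybrid_edges)
```

Representing the result as an adjacency dictionary keeps vertices that only receive contacts (such as Trudy in this example) in the graph, which matters when N is used to normalize centrality scores.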
The reader may ask a key question about a vertex in the hybrid graph:
How “important” is a specific person in the spread of disease?
Centrality measurements [18] are designed to quantify a person’s importance, helping answer this question. Accordingly, the purpose of this subsection is not to propose a new metric for vertex measurement. Rather, we investigate the efficacy of three metrics in representing superspreaders in the Wi-Fi-based contact graphs. Figure 6 shows their differences.
Degree centrality.
Degree centrality is defined as the number of edges incident upon a vertex (i.e., the vertex’s number of social ties). If the network is directed, then two separate measures of degree centrality are defined: in-degree and out-degree. In this paper, we define each vertex’s out-degree as follows:

\[ \deg(u) = \frac{|E_u^{o}|}{N - 1}, \tag{4} \]

where $|E_u^{o}|$ is the total number of edges directed out of a vertex u in a directed hybrid contact graph, and N is the number of vertices in the graph.

Closeness centrality.
One common notion of centrality is a vertex’s “nearness” to many other vertices, which closeness centrality metrics aim to capture. For a given vertex, closeness centrality varies inversely with the vertex’s distance from all others. Formally, for a connected graph, this measure is defined as:

\[ cl(u) = \frac{1}{\sum_{v} dist(u, v)}, \tag{5} \]

where dist(u, v) denotes the geodesic (shortest-path) distance between vertices u and v. Intuitively, this measure looks at how fast information can spread from one vertex to all others. For example, a vertex that is close to many other vertices may easily transmit the disease to them.

Figure 6: Visualization of vertex centrality measurement. We show a one-day contact graph of a building on campus with (a) degree centrality, (b) closeness centrality, and (c) betweenness centrality measurements. The top, left, and right part of each indicates the relative frequency histogram, centrality graphs, and the top 10% of highlighted nodes (red), respectively. Warmer colors indicate larger values. Discrepancies among the three measurements are visible.

Betweenness centrality.
Another popular class of centralities is based upon the perspective that “importance” relates to a vertex’s position regarding paths in the graph. If we picture those paths as the routes by which communication takes place, vertices situated on many paths tend to be more critical to the communication process. Betweenness centrality metrics are aimed at summarizing the extent to which a vertex is located “between” other pairs of vertices:

\[ bw(u) = \sum_{s \neq u \neq t} \frac{\sigma(s, t \mid u)}{\sigma(s, t)}, \tag{6} \]

where $\sigma(s, t \mid u)$ is the total number of shortest paths between s and t that pass through u and $\sigma(s, t) = \sum_{v} \sigma(s, t \mid v)$. Vertices with high betweenness centrality are critical for maintaining graph connectivity.

SEIR Measurement.
Based on these centrality measures, we are able to identify potential superspreaders. Next, we perform adaptive interventions on those active nodes using SEIR simulations to determine which individuals are the most critical for the spread of disease.
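For concreteness, the three centrality measures in Equations (4) to (6) can be sketched in plain Python on an unweighted directed graph stored as an adjacency dictionary. This is an illustrative sketch under several assumptions: the function names are hypothetical, unreachable vertices are simply skipped in the closeness sum (Equation (5) is stated for connected graphs), and betweenness uses Brandes' accumulation with the standard convention that σ(s, t) counts all shortest paths from s to t, summed over ordered pairs (s, t).

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def out_degree_centrality(adj):
    """Eq. (4): number of outgoing edges divided by N - 1."""
    n = len(adj)
    if n < 2:
        return {u: 0.0 for u in adj}
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """Eq. (5): reciprocal of the summed distances to reachable vertices."""
    scores = {}
    for u in adj:
        total = sum(bfs_distances(adj, u).values())
        scores[u] = 1.0 / total if total > 0 else 0.0
    return scores


def betweenness_centrality(adj):
    """Eq. (6) via Brandes' algorithm, summed over ordered pairs (s, t)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []                    # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back toward s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Made-up three-vertex path a - b - c, stored with both edge directions.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

On this small path graph, vertex b has the highest score under all three measures; on real contact graphs the rankings can differ, which is the situation Figure 6 illustrates.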
In this section, we first describe our methodology. Next, we present our experimental results.
WLAN dataset.
We use the WLAN dataset from Cao et al. [6], which contains WLAN log data with demographic information at a large public university spanning 139 days in 2015. Cao et al. [6] found that university students’ mobility patterns change periodically on a weekly basis. In our study, we focus on analyzing the contact graph from a specific day of the week in the dataset. Specifically, we use the WLAN log information to compute the contact graph for each weekday from a randomly selected week in the dataset. We also report results on the contact graphs computed from a weekly period. We construct three types of contact graphs: symmetric, asymmetric, and hybrid.
Evaluation metrics.
Based on the SEIR model, we use the following realistic epidemiological measures to estimate the effect of different approaches:
• Doubling Time (day): the time it takes for the number of cumulative infections to double.
• Total Infected Fraction (%): the fraction of the total accumulated infected population during the entire epidemic.
• Peak Infected Time (day): the time required to infect the largest possible population.
• Peak Infected Fraction (%): the fraction of infected persons when peak infection is reached.

Table 1: Main results on single-day contact graph. We compare different methods with various centrality metrics. Next, we perform SEIR simulation by quarantining 100 persons based on these metrics. We observe that our hybrid graph, which jointly considers symmetric and asymmetric contact tracing, achieves better performance than the baseline model and the symmetric contact tracing method alone. DB-Time: Doubling Time (day); T-Inf: Total Infected Fraction (%); PK-Time: Peak Infection Time (day); PK-Inf: Peak Infection Fraction (%). Values in parentheses give the change of Hybrid relative to SymC for the same measure. The top result in each column is marked with an asterisk (*).

Method         Measure       DB-Time (↑)      T-Inf (↓)        PK-Time (↑)      PK-Inf (↓)
No quarantine  -             3.24             48.45            29.00            4.17
Random         -             3.29             44.90            30.40            3.91
SymC           Degree        5.61             40.69            40.80            2.53
Hybrid         Degree        4.91 (-0.70)     38.77 (-1.92)    40.20 (-0.60)    2.26* (-0.27)
SymC           Closeness     6.08             40.21            39.80            2.37
Hybrid         Closeness     6.59* (+0.51)    38.47 (-1.74)    41.00 (+1.20)    2.28 (-0.09)
SymC           Betweenness   5.44             42.61            39.80            2.51
Hybrid         Betweenness   5.65 (+0.21)     37.41* (-5.20)   41.20* (+1.40)   2.45 (-0.06)
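The four evaluation measures listed above can be computed from a simulated epidemic curve roughly as follows. This is a sketch under stated assumptions, not the evaluation code used in the paper: the function name and input layout are hypothetical, and doubling time is interpreted here as the time until cumulative infections first reach twice their initial value.

```python
def epidemic_metrics(days, infected, cumulative, population):
    """Summarize an epidemic curve with the four measures used in Table 1.

    days:       time points (in days), increasing
    infected:   number of currently infected persons at each time point
    cumulative: cumulative number of infections at each time point
    population: total population size
    """
    if not days or not (len(days) == len(infected) == len(cumulative)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assumption: doubling is measured relative to the first cumulative count.
    doubling_time = None
    if cumulative[0] > 0:
        target = 2 * cumulative[0]
        for day, total in zip(days, cumulative):
            if total >= target:
                doubling_time = day - days[0]
                break

    peak_index = max(range(len(days)), key=lambda i: infected[i])
    return {
        "doubling_time": doubling_time,
        "total_infected_pct": 100.0 * cumulative[-1] / population,
        "peak_time": days[peak_index],
        "peak_infected_pct": 100.0 * infected[peak_index] / population,
    }


# Made-up curve for illustration only.
example = epidemic_metrics(
    days=[0, 1, 2, 3, 4],
    infected=[50, 80, 120, 90, 40],
    cumulative=[50, 90, 150, 190, 200],
    population=1000,
)
```

For the made-up curve, cumulative infections first reach twice the initial 50 cases on day 2, which is also the day of peak infection.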
Experimental comparison.
We quarantine persons with higher centrality based on the hybrid contact graph and simulate the epidemic on the hybrid graph. We test three vertex centrality measurement methods and compare our results to the following baselines:
• No quarantine: we let the virus spread naturally on the hybrid graph without intervention.
• Random: we randomly quarantine a certain number of persons and simulate the epidemic on the hybrid graph.
• Symmetric contact tracing (SymC): we quarantine persons with higher centrality based on the symmetric contact graph and simulate the epidemic on the hybrid graph.
• Symmetric and asymmetric contact tracing (Hybrid): we quarantine persons with higher centrality based on the hybrid contact graph and simulate the epidemic on the hybrid graph.
Implementation details.
We follow [25] in order to simulate an epidemic using the SEIR model. We use the default SEIR parameters, as they are calculated from a real-world infection dataset. In particular, the total population size in our experiments is 3748. We set the initial number of infected persons to 50, which we fix across all experiments. To obtain stable observations, we run our simulation 50 times in each group of experiments until convergence is reached.
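To make the SEIR dynamics of Equation (1) concrete, the sketch below integrates the four differential equations with a simple forward-Euler scheme. It is a deterministic, well-mixed illustration only and does not reproduce the graph-based simulation of [25]. The population size (3748) and initial infected count (50) come from the text, whereas the values of beta, sigma, and gamma below are placeholders chosen for illustration, not the default parameters used in the experiments.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate Eq. (1) with forward Euler; returns one (day, S, E, I, R) row per day."""
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    steps_per_day = round(1.0 / dt)
    history = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * i * s / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((day, s, e, i, r))
    return history


# Placeholder rates (assumptions): beta = 0.5, sigma = 0.2, gamma = 0.1 per day.
seir_history = simulate_seir(
    population=3748, initial_infectious=50,
    beta=0.5, sigma=0.2, gamma=0.1, days=140,
)
```

Because every flow leaves one compartment and enters another, the sum S + E + I + R stays equal to the population size throughout, which is a useful sanity check for any implementation of Equation (1).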
Main results on single-day contact graph.
We report the main results of a single-day contact graph in Table 1. Identifying superspreaders using a hybrid graph with asymmetric and symmetric contact tracing outperforms baseline methods substantially in terms of all centralities, justifying our motivation: symmetric and asymmetric contact tracing, which naturally reflects environmental infection, can be a valuable factor to contain the spread of disease. In addition, we find similar observations from other days of the week in the WLAN dataset. Next, we detail our analyses.

Footnote: Other toolkits could be used to simulate the spread of disease elsewhere.

Figure 7: Effect of the infected population on the spread of the pandemic. We select 100 students and set their initial conditions as infectious based on different criteria. We run SEIR simulation and show the fraction of the infected population on different days. Axes: day (horizontal) and fraction of population (vertical). Curves: randomly infect 100 people; infect betweenness top 100; infect closeness top 100; infect degree top 100.
Superspreaders exist on the university campus.
We notice that both SymC and Hybrid significantly outperform the no-quarantine baseline and “random quarantine,” suggesting the existence of superspreaders and the importance of contact tracing to limit the spread of disease. To analyze these superspreaders’ extent of spread, we conduct a simulated comparison by initializing different groups of individuals as infectious. As shown in Figure 7, we observe that the virus carried by students with higher centrality causes a much faster spread than with randomly selected students. Further, students with higher betweenness centrality are critical to the spread.
Asymmetric contact tracing is efficient.
We found that asymmetric contact tracing with a simple vertex measure leads to a notable gain for all metrics. Especially for the total infected fraction (T-Inf), Hybrid is ∼1% better than symmetric contact tracing (SymC), which represents around 40 persons in our WLAN dataset. We also show the SEIR simulation curves in Figure 8: both symmetric and asymmetric contact tracing methods significantly outperform random quarantine methods, demonstrating the effectiveness of our superspreader detection framework.
Betweenness centrality strongly limits the total infected population on daily contact graphs.
By comparing different centrality measurements for the selection of quarantine populations, we found that betweenness centrality leads to the strongest reduction in the total infected fraction in the daily contact graph (cf. Table 1). One reason is that betweenness metrics can effectively discover vertices that sit on many paths and are thus likely more critical to the spread process in social graphs. This verifies our observation in Figure 6 that betweenness centrality identifies a very different group of persons compared to degree centrality and closeness centrality (cf. subsection 3.3).

Further, we extend the simulation to contact graphs computed over longer weekly periods. Compared to daily contact graphs, weekly graphs generated from the WLAN logs are more densely connected. We focus on betweenness centrality and Table 2 shows the results. We find that the difference between the symmetric and the hybrid graphs is marginal. This is because, in long-term contact tracing, the top superspreaders between the symmetric and hybrid graphs overlap highly, suggesting that early-stage pandemic control helps identify superspreaders who may be missed otherwise. We observe similar patterns for other weeks throughout the study period.

Figure 8: Spread of the pandemic during the period. We show comparison results on (a) degree centrality, (b) closeness centrality, and (c) betweenness centrality measurements. Our asymmetric contact tracing and symmetric contact tracing (green and red) outperform the baseline approaches with random quarantine (gray).

Table 2: Results on one-week contact graph. We compare different methods using a one-week contact graph. We perform SEIR simulation by quarantining 100 persons based on centrality metrics. We use betweenness centrality to discover superspreaders. CM: Centrality Measure; DB-Time: Doubling Time (day); T-Inf: Total Infected Fraction (%); PK-Time: Peak Infection Time (day); PK-Inf: Peak Infection Fraction (%).

Method         DB-Time (↑)   T-Inf (↓)   PK-Time (↑)   PK-Inf (↓)
No quarantine  0.98          86.15       13.04         17.17
Random         0.98          83.93       13.04         16.61
SymC           1.09          81.88       13.96         16.18
Hybrid         1.11          82.57       13.84         16.17

How to perform quarantine with constrained resources?
Next, we study suitable proportions of the population for intervention. We show the total infected fraction with respect to different amounts of infectious and quarantined populations. Figure 9 shows the results based on the betweenness centrality measure. We observe a clear turning point where quarantining 20% of the whole population reduces the spread of disease across all initial infected ratios. This suggests that increasing the quarantine percentage over 20% provides only marginal benefits.
Figure 9: Effect of infected population in the spread of the pandemic. We show the fraction of total infected in terms of fractions of initial infected and quarantined people.
Will superspreaders change during the whole semester?
To further analyze the stability of superspreaders among different periods, we compute the similarities of the identified superspreaders from any two accumulated weeks; Figure 10 shows the results. In this study, we first generate the contact graphs based on the first N weeks in the WLAN dataset, where N ranges from 1 to 20. Next, we select the top 100 students based on our centrality measurements. We adopt rank-biased overlap (RBO) [34] to compute the similarity of two ranked student lists from any two accumulated weeks. Our results show that the superspreaders change during the first few weeks but remain stable throughout the rest of the semester. For example, the similarity between the first 15 weeks and the first 20 weeks is around 0.8, which opens up opportunities to discover the superspreaders as early as possible for efficient pandemic mitigation.
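As a rough illustration of how two ranked top-100 lists could be compared with rank-biased overlap, the following sketch computes the extrapolated form of RBO for two equal-length rankings without ties. The function name rank_biased_overlap, the persistence parameter p = 0.9, and the choice of the extrapolated variant are assumptions made here for illustration; the text does not state which settings were used.

```python
def rank_biased_overlap(list_a, list_b, p=0.9):
    """Extrapolated RBO for two equal-length rankings without duplicate items.

    RBO_ext = (X_k / k) * p^k + ((1 - p) / p) * sum_{d=1..k} (X_d / d) * p^d,
    where X_d is the number of items shared by the two depth-d prefixes.
    """
    if len(list_a) != len(list_b):
        raise ValueError("this sketch assumes equal-length rankings")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    k = len(list_a)
    if k == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    overlap = 0          # X_d, updated incrementally as the depth grows
    weighted_sum = 0.0   # running sum of (X_d / d) * p^d
    for d in range(1, k + 1):
        a, b = list_a[d - 1], list_b[d - 1]
        if a == b:
            overlap += 1
        else:
            if a in seen_b:
                overlap += 1
            if b in seen_a:
                overlap += 1
        seen_a.add(a)
        seen_b.add(b)
        weighted_sum += (overlap / d) * p ** d
    return (overlap / k) * p ** k + (1.0 - p) / p * weighted_sum


# Hypothetical student identifiers ranked by centrality in two accumulated periods.
weeks_1_to_15 = ["s01", "s02", "s03", "s04", "s05"]
weeks_1_to_20 = ["s02", "s01", "s03", "s05", "s06"]
print(rank_biased_overlap(weeks_1_to_15, weeks_1_to_20))
```

Smaller values of p put more weight on agreement at the very top of the two lists, so the choice of p affects how strongly early-rank disagreements lower the similarity.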
Figure 10: Similarity matrix of superspreaders between accumulated weeks (both axes: accumulated weeks). (a) Degree centrality. (b) Closeness centrality. (c) Betweenness centrality.
Researchers have devoted considerable attention to mobile application (app) technology for COVID-19 contact tracing. For example, Covid Watch [33] uses Bluetooth signals to detect when users are near each other and alerts them anonymously if they were in contact with someone who is later diagnosed with COVID-19. Similarly, PACT [29] uses inter-phone Bluetooth communications (including energy measurements) as a proxy for inter-person distance measurement. Through applied cryptography, this system can collect and maintain weeks of contact events. Later, PACT augments these events with infection notifications, leading to exposure notifications to all mobile phone owners who have had medically significant contact (in terms of distance and time) with infected people in the past medically significant period (e.g., two weeks). In addition, Singapore launched the TraceTogether [22] app to boost COVID-19 contact tracing efforts. Users who download the app and consent to participate can “proactively help” in the contact tracing process [22]. The app works by exchanging short-range Bluetooth signals between phones to detect other app users who are nearby. Apple and Google [2] are working together for the first time on a protocol that will alert users if they have been exposed to the coronavirus. Luo et al. propose A-Turf [20], an acoustic encounter detection method for COVID-19 contact tracing. Compared with Bluetooth technology, A-Turf more precisely detects encounters within 6-foot ranges (social distancing). Unlike the WLAN-log-based contact tracing presented in this paper, client-based contact tracing requires users’ widespread adoption and active participation.
Infrastructure-based methods take advantage of existing infrastructure deployed worldwide, such as CCTV footage [28], locations measured using cellular networks [3], Wi-Fi hotspots [30, 36], and GPS [4], without requiring client-side involvement. Similar to our approach, recent efforts [30, 36] use passive Wi-Fi sensing for network-based contact tracing for infectious diseases, particularly focused on the COVID-19 pandemic. Those works mainly use location occupancy or the number of contacts as the measure to identify superspreaders, while we consider different types of centrality for measuring the “importance” of a vertex in the social network. Moreover, we adopt SEIR simulation to determine which measure better discovers the superspreaders.
In this paper, we focused on WLAN-log-based superspreader detection in the COVID-19 pandemic. We proposed a general framework with applications to a wide range of working scenarios based on users’ preferences, environmental dynamics, and resource availability. Moreover, we presented asymmetric contact, a new type of human contact. The concept of asymmetric contact partially captures the notion of environmental infection. We required that persons in asymmetric contact have a certain overlap time between their association times with a specific AP. In fact, we can generalize by eliminating this constraint and treating the overlap time as a control knob to adjust the degree of “asymmetry”. Due to space limitations, this remains part of our future work. We have implemented our framework, conducted an extensive evaluation, and obtained a set of important findings. Our proposed contact tracing framework and our findings provide a tool as well as guidelines for public health administrators regarding both proactive and reactive interventions against the pandemic.
Acknowledgement
The work was supported in part by the National Science Foundation (NSF) under Grant No. CNS 2028547. Any opinions, findings, conclusions, and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
References
[1] Roy M Anderson and Robert M May. 1992. Infectious diseases of humans: dynamics and control
[...] arXiv preprint arXiv:2010.14558 (2020).
[4] Jason Bay, Joel Kek, Alvin Tan, Chai Sheng Hau, Lai Yongquan, Janice Tan, and Tang Anh Quy. 2020. BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders. Government Technology Agency-Singapore, Tech. Rep (2020).
[5] Andrea L Bertozzi, Elisa Franco, George Mohler, Martin B Short, and Daniel Sledge. 2020. The challenges of modeling and forecasting the spread of COVID-19. arXiv preprint arXiv:2004.04741 (2020).
[6] Paul Y Cao, Gang Li, Adam C Champion, Dong Xuan, Steve Romig, and Wei Zhao. 2017. On human mobility predictability via WLAN logs. In IEEE INFOCOM
[...] Proceedings of the Royal Society of London. Series B: Biological Sciences
[...] DOI
[...] The Lancet Global Health (2020).
[13] Herbert W Hethcote. 1989. Three basic epidemiological models. In Applied mathematical ecology. Springer, 119–144.
[14] Herbert W. Hethcote. 2000. The Mathematics of Infectious Diseases. SIAM Rev. 42, 4 (2000), 599–653. https://doi.org/10.1137/S0036144500371907
[15] William Ogilvy Kermack and Anderson G McKendrick. 1927. A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character
[...] IEEE INFOCOM.
[17] Don Klinkenberg, Christophe Fraser, and Hans Heesterbeek. 2006. The effectiveness of contact tracing in emerging epidemics. PloS one 1, 1 (2006), e12.
[18] Eric D Kolaczyk and Gábor Csárdi. 2014. Statistical analysis of network data with R. Vol. 65. Springer.
[19] Abby Leung, Xiaoye Ding, Shenyang Huang, and Reihaneh Rabbany. 2020. Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies. arXiv preprint arXiv:2010.03081 (2020).
[20] Yuxiang Luo, Cheng Zhang, Yunqi Zhang, Chaoshun Zuo, Dong Xuan, Zhiqiang Lin, Adam C Champion, and Ness Shroff. 2020. ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing. arXiv preprint arXiv:2006.13362
[...] medRxiv (2020).
[26] Marcel Salathé, Christian L Althaus, Richard Neher, Silvia Stringhini, Emma Hodcroft, Jacques Fellay, Marcel Zwahlen, Gabriela Senti, Manuel Battegay, Annelies Wilder-Smith, et al. 2020. COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation. Swiss medical weekly
[...] ITU Journal on Future and Evolving Technologies (2020).
[28] D Skoll, JC Miller, and LA Saxon. 2020. COVID-19 testing and infection surveillance: Is a combined digital contact tracing and mass testing solution feasible in the United States? Cardiovascular digital health journal (2020).
[29] Private Automated Contact Tracing. 2020. https://pact.mit.edu/. (Accessed on 02/15/2021).
[30] Amee Trivedi, Camellia Zakaria, Rajesh Balan, and Prashant Shenoy. 2020. WiFiTrace: Network-based Contact Tracing for Infectious Diseases Using Passive WiFi Sensing. arXiv preprint arXiv:2005.12045 (2020).
[31] Jan-Diederik Van Wees, Sander Osinga, Martijn van der Kuip, Michael Tanck, M Hanegraaf, M Pluymaekers, O Leeuwenburgh, L Van Bijsterveldt, J Zindler, and MT Van Furth. 2020. Forecasting hospitalization and ICU rates of the COVID-19 outbreak: An efficient SEIR model. Bull World Health Organ (2020).
[32] Emilia Vynnycky and Richard White. 2010. An introduction to infectious disease modelling
[...] ACM Transactions on Information Systems (TOIS) 28, 4 (2010), 1–38.
[35] Haohuang Wen, Qingchuan Zhao, Zhiqiang Lin, Dong Xuan, and Ness Shroff. 2020. A study of the privacy of covid-19 contact tracing apps. In International Conference on Security and Privacy in Communication Systems. Springer, 297–317.
[36] Camellia Zakaria, Amee Trivedi, Michael Chee, Prashant Shenoy, and Rajesh Balan. 2020. Analyzing the Impact of Covid-19 Control Policies on Campus Occupancy and Mobility via Passive WiFi Sensing. arXiv preprint arXiv:2005.12050 (2020).
[37] Qingchuan Zhao, Haohuang Wen, Zhiqiang Lin, Dong Xuan, and Ness Shroff. 2020. On the accuracy of measured proximity of bluetooth-based contact tracing apps. In