Proximity Inference with Wifi-Colocation during the COVID-19 Pandemic
Mikhail Dmitrienko, Abhishek Singh, Patrick Erichsen, Ramesh Raskar
Mikhail Dmitrienko
PathCheck Foundation, Cambridge, MA
Abhishek Singh
MIT Media Lab, Cambridge, MA
Patrick Erichsen
PathCheck Foundation, Cambridge, MA
Ramesh Raskar
MIT Media Lab, Cambridge, MA
Abstract
In this work we propose a WiFi colocation methodology for digital contact tracing. The approach works by having a device scan and store nearby access point information to perform proximity inference. We make our approach resilient to different practical scenarios by configuring a device to turn into a hotspot if access points are unavailable, which makes the approach feasible in both dense urban areas and sparse rural places. We compare various shortcomings and advantages of this work over other conventional ways of doing digital contact tracing. Preliminary results indicate the feasibility of our approach for determining proximity between users, which is relevant for improving existing digital contact tracing and exposure notification implementations.
Privacy-preserving proximity inference is crucial to the success of a digital contact tracing solution. The majority of mainstream solutions use either Bluetooth or GPS-based colocation to achieve this. However, a third option that is not as widely discussed is WiFi colocation. In this paper, we investigate a hybrid implementation of three WiFi colocation approaches described in Sapiezynski et al., Das et al., and Carreras et al. The hybrid implementation functions as follows: when a user is actively using WiFi, a deterministic if/else classifier infers proximity using features extracted from scans of nearby WiFi access points (APs). When a user is not actively using WiFi, a duty cycle that rotates the device between acting as a WiFi hotspot and a WiFi receiver allows for proximity inference without interfacing with APs. We conduct an experiment to see whether three proximity features (Pearson correlation, Jaccard similarity, and Das proximity) can be used as reliable proxies for distance, and we assess the performance of a simple classifier using these features. Lastly, we assess potential privacy issues associated with WiFi-based colocation.
A number of groups have developed methods to perform colocalization using WiFi signals. One of the earliest papers on the subject was Meunier [7] in 2004, which proposed calculating the Manhattan distance between the signal strength vectors of two devices on a network to detect their proximity. That same year, Krumm [5] filed a patent for a network system in which clients send MAC address and signal strength data to a central server that estimates the likelihood of their proximity using a polynomial regression model. More contemporary papers like Sapiezynski et al. [9] and Das et al. [3] used many of the same features from [5] to develop machine learning models. Sapiezynski et al. trained a Gradient Boosting classifier, while Das et al. used an unsupervised random walk algorithm which outperformed Sapiezynski's supervised approach. Other noteworthy WiFi colocation and distance estimation papers include Nakatani et al. [8], which trained a convolutional neural network to estimate distance for indoor navigation and Wi-Fi geo-fencing, and Carreras et al. [2], which proposed a method that cycles devices between acting as hotspots and signal receivers to work around privacy
issues associated with access point-based colocation. Following the outbreak of COVID-19, a number of groups have explored ways to leverage WiFi signal processing to help enforce social distancing guidelines and improve digital contact tracing efforts. Trivedi et al. [11] proposed a network-centric approach that uses maintenance logs to detect proximity between users connected to an enterprise network. A prototype of their approach has already been implemented on two college campuses. Harvard has also started using a network-centric approach for WiFi-based contact tracing [1]. Gupta et al. [4] developed a privacy-preserving data collection system that allows organizations to gain insights on social distancing adoption and facilitate contact tracing using WiFi connectivity data. VContact [6] proposed a method for collecting access point information similar to our method and allows infected individuals to share this information, which healthy users download and query locally.

Preprint. Under review.

Figure 1: Devices scan for the signal strengths and MAC addresses from nearby access points. Engineered features based on this data are then used to predict proximity between users.

We propose a deterministic if/else classifier that uses the following three features:
• Pearson correlation of signal strengths from overlapping APs between two devices
• Jaccard similarity between the lists of APs
• "Proximity feature" described in Das et al. [3], which we will refer to as "Das proximity"

The classifier makes its prediction according to the following logic: $\mathit{scan}_{ij} = \{\mathit{Jaccard}_{ij}, \mathit{Pearson}_{ij}, \mathit{Das}_{ij}\}$ represents the vector of the Jaccard similarity, Pearson correlation, and Das proximity for the scan recorded by subject $i$ at distance $j$ from the access point. $\mathit{avgMetrics}_{k} = \{\mathit{avgJaccard}_{k}, \mathit{avgPearson}_{k}, \mathit{avgDas}_{k}\}$ represents the vector of the average Jaccard, Pearson, and Das values at the distance threshold $k$ for proximity.

Algorithm 1: Proximity classifier logic
Inputs: scans_ij, i, j ∈ I, J; avgMetrics_k
Outputs: predictions
predictions ← ∅
for scan_ij, ∀ i, j ∈ I, J do
    if scan_ij[0] > avgMetrics_k[0] or scan_ij[1] > avgMetrics_k[1] or scan_ij[2] > avgMetrics_k[2] then
        predictions_ij ← true
    else
        predictions_ij ← false
    end if
end for

3.2 Hotspot Duty Cycle
In settings where an individual is not connected to a WiFi access point, we propose an implementation of the hotspot duty cycle described in [2]. An immediate limitation of this approach is that it is only possible on Android devices: the necessary APIs are not exposed on iOS. For individuals with an Android device, Carreras et al. [2] describe four distinct advantages to an access point-based approach to colocation:
i) it does not require any additional access points to be present aside from the user's device
ii) it only requires information from pairwise interactions
iii) users do not need to actively connect to any access points
iv) it provides accuracy in the range of 0.5 m to 1.0 m across a range of settings

Native APIs are available on Android to programmatically create and destroy a hotspot on a user's device for the duty cycle logic. In order to easily integrate this logic into a React Native app, we developed a set of React Native bindings for the underlying Android code. With these bindings available, future work would include an implementation of this duty cycle in an npm package. This package would have the ability to detect when a user is connected to a WiFi access point and, if not, it would begin to perform a hotspot duty cycle as described in [2]. The package could be included in a React Native application to perform proximity inference with WiFi colocation while providing the least disruptive experience to an end user.
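Returning to the classifier, the decision rule of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `classify_scan` and `classify_all`, the dictionary layout keyed by (subject, distance), and all numeric values are assumptions made for the example and do not come from the study data.

```python
from typing import Dict, Tuple

# Feature order follows Algorithm 1: (Jaccard, Pearson, Das proximity).
Features = Tuple[float, float, float]


def classify_scan(scan: Features, avg_metrics: Features) -> bool:
    """Predict 'in proximity' if ANY feature strictly exceeds its threshold average."""
    return any(value > threshold for value, threshold in zip(scan, avg_metrics))


def classify_all(
    scans: Dict[Tuple[int, int], Features], avg_metrics: Features
) -> Dict[Tuple[int, int], bool]:
    """Apply the rule to every scan, keyed by (subject i, distance j)."""
    return {key: classify_scan(features, avg_metrics) for key, features in scans.items()}


# Hypothetical threshold averages at distance threshold k (made-up numbers).
avg_metrics_k: Features = (0.5, 0.6, 0.4)

# Hypothetical scans: subject 0 at 3 ft and 20 ft, subject 1 at 5 ft.
scans = {
    (0, 3): (0.7, 0.2, 0.1),   # Jaccard above threshold -> True
    (0, 20): (0.1, 0.2, 0.3),  # nothing above threshold -> False
    (1, 5): (0.5, 0.6, 0.4),   # exactly equal, not strictly greater -> False
}

predictions = classify_all(scans, avg_metrics_k)
print(predictions)
```

Because the three comparisons are joined with "or", a single feature above its average is enough for a positive prediction, so the rule leans toward flagging proximity rather than missing it.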
We launched a study to see whether the features from Section 3.1 could be used as reliable proxies for distance. We recruited six subjects and instructed them to collect WiFi sensor log data using an Android app. Subjects recorded their first scan right next to the access point in their indoor living environment. Then, they took scans at 1 ft intervals away from their AP until they were 25 ft away or as far as space permitted. By comparing the WiFi logs of the scan at each distance interval to those of the initial scan right next to the access point, we were able to capture how the three features changed with distance. These calculated features were then averaged over all the scans.
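To make the scan comparison concrete, the sketch below computes two of the three features, Jaccard similarity and Pearson correlation, between a reference scan and a scan taken farther away. It assumes each scan is a mapping from AP MAC address to signal strength in dBm; the function names, the MAC addresses, and the readings are hypothetical. Das proximity is omitted because its definition is given in [3] rather than here.

```python
from math import sqrt
from typing import Dict

Scan = Dict[str, float]  # assumed layout: AP MAC address -> signal strength (dBm)


def jaccard_similarity(scan_a: Scan, scan_b: Scan) -> float:
    """Jaccard similarity between the sets of APs seen in two scans."""
    aps_a, aps_b = set(scan_a), set(scan_b)
    union = aps_a | aps_b
    if not union:
        return 0.0  # assumption: two empty scans are treated as dissimilar
    return len(aps_a & aps_b) / len(union)


def pearson_correlation(scan_a: Scan, scan_b: Scan) -> float:
    """Pearson correlation of signal strengths over APs present in both scans."""
    shared = sorted(set(scan_a) & set(scan_b))
    if len(shared) < 2:
        return 0.0  # assumption: undefined correlation is reported as 0
    xs = [scan_a[mac] for mac in shared]
    ys = [scan_b[mac] for mac in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant signal vector: correlation undefined
    return cov / sqrt(var_x * var_y)


# Hypothetical scans: one next to the AP, one several feet away.
reference_scan = {"aa:aa:aa:aa:aa:01": -40, "aa:aa:aa:aa:aa:02": -60, "aa:aa:aa:aa:aa:03": -75}
distant_scan = {
    "aa:aa:aa:aa:aa:01": -50,
    "aa:aa:aa:aa:aa:02": -65,
    "aa:aa:aa:aa:aa:03": -70,
    "aa:aa:aa:aa:aa:04": -80,
}

print(jaccard_similarity(reference_scan, distant_scan))  # 0.75 (3 shared APs out of 4)
print(pearson_correlation(reference_scan, distant_scan))
```

Repeating this comparison for each distance interval and averaging across subjects would give curves of the kind the experiment reports.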
We first assessed the viability of Pearson correlation, Jaccard similarity, and Das proximity as proxies for distance based on the scans we collected from the experiment. The change in these features averaged over all the scans can be seen in subfigures (a)-(c). We observe that all three features steadily decreased with distance before plateauing at approximately 10 ft.

We then evaluated the performance of the classifier using standard metrics for binary classification: recall, precision, and F-score. We also found the corresponding metrics for different distance thresholds. The change in the metrics with increasing distance thresholds is captured in subfigures (i)-(iii). Subfigure (iii) shows that the F-score increased linearly with the distance threshold. Given that social distancing guidelines recommend that people remain 6-10 ft apart, we aim for high accuracy in that range. Using 10 ft as the distance threshold led to an F-score of 0.65, which is comparable to the lower end of the Bluetooth proximity detection algorithms described in Shankar et al. [10]. However, due to the limited amount of data in this experiment, it is difficult to quantify how well the classifier would perform in a consumer setting. We expect that further experimentation with more robust datasets will yield a more accurate picture of the feasibility of our approach.
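The evaluation described above can be sketched as follows. The example assumes that a scan counts as a true "in proximity" case when its distance is at or below the chosen threshold, and that the F-score is the usual F1 (harmonic mean of precision and recall); neither detail is spelled out in the text. The distances and predictions are invented for illustration.

```python
from typing import List, Sequence, Tuple


def label_by_threshold(distances_ft: Sequence[float], threshold_ft: float) -> List[bool]:
    """Ground truth under the assumed convention: in proximity if distance <= threshold."""
    return [d <= threshold_ft for d in distances_ft]


def precision_recall_f1(
    predictions: Sequence[bool], labels: Sequence[bool]
) -> Tuple[float, float, float]:
    """Standard binary-classification metrics; returns 0.0 where a ratio is undefined."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical scan distances and classifier outputs (not study data).
distances = [1, 5, 9, 12, 20]
predicted = [True, True, False, True, False]

labels = label_by_threshold(distances, threshold_ft=10)
precision, recall, f1 = precision_recall_f1(predicted, labels)
print(precision, recall, f1)
```

Sweeping `threshold_ft` over a range of values and recomputing the metrics at each step would trace out curves like those in subfigures (i)-(iii).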
Figure panels: (a) Average Pearson over all scans; (b) Average Jaccard over all scans; (c) Average Das proximity over all scans; (i) Change in recall with changing threshold; (ii) Change in precision with changing threshold; (iii) Change in F-score with changing distance threshold.

Footnote: GitHub link removed for NeurIPS submission.

Privacy is one of the most critical pieces of a digital contact tracing ecosystem. Having a high level of privacy not only alleviates a majority of ethical concerns but also drives up the adoption rate, which is a prerequisite for the success of any contact tracing infrastructure. While alternative technologies like Bluetooth, GPS, and ultrasound have seen privacy-aware solutions for contact tracing in light of COVID-19, WiFi has not seen anything substantial so far. There are a few important differentiating factors when it comes to the privacy of a WiFi-based contact tracing solution compared to other colocation technologies. The first big difference is the lack of direct interaction between participating parties. Similar to GPS but unlike Bluetooth and ultrasound, a WiFi-based system logs its own data without exchanging any information with other participating parties. This limits the entropy of the information that can be obtained from the third party providing it, in this case a WiFi router. Therefore the only private and common information held by the two proximate parties is the MAC address. Each MAC address is only 48 bits long and hence does not provide enough entropy to safeguard against brute-force attacks. The entropy can be increased if additional hotspots or access points are assumed to be available, since each one adds to the total. Nevertheless, adding extra access points only leads to minimal improvements because an attacker can create a map of access points that are close to each other, reducing the overall entropy of the available data. The attack can be further enhanced by a dictionary mapping available through websites such as wigle.net.
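A back-of-the-envelope calculation illustrates why 48 bits is a weak safeguard. The sketch below assumes an attacker who can test a fixed number of candidate MAC addresses per second and, separately, an attacker who holds a dictionary of known access points. The guess rate and dictionary size are hypothetical round numbers chosen for illustration, not measurements.

```python
import math

MAC_BITS = 48  # length of a MAC address in bits


def brute_force_seconds(bits: int, guesses_per_second: float) -> float:
    """Worst-case time to enumerate every value of a `bits`-bit identifier."""
    return 2 ** bits / guesses_per_second


def dictionary_entropy_bits(known_access_points: int) -> float:
    """Effective entropy if the attacker only needs to pick from a list of known APs."""
    return math.log2(known_access_points)


# Hypothetical attacker capability: one billion guesses per second.
seconds = brute_force_seconds(MAC_BITS, guesses_per_second=1e9)
days = seconds / 86_400
print(f"Exhausting 2^48 MAC addresses: about {days:.2f} days")

# Hypothetical dictionary of 2^30 (about one billion) mapped access points.
print(dictionary_entropy_bits(2 ** 30))  # 30.0 bits instead of 48
```

Under these assumptions the full address space falls in a matter of days, and a dictionary of mapped access points shrinks the search far below 48 bits, which is consistent with the concern that extra access points add little protection.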
In this proposal we outlined a two-pronged approach to performing WiFi colocation during COVID-19. We demonstrated how a simple deterministic classifier could be used in a limited capacity to infer proximity using features extracted from WiFi scan data, as well as how a hotspot duty cycle could be integrated to ensure coverage of the application across places where WiFi access points may not be available. Finally, we outlined some potential privacy concerns, including low information entropy and limited safeguards against brute-force attacks. In the coming months, we plan on utilizing online communities to recruit more volunteers to help collect data for more nuanced experiments. Our current data set is quite limited, which prevents us from implementing more advanced machine learning algorithms without major trade-offs in test accuracy. However, deterministic classifiers for proximity detection appear to perform at comparable levels to highly sophisticated models, which presents an area for further analysis on the topic. We also hope to collaborate with other practitioners in the WiFi signal processing community to improve the proximity-detecting capabilities of our particular implementation, as well as with differential privacy researchers to address the security trade-offs inherent to WiFi-based contact tracing systems.

References
[1] Harvard to Track Affiliates' Wi-Fi Signals as Part of Contact Tracing Pilot. , 2020.
[2] I. Carreras, A. Matic, P. Saar, and V. Osmani. Comm2sense: Detecting proximity through smartphones. In , pages 253–258, 2012.
[3] S. Das, S. Chatterjee, S. Chakraborty, and B. Mitra. An unsupervised model for detecting passively encountering groups from wifi signals. In , pages 1–7, 2018.
[4] Peeyush Gupta, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, Nalini Venkatasubramanian, and Guoxi Wang. Quest: Practical and oblivious mitigation strategies for covid-19 using wifi datasets, 2020.
[5] John Krumm and Ken Hinckley. The nearme wireless proximity server. In UbiComp 2004: Ubiquitous Computing: 6th International Conference, Nottingham, UK, September 7-10, 2004, Proceedings (Lecture Notes in Computer Science), pages 283–300. Springer Berlin Heidelberg, September 2004.
[6] Guanyao Li, Siyan Hu, Shuhan Zhong, and S-H Gary Chan. vcontact: Private wifi-based contact tracing with virus lifespan. arXiv preprint arXiv:2009.05944, 2020.
[7] J. . Meunier. Peer-to-peer determination of proximity using wireless network data. In