A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild
Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann
If you cite this paper, please use the IMC reference: Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann. 2020. A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild. In Internet Measurement Conference (IMC '20), October 27–29, 2020, Virtual Event, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3419394.3423650
Said Jawad Saidi
Max Planck Institute for Informatics
Anna Maria Mandalari
Imperial College London
Roman Kolcun
Imperial College London
Hamed Haddadi
Imperial College London
Daniel J. Dubois
Northeastern University
David Choffnes
Northeastern University
Georgios Smaragdakis
TU Berlin / Max Planck Institute for Informatics
Anja Feldmann
Max Planck Institute for Informatics / Saarland University
ABSTRACT
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large-scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers—all with sampled network data.
In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
CCS CONCEPTS
• Security and privacy → Network security; • Networks → Network monitoring; Public Internet; Network measurement;
KEYWORDS
Internet of Things, IoT detection, IoT security and privacy, Internet Measurement
1 INTRODUCTION
The number of IoT devices deployed within homes is increasing rapidly. It is estimated that at the end of 2019, more than 9.5 billion IoT devices were active, and the IoT population will increase to 20 billion by 2025 [1]. Such devices include virtual assistants, smart home control, cameras, and smart TVs. While users deploy some IoT devices explicitly, they are often unaware of the security threats and privacy consequences of using such devices [2]. Major Internet Service Providers (ISPs) are developing strategies for dealing with the large-scale coordinated attacks from these devices.
Existing solutions focus on instrumenting testbeds or home environments to collect and analyze full packet captures [3–5], local search for IoT anomalies [6, 7], active measurements [8, 9], or data from antivirus companies running scan campaigns from users' homes [7]. In isolation, these data sources do not provide enough insights for preventing network-wide attacks from IoT devices [10]. Detecting IoT devices from an ISP can help to identify suspicious traffic and what devices are common among the subscriber lines generating that traffic.
In this paper, we present a methodology for detecting home IoT devices in-the-wild at an ISP and an Internet Exchange Point (IXP), by relying on passive, sampled network traces and active probing experiments. We build on the insight that IoT devices typically rely on backend infrastructure hosted on the cloud to offer their services. While contacting such infrastructure, they expose information, including their traffic destinations, even when a device is not in use [4]. One of the challenges of detecting IoT devices at scale is the poor availability and low granularity of data sources.
The available data is often in the form of centrally-collected aggregate and sampled data (e.g., NetFlow [11], IPFIX traces [12]). Thus, we need a methodology that (a) does not rely on payload and (b) handles sparsely sampled data.
Another challenge is traffic pattern diversity across IoT devices and their services. We note that some devices, e.g., cameras, will generate significant continuous traffic; others, e.g., plugs, can be expected to be mainly passive unless used. Moreover, many devices offer the same service, e.g., the Alexa voice assistant [13] is available on several brands of smart speakers as well as on Amazon Fire TV devices. Here, the traffic patterns may depend on the service rather than the specific IoT device. Some services rely on dedicated backend infrastructures, while others may use shared ones, e.g., CDNs. Thus, we need a methodology that identifies which IoT services are detectable from the traffic and then identifies a unique traffic pattern for each IoT device and associated services. (Here we refer to IoT services as the set of protocols and destinations that are part of the operations of an IoT device.)
Our key insight is that we can address these challenges by focusing our analysis only on the types of destinations contacted by IoT devices. Even with sparsely sampled data, the set of servers contacted by an IoT device over time can form a reasonably unique
signature that is revealed in as little as a few hours. However, this approach has limitations; for example, we cannot use it to detect devices or services that use a shared infrastructure with unrelated services (e.g., CDNs).
To understand the detectability of IoT devices in the above-mentioned environment, we focus on the possible communication patterns of end-user IoT services and the types of destinations they contact. Figure 1 shows three possible communication patterns on top of a typical network topology. This includes three households, an ISP, as well as a dedicated infrastructure and a CDN that hosts multiple servers. Device A is deployed by two subscribers, and only contacts one server in the dedicated infrastructure. Device B is deployed by a single subscriber and contacts both a dedicated server, as well as a CDN server. Device C is deployed by two subscribers and contacts only CDN servers. We observe that, using NetFlow traces at the ISP edge, it is possible to identify subscriber lines hosting devices of type A and B. Devices of type C are harder to detect given the sampling rates and header-only nature of NetFlow.

Figure 1: Simplified IoT communication patterns.

In this paper, we use a unique testbed and dataset to build a methodology for detecting and monitoring IoT devices at scale (see Figure 2). We first use controlled experiments, where we tunnel the traffic of two IoT testbeds with 96 IoT devices to an ISP. This provides us with ground truth IoT traffic within this ISP (Section 2). We confirm the visibility of the ground truth IoT traffic using the NetFlow ISP data (Section 3). Next, we identify backend infrastructures for many IoT services from the observed ISP IoT traffic (Section 4). We augment this base information with data from DNS queries, web certificates, and banners. Next, we use the traffic signatures to identify broadband subscriber lines using IoT services at the ISP, as well as an IXP (Section 6).
Finally, we discuss our results, their significance, and limitations in Section 7, related work (Section 8), and conclude with a summary in Section 9.
Our main contributions are as follows:
• We develop a methodology for identifying IoT devices by classifying domains and IP addresses of the backend infrastructure. To this end, we derive distinct signatures, in terms of IP/domain/port destinations, to recognize IoT devices. With our signatures we were able to recognize the presence of devices from 31 out of 40 manufacturers in our testbed. To foster further research in the area of IoT privacy and security, we make all the signatures available at https://moniotrlab.ccis.neu.edu/imc20/
• We show that it is possible to detect the presence of IoT devices at subscriber lines, using sparsely sampled flow captures from a large residential ISP and a major IXP, even if the device is idle, i.e., not in active use. Specifically, we were able to recognize that 20% of 15 million subscriber lines used at least one of the 56 different IoT products in our testbed.
• We highlight that our technique scales, is accurate, and can identify millions of IoT devices within minutes, in a non-intrusive way from passive, sampled data. In the case of the ISP, we were able to detect the presence of devices from 72% of our target manufacturers within 1 hour, sometimes minutes.
Based on our findings, we also discuss why some IoT devices are faster to detect, how to hide an IoT service, as well as how the detectability can be used to improve IoT services and network troubleshooting.
We need ground truth traffic from IoT devices, as observed both in a testbed and in the wild, for developing and testing our methodology. In this section, we describe our data collection strategy (see point 1 of Figure 2).
We utilize two vantage points, namely a large European ISP and a major European IXP.
ISP (ISP-VP).
The ISP is a large residential ISP that offers Internet services to over 15 million broadband subscriber lines. The ISP uses NetFlow [11] to monitor the traffic flows at all border routers in its network, using a consistent sampling rate across all routers. Figure 3 shows where the NetFlow data is collected.
IXP (IXP-VP).
The IXP facilitates traffic exchange between its members. At this point, it has more than 800 members, including international ones, with peak traffic exceeding 8 Tbps. The IXP uses IPFIX [12] to collect traffic data across its switching fabric at a consistent sampling rate, which is an order of magnitude lower than the one used at the ISP. Figure 4 illustrates where the IPFIX data is collected.
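When working with packet-sampled flow data such as NetFlow or IPFIX, volume statistics are commonly scaled back to estimates of the true traffic by the sampling factor. A minimal sketch of this standard inverse-sampling estimator; the sampling factors below are purely illustrative assumptions, not the actual (undisclosed) rates used by the ISP or the IXP:

```python
# Inverse-sampling estimator for packet-sampled flow data (NetFlow/IPFIX).
# The sampling factors are illustrative assumptions.

def estimate_true_count(sampled_count: int, sampling_factor: int) -> int:
    """Each sampled packet represents ~sampling_factor packets on the wire."""
    return sampled_count * sampling_factor

ISP_FACTOR = 1_000            # hypothetical 1-in-1000 packet sampling
IXP_FACTOR = ISP_FACTOR * 10  # an order of magnitude lower rate, as in the text

print(estimate_true_count(42, ISP_FACTOR))  # 42000
print(estimate_true_count(42, IXP_FACTOR))  # 420000
```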
Ethical considerations–ISP/IXP.
Neither the ISP nor the IXP flow data contain any payload data, thus no user information. We distinguish user IPs from server IPs and anonymize by hashing all user IPs, following the method described in [5]. The address space of the ISP residential users is known. We call an IP a server IP if it receives or transmits traffic on well-known ports or if it belongs to ASes of cloud or CDN providers. The ports include, e.g., web ports (80, 443, 8080), NTP (123), and DNS (53). Moreover, we do not have any specific user activity and can only access and report aggregated statistics in accordance with the policies of the ISP and IXP.
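The server/user distinction and the user-IP hashing can be sketched as follows. The port set is the one listed in the text; the ASN set and the choice of salted SHA-256 are illustrative assumptions (the paper follows the method of [5]):

```python
import hashlib

SERVER_PORTS = {80, 443, 8080, 123, 53}   # web, NTP, DNS, as listed in the text
CLOUD_CDN_ASNS = {64496, 64497}           # placeholder ASNs for cloud/CDN providers

def is_server_ip(port: int, asn: int) -> bool:
    """Server IP: traffic on a well-known port, or an IP inside a cloud/CDN AS."""
    return port in SERVER_PORTS or asn in CLOUD_CDN_ASNS

def anonymize_user_ip(ip: str, salt: str = "per-study-secret") -> str:
    """Replace a user IP by a keyed hash so raw addresses never leave the vantage point."""
    return hashlib.sha256(f"{salt}|{ip}".encode()).hexdigest()[:16]

print(is_server_ip(443, 64499))    # True  (well-known port)
print(is_server_ip(51515, 64496))  # True  (cloud/CDN AS)
print(is_server_ip(51515, 64499))  # False -> user IP, hash it
```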
Subscriber line (Home-VP)–Network setup.
In order to ingest ground truth traffic into the network, we need privileged access to a home subscriber line. For this, we use the ISP-VP, but rather than deploying all IoT devices directly within the home, we placed a VPN endpoint with an IP out of the /28 subscriber's prefix and used it to ingest IoT traffic tunneled to the server from two IoT testbeds, one in Europe, one in the US; see Figure 3. The measurement points within the ISP will also capture this traffic. We simply excluded this traffic from our dataset, as the VPN tunnel endpoints are known
Figure 2: General methodology overview: (1) generate and capture ground truth (GT) IoT traffic in the labs and household (Section 2); (2) capture GT traffic at the ISP vantage point and evaluate its visibility (Section 3); (3) identify IoT domains, service IPs, and port numbers, and generate detection rules (Section 4); (4) cross-check detection rules by inferring devices on GT data (Section 5); (5) detect IoT devices in the wild (Section 6).
Figure 3: ISP setup & flow collection points.
Figure 4: IXP setup & flow collection points.

to us, and for each experiment we use the default DNS server of the ISP. Importantly, since the /28 prefix is used explicitly for our experiments, there was no other network activity other than that of the IoT devices.
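Excluding the ground-truth tunnel traffic from the in-the-wild dataset amounts to a simple prefix filter on flow records. A minimal sketch, with RFC 5737 documentation addresses standing in for the real (undisclosed) /28:

```python
import ipaddress

# Hypothetical /28 standing in for the reserved subscriber prefix.
TUNNEL_PREFIX = ipaddress.ip_network("198.51.100.16/28")

def keep_flow(src: str, dst: str) -> bool:
    """Drop any flow that touches the known VPN tunnel endpoints."""
    return (ipaddress.ip_address(src) not in TUNNEL_PREFIX
            and ipaddress.ip_address(dst) not in TUNNEL_PREFIX)

flows = [("198.51.100.17", "203.0.113.5"),   # ground-truth tunnel traffic -> drop
         ("192.0.2.10", "203.0.113.5")]      # unrelated subscriber traffic -> keep
print([f for f in flows if keep_flow(*f)])   # [('192.0.2.10', '203.0.113.5')]
```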
Ethical considerations–Home-VP setting.
With the cooperation of the ISP, we were able to use a reserved /28 allocated to this specific subscriber line (Home-VP) (with signed explicit consent) out of a /22 prefix reserved for residential users. Thus, the analysis in this paper only considers traffic explicitly ingested by the ground truth experiments and does not involve any user-generated traffic.
The IoT testbeds used here consist of 96 devices from 40 vendors. We selected the devices to provide diversity within and between different categories: surveillance, smart hubs, home automation, video, audio, and appliances. Most of these are among the most popular devices, according to Amazon, in their respective region. Our testbed includes multiple instances of the same device (56 different products), so that we can see the destinations that each product contacts in different locations. For a list of the IoT devices and the category of each device, we refer to Table 1. We redirect all IoT traffic to the Home-VP within the ISP, and we capture all the traffic generated by the IoT devices (see 1 in Figure 2).
Most of the selected IoT devices are controlled using either a voice interface provided by a voice assistant (such as Amazon Alexa) or via a smartphone companion application. We use the voice interface to automate active experiments by producing voice commands using a Google Voice synthesizer. For IoT devices that support a companion app, we use Android smartphones, and we rely on the Monkey Application Exerciser for Android Studio [14] for automating simulated interactions between the user and the IoT device.
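Companion-app automation via the Monkey exerciser boils down to adb invocations of the following shape. The package name is hypothetical, and the event count and throttle are tunable; this is a sketch of how such interactions can be scripted, not the authors' exact harness:

```python
import shlex

def monkey_cmd(package: str, events: int = 100, throttle_ms: int = 500) -> list:
    """Build an adb Monkey command that injects pseudo-random UI events
    into a companion app on an attached Android phone."""
    return shlex.split(
        f"adb shell monkey -p {package} --throttle {throttle_ms} -v {events}"
    )

cmd = monkey_cmd("com.example.smartplug", events=50)
# subprocess.run(cmd) would execute it against a connected device
print(" ".join(cmd))
```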
Our experiments can be classified into idle and active experiments.
Idle experiments. We define as idle the experiments during which the devices are just connected to the Internet without being actively used. We generate idle traffic for three days (November 23rd–25th, 2019) from both testbeds.
Category: Device Name

Surveillance: Amcrest Cam, Blink Cam, Blink Hub, Icsee Doorbell, Lefun Cam, Luohe Cam, Microseven Cam, Reolink Cam, Ring Doorbell, Ubell Doorbell, Wansview Cam, Yi Cam, ZModo Doorbell

Smart Hubs: Insteon, Lightify, Philips Hue, Sengled, Smartthings, SwitchBot, Wink 2, Xiaomi

Home Automation: D-Link Mov Sensor, Flux Bulb, Honeywell T-stat, Magichome Strip, Meross Door Opener, Nest T-stat, Philips Bulb, Smartlife Bulb, Smartlife Remote, TP-Link Bulb, TP-Link Plug, WeMo Plug, Xiaomi Strip, Xiaomi Plug

Video: Apple TV, Fire TV, LG TV, Roku TV, Samsung TV

Audio: Allure with Alexa, Echo Dot, Echo Spot, Echo Plus, Google Home Mini, Google Home

Appliances: Anova Sousvide, Appkettle, GE Microwave, Netatmo Weather, Samsung Dryer (idle), Samsung Fridge (idle), Smarter Brewer, Smarter Coffee Machine, Smarter iKettle, Xiaomi Rice Cooker

Table 1: IoT devices under test. idle indicates that we capture the traffic just for idle periods because the experiments could not be automated.
Active experiments. We define as active the experiments involving automated interactions. We perform two types of automated interactions, each one repeated multiple times: (i) power interactions, since in a previous study [4] it was reported that many IoT devices generate significant traffic when they are powered off and on. We manage the power status of the devices through several TP-Link smart plugs that we can control programmatically, followed by two minutes of traffic capture; (ii) functional interactions, by automatically controlling the main functionality of the devices (i.e., the act of switching on/off the light for a smart bulb) via voice (either directly or through a smart speaker) or via a companion app running on a separate network with respect to the IoT device (to force the communication to happen over the Internet rather than locally). Unfortunately, some interactions for some devices cannot easily be automated (devices with idle in Table 1). For these devices, we consider only idle experiments. In total, we perform 9,810 active experiments between November 15th and 18th, 2019.

Figure 5: Home-VP vs. ISP-VP: (a) unique service IPs per hour; (b) unique domains per hour; (c) cumulative service IPs per port; (d) unique devices per hour.

Figure 6: Fraction of observed ISP-VP vs. Home-VP per hour for popular servers (heavy hitters).
In this section, we aim to understand (i) to what extent the IoT-related traffic of a single subscriber line reaches a diverse set of servers in the Internet, and (ii) whether the low sampling rate of NetFlow limits the subscriber/device visibility. For this, we rely on the ground truth traffic for the Home-VP. More specifically, we monitor the IoT traffic at both vantage points: the Home-VP, as well as the border routers of the ISP-VP (see 1 and 2 of Figure 2).
We first focus on the number of IP addresses that are contacted in each hour during the idle and the active experiments by the IoT devices, as stated in Section 2.3. We explicitly exclude DNS traffic, since it is not IoT-specific. From Figure 5(a), we see that during the active experiments, the IoT devices contact between 500 and 1,300 service IPs per hour when monitored at the Home-VP. Due to sampling, not all of this traffic is visible at the ISP-VP. We define service IPs as the sets of IPs associated with the backend infrastructures that support the IoT services. Indeed, the number of observed service IPs per hour at the ISP-VP decreases to an average of 16%. Overall, during our idle experiments, the total number of contacted service IPs is lower, but the average percentage of observed service IPs remained at 16.5%.
The spikes in the active experiments are partially due to the power and the functional interactions. This can be seen in the idle experiments, where the spike indicates the action of starting the device (only at the beginning). Note that these spikes are also visible in the sampled ISP NetFlow data.
At first glance, 16% sounds like a very small percentage. However, we note that the visibility of popular service IPs is significantly higher. Figure 6 shows the fraction of service IPs that are visible for the servers contacted the most, according to byte count. For the top 10% of the service IPs, more than 75% are visible, rising up to 90% during some experiments.
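The heavy-hitter visibility of Figure 6 can be computed as the fraction of the top-k% service IPs, ranked by Home-VP byte count, that also show up in the sampled ISP-VP data. A sketch with fabricated counts:

```python
def heavy_hitter_visibility(home_bytes, seen_at_isp, top_frac):
    """Fraction of the top-`top_frac` service IPs (by Home-VP byte count)
    that are also observed in the sampled ISP-VP flows."""
    ranked = sorted(home_bytes, key=home_bytes.get, reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    return sum(ip in seen_at_isp for ip in ranked[:k]) / k

# Hypothetical per-IP byte counts at the Home-VP, and IPs sampled at the ISP-VP.
home = {f"ip{i}": 10 ** (5 - i) for i in range(10)}   # ip0 is the heaviest hitter
isp = {"ip0", "ip1", "ip4"}
print(heavy_hitter_visibility(home, isp, 0.20))  # top-2 both seen -> 1.0
print(heavy_hitter_visibility(home, isp, 0.50))  # 3 of top-5 seen -> 0.6
```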
For less popular service IPs, e.g., the top 20% and top 30%, the visibility is only reduced to 70% and 60% in the active experiment, and a bit lower for the idle experiment. If we consider the entire period of our experiments, the percentage of visible service IPs is more than 34% and 28% for idle and active experiments, respectively. Overall, at the daily level, more than 95% of service IPs are visible for the top 20%. Although we cannot observe all IoT device activity at the ISP-VP, a significant subset is visible.
While any specific service IP may not matter that much for an IoT service, its communication with a server domain name that may be hosted on multiple service IPs is essential. From the Home-VP, we know which service IPs correspond to which domain. Thus, we can determine which observed service IPs at the ISP-VP belong to which domain. This information is relevant for our methodology because in the ISP NetFlow data only IPs are visible. Figure 5(b) shows the number of observed Fully Qualified Domain Names (FQDNs; we will refer to them as domains or domain names for the rest of the paper) at the Home-VP and the ISP-VP. Many domains are hosted at multiple service IPs, hence we see that the number of observed service IPs is higher than the number of observed domains.
Figure 5(d) shows the number of observed IoT devices per hour from the ground truth IoT traffic. We observe a device when at least one packet from that device is seen within an hour. Note that, for active mode, the experiments on devices from Testbed 1 (see Figure 3) are initiated after Testbed 2. Therefore, not all devices are active during the same period. The average percentages of devices visible at the ISP-VP during active and idle experiments are 67% and 64%, respectively.
Next, we separate the observable network activity by ports. More specifically, we consider Web services (ports 443, 80, 8080), NTP services (port 123), and other services (the rest of the ports), and we show the cumulative number of service IPs contacted. The resulting plot, Figure 5(c), shows that (i) the trend of observable service IPs at the Home-VP is mirrored at the ISP-VP, even when different services are considered, and (ii) the number of service IPs converges over time.
We also checked if any of the traffic from the Home-VP is visible at the IXP. However, neither during the active nor during the idle experiments do we observe traffic at the IXP.
This is expected, as the ISP is not a member of the IXP. Rather, it peers directly (via private interconnects) with a large number of content and cloud providers as well as other networks.
In summary, our analysis of the ground truth IoT traffic shows that, despite the low sampling rate of NetFlow, popular domains, service IPs, and ports of a single subscriber line (the Home-VP) are visible at the ISP.
In this section, we outline our methodology for the detection of IoT devices in-the-wild. IoT services typically rely on a backend support infrastructure (see Figure 1) for user interactions. From our ground truth experiments, we noticed that this backend infrastructure is often also used for keep-alives, heartbeats, updates, maintenance, storage, and synchronization. This observation is consistent with previous works [4, 15].
We focus on identifying which Internet backend infrastructure is supporting each of the IoT devices that we deployed in our testbeds (see 3 in Figure 2). When we refer to Internet backend infrastructure, we use two different abstractions: (i) sets of IP address/port combinations as observable from the Internet vantage points, and (ii) sets of DNS domains. We focus also on domains because they are the primary indirect way for the devices to access their backend infrastructure. While domain names are typically part of the permanent programming of the devices, IP addresses are discovered during DNS resolution, and may change over time.
Figure 7: IoT traffic detection methodology overview: build a daily hitlist of IoT domains, IPs, and port numbers together with detection rules, starting from the IoT domains (Section 4.1) and the DNSDB and Censys datasets (Sections 4.2.1–4.2.3); filter out generic domains, classify the rest as dedicated, shared, or no-record, remove shared domains, and, given enough primary domains, generate the detection rules and detection level (Section 4.3).
A naive approach for identifying the backend infrastructure would be to use the ground truth traffic to identify which domains, and as a consequence, which service IPs are being contacted by each device. However, this is not sufficient for the following reasons:
Limited relevance of some domains: Not all domains are essential to support the services, or are useful for classification; for example, some domains may be used for advertisements or generic services, e.g., time.microsoft.com or wikipedia.org, see Section 4.1.
Limited visibility of IP addresses: Since the ground truth data is captured at a single subscriber line only and the DNS-to-IP mapping is rather dynamic, just looking at this traffic is not sufficient, see Section 4.2.1.
Usage of shared infrastructure: Not all IoT services are supported by a dedicated backend infrastructure. Some rely on shared ones, such as CDNs. In the former case they can still have dedicated IP addresses; in the latter case they use shared IP addresses, see Section 4.2.1.
Churn: DNS domain to IP address mappings are dynamic, see Section 4.2.1.
Common programming APIs: Multiple IoT services may use the same common programming API or may be used by different manufacturers; as a result, they often rely on the same infrastructure. This is the case for relatively generic IoT services such as the Alexa voice service. While this IoT service is available on dedicated devices, e.g., Amazon Echo, it can also be integrated into third-party hardware, e.g., fridges and alarm clocks [13]. We cannot easily distinguish these from network traffic observations.
Below we tackle these challenges one by one. The outcome is an IoT dictionary that contains mappings for individual IoT services to sets of domains, IP addresses, and ports. Based on IoT services, we generate rules for IoT device detection. For an overview of the resulting methodology, see Figure 7.
The amount and frequency of network traffic that an IoT device exchanges with its backend infrastructure varies from device to device, depending on the complexity of its services, its implementation specifics, and the usage of the device. This is highlighted in Figure 8, where we show the average number of packets per device and per domain (using a log y-scale) for 13 different devices (a subset of devices) in their idle mode. The first observation is that most devices are supported by their own set of domains, and for many IoT services this is a small set containing less than 10 domains. We refer to these as small domain sets as they correspond to laconic devices. Other devices gossip and have sizable domain sets.

Figure 8: Home-VP: Circular bar plot of the average number of packets per device and per domain (idle mode), for Apple TV, Blink Hub, Echo Dot, Meross Door Opener, Netatmo Weather Station, Philips Hub, Smarter Brewer, Smartlife Bulb, Smartthings Hub, Sous vide, TP-Link Bulb, Xiaomi Hub, and Yi Camera.

Figure 8 shows the domains of two example gossip devices (Apple TV in gray and Echo Dot in orange) and several laconic devices (the rest of the colors). Having a sizable domain set often indicates the usage of a larger infrastructure, which may not be dedicated to a specific IoT service. We find that most of these domains are mapped via CNAMEs to other domains. For the two gossiping examples considered in Figure 8, the domains of Echo Dot are mostly mapped to its own infrastructure. However, the ones of Apple TV are mainly mapped to a CDN—in this case, Akamai—that offers a variety of services.
Based on these observations from our ground truth data, we classify the domains as follows:
IoT-Specific domains. Grouped into (i) Primary domains: registered to an IoT device manufacturer or an IoT service operator; and (ii) Support domains: not necessarily registered to IoT device manufacturers or service operators, but offering complementary services for IoT devices, e.g., samsung-*.whisk.com for Samsung Fridges, where whisk.com is a service that provides food recipes and images of food.
Generic domains. Domains registered to generic service providers that are heavily used by non-IoT devices as well, e.g., netflix.com, wikipedia.org, and public NTP servers.
We classify each domain name from our idle and active experiments using pattern matching, manual inspection, and by visiting their websites and those of the device manufacturers. Since the Generic domains cover non-IoT traffic, we do not further consider them. Rather, we focus on the IoT-Specific domains. As a result, we classify 415 out of the 524 domains as Primary and 19 as Support domains.
Next, we explore the volume of traffic that the IoT devices exchange with all domains. Figure 9 shows the ECDF of the average number of packets per hour per domain for all IoT-Specific domains
Figure 9: Home-VP: ECDF of the average number of packets per hour per domain.

for both the idle and the active experiments. First, we note that almost all devices and domains, except for one device in its idle mode, are exchanging at least 100 packets per hour, and this may not suffice for detecting them in any given hour in the wild due to sampling. However, during the active experiments, we see that some domains are only used when the device is active or other domains receive significantly more traffic, up to and exceeding 10K packets, which may suffice for detection. These latter domains may be ideal candidates for detecting such devices in the wild.
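The ECDF in Figure 9 is a standard empirical distribution function; a minimal sketch over fabricated per-domain averages:

```python
def ecdf(values):
    """Empirical CDF: for each sorted value v, the fraction of observations <= v."""
    xs = sorted(values)
    n = len(xs)
    return [(v, (i + 1) / n) for i, v in enumerate(xs)]

# Hypothetical average packets/hour for five IoT-Specific domains.
avg_pkts_per_hour = [120, 150, 400, 9_000, 12_000]
for value, frac in ecdf(avg_pkts_per_hour):
    print(value, frac)  # e.g. 60% of these domains average <= 400 packets/hour
```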
Once we have a list of IoT-Specific domains (FQDNs) with their associated service IP addresses and port mappings from the ground truth experiments, we need to understand whether they have a shared or dedicated backend infrastructure. The reason is that, if we want to identify IoT services and consequently IoT devices in the wild by using network traces such as NetFlow, we can only observe standard network-level features such as src/dst IP and port numbers without packet payload. Therefore, if a service IP belongs to a shared infrastructure such as a CDN or a generic web hosting service, this service IP can serve many domains, and it is impossible for us to know exactly which domain was actually contacted. To this end, the purpose of this section is two-fold. First, to expand the candidate service IPs beyond those directly observed in the ground truth experiments (to mitigate that we are focusing on a single subscriber line). Second, to classify domains into those that use backend services hosted on dedicated infrastructure service IPs vs. those that rely on shared infrastructure service IPs. We do this by relying on DNSDB [16], Censys [9], and applying additional filters.
We use IoT-Specific domains to identify the backend infrastructure that is hosting them. To this end, we leverage the technique in [17], and use these domain names to identify all associated service IPs on which these domains are hosted during the time period of our experiments. We use both the ground truth experiments and external DNS databases, including DNSDB [18]. We found that the specific IP addresses mapping to specific domains can change often. However, DNSDB provides information for all domains served by an IP address in a given time period and vice versa; hence, it mitigates the issues caused by this churn. DNSDB also provides all records, including CNAMEs that may have been returned in the DNS response, for a given domain. Thus, we use DNSDB to check if a service IP address is exclusively used for a specific IoT service, or if it hosts additional domains. We say a service IP is exclusively used if it only serves domains from a single "second-level" domain (SLD) and its CNAMEs. However, we note that the CNAMEs may not involve the same second-level domain. Let us consider an example: the domain devA.com is mapped via a chain of CNAMEs such as devA-VM.ec2compute.amazonaws.com to IP a.b.c.d. This IP only reverse maps to devA-VM.ec2compute.amazonaws.com and its associated CNAME devA.com. Since this is the only CNAME associated with the IP, we may consider this IP a direct mapping for the domain. At the same time, we find support that a public IP address assigned to a cloud resource, such as a virtual machine in AWS EC2 occupied by a tenant, is not shared with other tenants unless the resource is released. This is a popular service offered by multiple platforms [19–21]. Let us consider a second example: domain devB.com. It may use the Akamai CDN. Thus, the domain devB.com is a CNAME for devB.com.akadns.net. This domain then maps to IP a.b.c.d. However, in this case, many other domains, e.g., anothersite.com.akadns.net, also map to this IP.
Thus, we may conclude that this domain is hosted on a shared infrastructure. Once we understand if an IP is exclusively used for a specific IoT service, we can also classify the domains as either using a dedicated or a shared infrastructure. For the former, all service IPs have to be dedicated to this domain for all days; otherwise, we presume that the domain relies on a shared infrastructure. Once we apply this methodology to all 434 domain names, we find that 217 are hosted on dedicated service IPs, while 202 rely on a shared backend infrastructure. For 15 of the domains, we did not have sufficient information in DNSDB. We handle them in the next step. Among the reasons that DNSDB may not suffice for mapping some domains to service IPs are (a) frequent remapping of domains to IPs or (b) missing data, since the requests for the domains may not have been recorded by DNSDB, which intercepts requests for a subset of the DNS hierarchy. To overcome this limitation, we rely on the certificate and banner datasets from Censys [9] to infer the ownership of the domains and the corresponding IPs, as long as these are using HTTPS. For example, we did not find any record for the domain c.devE.com in the DNSDB dataset. We then check if device E uses HTTPS to communicate with this domain. This allows us to query for all service IPs that potentially offer the same web certificate as the hosts in this domain. For a certificate to be associated with a domain, we require that the domain name and the
Name field entry in the certificate match at least the SLD or higher, i.e., the Name field of the certificate matches the pattern c.devE.com or *.devE.com, and that there is no other Subject Alternative Name (SAN) in the certificate. Next, we query the Censys dataset for all IPs with the same certificate and HTTPS banner checksum for the domain from our ground truth dataset within the same period. This allows us to identify data for 8 out of the 15 domains, which belong to 5 devices.
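The certificate-matching rule above can be sketched as follows. The function and variable names are ours, and real code would parse the CN and SAN fields out of the Censys certificate records rather than receive them as a plain list:

```python
def cert_matches_domain(cert_names: list[str], domain: str) -> bool:
    """Accept a certificate for `domain` only if every name on it
    (CN plus SANs) is either the domain itself or a wildcard one label
    up (e.g., *.devE.com for c.devE.com), and there is no unrelated
    Subject Alternative Name."""
    # Parent zone of the domain: "c.devE.com" -> "devE.com".
    parent = domain.split(".", 1)[1]
    allowed = {domain, "*." + parent}
    return bool(cert_names) and set(cert_names) <= allowed
```

For example, a certificate carrying only `*.devE.com` is accepted for `c.devE.com`, while one that additionally lists an unrelated SAN such as `cdn.other.net` is rejected, since a shared certificate cannot pin the IP to a single IoT service.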
In the last step of our methodology, we filter out devices that use shared backend infrastructures. We find that Google Home, Google Home Mini, Apple TV, and Lefun camera all have a shared backend infrastructure. For LG TV, we are left with only one out of 4 domains; for Wemo Plug and Wink hub, we could not identify sufficient information. Because of this, we have excluded these devices from further consideration. The result forms our daily list of dedicated IoT services, along with their associated domains, service IPs, and port combinations.
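The dedicated-vs-shared classification described above can be sketched as follows. The helper names are our own, the naive SLD extraction is a simplification (production code would use the Public Suffix List), and the input dictionaries stand in for DNSDB lookups:

```python
def sld(fqdn: str) -> str:
    """Naive second-level-domain extraction, e.g. 'api.devA.com' -> 'devA.com'.
    A real implementation would consult the Public Suffix List."""
    parts = fqdn.rstrip(".").split(".")
    return ".".join(parts[-2:])

def exclusive_ips(ip_to_names: dict, cname_origin_sld: dict) -> set:
    """An IP is 'exclusively used' if every name it serves belongs,
    directly or via its CNAME chain, to a single SLD."""
    out = set()
    for ip, names in ip_to_names.items():
        slds = {cname_origin_sld.get(n, sld(n)) for n in names}
        if len(slds) == 1:
            out.add(ip)
    return out

def classify_domain(domain_ips_per_day: dict, dedicated_ips: set) -> str:
    """'dedicated' only if all of the domain's service IPs are dedicated
    on every observed day; otherwise the domain is 'shared'."""
    for ips in domain_ips_per_day.values():
        if not ips or not ips <= dedicated_ips:
            return "shared"
    return "dedicated"
```

In this sketch, `cname_origin_sld` resolves CNAME targets whose SLD differs from the origin (the devA-VM.ec2compute.amazonaws.com case from the text) back to the origin's SLD, so that an EC2-hosted name still counts as a direct mapping while an akadns.net IP serving two different origins is classified as shared.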
Once we have identified the set of IoT services that can be monitored, we generate the rules for detecting IoT devices. Depending on the set of IoT services contacted by the devices, we can generate device detection rules at three granularity levels: (i) Platform-level, (ii) Manufacturer-level, and (iii) Product-level, from the most coarse-grained to the most fine-grained, respectively. In this section, we first show how we determine the detection level for each device. Then, we explain how we generate the detection rules for each IoT device at the detection level that can be supported.
Platform-level:
Some manufacturers use off-the-shelf firmware, or outsource their backend infrastructure to IoT platform solution companies such as Tuya [22], electricimp [23], or the AWS IoT Platform [24]. These IoT platforms can have several customers/manufacturers that rely on their infrastructure. Therefore, we may not be able to distinguish between different manufacturers from their network traffic.
Manufacturer-level:
The majority of our studied IoT services rely on dedicated backend infrastructures that are operated by the manufacturers themselves. We also observe that many manufacturers rely on similar APIs and backend infrastructures to support their different products and services. This makes distinguishing individual IoT products from their network traffic more challenging.
Product-level:
This is the most fine-grained detection level, where we are able to distinguish between different products of a manufacturer, e.g., Samsung TV, or Amazon Echo vs. Amazon Fire TV. For detection at the product level, we underline the importance of side information about the purpose associated with a domain. With this information, we can improve our classification accuracy. For example, for Alexa Enabled devices, the domain avs-alexa.*.amazon.com is critical, as it is the base URL for the Alexa Voice Service API [13] (shown in Figure 8 as amazon domain 23). Other examples are the Samsung devices that use the domain samsungotn.net to check for firmware updates [25]. Additionally, some advanced services of the devices often require additional backend support from manufacturers. These may then contact additional domains. By considering more specific features (domains), the capability to distinguish products increases. We leverage these specialized features, e.g., to distinguish Amazon Fire TV, which contacts significantly more domains than other Amazon products, e.g., Echo Dot.
For any of our three levels of detection, we require that a subscriber contacts at least one IP/port combination associated with a Primary domain of the IoT service to claim detectability of IoT activity at the subscriber. However, if there are many domains, requiring only one such activity may not provide enough evidence. For example, by monitoring a single domain we can detect all Alexa Enabled devices, but this service can be integrated into third-party hardware as well. Therefore, in order to detect products manufactured by Amazon, e.g., Amazon Echo, it is essential to monitor additional domains that are contacted by the Amazon Echo devices. For this, we introduce the detection threshold D. If an IoT service has N IoT-Specific domains, we require to observe traffic involving IP/port combinations that are associated with max(1, ⌊D × N⌋) of the N domains. To determine an appropriate value for this threshold, we rely on our ground truth dataset; see Section 5. We start with 96 devices in our testbeds. We have multiple copies of the same devices deployed in different continents. This reduces the set of devices to 56 unique products. Of these, many are from the same manufacturer, e.g., a Xiaomi rice cooker, a Xiaomi plug, and a Xiaomi light bulb. Since these devices are often supported by the same backend infrastructure of the manufacturer, the lists of domains have significant, and often full, overlap. With our methodology we can detect 3 different IoT platforms, the coarsest level, as 4 of our products rely on them. Moreover, we generated rules for the detection of 29 IoT devices at the manufacturer level. We had a diverse range of products from Amazon and Samsung in our testbed that allowed us an in-depth analysis and cross-examination of domains contacted by different products. Therefore, for devices using the Alexa voice service (i.e., Alexa Enabled) and for Samsung IoT devices, we detect the former at the platform level and the latter at the manufacturer level.
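The threshold rule above can be expressed compactly; the function names below are illustrative, not from the paper:

```python
from math import floor

def required_domains(n_domains: int, D: float) -> int:
    """For a rule with N IoT-Specific domains, evidence is required for
    max(1, floor(D * N)) of them; we always require at least one."""
    return max(1, floor(D * n_domains))

def detected(observed_domains: set, rule_domains: set, D: float) -> bool:
    """A subscriber line triggers the rule if the sampled flows touch
    IP/port combinations belonging to enough of the rule's domains."""
    hits = len(observed_domains & rule_domains)
    return hits >= required_domains(len(rule_domains), D)
```

With D = 0.4 and a 10-domain rule, traffic toward 4 distinct domains is needed; for single-domain rules the threshold is irrelevant, since one domain is always required.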
For Alexa Enabled and Samsung IoT devices, we compared the domains across different devices and obtained enough side information about the purpose of their domains to further divide each of them into two subclasses at more fine-grained levels. For this, we defined a hierarchy, namely Amazon products and Fire TV, under Alexa Enabled devices. Amazon products are detected at the manufacturer level, include products such as the Amazon Echo family, and form a superclass of Fire TV. We identified 33 additional domains, besides the Alexa voice service domain, that were contacted by Amazon products. Moreover, Fire TV contacts up to 67 domains (34 more domains than Amazon products). This allows us to establish its subclass, at the product level, under Amazon products. Using side information [25] and comparing the set of domains across different Samsung products, we monitor 14 domains in total, but only one domain is important to detect Samsung IoT devices with Samsung firmware (these include a broad range of products, such as fridges, washing machines, and TVs). Samsung TVs contact 16 additional domains that are not used by any of the other Samsung devices in our testbed. Using the above methodology, except for the devices listed in Section 4.2.3, we generated detection rules at different levels for our testbed devices. We generated rules for the detection of 20 manufacturers and 11 products, which amounts to 77% of the manufacturers in our testbeds. We generate rules for 4 unique IoT platforms by monitoring 1 to 4 domains (2 platforms were contacted by 4 devices; we report them separately). Finally, for 11 products we consider between 1 and 67 domains. For the detailed number of domains per IoT device, see Figure 10.
Figure 10: Home-VP: Time to detect IoT (per threshold).
We use our ground truth dataset to check how long it takes for our methodology (applied to the sampled flow data from the ISP) to detect the presence of the IoT devices for the idle and the active experiments (see step 4 of Figure 2). For this, we report the time that it takes to detect an IoT device that is hosted in our ground truth subscriber line when it is in active mode (Figure 10, left) and idle mode (Figure 10, right). We only include the devices that are detectable with our methodology, i.e., those that do not rely exclusively on shared infrastructures. We also annotate the device name with its detection level: Platform (Pl.), Manufacturer (Man.), and Product level (Pr.). On average, by requiring evidence of at least 40% of domains, we are able to detect 72/93/96% of the IoT devices that are detectable at the manufacturer or product level within 1/24/72 hours in active mode. Even in idle mode, the percentage is 40/73/76% within 1/24/72 hours. For the devices detectable only at the product level (Pr.), with the same required evidence, we detected 63/81/90% of them within 1/24/72 hours, respectively, in active mode. Note, we are using the sampled ISP data. Indeed, popular products such as Amazon products (i.e., Echo Dot, Echo Spot) can be almost instantly detected. This is a significant finding and underlines that it is possible to use sampled flow data within an ISP to accurately detect the presence of a specific IoT product within a subscriber line, despite differences in activity and IP churn due to operational requirements. A closer look reveals that, in general, it takes longer to detect an idle IoT device than an active one. This is not surprising, as most IoT devices show more network activity in active mode. However, this does not mean that the increase occurs across all of the services contacted by a device, and there are exceptions that take longer to detect even in active mode, e.g., SmartLife and Nest.
Figure 11: ISP: Per hour, subscriber lines with IoT activity (Alexa Enabled, Samsung IoT, and others).
Figure 12: ISP: Drill-down for Amazon and Samsung IoT devices, per day.
Figure 10 also contains information regarding the number of monitored domains per IoT device, with their detection level. For 9 IoT devices, a single domain is considered. For the others, we consider many more (up to 67). A threshold determines the fraction of domains for which we require evidence of network traffic to claim detection. To understand the impact of this threshold on detection time, we vary its value from 0.1 to 1 and show the corresponding detection times. Note, for IoT devices where we consider only one domain, varying the threshold does not change the detection time, as we always require evidence of at least one domain. Overall, we note that a larger threshold can increase the detection time, and some IoT devices may no longer be detectable. However, a lower threshold may also increase the false positive rate. We cross-check possible false positives by running another experiment where we only enable a small subset of IoT devices. We then apply our detection methodology to these traces and do not identify any devices that are not explicitly part of the experiment. We also try to avoid false positives by ensuring that the domain sets per device differ. Regarding detectability, we notice that 6 IoT devices could not be detected even after the entire duration of our idle experiments. A closer investigation shows that for 5 of these, the frequency of traffic is so small that their likelihood of detection is very low. Indeed, for this specific time period, they were invisible in the NetFlow data. This highlights that, in order to confidently detect a device, the device has to either exchange enough packets with the targeted domains or the sampling rate has to be increased. For Samsung TVs, we require to observe enough domains to confirm the presence of a Samsung IoT device before moving forward with detection. Thus, if we do not see enough Samsung IoT domains, then we do not claim the detection of Samsung TVs.
Nevertheless, the results look very promising for us to attempt to detect deployed IoT devices in the wild.
In this section, we apply our methodology for detecting IoT activity in the ISP and IXP data (see step 5 in Figure 2). For this, we focus on the two weeks in which we collected the data from the ground truth experiments to obtain up-to-date mappings of domains to IPs.
Applying our methodology to traffic data from ISPs and IXPs may raise ethical concerns, as it may be considered as analyzing customer activities. However, this is not the goal of this paper. The goal here is to showcase that it is possible to detect and map the penetration of IoT device usage. As such, this study is not about subscribers' device activities; instead, it is about detection capabilities and aggregated usage. Thus, we report on percentages of subscriber lines where we can observe IoT-related activity. Indeed, we are unable to trace IoT activity back to individuals, as the raw data was anonymized as per the recommendations of [5] and never left our collaborators' premises. Moreover, we do not analyze any data that is not related to the detection of IoT presence, e.g., DNS queries [26], or flows that are not related to IoT backend infrastructures, to eliminate any user Web visit profiling.
IoT-related activity in the wild.
Figure 11 shows the number of ISP subscriber lines for which we detect IoT-related activity. The ISP does not operate a carrier-grade NAT. Even if multiple IoT devices are hosted at an ISP subscriber, we count the hosting subscriber only once. Thus, the number of subscribers that host a given IoT device is a lower bound for the number of the given IoT devices in the premises of ISP subscribers. Figures 11(a) and 11(b) focus on hourly and daily summaries, respectively. Since the top IoT devices detected are Alexa Enabled and Samsung IoT, we show them separately. We see IoT-related activity for roughly 20% of the subscriber lines. Our results show a significant penetration of Alexa Enabled devices of roughly 14%. This is slightly more than estimates from national surveys in the country where the ISP operates, stating that the market penetration of Alexa Enabled devices, as of June 2019, is around 12% [27–29]. Yet, these reports cannot capture which devices are in active use on any particular day, e.g., in Nov. 2019, contrary to our study. Note, in Figures 11, 12, 14, and 15 we apply our methodology on each time bin independently.

Figure 13: ISP: Cumulative count per hour of subscriber lines with detected Amazon and Samsung IoT devices.
Daily patterns of IoT related activity.
By looking at the hourly plots in Figure 11(a), we see significant daily patterns for Alexa Enabled and Samsung IoT devices. We do not see diurnal patterns for the other 32 IoT device types. Such diurnal patterns are correlated with human activities. Typically, during the day, network activity increases as the users interact with the IoT devices, while it decreases during the night when the devices are idle. As detection likelihood is correlated with network activity, the devices' detectability also correlates with this diurnal pattern. We note that the patterns for Alexa Enabled devices do not differ much from those for Samsung. The reason is that many of the Alexa Enabled and Samsung IoT devices (Samsung TVs) may be used more for entertainment, which is why their activity is higher in the evenings. Samsung IoT devices have a small spike in the mornings before gradually reaching their peak around 18:00 (ISP timezone). For a drill-down for Samsung IoT devices, see Figure 12. Even with the presence of a diurnal variation for Alexa Enabled devices, there is a significant baseline during the night. This is expected, as IoT devices often generate traffic even when they are idle and are thus detectable. Over the course of a day, the diurnal variation is rather low compared with the typical network activity driven by human activity. This explains the low variance of the observed number of subscriber lines for Alexa Enabled devices.
Aggregation per day.
Figure 14: ISP: Drill-down of IoT activity for 32 different IoT device types with their popularity in the ISP's country.

We observed in Section 5 that, while it is often possible to detect Alexa Enabled devices within an hour, the same is not always true for Samsung IoT devices. Therefore, Figure 11(b) reports the same data, but this time using an aggregation period of a day. We see that the total number of observed subscriber lines does not change drastically from day to day. However, we also note that the number of subscriber lines with Alexa Enabled devices roughly doubled, while those with Samsung increased by a factor of 6. The reason is that detecting Samsung IoT devices is more challenging because they contact their Primary domain less frequently than Alexa Enabled devices. Thus, their detection is heavily helped by the increase in the observation time period. For the other IoT devices we see the same effects, whereby the increase is correlated with the expected time for detection. Note, certain Samsung domains are contacted by both Samsung IoT and non-IoT devices. In our analysis, we only consider domains that are exclusively contacted by
Samsung IoT devices. By adding those domains, the number of detected Samsung devices would increase at least by a factor of two, but this would also add false positives to our results.
Detecting specific devices.
So far, we have focused on the superclasses of Alexa Enabled and Samsung IoT devices. However, by adding more specialized features, our methodology allows us to further differentiate them. For example, some subsets of domains are only contacted by specific products. Thus, in Figure 12 we show which fraction of the Alexa Enabled IoT devices are confirmed Amazon products and which fraction of these are Fire TVs, using a conservative detection threshold of 0.4. For Samsung IoT devices, we show how many of them are Samsung TVs. (Most subscriber lines are not subject to new address assignments within a day. Most addresses remain stable as the ISP offers VoIP services.) Again, the
Figure 15: IXP: Number of Samsung IoT, Alexa Enabled, and Other 32 IoT device type IPs observed per day.
number of subscriber lines with such IoT devices is quite constant across days. As expected, the specialized devices only account for a fraction of the devices of both manufacturers.

Figure 16: IXP: ECDF of the per-ASN percentage of IoT activity.
Subscriber line churn.
While the ISP's overall churn of subscriber line identifiers is pretty low (as was also confirmed by the ISP operator), some changes are possible and may bias our results. Possible reasons for such changes are: unplugging/rebooting of the home router, regional outages, or daily re-assignment of IPs for privacy reasons. Yet, as most IoT devices are detectable within a day (recall Section 5), the churn should not bias our results. Still, to check for such artifacts, we move to larger time windows: see the upper panel of Figure 13, which plots the cumulative number of subscriber lines with detected Alexa Enabled and Samsung IoT devices, respectively, for up to two weeks. Here, we see that the fractions increase. However, we may have substantial double counting due to identifier rotation. To underline this conclusion, we consider penetration at the /24 prefix aggregation level; see the lower panel of Figure 13. The penetration lines stabilize smoothly, but at different levels and with different speeds. The latter is related to the popularity of an IoT device. If it is already popular, the likelihood of moving from a known to an unknown subscriber line identifier is lower with respect to less popular IoT devices.
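The /24 aggregation used above to bound double counting can be sketched as follows; the helper names are ours, and the subscriber identifiers are illustrative addresses:

```python
import ipaddress

def prefix24(ip: str) -> str:
    """Aggregate a subscriber address to its /24 prefix, which dampens
    double counting caused by re-assignment within the same prefix."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def cumulative_counts(daily_subscribers, aggregate=False):
    """Cumulative number of distinct subscriber identifiers (or, with
    aggregate=True, distinct /24 prefixes) seen up to each day."""
    seen, out = set(), []
    for day in daily_subscribers:
        for sub in day:
            seen.add(prefix24(sub) if aggregate else sub)
        out.append(len(seen))
    return out
```

If an identifier rotates within its /24, the per-line cumulative curve keeps growing while the /24 curve stabilizes, which is exactly the contrast between the upper and lower panels of Figure 13.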
Detecting other IoT devices in-the-wild.
Figure 14 reports the detected number of the IoT devices that are neither Alexa Enabled nor Samsung IoT. We report them using a heatmap, where each column corresponds to a day and each row to an IoT device annotated with its detection level. The color of each entry shows the number of subscriber lines during that day. Our first observation is that the number of subscriber lines for each device class is very stable across the duration of our study. Next, we point out that our experiments include popular devices from both the European as well as the US market. For reference, we report the relative popularity of each IoT device in the Amazon ranking for that device, in the country where the ISP operates. If a ranking of a device is not available, we categorize it as "other." Popular devices are more prominent than unpopular ones or the ones that are not available in the country's market. For example, on the one hand, there are Philips devices that are popular and in heavy use, with more than 100K subscriber lines on a daily basis. On the other hand, there is the Microseven camera, which is not in the country's market. Yet, we can still observe some deployments. These results highlight that our methodology is able to detect both popular and unpopular IoT devices when the domains and associated service IPs that the IoT devices visit can be extracted.
Next, we apply our detection methodology at the IXP vantage point. Here, we have to tackle a few additional challenges. First, the sampling rate at the IXP is an order of magnitude lower than at the ISP. Second, the vantage point is in the middle of the network, which means that we have to deal with routing asymmetry and partial visibility of the routes. Third, while the ISP does aggressive spoofing prevention, e.g., with reverse path filtering, this is not possible at the IXP. Spoofing prevention is the responsibility of individual IXP members. Thus, for TCP traffic we require to see at least one packet without flags, indicating that a TCP connection was successfully established. While this may reduce visibility, it prevents us from over-estimating the presence of IoT traffic. While the IXP offers network connectivity for many ASes, only a few member ASes are large eyeballs [30]. It is thus not surprising that we did not observe any activity of the ground truth experiment; recall Section 3. Still, we are able to detect significant IoT activity. Figure 15 shows the number of IPs for which we detected IoT activity per day for our two-week study period (November 15th-28th, 2019). We are able to detect roughly 90K Samsung devices, 200K Alexa Enabled devices, and more than 100K other IoT devices. This underlines that our methodology, which is based on domains and generalized observations from a single subscriber line, is successful. Most IXP members are non-eyeball networks. As such, we expect the detected IoT activity to be concentrated on a small number of members. Figure 16 shows an ECDF of the distribution of IoT activity per AS for one day (November 15th, 2019) and three IoT device classes, namely Samsung IoT, Alexa Enabled, and the other IoT devices. The distributions are all skewed: a small number of member ASes are responsible for a large fraction of the IoT activity. Manual checks showed that these are all eyeball ASes. Yet, we also see a fairly long tail.
This underlines that some IoT devices may not only be used at home (and thus send their traffic via a non-eyeball AS).
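A sketch of the TCP filter applied at the IXP, under the assumption that "a packet without flags" means a sampled NetFlow record carrying none of SYN/FIN/RST (the flag bit values below follow the TCP header encoding; the record layout is ours):

```python
# TCP header flag bits.
FIN, SYN, RST = 0x01, 0x02, 0x04

def established_flows(records):
    """Keep 5-tuples for which at least one sampled TCP record carries
    none of SYN/FIN/RST, taken as evidence that the connection completed
    its handshake; this filters out spoofed SYN floods that would
    otherwise inflate the apparent IoT traffic."""
    keep = set()
    for five_tuple, tcp_flags in records:
        if not tcp_flags & (SYN | FIN | RST):
            keep.add(five_tuple)
    return keep
```

A pure SYN record (flags 0x02) is thus not enough to count a flow, while a data or ACK record (e.g., flags 0x10) is.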
Figure 17: Home-VP/GT Household: Single Alexa Enabled device.
Figure 18: ISP: Subscriber lines with Alexa Enabled devices (daily: active and idle; hourly: active and idle; hourly: active).
A natural question is whether sampled flow data also allows one to distinguish if an IoT device is in active use. Our results indicate that the answer is positive. First, our ground truth experiments show that for some devices, the domain sets used during the idle experiments differ from those during the active experiments. Hence, we can use these domains to determine the mode (active/idle) of an IoT device. Second, the amount of traffic also varies depending on the mode. To highlight this, Figure 17 shows the number of observed packets for a single Alexa Enabled device at the Home-VP, as well as at the ISP-VP, for both modes. Activities cause spikes above 1K packets at the home vantage point and above 10 at the ISP-VP. These ranges are never reached during the idle experiments. When using the first insight for, e.g., devices from TP-Link (TP-link Dev.), we are able to capture active use for only 3.5% of the devices. The reason is that these are plugs, whose total traffic volume is so low that it limits the detectability due to the low sampling rate at the ISP. When using the second insight for Alexa Enabled devices, we find that we can detect significant activity. Figure 18 shows the subscriber lines with Alexa Enabled devices per hour and per day, as well as the subscriber lines with active Alexa Enabled devices. Based on the above-mentioned observations, we use a threshold of 10 for the packet count per hour to filter out subscribers that actively used Alexa Enabled devices in a given hour. Based on this threshold, we see that the number of actively used devices reaches 27,000 during the day and on weekends (November 23rd-24th, 2019), following the diurnal pattern of human activity. The ability to distinguish active from idle usage of IoT devices in the wild may raise ethical/privacy concerns. However, the goal of this paper is not to analyze user behavior, but rather to point out the privacy concerns associated with having these IoT devices at home [3].
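The per-hour activity filter described above can be sketched as follows; the function and variable names are illustrative:

```python
def active_subscribers(pkts_per_subscriber_hour: dict, threshold: int = 10) -> set:
    """Subscriber lines whose sampled per-hour packet count toward the
    device's domains exceeds the activity threshold (10 at the ISP-VP,
    per the ground truth experiments)."""
    return {sub for sub, pkts in pkts_per_subscriber_hour.items()
            if pkts > threshold}
```

A line with a sampled count of exactly 10 packets in an hour is treated as idle here, since the ground truth spikes observed for active use were strictly above that level.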
The ability to detect IoT services can be used in a constructive manner, or even offered as a service by ISPs. For example, if there are known security problems with an IoT device, the ISP/IXP can block access to certain domains/IP ranges or redirect their traffic to benign servers. The methodology can also be used for troubleshooting, incident investigation, and even incident resolution. For example, an ISP can use our methodology to redirect the IoT devices' traffic to a new backend infrastructure that offers privacy notices or security patches for devices that are no longer supported by their manufacturers.

Moreover, if an IoT device is misbehaving, e.g., if it is involved in network attacks or is part of a botnet [31], our methodology can help the ISP/IXP identify which devices are common among the subscriber lines with suspicious traffic. Once identified, their owners can be notified in a similar manner, as suggested by [32], and it may be possible to block the attack or the botnet control traffic [33].
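The step of finding which devices are common among suspicious subscriber lines can be sketched as a frequency count over per-line detections. The device labels and subscriber identifiers below are hypothetical; the per-subscriber detection sets are assumed to come from the domain-signature matching described earlier.

```python
from collections import Counter

def suspect_devices(detections, suspicious_lines):
    """Rank device labels by how many flagged subscriber lines host them.

    detections:       dict subscriber_id -> set of detected device labels
    suspicious_lines: subscriber ids exhibiting attack/botnet traffic
    """
    counts = Counter()
    for sub in suspicious_lines:
        counts.update(detections.get(sub, set()))
    return counts.most_common()

# Hypothetical detections: the camera appears on both flagged lines,
# the plug on only one, so the camera ranks first as a common factor.
detections = {
    "s1": {"vendorA-cam", "vendorB-plug"},
    "s2": {"vendorA-cam"},
    "s3": {"vendorB-plug"},
}
print(suspect_devices(detections, {"s1", "s2"}))
```

In practice one would also compare these counts against each device's base rate across all subscriber lines, so that merely popular devices are not flagged as culprits.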
Our methodology has some limitations.
Sample devices.
We need to have sample devices in order to observe which domains are being contacted.
Superclass detection.
We mostly check for false negatives and only to a limited extent for false positives, as we only have traffic samples from a subset of IoT devices, not from all possible IoT devices. If an IoT device relies on a shared backend infrastructure or common IoT APIs, we only detect the superclass, e.g., at the manufacturer level.
Network activity.
We rely on the network activity of IoT devices. As such, if the traffic volume is very low, detectability decreases and detection time increases.
Shared infrastructures.
We cannot detect IoT services that rely on shared infrastructures. If the IoT devices change their backend infrastructure, e.g., after an update, we may have to update our detection rules too.
Our analysis could be simplified if an ISP/IXP had access to all DNS queries and responses, as in [34] and [26]. Even a partial list, e.g., from the local DNS resolver of the ISP, could improve our methodology. Yet, this raises many privacy challenges. An increasing number of end-users rely on technologies like DNS over TLS [35], or on public DNS resolvers, e.g., Google DNS, OpenDNS, or Cloudflare DNS, rather than the local ISP DNS server [36]. This also points to another potential privacy issue: the global data collection and analysis engines at these DNS operators can identify IoT devices at scale from the recorded DNS logs using our insights. Capturing DNS data from the network itself would require deep packet inspection and, thus, specialized packet capture, which is beyond the scope of this paper.

The subscriber or device detection speed varies depending not only on the device and its traffic intensity, but also on the traffic capture sampling rate. The lower this rate, the more time it may take to detect a specific IoT device. Moreover, identifying the relevant domains for each IoT device requires sanitization, which may involve manual work, e.g., studying manuals, device documentation, vendor web sites, or even programming APIs. Given that we are unable to identify IoT services that use shared infrastructures (e.g., CDNs), this also points out a good way to hide IoT services.

We can use our insights to develop signatures that allow an ISP to identify households that use specific IoT services. If such services are, e.g., subject to security concerns, the ISP can use such signatures to notify the corresponding customers of the potential problem and its fix. This is also possible if the IoT service is no longer supported or needs manual end-user upgrades, e.g., to mitigate threats. Such signatures may also be used to move from mitigating DDoS attacks towards identifying culprits.
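The qualitative claim that lower sampling rates stretch detection time can be made concrete with a back-of-the-envelope model: if a device emits r packets per hour toward its signature domains and the capture samples each packet independently with probability p, the wait until the first sampled packet is geometric with mean 1/(p·r) hours. This model and the example numbers are our illustration, not figures from the measurements.

```python
def expected_detection_hours(pkts_per_hour, sampling_rate):
    """Mean hours until the first packet of a device is sampled,
    assuming independent per-packet sampling (geometric waiting time)."""
    return 1.0 / (pkts_per_hour * sampling_rate)

# A chatty device (1000 pkts/h) vs. a quiet smart plug (10 pkts/h)
# under hypothetical 1-out-of-10000 packet sampling:
print(expected_detection_hours(1000, 1 / 10000))  # ~10 hours
print(expected_detection_hours(10, 1 / 10000))    # ~1000 hours
```

The two-orders-of-magnitude gap illustrates why low-volume devices such as smart plugs are hard to detect quickly at ISP sampling rates, while chatty devices surface within hours.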
Our approach is potentially scalable further using MUD profiles [37], where devices signal to the network which domains, access, and network functionality they require to function properly. It is also possible to extend the list of signatures of IoT devices using crowdsourcing [38].
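A MUD profile (RFC 8520) is a YANG-modeled JSON document that can, among other things, whitelist the DNS names a device needs. The fragment below is a hand-written illustration of extracting those names; the profile content is invented, and the key names follow our reading of RFC 8520 (the `ietf-acldns:dst-dnsname` match) and should be checked against the specification.

```python
import json

# Minimal, hand-crafted MUD-like profile (illustrative, not from a vendor).
MUD_PROFILE = json.loads("""
{
  "ietf-access-control-list:acls": {
    "acl": [{
      "name": "from-device",
      "aces": {"ace": [
        {"name": "cloud",
         "matches": {"ietf-acldns:dst-dnsname": "iot.example-vendor.com"}},
        {"name": "ntp",
         "matches": {"ietf-acldns:dst-dnsname": "ntp.example-vendor.com"}}
      ]}
    }]
  }
}
""")

def allowed_domains(profile):
    """Collect every destination DNS name the profile permits."""
    domains = set()
    for acl in profile["ietf-access-control-list:acls"]["acl"]:
        for ace in acl["aces"]["ace"]:
            name = ace["matches"].get("ietf-acldns:dst-dnsname")
            if name:
                domains.add(name)
    return domains

print(sorted(allowed_domains(MUD_PROFILE)))
```

Such declared domain lists would complement our empirically derived signatures: the network could match sampled flows against the domains a device announces instead of domains we observe in testbeds.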
There have been several recent papers on understanding home IoT traffic patterns and identifying devices based on their signatures, trackers, and network traffic [39]. These approaches often rely on testbed data [4, 40], or on tools for the active discovery of household devices and their network traffic [41]. The authors in [40] use a broad range of network features from packet captures, including domain names, to train a machine learning model and detect IoT devices in a lab environment. However, they do not further study the backend infrastructure supporting IoT devices. There have also been a few early attempts at mitigating such device discovery using traffic padding [42] or blocking techniques [33]. A number of recent efforts focused on inferring IoT device types from network traffic [6, 43]. In [15], the authors used instrumented home gateways to look at IoT traces from over 200 households in a US city. Their analysis revealed that while the IoT space is fragmented, a few popular cloud and DNS services act as a central hub for the majority of the devices and their data.

Generally, many IoT devices periodically connect to specific servers on the Internet. The authors in [26] and [34] proposed a method to identify IoT devices by observing passive DNS traffic and the unique IP addresses that a device connects to. Unfortunately, many IoT devices rely on shared infrastructures, and often different IoT devices from the same vendor connect to the same servers; therefore, detection at the scale of an ISP/IXP based on IP addresses and port numbers, without considering the important role of shared infrastructures, cannot be very reliable.

Complementing the approaches based on testbeds and home gateways, there have been efforts to understand IoT traffic patterns using data from transit networks [44], though it has been challenging to successfully validate the derived signatures.
Similar works relied on specific port numbers [45], which may also be used for specialized industrial IoT systems [46], though the approach used cannot be easily extended to general-purpose IoT devices and smart home systems that utilize popular ports, e.g., 443 and 80.

These related works indicate that often neither data from core networks, subject to sampling and middleboxes, nor data from a few devices using home gateways or testbeds are enough for rapidly and accurately detecting IoT devices and understanding their anomalies and misconfigurations [10].

In this paper, for the first time, we have complemented detailed ground truth data from testbeds and a particular subscriber with large-scale data from an ISP and an IXP, to reveal the aggregate behavior of these devices, alongside the ability to isolate and identify specific subscriber devices using sampled data at an ISP.
Home IoT devices are already popular, and their usage is expected to grow further. Thus, we need to track their deployment without deep packet inspection or active measurements, both of which are intrusive and unscalable for large deployments. Our insight is that many IoT devices contact a small number of domains, and, thus, it is possible to detect such devices at scale from sampled network flow measurements in very large networks, even when they are in idle mode. We show that our method is able to detect millions of such devices in a large ISP and at an IXP that connects hundreds of networks.

Our technique is able to detect 4 IoT platforms, 20 manufacturers, and 11 products (both popular and less popular ones) at the vendor level, and in many cases even at product granularity. While this detection may be useful to understand the penetration of IoT devices at home, it raises concerns about the general detectability of such devices and the corresponding human activity.

In light of our alarming observations, as part of our future work, we would like to investigate how to minimize the harm of potential attacks and surveillance using IoT devices. We also want to use our insights to help ISPs tackle security and performance problems caused by IoT devices, e.g., by detecting them, redirecting their traffic, or blocking their traffic.
ACKNOWLEDGEMENTS
We thank the anonymous reviewers and our shepherd Kensuke Fukuda for their constructive feedback. This work was supported in part by the European Research Council (ERC) Starting Grant ResolutioNet (ERC-StG-679158), the EPSRC Defence Against Dark Artefacts project (EP/R03351X/1), the EPSRC Databox project (EP/N028260/1), and the NSF (CNS-1909020).
REFERENCES
[1] IoT Analytics. IoT 2019 in Review: The 10 Most Relevant IoT Developments of the Year. https://iot-analytics.com/iot-2019-in-review/, 2020.
[2] S. Greengard. Deep Insecurities: The Internet of Things Shifts Technology Risk. Comm. of the ACM, 62(5), 2019.
[3] D. J. Dubois, R. Kolcun, A. M. Mandalari, M. T. Paracha, D. Choffnes, and H. Haddadi. When Speakers Are All Ears: Characterizing Misactivations of IoT Smart Speakers. In Privacy Enhancing Technologies Symposium (PETS), 2020.
[4] J. Ren, D. J. Dubois, D. Choffnes, A. M. Mandalari, R. Kolcun, and H. Haddadi. Information Exposure From Consumer IoT Devices: A Multidimensional, Network-Informed Measurement Approach. In ACM IMC, 2019.
[5] L. F. DeKoven, A. Randall, A. Mirian, G. Akiwate, A. Blume, L. K. Saul, A. Schulman, G. M. Voelker, and S. Savage. Measuring Security Practices and How They Impact Security. In ACM IMC, 2019.
[6] S. Marchal, M. Miettinen, T. D. Nguyen, A.-R. Sadeghi, and N. Asokan. AUDI: Towards Autonomous IoT Device-Type Identification using Periodic Communication. IEEE Journal on Sel. Areas in Comm., 37(6), 2019.
[7] D. Kumar, K. Shen, B. Case, D. Garg, G. Alperovich, D. Kuznetsov, R. Gupta, and Z. Durumeric. All Things Considered: An Analysis of IoT Devices on Home Networks. In USENIX Security Symposium, 2019.
[8] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-Wide Scanning and its Security Applications. In USENIX Security Symposium, 2013.
[9] Z. Durumeric, D. Adrian, A. Mirian, M. Bailey, and J. A. Halderman. A Search Engine Backed by Internet-Wide Scanning. In ACM CCS, 2015.
[10] H. Haddadi, V. Christophides, R. Teixeira, K. Cho, S. Suzuki, and A. Perrig. Siotome: An Edge-ISP Collaborative Architecture for IoT Security. In , 2018.
[11] B. Claise. RFC 3954: Cisco Systems NetFlow Services Export Version 9, 2004.
[12] B. Claise, B. Trammell, and P. Aitken. RFC 7011: Specification of the IPFIX Protocol for the Exchange of Flow Information, 2013.
[13] Amazon. Alexa Voice Service Endpoints (accessed 2019-11). https://developer.amazon.com/en-US/docs/alexa/alexa-voice-service/api-overview.html.
[14] Proceedings of the 13th International Workshop on Automation of Software Test, 2018.
[15] M. Hammad Mazhar and Z. Shafiq. Characterizing Smart Home IoT Traffic in the Wild. In ACM/IEEE Conference on Internet of Things Design and Implementation.
ACM IMC, 2018.
[18] F. Weimer. Passive DNS Replication. In , 2005.
[19] Amazon AWS. What is Amazon VPC? (accessed 2019-11). https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html.
[20] Amazon AWS. Public IPv4 addresses and external DNS hostnames (accessed 2019-11). https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html.
IEEE European Symposium of Security and Privacy.
ACM IMC, 2010.
[31] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou. Understanding the Mirai Botnet. In USENIX Security Symposium, 2017.
[32] O. Çetin, C. Gañán, L. Altena, T. Kasama, D. Inoue, K. Tamiya, Y. Tie, K. Yoshioka, and M. van Eeten. Cleaning Up the Internet of Evil Things: Real-World Evidence on ISP and Consumer Efforts to Remove Mirai. In NDSS, 2019.
[33] A. M. Mandalari, R. Kolcun, H. Haddadi, D. J. Dubois, and D. Choffnes. Towards Automatic Identification and Blocking of Non-Critical IoT Traffic Destinations. In IEEE S&P Workshop on Technology and Consumer Protection, 2020.
[34] H. Guo and J. Heidemann. Detecting IoT Devices in the Internet. IEEE/ACM Transactions on Networking, 2020. [to appear].
[35] Google. DNS-over-TLS. https://developers.google.com/speed/public-dns/docs/dns-over-tls, 2020.
[36] F. Chen, R. K. Sitaraman, and M. Torres. End-User Mapping: Next Generation Request Routing for Content Delivery. In ACM SIGCOMM, 2015.
[37] E. Lear, R. Droms, and D. Romascanu. RFC 8520: Manufacturer Usage Description Specification, 2019.
[38] D. A. Popescu, V. Safronov, P. Yadav, R. Kolcun, A. M. Mandalari, H. Haddadi, D. McAuley, and R. Mortier. Sensing the IoT Network: Ethical Capture of Domestic IoT Network Traffic: Poster Abstract. In ACM SenSys Posters, 2019.
[39] N. Apthorpe, D. Reisman, and N. Feamster. A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic. Data and Algorithmic Transparency Workshop, 2016.
[40] A. Sivanathan, H. H. Gharakheili, F. Loi, A. Radford, C. Wijenayake, A. Vishwanath, and V. Sivaraman. Classifying IoT Devices in Smart Environments Using Network Traffic Characteristics. IEEE Transactions on Mobile Computing, 18(8), 2019.
[41] D. Y. Huang, N. Apthorpe, G. Acar, F. Li, and N. Feamster. IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale. In ACM IMWUT / UbiComp, 2020.
[42] N. Apthorpe, D. Y. Huang, D. Reisman, A. Narayanan, and N. Feamster. Keeping the Smart Home Private with Smart(er) IoT Traffic Shaping. Proceedings on Privacy Enhancing Technologies, 2019.
[43] A. Sivanathan, H. H. Gharakheili, and V. Sivaraman. Inferring IoT Device Types from Network Behavior Using Unsupervised Clustering. In IEEE Conference on Local Computer Networks (LCN), 2019.
[44] G. Hu and K. Fukuda. Toward Detecting IoT Device Traffic in Transit Networks. In International Conference on Artificial Intelligence in Information and Communication (ICAIIC), 2020.
[45] A. Sivanathan, H. H. Gharakheili, and V. Sivaraman. Can We Classify an IoT Device using TCP Port Scan? In , pages 1–4, 2018.
[46] M. Nawrocki, T. C. Schmidt, and M. Wählisch. Uncovering Vulnerable Industrial Control Systems from the Internet Core. In IEEE/IFIP Network Operations and Management Symposium (NOMS), 2020.