PFirewall: Semantics-Aware Customizable Data Flow Control for Home Automation Systems
Haotian Chi∗, Qiang Zeng†, Xiaojiang Du∗, Lannan Luo†
∗Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
†Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
Email: {htchi, dux}@temple.edu, {zeng1, lluo}@cse.sc.edu

Abstract: Emerging Internet of Things (IoT) platforms provide a convenient solution for integrating heterogeneous IoT devices and deploying home automation applications. However, serious privacy threats arise as device data now flow out to the IoT platforms, which may be subject to various attacks. We observe two privacy-unfriendly practices in emerging home automation systems: first, the majority of data flowing to the platform are superfluous in the sense that they do not trigger any home automation; second, home owners currently have nearly zero control over their data. We present PFirewall, a customizable data-flow control system to enhance user privacy. PFirewall analyzes the automation apps to extract their semantics, which are automatically transformed into data-minimization policies; these policies only send minimized data flows to the platform for app execution, such that the ability of attackers to infer user privacy is significantly impaired. In addition, PFirewall provides capabilities and interfaces for users to define and enforce customizable policies based on individual privacy preferences. PFirewall adopts an elegant man-in-the-middle design, transparently executing data-minimization and user-defined policies to process raw data flows and mediating the processed data between IoT devices and the platform (via the hub), without requiring modifications of the platform or IoT devices. We implement PFirewall to work with two popular platforms, SmartThings and openHAB, and set up two real-world testbeds to evaluate its performance. The evaluation results show that PFirewall is very effective: it reduces IoT data sent to the platform by 97% and enforces user-defined policies successfully.
I. INTRODUCTION
With the prosperity of the Internet of Things (IoT), smart systems (e.g., smart homes, factories, and hospitals) have become realistic and are expanding at an ever-increasing speed [1]. IoT platforms, such as SmartThings, Wink, and openHAB, allow smart home users to connect heterogeneous IoT devices (e.g., sensors, actuators, appliances) to a platform-provided hub and to install applications on the platform to create automatic interactions among devices, i.e., home automation.

As IoT device data flow to the platform, protecting user privacy becomes critical [2], [3]. Existing work protects user privacy by resolving threats caused by malicious automation applications [4], [5], [6], [7] or handling attacks that eavesdrop on IoT device traffic [8], [9], [10], [11]. Surprisingly, no prior work investigates privacy protection at the platform architectural level, even though the platform receives huge amounts of data from smart homes and has full data access privileges. Indeed, it is baseless to assume the platform is secure and trustworthy. A platform could be compromised by both inside attackers [12] and remote attackers that exploit the vulnerabilities of its hub and cloud [13]. Compared to clouds that have suffered many notorious attacks, an IoT platform has a much larger attack surface involving not only its cloud but also the hub and user control interfaces (e.g., web and mobile app). Moreover, many IoT platforms share users' data with partners (e.g., advertisers) for the expansion of businesses [14], [15], [16]; any improper protection may leak private data to third parties.

An earlier version of this paper was submitted to USENIX Security on November 15th, 2018. This version contains some minor modifications based on that submission.

Our investigation of popular smart home platforms shows that these platforms are in fact overprivileged to access real-time data streams from connected devices, although most of the data do not trigger any automation. This deviates from the principle of "data minimisation" in the European General Data Protection Regulation (GDPR) [17] and of "least privilege" in access control systems [18]. We also find that no capabilities are provided for users to control the leakage of private device data to the platform, failing to realize user-centric authorization. Therefore, our goals are to minimize the data sent to the platform and to allow users to define customizable data flow control policies for individual privacy preferences.

Multiple challenges arise in attaining these goals. First, the data minimization should not adversely affect the functionality of home automation. We observe that the semantics of home automation apps can be represented as rules, each following an event-condition-action model, and that state-of-the-art code analysis techniques [19], [20], [7] have proved effective in extracting rule semantics from apps.
Our insight is that by finding the minimum data flows required by these rule semantics, we can properly generate and enforce data flow control policies without affecting home automation. For example, suppose a rule has the semantics "when a motion is detected, if the indoor temperature is higher than 79°F, turn on the A/C". We can convert it into a data-minimization policy, such that if the indoor temperature is not higher than 79°F, no data is sent to the platform; besides, if the A/C is already on (that is, the rule execution does not change anything), no data is sent even if the temperature is higher than 79°F. Optionally, users can have the system fuzz the data, such that even if the policy execution determines that the temperature should be sent, a random value larger than 79 is reported.

Second, many platforms are closed systems that do not allow platform-level modifications, and it is probably unrealistic to expect a platform to cooperate in enforcing data minimization. Thus, how to enforce data-protection policies before data leave the home network is a challenge. Intuitively, one may propose to circumvent this challenge by building a new purely local platform, such that no data have to flow out of a home; or, one can simply cut the network cable of a local gateway [21] and enforce most of the home automation locally. However, a large number of existing platforms have already been deployed in homes, and it might be infeasible to convince users to switch to a new platform they are not familiar with; moreover, a purely local platform means that many highly desired Internet-based services (e.g., messaging, storage, and remote management) will be cut off. Therefore, enforcing data protection on the existing platform architecture without sacrificing the value of Internet-based services imposes extra difficulties.

We incorporate multiple system-building ideas into our system, named PFirewall. First, we build PFirewall as a data mediator, which sits between IoT devices and the hub to transparently filter data based on privacy-protection policies. The advantage is that neither the IoT devices nor the platform needs to be modified. A resulting challenge is that the original communication between IoT devices and the hub is encrypted, which prevents PFirewall from understanding and then filtering data. We overcome this difficulty with a man-in-the-middle approach: the data mediator presents itself as a hub to pair with all the devices, and meanwhile it creates the same number of virtual devices to connect to the hub.

Furthermore, we borrow the idea of a DMZ (demilitarized zone) when designing PFirewall. A DMZ exposes certain external-facing services (e.g., web) to the Internet, while the organization's local area network (LAN) is segregated by a firewall. This way, even if a node in the DMZ is compromised, attackers need to bypass the firewall to reach the LAN. We propose to place the hub in a DMZ and to set up an extremely simple firewall between the DMZ and PFirewall: the external world cannot initiate a connection to PFirewall, and any inbound traffic, unless it targets the virtual devices, should be discarded immediately by PFirewall.

We demonstrate the ideas by implementing PFirewall to work with two representative platforms: Samsung SmartThings and openHAB, which are among the most popular cloud-based and gateway-based IoT platforms, respectively. We evaluate PFirewall in two real-world deployments. The results suggest that PFirewall reduces the amount of data sent to the platform by 97% based on data-minimization policies. Our case study shows that the data reduction heavily impairs the attacker's ability to infer privacy-sensitive behaviors, e.g., bathroom usage and the arrival and departure times of home members. The user-specified policies provide extra fine-grained data control to accommodate personalized privacy preferences and concerns.

The contributions of this work are summarized as follows.
• We reveal the fact that most smart home platforms employ a simple trust-by-default model between home devices and the platforms, resulting in over-leakage of sensitive IoT device data. We find several channels through which the collected data could be revealed, demonstrating the severe privacy risks. Despite the clear need for user-centric data flow control, we find that most leading platforms do not support it.
• We design an effective data flow control system to enhance user privacy in home automation. On one hand, data-minimization policies are automatically generated based on the installed automation apps, reporting the minimally necessary data for app execution and obfuscating the reported data for further protection. On the other hand, users are offered capabilities to prioritize policies specified by themselves to customize data flow control for individual privacy preferences and concerns.
• We design a man-in-the-middle style enforcement mechanism for closed-source smart home systems. A proxy device mediates the communication between IoT devices and the hub, without modifying the devices or the hub.
• We implement a proof-of-concept prototype to work with two platforms: SmartThings and openHAB. Through the evaluation in two real-world scenarios, a two-bedroom apartment and a public workplace, we demonstrate that our system significantly reduces the privacy risks due to data leakage and introduces negligible latency to home automation. A user study is conducted to learn users' attitudes and capabilities towards defining privacy-protection policies with mobile interfaces.

Fig. 1: Smart home platform architecture. (Diagram labels: IoT devices, hub, core framework, messaging/storage, Internet, and companion and third-party apps, shown for a cloud-based platform and a gateway-based platform.)

II. BACKGROUND: SMART HOME PLATFORMS
Smart home platforms can be categorized into two types: cloud-based platforms (CBPs) and gateway/hub-based platforms (GBPs), according to whether the core framework of a platform is hosted in a remote cloud or on a gateway/hub device located at home (as shown in Fig. 1); the two types are otherwise similar. Note that a gateway running a core framework at home does not resolve the privacy leakage threats, as the gateway connects to the Internet and is under the full control of the platform administrator. Once the platform is compromised, the attacker gains equivalent capabilities to access user data. We choose a CBP, SmartThings, one of the most popular and full-fledged platforms, as an example to describe the key components in a smart home system.
• Hub. A CBP hub connects IoT devices through distinct short/medium-range wireless radios (ZigBee, Z-Wave, etc.). The hub plays a key role in ensuring the interconnectivity and interoperability of heterogeneous IoT devices. A GBP also has a hub-like device, which not only connects IoT devices but also hosts the core framework (described below). Note that the hub or gateway device, though physically located at home, is conceptually regarded as a part of the platform in terms of data privacy protection, in that it is under the full control of the platform administrator. We use the term gateway for the GBP device to distinguish it from a CBP hub.
• Cloud. The backend cloud of a CBP hosts the core framework and provides cloud messaging, storage, and any other services necessary for the platform to function. The cloud in a GBP is typically responsible for messaging and storage. The cloud messaging service facilitates some critical functionalities, such as notification, third-party application integration, and remote monitoring and control. Many Internet-based services depend on the cloud.
• Core Framework. The core framework runs the major functionalities of a platform, including home automation. Take SmartThings as an example. It provides a sandboxed runtime environment for running device handlers and SmartApps. Device handlers are software wrappers of physical devices, which abstract the physical devices as a set of capabilities and handle the underlying protocol-specific communications between the core framework and the physical devices. They expose uniform interfaces for SmartApps to interact with devices.
• Companion and Third-Party Apps. To provide a convenient user interface (UI) for users to manage their hubs, IoT devices, and apps, a platform usually provides a smartphone companion app. For instance, in the SmartThings companion app, users can install and configure a SmartApp. Current platforms also expose interfaces (mostly RESTful cloud APIs) to incorporate third-party services/applications (e.g., mobile apps, IFTTT [22], webCoRE [23]).

Therefore, a smart home platform has a large attack surface involving the hub, cloud, core framework services, companion app, and APIs for third parties, let alone inside attacks. It is dangerous and unnecessary for users to grant the platform unlimited trust by allowing all the data to flow to it.

III. MOTIVATION AND THREAT MODEL
In this section, we first reveal two facts we have observed, and then present the threat model.
A. Privacy Concerns about Platforms

1) Trust By Default: In smart home systems, the platforms are typically fully trusted. That is, after being installed, a platform gains the access privilege to all connected home devices, technically by design and legally by stating so in its terms and conditions or privacy policy. To reduce development complexity and the time to market, most emerging platforms do not provide access control between home devices and their hubs to avoid accessing unnecessary data; instead, they simply collect all data streams reported by devices for further processing. We studied the privacy-related practices in popular smart home platforms and present the details in Appendix A. In this section, we use SmartThings as an exemplar.
Are home data flowing out of homes silently? To answer this question, we connected four types of ZigBee devices (a multipurpose sensor, a motion sensor, an arrival sensor, and an outlet) and a Z-Wave sensor (Aeotec Multisensor 6) to a SmartThings hub and inserted log.debug code into the parse methods of the device handlers, which are used by the core framework to parse the received IoT payload and generate in-system events. In this way, we obtained all data received by the SmartThings cloud via its hub on the live logging interface [24]. We did not install any automation apps and did not operate any SmartThings-provided interfaces; we only interacted with the devices physically. We found that the platform cloud still kept receiving device attribute data (e.g., motion, switch, temperature) from the above devices, indicating that device data flow out via the hub even if they are not subscribed to or requested by any service.

This trust-by-default model introduces severe data leakage risks to smart homes, since attackers may gain unauthorized access to home data by compromising the hub device, cloud infrastructure, or the companion app [25]. Vulnerabilities in IoT platforms and clouds have been demonstrated by recent works. For instance, Fernandes et al. [18] and Zuo et al. [26] respectively revealed that the abuse of OAuth tokens and cloud API tokens in mobile apps imposes significant security and privacy threats, including unauthorized access to the platform. An inside attacker can also access all the data.

Fig. 2: The architecture of PFirewall. (Diagram labels: IoT apps; rule extractor with code analysis and configuration collection; policy manager with policy generation, conflict detection, and policy engine; data flow mediator with device takeover and virtual device manager; IoT devices; virtual devices; platform.)
2) Limited User Capabilities: User visibility and control help mitigate risks. However, users have few capabilities and interfaces to inspect or control what their devices send to the Internet [25]. They only have a binary choice: whether or not to connect a device to the platform; once connected, the device keeps reporting data to the hub device continuously and opaquely.
B. Threat Model

We consider that the platform may be exploited by attackers to access users' private data and infer privacy-sensitive user behaviors. Attacks that exploit home IoT device hardware vulnerabilities, side channels, or home local networks to steal private data are out of the scope of this work. We assume the home automation apps are not malicious (note that how to detect and handle malicious automation apps is a separate problem and has been well studied, e.g., [7], [20], [22]).

IV. PFIREWALL SYSTEM OVERVIEW
To mitigate data leakage, we propose to introduce access control before data leave the control of users. In this way, privacy-oriented metrics can be applied to give the data exposure certain privacy guarantees (i.e., data minimization in this paper), and end-user controls also become feasible to satisfy personal privacy preferences. However, it is challenging to attain these goals for the following reasons. First, the data filtering, if not carefully performed, may accidentally affect home automation. Thus, how to precisely analyze apps and convert them into privacy-protection policies correctly is a challenge (Section V-A). Second, most smart home platforms are closed systems and do not allow platform-level modifications. Moreover, the traffic between IoT devices and the hub is encrypted. How to perform the data filtering without modifying the device, hub, or platform framework is challenging (Section V-B). Third, how to provide interfaces for non-expert users to define their own privacy-protection policies is non-trivial (Section V-A2).

For interoperability, the wireless protocols in IoT devices are mostly open standard ones such as ZigBee, Z-Wave, and LAN, which makes it possible to place a man-in-the-middle device (named mediator) between IoT devices and the hub to intervene in the communication between them. On top of the mediator, it becomes possible to process the raw data flows before forwarding them to the hub. With this insight, we build PFirewall, a system that enforces carefully generated data flow control policies before data are reported to the backend platform for home automation. As shown in Fig. 2, PFirewall comprises the following modules:
• Rule Extractor extracts the home automation rules from rule creation interfaces, e.g., IoT apps, webpages, and smartphone apps. When rules are initially installed, the rule extractor obtains rule semantics and rule-device binding information. In appified IoT systems, the rule extractor comprises a code analysis component to extract rule semantics from apps and a configuration collection component to collect rule-device binding information. The rule semantics and rule-device bindings constitute the complete automation logic.
• Policy Manager generates and manages data flow policies used for protecting IoT data. Policy generation, on one hand, interacts with the rule extractor to generate semantics-based data-minimization policies; on the other hand, it takes in user-specified policies from the user interfaces and converts them into an executable format. Conflict detection inspects whether a user-specified policy conflicts with existing data-minimization policies and thus affects home automation; when conflicts are detected, it reports them to the user for a decision. The policy engine interprets and executes the above policies over the incoming raw data from IoT devices.
• Data Flow Mediator is a proxy that mediates the communication between IoT devices and the hub. The mediator, on behalf of the hub, talks with IoT devices via device-dependent protocols (e.g., ZigBee, Z-Wave, WiFi) and forwards the raw device data to the policy engine for processing. On the other hand, the mediator creates a virtual device instance on behalf of each real device to send the processed data to the hub. All virtual device instances use a uniform communication protocol supported by the target platform (e.g., LAN in SmartThings [27] and MQTT in openHAB [28]). Besides, the virtual devices receive device control commands from the hub, which are then translated into protocol-specific commands and forwarded to the corresponding real device. The data mediation is transparent to the platform, and therefore the platform works exactly the same way.

V. DESIGN AND IMPLEMENTATION
In this section, we present the detailed design and implementation of PFirewall. We choose Samsung's SmartThings, one of the most mature and comprehensive smart home platforms, as the underlying platform to describe the implementation of PFirewall. We first describe our policy generation and management for contextually controlling IoT data flows. Then, we present how we enforce policies in existing IoT systems by introducing a data flow mediator. To show the applicability of PFirewall, we also present how we integrate PFirewall with another platform, openHAB, by adapting the platform-specific components.

    TRIGGER: {
        match     (:type).(:subject).(:attribute)
        satisfy   (:operator)->(:value)
        [fetch1]  (:type).(:subject).(:attribute*)
        [branch]  (:operator1)->(:value)
        run       (:method)(:parameters)(:delay)
        [else]    (:method1)(:parameters1)(:delay1)
    }
    CHECK: [{
        fetch     (:type).(:subject).(:attribute)
        satisfy   (:operator)->(:value)
        [fetch1]  (:type).(:subject).(:attribute*)
        [branch]  (:operator)->(:value)
        run       (:method)(:parameters)
        [else]    (:method1)(:parameters1)
    }, ...]

    Listing 1: Context-aware policy format
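As a reading aid for Listing 1, the following is a minimal Python sketch of how the policy format could be held in memory. The class names, the `otherwise` field (standing in for `[else]`), and the device IDs `motion1`, `temp1`, and `ac1` are hypothetical and not part of PFirewall. The instance encodes the A/C rule from the introduction, and the choice of `block` for the A/C state is an assumption made by analogy with the paper's description.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Selector:
    """(:type).(:subject).(:attribute), e.g. device.temp1.temperature."""
    type: str
    subject: str
    attribute: str


@dataclass
class Constraint:
    """(:operator)->(:value), e.g. '>' -> 79."""
    operator: str
    value: Any


@dataclass
class Action:
    """(:method)(:parameters)(:delay); Listing 1 uses delay only in TRIGGER."""
    method: str
    parameters: Tuple[Any, ...] = ()
    delay: float = 0.0


@dataclass
class Section:
    """A TRIGGER section or one CHECK item."""
    select: Selector                      # 'match' in TRIGGER, 'fetch' in CHECK
    satisfy: Constraint
    run: Action
    fetch1: Optional[Selector] = None     # optional [fetch1]
    branch: Optional[Constraint] = None   # optional [branch]
    otherwise: Optional[Action] = None    # optional [else]


@dataclass
class Policy:
    trigger: Section
    check: List[Section] = field(default_factory=list)


# "When a motion is detected, if the indoor temperature is higher than
# 79 F, turn on the A/C", with the optional fuzzing of the temperature.
# 150 is the temperature upper bound listed in Table II.
ac_policy = Policy(
    trigger=Section(Selector("device", "motion1", "motion"),
                    Constraint("==", "active"), Action("keep")),
    check=[
        Section(Selector("device", "temp1", "temperature"),
                Constraint(">", 79), Action("randomize", (79, 150))),
        Section(Selector("device", "ac1", "switch"),
                Constraint("!=", "on"), Action("block")),  # assumed action
    ],
)
```

Keeping TRIGGER and CHECK items in one `Section` type reflects that the two parts of Listing 1 differ only in the first keyword and in the delay field.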
A. Data Flow Control Policies

1) Policy Definition and Execution: Home automation is context-aware: a rule executes a command when it is triggered by an event and meanwhile the smart home is under the prescribed condition. Note that the event and condition are slightly different: an event describes a context change (e.g., the motion sensor's reading changes from "inactive" to "active", indicating that a motion is detected), while a condition indicates a collection of static statuses (e.g., the motion sensor's latest reading is "active"). To precisely filter raw IoT data flows for data minimization without interfering with the execution of automation rules, data flows need to be processed contextually. To this end, we define a context-aware policy format.

Formally, we define a data flow policy as P = (T, C), where T and C denote the TRIGGER and CHECK sections in a policy as shown in Listing 1. TRIGGER defines the incoming event that triggers the execution of P, and CHECK encapsulates a list of items, each of which indicates a constraint that must be satisfied for the policy to indeed perform actions. type indicates whether the event is fired by a device or is a time change, etc.; subject identifies a specific IoT device (i.e., device ID); attribute specifies the attribute of a device (which may have multiple attributes) or the time-related feature (e.g., time of day, date, timer). In TRIGGER, type, subject, and attribute are used to check if an incoming data item matches the event that triggers the policy; in CHECK, they are used to query the smart home status for constraint checking. operator and value denote a constraint that the incoming event or smart home status must satisfy for the policy context to be evaluated as true. A policy action is defined in the run fields, where method and parameters define how to process the raw data and delay controls the timing for reporting the processed data to the platform. Besides, there are three optional fields marked with "[]" that form an extended TRIGGER section or CHECK item. [fetch1] and [branch] evaluate an extra constraint on the fetched data; if it is true, the action defined in run is executed, and otherwise the action in else is executed instead.

Policies are executed by a policy engine. The policy engine listens to all the incoming raw data from the IoT devices and to time-related information if registered. When receiving a new data item D (a.k.a. an event), the engine uses D to evaluate the maintained data flow policies one by one. Algorithm 1 shows the general workflow of how the engine evaluates and executes a policy P. Specifically, it first checks if D matches the type, subject, and attribute in TRIGGER, and then examines if the value of D satisfies the constraint specified by operator and value. If true, P is triggered and proceeds to execute. Then the engine evaluates all items specified in CHECK. Since the data required for evaluating the CHECK items are not newly captured events but the current smart home status (e.g., the device working status), the policy engine fetches the information indexed by type, subject, and attribute from a database DB, which stores the latest attribute values of all connected devices and updates them when devices report any change. Only when the constraints defined in all CHECK items are satisfied is the policy finally executed, and the actions defined in the run or else fields are performed. During the above process, a policy terminates if there is any event mismatch or constraint violation. Besides, the policy engine also maintains another database DB∗ to keep a record of the latest reported data for each device attribute.

Algorithm 1: The algorithm for executing a policy

    Input:  D ← new data item, P ← a privacy policy,
            DB ← newest device status database,
            DB∗ ← newest reported data database
    Output: privacy-aware data set DS

    if match(D.source, P.TRIGGER.(type, subject, attribute)) and
       satisfy(D.value, P.TRIGGER.(operator, value)) then
        foreach checkitem ∈ P.CHECK do
            val ← fetch(DB, checkitem.(type, subject, attribute))
            if !satisfy(val, checkitem.(operator, value)) then
                return
        if P.TRIGGER.contains([branch]) then
            val∗ ← fetch(DB∗, P.TRIGGER.(type, subject, attribute∗))
            if satisfy(val∗, P.TRIGGER.(operator1, value)) then
                DS ← run P.TRIGGER.(method, parameters, delay)
            else
                DS ← run P.TRIGGER.(method1, parameters1, delay1)
        else
            DS ← run P.TRIGGER.(method, parameters, delay)
        foreach checkitem ∈ P.CHECK do
            if checkitem.contains([branch]) then
                val∗ ← fetch(DB∗, checkitem.(type, subject, attribute∗))
                if satisfy(val∗, checkitem.(operator, value)) then
                    DS ← run checkitem.(method, parameters)
                else
                    DS ← run checkitem.(method1, parameters1)
            else
                DS ← run checkitem.(method, parameters)
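To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch, not the authors' implementation. It assumes policies are plain dictionaries keyed by the Listing 1 keywords, that `DB` and `DB∗` are dictionaries keyed by `(type, subject, attribute)`, and that a value missing from either database never satisfies a constraint. The device IDs `ps`, `ts`, and `f` and the 0.1 s delay are illustrative. Instead of processing the data itself, the sketch returns the selected action for each attribute.

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def satisfy(value, constraint):
    """Evaluate (:operator)->(:value); an unknown value (None) never satisfies."""
    op, expected = constraint
    return value is not None and OPS[op](value, expected)


def choose_action(section, reported):
    """Select 'run' or 'else' via the optional [fetch1]/[branch] over DB*."""
    if "branch" in section:
        last_reported = reported.get(section["fetch1"])
        if satisfy(last_reported, section["branch"]):
            return section["run"]
        return section["else"]
    return section["run"]


def execute_policy(event, policy, status, reported):
    """Return [(attribute key, action), ...], or [] if the policy does not fire.

    event:    (key, value) with key = (type, subject, attribute)
    status:   DB, the latest value of every device attribute
    reported: DB*, the latest value reported to the platform
    """
    key, value = event
    trigger = policy["TRIGGER"]
    # Event matching: source and value must both agree with TRIGGER.
    if key != trigger["match"] or not satisfy(value, trigger["satisfy"]):
        return []
    # Context checking: every CHECK constraint must hold on DB.
    for item in policy["CHECK"]:
        if not satisfy(status.get(item["fetch"]), item["satisfy"]):
            return []
    # Action selection for the event itself, then for each CHECK item.
    actions = [(trigger["match"], choose_action(trigger, reported))]
    for item in policy["CHECK"]:
        actions.append((item["fetch"], choose_action(item, reported)))
    return actions


# Policy for "when ps becomes present, if ts > 86 F, turn on fan f".
PS = ("device", "ps", "presence")
TS = ("device", "ts", "temperature")
FAN = ("device", "f", "switch")
AP_R = {
    "TRIGGER": {"match": PS, "satisfy": ("==", "present"),
                "fetch1": PS, "branch": ("==", "present"),
                "run": ("diffKeep", (), 0.1),   # 0.1 s delay is an assumption
                "else": ("keep", (), 0)},
    "CHECK": [
        {"fetch": TS, "satisfy": (">", 86),
         "fetch1": TS, "branch": (">", 86),
         "run": ("block", ()), "else": ("randomize", (86, 150))},
        {"fetch": FAN, "satisfy": ("!=", "on"), "run": ("block", ())},
    ],
}
```

With `status = {PS: "present", TS: 90, FAN: "off"}` and an empty `reported` database, a `"present"` event from `ps` selects `keep` for the presence value and `randomize(86, 150)` for the temperature, since nothing has been reported yet. Once `reported` holds `"present"` and a temperature above 86, the same event selects `diffKeep` and `block` instead.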
2) Policy Generation: PFirewall generates two types of policies: automation-based data-minimization policies (APs) and user-specified policies (UPs). To achieve data minimization, i.e., to report only the minimum amount of data necessary for home automation, rules are extracted from installed automation apps and analyzed to find the minimum data flows for the rules to execute. UPs are generated from user interfaces and work alongside APs, serving as an important supplement for expressing privacy preferences that cannot be learned from home automation.
Fig. 3: The policy derivation from an automation rule. (The figure maps an automation rule with the event "presence sensor == present", the condition "temperature sensor > 86", and the action "turn on the fan" to the TRIGGER and CHECK sections of a data flow policy.)
Automation Rule Extraction
Rule extraction is the first step for AP generation. Automationrules follow an event-condition-action model and are installedby installing IoT apps or selecting rule templates on webor mobile app interfaces. The rule extraction regarding bothmethods has been widely studied by state-of-art literature.Code analysis has been proved to be an effective way toextract rule semantics from IoT apps by state-of-art work.For example, by utilizing Abstract Syntax Tree (AST) analysison smart apps, [29] identifies requested and used capabilitiesin SmartApps, [7], [30] breaks down SmartApps and extractsrule information, [31], [32], [33] builds Deterministic FiniteAutomatons (DFAs) from SmartApps. Symbolic execution isa more powerful technique to analyze rule semantics from apps[19], [20]. Text data crawling and natural language processing(NLP) are used for rule extraction from web pages and mobileapps [32], [34].Rather than design another code analyzer, in this paper,we adapt the solution provided in [19] to implement our ruleextractor since it not only implements a complete symbolicexecutor with API modeling but also provides an app-devicebinding collection approach. We obtain the source code fromthe authors and verify its effectiveness on 86 SmartAppsfrom SmartThings market apps. The executor works on theAST representation of a SmartApp; the rule extraction startsfrom an event subscription method subscribe() (event thattriggers a rule) and traces in the entry point of the eventhandler method. All paths branching at if-else statements(rule condition) are explored until a sink (rule action) isspotted; expressions (e.g., value assignment) and APIs (e.g.,device access methods, device control commmands) along thepaths are modelled . The combination of control flow analysisand data flow analysis allow us to extract the rule context(event and condition) and command (action) from a SmartApp.The right column of Fig. 
3 shows the extracted rule from atemperature control SmartApp that defines a rule R “whena presence sensor ps becomes present , if the reading of atemperature sensor ts is higher than 86 ◦ F , turn on the fan f ”. Data-Minimization Policy Generation Due to page limits, we refer interested readers to the literature [19] formore details. onsider the example rule R . By default, the platform contin-uously receives and stores data streams from devices (presencesensor, temperature sensor, fan). However, we observe thatthese data are not all required for executing R in cases:(1) The presence sensor ps does not send any event;(2) ps sends a “not present” event;(3) The indoor temperature measured by ts is lower than86 ◦ F ;(4) The fan f is “ON”;(5) ps sends a “present” event and the last reported temper-ature by ts is higher than 86 ◦ F .In cases (1)-(4), there is no need to report any data from ps and ts to the platform; in case (5), it is unnecessaryto report temperature data since the temperature value storedin the platform database satisfies the rule condition checking;in no cases, the ON/OFF state of f is useful for executing R . From this example, we can conclude that only sporadicones in the data streams of devices are required for homeautomation, which motivates us to encode highly-structuredautomation rules to data-minimization policies. An exampleof generating an AP from R is shown in Figure 3. The TRIGGER of AP is derived from the
Event of R and CHECK is derived from the
Condition and
Action of R ,respectively. According to the policy definition and executionalgorithm presented in Section V-A1, the derived AP expressesmulti-faceted information for PF IREWALL to process data:1) Context: when and only when an incomming event of ps is “present” and meanwhile the latest received reading of ts is higher than 86 ◦ F and the state of f is not “ON”, some data will be reported, and otherwise, the policy willbe skipped and no data will be reported at all;2) Event reporting: if the latest reported value of ps is“present”, use the diffKeep() method to process thecurrent value for reporting, and otherwise, use keep() ;3) CHECK data reporting: if the latest reported value of ts ishigher than 86 ◦ F , use the block() method to process thecurrent value of ts , and otherwise, use randomize(86,MAX) ; use block() to process the state data of f .Table I shows a summary of all the methods used in the run and else fields. In the default setting, binary sensorssuch as the presence sensor reports binary values alternatively;thus, SmartThings only fires an event when observing a valuechange. Our data flow control breaks the alternate “present”and “not present” values in the data stream of ps . Thus,when the platform receives “present” but finds the last valueis also “present”, it will not issue a “present” event in itsframework and R cannot be triggered. Hence, the derived APuses diffKeep() rather than keep() to address this issue; diffKeep() reports “not present” followed by “present”with a time delay T , which ensures a “present” event is fired.It is worth mentioning that the selection of T is non-trivial toguarantee the normal execution of home automation becauseit allows time for the platform to update a received data to itsdatabase. Similarly, it is required that SmartThings have up-dated the temperature value (if necessary) in database before itissues a “present” event to R ; otherwise, the app will fail thetemperature condition check when triggered by the event . 
We manually observed app execution while tuning T and found that a value as small as 100 milliseconds caused no failure in 1000 trials.

TABLE I: Summary of methods used in data flow policies

| Method | Description |
|---|---|
| keep() | Report the original value |
| block() | Do not report |
| diffKeep() | Report a different value and then the original value |
| randomize(MIN,MAX) | Report a random value ∈ (MIN, MAX) |
| pickOther(CUR,ENUM) | Randomly pick a value (≠ CUR) from the set ENUM |
TABLE II: Boundary values for randomizing different attributes

| Attribute | Min | Max | Unit |
|---|---|---|---|
| Temperature | -50 | 150 | °F |
| Illuminance | 0 | 100000 | Lux |
| Humidity | 0 | 100 | % |
| Power | 0 | 1800 | Watt |

block() discards data without sending it. randomize() randomizes float-value attribute data (e.g., temperature). In the example, the temperature is compared with a threshold (86 °F), so a random value between 86 °F and the upper limit MAX of the temperature is sufficient for the condition check. MAX and MIN denote the upper and lower boundaries of a specific attribute (see Table II). We obtain this information from the SmartThings Capabilities Reference [35]. In addition, we present how PFirewall handles time/timer-related automation in Appendix B.
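The methods in Table I and their use in the example AP can be sketched in Python as follows. This is only an illustrative sketch, not the PFirewall implementation: the function names, the list-of-values return convention, and the helpers `report_presence` and `report_temperature` are assumptions introduced here, and the delay T of diffKeep() is not modeled.

```python
import random

MAX_TEMP_F = 150.0  # upper boundary for temperature, taken from Table II


def keep(value):
    """Report the original value unchanged."""
    return [value]


def block(value):
    """Report nothing."""
    return []


def diff_keep(value, other):
    """Report a different value first, then the original value.

    A real mediator would wait for the delay T between the two reports.
    """
    return [other, value]


def randomize(lo, hi, rng=random):
    """Report one random value strictly inside (lo, hi)."""
    while True:
        v = rng.uniform(lo, hi)
        if lo < v < hi:
            return [v]


def pick_other(cur, enum, rng=random):
    """Report a randomly picked value from enum that differs from cur."""
    return [rng.choice([e for e in enum if e != cur])]


def report_presence(last_reported, current):
    """Event reporting step of the example AP (hypothetical helper)."""
    if last_reported == "present":
        return diff_keep(current, "not present")
    return keep(current)


def report_temperature(last_reported_temp, threshold=86.0):
    """CHECK data reporting step for ts (hypothetical helper)."""
    if last_reported_temp is not None and last_reported_temp > threshold:
        return block(None)
    return randomize(threshold, MAX_TEMP_F)
```

For example, `report_presence("present", "present")` yields the two reports "not present" and "present", while `report_temperature(90.0)` yields no report because the platform already holds a value above the threshold.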
2) User-Specified Policy Generation:
We propose an interactive approach for users to specify data flow control policies. This is motivated by three reasons: 1) users have individual privacy preferences that cannot be derived from automation rules; for example, users might prioritize privacy over automation functionality for some device types during a time period or under certain situations; 2) the platform may integrate a third-party service for which no rule extractor is available to extract semantics; 3) users have the right to control the use of their data. In principle, UPs have higher priority than APs in controlling data.

We develop a mobile app for end users to specify policies. As shown in Fig. 4(a), information is displayed to help users understand what privacy issues each device and its data may imply. With the templates in Fig. 4(b), users can configure whitelist, blacklist, and conditional control policies during a specified time period or under certain contexts. Finally, UPs are encoded into the policy format in Listing 1 for execution. See Appendix E for the user survey we conducted to evaluate the policy templates.

Fig. 4: Screenshots of the PFirewall mobile app, panels (a) and (b). The app provides an information tab showing users what data every device type generates and the corresponding privacy implications, and a policy tab that allows users to define context-aware data control policies.
3) Policy Conflicts:
A user is likely to define UPs that conflict with existing APs and hinder the automation, since UPs are designed to override APs. Nevertheless, users need a warning that shows them what conflicts are introduced and which automation rules are affected. Therefore, automated policy conflict detection is necessary. Two policies $P_1$ and $P_2$ conflict if the following requirements are satisfied: (1) $P_1$ and $P_2$ are triggered simultaneously, i.e., an event makes both constraints $c_T^1$ and $c_T^2$ (defined in the TRIGGER fields of $P_1$ and $P_2$, respectively) hold; (2) both policies are finally executed, i.e., all the constraints $c_i^1$ and $c_i^2$ in the CHECK fields of both policies evaluate to true; (3) the two policies define different actions (i.e., data processing methods, parameters, or delays) for the same data. Formally, let $S(C)$ denote the set of all possible contexts that satisfy the set of constraints $C$, and let $O(a)$ and $E(a)$ denote the object (i.e., the controlled data) and the effects of a certain action $a$ (defined in both TRIGGER and CHECK fields). A conflict occurs when the following formula holds:

$$
\begin{aligned}
& S(c_T^1) \cap S(c_T^2) \neq \emptyset, \\
& S(c_1^1, c_2^1, \cdots) \cap S(c_1^2, c_2^2, \cdots) \neq \emptyset, \\
& \exists\, i, j:\; O(a_i^1) = O(a_j^2),\; E(a_i^1) \neq E(a_j^2).
\end{aligned}
\tag{1}
$$

We detect policy conflicts for each newly submitted UP against all APs. To calculate the constraint overlap in the first two formulas of Equation 1, we encode each constraint in a policy as a quantifier-free first-order formula:

$$\underbrace{(\text{type}[.\text{subject}[.\text{attribute}]])}_{\text{data source and type}}\ (\text{operator})\ (\text{value}).$$

Thus, the constraint overlap is transformed into a constraint satisfaction problem, which can be solved by a constraint programming (CP) solver. In our implementation, we use a JavaScript linear solver, javascript-lp-solver [36]. If the constraint satisfaction problem is solvable, the two policies can be executed simultaneously. We then check whether the two policies perform different actions (by looking at the methods and parameters in the run and else fields) on the same data flow; if so, the new UP conflicts with an existing AP. The automation app from which the AP was derived would be affected and is displayed to users for making decisions.
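The three conditions of Equation 1 can be illustrated with a small Python sketch. It is not the paper's implementation: instead of an LP solver, it assumes every constraint compares a single numeric data source with a constant, so that overlap reduces to interval intersection. The `Constraint` and `Policy` classes, the field names, and the encoding of "present" as 1 are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    source: str   # "type.subject.attribute", e.g. "sensor.ts.temperature"
    op: str       # one of "<", "<=", ">", ">=", "=="
    value: float


@dataclass
class Policy:
    trigger: tuple  # constraints in the TRIGGER field
    check: tuple    # constraints in the CHECK field
    actions: dict   # controlled data -> (method name, parameters)


def to_interval(c):
    """Map a constraint to (low, low_open, high, high_open)."""
    table = {
        "<": (-math.inf, True, c.value, True),
        "<=": (-math.inf, True, c.value, False),
        ">": (c.value, True, math.inf, True),
        ">=": (c.value, False, math.inf, True),
        "==": (c.value, False, c.value, False),
    }
    return table[c.op]


def satisfiable(constraints):
    """True if some context satisfies all constraints (per data source)."""
    bounds = {}
    for c in constraints:
        lo, lo_open, hi, hi_open = to_interval(c)
        cur = bounds.get(c.source, (-math.inf, True, math.inf, True))
        # For equal bounds, the open (stricter) endpoint wins.
        new_lo, new_lo_open = max((cur[0], cur[1]), (lo, lo_open))
        new_hi, new_hi_closed = min((cur[2], not cur[3]), (hi, not hi_open))
        bounds[c.source] = (new_lo, new_lo_open, new_hi, not new_hi_closed)
    return all(
        lo < hi or (lo == hi and not lo_open and not hi_open)
        for lo, lo_open, hi, hi_open in bounds.values()
    )


def conflicts(p1, p2):
    """Check the three conditions of Equation 1 for two policies."""
    if not satisfiable(p1.trigger + p2.trigger):
        return False
    if not satisfiable(p1.check + p2.check):
        return False
    shared = set(p1.actions) & set(p2.actions)
    return any(p1.actions[d] != p2.actions[d] for d in shared)
```

Under these assumptions, an AP that blocks ts when the temperature is above 86 conflicts with a UP that keeps ts when the temperature is at least 90, but not with a UP whose CHECK constraint is "temperature < 80", because the two CHECK fields can never hold together.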
B. Data Flow Mediation
To enforce data flow policies in a closed-source IoT system, we introduce a data flow mediator that relays the communication between IoT devices and the hub, as shown in Fig. 5. To this end, the mediator needs to (1) act as a hub to interact with IoT devices and (2) generate a virtual device to interact with the original hub on behalf of each real device.
Fig. 5: The workflow of the data flow mediator. (Diagram components: IoT Device, Home Gateway, Virtual Device Manager, Virtual Device Instance, IoT Hub. Edge labels: join/leave, raw data, privacy-aware data, create/remove, command. Legend: Original Flow, PFirewall Flow.)
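The two roles of the mediator can be sketched as a small Python class. This is a simplified illustration under stated assumptions, not the actual mediator: the class and method names are hypothetical, a policy is modeled as a function returning the list of values to report, and radio and LAN communication are replaced by in-memory lists.

```python
class VirtualDevice:
    """Stands in for one real device when talking to the original hub."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.reported = []  # (attribute, value) pairs sent to the hub

    def report(self, attribute, value):
        self.reported.append((attribute, value))


class Mediator:
    """Relays traffic between real devices and the hub (hypothetical API)."""

    def __init__(self, policy):
        # policy(device_id, attribute, value) -> list of values to report
        self.policy = policy
        self.virtual = {}      # device_id -> VirtualDevice
        self.to_devices = []   # commands relayed to real devices

    def on_join(self, device_id):
        # A real device joined: create its virtual counterpart.
        self.virtual[device_id] = VirtualDevice(device_id)

    def on_leave(self, device_id):
        # A real device left: remove its virtual counterpart.
        self.virtual.pop(device_id, None)

    def on_raw_data(self, device_id, attribute, value):
        # Raw data is processed by the policy before it reaches the hub.
        vd = self.virtual[device_id]
        for v in self.policy(device_id, attribute, value):
            vd.report(attribute, v)

    def on_command(self, device_id, command):
        # Commands from the hub are forwarded to the real device.
        self.to_devices.append((device_id, command))


def demo_policy(device_id, attribute, value):
    # Hypothetical policy: block temperature, keep everything else.
    return [] if attribute == "temperature" else [value]
```

With `demo_policy`, a motion event from a joined device is relayed to the hub side while its temperature readings are dropped.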
1) Connecting IoT Devices:
To play the role of a hub, the mediator needs to handle three major interactions with IoT devices: 1) devices join or leave the hub-led network; 2) devices report attribute data to the hub; 3) the hub forwards commands from the platform to devices. The hub functionality is provided by many open-source platforms, e.g., openHAB [37] and Mozilla IoT [21], which allow developers to build add-ons for integrating various IoT devices that use different communication techniques. To date, openHAB supports 275 bindings that have been tested to work with hundreds of commercial IoT devices, and Mozilla IoT has also been tested with more than 100 mainstream devices. In our implementation, we adapt the source code of Mozilla IoT to connect with ZigBee and Z-Wave devices, since the two techniques are widely used by IoT devices; specifically, the mediator is built on a Raspberry Pi with a Digi XStick USB dongle (ZB mesh version) and an Aeotec Z-Stick (Gen5) to provide ZigBee and Z-Wave capabilities, respectively.
C. Connecting the Hub and Platform
To interact with a target platform on behalf of a real device, the mediator creates a virtual device that can (1) talk with the hub using a communication technique the hub supports, and (2) be identified as a compatible device by the platform framework. Most emerging platforms support various connectivity protocols for developers to build customized network devices; for example, SmartThings supports LAN- and cloud-based device integration [27], openHAB supports the Message Queuing Telemetry Transport (MQTT) protocol [38], Mozilla IoT provides a REST-based Web Things framework and APIs [39], and Wink allows creating RESTful API devices [40]. This feature reduces the workload of interfacing with a target platform. We implement the mediator to work with two representative platforms: SmartThings and openHAB. Due to the page limit, we present the openHAB part in Appendix C1.
Interfacing with SmartThings
We choose LAN as the protocol for communicating with the SmartThings hub since PFirewall is designed to be segregated from a DMZ by a firewall; thus, attackers cannot initiate any connection to PFirewall to obtain data. SmartThings provides a device handler (see Section II) for abstracting each supported device type; accordingly, we build a virtual device (VD) type for each device handler (DH) that originally supports ZigBee or Z-Wave devices, as shown in Fig. 6.

Fig. 6: Overview of interfacing with SmartThings. (Diagram components: PFirewall with Virtual Device Manager, Virtual Device-SmartOutlet, and Virtual Device-MotionSensor; SmartThings Cloud with Service Manager, Device Handler-SmartOutlet, and Device Handler-MotionSensor. Edge labels: Discover: SSDP; Subscribe/Command: UPnP; Data: UPnP.)

We develop a service manager
SmartApp on SmartThings that uses SSDP (Simple Service Discovery Protocol) to discover VD instances on the LAN. To be considered different devices (SmartThings uses the IP address and port to uniquely identify a device), each VD instance is launched on a different port. After discovering a device, the service manager adds it as a child device. When a child device is added, SmartThings automatically selects a DH to abstract it according to the model property of the child device; thus, we set the model property of the VD instance (the child device) to the name of the target DH that is used to represent the corresponding real device. After the initial connection, a VD instance on the mediator side interacts with a DH instance on the SmartThings side using the UPnP (Universal Plug and Play) protocol, which uses SOAP (Simple Object Access Protocol) messages. Additionally, we adapt all DHs for ZigBee/Z-Wave devices available in the SmartThings IDE. In each DH, we add a subscribe() function that accomplishes the SUBSCRIBE step of UPnP communication; when a DH is instantiated (which means a VD instance is created and a child device is added), it uses the IP address and port to send a SUBSCRIBE SOAP message to the VD instance, providing its own IP address and port information. Moreover, in each DH we replace the code in the parse and command-related functions, which receives ZigBee/Z-Wave data and sends ZigBee/Z-Wave commands, respectively, with code for receiving and sending SOAP messages. Thus, the VD and DH instances become addressable to each other and realize subscribe/publish-based UPnP communication to report data and send commands.

VI. EVALUATION
A. Evaluation Setup
We build two real-world testbeds for evaluating the performance of PFirewall: an office with 5 members (T1) and a two-bedroom apartment with 1 member (T2), as shown in Fig. 7. In each testbed (T1 and T2), we deploy two parallel systems (SYS1 and SYS2) by placing two identical devices at each position in Fig. 7; SYS1 and SYS2 have the same device types, numbers, placement, and app configuration, as shown in Table III, Fig. 7, and Table IV. The only difference is that SYS1 is a standard SmartThings deployment while SYS2 introduces PFirewall. We bind SYS1 and SYS2 in each testbed to two different SmartThings accounts and run them simultaneously but independently. We choose SmartThings for the real-world testbeds because SmartThings provides official apps in its app store, while openHAB requires users to write automation apps and provides no market apps. Instead, we perform micro-benchmark tests for evaluating openHAB (see Appendix C2).

TABLE III: Devices in the two real-world testbeds

| Testbed | Device (Abbreviation) | Attribute | Number |
|---|---|---|---|
| Office (T1) | SmartThings hub v2 (HUB) | – | 1 |
| Office (T1) | Multipurpose sensor (MU) | contact, temperature | 1 |
| Office (T1) | Motion sensor (MO) | motion, temperature | 1 |
| Office (T1) | Smart outlet (OL) | switch, power | 2 |
| Office (T1) | Smart bulb (SL) | switch | 1 |
| Office (T1) | Smartphone (SP) | presence | 5 |
| Apartment (T2) | SmartThings hub v2 (HUB) | – | 1 |
| Apartment (T2) | Multipurpose sensor (MU) | contact, temperature | 3 |
| Apartment (T2) | Motion sensor (MO) | motion, temperature | 2 |
| Apartment (T2) | Smart outlet (OL) | switch, power | 2 |
| Apartment (T2) | Smart bulb (SL) | switch | 4 |
| Apartment (T2) | Aeotec MultiSensor (AM) | motion, humidity, illuminance | 2 |
| Apartment (T2) | Smartphone (SP) | presence | 1 |

Fig. 7: The layout and device placement in the two testbeds. (a) The office: MO1, MU1, SL1, OL1, OL2, SP1, SP2, SP3, SP4, SP5. (b) The apartment: MU2, MU3, MU4, SL2, SL3, SL4, SL5, AM1, AM2, OL3, OL4, MO2, MO3, SP6.
B. Performance of Data Mediating
To test the correctness of the PFirewall mediator, we disable data filtering in SYS2 of both testbeds, i.e., the mediator simply forwards the IoT data to SmartThings without executing policies. To capture the data received by SmartThings, we insert log.debug code into the parse methods of all device handlers for the tested devices, which allows us to record the event logs per device in the SmartThings web IDE. We observe that there are duplicate events in the captured SmartThings event logs, so we remove duplicates before analysis; consecutive events that have the same modality (the same device, attribute, and value) and very close timestamps (no more than 1 second apart) are regarded as duplicates. We run the above setting in SYS2 of both testbeds for 10 days and compare the data sequence of each device received by the PFirewall mediator with that received by SmartThings. Table V shows the total number of data items received per device and the number of inconsistencies in the data sequences. The result shows that our mediator relays the received data to the platform correctly.
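The duplicate removal and sequence comparison described above can be sketched in Python. The event tuple layout, the choice to compare each event with the immediately preceding log entry, and the position-wise inconsistency count are assumptions made for illustration; the paper does not spell out these details.

```python
def deduplicate(events, window=1.0):
    """Drop consecutive duplicates from a time-sorted event log.

    events: list of (timestamp_seconds, device, attribute, value).
    An event is a duplicate if the previous log entry has the same
    device, attribute, and value and is at most `window` seconds older.
    """
    kept = []
    prev = None
    for ev in events:
        ts, device, attribute, value = ev
        is_dup = (
            prev is not None
            and tuple(prev[1:]) == (device, attribute, value)
            and ts - prev[0] <= window
        )
        if not is_dup:
            kept.append(ev)
        prev = ev
    return kept


def count_inconsistencies(seq_a, seq_b):
    """Count position-wise mismatches plus any difference in length."""
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches + abs(len(seq_a) - len(seq_b))
```

A log with two "open" events 0.4 seconds apart therefore collapses to a single "open" before it is compared with the sequence recorded on the mediator side.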
C. Performance of Policy System
To test the performance of our policy system, we set up a comparative experiment by running SYS1 and SYS2 simultaneously in both testbeds for another 10 days. We enable data filtering in SYS2, so SYS2 in this experiment runs the data-minimization policies. In addition, we define two extra user-specified policies: UP1 (DO NOT report MO1.motion data between 5pm and 10pm) in T1 and UP2 (DO NOT report MU2.contact data between 8am and 6pm) in T2.
1) Correctness and Reliability:
Comparing the received data sequences is meaningless since data are filtered in SYS2, so we test the correctness of the execution of SmartApps. To capture the execution of apps, we manually insert logging code into the installed SmartApps to record the method calls for controlling devices and sending notifications.

TABLE IV: SmartApp and device settings for the evaluation environments. O: official app, C: custom app.

| Testbed | SmartApp (Abbreviation) (Source) | Description and Device Bindings |
|---|---|---|
| T1 | UndeadEarlyWarning (UEW) (O) | When door (MU1) is opened, turn on light (SL1). |
| T1 | LightsOffWithNoMotionAndPresence (LON) (O) | When no motion (MO1) or presence (SP1∼5) is detected for 5 minutes, turn off light (SL1). |
| T1 | MyAutoCoffee (MAC) (C) | When presence (SP1) becomes present, if time is before 12am, turn on coffee machine (OL1). |
| T1 | MyAutoHeater (MAH) (C) | When motion (MO1) detected, if temperature (MU1) < °F, turn on heater (OL2). |
| T1 | MyFitnessNotification (MFN) (C) | When motion (MO1) is active for longer than 60 minutes, send a message to alert. |
| T1 | StrangerNotification (STN) (C) | When door (MU1) is open, if no presence (SP2∼), send a message. |
| T2 | UndeadEarlyWarning (UEW) (O) | When door is opened (MU2), turn on light (SL2). |
| T2 | SmartLights (SML) (O) | When motion (MO2) active if illuminance (AM1) < LUX, turn on light (SL3). |
| T2 | TurnOnOnlyIfIArriveAfterSunset (TOO) (O) | When presence (SP6) becomes present if between 5-8pm, turn on oven (OL3). |
| T2 | TextMeWhenThere’sMotionAndI’mNotHere (TMW) (O) | When motion (MO2) active if not presence (SP6), send a notification. |
| T2 | LetThereBeLight! (LTB) (O) | When wardrobe door (MU4) open, turn on light (SL5); when door (MU4) close, turn off light (SL5). |
| T2 | VirtualThermostat (VIT) (O) | When motion (MO2) is detected if temperature (MU3) < °F, turn on heater (OL4); when motion (MO2) inactive for 20 minutes, turn off heater (OL4). |
| T2 | SmartBedroomLight (SBL) (C) | When door (MU3) opened, turn on light (SL4); when door (MU3) closed if motion (MO3) inactive for 5 minutes, turn off light (SL4). |
| T2 | NotifyMeWhenSomeoneFaints (NMW) (C) | When humidity (AM2) exceeds 85% if motion (AM2) active but motion (MO3) keeps inactive for 30 minutes, send a notification. |

TABLE V: Statistics of the data received by the PFirewall mediator and that received by SmartThings for the evaluation of data mediating. Due to page limits, we only present the result of one device for each device type. Total: the total data volume received by the SmartThings cloud and the PFirewall mediator, respectively.

| Testbed | Device | Attribute | Total | Inconsistency |
|---|---|---|---|---|
| T1 | MU1 | contact | 1960, 1960 | 0 |
| T1 | MU1 | temperature | 174, 174 | 0 |
| T1 | MO1 | motion | 2198, 2198 | 0 |
| T1 | MO1 | temperature | 325, 325 | 0 |
| T1 | OL1 | switch | 38, 38 | 0 |
| T1 | SL1 | switch | 24, 24 | 0 |
| T1 | SP1 | presence | 62, 62 | 0 |
| T2 | AM1 | motion | 384, 384 | 0 |
| T2 | AM1 | humidity | 656, 656 | 0 |
| T2 | AM1 | illuminance | 927, 927 | 0 |

TABLE VI: Statistics of SmartApp method call logs (data control results). MC: the number of method calls; INC: the number of inconsistencies; INCA: the number of inconsistencies after eliminating redundant method calls.

| Testbed | App | MC in SYS1 | MC in SYS2 | INC | INCA |
|---|---|---|---|---|---|
| T1 | UEW | 971 | 11 | 960 | 0 |
| T1 | LON | 11 | 11 | 0 | 0 |
| T1 | MAC | | | | |
| T1 | MAH | | | | |
| T1 | MFN | 13 | 11 | 2 | 2 |
| T1 | STN | | | | |
| T2 | UEW | 26 | 10 | 16 | 3 |
| T2 | SML | 41 | 41 | 0 | 0 |
| T2 | TOO | 11 | 7 | 4 | 0 |
| T2 | TMW | | | | |
| T2 | LTB | 42 | 42 | 0 | 0 |
| T2 | VIT | 235 | 49 | 186 | 0 |
| T2 | SBL | 164 | 55 | 109 | 0 |
| T2 | NMW | | | | |

We compare the method call sequences of each app in
SYS1 and SYS2 and calculate the number of inconsistencies. We summarize the result in Table VI. We find that the INC values of some apps are large. This is because SmartThings apps do not check a device's current status before sending it a command, and thus redundant method calls are made, while PFirewall by design suppresses redundant automation commands to reduce reported data. For instance, the app UEW calls the light turn-on method every time the door (MU1) is opened, regardless of whether the light is “on” or “off”; however, the redundant method calls are avoided by our data flow policies if the light's status is already “on”. Thus, inconsistencies are detected in some apps. To eliminate the impact of redundant automation on the evaluation of automation accuracy, we capture and remove the redundant method calls from the method call sequences in SYS1 by analyzing app and device logs; specifically, if a method call's effect is to change a device to a state the device is already in, this method call is identified as redundant and removed from the sequence. We recalculate the inconsistencies, denoted as INCA. As shown in Table VI, INCA is 0 in most apps except in four apps: MAC, MFN in T1 and UEW in T2. We manually analyze the causes of these inconsistencies by examining the device event and method call logs. We find that the event log of SP1 in SYS2 has one more “present” than that of SYS1. This is because SmartThings detects presence by monitoring the distance of a smartphone (GPS data) from the in-home hub, while PFirewall scans the home WiFi network to determine whether a smartphone enters or leaves; when SP1 moves around, different presence statuses are detected by the two methods due to their distinct detection ranges, leading to the inconsistency in MAC. The inconsistencies in MFN and UEW appear because the user-specified policies UP1 and UP2 block MO1.motion and MU2.contact data during certain periods, respectively. We verify that the 2 inconsistencies in MFN occur during 5pm-10pm and the 3 inconsistencies in UEW occur during 8am-6pm. We also observe that no MO1.motion or MU2.contact data are received by SmartThings in SYS2 during the periods specified in UP1 and UP2, respectively. The above result shows the correctness of our policy-based data flow control in enforcing user-specified policies and in preserving home automation functionality by generating data-minimization policies.
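The elimination of redundant method calls before recomputing the inconsistencies can be sketched as follows. This is a simplified illustration: it assumes device state changes only through the logged calls (the paper also consults device logs), and it measures inconsistency as the difference in call counts, which matches the visible rows of Table VI but is an assumption about how INC is computed. The function names are hypothetical.

```python
def remove_redundant(calls, initial_state):
    """Drop calls that set a device to the state it is already in.

    calls: list of (device, target_state) in execution order.
    initial_state: dict mapping device -> state before the first call.
    """
    state = dict(initial_state)
    kept = []
    for device, target in calls:
        if state.get(device) == target:
            continue  # redundant: no state change would result
        state[device] = target
        kept.append((device, target))
    return kept


def call_count_difference(calls_sys1, calls_sys2):
    """Assumed inconsistency measure: difference in number of calls."""
    return abs(len(calls_sys1) - len(calls_sys2))
```

For a light that starts "off", the sequence on, on, off, on in SYS1 reduces to on, off, on, which can then be compared with the sequence recorded in SYS2.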
2) Latency:
We show the efficiency of PFirewall by testing the introduced automation latency (mediating delay plus policy execution delay). We obtain the result by computing the timestamp difference of the same command in both command sequences (SYS1 and SYS2). To reduce the influence of network delay and cloud response latency on the result, we exclude from our calculation the outliers where the command in SYS1 is issued even later than in SYS2. We calculate the automation latency for each SmartApp in both testbeds and show the result in Figure 8. The automation latency ranges from 124.7 to 486.4 milliseconds. An average latency of 210.6 milliseconds is the tradeoff for using PFirewall to mitigate privacy leakage, although this latency is completely acceptable for most automation apps.

Fig. 8: Automation latency introduced by PFirewall. The boxes show the maximum, quartile, average, and minimum values of the majority of latencies per app. The blue squares are outliers. (Axes: deployed SmartApps on the x-axis, latency in ms on the y-axis.)

TABLE VII: Comparison of reported data volume per device before and after the deployment of PFirewall. VOL: volume of reported data in SYS1 and SYS2, respectively; RR: data reduction rate. We present the result for a subset of devices. See Appendix D for the result of all deployed devices.

| Dev | Attr | VOL | RR | Attr | VOL | RR |
|---|---|---|---|---|---|---|
| MU1 | contact | 1924, 22 | 0.98 | temperature | 142, 6 | 0.96 |
| MO1 | motion | 2266, 47 | 0.98 | temperature | 307, 0 | 1 |
| OL1 | switch | 29, 0 | 1 | | | |
| SL1 | switch | 22, 0 | 1 | | | |
| SP1 | presence | 34, 24 | 0.29 | | | |
| MU2 | contact | 52, 24 | 0.54 | temperature | 118, 0 | 1 |
| MO2 | motion | 364, 68 | 0.81 | temperature | 173, 0 | 1 |
| OL3 | switch | 44, 0 | 1 | | | |
| SL2 | switch | 60, 0 | 1 | | | |
| AM1 | motion | 364, 0 | 1 | | | |
| AM1 | illuminance | 1039, 1 | 0.99 | humidity | 668, 0 | 1 |
| SP6 | presence | 28, 12 | 0.57 | | | |
3) Reduction of Data Leakage:
To show the effectiveness of data filtering, we compare the data volume reported by each device in SYS1 and SYS2 of both testbeds. As shown in Table VII, PFirewall blocks 96.87% of IoT data on average, and more than 99% of float-value sensor readings and device states (e.g., ON/OFF states of coffee machines, setpoints of thermostats, and locked/unlocked states of smart locks); thus, PFirewall prevents the smart home platforms and potential attackers from learning private information about smart homes and homeowners based on float-value sensors and household appliances. PFirewall also reduces the reporting of binary-value sensor attributes (contact, motion, presence) to varying extents, according to the specific automation app semantics and app-device bindings. The reduction rate RR of binary-value attributes is generally smaller than that of float-value attributes, since binary attributes are used for triggering the execution of automation apps in most cases and hence cannot be totally blocked.
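As a worked example, the reduction rate can be computed from the two volumes in a VOL cell. The formula below, RR = 1 - VOL(SYS2) / VOL(SYS1), is an assumed definition; it reproduces several rows of Table VII after rounding to two decimals, but the paper does not state the formula explicitly. The function name is hypothetical.

```python
def reduction_rate(vol_sys1, vol_sys2):
    """Assumed definition: share of SYS1 data that SYS2 no longer reports."""
    if vol_sys1 <= 0:
        raise ValueError("vol_sys1 must be positive")
    return 1 - vol_sys2 / vol_sys1


# Values taken from Table VII
sp1_presence = round(reduction_rate(34, 24), 2)
mo2_motion = round(reduction_rate(364, 68), 2)
mo1_temperature = reduction_rate(307, 0)
```

The presence attribute of SP1 illustrates the point made above: a binary attribute that triggers automation keeps a comparatively low reduction rate, whereas the temperature of MO1 is blocked entirely.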
4) Privacy Gain:
To show how privacy preservation is achieved by reducing data leakage, we compare the potential privacy leakage under several inference attacks with and without PFirewall.

Office members and events profiling. By analyzing the presence sensor (SP1∼5) data in the research lab testbed (T1), the working hours of the 5 members (person 1∼5), each of whom carries a presence sensor, could be learned based on their entering and leaving times, as shown in Fig. 9(a). In addition to monitoring user presence in real time, the attacker could also learn personal working preferences and group events. For example, person 1 may leave for classes each Tuesday and Wednesday; person 3 works fewer hours than persons 1 and 2 during weekdays but shows up more on weekends; person 4 has a more regular routine through the weekdays; person 5 works fewer hours (4 or so) every day, and the hours tend to be in the afternoon; moreover, the members may leave for a group meeting on Friday morning. When PFirewall is deployed, most presence data are filtered, since only the “present” events before 12am from SP1 are required to turn on the coffee machine outlet (see app MAC). The presence sensor data of the other persons are never sent because their values are kept “not present” in the platform database, and only “not present” events from SP1 are sent, when the last person leaves, in order for the app LON to pass its condition check, which hides the real leaving time of person 1. Therefore, an attacker could only learn correctly when person 1 arrives at the lab (see Fig. 9(b)).
Bathroom usage monitoring.
By accessing the motion and humidity data of the Aeotec MultiSensor (AM2) in the apartment testbed (T2), an attacker can learn the bathroom usage habits. As depicted in Fig. 10(a), the attacker simply combines each “active” event with the next “inactive” event to obtain the start and end time of a bathroom usage. Moreover, the attacker can use the humidity data (see Fig. 10(c)) as additional information to help recognize “having shower” activities in the bathroom. In the experiment, the attacker identifies 4 “having shower” activities by comparing the humidity values with a common-sense threshold (i.e., 85%). When PFirewall is applied, the humidity data is rarely sent (only for executing the anomalous activity detection app NMW), and motion “active” (AM2) is reported only once to keep the motion value “active” in the platform database. As shown in Fig. 10(b) and 10(d), the humidity and motion data are each sent only once in our one-week experiment, preventing the attacker from monitoring and learning the bathroom usage habits.
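The inference described above can be sketched in a few lines of Python. This is an illustrative reconstruction of the attacker's reasoning, not the code used in the experiment: the event formats are assumptions, and labeling a usage interval as a shower when any humidity reading inside it exceeds the 85% threshold is one plausible reading of the description.

```python
def usage_intervals(events):
    """Pair each "active" event with the next "inactive" event.

    events: time-sorted list of (timestamp, value), value being
    "active" or "inactive". Returns a list of (start, end) pairs.
    """
    intervals = []
    start = None
    for ts, value in events:
        if value == "active" and start is None:
            start = ts
        elif value == "inactive" and start is not None:
            intervals.append((start, ts))
            start = None
    return intervals


def shower_intervals(intervals, humidity, threshold=85.0):
    """Keep intervals containing a humidity reading above the threshold.

    humidity: list of (timestamp, relative_humidity_percent).
    """
    return [
        (s, e)
        for s, e in intervals
        if any(s <= t <= e and h > threshold for t, h in humidity)
    ]
```

When only a single "active" event reaches the platform, as in the PFirewall case, `usage_intervals` returns an empty list, so the attacker recovers no usage periods.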
Appliance monitoring.
Non-intrusive load monitoring (NILM) techniques can infer appliance events based on electricity data, causing privacy concerns [41], [42]. We set up another experiment to learn how attackers are prevented from inferring appliance working status and user activities when power data are protected. We connect a microwave, a kettle, and a stove to a smart outlet and install an automation app that turns off the outlet when a user leaves home to avoid fire accidents. Although the app only needs presence sensor data to operate, the outlet also measures real-time power data and reports it to the outside. To study the incurred privacy risk, we collect the reported raw power data (see Fig. 11(a)) for 3 days and perform inference attacks. The attack process includes data pre-processing, clustering, and mapping (Fig. 11(b)-11(d)). The inference result achieves 95.7% precision and 92% recall in identifying appliance activities when compared with the manually collected ground truth. When PFirewall operates, all power data are kept from the platform when running this app, and hence no private user information can be inferred from power data.

VII. DISCUSSION AND LIMITATIONS
Can PFirewall perform home automation and thus get rid of the cloud?
Note that PFirewall has access to all device data and rule semantics from IoT apps. Theoretically, PFirewall is capable of running a rule engine to execute the extracted semantics, so that no data is sent to the cloud at all. However, we did not adopt this design due to practical considerations. (1) The kick-cloud-out strategy may raise ethical or legal concerns that our research team cannot tackle. The SmartThings cloud can easily verify whether it is talking with a real SmartThings hub, and cut all services if not. This means that, while PFirewall may provide home automation, all other cloud-based services (messaging, storage, and remote management) would be lost. (2) Huge engineering efforts are needed to implement an equivalent rule engine that supports the same programming framework and APIs and to maintain it in the long run. Therefore, we strategically separate the data flow control policy engine from the rule engine; PFirewall only deals with data filtering.

Fig. 9: Inferred user working hours within 10 days with and without data flow control in testbed T1. For simplicity of illustration, we round all presence data timestamps to the nearest hour. (a) Without data flow control. (b) With data flow control. (Axes: time of day in hours, day; legend: person 0 to person 4.)

Fig. 10: Motion and humidity data received by the platform with and without data flow control. For a clearer display, motion data that indicate bathroom activities shorter than 3 minutes are omitted in (a). (a) Motion data without control. (b) Motion data with control. (c) Humidity data without control. (d) Humidity data with control. (Axes: time of day in hours; bathroom motion or bathroom relative humidity in %.)
User efforts.
In PFirewall, users pair IoT devices with the mediator on PFirewall web interfaces and add the virtual device instances to SmartThings with its companion mobile app; thus, the user effort for connecting devices is doubled. We design SmartThings-like pairing interfaces on the PFirewall side, which makes pairing on both sides similar and reduces potential confusion. Moreover, we use the browser automation framework Selenium to develop a Python script, which periodically checks for new SmartApps and devices, and installs the corresponding instrumented SmartApps (for rule extraction) and custom device handlers (for PFirewall mediation), respectively. Users only provide their SmartThings accounts to the script, and no other operations are required.

Fig. 11: Appliance usage inference over 3-day power data without data flow control. (a) Raw data. (b) Slicing. (c) K-means clustering. (d) Mapping clusters to appliances. (Axes: time of day in hours or duration in minutes; outlet power in watts.)
Generality.
Although our implementation targets SmartThings and openHAB, the presented approach can potentially be adapted to other ecosystems. As discussed in Section V-B, it is completely practical to realize a man-in-the-middle mediator in most systems. On the one hand, the mediator could be extended to work with as wide a variety of IoT devices as an open-source platform; on the other hand, the mediator could interface with many platforms via a connectivity technique provided by these platforms for creating and integrating software services and hardware devices as “things”. Moreover, approaches for extracting automation rules from IoT apps [31], [7], [32], [20], [19] and mobile/web interfaces [32], [34], [43] have been broadly studied. We envision that the community will develop tools for extracting rule semantics from more platforms, so that the data-minimization policies can be generated.

VIII. RELATED WORK
A. Privacy in Smart Home Platforms
Besides security, privacy is also an important research topic in smart home ecosystems. Zheng et al. [2] studied smart home owners' perceptions of privacy risks and the actions they take to protect their privacy; the study found that users are unaware of privacy risks from inference algorithms operating on data from their IoT devices, and that they expect device manufacturers to protect their privacy, though this is not the case. Celik et al. [4] provided a tool for tracking sensitive data flows in programming frameworks and identified that 138 out of 230 apps in SmartThings transmit at least one kind of sensitive data over platform-provided APIs, which means malicious apps have the capability to steal user data collected by the platform. The works in [18] and [31] also present app-level attacks that can breach user privacy. Closest to our work, FlowFence [6] enforced a data flow control mechanism for sensitive data protection. However, FlowFence protects sensitive data from unauthorized apps rather than from the platform, so sensitive data protection still fails under other attacks; moreover, FlowFence requires the cooperation of platforms and app developers to operate.
B. In-hub Security and Privacy Enforcement
Many in-hub schemes have been proposed to enforce security and privacy in the IoT domain. Simpson et al. design an in-hub security manager built atop the smart home hub to patch vulnerable IoT devices and strengthen authentication. The security manager is deployed in an open-source system, HomeOS. FACT [44] and HanGuard [45] enforce access controls in the middle by implementing controllers on an open-source hub and a programmable WiFi router, respectively. By comparison, these schemes rely on a programmable hub (gateway, router) that can indeed intercept and control the communication between the home area network and the Internet. However, in cloud-based smart home platforms like SmartThings, communications between the commercial hub and the backend cloud are encrypted [46], and hence the router can neither decrypt nor modify the packets on demand. PFirewall controls the communication between IoT devices and the hub in a unified, backward-compatible way, regardless of the specific communication protocol employed by the hub and cloud.

IX. CONCLUSION
We presented PFirewall, a semantics-aware customizable data flow control system for smart homes, which filters data generated by IoT devices. PFirewall can automatically generate application-dependent policies based on installed automation apps to block unnecessary data flows and only report the minimum amount of data required for home automation. Furthermore, PFirewall allows users to customize individual policies according to their own privacy preferences.
We overcame many challenges and designed an elegant man-in-the-middle proxy based system, which enforces these policies without modifying the platform or IoT devices. We implemented a prototype of PFirewall and evaluated it in two real-world testbeds. The evaluation results demonstrated that PFirewall can effectively and efficiently reduce sensitive data leakage without interfering with home automation. It heavily impairs an attacker’s ability to monitor and infer users’ privacy-sensitive behaviors. In addition to smart homes, the system can also significantly enhance privacy protection in many other environments, such as smart factories and offices, that leverage smart platforms for IoT device interaction automation and other platform-provided services.
REFERENCES
[…] Proceedings of the ACM on Human-Computer Interaction, vol. 2, no. CSCW, p. 200, 2018.
[3] E. Zeng, S. Mare, and F. Roesner, “End user security & privacy concerns with smart homes,” in Symposium on Usable Privacy and Security (SOUPS), 2017.
[4] Z. B. Celik, L. Babun, A. K. Sikder, H. Aksu, G. Tan, P. McDaniel, and A. S. Uluagac, “Sensitive information tracking in commodity iot,” in USENIX Security 2018.
[5] I. Bastys, M. Balliu, and A. Sabelfeld, “If this then what?: Controlling flows in iot apps,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 1102–1119.
[6] E. Fernandes, J. Paupore, A. Rahmati, D. Simionato, M. Conti, and A. Prakash, “Flowfence: Practical data protection for emerging iot application frameworks.” in USENIX Security Symposium, 2016, pp. 531–548.
[7] Y. Tian, N. Zhang, Y.-H. Lin, X. Wang, B. Ur, X. Guo, and P. Tague, “Smartauth: User-centered authorization for the internet of things,” in […]. USENIX Association, 2017, pp. 361–378.
[8] A. Acar, H. Fereidooni, T. Abera, A. K. Sikder, M. Miettinen, H. Aksu, M. Conti, A.-R. Sadeghi, and A. S. Uluagac, “Peek-a-boo: I see your smart home activities, even encrypted!” arXiv preprint arXiv:1808.02741, 2018.
[9] T. Datta, N. Apthorpe, and N. Feamster, “A developer-friendly library for smart home iot privacy-preserving traffic obfuscation,” in Proceedings of the 2018 Workshop on IoT Security and Privacy. ACM, 2018, pp. 43–48.
[10] N. Apthorpe, D. Reisman, and N. Feamster, “Closing the blinds: Four strategies for protecting smart home privacy from network observers,” arXiv preprint arXiv:1705.06809, 2017.
[11] N. Apthorpe, D. Reisman, S. Sundaresan, A. Narayanan, and N. Feamster, “Spying on the smart home: Privacy attacks and defenses on encrypted iot traffic,” arXiv preprint arXiv:1708.05044 […]
[…] IEEE Symposium on Security and Privacy 2016.
[19] H. Chi, Q. Zeng, X. Du, and J. Yu, “Cross-app threats in smart homes: Categorization, detection and handling,” arXiv preprint arXiv:1808.02125, 2018.
[20] Z. B. Celik, P. McDaniel, and G. Tan, “Soteria: Automated iot safety and security analysis,” in Usenix Security 2018 […]
[…] Proceedings of the IEEE Symposium on Security and Privacy (S&P). https://doi.org/10.1109/SP, 2019.
[26] C. Zuo, Z. Lin, and Y. Zhang, “Why does your data leak? uncovering the data leakage in cloud from mobile apps,” in IEEE Symposium on Security and Privacy 2019.
[27] “Lan-connected devices,” https://docs.smartthings.com/en/latest/cloud-and-lan-connected-device-types-developers-guide/index.html, 2018.
[28] “MQTT,” https://http://mqtt.org/, 2019.
[29] E. Fernandes, A. Rahmati, K. Eykholt, and A. Prakash, “Internet of things security research: A rehash of old ideas or new intellectual challenges?” IEEE Security & Privacy, vol. 15, no. 4, pp. 79–84, 2017.
[30] W. Ding and H. Hu, “On the safety of iot device physical interaction control,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 832–846.
[31] Y. J. Jia, Q. A. Chen, S. Wang, A. Rahmati, E. Fernandes, Z. M. Mao, and A. Prakash, “Contexiot: Towards providing contextual integrity to appified iot platforms,” in Proceedings of The Network and Distributed System Security Symposium, 2017.
[32] W. Zhang, Y. Meng, Y. Liu, X. Zhang, Y. Zhang, and H. Zhu, “Homonit: Monitoring smart home apps from encrypted traffic,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 1074–1088.
[33] Z. B. Celik, G. Tan, and P. McDaniel, “IoTGuard: Dynamic enforcement of security and safety policy in commodity iot,” 2019.
[34] I. Hwang, M. Kim, and H. J. Ahn, “Data pipeline for generation and recommendation of the iot rules based on open text data,” in IEEE WAINA […]
[…] Proceedings of the 18th ACM conference on Computer and communications security. ACM, 2011, pp. 87–98.
[42] M. Lisovich and S. Wicker, “Privacy concerns in upcoming residential and commercial demand-response systems.”
[43] D. T. Nguyen, C. Song, Z. Qian, S. V. Krishnamurthy, E. J. Colbert, and P. McDaniel, “Iotsan: fortifying the safety of iot systems,” in Proceedings of the 14th International Conference on emerging Networking EXperiments and Technologies. ACM, 2018, pp. 191–203.
[44] S. Lee, J. Choi, J. Kim, B. Cho, S. Lee, H. Kim, and J. Kim, “Fact: Functionality-centric access control system for iot programming frameworks,” in Proceedings of the 22nd ACM on Symposium on Access Control Models and Technologies. ACM, 2017, pp. 43–54.
[45] S. Demetriou, N. Zhang, Y. Lee, X. Wang, C. A. Gunter, X. Zhou, and M. Grace, “Hanguard: Sdn-driven protection of smart home wifi devices from malicious mobile apps,” in Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks […]
APPENDIX
A. Investigation on Popular Smart Home Platforms
We study the privacy policies and practices of 7 popular cloud-based smart home platforms and, for comparison, 3 platforms that use other architectures. A brief summary is shown in Table VIII. The columns under “Privacy Policy” describe what each policy states. “Easy to access?” shows whether a privacy policy is explicitly displayed or prompted during the installation of the platform’s products (especially apps). “Collect device data?” shows whether a privacy policy claims that the platform accesses users’ devices during the services. “Expose data to partners?”, “Restrict data use on 3rd parties?” and “Privacy techniques” show whether the platform claims to share users’ data with third parties, whether it claims to restrict how third parties can legally use these data, and what techniques it employs to protect user privacy during data sharing. The columns under “Facts” describe what each platform does in practice. “Collect personal info.?” shows whether a platform collects personally identifiable information from users during the registration process. “Access device data?” shows whether the platform accesses device data while providing services. “Expose data to partners?” shows whether the platform provides device data to third parties, including integrated third-party services. “Access control before hub?” and “User controllable?” indicate whether any access control mechanism is enforced before the platform’s hub accesses device data and whether users can control the access between devices and the platform’s hub.
Some privacy policies fail to raise users’ awareness of sensitive data collection because they are not 1) easily accessible, 2) written in jargon-free words, or 3) explicit about sensitive data collection. Some policies, although they state that data are shared with third parties, do not mention any data protection techniques or any restrictions imposed on the third parties. On the other hand, we found that most of the studied platforms request personally identifiable information from users during registration, access sensitive data from IoT devices, and share data with business partners. However, most platforms do not have mechanisms to minimize the data accessed from users and do not provide interfaces for users to exert fine-grained control over their sensitive data. Users are only capable of choosing whether to agree with the privacy policy. Once a device is connected to the platform, they cannot further decide how their devices report data to the platform.
B. Time/Timer-related Automation
PFirewall also deals with time-related automations. For instance, if a rule is defined as “when the door is opened if time is after 18:00, turn on TV”, the derived policy needs to fetch the system time for condition checking. When it comes to a timer-related automation, e.g., “when motion sensor becomes inactive for 5 minutes, turn off the light”, multiple policies are bundled and operate by calling the methods for starting, stopping and firing a timer. Fig. 13 illustrates the workflow of how PFirewall handles this example.
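The timer-related example can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name TimerManager, the explicit clock argument now, the poll helper, the one-shot behavior after firing, and the use of "at least 5 minutes" as the firing condition are all assumptions made for illustration. Only the four method names (start, stop, fire, add callback) follow Table IX.

```python
from typing import Callable, Dict, List, Optional

FIVE_MINUTES = 5 * 60  # threshold of the example rule, in seconds


class TimerManager:
    """Toy timer registry driven by an explicit clock value (hypothetical design)."""

    def __init__(self) -> None:
        self._started_at: Dict[str, Optional[float]] = {}
        self._callbacks: Dict[str, List[Callable[[], None]]] = {}

    def start_timer(self, timer_id: str, now: float) -> None:
        # Create or reset the timer (startTimer in Table IX).
        self._started_at[timer_id] = now

    def stop_timer(self, timer_id: str) -> None:
        # Stop and reset the timer (stopTimer in Table IX).
        self._started_at[timer_id] = None

    def add_callback(self, timer_id: str, action: Callable[[], None]) -> None:
        # Register an action to run when the timer fires (addCallback in Table IX).
        self._callbacks.setdefault(timer_id, []).append(action)

    def fire_timer(self, timer_id: str) -> None:
        # Run every registered action (fireTimer in Table IX).
        for action in self._callbacks.get(timer_id, []):
            action()
        self._started_at[timer_id] = None  # assumption: the timer is one-shot

    def poll(self, timer_id: str, now: float, threshold: float) -> None:
        # Hypothetical helper: fire once the timer has run for `threshold` seconds.
        started = self._started_at.get(timer_id)
        if started is not None and now - started >= threshold:
            self.fire_timer(timer_id)


def on_motion_event(tm: TimerManager, value: str, now: float) -> None:
    # Bundled policies: "inactive" starts the timer, "active" cancels it.
    if value == "inactive":
        tm.start_timer("id1", now)
    elif value == "active":
        tm.stop_timer("id1")


reported: List[str] = []  # stands in for data forwarded to the platform
tm = TimerManager()
tm.add_callback("id1", lambda: reported.append("inactive"))  # action1

on_motion_event(tm, "inactive", now=0)
tm.poll("id1", now=120, threshold=FIVE_MINUTES)   # too early, nothing reported
on_motion_event(tm, "active", now=130)            # motion resumes, timer cancelled
tm.poll("id1", now=400, threshold=FIVE_MINUTES)   # timer is stopped, nothing reported
on_motion_event(tm, "inactive", now=500)
tm.poll("id1", now=800, threshold=FIVE_MINUTES)   # 5 minutes of inactivity, fires
```

In this sketch only the final "inactive" state reaches the platform; the intermediate motion events stay local, which is the data-minimizing effect the timer policies aim for.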
[Figure 12: four screenshots of the survey app, panels (a) to (d).]
Fig. 12: The PFirewall Survey mobile app used in the user survey.
TABLE VIII: A summary of privacy policies and facts in some well-known platforms. AGG: aggregation; ANO: anonymization. ✓: yes; ✗: no.
Platform | Policy: Easy to access? | Policy: Collect device data? | Policy: Expose data to partners? | Policy: Restrict data use on 3rd parties? | Policy: Privacy techniques | Fact: Collect personal info.? | Fact: Access device data? | Fact: Expose data to partners? | Fact: Access control before hub? | Fact: User controllable?
Wink | ✓ | ✓ | ✓ | ✗ | AGG | ✓ | ✓ | ✓ | ✗ | ✗
Iris | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗
Vera | ✓ | ✓ | ✓ | ✓ | AGG, ANO | ✓ | ✓ | ✓ | ✗ | ✗
Lutron | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗
Thingsee | ✓ | ✓ | ✓ | ✓ | AGG, ANO | ✓ | ✓ | ✓ | ✗ | ✗
SmartThings | ✗ | ✓ | ✓ | ✓ | AGG, ANO | ✓ | ✓ | ✓ | ✗ | ✗
EVRYTHNG | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗
openHAB | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗
Mozilla IoT | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗
Apple HomeKit | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗
[Figure 13: workflow diagram. Labels: Motion Sensor (Active, Inactive); Timer (id1, duration); StartTimer(id1); StopTimer(id); addCallback(id1, action1); if duration > 5min, fireTimer(id1); action1.]
Fig. 13: The workflow of how PFirewall handles a timer-related rule example. The methods are shown in Table IX. action1 is defined to report “inactive” to the platform with method keep and zero delay. Each timer maintains a list of actions which will be called when the timer’s duration satisfies a certain constraint.
TABLE IX: Methods for dealing with timer-related automation
Method | Description
startTimer(id) | Create or reset a timer with identity id
stopTimer(id) | Stop and reset a timer with identity id
fireTimer(id) | Fire a timer id and execute actions in its callbacks
addCallback(id, act) | Add an action act to the callbacks of timer id
[Figure 14: architecture diagram. Labels: PFirewall; Virtual Device Manager; Virtual Device-SmartOutlet; Virtual Device-MotionSensor; Smart Outlet; Motion Sensor; openHAB; MQTT Embedded Broker; Thing, Channel, Item (switch, motion, item_s, item_m); MQTT topics T1 to T4, including T1: data/id_outlet/switch, T2: cmd/id_outlet/switch, T3: data/id_motion/motion.]
Fig. 14: Overview of how the mediator interfaces with openHAB.
TABLE X: Comparison of reported data volume per device before and after the deployment of PFirewall. VOL: volume of reported data in SYS1 and SYS2, respectively; RR: relative reduction rate. We present the result for each device type. See Appendix D for the complete result of all deployed devices.
Dev | Attr | VOL | RR | Attr | VOL | RR
MU1 | contact | 1924, 22 | 0.98 | temperature | 142, 6 | 0.96
MO1 | motion | 2266, 47 | 0.98 | temperature | 307, 0 | 1
OL1 | switch | 29, 0 | 1 | | |
OL2 | switch | 19, 0 | 1 | | |
SL1 | switch | 22, 0 | 1 | | |
SP1 | presence | 34, 24 | 0.29 | | |
SP2 | presence | 36, 1 | 0.97 | | |
SP3 | presence | 30, 1 | 0.96 | | |
SP4 | presence | 28, 1 | 0.96 | | |
SP5 | presence | 26, 1 | 0.96 | | |
MU2 | contact | 52, 24 | 0.54 | temperature | 118, 0 | 1
MU3 | contact | 268, 58 | 0.78 | temperature | 131, 8 | 0.94
MU4 | contact | 42, 42 | 0 | temperature | 109, 0 | 1
MO2 | motion | 364, 68 | 0.81 | temperature | 173, 0 | 1
MO3 | motion | 564, 21 | 0.96 | temperature | 157, 0 | 1
OL3 | switch | 44, 0 | 1 | | |
OL4 | switch | 49, 0 | 1 | | |
SL2 | switch | 60, 0 | 1 | | |
SL3 | switch | 68, 0 | 1 | | |
SL4 | switch | 70, 0 | 1 | | |
SL5 | switch | 42, 0 | 1 | | |
AM1 | motion | 364, 0 | 1 | | |
AM1 | illuminance | 1039, 1 | 0.99 | humidity | 668, 0 | 1
AM2 | motion | 462, 0 | 1 | | |
AM2 | illuminance | 1384, 0 | 1 | humidity | 893, 1 | 0.99
SP6 | presence | 28, 12 | 0.57 | | |
C. Interfacing with openHAB
1) Implementation: We use the supported MQTT protocol to interface with openHAB because it is a general connectivity protocol that allows any device type to be virtualized flexibly. Fig. 14 shows the high-level architecture of the integration. openHAB provides an embedded MQTT broker, so our work is to realize each virtual device (VD) as an MQTT client and create a Generic MQTT thing (supported by the MQTT binding) in openHAB for the real device represented by the VD. A thing in openHAB has channels (equivalent to the concept “attribute” in SmartThings, e.g., motion, temperature, etc.), and each channel can be linked to an item (used for displaying values received by the linked channel and as an interface for automation rules to interact with the real device). In openHAB, each MQTT thing channel can be configured as an MQTT client. By subscribing to the same MQTT topic (essentially a path-like string), MQTT clients can publish/receive data to/from the topic.
When a new device is added to PFirewall, a VD instance is created. If the real device is a sensor (e.g., the motion sensor in Fig. 14), the VD instance subscribes to a topic data/{device id}/{attribute} (e.g., data/12345/motion) for publishing data, where device id is generated randomly by PFirewall; if the real device is an actuator (e.g., a smart outlet), the VD instance subscribes to a topic data/{device id}/{attribute} for publishing data and a topic cmd/{device id}/{attribute} for receiving commands. The MQTT binding in openHAB does not provide a device discovery function. To automatically add a thing and its channel in openHAB, there are two choices: operating on the web interfaces or adding a configuration file in the openhab/conf/things/ directory. We choose the latter approach to automate the process. By populating a string template with the same device id, attribute and topic information as the VD instance, PFirewall creates an MQTT thing by adding a thing file to the openHAB directory through an FTP service. Thus, the created MQTT thing can receive data from or send commands to the VD by subscribing to the same topics.
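The topic scheme and the template-filling step described in the Implementation paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the helper names new_device_id, topics_for and render_thing are hypothetical, the random-id format is arbitrary, and THING_TEMPLATE is only loosely modeled on openHAB's textual thing configuration, so its exact syntax should be checked against the MQTT binding documentation before use. The FTP upload step is omitted.

```python
import secrets
from string import Template
from typing import Dict


def new_device_id() -> str:
    # Assumption: any random identifier works; 8 hex characters are used here.
    return secrets.token_hex(4)


def topics_for(device_id: str, attribute: str, is_actuator: bool) -> Dict[str, str]:
    """Build the MQTT topics of one virtual-device attribute.

    Every device gets a data topic; actuators additionally get a cmd topic.
    """
    topics = {"data": f"data/{device_id}/{attribute}"}
    if is_actuator:
        topics["cmd"] = f"cmd/{device_id}/{attribute}"
    return topics


# Hypothetical thing-file template (illustrative syntax, not verified).
THING_TEMPLATE = Template(
    'Thing mqtt:topic:$device_id "$label" {\n'
    "    Channels:\n"
    "        Type $channel_type : $attribute [ $topic_args ]\n"
    "}\n"
)


def render_thing(device_id: str, label: str, channel_type: str,
                 attribute: str, is_actuator: bool) -> str:
    """Fill the template with the same id, attribute and topics as the VD."""
    topics = topics_for(device_id, attribute, is_actuator)
    args = [f'stateTopic="{topics["data"]}"']
    if "cmd" in topics:
        args.append(f'commandTopic="{topics["cmd"]}"')
    return THING_TEMPLATE.substitute(
        device_id=device_id,
        label=label,
        channel_type=channel_type,
        attribute=attribute,
        topic_args=", ".join(args),
    )


if __name__ == "__main__":
    outlet_id = new_device_id()
    print(render_thing(outlet_id, "Smart Outlet", "switch", "switch", True))
```

Because the virtual device and the generated thing file are derived from the same device id and attribute, both sides end up on identical topics without any discovery step.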
2) Evaluation: openHAB allows users to write automation apps with a domain-specific language (DSL), which is adapted from Xbase [47]. However, openHAB does not provide official apps for installation. To test our openHAB integration, we develop 13 apps that implement the same rule semantics and work with the same devices as shown in Table IV. We manually operate the real devices to trigger each rule 20 times and find that all apps are executed correctly.
D. Complete Evaluation Result of Data Volume Reduction
Due to page limits, we only present the result of one device for each device type in Table VII in Section VI-C3. Table X shows the complete list of all deployed devices in both testbeds.
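The RR column of Table X is not defined in this appendix. A minimal Python sketch, assuming RR is the fraction of reports removed, (before - after) / before, is given below; the function name relative_reduction is hypothetical. The assumption agrees with rows such as MU4 contact (42, 42, RR 0) and OL1 switch (29, 0, RR 1), while the exact rounding used in the table is not specified.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction of reported data removed, assuming RR = (before - after) / before."""
    if before <= 0:
        raise ValueError("the volume before deployment must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # (device, attribute, volume in SYS1, volume in SYS2) taken from Table X.
    rows = [
        ("MU4", "contact", 42, 42),
        ("OL1", "switch", 29, 0),
        ("SP1", "presence", 34, 24),
    ]
    for dev, attr, before, after in rows:
        print(dev, attr, relative_reduction(before, after))
```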
E. User Study
1) Setup: We conduct a user survey to study users’ attitudes and abilities towards defining customized data flow control policies with our policy templates (Section V-A2). We recruit 20 adult participants from our institutions who are knowledgeable about the concepts “home automation”, “smart home” or “IoT”. Participants completed the trial tasks of our “PFirewall Survey” app in our lab using smartphones we provided and afterwards answered several questions (see Section E3).
We asked the participants to get familiar with a smart home setting where 10 automation rules (Fig. 12(b)) are configured to work with 15 devices (Fig. 12(a)). The app provides a page (Fig. 12(c)) to illustrate the architecture of the system and the potential risks of data leakage; we did not explain the content of this page or ask questions about it, to avoid influencing the understanding of end-users by factors other than the interface itself. Besides, the app also provides an interface showing the list of 15 devices; when a device is selected, the app switches to a device detail page (e.g., Fig. 12(d)) showing what data the device generates and what privacy risks are imposed if the data are leaked. In addition, policy templates (as shown in Fig. 4(b)) were provided for participants to define their own policies. After a 30-minute trial, participants were asked to answer questions.
2) Results:
All 20 participants cared about their data privacy and thought it useful to define their own data flow policies for protecting privacy. However, 2 participants thought they would not spend time defining policies even if an app is available. We collect the number of participants who had privacy concerns about each listed device. Cameras and smart speakers were the top two devices whose data are considered sensitive by the participants (19 and 16, respectively); half or more of the participants had concerns about the status data of smart locks, doors and windows (11, 13 and 10, respectively); humidity sensors, heaters, lights, powers and coffee makers each concerned fewer than 3 participants. Beyond the listed devices, the participants also cared about the data privacy of smart TVs, smart window blinds and smart outlets.
Regarding the usability of our policy templates, 8 participants thought the templates are “very easy” to use and 12 participants thought them “easy” to use. 3 participants found that the templates did not let them specify policies with multiple conditions, for example, the combination of an event and a specified time period. Following this feedback, we address the issue by allowing users to select another condition after a condition has been specified.
Overall, participants are concerned about data privacy and hold a positive attitude towards defining their own policies with our templates. The result also shows that participants may overlook the privacy risks of some devices such as humidity sensors and powers, which we have discussed in Section VI-C4. Hence, data-minimization policies and user-specified policies could work together to achieve better privacy protection.
3) Questions in the user study:
1) Do you care about your data privacy if you use a smart home system?
   A. Yes  B. No
2) List the device(s) (from the given device list in our “PFirewall Survey” app) about which you have privacy concerns if the device data are leaked.
3) Do you think it is useful in general to control your own data to reduce privacy leakage risks?
   A. Yes  B. No
4) Would you spend time defining your own policies to control data if an app like “PFirewall Survey” is available for you to do so?
   A. Yes  B. No
5) Recall how our app guides you to define your own policies. Are the provided policy templates easy to understand and use?
   A. Easy  B. Somewhat challenging but still able to use  C. Not usable
6) Is there any policy that you think useful but that the given templates fail to let you define? If any, please list it.