ElfStore: A Resilient Data Storage Service for Federated Edge and Fog Resources
Sumit Kumar Monga, Sheshadri K R and Yogesh Simmhan
Department of Computational and Data Sciences, Indian Institute of Science (IISc), Bangalore 560012, India
Email: [email protected], [email protected], [email protected]

∗ To appear in IEEE International Conference on Web Services (ICWS), Milan, Italy, 2019
Abstract
Edge and fog computing have grown popular as IoT deployments become wide-spread. While application composition and scheduling on such resources are being explored, there exists a gap: there is no distributed data storage service on the edge and fog layers, and applications instead depend solely on the cloud for data persistence. Such a service should reliably store and manage data on fog and edge devices, even in the presence of failures, and offer transparent discovery and access to data for use by edge computing applications. Here, we present
ElfStore, a first-of-its-kind edge-local federated store for streams of data blocks. It uses reliable fog devices as a super-peer overlay to monitor the edge resources, offers federated metadata indexing using Bloom filters, locates data within 2 hops, and maintains approximate global statistics about the reliability and storage capacity of edges. Edges host the actual data blocks, and we use a unique differential replication scheme to select the edges on which to replicate blocks, to guarantee a minimum reliability and to balance storage utilization. Our experiments on two IoT virtual deployments with 20 and 272 devices show that ElfStore has low overheads, is bound only by the network bandwidth, has scalable performance, and offers tunable resilience.
1 Introduction

The growing prevalence of
Internet of Things (IoT) deployments as part of smart city and industrial infrastructure is leading to a rapid influx of data generated continuously from thousands of sensors [1]. These data sources include smart utility meters, air pollution monitors, security cameras, and equipment sensors.
Analytics over these data, in real-time or periodically, helps make intelligent decisions for the efficient and reliable management of such complex systems [2].

At the same time, IoT is also leading to the availability of edge and fog computing devices on the field, as part of sensors and gateways [3]. Affordable edge devices like the Raspberry Pi are often co-located with the sensors on private and wide-area networks to acquire data, perform local analytics, and transmit it to cloud data centers for persistence [4]. Fog devices like the NVidia Jetson TX2 manage neighboring edge devices on the network, offer more advanced computing for further analytics or aggregation, and also forward data to the cloud. In large IoT deployments, the edge and fog devices are often organized in a hierarchy for ease of management and scalability [5], and complemented by cloud resources.

Edge computing is motivated by the access to such cheap or free edge and fog compute resources, the reduced network latency between the data source and the analytics that makes the decision (e.g., power grid management), and the need to mitigate network use by high-bandwidth applications (e.g., video analytics for urban safety) [6, 7]. There is active research on composing micro-services and scheduling dataflows for execution on edge and fog resources, in combination with or instead of cloud resources [8, 9]. These platform services allow applications to run continuously over incremental data.

However, two key gaps exist. One, there is a lack of a transparent data access service at the edge or fog, from which such applications can consume their input. Typically, streaming applications bind to specific device endpoints or topics on a central publish-subscribe broker, while file-based applications use ad hoc mechanisms. Ideally, applications should be able to use the logical features of the data they are interested in, such as its metadata, rather than its physical address, to access it. Two, data generated on the edge and fog are only transiently available on them, and eventually moved to the cloud for persistence, a key reason being that edge devices are usually less reliable. So, applications using such data are forced to run on the cloud, or move the data back to the edge for computing.

These motivate the need for a distributed data storage and management service over fog and unreliable edge devices that offers content-based discovery, transparent access, and high availability of data, across a wide area network and in the presence of device failures. This ensures data locality for application micro-services on the edge, allows the cumulative storage capacity of the edge devices to be efficiently used, and avoids transferring data to the cloud for persistence. The storage service should also be optimized for data that is continuously generated, as is common for IoT sensor data, and yet allow access to different temporal or logical segments within the data stream.

We make the following specific contributions in this paper:

1. We propose
ElfStore, an Edge-local federated Store, which is a first-of-its-kind stream-based, block-oriented distributed storage service over unreliable edge devices, with fog devices managing the operations using a super-peer overlay network.

2. We propose a federated indexing model using Bloom filters maintained by fogs for a scalable, probabilistic search for blocks based on their metadata properties.

3. We offer tunable resilience for blocks using a novel differential replication scheme across unreliable edges. This uses approximate global statistics at the fogs to decide on replica placement, which is sensitive to edge reliability, balances capacity usage, and ensures data durability.

The rest of the paper is organized as follows. We review related work to highlight the novelty of our contributions in Sec. 2, introduce the ElfStore service architecture and operations, federated indexing and tunable replication in Sec. 3, present detailed experiments to validate the design and scalability in Sec. 4, and offer our conclusions in Sec. 5.

2 Related Work
There has been limited work on distributed data storage on edge and fog resources, as reviewed and classified in
Moysiadis, et al. [10]. Rather than offload to the cloud or aggregate to reduce the data size, we instead adopt a peer-to-peer (P2P) model which does not reduce data fidelity, and maintains locality on edge and fog resources, with reliability guarantees. Others [11] have evaluated existing distributed cloud object stores,
Rados (Ceph), Cassandra and
InterPlanetary File System (IPFS), for use on edge and fog resources, and proposed extensions. However, these store data only on the fog layer, with the fog assumed to be high-end Xeon servers with 128 GB RAM. We instead design our storage service for practical and large-scale edge and fog resources that run on Pi- and Jetson-class devices with 4–8 ARM cores and 1–2 GB RAM, and use the edge devices as first-class entities for persistence.
IPFS [12] is used for storing web content on a wide-area network. It uses a Merkle tree to capture the directory structure, content-based addressing for files, and a P2P Distributed Hash Table (DHT) to map a file's hash to its peer locations. BitTorrent is used for data movement, and the data is replicated when a client downloads it.
Confais, et al. [13] have deployed IPFS on fog and cloud resources using Network Attached Storage (NAS). They extend IPFS to support searching at the local fog, besides the DHT, to speed up access to local content. However, storage is limited to the fog and not the edge, and there is no active replication to ensure reliability upon failures.
FogStore [14] proposes a distributed key-value store on fog resources with replication and differential consistency. Our focus is on reliably storing a stream of blocks of a much larger size, where resilience and capacity constraints are met. Others [15] propose repositories hosted on stable fogs (referred to as "edges") that are populated by data from transient edges ("mobile devices"), and act as a reverse Content Distribution Network (CDN) to serve requests from the cloud too. Reliability is a non-goal in their design and no experiments are presented. vStore [16] supports context-aware placement of data on fog and cloud resources, with mobile devices generating and consuming these data. It uses a rules engine to place and locate data based on its context metadata, but ignores reliability as edge devices do not store data.

Chen, et al. [17] examine fault-tolerant and energy-efficient data storage and computation on a set of edge devices ("mobile clouds"), without any fog or cloud. They use k-of-n erasure coding, where files are fragmented and coded fragments placed on energy-efficient edge devices. Access to data is by creating n tasks that execute on the edge devices containing the fragments, and waiting for k of them to complete, so as to decode and process the original fragment. This tightly couples processing with storage on the same devices, rather than offer an independent data service like us. Also, it is designed for 10–100's of edge devices since all-to-all information is required for decision making, while we use fog overlays that can scale to 100's of fogs and 1000's of edges. They do not support searching by metadata like we do. Lastly, erasure codes, while space-efficient compared to replication, are time-inefficient for recovery on unreliable systems like the ones we consider [18].

RFS [19] is a distributed file system hosted on the cloud but optimized for mobile clients (edges) with transient network connectivity. While the cloud holds the encrypted master data, clients selectively pre-fetch, decrypt and cache parts of the file based on their access patterns. Clients have exclusive access to their encrypted home directory, and common access to shared directories. The master data in the cloud is reliable.
P2P systems like Chord, Pastry and BitTorrent have proposed distributed file, block and key-value storage on unreliable peers on wide-area networks [20]. We adopt several of these concepts, such as super-peers [21], but simplify and enhance their performance for edge and fog deployments with less device flux, guarantee a minimum durability for stored blocks, and balance the storage capacity across peers. We also use efficient federated indexing using Bloom filters [22].
Cloud storage services like
HDFS and
Ceph [23] have been vital to the success of Big Data platforms by separating the distributed storage layer from the computing layer, like
Apache Spark or MapReduce, while allowing co-location during scheduling. We adopt a similar model for edge and fog, while being aware of the network topology, sensitive to the variable failure rates of edges, and offering search capability.

In summary, none of the existing literature or systems provide a scalable distributed store for storing, searching and accessing streams of objects generated from IoT sensing devices on fog and unreliable edges, while guaranteeing reliability, balancing capacity, and leveraging the topology of fog and edge resources.

3 ElfStore Architecture

In this section, we describe the desiderata, the supported operations, our design choices, and the architecture for
Edge-local federated Store (ElfStore).

Our system model has two types of resources, edge and fog. Edges like the Raspberry Pi have constrained compute and memory (e.g., 4-core ARM32 CPU, 1 GB RAM), and about 64 GB of SD card storage. These commodity devices are cheap but unreliable, especially when operating in the field, and have a known expected failure rate. Each edge connects to a single fog, through a wireless or wired private local area network (W/LAN), and the fog manages it. Fogs like the Jetson TX2 have moderate resource capacity (e.g., 8-core ARM64 CPU, 4 GB RAM, 500 GB HDD), and serve as a gateway to the public Internet for their edges to connect to other fogs and their edges. Fog resources are reliable, and connect with each other through a wired Metropolitan or Wide Area Network (MAN/WAN). We plan to support city-scale deployments having 10–100's of fogs, each managing 10–100's of edges [7].

Given this, there are several design goals and assumptions for our data storage service. (1) Applications running on edge, fog or other devices on the Internet may put, search and get data and associated metadata from the service. However, we expect that the edges will be the predominant clients to the store, generating and writing data continuously from co-located sensors, and consuming data for edge micro-services. (2) The edges will serve as the primary storage hosts for the data to enhance locality (hence, "edge-local"), with the fogs used for management and discovery. We avoid the cloud as a storage location, though it can have clients that access the data for processing or long-term archival. (3) Data that is stored must meet a minimum reliability level, even with edge failures, and have sufficient availability. The typical lifetime of the hosted data is in days or months (not years), as edge applications are likely to be interested in recent data. Adequate cumulative storage capacity should be available on the edges. (4) The store should scale as edges join and leave the system, often triggered by device failures and their stateless recovery, or occasional capacity expansion. Its performance should also weakly scale with the number of clients. (5) We assume a fully-trusted environment, where all edge and fog devices are secure, part of the same management domain, and there are no access restrictions to the contents.

The
ElfStore architecture (Fig. 1) addresses these requirements, and offers a federated storage service for streams of blocks. It uses the local disks on unreliable edges in the LAN as the persistent layer, and fogs on the WAN connected using a super-peer overlay as the management layer. It guarantees reliability at the block level using differential replication, and helps search for streams and blocks over their metadata using federated
Bloom filter indexes. These are discussed next.
3.1 Data Model

IoT data is often streaming, and arrives continuously from sensors. While publish-subscribe brokers enable access for real-time processing, we handle data storage and application access in the short and medium term. Since this data accumulates over time, ElfStore adopts a hybrid data model consisting of a stream of blocks. Here, the storage namespace has a flat set of streams, identified by unique stream IDs, and a sequence of data blocks within a stream ID, each having a unique block ID. Streams have associated metadata properties as a set of name–value pairs, which are used in searching. Each block has a data payload as a byte-array, and also metadata properties.
Figure 1: High-level Architecture of ElfStore. Edges in fog partitions form the distributed storage layer hosting block replicas, while the fogs form the super-peer overlay and monitoring plane, maintain the federated index, manage data movement, and provide reliability assurance.
Stream properties include the stream ID, the start and end time range of its blocks, the sequence IDs of the blocks, and user-defined properties like sensor type, spatial location, etc.
Block properties are the stream ID, block ID, sequence number, MD5 checksum, timestamp, and domain properties. Our store is optimized for append rather than update operations, with data and metadata often (but not always) immutable.

While this model resembles other block and object stores like HDFS, Ceph and Azure Blobs, we additionally allow users to search over the block and stream metadata to discover block IDs to access. This is useful when IoT clients micro-batch sensor streams and create blocks with different temporal event ranges, and consumers wish to access blocks containing a particular time segment; or when different variables from the same sensor are placed in different blocks of a stream and users wish to access blocks holding specific variables. If need be, streams can be treated as directories and blocks as files within them to even offer a distributed file-system view.

Given this, ElfStore supports the following service API:

• CreateStream(sid, smeta[], r)
This creates a logical stream with ID sid, with r as the stream's reliability (i.e., the reliability required for its blocks), and registers its metadata with the local (owner) fog, with an initial version number, and indexes it for searching. Metadata properties may be static or dynamic.

• Open|ReopenStream(sid)
This is optionally used before
Put to acquire an exclusive write lock on the stream for this client. Its response is the lease duration.
Reopen renews the lease before it expires.

• PutBlock(sid, bid, bmeta[], data, lease)
Put adds a single new block bid to the end of the stream sid, with the given data payload and the stream's reliability, and registers its static block metadata for searching. If lease is passed from
Open or Renew, it supports concurrent puts. Else, it behaves as an optimistic, lock-free protocol.

• UpdateBlock(sid, bid, data, lease)
This updates the data contents for all replicas of an existing block, but is otherwise similar to put.

• UpdateStreamMeta(smeta[], v)
This allows the dynamic metadata properties for a stream to be updated, where smeta has the updated properties and v is the version number of the old metadata being updated.

• FindStream(squery)
This searches for streams that match a given set of static stream properties provided in squery, and returns their IDs.

• FindBlock(bquery)
This searches for blocks that match a given set of static properties provided in bquery, and returns their stream and block IDs.

• GetStreamMeta(sid, latest)
This fetches the cached metadata for the stream sid and its version. The latest flag forces the most recent version of the metadata to be fetched.

• GetBlock(sid, bid)
This downloads the data and metadata for the given stream and block ID.

Every fog runs a service that exposes these APIs, and clients can initiate an operation on any fog. These can be enhanced in future by APIs like
InsertBlock, GetBlockRange, GetBlockMeta, DeleteBlock, DropStream, etc.
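To make this service contract concrete, the following Java sketch renders the API above as a plain interface. The method signatures and metadata types here are our illustrative assumptions; the actual system exposes equivalent operations through generated Apache Thrift service stubs, which differ in form.

import java.util.List;
import java.util.Map;

/** Illustrative sketch of the ElfStore service API (hypothetical Java types;
 *  the real system exposes these operations via Apache Thrift). */
public interface ElfStoreService {
  // Registers a stream with ID sid and reliability r for its blocks at the owner fog.
  void createStream(String sid, Map<String, String> smeta, double r);

  // Acquires or renews an exclusive write lease; returns the lease duration in millis.
  long openStream(String sid);
  long reopenStream(String sid, String sessionKey);

  // Appends a new block to the stream; lease may be null for the optimistic, lock-free path.
  void putBlock(String sid, String bid, Map<String, String> bmeta, byte[] data, String lease);

  // Updates the data contents of all replicas of an existing block.
  void updateBlock(String sid, String bid, byte[] data, String lease);

  // Test-and-set update of dynamic stream metadata, valid only at version v.
  boolean updateStreamMeta(String sid, Map<String, String> smeta, long v);

  // Probabilistic, federated search over static properties.
  List<String> findStream(Map<String, String> squery);
  List<String[]> findBlock(Map<String, String> bquery); // (sid, bid) pairs

  // Metadata and data reads; latest=true bypasses cached (possibly stale) metadata.
  Map<String, String> getStreamMeta(String sid, boolean latest);
  byte[] getBlock(String sid, String bid);
}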
3.2 Super-Peer Overlay

ElfStore uses a
P2P model for device management and search. Fogs act as super-peers and edges as peers within them [21]. Each edge peer attaches to a single fog super-peer, which serves as its parent and manages search and access to its data and storage. A fog and its edges form a fog partition. This reflects practical IoT deployments where such a 2-level hierarchy is common [5]. E.g., there may be a fog within a university campus, and all edges in the campus LAN are part of this fog partition.

Typical P2P networks scale exponentially, but require a logarithmic number of hops to locate information [20]. Each (super)peer maintains routing details to h (super)peers, where 2^h is the number of items that can be stored in the network. These form an overlay network that takes up to h hops to locate a peer containing an item ID. Since we expect the fogs to number within the thousands and without a lot of flux, we instead maintain the super-peer overlay as a recursive 2-level tree. Each fog maintains a list of b buddy fogs at the first level (which form a buddy pool), and a list of n = p/(b+1) − 1 neighbor fogs at the second level, where p is the total number of fog devices. Buddy pools are mutually exclusive, as are the neighbors of buddies in each pool. This limits our searches to 2 hops – first to a buddy and then to its neighbor∗. Edges know which parent fog to join, and since our fogs do not come and go often, existing P2P discovery mechanisms or even simpler techniques can be used for constructing this overlay network.

Fig. 2a shows p = 12 fog super-peers in an overlay, each with b = 2 buddies and the other fogs being partitioned across these buddies to give n = 3 neighbors each. For brevity, the neighbors of only one buddy pool and the edges of only one fog partition are shown. E.g., fog 9 maintains details on its buddies 1 and 5, its neighbors
10, 11 and 12, and its edges e1–e5.

Light-weight heartbeat events that are a few bytes long and sent often (≈30 secs) are used to monitor the devices. We also piggy-back tens of bytes of metadata and statistics in these heartbeats. This monitoring plane enables fail-fast detection of device failures, and federated statistics to be maintained (Fig. 1).

Each edge in a fog partition sends heartbeats to its parent fog when it is online, say every 30 secs. The arrival or loss of an edge is detected using this. Multiple heartbeat misses indicate a loss, and will trigger re-replication of the blocks on the missing edge, while an edge arrival will make its storage available. This obviates the need for a "graceful" entry or exit of edges. Fogs in a buddy pool send heartbeats to each other. Besides detecting the loss of a buddy and recovering its state (in future), this passes aggregate statistics from each buddy about its neighbors to the others in the pool. Likewise, the neighbors of a fog send it heartbeats and statistics periodically. Such heartbeats between buddies, and between neighbors and a fog, can help maintain the overlay network as fogs come and go.

∗ This model can be easily extended to a classic super-peer overlay that scales to millions of fogs but with h hops, or to support b-level redundancy for fog failures by having edges use all b + 1 buddies as parent fogs [21].
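As a worked illustration of the overlay sizing (our own sketch, not code from the paper), the neighbor count follows directly from p and b:

/** Sketch: sizing of the 2-level super-peer overlay. The p fogs are split into
 *  mutually exclusive buddy pools of size b+1, and the remaining fogs are
 *  partitioned as neighbors under each pool member. */
public final class OverlaySizing {
  // Number of neighbor fogs each fog tracks: n = p/(b+1) - 1.
  static int neighborsPerFog(int p, int b) {
    if (p % (b + 1) != 0) throw new IllegalArgumentException("p must be a multiple of b+1");
    return p / (b + 1) - 1;
  }

  public static void main(String[] args) {
    // Matches Fig. 2a: p = 12 fogs, b = 2 buddies => n = 3 neighbors each.
    System.out.println(neighborsPerFog(12, 2)); // prints 3
    // Any fog is then reachable in at most 2 hops: a buddy, then that buddy's neighbor.
  }
}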
3.3 Federated Indexing

Typical P2P DHTs use consistent hashing over their IDs to locate the peer hosting the content. But we provide a unique feature to locate streams and blocks using their static metadata, and not just their ID. We maintain a federated index, updated using the heartbeat events, to enable this (Fig. 1). First, each fog maintains a partition index of the metadata for the blocks present in its edges and the streams registered with it. This index is updated when a stream is created on the local fog that becomes its owner, or when a block replica is placed on it as part of a PutBlock call or a re-replication.

Each edge e_ij sends an ⟨a, v, e_ij, bid⟩ tuple to its parent fog i when a block bid with property name a and value v is put on it†.
Figure 2: Overlay Network and Federated Index. (a) Fogs and edges in a 2-level super-peer overlay network (p = 12, b = 2, n = 3, m = 5 edges per partition); (b) update and search of the federated index at fog 1.

The fog maintains the index I_a : v → (e_ij, bid), which locates the edges and block IDs in its partition that match a name–value pair. This update tuple is shown in Fig. 2b for fog 1 from its edges, and allows the fog to answer queries locally – FindBlock queries over these property name(s) can be answered by the local index to return the matching block IDs and edges.

We also maintain a hierarchical Bloom filter from neighbors, buddies and their neighbors, to identify fog partitions that potentially host block(s) matching a given key–value pair within 2 hops of the fog initiating the search request. Specifically, each fog i applies its edge metadata updates to a local Bloom filter for each property name, given as f^i_{L,a} = ⋁_k H(v_k), where H is a fixed bit-width multi-level hash function, v_k is the set of distinct values of the property name a for blocks present in this partition, and the Bloom filter is formed by a bitwise OR over all the hashes [24]. We test if a value v′ is probably present in the filter by checking that all the bits of its hash are set in the filter, i.e., (f^i_{L,a} ∧ H(v′)) = H(v′).

Bloom filters can have false positives, whose frequency is determined by the number of unique values inserted, the number of bits in the hash, and the quality of the hash [24]. But they have constant-time insertions and lookups, and compact storage. In our experiments, we use a 160-bit SHA1 hash per property name. Also, Bloom filters do not support deletions, and are hence used to only index static properties and not dynamic ones. This can be relaxed in future using Cuckoo Filters [25].

† The block and stream IDs are themselves property names. We use a similar approach for stream metadata, but omit its discussion for brevity.
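The Bloom filter operations just described can be sketched compactly in Java. This is an illustrative BitSet-based rendering of ours; like the paper's description, it uses a single 160-bit SHA-1 hash per value rather than a conventional k-hash Bloom filter, and the merge mirrors the recursive buddy filters described next.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.BitSet;

/** Sketch: a per-property Bloom filter in the federated index,
 *  f_{L,a} = OR over H(v_k) for the distinct values v_k of property a. */
public final class PropertyBloomFilter {
  private static final int BITS = 160;          // SHA-1-sized filter, per the paper
  private final BitSet filter = new BitSet(BITS);

  // H(v): map a value to a fixed bit-width pattern via SHA-1.
  static BitSet hash(String value) {
    try {
      byte[] d = MessageDigest.getInstance("SHA-1")
                              .digest(value.getBytes(StandardCharsets.UTF_8));
      return BitSet.valueOf(d);                 // 160 bits
    } catch (Exception e) { throw new RuntimeException(e); }
  }

  // Insert a value: bitwise OR of the filter with its hash.
  public void add(String value) { filter.or(hash(value)); }

  // Membership test: all bits of H(v') must already be set in the filter.
  // False positives are possible; false negatives are not.
  public boolean mightContain(String value) {
    BitSet h = hash(value);
    BitSet masked = (BitSet) filter.clone();
    masked.and(h);
    return masked.equals(h);
  }

  // Recursive filter: OR of a fog's local filter and its neighbors' filters.
  public static PropertyBloomFilter merge(Iterable<PropertyBloomFilter> filters) {
    PropertyBloomFilter out = new PropertyBloomFilter();
    for (PropertyBloomFilter f : filters) out.filter.or(f.filter);
    return out;
  }
}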
Cuckoo Filters [25].When the local Bloom filter is updated, a fog sends it to other fogs it † The block and stream IDs themselves are a property name. We use a similar approachfor stream metadata, but omit its discussion for brevity.
Each fog i maintains a list of n neighbor Bloom filters for a property name a, one per neighbor fog j, given as F^i_{N,a} = {⟨j, f^j_{L,a}⟩}. This lets a fog check if any neighbor possibly contains blocks matching a given name–value query, and if so, forward the FindBlock query to the neighbor for an exact match using its local index I_a. Fig. 2b shows neighbor fogs 2, 3, 4 sending their updates to fog 1, and responding to queries.

Lastly, each fog encodes its local Bloom filter and its neighbors' Bloom filters into a recursive Bloom filter [22], and sends it to its buddies. For a fog j with neighbor fogs k, this buddy Bloom filter is constructed as f^j_{B,a} = (⋁_{k=1}^{n} f^k_{L,a}) ∨ f^j_{L,a}. Each fog maintains b buddy Bloom filters, F^i_{B,a} = {⟨j, f^j_{B,a}⟩}, which allow it to test if its buddies or their neighbors possibly match a given query. E.g., in Fig. 2b, buddy fog 9 constructs a buddy Bloom filter from its neighbors' Bloom filters, fogs 10, 11, 12, and its local Bloom filter, and passes it to fog 1. Fog 1 uses it for 1-hop (forward request to buddy) or 2-hop (forward to buddy's neighbors) queries.

Since client requests are routed through a fog, each fog maintains a cache of metadata retrieved from others as part of various operations. This allows fast responses to other clients from the local fog's cache rather than the parent fog, but can return stale dynamic properties. Clients can pass a flag to force the latest metadata to be fetched. We do not cache data blocks, to reduce the storage overhead, though it is a simple extension.

3.4 Differential Replication

Each edge e_i has a pre-defined device reliability r_i, which can be part of the device specification or inferred from field experience. We also assume that the blocks hosted on an edge are permanently lost when it disconnects from its parent fog.

ElfStore uses differential replication to ensure that a block of size s̄ that it stores meets its block reliability r̄, by placing replicas on q edges having available storage capacities s_i and reliabilities r_i, such that s̄ ≤ s_i and (1 − r̄) ≥ ∏_{i=1}^{q} (1 − r_i). So the replication count q depends on both the reliability required for the block, and the reliabilities of the edges used. When a fog receives a request to put a block with its stream's reliability, it determines the replication factor q and the exact edges to put these replicas on. E.g., a reliability of r̄ = 0.999 (i.e., 99.9%) can be met using q = 3 edges with reliabilities r_i = {0.9, 0.9, 0.9}, such that (1 − 0.999) ≥ (1 − 0.9) × (1 − 0.9) × (1 − 0.9), or using q = 2 edges having r_i = {0.99, 0.9}.
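The reliability arithmetic reduces to a joint-failure product, sketched below (an illustrative helper of ours; the actual edge selection also uses the global matrix described next):

import java.util.List;

/** Sketch: differential replication check. A set of edges with reliabilities
 *  r_i satisfies block reliability rBar iff (1 - rBar) >= prod_i (1 - r_i),
 *  i.e., the probability that all replicas fail is low enough. */
public final class Replication {
  static boolean meetsReliability(double rBar, List<Double> edgeReliabilities) {
    double jointFailure = 1.0;
    for (double ri : edgeReliabilities) jointFailure *= (1.0 - ri);
    return jointFailure <= (1.0 - rBar);
  }

  public static void main(String[] args) {
    // rBar = 0.999 is met by three 90%-reliable edges: 0.1^3 = 0.001 <= 0.001 ...
    System.out.println(meetsReliability(0.999, List.of(0.9, 0.9, 0.9)));  // true
    // ... or by two edges at 99% and 90%: 0.01 * 0.1 = 0.001 <= 0.001.
    System.out.println(meetsReliability(0.999, List.of(0.99, 0.9)));      // true
  }
}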
The key challenge is that with 1000's of edge devices, it is not possible for each fog to maintain the current capacity and reliability of every edge device to make this decision. Instead, just as we used federated indexes to locate blocks, we similarly propagate and maintain approximate statistics about the storage and reliability of edges in the various fog partitions within the overlay network to help make this decision.

3.4.1 Approximate Statistics

Each edge e_i reports its reliability and available storage capacity ⟨r_i, s_i⟩ to its parent fog, periodically as part of its heartbeat. Each fog i then determines the minimum, maximum and median reliabilities and storage capacities of all its edges, ⟨r^min_i, r^med_i, r^max_i⟩ and ⟨s^min_i, s^med_i, s^max_i⟩, along with the count of edges that fall within each quadrant of this 2D space, ⟨c^q1_i, c^q2_i, c^q3_i, c^q4_i⟩, as illustrated in Fig. 3(d). Here, we have c^q1_i edges with reliability between [r^med_i, r^max_i) and capacity between [s^med_i, s^max_i); c^q2_i edges with reliability in [r^med_i, r^max_i) and capacity in [s^min_i, s^med_i); and so on for the other 2 quadrants. These edge counts correspond to the combinations of high/low capacity and high/low reliability, HH, LH, LL, HL. We will also have c^q1_i + c^q2_i ≈ c^q3_i + c^q4_i, and c^q1_i + c^q4_i ≈ c^q2_i + c^q3_i, depending on rounding errors.

These 10-tuple values are then sent to the fogs we are a neighbor of, as part of heartbeats. Similarly, buddies exchange their neighbors' and their own tuples with other buddies. Using these 10-tuples acquired from all fogs, each fog independently and consistently constructs a global distribution matrix, as follows. We first find the global min and max storage range among all the fogs, s^min = min_i(s^min_i) and s^max = max_i(s^max_i), and likewise the reliability range, r^min and r^max. We divide each range [s^min, s^max) and [r^min, r^max) into k equi-width buckets, and for each fog i, proportionally distribute its (c^q2_i + c^q3_i) count among the storage buckets that overlap with [s^min_i, s^med_i), and its (c^q1_i + c^q4_i) count among the buckets that overlap with [s^med_i, s^max_i); and similarly, distribute the counts (c^q3_i + c^q4_i) and (c^q1_i + c^q2_i) proportionally to the reliability buckets that overlap with the respective reliability sub-ranges for the fog. We sum these bucket values across all fogs, and calculate the global median storage and reliability, s^med and r^med. This gives us the bounds of the global quadrants.

For the 10-tuples of the 4 fogs, A, B, C and D shown in Fig. 3(a), their contributions to the storage and reliability buckets are shown in (b) and (c), using k = 16 buckets. These help decide the global bounds in (d). E.g., fog B contributes its c^q2_B + c^q3_B = 9 edges proportionally to the 3 storage buckets that fall between [s^min_B, s^med_B) = [9, 12), and its c^q1_B + c^q4_B = 6 edges to the 2 storage buckets that overlap with [s^med_B, s^max_B), which starts at 12. Summing across the fogs gives the global medians, r^med = 85 and s^med = 12.

Now, for each fog i, we consider the area overlap of each of its local quadrants with each of the global quadrants, and proportionally include the fog's edge count from that local quadrant in the global quadrant. E.g., in Fig. 3(a), fog C contributes all the edges in two of its local quadrants (with counts 2 and 2) to the single global quadrant that fully contains them, while the 6+6 = 12 edges in its other two quadrants
are shared proportionally, in a ratio of 1:3, between the two global quadrants they straddle. This gives the global count of edges in each of the four storage and reliability quadrants,
HH, LH, LL, HL. Given this, a fog is mapped to the global quadrant where its median-center falls. E.g., fog A falls in LL and C in HL.

Figure 3: Global Matrix Estimation for Storage and Reliability. (a) The 10-tuples for the four fogs A–D; (b) their contributions to the storage capacity buckets; (c) their contributions to the reliability buckets; (d) the global quadrants, with edge counts HH c^q1 = 10, LH c^q2 = 11, LL c^q3 = 12, HL c^q4 = 16; (e) the sequence order of global and local quadrants tested.
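For illustration, the per-fog quadrant counts can be computed as below (our own sketch with assumed array inputs; the 10-tuple additionally carries the min/med/max bounds described above):

/** Sketch: classify a fog's edges into its four local capacity/reliability
 *  quadrants, relative to its median storage s_med and reliability r_med.
 *  q1=HH, q2=LH, q3=LL, q4=HL (first letter: capacity, second: reliability). */
public final class LocalQuadrants {
  public final int[] counts = new int[4];  // c_q1 .. c_q4

  public LocalQuadrants(double[] storage, double[] reliability,
                        double sMed, double rMed) {
    for (int i = 0; i < storage.length; i++) {
      boolean highCap = storage[i] >= sMed;
      boolean highRel = reliability[i] >= rMed;
      if (highCap && highRel) counts[0]++;        // HH
      else if (!highCap && highRel) counts[1]++;  // LH
      else if (!highCap) counts[2]++;             // LL
      else counts[3]++;                           // HL
    }
  }
}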
We use this information, maintained independently but consistently on each fog, to handle the
PutBlock operation, invoked by a client on any fog. The fog receiving a put request for a block of size s̄ queries the stream sid to get its reliability, r̄. It then selects a series of q fog partitions, and chooses an edge within each for placing a replica, such that we (1) balance the use of fogs with both high and low reliability edges to ensure that a sustainable mix of edges remains, (2) give preference to fogs that have a higher available storage to ensure effective use of capacity, (3) select different fogs for each replica to enhance partition-tolerance and locality with diverse clients, (4) bound the replication factor to a minimum and maximum value set by the user, and (5) meet the block's reliability requirement.

We select fog partitions from different quadrants in the global matrix in a particular sequence to meet the above goals. Specifically, we alternate between the HH and HL quadrants to prioritize high-capacity fogs. Within the global quadrant, we pick a random fog and test if it has a non-zero edge count in a complementary reliability quadrant. E.g., for a fog that maps to the HH quadrant of the global matrix, we check for edges in its HL or LL local quadrants, and for a fog in the HL global quadrant, we test for edges in its HH or LH local quadrants. If the fogs have zero edges in these quadrants, we expand to the other two local quadrants as well.

The sequence order of global and local quadrants that are tested is given in Fig. 3(e), and is a variant of a Z-order curve. Intuitively, this picks edges close to the median global reliability and with high capacity. The reliability is initially met by median edges. As their capacity is exhausted, the edges with more extreme (low or high) reliability move closer to the median and will be chosen. Later, this helps us find pairs of edges with low and high reliability that together give a reliability similar to the initial two median-reliability edges. As an optimization, we always try to place the first replica locally, if the writing client is on an edge. We also pick edges in different fog partitions unless there is no available capacity.

A fog i that is chosen will provide a minimum reliability of r^min_i if the edge is in the HL or LL local quadrant, or r^med_i if in HH or LH. This is a conservative estimate, since the actual edge selected within the fog may have a reliability as high as r^med_i or r^max_i, respectively. We pick as many fogs as needed to meet the block's reliability or the minimum replication count.

The fogs chosen in this manner are sent to the client, which then directly contacts each fog concurrently to place a replica of the block. Each fog selects an edge with the least reliability in the specified local quadrant, and puts the block on it. In case the global matrix is stale and the fog cannot find a suitable edge, this fog can use its own global matrix to find an alternative fog with a similar non-empty global and local quadrant. Since the edge may be on a private network, the data moves from the client to the parent fog hosting a replica, and from it to the edge. If the client is an edge, it will also pass through its own parent fog first, but not otherwise, to avoid the extra hop. The fog also registers the block metadata with itself, propagates it to the federated indexes as described before, and updates the stream metadata at the owner fog with the block ID, MD5 checksum, and block count.
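A simplified sketch of this placement loop is shown below. This is our own rendering with hypothetical types; the actual selection follows the Z-order-like sequence of Fig. 3(e), expands to complementary local quadrants, and avoids repeating fog partitions.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch: fog selection for PutBlock. Alternates between the high-capacity
 *  global quadrants (HH, HL), picking a random fog from each and accumulating
 *  a conservative reliability estimate, until the block reliability rBar and
 *  the minimum replication count are met, or maxReplicas is reached. */
final class ReplicaPlacement {
  record Fog(String id, double minReliabilityEstimate) {}

  static List<Fog> selectFogs(List<List<Fog>> quadrants /* index 0: HH, 1: HL */,
                              double rBar, int minReplicas, int maxReplicas) {
    Random rnd = new Random();
    List<Fog> chosen = new ArrayList<>();
    double jointFailure = 1.0;
    for (int i = 0; chosen.size() < maxReplicas; i++) {
      List<Fog> quad = quadrants.get(i % 2);               // alternate HH and HL
      if (quad.isEmpty()) {
        if (quadrants.get((i + 1) % 2).isEmpty()) break;   // no candidate fogs left
        continue;
      }
      Fog f = quad.remove(rnd.nextInt(quad.size()));
      chosen.add(f);
      jointFailure *= (1.0 - f.minReliabilityEstimate());  // conservative estimate
      if ((1.0 - jointFailure) >= rBar && chosen.size() >= minReplicas) break;
    }
    return chosen;
  }
}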
Getting a block involves finding the fogs containing the block replicas, using its ID, from the local fog. This first returns the local fog or the possible neighbor fogs that may contain it, based on a local index or Bloom filter lookup. The client contacts the local fog if present in the response, and this will have the replica. Else, the client contacts each neighbor fog, which checks its local index, and if present, returns the block from the edge to the client.

If none of the local or neighbor fogs hold a copy, or in the rare case these were all false positives, we recheck with the local fog and force it to search its buddy Bloom filters. It forwards the find request to matching buddies to check their local index and neighbor Bloom filters, in 2 hops. This will return the global list of fogs that may contain the replica, and the client contacts each to get the first available replica.

A parent fog detects an edge failure due to missing heartbeats. This triggers a recovery of all block replicas present on the edge, to ensure each block's reliability requirement is still met. For this, the fog uses the same edge selection approach as above, except that it tries to find a single fog that has an edge with a reliability similar to the edge that failed. The parent fog then gets an existing block replica from a surviving edge, and puts it on the newly selected fog and an edge within it. This selection of alternative devices and re-replication onto them is done concurrently for the lost blocks on the failed edge. While we currently assume that the reliability of an edge does not change over time, in future, this same technique can be extended to expand or contract the number of replicas to adapt to dynamism in the reliability.
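The find/get path described above can be summarized by the following sketch (illustrative stub interfaces of ours, not ElfStore's actual Thrift API):

import java.util.ArrayList;
import java.util.List;

/** Sketch: 2-hop block lookup by a property name and value. */
final class Lookup {
  interface FogRef {
    boolean localIndexMatches(String prop, String value);      // exact partition index check
    boolean bloomMightContain(String prop, String value);      // this fog's local Bloom filter
    boolean buddyBloomMightContain(String prop, String value); // recursive buddy filter
    List<FogRef> neighbors();
  }

  static List<FogRef> locate(FogRef self, List<FogRef> neighbors,
                             List<FogRef> buddies, String prop, String value) {
    List<FogRef> hits = new ArrayList<>();
    // The handling fog first checks its own partition index.
    if (self.localIndexMatches(prop, value)) hits.add(self);
    // It holds its neighbors' Bloom filters locally; matching neighbors are then
    // asked for an exact index check (filter matches may be false positives).
    for (FogRef n : neighbors)
      if (n.bloomMightContain(prop, value) && n.localIndexMatches(prop, value)) hits.add(n);
    // Only if nothing is found are the buddy filters consulted; matching buddies
    // check their own index and their neighbors': at most 2 hops in total.
    if (hits.isEmpty())
      for (FogRef b : buddies)
        if (b.buddyBloomMightContain(prop, value)) {
          if (b.localIndexMatches(prop, value)) hits.add(b);
          for (FogRef bn : b.neighbors())
            if (bn.bloomMightContain(prop, value) && bn.localIndexMatches(prop, value))
              hits.add(bn);
        }
    return hits;
  }
}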
3.5 Consistency and Concurrency

3.5.1 Concurrent Block Writes

The default
PutBlock operation is optimistic, and assumes that just one client is writing to the stream. With concurrent clients adding blocks, the order in which the blocks are appended to the stream depends on the order in which the stream metadata at the owner fog is updated with the new block IDs. Here, we will need a user-defined sequence number in the block metadata for a partial ordering of the blocks written by one client.

However, for a global ordering of blocks with concurrent clients, we offer a soft-lease mechanism. Here, the client first calls
OpenStream to try and acquire a lease for a certain duration. This request is forwarded to the owner fog for the stream, which logs and returns a successful lease for the requested (or a pre-defined) duration, if no other client holds an active lease on this stream. The response has the duration and a session key, which is a unique random nonce used for auditing.
PutBlock then passes the client ID, lease duration and session key to the fogs where the replicas will be placed. These fogs sanity-check that the lease duration is valid, and log the client ID and session key for this operation, before writing the block replica to their edge. The client also adds the new block IDs to the stream metadata.

This soft-lease model is light-weight, but does not enforce locking of the stream. It is up to the clients to ensure that they have acquired a valid lease before they call puts in parallel, to avoid inconsistent ordering. But the logs maintained at the fogs allow us to later verify the validity of the operations.

The lease on a stream can be used by the client across multiple
Put|UpdateBlock operations. This lets it write a series of blocks to the stream with a guaranteed contiguous order. If the lease is going to expire before an operation, the client
Renews it with the fog, which returns an extended lease duration if it has not expired. If the lease has expired and no other client has acquired the lease since then, the fog goes ahead and extends the lease. This reduces the leasing overhead due to time-skews, without affecting consistency. If an
OpenStream fails due to another client having the lease, the client can poll and retry acquisition. There is no explicit close stream operation, and the lease is released on expiry.
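A sketch of the lease bookkeeping at the owner fog is shown below (the field and method names are our assumptions):

/** Sketch: soft-lease state kept by a stream's owner fog. Leases order
 *  concurrent writers but are not enforced as hard locks; the fogs log the
 *  client ID and session key per operation for later auditing. */
final class StreamLease {
  private String holder;          // client ID holding the lease, or null
  private String sessionKey;      // unique random nonce returned to the client
  private long expiresAtMillis;

  /** Grants the lease if it is free or expired; returns the session key, or null. */
  synchronized String tryAcquire(String clientId, long durationMillis) {
    long now = System.currentTimeMillis();
    boolean free = holder == null || now >= expiresAtMillis;
    if (!free && !holder.equals(clientId)) return null;  // held by another: poll and retry
    holder = clientId;
    expiresAtMillis = now + durationMillis;
    sessionKey = java.util.UUID.randomUUID().toString();
    return sessionKey;
  }

  /** Renew extends the lease; an expired lease is still extended if no other
   *  client has acquired it since, which tolerates clock skew cheaply. */
  synchronized boolean renew(String clientId, long durationMillis) {
    if (!clientId.equals(holder)) return false;
    expiresAtMillis = System.currentTimeMillis() + durationMillis;
    return true;
  }
}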
UpdateBlock is similar to
PutBlock, but replaces the selection of replicas using the global matrix with finding the fogs holding all the current replicas of the block, similar to
GetBlock. Once located, the client sends the updated block data to each replica, and also updates the stream metadata with the new MD5 checksum for the block.

3.5.2 Stream Metadata Updates
When a stream is created, it is registered with an owner fog that holds its metadata. These properties may be static or dynamic. While static properties are indexed and searchable, the values of dynamic properties can be updated but not searched on.

Leasing is useful when multiple operations are done with a single lease to amortize its cost. But metadata updates are single operations. So we assign version numbers to the dynamic metadata properties and employ a test and set pattern to allow consistent and concurrent updates to them. This version is returned by
GetStreamMeta. Cached versions of the stream metadata also maintain and return the version in their cache.

When updating the metadata for a stream, the client first does a
Get-StreamMeta , updates the values of the returned dynamic properties, and sendsthe new property values and the earlier version number to the owner fog of thestream. The fog tests if the current version matches the passed version, and ifso, sets the passed dynamic properties and increments the version. But, if thecurrent version is greater than the one that is passed, then the client is tryingto update a stale copy of the dynamic property. This may be due to usingan older cached metadata on a different fog, or another client having updatedthe metadata with the owner fog since the last access by this client. Then theupdate call fails, and the client has to get the latest metadata and retry withthe new version number.There are also system-defined dynamic properties that are maintained aspart of various APIs, such as the block count, list of block IDs, and their MD5checksums, for a stream. These cannot be modified directly by the client, butthe framework updates these internally using a similar pattern.
4 Experiments

ElfStore is implemented in Java using the Apache Thrift cross-platform micro-services library. The fog service has the bulk of the logic, while the edge services are light-weight. We conduct experiments to validate the performance, resilience and scalability of ElfStore. We use the
VIoLET container-based IoT virtual environment to define two deployments [26]. In the first,
D20, we have 4 fog containers on a public network, with 4 edges connected to each fog in a private network. This gives a total of 20 devices running on 4 Azure D32 VMs (32-core, 128 GB RAM). The
D272 configuration has 16 fogs, with 16 edge containers each, for a total of 272 devices on 1 public and 16 private networks. They run on 16 Azure D32 VMs. All devices in each fog partition run on the same VM. The edge containers have CPU and memory resources that match a Raspberry Pi 3B (4-cores@1.2 GHz, 1 GB RAM). The edge reliabilities are sampled from a normal distribution with µ = 90%, σ = 3% for D20, and µ = 80%, σ = 5% for D272. For the
D20 setup, we run experiments with 1, 4 and 16 concurrent edge clients, each calling the PutBlock
API on their local fog parent with a block size of 10
MB, in a loop for 100 times. We set a reliability of r̄ = 99% for all these streams, and a min and max replication factor of 2 and 5. For the D272 setup, we perform two experiments with 16 and 64 concurrent edge clients spread across the 16 fogs. Each edge calls put for 100 iterations. They put blocks of size 1
MB or 10
MB, and use reliabilities between
90% and 99.9%.

The end-to-end latency distribution in seconds for the put API calls is shown as blue violin plots on the left Y axis of Fig. 4a. For a single API call, this is the time to find the fogs to place the block replicas on, copy all replicas to the target edges concurrently, and register the block metadata. Each violin distribution
has the number of edges × 100 data points.

For
D20, with a single edge client, each put call takes a median of just over 3 secs. Since each replica is 10 MB in size, the link speed is 90 Mbps, and we need 3 hops – from the client to the parent fog, the parent fog to the target fog, and the target fog to the edge – about 3 secs are spent just in data movement. Zooming in, the time to find the replica placement is just 30 ms, as the parent fog takes a local decision, and the time to update the metadata index is also 30 ms; this is mostly the service call overhead.

These times do not vary much as we increase to 4 concurrent edges writing from 4 different fog partitions, with their median time at about 4 secs. But with 16 edges putting blocks in parallel, all 4 edges of every fog are active. Since they all route data through their parent fog to a remote fog, the data transferred out from a fog for the edges in its partition is 4 edges × ≈2 remote replicas × 10 MB. Hence, its available bandwidth limits the performance, taking a median of about 10 secs. So ElfStore's overheads are minimal in all these cases, and we are only bandwidth bound.

For D272, each edge is randomly assigned to put blocks of either 1 MB or 10 MB in size, 100 times. For 16 edges, there are 8 edges each putting blocks of these two sizes, while for 64 edges, there are 25 edges writing 10 MB blocks and the other 39 writing 1 MB blocks. Fig. 4a shows that the median latency with 16 concurrent edges is about 2 secs, and it only increases marginally for 64 edges. The smaller times than D20 are due to the use of smaller block sizes and a smaller client load, compared to the total edge count.
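As a rough sanity check of the bandwidth argument above for D20 (our arithmetic, assuming the full 10 MB block is forwarded at each of the 3 hops):

t_data ≈ 3 hops × (10 MB × 8 b/B) / (90 Mbps) ≈ 3 × 0.9 secs ≈ 2.7 secs,

which accounts for most of the just-over-3 secs median put latency of a single writer.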
Figure 4: Performance of Put and Get block operations. (a) PutBlock without leasing; (b) PutBlock on D20 with leasing and concurrent writers; (c) Find and GetBlock.

If we limit our analysis to just the edges on D272 putting 10 MB blocks (plots omitted for brevity), the median time for the 8 (of 16) edges writing 10 MB blocks is about 5 secs, while for the 25 (of 64) edges it is about 6 secs. These are higher than D20 primarily due to the higher replication factor, which has grown to a median of 4 because of D272's lower edge reliability distribution of µ = 80%, σ = 5%. In contrast, D20's reliability of µ = 90%, σ = 3% results in a replication factor of 2–3. This clearly shows the differential replication at work.

We initialize the
D20 setup with 16 × 100 block writes without leasing. Then, we perform 25 additional block puts per client to a random stream, from 1, 4 and 16 concurrent clients, with a lease acquired on the stream for 100 secs, and renewed a median of 2 times. Different edges may select the same stream to write to. Besides the end-to-end latency for these leased puts, which now includes the lease acquisition and renewal time (left Y axis in Fig. 4b), and the replica count (right Y axis), we also show the count of concurrent writers for a stream (right Y axis).

With 1 or 4 edges doing puts, we see that the median latencies are about 2 secs and 4 secs. These are comparable to the previous experiments without leasing for the same number of writers, due to the lower median replication factor of 2 in these runs (compared to 3 earlier), caused by a higher overall reliability of the edge devices in these runs, despite sampling from the same edge reliability distribution. No two edges selected the same stream to write to in these runs. This indicates that the edge reliabilities, replication count and bandwidth usage have a bigger impact on the end-to-end latency than the leasing overheads.

With 16 clients, the median latency is lower than without leasing, at about 6 secs, due to the smaller replication count. But the latency distribution is much wider, reaching 450 secs. This is because multiple edges pick the same stream to write to, as seen in the right-most violin. We have 4 streams selected by 2 edges each to write to, and 1 stream picked by 3 edges. Hence, with concurrent writers and leasing, only one will write to the stream at a time while the others poll to acquire the lease. This lasts till all 25 blocks are put by the edge holding the lease.

The peak latency to write a block is for the stream with 3 clients. The last edge to get the lease was waiting for 50 blocks to be written by the previous two edges, which takes about 446 secs. So the latency for this edge to put its first block is about 450 secs, while putting the rest of its 24 blocks does not incur additional leasing overheads.
Figure 5: Performance of stream metadata update, and block recovery after edge failures. (a) Get and UpdateStreamMeta latencies and failed update counts; (b) D20 & D272 recovery times and block counts; (c) Global Matrix values over time, for PutBlocks (time steps 0–12) and two edge recoveries (time steps 15 and 22) on D20.
We do a similar set of concurrent FindBlock and GetBlock API calls from 1, 4 and 16 edges for the D20 setup, and from 16 and 64 edges for D272. ElfStore has been loaded with 16 × 100 blocks (D20) or 64 × 100 blocks (D272) using the previous put runs. Each edge finds 100 random block IDs from the ones inserted, followed by a get of that block.

The time to find and get each block is shown in Fig. 4c (left Y axis), and a magenta triangle on the right Y axis indicates the percentage of times a replica from the local partition is read. The find API call is fast, taking about 220 ms with 1 and 4 edges for D20, and about 440 ms with 16 edges. In the latter case, each fog is servicing 4 concurrent edge requests and is hence marginally slower.

Once the replicas for a block ID are identified, we get one of the replicas – preferring a replica in the local fog partition, if present. For D20, we see that the get latencies have a bimodal distribution, with peaks at about 1 sec and 2 secs for 1 and 4 edges, and at about 3 secs and 7 secs for 16 edges. This is due to the mix of local and remote replicas that an edge accesses. Edges are able to get a local replica copy 55–70% of the time, resulting in the lower latency peak. This range is near the ≈62.5% we expect – since all edge clients put blocks uniformly, 1/4th of all the blocks have their first replica locally; of the remaining 3/4th of the blocks, there is a ≈50% chance that one of the other 1–2 replicas falls in the reader's partition.

D272 is equally fast, taking a median of 1–2 secs with 16 or 64 edge readers. It benefits from 50–60% of the blocks being only 1 MB in size. However, this is despite only ≈23% of blocks having a local replica, out of the median 4 replicas per block. This too matches the expected local fraction of 1/16 + 15/16 × 3/15 = 25% for gets.

We conduct experiments on the
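In general (our derivation, assuming the first replica of a block is written locally, the remaining q − 1 replicas land on distinct other fogs uniformly at random, and readers are spread uniformly over the F fogs), the expected local-read fraction is:

Pr[local replica] = 1/F + ((F − 1)/F) × ((q − 1)/(F − 1)) = q/F,

which gives ≈2.5/4 = 62.5% for D20 (q ≈ 2–3 replicas, F = 4 fogs) and 4/16 = 25% for D272 (q = 4, F = 16), consistent with the observed 55–70% and ≈23%.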
D20 setup to measure the latency for stream metadata updates, using 1, 4 and 16 concurrent edges as clients. Each client randomly picks one of the 100 existing streams, and performs 100 GetStreamMeta and UpdateStreamMeta operations alternately on it. It is possible for two clients to select the same stream to update. Since we use version checking rather than leasing for metadata updates, it is likely that the version of a stream's metadata being updated may have been updated by a concurrent client, and the update hence fails. We report the latency for the get and update metadata calls (left Y axis) and the count of failed updates (right Y axis) in Fig. 5a; failed updates are not retried.

With just 1 or 4 clients, no two clients randomly pick the same stream to update, and only local streams are chosen. So all updates are at the local fog, and complete successfully with a median latency of 121 ms.

But with 16 clients, 4 streams are each selected by a pair of clients to update concurrently. This causes 185 of the total of 1600 updates to fail due to staleness, as seen on the right Y axis. The update time also increases to a median of 245 ms. This is primarily due to a majority of the metadata updates happening on a remote fog partition, unlike the 1 and 4 edge cases, which causes an extra network hop in the VIoLET environment.

Lastly, we measure the responsiveness of ElfStore in recovering from edge failures, and ensuring that the blocks maintain their reliability levels. We load
16 × 100 and 64 × 100 blocks into the D20 and D272 setups, like before, and then kill one of the edges with the least reliability. We track the time taken by its parent fog to detect the loss, and start re-replicating the lost blocks on other edges. Once recovery is complete, we kill another low-reliability edge. Fig. 5b plots the time to re-replicate each block on the left Y axis violin, the number of blocks recovered on the right Y axis, and lists the total recovery time at the bottom, shown after the first and the second failures.

In all cases, 100% of the lost blocks are re-replicated. The re-replication time per block for each setup is comparable to the sum of the get and put times seen before, since we get a surviving replica and put it on a new edge. Also, the recovery of blocks is done in parallel on the fog using 10–20 threads. Hence, while 109–144 blocks are recovered depending on the failing edge, the total recovery time is only 105–312 secs. So the thread parallelism gives us a 10× speedup.

We further examine how our global matrix changes as blocks are populated in ElfStore, and when failures happen. Fig. 5c shows a heatmap of the edge counts in the 4 global matrix quadrants (top 4 rows) and the median storage and reliability values (bottom 2 rows), updated every 150 secs along the X axis, for D20. At time steps 0–12, 4 edges are concurrently writing 100 blocks in a loop. Initially, the median available storage is s^med = 14 GB, and all 16 edges fall in the high capacity quadrants, HH or HL. As replicas are written to fogs in these quadrants and their edge capacities get used on priority, the count shifts from HH and HL to LH and LL, e.g., from step 2 to 3. Eventually, this disk usage causes the median capacity to change, say, from 14 GB to 13 GB after step 5. This causes borderline fogs, earlier classified as low capacity, to move to high capacity, and become prioritized for selection. So we keep selecting fogs that are in and around the median value.

After step 15, there is an edge failure and the total edge count drops from 16 to 15. The ensuing re-replication causes the missing blocks to be copied to an existing edge. While only one replica is created per block, this is done by 10+ concurrent threads. So the edge counts again shift from high to low capacities. When a second edge fails after step 22, it even causes the median reliability to drop from r^med = 90% to 89%.

5 Conclusions

In this paper, we have presented a novel distributed storage service for edge and fog resources that offers a transparent means for edge computing applications to access streams of data blocks persisted locally. This avoids the need to move IoT data to and from the cloud, other than for long-term archival. ElfStore leverages ideas from both P2P networks and Big Data storage like HDFS. It uses a federated index for 2-hop searching of blocks, with hierarchical Bloom filters over static metadata properties for fast probabilistic searches at scale. It maintains approximate global statistics on the storage and reliability distributions of edges on different fogs, which helps it select fogs and edges for differential replication. This guarantees a tunable reliability for each block. Our experiments demonstrate the low overhead of ElfStore, with block read and write performance bound only by the network speed. Consistent and concurrent updates of blocks and metadata are also validated.
It also performs automated and rapid block re-replication on edge failures, to maintain the required reliability.

As future work, we plan to include support for overlay creation, as available in existing P2P literature, and use buddy pools to handle unreliable fogs as well. We can also enforce the leases as locks, and support access control, auditing and non-repudiation mechanisms. Larger-scale and comparative experiments, and concurrent-failure tests are planned as well.

Acknowledgments: We thank Shrey Baheti from the DREAM:Lab for help with the experiments. This work was supported by grants from VMWare, Microsoft Azure and the Indo-US Science and Technology Forum (IUSSTF).

References

[1] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Sensing as a service model for smart cities supported by internet of things,"
Transactions on Emerging Telecommunications Technologies, 2014.

[2] Y. Simmhan, S. Aman, A. Kumbhare, R. Liu, S. Stevens, Q. Zhou, and V. Prasanna, "Cloud-based software platform for big data analytics in smart grids," Computing in Science & Engineering (CiSE), 2013.

[3] A. V. Dastjerdi and R. Buyya, "Fog computing: Helping the internet of things realize its potential," IEEE Computer, 2016.

[4] X. Xu, S. Huang, L. Feagan, Y. Chen, Y. Qiu, and Y. Wang, "EAaaS: Edge analytics as a service," in IEEE ICWS, 2017.

[5] J. He, J. Wei, K. Chen, Z. Tang, Y. Zhou, and Y. Zhang, "Multi-tier fog computing with large-scale IoT data analytics for smart cities," IEEE Internet of Things Journal, 2017.

[6] P. Garcia Lopez, A. Montresor, D. Epema, A. Datta, T. Higashino, A. Iamnitchi, M. Barcellos, P. Felber, and E. Riviere, "Edge-centric computing: Vision and challenges," ACM SIGCOMM Computer Communication Review, 2015.

[7] M. Yannuzzi, F. van Lingen, A. Jain, O. L. Parellada, M. M. Flores, D. Carrera, J. L. Pérez, D. Montero, P. Chacin, A. Corsaro et al., "A new era for cities with fog computing," IEEE Internet Computing, 2017.

[8] P. Ravindra, A. Khochare, S. P. Reddy, S. Sharma, P. Varshney, and Y. Simmhan, "ECHO: An adaptive orchestration platform for hybrid dataflows across cloud and edge," in ICSOC, 2017.

[9] H. Wu, S. Deng, W. Li, M. Fu, J. Yin, and A. Y. Zomaya, "Service selection for composition in mobile edge computing systems," in IEEE International Conference on Web Services (ICWS), 2018.

[10] V. Moysiadis, P. Sarigiannidis, and I. Moscholios, "Towards distributed data management in fog computing," Wireless Communications and Mobile Computing, 2018.

[11] B. Confais, A. Lebre, and B. Parrein, "Performance analysis of object store systems in a fog and edge computing infrastructure," Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIII, 2017.

[12] J. Benet, "IPFS - content addressed, versioned, P2P file system," CoRR, vol. abs/1407.3561, 2014.

[13] B. Confais, A. Lebre, and B. Parrein, "An object store service for a fog/edge computing infrastructure based on IPFS and a scale-out NAS," in IEEE International Conf. on Fog and Edge Computing (ICFEC), 2017.

[14] R. Mayer, H. Gupta, E. Saurez, and U. Ramachandran, "FogStore: Toward a distributed data store for fog computing," in IEEE FWC, 2017.

[15] I. Psaras, O. Ascigil, S. Rene, G. Pavlou, A. Afanasyev, and L. Zhang, "Mobile data repositories at the edge," in USENIX HotEdge, 2018.

[16] J. Gedeon, N. Himmelmann, P. Felka, F. Herrlich, M. Stein, and M. Mühlhäuser, "vStore: A context-aware framework for mobile micro-storage at the edge," in International Conference on Mobile Computing, Applications, and Services, 2018.

[17] C.-A. Chen, M. Won, R. Stoleru, and G. G. Xie, "Energy-efficient fault-tolerant data storage and processing in mobile cloud," IEEE TCC, 2015.

[18] J. S. Plank, "Erasure codes for storage systems: A brief primer," ;login: The USENIX Magazine, vol. 38, no. 6, pp. 44–50, 2013.

[19] Y. Dong, H. Zhu, J. Peng, F. Wang, M. P. Mesnier, D. Wang, and S. C. Chan, "RFS: a network file system for mobile devices and the cloud," ACM SIGOPS Operating Systems Review, vol. 45, no. 1, 2011.

[20] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: a scalable peer-to-peer lookup protocol for internet applications," IEEE/ACM Trans. on Network., 2003.

[21] B. B. Yang and H. Garcia-Molina, "Designing a super-peer network," in IEEE International Conference on Data Engineering (ICDE), 2003.

[22] J. Ledlie, J. M. Taylor, L. Serban, and M. Seltzer, "Self-organization in peer-to-peer systems," in ACM SIGOPS European Workshop, 2002.

[23] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, "Ceph: A scalable, high-performance distributed file system," in OSDI, 2006.

[24] A. Broder and M. Mitzenmacher, "Network applications of bloom filters: A survey," Internet Mathematics, vol. 1, no. 4, pp. 485–509, 2004.

[25] B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher, "Cuckoo filter: Practically better than bloom," in ACM International Conf. on Emerging Networking Experiments and Tech. (CoNEXT), 2014.

[26] S. Badiger, S. Baheti, and Y. Simmhan, "VIoLET: A large-scale virtual environment for internet of things," in European Conference on Parallel Processing (Euro-Par), 2018.