Big Data and the Internet of Things
arXiv [cs.CY], Mar
Mohak Shah
Research and Technology Center - North America, Robert Bosch LLC, Palo Alto, USA [email protected]
Abstract.
Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed building models on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to “improve” their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern.
Keywords:
Internet of Things, IoT, IoTS, Big Data, Industrial Analytics, Industrial Internet
In recent years, technological advances have opened up entirely new opportunities for both collecting and processing large-scale data. The capability to build algorithms that can generalize and perform inductive inference has also increased significantly. This has advanced the state of the art in traditional research fields that relied on huge quantities of data but were challenged by limited data acquisition capability or computing power. Research fields such as astronomy, physics, neuroscience, and medical genomics are some immediate examples (see, for example, Feigelson and Babu 2012, Hesla 2012). Further, largely driven by problems such as search and then those pertaining to social media, novel data and compute architectures as well as learning algorithms have also appeared in recent years. This has further propelled the prospects of building value-added offerings.

In conjunction, there have been immense developments in sensing technologies resulting in “smart” devices that consist of sensors, actuators as well as data processors. We are at the cusp of a revolution in how humankind interacts with technology, in that an ever-increasing number of devices that we use, operate or interact with (even passively) are capable of collecting these actions, and more, in the form of data. As Zaslavsky et al. (2013) note, “...concentration of computational resources enables sensing, capturing, collection and processing of real time data from billions of connected devices serving many different applications including environmental monitoring, industrial applications, business and human-centric pervasive applications.” Such sensing technology is becoming pervasive and ubiquitous, and will be able to collect data through intermittent sensing, regular data collection as well as Sense-Compute-Actuate (SCA) loops. Hence, data can be collected at the desired resolution, all the way from continuous monitoring to event or action captures. Moreover, such devices, be they appliances at home, heavy assets such as aircraft engines in the field, or wearables and mobile devices, do not function in isolation. More and more such devices or “things” are being “interconnected”, resulting in an ecosystem referred to as the Internet of Things (IoT). This interconnectivity offers opportunities for enhanced services and efficiency optimization that can supplement each other by means of derived and abstracted insights, that is, higher-level observations and inferences made from data arriving from multiple interconnected devices. Note that this interconnectivity need not be device-to-device or machine-to-machine but can also be achieved via common platforms. Moreover, it can be both (near-) real-time and passive (data collected and analyzed over time).

Gartner estimates that, by 2020, this network of interconnected devices will grow to about 26 billion units with incremental revenue generation in excess of $300 billion, primarily in services. Furthermore, global economic value-add through sales into diverse end markets would reach $1.9 trillion (Middleton et al. 2013). Consequently, the data resulting from these devices will also grow exponentially, resulting in new business opportunities as well as posing novel challenges to managing and processing it for value gain. The data of the digital universe is slated to grow tenfold by 2020. Various research and analysis firms, in addition to Gartner, have confirmed the scale of these projections. IDC further notes that data just from embedded systems, i.e. sensors and physical systems capturing data from the physical universe, will constitute 10% of the digital universe by 2020 (this currently stands at 2%) and represent a higher percentage of target-rich data (Turner et al. 2014). These technologies are also resulting in novel business models as well as new revenue sources and diversification of revenue streams, in addition to increasing visibility and operational efficiency. Businesses will increasingly focus on service aspects enabled by an increased understanding of the utilization and operation of assets, consumer interests and behaviors, along with usage patterns and contextual awareness.
Consequently, the IoT in specific contexts has also been referred to as the Internet of Things and Services (IoTS). It is also referred to as the Industrial Internet to highlight the applications in the world of heavy industrial assets. We will, however, stick to the general term IoT to look at the opportunities that cut across domains as well as services.
In this chapter, we will review some important aspects of the intersection of big data analytics and the Internet of Things. Even though we will briefly discuss connectivity, communication and data acquisition issues, these are not the main focus of the chapter. We would rather like to focus on the novel opportunities and challenges that the new world of interconnected devices offers, along with some advancements that are being made on various fronts to realize them. Importantly, we will also discuss social implications as well as some of the, possibly underappreciated, areas that need responsible consideration as we move forward with a technology with a profound impact on society.

As a consequence of potentially billions of connected devices, the landscape of both handling and learning from data will undergo massive change. Further, the speed and scale at which the edge devices will produce data will dwarf those of the current big data enablers such as social media, let alone manual data generation. This previously unseen speed and scale of data of course introduces challenges not only to the data and computing infrastructure but also to conventional learning methodologies and algorithms. As Aggarwal et al. (2013) rightly note, scalability, distributed computing and real-time analytics will be critical for enabling data-driven approaches to generate value.

We would also like to reiterate the point made by Aggarwal et al. (2013) that the concept of the Internet of Things goes beyond those of RFID technology and social sensing. While the former can be considered a key enabler of the IoT, this technology is not the sole source of data acquisition, as we noted above. Similarly, social sensing, which refers to people’s interactions via embedded sensor devices, is a subset of the IoT, since the concept is not limited to people but also extends to machines and devices.
Furthermore, we would also like to bring into discussion the resulting service-based offerings that would be generated from this network. This is not just servitization, which refers to “the strategic innovation of an organization’s capabilities and processes to shift from selling products, to selling an integrated product and service offering that delivers value in use” (Vandermerwe and Rada 1989, Lee et al. 2014). In our view, the IoT goes beyond integrated product and service offerings to enable novel business and revenue models.

A variety of views on the IoT have been proposed based on different contexts. Aggarwal et al. (2013) group these views into three broad categories: things-oriented vision (focusing on devices), internet-oriented vision (focusing on communication and interconnectivity), and semantic-oriented vision (focusing on data management and integration). We would like to discuss a functional vision of the IoT, a vision where the resulting data and insights, and not the enablement mechanisms, play a central role. From a functional perspective, we discuss the basic components of an enablement stack as well as current and some future areas where we envision witnessing the immediate impact. It should be noted that it is impossible to cover a topic such as the IoT, even in the context of big data, in its entirety in a book chapter. The aim of this chapter is to familiarize the reader with how big data analytics is a major part of the IoT vision and will be a, if not the, key player in deriving business and societal value.

Footnote: Note that we use the term “edge devices” loosely to encompass not just devices such as RFID tags but also other sensors, especially MEMS, including embedded sensors, monitoring and diagnostic sensors aboard industrial assets, and so on.
Finally, big data does not refer only to the volume aspect of data but also to its variety and velocity, the three important V’s used to describe big data, all of which pose novel challenges.

The rest of the chapter is organized as follows: We discuss major components of a big data analytics stack in the context of the IoT in Section 2. Section 3 then details various domains that stand to benefit from big data and the IoT, followed by recommendations on what steps organizations need to take in order to harness this value in Section 4. We then focus on social implications as well as areas of concern in Section 5 and present some concluding remarks in Section 6.

We highlight in this section the major areas relevant to enabling analytics to leverage the value from the IoT as well as to allow a general model to scale. These are also crucial to the broader ecosystem that would allow devices to house analytical capabilities themselves. Any IoT application, whether it manifests at the user level or the cloud level, would need an end-to-end analytics stack to support its functionalities. The offerings from IoT applications will be contingent on how each of these building blocks is realized. Of course, the level at which each of these components will play a role in any specific application is subjective, but they are a necessary condition nonetheless.
Most of the data acquisition in the context of the IoT happens through edge devices. Edge devices are referred to as such since they typically reside at the edges of the network. That is, they are present at the point at which either a human or an asset interacts with the rest of the network, and it is through these devices that the initial data will be acquired and possibly re-transmitted back to the network. Examples include health sensors on patients, activity monitors such as Jawbone, control and advanced monitoring sensors on industrial assets, weather sensors, movement sensors in a home, visual, sonar and laser cameras aboard an autonomous vehicle, diagnostic sensors on appliances, sensors embedded in mobile devices, etc. Radio Frequency Identification (RFID) tags were one of the first mechanisms for acquiring such data, but there have been other devices, including sensors such as microelectromechanical systems (MEMS), mobiles and wearables, that have vastly expanded the possibilities of large-scale, high-resolution data acquisition.

Footnote: We do briefly cover some of these categories since they are indeed critical, and the effectiveness of IoT applications and capabilities is highly contingent on effective solutions in these areas.

Efforts have been underway to establish proper channels to acquire and persist data collected from the edge devices. While currently there is no agreed-upon protocol for such acquisition, domain-specific mechanisms are appearing. There is certainly a need for accepted protocols for communication of these devices both with each other and with a central capability such as the cloud to enable aggregate analytics. Technologies involving wired or wireless communication of homogeneous devices as well as capture and transmission of sensor data for storage and processing by applications are referred to, broadly, as Machine-to-Machine (M2M) technologies.
Some companies are going the proprietary route, while there have also been announcements of open-source efforts (e.g., the joint venture of Bosch, ABB, LG and Cisco, announced recently, to cooperate on open standards for smart homes; see appendix). Similarly, there have been other joint efforts trying to bring more standardization to the IoT, including the Open Interconnect Consortium (OIC), AllSeen Alliance, Thread Group, Industrial Internet Consortium (IIC) and IEEE P2413 (Lawson 2014). Developing an open-source ecosystem has its advantages since the broader community can contribute to the efforts. Moreover, given that the user community is involved in the development, adoption becomes relatively easier and more wide-ranging. Since these devices collect high-dimensional and high-frequency captures of device states, this will in turn also require high-bandwidth connectivity. For instance, an aircraft engine can send data through tens to hundreds of sensors at millisecond resolution and can generate multiple gigabytes of data per flight.
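As a rough back-of-the-envelope check on the aircraft engine figure above, the following sketch computes the raw data volume for one flight. All the inputs (100 sensors, one sample per millisecond, 4 bytes per sample, a 2-hour flight) are illustrative assumptions chosen to fall inside the ranges quoted in the text, not measured values.

```python
def bytes_per_flight(num_sensors, sample_rate_hz, bytes_per_sample, flight_hours):
    """Raw (uncompressed) bytes produced by all sensors during one flight."""
    seconds = flight_hours * 3600
    return num_sensors * sample_rate_hz * bytes_per_sample * seconds


def sustained_mbit_per_s(num_sensors, sample_rate_hz, bytes_per_sample):
    """Average link rate needed to stream the data as it is produced."""
    return num_sensors * sample_rate_hz * bytes_per_sample * 8 / 1e6


# Assumed values: 100 sensors, 1 kHz sampling (millisecond resolution),
# 4-byte readings, 2-hour flight.
raw = bytes_per_flight(num_sensors=100, sample_rate_hz=1000,
                       bytes_per_sample=4, flight_hours=2)
gigabytes = raw / 1e9
link_rate = sustained_mbit_per_s(num_sensors=100, sample_rate_hz=1000,
                                 bytes_per_sample=4)
print(f"{gigabytes:.2f} GB per flight, {link_rate:.1f} Mbit/s sustained")
```

Under these assumptions a single engine already produces a few gigabytes per flight, which is consistent with the order of magnitude stated above and illustrates why bandwidth and persistence strategies matter.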
One of the most distinguishing aspects of the IoT is the fact that the data is acquired from a variety of sources. In order to provide useful services, the data from edge devices typically needs to be combined with external data sources including business data, utilization data of assets, geographical data, weather data, etc. Consequently, the data quality and management issues also grow exponentially. Combining and analyzing heterogeneous data is a major challenge. Efforts have been made to standardize data characterization so that a communication protocol can be developed for data exchange. The Open Geospatial Consortium, for instance, has developed various such protocols under the Sensor Web Enablement initiative, allowing for interoperability for sensor resource usage. Some of the standard interfaces proposed as a part of the initiative include O&M (Observations and Measurements, to encode the real-time measurements from sensors), SML (Sensor Model Language, to describe sensor systems and processes), TML (Transducer Model Language, to describe transducers and support real-time streaming of data), SOS (Sensor Observation Service, a standard web service interface for requesting, filtering and retrieving sensor system observations), SAS (Sensor Alert Service, for publishing and subscribing to alerts from sensors), SPS (Sensor Planning Service, for requesting user-driven acquisitions and observations), and WNS (Web Notification Service, for delivery of messages or alerts from SAS to SPS). See (Aggarwal et al. 2013) for more details. Further, in order to integrate and annotate the sensor data, the World Wide Web Consortium (W3C) has initiated the Semantic Sensor Network Incubator Group (SSN-XG) with a mandate to develop semantic sensor network ontologies. These efforts have constituted a big part of the semantic web effort, further defining ontological frameworks such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) that enable defining ontologies such as SSN (Semantic Sensor Network) and SWEET (Semantic Web for Earth and Environmental Terminology) to express identifiers and relationships in various contexts.

Further, given that the data is acquired in real time and in field settings, there is a myriad of issues around missing values, skewness and noise. The high-resolution temporal nature of such data further makes it difficult to align multiple sources as well as to devise strategies to learn from them in conjunction with static data sources. In the context of assets, the data is also accompanied by derived attributes, i.e., ones whose values are calculated from the raw data using a conversion mechanism. However, the protocols for obtaining the derived quantities are not uniform or standardized even within a given domain, let alone across domains. Data integration becomes more difficult since it requires the reconciliation of such derived quantities.
In addition to formulaic data transformations, there can also be hurdles in data management arising from issues such as privacy and security, resulting in de-identified and/or encrypted data.

From a storage perspective, classical relational databases are no longer enough since the data not only comes from disparate sources but also appears, or needs to be organized, in native forms such as documents, graphs, time series, etc. The whole paradigm around data organization that addresses the set of requirements around big data is broadly referred to as NoSQL (standing for Not Only SQL) databases. This includes columnar data stores such as BigTable, Cassandra, Hypertable and HBase (inspired by BigTable); key-value and document databases such as MongoDB, Couchbase Server, Dynamo and Cassandra (which also supports documents); stream data stores such as Eventstore; graph-based data stores such as Neo4j; and so on. Each of these has associated technologies for efficiently querying and processing data from the respective stores and has unique advantages and capabilities. For instance, services such as Flume and Sqoop allow for ingestion and transfer of big data, while languages such as Hive, Pig, JAQL and SPARQL enable efficient querying of big data in various forms, including ontological frameworks such as RDF. From an integration perspective, the classical approaches of business-to-business (B2B) data integration do not apply either, since such data cannot generally be organized using a master database schema. A combination of these storage strategies is typically employed depending on the types of data, and customized views can be created depending on the application requirements.

A data persistence strategy is also needed since many times storing such high-resolution data in massive quantities is neither viable nor needed.
Strategies involving data summarization and sampling, along with storing accompanying metadata, can be quite effective, especially when the data has a very high level of redundancy. Recall how sparse formats allow data files with few non-zero values to be stored and processed much more efficiently in the case of very high-dimensional data. These data structures, for instance, are a regular offering in various analytics toolsets and libraries such as pandas.
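The sparse-format idea mentioned above can be illustrated with a minimal, dependency-free sketch. Production libraries such as pandas ship their own optimized sparse structures. The dictionary-based representation and the helper names `to_sparse` and `sparse_dot` below are hypothetical and serve only to show why storing just the non-zero entries saves space and work.

```python
from typing import Dict, List


def to_sparse(dense: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product that touches only indices present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)


# A hypothetical 10,000-dimensional sensor snapshot with three active channels.
dense = [0.0] * 10_000
dense[7], dense[420], dense[9_999] = 1.5, -2.0, 4.0

sparse = to_sparse(dense)
squared_norm = sparse_dot(sparse, sparse)
print(len(dense), "dense values stored as", len(sparse), "sparse entries")
```

Here the dot product visits three entries instead of ten thousand, and the cost scales with the number of non-zero values rather than with the nominal dimensionality.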
The massive amounts of acquired data necessitate powerful infrastructure to support not just storing and querying, but also extracting insights from such data. The various categories of learning that need to be performed on such data impose unique sets of requirements. For instance, one of the most common requirements is that of being able to perform batch analytics over historical data to build aggregate models. However, this can be a complicated endeavor given that the data does not necessarily reside on the same network, let alone the same machine. Hence, parallel learning algorithms as well as distributed learning capabilities are needed, depending upon the size, location and other characteristics of the data in addition to the communication constraints. Frameworks such as Hadoop have shown significant promise when it comes to distributed data analysis, including efficient search, indexing as well as learning (Aggarwal et al. 2013). Hadoop is a distributed storage and processing framework for large-scale data relying on the Hadoop Distributed File System (HDFS), with an aim to “take compute to the data”. This is in contrast to the classical parallel high-performance computing (HPC) architectures that relied on a parallel file system, where the computation would require a high-speed communication mechanism to the data. Over the past few years, Hadoop has developed into an ecosystem (see appendix) with various applications and services supporting functions on the core architecture, allowing for efficient data storage and organization, search and retrieval (including querying), processing, as well as services such as resource scheduling and maintenance. Various data stores as well as querying languages mentioned above form a part of this ecosystem.

One of the major limitations in distributed settings such as Hadoop has been that of performing analytics with low-latency requirements, including model deployment and real-time, iterative, or interactive analytics.
In such cases, especially when multiple passes over the data are required (e.g., in many machine learning algorithms), the Hadoop framework can be quite costly in terms of communication with the underlying HDFS. Frameworks such as Spark were developed to address these issues on Hadoop and have since grown into ecosystems of their own. These frameworks, especially Spark, have shown significant promise and are being investigated for their suitability in IoT scenarios. Spark enables in-memory primitives for cluster computing, as opposed to Hadoop’s MapReduce, which is a two-stage disk-based paradigm, and hence allows for faster performance on applications with the low-latency requirements mentioned above. While both Hadoop and Spark offer streaming APIs, Storm is a computational framework designed with streaming analytics as its objective. Since each of these paradigms has its strengths and limitations, choosing the right storage as well as computational paradigm involves an in-depth analysis of the requirements of the use case in which they would be employed. However, due to their open-source nature, there has been significant effort in promoting interoperability of these frameworks. For instance, both the Spark and Storm frameworks can operate on Hadoop clusters and hence provide for easy integrability. Hadoop commercial providers such as Cloudera and Hortonworks have also announced support for Spark and Storm respectively. Companies such as Databricks are already providing a commercial version of the Spark framework. Finally, to effectively deploy and scale analytics models, standardization and benchmarking mechanisms for analytics are available that can allow for efficient communication of these models. The Predictive Model Markup Language (PMML) provides one such mechanism. There has been successful commercialization of such standards by vendors such as Zementis, which provides not just an encoding mechanism but a full deployment capability. This includes an execution and scoring engine, namely Adaptive Decision and Predictive Analytics (ADAPA), that can run PMML-specified models, allowing for modular and efficient model deployment. Capabilities such as Velox also target machine learning model management and serving at scale (Crankshaw et al. 2014).
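To make the two-stage MapReduce paradigm discussed above concrete, here is a minimal single-process sketch that computes a per-asset mean. It does not use the Hadoop or Spark APIs. The partitions, the asset names and the helper functions `map_partition` and `reduce_by_key` are hypothetical stand-ins, and in a real cluster each partition would be a data block processed on the node that stores it.

```python
from collections import defaultdict

# Hypothetical partitions, standing in for data blocks stored on different nodes.
partitions = [
    [("engine_a", 510.0), ("engine_b", 300.0), ("engine_a", 490.0)],
    [("engine_b", 310.0), ("engine_a", 500.0)],
]


def map_partition(records):
    """Map stage: runs where the block lives and emits (key, (sum, count))."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in records:
        partial[key][0] += value
        partial[key][1] += 1
    return [(key, (s, n)) for key, (s, n) in partial.items()]


def reduce_by_key(mapped):
    """Reduce stage: merges the partial sums and counts for each key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (s, n) in mapped:
        totals[key][0] += s
        totals[key][1] += n
    return {key: s / n for key, (s, n) in totals.items()}


# Only the small partial aggregates cross the "network", not the raw records.
mapped = [pair for part in partitions for pair in map_partition(part)]
means = reduce_by_key(mapped)
print(means)
```

In a disk-based MapReduce job the intermediate `mapped` pairs would be written out between the two stages, so an iterative algorithm repeats that cost on every pass. Keeping such intermediate data in memory across passes is the motivation for the in-memory primitives described above.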
The natural step following the handling, management and integration of IoT data is the actual insight discovery step, which is the ultimate goal of the network. Even though each step from data acquisition onwards poses a variety of challenges for the IoT, the ultimate value from these steps is realized only when useful and generalizable insights can be derived from this data. The current use of edge devices (at least in the consumer domain) seems to be predominantly point-use, that is, operationalization at the single-user level. However, as more and more devices get interconnected, this will inevitably change. In fact, there are various use cases where this is already visible, as we will discuss in the next section. Learning from IoT data is particularly interesting and challenging at the same time. Classical machine learning methods need to be extended and adapted to cope with the challenges of scale, diversity and the distributed nature of the data. The volume, acquisition speed and temporal nature of the sensor and other related data are already highlighting the limitations of traditional approaches to learning. Some of the major challenges include learning in distributed settings, learning from very high-dimensional, high-resolution temporal data and learning from heterogeneous and complex data. Novel frameworks such as the alternating direction method of multipliers (ADMM) (Boyd et al. 2011) have appeared to enable optimization, a core functionality of many learning algorithms, in such distributed settings. Furthermore, advances have also enabled large-scale versions of successful machine learning algorithms such as topic modeling via Latent Dirichlet Allocation (LDA) (Wang et al. 2009, Zhai et al. 2012), convolutional neural nets, Restricted Boltzmann Machines (RBMs) (Salakhutdinov 2009, Dean et al. 2012), Support Vector Machines, regression and so on (see, e.g., Mackey et al. 2014, Pan et al. 2014, Gonzalez et al. 2014). Online versions of various classical learning algorithms have also appeared, allowing for faster execution times on large datasets.

Consequently, this has also necessitated extending the evaluation approaches for learning algorithms (Japkowicz and Shah 2011) to large-scale settings. Some promising approaches for resampling in large-scale settings, such as the bag-of-little-bootstraps (BLB) (Kleiner et al. 2013), have appeared that also provide a theoretical framework characterizing them. In addition, there have been advancements in methods aimed at analyzing streaming data at scale for event prediction, change point detection, time-series forecasting and so on, owing to the use cases that require online learning or where the models need to be adapted to evolving realities (see, for instance, Lin et al. 2003). Feature discovery is also one of the issues that has resurfaced, since it is no longer feasible for learning features to be designed or discovered in the conventional manner. Novel approaches are enabling automated feature discovery and learning in cases where generalized models can be built from extremely large distributed datasets. One of the most prominent developments has been in learning sophisticated networks and autoencoders via deep learning methods (Dean et al. 2012). Deep learning has shown significant promise in domains such as image classification, speech recognition and text mining (Krizhevsky et al. 2012, Socher et al. 2011, Le et al. 2012, Bengio et al. 2003).

In addition to these, there have also been efforts to scale up the deployment of large-scale classifiers in hardware and embedded systems. Specific chip designs inspired by both the machine learning and cognitive computing fields have appeared to this end. Some prominent examples include IBM’s SyNAPSE, NVIDIA’s Tegra X1 and Qualcomm’s Zeroth (see appendix for links).
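Returning to the distributed optimization point made earlier in this section, the following sketch shows the consensus form of ADMM on a deliberately simple problem: several nodes, each holding its own readings, agree on a single scalar that minimizes the total squared error, which is the global mean. The toy objective, the function name `consensus_admm`, the penalty `rho` and the sample data are illustrative assumptions. The update structure (local step, averaging step, dual step) follows the standard consensus formulation.

```python
def consensus_admm(local_data, rho=1.0, iterations=200):
    """Find the scalar z minimizing sum_i 0.5 * sum_j (z - a_ij)^2 across nodes."""
    n = len(local_data)
    z = 0.0
    x = [0.0] * n  # local estimates, one per node
    u = [0.0] * n  # scaled dual variables, one per node
    for _ in range(iterations):
        # Local step: each node solves its own small problem in closed form,
        # using only its own data plus the current consensus value.
        x = [
            (sum(data) + rho * (z - u_i)) / (len(data) + rho)
            for data, u_i in zip(local_data, u)
        ]
        # Global step: only x_i + u_i is communicated and averaged.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: each node tracks its disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


# Two hypothetical nodes holding unequal amounts of data.
nodes = [[1.0, 2.0, 3.0], [10.0]]
estimate = consensus_admm(nodes)
global_mean = sum(sum(d) for d in nodes) / sum(len(d) for d in nodes)
print(estimate, global_mean)
```

The raw readings never leave their node. Only one number per node is exchanged in each round, which is what makes this style of optimization attractive when data is spread across many devices.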
At the organization or application level, it is clear that an end-to-end IoT stack is needed. The components of this stack will include data acquisition right from the M2M layer, data processing, data sharing through the interconnected network, insight discovery and the capability to relay results both to devices (for potential actions) and to (automatic or manual) decision makers. Various teams and companies are identifying the nature and structure of such an IoT stack that can provide infrastructure, platform and services for both front-end application and solution development, as well as back-end computing and support (e.g., via the cloud). Commercial vendors such as EMC, Microsoft, Amazon and IBM offer building blocks of this stack that can be instantiated by organizations or service providers based on their specific requirements. Increased modularity and interoperability will further speed up the adoption and scaling of these capabilities. For example, being able to choose the desired infrastructure, platform and software selectively from a combination of vendors can address specific needs of IoT applications. Consequently, Infrastructure-, Platform-, and Software-as-a-Service (IaaS, PaaS and SaaS respectively) are becoming increasingly desirable (see, for example, offerings from Cloud Foundry and Microsoft Azure; link in appendix). The Lambda architecture (Marz and Warren 2015) has shown promise as a basis that allows for batch and real-time analytics together. This also makes it possible to account for the volume, velocity and variety of big data. The Lambda architecture already underlies many Hadoop and Spark instantiations. Architectures for specific cases, such as embedded systems and sensor networks, are also being proposed (see, for instance, Gubbi et al. 2013, Yashiro et al. 2013, Tracey and Sreenan 2013, Sowe et al. 2014).
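The way the Lambda architecture combines batch and real-time analytics can be sketched with a toy event counter. This is an in-memory illustration only. The class `LambdaCounter`, its method names and the device identifiers are hypothetical, and a real deployment would place the batch layer on a framework such as Hadoop and the speed layer on a streaming system.

```python
from collections import Counter


class LambdaCounter:
    """Toy event counter with a batch view, a speed view, and a merged query."""

    def __init__(self):
        self.master_log = []         # append-only record of all events
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # updated incrementally as events arrive

    def ingest(self, device_id):
        """Every event goes to the master log and to the speed layer."""
        self.master_log.append(device_id)
        self.speed_view[device_id] += 1

    def run_batch(self):
        """Batch layer: recompute the view over the whole log, then reset the
        speed view, since every event so far is now covered by the batch view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, device_id):
        """Serving layer: merge the possibly stale batch view with recent events."""
        return self.batch_view[device_id] + self.speed_view[device_id]


counter = LambdaCounter()
for device in ["thermostat", "thermostat", "door"]:
    counter.ingest(device)
counter.run_batch()            # periodic, high-latency recomputation
counter.ingest("thermostat")   # arrives after the batch run
print(counter.query("thermostat"))
```

Queries stay accurate between batch runs because the speed view covers exactly the events the batch view has not yet seen, while the periodic recomputation from the master log keeps incremental errors from accumulating.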
The applications within the IoT domains depend highly on the respective business drivers, leading to multiple manifestations of business cases through such a network. Various works have attempted to paint a picture of the application landscape for the IoT. For instance, Chui et al. (2010) group the applications into two broad categories: i) Information and Analysis, consisting of tracking behavior, enhanced situational awareness, and sensor-driven decision analytics; and ii) Automation and Control, consisting of process optimization, optimized resource consumption, and complex autonomous systems. Another categorization comes from Markkanen and Shey (2014), who group these into five categories, viz. predictive maintenance, product and service development, usage behavior tracking, operational analysis and contextual awareness. The Cognizant Report (2014) also discusses some of the opportunities in the IoT.

The IoT potentially goes beyond the possibilities mentioned in the above reports in that it will also enable sophisticated service capabilities, as mentioned earlier. In this respect, another categorization that divides these opportunities into consumer-facing and business-facing opportunities is quite illustrative (Leuth 2014). To provide a flavor of some specific applications, let us look at some illustrative use cases from different IoT-related domains. Note that we are reviewing these domains from a big data and analytics perspective. There are many more applications as a consequence of advancements in sensing technologies and the hyperconnectivity achieved in the IoT. As the readers will notice, our categorization overlaps with various above-mentioned efforts. However, looking at the applications and opportunities from a domain perspective can provide a more coherent picture.
Lee et al. (2013) describes manufacturing as a 5M system consisting of Ma-terials (properties and functions), Machines (precision and capabilities), Meth-ods (efficiency and productivity), Measurements (sensing and improvement) and ig Data and the IoT 11
Modeling (prediction, optimization and prevention). In this context, additivemanufacturing can be considered as a process for creating products using anintegrated 5M approach. Recent advances have significantly improved sensingcapabilities and data gathering around various aspects of this 5M system. How-ever, in traditional-, as well as in many cases advanced-, manufacturing setupsuch information gathering had a preventive or control purpose and hence didn’tnecessarily serve an analytics-oriented insight discovery objective. Even in tra-ditional sense, it can be argued that big data has been utilized for quite sometime especially in the context of modeling. However, this usage typically cor-responds to modeling based on data under simulated or nominal conditions inwhich the product or manufactured industrial asset is run under a controlled en-vironment. For a manufacturing process, big data can enable functions such ascorrelating controller and inspection data. When this is combined with the tra-ditional overall equipment efficiency (OEE) providing the production efficiencystatus, insights into the relationship between performance and the cost involvedin a sustained OEE level can be obtained. This is particularly timely as thereis a significant initiative, referred to as
Industry 4.0, to increase digitization of manufacturing with a goal to build an intelligent factory, with cyber-physical systems and the IoT as the basis.
The opportunity landscape in the manufacturing domain is vast and extends beyond OEE and performance optimization. Big data analytics can help (and has started to do so) in areas such as cycle time reduction, scrap reduction, product defect detection (e.g., to improve quality ratios, or to detect products that may lead to quality issues later), identifying and resolving issues with machine failure, and optimizing material and design choices (see (Kurtz et al. 2013, Bosch MongoDB white-paper 2014) for some examples in various manufacturing domains). As smart factories move beyond solely control-centric optimization and intelligence, big data can enable further optimizations by taking into account interactions of surrounding systems as well as other impact factors. For instance, production cycle quality assurance can benefit not only from the quality data of the current cycle, but also from quality data from previous steps (e.g., quality and monitoring data from parts suppliers) or feedback (e.g., quality reports and issue notifications from consumers). It should be noted that while such benefits result in immediate value, they also have significant indirect advantages. For instance, while a slight increase in quality as a result of improved defect detection may seem a marginal improvement for advanced manufacturing facilities, it can translate into new business opportunities for companies and can sometimes be a major deciding factor for clients. Similarly, identifying defective or potentially defective parts right at the manufacturing or quality testing stages can mean a reduction in quality claims at later stages.
Such benefits have big multiplication factors in terms of the business value associated with them, not to mention intangible benefits such as credibility and brand building for manufacturers. In addition to the above opportunities directly related to the manufacturing process, big data and predictive analytics can have a significant impact on making cyber-physical systems much more effective and efficient. Predictive analytics are also poised to address important issues in areas such as capacity planning under uncertainty in downstream capacities, and inventory and supply-chain management, by reducing uncertainties around material and part availability and by reacting to (or anticipating) changes in market and customer demand. Importantly, this can also help understand and address product design and performance issues, and can help to close the loop with respect to the manufacturing process at the material and design stages. This final aspect in fact has immense significance and leads us to the next area of operation and maintenance of (heavy industrial) assets.
Footnote: Cyber-physical systems consist of computational and physical components that are able to perceive real-time changes as a result of seamless integration (NIST Report 2013).
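The OEE figure referred to in the manufacturing discussion above is conventionally computed as the product of an availability, a performance and a quality ratio. The following is a minimal Python sketch of that calculation, assuming per-shift counters are available; the record fields, the function name and the numbers are hypothetical and are not taken from the works cited here.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Hypothetical per-shift counters collected from a production line."""
    planned_minutes: float      # scheduled production time
    downtime_minutes: float     # unplanned stops within the scheduled time
    ideal_cycle_minutes: float  # ideal time to produce one unit
    total_units: int            # units produced, good or bad
    good_units: int             # units that passed inspection


def oee(record: ShiftRecord) -> float:
    """Return OEE = availability * performance * quality for one shift."""
    run_time = record.planned_minutes - record.downtime_minutes
    if record.planned_minutes <= 0 or run_time <= 0 or record.total_units <= 0:
        raise ValueError("shift must have positive planned time, run time and output")
    availability = run_time / record.planned_minutes
    performance = (record.ideal_cycle_minutes * record.total_units) / run_time
    quality = record.good_units / record.total_units
    return availability * performance * quality


if __name__ == "__main__":
    shift = ShiftRecord(planned_minutes=480, downtime_minutes=60,
                        ideal_cycle_minutes=1.0, total_units=378, good_units=360)
    print(f"OEE: {oee(shift):.2f}")  # prints "OEE: 0.75"
```

Keeping the three ratios separate, rather than collapsing them into a single number, is what allows an OEE level to be related to its individual cost drivers (downtime, slow cycles, scrap).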
Over the past few years, most organizations have undergone, or are in the process of undergoing, a major change in their business models referred to as servitization. This basically emphasizes a customer focus in product and service delivery, and is already a major factor in consumer-oriented companies (e.g., home appliances and electronics). However, it has taken on even higher importance in asset-heavy organizations such as manufacturers and operators of heavy assets like aircraft engines, locomotives, turbines, and mining and construction equipment. The revenue models of these organizations have undergone drastic changes over the past few decades. While manufacturers earlier drew the majority of their revenues from the sale of these assets, they now do so by selling service agreements and performance guarantees over the lifetime or usage of these assets. Moreover, such agreements rely heavily on asset utilization, and hence it is imperative for these organizations to have very high visibility into asset operation so that they can not only quickly address but also effectively anticipate any major impending failure (at least at the asset level but ideally at the component level) that could jeopardize the operational efficiency of the operators. Achieving optimal efficiency and availability of assets is critical to both these manufacturers and their clients. Moreover, this also enables effective planning of maintenance actions, fast and effective root-cause analysis, as well as detecting and anticipating warranty issues as early as possible (this is analogous to the requirement at the manufacturing plants discussed above).
For instance, in the context of aviation, capabilities for predicting anomalies as well as prognostics can potentially be integrated into the flight controls. The IoT can further help in correlating these anomalies with additional information such as weather, particulate matter and altitude, as well as in putting this in context with fleet-level statistics (Brasco et al. 2013).
Big data acquisition capabilities are further enabling high-resolution monitoring of aircraft engines. While field engineers were traditionally able to monitor only snapshot data from engines, full flight data can now be reliably analyzed. Further, large-scale analysis on multiple sensors can be performed to facilitate tasks such as event detection, signature discovery and root cause analysis. Other industrial domains stand to benefit from these approaches as well. Data is captured at different stages during the life and operation of assets. This data is organized in disparate forms and is typically disconnected across the stages and actions taken in maintaining and operating an asset. Some of the major data sources include condition monitoring data (present and historical), controller parameters, digitized machine performance data, machine and component configuration, model information, utilization and operational data, as well as maintenance activities. Leveraging such data would allow building not only aggregate models for the fleet but also more customized models for unique assets (or sets of assets). Further, such information can be combined at the fleet level in order to understand aspects such as asset-deterioration patterns and behavior under varying operating conditions. Also, the underlying physical models, which otherwise explain the behavior of assets under nominal conditions and the effect of operations and utilization patterns on engine life, can be enhanced. In turn, this can significantly impact the maintenance of engines, taking us closer to condition-based maintenance.
Furthermore, data-driven insights would allow for enhanced capabilities to manage and contain unanticipated field events (e.g., asset failures or malfunctions) by enabling their localization, subsequent root cause analyses and identification of the most efficient resolution mechanisms as well as future design changes.
Remote maintenance is an opportunity that is already being realized in some cases. Lee et al. (2014), for instance, discuss a case study on the remote maintenance of Komatsu smart bulldozers used in mining and construction. Such a capability would have a significant impact on asset reliability, maintenance scheduling as well as the reduction of unplanned downtime (Lee et al. 2014, Zaki and Neely 2014), with the ultimate goal of having self-aware and self-maintenance machines. This would in turn benefit fleet operations, scheduling as well as optimizations.
Fleet-level analytics would also enable more efficient fleet management as well as a better user experience, and hence is not limited to the heavy asset industry; connected cars and networked electromobility are examples. For instance, the BMW Group, Bosch, Daimler, EnBW, RWE and Siemens have come together to take on the “Hubject” initiative (see appendix) with the aim of optimizing electromobility through convenient access to a charging infrastructure. Connecting electromobility service providers, charging station operators, energy suppliers, fleet managers, and manufacturers, as well as utilizing analytics to provide value-added services (e.g., identifying the closest charging station and suggesting charging routines), is a demonstration of an end-to-end IoT-enabled capability for consumers. A Bosch MongoDB white-paper (2014) outlines another example use case in the context of mobility and automation using telematics data. Similarly, as cars evolve towards a connected world (they typically have the computing power of 20 PCs, processing about 25 GB of data an hour (McKinsey Study 2014)), they are moving beyond optimizing internal functions.
The connected car initiative aims at developing the car’s ability to connect to external networks and not just enhance the in-car experience but also self-optimize its operation and maintenance.
Other advantages of connected vehicles will obviously be for fleet management and for companies that rely on such vehicle fleets for their operations or even their entire business models. Examples include postal and courier services, delivery industries, servicing companies (e.g., consumer appliance services) and so on, which would look to connected vehicles to optimize different value drivers like gas consumption, route optimization, service time reduction, resource allocation and fleet efficiency.
In the above subsection the focus of our discussion was mainly mobile assets. However, big data analytics capabilities are also impacting the operation and maintenance of stationary assets such as energy turbines and plants (production, generation and so on). Various monitoring, positional and control sensors in plants are enabling more effective plant maintenance and the identification of sub-optimalities, as well as improved safety and security. Predictive maintenance of machinery and plants can allow for higher availability, better guarantees on quality, and reduced process variability (IBM White Paper 2011, Kurtz et al. 2013). By intelligently instrumenting plants with advanced sensors as well as integrating information from existing control and monitoring sensors, we can develop a better understanding of a plant’s operational status, anticipate and predict failures, and identify sensor correlations to understand (and in some cases ascertain) sensor interdependence as well as adjust working set-points for the equipment. This can enable improved stability in plant operations as well as a reduction in the high alarm rates that can hamper operations. Garcia et al. (2011) propose such a monitoring model for the case of an oil processing plant. In addition to improvements in plant utilization (efficiency) and availability (less downtime), benefits will appear in terms of a reduction in operating costs as well as the ability to make real-time decisions.
Similarly, commercial facilities maintenance (e.g., for large buildings, campuses or company facilities) can, in a broader context, also be seen as a part of this topic. However, most efforts in those areas are currently focused on energy optimization and hence we will cover those a little later.
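As a rough illustration of the sensor-correlation step mentioned above for plant monitoring, the following Python sketch computes pairwise Pearson correlations between sensor channels and reports strongly related pairs. It is only a sketch under stated assumptions: the channel names, the readings and the 0.9 threshold are hypothetical, and it does not reproduce the monitoring model of Garcia et al. (2011).

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant channel has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(readings: Dict[str, List[float]],
                     min_abs_r: float = 0.9) -> List[Tuple[str, str, float]]:
    """List sensor pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(readings), 2):
        r = pearson(readings[a], readings[b])
        if abs(r) >= min_abs_r:
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical readings sampled at the same five time steps.
    plant = {
        "inlet_pressure": [1.0, 2.0, 3.0, 4.0, 5.0],
        "pump_vibration": [2.0, 4.0, 6.0, 8.0, 10.0],
        "ambient_temp": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    for a, b, r in correlated_pairs(plant):
        print(f"{a} ~ {b}: r = {r:.2f}")
```

Pairs flagged this way are candidates for interdependence only; a high correlation does not by itself establish which sensor drives the other, so domain review would still be needed before adjusting set-points or suppressing alarms.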
Resource exploration industries such as oil and gas, mining, water and timber constantly face challenges in finding renewable reserves of natural resources and balancing them against volatility in demand and price. The goal for these industries is often to achieve a delicate balance between increasing production and optimizing costs while at the same time reducing the impact of environmental risks (e.g., reduction in carbon footprint). Among these industries, the oil and gas industry, and particularly its upstream sector, is a complex business that relies heavily on data. Even before the advent of “big data”, these industries made use of data from various sources of information, whether in the context of studying soil composition during mining or monitoring the health of deep-sea or sub-surface assets through traditional prognostics and health management methods. However, these industries face new challenges as the data grows exponentially in volume, resolution (or speed of capture) and variety. For instance, seismic data such as wide azimuth offshore data results in very high volumes. In addition to the seismic data, structured data comes from sources such as well-heads, drilling equipment and multiple types of sensors, such as flow, vibration and pressure sensors, used to monitor assets. This needs to be combined not only with drilling and production data but also with data from unstructured sources such as maps, acoustic data, image and video data and well logs. Further, these data are used, studied, analyzed and processed by various business segments that in turn generate a variety of derived data such as reports, interpretations and projections. The industry stands to gain deep and meaningful insights if these data can be efficiently and effectively managed, integrated and reconciled. As in many other areas, the foremost challenges are those of data management, preprocessing and, more importantly, ascertaining data quality.
Working successfully with such data sources can mean a significant increase in production, possibly at lower risk to the environment and safety, a reduction in costs, as well as improved speed to first resources.
While many companies in the oil and gas sector, such as Chevron and Shell, have started looking into leveraging big data analytics, much more effort is needed to benefit from these opportunities in various areas. Baaziz and Quoniam (2013) identify areas that stand to benefit specifically in the upstream oil and gas industry (see (Feblowitz 2012, Nicholson 2012, Hems et al. 2013, Seshadri 2013) for further details):
1. Exploration: Enhancing exploration efforts (e.g., by helping experts verify field analysis assumptions where new surveys are restricted by regulations); improved operational efficiency by combining enterprise data with real-time production data; efficient and cost-effective assessment of new prospects by more efficiently utilizing geospatial data; early identification of potentially productive seismic trace signatures; and building new scientific models via insight discovery from multiple data sources (e.g., mud logging, seismic, testing and gamma ray).
2. Drilling and Completion: Building more robust drilling models from current and historical well data and subsequent integration into the drilling process; adaptive models to incorporate new data; improved drill accuracy and safety by analyzing continuous incoming data for anomalies and event prediction; reduction in Non-Productive Time (NPT), one of the major concerns in the industry, through early identification of factors that negatively impact operations, while increasing foot-per-day penetration; models for optimal cost estimation; predictive maintenance for increased asset availability, reduction in downtime as well as managed maintenance planning.
3. Production: Mapping reservoir changes over time for adaptation of lifting methods for enhanced oil recovery (e.g., to guide fracking in shale gas plays); more accurate production forecasts across the wells for quicker remediation of ageing wells; real-time production optimization by allowing the producer to optimize resource allocation and prices; increased safety through earlier anticipation and prediction of problems such as slugging and WAG gas breakthroughs.
4. Equipment Maintenance: We covered this more broadly in the previous subsection. In the current context, this refers to preventing downtime, optimizing field scheduling as well as maintenance planning on the shop floor.
5. Reservoir Engineering: More accurate engineering studies and a better understanding of subsurfaces by more efficiently analyzing data and subsurface models.
IoT-related opportunities also exist in the midstream and downstream industries. For instance, Seshadri (2013) further identifies opportunities in environmental monitoring (e.g., analyzing real-time sensor data for regulatory as well as company control compliance; maintenance prediction based on pollution levels), reducing set-up times at refineries through quicker crude assay analysis for oil quality prediction, and predictive and condition-based maintenance on assets in the transportation and refinement facilities (both in mid- and downstream). Similarly, opportunities also exist in retail optimization (e.g., gas station automation).
While the above opportunity areas are detailed in the context of the oil and gas industries, these areas, opportunities and challenges are quite similar in other resource exploration industries too (see, for instance, Mind Commerce LLC Report (2014) for a broader discussion).
Footnote: The oil and gas industry can be viewed in three different segments: Upstream (concerned with exploration, drilling/development and production), Midstream (concerned with trading, transportation and refining) and Downstream (concerned with bulk distribution and retail). The refining step has components in both the midstream and downstream sectors.
Energy is another sector that can be transformed as a result of big data analytics. While utilities have been identified as one of the biggest stakeholders, various other industries, whether they are direct energy producers, services companies performing energy management for campuses and facilities, or sectors relying on and impacted by energy consumption, can foresee significant potential in increasing production, improving energy demand prediction, reducing uncertainties in energy supply, and better resource management through efficiency gains and energy waste reduction. These can yield benefits not just from a financial and efficiency perspective but can also be instrumental from an environmental perspective. For instance, a study on energy efficiency from McKinsey and Co. concluded that a holistic program could result in energy savings in excess of $1.2 trillion and a reduction in end-use consumption of 9.1 quadrillion BTUs, while eliminating up to 1.1 gigatons of greenhouse gas every year by 2020 (MIT Business Report 2015).
The opportunity areas in the energy domain that stand to benefit from big data analytics include, but are certainly not limited to:
1. Asset and workforce management at utilities: Increasing availability (e.g., in weather conditions like storms) through reduced downtime and maintenance optimization, as well as identifying potentially hazardous situations, outage management, wind-farm management by turbine optimization at both the asset and the aggregate level, reducing energy theft, etc.
2. Grid operations: E.g., load forecasting and load balancing, primarily for peak-shaving, which is an immediate priority; outage management and voltage optimization; optimizing network and energy trading and incorporation of distributed smart grid components into the storage system; proactive management of the distribution network; combining energy facilities into virtual power plants; incorporating renewable energy sources into the grid; increasing grid flexibility and scalability; optimization of energy production and supply as well as efficiency gains for utilities.
3. Transportation: E.g., reducing energy consumption through dynamic pricing for road use and parking, frequently updated traffic information and route optimization.
4. Infrastructure: Smart cities (smart parking, traffic monitoring and control, structural health systems); infrastructure management, smart grids, etc. Please refer to (Zanella et al. 2014, Bettencourt 2013, Byrnes 2014) for a broader discussion on the implications and uses of big data and IoT in cities.
5. Residential and Commercial facilities: Reducing energy consumption by optimizing utilization (e.g., of HVAC and heating operations), smart meters to track consumption and subsequent load balancing, smart appliances and lighting for need-based operation, reduction in energy waste by identifying energy holes and sinks; smart systems for water, lighting, fire, power, cooling, security and notifications resulting in cost savings, preventative maintenance of critical systems and environmental benefits (Reddy 2014).
6. Operations: Efficiency gains and cost reductions in company operations, e.g., identification of energy sinks such as running unutilized resources, optimization of energy usage in data centers, etc.
Please refer to (Orts and Spigonardo 2014) for a discussion on many of these perspectives. Further, MIT Business Report (2015) covers a variety of opportunities as well as challenges as cities both grow and get “smarter” through more sensors, and advanced network and communication capabilities.
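To make the peak-shaving idea from the grid operations item above more concrete, the following Python sketch applies a simple greedy rule: a storage unit discharges whenever the load exceeds a target threshold and recharges whenever the load is below it. This is an illustrative toy under stated assumptions (hourly steps, lossless storage, a fixed threshold, made-up numbers), not a method taken from the works cited in this chapter.

```python
from typing import List, Tuple


def shave_peaks(load_kw: List[float], threshold_kw: float,
                capacity_kwh: float,
                initial_kwh: float = 0.0) -> Tuple[List[float], List[float]]:
    """Greedy peak shaving over hourly steps (1 kW for 1 h = 1 kWh).

    Returns the power drawn from the grid in each hour and the energy
    held in storage at the end of each hour. Storage losses are ignored.
    """
    stored = initial_kwh
    grid_kw: List[float] = []
    stored_kwh: List[float] = []
    for load in load_kw:
        if load > threshold_kw:
            # Cover as much of the excess as the storage can supply.
            discharge = min(load - threshold_kw, stored)
            stored -= discharge
            grid_kw.append(load - discharge)
        else:
            # Use the spare headroom below the threshold to recharge.
            charge = min(threshold_kw - load, capacity_kwh - stored)
            stored += charge
            grid_kw.append(load + charge)
        stored_kwh.append(stored)
    return grid_kw, stored_kwh


if __name__ == "__main__":
    hourly_load = [2.0, 2.0, 6.0, 7.0, 3.0]  # hypothetical demand in kW
    grid, storage = shave_peaks(hourly_load, threshold_kw=4.0, capacity_kwh=4.0)
    print(grid)     # prints [4.0, 4.0, 4.0, 5.0, 4.0]
    print(storage)  # prints [2.0, 4.0, 2.0, 0.0, 1.0]
```

In this example the peak grid draw falls from 7 kW to 5 kW. In practice the threshold would come from a load forecast rather than being fixed in advance, which is where the forecasting capabilities listed above come in.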
Reddy (2014) points out that one of the major changes in the healthcare domain as a result of the IoT is the ability to monitor staff and patients, and the ability to locate and identify the status of healthcare equipment and assets, resulting in improved employee productivity, resource usage and efficiency gains, and cost savings. Further, as Nambiar et al. (2013) note, “big data analytics can improve operational efficiencies, help predict and plan responses to disease epidemics, improve the quality of monitoring of clinical trials, and optimize healthcare spending at all levels from patients to hospital systems to governments”. Many edge devices have been introduced to patients in particular, and to the wider population in general. For patients, these range from temperature monitors, blood glucose-level monitors and fetal monitors to electrocardiogram (ECG) and even electroencephalography (EEG) devices. Beyond this, efforts are underway to move from monitoring towards comprehensive health management. See, for instance, the Health Buddy device, which captures daily activities and the status of high-risk patients. In addition to patients, an increasing section of the healthy population routinely uses health and activity monitoring devices such as the Jawbone and Fitbit, as well as various applications that use sensors on mobile devices. Consequently, this also allows for various value-added applications that can guide healthy living practices. Moreover, the fact that these devices are connected to the cloud also enables anonymized aggregate analyses at the population segment level as well as across other dimensions such as geography and demographics. While this means a healthier lifestyle for healthy populations, the ability to monitor patients’ conditions on a continuous basis through these devices has very significant advantages.
For instance, combining remote monitoring capability with distant communication technologies (e.g., low-cost video-conferencing) can allow for efficient remote healthcare in areas where direct access to medical personnel is difficult or time-consuming. Many other benefits can be realized, such as reducing the number of at-risk patients, a reduction in readmission risk, epidemic monitoring, mobile healthcare for home and small clinics (Ghose et al. 2012), ambient assisted living (Dohr et al. 2010) and chronic patient monitoring (Páez et al. 2014). There are also advantages in pharmaceutical drug trials, treatment effectiveness, as well as chronic disease management. Big data and the IoT can further allow for valuable insight discovery and knowledge extraction from personal health record (PHR) data, both current and historical (Poulymenopoulou et al. 2013).
In addition to being able to provide a higher quality of healthcare, providers will also benefit from the other previously discussed benefits of the IoT, including predictive maintenance and real-time asset monitoring capabilities, resulting in high availability levels, a reduction in operating costs as well as increased supply chain efficiency. See (Bui and Zorzi 2011, Doukas and Maglogiannis 2012, Zaslavsky et al. 2013) for further related discussions.
One of the major concerns in such large-scale data analyses relates to data privacy and security, mainly of protected health information (PHI). Mechanisms to ensure proper data de-identification and anonymization, as well as to ascertain data security in communication channels, are an extremely high priority. While this also applies to the areas discussed above, these issues are critical in the healthcare domain and, if ignored, can have serious implications for both individuals and the population at large.
We are moving towards a shopping experience in a connected supermarket.
One of the widely discussed IoT use cases involves the pre-specification of shopping lists that can be communicated to the superstore so that checkout wait times (one of the major problems in retail stores today) can be reduced. However, there are more interesting and compelling use cases that will be built on top of the resulting data. For instance, shoppers’ purchase patterns can be mined to recommend relevant items, and sales and discounts can additionally be highlighted. Combined with social data and other preferences that a consumer may make available, a consumer-centric experience can be created, tailored to each consumer’s unique preferences. These capabilities can also enable effective monitoring of shopper traffic across stores, targeted marketing as well as product positioning.
Further, being able to track product movement, e.g., through technologies such as RFID tags, will allow retailers to manage inventory more accurately and efficiently, and to reduce theft as well as administrative costs. As Reddy (2014) points out, big data capabilities along with the IoT will result in stock-out prevention through connected and intelligent supply chains, as well as real-time tracking of parts and raw materials, making it possible to preempt problems and address demand fluctuations. Naturally, this information can be fed back into the manufacturing and distribution channels for further optimization, leading to a reduction in required working capital, efficiency gains as well as the avoidance of disruptions.
Bosch MongoDB white-paper (2014) illustrates another use case in retail where inventory can be tracked as it moves from shelf to basket, allowing the retailer to use analytics to match available supply to predicted demand, reducing uncertainties and fluctuations throughout warehouse operations and the supply chain.
Waller and Fawcett (2013) discuss opportunities for big data analytics in the logistics industry in general, highlighting as potential areas of opportunity real-time capacity availability, time-of-delivery forecasting, optimal routing and reduction in driver turnover, all in the context of carrier optimization. More specifically, in the context of fleet management, logistics companies will rely increasingly on the capabilities offered by big data analytics and the IoT to harness benefits in various areas. Jeske et al. (2013) discuss areas relevant to the intersection of big data and the IoT:
1. Optimization of service properties like delivery time, resource utilization, and geographical coverage, an inherent challenge of logistics.
2. Advanced predictive techniques and real-time processing to provide a new quality in capacity forecasting and resource control.
3. Seamless integration into production and distribution processes for early identification of supply chain risks, leading to resilience against disruptions.
4. Turning the transport and delivery network, as a result of efficient sensor instrumentation, into a high-resolution data source. In addition to fleet management by network optimization, this data may provide valuable insight into the global flow of goods, taking the level of observation down to a microeconomic viewpoint.
5. Real- or near real-time insights into (changes in) demographic, environmental, and traffic statistics by analyzing the huge stream of data originating from a large delivery fleet.
As the IoT enables self-driving vehicles, the logistics industry anticipates a large impact on end-to-end logistics operations, as highlighted by the Joint DHL Bosch KIT report (2014). Please also refer to Delivering Tomorrow for some studies on how such new trends are anticipated to impact the logistics industry in the future.
In addition to the major sectors that we discussed above, the IoT offers many opportunities directly targeted at consumers. While the above discussion also identifies ultimate opportunities for consumers, such as higher quality and possibly more accessible healthcare for patients, a better retail experience, smart homes and energy efficiency gains, the IoT will become a part of daily life via direct interaction with a lot of devices;
Wearables and Assistant devices is one such important area. For instance, activity trackers such as Jawbone and Fitbit are already becoming a routine part of our lifestyle. Beddit, Withings Aura as well as advanced versions of Jawbone can now also perform sleep monitoring, Being does better tracking than regular accelerometers, while devices such as Vessyl can monitor what we drink. Connected wearables such as Ego LS, a wearable camera, can stream live video, while Tzoa can do real-time environment tracking including pollution and UV exposure. Monitoring children is also made easier by devices such as Pacif-i and Sproutling. In addition, there are many assistant devices that intend to make our lives easier, including home assistant robots (e.g., Jibo), automatic lawn mowers (e.g., Bosch’s Indego), home interaction devices (e.g., Amazon’s Echo) as well as self-driving cars. These are of course just a few illustrative examples. There are a myriad of devices in the market today.
Integrated Systems and Services
Metz (2015) describes the wearables revolution of sorts as a result of IoT-enabled devices like the ones mentioned above. However, when it comes to consumers, we believe that most of the benefits from these developments will come not from individual offerings but from integrated systems. This requires not just connected, but interconnected devices. That is, the interaction does not just happen via the cloud but also between devices, allowing them to adapt their behavior as required. For instance, a complete home automation system could not only control temperature, lighting and energy consumption of home appliances, but also connect, communicate and coordinate with assistant devices such as Echo, a vacuum or lawn mower, as well as other elements such as cars, for a seamless and integrated user experience and optimization of resource use. One can similarly think of an integrated health management capability. Even though self-driving or autonomous cars can be considered an exception here since they can be self-contained, they would also draw benefits from these capabilities. These would be beneficial in a range of areas including transportation, logistics and mobility. Similarly, in industrial-facing applications, this would mean more responsive, self-monitoring and potentially self-maintaining assets. For instance, wind turbines can adapt their performance not just in relation to the wind and local weather but also in relation to a global optimization at the aggregate level of a wind farm. Similarly, assets such as aircraft engines can be responsive in relation to their peers (e.g., assets operating under similar operational and utilization conditions).
On protocols for data transfer and communication too, a variety of standards currently exist, either due to various disparate efforts (to avoid dependencies) or due to companies developing proprietary offerings.
Services companies will probably fill the gap created by the absence of a common communication standard for various devices. The market will see growth not only in such interconnection and integration services but also in value-added services resulting from such integration. For instance, Tado provides an interface to a variety of heating systems from multiple manufacturers for a smart thermostat system. Beyond increasing the interoperability of standards or devices, such services will also generate new business and revenue models and value-added capabilities, allowing for better operations (e.g., improving availability and ensuring higher quality of service of systems (Deb et al. 2013)) and financial risk modeling (e.g., better pricing and term structure based on field operations).
Finally, many other sectors such as insurance (more informed risk modeling by utilizing real-time information), sustainability, social good, and security stand to gain from advancements in big data and IoT technologies. The future holds even more promise, such as opportunities with nano robots that can cure diseases or, in the near term, drones for various applications including deliveries and integrated surveillance functions.
From an organization’s perspective, harnessing value not just from IoT-related big data analytics, but from data science in general, requires foundational capabilities to be in place before useful insight discovery can begin. The analytics readiness requirements include some of the capabilities discussed in Section 2, such as efficient storage and compute infrastructure, data acquisition and management mechanisms, machine learning and data modeling capabilities, as well as efficient deployment and scaling mechanisms. In addition, organizations also need to facilitate interfacing between engineering or domain experts and data scientists for efficient and productive knowledge transfer, agreed-upon validation, as well as adoption and integration mechanisms for analytics.
An organization needs to achieve three key objectives to realize these gains as it transforms to become more data-driven:
1. Data and analytics strategies that align with the business vision. While a lot of data science activities and modeling exercises can be done in a bottom-up fashion, a coherent strategy can guide how the individual scattered efforts come together. Such a strategy should take a comprehensive view of how analytics can be a part of the decision-making and insight-generation process in the light of existing as well as future business directions, the enablement channels, and required skill sets. In the absence of a sound strategy and an execution plan, isolated analytics efforts can quickly go adrift since it would be almost impossible to ask the “right” questions. While this topic is not the focus here, it is still important to recognize the need for such a strategy.
2. Culture change to accept insights from validated, verified and principled data-driven analytics into decision making at all levels. Even though a lot of capabilities in analytics are being commoditized, it is extremely important that users, both the ones performing analytics and the ones consuming the resulting insights, are aware of the assumptions and constraints of the methods applied, as well as the ranges in which these should be interpreted. Moreover, such a culture change is not unidirectional. Data divisions also have the onus of understanding the domains and their operational constraints better to be able to deliver value and to complement the domain and engineering experts.
3. Innovation to address open problems, especially in the context of respective business applications. Organizations need to invest in innovation since differentiation will result from novel capabilities and well-engineered integration.
While the technological feasibility of big data analytics for the IoT has been demonstrated in limited contexts, much more needs to be done to realize the broader vision. Not only does the existing technology need to be perfected, but further innovation is also needed to solve current bottlenecks as well as address longer-term requirements. On the IoT end, this can mean increasing the efficiency and affordability of data acquisition devices while reducing energy consumption, as well as standardization of the M2M service layer. Efforts are also needed in building common communication standards (while efforts are underway, we do not have any consensus yet) and improving interoperability across data, semantics and organizations. In addition to the sources listed earlier, see (Vermesan and Friess 2013) for a discussion on some additional aspects of IoT as well as architectural approaches in different contexts.
On the big data processing and analytics end, we have just scratched the surface. Improved solutions are needed for problems such as analyzing massive temporal data, automated feature discovery, robust learning, analyzing heterogeneous data, efficiently managing complex data as well as metadata, performing real-time analytics, and handling streaming data (see, for instance, (Zhou et al. 2014, Zicari 2013)).
However, from a social point of view, there are also some major areas of concern that need to be addressed. We broadly divide these concern areas into two categories. The ones in the first category are technological challenges: the research community has been sensitized to these, and work is currently underway to better understand and address them. However, it must be mentioned that these areas warrant more attention and effort than they currently receive. The main areas in this first category include:
1. Privacy issues: Machine learning and data mining communities, as well as other fields including policy, security and governance, have been working on these issues for some time. From an analytics perspective, privacy-preserving data mining has developed into a subfield, and considerable effort has gone into studying privacy challenges in data mining (Matwin 2013, Navarro-Arribas and Torra 2014), data publishing (Fung et al. 2010) and, to some extent, integration and interactions of sensors (Aggarwal and Abdelzaher 2011). However, these efforts have focused mainly on the data and analytics layers. Better protocols are also needed for other layers in the IoT stack. For instance, privacy and de-identification at the data acquisition layer need to be efficiently addressed. For every application, there are also specific requirements, both regulatory and technological, that should govern privacy concerns. For instance, in the US, HIPAA (the Health Insurance Portability and Accountability Act) governs the majority of the requirements in dealing with medical data in many scenarios. Clear data governance and handling policies are needed to guide the efforts in the desired direction.
2. Security issues: Security is always a concern in the case of large distributed systems. The more access points a network has, the more vulnerable it becomes. In the absence of clear and agreed-upon standards and protocols, the security challenges are compounded. In fact, security issues in the IoT are already a reality. For instance, Witten (2014) discusses top security mishaps in various contexts of IoT. Some work in this direction is already being done (e.g., (Glas et al. 2012, Yavuz 2013)) and needs to continue and expand.
3. Interpretability issues: When employing analytics models in practice, we need to confirm how much we can rely on abstract models generalized based on non-linearities in the data and what aspects require interpretability of these models. Some requirements can be imposed due to the nature of the application field (e.g., due to regulations), while in others interpretability can be needed to make use of the findings (e.g., gene identification). Sophisticated models can undoubtedly leverage more information from data compared to their simpler, interpretable counterparts. However, better evaluation and validation mechanisms should be put in place to establish their generalizability.
4. Data quality issues: Often, the acquired data does not support the desired analysis. For example, in many cases, the data acquired from sensors is not intended for performing inductive inference at scale but rather targets a specific aspect such as safety or reliability. Such cases need an enhanced understanding of what use can be made of available data in the analytics context and how data quality can be ascertained.
The second category of issues is even more important in our opinion. We call these adoption challenges, referring to the issues resulting from inevitable, pervasive and ubiquitous adoption of analytics in various domains. This should not be viewed as an argument against more integration of analytics. Just as any other technology, analytics is a neutral force, and the implications of its integration and use will depend on responsible choices made while trying to leverage it. Our aim is to sensitize the community so as not to overlook these as we move towards a new paradigm. Even though it is not possible to have immediate answers, we would like to highlight the issues to raise awareness of them during decision-making processes as well as in evolving strategies:
1. Model reliability, validation and adaptation: This is possibly the most widely discussed issue in the current list. Just by statistical chance, given that the models operate on vast amounts of data, correlations will be found. How should these correlations be validated? Standardization and agreement are required to evaluate these models and understand the associated risks. Principled forward testing mechanisms will be required, especially in cases of rare events such as asset failures. Backward testing and validation set-based evaluations are limited. Further, robustness of the models needs to be ascertained in changing environments either via model adaptation or via regular evaluation and requirements calibration. Moreover, as these models interact with the environment and do not operate in isolation, their validation and verification become all the more crucial. This is especially important since the cost of doing “wrong” analytics may be significant for certain areas such as physical and mission-critical systems.
(a) The risk of over-sophistication: Extreme fine-tuning may result in models that can be very effective, but only for a very short period. If analytics is to be integrated into the process, it should be long-lasting and adaptable. This requires more than just models that take into account evolution of the data or labels (e.g., concept drift); it also concerns how these models are utilized, how the expectations change over time and how the process responds to the results.
2. Integration and reconciliation with our physical understanding of the world: As IoT grows, analytics will increasingly be integrated in the environment, whether embedded in devices or assisting in decision making based on aggregate analytics. It is extremely important that we can reconcile these capabilities with the basis that we use to build and operate the physical devices (e.g., physics-based models). An argument can be made to restrict the models to “interpretable” ones when it comes to analytics. However, this trades off the knowledge that can be had from non-linear models in deriving non-obvious relationships hidden in the data. We need better mechanisms to integrate these models and to validate their findings.
3. Human-analytics interaction: As technology becomes pervasive, it tends to have an assumed truth effect, meaning that over time the users take the results with ever-increasing trust. Consequently, in scenarios where decision making will move closer to automated approaches, we should be mindful of their advantages as well as limitations. For instance, automated approaches have the potential to reduce the variations resulting from manual approaches. However, in some cases such variations are desired, even required, so that we can advance our understanding through a multitude of perspectives. It is timely to start seriously discussing how humans will interact with analytics moving forward. How would this impact decision making? Would this lead to undesired uniformity? Will we be able to notice inconsistencies and errors in the suggested decisions as our reliance on these models increases? How would these models respond to evolving realities of the world? How would automated decision making impact policy?
4. Potential for systemic errors and failures: Another aspect to consider is how much of a threat automated decision-making models pose in terms of systematic as well as systemic failures as they become pervasive. Can the errors of individual pieces multiply, resulting in system-wide risks? Will they have the potential to bring down the whole system? Can massive interconnectivity result in a system-wide spread of failures, threats or even attacks? Note that the individual risks can be small and gradual, but taken together they may lead to serious implications. Consequently, a risk containment mechanism will be needed in interconnected systems.
(a) Localization of “failures”: If system-wide events were to happen, would we have the ability to locate the sources? Will we be able to quarantine a part of the system? Moreover, what effect would this have on the users, since these systems will be an integrated part of people’s lives? How would the necessary and important services be affected?
5. Personalization vs. limitation of choice: There can be intended and unintended, but nonetheless undesirable, consequences of “personalization” of services to individual lifestyles. On the unintended side, can over-personalization limit choice? For instance, in an effort to recommend the most relevant options, a subset of possible options is presented to the user. However, over time, and with increasing reliance on these recommendations, the users’ exposure to possibilities outside of these recommendation ranges can potentially be adversely impacted. Such systems can then potentially be used for malicious purposes such as social engineering around issues. Just as policy should take these aspects into account as technology grows, technologists also share the responsibility to contribute to addressing these issues.
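
The concern raised under model reliability, that correlations will be found by statistical chance alone and therefore need forward testing, can be illustrated with a minimal sketch. The Python snippet below is purely illustrative: the function names, sample sizes and random seed are assumptions chosen for the example and do not come from any system discussed in this chapter. It generates features and a target that are independent noise, selects the feature that correlates best with the target on a "past" window, and then re-measures that same feature on a "future" window that played no part in the selection.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def forward_test_noise(n_past=100, n_future=100, n_features=2000, seed=0):
    """Select the best-looking noise feature on past data, then forward test it.

    All parameter values are hypothetical and only serve the illustration.
    Returns (index of selected feature, past correlation, future correlation).
    """
    rng = random.Random(seed)
    n = n_past + n_future
    # Target and features are independent Gaussian noise: no real relationship.
    target = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

    # Selection step: keep the feature that looks best on past data only.
    past_r = [pearson(f[:n_past], target[:n_past]) for f in features]
    best = max(range(n_features), key=lambda j: abs(past_r[j]))

    # Forward test: re-measure the same feature on data not used for selection.
    future_r = pearson(features[best][n_past:], target[n_past:])
    return best, past_r[best], future_r


if __name__ == "__main__":
    best, past_r, future_r = forward_test_noise()
    print(f"feature {best}: past r = {past_r:+.2f}, future r = {future_r:+.2f}")
```

On pure noise, the selected feature typically shows a clearly non-zero correlation on the past window that largely disappears on the forward window. This is one way to see why evaluating a pattern on the same data in which it was discovered is a limited check, and why forward testing matters more as the number of candidate signals grows.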
In this chapter, we discussed how big data technologies and the Internet of Things are playing a transformative role in society. The pervasive and ubiquitous nature of such technologies will profoundly change the world as we know it, just as the industrial revolution and the internet did in the past. We discussed opportunities in various domains from both an industrial and a consumer perspective. Given the data acquisition capabilities that are in place in the context of monitoring physical assets, the immediate opportunities are bigger from an industrial perspective. On the consumer end, we are currently undergoing a transformation as physical devices capable of advanced sensing become part of our routine life. Consumer applications will start witnessing rapid growth in integrated services and systems, which we believe will generate much more value than one-off offerings, as noted in Section 3.8, once a critical mass of such interconnected devices is reached in various domains. The capabilities in leveraging big data in both of these contexts are already transitioning from performing descriptive analytics to predictive analytics. For instance, based on real-time sensor data, we can predict certain classes of field events (e.g., failures or malfunctions) for heavy assets such as aircraft engines and turbines more reliably; this complements physics-based models employed in such cases. As these technologies mature, they will enable another transition from predictive to prescriptive analytics, whereby recommendations on resolutions of such events could be made. This may develop to the extent of devices themselves taking corrective actions, thus becoming self-aware and self-maintaining. Even though we are already witnessing a paradigm shift, more needs to be done on various fronts, such as advancements in big data technologies, analytics, privacy and security, and policy making.
In addition, we discussed the requirements at an organizational level in terms of readiness to harness the value resulting from analytics. We then discussed broad social implications and highlighted areas of concern as these technologies become pervasive. We organized these concerns into two categories: technological challenges that are relatively better understood, even if not entirely resolved, and adoption challenges that we believe are less clear. As the adoption and integration of such technologies grow, so will our understanding of the implications. However, the pace of change is fast indeed, and we will need to be quick in understanding this evolving landscape, analyzing the resulting changes and defining proper policies and protocols at various levels. Factors such as human-analytics interaction will also play an important role in how responsibly and effectively analytics complements our decision-making ability as well as how much autonomy these systems eventually obtain.
Finally, we should reiterate that technologies are neutral. Any technology will have implications for society. The onus is on us to define how the technology is adopted in a responsible manner.
Appendix
Links to entities referred to in the article (in alphabetical order):
– Amazon AWS for IoT: http://aws.amazon.com/iot/
– Amazon Echo:
– Beddit:
– Being:
– Bosch, ABB, LG and Cisco’s joint venture announced recently to cooperate on open standards for smart homes:
– Bosch Indego:
– Cloud Foundry:
– Cloudera and Hortonworks’ real-time offering:
– Ego LS:
– Fitbit:
– Hadoop Ecosystem: See, for instance, http://hadoopecosystemtable.github.io/
– Hubject, a joint networking mobility initiative of the BMW group, Bosch, Daimler, EnBW, RWE and Siemens:
– IBM SyNAPSE:
– Jawbone: https://jawbone.com/
– Jibo:
– Microsoft’s IoT offerings:
– NVidia’s Tegra X1:
– Pacif-i: http://bluemaestro.com/
– Pandas: http://pandas.pydata.org/
– Predictive Model Markup Language (PMML):
– Qualcomm Zeroth:
– Spark: https://spark.apache.org/
– Sproutling:
– Storm: https://storm.apache.org/
– Tado:
– Tzoa:
– Vessyl:
– Withings Aura:
– Zementis: http://zementis.com/
Bibliography
C. C. Aggarwal and T. Abdelzaher. Integrating sensors and social networks. In C. C. Aggarwal, editor, Social Network Data Analytics, pages 379–412. Springer US, 2011. ISBN 978-1-4419-8461-6. doi: 10.1007/978-1-4419-8462-3_14. URL http://dx.doi.org/10.1007/978-1-4419-8462-3_14.
C. C. Aggarwal, N. Ashish, and A. Sheth. The internet of things: A survey from the data-centric perspective. In Managing and mining sensor data, pages 383–428. Springer, 2013.
A. Baaziz and L. Quoniam. How to use big data technologies to optimize operations in upstream petroleum industry. International Journal of Innovation (IJI), 1(1):30–42, 2013.
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.
L. M. A. Bettencourt. The uses of big data in cities. Santa Fe Institute working paper 2013-09-029, September 2013.
Bosch MongoDB white-paper. IoT and big data. Technical report, October 2014. URL http://info.mongodb.com/rs/mongodb/images/MongoDB_BoschSI_IoT_BigData.pdf.
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
C. Brasco, N. Eklund, M. Shah, and D. Marthaler. Predictive modeling of high-bypass turbofan engine deterioration. In Proceedings of the Annual Conference of the Prognostics and Health Management Society (PHM 2013), volume 4. PHM Society, 2013.
N. Bui and M. Zorzi. Health care applications: A solution based on the internet of things. In Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, ISABEL ’11, pages 131:1–131:5, New York, NY, USA, 2011. ACM. URL http://doi.acm.org/10.1145/2093698.2093829.
N. Byrnes. Cities find rewards in cheap technologies. MIT Technology Review, November 2014.
M. Chui, M. Löffler, and R. Roberts. The internet of things. McKinsey Quarterly, 2:1–9, Mar 2010.
Cognizant Report. Reaping the benefits of the internet of things. Technical report, May 2014.
D. Crankshaw, P. Bailis, J. E. Gonzalez, H. Li, Z. Zhang, M. J. Franklin, A. Ghodsi, and M. I. Jordan. The missing piece in complex analytics: Low latency, scalable model management and serving with velox. In Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, 2014.
J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, Q. V. Le, and A. Y. Ng. Large scale distributed deep networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1223–1231. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf.
B. Deb, M. Shah, S. Evans, M. Mehta, A. Gargulak, and T. Lasky. Towards systems level prognostics in the cloud. In Proceedings of the IEEE Conference on Prognostics and Health Management (PHM), pages 1–6. IEEE, 2013. ISBN 978-1-4673-5722-7.
A. Dohr, R. Modre-Opsrian, M. Drobics, D. Hayn, and G. Schreier. The internet of things for ambient assisted living. In Information Technology: New Generations (ITNG), 2010 Seventh International Conference on, pages 804–809. IEEE, 2010.
C. Doukas and I. Maglogiannis. Bringing iot and cloud computing towards pervasive healthcare. In Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on, pages 922–926, July 2012. doi: 10.1109/IMIS.2012.26.
J. Feblowitz. The big deal about big data in upstream oil and gas. IDC Energy Insights, October 2012.
E. D. Feigelson and G. J. Babu. Big data in astronomy. Significance, 9(4):22–25, 2012.
B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv., 42(4):14:1–14:53, June 2010. ISSN 0360-0300. doi: 10.1145/1749603.1749605. URL http://doi.acm.org/10.1145/1749603.1749605.
A. B. Garcia, C. Bentes, R. C. de Melo, B. Zadrozny, and T. J. P. Penna. Sensor data analysis for equipment monitoring. Knowledge and Information Systems, 28(2):333–364, 2011. ISSN 0219-1377. doi: 10.1007/s10115-010-0365-1. URL http://dx.doi.org/10.1007/s10115-010-0365-1.
A. Ghose, C. Bhaumik, D. Das, and A. K. Agrawal. Mobile healthcare infrastructure for home and small clinic. In Proceedings of the 2nd ACM International Workshop on Pervasive Wireless Healthcare, MobileHealth ’12, pages 15–20, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1292-9. doi: 10.1145/2248341.2248347. URL http://doi.acm.org/10.1145/2248341.2248347.
B. Glas, J. Guajardo, H. Hacioglu, M. Ihle, K. Wehefritz, and A. Yavuz. Signal-based automotive communication security and its interplay with safety requirements. In Proceedings of Embedded Security in Cars Conference, November 2012.
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In , pages 599–613, Broomfield, CO, October 2014. USENIX Association. ISBN 978-1-931971-16-4.
J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami. Internet of things (iot): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29:1645–1660, 2013.
A. Hems, A. Soofi, and E. Perez. Drilling for new business value: How innovative oil and gas companies are using big data to outmaneuver the competition. A Microsoft White Paper, May 2013.
L. Hesla. Particle physics tames big data. Symmetry, 1, August 2012.
IBM White Paper. Predictive maintenance for manufacturing. IBM, 2011.
N. Japkowicz and M. Shah. Evaluating Learning Algorithms. Cambridge University Press, 2011.
M. Jeske, M. Grüner, and F. Weiß. Big data in logistics: A DHL perspective on how to move beyond the hype. DHL Customer Solutions & Innovation, December 2013.
Joint DHL Bosch KIT report. Self-driving vehicles in logistics: A DHL perspective on implications and use cases for the logistics industry. Technical report, 2014.
A. Kleiner, A. Talwalkar, P. Sarkar, and M. I. Jordan. A scalable bootstrap for massive data. Journal of the Royal Statistical Society, 76:795–816, 2013.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
J. Kurtz, P. Hoy, L. McHargue, and J. Ward. Improving operational and financial results through predictive maintenance. IBM Smarter Analytics Leadership Summit, Feb 2013.
S. Lawson. Iot groups are like an orchestra tuning up: The music starts in 2016. Computer World, Dec 2014.
Q. V. Le, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
J. Lee, E. Lapira, B. Bagheri, and H. Kao. Recent advances and trends in predictive manufacturing systems in big data environment. Manufacturing Letters, 1:38–41, 2013.
J. Lee, H. Kao, and S. Yang. Service innovation and smart analytics for industry 4.0 and big data environment. Procedia CIRP, 16:3–8, 2014.
K. L. Leuth. Iot market segments biggest opportunities in industrial manufacturing. IoT-Analytics, October 2014. URL http://iot-analytics.com/iot-market-segments-analysis/.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2–11. ACM, 2003.
L. Mackey, A. Talwalkar, and M. I. Jordan. Distributed matrix completion and robust factorization. Journal of Machine Learning Research, to appear, 2014.
A. Markkanen and D. Shey. The intersection of analytics and the internet of things. IEEE Internet of Things Newsletter, Nov 2014. URL http://iot.ieee.org/newsletter/november-2014/the-intersection-of-analytics-and-the-internet-of-things.html.
N. Marz and J. Warren. Big Data: Principles and best practices of scalable realtime data systems. Manning Publications Co., 2015.
S. Matwin. Privacy-preserving data mining techniques: Survey and challenges. In Discrimination and Privacy in the Information Society, pages 209–221. Springer, 2013.
McKinsey Study. Connected car, automotive value chain unbound. Technical report, 2014.
R. Metz. Ces 2015: Wearables everywhere. MIT Technology Review, January 2015.
P. Middleton, P. Kjeldsen, and J. Tully. Forecast: The internet of things, worldwide, 2013. Gartner, November 2013.
Mind Commerce LLC Report. Big data in extraction and natural resource industries: Mining, water, timber, oil and gas 2014 - 2019. Technical report, Jul 2014.
MIT Business Report. Cities get smarter. Technical report, Jan/Feb 2015.
R. Nambiar, R. Bhardwaj, A. Sethi, and R. Vargheese. A look at challenges and opportunities of big data analytics in healthcare. In Big Data, 2013 IEEE International Conference on, pages 17–22. IEEE, 2013.
G. Navarro-Arribas and V. Torra. Advanced research in data privacy. 2014.
R. Nicholson. Big data in the oil and gas industry. IDC Energy Insights, September 2012.
NIST Report. Workshop report on foundations for innovation in cyber-physical systems. Technical report, Jan 2013.
E. Orts and J. Spigonardo. Sustainability in the age of big data. Special Report, Initiative for Global Environmental Leadership (IGEL), Knowledge at Wharton, September 2014. URL http://knowledge.wharton.upenn.edu/article/the-big-data-and-energy-synergy/.
D. Páez, F. Aparicio, M. de Buenaga, and J. R. Ascanio. Big data and iot for chronic patients monitoring. In Ubiquitous Computing and Ambient Intelligence. Personalisation and User Adapted Services, pages 416–423. Springer, 2014.
X. Pan, S. Jegelka, J. Gonzalez, J. K. Bradley, and M. Jordan. Parallel double greedy submodular maximization. In Advances in Neural Information Processing Systems 22, 2014.
M. Poulymenopoulou, F. Malamateniou, and G. Vassilacopoulos. Machine learning for knowledge extraction from phr big data. Studies in health technology and informatics, 202:36–39, 2013.
A. S. Reddy. Reaping the benefits of the internet of things. Cognizant Reports, May 2014.
R. Salakhutdinov. Learning deep generative models. PhD thesis, University of Toronto, Toronto, Canada, 2009.
M. Seshadri. Big data science challenging the oil industry. Energyworld, 2013. URL http://web.idg.no/app/web/online/event/energyworld/2013/emc.pdf.
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 151–161, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics. ISBN 978-1-937284-11-4. URL http://dl.acm.org/citation.cfm?id=2145432.2145450.
S. K. Sowe, T. Kimata, D. Mianxiong, and K. Zettsu. Managing heterogeneous sensor data on a big data platform: Iot services for data-intensive science. In Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International, pages 295–300, July 2014. doi: 10.1109/COMPSACW.2014.52.
D. Tracey and C. Sreenan. A holistic architecture for the internet of things, sensing services and big data. In Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on, pages 546–553, May 2013. doi: 10.1109/CCGrid.2013.100.
V. Turner, J. F. Gantz, D. Reinsel, and S. Minton. The digital universe of opportunities: Rich data and the increasing value of the internet of things. IDC White Paper, April 2014. URL http://idcdocserv.com/1678.
S. Vandermerwe and J. Rada. Servitization of business: adding value by adding services. European Management Journal, 6(6):314–324, 1989.
O. Vermesan and P. Friess. Internet of things: converging technologies for smart environments and integrated ecosystems. River Publishers, 2013.
M. A. Waller and S. E. Fawcett. Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. Journal of Business Logistics, 34(2):77–84, 2013.
Y. Wang, H. Bai, M. Stanton, W. Chen, and E. Y. Chang. Plda: Parallel latent dirichlet allocation for large-scale applications. In Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management, AAIM ’09, pages 301–314, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-02157-2. doi: 10.1007/978-3-642-02158-9_26. URL http://dx.doi.org/10.1007/978-3-642-02158-9_26.
B. Witten. Top 10 iot security mishaps 2014. In Industrial Internet Consortium Web blog post. IIC, 2014. URL http://blog.iiconsortium.org/2014/12/top-10-iot-security-mishaps-2014-.html.
T. Yashiro, S. Kobayashi, N. Koshizuka, and K. Sakamura. An internet of things (iot) architecture for embedded appliances. In Humanitarian Technology Conference (R10-HTC), 2013 IEEE Region 10, pages 314–319, Aug 2013. doi: 10.1109/R10-HTC.2013.6669062.
A. A. Yavuz. Practical immutable signature bouquets (pisb) for authentication and integrity in outsourced databases. In Data and Applications Security and Privacy XXVII, pages 179–194. Springer, 2013.
M. Zaki and A. Neely. Optimising asset management within complex service networks: The role of data. Cambridge Service Alliance, working paper:1–11, 2014.
A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi. Internet of things for smart cities. Internet of Things Journal, IEEE, 1(1):22–32, Feb. 2014. ISSN 2327-4662. doi: 10.1109/JIOT.2014.2306328.
A. Zaslavsky, C. Perera, and D. Georgakopoulos. Sensing as a service and big data. arXiv preprint arXiv:1301.0159, 2013.
K. Zhai, J. Boyd-Graber, N. Asadi, and M. L. Alkhouja. Mr. lda: A flexible large scale topic modeling package using variational inference in mapreduce. In Proceedings of the 21st International Conference on World Wide Web, WWW ’12, pages 879–888, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1229-5. doi: 10.1145/2187836.2187955. URL http://doi.acm.org/10.1145/2187836.2187955.
Z. Zhou, N. Chawla, Y. Jin, and G. Williams. Big data opportunities and challenges: Discussions from data analytics perspectives [discussion forum]. Computational Intelligence Magazine, IEEE, 9(4):62–74, 2014.
R. V. Zicari.