Building power consumption datasets: Survey, taxonomy and future directions
Yassine Himeur, Abdullah Alsalemi, Faycal Bensaali, Abbes Amira
A PREPRINT
Yassine Himeur∗, Abdullah Alsalemi, Faycal Bensaali
Department of Electrical Engineering, Qatar University, Doha, Qatar

Abbes Amira
Institute of Artificial Intelligence, De Montfort University, Leicester, United Kingdom

September 18, 2020

ABSTRACT
In the last decade, extended efforts have been poured into energy efficiency. Several energy consumption datasets have since been published, each varying in properties, uses and limitations. For instance, building energy consumption patterns are shaped by several factors, including ambient conditions, user occupancy, weather conditions and consumer preferences. Thus, a proper understanding of the available datasets provides a strong basis for improving energy efficiency. Starting from the need for a comprehensive review of existing databases, this work surveys, studies and visualizes the numerical and methodological nature of building energy consumption datasets. A total of thirty-one databases are examined and compared in terms of several features, such as the geographical location, period of collection, number of monitored households, sampling rate of collected data, number of sub-metered appliances, extracted features and release date. Furthermore, data collection platforms and related modules for data transmission, data storage and privacy concerns used in different datasets are also analyzed and compared. Based on the analytical study, a novel dataset is presented, namely the Qatar university dataset, which is an annotated power consumption anomaly detection dataset. The latter will be very useful for testing and training anomaly detection algorithms, and hence for reducing wasted energy. Moving forward, a set of recommendations is derived to improve dataset collection, such as the adoption of multi-modal data collection, smart Internet of things data collection, low-cost hardware platforms and privacy and security mechanisms. In addition, future directions to improve dataset exploitation and utilization are identified, including the use of novel machine learning solutions, innovative visualization tools and explainable mobile recommender systems. Accordingly, a novel visualization strategy based on power consumption micro-moments is presented, along with an example of deploying machine learning algorithms to classify the micro-moment classes and identify anomalous power usage.

Keywords Building power consumption datasets · energy efficiency · dataset collection · recommender systems · micro-moments · visualization.

∗ Energy and Buildings, vol. 227, 2020, 110404
Recent studies have shown that buildings account for more than 40% of power consumption demand and greenhouse gas emissions around the world. Indeed, this steady growth in energy consumption has been closely tied to population growth and rising levels of prosperity [1, 2]. Moreover, climate conditions in certain regions of the world oblige households to use more energy for heating, cooling, cooking and refrigeration. Business buildings, principally offices and university structures, are also viewed as structures exhibiting high power consumption [3, 4]. Consequently, the expansion of power usage and carbon emissions, as well as high energy prices in the above-mentioned environments, has made energy saving a vital goal for public authorities of all governments [5, 6, 7, 8, 9].

The building energy sector has recently attracted the attention of various public energy efficiency initiatives that aim to accomplish greater energy sustainability. Additionally, a clear correlation is found between household energy consumption and user behavior [10, 11, 12, 13, 14]. Every day, new appliances are installed and used in households, resulting in a sharp rise in power demand. Monitoring the power usage of these appliances is the initial step towards energy saving. From this point of view, understanding and controlling user consumption behavior is also a key parameter in helping householders reduce energy costs. With the ongoing evolution of the Internet of Things (IoT), the use of smart meters for monitoring electricity consumption is expanding exponentially [15, 16, 17]. 
The up-and-coming generation of energy saving systems should be more effective, easy to follow and more challenging in order to improve end-user behavior.

As in other research areas, energy, environment and sustainable development research urgently needs openly accessible datasets. In fact, power consumption datasets are becoming increasingly important for estimating the precision of power monitoring techniques and for assessing how well they may behave under realistic circumstances. Consequently, checking the precision of outputs in real scenarios is critical in this research area [18, 19, 20]. Moreover, a simulated database does not reasonably match realistic datasets, as “an experimental database or repository would ordinarily have unpredictable and unexplained complication nature that is laboriously anticipated and most of the time can be laboriously hard to manage [21, 22]”. On that account, energy scientists have argued that it is important to have open access databases that provide aggregated power consumption as well as appliance-based consumption for the various devices that constitute the overall consumption [23].

Moreover, end-user behavior is responsible for wasting more than 20% of the total energy consumed in buildings, and hence it is a key element in energy consumption [24, 25, 26]. Therefore, it is of paramount importance to: (i) design real consumption datasets; (ii) deploy novel platforms and smart meters to collect granular and appliance-specific data, which have the means or incentive for sharing consumption data with end-users; and (iii) develop tools that help end-users understand their energy consumption footprints, such as innovative visualizations, and further implement novel strategies that help them improve their behavior and reduce wasted energy [27], e.g. via deploying recommender systems. 
In this context, seeking to study how to save energy and understand power consumption behaviors in buildings, various datasets have been collected globally. They provide a large quantity of readings about daily power usage and user behavior. Therefore, the use of machine learning (ML) algorithms becomes essential for handling large-scale datasets and extracting meaningful features from collected data. This can assist in many applications, including forecasting power demand, energy efficiency and electricity saving, appliance recognition, cost prediction, and other tasks related to energy usage [28, 29, 30]. In this respect, an ML algorithm for building power consumption deals with information drawn from smart meters and solar panels during the different periods of the day. This huge quantity of data, including multivariate time series, is of utmost importance to ML algorithms because future usage can be effectively anticipated [31, 32, 33]. In other words, ML algorithms can forecast such data and help in developing energy efficiency frameworks proficiently.

Inspired by the rising relevance of public datasets, we opted to give a general review of up to 31 existing datasets in the field of building power consumption. Presently, householders, firms and public authorities face difficulties in guaranteeing energy efficiency and reducing usage costs. The use of a large number of appliances increases energy demand, cost and carbon emissions as well. In this context, we present in this paper a deep overview of various power consumption datasets, their taxonomy and classification. The taxonomy is adopted to examine existing datasets, what kind of information they provide, their applications, their benefits and their limitations. To the best of our knowledge, this is the first framework that provides a comprehensive and universal survey of building energy consumption datasets, their applications and future trends. 
Next, discussions and important findings are presented by analyzing and comparing the features of existing datasets, their data collection platforms and related modules. Moreover, a novel dataset, namely the Qatar university dataset (QUD), is proposed to answer the challenges and issues raised by the analytical study. Afterward, valuable future orientations to improve the quality of power consumption datasets and enrich their content are discussed, in which a set of recommendations to use novel hardware devices and platforms is described. Moreover, future directions to improve dataset exploitation, and hence improve energy saving, are also identified.

Figure 1: General framework of an energy efficiency system.

To summarize, this paper presents a set of novel contributions, which can be listed as follows:
• Reviewing up to 31 building power consumption datasets, describing their properties and highlighting their pros and cons via a multi-perspective comparison based on various parameters.
• Proposing a taxonomy of building power consumption datasets to assess the existing repositories based on their applications and characteristics.
• Analyzing data collection platforms used to record power consumption datasets and related modules used for data transmission, data storage and privacy concerns.
• Presenting a novel dataset called QUD that responds to various issues raised in the analysis of state-of-the-art datasets. QUD can be used for different applications, among them the detection of anomalous power consumption.
• Providing a list of valuable future orientations for (i) improving dataset collection, mainly through the use of novel hardware platforms, and (ii) improving dataset exploitation via innovative tools such as visualization strategies and explainable recommender systems.

The rest of this paper is organized into four sections. Section 2 reviews up to 31 existing building power consumption datasets and describes their usage contexts, properties, advantages and limitations. Section 3 presents a comprehensive discussion of the different characteristics of existing power consumption datasets. In addition, a novel dataset called QUD is presented, which offers new functionalities. In Section 4, challenging orientations and future directions that should be followed in order to improve dataset collection and enhance dataset exploitation are described. Section 5 concludes the paper with a set of proposals for improving the quality of power consumption datasets and highlights future works. Finally, a list of abbreviations and nomenclatures used in this paper is presented in the Appendix.
Several datasets can be found in the literature and each one has its specific characteristics, making it difficult to select a database for treating energy efficiency issues. To this end, this work makes a deep comparison between all datasets based on various specifications, such as the period and region of collection, sampling rate, number of monitored houses, number of deployed sub-meters, collected features and release date. Existing realistic datasets are divided into two major groups: appliance-level datasets and aggregated-level datasets. The first group provides sub-meter readings of appliance-by-appliance consumption. This kind of data is used for various applications, including energy saving [34], appliance recognition [35], occupancy detection [36, 37] and preference behavior [38, 39]. The second group focuses on collecting overall consumption profiles of different buildings. It can be employed for energy disaggregation, energy efficiency, and predicting energy consumption. Figure 1 illustrates a flowchart of a dataset collection process along with its associated modules, required to pre-process, analyze and interpret power consumption patterns. This is a general representation that can be used for different applications.
To fit realistic scenarios of daily power usage and test energy efficiency solutions, scientists and specialists of smart energy monitoring systems need power consumption databases on which developed algorithms can be evaluated in advance. Different databases have been collected and shared publicly. In this section we review up to 31 power consumption datasets that are proposed in the literature, in addition to our novel dataset named QUD. We briefly specify the characteristics of each dataset and its registered features in terms of current (I), voltage (V), active power (P), reactive power (Q), apparent power (S), normalized power (Np), energy (E), frequency (f), phase angle (φ), power factor (pf), energy cost (EC), weather (Wt), temperature (T), humidity (H), occupancy (O) and light level (L).

In [40, 41, 42, 43, 44, 45], large-scale datasets are formed, namely HES, IHEPCDS, UMSM, SustData, REFIT and Dataport, respectively. While HES, IHEPCDS, UMSM and Dataport assembled energy consumption patterns at a minutely level, SustData and REFIT reported power usage profiles over intervals in seconds. All these databases provide consumption records at the appliance level for long periods of monitoring. For example, in HES and UMSM data are collected for a period of one year, while in REFIT and SustData energy patterns are accumulated for 213 days and 1114 days, respectively. Further, different features are gathered during the experimental campaign, such as I, V, P, Q, S, f and T. REFIT also has the particularity of providing EC in $. Dataport repository [45] is also quite similar to UMSM database, since it captures energy usage at the same sampling intervals of 239 households but for a short collection period of two months. Dataport repository is also quite similar to UMSM database, since it captures energy usage at the same sampling intervals of more than 1200 households for a long collection period, which is more than 4 years.

In [46], OCTES is proposed, which is similar to REFIT. 
It records P, φ and EC ($). It covers a shorter investigation period but a larger sample of houses, with data recorded at a rate comparable to REFIT. It lists the power consumption of each house; nonetheless, other pieces of information about the houses are not provided, except their geographical position. The case study described in that work depicts the use of a sauna in one home; however, these data are not shared publicly. Consequently, assumptions must be made about the energy usage. In addition to power consumption, REFIT also provides readings about temperature, light and motion patterns, complemented with dwelling surveys specifying size, age, heating type, insulation, construction type and details about the occupants, such as job description and age.

In [47], the Tracebase database includes power consumption patterns of various devices, which makes it possible to examine disaggregation. The readings are collected at a sampling rate of 1 second. This dataset can be utilized for energy efficiency applications. However, it cannot be employed for appliance recognition, preference detection or energy disaggregation, since no data are provided about the devices being investigated and their properties. It gathers data of 43 distinct appliances, each of which has various recordings from several days and several households. Furthermore, date and time records, P and Np are provided at a sampling frequency of 8 seconds.

In [48, 49, 50, 51], AMPds1, AMPds2, ECB and PSD are proposed, respectively, which are minutely power datasets. Overall, AMPds1 and AMPds2 are widely used databases, which compile information over one and two years, respectively, with a sampling rate of 1 min. Energy consumption of 11 appliances is observed using 21 sub-meters. On the other side, ECB provides electricity consumption benchmarks of 25 domestic residences located in Victoria State in south-eastern Australia. 
Consumption patterns were extracted from the aggregated circuit and for individual appliances over a duration of two years and at a sampling rate of 30 min. Further, consumption footprints of device-event labels from 10 homes in Austin, USA, were assembled.

In [52] and [53], the authors released the MEULPv.1 and MEULPv.2 datasets, respectively. MEULPv.1 gives energy consumption readings of 12 Canadian households. Data were recorded at 1-min sampling rates at both the aggregated and appliance levels. A total of 8 appliances are monitored during the data collection process. Meanwhile, MEULPv.2 provides one year of monitoring of 23 households using a sampling rate of 1 min, covering aggregated and appliance-based consumption as well.

In [21, 54], the RAE and GREEND databases are proposed, in which data are collected at a frequency of 1 Hz. RAE is the initial version of an energy consumption repository that includes 1 Hz recordings at the aggregated and sub-metered levels of two households. Besides power information, T and H records from a house's indoor regulator are incorporated. On the other side, GREEND describes detailed energy consumption patterns collected through an experimental campaign assessing electricity usage of various individual appliances in Austria and Italy. During the collection campaign, eight households are monitored, each containing up to nine different individual devices. The power usage patterns at the device level are gleaned at a resolution of 1 Hz over a period of six months.

In [55, 56, 57], ECO, IAWE and DRED, which capture energy information at 1 Hz sampling rates, are presented, respectively. ECO is a complete measurement campaign conducted in order to collect comprehensive information on consumption patterns in six Swiss homes over a duration of eight months. During the collection campaign, I, V and φ are collected from aggregated circuits and a set of selected appliances at a sampling frequency of 1 Hz. In the IAWE campaign, measurements were performed in a pilot household with three floors in Delhi in order to measure power, water and environmental profiles. Data are collected for a duration of 73 days from May to August 2013. In addition, 33 sub-meters are deployed throughout the whole house. DRED is publicly launched to capture energy, occupancy patterns and environmental data of one pilot house in the Netherlands. Sensor units are installed to measure aggregated energy consumption and appliance-level electricity usage. In fact, 12 different domestic appliances are sub-metered at sampling intervals of 1 min, while 1 Hz sampling rates are used to gather aggregated consumption.

In [58], DISEC is launched, in which various data are collected for 19 apartments at an Indian faculty housing complex during 284 days. Different features, such as P and Wt, are collected at 30-second sampling intervals and then aggregated to 15 min, 30 min and 60 min intervals. In addition, Wt variations are updated by measuring atmospheric conditions from nearby station measurements.

In [59] and [60], two hourly electricity consumption datasets are proposed. The first one, called CRHLP, includes energy patterns of 16 residential and commercial buildings monitored every hour for a period of one year. Additionally, solar radiation and meteorological records are also collected. The second one, namely HUE, captures long-term energy usage profiles from five households with a sampling interval of 1 hour. Furthermore, while device-level consumption from house 1 is collected for a period of two years with sampling intervals of one minute, data from house 2 are extracted for a one-year period with a resolution of 1 Hz. In [61], UK-DALE is proposed, which summarizes the current and voltage profiles of three houses at a sampling frequency of 16 kHz and two houses at a sampling frequency of 1 Hz. 
Moreover, patterns of individual devices of five other households are collected at a sampling interval of 6 seconds for various periods varying from 39 to 655 days.

In [62, 63, 64], the REDD, BLUED and BLOND datasets are proposed. Energy consumption records are captured at a sampling frequency of more than 10 kHz. The monitoring process, by contrast, is conducted for only a few weeks. For example, in REDD, six households are monitored, where the aggregated electricity consumption is measured at a sampling rate of 15 kHz. Also, electricity consumption records of up to 24 devices are monitored at a sampling rate of 0.5 Hz. Furthermore, load patterns of 20 other appliances are observed at a frequency of 1 Hz. BLUED summarizes the current and voltage readings of an individual household in Pittsburgh, Pennsylvania, USA. Data are recorded at a sampling frequency of 12 kHz over a period of one week. BLOND aims to capture continuous power consumption data. It delivers voltage and current records at the aggregated and device levels. This database includes data from 53 devices that represent 16 appliance groups. It comprises two main repositories: (i) BLOND-50, which has consumption data obtained at sampling rates of 50 kSps for grouped circuits and 6.4 kSps for individual devices; and (ii) BLOND-250, which contains usage patterns for a period of 50 days gathered using sampling rates of 250 kSps at the aggregated level and 50 kSps at the appliance level.

In [65], PLAID provides power consumption profiles for more than 56 specific domestic devices that represent about 11 appliance categories. Data are captured at a sampling frequency of 30 kHz, which is among the highest sampling frequencies used in existing building power consumption datasets when collecting load profiles. 
In addition, energy consumption information is captured for a period of three months during the summer of 2013, and the measurement campaign was carried out in Pittsburgh, Pennsylvania, USA.

The ACS-F1 [66] and BERDS [67] datasets, which monitor load patterns at comparable sampling rates, are also proposed. ACS-F1 records the amount of energy used in a set of households at the appliance level. In this context, electricity sub-meters were employed to measure the energy consumption of 100 house devices that represent 10 appliance classes. Power sub-metering is performed at sampling intervals of 10 seconds for a period of only one hour. This database is especially suitable for appliance recognition applications. On the other side, BERDS collects energy consumption profiles at 20-second sampling intervals for a period of one year.

In addition, QUD is presented in this framework, which is based on an appliance-based collection campaign. It can be used for different purposes, such as energy saving, anomaly detection and energy demand prediction. QUD is collected using a system that incorporates sub-metering modules registering power consumption footprints in terms of P and other indoor climate conditions, including O, T, H and L. The data are recorded with sampling intervals ranging from 3 seconds to 30 min. The collection process will be spread over a one-year period, and three months of data recording have already been completed.
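Several of the features recorded by the reviewed datasets (P, Q, S, pf, φ and E) are tied together by the standard relations for sinusoidal signals, so a dataset that omits one of them can sometimes be completed from the others. The following is a minimal sketch of those relations, not a procedure taken from any of the cited datasets; the function names and the choice of watts, var and seconds as units are assumptions made for illustration.

```python
import math


def apparent_power(p_watts, q_var):
    """Apparent power S = sqrt(P^2 + Q^2), in VA."""
    return math.hypot(p_watts, q_var)


def power_factor(p_watts, q_var):
    """Power factor pf = P / S; returns 0.0 when S is zero (assumed convention)."""
    s = apparent_power(p_watts, q_var)
    return p_watts / s if s else 0.0


def phase_angle_deg(p_watts, q_var):
    """Phase angle phi = atan2(Q, P), in degrees."""
    return math.degrees(math.atan2(q_var, p_watts))


def energy_kwh(power_samples_w, interval_s):
    """Energy E as the sum of P * dt over equally spaced samples, in kWh."""
    return sum(power_samples_w) * interval_s / 3_600_000.0


if __name__ == "__main__":
    print(apparent_power(300.0, 400.0))        # S for P = 300 W, Q = 400 var
    print(power_factor(300.0, 400.0))          # pf for the same reading
    print(energy_kwh([1000.0] * 60, 60))       # one hour at 1 kW, 1-min samples
```

The energy estimate is a plain rectangle-rule sum, so its accuracy depends on the sampling interval of the dataset being used.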
Power consumption datasets are split into two main groups: appliance-level and aggregated-level. The first one traces the power consumption patterns of individual devices. The second one provides the whole power consumption of households. Datasets can also be classified based on different aspects, including application purposes or the nature of the buildings where data are acquired, such as households, commercial buildings, academic buildings and industrial buildings. Figure 2 details the global taxonomy of various building power consumption datasets found in the literature.
Figure 2: General taxonomy of existing building power consumption datasets. Domestic power consumption datasets are divided into the appliance level (A1. Energy saving, A2. Appliance recognition, A3. Occupancy detection, A4. Preference detection, A7. Anomaly detection) and the aggregated level (A1. Energy saving, A5. Energy disaggregation, A6. Demand prediction).
Using detailed power consumption readings and based on the nature of data collection procedures at the appliance or aggregated levels, existing datasets could be exploited for various applications including, but not restricted to, energy saving, appliance recognition, occupancy detection, user preference detection, anomaly detection, energy disaggregation and energy demand prediction.
A1. Energy saving:
Investigating the building sector in terms of energy saving, which is a principal element of its environmental and financial effects, is of utmost importance. Consequently, energy saving is the most popular application of building power consumption datasets [68, 69, 70]. It can effectively reduce energy bills and decrease carbon dioxide emissions. It is made up of the following four stages: (1) the dataset collection stage, in which information is gathered from various sources, including energy sub-meters, ambient condition sensors and climate sources, and saved in a specific dataset; (2) the pre-processing stage, in which the information stored in the first stage is pre-processed before utilizing various ML strategies, where pre-processing includes data cleaning, data resampling, feature and event extraction and normalization; (3) the learning stage, in which ML algorithms are utilized to learn functions and models; and (4) the visualization and recommendation stage, in which visualization tools are first adopted to provide end-users with an interpretation of their consumption patterns, and specific recommendations or directives are then derived in order to promote energy efficiency behaviors. Since the energy saving application is very relevant, we focus in this paper on studying how to improve systems developed in this direction along with related applications.
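The pre-processing stage named above (cleaning, resampling and normalization) can be sketched in a few lines. This is an illustrative example under stated assumptions, not the pipeline of any surveyed system: readings are assumed to be `(timestamp_seconds, power_watts)` pairs, cleaning is assumed to mean dropping missing or negative values, resampling is a window mean, and normalization is min-max scaling. All function names are hypothetical.

```python
def clean(readings):
    """Drop readings whose power value is missing (None) or negative."""
    return [(t, p) for t, p in readings if p is not None and p >= 0]


def resample_mean(readings, window_s):
    """Average readings inside fixed windows of window_s seconds.

    Each output timestamp is the start of its window.
    """
    buckets = {}
    for t, p in readings:
        start = (t // window_s) * window_s
        buckets.setdefault(start, []).append(p)
    return [(start, sum(v) / len(v)) for start, v in sorted(buckets.items())]


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = [(0, 100.0), (30, None), (45, 300.0), (60, -5.0), (90, 200.0)]
    cleaned = clean(raw)
    per_minute = resample_mean(cleaned, 60)
    print(per_minute)
    print(min_max_normalize([p for _, p in cleaned]))
```

Window means are only one option; a dataset with event labels may call for keeping the raw resolution around events instead.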
A2. Appliance recognition:
Appliance recognition systems can help detect the operating conditions of devices using collected power usage patterns, and recognize the nature of each appliance [71]. In [72], a model was designed to detect device activity and then to associate activities with devices using collected data. Analyzing power signals and checking relations among activities can assist in detecting unattended devices, which use energy without taking part in the household's activities. As in [73], in order to fit realistic conditions, experiments are usually conducted on a set of building power consumption databases, such as ACS-F1, PLAID, BLUED and UK-DALE.
A3. Occupancy detection:
Solutions presented in this area detect individuals' occupancy in each specific part of a building based on power consumption profiles, as well as other environmental specifications, such as the temperature, humidity, luminosity and carbon dioxide emissions [74]. Dataset patterns are inspected before using ML approaches to derive the occupancy of the monitored part. Generally, occupancy is detected in two stages: (i) the presence or absence of individuals is investigated; and (ii) the number of individuals in the monitored building/room is then calculated [75, 76, 77]. In [78], a set of ML models as well as their boosting forms are developed and tested to detect occupancy using collected data from the AMPds2 measurement campaign.
A4. Preference detection:
Methods described in this class deal with evaluating individual preferences by analyzing energy usage profiles. Most approaches treat thermal comfort, although there are other approaches that address visual comfort. Works released in this area investigate information-driven methodologies from an ML point of view and have yielded approaches that determine the preferences (e.g. the habits related to appliance usage) either through getting reports from individuals, i.e. information labeling, or via observing the historic behavior of end-users to infer (in a straightforward manner) their consumption priorities or the contexts that satisfy their well-being [38, 79].
A5. Energy disaggregation:
Energy disaggregation is the problem of separating the overall power consumption record into particular signals, each of which represents the individual consumption of one electrical device [80, 81, 82, 83]. This is valuable since obtaining the separated power consumption of each appliance helps individuals save energy and provides consumers with indications of appropriate actions to take [84]. Most existing energy disaggregation frameworks, which address the problem of non-intrusive load monitoring (NILM), attempt to separate the overall energy consumption without utilizing separate meters for each appliance [85, 86, 87, 88, 89]. For this specific application, the REDD, BERDS, REFIT, AMPds1 and AMPds2 datasets are among the best-known repositories used for energy disaggregation.
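To make the disaggregation problem concrete, the sketch below uses a brute-force combinatorial search: it picks the subset of appliances whose rated powers sum closest to a single aggregate reading. This is a deliberately simple stand-in chosen for illustration, not the method of any framework cited above; the appliance names and rated powers are hypothetical, and real NILM systems work on time series rather than on one reading.

```python
from itertools import combinations


def disaggregate(aggregate_w, rated_powers):
    """Return (set of appliance names, absolute error in watts).

    rated_powers maps an appliance name to its assumed steady-state power.
    The empty set (everything off) is the starting candidate.
    """
    names = sorted(rated_powers)
    best, best_err = (), abs(aggregate_w)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            total = sum(rated_powers[n] for n in combo)
            err = abs(aggregate_w - total)
            if err < best_err:
                best, best_err = combo, err
    return set(best), best_err


if __name__ == "__main__":
    # Hypothetical rated powers in watts.
    rated = {"fridge": 150, "kettle": 2000, "tv": 100}
    print(disaggregate(2160, rated))
```

The search visits every subset, so its cost doubles with each added appliance; it is only practical for a handful of devices, which is one reason appliance-level ground truth in datasets matters for evaluating more scalable methods.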
A6. Demand prediction:
ML algorithms generate precise power demand forecasts and they can be selected by public authorities and project managers implementing energy-efficiency procedures [90, 91, 92, 93, 94]. For domestic households, academic and industrial buildings, if the power demand can be predicted using ML strategies, directives and mechanisms can be established in advance with a view to reducing the load consumption of equipment and appliances inside these infrastructures [95, 96, 97, 98]. Moreover, even if most of the databases presented above (Section 2.1) are used for energy forecasting, the literature also contains other datasets that are designed only for the specific problem of load and energy price forecasting, such as GEFCom2012 [99] and GEFCom2014 [100].
A7. Anomaly detection:
With the progressive widespread use of smart meters and smart sensors to monitor load usage in households, the utilization of power consumption observations as a means to detect abnormal usage of energy is highly attractive. Specifically, early detection approaches can be deployed to identify a large set of failures. In addition, recent works illustrate that, for example, anomalies in lighting appliances can be responsible for 2–11% of the whole power consumption of households and commercial structures [101]. Furthermore, detecting faults or anomalies can permit analysts to comprehend the energy consumption behavior of end-users and to be aware of unpredictable energy usage values [102, 103]. Various data mining approaches have been explored and deployed to detect anomalous events during energy usage [104, 105, 106, 107, 108, 109]. In addition, it is worth mentioning that there is an absence of annotated datasets dedicated to power consumption anomaly detection.

However, for a dataset to be correctly and efficiently used for a specific application, it should meet some specific requirements. For energy disaggregation, datasets should include both aggregated and appliance-level consumption fingerprints, so that the results obtained from disaggregation solutions can be compared with individual patterns. To conduct user preference detection or even occupancy detection, datasets should encompass appliance-level power consumption, because it is difficult or even impossible to infer user preferences from aggregated data. In addition, for occupancy detection, it is also required that consumption and ambient conditions be gleaned from individual appliances and from various parts of the building. For anomaly detection, it is of utmost importance that the dataset includes labels annotating normal and anomalous consumption footprints in order to train developed algorithms. 
Lastly, for energy demandprediction, collecting power consumption at appliance-level or aggregated level will be appreciated, however, thecollection period should be long to be useful.
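The requirements above can be read as a simple rule set. The following Python sketch is only an illustration under stated assumptions: the `DatasetInfo` fields, the function name and the one-year threshold used for a "long" collection period are hypothetical and are not taken from any of the surveyed datasets.

```python
from dataclasses import dataclass

# Assumed threshold for a "long" campaign; the text does not give a number.
MIN_FORECAST_DAYS = 365


@dataclass
class DatasetInfo:
    """Hypothetical metadata record describing a power consumption dataset."""
    name: str
    has_aggregate: bool        # whole-building consumption is available
    has_appliance_level: bool  # sub-metered appliance consumption is available
    has_ambient_data: bool     # ambient conditions from parts of the building
    has_anomaly_labels: bool   # normal/anomalous annotations are available
    duration_days: int         # length of the collection campaign


def suitable_applications(d):
    """Return the applications whose stated requirements the dataset meets."""
    apps = []
    if d.has_aggregate and d.has_appliance_level:
        apps.append("energy disaggregation")
    if d.has_appliance_level:
        apps.append("user preference detection")
    if d.has_appliance_level and d.has_ambient_data:
        apps.append("occupancy detection")
    if d.has_anomaly_labels:
        apps.append("anomaly detection")
    has_power = d.has_aggregate or d.has_appliance_level
    if has_power and d.duration_days >= MIN_FORECAST_DAYS:
        apps.append("energy demand prediction")
    return apps
```

For instance, a made-up dataset with aggregated and appliance-level readings collected over 400 days, but without ambient data or labels, would qualify for energy disaggregation, user preference detection and energy demand prediction only.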
Aiming to extract representative outputs and relevant interpretations, a deep comparison study of existing building power consumption datasets is conducted in this section. Various dataset properties are investigated, which have a great importance when collecting data for developing energy efficiency solutions. Table 1 presents a comparative investigation of existing power consumption datasets. The analysis is built based on various characteristics that were collected in each dataset, including the region and period of collection, number of monitored houses, number of monitored appliances per house, collected features, sampling rate and release year. Additionally, we check and compare collected features for each database.
Data collection platforms used to glean large-scale energy consumption fingerprints significantly impact energy efficiency systems. Specifically, sensing devices and attached platforms play a big role in gathering data and safely storing them in appropriate databases. In this subsection, we focus on inspecting the different platform architectures used in the literature to collect energy consumption datasets and their properties, including wireless capability, data logging process and data storage. In addition, because of the nature of the collected data and their public accessibility, privacy concerns are of utmost importance when producing datasets. Specifically, transmitting and sharing individuals’ real-time power usage footprints, and further their identities, can be quite harmful. To that end, it is important to investigate whether the connections to the servers are secure in the presented dataset platforms. It is worth mentioning that in this section we focus on analyzing hardware architectures and related modules only for those datasets from Table 1 that provide a description of their implemented platforms.

Table 1: Features comparison of existing building power consumption datasets.

No.  Dataset      Location           Period         Houses  Appliances  Features              Sampling  Applications  Release
…    …            …                  …              …       …           …                     10 sec    A2            2013
13   AMPds1 [48]  Vancouver, Canada  1 year         1       21          I, V, P, Q, S, pf, F  1 min     A1, A2, A5    2013
14   BERDS [67]   Berkeley, USA      1 year         /       4 groups    P, Q, S               20 sec    A1            2013
15   ECODS [55]   Switzerland        8 months       6       /           I, V, Φ               …         …             …
…    …            …                  …              …       …           … ($)                 8 sec     A5            2015
24   OCTES [46]   Scotland, UK       4 - 13 months  33      Agg         P, EC ($)             …         …             …

In [43], a power consumption monitoring and feedback platform is deployed, which is based on the use of sensors and a notebook for recording data, storing them in a MongoDB database, performing calculations and providing feedback to end-users. In [44], readings from several smart appliances are collected and transmitted using a commercial communication gateway called the Vera3 smart home controller. The latter uses an encryption protocol to transmit data before their storage in a MySQL database. In [49], electricity footprints are gleaned using industrial meters and transmitted using a commercial platform named Obvius AcquiSuite EMB A8810, which includes many features, among them security provision. After that, they are stored offsite on a MySQL database server. In [54], platforms based on Raspberry Pi or BeagleBone along with a Plugwise Basic kit are used, in which data collected from sensing outlets are transmitted via a Zigbee network. Collected data are then stored remotely on a MySQL server without considering privacy concerns. In [62], a wireless plug monitoring device with an off-the-shelf system is used to collect power consumption data before transmitting them to a central server.
To preserve the privacy of end-users, the REDD dataset focused only on hiding the identity of end-users, without deploying any secure protocol for data transmission. In [21], power consumption readings of several appliances are wirelessly gathered using a data acquisition platform based on a Raspberry Pi 2B. Then, data are locally stored on a USB drive. In [61], a Nanode platform is used to wirelessly collect consumption data from individual appliance monitors and current transformers. Subsequently, the gleaned data are stored in a Nanode base station. It is worth mentioning that privacy issues have not been considered. In [111], consumption records are acquired using a commercial plug, namely ENERTALK PLUG, which includes a microcontroller unit to process and save them in a device storage unit. After that, they are wirelessly transmitted to a data collector server. Finally, data are saved on a NoSQL Hadoop database server.

In this framework, power consumption is measured using sub-metering components such as the NodeMCU and the SEN-11005 current transformer. Furthermore, occupancy patterns, luminosity, temperature and humidity data are also recorded using smart sensors and then transmitted wirelessly using the Raspberry Pi 4 Model B platform. The latter includes a NoSQL CouchDB server that is used to store the gathered data using the JavaScript Object Notation (JSON). JSON is a widely used text format for data exchange, which keeps the data structure without adding notation overhead. Table 2 summarizes the properties of the hardware platforms used to collect different datasets, including wireless capability, data logging process, data storage and privacy consideration.

Table 2: Example of data collection platforms and their properties used in different datasets.

Dataset   Platform                                   Wireless capability  Data logging  Data storage               Privacy consideration
REDD      Laptop + smart meters                      yes                  -             Hard drive                 no
GREEND    Raspberry/BeagleBone                       yes                  JSON          MySQL server               no
SustData  Laptop + smart meters                      no                   JSON          MongoDB server             no
REFIT     Vera3 smart home controller + smart plugs  yes                  JSON          MySQL server               yes
AMPDS     Obvius AcquiSuite EMB A8810                no                   SQL           MySQL server               yes
RAE       Raspberry Pi 2B + sub-meters               yes                  XML           Local storage (USB drive)  no
UK-DALE   Nanode (Atmel ATmega328P)                  yes                  JSON          Nanode base station        no
ENERTALK  ENERTALK PLUG                              yes                  Hadoop        NoSQL Hadoop database      no
QUD       NodeMCU + Raspberry Pi 4B + smart sensors  yes                  JSON          NoSQL CouchDB server       no
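As a minimal sketch of how one multi-sensor reading could be logged as JSON before being stored in a document database such as the CouchDB server mentioned above, the following Python snippet builds and serializes a single record. All field names and the helper functions are hypothetical; the actual QUD record schema is not described in this section, and no database call is shown.

```python
import json
from datetime import datetime, timezone


def make_reading(device_id, power_w, occupied, temperature_c,
                 humidity_pct, light_lux, when):
    """Build one record; every field name here is an illustrative assumption."""
    return {
        "device_id": device_id,
        "timestamp": when.astimezone(timezone.utc).isoformat(),
        "power_w": power_w,
        "occupied": occupied,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        "light_lux": light_lux,
    }


def encode(reading):
    """Serialize a record to a JSON string (keys sorted for stable output)."""
    return json.dumps(reading, sort_keys=True)


def decode(text):
    """Parse a JSON string back into a Python dictionary."""
    return json.loads(text)
```

Because JSON preserves the nesting and the basic types of the record, decoding an encoded reading returns the same dictionary, which is what makes the format convenient for exchanging sensor data between the collection nodes and the storage server.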
Under this framework, a large number of building power consumption databases have been described, reviewed and evaluated according to different parameters, as indicated in Table 1. In what follows, we derive the pros and cons of each dataset based on the preceding discussion. This can adequately guide us in mapping recommendations for enriching and improving energy consumption databases.

• The biggest databases in terms of length and period of study are UMSM, HES, SustData and REFIT. However, in the case of HES, the observation period is too short and the sampling interval of 2 minutes is rather coarse. The same holds for UMSM, where data are gathered at a sampling interval of 1 min. Therefore, these datasets are inadequate for energy disaggregation, as it is difficult to differentiate between individual devices and events. In contrast, these two repositories provide property data about the monitored homes, among others the nature of the building, its size, and the number of rooms and occupants. Moreover, even though SustData and REFIT use sampling intervals of 8 sec and 10 sec, this is still not enough for real-time monitoring.
• In some databases, e.g. PLAID, REDD and BLUED, high-frequency monitoring is carried out for only a small number of houses. This draws upon the prerequisites of energy disaggregation, where comprehensive characteristics capturing transient behavior can be extracted when high-frequency collection is used.
• The majority of databases were gathered in the USA and Canada, under a 120 V voltage, and in European nations, under 230 V. It can be deduced from Table 1 that the existing databases were collected in 13 different countries located in four continents, namely America, Europe, Asia and Australia. In this context, these real databases have been produced in distinct climate zones, which cover humid regions (UMSM, REDD, BLUED), humid subtropical (IAWE), marine west coast climate (UK-DALE, HES, AMPds1, REFIT, Tracebase, BLOND, IHEPCDS and OCTES), Mediterranean climate (BERDS) and arid zones (ECB and QUD). However, no databases from African countries have been gathered under this investigation, since no work in the literature treats this topic in such countries. Moreover, to the best of the authors’ knowledge, QUD is the first dataset in the Middle East, where a 240 V voltage is ordinarily used. Also, some collected particularities, for instance the climate and environmental data, depend on the location of the monitoring campaign.
• The number and nature of monitored appliances, just like the number of observed houses, essentially constrain the final usage of databases. In particular, a high number of houses and appliances is required for statistical inspections. In this case, UMSM, Dataport, OCTES, TraceBase, REFIT and HES are the most suitable databases. In addition, some datasets monitor various houses over different time intervals, which makes comparing different homes difficult or even impractical. Moreover, the setting in which domestic equipment is used throughout the day is a basic factor for analyzing the complexity of the usage. Accordingly, experimental campaigns should be conducted in real conditions, such as households, laboratories or offices, as opposed to simulated environments.
• Some databases collect short-term energy consumption and only deliver records of real power; this is the case of COOLL, PSD, ACS-F1 and BLUED. Consequently, seasonal energy usage behavior cannot be captured over such short periods. In this respect, these databases are not suitable for tracking the power consumption behavior of end-users.
• A number of databases, among them UMSM, IAWE, ACS-F1, AMPds1, RAE and DRED, provide a set of electrical parameters, including I, V, P, Q, E, f and φ. In addition to these records, other conditions such as T, O and L are also reported in the QUD and REFIT datasets. The latter also provides analytic information related to the monitored electrical appliances and integrates statistics about daily activities in dwelling and residential environments. This endows a better interpretive depth compared to similar repositories (REDD, BLUED, GREEND).
• Most of the studied datasets did not capture exogenous conditions, such as outdoor temperature and humidity, which can significantly affect energy consumption. However, while the REFIT dataset has similar properties to the OCTES and ACS-F1 datasets, it differs in that it adds other environmental data, including temperature, light and motion patterns. In addition, household surveys are also reported, which include the surface area, age, heating system, insulation and nature of the buildings, along with other data specifying the number of individuals, their occupation and age. In this context, quantitative statistics gleaned from the surveys together with occupants’ statistics offer researchers more possibilities to study the influence of other parameters on energy consumption.
• There is a lack of publicly available annotated power consumption datasets for training anomaly detection algorithms, in which power consumption observations are clearly labeled as normal or anomalous. Specifically, all the datasets investigated in this framework except QUD do not include labels that identify normal or abnormal consumption, and thereby they can only be used with unsupervised anomaly detection algorithms, which do not require annotated data.
• Privacy and security concerns have not been seriously considered in most of the existing datasets. This is due to the fact that conventional meters had to be physically accessed and registered power consumption over longer time periods (i.e. real-time monitoring was not considered).
Based on the pros and cons of the state-of-the-art datasets presented in the previous section, a measurement campaign has been conducted in the Qatar University energy lab to collect the QUD repository. Specifically, in order to compensate for the shortage of appliance-level datasets dedicated to energy efficiency and anomaly detection in power consumption, a real-time micro-moment laboratory has been developed to gather accurate power usage footprints. Put simply, QUD is a set of consumption records from different installed electrical devices (e.g. air conditioner, heating system, desktop and light lamps) in addition to contextual data, including humidity, temperature, room occupancy and ambient light intensity. To the best of our knowledge, QUD is the first dataset in the Middle East, in which consumption data are collected at the ordinary 240 V voltage. This dataset has multiple usage scenarios, such as detecting consumption abnormalities, testing recommender systems and assessing innovative visualization tools. Moreover, it is worth noting that QUD is among the first annotated repositories dedicated to anomaly detection in power consumption.

Therefore, the time-series data representing the power consumption footprints of two appliances are recorded along with the corresponding cubicle occupancy, indoor temperature, indoor humidity and luminosity. In order to label the QUD consumption observations, the micro-moment paradigm is used, which helps in identifying the moments of good or anomalous usage. Specifically, micro-moments are deployed to derive accurate statistics about consumers [10, 34]. In this dataset, the power consumption observations are labeled using five micro-moment classes according to a set of criteria defined for the considered appliance. These five micro-moments are defined as “good usage”, “turn on”, “turn off”, “excessive power consumption” and “consumption when outside”. The last two micro-moments represent anomalous consumption behaviors that lead to much wasted energy. Table 3 describes the micro-moment classes and labels used in QUD (QUD can be accessed through http://em3.i-know.org/datasets/ ). In addition, it is worth mentioning that the micro-moment “consumption when outside” is limited to a set of appliances, such as air conditioners, televisions, light lamps, desktops/laptops and fans, for which the end-user should be present during operation for the consumption not to be considered anomalous [112].

Figure 3: Future directions to improve both datasets collection and exploitation. Improving datasets collection: multi-modal data collection, smart IoT data collection, privacy and security consideration, low-cost hardware platforms. Improving datasets exploitation: ML for large-scale datasets, innovative visualizations, mobile recommender systems.

Table 3: Micro-moments assumption and labeling

Micro-moment              Label  Description
Good usage                0      Non-excessive usage
Turn on                   1      Switching on a device
Turn off                  2      Switching off a device
Excessive consumption     3      Consumption > 95% of device’s maximum active power consumption level
Consumption when outside  4      Device consumption without the presence of the end-user
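The labeling rules of Table 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to annotate QUD: the on/off threshold, the precedence between classes when several rules apply, and the choice to label an idle device as good usage are all assumptions made here.

```python
GOOD_USAGE, TURN_ON, TURN_OFF, EXCESSIVE, OUTSIDE = 0, 1, 2, 3, 4


def label_micro_moments(power, occupied, max_power,
                        on_threshold=0.0, excess_ratio=0.95):
    """Assign one micro-moment label (Table 3) to each power sample.

    power        -- power samples of one appliance, in watts
    occupied     -- booleans, True when the end-user is present
    max_power    -- maximum active power level of the device, in watts
    on_threshold -- assumed power level above which the device counts as on
    excess_ratio -- 0.95 follows the "> 95% of maximum" rule of Table 3
    """
    labels = []
    prev_on = False
    for p, present in zip(power, occupied):
        is_on = p > on_threshold
        if is_on and not prev_on:
            label = TURN_ON
        elif prev_on and not is_on:
            label = TURN_OFF
        elif is_on and p > excess_ratio * max_power:
            label = EXCESSIVE
        elif is_on and not present:
            label = OUTSIDE
        else:
            # Assumption: normal operation and an idle device are both "good".
            label = GOOD_USAGE
        labels.append(label)
        prev_on = is_on
    return labels
```

With a device whose maximum power is 100 W, the samples 0, 50, 60, 99, 60, 0, 0 W and an end-user who leaves before the fifth sample are labeled 0, 1, 0, 3, 4, 2, 0. For appliances outside the set listed above (i.e. those that may legitimately run unattended), the occupancy series can simply be passed as all True so that class 4 is never produced.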
After analyzing, comparing and capturing the pros and cons of existing datasets, a set of important orientations that can improve data collection and enrich datasets’ content are identified. In addition, other directions to improve datasets exploitation are described as well. Figure 3 summarizes the future directions that are identified to improve both datasets collection and exploitation.
In order to develop powerful energy efficiency systems, it is of paramount importance to improve dataset collection procedures and hence enhance the content of collected data. In this respect, the following recommendations and directions can be established:
Multi-modal data collection simply means collecting more than one type of data to accomplish an efficient energy saving task or other related applications. Specifically, power consumption in buildings depends on multiple factors, which should be gleaned together with power consumption footprints in order to design comprehensive datasets [113, 114]. Figure 4 summarizes the principal parameters impacting the power consumption in buildings and contributing to the multi-modal data collection.
D1. Occupancy patterns:
Households use more energy when they are occupied. Even though this may appear glaringly evident, collecting occupancy data is a serious matter that must be considered when searching for wasteful energy aspects in households. Specifically, these structures are expected to consume less power when unoccupied. Individuals in households influence power consumption for the most part via lighting, cooling, heating and other plug loads. Analyzing power consumption over the course of the day demonstrates a direct relationship between occupancy and power usage. When individuals are in a household, different rooms are cooled or heated to a comfortable temperature. Of course, normal day-to-day activities also require power. The effect of individuals’ energy use in a household is the reason we underscore the importance of turning off unused appliances or other devices in unoccupied rooms [115, 116]. In this regard, the use of occupancy sensors is highly recommended in households or other buildings, such as offices, laboratories or campus buildings, to sense whether someone is present and then turn off the appliances accordingly. In this way, a lot of energy can be saved when the absence of individuals is confirmed.

Figure 4: Principal factors impacting the power consumption in buildings and contributing to the multi-modal data collection: D1. Occupancy patterns (presence/absence of end-users); D2. User behavior (preferences and habits of end-users); D3. Weather data (outdoor and indoor ambient conditions); D4. Energy cost (direct information on energy price).
D2. User behavior:
Comprehending and improving individual power consumption behavior is among the successful approaches to reducing energy demand and encouraging energy saving. In fact, user behavior can be responsible for about 20–50% of the consumption level [117, 118]. Therefore, collecting data about end-users’ behaviors and incorporating them into the energy efficiency model can significantly decrease wasted energy. This can be done by gathering information related to their preferences and habits [119, 120, 121].
D3. Weather data:
The relation between weather conditions and power consumption has been demonstrated in several works [122, 123, 124]. By way of example, peak energy demand during heat waves is widely observed in many hot countries. For that reason, gathering weather data is regarded as crucial when investigating user behavior. Moreover, existing and newly built households will certainly undergo the impact of climate change. Accordingly, the collection and measurement of new energy consumption databases for these houses ought to consider weather patterns that integrate certain repercussions of climate change, rather than only considering historical climate information [125, 126, 127].
D4. Energy cost:
Estimating the cost of household power consumption and providing this data to end-users can motivate them to improve their behavior [128]. Forecasting the user’s electricity bill and integrating energy price signals into energy efficiency applications can effectively increase power saving [129, 130, 131], since it helps the consumer to cognitively bridge the gap between consumption and cost. Moreover, collecting energy price profiles at the appliance level makes it unambiguous for the user which appliance contributes most to the cost. As a result, the consumer can act accordingly in order to reduce wasted energy.
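To make the link between appliance-level consumption and cost concrete, the following Python sketch converts equally spaced power samples into energy, E = (sum of P_i) × Δt / 3.6e6 in kWh with P_i in watts and Δt in seconds, and then into a cost per appliance. The flat tariff, the appliance names and the function names are illustrative assumptions; real tariffs are often time-dependent.

```python
def energy_kwh(power_w, interval_s):
    """Energy of equally spaced power samples (watts), returned in kWh."""
    # watt-seconds (joules) divided by 3.6e6 gives kilowatt-hours
    return sum(power_w) * interval_s / 3_600_000.0


def cost_per_appliance(readings, interval_s, price_per_kwh):
    """Map each appliance name to the cost of its consumption.

    readings      -- dict: appliance name -> list of power samples in watts
    interval_s    -- sampling interval in seconds
    price_per_kwh -- assumed flat energy price per kWh
    """
    return {
        name: energy_kwh(samples, interval_s) * price_per_kwh
        for name, samples in readings.items()
    }


def most_expensive(costs):
    """Return the appliance that contributes most to the bill."""
    return max(costs, key=costs.get)
```

As a worked example, a hypothetical 1000 W heater sampled every 60 s for one hour consumes 1 kWh, whereas a 100 W lamp over the same hour consumes 0.1 kWh; at an assumed price of 0.2 per kWh, the heater costs 0.2 and the lamp 0.02, so the heater is reported as the appliance that raises the bill most.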
Conventional meters are not able to gather granular, device-level data; however, this has become possible today with smart meters. In this line, in order to meet target requirements for data accuracy and to further support real-time data collection and analysis, deploying smart meters and Internet of things (IoT) sensors to ensure a smart IoT data collection strategy is of paramount importance [132]. This helps in optimizing communication, storage and computing resources.
In order to reduce the cost of dataset collection, the use of hardware platforms that offer more cost-effective and powerful alternatives for processing and transmitting collected data is a high priority, such as the Raspberry Pi 4 (RPI4) Model B [133], ODROID-XU4 [134] and Jetson TX1 [135]. Those platforms can monitor energy consumption data along with collecting other essential contextual information, which ultimately results in a larger pool of data.
To preserve end-users’ privacy, power consumption footprints and end-users’ personal information should be protected. Personal data related to end-user-specific power consumption patterns can be exploited to identify and monitor behavior patterns inside buildings (households or public structures). This is possible since electrical devices, e.g. the microwave, air conditioner, washing machine, dishwasher, etc., can be detected and recognized from their power consumption fingerprints [2]. Therefore, personal data related to consumption signatures may be used to carry out real-time surveillance of end-users. In this regard, the data collection process must encourage producing challenging datasets and make power consumption statistics available to end-users and energy providers while respecting end-users’ personal privacy and security. To that end, adopting robust techniques to protect or remove personal information is a must, including encryption, steganography and aggregation.
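Of the three techniques named above, aggregation is the simplest to illustrate. The Python sketch below sums the series of several households sample by sample and then averages the total over coarser, non-overlapping time windows, so that individual appliance signatures become harder to recognize. The function names and the input layout are assumptions made for this example, and aggregation alone should not be read as a formal privacy guarantee.

```python
def coarsen(samples, window):
    """Average consecutive samples in non-overlapping windows.

    The last window may be shorter when len(samples) is not a multiple
    of `window`; it is averaged over the samples it actually contains.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        out.append(sum(chunk) / len(chunk))
    return out


def aggregate_households(households, window):
    """Sum equally long household series, then coarsen the total in time.

    households -- dict: household id -> list of power samples in watts
    window     -- number of consecutive samples merged into one value
    """
    total = [sum(values) for values in zip(*households.values())]
    return coarsen(total, window)
```

For two hypothetical households with samples 100, 200, 300, 400 W and 0, 100, 0, 100 W, the summed series is 100, 300, 300, 500 W, and a window of two samples yields 200 W and 400 W. The choice of window trades privacy against usefulness: wider windows hide more appliance events but also remove the detail needed for tasks such as energy disaggregation.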
Almost all energy efficiency systems are built and validated using energy consumption datasets, which makes the latter very important. Further, with the increasing amount of data collected in each database, the need for advanced solutions that can extract comprehensive information is becoming inevitable [136]. In this section, we present three main directions that can be investigated to improve energy saving initiatives. It is worth mentioning that although the following directions are presented from a consumer’s perspective, they are valuable for both consumers and energy providers. Specifically, they are generally developed by the latter and deployed for the benefit of end-users to help them optimize their energy usage. In addition, as discussed in Section 1, consumers are responsible for wasting more than 20% of the total energy consumed in buildings [24, 25, 26].
Lastly, mobile smart devices are becoming an indispensable part of our daily life. Unlike earlier mobile phones, which provided limited functionality, smartphones can perform a variety of very useful tasks. With the widespread usage of smartphones and the fast growth of internet and network facilities, a massive amount of data is produced. Consequently, modern societies have entered the age of Big Data by successfully discovering users’ possible demands and preferences. This has raised the need for data scientists and energy management stakeholders to conduct studies on mobile recommender systems for managing users’ energy efficiency [137].

Recommendation systems are commonly deployed to enhance the use of smartphones and to assist in dealing with the large amount of data by generating appropriate advice using recommendation schemes and contextual information. In this regard, the role of recommender systems will be essential to promote energy efficiency and help end-users understand and improve their consumption footprints [138]. More specifically, every particular recommender application is generally designed with an explicit context in mind, with the aim of mitigating the data overload caused by large-scale power consumption datasets. The effectiveness of a mobile recommender system has been demonstrated through real-use applications in academic buildings [137], in which a context-aware recommender app is developed to support end-users in transforming their energy consumption habits. Furthermore, the architecture of recommendation systems, which is usually based on interaction models, graphical user interfaces and recommendation engines, makes them productive and useful for energy efficiency applications [139]. To summarize, the use of mobile recommender systems is recommended to improve the exploitation of power consumption datasets via:

• Developing explainable recommender systems, which can be very helpful in improving data exploitation with a view to replacing inefficient energy habits with efficient ones. An explainable recommender system aims at providing end-users with tailored recommendations, followed by explanations about them [140]. Explanations refer to the motivations behind the recommendation or to the benefits of following the recommended action or advice. They can enhance the persuasiveness of the system and end-users’ understanding and satisfaction, and provide an immediate reward to them.
• Developing intelligent mobile home monitoring systems that use collected data to provide information and monitoring options to end-users, helping them control their load usage, visualize consumption statistics and compare them to those of other users, and further predict the overall charge of monthly bills [141].
Visualization is seen as the most effective way to assimilate increasingly large datasets with the aim of interactively and clearly conveying insights to end-users, consumers and stakeholders in general. Recent tools, methods and software leveraged for the visualization of energy consumption require further improvements to remain relevant in a world moving towards lower carbon emissions. Moreover, they are required to raise awareness of energy-consuming behavior in an approachable and stimulating way. In this context, we present in this section an example of a novel visualization approach based on micro-moments analysis. Figure 5 displays the time-series energy consumption of a television and its micro-moments scatter plot at sampling intervals of 3 min, recorded in the DRED dataset. This novel visualization strategy is presented as an example, in which the energy usage micro-moments of 2 days are captured and plotted. It is used to display good usage (class: 0), excessive usage (class: 3) and consumption while outside (class: 4). Users can seamlessly obtain the plots at different sampling rates, starting from milliseconds.

Figure 5: Time-series power consumption of a television and its micro-moments scatter plot from DRED: top) time-series power consumption, and bottom) micro-moments scatter plot at a sampling rate of 3 min. Both panels plot power consumption (watts) against time and date (min); the legend marks micro-moment classes 0, 3 and 4.

As can be deduced, tracing micro-moments over time facilitates identifying moments of abnormal consumption and then makes it easy to establish precise guidelines that help reduce energy waste. Moreover, this helps end-users understand their consumption footprints, increases their awareness and hence prompts them to improve their behavior through the use of tailored recommendations. In addition, anomalous consumption behaviors can be identified when an adequate visualization tool is adopted, e.g. the micro-moments visualization, and hence end-users can improve their behaviors based on the detected anomaly. Moreover, it is worth noting that the use of the micro-moments paradigm to detect anomalous consumption can be extended to identify other kinds of anomalies, e.g. detecting abnormal consumption of an air conditioner while doors/windows are open by considering other information sources. Therefore, end-users will be provided with appropriate notifications and advice, i.e. close doors/windows to reduce wasted energy.

In addition, a set of valuable recommendations and future directions towards designing effective visualizations aiming to increase end-users’ energy awareness is summarized as follows:
• Visualizations need to catch the attention of their users by using bright colors, contrasts and varied views in which some things change constantly. More importantly, they should use colors that are legible for people with color vision deficiencies [142].
• Visualizations also need to employ comparisons between consumption in different time periods to stimulate energy users’ reflection and learning. Individuals are highly motivated to save power by comparing their current consumption to their own previous consumption.
• Long time-series aggregated systems for visualizing energy consumption are better avoided. By nature, the human mind cannot always memorize everything one does at each second or which appliance one uses most in the daily routine. Hence, many studies have concluded that:
  a. Collecting appliance-level consumption data is highly needed for consumers to cognitively decode their consumption data and finally decide to change their behavior [143].
  b. Instantaneous short-time intervals are the best way to help consumers easily figure out at what instant their energy cost increased [144]. This makes it possible for users to improve their behavior and to notice its direct effect.
• 3D visualizations of households with real-time energy consumption statistics can impart a better contextual understanding to homeowners and help them control the practices shaping their consumption. Relatedly, it is also crucial to support this by exploring the deployment of other emerging technologies, including virtual and augmented reality, interactive visualizations and wall-sized presentations.
Another important direction that can improve the quality and exploitation of power consumption datasets relies on deploying ML algorithms, which can help significantly in reducing energy consumption and thereby decreasing energy costs and carbon dioxide emissions [145, 146]. Therefore, fostering novel advances and addressing challenges with regard to ML models is of paramount importance. In this context, several directions can be identified in which ML plays a major role when shifting to a more sustainable and energy-efficient environment, among them:

• The use of generative models, such as generative adversarial networks (GANs), which can tremendously improve the quality of collected data by completing incomplete power consumption signals (due to data loss during the collection step), hence leading to a better exploitation of the produced datasets in different applications [147].
• The use of deep learning models to identify consumption anomalies by classifying the micro-moment classes of end-users’ power consumption [34]. These algorithms assist in analyzing consumption footprints to detect abnormal consumption related to “excessive consumption” and “consumption when outside”. After defining the micro-moments described in section 4.2.2, deep neural networks (DNN) and other ML algorithms with different configuration parameters could be deployed, including logistic regression, linear discriminant analysis (LDA), support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT) and ensemble bagged tree (EBT). The selection of the appropriate ML model is mainly based on ensuring the best compromise between identification performance and computational complexity. This results in a better exploitation of the collected datasets for identifying abnormal power consumption.
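As a minimal illustration of classifying micro-moments with one of the listed algorithms, the sketch below implements a plain K-nearest neighbors (KNN) classifier in Python using only the standard library. The two features (power normalized by the device maximum, and an occupancy flag) and the toy training points are assumptions made for this example; they are not taken from QUD or from the configuration used in [34].

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training points, using the Euclidean distance."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy training set (assumed): (normalized power, occupancy) -> micro-moment
TRAIN_X = [
    (0.20, 1), (0.30, 1), (0.25, 1),   # good usage (class 0)
    (0.97, 1), (0.99, 1), (0.98, 1),   # excessive consumption (class 3)
    (0.30, 0), (0.40, 0), (0.35, 0),   # consumption when outside (class 4)
]
TRAIN_Y = [0, 0, 0, 3, 3, 3, 4, 4, 4]
```

A query such as (0.96, 1), i.e. a device running close to its maximum while the user is present, is assigned class 3, whereas (0.33, 0) is assigned class 4. KNN needs no training phase, but every prediction scans the whole training set, which illustrates the trade-off between identification performance and computational complexity mentioned above.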
Existing building power consumption datasets have been reviewed in this work, together with their merits and drawbacks. Comprehensive comparisons between these datasets have been conducted in terms of the different factors and data collection platforms that characterize each of them. Based on the analysis gleaned from these comparisons, a novel annotated dataset has been presented, namely QUD, which can be very useful for power consumption anomaly detection since it includes labels of good and anomalous usage. Moreover, we derived recommendations and future directions for improving the quality and the content of future power consumption datasets. The following orientations have been identified:
• Adopting multi-modal data collection, which gathers various data sources because energy consumption is affected by different factors. Therefore, data should be gathered from several geographical positions with regard to ambient conditions, atmospheric and environmental resources, occupation patterns, user preferences and energy cost.
• Adopting smart IoT data collection strategies to collect data from various IoT sensors at a low sampling rate, which leads to the development of real-time and scalable energy saving systems.
• Collecting more annotated anomaly detection datasets in order to encourage testing and developing anomaly detection algorithms. Overall, detecting anomalous power consumption plays a major role in reducing wasted energy.
• Producing comprehensive databases with respect to the level of consumption (especially at the appliance level) and the duration of the collection campaign (i.e. collecting data over all seasons of the year) while respecting end-users’ personal privacy and security.
• Formulating protocols and standards to characterize building power consumption datasets, which can help establish unified metadata strategies and terminologies, facilitate comparisons, and support a rigorous understanding of the state of the art.
In addition, initiatives to improve dataset exploitation and utilization have been identified in this framework. A novel visualization strategy has been presented based on micro-moment analysis, which enables people to comprehend their own power usage footprints and accordingly interpret their electricity consumption behavior. It also helps them easily obtain statistics on their actual power consumption. Moreover, an example of using ML algorithms has been introduced to classify power consumption micro-moments and detect anomalous usage, such as “excessive consumption” or “consumption while outside”. These two behaviors are responsible for wasting a large amount of energy. Consequently, it will be part of our future work to improve dataset exploitation and help promote energy efficiency through:
• The use of novel ML algorithms, including deep learning and generative adversarial networks (GANs), which can effectively deal with imbalanced and large-scale datasets.
• The development of innovative visualization tools that can improve end-users’ comprehension of their consumption behavior. Interactive visualization tools will therefore be integrated into smart power consumption dashboards that permit individuals to engage with and understand their energy usage and translate it into positive actions throughout their everyday life.
• The deployment of explainable recommender systems in order to trigger action recommendations at the right moment, especially if appropriate hardware is used to collect and analyze data in real time.
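As a sketch of what the unified metadata recommendation could look like in practice, the record below collects the comparison criteria used in this survey (location, collection period, number of households, sampling rate, sub-metered appliances, features and release date) into one structure. The class name, field names and helper functions are hypothetical, and the two example entries use made-up values that do not describe any real dataset.

```python
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    """One record per dataset; every field name here is an illustrative assumption."""
    name: str
    location: str
    collection_days: int          # duration of the collection campaign
    households: int               # number of monitored households
    sampling_period_s: float      # seconds between two consecutive samples
    submetered_appliances: int
    features: Tuple[str, ...]     # e.g. ("active_power", "occupancy")
    release_year: int
    anomaly_labels: bool          # True if usage is annotated as normal/anomalous


def covers_full_year(record: DatasetMetadata) -> bool:
    """Check whether the campaign spans at least one year (all seasons)."""
    return record.collection_days >= 365


def compare(records: List[DatasetMetadata], field_name: str) -> List[Tuple[str, object]]:
    """Return (dataset name, value) pairs sorted by the chosen metadata field."""
    rows = [(r.name, asdict(r)[field_name]) for r in records]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical entries with made-up values, used only to exercise the schema.
EXAMPLES = [
    DatasetMetadata("dataset_a", "country_x", 400, 10, 1.0, 12,
                    ("active_power",), 2015, False),
    DatasetMetadata("dataset_b", "country_y", 90, 1, 30.0, 4,
                    ("active_power", "occupancy"), 2020, True),
]

if __name__ == "__main__":
    print(compare(EXAMPLES, "sampling_period_s"))
    print([r.name for r in EXAMPLES if covers_full_year(r)])
```

Keeping the same fields and units for every dataset is what makes a call such as `compare(records, "sampling_period_s")` meaningful across datasets; the exact field set would have to be agreed on by the community.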
Acknowledgements
This paper was made possible by National Priorities Research Program (NPRP) grant No. 10-0130-170288 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
References
[1] T. Chaudhuri, Y. C. Soh, H. Li, L. Xie, A feedforward neural network based indoor-climate control framework for thermal comfort and energy saving in buildings, Applied Energy 248 (2019) 44–53.
[2] Y. Himeur, A. Alsalemi, F. Bensaali, A. Amira, Robust event-based non-intrusive appliance recognition using multi-scale wavelet packet tree and ensemble bagging tree, Applied Energy 267 (2020) 114877.
[3] M. S. Gul, S. Patidar, Understanding the energy consumption and occupancy of a multi-purpose academic building, Energy and Buildings 87 (2015) 155–165.
[4] X. M. Zhang, K. Grolinger, M. A. M. Capretz, L. Seewald, Forecasting residential energy consumption: Single household perspective, in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018, pp. 110–117.
[5] M. Brogger, K. B. Wittchen, Estimating the energy-saving potential in national building stocks – a methodology review, Renewable and Sustainable Energy Reviews 82 (2018) 1489–1496.
[6] M. Brogger, P. Bacher, H. Madsen, K. B. Wittchen, Estimating the influence of rebound effects on the energy-saving potential in building stocks, Energy and Buildings 181 (2018) 62–74.
[7] M. A. Mohamed, Saving energy through using green rating system for building commissioning, Energy Procedia 162 (2019) 369–378, emerging and Renewable Energy: Generation and Automation.
[8] F. Sher, A. Kawai, F. Gulec, H. Sadiq, Sustainable energy saving alternatives in small buildings, Sustainable Energy Technologies and Assessments 32 (2019) 92–99.
[9] T. Jafarinejad, A. Erfani, A. Fathi, M. B. Shafii, Bi-level energy-efficient occupancy profile optimization integrated with demand-driven control strategy: University building energy saving, Sustainable Cities and Society 48 (2019) 101539.
[10] A. Alsalemi, C. Sardianos, F. Bensaali, I. Varlamis, A. Amira, G. Dimitrakopoulos, The role of micro-moments: A survey of habitual behavior change and recommender systems for energy saving, IEEE Systems Journal (2019) 1–12.
[11] W. Al-Marri, A. Al-Habaibeh, M. Watkins, An investigation into domestic energy consumption behaviour and public awareness of renewable energy in qatar, Sustainable Cities and Society 41 (2018) 639–646.
[12] S. A. Sarkodie, A. O. Crentsil, P. A. Owusu, Does energy consumption follow asymmetric behavior? an assessment of ghana’s energy sector dynamics, Science of The Total Environment 651 (2019) 2886–2898.
[13] K. Zhou, S. Yang, Understanding household energy consumption behavior: The contribution of energy big data analytics, Renewable and Sustainable Energy Reviews 56 (2016) 810–819.
[14] M. H. Ishak, I. Sipan, M. Sapri, A. H. M. Iman, D. Martin, Estimating potential saving with energy consumption behaviour model in higher education institutions, Sustainable Environment Research 26 (6) (2016) 268–273.
[15] T. Csoknyai, J. Legardeur, A. A. Akle, M. Horvath, Analysis of energy consumption profiles in residential buildings and impact assessment of a serious game on occupants’ behavior, Energy and Buildings 196 (2019) 1–20.
[16] P. V. Aubel, E. Poll, Smart metering in the netherlands: What, how, and why, International Journal of Electrical Power & Energy Systems 109 (2019) 719–725.
[17] D. B. Avancini, J. J. Rodrigues, S. G. Martins, R. A. Rabelo, J. Al-Muhtadi, P. Solic, Energy meters evolution in smart grids: A review, Journal of Cleaner Production 217 (2019) 702–715.
[18] S. Latif, A. Shabani, A. Esser, A. Martkovich, Analytics of residential electrical energy profile, in: 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), 2017, pp. 1–4.
[19] P. P. Moletsane, T. J. Motlhamme, R. Malekian, D. C. Bogatmoska, Linear regression analysis of energy consumption data for smart homes, in: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018, pp. 0395–0399.
[20] A. Muhammad Mehar, A. Qumer Gill, K. Matawie, Analytical model for residential predicting energy consumption, in: 2018 IEEE 20th Conference on Business Informatics (CBI), Vol. 02, 2018, pp. 82–88.
[21] S. Makonin, Z. J. Wang, C. Tumpach, Rae: The rainforest automation energy dataset for smart grid meter data analysis, Data 3 (1) (2018) 1–9.
[22] J. L. Ramirez-Mendiola, P. Grunewald, N. Eyre, The diversity of residential electricity demand – a comparative analysis of metered and simulated data, Energy and Buildings 151 (2017) 121–131.
[23] A. Alsalemi, Y. Himeur, F. Bensaali, A. Amira, C. Sardianos, I. Varlamis, G. Dimitrakopoulos, Achieving domestic energy efficiency using micro-moments and intelligent recommendations, IEEE Access 8 (2020) 15047–15055.
[24] K. White, R. Habib, D. J. Hardisty, How to shift consumer behaviors to be more sustainable: A literature review and guiding framework, Journal of Marketing 83 (3) (2019) 22–49.
[25] D. Ürge Vorsatz, L. F. Cabeza, S. Serrano, C. Barreneche, K. Petrichenko, Heating and cooling energy trends and drivers in buildings, Renewable and Sustainable Energy Reviews 41 (2015) 85–98.
[26] W. Al-Marri, A. Al-Habaibeh, H. Abdo, Exploring the relationship between energy cost and people’s consumption behaviour, Energy Procedia 105 (2017) 3464–3470, 8th International Conference on Applied Energy, ICAE2016, 8-11 October 2016, Beijing, China.
[27] A. Paone, J.-P. Bacher, The impact of building occupant behavior on energy efficiency and methods to influence it: A review of the state of the art, Energies 11 (4).
[28] X. Liu, N. Iftikhar, H. Huo, R. Li, P. S. Nielsen, Two approaches for synthesizing scalable residential energy consumption data, Future Generation Computer Systems 95 (2019) 586–600.
[29] Y. Guo, Z. Tan, H. Chen, G. Li, J. Wang, R. Huang, J. Liu, T. Ahmad, Deep learning-based fault diagnosis of variable refrigerant flow air-conditioning system for building energy saving, Applied Energy 225 (2018) 732–745.
[30] N.-T. Ngo, Early predicting cooling loads for energy-efficient design in office buildings by machine learning, Energy and Buildings 182 (2019) 264–273.
[31] X. Xu, W. Wang, T. Hong, J. Chen, Incorporating machine learning with building network analysis to predict multi-building energy use, Energy and Buildings 186 (2019) 80–97.
[32] J.-S. Chou, D.-S. Tran, Forecasting energy consumption time series using machine learning techniques based on usage patterns of residential householders, Energy 165 (2018) 709–726.
[33] W. Wang, T. Hong, X. Xu, J. Chen, Z. Liu, N. Xu, Forecasting district-scale energy dynamics through integrating building network and long short-term memory learning algorithm, Applied Energy 248 (2019) 217–230.
[34] A. Alsalemi, M. Ramadan, F. Bensaali, A. Amira, C. Sardianos, I. Varlamis, G. Dimitrakopoulos, Endorsing domestic energy saving behavior using micro-moment classification, Applied Energy 250 (2019) 1302–1311.
[35] A. G. Ruzzelli, C. Nicolas, A. Schoofs, G. M. P. O’Hare, Real-time recognition and profiling of appliances through a single electricity sensor, in: 2010 7th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2010, pp. 1–9.
[36] Y. Gao, A. Schay, D. Hou, Occupancy detection in smart housing using both aggregated and appliance-specific power consumption data, in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018, pp. 1296–1303.
[37] E. Sala, D. Zurita, K. Kampouropoulos, M. Delgado, L. Romeral, Occupancy forecasting for the reduction of hvac energy consumption in smart buildings, in: IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, 2016, pp. 4002–4007.
[38] S. Ahmadi-Karvigh, A. Ghahramani, B. Becerik-Gerber, L. Soibelman, One size does not fit all: Understanding user preferences for building automation systems, Energy and Buildings 145 (2017) 163–173.
[39] C. Franco, K. Nielsen, P. J. Kerstens, Uncertainty management for classification and benchmarking of energy-use preference profiles, in: 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1–8.
[40] N. Terry, J. Palmer, Household electricity survey, in: UK Data Archive Study, 2012, pp. 1–31.
[41] K. Bache, M. Lichman, Individual Household electric power consumption dataset, CA: University of California, School of Information and Computer Science, 2013.
[42] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, J. Albrecht, Smart*: An open data set and tools for enabling research in sustainable homes, in: Proceedings of the 2012 Workshop on Data Mining Applications in Sustainability (SustKDD 2012), 2012, pp. 1–6.
[43] L. Pereira, F. Quintal, R. Goncalves, N. J. Nunes, Sustdata: A public dataset for ict4s electric energy research, in: ICT for Sustainability 2014 (ICT4S-14), Atlantis Press, 2014/08.
[44] D. Murray, J. Liao, L. Stankovic, V. Stankovic, R. Hauxwell-Baldwin, C. Wilson, M. Coleman, T. Kane, S. Firth, A data management platform for personalised real-time energy feedback, in: Proceedings of the 8th International Conference on Energy Efficiency in Domestic Appliances and Lighting, 2015.
[45] O. Parson, G. Fisher, A. Hersey, N. Batra, J. Kelly, A. Singh, W. Knottenbelt, A. Rogers, Dataport and nilmtk: A building data set designed for non-intrusive load monitoring, in: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015, pp. 210–214.
[46] European union. opportunities for community groups through energy storage (octes) (2013), http://octes.oamk.fi/final/, accessed: 2019-05-03.
[47] A. Reinhardt, P. Baumann, D. Burgstahler, M. Hollick, H. Chonov, M. Werner, R. Steinmetz, On the accuracy of appliance identification based on distributed load metering data, in: 2012 Sustainable Internet and ICT for Sustainability (SustainIT), 2012, pp. 1–9.
[48] S. Makonin, F. Popowich, L. Bartram, B. Gill, I. V. Bajic, Ampds: A public dataset for load disaggregation and eco-feedback research, in: 2013 IEEE Electrical Power Energy Conference, 2013, pp. 1–6.
[49] I. V. B. Stephen Makonin, Bradley Ellert, F. Popowich, Electricity, water, and natural gas consumption of a residential house in canada from 2012 to 2014, Scientific Data 3 (180048) (2016) 1–12.
[50] Australian Energy Regulator, Electricity consumption benchmarks, data retrieved from data.gov.au, (2014).
[51] C. Holcomb, Pecan street inc.: a test-bed for nilm, in: International Workshop on Non-Intrusive Load Monitoring, ACM, New York, NY, USA, 2012, pp. 3:1–3:8.
[52] N. Saldanha, I. Beausoleil-Morrison, Measured end-use electric load profiles for 12 canadian houses at high temporal resolution, Energy and Buildings 49 (2012) 519–530.
[53] G. Johnson, I. Beausoleil-Morrison, Electrical-end-use data from 23 houses sampled each minute for simulating micro-generation systems, Applied Thermal Engineering 114 (2017) 1449–1456.
[54] A. Monacchi, D. Egarter, W. Elmenreich, S. D’Alessandro, A. M. Tonello, Greend: An energy consumption dataset of households in italy and austria, in: 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2014, pp. 511–516.
[55] C. Beckel, W. Kleiminger, R. Cicchetti, T. Staake, S. Santini, The eco data set and the performance of non-intrusive load monitoring algorithms, in: Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014). Memphis, TN, USA, ACM, 2014, pp. 80–89.
[56] N. Batra, M. Gulati, A. Singh, M. B. Srivastava, It’s different: Insights into home energy consumption in india, in: Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings, BuildSys’13, ACM, New York, NY, USA, 2013, pp. 3:1–3:8.
[57] A. S. Uttama Nambi, A. Reyes Lua, V. R. Prasad, Loced: Location-aware energy disaggregation framework, in: Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys ’15, ACM, New York, NY, USA, 2015, pp. 45–54.
[58] V. L. Chen, M. A. Delmas, S. L. Locke, A. Singh, Dataset on information strategies for energy conservation: A field experiment in india, Data in Brief 16 (2018) 713–716.
[59] Commercial and residential hourly load profiles for all tmy3 locations in the united states, https://openei.org/datasets/files/961/pub/, accessed: 2019-05-30.
[60] S. Makonin, Hue: The hourly usage of energy dataset for buildings in british columbia, Data in Brief 23 (2019) 103744.
[61] J. Kelly, W. Knottenbelt, The uk-dale dataset, domestic appliance-level electricity demand and whole-house demand from five uk homes, Scientific Data 2 (150007) (2015) 1–14.
[62] J. Z. Kolter, Redd: A public data set for energy disaggregation research, in: Proceedings of the 1st KDD Workshop on Data Mining Applications in Sustainability (SustKDD), ACM, San Diego, CA, USA, 2011.
[63] K. Anderson, A. Ocneanu, D. R. Carlson, A. Rowe, M. Bergés, Blued: A fully labeled public dataset for event-based non-intrusive load monitoring research, in: Proceedings of the 2nd KDD Workshop on Data Mining Applications in Sustainability (SustKDD), ACM, Beijing, China, 2012.
[64] T. Kriechbaumer, H.-A. Jacobsen, Blond, a building-level office environment dataset of typical electrical appliances, Scientific Data 5 (180048) (2018) 1–14.
[65] J. Gao, S. Giri, E. C. Kara, M. Bergés, Plaid: A public dataset of high-resolution electrical appliance measurements for load identification research: Demo abstract, in: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, BuildSys ’14, ACM, New York, NY, USA, 2014, pp. 198–199.
[66] C. Gisler, A. Ridi, D. Zufferey, O. A. Khaled, J. Hennebert, Appliance consumption signature database and recognition test protocols, in: 2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA), 2013, pp. 336–341.
[67] M. Maasoumy, B. M. Sanandaji, K. Poolla, A. S. Vincentelli, Berds-berkeley energy disaggregation data set, in: University of California, Berkeley, 2013.
[68] A. Alsalemi, F. Bensaali, A. Amira, N. Fetais, C. Sardianos, I. Varlamis, Using micro-moments to visualize domestic energy usage, in: Intelligent Systems Conference (IntelliSys-2019), London, UK, 2019.
[69] C. Sardianos, I. Varlamis, G. Dimitrakopoulos, D. Anagnostopoulos, A. Alsalemi, F. Bensaali, A. Amira, I want to .... change’ micro-moment based recommendations can change users’ energy habits, in: 8th International Conference on Smart Cities and Green ICT Systems (SMARTGREENS 2019), Crete, Greece, 2019.
[70] Y. Himeur, A. Elsalemi, F. Bensaali, A. Amira, Improving in-home appliance identification using fuzzy-neighbors-preserving analysis based qr-decomposition, in: International Congress on Information and Communication Technology (ICICT), 2020, pp. 1–8.
[71] F. Rossier, P. Lang, J. Hennebert, Near real-time appliance recognition using low frequency monitoring and active learning methods, Energy Procedia 122 (2017) 691–696.
[72] S.-C. Lee, G.-Y. Lin, W.-R. Jih, J. Y.-J. Hsu, Appliance recognition and unattended appliance detection for energy conservation, in: Proceedings of the 5th AAAI Conference on Plan, Activity, and Intent Recognition, AAAIWS’10-05, AAAI Press, 2010, pp. 37–44.
[73] M. Kahl, A. Ul Haq, T. Kriechbaumer, H.-A. Jacobsen, A comprehensive feature study for appliance recognition on high frequency energy data, in: Proceedings of the Eighth International Conference on Future Energy Systems, e-Energy ’17, ACM, New York, NY, USA, 2017, pp. 121–131.
[74] C. Sardianos, I. Varlamis, C. Chronis, G. Dimitrakopoulos, Y. Himeur, A. Alsalemi, F. Bensaali, A. Amira, A model for predicting room occupancy based on motion sensor data, in: 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), 2020, pp. 394–399.
[75] Y. Wei, L. Xia, S. Pan, J. Wu, X. Zhang, M. Han, W. Zhang, J. Xie, Q. Li, Prediction of occupancy level and energy consumption in office building using blind system identification and neural networks, Applied Energy 240 (2019) 276–294.
[76] J. Ahmad, H. Larijani, R. Emmanuel, M. Mannion, A. Javed, Occupancy detection in non-residential buildings – a survey and novel privacy preserved occupancy monitoring solution, Applied Computing and Informatics.
[77] Z. Chen, C. Jiang, L. Xie, Building occupancy estimation and detection: A review, Energy and Buildings 169 (2018) 260–270.
[78] T. Vafeiadis, S. Zikos, G. Stavropoulos, D. Ioannidis, S. Krinidis, D. Tzovaras, K. Moustakas, Machine learning based occupancy detection via the use of smart meters, in: 2017 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2017, pp. 6–12.
[79] S. Khashe, A. Heydarian, D. Gerber, B. Becerik-Gerber, T. Hayes, W. Wood, Influence of leed branding on building occupants’ pro-environmental behavior, Building and Environment 94 (2015) 477–488.
[80] G. Tang, Z. Ling, F. Li, D. Tang, J. Tang, Occupancy-aided energy disaggregation, Computer Networks 117 (2017) 42–51, cyber-physical systems and context-aware sensing and computing.
[81] V. Breschi, D. Piga, A. Bemporad, Kalman filtering for energy disaggregation, IFAC-PapersOnLine 51 (5) (2018) 108–113, 1st IFAC Workshop on Integrated Assessment Modelling for Environmental Systems IAMES 2018.
[82] M. Aiad, P. H. Lee, Energy disaggregation of overlapping home appliances consumptions using a cluster splitting approach, Sustainable Cities and Society 43 (2018) 487–494.
[83] A. Miyasawa, Y. Fujimoto, Y. Hayashi, Energy disaggregation based on smart metering data via semi-binary nonnegative matrix factorization, Energy and Buildings 183 (2019) 547–558.
[84] Y. Himeur, A. Elsalemi, F. Bensaali, A. Amira, Efficient multi-descriptor fusion for non-intrusive appliance recognition, in: The IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1–5.
[85] Y. Liu, X. Wang, L. Zhao, Y. Liu, Admittance-based load signature construction for non-intrusive appliance load monitoring, Energy and Buildings 171 (2018) 209–219.
[86] A. L. Wang, B. X. Chen, C. G. Wang, D. Hua, Non-intrusive load monitoring algorithm based on features of v-i trajectory, Electric Power Systems Research 157 (2018) 134–144.
[87] C. Liu, A. Akintayo, Z. Jiang, G. P. Henze, S. Sarkar, Multivariate exploration of non-intrusive load monitoring via spatiotemporal pattern network, Applied Energy 211 (2018) 1106–1122.
[88] S. Henriet, U. Simsekli, B. Fuentes, G. Richard, A generative model for non-intrusive load monitoring in commercial buildings, Energy and Buildings 177 (2018) 268–278.
[89] S. S. Hosseini, K. Agbossou, S. Kelouwani, A. Cardenas, Non-intrusive load monitoring through home energy management systems: A comprehensive review, Renewable and Sustainable Energy Reviews 79 (2017) 1266–1274.
[90] Z.-X. Wang, L.-Y. He, H.-H. Zheng, Forecasting the residential solar energy consumption of the united states, Energy 178 (2019) 610–623.
[91] M. Bourdeau, X. qiang Zhai, E. Nefzaoui, X. Guo, P. Chatellier, Modeling and forecasting building energy consumption: A review of data-driven techniques, Sustainable Cities and Society 48 (2019) 101533.
[92] T. Ahmad, H. Chen, Deep learning for multi-scale smart energy forecasting, Energy 175 (2019) 98–112.
[93] T. Hong, P. Pinson, Energy forecasting in the big data world, International Journal of Forecasting.
[94] G. P. Herrea, M. Constantino, B. M. Tabak, H. Pistori, J.-J. Su, A. Naranpanawa, Data on forecasting energy prices using machine learning, Data in Brief (2019) 104122.
[95] J. Li, R. E. Just, Modeling household energy consumption and adoption of energy efficient technology, Energy Economics 72 (2018) 404–415.
[96] M. Villca-Pozo, J. P. Gonzales-Bustos, Tax incentives to modernize the energy efficiency of the housing in spain, Energy Policy 128 (2019) 530–538.
[97] L. M. Lopez-Ochoa, J. Las-Heras-Casas, L. M. Lopez-Gonzalez, P. Olasolo-Alonso, Towards nearly zero-energy buildings in mediterranean countries: Energy performance of buildings directive evolution and the energy rehabilitation challenge in the spanish residential sector, Energy 176 (2019) 335–352.
[98] A. Thonipara, P. Runst, C. Ochsner, K. Bizer, Energy efficiency of residential buildings in the european union – an exploratory analysis of cross-country consumption patterns, Energy Policy 129 (2019) 1156–1167.
[99] T. Hong, P. Pinson, S. Fan, Global energy forecasting competition 2012, International Journal of Forecasting 30 (2) (2014) 357–363.
[100] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, R. J. Hyndman, Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond, International Journal of Forecasting 32 (3) (2016) 896–913.
[101] W. Cui, H. Wang, Anomaly detection and visualization of school electricity consumption data, in: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), 2017, pp. 606–611.
[102] J. E. Seem, Using intelligent data analysis to detect abnormal energy consumption in buildings, Energy and Buildings 39 (1) (2007) 52–58.
[103] I. Khan, A. Capozzoli, S. P. Corgnati, T. Cerquitelli, Fault detection analysis of building energy consumption using data mining techniques, Energy Procedia 42 (2013) 557–566, mediterranean Green Energy Forum 2013: Proceedings of an International Conference MGEF-13.
[104] H. Janetzko, F. Stoffel, S. Mittelstadt, D. A. Keim, Anomaly detection for visual analytics of power consumption data, Computers & Graphics 38 (2014) 27–37.
[105] Z. Ma, J. Song, J. Zhang, A real-time detection method of abnormal building energy consumption data coupled pod-lse and fcd, Procedia Engineering 205 (2017) 1657–1664, 10th International Symposium on Heating, Ventilation and Air Conditioning, ISHVAC2017, 19-22 October 2017, Jinan, China.
[106] D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz, G. Bitsuamlak, An ensemble learning framework for anomaly detection in building energy consumption, Energy and Buildings 144 (2017) 191–206.
[107] C. Nordahl, M. Persson, H. Grahn, Detection of residents’ abnormal behaviour by analysing energy consumption of individual households, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017, pp. 729–738.
[108] H. Qiu, Y. Tu, Y. Zhang, Anomaly detection for power consumption patterns in electricity early warning system, in: 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), 2018, pp. 867–873.
[109] Y. Weng, N. Zhang, C. Xia, Multi-agent-based unsupervised detection of energy consumption anomalies on smart campus, IEEE Access 7 (2019) 2169–2178.
[110] T. Picon, M. N. Meziane, P. Ravier, G. Lamarque, C. Novello, J. L. Bunetel, Y. Raingeaud, COOLL: controlled on/off loads library, a public dataset of high-sampled electrical signals for appliance identification, CoRR abs/1611.05803. arXiv:1611.05803.
[111] C. Shin, E. Lee, J. Han, J. Yim, W. Rhee, H. Lee, The enertalk dataset, 15 hz electricity consumption data from 22 houses in korea, Scientific Data 6.
[112] Y. Himeur, A. Alsalemi, F. Bensaali, A. Amira, A novel approach for detecting anomalous energy consumption based on micro-moments and deep neural networks, Cognitive Computation (2020) 1–23.
[113] E. Fotopoulou, A. Zafeiropoulos, F. Terroso, A. Gonzalez, A. Skarmeta, U. Şimşek, A. Fensel, Data aggregation, fusion and recommendations for strengthening citizens energy-aware behavioural profiles, in: 2017 Global Internet of Things Summit (GIoTS), 2017, pp. 1–6.
[114] Y. Himeur, A. Alsalemi, A. Al-Kababji, F. Bensaali, A. Amira, Data fusion strategies for energy efficiency in buildings: Overview, challenges and novel orientations, Information Fusion 64 (2020) 99–120.
[115] A. Capozzoli, M. S. Piscitelli, A. Gorrino, I. Ballarini, V. Corrado, Data analytics for occupancy pattern learning to reduce the energy consumption of hvac systems in office buildings, Sustainable Cities and Society 35 (2017) 191–208.
[116] H. Kang, M. Lee, T. Hong, J.-K. Choi, Determining the optimal occupancy density for reducing the energy consumption of public office buildings: A statistical approach, Building and Environment 127 (2018) 173–186.
[117] E. Delzendeh, S. Wu, A. Lee, Y. Zhou, The impact of occupants’ behaviours on building energy analysis: A research review, Renewable and Sustainable Energy Reviews 80 (2017) 1061–1071.
[118] S. Ge, J. Li, H. Liu, X. Liu, Y. Wang, H. Zhou, Domestic energy consumption modeling per physical characteristics and behavioral factors, Energy Procedia 158 (2019) 2512–2517, innovative Solutions for Energy Transitions.
[119] Y. Ding, X. Ma, S. Wei, W. Chen, A prediction model coupling occupant lighting and shading behaviors in private offices, Energy and Buildings 216 (2020) 109939.
[120] X. Jiang, L. Wu, A residential load scheduling based on cost efficiency and consumer’s preference for demand response in smart grid, Electric Power Systems Research 186 (2020) 106410.
[121] S. Lefkeli, E. Manolas, K. Ioannou, G. Tsantopoulos, Socio-cultural impact of energy saving: Studying the behaviour of elementary school students in greece, Sustainability 10 (3).
[122] J. Koci, V. Koci, J. Madera, R. Cerny, Effect of applied weather data sets in simulation of building energy demands: Comparison of design years with recent weather data, Renewable and Sustainable Energy Reviews 100 (2019) 22–32.
[123] Y. Liu, R. Stouffs, A. Tablada, N. H. Wong, J. Zhang, Comparing micro-scale weather data to building energy consumption in singapore, Energy and Buildings 152 (2017) 776–791.
[124] Y. Geng, W. Ji, B. Lin, J. Hong, Y. Zhu, Building energy performance diagnosis using energy bills and weather data, Energy and Buildings 172 (2018) 181–191.
[125] S. Farah, D. Whaley, W. Saman, J. Boland, Integrating climate change into meteorological weather data for building energy simulation, Energy and Buildings 183 (2019) 749–760.
[126] G. Lupato, M. Manzan, Italian trys: New weather data impact on building energy simulations, Energy and Buildings 185 (2019) 287–303.
[127] S. Erba, F. Causone, R. Armani, The effect of weather datasets on building energy simulation outputs, Energy Procedia 134 (2017) 545–554, sustainability in Energy and Buildings 2017: Proceedings of the Ninth KES International Conference, Chania, Greece, 5-7 July 2017.
[128] R. Antonietti, F. Fontini, Does energy price affect energy efficiency? cross-country panel evidence, Energy Policy 129 (2019) 896–906.
[129] A. Satchwell, P. Cappers, C. Goldman, Customer bill impacts of energy efficiency and net-metered photovoltaic system investments, Utilities Policy 50 (2018) 144–152.
[130] D. Eryilmaz, S. Gafford, Can a daily electricity bill unlock energy efficiency? evidence from texas, The Electricity Journal 31 (3) (2018) 7–11.
[131] M. G. Fikru, Electricity bill savings and the role of energy efficiency improvements: A case study of residential solar adopters in the usa, Renewable and Sustainable Energy Reviews 106 (2019) 124–132.
[132] N. Hossein Motlagh, M. Mohammadrezaei, J. Hunt, B. Zakeri, Internet of things (iot) and the energy sector, Energies 13 (2).
[133] Raspberry pi 4 model b, accessed: 2020-05-04.
[134] Odroid-xu4, accessed: 2020-05-04.
[135] Jetson tx1 developer kit, accessed: 2020-05-04.
[136] A. Alsalemi, Y. Himeur, F. Bensaali, A. Amira, C. Sardianos, C. Chronis, I. Varlamis, G. Dimitrakopoulos, A micro-moment system for domestic energy efficiency analysis, IEEE Systems Journal (2020) 1–8.
[137] I. Varlamis, C. Sardianos, G. Dimitrakopoulos, A. Alsalemi, F. Bensaali, Y. Himeur, A. Amira, Rehab-c: Recommendations for energy habits change, future generation computer systems, Future Generation Computer Systems (Accepted) (2020) 1–41.
[138] F. Ricci, L. Rokach, B. Shapira, P. B. Kantor, Recommender Systems Handbook, 1st Edition, Springer-Verlag, Berlin, Heidelberg, 2010.
[139] A. Starke, M. Willemsen, C. Snijders, Effective user interface designs to increase energy-efficient behavior in a rasch-based energy recommender system, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, ACM, New York, NY, USA, 2017, pp. 65–73.
[140] Y. Zhang, X. Chen, et al., Explainable recommendation: A survey and new perspectives, Foundations and Trends®
[144] A. Spence, M. Goulden, C. Leygue, N. Banks, B. Bedwell, M. Jewell, R. Yang, E. Ferguson, Digital energy visualizations in the workplace: the e-genie tool, Building Research & Information 46 (3) (2018) 272–283.
[145] G. Li, C. Kou, H. Wang, Estimating city-level energy consumption of residential buildings: A life-cycle dynamic simulation model, Journal of Environmental Management 240 (2019) 451–462.
[146] M. Zekic-Susac, S. Mitrovic, A. Has, Machine learning based system for managing energy efficiency of public sector as an approach towards smart cities, International Journal of Information Management (2020) 102074.
[147] M. Fekri, A. M. Ghosh, K. Grolinger, Generating energy data for machine learning with recurrent generative adversarial networks, Energies 13. doi:10.3390/en13010130.
Appendix
The abbreviations of the power consumption datasets, along with the nomenclature used in this paper, are summarized in Table 4.
Table 4: List of abbreviations and nomenclatures used in this paper.