A Non-Intrusive Load Monitoring Approach for Very Short Term Power Predictions in Commercial Buildings
Karoline Brucke, Stefan Arens, Jan-Simon Telle, Thomas Steens, Benedikt Hanke, Karsten von Maydell, Carsten Agert
Abstract
This paper presents a new algorithm to extract device profiles fully unsupervised from three-phase reactive and active aggregate power measurements. The extracted device profiles are applied for the disaggregation of the aggregate power measurements using particle swarm optimization. Finally, this paper provides a new approach for short term power predictions using the disaggregation data. For this purpose, a state changes forecast for every device is carried out by an artificial neural network and converted into a power prediction afterwards by reconstructing the power from the state changes and the device profiles. The forecast horizon is 15 minutes. To demonstrate the developed approaches, three-phase reactive and active aggregate power measurements of a multi-tenant commercial building are used. The granularity of the data is 1 s. In this work, 52 device profiles are extracted from the aggregate power data. The disaggregation shows a very accurate reconstruction of the measured power with a percentage energy error of approximately 1 %. The developed indirect power prediction method applied to the measured power data outperforms two persistence forecasts and an artificial neural network designed for 24 h day-ahead power predictions working in the power domain.
Keywords:
Non-intrusive load monitoring, energy disaggregation, power prediction, unsupervised learning, neural networks
1. Introduction
Due to a higher share of renewable energies and the increasing electrification of our society, the electricity grid is facing more challenges such as instabilities and sudden increases in energy supply or demand. A possible solution to avoid overloading without a massive increase in grid infrastructure is energy management on both the supply and the demand side of the electricity grid [1]. Energy management relies, among other things, on high quality forecasts of the electricity supply and demand for different time horizons of seconds to months [2, 3, 4]. Predictions for seconds to minutes are referred to as very short term predictions. The systems for which predictions are made range from high voltage grids down to the device level [5, 6, 7]. Power predictions on the demand side include random human behavior and thus show erratic and highly volatile patterns. Very short term predictions in particular are highly influenced by randomness and are more difficult to carry out than long term predictions [8]. This behavior is particularly evident in buildings such as households and industrial or commercial buildings. However, commercial and industrial buildings show a more repetitive demand than households due to the division of working time and non-working time through, for instance, shift work and repetitive tasks. Thus, there is much potential to carry out high quality power predictions in commercial buildings, which additionally mostly have a higher electricity demand than households, as [10] shows for Germany.

∗ Corresponding author, email address: [email protected]
∗∗ All authors were with: DLR Institute of Networked Energy Systems, Carl-von-Ossietzky-Str. 15, 26129 Oldenburg, Germany

For power predictions, mostly machine learning methods, in particular artificial neural networks (ANNs), are applied in order to learn interrelations between past and future consumption data [11, 12]. Power prediction methods in general mainly work directly with the quantity to be predicted and are often not able to predict sudden events resulting in sharp power rises [8, 12, 9]. In this work, we introduce a different approach to power predictions based on non-intrusive load monitoring (NILM), which was first described by Hart in 1992 [13]. NILM is also called energy disaggregation because it divides aggregate consumption data into the contributions of single devices. Energy disaggregation aims for a description of the state of every appliance in a building without a massive increase in metering infrastructure and has been carried out by various methods in the past [14, 15]. Most disaggregation methods rely on model building based on prior knowledge and on labeled data sets, as in [16], which are mostly not available in reality. Therefore, NILM approaches that work without labeled data sets are required.

Disaggregation produces additional knowledge of the building which can be used for different purposes such as power predictions. In [17], the authors incorporate appliance usage patterns to improve the performance of load forecasting, and in [18] the authors use NILM and a subsequent clustering of similarly behaving appliances as a preprocessing step for their forecast algorithm.

In this work, we present a new approach to power predictions using the state data of devices produced by our NILM approach. Thus, we develop a state prediction of devices and carry out a power prediction by reconstructing the aggregate power from the state data with respect to the according device profiles. Therefore, we present an unsupervised NILM approach whose produced state data is used for short term power predictions. For this purpose, we first state our developed algorithm, which is able to extract device profiles from aggregate consumption data unsupervised with methods from machine learning and statistics.
No prior knowledge of the building is required beforehand to calculate the device profiles. Afterwards, we disaggregate the aggregate signal using particle swarm optimization as developed by the authors of this paper and extensively covered in [19].

Figure 1 shows this procedure visually with past aggregate consumption data being disaggregated to the device level. Afterwards, the state of devices gets predicted and the aggregate power consumption is reconstructed according to the state prediction.

Figure 1: Graphical representation of power predictions based on power disaggregation. The aggregate power signal (top) is disaggregated until t = 0. This results in the power contributions of different devices (bottom). From t = 0 the state of devices is predicted and thereafter the aggregate power signal gets reconstructed from the state data of single devices.

In Section 2 we present the used data set. Thereafter, the methodology is outlined in Section 3, where we start by describing the assumed disaggregation problem in Section 3.1. Afterwards, the developed and used methods are presented, including device profile extraction, disaggregation with particle swarm optimization and very short term power prediction using an artificial neural network. In order to show results, the developed methods are exemplarily applied to the power data of a commercial building in Section 4. After a discussion of results in Section 5, we finish with a conclusion in Section 6.
2. Data Description
In this work, we use power data of one measuring point in a multi-tenant commercial building as a test data set for our developed methods. The granularity is 1 s. The data represents a production facility and workshop and contains six features: three phases of active and reactive power. The six features of the power measurements are referred to as P_1 . . . P_6, whereas P_1 . . . P_3 represent the three active power phases and P_4 . . . P_6 represent the three reactive power phases respectively. The data set includes the power measurements from December 1st, 2018 until March 29th, 2019. On average, 0.0023 % of data points are missing. Gaps are filled with the last known value. We use the power analyzer UMG 604 PRO from Janitza Electronics. According to the manufacturer, the measuring error is less than 0.4 %, which we neglect in this work [20]. Table 1 shows four key indicators of the used data set.

Table 1: Data analysis of the measured power data set

  Property                      Value
  P_min [kW]                    2.261
  P_max [kW]                    98.95
  Energy mean per day [kWh]     534.5
  P_mean [kW]                   22.27
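The gap-filling rule mentioned above (carry the last known value forward) can be sketched as follows. This is an illustrative snippet, not the authors' code; `fill_gaps` is a hypothetical helper name and `None` stands in for a missing sample.

```python
def fill_gaps(series):
    """Replace missing samples (None) with the last known value."""
    filled, last = [], None
    for v in series:
        if v is None:
            v = last  # reuse the last known measurement
        filled.append(v)
        last = v
    return filled
```

For example, `fill_gaps([2.3, None, None, 2.5])` yields `[2.3, 2.3, 2.3, 2.5]`.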
3. Methodology
In the first of the following sections, the assumed formulation of the disaggregation problem is stated. Secondly, the device profile extraction method is presented. Thereafter, the used disaggregation method, PSO, is described briefly. The disaggregation based power prediction procedure is outlined in the last of the following sections.
We assume a very similar formulation of the disaggregation problem as in [19]. The aggregate power at time t ∈ {0, 1, . . . , T}, called P(t) ∈ ℝ⁶, is assumed to be a linear combination of device profiles according to their state changes as described in the following equation:

P(t) = \sum_{i,\tilde{t}\,:\,s_i(\tilde{t})=1} s_i(\tilde{t})\, l_i(t-\tilde{t}) + \sum_{i,\tilde{t}\,:\,s_i(\tilde{t})=-1} s_i(\tilde{t})\, \mathbb{1}_{(\tilde{t},T)}(t)\, p_i + \epsilon(t)    (1)

The device profile of device i ∈ {1, 2, . . . , M} contains a dynamic profile l_i and a power value of the stable operating state p_i ∈ ℝ⁶, with τ_i being the (typical) time until this state is reached. This behavior is shown in Figure 2. S ∈ {0, 1, −1}^{T×M} denotes the so-called state-changes-matrix, with s_i(t) being the entry in the t-th row and the i-th column of S. If s_i(t) = 1, the device i is switched on at time t, if s_i(t) = −1 it is switched off, and for s_i(t) = 0 the state of device i remains the same. ε(t) is referred to as always-on-component or noise.

Figure 2: Graphical representation of a theoretical device profile including dynamic behavior in the beginning and the stable operating state after τ_i.

Figure 3: Graphical representation of the separation of complex appliance signatures into simple device profiles. The left profile contains repetitive patterns and is divided into three characteristic simpler profiles that represent the characteristic patterns.

Given these assumptions for the aggregate power signal, the following optimization problem has to be solved:

\min_S E(P, P_S)    (2)

P denotes the measured aggregate power signal. P_S denotes the reconstructed or approximated power according to Equation 1 using the state changes matrix S and the device profiles l_i. E(P, P_S) represents an error function of P and P_S. The state changes matrix S and the device profiles l_i have to be found in order to minimize the error E.

For device profile extraction, we assume a device to have a binary state, i.e. the device is only in the state ON or OFF.
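The additive model of Eq. 1 can be sketched in a few lines of numpy, under our reading of the equation (a switch-on adds the dynamic profile and afterwards the stable power p_i; a switch-off removes the stable power); the helper name and single-feature simplification are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def reconstruct_power(S, profiles, p_stable, noise=0.0):
    """Linear-combination power model (cf. Eq. 1), single power feature.
    S: (T, M) array with entries in {-1, 0, 1}; profiles[i]: dynamic profile l_i;
    p_stable[i]: stable-state power held after the profile ends."""
    T, M = S.shape
    P = np.full(T, noise, dtype=float)
    for i in range(M):
        for t_on in np.flatnonzero(S[:, i] == 1):
            L = len(profiles[i])
            end = min(t_on + L, T)
            P[t_on:end] += profiles[i][: end - t_on]  # dynamic part l_i
            P[end:] += p_stable[i]                    # hold the stable state
        for t_off in np.flatnonzero(S[:, i] == -1):
            P[t_off:] -= p_stable[i]                  # switch-off removes stable load
    return P
```

With one device switched on at t = 2 (profile [1, 2], stable power 2) and off at t = 8, the model produces the expected ramp, plateau and drop.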
Stand-by modes or different operational modes of one appliance would be described as individual device profiles. This applies for complex programs of some appliances as well. Fig. 3 shows a graphical and generic representation of the division of a complex appliance signature into simplified device profiles.

The device profile extraction algorithm firstly detects times of events in the aggregate power signal by identifying peaks in the derivative of the aggregate power signal. Then, events are clustered using the k-means algorithm to determine the characteristics when switching the specific device types on or off. Afterwards, the clusters are cleaned and merged. In order to determine the typical run-time of the device, i.e. the length of the device profile, the clusters are split using Gaussian Mixture Models according to the characteristic ON-duration. Finally, median blending is used to extract the device profiles from the aggregate power signal.
We start by identifying when state changes of devices take place. For this purpose, we use the derivative of the measured aggregate power signal P, which is denoted by ∆P : {0, . . . , T − 1} → ℝ⁶ and is calculated according to Equation 3, where t + 1 denotes the subsequently measured point in time with respect to t. Due to the measuring frequency of 1 Hz, the relation simplifies to:

\Delta P(t) = \frac{P(t+1) - P(t)}{(t+1) - t} = \frac{P(t+1) - P(t)}{1\,\mathrm{s}}    (3)

We assume that a state change takes place when a sharp increase or decrease in the measured power is observable. These inflection points in the aggregate power signal result in maxima or minima in the derivative. Maxima are referred to as ON-events and minima as OFF-events in the following. In order to identify events, we take the sum of active phases P_tot ∈ ℝ^T with P_tot = P_1 + P_2 + P_3 into account. We perform a peak analysis in the derivative of the sum of the three phases of active power, ∆P_tot ∈ ℝ^{T−1}. For the peak analysis, we take all values of ∆P_tot into account which are above a threshold value ε_threshold, thus |∆P_tot(t)| ≥ ε_threshold. The threshold has to be chosen with respect to the given power data. The choosing process of a peak threshold could be automated in the future. We assume that the process of switching a device on or off is completed within 1 s. When ∆P_tot(t) is an event, we denote the respective time by t_p and call t_p the event-time. We introduce the following peak criterion, which defines a time t to be or not to be an ON-event time t_p:

t = t_p \Leftrightarrow \Delta P_{tot}(t-1) < \Delta P_{tot}(t) \wedge \Delta P_{tot}(t+1) < \Delta P_{tot}(t) \wedge \Delta P_{tot}(t) \geq \varepsilon_{threshold}    (4)

Equation 4 accordingly applies for OFF-events with reversed signs. The set of N events is referred to as D = {∆P(t_{p,1}), . . . , ∆P(t_{p,N})}.

The relation between active and reactive power has been shown to be distinctive for a specific device type [21].
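The peak criterion of Eq. 4 translates directly into code; the sketch below is a minimal illustration (hypothetical helper name), returning the ON-event times of a 1 Hz signal.

```python
import numpy as np

def on_event_times(P_tot, eps):
    """ON-event times t_p: local maxima of the derivative above eps (cf. Eq. 4)."""
    dP = np.diff(P_tot)  # 1 Hz sampling, so dP is the per-second derivative
    events = []
    for t in range(1, len(dP) - 1):
        if dP[t - 1] < dP[t] > dP[t + 1] and dP[t] >= eps:
            events.append(t)
        # OFF-events: the same criterion with reversed signs (not shown)
    return events
```

For the signal `[0, 0, 4, 10, 11, 11]` the derivative peaks at t = 2, which is the only event above a threshold of 3.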
Therefore, we assume the increase or decrease in the three phases of active and reactive power at the time of an event to be characteristic for the specific device type. With this assumption, we can cluster the extracted events ∆P(t_p) according to their values in all six power features to distinguish the device types. For the cluster analysis we use the well known k-means cluster algorithm. It is assumed that the specific patterns of an ON-event correspond to those of an OFF-event with reversed signs. Clustering is therefore only performed for the ON-events, and the OFF-events are afterwards assigned to the cluster centers with reversed signs with the smallest deviation. The k-means cluster algorithm divides a given data set D = {∆P(t_{p,1}), . . . , ∆P(t_{p,N})} into K clusters in such a way that the Euclidean distance of each data point to the nearest cluster center is minimized. The number of clusters K has to be given. This can be formalized as:

\min_{r_{nk}, \vec{c}_k} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, |\Delta P(t_{p,n}) - \vec{c}_k|^2    (5)

r_{nk} = 1 if the event ∆P(t_{p,n}) belongs to the cluster k and r_{nk} = 0 for all other clusters. Cluster centers are denoted by \vec{c}_k ∈ ℝ⁶ and the according cluster is the set of assigned events denoted by c_k. The k-means cluster algorithm solves the minimization problem using the expectation-maximization method [23]. In order to determine the optimal number of clusters K_opt for the given events, the Calinski-Harabasz score (CH) is used, which is defined by [24]:

\mathrm{CH} = \frac{N-K}{K-1} \cdot \frac{\sum_{c_k \in C} |c_k| \, |\vec{c}_k - \vec{D}|^2}{\sum_{c_k \in C} \sum_{\Delta P(t_{p,i}) \in c_k} |\Delta P(t_{p,i}) - \vec{c}_k|^2}    (6)

N denotes the number of events. The center of the whole data set D is denoted by \vec{D} and C denotes the set of clusters c_k. The cardinality of cluster k is denoted by |c_k|.
The CH becomes maximum for the optimal K and calculates a ratio between the separation of the clusters and the compactness within each cluster. It is multiplied by the pre-factor (N − K)/(K − 1) to prevent overfitting, because a larger number of clusters K must not always result in a higher value of the CH than a smaller number of clusters. In order to obtain K_opt, we perform a k-means clustering for K ∈ {2, . . . , 50} and calculate the CH every time. We choose 50 as the upper limit to confine the computing time. An adaptive method that increases K until the CH is decreasing again would be possible as well.

After the first clustering of the extracted events, we perform a cleaning step of the clusters analogously to [21]. For this, we define outlier events ∆P̃(t_p) to be out of a 2σ-area of the respective cluster, where σ denotes the standard deviation of the respective cluster. All outliers get clustered again with fixed K̃ = 10. A second CH-analysis would be possible for the outlier events as well, but this step is simplified since this cleaning step is optional in the procedure of the extraction of device profiles. With the presented clustering procedure, the characteristic increase or decrease in all six power features when switching a device on or off is known.

In order to improve the clustering of the extracted events, we perform a merging step of clusters based on a similarity measure. The similarity of two clusters is evaluated by means of the Pearson correlation coefficient ρ ∈ [−1, 1] and the absolute percentage error (APE), calculated for every combination of two cluster centers. The Pearson correlation coefficient of two cluster centers \vec{c}_i and \vec{c}_j is defined by the following equation [25]:

\rho(\vec{c}_i, \vec{c}_j) = \frac{\sigma_{\vec{c}_i, \vec{c}_j}}{\sigma_{\vec{c}_i} \sigma_{\vec{c}_j}}    (7)

where σ_{\vec{c}_i, \vec{c}_j} denotes the covariance of \vec{c}_i and \vec{c}_j. The APE is defined by the following equation:

\mathrm{APE}(\vec{c}_i, \vec{c}_j) = \frac{|\vec{c}_i - \vec{c}_j|}{|\vec{c}_i|}    (8)

If ρ(\vec{c}_i, \vec{c}_j) is above and APE(\vec{c}_i, \vec{c}_j) is below a given threshold, clusters i and j are merged. For this, a new cluster is created and the cluster members of cluster i and j are assigned to this new cluster with the cluster center \vec{c}_{i,new} = 1/2 · (\vec{c}_i + \vec{c}_j). This calculation of the new cluster center is also applied if the cardinalities of c_i and c_j are different. The thresholds are chosen such that

\rho(\vec{c}_i, \vec{c}_j) > \rho_{threshold} \wedge \mathrm{APE}(\vec{c}_i, \vec{c}_j) < \mathrm{APE}_{threshold}    (9)

Figure 4: Graphical representation of the division of an ON-duration distribution with Gaussian Mixture Models.

It may happen that c_i satisfies the condition in Eq. 9 with multiple other clusters. If that situation applies, c_i is only merged with the cluster of highest similarity. The newly created cluster c_{i,new} is not merged again with other clusters.

For the calculation of device profiles, the typical run-time, i.e. the time in the state ON, is required. Therefore, for every ON-event in a specific cluster, we determine the time between this ON-event and the next OFF-event in that cluster. We perform this calculation for every cluster of events. The calculated time is referred to as ON-duration in the following. If there are more ON-events than OFF-events, we neglect the surplus ON-events and vice-versa.
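The merge criterion of Eqs. 7–9 can be sketched as follows. The function name and the concrete threshold values (0.9 and 0.1) are placeholder assumptions for illustration; the paper chooses its thresholds for the given data set.

```python
import numpy as np

def should_merge(c_i, c_j, rho_min=0.9, ape_max=0.1):
    """Cluster-center similarity test (cf. Eq. 9).
    Thresholds are placeholders, to be tuned for the data at hand."""
    rho = np.corrcoef(c_i, c_j)[0, 1]                      # Pearson correlation (Eq. 7)
    ape = np.linalg.norm(c_i - c_j) / np.linalg.norm(c_i)  # APE (Eq. 8)
    return rho > rho_min and ape < ape_max
```

A center and a 1 % scaled copy of it merge (ρ = 1, APE = 0.01), while a dissimilar center does not.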
For every cluster, we present all determined ON-durations in a frequency distribution and observe multiple maxima at different times. In reality, the ON-duration depends on the kind of use of the individual device, for example if a device is capable of running different programs or if the same device type is used for different tasks.

In this work, Gaussian Mixture Models (GMMs) are used to divide clusters according to characteristic ON-durations within a specific cluster. Figure 4 shows an exemplary ON-duration distribution of a cluster with a fitted GMM which divides the distribution into two sub-distributions. GMMs determine the properties of sub-distributions in an overall distribution, using only observations of the overall distribution B = (\vec{x}_1, . . . , \vec{x}_N) [26]. The a posteriori probability for the GMM is calculated as follows:

p(\theta \mid B) = \sum_{i=1}^{m} \pi_i \, \mathcal{N}(\vec{x} \mid \vec{\mu}_i, \Sigma_i)    (10)

where p(θ | B) describes the probability of the model parameters θ given the data set B. The parameters θ_i = (π_i, \vec{μ}_i, Σ_i) denote the mixing coefficients, the mean values and the covariance matrices of the i-th of m Gaussian distributions. The mean values of the Gaussian distributions represent the mean ON-duration, which will be referred to as d in the following. Therefore, the ON-duration of device i is denoted by d_i. The number of sub-distributions m has to be given beforehand. The maximum-likelihood method is used together with the expectation-maximization algorithm to obtain an optimal estimation of θ [27]. In order to determine the optimal number of Gaussian distributions m_opt in the GMM of each cluster, the Bayesian Information Criterion (BIC) is used. The BIC is a measure for comparing different models. It is defined by the following equation [28]:

\mathrm{BIC} \approx M \ln N - 2 \ln p(B \mid \theta)    (11)

N denotes the number of data points in data set B and M the number of parameters in θ.
According to this definition, the BIC is to be minimized. As soon as ∆BIC between two subsequent models M_m and M_{m+1} exceeds a given limit, the model M_{m+1} is selected and the corresponding m is called m_opt. The limit for ∆BIC to select m_opt has to be determined empirically. In general, m should be increased as long as ∆BIC is negative for two subsequent models.

Given this procedure, every cluster k is divided into m groups. The groups that emerge from one cluster share the cluster center (the characteristics at an ON-event and OFF-event) but differ in their characteristic ON-duration. An event is assigned to a group if the associated Gaussian distribution is maximum for the ON-duration of this event. From in total K clusters emerge M = \sum_{k=1}^{K} m_{opt,k} groups, which will be denoted as G_i. The ON-duration of G_i is referred to as d_i.

For the final calculation of the device profiles, median blending is used for all groups. Median blending is a method of noise reduction which we will
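The BIC-based model selection can be sketched as follows: increase m while the BIC of Eq. 11 keeps decreasing (∆BIC negative) and keep the last improving m. Helper names and the stopping rule details are illustrative assumptions.

```python
import numpy as np

def bic(loglike, n_params, n_points):
    """BIC ≈ M ln N − 2 ln p(B|θ) (cf. Eq. 11); smaller is better."""
    return n_params * np.log(n_points) - 2.0 * loglike

def select_m(loglikes, params_per_m, n_points):
    """Increase m while the BIC keeps decreasing; return the last improving m.
    loglikes[m-1]: fitted log-likelihood of the m-component GMM."""
    best_m = 1
    prev = bic(loglikes[0], params_per_m[0], n_points)
    for m in range(2, len(loglikes) + 1):
        cur = bic(loglikes[m - 1], params_per_m[m - 1], n_points)
        if cur - prev >= 0:  # ΔBIC no longer negative: stop
            break
        best_m, prev = m, cur
    return best_m
```

With log-likelihoods −500, −450, −448 and parameter counts 2, 5, 8 on 100 points, the second model is the last one that lowers the BIC.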
Figure 5: Graphical representation of the developed algorithm for device profile extraction. On the left, the operations are depicted, while the data of every step in the algorithm is presented on the right.

use in order to reduce the noise of the aggregate power signal and to isolate the device profile from this background [29]. For every element ∆P(t_p) ∈ G_i we store and normalize the aggregate power signal from t_p . . . t_p + d_i. The normalized power signal is denoted by P_norm. Normalization is carried out by dividing by the maximum power value in the stored part of the aggregate power signal. Then, the median for every point in time in the saved aggregate power signal is calculated in all six power features. In order to scale the normalized profile back to absolute power values, we use the cluster center of the respective cluster. It represents the characteristic increase in power per second when switching on a specific device type. Therefore, we integrate the cluster center by multiplying with one second. Finally, we scale back the median values by multiplying \vec{c}_k and the normalized l_i. We define the power profile of device i

l_i : \{0, . . . , d_i\} \to \mathbb{R}^6,    (12)

where d_i denotes the ON-duration, by

l_i(t) = \vec{c}_k \cdot \mathrm{median}\{P_{norm}(t_p + t) \mid t_p \text{ is an ON-event of the device}\}    (13)

for every t ∈ {0, . . . , d_i}. A prerequisite for this procedure is that there are enough events in G_i in order to reduce the noise of the aggregate power signal.

3.3. Disaggregation Procedure

The disaggregation is carried out by particle swarm optimization (PSO) as described in [19], which improves, for the disaggregation problem, on the original approach described by Hart in [13]. PSO is a metaheuristic used for multidimensional optimization problems such as the disaggregation problem presented above. In this work, we use PSO to determine the state changes matrix S. For this purpose, the extracted device profiles are used.
The PSO aims at minimizing the following error measure [19]:

E_{[a,b)}(P, P_S) = \alpha \cdot \sum_{t=a}^{b-1} (\vec{P}_S(t) - \vec{P}(t))^2 + \beta \cdot \sum_{t=a}^{b-1} (\Delta\vec{P}_S(t) - \Delta\vec{P}(t))^2    (14)

with α + β = 1 weighting the two summands. The algorithm we use in this work to carry out the disaggregation is extensively presented in [19]. In [19] it is assumed that a device profile consists of transient or dynamic behavior and a stable state reached after a specific time τ. Thus, we assume the extracted load profiles to represent the dynamic behavior of the device. The power value of the stable state is assumed to be the last non-zero power value of the specific device profile.

In this work, we present a new method for power forecasts that is based on a forecast of state changes of unknown devices. The power forecast is carried out by reconstructing the power from the state changes forecast according to Equation 1. The weekends of the used data set show very regular power curves with highly repetitive patterns. Thus, it is assumed that persistence forecasts are sufficient for weekend days. Therefore, we only consider working days, since they show more complex power demands with many sharp increases and decreases, which we are aiming to predict. For this purpose, we use an artificial neural network (ANN). ANNs have been widely used as a very powerful method for time series prediction in different fields regarding the power grid [30, 31, 32]. Especially for load and energy forecasts, ANNs are preferred due to the non-linearity and randomness within power data [8]. We aim for the ANN to learn an interrelationship between the last hour of state changes and the
Figure 6: Graphical representation of the time features given the ANN as input.

state changes within the next 15 minutes. We use a feed forward, fully connected ANN based on the supported models and functions of keras [33]. In the following, the feature selection and the hyperparameter optimization of the ANN are described. Finally, the training procedure of the ANN is outlined.
The feature selection for the ANN determines the input and target data, also called output data. All inputs and outputs have to be normalized to a range of −1 . . . 1. Firstly, we choose the state changes data from t = −3600 s . . . t = 0 s as input data for a state changes prediction for t = 0 s . . . t = 900 s. Secondly, we choose the data from t = 0 s . . . t = 900 s of seven days before as input data. Thus, for a prediction on a Thursday at 11 am, the state changes data from Thursday 10 am - 11 am as well as the state changes data from the Thursday one week before from 11 am - 11:15 am are selected. This feature selection has proven to be helpful due to the regularity in industrial and commercial data related to the weekday [34]. For M given device types, the input data set contains 2M + 3 columns. The target data contains only the future state changes data of the M devices. Thus, there are M columns in the target data set. The number of rows is determined by the size of the training data set; thus, the number of rows corresponds to the number of time steps in the training data set.

Figure 7: Graphical representation of the integration of state changes for the data preparation for the training of the ANN.

During training, the difference between output data and target data is quantified by calculating an error measure. A large proportion of the state changes data is zeros. Thus, there is a local optimum in the error measure of the ANN to only predict zeros, i.e. no state changes at all. Therefore, we perform an additional preprocessing step for the state changes data: we transform the state changes data to state data via integration. Figure 7 shows a graphical representation of the integration procedure of state changes. Essentially, the state changes are added up for every device. This step bypasses a data structure that contains many zeros. After integration, the state data is normalized to the range of [−1, 1].

The hyperparameter optimization was carried out with the help of talos and the supported random search [35].
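The integration and normalization step can be sketched as follows; this is a minimal numpy version, and the per-column max-abs scaling is our assumption for how the state data is mapped to [−1, 1].

```python
import numpy as np

def state_changes_to_states(S):
    """Integrate state changes (entries -1/0/+1) per device into cumulative
    state counts, then scale each column into [-1, 1] for the ANN."""
    states = np.cumsum(S, axis=0).astype(float)  # "add up" the state changes
    peak = np.max(np.abs(states), axis=0)
    peak[peak == 0] = 1.0  # avoid division by zero for idle devices
    return states / peak
```

For a single device with changes [1, 0, 1, −1] the cumulative states are [1, 1, 2, 1], scaled to [0.5, 0.5, 1.0, 0.5].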
Hyperparameters are all parameters of an ANN that are not adapted during training but have to be set beforehand. The following hyperparameters are considered for optimization: The number of neurons in the hidden layers sets the width of the ANN, whereas the number of hidden layers determines the depth of the ANN. The number of neurons in the hidden layers does not have to be the same for all layers; thus, the width of the ANN can vary.
Dropout describes the percentage of neurons that are neglected randomly in every hidden layer during a training step in order to increase the robustness and decrease over-fitting of the ANN [36]. The learning rate is a measure for the step size made in the training process in every iteration.
Table 2: Hyperparameters and their chosen values for the state forecast using an ANN

  Hyperparameter name          Selected value
  Neurons in hidden layers     214
  Number of hidden layers      3
  Dropout                      5 %
  Learning rate                0.01
  Batch size                   2048
  Activation function          ReLU

A larger learning rate decreases the training time but increases the risk of not fully converging into an error minimum, and vice-versa for smaller learning rates. The batch size determines the number of samples of the training data set that are processed at once. Thus, the parameters of the ANN are not adapted after every single sample has passed the ANN, but only after as many samples as the batch size. As activation function, the ReLU function proved to have the best outcome in this work. The chosen values of the optimized hyperparameters are presented in Table 2. Given these hyperparameters, the chosen model has 137868 trainable parameters.

We chose the mean squared logarithmic error (MSLE) as error measure, which is defined as follows:

\mathrm{MSLE}(y_{target}, y_{Out}) = \frac{1}{N} \sum_{i=1}^{N} \big( \log(y_{target,i} + 1) - \log(y_{Out,i} + 1) \big)^2    (15)

Due to the logarithmic character of the MSLE, it penalizes deviations at small values more heavily than error measures like the root mean squared error or the mean absolute error. This showed an improved training process given the structure of the data in this work. In order to evaluate and compare the results of the prediction, we use the same error measures as for the validation of the disaggregation results.

The training is performed with an Intel i7-6700k processor, 16 GB of RAM and a GeForce GTX 1050 graphics card with 768 CUDA cores. The data set for training includes 55 working days from January 2019 to March 2019; thus, it contains 4752000 rows and 2M + 3 columns. During training, 95 % of the data are used for training the network and 5 % are used as a validation data set. As soon as the error on the independent validation set increases, the training is stopped. This is carried out by the early stopping option of keras [33].
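Eq. 15 corresponds to the standard MSLE (as implemented, for instance, in keras); a minimal numpy sketch, assuming non-negative inputs so that the logarithms are defined:

```python
import numpy as np

def msle(y_target, y_out):
    """Mean squared logarithmic error (cf. Eq. 15), for non-negative values."""
    y_target, y_out = np.asarray(y_target), np.asarray(y_out)
    # log1p(x) computes log(x + 1) accurately for small x
    return np.mean((np.log1p(y_target) - np.log1p(y_out)) ** 2)
```

Because of the logarithm, an absolute error of the same size weighs more near zero than at large values, which matches the motivation given in the text.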
As a postprocessing step, we calculate the derivative, thus the reverse procedure of the shown integration. The output values of the used ANN are floats and not integers as assumed in Equation 1. Thus, we interpret the outputs as probabilities of state changes of the devices. In order to reconstruct the power, we allow floats and calculate a weighted sum and not a discrete sum. Therefore, Equation 1 changes as follows:

P(t) = \sum_{i,\tilde{t}\,:\,s_i(\tilde{t}) > 0.1} s_i(\tilde{t})\, l_i(t-\tilde{t}) + \sum_{i,\tilde{t}\,:\,s_i(\tilde{t}) < -0.1} s_i(\tilde{t})\, \mathbb{1}_{(\tilde{t},T)}(t)\, p_i + \epsilon(t)    (16)

with s_i(t) ∈ ℝ. We define a threshold of 0.1 for an element of the prediction to be taken into account for the reconstruction. As always-on-component ε, we give each short term prediction the last measured power value. Thus, for a prediction from t = 0 to t = 900 we set ε = P(−1).
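The thresholding used in Eq. 16 can be sketched as follows (illustrative helper name; the 0.1 threshold follows the text): predicted state-change probabilities with magnitude at or below the threshold are zeroed out before the weighted reconstruction.

```python
import numpy as np

def threshold_states(S_pred, thr=0.1):
    """Keep only predicted state changes with |s| > thr (cf. Eq. 16);
    the surviving float values act as weights in the reconstruction."""
    S = np.asarray(S_pred, dtype=float)
    return np.where(np.abs(S) > thr, S, 0.0)
```

For example, the prediction [0.05, 0.3, −0.2, −0.08] is reduced to [0.0, 0.3, −0.2, 0.0].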
4. Results
In this section we present the results of the application of the developed methods to the above described data set. For the device profile extraction we use the data from January and February 2019. Thereafter, we disaggregate the whole data set. In order to train the forecast algorithm, we use data from January until March 2019. The testing of the forecast algorithm is carried out using the last two days in the data set: 28th and 29th March, 2019. Since the forecast horizon is 15 min, we are able to perform and evaluate 188 single power predictions on the test data set. In order to validate the results of disaggregation and prediction, we use the error measures root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and the percentage energy difference (Energy_E) as in [19].

Figure 8: Clustering of ON-events and OFF-events individually for December 4th, 2018. Presented are two of six power derivative features.

In Figure 8 an exemplary cluster analysis of one day of data (December 4th, 2018) is shown for two of the six elements of ∆P(t_p). ON-events are depicted as well as OFF-events, and the symmetry to the central point zero is clearly visible. It is apparent that the relation of the increase in active and reactive power is not randomly distributed, but forms clusters. ON-events and OFF-events are clustered individually. The cluster-forming behavior becomes clearer taking all six features of the power derivative into account. Therefore, all six features of six exemplary cluster centers are presented in Figure 9. It is visible that the clusters have very distinct characteristics regarding the relation between the six features. Whereas Examples 1, 3 and 5 only show an increase in one phase of power, the other three examples seem to represent three-phase connected devices. They approximately have the same power derivative at an ON-event in all three phases in active and reactive power.
The relation between active and reactive power is very distinct. While Examples 1, 3 and 5 show almost no increase in reactive power when switched on, Examples 4 and 6 have significant reactive power increases. For Example 4, the increase in reactive power is even higher than the increase in active power.

To show the separation of clusters according to their ON-duration, Figure 10 shows two exemplary ON-duration distributions with the respective fitted GMMs. Cluster 15 from Figure 10 gets divided into two groups with approximate ON-durations of 200 s and 1000 s. On the other hand, Cluster 14 gets divided into three groups with approximate ON-durations of 250 s, 900 s and 1900 s.

Figure 9: Six examples of cluster centers and their characteristics in all features of the power derivative.

Figure 10: Two examples of GMMs for ON-duration distributions.

Given the examples for different steps of the developed algorithm, we show four exemplary device profiles in Figure 11. In total, we extracted 52 device profiles from the aggregate power data with the developed algorithm. The depicted profiles are representative for all extracted device profiles since they show the main patterns and behaviors of the device profiles we extracted. Both upper illustrations in Figure 11 show the most frequent type of device profiles: a three-phase connected device with a transient behavior in the beginning and afterwards a stable operating state where the relation between active and reactive power remains approximately the same. Additionally, Device Profiles 4 and 42 show that the relation of active and reactive power is characteristic for the specific device. The length of Profiles 4 and 42 differs as well. Device Profile 6 shows no dynamic behavior in the beginning of the profile, but it consists of a constant component and an oscillating or random component. Obviously, Device Profile 6 represents a single-phase connected device, since all power values other than the active power of one phase are close to zero.
Profile 6 has a high ON-duration d compared to Device Profiles 4 and 42. An exception among the device profiles is represented by Device Profile 37, which shows a decreasing behavior with many little but sharp increases and decreases in all power features. This kind of device profile is least frequent.

Figure 11: Four examples of device profiles extracted unsupervised from the aggregate power signal. Solid lines represent active power and dashed lines represent reactive power.
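A minimal stand-in for the event clustering described above is a plain Lloyd's k-means on six-dimensional event features. The real pipeline clusters the measured ΔP/ΔQ features of all three phases and chooses K via a statistical criterion; here the data and K are synthetic for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's k-means, standing in for the clustering of
    switching events in the six-dimensional power-derivative space."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each event to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned events
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# two synthetic event clusters in a 6-D feature space (ΔP, ΔQ per phase)
X = np.vstack([rng.normal(0.0, 0.1, (40, 6)), rng.normal(5.0, 0.1, (40, 6))])
centers, labels = kmeans(X, 2)
```

In practice a library implementation with multiple restarts would replace this sketch; the point is only that well-separated switching behaviors fall into distinct clusters.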
In total, we disaggregated power data of 119 days from December 2018 to March 2019. Figure 12 shows an exemplary day of data with the sum of active phases on the left and the sum of reactive phases on the right. Beneath, the respective absolute error is shown. It is visible that the PSO is able to reconstruct the shape of the aggregate power signal over the duration of one whole day, including the repetitive patterns during the night and most of the peaks. Nevertheless, there are error peaks of up to almost 20 kW, which corresponds to approximately 25 % of the measured power at the respective time. However, these high error values occur infrequently and are of very short duration. During working time, the absolute error is higher than at night, but there is no constant offset between measured and reconstructed power. The error of the reactive power is larger than the error of the active power. At the end of the presented day, noise is present in the reconstructed power.

Table 3 shows the error values regarding all considered error measures for the working days of March 2019. The mean values of the daily evaluation of the reconstruction after disaggregation are presented as well as the respective standard deviation of the mean. The reproducibility of results is shown by the standard deviations of the mean error values, which are approximately 10 % of the respective mean values.

Figure 12: Disaggregation results for 4th December, 2018. On the left, the sum of active power is shown and on the right side the sum of reactive power. At the bottom, the respective absolute difference between measured and reconstructed power is illustrated.

Table 3: Error characteristics between the sum of active power measured and reconstructed after disaggregation for March 2019. The values are means of daily error evaluations and standard deviations of the mean values.
(March 1st, 2019 – March 31st, 2019)
RMSE [W]: 1565 ± …
Energy E [%]: 0.897 ± …

Given the data produced by the disaggregation, an ANN is trained according to the pre- and postprocessing of the data described in Section 3. The data set for testing the ANN performance consists of the 28th and 29th March, 2019. Therefore, we can calculate 188 power predictions of 15 minutes each for the test data set. The ANN is given the first hour of the test set as input data. To put the results into perspective, we compare the error measures on the test set with error measures for different persistence forecasts. Lastly, we compare the developed short term prediction based on state changes data with prediction results of an ANN mainly based on past power data with a granularity of 5 min from [34]. In [34], the authors used the same data as in this work and optimized a Long Short-Term Memory (LSTM) neural network for a 24-h-day-ahead prediction. Although the prediction horizon and the granularity are different, the power prediction from [34] represents the standard prediction procedure and therefore acts as a benchmark prediction. All error measures are calculated for the sum of active power phases. Table 4 shows the means and standard deviations of multiple error measures for the predictions. The first persistence forecast uses the power values from seven days ago, whereas the second persistence forecast uses the power values of the preceding 15 min. That means, for a prediction from t ... t + 900 s, the power values from t − 900 s ... t are taken. In comparison, Table 5 shows the respective means and standard deviations of the error measures for the ANN predictions based on disaggregation data. The ANN outperforms both persistence forecasts regarding the mean error values of all calculated error measures. Especially, the MAPE and the error in daily consumed energy are significantly smaller.

Table 4: Multiple error measures between measured and predicted power for two different persistence forecasts. Presented are the means and standard deviations of the errors. They are calculated for 188 individual predictions of 15 minutes each for the test data set 28th – 29th March, 2019.

Persistence (7 days before) / Persistence (15 min)
RMSE [W]: 6148 ± … / …
Energy E [%]: 35.25 ± … / …

Table 5: Multiple error measures between measured and predicted power of the described ANN. Presented are the means and standard deviations of the errors. They are calculated for 188 individual predictions of 15 minutes each for the test data set 28th – 29th March, 2019.

Power prediction with ANN
RMSE [W]: 3478 ± …
Energy E [%]: −0.15 ± …

Figure 13: Comparison of 24h-day-ahead power forecast and very short term power prediction based on state forecasts.

In Figure 13, the measured power and both ANN predictions are shown. The 24 h day-ahead prediction is similar to a rolling averaged power value, whereas the short term prediction based on state changes data shows more of the erratic behavior during working time with sharp increases and decreases in the power. For the model of the 24 h day-ahead prediction we can calculate the RMSE and MAE, which results in RMSE = 5124 W and MAE = 4507 W. Both values are significantly higher than the mean error values of the short term prediction of the disaggregation based ANN.
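The two persistence baselines are straightforward to state. A sketch at 1 s resolution, with hypothetical argument names (`history` is the measured power series ending at the prediction start):

```python
import numpy as np

def persistence_forecast(history, horizon=900, lag=7 * 24 * 3600):
    """Persistence baseline at 1 s resolution: predict the power values
    observed `lag` seconds earlier.

    lag = 7 days (default) gives the "7 days before" baseline,
    lag = 900 gives the "preceding 15 min" baseline.
    """
    history = np.asarray(history, dtype=float)
    start = len(history) - lag
    return history[start : start + horizon]
```

With `lag` equal to the horizon, the forecast for t ... t + 900 s is exactly the measured window t − 900 s ... t, as described in the text.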
5. Discussion
For the extraction of device profiles, the main distinguishing factor is the behavior at an ON-event. The used peak criterion to determine the ON-behavior is very simple and neglects peaks with a width of multiple time-steps. This problem could be solved with a more sophisticated peak criterion in future work.

The k-means clustering algorithm is used to determine clusters in the six-dimensional space of reactive and active power phases. Other publications, such as [21, 22], also use a clustering to differentiate between device types, but they use at most two features and not six features as in this work. In general, a clustering is more precise the more characteristic features are present [23]. Thus, we can assume that we reach a higher precision in dividing the events into clusters of device types. Other properties measured by power analyzers could be used additionally to distinguish between different device types in future work. Nevertheless, the number of necessary features should be limited regarding a realistic application in real world energy management systems and the availability of high resolution power analyzers.

During clustering, we assume that the cluster centers of OFF-events are the reversed cluster centers of ON-events. When clustering is performed for ON-events and OFF-events individually, the OFF-event cluster centers with reversed signs lie within a 0.25-σ area of the ON-event cluster centers, with σ denoting the standard deviation of the respective cluster. Therefore, the assumption can be justified. Additionally, Figure 9 shows the symmetry of ON- and OFF-events to the central point.

In order to determine the ON-event behavior, we perform a peak analysis based on the assumption that the switching procedure of a device is finished within one second. In reality, most devices show a transient behavior, mostly in the shape of an exponentially decreasing oscillation [37]. Some publications distinguish between different transient behaviors and thus different device types [38, 39]. But, compared to the measuring frequency, these processes happen on shorter timescales (within milliseconds) and can be neglected here. Only with a measuring frequency in the kHz range would the characteristic transient behavior be observable [37]. However, an exhaustive installation of measuring infrastructure able to measure in kHz is unlikely. Thus, the presented approach using a measuring frequency of 1 Hz is more realistic to be applied in local energy and power management systems. With the measuring frequency of 1 Hz in this work, an ON-event looks approximately like a step function in the aggregate power signal. Nevertheless, in most device profiles in Figure 11, a transient behavior can be observed in the first few seconds of the according profiles. Thereafter, most devices reach a stable state for as long as the profile persists. Thus, the division of the device profiles into a stable state and a dynamic behavior for the stated formulation of the disaggregation problem can be applied here.

The final step of the device profile extraction procedure is the median blending. In general, more samples to perform median blending with result in more precise results. Thus, it is very important to perform the extraction procedure on a sufficient amount of data. Especially devices which are only switched on rarely result in less accurate profiles. The chosen normalization is carried out by means of a division by the maximum power value in every sample of P_norm. With a high base load, this procedure could average out characteristic fluctuations of the device profile. Therefore, another normalization method might be appropriate if the individual profiles are of great importance and an allocation to real measured profiles is of interest. However, we focused on the high quality very short term power prediction with the focus of attention on the aggregate power signal.
Thus, small fluctuations of individual device profiles were of minor importance. The improvement of the median blending procedure or the application of other noise reduction methods for device profile extraction could be examined in future work.

In total, 52 device profiles are extracted for our data set of a commercial consumer. Since no additional knowledge of the used data is present, we cannot validate this number of device profiles. But the reason for this number of profiles is the division by ON-behavior and additionally the division of clusters into groups with similar run-time. Even a simple ohmic consumer type could therefore result in multiple device profiles.

A direct validation of the device profiles was not possible in this work due to a lack of data of the correct device profiles. Additionally, an assignment of extracted device profiles to measured, complex appliance signatures would be difficult since the extracted profiles only represent operational modes of appliances. But the good results in disaggregation and forecast show that the extracted device profiles are a satisfactory representation of the real device profiles.

The extraction procedure has similarities to non-negative blind source separation in acoustics, where the individual components and the mixing procedure are unknown [40]. Since all methods used for the device profile extraction are from statistics and unsupervised machine learning, no hyperparameters have to be optimized to apply the algorithm to a different data set. The needed hyperparameters, such as the number of clusters K or the number of Gaussian distributions in the GMMs, are determined using statistical scores or criteria. Therefore, the device profile extraction algorithm can be applied without changes to other data sets. The transferability has to be examined systematically in the future.

Figure 12 and Table 3 show that the disaggregation of this work reaches a very accurate reconstruction of the measured power.
The results are consistently good in all six phases. Thus, we can assume that the device profiles are a good representation of the real devices, and also the separation according to the ON-event behavior seems valid. Since the PSO is a metaheuristic, incorrect assignments of devices to events are possible. Nevertheless, the disaggregation procedure produces additional knowledge of the building or the respective data set without a costly model building and adaptation to the data. The aim of this work is the use of this additional knowledge for the purpose of a very short term power prediction and the examination whether this additional knowledge provides benefits for such an application. The disaggregation procedure can be justified if a disaggregation based prediction method outperforms classic prediction methods working in the power domain.

The conducted very short term forecast using state changes data shows significantly better results than multiple persistence forecasts and a forecast using an LSTM network which is optimized for 24 hour prediction with a resolution of 5 min. Thus, the LSTM predicts 288 values compared to the 900 of our short term forecast. To be noted is that the maximum achievable accuracy of the predictions is the accuracy of the reconstruction of the disaggregation. Thus, error values smaller than the reconstruction error values can only occur by chance, but not systematically. The used model is a very simple ANN for a high number of input and target features. Thus, further optimization regarding the model of the neural network and maybe the use of LSTM layers or convolutional layers could result in better forecasts. The ANN is optimized for the used data. Therefore, results could be worse when applied to another data set of state changes. The developed forecast does not rely on an exhaustive rollout of measuring infrastructure as in [9] and thus is easily transferable also with limited measuring infrastructure.
Nevertheless, the transferability has to be examined systematically in the future.

It is to be assumed that a certain proportion of state changes of devices during working time is purely coincidental. However, randomness cannot be predicted by any model. In order to assess the chances of success of applying the presented approach to other power data, the randomness of the data could be determined in advance using appropriate methods. For example, the approximate entropy method described in [41] could be used, which has already been applied to, e.g., stock prices in [42]. Additionally, instead of a deterministic prediction, one could perform a probabilistic prediction and/or work with confidence intervals for the predicted power values. This procedure could help in management decision making. In this work, we showed the advantages of the state changes data for power predictions. But the additional knowledge from device profile extraction and disaggregation could also be applied to other tasks like behavioral analysis, state analysis of the building, checking the health status of residents or employees, or giving recommendations for an intelligent power consumption regarding the availability of renewable energy. With more variable market-based electricity tariffs, even new business models would be possible using the presented approach in energy management systems.
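The approximate entropy of [41] is compact enough to sketch. The parameters m (template length) and r (tolerance) are the usual ones; the values used in the example below are illustrative, not taken from the paper:

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy after Pincus [41]: low values indicate a
    regular, predictable series, high values a more random one."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def phi(m):
        # all overlapping templates of length m
        templates = np.array([x[i : i + m] for i in range(n - m + 1)])
        # for each template, count templates within Chebyshev distance r
        counts = np.array([
            np.sum(np.max(np.abs(templates - t), axis=1) <= r)
            for t in templates
        ])
        return np.mean(np.log(counts / (n - m + 1)))

    return phi(m) - phi(m + 1)
```

Applied to a strictly periodic series, the result is close to zero, while an i.i.d. noise series yields a clearly higher value, which is the property that would allow a rough a-priori assessment of predictability.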
6. Conclusions
In this work, we developed an algorithm for extracting device profiles from aggregate power data in six dimensions fully unsupervised. Since the method relies on statistical and unsupervised machine learning methods, it extracts repetitive patterns in the aggregate power data. Therefore, the extracted profiles are not necessarily full appliance signatures but one operational mode of one device. The direct validation of device profiles was not possible due to a lack of measured or correct device profiles. The transferability of the proposed device profile extraction is very high in theory, since no hyperparameters have to be optimized beforehand, but this has to be proven in future work. The disaggregation uses the extracted device profiles and shows a very accurate reconstruction. Thus, the device profiles seem to represent real appliance signatures sufficiently well. As the final application of the conducted NILM approach, the very short term prediction of power outperformed all compared predictions. Although many publications have developed or carried out various NILM algorithms, a broad application of those methods for other purposes is still missing. In this work, we showed the advantages of the additional knowledge of NILM for very short term power predictions. Our results and approaches for predictions could be combined with short term or long term power predictions working directly in the power domain. Especially for energy management systems, such combined and high quality predictions would be very valuable for decision making processes.
Acknowledgments
The authors acknowledge the financial support of the Federal Ministry for Economic Affairs and Energy of the Federal Republic of Germany for the project EG2050: EMGIMO: Neue Energieversorgungskonzepte für Mehr-Mieter-Gewerbeimmobilien (03EGB0004G and 03EGB0004A).
References

[1] G. Strbac and A.M. Khambadkone, "Demand side management: Benefits and challenges", Energy Policy, 36(12) (2008), pp. 4419-4426
[2] D. Tran and A.M. Khambadkone, "Energy management for lifetime extension of energy storage system in micro-grid applications", IEEE Transactions on Smart Grid, 4(3) (2013), pp. 1289-1296
[3] D. Arcos-Aviles, J. Pascual, F. Guinjoan, L. Marroyo, P. Sanchis and M. Marietta, "Low complexity energy management strategy for grid profile smoothing of a residential grid-connected microgrid using generation and demand forecasting", Applied Energy, 205 (2017), pp. 69-84
[4] C. Wan et al., "Photovoltaic and solar power forecasting for smart grid energy management", CSEE Journal of Power and Energy Systems, 1(4) (2015), pp. 38-46
[5] L. Pedersen, J. Stang and R. Ulseth, "Load prediction method for heat and electricity demand in buildings for the purpose of planning for mixed energy distribution systems", Energy and Buildings, 40(7) (2008), pp. 1124-1134
[6] M. Beccali et al., "Forecasting daily urban electric load profiles using artificial neural networks", Energy Conversion and Management, 45(18-19) (2004), pp. 2879-2900
[7] H. Li et al., "A hybrid annual power load forecasting model based on generalized regression neural network with fruit fly optimization algorithm", Knowledge-Based Systems, 37 (2013), pp. 378-387
[8] K. Lang, M. Zhang, Y. Yuan and Y. Xijian, "Short-term load forecasting based on multivariate time series prediction and weighted neural network with random weights and kernels", Cluster Computing, 22 (2019), pp. 12589-12597
[9] A.M. Alonso, F.J. Nogales and C. Ruiz, "A Single Scalable LSTM Model for Short-Term Forecasting of Massive Electricity Time-Series", arXiv preprint arXiv:1910.06640, 2020
[10] Federal Environment Agency, "Evaluation tables for Energy balance of the Federal Republic of Germany 1990 to 2018", 2019
[11] H.S. Hippert, C.E. Pedreira and R.C. Souza, "Neural networks for short-term load forecasting: A review and evaluation", IEEE Transactions on Power Systems, 16(1) (2001), pp. 44-55
[12] M.Q. Raza and A. Khosravi, "A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings", Renewable and Sustainable Energy Reviews, 50 (2015), pp. 1352-1372
[13] G.W. Hart, "Nonintrusive appliance load monitoring", Proceedings of the IEEE, 80 (1992), pp. 1870-1891
[14] A. Faustine et al., "A survey on non-intrusive load monitoring methodies and techniques for energy disaggregation problem", arXiv preprint arXiv:1703.00785, 2017
[15] A. Zoha et al., "Non-intrusive load monitoring approaches for disaggregated energy sensing: A survey", Sensors, 12(12) (2012), pp. 16838-16866
[16] J.Z. Kolter and M.J. Johnson, "REDD: A public data set for energy disaggregation research", Proc. SustKDD Workshop on Data Mining Appl. in Sustain., 2011
[17] S. Welikala et al., "Incorporating appliance usage patterns for non-intrusive load monitoring and load forecasting", IEEE Transactions on Smart Grid, 10(1) (2017), pp. 448-461
[18] M. Wurm and V.C. Coroama, "Grid-level short-term load forecasting based on disaggregated smart meter data", Computer Science - Research and Development, 33(1-2) (2018), pp. 265-266
[19] K. Brucke, S. Arens, J. Telle, S. Schlüters, B. Hanke, K. von Maydell and C. Agert, "Particle Swarm Optimization for Energy Disaggregation in Industrial and Commercial Buildings", arXiv preprint arXiv:2006.12940, 2020
[20] Janitza electronics GmbH, Power Quality Analyser UMG 604-PRO - User manual and technical data, 2017
[21] S.K.K. Ng, J. Liang and J.W.M. Cheng, "Automatic Appliance Load Signature Identification by Statistical Clustering", Hong Kong, pp. 1-6
[22] M. Zeifman, S.R. Shaw and J.L. Kirtley, "Disaggregation of home energy display data using probabilistic approach", IEEE Transactions on Consumer Electronics, 58(1) (2012), pp. 23-31
[23] C.M. Bishop, "K-means Clustering", Pattern Recognition and Machine Learning, Springer, New York, 2006
[24] T. Caliński and J. Harabasz, "A dendrite method for cluster analysis", Communications in Statistics - Theory and Methods, 3(1) (1974), pp. 1-27
[25] R.G. McClarren, "Pearson Correlation", Uncertainty Quantification and Predictive Computational Science, Springer, Cham, Switzerland, 2018
[26] C.M. Bishop, "Mixture of Gaussians", Pattern Recognition and Machine Learning, Springer, New York, 2006
[27] C.M. Bishop, "EM for Gaussian Mixtures", Pattern Recognition and Machine Learning, Springer, New York, 2006
[28] C.M. Bishop, "Model Comparison and BIC", Pattern Recognition and Machine Learning, Springer, New York, 2006
[29] S. Amri, W. Barhoumi and E. Zagrouba, "Unsupervised background reconstruction based on iterative median blending and spatial segmentation", Proceedings of IEEE International Conference on Imaging Systems and Techniques, IST 2010, Thessaloniki, Greece, pp. 411-416
[30] S. Al-Dahidi, O. Ayadi, M. Alrbai and J. Adeeb, "Ensemble approach of optimized artificial neural networks for solar photovoltaic power prediction", IEEE Access, 7 (2019), pp. 81741-81758
[31] T. Kim and S. Cho, "Predicting residential energy consumption using CNN-LSTM neural networks", Energy, 182 (2019), pp. 72-81
[32] A. Torabi, S.A.K. Mousavy, V. Dashti, M. Saeedi and N. Yousefi, "A new prediction model based on cascade NN for wind power prediction", Computational Economics, 182(3) (2019), pp. 1219-1243
[33] F. Chollet, "Keras", 2015, retrieved from https://github.com/fchollet/keras
[34] T. Steens, J. Telle, B. Hanke, K. Maydell, C. Agert, G. di Modica, B. Engelb and M. Grottke, "A Forecast Based Load Management Approach For Commercial Buildings Comparing LSTM And Standardized Load Profile Techniques", arXiv preprint arXiv:2007.06832, 2020
[35] Autonomio Talos [Computer software], 2019, retrieved from http://github.com/autonomio/talos
[36] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research, 15 (2014), pp. 1929-1958
[37] G. Balzer and C. Neumann, "Switching operations [in German]", Switching and balancing operations in electrical networks [in German], Springer, Berlin, Germany, 2018
[38] S.B. Leeb, S.R. Shaw and J.L. Kirtley, "Transient event detection in spectral envelope estimates for nonintrusive load monitoring", IEEE Transactions on Power Delivery, 10(3) (1995), pp. 1200-1210
[39] S.B. Leeb, S.R. Shaw and J.L. Kirtley, "Load identification in nonintrusive load monitoring using steady-state and turn-on transient energy algorithms", The 2010 14th International Conference on Computer Supported Cooperative Work in Design, Shanghai, China, pp. 27-32
[40] M. Pal, R. Roy, J. Basu and M.S. Bepari, "Blind source separation: A review and analysis", Gurgaon, 2013, pp. 1-5
[41] S. Pincus, "Approximate entropy as a measure of system complexity", Proceedings of the National Academy of Sciences, 88(6) (1991), pp. 2297-2301
[42] A. Delgado-Bonal, "Quantifying the randomness of the stock markets", Scientific Reports, 9(1) (2019), pp. 1-11