Using an Ancillary Neural Network to Capture Weekends and Holidays in an Adjoint Neural Network Architecture for Intelligent Building Management
Zhicheng Ding, Mehmet Kerem Turkcan, and Albert Boulanger
Abstract—The US EIA estimated that in 2017 about 39% of total U.S. energy consumption was by the residential and commercial sectors. Therefore, Intelligent Building Management (IBM) solutions that minimize consumption while maintaining tenant comfort are an important component in reducing energy consumption. A forecasting capability that accurately predicts indoor temperatures over a planning horizon of 24 hours is essential to IBM. It should predict the indoor temperature accurately at both short-term (e.g., 15 minutes) and long-term (e.g., 24 hours) horizons, including on weekends, major holidays, and minor holidays. Other requirements include the ability to predict the maximum and minimum indoor temperatures precisely and to provide a confidence for each prediction. To meet these requirements, we propose a novel adjoint neural network architecture for time series prediction that uses an ancillary neural network to capture weekend and holiday information. We studied four long short-term memory (LSTM) based time series prediction networks within this architecture. We observed that the ancillary neural network improves the prediction accuracy, the maximum and minimum temperature prediction, and the model reliability for all networks tested.

Index Terms—energy consumption, adjoint neural network, intelligent building management, time series prediction, multiple-steps ahead prediction.
I. INTRODUCTION

With rapid population growth, the total energy consumption in both residential and commercial buildings has increased and threatens the environment [1]. Developing an energy-saving IBM system helps to mitigate this problem. Predicting indoor temperature is key to building such systems. This is a difficult task because energy consumption is influenced by many different factors such as occupancy, outside weather, solar load, and the building's construction [2]. As a result, IBM has become a popular research topic [3] in recent years.

Recent literature on IBM focuses on models utilizing artificial neural networks (ANNs), since such approaches are powerful in modeling nonlinear problems that are hard to model with other machine learning approaches [4]. Recurrent neural networks (RNNs) have shown remarkable predictive performance when trained with large time series datasets [5]. RNNs work well for indoor temperature forecasting because the indoor temperature exhibits time-related patterns, including daily, weekly, monthly, seasonal, and yearly patterns. A number of practical studies demonstrate the efficacy of ANNs for IBM. For example, an ANN model was used to predict the air temperature of buildings and was found to exhibit the best performance [6]. Another ANN model built for indoor air temperature forecasting outperformed competing regression methods [7].

However, most of these studies utilize a single ANN model [3], and only a few studies that combine two or more networks or models have so far been proposed. In addition, only a small set of studies provide the confidence of the model, which is important for evaluating a model's reliability [8]. Moreover, the model should predict the indoor temperature over both short-term and long-term horizons [9]. Short-term predictions provide relatively precise results to the IBM, whereas long-term predictions offer an overview to the system, giving the IBM enough foresight to optimize current actions and handle temperature control challenges later in the day.

Inspired by ensemble methods and dropout inference [10], [11], we propose a novel adjoint neural network architecture that uses an ancillary neural network to capture weekends and holidays to increase the indoor temperature prediction accuracy for IBM. This architecture also combines the advantages of LSTMs and multilayer feed-forward neural networks [12]. Our proposed method addresses the aforementioned issues and outperforms the prediction of a single model. It also provides predictions for every 15 minutes of the next 24 hours with 68% and 95% confidence intervals (CIs). The main contributions of this paper are summarized as follows.

• We propose an adjoint neural network architecture that uses an ancillary neural network to capture weekends and holidays to increase the accuracy of indoor temperature predictions. The architecture enables predictions of up to 96 steps (24 hours) ahead and provides a confidence for every prediction.

• We conduct a comprehensive analysis and comparison by applying our proposed architecture to four popular time series prediction models. The comparison covers three metrics (average prediction error, the error of max/min temperature prediction, and the reliability of the prediction) in both one-step ahead and multiple-steps ahead prediction.
Fig. 1. The architecture of the proposed neural network, which includes three subnetworks. A) A main neural network whose inputs are critical features and whose outputs are the predictions for each input. It includes an LSTM and a multilayer feed-forward neural network. B) An ancillary neural network in which extra features are utilized. This module includes a relatively shallow neural network. C) The model weighting part. We use the weighted average of ŷ_main and ŷ_anc to forecast the output. The output includes predictions of up to 96 steps (24 hours) into the future. In our setting, we use a 2-layer LSTM unit, a 7-layer multilayer feed-forward neural network, and a 4-layer shallow neural network. In total, the model contains 9,349,056 trainable parameters.

II. RELATED WORK
A. Artificial Neural Networks
ANNs have achieved great success in many different fields [13]. A typical neural network contains many artificial neurons, known as units. There are three types of units [14]: input units, hidden units, and output units. As more hidden layers are employed, the neural network can learn deeper abstractions, functioning more like an artificial brain [3]. Techniques to effectively train ANNs with more than three layers have become standard.

Deep Neural Networks (DNNs) are neural networks with more than one hidden layer, and they usually work better than shallow neural networks [15] for a particular task. DNNs have been used in many different domains. In the image processing domain, DNNs were used to learn graph representations [16] and estimate human pose [17]. In the traffic domain, DNNs were used to classify traffic signs [18] and predict traffic flow [19]. In our IBM domain, DNNs have helped predict indoor temperature [20] and energy consumption [21]. In recent years, other ANN designs have become commonly used; the LSTM and Bayesian Neural Network (BNN) designs are relevant and are reviewed below.
B. Long Short-Term Memory
LSTM networks [22] have become popular in recent years for time series prediction [8]. LSTMs are a special kind of RNN designed to solve the long-term memory problem, and they have been applied to many time series problems. For example, networks using LSTMs have been successfully employed in a number of important problems such as speech recognition [23], machine translation [24], and energy load forecasting [25].

In this paper, we use a densely connected multilayer feed-forward network after an LSTM encoder. The multilayer feed-forward neural network utilizes the temporal information for a robust prediction: we take advantage of the LSTM encoder for extracting temporal information and feed its final states to the multilayer feed-forward network.

III. METHODOLOGY
The complete proposed architecture is shown in Figure 1. The architecture contains three parts: A) the main neural network (an LSTM followed by a multilayer feed-forward network), which aims to learn temporal patterns from significant features; B) an ancillary neural network, which takes in extra features regarding weekends and holidays (using an LSTM followed by a relatively shallow multilayer feed-forward network); and C) the combiner part, whose outputs are the weighted average of the predictions from parts (A) and (B). The output includes the forecast one step ahead (the next 15 minutes) and multiple steps ahead (up to 96 steps, 24 hours in total). After that, we add a dropout unit to infer the CIs of the result. We discuss each module and the CI derivation in detail below.
A. Main Neural Network
This module aims to learn from the important features. These features usually change over time; examples include outside temperature, occupancy, and date. These features also exhibit daily, weekly, monthly, seasonal, and yearly patterns. Thus, we construct the data as a continuous time series input. These data pass to the LSTM units, which learn temporal information and construct internal states. The internal states are further propagated to a 7-layer feed-forward network that forecasts the one-step ahead and 96-steps ahead indoor temperatures ŷ_main.

Fig. 2. Sample of one-step ahead prediction. The 68% and 95% CIs are shown as gray and light gray bands respectively. The black solid line represents the ground truth indoor temperature. The gray dashed line denotes the prediction of the LSTM-DNN. The indoor temperature follows a different pattern on the holiday than on regular weekdays. For the data used in our work, Columbus Day fell on October 13th.

Fig. 3. Comparison of one-step ahead prediction by the LSTM-DNN without the ancillary network and the adjoint LSTM-DNN. The light gray dashed line represents the prediction by the LSTM-DNN without the ancillary neural network, and the gray dashed line denotes the prediction by the adjoint LSTM-DNN. The indoor temperature follows a different pattern on the holiday than on regular weekdays. For the data used in our work, Columbus Day fell on October 13th.

Given a dataset that includes S timestamps, the model needs to predict the next n timestamps, and each timestamp has m features. Thus, we construct the input data as a three-dimensional matrix of size S × n × m. We use a 2-layer LSTM network to extract temporal information and then feed that information to a 7-layer feed-forward neural network. The network then forecasts the indoor temperature ŷ_main, which contains the 96-steps ahead predictions. Since the duration of each time step is 15 minutes, the 96-steps ahead predictions cover every 15 minutes of the next 24 hours.
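A minimal PyTorch sketch of the main network described above is given below. Only the layer counts (a 2-layer LSTM, a 7-layer feed-forward head), the input shape (n timestamps × m features), and the 96-step horizon come from the paper; the hidden widths and dropout rate are our assumptions.

```python
import torch
import torch.nn as nn

class MainNet(nn.Module):
    """Sketch: 2-layer LSTM encoder + 7-layer feed-forward head that
    emits one prediction per future timestamp (96 steps = 24 hours)."""
    def __init__(self, m_features, hidden=128, horizon=96):
        super().__init__()
        self.lstm = nn.LSTM(m_features, hidden, num_layers=2, batch_first=True)
        layers = []
        for _ in range(6):                          # 6 hidden layers ...
            layers += [nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.1)]
        layers.append(nn.Linear(hidden, horizon))   # ... + 1 output layer = 7
        self.head = nn.Sequential(*layers)

    def forward(self, x):                           # x: (batch, n, m_features)
        _, (h_n, _) = self.lstm(x)                  # final states encode the window
        return self.head(h_n[-1])                   # y_hat_main: (batch, 96)
```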
B. Ancillary Neural Network

In the ancillary neural network, we aim to utilize extra features such as weekend and holiday indicators. These features are ancillary because, with enough data, this information could be learned by the main neural network itself from the significant features it is fed. However, explicitly providing these features helps increase the performance of the model, especially when the dataset is small. The values of these extra features are either 0 or 1, indicating whether a timestamp falls on a weekend or a holiday. To reduce model complexity, these extra features are learned by a relatively shallow neural network (a 4-layer feed-forward neural network). The size and dimension of the output ŷ_anc are exactly the same as those of the output from the main neural network.

Fig. 4. One-step ahead prediction error: RMSE (a), MAE (b), and MAPE (c) of all predictions.
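A minimal sketch of the ancillary network described above follows the 4-layer feed-forward description of this subsection (the hidden width is an assumption, and the output shape matches ŷ_main):

```python
import torch.nn as nn

class AncillaryNet(nn.Module):
    """Sketch: a relatively shallow 4-layer feed-forward network over
    0/1 weekend and holiday indicators, producing an output with the
    same shape as y_hat_main."""
    def __init__(self, n_indicators, hidden=32, horizon=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_indicators, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),          # 4 linear layers in total
        )

    def forward(self, flags):                    # flags: (batch, n_indicators), 0/1
        return self.net(flags)                   # y_hat_anc: (batch, 96)
```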
C. Model Weighting

After the main neural network and the ancillary neural network are fully trained, we use a weighted average [26] with the rectifier (ReLU) [27] as the activation function to forecast the output:

$$\hat{y} = \mathrm{ReLU}\left(w_1 \hat{y}_{\mathrm{main}} + w_2 \hat{y}_{\mathrm{anc}}\right) \qquad (1)$$

where w_1 and w_2 are the combination weights, and ReLU is an activation function that is computationally efficient and less prone to vanishing gradients.

The output of the weighted average ŷ includes multiple-step predictions with the same dimensions as the outputs ŷ_main and ŷ_anc of the main and ancillary neural network modules. We minimize the root mean squared error (RMSE) between all these values and the corresponding ground truth values. For this purpose, the ground truth indoor temperatures over multiple timestamps are prepared as a three-dimensional matrix.

Fig. 5. One-step ahead prediction error: RMSE (a), MAE (b), and MAPE (c) of max/min temperature predictions.
The output layer provides one output for each of the 96 timestamps. Timestamps closer to the forecast time are usually predicted more accurately than later timestamps. The later-timestamp forecasts provide an overview of how the indoor temperature is going to change and allow actions to be taken ahead of time [28] for better anticipatory HVAC control of the building. This helps provide the desired temperatures with less energy consumed, since less radical (more planned) actions are taken.
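A minimal sketch of the weighting part in Eq. (1) is shown below. The paper does not state whether the weights are scalar or per-step, so learned scalars are our assumption:

```python
import torch
import torch.nn as nn

class Combiner(nn.Module):
    """Sketch of Eq. (1): y = ReLU(w1 * y_main + w2 * y_anc),
    with w1 and w2 learned scalars (an assumption)."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.tensor(0.5))
        self.w2 = nn.Parameter(torch.tensor(0.5))

    def forward(self, y_main, y_anc):
        return torch.relu(self.w1 * y_main + self.w2 * y_anc)

# Training minimizes RMSE against the ground truth, e.g.:
# loss = torch.sqrt(nn.functional.mse_loss(combiner(y_main, y_anc), y_true))
```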
D. Deriving Confidence Intervals
After the model is fully trained, we need to derive the CI of the model's output. We use MC dropout as proposed in [29], which provides a framework to estimate uncertainty without any change to the existing model. Specifically, we add stochastic dropout to each hidden layer of the neural network and repeatedly sample the model's output [10]. We then estimate the uncertainty from the sample variance, assuming that the indoor temperature approximately follows a Gaussian distribution.

Fig. 6. The monthly probability of one-step ahead predictions that are not within the 68% CI (a) and the 95% CI (b).

Fig. 7. Sample prediction for the 96-steps ahead (24 hours) scenario. The 68% and 95% CIs are shown as gray and light gray bands respectively. The black solid line denotes the ground truth indoor temperature, and the gray dashed line represents the prediction by the adjoint model.

In this paper, we derive the 68% and 95% CIs and use them to evaluate the reliability of the model. Given the sample data, we can calculate the mean μ and standard deviation σ. Then the 100·(1 − α)% CI [30] is

$$\left[\,\mu - t_{1-\alpha/2}\cdot\frac{\sigma}{\sqrt{n}},\;\; \mu + t_{1-\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\,\right] \qquad (2)$$

where n is the number of samples and t_{1−α/2} denotes the corresponding quantile of the Student's t distribution.
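A minimal sketch of this MC dropout procedure, mirroring Eq. (2), is given below (the function name and the use of `scipy.stats` are our choices; n = 10,000 matches the sampling count reported in the experimental settings):

```python
import torch
from scipy import stats

def mc_dropout_ci(model, x, n=10_000, alpha=0.32):
    """Run n stochastic forward passes with dropout left active, then
    build a (1 - alpha) interval from the sample mean and standard
    deviation per Eq. (2). alpha=0.32 gives the 68% CI, 0.05 the 95% CI."""
    model.train()                                # keep dropout stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n)])
    mu = samples.mean(dim=0)
    sigma = samples.std(dim=0)
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)     # t quantile from Eq. (2)
    half_width = t * sigma / (n ** 0.5)
    return mu - half_width, mu + half_width      # lower and upper CI bounds
```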
IV. EXPERIMENTS

In this section, we first test our proposed architecture with the LSTM-DNN [31] network in both one-step ahead (15 minutes) prediction and 96-steps ahead (24 hours) prediction. We then run the same tests on three other popular time series prediction models: LSTM RNN [22], LSTM encoder-decoder [32], and LSTM encoder w/predictnet [8]. We compare not only the error (RMSE, MAE, and MAPE) of all predictions but also the error in predicting the maximum and minimum temperatures. In addition, we evaluate the reliability of the models by calculating the probability of the ground truth temperature lying within the 68% and 95% CIs respectively.
A. Setting
The data come from a multistory building located in New York City (NYC). Data are collected by the Building Management System (BMS) every 15 minutes. The temperature predicted is for one sensor on one floor of the building. Total building occupancy data are also collected. In addition, we collect outside weather information from the Central Park NOAA weather station, including wind speed and direction, humidity and dew point, pressure, weather status (fog, rain, snow, hail, thunder, and tornado), and temperature. The analyzed data span June 9th, 2012 to November 16th, 2014 (84,768 timestamps). We used 67,814 of the timestamps for training and 16,954 for testing, a ratio of 8:2, and then split off 20% of the training timestamps for validation. After the neural networks are trained, we add a dropout layer and repeatedly sample 10,000 times. Finally, we derive the 68% and 95% CIs from the sampled outputs.
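The split described above can be sketched as follows (we assume a chronological split, which the paper does not state explicitly; the counts are taken from the text):

```python
import numpy as np

# 84,768 timestamps in total: 67,814 for training and 16,954 for
# testing (8:2), with 20% of the training portion held out for validation.
indices = np.arange(84_768)
train_idx, test_idx = indices[:67_814], indices[67_814:]
n_val = int(0.2 * len(train_idx))               # 13,562 validation timestamps
train_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]
```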
B. One-Step Ahead Prediction Evaluation
We first compare the capacity to capture time series patterns between the LSTM-DNN model without the ancillary network and the LSTM-DNN model using the adjoint architecture. We then evaluate the improvement from using our proposed architecture from three different perspectives: the error of all predictions, the error in predicting the max/min temperatures, and model reliability based on the CIs.
Capacity of Capturing the Time Series Pattern: To retrieve the one-step ahead data from the output consisting of 96-steps ahead predictions, we extract the first step of the output temperature for each input and splice these together. Figure 2 shows sample predictions using our proposed architecture. The plot includes the one-step ahead prediction with the 68% and 95% CIs, shown as gray and light gray bands respectively. The black solid line represents the ground truth indoor temperature, and the gray dashed line is the prediction. We note that the model learns the daily and weekly patterns, as the temperature pattern differs between weekdays and weekends. However, we also find that the CI bands are wide at the maximum and minimum temperatures.

To evaluate the improvement in prediction from using the adjoint neural network, we plot the predictions (without CIs) of the LSTM-DNN without the ancillary network, the adjoint LSTM-DNN, and the ground truth indoor temperature in the same plot. Figure 3 shows selected one-step ahead prediction results from September 22nd to October 22nd. We observe that both models capture the daily, weekly, and holiday patterns well, but the adjoint model makes better predictions, especially when predicting the maximum and minimum temperatures. Next, we compare the model using our proposed architecture with the model without the ancillary network from these three perspectives.
Fig. 8. Comparison of multi-step ahead prediction results between the LSTM-DNN without the ancillary network and the adjoint LSTM-DNN: 96-steps ahead prediction on (a) June 19th, (b) July 2nd, (c) July 17th, and (d) September 18th. The light gray dashed line represents the prediction by the LSTM-DNN without the ancillary neural network, and the gray dashed line denotes the predictions by the adjoint LSTM-DNN.

TABLE I
ONE-STEP AHEAD PREDICTION COMPARISON BETWEEN THE MODELS USING THE ADJOINT ARCHITECTURE AND THE MODELS WITHOUT THE ANCILLARY NETWORK

Models                              | All Predictions      | Max/Min Predictions  | GT not within CI
                                    | RMSE   MAE    MAPE   | RMSE   MAE    MAPE   | 68%      95%
LSTM-RNN                            | 2.96   1.33   1.73   | 4.23   1.87   1.69   | 24.79%   15.43%
LSTM-RNN (adjoint)                  | 2.65   1.20   1.55   | 2.82   1.23   1.84   | 20.13%   11.69%
LSTM-encoder-decoder                | 4.22   1.77   2.32   | 6.97   3.11   4.05   | 43.70%   19.19%
LSTM-encoder-decoder (adjoint)      | 1.06   0.63   0.83   | 3.25   1.45   1.83   | 23.80%   13.56%
LSTM-encoder w/predictnet           | 1.12   0.66   0.86   | 3.58   1.54   1.95   | 41.75%   20.85%
LSTM-encoder w/predictnet (adjoint) | 0.83   0.60   0.75   | 2.28   1.31   1.61   | 19.82%   13.82%
LSTM-DNN                            | 1.32   0.76   0.91   | 3.84   1.79   2.21   | 19.95%   13.18%
LSTM-DNN (adjoint)                  | 0.57   0.39   0.61   | 1.39   0.78   1.26   | 9.84%    7.15%
1) Error of All Predictions:
First, we calculated the monthly error of all predictions. Specifically, we calculated the monthly RMSE, MAE, and MAPE using all timestamps (every 15 minutes) for the months June through November. We then compared the errors of the LSTM-DNN without the ancillary network and the adjoint LSTM-DNN. Figure 4 illustrates the comparison of the errors (RMSE, MAE, and MAPE) of the two models. The model using our proposed architecture has a much smaller error than the LSTM-DNN without the ancillary network.
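A minimal sketch of this monthly evaluation (the function name and signature are our choices):

```python
import numpy as np

def monthly_errors(y_true, y_pred, month_of):
    """RMSE, MAE, and MAPE per calendar month, as in Fig. 4;
    `month_of` holds the month index of every timestamp."""
    results = {}
    for m in np.unique(month_of):
        t, p = y_true[month_of == m], y_pred[month_of == m]
        rmse = np.sqrt(np.mean((t - p) ** 2))
        mae = np.mean(np.abs(t - p))
        mape = np.mean(np.abs((t - p) / t)) * 100   # assumes t != 0
        results[m] = (rmse, mae, mape)
    return results
```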
2) Error of Max/Min Predictions:
In addition, we evaluated the improvement from using our proposed architecture to predict the maximum and minimum temperatures. To begin with, we found the timestamps of the maximum and minimum temperatures of each day. Next, we calculated the RMSE, MAE, and MAPE from 30 minutes (2 timestamps) before to 30 minutes after the timestamp of the maximum or minimum temperature. Last, we aggregated the errors by month. Figure 5 illustrates that, using our proposed architecture, the model has a much better capacity for predicting the maximum and minimum temperatures.
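A sketch of this max/min window evaluation, under the assumption that the series contains whole days of 96 timestamps:

```python
import numpy as np

def extreme_window_error(y_true, y_pred, steps_per_day=96, window=2):
    """Errors around each day's maximum and minimum temperature: score
    the window from 2 timestamps (30 minutes) before to 2 after the
    extreme, as described above. Returns RMSE and MAE."""
    residuals = []
    days = y_true.reshape(-1, steps_per_day)        # assumes whole days
    for d, day in enumerate(days):
        for idx in (int(day.argmax()), int(day.argmin())):
            lo = d * steps_per_day + max(0, idx - window)
            hi = d * steps_per_day + min(steps_per_day, idx + window + 1)
            residuals.append(y_true[lo:hi] - y_pred[lo:hi])
    r = np.concatenate(residuals)
    return np.sqrt(np.mean(r ** 2)), np.mean(np.abs(r))
```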
3) Reliability of Predictions:
Last, we evaluated the reliability of the models by calculating the probability that the ground truth indoor temperatures are not within the 68% and 95% CIs respectively, as shown in Figure 6. We note that both models behave similarly over the months, but the probability for the model using our proposed architecture is always lower than for the LSTM-DNN without the ancillary network. Therefore, using the adjoint network increases the reliability of the model.

The same analysis was then conducted for the three other base models (LSTM RNN, LSTM encoder-decoder, and LSTM encoder w/predictnet) within the adjoint neural network architecture and without it.

To sum up, Table I compares one-step ahead predictions between the models with and without the ancillary network. It shows that the adjoint neural network architecture decreases the error and increases the reliability of the models. We also observe that our proposed LSTM-DNN model, which is not the best model without the ancillary network, becomes the best model when used within our adjoint neural network architecture.
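The reliability metric itself reduces to a simple miss-rate computation; a minimal sketch:

```python
import numpy as np

def ci_miss_rate(y_true, lower, upper):
    """Fraction of ground-truth temperatures outside a CI, as in
    Figs. 6 and 11; lower values indicate a more reliable model."""
    outside = (y_true < lower) | (y_true > upper)
    return float(np.mean(outside))
```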
TABLE II
MULTI-STEP AHEAD PREDICTION COMPARISON BETWEEN THE MODELS USING THE ADJOINT ARCHITECTURE AND THE MODELS WITHOUT THE ANCILLARY NETWORK

Models                              | All Predictions      | Max/Min Predictions  | GT not within CI
                                    | RMSE   MAE    MAPE   | RMSE   MAE    MAPE   | 68%      95%
LSTM-RNN                            | 3.83   1.70   2.20   | 5.58   2.59   3.39   | 33.68%   25.08%
LSTM-RNN (adjoint)                  | 2.93   1.39   1.85   | 4.38   2.09   2.31   | 24.05%   11.66%
LSTM-encoder-decoder                | 6.21   2.23   2.91   | 8.03   3.41   3.77   | 47.71%   21.68%
LSTM-encoder-decoder (adjoint)      | 2.02   0.99   1.29   | 4.98   2.25   2.92   | 26.31%   15.70%
LSTM-encoder w/predictnet           | 2.07   1.02   1.32   | 4.86   2.26   2.94   | 47.78%   24.22%
LSTM-encoder w/predictnet (adjoint) | 1.72   0.99   1.28   | 4.24   2.04   2.65   | 19.94%   15.94%
LSTM-DNN                            | 2.38   1.19   1.43   | 5.37   2.40   2.92   | 28.79%   18.87%
LSTM-DNN (adjoint)                  | 1.40   0.76   0.88   | 4.19   1.88   2.28   | 14.15%   11.18%
Fig. 9. Multi-step ahead prediction error: RMSE (a), MAE (b), and MAPE (c) of all predictions.
C. Multiple Timestamps Evaluation
In this section, we evaluate the prediction errors for 96 steps ahead. Each step represents 15 minutes, so 96 steps cover the predictions for 24 hours. We first compare the models' performance with and without our proposed architecture in capturing the daily pattern. Then we compare the two models from the same three perspectives.
Capacity of Capturing the Time Series Pattern: Figure 7 shows a sample 96-steps ahead (24 hours) prediction. At midnight, the indoor temperature is high since the HVAC system is shut down at that hour. During the daytime, the HVAC system is operating and the indoor temperature is lower. As people leave the office, the building operators ramp down and later turn off the HVAC system, and the temperature rises again.

Fig. 10. Multi-step ahead prediction error: RMSE (a), MAE (b), and MAPE (c) of max/min temperature predictions.

Fig. 11. The monthly probability of multi-step ahead predictions that are not within the 68% CI (a) and the 95% CI (b).

To evaluate the improvement in prediction from using our proposed architecture, we plot the predictions (without CIs) of the LSTM-DNN without the ancillary network, the adjoint LSTM-DNN, and the ground truth indoor temperature in the same plot, as shown in Figure 8. We observe that the adjoint neural network fits the 24-hour prediction better, especially when predicting the maximum and minimum indoor temperatures. Next, we compare the model using our proposed architecture with the model without the ancillary network from three perspectives.
1) Error of All Predictions:
First, we calculated the monthly error of all predictions. Specifically, we calculated the monthly RMSE, MAE, and MAPE using the 96-steps ahead (24 hours) predictions at all timestamps in each month. Next, we compared the errors of the LSTM-DNN without the ancillary network and the LSTM-DNN using our proposed architecture. Figure 9 shows the comparison of the two models. Though the errors are higher than for one-step ahead prediction, the model using our proposed architecture still outperforms the model without the ancillary network.
2) Error of Max/Min Predictions:
Second, we evaluated the predictions of the maximum and minimum temperatures. To begin with, we found the timestamps of the daily maximum and minimum temperatures. Then we calculated the RMSE, MAE, and MAPE from 30 minutes before to 30 minutes after the time of the maximum or minimum temperature. Finally, we aggregated the errors by month. Figure 10 illustrates that the model using our proposed architecture has the better capacity to predict the maximum and minimum temperatures.
3) Reliability of Predictions:
Third, we evaluated the reliability of the models by calculating the probability that the ground truth indoor temperatures fall outside the 68% CI and the 95% CI respectively; the lower this probability, the more reliable the model. Figure 11 shows how the probability changes from June to November. We note that the models change fairly similarly over this period, but the adjoint neural network architecture is more reliable than the LSTM-DNN without the ancillary network.

The same analysis was then conducted for the three other base models (LSTM RNN, LSTM encoder-decoder, and LSTM encoder w/predictnet) within the adjoint neural network architecture and without it.

To sum up, Table II compares the multi-step ahead predictions between the models using our proposed architecture and the corresponding models without the ancillary network. It shows that the ancillary neural network also decreases the error and increases the reliability of the models in multi-step ahead forecasting. The interesting finding that the LSTM-DNN model, which is not the best model on its own, becomes the best model within our adjoint neural network architecture also holds in the multi-step ahead forecast.

V. CONCLUSIONS
In this paper, we propose a novel adjoint neural network architecture for time series prediction that uses an ancillary neural network to capture weekend and holiday information for IBM. Using a dataset from a multistory building in NYC, we compared four different base models (LSTM RNN, LSTM encoder-decoder, LSTM encoder w/predictnet, and our proposed LSTM-DNN model) within the adjoint architecture against the corresponding models without the ancillary network. The models' performance was evaluated in both one-step ahead prediction (15 minutes) and 96-steps ahead prediction (24 hours) from three perspectives. First, we compared the overall RMSE, MAE, and MAPE across all predictions. Second, we compared the error (RMSE, MAE, and MAPE) in predicting the maximum and minimum temperatures. Third, we compared the reliability of the models by calculating the probability that the ground truth indoor temperatures are not within the CIs.

From the one-step ahead prediction results (Figure 3), the models using an ancillary neural network successfully capture the daily and weekly patterns and the holidays, and they produce better predictions than the models without the ancillary network. From the multi-step ahead prediction results (Figure 7), the models using an ancillary network capture the daily patterns better, especially when predicting the maximum and minimum temperatures.

Our adjoint neural network architecture with the LSTM-DNN model could be used to build a more dependable IBM system. With accountable predicted indoor temperatures, the system can provide comfortable indoor temperatures with less energy consumed.

REFERENCES

[1] L. Pérez-Lombard, J. Ortiz, and C. Pout, "A review on buildings energy consumption information," Energy and Buildings, 2008.
[2] J. Romero, J. Navarro-Esbrí, and J. Belman-Flores, "A simplified black-box model oriented to chilled water temperature control in a variable speed vapour compression system," Applied Thermal Engineering, 2011.
[3] Z. Wang and R. S. Srinivasan, "A review of artificial intelligence based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models," Renewable and Sustainable Energy Reviews, vol. 75, pp. 796-808, 2017. [Online]. Available: http://dx.doi.org/10.1016/j.rser.2016.10.079
[4] H. Hippert, C. Pedreira, and R. Souza, "Neural networks for short-term load forecasting: a review and evaluation," IEEE Transactions on Power Systems, 2001.
[5] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," Interspeech, 2010.
[6] L. Mba, P. Meukam, and A. Kemajou, "Application of artificial neural network for predicting hourly indoor air temperature and relative humidity in modern building in humid region," Energy and Buildings, 2016.
[7] A. Ashtiani, P. A. Mirzaei, and F. Haghighat, "Indoor thermal condition in urban heat island: Comparison of the artificial neural network and regression methods prediction," Energy and Buildings, 2014.
[8] L. Zhu and N. Laptev, "Deep and confident prediction for time series at Uber," IEEE International Conference on Data Mining Workshops (ICDMW), pp. 103-110, 2017.
[9] A. Marvuglia, A. Messineo, and G. Nicolosi, "Coupling a neural network temperature predictor and a fuzzy logic controller to perform thermal comfort regulation in an office building," Building and Environment, 2014.
[10] Y. Li and Y. Gal, "Dropout inference in Bayesian neural networks with alpha-divergences," ICML, 2017.
[11] Y. Gal, "Uncertainty in deep learning," PhD thesis, 2016.
[12] D. Svozil, V. Kvasnička, and J. Pospíchal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, 1997.
[13] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech and Language Processing, 2012.
[14] Y. A. LeCun, Y. Bengio, and G. E. Hinton, "Deep learning," Nature, 2015.
[15] I. Goodfellow, Y. Bengio, and A. Courville, "Deep Learning," MIT Press, 2016.
[16] S. Cao, W. Lu, and Q. Xu, "Deep neural networks for learning graph representations," AAAI, 2016.
[17] A. Toshev and C. Szegedy, "DeepPose: Human pose estimation via deep neural networks," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[18] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, "Multi-column deep neural network for traffic sign classification," Neural Networks, 2012.
[19] W. Huang, G. Song, H. Hong, and K. Xie, "Deep architecture for traffic flow prediction: Deep belief networks with multitask learning," IEEE Transactions on Intelligent Transportation Systems, 2014.
[20] P. Romeu, F. Zamora-Martínez, P. Botella-Rocamora, and J. Pardo, "Time-series forecasting of indoor temperature using pre-trained deep neural networks," Lecture Notes in Computer Science, 2013.
[21] S. Kalogirou, "Artificial neural networks for the prediction of the energy consumption of a passive solar building," Energy, 2000.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 1997.
[23] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," ICASSP, 2016.
[24] Y. Cui, S. Wang, J. Li, and Y. Wang, "LSTM neural reordering feature for statistical machine translation," NAACL-HLT, 2016.
[25] D. L. Marino, K. Amarasinghe, and M. Manic, "Building energy load forecasting using deep neural networks," IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, 2016.
[26] X. Qiu, L. Zhang, Y. Ren, P. Suganthan, and G. Amaratunga, "Ensemble deep learning for regression and time series forecasting," IEEE Symposium Series on Computational Intelligence (SSCI), 2014.
[27] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," AISTATS, 2011.
[28] F. J. Chang, Y. M. Chiang, and L. C. Chang, "Multi-step-ahead neural networks for flood forecasting," Hydrological Sciences Journal, 2007.
[29] Y. Gal and Z. Ghahramani, "A theoretically grounded application of dropout in recurrent neural networks," NIPS, 2015. [Online]. Available: http://arxiv.org/abs/1512.05287
[30] M. J. Penciana and R. B. D'Agostino, "Overall C as a measure of discrimination in survival analysis: Model specific population value and confidence interval estimation," Statistics in Medicine, 2004.
[31] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," ICASSP, 2015.
[32] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," 2014.