Study of Short-Term Personalized Glucose Predictive Models on Type-1 Diabetic Children
Maxime De Bois
Université Paris Saclay, CNRS-LIMSI
Orsay, France
[email protected]
Mounˆım A. El Yacoubi
Télécom SudParis, Université Paris Saclay, SAMOVAR, CNRS
Évry, France
[email protected]
Mehdi Ammi
Université Paris 8, Dept. of Computer Science
Saint-Denis, France
[email protected]
Abstract—Research in diabetes, especially when it comes to building data-driven models to forecast future glucose values, is hindered by the sensitive nature of the data. Because researchers do not share the same data between studies, progress is hard to assess. This paper aims at comparing the most promising algorithms in the field, namely Feedforward Neural Networks (FFNN), Long Short-Term Memory (LSTM) Recurrent Neural Networks, Extreme Learning Machines (ELM), Support Vector Regression (SVR) and Gaussian Processes (GP). They are personalized and trained on a population of 10 virtual children from the Type 1 Diabetes Metabolic Simulator software to predict future glucose values at a prediction horizon of 30 minutes. The performances of the models are evaluated using the Root Mean Squared Error (RMSE) and the Continuous Glucose-Error Grid Analysis (CG-EGA). While most of the models end up having low RMSE, the GP model with a Dot-Product kernel (GP-DP), a novel usage in the context of glucose prediction, has the lowest. Despite having good RMSE values, we show that the models do not necessarily exhibit a good clinical acceptability, measured by the CG-EGA. Only the LSTM, SVR and GP-DP models have overall acceptable results, each of them performing best in one of the glycemia regions.
Index Terms—Glucose prediction, Feedforward Neural Network, Recurrent Neural Network, Long Short-Term Memory, Extreme Learning Machine, Support Vector Regression, Gaussian Process
I. INTRODUCTION
With 1.5 million deaths attributed to it in 2012 [1], diabetes is one of the most threatening diseases of the modern world. Because of the non-secretion of insulin (type-1 diabetes) or the body's resistance to its action (type-2 diabetes), diabetic people have trouble managing their blood glucose. When their blood glucose falls too low, diabetic people are said to be in a state of hypoglycemia, while, on the other hand, when it gets too high, we talk about hyperglycemia. Because hypoglycemia and hyperglycemia have, respectively, short-term (e.g., exhaustion, coma, death) and long-term (e.g., cardiovascular diseases, blindness) complications, diabetic people need to maintain their blood glucose within an acceptable range.

Big advances have been made in recent years to help diabetic people in their daily life. Continuous glucose monitoring (CGM) devices, such as the FreeStyle Libre [2], make it possible to track the glucose level throughout days and nights without having to rely on the inconvenient use of lancets. Besides, in combination with CGM devices, we are witnessing a rise of coaching applications for diabetes such as mySugr [3], which has been approved by the Food & Drug Administration (FDA) in the United States. Furthermore, bariatric surgery has been shown to induce a 10-year remission rate of type-2 diabetes of 36% [4]. Finally, since 2016, the first artificial pancreas, the MiniMed 670G, has been available in the United States [5].

One of the biggest areas of research interest is the prediction of future glucose values. For a diabetic patient, accurately knowing his future glycemia is undoubtedly valuable, as it can be used to avoid getting into the hypo-/hyperglycemia ranges by modifying his behavior (e.g., by taking insulin shots or by eating sugar).

Currently, the focus of glucose predictive models is heavily in favor of data-driven techniques, where the patient's past glucose, carbohydrate (CHO) intake and insulin injection values are used to forecast future glucose values.
While the autoregressive (AR) model and its different variations are the models most traditionally used in the field [6], they have fallen out of favor for more complex regression models. In particular, Zecchin et al. showed that using meal information improves the forecasting of glucose with a Feedforward Neural Network (FFNN) [7]. Recurrent Neural Networks (RNN) are a class of artificial neural networks (ANN) made for time-series forecasting and have been used by Daskalaki et al. to build a hypoglycemia early-warning system [8]. While RNN present some limitations (e.g., the vanishing gradient problem), novel types of RNN cells have been engineered and recently tried out to predict future glucose values, such as the Long Short-Term Memory (LSTM) unit by Mirshekarian et al. [9]. The Extreme Learning Machine (ELM) is another type of ANN that is quite popular nowadays thanks to its ability to provide relatively good generalization with close to no training time [10], [11].

Meanwhile, models that use the kernel method, also known as the kernel trick, to map the initial space of observations into a higher-dimensional space, have shown interesting results when used to predict future glucose values. Georga et al. and Khan et al. investigated the usability of Support Vector Regression (SVR) in forecasting blood glucose [12], [13]. Besides, De Paula et al. used Reinforcement Learning alongside Gaussian Processes (GP) to predict future glucose levels and include the predictions in a decision-based system aiming at regulating blood glucose [14].

However, those advances are hindered by several factors. Because of their sensitive nature, diabetes-related data used in studies are not made available to other researchers. This leads every research team to collect its own data and build their studies around them. Since most of the studies do not share the same data, they cannot be compared to each other. The field needs comparative studies that give objective insights into the performances of the models.
In 2012, Daskalaki et al. led a study aiming at comparing two AR models and a FFNN, which ends up outperforming the former [15]. Meanwhile, in 2015, Zarkogianni et al. compared four different models they had investigated in previous studies, namely a FFNN model, a linear regression model, a Self-Organizing Map and a neuro-fuzzy network with wavelets [16]. Finally, also in 2015, Georga et al. evaluated hybrid glucose predictive models that combine regression models (SVR or GP) and feature ranking algorithms [17].

Nowadays, it is still unclear how the most trending models relate to each other in terms of performances. In this study, we compare six of the most promising glucose predictive models, namely a FFNN, a RNN with LSTM units, an ELM neural network, a SVR model and two GP models. To address this goal, we first describe the data flow, from its simulation using the Type 1 Diabetes Metabolic Simulator software to the implementation of the models. Then, we discuss the results of the models obtained by evaluating them with two different metrics. Finally, we conclude by providing our takeaways and some guidelines for future studies.

II. METHODS
This section presents the whole methodology that has been used to compare the predictive models. First, we explain how we obtain our experimental data. Then, we go through the preprocessing of the data and the building of the predictive models. Finally, we discuss the evaluation metrics that have been used in this study. Figure 1 illustrates this methodology.
A. Data Simulation
The T1DMS software [18] is a type-1 virtual diabetic patient simulator that has been accepted as a substitute to clinical testing by the FDA [19] and that has been extensively used in the glucose prediction research field [7], [14], [20]–[22].

In this study, 10 in-silico type-1 diabetic children (representing the most challenging diabetic population) underwent the following open-loop experiment:
• Three meals are taken daily; the CHO quantities (in g) are drawn from normal distributions centered at 40 g, 85 g and 60 g, with a variance of 0.5 times the mean quantity.
• An insulin bolus is taken at the start of a meal. The value of the bolus is taken uniformly between 0.7 and 1.3 times the patient's optimal bolus given his carbohydrate-to-insulin ratio.
• Basal insulin is constant and optimal (computed by the simulator).

Fig. 1. Data flow diagram, from its simulation to the evaluation of the models.

Similar scenarios have been used in the past few years [7], [15], [22]. The major difference between our simulation and others is the length of the simulation. While most simulations last only a few days, we simulated 29 days of data. This serves the purpose of enhancing the generalization of the models by avoiding overfitting.

In the end, for every patient, the simulation outputs three different time-series sampled once per minute: glucose values, carbohydrate intakes and insulin boluses over time.
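The meal and bolus sampling of the scenario above can be sketched as follows. This is an illustrative reconstruction, not T1DMS code: the names (`sample_day`, `MEAL_MEANS_G`) and the per-meal optimal bolus values passed in are assumptions.

```python
import numpy as np

# Hypothetical daily-scenario sampler mirroring the open-loop protocol:
# CHO quantities drawn from normal distributions (variance = 0.5 * mean),
# boluses scaled uniformly in [0.7, 1.3] of the optimal dose.
MEAL_MEANS_G = [40.0, 85.0, 60.0]  # breakfast, lunch, dinner (grams of CHO)

def sample_day(optimal_bolus_per_meal, rng):
    meals, boluses = [], []
    for mean_g, opt_bolus in zip(MEAL_MEANS_G, optimal_bolus_per_meal):
        cho = rng.normal(loc=mean_g, scale=np.sqrt(0.5 * mean_g))
        scale = rng.uniform(0.7, 1.3)  # perturbation around the optimal bolus
        meals.append(max(cho, 0.0))   # clip negative draws to zero
        boluses.append(scale * opt_bolus)
    return meals, boluses

rng = np.random.default_rng(42)
meals, boluses = sample_day([4.0, 8.5, 6.0], rng)  # illustrative optimal boluses
```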
B. Data Preprocessing
1) Data Rearrangement:
First, we divided the 29-day-long time-series into day-long subsets. Then, we expanded every daily subset with the data of the previous day to account for the prediction horizon (30 minutes) and the data history (60 minutes) used in the models. Finally, we discarded the first day, as it is mostly used to warm up the simulator. We end up having 28 subsets of 1530-sample-long time-series. Figure 2 illustrates this process.
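A minimal sketch of this rearrangement, assuming one sample per minute and a synthetic stand-in series: each of the 28 kept days is extended backwards by the 90 samples (60 of history + 30 of horizon) of the previous day, yielding 1530-sample subsets.

```python
import numpy as np

MIN_PER_DAY = 1440
HISTORY, HORIZON = 60, 30       # minutes
PAD = HISTORY + HORIZON         # extra samples borrowed from the previous day

# Stand-in for one simulated per-minute signal over 29 days
series = np.arange(29 * MIN_PER_DAY, dtype=float)

# Day 1 is discarded (simulator warm-up); each remaining day keeps the last
# PAD samples of the day before it: 28 subsets of 1440 + 90 = 1530 samples.
subsets = [
    series[day * MIN_PER_DAY - PAD : (day + 1) * MIN_PER_DAY]
    for day in range(1, 29)
]
```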
2) 3-way Data Splitting:
Half of the subsets, the training set, is used to train the models. A quarter, the evaluation set, is used to tune the hyperparameters of the models. Finally,
Fig. 2. Data rearrangement: from the 29-day-long time-series, 28 subsets are created.

the remaining quarter, the testing set, is used to evaluate the final model. We note that, even though this kind of splitting is a common procedure in Machine Learning, it has not been used in the past in glucose prediction (a simpler split into training and testing sets is relatively common, though).
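The 14/7/7 split of the 28 subsets can be sketched as follows. The use of a random permutation to assign subsets is an assumption; the paper does not specify the assignment rule.

```python
import numpy as np

subsets = list(range(28))            # stand-ins for the 28 day-long subsets
rng = np.random.default_rng(0)
order = rng.permutation(len(subsets))

train = [subsets[i] for i in order[:14]]         # half: training
evaluation = [subsets[i] for i in order[14:21]]  # quarter: hyperparameter tuning
test = [subsets[i] for i in order[21:]]          # quarter: final evaluation
```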
3) Standardization:
The training set is standardized (zero mean and unit variance) and the same transformation is then applied to the evaluation and testing sets.
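A minimal NumPy sketch of this fit-on-train, apply-everywhere standardization:

```python
import numpy as np

def fit_standardizer(train):
    """Learn mean/std on the training set; reuse them on any other set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    return lambda x: (x - mu) / sigma

train = np.array([[1.0, 10.0], [3.0, 30.0]])
test = np.array([[2.0, 20.0]])

standardize = fit_standardizer(train)
z_train = standardize(train)   # zero mean, unit variance by construction
z_test = standardize(test)     # same transform, NOT refitted on the test set
```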
C. Glucose Predictive Models
This subsection goes through all the models that have been implemented. The training and optimization of the models have been personalized to each patient.
1) Feedforward Neural Network (FFNN):
The network is made of 4 hidden layers of respectively 128, 64, 32 and 16 neurons. Apart from the last neuron, whose activation function is linear, every neuron uses the Scaled Exponential Linear Unit (SELU), which is the Exponential Linear Unit (ELU) with an optimized value of α [23]. All the weights are initialized using a coarse model trained on one of the children. This serves the purpose of speeding up the training process for the remaining children. The model is fine-tuned to the given patient using the Adam optimizer with the Mean Squared Error loss function, mini-batches of size 1500, an initial learning rate of − and a decay of −. To avoid the slight overfitting encountered during the training of the models, we add an L2 penalty to the weights and make use of early stopping. Alpha-dropout (specific to the use of the SELU activation function) has been tried out with no perceptible improvement. Also, the SELU activation function, not used so far in glucose prediction studies, has proven to work really well, with better training time and performances compared to more classical activation functions (e.g., tanh, ReLU).
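The SELU activation mentioned above is easy to state explicitly; the constants below are the fixed λ and α values derived by Klambauer et al. [23].

```python
import numpy as np

# SELU constants from the self-normalizing networks paper (Klambauer et al.)
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772

def selu(x):
    """selu(x) = lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```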
2) Long Short-Term Memory Recurrent Neural Network (LSTM):
The LSTM network is made of one recurrent layer of 64 LSTM units. The network has been unrolled 60 times to account for a history of 60 minutes. As for the FFNN model, the weights are initialized to coarse values fitted on one of the children. The model is then fine-tuned using the Adam optimizer with the Mean Squared Error loss function, mini-batches of size 500 and an initial learning rate of −. As for the amount of regularization used during the training, we added an L2 penalty to the weights ( − ), used early stopping and recurrent dropout (rate of 0.5). Since an increase in the number of hidden layers or hidden neurons did not yield better results, we stuck to a rather simple network.
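Unrolling the network 60 times amounts to shaping each input as a 60-step history window. A sketch of this shaping (the three-feature layout and the helper name `make_sequences` are assumptions, not the paper's code):

```python
import numpy as np

HISTORY, HORIZON = 60, 30  # minutes of input history, prediction horizon

def make_sequences(glucose, cho, insulin):
    """Build (n, HISTORY, 3) inputs with glucose targets HORIZON steps ahead."""
    features = np.stack([glucose, cho, insulin], axis=-1)
    X, y = [], []
    for t in range(HISTORY, len(glucose) - HORIZON):
        X.append(features[t - HISTORY : t])   # the 60 past minutes
        y.append(glucose[t + HORIZON])        # glucose 30 minutes later
    return np.array(X), np.array(y)

n = 200
X, y = make_sequences(np.arange(n, dtype=float), np.zeros(n), np.zeros(n))
```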
3) Extreme Learning Machine (ELM):
ELM networks are quite simple to tune, as we only need to adjust the number of neurons in the single hidden layer and their activation function [24]. The logistic activation function seemed to be the one that worked best for us. We applied an L2 penalty (100.0) to the weights to reduce the impact of overfitting. While continuously adding more neurons improved the performances, we chose to stop at 20160 neurons (which is the number of training samples), as the increase in performance was not significant afterwards.
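An ELM of this kind reduces to random hidden weights plus a ridge-regularized least-squares solve for the output weights. A minimal sketch with far fewer neurons than the 20160 used in the study:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, y, n_hidden, l2=100.0, rng=None):
    """Single-hidden-layer ELM: random input weights, ridge-solved output weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = logistic(X @ W + b)                      # hidden-layer activations
    # Output weights via regularized least squares (the L2 penalty)
    beta = np.linalg.solve(H.T @ H + l2 * np.eye(n_hidden), H.T @ y)
    return lambda Xnew: logistic(Xnew @ W + b) @ beta

X = np.random.default_rng(1).normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
predict = train_elm(X, y, n_hidden=50, l2=0.1)
p = predict(X)
```

The only trained quantity is `beta`; this closed-form solve is what gives the ELM its near-zero training time.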
4) Support Vector Regression (SVR):
The SVR model has been implemented using the radial basis function (RBF) kernel (defined in Equation 1, with x and x′ being two input vectors). The kernel's coefficient (γ) has been personalized and optimized by grid search from the initial [10−, ] range. The parameter ε models the ε-tube within which no penalty is associated with the training loss function. While the penalty is also personalized and optimized within a specific range ([10−, −]), ε has been fixed to −. Lower values of ε made the model unable to fit the training data, while greater values yielded worse results.

k(x, x′) = exp(−γ ‖x − x′‖²)  (1)
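A hedged sketch of this personalization using scikit-learn's `SVR` and `GridSearchCV`. The grid and the fixed `epsilon` below are illustrative, since the exact ranges are only partially legible in the source, and the synthetic data is a stand-in for a patient's features.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Stand-in regression problem (real inputs would be glucose/CHO/insulin history)
X = np.random.default_rng(0).normal(size=(80, 4))
y = X[:, 0] - 2.0 * X[:, 1]

# Per-patient tuning: gamma and the penalty C searched, epsilon held fixed
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=1e-2),
    param_grid={"gamma": [1e-3, 1e-2, 1e-1], "C": [1.0, 10.0, 100.0]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
y_hat = grid.predict(X)
```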
5) Gaussian Process (GP) Regression:
As for the SVR model, GP models are traditionally used with an RBF kernel in glucose prediction studies [17]. However, we found out that using a dot-product (DP) kernel (defined in Equation 2, with x and x′ being two input vectors) was far more effective. Therefore, we implement both versions of Gaussian Process regression: one with an RBF kernel (GP-RBF) and one with a DP kernel (GP-DP).

k(x, x′) = σ₀² + x · x′  (2)

The RBF kernel has been fixed with a value of γ of 0.5, as changing the value did not impact the results. As for the DP kernel, the value of σ₀ has also been fixed, for the same reason. In order to help our models fit the training data, we added noise to the observations, represented by the value α that is added to the diagonal of the kernel matrix during fitting. More noise implies an easier fit but also worse results, so we personalized and optimized its value in the [10−, ] range.
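A sketch of the GP-DP setup using scikit-learn's `GaussianProcessRegressor`, whose `alpha` parameter plays exactly the role of the diagonal noise term described above. The synthetic data, the `sigma_0` value and the `alpha` value are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct

# Stand-in linear regression problem
X = np.random.default_rng(0).normal(size=(60, 3))
y = X @ np.array([0.5, -1.0, 2.0])

# GP-DP: dot-product kernel k(x, x') = sigma_0^2 + x . x';
# `alpha` adds noise to the kernel diagonal during fitting
gp = GaussianProcessRegressor(kernel=DotProduct(sigma_0=1.0), alpha=1e-2)
gp.fit(X, y)
mean, std = gp.predict(X, return_std=True)
```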
D. Performance Metrics

We evaluate the performances of the models using nested cross-validation [25], doing permutations of the training, evaluation and testing set splits (described in Section II-B2). A first cross-validation between the training and evaluation sets is used to tune the hyperparameters of the models. Then, the tuned and fitted models are evaluated with a second round of cross-validation with the testing set.

A lot of different evaluation metrics have been used throughout the years [26]. We have settled on the two most significant ones: the Root Mean Squared Error (RMSE) and the Continuous Glucose-Error Grid Analysis (CG-EGA).
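The permutation scheme can be sketched as rotations of the 28 subsets. The specific pattern below (4 disjoint outer testing quarters, 3 disjoint inner evaluation quarters, keeping the 14/7/7 fold sizes) is an assumption about how the permutations were organized.

```python
# Nested cross-validation sketch over the 28 day-long subsets
subsets = list(range(28))

def rotate(items, k):
    return items[k:] + items[:k]

test_sets = []
for outer in range(4):                      # outer loop: rotate the testing quarter
    fold = rotate(subsets, 7 * outer)
    test, rest = fold[:7], fold[7:]
    for inner in range(3):                  # inner loop: rotate the evaluation quarter
        inner_fold = rotate(rest, 7 * inner)
        evaluation, train = inner_fold[:7], inner_fold[7:]
        # ... tune hyperparameters: fit on `train`, score on `evaluation` ...
        assert len(train) == 14 and len(evaluation) == 7
    # ... refit the tuned model on `rest`, score on `test` ...
    test_sets.append(test)
```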
1) Root Mean Squared Error:
The RMSE (defined in Equation 3, with y being the true values and ŷ the predicted values) is the standard metric to measure the performance of a glucose predictive model. It has the advantage of being a single-value metric, making comparisons between models straightforward. It can also be used as the loss function during the training stage (e.g., when training neural networks). And, compared to other similar metrics (e.g., the Mean Absolute Error), it penalizes bigger errors more, which is preferable in glucose prediction since big errors, even when they are rare, can have disastrous consequences.

RMSE(y, ŷ) = √( (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² )  (3)
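Equation 3 translates directly into code:

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error: sqrt of the mean squared residual."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```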
2) Continuous Glucose-Error Grid Analysis:
The CG-EGA, initially introduced to measure the clinical acceptability of Continuous Glucose Monitoring (CGM) devices [27], sees a lot of use in the evaluation of glucose predictive models [26]. It is made of two different evaluation grids: the Point-Error Grid Analysis (P-EGA) and the Rate-Error Grid Analysis (R-EGA). With the P-EGA, depending on the true glucose value, predictions are assigned to clinical acceptability zones, from A to E (i.e., good to bad). As for the R-EGA, the idea is the same, but we focus on rates of change instead of the point values themselves. The CG-EGA is simply the Cartesian product of the P-EGA and the R-EGA. In order to interpret the CG-EGA, it is simplified by giving each cell a measure of its clinical acceptability, depending on the true state of the patient's glycemia. A cell can then contain either accurate predictions (AP), benign errors (BE) or erroneous predictions (EP).
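Structurally, the simplified CG-EGA is a lookup from (P-EGA zone, R-EGA zone) pairs to AP/BE/EP labels, selected by the patient's true glycemia region. The sketch below shows only this structure; the lookup table is hypothetical and far smaller than the real grids defined by Kovatchev et al. [27], and the default-to-EP rule is an assumption.

```python
# HYPOTHETICAL simplified lookup: the real CG-EGA defines many more
# (P-EGA zone, R-EGA zone) cells per glycemia region.
HYPOTHETICAL_LOOKUP = {
    "hypoglycemia":  {("A", "A"): "AP", ("A", "B"): "BE", ("D", "C"): "EP"},
    "euglycemia":    {("A", "A"): "AP", ("B", "B"): "BE", ("C", "E"): "EP"},
    "hyperglycemia": {("A", "A"): "AP", ("B", "A"): "BE", ("E", "D"): "EP"},
}

def region(true_glucose_mgdl):
    """Glycemia region from the true glucose value (70/180 mg/dL thresholds)."""
    if true_glucose_mgdl < 70:
        return "hypoglycemia"
    return "euglycemia" if true_glucose_mgdl <= 180 else "hyperglycemia"

def classify(true_glucose_mgdl, p_zone, r_zone):
    """Map a (P-EGA, R-EGA) cell to AP/BE/EP within the true glycemia region."""
    table = HYPOTHETICAL_LOOKUP[region(true_glucose_mgdl)]
    return table.get((p_zone, r_zone), "EP")  # unlisted cells treated as errors here

label = classify(120.0, "A", "A")
```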
III. RESULTS
The experimental results are reported in Table I, which depicts, in terms of average RMSE and CG-EGA across the children, the performance of the six models described in Section II-C.

With the biggest RMSE and, in most glycemia regions of the CG-EGA, the biggest amount of EP, the ELM model comes out as the worst in our study.

In terms of RMSE, the SVR, FFNN and GP-DP models stand out from the other models by making predictions closest to the actual glucose values, with the GP-DP model being the best of the three.

As for the clinical acceptability of the models, the conclusions depend on the glycemia range. In the euglycemia range (where the true glucose value is between 70 mg/dL and 180 mg/dL), all the models manage to make acceptable predictions, with a minimum of 0.09% (SVR and LSTM) and a maximum of 1% (ELM) of EP. However, in the hypoglycemia range (true value below 70 mg/dL), the ELM and GP-RBF models show clinically unacceptable results, with a significant number of EP. In the hyperglycemia range, the ELM and FFNN models also show unacceptable results, with a high number of EP and BE. Overall, the LSTM, SVR and GP-DP models show stable, good results across the whole CG-EGA table.

Figure 3 gives the reader the opportunity to visualize the predictions of the models against the true glucose values for one of the children during a specific day, starting at 0h00 and ending at 23h59. The three peaks in the graph that extend into the hyperglycemia range represent the postprandial rise of glucose.

IV. DISCUSSION
By showing high EP in at least one CG-EGA range, the ELM, the GP-RBF, and the FFNN to some extent, do not make safe predictions for diabetic patients. However, there is no clear winner among the remaining three models. While GP-DP presents the best RMSE and the best CG-EGA in the hypoglycemia range, it is surpassed by the LSTM and SVR models in the remaining areas of the CG-EGA, with the two being, respectively, the best model in the euglycemia range and the best model in the hyperglycemia range. We also note that the GP-DP model has generally higher AP standard deviation values than the other models in the CG-EGA. With higher standard deviations, the GP-DP results are shown to be less stable across the diabetic children population, which is not preferable considering the potential future use of such predictive models in the whole real diabetic population.

From a completely different perspective, our study shows the usefulness of using the CG-EGA to assess the performances of the models rather than relying solely on the RMSE metric. To illustrate this idea, we can compare the results of the FFNN and LSTM models. As we can see in Table I, while the FFNN model has the second lowest RMSE, its CG-EGA results in the hyperglycemia range make it clinically
TABLE I
PERFORMANCES OF GLUCOSE PREDICTION MODELS WITH MEAN AND STANDARD DEVIATION ACROSS THE CHILDREN.

Models | RMSE         | Hypoglycemia            | Euglycemia           | Hyperglycemia
       |              | AP      BE     EP       | AP     BE     EP     | AP     BE     EP
ELM    | 11.24 (2.74) | (4.24)  (0.11) (4.30)   | (2.94) (2.90) (0.58) | (5.09) (4.07) (1.84)
GP-RBF |  7.84 (2.51) | (12.23) (0.08) (12.26)  | (1.43) (1.42) (0.05) | (1.62) (1.29) (0.42)
LSTM   |  7.08 (1.53) | (0.72)  (0.12) (0.71)   | (1.28) (1.27) (0.02) | (1.07) (0.94) (0.20)
SVR    |  5.92 (2.19) | (0.59)  (0.08) (0.55)   | (1.55) (1.54) (0.03) | (1.65) (1.49) (0.18)
FFNN   |  5.43 (1.84) | (0.51)  (0.07) (0.49)   | (2.15) (2.09) (0.17) | (2.25) (1.87) (0.54)
GP-DP  |  5.16 (1.96) | (1.66)  (0.05) (0.45)   | (1.66) (1.57) (0.14) | (3.40) (2.77) (0.75)

AP: Accurate Prediction (in %); BE: Benign Error (in %); EP: Erroneous Prediction (in %)
Fig. 3. Daily glucose predictions as a function of time for a child during a specific day.

unacceptable. If we look into the details of the FFNN's CG-EGA (Figure 4), compared to those of the LSTM model (Figure 5), from the same child during the same day, we can understand why. In both the P-EGA and the R-EGA figures, a perfect prediction is represented by the line in the middle of the A zones. In both P-EGA grids, while most predictions are in the A zone, we can see that the FFNN predictions are closer to the true values than those made by the LSTM model (this difference is reflected by the difference in RMSE between the two models). On the other hand, the R-EGA figures show that the predicted rates of change of the FFNN model are much more spread out inside the grid. For a model to have overall good CG-EGA results, it needs to perform well in the P-EGA and the R-EGA grids at the same time. The FFNN model, while being one of the best point-predictive models (in RMSE or P-EGA), has trouble estimating precisely and consistently the glucose variations. We think that the LSTM model manages to have good CG-EGA results, despite not having one of the best RMSE, thanks to the inherent nature of the algorithm: RNN, especially those based on LSTM units, make use of the sequential nature of the data to remember important observations and compute coherent predictions.

Finally, the results highlight some limitations of the CG-EGA. First, as it is not usually trained on (given its complex nature), algorithms that are only trained to compute good point predictions (e.g., FFNN) may not pass the clinical acceptability test, because it also involves rate-of-change predictions. Second, the CG-EGA fails at discriminating between models that have more or less the same results (namely LSTM, SVR and GP-DP). We should note that, rather than from the CG-EGA itself, this comes from the common simplification made of it (the AP, BE and EP categories).

V. CONCLUSION
In this paper, we studied six of the most trending and promising glucose predictive models. We compared a FFNN, a LSTM RNN, an ELM neural network, a SVR model, and two GP models, one with a RBF kernel and the other with a DP kernel. While the RMSE has been used to measure the accuracy of the predictions, the CG-EGA has been used to provide a measure of the clinical acceptability of the models. The GP-DP model is a novel improvement over GP models, traditionally used with a RBF kernel in the context of glucose prediction.

The analysis of the results showed that only the LSTM, SVR and GP-DP models have overall satisfactory results, each of them having its own strength. In particular, while the GP-DP model presents the best RMSE as well as the best clinical acceptability in the hypoglycemia range, the LSTM and SVR models excel, respectively, in the euglycemia range and in the hyperglycemia range.

Besides, we highlighted the limitations of the evaluation methodology currently used in the field of glucose prediction. While the CG-EGA covers the RMSE's weakness by providing a way of evaluating the clinical acceptability of the models, it is not perfect, as it cannot be trained on and, given the common simplification used to report the results, cannot help discriminate between the best models.

This study led us to identify new approaches to tackle the problem of predicting future glucose values of diabetic patients, such as improving the way we evaluate the models or combining them into a single predictor (through fusion algorithms, for instance) that would take advantage of their different strengths.

Finally, in the future, we aim at conducting this study on the other diabetic populations (i.e., adolescents, adults) and for longer prediction horizons (e.g., 60 or 120 minutes) to see if we can generalize our results and findings.

ACKNOWLEDGMENT
This work is supported by the "IDI 2017" project funded by the IDEX Paris-Saclay, ANR-11-IDEX-0003-02.

REFERENCES

[1] World Health Organization et al., Global report on diabetes. World Health Organization, 2016.
[2] A. F. Ólafsdóttir, S. Attvall, U. Sandgren, S. Dahlqvist, A. Pivodic, S. Skrtic, E. Theodorsson, and M. Lind, "A clinical trial of the accuracy and treatment experience of the flash glucose monitor FreeStyle Libre in adults with type 1 diabetes," Diabetes Technology & Therapeutics, vol. 19, no. 3, pp. 164–172, 2017.
[3] K. J. Rose, M. Koenig, and F. Wiesbauer, "Evaluating success for behavioral change in diabetes via mHealth and gamification: mySugr's keys to retention and patient engagement," Diabetes Technology & Therapeutics, vol. 15, p. A114, 2013.
[4] A. P. Courcoulas, S. Z. Yanovski, D. Bonds, et al., "Long-term outcomes of bariatric surgery: A National Institutes of Health symposium," JAMA Surgery, vol. 149, no. 12, pp. 1323–1329, 2014. [Online]. Available: http://bit.ly/2RjhTsH
[5] U.S. Food & Drug Administration. MiniMed 670G system - P160017/S017. [Online]. Available: http://bit.ly/2Jnwhxs
[6] G. Sparacino, F. Zanderigo, S. Corazza, A. Maran, A. Facchinetti, and C. Cobelli, "Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series," IEEE Transactions on Biomedical Engineering, vol. 54, no. 5, pp. 931–937, 2007.
[7] C. Zecchin, A. Facchinetti, G. Sparacino, G. De Nicolao, and C. Cobelli, "Neural network incorporating meal information improves accuracy of short-time prediction of glucose concentration," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1550–1560, 2012.
[8] E. Daskalaki, K. Nørgaard, T. Züger, A. Prountzou, P. Diem, and S. Mougiakakou, "An early warning system for hypoglycemic/hyperglycemic events based on fusion of adaptive prediction models," Journal of Diabetes Science and Technology, vol. 7, no. 3, pp. 689–698, 2013.
[9] S. Mirshekarian, R. Bunescu, C. Marling, and F. Schwartz, "Using LSTMs to learn physiological models of blood glucose behavior," in Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International Conference of the IEEE. IEEE, 2017, pp. 2887–2891.
[10] K. Assadi, T. Hamdi, F. Fnaiech, J. M. Ginoux, and E. Moreau, "Estimation of blood glucose levels techniques," in Smart, Monitored and Controlled Cities (SM2C), 2017 International Conference on. IEEE, 2017, pp. 75–80.
[11] M. V. Jankovic, S. Mosimann, L. Bally, C. Stettler, and S. Mougiakakou, "Deep prediction model: The case of online adaptive prediction of subcutaneous glucose," in , Nov 2016, pp. 1–5.
[12] E. I. Georga, V. C. Protopappas, D. Ardigò, M. Marina, I. Zavaroni, D. Polyzos, and D. I. Fotiadis, "Multivariate prediction of subcutaneous glucose concentration in type 1 diabetes patients based on support vector regression," IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 1, pp. 71–81, 2013.
AP: Accurate Prediction; BE: Benign Error; EP: Erroneous Prediction
Fig. 4. P-EGA and R-EGA of the FFNN model of a child for a specific day. For every prediction, AP, BE and EP markers are computed by combining the P-EGA and the R-EGA into the CG-EGA.
AP: Accurate Prediction; BE: Benign Error; EP: Erroneous Prediction
Fig. 5. P-EGA and R-EGA of the LSTM model of a child for a specific day. For every prediction, AP, BE and EP markers are computed by combining the P-EGA and the R-EGA into the CG-EGA.

[13] T. Khan, M. Masud, and K. A. Mamun, "Methods to predict blood glucose level for type 2 diabetes patients," in Humanitarian Technology Conference (R10-HTC), 2017 IEEE Region 10. IEEE, 2017, pp. 392–395.
[14] M. De Paula, L. O. Ávila, and E. C. Martínez, "Controlling blood glucose variability under uncertainty using reinforcement learning and Gaussian processes," Applied Soft Computing, vol. 35, pp. 310–332, 2015.
[15] E. Daskalaki, A. Prountzou, P. Diem, and S. G. Mougiakakou, "Real-time adaptive models for the personalized prediction of glycemic profile in type 1 diabetes patients," Diabetes Technology & Therapeutics, vol. 14, no. 2, pp. 168–174, 2012.
[16] K. Zarkogianni, K. Mitsis, E. Litsa, M.-T. Arredondo, G. Fico, A. Fioravanti, and K. S. Nikita, "Comparative assessment of glucose prediction models for patients with type 1 diabetes mellitus applying sensors for glucose and physical activity monitoring," Medical & Biological Engineering & Computing, vol. 53, no. 12, pp. 1333–1343, 2015.
[17] E. I. Georga, V. C. Protopappas, D. Polyzos, and D. I. Fotiadis, "Evaluation of short-term predictors of glucose concentration in type 1 diabetes combining feature ranking with regression models," Medical & Biological Engineering & Computing, vol. 53, no. 12, pp. 1305–1318, 2015.
[18] The Epsilon Group. T1DMS - a groundbreaking tool for type 1 diabetes treatment R&D. [Online]. Available: https://tegvirginia.com/software/t1dms/
[19] C. Dalla Man, F. Micheletto, D. Lv, M. Breton, B. Kovatchev, and C. Cobelli, "The UVA/PADOVA type 1 diabetes simulator: new features," Journal of Diabetes Science and Technology, vol. 8, no. 1, pp. 26–34, 2014.
[20] H. Zhao, C. Zhao, C. Yu, and E. Dassau, "Multiple order model migration and optimal model selection for online glucose prediction in type 1 diabetes," AIChE Journal, vol. 64, no. 3, pp. 822–834, 2018.
[21] I. Contreras, S. Oviedo, M. Vettoretti, R. Visentin, and J. Vehí, "Personalized blood glucose prediction: A hybrid approach using grammatical evolution and physiological models," PloS one, vol. 12, no. 11, p. e0187754, 2017.
[22] K. Zarkogianni, A. Vazeou, S. G. Mougiakakou, A. Prountzou, and K. S. Nikita, "An insulin infusion advisory system based on autotuning nonlinear model-predictive control," IEEE Transactions on Biomedical Engineering, vol. 58, no. 9, pp. 2467–2477, 2011.
[23] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Advances in Neural Information Processing Systems, 2017, pp. 971–980.
[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.
[25] D. Krstajic, L. J. Buturovic, D. E. Leahy, and S. Thomas, "Cross-validation pitfalls when selecting and assessing regression and classification models," Journal of Cheminformatics, vol. 6, no. 1, p. 10, 2014.
[26] S. Oviedo, J. Vehí, R. Calm, and J. Armengol, "A review of personalized blood glucose prediction strategies for T1DM patients," International Journal for Numerical Methods in Biomedical Engineering, vol. 33, no. 6, p. e2833, 2017.
[27] B. P. Kovatchev, L. A. Gonder-Frederick, D. J. Cox, and W. L. Clarke, "Evaluating the accuracy of continuous glucose-monitoring sensors: continuous glucose-error grid analysis illustrated by TheraSense FreeStyle Navigator data,"