[PDF] Bayesian Optimization Assisted Meal Bolus Decision Based on Gaussian Processes Learning and Risk-Sensitive Control

Abstract

Effective postprandial glucose control is important to glucose management for subjects with diabetes mellitus. In this work, a data-driven meal bolus decision method is proposed without the need of subject-specific glucose management parameters. The postprandial glucose dynamics is learnt using Gaussian process regression. Considering the asymmetric risks of hyper- and hypoglycemia and the uncertainties in the predicted glucose trajectories, an asymmetric risk-sensitive cost function is designed. Bayesian optimization is utilized to solve the optimization problem, since the gradient of the cost function is unavailable. The proposed approach is evaluated using the 10-adult cohort of the FDA-accepted UVA/Padova T1DM simulator and compared with the standard insulin bolus calculator. For the case of announced meals, the proposed method achieves satisfactory and similar performance in terms of mean glucose and percentage time in [70, 180] mg/dL without increasing the risk of hypoglycemia. Similar results are observed for the case without the meal information (assuming that the patient follows a consistent diet) and the case of basal rate mismatches. In addition, advisory-mode analysis is performed based on clinical data, which indicates that the method can determine safe and reasonable meal boluses in real clinical settings. The results verify the effectiveness and robustness of the proposed method and indicate the feasibility of achieving improved postprandial glucose regulation through a data-driven optimal control method.

Full PDF

aa r X i v : . [ ee ss . S Y ] J a n Bayesian Optimization Assisted Meal Bolus Decision Based on Gaussian ProcessesLearning and Risk-Sensitive Control ⋆ Deheng Cai a , Wei Liu b , Linong Ji b , Dawei Shi a, ∗ a State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing, China b Department of Endocrine and Metabolism, Peking University People’s Hospital, Beijing, China

Abstract E ﬀ ective postprandial glucose control is important to glucose management for subjects with diabetes mellitus. In this work, adata-driven meal bolus decision method is proposed without the need of subject-speciﬁc glucose management parameters. Thepostprandial glucose dynamics is learnt using Gaussian process regression. Considering the asymmetric risks of hyper- and hypo-glycemia and the uncertainties in the predicted glucose trajectories, an asymmetric risk-sensitive cost function is designed. Bayesianoptimization is utilized to solve the optimization problem, since the gradient of the cost function is unavailable. The proposed ap-proach is evaluated using the 10-adult cohort of the FDA-accepted UVA / Padova T1DM simulator and compared with the standardinsulin bolus calculator. For the case of announced meals, the proposed method achieves satisfactory and similar performance interms of mean glucose and percentage time in [70, 180] mg / dL without increasing the risk of hypoglycemia. Similar results areobserved for the case without the meal information (assuming that the patient follows a consistent diet) and the case of basal ratemismatches. In addition, advisory-mode analysis is performed based on clinical data, which indicates that the method can deter-mine safe and reasonable meal boluses in real clinical settings. The results verify the e ﬀ ectiveness and robustness of the proposedmethod and indicate the feasibility of achieving improved postprandial glucose regulation through a data-driven optimal controlmethod. Keywords:

Meal bolus decision, Gaussian processes, risk-sensitive control, Bayesian optimization.

1. Introduction

Diabetes mellitus (DM) is a chronic metabolic disease thatis characterized by absolute insulin deﬁciency (type 1 diabetes(T1D)) or relative lack of insulin secretion and sensitivity (type2 diabetes (T2D)). Patients with DM tend to su ﬀ er from the longterm complications, e.g., retinopathy and nephropathy, due topoor blood glucose (BG) managements [1]. Nowadays, exter-nal insulin administration through a basal-bolus strategy withmultiple daily injections (MDI) or continuous subcutaneous in-sulin infusion (CSII) pump is a popular way among the patientsto control BG [2]. With improved modern diabetes technolo-gies, a closed-loop control system for BG regulations, namedartiﬁcial pancreas (AP), is further developed through integrat-ing the pumps and continuous glucose monitoring (CGM) sen-sors. An AP automatically delivers insulin to achieve desiredBG levels, based on CGM-driven feedback control algorithms[3, 4, 5, 6].However, for the therapies above, the e ﬀ ective control ofpostprandial glucose still remains a challenge [7]. At present,the feedforward control action in terms of preprandial insulin ⋆ This work was supported in part by the National Natural Science Foun-dation of China under Grant 61973030, and in part by the Peking UniversityPeople’s Hospital Scientiﬁc Research Development Funds RDY2019-05. ∗ Corresponding author

Email address: [email protected] (Dawei Shi) bolus is widely adopted to counteract the hyperglycemia associ-ated with meal intakes. Speciﬁcally, a meal bolus is determinedby a bolus calculator according to carbohydrate content of themeal, the BG level and subject-speciﬁc settings (carbohydrateratio (CR) and correction factor (CF) proﬁles). Multiple e ﬀ ortshave been devoted to improve the performance of the bolus ad-visors. Considering the repetitive nature of the daily activitiesof the patient, Owens et al. [8] proposed a Run-to-Run (R2R)algorithm to update the insulin bolus amount and timing daily.In Schiavon et al. [9], a novel optimization method for CRwas developed based on a validated index of insulin sensitiv-ity estimated form CGM and CSII data. Combining case-basedreasoning (CBR) with R2R, an advanced insulin bolus advi-sor through adapting CR and CF was presented in Herrero etal. [10]. Similarly, Torrent-Fontbona [11] investigated a bolusinsulin recommend system based on CBR, but provided a newreuse, revise and retain mechanism to adapt CR and CF. BesidesCBR, other artiﬁcial intelligence techniques including fuzzylogic [12] and reinforcement learning (RL) [13] have been in-vestigated for the bolus advisors. Moreover, several algorithmshave also been explored to calculate the required bolus withinthe framework of an AP. In Turksoy et al. [14], a meal bo-lus calculation method was developed for unannounced mealsbased on the Bergman’s minimal model and unscented Kalmanﬁlter. In To ﬀ anin et al. [15], R2R was implemented in the APto adapt CR. Shi et al. [16] explored a Bayesian optimizationassisted learning framework to adapt CR proﬁles for the AP, Preprint submitted to Control Engineering Practice January 21, 2021 sing historical CGM and CSII data.Most of the methods above improve postprandial glucosecontrol through updating CR and CF according to the designedmechanisms. These methods do not capture the dynamics ofglucose metabolism, and ignore the upcoming postprandial glu-cose situations in the optimization of bolus dosage, thus lead-ing to potentially sub-optimal glycemic control. With the im-proved glucose sensor accuracy and accessibility, data-drivenoptimal control provides a promising way to achieve optimalglycemic control and reduce the burden of optimizing CR andCF. The data-driven optimal control can explicitly exploit themodeled dynamics for the bolus decision by taking account ofthe preprandial glucose levels, and optimizing the bolus dosageusing the predicted postprandial glucose situations. This formsthe motivation of our work. To implement the data-driven opti-mal control for the meal bolus decision, glucose prediction is acritical role. Along this direction, many methods have beenexplored in the literature [17]. For example, an autoregres-sive with exogenous input model was presented in Romero-Ugaldein et al. [18] to predict interstitial glucose. In Yu etal. [19], four di ﬀ erent adaptive ﬁlters and a fusion mechanismwere proposed for the online glucose concentration predictions.Combining feature ranking with support vector regression orGaussian processes, Georga et al. [20] investigated the short-term glucose prediction. To improve long-term glucose pre-diction, Montaser et al. [21] presented an integrated predict-ing method based on seasonal local models and fuzzy c -means.Di ﬀ erent from the predictions above, using the Gaussian pro-cess (GP) regression, we provide multi-step predictions (e.g.,2 hours) for the postprandial glucose trajectories correspond-ing to various preprandial glucose situations, meal boluses andmeal information (carbohydrate content), so that the optimiza-tion for the meal bolus can be done via the predicted glucosetrajectories.Since GP is a data-e ﬃ cient and robust modeling method,di ﬀ erent approaches have concerned the research of GP-basedcontrol. For example, in [22], a framework of probabilistic in-ference for learning control was proposed based on the GP andapplied in real robotics and control tasks. In [23], a GP-basedmodel predictive controller (MPC) was investigated for build-ing energy control and demand response. To enhance e ﬀ ectiveonline learning and control, a risk-sensitive cost was introducedin the MPC with GP models [24], as well as in the RL [25].Inspired by the work in [24, 25], we utilize the predicted infor-mation in a risk-sensitive fashion, but construct an asymmetricrisk-sensitive cost with the consideration of asymmetric risks inhyper- and hypoglycemia. Finally, based on the designed cost,a constrained stochastic optimization problem is proposed forthe meal bolus decision. Since the gradient of the cost can-not be computed analytically, we utilize Bayesian optimization[26] and Monte-Carlo method to solve the optimization prob-lem. For safety reasons, the ﬁnal solution for the meal bolusenforces an insulin on board (IOB) constraint.The e ﬀ ectiveness and robustness of the proposed method areevaluated using di ﬀ erent in silico protocols on the 10-adult co-hort of the US Food and Drug Administration (FDA) acceptedUniversities of Virginia (UVA) / Padova T1DM simulator, and compared with the standard insulin bolus calculator. For thecase of announced meals, the proposed method achieves satis-factory and similar performance for scenarios of nominal basalrates, in terms of mean glucose level and percent time in the saferange, without increasing the risk of hypoglycemia. Similar re-sults are observed for the case without the meal information (as-suming that the patient follows a consistent diet) and the scenar-ios of over / under-estimated basal rates. In addition, advisory-mode analysis [27] based on clinical data from a T1DM subjectshow that the proposed method can determine reasonable mealboluses by explicitly taking account of the preprandial glucoselevels in the data-driven optimal control.

2. Materials and Methods

The overall structure of the proposed data-driven meal bo-lus decision method is illustrated in Fig. 1. The method buildson three key components: model learning, asymmetric risk-sensitive control, and Bayesian optimization.Model learning is responsible for constructing the postpran-dial glucose dynamics. Here, the aim is to provide a robust de-scription for postprandial glucose dynamics using GPs. Specif-ically, fed with the serialized data samples (including prepran-dial glucose measurements, the corresponding meal informa-tion, bolus dosage and postprandial glucose measurements), theGPs are trained o ﬄ ine and then applied online to provide theprediction and the uncertainty estimation of postprandial glu-cose trajectories. Considering the asymmetric risks of hyper-and hypoglycemia and the uncertainties in the predicted glucosetrajectories, we develop an asymmetric risk-sensitive cost func-tion to favor safe control actions. Finally, a constrained stochas-tic optimization problem is formulated for the meal bolus deci-sion based on the designed cost function. Since the gradientof the cost function is unavailable, we solve the optimizationproblem using Bayesian optimization and Monte-Carlo simula-tions. To ensure the safety of the method, IOB constraints arealso incorporated. A GP assumes a distribution over random functions f ( x ) : R n → R , such that values of f at any input x have a joint Gaus-sian distribution [28, 29], which is denoted as f ( x ) ∼ GP ( m ( x ) , k ( x , x ′ )) , (1)where m ( x ) and k ( x , x ′ ) are the mean function and positivesemideﬁnite covariance function, respectively, which have theform: m ( x ) = E f [ f ( x )] , (2) k ( x , x ′ ) = cov f [ f ( x ) , f ( x ′ )] , x , x ′ ∈ R n . (3)In GPs, we can encode our prior knowledge about the pro-cess by designing corresponding prior mean and covariancefunctions, which are parameterized by the parameter vector θ f .Considering the noisy observations y of f with the form of2 (cid:258)(cid:437)(cid:400)(cid:400)(cid:349)(cid:258)(cid:374)(cid:3)(cid:87)(cid:396)(cid:381)(cid:272)(cid:286)(cid:400)(cid:400)(cid:286)(cid:400) (cid:87)(cid:396)(cid:286)(cid:282)(cid:349)(cid:272)(cid:410)(cid:349)(cid:381)(cid:374)(cid:3)(cid:381)(cid:296)(cid:3)(cid:87)(cid:381)(cid:400)(cid:410)(cid:393)(cid:396)(cid:258)(cid:374)(cid:282)(cid:349)(cid:258)(cid:367)(cid:3)(cid:39)(cid:367)(cid:437)(cid:272)(cid:381)(cid:400)(cid:286) (cid:17)(cid:258)(cid:455)(cid:286)(cid:400)(cid:349)(cid:258)(cid:374)(cid:3)(cid:75)(cid:393)(cid:410)(cid:349)(cid:373)(cid:349)(cid:460)(cid:258)(cid:410)(cid:349)(cid:381)(cid:374) (cid:39)(cid:87)(cid:882)(cid:271)(cid:258)(cid:400)(cid:286)(cid:282)(cid:3)(cid:373)(cid:381)(cid:282)(cid:286)(cid:367)(cid:349)(cid:374)(cid:336)(cid:3)(cid:4)(cid:272)(cid:395)(cid:437)(cid:349)(cid:400)(cid:349)(cid:410)(cid:349)(cid:381)(cid:374)(cid:3)(cid:296)(cid:437)(cid:374)(cid:272)(cid:410)(cid:349)(cid:381)(cid:374) (cid:68)(cid:381)(cid:374)(cid:410)(cid:286)(cid:3)(cid:18)(cid:258)(cid:396)(cid:367)(cid:381)(cid:3)(cid:68)(cid:286)(cid:410)(cid:346)(cid:381)(cid:282) (cid:68)(cid:286)(cid:258)(cid:367)(cid:47)(cid:374)(cid:400)(cid:437)(cid:367)(cid:349)(cid:374) (cid:94)(cid:349)(cid:373)(cid:437)(cid:367)(cid:258)(cid:410)(cid:286)(cid:282)(cid:3)(cid:39)(cid:367)(cid:437)(cid:272)(cid:381)(cid:400)(cid:286)(cid:3)(cid:100)(cid:396)(cid:258)(cid:272)(cid:286) (cid:68)(cid:286)(cid:258)(cid:367)(cid:47)(cid:374)(cid:400)(cid:437)(cid:367)(cid:349)(cid:374) (cid:4)(cid:400)(cid:455)(cid:373)(cid:373)(cid:286)(cid:410)(cid:396)(cid:349)(cid:272)(cid:3)(cid:90)(cid:349)(cid:400)(cid:364)(cid:882)(cid:94)(cid:286)(cid:374)(cid:400)(cid:349)(cid:410)(cid:349)(cid:448)(cid:286)(cid:3)(cid:18)(cid:381)(cid:374)(cid:410)(cid:396)(cid:381)(cid:367)(cid:47)(cid:374)(cid:400)(cid:437)(cid:367)(cid:349)(cid:374)(cid:3)(cid:381)(cid:374)(cid:3)(cid:17)(cid:381)(cid:258)(cid:396)(cid:282) (cid:100)(cid:349)(cid:373)(cid:286)(cid:3)(cid:400)(cid:349)(cid:374)(cid:272)(cid:286)(cid:3)(cid:349)(cid:374)(cid:400)(cid:437)(cid:367)(cid:349)(cid:374)(cid:3)(cid:282)(cid:286)(cid:367)(cid:349)(cid:448)(cid:286)(cid:396)(cid:455)(cid:3)(cid:896)(cid:346)(cid:897)(cid:47)(cid:75)(cid:17)(cid:3)(cid:896)(cid:1081)(cid:897) (cid:68)(cid:286)(cid:258)(cid:367)(cid:3)(cid:271)(cid:381)(cid:367)(cid:437)(cid:400) (cid:100)(cid:349)(cid:373)(cid:286)(cid:3)(cid:3)(cid:896)(cid:346)(cid:897) (cid:75)(cid:393)(cid:410)(cid:349)(cid:373)(cid:349)(cid:460)(cid:258)(cid:410)(cid:349)(cid:381)(cid:374)(cid:24)(cid:258)(cid:410)(cid:258)(cid:3)(cid:94)(cid:258)(cid:373)(cid:393)(cid:367)(cid:286)(cid:400) (cid:4)(cid:393)(cid:393)(cid:367)(cid:349)(cid:272)(cid:258)(cid:410)(cid:349)(cid:381)(cid:374) (cid:18)(cid:258)(cid:367)(cid:272)(cid:437)(cid:367)(cid:258)(cid:410)(cid:349)(cid:381)(cid:374)(cid:24)(cid:286)(cid:272)(cid:349)(cid:400)(cid:349)(cid:381)(cid:374) (cid:75)(cid:271)(cid:400)(cid:286)(cid:396)(cid:448)(cid:286)(cid:282)(cid:3)(cid:115)(cid:258)(cid:367)(cid:437)(cid:286)(cid:400)(cid:18)(cid:381)(cid:374)(cid:400)(cid:410)(cid:396)(cid:258)(cid:349)(cid:374)(cid:410) (cid:28)(cid:448)(cid:258)(cid:367)(cid:437)(cid:258)(cid:410)(cid:349)(cid:381)(cid:374)(cid:3)(cid:87)(cid:381)(cid:349)(cid:374)(cid:410)(cid:400)(cid:68)(cid:286)(cid:258)(cid:367)(cid:3)(cid:17)(cid:381)(cid:367)(cid:437)(cid:400) Figure 1: Schematic of the proposed method. y = f ( x ) + ω , where ω ∼ N (0 , σ ω ) is a white Gaussian noise,the covariance function of y has the form: k y ( x , x ′ ) = k ( x , x ′ ) + σ ω δ ( x , x ′ ) , (4)where δ ( x , x ′ ) is the Kronecker delta function which is oneif and only if x = x ′ and zero otherwise. Given N train-ing inputs X = [ x , x , ..., x N ] ⊤ and the corresponding obser-vations Y = [ y , y , ..., y N ] ⊤ , the posterior GP hyper-parameters θ = [ θ f , σ ω ] are learned by maximizing the log-marginal likeli-hood: arg max θ log( p ( Y | X , θ )) [29].With the determined hyper-parameters, the GP can infer theposterior distribution of y ∗ corresponding to a new input x ∗ : y ∗ ∼ N ( m ∗ , σ ∗ ) with m ∗ = m ( x ∗ ) + K ⊤∗ ( K + σ ω I ) − ( Y − m ( X )) , (5) σ ∗ = k ( x ∗ , x ∗ ) − K ⊤∗ ( K + σ ω I ) − K ∗ , (6)where K ∗ = [ k ( x ∗ , x ) , ..., k ( x ∗ , x N )] ⊤ and K is the covariancematrix with elements K i j = k ( x i , x j ). As shown in (5) and (6),for a new input, a GP can provide a determined prediction viapredictive mean, c.f., (5), along with an estimate of uncertaintyor conﬁdence in the prediction via the predictive variance, c.f.,(6). Physiologically-based compartmental models are commonlyutilized to describe glucose dynamics using ﬁrst order di ﬀ eren-tial equations [30, 31, 32]; however, the parameter identiﬁca-tion of these models is time-consuming and sometimes is evenimpossible with incomplete information, e.g., when only theCGM data is available. Here, we utilize GPs to perform robustmodeling for postprandial glucose dynamics with incompleteand noisy information. Speciﬁcally, by feeding autoregressive,or time-delayed, input and output signals back to the model asregressors, the GPs can be used for modeling nonlinear dynam-ical control systems [22, 23]. To do this, the postprandial glucose (PG) dynamics is de-scribed in a multistep form using autoregressive models, andeach step is separately represented by a nonlinear function withadditive noise, which has the form: P t + = f t ( z t ) + w t , z t = [ P t − l , ..., P t , u , d ] ⊤ ; P t + = f t + ( z t + ) + w t + , z t + = [ P t − l + , ..., P t + , u , d ] ⊤ ; ... P t + n = f t + n − ( z t + n − ) + w t + n − , z t + n − = [ P t + n − − l , ..., P t + n − , u , d ] ⊤ , (7)where t denotes the time of the meal intake, w is a white Gaus-sian noise, P is the glucose measurement, and l is the lag forautoregressive outputs; u is the meal bolus, and d is the carbo-hydrate intake. To convey the most information of glucose sit-uations, the lag of l = T =

15 minare considered. This corresponds to the lag of 2 hours. Corre-spondingly, we take n =

8; since the sampling period is 15 min,this corresponds to the duration of 2 hours.The utilization of GPs is divided into two stages: o ﬄ ine mod-eling and online prediction. In the o ﬄ ine modeling stage, basedon (7), the GPs are separately used to model the PG dynamicsin each step following a similar way. For example, for the timestep t +

1, we use z t = [ P t − , ..., P t , u , d ] ⊤ as training inputs,and the di ﬀ erences ∆ P t = P t + − P t as training targets to re-duce the prediction uncertainty [22]. A linear mean functionand a commonly-used covariance function known as squaredexponential (SE) covariance kernel are considered: m ( z t ) = a ⊤ z t + b , (8) k z ( z t , z ′ t ) = σ f exp −

12 ( z t − z ′ t ) ⊤ Ω − ( z t − z ′ t ) ! + δ ( z t , z ′ t ) σ ω , (9)3here Ω = diag { [ l , l , ..., l ] } , σ f denotes the signal variance,and the characteristic length scales for input space l , l , ..., l describe the smoothness of the function. Note that u and d staythe same for all steps. Moreover, to highlight the e ﬀ ects of u and d on the glucose regulation, the glucose measurements inthe training inputs for each step are separately normalized into[0 ,

1] using min-max normalization.In the online prediction stage, given a bolus dosage andknown carbohydrate amount intakes, by iteratively feedingback the predictive mean of previous step into the input forthe prediction, we can obtain the prediction for each step usingcorresponding trained GPs. This prediction corresponds to thedi ﬀ erence between the current step and previous step. We thenadd this prediction with the predicted mean of the previous stepto determine the ﬁnal prediction in current step. Uniting thepredictions for the all steps, the GPs are able to provide 8-steppredictions for the PG trajectories that correspond to the corre-sponding preprandial glcucose situations, carbohydrate intakesand meal boluses. Note that when the eating habit of a subjectis approximately consistent in terms of timing and sizes of mealintake, we can approximate the e ﬀ ect of similar food intake asan invariant disturbance and the meal size information can beoptional for postprandial glucose prediction, utilizing the ro-bust prediction ability of the GPs; this allows the design of abolus decision algorithm without meal announcements. The glucose control problem is highly asymmetric in thesense that consequences of hypoglycemia are immediate andmore detrimental in comparison with those of (temporary) hy-perglycemia; therefore, we try to correct the postprandial hy-perglycemia while taking extra care of hypoglycemia. One fea-sible method to address this issue is to construct asymmetriccost functions [33, 34]. Inspired by the work in [34], we pe-nalize deviations above and below the target asymmetrically; inaddition, considering the uncertainties in postprandial glucoseprediction, the cost function is built in the risk-sensitive (RS)framework.To do this, we denote the 8-step predictions provided by theGPs as G i ∼ N ( m i , σ i ), i ∈ { , , ..., } , respectively, and collectthem as a vector state G = [ G , G , ..., G ] ⊤ , which describesthe probability distribution of postprandial glucose trajectories.According to the principle of risk-sensitive analysis [35], anasymmetric RS cost is designed as follows: L ARS = − γ log E (cid:20) exp (cid:18) − γ G − G r ) ⊤ + Q + ( G − G r ) + − γ G − G r ) ⊤− Q − ( G − G r ) − (cid:19)(cid:21) , (10)where ( G − G r ) + = [( G − G r ) ( G − G r ≥ , ..., ( G − G r ) ( G − G r ≥ ⊤ , (11)( G − G r ) − = [( G − G r ) ( G − G r < , ..., ( G − G r ) ( G − G r < ⊤ , (12) and ( · ) denotes the indicator function; G r is the target for thepostprandial glucose management; Q + is a positive penalty ma-trix for the glucose excursions above the target; Q − is a neg-ative penalty matrix for the glucose excursions below the tar-get; γ < Q + is designed as aconstant diagonal matrix. Based on the designed Q + , the diag-onal elements of Q − are devised to increase exponentially withthe increase of the absolute deviation from target while beingrestricted by upper and lower bounds. Speciﬁcally, the i th diag-onal element of Q − has the form of Q − i : = Q + i c + exp { α ( β − | G i − G ri | ) } + c ! , (13)where Q + i is the i th diagonal element of Q + ; Γ : = [ α, β, c , c ]is a quadruple determines the penalty intensity, which is de-signed same for the all diagonal elements. The lower and upperbounds are determined by c and c + c , respectively, and therate of increase is controlled by α . The parameter design willbe discussed in Section 3.Based on the above design, the quadratic penalty using con-stant Q + is scaled on the excursions above the target to maintaina reasonable but active response to hyperglycemia. Compara-tively, a quadratic penalty with exponentially weighted coe ﬃ -cients is applied to the glucose excursions below the target tohave a reasonably conservative response to the glucose excur-sions near the target, while maintaining the ability to respondquickly to larger glucose excursions and safely compensate forhypoglycemia. Given the designed asymmetric risk-sensitive cost L ARS in(10), the GP-based asymmetric risk-sensitive control for themeal bolus decision is formulated as the following constrainedstochastic optimization problem,min u L ( u ) : = L ARS + Ru , (14)s.t. G i ∼ N ( m i , σ i ) , i ∈ { , ..., } (15)0 ≤ u ≤ u max , (16)where u is the meal bolus to be optimized, R is the inputweighting to compromise the asymmetric RS cost and the ac-tual needed bolus dosage, m i and σ i are parameters provided bythe GPs. Since the gradient of the cost function cannot be obtainedanalytically, Bayesian Optimization (BO) [26] is employed tosolve the above constrained stochastic optimization problem.4 .3.1. Model Learning for the Cost Function

Here, we utilize a new GP to construct an approximation ofa complex map from the decision variable u to the cost func-tion value L ( u ) in (14). Speciﬁcally, we consider a prior zeromean function and the SE covariance function in (9) with ascalar input. As discussed in Section 2.1, given the set of N past observations D N = { u N , L ( u N ) } , the GP is trained andthen applied to predict the cost function value for a candidatemeal bolus u ∗ according to (5)-(6). The prediction is denoted asˆ L ( u ∗ ) and will be utilized to construct the acquisition function(see Section 2.3.2).Note that the value of the cost function corresponding tothe decision variable is estimated by Monte-Carlo simulations.Speciﬁcally, given a bolus dosage, we generate 1000 samplesfor the postprandial glucose trajectory based on the joint Gaus-sian distribution provided by the GP in Section 2.1. The aver-age cost for these samples is further calculated, which is thenregarded as the observed value of the cost corresponding to thebolus dosage. As a critical ingredient of the BO, the acquisition functionguides the optimization by determining the optimum candidatepoint for the next evaluation. Speciﬁcally, utilizing the pre-diction information o ﬀ ered by the model learning phase, theacquisition function is constructed to determine the candidatepoint by maintain a trade-o ﬀ exploration of the search space andexploitation of current promising areas. Up to now, there arerich literature concerning the acquisition function design [26],where several structures have been developed, e.g., probabil-ity of improvement, expected improvement, upper conﬁdencebound, and entropy search. In this work, the expected improve-ment is considered.Compared with probability of improvement, the expectedimprovement (EI) acquisition function [36] also incorporatesthe amount of improvement in selecting the candidate points.Speciﬁcally, to minimize the cost in (14), the improvementfunction of the EI acquisition function is deﬁned as I ( u ∗ ) : = ( L p min − ˆ L ( u ∗ )) ( L p min > ˆ L ( u ∗ )) , (17)where L p min denotes the minimum observed value of the costso far, ˆ L ( u ∗ ) ∼ N ( m ( u ∗ ) , σ ( u ∗ )) is the prediction at candidatepoint u ∗ . Note that the term L p min − ˆ L ( u ∗ ) represents the amountof improvement, and the other term denotes the probability ofthat improvement. Based on the improvement function, the EIacquisition function is further deﬁned as α EI ( u ∗ ) : = E ( I ( u ∗ )) . (18)By calculating the expectation in (18), we have α EI ( u ∗ ) = ( ( L p min − m ( u ∗ )) Φ ( U ) + σ ( u ∗ ) φ ( U ) , if m ( u ∗ ) > , , otherwise, (19)where U = L p min − m ( u ∗ ) σ ( u ∗ ) , (20) Algorithm 1

Bayesian optimization for meal bolus decision D ←

Initialize: { u , L ( u ) } , where u are selectedequidistantly among the bound in (16) while the number of iterations ≤ M do Train a GP model from D Compute the prediction at candidate u ∗ by the GP(Eq. (5)-(6)) Determine candidate u ∗ by maximizing the acquisitionfunction (Eq. (19)) Observe L ( u ∗ ) at determined u ∗ using the Monte-Carlomethod (see Sec. 2.3.1) Append { u ∗ , L ( u ∗ ) } to D end while Select the input value that corresponds to the minimum ob-served value of the cost as the ﬁnal solutionand Φ ( · ) is the standard normal cumulative distribution func-tion, and φ ( · ) denotes the standard normal probability densityfunction. The candidate point for the next evaluation is deter-mined as the one that maximizes α EI .The BO algorithm that solves the optimization problem in(14)-(16) is summarized in Algorithm 1. After M sequentialoperations of the BO, the ﬁnal solution e u b is determined. Forsafety concern, an IOB constraint [37] is enforced to preventoverbolus based on insulin delivery history. Denoting the IOBconstraint as u IOB , the ﬁnal meal bolus is determined as u b = e u b − u IOB . (21)

3. Data collection and Parameter design

The parameters of the proposed method are designed andevaluated using the UVA / Padova T1DM metabolic simulator[38]. The “average patient” of the simulator is selected to per-form the parameter design, and the obtained parameters are fur-ther evaluated using the 10 virtual adult patients.

Before the parameter design and evaluation, the data samplesneeded for the PG model learning are collected for each patient.The samples are collected under the situation where the in silico patients combine basal insulin and bolus insulin to control BG.To do this, a protocol with nominal basal rate and announcedmeals are designed for the patients. The protocol starts from5:00 on day one and lasts one week (7 days). Considering thedaily activities of diabetic patients tend to form similar patterns,e.g., with respect to meal timing and meal amount, but to mimiclifestyle disturbances, we assume the patients take breakfast,lunch, and dinner with normally distributed meal sizes (withmeans and standard deviations equal to [50, 75, 75] g and [3,4, 4] g CHO) and meal times uniformly distributed in [07:00,09:00], [11:00, 13:00], and [18:00, 20:00], respectively. Themeals are all announced but the meal boluses are calculatedwith potentially inappropriate CR, such that the variation of thebolus is uniformly distributed in [-30%, + able 1: Parameters for the proposed method Variables Value γ -2 R u max M Q + diag { [0 . , · · · , . , . , . } Γ [1 , , , G r [100 , , , , , , , ⊤ the breakfast, lunch and dinner, respectively. Each sample in-cludes preprandial glucose measurements, corresponding bolusdosage, carbohydrate amount, and postprandial glucose mea-surements. As the consumed CHO for lunch and dinner aresimilar but di ﬀ erent from the breakfast, we construct the samePG model for both lunch and dinner, and build a separate modelfor the breakfast using corresponding samples, although it isalso possible to build separate models for breakfast, lunch anddinner. Using the collected samples, the GPs are trained o ﬄ ine toprovide PG predictions for the online control of the method. Inthe online control, the parameter design and its evaluation areperformed. In the parameter design phase, a 12-hour in silico protocol starting from 6:00 is employed, where breakfast (50g CHO), lunch (75 g CHO) are consumed at 8:00 and 12:00,respectively.The parameters are designed using a trial-and-error approachbased on the glucose data obtained from the “average patient”,such that satisfactory postprandial regulation performance interms of average glucose, percent time in [70, 180] mg / dL, andpercent time below 70 mg / dL can be achieved. The obtainedparameters are summarized in Table 1. In the evaluation phase,di ﬀ erent scenarios are designed to evaluate the performance ofthe proposed method with the obtained parameters. The resultsare reported in Section 4.

4. Performance Analysis

As mentioned above, the proposed method is evaluated onthe 10-adult cohort of the UVA / Padova T1DM simulator. Be-sides, the advisory-mode analysis [27] is performed for themethod based on the clinical data from a particular T1DM sub-ject who undertook the MDI therapy.For the in silico evaluations, two protocols are designed toperform the comparison of the proposed method and the stan-dard insulin bolus calculator (denoted as “Control”), which hasthe form: u bolus = CHOCR + G c − G sp CF − u IOB , (22)where G c is the current glucose level (mg / dL); G sp is the glu-cose set-point (mg / dL), selected as 140 in this comparison. Oneof the designed protocol (denoted as “Protocol A”) begins at 5:00 on day 1 and lasts two days (48 hours) where breakfast(55 g and 45 g CHO), lunch (65 g and 85 g CHO) and dinner(85 g and 65 g CHO) are consumed at 8:00, 12:00 and 18:00 fortwo days, respectively. The other protocol (denoted as “Proto-col B”) begins at 5:00 and lasts one day (24 hours), includ-ing breakfast, lunch and dinner with 45 g, 85 g, and 65 g ofCHO content at 8:00, 12:00 and 18:00, respectively. Based onprotocol A, in silico experiments are performed on the whole10-adult cohort with nominal basal rates using additive CGMnoises for scenario (not) utilizing the meal information for theprediction, thus a total of 10 simulations are performed foreach scenario and each method. The statistical results are sum-marized in Table 2, and the comparison performance in termsof glucose regulation and meal bolus is illustrated in Fig. 2,where the 5%, 25%, 75% and 95% quartile curves togetherwith the median curves are presented. Besides, to illustrate therobust decision-making ability of the proposed method, in sil-ico evaluations without meal information for scenarios of basal-rate mismatches (110% and 80% of the nominal basal rate) areperformed on the same cohort using protocol B, respectively.The results are presented in Table 3 and Fig. 3, and the cor-responding discussions for the both protocol are provided inSection 4.1.The advisory-mode analysis [27] allows the comparisonswith insulin recommendations made by clinicians, throughfeeding the identical glucose data obtained in the clinical trialto the system. Here, by feeding the historical preprandial glu-cose data to the proposed method, the corresponding meal bo-luses are determined and compared with the ones following theclinician’s advice. The obtained meal boluses have no causalimpact on historical data and are only for the comparison. Thecollection of the historical clinical data are presented in Fig. 4,and the comparison performance are illustrated in Fig. 5. Thecorresponding discussions are provided in Section 4.2. In this subsection, using Protocol A, the performance of theproposed method are evaluated for the cases with the meal in-formation (IM) and without the meal information (NM), re-spectively. From Table 2, for the both cases, the proposedmethod achieves comparable glucose regulation performance incomparison with the standard insulin bolus calculator, which isequipped with the well-designed CR and CF. This is reﬂected inpercent time in the euglycemic range of 70-180 mg / dL (96.6%vs. 95.8%, p = .

627 for IM; 95.4% vs. 95.8%, p = .

264 forNM), mean glucose (126.7 mg / dl vs. 132.8 mg / dL, p = . / dL vs. 132.8 mg / dL, p = .

084 for NM),percent time >

250 mg / dL (0.0% vs. 0.0%, p = .

000 forIM; 0.0% vs. 0.0%, p = .

000 for NM) and glucose stan-dard deviation (SD) (25.9 vs. 26.2, p = .

000 for IM; 27.0vs. 26.2, p = .

020 for NM). No increase in the risk of hypo-glycemia is observed (percent time <

70 mg / dL, 0.0% vs. 0.0%, p = .

500 for IM; 0.0% vs. 0.0%, p = .

500 for NM). Theseresults show the method is robust to the meal information if theCHO amount intakes are similar to the standard amount usedfor the model learning, and also illustrate the learning abilityof proposed method, as the samples for the model learning are6 able 2: Comparison results with nominal basal rate

Scenario ( =

10) With meal information Without meal informationMetric Control Proposed p value Control Proposed p value% time <

54 mg / dL 0.0 (0.0) 0.0 (0.0) 1.000 0.0 (0.0) 0.0 (0.0) 1.000 <

70 mg / dL 0.0 (0.0) 0.0 (0.0) 0.500 0.0 (0.0) 0.0 (0.0) 0.50070-180 mg / dL 95.8 (5.7) 96.6 (7.1) 0.627 95.8 (5.7) 95.4 (7.8) 0.264 >

180 mg / dL 4.2 (5.7) 3.4 (5.6) 0.605 4.2 (5.7) 3.7 (7.8) 0.223 >

250 mg / dL 0.0 (0.0) 0.0 (0.0) 1.000 0.0 (0.0) 0.0 (0.0) 1.000Mean glucose (mg / dL) 132.8 (8.3) / dL) 26.2 (7.2) 25.9 (8.9) 1.000 26.2 (7.2) Mean glucose at 07:00 118.8 (11.0) p values are calculated based on the numberof virtual patients. Statistically signiﬁcant ( p < .

05) changes are highlighted in bold. (a) With meal information (b) Without meal informationFigure 2: Performance comparison with nominal basal rate in terms of glucose regulation and meal bolus. Yellow, blue, green and purple triangles denote meals of45 g, 55 g, 65 g and 85 g CHO, respectively. Table 3: Comparison results with under / over-estimated basal rate Scenario ( =

10) 80% of the nominal basal rate 110% of the nominal basal rateMetric Control Proposed p value Control Proposed p value% time <

54 mg / dL 0.0 (0.0) 0.0 (0.0) 1.000 0.0 (0.0) 0.0 (0.0) 1.000 <

70 mg / dL 0.0 (0.0) 0.0 (0.0) 1.000 0.0 (0.0) 0.0 (3.1) 0.62570-180 mg / dL 87.8 (3.5) 87.7 (8.0) 0.389 96.5 (3.8) 95.0 (9.7) 0.586 >

180 mg / dL 12.2 (3.5) 12.3 (8.0) 0.389 3.5 (4.9) 4.7 (7.3) 0.945 >

250 mg / dL 0.0 (0.0) 0.0 (0.0) 1.000 0.0 (0.0) 0.0 (0.0) 1.000Mean glucose (mg / dL) 151.2 (4.9) 146.6 (10.4) 0.131 120.9 (6.9) 119.0 (7.6) 0.064SD glucose (mg / dL) 24.7 (5.9) 26.8 (5.9) 0.131 28.8 (4.8) 32.0 (8.9) 0.275Mean glucose at 07:00 151.0 (21.0) p values are calculated based on the numberof virtual patients. Statistically signiﬁcant ( p < .

05) changes are highlighted in bold.collected with the inappropriate CR. The discussions of controlperformance are consistent with the quartile curves in Fig. 2.Besides, from the quartile curves of meal bolus in Fig. 2, weobserve that compared with the case of NM, the method in thecase of IM tends to increase (decrease) the meal bolus for theknown large (small) CHO. This implies that when the accuratemeal information are available, the method can react reason-ably to the CHO amount, but this will increase the burden of data collection for model learning in return.Using Protocol B, we also perform additional tests con-sidering realistic scenarios of under / over-estimated basal rateswith / without meal information to evaluate the robustness of theproposed method. The results are also compared with those ob-tained for the standard insulin bolus calculator. Since similarresults are observed for both cases, here we only present the re-sults of the case without meal information. From Table 3 and7 a) Under-estimated basal rates (b) Over-estimated basal ratesFigure 3: Performance comparison for the scenario of under- / over-estimated basal rates. Fig. 3, it is observed that for the scenario of under-estimatedbasal rate, the proposed method tends to increase the meal bolusfor the elevated preprandial glucose levels, and achieves simi-lar performance in terms of percent time in [70, 180] mg / dL(87.7% vs. 87.8%, p = / dLvs. 151.2 mg / dL, p = . / dL, 0.0% vs. 0.0%, p = . / dL (95.0% vs. 96.5%, p = / dL vs. 120.9 mg / dL, p = . p = . in silico evaluation results, however, is to compare theproposed data-driven method with the standard approach that isbuilt on CR and CF information. In this subsection, the historical clinical data from a T1DMsubject who undertook the MDI therapy are utilized to evaluatethe performance of the proposed method. Flash glucose mon-itoring (FGM) was worn to collect the glucose measurements.The data for the seven days (see Fig. 4) were collected at hospi-tal, where the patient was managed to have a consistent diet forevery day, in terms of similar meal timing and meal intakes, andthe corresponding meal boluses were determined by the clini-cians. The study was approved by institutional review board atPeking University People’s Hospital and written informed con-sent of the participant was obtained. Since the meal intakes arealmost identical for every day, we use the data of the ﬁrst ﬁvedays to model the PG dynamics for the breakfast and lunch-dinner, respectively, without the meal information. At last, theperformance of the proposed method equipped with the trainedGPs is evaluated by feeding the data for the next two days. From Fig. 5, compared with the ﬁxed meal boluses (denotedas blue bars) following the clinician’s advice, the proposedmethod can determine reasonable meal boluses (denoted as redbars) according to the preprandial glucose levels. We observethat the method would suggest additional 1-2 units of the insulinbolus for the lunch or dinner of the two days due to the ele-vated preprandial glucose level. This reasonable increase boluswould help reduce the later happened hyperglycemia. Besides,the breakfast meal boluses are the same as those determined bythe clinicians; observing that the historical breakfast boluses arealmost all identical, this indicates that the risk-sensitive controlmechanism tends to maintain the decisions in the database toensure safety when the risk of taking a di ﬀ erent value is notclear.

5. Conclusion

In this work, a GP-based asymmetric risk-sensitive (ARS)control method is proposed for the personalized meal bolus de-cision. With the formulation of the ARS cost function, themethod is capable to apply the experience learned form thesamples, while keeping own decision-making ability, e.g., tak-ing extra care of the hyperglycemia and increasing (decreas-ing) the meal bolus for the elevated (lowered) preprandial glu-cose levels. Besides, the method is robust to the meal vari-ability within a tolerable range, which reduces the burdens ofestimating the CHO amount for each meal. The e ﬀ ectivenessand robustness of the controller are evaluated using the 10-adult cohort of the UVA / Padova simulator through comparisonswith the standard insulin bolus calculator. Also, advisory-modeanalysis is performed based on the clinical data from a T1DMsubject. For future work, since the model learning is relied onthe o ﬄ ine samples, an adaptive PG model will be developed forthe method to take the long-term variations of the physiologicaldynamics into account. Acknowledgment

Access to the distributed version of the University of Virginia(UVA) / Padova metabolic simulator for research purposes is ac-knowledged.8 G l u c o se [ m g / dL ]

31g 31g 31g 31g 31g 31g 31g62g 62g 62g 62g 62g 62g 62g 62g 62g 62g 62g 62g 62g 62g

Time [h] M ea l bo l u s [ U ] Figure 4: The collection of the clinical data from a T1DM subject. Meals are denoted by green triangles with sizes below them, and the corresponding meal bolusesdetermined by the clinicians are displayed in the second panel. G l u c o se [ m g / dL ]

31g 31g62g 62g 62g 62g

Time [h] M ea l bo l u s [ U ] Figure 5: Performance evaluation of the proposed method based on the clinical data. Meals are denoted by green triangles with sizes below them.