Glucodensities: a new representation of glucose profiles using distributional data analysis

Abstract

Biosensor data has the potential ability to improve disease control and detection. However, the analysis of these data under free-living conditions is not feasible with current statistical techniques. To address this challenge, we introduce a new functional representation of biosensor data, termed the glucodensity, together with a data analysis framework based on distances between them. The new data analysis procedure is illustrated through an application in diabetes with continuous-time glucose monitoring (CGM) data. In this domain, we show marked improvement with respect to state of the art analysis methods. In particular, our findings demonstrate that i) the glucodensity possesses an extraordinary clinical sensitivity to capture the typical biomarkers used in the standard clinical practice in diabetes, ii) previous biomarkers cannot accurately predict glucodensity, so that the latter is a richer source of information, and iii) the glucodensity is a natural generalization of the time in range metric, this being the gold standard in the handling of CGM data. Furthermore, the new method overcomes many of the drawbacks of time in range metrics, and provides deeper insight into assessing glucose metabolism.



Marcos Matabuena∗, Alexander Petersen, Juan C. Vidal and Francisco Gude
Unidad de Epidemiología Clínica, Hospital Clínico Universitario de Santiago de Compostela, Spain
CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Spain
Department of Statistics and Applied Probability, University of California, Santa Barbara
∗ [email protected]
August 19, 2020


The steadily increasing availability and prominence of biosensor data have given rise to new methodological challenges for their statistical analysis. A primary feature of these data is that the monitored individuals are in free-living conditions, making a direct analysis of the recorded time series between groups of patients problematic if not infeasible. A clear example of such data is found in the study of diabetes, where continuous glucose monitoring (CGM) is increasingly used. The elevation of glucose is distinct between individuals and is influenced by factors such as mealtimes, diet composition, or physical exercise (Ewings et al., 2015). Consequently, an exciting topic of debate is how to exploit the enormous wealth of information recorded by CGM to draw more reliable conclusions about glucose homeostasis than are possible with cursory summary measures such as fasting plasma glucose (FPG) or glycated hemoglobin (A1c) (Zaccardi and Khunti, 2018).

Since 2010, the American Diabetes Association (ADA) has included the measurement of A1c levels in both the diagnosis and the control of diabetes (Association et al., 2018). A1c levels reflect underlying glucose levels over the preceding 2–3 months. Testing is convenient because blood samples can be obtained at any time of day, overnight fasting is not required, and A1c within-patient reproducibility is superior to that of fasting plasma glucose and oral glucose tolerance tests (OGTTs) (Selvin et al., 2007). However, recent articles have provided evidence for the need to go beyond A1c and use new measures for glycemic control (Group, 2018; Bergenstal, 2015) in order to capture more diverse aspects of the temporally evolving glucose levels beyond the average, for example, glucose variability and time in range metrics. The time in range metric measures the proportion of time an individual's glucose levels are maintained in different target zones.
In the case of diabetes, these can include ranges corresponding to hypoglycemia and hyperglycemia. In an innovative article, Beck et al. (2019) validated the time in range metric, showing that it is a good predictor of long-term microvascular complications despite measuring glucose values only seven times per day. Lu et al. (2018) reached similar conclusions using CGM technology for only 24 hours in each patient. At the same time, it is well known that two patients may have the same glycosylated hemoglobin and a completely different glycemic profile (Beck et al., 2017). These new approaches and findings have led clinical specialists to consider that continuous glucose measurement during long monitoring periods can lead to more accurate results in research and clinical practice than standard methods (Hirsch et al., 2019). In fact, since 2012, the European Medicines Agency (for Medicinal Products for Human Use et al., 2012) has recommended the use of CGM to validate the effect of drugs for the treatment or prevention of diabetes mellitus.

Traditionally, CGM was designed for real-time risk management in type 1 diabetes and the control of glucose values with insulin pumps (Kovatchev et al., 2009; Feig et al., 2017; DiMeglio et al., 2018). More recent applications of CGM have been more general. They involve, for example, screening patients, optimizing diet, epidemiological studies, assessing patient prognosis, and supporting treatment prescription, and CGM has even been used in healthy populations (Freeman and Lyons, 2008; Hall et al., 2018). In addition to the increasing utility of CGM data, the technology is gradually becoming cheaper, and new devices capable of measuring glucose in a non-invasive way, for example, with glasses (Nichols et al., 2013), are quickly emerging.
All of these advances are facilitating the adoption of CGM in standard clinical practice.

In 2012, a panel of experts discussed how to represent CGM data in an "easy to view format" (Bergenstal et al., 2013). They also analyzed the convenience of using glycemic variability measures and other summary measures, such as time in range, to extract the information recorded by CGM. In 2019, the ADA established an updated version of clinical standards to use and define target zones with time in range metrics (Battelino et al., 2019). A more recent review of CGM metrics establishes time in range as a gold standard measure (Nguyen et al., 0).

Motivated by the problem of analyzing data gathered via CGM more precisely, while still leveraging the advantages possessed by time in range metrics, we propose an approach based on the construction of a functional profile of glucose values for each subject. Conceptually, the approach is a natural extension of time in range metrics in which the ranges shrink in size and increase in number, so that the new profile effectively measures the proportion of time each patient spends at each specific glucose concentration rather than in a coarsely defined range. As a result, the new functional profile, which we refer to as a glucodensity, automatically and simultaneously captures all parameters arising from individual glucose distributions. Figure 1 illustrates a set of constructed glucodensities that represent the data objects for which we will propose the use of a tailored set of statistical methods.

Mathematically, glucodensities constitute functional distributional data since each glucodensity represents a distribution of glucose concentrations. As such, these complex and constrained curves cannot be directly analyzed with the usual techniques. To overcome this, we introduce a framework for the analysis of glucodensities by compiling suitable methods that are based on the calculation of distances between glucodensities.
We also reveal the superior clinical capacity of our representation compared to classical measures of diabetes. Finally, we demonstrate that our representation has a higher sensitivity than the standard time in range metric to explain the glycemic differences between patients in various settings, including regression analysis. A new Shiny interface to use the methods outlined in this paper is available at https://tec.citius.usc.es/diabetes .

Figure 1: Example of a set of glucodensities estimated from a random sample of the AEGIS population-based study (x-axis: glucose, mg/dL; y-axis: density)

The structure of this paper is as follows. First, we briefly describe the AEGIS study and the methods used. We then formally introduce the concept of glucodensity, the estimation methods, and some essential statistical background needed to understand the statistical procedures introduced in the paper. Subsequently, we explain the regression models used in the validation of the representation. Afterward, we show the results that demonstrate the superiority of the glucodensity over state-of-the-art glucose representations. Then, we illustrate the use of the glucodensity methodology with real data in two-sample testing and cluster analysis. Finally, we discuss the clinical implications of these results, their limitations, and the new perspectives of the glucodensity method in medicine and device technology.

Table 1: Clinical importance of biomarkers used in the statistical analysis

Biomarker            Clinical significance
A1c                  Gold standard marker in diabetes diagnosis and control
HOMA-IR              Measurements to quantify insulin resistance and β-cell function
CONGA, MODD, MAGE    Summary indices of glucose variability

A subset of the subjects in the A Estrada Glycation and Inflammation Study (AEGIS; trial NCT) provided the sample for the present work. In the latter cross-sectional study, an age-stratified random sample of the population (aged ≥ 18) was drawn from Spain's National Health System Registry. A detailed description has been published elsewhere (Gude et al., 2017). For one year beginning in March, subjects were periodically examined at their primary care centre, where they (i) completed an interviewer-administered structured questionnaire; (ii) provided a lifestyle description; (iii) were subjected to biochemical measurements; and (iv) were prepared for CGM (lasting 6 days). The subjects who made up the present sample were the 581 (361 women, 220 men) who completed at least 2 days of monitoring, out of an original 622 persons who consented to undergo a 6-day period of CGM. The other 41 original subjects were withdrawn from the study due to non-compliance with protocol demands (n = 4) or difficulties in handling the device (n = 37). The characteristics of the participants are shown in Table 2.

The present study was reviewed and approved by the Clinical Research Ethics Committee from Galicia, Spain (CEIC2012-025). Written informed consent was obtained from each participant in the study, which conformed to the current Helsinki Declaration.

[Table 2: Characteristics of the participants, with one column for men (n = 220) and one for women (n = 361); rows cover age (years) and the variables abbreviated below.]
BMI - body mass index; FPG - fasting plasma glucose; A1c - glycated haemoglobin; HOMA-IR - homeostasis model assessment-insulin resistance; CONGA - glycemic variability in terms of continuous overall net glycemic action; MODD - mean of daily differences; MAGE - mean amplitude of glycemic excursions.

Glucose was determined in plasma samples from fasting participants by the glucose oxidase peroxidase method. A1c was determined by high-performance liquid chromatography in a Menarini Diagnostics HA-8160 analyser; all A1c values were converted to DCCT-aligned values (Hoelzel et al., 2004). Insulin resistance was estimated using the homeostasis model assessment method (HOMA-IR) as the fasting concentration of plasma insulin (µunits/mL) × plasma glucose (mg/dL) / 405 (Matthews et al., 1985).

Glycaemic variability was measured in terms of continuous overall net glycemic action (CONGA) (McDonnell et al., 2005), the mean amplitude of glycaemic excursions (MAGE) (Service et al., 1970), and the mean of the daily differences (MODD) (Molnar et al., 1972) in glucose concentration.
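The HOMA-IR formula quoted above is simple enough to check by hand. The following is a minimal sketch of it in Python; the function and argument names are illustrative assumptions, not part of the study's software.

```python
def homa_ir(fasting_insulin_uU_per_mL: float, fasting_glucose_mg_per_dL: float) -> float:
    """HOMA-IR = fasting insulin (µU/mL) × fasting glucose (mg/dL) / 405."""
    return fasting_insulin_uU_per_mL * fasting_glucose_mg_per_dL / 405.0


# Hypothetical values chosen only to make the arithmetic easy: 10 * 81 / 405.
example_value = homa_ir(10.0, 81.0)
```

The divisor 405 applies only when glucose is expressed in mg/dL, as in the text.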

At the start of each monitoring period, a research nurse inserted a sensor (Enlite™, Medtronic, Inc, Northridge, CA, USA) subcutaneously into the subject's abdomen, and instructed him/her in the use of the iPro™ CGM device (Medtronic, Inc, Northridge, CA, USA). The sensor continuously measures the interstitial glucose level (range 40–400 mg/dL) of the subcutaneous tissue, recording values every 5 min. Participants were also provided with a conventional OneTouch® Verio® Pro glucometer (LifeScan, Milpitas, CA, USA) as well as compatible lancets and test strips for calibrating the CGM. All subjects were asked to make at least three capillary blood glucose measurements (usually before main meals). These readings were taken without checking the current CGM reading. On the seventh day the sensor was removed and the data downloaded and stored for further analysis. If the number of data-acquisition "skips" per day totalled more than 2 h, the entire day's data were discarded.

The time in range metric was calculated with two different methods. In the first, using the CGM records of the AEGIS study, we estimated the deciles of the CGM records of normoglycemic patients and used these deciles as cut-offs (Table 3). In the second, we used the cut-off points established by the ADA in the 2019 medical guideline (Battelino et al., 2019) (Table 4).

[Table 3: Cut-offs of the ten ranges defined by the deciles of CGM records in normoglycemic patients.]
[Table 4: Cut-offs of the five ranges established by the ADA (Battelino et al., 2019).]

The density function for each individual was estimated with a non-parametric kernel density estimator. For this purpose, we used a Gaussian kernel and the rule of thumb to select the smoothing parameter. In addition, we estimated the quantile representation required by the 2-Wasserstein methods from the empirical distribution.

The following three regression models were used: i) the non-parametric kernel functional regression model based on the 2-Wasserstein distance with the glucodensity as predictor (Ferraty and Vieu, 2006); ii) a global 2-Wasserstein regression model where the glucodensity is the response (Petersen and Müller, 2019); and iii) the $k$-nearest neighbor algorithm, with $k = 10$ neighbors, in the case of time in range metrics.

In the case of time in range metrics, we applied the isometric log-ratio (ilr) transformation for compositional data prior to fitting the model. To avoid problems with zeros, a fixed positive constant was added to each range, and the ranges were then normalized to add to 1.

All analyses were carried out using R software. Functional data analysis was performed using the fda.usc package (Febrero-Bande and de la Fuente, 2012), which is freely available at https://cran.r-project.org/ , and our own implementations of the ANOVA test of Dubey and Müller (2019) and of the Fréchet regression in Petersen and Müller (2019) using the 2-Wasserstein distance. The glucodensities and their quantile representations were estimated using base R functions.

For patient $i$, denote the gathered glucose monitoring data by pairs $(t_{ij}, X_{ij})$, $j = 1, \ldots, m_i$, where the $t_{ij}$ represent recording times that are typically equally spaced across the observation interval, and $X_{ij}$ is the glucose level at time $t_{ij} \in [0, T_i]$. Note that the number of records $m_i$, the spacing between them, and the overall observation length $T_i$ can vary by patient. One can think of these data as discrete observations of a continuous latent process $Y_i(t)$, with $X_{ij} = Y_i(t_{ij})$. The glucodensity for this patient is defined in terms of this latent process as $f_i(x) = F_i'(x)$, where

$$F_i(x) = \frac{1}{T_i} \int_0^{T_i} \mathbb{1}\left(Y_i(t) \le x\right) dt \quad \text{for } \inf_{t \in [0, T_i]} Y_i(t) \le x \le \sup_{t \in [0, T_i]} Y_i(t)$$

is the proportion of the observation interval in which the glucose levels remain below $x$. Since each $F_i$ increases from 0 to 1, the data to be modeled are a set of probability density functions $f_i$, $i = 1, \ldots, n$.

Of course, neither $F_i$ nor the glucodensity $f_i$ is observed in practice, but one can construct an approximation through a density estimate $\tilde{f}_i(\cdot)$ obtained from the observed sample. In the case of CGM data, the glucodensities may have different support and shape. Therefore, we suggest using a non-parametric approach to estimate each density function. For example, using a kernel-type estimator, we have

$$\tilde{f}_i(x) = \frac{1}{m_i} \sum_{j=1}^{m_i} K_{h_i}(x - X_{ij}),$$

where $h_i > 0$ is the smoothing parameter, $K$ is the kernel, and $K_{h_i}(s) = \frac{1}{h_i} K\left(\frac{s}{h_i}\right)$. The choice of $K$ does not have a big impact on the efficiency of the estimator, but the value of $h_i$ is crucial.

Several alternatives for selecting the smoothing parameter have been proposed in the literature, including cross-validation, minimizing the estimated mean integrated squared error (MISE), or a "rule of thumb" derived from the assumption that the density is Gaussian. In this last case, the choice can be explicitly written as $\tilde{h}_i = 1.06\, \tilde{\sigma}_i\, m_i^{-1/5}$, where $\tilde{\sigma}_i$ is the sample standard deviation of the $X_{ij}$. For more details, see Silverman (1986). Other approaches for density function estimation include the use of orthogonal series (e.g., Fourier or wavelet) expansions, splines, or smoothing of histograms. For further details the reader is referred to Antoniadis (1997); Izenman (1991); Müller and Petersen (2014).

Let $[a, b]$ be an interval of the real line, which may be unbounded, and suppose that each glucodensity $f_i$ has support contained in $[a, b]$. From a statistical point of view, the sample $f_1, \ldots, f_n$ may be modeled and analyzed using methods of functional data analysis (Ramsay et al., 2005; Wang et al., 2016). However, since the $f_i$ must be positive and satisfy $\int_a^b f_i(x)\, dx = 1$, classical methods have in recent years been adapted to account for the nonlinear, distributional structure of density samples (Petersen and Müller, 2016; Hron et al., 2016). The general approach is to define a metric or distance between densities that, in turn, leads to descriptive statistics that respect the unique density properties. For example, define the data space of glucodensities as

$$A := \left\{ f : [a, b] \to \mathbb{R}^+ : \int_a^b f(x)\, dx = 1 \text{ and } \int_a^b x^2 f(x)\, dx < \infty \right\}.$$

Given two arbitrary glucodensities $f, g \in A$, the 2-Wasserstein distance (Villani, 2008) between $f$ and $g$ is

$$d_W(f, g) = \sqrt{\int_0^1 \left( F^{-1}(t) - G^{-1}(t) \right)^2 dt}, \qquad (1)$$

where $F$ and $G$ are the cumulative distribution functions (cdfs) of the density functions $f$ and $g$.

The 2-Wasserstein distance is a natural distance to measure the similarity between density functions through their representation in the space of quantile (inverse cdf) functions, and it has already been successfully applied in biological problems. Furthermore, it has computational and modeling advantages compared to the usual $L^2[a, b]$ metric when glucodensities have different support within $[a, b]$.
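The kernel estimator with the rule-of-thumb bandwidth and the distance in (1) can be sketched as follows. This is a minimal illustration, not the authors' R implementation: all function names are hypothetical, the Gaussian kernel follows the choice stated in the methods, and the integral over $[0, 1]$ is approximated with a midpoint rule on the empirical quantile functions, which is an assumption of this sketch.

```python
import math
import statistics


def rule_of_thumb_bandwidth(readings):
    """h = 1.06 * (sample standard deviation) * m^(-1/5)."""
    return 1.06 * statistics.stdev(readings) * len(readings) ** (-1 / 5)


def glucodensity(readings, grid, h=None):
    """Gaussian-kernel density estimate of the glucose readings on `grid`."""
    if h is None:
        h = rule_of_thumb_bandwidth(readings)
    norm = 1.0 / (len(readings) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xj) / h) ** 2) for xj in readings)
            for x in grid]


def empirical_quantiles(readings, probs):
    """Empirical quantile function (generalized inverse of the empirical cdf)."""
    ordered = sorted(readings)
    m = len(ordered)
    # Smallest order statistic whose empirical cdf value is at least p.
    return [ordered[min(m - 1, max(0, math.ceil(p * m) - 1))] for p in probs]


def wasserstein2(readings_a, readings_b, n_grid=1000):
    """Midpoint-rule approximation of the 2-Wasserstein distance in (1)."""
    probs = [(k + 0.5) / n_grid for k in range(n_grid)]
    qa = empirical_quantiles(readings_a, probs)
    qb = empirical_quantiles(readings_b, probs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qa, qb)) / n_grid)
```

A useful sanity check is that shifting every reading by a constant moves the distribution by exactly that amount in $d_W$, so two patients whose profiles differ only by a uniform offset of 15 mg/dL are 15 mg/dL apart.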
Finally, the 2-Wasserstein distance has a physical interpretation in the theory of optimal transport.

As glucodensities are distributional data, the direct application of the usual techniques for functional data, such as estimation of mean, covariance, and regression models, may lead to misleading results. Hence, we have chosen to use models based on the 2-Wasserstein distance, although other choices are possible. As a starting point, based on the notion of distance we can generalise the mean and variance of a random variable that takes values in an abstract space with metric structure (Fréchet, 1948). As we will see, similar adaptations can be developed for regression, hypothesis testing, or cluster analysis.

Given a distance $d : A \times A \to \mathbb{R}^+$, of which $d_W$ is one example, and a random variable $f$ defined on $A$, the Fréchet mean of $f$ is

$$\mu_f = \arg\min_{g \in A} E\left( d^2(f, g) \right).$$

The Fréchet variance of $f$ is then

$$\sigma_f^2 = E\left( d^2(f, \mu_f) \right).$$

If the choice of distance is the Wasserstein metric $d_W$, these are given the names of Wasserstein mean and variance, respectively. In the following subsections we will extend these Fréchet concepts to statistical methodologies of regression, clustering, and hypothesis testing based on the notion of distance.

Let $f$ be a functional random variable taking values in $(A, d_W)$ and $Y$ a random variable that takes values in the real line. We assume the following regression relationship between $f$ and $Y$, which represent the predictor and response variables, respectively:

$$Y = g(f) + \epsilon, \qquad (2)$$

where $g : A \to \mathbb{R}$ is an unknown smooth function, and the random error $\epsilon$ satisfies $E(\epsilon) = 0$.

Given a sample $\{(f_i, Y_i) \in A \times \mathbb{R}\}_{i=1}^{n}$, most non-parametric estimators $\tilde{g}(\cdot)$ have the form of a weighted average of the responses

$$\tilde{g}(x) = \sum_{i=1}^{n} w_{ni}(x)\, Y_i. \qquad (3)$$

In general, the weights $w_{ni}(x)$ depend on the distance between each $f_i$ and $x$, with larger distances receiving lower weights, and satisfy $\sum_{i=1}^{n} w_{ni}(x) = 1$ (Ferraty and Vieu, 2006). A typical choice would be the Nadaraya–Watson weights

$$w_{ni}(x) = \frac{K\left( \frac{d(x, f_i)}{h} \right)}{\sum_{j=1}^{n} K\left( \frac{d(x, f_j)}{h} \right)}, \qquad (4)$$

where $h$ is a smoothing parameter and $K : \mathbb{R} \to \mathbb{R}$ is a known univariate probability density function called the kernel. For more details about this procedure see Ferraty and Vieu (2006). As an alternative to the above method, we can use kernel methods in Reproducing Kernel Hilbert Spaces (RKHS) (Preda, 2007; Szabó et al., 2016).

In the case of regression models with a density function as response, the literature is not very extensive to date (Nerini and Ghattas, 2007; Han et al., 2019; Petersen and Müller, 2019; Capitaine et al., 2019; Talska et al., 2018).
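The Nadaraya–Watson estimator in (3) and (4) can be sketched as follows for a scalar response such as A1c. This is a minimal illustration under stated assumptions: each glucodensity is represented by its quantile function sampled on a common equispaced grid of $(0, 1)$, the kernel $K$ is Gaussian, and the function names are hypothetical rather than taken from fda.usc.

```python
import math


def quantile_distance(q1, q2):
    """2-Wasserstein distance between two distributions given as quantile
    functions sampled on the same equispaced grid of (0, 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))


def nadaraya_watson_predict(q_new, q_train, y_train, h):
    """Weighted average of the responses with weights K(d(x, f_i) / h)."""
    # Gaussian kernel; its normalizing constant cancels in the ratio in (4).
    kernel = [math.exp(-0.5 * (quantile_distance(q_new, q) / h) ** 2)
              for q in q_train]
    total = sum(kernel)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    weights = [k / total for k in kernel]  # non-negative and summing to 1
    return sum(w * y for w, y in zip(weights, y_train))
```

A very small $h$ makes the prediction follow the nearest training curve, while a very large $h$ returns nearly the plain average of the responses, which is why the smoothing parameter has to be tuned, for example by cross-validation.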
In this article we use the model proposed in Petersen and Müller (2019), which allows us to incorporate the desired metric $d_W$ and is a direct generalization of classical linear regression. The primary rationale for our use of this model is that, unlike the other approaches mentioned above, a methodology has been developed to perform inferential procedures, such as confidence bands and hypothesis testing, in order to establish the significance of the input variables in the model (Petersen et al., 2019).

Let $f$ be a random variable (e.g. a glucodensity) that takes values in the space $(A, d_W)$ defined above. Consider a random vector $U \in \mathbb{R}^d$ that contains the set of predictors. Our interest is in the Fréchet regression function, or function of conditional Fréchet means,

$$f(u) := \arg\min_{g \in A} E\left( d_W^2(f, g) \mid U = u \right), \quad u \in \mathbb{R}^d. \qquad (5)$$

Petersen and Müller (2019) impose a particular model for $f(u)$ that, in direct analogy to classical linear regression, takes the form of a weighted Fréchet mean:

$$f(u) = \arg\min_{g \in A} E\left( s(U, u)\, d_W^2(f, g) \right), \quad u \in \mathbb{R}^d. \qquad (6)$$

Here, the weight function is

$$s(U, u) = 1 + (U - \mu)^T \Sigma^{-1} (u - \mu), \quad \mu = E(U), \quad \Sigma = \mathrm{Cov}(U), \qquad (7)$$

and $\Sigma$ is assumed to be positive definite.

Given a sample $(U_i, f_i)$, $i = 1, \ldots, n$, of independent pairs each distributed as $(U, f)$, one can proceed to estimate $f(u)$ for any desired input $u$. Due to the intimate connection between the Wasserstein metric and quantile functions as in (1), for most inferential procedures it is sufficient to estimate the conditional Wasserstein mean quantile function $Q(u)$ corresponding to $f(u)$. Let $D$ be the set of quantile functions, $Q_i$ the quantile function corresponding to the random density $f_i$, and define empirical weights $s_{in}(u) = 1 + (U_i - \bar{U})^T \hat{\Sigma}^{-1} (u - \bar{U})$, where $\bar{U}$ and $\hat{\Sigma}$ are the sample mean and covariance of the $U_i$, respectively.
The natural estimator under $d_W$ is the weighted empirical mean quantile function

$$\tilde{Q}(u) = \arg\min_{Q \in D} \sum_{i=1}^{n} s_{in}(u)\, \| Q - Q_i \|^2, \qquad (8)$$

where $\| \cdot \|$ denotes the $L^2[0, 1]$ norm on $D$.

A straightforward algorithm for computing $\tilde{Q}(u)$ is given in the Supplementary Material of the original reference, Petersen et al. (2019). In addition, two algorithms are given to estimate the confidence bands at a given significance level $\alpha$ for both the quantile functional parameter $Q(u)$ and the density parameter $f(u)$.

To validate the glucodensity representation, we use the database from the AEGIS study (Gude et al., 2017). The database contains continuous glucose monitoring data, spanning 2 to 6 days, for 581 patients from a random sample of a general population. A detailed description of the data is given in Section 2, together with the characteristics of the patients in Table 2. To carry out the validation task, we use two different regression models: i) a non-parametric regression model where the only predictor is the glucodensity, and ii) a linear regression model where the response is a glucodensity. Further details on the regression models used can be found in Section 3. The first model was used to predict glycated hemoglobin (A1c) (Kilpatrick, 2000), homeostatic model assessment (HOMA-IR) (Ausk et al., 2010), and the following measures of glycemic variability (Service, 2013; Monnier et al., 2008; Gude et al., 2017): continuous overall net glycemic action (CONGA), mean amplitude of glycemic excursions (MAGE), and mean of daily differences (MODD), through the glucodensity representation. In contrast, the second was used to predict the glucodensity with the five variables above. Figure 1 gives a visualization of the sample of glucodensities used in these models. The biological significance of the variables under consideration is described in Table 1.
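For the second model, where the glucodensity is the response, the estimator in (8) can be sketched as follows. This is a hedged illustration and not the algorithm of Petersen et al. (2019): it assumes a single scalar predictor ($d = 1$), the $1/n$ convention for the sample variance, and quantile functions sampled on a common grid. Because the weights $s_{in}(u)$ add up to $n$, the minimizer of (8) is the least-squares projection of the weighted average $n^{-1} \sum_i s_{in}(u) Q_i$ onto the nondecreasing functions, computed here with the pool-adjacent-violators algorithm. All function names are hypothetical.

```python
def frechet_regression_weights(u_train, u):
    """Empirical weights s_in(u) = 1 + (U_i - mean) * (u - mean) / var, d = 1."""
    n = len(u_train)
    mean = sum(u_train) / n
    var = sum((ui - mean) ** 2 for ui in u_train) / n  # 1/n convention (assumption)
    return [1.0 + (ui - mean) * (u - mean) / var for ui in u_train]


def project_nondecreasing(values):
    """Pool-adjacent-violators: closest nondecreasing sequence in least squares."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


def fitted_quantile_function(u_train, q_train, u):
    """Weighted mean of the training quantile functions, projected onto D."""
    weights = frechet_regression_weights(u_train, u)
    n = len(u_train)
    averaged = [sum(w * q[k] for w, q in zip(weights, q_train)) / n
                for k in range(len(q_train[0]))]
    return project_nondecreasing(averaged)
```

Some weights are negative when $u$ lies far from the sample mean, so the weighted average need not be monotone; the projection step is what guarantees that the output is a valid quantile function.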

The aim of the first set of regression analyses is to demonstrate that the glucodensity is sufficiently rich in its information content to recover the aforementioned biomarkers with high precision. To quantify this precision, we estimated the $R^2$ after fitting a non-parametric model for each biomarker as the outcome variable, using the glucodensity as the sole predictor (i.e. independent variable). The $R^2$ estimates for A1c, HOMA-IR, MAGE, MODD, CONGA were 0.79, 0.92, 0.86, and 0.…

Figure 2: Real values vs estimated values when the glucodensity is the predictor (recoverable panel titles: A1c, %; MAGE, mg/dL)

In the second regression analysis, with the glucodensity as the outcome variable, we aim to show that the previous measurements commonly used in clinical practice are not capable of capturing the glucodensity with high accuracy. This fact is not completely surprising because, as noted by some authors (Zaccardi and Khunti, 2018), the information provided by a CGM is more precise than that contained in summary measures. To accomplish this, we computed a suitable version of $R^2$ for this task after fitting a regression model where the response is a glucodensity and the previous variables are the predictors. In this case, the estimated $R^2$ was 0.74. As predicted, compared to the previous section's results, we were not able to accurately capture the complex nature of glucodensities, even while using the combined predictive power of several commonly used summary measures. Moreover, in some cases, the differences in prediction can be significant (see Figure 3).

To illustrate the higher clinical sensitivity of glucodensities compared to time in range metrics, we compared the ability of each representation to predict A1c, HOMA-IR, and the glycemic variability metrics MODD, MAGE, and CONGA, using the data from the AEGIS study. The predictive capacity of the glucodensity representation was illustrated above, and this section gives the corresponding results for time in range metrics, which were calculated according to two sets of cutoffs. In the first, the deciles of the normoglycemic individuals from the AEGIS study were used, while in the second those proposed by the ADA were used. Tables 3 and 4 in Section 2 show the exact cutoff values for both cases. Since the time in range metrics constitute a sample of standard compositional data, the isometric log-ratio (ilr) transformation was employed in combination with a $k$-nearest neighbor algorithm as a regression model for predicting the scalar variables. Methodological details about this statistical procedure can be found in Section 2.

Figure 3: Residuals in quantile space when predicting glucodensities (x-axis: quantile; y-axis: glucose, mg/dL)
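The preprocessing of the time in range metrics described above (proportions per range, zero replacement, ilr transformation) can be sketched as follows. This is an illustration under stated assumptions: the function names and the constant `eps` are hypothetical, the cut-offs in the usage note are illustrative and are not copied from Tables 3 or 4, and the ilr coordinates use one standard orthonormal basis. Any other orthonormal basis gives a rotation of these coordinates, which leaves nearest-neighbor distances unchanged.

```python
import math
from bisect import bisect_right


def time_in_range(readings, cutoffs):
    """Proportion of readings in each interval defined by increasing cutoffs.
    Interval k holds readings x with cutoffs[k-1] <= x < cutoffs[k]."""
    counts = [0] * (len(cutoffs) + 1)
    for x in readings:
        counts[bisect_right(cutoffs, x)] += 1
    return [c / len(readings) for c in counts]


def replace_zeros(composition, eps=1e-3):
    """Add a fixed positive constant to each part and renormalize to sum to 1."""
    shifted = [p + eps for p in composition]
    total = sum(shifted)
    return [p / total for p in shifted]


def ilr(composition):
    """Isometric log-ratio coordinates of a strictly positive composition:
    z_k = sqrt(k / (k + 1)) * (mean(log x_1..x_k) - log x_{k+1})."""
    logs = [math.log(p) for p in composition]
    return [math.sqrt(k / (k + 1)) * (sum(logs[:k]) / k - logs[k])
            for k in range(1, len(logs))]
```

For example, `ilr(replace_zeros(time_in_range(readings, [54, 70, 181, 251])))` maps a patient's readings to four real coordinates that can be passed to a $k$-nearest neighbor regressor.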

Figure 4 compares the real and estimated values of the previous five variables under the two time in range metrics under consideration. Table 5 provides the estimates of $R^2$ for each variable and metric. The predictive capacity is markedly worse than that attained by the glucodensity methodology. The superiority of the glucodensity is particularly noteworthy in the case of the HOMA-IR variable, where the association is quite weak for time in range metrics. Even for the other variables, where the values of $R^2$ are moderate, the larger residuals seen in diabetes patients with more severe alterations of glucose metabolism indicate that time in range metrics are particularly poorly suited for such patients. Interestingly, we do not observe substantial or consistent differences between the two time in range metrics used, as deciles perform better than ADA criteria for two of the variables, while in other instances the ordering was reversed.

Table 5: $R^2$ estimated with the time in range metrics under consideration

                        A1c    HOMA-IR   CONGA   MAGE   MODD
Normoglycemic cut-off   0.63   0.22      0.68    0.65   0.…
ADA cut-off             0.61   0.08      0.73    0.69   0.…

[Figure 4: Real values vs estimated values under the two time in range metrics; recoverable panel titles: A1c, %; HOMA-IR, mass units; MAGE, mg/dL; MODD.]

Real values E s t i m a t ed v a l ue s l lll ll l lll llll l ll llll llll llll lllll llll llll lll l l lll l ll ll lll lll lll ll lll lll l llll ll lllll lll lll lllll lll lll llllll l l ll lllllll ll llllll ll ll llll l lll lll l llll ll ll l lll l lll llll lll l ll llll llll lll l lll lll l llllll llll llll llll lll l lllllll lll lllllllll l ll ll lllll lll ll l lll l lllllll l lll ll lllllll l lll lllll ll lll ll lllll ll ll lll ll ll ll ll lllll ll ll ll ll lllllllll llll lll lllll lll ll lll ll llll ll lll l llll ll ll lllll l lll ll ll lllll lllllllll llllllllll lll llll l lll lll ll lll ll l ll lll llll ll lll lllllll lllllll llllll llll ll ll ll lll lll lll ll llllll ll lll llll ll lll l ll l ll lllll ll ll lll ll ll ll ll ll ll ll llll l lll ll lll ll lll lll l llll llllll ll ll lllll ll lll llll lll ll l lll lll l lll llll ll lll lll llll l ll l l llll lll l l lll l ll ll lll lll l ll lllll lll l ll l l ll lll ll llll ll lll ll lll ll l llllll ll ll lll l l llll ll ll ll ll llll l l l l ll lll l lll lll ll l lll llll lll l lll l ll llll lll l llll lll lll l lll lll llll llll ll ll lll ll llll ll lll ll lll lll l llll l llll l lll lll lll l ll lllll l ll l ll l ll ll l ll ll ll ll l l ll llll l ll l llll l llll ll ll ll l l lllll l lll llll llllll lll lll l lll lll lllll l l ll l ll lll l lllll l ll ll llll lllll llll ll l lll lll llll lll ll l ll lll llll lllll ll l l ll lll ll lll lll ll l llll ll ll llll lll lll lll llllll llll llll lllll l lll lll lll ll ll llll ll llll l ll lll ll l ll lll lllll ll l l lll ll ll ll ll ll ll ll llll llll ll lllll lll l ll lllll ll ll l l ll ll ll lll ll lll lll . . . . CONGA, mg/dL

As a special case of regression, suppose we have a sample $f_1, \ldots, f_n$ of glucodensities defined on $(A, d_W)$ belonging to $k$ different groups $G_1, G_2, \ldots, G_k$ that partition $\{1, \ldots, n\}$ and are of size $n_j$ ($j = 1, \ldots, k$), so that $\sum_{j=1}^{k} n_j = n$. If the goal is simply to test whether the Wasserstein means are equal for each group, Petersen et al. (2019) developed testing procedures based on model (6) for this purpose. An advantage of this model is its flexibility, which allows for multiple factor layouts as well as tests for interactions. However, the theoretical properties of these tests require a type of equal variance assumption that may be restrictive for some data sets.

More generally, one may wish to test the null hypothesis that the population distributions of the $k$ groups share common Wasserstein means and variances, against the alternative that at least one of the groups has a different population distribution compared to the others in terms of either its Wasserstein mean or variance. In this scenario, Dubey and Müller (2019) investigated a test statistic based on the group proportions $\lambda_{j,n} = n_j n^{-1}$, the groupwise sample Wasserstein means and variances

$$\tilde{\mu}_j = \arg\min_{g \in A} \sum_{i \in G_j} d_W^2(f_i, g), \qquad \tilde{V}_j = n_j^{-1} \sum_{i \in G_j} d_W^2(f_i, \tilde{\mu}_j),$$

the pooled Wasserstein mean and variance

$$\tilde{\mu}_p = \arg\min_{g \in A} \sum_{j=1}^{k} \sum_{i \in G_j} d_W^2(f_i, g), \qquad \tilde{V}_p = n^{-1} \sum_{j=1}^{k} \sum_{i \in G_j} d_W^2(f_i, \tilde{\mu}_p),$$

and finally the quantities

$$\tilde{\sigma}_j^2 = \frac{1}{n_j} \sum_{i \in G_j} d_W^4(f_i, \tilde{\mu}_j) - \left( \frac{1}{n_j} \sum_{i \in G_j} d_W^2(f_i, \tilde{\mu}_j) \right)^2$$

as estimates of the variance of $\tilde{V}_j$. Then, with

$$F_n = \tilde{V}_p - \sum_{j=1}^{k} \lambda_{j,n} \tilde{V}_j, \qquad R_n = \sum_{j \,\cdots}$$

[...]
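The groupwise and pooled quantities above can be computed directly once each glucodensity is represented by its quantile function, because for one-dimensional distributions the squared 2-Wasserstein distance is the integrated squared difference of quantile functions and the Wasserstein mean is the pointwise mean of quantile functions. The following is only a minimal sketch under those facts: the function names, the uniform midpoint grid, and the simulated readings are our own assumptions, not the code used for the AEGIS analysis, and the sketch stops at $F_n$ and $\tilde{\sigma}_j^2$ without forming the full test statistic.

```python
import numpy as np

def quantile_functions(samples, m=200):
    """Empirical quantile function of each subject on a uniform grid in (0, 1)."""
    grid = (np.arange(m) + 0.5) / m           # midpoints, so a plain mean approximates the integral
    return np.array([np.quantile(s, grid) for s in samples])

def sq_dist_to_mean(Q):
    """Squared 2-Wasserstein distance of each row of Q to the Wasserstein mean of Q."""
    mu = Q.mean(axis=0)                       # pointwise mean quantile function = Wasserstein mean in 1-D
    return ((Q - mu) ** 2).mean(axis=1)

def frechet_anova_terms(Q, groups):
    """Return F_n and, per group, (lambda_j, V_j, sigma2_j) as defined in the text."""
    n = len(Q)
    F_n = sq_dist_to_mean(Q).mean()           # starts as the pooled variance V_p
    per_group = {}
    for g in np.unique(groups):
        d2 = sq_dist_to_mean(Q[groups == g])
        lam, V = len(d2) / n, d2.mean()
        sigma2 = (d2 ** 2).mean() - V ** 2    # mean of d_W^4 minus squared mean of d_W^2
        per_group[g] = (lam, V, sigma2)
        F_n -= lam * V                        # subtract lambda_j * V_j
    return F_n, per_group

# Synthetic example (invented numbers, not AEGIS data): two groups of 30 subjects,
# each subject with 288 simulated glucose readings in mg/dL.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 30)
samples = [rng.normal(110 + 10 * g + rng.normal(0, 5), 20, size=288) for g in groups]
Q = quantile_functions(samples)
F_n, per_group = frechet_anova_terms(Q, groups)
```

In this quantile representation $F_n$ coincides with the weighted squared distance between each group mean and the pooled mean, so it is non-negative and grows as the group Wasserstein means separate.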

$Y, Y' \sim F$ and $Z, Z' \sim G$ that are defined on a (semi)metric space $(\Omega, \rho)$ of negative type, where $\rho : \Omega \times \Omega \to \mathbb{R}$ is the semimetric. Though the notation in this section is quite general, in particular we have in mind the case $(\Omega, \rho) = (A, d_W)$ corresponding to glucodensities. The energy distance associated with $\rho$ between the distributions $F$ and $G$ is

$$\epsilon_\rho(F, G) = 2E(\rho(Y, Z)) - E(\rho(Y, Y')) - E(\rho(Z, Z')).$$

Given random samples $Y_1, \ldots, Y_n \overset{iid}{\sim} F$ and $Z_1, \ldots, Z_m \overset{iid}{\sim} G$, the sample energy distance is

$$\tilde{\epsilon}_\rho(F, G) = \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho(Y_i, Z_j) - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \rho(Y_i, Y_j) - \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \rho(Z_i, Z_j).$$

The asymptotic distribution of the above statistic under the null hypothesis ($H_0 : F = G$) as well as under the alternative ($H_a : F \neq G$) depends on the chosen semimetric $\rho$. Moreover, its expression is difficult to calculate and to implement in practice. Hence, when using energy distance based methods, the distribution under the null hypothesis is usually calibrated with a permutation method. Alternatives for calibrating the distribution under the null hypothesis include the wild or a weighted bootstrap, as described in Leucht and Neumann (2013); Jiménez-Gamero et al. (2019).

The energy distance can also be extended to handle samples from more than two populations. Given $k$ independent samples $Y_{j1}, \ldots, Y_{jn_j} \overset{iid}{\sim} F_j$, $j = 1, \ldots, k$, the energy distance statistic is

$$\tilde{\epsilon}_\rho(F_1, \ldots, F_k) = \sum_{1 \le j \,\cdots}$$

[...]
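The two-sample energy distance and its permutation calibration can be sketched as follows, taking $\rho = d_W$ computed from quantile functions. This is only an illustration under our own assumptions: the function names, the grid size, the number of permutations, and the simulated groups are hypothetical, and the sketch covers the two-sample case only.

```python
import numpy as np

def wasserstein_matrix(Q):
    """Pairwise 2-Wasserstein distances between rows of Q (quantile functions on a uniform grid)."""
    diff = Q[:, None, :] - Q[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))

def energy_distance(D, idx_y, idx_z):
    """Sample energy distance from a distance matrix D; double sums include the diagonal."""
    between = D[np.ix_(idx_y, idx_z)].mean()
    within_y = D[np.ix_(idx_y, idx_y)].mean()
    within_z = D[np.ix_(idx_z, idx_z)].mean()
    return 2 * between - within_y - within_z

def permutation_test(D, n, n_perm=999, seed=0):
    """Permutation p-value for H0: F = G, where the first n rows of D belong to the first sample."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    idx = np.arange(N)
    observed = energy_distance(D, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(N)             # relabel subjects at random
        exceed += energy_distance(D, perm[:n], perm[n:]) >= observed
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic example (invented numbers): two groups of 20 subjects whose mean glucose differs.
rng = np.random.default_rng(1)
grid = (np.arange(100) + 0.5) / 100
Y = np.array([np.quantile(rng.normal(100, 15, 288), grid) for _ in range(20)])
Z = np.array([np.quantile(rng.normal(130, 15, 288), grid) for _ in range(20)])
D = wasserstein_matrix(np.vstack([Y, Z]))
observed, p_value = permutation_test(D, n=len(Y))
```

Working from a precomputed distance matrix keeps the permutation loop cheap, since each permutation only re-indexes the same matrix.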

10. Therefore, there is no statistically significant difference between men and women at the significance level of 5 percent.

Figure 5 shows the glucodensity samples for each gender using their quantile representations. The pointwise means of these quantile functions constitute the quantile functions of the sample Wasserstein mean glucodensities. These, together with pointwise standard deviation curves, are also shown in Figure 5. On average, the groups are quite similar. Some discrepancies between the groups are observed in terms of their variance, although they are not large enough for the test to show statistically significant differences.

Figure 5: (Left two panels) Glucodensities for men and women of the AEGIS study, plotted as quantile functions; (Third panel) 2-Wasserstein mean quantile functions for each group; (Fourth panel) Cross-sectional standard deviation curves for quantile functions in each group. In every panel the horizontal axis is the quantile and the vertical axis is glucose in mg/dL.

Cluster analysis is an essential tool for identifying subgroups of patients with similar characteristics. As an example, we perform a cluster analysis with three clusters on the data of the diabetes patients from the AEGIS study. A patient is classified as having diabetes if a doctor has previously made that diagnosis, or if the patient's current A1c and FPG values fall in the ranges established by the ADA for that category.

Figure 6 contains the results of applying the cluster analysis to the diabetes patients. The algorithm has identified three differentiated groups of patients. The first group consists of patients with normal glucose values, probably because they are on medication and the diagnosis of diabetes was made in the past. The second group consists of patients with slightly altered glucose metabolism. Finally, the last group consists of patients with severely altered glucose values whose glucose, as can be seen in the glucodensities, is continuously fluctuating. The two-dimensional graphical representation of the density function of A1c and FPG helps to validate these findings.

Figure 6: Clustering analysis of diabetes patients in AEGIS study. For each of Clusters 1, 2, and 3, one panel shows the glucodensities (density against glucose, mg/dL) and a second panel shows A1c (%) against FPG (mg/dL).
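To make the idea of clustering glucodensities concrete, the sketch below runs Lloyd's k-means on quantile functions, which amounts to k-means under the 2-Wasserstein distance for one-dimensional distributions. This is a stand-in chosen for brevity and is not necessarily the clustering algorithm used for Figure 6; the function name, the deterministic initialization, and the three simulated patient groups are our own assumptions.

```python
import numpy as np

def wasserstein_kmeans(Q, k, n_iter=50):
    """Lloyd's k-means on quantile functions (rows of Q); centers are Wasserstein means."""
    # Deterministic start: order subjects by mean glucose and take k evenly spaced ones.
    order = np.argsort(Q.mean(axis=1))
    centers = Q[order[np.linspace(0, len(Q) - 1, k).astype(int)]].copy()
    labels = np.full(len(Q), -1)
    for _ in range(n_iter):
        # Squared 2-Wasserstein distance of every subject to every center.
        d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # assignments stopped changing
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):            # keep the old center if a cluster empties
                centers[c] = Q[labels == c].mean(axis=0)
    return labels, centers

# Synthetic example (invented numbers): three groups of 15 subjects with increasing
# mean and spread of glucose, 288 simulated readings each, in mg/dL.
rng = np.random.default_rng(2)
grid = (np.arange(100) + 0.5) / 100
true_group = np.repeat([0, 1, 2], 15)
params = [(100, 10), (140, 25), (200, 50)]     # hypothetical (mean, sd) per group
Q = np.array([
    np.quantile(rng.normal(params[g][0] + rng.normal(0, 3), params[g][1], 288), grid)
    for g in true_group
])
labels, centers = wasserstein_kmeans(Q, k=3)
```

Each returned center is itself a quantile function, so it can be plotted or converted back to a density and read as the typical glucodensity of its cluster.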

The primary contribution of this article is a new representation of CGM data called the glucodensity. We have validated this representation from a clinical point of view, showing that it is more accurate than time in range metrics.

Diabetes etiology and biological components to capture in a mathematical representation

Diabetes encompasses a heterogeneous group of glucose metabolism impairments, such as the frequent presence of hyperglycemia or hypoglycemia Association et al. (2018). Anomalous glucose fluctuations are another essential trait of dysglycemic regulation Monnier and Colette (2011); Monnier et al. (2008). Glycemic control measures should go beyond average glucose values such as A1c and also capture i) the impact of the time spent at each glucose concentration on the glucose deregulation process, and ii) the oscillations of glucose associated with cellular damage Monnier and Colette (2011). Such measures are crucial both in the management of patients with diabetes and in the precise assessment of glucose metabolism.

Our proposal accurately captures the components of diabetes mentioned above. Using clinical data, we evaluated its clinical sensitivity against established biomarkers in diabetes. We found a high association between glucodensity and A1c, HOMA-IR, CONGA, MODD, and MAGE. In the case of HOMA-IR, the predictive ability does not seem excellent, although, to the best of our knowledge, no known marker shows a predictive ability for that variable. However, our model provides consistent values for moderate and large HOMA-IR values. While the fit for A1c was not perfect, we must consider that the time scales of A1c and of the glucodensities were quite different. A1c is a measure that reflects the average glucose over 2 − [...] R² of 0.79 is better than the average glucose recorded by the monitoring period (R² = 0. [...] R² shows a moderate relationship between those variables. However, we are introducing the essential variables of the glucose deregulation process. A possible explanation is that the summary measures commonly used in diabetes can hardly capture an individual's glycemic profile. Glucose metabolism is very complex and highly dependent on the patient's condition. For example, the cellular mechanisms are different in type I and type II diabetes. In the former, there is an inhibition of β-cell function and consequently no insulin production, while insulin secretion is reduced in the latter (Taylor, 2013). In this context, the glucodensity provides greater clinical accuracy than traditional methods for the decisions derived from it, because it uses the entire distribution of an individual's glucose concentrations over time.

While time in range metrics may also achieve the previous aim, they do so to a clearly lesser extent than the glucodensity. Our proposal can capture the differences between individuals at each glucose concentration. By contrast, time in range only measures glucose differences over intervals, with the consequent loss of information. Time in range metrics are also substantially limited because the target zones must be defined beforehand, and these may depend on the study population or the aim of the analysis.

Empirical results demonstrate the advantages of our proposal beyond the theoretical framework. The ability of glucodensity to predict A1c, HOMA-IR, and the variability measures CONGA, MAGE, and MODD is surprisingly high, and much higher than that achieved with time in range metrics, despite using two different sets of target zones: the deciles of the glucose values of normoglycemic patients and the target zones prescribed by the ADA.

The estimated R² between glucodensities and A1c is similar to that reported by other authors between A1c and average glucose values Nathan et al. (2007). However, in this study, patients are monitored only for 2 − [...] R² between A1c and the mean glucose in our database is only 0. [...]

From a statistical standpoint, glucodensities are a special constrained type of functional data known as distributional data; therefore, it is not possible to directly use the usual statistical techniques. In this paper, we have proposed a framework for the analysis of these distributional data based on distances, with existing techniques for hypothesis testing, cluster analysis, and regression models. However, further methodological development is necessary, for example for mixed models or causal inference methods, for which no methodology is currently available.
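The loss of information incurred by time in range, discussed above, can be illustrated with two hypothetical glucose profiles that spend the same proportion of time in every target zone and yet have clearly different distributions. The cut points 54, 70, 180, and 250 mg/dL are assumed here in line with commonly used CGM target zones; the profiles, the function names, and the handling of readings that fall exactly on a cut point are our own choices for the sketch.

```python
import numpy as np

CUTS = np.array([54.0, 70.0, 180.0, 250.0])    # assumed zone boundaries, mg/dL

def time_in_range(glucose, cuts=CUTS):
    """Proportion of readings in each of the len(cuts) + 1 zones defined by `cuts`."""
    zones = np.digitize(glucose, cuts)         # 0 for < 54, ..., 4 for >= 250
    return np.bincount(zones, minlength=len(cuts) + 1) / len(glucose)

def wasserstein2(a, b):
    """2-Wasserstein distance between two empirical distributions with equally many readings."""
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

# Two invented profiles of 100 readings each: both spend 80% of the time in the
# 70-180 zone and 20% in the 180-250 zone, but at different glucose levels.
profile_a = np.concatenate([np.full(80, 100.0), np.full(20, 200.0)])
profile_b = np.concatenate([np.full(80, 170.0), np.full(20, 240.0)])

tir_a, tir_b = time_in_range(profile_a), time_in_range(profile_b)
distance = wasserstein2(profile_a, profile_b)
```

Here `tir_a` and `tir_b` are identical, so any analysis based on these zones cannot tell the two profiles apart, whereas the 2-Wasserstein distance between them is about 65 mg/dL.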

A potential limitation of our representation is that it ignores the order of events and analyzes only the distribution of glucose values. However, the event sequence may not be a critical component in diabetes modeling. The main factor in microvascular and macrovascular complications is chronic hyperglycemia Cryer (2014); Škrha et al. (2016), and this is captured with high accuracy by our models. Moreover, an essential aspect of managing diabetes patients is hypoglycemia control, and our proposal also captures this. Finally, the third component of dysglycemia Monnier et al. (2008), glucose variability, can be accurately predicted from our representation, at least through the metrics CONGA, MAGE, and MODD.

The sample size used may also be a limitation from a statistical point of view. Nevertheless, in the field of diabetes, the AEGIS study is one of the world's largest databases and, unlike other studies, is composed of randomly selected individuals from a general population and non-participants Zeevi et al. (2015). Finally, perhaps the most reliable way of validating the new representation is in terms of the patients' long-term prognosis. However, to the best of our knowledge, no study with a reasonable sample size has this information from the intensive use of CGM technology. Moreover, we have established the clinical validity using variables that do have a clear and established relationship with the prognosis and prevalence of diabetes, as evidenced in the current literature in the field.

Adopting the concept of glucodensity in clinical practice and biomedical research could be very promising in the following ways.

• To have a simple and more accurate representation of the glycaemic profile of an individual. This representation is especially useful in the management of diabetic patients and to assess the effects of an intervention.
• To establish if there are statistically significant differences between patients subjected to different interventions, for example, in a clinical trial.
• To identify different subtypes of patients based on their glycaemic condition and other variables. Cluster analysis of glucodensities can create new patient subtypes based on the risk of diabetes or other complications. Furthermore, it allows us to describe the etiology of the disease better by creating groups of subjects whose glucose profiles and other clinical characteristics are similar.
• To establish the prognosis or risk of a patient or to analyze the relationship of an individual's glycaemic profile with different clinical variables in epidemiological studies.
• To predict changes in the glycaemic profile based on the characteristics of the individuals and the intervention performed. For example: how does the glucodensity vary according to the diet?
• To recommend the most advantageous treatments for a patient. Following the previous idea, a causal inference model could be fitted where the response is the glucodensity, for example, to establish which diet is the most beneficial for the individual to achieve suitable glucose levels.

We have introduced the glucodensity methodology using CGM data. However, our methodology is also valid for data from other biosensors, such as accelerometers used to measure physical activity levels. In this domain, the time in range metric is one of the most used representations, and the adoption of our approach may lead to better results Dumuid et al. (2018, 2020). Adopting the new methodology with other biosensors may be an essential research issue to be addressed in the future.

In the diabetes field, it will be necessary to evaluate the predictive capacity of the glucodensity for the long-term prognosis of patients. In addition, it would be interesting to assess, over more extended monitoring periods, the reproducibility of the constructed representation between days and weeks. One way to accomplish this is to compute the intraclass correlation coefficient (ICC) using, for example, the methodology recently proposed in Xu et al., which is based on distances between functions.

Acknowledgements

We thank Russell Lyons for his discussions on the use of energy-distance based methods with glucodensities.

This work has received financial support from Carlos III Health Institute, Grant/Award Number: PI16/01395; Ministry of Economy and Competitiveness (Spain) European Regional Development Fund (FEDER); the Axencia Galega de Innovación, Consellería de Economía, Emprego e Industria, Xunta de Galicia, Spain, Grant/Award Number: GPC IN607B 2018/01; National Science Foundation, Grant/Award Number: DMS-1811888; the Spanish Ministry of Economy and Competitiveness Grant/Award Number: TIN2015-73566-JIN and TIN2017-84796-C21-R.

Competing Interests

The authors declare no competing interests.

References

Antoniadis, A. (1997). Wavelets in statistics: A review. Journal of the Italian Statistical Society, 97. URL https://doi.org/10.1007/BF03178905.

Association, A. D. et al. (2018). 6. Glycemic targets: Standards of medical care in diabetes—2018. Diabetes Care, S55–S64.

Ausk, K. J., Boyko, E. J. and Ioannou, G. N. (2010). Insulin resistance predicts mortality in nondiabetic individuals in the U.S. Diabetes Care, https://care.diabetesjournals.org/content/33/6/1179.full.pdf, URL https://care.diabetesjournals.org/content/33/6/1179.

Battelino, T., Danne, T., Bergenstal, R. M., Amiel, S. A., Beck, R., Biester, T., Bosi, E., Buckingham, B. A., Cefalu, W. T., Close, K. L., Cobelli, C., Dassau, E., DeVries, J. H., Donaghue, K. C., Dovc, K., Doyle, F. J., Garg, S., Grunberger, G., Heller, S., Heinemann, L., Hirsch, I. B., Hovorka, R., Jia, W., Kordonouri, O., Kovatchev, B., Kowalski, A., Laffel, L., Levine, B., Mayorov, A., Mathieu, C., Murphy, H. R., Nimri, R., Nørgaard, K., Parkin, C. G., Renard, E., Rodbard, D., Saboo, B., Schatz, D., Stoner, K., Urakami, T., Weinzimer, S. A. and Phillip, M. (2019). Clinical targets for continuous glucose monitoring data interpretation: Recommendations from the international consensus on time in range. Diabetes Care. https://care.diabetesjournals.org/content/early/2019/06/07/dci19-0028.full.pdf, URL https://care.diabetesjournals.org/content/early/2019/06/07/dci19-0028.

Beck, R. W., Bergenstal, R. M., Riddlesworth, T. D., Kollman, C., Li, Z., Brown, A. S. and Close, K. L. (2019). Validation of time in range as an outcome measure for diabetes clinical trials. Diabetes Care, https://care.diabetesjournals.org/content/42/3/400.full.pdf, URL https://care.diabetesjournals.org/content/42/3/400.

Beck, R. W., Connor, C. G., Mullen, D. M., Wesley, D. M. and Bergenstal, R. M. (2017). The fallacy of average: How using HbA1c alone to assess glycemic control can be misleading. Diabetes Care, https://care.diabetesjournals.org/content/40/8/994.full.pdf, URL https://care.diabetesjournals.org/content/40/8/994.

Bergenstal, R. M. (2015). Glycemic variability and diabetes complications: Does it matter? Simply put, there are better glycemic markers! Diabetes Care, https://care.diabetesjournals.org/content/38/8/1615.full.pdf, URL https://care.diabetesjournals.org/content/38/8/1615.

Bergenstal, R. M., Ahmann, A. J., Bailey, T., Beck, R. W., Bissen, J., Buckingham, B., Deeb, L., Dolin, R. H., Garg, S. K., Goland, R., Hirsch, I. B., Klonoff, D. C., Kruger, D. F., Matfin, G., Mazze, R. S., Olson, B. A., Parkin, C., Peters, A., Powers, M. A., Rodriguez, H., Southerland, P., Strock, E. S., Tamborlane, W. and Wesley, D. M. (2013). Recommendations for standardizing glucose reporting and analysis to optimize clinical decision making in diabetes: The ambulatory glucose profile (AGP). Diabetes Technology & Therapeutics, https://doi.org/10.1089/dia.2013.0051, URL https://doi.org/10.1089/dia.2013.0051.

Capitaine, L., Genuer, R. and Thiébaut, R. (2019). Fréchet random forests.

Cryer, P. E. (2014). Glycemic goals in diabetes: Trade-off between glycemic control and iatrogenic hypoglycemia. Diabetes, https://diabetes.diabetesjournals.org/content/63/7/2188.full.pdf, URL https://diabetes.diabetesjournals.org/content/63/7/2188.

DiMeglio, L. A., Evans-Molina, C. and Oram, R. A. (2018). Type 1 diabetes. The Lancet.

Dubey, P. and Müller, H.-G. (2019). Fréchet analysis of variance for random objects. Biometrika, http://oup.prod.sis.lan/biomet/article-pdf/106/4/803/30646779/asz052.pdf, URL https://doi.org/10.1093/biomet/asz052.

Dumuid, D., Pedišić, Ž., Palarea-Albaladejo, J., Martín-Fernández, J. A., Hron, K. and Olds, T. (2020). Compositional data analysis in time-use epidemiology: What, why, how. International Journal of Environmental Research and Public Health, https://pubmed.ncbi.nlm.nih.gov/32224966.

Dumuid, D., Stanford, T. E., Martin-Fernández, J.-A., Pedišić, Ž., Maher, C. A., Lewis, L. K., Hron, K., Katzmarzyk, P. T., Chaput, J.-P., Fogelholm, M., Hu, G., Lambert, E. V., Maia, J., Sarmiento, O. L., Standage, M., Barreira, T. V., Broyles, S. T., Tudor-Locke, C., Tremblay, M. S. and Olds, T. (2018). Compositional data analysis for physical activity, sedentary time and sleep research. Statistical Methods in Medical Research, https://doi.org/10.1177/0962280217710835, URL https://doi.org/10.1177/0962280217710835.

Ewings, S. M., Sahu, S. K., Valletta, J. J., Byrne, C. D. and Chipperfield, A. J. (2015). A Bayesian network for modelling blood glucose concentration and exercise in type 1 diabetes. Statistical Methods in Medical Research, https://doi.org/10.1177/0962280214520732, URL https://doi.org/10.1177/0962280214520732.

Febrero-Bande, M. and de la Fuente, M. (2012). Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software, Articles.

Feig, D. S., Donovan, L. E., Corcoy, R., Murphy, K. E., Amiel, S. A., Hunt, K. F., Asztalos, E., Barrett, J. F. R., Sanchez, J. J., de Leiva, A., Hod, M., Jovanovic, L., Keely, E., McManus, R., Hutton, E. K., Meek, C. L., Stewart, Z. A., Wysocki, T., O'Brien, R., Ruedy, K., Kollman, C., Tomlinson, G., Murphy, H. R., Grisoni, J., Byrne, C., Davenport, K., Neoh, S., Gougeon, C., Oldford, C., Young, C., Green, L., Rossi, B., Rogers, H., Cleave, B., Strom, M., Adelantado, J. M., Chico, A. I., Tundidor, D., Malcolm, J., Henry, K., Morris, D., Rayman, G., Fowler, D., Mitchell, S., Rosier, J., Temple, R., Turner, J., Canciani, G., Hewapathirana, N., Piper, L., Kudirka, A., Watson, M., Bonomo, M., Pintaudi, B., Bertuzzi, F., Daniela, G., Mion, E., Lowe, J., Halperin, I., Rogowsky, A., Adib, S., Lindsay, R., Carty, D., Crawford, I., Mackenzie, F., McSorley, T., Booth, J., McInnes, N., Smith, A., Stanton, I., Tazzeo, T., Weisnagel, J., Mansell, P., Jones, N., Babington, G., Spick, D., MacDougall, M., Chilton, S., Cutts, T., Perkins, M., Scott, E., Endersby, D., Dover, A., Dougherty, F., Johnston, S., Heller, S., Novodorsky, P., Hudson, S., Nisbet, C., Ransom, T., Coolen, J., Baxendale, D., Holt, R., Forbes, J., Martin, N., Walbridge, F., Dunne, F., Conway, S., Egan, A., Kirwin, C., Maresh, M., Kearney, G., Morris, J., Quinn, S., Bilous, R., Mukhtar, R., Godbout, A., Daigle, S., Lubina, A., Jackson, M., Paul, E., Taylor, J., Houlden, R., Breen, A., Banerjee, A., Brackenridge, A., Briley, A., Reid, A., Singh, C., Newstead-Angel, J., Baxter, J., Philip, S., Chlost, M., Murray, L., Castorino, K., Frase, D., Lou, O. and Pragnell, M. (2017). Continuous glucose monitoring in pregnant women with type 1 diabetes (CONCEPTT): a multicentre international randomised controlled trial. The Lancet.

Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice (Springer Series in Statistics). Springer-Verlag, Berlin, Heidelberg.

Committee for Medicinal Products for Human Use et al. (2012). Guideline on clinical investigation of medicinal products in the treatment or prevention of diabetes mellitus. London, European Medicines Society.

Franca, G., Vogelstein, J. T. and Rizzo, M. (2020). Kernel k-groups via Hartigan's method. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Fréchet, M. R. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'institut Henri Poincaré.

Freeman, J. and Lyons, L. (2008). The use of continuous glucose monitoring to evaluate the glycemic response to food. Diabetes Spectrum, https://spectrum.diabetesjournals.org/content/21/2/134.full.pdf, URL https://spectrum.diabetesjournals.org/content/21/2/134.

Group, B. A. W. (2018). Need for regulatory change to incorporate beyond A1c glycemic metrics. Diabetes Care, e92–e94.

Gude, F., Díaz-Vidal, P., Rúa-Pérez, C., Alonso-Sampedro, M., Fernández-Merino, C., Rey-García, J., Cadarso-Suárez, C., Pazos-Couselo, M., García-López, J. M. and Gonzalez-Quintela, A. (2017). Glycemic variability and its association with demographics and lifestyles in a general adult population. Journal of Diabetes Science and Technology, https://pubmed.ncbi.nlm.nih.gov/28317402.

Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., McLaughlin, T. and Snyder, M. (2018). Glucotypes reveal new patterns of glucose dysregulation. PLOS Biology, https://doi.org/10.1371/journal.pbio.2005143.

Han, K., Müller, H.-G. and Park, B. U. (2019). Additive functional regression for densities as responses. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2019.1604365, URL https://doi.org/10.1080/01621459.2019.1604365.

Hirsch, I. B., Sherr, J. L. and Hood, K. K. (2019). Connecting the dots: Validation of time in range metrics with microvascular outcomes. Diabetes Care, https://care.diabetesjournals.org/content/42/3/345.full.pdf, URL https://care.diabetesjournals.org/content/42/3/345.

Hoelzel, W., Weykamp, C., Jeppsson, J.-O., Miedema, K., Barr, J. R., Goodall, I., Hoshino, T., John, W. G., Kobold, U., Little, R., Mosca, A., Mauri, P., Paroni, R., Susanto, F., Takei, I., Thienpont, L., Umemoto, M. and Wiedmeyer, H.-M. (2004). IFCC reference system for measurement of hemoglobin A1c in human blood and the national standardization schemes in the United States, Japan, and Sweden: A method-comparison study. Clinical Chemistry, http://clinchem.aaccjnls.org/content/50/1/166.full.pdf, URL http://clinchem.aaccjnls.org/content/50/1/166.

Hron, K., Menafoglio, A., Templ, M., Hrůzová, K. and Filzmoser, P. (2016). Simplicial principal component analysis for density functions in Bayes spaces. Computational Statistics & Data Analysis.

Izenman, A. J. (1991). Review papers: Recent developments in nonparametric density estimation. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.1991.10475021, URL https://doi.org/10.1080/01621459.1991.10475021.

Jiménez-Gamero, M., Alba-Fernández, M. and Ariza-López, F. (2019). Approximating the null distribution of a class of statistics for testing independence. Journal of Computational and Applied Mathematics, 131–143.

Kilpatrick, E. S. (2000). Glycated haemoglobin in the year 2000. Journal of Clinical Pathology, https://jcp.bmj.com/content/53/5/335.full.pdf, URL https://jcp.bmj.com/content/53/5/335.

Kovatchev, B. P., Breton, M., Man, C. D. and Cobelli, C. (2009). In silico preclinical trials: A proof of concept in closed-loop control of type 1 diabetes. Journal of Diabetes Science and Technology, https://doi.org/10.1177/193229680900300106, URL https://doi.org/10.1177/193229680900300106.

Leucht, A. and Neumann, M. H. (2013). Dependent wild bootstrap for degenerate U- and V-statistics. Journal of Multivariate Analysis, 257–280.

Lu, J., Ma, X., Zhou, J., Zhang, L., Mo, Y., Ying, L., Lu, W., Zhu, W., Bao, Y., Vigersky, R. A. and Jia, W. (2018). Association of time in range, as assessed by continuous glucose monitoring, with diabetic retinopathy in type 2 diabetes. Diabetes Care, https://care.diabetesjournals.org/content/41/11/2370.full.pdf, URL https://care.diabetesjournals.org/content/41/11/2370.

Matthews, D., Hosker, J., Rudenski, A., Naylor, B., Treacher, D. and Turner, R. (1985). Homeostasis model assessment: insulin resistance and β-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia.

McDonnell, C., Donath, S., Vidmar, S., Werther, G. and Cameron, F. (2005). A novel approach to continuous glucose analysis utilizing glycemic variation. Diabetes Technology & Therapeutics, https://doi.org/10.1089/dia.2005.7.253, URL https://doi.org/10.1089/dia.2005.7.253.

Molnar, G., Taylor, W. and Ho, M. (1972). Day-to-day variation of continuously monitored glycaemia: a further measure of diabetic instability. Diabetologia.

Monnier, L. and Colette, C. (2011). Glycemic variability: Can we bridge the divide between controversies? Diabetes Care, https://care.diabetesjournals.org/content/34/4/1058.full.pdf, URL https://care.diabetesjournals.org/content/34/4/1058.

Monnier, L., Colette, C. and Owens, D. R. (2008). Glycemic variability: the third component of the dysglycemia in diabetes. Is it important? How to measure it? Journal of Diabetes Science and Technology.

Müller, H.-G. and Petersen, A. (2014). Density estimation including examples. Wiley StatsRef: Statistics Reference Online.

Nathan, D. , Turgeon, H. and

Regan, S. (2007). Relationship between glycated haemoglobinlevels and mean glucose levels over time.

Diabetologia , Nerini, D. and

Ghattas, B. (2007). Classifying densities using functional regression trees: Ap-plications in oceanology.

Computational Statistics & Data Analysis , . Nguyen, M. , Han, J. , Spanakis, E. K. , Kovatchev, B. P. and

Klonoff, D. C. (0). Areview of continuous glucose monitoring-based composite metrics for glycemic control.

DiabetesTechnology & Therapeutics , null. PMID: 32069094, https://doi.org/10.1089/dia.2019.0434 , URL https://doi.org/10.1089/dia.2019.0434 . Nichols, S. P. , Koh, A. , Storm, W. L. , Shin, J. H. and

Schoenfisch, M. H. (2013). Biocom-patible materials for continuous glucose monitoring devices.

Chemical Reviews , https://doi.org/10.1021/cr300387j . Petersen, A. , Liu, X. and

Divani, A. A. (2019). Wasserstein f -tests and conﬁdence bands forthe fr`echet regression of density response curves. . Petersen, A. and

M¨uller, H.-G. (2016). Functional data analysis for density functions bytransformation to a hilbert space.

Ann. Statist. , https://doi.org/10.1214/15-AOS1363 . Petersen, A. and

M¨uller, H.-G. (2019). Fr´echet regression for random objects with euclideanpredictors.

Ann. Statist. , https://doi.org/10.1214/17-AOS1624 .25 reda, C. (2007). Regression models for functional data by reproducing kernel hilbert spacesmethods. Journal of Statistical Planning and Inference ,

829 – 840. Special Issue on Nonpara-metric Statistics and Related Topics: In honor of M.L. Puri, URL . Ramsay, J. , Ramsay, J. and

Silverman, B. (2005).

Functional Data Analysis . Springer Seriesin Statistics, Springer. URL https://books.google.es/books?id=mU3dop5wY_4C . Selvin, E. , Crainiceanu, C. M. , Brancati, F. L. and

Coresh, J. (2007). Short-term vari-ability in measures of glycemia and implications for the classiﬁcation of diabetes.

Archives ofinternal medicine , Service, F. J. (2013). Glucose variability.

Diabetes , https://diabetes.diabetesjournals.org/content/62/5/1398.full.pdf , URL https://diabetes.diabetesjournals.org/content/62/5/1398 . Service, F. J. , Molnar, G. D. , Rosevear, J. W. , Ackerman, E. , Gatewood, L. C. and

Taylor, W. F. (1970). Mean amplitude of glycemic excursions, a measure of diabetic insta-bility.

Diabetes , http://diabetes.diabetesjournals.org/content/19/9/644.full.pdf , URL http://diabetes.diabetesjournals.org/content/19/9/644 . Silverman, B. W. (1986).

Density Estimation for Statistics and Data Analysis . Chapman & Hall,London.

Singh, R. , Barden, A. , Mori, T. and

Beilin, L. (2001). Advanced glycation end-products: areview.

Diabetologia , https://doi.org/10.1007/s001250051591 . ˇSkrha, J. , ˇSoupal, J. and Pr´azn`y, M. (2016). Glucose variability, hba1c and microvascularcomplications.

Reviews in Endocrine and Metabolic Disorders , Szab´o, Z. , Sriperumbudur, B. K. , P´oczos, B. and

Gretton, A. (2016). Learning theory fordistribution regression.

J. Mach. Learn. Res. , Szekely, G. J. and

Rizzo, M. L. (2017). The energy of data.

Annual Review of Statistics andIts Application , https://doi.org/10.1146/annurev-statistics-060116-054026 ,URL https://doi.org/10.1146/annurev-statistics-060116-054026 . Talska, R. , Menafoglio, A. , Machalova, J. , Hron, K. and

Fivserova, E. (2018). Com-positional regression with functional response.

Computational Statistics & Data Analysis , Taylor, R. (2013). Type 2 diabetes.

Diabetes Care , https://care.diabetesjournals.org/content/36/4/1047.full.pdf , URL https://care.diabetesjournals.org/content/36/4/1047 . Villani, C. (2008).

Optimal transport: old and new , vol. 338. Springer Science & Business Media.

Wang, J.-L. , Chiou, J.-M. and

M¨uller, H.-G. (2016). Functional data analysis.

Annual Reviewof Statistics and Its Application , u, M. , Reiss, P. T. and

Cribben, I. (????). Generalized reliability based on distances.

Biometrics , n/a . https://onlinelibrary.wiley.com/doi/pdf/10.1111/biom.13287 , URL https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13287 . Zaccardi, F. and

Khunti, K. (2018). Glucose dysregulation phenotypes—time to improve out-comes.

Nature Reviews Endocrinology , Zeevi, D. , Korem, T. , Zmora, N. , Israeli, D. , Rothschild, D. , Weinberger, A. , Ben-Yacov, O. , Lador, D. , Avnit-Sagi, T. , Lotan-Pompan, M. et al. (2015). Personalizednutrition by prediction of glycemic responses.

Cell ,163