An attention model to analyse the risk of agitation and urinary tract infections in people with dementia
Honglin Li, Roonak Rezvani, Magdalena Anita Kolanko, David J. Sharp, Maitreyee Wairagkar, Ravi Vaidyanathan, Ramin Nilforooshan, Payam Barnaghi
Abstract—Behavioural symptoms and urinary tract infections (UTI) are among the most common problems faced by people with dementia. One of the key challenges in the management of these conditions is early detection and timely intervention in order to reduce distress and avoid unplanned hospital admissions. Using in-home sensing technologies and machine learning models for sensor data integration and analysis provides opportunities to detect and predict clinically significant events and changes in health status. We have developed an integrated platform to collect in-home sensor data and performed an observational study to apply machine learning models for agitation and UTI risk analysis. We collected a large dataset from 88 participants with a mean age of 82 and a standard deviation of 6.5 (47 females and 41 males) to evaluate a new deep learning model that utilises attention and rational mechanisms. The proposed solution can process a large volume of data over a period of time, extract significant patterns in time-series data (i.e. attention) and use the extracted features and patterns to train risk analysis models (i.e. rational). The proposed model can explain the predictions by indicating which time steps and features are used in a long series of time-series data. The model provides a recall of 91% and a precision of 83% in detecting the risk of agitation and UTIs. This model can be used for early detection of conditions such as UTIs and management of neuropsychiatric symptoms such as agitation in association with initial treatment and early intervention approaches. In our study, we have developed a set of clinical pathways for early interventions using the alerts generated by the proposed model, and a clinical monitoring team has been set up to use the platform and respond to the alerts according to the created intervention plans.
I. INTRODUCTION

Dementia affects , people in the UK and over 50 million globally, and is set to become the developed world's largest socioeconomic healthcare burden over the coming decades [1], [2]. In the absence of any current treatment, there is an urgent need to focus on reducing the effects of symptoms and helping to improve the quality of life and well-being of those already affected [3]. The 2020 report of the Lancet Commission on dementia prevention, treatment, and care stresses the importance of individualised interventions to address complex medical problems, multimorbidity and neuropsychiatric symptoms in dementia, which lead to unnecessary hospital admissions, faster functional decline, and worse quality of life [4].

People with dementia have complex problems with symptoms in many domains. It is estimated that up to will develop behavioural and physical symptoms of dementia (BPSD) over the course of their illness, with agitation being one of the most common symptoms [5], and a frequent reason for nursing home placement [6].

H. Li, M. A. Kolanko, D. J. Sharp and P. Barnaghi are with the Department of Brain Sciences, Imperial College London, W12 0NN, United Kingdom. R. Rezvani is with the Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, United Kingdom. M. Wairagkar and R. Vaidyanathan are with the Department of Mechanical Engineering, Imperial College London, SW7 1AL, United Kingdom. R. Nilforooshan is with Surrey and Borders NHS Foundation Trust, Leatherhead, KT22 7AD, United Kingdom. All authors are also with the Care Research and Technology Centre, The UK Dementia Research Institute (UK DRI). § These authors contributed equally to this work. Corresponding author: [email protected]
Furthermore, patients with dementia often suffer from a number of co-morbid conditions and have a higher frequency of medical problems such as falls, incontinence, dehydration or urinary tract infection (UTI), the commonest bacterial infection in the older patient population and the commonest cause of sepsis in older adults [7], with an associated in-hospital mortality of in this age group [8]. If not detected and treated early, both BPSD and medical comorbidities frequently lead to emergency hospital admissions in dementia patients. Alzheimer's Research UK estimates that of hospital admissions in dementia patients are for preventable conditions, such as urinary tract infections. Besides significant costs, hospitalisation places dementia patients at risk of serious complications, with longer hospital stays, higher risk of iatrogenic complications, delayed discharge and functional decline during admission, which contributes to higher rates of transfer to residential care and in-patient mortality [9]. Therefore, increased medical supervision, early recognition of deterioration in health status and rapid treatment are key to preventing unnecessary hospitalisation for 'ambulatory' conditions that could be treated outside of hospital, such as UTIs. Furthermore, ongoing monitoring of people with dementia allows immediate detection of behavioural disturbances, enabling earlier psychosocial and environmental interventions to reduce patients' distress and prevent further escalation and hospitalisation.

However, monitoring and supporting individuals in an ongoing manner is a resource- and cost-intensive task, often not scalable to larger populations. Utilising remote monitoring technologies with the help of caregivers can allow creating practical and generalisable solutions.
As part of the research in the Care Research and Technology Centre at the UK Dementia Research Institute (UK DRI), we have been developing and deploying in-home monitoring technologies to help and support people affected by dementia. Our research has led to the development of a digital platform that allows collecting and integrating in-home observation and measurement data using network-connected sensory devices [10]. In this paper, we discuss how our in-home monitoring data and machine learning algorithms are used to detect early symptoms of agitation and UTI in people with dementia living in their own homes.

Sensing technologies have been increasingly used to monitor activities and movements of elderly patients living in their own homes [11], [12], [13]. Interpreting this information, however, demands considerable human effort, which is not always feasible. The use of analytical algorithms allows integration and analysis of rich environmental and physiological data at scale, enabling rapid detection of clinically significant events and development of personalised, predictive and preventative healthcare.

Deep learning models have been applied in a variety of healthcare scenarios to identify the risk of various clinical conditions or predict outcomes of treatment [14], [15]. Recently, there have been several implementations of Recurrent Neural Networks (RNNs) to create learning models for time-series healthcare data analysis [16], [17], [18]. The behavioural and physiological symptoms and patterns in long-term conditions such as dementia appear in the data over a long period of time and can fluctuate and change over the course of the disease. Standard machine learning models such as RNNs, however, are not suitable for analysing long sequences of time points. To address the long-sequence analysis issue in RNNs, other methods such as Bidirectional RNNs, LSTMs and GRUs have been used [19], [20].
There have also been attempts to apply attention mechanisms to clinical datasets [21], [22], [23], [24], [25] to improve the performance of analysing imbalanced and long-tail time-series data. A fundamental limitation of these models is their adaptivity and generalisability. When long-distance symptoms and patterns are related to a specific condition, the generalisability and performance of the existing models are limited. The long sequences of data points and the changes in the ongoing conditions vary between patients, and often there are no large labelled training samples to train the models for all the variations. Deep learning models offer a new opportunity to train models that can pay attention to correlations and long-distance relations between the patterns and sequences. However, off-the-shelf and existing deep learning models require large training samples.

When applying neural networks to clinical data, there are two main challenges: 1) selecting the important time steps and features from long sequences of data to create generalisable models; and 2) imbalance in the datasets. Neural networks are very effective in finding a trend in datasets. Models such as Recurrent Networks use the positions of the input and output sequences to generate a sequence of hidden states. This is computationally expensive and limits computation of the global dependencies [26]. In these models, the computational complexity to relate input or output positions also grows as the distance between positions increases. The latter makes it very challenging to learn dependencies and correlations between long-distance patterns and time points [27].

Additionally, clinical datasets are often imbalanced, with content spanning ensembles of heterogeneous data. Most clinical datasets contain more normal cases (i.e. negative samples) than abnormal data points (i.e. positive samples).
In our dataset, which includes a large set of in-home environmental and physiological data from people with dementia, the number of positive cases of infection is much smaller than the number of negative cases. In large parts of the data, the true status of infection is unknown (i.e. the data is partially labelled, due to the limitations in accessing the patients' clinical records or knowing the presence of any infection without a test). This issue causes learning models to exhibit a bias towards the majority class. A model may ignore the minority class or make decisions based on a partial set which is not a broad representation of the cases [28]. There have been several works on implementing attention mechanisms [26] to improve the generalisability of learning models in analysing time-series data. However, Jian et al. [29] found that there are limitations in the weights generated by attention-based models which can lead to wrong predictions. Hence, we need to be more cautious in using attention mechanisms and their explanations in designing deep learning models. While attention-based models are promising in healthcare time-series data analysis, considering the time and feature dependencies of the predictions poses a challenge for this type of model. Over-sampling, which augments the data by generating synthetic samples [30], and down-sampling, which prunes the samples in the majority classes, are among the typical methods used to deal with the imbalance issues in datasets [31]. However, samples in clinical data and variations in the real data are important aspects of the observations and measurements that may not be present in augmented data generated by sampling methods. It is crucial to find an efficient way to address the imbalance issue without modifying or reducing the original data in pre-processing steps [32]. Our goal is to propose a model to address the challenges mentioned above.
To support clinical treatment and adapt to real-world sensory data readings, the model should filter out redundant and less informative data. Furthermore, the model should explain its predictions by indicating which time periods and sensors are important for making them. Last but not least, the model should adapt to imbalanced data.

II. DESIGN, SETTING AND PARTICIPANTS
Real-time, continuous measurement methodologies enabled by the recent advances in pervasive computing and 'smart-home' technologies provide opportunities to monitor the behaviour and health status of elderly people using wearable technology or environmental sensors [11], [12], [13]. Computer-derived algorithms have been developed to analyse sensor data and identify patterns of activity over time. These can be applied to detect changes in activities of daily living in order to predict disease progression and cognitive decline. For instance, the ORCATECH group used a continuous in-home monitoring system and pervasive computing technologies to track activities and behaviours such as sleep, computer use and medication adherence to capture changes in cognitive status [33]. They also demonstrated the ability of machine learning algorithms to autonomously detect mild cognitive impairment in older adults [34]. Machine learning models have also been used to detect clinically significant events
and changes in health status. Much of the previous work focused on the detection and prediction of falls using wearable accelerometers or other motion detectors [35], as well as tracking behavioural symptoms such as sleep disturbances [36], agitation [37] and wandering [38] in elderly patients. However, there is limited research on the use of machine learning models for the detection of health changes such as infection in the context of smart homes. An early supervised UTI detection model has been described using in-home PIR sensors [39]; however, it relied on the activity labels and annotations in the training dataset, which are extremely time-consuming to produce and not generalisable to real-world situations with large amounts of unlabelled data collected from uncontrolled environments. We have previously proposed an unsupervised technique that could learn an individual's movement patterns directly from the unlabelled PIR sensor data [40]. Furthermore, the existing research and data-driven solutions are often applied only to small-scale pilot studies and do not provide evidence for scalability and generalisability. They are also limited in analysing long-term patterns and correlations that appear in the data.

Fig. 1: An overview of the proposed solution for healthcare data analysis. The data is encoded by positional encoding before passing to the model. The proposed rationalising block extracts important information and passes it to the higher layers. The rationalising block contains a rational layer to extract important time steps. A Long Short-Term Memory (LSTM) model processes the extracted data. The attention layer learns to attend to suitable features. The rationalising block first extracts the important time steps and then pays attention to different features of the pruned data, which is then used to make a prediction. All the layers are trained simultaneously.
Attention-based models, which can overcome these problems, have never been applied to sensor data for detecting clinically significant events or changes in health status in dementia patients. This is the first study to use deep learning and attention-based methods to perform risk analysis for behavioural symptoms and health conditions such as UTIs in people living with dementia. The proposed model improves the accuracy and generalisability of machine learning models that use imbalanced and noisy in-home sensory data for the risk analysis. An analysis of the suitability of the digital markers and the use of in-home sensory data is explored in an ablation study. The proposed model is compared with several baseline models and state-of-the-art methods. The proposed approach has been evaluated in an observational clinical study. Participants (n=88, age=81 +/- 6.5) were recruited for a six-month trial period. The proposed solution provides a recall of 91% and a precision of 83% in detecting the risk of agitation and UTIs. We have also set up a framework and a clinical response team that use the risk alerts generated by the models for ongoing support and management of the conditions in people living with dementia.

Using high-resolution in-home observation and measurement data in association with advanced machine learning methods leads to early and timely interventions and has a significant impact on reducing preventable and unplanned hospital admissions in people affected by dementia. A key challenge in using analytical and predictive models for risk analysis is identifying and collecting digital marker data using in-home sensory devices. The capacity of the proposed model to address time-series feature identification and data imbalance enables its use in a very wide range of healthcare and risk analysis applications using in-home digital markers.

III. METHOD
We introduce a model that can identify the important time steps and features and utilise long-distance dependencies to make better predictions. The proposed model provides a prediction based on the selected time points and the selected features from the raw observation and measurement data. Figure 1 shows how the data changes during the processing. The model selects important time steps through a pruning process. After pruning the data, it pays attention to different features and uses them to make the predictions. Different from methods such as clustering sampling [41], we select the
important time steps of each sample instead of selecting a portion of samples for training. In contrast to statistical feature selection methods such as sequential feature selection [42], the proposed model selects important time steps adaptively for each input. We use focal loss [43] to assign priority to the minority class without generating synthetic samples.

Fig. 2: Visualisation of the sensor readings. The x-axis represents the time of day of sensor activations. The y-axis represents the days over a period of 8 months for one patient. Each colour represents a type of environmental activity sensor. Similar colours along the y-axis represent similar patterns of activities around the same time on consecutive days. More colour distortion/merging of colours along the y-axis represents more change in the pattern of activity over time.
Fig. 3: A heat-map of the aggregation of the raw data. The readings are aggregated per hour within each day.
Data sources and pre-processing
We have collected the data as part of an observational clinical study in people living with dementia from December 2018 to April 2020. Each of the participants had a confirmed diagnosis of dementia (mild to severe) within, at least, the past three months of recruitment and had been stable on dementia medication. The collected data contains continuous environmental sensor data from the houses of patients with dementia who live in the UK. The sensors include Passive Infra-Red (PIR), smart power plug, motion and door sensors produced by Develco in Aarhus, Denmark. The sensors were installed in the bathroom, hallway, bedroom, living room (or lounge) and kitchen of the homes, and also on the fridge door, kettle and microwave (or toaster). The devices also include network-connected physiological monitoring devices that are used for submitting daily measurements of vital signs, weight and hydration. The data is integrated into a digital platform, designed in collaboration with clinicians and a user group to support people with dementia, that we have developed in our past research [10]. A clinical monitoring team set up as part of our observational study has used the platform to annotate the data daily and verify the risk analysis alerts. Based on the annotations, we select four types of incident, including agitation, Urinary Tract Infection (UTI), abnormal blood pressure and abnormal body temperature, to create binary labels for our data. More specifically, a label is set to true when the abnormal incident is verified by the monitoring team, and to false otherwise. We then use the environmental data to infer whether any incident has happened within one day. Fig 2 shows an example of the collected data. To pre-process the data, we aggregate the readings of the sensors within each hour of the day, as shown in Fig 3. Appendix 1 shows a list of potential digital markers and sensory data that can be used in dementia care. In the appendix, we also show a screenshot of the platform that is used for collecting the data.
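As an illustration of this pre-processing step, the hourly aggregation can be sketched as follows. This is a minimal sketch only: the event timestamps and sensor names are hypothetical, not taken from the study data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw sensor events (timestamp, sensor location); in the real
# platform these come from the PIR and door sensors described above.
events = [
    (datetime(2019, 3, 1, 7, 12), "kitchen"),
    (datetime(2019, 3, 1, 7, 48), "kitchen"),
    (datetime(2019, 3, 1, 9, 5), "bathroom"),
    (datetime(2019, 3, 2, 7, 30), "kitchen"),
]

# Aggregate activations per (day, hour, sensor), mirroring the hourly
# aggregation used to build the heat-map in Fig 3.
counts = Counter((t.date(), t.hour, s) for t, s in events)

# One day of one sensor then becomes a 24-dimensional hourly count vector.
day_one = min(t.date() for t, _ in events)
kitchen_hourly = [counts.get((day_one, h, "kitchen"), 0) for h in range(24)]
```

Each day of data thus forms a 24-time-step sequence with one count per sensor per hour, the input shape used by the model.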
Machine learning model
We aim to use the environmental sensors to predict possible incidents and avoid delayed treatment. Furthermore, the model should provide the reason, i.e. which period of time and which sensors are important for the predictions, to explain the inference. In other words, the model can remove the redundant or less informative information and use the rest of the data to give the prediction, as shown in Fig 4.
Fig. 4: Selected time steps from the raw data. These time steps are selected by the model. The model learns to identify time steps that are more important in predicting the outcome.
As discussed earlier, in healthcare data analysis the predictions are often based on a long sequence of data measured and collected at different time points. Accessing and feeding more data helps to train more accurate models. However, more information can also mean more noise in the data, and imbalance in the samples that are given to the model can also lead to decision bias. An efficient model should be able to process and utilise as much data as available. However, the model should also avoid the common pitfalls of noise and bias. To address these issues, we have studied the use of attention-based models. This group of models utilises all the available information and, in each sequence, identifies the time points that provide the most information for training and prediction. This attention and selection process is an embedded step in the model. It allows the model to be flexible and generalisable for different sequences with variable lengths and for different combinations of features and values represented in the data. Before explaining our proposed model and its contributions to creating a generalisable solution for time-series healthcare data analysis, we provide an overview of the related work. We discuss the use of attention-based models in other domains and explain how the ideas presented in the existing work have led to the design of our current model.
Fig. 5: After selecting the important time steps, the model learns which sensors should be attended to. In this case, the model considers the bathroom sensor to have the largest contribution to the prediction.
Attention mechanisms were introduced in Natural Language Processing (NLP) by Bahdanau et al. [44]. Attention-based models are widely used in NLP due to their capability of detecting important parts of a sequence and efficiently interpreting it. Attention-based models have also been used in continuous healthcare and clinical data analysis [45]. Continuous clinical data are multivariate time-series data with temporal and sequential relationships. For each patient, the data is a set of time steps, and each time step contains medical features (X ∈ R^(t×d)). The REverse Time AttentIoN model (RETAIN) is one of the first systems to use an attention mechanism for medical data [21]. In this model, there are two separate RNNs, one to generate the visit-level attention weights (α) and the other for the variable-level (β) attention weights. The most relevant time step is the one associated with the largest value in α. Choi et al. provided a method to find the most influential medical feature [21]. However, RETAIN cannot handle long-distance dependencies. To deal with this issue, Ma et al. proposed Dipole, a predictive model for clinical data using Bidirectional RNNs [22]. They have implemented the model using two different attention mechanisms: general attention and concatenation-based attention. The results show that concatenation-based attention outperforms the alternative because it incorporates all the long-distance dependencies.

In the above models, the input layer is simple and the data follows the same pipeline, but in the Timeline model, Bai et al. adapted the pipeline of the data [23]. They use an attention layer to aggregate the medical features and, by modelling each disease progression pattern, they find the most important time steps. To deal with long-distance dependencies, Timeline implements Bidirectional LSTMs. One of the recent studies in this area is AdaCare [24], which uses Gated Recurrent Units (GRU).
AdaCare utilises a convolutional structure to extract all the dependencies in the clinical data and showed promising results in the explainability of the model. The models mentioned above have been developed based on recurrent networks. However, the sequential aspect of recurrent models is computationally inefficient. The SAnD model was developed solely based on a multi-head attention mechanism [25]. Song et al. implemented a positional encoding to include the sequential order in the model.

The models mentioned above show significant improvements in the accuracy and performance of predictive models in the clinical field. However, incorporating both long-distance dependencies and feature associations is a challenging task. In the existing models, the analysis is either on the time-step level or the feature level. In this paper, we propose a model to detect and predict the risk of healthcare conditions by analysing long-distance dependencies in the patterns and sequences of the data. This information can be useful for clinical experts in the ongoing management of the conditions. The work also helps to use an automated process to alert on the risk of adverse health conditions and explore the symptoms related to the detected conditions.

Our proposed model consists of two main components, a rationalising block and a classification block, as shown in Figure 1. In a high-level overview, the rational layers select the important time steps and pass them to an LSTM layer. The LSTM layer ignores the trivial time steps and processes the data for the attention block. The classifier uses these time points for predictions. After processing by the attention block, the model gives a prediction. The details of these blocks are explained in the following sections.

Positional Encoding
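As a concrete sketch of the sine/cosine encoding of [26] used in this section, the following NumPy illustration may help; the function name and the assumption of an even feature dimension are ours, not part of the original description.

```python
import numpy as np

def positional_encoding(k, d):
    """Sine/cosine positional encoding of [26] (Equation 1).

    k: number of time steps (24 hourly bins per day here),
    d: dimension of each time step (number of sensor features, assumed even).
    Even feature indices use sine, odd indices use cosine.
    """
    pos = np.arange(k)[:, None]              # time-step index, shape (k, 1)
    i = np.arange(0, d, 2)[None, :]          # even feature indices 2i
    angle = pos / np.power(10000.0, i / d)   # pos / 10000^(2i/d)
    pe = np.zeros((k, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(24, 8)   # one day: 24 hourly steps, 8 sensors
```

The resulting matrix is added to the hourly sensor vectors before they are passed to the model, as described below.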
To use the order of the sequence in the analysis, we add positional encoding (PE) before passing the data into the model. We use the sine and cosine positional encoding [26], shown in Equation 1, where pos is the position of the time step, i indexes the feature (sensor) dimension and d is the dimension of each time step:

PE(pos, 2i) = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))     (1)

Rationalising Prediction
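The selection step described in this section can be sketched as follows. In the model, the per-time-step probabilities come from a trained generator network; here they are toy values, and only the sampling of z and the factorised joint probability of Equation 2 are illustrated.

```python
import numpy as np

def sample_mask(probs, rng):
    """Sample the binary selection vector z of the rationalising generator.

    probs[i] is an assumed per-time-step probability p(z_i = 1 | x); each
    z_i is drawn independently, so the joint probability factorises as in
    Equation 2: p(z | x) = prod_i p(z_i | x).
    """
    z = (rng.random(probs.shape) < probs).astype(int)
    joint = float(np.prod(np.where(z == 1, probs, 1.0 - probs)))
    return z, joint

rng = np.random.default_rng(0)
probs = np.array([0.9, 0.1, 0.8, 0.7])   # toy probabilities for k = 4 time steps
z, joint = sample_mask(probs, rng)
```

Time points with z_i = 0 are ignored by the downstream LSTM and attention layers.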
To add more focus on the time steps in the data that are more relevant to the predictions, the generator produces a binary mask to select or ignore specific time points. For example, x ∈ R^(k×f) contains k time points with f features for each time point, and the generator produces a binary vector z = {z_1, z_2, ..., z_k}. The i-th variable z_i ∈ {0, 1} indicates whether the i-th time point in x is selected or not. Whether the i-th time point is selected is a conditional probability given the input x. We assume that the selection of each time point is independent. The generator uses a probability distribution over z, and the joint probability of the selections is given by:

p(z | x) = ∏_{i=1}^{k} p(z_i | x)     (2)

Classifier
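The scaled dot-product attention used inside the classifier's attention block (Equation 3) can be sketched as follows; the projection matrices here are random placeholders, not trained weights.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention of Equation 3: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))        # 10 selected time steps, 8 sensor features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
# Self-attention: Q, K and V all come from the same input with different weights.
out, w = attention(x @ Wq, x @ Wk, x @ Wv)
```

Each row of the weight matrix w is a distribution over the selected time steps, which is what makes the model's predictions explainable.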
After exploring and selecting the most relevant time points, we train a classifier to provide the predictions. The trained classifier contains attention blocks and residual blocks. The attention block is an application of the self-attention mechanism to detect the important features. The attention mechanism detects important parts of a sequence. It has three key components: the input structure, the compatibility function and the distribution function [46]. There are three inputs in the structure: Keys (K ∈ R^(n_k×d_k)), Values (V ∈ R^(n_v×d_v)) and Query (q ∈ R^(n_q)), where n_k, n_v, n_q are the numbers of input elements and d_k, d_v are the dimensions of the outputs. They can have different or the same sources; if K and q come from the same source, it is self-attention [26]. K and V represent the input sequence, which could be either annotated or raw data. q represents the reference sequence for computing attention weights. For combining and comparing the q and K values, a compatibility function is used. The distribution function computes the attention weights (a ∈ R^(d_k)) using the output of the compatibility function (c ∈ R^(d_k)). We obtain the attention by Equation 3, where Q, K, V are matrices formed by the query, key and value vectors, respectively. Since we use self-attention, the
Q, K, V are calculated from the inputs with different weight matrices:

Attention(Q, K, V) = softmax(QK^T / √d_k) V     (3)

The architecture of the attention block is the same as described in [26]. We employ a residual connection [47] followed by a normalisation layer [48] inside the attention block. Residual blocks and the output layer process the output of the attention block.

Objective function
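The loss terms of this section (Equations 4 and 5) can be sketched as follows. This is a minimal illustration: the α and β values are the defaults of Lin et al. [43] and λ is an assumed placeholder, since the tuned values are not stated here.

```python
import numpy as np

def focal_loss(f_xz, y, alpha=0.25, beta=2.0):
    """Focal loss of Equation 4 for a single binary label y in {0, 1}.

    f_xz is the classifier's estimated probability f(x, z); p is the
    probability assigned to the true class. alpha and beta here are the
    defaults of Lin et al. [43], not the paper's tuned values.
    """
    p = f_xz * y + (1.0 - f_xz) * (1.0 - y)
    return -alpha * (1.0 - p) ** beta * np.log(p)

def selection_loss(z, lam=0.1):
    """Generator sparsity term of Equation 5: L_g = lambda * ||z|| (lam assumed)."""
    return lam * np.abs(np.asarray(z, dtype=float)).sum()
```

The modulating factor (1 - p)^β down-weights well-classified examples, so the rare positive (incident) cases dominate the gradient; the sparsity term penalises selecting too many time steps.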
The training samples in healthcare datasets are often imbalanced due to the low prevalence and sporadic occurrence of adverse events. In other words, some of the classes contain more samples than others. For example, only 25% of the data we collected is labelled as positive. More details of the dataset are given in the following section. To deal with the imbalance issue, we use focal loss [43] as the objective function of the classifier, shown in Equation 4:

L_c = −α(1 − p)^β log(p)     (4)

where α and β are hyper-parameters to balance the variant of the focal loss, p = f(x, z)·y + (1 − f(x, z))·(1 − y), f(x, z) is the probability estimated by the classifier and y ∈ {0, 1} is the label of x.

In addition to the loss function used in the classifier, the generator produces a short rational selection and calculates its loss, shown in Equation 5, where λ is a parameter to weight the selection:

L_g = λ ||z||     (5)

We then combine the focal loss and the loss from the generator to construct the overall loss function, as shown in Equation 6:

L = Σ_{(x,y)∈D} E[L_c + L_g]     (6)

IV. RESULTS
Evaluation Metrics: To evaluate our proposed method and compare it with the baseline models, we calculated different metrics. One of the primary metrics to assess the model is accuracy, which measures how close the predicted class is to the actual class. However, accuracy alone is not a good measure of the performance of a classifier. As a result, we also calculated the Area Under the Curve of the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. The precision of class A is the ratio of samples predicted as class A that are correct, and recall is the ratio of true class A samples that have been detected. The ROC curve measures the model's capability of differentiating between classes. We do not report the results in terms of specificity and sensitivity. The reason is that in this study, we do not have access to the full electronic healthcare records and hospital admission data of all the participants. Reporting the specificity and sensitivity based only on the detected and evaluated labels in our dataset, which can only be a sub-set of true and false cases for the cohort, could be misleading in terms of an actual and generalisable clinical finding. Instead, we have opted to evaluate the precision and generalisability of the prediction algorithm based on the existing labelled data and the known cases for which we could evaluate and verify the performance of the model.

Fig. 6: Evaluation of the proposed methods using the in-home sensory dataset. (a) shows the Precision-Recall (PR) curve; (b) shows the Receiver Operating Characteristic (ROC) curve and (c) shows the changes to the loss during the training. In (a) and (b) the results of the proposed model are also compared with a set of baseline models.

Fig. 7: An ablation study to evaluate the model; (a) shows the Precision-Recall (PR) curve; (b) shows the Receiver Operating Characteristic (ROC) curve and (c) shows the selection rate changes. In (a) and (b) the results are evaluated by eliminating different components from the model.
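For reference, the precision and recall used above can be computed as follows (a minimal sketch on toy labels):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (incident) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))   # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
```
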
Baseline Models: We compare our model with Logistic Regression (LR) [49], Long Short-Term Memory (LSTM) neural networks [50] and a fully connected Neural Network (NN) model [51].

LR is a discriminative model which can avoid confounding effects by analysing the association of all variables together [49]. It is also a commonly used baseline model for evaluating proposed models [20].

NN has the ability to learn complex relationships. Unlike LR, NN does not need to assume that the variables are linearly separable. It has also been applied to a variety of clinical datasets [52], [53]. In the experiments, we used a Neural Network with one hidden layer containing 200 neurons, a softmax output layer containing two neurons, cross-entropy loss and the Adam optimiser.

LSTM is a powerful neural network for analysing sequential data, including time-indexed clinical datasets [18], [19]. It can associate relevant inputs even if they are widely separated in time. Since our dataset consists of time-series sequences, we take the LSTM as another baseline model. In the experiments, we used a model that contains one residual block, one LSTM layer containing 128 neurons, a softmax output layer containing two neurons, cross-entropy loss and the Adam optimiser.

In the experiments, we aggregate the readings of each sensor per hour. Hence each data point contains 24 time points and eight features. We fixed the batch size, learning rate and sparsity hyperparameters for all experiments. We divide the data into a train set and a test set. The numbers of training and testing samples in the datasets are 209 and 103 cases with their associated time-series data, respectively. The data is anonymised, and only data without any personally identifiable information is used in this research.

Experiments: The ROC and PR changes during training are shown in the first two graphs in Figure 6. Overall, the proposed model outperforms the other baseline methods.
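The per-hour aggregation described in the experimental setup can be sketched as follows. The sensor names and event timestamps below are hypothetical placeholders, since the exact sensor set is not listed in this section; the shape of the result (24 hourly time points, eight features) matches the description above:

```python
from datetime import datetime

# Hypothetical in-home sensor names standing in for the study's eight features.
SENSORS = ["bathroom", "bedroom", "hallway", "kitchen",
           "lounge", "kettle", "fridge", "front door"]

def aggregate_hourly(events):
    """Aggregate raw (timestamp, sensor) events into a 24 x 8 matrix of
    per-hour activation counts for a single day."""
    counts = [[0] * len(SENSORS) for _ in range(24)]
    index = {name: i for i, name in enumerate(SENSORS)}
    for ts, sensor in events:
        counts[ts.hour][index[sensor]] += 1
    return counts

# Illustrative raw events for one day.
events = [
    (datetime(2021, 5, 1, 7, 15), "kitchen"),
    (datetime(2021, 5, 1, 7, 40), "kettle"),
    (datetime(2021, 5, 1, 23, 5), "bathroom"),
]
day = aggregate_hourly(events)  # 24 rows (hours) x 8 columns (sensors)
```

Each such daily matrix is one input sample for the models compared in Figure 6.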
The LSTM performs well in dealing with the time-series data. Compared to the other methods, the neural network converges much faster. However, the performance of the model fluctuates around 30 epochs. The convergence and the fluctuation are due to the rational process: the model has to learn how to extract important time steps and pay attention to the features. This process is also reflected in Figure 6c, where the loss fluctuates during that period. However, the model adjusts for this fluctuation automatically and improves the performance. The overall results are also summarised in Table I.

V. DISCUSSION
Ablation Study: We begin the discussion with an ablation study. Our model contains five important components: rational layers, attention layers, residual layers, focal loss
TABLE I: The evaluation results in comparison with a set of baseline models: Logistic Regression (LR), Long Short-Term Memory (LSTM) neural networks and a fully connected Neural Network (NN) model. Since the dataset is imbalanced, we calculated the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves to evaluate the performance.
            LR       LSTM     NN       Proposed method
AUC - PR    0.3472   0.6901   0.5814
AUC - ROC   0.5919   0.7644   0.7601

and positional encoding. We omit each component one at a time and explore how removing each of the components impacts the performance of the model. The results are shown in the first two graphs of Figure 7. The orange line represents the model without the rational layer. Although the performance of the model without the rational layer keeps increasing, it significantly underperforms the others. In other words, the rational layer plays an important role in the model. Removing the positional encoding, attention layer, residual layer, or the focal loss decreases the performance as well. The performance changes caused by omitting each of these four components are quite similar. As shown in Figure 7, the positional encoding helps the model to identify relevant patterns in the data over time and plays an important role in the performance of the model. The rate of selected time-step changes is shown in Figure 7c.
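For reference, a common form of positional encoding is the sinusoidal scheme of the Transformer [26]. Whether the model uses exactly this variant is not specified in this section, so the sketch below should be read as illustrative rather than as the paper's implementation:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: even dimensions carry sine terms,
    odd dimensions carry cosine terms, so attention layers can
    distinguish the order of time steps."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# 24 hourly time steps, 8 sensor features, matching the input shape above.
pe = positional_encoding(24, 8)
```

Such an encoding is added to the input sequence so that, without it, the attention layers would be unable to tell morning activity from evening activity, which is consistent with the performance drop observed in the ablation.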
Rationalising prediction: The rational component helps to increase the accuracy of the model. Generally, the proposed rationalising method shows that the model knows which time steps and features to use to give the prediction. These patterns and time steps can also be explored to identify and observe data and symptoms relevant to a condition in each patient. Using this component, a personalised set of patterns and symptoms can be explored for each patient. The last graph in Figure 7 shows the selection rate changes during the training phase. The model learns to extract the time steps, and the accuracy increases after the changes become stable. As mentioned in the ablation study, after learning to extract the important time steps, the proposed model outperforms the baseline models without rational mechanisms. In other words, the model extracts a subset of the time steps (e.g. part of the time steps are extracted from Figure 3 to Figure 4) to obtain a better prediction. As the learning process continues, the model tries different selections and finds the optimised selection rate. Compared to the other models, the performance of the proposed model does not decrease during the training. The model learns to pay attention to the most relevant segments of the data and to consider long-distance dependencies in the time-series data. In summary, the proposed model can not only explain the prediction but also automatically discard redundant information in the data. According to our experiments, the proposed model on average selects only a fraction of the time points in the datasets to estimate the predictions.
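The hard selection of time steps can be illustrated with a toy top-k mask. The real rational layer learns its binary selection end-to-end during training, so the fixed importance scores and the value of k below are purely illustrative assumptions:

```python
def select_time_steps(scores, k):
    """Toy hard selection: keep the k time steps with the highest
    importance scores and mask out the rest."""
    top = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]
    return [1 if t in top else 0 for t in range(len(scores))]

def apply_mask(series, mask):
    """Zero out the feature vectors of unselected time steps, so the
    classifier only sees the retained subset of the sequence."""
    return [row if keep else [0.0] * len(row)
            for row, keep in zip(series, mask)]

# Hypothetical importance scores over six time steps.
scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3]
mask = select_time_steps(scores, k=2)  # keeps time steps 1 and 3
```

The downstream classifier then receives `apply_mask(series, mask)`, i.e. only the selected time steps, which is what makes the resulting prediction inspectable.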
Paired analysis: We then analyse the rational block's processing of positive and negative samples. As shown in Figure 8, the rational block assigns weights to positive and negative samples differently. More specifically, the model has learnt to extract a different number and sequence of time steps based on the inputs. In this case, the model extracts more time steps for the positive case than for the negative case. Furthermore, the model pays attention differently based on the input data. In the example above, the model treats the bathroom as the most important sensor in the positive sample. However, the model treats the bathroom and kettle as almost equally important sensors for predicting the negative case. After the model attends to the sensors of the selected time steps, the classifier gives the correct predictions.
Translating machine learning research into clinical practice
Improving the quality of life by preventing illness-related symptoms and negative consequences of dementia has been set out as a major goal to advance dementia care. Agitation and infections have been highlighted as areas for priority development [6]. Our proposed model directly addresses these priorities in dementia care and intervention by enabling early detection of agitation and urinary tract infections in a remote healthcare monitoring scenario, providing an opportunity for delivering more personalised, predictive and preventative healthcare. When applied to a real-world clinical dataset in the context of the current clinical study, our proposed algorithm provided a recall of 91% and precision of 83% in detecting early signs of agitation and UTI from physiological and environmental sensor data. A clinical monitoring team verified the predictions by contacting the patient or carer when an agitation or UTI alert was generated. A set of clinical pathways for early interventions has also been developed for the clinical monitoring team to use when responding to the alerts.
Relevance to patient outcomes:
We would like to highlight an important aspect of using this type of analysis to evaluate healthcare and patient outcomes. Focusing only on accuracy as a metric for assessing the solution within a specific cohort goes only so far [54]. Large studies and further experiments with different cohorts and various in-home deployment settings are required to assess how such algorithms will perform in noisy and dynamic real-world environments. There are several examples of AI and machine learning algorithms that perform very well in controlled and laboratory settings but for which the real-world experience is different [54]. In this study, the sensors and data collection operate in uncontrolled, real-world environments. We have performed several cross-validations, comparisons and ablation studies to avoid overfitting the model and to make sure the results are robust and reproducible. However, further independent trials and validation studies with larger cohorts are required to transform the current work into a product that can be used in real-world clinical and care settings. Another important point is that focusing only on the accuracy of the algorithm will not give a complete picture of the real effectiveness and impact of the solution on patient outcomes.

Fig. 8: Visualisation of the outputs within the rational block. The top figure visualises a sample which is validated as a true incident. The bottom figure shows a sample which is validated as a false incident.

Our agitation intervention protocol follows all current guidelines, which agree that individualised and person-centred non-pharmacological therapies are the first-line treatment for agitation in people with dementia [55], [56]. In line with the current guidelines, the initial assessment explores possible reasons for patients' distress and addresses clinical or environmental causes first. The clinical monitoring team asks a set of standardised questions to evaluate the symptoms and to help the carer identify potential causes of agitation such as pain, illness, discomfort, hunger, loneliness, boredom or environmental factors (temperature, light, noise level). The recognition and treatment of possible organic causes or triggering factors remains the mainstay of the intervention. In particular, detection of delirium and a possible underlying infection is of great importance, and the clinical monitoring team facilitates early diagnosis and treatment by liaising with the study's clinical team and the patient's GP. Finally, the clinical monitoring team provides psychological support for the caregivers in order to reduce caregiver distress. In the future, we are planning to use multimodal sensor data to improve the classification of the agitation state, which will include measuring sound levels along with activity detected by environmental sensors.

Similarly to the agitation protocol, in the case of a UTI alert the clinical monitoring team first responds by contacting the patient/carer to evaluate the symptoms. However, the diagnosis of UTI in dementia patients can be problematic, as these patients are less likely to present with a typical clinical history and localised urinary symptoms compared with younger patients [57]. The team, therefore, arranges a home visit to perform a dipstick urine test.
If the urine dipstick test is suggestive of infection (positive nitrites or leukocytes), the clinical monitoring team advises the person with dementia/carer to visit the GP the same day to obtain a prescription for antibiotics. The monitoring team also informs the GP of the test results and requests that antibiotics be prescribed.

One potential criticism of our UTI intervention algorithm could be the possibility of antibiotic over-prescribing contributing to the spread of antibiotic resistance. However, recent evidence demonstrates that in elderly patients with a diagnosis of UTI in primary care, no antibiotics and delayed antibiotics are associated with a significant increase in bloodstream infection and all-cause mortality compared with immediate treatment [58]. Therefore, early prescription of antibiotics for this vulnerable group of older adults is advised in view of their increased susceptibility to sepsis after UTI, despite growing pressure to reduce inappropriate antibiotic use.

The impact of our in-home monitoring technologies and the embedded machine learning models on clinical outcomes, including hospitalisation, institutionalisation and mortality rates, is part of an ongoing study. Nevertheless, the current work demonstrates the effectiveness of the proposed algorithm and its translation into real-life clinical interventions. Fig. 8 illustrates individual cases of agitation and UTI correctly identified by the algorithm, with the digital markers demonstrating a behavioural anomaly.

VI. CONCLUSION
To avoid unplanned hospital admissions and provide early clues to detect the risk of agitation and infections, we collected daily activity data and vital signs using in-home sensory devices. The noise and redundant information in the data lead to inaccurate predictions from traditional machine learning algorithms. Furthermore, traditional machine learning models cannot explain their predictions. To address these issues, we proposed a model that not only outperforms the traditional machine learning methods but also provides an explanation of its predictions. The proposed rationalising block, which is based on the rational and attention mechanisms, can process healthcare time-series data by filtering out redundant and less informative information. Furthermore, the filtered data can be regarded as the important information to support clinical treatment. We also demonstrate that the focal loss can help to improve the performance on the imbalanced clinical dataset and that attention-based models can be used effectively in healthcare data analysis. The evaluation shows the effectiveness of the model on a real-world clinical dataset and describes how it is used to support people with dementia.

ACKNOWLEDGMENT
This research is funded by the UK Medical Research Council (MRC), Alzheimer's Society and Alzheimer's Research UK and supported by the UK Dementia Research Institute.

REFERENCES

[3] G. Livingston et al., "Dementia prevention, intervention, and care: 2020 report of the Lancet Commission," The Lancet, vol. 396, no. 10248, pp. 413–446, 2020.
[4] J. Pickett, C. Bird, C. Ballard, S. Banerjee, C. Brayne, K. Cowan, L. Clare, A. Comas-Herrera, L. Corner, S. Daley et al., "A roadmap to advance dementia research in prevention, diagnosis, intervention, and care by 2025," International Journal of Geriatric Psychiatry, vol. 33, no. 7, pp. 900–906, 2018.
[5] A. Feast, M. Orrell, G. Charlesworth, N. Melunsky, F. Poland, and E. Moniz-Cook, "Behavioural and psychological symptoms in dementia and the challenges for family carers: systematic review," The British Journal of Psychiatry, vol. 208, no. 5, pp. 429–434, 2016.
[6] G. T. Buhr, M. Kuchibhatla, and E. C. Clipp, "Caregivers' reasons for nursing home placement: clues for improving discussions with families prior to the transition," The Gerontologist, vol. 46, no. 1, pp. 52–61, 2006.
[7] B. C. Peach, G. J. Garvan, C. S. Garvan, and J. P. Cimiotti, "Risk factors for urosepsis in older adults: a systematic review," Gerontology and Geriatric Medicine, vol. 2, p. 2333721416638980, 2016.
[8] S. Tal, V. Guller, S. Levi, R. Bardenstein, D. Berger, I. Gurevich, and A. Gurevich, "Profile and prognosis of febrile elderly patients with bacteremic urinary tract infection," Journal of Infection, vol. 50, no. 4, pp. 296–305, 2005.
[9] C. Fogg, P. Griffiths, P. Meredith, and J. Bridges, "Hospital outcomes of older people with cognitive impairment: An integrative review," International Journal of Geriatric Psychiatry, vol. 33, no. 9, pp. 1177–1197, 2018.
[10] S. Enshaeifar, P. Barnaghi, S. Skillman, D. Sharp, R. Nilforooshan, and H. Rostill, "A digital platform for remote healthcare monitoring," in Companion Proceedings of the Web Conference, 2020.
[11] S. Majumder, E. Aghayi, M. Noferesti, H. Memarzadeh-Tehran, T. Mondal, Z. Pang, and M. J. Deen, "Smart homes for elderly healthcare—recent advances and research challenges," Sensors, vol. 17, no. 11, p. 2496, 2017.
[12] R. Turjamaa, A. Pehkonen, and M. Kangasniemi, "How smart homes are used to support older people: An integrative review," International Journal of Older People Nursing, vol. 14, no. 4, p. e12260, 2019.
[13] K. K. Peetoom, M. A. Lexis, M. Joore, C. D. Dirksen, and L. P. De Witte, "Literature review on monitoring technologies and their outcomes in independently living elderly people," Disability and Rehabilitation: Assistive Technology, vol. 10, no. 4, pp. 271–294, 2015.
[14] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, "Deep patient: an unsupervised representation to predict the future of patients from the electronic health records," Scientific Reports, vol. 6, no. 1, pp. 1–10, 2016.
[15] C. S. Ross-Innes, H. Chettouh, A. Achilleos, N. Galeano-Dalmau, I. Debiram-Beecham, S. MacRae, P. Fessas, E. Walker, S. Varghese, T. Evan et al., "Risk stratification of Barrett's oesophagus using a non-endoscopic sampling method coupled with a biomarker panel: a cohort study," The Lancet Gastroenterology & Hepatology, vol. 2, no. 1, pp. 23–31, 2017.
[16] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, "Learning to diagnose with LSTM recurrent neural networks," arXiv preprint arXiv:1511.03677, 2015.
[17] C. Esteban, O. Staeck, S. Baier, Y. Yang, and V. Tresp, "Predicting clinical events by combining static and dynamic information using recurrent neural networks," IEEE, 2016, pp. 93–101.
[18] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, "Doctor AI: Predicting clinical events via recurrent neural networks," in Machine Learning for Healthcare Conference, 2016, pp. 301–318.
[19] I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou, "Patient subtyping via time-aware LSTM networks," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 65–74.
[20] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan, "Multitask learning and benchmarking with clinical time series data," Scientific Data, vol. 6, no. 1, pp. 1–18, 2019.
[21] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart, "RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism," in Advances in Neural Information Processing Systems, 2016, pp. 3504–3512.
[22] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, and J. Gao, "Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 1903–1911.
[23] T. Bai, S. Zhang, B. L. Egleston, and S. Vucetic, "Interpretable representation learning for healthcare via capturing disease progression through time," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 43–51.
[24] L. Ma, J. Gao, Y. Wang, C. Zhang, J. Wang, W. Ruan, W. Tang, X. Gao, and X. Ma, "AdaCare: Explainable clinical health status representation learning via scale-adaptive feature extraction and recalibration," arXiv preprint arXiv:1911.12205, 2019.
[25] H. Song, D. Rajan, J. J. Thiagarajan, and A. Spanias, "Attend and diagnose: Clinical time series analysis using attention models," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[27] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," 2001.
[28] J. M. Johnson and T. M. Khoshgoftaar, "Survey on deep learning with class imbalance," Journal of Big Data, vol. 6, no. 1, p. 27, 2019.
[29] S. Jain and B. C. Wallace, "Attention is not explanation," arXiv preprint arXiv:1902.10186, 2019.
[30] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[31] X.-Y. Liu, J. Wu, and Z.-H. Zhou, "Exploratory undersampling for class-imbalance learning," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008.
[32] B. Krawczyk, "Learning from imbalanced data: open challenges and future directions," Progress in Artificial Intelligence, vol. 5, no. 4, pp. 221–232, 2016.
[33] B. E. Lyons, D. Austin, A. Seelye, J. Petersen, J. Yeargers, T. Riley, N. Sharma, N. Mattek, H. Dodge, K. Wild et al., "Corrigendum: Pervasive computing technologies to continuously assess Alzheimer's disease progression and intervention efficacy," Frontiers in Aging Neuroscience, vol. 7, p. 232, 2015.
[34] A. Akl, B. Taati, and A. Mihailidis, "Autonomous unobtrusive detection of mild cognitive impairment in older adults," IEEE Transactions on Biomedical Engineering, vol. 62, no. 5, pp. 1383–1394, 2015.
[35] L. Schwickert, C. Becker, U. Lindemann, C. Maréchal, A. Bourke, L. Chiari, J. Helbostad, W. Zijlstra, K. Aminian, C. Todd et al., "Fall detection with body-worn sensors," Zeitschrift für Gerontologie und Geriatrie, vol. 46, no. 8, pp. 706–719, 2013.
[36] I. Lazarou, A. Karakostas, T. G. Stavropoulos, T. Tsompanidis, G. Meditskos, I. Kompatsiaris, and M. Tsolaki, "A novel and intelligent home monitoring system for care support of elders with cognitive impairment," Journal of Alzheimer's Disease, vol. 54, no. 4, pp. 1561–1591, 2016.
[37] A. Bankole, M. Anderson, T. Smith-Jackson, A. Knight, K. Oh, J. Brantley, A. Barth, and J. Lach, "Validation of noninvasive body sensor network technology in the detection of agitation in dementia," American Journal of Alzheimer's Disease & Other Dementias, vol. 27, no. 5, pp. 346–354, 2012.
[38] T. Fleiner, P. Haussermann, S. Mellone, and W. Zijlstra, "Sensor-based assessment of mobility-related behavior in dementia: feasibility and relevance in a hospital context," International Psychogeriatrics, vol. 28, no. 10, p. 1687, 2016.
[39] M. J. Rantz, M. Skubic, R. J. Koopman, L. Phillips, G. L. Alexander, S. J. Miller, and R. D. Guevara, "Using sensor networks to detect urinary tract infections in older adults," IEEE, 2011, pp. 142–149.
[40] S. Enshaeifar, A. Zoha, S. Skillman, A. Markides, S. T. Acton, T. Elsaleh, M. Kenny, H. Rostill, R. Nilforooshan, and P. Barnaghi, "Machine learning methods for detecting urinary tract infection and analysing daily living activities in people with dementia," PLoS ONE, vol. 14, no. 1, p. e0209909, 2019.
[41] C. Wu and M. E. Thompson, "Stratified sampling and cluster sampling," in Sampling Theory and Practice. Springer, 2020, pp. 33–56.
[42] D. W. Aha and R. L. Bankert, "A comparative evaluation of sequential feature selection algorithms," in Learning from Data. Springer, 1996, pp. 199–206.
[43] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[44] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[45] M. Usama, B. Ahmad, W. Xiao, M. S. Hossain, and G. Muhammad, "Self-attention based recurrent convolutional neural network for disease prediction using healthcare data," Computer Methods and Programs in Biomedicine, vol. 190, p. 105191, 2020.
[46] A. Galassi, M. Lippi, and P. Torroni, "Attention, please! A critical review of neural attention models in natural language processing," arXiv preprint arXiv:1902.02181, 2019.
[47] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[48] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016.
[49] S. Sperandei, "Understanding logistic regression analysis," Biochemia Medica, vol. 24, no. 1, pp. 12–18, 2014.
[50] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," 1999.
[51] M. H. Hassoun et al., Fundamentals of Artificial Neural Networks. MIT Press, 1995.
[52] T. A. Lasko, J. C. Denny, and M. A. Levy, "Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data," PLoS ONE, vol. 8, no. 6, 2013.
[53] Z. Che, D. Kale, W. Li, M. T. Bahadori, and Y. Liu, "Deep computational phenotyping," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[55] "Dementia: assessment, management and support for people living with dementia and their carers," 2018.
[56] E. Ijaopo, "Dementia-related agitation: a review of non-pharmacological interventions and analysis of risks and benefits of pharmacotherapy," Translational Psychiatry, vol. 7, no. 10, pp. e1250–e1250, 2017.
[57] M. Lutters and N. B. Vogt-Ferrier, "Antibiotic duration for treating uncomplicated, symptomatic lower urinary tract infections in elderly women," Cochrane Database of Systematic Reviews, no. 3, 2008.
[58] M. Gharbi, J. H. Drysdale, H. Lishman, R. Goudie, M. Molokhia, A. P. Johnson, A. H. Holmes, and P. Aylin, "Antibiotic management of urinary tract infection in elderly patients in primary care and its association with bloodstream infections and all cause mortality: population based cohort study," BMJ.