Positive blood culture detection in time series data using a BiLSTM network
Leen De Baets, Joeri Ruyssinck, Thomas Peiffer, Johan Decruyenaere, Filip De Turck, Femke Ongenae, Tom Dhaene
Leen De Baets, Joeri Ruyssinck, Thomas Peiffer, Filip De Turck, Femke Ongenae, Tom Dhaene
IBCN, Ghent University - iMinds, 9052 Ghent, Belgium
Johan Decruyenaere
Ghent University Hospital, Ghent University, 9000 Ghent, Belgium
Abstract
The presence of bacteria or fungi in the bloodstream of patients is abnormal and can lead to life-threatening conditions. A computational model based on a bidirectional long short-term memory (BiLSTM) artificial neural network is explored to help doctors in the intensive care unit (ICU) predict whether the blood cultures of patients will return positive. As input, it uses nine monitored clinical parameters, presented as time series data, collected from ICU admissions at the Ghent University Hospital. Our main goal is to determine whether general machine learning methods, and more specifically temporal models, can be used to create an early detection system. This preliminary research obtains an area of . under the precision-recall curve, indicating the potential of temporal neural networks in this context.

A positive blood culture is defined as a blood sample in which bacteria or fungi are present. This growth of organisms in the bloodstream can lead to inflammation throughout the body, organ failure or even death [1]. When doctors suspect that a patient will test positive, they can decide to order a blood culture test. The symptoms indicative of a likely positive culture are complex and not fully understood. Nevertheless, a link is suspected to exist between a patient's physiological data and the outcome of such a test.

The literature presents several techniques to detect sepsis [2,3,4,5] from patients' physiological data. Sepsis is a condition related to a positive blood culture [6], and detecting it could be similar to detecting positive blood cultures. Although the monitored patient data are time dependent, no models have been proposed in the literature that specifically model the time aspect. This paper presents our work to explore the potential of temporal models to detect positive blood cultures.
A database was constructed with physiological information from patients admitted to the intensive care unit (ICU) of the Ghent University Hospital, of which admissions had a positive blood culture test. For all other patients, a blood culture test was performed that returned negative. For each patient, nine parameters were measured or calculated; these are listed in Table 1. Each parameter is monitored at a different frequency. The total dataset contains more than fourteen million values.

First, we filter out outliers. We do this by defining bio-limit ranges for each variable (see Table 1). Each value that falls outside this range is considered an outlier and removed. These outliers are caused by human error or machine malfunction and prove to be rare ( . of the data), as the database values are checked by study nurses.

Table 1: Variables monitored per patient
Variable | Bio-limits | Sampling approach
Temperature | [29 − …] | max
Blood thrombocyte count | | min
Blood leukocyte count | | mean
C-Reactive Protein | | max
Sepsis-related organ failure assessment | | max
Heart rate | [30 − …] | max
Respiratory rate | [0 − …] | max
International Normalized Ratio of prothrombin time | | max
Mean systemic arterial pressure | [30 − …] | max

After removing the outliers, the data are normalised per variable using

$$n = \frac{x - \mathrm{avg}}{\mathrm{std}} \tag{1}$$

where x is the value, avg the average of all values and std the standard deviation.

As each variable in the database has its own monitoring frequency, the sequence length differs per variable and per patient. However, the method used in this paper (see Section 3) requires all variables to have the same sequence length.
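The filtering, normalisation and hourly resampling described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the (time in hours, value) sample format and the upper bio-limits in `BIO_LIMITS` are hypothetical, the population standard deviation is assumed for std, and empty windows before the first observation are padded with 0.0 while later empty windows repeat the previous value.

```python
from statistics import mean, pstdev

# HYPOTHETICAL limits for illustration: Table 1 only preserves some lower
# bounds (29 for temperature, 30 for heart rate); the upper bounds are made up.
BIO_LIMITS = {"temperature": (29.0, 45.0), "heart_rate": (30.0, 300.0)}

# Per-variable window aggregation ("sampling approach" column of Table 1).
AGGREGATORS = {"min": min, "max": max, "mean": mean}


def remove_outliers(samples, low, high):
    """Keep (time_h, value) samples whose value lies inside [low, high]."""
    return [(t, v) for t, v in samples if low <= v <= high]


def zscore_params(all_values):
    """avg and std of one variable; the population std is an assumption."""
    return mean(all_values), pstdev(all_values)


def normalise(samples, avg, std):
    """Apply Eq. (1), n = (x - avg) / std, to every sample."""
    return [(t, (v - avg) / std) for t, v in samples]


def resample_hourly(samples, end_time, n_hours, approach):
    """Resample (time_h, value) samples onto n_hours one-hour windows that
    end at end_time (all times in hours)."""
    start = end_time - n_hours
    out, last = [], None
    for h in range(n_hours):
        lo, hi = start + h, start + h + 1
        window = [v for t, v in samples if lo <= t < hi]
        if window:
            # Higher frequency than hourly: aggregate with min/max/mean.
            last = AGGREGATORS[approach](window)
        # Lower frequency: repeat the previous value. Before the first
        # observation: pad with 0.0, the mean of a normalised variable.
        out.append(last if last is not None else 0.0)
    return out


raw = [(0.2, 36.8), (0.7, 38.1), (1.5, 12.0), (2.5, 37.2)]  # 12.0 is an outlier
clean = remove_outliers(raw, *BIO_LIMITS["temperature"])
avg, std = zscore_params([v for _, v in clean])
series = resample_hourly(normalise(clean, avg, std), end_time=4, n_hours=6,
                         approach="max")
```

Stacking the nine resampled series of one patient column-wise gives the per-patient matrix with one row per hour. In practice, `zscore_params` would be fitted on the pooled values of a variable across all patients, as the text describes, rather than on a single admission.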
Equal sequence lengths are obtained by resampling the data. To do so, the total sequence time, the sampling frequency and the sampling end time need to be defined. We relied on the expertise of the medical experts involved in this research to initialise these parameters. Ideally, multiple settings and their effects should be explored, but this lies beyond the scope of this initial study. More specifically, the total sequence time is set to days and the sampling frequency to one sample per hour. This results in a total of points per variable per patient. As the end of the sampling period, we take the moment at which the first positive sample is established. If no positive sample is encountered, we choose the last available time point as the sampling end point. The beginning of the sampled period is the end time minus days. If not enough data are available for a patient (e.g. if the admission only happened days before), the data are padded with the means of the variables (zero, because of the normalisation). If the sampling frequency of a variable is higher than one sample per hour, we subsample such that the minimum, maximum or average value (depending on the variable, see Table 1) within each sample window is used. If the sampling frequency is lower, values are repeated. In the end, a time sequence of points is available for each patient, where each point has nine features. A patient's label is one if the patient has a positive blood sample and zero otherwise.

Recent research [7] handles the different monitoring frequencies by treating the resulting gaps as features. As this led to superior results in their case, future research should investigate this approach.

A Recurrent Neural Network (RNN) is a computational model designed to work with temporal features. It is similar to a feedforward neural network, with the extension that cycles are present in the network.
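To make the role of these cycles concrete, the following toy sketch runs a scalar Elman-style recurrence, h_t = tanh(w_x x_t + w_h h_{t-1} + b), over a sequence. The weights are arbitrary illustrative values (hypothetical, not trained), and the scalar form is a simplification of the vector-valued networks used in practice; the point is only that the recurrent term w_h h_{t-1} lets an input keep influencing later hidden states.

```python
import math


def rnn_states(xs, w_x, w_h, b):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        # The w_h * h term is the cycle: the previous hidden state is fed back.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


impulse = [1.0, 0.0, 0.0, 0.0]                   # one input, then silence
no_memory = rnn_states(impulse, 1.0, 0.0, 0.0)   # recurrent weight switched off
with_memory = rnn_states(impulse, 1.0, 0.5, 0.0)
```

Without the recurrent weight the impulse is forgotten after one step; with it, the impulse still shapes the hidden state several steps later, although its influence shrinks at every step.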
Through those cycles, the network can implement memory, allowing it to combine present inputs with inputs from several time steps in the past.

A commonly recognized problem in training recurrent neural networks is the vanishing gradient problem: the influence of inputs from earlier time steps on the gradient fades away exponentially. This makes it very difficult for such networks to learn dependencies that span long periods of time. Long Short-Term Memory (LSTM) networks [8] mitigate this problem by introducing the principle of gating. Conceptually, these gates allow the network to implement a small memory cell that can retain its state for longer periods of time by blocking this cell's inputs and/or outputs. In a standard LSTM, information only flows in the forward time direction. A bidirectional LSTM (BiLSTM) also captures dependencies in the reverse direction by combining two normal LSTMs that process the sequence in opposite directions. Figure 1 shows a schematic of a BiLSTM.

The basic network used for solving our problem has an input layer that accepts the time sequence as a matrix of time steps by features. The input is then passed to one BiLSTM layer that uses the tanh function as activation function to introduce non-linearity. A single output is generated: a prediction of whether or not the given time sequence originates from a patient with a positive blood culture. Because this output is a real number, a threshold must be defined to classify the patient as having a positive culture or not. We do not define a hard threshold; rather, the precision-recall curve is generated by varying this threshold.

Figure 1: Topology of an unfolded BiLSTM network with 2 input features, 2 LSTM cells and an extra hidden feedforward layer.

To train the parameters of the network, the mean-squared error is used as the loss function.
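The vanishing-gradient argument above can be checked numerically on a toy scalar recurrence (a simplification for illustration, not the network used in this paper). By the chain rule, dh_T/dh_0 is the product of the per-step factors w_h (1 - h_t^2); because 0 <= 1 - h_t^2 <= 1, the product shrinks at least as fast as |w_h|^T whenever |w_h| < 1. The function name and weights are hypothetical.

```python
import math


def hidden_state_gradient(xs, w_x, w_h, b):
    """d h_T / d h_0 for h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), h_0 = 0."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        grad *= w_h * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1}
    return grad


for steps in (1, 5, 10, 20):
    print(steps, hidden_state_gradient([0.0] * steps, 1.0, 0.5, 0.0))
```

With zero inputs every factor is exactly 0.5, so the gradient after T steps is 0.5**T; with non-zero inputs the tanh derivative only makes each factor smaller.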
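A forward pass of the network described above can be sketched without any deep-learning library. This is an illustration under assumptions, not the authors' implementation: it uses the common LSTM variant with a forget gate, small random untrained weights, nine input features, 10 hidden units per direction (the smallest value in the hyperparameter search), and a sigmoid on the single output so that the score lies in (0, 1); the paper does not state the output activation. All function names and the 48-step sequence length are hypothetical. In practice one would use a framework LSTM layer with a bidirectional option and train it with the weighted loss.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b, with matrices stored as lists of rows."""
    return [sum(w * xi for w, xi in zip(w_row, x))
            + sum(u * hi for u, hi in zip(u_row, h)) + bias
            for w_row, u_row, bias in zip(W, U, b)]


def init_lstm(n_in, n_hidden, rng):
    """Random weights for the input (i), forget (f), output (o) gates and the
    candidate cell update (g)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for gate in ("i", "f", "o", "g")}


def lstm_final_state(params, seq):
    """Run one LSTM over seq (a list of feature vectors); return the last h."""
    n_hidden = len(params["i"][2])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in seq:
        i = [sigmoid(z) for z in affine(*params["i"], x, h)]
        f = [sigmoid(z) for z in affine(*params["f"], x, h)]
        o = [sigmoid(z) for z in affine(*params["o"], x, h)]
        g = [math.tanh(z) for z in affine(*params["g"], x, h)]
        # Gates decide how much old cell state is kept and how much of the
        # candidate update is written into the memory cell.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def bilstm_score(fwd, bwd, w_out, b_out, seq):
    """Forward LSTM on seq, backward LSTM on reversed seq, concatenate both
    final states and map them to a single score."""
    features = lstm_final_state(fwd, seq) + lstm_final_state(bwd, seq[::-1])
    return sigmoid(sum(w * z for w, z in zip(w_out, features)) + b_out)


rng = random.Random(0)
n_features, n_hidden, n_steps = 9, 10, 48
fwd = init_lstm(n_features, n_hidden, rng)
bwd = init_lstm(n_features, n_hidden, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)]
seq = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_steps)]
score = bilstm_score(fwd, bwd, w_out, 0.0, seq)
```

The score is then compared against a threshold; sweeping that threshold, rather than fixing it, yields the precision-recall curve.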
Because the data are imbalanced (positives = , negatives = ), the cost function is adapted such that a wrongly classified positive patient incurs a larger error than a wrongly classified negative patient:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} w_{y,i} \left(\hat{y}_i - y_i\right)^2 \tag{2}$$

where $n$ is the number of patients in the training set, $y_i$ is the label (positive or negative culture), $\hat{y}_i$ is the prediction and $w_{y,i}$ is the class weight. This class weight is chosen such that patients with positive cultures are weighted 8 times as heavily, since there are eight times as many patients with negative cultures.

This section covers the evaluation of the network. Validation is done using the precision-recall (PR) curve, which plots precision against recall. The quality of a PR curve is summarised by the area under it, the so-called area under the curve (AUC); the larger the AUC, the better. Compared to the AUC of a receiver operating characteristic (ROC) curve, the PR AUC often provides a clearer measure of performance on imbalanced data, because it does not reward the large number of correctly classified negatives.

For evaluation, the data are split once, in a stratified manner, into a training set and a test set. On this training set, 10-fold cross-validation is performed to select the BiLSTM network with the optimal hyperparameters. The considered hyperparameters are the number of hidden nodes (three values, the smallest being 10) and the learning rate (three values). The number of epochs is capped at a maximum, but training stops early if the PR AUC on the validation set exceeds a preset value or starts to decrease. The optimal hyperparameters are chosen such that the PR AUC averaged over the validation sets is maximal. The final model is an ensemble of the 10 models trained on the training-data splits. Note that the division into the different sets is done using stratified sampling, guaranteeing that the proportion of positive samples is equal in every set.

The optimal hyperparameters are for the number of hidden nodes = , and . for the learning rate.
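The class-weighted loss can be sketched as follows, assuming labels in {0, 1}, a weight of 8 for positive patients and 1 for negative ones (the ratio stated in the text), and division by n so that the value is a mean. The function name and its default arguments are illustrative, not taken from the paper.

```python
def weighted_mse(y_true, y_pred, pos_weight=8.0, neg_weight=1.0):
    """Class-weighted mean squared error over n patients."""
    n = len(y_true)
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        w = pos_weight if y == 1 else neg_weight  # w_{y,i} depends on the label
        total += w * (y_hat - y) ** 2
    return total / n


# Missing a positive patient costs eight times as much as a false alarm.
print(weighted_mse([1], [0.0]), weighted_mse([0], [1.0]))  # prints 8.0 1.0
```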
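The threshold sweep behind the PR curve, and one common way of turning it into an area, can be sketched as follows. The area is computed as a step-wise sum of precision times recall increments (often called average precision); the text does not say which interpolation was used, so this choice is an assumption, as are the function names. Tied scores are treated as a single threshold, and at least one positive label is assumed.

```python
def pr_curve(y_true, scores):
    """(recall, precision) points obtained by lowering the threshold through
    every distinct score, from highest to lowest."""
    n_pos = sum(y_true)  # assumes at least one positive label
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Everything scoring >= threshold is predicted positive.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def pr_auc(y_true, scores):
    """Step-wise area under the PR curve (average precision)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in pr_curve(y_true, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


perfect = pr_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
constant = pr_auc([1] + [0] * 8, [0.5] * 9)  # same score for every patient
```

With this definition, a predictor that gives every patient the same score reaches a PR AUC equal to the fraction of positives, which is why a constant prediction is a natural baseline for comparison.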
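The stratified 10-fold split and the final ensemble can be sketched as follows. Dealing each class separately in round-robin fashion keeps the fraction of positives (nearly) equal across folds; averaging the scores of the fold models is an assumption, since the text does not say how the ten models are combined. The function names and the seed are hypothetical.

```python
import random
from statistics import mean


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds with (nearly) equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal this class's patients round-robin
    return folds


def ensemble_score(models, seq):
    """Average the scores of the fold models (hypothetical combination rule)."""
    return mean(model(seq) for model in models)


labels = [1] * 10 + [0] * 80  # eight negatives per positive, as in the text
folds = stratified_folds(labels, k=10)
```

In each cross-validation round, one fold serves as the validation set and the remaining nine folds as training data.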
The PR curve on the test set is shown in Figure 2 and the PR AUC is . . For comparison, two baselines were also evaluated. The first baseline always predicts the same class, resulting in a PR AUC of . . The second baseline predicts the two classes according to the class imbalance, achieving a PR AUC of . . Both baselines perform considerably worse than the BiLSTM network.

Figure 2: The PR curve on the validation set using the optimal BiLSTM.

This initial study investigated whether temporal information can be used to predict blood culture test outcomes. A BiLSTM network was built that takes as input a time sequence containing information from days, with a sampling frequency of one sample per hour. The output is a single number indicating whether there was a positive blood culture. The results suggest that exploiting temporal effects is useful in this setting.

Future work includes improving the network topology and comparing different types of networks that are able to capture temporal effects, such as (bidirectional) recurrent neural networks or gated recurrent units. A direct comparison with non-temporal methods is necessary to truly examine the advantages of exploiting the temporal information in this data.

Other open problems include investigating the influence of the chosen hyperparameters, such as the sample length and frequency used to generate the time sequences. Especially interesting is the choice of the sampling end time. In this research, we defined it as the time when the first positive blood culture was taken, or as the last available point. However, one can choose the sampling end time to be an arbitrary time before the first positive samples are present. This would yield a clear benefit in a practical setting, as the system could then act as a decision support system and early detection algorithm, suggesting to the doctor that a test be performed.

References
[1] Morrell, M., Fraser, V. J., & Kollef, M. H. (2005). Delaying the empiric treatment of Candida bloodstream infection until positive blood culture results are obtained: a potential risk factor for hospital mortality. In Antimicrobial Agents and Chemotherapy 49, pp. 3640–3645.
[2] Ho, J. C., Lee, C. H., & Ghosh, J. (2012). Imputation-enhanced prediction of septic shock in ICU patients. In Proceedings of the ACM SIGKDD Workshop on Health Informatics.
[3] Mani, S., Ozdas, A., Aliferis, C., Varol, H. A., Chen, Q., Carnevale, R., Chen, Y., Romano-Keeler, J., Nian, H., & Weitkamp, J. H. (2014). Medical decision support using machine learning for early detection of late-onset neonatal sepsis. In Journal of the American Medical Informatics Association 21(2), pp. 326–336.
[4] Kim, J., Blum, J. M., & Scott, C. D. (2010). Temporal features and kernel methods for predicting sepsis in postoperative patients.
[5] Henry, K. E., Hager, D. N., Pronovost, P. J., & Saria, S. (2015). A targeted real-time early warning score (TREWScore) for septic shock. In Science Translational Medicine 7(299), pp. 1–9.
[6] Rangel-Frausto, M. S., Pittet, D., Costigan, M., Hwang, T., Davis, C. S., & Wenzel, R. P. (1995). The natural history of the systemic inflammatory response syndrome (SIRS): a prospective study. In JAMA 273(2), pp. 117–123.
[7] Lipton, Z. C., Kale, D. C., & Wetzel, R. (2016). Directly modeling missing data in sequences with RNNs: improved classification of clinical time series. In Machine Learning for Healthcare, pp. 1–17.
[8] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. In Neural Computation 9(8), pp. 1735–1780.