Enhancing Human-Machine Teaming for Medical Prognosis Through Neural Ordinary Differential Equations (NODEs)
D. Fompeyrine, E. S. Vorm, N. Ricka, F. Rose, and G. Pellegrin

February 2021
Abstract
Machine Learning (ML) has recently been demonstrated to rival expert-level human accuracy in prediction and detection tasks in a variety of domains, including medicine. Despite these impressive findings, however, a key barrier to the full realization of ML's potential in medical prognoses is technology acceptance. Recent efforts to produce explainable AI (XAI) have made progress in improving the interpretability of some ML models, but these efforts suffer from limitations intrinsic to their design: they work best at identifying why a system fails, but do poorly at explaining when and why a model's prediction is correct. We posit that the acceptability of ML predictions in expert domains is limited by two key factors: the machine's horizon of prediction that extends beyond human capability, and the inability of machine predictions to incorporate human intuition into their models. We propose the use of a novel ML architecture, Neural Ordinary Differential Equations (NODEs), to enhance human understanding and encourage acceptability. Our approach places human cognitive intuition at the center of the algorithm design, and offers a distribution of predictions rather than single outputs. We explain how this approach may significantly improve human-machine collaboration in prediction tasks in expert domains such as medical prognoses. We propose a model and demonstrate, by expanding a concrete example from the literature, how our model advances the vision of future hybrid Human-AI systems.

Author affiliations: D. Fompeyrine, Founder and CEO, PhD in Clinical Psychology, Myndblue. E. S. Vorm, PhD in Human-Computer Interaction, Lead Evaluator, DARPA Explainable AI Program. N. Ricka, Lead Data Scientist, PhD in Mathematics, Myndblue. F. Rose, Biomedical Data Scientist, PhD in Computational Biology, Myndblue. G. Pellegrin, Machine Learning and Project Engineer, MSc in Applied Physics, Myndblue.
Keywords— Expert System, Forecast, Ordinary Differential Equations, Neural Network, Variational Approach, Explainability, Acceptability, Intuition, Usability, Human-Machine Teaming

Corresponding address: [email protected]
Introduction
Businesses and governments around the world are racing at breakneck speeds to build systems that can leverage machine learning (ML) to gain strategic advantages. While ML is quickly being introduced in new fields such as logistics [1] and agriculture [2], one domain is already an old familiar friend: healthcare. The vision of being able to accurately predict a patient's medical trajectory by integrating vast amounts of disparate data has inspired generations of computer scientists and has resulted in a variety of early applications of artificial intelligence in the form of decision aids and decision support systems. At the heart of this vision is a collaboration between human and machine; a synergistic hybrid system that affords humans near superhuman abilities to compute massive amounts of data and with it project far into the future with brilliant accuracy. This vision of human-machine teaming through the application of human-AI agents is fast becoming a reality today, thanks largely to ML. Indeed, ML algorithms have proven to be as accurate or better than expert-level predictions in various medical domains, from image classification to time-series analysis, and many others [3, 4]. But while these advances promise much, the realization of true human-machine teaming in medical prognoses may be hindered by a familiar and stubborn barrier: the lack of human trust. As early as the 1980s, thorough comparisons between computer-generated recommendations and experts had already demonstrated the critical usefulness of artificial decision aids [5], but the lack of algorithmic transparency caused significant conflicts and inevitable delays that ultimately prevented the widespread adoption of expert systems into mainstream use.
A close reading of the literature from this era reveals that these failures were caused by flaws in usability, not by algorithmic accuracy or efficiency. Modern day ML algorithms are direct descendants of these expert systems of the past, and they carry many of the same challenges. Concerns over low algorithmic transparency and the blackbox nature of algorithms such as deep learning have given rise to new interdisciplinary fields of research aimed at improving the interpretability and transparency of ML algorithms, so-called explainable artificial intelligence (XAI) [6]. Perhaps driven by lessons learned from earlier generations of clinical decision support failures, XAI has quickly been offered up as the solution, even when the problem is seldom articulated or perhaps not even fully understood. In order to better understand why explainability and transparency play a central role in the potential widespread adoption of ML, in the following section we illustrate two general scenarios that motivate their importance. Following this, we discuss why XAI alone is insufficient for achieving the goal of human-AI cooperation for medical prognoses. We then introduce Neural Ordinary Differential Equations (NODEs) as a proposed machine learning architecture for use in medical predictions and prognoses, and we illustrate how their use is intrinsically designed for maximum usability and is superior in supporting human intuition and decision making in medical prognoses.

The Utility of Explainability
Research has uncovered two predominant situations in which users of machine learning encounter usability conflicts and hence hesitate to trust their outputs [7]. The first are conflicts that arise when an ML algorithm, or the overarching intelligent system that embodies it, suddenly behaves unpredictably or erratically. These off-nominal behaviors can have widespread consequences on user confidence and trust. When systems that are ordinarily predictable and reliable suddenly behave unpredictably or give an unexpected output, questions and concerns naturally follow. Machine learning models that perform very well under one condition often display wildly different behaviors when even small changes are made. Sometimes these errors can be traced to a root cause and behaviors can be easily explained. In other cases, tracing the error is much more difficult, and oftentimes impossible. This is of immediate concern for makers of industrial-scale autonomous systems such as self-driving cars [8], but also of great concern in applications that feature machine learning in the role of decision support, as is the case in clinical decision support systems. Physicians, hospital administrators, and even government regulators, upon seeing the apparent brittleness of ML, are likely to ask themselves: "If this system is so unreliable and unpredictable, how can I ethically justify using it for my patients?" Without some measure of assurance of its reliability, low trust, and in some cases abandonment of the technology as viable, remains the most likely outcome. The argument for XAI is therefore driven by the understanding that no trust = no use. Hence much work has recently focused on improving the transparency of ML algorithms to understand why they fail.
While XAI research has resulted in a number of small breakthroughs in terms of ML development techniques, the true benefits from these efforts are limited mostly to programmers and debuggers whose goal is to build more robust and reliable systems. While important, XAI's current focus on explaining what went wrong does little to help users determine when and why an algorithmic prediction may be correct, and so does little to help users determine whether or not to use, trust, engage with, or adopt AI moving forward. To have a measurable effect in these areas, we need a prospective focus.

The second motivating scenario for XAI, therefore, is one that arises in situations where the user must make a prospective decision based on the output of the system. For instance, this might come in the form of whether or not a radiologist decides to accept or validate a diagnostic flag created by ML on a medical image, or whether or not to act on ML-based predictions that indicate an aggressive treatment regimen may be warranted in a given patient. In these situations, users are not afforded the luxury of ground truth, i.e., there is no direct way of knowing whether or not the ML algorithm is accurate because it is projecting a future state that has yet to occur. Instead, users must wrestle with whether or not a projection of future events (e.g., a prognosis) seems likely and plausible. As with any decision scenario of any importance, humans naturally seek additional information with which to inform and support their decision. This information-seeking behavior typically comes in the form of questioning [9], such as "What specific data points predict this person will make a good recovery?" or perhaps "What is the reasoning behind this suggestion to treat with an experimental drug?"
The argument for XAI to address this prospective scenario, therefore, is that the more answers to user questions a system can provide, the greater the degree of trustworthiness the system has, and the greater the likelihood that the system will be used to the extent and in the manner in which it was designed.
Limits to Prospective XAI
Unfortunately, while the majority of XAI has focused on post-hoc explanation strategies, even the few efforts that are prospective in nature are severely limited in their ability to improve usability and technology acceptance for ML, for at least three reasons. Firstly, there are practical limits to how many questions a system can answer, or how much information can be meaningfully provided to human users. Designing systems that seek to provide mappings to every component and sub-component would be cumbersome to the point of being unusable. While the rules governing a natural phenomenon's evolution lie in a constrained space that can hypothetically be modelled completely, a fully transparent XAI prediction would need to be able to master all possible future states, even in regions that do not seem plausible at first. This seems wildly unrealistic, as it would require a dataset of unattainable size to explore and understand all the possible configurations. Labyrinthine causal diagrams of high dimensional datasets are simply impractical, as they are too difficult (or impossible) for a human to interpret.

Secondly, another limitation of XAI approaches stems from how humans reason about causality. Cognitive scientists have long demonstrated that humans do not typically engage in the kind of deliberate, methodical decision making (i.e., "slow thinking," or "system two thinking") that would make use of such a robust and complete XAI system. Rather, most decision-making strategies are predominantly those that make efficient use of heuristics, or mental shortcuts (i.e., "fast thinking" or "system one thinking" [10]). Human cognition strategies have evolved largely to prioritize rapid decision making. Most decision making scenarios are those where humans make quick assessments of the information and act, rather than cautiously and systematically pore over all available data.
In other words, more data is seldom likely to result in better decisions.

Lastly, a limitation of XAI in improving the prospective prediction problem is that developers maximize the predictive accuracy of ML models, but do little to address the myriad of other human factors that play a role in how humans prognose and make decisions. The role of intuition in expert decision making has received much focus in the cognitive and neurosciences for many decades, especially in tasks such as discovery and exploration [11, 12, 13, 14, 15]. Earlier generations of artificial decision aids that attempted to mimic human decision making ran into trouble because they could not account for information originating from outside of their knowledge base. Developing expert knowledge seems an elusive target for an artificial system because, as human expertise grows, it also evolves towards more and more intuition [16] and subjectivity, and draws conclusions from information that is broader than merely the data in a patient's medical record. How patients look, how they speak, and how family members interact with them are all examples of factors that could potentially inform an expert clinician and contribute to their decision strategy. This extra-cognitive information is both difficult to characterize and difficult to model in ML. Current ML strategies do not prioritize or make use of human intuition in their predictions, and so explanation strategies are not likely to improve the likelihood of experts using them. Any cooperative vision where ML is a trusted component in a cooperative decision making system, such as a fully-integrated clinical decision support system, should feature the strengths of both components (human and machine), rather than limiting the strengths of one over the other.

Although explainability is a vital factor in affecting human trust in ML algorithms [17], it is not entirely sufficient to achieve the true vision of how ML can help humanity by improving our ability to predict the future.
To achieve true human-machine collaboration, especially in expert domains where high levels of risk are inherent, ML systems will need to do more than merely explain themselves. They will also need to adapt to, and in some cases overcome, the natural limitations of our human cognitive evolution.
Horizons of Predictability: Limits of human cognition in prediction
In the field of physics, the horizon of predictability (HOP) refers to the limit after which forecasting becomes impossible due to the exponential accumulation of errors [18, 19]. Machine learning has a similar limit to its predictive horizon, for the same reason [20, 21, 22, 19]. This limit is unbreakable, in the sense that even with perfect knowledge of the underlying dynamics of the system, it is impossible to make predictions beyond a certain point because the latent errors compound to such an extent that no certainty can be achieved. Although there are limits to how far out ML can accurately make predictions, that horizon of predictability extends far beyond the horizon of predictability of human beings (e.g., Figure 1).

Figure 1: Human and machine horizons of predictability (HOP). Machine learning is able to make an accurate prediction at a longer timescale than human beings, but humans often struggle to trust ML outputs because they are difficult to comprehend, and do not incorporate all available information, including human intuition. Our proposed architecture extends human predictive performance up to a time nearer to the theoretical machine HOP, thus enhancing human-machine teaming in medical prognoses.

Table 1: Structure of mental process for diagnosis and prediction of a human expert.

Diagnosis:
- Phenomenon: Diagnosis and decision-making reassemble mature knowledge elements in a projected support, i.e. a stable mental map to compute observed data of the phenomenon. The mental map could be considered as organized following a mechanistic model, in patterns and subpatterns, that dominates, hides, expands, reduces, secures internal coherence and organises dependencies. Subpatterns would then organize themselves dynamically around one master pattern.
- Expert limitations: The pace of internal changes in the phenomenon is most of the time anticipated in order to confirm the diagnosis and take the corresponding decision. A prognosis is delivered following the previously identified mechanistic model.

Prediction:
- Phenomenon: Prediction reassembles non-mature knowledge elements (ideas) in a projected support, i.e. a mental map, consisting in a mix of structures and representations, that is able to gather dynamically observed data of the phenomenon. The mental map could be considered projected as following a series of consecutive states of the initial phenomenon that mix subpatterns, identify the master pattern, respect internal coherence, reject noise, and project the future organizations. Prediction then would be projected around the domination of one master pattern.
- Expert limitations: The pace of changes in the organization of all subpatterns can hardly be anticipated. Avoiding noise is at the origin of partiality in the prediction.

The limits of human prediction stem mainly from our own cognitive capacity and tendencies, rather than from latent errors in the data. Limits to human cognitive capacity are well known. For example, Miller's Law, or the so-called "magic number 7 plus or minus 2", illustrates the limits of the working memory functions of human beings [23]. Humans have other well-known computational challenges as well. For example, they often struggle to comprehend abstract concepts such as single-event probabilities and non-linear distributions of data [24]. These cognitive limitations severely limit human ability to make accurate predictions, creating in essence a very near horizon of predictability.

Aside from these computational limitations, humans also suffer from cognitive flaws that limit our ability to accurately project future states. As mentioned earlier, our understanding of human evolution points to the prioritization of rapid pattern recognition, but not necessarily the ability to uncover and explore new emerging patterns. Our instinct to focus on single dominant patterns is quite useful in identifying and classifying known entities (e.g., diagnosing). But this instinct also means that our ability to predict future events is ultimately fragile, because our focus on identifying dominant patterns often means that we exclude emerging sub-patterns (what is necessary to accurately make a prognosis). The process of
diagnosing [25] requires a mechanistic model, which necessitates multiple knowledge fundamentals at different levels of maturation. This information is used to guide our exploration until we find an eventual matching pattern, and hence a diagnosis is confirmed. The primary mechanism through which diagnoses are made, however, is a "ruling out" process, which consists largely of seeking evidence to support a main hypothesis, and systematically dismissing other hypotheses that are not supported by the data.
Prognosis, on the other hand, requires us to admit the projection of ideas not yet formalized on a representational support, i.e. a mental map that has not matured into a full mechanistic model. In an attempt to separate informational uncertainty from intrinsic medical uncertainty, experts naturally attempt to anticipate future changes. Unfortunately, this projection suffers from the same confirmatory bias as mentioned before [26]. When attempting to make predictions, research demonstrates that the projection of a series of consecutive states of a phenomenon is usually ruled by a dominant master pattern, to the exclusion of other potentially informative and influential patterns [16]. This dominant pattern is heavily informed by a feeling of coherence, which is affectively charged before being consciously represented [27]. In other words, to make sense of the chaos, human beings tend to arrange available information into a form of narrative [28]. Studies consistently show that decision making is greatly influenced by how coherently a person's narrative is constructed, whether that narrative is self-chosen or presented to them in the form of "evidence" [29]. To determine a prognosis, therefore, the prognosis that seems most likely and plausible to the person is the one that arranges the data in the most coherent structure, i.e., the one that tells the most convincing story. Unfortunately, as has been demonstrated before, data do not always arrange themselves neatly into logical causal relationships that can be quickly appreciated by human beings, which sadly means that a great deal of the time, human beings have a tendency to see connections where there are none [30]. In summary, our evolutionary drive to seek dominant patterns and our affinity for arranging data into a narrative format is especially useful when it comes to diagnosing, but not especially useful for making prognoses.
In order to achieve true human-machine collaboration where experts confidently leverage the predictive power of ML, the task at hand, therefore, should not be to focus solely on creating more predictive algorithms, or on creating more explainable models. These efforts have already demonstrated their futility through previous generations of clinical decision support systems. What we need instead is to create human-machine systems that allow the uniqueness of expert human intuition to combine with the distant horizon of predictability of machine learning.

A Post-Explanation Paradigm Shift
So far we have detailed the problems that may create usability conflicts between users and machine learning algorithms. Despite highly accurate systems, these conflicts pose a significant threat to the likelihood of machine learning integrating into, and being formally adopted by, expert domains such as medicine. Because machine learning can reason and project out much further than human capabilities, there is a gap between the machine and human horizons of predictability, the limits at which accurate predictions can be made. Current XAI approaches alone will not narrow this gap because a) they are mostly retrospective in focus and do very little to explain future predictions; and b) we have human cognitive limitations (i.e., we have a tendency to focus on predominant patterns that are familiar to us and therefore ignore emerging new patterns, and we have cognitive limitations in how much data we can process).

To overcome these limitations, we need systems that are specifically designed with the human predisposition for cognitive intuition in mind, in order to enhance acceptability and encourage collaboration. A system that seeks to augment, as opposed to supplant, intuition would be one that presents its outputs in forms that are easily understandable, to the point of being practically available for humans to use as part of their reasoning. We cannot expect all users of ML to become experts in computer science in order to use ML. Nor do we want AI that presents itself as an oracle, or one that requires humans to trust it implicitly and not ask many questions. But we also must be mindful of not creating "coercive AI" or "persuasive AI" that leads human decision makers down a path of our own choosing.
So what are we to do? Rather than developing ways to extract information from intractable models, a plausible solution to encourage better human-machine collaboration with ML is to design machine learning in such a way that its mathematical forms and representations maximize human understanding and comprehension. Rather than requiring humans to understand the mechanisms underlying ML, why not develop ML in such a way that its outputs are packaged in a format that most humans can naturally understand? Much research has demonstrated that the way information is represented (i.e., how it is displayed and visualized) can determine a great deal about whether or not humans will comprehend and understand it. For instance, the statement "If a patient has COVID-19, the probability that they will have a positive result on a rapid test is 95%" is often confused with "If a patient has a positive test result, the probability that they have COVID-19 is 95%." This is an example of how causality, the direction of inference, and conditional probabilities can easily be confounded. In the example above, the first statement refers to the sensitivity of rapid COVID-19 tests (95% accurate at detecting COVID-19 [31]). The second statement, however, confounds the directional inference, mistakenly reversing the conditional probability [32]. For this reason, best practices when displaying statistical risk call for the use of frequency statements (e.g., COVID-19 tests will successfully identify 9 out of 10 people who are infected), as they are more intuitively understood by most people [33].

Another example, one salient to ML, is the reliance on probabilities to communicate uncertainty. This strategy is very problematic for a variety of reasons. First, humans do not understand probability very well unless they are specifically trained to do so [33].
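To make this first point concrete: the two COVID-19 statements above differ exactly by Bayes' rule. The sketch below uses the 95% sensitivity quoted in the text; the specificity and prevalence values are assumed purely for illustration:

```python
def posterior_given_positive(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity P(positive | disease) = 0.95 comes from the text;
# specificity and prevalence below are assumed, for illustration only.
p = posterior_given_positive(sensitivity=0.95, specificity=0.90, prevalence=0.05)
print(f"P(disease | positive) = {p:.2f}")  # ≈ 0.33, far below 0.95
```

Framed as frequencies, the same arithmetic is far more intuitive: in 1,000 patients at 5% prevalence, roughly 48 of the 50 infected test positive, but so do 95 of the 950 uninfected, so only about a third of positive results reflect true infection.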
Second, in order to fully appreciate probability, it is necessary to have information related to the base rate and frequency of occurrences (something that is seldom afforded to users). Thirdly, single-event probabilities are notoriously prone to being misunderstood by users [24]. For example, the statement "The system is 40% certain that a patient will develop PTSD" can be interpreted a number of different ways. One might interpret the statement to mean that 40% of patients with profiles like this one will develop PTSD, while another might interpret the statement to mean that the system will be able to predict future PTSD in 40% of patient records. These are all simple examples of how the way that information is represented, or its form, can either make that information better understood, or more likely to be confused. Just as numbers can be expressed in a variety of different forms, so can the outputs of ML. Our approach to using mathematical representations that capitalize on and augment human intuition is to use Neural Ordinary Differential Equations (NODEs) [34].

Neural Ordinary Differential Equations: An elegant solution to the paradox of explainability
Ordinary Differential Equations (ODEs) are well known in the fields of applied and pure mathematics. Their long history of beneficial use in physics and engineering has resulted in large, extremely well-tested, high-performing differential equation libraries. Differential equations are a tried and tested tool for modelling data that, until 2018, had been largely left out of the conversation surrounding machine learning. Their introduction as an architecture for machine learning was met with much surprise and critical acclaim from the scientific and computational communities of practice, including the best paper award at the 2018 Conference on Neural Information Processing Systems (NeurIPS, [34]).

Applied to machine learning, Neural Ordinary Differential Equations (NODEs) are algorithms that encode the dynamics of a system by learning an ordinary differential equation for function approximation, as opposed to training a conventional neural network. NODEs have several advantages over other machine learning techniques for providing clear and tractable outputs. First, they express the solution in continuous time, as opposed to models that discretize the timeline into small time steps [35, 36, 37, 38], and they can learn on irregular time-series to best match real-world data (for instance, biological measurements in the medical field). As opposed to the more common Partial Differential Equations (PDEs) [34, 39, 35], where the dynamics of a multi-variate function is modeled, NODEs only consider differentials with respect to a single parameter [40, 41, 42]. Because we are interested in future projections (i.e., predictions or prognoses), the most relevant continuous indexing parameter is time. Consequently, we posit that using NODEs with all derivatives taken with respect to the time variable will afford users a tremendous benefit in being able to comprehend and trust ML outputs for future predictions.
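Concretely, the core object of a NODE is the learned vector field f_θ, and a prediction is obtained by integrating it through time. The sketch below is purely illustrative: hand-set weights stand in for a trained network, and simple Euler steps stand in for the adaptive solvers used in practice (such as those in the companion code [43]). Note how the state can be read off at arbitrarily, and irregularly, spaced time points:

```python
import math

# Hand-set weights standing in for a trained one-hidden-unit network f_theta.
W1, B1, W2 = 1.3, 0.2, -0.8

def f_theta(z):
    """The learned vector field of the NODE: dz/dt = f_theta(z)."""
    return W2 * math.tanh(W1 * z + B1)

def integrate(z0, t_points, dt=1e-3):
    """Euler-integrate dz/dt = f_theta(z) from t = 0 and read the latent
    state off at arbitrary (possibly irregularly spaced) time points."""
    z, t_prev, out = z0, 0.0, []
    for t_target in t_points:
        n = max(1, int(round((t_target - t_prev) / dt)))
        h = (t_target - t_prev) / n
        for _ in range(n):
            z += h * f_theta(z)
        t_prev = t_target
        out.append(z)
    return out

# Irregular sampling times, as with real clinical measurements:
times = [0.1, 0.15, 0.8, 2.0, 5.0]
trajectory = integrate(z0=1.0, t_points=times)
```

Because the model is a continuous-time equation rather than a fixed step schedule, no resampling or imputation of the irregular grid is needed.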
For instance, using a NODE architecture, it is possible to let the latent information evolve for an arbitrarily long time to uncover subtle information about the future evolution of the system. This serves as a useful method of simulating future states, with time as the single differentiating factor. Similarly, NODEs can be used to invert the arrow of time, and effectively reproduce the steps they took to arrive at any observed state of the system. This effectively affords users a traceability analysis, and allows users to answer questions about the steps that led to the current observed state of the system. This process is described in the first line of Table 4.

In addition to the benefits mentioned above, NODEs also show better long-term predictions than classical recurrent neural network (RNN) architectures. Published works [34, 39] (and the companion code [43]) have demonstrated for the first time the use of NODEs in a latent ODE architecture to model patients' trajectories from physiological data recorded in an intensive care unit (ICU). In this work, NODEs show better sequence reconstruction and state-of-the-art accuracy when predicting in-hospital mortality or risk of re-admission compared to other deep learning architectures [39, 44]. More broadly, a system based on NODEs could be especially well suited to predicting future states in noisy dynamic systems, such as those commonly found in clinical decision support.

We summarize in Table 2 the main improvements between existing explainability frameworks and our proposed approach using NODEs.
| Framework | Objective | Time-series analysis technology | Accuracy | Acceptability mechanism |
|---|---|---|---|---|
| Post-explainability | Provide a narrative on how the system is evolving | ODE (Latent ODE) | Greater than state of the art | Query-based Human/AI interactions |
| Explainability | Explain where the system is evolving to | RNN (VAE RNN) | Baseline (state of the art) | Fixed set of explanations provided |

Table 2: Key changes between the explainability framework and the post-explainability framework presented here.
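The time-inversion property discussed above, integrating the same learned vector field with a negative step to retrace how an observed state arose, can be sketched in a few lines. As before, the weights are hand-set stand-ins for a trained f_θ, and plain Euler steps are used for clarity:

```python
import math

W1, B1, W2 = 1.3, 0.2, -0.8  # hand-set stand-ins for trained weights

def f_theta(z):
    """Learned vector field: dz/dt = f_theta(z)."""
    return W2 * math.tanh(W1 * z + B1)

def euler(z, t0, t1, steps=20000):
    """Integrate dz/dt = f_theta(z) from t0 to t1; t1 < t0 runs time backwards."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        z += h * f_theta(z)
    return z

z0 = 1.0
zT = euler(z0, 0.0, 3.0)       # project three time units into the future
z0_back = euler(zT, 3.0, 0.0)  # invert the arrow of time from the projection
print(abs(z0_back - z0))       # small residual: the trajectory is traceable
```

Running the dynamics backwards recovers (up to solver error) the initial state, which is what lets a user interrogate the steps that led to the current observed state.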
Properties of the latent space modelled by latent ODEs
To illustrate and summarize the basic function of NODEs, we briefly discuss latent ODEs and their technical structure. Latent ODEs are used to model the evolution of a process across a time series based on data from an initial latent state. While RNNs are the go-to solution for modeling regularly sampled time-series data, they do poorly when presented with irregular or inconsistent data, such as the data commonly found in a patient's medical record. To achieve success with traditional RNNs when dealing with inconsistent or irregular time-series data, many workaround steps in data preprocessing are necessary [34]. These steps result in fairly accurate predictions, but without any of the information (particularly the time-related information) necessary to understand the latent variables underlying the prediction. Latent ODEs, on the other hand, are superior to traditional RNNs because they are flexible with respect to incomplete or inconsistent data, and are especially capable of modeling the future across time. The resulting latent trajectory should contain information that is useful both for the main classification task and for the reconstruction, thus showing the important features of the original time-series. Accordingly, this architecture is intrinsically suitable for irregularly sampled data, as is common in healthcare data, whereas existing approaches must add timestamps to RNNs in an artificial way.

Roughly speaking, the latent ODE system takes measurements (x_0, ..., x_t) as input, and translates them into a latent internal representation (z_0, ..., z_t) with internal dynamics following a learned equation

dz/dt = f_θ(z, ε),

where f_θ is expressed by a deep neural network taking into account the noise ε involved in the system. The whole latent trajectory depends only on z_0, and can be extrapolated for an arbitrarily long time by integrating the differential equation, giving extrapolations (z_0, ..., z_N) for any N.
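This encode-integrate-decode pipeline, including the final decoding step back to measurement space, can be sketched end to end. Everything below is a deliberately toy stand-in: the "encoder" is a decaying weighted average rather than the backwards-in-time ODE-RNN used in [39], the vector field and decoder are hand-set one-dimensional maps, and the noise term ε is dropped:

```python
import math

def encode(xs):
    """Toy encoder: map observations (x_0, ..., x_t) to an initial latent
    state z_0 (a real latent ODE runs an ODE-RNN backwards in time)."""
    weights = [0.5 ** (len(xs) - 1 - i) for i in range(len(xs))]  # recent points count more
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

def f_theta(z):
    """Learned latent dynamics dz/dt = f_theta(z); weights hand-set here."""
    return -0.6 * math.tanh(0.9 * z)

def decode(z):
    """Toy decoder: map the latent state back to measurement space."""
    return 2.0 * z + 0.1

def latent_ode_predict(xs, horizon, dt=1e-2):
    """Encode, integrate the latent ODE forward, decode at every step."""
    z, out = encode(xs), []
    for _ in range(int(round(horizon / dt))):
        z += dt * f_theta(z)
        out.append(decode(z))
    return out

# Reconstruct and extrapolate from three observed measurements:
xhat = latent_ode_predict([1.2, 0.9, 1.1], horizon=4.0)
```

Training would adjust the encoder, f_theta, and decoder jointly so that the decoded trajectory x̂ tracks the observed x; here the point is only the shape of the pipeline: one initial state z_0 determines the entire continuous trajectory.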
Finally, the latent trajectory is decoded into an approximation (x̂_0, ..., x̂_N) of the original measurements. The encoder, decoder, and differential equation weights are trained so that x̂ is as close as possible to the real trajectory x. It was previously observed in the literature that latent ODEs achieve results that are comparable to or better than state-of-the-art performance on real-life datasets (on the MIMIC-II dataset, see table 6 in [39], reproduced here as Table 3, and on the MIMIC-III dataset see [44]). We refer the reader to [34, 39] for more extensive details on latent ODEs in machine learning.

[Table 3 layout: rows RNN-VAE and Latent ODE; columns Survival accuracy, AUC, Extrapolation MSE and Reconstruction MSE (×10⁻³); numeric values as in table 6 of [39].]

Table 3: Results of classification and reconstruction for the MIMIC-II ICU dataset. The task is to predict the survival of ICU patients, measured by the survival accuracy and AUC. The goodness of the reconstruction is measured by the mean square error (MSE) on normalized features.

Latent trajectories have been demonstrated on simulated datasets in the literature (see the examples on the spiral dataset in [34]). These analyses, however, need to be interpreted within a certain context. First, simulated examples are usually low dimensional, so generating a visually compelling latent space does not necessarily imply that it will be possible for real-life scenarios where data is noisy, incomplete, irregularly sampled, etc. Second, the tasks studied for these simulated examples are usually restricted to reconstruction. It is thus impossible to question whether the latent trajectory actually supports a prediction, for instance by enforcing acceptability of an automatically generated prognosis through showing the possible futures of the patient and the important changes that will occur during the projected trajectory.

The analysis made in [39] focused on the neural network's ability to predict patient mortality. Our main objective, however, is to show that using NODEs to model a system's evolution leverages additional information about a patient's trajectory, which contributes to human-level understandability and therefore improves the acceptability of the output (assuming the output is accurate and deserves to be accepted), while not compromising predictive power compared to state-of-the-art approaches.

In the next section, we demonstrate how a NODE architecture can be applied to provide enhanced acceptability and usability. To do this, we demonstrate our approach on a real-life medical dataset (MIMIC-II), and analyze to what extent the architecture proposed by [34, 39] serves our purposes. The MIMIC-II dataset is a public dataset with de-identified clinical care data for over 58,000 hospital admission records, collected in a single tertiary teaching hospital from 2001 to 2008.
In this work, we focus on the mortality task: predicting whether the patient will die in the hospital. We also study the reconstruction trajectories from [39] in the case of ICU patients, in order to demonstrate how these data dramatically improve the usability of machine learning predictions.
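The mortality task amounts to a lightweight classifier head applied to the latent state. The following is a hypothetical sketch with random, untrained weights (in [39] the read-out is trained jointly with the encoder on MIMIC data):

```python
import numpy as np

# Sketch of a mortality read-out on latent states: as more of the stay is
# observed (1/5, 2/5, ... of the 48 h window), the encoder yields an
# updated latent state, and a logistic head turns it into a risk score.
# Weights are illustrative stand-ins, not trained on MIMIC-II.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mortality_prob(z, w, b):
    """Logistic classifier head applied to latent state(s) z."""
    return sigmoid(z @ w + b)

rng = np.random.default_rng(3)
w, b = rng.standard_normal(4), -0.5

# One latent state per observed duration of the 48-hour window.
latents = 0.5 * rng.standard_normal((5, 4))
probs = mortality_prob(latents, w, b)
print(probs.shape)  # one in-hospital mortality estimate per duration
```

The point of the design is that the same latent trajectory drives both the reconstruction curves and this risk score, so the two views stay consistent.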
Offering a probabilistic trajectory helps trigger human capabilities
Due to the probabilistic nature of NODEs, our proposed architecture can afford not only a robust and tractable future patient trajectory, but a distribution of trajectories, each representing a potential future of the patient, and each with an associated probability (for an illustration, see [39, Figures 4 and 5]). In practice, this distribution of trajectories would afford the user a great deal of insight. First, the user would be able to easily observe the machine's horizon of predictability as the point at which curves are too divergent to extract a coherent behaviour. Traditional RNNs provide no such indication as to when a prediction becomes untrustworthy, and systems thus must be programmed to rely on training parameters to set a fixed horizon of events independent of the system's dynamics. NODEs, on the other hand, display their horizon of predictability intrinsically and, most importantly, intuitively. Trajectories that lie before this horizon, therefore, are ones the user can have greater confidence in, and each can be analyzed individually.

It is in the analysis of these potential scenarios that human intuition may combine with the predictive power of ML, and in doing so, flourish. By providing a timeline with a broad array of potential futures, users can explore these potentials in a way that maximizes and prioritizes their expertise AND intuition, because they are now afforded access to multiple potential emerging patterns instead of having a single dominant pattern presented to them. The form that NODEs take, therefore, affords and encourages a kind of "information foraging" [9, 45], where newly emerging patterns are allowed to be considered rather than ruled out preemptively. NODE trajectories also allow for the exploration of various narratives, arranging and displaying data in a format that natively makes sense to human experts.
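The horizon of predictability described above can be made operational: sample a fan of trajectories and flag the first time their spread exceeds a tolerance. The toy unstable dynamics and the divergence threshold below are illustrative assumptions, not part of any trained model:

```python
import numpy as np

# Sketch: estimate the horizon of predictability as the first step at
# which sampled trajectories diverge beyond a tolerance. Toy unstable
# dynamics dz/dt = z stand in for the learned latent ODE.

def sample_trajectories(z0, n_samples=50, n_steps=200, dt=0.05, noise=0.01):
    """Integrate a fan of trajectories from perturbed initial states."""
    z = z0 + noise * np.random.default_rng(1).standard_normal(n_samples)
    out = np.empty((n_steps, n_samples))
    for i in range(n_steps):
        out[i] = z
        z = z + dt * z          # Euler step of dz/dt = z
    return out

def horizon_of_predictability(trajs, tol=0.1):
    """First step at which the cross-trajectory spread exceeds tol."""
    spread = trajs.std(axis=1)
    over = np.nonzero(spread > tol)[0]
    return int(over[0]) if over.size else len(trajs)

trajs = sample_trajectories(z0=0.3)
h = horizon_of_predictability(trajs)
print(h)  # beyond this step, the fan is too divergent to trust
```

Everything before step `h` can be read as a coherent set of scenarios; everything after it should be presented to the user as beyond the machine's horizon.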
The strengths of NODEs illustrated here (a distribution of trajectories along a timeline that affords easy access to the predictive boundaries of the machine while allowing multiple potential future scenarios to be explored) emerge as a natural side effect of the architecture. In other words, in the same way that conveying risk through frequency statements naturally enables people to grasp statistical information and make better decisions, so too do NODE architectures in machine learning. Table 4 summarizes the main advantages of our hybrid human-AI approach with respect to classical RNN approaches.

To demonstrate these claims, we ran an analysis of the MIMIC-II dataset, which is also studied in [39]. This dataset is quite complex, full of real-life data that is at times noisy, sometimes incomplete, and has much inter-patient variability. These conditions represent many of the characteristics that can harm the predictive power and learning of an ML algorithm, and make interpretation even more difficult. By analyzing this data, we aim to demonstrate to the reader the many inherent strengths of the NODE architecture. A discussion of our findings follows our methodology below.

A real-life example: ICU patient trajectories
Our first step was to analyse a slightly modified version of the algorithm trained in [39]. For our study, training time was extended; better and more variable reconstructions were triggered by reducing the noise parameter, thus limiting the power of the encoder and increasing the internal ODE weights. Two samples (patients) with the two possible outcomes (survival and death) were randomly chosen to study the predictions.

Figure 2 represents a 48-hour window of time. Each box represents a different measurement category (i.e., inspired O2, heart rate, etc.). The original measurements (blue dots) are displayed. As the reader can see, some measurements are sparser than others. This represents the various inaccuracies and inconsistencies of the data. For example, the arterial blood pressure for patient B is only measured during the second day. Using these measurements, multiple reconstructions, corresponding to the duration of data fed to the algorithm, are conducted for each feature: the solid lines correspond to the reconstructions where original data is known, whereas the dotted lines correspond to an extrapolated estimation of the patient's future. Multiple dotted colored lines indicate multiple potential futures.

As we can see, for parts with completely missing observations, the algorithm tends to estimate their values, knowing all the other measured features and the characteristics of the dataset. These curves are not flat, so this does not correspond to an imputation to the mean. Note also that, for these missing features, the algorithm refines the shape of the estimation curve as information grows. Some short-scale variations are not well reconstructed by the latent ODE, which favors a smooth curve, such as the heart-rate peaks around the 24th hour for patient B. This shows a direction in which to improve current NODE models.

Take, for example, patient A.
If we look closely at the Glasgow Coma Scale (GCS), we can see that the model initially projects an improvement, as seen by the orange and green curves, which correspond to 1/5th and 2/5ths of our 48-hour window (roughly the first 20 hours). We see, however, that these projections quickly become accurate once enough data is aggregated. The red line projects what might be considered a median outcome, and has a slightly more distant horizon of predictability, while the purple and finally the brown lines show little or no improvement on the Glasgow Coma Scale. The brown line remains solid throughout the 48-hour window, indicating high predictive validity and confidence. Because we have overlaid the actual measurements of this patient, we can see that the actual GCS data never improved throughout this 48-hour window, thus validating the brown line's prediction.

Figure 2: Example results from the latent NODE model on the ICU data. Medical observations from two patients, A and B, are shown in blue for four normalized features: fraction of inspired oxygen (FiO2), Glasgow Coma Scale, heart rate, and arterial pressure of oxygen. The different curves show the reconstruction and extrapolation predicted by the model when given a duration of 1/5th, 2/5ths, ... of the data from the beginning of the time series. We see that for some features, like the Glasgow Coma Scale or the FiO2, reconstructions for patient B tend to follow the tendency of the real feature. The mortality-prediction plot shows the model's prediction of in-hospital mortality. Given the 48 hours of data, the system is able to predict the death of patient A and the survival of patient B several days later.

The last subplot of Figure 2 represents the mortality prediction: for each duration of given data, a latent trajectory is drawn by the system, from which a simple neural classifier computes mortality chances. For patient A, the mortality prediction stays low at the beginning but rises quickly and ultimately crosses the threshold just before the 48-hour mark, indicating that the system predicts patient A will not survive. We might infer from the data that this prognosis is due to the stable and deteriorated coma state. Although the data shown here are not sufficient alone for a full cause-of-death analysis, this simple example demonstrates the ease with which one can access these data and quickly make sense of the underlying connections and their subsequent effects on the predicted outcome.

For another example, let us examine patient B. The mortality prediction for patient B remains low and even decreases after 24 hours, showing the model's confidence in its prediction. It is important to note that the mortality predictions are being made as new data arrives across this 48-hour period. Along those 48 hours are modelled events (i.e., reconstructions) that originate directly from the ODE architecture. Both the reconstructions and the mortality predictions demonstrated here illustrate that the latent ODE architecture can handle complex, sparse, real-life data in a manner that is human-understandable and intuitive, while remaining highly accurate.

Conclusion
In the previous section, we illustrated that the NODE architecture is capable of reconstructing a real-life dataset, and we demonstrated how an expert might explore the data and produce a narrative in accordance with the NODE's results and predictions. When attempting to make a prognosis, the ability to visualize in detail the system's future evolution aids the expert in generating a narrative about the system. The ease of use afforded by NODEs, combined with multiple future projections, provides simple but powerful insights that extend the human horizon of predictability beyond normal limits, and does so in a way that minimizes bias and maximizes trust in the data.

The latent ODE architecture affords the user the possibility of adding new hypothetical measurements in the future, and enables the user to ask the system for the most probable paths that lead there. For instance, the expert might choose a specific curve that leads to a region of the feature space close to a dangerous situation, and make the following query: "if the system crosses the frontier of the dangerous region, what happens next, and how did the system evolve to end up here?". This is depicted as the blue line in Figure 3.

Figure 3: Compared to RNNs, an ODE-based approach produces smooth curves which can be evaluated at any point of the trajectory. Once measured real datapoints are fed to the machine, estimations of its extrapolation can be produced (green curve). The expert user can ask queries such as "what is the trajectory of a patient who gets close to a dangerous situation (blue dot)?". The latent ODE then constructs a family of most likely trajectories that pass through this newly added point. This extra information helps the experts construct a narrative that is compatible with their knowledge, reinforcing their decision process, or explore the complex family of possible trajectories by asking more specific queries. The new narrative is based on the query's answer: "Given the prediction, the patient should have reached an intermediary state near the danger line in approximately 2 days."

As you can see in this figure, the system that ends up close to the dangerous region at the end of the third day does not cross the frontier of this region, so the user may be confident that this situation is not a concern.
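The query mechanism of Figure 3 can be sketched as filtering a sampled family of trajectories for those consistent with a user-pinned future state. The toy dynamics and the quantile-based "danger line" below are assumptions for illustration; a trained latent ODE would instead sample from its learned posterior:

```python
import numpy as np

# Sketch of the interactive query: the user pins a hypothetical future
# state (t_query, z_query), and the system returns the family of sampled
# trajectories passing near it, which can then be inspected backward and
# forward in time. Toy diverging dynamics stand in for the trained model.

rng = np.random.default_rng(2)

def rollout(z0, n_steps=100, dt=0.05):
    """Euler rollout of the toy dynamics dz/dt = 0.2 * z."""
    zs = np.empty(n_steps)
    z = z0
    for i in range(n_steps):
        zs[i] = z
        z = z + dt * 0.2 * z
    return zs

# Family of candidate futures from perturbed initial states.
trajs = np.stack([rollout(1.0 + 0.3 * rng.standard_normal())
                  for _ in range(200)])

# Query: which trajectories come close to the "danger line" at step 60?
t_query = 60
z_query = float(np.quantile(trajs[:, t_query], 0.9))  # assumed danger level
eps = 0.1
family = trajs[np.abs(trajs[:, t_query] - z_query) < eps]
print(family.shape[0], "trajectories consistent with the queried state")
```

Each retained trajectory carries both the path that led to the queried state and its continuation, which is exactly the material an expert needs to build or reject a narrative.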
Towards augmented decision-making
Hybrid human-AI predictive systems could lead the way to a new generation of augmented decision-making solutions, and provide radical advances in readiness and response to still-unpredictable events. Predictive agents built on intrinsically explainable ML architectures such as NODEs would offer objectivity when the rational foundations of a prediction are still disputed, and would provide dynamical representations that facilitate early adoption of humanly unpredictable scenarios while respecting the expert's world view. Ultimately, these proposed predictive agents could allow users to re-code nonrepresentational knowledge (i.e., intuition) into a dynamic representation of the data, thus leveraging the modeling power and advantages of differential equations.

A concrete application in the medical field could be the prediction of risks in Post-Traumatic Stress Disorder (PTSD). PTSD is very difficult to model and project into future states early, soon after a traumatic event (a period often referred to as the "blind zone"). Because many symptoms of PTSD are difficult to detect and measure (suffering, malaise, depression, suicidal thoughts, etc.), creating models that make accurate prognoses is exceedingly difficult. In our proposed system built on a NODE architecture, a predictive agent could encode the subject's evolution patterns into a NODE, run a simulation of the possible future threats, and then provide a concise description of the estimated risks. Thanks to a more accurate prediction, the physician could decide more quickly, during the medical check-up, whether or not to include the subject in a specific process of care.
Perspectives
Achieving the vision of humans leveraging the predictive power of ML in a synergistic team relationship will take much planning and work, much of which goes beyond mere model development. The first step should be to select and build models that are intrinsically understandable to human beings, and that naturally afford enhanced insight and support better decision making. We have demonstrated here one such system, built upon a robust and time-tested mathematical approach to modelling generative processes over time. Our demonstration, we hope, illustrates how the use of NODEs in medical prognoses is superior to any attempt at explaining black-box models, and also supports users' natural intuition as a consequence of its design.
Non-interpretable features:
                   NN (VAE-RNN)                 NODE (latent ODE)
(1) Narratives     brute-force analysis         interactive query; long-term projection of new narratives
(2) States         stepped curve; no access     smooth curve; access to intermediary states
                   to intermediary states
(3) Horizon        short term; post-hoc XAI     long term; intuitive prediction (retrospective and prospective)

Table 4: (1) RNNs deliver multiple future trajectories which require brute-force analysis. NODEs offer interactive reconstruction of the past and future of the query point to intuit the plausibility of a new narrative. (2) RNNs make discrete predictions that do not allow the user to access intermediary states. NODEs help the understanding of intermediary states to rebuild a relevant narrative. (3) RNNs' horizon of predictability is short due to discrete predictions. NODEs give long-term and highly accurate information without the need for explainability. An interactive agent will develop plausible narratives that support expert intuition to enhance the capacity to prevent disruptive changes.

When the features are not intuitive for a given expert to interpret, it can be difficult to generate a narrative merely from extrapolated data. Doing so is the equivalent of attempting to convince someone of a different opinion or perspective, an effort with a low historical likelihood of success. To help the construction of narratives and the interactions with a predictive agent, an interesting direction would be to extract additional variables of interest that are distinct from the measured features. For instance, in the case of ICU patients, it could be interesting to have machine learning algorithms that extract from the latent trajectory the occurrence of specific events in different systems (respiratory, cardiac, etc.), categorized by physicians, to help support narratives. The mechanistic representation of expert decision making, even if incomplete, could contain, for example, mutually exclusive symptoms appearing in a time frame defined by physical bounds, i.e., critical event intensity.

An additional algorithm could be used to extract information from the intractable latent space to augment the basic information with expert knowledge. For this step, we could use either classical or powerful deep learning algorithms, since the extracted data are not yet subject to explainability requirements. Doing so could be framed as adding prior basic knowledge to the equation's resolution.
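One concrete form of such prior knowledge is an energy constraint enforced through a Hamiltonian structure [37]: instead of learning dz/dt directly, one learns a scalar H and derives the dynamics from it, so trajectories conserve H by construction. A toy sketch, with the harmonic-oscillator Hamiltonian standing in for a learned network:

```python
# Sketch of Hamiltonian-constrained dynamics: the flow is derived from a
# scalar H(q, p) via q' = dH/dp, p' = -dH/dq, so it conserves H. Here H
# is the toy harmonic oscillator (not a learned network), and a
# symplectic (semi-implicit) Euler integrator keeps the energy drift
# bounded over long rollouts.

def H(q, p):
    return 0.5 * (q**2 + p**2)

def symplectic_step(q, p, dt=0.01):
    p = p - dt * q      # p' = -dH/dq for this H
    q = q + dt * p      # q' =  dH/dp
    return q, p

q, p = 1.0, 0.0
e0 = H(q, p)
for _ in range(10_000):
    q, p = symplectic_step(q, p)
drift = abs(H(q, p) - e0)
print(drift)  # energy drift stays small under the symplectic scheme
```

In a learned version, `H` would be a neural network and the derivatives would come from automatic differentiation; the conservation property is what the constraint buys, independent of the network weights.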
In the field of physics, to model systems that conserve their total energy, it is possible to add an energy constraint to the NODE to ensure that trajectories satisfy this condition. In technical terms, this is enforced using a Hamiltonian structure on the NODE, and the corresponding machine learning algorithm is studied in depth in [37]. This sensitivity to prior knowledge needs to be investigated, in particular for real-world datasets.

To confirm the usefulness of these additionally extracted variables, it would then be necessary to conduct trials: the recommendation system would be tested by experts with and without this add-on and evaluated for machine-prediction acceptability. This is our proposed plan for the future.

In conclusion, we have demonstrated the potential utility of using a NODE architecture on real-life data to enhance and improve human prognosis in medical decisions. We have illustrated the benefits, both intrinsic and designed, of such an architecture, and have discussed why these benefits are likely to enhance human-machine teaming and technology acceptance of ML in expert domains such as medicine.
This work was supported by the US Office of Naval Research Global: ONRG Research Grant [N62909-20-1-2076].

References

[1] Mason Marks. "Robots in space: Sharing our world with autonomous delivery vehicles". In: Available at SSRN 3347466 (2019).
[2] Konstantinos G Liakos et al. "Machine learning in agriculture: A review". In:
Sensors
Nature
BMC Medical Informatics and Decision Making
Information Fusion
58 (2020), pp. 82–115.
[7] E. S. Vorm. "Assessing demand for transparency in intelligent systems using machine learning". In: (2018).
[8] Franz Wotawa et al. "Quality assurance methodologies for automated driving". In: e & i Elektrotechnik und Informationstechnik
Psychological review
Thinking, Fast and Slow. Macmillan, 2011.
[11] Jerad H Moxley et al. "The role of intuition and deliberative thinking in experts' superior tactical decision-making". In:
Cognition
Legal and Criminological Psychology
Journal of management
Nature Reviews Neuroscience
Journal of Decision Systems (2020), pp. 1–19.
[16] Robert Earl Patterson and Robert G Eggleston. "Human–Machine Synergism in High-Level Cognitive Functioning: The Human Component". In:
IEEE Transactions on Emerging Topics in Computational Intelligence
Cognitive Systems Engineering: The Future for a Changing World (2017), pp. 137–163.
[18] Steven Strogatz.
Infinite Powers: How Calculus Reveals the Secrets of the Universe. Houghton Mifflin Harcourt, 2019.
[19] Eric Woillez and Freddy Bouchet. "Instantons for the destabilization of the inner Solar System". In:
Physical Review Letters
Nature
Icarus
Chaotic Evolution of the Solar System. 1992.
[23] George Armitage Miller. "The magical number seven, plus or minus two: Some limits on our capacity for processing information". In:
Psychological review
63 (1956), pp. 81–97.
[24] Gerd Gigerenzer and Adrian Edwards. "Simple tools for understanding risks: from innumeracy to insight". In:
Bmj
Medical Decision Making
Journal of graduatemedical education
PloS one
Social Science & Medicine
Journal of Personality and Social Psychology
Cognition
SARS-CoV-2 Reference Panel Comparative Data. 2020. url: .
[32] Ulrich Hoffrage et al. Communicating Statistical Information. 2000.
[33] Marilyn M Schapira, Ann B Nattinger, and Colleen A McHorney. "Frequency or probability? A qualitative study of risk communication formats used in health care". In:
Medical Decision Making
[34] Ricky TQ Chen et al. "Neural ordinary differential equations". In: arXiv preprint arXiv:1806.07366 (2018).
[35] Xuechen Li et al. "Scalable gradients for stochastic differential equations". In:
International Conference on Artificial Intelligence and Statistics. PMLR. 2020, pp. 3870–3882.
[36] Robert Strauss. "Augmenting Neural Differential Equations to Model Unknown Dynamical Systems with Incomplete State Information". In: arXiv preprint arXiv:2008.08226 (2020).
[37] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. "Symplectic ode-net: Learning Hamiltonian dynamics with control". In: arXiv preprint arXiv:1909.12077 (2019).
[38] Lingkai Kong, Jimeng Sun, and Chao Zhang. "SDE-net: Equipping deep neural networks with uncertainty estimates". In: arXiv preprint arXiv:2008.10546 (2020).
[39] Yulia Rubanova, Ricky TQ Chen, and David Duvenaud. "Latent ODEs for irregularly-sampled time series". In: arXiv preprint arXiv:1907.03907 (2019).
[40] Kaiqi Yang et al. "Self-Supervised Learning and Prediction of Microstructure Evolution with Recurrent Neural Networks". In: arXiv preprint arXiv:2008.07658 (2020).
[41] Zichao Long et al. "PDE-Net: Learning PDEs from data". In:
International Conference on Machine Learning. PMLR. 2018, pp. 3208–3216.
[42] Zichao Long, Yiping Lu, and Bin Dong. "PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network". In:
Journal of Computational Physics
399 (2019), p. 108925.
[43] Yulia Rubanova. latent_ode. https://github.com/YuliaRubanova/latent_ode. 2019.
[44] Sebastiano Barbieri et al. "Benchmarking deep learning architectures for predicting readmission to the ICU and describing patients-at-risk". In: Scientific Reports