Learning Bundled Care Opportunities from Electronic Medical Records
You Chen, Abel N. Kho, David Liebovitz, Catherine Ivory, Sarah Osmundson, Jiang Bian, Bradley A. Malin
Title:
Learning Bundled Care Opportunities from Electronic Medical Records
Authors:
You Chen, Abel N. Kho, David Liebovitz, Catherine Ivory, Sarah Osmundson, Jiang Bian, and Bradley A. Malin
Author Affiliations:
Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN
Institute for Public Health and Medicine, Northwestern University, Chicago, IL
School of Medicine, University of Chicago, Chicago, IL
School of Nursing, Vanderbilt University, Nashville, TN
Dept. of Obstetrics and Gynecology, School of Medicine, Vanderbilt University, Nashville, TN
Dept. of Health Outcomes and Policy, University of Florida, Gainesville, FL
Dept. of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN
Dept. of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, TN
To Whom Correspondence Should Be Addressed:
You Chen, Ph.D. Assistant Professor Department of Biomedical Informatics Vanderbilt University Nashville, TN 37203 USA Email: [email protected] Phone: +1 615 343 1939 Fax: +1 615 322 0502
Keywords:
Workflow; Clinical phenotyping; Network analysis; Data mining; Electronic medical record; Topic modeling; Bundled care.
Highlights
(1) A data-driven approach to learn bundled care opportunities from electronic medical records.
(2) A strategy to infer an association network of phenotypic and workflow patterns.
(3) An evaluation of bundled care opportunities with administrative and clinical experts.
ABSTRACT
Objective:
The traditional fee-for-service approach to healthcare leads to the management of a patient's conditions in an independent manner, inducing various negative consequences. It is recognized that a bundled care approach to healthcare, one that manages a collection of health conditions together, may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled manner. Thus, we investigated how a data-driven approach could be applied to automatically learn potential bundles and evaluate their plausibility.
Methods:
To accomplish this research, we designed a data-driven framework to infer clusters of health conditions, which we refer to as phenotypic patterns, via their shared clinical workflows, which we refer to as coordinating care patterns, from the data inherent in electronic medical records (EMRs). We applied the framework to approximately 16,500 inpatient stays from a large medical center. The plausibility of the inferred health condition clusters for bundled care was assessed through a survey of a panel of five experts, whose responses were analyzed via an analysis of variance (ANOVA) at the 95% confidence level. Furthermore, the face validity of the inferred health condition clusters was confirmed by evidence in the published literature.
Results:
The framework inferred four condition clusters: 1) fetal abnormalities, 2) late pregnancies, 3) prostate problems, and 4) chronic diseases (with congestive heart failure featuring prominently). Each cluster had supporting evidence in the literature and was deemed plausible for bundled care via ANOVA on the survey responses at the 95% confidence level.
Conclusions:
The findings suggest that data from EMRs can provide a basis for discovering new directions in bundled care. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
Background and Significance
Under a fee-for-service healthcare model, each of a patient's conditions is managed relatively independently [1-2]. This approach to care can lead to several problems, including delays in (or failure to deliver) service, treatment redundancies, and increased cost. In turn, these problems can lead to declines in quality, patient satisfaction, and cost effectiveness. It is anticipated that a shift from fee-for-service to pay-for-value has the potential to resolve (or at least reduce the severity of) many of these problems [3-5]. To realize this alternative vision, healthcare organizations (HCOs) are migrating towards a bundled care model, which is a middle ground between fee-for-service and capitation reimbursement and aims to account for the interplay between various health conditions, rather than focus on each in isolation [6-7]. There are numerous challenges in realizing bundled care. Two of the more pressing are: 1) determining which health conditions would be appropriate for such care models and 2) minimizing the cost of refining current healthcare systems to support bundled care. While HCOs already manage the complicated health needs of their patients (e.g., considering a set of health conditions together), such routines often arise in an ad hoc fashion and are neither formalized nor validated. As such, there is an opportunity to design a data-driven approach to learn collections of health conditions that are managed together (e.g., a set of health conditions that share similar workflows) and, thus, might be ripe for bundling. The data-driven approach may be further beneficial because, if models are based on current healthcare systems, HCOs could minimize the cost of implementing newly established management routines or formalizing existing ones.
There is growing evidence that data derived from electronic medical records (EMRs) can be leveraged to discover associations between health problems [8-14], infer clinical phenomena (e.g., phenotypic patterns [15-18]), and model workflows (e.g., hospital care patterns [19-23]). More recently, it has been shown that the relationship between health problems and workflows can be learned for specific phenomena, such as congestive heart failure [24]. In this paper, we build on such observations and introduce an automated learning framework to discover more general collections of health conditions that share similar workflows according to EMR system utilization. We hypothesize that such collections of health conditions could be bundled and managed together based on their shared workflows. We accomplish this goal by applying a generative topic modeling strategy to infer phenotypic patterns from the data inherent in EMRs and workflow patterns from the utilization of EMRs by employees of an HCO. We then apply a community detection algorithm to infer clusters of phenotypic patterns that share workflow patterns. We evaluate this framework with four months of inpatient data (over 16,000 inpatient stays) from Northwestern Memorial Hospital (NMH) and assess the plausibility of the inferred clusters of phenotypic patterns for bundled care via a survey of administrative and clinical experts. We further corroborate the correlations among phenotypic patterns within each cluster via disease associations published in the literature.
Research Design and Methods
The general framework is composed of four parts: i) a workflow pattern inference module, which is based on the electronically documented actions of EMR users; ii) a phenotypic pattern inference module, which is based on patient-specific clinical phenomena indicated in an EMR (e.g., diagnosis codes); iii) an association module, which infers clusters of phenotypic patterns according to their shared workflow patterns; and iv) an evaluation module, which includes surveys of administrative and clinical experts to determine whether the inferred clusters of phenotypic patterns could be managed in a bundled manner. We begin with a high-level overview of the models and then proceed with a deeper dive into each component. The general relationships between the workflow module, the phenotypic module, and the association module are depicted in Figure 1.
Figure 1.
A high-level architecture for discovering associations between clinical workflows and phenotypes, which are further leveraged to infer clusters of phenotypes. (Legend: u = EMR user, h = EMR patient, d = diagnosis, p = phenotypic pattern, and w = workflow pattern)
Let $H = \{h_1, h_2, \dots, h_n\}$ be the set of patients, $S = \{s_1, s_2, \dots, s_n\}$ be the set of action sequences (issued by the users of EMRs), and $D = \{d_1, d_2, \dots, d_m\}$ be the set of clinical phenomena (e.g., diagnosis codes). Each patient $h_i$ in $H$ is defined as a sequence $s_i$ in $S$ (as shown in Figure 1a) and a collection of clinical phenomena in $D$ (as shown in Figure 1e). The set of workflow patterns $W = \{w_1, w_2, \dots, w_k\}$ (Figure 1b-left) and the set of phenotypic patterns $P = \{p_1, p_2, \dots, p_q\}$ (Figure 1d-right) are learned from $S$ and $D$, respectively. Specifically, a workflow pattern $w_i$ is defined as a probability distribution over a set of subsequences $S' = \{s'_1, s'_2, \dots, s'_l\}$ (Figure 1b-left), where each $s'_j$ is a subsequence that occurs frequently across the sequences in $S$. Similarly, a phenotypic pattern $p_j$ is a probability distribution over a set of diagnoses (e.g., see the three patterns in Figure 1d-right). A patient is explained by their affinity to workflow and phenotypic patterns through $\theta_W$ (Figure 1b-right) and $\theta_P$ (Figure 1d-left), respectively. For instance, as shown in Figure 1b-right, a workflow pattern can explain the sequence of a patient with a probability of 0.8. The strength of association between workflow and phenotypic patterns is summarized in a matrix $A_{|W| \times |P|}$ and is rooted in the common set of patients they explain. The collections of phenotypic patterns are inferred via the associations between the phenotypic patterns and the workflow patterns (as shown in Figure 1c). To focus on the information learned from the EMR, in this study, we rely on existing inference algorithms to learn workflow and phenotypic patterns.
To orient the reader, we briefly review the algorithms, but refer the reader to [25] and [26] for the details.
Workflow Pattern Inference Algorithm
The workflow inference algorithm [25] infers workflow topics $W = \{w_1, w_2, \dots, w_k\}$, where each topic refers to a workflow pattern, from the subsequences $S'$ via a modified Latent Dirichlet Allocation (LDA) algorithm [27-28]. Briefly, the set of workflow topics $W$ is inferred from a matrix $M_{|H| \times |S'|}$. Here, $M_{|H| \times |S'|}(i, j)$ corresponds to the number of times a subsequence $s'_j$ was included within a patient sequence $s_i$. $\theta_W$ corresponds to a matrix that specifies the likelihoods that the patients' sequences in $S$ are explained by the topics in $W$. Figure 1b-right depicts examples of the probabilities of patients' sequences being explained by workflow topics. It is often the case that the fitness of an LDA model, and thus the number of topics $k$, is determined through an information theoretic measure, such as perplexity [27-28]. However, in our situation, we aim to determine the value that maximizes the separation between the workflow topics, which is more semantically meaningful for workflows. As such, we set $k$ by minimizing the average covariance between the workflow topics (details in [25-26]).
Phenotypic Pattern Modeling Algorithm
The phenotypic pattern inference algorithm [26] infers phenotypic topics $P = \{p_1, p_2, \dots, p_q\}$, also via a modified LDA method. Briefly, the set of phenotypic topics $P$ is inferred from a matrix $M_{|H| \times |D|}$. Here, $M_{|H| \times |D|}(i, j)$ corresponds to the number of times that diagnosis code $d_j$ was assigned to patient $h_i$. Figure 1d-right depicts examples of three phenotypes as topics with two diagnoses. $\theta_P$ corresponds to a matrix that specifies the likelihoods that patients are explained by the topics in $P$. Figure 1d-left depicts examples of the probabilities of patients' conditions being explained by phenotypic topics. We use the same strategy invoked for workflow topics to set the number of phenotypic topics, which we denote as $q$ [25-26].
Measuring Associations
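The association measure used in this section, defined formally in Equation (1), is a cosine similarity between the patient-probability vector of a workflow topic and that of a phenotypic topic. A minimal sketch under stated assumptions follows: the function names, the row-per-patient matrix layout, the zero-vector convention, and the toy probabilities are all hypothetical and are not taken from the study.

```python
import math


def cosine_association(workflow_vec, phenotype_vec):
    """Cosine similarity of two equal-length patient-probability vectors."""
    if len(workflow_vec) != len(phenotype_vec):
        raise ValueError("both vectors must cover the same patients")
    dot = sum(a * b for a, b in zip(workflow_vec, phenotype_vec))
    norm_w = math.sqrt(sum(a * a for a in workflow_vec))
    norm_p = math.sqrt(sum(b * b for b in phenotype_vec))
    if norm_w == 0.0 or norm_p == 0.0:
        # Assumption: a topic that explains no patient is treated as unassociated.
        return 0.0
    return dot / (norm_w * norm_p)


def association_matrix(patient_by_workflow, patient_by_phenotype):
    """Return a |W| x |P| matrix of cosine associations.

    Each input has one row per patient and one column per topic.
    """
    workflow_cols = list(zip(*patient_by_workflow))
    phenotype_cols = list(zip(*patient_by_phenotype))
    return [[cosine_association(w, p) for p in phenotype_cols]
            for w in workflow_cols]


# Hypothetical toy data: 4 patients, 2 workflow topics, 2 phenotypic topics.
patient_by_workflow = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.0, 1.0]]
patient_by_phenotype = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]]
assoc = association_matrix(patient_by_workflow, patient_by_phenotype)
```

In this toy example the first workflow topic and the first phenotypic topic explain largely the same patients, so `assoc[0][0]` is larger than `assoc[0][1]`; entries of such a matrix could serve as edge weights for a community detection step.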
Each workflow and phenotypic topic is leveraged to explain the patients (Figure 1b and Figure 1d). We use the patients they explain in common to measure their association. Specifically, the degree of association between a workflow topic $w_i$ and a phenotypic topic $p_j$ is measured as the cosine similarity of their respective vectors:
$$\mathrm{Assoc}(w_i, p_j) = \frac{\theta_W(i) \cdot \theta_P(j)}{\lVert \theta_W(i) \rVert \, \lVert \theta_P(j) \rVert}, \tag{1}$$
where $\theta_W(i)$ is a vector specifying the probabilities with which workflow topic $w_i$ explains each of the patients. For instance, as shown in Figure 1c, the first workflow topic explains four patients with a vector of ⟨patient, probability⟩ pairs. Similarly, $\theta_P(j)$ is a vector specifying the probabilities with which phenotypic topic $p_j$ explains each of the patients. For instance, as shown in Figure 1d, the first phenotypic topic explains four patients with a vector of ⟨patient, probability⟩ pairs in which the second patient has a probability of 0.9 and the third and fourth patients have a probability of 0. According to Equation (1), the association between the first workflow topic and the first phenotypic topic, $\mathrm{Assoc}(w_1, p_1)$, is 0.7891.
Our goal is to infer clusters of phenotypic patterns that share similar workflow patterns. We suspect that each cluster would be a candidate for bundled care and management under similar workflows. Thus, we use a community detection algorithm [29] to infer clusters of phenotypic topics via their associations with workflow topics. We guided the algorithm using a heuristic based on the optimization of modularity [30], which is efficient (in running time) and effective (in quality of communities) for weighted and undirected graphs. Clusters with high modularity have dense connectivity of phenotypic and workflow topics within clusters and sparse connectivity across clusters.
Plausibility Evaluation for Bundled Care
We investigated whether the clusters of phenotypic topics can appropriately be managed in a bundled way. To do so, we designed a survey that consisted of paired ⟨inferred, random⟩ clusters of phenotypic topics, which we asked administrative and clinical experts to review for appropriateness in terms of bundled care. Each inferred phenotypic topic was represented as a list of the diagnoses (e.g., diagnostic codes) that exhibit the largest probabilities for that topic. A random cluster of phenotypic topics was generated by randomly selecting phenotypic topics, with the number selected set equal to the number of phenotypic topics within the inferred cluster. Each randomly selected phenotypic topic was also represented as a list of diagnoses. Each random cluster was fixed to contain the same number of diagnoses as its inferred counterpart.
Survey questions and analysis.
We recruited a set of experts to answer questions of the following form: "To what extent do you believe health conditions in the displayed group can be managed in a bundled way?" For each question, we provided five candidate answers (Not At All Likely, Slightly Likely, Moderately Likely, Very Likely, and Completely Likely). To perform hypothesis testing, we converted these answers into values in the range 0 to 1 (i.e., Not = 0, Slightly = 0.25, Moderately = 0.5, Very = 0.75, and Completely = 1). Further details about the survey design, including the specific questions, are provided in online Appendix A.
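The conversion of survey answers to numeric scores described above can be sketched as follows. This is an illustrative sketch only: the function names and the sample responses are hypothetical, and the value 0.5 for Moderately Likely is taken from the even spacing of the five-point scale.

```python
# Numeric value assigned to each survey answer on the five-point scale.
LIKERT_SCORES = {
    "Not At All Likely": 0.0,
    "Slightly Likely": 0.25,
    "Moderately Likely": 0.5,  # midpoint implied by the even spacing
    "Very Likely": 0.75,
    "Completely Likely": 1.0,
}


def score_answers(answers):
    """Convert a list of Likert answers into numeric scores in [0, 1]."""
    return [LIKERT_SCORES[answer] for answer in answers]


def paired_differences(inferred_answers, random_answers):
    """Per-expert score difference (inferred minus random) for one cluster pair."""
    if len(inferred_answers) != len(random_answers):
        raise ValueError("each expert must rate both clusters")
    inferred = score_answers(inferred_answers)
    random_scores = score_answers(random_answers)
    return [i - r for i, r in zip(inferred, random_scores)]


# Hypothetical responses from three experts for one inferred/random pair.
diffs = paired_differences(
    ["Very Likely", "Completely Likely", "Moderately Likely"],
    ["Slightly Likely", "Not At All Likely", "Moderately Likely"],
)
```

A positive difference means that an expert rated the inferred cluster as more plausible for bundled care than its random counterpart; such paired scores are the kind of input a subsequent significance test would consume.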
Given the responses, we conducted a series of hypothesis tests, each of which can be summarized as: "For a given pair of ⟨inferred, random⟩ clusters of health conditions, experts can distinguish the inferred from the random in terms of bundled care". We applied a linear regression model and analysis of variance (ANOVA) [31] to test the significance of the difference at the 95% confidence level. To achieve a power of 0.8 with a standard deviation of 0.4 in the difference in experts' scores for inferred and random clusters, the required sample size was five respondents. As such, we invited five knowledgeable professionals with a diverse array of expertise (e.g., HCO management, internal medicine, and emergency care). Each participant was emailed an introduction to the goals of the research and a link to access a REDCap survey [32]. The response rate was 100% because all respondents agreed to participate in the survey beforehand.
Experimental Design
Dataset
This study focused on four months of inpatient EMR data from Northwestern Memorial Hospital (NMH), which was collected in 2015. In this data, an event corresponds to an instance of a chart access, each of which is associated with the user's job title and a user-designated reason for the access. There were 1,138,317 total access events distributed over 16,569 patient processes. These events were generated by users with 144 job titles. Additionally, each patient was associated with a set of ICD-9 codes assigned after discharge from the hospital. The total number of unique ICD-9 codes for this set of patients is 4,543. In recognition of the fact that multiple ICD-9 codes may describe the same clinical phenomena [33-34], various phenotyping investigations (e.g., [35-36]) have adopted alternative vocabularies for the secondary analysis of EMRs, such as the Phenome-Wide Association Study (PheWAS) vocabulary [15]. PheWAS codes correspond to groups of ICD-9 codes that more closely match clinical or genetic understandings of diseases and reduce variability in identifying diseases. Based on this expectation, we translated each patient's ICD-9 codes to PheWAS codes, which compressed the space into 1,374 unique PheWAS codes.
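The translation step can be illustrated with a small sketch. The mapping below uses placeholder identifiers rather than real ICD-9 or PheWAS codes, the function name is hypothetical, and skipping unmapped codes is an assumption made only for illustration; the study's actual mapping table and handling of unmapped codes are not described here.

```python
from collections import Counter


def translate_to_phewas(icd9_codes, icd9_to_phewas):
    """Count PheWAS codes for one patient from that patient's ICD-9 codes.

    Assumption: ICD-9 codes without a mapping are skipped.
    """
    counts = Counter()
    for code in icd9_codes:
        phewas = icd9_to_phewas.get(code)
        if phewas is not None:
            counts[phewas] += 1
    return counts


# Hypothetical mapping with placeholder identifiers (not real codes).
icd9_to_phewas = {
    "icd_a1": "phe_A",
    "icd_a2": "phe_A",  # two ICD-9 codes collapse into one PheWAS code
    "icd_b1": "phe_B",
}
patient_counts = translate_to_phewas(
    ["icd_a1", "icd_a2", "icd_b1", "icd_unmapped"], icd9_to_phewas
)
```

Because several ICD-9 codes can map to one PheWAS code, the per-patient counts shrink to a smaller vocabulary, which is the compression effect described in the text.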
Number of Topics
The numbers of workflow and phenotypic topics were determined by minimizing the similarity between topics over the range of 15 to 35 possible topics. The similarity was minimized for each set of topics when k = q = 25. At this point, the workflow and phenotypic topics exhibited minimum similarities of 0.003 and 0.031, respectively.
Results
To provide context for the findings, we begin with a depiction of the learned workflow and phenotypic topics. Next, we report on the clusters of phenotypic topics and the extent to which they were deemed plausible for bundled care by experts and had face validity according to evidence in the published literature.
Learned Workflow and Phenotypic Topics
Recall that each workflow and phenotypic topic is expressed as a probability distribution over terms (i.e., subsequences of actions and PheWAS codes, respectively). To illustrate each topic succinctly, we depict the 10 terms with the largest probabilities. This cutoff was selected because the terms beyond this point had a negligible contribution to the probability mass for the affiliated topic. Specifically, these terms contributed probabilities that were smaller than 0.01. We use ProM [37], a software tool for process mining, to visualize workflow topics as a directed graph. The graphs for all 25 workflow topics and their corresponding top 10 subsequences are provided in Appendix B. To orient the reader to workflow topics, we list workflow topic 15, consisting of two loops, as an example in Figure 2.
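The top-term summary described above can be sketched as follows. This is an illustrative sketch: the function names and the four-term toy topic are hypothetical, and a real topic would be a distribution over far more terms.

```python
def top_terms(topic, n=10):
    """Return the n (term, probability) pairs with the largest probabilities."""
    ranked = sorted(topic.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def largest_excluded_probability(topic, n=10):
    """Largest probability among terms outside the top n (0.0 if none)."""
    ranked = sorted(topic.values(), reverse=True)
    return ranked[n] if len(ranked) > n else 0.0


# Hypothetical topic over four terms; real topics cover many more.
topic = {"term_a": 0.5, "term_b": 0.3, "term_c": 0.15, "term_d": 0.05}
summary = top_terms(topic, n=2)
cutoff_check = largest_excluded_probability(topic, n=2)
```

Checking the largest excluded probability is one way to confirm that a cutoff such as 10 terms discards only terms with negligible probability mass.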
Figure 2.
The directed graph of an echocardiography-based prenatal workflow. This visualization is based on the 10 subsequences with the largest probabilities for the workflow topic. Note that, in this diagram, a pair of + symbols represents the beginning and ending of a loop.
The first loop resides between a Radiology Technologist (RAD) and an NMH Physician Hospitalist invoking Computerized Physician Order Entry (CPOE). Based on consultation with the experts, this loop was deemed likely to be associated with the process of an echocardiography, where a physician approves the quality of a radiological report or participates in the peer review process of a report. The second loop resides between an NMH Physician CPOE and a Patient Care Staff Nurse-Lactation. This loop is likely associated with a primary physician and staff nurse responsible for an inpatient's obstetric care.
Each phenotypic topic is expressed as a probability distribution over approximately 1,300 PheWAS codes. The top 10 PheWAS codes, and their associated probabilities, for each phenotypic topic are provided in Appendix C. Our author experts provided informal labels to summarize each of the phenotypic topics. To better understand the phenotypic topics, we provide an example, the topic labeled childbirth, in Table 1. This topic shows that interventions are required for complicated pregnancies and delivery-associated problems (e.g., short gestation, endocrine and metabolic disturbances of fetus or newborn).
Table 1.
The top 10 PheWAS codes in a phenotypic topic that are the most indicative of childbirth.
PheWAS Code | Description | Probability
Clusters of Phenotypic Topics and Associated Workflow Topics
The modularity of the clusters of phenotypic topics was 0.62 in a [0,1] range. This indicates that the phenotypic topics and workflow topics within each cluster exhibited strong associations, while they exhibited weak associations between clusters. Figure 3 depicts the four inferred clusters of phenotypic topics (shown in blue, green, purple and red) and their affiliated workflow topics.
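To make the modularity score concrete, the following sketch computes the standard weighted modularity of a given partition of an undirected graph. It is a hedged illustration, not the community detection procedure used in the study: the function name, the edge-list representation, and the toy graph of phenotypic (p) and workflow (w) topics are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Weighted modularity of a partition of an undirected graph.

    edges: list of (node_u, node_v, weight) with each edge listed once.
    community_of: dict mapping every node to a community label.
    """
    total_weight = sum(weight for _, _, weight in edges)
    if total_weight == 0:
        raise ValueError("graph has no edge weight")
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, weight in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += weight
        degree[cv] += weight
        if cu == cv:
            internal[cu] += weight
    score = 0.0
    for community in degree:
        fraction_inside = internal[community] / total_weight
        expected = (degree[community] / (2.0 * total_weight)) ** 2
        score += fraction_inside - expected
    return score


# Hypothetical toy graph: two phenotypic topics and two workflow topics.
edges = [("p1", "w1", 0.9), ("p2", "w2", 0.8), ("p1", "w2", 0.1)]
partition = {"p1": "A", "w1": "A", "p2": "B", "w2": "B"}
q_score = modularity(edges, partition)
```

A partition that keeps strongly associated topics together and separates weakly associated ones scores higher than one that places every topic in a single community, which scores zero under this definition.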
Figure 3.
Four clusters of phenotypic topics inferred via their shared workflow topics. The edges represent the association strength between phenotypic and workflow topics: the wider the edge, the stronger the association. (Legend: p = phenotypic topic and w = workflow topic)
The cluster shown in green is associated with fetal abnormality; the cluster in red is associated with late pregnancy; the cluster in purple is associated with prostate problems and their corresponding complications; and the cluster in blue is complex, but is associated with various chronic problems, including cerebrovascular disease, coronary atherosclerosis, congestive heart failure (CHF), diabetes, and kidney failure. To gain a deeper understanding of the inferred clusters and their associated workflow patterns, let us consider the fetal abnormality cluster as an example. The health conditions affiliated with this cluster are the following phenotypic topics: Birth trauma, Fetal abnormality, and Mother complicating pregnancy. These were associated with care patterns that incorporated the following workflow topics: i) interactions between physicians and staff nurses; ii) interactions between physicians, anesthesiologists, advanced practice clinicians, and pharmacists; iii) interactions between physicians and unit secretaries; iv) interactions between physicians, anesthesiologists, and staff nurses; and v) interactions between physicians, radiologists, and unit secretaries. This suggests that pregnancy complications (e.g., fetal abnormality and mother complicating pregnancy) are managed in a bundled way, requiring communication between various clinicians, including obstetricians, anesthesiologists, radiologists, nurses, pharmacists, and administrative assistants.
Plausibility of Phenotypic Clusters for Bundled Care
The results of the plausibility survey are provided in Table 2. It can be seen that the experts always scored the inferred clusters as more plausible for bundled care. For all four clusters, the respondents' scores were statistically significantly higher than those for the randomized cluster (at the 95% confidence level). This suggests that the phenotypic clusters associated with fetal abnormality, late pregnancy, prostate problems, and CHF are plausible candidates for bundled care. Additionally, to orient the reader to each phenotypic cluster, we provide each of them, along with an informal summary from our author experts, in Table 2.
Table 2.
Survey results for the knowledgeable experts (n = 5) regarding the plausibility of the inference that phenotypic patterns in each cluster can be managed in a bundled manner. Each cluster of phenotypic patterns is represented by a list of PheWAS codes and a brief summary. Each row reports the distance between the Likert score of the inferred phenotypic cluster and its randomized counterpart. Note that a positive distance indicates the inferred cluster received a higher Likert score. (* = statistical significance at the 0.05 level)
Cluster | PheWAS Codes and Descriptions | Likert Score Difference | P-value

Cluster: Fetal abnormality
Informal Description: Fetal abnormality could lead to complicating pregnancy and additional delivery problems (e.g., fetal distress), which requires interventions such as birth trauma service.
PheWAS Codes and Descriptions:
649 Other conditions of the mother complicating pregnancy
652 Malposition and malpresentation of fetus or obstruction
654 Abnormality pelvic soft tissues & organs complicating pregnancy
658 Problems associated with amniotic cavity and membranes
659 Indications for care or intervention related to labor and delivery NEC
663 Umbilical cord complications during labor and delivery
665 Obstetrical/birth trauma
P-value: on the order of $10^{-8}$*

Cluster: Late pregnancy
Informal Description: Late pregnancy might suggest a larger size infant requiring intervention (e.g., use of suction or forceps), which may cause temporary skull injuries.
PheWAS Codes and Descriptions:
637 Short gestation; low birth weight; and fetal growth retardation
645 Late pregnancy and failed induction
649 Other conditions of the mother complicating pregnancy
656 Other perinatal conditions
656.1 Perinatal jaundice/isoimmunization
665 Obstetrical/birth trauma
819 Skull fracture and other intracranial injury
1010 Other tests
1008 Internal injury to organs
P-value: on the order of $10^{-8}$*

Cluster: Prostate problems
Informal Description: Anemia and hypogonadism are often considered complications of prostate cancer and can lead to bone loss. When the thyroid does not produce a sufficient amount of hormones, it can cause lower esophageal sphincter dysfunction. This allows stomach contents and digestive juices to enter the esophagus, which may lead to gastroesophageal reflux disease.
PheWAS Codes and Descriptions:
244 Hypothyroidism
272.1 Hyperlipidemia
276.14 Hypopotassemia
285.9 Anemia NOS
327.32 Obstructive sleep apnea
401.1 Essential hypertension
495 Asthma
530.11 Gastroesophageal Reflux Disease
600 Hyperplasia of prostate
740.1 Osteoarthritis; localized
P-value: on the order of $10^{-4}$*

Cluster: Chronic diseases (congestive heart failure)
Informal Description: Cerebrovascular disease and coronary atherosclerosis are the most common causes of congestive heart failure (CHF); smoking and diabetes are associated with all three diseases. Depression is associated with coronary disease. Liver test abnormalities and some renal failure may be seen in CHF.
P-value: on the order of $10^{-5}$*

Face Validity of Phenotypic Clusters according to Evidence in the Published Literature
While the phenotypic clusters were deemed plausible for bundled care from a care management perspective, the survey did not investigate whether the health conditions within such clusters were clinically related. If appropriateness could be confirmed from both a care process and a clinical perspective, we anticipate that the identified clusters of phenotypic patterns would be better received by HCO administrators. Towards this goal, we performed an investigation into evidence for the inferred clusters of phenotypic patterns. Notably, we found evidence for each cluster. A summary of the evidence is shown in Table 3. For instance, within the prostate problems cluster, bone loss is known to be caused by hypogonadism following prostate cancer [38]. Furthermore, acid reflux is known to be affiliated with thyroid problems [39].
Table 3.
Evidence from the literature supporting the face validity of phenotypic patterns within each inferred cluster.
Cluster | Evidence of Associations in the Literature

Fetal abnormality cluster:
• Birth trauma associated with large fetal size and fetal distress [40]
• Trauma in pregnancy [41-42]

Late pregnancy cluster:
• Late pregnancy and childbirth [43]
• Mode of delivery in nulliparous women has an effect on neonatal intracranial injuries [44]
• Most fetal injuries occur in late pregnancy [45]

Prostate problems cluster:
• Bone loss following hypogonadism with prostate cancer [38]
• The acid reflux-thyroid connection [39]
• Anemia associated with advanced prostate cancer [46]

Chronic diseases (congestive heart failure) cluster:
• Tobacco and alcohol usage associated with increased risk of mortality from cerebrovascular disease and liver disease [47]
• Thrombotic complications in heart failure [48-49]
• Associations among diabetes, kidney disease, and cardiovascular disease [50]

Discussion
Main Findings
This pilot study has several notable implications. First, the findings suggest that HCOs have an opportunity to leverage inferred phenotypic patterns, along with their affiliated workflow patterns, to identify (or refine) bundled care models. For instance, for patients near childbirth, their conditions may be affiliated with the phenotypic topics Birth trauma, Fetal abnormality, and Mother complicating pregnancy, which were associated with care patterns incorporating workflow topics that involve physicians and care staff nurses, anesthesiologists, and radiologists. Second, the associations between workflow and phenotypic topics provide an opportunity for HCOs to manage patients and conduct resource allocation more efficiently. For instance, if the volume of patients associated with childbirth increases, HCOs could dedicate a larger amount of resources to these workflow topics.
Limitations and Next Steps
Despite the merits of our findings, there are several limitations that we wish to highlight for future investigations. First, this study focused on the development of a methodology to infer general collections of phenotypic patterns that share similar workflow patterns according to EMR system utilization. However, we did not validate the clinical meanings (e.g., semantic contexts) of the inferred phenotypes or their workflow patterns. If such phenotypic and workflow patterns are to be used in care management applications, their semantic meanings will require further interpretation by administrative experts. Second, while all four phenotypic clusters were deemed plausible for bundled care, several associations within the congestive heart failure cluster were not clear to the experts. Specifically, there are a number of reasons why renal failure and liver diseases might co-occur in a patient, such that this cluster may be too general in nature. In this respect, our study indicates that health conditions have the potential to be managed in a bundled way, but what precisely should be managed is an open question and will require guidance by process management experts. Third, we acknowledge that this is only a pilot, which focuses on a case study of four months of data from one HCO. As such, we found only four clusters of phenotypic patterns, which were shown to be suggestive for bundled care. It is unknown whether the proposed strategy is directly generalizable to other healthcare systems to find more clusters of health conditions that could be good candidates for management in a bundled manner.
Conclusions
In this paper, we introduced a data-driven framework to mine EMRs for clusters of health conditions that might benefit from bundled care.
We evaluated the approach with four months of inpatient data from a large hospital system and found four clusters of phenotypic patterns, which were deemed plausible for bundled care by knowledgeable experts and supported by evidence in the literature. We anticipate working with process management and clinical experts to assess the workflow patterns affiliated with each inferred cluster and to determine how these patterns can be incorporated together to provide bundled care. Furthermore, we plan to test the performance and efficacy of the framework in other healthcare systems with more data.
Appendices
Appendix A: Survey questions.
Appendix B: Workflow topics, each of which is represented by its top 10 subsequences and visualized as a process graph via Business Process Model and Notation (BPMN) in ProM.
Appendix C: Phenotypic topics, each of which is represented by its top 10 PheWAS codes.
Funding
This research was supported, in part, by the National Institutes of Health under grants R00LM011933 and R01LM010685.
Competing Interests Statement
The authors have no competing interests to declare.
Contributors
YC performed the data collection and analysis, methods design, hypotheses design, experiments design, evaluation and interpretation of the experiments, and writing of the manuscript. AK and DL performed data collection, evaluation and interpretation of the experiments and writing of the manuscript. CI, SO, and JB performed evaluations of inferred clusters of phenotypes, and writing of the manuscript. BM performed the data collection and analysis, evaluation and interpretation of the experiments, and writing of the manuscript.
Acknowledgements
The authors thank Daniel Schneider and Prasanth Nannapaneni for gathering and supplying the de-identified data from Northwestern Memorial Hospital analyzed in this investigation.
References
1. Mulley AG. The global role of health care delivery science: learning from variation to build health systems that avoid waste and harm. J Gen Intern Med. 2013; 28:646-653.
2. Peterson MW. Emerging developments in postsecondary organization theory and research: fragmentation or integration. Educational Researcher. 1985; 14:5-12.
3. Stange KC. The problem of fragmentation and the need for integrative solutions. Ann Fam Med. 2009;
4. Zismer DK. The promise of the brand: how health system leaders are guiding the transition to health services integration. J Healthc Manag. 2013; 58(1):12-14.
5. Committee on Quality of Health Care in America. Crossing the Quality Chasm: A New Health System for the 21st Century. National Academy Press. 2001.
6. McDonald KM, Schultz E, Albin L, et al. Care Coordination Atlas Version 4 (Prepared by Stanford University under subcontract to American Institutes for Research on Contract No. HHSA290-2010-00005I). AHRQ Publication No. 14-0037-EF. Rockville, MD: Agency for Healthcare Research and Quality. 2014.
7. Berry LL, Beckham D. Team-based care at Mayo Clinic: a model for ACOs. J Healthc Manag. 2014; 59(1):9-13.
8. Emmert-Streib F, Tripathi S, Simoes R, et al. The human disease network: opportunities for classification, diagnosis and prediction of disorders and disease genes. Syst Biomed. 2013; 1:15-22.
9. Janjić V, Pržulj N. Biological function through network topology: a survey of the human diseasome. Brief Funct Genomics. 2012; 11:522-532.
10. Linghu B, Snitkin ES, Hu Z, et al. Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol. 2009; 10:R91.
11. Schriml L, Arze C, et al. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012; 40:D940-946.
12. Žitnik M, Janjić V, Larminie C, et al. Discovering disease-disease associations by fusing systems-level molecular data. Sci Rep. 2013; 3:3202.
13. Schulam P, Wigley F, Saria S. Clustering longitudinal clinical marker trajectories from electronic health data: applications to phenotyping and endotype discovery. Proceedings of the AAAI Annual Conference. 2015; 2956-2964.
14. Pivovarov R, Perotte AJ, Grave E, Angiolillo J, Wiggins CH, Elhadad N. Learning probabilistic phenotypes from heterogeneous EHR data. Journal of Biomedical Informatics. 2015; 58:156-165.
15. Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010; 26:1205-1210.
16. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc. 2013; 20:117-121.
17. Ho J, Ghosh J, Steinhubl SR, Stewart WF, Denny JC, Malin BA, Sun J. Limestone: high-throughput candidate phenotype generation via tensor factorization. Journal of Biomedical Informatics. 2014; 52:199-211.
18. Zheng T, Xie W, Xu L, Zhang Y, Yang G, Chen Y. A Machine Learning-based Framework to Identify Type 2 Diabetes through Electronic Health Records. International Journal of Medical Informatics. 2017; 97:120-127.
19. Chen Y, Lorenzi NM, Sandberg WS, Wolgast K, Malin BA. Identifying collaborative care teams through electronic medical record utilization patterns. Journal of the American Medical Informatics Association.
20. DeFlitch C, Geeting G, Paz HL. Reinventing Emergency Department Flow via Healthcare Delivery Science. HERD. 2015; 8:105-115.
21. Hribar MR, Brown SR, Reznick LG, et al. Secondary use of EHR timestamp data: validation and application for workflow optimization. AMIA Annu Symp. 2015; 1909-1917.
22. Huang ZX, Dong W, Wang F, et al. Medical inpatient journey modeling and clustering: a Bayesian hidden Markov model based approach. AMIA Annu Symp. 2015; 649-658.
23. Merrill JA, Sheehan BM, Carley KM, Stetson PD. Transition networks in a cohort of patients with congestive heart failure. Appl Clin Inform. 2015; 6:548-564.
24. Yan C, Chen Y, Li B, et al. Learning Clinical Workflows to Identify Subgroups of Heart Failure Patients. AMIA Annu Symp. 2016; in press.
25. Chen Y, Xie W, Gunter C, et al. Inferring clinical workflow efficiency via electronic medical record utilization. AMIA Annu Symp. 2015; 416-425.
26. Chen Y, Ghosh J, Bejan CA, et al. Building bridges across electronic health record systems through inferred phenotypic topics. J Biomed Inform. 2015; 55:482-493.
27. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. Journal of Machine Learning Research. 2003; 3:993-1022.
28. Newman D, Asuncion A, Smyth P, et al. Distributed inference for latent Dirichlet allocation. In: Proceedings of Neural Information Processing Systems. 2007; 1-9.
29. Blondel VD, Guillaume JL, Lambiotte R, et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008; 10:P1000.
30. Newman MEJ. Modularity and community structure in networks. Proc Nat Acad Sci USA. 2006; 103(23):8577-8696.
31. Hoaglin DC, Welsch RE. The hat matrix in regression and ANOVA. The American Statistician. 1978; 32(1):17-22.
32. Harris PA, Taylor R, Thielke R, et al. Research electronic data capture (REDCap) - a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009; 42:377-381.
33. Chan M, Lim PL, Chow A, et al. Surveillance for Clostridium difficile infection: ICD-9 coding has poor sensitivity compared to laboratory diagnosis in site patients. PLoS One. 2001; 6:e15603.
34. Deych EB, Waterman AD, Yan Y, Nilasena DS, et al. Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Med Care. 2005; 43:480-485.
35. Ludvigsson JF, Pathak J, Murphy S. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J Am Med Inform Assoc. 2013; 20:e306-310.
36. Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013; 20:e206-11.
37. van der Aalst WMP, van Dongen BF, et al. ProM 4.0: Comprehensive support for real process analysis. In: Kleijn J, Yakovlev A, eds. Application and Theory of Petri Nets and Other Models of Concurrency. 2007; 4546:484-494.
38. Mittan D, Lee S, Miller E, et al. Bone loss following hypogonadism in men with prostate cancer treated with GnRH analogs. J Clin Endocrinol Metab. 2002; 87:3656-3661.
39. Hamdan A, Jabbour J, Dowli A, Dahouk EI, Azar ST. Prevalence of Laryngopharyngeal reflux disease in patients diagnosed with hypothyroidism. Acta Endocrinologica. 2012; 8(2):239-248.
40. Gordon M, Rich H, Deutschberger J, et al. The immediate and long-term outcome of obstetric birth trauma: I. Brachial plexus paralysis. Am J Obst Gynecol. 1973; 117:51-56.
41. Jovanovic-Peterson L, Peterson CM. Dietary manipulation as a primary treatment strategy for pregnancies complicated by diabetes. J Am Coll Nutr. 1990;
42. Murphy NJ, Quinlan JD. Trauma in pregnancy: assessment, management, and prevention. Am Fam Physician. 2014; 90:717-722.
43. Gülmezoglu AM, Crowther CA, Middleton P, et al. Induction of labour for improving birth outcomes for women at or beyond term. Cochrane Database Syst Rev. 2012; 4:CD004945.
44. Towner D, Castro MA, Eby-Wilkens E, et al. Effect of mode of delivery in nulliparous women on neonatal intracranial injury. N Engl J Med. 1999; 341:1709-1714.
45. Rothenberger D. Blunt maternal trauma: a review of 103 cases. J Trauma. 1987; 18(3):173-179.
46. Nalesnik JG, Mysliwiec AG, Canby-Hagino E. Anemia in men with advanced prostate cancer: incidence, etiology, and treatment. Rev Urol. 2004; 6:1-4.
47. Pednekar MS, Sansone G, Gupta PC. Association of alcohol, alcohol and tobacco with mortality: findings from a prospective cohort study in Mumbai (Bombay), India. Alcohol. 2012; 46:139-146.
48. Shantsila E, Lip G. Thrombotic complications in heart failure - an underappreciated challenge. Circulation. 2014;
49. Piazza G, Seddighzadeh A, Goldhaber SZ. Heart failure in patients with deep vein thrombosis. Am J Cardiol. 2008; 101:1056-1059.
50. Johnson RJ, Segal MS, Sautin Y, et al. Potential role of sugar (fructose) in the epidemic of hypertension, obesity and the metabolic syndrome, diabetes, kidney disease, and cardiovascular disease.