Huan Mo
Vanderbilt University
Publication
Featured research published by Huan Mo.
Journal of the American Medical Informatics Association | 2015
Huan Mo; William K. Thompson; Luke V. Rasmussen; Jennifer A. Pacheco; Guoqian Jiang; Richard C. Kiefer; Qian Zhu; Jie Xu; Enid Montague; David Carrell; Todd Lingren; Frank D. Mentch; Yizhao Ni; Firas H. Wehbe; Peggy L. Peissig; Gerard Tromp; Eric B. Larson; Christopher G. Chute; Jyotishman Pathak; Joshua C. Denny; Peter Speltz; Abel N. Kho; Gail P. Jarvik; Cosmin Adrian Bejan; Marc S. Williams; Kenneth M. Borthwick; Terrie Kitchner; Dan M. Roden; Paul A. Harris
Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
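Desideratum (4), modeling phenotype algorithms with set operations, can be illustrated with a minimal sketch. The patient IDs, set names, and the `select_cases` helper below are hypothetical and not taken from the paper; in practice each set would be the result of a query against EHR data.

```python
# Hypothetical patient-ID sets; each would normally come from an EHR query.
with_diagnosis_code = {101, 102, 103, 104}
on_specific_medication = {102, 103, 105}
with_exclusion_criterion = {103}


def select_cases(diagnosed, medicated, excluded):
    """Return patients with both kinds of evidence, minus excluded patients.

    Intersection (&) models a logical AND of two criteria;
    set difference (-) models an exclusion criterion.
    """
    return (diagnosed & medicated) - excluded


cases = select_cases(with_diagnosis_code, on_specific_medication,
                     with_exclusion_criterion)
print(sorted(cases))
```

Expressing criteria as sets keeps the algorithm readable by humans while remaining directly executable, which is the dual requirement stated in desideratum (3).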
Journal of the American Medical Informatics Association | 2016
Wei-Qi Wei; Pedro L. Teixeira; Huan Mo; Robert M. Cronin; Jeremy L. Warner; Joshua C. Denny
OBJECTIVE To evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Diseases (ICD) diagnosis codes, primary notes, and specific medications. MATERIALS AND METHODS We conducted the evaluation using de-identified Vanderbilt EHR data. We preselected ten diseases: atrial fibrillation, Alzheimer's disease, breast cancer, gout, human immunodeficiency virus infection, multiple sclerosis, Parkinson's disease, rheumatoid arthritis, and types 1 and 2 diabetes mellitus. For each disease, patients were classified into seven categories based on the presence of evidence in diagnosis codes, primary notes, and specific medications. Twenty-five patients per disease category (a total of 175 patients for each disease, 1750 patients for all ten diseases) were randomly selected for manual chart review. Review results were used to estimate the positive predictive value (PPV), sensitivity, and F-score for each EHR component alone and in combination. RESULTS The PPVs of single components were inconsistent and inadequate for accurate phenotyping (0.06-0.71). Using two or more ICD codes improved the average PPV to 0.84. We observed a more stable and higher accuracy when using at least two components (mean ± standard deviation: 0.91 ± 0.08). Primary notes offered the best sensitivity (0.77). The sensitivity of ICD codes was 0.67. Again, two or more components provided a reasonably high and stable sensitivity (0.59 ± 0.16). Overall, the best performance (F-score: 0.70 ± 0.12) was achieved by using two or more components. Although the overall performance of using ICD codes (0.67 ± 0.14) was only slightly lower than that of using two or more components, its PPV (0.71 ± 0.13) was substantially worse than theirs (0.91 ± 0.08). CONCLUSION Multiple EHR components provide a more consistent and higher performance than a single one for the selected phenotypes. We suggest considering multiple EHR components for future phenotyping design in order to obtain an ideal result.
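The sampling arithmetic in this abstract can be checked with a short sketch. It assumes that the seven categories correspond to the non-empty combinations of the three EHR components; the abstract does not spell this out, so that reading, along with the function and variable names, is an assumption.

```python
from itertools import combinations

# The three EHR components named in the abstract.
COMPONENTS = ("ICD codes", "primary notes", "specific medications")


def evidence_categories(components):
    """List every non-empty combination of components.

    Assumption: a patient's category is the exact set of components
    in which evidence of the disease is present.
    """
    return [
        combo
        for size in range(1, len(components) + 1)
        for combo in combinations(components, size)
    ]


categories = evidence_categories(COMPONENTS)
patients_per_disease = 25 * len(categories)   # 25 reviewed per category
patients_total = patients_per_disease * 10    # ten preselected diseases

print(len(categories), patients_per_disease, patients_total)
```

Under this assumption the counts match the abstract: seven categories, 175 patients per disease, and 1750 patients overall.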
Journal of the American Medical Informatics Association | 2015
Chen Lin; Elizabeth W. Karlson; Dmitriy Dligach; Monica P. Ramirez; Timothy A. Miller; Huan Mo; Natalie S. Braggs; Vivian S. Gainer; Joshua C. Denny; Guergana Savova
OBJECTIVES To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities. MATERIALS AND METHODS Codified information and a string-matching algorithm were applied to a RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method. For features, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with regard to the date of transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at patient level in a training set (N=480 patients) and evaluated against a test set (N=120 patients). RESULTS The system achieved positive predictive value (PPV) 0.756, sensitivity 0.919, F1 score 0.829 on the test set, which was significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task, and adding temporal information, all proved highly effective. CONCLUSIONS Automated methotrexate-induced liver toxicity phenotype discovery for patients with RA based on structured and unstructured information in the EMR shows accurate results. Our work demonstrates that adding temporal features significantly improved classification results.
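The reported F1 scores are consistent with the standard definition of F1 as the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below recomputes them from the rounded values in the abstract; the function name is illustrative, and small differences in the third decimal place are expected because the published inputs are themselves rounded.

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values as reported in the abstract for the test set.
system_f1 = f1_score(0.756, 0.919)
baseline_f1 = f1_score(0.590, 0.703)

print(f"system F1 = {system_f1:.4f}, baseline F1 = {baseline_f1:.4f}")
```

Both recomputed values agree with the reported 0.829 and 0.642 to within 0.001.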
Journal of the American Medical Informatics Association | 2017
Pedro L. Teixeira; Wei-Qi Wei; Robert M. Cronin; Huan Mo; Jacob P. VanHouten; Robert J. Carroll; Eric LaRose; S. Trent Rosenbloom; Todd L. Edwards; Dan M. Roden; Thomas A. Lasko; Richard A. Dart; Anne M Nikolai; Peggy L. Peissig; Joshua C. Denny
Objective: Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. Materials and Methods: We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. Results: Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operating characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. Conclusion: This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.
Journal of the American Medical Informatics Association | 2015
Jie Xu; Luke V. Rasmussen; Pamela L Shaw; Guoqian Jiang; Richard C. Kiefer; Huan Mo; Jennifer A. Pacheco; Peter Speltz; Qian Zhu; Joshua C. Denny; Jyotishman Pathak; William K. Thompson; Enid Montague
OBJECTIVE To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS Candidate phenotype authoring tools were identified through (1) a literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these identified tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION Positive trends identified in the evaluation included: algorithms can be represented in both computable and human-readable formats; and most tools offer a web interface for easy access. However, issues were also identified: many tools lacked advanced logic functions for authoring complex algorithms; the ability to construct queries that leverage unstructured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in terms of the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.
Arthritis & Rheumatism | 2017
Jayanth Doss; Huan Mo; Robert J. Carroll; Leslie J. Crofford; Joshua C. Denny
The differences between seronegative and seropositive rheumatoid arthritis (RA) have not been widely reported. We performed electronic health record (EHR)–based phenome‐wide association studies (PheWAS) to identify disease associations in seropositive and seronegative RA.
Journal of Clinical Investigation | 2016
Akshay Shekhar; Xianming Lin; Fangyu Liu; Jie Zhang; Huan Mo; Joshua C. Denny; Nancy J. Cox; Mario Delmar; Dan M. Roden; Glenn I. Fishman; David S. Park
Rapid impulse propagation in the heart is a defining property of pectinated atrial myocardium (PAM) and the ventricular conduction system (VCS) and is essential for maintaining normal cardiac rhythm and optimal cardiac output. Conduction defects in these tissues produce a disproportionate burden of arrhythmic disease and are major predictors of mortality in heart failure patients. Despite the clinical importance, little is known about the gene regulatory network that dictates the fast conduction phenotype. Here, we have used signal transduction and transcriptional profiling screens to identify a genetic pathway that converges on the NRG1-responsive transcription factor ETV1 as a critical regulator of fast conduction physiology for PAM and VCS cardiomyocytes. Etv1 was highly expressed in murine PAM and VCS cardiomyocytes, where it regulates expression of Nkx2-5, Gja5, and Scn5a, key cardiac genes required for rapid conduction. Mice deficient in Etv1 exhibited marked cardiac conduction defects coupled with developmental abnormalities of the VCS. Loss of Etv1 resulted in a complete disruption of the normal sodium current heterogeneity that exists between atrial, VCS, and ventricular myocytes. Lastly, a phenome-wide association study identified a link between ETV1 and bundle branch block and heart block in humans. Together, these results identify ETV1 as a critical factor in determining fast conduction physiology in the heart.
Studies in health technology and informatics | 2015
Guoqian Jiang; Harold R. Solbrig; Richard C. Kiefer; Luke V. Rasmussen; Huan Mo; Peter Speltz; William K. Thompson; Joshua C. Denny; Christopher G. Chute; Jyotishman Pathak
This study describes our efforts in developing a standards-based semantic metadata repository for supporting electronic health record (EHR)-driven phenotype authoring and execution. Our system comprises three layers: 1) a semantic data element repository layer; 2) a semantic services layer; and 3) a phenotype application layer. In a prototype implementation, we developed the repository and services by integrating the data elements from both the Quality Data Model (QDM) and HL7 Fast Healthcare Interoperability Resources (FHIR) models. We discuss the modeling challenges and the potential of our system to support EHR phenotype authoring and execution applications.
Journal of Biomedical Informatics | 2016
Guoqian Jiang; Richard C. Kiefer; Luke V. Rasmussen; Harold R. Solbrig; Huan Mo; Jennifer A. Pacheco; Jie Xu; Enid Montague; William K. Thompson; Joshua C. Denny; Christopher G. Chute; Jyotishman Pathak
The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, but not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) for providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure for each data element, and leveraged Semantic Web technologies to facilitate semantic representation of these metadata. We observed that there are a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose a harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiative (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.
Journal of Molecular Endocrinology | 2017
Kayla A. Boortz; Kristen E. Syring; Lynley D. Pound; Huan Mo; James K. Oeser; Owen P. McGuinness; Joshua C. Denny; Richard M. O’Brien
Genome-wide association study (GWAS) data have linked the G6PC2 gene to variations in fasting blood glucose (FBG). G6PC2 encodes an islet-specific glucose-6-phosphatase catalytic subunit that forms a substrate cycle with the beta cell glucose sensor glucokinase. This cycle modulates the glucose sensitivity of insulin secretion and hence FBG. GWAS data have not linked G6PC2 to variations in body weight, but we previously reported that female C57BL/6J G6pc2-knockout (KO) mice were lighter than wild-type littermates on both a chow and a high-fat diet. The purpose of this study was to compare the effects of G6pc2 deletion on FBG and body weight in both chow-fed and high-fat-fed mice on two other genetic backgrounds. FBG was reduced in G6pc2 KO mice largely independent of gender, genetic background or diet. In contrast, the effect of G6pc2 deletion on body weight was markedly influenced by these variables. Deletion of G6pc2 conferred a marked protection against diet-induced obesity in male mixed genetic background mice, whereas in 129SvEv mice deletion of G6pc2 had no effect on body weight. G6pc2 deletion also reduced plasma cholesterol levels in a manner dependent on gender, genetic background and diet. An association between G6PC2 and plasma cholesterol was also observed in humans through electronic health record-derived phenotype analyses. These observations suggest that the action of G6PC2 on FBG is largely independent of the influences of environment, modifier genes or epigenetic events, whereas the action of G6PC2 on body weight and cholesterol is influenced by unknown variables.