Colin G. Walsh

Vanderbilt University Medical Center

Publications

Featured research published by Colin G. Walsh.

Clinical Psychological Science | 2017

Predicting Risk of Suicide Attempts Over Time Through Machine Learning

Colin G. Walsh; Jessica D. Ribeiro; Joseph C. Franklin

Traditional approaches to the prediction of suicide attempts have limited the accuracy and scale of risk detection for these dangerous behaviors. We sought to overcome these limitations by applying machine learning to electronic health records within a large medical database. Participants were 5,167 adult patients with a claim code for self-injury (i.e., ICD-9, E95x); expert review of records determined that 3,250 patients made a suicide attempt (i.e., cases), and 1,917 patients engaged in self-injury that was nonsuicidal, accidental, or nonverifiable (i.e., controls). We developed machine learning algorithms that accurately predicted future suicide attempts (AUC = 0.84, precision = 0.79, recall = 0.95, Brier score = 0.14). Moreover, accuracy improved from 720 days to 7 days before the suicide attempt, and predictor importance shifted across time. These findings represent a step toward accurate and scalable risk detection and provide insight into how suicide attempt risk shifts over time.
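The performance figures quoted above (AUC, precision, recall, Brier score) are standard quantities for a binary classifier that outputs probabilities. A minimal sketch of how each one is computed, in plain Python, is shown below. The five-patient dataset is invented for illustration (it is not data from the study), and the 0.5 decision threshold used for precision and recall is an assumption, since the abstract does not state one.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random case is scored above a random control (ties count 1/2)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def precision_recall(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall after thresholding probabilities (threshold is an assumption)."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = suicide attempt (case), 0 = control.
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6]

print(round(auc(y_true, y_prob), 3))          # 0.833
print(precision_recall(y_true, y_prob))       # (0.6666666666666666, 0.6666666666666666)
print(round(brier_score(y_true, y_prob), 3))  # 0.172
```

The AUC here uses the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve; a lower Brier score indicates probabilities that are both well calibrated and well separated.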

International Conference on Data Mining | 2015

Outcomes Prediction via Time Intervals Related Patterns

Robert Moskovitch; Colin G. Walsh; Fei Wang; George Hripcsak; Nicholas P. Tatonetti

The increasing availability of multivariate temporal data in many domains, such as biomedicine and security, provides exceptional opportunities for temporal knowledge discovery, classification and prediction, but also poses challenges. Temporal variables are often sparse, and in many domains, such as biomedicine, there is a huge number of variables. In the biomedical domain, events such as conditions, drugs and procedures have in recent decades been stored as time intervals, which enables the discovery of Time Intervals Related Patterns (TIRPs) and their use for classification or prediction. In this study we present a framework for outcome event prediction, called Maitreya, which includes an algorithm for TIRP discovery, called KarmaLegoD, designed to handle a huge number of symbols. Three indexing strategies for pairs of symbolic time intervals are proposed and compared, showing that FullyHashed indexing is only slightly slower but consumes minimal memory. We evaluated Maitreya on eight real datasets for the prediction of clinical procedures as outcome events. The use of TIRPs outperforms the use of symbols, especially with horizontal support (number of instances) as the TIRP feature representation.
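TIRPs are built from temporal relations between pairs of symbolic time intervals. The sketch below is a simplified illustration, not KarmaLegoD or Maitreya: it assumes integer start and end times with start < end, uses the seven Allen relations that remain once intervals are ordered by start time, applies no tolerance or maximum-gap constraint, and counts only two-interval patterns. For one patient record it returns the horizontal support of each pattern, read here (following the abstract) as the number of instances of that pattern. The symbol names and the record are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    symbol: str  # hypothetical condition, drug, or procedure code
    start: int
    end: int     # assumed strictly greater than start


def allen_relation(a: Interval, b: Interval) -> str:
    """Allen relation of a to b, assuming a comes first in (start, end) order."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start:
        return "equals" if a.end == b.end else "starts"
    # Here a.start < b.start < a.end, so the two intervals overlap in time.
    if a.end < b.end:
        return "overlaps"
    if a.end == b.end:
        return "finished-by"
    return "contains"


def pair_horizontal_support(intervals: Iterable[Interval]) -> Counter:
    """Count instances of each two-interval pattern (symbol, relation, symbol) in one record."""
    ordered = sorted(intervals, key=lambda iv: (iv.start, iv.end, iv.symbol))
    counts: Counter = Counter()
    for a, b in combinations(ordered, 2):  # a always precedes b in sorted order
        counts[(a.symbol, allen_relation(a, b), b.symbol)] += 1
    return counts


# Hypothetical patient record.
record = [
    Interval("condition_B", 0, 10),
    Interval("drug_A", 2, 5),
    Interval("drug_A", 6, 8),
    Interval("drug_A", 12, 20),
]
for pattern, support in sorted(pair_horizontal_support(record).items()):
    print(pattern, support)
# ('condition_B', 'before', 'drug_A') 1
# ('condition_B', 'contains', 'drug_A') 2
# ('drug_A', 'before', 'drug_A') 3
```

Each count could then serve as one feature value for the patient, which is the role horizontal support plays in the abstract; longer patterns and the indexing strategies it compares are outside the scope of this sketch.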

Journal of Biomedical Informatics | 2017

Beyond Discrimination: a Comparison of Calibration Methods and Clinical Usefulness of Predictive Models of Readmission Risk

Colin G. Walsh; Kavya Sharman; George Hripcsak

BACKGROUND
Prior to implementing predictive models in novel settings, analyses of calibration and clinical usefulness remain as important as discrimination, but they are not frequently discussed. Calibration is a model's reflection of actual outcome prevalence in its predictions. Clinical usefulness refers to the utilities, costs, and harms of using a predictive model in practice. A decision analytic approach to calibrating and selecting an optimal intervention threshold may help maximize the impact of readmission risk and other preventive interventions.

OBJECTIVES
To select a pragmatic means of calibrating predictive models that requires a minimum amount of validation data and that performs well in practice. To evaluate the impact of miscalibration on utility and cost via clinical usefulness analyses.

MATERIALS AND METHODS
Observational, retrospective cohort study with electronic health record data from 120,000 inpatient admissions at an urban, academic center in Manhattan. The primary outcome was thirty-day readmission for three causes: all-cause, congestive heart failure, and chronic coronary atherosclerotic disease. Predictive modeling was performed via L1-regularized logistic regression. Calibration methods were compared, including Platt Scaling, Logistic Calibration, and Prevalence Adjustment. Performance of predictive modeling and calibration was assessed via discrimination (c-statistic), calibration (Spiegelhalter Z-statistic, Root Mean Square Error [RMSE] of binned predictions, Sanders and Murphy Resolutions of the Brier Score, Calibration Slope and Intercept), and clinical usefulness (utility terms represented as costs). The amount of validation data necessary to apply each calibration algorithm was also assessed.

RESULTS
C-statistics by diagnosis ranged from 0.7 for all-cause readmission to 0.86 (0.78-0.93) for congestive heart failure. Logistic Calibration and Platt Scaling performed best, and this difference required analyzing multiple metrics of calibration simultaneously, in particular Calibration Slopes and Intercepts. Clinical usefulness analyses provided optimal risk thresholds, which varied by reason for readmission, outcome prevalence, and calibration algorithm. Utility analyses also suggested maximum tolerable intervention costs, e.g., $1720 for all-cause readmissions based on a published cost of readmission of $11,862.

CONCLUSIONS
Choice of calibration method depends on availability of validation data and on performance. Improperly calibrated models may contribute to higher costs of intervention as measured via clinical usefulness. Decision-makers must understand underlying utilities or costs inherent in the use-case at hand to assess usefulness and will obtain the optimal risk threshold to trigger intervention with intervention cost limits as a result.

bioRxiv | 2018

Significant shared heritability underlies suicide attempt and clinically predicted probability of attempting suicide

Douglas Ruderfer; Colin G. Walsh; Matthew W Aquirre; Jessica D. Ribeiro; Joseph C. Franklin; Manuel A. Rivas

Suicide accounts for nearly 800,000 deaths per year worldwide, with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets to characterize the contribution of common genetic variation to suicide attempt. The first is a patient-reported suicide attempt phenotype from genotyped samples in the UK Biobank (337,199 participants, 2,433 cases). The second leveraged electronic health record (EHR) data from the Vanderbilt University Medical Center (VUMC, 2.8 million patients, 3,250 cases) and machine learning to derive probabilities of attempting suicide in 24,546 genotyped patients. We identified significant and comparable heritability estimates of suicide attempt from both the patient-reported phenotype in the UK Biobank (h²SNP = 0.035, p = 7.12 × 10⁻⁴) and the clinically predicted phenotype from VUMC (h²SNP = 0.046, p = 1.51 × 10⁻²). A significant genetic overlap was demonstrated between the two measures of suicide attempt in these independent samples through polygenic risk score analysis (t = 4.02, p = 5.75 × 10⁻⁵) and genetic correlation (rg = 1.073, SE = 0.36, p = 0.003). Finally, we show significant but incomplete genetic correlation of suicide attempt with insomnia (rg = 0.34-0.81) as well as several psychiatric disorders (rg = 0.26-0.79). This work demonstrates the contribution of common genetic variation to suicide attempt. It points to a genetic underpinning to clinically predicted risk of attempting suicide that is similar to the genetic profile from a patient-reported outcome. Lastly, it presents an approach for using EHR data and clinical prediction to generate quantitative measures from binary phenotypes that improved power for our genetic study.

Journal of Child Psychology and Psychiatry | 2018

Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning

Colin G. Walsh; Jessica D. Ribeiro; Joseph C. Franklin

BACKGROUND
Adolescents have high rates of nonfatal suicide attempts, but clinically practical risk prediction remains a challenge. Screening can be time-consuming to implement at scale, if it is done at all. Computational algorithms may predict suicide risk using only routinely collected clinical data. We used a machine learning approach validated on longitudinal clinical data in adults to address this challenge in adolescents.

METHODS
This is a retrospective, longitudinal cohort study. Data were collected from the Vanderbilt Synthetic Derivative from January 1998 to December 2015 and included 974 adolescents with nonfatal suicide attempts and multiple control comparisons: 496 adolescents with other self-injury (OSI), 7,059 adolescents with depressive symptoms, and 25,081 adolescent general hospital controls. Candidate predictors included diagnostic, demographic, medication, and socioeconomic factors. Outcome was determined by multiexpert review of electronic health records. Random forests were validated with optimism adjustment at multiple time points (from 1 week to 2 years). Recalibration was done via isotonic regression. Evaluation metrics included discrimination (AUC, sensitivity/specificity, precision/recall) and calibration (calibration plots, slope/intercept, Brier score).

RESULTS
Computational models performed well and did not require face-to-face screening. Performance improved as suicide attempts became more imminent. Discrimination was good in comparison with OSI controls (AUC = 0.83 [0.82-0.84] at 720 days; AUC = 0.85 [0.84-0.87] at 7 days) and depressed controls (AUC = 0.87 [95% CI 0.85-0.90] at 720 days; 0.90 [0.85-0.94] at 7 days) and best in comparison with general hospital controls (AUC = 0.94 [0.92-0.96] at 720 days; 0.97 [0.95-0.98] at 7 days). Random forests significantly outperformed logistic regression in every comparison. Recalibration improved performance as much as ninefold; clinical recommendations with poorly calibrated predictions can lead to decision errors.

CONCLUSIONS
Machine learning on longitudinal clinical data may provide a scalable approach to broaden screening for risk of nonfatal suicide attempts in adolescents.

Journal of Biomedical Informatics | 2018

Discovering hidden knowledge through auditing clinical diagnostic knowledge bases

Matthew C. Lenert; Colin G. Walsh; Randolph A. Miller

OBJECTIVE
To evaluate the potential for data mining auditing techniques to identify hidden concepts in diagnostic knowledge bases (KBs). Improving completeness enhances KB applications such as differential diagnosis and patient case simulation.

MATERIALS AND METHODS
The authors used unsupervised methods (Pearson's correlation, PC; Kendall's correlation, KC; and a heuristic algorithm, HA) to identify existing and discover new finding-finding interrelationships (properties) in the INTERNIST-1/QMR KB. The authors estimated KB maintenance efficiency gains (effort reduction) of the approaches.

RESULTS
The methods discovered new properties at 95% CI rates of [0.1%, 5.4%] (PC), [2.8%, 12.5%] (KC), and [5.6%, 18.8%] (HA). Estimated manual effort reduction for HA-assisted determination of new properties was approximately 50-fold.

CONCLUSION
Data mining can provide an efficient supplement to ensuring the completeness of finding-finding interdependencies in diagnostic knowledge bases. The authors' findings should be applicable to other diagnostic systems that record finding frequencies within diseases (e.g., DXplain, ISABEL).

Arthritis Care and Research | 2018

Outpatient Engagement Lowers Predicted Risk of Suicide Attempts in Fibromyalgia

Lindsey C. McKernan; Matthew C Lenert; Leslie J. Crofford; Colin G. Walsh

Patients with fibromyalgia (FM) are 10 times more likely to die by suicide than the general population. The purpose of this study was to externally validate published models predicting suicidal ideation and suicide attempts in patients with FM and to identify interpretable risk and protective factors for suicidality unique to FM.

Advances in Methods and Practices in Psychological Science | 2018

Enabling Open-Science Initiatives in Clinical Psychology and Psychiatry Without Sacrificing Patients’ Privacy: Current Practices and Future Challenges

Colin G. Walsh; Weiyi Xia; Muqun Li; Joshua C. Denny; Paul A. Harris; Bradley Malin

The psychological and psychiatric communities are generating data on an ever-increasing scale. To ensure that society reaps the greatest utility in research and clinical care from such rich resources, there is significant interest in wide-scale, open data sharing to foster scientific endeavors. However, it is imperative that such open-science initiatives ensure that data-privacy concerns are adequately addressed. In this article, we focus on these issues in clinical research. We review the privacy risks and then discuss how they can be mitigated through appropriate governance mechanisms that are both social (e.g., the application of data-use agreements) and technological (e.g., de-identification of structured data and unstructured narratives). We also discuss the benefits and drawbacks of these mechanisms, particularly as regards data fidelity. Our focus is on de-identification methods that meet regulatory requirements, such as the Privacy Rule of the Health Insurance Portability and Accountability Act of 1996. To illustrate their potential, we show how the principles we discuss have been applied in a large-scale clinical database and distributed research networks. We close this article with a discussion of challenges in supporting data privacy as open-science initiatives grow in their scale and complexity.

JAMA Oncology | 2017

Observational Cohort Studies and the Challenges of In Silico Experiments

Colin G. Walsh; Kevin B. Johnson

American Medical Informatics Association Annual Symposium | 2014

Enabling claims-based decision support through non-interruptive capture of admission diagnoses and provider billing codes

Colin G. Walsh; David K. Vawdrey; Peter D. Stetson; Matthew R. Fred; George Hripcsak

Explore More