Tristan Naumann
Massachusetts Institute of Technology
Publications
Featured research published by Tristan Naumann.
JMIR Medical Informatics | 2014
Omar Badawi; Thomas Brennan; Leo Anthony Celi; Mengling Feng; Marzyeh Ghassemi; Andrea Ippolito; Alistair E. W. Johnson; Roger G. Mark; Louis Mayaud; George B. Moody; Christopher Moses; Tristan Naumann; Vipan Nikore; Marco A. F. Pimentel; Tom J. Pollard; Mauro D. Santos; David J. Stone; Andrew Zimolzak
With growing concerns that big data will only augment the problem of unreliable research, the Laboratory of Computational Physiology at the Massachusetts Institute of Technology organized the Critical Data Conference in January 2014. Thought leaders from academia, government, and industry across disciplines—including clinical medicine, computer science, public health, informatics, biomedical research, health technology, statistics, and epidemiology—gathered and discussed the pitfalls and challenges of big data in health care. The key message from the conference is that the value of large amounts of data hinges on the ability of researchers to share data, methodologies, and findings in an open setting. If empirical value is to be obtained from the analysis of retrospective data, groups must continuously work together on similar problems to create more effective peer review. This will lead to improvement in methodology and quality, with each iteration of analysis resulting in greater reliability.
Translational Psychiatry | 2016
Anna Rumshisky; Marzyeh Ghassemi; Tristan Naumann; Peter Szolovits; Victor M. Castro; Thomas H. McCoy; Roy H. Perlis
The ability to predict psychiatric readmission would facilitate the development of interventions to reduce this risk, a major driver of psychiatric health-care costs. The symptoms or characteristics of illness course necessary to develop reliable predictors are not available in coded billing data, but may be present in narrative electronic health record (EHR) discharge summaries. We identified a cohort of individuals admitted to a psychiatric inpatient unit between 1994 and 2012 with a principal diagnosis of major depressive disorder, and extracted inpatient psychiatric discharge narrative notes. Using these data, we trained a 75-topic Latent Dirichlet Allocation (LDA) model, a form of natural language processing, which identifies groups of words associated with topics discussed in a document collection. The cohort was randomly split to derive a training (70%) and testing (30%) data set, and we trained separate support vector machine models for baseline clinical features alone, baseline features plus common individual words and the above plus topics identified from the 75-topic LDA model. Of 4687 patients with inpatient discharge summaries, 470 were readmitted within 30 days. The 75-topic LDA model included topics linked to psychiatric symptoms (suicide, severe depression, anxiety, trauma, eating/weight and panic) and major depressive disorder comorbidities (infection, postpartum, brain tumor, diarrhea and pulmonary disease). By including LDA topics, prediction of readmission, as measured by area under receiver-operating characteristic curves in the testing data set, was improved from baseline (area under the curve 0.618) to baseline+1000 words (0.682) to baseline+75 topics (0.784). Inclusion of topics derived from narrative notes allows more accurate discrimination of individuals at high risk for psychiatric readmission in this cohort. 
Topic modeling and related approaches offer the potential to improve prediction using EHRs, if generalizability can be established in other clinical cohorts.
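The approach described above combines baseline clinical features with topic proportions derived from narrative notes. A minimal sketch of that feature-combination idea, using toy hand-written topics and made-up feature values in place of a trained 75-topic LDA model:

```python
# Illustrative sketch only (not the paper's code): combine baseline clinical
# features with per-topic word proportions from a discharge note.
# TOPICS is a toy stand-in for topics a fitted LDA model would learn.
TOPICS = {
    "suicide": {"suicide", "suicidal", "ideation"},
    "anxiety": {"anxiety", "panic", "worry"},
    "infection": {"infection", "fever", "antibiotics"},
}

def topic_proportions(note: str) -> list[float]:
    """Fraction of note tokens matching each topic's word set."""
    tokens = note.lower().split()
    total = len(tokens) or 1
    return [sum(t in words for t in tokens) / total for words in TOPICS.values()]

def feature_vector(baseline: list[float], note: str) -> list[float]:
    """Concatenate baseline features (e.g., age, prior admissions) with
    topic features; the combined vector would feed a classifier such as
    the support vector machine used in the study."""
    return baseline + topic_proportions(note)

# Hypothetical patient: age 63, one prior admission, short note excerpt.
vec = feature_vector([63.0, 1.0], "patient reports anxiety and panic attacks")
```

In the study itself the topic proportions come from a 75-topic LDA model fit on the full note corpus, so the augmented vector has 75 extra dimensions rather than the three shown here.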
Science Translational Medicine | 2016
Jerôme Aboab; Leo Anthony Celi; Peter Charlton; Mengling Feng; Mohammad M. Ghassemi; Dominic C. Marshall; Louis Mayaud; Tristan Naumann; Ned McCague; Kenneth Paik; Tom J. Pollard; Matthieu Resche-Rigon; Justin D. Salciccioli; David J. Stone
A “datathon” model combines complementary knowledge and skills to formulate inquiries and drive research that addresses information gaps faced by clinicians. In recent years, there has been a growing focus on the unreliability of published biomedical and clinical research. To introduce effective new scientific contributors to the culture of health care, we propose a “datathon” or “hackathon” model in which participants with disparate, but potentially synergistic and complementary, knowledge and skills effectively combine to address questions faced by clinicians. The continuous peer review intrinsically provided by follow-up datathons, which take up prior uncompleted projects, might produce more reliable research, either by providing a different perspective on the study design and methodology or by replication of prior analyses.
Knowledge Discovery and Data Mining | 2017
Jen J. Gong; Tristan Naumann; Peter Szolovits; John V. Guttag
Existing machine learning methods typically assume consistency in how semantically equivalent information is encoded. However, the way information is recorded in databases differs across institutions and over time, often rendering potentially useful data obsolescent. To address this problem, we map database-specific representations of information to a shared set of semantic concepts, thus allowing models to be built from or transition across different databases. We demonstrate our method on machine learning models developed in a healthcare setting. In particular, we evaluate our method using two different intensive care unit (ICU) databases and on two clinically relevant tasks, in-hospital mortality and prolonged length of stay. For both outcomes, a feature representation mapping EHR-specific events to a shared set of clinical concepts yields better results than using EHR-specific events alone.
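The core idea above is a translation layer: database-specific event codes are re-keyed to a shared set of clinical concepts so that one feature space serves both ICU databases. A minimal sketch, in which all codes and concept names are invented for illustration:

```python
# Hypothetical concept mapping (all codes made up): each source database
# records the same measurement under a different identifier, and both are
# mapped to one shared concept name.
CONCEPT_MAP = {
    "dbA": {"HR_item_211": "heart_rate", "GCS_198": "gcs_total"},
    "dbB": {"vitals.hr": "heart_rate", "neuro.gcs": "gcs_total"},
}

def to_shared_concepts(db: str, events: dict) -> dict:
    """Re-key database-specific events to shared concept names,
    dropping events with no known mapping."""
    mapping = CONCEPT_MAP[db]
    return {mapping[k]: v for k, v in events.items() if k in mapping}

# The same model can now consume records from either database.
a = to_shared_concepts("dbA", {"HR_item_211": 88, "unmapped_code": 1})
b = to_shared_concepts("dbB", {"vitals.hr": 91, "neuro.gcs": 14})
```

A model trained on features keyed by concept name can then be applied to, or transferred across, either database, which is the property the mortality and length-of-stay experiments evaluate.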
Journal of Medical Internet Research | 2016
Leo Anthony Celi; Sharukh Lokhandwala; Robert Montgomery; Christopher Moses; Tristan Naumann; Tom J. Pollard; Daniel Spitz; Robert Stretch
Background Datathons facilitate collaboration between clinicians, statisticians, and data scientists in order to answer important clinical questions. Previous datathons have resulted in numerous publications of interest to the critical care community and serve as a viable model for interdisciplinary collaboration. Objective We report on Chatto, an open-source software suite created by members of our group in the context of the second international Critical Care Datathon, held in September 2015. Methods Datathon participants formed teams to discuss potential research questions and the methods required to address them. They were provided with the Chatto suite of tools to facilitate their teamwork. Each multidisciplinary team spent the next 2 days with clinicians working alongside data scientists to write code, extract and analyze data, and reformulate their queries in real time as needed. All projects were then presented on the last day of the datathon to a panel of judges consisting of clinicians and scientists. Results Use of Chatto was particularly effective in the datathon setting, enabling teams to reduce the time spent configuring their research environments to just a few minutes—a process that would normally take hours to days. Chatto continued to serve as a useful research tool after the conclusion of the datathon. Conclusions This suite of tools fulfills two purposes: (1) facilitation of interdisciplinary teamwork through archiving and version control of datasets, analytical code, and team discussions, and (2) advancement of research reproducibility by functioning postpublication as an online environment in which independent investigators can rerun or modify analyses with relative ease. With the introduction of Chatto, we hope to solve a variety of challenges presented by collaborative data mining projects while improving research reproducibility.
National Conference on Artificial Intelligence | 2015
Marzyeh Ghassemi; Marco A. F. Pimentel; Tristan Naumann; Thomas Brennan; David A. Clifton; Peter Szolovits; Mengling Feng
National Conference on Artificial Intelligence | 2018
Matthew B. A. McDermott; Tom Yan; Tristan Naumann; Nathan Hunt; Harini Suresh; Peter Szolovits; Marzyeh Ghassemi
arXiv: Learning | 2018
Marzyeh Ghassemi; Tristan Naumann; Peter Schulam; Andrew L. Beam; Rajesh Ranganath
arXiv: Computers and Society | 2018
Dina Levy-Lambert; Jen J. Gong; Tristan Naumann; Tom J. Pollard; John V. Guttag
arXiv: Computation and Language | 2018
Willie Boag; Tristan Naumann; Peter Szolovits