Publication


Featured research published by Gyorgy Simon.


ACM Computing Surveys | 2018

Mining Electronic Health Records (EHRs): A Survey

Pranjul Yadav; Michael Steinbach; Vipin Kumar; Gyorgy Simon

The continuously increasing cost of the US healthcare system has received significant attention. Central to the ideas aimed at curbing this trend is the use of technology in the form of the mandate to implement electronic health records (EHRs). EHRs consist of patient information such as demographics, medications, laboratory test results, diagnosis codes, and procedures. Mining EHRs could lead to improvement in patient health management as EHRs contain detailed information related to disease prognosis for large patient populations. In this article, we provide a structured and comprehensive overview of data mining techniques for modeling EHRs. We first provide a detailed understanding of the major application areas to which EHR mining has been applied and then discuss the nature of EHR data and its accompanying challenges. Next, we describe major approaches used for EHR mining, the metrics associated with EHRs, and the various study designs. With this foundation, we then provide a systematic and methodological organization of existing data mining techniques used to model EHRs and discuss ideas for future research.


Journal of General Internal Medicine | 2016

Statin Use, Diabetes Incidence and Overall Mortality in Normoglycemic and Impaired Fasting Glucose Patients

M. Regina Castro; Gyorgy Simon; Stephen S. Cha; Barbara P. Yawn; L. Joseph Melton; Pedro J. Caraballo

BACKGROUND: The association between the use of statins and the risk of diabetes and increased mortality within the same population has been a source of controversy, and may underestimate the value of statins for patients at risk. OBJECTIVE: We aimed to assess whether statin use increases the risk of developing diabetes or affects overall mortality among normoglycemic patients and patients with impaired fasting glucose (IFG). DESIGN AND PARTICIPANTS: Observational cohort study of 13,508 normoglycemic patients (n = 4460; 33% taking statins) and 4563 IFG patients (n = 1865; 41% taking statins) among residents of Olmsted County, Minnesota, with clinical data in the Mayo Clinic electronic medical record and at least one outpatient fasting glucose test between 1999 and 2004. Demographics, vital signs, tobacco use, laboratory results, medications and comorbidities were obtained by electronic search for the period 1999–2004. Results were analyzed by Cox proportional hazards models, and the risk of incident diabetes and mortality were analyzed by survival curves using the Kaplan–Meier method. MAIN MEASURES: The main endpoints were new clinical diagnosis of diabetes mellitus and total mortality. KEY RESULTS: After a mean of 6 years of follow-up, statin use was found to be associated with an increased risk of incident diabetes in the normoglycemic (HR 1.19; 95% CI, 1.05 to 1.35; p = 0.007) and IFG groups (HR 1.24; 95% CI, 1.11 to 1.38; p = 0.0001). At the same time, overall mortality decreased in both normoglycemic (HR 0.70; 95% CI, 0.66 to 0.80; p < 0.0001) and IFG patients (HR 0.77; 95% CI, 0.64 to 0.91; p = 0.0029) with statin use. CONCLUSION: In general, recommendations for statin use should not be affected by concerns over an increased risk of developing diabetes, since the benefit of reduced mortality clearly outweighs this small (19–24%) risk.


Journal of Biomedical Informatics | 2017

Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record

Zhen Hu; Genevieve B. Melton; Elliot G. Arsoniadis; Yan Wang; Mary R. Kwaan; Gyorgy Simon

Proper handling of missing data is important for many secondary uses of electronic health record (EHR) data. Data imputation methods can be used to handle missing data, but their use for analyzing EHR data is limited and their specific efficacy for postoperative complication detection is unclear. Several data imputation methods were used to develop data models for automated detection of three types (i.e., superficial, deep, and organ space) of surgical site infection (SSI) and overall SSI, using American College of Surgeons National Surgical Quality Improvement Project (NSQIP) Registry 30-day SSI occurrence data as a reference standard. Overall, models with missing data imputation almost always outperformed reference models without imputation, which included only cases with complete data, for detection of SSI overall, achieving very good average area under the curve values. Missing data imputation appears to be an effective means for improving postoperative SSI detection using EHR clinical data.


Journal of Medical Systems | 2017

Providers’ Response to Clinical Decision Support for QT Prolonging Drugs

Sunita Sharma; J. Martijn Bos; Robert F. Tarrell; Gyorgy Simon; Bruce W. Morlan; Michael J. Ackerman; Pedro J. Caraballo

Commonly used drugs in the hospital setting can cause QT prolongation and trigger life-threatening arrhythmias. We evaluate changes in prescribing behavior after the implementation of a clinical decision support system to prevent the use of QT prolonging medications in the hospital setting. We conducted a quasi-experimental study, before and after the implementation of a clinical decision support system integrated in the electronic medical record (QT-alert system). This system detects patients at risk of significant QT prolongation (QTc > 500 ms) and alerts providers ordering QT prolonging drugs. We reviewed the electronic health record to assess the providers’ responses, which were classified as “action taken” (QT drug avoided, QT drug changed, other QT drug(s) avoided, ECG monitoring, electrolytes monitoring, QT issue acknowledged, other actions) or “no action taken”. Approximately 15.5% (95/612) of the alerts were followed by a provider’s action in the pre-intervention phase compared with 21% (228/1085) in the post-intervention phase (p=0.006). The most common types of action taken during the pre-intervention phase compared to the post-intervention phase were ECG monitoring (8% vs. 13%, p=0.002) and QT issue acknowledgment (2.1% vs. 4.1%, p=0.03). Notably, there was no significant difference for other actions including QT drug avoided (p=0.8), QT drug changed (p=0.06) and other QT drug(s) avoided (p=0.3). Our study demonstrated that the QT alert system prompted a higher proportion of providers to take action on patients at risk of complications. However, the overall impact was modest, underscoring the need for educating providers and optimizing clinical decision support to further reduce drug-induced QT prolongation.
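
The headline comparison in this abstract (95 of 612 alerts acted on before the intervention versus 228 of 1085 after) can be checked with a simple two-proportion test. The sketch below is only an illustration: the abstract does not say which test the authors used, so the pooled two-sided z-test and the helper name `two_proportion_z_test` are assumptions.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for a difference between two proportions.

    Assumed test; the abstract does not name the method actually used.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Counts taken from the abstract: alerts followed by a provider action.
p_pre, p_post, z, p_value = two_proportion_z_test(95, 612, 228, 1085)
print(f"pre={p_pre:.1%} post={p_post:.1%} z={z:.2f} p={p_value:.4f}")
```

Under these assumptions the p-value should land close to the reported p=0.006, although a chi-square or exact test could give a slightly different figure.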


Knowledge Discovery and Data Mining | 2008

Semi-supervised approach to rapid and reliable labeling of large data sets

Gyorgy Simon; Vipin Kumar; Zhi Li Zhang

In this paper, we propose a method in which the labeling of the data set is carried out in a semi-supervised manner with user-specified guarantees about the quality of the labeling. In our scheme, we assume that for each class, we have some heuristics available, each of which can identify instances of one particular class. The heuristics are assumed to have reasonable performance, but they do not need to cover all instances of the class, nor do they need to be perfectly reliable. We further assume that we have an infallible expert, who is willing to manually label a few instances. The aim of the algorithm is to exploit the cluster structure of the problem, the predictions by the imperfect heuristics and the limited perfect labels provided by the expert to classify (label) the instances of the data set with guaranteed precision (specified by the user) with regard to each class. The specified precision is not always attainable, so the algorithm is allowed to classify some instances as dontknow. The algorithm is evaluated by the number of instances labeled by the expert, the number of dontknow instances (global coverage) and the achieved quality of the labeling. On the KDD Cup Network Intrusion data set containing 500,000 instances, we managed to label 96.6% of the instances while guaranteeing a nominal precision of 90% (with 95% confidence) by having the expert label 630 instances; and by having the expert label 1200 instances, we managed to guarantee 95% nominal precision while labeling 96.4% of the data. We also provide a case study of applying our scheme to label the network traffic collected at a large campus network.
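
The idea of a precision guarantee "with 95% confidence" can be illustrated with a lower confidence bound on a heuristic's precision, estimated from a handful of expert-checked instances. The sketch below is not the paper's algorithm, which the abstract does not spell out: the use of a one-sided Wilson bound and the helper names `wilson_lower_bound` and `decide` are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(correct, n, confidence=0.95):
    """One-sided Wilson lower confidence bound on a proportion."""
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)
    p_hat = correct / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def decide(correct, n, target_precision, confidence=0.95):
    """Keep a heuristic's label for a cluster only if the bound clears the target.

    `correct` of the `n` expert-checked instances agreed with the heuristic.
    """
    bound = wilson_lower_bound(correct, n, confidence)
    return "accept" if bound >= target_precision else "dontknow"


print(decide(50, 50, 0.90))  # many checks, all correct
print(decide(9, 10, 0.90))   # too few checks to support a 90% guarantee
```

The second call shows why a dontknow outcome is needed: an observed precision of 90% on ten checked instances is not enough evidence to guarantee 90% precision at the chosen confidence level.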


Archive | 2007

MINDS: Architecture & Design

Varun Chandola; Eric Eilertson; Levent Ertoz; Gyorgy Simon; Vipin Kumar

This chapter provides an overview of the Minnesota Intrusion Detection System (MINDS), which uses a suite of data mining based algorithms to address different aspects of cyber security. The various components of MINDS, such as the scan detector, anomaly detector and the profiling module, detect different types of attacks and intrusions on a computer network. The scan detector aims at detecting scans, which are the precursors to any network attack. The anomaly detection algorithm is very effective in detecting behavioral anomalies in the network traffic, which typically translate to malicious activities such as denial-of-service (DoS) traffic, worms, policy violations and inside abuse. The profiling module helps a network analyst to understand the characteristics of the network traffic and detect any deviations from the normal profile. Our analysis shows that the intrusions detected by MINDS are complementary to those of traditional signature based systems, such as SNORT, which implies that they both can be combined to increase overall attack coverage. MINDS has shown great operational success in detecting network intrusions in two live deployments at the University of Minnesota and as a part of the Interrogator architecture at the US Army Research Lab - Center for Intrusion Monitoring and Protection (ARL-CIMP).


International Journal of Medical Informatics | 2017

Advancing Alzheimer’s research: A review of big data promises

Rui Zhang; Gyorgy Simon; Fang Yu

OBJECTIVE: To review the current state of science using big data to advance Alzheimer's disease (AD) research and practice. In particular, we analyzed the types of research foci addressed, corresponding methods employed and study findings reported using big data in AD. METHOD: Systematic review was conducted for articles published in PubMed from January 1, 2010 through December 31, 2015. Keywords with AD and big data analytics were used for literature retrieval. Articles were reviewed and included if they met the eligibility criteria. RESULTS: Thirty-eight articles were included in this review. They can be categorized into seven research foci: diagnosing AD or mild cognitive impairment (MCI) (n=10), predicting MCI to AD conversion (n=13), stratifying risks for AD (n=5), mining the literature for knowledge discovery (n=4), predicting AD progression (n=2), describing clinical care for persons with AD (n=3), and understanding the relationship between cognition and AD (n=3). The most commonly used datasets are AD Neuroimaging Initiative (ADNI) (n=16), electronic health records (EHR) (n=11), MEDLINE (n=3), and other research datasets (n=8). Logistic regression (n=9) and support vector machine (n=8) are the most used methods for data analysis. CONCLUSION: Big data are increasingly used to address AD-related research questions. While existing research datasets are frequently used, other datasets such as EHR data provide a unique, yet under-utilized opportunity for advancing AD research.


International Conference on Conceptual Structures | 2007

DDDAS/ITR: A Data Mining and Exploration Middleware for Grid and Distributed Computing

Jon B. Weissman; Vipin Kumar; Varun Chandola; Eric Eilertson; Levent Ertoz; Gyorgy Simon; Seonho Kim; Jinoh Kim

We describe our project that marries data mining together with Grid computing. Specifically, we focus on one data mining application - the Minnesota Intrusion Detection System (MINDS), which uses a suite of data mining based algorithms to address different aspects of cyber security including malicious activities such as denial-of-service (DoS) traffic, worms, policy violations and inside abuse. MINDS has shown great operational success in detecting network intrusions in several real deployments. In sophisticated distributed cyber attacks using a multitude of wide-area nodes, combining the results of several MINDS instances can enable additional early-alert cyber security. We also describe a Grid service system that can deploy and manage multiple MINDS instances across a wide-area network.


Journal of General Internal Medicine | 2018

Diagnostic Discordance, Health Information Exchange, and Inter-Hospital Transfer Outcomes: a Population Study

Michael G. Usher; Nishant Sahni; Dana Herrigel; Gyorgy Simon; Genevieve B. Melton; Anne M. Joseph; Andrew Olson

Background: Studying diagnostic error at the population level requires an understanding of how diagnoses change over time. Objective: To use inter-hospital transfers to examine the frequency and impact of changes in diagnosis on patient risk, and whether health information exchange can improve patient safety by enhancing diagnostic accuracy. Design: Diagnosis coding before and after hospital transfer was merged with responses from the American Hospital Association Annual Survey for a cohort of patients transferred between hospitals to identify predictors of mortality. Participants: Patients (180,337) 18 years or older transferred between 473 acute care hospitals from NY, FL, IA, UT, and VT from 2011 to 2013. Main Measures: We identified discordant Elixhauser comorbidities before and after transfer to determine the frequency and developed a weighted score of diagnostic discordance to predict mortality. This was included in a multivariate model with inpatient mortality as the dependent variable. We investigated whether health information exchange (HIE) functionality adoption as reported by hospitals improved diagnostic discordance and inpatient mortality. Key Results: Discordance in diagnoses occurred in 85.5% of all patients. Seventy-three percent of patients gained a new diagnosis following transfer while 47% of patients lost a diagnosis. Diagnostic discordance was associated with increased adjusted inpatient mortality (OR 1.11 95% CI 1.10–1.11, p < 0.001) and allowed for improved mortality prediction. Bilateral hospital HIE participation was associated with reduced diagnostic discordance index (3.69 vs. 1.87%, p < 0.001) and decreased inpatient mortality (OR 0.88, 95% CI 0.89–0.99, p < 0.001). Conclusions: Diagnostic discordance commonly occurred during inter-hospital transfers and was associated with increased inpatient mortality. Health information exchange adoption was associated with decreased discordance and improved patient outcomes.


Journal of General Internal Medicine | 2018

Development and Validation of Machine Learning Models for Prediction of 1-Year Mortality Utilizing Electronic Medical Record Data Available at the End of Hospitalization in Multicondition Patients: a Proof-of-Concept Study

Nishant Sahni; Gyorgy Simon; Rashi Arora

Background: Predicting death in a cohort of clinically diverse, multicondition hospitalized patients is difficult. Prognostic models that use electronic medical record (EMR) data to determine 1-year death risk can improve end-of-life planning and risk adjustment for research. Objective: Determine if the final set of demographic, vital sign, and laboratory data from a hospitalization can be used to accurately quantify 1-year mortality risk. Design: A retrospective study using electronic medical record data linked with the state death registry. Participants: A total of 59,848 hospitalized patients within a six-hospital network over a 4-year period. Main Measures: The last set of vital signs, complete blood count, basic and complete metabolic panel, demographic information, and ICD codes. The outcome of interest was death within 1 year. Key Results: Model performance was measured on the validation data set. Random forests (RF) outperformed logistic regression (LR) models in discriminative ability. An RF model that used the final set of demographic, vitals, and laboratory data from the final 48 h of hospitalization had an AUC of 0.86 (0.85–0.87) for predicting death within a year. Age, blood urea nitrogen, platelet count, hemoglobin, and creatinine were the most important variables in the RF model. Models that used comorbidity variables alone had the lowest AUC. In groups of patients with a high probability of death, RF models underestimated the probability by less than 10%. Conclusion: The last set of EMR data from a hospitalization can be used to accurately estimate the risk of 1-year mortality within a cohort of multicondition hospitalized patients.
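
The AUC used above to compare models has a simple interpretation: it is the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the study, and the function name `auc` is chosen here for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = died within a year, 0 = survived (synthetic values).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 3 of the 4 pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in the number of cases, so it suits small illustrations; for a cohort of tens of thousands of patients a rank-based implementation would be the more practical choice.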

Collaboration


Dive into Gyorgy Simon's collaborations.

Top Co-Authors

Vipin Kumar

University of Minnesota

Levent Ertoz

University of Minnesota

Zhi Li Zhang

University of Minnesota
