Erhan Guven

Johns Hopkins University Applied Physics Laboratory

Publication

Featured research published by Erhan Guven.

IEEE Communications Surveys and Tutorials | 2016

A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection

Anna L. Buczak; Erhan Guven

This survey paper describes a focused literature survey of machine learning (ML) and data mining (DM) methods for cyber analytics in support of intrusion detection. Short tutorial descriptions of each ML/DM method are provided. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in ML/DM approaches, some well-known cyber data sets used in ML/DM are described. The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/DM for cyber security is presented, and some recommendations on when to use a given method are provided.

PLOS Neglected Tropical Diseases | 2014

Prediction of High Incidence of Dengue in the Philippines

Anna L. Buczak; Benjamin Baugher; Steven M. Babin; Liane Ramac-Thomas; Erhan Guven; Yevgeniy Elbert; Phillip T. Koshute; John Mark Velasco; Vito G. Roque; Enrique A. Tayag; In-Kyu Yoon; Sheri Lewis

Background: Accurate prediction of dengue incidence levels weeks in advance of an outbreak may reduce the morbidity and mortality associated with this neglected disease. Therefore, models were developed to predict high and low dengue incidence in order to provide timely forewarnings in the Philippines.
Methods: Model inputs were chosen based on studies indicating variables that may impact dengue incidence. The method first uses Fuzzy Association Rule Mining techniques to extract association rules from these historical epidemiological, environmental, and socio-economic data, as well as climate data indicating future weather patterns. Selection criteria were used to choose a subset of these rules for a classifier, thereby generating a Prediction Model. The models predicted high or low incidence of dengue in a Philippines province four weeks in advance. The threshold between high and low was determined relative to historical incidence data.
Principal Findings: Model accuracy is described by Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity, and Specificity computed on test data not previously used to develop the model. Selecting a model using the F0.5 measure, which gives PPV more importance than Sensitivity, gave these results: PPV = 0.780, NPV = 0.938, Sensitivity = 0.547, Specificity = 0.978. Using the F3 measure, which gives Sensitivity more importance than PPV, the selected model had PPV = 0.778, NPV = 0.948, Sensitivity = 0.627, Specificity = 0.974. The decision as to which model has greater utility depends on how the predictions will be used in a particular situation.
Conclusions: This method builds prediction models for future dengue incidence in the Philippines and is capable of being modified for use in different situations, for diseases other than dengue, and for regions beyond the Philippines.
The Philippines dengue prediction models predicted high or low incidence of dengue four weeks in advance of an outbreak with high accuracy, as measured by PPV, NPV, Sensitivity, and Specificity.
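
The four accuracy measures named in this abstract are standard ratios over confusion-matrix counts. The following is a minimal sketch of how they are computed; the function name and the counts are hypothetical and are not taken from the paper.

```python
def predictive_metrics(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # fraction of predicted HIGH that were truly HIGH
        "npv": tn / (tn + fn),          # fraction of predicted LOW that were truly LOW
        "sensitivity": tp / (tp + fn),  # fraction of true HIGH that were predicted HIGH
        "specificity": tn / (tn + fp),  # fraction of true LOW that were predicted LOW
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = predictive_metrics(tp=40, fp=10, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```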

BMC Medical Informatics and Decision Making | 2015

Fuzzy association rule mining and classification for the prediction of malaria in South Korea

Anna L. Buczak; Benjamin Baugher; Erhan Guven; Liane Ramac-Thomas; Yevgeniy Elbert; Steven M. Babin; Sheri Lewis

Background: Malaria is the world’s most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality.
Methods: We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as LOW, MEDIUM or HIGH, where these classes are defined as a total of 0–2, 3–16, and above 17 cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak.
Results: Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7–8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH classes. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH classes. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the MEDIUM class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3.
Conclusions: A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures.
The South Korea malaria prediction models predict LOW, MEDIUM or HIGH cases 7–8 weeks in the future. This paper demonstrates that our data driven approach can be used for the prediction of different diseases.
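
The abstract does not spell out how the F0.5 and F3 scores combine PPV and Sensitivity. Assuming the standard F-beta definition, with PPV as precision and Sensitivity as recall, the short sketch below reproduces the reported HIGH-class scores from the reported PPV and Sensitivity. The function name is illustrative.

```python
def f_beta(ppv, sensitivity, beta):
    """Standard F-beta score: beta < 1 favors PPV, beta > 1 favors sensitivity."""
    b2 = beta * beta
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)


# PPV and Sensitivity for the HIGH class, as reported in the abstract.
high_f05 = f_beta(0.842, 0.681, beta=0.5)
high_f3 = f_beta(0.842, 0.681, beta=3)
print(round(high_f05, 3), round(high_f3, 3))  # prints: 0.804 0.694
```

Both values agree with the 0.804 and 0.694 quoted in the abstract, which suggests the standard definition is the one used.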

Applied Imagery Pattern Recognition Workshop | 2010

Speech Emotion Recognition using a backward context

Erhan Guven; Peter Bock

The classification of emotions, such as joy, anger, and anxiety, from tonal variations in human speech is an important task for research and applications in human-computer interaction. Preceding work demonstrated that locally extracted features of speech match or surpass the performance of the global features adopted in current approaches. In this continuing research, a backward context, which can also be considered a feature vector memory, is shown to improve the prediction accuracy of the Speech Emotion Recognition engine. Preliminary results on a German emotional speech database illustrate significant improvements over results from the previous study.
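
The abstract describes the backward context only as a feature vector memory. One common way to realize such a memory is to append the feature vectors of the preceding frames to the current frame's vector. The sketch below illustrates that idea under stated assumptions: the function name, the zero padding at the start, and the window length are all hypothetical and are not the authors' implementation.

```python
def add_backward_context(frames, context):
    """Append the `context` preceding feature vectors to each frame's vector.

    Frames at the start that lack enough history are padded with zero vectors.
    """
    if not frames:
        return []
    dim = len(frames[0])
    padded = [[0.0] * dim for _ in range(context)] + frames
    augmented = []
    for i in range(len(frames)):
        window = padded[i:i + context + 1]  # ordered oldest to current
        vector = []
        for features in reversed(window):   # current frame first, then older frames
            vector.extend(features)
        augmented.append(vector)
    return augmented


# Hypothetical two-dimensional local features for three consecutive frames.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
augmented = add_backward_context(frames, context=1)
print(augmented)
```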

Biomedical Engineering and Computational Biology | 2016

Prediction of Peaks of Seasonal Influenza in Military Health-Care Data

Anna L. Buczak; Benjamin Baugher; Erhan Guven; Linda J. Moniz; Steven M. Babin; Jean-Paul Chretien

Influenza is a highly contagious disease that causes seasonal epidemics with significant morbidity and mortality. The ability to predict the influenza peak several weeks in advance would allow for timely preventive public health planning and interventions to mitigate these outbreaks. Because influenza may also impact the operational readiness of active duty personnel, the US military places a high priority on surveillance and preparedness for seasonal outbreaks. A method for creating models for predicting peak influenza visits per total health-care visits (i.e., activity) weeks in advance has been developed using advanced data mining techniques on disparate epidemiological and environmental data. The model results are presented and compared with those of other popular data mining classifiers. By rigorously testing the model on data not used in its development, it is shown that this technique can predict the week of highest influenza activity for a specific region with overall better accuracy than other methods examined in this article.

Procedia Computer Science | 2013

An OpenCL Framework for Fuzzy Associative Classification and its Application to Disease Prediction

Erhan Guven; Anna L. Buczak

Recently, the broad availability of online and soft real-time data has been attracting corporations and researchers towards data analytics. Though newer and faster algorithms are developed, as the available dataset sizes increase exponentially, computational processing has been falling behind. The trend in computational resources is mostly towards multiple cores and parallel processing, while CPU clock speed improvements are slowing down. An important computational resource is the Graphics Processing Unit with multiple cores, surpassing 2,000 processing elements in one processing unit and operating at almost 5 teraflops, such as the recently introduced GeForce GTX Titan. In this study, an Open Computing Language (OpenCL) parallel processing framework for fuzzy associative classification is described. The hybrid CPU-GPU implementation is developed and employed for prediction of infectious disease outbreaks, specifically influenza, using environmental and disease data readily available online. The performance of the implementation is compared with that of another in-house developed Fuzzy Association Rule Mining operator, FARM, and across four distinct parallel processing environments: a four-processor, 64-thread Opteron server; a two-processor, 24-thread Xeon server; a GeForce GTX 680 GPU card; and a Radeon HD 7950 GPU card. The advantages and disadvantages of the OpenCL implementation on parallel processors are discussed.

PLOS ONE | 2018

Ensemble method for dengue prediction

Anna L. Buczak; Benjamin Baugher; Linda J. Moniz; Thomas Bagley; Steven M. Babin; Erhan Guven

Background: In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season); 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date.
Methods: Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations.
Principal findings: Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week.
Conclusions: The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for a transmission season and the peak height for Iquitos, Peru.

Procedia Computer Science | 2012

Note and Timbre Classification by Local Features of Spectrogram

Erhan Guven; A. Murat Ozbayoglu

In recent years, very large-scale online music databases containing more than 10 million tracks have become prevalent, fostered by the availability of streaming and downloading services via the World Wide Web. The set of access schemes, or Music Information Retrieval (MIR), still poses several partially solved problems, especially the personalization of access, such as query by humming, melody, mood, style, genre, instrument, etc. Generally, previous approaches utilized the spectral features of the music track and extracted several high-level features such as pitch, cepstral coefficients, and power, and time-domain features such as onset, tempo, etc. In this work, however, the low-level local features of the spectrogram partitioned by means of the Bark scale are utilized to extract the quantized time-frequency-power features to be used by a Support Vector Machine to classify the notes (melody) and the timbre (instrument) of 128 instruments of the General MIDI standard. A database of 3-second sound clips of notes C4 to C5 on 7 sound cards using two software synthesizers is constructed and used for experimental note and timbre classification. The preliminary results of 13-category music note and 16-category timbre classifications are promising, and their performance scores surpass those of previously proposed methods.
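
The abstract does not say which Bark-scale formula is used to partition the spectrogram. As an illustration only, the sketch below uses Traunmüller's (1990) approximation to map a frequency in hertz to a Bark value and an integer band index. The function names and the choice of formula are assumptions, not details from the paper.

```python
def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_band(freq_hz):
    """Integer Bark band index for a frequency, clamped at zero."""
    return max(0, int(hz_to_bark(freq_hz)))


# Equal-tempered fundamental frequencies of the notes bounding the range in the abstract.
for note, freq in [("C4", 261.63), ("C5", 523.25)]:
    print(note, round(hz_to_bark(freq), 2), bark_band(freq))
```

Spectrogram bins can then be grouped by band index, so that frequency resolution follows the ear's critical bands rather than a linear axis.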

Archive | 2006