
Publication


Featured research published by Juan M. Banda.


Proceedings of the National Academy of Sciences of the United States of America | 2016

Characterizing treatment pathways at scale using the OHDSI network

George Hripcsak; Patrick B. Ryan; Jon D. Duke; Nigam H. Shah; Rae Woong Park; Vojtech Huser; Marc A. Suchard; Martijn J. Schuemie; Frank J. DeFalco; Adler J. Perotte; Juan M. Banda; Christian G. Reich; Lisa M. Schilling; Michael E. Matheny; Daniella Meeker; Nicole L. Pratt; David Madigan

Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.
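As a rough illustration of the "unique pathway" statistic mentioned in the abstract above, the sketch below derives each patient's pathway as the ordered sequence of distinct drugs by first exposure and reports the share of patients whose pathway nobody else follows. The function names, the pathway definition, and the toy records are assumptions made for illustration; they are not the study's actual cohort definitions or OHDSI code.

```python
from collections import Counter

def treatment_pathway(exposures):
    """Ordered sequence of distinct drugs by first exposure.

    `exposures` is a list of (iso_date, drug) pairs for one patient;
    ISO date strings sort chronologically.
    """
    pathway = []
    for _, drug in sorted(exposures):
        if drug not in pathway:
            pathway.append(drug)
    return tuple(pathway)

def unique_pathway_fraction(cohort):
    """Fraction of patients whose pathway is shared by no other patient."""
    pathways = {pid: treatment_pathway(exp) for pid, exp in cohort.items()}
    counts = Counter(pathways.values())
    unique = sum(1 for p in pathways.values() if counts[p] == 1)
    return unique / len(pathways)

# Toy cohort (made-up records, not OHDSI data).
cohort = {
    "p1": [("2012-01-05", "metformin")],
    "p2": [("2012-02-10", "metformin")],
    "p3": [("2012-03-01", "metformin"), ("2012-09-15", "glipizide")],
    "p4": [("2012-06-01", "metformin"), ("2012-04-20", "insulin")],
}

fraction = unique_pathway_fraction(cohort)
print(f"{fraction:.0%} of patients follow a unique pathway")
```

In a distributed setting like the one described, each site would presumably compute such counts locally and share only the aggregates.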


International Conference on Image Processing | 2013

A large-scale solar image dataset with labeled event regions

Michael A. Schuh; Rafal A. Angryk; Karthik Ganesan Pillai; Juan M. Banda; Petrus C. H. Martens

This paper introduces a new public benchmark dataset of solar image data from the Solar Dynamics Observatory (SDO) mission. This is the first release, which contains over 15,000 images and nearly 24,000 solar events, spanning the first six months of 2012. It combines region-based event labels from six automated detection modules, ten pre-computed image parameters for each cell over a grid-based segmentation of the full resolution images, and a lower resolution version of the images for further analysis and visualization. Together, these components serve as a standardized, ready-to-use, solar image dataset for general image processing research, without requiring the necessary background knowledge to properly prepare it. We present here the fundamental dataset creation details and outline future improvements and opportunities as data collection continues for the coming years.


IEEE International Conference on Fuzzy Systems | 2009

On the effectiveness of fuzzy clustering as a data discretization technique for large-scale classification of solar images

Juan M. Banda; Rafal A. Angryk

This paper presents experimental results on the use of fuzzy clustering as a discretization technique for the purpose of solar image recognition. By extracting texture features from our solar images, and subsequently applying fuzzy clustering techniques to these features, we were able to determine which clustering algorithm and which algorithm initialization parameters produced the best data discretization. Based on these results, we discretized some of our texture features and ran them on two different classifiers, comparing how well the classifiers performed on our original data versus the discretized data. Our experimental results demonstrate that discretization of our data via fuzzy clustering carries significant potential, since our classifiers produced similar results on the original and the discretized data, and the reduction of storage space achieved through cluster-based discretization has been very significant.
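The general idea of cluster-based discretization can be sketched as follows: fit fuzzy c-means to a one-dimensional feature, then replace each value with the index of its nearest cluster center. This is a minimal sketch under stated assumptions, not the paper's implementation: the quantile-based initialization, the fuzzifier `m = 2`, and the toy feature values are all illustrative choices.

```python
def fuzzy_c_means_1d(values, n_clusters, m=2.0, n_iter=100):
    """Return sorted cluster centers found by fuzzy c-means on 1-D data."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: evenly spaced quantiles of the data.
    centers = [ordered[(2 * k + 1) * n // (2 * n_clusters)]
               for k in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership of each value in each cluster (each row sums to 1).
        memberships = []
        for x in values:
            dists = [abs(x - c) or 1e-12 for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                 for d_i in dists]
            )
        # Each center is the membership-weighted mean of the data.
        centers = [
            sum((u[k] ** m) * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(n_clusters)
        ]
    return sorted(centers)

def discretize(values, centers):
    """Replace each value with the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in values]

# Toy texture-feature values (made up for illustration).
feature = [0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.93, 0.95]
centers = fuzzy_c_means_1d(feature, n_clusters=3)
codes = discretize(feature, centers)
print(codes)
```

Storing a small integer code per value instead of a floating-point number is where the storage reduction would come from.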


Journal of the American Medical Informatics Association | 2016

Learning statistical models of phenotypes using noisy labeled training data

Vibhu Agarwal; Tanya Podchiyska; Juan M. Banda; Veena V. Goel; Tiffany I. Leung; Evan P. Minty; Timothy E. Sweeney; Elsie Gyang; Nigam H. Shah

OBJECTIVE Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record. METHODS We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard. RESULTS Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively. We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach. CONCLUSIONS Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
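The keyword-based labeling step and the two reported metrics can be illustrated with a small sketch. It covers only the noisy labeling and the precision/accuracy computation; the L1-penalized logistic regression training is omitted. The keyword list, the toy notes, and the simple substring match are assumptions for illustration and are not taken from the paper.

```python
def noisy_label(note, keywords):
    """Label a record 1 if any phenotype keyword appears in its text, else 0."""
    text = note.lower()
    return int(any(k in text for k in keywords))

def precision_and_accuracy(predicted, gold):
    """Precision = TP / (TP + FP); accuracy = correct / total."""
    tp = sum(p == 1 and g == 1 for p, g in zip(predicted, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(predicted, gold))
    correct = sum(p == g for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, correct / len(gold)

# Hypothetical keyword list and toy notes (not from the study).
keywords = ["type 2 diabetes", "t2dm"]
notes = [
    "History of type 2 diabetes, on metformin.",
    "T2DM follow-up visit.",
    "Family history of type 2 diabetes; patient not diabetic.",
    "Seasonal allergies.",
]
gold = [1, 1, 0, 0]  # made-up gold-standard labels

labels = [noisy_label(n, keywords) for n in notes]
precision, accuracy = precision_and_accuracy(labels, gold)
print(labels, round(precision, 2), accuracy)
```

The third note shows why such labels are noisy: the keyword matches even though the patient does not have the phenotype.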


International Conference on Data Mining | 2012

Spatio-temporal Co-occurrence Pattern Mining in Data Sets with Evolving Regions

Karthik Ganesan Pillai; Rafal A. Angryk; Juan M. Banda; Michael A. Schuh; Tim Wylie

Spatio-temporal co-occurring patterns represent subsets of event types that occur together in both space and time. In comparison to previous work in this field, we present a general framework to identify spatio-temporal co-occurring patterns for continuously evolving spatio-temporal events that have polygon-like representations. We also propose a set of measures to identify spatio-temporal co-occurring patterns and propose an Apriori-based spatio-temporal co-occurrence mining algorithm to find prevalent spatio-temporal co-occurring patterns for extended spatial representations that evolve over time. We evaluate our framework on real-life data to demonstrate the effectiveness of our measures and the algorithm. We present results highlighting the importance of our measures in identifying spatio-temporal co-occurrence patterns.


Digital Image Computing: Techniques and Applications | 2010

Selection of Image Parameters as the First Step towards Creating a CBIR System for the Solar Dynamics Observatory

Juan M. Banda; Rafal A. Angryk

This work describes the attribute evaluation portion of the ambitious goal of creating a large-scale content-based image retrieval (CBIR) system for solar phenomena in NASA images from the Solar Dynamics Observatory mission. This mission, with its Atmospheric Imaging Assembly (AIA), is generating eight 4096 x 4096 pixel images every 10 seconds, leading to a data transmission rate of approximately 700 Gigabytes per day from the AIA component alone (the entire mission is expected to send about 1.5 Terabytes of data per day, for a minimum of 5 years). We investigate unsupervised and supervised methods of selecting image parameters and assess their importance for distinguishing between different types of solar phenomena, using correlation analysis and three supervised attribute evaluation methods. By selecting the most relevant image parameters (out of the twelve tested), we expect to be able to save 540 Megabytes per day of storage costs for each parameter that we remove. In addition, we applied several image filtering algorithms to these images in order to investigate the enhancement of our classification results. We confirm our experimental results by running multiple classifiers for comparative analysis on the selected image parameters and filters.
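The data volumes quoted in this abstract can be checked with some back-of-the-envelope arithmetic. The image cadence comes from the abstract; the grid size and bytes per stored value below are assumptions that are not stated in it, chosen only because they happen to reproduce the 540 Megabytes per day figure (read as binary megabytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60

# From the abstract: eight images every 10 seconds.
images_per_day = 8 * (SECONDS_PER_DAY // 10)

# Average size per image implied by approximately 700 GB/day
# (decimal gigabytes assumed).
implied_bytes_per_image = 700e9 / images_per_day

# ASSUMPTIONS, not stated in the abstract: each image is split into a
# 64 x 64 grid of cells and each parameter value takes 2 bytes.
cells_per_image = 64 * 64
bytes_per_value = 2

per_parameter_bytes_per_day = images_per_day * cells_per_image * bytes_per_value
per_parameter_mib_per_day = per_parameter_bytes_per_day / 2**20

print(images_per_day)
print(round(implied_bytes_per_image / 1e6, 1), "MB per image (implied)")
print(per_parameter_mib_per_day, "MiB per parameter per day (under the assumptions)")
```

Other combinations of grid size and value width could give a similar total, so this should be read as a plausibility check rather than a description of the authors' storage layout.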


International Conference on Image Processing | 2011

On the surprisingly accurate transfer of image parameters between medical and solar images

Juan M. Banda; Rafal A. Angryk; Petrus C. H. Martens

In this work we report on the transfer of image parameters that produce good results for medical images to the domain of solar image analysis. Using the first solar domain-specific benchmark dataset that contains multiple types of solar phenomena, we discovered, while constructing a content-based image retrieval (CBIR) system for NASA's Solar Dynamics Observatory (SDO) mission, that we could take advantage of the research on the analysis of images in the medical field. We demonstrate that, while image analysis is a very domain-specific task, there are lessons to be learned and methods to be shared between different fields. In this paper we present an extensive comparative analysis of several different domain-specific datasets in order to provide the solar physics community with some guidance on the well-researched field of medical image analysis, allowing them to transfer knowledge from one applied field to their own.


Scientific Data | 2016

A curated and standardized adverse drug event resource to accelerate drug safety research

Juan M. Banda; Lee Evans; Rami Vanguri; Nicholas P. Tatonetti; Patrick B. Ryan; Nigam H. Shah

Identification of adverse drug reactions (ADRs) during the post-marketing phase is one of the most important goals of drug safety surveillance. Spontaneous reporting systems (SRS) data, which are the mainstay of traditional drug safety surveillance, are used for hypothesis generation and to validate the newer approaches. The publicly available US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data require substantial curation before they can be used appropriately, and applying different strategies for data cleaning and normalization can have a material impact on analysis results. We provide a curated and standardized version of FAERS, removing duplicate case records, applying standardized vocabularies with drug names mapped to RxNorm concepts and outcomes mapped to SNOMED-CT concepts, and providing pre-computed summary statistics about drug-outcome relationships for general consumption. This publicly available resource, along with the source code, will accelerate drug safety research by reducing the amount of time spent performing data management on the source FAERS reports, improving the quality of the underlying data, and enabling standardized analyses using common vocabularies.


Advances in Databases and Information Systems | 2014

Spatiotemporal Co-occurrence Rules

Karthik Ganesan Pillai; Rafal A. Angryk; Juan M. Banda; Tim Wylie; Michael A. Schuh

Spatiotemporal co-occurrence rules (STCORs) discovery is an important problem in many application domains such as weather monitoring and solar physics, which is our application focus. In this paper, we present a general framework to identify STCORs for continuously evolving spatiotemporal events that have extended spatial representations. We also analyse a set of anti-monotone (monotonically non-increasing) and non anti-monotone measures to identify STCORs. We then validate and evaluate our framework on a real-life data set and report results comparing the number of candidates needed to discover actual patterns, memory usage, and the number of STCORs discovered using the anti-monotonic and non anti-monotonic measures.


International Semantic Web Conference | 2015

Provenance-centered dataset of drug-drug interactions

Juan M. Banda; Tobias Kuhn; Nigam H. Shah; Michel Dumontier

Over the years, several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision-making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources, to provide researchers with a resource for leveraging the work of others in their prediction methods. Because one of the main obstacles to using external resources is the mapping between the drug names and identifiers they use, we also provide the set of mappings we curated in order to compare the multiple sources we aggregate in our dataset.

Collaboration


Dive into Juan M. Banda's collaborations.

Top Co-Authors


Tim Wylie

Montana State University
