Publication


Featured research published by Sanjoy Dey.


Nucleic Acids Research | 2012

Co-clustering phenome–genome for phenotype classification and disease gene discovery

Tae Hyun Hwang; Gowtham Atluri; Maoqiang Xie; Sanjoy Dey; Changjin Hong; Vipin Kumar; Rui Kuang

Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
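As a rough illustration of what a graph-regularized non-negative matrix tri-factorization looks like, a generic objective can be written as below. This is a textbook-style sketch, not the exact R-NMTF objective from the paper: the symbols $X$, $F$, $S$, $G$, $L_P$, $L_G$, $\alpha$ and $\beta$ are assumptions introduced here, and the supervision terms for disease classes and pathways are omitted.

```latex
% X   : m x n phenotype-gene association matrix (non-negative)
% F   : m x k_1 phenotype cluster indicators
% G   : n x k_2 gene cluster indicators
% S   : k_1 x k_2 associations between phenotype clusters and gene clusters
% L_P : graph Laplacian of the phenotype similarity network
% L_G : graph Laplacian of the protein-protein interaction network
\min_{F \ge 0,\; S \ge 0,\; G \ge 0}\;
  \lVert X - F S G^{\top} \rVert_F^{2}
  + \alpha \, \mathrm{tr}\!\left(F^{\top} L_P F\right)
  + \beta  \, \mathrm{tr}\!\left(G^{\top} L_G G\right)
```

The first term asks the three factors to reconstruct the association matrix; the two trace terms penalize cluster assignments that separate phenotypes or genes that are neighbors in their respective networks.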


PLOS ONE | 2014

Proteomic Profiles in Acute Respiratory Distress Syndrome Differentiates Survivors from Non-Survivors

Maneesh Bhargava; Trisha Becker; Kevin J. Viken; Pratik Jagtap; Sanjoy Dey; Michael Steinbach; Baolin Wu; Vipin Kumar; Peter B. Bitterman; David H. Ingbar; Christine H. Wendt

Acute Respiratory Distress Syndrome (ARDS) continues to have a high mortality. Currently, there are no biomarkers that provide reliable prognostic information to guide clinical management or stratify risk among clinical trial participants. The objective of this study was to probe the bronchoalveolar lavage fluid (BALF) proteome to identify proteins that differentiate survivors from non-survivors of ARDS. Patients were divided into early-phase (1 to 7 days) and late-phase (8 to 35 days) groups based on time after initiation of mechanical ventilation for ARDS (Day 1). Isobaric tags for absolute and relative quantitation (iTRAQ) with LC MS/MS was performed on pooled BALF enriched for medium and low abundance proteins from early-phase survivors (n = 7), early-phase non-survivors (n = 8), and late-phase survivors (n = 7). Of the 724 proteins identified at a global false discovery rate of 1%, quantitative information was available for 499. In early-phase ARDS, proteins more abundant in survivors mapped to ontologies indicating a coordinated compensatory response to injury and stress. These included coagulation and fibrinolysis; immune system activation; and cation and iron homeostasis. Proteins more abundant in early-phase non-survivors participate in carbohydrate catabolism and collagen synthesis, with no activation of compensatory responses. The compensatory immune activation and ion homeostatic response seen in early-phase survivors transitioned to cell migration and actin filament based processes in late-phase survivors, revealing dynamic changes in the BALF proteome as the lung heals. Early phase proteins differentiating survivors from non-survivors are candidate biomarkers for predicting survival in ARDS.


American Journal of Physiology-Lung Cellular and Molecular Physiology | 2013

Protein expression profile of rat type two alveolar epithelial cells during hyperoxic stress and recovery

Maneesh Bhargava; Sanjoy Dey; Trisha Becker; Michael Steinbach; Baolin Wu; Sang Mee Lee; LeeAnn Higgins; Vipin Kumar; Peter B. Bitterman; David H. Ingbar; Christine H. Wendt

In rodent model systems, the sequential changes in lung morphology resulting from hyperoxic injury are well characterized and are similar to changes in human acute respiratory distress syndrome. In the injured lung, alveolar type two (AT2) epithelial cells play a critical role in restoring the normal alveolar structure. Thus characterizing the changes in AT2 cells will provide insights into the mechanisms underpinning the recovery from lung injury. We applied an unbiased systems-level proteomics approach to elucidate molecular mechanisms contributing to lung repair in a rat hyperoxic lung injury model. AT2 cells were isolated from rat lungs at predetermined intervals during hyperoxic injury and recovery. Protein expression profiles were determined by using iTRAQ with tandem mass spectrometry. Of the 959 distinct proteins identified, 183 significantly changed in abundance during the injury-recovery cycle. Gene ontology enrichment analysis identified cell cycle, cell differentiation, cell metabolism, ion homeostasis, programmed cell death, ubiquitination, and cell migration to be significantly enriched by these proteins. Gene set enrichment analysis of data acquired during lung repair revealed differential expression of gene sets that control multicellular organismal development, systems development, organ development, and chemical homeostasis. More detailed analysis identified activity in two regulatory pathways, JNK and miR 374. A novel short time-series expression miner algorithm identified protein clusters with coherent changes during injury and repair. We concluded that coherent changes occur in the AT2 cell proteome in response to hyperoxic stress. These findings offer guidance regarding the specific molecular mechanisms governing repair of the injured lung.


PLOS ONE | 2013

Characteristics of Diarrheal Illnesses in Non-Breast Fed Infants Attending a Large Urban Diarrheal Disease Hospital in Bangladesh.

Sanjoy Dey; Mohammod Jobayer Chisti; Sumon Kumar Das; Chandan Kumar Shaha; Farzana Ferdous; Fahmida Dil Farzana; Shahnawaz Ahmed; Mohammad Abdul Malek; Abu Syed Golam Faruque; Tahmeed Ahmed; Mohammed Abdus Salam

Background: Lack of breast feeding is associated with higher morbidity and case-fatality from both bacterial and viral etiologic diarrheas. However, there are very limited data on the characteristics of non-breastfed infants attending hospital with diarrheal illnesses caused by common bacterial and viral pathogens. Our objective was to assess the impact of lack of breast feeding on diarrheal illnesses in infants living in urban Bangladesh. Methods: We extracted data of infants (0–11 months) for analyses from the data archive of Diarrheal Disease Surveillance System (DDSS) of the Dhaka Hospital of icddr,b for the period 2008–2011. Results: The prevalence of breastfeeding in infants attending the hospital with diarrhea reduced from 31% in 2008 to 17% in 2011, with a corresponding increase in the prevalence of non-breastfed infants (chi square for trend <0.001). Among breastfed infants, the incidence of rotavirus infections was higher (43%) among the 0–5 months age group than infants aged 9–11 months (18%). On the other hand, among non-breastfed infants, the incidence of rotavirus infections was much higher (82%) among 9–11 months old infants compared to those in 0–5 months age group (57%) (chi square for trend <0.001). Very similar trends were also observed in the incidence of cholera and ETEC diarrheas among different age groups of breastfed and non-breastfed infants (chi square for trend 0.020 and 0.001 respectively). However, for shigellosis, the statistical difference remained unchanged among both the groups (chi square for trend 0.240). Conclusion and Significance: We observed a protective role of breastfeeding in infantile diarrhea caused by the major viral and common bacterial agents. These findings underscore the importance of promotion and expansion of breastfeeding campaigns in Bangladesh and elsewhere.


Journal of Healthcare Engineering | 2011

Interpretable Predictive Models for Knowledge Discovery from Home-Care Electronic Health Records

Bonnie L. Westra; Sanjoy Dey; Gang Fang; Michael Steinbach; Vipin Kumar; Cristina Oancea; Kay Savik; Mary Dierich

The purpose of this methodological study was to compare methods of developing predictive rules that are parsimonious and clinically interpretable from electronic health record (EHR) home visit data, contrasting logistic regression with three data mining classification models. We address three problems commonly encountered in EHRs: the value of including clinically important variables with little variance, handling imbalanced datasets, and ease of interpretation of the resulting predictive models. Logistic regression and three classification models using Ripper, decision trees, and Support Vector Machines were applied to a case study for one outcome of improvement in oral medication management. Predictive rules for logistic regression, Ripper, and decision trees are reported and results compared using F-measures for data mining models and area under the receiver-operating characteristic curve for all models. The rules generated by the three classification models provide potentially novel insights into mining EHRs beyond those provided by standard logistic regression, and suggest steps for further study.
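The two evaluation measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's evaluation code: the function names `f_measure` and `roc_auc` and the toy labels are assumptions made here. The AUC is computed as the fraction of positive/negative pairs that are ranked correctly, which is equivalent to the area under the ROC curve.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for the positive class (label 1) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 2 positives among 8 records.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
print(f_measure(y_true, y_pred))                         # accuracy is 0.75, F1 is lower
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

On imbalanced data such as the toy labels above, accuracy can look acceptable while the F-measure for the minority class stays low, which is one reason to report it alongside AUC.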


International Journal of Medical Informatics | 2016

Clustering of elderly patient subgroups to identify medication-related readmission risks

Catherine H. Olson; Sanjoy Dey; Vipin Kumar; Karen A. Monsen; Bonnie L. Westra

INTRODUCTION High Risk Medication Regimen (HRMR) scores are weakly predictive of hospital readmissions for elderly home health care patients. HRMR is composed of three elements related to drug risks: polypharmacy (number of medications); Potentially Inappropriate Medications (PIM) known to be harmful to the elderly; and the Medication Regimen Complexity Index (MRCI) that weighs drugs by the complexity of their dosing and instructions. In this paper, we hypothesized that HRMR scores are more predictive for demographic subgroups of elderly patients. The study used Outcome and Assessment Information Set (OASIS) variables to identify subgroups of patients for whom the HRMR measures appeared more predictive for hospital readmissions. METHODS OASIS and medication data were reused from a study of 911 patients (355 males, 556 females; mean age 78.9) from 15 Medicare-certified home health care agencies that established the relationship between HRMR and hospital readmissions. Hierarchical agglomerative clustering using the Jaccard distance measure and average-link method identified patient subgroups based on the OASIS data. Receiver operating curve (ROC) analyses evaluated the predictive strength of the HRMR variables for each subgroup. Additional False Discovery Rate (FDR) analyses assessed whether the clustered relationships were chance. RESULTS Clustering of OASIS data for 911 patients identified six subgroups: patients with Good Functional Status (n=382); Females with Moderate to Severe Pain (n=354); patients with poor prognosis needing functional status assistance (n=419); patients with Poor Functional Status (n=287); Males with Adult Children as Caregiver (n=198); adults living alone with spouses as primary caregiver (n=127). ROC results relating these subgroups to HRMR risks were strongest for Males with Adult Children as Caregivers (AUC: polypharmacy, 0.73; PIM, 0.64; MRCI, 0.77). The findings for this subgroup also met the FDR analysis threshold (<=0.20). 
CONCLUSIONS A risk of medication-related readmissions in elderly men with adult children as caregivers is consistent with research showing problems in medication adherence when seniors are supported by informal caregivers. The results from clustering analysis present a hypothesis for research on HRMR and on the relationship between adult caregivers and their fathers.
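The clustering step described in the methods (agglomerative clustering with Jaccard distance and average linkage) can be sketched as follows. This is a small pure-Python illustration under stated assumptions: each patient is represented as a set of binary attributes, the attribute names are hypothetical and are not OASIS variable names, and the quadratic search is only suitable for toy inputs.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of binary attributes."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_link(c1, c2, items):
    """Mean pairwise Jaccard distance between the members of two clusters."""
    total = sum(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(items, n_clusters):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_link(clusters[ab[0]], clusters[ab[1]], items),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical patients described by sets of binary attributes.
patients = [
    {"female", "pain"},
    {"female", "pain", "lives_alone"},
    {"male", "adult_child_caregiver"},
    {"male", "adult_child_caregiver", "pain"},
]
print(agglomerate(patients, 2))
```

Each resulting cluster is a list of patient indices; a per-cluster ROC analysis would then be run separately on the members of each list.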


SIAM International Conference on Data Mining | 2014

Mining interpretable and predictive diagnosis codes from multi-source electronic health records

Sanjoy Dey; György J. Simon; Bonnie L. Westra; Michael Steinbach; Vipin Kumar

Mining patterns from electronic health-care records (EHR) can potentially lead to better and more cost-effective treatments. We aim to find the groups of ICD-9 diagnosis codes from EHRs that can predict the improvement of urinary incontinence of home health care (HHC) patients and also are interpretable to domain experts. In this paper, we propose two approaches for increasing the interpretability of the obtained groups of ICD-9 codes. First, we incorporate prior information available from clinical domain knowledge using the clinical classification system (CCS). Second, we incorporate additional types of clinical information for the same patients, such as demographic, behavioral, physiological, and psycho-social variables available from survey questions during the hospital visits. Finally, we develop a hybrid framework that can combine both prior information and the data-driven clinical information in the predictive model framework. Our results obtained from a large-scale EHR data set show that the hybrid framework enhances clinical interpretability as compared to the baseline model obtained from ICD-9 codes only, while achieving almost the same predictive


Nursing Research | 2015

Mining Patterns Associated With Mobility Outcomes in Home Healthcare.

Sanjoy Dey; Jacob Cooner; Connie Delaney; Joanna Fakhoury; Vipin Kumar; György J. Simon; Michael Steinbach; Jeremy Weed; Bonnie L. Westra

Background: Mobility is critical for self-management. Understanding factors associated with improvement in mobility during home healthcare can help nurses tailor interventions to improve mobility outcomes and keep patients safely at home. Objectives: The aims were to (a) identify patient and support system factors associated with mobility improvement during home care, (b) evaluate consistency of factors across groups defined by mobility status at the start of home care, and (c) identify patterns of factors associated with improvement and no improvement in mobility within each group. Methods: Outcome and Assessment Information Set data extracted from a national convenience sample of 270,634 patient records collected from October 1, 2008 to December 31, 2009 from 581 Medicare-certified, home healthcare agencies were used. Patients were placed into groups based on mobility scores at admission. Odds ratios were used to index associations of factors with improvement at discharge. Discriminative pattern mining was used to discover patterns associated with improvement of mobility. Results: Overall, mobility improved for 49.4% of patients; improvement occurred most frequently (80%) among patients who were able, at admission, to walk only with the supervision or assistance of another person at all times. Numerous factors associated with improvement in mobility outcome were similar across the groups (except for those who were chairfast but were able to wheel themselves independently); however, the number, strength, and direction of associations varied. In most groups, data mining-discovered patterns of factors associated with the mobility outcome were composed of combinations of functional and cognitive status and the type and amount of help required at home. Discussion: This study provides new data mining-based information about how factors associated with improvement in mobility group together and vary by mobility at admission. These approaches have potential to provide new insights for clinicians to tailor interventions for improvement of mobility.
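The odds ratio used in this abstract to index the association between a factor and improvement can be computed from a 2x2 table. The sketch below is a generic illustration with made-up counts, not the study's data or code; the function names and the Woolf (log-based) confidence interval are choices made here.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: factor present, improved     b: factor present, not improved
    c: factor absent, improved      d: factor absent, not improved
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval from the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 60 of 100 patients with the factor improved, 30 of 100 without it.
print(odds_ratio(60, 40, 30, 70))
print(odds_ratio_ci(60, 40, 30, 70))
```

An odds ratio above 1 with an interval that excludes 1 suggests the factor is associated with improvement; a value below 1 suggests the opposite direction.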


Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine | 2012

A pattern mining based integrative framework for biomarker discovery

Sanjoy Dey; Kelvin O. Lim; Gowtham Atluri; Angus W. MacDonald; Michael Steinbach; Vipin Kumar

Recent advances in high throughput data collection and imaging technologies have resulted in the availability of diverse biomedical datasets that capture complementary information pertaining to the biological processes in an organism. Biomarkers that are discovered by integrating such datasets obtained from case-control studies have the potential to elucidate the biological mechanisms behind complex human diseases. Of particular importance are interaction-type integrative biomarkers, which are biomarkers whose features can explain the disease when taken together, but not when considered individually. We propose a pattern mining based integrative framework (PAMIN) to discover these interaction-type integrative biomarkers from diverse case-control datasets. PAMIN first finds patterns from individual datasets to capture the available information separately and then combines these patterns to find integrated patterns (IPs) consisting of variables from multiple datasets. We also use several interestingness measures to characterize the IPs into specific categories. Using synthetic and real data we compare the IPs found using our approach with those found by CCA and discriminative-CCA (dCCA). Our results indicate that PAMIN is able to discover interaction type integrated patterns that these competing approaches cannot find.
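A toy example may help make the idea of an interaction-type biomarker concrete. The sketch below is not PAMIN; it only shows, on synthetic XOR-style data invented here, how two features drawn from different datasets can each show no association with case status while their combination separates cases from controls. The helper `association` and the data layout are assumptions.

```python
def association(rows, predicate):
    """Case rate among samples matching the predicate minus the case rate among the rest."""
    hit = [label for feats, label in rows if predicate(feats)]
    miss = [label for feats, label in rows if not predicate(feats)]
    return sum(hit) / len(hit) - sum(miss) / len(miss)


# Synthetic samples: (feature x from dataset A, feature y from dataset B), label.
# Label 1 = case, 0 = control. Cases are exactly the samples where x differs from y.
rows = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)] * 5

x_alone = association(rows, lambda f: f[0] == 1)      # x considered individually
y_alone = association(rows, lambda f: f[1] == 1)      # y considered individually
together = association(rows, lambda f: f[0] != f[1])  # x and y considered jointly

print(x_alone, y_alone, together)
```

Any method that screens features one at a time would discard both `x` and `y` here, which is why a search over combinations is needed to find this kind of pattern.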


International Conference on Computational Advances in Bio and Medical Sciences | 2012

Invited: Discovering combinatorial biomarkers

Gowtham Atluri; Sanjoy Dey; Gang Fang; Sean R. Landman; Vanja Paunic; Wen Wang; Michael Steinbach; Vipin Kumar

There has been a dramatic increase in the quantity, quality, and types of advanced biomedical information available to individuals and their medical providers. These types of data include, but are not limited to, cell process information provided by DNA microarrays and RNA-seq, genetic information in the form of Single Nucleotide Polymorphisms (SNPs), metabolomics data in terms of proteins and other metabolites, and structural and functional brain data from magnetic resonance imaging (MRI). Together with the increasing availability of clinical data from electronic medical records, this abundance of data has created the very real possibility of personalized medicine, i.e., using detailed biomedical, clinical, and environmental information about a person for a customized and more effective approach to patient care [11], [16], [3]. Achieving this goal requires identifying those features of the data that can distinguish not only between healthy or low risk subjects (controls) and diseased or high risk subjects (cases), but also among different subgroups of cases. These features are typically predictive patterns (biomarkers) that are associated with the disease or other phenotype of interest. Simple examples are a SNP that indicates a predisposition for a particular disorder or the presence of a protein or small molecule that signals the presence of cancer. These patterns can be directly useful in diagnosis, treatment or prevention, but equally important, they can also provide insights into the underlying nature of the disease or related biomedical processes. Unfortunately, the lack of readily available, easy to use, and effective tools and techniques for finding trustworthy and useful markers is limiting progress in medical research and slowing the advent of personalized medicine [12], [4]. Several well-known challenges are responsible for this lack of progress.
First, the large number of individual factors, e.g., hundreds of thousands or millions of SNPs, often makes it difficult to find statistically significant single markers without large numbers of samples. In addition, the complexity of the diseases being considered also makes it unlikely that meaningful predictive patterns can be based on single factors. Thus, techniques for extracting meaningful associations must be able to discover combinations of factors that show a significant association with a disease phenotype even when single factors have little or no association. However, the search for such high order interactions leads to increased computational complexity, since the number of possible patterns increases exponentially with pattern length. Perhaps an even more serious challenge is that of multiple hypothesis testing, which results from the enormous number of potential patterns (hypotheses) and the resulting increased probability of mistaking spurious patterns for real ones. Yet another complication is the heterogeneous nature of many diseases, i.e., patients with a particular disease may form different subgroups and predictive patterns appropriate for one subgroup may not apply to another. To more fully capture the broad range of factors responsible for complex disorders, it is necessary to undertake the difficult task of integrating diverse types of data from the same set of subjects or from accumulated biomedical knowledge, e.g., functional annotations of genes or proteins. Given the inability of current techniques to handle these challenges (computational complexity, statistical significance, heterogeneity, and data integration), it is no surprise that even when statistically significant patterns are found in one study, they are rarely reproduced in follow-up studies by different groups [13], [14], [5].
This talk will present our group's recent research on pattern mining based approaches for addressing these challenges [8], [7], [9], [6], [10], [15], [2], [20], [17], [18], [1], [19].
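The growth in candidate patterns and its effect on multiple hypothesis testing can be made concrete with a little arithmetic. The sketch below is an illustration added here, not a method from the talk: it counts feature subsets up to a given length (ignoring feature values) and applies a plain Bonferroni correction, which is only one of several possible corrections. The feature count of 1,000 is a hypothetical example.

```python
from math import comb


def n_patterns(n_features, max_len):
    """Number of distinct feature subsets of size 1 through max_len."""
    return sum(comb(n_features, k) for k in range(1, max_len + 1))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical study with 1,000 candidate features.
for max_len in (1, 2, 3):
    tests = n_patterns(1000, max_len)
    print(max_len, tests, bonferroni_threshold(0.05, tests))
```

Going from single features to combinations of up to three features raises the number of hypotheses from a thousand to over a hundred million, so the per-test threshold shrinks by the same factor and far more samples are needed to reach significance.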

Collaboration


Dive into Sanjoy Dey's collaborations.

Top Co-Authors

Vipin Kumar

University of Minnesota

Baolin Wu

University of Minnesota
