Publication


Featured research published by Susmita Datta.


Journal of Statistical Computation and Simulation | 2017

Temporal prediction of future state occupation in a multistate model from high-dimensional baseline covariates via pseudo-value regression

Sandipan Dutta; Susmita Datta; Somnath Datta

ABSTRACT In many complex diseases such as cancer, a patient goes through various disease stages before reaching a terminal state (say, disease-free or death). This fits a multistate model framework, in which a prognosis may be equivalent to predicting the state occupation at a future time t. With the advent of high-throughput genomic and proteomic assays, a clinician may intend to use such high-dimensional covariates to make better predictions of state occupation. In this article, we offer a practical solution to this problem by combining a useful technique called pseudo-value (PV) regression with a latent factor or penalized regression method such as partial least squares (PLS) or the least absolute shrinkage and selection operator (LASSO), or their variants. We explore the predictive performance of these combinations in various high-dimensional settings via extensive simulation studies. Overall, this strategy works fairly well provided the models are tuned properly. PLS turns out to be slightly better than LASSO in most of the settings we investigated for the purpose of temporal prediction of future state occupation. We illustrate the utility of these PV-based high-dimensional regression methods using a lung cancer data set with the patients’ baseline gene expression values.
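The pseudo-value idea above can be illustrated with a short jackknife sketch: for subject i, the pseudo-value is n·θ̂ - (n - 1)·θ̂ computed without subject i, where θ̂ estimates the probability of occupying a given state at time t. As assumptions for illustration only, the data are uncensored and θ̂ is the plain proportion of subjects in the state (with censoring, an estimator such as Aalen–Johansen would take its place); the function names and state labels are hypothetical, and the follow-up PLS or LASSO regression of the pseudo-values on gene expression is not shown.

```python
from typing import Callable, List, Sequence


def pseudo_values(data: Sequence, estimator: Callable[[Sequence], float]) -> List[float]:
    """Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = len(data)
    theta_full = estimator(data)
    values = []
    for i in range(n):
        # Leave subject i out and re-estimate the state occupation probability
        leave_one_out = list(data[:i]) + list(data[i + 1:])
        values.append(n * theta_full - (n - 1) * estimator(leave_one_out))
    return values


def occupation_proportion(states_at_t: Sequence[str], state: str) -> float:
    """Hypothetical uncensored estimator: share of subjects in `state` at time t."""
    return sum(1 for s in states_at_t if s == state) / len(states_at_t)


# Hypothetical states of four patients at a fixed future time t
states = ["diseased", "disease-free", "diseased", "dead"]
pv = pseudo_values(states, lambda d: occupation_proportion(d, "diseased"))
print([round(v, 6) for v in pv])  # → [1.0, 0.0, 1.0, 0.0]
```

Without censoring the pseudo-values reduce to the 0/1 indicators of being in the state, which is a handy sanity check; their value lies in censored data, where they can still be computed and then used as responses in a high-dimensional regression.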


Biometrics | 2018

Profiling the effects of short time‐course cold ischemia on tumor protein phosphorylation using a Bayesian approach

You Wu; Jeremy Gaskins; Maiying Kong; Susmita Datta

Phosphorylated proteins provide insight into tumor etiology and are used as diagnostic, prognostic, and therapeutic markers of complex diseases. However, pre-analytic variations, such as freezing delay after biopsy acquisition, often occur in real hospital settings and potentially lead to inaccurate results. The objective of this work is to develop statistical methodology to assess the stability of phosphorylated proteins under short-time cold ischemia. We consider a hierarchical model to determine if phosphorylation abundance of a protein at a particular phosphorylation site remains constant or not during cold ischemia. When phosphorylation levels vary across time, we estimate the direction of the changes in each protein based on the maximum overall posterior probability and on the pairwise posterior probabilities, respectively. We analyze a dataset of ovarian tumor tissues that suffered cold-ischemia shock before the proteomic profiling. Gajadhar et al. (2015) applied independent clusterings for each patient because of the high heterogeneity across patients, while our proposed model shares information allowing conclusions for the entire sample population. Using the proposed model, 15 out of 32 proteins show significant changes during 1-hour cold ischemia. Through simulation studies, we conclude that our proposed methodology has a higher accuracy for detecting changes compared to an order restricted inference method. Our approach provides inference on the stability of these phosphorylated proteins, which is valuable when using these proteins as biomarkers for a disease.
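The two ways of estimating the direction of change mentioned above can both be computed as Monte Carlo frequencies once posterior draws are available. The sketch assumes it is given posterior draws of one protein's mean phosphorylation level at successive time points (from an MCMC run of some hierarchical model, not shown and not the paper's model); the tolerance `tol` for calling a change constant and all function names are hypothetical choices for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def change_pattern(draw: Sequence[float], tol: float) -> Tuple[str, ...]:
    """Sign of each successive change: '+', '-', or '=' if |change| <= tol."""
    signs = []
    for before, after in zip(draw, draw[1:]):
        diff = after - before
        if abs(diff) <= tol:
            signs.append("=")
        else:
            signs.append("+" if diff > 0 else "-")
    return tuple(signs)


def most_probable_pattern(draws: List[Sequence[float]], tol: float = 0.0):
    """Pattern with the highest Monte Carlo posterior probability, and that probability."""
    counts = Counter(change_pattern(d, tol) for d in draws)
    pattern, hits = counts.most_common(1)[0]
    return pattern, hits / len(draws)


def prob_increase(draws: List[Sequence[float]], j: int, k: int) -> float:
    """Pairwise posterior probability that the mean at time k exceeds the mean at time j."""
    return sum(1 for d in draws if d[k] > d[j]) / len(draws)


# Toy posterior draws of mean levels at three time points
draws = [(1.0, 1.5, 2.0), (1.0, 1.4, 1.9), (1.0, 0.9, 1.2), (1.0, 1.2, 1.1)]
print(most_probable_pattern(draws))  # → (('+', '+'), 0.5)
print(prob_increase(draws, 0, 2))    # → 1.0
```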


Biological Research For Nursing | 2018

Pilot Study of Metabolomics and Psychoneurological Symptoms in Women With Early Stage Breast Cancer

Debra E. Lyon; Angela Starkweather; Yingwei Yao; Timothy J. Garrett; Debra Lynch Kelly; Victoria Menzies; Paweł Dereziński; Susmita Datta; Sreelakshmy Kumar; Colleen Jackson-Cook

Many women with breast cancer experience symptoms of pain, fatigue, and depression, collectively known as psychoneurologic (PN) symptoms, during and after chemotherapy treatment. Evidence is accumulating that inflammatory dysfunction related to cancer and its treatments contributes to the development and persistence of PN symptoms through several interrelated pathways. However, a major factor limiting more precise identification of the biological mechanisms underlying these symptoms is the lack of biological measures that represent a holistic spectrum of biological responses. Metabolomics allows for examination of multiple, co-occurring metabolic pathways and provides a systems-level perspective on biological mechanisms that may contribute to PN symptoms. Methods: In this pilot study, we performed serum metabolome analysis using liquid chromatography high-resolution mass spectrometry, covering global metabolomics and targeted metabolomics of the tryptophan pathway, on archived samples from 19 women with early-stage breast cancer. We used paired t tests to compare metabolite concentrations and Pearson’s correlation coefficients to examine concomitant changes in metabolite concentrations and PN symptoms before and after chemotherapy. Results: Levels of pain, fatigue, and depression increased after chemotherapy. Compared with pre-chemotherapy samples, post-chemotherapy global metabolite profiles were characterized by higher concentrations of acetyl-l-alanine and indoxyl sulfate and lower levels of 5-oxo-l-proline. Targeted analysis indicated significantly higher kynurenine levels and kynurenine/tryptophan ratios post-chemotherapy. Symptoms of pain and fatigue had strong associations with multiple global and several targeted metabolites. Conclusion: The results demonstrated that metabolomics may be useful for elucidating biological mechanisms associated with the development and severity of PN symptoms, specifically pain and fatigue, in women with early-stage breast cancer.
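The two statistics named in the Methods can be sketched with the Python standard library. The numbers below are made up for illustration; p-values (which require the t distribution with n - 1 degrees of freedom) are omitted, and correlating pre-to-post changes in a metabolite with changes in a symptom score is one assumed reading of "concomitant changes".

```python
import math
import statistics
from typing import Sequence


def paired_t(pre: Sequence[float], post: Sequence[float]) -> float:
    """Paired t statistic: mean within-subject difference divided by its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical metabolite levels before and after chemotherapy (4 women)
pre = [1.0, 2.0, 3.0, 4.0]
post = [2.0, 3.0, 5.0, 5.0]
print(paired_t(pre, post))  # → 5.0

# Correlate metabolite change with change in a hypothetical symptom score
metabolite_change = [b - a for a, b in zip(pre, post)]
symptom_change = [2.0, 2.0, 4.0, 2.0]
print(pearson_r(metabolite_change, symptom_change))  # → 1.0
```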


Archive | 2017

Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry

Susmita Datta; Bart Mertens



BMC Bioinformatics | 2017

A novel statistical approach for identification of the master regulator transcription factor

Sinjini Sikdar; Susmita Datta

Background: Transcription factors are known to play key roles in carcinogenesis and are therefore gaining popularity as potential therapeutic targets in drug development. A ‘master regulator’ transcription factor often appears to control most of the regulatory activities of the other transcription factors and their associated genes. This ‘master regulator’ sits at the top of the hierarchy of transcriptomic regulation. It is therefore important to identify and target the master regulator transcription factor, both for a proper understanding of the associated disease process and for identifying the best therapeutic option. Methods: We present a novel two-step computational approach for identifying the master regulator transcription factor in a genome. In the first step, we test whether any master regulator transcription factor exists in the system by evaluating the concordance of two ranked lists of transcription factors with a statistical measure; if the concordance measure is statistically significant, we conclude that there is a master regulator. In the second step, our method identifies the master regulator transcription factor, if one exists. Results: In the simulation scenarios, our method performs reasonably well in validating the existence of a master regulator when the number of subjects in each treatment group is reasonably large. In application to two real datasets, our method confirms the existence of master regulators and identifies biologically meaningful ones. R code implementing our method on sample test data can be found at http://www.somnathdatta.org/software. Conclusion: We have developed a screening method for identifying the ‘master regulator’ transcription factor using only gene expression data. Understanding the regulatory structure and finding the master regulator help narrow the search space for biomarkers of complex diseases such as cancer.
In addition to identifying the master regulator, our method provides an overview of the regulatory structure of the transcription factors that control global gene expression profiles and, consequently, cell functioning.
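The first step of the method above tests the concordance of two ranked lists of transcription factors. The abstract does not name the concordance measure or say how the two rankings are built, so this sketch assumes Kendall's tau as the measure and a one-sided permutation test for significance; it illustrates the general idea only and is not the authors' R code. The rank lists are hypothetical and must be aligned item by item.

```python
import random
from itertools import combinations
from typing import Sequence, Tuple


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau for two aligned rankings without ties."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant pair, -1 discordant pair
    return score / (n * (n - 1) / 2)


def permutation_test(x: Sequence[float], y: Sequence[float],
                     n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation p-value for positive concordance between two rankings."""
    rng = random.Random(seed)
    observed = kendall_tau(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any association between the two rankings
        if kendall_tau(x, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical ranks of 8 transcription factors under two criteria
rank_a = [1, 2, 3, 4, 5, 6, 7, 8]
rank_b = [2, 1, 3, 4, 6, 5, 7, 8]
print(permutation_test(rank_a, rank_b))
```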


Biology Direct | 2016

Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers

Sinjini Sikdar; Somnath Datta; Susmita Datta

Background: It is believed that all cancers occur due to mutations or changes in one or more genes. To investigate the significance of the biological pathways disrupted by these genetic mutations, we pursue an integrated analysis using multiple cancer datasets released by the International Cancer Genome Consortium (ICGC). This dataset consists of gene/protein expression profiles of patients receiving treatment, covering three types of cancer: Head and Neck Squamous Cell Carcinoma (HNSC), Lung Adenocarcinoma (LUAD) and Kidney Renal Clear Cell Carcinoma (KIRC). We use pathway analysis to identify all the biological pathways that are active among the patients, and we investigate the roles of the significant pathways using a differential network analysis of the protein expression datasets for each of the three cancers separately. We then integrate the pathway-based results of the three cancers, which provides a more comprehensive picture of all three. Results: From our analysis of the protein expression data, the RAS and PI3K signaling pathways appear overall to play the most significant roles in the three cancers (HNSC, LUAD and KIRC). Conclusion: This analysis suggests that the RAS and PI3K signaling pathways are the two most important pathways in all three cancers and should be investigated further for their potential roles in cancer. Reviewers: This article was reviewed by Joaquin Dopazo and Samiran Ghosh.
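The abstract integrates pathway-level results across the three cancers but does not say how. One common way to combine independent evidence, assumed here purely for illustration, is Fisher's method: with k independent p-values, X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom its upper tail has the closed form exp(-x/2) Σ_{j<k} (x/2)^j / j!. The p-values below are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's method: combine k independent p-values into a single p-value."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Chi-squared survival function with 2k degrees of freedom (closed form for even df):
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical p-values for one pathway in HNSC, LUAD and KIRC
print(fisher_combined_p([0.01, 0.01, 0.01]))
```

Consistent moderate evidence in all three cancers combines into a much smaller p-value than any single one, which is the intuition behind integrating results across datasets; the method assumes the three analyses are independent.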


Rheumatology | 2018

The subgingival microbiome in patients with established rheumatoid arthritis

Ted R. Mikuls; Clay Walker; Fang Qiu; Fang Yu; Geoffrey M. Thiele; Barnett Alfant; Eric Li; Lisa Y Zhao; Gary P. Wang; Susmita Datta; Jeffrey B. Payne

Objectives: To profile and compare the subgingival microbiome of patients with rheumatoid arthritis (RA) with that of osteoarthritis (OA) controls. Methods: RA (n = 260) and OA (n = 296) patients underwent a full-mouth examination, and subgingival samples were collected. Bacterial DNA was profiled using 16S rRNA Illumina sequencing. Following data filtering and normalization, hierarchical clustering analysis was used to group samples. Multivariable regression was used to examine associations of patient factors with membership in the two largest clusters. Differential abundance between RA and OA was examined using the voom method and linear modelling with empirical Bayes moderation (Linear Models for Microarray Analysis, limma), accounting for the effects of periodontitis, race, marital status and smoking. Results: Alpha diversity indices were similar in RA and OA after accounting for periodontitis. After filtering, 286 taxa were available for analysis. Samples grouped into one of seven clusters with membership sizes of 324, 223, 3, 2, 2, 1 and 1 patients, respectively. RA-OA status was not associated with cluster membership. Factors associated with cluster 1 (vs 2) membership included periodontitis, smoking, marital status and Caucasian race. After accounting for periodontitis, 10 taxa (3.5% of those examined) were less abundant in RA than in OA. Neither these lower-abundance taxa nor the other select taxa examined were associated with RA autoantibody concentrations. Conclusion: Leveraging data from a large case-control study and accounting for multiple factors known to influence oral health status, this study failed to identify a subgingival microbial fingerprint that could reliably discriminate RA from OA patients.
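The voom step mentioned in the Methods begins by converting raw counts to log2 counts per million, log2((r + 0.5) / (R + 1) × 10^6), where r is a taxon's read count in a sample and R is that sample's library size; voom then estimates a mean-variance trend that gives each observation a precision weight for the limma linear model, which is not reproduced here. A minimal sketch of the transformation only, with made-up counts:

```python
import math
from typing import List, Sequence


def log_cpm(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """voom-style log2 counts per million; counts[i][g] = reads of taxon g in sample i."""
    result = []
    for sample in counts:
        library_size = sum(sample)
        # The 0.5 offset keeps zero counts finite; +1 keeps the ratio below 1e6
        result.append([math.log2((c + 0.5) / (library_size + 1) * 1e6) for c in sample])
    return result


# Two hypothetical samples, three taxa each
counts = [[0, 150, 850], [10, 90, 899]]
for row in log_cpm(counts):
    print([round(v, 2) for v in row])
```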


Journal of Applied Statistics | 2018

Bayesian hierarchical model for protein identifications

Riten Mitra; Ryan Gill; Sinjini Sikdar; Susmita Datta

ABSTRACT In proteomics, identifying proteins from complex mixtures of proteins extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a ‘two-step’ procedure: peptides are identified first, followed by a separate protein identification step. In this setup, the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo based Bayesian hierarchical model, the first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence between proteins by placing clustering group priors on the proteins, based on the assumption that proteins sharing the same biological pathway are correlated and likely to be present or absent together. Because the full conditional distributions of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties for all relevant parameters. The model has better operating characteristics than two existing ‘one-step’ procedures across a range of simulation settings as well as on two well-studied datasets.
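The key computational point above is that tractable full conditionals make a Gibbs sampler possible. The sketch shows that mechanism on a deliberately simplified, hypothetical model, not the paper's: each protein j is present (z_j = 1) with a shared probability π ~ Beta(a, b), loosely mimicking a group prior for proteins in one pathway, and each of its peptides is detected with probability p1 if the protein is present and p0 if it is absent, with p1 and p0 assumed known. The sampler alternates between z_j given π and the peptides (a Bernoulli draw) and π given z (a Beta draw), and reports posterior presence probabilities.

```python
import random
from typing import List, Sequence


def gibbs_protein_presence(
    detected: Sequence[Sequence[int]],  # detected[j] = 0/1 flags for protein j's peptides
    p1: float = 0.8,     # assumed detection probability if the protein is present
    p0: float = 0.1,     # assumed false detection probability if the protein is absent
    a: float = 1.0,
    b: float = 1.0,      # Beta(a, b) prior on the shared presence probability pi
    n_iter: int = 3000,
    burn_in: int = 500,
    seed: int = 1,
) -> List[float]:
    rng = random.Random(seed)
    m = len(detected)
    z = [1] * m
    pi = 0.5
    present_counts = [0] * m
    for it in range(n_iter):
        # Full conditional of z_j: Bernoulli, proportional to prior times peptide likelihood
        # (a real implementation would work on the log scale to avoid underflow)
        for j, peptides in enumerate(detected):
            w1, w0 = pi, 1.0 - pi
            for d in peptides:
                w1 *= p1 if d else 1.0 - p1
                w0 *= p0 if d else 1.0 - p0
            z[j] = 1 if rng.random() < w1 / (w1 + w0) else 0
        # Full conditional of pi: Beta(a + #present, b + #absent)
        n_present = sum(z)
        pi = rng.betavariate(a + n_present, b + m - n_present)
        if it >= burn_in:
            for j in range(m):
                present_counts[j] += z[j]
    kept = n_iter - burn_in
    return [c / kept for c in present_counts]


# Hypothetical data: protein A has 4/4 peptides detected, protein B has 0/4
print(gibbs_protein_presence([[1, 1, 1, 1], [0, 0, 0, 0]]))
```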


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2018

A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data

Maiju Pesonen; Jaakko Nevalainen; S. Steven Potter; Somnath Datta; Susmita Datta

A major challenge with genomics data is detecting interactions that display functional associations from large-scale observations. In this study, a new algorithm, cPLS, combining the partial least squares approach with negative binomial regression, is proposed to reconstruct a genomic association network from high-dimensional next-generation sequencing count data. The approach is applicable to raw count data without requiring any further pre-processing steps. In the settings investigated, cPLS outperformed two widely used comparison methods: graphical lasso and weighted correlation network analysis. In addition, cPLS can estimate the full network for thousands of genes without a major computational load. Finally, we demonstrate that cPLS is capable of finding biologically meaningful associations by analyzing an example data set from a previously published study of the molecular anatomy of craniofacial development.
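cPLS applies negative binomial regression directly to raw counts. The reason a negative binomial model suits such data is overdispersion: with mean μ and size (dispersion) r, its variance is μ + μ²/r, larger than the Poisson variance μ. A minimal sketch of the log probability mass function in that mean-dispersion parameterization, with a numerical check of its mean and variance; the PLS step and the regression fit of cPLS itself are not shown, and the function names are hypothetical.

```python
import math


def nb_logpmf(y: int, mu: float, r: float) -> float:
    """log P(Y = y) for a negative binomial with mean mu and size (dispersion) r."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb_moments(mu: float, r: float, max_y: int = 2000):
    """Numerically sum the pmf to check total mass, mean and variance."""
    probs = [math.exp(nb_logpmf(y, mu, r)) for y in range(max_y + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Total mass ≈ 1, mean ≈ 5, variance ≈ 5 + 25/2 = 17.5 (versus 5 under a Poisson model)
print(nb_moments(mu=5.0, r=2.0))
```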


Biology Direct | 2018

Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles

Alejandro R. Walker; Tyler L. Grimes; Somnath Datta; Susmita Datta

Background: Microbial communities can be location specific, and the abundance of species within locations can influence our ability to determine whether a sample belongs to one city or another. As part of the 2017 CAMDA MetaSUB Inter-City Challenge, next-generation sequencing (NGS) data were generated from swipe samples collected from subway stations in Boston, New York City (hereafter New York), and Sacramento. DNA was extracted and sequenced with Illumina technology. Sequencing data for all cities were provided as part of the 2017 CAMDA contest challenge dataset. Results: Principal component analysis (PCA) showed clear clustering of the samples by city, with a substantial proportion of the variance explained by the first three components. We ran two different classifiers, and the results were robust, with error rates below 6% and accuracy above 95%. Analysis of variance (ANOVA) demonstrated that, overall, bacterial composition differs significantly across the three cities. A similar conclusion was reached with a novel bootstrap-based test using diversity indices. Last but not least, co-abundance association network analyses at the taxonomic levels “order”, “family”, and “genus” found different patterns of bacterial networks for the three cities. Conclusions: Bacterial fingerprints can be useful for predicting sample provenance. In this work, provenance was predicted with over 95% accuracy. Association-based network analysis emphasized similarities between the closest cities, which share a common bacterial composition. ANOVA showed different patterns of bacterial composition among cities, and these findings strongly suggest that bacterial signatures differ across multiple cities. This work advocates a data analysis pipeline that could be followed to gain biological insight from such data.
However, the biological conclusions from this analysis are only an early indication from pilot microbiome data provided to us through the CAMDA 2017 challenge and will be subject to change as more complete data sets become available in the near future. This microbiome data can have potential applications in forensics, ecology, and other sciences. Reviewers: This article was reviewed by Klas Udekwu, Alexandra Graf, and Rafal Mostowy.
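The abstract mentions a novel bootstrap-based test using diversity indices without giving its details. As a hedged stand-in, not the authors' test, the sketch compares the mean Shannon index H = -Σ p_i ln p_i of two cities by resampling both groups from the pooled per-sample indices (which imposes the null hypothesis of no difference) and counting how often the resampled difference is at least as extreme as the observed one. The city names and taxon counts are made up.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity index of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_diversity_test(
    city_a: List[Sequence[int]], city_b: List[Sequence[int]],
    n_boot: int = 2000, seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided bootstrap test for a difference in mean Shannon index between two groups."""
    rng = random.Random(seed)
    h_a = [shannon(s) for s in city_a]
    h_b = [shannon(s) for s in city_b]
    observed = statistics.mean(h_a) - statistics.mean(h_b)
    pooled = h_a + h_b  # resampling both groups from the pool imposes the null
    extreme = 0
    for _ in range(n_boot):
        boot_a = rng.choices(pooled, k=len(h_a))
        boot_b = rng.choices(pooled, k=len(h_b))
        diff = statistics.mean(boot_a) - statistics.mean(boot_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_boot + 1)


# Hypothetical taxon counts: an even community versus a dominated one
city_a = [[10, 10, 10, 10], [12, 9, 10, 9], [8, 11, 10, 11], [10, 10, 9, 11]]
city_b = [[40, 0, 0, 0], [38, 2, 0, 0], [39, 1, 0, 0], [40, 0, 0, 0]]
print(bootstrap_diversity_test(city_a, city_b))
```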

Collaboration


Top Co-Authors

Maiying Kong
University of Louisville

Ryan Gill
University of Louisville

Bart Mertens
Leiden University Medical Center

Chathura Siriwardhana
University of Hawaii at Manoa