Publication


Featured research published by Subhajit Sengupta.


Pacific Symposium on Biocomputing | 2014

BayClone: Bayesian nonparametric inference of tumor subclones using NGS data.

Subhajit Sengupta; Jing Wang; Juhee Lee; Peter Müller; Kamalakar Gulukota; Arunava Banerjee; Yuan Ji

In this paper, we present a novel feature allocation model to describe tumor heterogeneity (TH) using next-generation sequencing (NGS) data. Taking a Bayesian approach, we extend the Indian buffet process (IBP) to define a class of nonparametric models, the categorical IBP (cIBP). A cIBP takes categorical values to denote homozygous or heterozygous genotypes at each SNV. We define a subclone as a vector of these categorical values, each corresponding to an SNV. Instead of partitioning somatic mutations into non-overlapping clusters with similar cellular prevalences, we take a different approach using feature allocation. Importantly, we do not assume somatic mutations with similar cellular prevalence must be from the same subclone and allow overlapping mutations shared across subclones. We argue that this is closer to the underlying theory of phylogenetic clonal expansion, as somatic mutations that occurred in parent subclones should be shared across the parent and child subclones. Bayesian inference yields posterior probabilities of the number, genotypes, and proportions of subclones in a tumor sample, thereby providing point estimates as well as variabilities of the estimates for each subclone. We report results on both simulated and real data. BayClone is available at http://health.bsd.uchicago.edu/yji/soft.html.
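As a rough illustration of the feature allocation idea in this abstract, the following minimal sketch draws a binary matrix from the standard Indian buffet process, with rows read as SNVs and columns as subclones. It is not BayClone code and it is not the categorical IBP described above: entries are only 0/1, and the function names, the `alpha` value, and the Poisson helper are assumptions made for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_rows, alpha, rng):
    """Sample a binary feature matrix from the standard Indian buffet process.

    Rows play the role of SNVs and columns the role of subclones (an
    interpretation assumed here for illustration only).
    """
    column_counts = []  # column_counts[k] = number of rows so far that own feature k
    rows = []
    for i in range(1, num_rows + 1):
        # Row i reuses existing feature k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in column_counts]
        # Row i also opens Poisson(alpha / i) brand-new features.
        new_features = sample_poisson(alpha / i, rng)
        row.extend([1] * new_features)
        column_counts = [m + z for m, z in zip(column_counts, row)] + [1] * new_features
        rows.append(row)
    width = len(column_counts)
    # Pad earlier rows with zeros for features that appeared after them.
    return [row + [0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    matrix = sample_ibp(num_rows=10, alpha=2.0, rng=random.Random(1))
    for row in matrix:
        print("".join(str(v) for v in row))
```

Because a row may own several columns at once, the same mutation can be shared by more than one subclone, which is the overlap that a clustering (partition) model would not allow.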


Journal of the Royal Statistical Society: Series C (Applied Statistics) | 2016

Bayesian inference for intratumour heterogeneity in mutations and copy number variation

Juhee Lee; Peter Müller; Subhajit Sengupta; Kamalakar Gulukota; Yuan Ji

Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers on multiple loci. Heterogeneity can be measured through the identification of the subclonal copy number and sequence at a selected set of loci. Understanding that the accurate identification of variant allele fractions greatly depends on a precise determination of copy numbers, we develop a Bayesian feature allocation model for jointly calling subclonal copy numbers and the corresponding allele sequences for the same loci. The proposed method utilizes three random matrices, L, Z and w, to represent subclonal copy numbers (L), numbers of subclonal variant alleles (Z) and cellular fractions of subclones in samples (w), respectively. The unknown number of subclones implies a random number of columns for these matrices. We use next-generation sequencing data to estimate the subclonal structures through inference on these three matrices. Using simulation studies and a real data analysis, we demonstrate how posterior inference on the subclonal structure is enhanced with the joint modeling of both structure and sequencing variants on subclonal genomes. Software is available at http://compgenome.org/BayClone2.


bioRxiv | 2017

The evolutionary history of 2,658 cancers

Moritz Gerstung; Clemency Jolly; Ignaty Leshchiner; Stefan Dentro; Santiago Gonzalez; Thomas J. Mitchell; Yulia Rubanova; Pavana Anur; Daniel Rosebrock; Kaixian Yu; Maxime Tarabichi; Amit G Deshwar; Jeff Wintersinger; Kortine Kleinheinz; Ignacio Vázquez-García; Kerstin Haase; Subhajit Sengupta; Geoff Macintyre; Salem Malikic; Nilgun Donmez; Dimitri Livitz; Marek Cmero; Jonas Demeulemeester; Steve Schumacher; Yu Fan; Xiaotong Yao; Juhee Lee; Matthias Schlesner; Paul C. Boutros; David Bowtell

Cancer develops through a process of somatic evolution. Here, we use whole-genome sequencing of 2,778 tumour samples from 2,658 donors to reconstruct the life history, evolution of mutational processes, and driver mutation sequences of 39 cancer types. The early phases of oncogenesis are driven by point mutations in a small set of driver genes, often including biallelic inactivation of tumour suppressors. Early oncogenesis is also characterised by specific copy number gains, such as trisomy 7 in glioblastoma or isochromosome 17q in medulloblastoma. By contrast, increased genomic instability, a nearly four-fold diversification of driver genes, and an acceleration of point mutation processes are features of later stages. Copy-number alterations often occur in mitotic crises leading to simultaneous gains of multiple chromosomal segments. Timing analysis suggests that driver mutations often precede diagnosis by many years, and in some cases decades, providing a window of opportunity for early cancer detection.


Nucleic Acids Research | 2016

Ultra-fast local-haplotype variant calling using paired-end DNA-sequencing data reveals somatic mosaicism in tumor and normal blood samples

Subhajit Sengupta; Kamalakar Gulukota; Yitan Zhu; Carole Ober; Katherine Naughton; William Wentworth-Sheilds; Yuan Ji

Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory although experimental work has started to emerge linking somatic mosaicism to disease diagnosis. Through novel statistical modeling of paired-end DNA-sequencing data using blood-derived DNA from healthy donors as well as DNA from tumor samples, we present an ultra-fast computational pipeline, LocHap that searches for multiple single nucleotide variants (SNVs) that are scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data in DNA from normal blood and tumor samples, we find wide-spread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap.
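The local haplotype variant (LHV) definition in this abstract can be illustrated with a minimal counting sketch. This is not the LocHap pipeline, which relies on statistical modeling of paired-end reads; the function name, the input encoding (one allele string per read spanning all SNV sites), and the `min_reads` support threshold are assumptions introduced only for this example.

```python
from collections import Counter


def call_local_haplotype(read_alleles, min_reads=2):
    """Flag a local haplotype as an LHV when more than two allele combinations are supported.

    read_alleles: one string per read (or read pair) covering every SNV site in the
        local haplotype, e.g. "AG" for allele A at site 1 and G at site 2.
    min_reads: hypothetical minimum read support, used here to ignore combinations
        that could plausibly be sequencing errors.
    """
    counts = Counter(read_alleles)
    supported = {hap: n for hap, n in counts.items() if n >= min_reads}
    # A genetically homogeneous diploid cell population carries at most two haplotypes.
    is_lhv = len(supported) > 2
    return is_lhv, supported


if __name__ == "__main__":
    reads = ["AG"] * 5 + ["CT"] * 4 + ["AT"] * 3 + ["CG"]
    print(call_local_haplotype(reads))
```

In this toy input three allele combinations pass the support threshold, so the site pair would be flagged; with only two supported combinations it would not.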


G3: Genes, Genomes, Genetics | 2017

Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize

Heather C. Manching; Subhajit Sengupta; Keith R. Hopper; Shawn W. Polson; Yuan Ji; Randall J. Wisser

High-throughput sequencing (HTS) of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. This requires that a number of issues encountered with GBS be considered, including the sequencing of nonoverlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples with sufficient linkage disequilibrium for accurate imputation. We addressed these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and using Zea mays our results indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. Optimizing size selection to sequence a high proportion of shared loci among individuals in different libraries and using simple in silico filters, a GBS procedure was established that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines. Using local haplotypes reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased GBS in maize also revealed the existence of reproducibly inaccurate (apparent accuracy) genotypes that were due to divergent copy number variants (CNVs) unobservable in the underlying single nucleotide polymorphism (SNP) data.


Archive | 2016

Bayesian Feature Allocation Models for Tumor Heterogeneity

Juhee Lee; Peter Müller; Subhajit Sengupta; Kamalakar Gulukota; Yuan Ji

Tumor samples are composed of subclones that evolve stochastically by acquiring mutations and by selection of those that are beneficial to the survival of the organism or local environment. This process results in the often observed heterogeneity of tumor samples. We review some recent work on a new class of feature allocation models for statistical inference on this tumor heterogeneity. We use next-generation sequencing data. The developed methods identify cell subpopulations (subclones) in tumor samples and allow us to cluster samples based on these identified subclones. We characterize subclones by latent haplotypes that are defined as a scaffold of single nucleotide variations (SNVs) on the same homologous genome. That is, each subclone is defined by a unique set of SNVs. We formally represent these sets of SNVs in a binary matrix with columns corresponding to subclones and entries indicating the presence or absence of a set of SNVs that characterize each subclone. We use a simplified version of the Indian buffet process (IBP) as a prior model on this latent binary matrix. In a model extension we develop a categorical IBP that allows us to incorporate copy number variants (CNVs) in addition to SNVs to jointly define subclones. We illustrate the proposed methods with several data analyses.


Archive | 2015

Estimating Latent Cell Subpopulations with Bayesian Feature Allocation Models

Yuan Ji; Subhajit Sengupta; Juhee Lee; Peter Müller; Kamalakar Gulukota

Tumor cells are genetically heterogeneous. The collection of the entire tumor cell population consists of different subclones that can be characterized by mutations in sequence and structure at various genomic locations. Using next-generation sequencing data, we characterize tumor heterogeneity using Bayesian nonparametric inference. Specifically, we estimate the number of subclones in a tumor sample, and for each subclone, we estimate the subclonal copy number and single nucleotide mutations at a selected set of loci. Posterior summaries are presented in three matrices, namely, the matrix of subclonal copy numbers (\(\boldsymbol{L}\)), subclonal variant alleles (\(\boldsymbol{Z}\)), and the population frequencies of the subclones (\(\boldsymbol{w}\)). The proposed method can handle a single or multiple tumor samples. Computation via Markov chain Monte Carlo yields posterior Monte Carlo samples of all three matrices, allowing for the assessment of any desired inference summary. Simulation and real-world examples are provided as illustration. An R package is available at http://www.cran.r-project.org/web/packages/BayClone2/index.html.


bioRxiv | 2018

Portraits of genetic intra-tumour heterogeneity and subclonal selection across cancer types

Stefan Dentro; Ignaty Leshchiner; Kerstin Haase; Maxime Tarabichi; Jeff Wintersinger; Amit G Deshwar; Kaixian Yu; Yulia Rubanova; Geoff Macintyre; Ignacio Vázquez-García; Kortine Kleinheinz; Dimitri Livitz; Salem Malikic; Nilgun Donmez; Subhajit Sengupta; Jonas Demeulemeester; Pavana Anur; Clemency Jolly; Marek Cmero; Daniel Rosebrock; Steven E. Schumacher; Yu Fan; Matthew Fittall; Ruben M. Drews; Xiaotong Yao; Juhee Lee; Matthias Schlesner; Hongtu Zhu; David J. Adams; Gad Getz

Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin and drivers of ITH across cancer types are poorly understood. To address this question, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples, spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions, with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types, and identify cancer type specific subclonal patterns of driver gene mutations, fusions, structural variants and copy-number alterations, as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution, and provide an unprecedented pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.

Continued evolution in cancers gives rise to intra-tumour heterogeneity (ITH), which is a major mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin and drivers of ITH across cancer types are poorly understood. Here, we extensively characterise ITH across 2,778 cancer whole genome sequences from 36 cancer types. We demonstrate that nearly all tumours (95.1%) with sufficient sequencing depth contain evidence of recent subclonal expansions and most cancer types show clear signs of positive selection in both clonal and subclonal protein coding variants. We find distinctive subclonal patterns of driver gene mutations, fusions, structural variation and copy-number alterations across cancer types. Dynamic, tumour-type specific changes of mutational processes between subclonal expansions shape differences between clonal and subclonal events. Our results underline the importance of ITH and its drivers in tumour evolution and provide an unprecedented pan-cancer resource of extensively annotated subclonal events, laying a foundation for future cancer genomic studies.


Cancer Research | 2018

Abstract 218: The evolutionary history of 2,658 cancers

Clemency Jolly; Moritz Gerstung; Ignaty Leshchiner; Stefan Dentro; Santiago Gonzalez; Thomas J. Mitchell; Yulia Rubanova; Pavana Anur; Daniel Rosebrock; Kaixian Yu; Maxime Tarabichi; Amit G Deshwar; Jeff Wintersinger; Kortine Kleinheinz; Ignacio Vázquez-García; Kerstin Haase; Subhajit Sengupta; Geoff Macintyre; Salem Malikic; Nilgun Donmez; Dimitri Livitz; Mark Cmero; Jonas Demeulemeester; Steve Schumacher; Yu Fan; Xiaotong Yao; Juhee Lee; Matthias Schlesner; Paul C. Boutros; David Bowtell

Cancer develops through a continuous process of somatic evolution. Whole genome sequencing provides a snapshot of the tumor genome at the point of sampling; however, the data can contain information that permits the reconstruction of a tumor's evolutionary past. Here, we apply such life history analyses on an unprecedented scale, to a set of 2,658 tumors spanning 39 cancer types. We estimated the timing of large chromosomal gains during tumor evolution, by comparing the rates of doubled to non-doubled point mutations within gained regions. Although we find that such events typically occur in the second half of clonal evolution, we also observe distinctive and early chromosomal gains in some cancer types, such as gains of chromosomes 7, 19 and 20 in glioblastoma, and isochromosome 17q in medulloblastoma. By integrating these results with the qualitative timing of individual driver mutations, we obtained an overall ranking, from early to late, of frequent somatic events per cancer type, which both identified novel patterns of tumor evolution, and incorporated additional detail into known models, such as the progression of APC-KRAS-TP53 in colorectal cancer proposed by Vogelstein and Fearon. To estimate how mutational processes acting on the tumor genome change over time, we classified mutations in each sample according to three broad time periods (early clonal, late clonal, and subclonal), and quantified the activity of mutational signatures in each period. Most mutational processes appear to remain remarkably constant; however, certain signatures show clear and consistent changes during clonal evolution. Particularly, mutational signatures associated with exposure to carcinogens, such as smoking and UV light, tend to decrease over time. In contrast, signatures associated with defective endogenous processes, such as APOBEC mutagenesis and defective double strand break repair, show an increase between early and late phases of tumor evolution.
Making use of clock-like mutational signatures, we converted mutational time estimates for large events, such as whole genome duplication (WGD), and the emergence of the most recent common ancestor (MRCA), into real time estimates, which allowed us to combine our analyses into overall timelines of cancer evolution, per tumor type. For example, the typical timeline of ovarian adenocarcinoma development shows that early tumor evolution is characterized by mutations in TP53, and widespread genome instability, with WGD events taking place on average 8 years prior to diagnosis. In later stages of evolution, signatures of defective repair processes increase, and the MRCA emerges on average 1 year before diagnosis. Taken together, these data reveal the common and divergent evolutionary trajectories available to a cancer, which might be crucial in understanding specific tumor biology, and in providing new opportunities for early detection and cancer prevention.

Citation Format: Clemency Jolly, Moritz Gerstung, Ignaty Leshchiner, Stefan C. Dentro, Santiago Gonzalez, Thomas J. Mitchell, Yulia Rubanova, Pavana Anur, Daniel Rosebrock, Kaixian Yu, Maxime Tarabichi, Amit Deshwar, Jeff Wintersinger, Kortine Kleinheinz, Ignacio Vasquez-Garcia, Kerstin Haase, Subhajit Sengupta, Geoff Macintyre, Salem Malikic, Nilgun Donmez, Dimitri G. Livitz, Mark Cmero, Jonas Demeulemeester, Steve Schumacher, Yu Fan, Xiaotong Yao, Juhee Lee, Matthias Schlesner, Paul C. Boutros, David D. Bowtell, Hongtu Zhu, Gad Getz, Marcin Imielinski, Rameen Beroukhim, S Cenk Sahinalp, Yuan Ji, Martin Peifer, Florian Markowetz, Ville Mustonen, Ke Juan, Wenyi Wang, Quaid D. Morris, Paul T. Spellman, David C. Wedge, Peter Van Loo, PCAWG Evolution and Heterogeneity Working Group. The evolutionary history of 2,658 cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 218.


bioRxiv | 2017

Imaging-Genomics Study Of Head-Neck Squamous Cell Carcinoma: Associations Between Radiomic Phenotypes And Genomic Mechanisms Via Integration Of TCGA And TCIA

Yitan Zhu; A.S.R. Mohamed; Stephen Y. Lai; Shengjie Yang; Aasheesh Kanwar; Lin Wei; M. Kamal; Subhajit Sengupta; Hesham Elhalawani; Heath D. Skinner; Dennis Mackin; Jay Shiao; Jay A. Messer; Andrew J. Wong; Yao Ding; J. Zhang; L Court; Yuan Ji; Clifton D. Fuller

Purpose: Recent data suggest that imaging radiomics features for a tumor could predict important genomic biomarkers. Understanding the relationship between radiomic and genomic features is important for basic cancer research and future patient care. For Head and Neck Squamous Cell Carcinoma (HNSCC), we perform a comprehensive study to discover the imaging-genomics associations and explore the potential of predicting tumor genomic alterations using radiomic features.

Methods: Our retrospective study integrates whole-genome multi-omics data from The Cancer Genome Atlas (TCGA) with matched computed tomography imaging data from The Cancer Imaging Archive (TCIA) for the same set of 126 HNSCC patients. Linear regression analysis and gene set enrichment analysis are used to identify statistically significant associations between radiomic imaging features and genomic features. A random forest classifier is used to predict two key HNSCC molecular biomarkers, the status of human papilloma virus (HPV) and disruptive TP53 mutation, based on radiomic features.

Results: Wide-spread and statistically significant associations are discovered between genomic features (including miRNA expressions, protein expressions, somatic mutations, and transcriptional activities, copy number variations, and promoter region DNA methylation changes of pathways) and radiomic features characterizing the size, shape, and texture of tumor. Prediction of HPV and TP53 mutation status using radiomic features achieves an area under the receiver operating characteristics curve (AUC) of 0.71 and 0.641, respectively.

Conclusion: Our analysis suggests that radiomic features are associated with genomic characteristics in HNSCC and provides justification for continued development of radiomics as biomarkers for relevant genomic alterations in HNSCC.

Collaboration


Top Co-Authors

Yuan Ji
NorthShore University HealthSystem

Juhee Lee
University of California

Kamalakar Gulukota
NorthShore University HealthSystem

Peter Müller
University of Texas at Austin