Simina M. Boca
Georgetown University Medical Center
Publication
Featured research published by Simina M. Boca.
Science | 2007
Laura D. Wood; D. Williams Parsons; Siân Jones; Jimmy Lin; Tobias Sjöblom; Rebecca J. Leary; Dong Shen; Simina M. Boca; Thomas D. Barber; Janine Ptak; Natalie Silliman; Steve Szabo; Zoltan Dezso; Vadim Ustyanksky; Tatiana Nikolskaya; Yuri Nikolsky; Rachel Karchin; Paul Wilson; Joshua S. Kaminker; Zemin Zhang; Randal Croshaw; Joseph Willis; Dawn Dawson; Michail Shipitsin; James K V Willson; Saraswati Sukumar; Kornelia Polyak; Ben Ho Park; Charit L. Pethiyagoda; P.V. Krishna Pant
Human cancer is caused by the accumulation of mutations in oncogenes and tumor suppressor genes. To catalog the genetic changes that occur during tumorigenesis, we isolated DNA from 11 breast and 11 colorectal tumors and determined the sequences of the genes in the Reference Sequence database in these samples. Based on analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene “mountains” and a much larger number of gene “hills” that are mutated at low frequency. We describe statistical and bioinformatic tools that may help identify mutations with a role in tumorigenesis. These results have implications for understanding the nature and heterogeneity of human cancers and for using personal genomics for tumor diagnosis and therapy.
Science | 2011
D. Williams Parsons; Meng Li; Xiaosong Zhang; Siân Jones; Rebecca J. Leary; Jimmy Lin; Simina M. Boca; Hannah Carter; Josue Samayoa; Chetan Bettegowda; Gary L. Gallia; George I. Jallo; Zev A. Binder; Yuri Nikolsky; James Hartigan; Doug Smith; Daniela S. Gerhard; Daniel W. Fults; Scott R. VandenBerg; Mitchel S. Berger; Suely Kazue Nagahashi Marie; Sueli Mieko Oba Shinjo; Carlos Clara; Peter C. Phillips; Jane E. Minturn; Jaclyn A. Biegel; Alexander R. Judkins; Adam C. Resnick; Phillip B. Storm; Tom Curran
Genomic analysis of a childhood cancer reveals markedly fewer mutations than what is typically seen in adult cancers. Medulloblastoma (MB) is the most common malignant brain tumor of children. To identify the genetic alterations in this tumor type, we searched for copy number alterations using high-density microarrays and sequenced all known protein-coding genes and microRNA genes using Sanger sequencing in a set of 22 MBs. We found that, on average, each tumor had 11 gene alterations, fewer by a factor of 5 to 10 than in the adult solid tumors that have been sequenced to date. In addition to alterations in the Hedgehog and Wnt pathways, our analysis led to the discovery of genes not previously known to be altered in MBs. Most notably, inactivating mutations of the histone-lysine N-methyltransferase genes MLL2 or MLL3 were identified in 16% of MB patients. These results demonstrate key differences between the genetic landscapes of adult and childhood cancers, highlight dysregulation of developmental pathways as an important mechanism underlying MBs, and identify a role for a specific type of histone methylation in human tumorigenesis.
Proceedings of the National Academy of Sciences of the United States of America | 2008
Rebecca J. Leary; Jimmy Lin; Jordan M. Cummins; Simina M. Boca; Laura D. Wood; D. Williams Parsons; Siân Jones; Tobias Sjöblom; Ben Ho Park; Ramon Parsons; Joseph Willis; Dawn Dawson; James K V Willson; Tatiana Nikolskaya; Yuri Nikolsky; Levy Kopelovich; Nick Papadopoulos; Len A. Pennacchio; Tian Li Wang; Sanford D. Markowitz; Giovanni Parmigiani; Kenneth W. Kinzler; Bert Vogelstein; Victor E. Velculescu
We have performed a genome-wide analysis of copy number changes in breast and colorectal tumors using approaches that can reliably detect homozygous deletions and amplifications. We found that the number of genes altered by major copy number changes, deletion of all copies or amplification to at least 12 copies per cell, averaged 17 per tumor. We have integrated these data with previous mutation analyses of the Reference Sequence genes in these same tumor types and have identified genes and cellular pathways affected by both copy number changes and point alterations. Pathways enriched for genetic alterations included those controlling cell adhesion, intracellular signaling, DNA topological change, and cell cycle control. These analyses provide an integrated view of copy number and sequencing alterations on a genome-wide scale and identify genes and pathways that could prove useful for cancer diagnosis and therapy.
Genomics | 2009
Giovanni Parmigiani; Simina M. Boca; Jimmy Lin; Kenneth W. Kinzler; Victor E. Velculescu; Bert Vogelstein
The availability of the human genome sequence and progress in sequencing and bioinformatic technologies have enabled genome-wide investigation of somatic mutations in human cancers. This article briefly reviews challenges arising in the statistical analysis of mutational data of this kind. A first challenge is that of designing studies that efficiently allocate sequencing resources. We show that this can be addressed by two-stage designs and demonstrate via simulations that even relatively small studies can produce lists of candidate cancer genes that are highly informative for future research efforts. A second challenge is to distinguish mutated genes that are selected for by cancer (drivers) from mutated genes that have no role in the development of cancer and simply happened to mutate (passengers). We suggest that this question is best approached as a classification problem and discuss some of the difficulties of more traditional testing-based approaches. A third challenge is to identify biologic processes affected by the driver genes. This can be pursued by gene set analyses. These can reliably identify functional groups and pathways that are enriched for mutated genes even when the individual genes involved in those pathways or sets are not mutated at sufficient frequencies to provide conclusive evidence as drivers.
Cancer Epidemiology, Biomarkers & Prevention | 2013
Joshua N. Sampson; Simina M. Boca; Xiao-Ou Shu; Rachael Z. Stolzenberg-Solomon; Charles E. Matthews; Ann W. Hsing; Yu Ting Tan; Bu Tian Ji; Wong Ho Chow; Qiuyin Cai; Da Ke Liu; Gong Yang; Yong Bing Xiang; Wei Zheng; Rashmi Sinha; Amanda J. Cross; Steven C. Moore
Background: Metabolite levels within an individual vary over time. This within-individual variability, coupled with technical variability, reduces the power for epidemiologic studies to detect associations with disease. Here, the authors assess the variability of a large subset of metabolites and evaluate the implications for epidemiologic studies. Methods: Using liquid chromatography/mass spectrometry (LC/MS) and gas chromatography/mass spectrometry (GC/MS) platforms, 385 metabolites were measured in 60 women at baseline and year one of the Shanghai Physical Activity Study, and observed patterns were confirmed in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening study. Results: Although the authors found high technical reliability (median intraclass correlation = 0.8), reliability over time within an individual was low. Taken together, variability in the assay and variability within the individual accounted for the majority of variability for 64% of metabolites. Given this, a metabolite would need, on average, a relative risk of 3 (comparing upper and lower quartiles of “usual” levels) or 2 (comparing quartiles of observed levels) to be detected in 38%, 74%, and 97% of studies including 500, 1,000, and 5,000 individuals. Age, gender, and fasting status, factors that are often of less interest in epidemiologic studies, were associated with 30%, 67%, and 34% of metabolites, respectively, but the associations were weak and explained only a small proportion of the total metabolite variability. Conclusion: Metabolomics will require large, but feasible, sample sizes to detect the moderate effect sizes typical for epidemiologic studies. Impact: We offer guidelines for determining the sample sizes needed to conduct metabolomic studies in epidemiology. Cancer Epidemiol Biomarkers Prev; 22(4); 631–40. ©2013 AACR.
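The intraclass correlation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-way random-effects model with the same number of replicate measurements per subject; the abstract does not say which ICC estimator the authors used, and the function name and the example values are hypothetical.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way random-effects ICC for a subjects x replicates table.

    Assumption: every subject has the same number (k >= 2) of replicates.
    This is an illustrative estimator, not necessarily the one used in the study.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(measurements[0])
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need the same number (>= 2) of replicates per subject")

    grand = mean([x for row in measurements for x in row])
    subject_means = [mean(row) for row in measurements]

    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("no variability in the data; ICC is undefined")
    return (msb - msw) / denom


if __name__ == "__main__":
    # Hypothetical metabolite levels: three subjects, two replicates each.
    example = [[10.1, 10.4], [12.0, 11.7], [9.2, 9.5]]
    print(round(icc_oneway(example), 3))
```

Values near 1 mean that most of the variability lies between subjects. Lower values mean that replicate-to-replicate variability dominates, which is the situation the abstract describes for measurements taken a year apart.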
Journal of the Royal Society Interface | 2018
Travers Ching; Daniel Himmelstein; Brett K. Beaulieu-Jones; Alexandr A. Kalinin; Brian T. Do; Gregory P. Way; Enrico Ferrero; Paul-Michael Agapow; Michael Zietz; Michael M. Hoffman; Wei Xie; Gail Rosen; Benjamin J. Lengerich; Johnny Israeli; Jack Lanchantin; Stephen Woloszynek; Anne E. Carpenter; Avanti Shrikumar; Jinbo Xu; Evan M. Cofer; Christopher A. Lavender; Srinivas C. Turaga; Amr Alexandari; Zhiyong Lu; David J. Harris; Dave DeCaprio; Yanjun Qi; Anshul Kundaje; Yifan Peng; Laura Wiley
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Genome Biology | 2010
Simina M. Boca; Kenneth W. Kinzler; Victor E. Velculescu; Bert Vogelstein; Giovanni Parmigiani
Recent research has revealed complex heterogeneous genomic landscapes in human cancers. However, mutations tend to occur within a core group of pathways and biological processes that can be grouped into gene sets. To better understand the significance of these pathways, we have developed an approach that initially scores each gene set at the patient rather than the gene level. In mutation analysis, these patient-oriented methods are more transparent, interpretable, and statistically powerful than traditional gene-oriented methods.
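As a rough illustration of scoring a gene set at the patient level, the sketch below counts, for each set, the fraction of patients with at least one mutated gene in that set. This is a deliberately simplified stand-in and not the published method; the function name, the data layout, and the gene labels are all hypothetical.

```python
def patient_oriented_scores(mutations, gene_sets):
    """Fraction of patients with at least one mutated gene in each gene set.

    mutations: dict mapping patient ID -> iterable of mutated gene names.
    gene_sets: dict mapping set name -> iterable of member gene names.
    Assumption: a patient "hits" a set if any member gene is mutated.
    """
    n_patients = len(mutations)
    if n_patients == 0:
        raise ValueError("need at least one patient")

    scores = {}
    for name, genes in gene_sets.items():
        members = set(genes)
        # Count patients whose mutated genes overlap this set at all.
        hits = sum(1 for mutated in mutations.values() if members & set(mutated))
        scores[name] = hits / n_patients
    return scores


if __name__ == "__main__":
    # Hypothetical data: placeholder gene labels, not real findings.
    mutations = {
        "patient_1": {"GENE_A", "GENE_B"},
        "patient_2": {"GENE_C"},
        "patient_3": set(),
    }
    gene_sets = {"pathway_x": {"GENE_A", "GENE_C"}, "pathway_y": {"GENE_D"}}
    print(patient_oriented_scores(mutations, gene_sets))
```

A gene-oriented score would instead tally mutations per gene and then aggregate. The patient-level count has the advantage that a pathway hit in many patients through different genes still scores highly.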
Cancer | 2014
Amanda J. Cross; Steven C. Moore; Simina M. Boca; Wen-Yi Huang; Xiaoqin Xiong; Rachael Z. Stolzenberg-Solomon; Rashmi Sinha; Joshua N. Sampson
Colorectal cancer is highly prevalent, and the vast majority of cases are thought to be sporadic, although few risk factors have been identified. Using metabolomics technology, our aim was to identify biomarkers prospectively associated with colorectal cancer.
Carcinogenesis | 2014
Amanda J. Cross; Simina M. Boca; Neal D. Freedman; Neil E. Caporaso; Wen-Yi Huang; Rashmi Sinha; Joshua N. Sampson; Steven C. Moore
Colorectal cancer is not strictly considered a tobacco-related malignancy, but modest associations have emerged from large meta-analyses. Most studies, however, use self-reported data, which are subject to misclassification. Biomarkers of tobacco exposure may reduce misclassification and provide insight into metabolic variability that potentially influences carcinogenesis. Our aim was to identify metabolites that represent smoking habits and individual variation in tobacco metabolism, and investigate their association with colorectal cancer. In a nested case-control study of 255 colorectal cancers and 254 matched controls identified in the Prostate, Lung, Colorectal and Ovarian cancer screening trial, baseline serum was used to identify metabolites by ultra-high-performance liquid-phase chromatography and mass spectrometry, as well as gas chromatography with tandem mass spectrometry. Odds ratios (OR) and 95% confidence intervals (CI) were estimated by logistic regression. Self-reported current smoking was associated with serum cotinine, O-cresol sulfate and hydroxycotinine. Self-reported current smoking of any tobacco (OR = 1.90, 95% CI: 1.02-3.54) and current cigarette smoking (OR = 1.51, 95% CI: 0.75-3.04) were associated with elevated colorectal cancer risks, although the latter was not statistically significant. Individuals with detectable levels of hydroxycotinine had an increased colorectal cancer risk compared with those with undetectable levels (OR = 2.68, 95% CI: 1.33-5.40). Although those with detectable levels of cotinine had a suggestive elevated risk of this malignancy (OR = 1.81, 95% CI: 0.98-3.33), those with detectable levels of O-cresol sulfate did not (OR = 1.16, 95% CI: 0.57-2.37). Biomarkers capturing smoking behavior and metabolic variation exhibit stronger associations with colorectal cancer than self-report, providing additional evidence for a role for tobacco in this malignancy.
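The odds ratios and confidence intervals quoted in this abstract can be sanity-checked with a short sketch. It assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which is the usual default for logistic regression but is not stated in the abstract; the function name is hypothetical.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic, and p-value.

    Assumption: (ci_low, ci_high) is a 95% Wald interval, symmetric around
    log(odds_ratio) on the log scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # For a symmetric log-scale interval, the geometric mean of the bounds
    # should reproduce the point estimate up to rounding.
    geometric_midpoint = math.sqrt(ci_low * ci_high)
    return {"se": se, "z": z, "p": p_value, "midpoint": geometric_midpoint}


if __name__ == "__main__":
    # Values taken from the abstract above.
    for label, est in [
        ("any tobacco", (1.90, 1.02, 3.54)),
        ("cigarettes", (1.51, 0.75, 3.04)),
        ("hydroxycotinine", (2.68, 1.33, 5.40)),
    ]:
        print(label, wald_summary(*est))
```

Under that assumption, an interval that excludes 1 corresponds to a two-sided p-value below 0.05, which matches the abstract's remark that the association with current cigarette smoking was not statistically significant.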
Complexity | 2006
Minglei Wang; Simina M. Boca; Rakhee Kalelkar; Jay E. Mittenthal; Gustavo Caetano-Anollés
The protein world has a hierarchical and redundant organization that can be specified in terms of evolutionary units of molecular structure, the protein domains. The Structural Classification of Proteins (SCOP) has unified domains into a comparatively small set of folding architectures, the protein fold families and superfamilies, and these have been further grouped into protein folds. In this study, we reconstruct the evolution of the protein world using information embedded in a structural genomic census of fold architectures defined by a phylogenomic analysis of 185 completely sequenced genomes using advanced hidden Markov models and 776 folds described in SCOP release 1.67. Our study confirms the existence of defined evolutionary patterns of architectural diversification and explores how phylogenomic trees generated from folds relate to those reconstructed from fold superfamilies. Evolutionary patterns help us propose a general conceptual model that describes the growth of architectures in the protein world.