Publication


Featured research published by Christopher A. Cassa.


Genome Research | 2012

Disclosing pathogenic genetic variants to research participants: Quantifying an emerging ethical responsibility

Christopher A. Cassa; Sarah K. Savage; Patrick L. Taylor; Robert C. Green; Amy L. McGuire; Kenneth D. Mandl

There is an emerging consensus that when investigators obtain genomic data from research participants, they may incur an ethical responsibility to inform at-risk individuals about clinically significant variants discovered during the course of their research. With whole-exome sequencing becoming commonplace and the falling costs of full-genome sequencing, there will be an increasingly large number of variants identified in research participants that may be of sufficient clinical relevance to share. An explicit approach to triaging and communicating these results has yet to be developed, and even the magnitude of the task is uncertain. To develop an estimate of the number of variants that might qualify for disclosure, we apply recently published recommendations for the return of results to a defined and representative set of variants and then extrapolate these estimates to genome scale. We find that the total number of variants meeting the threshold for recommended disclosure ranges from 3955-12,579 (3.79%-12.06%, 95% CI) in the most conservative estimate to 6998-17,189 (6.69%-16.48%, 95% CI) in an estimate including variants with variable disease expressivity. Additionally, if the growth rate from the previous 4 yr continues, we estimate that the total number of disease-associated variants will grow 37% over the next 4 yr.


Journal of the American Medical Informatics Association | 2006

A Context-sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection

Christopher A. Cassa; Shaun J. Grannis; J. Marc Overhage; Kenneth D. Mandl

OBJECTIVE The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we measure the impact of the skew on detection of spatial clustering as measured by a spatial scanning statistic. DESIGN Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters ranging in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density. MEASUREMENTS A total of 12,600 separate weeks of case data with artificially created clusters were combined with control data and the impact on detection of spatial clustering identified by a spatial scan statistic was measured. RESULTS The anonymization algorithm produced an expected skew of cases that resulted in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers the spatial cluster detection sensitivity by less than 4% and lowers the detection specificity by less than 1%. CONCLUSION A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standardly used outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.
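The idea of a blur whose size depends on local population density can be illustrated with a minimal sketch. The abstract does not state the formula used, so the scaling rule below (Gaussian standard deviation inversely proportional to the square root of density), the constant `base_km`, and both function names are assumptions made purely for illustration, not the authors' algorithm.

```python
import math
import random

def blur_sigma_km(density_per_km2, base_km=5.0):
    """Hypothetical rule: the displacement scale shrinks as density grows."""
    if density_per_km2 <= 0:
        raise ValueError("density must be positive")
    return base_km / math.sqrt(density_per_km2)

def blur_point(x_km, y_km, density_per_km2, rng):
    """Shift a planar point by independent Gaussian offsets on each axis."""
    sigma = blur_sigma_km(density_per_km2)
    return (x_km + rng.gauss(0.0, sigma), y_km + rng.gauss(0.0, sigma))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
urban = blur_point(0.0, 0.0, density_per_km2=10000.0, rng=rng)
rural = blur_point(0.0, 0.0, density_per_km2=25.0, rng=rng)
```

Under this assumed rule, a case in a dense urban block is moved only a short distance, while a case in a sparsely populated area is moved much farther, since more displacement is needed there to hide it among neighbors.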


PLOS Currents | 2013

Twitter as a sentinel in emergency situations: lessons from the Boston marathon explosions.

Christopher A. Cassa; Rumi Chunara; Kenneth D. Mandl; John S. Brownstein

Immediately following the Boston Marathon attacks, individuals near the scene posted a deluge of data to social media sites. Previous work has shown that these data can be leveraged to provide rapid insight during natural disasters, disease outbreaks and ongoing conflicts that can assist in the public health and medical response. Here, we examine and discuss the social media messages posted immediately after and around the Boston Marathon bombings, and find that specific keywords appear frequently prior to official public safety and news media reports. Individuals immediately adjacent to the explosions posted messages within minutes via Twitter which identify the location and specifics of events, demonstrating a role for social media in the early recognition and characterization of emergency events. *Christopher Cassa and Rumi Chunara contributed equally to this work.


Proceedings of the National Academy of Sciences of the United States of America | 2008

Revealing the spatial distribution of a disease while preserving privacy

Shannon C. Wieland; Christopher A. Cassa; Kenneth D. Mandl; Bonnie Berger

Datasets describing the health status of individuals are important for medical research but must be used cautiously to protect patient privacy. For patient data containing geographical identifiers, the conventional solution is to aggregate the data by large areas. This method often preserves privacy but suffers from substantial information loss, which degrades the quality of subsequent disease mapping or cluster detection studies. Other heuristic methods for de-identifying spatial patient information do not quantify the risk to individual privacy. We develop an optimal method based on linear programming to add noise to individual locations that preserves the distribution of a disease. The method ensures a small, quantitative risk of individual re-identification. Because the amount of noise added is minimal for the desired degree of privacy protection, the de-identified set is ideal for spatial epidemiological studies. We apply the method to patients in New York County, New York, showing that privacy is guaranteed while moving patients 25 to 150 times less than aggregation by zip code.


Human Mutation | 2013

Large numbers of genetic variants considered to be pathogenic are common in asymptomatic individuals.

Christopher A. Cassa; Mark Y. Tong; Daniel M. Jordan

It is now affordable to order clinically interpreted whole‐genome sequence reports from clinical laboratories. One major component of these reports is derived from the knowledge base of previously identified pathogenic variants, including research articles, locus‐specific, and other databases. While over 150,000 such pathogenic variants have been identified, many of these were originally discovered in small cohort studies of affected individuals, so their applicability to asymptomatic populations is unclear. We analyzed the prevalence of a large set of pathogenic variants from the medical and scientific literature in a large set of asymptomatic individuals (N = 1,092) and found 8.5% of these pathogenic variants in at least one individual. In the average individual in the 1000 Genomes Project, previously identified pathogenic variants occur on average 294 times (σ = 25.5) in homozygous form and 942 times (σ = 68.2) in heterozygous form. We also find that many of these pathogenic variants are frequently occurring: there are 3,744 variants with minor allele frequency (MAF) ≥ 0.01 (4.6%) and 2,837 variants with MAF ≥ 0.05 (3.5%). This indicates that many of these variants may be erroneous findings or have lower penetrance than previously expected.


Nature | 2015

Identification of cis-suppression of human disease mutations by comparative genomics

Daniel M. Jordan; Stephan Frangakis; Christelle Golzio; Christopher A. Cassa; Joanne Kurtzberg; Task Force for Neonatal Genomics; Erica E. Davis; Shamil R. Sunyaev; Nicholas Katsanis

Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.


PLOS Genetics | 2015

Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck

Daniel J. Balick; Ron Do; Christopher A. Cassa; David Reich; Shamil R. Sunyaev

Population bottlenecks followed by re-expansions have been common throughout the history of many populations. The response of alleles under selection to such demographic perturbations has been a subject of great interest in population genetics. On the basis of theoretical analysis and computer simulations, we suggest that this response qualitatively depends on dominance. The number of dominant or additive deleterious alleles per haploid genome is expected to be slightly increased following the bottleneck and re-expansion. In contrast, the number of completely or partially recessive alleles should be sharply reduced. Changes of population size expose differences between recessive and additive selection, potentially providing insight into the prevalence of dominance in natural populations. Specifically, we use a simple statistic, $B_R \equiv \sum_i x_i^{\mathrm{pop1}} / \sum_j x_j^{\mathrm{pop2}}$, where $x_i$ represents the derived allele frequency, to compare the number of mutations in different populations, and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. We also provide empirical evidence showing that gene sets associated with autosomal recessive disease in humans may have a $B_R$ indicative of recessive selection. Together, these theoretical predictions and empirical observations show that complex demographic history may facilitate rather than impede inference of parameters of natural selection.
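The ratio statistic defined in this abstract, the sum of derived allele frequencies in one population divided by the same sum in a second population, is simple to compute. The sketch below is a minimal illustration; the function name `b_r` and the toy frequencies are hypothetical and are not taken from the paper.

```python
def b_r(freqs_pop1, freqs_pop2):
    """Ratio of summed derived allele frequencies: population 1 over population 2."""
    total_pop2 = sum(freqs_pop2)
    if total_pop2 == 0:
        raise ValueError("population 2 has no derived alleles; ratio is undefined")
    return sum(freqs_pop1) / total_pop2

# Toy derived allele frequencies at a handful of sites (illustrative only).
pop1 = [0.25, 0.5]
pop2 = [0.25, 0.25]
ratio = b_r(pop1, pop2)
```

A value above 1 means population 1 carries more derived alleles per haploid genome across the sites considered, and a value below 1 means it carries fewer.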


International Journal of Health Geographics | 2006

An unsupervised classification method for inferring original case locations from low-resolution disease maps

John S. Brownstein; Christopher A. Cassa; Isaac S. Kohane; Kenneth D. Mandl

Background: Widespread availability of geographic information systems software has facilitated the use of disease mapping in academia, government and private sector. Maps that display the address of affected patients are often exchanged in public forums, and published in peer-reviewed journal articles. As previously reported, a search of figure legends in five major medical journals found 19 articles from 1994–2004 that identify over 19,000 patient addresses. In this report, a method is presented to evaluate whether patient privacy is being breached in the publication of low-resolution disease maps. Results: To demonstrate the effect, a hypothetical low-resolution map of geocoded patient addresses was created and the accuracy with which patient addresses can be resolved is described. Through georeferencing and unsupervised classification of the original image, the method precisely re-identified 26% (144/550) of the patient addresses from a presentation quality map and 79% (432/550) from a publication quality map. For the presentation quality map, 99.8% of the addresses were within 70 meters (approximately one city block length) of the predicted patient location, 51.6% of addresses were identified within five buildings, 70.7% within ten buildings and 93% within twenty buildings. For the publication quality map, all addresses were within 14 meters and 11 buildings of the predicted patient location. Conclusion: This study demonstrates that lowering the resolution of a map displaying geocoded patient addresses does not sufficiently protect patient addresses from re-identification. Guidelines to protect patient privacy, including those of medical journals, should reflect policies that ensure privacy protection when spatial data are displayed or published.


International Journal of Health Geographics | 2008

Re-Identification of Home Addresses from Spatial Locations Anonymized by Gaussian Skew

Christopher A. Cassa; Shannon C. Wieland; Kenneth D. Mandl

Background: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach which enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. Results: We produce multiple anonymized data sets using a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the reduction in privacy). With ten anonymized copies of an original data set, we find a substantial decrease in average distance from 0.7 km to 0.2 km between the estimated, re-identified address and the original address. With fifty anonymized copies of an original data set, we find a decrease in average distance from 0.7 km to 0.1 km. Conclusion: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem that include infrastructure to support the safe disclosure of anonymized medical data to prevent inference or re-identification of original address data, and the use of a Markov-process based algorithm to mitigate this risk.
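The averaging vulnerability described in this abstract can be reproduced in a few lines. The following is a minimal simulation sketch, not the authors' code: the planar coordinates, the fixed `sigma_km`, the synthetic addresses, and all function names are assumptions chosen for illustration.

```python
import math
import random

def gaussian_skew(point, sigma_km, rng):
    """Return one anonymized copy of a point, shifted by Gaussian noise on each axis."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))

def estimate_location(anonymized_copies):
    """Adversary's estimate: the coordinate-wise mean of all anonymized copies."""
    n = len(anonymized_copies)
    return (sum(p[0] for p in anonymized_copies) / n,
            sum(p[1] for p in anonymized_copies) / n)

def mean_error_km(addresses, n_copies, sigma_km, rng):
    """Average distance between the estimate and the true location over all addresses."""
    total = 0.0
    for addr in addresses:
        copies = [gaussian_skew(addr, sigma_km, rng) for _ in range(n_copies)]
        est = estimate_location(copies)
        total += math.hypot(est[0] - addr[0], est[1] - addr[1])
    return total / len(addresses)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
addresses = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
err_1 = mean_error_km(addresses, n_copies=1, sigma_km=0.5, rng=rng)
err_50 = mean_error_km(addresses, n_copies=50, sigma_km=0.5, rng=rng)
```

Because the noise in each copy is independent, the error of the averaged estimate shrinks roughly in proportion to one over the square root of the number of copies, which is why releasing many independently blurred versions of the same data erodes the protection.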


BMC Medical Genomics | 2008

My sister's keeper?: genomic research and the identifiability of siblings

Christopher A. Cassa; Brian Schmidt; Isaac S. Kohane; Kenneth D. Mandl

Background: Genomic sequencing of SNPs is increasingly prevalent, though the amount of familial information these data contain has not been quantified. Methods: We provide a framework for measuring the risk to siblings of a patient's SNP genotype disclosure, and demonstrate that sibling SNP genotypes can be inferred with substantial accuracy. Results: Extending this inference technique, we determine that a very low number of matches at commonly varying SNPs is sufficient to confirm sib-ship, demonstrating that published sequence data can reliably be used to derive sibling identities. Using HapMap trio data, at SNPs where one child is homozygotic major, with a minor allele frequency ≤ 0.20 (N = 452684, 65.1%), we achieve 91.9% inference accuracy for sibling genotypes. Conclusion: These findings demonstrate that substantial discrimination and privacy risks arise from use of inferred familial genomic data.

Collaboration


Top Co-Authors

Kenneth D. Mandl, Boston Children's Hospital

Shamil R. Sunyaev, Brigham and Women's Hospital

Dana Vuzman, Weizmann Institute of Science

Agnes Toth-Petroczy, Brigham and Women's Hospital

Jill A. Rosenfeld, Baylor College of Medicine