Heidi J. Sofia
National Institutes of Health
Publications
Featured research published by Heidi J. Sofia.
BMC Medical Genomics | 2016
Haixu Tang; Xiaoqian Jiang; XiaoFeng Wang; Shuang Wang; Heidi J. Sofia; Dov Fox; Kristin E. Lauter; Bradley Malin; Amalio Telenti; Li Xiong; Lucila Ohno-Machado
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond the agreed research scope and being processed in untrusted computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, ‘anonymization’ and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) is needed before they are sufficiently practical for real-world environments.
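The abstract does not say which protocols the teams used. As one illustration of the general idea of jointly computing an aggregate statistic without sharing individual-level records, the sketch below uses additive secret sharing, a standard secure multi-party computation technique, to sum per-site allele counts. The modulus, the function names `make_shares` and `secure_sum`, and the simulation of all sites inside one process are assumptions made for illustration; this is not the competition's method or any participant's implementation, and it assumes semi-honest (honest-but-curious) sites.

```python
import secrets

# Hypothetical modulus for share arithmetic; it only needs to exceed the largest possible total.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split a non-negative count into n_parties additive shares modulo MODULUS."""
    # All but the last share are uniformly random, so any single share reveals nothing.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value modulo MODULUS.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(local_counts: list[int]) -> int:
    """Simulate each site secret-sharing its local count; only the total is reconstructed."""
    n = len(local_counts)
    # share_matrix[i][j] is the share that site i sends to site j.
    share_matrix = [make_shares(count, n) for count in local_counts]
    # Each site j adds up the shares it received from every site.
    partial_sums = [sum(share_matrix[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing and combining the partial sums yields the aggregate count.
    return sum(partial_sums) % MODULUS
```

For example, three hypothetical sites holding 12, 7, and 30 carriers of an allele obtain `secure_sum([12, 7, 30]) == 49`, while each site only ever sees random-looking shares of the others' counts.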
Journal of the American Medical Informatics Association | 2017
Jean Louis Raisaro; Florian Tramèr; Zhanglong Ji; Diyue Bu; Yongan Zhao; W. Knox Carey; David Lloyd; Heidi J. Sofia; Dixie B. Baker; Paul Flicek; Suyash Shringarpure; Carlos Bustamante; Shuang Wang; Xiaoqian Jiang; Lucila Ohno-Machado; Haixu Tang; XiaoFeng Wang; Jean-Pierre Hubaux
The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or “beacon”) is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertains and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual’s whole genome sequence), the individual’s membership in a beacon can be inferred through repeated queries for variants present in the individual’s genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
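A toy sketch of the two kinds of mitigation described above, assuming a beacon stored as a mapping from hypothetical genome IDs to sets of (chromosome, position, allele) tuples. Rare alleles are obscured by answering "no" when fewer than `min_carriers` genomes would back a "yes", and each genome's per-user budget is approximated as a fixed number of "yes" answers it may contribute to. The class name, parameters, and thresholds are hypothetical; the paper's strategies rely on their own risk calculations, which this sketch does not reproduce.

```python
from collections import defaultdict

# A variant is a hypothetical (chromosome, position, allele) tuple.
Variant = tuple[str, int, str]


class BudgetedBeacon:
    """Toy beacon answering yes/no allele-presence queries with two mitigations:
    rare alleles are hidden, and each genome has a per-user answer budget."""

    def __init__(self, genomes: dict[str, set[Variant]],
                 min_carriers: int = 2, per_genome_budget: int = 3):
        self.genomes = genomes                      # genome_id -> variants it carries
        self.min_carriers = min_carriers            # hide alleles backed by fewer genomes
        self.per_genome_budget = per_genome_budget  # "yes" answers a genome may back per user
        # spent[user][genome_id] counts "yes" answers that relied on that genome.
        self.spent = defaultdict(lambda: defaultdict(int))

    def query(self, user: str, variant: Variant) -> bool:
        spent = self.spent[user]
        # Genomes whose budget is exhausted for this user no longer contribute.
        carriers = [gid for gid, variants in self.genomes.items()
                    if variant in variants and spent[gid] < self.per_genome_budget]
        # Obscure rare alleles: too few contributing carriers reads as "absent".
        if len(carriers) < self.min_carriers:
            return False
        # Charge every genome that backed this "yes" answer.
        for gid in carriers:
            spent[gid] += 1
        return True
```

With a budget of 2, a user who keeps querying variants carried by the same genome eventually stops receiving answers that depend on it, while a different user starts with a fresh budget. The trade-off mirrors the one reported for these strategies: stricter thresholds and budgets lower re-identification risk at the cost of hiding more true variants.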
npj Genomic Medicine | 2017
Shuang Wang; Xiaoqian Jiang; Haixu Tang; XiaoFeng Wang; Diyue Bu; Knox Carey; Stephanie O.M. Dyke; Dov Fox; Chao Jiang; Kristin E. Lauter; Bradley Malin; Heidi J. Sofia; Amalio Telenti; Lei Wang; Wenhao Wang; Lucila Ohno-Machado
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track 1, data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track 2, collaborative discovery of similar genomes between two institutions; and Track 3, data outsourcing to public cloud services. Tracks 2 and 3 represent continuing themes from our 2015 competition, while Track 1 was new, a response to a recently identified vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.
Scientific Data | 2018
Moran N. Cabili; Knox Carey; Stephanie O.M. Dyke; Anthony J. Brookes; Marc Fiume; Francis Jeanson; Giselle Kerry; Alex Lash; Heidi J. Sofia; Dylan Spalding; Anne-Marie Tassé; Susheel Varma; Ravi Pandya
The volume of genomics and health data is growing rapidly, driven by sequencing for both research and clinical use. However, under current practices, the data is fragmented into many distinct datasets, and researchers must go through a separate application process for each dataset. This is time-consuming both for the researchers and the data stewards, and it reduces the velocity of research and new discoveries that could improve human health. We propose to simplify this process by introducing a standard Library Card that identifies and authenticates researchers across all participating datasets. Each researcher would only need to apply once to establish their bona fides as a qualified researcher, and could then use the Library Card to access a wide range of datasets that use a compatible data access policy and authentication protocol.
Nucleic Acids Research | 1998
Valerie Burland; Ying Shao; Nicole T. Perna; Guy Plunkett; Frederick R. Blattner; Heidi J. Sofia
Trends in Biochemical Sciences | 1995
Kenneth E. Rudd; Heidi J. Sofia; Eugene V. Koonin; Guy Plunkett; Sara W. Lazar; Pierre E. Rouviere
Archive | 2015
Vivien Bonazzi; Phil Bourne; Steven E. Brenner; Robin Brown; Ishwar Chandramouliswaran; Jennifer Couch; Sean Davis; Leslie Derr; Asif Dhar; Luke Dunlap; Kevin W. Eliceiri; Leigh Finnegan; Ian Fore; Melissa Haendel; Martin Hammitzsch; Tram Huyen; Daniel S. Katz; Mike Kellen; David N. Kennedy; Jennie Larkin; Jennifer Lin; Peter Lyster; Ron Margolis; Gabor T. Marth; Maryann E. Martone; Michael McLennan; Martin Morgan; Francis Ouellette; Vinay Pai; Andreas Prlić