Publication


Featured research published by Heidi J. Sofia.


BMC Medical Genomics | 2016

Protecting genomic data analytics in the cloud: state of the art and opportunities

Haixu Tang; Xiaoqian Jiang; XiaoFeng Wang; Shuang Wang; Heidi J. Sofia; Dov Fox; Kristin E. Lauter; Bradley Malin; Amalio Telenti; Li Xiong; Lucila Ohno-Machado

The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond the agreed research scope and being processed in untrusted computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, ‘anonymization’ and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) is needed before they are sufficiently practical for real-world environments.
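As an illustration of how several organizations might jointly compute an aggregate statistic without sharing individual-level records, the following minimal sketch uses additive secret sharing. This is one well-known technique chosen for the example, not necessarily the method used by any competition team; the function names, the modulus, and the per-site counts are all hypothetical.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice, not from the source).
PRIME = 2**61 - 1

def make_shares(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Simulate n sites computing the sum of their private values.

    Site i splits its value into n shares and sends share j to site j.
    Each site only ever sees random-looking shares, never a raw value.
    """
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]
    # Site j adds up the shares it received from every site.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return sum(partial_sums) % PRIME

# Hypothetical minor-allele counts held by three medical centers.
site_counts = [12, 7, 30]
print(secure_sum(site_counts))  # prints 49
```

The sketch runs all parties in one process for clarity; a real deployment would need authenticated channels between sites and protection against colluding parties.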


Journal of the American Medical Informatics Association | 2017

Addressing Beacon Re-Identification Attacks: Quantification and Mitigation of Privacy Risks

Jean Louis Raisaro; Florian Tramèr; Zhanglong Ji; Diyue Bu; Yongan Zhao; W. Knox Carey; David Lloyd; Heidi J. Sofia; Dixie B. Baker; Paul Flicek; Suyash Shringarpure; Carlos Bustamante; Shuang Wang; Xiaoqian Jiang; Lucila Ohno-Machado; Haixu Tang; XiaoFeng Wang; Jean-Pierre Hubaux

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or “beacon”) is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertain and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual’s whole genome sequence), the individual’s membership in a beacon can be inferred through repeated queries for variants present in the individual’s genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
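The yes/no allele-presence query and the per-user, per-genome access budget described above can be sketched roughly as follows. This is a simplified toy model, not the paper's algorithm: the class name, the flat cost of one unit per positive contribution, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

class BudgetedBeacon:
    """Toy beacon that answers allele-presence queries under a budget."""

    def __init__(self, genomes, budget_per_genome):
        # genomes: individual id -> set of (chromosome, position, allele)
        self.genomes = genomes
        self.budget = budget_per_genome
        # (user, individual) -> number of times that genome has answered
        self.spent = defaultdict(int)

    def query(self, user, chrom, pos, allele):
        """Return True if any genome with remaining budget has the allele."""
        variant = (chrom, pos, allele)
        answer = False
        for individual, variants in self.genomes.items():
            if variant not in variants:
                continue
            if self.spent[(user, individual)] >= self.budget:
                continue  # this genome no longer contributes for this user
            self.spent[(user, individual)] += 1  # assumed flat cost of 1
            answer = True
        return answer

# Hypothetical two-person beacon.
beacon = BudgetedBeacon(
    {"A": {("1", 100, "T")}, "B": {("1", 100, "T"), ("2", 50, "G")}},
    budget_per_genome=2,
)
print(beacon.query("alice", "2", 50, "G"))  # True, B has budget left
print(beacon.query("alice", "2", 50, "G"))  # True, B's budget is now used up
print(beacon.query("alice", "2", 50, "G"))  # False, B is hidden from alice
```

Because the budget is tracked per user, a different user querying the same variant would still receive a positive answer, which limits how much any single querier can learn about one genome.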


npj Genomic Medicine | 2017

A community effort to protect genomic data sharing, collaboration and outsourcing

Shuang Wang; Xiaoqian Jiang; Haixu Tang; XiaoFeng Wang; Diyue Bu; Knox Carey; Stephanie O.M. Dyke; Dov Fox; Chao Jiang; Kristin E. Lauter; Bradley Malin; Heidi J. Sofia; Amalio Telenti; Lei Wang; Wenhao Wang; Lucila Ohno-Machado

The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track (1) data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track (2) collaborative discovery of similar genomes between two institutions; and Track (3) data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the first was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.


Scientific Data | 2018

Simplifying research access to genomics and health data with Library Cards

Moran N. Cabili; Knox Carey; Stephanie O.M. Dyke; Anthony J. Brookes; Marc Fiume; Francis Jeanson; Giselle Kerry; Alex Lash; Heidi J. Sofia; Dylan Spalding; Anne-Marie Tassé; Susheel Varma; Ravi Pandya

The volume of genomics and health data is growing rapidly, driven by sequencing for both research and clinical use. However, under current practices, the data is fragmented into many distinct datasets, and researchers must go through a separate application process for each dataset. This is time-consuming for both the researchers and the data stewards, and it reduces the velocity of research and new discoveries that could improve human health. We propose to simplify this process by introducing a standard Library Card that identifies and authenticates researchers across all participating datasets. Each researcher would only need to apply once to establish their bona fides as a qualified researcher, and could then use the Library Card to access a wide range of datasets that use a compatible data access policy and authentication protocol.


Nucleic Acids Research | 1998

The complete DNA sequence and analysis of the large virulence plasmid of Escherichia coli O157:H7

Valerie Burland; Ying Shao; Nicole T. Perna; Guy Plunkett; Frederick R. Blattner; Heidi J. Sofia


Trends in Biochemical Sciences | 1995

A new family of peptidyl-prolyl isomerases

Kenneth E. Rudd; Heidi J. Sofia; Eugene V. Koonin; Guy Plunkett; Sara W. Lazar; Pierre E. Rouviere


Archive | 2015

Software Discovery Index Workshop Report

Vivien Bonazzi; Phil Bourne; Steven E. Brenner; Robin Brown; Ishwar Chandramouliswaran; Jennifer Couch; Sean Davis; Leslie Derr; Asif Dhar; Luke Dunlap; Kevin W. Eliceiri; Leigh Finnegan; Ian Fore; Melissa Haendel; Martin Hammitzsch; Tram Huyen; Daniel S. Katz; Mike Kellen; David N. Kennedy; Jennie Larkin; Jennifer Lin; Peter Lyster; Ron Margolis; Gabor T. Marth; Maryann E. Martone; Michael McLennan; Martin Morgan; Francis Ouellette; Vinay Pai; Andreas Prlić

Collaboration


Dive into Heidi J. Sofia's collaborations.

Top Co-Authors

Haixu Tang

Indiana University Bloomington

Shuang Wang

University of California

XiaoFeng Wang

Indiana University Bloomington

Xiaoqian Jiang

University of California

Amalio Telenti

J. Craig Venter Institute

Diyue Bu

Indiana University Bloomington

Dov Fox

University of San Diego
