Publication


Featured research published by Yaniv Erlich.


Science | 2013

Identifying Personal Genomes by Surname Inference

Melissa Gymrek; Amy L. McGuire; David E. Golan; Eran Halperin; Yaniv Erlich

Anonymity Compromised. The balance between maintaining individual privacy and sharing genomic information for research purposes has been a topic of considerable controversy. Gymrek et al. (p. 321; see the Policy Forum by Rodriguez et al.) demonstrate that the anonymity of participants (and their families) can be compromised by analyzing Y-chromosome sequences from public genetic genealogy Web sites that contain (sometimes distant) relatives with the same surname. Short tandem repeats (STRs) on the Y chromosome of a target individual (whose sequence was freely available and identified in GenBank) were compared with information in public genealogy Web sites to determine the shortest time to the most recent common ancestor and find the most likely surname, which, when combined with age and state of residency, identified the individual. When STRs from 911 individuals were used as the starting points, the analysis projected a success rate of 12% within the U.S. male population with Caucasian ancestry. Further analysis of detailed pedigrees from one collection revealed that families of individuals whose genomes are in public repositories could be identified with high probability.

Anonymity of male personal genome data sets can be compromised by means of publicly available data. [Also see News story and Policy Forum by Rodriguez et al.]

Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
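
The matching idea described in this abstract can be illustrated with a toy sketch. It is not the authors' method: the paper estimates the time to the most recent common ancestor, whereas this example simply counts Y-STR markers whose repeat counts differ. The surnames, repeat counts, and the `rank_surnames` helper are hypothetical and exist only for illustration.

```python
def marker_distance(a, b):
    """Return (mismatches, shared) for two Y-STR profiles.

    A profile is a dict mapping a marker name to its repeat count.
    Only markers present in both profiles are compared.
    """
    shared = set(a) & set(b)
    mismatches = sum(1 for m in shared if a[m] != b[m])
    return mismatches, len(shared)


def rank_surnames(query, records, min_shared=3):
    """Rank surnames by the closest matching record (fewest mismatches).

    `records` is a list of (surname, profile) pairs. Records sharing
    fewer than `min_shared` markers with the query are skipped.
    """
    best = {}
    for surname, profile in records:
        mismatches, shared = marker_distance(query, profile)
        if shared < min_shared:
            continue
        if surname not in best or mismatches < best[surname]:
            best[surname] = mismatches
    # Sort by mismatch count, then alphabetically for a stable order.
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical data: made-up surnames and repeat counts.
QUERY = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
RECORDS = [
    ("Alpha", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Beta", {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 13}),
    ("Alpha", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
]

ranking = rank_surnames(QUERY, RECORDS)
```

Here `ranking[0]` is the candidate surname with the fewest differing markers. A mismatch count is only a crude stand-in for genealogical distance, since it ignores marker-specific mutation rates.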


Nature | 2016

The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

Swapan Mallick; Heng Li; Mark Lipson; Iain Mathieson; Melissa Gymrek; Fernando Racimo; Mengyao Zhao; Niru Chennagiri; Arti Tandon; Pontus Skoglund; Iosif Lazaridis; Sriram Sankararaman; Qiaomei Fu; Nadin Rohland; Gabriel Renaud; Yaniv Erlich; Thomas Willems; Carla Gallo; Jeffrey P. Spence; Yun S. Song; Giovanni Poletti; Francois Balloux; George van Driem; Peter de Knijff; Irene Gallego Romero; Aashish R. Jha; Doron M. Behar; Claudio M. Bravi; Cristian Capelli; Tor Hervig

Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.


Nature Reviews Genetics | 2014

Routes for breaching and protecting genetic privacy

Yaniv Erlich; Arvind Narayanan

We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.


Genome Research | 2012

lobSTR: A short tandem repeat profiler for personal genomes

Melissa Gymrek; David E. Golan; Saharon Rosset; Yaniv Erlich

Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
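
As a rough illustration of how a repeat can be detected without alignment, the sketch below scores candidate repeat periods by self-similarity, an autocorrelation-style measure. It is a simplified stand-in and not lobSTR's actual implementation; the `repeat_period` function and its `max_period` parameter are assumptions made for this example.

```python
def repeat_period(seq, max_period=6):
    """Estimate the repeat unit length of a sequence.

    For each candidate period k, compute the fraction of positions i
    where seq[i] == seq[i + k]. A perfect tandem repeat with unit
    length k scores 1.0 at k. Ties go to the smallest period.
    Returns (period, score), or (None, 0.0) if nothing matches.
    """
    best_period, best_score = None, 0.0
    for k in range(1, max_period + 1):
        n = len(seq) - k
        if n <= 0:
            break
        score = sum(seq[i] == seq[i + k] for i in range(n)) / n
        if score > best_score:
            best_period, best_score = k, score
    return best_period, best_score


period, score = repeat_period("GATAGATAGATAGATA")
```

For the tetranucleotide repeat above, the best-scoring period is 4. Real reads also contain flanking sequence and sequencing errors, which this toy function does not handle.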


Nature Methods | 2008

Alta-Cyclic: a self-optimizing base caller for next-generation sequencing

Yaniv Erlich; Partha P. Mitra; Melissa delaBastide; W. Richard McCombie; Gregory J. Hannon

Next-generation sequencing is limited to short read lengths and by high error rates. We systematically analyzed sources of noise in the Illumina Genome Analyzer that contribute to these high error rates and developed a base caller, Alta-Cyclic, that uses machine learning to compensate for noise factors. Alta-Cyclic substantially improved the number of accurate reads for sequencing runs up to 78 bases and reduced systematic biases, facilitating confident identification of sequence variants.


Genome Research | 2011

Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis

Yaniv Erlich; Simon Edvardson; Emily Hodges; Shamir Zenvirt; Pramod Thekkat; Avraham Shaag; Talya Dor; Gregory J. Hannon; Orly Elpeleg

Whole exome sequencing has become a pivotal methodology for rapid and cost-effective detection of pathogenic variations in Mendelian disorders. A major challenge of this approach is determining the causative mutation from a substantial number of bystander variations that do not play any role in the disease etiology. Current strategies to analyze variations have mainly relied on genetic and functional arguments such as mode of inheritance, conservation, and loss of function prediction. Here, we demonstrate that disease-network analysis provides an additional layer of information to stratify variations even in the presence of incomplete sequencing coverage, a known limitation of exome sequencing. We studied a case of Hereditary Spastic Paraparesis (HSP) in a single inbred Palestinian family. HSP is a group of neuropathological disorders that are characterized by abnormal gait and spasticity of the lower limbs. Forty-five loci have been associated with HSP and lesions in 20 genes have been documented to induce the disorder. We used whole exome sequencing and homozygosity mapping to create a list of possible candidates. After exhausting the genetic and functional arguments, we stratified the remaining candidates according to their similarity to the previously known disease genes. Our analysis implicated the causative mutation in the motor domain of KIF1A, a gene that has not yet been associated with HSP, which functions in anterograde axonal transportation. Our strategy can be useful for a large class of disorders that are characterized by locus heterogeneity, particularly when studying disorders in single families.


Genes & Development | 2009

Cell contact-dependent acquisition of cellular and viral nonautonomously encoded small RNAs

Oded Rechavi; Yaniv Erlich; Hila Amram; Lena Flomenblit; Fedor V. Karginov; Itamar Goldstein; Gregory J. Hannon

In some organisms, small RNA pathways can act nonautonomously, with responses spreading from cell to cell. Dedicated intercellular RNA delivery pathways have not yet been characterized in mammals, although secretory compartments have been found to contain RNA. Here we show that, upon cell contact, T cells acquire from B cells small RNAs that can impact the expression of target genes in the recipient T cells. Synthetic microRNA (miRNA) mimetics, viral miRNAs expressed by infected B cells, and endogenous miRNAs could all be transferred into T cells. These mechanisms may allow small RNA-mediated communication between immune cells. The documented transfer of viral miRNAs raises the possible exploitation of these pathways for viral manipulation of the host immune response.


Nature Genetics | 2016

Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences

G. David Poznik; Yali Xue; Fernando L. Mendez; Thomas Willems; Andrea Massaia; Melissa A. Wilson Sayres; Qasim Ayub; Shane McCarthy; Apurva Narechania; Seva Kashin; Yuan Chen; Ruby Banerjee; Juan L. Rodriguez-Flores; Maria Cerezo; Haojing Shao; Melissa Gymrek; Ankit Malhotra; Sandra Louzada; Rob DeSalle; Graham R. S. Ritchie; Eliza Cerveira; Tomas Fitzgerald; Erik Garrison; Anthony Marcketta; David Mittelman; Mallory Romanovitch; Chengsheng Zhang; Xiangqun Zheng-Bradley; Gonçalo R. Abecasis; Steven A. McCarroll

We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.


Nature Genetics | 2016

Abundant contribution of short tandem repeats to gene expression variation in humans

Melissa Gymrek; Thomas Willems; Audrey Guilmatre; Haoyang Zeng; Barak Markus; Stoyan Georgiev; Mark J. Daly; Alkes L. Price; Jonathan K. Pritchard; Andrew J. Sharp; Yaniv Erlich

The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10–15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.


Genome Research | 2014

The landscape of human STR variation

Thomas Willems; Melissa Gymrek; Gareth Highnam; David Mittelman; Yaniv Erlich

Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.

Collaboration


Dive into Yaniv Erlich's collaborations.

Top Co-Authors

Melissa Gymrek

University of California

Dina Zielinski

Massachusetts Institute of Technology

Assaf Gordon

Cold Spring Harbor Laboratory

Thomas Willems

Massachusetts Institute of Technology

Barak Markus

Massachusetts Institute of Technology
