Publication


Featured research published by Eleazar Eskin.


Nature Biotechnology | 2005

Assessing computational tools for the discovery of transcription factor binding sites

Martin Tompa; Nan Li; Timothy L. Bailey; George M. Church; Bart De Moor; Eleazar Eskin; Alexander V. Favorov; Martin C. Frith; Yutao Fu; W. James Kent; Vsevolod J. Makeev; Andrei A. Mironov; William Stafford Noble; Giulio Pavesi; Mireille Régnier; Nicolas Simonis; Saurabh Sinha; Gert Thijs; Jacques van Helden; Mathias Vandenbogaert; Zhiping Weng; Christopher T. Workman; Chun Ye; Zhou Zhu

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Nature Genetics | 2010

Variance component model to account for sample structure in genome-wide association studies

Hyun Min Kang; Jae Hoon Sul; Noah Zaitlen; Sit Yee Kong; Nelson B. Freimer; Chiara Sabatti; Eleazar Eskin

Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
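
The inflation of test statistics mentioned above is commonly summarized by a genomic inflation factor, and genomic control (one of the baselines EMMAX is compared against) rescales the statistics by that factor. The following is a minimal sketch of that baseline only, not of EMMAX itself. It assumes 1-degree-of-freedom chi-square association statistics; the function names and the clamping of the factor at 1 are illustrative choices, not taken from the paper.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median statistic to the expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every statistic by the inflation factor (never inflate: clamp at 1)."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]
```

A factor close to 1 suggests little inflation; values well above 1 suggest that sample structure is distorting the test statistics.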


Genetics | 2008

Efficient Control of Population Structure in Model Organism Association Mapping

Hyun Min Kang; Noah Zaitlen; Claire M. Wade; Andrew Kirby; David Heckerman; Mark J. Daly; Eleazar Eskin

Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.


Nature | 2011

Mouse genomic variation and its effect on phenotypes and gene regulation

Thomas M. Keane; Leo Goodstadt; Petr Danecek; Michael A. White; Kim Wong; Binnaz Yalcin; Andreas Heger; Avigail Agam; Guy Slater; Martin Goodson; N A Furlotte; Eleazar Eskin; Christoffer Nellåker; H Whitley; James Cleak; Deborah Janowitz; Polinka Hernandez-Pliego; Andrew Edwards; T G Belgard; Peter L. Oliver; Rebecca E McIntyre; Amarjit Bhomra; Jérôme Nicod; Xiangchao Gan; Wei Yuan; L van der Weyden; Charles A. Steward; Sendu Bala; Jim Stalker; Richard Mott

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.


Pacific Symposium on Biocomputing | 2001

The spectrum kernel: a string kernel for SVM protein classification

Christina S. Leslie; Eleazar Eskin; William Stafford Noble

We introduce a new sequence-similarity kernel, the spectrum kernel, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. Our kernel is conceptually simple and efficient to compute and, in experiments on the SCOP database, performs well in comparison with state-of-the-art methods for homology detection. Moreover, our method produces an SVM classifier that allows linear time classification of test sequences. Our experiments provide evidence that string-based kernels, in conjunction with SVMs, could offer a viable and computationally efficient alternative to other methods of protein classification and homology detection.
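
As a rough illustration of the idea, a spectrum kernel can be computed as the inner product of two k-mer count vectors. The sketch below is a naive version under that assumption; the function names and the default k are hypothetical, and it does not reproduce the efficient data structures or the linear-time classification described in the paper.

```python
from collections import Counter


def spectrum_features(seq: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Iterate over the smaller feature map; missing k-mers count as 0.
    if len(fx) > len(fy):
        fx, fy = fy, fx
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

The resulting value can be supplied to any SVM implementation that accepts a precomputed kernel matrix.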


IEEE Symposium on Security and Privacy | 2001

Data mining methods for detection of new malicious executables

Matthew G. Schultz; Eleazar Eskin; Erez Zadok; Salvatore J. Stolfo

A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of thousands every year and pose a serious security threat. Current anti-virus systems attempt to detect these new malicious programs with heuristics generated by hand. This approach is costly and oftentimes ineffective. We present a data mining framework that detects new, previously unseen malicious executables accurately and automatically. The data mining framework automatically found patterns in our data set and used these patterns to detect a set of new malicious binaries. Comparing our detection methods with a traditional signature-based method, our method more than doubles the current detection rates for new malicious executables.


Bioinformatics | 2004

Mismatch string kernels for discriminative protein classification

Christina S. Leslie; Eleazar Eskin; Adiel Cohen; Jason Weston; William Stafford Noble

Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
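
To make "shared occurrences of fixed-length patterns, allowing for mutations" concrete, here is a naive sketch in which each k-mer in a sequence contributes to every pattern within Hamming distance m of it. It enumerates neighbors by brute force and is not the mismatch tree computation described in the abstract; the function names, defaults, and the 20-letter amino acid alphabet are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet


def neighbors(kmer: str, m: int, alphabet: str) -> set:
    """All strings of the same length within Hamming distance m of kmer."""
    out = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                out.add("".join(chars))
    return out


def mismatch_features(seq: str, k: int, m: int, alphabet: str) -> Counter:
    """Each k-mer in seq adds 1 to every pattern in its mismatch neighborhood."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for pattern in neighbors(seq[i:i + k], m, alphabet):
            feats[pattern] += 1
    return feats


def mismatch_kernel(x, y, k=3, m=1, alphabet=AMINO_ACIDS) -> int:
    """Inner product of the two (k, m)-mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[pattern] for pattern, count in fx.items())
```

With m = 0 this reduces to counting exact shared k-mers. The brute-force neighborhood grows quickly with k and m, which is the cost an efficient traversal is meant to avoid.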


Nature | 2007

A sequence-based variation map of 8.27 million SNPs in inbred mouse strains

Kelly A. Frazer; Eleazar Eskin; Hyun Min Kang; Molly A. Bogue; David A. Hinds; Erica Beilharz; Robert V. Gupta; Julie Montgomery; Matt Morenzoni; Geoffrey B. Nilsen; Charit Pethiyagoda; Laura L. Stuve; Frank M. Johnson; Mark J. Daly; Claire M. Wade; D. R. Cox

A dense map of genetic variation in the laboratory mouse genome will provide insights into the evolutionary history of the species and lead to an improved understanding of the relationship between inter-strain genotypic and phenotypic differences. Here we resequence the genomes of four wild-derived and eleven classical strains. We identify 8.27 million high-quality single nucleotide polymorphisms (SNPs) densely distributed across the genome, and determine the locations of the high (divergent subspecies ancestry) and low (common subspecies ancestry) SNP-rate intervals for every pairwise combination of classical strains. Using these data, we generate a genome-wide haplotype map containing 40,898 segments, each with an average of three distinct ancestral haplotypes. For the haplotypes in the classical strains that are unequivocally assigned ancestry, the genetic contributions of the Mus musculus subspecies—M. m. domesticus, M. m. musculus, M. m. castaneus and the hybrid M. m. molossinus—are 68%, 6%, 3% and 10%, respectively; the remaining 13% of haplotypes are of unknown ancestral origin. The considerable regional redundancy of the SNP data will facilitate imputation of the majority of these genotypes in less-densely typed classical inbred strains to provide a complete view of variation in additional strains.


Archive | 2002

A Geometric Framework for Unsupervised Anomaly Detection

Eleazar Eskin; Andrew Oliver Arnold; Michael Prerau; Leonid Portnoy; Sal Stolfo

Most current intrusion detection systems employ signature-based methods or data mining-based methods that rely on labeled training data, which is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection, that is, for algorithms designed to process unlabeled data. In our framework, data elements are mapped to a feature space, which is typically a vector space ℝ^d. Anomalies are detected by determining which points lie in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. Our first map is a data-dependent normalization feature map which we apply to network connections. Our second feature map is a spectrum kernel which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD CUP 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.
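
One way to picture "points in sparse regions" is sketched below: features are standardized using statistics of the data itself, and each point is scored by the summed distance to its nearest neighbors, so isolated points receive high scores. The abstract does not name its three algorithms, so the nearest-neighbor scoring, the function names, and the choice of k here are assumptions for illustration rather than the authors' method.

```python
import math
from statistics import mean, pstdev


def normalize(points):
    """Standardize each feature by the mean and standard deviation of the data."""
    columns = list(zip(*points))
    mus = [mean(col) for col in columns]
    # Guard against constant features, whose standard deviation is 0.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((v - mu) / sigma for v, mu, sigma in zip(p, mus, sigmas))
        for p in points
    ]


def knn_scores(points, k=2):
    """Sum of distances from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores
```

Points with the highest scores are the candidates for anomalies. This quadratic-time loop is only suitable for small examples.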


Molecular Psychiatry | 2009

Genome-wide association study of bipolar disorder in European American and African American individuals

Erin N. Smith; Cinnamon S. Bloss; Thomas B. Barrett; Pamela L. Belmonte; Wade H. Berrettini; William Byerley; William Coryell; David Craig; Howard J. Edenberg; Eleazar Eskin; Tatiana Foroud; Elliot S. Gershon; Tiffany A. Greenwood; Maria Hipolito; Daniel L. Koller; William B. Lawson; Chunyu Liu; Falk W. Lohoff; Melvin G. McInnis; Francis J. McMahon; Daniel B. Mirel; Sarah S. Murray; Caroline M. Nievergelt; J. Nurnberger; Evaristus A. Nwulia; Justin Paschall; James B. Potash; John P. Rice; Thomas G. Schulze; W. Scheftner

To identify bipolar disorder (BD) genetic susceptibility factors, we conducted two genome-wide association (GWA) studies: one involving a sample of individuals of European ancestry (EA; n=1001 cases; n=1033 controls), and one involving a sample of individuals of African ancestry (AA; n=345 cases; n=670 controls). For the EA sample, single-nucleotide polymorphisms (SNPs) with the strongest statistical evidence for association included rs5907577 in an intergenic region at Xq27.1 (P=1.6 × 10⁻⁶) and rs10193871 in NAP5 at 2q21.2 (P=9.8 × 10⁻⁶). For the AA sample, SNPs with the strongest statistical evidence for association included rs2111504 in DPY19L3 at 19q13.11 (P=1.5 × 10⁻⁶) and rs2769605 in NTRK2 at 9q21.33 (P=4.5 × 10⁻⁵). We also investigated whether we could provide support for three regions previously associated with BD, and we showed that the ANK3 region replicates in our sample, along with some support for C15Orf53; other evidence implicates BD candidate genes such as SLITRK2. We also tested the hypothesis that BD susceptibility variants exhibit genetic background-dependent effects. SNPs with the strongest statistical evidence for genetic background effects included rs11208285 in ROR1 at 1p31.3 (P=1.4 × 10⁻⁶), rs4657247 in RGS5 at 1q23.3 (P=4.1 × 10⁻⁶), and rs7078071 in BTBD16 at 10q26.13 (P=4.5 × 10⁻⁶). This study is the first to conduct GWA of BD in individuals of AA and suggests that genetic variations that contribute to BD may vary as a function of ancestry.

Collaboration


Top Co-Authors

Eran Halperin, University of California
Eun Yong Kang, University of California
Jae Hoon Sul, University of California
Noah Zaitlen, University of California
Serghei Mangul, University of California