Publication


Featured research published by Deniz Yorukoglu.


Bioinformatics | 2010

Next-generation VariationHunter

Fereydoun Hormozdiari; Iman Hajirasouliha; Phuong Dao; Faraz Hach; Deniz Yorukoglu; Can Alkan; Evan E. Eichler; S. Cenk Sahinalp

Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a class of SVs that is very important to the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) that take the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507). Availability: The implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm. Supplementary information: Supplementary data are available at Bioinformatics online.
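VariationHunter is described above only as a combinatorial SV detection algorithm. One common combinatorial way to frame read-pair SV calling is maximum parsimony: choose as few candidate SVs as possible so that every discordant read pair is explained by one of them, which is a set-cover problem. The sketch below shows the standard greedy heuristic for that problem under this assumption; it is an illustration, not the published VariationHunter or VariationHunter-CR code, and the SV and read identifiers are hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(candidates: Dict[str, Set[str]]) -> List[str]:
    """Pick candidate SVs until every discordant read pair is explained.

    `candidates` maps a hypothetical SV identifier to the set of
    discordant read-pair IDs that support it.
    """
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Choose the SV explaining the most still-unexplained read pairs;
        # iterating in sorted order breaks ties by identifier.
        best = max(sorted(candidates), key=lambda sv: len(candidates[sv] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


candidates = {
    "del_chr1_1000": {"r1", "r2", "r3"},
    "ins_chr1_1005": {"r3", "r4"},
    "inv_chr2_500": {"r4"},
}
print(greedy_set_cover(candidates))
```

Greedy set cover gives a logarithmic approximation guarantee; it stands in here for whatever exact or approximate solver a real tool uses.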


Genome Research | 2011

Alu repeat discovery and characterization within human genomes.

Fereydoun Hormozdiari; Can Alkan; Mario Ventura; Iman Hajirasouliha; Maika Malig; Faraz Hach; Deniz Yorukoglu; Phuong Dao; Marzieh Bakhshi; S. Cenk Sahinalp; Evan E. Eichler

Human genomes are now being rapidly sequenced, but not all forms of genetic variation are routinely characterized. In this study, we focus on Alu retrotransposition events and seek to characterize differences in the pattern of mobile insertion between individuals based on the analysis of eight human genomes sequenced using next-generation sequencing. Applying a rapid read-pair analysis algorithm, we discover 4342 Alu insertions not found in the human reference genome and show that 98% of a selected subset (63/64) experimentally validate. Of these new insertions, 89% correspond to AluY elements, suggesting that they arose by retrotransposition. Eighty percent of the Alu insertions have not been previously reported and more novel events were detected in Africans when compared with non-African samples (76% vs. 69%). Using these data, we develop an experimental and computational screen to identify ancestry informative Alu retrotransposition events among different human populations.
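Read-pair methods start from pairs whose mapping disagrees with the library's expected insert size or orientation. A minimal sketch of that first classification step follows, assuming a forward/reverse paired-end library and a hypothetical cutoff of 4 standard deviations around the mean insert size. A mobile-element caller such as the one described here would typically also check whether one mate maps to a repeat consensus (for example, Alu); that step is omitted.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    left_start: int      # leftmost mapped coordinate of the left mate
    right_end: int       # rightmost mapped coordinate of the right mate
    left_forward: bool   # strand of the left mate
    right_forward: bool  # strand of the right mate


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 4.0) -> str:
    """Label a pair mapped to one chromosome by the kind of discordance it shows."""
    # Assumed library orientation: forward mate on the left, reverse mate on the right.
    if not (pair.left_forward and not pair.right_forward):
        return "orientation"  # e.g. inversion or tandem-duplication signal
    span = pair.right_end - pair.left_start
    if span > mean_insert + n_sd * sd_insert:
        return "long"         # mates map farther apart: deletion-like signal
    if span < mean_insert - n_sd * sd_insert:
        return "short"        # mates map closer together: insertion-like signal
    return "concordant"


print(classify_pair(ReadPair(1000, 3000, True, False), mean_insert=400, sd_insert=30))
```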


PLOS Computational Biology | 2014

HapTree: a novel Bayesian framework for single individual polyplotyping using NGS data.

Emily Berger; Deniz Yorukoglu; Jian Peng; Bonnie Berger

As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
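The MEC value mentioned above scores a candidate phasing: each read is assigned to the haplotype it matches best, and the score is the total number of alleles that would have to be flipped to make every read consistent with its haplotype. The sketch below computes it for any ploidy. Encoding reads as SNP-index-to-allele maps with alleles "0"/"1" is an assumption for illustration, and the function only evaluates a given set of haplotypes; it does not perform HapTree's likelihood-based search.

```python
from typing import Dict, List, Sequence

Read = Dict[int, str]  # SNP index -> observed allele ("0" or "1")


def mec_score(reads: List[Read], haplotypes: Sequence[str]) -> int:
    """Minimum error correction: for each read, count the allele flips needed
    to agree with its best-matching haplotype, then sum over all reads."""
    total = 0
    for read in reads:
        total += min(
            sum(1 for snp, allele in read.items() if hap[snp] != allele)
            for hap in haplotypes
        )
    return total


# A triploid example: three haplotypes over four SNPs.
haps = ["0101", "1010", "0011"]
reads = [{0: "0", 1: "1"}, {2: "1", 3: "1"}, {0: "1", 1: "1", 2: "1"}]
print(mec_score(reads, haps))
```

The first two reads match a haplotype exactly; the third is one flip away from "1010", so the total is 1.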


Nature Biotechnology | 2015

Quality score compression improves genotyping accuracy.

Y. William Yu; Deniz Yorukoglu; Jian Peng; Bonnie Berger

Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores counterintuitively resulted in improved SNP calling, implying that compression need not come at the expense of accuracy.
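FASTQ stores one quality character per base; in the common Phred+33 encoding, Q = ord(c) - 33 and the implied error probability is 10^(-Q/10). The sketch below illustrates why discarding most scores saves space: overwriting 95% of a synthetic quality string with a single value makes it far more compressible. The random quality values, the choice of which 5% to keep, and the fill value "I" (Q40) are all assumptions; Quartz decides which scores to keep from the sequence content, which this sketch does not attempt.

```python
import random
import zlib
from typing import List


def phred33_to_error_probs(qual: str) -> List[float]:
    """Decode a Phred+33 quality string: Q = ord(c) - 33, P(error) = 10^(-Q/10)."""
    return [10 ** (-(ord(c) - 33) / 10) for c in qual]


def compressed_size(qual: str) -> int:
    """Size in bytes of the quality string after zlib compression."""
    return len(zlib.compress(qual.encode("ascii"), 9))


rng = random.Random(0)
# Synthetic, illustrative quality string: Q2..Q41 drawn uniformly at random.
original = "".join(chr(33 + rng.randint(2, 41)) for _ in range(10_000))
# Keep 5% of the scores (every 20th, an arbitrary choice) and set the rest to 'I' (Q40).
sparsified = "".join(c if i % 20 == 0 else "I" for i, c in enumerate(original))
print(compressed_size(original), compressed_size(sparsified))
```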


Research in Computational Molecular Biology (RECOMB) | 2014

Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification

Y. William Yu; Deniz Yorukoglu; Bonnie Berger

It is becoming increasingly impractical to indefinitely store raw sequencing data for later processing in an uncompressed state. In this paper, we describe a scalable compressive framework, Read-Quality-Sparsifier (RQS), which substantially outperforms other de novo quality score compression methods in compression ratio and speed while maintaining SNP-calling accuracy. Surprisingly, RQS also improves the SNP-calling accuracy on a gold-standard, real-life sequencing dataset (NA12878) using a k-mer density profile constructed from 77 other individuals from the 1000 Genomes Project. This improvement in downstream accuracy emerges from the observation that quality score values within NGS datasets are inherently encoded in the k-mer landscape of the genomic sequences. To our knowledge, RQS is the first scalable sequence-based quality compression method that can efficiently compress quality scores of terabyte-sized and larger sequencing datasets. Availability: An implementation of our method, RQS, is available for download at http://rqs.csail.mit.edu/.
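The k-mer idea above can be sketched as follows, under the assumption that a base covered by a k-mer that is common across many genomes is unlikely to be a sequencing error or a variant, so its quality score can be replaced by a fixed high value while other bases keep theirs. This is a simplified illustration, not RQS: the counting-based "common k-mer" set, exact matching, `min_count` and the fill value "I" are hypothetical choices, since the abstract does not give RQS's actual density profile or matching criteria.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_common_kmers(sequences: Iterable[str], k: int, min_count: int) -> Set[str]:
    """Hypothetical stand-in for a k-mer density profile: k-mers seen at least
    `min_count` times across a reference collection of sequences."""
    counts = Counter(seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def sparsify_quality(read: str, qual: str, common: Set[str], k: int, high: str = "I") -> str:
    """Replace the quality of every base covered by a common k-mer with `high`;
    bases not covered by any common k-mer keep their original score."""
    covered: List[bool] = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in common:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(high if cov else q for cov, q in zip(covered, qual))


common = build_common_kmers(["ACGTACGT"] * 3, k=4, min_count=3)
print(sparsify_quality("ACGTACGA", "########", common, k=4))
```

In this toy run, only the last base (where the read departs from the common k-mers) keeps its original score.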


Bioinformatics | 2012

Dissect: detection and characterization of novel structural alterations in transcribed sequences

Deniz Yorukoglu; Faraz Hach; Lucas Swanson; Colin Collins; Inanc Birol; S. Cenk Sahinalp

Motivation: Computational identification of genomic structural variants via high-throughput sequencing is an important problem for which a number of highly sophisticated solutions have been recently developed. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention. In this article, we introduce two novel algorithmic formulations for identifying transcriptomic structural variants through aligning transcripts to the reference genome under the consideration of such variation. The first formulation is based on a nucleotide-level alignment model; a second, potentially faster formulation is based on chaining fragments shared between each transcript and the reference genome. Based on these formulations, we introduce a novel transcriptome-to-genome alignment tool, Dissect (DIScovery of Structural Alteration Event Containing Transcripts), which can identify and characterize transcriptomic events such as duplications, inversions, rearrangements and fusions. Dissect is suitable for whole transcriptome structural variation discovery problems involving sufficiently long reads or accurately assembled contigs. Results: We tested Dissect on simulated transcripts altered via structural events, as well as assembled RNA-Seq contigs from human prostate cancer cell line C4-2. Our results indicate that Dissect has high sensitivity and specificity in identifying structural alteration events in simulated transcripts as well as uncovering novel structural alterations in cancer transcriptomes. Availability: Dissect is available for public use at http://dissect-trans.sourceforge.net.
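The second formulation above chains fragments shared between a transcript and the genome. The textbook building block for this is colinear chaining: find the heaviest set of non-overlapping fragments that appear in the same order in both sequences. The O(n²) dynamic program below is a sketch of that step only, with a fragment's weight taken as its transcript length (an assumption). Dissect targets events such as inversions, rearrangements and fusions that break colinearity, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SharedFragment:
    t_start: int  # start in the transcript
    t_end: int    # end (exclusive) in the transcript
    g_start: int  # start in the genome
    g_end: int    # end (exclusive) in the genome

    @property
    def weight(self) -> int:
        return self.t_end - self.t_start


def best_colinear_chain(frags: List[SharedFragment]) -> Tuple[int, List[SharedFragment]]:
    """Heaviest chain of fragments ordered consistently in transcript and genome."""
    if not frags:
        return 0, []
    order = sorted(frags, key=lambda f: (f.t_start, f.g_start))
    best = [f.weight for f in order]  # best chain score ending at each fragment
    prev = [-1] * len(order)          # back-pointer for chain reconstruction
    for i, fi in enumerate(order):
        for j in range(i):
            fj = order[j]
            if fj.t_end <= fi.t_start and fj.g_end <= fi.g_start and best[j] + fi.weight > best[i]:
                best[i] = best[j] + fi.weight
                prev[i] = j
    i = max(range(len(order)), key=best.__getitem__)
    score, chain = best[i], []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return score, chain[::-1]


A = SharedFragment(0, 50, 100, 150)
B = SharedFragment(50, 100, 300, 350)
D = SharedFragment(100, 130, 50, 80)
print(best_colinear_chain([D, B, A]))
```

Fragment D lies later in the transcript but earlier in the genome, so it cannot extend the chain of A then B; in this toy example, such a leftover fragment is the kind of evidence that would point to a rearrangement.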


Nature Biotechnology | 2016

Compressive mapping for next-generation sequencing

Deniz Yorukoglu; Yun William Yu; Jian Peng; Bonnie Berger

Nature Biotechnology, volume 34, number 4, April 2016.


Bioinformatics | 2016

Fast genotyping of known SNPs through approximate k-mer matching.

Ariya Shajii; Deniz Yorukoglu; Yun William Yu; Bonnie Berger

Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterizing known genetic traits and causative disease variants within an individual, and it is also the initial stage of many ancestral and population genomic pipelines (e.g. GWAS). Results: We introduce lightweight assignment of variant alleles (LAVA), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as little as ∼5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays. Availability and implementation: LAVA software is available at http://lava.csail.mit.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
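The core LAVA idea, identifying SNP loci from k-mers without full read alignment, can be sketched as a lookup table from every k-mer that overlaps a SNP (with either allele) to that SNP and allele, followed by counting allele hits among read k-mers. This sketch makes several simplifying assumptions and is not LAVA's implementation: it uses exact rather than approximate matching, a small k in the example instead of 32, reads containing only A/C/G/T, and hypothetical allele-fraction cutoffs of 0.2 and 0.8. One convenient property of k = 32 is that a 32-mer packs into 64 bits at 2 bits per base, which the `encode_kmer` helper illustrates.

```python
from collections import Counter
from typing import Dict, List, Tuple

BASES = "ACGT"
Snps = Dict[int, Tuple[str, str]]        # position -> (ref allele, alt allele)
KmerIndex = Dict[int, Tuple[int, str]]   # packed k-mer -> (SNP position, allele)


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer at 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASES.index(base)
    return value


def build_snp_index(ref: str, snps: Snps, k: int) -> KmerIndex:
    """Map every k-mer overlapping a SNP, with either allele, to (position, allele)."""
    index: KmerIndex = {}
    for pos, (ref_allele, alt_allele) in snps.items():
        for allele in (ref_allele, alt_allele):
            seq = ref[:pos] + allele + ref[pos + 1:]
            for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
                index[encode_kmer(seq[start:start + k])] = (pos, allele)
    return index


def genotype(reads: List[str], index: KmerIndex, snps: Snps, k: int,
             het_low: float = 0.2, het_high: float = 0.8) -> Dict[int, str]:
    """Count allele-specific k-mer hits per SNP and call a genotype from the alt fraction."""
    counts: Dict[int, Counter] = {pos: Counter() for pos in snps}
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(encode_kmer(read[i:i + k]))
            if hit is not None:
                counts[hit[0]][hit[1]] += 1
    calls: Dict[int, str] = {}
    for pos, (ref_allele, alt_allele) in snps.items():
        r, a = counts[pos][ref_allele], counts[pos][alt_allele]
        if r + a == 0:
            calls[pos] = "no-call"
            continue
        frac = a / (r + a)
        calls[pos] = "0/0" if frac < het_low else ("1/1" if frac > het_high else "0/1")
    return calls


ref = "ACGTTGCAACGGTACCTTAG"
snps = {10: ("G", "T")}
k = 5
index = build_snp_index(ref, snps, k)
ref_read = ref[6:15]
alt_read = ref[6:10] + "T" + ref[11:15]
print(genotype([ref_read, alt_read] * 2, index, snps, k))
```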


Research in Computational Molecular Biology (RECOMB) | 2015

Topological Signatures for Population Admixture.

Laxmi Parida; Filippo Utro; Deniz Yorukoglu; Anna Paola Carrieri; David N. Kuhn; Saugata Basu

As populations with multilinear transmission (e.g., mixing of genetic material from two parents) evolve over generations, the genetic transmission lines constitute complicated networks. In contrast, unilinear transmission leads to simpler network structures (trees). The genetic exchange in multilinear transmission is further influenced by migration, incubation, mixing and so on. The task we address in this paper is to tease apart subtle admixtures from the usual interrelationships of related populations. We present a combinatorial approach based on persistence in topology to detect admixture in populations. We show, based on controlled simulations, that topological characteristics have the potential for detecting subtle admixture in related populations. We then apply the technique successfully to a set of avocado germplasm data, indicating that the approach has the potential for novel characterizations of relatedness in populations. We believe that this approach also has the potential not only to detect admixture but also to discriminate ancient from recent admixture.
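The abstract does not spell out which topological features are computed, so the sketch below shows persistence in its simplest form: 0-dimensional persistent homology (connected components) of a Vietoris-Rips filtration over a pairwise distance matrix, computed with union-find. Treating individuals as points with a hypothetical genetic distance between them is an assumption; the paper's construction may differ.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def zero_dim_persistence(dist: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Birth/death pairs of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0; when an edge merges two components, one of
    them dies at that edge's length. One component never dies (death = inf).
    """
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs: List[Tuple[float, float]] = []
    # Process edges in order of increasing length (Kruskal-style).
    for d, i, j in sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, d))
    if n:
        pairs.append((0.0, float("inf")))
    return pairs


points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
print(zero_dim_persistence(dist))
```

For four points at 0, 1, 10 and 11 on a line, two components die at scale 1, one survives until scale 9 and one never dies; the long-lived finite pair reflects two well-separated clusters.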


Pattern Recognition in Bioinformatics | 2008

Discovery of Biomarkers for Hexachlorobenzene Toxicity Using Population Based Methods on Gene Expression Data

Cem Meydan; Alper Küçükural; Deniz Yorukoglu; O. Uğur Sezerman

Discovering toxicity biomarkers is important in drug discovery for safely evaluating the possible toxic effects of a substance in early phases. We tried evolutionary classification methods for selecting the important classifier genes in hexachlorobenzene toxicity using microarray data. Using modified genetic algorithms to select the minimum number of features for classifying gene expression data, we discovered a number of gene sets of size 4 that were able to discriminate between the control and the hexachlorobenzene (HCB)-exposed groups of Brown-Norway rats with >99% accuracy in 5-fold cross-validation tests, whereas classification using all of the genes with SVM and other methods yielded accuracies between 48.48% and 81.81%. Using this small number of genes as biomarkers may allow the toxicity of substances with mechanisms similar to HCB to be detected quickly and cost-efficiently before symptoms emerge.
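The abstract does not detail the authors' modifications to the genetic algorithm, so the following is a generic sketch of the approach it describes: evolve fixed-size gene subsets (size 4) and score each by 5-fold cross-validated accuracy. The nearest-centroid classifier, truncation selection, union-based crossover, swap mutation and all population parameters are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Subset = Tuple[int, ...]


def cv_accuracy(X: List[Sequence[float]], y: List[int], feats: Subset, folds: int = 5) -> float:
    """k-fold cross-validated accuracy of a nearest-centroid classifier on `feats`.

    Assumes every class appears in every training split.
    """
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        centroids = {}
        for label in set(y):
            rows = [X[i] for i in range(len(X)) if i not in test_idx and y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
        for i in test_idx:
            dist = {lab: sum((X[i][j] - c[m]) ** 2 for m, j in enumerate(feats))
                    for lab, c in centroids.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def ga_select(X: List[Sequence[float]], y: List[int], n_features: int, size: int = 4,
              pop_size: int = 30, generations: int = 40, seed: int = 0) -> Tuple[Subset, float]:
    """Toy genetic algorithm searching fixed-size feature subsets by CV accuracy."""
    rng = random.Random(seed)

    def fitness(s: Subset) -> float:
        return cv_accuracy(X, y, s)

    def crossover(a: Subset, b: Subset) -> Subset:
        # The child draws its features from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), size)))

    def mutate(s: Subset) -> Subset:
        # Swap one selected feature for a random feature not currently selected.
        kept = set(s)
        kept.remove(rng.choice(s))
        kept.add(rng.choice([j for j in range(n_features) if j not in kept]))
        return tuple(sorted(kept))

    pop = [tuple(sorted(rng.sample(range(n_features), size))) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(*rng.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

As a usage example, synthetic data in which only feature 3 separates the two classes should yield a subset containing feature 3 with perfect cross-validated accuracy.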

Collaboration


Top Co-Authors

Bonnie Berger
Massachusetts Institute of Technology

Yun William Yu
Massachusetts Institute of Technology

Emily Berger
University of California

S. Cenk Sahinalp
Indiana University Bloomington

Phuong Dao
National Institutes of Health

Y. William Yu
Massachusetts Institute of Technology