Publication


Featured research published by S. Cenk Sahinalp.


Bioinformatics | 2010

PSORTb 3.0

Nancy Y. Yu; James R. Wagner; Matthew R. Laird; Gabor Melli; Sébastien Rey; Raymond Lo; Phuong Dao; S. Cenk Sahinalp; Martin Ester; Leonard J. Foster; Fiona S. L. Brinkman

Motivation: PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, the recall needs to be improved and no accurate SCL predictors yet make predictions for archaea, nor differentiate important localization subcategories, such as proteins targeted to a host cell or bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program. Results: We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross validation and performed an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions. Availability: http://www.psort.org/psortb (download open source software or use the web interface). Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.


Nature Genetics | 2009

Personalized copy number and segmental duplication maps using next-generation sequencing

Can Alkan; Jeffrey M. Kidd; Tomas Marques-Bonet; Gozde Aksay; Francesca Antonacci; Fereydoun Hormozdiari; Jacob O. Kitzman; Carl Baker; Maika Malig; Onur Mutlu; S. Cenk Sahinalp; Richard A. Gibbs; Evan E. Eichler

Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73–87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^−16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.


PLOS Computational Biology | 2011

deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

Andrew McPherson; Fereydoun Hormozdiari; Abdalnasser Zayed; Ryan Giuliany; Gavin Ha; Mark Sun; Malachi Griffith; Alireza Heravi Moussavi; Janine Senz; Nataliya Melnyk; Marina Pacheco; Marco A. Marra; Martin Hirst; Torsten O. Nielsen; S. Cenk Sahinalp; David Huntsman; Sohrab P. Shah

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an adaboost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.


Genome Research | 2008

Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa

Ryan D. Morin; Gozde Aksay; Elena V. Dolgosheina; H. Alexander Ebhardt; Vincent Magrini; Elaine R. Mardis; S. Cenk Sahinalp; Peter J. Unrau

The diversity of microRNAs and small-interfering RNAs has been extensively explored within angiosperms by focusing on a few key organisms such as Oryza sativa and Arabidopsis thaliana. A deeper division of the plants is defined by the radiation of the angiosperms and gymnosperms, with the latter comprising the commercially important conifers. The conifers are expected to provide important information regarding the evolution of highly conserved small regulatory RNAs. Deep sequencing provides the means to characterize and quantitatively profile small RNAs in understudied organisms such as these. Pyrosequencing of small RNAs from O. sativa revealed, as expected, approximately 21- and approximately 24-nt RNAs. The former contained known microRNAs, and the latter largely comprised intergenic-derived sequences likely representing heterochromatin siRNAs. In contrast, sequences from Pinus contorta were dominated by 21-nt small RNAs. Using a novel sequence-based clustering algorithm, we identified sequences belonging to 18 highly conserved microRNA families in P. contorta as well as numerous clusters of conserved small RNAs of unknown function. Using multiple methods, including expressed sequence folding and machine learning algorithms, we found a further 53 candidate novel microRNA families, 51 appearing specific to the P. contorta library. In addition, alignment of small RNA sequences to the O. sativa genome revealed six perfectly conserved classes of small RNA that included chloroplast transcripts and specific types of genomic repeats. The conservation of microRNAs and other small RNAs between the conifers and the angiosperms indicates that important RNA silencing processes were highly developed in the earliest spermatophytes. Genomic mapping of all sequences to the O. sativa genome can be viewed at http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/.


Nature Methods | 2010

mrsFAST: a cache-oblivious algorithm for short-read mapping

Faraz Hach; Fereydoun Hormozdiari; Can Alkan; Farhad Hormozdiari; Inanc Birol; Evan E. Eichler; S. Cenk Sahinalp

In addition to single-nucleotide variations and small insertions-deletions (indels), larger-sized structural variations (for example, insertions, deletions, inversions, segmental duplications and copy-number polymorphisms) contribute to human genetic diversity. In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step. The accuracy of such an SVD study is directly correlated to the accuracy of this mapping step, which also provides the main computational bottleneck of the SVD study.


Bioinformatics | 2010

Next-generation VariationHunter

Fereydoun Hormozdiari; Iman Hajirasouliha; Phuong Dao; Faraz Hach; Deniz Yorukoglu; Can Alkan; Evan E. Eichler; S. Cenk Sahinalp

Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association to human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a very important class of SVs to the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we also present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507). Availability: An implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Genome Research | 2003

Analysis of Primate Genomic Variation Reveals a Repeat-Driven Expansion of the Human Genome

Ge Liu; NISC Comparative Sequencing Program; Shaying Zhao; Jeffrey A. Bailey; S. Cenk Sahinalp; Can Alkan; Eray Tuzun; Eric D. Green; Evan E. Eichler

Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in the comparisons of whole-genome sequences, a good similarity was also found after masking gene sequences, indicating that CS analysis manages to reveal phylogenetic signal in the organization of the noncoding part of the genome sequences, including repetitive DNA and the genome “dark matter”. Obviously, the possibility to reveal parallel ordering depends on the signal of common ancestor sequence organization varying locally along the corresponding segments of the compared genomes. We explored two sources contributing to this signal: sequence composition (GC content) and sequence organization (abundances of k-mers in the usual A,T,G,C or purine-pyrimidine alphabets). Whole-genome comparisons based on GC distribution along the analyzed sequences indeed give reasonable results, but combining it with k-mer abundances dramatically improves the ordering quality, indicating that compositional and organizational heterogeneity comprise complementary sources of information on evolutionarily conserved similarity of genome sequences.
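The two signal sources named in this abstract, GC content and k-mer abundances in either the A,T,G,C or the purine-pyrimidine alphabet, can be illustrated with a minimal Python sketch. The function names (`gc_content`, `to_purine_pyrimidine`, `kmer_spectrum`) are hypothetical and chosen for illustration only; this is not the scoring scheme used in the study, which the abstract does not specify.

```python
from collections import Counter

# Purine (R) / pyrimidine (Y) recoding: A and G are purines, C and T are pyrimidines.
_RY = str.maketrans("AGCT", "RRYY")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an upper-case DNA string."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def to_purine_pyrimidine(seq: str) -> str:
    """Recode an upper-case DNA string into the two-letter R/Y alphabet."""
    return seq.translate(_RY)


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    window = "ACGTGGCC"
    print(gc_content(window))
    print(kmer_spectrum(window, 2))
    print(kmer_spectrum(to_purine_pyrimidine(window), 2))
```

Computing these quantities per window along two genomes would give the kind of per-segment profiles that a dot-plot comparison could then score; how the scores are combined is left open here.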


Research in Computational Molecular Biology | 2006

Not all scale free networks are born equal: the role of the seed graph in PPI network emulation

Fereydoun Hormozdiari; Petra Berenbrink; Nataša Pržulj; S. Cenk Sahinalp

The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction network analyzed), the duplication model captures many topological features of publicly available protein–protein interaction networks very well.
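To make the role of the seed graph concrete, the following is a minimal Python sketch of one common variant of a duplication model. The abstract does not give the exact rules, so the details here are assumptions: each new node copies a uniformly chosen existing node, keeps each of its edges with probability `p`, and links to the copied node itself with probability `r`. The function name `duplication_model` and all parameter names are hypothetical.

```python
import random


def duplication_model(seed_adj, n_target, p, r, rng_seed=0):
    """Grow an undirected graph from a seed graph by node duplication.

    seed_adj: dict mapping integer node -> set of neighbours (the seed graph).
    p: probability of keeping each edge of the duplicated node (assumed rule).
    r: probability of linking the new node to the node it copied (assumed rule).
    """
    rng = random.Random(rng_seed)
    # Copy the seed so the caller's graph is not modified.
    adj = {v: set(nbrs) for v, nbrs in seed_adj.items()}
    next_id = max(adj) + 1
    while len(adj) < n_target:
        parent = rng.choice(sorted(adj))
        child = next_id
        next_id += 1
        adj[child] = set()
        # Inherit each of the parent's edges independently with probability p.
        for u in sorted(adj[parent]):
            if rng.random() < p:
                adj[child].add(u)
                adj[u].add(child)
        # Optionally connect the duplicate to its parent.
        if rng.random() < r:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    grown = duplication_model(triangle, n_target=50, p=0.4, r=0.1)
    print(sorted(len(nbrs) for nbrs in grown.values()))
```

Running the sketch from different seed graphs (for example a triangle versus a larger dense subgraph) with the same `p` and `r` is one simple way to observe how much the resulting topology depends on the seed.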


Intelligent Systems in Molecular Biology | 2008

Biomolecular network motif counting and discovery by color coding

Noga Alon; Phuong Dao; Iman Hajirasouliha; Fereydoun Hormozdiari; S. Cenk Sahinalp

Protein–protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks. Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k ≤ 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k ≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the ‘color coding’ technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G′ with k vertices in a network G with n vertices in time polynomial in n, provided k = O(log n). We use our algorithm to obtain ‘treelet’ distributions for k ≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and a multicellular organism (Caenorhabditis elegans) which is significantly different. Furthermore, the treelet distributions of the unicellular organisms are similar to that obtained by the ‘duplication model’ but are quite different from that of the ‘preferential attachment model’. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70% but differences can be observed when bait/edge coverage drops to 50%. Contact: [email protected]
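As an illustration of the color coding idea, here is a minimal Python sketch for the simplest tree motif, a path on k vertices. It is a textbook-style version under stated assumptions, not the authors' implementation: vertices are colored uniformly at random with k colors, paths whose vertices all have distinct colors ("colorful" paths) are counted by dynamic programming over color subsets, and the count is rescaled by the probability k!/k^k that a fixed path is colorful. The function names are hypothetical.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count non-induced simple paths on k >= 2 vertices with all-distinct colors.

    adj: dict vertex -> set of neighbours (undirected graph).
    colors: dict vertex -> color in range(k).
    """
    # dp[(v, mask)] = number of colorful paths ending at v that use
    # exactly the colors in the bitmask `mask`.
    dp = {(v, 1 << colors[v]): 1 for v in adj}
    for _ in range(k - 1):
        nxt = {}
        for (v, mask), cnt in dp.items():
            for u in adj[v]:
                bit = 1 << colors[u]
                if mask & bit:
                    continue  # color already used, so u cannot extend the path
                key = (u, mask | bit)
                nxt[key] = nxt.get(key, 0) + cnt
        dp = nxt
    # Each undirected path was built once from each of its two ends.
    return sum(dp.values()) // 2


def estimate_path_count(adj, k, trials=200, seed=0):
    """Estimate the number of k-vertex paths by averaging over random colorings."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k ** k
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / trials / p_colorful


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    print(estimate_path_count(triangle, k=3, trials=2000))
```

Because distinct colors imply distinct vertices, the dynamic program never has to remember which vertices a partial path visited, only which colors; this is what keeps the running time polynomial in n when k = O(log n).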


RNA | 2008

Conifers have a unique small RNA silencing signature

Elena V. Dolgosheina; Ryan D. Morin; Gozde Aksay; S. Cenk Sahinalp; Vincent Magrini; Elaine R. Mardis; Jim Mattsson; Peter J. Unrau

Plants produce small RNAs to negatively regulate genes, viral nucleic acids, and repetitive elements at either the transcriptional or post-transcriptional level in a process that is referred to as RNA silencing. While RNA silencing has been extensively studied across the different phyla of the animal kingdom (e.g., mouse, fly, worm), similar studies in the plant kingdom have focused primarily on angiosperms, thus limiting evolutionary studies of RNA silencing in plants. Here we report on an unexpected phylogenetic difference in the size distribution of small RNAs among the vascular plants. By extracting total RNA from freshly growing shoot tissue, we conducted a survey of small RNAs in 24 vascular plant species. We find that conifers, which radiated from the other seed-bearing plants approximately 260 million years ago, fail to produce significant amounts of 24-nucleotide (nt) RNAs that are known to guide DNA methylation and heterochromatin formation in angiosperms. Instead, they synthesize a diverse population of small RNAs that are exactly 21-nt long. This finding was confirmed by high-throughput sequencing of the small RNA sequences from a conifer, Pinus contorta. A conifer EST search revealed the presence of a novel Dicer-like (DCL) family, which may be responsible for the observed change in small RNA expression. No evidence for DCL3, an enzyme that matures 24-nt RNAs in angiosperms, was found. We hypothesize that the diverse class of 21-nt RNAs found in conifers may help to maintain organization of their unusually large genomes.

Collaboration


Dive into S. Cenk Sahinalp's collaborations.

Top Co-Authors

Phuong Dao

National Institutes of Health

Funda Ergün

Simon Fraser University

Alexander W. Wyatt

University of British Columbia
