Tolga Can
Middle East Technical University
Publication
Featured research published by Tolga Can.
Journal of Cheminformatics | 2015
Martin Krallinger; Obdulia Rabal; Florian Leitner; Miguel Vazquez; David Salgado; Zhiyong Lu; Robert Leaman; Yanan Lu; Donghong Ji; Daniel M. Lowe; Roger A. Sayle; Riza Theresa Batista-Navarro; Rafal Rak; Torsten Huber; Tim Rocktäschel; Sérgio Matos; David Campos; Buzhou Tang; Hua Xu; Tsendsuren Munkhdalai; Keun Ho Ryu; S. V. Ramanan; Senthil Nathan; Slavko Žitnik; Marko Bajec; Lutz Weber; Matthias Irmer; Saber A. Akhondi; Jan A. Kors; Shuo Xu
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/
Computational Systems Bioinformatics | 2003
Tolga Can; Yuan-Fang Wang
We present a new method for conducting protein structure similarity searches, which improves on the accuracy, robustness, and efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. To improve matching accuracy, we smooth the noisy raw atomic coordinate data with spline fitting. To improve matching efficiency, we adopt a hierarchical coarse-to-fine strategy. We use an efficient hashing-based technique to screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Contrary to other hashing based techniques, our technique employs domain specific information (not just geometric information) in constructing the hash key, and hence, is more tuned to the domain of biology. Furthermore, the invariancy, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to discover new, meaningful motifs that were not reported by other structure alignment methods.
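The abstract does not say which differential-geometric quantities make up the shape signature. As a minimal sketch, assuming the signature is built from curvature and torsion (the classical invariants of a 3D space curve under rotation and translation), the following estimates both along a uniformly sampled curve with central finite differences. The function name `curvature_torsion` and the helix used as input are illustrative and not taken from the paper, and the spline smoothing step mentioned above is omitted.

```python
import math

def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))

def curvature_torsion(points, h):
    """Estimate (curvature, torsion) at each interior sample of a 3D curve.

    points: list of (x, y, z) samples taken at a uniform parameter step h.
    Two samples are dropped at each end because the third derivative
    uses a five-point stencil.
    """
    signature = []
    for i in range(2, len(points) - 2):
        m2, m1, p0, p1, p2 = points[i - 2:i + 3]
        # Central-difference estimates of the first three derivatives.
        d1 = tuple((a - b) / (2 * h) for a, b in zip(p1, m1))
        d2 = tuple((a - 2 * c + b) / h ** 2 for a, c, b in zip(p1, p0, m1))
        d3 = tuple((a - 2 * b + 2 * c - d) / (2 * h ** 3)
                   for a, b, c, d in zip(p2, p1, m1, m2))
        c = cross(d1, d2)
        c_sq = dot(c, c)
        kappa = math.sqrt(c_sq) / math.sqrt(dot(d1, d1)) ** 3
        # Torsion is undefined on straight segments; report 0.0 there.
        tau = dot(c, d3) / c_sq if c_sq > 0 else 0.0
        signature.append((kappa, tau))
    return signature

# Illustrative input: a helix with radius 1 and pitch parameter 0.5.
step = 0.01
helix = [(math.cos(k * step), math.sin(k * step), 0.5 * k * step)
         for k in range(200)]
signature = curvature_torsion(helix, step)
print(signature[0])
```

For a helix both quantities are constant along the curve, which makes it a convenient sanity check: a rotated or translated copy of the same curve yields the same signature.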
Proceedings of the 5th International Workshop on Bioinformatics | 2005
Tolga Can; Orhan Çamoǧlu; Ambuj K. Singh
Genome wide protein networks have become reality in recent years due to high throughput methods for detecting protein interactions. Recent studies show that a networked representation of proteins provides a more accurate model of biological systems and processes compared to conventional pair-wise analyses. Complementary to the availability of protein networks, various graph analysis techniques have been proposed to mine these networks for pathway discovery, function assignment, and prediction of complex membership. In this paper, we propose using random walks on graphs for the complex/pathway membership problem. We evaluate the proposed technique on three different probabilistic yeast networks using a benchmark dataset of 27 complexes from the MIPS complex catalog database and 10 pathways from the KEGG pathway database. Furthermore, we compare the proposed technique to two other existing techniques both in terms of accuracy and running time performance, thus addressing the scalability issue of such analysis techniques for the first time. Our experiments show that the random walk technique achieves similar or better accuracy with more than 1,000 times speed-up compared to the best competing technique.
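The abstract does not specify which random walk variant is used. As a minimal sketch, assuming a random walk with restart from the known members of a complex on an undirected, weighted interaction network, the following ranks the remaining proteins by their stationary visiting probability. The function name, the restart probability of 0.3, and the toy network are all illustrative assumptions and not taken from the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary probabilities of a random walk that restarts at the seeds.

    adj: dict mapping node -> dict of neighbor -> edge weight (symmetric).
    seeds: set of nodes known to belong to the complex or pathway.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: send its probability mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in adj[u].items():
                # Step to a neighbor with probability proportional to edge weight.
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy network (hypothetical): A, B, C form a tight cluster; D and E hang off C.
network = {
    "A": {"B": 0.9, "C": 0.9},
    "B": {"A": 0.9, "C": 0.9},
    "C": {"A": 0.9, "B": 0.9, "D": 0.1},
    "D": {"C": 0.1, "E": 0.9},
    "E": {"D": 0.9},
}
scores = random_walk_with_restart(network, seeds={"A", "B"})
candidates = sorted((n for n in scores if n not in {"A", "B"}),
                    key=scores.get, reverse=True)
print(candidates)
```

Proteins that are well connected to the seed set accumulate more probability mass than those reachable only through weak or long paths, so the ranking reflects proximity in the network rather than direct interaction alone.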
Nucleic Acids Research | 2012
Begum H. Akman; Tolga Can; A. Elif Erson-Bensan
3′-Untranslated region (UTR) shortening of mRNAs via alternative polyadenylation (APA) has important ramifications for gene expression. By using proximal APA sites and switching to shorter 3′-UTRs, proliferating cells avoid miRNA-mediated repression. Such APA and 3′-UTR shortening events may explain the basis of some of the proto-oncogene activation cases observed in cancer cells. In this study, we investigated whether 17β-estradiol (E2), a potent proliferation signal, induces APA and 3′-UTR shortening to activate proto-oncogenes in estrogen receptor positive (ER+) breast cancers. Our initial probe-based screen of independent expression arrays suggested upregulation and 3′-UTR shortening of an essential regulator of DNA replication, CDC6 (cell division cycle 6), upon E2 treatment. We further confirmed the E2- and ER-dependent upregulation and 3′-UTR shortening of CDC6, which led to increased CDC6 protein levels and higher BrdU incorporation. Consequently, miRNA binding predictions and dual luciferase assays suggested that 3′-UTR shortening of CDC6 was a mechanism to avoid 3′-UTR-dependent negative regulations. Hence, we demonstrated CDC6 APA induction by the proliferative effect of E2 in ER+ cells and provided new insights into the complex regulation of APA. E2-induced APA is likely to be an important but previously overlooked mechanism of E2-responsive gene expression.
Bioinformatics | 2003
Tolga Can; Yujun Wang; Yuan-Fang Wang; Jianwen Su
Motivation: Many tools have been developed to visualize protein structures. Tools based on Java 3D(TM) are compatible across different systems and can be run remotely through web browsers. However, using Java 3D for visualization has some performance issues. The primary concerns about molecular visualization tools based on Java 3D are their slow interaction speed and their inability to load large molecules. This behavior is especially apparent when the number of atoms to be displayed is huge, or when several proteins are to be displayed simultaneously for comparison. Results: In this paper we present techniques for organizing a Java 3D scene graph to tackle these problems. We have developed a protein visualization system based on Java 3D and these techniques. We demonstrate the effectiveness of the proposed method by comparing the visualization component of our system with two other Java 3D based molecular visualization tools. In particular, for van der Waals display mode, with the efficient organization of the scene graph, we could achieve up to eight times improvement in rendering speed and could load molecules three times as large as the previous systems could. Availability: FPV is freely available with source code at the following URL: http://www.cs.ucsb.edu/~tcan/fpv/
Pacific Symposium on Biocomputing | 2003
Arnab Bhattacharya; Tolga Can; Tamer Kahveci; Ambuj K. Singh; Yuan-Fang Wang
We consider the problem of similarity searches on protein databases based on both sequence and structure information simultaneously. Our program extracts feature vectors from both the sequence and structure components of the proteins. These feature vectors are then combined and indexed using a novel multi-dimensional index structure. For a given query, we employ this index structure to find candidate matches from the database. We develop a new method for computing the statistical significance of these candidates. The candidates with high significance are then aligned to the query protein using the Smith-Waterman technique to find the optimal alignment. The experimental results show that our method can classify up to 97% of the superfamilies and up to 100% of the classes correctly according to the SCOP classification. Our method is up to 37 times faster than CTSS, a recent structure search technique, combined with Smith-Waterman technique for sequences.
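The final alignment step named above, Smith-Waterman local alignment, can be sketched as follows. This is a textbook version with a linear gap penalty, not the authors' implementation; the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration, and real protein alignments would normally use a substitution matrix instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local alignment).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = smith_waterman("ACACACTA", "AGCACACA")
print(score)
print(top)
print(bottom)
```

Because the dynamic program is quadratic in the sequence lengths, running it only on the candidates that survive the index-based filtering is what keeps the overall search fast.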
Bioinformatics | 2006
Orhan Çamoğlu; Tolga Can; Ambuj K. Singh
Motivation: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. Results: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.
BioSystems | 2011
M. Sualp; Tolga Can
1. Introduction
Recent discoveries in genetics show that gene expression in higher eukaryotes is regulated in part by small non-coding RNA molecules named microRNAs (miRNAs) (Ambros, 2004; Bartel, 2004; Lim et al., 2003). In plants, miRNAs regulate protein expression by binding to the coding region of the corresponding messenger RNA (mRNA). Plant miRNAs exhibit near-perfect complementarity with the target sites. In animals, miRNAs usually bind to the 3′ untranslated region (UTR) of the target mRNA and have only limited complementarity to their target sites (Bartel, 2004). miRNAs are involved in many important cellular processes (Cui et al., 2006) and their dysregulation is related to anomalies such as cancer (He et al., 2005) and heart diseases (Care et al., 2007; Zhao et al., 2007). In recent years, several thousand miRNAs were identified in animals and plants. As of June 2010, miRBase release 15 (Griffiths-Jones et al., 2008) contains 14,197 distinct mature miRNA sequences. Current major challenges in miRNA research are: (1) identification of novel miRNA genes, (2) functional annotation of miRNAs, (3) identification of genes targeted by miRNAs,
Molecular Cancer Research | 2016
Ayse Elif Erson-Bensan; Tolga Can
Advancements in sequencing and transcriptome analysis methods have led to seminal discoveries that have begun to unravel the complexity of cancer. These studies are paving the way toward the development of improved diagnostics, prognostic predictions, and targeted treatment options. However, it is clear that pieces of the cancer puzzle are still missing. In an effort to have a more comprehensive understanding of the development and progression of cancer, we have come to appreciate the value of the noncoding regions of our genomes, partly due to the discovery of miRNAs and their significance in gene regulation. Interestingly, the miRNA–mRNA interactions are not solely dependent on variations in miRNA levels. Instead, the majority of genes harbor multiple polyadenylation signals on their 3′ UTRs (untranslated regions) that can be differentially selected on the basis of the physiologic state of cells, resulting in alternative 3′ UTR isoforms. Deregulation of alternative polyadenylation (APA) is attracting increasing interest in cancer research, because APA generates mRNA 3′ UTR isoforms with potentially different stabilities, subcellular localizations, translation efficiencies, and functions. This review focuses on the link between APA and cancer and discusses the mechanisms as well as the tools available for investigating APA events in cancer. Overall, detection of deregulated APA-generated isoforms in cancer may implicate some proto-oncogene activation cases of unknown causes and may help the discovery of novel cases, thus contributing to a better understanding of molecular mechanisms of cancer. Mol Cancer Res; 14(6); 507–17. ©2016 AACR.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012
Seyedsasan Hashemikhabir; Eyup Serdar Ayaz; Yusuf Kavurucu; Tolga Can; Tamer Kahveci
Reconstructing the topology of a signaling network by means of RNA interference (RNAi) technology is an underdetermined problem especially when a single gene in the network is knocked down or observed. In addition, the exponential search space limits the existing methods to small signaling networks of size 10-15 genes. In this paper, we propose integrating RNAi data with a reference physical interaction network. We formulate the problem of signaling network reconstruction as finding the minimum number of edit operations on a given reference network. The edit operations transform the reference network to a network that satisfies the RNAi observations. We show that using a reference network does not simplify the computational complexity of the problem. Therefore, we propose two methods which provide near optimal results and can scale well for reconstructing networks up to hundreds of components. We validate the proposed methods on synthetic and real data sets. Comparison with the state of the art on real signaling networks shows that the proposed methodology can scale better and generates biologically significant results.
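To make the notion of edit operations concrete, here is a minimal sketch that only counts them. It assumes a directed network and that each edit operation is a single edge insertion or deletion; the abstract does not define the operations, and the function name and gene labels are hypothetical. The harder part of the problem, searching for a network that satisfies the RNAi observations with the fewest edits, is not shown.

```python
def edit_operations(reference_edges, target_edges):
    """List the edge insertions and deletions that turn reference into target.

    Both arguments are iterables of directed edges given as (source, target).
    """
    ref = set(reference_edges)
    tgt = set(target_edges)
    insertions = sorted(tgt - ref)  # edges missing from the reference
    deletions = sorted(ref - tgt)   # reference edges absent from the target
    return insertions, deletions

# Hypothetical reference network and a candidate that explains the observations.
reference = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
candidate = [("g1", "g2"), ("g2", "g4"), ("g3", "g4")]
insertions, deletions = edit_operations(reference, candidate)
edit_cost = len(insertions) + len(deletions)
print(insertions, deletions, edit_cost)
```

Under these assumptions the cost of a candidate network is the size of the symmetric difference between its edge set and that of the reference, which is the quantity a reconstruction method would try to minimize.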