Mehmet Koyutürk | Researchain

Publication

Featured research published by Mehmet Koyutürk.

Journal of Computational Biology | 2006

Pairwise Alignment of Protein Interaction Networks

Mehmet Koyutürk; Yohan Kim; Umut Topkara; Shankar Subramaniam; Wojciech Szpankowski

With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracy and computational cost.

intelligent systems in molecular biology | 2004

An efficient algorithm for detecting frequent subgraphs in biological networks

Mehmet Koyutürk; Wojciech Szpankowski

MOTIVATION With the rapidly increasing amount of network and interaction data in molecular biology, the problem of effectively analyzing this data is an important one. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation with subgraph isomorphism. RESULTS This paper presents an innovative new algorithm for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable. Indeed, we show experimentally that our algorithm can extract frequently occurring patterns in metabolic pathways extracted from the KEGG database within seconds. The proposed model and algorithm are applicable to a variety of biological networks either directly or with minor modifications. AVAILABILITY Implementation of the proposed algorithms in the C programming language is available as open source at http://www.cs.purdue.edu/homes/koyuturk/pathway/

Bioinformatics | 2011

Comparative analysis of algorithms for next-generation sequencing read alignment

Matthew Ruffalo; Thomas LaFramboise; Mehmet Koyutürk

MOTIVATION The advent of next-generation sequencing (NGS) techniques presents many novel opportunities for many applications in life sciences. The vast number of short reads produced by these techniques, however, poses significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. As the developers of these packages optimize their algorithms with respect to various considerations, the relative merits of different software packages remain unclear. However, for scientists who generate and use NGS data for their specific research projects, an important consideration is choosing the software that is most suitable for their application. RESULTS With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage. We also develop criteria to compare the performances of software with disparate output structure (e.g. some packages return a single alignment while some return multiple possible alignments). Using these criteria, we comprehensively evaluate the performances of Bowtie, BWA, mr- and mrsFAST, Novoalign, SHRiMP and SOAPv2, with regard to accuracy and runtime. CONCLUSION We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g. identification of genomic variants). AVAILABILITY Seal is available as open source at http://compbio.case.edu/seal/.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
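
To make the simulation idea in this abstract concrete, here is a minimal sketch of sampling short reads from a reference at a given coverage and substitution-error rate. It is not Seal's implementation: the function name `simulate_reads`, its parameters, and the toy reference string are assumptions for illustration, and indels are left out.

```python
import random

def simulate_reads(reference, read_len, coverage, error_rate, seed=0):
    """Sample reads uniformly from `reference` and inject substitution errors.

    Returns a list of (start, read) pairs. `start` is the true origin, which an
    evaluation could compare against the position reported by an aligner.
    All names here are hypothetical, not taken from Seal.
    """
    rng = random.Random(seed)
    # Coverage = total sequenced bases / reference length.
    n_reads = int(coverage * len(reference) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace with a different base; indels are not modelled here.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads

reference = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATCGA"  # toy 40-base reference
reads = simulate_reads(reference, read_len=10, coverage=5, error_rate=0.02)
```

Keeping the true start position with each read is what would let a benchmark score an aligner's output, whether it returns one alignment or several candidates.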

PLOS Computational Biology | 2010

An integrative -omics approach to identify functional sub-networks in human colorectal cancer.

Rod K. Nibbe; Mehmet Koyutürk; Mark R. Chance

Emerging evidence indicates that gene products implicated in human cancers often cluster together in “hot spots” in protein-protein interaction (PPI) networks. Additionally, small sub-networks within PPI networks that demonstrate synergistic differential expression with respect to tumorigenic phenotypes were recently shown to be more accurate classifiers of disease progression when compared to single targets identified by traditional approaches. However, many of these studies rely exclusively on mRNA expression data, a useful but limited measure of cellular activity. Proteomic profiling experiments provide information at the post-translational level, yet they generally screen only a limited fraction of the proteome. Here, we demonstrate that integration of these complementary data sources with a “proteomics-first” approach can enhance the discovery of candidate sub-networks in cancer that are well-suited for mechanistic validation in disease. We propose that small changes in the mRNA expression of multiple genes in the neighborhood of a protein-hub can be synergistically associated with significant changes in the activity of that protein and its network neighbors. Further, we hypothesize that proteomic targets with significant fold change between phenotype and control may be used to “seed” a search for small PPI sub-networks that are functionally associated with these targets. To test this hypothesis, we select proteomic targets having significant expression changes in human colorectal cancer (CRC) from two independent 2-D gel-based screens. Then, we use random walk based models of network crosstalk and develop novel reference models to identify sub-networks that are statistically significant in terms of their functional association with these proteomic targets. Subsequently, using an information-theoretic measure, we evaluate synergistic changes in the activity of identified sub-networks based on genome-wide screens of mRNA expression in CRC. 
Cross-classification experiments to predict disease class show excellent performance using only a few sub-networks, underscoring the strength of the proposed approach in discovering relevant and reproducible sub-networks.
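
As a rough illustration of the random-walk idea mentioned in this abstract, the sketch below runs a random walk with restart from a seed protein on a toy network and ranks nodes by steady-state visit probability. The graph, the restart probability, and the function name are assumptions for illustration. The paper's crosstalk model and its reference models for statistical significance are not reproduced here.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visit probabilities of a walker that restarts at the seeds.

    adj:   dict mapping node -> list of neighbours (undirected, no isolated nodes).
    seeds: nodes the walker jumps back to (hypothetical proteomic targets).
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed ...
        nxt = {v: restart * p0[v] for v in nodes}
        # ... otherwise move to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy network: a triangle A-B-C with a tail C-D-E.
toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
       "D": ["C", "E"], "E": ["D"]}
scores = random_walk_with_restart(toy, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes close to the seed in the network receive higher scores, which is the sense in which a seed can be used to search for functionally associated neighbours.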

sensor, mesh and ad hoc communications and networks | 2005

Redundant reader elimination in RFID systems

Bogdan Carbunar; Murali Krishna Ramanathan; Mehmet Koyutürk; Christoph M. Hoffmann

While recent technological advances have motivated large-scale deployment of RFID systems, a number of critical design issues remain unresolved. In this paper we deal with detecting redundant RFID readers (the redundant reader problem). The underlying difficulty associated with this problem arises from the lack of collision detection mechanisms, the potential inability of RFID readers to relay packets generated by other readers, and severe resource constraints on RFID tags. We prove that an optimal solution to the redundant reader problem is NP-hard and propose a randomized, distributed, and localized approximation algorithm, RRE. We provide a detailed probabilistic analysis of the accuracy and time complexity of RRE and conduct elaborate simulations to demonstrate their correctness and efficiency.
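
The redundant reader problem can be viewed as a covering problem: keep a set of readers that still covers every tag and switch off the rest. The sketch below uses a plain centralized greedy set-cover heuristic to show that formulation. It is not RRE, which the abstract describes as randomized, distributed, and localized; the function name and the toy coverage map are assumptions.

```python
def greedy_active_readers(coverage):
    """Pick readers greedily until every tag is covered.

    coverage: dict mapping reader id -> set of tag ids it can read.
    Returns the set of readers to keep active; all others are redundant.
    Centralized illustration only, not the distributed RRE algorithm.
    """
    uncovered = set().union(*coverage.values())
    active = set()
    while uncovered:
        # Choose the reader covering the most still-uncovered tags
        # (ties broken by reader id for determinism).
        best = max(sorted(coverage), key=lambda r: len(coverage[r] & uncovered))
        active.add(best)
        uncovered -= coverage[best]
    return active

coverage = {
    "R1": {"t1", "t2", "t3"},
    "R2": {"t2", "t3"},
    "R3": {"t3", "t4"},
    "R4": {"t4"},
}
active = greedy_active_readers(coverage)
redundant = set(coverage) - active
```

On this toy input the heuristic keeps R1 and R3, so R2 and R4 can be switched off without losing coverage of any tag.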

Journal of Computational Biology | 2006

Detecting Conserved Interaction Patterns in Biological Networks

Mehmet Koyutürk; Yohan Kim; Shankar Subramaniam; Wojciech Szpankowski

Molecular interaction data plays an important role in understanding biological processes at a modular level by providing a framework for understanding cellular organization, functional hierarchy, and evolutionary conservation. As the quality and quantity of network and interaction data increase rapidly, the problem of effectively analyzing this data becomes significant. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation to subgraph isomorphism. This paper presents an innovative new algorithm, MULE, for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique based on ortholog contraction, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable and scalable to large numbers of networks. We show, experimentally, that our algorithm can extract frequently occurring patterns in metabolic pathways and protein interaction networks from the KEGG, DIP, and BIND databases within seconds. When compared to existing approaches, our graph simplification technique can be viewed either as a pruning heuristic, or a closely related, but computationally simpler task. When used as a pruning heuristic, we show that our technique reduces effective graph sizes significantly, accelerating existing techniques by several orders of magnitude! Indeed, for most of the test cases, existing techniques could not even be applied without our pruning step. When used as a stand-alone analysis technique, MULE is shown to convey significant biological insights at near-interactive rates. The software, sample input graphs, and detailed results for comprehensive analysis of nine eukaryotic PPI networks are available at www.cs.purdue.edu/homes/koyuturk/mule.

Biodata Mining | 2011

DADA: Degree-Aware Algorithms for Network-Based Disease Gene Prioritization

Sinan Erten; Gurkan Bebek; Rob M. Ewing; Mehmet Koyutürk

BACKGROUND High-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the observation that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Information flow based methods alleviate these problems to a certain extent, by considering indirect interactions and multiplicity of paths. RESULTS We demonstrate that existing methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Motivated by this observation, we propose several statistical adjustment methods to account for the degree distribution of known disease and candidate genes, using a PPI network with associated confidence scores for interactions. We show that the proposed methods can detect loosely connected disease genes that are missed by existing approaches; however, this improvement might come at the price of more false negatives for highly connected genes. Consequently, we develop a suite called DADA, which includes different uniform prioritization methods that effectively integrate existing approaches with the proposed statistical adjustment strategies. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that DADA outperforms existing methods in prioritizing candidate disease genes. CONCLUSIONS These results demonstrate the importance of employing accurate statistical models and associated adjustment methods in network-based disease gene prioritization, as well as other network-based functional inference applications. DADA is implemented in Matlab and is freely available at http://compbio.case.edu/dada/.

research in computational molecular biology | 2005

Pairwise local alignment of protein interaction networks guided by models of evolution

Mehmet Koyutürk; Wojciech Szpankowski

With the ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Recent algorithms on aligning PPI networks target simplified structures such as conserved pathways to render these problems computationally tractable. However, since conserved structures that are parts of functional modules and protein complexes generally correspond to dense subnets of the network, algorithms that are able to extract conserved patterns in terms of general graphs are necessary. With this motivation, we focus here on discovering protein sets that induce subnets that are highly conserved in the interactome of a pair of species. For this purpose, we develop a framework that formally defines the pairwise local alignment problem for PPI networks, models the problem as a graph optimization problem, and presents fast algorithms for this problem. In order to capture the underlying biological processes correctly, we base our framework on duplication/divergence models that focus on understanding the evolution of PPI networks. Experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively (in terms of accuracy and computational cost). While we focus on pairwise local alignment of PPI networks in this paper, the proposed algorithm can be easily adapted to finding matches for a subnet query in a database of PPI networks.

IEEE Transactions on Knowledge and Data Engineering | 2005

Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets

Mehmet Koyutürk; Naren Ramakrishnan

This paper presents an efficient framework for error-bounded compression of high-dimensional discrete-attribute data sets. Such data sets, which frequently arise in a wide variety of applications, pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing these data sets. The proposed framework, PROXIMUS, provides a technique for reducing large data sets into a much smaller set of representative patterns, on which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show desirable properties of PROXIMUS in terms of runtime, scalability to large data sets, and performance in terms of capability to represent data in a compact form and discovery and interpretation of interesting patterns. We also demonstrate sample applications of PROXIMUS in association rule mining and semantic classification of term-document matrices. Our experimental results on real data sets show that use of the compressed data for association rule mining provides excellent precision and recall values (above 90 percent) across a range of problem parameters while reducing the time required for analysis drastically. We also show excellent interpretability of the patterns discovered by PROXIMUS in the context of clustering and classification of terms and documents. In doing so, we establish PROXIMUS as a tool for both preprocessing data before applying computationally expensive algorithms and directly extracting correlated patterns.

research in computational molecular biology | 2006

Assessing significance of connectivity and conservation in protein interaction networks

Mehmet Koyutürk; Wojciech Szpankowski

Computational and comparative analysis of protein-protein interaction (PPI) networks enables understanding of the modular organization of the cell through identification of functional modules and protein complexes. These analysis techniques generally rely on topological features such as connectedness, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work in our lab and in other labs has focused on efficient algorithms for identification of modules and their conservation. Application of these methods to a variety of networks has yielded novel biological insights. In spite of algorithmic advances, development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools such as BLAST and CLUSTAL. One critical component of this infrastructure is a measure of the statistical significance of a match or a dense subcomponent. Corresponding sequence-based measures such as E-values are key components of sequence matching tools. In the absence of an analytical measure, conventional methods rely on computer simulations based on ad-hoc models for quantifying significance. This paper presents the first such effort, to the best of our knowledge, aimed at analytically quantifying statistical significance of dense components and matches in reference model graphs. We consider two reference graph models – a G(n,p) model in which each pair of nodes has an identical likelihood, p, of sharing an edge, and a two-level G(n,p) model, which accounts for high-degree hub nodes generally occurring in PPI networks. We argue that by choosing conservatively the value of p, the G(n,p) model will dominate that of the power-law graph that is often used to model PPI networks.
We also propose a method for evaluating statistical significance based on the results derived from this analysis, and demonstrate the use of these measures for assessing significant structures in PPI networks. Experiments performed on a rich collection of PPI networks show that the proposed model provides a reliable means of evaluating statistical significance of dense patterns in these networks.
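
For intuition about significance under the G(n,p) reference model described above, the sketch below computes a standard first-moment (union) bound: the probability that some set of k nodes induces at least m edges is at most C(n,k) times the binomial tail P(Bin(C(k,2), p) >= m). This is a textbook bound offered as an illustration, not the analytical result derived in the paper, and the function names are assumptions.

```python
from math import comb

def binomial_tail(trials, successes, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p**i * (1 - p)**(trials - i)
               for i in range(successes, trials + 1))

def dense_subgraph_bound(n, k, m, p):
    """Upper bound on P(some k-node subset of G(n,p) induces >= m edges).

    Union bound over all C(n, k) subsets; each subset has C(k, 2) node pairs,
    and each pair is an edge independently with probability p.
    """
    pairs = comb(k, 2)
    return min(1.0, comb(n, k) * binomial_tail(pairs, m, p))

# Hypothetical example: a 6-clique (15 edges) in a sparse 1000-node random graph.
bound = dense_subgraph_bound(n=1000, k=6, m=15, p=0.01)
```

A small bound means such a dense pattern would be very unlikely to arise by chance in the reference model. A bound of 1.0 is uninformative.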
