Wojciech Szpankowski
Purdue University
Publications
Featured research published by Wojciech Szpankowski.
Journal of Computational Biology | 2006
Mehmet Koyutürk; Yohan Kim; Umut Topkara; Shankar Subramaniam; Wojciech Szpankowski
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to yield novel biological insights through the use of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph-theoretic measure of similarity between graph structures that accurately captures the underlying biological phenomena. In this respect, modeling the conservation and divergence of interactions, as well as interpreting the resulting alignments, are important design considerations. In this paper, we develop a framework for comprehensive alignment of PPI networks, inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to match, mismatch, and duplication in network alignment, and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of the resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in sequence alignment, our model allows its parameters to be adjusted to quantify the underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve it.
Detailed experimental results from an implementation of the proposed framework show that our algorithm discovers conserved interaction patterns very effectively, in terms of both accuracy and computational cost.
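The match/mismatch/duplication idea can be illustrated with a small scoring sketch. It is not the scoring function from the paper: the function name `alignment_score`, the parameters `match`, `mismatch`, `dup_weight`, and `dup_threshold`, and the way paralog similarity enters the score are all hypothetical choices made for illustration. An alignment is assumed to be a one-to-one list of protein pairs (u, v), with u from the first network and v from the second.

```python
from itertools import combinations


def alignment_score(pairs, edges1, edges2, paralog_sim,
                    match=1.0, mismatch=0.5, dup_weight=1.0, dup_threshold=0.5):
    """Hypothetical score of a one-to-one alignment pairs = [(u, v), ...].

    u is a protein of network 1, v a protein of network 2. paralog_sim maps
    frozenset({p, q}) to a similarity in [0, 1] for proteins p, q of the
    same network (an illustrative stand-in for sequence similarity).
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    score = 0.0
    for (u, v), (x, y) in combinations(pairs, 2):
        in1 = frozenset((u, x)) in e1
        in2 = frozenset((v, y)) in e2
        if in1 and in2:
            score += match        # interaction conserved in both networks
        elif in1 or in2:
            score -= mismatch     # interaction present in only one network
        # Duplication term: similar paralogs in the same network are rewarded,
        # dissimilar ones (similarity below dup_threshold) are penalized.
        for p, q in ((u, x), (v, y)):
            sim = paralog_sim.get(frozenset((p, q)))
            if sim is not None:
                score += dup_weight * (sim - dup_threshold)
    return score


# Network 1: A-B, B-C; network 2: a-b; A and C are treated as similar paralogs.
print(alignment_score([("A", "a"), ("B", "b"), ("C", "c")],
                      [("A", "B"), ("B", "C")], [("a", "b")],
                      {frozenset(("A", "C")): 0.9}))
```

In this toy example the conserved A-B/a-b interaction earns a match (+1), B-C/b-c is a mismatch (-0.5), and the A/C paralog pair adds a duplication bonus (+0.4), so the score is approximately 0.9 under the default weights.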
Intelligent Systems in Molecular Biology | 2004
Mehmet Koyutürk; Wojciech Szpankowski
MOTIVATION: With the rapidly increasing amount of network and interaction data in molecular biology, effectively analyzing these data is an important problem. Graph-theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation to subgraph isomorphism. RESULTS: This paper presents a new algorithm for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique that is ideally suited to biological networks, our algorithm renders these problems computationally tractable. Indeed, we show experimentally that our algorithm can extract frequently occurring patterns from metabolic pathways in the KEGG database within seconds. The proposed model and algorithm are applicable to a variety of biological networks, either directly or with minor modifications. AVAILABILITY: An implementation of the proposed algorithms in the C programming language is available as open source at http://www.cs.purdue.edu/homes/koyuturk/pathway/
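The abstract does not spell out the graph simplification. One simplification that makes such problems tractable, assumed here purely for illustration, is that each node label (for example, an enzyme) occurs at most once per network; a pattern is then fully determined by its set of labeled edges, and frequent connected patterns can be mined with Apriori-style itemset counting instead of subgraph isomorphism tests. This sketch follows that assumption; `frequent_edge_sets`, `is_connected`, `min_support`, and `max_size` are hypothetical names, and the code is not the authors' C implementation.

```python
from itertools import combinations


def is_connected(edge_set):
    """True if the undirected edges in edge_set form one connected component."""
    adj = {}
    for a, b in (tuple(e) for e in edge_set):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(adj)


def frequent_edge_sets(graphs, min_support, max_size=3):
    """Map each connected edge set found in >= min_support graphs to its support.

    Assumes node labels are unique within each graph, so containment of a
    labeled edge set is equivalent to the pattern occurring in that graph.
    """
    transactions = [{frozenset(e) for e in g} for g in graphs]

    def support(pattern):
        return sum(pattern <= t for t in transactions)

    items = {e for t in transactions for e in t}
    level = [frozenset([e]) for e in items if support(frozenset([e])) >= min_support]
    frequent, size = {}, 1
    while level and size <= max_size:
        for pattern in level:
            frequent[pattern] = support(pattern)
        size += 1
        # Join two frequent patterns of the previous size that differ by one edge.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = [c for c in candidates
                 if is_connected(c) and support(c) >= min_support]
    return frequent


pathways = [[("E1", "E2"), ("E2", "E3"), ("E3", "E4")],
            [("E1", "E2"), ("E2", "E3")],
            [("E1", "E2"), ("E2", "E3"), ("E5", "E6")]]
for pattern, sup in frequent_edge_sets(pathways, min_support=3).items():
    print(sorted(tuple(sorted(e)) for e in pattern), sup)
```

Restricting candidates to connected edge sets loses nothing here: any connected pattern with two or more edges is the union of two connected sub-patterns with one edge fewer, and both of those are at least as frequent.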
Theoretical Computer Science | 1995
Philippe Jacquet; Wojciech Szpankowski
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length n into phrases of variable length such that each new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the number M_n of phrases that one can construct from a sequence of length n. In this paper, for a memoryless source with unequal symbol probabilities, we derive the limiting distribution of M_n, which turns out to be normal. This settles a long-standing open problem. In fact, to obtain this result we solved another open problem, namely, that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation that often arises in the analysis of algorithms on words. Interestingly enough, our findings are proved by a combination of probabilistic techniques, such as the renewal equation and uniform integrability, and analytical techniques, such as the Mellin transform, differential-functional equations, and de-Poissonization. In concluding remarks we indicate a possibility of extending our results to Markovian models.
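A short sketch of the parsing rule described above (often called LZ78 parsing) makes M_n concrete. Counting an incomplete final phrase, which may repeat an earlier one, is a convention assumed by this sketch. Because every new phrase extends an earlier phrase (or the empty phrase) by one symbol, the phrases form a digital tree whose node depths are the phrase lengths; the hypothetical helper `is_prefix_closed` checks this property.

```python
import random


def lz78_phrases(seq):
    """Split seq into phrases, each the shortest prefix of the rest not yet seen."""
    seen, phrases, current = set(), [], ""
    for ch in seq:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # incomplete last phrase (already seen); counted anyway
        phrases.append(current)
    return phrases


def is_prefix_closed(phrases):
    """Each phrase minus its last symbol is empty or an earlier phrase."""
    seen = set()
    for p in phrases:
        if p[:-1] and p[:-1] not in seen:
            return False
        seen.add(p)
    return True


phrases = lz78_phrases("abaababb")
print(phrases, len(phrases))  # ['a', 'b', 'aa', 'ba', 'bb'] 5

# M_n for a memoryless source with unequal symbol probabilities (0.7, 0.3).
rng = random.Random(1)
text = "".join(rng.choices("ab", weights=[0.7, 0.3], k=10_000))
print(len(lz78_phrases(text)))
```

For the string "abaababb" (n = 8) the parse has M_n = 5 phrases whose lengths sum to 8, i.e., the sum of node depths in the associated digital tree equals n.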
International Symposium on Information Theory | 1997
Mireille Régnier; Wojciech Szpankowski
Consider a given pattern H and a random text T generated by a Markovian source. We study the frequency of pattern occurrences in a random text when overlapping copies of the pattern are counted separately. We present exact and asymptotic formulae for the moments (including the variance) and for the probability of r pattern occurrences in three different regimes of r, namely: (i) r = O(1), (ii) the central limit regime, and (iii) the large deviations regime. To derive these results, we first construct certain language expressions that characterize pattern occurrences, which are later translated into generating functions. We then use analytical methods to extract the asymptotic behavior of the pattern frequency from the generating functions. These findings are of particular interest for molecular biology (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition), information theory (e.g., second-order properties of the relative frequency), and pattern matching algorithms (e.g., q-gram algorithms).
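For the memoryless special case (a simplification: the paper treats Markovian sources), the basic quantities can be computed directly. This sketch counts overlapping occurrences, lists the pattern's autocorrelation set (the shifts at which the pattern overlaps itself, which is what lets occurrences overlap and so affects the variance), and evaluates the mean count (n - m + 1) P(H) for a text of length n and a pattern of length m. The function names are hypothetical.

```python
from math import prod


def count_overlapping(text, pattern):
    """Occurrences of pattern in text, with overlapping copies counted separately."""
    m = len(pattern)
    return sum(text[i:i + m] == pattern for i in range(len(text) - m + 1))


def autocorrelation_set(pattern):
    """Shifts k with 0 <= k < m at which pattern overlaps itself."""
    m = len(pattern)
    return [k for k in range(m) if pattern[k:] == pattern[:m - k]]


def expected_count_memoryless(n, pattern, probs):
    """E[#occurrences] = (n - m + 1) * P(H) for an i.i.d. text of length n."""
    return (n - len(pattern) + 1) * prod(probs[ch] for ch in pattern)


print(count_overlapping("aaaa", "aa"))  # 3
print(autocorrelation_set("abab"))  # [0, 2]
print(expected_count_memoryless(10, "ab", {"a": 0.5, "b": 0.5}))  # 2.25
```

Note that `math.prod` requires Python 3.8 or later.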
International Conference on Data Mining | 2003
Robert Gwadera; Mikhail J. Atallah; Wojciech Szpankowski
IEEE Transactions on Information Theory | 1997
Tomasz Luczak; Wojciech Szpankowski
IEEE Transactions on Information Theory | 2004
Michael Drmota; Wojciech Szpankowski
Journal of Computational Biology | 2006
Mehmet Koyutürk; Yohan Kim; Shankar Subramaniam; Wojciech Szpankowski
Journal of Combinatorial Theory | 1994
Philippe Jacquet; Wojciech Szpankowski
SIAM Journal on Computing | 1993
Wojciech Szpankowski
Suffix trees find several applications in computer science and telecommunications, most notably in algorithms on strings, data compression, and codes. Despite this, very little is known about their typical behavior. In a probabilistic framework, a family of suffix trees, further called b-suffix trees, built from the first n suffixes of a random word is considered. In this family a noncompact suffix tree (i.e., one in which every edge is labeled by a single symbol) is represented by b = 1, and a compact suffix tree (i.e., one without unary nodes) is asymptotically equivalent to b → ∞ as n → ∞.
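To make the role of b concrete, here is a sketch that builds a b-suffix tree from the first n suffixes of a word and reports the depth at which each suffix ends up. It assumes the usual b-trie convention, which the abstract does not spell out: a node holding at most b suffixes becomes a leaf, and any larger node is split by the next symbol. A terminator `$` (assumed absent from the word) keeps distinct suffixes from being prefixes of one another. The function name is hypothetical.

```python
def b_suffix_tree_depths(word, n, b=1):
    """Leaf depth of each of the first n suffixes of word in a b-suffix tree.

    Assumed convention: a node with at most b suffixes is a leaf (bucket);
    otherwise its suffixes are split among children by the next symbol.
    """
    assert "$" not in word and 1 <= n <= len(word) and b >= 1
    suffixes = [word[i:] + "$" for i in range(n)]
    depths = [0] * n

    def build(indices, d):
        if len(indices) <= b:
            for i in indices:
                depths[i] = d
            return
        children = {}
        for i in indices:
            children.setdefault(suffixes[i][d], []).append(i)
        for group in children.values():
            build(group, d + 1)

    build(list(range(n)), 0)
    return depths


for b in (1, 2):
    depths = b_suffix_tree_depths("aaaa", 4, b)
    print(b, depths, "height =", max(depths))
# 1 [4, 4, 3, 2] height = 4
# 2 [3, 3, 3, 2] height = 3
```

With b = 1 every suffix sits alone in its leaf, the noncompact case mentioned in the abstract; a larger b lets up to b suffixes share a leaf, so the tree becomes shallower.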