Ioan Tabus

Tampere University of Technology

Publication

Featured research published by Ioan Tabus.

ACM Transactions on Information Systems | 2005

An efficient normalized maximum likelihood algorithm for DNA sequence compression

Gergely Korodi; Ioan Tabus

This article presents an efficient algorithm for DNA sequence compression, which achieves the best compression ratios reported over a test set commonly used for evaluating DNA compression programs. The algorithm introduces many refinements to a compression method that combines (1) encoding by a simple normalized maximum likelihood (NML) model for discrete regression, through reference to preceding approximate matching blocks, (2) encoding by first-order context coding, and (3) representing strings in the clear, to make efficient use of the redundancy sources in DNA data while keeping execution times short. One of the main algorithmic features is the constraint that matching blocks include reasonably long contiguous matches, which not only significantly reduces the search time but can also be used to modify the NML model so that it exploits the constraint to obtain shorter code lengths. The algorithm handles the changing statistics of DNA data adaptively and, by predictively encoding the matching pointers, succeeds in compressing long approximate matches. Apart from a comparison with previous DNA encoding methods, we present compression results for the recently published human genome data.
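As a rough illustration of the first-order context coding mentioned in the abstract above, the following minimal sketch computes the ideal code length (in bits) of a DNA string under an adaptive order-1 model. It is not the authors' implementation: the function name `order1_code_length`, the add-one (Laplace) count initialization, and the choice to charge 2 bits for the first base are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def order1_code_length(seq, alphabet="ACGT"):
    """Ideal adaptive code length in bits under a first-order context model.

    Hypothetical helper: each symbol is predicted from the previous symbol
    using counts that start at 1 (Laplace smoothing) and are updated after
    every symbol, so a decoder could reproduce the same probabilities.
    """
    if not seq:
        return 0.0
    # One table of counts per context (the previous symbol).
    counts = defaultdict(lambda: {s: 1 for s in alphabet})
    # Assumption: the first symbol has no context and is sent in the clear.
    bits = math.log2(len(alphabet))
    for prev, cur in zip(seq, seq[1:]):
        ctx = counts[prev]
        bits += -math.log2(ctx[cur] / sum(ctx.values()))
        ctx[cur] += 1  # adapt to the statistics seen so far
    return bits


# A strongly repetitive string costs far less than 2 bits per base.
repetitive = "AC" * 50
print(order1_code_length(repetitive), "bits for", len(repetitive), "bases")
```

A real coder would feed these adaptive probabilities to an arithmetic coder; the sketch only reports the code length such a coder would approach.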

IEEE Transactions on Signal Processing | 2001

On the parameterization of positive real sequences and MA parameter estimation

Bogdan Dumitrescu; Ioan Tabus; Petre Stoica

An algorithm for moving average (MA) parameter estimation was proposed by Stoica et al. (see ibid. vol.48, p.1999-2012, 2000). Its key step (covariance fitting) is a semidefinite programming (SDP) problem with two convex constraints: one reflecting the real positiveness of the desired covariance sequence and the other having a second-order cone form. We analyze two parameterizations of a positive real sequence and show that there is a one-to-one correspondence between them. We also show that the dual of the covariance fitting problem has a significantly smaller number of variables and, thus, a much reduced computational complexity. We discuss in detail the formulations that are best suited for the currently available semidefinite quadratic programming packages. Experimental results show that the execution times of the newly proposed algorithms scale well with the MA order, making them convenient for large-order MA signals.

Data Compression Conference | 2003

DNA sequence compression using the normalized maximum likelihood model for discrete regression

Ioan Tabus; Gergely Korodi; Jorma Rissanen

The use of the normalized maximum likelihood (NML) model for encoding sequences known to have regularities in the form of approximate repetitions was discussed. A particular version of the NML model was presented for discrete regression, which was shown to provide a very powerful yet simple model for encoding the approximate repeats in DNA sequences. Combining the model of repeats with a simple first-order Markov model, a fast lossless compression method was obtained that compares favorably with the existing DNA compression programs. It is remarkable that a simple model, which recursively updates a small number of parameters, is able to reach the state-of-the-art compression ratio for DNA sequences with much more complex models. Being a minimum description length (MDL) model, the NML model may later prove to be useful in studying global and local features of DNA or possibly of other biological sequences.
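To make the idea of NML coding of approximate repeats more concrete, here is a minimal sketch of one textbook-style formulation; it is an assumption for illustration, not necessarily the exact model of the paper. A block is encoded relative to an earlier reference block of the same length; each base either copies the reference (probability 1 - θ) or is replaced by one of the other three bases (probability θ/3 each). The NML code length is the maximized negative log-likelihood plus the log of a normalizer summed over all possible blocks. The names `nml_normalizer` and `nml_code_length` are hypothetical.

```python
import math


def nml_normalizer(n):
    """Sum of maximized likelihoods over all length-n blocks.

    Grouping blocks by their number of mismatches m, the alphabet-size
    factors cancel and the sum reduces to the Bernoulli NML normalizer.
    """
    total = 0.0
    for m in range(n + 1):
        p = m / n
        # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
        total += math.comb(n, m) * (p ** m) * ((1 - p) ** (n - m))
    return total


def nml_code_length(block, reference, alphabet_size=4):
    """NML code length in bits of `block` given an aligned `reference`."""
    if len(block) != len(reference) or not block:
        raise ValueError("block and reference must be non-empty and equally long")
    n = len(block)
    m = sum(a != b for a, b in zip(block, reference))  # number of mismatches
    p = m / n  # maximum likelihood estimate of the mismatch rate
    max_likelihood = ((1 - p) ** (n - m)) * ((p / (alphabet_size - 1)) ** m)
    return -math.log2(max_likelihood) + math.log2(nml_normalizer(n))


# An approximate repeat is cheaper than 2 bits per base when mismatches are few.
print(nml_code_length("ACGTACGT", "ACGTACGT"))
print(nml_code_length("ACGTACGT", "ACGTACGA"))
```

A full compressor would also have to encode the pointer to the reference block, which this sketch leaves out.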

Molecular & Cellular Proteomics | 2010

Integrated Proteomics and Genomics Analysis Reveals a Novel Mesenchymal to Epithelial Reverting Transition in Leiomyosarcoma through Regulation of Slug

Jilong Yang; James A. Eddy; Yuan Pan; Andrea Hategan; Ioan Tabus; Yingmei Wang; David Cogdell; Nathan D. Price; Raphael E. Pollock; Alexander J. Lazar; Kelly K. Hunt; Jonathan C. Trent; Wei Zhang

Leiomyosarcoma is one of the most common mesenchymal tumors. Proteomics profiling analysis by reverse-phase protein lysate array surprisingly revealed that expression of the epithelial marker E-cadherin (encoded by CDH1) was significantly elevated in a subset of leiomyosarcomas. In contrast, E-cadherin was rarely expressed in the gastrointestinal stromal tumors, another major mesenchymal tumor type. We further sought to 1) validate this finding, 2) determine whether there is a mesenchymal to epithelial reverting transition (MErT) in leiomyosarcoma, and if so 3) elucidate the regulatory mechanism responsible for this MErT. Our data showed that the epithelial cell markers E-cadherin, epithelial membrane antigen, cytokeratin AE1/AE3, and pan-cytokeratin were often detected immunohistochemically in leiomyosarcoma tumor cells on tissue microarray. Interestingly, the E-cadherin protein expression was correlated with better survival in leiomyosarcoma patients. Whole genome microarray was used for transcriptomics analysis, and the epithelial gene expression signature was also associated with better survival. Bioinformatics analysis of transcriptome data showed an inverse correlation between E-cadherin and E-cadherin repressor Slug (SNAI2) expression in leiomyosarcoma, and this inverse correlation was validated on tissue microarray by immunohistochemical staining of E-cadherin and Slug. Knockdown of Slug expression in SK-LMS-1 leiomyosarcoma cells by siRNA significantly increased E-cadherin; decreased the mesenchymal markers vimentin and N-cadherin (encoded by CDH2); and significantly decreased cell proliferation, invasion, and migration. An increase in Slug expression by pCMV6-XL5-Slug transfection decreased E-cadherin and increased vimentin and N-cadherin. Thus, MErT, which is mediated through regulation of Slug, is a clinically significant phenotype in leiomyosarcoma.

Bioinformatics | 2005

Robust estimation of protein expression ratios with lysate microarray technology

Cristian Mircean; Ilya Shmulevich; David Cogdell; Woonyoung Choi; Yu Jia; Ioan Tabus; Stanley R. Hamilton; Wei Zhang

Motivation: The protein lysate microarray is a developing proteomic technology for measuring protein expression levels in a large number of biological samples simultaneously. A challenge for accurate quantification is the relatively narrow dynamic range associated with the commonly used chromogenic signal detection system. To facilitate accurate measurement of the relative expression levels, each sample is serially diluted and each diluted version is spotted on a nitrocellulose-coated slide in triplicate. Thus, each sample yields multiple measurements in different dynamic ranges of the detection system. This study aims to develop suitable algorithms that yield accurate representations of the relative expression levels in different samples from multiple data points.

Results: We evaluated two algorithms for estimating relative protein expression in different samples on the lysate microarray by means of a cross-validation procedure. For this purpose as well as for quality control we designed a 1440-spot lysate microarray containing 80 identical samples of purified bovine serum albumin, printed in triplicate with six 2-fold dilutions. Our analysis showed that the algorithm based on a robust least squares estimator provided the most accurate quantification of the protein lysate microarray data. We also demonstrated our methods by estimating relative expression levels of p53 and p21 in either p53(+/+) or p53(-/-) HCT116 colon cancer cells after two drug treatments and their combinations on another lysate microarray.

Availability: http://www.cs.tut.fi/~mirceanc/lysate_array_bioinformatics.htm

EURASIP Journal on Bioinformatics and Systems Biology | 2008

Inference of gene regulatory networks based on a universal minimum description length

John Dougherty; Ioan Tabus; Jaakko Astola

The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that necessitates a tuning parameter for artificially balancing the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. To overcome this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov's structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements.

EURASIP Journal on Advances in Signal Processing | 2001

On the use of MDL principle in gene expression prediction

Ioan Tabus; Jaakko Astola

The structure and biological behavior of a cell are determined by the pattern of gene expressions within that cell. The so-called gene prediction problem refers to finding rules, or sets of possible rules, on how the expressions of certain genes determine the expression level of a given target gene. In this paper, we investigate the gene prediction problem and propose the use of new predictors, selected according to the minimum description length (MDL) principle. We compare the use of Boolean predictors, ternary predictors and perceptron predictors. We resort to MDL as a tool for selecting the proper size of the prediction window. MDL is also well suited for comparing predictors having different complexities. We show that the best description can be achieved by the Boolean and ternary predictors, since they obtain better fitting of the data with a lower complexity of the model. To illustrate the comparison, both synthetic and experimental data are used.

IEEE Transactions on Signal Processing | 2012

Greedy Sparse RLS

Bogdan Dumitrescu; Alexandru Onose; Petri Helin; Ioan Tabus

Starting from the orthogonal (greedy) least squares method, we build an adaptive algorithm for finding online sparse solutions to linear systems. The algorithm belongs to the exponentially windowed recursive least squares (RLS) family and maintains a partial orthogonal factorization with pivoting of the system matrix. For complexity reasons, the permutations that bring the relevant columns into the first positions are restrained mainly to interchanges between neighbors at each time moment. The storage scheme allows the computation of the exact factorization, implicitly working on indefinitely long vectors. The sparsity level of the solution, i.e., the number of nonzero elements, is estimated using information theoretic criteria, in particular the Bayesian information criterion (BIC) and predictive least squares. We present simulations showing that, for identifying sparse time-varying FIR channels, our algorithm is consistently better than previous sparse RLS methods based on the ℓ1-norm regularization of the RLS criterion. We also use our sparse greedy RLS algorithm for computing linear predictions in a lossless audio coding scheme and obtain better compression than MPEG-4 ALS using an RLS-LMS cascade.
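For readers unfamiliar with the RLS family that the abstract above refers to, the sketch below shows the standard exponentially windowed RLS recursion for FIR channel identification. It is explicitly not the greedy sparse algorithm of the paper: there is no orthogonal factorization, pivoting, or sparsity-level selection. The function name `rls_identify` and the default values of `forgetting` and `delta` are assumptions chosen for illustration.

```python
import random


def rls_identify(inputs, desired, order, forgetting=0.99, delta=1e3):
    """Plain exponentially windowed RLS estimate of an FIR filter.

    Minimizes the sum of forgetting**(n - t) * error_t**2 recursively.
    `delta` scales the initial inverse correlation matrix P (assumed value).
    """
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for t in range(order - 1, len(inputs)):
        x = [inputs[t - i] for i in range(order)]  # regressor: latest samples
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(order))
        gain = [v / denom for v in Px]
        err = desired[t] - sum(w[i] * x[i] for i in range(order))  # a priori error
        w = [w[i] + gain[i] * err for i in range(order)]
        # P stays symmetric, so the row vector x^T P equals (P x)^T.
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(order)]
             for i in range(order)]
    return w


# Usage: identify a sparse 4-tap channel from noiseless data.
rng = random.Random(0)
channel = [0.8, 0.0, -0.4, 0.0]
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
d = [sum(channel[i] * u[t - i] for i in range(len(channel)) if t - i >= 0)
     for t in range(len(u))]
print(rls_identify(u, d, order=len(channel)))
```

Plain RLS estimates all taps, including the zero ones; a sparse variant would restrict the estimate to a selected subset of coefficients.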

Data Compression Conference | 2007

Normalized maximum likelihood model of order-1 for the compression of DNA sequences

Gergely Korodi; Ioan Tabus

We present the NML model for classes of models with memory described by first-order dependencies. The model is used for efficiently locating and encoding the best regressor present in a dictionary. By combining the order-1 NML with the order-0 NML model, the resulting algorithm achieves a consistent improvement over the earlier order-0 NML algorithm, and it is demonstrated to have superior performance on the practical compression of the human genome.

Signal Processing | 2003

Classification and feature gene selection using the normalized maximum likelihood model for discrete regression

Ioan Tabus; Jorma Rissanen; Jaakko Astola

This paper studies the problem of class discrimination based on the normalized maximum likelihood (NML) model for a nonlinear regression, where the nonlinearly transformed class labels, each taking M possible values, are assumed to be drawn from a multinomial trial process. The strength of the MDL methods in statistical inference is to find the model structure which, in this particular classification problem, amounts to finding the best set of feature genes. We first show that the minimization of the codelength of the NML model for different sets of feature genes is a tractable problem. We then extend the model for selecting the feature genes to a completely defined classifier and check its classification error in a cross-validation experiment. The quantization process involved in obtaining the required entries in the model can also be evaluated with the NML description length. The new classification method is applied to leukemia class discrimination based on gene expression microarray data. We find classification errors as low as 0.03% with a quadruplet of binary quantized genes, which was top ranked by the NML description length. Such a length of the class labels, obtained with various sets of feature genes in the nonlinear regression model, allows intuitive comparisons of nested feature sets.
