Matti Saarela
Tampere University of Technology
Publications
Featured research published by Matti Saarela.
Genome Biology | 2008
Sami Kilpinen; Reija Autio; Kalle Ojala; Kristiina Iljin; Elmar Bucher; Henri Sara; Tommi Pisto; Matti Saarela; Rolf Skotheim; Mari Björkman; John Patrick Mpindi; Saija Haapa-Paananen; Paula Vainio; Henrik Edgren; Maija Wolf; Jaakko Astola; Sampsa Hautaniemi; Olli Kallioniemi
Our knowledge of tissue- and disease-specific functions of human genes is rather limited and highly context-specific. Here, we have developed a method for comparing the mRNA expression levels of most human genes across 9,783 Affymetrix gene expression array experiments representing 43 normal human tissue types, 68 cancer types, and 64 other diseases. This database of gene expression patterns in normal human tissues and pathological conditions covers 113 million data points and is available from the GeneSapiens website.
Genes, Chromosomes and Cancer | 2008
Anna-Kaarina Järvinen; Reija Autio; Sami Kilpinen; Matti Saarela; Ilmo Leivo; Reidar Grénman; Antti A. Mäkitie; Outi Monni
Gene amplifications and deletions are frequent in head and neck squamous cell carcinomas (SCC), but the association of these alterations with gene expression is mostly unknown. Here, we characterized genome-wide copy number and gene expression changes on microarrays for 18 oral tongue SCC (OTSCC) cell lines. We identified a number of altered regions, including nine high-level amplifications such as 6q12-q14 (CD109, MYO6), 9p24 (JAK2, CD274, SLC1A1, RLN1), 11p12-p13 (TRAF6, COMMD9, TRIM44, FJX1, CD44, PDHX, APIP), 11q13 (FADD, PPFIA1, CTTN), and 14q24 (ABCD4, HBLD1, LTBP2, ZNF410, COQ6, ACYP1, JDP2), where 9% to 64% of genes showed overexpression. Across the whole genome, 26% of the amplified genes had associated overexpression in OTSCC. Furthermore, our data indicated that OTSCC cell lines harbored genomic alterations similar to those of the laryngeal SCC cell lines we had previously analyzed, suggesting that despite differences in clinicopathological features there are no marked differences in molecular genetic alterations between these two HNSCC sites. To identify genes whose expression was associated with copy number increase in head and neck SCC, we performed a statistical analysis of the oral tongue and laryngeal SCC cell line data. We pinpointed 1,192 genes that had a statistically significant association between copy number and gene expression. These results suggest that genomic alterations with associated gene expression changes play an important role in the malignant behavior of head and neck SCC. The identified genes provide a basis for further functional validation and may lead to the identification of novel candidates for targeted therapies. This article contains Supplementary Material available at http://www.interscience.wiley.com/jpages/1045-2257/suppmat.
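The abstract does not name the statistical test used to link copy number and expression. As a minimal sketch, assuming a per-gene Welch t-statistic that compares expression between samples with and without a copy number gain, such an association score could be computed as below. The `amp_threshold` log2-ratio cut-off, the dictionary layout and the function name are hypothetical.

```python
import math
from statistics import mean, variance


def copy_number_expression_scores(copy_number, expression, amp_threshold=0.3):
    """Welch t-statistic per gene: expression where the gene is gained vs. elsewhere.

    copy_number, expression: dict gene -> list of per-sample values, same sample order.
    amp_threshold: hypothetical log2-ratio cut-off for calling a copy number gain.
    Genes with fewer than two samples in either group, or zero variance, are skipped.
    """
    scores = {}
    for gene, cn in copy_number.items():
        expr = expression[gene]
        gained = [e for c, e in zip(cn, expr) if c > amp_threshold]
        rest = [e for c, e in zip(cn, expr) if c <= amp_threshold]
        if len(gained) < 2 or len(rest) < 2:
            continue
        # Welch standard error: group variances are not assumed equal
        se = math.sqrt(variance(gained) / len(gained) + variance(rest) / len(rest))
        if se == 0:
            continue
        scores[gene] = (mean(gained) - mean(rest)) / se
    return scores
```

A large positive score marks a gene whose expression rises where its copy number is gained; turning scores into significance calls (p-values, multiple-testing correction) is left out of this sketch.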
Oncogene | 2006
Anna-Kaarina Järvinen; Reija Autio; Saija Haapa-Paananen; Maija Wolf; Matti Saarela; Reidar Grénman; Ilmo Leivo; Olli Kallioniemi; Antti A. Mäkitie; Outi Monni
Molecular mechanisms contributing to the initiation and progression of head and neck squamous cell carcinoma are still poorly understood. Numerous genetic alterations have been described, but the molecular consequences of such alterations in most cases remain unclear. Here, we performed an integrated high-resolution microarray analysis of gene copy number and expression in 20 laryngeal cancer cell lines and primary tumors. Our aim was to identify genetic alterations that play a key role in disease pathogenesis and to pinpoint genes whose expression is directly affected by these events. Integration of DNA-level data from array-based comparative genomic hybridization with RNA-level information from oligonucleotide microarrays was achieved with custom-developed bioinformatic methods. High-level amplifications had a clear impact on gene expression. Across the genome, overexpression of 739 genes could be attributed to gene amplification events in cell lines, with 325 genes, including FADD and PPFIA1 at 11q13, showing the same phenomenon in primary tumors. The analysis of gene ontology and pathway distributions further pinpointed genes that may represent potential targets of therapeutic intervention. Our data highlight genes that may be critically important to laryngeal cancer progression and offer potential therapeutic targets.
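The integration of copy number and expression data can be illustrated with a simple rule that flags a gene in a sample when it is both amplified and overexpressed there. This is a hypothetical sketch, not the custom method used in the study: the thresholds `amp_cut` (log2 copy-number ratio) and `expr_cut` (expression z-score), the data layout and the function names are assumptions.

```python
def amplified_and_overexpressed(copy_number, expression, amp_cut=1.0, expr_cut=2.0):
    """Map each gene to the sample indices where it is both amplified and overexpressed.

    copy_number: gene -> per-sample log2 copy-number ratios (hypothetical layout)
    expression:  gene -> per-sample expression z-scores, same sample order
    """
    hits = {}
    for gene, cn in copy_number.items():
        samples = [i for i, (c, e) in enumerate(zip(cn, expression[gene]))
                   if c >= amp_cut and e >= expr_cut]
        if samples:
            hits[gene] = samples
    return hits


def fraction_amplified_with_overexpression(copy_number, expression, amp_cut=1.0, expr_cut=2.0):
    """Share of genes amplified in at least one sample that are also overexpressed there."""
    amplified = [g for g, cn in copy_number.items() if any(c >= amp_cut for c in cn)]
    if not amplified:
        return 0.0
    hits = amplified_and_overexpressed(copy_number, expression, amp_cut, expr_cut)
    return len(hits) / len(amplified)
```

`len(hits)` gives a count of amplification-associated overexpressed genes, and the fraction function gives the kind of "share of amplified genes with overexpression" figure these abstracts report; the real analyses may use different criteria.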
BMC Genomics | 2008
Henna Heinonen; Anni I. Nieminen; Matti Saarela; Anne Kallioniemi; Juha Klefström; Sampsa Hautaniemi; Outi Monni
Background: The 70 kDa ribosomal protein S6 kinase (RPS6KB1), located at 17q23, is amplified and overexpressed in 10–30% of primary breast cancers and breast cancer cell lines. p70S6K is a serine/threonine kinase regulated by the PI3K/mTOR pathway, which plays a crucial role in the control of cell cycle, growth and survival. Our aim was to determine p70S6K- and PI3K/mTOR/p70S6K pathway-dependent gene expression profiles by microarrays using five breast cancer cell lines with predefined gene copy number and gene expression alterations. The p70S6K-dependent profiles were determined by siRNA silencing of RPS6KB1 in two breast cancer cell lines overexpressing p70S6K. These profiles were further correlated with gene expression alterations caused by inhibition of the PI3K/mTOR pathway with the PI3K inhibitor LY294002 or the mTOR inhibitor rapamycin.
Results: Altogether, silencing of p70S6K altered the expression of 109 and 173 genes in the two breast cancer cell lines, and 67 genes, in addition to RPS6KB1, were altered in both cell lines. Furthermore, 17 genes, including VTCN1 and CDKN2B, overlapped with genes differentially expressed after PI3K or mTOR inhibition. The gene expression signatures responsive to both PI3K/mTOR pathway and p70S6K inhibition revealed previously unidentified genes, suggesting novel downstream targets of the PI3K/mTOR/p70S6K pathway.
Conclusion: Since p70S6K overexpression is associated with aggressive disease and poor prognosis in breast cancer patients, the potential downstream targets of p70S6K and of the whole PI3K/mTOR/p70S6K pathway identified in our study may have diagnostic value.
BMC Bioinformatics | 2009
Reija Autio; Sami Kilpinen; Matti Saarela; Olli-P. Kallioniemi; Sampsa Hautaniemi; Jaakko Astola
Background: Gene expression microarray technologies are widely used across most areas of biological and medical research. Comparing and integrating microarray data from different experiments would be very useful, but is currently very challenging because of differences in experimental and hybridization conditions, as well as in data preprocessing and normalization methods. Furthermore, even for the widely used, industry-standard Affymetrix oligonucleotide microarrays, the various array generations have different probe sets representing different genes, which hinders data integration.
Results: In this study, our objective is to find systematic approaches to normalize the data emerging from different Affymetrix array generations and from different laboratories. We compare and assess the accuracy of five normalization methods for Affymetrix gene expression data using 6,926 Affymetrix experiments from five array generations. The methods that we compare are 1) standardization, 2) housekeeping gene based normalization, 3) equalized quantile normalization, 4) Weibull distribution based normalization and 5) array generation based gene centering. Our results indicate that the best results are achieved when the data are normalized first within a sample and then between samples with Array Generation based gene Centering (AGC) normalization.
Conclusion: We conclude that with the AGC method, integrating different Affymetrix datasets yields values that are significantly more comparable across array generations than when no array generation based normalization is used. The AGC method was found to be the best method for normalizing data from several different array generations and for achieving comparable gene values across thousands of samples.
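A minimal sketch of the two-step idea behind AGC, assuming z-score standardization as the within-sample step and arithmetic-mean centering of each gene within each array generation. The function names, the dictionary-based data layout and the generation labels are hypothetical; the published method may differ in detail.

```python
from statistics import mean, pstdev


def standardize(values):
    """Within-sample step (assumed here): z-score so the sample has mean 0 and SD 1."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def agc_normalize(samples, generations):
    """Sketch of Array Generation based gene Centering (AGC).

    samples:     list of dicts, gene -> raw expression value (one dict per array)
    generations: array-generation label for each sample, e.g. "gen1" (hypothetical)
    """
    # Step 1: normalize each sample on its own.
    within = []
    for sample in samples:
        genes = list(sample)
        within.append(dict(zip(genes, standardize([sample[g] for g in genes]))))

    # Step 2: for every array generation, estimate each gene's mean over that generation...
    centers = {}
    for gen in set(generations):
        members = [w for w, g in zip(within, generations) if g == gen]
        genes = set().union(*members)
        centers[gen] = {gene: mean(m[gene] for m in members if gene in m) for gene in genes}

    # ...and subtract it, so gene values become comparable across generations.
    return [{gene: v - centers[gen][gene] for gene, v in w.items()}
            for w, gen in zip(within, generations)]
```

After centering, every gene has mean zero within each array generation, which removes generation-specific probe-set offsets at the cost of also removing any true biological difference between the sample groups run on different generations.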
BMC Bioinformatics | 2009
Reija Autio; Matti Saarela; Anna-Kaarina Järvinen; Sampsa Hautaniemi; Jaakko Astola
Background: Gene copy number and gene expression values play important roles in cancer initiation and progression. Both can be measured with high-throughput microarrays, and some methods exist to integrate and analyze these data. However, the varying gene sets on different gene expression and copy number microarrays present significant challenges.
Results: We report an advanced version of the previously published CGH-Plotter, which can rapidly identify amplified and deleted areas using gene copy number data. With CGH-Plotter v2, the copy number values can be filtered based on their genomic location in base-pair units. After filtering, the values for the missing genes can be interpolated. Moreover, the effect of non-informative areas in the genome can be systematically removed by smoothing and interpolation. Further, we developed a tool (ECN) to illustrate the CGH data values annotated based on gene expression. The ECN tool is a MATLAB toolbox enabling straightforward illustration of copy numbers annotated based on gene expression levels.
Conclusion: CGH-Plotter v2 provides two methods for analyzing copy number data: dynamic programming and genomic location based smoothing. With the ECN tool, the data analyzed with CGH-Plotter v2 can easily be illustrated along individual chromosomes or along the whole genome. The ECN tool plots the copy number data annotated based on the gene expression data, making it easy to find the genes that are both overexpressed and amplified, or underexpressed and deleted, in the samples. From the resulting figures it is straightforward to select interesting genes.
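Genomic location based interpolation and smoothing can be sketched as follows, assuming genes on a single chromosome sorted by base-pair position, linear interpolation for missing values, and a plain moving average over a ± `half_window_bp` window. These choices and the function names are illustrative assumptions, not CGH-Plotter v2's actual implementation.

```python
from bisect import bisect_left, bisect_right


def interpolate_missing(positions, values):
    """Fill None values by linear interpolation over base-pair position.

    positions must be sorted; values before the first (after the last) observed
    gene take the nearest observed value.
    """
    known = [(p, v) for p, v in zip(positions, values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    known_pos = [p for p, _ in known]
    out = []
    for p, v in zip(positions, values):
        if v is not None:
            out.append(v)
            continue
        j = bisect_left(known_pos, p)
        if j == 0:
            out.append(known[0][1])
        elif j == len(known):
            out.append(known[-1][1])
        else:
            (p0, v0), (p1, v1) = known[j - 1], known[j]
            out.append(v0 + (v1 - v0) * (p - p0) / (p1 - p0))
    return out


def smooth_by_location(positions, values, half_window_bp):
    """Replace each value by the mean over genes within ±half_window_bp of it."""
    out = []
    for p in positions:
        lo = bisect_left(positions, p - half_window_bp)
        hi = bisect_right(positions, p + half_window_bp)
        window = values[lo:hi]  # always contains the gene itself
        out.append(sum(window) / len(window))
    return out
```

Windowing by base pairs rather than by gene count keeps the smoothing scale constant in gene-dense and gene-poor regions, which matches the abstract's emphasis on filtering by genomic location.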
Archive | 2012
Petri Lehtinen; Matti Saarela; Tapio Elomaa
We show that ChiMerge, a commonly used attribute discretization algorithm grounded in sampling theory, can be implemented efficiently in the online setting. Its benefits are that it is efficient, statistically justified and robust to noise, that it can be made to produce low-arity partitions, and that it has been observed empirically to work well in practice.
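For reference, a sketch of classic (offline) ChiMerge: compute the chi-square statistic of the class counts of each pair of adjacent intervals and repeatedly merge the pair with the smallest value until every value reaches a threshold. This version rescans all pairs after each merge, so it is quadratic; a priority queue gives O(n log n) behavior. Skipping cells with zero expected count and the default threshold 2.706 (the 90% quantile of the chi-square distribution with 1 degree of freedom, appropriate for two classes) are choices made here, and the function names are hypothetical.

```python
def chi2_adjacent(a, b):
    """Chi-square statistic of the 2 x C table formed by class counts a and b."""
    col_totals = [x + y for x, y in zip(a, b)]
    n = sum(col_totals)
    chi2 = 0.0
    for row in (a, b):
        r = sum(row)
        for count, c in zip(row, col_totals):
            e = r * c / n  # expected count if both intervals share one class distribution
            if e > 0:      # skip cells whose expected count is zero (a choice made here)
                chi2 += (count - e) ** 2 / e
    return chi2


def chimerge(intervals, threshold=2.706):
    """Classic ChiMerge on a list of (lower_bound, class_counts) sorted by lower_bound."""
    intervals = [(lo, list(counts)) for lo, counts in intervals]
    while len(intervals) > 1:
        scores = [chi2_adjacent(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break  # every adjacent pair now differs significantly
        merged = [x + y for x, y in zip(intervals[i][1], intervals[i + 1][1])]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return intervals
```

For example, with intervals `[(1, [5, 0]), (2, [4, 0]), (3, [0, 6]), (4, [0, 5])]` (counts of two classes), the pure intervals of each class are merged and the result is two intervals starting at 1 and 3.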
Advances in Machine Learning I | 2010
Matti Saarela; Tapio Elomaa; Keijo Ruohonen
The relevance vector machine (RVM) is a Bayesian framework for learning sparse regression models and classifiers. Despite its popularity and practical success, no thorough analysis of its functionality exists. In this paper, we consider the RVM in the case of regression models and present two kinds of analytical results: we derive a full characterization of the behavior of the RVM when the columns of the regression matrix are orthogonal, and we give some results concerning the scale and rotation invariance of the RVM. We also consider the practical implications of our results and present a scenario in which they can be used to detect potential weaknesses in the RVM framework.
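To illustrate why orthogonal columns make the RVM tractable, the sketch below applies the standard re-estimation update alpha_i <- gamma_i / mu_i^2 with the noise precision beta held fixed, a simplifying assumption. With orthogonal columns the posterior covariance is diagonal, so each alpha_i evolves independently and its fixed point can be checked against a closed form. This is not the paper's derivation, and the function names are hypothetical.

```python
import math


def rvm_alphas_orthogonal(columns, targets, beta, alpha0=1.0, iters=1000, prune_at=1e12):
    """Re-estimate RVM weight precisions alpha_i for mutually orthogonal columns.

    Orthogonality is assumed, not checked. With fixed noise precision beta:
        Sigma_ii = 1 / (alpha_i + beta * s_i),  mu_i = beta * q_i * Sigma_ii,
        gamma_i  = 1 - alpha_i * Sigma_ii,      alpha_i <- gamma_i / mu_i**2,
    where s_i = ||phi_i||^2 and q_i = phi_i . t. alpha_i = inf marks a pruned basis function.
    """
    alphas = []
    for phi in columns:
        s = sum(p * p for p in phi)
        q = sum(p * t for p, t in zip(phi, targets))
        alpha = alpha0
        for _ in range(iters):
            sigma = 1.0 / (alpha + beta * s)  # posterior variance of w_i
            mu = beta * q * sigma             # posterior mean of w_i
            if mu == 0.0:
                alpha = math.inf
                break
            gamma = 1.0 - alpha * sigma       # how well w_i is determined by the data
            alpha = gamma / (mu * mu)
            if alpha > prune_at:              # treat a diverging precision as pruning
                alpha = math.inf
                break
        alphas.append(alpha)
    return alphas


def fixed_point_alpha(s, q, beta):
    """Closed-form fixed point of the update above; finite only if beta * q^2 > s."""
    d = beta * q * q - s
    return beta * s * s / d if d > 0 else math.inf
```

Substituting the formulas shows that each update is the affine map alpha <- (s / (beta q^2)) alpha + s^2 / q^2. It converges to `fixed_point_alpha` when beta q^2 > s and otherwise grows without bound, which is why weakly aligned columns are pruned.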
International Conference on Bioinformatics | 2006
Reija Autio; Sami Kilpinen; Matti Saarela; Sampsa Hautaniemi; Olli Kallioniemi; Jaakko Astola
Affymetrix human gene expression microarrays are widely used in gene expression analysis. However, the comparability of data analyzed in different laboratories is not self-evident, which hinders the integration of multiple data sets. In this study, we introduce a novel normalization method, Weibull distribution based normalization, that makes data from different laboratories easier to integrate and compare. The method normalizes the samples by correcting the maximum-likelihood (ML) estimates of the Weibull distribution parameters to be the same in every sample of the same array generation. The effects of Weibull distribution based normalization were studied by comparing the distributions of the samples, examining the deviations of the expression levels of housekeeping genes, and clustering the data.
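A minimal sketch of Weibull distribution based normalization: estimate the shape k and scale lam of each sample by maximum likelihood, then apply the power-and-scale map x -> lam_ref * (x / lam_hat) ** (k_hat / k_ref), which makes the sample's ML estimates equal to the reference values (k_ref, lam_ref). The specific transformation, the bisection solver, the function names and how the reference is chosen (for example, by averaging the estimates within an array generation) are assumptions; expression values must be positive.

```python
import math


def weibull_mle(xs, rel_tol=1e-12):
    """Maximum-likelihood shape k and scale lam of a Weibull fit to positive data xs.

    The shape solves 1/k + mean(ln x) - sum(x^k ln x) / sum(x^k) = 0, whose left side
    decreases in k, so bisection is safe. Powers x^k are computed as
    exp(k * (ln x - max ln x)) to avoid overflow; the common factor cancels.
    """
    logs = [math.log(x) for x in xs]
    n, top = len(logs), max(logs)
    if top == min(logs):
        raise ValueError("need at least two distinct values")
    mean_log = sum(logs) / n

    def score(k):
        w = [math.exp(k * (l - top)) for l in logs]
        return 1.0 / k + mean_log - sum(wi * l for wi, l in zip(w, logs)) / sum(w)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0:  # grow the bracket until the root lies in [lo, hi]
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # lam = (mean x^k)^(1/k), again computed relative to max ln x
    lam = math.exp(top + math.log(sum(math.exp(k * (l - top)) for l in logs) / n) / k)
    return k, lam


def weibull_normalize(xs, k_ref, lam_ref):
    """Transform xs so that its Weibull ML estimates become (k_ref, lam_ref)."""
    k_hat, lam_hat = weibull_mle(xs)
    b = k_hat / k_ref
    return [lam_ref * (x / lam_hat) ** b for x in xs]
```

The map works because the Weibull family is closed under x -> a * x ** b and the ML estimates transform accordingly (shape divided by b, scale mapped by the same formula), so refitting the normalized sample returns the reference parameters.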
International Symposium on Methodologies for Intelligent Systems | 2008
Tapio Elomaa; Petri Lehtinen; Matti Saarela
Cut point analysis for the discretization of numerical attributes has shown, for many commonly used attribute evaluation functions, that adjacent value range intervals with an equal relative class distribution may be merged without the risk of losing the optimal partition of the range. A natural idea is to relax this requirement and rely on a statistical test to decide whether the intervals are probably generated from the same distribution. ChiMerge is a classical algorithm for numerical interval processing that operates in exactly this manner. ChiMerge handles the interval merges in order of their statistical probability. However, in online processing of the data, the required O(n log n) time is too much. In this paper, we propose to perform the merges during a left-to-right scan of the intervals. This reduces the time required for merging to a more reasonable linear time. Moreover, such linear-time operations need not be performed for every example. Our empirical evaluation shows that intervals are effectively combined, that their growth rate remains very moderate even when the number of examples grows very large, and that the substantial reduction in the number of intervals can even benefit prediction accuracy.
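One possible reading of the left-to-right scheme, sketched under assumptions: each interval is compared only with its (possibly already merged) left neighbor and merged into it when the chi-square statistic falls below the threshold. A single pass makes n - 1 comparisons, each costing time proportional to the number of classes, hence linear time overall. The function names, the zero-expected-count convention and the default threshold 2.706 (90% quantile of chi-square with 1 degree of freedom) are hypothetical choices; the paper's exact procedure may differ.

```python
def chi2_adjacent(a, b):
    """Chi-square statistic of the 2 x C table formed by class counts a and b."""
    col_totals = [x + y for x, y in zip(a, b)]
    n = sum(col_totals)
    chi2 = 0.0
    for row in (a, b):
        r = sum(row)
        for count, c in zip(row, col_totals):
            e = r * c / n
            if e > 0:  # skip cells whose expected count is zero (a choice made here)
                chi2 += (count - e) ** 2 / e
    return chi2


def scan_merge(intervals, threshold=2.706):
    """Single left-to-right pass over (lower_bound, class_counts) intervals.

    Each interval is merged into its left neighbor when the two are not
    significantly different; otherwise it starts a new interval.
    """
    merged = []
    for lo, counts in intervals:
        counts = list(counts)
        if merged and chi2_adjacent(merged[-1][1], counts) < threshold:
            left_lo, left_counts = merged[-1]
            merged[-1] = (left_lo, [x + y for x, y in zip(left_counts, counts)])
        else:
            merged.append((lo, counts))
    return merged
```

Unlike the classic algorithm, this pass never revisits earlier decisions, so its result can depend on the scan order; the trade-off is that it avoids maintaining a global ordering of all adjacent pairs.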