Publication


Featured research published by Arto Klami.


BMC Cancer | 2009

Combined use of expression and CGH arrays pinpoints novel candidate genes in Ewing sarcoma family of tumors

Suvi Savola; Arto Klami; Abhishek Tripathi; Tarja Niini; Massimo Serra; Piero Picci; Samuel Kaski; Diana Zambelli; Katia Scotlandi; Sakari Knuutila

Background. Ewing sarcoma family of tumors (ESFT), characterized by t(11;22)(q24;q12), is one of the most common tumors of bone in children and young adults. In addition to EWS/FLI1 gene fusion, copy number changes are known to be significant for the underlying neoplastic development of ESFT and for patient outcome. Our genome-wide high-resolution analysis aimed to pinpoint the genomic regions of highest interest and possible target genes in these areas. Methods. Array comparative genomic hybridization (aCGH) and expression arrays were used to screen for copy number alterations and expression changes in ESFT patient samples. A total of 31 ESFT samples were analyzed by aCGH, and in 16 patients DNA and RNA level data, created by expression arrays, were integrated. Follow-up time for these patients was 5–192 months. Clinical outcome was statistically evaluated by Kaplan-Meier/log-rank methods, and RT-PCR was applied to 42 patient samples to study the gene of highest interest. Results. Copy number changes were detected in 87% of the cases. The most recurrent copy number changes were gains at 1q, 2, 8, and 12, and losses at 9p and 16q. Cumulative event-free survival (EFS) and overall survival (OS) were significantly better (P < 0.05) for primary tumors with three or fewer copy number changes than for tumors with a higher number of copy number aberrations. In three samples, copy number imbalances were detected in chromosomes 11 and 22 affecting the FLI1 and EWSR1 loci, suggesting that an unbalanced t(11;22) and subsequent duplication of the derivative chromosome harboring the fusion gene is a common event in ESFT. Further, amplifications on chromosomes 20 and 22 seen in one patient sample suggest a novel translocation type between EWSR1 and an unidentified fusion partner at 20q. In total, 20 novel ESFT-associated putative oncogenes and tumor suppressor genes were found in the integration analysis of array CGH and expression data. Quantitative RT-PCR to study the expression levels of the most interesting gene, HDGF, confirmed that its expression was higher than in control samples. However, no association between HDGF expression and patient survival was observed. Conclusion. We conclude that array CGH and integration analysis proved to be effective methods to identify chromosome regions and novel target genes involved in the tumorigenesis of ESFT.
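The Kaplan-Meier survival estimate named in the Methods can be illustrated with a minimal sketch. The function name `kaplan_meier` and the follow-up data below are hypothetical and are not taken from the study; the code only shows the standard product-limit calculation with right-censoring, not the authors' analysis pipeline.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time of each patient (e.g. in months)
    events:    1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data (months); NOT data from the paper.
months = [5, 12, 12, 30, 48, 60, 96, 192]
observed = [1, 1, 0, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(months, observed)
```

With this toy input the estimate steps down at months 5, 12, 30, and 60, reaching roughly 0.875, 0.75, 0.6, and 0.4. Censored patients lower the number at risk without producing a step.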


International Conference on Machine Learning | 2007

Local dependent components

Arto Klami; Samuel Kaski

We introduce a mixture of probabilistic canonical correlation analyzers model for analyzing local correlations, or more generally mutual statistical dependencies, in co-occurring data pairs. The model extends the traditional canonical correlation analysis and its probabilistic interpretation in three main ways. First, a full Bayesian treatment enables analysis of small samples (large p, small n, a crucial problem in bioinformatics, for instance), and rigorous estimation of the degree of dependency and independency. Second, the mixture formulation generalizes the method from global linearity to the more reasonable assumption of different kinds of dependencies for different kinds of data. Third, the method decomposes the variation in the data into shared and data set-specific components.


International Conference on Multimodal Interfaces | 2009

GaZIR: gaze-based zooming interface for image retrieval

László Kozma; Arto Klami; Samuel Kaski

We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is a sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing and clicking.


Neurocomputing | 2008

Probabilistic approach to detecting dependencies between data sets

Arto Klami; Samuel Kaski

We study data fusion under the assumption that data source-specific variation is irrelevant and only shared variation is relevant. Traditionally, the shared variation has been sought by maximizing a dependency measure, such as the correlation of linear projections in canonical correlation analysis (CCA). In this traditional framework it is hard to tackle overfitting and model order selection, and thus we turn to probabilistic generative modeling, which makes all tools of Bayesian inference applicable. We introduce a family of probabilistic models for the same task, and present conditions under which they seek dependency. We show that probabilistic CCA is a special case of the model family, and derive a new dependency-seeking clustering algorithm as another example. The solution is computed with variational Bayes.
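For orientation, the traditional approach that the abstract contrasts itself with, classical CCA, can be sketched in a few lines of NumPy. This is only the non-probabilistic baseline, not the probabilistic model family or the variational Bayes solution of the paper; the function name `cca`, the small ridge term `reg`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Classical CCA for X (n x p) and Y (n x q).

    Returns projection matrices A, B and the canonical correlations,
    sorted from largest to smallest.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each view with its Cholesky factor: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # M = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, corrs, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)     # projections for X
    B = np.linalg.solve(Ly.T, Vt.T)  # projections for Y
    return A, B, corrs


# Synthetic example: one shared latent signal plus view-specific noise.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, 0.5, -0.3]]) + 0.2 * rng.normal(size=(500, 3))
Y = z @ np.array([[0.8, -0.6]]) + 0.2 * rng.normal(size=(500, 2))
A, B, corrs = cca(X, Y)
```

On this toy data the first canonical correlation is high because both views share the latent signal `z`, while the second is close to zero. Classical CCA gives only point estimates, which is where the overfitting and model order selection problems mentioned in the abstract arise.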


Human Brain Mapping | 2013

Identifying fragments of natural speech from the listener's MEG signals.

Miika Koskinen; Jaakko Viinikanoja; Mikko Kurimo; Arto Klami; Samuel Kaski; Riitta Hari

It is a challenge for current signal analysis approaches to identify the electrophysiological brain signatures of continuous natural speech that the subject is listening to. To relate magnetoencephalographic (MEG) brain responses to the physical properties of such speech stimuli, we applied canonical correlation analysis (CCA) and a Bayesian mixture of CCA analyzers to extract MEG features related to the speech envelope. Seven healthy adults listened to news for an hour while their brain signals were recorded with whole-scalp MEG. We found shared signal time series (canonical variates) between the MEG signals and speech envelopes at 0.5–12 Hz. By splitting the test signals into equal-length fragments from 2 to 65 s (corresponding to 703 down to 21 pieces per the total speech stimulus), we obtained better than chance-level identification for speech fragments longer than 2–3 s that were not used in the model training. The applied analysis approach thus allowed identification of segments of natural speech by means of partial reconstruction of the continuous speech envelope (i.e., the intensity variations of the speech sounds) from MEG responses, provided means to empirically assess the time scales obtainable in speech decoding with the canonical variates, and demonstrated accurate identification of the heard speech fragments from the MEG data.
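The fragment identification step can be pictured with a small sketch. It assumes, as a simplification not confirmed by the abstract, that each fragment of an MEG-based envelope estimate is assigned to the reference fragment with which it has the highest Pearson correlation. The function name `identify_fragments` and the white-noise signals are hypothetical stand-ins, not MEG data or real speech envelopes.

```python
import numpy as np


def identify_fragments(reconstruction, reference, fragment_len):
    """Match each reconstructed fragment to the most correlated reference fragment.

    Returns the predicted fragment index for every fragment and the
    fraction of fragments identified correctly.
    """
    n_frag = len(reference) // fragment_len
    rec = reconstruction[: n_frag * fragment_len].reshape(n_frag, fragment_len)
    ref = reference[: n_frag * fragment_len].reshape(n_frag, fragment_len)
    # z-score each fragment so that the scaled dot product is Pearson correlation.
    rec = (rec - rec.mean(axis=1, keepdims=True)) / rec.std(axis=1, keepdims=True)
    ref = (ref - ref.mean(axis=1, keepdims=True)) / ref.std(axis=1, keepdims=True)
    corr = rec @ ref.T / fragment_len  # corr[i, j]: fragment i vs reference j
    predicted = corr.argmax(axis=1)
    accuracy = float((predicted == np.arange(n_frag)).mean())
    return predicted, accuracy


# Synthetic stand-ins: a random "envelope" and a noisy estimate of it.
rng = np.random.default_rng(0)
envelope = rng.normal(size=2000)
reconstruction = envelope + rng.normal(size=2000)
predicted, accuracy = identify_fragments(reconstruction, envelope, fragment_len=100)
```

Chance level in this setup is one over the number of fragments, so shorter fragments give both more candidates and noisier correlations, consistent with the abstract's observation that identification needs fragments longer than a few seconds.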


Data Mining and Knowledge Discovery | 2011

Matching samples of multiple views

Abhishek Tripathi; Arto Klami; Matej Orešič; Samuel Kaski

Multi-view learning studies how several views (different feature representations) of the same objects can best be utilized in learning. In other words, multi-view learning is analysis of co-occurrence data, where the observations are co-occurrences of samples in the views. Standard multi-view learning such as joint density modeling cannot be done in the absence of co-occurrence, when the views are observed separately and the identities of objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence of the observations in the different views, in order to enable joint analysis also in the absence of known co-occurrence. The method finds a matching that maximizes statistical dependency between the views, which is particularly suitable for multi-view methods such as canonical correlation analysis, which has the same objective. We apply the method to translational metabolomics, to identify differences and commonalities in metabolic processes in different species or tissues. The metabolite identities and roles in the different species are not generally known, and it is necessary to search for a matching. In this paper we show, using different metabolomics measurement batches as the views so that the ground truth is known, that the metabolite identities can be reliably matched by a consensus of several matching solutions.


International Conference on Artificial Neural Networks | 2002

Learning More Accurate Metrics for Self-Organizing Maps

Jaakko Peltonen; Arto Klami; Samuel Kaski

Improved methods are presented for learning metrics that measure only important distances. It is assumed that changes in primary data are relevant only to the extent that they cause changes in auxiliary data, available paired with the primary data. The metrics are here derived from estimators of the conditional density of the auxiliary data. More accurate estimators are compared, and a more accurate approximation to the distances is introduced. The new methods improved the quality of Self-Organizing Maps (SOMs) significantly for four of the five studied data sets.


European Conference on Machine Learning | 2010

Variational Bayesian mixture of robust CCA models

Jaakko Viinikanoja; Arto Klami; Samuel Kaski

We study the problem of extracting statistical dependencies between multivariate signals, to be used for exploratory analysis of complicated natural phenomena. In particular, we develop generative models for extracting the dependencies, made possible by the probabilistic interpretation of canonical correlation analysis (CCA). We introduce a mixture of robust canonical correlation analyzers, using the t-distribution to make the model robust to outliers and variational Bayesian inference for learning from noisy data. We demonstrate the improvements of the new model on artificial data, and further apply it to analyzing dependencies between MEG and measurements of the autonomic nervous system to illustrate potential use scenarios.


International Scholarly Research Notices | 2011

High Expression of Complement Component 5 (C5) at Tumor Site Associates with Superior Survival in Ewing's Sarcoma Family of Tumour Patients

Suvi Savola; Arto Klami; Samuel Myllykangas; Cristina Manara; Katia Scotlandi; Piero Picci; Sakari Knuutila; Jukka Vakkila

Background. Unlike in most adult-onset cancers, an association between typical paediatric neoplasms and inflammatory triggers is rare. We studied whether immune system-related genes are activated and have prognostic significance in Ewing's sarcoma family of tumors (ESFTs). Method. Data analysis was performed on gene expression profiles of 44 ESFT patients, 11 ESFT cell lines, and 18 normal skeletal muscle samples. Differential expression of 238 inflammation and 299 macrophage-related genes was analysed by t-test, and survival analysis was performed according to gene expression. Results. Inflammatory genes are activated in ESFT patient samples, as 38 of 238 (16%) inflammatory genes were upregulated (P < 0.001) when compared to cell lines. This inflammatory gene activation was characterized by significant enrichment of macrophage-related gene expression, with 58 of 299 (19%) genes upregulated (P < 0.001). High expression of complement component 5 (C5) correlated with better event-free (P = 0.01) and overall survival (P = 0.004) in a dose-dependent manner. Expression of C5 and its receptor C5aR1 was verified at the protein level by immunohistochemistry on an independent ESFT tumour tissue microarray. Conclusion. Immune system-related gene activation is observed in ESFT patient samples, and prognostically significant inflammatory genes (C5, JAK1, and IL8) for ESFT were identified.


IEEE Transactions on Neural Networks and Learning Systems | 2015

Group Factor Analysis

Arto Klami; Seppo Virtanen; Eemeli Leppäaho; Samuel Kaski

Factor analysis (FA) provides linear factors that describe the relationships between individual variables of a data set. We extend this classical formulation into linear factors that describe the relationships between groups of variables, where each group represents either a set of related variables or a data set. The model also naturally extends canonical correlation analysis to more than two sets, in a way that is more flexible than previous extensions. Our solution is formulated as variational inference of a latent variable model with structural sparsity, and it consists of two hierarchical levels: 1) the higher level models the relationships between the groups and 2) the lower level models the observed variables given the higher level. We show that the resulting solution solves the group factor analysis (GFA) problem accurately, outperforming alternative FA-based solutions as well as more straightforward implementations of GFA. The method is demonstrated on two life science data sets, one on brain activation and the other on systems biology, illustrating its applicability to the analysis of different types of high-dimensional data sources.
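The idea of factors that are shared by some groups of variables but not others can be illustrated by simulating data from a simple linear latent variable model with group-wise sparse loadings. This sketch covers only data generation under assumed settings (three groups, three factors, loadings fixed to 1 where a factor is active); it does not implement the variational inference or the hierarchical model of the paper, and all names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 2000
group_dims = [4, 3, 5]  # hypothetical sizes of three variable groups

# activity[m, k] = 1 when factor k is active in group m (group-wise sparsity).
activity = np.array([
    [1, 1, 0],  # group 0 uses factors 0 and 1
    [1, 0, 1],  # group 1 uses factors 0 and 2
    [0, 1, 0],  # group 2 uses factor 1 only
])

Z = rng.normal(size=(n_samples, activity.shape[1]))  # latent factors
groups = []
for m, dim in enumerate(group_dims):
    # Loadings are 1 for active factors and 0 otherwise (an illustrative choice).
    W = np.ones((activity.shape[1], dim)) * activity[m][:, None]
    groups.append(Z @ W + 0.1 * rng.normal(size=(n_samples, dim)))


def mean_cross_correlation(a, b):
    """Mean absolute Pearson correlation between columns of a and columns of b."""
    c = np.corrcoef(a, b, rowvar=False)
    return float(np.abs(c[: a.shape[1], a.shape[1]:]).mean())


shared_01 = mean_cross_correlation(groups[0], groups[1])  # share factor 0
shared_12 = mean_cross_correlation(groups[1], groups[2])  # share no factor
```

Groups 0 and 1 are clearly correlated through their common factor, whereas groups 1 and 2 are essentially uncorrelated. Recovering such an activity pattern from data alone is the kind of structure a GFA solution is meant to infer.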

Collaboration

Top Co-Authors

Seppo Virtanen
Helsinki Institute for Information Technology

Kitsuchart Pasupa
King Mongkut's Institute of Technology Ladkrabang

Aditya Jitta
Helsinki Institute for Information Technology

Janne Sinkkonen
Helsinki University of Technology