Ilari T. Nieminen | Researchain

Publication

Featured research published by Ilari T. Nieminen.

international symposium on neural networks | 2012

Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity

Timo Honkela; Juha Raitio; Krista Lagus; Ilari T. Nieminen; Nina Honkela; Mika Pantzar

A substantial amount of subjectivity is involved in how people use language and conceptualize the world. Computational methods and formal representations of knowledge usually neglect this kind of individual variation. We have developed a novel method, Grounded Intersubjective Concept Analysis (GICA), for the analysis and visualization of individual differences in language use and conceptualization. The GICA method first employs a conceptual survey or a text mining step to elicit, from varied groups of individuals, the particular ways in which they use terms and associated concepts. The subsequent analysis and visualization reveal potential underlying groupings of subjects, objects and contexts. One way of viewing the GICA method is to compare it with traditional word space models. In word space models such as latent semantic analysis (LSA), statistical analysis of word-context matrices reveals latent information; a common approach is to analyze term-document matrices. The GICA method extends this basic idea of term-document matrix analysis with a third dimension: the individuals. This leads to the formation of a third-order tensor of size subjects × objects × contexts. Once flattened into a matrix, these subject-object-context (SOC) tensors can again be analyzed using various computational methods, including principal component analysis (PCA), singular value decomposition (SVD), independent component analysis (ICA), or any existing or future method suitable for analyzing high-dimensional data sets. To demonstrate the use of the GICA method, we present the results of two case studies. In the first, GICA of health-related concepts is conducted. In the second, the State of the Union addresses by US presidents are analyzed.
In these case studies, we apply multidimensional scaling (MDS), the self-organizing map (SOM) and the Neighborhood Retrieval Visualizer (NeRV) as specific data analysis methods within the overall GICA method. The GICA method can be used, for instance, to support the education of heterogeneous audiences, public planning processes and participatory design, conflict resolution, environmental problem solving, interprofessional and interdisciplinary communication, product development processes, mergers of organizations, and building enhanced knowledge representations in the semantic web.
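The flattening step can be illustrated with a minimal Python sketch. It assumes the SOC tensor is stored as nested lists indexed `tensor[subject][obj][context]` and unfolds it so that each object becomes one row concatenating all of its subject-context values. The helper names `flatten_objects` and `distance_matrix` are hypothetical, and Euclidean distance is an assumption chosen only because MDS and NeRV typically start from pairwise distances; this is not the authors' implementation.

```python
import math


def flatten_objects(tensor):
    """Unfold an S x O x C nested list into O rows of length S * C.

    Row o concatenates tensor[0][o], tensor[1][o], ..., so every
    subject's context profile for object o ends up side by side.
    """
    n_subjects = len(tensor)
    n_objects = len(tensor[0])
    rows = []
    for o in range(n_objects):
        row = []
        for s in range(n_subjects):
            row.extend(tensor[s][o])
        rows.append(row)
    return rows


def distance_matrix(rows):
    # Pairwise Euclidean distances between the flattened object rows,
    # the kind of input a distance-based projection such as MDS consumes.
    return [[math.dist(a, b) for b in rows] for a in rows]


# Two subjects x three objects x two contexts (made-up toy values).
tensor = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]],
]
rows = flatten_objects(tensor)
print(rows[0])                   # [1.0, 0.0, 0.5, 0.5]
print(distance_matrix(rows)[0][1])  # 0.0: objects 0 and 1 are rated identically
```

Unfolding along a different mode, for example one row per subject, would instead compare individuals, which is how groupings of subjects rather than objects could be examined.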

international conference on artificial neural networks | 2010

Using correlation dimension for analysing text data

Ilkka Kivimäki; Krista Lagus; Ilari T. Nieminen; Jaakko J. Väyrynen; Timo Honkela

In this article, we study the scale-dependent dimensionality properties and overall structure of text data with a method that measures correlation dimension at different scales. As experimental results, we present analyses of text data sets from the Reuters and Europarl corpora and compare them with artificially generated point sets. A comparison is also made with speech data. The results reflect some typical properties of the data, and we discuss how the method could be used to improve various data analysis applications.
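The correlation dimension is conventionally estimated from the correlation integral C(r), the fraction of point pairs closer than r, as the slope of log C(r) against log r. The sketch below computes that slope between two scales r1 < r2, which yields a scale-dependent estimate. It is a generic Grassberger-Procaccia-style estimator under our own assumptions (Euclidean distance, brute-force enumeration of all pairs, hypothetical function names), not the specific procedure used in the paper.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    # All n(n-1)/2 Euclidean distances; brute force, so only for small n.
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def correlation_integral(dists, r):
    # C(r): fraction of point pairs whose distance is strictly below r.
    return sum(1 for d in dists if d < r) / len(dists)


def local_dimension(dists, r1, r2):
    # Slope of log C(r) versus log r between the scales r1 < r2.
    c1 = correlation_integral(dists, r1)
    c2 = correlation_integral(dists, r2)
    if c1 == 0:
        raise ValueError("no pairs closer than r1; choose a larger scale")
    return math.log(c2 / c1) / math.log(r2 / r1)


# Points evenly spaced on a line segment should give a value close to 1.
line = [(i / 199,) for i in range(200)]
print(local_dimension(pairwise_distances(line), 0.05, 0.1))
```

Repeating the estimate over a sequence of scale pairs gives the dimension as a function of scale; for real data, edge effects and sample size both bias the estimate at the largest and smallest scales.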

WSOM | 2013

Controlling Self-Organization and Handling Missing Values in SOM and GTM

Tommi Vatanen; Ilari T. Nieminen; Timo Honkela; Tapani Raiko; Krista Lagus

In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), the ramifications of how the algorithms are initialized, and their behavior in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarantee good learning results with complex, high-dimensional data. We propose initializing the GTM with the SOM and demonstrate the usefulness of this improvement using the ISOLET data set. We also propose a revision of the batch SOM algorithm, called the Imputation SOM, and show that the new algorithm is more robust in the presence of missing data. We compare the performance of the algorithms in the missing value imputation task. Finally, we announce a revised version of the SOM Toolbox for Matlab with added GTM functionality.
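The abstract does not spell out how the Imputation SOM revises the batch algorithm, so the sketch below shows only a common way for a batch SOM to handle missing values: components marked as NaN are ignored both when searching for the best-matching unit and when averaging prototypes, and missing entries are afterwards imputed from the best-matching prototype. It is not the paper's Imputation SOM. The 1-D map, Gaussian neighborhood, fixed `sigma` and all function names are assumptions for illustration.

```python
import math

NAN = float("nan")  # marker for a missing value


def masked_sq_dist(x, w):
    # Squared Euclidean distance over the observed (non-NaN) components of x only.
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if not math.isnan(xi))


def best_matching_unit(x, prototypes):
    return min(range(len(prototypes)), key=lambda k: masked_sq_dist(x, prototypes[k]))


def batch_som_step(data, prototypes, sigma):
    # One batch update of a 1-D map with a Gaussian neighborhood of width sigma.
    dim = len(prototypes[0])
    winners = [best_matching_unit(x, prototypes) for x in data]
    updated = []
    for k, old in enumerate(prototypes):
        num = [0.0] * dim
        den = [0.0] * dim
        for x, b in zip(data, winners):
            h = math.exp(-((k - b) ** 2) / (2 * sigma ** 2))
            for d, xd in enumerate(x):
                if not math.isnan(xd):  # missing components contribute nothing
                    num[d] += h * xd
                    den[d] += h
        # Keep the old value if no observed data reached this component.
        updated.append([num[d] / den[d] if den[d] > 0 else old[d] for d in range(dim)])
    return updated


def impute(x, prototypes):
    # Replace each missing component with the value from the best-matching prototype.
    w = prototypes[best_matching_unit(x, prototypes)]
    return [wi if math.isnan(xi) else xi for xi, wi in zip(x, w)]


data = [[0.0, 0.0], [0.1, 0.1], [NAN, 0.05], [1.0, 1.0], [0.9, 1.0], [1.0, NAN]]
protos = [[0.2, 0.2], [0.8, 0.8]]
for _ in range(5):
    protos = batch_som_step(data, protos, sigma=0.3)
print(impute([NAN, 0.95], protos))  # first component filled from the "high" prototype
```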

international conference on artificial neural networks | 2013

Exploration of Loneliness Questionnaires Using the Self-Organising Map

Krista Lagus; Juho Saari; Ilari T. Nieminen; Timo Honkela

Statistical machine learning methods can help in developing preventative services and tools that support the empowerment of individuals. We explore how the self-organizing map could be used as a tool for analyzing, visualizing and browsing heterogeneous survey data on well-being that contains both quantitative (numeric) and qualitative (text) data. There is systematic evidence that social isolation has drastic consequences for subjective well-being and health, so it is important to obtain a deeper understanding of the phenomenon. Analysis of loneliness questionnaire data (N=521) succeeds in identifying both profiles of loneliness and crowd-sourced ideas for improving social well-being among the different subgroups.
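One way to present mixed numeric and free-text answers to a self-organizing map is to turn each respondent into a single feature vector. The sketch below assumes z-score standardization of the numeric items, relative term frequencies over a hand-chosen vocabulary for the text answers, and a hypothetical `text_weight` that balances the two blocks; the paper's actual preprocessing is not described in the abstract, and the function names are illustrative only.

```python
import math
from collections import Counter


def standardize_columns(rows):
    # z-score each numeric column so that no single item dominates the distance;
    # a constant column (standard deviation 0) is divided by 1 and becomes all zeros.
    cols = list(zip(*rows))
    scaled = []
    for col in cols:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*scaled)]


def term_frequencies(texts, vocab):
    # Relative frequency of each vocabulary word within each free-text answer.
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        total = sum(counts[w] for w in vocab) or 1
        vectors.append([counts[w] / total for w in vocab])
    return vectors


def combine(numeric_rows, texts, vocab, text_weight=1.0):
    # One row per respondent: scaled numeric answers followed by weighted text features.
    numeric = standardize_columns(numeric_rows)
    text = term_frequencies(texts, vocab)
    return [n + [text_weight * v for v in t] for n, t in zip(numeric, text)]


rows = combine(
    [[1, 5], [3, 5], [5, 5]],
    ["lonely at home", "club meetings help", "lonely lonely evenings"],
    vocab=["lonely", "club", "help"],
    text_weight=0.5,
)
for r in rows:
    print(r)
```

The resulting vectors could then be passed to any SOM implementation; raising or lowering `text_weight` shifts how strongly the map organizes respondents by what they wrote versus how they scored.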

intelligent data analysis | 2013

Variational Bayesian PCA versus k-NN on a Very Sparse Reddit Voting Dataset

Jussa Klapuri; Ilari T. Nieminen; Tapani Raiko; Krista Lagus

We present vote estimation results on the largely unexplored Reddit voting dataset, which contains 23M votes from 43k users on 3.4M links. We approach this problem using Variational Bayesian Principal Component Analysis (VBPCA) and a novel algorithm for k-Nearest Neighbors (k-NN), optimized for high-dimensional sparse datasets without using any approximations. We also explore the scalability of the algorithms for extremely sparse problems with up to 99.99% missing values. Our experiments show that k-NN works well for the standard Reddit vote prediction. The performance of VBPCA on the full preprocessed Reddit data was not as good as that of k-NN, but VBPCA was more resilient to further sparsification of the problem.
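A minimal sketch of user-based k-NN vote prediction on sparse data is shown below. It assumes votes are stored as per-user dicts mapping link IDs to +1 or -1, so the dot product only visits links both users voted on and missing votes are treated as zeros. Cosine similarity, the sign rule, the default `k` and all names are assumptions; this is not the novel k-NN algorithm from the paper, whose details the abstract does not give.

```python
import math


def cosine(u, v):
    # Cosine similarity of two sparse vote dicts; missing votes count as 0.
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict
    dot = sum(val * v[link] for link, val in u.items() if link in v)
    if dot == 0:
        return 0.0
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict_vote(votes, user, link, k=3):
    # Score = similarity-weighted sum of the k most similar users' votes on `link`.
    target = votes[user]
    candidates = [
        (cosine(target, votes[other]), other)
        for other in votes
        if other != user and link in votes[other]
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    score = sum(sim * votes[other][link] for sim, other in neighbors)
    # Ties and users without neighbors default to an upvote (an assumption).
    return 1 if score >= 0 else -1


votes = {
    "a": {"l1": 1, "l2": 1, "l3": -1},
    "b": {"l1": 1, "l2": 1, "l3": -1, "l4": 1},
    "c": {"l1": -1, "l2": -1, "l4": -1},
    "d": {"l1": 1, "l3": -1, "l4": 1},
}
print(predict_vote(votes, "a", "l4"))
```

Because user "c" disagrees with everyone, its similarities are negative, so the weighted sum flips the neighbors' downvotes on "l3" into a predicted upvote for "c".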

international conference on computational linguistics | 2008