Wilhelmiina Hämäläinen
University of Eastern Finland
Publication
Featured research published by Wilhelmiina Hämäläinen.
intelligent tutoring systems | 2006
Wilhelmiina Hämäläinen; Mikko Vinni
To implement real intelligence or adaptivity, the models for intelligent tutoring systems should be learnt from data. However, educational data sets are so small that machine learning methods cannot be applied directly. In this paper, we tackle this problem and give general outlines for creating accurate classifiers for educational data. We describe our experiment, in which we were able to predict course success with more than 80% accuracy in the middle of the course, given only a hundred rows of data.
international conference on data mining | 2008
Wilhelmiina Hämäläinen; Matti Nykänen
Searching for statistically significant association rules is an important but neglected problem. Traditional association rules do not capture the idea of statistical dependence, so the resulting rules can be spurious while the most significant rules may be missing. This leads to erroneous models and predictions, which often become expensive. The problem is computationally very difficult because significance is not a monotonic property. However, in this paper we prove several other properties that can be used for pruning the search space. The properties are implemented in the StatApriori algorithm, which searches for statistically significant, non-redundant association rules. Based on both theoretical and empirical observations, the resulting rules are very accurate compared to traditional association rules. In addition, StatApriori can work with extremely low frequencies, thus finding new interesting rules.
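The claim that traditional rules can be spurious can be illustrated with a small sketch. It is not taken from the paper: the function names and the counts below are hypothetical, and lift is used here only as a simple indicator of dependence, not as the significance measure of StatApriori. The rule X -> A has high confidence, yet X and A are slightly negatively dependent.

```python
def confidence(m_x, m_xa):
    """Confidence of X -> A: fraction of rows with X that also contain A."""
    return m_xa / m_x


def lift(n, m_x, m_a, m_xa):
    """Lift P(XA) / (P(X) P(A)); a value below 1 means negative dependence."""
    return (n * m_xa) / (m_x * m_a)


# Hypothetical counts: n rows in total, m_x rows with X, m_a rows with A,
# m_xa rows with both X and A.
n, m_x, m_a, m_xa = 100, 80, 90, 70

rule_confidence = confidence(m_x, m_xa)
rule_lift = lift(n, m_x, m_a, m_xa)
print(f"confidence = {rule_confidence:.3f}, lift = {rule_lift:.3f}")
```

A frequency-confidence framework would accept this rule, although A is less likely when X holds than it is overall.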
Knowledge and Information Systems | 2010
Wilhelmiina Hämäläinen
Searching for statistically significant association rules is an important but neglected problem. Traditional association rules do not capture the idea of statistical dependence, so the resulting rules can be spurious while the most significant rules may be missing. This leads to erroneous models and predictions, which often become expensive. The problem is computationally very difficult because significance is not a monotonic property. However, in this paper, we prove several other properties that can be used for pruning the search space. The properties are implemented in the StatApriori algorithm, which searches for statistically significant, non-redundant association rules. Empirical experiments have shown that StatApriori is very efficient while at the same time finding good-quality rules.
Knowledge and Information Systems | 2012
Wilhelmiina Hämäläinen
Statistical dependency analysis is the basis of all empirical science. A commonly occurring problem is to find the most significant dependency rules, which describe either positive or negative dependencies between categorical attributes. In medical science, for example, one is interested in genetic factors that can either predispose to or prevent diseases. The requirement of statistical significance is essential, because the discoveries should also hold in future data. Typically, the significance is estimated either by Fisher’s exact test or the χ2-measure. The problem is computationally very difficult, because the number of all possible dependency rules increases exponentially with the number of attributes. As a solution, different kinds of restrictions and heuristics have been applied, but a general, scalable search method has been missing. In this paper, we introduce an efficient algorithm, called Kingfisher, for searching for the best non-redundant dependency rules with statistical significance measures. The rules can express either positive or negative dependencies between a set of positive attributes and a single consequent attribute. The algorithm itself is independent of the goodness measure used, but we concentrate on Fisher’s exact test and the χ2-measure. The algorithm is based on an application of the branch-and-bound search strategy, supplemented by several pruning properties. In particular, we prove a new lower bound for Fisher’s p and introduce a new effective pruning principle. According to our experiments on classical benchmark data, the algorithm scales well and can efficiently handle even dense and high-dimensional data sets. An interesting observation was that Fisher’s exact test not only produced more reliable rules than the χ2-measure, but also performed the search much faster.
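As a rough illustration of the two goodness measures named above, the following sketch evaluates a single rule X -> A from its 2x2 contingency counts. It uses the textbook one-sided Fisher's exact test (a hypergeometric tail sum) and the standard χ2 statistic for a 2x2 table. The function names and the example counts are hypothetical, and the sketch does not reproduce Kingfisher's search or its pruning bounds.

```python
from math import comb


def fisher_p(n, m_x, m_a, m_xa):
    """One-sided Fisher's exact p-value for a positive dependency X -> A.

    n: number of rows, m_x: rows with X, m_a: rows with A,
    m_xa: rows with both. Sums the hypergeometric probabilities of
    observing at least m_xa co-occurrences with the margins held fixed.
    """
    upper = min(m_x, m_a)
    tail = sum(
        comb(m_x, k) * comb(n - m_x, m_a - k)
        for k in range(m_xa, upper + 1)
    )
    return tail / comb(n, m_a)


def chi2(n, m_x, m_a, m_xa):
    """Chi-squared statistic of the 2x2 table for X and A."""
    return n * (n * m_xa - m_x * m_a) ** 2 / (
        m_x * (n - m_x) * m_a * (n - m_a)
    )


# Hypothetical counts: 10 rows, 5 with X, 5 with A, 4 with both.
p_value = fisher_p(10, 5, 5, 4)
statistic = chi2(10, 5, 5, 4)
print(f"Fisher's p = {p_value:.4f}, chi2 = {statistic:.2f}")
```

A smaller p (or a larger χ2 value) indicates a stronger dependency. A negative dependency could be tested analogously by summing the opposite tail.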
Frontiers in Education | 2004
Wilhelmiina Hämäläinen
In this paper, we report our first experiment in teaching the theory of computability in a problem-based way. As far as we know, this is the first experiment in applying the problem-based method to a purely theoretical computer science course. The course consisted of three parts. First, the new subjects were learnt according to the classical seven-step method, which contains both individual and group work, and problem reports were written. Second, the students participated in a traditional exercise session, in which the new techniques were practised in detail. Third, the students kept a learning diary, in which they processed the subjects further, tried to construct an overall schema of the things learnt, and supervised their own learning. The results were very good: the students committed themselves well and the dropout rate was very small; they achieved a very deep understanding of the subjects, as measured by their grades and the quality of their learning diaries; the experience was enjoyable for both the students and the teachers; and finally, the method supported different kinds of learners very well.
intelligent systems design and applications | 2011
Wilhelmiina Hämäläinen; Mikko Järvinen; Paula Martiskainen; Jaakko Mononen
A current trend in activity recognition is to use just one easily carried accelerometer, either integrated into a mobile phone, carried in a pocket, or attached to an animal's collar. The main disadvantage of this approach is that the orientation of the accelerometer is generally unknown. Therefore, one cannot separate body-related accelerations from the gravitational acceleration or accurately determine the real directions of the observed accelerations. As a solution, we introduce a new technique in which jerk (the change in acceleration) is analyzed instead of the original acceleration signal. The total jerk magnitude is completely orientation-independent and reflects only body-related accelerations. If the direction of gravity can be approximated even loosely, then the jerk signal can be further enriched with valuable information on jerk angles (direction changes). According to our experiments, this kind of jerk-filtered signal produces robust features and can improve the recognition accuracy remarkably.
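The orientation independence of the jerk magnitude can be checked with a minimal sketch. It assumes that jerk is approximated by finite differences of consecutive three-axis samples and that the sensor orientation stays fixed between two consecutive samples; the function names and sample values are hypothetical, and the paper's actual feature extraction is not reproduced here.

```python
import math


def jerk_magnitudes(samples, dt=1.0):
    """Magnitude of the finite-difference jerk between consecutive samples."""
    return [math.dist(a, b) / dt for a, b in zip(samples, samples[1:])]


def rotate_z(sample, theta):
    """Rotate a 3-axis sample about the z-axis, simulating a turned sensor."""
    x, y, z = sample
    return (
        x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta),
        z,
    )


# Hypothetical readings in m/s^2; the constant 9.8 on the z-axis plays
# the role of gravity and cancels out in the differences.
readings = [(0.0, 0.0, 9.8), (3.0, 4.0, 9.8), (1.0, 2.0, 9.8)]
turned = [rotate_z(s, 0.7) for s in readings]

original_jerk = jerk_magnitudes(readings)
turned_jerk = jerk_magnitudes(turned)
print(original_jerk)
print(turned_jerk)
```

Both lists agree up to floating-point rounding, because a rotation preserves the length of the difference vector and a constant gravity component subtracts away.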
international conference on data mining | 2010
Wilhelmiina Hämäläinen
Statistical dependency analysis is the basis of all empirical science. A commonly occurring problem is to find the most significant dependency rules, which describe either positive or negative dependencies between categorical attributes. For example, in medical science one is interested in genetic factors, which can either predispose or prevent diseases. The requirement of statistical significance is essential, because the discoveries should hold also in the future data. Typically, the significance is estimated either by Fisher’s exact test or the χ2-measure. The problem is computationally very difficult, because the number of all possible dependency rules increases exponentially with the number of attributes. As a solution, different kinds of restrictions and heuristics have been applied, but a general, scalable search method has been missing. In this paper, we introduce an efficient algorithm for searching for the top-K globally optimal dependency rules using Fisher’s exact test as a measure function. The rules can express either positive or negative dependencies between a set of positive attributes and a single consequent attribute. The algorithm is based on an application of the branch-and-bound search strategy, supplemented by several pruning properties. Especially, we prove a new lower bound for Fisher’s p, and introduce a new effective pruning principle. The general search algorithm is applicable to other goodness measures, like the χ2-measure.
Fundamenta Informaticae | 2011
Wilhelmiina Hämäläinen
Computational Statistics & Data Analysis | 2016
Wilhelmiina Hämäläinen
knowledge discovery and data mining | 2014
Wilhelmiina Hämäläinen; Geoffrey I. Webb