Publications


Featured research published by Matti Kääriäinen.


Journal of Artificial Intelligence Research | 2001

An analysis of reduced error pruning

Tapio Elomaa; Matti Kääriäinen

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration being replaced by a leaf node: one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the probability of pruning a node that fits pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly over the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight into its algorithmic properties, analyses the algorithm under fewer imposed assumptions than before, and includes the previously overlooked empty subtrees in the analysis.
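
For concreteness, here is a minimal Python sketch of the basic bottom-up Reduced Error Pruning loop analysed above; the node representation and all helper names are illustrative, not taken from the paper. A subtree is replaced by a majority-class leaf whenever that does not increase error on the separate pruning set, and subtrees reached by no pruning examples (the "empty" subtrees the paper highlights) are kept unchanged in this variant.

```python
# Minimal Reduced Error Pruning sketch; illustrative names, not the paper's code.
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class Node:
    attr: int = None                              # attribute tested at internal nodes
    children: dict = field(default_factory=dict)  # attribute value -> subtree
    label: object = None                          # class label (majority label at internal nodes)

def is_leaf(node):
    return not node.children

def classify(node, x):
    if is_leaf(node):
        return node.label
    child = node.children.get(x[node.attr])
    # Unseen attribute value: fall back to this node's own majority label.
    return node.label if child is None else classify(child, x)

def errors(node, examples):
    return sum(1 for x, y in examples if classify(node, x) != y)

def rep(node, pruning_examples):
    """Bottom-up Reduced Error Pruning against a separate pruning set."""
    if is_leaf(node):
        return node
    if not pruning_examples:
        # An "empty" subtree: this variant keeps it unchanged. How such
        # subtrees are treated is one of the variant differences the paper clarifies.
        return node
    # Route pruning examples to children and prune the subtrees first.
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in pruning_examples if x[node.attr] == value]
        node.children[value] = rep(child, subset)
    majority = Counter(y for _, y in pruning_examples).most_common(1)[0][0]
    leaf = Node(label=majority)
    if errors(leaf, pruning_examples) <= errors(node, pruning_examples):
        return leaf
    return node
```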


Algorithmic Learning Theory | 2006

Active learning in the non-realizable case

Matti Kääriäinen

Most existing active learning algorithms are based on the realizability assumption: the learner's hypothesis class is assumed to contain a target function that perfectly classifies all training and test examples. This assumption can hardly ever be justified in practice. In this paper, we study how relaxing the realizability assumption affects the sample complexity of active learning. First, we extend existing results on query learning to show that any active learning algorithm for the realizable case can be transformed to tolerate random bounded-rate class noise. Thus, bounded-rate class noise adds little extra complication to active learning, and in particular exponential label complexity savings over passive learning are still possible. However, it is questionable whether this noise model is any more realistic in practice than assuming no noise at all. Our second result shows that if we move to the truly non-realizable model of statistical learning theory, then the label complexity of active learning has the same dependence Ω(1/ε²) on the accuracy parameter ε as the passive learning label complexity. More specifically, we show that under the assumption that the best classifier in the learner's hypothesis class has generalization error at most β > 0, the label complexity of active learning is Ω(β²/ε² log(1/δ)), where the accuracy parameter ε measures how close to optimal within the hypothesis class the active learner has to get and δ is the confidence parameter. The implication of this lower bound is that exponential savings should not be expected in realistic models of active learning, and thus the label complexity goals in active learning should be refined.
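
To see the force of the ε-dependence, the toy calculation below contrasts a log(1/ε)-type realizable-case label complexity with the Ω(β²/ε²) non-realizable lower bound. The formulas are schematic stand-ins with constants and problem-dependent factors ignored, not the paper's exact bounds.

```python
# Schematic comparison only: these formulas are illustrative stand-ins for
# the actual bounds; constants and problem-dependent factors are ignored.
import math

def realizable_label_complexity(eps, delta):
    # Realizable case: exponential savings, roughly log(1/eps) queries.
    return math.log(1 / eps) * math.log(1 / delta)

def nonrealizable_lower_bound(beta, eps, delta):
    # Non-realizable case: the same 1/eps**2 dependence as passive learning.
    return (beta ** 2 / eps ** 2) * math.log(1 / delta)

for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps}: realizable ~{realizable_label_complexity(eps, 0.05):.0f}, "
          f"non-realizable lower bound ~{nonrealizable_lower_bound(0.1, eps, 0.05):.0f}")
```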


Computer Vision and Pattern Recognition | 2007

Practical Online Active Learning for Classification

Claire Monteleoni; Matti Kääriäinen

We compare the practical performance of several recently proposed algorithms for active learning in the online classification setting. We consider two active learning algorithms (and their combined variants) that are strongly online, in that they access the data sequentially and do not store any previously labeled examples, and for which formal guarantees have recently been proven under various assumptions. We motivate an optical character recognition (OCR) application that we argue to be appropriately served by online active learning. We compare the practical efficacy, for this application, of the algorithm variants, and show significant reductions in label-complexity over random sampling.
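
A sketch in the spirit of the strongly online, margin-based schemes compared here: the learner queries the label with probability b/(b + |margin|), so confidently classified examples are rarely queried, and it stores only the current weight vector. Parameter names and the toy data are illustrative, not from the paper.

```python
# Label-efficient online active perceptron sketch (illustrative).
import numpy as np

def online_active_perceptron(stream, dim, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    queries = 0
    for x, get_label in stream:        # the label is fetched only if queried
        margin = float(w @ x)
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            y = get_label()            # y in {-1, +1}
            if y * margin <= 0:        # mistake on a queried example
                w = w + y * x          # standard perceptron update
    return w, queries

# Toy usage with a noiseless linear labeler.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = np.ones(5)
stream = [(x, (lambda x=x: 1 if x @ w_true > 0 else -1)) for x in X]
w, q = online_active_perceptron(stream, dim=5)
print(f"queried {q} of {len(stream)} labels")
```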


Empirical Methods in Natural Language Processing | 2009

Sinuhe -- Statistical Machine Translation using a Globally Trained Conditional Exponential Family Translation Model

Matti Kääriäinen

We present a new phrase-based conditional exponential family translation model for statistical machine translation. The model operates on a feature representation in which sentence-level translations are represented by enumerating all the known phrase-level translations that occur inside them. This makes the model a good match with the commonly used phrase extraction heuristics. The model's predictions are properly normalized probabilities. In addition, the model automatically takes into account information provided by phrase overlaps, and does not suffer from reference translation reachability problems. We have implemented an open-source translation system, Sinuhe, based on the proposed translation model. Our experiments on the Europarl and GigaFrEn corpora demonstrate that finding the unique MAP parameters for the model on large-scale data is feasible with simple stochastic gradient methods. Sinuhe is fast and memory efficient, and the BLEU scores obtained by it are only slightly inferior to those of Moses.
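
A toy sketch of the model family follows: a conditional exponential family over candidate translations, trained by stochastic gradient ascent on the conditional log-likelihood. The feature map is a stand-in for the phrase-match features described above, and none of this is the Sinuhe implementation.

```python
# Conditional exponential family over translation candidates (illustrative).
import math
from collections import defaultdict

def phrase_features(candidate):
    # Stand-in: indicator features for the known phrase-level translations
    # (here, hashable phrase pairs) occurring inside the candidate.
    return {pair: 1.0 for pair in candidate}

def log_probs(weights, candidates):
    scores = [sum(weights.get(f, 0.0) * v for f, v in phrase_features(c).items())
              for c in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
    return [s - log_z for s in scores]

def sgd_step(weights, candidates, gold_index, lr=0.1):
    """One stochastic gradient step on log p(gold candidate | source)."""
    probs = [math.exp(lp) for lp in log_probs(weights, candidates)]
    grad = defaultdict(float)
    for f, v in phrase_features(candidates[gold_index]).items():
        grad[f] += v                       # observed features
    for c, p in zip(candidates, probs):
        for f, v in phrase_features(c).items():
            grad[f] -= p * v               # minus expected features
    for f, g in grad.items():
        weights[f] = weights.get(f, 0.0) + lr * g
```

The conditional log-likelihood of such a model is concave, which is why simple stochastic gradient steps suffice for the large-scale training mentioned above; with a regularizer (omitted here) the MAP optimum is unique.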


European Conference on Machine Learning | 2003

Rademacher penalization over decision tree prunings

Matti Kääriäinen; Tapio Elomaa

Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It would appear to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees, by considering the prunings of a given decision tree rather than the tree-growing phase. Moreover, we generalize the error-bounding approach from binary classification to multi-class situations. Our empirical experiments indicate that the proposed new bounds clearly outperform earlier bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
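
The quantity being computed can be sketched as follows for a finite hypothesis set given by its prediction vectors (one row per pruning); one common variant of the empirical Rademacher penalty is shown. The paper's key point is that the supremum over all prunings of a tree can be computed efficiently; the brute-force maximization here only illustrates the penalty itself.

```python
# Empirical Rademacher penalty over an explicit finite hypothesis set (sketch).
import numpy as np

def rademacher_penalty(predictions, n_trials=100, seed=0):
    """predictions: (num_hypotheses, n) array with entries in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = predictions.shape[1]
    vals = []
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)               # random Rademacher signs
        vals.append(float(np.max(predictions @ sigma)) / n)   # sup of correlation with the signs
    return float(np.mean(vals))
```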


European Conference on Machine Learning | 2001

On the Practice of Branching Program Boosting

Tapio Elomaa; Matti Kääriäinen

Branching programs are a generalization of decision trees. From the viewpoint of boosting theory the former appear to be exponentially more efficient. However, earlier experience demonstrates that such results do not necessarily translate to practical success. In this paper we develop a practical version of Mansour and McAllester's [13] algorithm for branching program boosting. We test the algorithm empirically with real-world and synthetic data. Branching programs attain the same prediction accuracy level as C4.5. Contrary to the implications of the boosting analysis, they are not significantly smaller than the corresponding decision trees. This further corroborates the earlier observations on the way in which boosting analyses bear practical significance.
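
To make the generalization concrete, the sketch below (an illustrative representation, not the paper's code) evaluates a branching program stored as a DAG. Two internal nodes share the same successor, which a decision tree cannot express; this node merging is the source of the potential exponential size advantage.

```python
# Branching program as a DAG: node_id -> (attr, {value: next_id}) for internal
# nodes, or node_id -> ('leaf', label) for leaves. Names are illustrative.
def evaluate(program, root, x):
    node = program[root]
    while node[0] != 'leaf':
        attr, edges = node
        node = program[edges[x[attr]]]
    return node[1]

# Nodes 'n0' and 'n1' both route to 'n2' — sharing a tree cannot express.
program = {
    'n0': (0, {0: 'n1', 1: 'n2'}),
    'n1': (1, {0: 'leaf0', 1: 'n2'}),
    'n2': (1, {0: 'leaf0', 1: 'leaf1'}),
    'leaf0': ('leaf', 0),
    'leaf1': ('leaf', 1),
}
print(evaluate(program, 'n0', {0: 1, 1: 1}))  # -> 1
```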


International Joint Conference on Neural Networks | 2006

Semi-Supervised Model Selection Based on Cross-Validation

Matti Kääriäinen

We propose a new semi-supervised model selection method that is derived by applying the structural risk minimization principle to a recent semi-supervised generalization error bound. The bound we build on is based on the cross-validation estimate underlying the popular cross-validation model selection heuristic. Thus, the proposed semi-supervised method is closely connected to cross-validation, which makes studying these methods side by side very natural. We evaluate the performance of the proposed method and the cross-validation heuristic empirically on the task of selecting the parameters of support vector machines. The experiments indicate that the models selected by the two methods have roughly the same accuracy. However, whereas the cross-validation heuristic only proposes which classifier to choose, the semi-supervised method also provides a reliable and reasonably tight generalization error guarantee for the chosen classifier. Thus, when unlabeled data is available, the proposed semi-supervised method seems to have an advantage when reliable error guarantees are called for. In addition to the empirical evaluation, we also analyze the theoretical properties of the proposed method and prove that under suitable conditions it converges to the optimal model.
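
A rough sketch of the flavor of the selection rule, under loudly illustrative assumptions: the exact semi-supervised bound of the paper is not reproduced here, and the Hoeffding-style terms below merely stand in for its deviation terms. The point is that each candidate model receives an error guarantee combining a cross-validation estimate with a disagreement rate measured on unlabeled data, and the model minimizing the guarantee is selected, whereas plain cross-validation only ranks the candidates.

```python
# Illustrative bound-based model selection; names and the bound form are
# stand-ins, not the paper's exact semi-supervised bound.
import math

def hoeffding(n, delta):
    return math.sqrt(math.log(1 / delta) / (2 * n))

def semi_supervised_bound(cv_error, n_labeled, disagreement, n_unlabeled, delta=0.05):
    # Guarantee = CV error estimate + disagreement with the CV classifier
    # measured on unlabeled data, each inflated by a deviation term.
    return (cv_error + hoeffding(n_labeled, delta / 2)
            + disagreement + hoeffding(n_unlabeled, delta / 2))

def select(models, delta=0.05):
    # models: list of (name, cv_error, n_labeled, disagreement, n_unlabeled).
    return min(models, key=lambda m: semi_supervised_bound(*m[1:], delta=delta))
```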


Annals of Mathematics and Artificial Intelligence | 2004

The Difficulty of Reduced Error Pruning of Leveled Branching Programs

Tapio Elomaa; Matti Kääriäinen

Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. However, this advantage has not been seen to materialize in experiments. Decision trees are easy to simplify using pruning. Reduced error pruning is one of the simplest decision tree pruning algorithms. For branching programs no pruning algorithms are known. In this paper we prove that reduced error pruning of branching programs is infeasible. Finding the optimal pruning of a branching program with respect to a set of pruning examples that is separate from the set of training examples is NP-complete. Because of this intractability result, we have to consider approximating reduced error pruning. Unfortunately, it turns out that even finding an approximate solution of arbitrary accuracy is computationally infeasible. In particular, reduced error pruning of branching programs is APX-hard. Our experiments show that, despite the negative theoretical results, heuristic pruning of branching programs can reduce their size without significantly altering the accuracy.


Information Processing Letters | 2003

Reduced error pruning of branching programs cannot be approximated to within a logarithmic factor

Richard Nock; Tapio Elomaa; Matti Kääriäinen

In this paper, we prove under a plausible complexity hypothesis that Reduced Error Pruning of branching programs is hard to approximate to within log^(1−δ) n for every δ > 0, where n is the number of description variables, a measure of the problem's complexity. The result holds under the assumption that NP problems do not admit deterministic, slightly superpolynomial time algorithms: NP ⊄ TIME(|I|^O(log log |I|)). This improves on a previous result that only gave a small constant inapproximability ratio, and puts a fairly strong constraint on the efficiency of potential approximation algorithms. The result also holds for read-once and µ-branching programs.


Lecture Notes in Computer Science | 2005

Generalization error bounds using unlabeled data

Matti Kääriäinen

Collaboration


Dive into Matti Kääriäinen's collaborations.

Top Co-Authors

Tapio Elomaa

Tampere University of Technology

Claire Monteleoni

George Washington University

Richard Nock

Australian National University
