Publication


Featured research published by Ville Hautamäki.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Fast Agglomerative Clustering Using a k-Nearest Neighbor Graph

Pasi Fränti; Olli Virmajoki; Ville Hautamäki

We propose a fast agglomerative clustering method that uses an approximate nearest neighbor graph to reduce the number of distance calculations. The time complexity of the algorithm is improved from O(τN²) to O(τN log N) at the cost of a slight increase in distortion; here, τ denotes the number of nearest neighbor updates required at each iteration. According to the experiments, a relatively small neighborhood size is sufficient to keep the quality close to that of the full search.
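
A minimal Python sketch of the graph-restricted merging idea, assuming exact Euclidean distances and a Ward-style merge cost; this is not the authors' optimized O(τN log N) implementation, and the function and parameter names are illustrative.

import numpy as np

def knn_agglomerative(X, k=5, n_clusters=2):
    # Agglomerative clustering in which merge candidates are restricted
    # to edges of a k-nearest-neighbour graph (illustrative sketch).
    n = len(X)
    members = {i: [i] for i in range(n)}                  # cluster -> point indices
    centroid = {i: X[i].astype(float) for i in range(n)}
    size = {i: 1 for i in range(n)}

    # Build the k-NN graph over the initial points; its edges are the
    # only pairs ever considered for merging.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    edges = {tuple(sorted((i, int(j)))) for i in range(n) for j in np.argsort(d[i])[:k]}

    def merge_cost(a, b):
        # Ward-style increase in distortion caused by merging a and b.
        na, nb = size[a], size[b]
        return na * nb / (na + nb) * np.sum((centroid[a] - centroid[b]) ** 2)

    while len(members) > n_clusters and edges:
        a, b = min(edges, key=lambda e: merge_cost(*e))
        # Merge cluster b into cluster a and update its centroid.
        centroid[a] = (size[a] * centroid[a] + size[b] * centroid[b]) / (size[a] + size[b])
        members[a] += members.pop(b)
        size[a] += size.pop(b)
        del centroid[b]
        # Redirect b's graph edges to a and drop self-loops.
        edges = {tuple(sorted((a if u == b else u, a if v == b else v))) for (u, v) in edges}
        edges = {(u, v) for (u, v) in edges if u != v}
    return list(members.values())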


International Conference on Pattern Recognition | 2004

Outlier detection using k-nearest neighbour graph

Ville Hautamäki; Ismo Kärkkäinen; Pasi Fränti

We present an Outlier Detection using Indegree Number (ODIN) algorithm that utilizes the k-nearest neighbour graph. Improvements to the existing kNN distance-based method are also proposed. We compare the methods on real and synthetic datasets. The results show that the proposed method achieves reasonable results on synthetic data and outperforms the compared methods on real datasets with a small number of observations.
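
A minimal Python sketch of the ODIN idea, assuming Euclidean distances; the indegree threshold is an illustrative parameter, not a value taken from the paper.

import numpy as np

def odin_outliers(X, k=5, indegree_threshold=1):
    # Build the directed k-NN graph: every point links to its k nearest
    # neighbours. Points that few others choose as a neighbour have a
    # low indegree and are flagged as outliers.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    indegree = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        for j in np.argsort(d[i])[:k]:
            indegree[j] += 1
    return np.where(indegree <= indegree_threshold)[0]   # indices of flagged outliers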


International Conference on Pattern Recognition | 2008

Time-series clustering by approximate prototypes

Ville Hautamäki; Pekka Nykanen; Pasi Fränti

Clustering time-series data poses problems that do not exist in traditional clustering in Euclidean space. Specifically, a cluster prototype needs to be calculated, and the common solution is to use the cluster medoid. In this work, we define the optimal prototype as an optimization problem and propose a local search solution to it. We experimentally compare different time-series clustering methods and find that the proposed prototype with agglomerative clustering, followed by the k-means algorithm, provides the best clustering accuracy.
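
For reference, a small Python sketch of the medoid baseline the abstract mentions, using dynamic time warping (DTW) as the distance; the paper's local-search refinement of the prototype is not reproduced here, and the helper names are illustrative.

import numpy as np

def dtw(a, b):
    # Basic dynamic time warping distance between two 1-D sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def medoid_prototype(cluster):
    # The medoid is the member sequence minimizing the total DTW
    # distance to all other members; it serves as the cluster prototype.
    costs = [sum(dtw(s, t) for t in cluster) for s in cluster]
    return cluster[int(np.argmin(costs))]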


Scandinavian Conference on Image Analysis | 2005

Improving k-means by outlier removal

Ville Hautamäki; Svetlana Cherednichenko; Ismo Kärkkäinen; Tomi Kinnunen; Pasi Fränti

We present an Outlier Removal Clustering (ORC) algorithm that provides outlier detection and data clustering simultaneously. The method employs both clustering and outlier discovery to improve estimation of the centroids of the generative distribution. The proposed algorithm consists of two stages: the first stage is a pure K-means process, while the second stage iteratively removes the vectors that are far from their cluster centroids. We provide experimental results on three different synthetic datasets and on three map images corrupted by lossy compression. The results indicate that the proposed method achieves a lower error on datasets with overlapping clusters than the competing methods.
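
A minimal Python sketch of the two-stage idea, assuming Euclidean data; the distance threshold, iteration counts, and initialization below are illustrative choices, not the paper's settings.

import numpy as np

def orc(X, n_clusters=3, threshold=0.9, removal_rounds=5, seed=0):
    # Stage 1: k-means on the surviving vectors.
    # Stage 2: drop vectors whose distance to the nearest centroid
    # exceeds a fraction of the current maximum distance, then repeat.
    rng = np.random.default_rng(seed)
    keep = np.arange(len(X))
    centroids = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(removal_rounds):
        for _ in range(10):                                   # plain k-means iterations
            dists = np.linalg.norm(X[keep][:, None] - centroids[None], axis=-1)
            labels = dists.argmin(axis=1)
            for c in range(n_clusters):
                if np.any(labels == c):
                    centroids[c] = X[keep][labels == c].mean(axis=0)
        dists = np.linalg.norm(X[keep][:, None] - centroids[None], axis=-1)
        dmin = dists.min(axis=1)
        keep = keep[dmin <= threshold * dmin.max()]           # outlier removal step
    return centroids, keep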


Advanced Concepts for Intelligent Vision Systems | 2008

Knee Point Detection in BIC for Detecting the Number of Clusters

Qinpei Zhao; Ville Hautamäki; Pasi Fränti

The Bayesian Information Criterion (BIC) is a promising method for detecting the number of clusters. It is often used in model-based clustering, in which the decisive first local maximum is taken as the number of clusters. In this paper, we re-formulate the BIC for a partitioning-based clustering algorithm and propose a new knee point finding method based on it. Experimental results show that the proposed method detects the correct number of clusters more robustly and accurately than the original BIC and performs well in comparison with several other cluster validity indices.
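
A small Python sketch of knee detection on a criterion curve such as BIC evaluated over candidate cluster counts; the second-difference heuristic used here is an illustrative stand-in, not the paper's exact knee-finding formulation.

import numpy as np

def knee_point(values):
    # The knee is where the curve bends most sharply, which the second
    # difference of successive values captures in a simple way.
    v = np.asarray(values, dtype=float)
    second_diff = v[:-2] - 2 * v[1:-1] + v[2:]
    return int(np.argmax(np.abs(second_diff))) + 1       # index into `values`

# Hypothetical criterion values for k = 1..8; the knee suggests k = 3.
bic = [-1200, -800, -520, -500, -490, -485, -482, -480]
print(knee_point(bic) + 1)                               # prints 3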


IEEE Signal Processing Letters | 2008

Maximum a Posteriori Adaptation of the Centroid Model for Speaker Verification

Ville Hautamäki; Tomi Kinnunen; Ismo Kärkkäinen; Juhani Saastamoinen; Marko Tuononen; Pasi Fränti

The maximum a posteriori adapted Gaussian mixture model (GMM-MAP) is widely used in speaker verification. GMMs have three sets of parameters to be adapted: means, covariances, and weights. However, practice has shown that it is sufficient to adapt the means only. Motivated by this, we formulate a maximum a posteriori vector quantization (VQ-MAP) procedure that stores and adapts the mean vectors (centroids) only. Experiments on the NIST 2001 and NIST 2006 corpora indicate that VQ-MAP gives accuracy comparable to GMM-MAP, with a simpler implementation and faster adaptation.
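
A minimal Python sketch of MAP adaptation applied to centroids only, using the standard relevance-factor update; the relevance value and nearest-centroid assignment are illustrative, not taken from the paper.

import numpy as np

def vq_map_adapt(ubm_centroids, features, relevance=16.0):
    # Assign each feature vector to its nearest UBM centroid, then move
    # the centroid toward the data mean; centroids with more assigned
    # data move further (larger alpha), others stay near the prior.
    ubm = np.asarray(ubm_centroids, dtype=float)
    X = np.asarray(features, dtype=float)
    labels = np.linalg.norm(X[:, None] - ubm[None], axis=-1).argmin(axis=1)
    adapted = ubm.copy()
    for k in range(len(ubm)):
        assigned = X[labels == k]
        if len(assigned) == 0:
            continue                              # no data: keep the prior centroid
        alpha = len(assigned) / (len(assigned) + relevance)
        adapted[k] = alpha * assigned.mean(axis=0) + (1 - alpha) * ubm[k]
    return adapted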


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Sparse Classifier Fusion for Speaker Verification

Ville Hautamäki; Tomi Kinnunen; Filip Sedlak; Kong Aik Lee; Bin Ma; Haizhou Li

State-of-the-art speaker verification systems take advantage of a number of complementary base classifiers by fusing them to arrive at reliable verification decisions. In speaker verification, fusion is typically implemented as a weighted linear combination of the base classifier scores, where the combination weights are estimated using a logistic regression model. An alternative approach to fusion is classifier ensemble selection, which can be seen as sparse regularization applied to logistic regression. Even though score fusion has been extensively studied in speaker verification, classifier ensemble selection has received much less attention. In this study, we extensively examine sparse classifier fusion on a collection of twelve I4U spectral subsystems on the NIST 2008 and 2010 speaker recognition evaluation (SRE) corpora.
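
One common way to realize such sparse regularized fusion is L1-penalized logistic regression over the base-classifier scores, sketched below in Python with scikit-learn; the function name and regularization strength are illustrative, not the authors' exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_fusion(scores, labels, C=0.1):
    # scores: (n_trials, n_classifiers) matrix of base-classifier scores.
    # labels: 1 for target trials, 0 for non-target trials.
    # The L1 penalty drives some fusion weights to exactly zero, which
    # removes the corresponding subsystems from the ensemble.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(scores, labels)
    weights = model.coef_.ravel()
    selected = np.flatnonzero(weights != 0)         # surviving subsystems
    fused = scores @ weights + model.intercept_[0]  # fused detection scores
    return fused, weights, selected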


Pattern Recognition Letters | 2009

Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification

Tomi Kinnunen; Juhani Saastamoinen; Ville Hautamäki; Mikko Vinni; Pasi Fränti

The Gaussian mixture model with a universal background model (GMM-UBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model using vector quantization (VQ-UBM). In this study, we extensively compare these two classifiers on the NIST 2005, 2006, and 2008 SRE corpora, with a standard discriminative classifier (GLDS-SVM) as a point of reference. We focus on the parameter setting for N-top scoring, the model order, and the performance for different amounts of training data. The most interesting result, contrary to general belief, is that GMM-UBM yields better results for short segments whereas VQ-UBM is better for long utterances. The results also suggest that maximum likelihood training of the UBM is sub-optimal, and hence alternative ways to train the UBM should be considered.


Speech Communication | 2015

Automatic versus human speaker verification: The case of voice mimicry

Rosa González Hautamäki; Tomi Kinnunen; Ville Hautamäki; Anne-Maria Laukkanen

In this work, we compare the performance of three modern speaker verification systems and non-expert human listeners in the presence of voice mimicry. Our goal is to gain insights into how vulnerable speaker verification systems are to mimicry attacks and to compare this to the performance of human listeners. We study both the traditional Gaussian mixture model-universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. For the studied material in the Finnish language, the mimicry attack slightly decreased the equal error rate (EER) for GMM-UBM from 10.83 to 10.31, while for the i-vector systems the EER increased from 6.80 to 13.76 and from 4.36 to 7.38. The performance of the human listening panel shows that imitated speech increases the difficulty of the speaker verification task. It is even more difficult to recognize a person who is intentionally concealing his or her identity. For Impersonator A, the average listener made 8 errors out of 34 trials, while the automatic systems made 6 errors on the same set. For Impersonator B, the average listener made 7 errors out of 28 trials, while the automatic systems made 7 to 9 errors. A statistical analysis of the listener performance was also conducted. We found a statistically significant association, with p = 0.00019 and R² = 0.59, between listener accuracy and self-reported factors only when familiar voices were present in the test.


Digital Signal Processing | 2014

From single to multiple enrollment i-vectors: practical PLDA scoring variants for speaker verification

Padmanabhan Rajan; Anton Afanasyev; Ville Hautamäki; Tomi Kinnunen

The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the computation of the required matrix inversions and determinants. The performance of five different scoring methods, and the effect of i-vector length normalization, is compared experimentally. We conclude that length normalization is a useful technique for all but one of the scoring methods considered, and that averaging i-vectors is the most effective of the methods compared. We also study the application of multicondition training to the PLDA model. Our experiments indicate that multicondition training is more effective in estimating PLDA hyperparameters than it is for likelihood computation. Finally, we look at the effect of the configuration of the enrollment data on PLDA scoring, studying the properties of conditional dependence and the number of enrollment utterances per target speaker. Our experiments indicate that these properties affect the performance of the PLDA model. These results further support the conclusion that i-vector averaging is a simple and effective way to process multiple enrollment utterances.
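
A minimal Python sketch of the i-vector averaging strategy combined with length normalization; the ordering of normalization and averaging here is an assumption for illustration, not the paper's prescribed recipe.

import numpy as np

def length_normalize(ivectors):
    # Project each i-vector onto the unit sphere (length normalization).
    v = np.asarray(ivectors, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def enrollment_ivector(enrollment_ivectors):
    # Average the length-normalized enrollment i-vectors into a single
    # model i-vector, then re-normalize it before PLDA scoring.
    normed = length_normalize(enrollment_ivectors)
    return length_normalize(normed.mean(axis=0))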

Collaboration


Dive into Ville Hautamäki's collaborations.

Top Co-Authors

Tomi Kinnunen

University of Eastern Finland

Pasi Fränti

University of Eastern Finland

Haizhou Li

National University of Singapore

Hamid Behravan

University of Eastern Finland

Ivan Kukanov

University of Eastern Finland

Padmanabhan Rajan

Indian Institute of Technology Mandi
