
Publications


Featured research published by Nikolaos Vasiloglou.


International Workshop on Machine Learning for Signal Processing | 2008

Scalable semidefinite manifold learning

Nikolaos Vasiloglou; Alexander G. Gray; David V. Anderson

Maximum variance unfolding (MVU) is among the state-of-the-art manifold learning (ML) algorithms and has been experimentally shown to be one of the best methods for unfolding a manifold to its intrinsic dimension. Unfortunately, it does not scale beyond a few hundred points. A non-convex formulation of MVU made it possible to scale up to a few thousand points, at the risk of getting trapped in local minima. In this paper we demonstrate techniques based on the dual-tree algorithm and L-BFGS that allow MVU to scale up to 100,000 points. We also present a new variant, maximum furthest neighbor unfolding (MFNU), which performs even better than MVU at avoiding local minima.
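The non-convex formulation mentioned in the abstract can be sketched as follows: instead of solving a semidefinite program over the Gram matrix, optimize the low-dimensional coordinates directly with L-BFGS, penalizing neighbor-distance distortion while rewarding total variance. This is an illustrative toy version on a noisy 3-D arc, not the paper's exact objective; the penalty weight `mu` and all variable names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy manifold: a noisy arc in 3-D whose intrinsic dimension is 1.
t = np.linspace(0.0, np.pi, 60)
X_high = np.c_[np.cos(t), np.sin(t), 0.01 * rng.standard_normal(t.size)]

# k-nearest-neighbor graph (brute force here; the paper uses dual-tree search).
k = 5
d2 = ((X_high[:, None, :] - X_high[None, :, :]) ** 2).sum(-1)
nbrs = np.argsort(d2, axis=1)[:, 1 : k + 1]
pairs = [(i, int(j)) for i in range(t.size) for j in nbrs[i]]
targets = np.array([d2[i, j] for i, j in pairs])

def objective(y_flat, dim=1, mu=1e-3):
    Y = y_flat.reshape(-1, dim)
    Y = Y - Y.mean(axis=0)                   # centering, as in the MVU SDP
    diffs = np.array([((Y[i] - Y[j]) ** 2).sum() for i, j in pairs])
    stress = ((diffs - targets) ** 2).sum()  # preserve neighbor distances
    variance = (Y ** 2).sum()                # MVU maximizes total variance
    return stress - mu * variance

y0 = 0.01 * rng.standard_normal(t.size)
res = minimize(objective, y0, method="L-BFGS-B")
Y = res.x.reshape(-1, 1)
print("final objective:", res.fun)
```

The quartic stress term dominates the quadratic variance bonus for small `mu`, so the objective stays bounded below; L-BFGS handles the dense 60-variable problem easily, which is the scaling advantage over the SDP route.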


International Conference on Digital Signal Processing | 2009

Learning the Intrinsic Dimensions of the TIMIT Speech Database with Maximum Variance Unfolding

Nikolaos Vasiloglou; David V. Anderson; Alexander G. Gray

Modern methods for nonlinear dimensionality reduction have been used extensively in the machine learning community to discover the intrinsic dimension of several datasets. In this paper we apply one of the most successful of these, Maximum Variance Unfolding (MVU), to a large sample of the well-known speech benchmark TIMIT. Although MVU is not generally scalable, we managed to apply it to 1 million 39-dimensional points and successfully reduced the dimension down to 15, using several state-of-the-art techniques for handling big datasets. The biggest bottleneck is the local neighborhood computation: for 300K points it took 9 hours, while for 1M points it took 3.5 days. We also demonstrate the weakness of the MFCC representation under k-nearest-neighbor classification, where the error rate exceeds 50%.
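The neighborhood computation the abstract identifies as the bottleneck can be sketched with brute force: all-pairs distances cost O(n^2 d), which is why a million 39-dimensional frames take days without tree-based acceleration. The features below are random stand-ins for MFCCs, and the three-class labels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 39, 5
X = rng.standard_normal((n, d))
labels = rng.integers(0, 3, size=n)

# All-pairs squared distances: the O(n^2 d) cost that dominates at scale.
sq = (X ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T

np.fill_diagonal(d2, np.inf)             # a point is not its own neighbor
nbrs = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbors per point

# Leave-one-out k-NN classification by majority vote over neighbor labels.
votes = labels[nbrs]
pred = np.array([np.bincount(v, minlength=3).argmax() for v in votes])
err = float((pred != labels).mean())
print("leave-one-out error:", err)
```

On random features the error hovers near chance, which mirrors the abstract's point that the representation, not the classifier, can be the weak link.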


16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing | 2006

Parameter Estimation for Manifold Learning, Through Density Estimation

Nikolaos Vasiloglou; Alexander G. Gray; David V. Anderson

Manifold learning has turned out to be a very useful tool for many machine learning applications, such as classification. Unfortunately, existing algorithms rely on ad hoc selection of the parameters that define the geometry of the manifold, and this choice significantly affects their performance. Recent theoretical work has proven the equivalence between Mercer kernel learning methods and the kernel in kernel density estimation. Based on this fact, the problem of kernel parameter estimation for manifold learning is addressed using nonparametric statistical estimation theory. An automatic way of determining the local bandwidths that define the geometry is introduced. The results show that automatic bandwidth selection leads to improved clustering performance and reduces the computational load compared to ad hoc selection.
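A standard nonparametric route to automatic bandwidth selection, in the spirit of the abstract, is to pick the kernel bandwidth that maximizes the leave-one-out log-likelihood of a Gaussian KDE rather than choosing it ad hoc. This is a textbook illustration on 1-D toy data, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 0.5, 80), rng.normal(2, 1.0, 80)])
n = x.size

def loo_log_likelihood(h):
    # Gaussian kernel matrix; zeroing the diagonal makes it leave-one-out.
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)
    dens = K.sum(1) / ((n - 1) * h * np.sqrt(2 * np.pi))
    return np.log(np.maximum(dens, 1e-300)).sum()

grid = np.linspace(0.05, 2.0, 40)
scores = [loo_log_likelihood(h) for h in grid]
h_star = float(grid[int(np.argmax(scores))])
print("selected bandwidth:", h_star)
```

Too small a bandwidth starves every leave-one-out density and too large a one oversmooths the two modes, so the likelihood peaks at an intermediate value without any manual tuning.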


Second International Conference on Web Delivering of Music (WEDELMUSIC 2002) | 2002

Lossless audio coding with MPEG-4 structured audio

Nikolaos Vasiloglou; Ronald W. Schafer; Mat C. Hans

MPEG-4 Structured Audio (SA) has been proposed as a flexible standard for generalized audio coding. Originating from the NetSound software developed at MIT, SA is based on MIDI-driven synthesis of sound, but it is enriched with DSP algorithms that allow it to emulate other types of coders designed for speech and audio signals. We have investigated the use of Structured Audio for lossless coding of audio signals, and have found that certain limitations of Structured Audio make implementing lossless coders less straightforward than might be desired. In particular, we have used SA to implement an MPEG-4-compliant version of the lossless audio coder AudioPaK. To implement and validate our new coder we used the software system Sfront, which translates MPEG-4 SA files into efficient C programs that render the audio signal.
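AudioPaK, the coder the abstract ports to SA, is built on fixed low-order polynomial predictors (finite differences of the signal) whose residuals are then Rice-coded. A minimal sketch of the predictor-selection and exact-reconstruction step is below; the Rice entropy coder is omitted, the toy signal is invented, and the absolute-sum cost is a common proxy for coded size rather than the standard's exact rule.

```python
import numpy as np

def encode(x, order):
    # The order-k residual is the k-th finite difference; keep the first
    # sample of each lower-order difference so decoding is exact.
    res = np.diff(x, n=order)
    heads = [np.diff(x, n=j)[0] for j in range(order)]
    return res, heads

def decode(res, heads):
    x = res
    for h in reversed(heads):                # undo one difference per head
        x = np.concatenate([[h], x]).cumsum()
    return x

rng = np.random.default_rng(3)
# Toy "audio": a smooth ramp plus small integer noise, where differencing helps.
x = (4 * np.arange(256) + rng.integers(-2, 3, 256)).astype(np.int64)

# Pick the predictor whose residual has the smallest absolute sum, a common
# proxy for the Rice-coded size (np.diff with n=0 returns the input as-is).
order = int(np.argmin([np.abs(np.diff(x, n=j)).sum() for j in range(4)]))
res, heads = encode(x, order)
x_hat = decode(res, heads)
lossless = bool((x_hat == x).all())
print("chosen order:", order, "lossless round trip:", lossless)
```

Because everything is integer arithmetic, the cumulative-sum decode inverts the differencing bit-exactly, which is the property a lossless coder must preserve.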


International Workshop on Machine Learning for Signal Processing | 2011

Learning distances to improve phoneme classification

Ryan R. Curtin; Nikolaos Vasiloglou; David V. Anderson

In this work we aim to learn a Mahalanobis distance to improve the performance of phoneme classification using the standard 39-dimensional MFCC features. To learn and evaluate our distance we use the simple k-nearest-neighbors (k-NN) classifier. Although this classifier exhibits low performance relative to state-of-the-art phoneme classifiers, it can be used to determine a distance metric that is applicable to many other, better-performing machine learning methods. We devise a novel optimization method that minimizes the error function of the k-NN classifier with respect to the covariance matrix of the Mahalanobis distance, based on finite-difference stochastic approximation (FDSA) gradient estimates combined with a random perturbation term to avoid local minima. We apply our method to phoneme classification with the k-NN classifier and show that the learned distance improves performance by up to 8.19% over the standard k-NN classifier, and additionally outperforms other state-of-the-art distance learning methods by approximately 4 percentage points. We also find that the computational complexity of our method, while not optimal, is better than that of other distance learning methods. Performance improvements for individual phoneme classes are given. The learned distances are applicable to other scale-variant machine learning methods, such as support vector machines, multidimensional scaling, and maximum variance unfolding.
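The core loop can be sketched as follows: parametrize the Mahalanobis matrix as M = L.T @ L so it stays positive semidefinite, estimate coordinate-wise FDSA gradients of the leave-one-out k-NN error, and add a small random perturbation to escape the error surface's flat regions. The 2-D toy data (one informative feature, one noise feature), the step sizes, and the tracking of the best matrix seen are all illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
labels = np.repeat([0, 1], n // 2)
X = np.c_[labels + 0.25 * rng.standard_normal(n),   # informative feature
          2.0 * rng.standard_normal(n)]             # pure-noise feature

def loo_knn_error(L, k=3):
    Z = X @ L.T                      # Mahalanobis distance via linear map
    d2 = ((Z[:, None] - Z[None, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, 1)[:, :k]
    pred = (labels[nbrs].mean(1) > 0.5).astype(int)
    return float((pred != labels).mean())

L = np.eye(2)
err0 = loo_knn_error(L)
best_err = err0
h, step = 0.2, 0.3
for _ in range(40):
    grad = np.zeros_like(L)
    for idx in np.ndindex(*L.shape):        # FDSA: one coordinate at a time
        E = np.zeros_like(L)
        E[idx] = h
        grad[idx] = (loo_knn_error(L + E) - loo_knn_error(L - E)) / (2 * h)
    # Gradient step plus a random perturbation to escape flat regions.
    L = L - step * grad + 0.02 * rng.standard_normal(L.shape)
    best_err = min(best_err, loo_knn_error(L))

print("identity-distance error:", err0, "best learned error:", best_err)
```

A successful run effectively shrinks the noise feature's column of L, which is the geometric effect a learned Mahalanobis distance is supposed to achieve.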


International Workshop on Machine Learning for Signal Processing | 2009

Learning Isometric Separation Maps

Nikolaos Vasiloglou; Alexander G. Gray; David V. Anderson

Maximum Variance Unfolding (MVU) and its variants have been very successful at embedding data manifolds in lower-dimensional spaces, often revealing the true intrinsic dimension. In this paper we show how to also incorporate supervised class information into an MVU-like method without breaking its convexity. We call this method the Isometric Separation Map, and we show that the resulting kernel matrix can be used in a binary/multiclass Support Vector Machine-like method within a semi-supervised (transductive) framework. We also show that the method always finds a kernel matrix that linearly separates the training data exactly, without projecting them into infinite-dimensional spaces. In traditional SVMs we choose a kernel and hope that the data become linearly separable in the kernel space; here we show how the hyperplane can be chosen ad hoc and the kernel trained so that the data are always linearly separable. Comparisons with Large Margin SVMs show comparable performance.


Asilomar Conference on Signals, Systems and Computers | 2004

Isolated-word, speaker-dependent recognition in the presence of noise, based on an audio retrieval algorithm

Nikolaos Vasiloglou; Ronald W. Schafer; Mat C. Hans

With rapidly increasing storage and computational capacity, a common PC can store and index hundreds of hours of speech. This suggests that new approaches based on database techniques might be useful in speech recognition and speech indexing. This paper presents a first step in that direction. The algorithm developed relies on an indexed single-speaker database consisting of spoken utterances transcribed into text. The waveforms of these utterances are converted off-line into binary symbols called fingerprints through a nonlinear frequency-domain transform, and the fingerprints are associated with the transcribed text. Given the fingerprint of a new waveform, the best word match from the database can be retrieved. A 3255-word database is used as a test bed: all the words from this database are mixed with white noise and time-scale modified to provide test data, and the database is queried with the fingerprints of the test words. The results of the experiments conducted are promising, showing a 99.5% recognition rate at a 20 dB signal-to-noise ratio (SNR).
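The fingerprint-and-retrieve idea can be sketched generically: convert each waveform to bits with a nonlinear frequency-domain transform (here, the sign of frame-to-frame band-energy differences, a common binarization; the paper's exact transform is not reproduced), then return the database entry at minimum Hamming distance. Chirps stand in for spoken words, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
fs = 8000.0

def chirp(f0, f1, n=4096):
    f_inst = np.linspace(f0, f1, n)               # linear frequency sweep
    return np.sin(2 * np.pi * np.cumsum(f_inst) / fs)

def fingerprint(x, frame=256, bands=16):
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Collapse FFT bins into coarse bands, then binarize the frame-to-frame
    # energy differences: a nonlinear frequency-domain transform to bits.
    edges = np.linspace(0, spec.shape[1], bands + 1).astype(int)
    energy = np.add.reduceat(spec, edges[:-1], axis=1)
    return (np.diff(energy, axis=0) > 0).ravel()

# Tiny "database" of spoken-word stand-ins: chirps with distinct sweeps.
db = {"one": chirp(200, 1000), "two": chirp(1000, 200), "three": chirp(500, 1500)}
prints = {name: fingerprint(x) for name, x in db.items()}

# Query: "two" corrupted by additive white noise, matched by Hamming distance.
query = db["two"] + 0.05 * rng.standard_normal(4096)
qp = fingerprint(query)
match = min(prints, key=lambda name: int((prints[name] != qp).sum()))
print("best match:", match)
```

Because the bits encode only the sign of energy changes, moderate additive noise flips few of them, while a different word flips about half, which is what makes nearest-Hamming retrieval robust.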


SIAM International Conference on Data Mining | 2008

Non-negative Matrix Factorization, Convexity and Isometry

Nikolaos Vasiloglou; Alexander Gray; David V. Anderson


Archive | 2011

Density Preserving Maps

Arkadas Ozakin; Nikolaos Vasiloglou; Alexander Gray


Archive | 2009

Fast DBMS-Resident Machine Learning and Statistics: Extending the Sloan Digital Sky Survey SQL Server Database

Ryan Riegel; Abhimanyu Adita; Praveen Krishnaiah; Prasad Jakka; Nikolaos Vasiloglou; Dongryeol Lee; Alexander Gray; T. Budavari; Alexander S. Szalay

Collaboration


An overview of Nikolaos Vasiloglou's collaborations.

Top Co-Authors

David V. Anderson, Georgia Institute of Technology
Alexander G. Gray, Georgia Institute of Technology
Arkadas Ozakin, Georgia Tech Research Institute
Dongryeol Lee, Georgia Institute of Technology
Ryan R. Curtin, Georgia Institute of Technology