Yannis Panagakis
Imperial College London
Publications
Featured research published by Yannis Panagakis.
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Yannis Panagakis; Constantine Kotropoulos; Gonzalo R. Arce
Motivated by psychophysiological investigations of the human auditory system, a bio-inspired two-dimensional auditory representation of music signals that captures slow temporal modulations is exploited. Although each recording is represented by a second-order tensor (i.e., a matrix), a third-order tensor is needed to represent a music corpus. Non-negative multilinear principal component analysis (NMPCA) is proposed for the unsupervised dimensionality reduction of such third-order tensors. NMPCA maximizes the total tensor scatter while preserving the non-negativity of the auditory representations. An algorithm for NMPCA is derived by exploiting the structure of the Grassmann manifold. NMPCA is compared against three multilinear subspace analysis techniques, namely non-negative tensor factorization, higher-order singular value decomposition, and multilinear principal component analysis, as well as their linear counterparts, i.e., non-negative matrix factorization, singular value decomposition, and principal component analysis, in extracting features that are subsequently classified by either support vector machine or nearest-neighbor classifiers. Three different sets of experiments conducted on the GTZAN and ISMIR2004 Genre datasets demonstrate the superiority of NMPCA over the aforementioned subspace analysis techniques in extracting more discriminative features, especially when the training set has small cardinality. The best classification accuracies reported in the paper exceed those obtained by state-of-the-art music genre classification algorithms applied to both datasets.
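Multilinear analyses like the one above are built on mode-n unfoldings and mode-n tensor-matrix products of the third-order data tensor. A minimal NumPy sketch of those standard operations (not the paper's NMPCA algorithm itself, just the primitives such methods rest on):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Mode-n product X x_n U: multiply along axis `mode` by `matrix`."""
    out = matrix @ unfold(tensor, mode)
    # Restore tensor shape, with the `mode` dimension replaced by matrix.shape[0].
    shape = [matrix.shape[0]] + [s for i, s in enumerate(tensor.shape) if i != mode]
    return np.moveaxis(out.reshape(shape), 0, mode)
```

Projecting a corpus tensor onto low-dimensional factor matrices in each mode is then a sequence of such mode-n products.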
international conference on acoustics, speech, and signal processing | 2010
Yannis Panagakis; Constantine Kotropoulos
Motivated by the rich, psychophysiologically grounded properties of auditory cortical representations and the power of sparse representation-based classifiers, we propose a robust music genre classification framework. Its first pillar is a novel multilinear subspace analysis method that reduces the dimensionality of cortical representations of music signals while preserving their topology. Its second pillar is sparse representation-based classification, which models any test cortical representation as a sparse weighted sum of dictionary atoms stemming from training cortical representations of known genre, by assuming that representations of music recordings of the same genre lie close to one another in the tensor space. Accordingly, the dimensionality reduction is performed in a manner compatible with the working principle of sparse representation-based classification. Music genre classification accuracies of 93.7% and 94.93% are reported on the GTZAN and ISMIR2004 Genre datasets, respectively. Both accuracies exceed any previously reported for state-of-the-art music genre classification algorithms applied to the aforementioned datasets.
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Yannis Panagakis; Constantine Kotropoulos; Gonzalo R. Arce
A novel framework for music genre classification, namely the joint sparse low-rank representation (JSLRR), is proposed in order to: 1) smooth the noise in the test samples, and 2) identify the subspaces on which the test samples lie. An efficient algorithm is proposed for obtaining the JSLRR, and a novel classifier is developed, referred to as the JSLRR-based classifier. Special cases of the JSLRR-based classifier are the joint sparse representation-based classifier and the low-rank representation-based one. The performance of the three aforementioned classifiers is compared against that of the sparse representation-based classifier, the nearest-subspace classifier, support vector machines, and the nearest-neighbor classifier for music genre classification on six manually annotated benchmark datasets. The best classification results reported here are comparable with, or slightly superior to, those obtained by state-of-the-art music genre classification methods.
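Low-rank representation objectives like the one above are typically minimized with proximal steps, and the workhorse is the proximal operator of the nuclear norm, i.e., singular value thresholding. A minimal sketch of that generic operator (not the paper's full JSLRR algorithm):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox of tau*||.||_* at M.
    Shrinks every singular value by tau, dropping those below it."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Applied inside an alternating scheme, this step pushes the representation matrix toward low rank while other proxes (e.g., row-sparsity shrinkage) handle the joint-sparsity term.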
international conference on computer vision | 2015
Christos Sagonas; Yannis Panagakis; Stefanos Zafeiriou; Maja Pantic
Recently, it has been shown that excellent results can be achieved in both facial landmark localization and pose-invariant face recognition. These breakthroughs are attributed to the efforts of the community to manually annotate facial images in many different poses and to collect 3D facial data. In this paper, we propose a novel method for joint frontal view reconstruction and landmark localization using only a small set of frontal images. By observing that, among all poses, the frontal facial image is the one having the minimum rank, an appropriate model is devised that jointly recovers the frontalized version of the face as well as the facial landmarks. To this end, a suitable optimization problem, involving the minimization of the nuclear norm and the matrix ℓ1 norm, is solved. The proposed method is assessed in frontal face reconstruction, face landmark localization, pose-invariant face recognition, and face verification in unconstrained conditions. The relevant experiments have been conducted on eight databases. The experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods for the target problems.
computer vision and pattern recognition | 2014
Christos Sagonas; Yannis Panagakis; Stefanos Zafeiriou; Maja Pantic
The construction of Facial Deformable Models (FDMs) is a very challenging computer vision problem, since the face is a highly deformable object and its appearance changes drastically under different poses, expressions, and illuminations. Although several methods for generic FDM construction have been proposed for facial landmark localization in still images, they are insufficient for tasks such as facial behaviour analysis and facial motion capture, where perfect landmark localization is required. In this case, person-specific FDMs (PSMs) are mainly employed, requiring manual facial landmark annotation for each person and person-specific training. In this paper, a novel method for the automatic construction of PSMs is proposed. To this end, an orthonormal subspace suitable for facial image reconstruction is learnt. Next, to correct the fittings of a generic model, image congealing (i.e., batch image alignment) is performed by employing only the learnt orthonormal subspace. Finally, the corrected fittings are used to construct the PSM. The image congealing problem is solved by formulating a suitable sparsity-regularized rank minimization problem. The proposed method outperforms the state-of-the-art methods it is compared to, in terms of both landmark localization accuracy and computational time.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016
Yannis Panagakis; Mihalis A. Nicolaou; Stefanos Zafeiriou; Maja Pantic
Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via the correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them to four applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) the temporal alignment of facial expressions. Experimental results on two synthetic and seven real-world datasets indicate the robustness and effectiveness of the proposed methods on these application domains, outperforming other state-of-the-art methods in the field.
international conference on acoustics, speech, and signal processing | 2014
Mihalis A. Nicolaou; Yannis Panagakis; Stefanos Zafeiriou; Maja Pantic
The problem of automatically estimating the interest level of a subject has been gaining attention among researchers, mostly due to the vast applicability of interest detection. In this work, we obtain a set of continuous interest annotations for the SEMAINE database, which we also analyse in terms of emotion dimensions such as valence and arousal. Most importantly, we propose a robust variant of Canonical Correlation Analysis (RCCA) for performing audio-visual fusion, which we apply to the prediction of interest. RCCA recovers a low-rank subspace which captures the correlations of the fused modalities, while isolating gross errors in the data without making any assumptions regarding Gaussianity. We experimentally show that RCCA is more appropriate than other standard fusion techniques (such as ℓ2-CCA and feature-level fusion), since it captures interactions between the modalities while also decontaminating the obtained subspace from errors, which are dominant in real-world problems.
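For reference, the classical ℓ2-CCA baseline mentioned above can be computed by whitening each view and taking the SVD of the whitened cross-covariance; the singular values are the canonical correlations. A sketch on synthetic data (standard CCA only, not the robust variant proposed in the paper):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-8):
    """Classical CCA. X: (n, dx), Y: (n, dy), rows are samples.
    Returns projection matrices A, B and the top-k canonical correlations."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # small ridge for stability
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whitening transforms from Cholesky factors: Wx.T @ Cxx @ Wx = I.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    A = Wx @ U[:, :k]       # canonical directions for X
    B = Wy @ Vt.T[:, :k]    # canonical directions for Y
    return A, B, s[:k]
```

When one view is (nearly) a linear function of the other, the first canonical correlation approaches 1.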
Pattern Recognition Letters | 2014
Yannis Panagakis; Constantine Kotropoulos
A novel homogeneity-based method for music structure analysis is proposed. The heart of the method is a similarity measure, derived from first principles, that is based on the matrix Elastic Net (EN) regularization and deals efficiently with highly correlated audio feature vectors. In particular, beat-synchronous mel-frequency cepstral coefficients, chroma features, and auditory temporal modulations model the audio signal. The EN induced similarity measure is employed to construct an affinity matrix, yielding a novel subspace clustering method referred to as Elastic Net subspace clustering (ENSC). The performance of the ENSC in structure analysis is assessed by conducting extensive experiments on the Beatles dataset. The experimental findings demonstrate the descriptive power of the EN-based affinity matrix over the affinity matrices employed in subspace clustering methods, attaining the state-of-the-art performance reported for the Beatles dataset.
british machine vision conference | 2014
Georgios Papamakarios; Yannis Panagakis; Stefanos Zafeiriou
The robust estimation of the low-dimensional subspace that spans a set of high-dimensional observations, possibly corrupted by gross errors and outliers, is fundamental in many computer vision problems. The state-of-the-art robust principal component analysis (PCA) methods adopt convex relaxations of ℓ0 quasi-norm-regularised rank minimisation problems; that is, the nuclear norm and the ℓ1-norm are employed. However, this convex relaxation may make the solutions deviate from the original ones. To this end, the Generalised Scalable Robust PCA (GSRPCA) is proposed, by reformulating the robust PCA problem using the Schatten p-norm and the ℓq-norm subject to orthonormality constraints, resulting in a better non-convex approximation of the original sparsity-regularised rank minimisation problem. It is worth noting that the common robust PCA variants are special cases of the GSRPCA when p = q = 1 and the upper bound on the number of principal components is properly chosen. An efficient algorithm for the GSRPCA is developed. The performance of the GSRPCA is assessed by conducting experiments on both synthetic and real data. The experimental results indicate that the GSRPCA outperforms the common state-of-the-art robust PCA methods without introducing much extra computational cost.
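The p = q = 1 special case noted above is the classical nuclear-plus-ℓ1 robust PCA. A minimal sketch of that convex baseline via a standard inexact augmented-Lagrangian scheme (illustrative hyper-parameters on synthetic data, not the GSRPCA algorithm itself):

```python
import numpy as np

def soft_threshold(M, tau):
    """Elementwise shrinkage: the prox of tau*||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: the prox of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca(X, lam=None, n_iter=100):
    """Split X ~= L + S, with L low-rank and S sparse, by minimising
    ||L||_* + lam*||S||_1 subject to X = L + S (inexact ALM)."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                   # Lagrange multipliers
    mu = 1.25 / np.linalg.norm(X, 2)       # common initial penalty choice
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = soft_threshold(X - L + Y / mu, lam / mu)
        Y = Y + mu * (X - L - S)           # dual ascent on the constraint
        mu = min(mu * 1.5, 1e7)            # gradually tighten the penalty
    return L, S
```

On a low-rank matrix corrupted by a few gross entries, the two terms separate cleanly, which is exactly the behavior the non-convex Schatten-p/ℓq generalisation aims to sharpen.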
computer vision and pattern recognition | 2013
Yannis Panagakis; Mihalis A. Nicolaou; Stefanos Zafeiriou; Maja Pantic
Temporal alignment of human behaviour from visual data is a very challenging problem for numerous reasons, including possible large differences in temporal scale, inter-/intra-subject variability and, more importantly, the presence of gross errors and outliers. Gross errors are often abundant due to incorrect localization and tracking, the presence of partial occlusion, etc. Furthermore, such errors rarely follow a Gaussian distribution, which is the de facto assumption in machine learning methods. In this paper, building on recent advances in rank minimization and compressive sensing, a novel temporal alignment method that is robust to gross errors is proposed. While previous approaches combine dynamic time warping (DTW) with low-dimensional projections that maximally correlate two sequences, we aim to learn two underlying projection matrices (one for each sequence) which not only maximally correlate the sequences but, at the same time, efficiently remove possible corruptions in any datum in the sequences. The projections are obtained by minimizing the weighted sum of nuclear and ℓ1 norms, by solving a sequence of convex optimization problems, while the temporal alignment is found by applying DTW in an alternating fashion. The superiority of the proposed method over the state-of-the-art time alignment methods, namely canonical time warping and generalized time warping, is indicated by experimental results on both synthetic and real datasets.
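The alignment step alternated with projection learning is standard dynamic time warping. Its core dynamic program, which finds the minimal cumulative pairwise cost over monotone warping paths, can be sketched as follows (a textbook scalar-sequence version, not the paper's joint projection-learning method):

```python
import numpy as np

def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping: minimal cumulative cost to align sequences x and y,
    where the warping path may repeat elements but never goes backwards."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)    # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

For multivariate sequences one would replace `dist` with, e.g., a Euclidean distance between projected frames; in the paper's setting DTW is re-run each time the projections are updated.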