Aggelos Pikrakis
University of Piraeus
Publications
Featured research published by Aggelos Pikrakis.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis
In this paper, we present a novel method for extracting affective information from movies based on speech data. The method builds on a 2-D representation of speech emotions (the Emotion Wheel). The goal is twofold. First, to investigate whether the Emotion Wheel offers a good representation for emotions associated with speech signals. To this end, several human annotators manually labeled speech data from movies using the Emotion Wheel, and their level of disagreement was computed as a measure of representation quality. The results indicate that the Emotion Wheel is a good representation of emotions in speech data. Second, a regression approach is adopted in order to predict the location of an unknown speech segment on the Emotion Wheel. Each speech segment is represented by a vector of ten audio features. The results indicate that the resulting architecture can estimate the emotional state of speech from movies with sufficient accuracy.
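The regression step maps each ten-dimensional feature vector to a point on the 2-D Emotion Wheel. As a minimal sketch of this idea, the snippet below uses k-nearest-neighbour regression on synthetic data; the paper does not prescribe this particular regressor, and the feature values and annotations here are placeholders.

import numpy as np

def knn_emotion_regression(X_train, Y_train, x, k=5):
    # Predict a (valence, arousal) point on the Emotion Wheel for one
    # ten-dimensional feature vector x by averaging the coordinates of
    # its k nearest annotated training segments. The regressor choice is
    # illustrative; the paper does not prescribe k-NN regression.
    d = np.linalg.norm(X_train - x, axis=1)    # Euclidean distances
    idx = np.argsort(d)[:k]                    # k nearest neighbours
    return Y_train[idx].mean(axis=0)           # mean 2-D wheel position

# Random placeholders for annotated movie segments.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))           # ten audio features/segment
Y_train = rng.uniform(-1, 1, size=(200, 2))    # (valence, arousal) labels
print(knn_emotion_regression(X_train, Y_train, rng.normal(size=10)))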
IEEE Transactions on Multimedia | 2008
Aggelos Pikrakis; Theodoros Giannakopoulos; Sergios Theodoridis
This paper presents a multistage system for speech/music discrimination based on a three-step procedure. The first step is a computationally efficient scheme that applies a region growing technique to a 1-D feature sequence extracted from the raw audio stream. This scheme is used as a preprocessing stage and yields segments with high music and speech precision, at the expense of leaving certain parts of the audio recording unclassified. The unclassified parts of the audio stream are then fed as input to a more computationally demanding scheme. The latter treats speech/music discrimination of radio recordings as a probabilistic segmentation task, where the solution is obtained by means of dynamic programming. The proposed scheme seeks the sequence of segments and respective class labels (i.e., speech/music) that maximize the product of posterior class probabilities, given the data that form the segments. To this end, a Bayesian network combiner is embedded as a posterior probability estimator. At a final stage, a boundary correction algorithm is applied to remove possible errors at the boundaries of the previously generated speech and music segments. The proposed system has been tested on radio recordings from various sources. The overall system accuracy is approximately 96%. Performance results are also reported on a musical-genre basis, and a comparison with existing methods is given.
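The core of the second stage is a dynamic program over candidate segmentations. The sketch below illustrates that formulation under some assumptions: it maximizes the sum of log posteriors (equivalent to the product of posteriors), caps segment length at max_len frames, and treats the posterior estimator as a black box, whereas the paper uses a Bayesian network combiner; toy_post is a made-up stand-in.

import numpy as np

def segment(posteriors_fn, N, max_len):
    # best[t]: highest total log-posterior over frames 0..t-1;
    # back[t]: (segment start, class label) of the last segment ending at t.
    # posteriors_fn(s, t) must return per-class log posteriors
    # log P(class | frames s..t-1); any estimator fits here.
    best = np.full(N + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (N + 1)
    for t in range(1, N + 1):
        for s in range(max(0, t - max_len), t):
            logp = posteriors_fn(s, t)
            c = int(np.argmax(logp))          # best label for this segment
            if best[s] + logp[c] > best[t]:
                best[t] = best[s] + logp[c]
                back[t] = (s, c)
    segs, t = [], N                           # backtrack the segmentation
    while t > 0:
        s, c = back[t]
        segs.append((s, t, c))
        t = s
    return segs[::-1]                         # [(start, end, label), ...]

# Made-up posterior: speech dominates the first 50 frames, music the rest.
def toy_post(s, t):
    p = 0.9 if (s + t) / 2 < 50 else 0.1      # P(speech | segment frames)
    return (t - s) * np.log(np.array([p, 1.0 - p]))

print(segment(toy_post, N=100, max_len=40))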
IEEE Transactions on Speech and Audio Processing | 2003
Aggelos Pikrakis; Sergios Theodoridis; Dimitris Kamarotos
This paper presents an efficient method for recognizing isolated musical patterns in a monophonic environment, using a novel extension of Dynamic Time Warping, which we call Context Dependent Dynamic Time Warping. Each pattern is converted into a sequence of frequency jumps by means of a fundamental frequency tracking algorithm, followed by a quantizer. The resulting sequence of frequency jumps is fed to the recognizer, which employs Context Dependent Dynamic Time Warping. The main characteristic of Context Dependent Dynamic Time Warping is that it exploits the correlation exhibited among adjacent frequency jumps of the feature sequence. The methodology has been tested in the context of Greek Traditional Music, which exhibits certain characteristics that make the classification task harder when compared with the Western musical tradition. A recognition rate higher than 95% was achieved.
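A plain DTW local cost compares one element of each sequence at a time; the context-dependent variant also looks at the neighbouring frequency jumps. The sketch below is one plausible reading of that idea, adding a penalty on the difference between adjacent-jump changes; the paper's exact cost definition may differ.

import numpy as np

def cd_dtw(x, y):
    # DTW whose local cost compares pairs of adjacent frequency jumps,
    # (x[i-1], x[i]) vs (y[j-1], y[j]), so the correlation between
    # neighbouring jumps enters the match cost.
    n, m = len(x), len(y)

    def cost(i, j):
        c = abs(x[i] - y[j])                   # jump-vs-jump mismatch
        if i > 0 and j > 0:                    # context term
            c += abs((x[i] - x[i - 1]) - (y[j] - y[j - 1]))
        return c

    D = np.full((n, m), np.inf)
    D[0, 0] = cost(0, 0)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i else np.inf,
                       D[i, j - 1] if j else np.inf,
                       D[i - 1, j - 1] if i and j else np.inf)
            D[i, j] = prev + cost(i, j)
    return D[-1, -1]                           # total matching cost

# Two quantized frequency-jump sequences (semitones).
print(cd_dtw([0, 2, 2, -1, 0], [0, 2, -1, -1, 0]))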
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Aggelos Pikrakis; Sergios Theodoridis; Dimitris Kamarotos
This paper presents a new extension to the variable-duration hidden Markov model (HMM), capable of classifying musical patterns that have been extracted from raw audio data into a set of predefined classes. Each musical pattern is converted into a sequence of music intervals by means of a fundamental frequency tracking procedure. This sequence is subsequently presented as input to a set of variable-duration HMMs. Each one of these models has been trained to recognize patterns of a corresponding predefined class. Classification is determined based on the highest recognition probability. The new type of variable-duration hidden Markov modeling proposed in this paper results in enhanced performance because 1) it deals effectively with errors that commonly originate during the feature extraction stage, and 2) it accounts for variations due to the individual expressive performance of different instrument players. To demonstrate its effectiveness, the novel classification scheme has been applied, in the context of Greek traditional music, to monophonic musical patterns of a popular instrument, the Greek traditional clarinet. Although the method is also appropriate for Western-style music, Greek traditional music poses extra difficulties and makes musical pattern recognition a harder task. The classification results demonstrate that the new approach outperforms previous work based on conventional HMMs.
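The front end shared by this line of work turns a fundamental-frequency track into a sequence of music intervals. A minimal sketch of such a quantizer follows, assuming an f0 track with one value per frame and zeros marking unvoiced frames; the paper's tracking procedure is more elaborate.

import numpy as np

def f0_to_intervals(f0_hz):
    # Convert a fundamental-frequency track (Hz per frame, 0 = unvoiced)
    # into a sequence of music intervals: semitone differences between
    # successive voiced frames, rounded to the nearest integer, keeping
    # only nonzero jumps (note transitions). Illustrative front end only.
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    semis = 12.0 * np.log2(voiced)             # log-frequency (semitones)
    jumps = np.rint(np.diff(semis)).astype(int)
    return jumps[jumps != 0]

# A3 -> B3 -> C4 yields intervals of +2 and +1 semitones.
print(f0_to_intervals([220.0, 220.0, 0.0, 246.94, 261.63, 261.63]))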
Multimedia Signal Processing | 2007
Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis
In this work, we present a multi-class classification algorithm for audio segments recorded from movies, focusing on the detection of violent content for the protection of sensitive social groups (e.g., children). Towards this end, we have used twelve audio features stemming from the nature of the signals under study. In order to classify the audio segments into six classes (three of them violent), Bayesian networks have been used in combination with a one-versus-all classification architecture. The overall system has been trained and tested on a large data set (5000 audio segments) recorded from more than 30 movies of several genres. Experiments showed that the proposed method can be used as an accurate multi-class classification scheme, but also as a binary classifier for the violent vs. non-violent audio content problem.
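In a one-versus-all architecture, one binary model is trained per class (class c against everything else) and the class with the highest binary score wins. The sketch below illustrates the wrapper with a diagonal-Gaussian log-odds score standing in for the Bayesian networks of the paper; the data are synthetic.

import numpy as np

class OneVsAll:
    # One binary model per class, trained as "class c vs everything else";
    # prediction picks the class whose binary log-odds is highest. The
    # diagonal-Gaussian score below is an illustrative stand-in for the
    # Bayesian networks used in the paper.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for c in self.classes_:
            pos, neg = X[y == c], X[y != c]
            self.models_[c] = (pos.mean(0), pos.std(0) + 1e-6,
                               neg.mean(0), neg.std(0) + 1e-6)
        return self

    @staticmethod
    def _loglik(X, mu, sd):                   # diagonal Gaussian log-density
        return -0.5 * (((X - mu) / sd) ** 2 + 2 * np.log(sd)).sum(axis=1)

    def predict(self, X):
        odds = np.stack([self._loglik(X, *self.models_[c][:2])
                         - self._loglik(X, *self.models_[c][2:])
                         for c in self.classes_], axis=1)
        return self.classes_[np.argmax(odds, axis=1)]

# Synthetic stand-in for the twelve audio features and six classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 1.0, (50, 12)) for c in range(6)])
y = np.repeat(np.arange(6), 50)
print((OneVsAll().fit(X, y).predict(X) == y).mean())   # training accuracy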
International Conference on Pattern Recognition | 2008
Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis
In this paper, a novel approach to audio segmentation is presented. The problem of detecting the limits of audio segments is treated as a binary classification task: frames are classified as "segment limits" vs. "non-segment limits". For each audio frame, a spectrogram is computed and eight feature values are extracted from respective frequency bands. Final decisions are taken based on a classifier combination scheme. The algorithm has very low complexity, with almost real-time performance. It achieves an 86% accuracy rate on real audio streams extracted from movies. Moreover, it introduces a general framework for audio segmentation that does not depend explicitly on the number of audio classes.
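The per-frame features are energies taken from fixed frequency bands of the spectrogram. A minimal sketch follows; the equal-width band split, window, and hop size are assumptions made for illustration, not the paper's exact configuration.

import numpy as np

def band_features(signal, fs, frame_len=1024, hop=512, n_bands=8):
    # Per-frame feature extraction: magnitude spectrum of each windowed
    # frame, split into n_bands equal-width frequency bands, one
    # log-energy value per band.
    win = np.hanning(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame_len] * win))
        bands = np.array_split(spec, n_bands)
        feats.append([np.log(1e-12 + (b ** 2).sum()) for b in bands])
    return np.array(feats)                    # shape: (n_frames, n_bands)

fs = 16000
t = np.arange(fs) / fs                        # one second of a 440 Hz tone
print(band_features(np.sin(2 * np.pi * 440 * t), fs).shape)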
International Conference on Pattern Recognition | 2010
Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis
This paper presents a method for detecting violent content in video-sharing sites. The proposed approach operates on a fusion of three modalities: audio, moving image, and text data, the latter collected from the accompanying user comments. The problem is treated as a binary classification task (violent vs. non-violent content) on a 9-dimensional feature space, where 7 of the 9 features are extracted from the audio stream. The proposed method has been evaluated on 210 YouTube videos, and the overall accuracy reached 82%.
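A simple way to read the fusion step is early fusion: concatenate the per-modality descriptors into one 9-dimensional vector and hand it to a binary classifier. The sketch below assumes a 7/1/1 audio/visual/text split (only the 7-of-9 audio count is stated in the abstract) and uses a tiny logistic-regression trainer on synthetic data as a stand-in classifier.

import numpy as np

def fuse(audio, visual, text):
    # Early fusion: concatenate per-modality descriptors into one vector.
    # The 7/1/1 split is an assumption; only "7 of 9 audio" is stated.
    return np.concatenate([audio, visual, text])

def fit_logreg(X, y, lr=0.1, iters=2000):
    # Tiny logistic-regression trainer for the binary decision; any
    # binary classifier could take its place.
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # sigmoid
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient step
    return w

# Synthetic videos: class 1 ("violent") shifts every modality by one unit.
rng = np.random.default_rng(2)
X = np.vstack([fuse(rng.normal(c, 1, 7), rng.normal(c, 1, 1),
                    rng.normal(c, 1, 1)) for c in (0, 1) for _ in range(100)])
y = np.repeat([0.0, 1.0], 100)
w = fit_logreg(X, y)
p = 1.0 / (1.0 + np.exp(-np.hstack([X, np.ones((len(X), 1))]) @ w))
print(((p > 0.5) == y).mean())                # training accuracy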
International Conference on Acoustics, Speech, and Signal Processing | 2006
Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis
This paper presents a speech/music discriminator for radio recordings. The segmentation stage is based on the detection of changes in the energy distribution of the audio signal. For the classification stage, Bayesian networks have been adopted in order to combine the results of nine k-nearest neighbor classifiers trained on individual features. To this end, a comparison of the performance of three popular Bayesian network architectures is presented. Furthermore, in order to reduce the number of features used for classification, a new feature selection scheme is introduced, which is also based on the properties of Bayesian networks. The proposed system has been tested on real Internet broadcasts of BBC radio stations.
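The classifier-combination idea can be sketched as follows: one k-NN classifier per scalar feature, each producing a posterior, combined into a single decision. The snippet uses a naive product-of-odds rule in place of the learned Bayesian network combiner, and synthetic data in place of the nine audio features.

import numpy as np

def knn_posterior(train_x, train_y, x, k=5):
    # P(music | x) from a k-NN classifier on a single scalar feature:
    # the fraction of the k nearest training points labelled music.
    idx = np.argsort(np.abs(train_x - x))[:k]
    return train_y[idx].mean()

def combine(train_X, train_y, x, k=5):
    # One k-NN classifier per feature; posteriors combined with a naive
    # product-of-odds rule, a simple stand-in for the learned combiner.
    p = np.array([knn_posterior(train_X[:, f], train_y, x[f], k)
                  for f in range(train_X.shape[1])])
    p = np.clip(p, 1e-3, 1 - 1e-3)             # avoid degenerate posteriors
    odds = np.prod(p / (1 - p))
    return odds / (1 + odds)                   # combined P(music | x)

# Synthetic stand-in for nine features: speech around 0, music around 2.
rng = np.random.default_rng(3)
train_X = np.vstack([rng.normal(0, 1, (100, 9)), rng.normal(2, 1, (100, 9))])
train_y = np.repeat([0.0, 1.0], 100)           # 0 = speech, 1 = music
print(combine(train_X, train_y, np.full(9, 2.0)))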
4th International Workshop on Cognitive Information Processing (CIP) | 2014
Bob L. Sturm; Corey Kereliuk; Aggelos Pikrakis
Systems built using deep learning neural networks trained on low-level spectral periodicity features (DeSPerF) reproduced the most “ground truth” of the systems submitted to the MIREX 2013 task, “Audio Latin Genre Classification.” To answer why this was the case, we take a closer look at the behavior of a DeSPerF system we create and evaluate using the benchmark dataset BALLROOM. We find through time stretching that this DeSPerF system appears to obtain a high figure of merit on the task of music genre recognition because of a confounding of tempo with “ground truth” in BALLROOM. This observation motivates several predictions.
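The time-stretching probe can be reproduced in outline: classify pitch-preserving time-stretched copies of the same excerpt and watch whether the label tracks the tempo. In the sketch below, classify_fn is a hypothetical black-box genre classifier; librosa.effects.time_stretch supplies the stretching.

import librosa

def tempo_sensitivity(y, sr, classify_fn, rates=(0.8, 0.9, 1.0, 1.1, 1.2)):
    # Classify pitch-preserving time-stretched copies of one excerpt.
    # classify_fn(signal, sr) -> genre label is a hypothetical black box;
    # a tempo-confounded system changes its label as the rate changes.
    return {r: classify_fn(librosa.effects.time_stretch(y, rate=r), sr)
            for r in rates}

# Usage sketch (file name and classifier are assumed, not from the paper):
# y, sr = librosa.load("excerpt.wav", sr=None)
# print(tempo_sensitivity(y, sr, desperf_classifier))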
International Conference on Music and Artificial Intelligence | 2002
Aggelos Pikrakis; Sergios Theodoridis; Dimitris Kamarotos
This paper presents an efficient method for recognizing isolated musical patterns in a monophonic environment, using Discrete Observation Hidden Markov Models. Each musical pattern is converted into a sequence of music intervals by means of a fundamental frequency tracking algorithm followed by a quantizer. The resulting sequence of music intervals is fed to a set of Discrete Observation Hidden Markov Models, each of which has been trained to recognize a specific type of musical pattern. Our methodology has been tested in the context of Greek Traditional Music, which exhibits certain characteristics that make the classification task harder when compared with the Western musical tradition. A recognition rate higher than 95% was achieved. To our knowledge, this is the first time that the problem of isolated musical pattern recognition has been treated using Hidden Markov Models.
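Classification with per-class discrete-observation HMMs amounts to scoring the interval sequence under each trained model with the forward algorithm and picking the most likely class. The sketch below shows that decision rule with randomly initialized models in place of trained ones; training (e.g., Baum-Welch) is omitted.

import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    # Forward algorithm in the log domain: log-likelihood of a discrete
    # observation sequence under one HMM. log_pi: (S,) initial state
    # log-probs, log_A: (S, S) transitions, log_B: (S, V) emissions.
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        m = alpha.max()                        # log-sum-exp over states
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[:, o]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def classify(obs, models):
    # Pick the class whose HMM assigns the sequence the highest likelihood.
    return max(models, key=lambda c: log_forward(obs, *models[c]))

# Randomly initialized models stand in for trained ones.
rng = np.random.default_rng(4)
def random_hmm(S=3, V=5):
    return (np.log(rng.dirichlet(np.ones(S))),
            np.log(rng.dirichlet(np.ones(S), size=S)),
            np.log(rng.dirichlet(np.ones(V), size=S)))
models = {"pattern_a": random_hmm(), "pattern_b": random_hmm()}
print(classify([0, 1, 2, 3, 1], models))       # observation symbols in 0..4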