Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Theodoros Giannakopoulos is active.

Publications


Featured research published by Theodoros Giannakopoulos.


Hellenic Conference on Artificial Intelligence | 2006

Violence content classification using audio features

Theodoros Giannakopoulos; Dimitrios I. Kosmopoulos; Andreas Aristidou; Sergios Theodoridis

This work studies the problem of violence detection in audio data, which can be used for automated content rating. We employ popular frame-level audio features from both the time and frequency domains. Several statistics of the resulting feature sequences are then fed as input to a Support Vector Machine classifier, which decides whether the segment content is violent. The presented experimental results verify the validity of the approach and exhibit better performance than other known approaches.
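
A minimal sketch of this kind of pipeline, assuming short-time energy and zero-crossing rate as the frame-level features and mean/standard deviation as the segment statistics (the paper's exact feature set is not reproduced here):

```python
# Illustrative sketch, not the authors' exact pipeline: frame-level audio
# features are summarized by per-segment statistics and fed to an SVM.
import numpy as np
from sklearn.svm import SVC

def frame_features(signal, sr, frame_len=0.05):
    """Toy frame-level features: short-time energy and zero-crossing rate."""
    n = int(frame_len * sr)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n, n)]
    feats = []
    for f in frames:
        energy = np.mean(f ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2
        feats.append([energy, zcr])
    return np.array(feats)

def segment_statistics(feats):
    """Segment-level statistics of the frame feature sequences."""
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# X: one statistics vector per labelled training segment,
# y: placeholder violent (1) / non-violent (0) labels.
X = np.array([segment_statistics(frame_features(np.random.randn(16000), 16000))
              for _ in range(20)])
y = np.array([0, 1] * 10)
clf = SVC(kernel="rbf", probability=True).fit(X, y)
print(clf.predict(X[:3]))
```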


Hellenic Conference on Artificial Intelligence | 2010

Audio-Visual fusion for detecting violent scenes in videos

Theodoros Giannakopoulos; Alexandros Makris; Dimitrios I. Kosmopoulos; Stavros J. Perantonis; Sergios Theodoridis

In this paper we present our research towards the detection of violent scenes in movies, employing fusion methodologies based on learning. Towards this goal, a multi-step approach is followed: initially, automated auditory and visual processing and analysis is performed in order to estimate probabilistic measures regarding particular audio- and visual-related classes. At a second stage, a meta-classification architecture is adopted, which combines the audio and visual information in order to classify mid-term video segments as “violent” or “non-violent”. The proposed scheme has been evaluated on a real dataset from 10 films.
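
A hedged illustration of the meta-classification idea: per-modality class probabilities, assumed to come from separately trained audio and visual models, are concatenated and passed to a second-stage classifier (logistic regression here stands in for the paper's combiner):

```python
# Late-fusion "meta-classifier" sketch over placeholder per-modality probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_segments = 100
p_audio = rng.random((n_segments, 3))    # e.g. probs for gunshots / screams / music
p_visual = rng.random((n_segments, 2))   # e.g. probs for fast activity / blood-like colour
X_meta = np.hstack([p_audio, p_visual])  # fused mid-term segment descriptor
y = rng.integers(0, 2, n_segments)       # placeholder labels: 1 = violent, 0 = non-violent

meta = LogisticRegression().fit(X_meta, y)
print(meta.predict_proba(X_meta[:5])[:, 1])   # violence probability per segment
```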


International Conference on Acoustics, Speech, and Signal Processing | 2009

A dimensional approach to emotion recognition of speech from movies

Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis

In this paper we present a novel method for extracting affective information from movies, based on speech data. The method is based on a 2-D representation of speech emotions (the Emotion Wheel). The goal is twofold. First, to investigate whether the Emotion Wheel offers a good representation for emotions associated with speech signals. To this end, several human annotators manually labeled speech data from movies using the Emotion Wheel, and the level of disagreement was computed as a measure of representation quality. The results indicate that the Emotion Wheel is a good representation of emotions in speech data. Second, a regression approach is adopted in order to predict the location of an unknown speech segment on the Emotion Wheel. Each speech segment is represented by a vector of ten audio features. The results indicate that the resulting architecture can estimate emotional states of speech from movies with sufficient accuracy.
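
A minimal sketch of the regression step, assuming each segment is already summarised by a ten-dimensional feature vector and annotated with two Emotion Wheel coordinates; the feature set and the regressor are placeholders, not the paper's:

```python
# Predict a 2-D emotion-plane position (e.g. valence/arousal) per speech segment.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((200, 10))                 # placeholder 10-d audio feature vectors
Y = rng.uniform(-1, 1, (200, 2))          # placeholder (valence, arousal) annotations

model = MultiOutputRegressor(SVR(kernel="rbf")).fit(X, Y)
print(model.predict(X[:3]))               # estimated position on the emotion wheel
```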


IEEE Transactions on Multimedia | 2008

A Speech/Music Discriminator of Radio Recordings Based on Dynamic Programming and Bayesian Networks

Aggelos Pikrakis; Theodoros Giannakopoulos; Sergios Theodoridis

This paper presents a multistage system for speech/music discrimination which is based on a three-step procedure. The first step is a computationally efficient scheme consisting of a region growing technique and operates on a 1-D feature sequence, which is extracted from the raw audio stream. This scheme is used as a preprocessing stage and yields segments with high music and speech precision at the expense of leaving certain parts of the audio recording unclassified. The unclassified parts of the audio stream are then fed as input to a more computationally demanding scheme. The latter treats speech/music discrimination of radio recordings as a probabilistic segmentation task, where the solution is obtained by means of dynamic programming. The proposed scheme seeks the sequence of segments and respective class labels (i.e., speech/music) that maximize the product of posterior class probabilities, given the data that form the segments. To this end, a Bayesian Network combiner is embedded as a posterior probability estimator. At a final stage, an algorithm that performs boundary correction is applied to remove possible errors at the boundaries of the segments (speech or music) that have been previously generated. The proposed system has been tested on radio recordings from various sources. The overall system accuracy is approximately 96%. Performance results are also reported on a musical genre basis and a comparison with existing methods is given.
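
A simplified sketch of the dynamic-programming idea, reduced to frame-level labelling with a fixed class-switching penalty; the paper's segment-level formulation and Bayesian-network posterior estimator are not reproduced here:

```python
# Choose a label (speech / music) per frame so that the summed log-posterior is
# maximal, with a penalty for switching class (a crude stand-in for the paper's
# probabilistic segmentation by dynamic programming).
import numpy as np

def dp_label_sequence(log_post, switch_penalty=2.0):
    """log_post: (T, 2) log posterior per frame for [speech, music]."""
    T, K = log_post.shape
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = log_post[0]
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1, k]
            switch = score[t - 1, 1 - k] - switch_penalty
            back[t, k] = k if stay >= switch else 1 - k
            score[t, k] = max(stay, switch) + log_post[t, k]
    labels = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        labels.append(back[t, labels[-1]])
    return labels[::-1]

rng = np.random.default_rng(2)
posteriors = rng.random((50, 2))            # placeholder per-frame class posteriors
labels = dp_label_sequence(np.log(posteriors))
print(labels)                               # 0 = speech, 1 = music per frame
```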


Multimedia Signal Processing | 2007

A Multi-Class Audio Classification Method With Respect To Violent Content In Movies Using Bayesian Networks

Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis

In this work, we present a multi-class classification algorithm for audio segments recorded from movies, focusing on the detection of violent content for protecting sensitive social groups (e.g. children). Towards this end, we have used twelve audio features stemming from the nature of the signals under study. In order to classify the audio segments into six classes (three of them violent), Bayesian networks have been used in combination with the one-versus-all classification architecture. The overall system has been trained and tested on a large data set (5000 audio segments) recorded from more than 30 movies of several genres. Experiments showed that the proposed method can be used as an accurate multi-class classification scheme, but also as a binary classifier for the problem of violent vs. non-violent audio content.
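
A hedged sketch of the one-versus-all architecture over twelve-dimensional feature vectors, with Gaussian naive Bayes as a simple stand-in for the paper's Bayesian networks:

```python
# One binary classifier per audio class; a multi-class decision can then be
# collapsed to violent / non-violent via a class-to-group mapping.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X = rng.random((600, 12))                     # placeholder 12-d audio features
y = rng.integers(0, 6, 600)                   # 6 classes, e.g. 3 violent / 3 non-violent

clf = OneVsRestClassifier(GaussianNB()).fit(X, y)
pred = clf.predict(X[:5])
violent_classes = {0, 1, 2}                   # hypothetical mapping to "violent"
print(pred, [int(p) in violent_classes for p in pred])
```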


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Fisher Linear Semi-Discriminant Analysis for Speaker Diarization

Theodoros Giannakopoulos; Sergios Petridis

Given an audio signal with an unknown number of people speaking, speaker diarization aims to automatically answer the question “who spoke when.” Crucial to the success of diarization is the distance metric between speech segments, a factor depending on the choice of the feature space: distances should be low for segments of the same speaker and high for segments of different speakers. Starting from a Mel-frequency cepstrum coefficient (MFCC)-based feature space, an algorithm is proposed that finds a Fisher near-optimal linear discriminant subspace, adapted to the particular speakers present in the audio signal. The proposed approach relies on a semi-supervised version of Fisher linear discriminant analysis (FLD), leveraging information from the sequential structure of the audio signal as a substitute for unknown speaker labels. The resulting algorithm is completely unsupervised, eliminating the need for speaker labels either in the processed recording or in an independent set. Eigenvalue perturbation theory is applied in order to provide optimality bounds with respect to FLD, showing the effectiveness of the approach under the assumption that speakers do not significantly modify the characteristics of their voice. A complete diarization system is then proposed, using fuzzy clustering, a non-parametric K-nearest neighbors classifier and a hidden Markov model. The experimental results show a major improvement in speaker diarization accuracy when using the optimal subspace found by the proposed approach, compared with the initial MFCC feature space or subspaces found by competing approaches.
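
A rough sketch of the core idea, assuming temporal continuity as the substitute for speaker labels: frames within the same short block are treated as one pseudo-class, a Fisher discriminant subspace is fitted on those pseudo-labels, and the MFCCs are projected into it before clustering. This is not the paper's exact algorithm:

```python
# Semi-supervised Fisher discriminant idea with block-index pseudo-labels.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
mfcc = rng.random((3000, 13))                     # placeholder MFCC frames
block = np.arange(len(mfcc)) // 100               # pseudo-label: one label per short block
lda = LinearDiscriminantAnalysis(n_components=5).fit(mfcc, block)
projected = lda.transform(mfcc)                   # speaker-adapted low-dim features
print(projected.shape)                            # clustering / HMM would run on these
```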


Expert Systems With Applications | 2011

Multimodal and ontology-based fusion approaches of audio and visual processing for violence detection in movies

Thanassis Perperis; Theodoros Giannakopoulos; Alexandros Makris; Dimitrios I. Kosmopoulos; Sofia Tsekeridou; Stavros J. Perantonis; Sergios Theodoridis

In this paper we present our research results towards the detection of violent scenes in movies, employing advanced fusion methodologies based on learning, knowledge representation and reasoning. Towards this goal, a multi-step approach is followed: initially, automated audio and visual analysis is performed to extract audio and visual cues. Then, two different fusion approaches are deployed: (i) a multimodal one that provides binary decisions on the existence of violence, employing machine learning techniques, and (ii) an ontological and reasoning one that combines the audio-visual cues with violence and multimedia ontologies. The latter reasons out not only the existence of violence in a video scene, but also the type of violence (fight, screams, gunshots). Both approaches are experimentally tested, validated and compared on the binary decision problem of violence detection. Finally, results for violence type identification are presented for the ontological fusion approach. For evaluation purposes, a large dataset of real movie data has been compiled.
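
A toy stand-in for the ontological fusion step (approach ii): a few hand-written rules map hypothetical audio and visual cue labels to a violence type, merely to illustrate the kind of inference a real ontology plus reasoner performs:

```python
# Rule-based approximation of violence-type inference from detected cues.
# Cue names are hypothetical; the actual system uses formal ontologies.
def violence_type(audio_cues, visual_cues):
    if "gunshot" in audio_cues:
        return "gunshots"
    if "scream" in audio_cues and "high_activity" in visual_cues:
        return "screams"
    if "high_activity" in visual_cues and "person_interaction" in visual_cues:
        return "fight"
    return "non-violent"

print(violence_type({"scream"}, {"high_activity"}))                    # -> screams
print(violence_type(set(), {"high_activity", "person_interaction"}))   # -> fight
```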


IEEE Transactions on Consumer Electronics | 2005

A practical, real-time speech-driven home automation front-end

Theodoros Giannakopoulos; Nicolas-Alexander Tatlas; Todor Ganchev; Ilyas Potamitis

This work presents an integrated system that uses speech as a natural input modality to provide user-friendly access to information and entertainment devices installed in a real home environment. The practical limitations introduced by the on-line nature of the application, as well as the implementation challenges and solutions, are analyzed. The focus of the present study is on the implementation of the front-end signal pre-processing block, which consists of an array of 8 microphones connected to a multi-channel soundcard and a tandem of workstations performing all signal pre-processing tasks, such as acquisition, filtering, and beamforming. An evaluation of the beamformer's performance in a realistic home environment with controllable noise sources is provided. Furthermore, speech and speaker recognition results using the deployed front-end are presented.
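
A hedged sketch of a delay-and-sum beamformer for an eight-microphone array, assuming integer steering delays are already known; the acquisition and filtering stages of the deployed front-end are omitted:

```python
# Delay-and-sum beamforming: align each channel to the steering direction
# and average, which reinforces the target signal relative to diffuse noise.
import numpy as np

def delay_and_sum(channels, delays_samples):
    """channels: list of 1-D arrays (one per mic); delays in samples."""
    n = min(len(c) for c in channels)
    out = np.zeros(n)
    for ch, d in zip(channels, delays_samples):
        out += np.roll(ch[:n], -d)        # advance each channel to align wavefronts
    return out / len(channels)

rng = np.random.default_rng(5)
mics = [rng.standard_normal(16000) for _ in range(8)]   # placeholder 8-channel capture
delays = [0, 1, 2, 3, 3, 2, 1, 0]                       # hypothetical steering delays
enhanced = delay_and_sum(mics, delays)
print(enhanced.shape)
```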


International Conference on Signal Processing and Multimedia Applications | 2014

Clothes change detection using the Kinect sensor

Dimitris Sgouropoulos; Theodoros Giannakopoulos; Sergios Petridis; Stavros J. Perantonis; Antonis Korakis

This paper describes a methodology for detecting when a human has changed clothes. Changing clothes is a basic activity of daily living, which makes the methodology valuable for tracking the functional status of elderly people in the context of a non-contact, unobtrusive monitoring system. Our approach uses Kinect and the OpenNI SDK, along with a workflow of basic image analysis steps. Evaluation has been conducted on a set of real recordings under various illumination conditions, which is publicly available along with the source code of the proposed system at http://users.iit.demokritos.gr/~tyianak/ClothesCode.html.
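
One plausible step in such a workflow, sketched under the assumption that the torso region has already been segmented (the actual system relies on Kinect depth data and the OpenNI SDK): colour histograms before and after are compared, and a large distance suggests a change of clothes:

```python
# Compare colour histograms of two (assumed pre-segmented) torso crops.
import numpy as np

def colour_histogram(rgb_patch, bins=8):
    h, _ = np.histogramdd(rgb_patch.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    h = h.ravel()
    return h / h.sum()

def clothes_changed(patch_before, patch_after, threshold=0.5):
    h1, h2 = colour_histogram(patch_before), colour_histogram(patch_after)
    distance = 0.5 * np.abs(h1 - h2).sum()        # total variation distance in [0, 1]
    return distance > threshold

rng = np.random.default_rng(6)
before = rng.integers(0, 256, (120, 80, 3))       # placeholder torso crops
after = rng.integers(0, 256, (120, 80, 3))
print(clothes_changed(before, after))
```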


International Conference on Pattern Recognition | 2008

A novel efficient approach for audio segmentation

Theodoros Giannakopoulos; Aggelos Pikrakis; Sergios Theodoridis

In this paper, a novel approach to audio segmentation is presented. The problem of detecting audio segments' limits is treated as a binary classification task. Frames are classified as “segment limits” vs. “non-segment limits”. For each audio frame a spectrogram is computed and eight feature values are extracted from respective frequency bands. Final decisions are taken based on a classifier combination scheme. The algorithm has very low complexity with almost real-time performance. It achieves an 86% accuracy rate on real audio streams extracted from movies. Moreover, it introduces a general framework for audio segmentation, which does not depend explicitly on the number of audio classes.
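
A hedged sketch of this framing, assuming spectrogram band energies as the eight per-frame features and a single classifier in place of the paper's classifier combination scheme:

```python
# Per-frame spectrogram band energies feed a binary "segment limit" classifier.
import numpy as np
from scipy.signal import spectrogram
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
audio = rng.standard_normal(16000 * 5)                 # placeholder 5 s stream at 16 kHz
freqs, times, S = spectrogram(audio, fs=16000, nperseg=512)

n_bands = 8
band_edges = np.linspace(0, len(freqs), n_bands + 1, dtype=int)
X = np.stack([S[band_edges[i]:band_edges[i + 1]].sum(axis=0)
              for i in range(n_bands)], axis=1)        # (n_frames, 8) band energies
y = rng.integers(0, 2, len(X))                         # placeholder limit / non-limit labels

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:10]))                             # 1 = candidate segment boundary
```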

Collaboration


Dive into Theodoros Giannakopoulos's collaborations.

Top Co-Authors

Sergios Theodoridis
National and Kapodistrian University of Athens

Michalis Papakostas
University of Texas at Arlington

Evaggelos Spyrou
National Technical University of Athens

Fillia Makedon
University of Texas at Arlington

Harry Dimitropoulos
National and Kapodistrian University of Athens

Ioanna Koromila
National Technical University of Athens

Natalia Manola
National and Kapodistrian University of Athens