Rok Gajšek | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Rok Gajšek is active.

Explore More

Publication

Featured researches published by Rok Gajšek.

international conference on biometrics theory applications and systems | 2009

Principal Gabor filters for face recognition

Vitomir Struc; Rok Gajšek; Nikola Pavesic

Gabor filters have proven themselves to be a powerful tool for facial feature extraction. An abundance of recognition techniques presented in the literature exploits these filters to achieve robust face recognition. However, while exhibiting desirable properties, such as orientational selectivity or spatial locality, Gabor filters have also some shortcomings which crucially affect the characteristics and size of the Gabor representation of a given face pattern. Amongst these shortcomings the fact that the filters are not orthogonal one to another and are, hence, correlated is probably the most important. This makes the information contained in the Gabor face representation redundant and also affects the size of the representation. To overcome this problem we propose in this paper to employ orthonormal linear combinations of the original Gabor filters rather than the filters themselves for deriving the Gabor face representation. The filters, named principal Gabor filters for the fact that they are computed by means of principal component analysis, are assessed in face recognition experiments performed on the XM2VTS and YaleB databases, where encouraging results are achieved.

international conference on pattern recognition | 2010

Multi-modal Emotion Recognition Using Canonical Correlations and Acoustic Features

Rok Gajšek; Vitomir truc

The information of the psycho-physical state of the subject is becoming a valuable addition to the modern audio or video recognition systems. As well as enabling a better user experience, it can also assist in superior recognition accuracy of the base system. In the article, we present our approach to multi-modal (audio-video) emotion recognition system. For audio sub-system, a feature set comprised of prosodic, spectral and cepstrum features is selected and support vector classifier is used to produce the scores for each emotional category. For video sub-system a novel approach is presented, which does not rely on the tracking of specific facial landmarks and thus, eliminates the problems usually caused, if the tracking algorithm fails at detecting the correct area. The system is evaluated on the interface database and the recognition accuracy of our audio-video fusion is compared to the published results in the literature.

text, speech and dialogue | 2009

Analysis and Assessment of AvID: Multi-Modal Emotional Database

Rok Gajšek; Vitomir Struc; Bostjan Vesnicer; Anja Podlesek; Luka Komidar

The paper deals with the recording and the evaluation of a multi modal (audio/video) database of spontaneous emotions. Firstly, motivation for this work is given and different recording strategies used are described. Special attention is given to the process of evaluating the emotional database. Different kappa statistics normally used in measuring the agreement between annotators are discussed. Following the problems of standard kappa coefficients, when used in emotional database assessment, a new time-weighted free-marginal kappa is presented. It differs from the other kappa statistics in that it weights each utterances particular score of agreement based on the duration of the utterance. The new method is evaluated and the superiority over the standard kappa, when dealing with a database of spontaneous emotions, is demonstrated.

BioID_MultiComm'09 Proceedings of the 2009 joint COST 2101 and 2102 international conference on Biometric ID management and multimodal communication | 2009

Combining audio and video for detection of spontaneous emotions

Rok Gajšek; Vitomir Struc; Simon Dobrisek; Janez Žibert; Nikola Pavesic

The paper presents our initial attempts in building an audio video emotion recognition system. Both, audio and video sub-systems are discussed, and description of the database of spontaneous emotions is given. The task of labelling the recordings from the database according to different emotions is discussed and the measured agreement between multiple annotators is presented. Instead of focusing on the prosody in audio emotion recognition, we evaluate the possibility of using linear transformations (CMLLR) as features. The classification results from audio and video sub-systems are combined using sum rule fusion and the increase in recognition results, when using both modalities, is presented.

text speech and dialogue | 2012

Analysis and Assessment of State Relevance in HMM-Based Feature Extraction Method

Rok Gajšek; Simon Dobrisek

In the article we evaluate the importance of different HMM states in an HMM-based feature extraction method used to model paralinguistic information. Specifically, we evaluate the distribution of the paralinguistic information across different states of the HMM in two different classification tasks: emotion recognition and alcoholization detection. In the task of recognizing emotions we found that the majority of emotion-related information is incorporated in the first and third state of a 3-state HMM. Surprisingly, in the alcoholization detection task we observed a somewhat equal distribution of task-specific information across all three states, resulting in constantly producing better results if more states are utilized.

text speech and dialogue | 2008

Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources

Rok Gajšek; Janez Žibert

In the article we evaluate different techniques of acoustic modeling for speech recognition in the case of limited audio resources. The objective was to build different sets of acoustic models, the first was trained on a small set of telephone speech recordings and the other was trained on a bigger database with broadband speech recordings and later adapted to a different audio environment. Different adaptation methods (MLLR, MAP) were examined in combination with different parameterization features (MFCC, PLP, RPLP). We show that using adaptation methods, which are mainly used for speaker adaptation purposes, can increase the robustness of speech recognition in cases of mismatched training and working acoustic environment conditions.

International Journal of Advanced Robotic Systems | 2013