Dejan Arsic
Technische Universität München
Publication
Featured research published by Dejan Arsic.
international conference on acoustics, speech, and signal processing | 2007
Björn W. Schuller; Dejan Arsic; Gerhard Rigoll; Matthias Wimmer; Bernd Radig
Behavior modeling has recently attracted great interest, especially for public surveillance tasks. The benefits of using several input cues, such as audio and video, are generally agreed upon. Yet synchronization and fusion of these information sources remain the main challenge. We therefore show results for a feature space combination, which allows for overall feature space optimization. Audio and video features are first derived as low-level descriptors. Synchronization and feature combination are achieved by multivariate time-series analysis. Test runs on a database of aggressive, cheerful, intoxicated, nervous, neutral, and tired behavior in an airplane situation show a significant improvement over each single modality.
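As a rough illustration of what feature-level fusion of two asynchronous streams can look like, the sketch below resamples an audio feature stream onto the video frame times by linear interpolation and concatenates the two vectors. It is not the multivariate time-series method of the paper; the function names, frame rates, and toy values are all assumptions made for the example.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate a feature vector at time t, clamping at both ends."""
    if t <= times[0]:
        return list(values[0])
    if t >= times[-1]:
        return list(values[-1])
    i = bisect_left(times, t)          # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)           # interpolation weight in [0, 1]
    return [(1 - w) * a + w * b for a, b in zip(values[i - 1], values[i])]


def fuse(audio_t, audio_x, video_t, video_x):
    """Resample audio features onto the video time base and concatenate them."""
    return [interpolate(audio_t, audio_x, t) + list(v)
            for t, v in zip(video_t, video_x)]


# Hypothetical toy data: audio frames every 10 ms, video frames every 40 ms.
audio_t = [0.00, 0.01, 0.02, 0.03, 0.04]
audio_x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
video_t = [0.00, 0.04]
video_x = [[10.0, 20.0], [11.0, 21.0]]

fused = fuse(audio_t, audio_x, video_t, video_x)
```

Each row of `fused` holds the audio features followed by the video features for one video frame, so a single classifier or feature selector can operate on the combined space.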
international conference on distributed smart cameras | 2008
Dejan Arsic; Encho Hristov; Nicolas H. Lehment; Benedikt Hörnler; Björn W. Schuller; Gerhard Rigoll
CCTV systems have been introduced in most public spaces in order to increase security. Video outputs are observed by human operators where possible, but are mostly used as a forensic tool. It therefore seems desirable to automate video surveillance systems in order to detect potentially dangerous situations as soon as possible. Multi-camera systems seem to be a prerequisite for large spaces where occlusions frequently occur. In this paper we present a system which robustly detects and tracks objects in a multi-camera environment and performs a subsequent behavioral analysis based on luggage-related events.
international conference on multimedia and expo | 2008
Björn W. Schuller; Bogdan Vlasenko; Dejan Arsic; Gerhard Rigoll; Andreas Wendemuth
Recognition of emotion in speech usually uses acoustic models that ignore the spoken content. Likewise, one general model per emotion is trained independently of the phonetic structure. Given sufficient data, this approach seemingly works well enough. Yet this paper tries to answer the question of whether acoustic emotion recognition strongly depends on phonetic content, and whether models tailored to the spoken unit can lead to higher accuracies. We therefore investigate phoneme and word models using a large prosodic, spectral, and voice quality feature space and Support Vector Machines (SVM). Experiments also take the necessity of ASR into account to select appropriate unit models. Test runs on the well-known EMO-DB database under speaker-independent conditions demonstrate the superiority of word emotion models over today's common general models, provided there are sufficient occurrences in the training corpus.
2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance | 2009
Dejan Arsic; Atanas Lyutskanov; Gerhard Rigoll; Bogdan Kwolek
Reliable tracking of objects is an indispensable prerequisite for automated video surveillance systems. As most object detection methods based on machine learning require adequate data for the application scenario, foreground segmentation is a popular method to find possible regions of interest. Segmentation methods usually require a specific learning phase and adaptation over time. In this work we present a novel approach based on graph cuts, which outperforms most standard algorithms. It is commonly agreed that occlusions can only be resolved in multi-camera environments. Applying multi-layer homography enables us to robustly detect and track objects using only foreground data, resulting in a high tracking performance.
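The basic building block behind homography-based multi-camera localization is mapping an image point into a common reference plane. The sketch below shows only that single step in plain Python; the matrices and the helper name `apply_homography` are hypothetical, and the multi-layer scheme and graph-cut segmentation of the paper are not reproduced here.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to get plane coordinates.
    return u / w, v / w


# Hypothetical camera-to-ground-plane homographies (not calibrated values).
H_affine = [[2.0, 0.0, 1.0],
            [0.0, 2.0, 3.0],
            [0.0, 0.0, 1.0]]
H_scaled = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 2.0]]

ground_a = apply_homography(H_affine, 4.0, 5.0)
ground_b = apply_homography(H_scaled, 4.0, 5.0)
```

In a multi-camera setup, foreground pixels from each view would be projected this way into the shared plane, and locations supported by several views become object candidates.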
international conference on multimedia and expo | 2005
Björn W. Schuller; Bernardo José Brüning Schmitt; Dejan Arsic; Stephan Reiter; Manfred K. Lang; Gerhard Rigoll
In this work we strive to find an optimal set of acoustic features for the discrimination of speech, monophonic singing, and polyphonic music in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches within this task. From a basis of 276 attributes we select the most efficient set by SVM-SFFS. Additionally, the relevance of single features is presented by calculating the information gain ratio. As a basis of comparison we reduce dimensionality by PCA. We show an extensive analysis of different classifiers within the named task. Among these are kernel machines, decision trees, and Bayesian classifiers. Moreover, we improve single classifier performance by bagging and boosting, and finally combine the strengths of classifiers by StackingC. The database is formed by 2,114 samples of speech and singing from 58 persons. 1,000 music clips have been taken from the MTV-Europe-Top-20 1980-2000. The outstanding discrimination results of a working real-time-capable implementation stress the practicability of the proposed novel ideas.
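The information gain ratio mentioned above can be written down compactly for a single feature. The sketch below assumes the feature has already been discretized into a few categories (acoustic features are continuous, so some binning step is implied); the function names and toy labels are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting on the feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one binned feature that separates the classes, one that does not.
labels = ["speech", "speech", "music", "music"]
informative = ["lo", "lo", "hi", "hi"]
uninformative = ["lo", "hi", "lo", "hi"]

ratio_good = gain_ratio(informative, labels)
ratio_bad = gain_ratio(uninformative, labels)
```

Ranking all attributes by this score gives a per-feature relevance list, which is a filter-style complement to a wrapper search such as SVM-SFFS.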
international conference on digital signal processing | 2009
Stavros Ntalampiras; Dejan Arsic; Andre Stormer; Todor Ganchev; Ilyas Potamitis; Nikos Fakotakis
The present paper describes the construction of a multimodal database, referred to as the PROMETHEUS database, which contains recordings from heterogeneous sensors. The main purpose of this database is the development of a framework for monitoring and interpretation of human behavior in unrestricted indoor and outdoor environments. It contains single-person and multi-person scenarios, and also covers scenarios with interactions between groups of people. It is devoted to the detection of typical and atypical events, and care has been taken for the recordings to be as close to real-world conditions as possible. The uniqueness of the PROMETHEUS database comes not only from its sensor sets but primarily from its generic design, which allows for embracing a wide range of real-world applications (including smart-home and human-robot interaction interfaces, surveillance of indoor and outdoor public areas, etc.).
international conference on multimedia and expo | 2005
Dejan Arsic; Frank Wallhoff; Björn W. Schuller; Gerhard Rigoll
In the present treatise, we propose an approach for a highly configurable, image-based online person behaviour monitoring system. The particular application scenario is a crew-supporting multi-stream on-board threat detection system, which is becoming more desirable for use in public transport. For such frameworks to work robustly in mostly unconstrained environments, many subsystems have to be employed. Although the research field of pattern recognition has brought up reliable approaches for several of the involved sub-tasks in the last decade, there often exists a gap between reliability and the needed computational effort. In order to accomplish this highly demanding task, several straightforward technologies are combined: here, the outputs of several so-called weak classifiers using low-level features are fused by a sophisticated Bayesian network.
international conference on multimedia and expo | 2006
Björn W. Schuller; Frank Wallhoff; Dejan Arsic; Gerhard Rigoll
Automatic discrimination of musical signal types such as speech, singing, music, genres, or drumbeats within audio streams is of great importance, e.g. for radio broadcast stream segmentation. Yet the choice of feature sets is still widely debated. We therefore suggest a large open feature set approach starting with the systematic generation of 7k high-level features based on MPEG-7 low-level descriptors and further feature contours. A subsequent fast gain ratio reduction followed by a wrapper-based floating search leads to a strong basis of relevant features. Next, features are added by alteration and combination within a genetic search. For classification we use support vector machines, which have proven reliable for this task. Test runs are carried out on two task-specific databases and the public Columbia SMD database and show significant improvements for each step of the suggested novel concept.
computer vision and pattern recognition | 2010
Nicolas H. Lehment; Dejan Arsic; Moritz Kaiser; Gerhard Rigoll
Current experiments with HCIs have shown a high demand for more natural interaction paradigms. Gestures are considered the most important cue besides speech. In order to recognize gestures it is necessary to extract meaningful motion features from the body. Up to now, mostly marker-based tracking systems have been used in virtual reality environments, since these were traditionally more reliable than purely image-based detection methods. However, markers tend to be distracting and cumbersome. Following recent advances in processing power, it has become possible to use a camera system to obtain a depth image of the test subject, match it to a pre-defined body model, and thus track the body parts over time. We present a system based on APF which enables full-body tracking utilizing point clouds recorded with a 3D sensor. Further refinement is provided by a specially adapted inverse kinematics system. A GPU-based implementation speeds up processing significantly and allows near-real-time performance.
international conference on image processing | 2005
Dejan Arsic; Björn W. Schuller; Gerhard Rigoll
In the present treatise, we propose an approach for a highly configurable, image-based online person behaviour monitoring system. The particular application scenario is a crew-supporting multi-stream on-board threat detection system, which is becoming more desirable for use in public transport. For such frameworks to work robustly in mostly unconstrained environments, many subsystems have to be employed. Although the research field of pattern recognition has brought up reliable approaches for several of the involved subtasks in the last decade, there often exists a gap between reliability and the needed computational effort. In order to accomplish this highly demanding task, several straightforward technologies are combined: here, the outputs of several so-called weak classifiers using low-level features are fused by a sophisticated Bayesian network.