Esra Acar
Technical University of Berlin
Publications
Featured research published by Esra Acar.
conference on multimedia modeling | 2014
Esra Acar; Frank Hopfgartner; Sahin Albayrak
Given the ever-growing amount of available multimedia data, automatically annotating multimedia content with the feelings it is expected to elicit in users is a challenging problem. The emerging research field of video affective analysis addresses this problem by exploiting human emotions. In this field, where no dominant feature representation has emerged yet, choosing discriminative features for the effective representation of video segments is a key issue in designing video affective content analysis algorithms. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted low-level features. We exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the RGB space to build higher-level audio and visual representations. We use the learned representations for the affective classification of music video clips. We choose multi-class support vector machines (SVMs) for classifying video clips into four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results on a subset of the DEAP dataset (76 music video clips) show that a significant improvement is obtained for video affective content analysis when higher-level representations are used instead of low-level features.
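As a rough illustration of the pipeline described in this abstract, the following sketch learns mid-level audio and visual representations with small CNNs and classifies the fused representations with a multi-class SVM. The network layout, feature dimensions, and random placeholder data are assumptions for illustration, not the configuration used in the paper.

```python
# Minimal sketch: mid-level representation learning (CNNs) + multi-class SVM.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class MidLevelCNN(nn.Module):
    """Small CNN mapping a low-level feature map (stacked MFCC frames or an
    RGB frame) to a mid-level representation vector."""
    def __init__(self, in_channels, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.proj = nn.Linear(32 * 4 * 4, embed_dim)

    def forward(self, x):
        return self.proj(self.features(x).flatten(1))

# Hypothetical inputs: 76 clips, MFCC "images" (1 x 64 x 100) and RGB frames (3 x 64 x 64).
mfcc_maps = torch.randn(76, 1, 64, 100)
rgb_frames = torch.randn(76, 3, 64, 64)
labels = np.random.randint(0, 4, size=76)        # four VA quadrants

audio_net, visual_net = MidLevelCNN(1), MidLevelCNN(3)
with torch.no_grad():                            # in practice the CNNs are trained first
    audio_repr = audio_net(mfcc_maps).numpy()
    visual_repr = visual_net(rgb_frames).numpy()

fused = np.hstack([audio_repr, visual_repr])     # feature-level fusion of mid-level cues
clf = SVC(kernel="linear", decision_function_shape="ovr").fit(fused, labels)
print(clf.predict(fused[:5]))
```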
multimedia signal processing | 2012
Esra Acar; Tobias Senst; Alexander Kuhn; Ivo Keller; Holger Theisel; Sahin Albayrak; Thomas Sikora
Human action recognition requires the description of complex motion patterns in image sequences. In general, these patterns span varying temporal scales. In this context, Lagrangian methods have proven to be valuable for crowd analysis tasks such as crowd segmentation. In this paper, we show that, besides their potential in describing large scale motion patterns, Lagrangian methods are also well suited to model complex individual human activities over variable time intervals. We use Finite Time Lyapunov Exponents and time-normalized arc length measures in a linear SVM classification scheme. We evaluated our method on the Weizmann and KTH datasets. The results demonstrate that our approach is promising and that human action recognition performance is improved by fusing Lagrangian measures.
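A hedged sketch of the Lagrangian measures mentioned above: a Finite Time Lyapunov Exponent field computed from a flow map, a time-normalized arc length map, and histogram descriptors fed into a linear SVM. The trajectory data here is a random placeholder; in practice it would come from integrating optical flow over the video, and the implementation details differ from the paper's.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ftle_field(start, end, tau, dx=1.0):
    """FTLE of the flow map phi: start grid -> end positions, both (H, W, 2)."""
    d_end_dy, d_end_dx = np.gradient(end, dx, axis=(0, 1))
    H, W = end.shape[:2]
    ftle = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            J = np.stack([d_end_dx[i, j], d_end_dy[i, j]])   # 2x2 flow-map Jacobian
            # Largest eigenvalue of the Cauchy-Green deformation tensor.
            lam_max = np.linalg.eigvalsh(J.T @ J)[-1]
            ftle[i, j] = np.log(np.sqrt(max(lam_max, 1e-12))) / tau
    return ftle

def arc_length(traj, tau):
    """Time-normalized arc length of trajectories of shape (T, H, W, 2)."""
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=-1)
    return steps.sum(axis=0) / tau

# Placeholder data: trajectories integrated over tau=15 frames on a 32x32 grid.
tau, rng = 15, np.random.default_rng(0)
trajs = [np.cumsum(rng.normal(size=(tau, 32, 32, 2)), axis=0) for _ in range(40)]
labels = rng.integers(0, 6, size=40)              # e.g. six action classes

feats = []
for t in trajs:
    f = ftle_field(t[0], t[-1], tau)
    a = arc_length(t, tau)
    # Histogram both measures into a fixed-length descriptor per video.
    feats.append(np.concatenate([np.histogram(f, bins=16)[0],
                                 np.histogram(a, bins=16)[0]]))
clf = LinearSVC().fit(np.array(feats), labels)
```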
acm multimedia | 2013
Esra Acar; Frank Hopfgartner; Sahin Albayrak
Detecting violent scenes in movies is an important video content understanding functionality, e.g., for providing automated youth protection services. One key issue in designing algorithms for violence detection is the choice of discriminative features. In this paper, we employ mid-level audio features and compare their discriminative power against low-level audio and visual features. We fuse these mid-level audio cues with low-level visual ones at the decision level in order to further improve the performance of violence detection. We use Mel-Frequency Cepstral Coefficients (MFCC) as audio features and average motion as visual features. In order to learn a violence model, we choose two-class support vector machines (SVMs). Our experimental results on detecting violent video shots in Hollywood movies show that mid-level audio features are more discriminative and provide more precise results than low-level ones. The detection performance is further enhanced by fusing the mid-level audio cues with low-level visual ones using SVM-based decision fusion.
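The decision-level fusion step can be pictured with the following sketch: one SVM per modality produces a violence score, and a third SVM learns the fusion rule from the stacked scores. The feature extraction is replaced by random placeholders, so the arrays and dimensions are purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_shots = 200
audio_feat = rng.normal(size=(n_shots, 64))     # stands in for MFCC-based mid-level cues
visual_feat = rng.normal(size=(n_shots, 8))     # stands in for average-motion features
violent = rng.integers(0, 2, size=n_shots)      # shot-level (non-)violence labels

Xa_tr, Xa_te, Xv_tr, Xv_te, y_tr, y_te = train_test_split(
    audio_feat, visual_feat, violent, test_size=0.3, random_state=0)

audio_svm = SVC(probability=True).fit(Xa_tr, y_tr)
visual_svm = SVC(probability=True).fit(Xv_tr, y_tr)

def scores(Xa, Xv):
    """Stack per-modality violence probabilities for decision-level fusion."""
    return np.column_stack([audio_svm.predict_proba(Xa)[:, 1],
                            visual_svm.predict_proba(Xv)[:, 1]])

fusion_svm = SVC().fit(scores(Xa_tr, Xv_tr), y_tr)
print("fused accuracy:", fusion_svm.score(scores(Xa_te, Xv_te), y_te))
```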
content based multimedia indexing | 2008
Esra Acar; Serdar Arslan; Adnan Yazici; Murat Koyuncu
Content-based retrieval of multimedia data is still an active research area, and efficient retrieval of natural images has proven to be a difficult task for content-based image retrieval systems. In this paper, we present a system that adapts two different index structures, namely Slim-Tree and BitMatrix, for efficient retrieval of images based on multidimensional low-level features such as color, texture and shape. Both index structures operate in metric space. We use MPEG-7 descriptors extracted from images to represent these features and store them in a native XML database. The low-level features color layout (CL), dominant color (DC), edge histogram (EH) and region shape (RS) are used in Slim-Tree and BitMatrix and aggregated by the ordered weighted averaging (OWA) method to obtain the final similarity between any two objects. The experiments reported in the paper cover index construction and update, query response time, and retrieval effectiveness measured with the ANMRR performance metric and precision/recall scores. The experimental results strengthen the case for using BitMatrix together with the ordered weighted averaging method in content-based image retrieval systems.
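The OWA aggregation step can be illustrated with a few lines of Python; the descriptor similarities and the weight vector below are made-up values, since the paper's actual weights are not reproduced here.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted average: the weights apply to the sorted values,
    not to particular descriptors."""
    values = np.sort(values)[::-1]          # largest similarity first
    return float(np.dot(values, weights))

# Per-descriptor similarities between a query and one database image (illustrative).
sims = {"CL": 0.82, "DC": 0.64, "EH": 0.71, "RS": 0.40}
# A weight vector that emphasizes the best-matching descriptors (an assumption).
weights = np.array([0.4, 0.3, 0.2, 0.1])

score = owa(np.array(list(sims.values())), weights)
print(f"aggregated similarity: {score:.3f}")
```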
data and knowledge engineering | 2013
Serdar Arslan; Adnan Yazici; Ahmet Sacan; Ismail Hakki Toroslu; Esra Acar
In information retrieval, efficient similarity search in multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem: cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy trade-offs for each of these methods are demonstrated on three different image data sets. Similarity of images is obtained either by a feature-based similarity measure using four MPEG-7 low-level descriptors or by a whole-image-based similarity measure. The effect of these similarity measurement techniques on the retrieval process is also evaluated through performance tests on several data sets. We show that using low-level features of images in the similarity measurement function results in significantly better accuracy and time performance compared to the whole-image-based approach. Moreover, optimizing feature contributions to the distance measure for the feature-based approach can identify the most relevant features and is necessary to obtain maximum accuracy. We further show that multidimensional scaling can achieve comparable accuracy while speeding up query times significantly by allowing the use of spatial access methods.
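The multidimensional scaling idea can be sketched as follows: embed the images into a low-dimensional Euclidean space that approximately preserves the original distances, then answer queries with a spatial access method such as a KD-tree. The descriptor data below is random and the embedding dimensionality is an assumption.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(300, 80))        # stands in for MPEG-7 feature vectors

# Original (expensive) pairwise distances between all images.
D = pairwise_distances(descriptors)

# Embed into a low-dimensional space that approximately preserves D.
embedding = MDS(n_components=8, dissimilarity="precomputed",
                random_state=0).fit_transform(D)

# Spatial index on the embedded points: queries become cheap k-NN lookups.
tree = KDTree(embedding)
dist, idx = tree.query(embedding[:1], k=5)
print("nearest neighbours of image 0:", idx[0])
```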
content based multimedia indexing | 2013
Esra Acar; Frank Hopfgartner; Sahin Albayrak
Detecting violent content in movies, e.g., for providing automated youth protection services, is a valuable video content analysis functionality. Choosing discriminative features for the representation of video segments is a key issue in designing violence detection algorithms. In this paper, we employ mid-level audio features based on a Bag-of-Audio-Words (BoAW) method using Mel-Frequency Cepstral Coefficients (MFCCs). BoAW representations are constructed with two different methods, namely a vector quantization-based (VQ-based) method and a sparse coding-based (SC-based) method. We choose two-class support vector machines (SVMs) for classifying video shots as (non-)violent. Our experiments on detecting violent video shots in Hollywood movies show that the mid-level audio features provide promising results. Additionally, we establish that the SC-based method outperforms the VQ-based one. More importantly, in terms of average precision the SC-based method outperforms all unimodal submissions in the MediaEval Violent Scenes Detection (VSD) task except one vision-based method.
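A sketch of the two BoAW encodings compared in the paper, over placeholder MFCC frames: vector quantization as a histogram of hard k-means assignments, and sparse coding over the same dictionary with max pooling. The vocabulary size, pooling choice, and sparsity level are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import SparseCoder
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_shots, frames_per_shot, n_mfcc, vocab = 60, 120, 13, 32
shots = [rng.normal(size=(frames_per_shot, n_mfcc)) for _ in range(n_shots)]
violent = rng.integers(0, 2, size=n_shots)

# Learn the audio-word dictionary on all MFCC frames.
all_frames = np.vstack(shots)
kmeans = KMeans(n_clusters=vocab, n_init=10, random_state=0).fit(all_frames)

def boaw_vq(frames):
    """VQ-based BoAW: normalized histogram of hard codeword assignments."""
    words = kmeans.predict(frames)
    return np.bincount(words, minlength=vocab) / len(frames)

# Sparse coding over the (unit-normalized) k-means dictionary.
centers = kmeans.cluster_centers_
dictionary = centers / np.linalg.norm(centers, axis=1, keepdims=True)
coder = SparseCoder(dictionary=dictionary, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)

def boaw_sc(frames):
    """SC-based BoAW: max-pooled absolute sparse codes over the shot."""
    codes = coder.transform(frames)
    return np.abs(codes).max(axis=0)

X_vq = np.array([boaw_vq(s) for s in shots])
X_sc = np.array([boaw_sc(s) for s in shots])
print("VQ acc:", SVC().fit(X_vq, violent).score(X_vq, violent))
print("SC acc:", SVC().fit(X_sc, violent).score(X_sc, violent))
```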
Multimedia Tools and Applications | 2017
Esra Acar; Frank Hopfgartner; Sahin Albayrak
In today’s society, where audio-visual content such as professionally edited and user-generated videos is ubiquitous, automatic analysis of this content is a decisive functionality. Within this context, there is extensive ongoing research on understanding the semantics (i.e., facts) such as objects or events in videos. However, little research has been devoted to understanding the emotional content of videos. In this paper, we address this issue and introduce a system that performs emotional content analysis of professionally edited and user-generated videos. We concentrate on both the representation and modeling aspects. Videos are represented using mid-level audio-visual features. More specifically, audio and static visual representations are automatically learned from raw data using convolutional neural networks (CNNs). In addition, dense trajectory based motion and SentiBank domain-specific features are incorporated. By means of ensemble learning and fusion mechanisms, videos are classified into one of the predefined emotion categories. Results obtained on the VideoEmotion dataset and a subset of the DEAP dataset show that (1) higher level representations perform better than low-level features, (2) among audio features, mid-level learned representations perform better than mid-level handcrafted ones, (3) incorporating motion and domain-specific information leads to a notable performance gain, and (4) ensemble learning is superior to multi-class support vector machines (SVMs) for video affective content analysis.
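The late-fusion ensemble idea can be sketched as one classifier per representation whose class probabilities are averaged. The modalities, feature dimensionalities, and the choice of random forests as ensemble members are illustrative assumptions; the paper's actual ensemble and fusion mechanisms may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_videos, n_classes = 120, 8                   # e.g. eight emotion categories
modalities = {
    "audio_cnn":  rng.normal(size=(n_videos, 128)),
    "visual_cnn": rng.normal(size=(n_videos, 128)),
    "trajectory": rng.normal(size=(n_videos, 256)),
    "sentibank":  rng.normal(size=(n_videos, 1200)),
}
labels = rng.integers(0, n_classes, size=n_videos)

# Train one ensemble member per modality.
members = {name: RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
           for name, X in modalities.items()}

# Decision-level fusion: average the per-modality class probabilities.
proba = np.mean([members[name].predict_proba(modalities[name])
                 for name in modalities], axis=0)
predictions = proba.argmax(axis=1)
print("fused training accuracy:", (predictions == labels).mean())
```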
international conference on intelligent transportation systems | 2014
Jan Keiser; Nils Masuch; Marco Lützenberger; Dennis Grunewald; Maximilian Kern; Esra Acar; Çiğdem Avci Salma; Xuan-Thuy Dang; Christian Kuster; Sahin Albayrak
Traffic systems of major cities have reached a high level of sophistication, yet we consider their heterogeneity a problem. New mobility concepts keep emerging, such as car sharing and ride sharing. It is our opinion that a holistic consideration of all available concepts may provide significant benefits. In this paper, we present such a consideration in the form of an openly accessible, agent-based mobility platform, namely the IMA system.
conference on multimedia modeling | 2014
David Scott; Zhenxing Zhang; Rami Albatal; Kevin McGuinness; Esra Acar; Frank Hopfgartner; Cathal Gurrin; Noel E. O'Connor; Alan F. Smeaton
This paper presents our third participation in the Video Browser Showdown. Building on the experience that we gained while participating in this event, we compete in the 2014 showdown with a more advanced browsing system based on incorporating several audio-visual retrieval techniques. This paper provides a short overview of the features and functionality of our new system.
acm multimedia | 2013
Esra Acar
Given the ever-growing amount of available multimedia data, finding multimedia content that matches the current mood of users is a challenging problem. Choosing discriminative features for the representation of video segments is a key issue in designing video affective content analysis algorithms, where no dominant feature representation has emerged yet. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), in order to learn mid-level representations from automatically extracted raw features. We exploit only the audio modality in the current framework and employ Mel-Frequency Cepstral Coefficient (MFCC) features in order to build higher-level audio representations. We use the learned representations for the affective classification of music video clips. We choose multi-class support vector machines (SVMs) for classifying video clips into affective categories. Preliminary results on a subset of the DEAP dataset show that a significant improvement is obtained when we learn higher-level representations instead of using low-level features directly for video affective content analysis. We plan to further extend this work and include the visual modality as well: we will generate mid-level visual representations using CNNs and fuse these visual representations with mid-level audio representations at both the feature and decision level for video affective content analysis.
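As a complement to the abstract above, the following snippet shows one plausible way (an assumption, not the paper's exact recipe) to turn a clip's audio track into fixed-size MFCC patches that a CNN could consume when learning mid-level audio representations; it assumes librosa is available.

```python
import librosa
import numpy as np

def mfcc_patches(audio_path, n_mfcc=13, frames_per_patch=100):
    """Load audio, compute MFCCs, and cut them into equal-length patches."""
    y, sr = librosa.load(audio_path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # (n_mfcc, n_frames)
    n_patches = mfcc.shape[1] // frames_per_patch
    patches = [mfcc[:, i * frames_per_patch:(i + 1) * frames_per_patch]
               for i in range(n_patches)]
    return np.stack(patches) if patches else np.empty((0, n_mfcc, frames_per_patch))

# Example usage (hypothetical file name):
# patches = mfcc_patches("clip_001.wav")   # each patch: one CNN input "image"
```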