Cigdem Eroglu Erdem
Bahçeşehir University
Publications
Featured research published by Cigdem Eroglu Erdem.
IEEE Transactions on Circuits and Systems for Video Technology | 2003
Cigdem Eroglu Erdem; A.M. Tekalp; B. Sankur
Presents a scalable object tracking framework capable of tracking the contour of nonrigid objects in the presence of occlusion. The framework consists of open-loop boundary prediction and closed-loop boundary correction parts. The open-loop prediction block adaptively divides the object contour into subcontours and estimates the mapping parameters for each subcontour. The closed-loop boundary correction block employs a suitably weighted combination of low-level features such as color edges, color segmentation, motion models, and motion segmentation for each subcontour. Performance evaluation measures are used in a feedback loop to evaluate the goodness of the segmentation/tracking and to adjust the weights assigned to each of these low-level features for each subcontour at each frame. The framework is scalable because it can be adapted to track a coarse estimate of the boundary of selected objects in real time, as well as to track object boundaries with pixel accuracy in off-line mode. The proposed method does not depend on any single motion or shape model and does not need training. Experimental results demonstrate that the algorithm is able to track object boundaries under significant occlusion and background clutter.
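The feature-weight feedback is the core of the correction loop. Below is a minimal sketch of one plausible update rule, assuming per-feature goodness scores from the performance evaluation measures are available; the function name, learning rate, and multiplicative form are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def update_feature_weights(weights, goodness, lr=0.2):
    """Illustrative feedback step (not the paper's exact rule): raise the
    weight of low-level features that scored well on the performance
    measures for a sub-contour, lower the others, then renormalize.
    weights, goodness: 1-D arrays over features (edge, segmentation, motion, ...)."""
    weights = weights * (1.0 + lr * (goodness - goodness.mean()))
    weights = np.clip(weights, 1e-3, None)  # keep every feature weakly active
    return weights / weights.sum()
```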
International Conference on Acoustics, Speech, and Signal Processing | 2011
Cigdem Eroglu Erdem; Sezer Ulukaya; Ali Karaali; A. T. Erdem
This paper presents a hybrid method for face detection in color images. The well-known Haar-feature-based face detector developed by Viola and Jones (VJ), which was designed for gray-scale images, is combined with a skin-color filter that provides complementary information in color images. The image is first passed through the Haar-feature-based face detector, which is adjusted to operate at a point on its ROC curve with a low number of missed faces but a high number of false detections. Then, using the proposed skin-color post-filtering method, many of these false detections can easily be eliminated. We also use a color compensation algorithm to reduce the effects of lighting. Our experimental results on the Bao color face database show that the proposed method is superior to the original VJ algorithm and to other skin-color-based pre-filtering methods in the literature in terms of precision.
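A minimal sketch of the hybrid idea using OpenCV: run a permissive Haar detector, then drop candidate boxes containing too few skin-colored pixels. The YCrCb thresholds and the min_skin_ratio value are illustrative assumptions, not the paper's exact filter or color compensation step.

```python
import cv2

def detect_faces_with_skin_filter(bgr, min_skin_ratio=0.4):
    """Permissive Viola-Jones detection followed by a skin-color post-filter."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # low minNeighbors => few missed faces, many false detections
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=1)
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    # commonly cited CrCb skin range (an assumption, not the paper's thresholds)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    kept = []
    for (x, y, w, h) in candidates:
        ratio = skin[y:y + h, x:x + w].mean() / 255.0  # fraction of skin pixels
        if ratio >= min_skin_ratio:
            kept.append((x, y, w, h))
    return kept
```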
Signal Processing and Communications Applications Conference | 2007
Elif Bozkurt; Cigdem Eroglu Erdem; Engin Erzin; Tanju Erdem; Mehmet K. Özkan
Natural-looking lip animation, synchronized with the incoming speech, is essential for realistic character animation. In this work, we evaluate the performance of phone- and viseme-based acoustic units, with and without context information, for generating realistic lip synchronization using HMM-based recognition systems. We conclude via objective evaluations that viseme-based units with context information outperform the other methods.
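To make the unit choice concrete, here is a toy sketch of mapping phones to visemes and forming context-dependent units (triviseme-style, analogous to triphones). The grouping table and unit naming are illustrative assumptions, not the paper's actual inventory.

```python
# Toy phone-to-viseme grouping: visually indistinguishable phones share a viseme.
PHONE_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "aa": "V_open", "ae": "V_open",
    "uw": "V_rounded", "ow": "V_rounded",
}

def to_viseme_units(phones, context=True):
    """Map a phone sequence to viseme units, optionally with left/right context."""
    visemes = [PHONE_TO_VISEME.get(p, "V_other") for p in phones]
    if not context:
        return visemes
    padded = ["sil"] + visemes + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

# Example: to_viseme_units(["m", "aa", "p"])
# -> ['sil-V_bilabial+V_open', 'V_bilabial-V_open+V_bilabial', 'V_open-V_bilabial+sil']
```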
IEEE Transactions on Image Processing | 2001
Cigdem Eroglu Erdem; Güneş Karabulut; Evsen Yanmaz; Emin Anarim
A recent work explicitly models the discontinuous motion estimation problem in the frequency domain, where the motion parameters are estimated using a harmonic retrieval approach. The vertical and horizontal components of the motion are independently estimated from the locations of the peaks of the respective periodograms and are then paired, using a previously proposed procedure, to obtain the motion vectors. In this paper, we present a more efficient method that replaces the motion component pairing task and hence eliminates the problems of that pairing procedure. The method uses the fuzzy c-planes (FCP) clustering approach to fit planes to three-dimensional (3-D) frequency-domain data obtained from the peaks of the periodograms. Experimental results are provided to demonstrate the effectiveness of the proposed method.
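For intuition, a minimal sketch of fuzzy c-planes clustering on 3-D points: alternate between fitting a plane to each cluster by weighted total least squares and updating fuzzy memberships from point-to-plane distances. Initialization, the fuzzifier m, and iteration counts are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_planes(pts, c=2, m=2.0, n_iter=30, eps=1e-9):
    """pts: (N, 3) frequency-domain peak locations; returns plane normals,
    offsets, and the (N, c) fuzzy membership matrix."""
    N = len(pts)
    U = np.random.dirichlet(np.ones(c), size=N)  # random fuzzy memberships
    for _ in range(n_iter):
        normals, offsets = [], []
        for j in range(c):
            w = U[:, j] ** m
            centroid = (w[:, None] * pts).sum(0) / w.sum()
            X = pts - centroid
            # plane normal = eigenvector of the smallest eigenvalue of the
            # weighted scatter matrix (weighted total least squares fit)
            C = (w[:, None, None] * np.einsum('ni,nj->nij', X, X)).sum(0)
            _, evecs = np.linalg.eigh(C)
            n_vec = evecs[:, 0]
            normals.append(n_vec)
            offsets.append(-n_vec @ centroid)
        # point-to-plane distances, then the standard fuzzy c-means update
        D = np.abs(pts @ np.array(normals).T + np.array(offsets))
        inv = np.maximum(D, eps) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(1, keepdims=True)
    return np.array(normals), np.array(offsets), U
```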
Proceedings of the 3rd International Workshop on Affective Interaction in Natural Environments | 2010
Cigdem Eroglu Erdem; Elif Bozkurt; Engin Erzin; A. Tanju Erdem
Training datasets containing spontaneous emotional expressions are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with various numbers of states and Gaussian mixtures per state indicate that utilization of RANSAC in the training phase provides an improvement of up to 2.84% in the unweighted recall rates on the test set. This improvement in the accuracy of the classifier is shown to be statistically significant using McNemar's test.
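A minimal sketch of RANSAC-style label cleaning with a generic classifier standing in for the HMMs: repeatedly train on random subsets and keep the largest consensus set of correctly predicted instances. The fit/predict callables, subset fraction, and iteration count are hypothetical interfaces for illustration.

```python
import numpy as np

def ransac_clean(X, y, fit, predict, n_iter=50, subset_frac=0.5, seed=None):
    """Return indices of the largest consensus set found.
    fit(X, y) -> model and predict(model, X) -> labels are assumed,
    hypothetical interfaces (the paper trains HMMs on speech features)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    best_inliers = np.arange(n)
    best_count = 0
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        model = fit(X[idx], y[idx])                    # hypothesize from a subset
        inliers = np.flatnonzero(predict(model, X) == y)  # consensus set
        if len(inliers) > best_count:
            best_count, best_inliers = len(inliers), inliers
    return best_inliers  # retrain the final classifier on these instances
```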
Computer Vision and Pattern Recognition | 2001
Cigdem Eroglu Erdem; Bülent Sankur; A.M. Tekalp
We present a scalable object tracking framework which is capable of tracking the contour of rigid and non-rigid objects in the presence of occlusion. The method adaptively divides the object contour into sub-contours and employs several low-level features such as color edges, color segmentation, motion models, motion segmentation, and shape continuity information in a feedback loop to track each sub-contour. We also introduce novel performance evaluation measures to evaluate the goodness of the segmentation and tracking. The results of these measures are utilized in a feedback loop to adjust the weights assigned to each low-level feature for each sub-contour at each frame. The framework is scalable because it can be adapted to roughly track simple objects in real time as well as to track more complex objects with pixel accuracy in offline mode. The proposed method does not depend on any single motion or shape model and does not need training. Experimental results demonstrate that the algorithm is able to track the object boundaries accurately under significant occlusion and background clutter.
Signal, Image and Video Processing | 2016
Sara Zhalehpour; Zahid Akhtar; Cigdem Eroglu Erdem
We present a fully automatic multimodal emotion recognition system based on three novel peak frame selection approaches using the video channel. Selection of peak frames (i.e., apex frames) is an important preprocessing step for facial expression recognition as they contain the most relevant information for classification. Two of the three proposed peak frame selection methods (i.e., MAXDIST and DEND-CLUSTER) do not employ any training or prior learning. The third method proposed for peak frame selection (i.e., EIFS) is based on measuring the “distance” of the expressive face from the subspace of neutral facial expression, which requires a prior learning step to model the subspace of neutral face shapes. The audio and video modalities are fused at the decision level. The subject-independent audio-visual emotion recognition system has shown promising results on two databases in two different languages (eNTERFACE and BAUM-1a).
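As a rough illustration of the training-free peak frame idea, here is a minimal sketch assuming per-frame facial feature vectors are available. Picking the frame farthest from the sequence mean is an illustrative variant, not necessarily the paper's exact MAXDIST or DEND-CLUSTER definition.

```python
import numpy as np

def peak_frame_index(features):
    """features: (n_frames, d) per-frame facial feature vectors.
    Returns the index of the frame farthest (Euclidean) from the sequence
    mean, assuming the mean lies closer to the neutral expression."""
    mean = features.mean(axis=0)
    dists = np.linalg.norm(features - mean, axis=1)
    return int(np.argmax(dists))
```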
Digital Signal Processing | 2014
Sezer Ulukaya; Cigdem Eroglu Erdem
When the goal is to recognize the facial expression of a person from an expressive image, there are mainly two types of information encoded in the image: identity-related information and expression-related information. Alleviating the identity-related information, for example by using an image of the same person with a neutral facial expression, increases the success of facial expression recognition algorithms. However, the neutral face image corresponding to an expressive face may not always be available, which is referred to as the baseline problem. In this work, we propose a general solution to the baseline problem by estimating the unknown neutral face shape of an expressive face image using a dictionary of neutral face shapes. The dictionary is formed using a Gaussian Mixture Model fitting method. We also present a method of fusing shape-based (geometrical) features with appearance-based features by calculating them only around the most discriminative geometrical facial features, which are selected automatically. Experimental results on three widely used facial expression databases, as well as a cross-database analysis, show that utilization of the estimated neutral face shapes significantly increases the facial expression recognition rate when person-specific neutral face information is not available.
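A heavily simplified sketch of the dictionary idea: fit a GMM to a corpus of neutral face shapes and use component means as dictionary atoms, then estimate a neutral shape for an expressive input as the nearest atom. The nearest-atom rule and component count are illustrative assumptions; the paper's estimation step is more involved.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_neutral_dictionary(neutral_shapes, n_components=8):
    """neutral_shapes: (N, 2L) flattened landmark coordinates of neutral faces.
    Fit a GMM and return its component means as dictionary atoms."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(neutral_shapes)
    return gmm.means_

def estimate_neutral(expressive_shape, dictionary):
    """Pick the dictionary atom closest to the expressive shape as a rough
    stand-in for the person's unknown neutral face shape (an assumption)."""
    d = np.linalg.norm(dictionary - expressive_shape, axis=1)
    return dictionary[np.argmin(d)]
```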
International Conference on Pattern Recognition | 2010
Elif Bozkurt; Engin Erzin; Cigdem Eroglu Erdem; A. Tanju Erdem
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not been previously employed for emotion recognition. Spectral features such as mel-scaled cepstral coefficients have already been successfully used for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation for speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker [4]. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed on the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.
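For reference, LSFs can be computed per frame from LPC coefficients via the standard sum/difference polynomial construction. A minimal sketch follows; the frame length and LPC order in the usage comments are illustrative, not the paper's settings.

```python
import numpy as np
import librosa

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies (radians in (0, pi))."""
    a_ext = np.concatenate([a, [0.0]])
    P = a_ext + a_ext[::-1]   # sum polynomial (palindromic)
    Q = a_ext - a_ext[::-1]   # difference polynomial (antipalindromic)
    lsf = []
    for poly in (P, Q):
        angles = np.angle(np.roots(poly))
        # keep roots on the upper unit semicircle, excluding 0 and pi
        lsf.extend(ang for ang in angles if 0.0 < ang < np.pi)
    return np.sort(np.array(lsf))

# Illustrative per-frame usage:
# y, sr = librosa.load("utterance.wav", sr=16000)
# frame = y[:400]                                 # 25 ms at 16 kHz
# lsf = lpc_to_lsf(librosa.lpc(frame, order=10))  # 10 LSF values
```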
Multimedia Tools and Applications | 2015
Cigdem Eroglu Erdem; Çiğdem Turan; Zafer Aydin
Access to audio-visual databases that contain enough variety and are richly annotated is essential for assessing the performance of algorithms in affective computing applications, which require emotion recognition from face and/or speech data. Most databases available today have been recorded under tightly controlled environments, are mostly acted, and do not contain speech data. We first present a semi-automatic method that can extract audio-visual facial video clips from movies and TV programs in any language. The method is based on automatic detection and tracking of faces in a movie until the face is occluded or a scene cut occurs. We also created a video-based database, named BAUM-2, which consists of annotated audio-visual facial clips in several languages. The collected clips simulate real-world conditions by containing various head poses, illumination conditions, accessories, temporary occlusions, and subjects with a wide range of ages. The proposed semi-automatic affective clip extraction method can easily be used to extend the database with clips in other languages. We also created an image-based facial expression database from the peak frames of the video clips, named BAUM-2i. Baseline image- and video-based facial expression recognition results using state-of-the-art features and classifiers indicate that facial expression recognition under tough, close-to-natural conditions is quite challenging.
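A rough OpenCV sketch of the clip extraction logic: open a clip when a face appears, and close it when the detection is lost or a scene cut is flagged by a drop in frame-histogram correlation. Per-frame re-detection stands in for the paper's tracker, and the cut threshold is an illustrative assumption.

```python
import cv2

def extract_clips(video_path, cut_thresh=0.5):
    """Return (start_frame, end_frame) pairs of candidate facial clips."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    clips, start, prev_hist, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        cv2.normalize(hist, hist)
        # scene cut heuristic: histogram correlation with the previous frame drops
        cut = (prev_hist is not None and
               cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < cut_thresh)
        if start is None and len(faces) > 0:
            start = idx                       # face appeared: open a clip
        elif start is not None and (len(faces) == 0 or cut):
            clips.append((start, idx))        # face lost or scene cut: close it
            start = None
        prev_hist, idx = hist, idx + 1
    cap.release()
    if start is not None:
        clips.append((start, idx))
    return clips
```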