Arman Savran
Boğaziçi University
Publications
Featured research published by Arman Savran.
Biometrics and Identity Management | 2008
Arman Savran; Nese Alyuz; Hamdi Dibeklioglu; Oya Celiktutan; Berk Gökberk; Bülent Sankur; Lale Akarun
A new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions is presented in this paper. This database is unique in three respects: i) the facial expressions are composed of a judiciously selected subset of Action Units as well as the six basic emotions, and many actors/actresses are incorporated to obtain more realistic expression data; ii) a rich set of head pose variations is available; and iii) different types of face occlusions are included. Hence, this new database can be a very valuable resource for the development and evaluation of algorithms for face recognition under adverse conditions and facial expression analysis, as well as for facial expression synthesis.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010
Javier Ortega-Garcia; Julian Fierrez; Fernando Alonso-Fernandez; Javier Galbally; Manuel Freire; Joaquin Gonzalez-Rodriguez; Carmen García-Mateo; Jose-Luis Alba-Castro; Elisardo González-Agulla; Enrique Otero-Muras; Sonia Garcia-Salicetti; Lorene Allano; Bao Ly-Van; Bernadette Dorizzi; Josef Kittler; Thirimachos Bourlai; Norman Poh; Farzin Deravi; Ming Wah R. Ng; Michael C. Fairhurst; Jean Hennebert; Andrea Monika Humm; Massimo Tistarelli; Linda Brodo; Jonas Richiardi; Andrzej Drygajlo; Harald Ganster; Federico M. Sukno; Sri-Kaushik Pavani; Alejandro F. Frangi
A new multimodal biometric database designed and acquired within the framework of the European BioSecure Network of Excellence is presented. It comprises more than 600 individuals acquired simultaneously in three scenarios: (1) over the Internet, (2) in an office environment with a desktop PC, and (3) in indoor/outdoor environments with mobile portable hardware. The three scenarios include a common part of audio/video data. Signature and fingerprint data have also been acquired with both the desktop PC and the mobile portable hardware. Additionally, hand and iris data were acquired in the second scenario using the desktop PC. Acquisition has been conducted by 11 European institutions. Additional features of the BioSecure Multimodal Database (BMDB) are: two acquisition sessions, several sensors in certain modalities, balanced gender and age distributions, multimodal realistic scenarios with simple and quick tasks per modality, cross-European diversity, availability of demographic data, and compatibility with other multimodal databases. The novel acquisition conditions of the BMDB allow new challenging research and evaluation of both monomodal and multimodal biometric systems, as in the recent BioSecure Multimodal Evaluation campaign. A description of this campaign, including baseline results of individual modalities from the new database, is also given. The database is expected to be available for research purposes through the BioSecure Association during 2008.
Image and Vision Computing | 2012
Arman Savran; Bülent Sankur; M. Taha Bilge
The Facial Action Coding System (FACS) is the de facto standard in the analysis of facial expressions. FACS describes expressions in terms of the configuration and strength of atomic units called Action Units (AUs). FACS defines 44 AUs, and the intensity of each AU is defined on a nonlinear scale of five grades. There has been significant progress in the literature on the detection of AUs; however, the companion problem of estimating AU strengths has not been much investigated. In this work we propose a novel AU intensity estimation scheme applied to 2D luminance and/or 3D surface geometry images. Our scheme is based on regression of selected image features. These features are either non-specific, that is, inherited from the AU detection algorithm, or specific in that they are selected for the sole purpose of intensity estimation. For thoroughness, various types of local 3D shape indicators have been considered, such as mean curvature, Gaussian curvature, shape index and curvedness, as well as their fusion. The feature selection from the initial plethora of Gabor moments is carried out via a regression that optimizes the AU intensity predictions. Our AU intensity estimator is person-independent and, when tested on 25 AUs that appear singly or in various combinations, it performs significantly better than the state-of-the-art method, which is based on the margins of SVMs designed for AU detection. When evaluated comparatively, the 2D and 3D modalities have relative merits for upper-face and lower-face AUs, respectively, and there is an overall improvement if 2D and 3D intensity estimates are fused.
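A minimal sketch of the two ingredients named in the abstract: the local 3D shape indicators (mean curvature, Gaussian curvature, shape index, curvedness) computed from principal curvatures, and a regularized regression from selected features to AU intensity. The data, feature dimensionality and the use of ridge regression are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

def shape_descriptors(k1, k2):
    """Local 3D shape indicators from principal curvatures (assumes k1 >= k2)."""
    H = (k1 + k2) / 2.0                                   # mean curvature
    K = k1 * k2                                           # Gaussian curvature
    S = (2.0 / np.pi) * np.arctan((k1 + k2) / (k1 - k2 + 1e-12))  # shape index in [-1, 1]
    C = np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)                # curvedness
    return H, K, S, C

rng = np.random.default_rng(0)
k1, k2 = rng.uniform(0, 1, 1000), rng.uniform(-1, 0, 1000)
H, K, S, C = shape_descriptors(k1, k2)                    # per-vertex indicators (placeholder surface)

# Hypothetical per-face feature vectors (e.g. pooled curvature/Gabor moments)
# regressed against manually coded AU intensities on a 0-5 scale.
X = rng.normal(size=(200, 64))                            # 200 faces, 64 selected features
y = rng.uniform(0, 5, size=200)                           # AU intensity labels (placeholder)
model = Ridge(alpha=1.0).fit(X, y)                        # regularized linear regression
pred = np.clip(model.predict(X), 0, 5)                    # keep predictions on the FACS scale
```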
Pattern Recognition | 2012
Arman Savran; Bülent Sankur; M. Taha Bilge
Automatic detection of facial expressions attracts great attention due to its potential applications in human-computer interaction as well as in human facial behavior research. Most of the research has so far been performed in 2D. However, as the limitations of 2D data have become understood, expression analysis research is being pursued in the 3D face modality. 3D captures true facial surface data and is less disturbed by illumination and head pose. At this juncture we have conducted a comparative evaluation of the 3D and 2D face modalities. We have extensively investigated 25 action units (AUs) defined in the Facial Action Coding System. For fairness we map facial surface geometry into 2D and apply fully data-driven techniques in order to avoid biases due to design. We demonstrate that overall 3D data performs better, especially for lower-face AUs, and that there is room for improvement by fusion of the 2D and 3D modalities. Our study involves the determination of the best feature set from the 2D and 3D modalities and of the most effective classifier, both from several alternatives. Our detailed analysis brings to light the merits and some shortcomings of the 3D modality over 2D in classifying facial expressions from single images.
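A minimal sketch of the kind of 3D-to-2D mapping the abstract refers to: an orthographic rasterization of a frontal 3D face scan into a 2D depth image, so that the same image-based, data-driven pipeline can be applied to both modalities. The grid size and the input point cloud are illustrative assumptions.

```python
import numpy as np

def depth_image(points, size=128):
    """Rasterize an Nx3 point cloud (x, y, z) into a size x size depth map."""
    xy, z = points[:, :2], points[:, 2]
    mn, mx = xy.min(axis=0), xy.max(axis=0)
    ij = ((xy - mn) / (mx - mn + 1e-9) * (size - 1)).astype(int)
    img = np.full((size, size), np.nan)
    for (i, j), depth in zip(ij, z):
        # keep the closest (largest z) sample falling into each pixel
        if np.isnan(img[j, i]) or depth > img[j, i]:
            img[j, i] = depth
    return img

cloud = np.random.rand(5000, 3)          # placeholder for a registered frontal face scan
depth = depth_image(cloud)               # 2D image usable by the same 2D feature extractors
```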
Biometrics and Identity Management | 2008
Nese Alyuz; Berk Gökberk; Hamdi Dibeklioglu; Arman Savran; Albert Ali Salah; Lale Akarun; Bülent Sankur
This paper presents an evaluation of several 3D face recognizers on the Bosphorus database, which was gathered for studies on expression- and pose-invariant face analysis. We provide identification results for three 3D face recognition algorithms, namely the generic face template-based ICP approach, the one-to-all ICP approach, and the depth image-based Principal Component Analysis (PCA) method. All of these techniques treat faces globally and are usually accepted as baseline approaches. In addition, 2D texture classifiers are also incorporated in a fusion setting. Experimental results reveal that even though global shape classifiers achieve almost perfect identification in neutral-to-neutral comparisons, they are sub-optimal under extreme expression variations. We show that it is possible to boost identification accuracy by focusing on the rigid facial regions and by fusing complementary information coming from the shape and texture modalities.
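A minimal sketch of the depth image-based PCA baseline mentioned above (an eigenface-style classifier on depth maps with rank-1 nearest-neighbor identification). The gallery and probe arrays are synthetic placeholders, not the Bosphorus data, and the subspace size is an arbitrary assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64 * 64))                 # 100 enrolled depth images, flattened
labels = np.arange(100)                                   # one identity per gallery image
probe = gallery + 0.1 * rng.normal(size=gallery.shape)    # noisy probe scans (placeholder)

pca = PCA(n_components=40).fit(gallery)                   # learn the depth-map subspace
g, p = pca.transform(gallery), pca.transform(probe)

# rank-1 identification: nearest gallery projection in the PCA subspace
dists = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=2)
predicted = labels[dists.argmin(axis=1)]
rank1 = (predicted == labels).mean()
```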
International Conference on Computer Vision | 2009
Arman Savran; Bülent Sankur
We address the problem of person-independent recognition of facial expressions using static 3D face data. Our novel approach to facial expression recognition uses non-rigid registration of surface curvature features. 3D face data are cast onto 2D feature images, which are then subjected to elastic deformations in their parametric space. Each Action Unit (AU) detector is trained over its respective influence domain on the face. The registration task is incorporated in the multiresolution elastic deformation scheme, which yields adequate registration accuracy for mild pose variations. The algorithm is fully automatic and is free of the burden of first localizing anatomical facial points. The algorithm was tested on 22 facial action units of the Facial Action Coding System. The promising results obtained indicate that we have an operative device for facial action unit detection, and an intermediate step from which to infer emotional or mental states. Moreover, experiments conducted with low-intensity AU12 (Lip Corner Puller) point to the potential of 3D data and the proposed method in subtle expression detection.
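A minimal sketch of training one AU detector over its influence domain, as described above: features are taken only from a region mask on the registered 2D feature image and fed to a linear classifier. The mask, the curvature images, the labels and the choice of a linear SVM are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
images = rng.normal(size=(300, 64, 64))      # registered curvature feature images (placeholder)
au_labels = rng.integers(0, 2, size=300)     # 1 if the AU is present in the frame (placeholder)

mask = np.zeros((64, 64), dtype=bool)
mask[40:, 16:48] = True                      # hypothetical lower-face influence domain for this AU

X = images[:, mask]                          # use only pixels inside the AU's influence domain
clf = LinearSVC(C=1.0).fit(X, au_labels)     # one detector per AU
scores = clf.decision_function(X)            # margins used as detection scores
```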
Computer Vision and Pattern Recognition | 2010
Arman Savran; Bülent Sankur; M. Taha Bilge
In human facial behavior analysis, Action Unit (AU) coding is a powerful instrument to cope with the diversity of facial expressions. Almost all of the work in the literature on facial action recognition is based on 2D camera images. Given the performance limitations of AU detection with 2D data, 3D facial surface information appears as a viable alternative. 3D systems capture true facial surface data and are less disturbed by illumination and head pose. In this paper we extensively compare the use of the 3D modality vis-à-vis the 2D imaging modality for AU recognition. Surface data is converted into curvature data and mapped into 2D so that both modalities can be compared on fair ground. Since the approach is totally data-driven, possible bias due to design is avoided. Our experiments cover 25 AUs and are based on the comparison of Receiver Operating Characteristic (ROC) curves. We demonstrate that in general 3D data performs better, especially for lower-face AUs, and that it is more robust in detecting low-intensity AUs. We also show that generative and discriminative classifiers perform on a par with 3D data. Finally, we evaluate fusion of the two modalities. The highest detection rate, 97.1 in terms of area under the ROC curve, was achieved by fusion; the corresponding scores were 95.4 for the 3D and 93.5 for the 2D modality.
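A minimal sketch of the ROC-based comparison described above: per-AU detector scores from the 2D and 3D modalities are compared by area under the ROC curve, and a simple score-level fusion is evaluated alongside them. The scores below are synthetic placeholders, and sum-rule fusion is an assumed stand-in for the paper's fusion scheme.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)                        # AU present / absent (placeholder labels)
score_2d = y + rng.normal(scale=1.0, size=500)          # noisier 2D detector scores
score_3d = y + rng.normal(scale=0.7, size=500)          # stronger 3D detector scores
score_fused = (score_2d + score_3d) / 2.0               # simple sum-rule score fusion

for name, s in [("2D", score_2d), ("3D", score_3d), ("fusion", score_fused)]:
    print(name, roc_auc_score(y, s))                    # area under the ROC curve per modality
```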
Computer Vision and Pattern Recognition | 2008
Arman Savran; Bülent Sankur
Non-rigid surface registration, particularly registration of human faces, finds a wide variety of applications in computer vision and graphics. We present a new automatic surface registration method which utilizes both attraction forces originating from geometrical and textural similarities, and stresses due to non-linear elasticity of the surfaces. Reference and target surfaces are first mapped onto their feature image planes, then these images are registered by subjecting them to local deformations, and finally 3D correspondences are established. Surfaces are assumed to be elastic sheets and are represented by triangular meshes. The internal elastic forces act as a regularizer in this ill-posed problem. Furthermore, the non-linear elasticity model allows us to handle large deformations, which can be essential, for instance, for facial expressions. The method has been tested successfully on 3D scanned human faces, with and without expressions. The algorithm runs quite efficiently using a multiresolution approach.
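A rough sketch of the general idea in the abstract, once the surfaces are mapped to feature image planes: an image-similarity "attraction" term drives a local displacement field, and smoothing of the field stands in for the elastic regularizer. This is a simplified demons-style surrogate under assumed parameters, not the paper's non-linear elasticity model on triangular meshes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def register(reference, target, iters=50, step=0.5, sigma=2.0):
    """Return a displacement field (dy, dx) warping target towards reference."""
    dy = np.zeros_like(reference)
    dx = np.zeros_like(reference)
    yy, xx = np.mgrid[0:reference.shape[0], 0:reference.shape[1]].astype(float)
    for _ in range(iters):
        warped = map_coordinates(target, [yy + dy, xx + dx], order=1, mode='nearest')
        diff = reference - warped                        # attraction term (feature similarity)
        gy, gx = np.gradient(warped)
        dy += step * diff * gy                           # push along the image gradient
        dx += step * diff * gx
        dy = gaussian_filter(dy, sigma)                  # smoothing as an elastic-like regularizer
        dx = gaussian_filter(dx, sigma)
    return dy, dx
```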
Signal Processing | 2006
Arman Savran; Levent M. Arslan; Lale Akarun
In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker-independent and can be driven by audio alone or by both audio and text. The synthesis of visual speech is realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio-visual speech data set in Turkish was created using a 3D facial motion capture system developed for this study. The performance of this method was evaluated in three categories. First, audio-driven results were reported and compared with the time-delayed neural network (TDNN) and recurrent neural network (RNN) algorithms, which are popular in the speech-processing field. It was found that TDNN performs best and RNN worst for this data set. Second, results for the codebook-based method after incorporating text information were given. Text information together with the audio was seen to improve the synthesis performance significantly. For many applications, the donor speaker of the audio-visual data will not be available to provide audio data for synthesis. Therefore, we designed a speaker-independent version of the codebook technique. The results of speaker-independent synthesis are important because there are no comparative results reported for speech input from other speakers used to animate the face model. It was observed that although there is a small degradation in trajectory correlation (from 0.71 to 0.67) with respect to speaker-dependent synthesis, the performance results are quite satisfactory. Thus, the resulting system is capable of animating faces realistically from the input speech of any Turkish speaker.
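A minimal sketch of a codebook-style audio-to-visual mapping in the spirit of the abstract: each input audio frame is matched to its nearest audio codeword and the paired visual (face-point) codeword is emitted, with quality summarized by trajectory correlation. Codebook sizes, feature dimensions and the nearest-neighbor lookup are illustrative assumptions, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
audio_cb = rng.normal(size=(64, 13))         # 64 audio codewords (e.g. 13-dim spectral features)
visual_cb = rng.normal(size=(64, 30))        # paired codewords of 3D face-point parameters

audio_in = rng.normal(size=(200, 13))        # incoming audio frames (placeholder)
idx = np.linalg.norm(audio_in[:, None, :] - audio_cb[None, :, :], axis=2).argmin(axis=1)
visual_out = visual_cb[idx]                  # synthesized visual trajectory (200 frames x 30 dims)

reference = visual_out + 0.3 * rng.normal(size=visual_out.shape)  # placeholder ground-truth trajectory
corr = np.mean([np.corrcoef(visual_out[:, d], reference[:, d])[0, 1]
                for d in range(visual_out.shape[1])])              # mean trajectory correlation
```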
Computer Vision and Image Understanding | 2017
Arman Savran; Bülent Sankur
We propose a novel feature extraction approach for 3D facial expression recognition by incorporating non-rigid registration in face-model-free analysis, which in turn makes feasible data-driven, i.e., feature-model-free, recognition of expressions. The resulting simplicity of the feature representation is due to the fact that facial information is adapted to the input faces via shape-model-free dense registration, and this provides a dynamic feature extraction mechanism. This approach eliminates the need for complex feature representations, as required in the case of static feature extraction methods, where the complexity arises from the need to model the local context; a higher degree of complexity persists in deep feature hierarchies enabled by end-to-end learning on large-scale datasets. Face-model-free recognition implies independence from the limitations and biases of committed face models, bypasses the complications of model fitting, and avoids the burden of manual model construction. We show via information gain maps that non-rigid registration enables the extraction of highly informative features, as it provides invariance to local shifts due to physiognomy (subject invariance) and residual pose misalignments; in addition, it allows estimation of local correspondences of expressions. To maximize the recognition rate, we employ a rich but computationally manageable set of local correspondence structures, and to this effect we propose a framework to optimally select multiple registration references. Our features are re-sampled surface curvature values at individual coordinates chosen per expression class and per reference pair. We show the superior performance of our novel dynamic feature extraction approach on three distinct recognition problems, namely action unit detection, basic expression recognition, and emotion dimension recognition.
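A minimal sketch of scoring registered curvature coordinates by information gain and keeping the most informative ones, echoing the information gain maps mentioned above. Mutual information estimated with scikit-learn is used as an assumed stand-in for the paper's measure, and the curvature features and expression labels are synthetic placeholders.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 256))       # re-sampled surface curvature values per face (placeholder)
labels = rng.integers(0, 6, size=400)        # six basic expression classes (placeholder)

gain = mutual_info_classif(features, labels, random_state=0)  # per-coordinate information gain estimate
top = np.argsort(gain)[::-1][:50]            # keep the 50 most informative coordinates
selected = features[:, top]                  # reduced feature set used for expression recognition
```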