Publication


Featured research published by Derya Ozkan.


computer vision and pattern recognition | 2006

A Graph Based Approach for Naming Faces in News Photos

Derya Ozkan; Pinar Duygulu

We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person’s face is likely to appear when his/her name is mentioned in the caption, all the faces associated with the query name are selected first. Among these faces, there may be many faces of the queried person in different conditions, poses and times, but there may also be faces of other people mentioned in the caption, or non-face images due to errors in the face detection method used. In most cases, however, the number of faces of the queried person will be large, and these faces will be more similar to each other than to the others. In this study, we propose a graph-based method to find the most similar subset among the set of possible faces associated with the query name; this subset is likely to correspond to the faces of the queried person. When the similarity of faces is represented in a graph structure, the set of most similar faces forms the densest component in the graph. We represent the similarity of faces using SIFT descriptors. The matching interest points on two faces are determined after applying two constraints, namely the geometric constraint and the unique match constraint. The average distance of the matching points is used to construct the similarity graph. The most similar set of faces is then found using a greedy densest component algorithm. The experiments are performed on thousands of news photographs taken in real-life conditions and therefore exhibiting a large variety of poses, illuminations and expressions.
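The greedy densest-component step can be illustrated with a short sketch. The peeling heuristic below (repeatedly drop the node with the smallest weighted degree and keep the subset with the best weight-to-size ratio) is one standard way to approximate the densest component; it is not necessarily the paper's exact algorithm, and the toy face graph and its edge weights are invented placeholders.

```python
# Greedy densest-component search (peeling heuristic) on a face similarity
# graph. Illustrative sketch only; the similarity values are made up.
import networkx as nx


def densest_component(graph: nx.Graph) -> set:
    """Return the node subset maximizing total edge weight / node count."""
    g = graph.copy()
    best_nodes, best_density = set(g.nodes), 0.0
    while g.number_of_nodes() > 0:
        density = g.size(weight="weight") / g.number_of_nodes()
        if density >= best_density:
            best_density, best_nodes = density, set(g.nodes)
        # Peel off the node with the smallest weighted degree.
        weakest = min(g.nodes, key=lambda n: g.degree(n, weight="weight"))
        g.remove_node(weakest)
    return best_nodes


# Toy similarity graph: nodes are detected faces for one query name,
# edge weights are (invented) similarities derived from matched SIFT points.
G = nx.Graph()
G.add_weighted_edges_from([
    ("face_0", "face_1", 0.9), ("face_0", "face_2", 0.8),
    ("face_1", "face_2", 0.85), ("face_3", "face_4", 0.2),
    ("face_2", "face_3", 0.1),
])
print(densest_component(G))  # the tight cluster {face_0, face_1, face_2}
```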


international conference on pattern recognition | 2008

Re-ranking of web image search results using a graph algorithm

Hilal Zitouni; Sare Gul Sevil; Derya Ozkan; Pinar Duygulu

We propose a method to improve the results of image search engines on the Internet for users who want to see relevant images within the first few pages. The method re-ranks the results of text-based systems by incorporating the visual similarity of the returned images. We observe that, together with many unrelated ones, the results of text-based systems include a subset of correct images, and this subset is, in general, the largest and most internally similar one among all possible subsets. Based on this observation, we represent the similarities of all images in a graph structure and find the densest component, which corresponds to this largest set of most similar images. To re-rank the results, we give higher priority to the images in the densest component and rank the others by their similarity to the images in the densest component. The experiments are carried out on 18 categories of images.
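A minimal sketch of the re-ranking step described above: members of the densest component are listed first, and the remaining images are ordered by their average similarity to that component. The densest-component set and the small similarity matrix below are assumed inputs for illustration, not values from the paper.

```python
# Re-rank image indices: densest-component members first, the rest ordered
# by mean similarity to that component. Illustrative sketch with toy data.
import numpy as np


def rerank(similarity: np.ndarray, dense: set) -> list:
    """Return image indices, densest-component members first."""
    n = similarity.shape[0]
    dense_idx = sorted(dense)
    others = [i for i in range(n) if i not in dense]
    # Score outsiders by their mean similarity to the densest component.
    others.sort(key=lambda i: similarity[i, dense_idx].mean(), reverse=True)
    return dense_idx + others


sim = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
])
print(rerank(sim, dense={0, 1, 2}))  # -> [0, 1, 2, 3]
```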


Pattern Recognition | 2010

Interesting faces: A graph-based approach for finding people in news

Derya Ozkan; Pinar Duygulu

With recent advances in technology, large quantities of multi-modal data have arisen and become prevalent, and effective and efficient retrieval, organization and analysis of such data constitutes a big challenge. Both news photographs on the web and news videos on television form this kind of data, covering rich sources of information. People are mostly the main subject of the news; therefore, queries related to a specific person are often desired. In this study, we propose a graph-based method to improve the performance of person queries in large news video and photograph collections. We exploit the multi-modal structure of the data by associating text and face information. On the assumption that a person’s face is likely to appear when his/her name is mentioned in the news, only the faces associated with the query name are selected first, to limit the search space for a query name. Then, we construct a similarity graph of the faces in this limited search space, where nodes correspond to the faces and edges correspond to the similarity between the faces. Among these faces, there may be many faces of the queried person in different conditions, poses and times, as well as faces of other people in the news or non-face images due to errors in the face detection method used. However, in most cases, the number of faces of the queried person will be large, and these faces will be more similar to each other than to the others. The problem is therefore transformed into a graph problem in which we seek the densest component of the graph; this most similar subset (the densest component) is likely to correspond to the faces of the query name. Finally, the result of the graph algorithm is used as a model for further recognition when new faces are encountered.


international conference on multimodal interfaces | 2012

Step-wise emotion recognition using concatenated-HMM

Derya Ozkan; Stefan Scherer; Louis-Philippe Morency

Human emotion is an important part of human-human communication, since the emotional state of an individual often affects the way that he/she reacts to others. In this paper, we present a method based on concatenated Hidden Markov Models (co-HMM) to infer dimensional and continuous emotion labels from audio-visual cues. Our method is based on the assumption that continuous emotion levels can be modeled by a set of discrete values. Based on this, we represent each emotional dimension by step-wise label classes, and learn the intrinsic and extrinsic dynamics using our co-HMM model. We evaluate our approach on the Audio-Visual Emotion Challenge (AVEC 2012) dataset. Our results show considerable improvement over the baseline regression model provided with AVEC 2012.
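The step-wise label idea can be sketched as simple quantization: a continuous emotion dimension is mapped to a small number of discrete classes, and class predictions are mapped back to bin centres. The value range, the number of classes and the sample values below are assumptions for illustration; the co-HMM that models the dynamics over these classes is not shown here.

```python
# Quantize a continuous emotion dimension into K step-wise classes and map
# class indices back to continuous values. Illustrative assumptions: K = 7,
# values in [-1, 1].
import numpy as np

K = 7                      # assumed number of step-wise classes
LOW, HIGH = -1.0, 1.0      # assumed range of the continuous dimension
edges = np.linspace(LOW, HIGH, K + 1)
centres = (edges[:-1] + edges[1:]) / 2


def to_stepwise(values: np.ndarray) -> np.ndarray:
    """Map continuous labels to discrete class indices 0..K-1."""
    return np.clip(np.digitize(values, edges[1:-1]), 0, K - 1)


def to_continuous(classes: np.ndarray) -> np.ndarray:
    """Map predicted class indices back to continuous bin centres."""
    return centres[classes]


labels = np.array([-0.8, -0.1, 0.05, 0.6, 0.95])
print(to_stepwise(labels))                       # -> [0 3 3 5 6]
print(to_continuous(to_stepwise(labels)))        # bin centres of those classes
```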


signal processing and communications applications conference | 2008

Re-ranking of image search results using a graph algorithm

Sare Gul Sevil; Hilal Zitouni; Nazlı İkizler; Derya Ozkan; Pinar Duygulu

Although searching is one of the most common uses of the Internet, users are often not satisfied with image search because of the many irrelevant results. In this paper we present a method to identify irrelevant results of image search on the Internet and to re-rank the results so that the relevant ones receive higher priority within the list. The proposed method represents the similarity of images in a graph structure and then finds the densest component in the graph, which represents the most similar set of images corresponding to the query.


conference on image and video retrieval | 2006

Finding people frequently appearing in news

Derya Ozkan; Pinar Duygulu

We propose a graph-based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit our search space for a query name. Then, we construct a similarity graph with nodes corresponding to all of the faces in the search space and edges corresponding to the similarity between faces. With the assumption that the images of the query name will be more similar to each other than to other images, the problem is transformed into finding the densest component in the graph, which corresponds to the images of the query name. The same graph algorithm is applied to detect and remove the faces of anchorpeople in an unsupervised way. The experiments are conducted on 229 news videos provided by NIST for TRECVID 2004. The results show that the proposed method outperforms text-only methods and provides cues for face recognition at large scale.


IEEE Transactions on Multimedia | 2013

Latent Mixture of Discriminative Experts

Derya Ozkan; Louis-Philippe Morency

In this paper, we introduce a new model called Latent Mixture of Discriminative Experts (LMDE) which can automatically learn the temporal relationship between different modalities. Since we train separate experts for each modality, LMDE is capable of improving prediction performance even with a limited amount of data. For model interpretation, we present a sparse feature ranking algorithm that exploits L1 regularization. An empirical evaluation is provided on the task of listener backchannel prediction (i.e., head nods). We introduce a new error evaluation metric called User-adaptive Prediction Accuracy that takes into account the differences in people's backchannel responses. Our results confirm the importance of combining five types of multimodal features: lexical, syntactic structure, part-of-speech, visual and prosody. The LMDE model outperforms previous approaches.
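The sparse feature ranking idea can be illustrated with an L1-regularized linear model: most coefficients are driven to zero and the surviving ones induce a ranking. The sketch below uses scikit-learn and synthetic data purely as a stand-in; it is not the LMDE model or the paper's exact procedure.

```python
# L1-based feature ranking sketch: fit a sparse logistic regression and rank
# features by coefficient magnitude. Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 10 candidate multimodal features
y = (X[:, 2] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

ranking = np.argsort(-np.abs(clf.coef_[0]))
print("features ranked by |coef|:", ranking)
print("nonzero (selected) features:", np.flatnonzero(clf.coef_[0]))
```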


HBU'10 Proceedings of the First international conference on Human behavior understanding | 2010

Consensus of self-features for nonverbal behavior analysis

Derya Ozkan; Louis-Philippe Morency

One of the key challenges in social behavior analysis is to automatically discover the subset of features relevant to a specific social signal (e.g., backchannel feedback). The way these social signals are performed varies among different people. In this paper, we present a feature selection approach which first looks at important behaviors for each individual, called self-features, before building a consensus. To enable this approach, we propose a new feature ranking scheme which exploits the sparsity of probabilistic models when trained on human behavior problems. We validated our self-feature consensus approach on the task of listener backchannel prediction and showed improvement over the traditional group-feature approach. Our technique gives researchers a new tool to analyze individual differences in social nonverbal communication.
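A hedged sketch of the self-features-then-consensus idea: select features per individual with a sparse model, then keep the features that enough individuals agree on. The model choice, the voting threshold and the synthetic subject data are all illustrative assumptions, not the paper's setup.

```python
# Per-individual sparse feature selection followed by a consensus vote.
# Illustrative sketch with synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_features(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Indices of features with nonzero weight for one individual."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.3)
    clf.fit(X, y)
    return np.flatnonzero(clf.coef_[0])


def consensus(per_subject_data, n_features, min_fraction=0.5):
    """Keep features selected for at least `min_fraction` of individuals."""
    votes = np.zeros(n_features)
    for X, y in per_subject_data:
        votes[self_features(X, y)] += 1
    return np.flatnonzero(votes / len(per_subject_data) >= min_fraction)


# Toy data: 5 "listeners", 8 candidate features, features 1 and 4 informative.
rng = np.random.default_rng(1)
subjects = []
for _ in range(5):
    X = rng.normal(size=(150, 8))
    y = (X[:, 1] - X[:, 4] + 0.2 * rng.normal(size=150) > 0).astype(int)
    subjects.append((X, y))

print(consensus(subjects, n_features=8))  # expected to include features 1 and 4
```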


intelligent virtual agents | 2013

Prediction of Visual Backchannels in the Absence of Visual Context Using Mutual Influence

Derya Ozkan; Louis-Philippe Morency

Based on the phenomenon of mutual influence between participants in a face-to-face conversation, we propose a context-based prediction approach for modeling visual backchannels. Our goal is to create intelligent virtual listeners able to provide backchannel feedback, enabling natural and fluid interactions. In our approach, we first anticipate the speaker's behaviors and then use this anticipated visual context to obtain more accurate listener backchannel moments. We model the mutual influence between speaker and listener gestures using a latent variable sequential model. We compared our approach with state-of-the-art prediction models on a publicly available dataset and showed the importance of modeling the mutual influence between the speaker and the listener.
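The two-stage use of anticipated context can be sketched with ordinary classifiers standing in for the latent variable sequential model: a first model anticipates the speaker's visual behaviour from features that remain available (here, audio), and its output is fed, together with those features, into the listener backchannel predictor. The synthetic data and the stand-in models below are assumptions for illustration only.

```python
# Stage 1 anticipates the speaker's (absent) visual behaviour; stage 2 uses
# that anticipated context to predict listener backchannels. Illustrative
# sketch with synthetic data and plain logistic regressions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
T = 500
audio = rng.normal(size=(T, 6))                      # speaker audio features
speaker_visual = (audio[:, 0] > 0).astype(int)       # e.g. a speaker gesture
backchannel = ((audio[:, 1] > 0) & (speaker_visual == 1)).astype(int)

# Stage 1: anticipate the speaker's visual behaviour without observing it.
stage1 = LogisticRegression().fit(audio, speaker_visual)
anticipated = stage1.predict_proba(audio)[:, 1:]

# Stage 2: predict listener backchannels from audio + anticipated context.
X2 = np.hstack([audio, anticipated])
stage2 = LogisticRegression().fit(X2, backchannel)
print("train accuracy:", stage2.score(X2, backchannel))
```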


Machine Learning Techniques for Multimedia | 2008

Combining Textual and Visual Information for Semantic Labeling of Images and Videos

Pinar Duygulu; Muhammet Bastan; Derya Ozkan

Semantic labeling of large volumes of image and video archives is difficult, if not impossible, with traditional methods, due to the huge amount of human effort required for manual labeling in a supervised setting. Recently, semi-supervised techniques which make use of annotated image and video collections have been proposed as an alternative to reduce the human effort. In this direction, different techniques, mostly adapted from the information retrieval literature, are applied to learn the unknown one-to-one associations between visual structures and semantic descriptions. Once the links are learned, the range of application areas is wide, including better retrieval and automatic annotation of images and videos, labeling of image regions as a way of large-scale object recognition, and association of names with faces as a way of large-scale face recognition. In this chapter, after reviewing and discussing a variety of related studies, we present two methods in detail: the so-called “translation approach”, which translates visual structures into semantic descriptors using the idea of statistical machine translation, and a second approach which finds the densest component of a graph corresponding to the largest group of similar visual structures associated with a semantic description.
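The translation approach can be illustrated with an IBM Model 1 style EM loop that learns association probabilities between visual tokens (blobs) and caption words, treating each annotated image as a blob-sentence / word-sentence pair. The tiny corpus below is an invented placeholder; the chapter's actual models and features are richer.

```python
# IBM Model 1 style EM for blob-word association. Illustrative toy corpus.
from collections import defaultdict

corpus = [  # (blob tokens from segmentation/quantization, caption words)
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_plane"], ["sky", "plane"]),
    (["blob_grass", "blob_horse"], ["grass", "horse"]),
]

blobs = {b for bs, _ in corpus for b in bs}
# Initialise t(word | blob) uniformly over co-occurring pairs.
t = {(w, b): 1.0 / len(blobs) for bs, ws in corpus for b in bs for w in ws}

for _ in range(20):                                   # EM iterations
    counts = defaultdict(float)
    totals = defaultdict(float)
    for bs, ws in corpus:
        for w in ws:
            z = sum(t[(w, b)] for b in bs)            # E-step normaliser
            for b in bs:
                counts[(w, b)] += t[(w, b)] / z
                totals[b] += t[(w, b)] / z
    for (w, b) in counts:                             # M-step: renormalise
        t[(w, b)] = counts[(w, b)] / totals[b]

dist = sorted(((w, p) for (w, b), p in t.items() if b == "blob_sky"),
              key=lambda x: -x[1])
print(dist)   # "sky" should receive by far the highest probability for blob_sky
```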

Collaboration


Dive into Derya Ozkan's collaborations.

Top Co-Authors

Stefan Scherer

University of Southern California


Kenji Sagae

University of Southern California

Louis-Philippe Morency

University of Southern California
