Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Nazli Ikizler-Cinbis is active.

Publication


Featured research published by Nazli Ikizler-Cinbis.


European Conference on Computer Vision | 2010

Object, scene and actions: combining multiple features for human action recognition

Nazli Ikizler-Cinbis; Stan Sclaroff

In many cases, human actions can be identified not only by the singular observation of the human body in motion, but also by properties of the surrounding scene and the related objects. In this paper, we look into this problem and propose an approach for human action recognition that integrates multiple feature channels from several entities such as objects, scenes and people. We formulate the problem in a multiple instance learning (MIL) framework based on multiple feature channels. Using a discriminative approach, we combine the multiple feature channels embedded into the MIL space. Our experiments on the large YouTube dataset show that scene and object information can be used to complement person features for human action recognition.
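
As a rough illustration of how feature channels can be embedded into a MIL space, the sketch below uses a MILES-style embedding (bag-to-prototype similarities) followed by a linear classifier; the prototypes, data, and classifier choice here are toy stand-ins, not the authors' formulation.

```python
# Minimal MILES-style multiple instance embedding sketch (illustrative only).
# Each video is a "bag" of instance feature vectors; a bag is embedded by its
# maximum similarity to a set of instance prototypes, and a discriminative
# linear classifier is trained on the embedded bags.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def embed_bag(bag, prototypes, sigma=1.0):
    """Map a bag (n_instances x d) to one vector: max similarity per prototype."""
    d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).max(axis=0)

# Toy data: 40 bags, each with 5-15 instances of 16-D features.
bags = [rng.normal(size=(rng.integers(5, 16), 16)) for _ in range(40)]
labels = np.array([0] * 20 + [1] * 20)
for b in bags[20:]:                      # plant a weak "action" signal in positives
    b[0] += 2.0

prototypes = np.vstack([b[rng.integers(len(b))] for b in bags])  # sampled instances
X = np.vstack([embed_bag(b, prototypes) for b in bags])
clf = LinearSVC(C=1.0).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```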


International Conference on Computer Vision | 2009

Learning actions from the Web

Nazli Ikizler-Cinbis; R. Gokberk Cinbis; Stan Sclaroff

This paper proposes a generic method for action recognition in uncontrolled videos. The idea is to use images collected from the Web to learn representations of actions and to use this knowledge to automatically annotate actions in videos. Our approach is unsupervised in the sense that it requires no human intervention other than the text querying. Its benefits are two-fold: 1) we can improve the retrieval of action images, and 2) we can collect a large generic database of action poses, which can then be used in tagging videos. We present experimental evidence that, using action images collected from the Web, it is possible to annotate actions in videos.
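
A minimal sketch of the tagging step this pipeline enables: classifiers trained on Web-collected action images score each frame of a video, and the video receives the highest-scoring action label. Feature extraction and the retrieval loop are abstracted away; all names and data below are illustrative.

```python
# Tag a video using classifiers trained on Web-collected action images
# (illustrative sketch; features are random stand-ins).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
actions = ["running", "walking"]
web_feats = rng.normal(size=(60, 16))
web_feats[30:] += 1.0                            # crude separation between actions
web_labels = np.array([0] * 30 + [1] * 30)

clf = LogisticRegression(max_iter=1000).fit(web_feats, web_labels)

video_frames = rng.normal(size=(25, 16)) + 1.0   # this video resembles action 1
frame_probs = clf.predict_proba(video_frames)    # per-frame action posteriors
print("video tag:", actions[int(frame_probs.mean(axis=0).argmax())])
```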


International Conference on Computer Vision | 2013

Action Recognition and Localization by Hierarchical Space-Time Segments

Shugao Ma; Jianming Zhang; Nazli Ikizler-Cinbis; Stan Sclaroff

We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments and preserves their hierarchical and temporal relationships. Using a simple linear SVM on the resulting bag of hierarchical space-time segments, we attain action recognition performance better than, or comparable to, the state of the art on two challenging benchmark datasets, while also producing good action localization results.
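
To make the final classification step concrete, here is a hedged sketch of a linear SVM on a bag-of-segments representation: segment descriptors are quantized against a k-means codebook and pooled into a normalized histogram per video. The segment extraction and the two-level hierarchy themselves are not reproduced.

```python
# Classify a video from a bag of space-time segment descriptors via
# vector quantization + linear SVM (illustrative sketch only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
videos = [rng.normal(loc=i % 2, size=(rng.integers(20, 40), 8)) for i in range(30)]
labels = np.array([i % 2 for i in range(30)])

codebook = KMeans(n_clusters=16, n_init=4, random_state=0).fit(np.vstack(videos))

def bag_histogram(segments):
    words = codebook.predict(segments)
    hist = np.bincount(words, minlength=16).astype(float)
    return hist / hist.sum()                 # L1-normalize the bag representation

X = np.vstack([bag_histogram(v) for v in videos])
clf = LinearSVC().fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```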


Journal of Artificial Intelligence Research | 2016

Automatic description generation from images: a survey of models, datasets, and evaluation measures

Raffaella Bernardi; Ruket Cakici; Desmond Elliott; Aykut Erdem; Erkut Erdem; Nazli Ikizler-Cinbis; Frank Keller; Adrian Muscat; Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.
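
As a small worked example of the kind of evaluation measure such surveys review, the snippet below computes a smoothed sentence-level BLEU score with NLTK; the captions are made up, and BLEU is only one of several measures covered.

```python
# Compare a generated caption against reference captions with BLEU
# (assumes NLTK is installed: pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "man", "is", "riding", "a", "horse"],
              ["a", "person", "rides", "a", "brown", "horse"]]
candidate = ["a", "man", "rides", "a", "horse"]

score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```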


IEEE Transactions on Multimedia | 2012

Web-Based Classifiers for Human Action Recognition

Nazli Ikizler-Cinbis; Stan Sclaroff

Action recognition in uncontrolled videos is a challenging task, where it is relatively hard to find the large number of training videos required to model all the variations of the domain. This paper addresses this challenge and proposes a generic method for action recognition. The idea is to use images collected from the Web to learn representations of actions and leverage this knowledge to automatically annotate actions in videos. For this purpose, we first use an incremental image retrieval procedure to collect and clean up the training set needed to build the human pose classifiers. Our approach is unsupervised in the sense that it requires no human intervention other than text queries to an Internet search engine. Its benefits are two-fold: 1) we can improve the retrieval of action images, and 2) we can collect a large generic database of action poses, which can then be used in tagging videos. We present experimental evidence that, using action images collected from the Web, it is possible to annotate actions in videos. Additionally, we explore how the Web-based pose classifiers can be utilized in conjunction with a limited number of labelled videos. We propose to use “ordered pose pairs” (OPP) to encode the temporal ordering of poses in our action model, and show that considering the temporal ordering of pose pairs can increase action recognition accuracy. We also show that by selecting keyposes with the help of the Web-based classifiers, the classification time can be reduced. Our experiments demonstrate that, with or without available video data, the pose models learned from the Web can improve the performance of action recognition systems.
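
A simplified reading of the ordered pose pairs (OPP) idea: count how often pose label a is observed before pose label b in a video and use the normalized counts as a temporal-order feature. This is an illustrative interpretation, not the paper's exact encoding.

```python
# Encode the temporal ordering of per-frame pose labels as an
# "ordered pose pair" histogram (simplified, illustrative sketch).
import itertools
import numpy as np

def ordered_pose_pairs(pose_sequence, n_poses):
    """pose_sequence: pose-class labels in temporal order."""
    feat = np.zeros((n_poses, n_poses))
    for a, b in itertools.combinations(pose_sequence, 2):
        feat[a, b] += 1                     # pose a occurs before pose b
    total = feat.sum()
    return (feat / total).ravel() if total else feat.ravel()

print(ordered_pose_pairs([0, 2, 1, 2], n_poses=3))
```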


European Conference on Computer Vision Workshops | 2012

On recognizing actions in still images via multiple features

Fadime Sener; Cagdas Bas; Nazli Ikizler-Cinbis

We propose a multi-cue approach for recognizing human actions in still images, where relevant object regions are discovered and utilized in a weakly supervised manner. Our approach does not require any explicitly trained object detector or part/attribute annotation. Instead, a multiple instance learning approach is used over sets of object hypotheses in order to represent objects relevant to the actions. We test our method on the extensive Stanford 40 Actions dataset [1] and achieve a significant performance gain compared to the state of the art. Our results show that using multiple object hypotheses within multiple instance learning is effective for human action recognition in still images, and that such an object representation is suitable for use in conjunction with other visual features.
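
To illustrate how object hypotheses can be handled without region-level labels, the sketch below runs a mi-SVM-style alternation: pick the highest-scoring region per image, retrain a linear classifier, repeat. The data, iteration count and classifier choice are assumptions, not the paper's algorithm.

```python
# Weakly supervised region selection via a mi-SVM-style alternation
# (illustrative sketch): each image is a bag of candidate object regions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
bags = [rng.normal(size=(10, 12)) for _ in range(30)]
labels = np.array([0] * 15 + [1] * 15)
for b in bags[15:]:
    b[3] += 1.5                             # one relevant region per positive bag

sel = [rng.integers(10) for _ in bags]      # start from random region choices
for _ in range(5):
    X = np.vstack([b[s] for b, s in zip(bags, sel)])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    sel = [int(np.argmax(clf.decision_function(b))) for b in bags]

bag_scores = [clf.decision_function(b).max() for b in bags]
print("mean positive-bag score:", np.mean(bag_scores[15:]))
```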


European Conference on Computer Vision Workshops | 2012

Unsupervised learning of discriminative relative visual attributes

Shugao Ma; Stan Sclaroff; Nazli Ikizler-Cinbis

Unsupervised learning of relative visual attributes is important because it is often infeasible for a human annotator to predefine and manually label all the relative attributes in large datasets. We propose a method for learning relative visual attributes given a set of images for each training class. The method is unsupervised in the sense that it does not require a set of predefined attributes. We formulate the learning as a mixed-integer programming problem and propose an efficient algorithm to solve it approximately. Experiments show that the learned attributes can generalize well and tend to be more discriminative than hand-labeled relative attributes. While the attributes learned in the unsupervised setting do not have explicit names, many are highly correlated with human-annotated attributes, demonstrating that our method is able to discover relative attributes automatically.
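
For background, a relative attribute is commonly modeled as a linear ranking function fit from ordered pairs (RankSVM-style), as sketched below with synthetic data; the paper's contribution, discovering such attributes via a mixed-integer program, is not reproduced here.

```python
# Fit a linear relative-attribute ranker from ordered pairs using an SVM on
# difference vectors (background sketch; data and pairs are synthetic).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))
true_w = rng.normal(size=6)
strength = X @ true_w                       # hidden attribute strength

pairs = [(i, j) if strength[i] > strength[j] else (j, i)
         for i, j in rng.integers(0, 100, size=(200, 2)) if i != j]
diffs = np.array([X[i] - X[j] for i, j in pairs])
Xd = np.vstack([diffs, -diffs])             # symmetrize: +1 vs -1 orderings
yd = np.array([1] * len(diffs) + [-1] * len(diffs))

ranker = LinearSVC(fit_intercept=False).fit(Xd, yd)
corr = np.corrcoef(X @ ranker.coef_.ravel(), strength)[0, 1]
print(f"correlation with true attribute: {corr:.2f}")
```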


Journal of Visual Communication and Image Representation | 2015

Two-person interaction recognition via spatial multiple instance embedding

Fadime Sener; Nazli Ikizler-Cinbis

Highlights:
- A multiple-instance-based framework for two-person interaction recognition in videos.
- Relative distances between people are encoded within multiple instance learning.
- Two-person features are utilized in spatial multiple instance embedding.
- Our framework achieves results on par with or better than the state of the art.

In this work, we look into the problem of recognizing two-person interactions in videos. Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance learning framework. First, several visual features that capture the shape and motion of the interacting people are extracted from each detected person region in a video. Then, two-person visual descriptors are formed. Since the relative spatial locations of interacting people are likely to complement the visual descriptors, we propose to use spatial multiple instance embedding, which implicitly incorporates the distances between people into the multiple instance learning process. Experimental results on two benchmark datasets validate that using two-person visual descriptors together with spatial multiple instance learning offers an effective way of inferring the type of interaction.
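
A hedged sketch of what a two-person instance descriptor with spatial information might look like: two per-person feature vectors concatenated with their size-normalized relative offset. The exact descriptor composition is an assumption; a video's bag would hold one such instance per detected person pair.

```python
# Build a two-person instance descriptor that folds in relative spatial
# layout (illustrative; not the paper's exact descriptor).
import numpy as np

def pair_descriptor(feat_a, feat_b, box_a, box_b):
    """Boxes are (x1, y1, x2, y2); returns [feat_a, feat_b, relative offset]."""
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    scale = max(box_a[3] - box_a[1], 1e-6)          # person height as the unit
    offset = np.array([(cxb - cxa) / scale, (cyb - cya) / scale])
    return np.concatenate([feat_a, feat_b, offset])

fa, fb = np.ones(4), np.zeros(4)
print(pair_descriptor(fa, fb, box_a=(10, 20, 50, 120), box_b=(80, 25, 120, 130)))
```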


Pattern Recognition Letters | 2016

Facial descriptors for human interaction recognition in still images

Gokhan Tanisik; Cemil Zalluhoglu; Nazli Ikizler-Cinbis

Highlights:
- We explore the contribution of face features to interaction recognition in images.
- Several novel facial feature descriptors are proposed.
- A new dataset for human interaction recognition in still images is collected.
- Scene and Convolutional Neural Network (CNN) features are also explored.
- Experiments show that faces provide complementary information for recognition.

This paper presents a novel approach in a rarely studied area of computer vision: human interaction recognition in still images. We explore whether the facial regions and their spatial configurations contribute to the recognition of interactions. To this end, our method involves the extraction of several visual features from the facial regions, as well as the incorporation of scene characteristics and deep features into the recognition. The extracted features are utilized within a discriminative learning framework for recognizing interactions between people. Our facial descriptors are based on the observation that the relative positions, sizes and locations of faces are likely to be important for characterizing human interactions. Since there is no available dataset in this relatively new domain, we collected a comprehensive new dataset containing several images of human interactions. Our experimental results show that faces and scene characteristics contain important information for recognizing interactions between people.
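
The snippet below sketches plausible facial-layout cues of the kind described above (size-normalized inter-face distance, relative face scale, pair orientation); the paper's actual descriptors may differ.

```python
# Compute simple layout cues for a pair of face bounding boxes
# (illustrative sketch of facial-configuration features).
import math

def face_layout(face_a, face_b):
    """Each face: (x, y, w, h) bounding box. Returns layout cues for the pair."""
    ax, ay = face_a[0] + face_a[2] / 2, face_a[1] + face_a[3] / 2
    bx, by = face_b[0] + face_b[2] / 2, face_b[1] + face_b[3] / 2
    mean_size = (face_a[2] + face_b[2]) / 2
    distance = math.hypot(bx - ax, by - ay) / mean_size   # size-normalized
    scale_ratio = face_a[2] / face_b[2]
    angle = math.atan2(by - ay, bx - ax)                  # pair orientation
    return distance, scale_ratio, angle

print(face_layout((100, 80, 40, 40), (220, 95, 50, 50)))
```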


Pattern Recognition Letters | 2016

Low-level features for visual attribute recognition

Emine Gul Danaci; Nazli Ikizler-Cinbis

Highlights:
- A comprehensive evaluation of the use of low-level features in attribute recognition.
- Several color, texture, shape and deep (CNN) features are evaluated.
- Experiments show that the best feature may vary across attribute types.
- Although CNN features outperform the others, HOG and CSIFT are also competitive.
- Weighted late fusion is a more effective strategy for combining low-level features.

In recent years, visual attributes, which are mid-level representations that describe human-understandable aspects of objects and scenes, have become a popular topic of computer vision research. Visual attributes are being used in various tasks, including object recognition, people search, scene recognition, and many more. A critical step in attribute recognition is the extraction of low-level features, which encode the local visual characteristics of images and provide the representation used in the attribute prediction step. In this work, we explore the effects of utilizing different low-level features on learning visual attributes. In particular, we analyze the performance of various shape, color, texture and deep neural network features. Experiments have been carried out on four different datasets, collected for different visual recognition tasks, and extensive evaluations are reported. Our results show that, while supervised deep features are effective, using them in combination with low-level features can lead to significant improvements in attribute recognition performance.
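
Weighted late fusion is straightforward to sketch: one classifier per feature channel, each score weighted before summing. Here the weights come from training accuracy as a stand-in for held-out validation accuracy; the data and channels below are illustrative.

```python
# Weighted late fusion of per-feature-channel classifier scores
# (illustrative sketch; the weighting scheme is an assumption).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=200)
# Two toy "feature channels" with different signal strength (e.g., HOG vs color).
channels = [rng.normal(size=(200, 10)) + 0.8 * y[:, None],
            rng.normal(size=(200, 10)) + 0.3 * y[:, None]]

tr0, te0, tr1, te1, ytr, yte = train_test_split(
    *channels, y, test_size=0.5, random_state=0)

fused, total_w = np.zeros(len(yte)), 0.0
for tr, te in [(tr0, te0), (tr1, te1)]:
    clf = LogisticRegression(max_iter=1000).fit(tr, ytr)
    w = clf.score(tr, ytr)                  # stand-in for validation accuracy
    fused += w * clf.predict_proba(te)[:, 1]
    total_w += w

pred = (fused / total_w) > 0.5
print("fused accuracy:", (pred == yte).mean())
```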

Collaboration


Dive into Nazli Ikizler-Cinbis's collaborations.

Top Co-Authors

Ruket Cakici

Middle East Technical University
