Ramazan Gokberk Cinbis

Publication

Featured research published by Ramazan Gokberk Cinbis.

Computer Vision and Pattern Recognition | 2014

Multi-fold MIL Training for Weakly Supervised Object Localization

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when high-dimensional representations, such as Fisher vectors, are used. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset. Compared to state-of-the-art weakly supervised detectors, our approach better localizes objects in the training images, which translates into improved detection performance.
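
The multi-fold procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each positive image is assumed to come with a fixed array of candidate-window feature vectors, every image starts from window 0 (for instance a window covering the whole image), and a ridge-regression linear scorer stands in for the real detector training (no hard-negative mining). The helper names are hypothetical. The key point is that windows in each fold are re-localized only by a detector trained on the other folds.

```python
import numpy as np


def train_linear_scorer(pos, neg, reg=1e-2):
    """Hypothetical stand-in detector: ridge regression onto +1/-1 labels."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def window_scores(w, feats):
    """Linear detector score for each candidate window (bias is the last weight)."""
    return np.hstack([feats, np.ones((len(feats), 1))]) @ w


def multifold_mil(pos_windows, neg_feats, n_folds=3, n_iters=5, seed=0):
    """pos_windows: list of (n_windows_i, d) arrays, one per positive image.
    Returns the index of the selected window in each positive image."""
    rng = np.random.default_rng(seed)
    n_img = len(pos_windows)
    selected = np.zeros(n_img, dtype=int)  # assumption: start from window 0
    folds = np.array_split(rng.permutation(n_img), n_folds)
    for _ in range(n_iters):
        new_selected = selected.copy()
        for fold in folds:
            held_out = set(fold.tolist())
            # Train only on the current windows of images outside this fold.
            train_pos = np.array([pos_windows[i][selected[i]]
                                  for i in range(n_img) if i not in held_out])
            w = train_linear_scorer(train_pos, neg_feats)
            # Re-localize in the held-out images, so the detector cannot
            # simply re-select the windows it was trained on.
            for i in fold:
                new_selected[i] = int(np.argmax(window_scores(w, pos_windows[i])))
        selected = new_selected
    return selected
```

With a standard single-fold loop, a high-dimensional detector can nearly memorize its own training windows and keep selecting them; holding out each fold during re-localization is what counters that lock-in.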

International Conference on Computer Vision | 2013

Segmentation Driven Object Detection with Fisher Vectors

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
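
Mask-based re-weighting of local features can be illustrated with a weighted Fisher vector. The sketch below assumes a diagonal-covariance GMM with known parameters and treats per-descriptor mask values in [0, 1] as weights on each descriptor's contribution to the standard first- and second-order FV statistics. The function name is hypothetical, and the exact weighting scheme used in the paper may differ.

```python
import numpy as np


def weighted_fisher_vector(X, weights, pi, mu, sigma):
    """X: (N, D) local descriptors; weights: (N,) mask values in [0, 1].
    pi: (K,) mixture weights; mu, sigma: (K, D) means and standard deviations."""
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]  # (N, K, D)
    # Log of pi_k * N(x | mu_k, sigma_k), dropping the constant shared by all k.
    log_p = (np.log(pi)[None, :]
             - np.log(sigma).sum(axis=1)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # soft assignments, (N, K)
    # Down-weight descriptors that the mask marks as likely background.
    gw = gamma * weights[:, None]
    total = weights.sum()
    g_mu = (gw[:, :, None] * diff).sum(axis=0) / (total * np.sqrt(pi)[:, None])
    g_sigma = (gw[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (
        total * np.sqrt(2 * pi)[:, None])
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

A weight of 0 removes a descriptor entirely, while fractional weights let uncertain mask regions contribute partially.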

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.

International Conference on Computer Vision | 2011

Unsupervised metric learning for face identification in TV video

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

The goal of face identification is to decide whether two faces depict the same person or not. This paper addresses the identification problem for face-tracks that are automatically collected from uncontrolled TV video data. Face-track identification is an important component in systems that automatically label characters in TV series or movies based on subtitles and/or scripts: it enables effective transfer of the sparse text-based supervision to other faces. We show that, without manually labeling any examples, metric learning can be effectively used to address this problem. This is possible by using pairs of faces within a track as positive examples, while negative training examples can be generated from pairs of face tracks of different people that appear together in a video frame. In this manner we can learn a cast-specific metric, adapted to the people appearing in a particular video, without using any supervision. Identification performance can be further improved using semi-supervised learning where we also include labels for some of the face tracks. We show that our cast-specific metrics not only improve identification, but also recognition and clustering.
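
The pair-generation step described above can be sketched as follows. It assumes each face track is given as a list of (frame, face id) detections; this data layout and the function name are hypothetical. Positive pairs come from faces within the same track, and negative pairs come from faces of two tracks that share at least one frame.

```python
from itertools import combinations


def training_pairs(tracks):
    """tracks: dict mapping track id -> list of (frame, face_id) detections.
    Returns (positive_pairs, negative_pairs) of face ids."""
    positives = []
    for faces in tracks.values():
        # Faces within one track are assumed to show the same person.
        positives.extend(combinations([f for _, f in faces], 2))
    negatives = []
    for (_, fa), (_, fb) in combinations(tracks.items(), 2):
        # Two tracks visible in the same frame must show different people.
        if {fr for fr, _ in fa} & {fr for fr, _ in fb}:
            negatives.extend((x, y) for _, x in fa for _, y in fb)
    return positives, negatives
```

These pairs can then be fed to any pairwise metric-learning method in place of manually labeled pairs.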

International Conference on Pattern Recognition | 2008

Recognizing actions from still images

Nazlı İkizler; Ramazan Gokberk Cinbis; Selen Pehlivan; Pinar Duygulu

In this paper, we approach the problem of understanding human actions from still images. Our method involves representing the pose with a spatial and orientational histogram of rectangular regions on a parse probability map. We use LDA to obtain a more compact and discriminative feature representation and binary SVMs for classification. Our results on a new dataset collected for this problem show that by using a rectangle histogramming approach, we can discriminate actions to a great extent. We also show how we can use this approach in an unsupervised setting. To the best of our knowledge, this is one of the first studies that try to recognize actions within still images.
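
A spatial and orientational histogram of rectangular regions can be sketched as below. The input format (rectangle centers normalized to [0, 1), orientations in radians), the grid size, the number of orientation bins, and the per-rectangle weights (for example, a parse-map response) are all assumptions for illustration.

```python
import math


def rectangle_histogram(rects, grid=3, n_orient=4):
    """rects: iterable of (cx, cy, theta, weight) with cx, cy in [0, 1).
    Returns a normalized flat histogram of length grid * grid * n_orient."""
    hist = [0.0] * (grid * grid * n_orient)
    for cx, cy, theta, weight in rects:
        gx = min(int(cx * grid), grid - 1)  # spatial cell column
        gy = min(int(cy * grid), grid - 1)  # spatial cell row
        # A rectangle's orientation is only defined modulo pi.
        t = (theta % math.pi) / math.pi
        o = min(int(t * n_orient), n_orient - 1)
        hist[(gy * grid + gx) * n_orient + o] += weight
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

The resulting fixed-length vector is what a dimensionality reduction step such as LDA and a classifier such as an SVM would consume.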

International Conference on Pattern Recognition | 2008

Human action recognition with line and flow histograms

Nazlı İkizler; Ramazan Gokberk Cinbis; Pinar Duygulu

We present a compact representation for human action recognition in videos using line and optical flow histograms. We introduce a new shape descriptor based on the distribution of lines which are fitted to boundaries of human figures. By using an entropy-based approach, we apply feature selection to densify our feature representation, thus minimizing classification time without degrading accuracy. We also use a compact representation of optical flow for motion information. Using line and flow histograms together with global velocity information, we show that high-accuracy action recognition is possible, even in challenging recording conditions.
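
One possible form of entropy-based feature selection is sketched below. For each feature dimension, it forms the distribution of that dimension's class-mean values across action classes and keeps the dimensions with the lowest entropy, that is, those that respond mostly to a few classes. This criterion and the function name are assumptions for illustration; the paper's exact selection rule is not reproduced here.

```python
import math


def select_low_entropy_features(X, y, keep):
    """X: list of non-negative feature vectors (e.g. histograms); y: class labels.
    Returns the sorted indices of the `keep` lowest-entropy dimensions."""
    classes = sorted(set(y))
    dim = len(X[0])
    entropies = []
    for j in range(dim):
        # Mean value of dimension j within each class.
        means = []
        for c in classes:
            vals = [x[j] for x, label in zip(X, y) if label == c]
            means.append(sum(vals) / len(vals))
        total = sum(means)
        if total == 0:
            entropies.append(math.inf)  # never active: least useful
            continue
        probs = [m / total for m in means]
        entropies.append(-sum(q * math.log(q) for q in probs if q > 0))
    ranked = sorted(range(dim), key=lambda j: entropies[j])
    return sorted(ranked[:keep])
```

Dropping high-entropy dimensions shortens the descriptor, which reduces classification time roughly in proportion to the number of features removed.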

Computer Vision and Pattern Recognition | 2012

Image categorization using Fisher kernels of non-iid image models

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

The bag-of-words (BoW) model treats images as an unordered set of local regions and represents them by visual word histograms. Implicitly, regions are assumed to be independently and identically distributed (iid), which is a poor assumption from a modeling perspective. We introduce non-iid models by treating the parameters of BoW models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel, we encode an image by the gradient of the data log-likelihood w.r.t. hyper-parameters that control priors on the model parameters. Our representation naturally involves discounting transformations similar to taking square-roots, providing an explanation of why such transformations have proven successful. Using variational inference, we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words, both improving performance. Our models yield state-of-the-art categorization performance using linear classifiers, without using non-linear transformations such as taking square-roots of features, or using (approximate) explicit embeddings of non-linear kernels.
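
The discounting effect can be made concrete for the simplest non-iid case, assuming a Dirichlet prior with hyper-parameters alpha on the visual-word distribution, with the word probabilities integrated out. For an image with word counts n_k (N = sum of n_k, a_hat = sum of alpha_k), the gradient of the log-likelihood w.r.t. alpha_k is psi(alpha_k + n_k) - psi(alpha_k) + psi(a_hat) - psi(a_hat + N), where psi is the digamma function. The sketch evaluates this using the identity psi(a + n) - psi(a) = sum over j = 0..n-1 of 1/(a + j) for integer n. It shows the raw gradient only, without any Fisher-information normalization, and the function names are hypothetical.

```python
def digamma_diff(a, n):
    """psi(a + n) - psi(a) for a > 0 and integer n >= 0."""
    return sum(1.0 / (a + j) for j in range(n))


def dirichlet_fisher_score(counts, alpha):
    """Gradient of log p(counts | alpha) w.r.t. each alpha_k for a
    Dirichlet-multinomial (BoW with integrated-out word probabilities)."""
    a_hat = sum(alpha)
    n_total = sum(counts)
    shared = -digamma_diff(a_hat, n_total)  # psi(a_hat) - psi(a_hat + N)
    return [digamma_diff(a, n) + shared for a, n in zip(alpha, counts)]
```

The count-dependent term grows roughly like log(n_k) rather than linearly, so repeated occurrences of a word contribute less and less. This is the same qualitative behavior that square-rooting the counts produces.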

European Conference on Computer Vision | 2012

Contextual object detection using set-based classification

Ramazan Gokberk Cinbis; Stan Sclaroff

We propose a new model for object detection that is based on set representations of the contextual elements. In this formulation, relative spatial locations and relative scores between pairs of detections are considered as sets of unordered items. Directly training classification models on sets of unordered items, where each set can have a varying cardinality, can be difficult. To overcome this problem, we propose SetBoost, a discriminative learning algorithm for building set classifiers. The SetBoost classifiers are trained to rescore detected objects based on object-object and object-scene context. Our method is able to discover composite relationships, as well as intra-class and inter-class spatial relationships between objects. The experimental evidence shows that our set-based formulation performs comparably to or better than existing contextual methods on the SUN and the VOC 2007 benchmark datasets.
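
The set representation of context can be illustrated by building, for each detection, the unordered set of its relations to all other detections in the same image. The particular relation features used here are assumptions for illustration: center offsets normalized by the detection's size, log area ratio, and score difference. SetBoost itself, which learns classifiers on such sets, is not reproduced.

```python
import math


def context_sets(detections):
    """detections: list of (x, y, w, h, score) boxes with top-left corner (x, y).
    Returns one list (an unordered set of relation tuples) per detection."""
    sets = []
    for i, (xi, yi, wi, hi, si) in enumerate(detections):
        cxi, cyi = xi + wi / 2, yi + hi / 2
        items = []
        for j, (xj, yj, wj, hj, sj) in enumerate(detections):
            if i == j:
                continue
            dx = (xj + wj / 2 - cxi) / wi  # relative horizontal offset
            dy = (yj + hj / 2 - cyi) / hi  # relative vertical offset
            scale = math.log((wj * hj) / (wi * hi))  # relative size
            items.append((dx, dy, scale, sj - si))  # plus relative score
        sets.append(items)
    return sets
```

Each set has one item per other detection, so its cardinality changes from image to image. This is why a learner that operates directly on sets is needed instead of a fixed-length feature vector.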

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Approximate Fisher Kernels of Non-iid Image Models for Image Categorization

Ramazan Gokberk Cinbis; Jakob J. Verbeek; Cordelia Schmid

The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW by considering the first and second order statistics of local descriptors. In both representations, local descriptors are assumed to be independently and identically distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved by employing discounting transformations such as power normalization. In this paper, we introduce non-iid models by treating the model parameters as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel principle, we encode an image by the gradient of the data log-likelihood w.r.t. the model hyper-parameters. Our models naturally generate discounting effects in the representations, suggesting that such transformations have proven successful because they closely correspond to the representations obtained for non-iid models. To enable tractable computation, we rely on variational free-energy bounds to learn the hyper-parameters and to compute approximate Fisher kernels. Our experimental evaluation results validate that our models lead to performance improvements comparable to using power normalization, as employed in state-of-the-art feature aggregation methods.
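
Power normalization, the discounting transformation this abstract compares against, is commonly applied element-wise as sign(z) * |z|^rho and followed by L2 normalization. Setting rho = 0.5 gives the signed square root. The minimal sketch below uses that default as an illustrative choice.

```python
import math


def power_normalize(z, rho=0.5):
    """Element-wise signed power sign(z) * |z|**rho, then L2 normalization."""
    p = [math.copysign(abs(v) ** rho, v) for v in z]
    norm = math.sqrt(sum(v * v for v in p))
    return [v / norm for v in p] if norm > 0 else p
```

Smaller values of rho compress large entries more strongly, while rho = 1 reduces the transformation to plain L2 normalization.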

International Conference on Image Processing | 2007

Relative Position-Based Spatial Relationships using Mathematical Morphology

Ramazan Gokberk Cinbis; Selim Aksoy

Spatial information is a crucial aspect of image understanding for modeling context as well as resolving the uncertainties caused by the ambiguities in low-level features. We describe intuitive, flexible and efficient methods for modeling pairwise directional spatial relationships and the ternary between relation using fuzzy mathematical morphology. First, a fuzzy landscape is constructed where each point is assigned a value that quantifies its relative position according to the reference object(s) and the type of the relationship. Then, the degree of satisfaction of this relation by a target object is computed by integrating the corresponding landscape over the support of the target region. Our models can be made sensitive to visibility, to handle areas that are partially enclosed by objects and are not visible from image points along the direction of interest. They can also cope with cases where one object is significantly spatially extended relative to others. Experiments using synthetic and real images show that our models produce more intuitive results than other techniques.
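
The landscape-then-integrate procedure can be sketched on a small binary grid. This brute-force version uses a common directional landscape in which a point's value is max(0, 1 - 2 * angle / pi), maximized over reference points. Here the angle is measured between the direction of interest and the vector from a reference point to that point. The sketch does not implement the paper's fuzzy morphological operators, its visibility handling, or the ternary between relation, and the function names are hypothetical.

```python
import numpy as np


def directional_landscape(reference, direction):
    """reference: 2-D boolean mask; direction: angle in radians (0 = to the right).
    Returns, for every pixel, how well it lies in `direction` of the reference."""
    u = np.array([np.cos(direction), -np.sin(direction)])  # rows grow downward
    rows, cols = np.indices(reference.shape)
    landscape = np.zeros(reference.shape)
    for r, c in zip(*np.nonzero(reference)):
        v = np.stack([cols - c, rows - r], axis=-1).astype(float)  # (x, y) offsets
        norm = np.linalg.norm(v, axis=-1)
        cos = np.where(norm > 0, (v @ u) / np.maximum(norm, 1e-12), 1.0)
        angle = np.arccos(np.clip(cos, -1.0, 1.0))
        value = np.clip(1.0 - 2.0 * angle / np.pi, 0.0, 1.0)
        value[norm == 0] = 0.0  # a reference point is not beside itself
        landscape = np.maximum(landscape, value)
    return landscape


def relation_degree(reference, target, direction):
    """Average landscape value over the target region's support."""
    return float(directional_landscape(reference, direction)[target].mean())
```

This direct version costs one full-image pass per reference pixel, which is the kind of expense that morphological formulations of the landscape are typically used to avoid.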
