Serhan Cosar
Sabancı University
Publications
Featured research published by Serhan Cosar.
IEEE Transactions on Circuits and Systems for Video Technology | 2017
Serhan Cosar; Giuseppe Donatiello; Vania Bogorny; Carolina Garate; Luis Otavio Alvares; Francois Bremond
In this paper, we present a unified approach for abnormal behavior detection and group behavior analysis in video scenes. Existing approaches for abnormal behavior detection use either trajectory-based or pixel-based methods. Unlike these approaches, we propose an integrated pipeline that incorporates the output of object trajectory analysis and pixel-based analysis for abnormal behavior inference. This enables the detection of abnormal behaviors related to the speed and direction of object trajectories, as well as complex behaviors related to the finer motion of each object. By applying our approach to three different datasets, we show that it is able to detect several types of abnormal group behaviors with fewer false alarms than existing approaches.
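As a rough illustration of how such a fusion can work, here is a minimal sketch in Python (the score definitions, thresholds, and fusion rule are hypothetical stand-ins, not the paper's actual pipeline): a trajectory-level score flags unusual speed, a pixel-level score flags unusual local motion, and either cue can raise an alarm.

```python
import numpy as np

def trajectory_score(traj, mean_speed, std_speed):
    """Anomaly score from the speed statistics of a trajectory (T x 2 positions)."""
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    # Normalized deviation of observed mean speed from the learned normal model
    return abs(speeds.mean() - mean_speed) / (std_speed + 1e-8)

def pixel_score(flow_mag, normal_max):
    """Anomaly score from dense motion magnitudes inside the object's region."""
    return max(0.0, flow_mag.mean() / (normal_max + 1e-8) - 1.0)

def is_abnormal(traj, flow_mag, mean_speed=2.0, std_speed=0.5,
                normal_max=1.5, threshold=1.0):
    # Fuse the two cues: either one can trigger an alarm
    s = max(trajectory_score(traj, mean_speed, std_speed),
            pixel_score(flow_mag, normal_max))
    return s > threshold, s

# Toy example: a fast, erratic trajectory with strong local motion
traj = np.cumsum(np.random.randn(30, 2) * 5.0, axis=0)
flow = np.abs(np.random.randn(16, 16)) * 3.0
print(is_abnormal(traj, flow))
```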
Image and Vision Computing | 2011
Serhan Cosar; Müjdat Çetin
In this paper, a facial feature point tracker motivated by applications such as human-computer interfaces and facial expression analysis systems is proposed. The proposed tracker is based on a graphical model framework. The facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated on real video data under various conditions, including occluded facial gestures and head movements. It is also compared to two popular methods: one based on Kalman filtering exploiting temporal relations, and the other based on active appearance models (AAM). Improvements provided by the proposed approach are demonstrated through both visual displays and quantitative analysis.
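The interplay of temporal and spatial relations can be pictured with a toy update step (an illustrative simplification, not the paper's graphical-model inference; the blending weight, anchor choice, and distance model are assumptions): visible points blend prediction with measurement, while occluded points are placed relative to visible neighbors using learned inter-point distances.

```python
import numpy as np

def track_step(prev_pts, measurements, pair_dists, occluded, alpha=0.5):
    """One tracking step for N facial feature points.

    prev_pts, measurements: (N, 2) arrays; pair_dists: (N, N) learned
    inter-point distances; occluded: boolean mask from an occlusion detector.
    Assumes at least one point is visible.
    """
    pts = prev_pts.copy()
    vis = ~occluded
    # Temporal relation: blend prediction with measurement for visible points
    pts[vis] = (1 - alpha) * prev_pts[vis] + alpha * measurements[vis]
    # Spatial relation: place each occluded point relative to a visible
    # anchor using the learned inter-point distance
    for i in np.where(occluded)[0]:
        j = np.where(vis)[0][0]
        direction = prev_pts[i] - prev_pts[j]
        direction /= np.linalg.norm(direction) + 1e-8
        pts[i] = pts[j] + pair_dists[i, j] * direction
    return pts

pts0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
meas = pts0 + 0.1
dists = np.linalg.norm(pts0[:, None] - pts0[None, :], axis=-1)
occ = np.array([False, False, True])   # third point occluded
print(track_step(pts0, meas, dists, occ))
```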
Journal of Visual Communication and Image Representation | 2014
Serhan Cosar; Müjdat Çetin
Visual sensor networks (VSNs) consist of image sensors, embedded processors and wireless transceivers which are powered by batteries. Since the energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. In this paper, we present a framework for human tracking in VSNs. The traditional approach of sending compressed images to a central node has certain disadvantages, such as degrading the performance of further processing (i.e., tracking) because of low-quality images. Instead, we propose a feature compression-based decentralized tracking framework that is better matched with the further inference goal of tracking. In our method, each camera performs feature extraction and obtains likelihood functions. These likelihood functions are compressed by transforming them to an appropriate domain and keeping only the significant coefficients, and this new representation is sent to the fusion node. As a result, communication in the network is reduced without significantly affecting the tracking performance. The appropriate domain is selected by comparing well-known transforms. We have applied our method to indoor people tracking and demonstrated the superiority of our system over the traditional approach and a decentralized approach that uses a Kalman filter.
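A minimal sketch of the transform-and-truncate idea (assuming the DCT as the chosen transform; the paper selects the domain by comparing several transforms, and the likelihood shape here is synthetic):

```python
import numpy as np
from scipy.fft import dct, idct

def compress_likelihood(likelihood, k):
    """Keep only the k largest-magnitude DCT coefficients of a 1-D likelihood."""
    coeffs = dct(likelihood, norm='ortho')
    idx = np.argsort(np.abs(coeffs))[-k:]     # significant coefficients
    return idx, coeffs[idx]                   # what the camera transmits

def reconstruct_likelihood(idx, values, n):
    """Fusion-node side: rebuild the likelihood from the sparse coefficients."""
    coeffs = np.zeros(n)
    coeffs[idx] = values
    return idct(coeffs, norm='ortho')

# Example: a smooth, peaked likelihood over 128 image columns
x = np.linspace(0, 1, 128)
lik = np.exp(-((x - 0.3) ** 2) / 0.005)
idx, vals = compress_likelihood(lik, k=10)
rec = reconstruct_likelihood(idx, vals, len(lik))
print('relative error:', np.linalg.norm(lik - rec) / np.linalg.norm(lik))
```

Because the likelihood is smooth, a handful of coefficients preserves it almost exactly, which is what makes the bandwidth savings possible.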
international conference on computer vision | 2011
Serhan Cosar; Müjdat Çetin
In this paper, a novel 3-D action recognition method based on sparse representation is presented. Silhouette images from multiple cameras are combined to obtain motion history volumes (MHVs). The cylindrical Fourier transform of the MHVs is used as the action descriptor. We assume that a test sample has a sparse representation in the space of training samples. We cast the action classification problem as an optimization problem and classify actions using group sparsity based on l1 regularization. We show experimental results using the IXMAS multi-view database and demonstrate the superiority of our method, especially when observations are low-resolution, occluded, and noisy, and when the feature dimension is reduced.
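The classification step follows the familiar sparse-representation pattern. The sketch below uses a plain l1 (Lasso) solver and per-class residuals rather than the paper's group-sparsity formulation, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, alpha=0.01):
    """Sparse-representation classification: code the test sample y in the
    dictionary of training samples A (columns), then pick the class whose
    coefficients best reconstruct y."""
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(A, y)
    x = model.coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)    # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ xc)
    return min(residuals, key=residuals.get)

# Toy data: two classes of 64-D descriptors, 10 training samples each
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0, 1, (64, 10)) + m for m in (0.0, 3.0)])
labels = np.array([0] * 10 + [1] * 10)
y = rng.normal(3.0, 1.0, 64)                  # resembles class 1
print(src_classify(A, labels, y))
```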
Archive | 2009
Hüseyin Abut; Hakan Erdogan; Aytül Erçil; Baran Çürüklü; Hakkı Can Koman; Fatih Taş; Ali Özgür Argunşah; Serhan Cosar; Batu Akan; Harun Karabalkan; Emrecan Çökelek; Rahmi Fıçıcı; Volkan Sezer; Serhan Danis; Mehmet Karaca; Mehmet Abbak; Mustafa Gökhan Uzunbas; Kayhan Eritmen; Mümin Imamoğlu; Cagatay Karabat
In this chapter, we present data collection activities and preliminary research findings from the real-world database collected with “UYANIK,” a passenger car instrumented with several sensors, a CAN-Bus data logger, cameras, microphones, data acquisition systems, computers, and support systems. Within the shared frameworks of the Drive-Safe Consortium (Turkey) and the NEDO (Japan) International Collaborative Research on Driving Behavior Signal Processing, close to 16 TB of driver behavior, vehicular, and road data have been collected from more than 100 drivers on a 25 km route consisting of both city roads and the Trans-European Motorway (TEM) in Istanbul, Turkey. The challenge of collecting data in a metropolis of around 12 million people, known for extremely limited infrastructure and driving behavior that defies all rules and regulations to the point of madness, could not be “painless.” Nevertheless, both the experience gained and the preliminary results from still-ongoing studies using the database are very encouraging.
Iet Computer Vision | 2015
Salma Elloumi; Serhan Cosar; Guido Pusiol; Francois Bremond; Monique Thonnat
In this study, the authors propose a complete framework based on a hierarchical activity model to understand and recognise activities of daily living in unstructured scenes. At each particular time of a long-time video, the framework extracts a set of space-time trajectory features describing the global position of an observed person and the motion of his/her body parts. Human motion information is gathered in a new feature that the authors call perceptual feature chunks (PFCs). The set of PFCs is used to learn, in an unsupervised way, particular regions of the scene (topology) where the important activities occur. Using topologies and PFCs, the video is broken into a set of small events (‘primitive events’) that have a semantic meaning. The sequences of ‘primitive events’ and topologies are used to construct hierarchical models for activities. The proposed approach has been tested in a medical application for monitoring patients suffering from Alzheimer’s disease and dementia. The authors have compared their approach to their previous study and a rule-based approach. Experimental results show that the framework achieves better performance than existing work and has the potential to be used as a monitoring tool in medical applications.
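One way to picture the unsupervised topology step (using k-means as a stand-in clustering method; the region count, data, and event definition here are assumptions, not the paper's exact algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_topology(pfc_positions, n_regions=5):
    """Cluster chunk positions into scene regions where activities occur."""
    km = KMeans(n_clusters=n_regions, n_init=10, random_state=0)
    km.fit(pfc_positions)
    return km

def primitive_events(km, pfc_positions):
    """Map each chunk to a region; a region change marks a primitive event."""
    regions = km.predict(pfc_positions)
    return [(regions[i], regions[i + 1])
            for i in range(len(regions) - 1)
            if regions[i] != regions[i + 1]]

# Toy data: trajectory chunks concentrated around two scene regions
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
km = learn_topology(pts, n_regions=2)
# A short walk from one region to the other yields one primitive event
walk = np.vstack([rng.normal(0, 1, (3, 2)), rng.normal(8, 1, (3, 2))])
print(primitive_events(km, walk))
```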
advanced video and signal based surveillance | 2016
Farhood Negin; Michal Koperski; Carlos Fernando Crispim; Francois Bremond; Serhan Cosar; Konstantinos Avgerinakis
Many supervised approaches report state-of-the-art results for recognizing short-term actions in manually clipped videos by utilizing fine body motion information. The main downside of these approaches is that they are not applicable in real-world settings. The challenge is different when it comes to unstructured scenes and long-term videos. Unsupervised approaches have been used to model long-term activities, but their main pitfall is a limited ability to handle subtle differences between similar activities, since they mostly use global motion information. In this paper, we present a hybrid approach for long-term human activity recognition that recognizes activities more precisely than unsupervised approaches. It enables the processing of long-term videos by automatically clipping them and performing online recognition. The performance of our approach has been tested on two Activities of Daily Living (ADL) datasets. Experimental results are promising compared to existing approaches.
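As a rough picture of automatic clipping for online recognition (the sliding-window scheme, per-frame scores, and threshold below are illustrative assumptions, not the paper's method):

```python
import numpy as np

def clip_and_recognize(frame_scores, window=30, step=10, threshold=0.6):
    """Toy online clipping: slide a window over per-frame activity scores
    (e.g., from an unsupervised model) and emit a clip whenever the mean
    score crosses a threshold, ready for finer supervised recognition."""
    clips = []
    for start in range(0, len(frame_scores) - window + 1, step):
        seg = frame_scores[start:start + window]
        if np.mean(seg) > threshold:
            clips.append((start, start + window))
    return clips

# A long video with one active segment in the middle
scores = np.concatenate([np.full(50, 0.2), np.full(40, 0.9), np.full(50, 0.1)])
print(clip_and_recognize(scores))
```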
advanced video and signal based surveillance | 2013
Serhan Cosar; Müjdat Çetin
In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors and wireless transceivers which are powered by batteries. Since the energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people tracking scenarios. Our experimental results demonstrate how our approach leads to communication savings without significant loss in tracking performance.
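A minimal sketch of sparse coding in an overcomplete dictionary (using orthogonal matching pursuit and a random dictionary for illustration; the paper designs special structured dictionaries matched to the features):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def encode_feature(D, f, n_nonzero=8):
    """Sparse-code a feature vector f in an overcomplete dictionary D (d x K)."""
    return orthogonal_mp(D, f, n_nonzero_coefs=n_nonzero)

# Toy overcomplete dictionary: 64-D features, 256 atoms
rng = np.random.default_rng(1)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
f = D[:, [3, 40, 100]] @ np.array([1.0, -0.5, 2.0])   # a truly sparse feature
code = encode_feature(D, f)
nz = np.flatnonzero(code)
print('transmitted coefficients:', nz, code[nz])      # only these cross the network
```

When the dictionary matches the feature structure, a few indices and values suffice to represent each feature, which is where the communication savings come from.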
Sensors | 2017
Carlos Fernando Crispim-Junior; Alvaro Gómez Uría; Carola Strumia; Michal Koperski; Alexandra König; Farhood Negin; Serhan Cosar; Anh Tuan Nghiem; Duc Phu Chau; Guillaume Charpiat; Francois Bremond
Visual activity recognition plays a fundamental role in several research fields as a way to extract semantic meaning from images and videos. Prior work has mostly focused on classification tasks, where a label is given for a video clip. However, real-life scenarios require a method to browse a continuous video flow, automatically identify relevant temporal segments and classify them according to target activities. This paper proposes a knowledge-driven event recognition framework to address this problem. The novelty of the method lies in the combination of a constraint-based ontology language for event modeling with robust algorithms to detect, track and re-identify people using color-depth sensing (Kinect® sensor). This combination makes it possible to model and recognize longer and more complex events and to incorporate domain knowledge and 3D information into the same models. Moreover, the ontology-driven approach enables human understanding of system decisions and facilitates knowledge transfer across different scenes. The proposed framework is evaluated with real-world recordings of seniors carrying out unscripted, daily activities at hospital observation rooms and nursing homes. Results demonstrate that the proposed framework outperforms state-of-the-art methods in a variety of activities and datasets, and that it is robust to variable and low-frame-rate recordings. Further work will investigate how to extend the proposed framework with uncertainty management techniques to handle strong occlusion and ambiguous semantics, and how to exploit it to further support medical practice in the timely diagnosis of cognitive disorders, such as Alzheimer’s disease.
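To give a flavour of constraint-based event modeling (the event name, zone label, and duration constraint below are hypothetical; the paper uses a dedicated ontology language rather than plain Python):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    person_id: int
    zone: str       # scene zone from 3-D tracking and the scene model
    t: float        # timestamp in seconds

def recognize_person_at_table(track, min_duration=10.0):
    """Toy constraint-based event: 'PersonAtTable' holds if the tracked
    person stays in zone 'table' for at least min_duration seconds."""
    in_zone = [d for d in track if d.zone == 'table']
    if not in_zone:
        return False
    return (in_zone[-1].t - in_zone[0].t) >= min_duration

track = [Detection(1, 'table', float(t)) for t in range(15)]
print(recognize_person_at_table(track))
```

The appeal of such declarative constraints is that a domain expert can read, audit, and adapt them, which is the human-understandability property the abstract highlights.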
advanced video and signal based surveillance | 2016
Carlos Fernando Crispim; Michal Koperski; Serhan Cosar; Francois Bremond
Methods for action recognition have evolved considerably over the past years and can now automatically learn and recognize short-term actions with satisfactory accuracy. Nonetheless, the recognition of complex activities (compositions of actions and scene objects) is still an open problem due to the complex temporal and composite structure of this category of events. Existing methods focus either on simple activities or oversimplify the modeling of complex activities by targeting only whole-part relations between their sub-parts (e.g., actions). In this paper, we propose a semi-supervised approach that learns complex activities from the temporal patterns of concept compositions (e.g., “slicing-tomato” before “pouring into-pan”). We demonstrate that our method outperforms prior work in the task of automatic modeling and recognition of complex activities learned out of the interaction of 218 distinct concepts.
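The notion of temporal patterns over concept compositions can be illustrated with a toy “before”-relation counter (a simplification; the paper's learning procedure is more involved, and the training sequences here are invented):

```python
from collections import Counter
from itertools import combinations

def before_patterns(sequences):
    """Count how often one concept composition precedes another across
    training sequences (a toy stand-in for learning temporal patterns)."""
    counts = Counter()
    for seq in sequences:
        for a, b in combinations(seq, 2):   # a occurs before b in seq
            counts[(a, b)] += 1
    return counts

train = [
    ['slicing-tomato', 'pouring into-pan', 'stirring-pan'],
    ['slicing-tomato', 'stirring-pan'],
]
for (a, b), n in before_patterns(train).most_common(3):
    print(f'{a!r} before {b!r}: {n}')
```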