Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sinisa Todorovic is active.

Publications


Featured research published by Sinisa Todorovic.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Local-Learning-Based Feature Selection for High-Dimensional Data Analysis

Yijun Sun; Sinisa Todorovic; Steve Goodison

This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large-margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmic sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
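The core local-learning idea, estimating each feature's relevance from every sample's nearest hit (same class) and nearest miss (other class) in the currently weighted metric, can be sketched as follows. This is a toy illustration under simplifying assumptions (weighted L1 metric, plain gradient step, made-up function name), not the paper's algorithm:

```python
import numpy as np

def local_margin_weights(X, y, n_iter=10):
    """Toy sketch: iteratively reweight features so that each sample's
    nearest miss (other class) lies farther than its nearest hit (same
    class) in the weighted L1 metric. Assumes every sample has at least
    one same-class neighbor."""
    n, d = X.shape
    w = np.ones(d)
    for _ in range(n_iter):
        grad = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i])        # (n, d) per-feature distances
            dist = diff @ w                # weighted L1 distances
            dist[i] = np.inf               # exclude the sample itself
            same = y == y[i]
            hit = np.argmin(np.where(same, dist, np.inf))
            miss = np.argmin(np.where(~same, dist, np.inf))
            grad += diff[miss] - diff[hit]  # per-feature margin contribution
        w = np.maximum(w + grad / n, 0.0)   # keep weights nonnegative
        w /= w.sum() if w.sum() > 0 else 1.0
    return w
```

On data where one feature separates the classes and another is noise, the weight mass concentrates on the informative feature.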


International Conference on Computer Vision | 2011

Learning spatiotemporal graphs of human activities

William Brendel; Sinisa Todorovic

Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and temporal relations (e.g., allow only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning what activity parts and their spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new, multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph and the pdfs associated with model nodes and edges. The model adaptively learns from data which video segments and relations are relevant, addressing the “what” and “how.” Inference and learning are formulated within the same framework, that of a robust, least-squares optimization, which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on the benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy trade-off.


Computer Vision and Pattern Recognition | 2011

Multiobject tracking as maximum weight independent set

William Brendel; Mohamed R. Amer; Sinisa Todorovic

This paper addresses the problem of simultaneous tracking of multiple targets in a video. We first apply object detectors to every video frame. Pairs of detection responses from every two consecutive frames are then used to build a graph of tracklets. The graph helps transitively link the best matching tracklets that do not violate hard and soft contextual constraints between the resulting tracks. We prove that this data association problem can be formulated as finding the maximum-weight independent set (MWIS) of the graph. We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum. Similarity and contextual constraints between object detections, used for data association, are learned online from object appearance and motion properties. Long-term occlusions are addressed by iteratively repeating MWIS to hierarchically merge smaller tracks into longer ones. Our results demonstrate advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking. We outperform the state of the art on the benchmark datasets.
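As a minimal illustration of the data-association formulation (not the paper's polynomial-time solver), an exact brute-force MWIS over a small tracklet conflict graph might look like:

```python
from itertools import combinations

def max_weight_independent_set(weights, conflicts):
    """Exact brute-force MWIS: choose the subset of tracklets with maximum
    total weight such that no two chosen tracklets conflict (e.g., share a
    detection). `conflicts` is a set of frozensets {i, j} marking
    incompatible pairs. Exponential time; only viable for tiny graphs."""
    n = len(weights)
    best, best_set = 0.0, ()
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if any(frozenset(p) in conflicts
                   for p in combinations(subset, 2)):
                continue  # subset contains a conflicting pair
            w = sum(weights[i] for i in subset)
            if w > best:
                best, best_set = w, subset
    return best, set(best_set)
```

For three tracklets where the middle one conflicts with both neighbors, the solver keeps the two compatible outer tracklets.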


International Conference on Computer Vision | 2009

Video object segmentation by tracking regions

William Brendel; Sinisa Todorovic

This paper presents an approach to unsupervised segmentation of moving and static objects occurring in a video. Objects are, in general, spatially cohesive and characterized by locally smooth motion trajectories. Therefore, they occupy regions within each frame, while the shape and location of these regions vary slowly from frame to frame. Thus, video segmentation can be done by tracking regions across the frames such that the resulting tracks are locally smooth. To this end, we use a low-level segmentation to extract regions in all frames, and then we transitively match and cluster the similar regions across the video. The similarity is defined with respect to the regions' photometric, geometric, and motion properties. We formulate a new circular dynamic-time warping (CDTW) algorithm that generalizes DTW to match closed boundaries of two regions, without compromising DTW's guarantees of achieving the optimal solution with linear complexity. Our quantitative evaluation and comparison with the state of the art suggest that the proposed approach is a competitive alternative to currently prevailing point-based methods.
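The role of CDTW can be illustrated with a naive variant: ordinary DTW applied to every cyclic rotation of one boundary. This brute-force sketch is far more expensive than the paper's linear-complexity CDTW, but it computes the same kind of rotation-invariant match between closed boundaries (here reduced to 1D descriptor sequences):

```python
def dtw(a, b):
    """Ordinary DTW cost between two scalar sequences with L1 local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def circular_dtw_naive(a, b):
    """Match two closed boundaries by trying every cyclic rotation of `b`
    and keeping the best ordinary-DTW alignment."""
    return min(dtw(a, b[k:] + b[:k]) for k in range(len(b)))
```

A sequence and a rotated copy of it align perfectly under the circular variant even though plain DTW reports a nonzero cost.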


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Unsupervised Category Modeling, Recognition, and Segmentation in Images

Sinisa Todorovic; Narendra Ahuja

Suppose a set of arbitrary (unlabeled) images contains frequent occurrences of 2D objects from an unknown category. This paper is aimed at simultaneously solving the following related problems: 1) unsupervised identification of photometric, geometric, and topological properties of multiscale regions comprising instances of the 2D category, 2) learning a region-based structural model of the category in terms of these properties, and 3) detection, recognition, and segmentation of objects from the category in new images. To this end, each image is represented by a tree that captures a multiscale image segmentation. The trees are matched to extract the maximally matching subtrees across the set, which are taken as instances of the target category. The extracted subtrees are then fused into a tree union that represents the canonical category model. Detection, recognition, and segmentation of objects from the learned category are achieved simultaneously by finding matches of the category model with the segmentation tree of a new image. Experimental validation on benchmark data sets demonstrates the robustness and high accuracy of the learned category models when only a few training examples are used for learning without any human supervision.


Computer Vision and Pattern Recognition | 2006

Extracting Subimages of an Unknown Category from a Set of Images

Sinisa Todorovic; Narendra Ahuja

Suppose a set of images contains frequent occurrences of objects from an unknown category. This paper is aimed at simultaneously solving the following related problems: (1) unsupervised identification of photometric, geometric, and topological (mutual containment) properties of multiscale regions defining objects in the category; (2) learning a region-based structural model of the category in terms of these properties from a set of training images; and (3) segmentation and recognition of objects from the category in new images. To this end, each image is represented by a tree that captures a multiscale image segmentation. The trees are matched to find the maximally matching subtrees across the set, the existence of which is itself viewed as evidence that a category is indeed present. The matched subtrees are fused into a canonical tree, which represents the learned model of the category. Recognition of objects in a new image and image segmentation delineating all object parts are achieved simultaneously by finding matches of the model with subtrees of the new image. Experimental comparison with state-of-the-art methods shows that the proposed approach has similar recognition and superior localization performance while it uses fewer training examples.


International Conference on Computer Vision | 2011

From contours to 3D object detection and pose estimation

Nadia Payet; Sinisa Todorovic

This paper addresses view-invariant object detection and pose estimation from a single image. While recent work focuses on object-centered representations of point-based object features, we revisit the viewer-centered framework, and use image contours as basic features. Given training examples of arbitrary views of an object, we learn a sparse object model in terms of a few view-dependent shape templates. The shape templates are jointly used for detecting object occurrences and estimating their 3D poses in a new image. Instrumental to this is our new mid-level feature, called bag of boundaries (BOB), aimed at lifting from individual edges toward their more informative summaries for identifying object boundaries amidst the background clutter. In inference, BOBs are placed on deformable grids both in the image and the shape templates, and then matched. This is formulated as a convex optimization problem that accommodates invariance to non-rigid, locally affine shape deformations. Evaluation on benchmark datasets demonstrates our competitive results relative to the state of the art.


Computer Vision and Pattern Recognition | 2011

Probabilistic event logic for interval-based event recognition

William Brendel; Alan Fern; Sinisa Todorovic

This paper is about detecting and segmenting interrelated events which occur in challenging videos with motion blur, occlusions, dynamic backgrounds, and missing observations. We argue that holistic reasoning about time intervals of events, and their temporal constraints is critical in such domains to overcome the noise inherent to low-level video representations. For this purpose, our first contribution is the formulation of probabilistic event logic (PEL) for representing temporal constraints among events. A PEL knowledge base consists of confidence-weighted formulas from a temporal event logic, and specifies a joint distribution over the occurrence time intervals of all events. Our second contribution is a MAP inference algorithm for PEL that addresses the scalability issue of reasoning about an enormous number of time intervals and their constraints in a typical video. Specifically, our algorithm leverages the spanning-interval data structure for compactly representing and manipulating entire sets of time intervals without enumerating them. Our experiments on interpreting basketball videos show that PEL inference is able to jointly detect events and identify their time intervals, based on noisy input from primitive-event detectors.
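A PEL knowledge base constrains the occurrence time intervals of events with temporal relations. As a toy illustration (not the paper's spanning-interval machinery), classifying a few basic Allen-style relations between two closed intervals could look like:

```python
def allen_relation(a, b):
    """Classify the temporal relation between closed intervals
    a = (a0, a1) and b = (b0, b1); covers only a small subset of
    Allen's 13 interval relations, for illustration."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"          # a ends strictly before b starts
    if a1 == b0:
        return "meets"           # a ends exactly when b starts
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 > b0 and a1 < b1:
        return "during"          # a strictly inside b
    return "overlaps-or-other"   # remaining cases lumped together
```

A PEL-style formula such as "a dribble occurs during a possession" would then be scored by checking relations like these over candidate interval pairs.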


European Conference on Computer Vision | 2012

Cost-Sensitive top-down/bottom-up inference for multiscale activity recognition

Mohamed R. Amer; Dan Xie; Mingtian Zhao; Sinisa Todorovic; Song-Chun Zhu

This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may simultaneously co-occur in high-resolution video. The video resolution allows for digital zoom-in (or zoom-out) for examining fine details (or coarser scales), as needed for recognition. The key challenge is how to avoid running a multitude of detectors at all spatiotemporal scales, and yet arrive at a holistically consistent video interpretation. To this end, we use a three-layered AND-OR graph to jointly model group activities, individual actions, and participating objects. The AND-OR graph allows a principled formulation of efficient, cost-sensitive inference via an explore-exploit strategy. Our inference optimally schedules the following computational processes: 1) direct application of activity detectors --- called α process; 2) bottom-up inference based on detecting activity parts --- called β process; and 3) top-down inference based on detecting activity context --- called γ process. The scheduling iteratively maximizes the log-posteriors of the resulting parse graphs. For evaluation, we have compiled and benchmarked a new dataset of high-resolution videos of group and individual activities co-occurring in a courtyard of the UCLA campus.


European Conference on Computer Vision | 2010

Activities as time series of human postures

William Brendel; Sinisa Todorovic

This paper presents an exemplar-based approach to detecting and localizing human actions, such as running, cycling, and swinging, in realistic videos with dynamic backgrounds. We show that such activities can be compactly represented as time series of a few snapshots of human-body parts in their most discriminative postures, relative to other activity classes. This enables our approach to efficiently store multiple diverse exemplars per activity class, and quickly retrieve exemplars that best match the query by aligning their short time-series representations. Given a set of example videos of all activity classes, we extract multiscale regions from all their frames, and then learn a sparse dictionary of most discriminative regions. The Viterbi algorithm is then used to track detections of the learned codewords across frames of each video, resulting in their compact time-series representations. Dictionary learning is cast within the large-margin framework, wherein we study the effects of l1 and l2 regularization on the sparseness of the resulting dictionaries. Our experiments demonstrate robustness and scalability of our approach on challenging YouTube videos.
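The Viterbi tracking step can be sketched under a simplified model in which each frame scores every codeword and each label switch incurs a fixed penalty; the `switch_penalty` parameter and scoring scheme are illustrative assumptions, not details from the paper:

```python
import numpy as np

def viterbi_track(scores, switch_penalty=1.0):
    """Return the label path (one codeword index per frame) maximizing the
    summed per-frame scores minus a fixed penalty per label switch.
    `scores` is an (n_frames, n_codewords) array."""
    T, K = scores.shape
    dp = scores[0].astype(float).copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        new_dp = np.empty(K)
        best_prev = int(dp.argmax())          # best label to switch from
        for k in range(K):
            stay, switch = dp[k], dp[best_prev] - switch_penalty
            if stay >= switch:
                new_dp[k], back[t, k] = stay, k
            else:
                new_dp[k], back[t, k] = switch, best_prev
        dp = new_dp + scores[t]
    path = [int(dp.argmax())]                 # backtrack the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a small penalty the path follows the strongest per-frame detections; with a large penalty it locks onto a single codeword.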

Collaboration


Dive into Sinisa Todorovic's collaborations.

Top Co-Authors

Nadia Payet (Oregon State University)
Yijun Sun (University at Buffalo)
Alan Fern (Oregon State University)
Jian Li (University of Florida)
Song-Chun Zhu (University of California)
Peng Lei (Oregon State University)