Wongun Choi | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Wongun Choi is active.

Explore More

Publication

Featured researches published by Wongun Choi.

european conference on computer vision | 2012

A unified framework for multi-target tracking and collective activity recognition

Wongun Choi; Silvio Savarese

We present a coherent, discriminative framework for simultaneously tracking multiple people and estimating their collective activities. Instead of treating the two problems separately, our model is grounded in the intuition that a strong correlation exists between a persons motion, their activity, and the motion and activities of other nearby people. Instead of directly linking the solutions to these two problems, we introduce a hierarchy of activity types that creates a natural progression that leads from a specific persons motion to the activity of the group as a whole. Our model is capable of jointly tracking multiple people, recognizing individual activities (atomic activities), the interactions between pairs of people (interaction activities), and finally the behavior of groups of people (collective activities). We also propose an algorithm for solving this otherwise intractable joint inference problem by combining belief propagation with a version of the branch and bound algorithm equipped with integer programming. Experimental results on challenging video datasets demonstrate our theoretical claims and indicate that our model achieves the best collective activity classification results to date.

computer vision and pattern recognition | 2011

Learning context for collective activity recognition

Wongun Choi; Khuram Shahid; Silvio Savarese

In this paper we present a framework for the recognition of collective human activities. A collective activity is defined or reinforced by the existence of coherent behavior of individuals in time and space. We call such coherent behavior ‘Crowd Context’. Examples of collective activities are “queuing in a line” or “talking”. Following [7], we propose to recognize collective activities using the crowd context and introduce a new scheme for learning it automatically. Our scheme is constructed upon a Random Forest structure which randomly samples variable volume spatio-temporal regions to pick the most discriminating attributes for classification. Unlike previous approaches, our algorithm automatically finds the optimal configuration of spatio-temporal bins, over which to sample the evidence, by randomization. This enables a methodology for modeling crowd context. We employ a 3D Markov Random Field to regularize the classification and localize collective activities in the scene. We demonstrate the flexibility and scalability of the proposed framework in a number of experiments and show that our method outperforms state-of-the art action classification techniques [7, 19].

international conference on computer vision | 2009

What are they doing? : Collective activity classification using spatio-temporal relationship among people

Wongun Choi; Khuram Shahid; Silvio Savarese

In this paper we present a new framework for pedestrian action categorization. Our method enables the classification of actions whose semantic can be only analyzed by looking at the collective behavior of pedestrians in the scene. Examples of these actions are waiting by a street intersection versus standing in a queue. To that end, we exploit the spatial distribution of pedestrians in the scene as well as their pose and motion for achieving robust action classification. Our proposed solution employs extended Kalman filtering for tracking of detected pedestrians in 2D 1/2 scene coordinates as well as camera parameter and horizon estimation for tracker filtering and stabilization. We present a local spatio-temporal descriptor effective in capturing the spatial distribution of pedestrians over time as well as their pose. This descriptor captures pedestrian activity while requiring no high level scene understanding. Our work is tested against highly challenging real world pedestrian video sequences captured by low resolution hand held cameras. Experimental results on a 5-class action dataset indicate that our solution: i) is effective in classifying collective pedestrian activities; ii) is tolerant to challenging real world conditions such as variation in illumination, scale, viewpoint as well as partial occlusion and background motion; iii) outperforms state-of-the art action classification techniques.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

A General Framework for Tracking Multiple People from a Moving Camera

Wongun Choi; Caroline Pantofaru; Silvio Savarese

In this paper, we present a general framework for tracking multiple, possibly interacting, people from a mobile vision platform. To determine all of the trajectories robustly and in a 3D coordinate system, we estimate both the cameras ego-motion and the peoples paths within a single coherent framework. The tracking problem is framed as finding the MAP solution of a posterior probability, and is solved using the reversible jump Markov chain Monte Carlo (RJ-MCMC) particle filtering method. We evaluate our system on challenging datasets taken from moving cameras, including an outdoor street scene video dataset, as well as an indoor RGB-D dataset collected in an office. Experimental evidence shows that the proposed method can robustly estimate a cameras motion from dynamic scenes and stably track people who are moving independently or interacting.

computer vision and pattern recognition | 2013

Understanding Indoor Scenes Using 3D Geometric Phrases

Wongun Choi; Yu-Wei Chao; Caroline Pantofaru; Silvio Savarese

Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.

international conference on computer vision | 2011

Detecting and tracking people using an RGB-D camera via multiple detector fusion

Wongun Choi; Caroline Pantofaru; Silvio Savarese

The goal of personal robotics is to create machines that help us with the tasks of daily living, co-habiting with us in our homes and offices. These robots must interact with people on a daily basis, navigating with and around people, and approaching people to serve them. To enable this coexistence, personal robots must be able to detect and track people in their environment. Excellent progress has been made in the vision community in detecting people outdoors, in surveillance scenarios, in Internet images, or in specific scenarios such as video game play in living rooms. The indoor robot perception problem differs, however, in that the platform is moving, the subjects are frequently occluded or truncated by the field-of-view, there is large scale variation, the subjects take on a wider range of poses than pedestrians, and computation must take place in near real time. In this paper, we describe a system for detecting and tracking people from image and depth sensors on board a mobile robot. To cope with the challenges of indoor mobile perception, our system combines an ensemble of detectors in a unified framework, is efficient, and has the potential to incorporate multiple sensor inputs. The performance of our algorithm surpasses other approaches on two challenging data sets, including a new robot-based data set.

computer vision and pattern recognition | 2016

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers

Fan Yang; Wongun Choi; Yuanqing Lin

In this paper, we investigate two new strategies to detect objects accurately and efficiently using deep convolutional neural network: 1) scale-dependent pooling and 2) layerwise cascaded rejection classifiers. The scale-dependent pooling (SDP) improves detection accuracy by exploiting appropriate convolutional features depending on the scale of candidate object proposals. The cascaded rejection classifiers (CRC) effectively utilize convolutional features and eliminate negative object proposals in a cascaded manner, which greatly speeds up the detection while maintaining high accuracy. In combination of the two, our method achieves significantly better accuracy compared to other state-of-the-arts in three challenging datasets, PASCAL object detection challenge, KITTI object detection benchmark and newly collected Inner-city dataset, while being more efficient.

computer vision and pattern recognition | 2015

Data-driven 3D Voxel Patterns for object category recognition

Yu Xiang; Wongun Choi; Yuanqing Lin; Silvio Savarese

Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and estimate the 3D properties of multiple objects from a single image. In this paper, we propose a novel object representation, 3D Voxel Pattern (3DVP), that jointly encodes the key properties of objects including appearance, 3D shape, viewpoint, occlusion and truncation. We discover 3DVPs in a data-driven way, and train a bank of specialized detectors for a dictionary of 3DVPs. The 3DVP detectors are capable of detecting objects with specific visibility patterns and transferring the meta-data from the 3DVPs to the detected objects, such as 2D segmentation mask, 3D pose as well as occlusion or truncation boundaries. The transferred meta-data allows us to infer the occlusion relationship among objects, which in turn provides improved object recognition results. Experiments are conducted on the KITTI detection benchmark [17] and the outdoor-scene dataset [41]. We improve state-of-the-art results on car detection and pose estimation with notable margins (6% in difficult data of KITTI). We also verify the ability of our method in accurately segmenting objects from the background and localizing them in 3D.

european conference on computer vision | 2010

Multiple target tracking in world coordinate with single, minimally calibrated camera

Wongun Choi; Silvio Savarese

Tracking multiple objects is important in many application domains. We propose a novel algorithm for multi-object tracking that is capable of working under very challenging conditions such as minimal hardware equipment, uncalibrated monocular camera, occlusions and severe background clutter. To address this problem we propose a new method that jointly estimates object tracks, estimates corresponding 2D/3D temporal trajectories in the camera reference system as well as estimates the model parameters (pose, focal length, etc) within a coherent probabilistic formulation. Since our goal is to estimate stable and robust tracks that can be univocally associated to the object IDs, we propose to include in our formulation an interaction (attraction and repulsion) model that is able to model multiple 2D/3D trajectories in space-time and handle situations where objects occlude each other. We use a MCMC particle filtering algorithm for parameter inference and propose a solution that enables accurate and efficient tracking and camera model estimation. Qualitative and quantitative experimental results obtained using our own dataset and the publicly available ETH dataset shows very promising tracking and camera estimation results.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Understanding Collective Activitiesof People from Videos

Wongun Choi; Silvio Savarese

This paper presents a principled framework for analyzing collective activities at different levels of semantic granularity from videos. Our framework is capable of jointly tracking multiple individuals, recognizing activities performed by individuals in isolation (i.e., atomic activities such as walking or standing), recognizing the interactions between pairs of individuals (i.e., interaction activities) as well as understanding the activities of group of individuals (i.e., collective activities). A key property of our work is that it can coherently combine bottom-up information stemming from detections or fragments of tracks (or tracklets) with top-down evidence. Top-down evidence is provided by a newly proposed descriptor that captures the coherent behavior of groups of individuals in a spatial-temporal neighborhood of the sequence. Top-down evidence provides contextual information for establishing accurate associations between detections or tracklets across frames and, thus, for obtaining more robust tracking results. Bottom-up evidence percolates upwards so as to automatically infer collective activity labels. Experimental results on two challenging data sets demonstrate our theoretical claims and indicate that our model achieves enhances tracking results and the best collective classification results to date.

Explore More