Publications


Featured research published by Sangmin Oh.


Computer Vision and Pattern Recognition | 2011

A large-scale benchmark dataset for event recognition in surveillance video

Sangmin Oh; Anthony Hoogs; A. G. Amitha Perera; Naresh P. Cuntoor; Chia-Chih Chen; Jong Taek Lee; Saurajit Mukherjee; Jake K. Aggarwal; Hyungtae Lee; Larry S. Davis; Eran Swears; Xiaoyang Wang; Qiang Ji; Kishore K. Reddy; Mubarak Shah; Carl Vondrick; Hamed Pirsiavash; Deva Ramanan; Jenny Yuen; Antonio Torralba; Bi Song; Anesco Fong; Amit K. Roy-Chowdhury; Mita Desai

We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms, with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one action by one individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but these actions and scene conditions do not transfer effectively to surveillance video. Our dataset consists of many outdoor scenes with actions performed naturally by non-actors in continuously captured videos of the real world. It includes large numbers of instances of 23 event types distributed throughout 29 hours of video. The data is accompanied by detailed annotations, including both moving object tracks and event examples, which provide a solid basis for large-scale evaluation. Additionally, we propose different evaluation modes for visual recognition tasks and evaluation metrics, along with our preliminary experimental results. We believe this dataset will stimulate diverse aspects of computer vision research and help advance CVER in the years ahead.
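The paper's exact evaluation protocol is not reproduced on this page, but the core of event-level scoring is matching detected temporal intervals against annotated ones. Below is a minimal Python sketch of one common evaluation mode, matching by temporal intersection-over-union; the greedy matching and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
def t_iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_events(detections, ground_truth, thresh=0.5):
    """Greedily match detections to ground-truth events; return (tp, fp, fn)."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda g: t_iou(det, g), default=None)
        if best is not None and t_iou(det, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    return tp, len(detections) - tp, len(unmatched)

# Example: one correct detection, one false alarm, one missed event.
tp, fp, fn = match_events([(10, 50), (200, 260)], [(12, 55), (300, 360)])
precision, recall = tp / (tp + fp), tp / (tp + fn)
```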


International Conference on Computer Vision | 2013

Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

Arash Vahdat; Kevin J. Cannons; Greg Mori; Sangmin Oh; Ilseo Kim

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. A novel multiple kernel learning (MKL) latent support vector machine (SVM) is defined, which combines and re-weights multiple feature types in a principled fashion while operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
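The scoring rule the abstract describes, a weighted combination of per-feature kernel responses maximized over a latent segment choice, can be sketched as follows. The chi-squared kernel, the averaging over exemplars, and the dictionary layout are illustrative assumptions of mine; the weights would come from the paper's MKL latent SVM training, which is not shown.

```python
import numpy as np

def chi2_kernel(x, y, gamma=1.0):
    """Chi-squared kernel, a common similarity for histogram features."""
    return np.exp(-gamma * np.sum((x - y) ** 2 / (x + y + 1e-12)))

def video_score(segments, exemplars, betas, gamma=1.0):
    """Score a video by maximizing over the latent segment choice.

    segments:  list of dicts mapping feature name -> segment histogram
    exemplars: dict mapping feature name -> list of training histograms
    betas:     dict mapping feature name -> learned MKL kernel weight
    """
    best = -np.inf
    for seg in segments:  # latent variable: which segment explains the event
        s = sum(beta * np.mean([chi2_kernel(seg[name], e, gamma)
                                for e in exemplars[name]])
                for name, beta in betas.items())
        best = max(best, s)
    return best
```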


Machine Vision and Applications | 2014

Multimedia event detection with multimodal feature fusion and temporal concept localization

Sangmin Oh; Scott McCloskey; Ilseo Kim; Arash Vahdat; Kevin J. Cannons; Hossein Hajimirsadeghi; Greg Mori; A. G. Amitha Perera; Megha Pandey; Jason J. Corso

We present a system for multimedia event detection. The system characterizes complex multimedia events based on a large array of multimodal features and classifies unseen videos by effectively fusing the diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, building mid-level and high-level features on top of low-level ones, often in an unsupervised manner, to enable semantic understanding. Second, we present a novel latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization; the resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and a methodology for improving fusion learning under limited training data conditions. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
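As a rough illustration of the late-fusion step, the sketch below trains a fusion classifier on stacked per-modality scores. Logistic regression and the three synthetic modality scores are stand-ins of my own; the paper's fusion learners and its limited-training-data methodology are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)

# Three synthetic per-modality detector scores (e.g., visual, audio,
# high-level concepts), each weakly correlated with the event label.
scores = np.stack([labels + rng.normal(0.0, s, n) for s in (0.8, 1.0, 1.5)], axis=1)

# Train the fusion classifier on stacked base scores; at test time,
# predict_proba gives the fused detection score for an unseen video.
fuser = LogisticRegression().fit(scores, labels)
fused = fuser.predict_proba(scores)[:, 1]
```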


International Conference on Computer Vision | 2012

Explicit performance metric optimization for fusion-based video retrieval

Ilseo Kim; Sangmin Oh; Byungki Byun; A. G. Amitha Perera; Chin-Hui Lee

We present a learning framework for fusion-based video retrieval systems that explicitly optimizes given performance metrics. Real-world computer vision systems serve sophisticated user needs, and domain-specific performance metrics are used to monitor their success. However, the conventional approach in such circumstances is to blindly minimize standard error rates and hope the targeted performance metrics improve, which is clearly suboptimal. In this work, we develop a scheme that directly optimizes the targeted performance metrics during learning. Our experimental results on two large consumer video archives are promising and showcase the benefits of the proposed approach.
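The core idea, optimizing the target metric itself rather than a surrogate loss, can be made concrete with a toy weight search. The random search over convex combinations and the choice of average precision as the metric are illustrative assumptions; the paper's actual optimization scheme is not shown.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def optimize_weights(scores, labels, trials=2000, seed=0):
    """Search fusion weights that directly maximize average precision.

    scores: (n_videos, n_systems) base retrieval scores
    labels: binary relevance labels per video
    """
    rng = np.random.default_rng(seed)
    best_w, best_ap = None, -1.0
    for _ in range(trials):
        w = rng.dirichlet(np.ones(scores.shape[1]))  # random convex combination
        ap = average_precision_score(labels, scores @ w)
        if ap > best_ap:
            best_w, best_ap = w, ap
    return best_w, best_ap
```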


Conference on Automation Science and Engineering | 2013

Automatic building exterior mapping using multilayer feature graphs

Yan Lu; Dezhen Song; Yiliang Xu; A. G. Amitha Perera; Sangmin Oh

We develop algorithms that can assist a robot in building exterior mapping, which is important for building energy retrofitting. In this task, a robot needs to identify building facades during its localization and mapping process, which in turn can assist its navigation. Existing localization and mapping algorithms rely on low-level features such as point clouds and line segments and cannot be directly applied to our task. We attack this problem by employing a multilayer feature graph (MFG), which contains five different feature types ranging from raw keypoints to planes and vanishing points in 3D, within an extended Kalman filter (EKF) framework. We analyze how errors are generated and propagated in the MFG construction process, and then use MFG data as observations for the EKF to map building facades. We have implemented and tested our MFG-EKF method at three different sites. Experimental results show that building facades are successfully reconstructed in modern urban environments with mean relative plane-depth errors below 4.66%.
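For readers unfamiliar with the filtering backbone, here is a generic EKF predict/update pair. In the paper, the state would hold the robot pose plus facade-plane parameters, and the observation model would predict MFG measurements; those application-specific models are only stubbed here as function arguments.

```python
import numpy as np

def ekf_predict(x, P, f, F, Q):
    """x: state mean, P: covariance, f: motion model, F: its Jacobian, Q: process noise."""
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H, R):
    """z: observation (e.g., facade-plane measurements), h: observation model, H: its Jacobian."""
    y = z - h(x)                     # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P
```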


International Conference on Pattern Recognition | 2010

Unsupervised Learning of Activities in Video Using Scene Context

Sangmin Oh; Anthony Hoogs

Unsupervised learning of semantic activities from video collected over time is an important problem for visual surveillance and video scene understanding. Our goal is to cluster tracks into semantically interpretable activity models that are independent of scene location; most previous work in video scene understanding focuses on learning location-specific normalcy models. Location-independent models can be used to detect instances of the same activity anywhere in the scene, or even across multiple scenes. Our insight for this unsupervised activity learning problem is to incorporate scene context to characterize the behavior of every track. By scene context, we mean local scene structures, such as building entrances, parking spots and roads, that moving objects frequently interact with. Each track is attributed with a large number of potentially useful features that capture its relationships and interactions with a set of existing scene context elements. Once feature vectors are obtained, tracks are grouped in this feature space using state-of-the-art clustering techniques, without considering scene location. Experiments are conducted on webcam video of a complex scene, with many interacting objects and very noisy tracks resulting from low frame rates and poor image quality. Our results demonstrate that location-independent and semantically interpretable groupings can be successfully obtained using unsupervised clustering methods, and that the resulting models are superior to standard location-dependent clustering.
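A minimal sketch of the pipeline the abstract describes: each track is summarized by location-independent features measuring its interaction with scene context elements, and the resulting vectors are clustered. The specific features (nearest-entrance distance, time near a road, mean speed), the 2-unit road threshold, and the use of k-means are illustrative choices of mine, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def track_features(track, entrances, roads):
    """track: (T, 2) positions; entrances, roads: (M, 2) context element points."""
    d_entr = np.linalg.norm(track[:, None] - entrances[None], axis=2).min(axis=1)
    d_road = np.linalg.norm(track[:, None] - roads[None], axis=2).min(axis=1)
    speed = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return np.array([d_entr.min(), d_entr.mean(),          # proximity to entrances
                     d_road.min(), (d_road < 2.0).mean(),  # fraction of time near a road
                     speed.mean()])

def cluster_tracks(tracks, entrances, roads, k=5):
    X = np.stack([track_features(t, entrances, roads) for t in tracks])
    X = StandardScaler().fit_transform(X)  # put features on comparable scales
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```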


British Machine Vision Conference | 2012

A Videography Analysis Framework for Video Retrieval and Summarization

Kang Li; Sangmin Oh; A. G. Amitha Perera; Yun Fu

In this work, we focus on developing features and approaches to represent and analyze videography styles in unconstrained videos. By unconstrained videos, we mean typical consumer videos with significant content complexity and diverse editing artifacts, mostly of long duration. Our approach constructs a videography dictionary, which is used to represent each video clip as a series of varying videography words. In addition to conventional features such as camera motion and foreground object motion, two novel features, motion correlation and scale information, are introduced to characterize videography. We then show that unique videography signatures of different events can be automatically identified using statistical analysis methods. For practical applications, we explore the use of videography analysis for content-based video retrieval and video summarization. We compare our approach with other methods on a large unconstrained video dataset and demonstrate that it benefits video analysis.
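The dictionary construction the abstract mentions can be sketched as vector quantization over per-shot videography descriptors. The descriptor contents, dictionary size, and k-means quantizer below are illustrative assumptions; the paper's features (camera motion, foreground motion, motion correlation, scale) would populate the descriptor.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(descriptors, n_words=64, seed=0):
    """descriptors: (n_shots, d) per-shot videography features pooled over videos."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(descriptors)

def encode_clip(dictionary, clip_descriptors):
    """Represent a clip as its sequence of videography words and a word histogram."""
    words = dictionary.predict(clip_descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return words, hist / hist.sum()
```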


International Conference on Computer Vision | 2013

Learning Non-linear Calibration for Score Fusion with Applications to Image and Video Classification

Tianyang Ma; Sangmin Oh; A. G. Amitha Perera; Longin Jan Latecki

Image and video classification is a challenging task, particularly for complex real-world data. Recent work indicates that using multiple features can improve classification significantly, and that score fusion is effective. In this work, we propose a robust score fusion approach which learns non-linear score calibrations for multiple base classifiers. Through calibration, the original base classifier scores are adjusted to reflect their true intrinsic accuracy and confidence, relative to the other base classifiers, in such a way that the calibrated scores can simply be added to yield accurate fusion results. Our method provides a unified framework that jointly solves score normalization and fusion classifier learning. The learning problem is solved within a max-margin framework to globally optimize the performance metric on the training set. Experiments demonstrate the strength and robustness of the proposed method.
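The calibrate-then-add structure of the method can be sketched as below. Isotonic regression is my stand-in for the learned non-linear calibration; the paper learns its calibrations jointly in a max-margin framework, which is not reproduced here.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_calibrators(score_matrix, labels):
    """Learn one monotone calibration per base classifier.

    score_matrix: (n_samples, n_classifiers) raw scores; labels: 0/1.
    """
    return [IsotonicRegression(out_of_bounds="clip").fit(score_matrix[:, j], labels)
            for j in range(score_matrix.shape[1])]

def fuse(calibrators, score_matrix):
    """Sum of calibrated scores; higher means more confidently positive."""
    return sum(cal.predict(score_matrix[:, j]) for j, cal in enumerate(calibrators))
```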


Advanced Video and Signal Based Surveillance | 2011

AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video

Sangmin Oh; Anthony Hoogs; A. G. Amitha Perera; Naresh P. Cuntoor; Chia-Chih Chen; Jong Taek Lee; Saurajit Mukherjee; Jake K. Aggarwal; Hyungtae Lee; Larry S. Davis; Eran Swears; Xiaoyang Wang; Qiang Ji; Kishore K. Reddy; Mubarak Shah; Carl Vondrick; Hamed Pirsiavash; Deva Ramanan; Jenny Yuen; Antonio Torralba; Bi Song; Anesco Fong; Amit K. Roy-Chowdhury; Mita Desai

Summary form only given.


Pattern Recognition Letters | 2016

Image-oriented economic perspective on user behavior in multimedia social forums

Sangmin Oh; Megha Pandey; Ilseo Kim; Anthony Hoogs

Highlights: clustering diverse images shared on social forums produces meaningful groups; user behavior patterns on social media can be characterized by image distributions; users exhibit diverse preference patterns for the images they engage with; users often exhibit distinct patterns between supply and consumption behavior; salient users can be identified by non-parametric statistical anomaly analysis.

This work addresses the novel problem of analyzing individual users' behavioral patterns with respect to images shared on social forums. In particular, we present an image-oriented economic perspective: the first mode of activity, sharing or posting on social forums, is interpreted as supply, while another mode of activity, such as commenting on images, is interpreted as consumption. First, we show that, despite their significant diversity, images in social forums can be clustered into semantically meaningful groups using modern computer vision techniques. Users' supply and consumption profiles are then characterized by the distribution of images they engage with. We present various statistical analyses on real-world data, which show that there is a significant difference between the images users supply and those they consume. This finding suggests that the flow of images on a social network should be modeled as a bi-directional graph. In addition, we introduce a statistical approach to identify users with salient profiles, which can be useful for social multimedia services to block users with undesirable behavior or to identify and promote viral content.
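One way to make the supply-versus-consumption comparison concrete: represent each user by two histograms over image clusters, one from posts and one from comments, and measure their divergence. The Jensen-Shannon divergence and the percentile cutoff below are illustrative choices; the paper's non-parametric anomaly analysis is not reproduced.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two count histograms."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def salient_users(supply, consume, pct=95):
    """supply, consume: (n_users, n_clusters) per-user image-cluster counts."""
    div = np.array([js_divergence(s, c) for s, c in zip(supply, consume)])
    return np.where(div > np.percentile(div, pct))[0], div
```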

Collaboration


Dive into Sangmin Oh's collaborations.

Top Co-Authors

Ilseo Kim, Georgia Institute of Technology
Arash Vahdat, Simon Fraser University
Greg Mori, Simon Fraser University
Chin-Hui Lee, Georgia Institute of Technology