Publication


Featured research published by Ilseo Kim.


International Conference on Computer Vision | 2013

Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

Arash Vahdat; Kevin J. Cannons; Greg Mori; Sangmin Oh; Ilseo Kim

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. A novel multiple kernel learning (MKL) latent support vector machine (SVM) is defined and used to combine and re-weight multiple feature types in a principled fashion while operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
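
The scoring rule described above can be pictured with the following minimal sketch (hypothetical names and shapes, not the authors' code): per-feature responses are combined with kernel/feature weights, and the latent variable maximizes over the choice of a discriminative segment.

```python
# Hypothetical sketch of an MKL latent SVM scoring rule for video event
# detection; names, shapes, and weights are illustrative assumptions.
import numpy as np

def score_video(segment_feats, w, beta, b=0.0):
    """Score a video by maximizing over the latent segment choice.

    segment_feats: dict feature_type -> array (n_segments, dim_k)
    w:             dict feature_type -> weight vector (dim_k,)
    beta:          dict feature_type -> non-negative feature/kernel weight
    """
    n_segments = next(iter(segment_feats.values())).shape[0]
    # Per-segment score: weighted sum of per-feature linear responses.
    seg_scores = np.zeros(n_segments)
    for k, feats in segment_feats.items():
        seg_scores += beta[k] * feats.dot(w[k])
    # The latent variable picks the most discriminative segment.
    h_star = int(np.argmax(seg_scores))
    return seg_scores[h_star] + b, h_star

# Toy usage with random features for two modalities.
rng = np.random.default_rng(0)
feats = {"visual": rng.normal(size=(5, 8)), "audio": rng.normal(size=(5, 4))}
w = {"visual": rng.normal(size=8), "audio": rng.normal(size=4)}
beta = {"visual": 0.7, "audio": 0.3}
print(score_video(feats, w, beta))
```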


Machine Vision and Applications | 2014

Multimedia event detection with multimodal feature fusion and temporal concept localization

Sangmin Oh; Scott McCloskey; Ilseo Kim; Arash Vahdat; Kevin J. Cannons; Hossein Hajimirsadeghi; Greg Mori; A. G. Amitha Perera; Megha Pandey; Jason J. Corso

We present a system for multimedia event detection. The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we show a novel Latent SVM model which learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval by its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology to improve fusion learning under limited training data condition. Thorough evaluation on a large TRECVID MED 2011 dataset showcases the benefits of the presented system.
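
One simple way to picture the fusion step is a late-fusion model over per-modality detector scores. The sketch below is an illustrative assumption, not the system described in the paper: fusion weights are learned with plain logistic regression on a small validation set.

```python
# Minimal late-fusion sketch (illustrative assumption): learn weights that
# combine per-modality event-detector scores via logistic regression.
import numpy as np

def learn_fusion_weights(scores, labels, lr=0.1, n_iter=500):
    """scores: (n_videos, n_modalities) detector outputs; labels: 0/1 event labels."""
    n, m = scores.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(scores.dot(w) + b)))  # fused event probability
        grad_w = scores.T.dot(p - labels) / n            # gradient of logistic loss
        grad_b = np.mean(p - labels)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy usage: three modalities (e.g., visual, audio, text) on synthetic data.
rng = np.random.default_rng(1)
val_scores = rng.normal(size=(200, 3))
val_labels = (val_scores.dot([1.0, 0.5, 0.1]) + rng.normal(size=200) > 0).astype(float)
w, b = learn_fusion_weights(val_scores, val_labels)
print("fusion weights:", w)
```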


International Conference on Computer Vision | 2012

Explicit performance metric optimization for fusion-based video retrieval

Ilseo Kim; Sangmin Oh; Byungki Byun; A. G. Amitha Perera; Chin-Hui Lee

We present a learning framework for fusion-based video retrieval systems that explicitly optimizes given performance metrics. Real-world computer vision systems serve sophisticated user needs, and domain-specific performance metrics are used to monitor the success of such systems. However, the conventional approach for learning under such circumstances is to blindly minimize standard error rates and hope that the targeted performance metrics improve, which is clearly suboptimal. In this work, a novel scheme that directly optimizes such targeted performance metrics during learning is developed and presented. Our experimental results on two large consumer video archives are promising and showcase the benefits of the proposed approach.


International Conference on Acoustics, Speech, and Signal Processing | 2009

A detection-based approach to broadcast news video story segmentation

Chengyuan Ma; Byungki Byun; Ilseo Kim; Chin-Hui Lee

A detection-based paradigm decomposes a complex system into small pieces, solves each subproblem one by one, and combines the collected evidence to obtain a final solution. In this study of video story segmentation, a set of key events is first detected from heterogeneous multimedia signal sources, including a large-scale concept ontology for images, text generated by automatic speech recognition systems, features extracted from the audio track, and high-level video transcriptions. Then a discriminative evidence fusion scheme is investigated. We use the maximum figure-of-merit learning approach to directly optimize the performance metrics used in system evaluation, such as precision, recall, and the F1 measure. Experimental evaluations conducted on the TRECVID 2003 dataset demonstrate the effectiveness of the proposed detection-based paradigm. The proposed framework facilitates flexible combination and extension of event detector designs and evidence fusion to enable other related video applications.
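
The idea of optimizing an evaluation metric directly can be sketched as follows. This toy example is an assumption for illustration, not the paper's exact formulation: the discrete F1 measure is smoothed with sigmoids so it can be maximized by gradient ascent.

```python
# Hedged sketch of maximum figure-of-merit style learning: replace hard
# true/false positive counts with sigmoid-smoothed counts to get a
# differentiable F1 surrogate (illustrative only).
import numpy as np

def sigmoid(z, alpha=5.0):
    return 1.0 / (1.0 + np.exp(-alpha * z))

def smooth_f1(scores, labels, threshold=0.0):
    """scores: real-valued fused scores; labels: 1 for story boundary, 0 otherwise."""
    d = scores - threshold                  # signed distance to the decision boundary
    tp = np.sum(sigmoid(d) * labels)        # soft true positives
    fp = np.sum(sigmoid(d) * (1 - labels))  # soft false positives
    fn = np.sum(sigmoid(-d) * labels)       # soft false negatives
    return 2 * tp / (2 * tp + fp + fn + 1e-9)

# Toy check: well-separated scores give a smooth F1 close to 1.
labels = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([2.0, 1.5, -1.0, -2.0, 0.8, -0.5])
print(round(smooth_f1(scores, labels), 3))
```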


International Conference on Acoustics, Speech, and Signal Processing | 2009

A hierarchical grid feature representation framework for automatic image annotation

Ilseo Kim; Chin-Hui Lee

We propose a hierarchical-grid (HG) feature analysis framework for representing images in automatic image annotation (AIA). We explore the properties of codebooks constructed with different-sized grids in image sub-blocks, and the co-occurrence relationships between VQ codewords constructed from different grid systems. The proposed HG approach is evaluated on the TRECVID 2005 dataset using classifiers obtained with maximal figure-of-merit discriminative training. With multi-level and cross-level grid systems incorporating bigram information within and between higher and lower grid levels, we show that AIA performance can be significantly improved. For 20 selected concepts from the 39-concept LSCOM-Lite annotation set, we achieve the best F1 on almost all concepts. The overall improvement in micro F1 of the combined multi-level and cross-level grid systems over the best single-size grid system is about 12.1%.
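
A minimal sketch of a two-level grid representation with cross-level bigrams is given below. The grid sizes, codebooks, and nesting (four fine blocks per coarse block) are illustrative assumptions rather than the configuration used in the paper.

```python
# Illustrative hierarchical-grid sketch: quantize block features at two grid
# levels against separate codebooks, then collect codeword histograms plus
# cross-level bigram co-occurrences.
import numpy as np

def quantize(blocks, codebook):
    """Assign each block feature (n_blocks, dim) to its nearest codeword index."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def hg_representation(coarse_blocks, fine_blocks, cb_coarse, cb_fine):
    c = quantize(coarse_blocks, cb_coarse)
    f = quantize(fine_blocks, cb_fine)
    hist_c = np.bincount(c, minlength=len(cb_coarse))
    hist_f = np.bincount(f, minlength=len(cb_fine))
    # Cross-level bigrams: joint counts of (coarse codeword, fine codeword) for
    # the fine blocks nested inside each coarse block (here 4 fine per coarse).
    bigram = np.zeros((len(cb_coarse), len(cb_fine)))
    for i, ci in enumerate(c):
        for fj in f[4 * i:4 * i + 4]:
            bigram[ci, fj] += 1
    return np.concatenate([hist_c, hist_f, bigram.ravel()])

# Toy usage: 4 coarse and 16 fine blocks with 16-dim features and two codebooks.
rng = np.random.default_rng(2)
rep = hg_representation(rng.normal(size=(4, 16)), rng.normal(size=(16, 16)),
                        rng.normal(size=(8, 16)), rng.normal(size=(32, 16)))
print(rep.shape)
```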


Signal Processing Systems | 2014

An Efficient Gradient-based Approach to Optimizing Average Precision Through Maximal Figure-of-Merit Learning

Ilseo Kim; Chin-Hui Lee

We propose an efficient algorithm that directly optimizes a ranking performance measure, with a focus on class average precision (AP). Instead of the pair-wise ranking approximation used by conventional approaches to define a loss function, we use an efficient gradient-based approach that approximates a discrete ranking performance measure. In particular, AP is treated as a staircase function with respect to each individual sample score after rank ordering is applied to all samples. A combination of sigmoid functions is then applied to approximate the staircase AP function as a continuous and differentiable function of the model parameters used to compute the sample scores. Compared to the use of pair-wise rankings, the proposed approach substantially reduces the computational complexity to a manageable level when estimating model parameters with a gradient descent algorithm. In terms of explicitly optimizing a target performance metric, the proposed algorithm can be considered an extension of maximal figure-of-merit (MFoM) learning to the optimization of a ranking performance measure. Our experiments on two challenging image-retrieval datasets showcase the usefulness of the proposed framework in both improving AP and achieving learning efficiency.
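
The core construction can be sketched as follows: average precision is a staircase function of the sample scores, and replacing the rank-comparison indicators with sigmoids yields a differentiable surrogate. This is an illustrative approximation, not the exact formulation in the paper.

```python
# Hedged sketch: sigmoid-smoothed average precision as a differentiable
# surrogate for the discrete, staircase-shaped AP measure.
import numpy as np

def sigmoid(z, alpha=10.0):
    return 1.0 / (1.0 + np.exp(-alpha * z))

def smooth_ap(scores, labels):
    """scores: (n,) real-valued sample scores; labels: (n,) with 1 for relevant samples."""
    pos = np.where(labels == 1)[0]
    ap = 0.0
    for i in pos:
        # Soft count of samples ranked at or above sample i (self term removed).
        above_all = sigmoid(scores - scores[i])
        above_pos = above_all * labels
        # Soft precision at the rank of sample i; the sample itself counts once.
        ap += (1.0 + above_pos.sum() - above_pos[i]) / (1.0 + above_all.sum() - above_all[i])
    return ap / len(pos)

# Toy check: the smoothed value tracks the exact AP of this ranking closely.
labels = np.array([1, 0, 1, 0, 0, 1])
scores = np.array([2.0, 1.0, 0.5, -0.2, -1.0, -1.5])
print(round(smooth_ap(scores, labels), 3))
```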


International Workshop on Machine Learning for Signal Processing | 2011

Optimization of average precision with Maximal Figure-of-Merit Learning

Ilseo Kim; Chin-Hui Lee

We propose an efficient algorithm to directly optimize class average precision (AP) with a maximal figure-of-merit (MFoM) learning scheme. AP is considered as a staircase function with respect to each individual sample score after rank ordering is applied to all samples. A combination of sigmoid functions is then used to approximate AP as a continuously differentiable function of the classifier parameters used to compute the sample scores. Compared to pair-wise ranking comparisons, the computational complexity of the proposed MFoM-AP learning algorithm can be substantially reduced when estimating classifier parameters with a gradient descent algorithm. Experiments on the TRECVID 2005 high-level feature extraction task showed that the proposed algorithm effectively improves the mean average precision (MAP) over 39 concepts from a baseline of 0.4039 with MFoM maximizing F1 to 0.4274 with MFoM-AP, with improvements of more than 10% for 12 concepts.


Pattern Recognition Letters | 2016

Image-oriented economic perspective on user behavior in multimedia social forums

Sangmin Oh; Megha Pandey; Ilseo Kim; Anthony Hoogs

Highlights: clustering diverse images shared on social forums produces meaningful groups; user behavior patterns on social media can be characterized with image distributions; users exhibit diverse preference patterns for the images they engage with; users often exhibit distinct patterns between supply and consumption behavior; salient users can be identified by non-parametric statistical anomaly analysis.

This work addresses the novel problem of analyzing individual users' behavioral patterns regarding images shared on social forums. In particular, we present an image-oriented economic perspective: the first activity mode of sharing or posting on social forums is interpreted as supply, and another mode of activity, such as commenting on images, is interpreted as consumption. First, we show that, despite their significant diversity, images in social forums can be clustered into semantically meaningful groups using modern computer vision techniques. Then, users' supply and consumption profiles are characterized based on the distribution of images they engage with. We then present various statistical analyses on real-world data, which show that there is a significant difference between the images users supply and consume. This finding suggests that the flow of images on social networks should be modeled as a bi-directional graph. In addition, we introduce a statistical approach to identify users with salient profiles. This approach can be useful for social multimedia services to block users with undesirable behavior or to identify viral content and promote it.
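
One simple, non-parametric way to realize the salient-user analysis is sketched below; the use of Jensen-Shannon divergence and a percentile cutoff is an assumption for illustration, not necessarily the statistic used in the paper.

```python
# Hedged sketch: flag salient users by comparing each user's supply and
# consumption distributions over image clusters and marking the largest
# divergences as anomalous (illustrative assumption).
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two histograms (normalized inside)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def salient_users(supply, consume, percentile=95):
    """supply, consume: (n_users, n_clusters) histograms of image-cluster activity."""
    div = np.array([js_divergence(s, c) for s, c in zip(supply, consume)])
    return np.where(div >= np.percentile(div, percentile))[0], div

# Toy usage: 100 users over 10 image clusters, one user with a skewed profile.
rng = np.random.default_rng(3)
supply = rng.poisson(5.0, size=(100, 10)).astype(float)
consume = rng.poisson(5.0, size=(100, 10)).astype(float)
consume[7] = np.array([50.0] + [0.0] * 9)
flagged, divergences = salient_users(supply, consume)
print(flagged)
```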


International Conference on Multimedia and Expo | 2012

Per-Exemplar Fusion Learning for Video Retrieval and Recounting

Ilseo Kim; Sangmin Oh; A. G. Amitha Perera; Chin-Hui Lee

We propose a novel video retrieval framework based on an extension of per-exemplar learning [7]. Each training sample with multiple types of features (e.g., audio and visual) is regarded as an exemplar. For each exemplar, a localized per-exemplar distance function is learned and used to measure the similarity between the exemplar and new test samples. Exemplars associate only with sufficiently similar test data, and these associations accumulate to identify the data to be retrieved. In particular, for every exemplar, the relevance of each feature type is discriminatively analyzed, and the effect of less informative features is minimized during the fusion-based associations. In addition, we show that our framework enables a rich set of recounting capabilities, where the rationale for each retrieval result can be automatically described to users to aid their interaction with the system. We show that our system provides competitive retrieval accuracy against strong baseline methods, while adding the benefits of recounting.
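
The per-exemplar association step can be pictured with the following sketch, where the per-exemplar fusion weights and thresholds are hypothetical placeholders rather than the learned distance functions of the paper.

```python
# Hedged sketch of per-exemplar fusion: each exemplar fuses per-feature
# distances with its own weights and votes for test samples within its
# association threshold; votes accumulate into a retrieval score.
import numpy as np

def associate(exemplar_feats, exemplar_weights, exemplar_tau, test_feats):
    """Return per-test-sample vote counts accumulated over all exemplars.

    exemplar_feats:   dict feature_type -> (n_exemplars, dim)
    exemplar_weights: (n_exemplars, n_feature_types) per-exemplar fusion weights
    exemplar_tau:     (n_exemplars,) association thresholds
    test_feats:       dict feature_type -> (n_test, dim)
    """
    types = list(exemplar_feats)
    n_ex = exemplar_weights.shape[0]
    n_test = next(iter(test_feats.values())).shape[0]
    votes = np.zeros(n_test)
    for e in range(n_ex):
        # Weighted fusion of per-feature Euclidean distances for this exemplar.
        d = np.zeros(n_test)
        for t_i, t in enumerate(types):
            d += exemplar_weights[e, t_i] * np.linalg.norm(
                test_feats[t] - exemplar_feats[t][e], axis=1)
        votes += (d < exemplar_tau[e]).astype(float)
    return votes  # higher vote counts -> stronger retrieval candidates

# Toy usage: 3 exemplars and 5 test samples with audio and visual features.
rng = np.random.default_rng(4)
ex = {"audio": rng.normal(size=(3, 4)), "visual": rng.normal(size=(3, 6))}
te = {"audio": rng.normal(size=(5, 4)), "visual": rng.normal(size=(5, 6))}
print(associate(ex, np.full((3, 2), 0.5), np.full(3, 2.0), te))
```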


Conference of the International Speech Communication Association | 2012

Consumer-level multimedia event detection through unsupervised audio signal modeling

Byungki Byun; Ilseo Kim; Sabato Marco Siniscalchi; Chin-Hui Lee

Collaboration


Dive into Ilseo Kim's collaborations.

Top Co-Authors


Chin-Hui Lee

Georgia Institute of Technology


Byungki Byun

Georgia Institute of Technology


Arash Vahdat

Simon Fraser University


Greg Mori

Simon Fraser University
