Publication


Featured research published by Kihyuk Sohn.


european conference on computer vision | 2016

Attribute2Image: Conditional Image Generation from Visual Attributes

Xinchen Yan; Jimei Yang; Kihyuk Sohn; Honglak Lee

This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. With this inference procedure, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
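The layered model above is trained as a variational auto-encoder. As a generic illustration (not the paper's architecture), the sketch below shows the two terms of the standard VAE objective, reconstruction error plus a closed-form KL penalty, and the reparameterized sampling step, assuming a diagonal-Gaussian posterior; all array shapes and names here are invented for the example.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO for a diagonal-Gaussian VAE: squared-error
    reconstruction plus the closed-form KL divergence between
    q(z|x) = N(mu, exp(log_var)) and the prior N(0, I)."""
    recon = np.sum((x - x_recon) ** 2)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

def sample_z(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through the sampling step during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

With a perfect reconstruction and a posterior equal to the prior, both terms vanish, which is a quick sanity check on the signs.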


international conference on computer vision | 2011

Efficient learning of sparse, distributed, convolutional feature representations for object recognition

Kihyuk Sohn; Dae Yon Jung; Honglak Lee; Alfred O. Hero

Informative image representations are important in achieving state-of-the-art performance in object recognition tasks. Among feature learning algorithms that are used to develop image representations, restricted Boltzmann machines (RBMs) have good expressive power and build effective representations. However, the difficulty of training RBMs has been a barrier to their wide use. To address this difficulty, we show the connections between mixture models and RBMs and present an efficient training method for RBMs that utilizes these connections. To the best of our knowledge, this is the first work showing that RBMs can be trained with almost no hyperparameter tuning to provide classification performance similar to or significantly better than mixture models (e.g., Gaussian mixture models). Along with this efficient training, we evaluate the importance of convolutional training that can capture a larger spatial context with less redundancy, as compared to non-convolutional training. Overall, our method achieves state-of-the-art performance on both the Caltech 101 and Caltech 256 datasets using a single type of feature.
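For readers unfamiliar with why RBM training is considered difficult, the sketch below implements one contrastive-divergence (CD-1) update for a plain binary RBM in NumPy. This is a generic illustration of standard RBM learning, not the paper's convolutional variant or its mixture-model-based training method; the learning rate and all shapes are made-up values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, rng, lr=0.2):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    v0: (n, d) batch of visible vectors; W: (d, h) weights;
    b: (d,) visible biases; c: (h,) hidden biases. Updates in place
    and returns the reconstruction error before the update."""
    ph0 = sigmoid(v0 @ W + c)                       # P(h=1 | v0), positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)                     # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + c)                      # negative phase
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return np.mean((v0 - pv1) ** 2)
```

Repeating this step on a fixed pattern drives the reconstruction error down; the hyperparameter sensitivity the abstract refers to shows up as divergence when the learning rate is set too high.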


computer vision and pattern recognition | 2015

Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction

Yuting Zhang; Kihyuk Sohn; Ruben Villegas; Gang Pan; Honglak Lee

Object detection systems based on the deep convolutional neural network (CNN) have recently made ground-breaking advances on several object detection benchmarks. While the features learned by these high-capacity neural networks are discriminative for categorization, inaccurate localization is still a major source of error for detection. Building upon high-capacity CNN architectures, we address the localization problem by 1) using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and 2) training the CNN with a structured loss that explicitly penalizes the localization inaccuracy. In experiments, we demonstrate that each of the proposed methods improves the detection performance over the baseline method on the PASCAL VOC 2007 and 2012 datasets. Furthermore, the two methods are complementary and significantly outperform the previous state-of-the-art when combined.
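The paper's search runs Bayesian optimization over bounding-box coordinates; that full setup is too long to sketch, so the toy below runs the same propose-evaluate loop in one dimension: a Gaussian-process surrogate with an RBF kernel plus an expected-improvement acquisition maximized over a grid. Everything here (kernel, length scale, grid, the score function being maximized) is an illustrative assumption, not the paper's configuration.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(xs, ys, grid, noise=1e-6):
    """GP posterior mean and variance on `grid` given data (xs, ys)."""
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(xs, grid)
    mu = Ks.T @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI acquisition: expected gain over the best observed value."""
    sd = np.sqrt(var)
    z = (mu - best) / sd
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (mu - best) * cdf + sd * pdf

def bayes_opt(f, iters=15):
    """Sequentially propose the grid point maximizing EI, evaluate, refit."""
    xs = np.array([0.1, 0.9]); ys = f(xs)
    grid = np.linspace(0.0, 1.0, 200)
    for _ in range(iters):
        mu, var = gp_posterior(xs, ys, grid)
        x_next = grid[np.argmax(expected_improvement(mu, var, ys.max()))]
        xs = np.append(xs, x_next)
        ys = np.append(ys, f(np.array([x_next]))[0])
    return xs[np.argmax(ys)]
```

In the paper's setting the scalar input would be replaced by box coordinates and f by the CNN's detection score for the cropped region; the sequential propose-score structure is the same.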


computer vision and pattern recognition | 2013

Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

Andrew Kae; Kihyuk Sohn; Honglak Lee; Erik G. Learned-Miller

Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate its labeling performance for the parts of complex face images from the Labeled Faces in the Wild dataset. This hybrid model produces results that are both quantitatively and qualitatively better than the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.


international conference on computer vision | 2017

Towards Large-Pose Face Frontalization in the Wild

Xi Yin; Xiang Yu; Kihyuk Sohn; Xiaoming Liu; Manmohan Chandraker

Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments. Learning pose-invariant features is one solution, but requires expensively labeled large-scale data and carefully designed feature learning algorithms. In this work, we focus on frontalizing faces in the wild under various head poses, including extreme profile views. We propose a novel deep 3D Morphable Model (3DMM) conditioned Face Frontalization Generative Adversarial Network (GAN), termed FF-GAN, to generate neutral-head-pose face images. Our framework differs from both traditional GANs and 3DMM-based modeling. Incorporating the 3DMM into the GAN structure provides shape and appearance priors for fast convergence with less training data, while also supporting end-to-end training. The 3DMM-conditioned GAN employs not only the discriminator and generator losses but also a new masked symmetry loss to retain visual quality under occlusions, as well as an identity loss to recover high-frequency information. Experiments on face recognition, landmark localization, and 3D reconstruction consistently show the advantage of our frontalization method on in-the-wild face datasets.


international conference on communications | 2012

An interpretation of the Cover and Leung capacity region for the MAC with feedback through stochastic control

Achilleas Anastasopoulos; Kihyuk Sohn

We consider the problem of communication over a multiple access channel (MAC) with noiseless feedback. A single-letter characterization of the capacity of this channel is not currently known in general. We formulate the MAC with feedback capacity problem as a stochastic control problem for a special class of channels for which the capacity is known to be the single-letter expression given by Cover and Leung. This approach has been recently successful in finding channel capacity for point-to-point channels with noiseless feedback but has not yet been fruitful in the study of multi-user communication systems. Our interpretation provides an understanding of the role of auxiliary random variables and can also hint at on-line capacity-achieving transmission schemes.


siam international conference on data mining | 2016

Discriminative Training of Structured Dictionaries via Block Orthogonal Matching Pursuit

Wenling Shang; Kihyuk Sohn; Honglak Lee; Anna C. Gilbert

It is well established that high-level representations learned via sparse coding are effective for many machine learning applications such as denoising and classification. In addition to being reconstructive, sparse representations that are discriminative and invariant can further help with such applications. In order to achieve these desired properties, this paper proposes a new framework that discriminatively trains structured dictionaries via block orthogonal matching pursuit. Specifically, the dictionary atoms are assumed to be organized into blocks. Distinct classes correspond to distinct blocks of dictionary atoms; however, our algorithm can handle the case where multiple classes share blocks. We provide theoretical justification and empirical evaluation of our method.
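As a generic illustration of the block orthogonal matching pursuit step the framework builds on, the NumPy sketch below greedily selects whole blocks of dictionary atoms by residual correlation and refits the coefficients of all selected blocks by least squares. The dictionary, block layout, and sizes are invented for the example; the paper's discriminative training of the dictionary itself is not shown.

```python
import numpy as np

def block_omp(D, blocks, y, n_blocks):
    """Block orthogonal matching pursuit (n_blocks >= 1).
    D: (d, k) dictionary with unit-norm columns; blocks: list of index
    arrays partitioning the columns of D; y: (d,) signal to encode."""
    residual = y.copy()
    selected = []
    for _ in range(n_blocks):
        # pick the unselected block whose atoms correlate most with the residual
        scores = [np.linalg.norm(D[:, b].T @ residual) if i not in selected
                  else -1.0 for i, b in enumerate(blocks)]
        selected.append(int(np.argmax(scores)))
        # refit coefficients over all selected blocks jointly (the
        # "orthogonal" step), then update the residual
        idx = np.concatenate([blocks[i] for i in selected])
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x
```

Because classes correspond to blocks, the indices of the recovered nonzero blocks can be read off as a class prediction, which is the property the discriminative training in the paper exploits.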


neural information processing systems | 2015

Learning structured output representation using deep conditional generative models

Kihyuk Sohn; Xinchen Yan; Honglak Lee


international conference on machine learning | 2012

Learning Invariant Representations with Local Transformations

Kihyuk Sohn; Honglak Lee


international conference on machine learning | 2014

Learning to Disentangle Factors of Variation with Manifold Interaction

Scott E. Reed; Kihyuk Sohn; Yuting Zhang; Honglak Lee

Collaboration


Dive into Kihyuk Sohn's collaborations.

Top Co-Authors

Honglak Lee (University of Michigan)

Guanyu Zhou (University of Michigan)

Xiaoming Liu (Michigan State University)

Xi Yin (Michigan State University)