Menghua Zhai
University of Kentucky
Publications
Featured research published by Menghua Zhai.
computer vision and pattern recognition | 2016
Menghua Zhai; Scott Workman; Nathan Jacobs
We propose a novel method for detecting horizontal vanishing points and the zenith vanishing point in man-made environments. The dominant trend in existing methods is to first find candidate vanishing points, then remove outliers by enforcing mutual orthogonality. Our method reverses this process: we propose a set of horizon line candidates and score each based on the vanishing points it contains. A key element of our approach is the use of global image context, extracted with a deep convolutional network, to constrain the set of candidates under consideration. Our method does not make a Manhattan-world assumption and can operate effectively on scenes with only a single horizontal vanishing point. We evaluate our approach on three benchmark datasets and achieve state-of-the-art performance on each. In addition, our approach is significantly faster than the previous best method.
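The candidate-scoring idea is easy to prototype. The sketch below is a toy reconstruction, not the authors' code: a Gaussian prior over (offset, angle) stands in for the CNN-derived global context, each segment's line is intersected with a horizon candidate to propose a vanishing point, and the candidate whose best vanishing point gathers the most support from the other segments wins. All function names are hypothetical.
```python
import numpy as np

def horizon_from_offset_angle(offset, angle):
    """A horizon candidate as a homogeneous line l = (a, b, c)."""
    return np.array([np.sin(angle), np.cos(angle), -offset])

def score_candidate(horizon, segments):
    """Toy score: intersect each segment's line with the horizon to propose a
    vanishing point, then count how strongly the *other* segments agree with
    it (a stand-in for the paper's scoring function)."""
    lines = [np.cross(np.append(p0, 1.0), np.append(p1, 1.0))
             for p0, p1 in segments]
    mids = [(p0 + p1) / 2 for p0, p1 in segments]
    dirs = [(p1 - p0) / np.linalg.norm(p1 - p0) for p0, p1 in segments]
    best = 0.0
    for i, l in enumerate(lines):
        vp = np.cross(l, horizon)                 # candidate vanishing point
        if abs(vp[2]) < 1e-9:
            continue                              # segment parallel to horizon
        vp = vp[:2] / vp[2]
        support = sum(abs(d @ (vp - m) / (np.linalg.norm(vp - m) + 1e-9))
                      for j, (m, d) in enumerate(zip(mids, dirs)) if j != i)
        best = max(best, support)
    return best

def detect_horizon(segments, n_candidates=500, rng=np.random.default_rng(0)):
    # In the paper the candidate distribution comes from a deep network;
    # a Gaussian prior over (offset, angle) stands in for it here.
    cands = [horizon_from_offset_angle(o, a)
             for o, a in zip(rng.normal(0, 50, n_candidates),
                             rng.normal(0, 0.1, n_candidates))]
    return max(cands, key=lambda h: score_candidate(h, segments))

segs = [(np.array([10., 100.]), np.array([200., 90.])),
        (np.array([50., 150.]), np.array([220., 120.]))]
print(detect_horizon(segs))
```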
british machine vision conference | 2016
Scott Workman; Menghua Zhai; Nathan Jacobs
The horizon line is an important contextual attribute for a wide variety of image understanding tasks. As such, many methods have been proposed to estimate its location from a single image. These methods typically require the image to contain specific cues, such as vanishing points, coplanar circles, and regular textures, thus limiting their real-world applicability. We introduce a large, realistic evaluation dataset, Horizon Lines in the Wild (HLW), containing natural images with labeled horizon lines. Using this dataset, we investigate the application of convolutional neural networks for directly estimating the horizon line, without requiring any explicit geometric constraints or other special cues. An extensive evaluation shows that using our CNNs, either in isolation or in conjunction with a previous geometric approach, we achieve state-of-the-art results on the challenging HLW dataset and two existing benchmark datasets.
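A minimal PyTorch sketch of the direct-estimation idea, assuming a (slope, offset) output parameterization and an off-the-shelf ResNet backbone; the paper's actual architecture and parameterization differ.
```python
import torch
import torch.nn as nn
import torchvision.models as models

class HorizonRegressor(nn.Module):
    """Backbone plus a small head that regresses a (slope, offset) horizon
    parameterization from a single image."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # any ImageNet-style CNN works here
        backbone.fc = nn.Identity()                # expose the 512-d pooled features
        self.backbone = backbone
        self.head = nn.Linear(512, 2)              # -> (slope, offset)

    def forward(self, x):
        return self.head(self.backbone(x))

model = HorizonRegressor()
imgs = torch.randn(4, 3, 224, 224)                 # a dummy image batch
targets = torch.randn(4, 2)                        # labeled horizons, e.g. from HLW
loss = nn.functional.huber_loss(model(imgs), targets)  # robust regression loss
loss.backward()
```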
international conference on image processing | 2015
Scott Workman; Connor Greenwell; Menghua Zhai; Ryan Baltenberger; Nathan Jacobs
Estimating the focal length of an image is an important preprocessing step for many applications. Despite this, existing methods for single-view focal length estimation are limited in that they require particular geometric calibration objects, such as orthogonal vanishing points, co-planar circles, or a calibration grid, to occur in the field of view. In this work, we explore the application of a deep convolutional neural network, trained on natural images obtained from Internet photo collections, to directly estimate the focal length using only raw pixel intensities as input features. We present quantitative results that demonstrate the ability of our technique to estimate the focal length with comparisons against several baseline methods, including an automatic method which uses orthogonal vanishing points.
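One plausible setup, assumed here for illustration rather than taken from the paper: discretize the field of view into bins, classify, and convert the predicted field of view into a focal length via f = (w / 2) / tan(fov / 2). The backbone and bin range are placeholders.
```python
import torch
import torch.nn as nn
import torchvision.models as models

FOV_BINS = torch.linspace(20.0, 120.0, 61)         # candidate fields of view (degrees)

cnn = models.resnet18(weights=None)                # placeholder backbone
cnn.fc = nn.Linear(cnn.fc.in_features, len(FOV_BINS))

def focal_from_fov(fov_deg, image_width):
    """f = (w / 2) / tan(fov / 2): focal length in pixels from the FoV."""
    return (image_width / 2.0) / torch.tan(torch.deg2rad(fov_deg) / 2.0)

logits = cnn(torch.randn(1, 3, 224, 224))          # raw pixels in, bin scores out
fov = FOV_BINS[logits.argmax(dim=1)]               # most likely field of view
print(focal_from_fov(fov, image_width=1024))
```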
computer vision and pattern recognition | 2017
Menghua Zhai; Zachary Bessinger; Scott Workman; Nathan Jacobs
We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive transformation to map these features into the ground-level perspective. We use an end-to-end learning approach to minimize the difference between the semantic segmentation extracted directly from the ground image and the semantic segmentation predicted solely based on the aerial image. We show that a model learned using this strategy, with no additional training, is already capable of rough semantic labeling of aerial imagery. Furthermore, we demonstrate that by finetuning this model we can achieve more accurate semantic segmentation than two baseline initialization strategies. We use our network to address the task of estimating the geolocation and geo-orientation of a ground image. Finally, we show how features extracted from an aerial image can be used to hallucinate a plausible ground-level panorama.
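The training signal can be sketched as follows. A fixed sampling grid stands in for the paper's adaptive, input-conditioned transformation, a tiny CNN replaces the real feature extractor, and the "noisy labels" would in practice come from an off-the-shelf ground-level segmentation network; all of these stand-ins are assumptions.
```python
import torch
import torch.nn as nn

class AerialToGround(nn.Module):
    """Extract features from an aerial image, then map them into the
    ground-level viewpoint via a sampling grid."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_classes, 3, padding=1),
        )

    def forward(self, aerial, grid):
        f = self.features(aerial)                  # per-pixel class scores, aerial view
        # grid: (B, H, W, 2) in [-1, 1], mapping ground pixels to aerial locations
        return nn.functional.grid_sample(f, grid, align_corners=False)

model = AerialToGround()
aerial = torch.randn(2, 3, 256, 256)
grid = torch.rand(2, 128, 512, 2) * 2 - 1          # hypothetical cross-view mapping
pred = model(aerial, grid)                         # predicted ground-view scores
noisy = torch.randint(0, 4, (2, 128, 512))         # labels from a ground segmentation net
loss = nn.functional.cross_entropy(pred, noisy)    # end-to-end cross-view loss
loss.backward()
```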
workshop on applications of computer vision | 2016
Tawfiq Salem; Scott Workman; Menghua Zhai; Nathan Jacobs
Given an image, we propose to use the appearance of people in the scene to estimate when the picture was taken. There are a wide variety of cues that can be used to address this problem. Most previous work has focused on low-level image features, such as color and vignetting. Recent work on image dating has used more semantic cues, such as the appearance of automobiles and buildings. We extend this line of research by focusing on human appearance. Our approach, based on a deep convolutional neural network, allows us to more deeply explore the relationship between human appearance and time. We find that clothing, hair styles, and glasses can all be informative features. To support our analysis, we have collected a new dataset containing images of people from many high school yearbooks, covering the years 1912-2014. While not a complete solution to the problem of image dating, our results show that human appearance is strongly related to time and that semantic information can be a useful cue.
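One natural formulation, assumed here rather than confirmed from the paper, treats the capture year as a class label; taking the expectation over the predicted year distribution then yields a soft estimate of when the picture was taken.
```python
import torch
import torch.nn as nn
import torchvision.models as models

YEARS = list(range(1912, 2015))                    # label space of the yearbook data

net = models.resnet18(weights=None)                # placeholder backbone
net.fc = nn.Linear(net.fc.in_features, len(YEARS))

crop = torch.randn(1, 3, 224, 224)                 # a cropped person image
probs = net(crop).softmax(dim=1)
soft_year = (probs * torch.tensor(YEARS, dtype=torch.float)).sum()
print(float(soft_year))                            # expected capture year
```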
workshop on applications of computer vision | 2016
Ryan Baltenberger; Menghua Zhai; Connor Greenwell; Scott Workman; Nathan Jacobs
We propose the use of deep convolutional neural networks to estimate the transient attributes of a scene from a single image. Transient scene attributes describe both the objective conditions, such as the weather, time of day, and the season, and subjective properties of a scene, such as whether or not the scene seems busy. Recently, convolutional neural networks have been used to achieve state-of-the-art results for many vision problems, from object detection to scene classification, but have not previously been used for estimating transient attributes. We compare several methods for adapting an existing network architecture and present state-of-the-art results on two benchmark datasets. Our method is more accurate and significantly faster than previous methods, enabling real-world applications.
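Adapting an existing architecture can be as simple as replacing the final classification layer with one sigmoid output per attribute, since transient attributes are graded rather than mutually exclusive. A minimal sketch, with a placeholder backbone and attribute list:
```python
import torch
import torch.nn as nn
import torchvision.models as models

ATTRIBUTES = ["sunny", "cloudy", "winter", "busy"] # a few example transient attributes

net = models.resnet18(weights=None)                # stand-in for the adapted network
net.fc = nn.Linear(net.fc.in_features, len(ATTRIBUTES))

imgs = torch.randn(8, 3, 224, 224)
targets = torch.rand(8, len(ATTRIBUTES))           # annotated strengths in [0, 1]
loss = nn.functional.binary_cross_entropy_with_logits(net(imgs), targets)
loss.backward()
```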
international conference on image processing | 2014
Feiyu Shi; Menghua Zhai; Drew Duncan; Nathan Jacobs
Principal component analysis (PCA) is a widely used technique for dimensionality reduction which assumes that the input data can be represented as a collection of fixed-length vectors. Many real-world datasets, such as those constructed from Internet photo collections, do not satisfy this assumption. A natural approach to addressing this problem is to first coerce all input data to a fixed size, and then use standard PCA techniques. This approach is problematic because it either introduces artifacts when we must upsample an image, or loses information when we must downsample an image. We propose MPCA, an approach for estimating the PCA decomposition from multi-sized input data which avoids this initial resizing step. We demonstrate the effectiveness of this approach on simulated and real-world datasets.
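To make the estimation problem concrete, here is an alternating least-squares sketch for recovering a shared basis from observations seen through known per-sample downsampling maps. This illustrates the problem MPCA solves but is not the paper's algorithm; the zero-mean model and average-pooling maps are simplifications.
```python
import numpy as np

def downsample_matrix(m, d):
    """Average-pooling map from a length-d signal to length m (d % m == 0)."""
    S = np.zeros((m, d))
    r = d // m
    for a in range(m):
        S[a, a * r:(a + 1) * r] = 1.0 / r
    return S

def multisize_pca(ys, Ss, d, k, n_iters=50, rng=np.random.default_rng(0)):
    """Alternating least squares for y_i ~ S_i @ W @ z_i with a shared basis W."""
    W = rng.normal(size=(d, k))
    for _ in range(n_iters):
        # Per-sample coefficients given the current basis.
        zs = [np.linalg.lstsq(S @ W, y, rcond=None)[0] for y, S in zip(ys, Ss)]
        # Refit the basis jointly from all samples (linear in vec(W)).
        A = np.vstack([np.kron(S, z[None, :]) for S, z in zip(Ss, zs)])
        W = np.linalg.lstsq(A, np.concatenate(ys), rcond=None)[0].reshape(d, k)
    return np.linalg.qr(W)[0]                      # orthonormalized basis estimate

# Toy data: length-16 signals observed at mixed resolutions (16, 8, or 4).
rng = np.random.default_rng(1)
d, k = 16, 3
W_true = rng.normal(size=(d, k))
sizes = rng.choice([16, 8, 4], size=200)
Ss = [downsample_matrix(m, d) for m in sizes]
ys = [S @ (W_true @ rng.normal(size=k)) for S in Ss]
W_est = multisize_pca(ys, Ss, d, k)
```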
international conference on pattern recognition | 2014
Menghua Zhai; Feiyu Shi; Drew Duncan; Nathan Jacobs
Principal component analysis (PCA) is used in diverse settings for dimensionality reduction. If data elements are all the same size, there are many approaches to estimating the PCA decomposition of the dataset. However, many datasets contain elements of different sizes that must be coerced into a fixed size before analysis. This coercion introduces errors into the resulting PCA decomposition. We introduce CO-MPCA, a nonlinear method of directly estimating the PCA decomposition from datasets with elements of different sizes. We compare our method with two baseline approaches on three datasets: a synthetic vector dataset, a synthetic image dataset, and a real dataset of color histograms extracted from surveillance video. We provide quantitative and qualitative evidence that using CO-MPCA gives a more accurate estimate of the PCA basis.
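For contrast, the fixed-size baseline that this line of work compares against fits in a few lines: interpolate every element to a common length, run ordinary PCA, and measure how far the recovered subspace drifts from the truth. The toy data and error metric below are purely illustrative.
```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 16, 3
W_true = np.linalg.qr(rng.normal(size=(d, k)))[0]  # ground-truth basis
xs = [W_true @ rng.normal(size=k) for _ in range(300)]
ys = [x[::s] for x, s in zip(xs, rng.choice([1, 2, 4], size=300))]

def resize_then_pca(ys, d, k):
    """The baseline: interpolate everything to length d, then standard PCA."""
    X = np.stack([np.interp(np.linspace(0, 1, d),
                            np.linspace(0, 1, len(y)), y) for y in ys])
    X = X - X.mean(axis=0)
    return np.linalg.svd(X, full_matrices=False)[2][:k].T

def largest_principal_angle(A, B):
    """How far apart two k-dimensional subspaces are, in radians."""
    s = np.linalg.svd(np.linalg.qr(A)[0].T @ np.linalg.qr(B)[0], compute_uv=False)
    return np.arccos(np.clip(s.min(), -1.0, 1.0))

print(largest_principal_angle(resize_then_pca(ys, d, k), W_true))
```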
european conference on computer vision | 2018
Samuel Schulter; Menghua Zhai; Nathan Jacobs; Manmohan Chandraker
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem not only requires an accurate understanding of both the 3D geometry and the semantics of the visible scene, but also of occluded areas. We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our model does not require costly or subjective human annotations for occluded areas or the top-view, but rather uses readily available annotations for standard semantic segmentation. We extensively evaluate and analyze our approach on the KITTI and Cityscapes data sets.
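The geometric half of the pipeline, mapping per-pixel semantics and depth into a bird's-eye grid, can be sketched directly; the CNN that in-paints `depth` and `labels` behind foreground objects is the paper's contribution and is not reproduced here. The camera parameters and grid resolution below are arbitrary assumptions.
```python
import numpy as np

def to_top_view(depth, labels, f, cx, grid=(64, 64), cell=0.5):
    """Project per-pixel semantics into a bird's-eye grid using depth.
    Later writes overwrite earlier ones; a real implementation would z-buffer."""
    h, w = depth.shape
    us = np.meshgrid(np.arange(w), np.arange(h))[0]    # pixel column indices
    x = (us - cx) * depth / f                          # lateral offset (meters)
    z = depth                                          # forward distance (meters)
    gx = np.int32(x / cell) + grid[1] // 2             # grid column
    gz = np.int32(z / cell)                            # grid row
    top = np.full(grid, -1, dtype=np.int32)            # -1 marks unobserved cells
    ok = (gx >= 0) & (gx < grid[1]) & (gz >= 0) & (gz < grid[0])
    top[gz[ok], gx[ok]] = labels[ok]
    return top

depth = np.random.uniform(2.0, 30.0, (128, 256))       # stand-in for predicted depth
labels = np.random.randint(0, 5, (128, 256))           # stand-in for predicted semantics
bev = to_top_view(depth, labels, f=220.0, cx=128.0)
```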
international conference on image processing | 2016
Menghua Zhai; Scott Workman; Nathan Jacobs
We address the problem of single-image geo-calibration, in which an estimate of the geographic location, viewing direction and field of view is sought for the camera that captured an image. The dominant approach to this problem is to match features of the query image, using color and texture, against a reference database of nearby ground imagery. However, this fails when such imagery is not available. We propose to overcome this limitation by matching against a geographic database that contains the locations of known objects, such as houses, roads and bodies of water. Since we are unable to find one-to-one correspondences between image locations and objects in our database, we model the problem probabilistically based on the geometric configuration of multiple such weak correspondences. We propose a Markov Chain Monte Carlo (MCMC) sampling approach to approximate the underlying probability distribution over the full geo-calibration of the camera.
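A generic random-walk Metropolis sampler over the calibration parameters shows the shape of the approach. The paper's actual likelihood is built from the weak correspondences with map objects and is not reproduced here; a stand-in Gaussian `log_prob` and hypothetical coordinates are used instead.
```python
import numpy as np

def metropolis_calibrate(log_prob, init, n_steps=20000, step=0.05,
                         rng=np.random.default_rng(0)):
    """Random-walk Metropolis over theta = (lat, lon, heading, fov)."""
    theta = np.asarray(init, dtype=float)
    lp = log_prob(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + rng.normal(scale=step, size=theta.shape)
        lp_prop = log_prob(prop)
        if np.log(rng.uniform()) < lp_prop - lp:       # accept w.p. min(1, ratio)
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Stand-in likelihood: a Gaussian bump around a hypothetical true calibration.
true_theta = np.array([38.03, -84.50, 1.2, 0.9])
log_prob = lambda t: -0.5 * np.sum(((t - true_theta) / 0.02) ** 2)
posterior = metropolis_calibrate(log_prob, init=true_theta + 0.1)
print(posterior.mean(axis=0))                          # posterior mean calibration
```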