Publication


Featured research published by Gary B. Huang.


International Conference on Computer Vision | 2007

Unsupervised Joint Alignment of Complex Images

Gary B. Huang; Vidit Jain; Erik G. Learned-Miller

Many recognition algorithms depend on careful positioning of an object into a canonical pose, so the position of features relative to a fixed coordinate system can be examined. Currently, this positioning is done either manually or by training a class-specialized learning algorithm with samples of the class that have been hand-labeled with parts or poses. In this paper, we describe a novel method to achieve this positioning using poorly aligned examples of a class with no additional labeling. Given a set of unaligned exemplars of a class, such as faces, we automatically build an alignment mechanism, without any additional labeling of parts or poses in the data set. Using this alignment mechanism, new members of the class, such as faces resulting from a face detector, can be precisely aligned for the recognition process. Our alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images. We also demonstrate its use on an entirely different class of objects (cars), again without providing any information about parts or pose to the learning algorithm.
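The alignment mechanism described above builds on congealing: repeatedly transforming each example so that the entropy of the image stack decreases. The following is a toy sketch of that entropy-minimization loop, restricted to horizontal pixel shifts on binary images; the paper searches a far richer transformation space over learned features, and all function names here are illustrative.

```python
import numpy as np

def stack_entropy(images):
    # Sum of per-pixel binary entropies across the image stack.
    p = images.mean(axis=0)
    eps = 1e-12
    return -np.sum(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

def congeal(images, shifts=(-1, 0, 1), n_iters=10):
    # Greedy congealing: for each image, pick the horizontal shift that
    # lowers the entropy of the whole stack; repeat until no image improves.
    images = [img.copy() for img in images]
    for _ in range(n_iters):
        improved = False
        for i, img in enumerate(images):
            best = stack_entropy(np.stack(images))
            best_img = img
            for s in shifts:
                cand = np.roll(img, s, axis=1)
                images[i] = cand
                e = stack_entropy(np.stack(images))
                if e < best:
                    best, best_img, improved = e, cand, True
            images[i] = best_img
        if not improved:
            break
    return images
```

Once congealing converges, the sequence of transformations found for the training stack defines the alignment funnel that new class members are pushed through.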


Computer Vision and Pattern Recognition | 2012

Learning hierarchical representations for face verification with convolutional deep belief networks

Gary B. Huang; Honglak Lee; Erik G. Learned-Miller

Most modern face recognition systems rely on a feature representation given by a hand-crafted image descriptor, such as Local Binary Patterns (LBP), and achieve improved performance by combining several such representations. In this paper, we propose deep learning as a natural source for obtaining additional, complementary representations. To learn features in high-resolution images, we make use of convolutional deep belief networks. Moreover, to take advantage of global structure in an object class, we develop local convolutional restricted Boltzmann machines, a novel convolutional learning model that exploits the global structure by not assuming stationarity of features across the image, while maintaining scalability and robustness to small misalignments. We also present a novel application of deep learning to descriptors other than pixel intensity values, such as LBP. In addition, we compare performance of networks trained using unsupervised learning against networks with random filters, and empirically show that learning weights not only is necessary for obtaining good multilayer representations, but also provides robustness to the choice of the network architecture parameters. Finally, we show that a recognition system using only representations obtained from deep learning can achieve comparable accuracy with a system using a combination of hand-crafted image descriptors. Moreover, by combining these representations, we achieve state-of-the-art results on a real-world face verification database.
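For reference, the Local Binary Patterns descriptor that the paper uses both as a hand-crafted baseline and as an alternative network input assigns each pixel an 8-bit code from comparisons with its eight neighbours. A minimal NumPy sketch of the basic single-radius variant (without the uniform-pattern refinement commonly used in practice):

```python
import numpy as np

def lbp_codes(img):
    # Basic 8-neighbour Local Binary Pattern: each interior pixel becomes
    # an 8-bit code, one bit per neighbour that is >= the centre value.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neigh >= center).astype(int) << bit)
    return codes
```

Histograms of these codes over image regions form the descriptor; the paper's contribution is learning complementary representations on top of inputs like these rather than the descriptor itself.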


Computer Vision and Pattern Recognition | 2008

Towards unconstrained face recognition

Gary B. Huang; Manjunath Narayana; Erik G. Learned-Miller

In this paper, we argue that the most difficult face recognition problems (unconstrained face recognition) will be solved by simultaneously leveraging the solutions to multiple vision problems, including segmentation, alignment, pose estimation, and the estimation of other hidden variables such as gender and hair color. While in theory a single unified principle could solve all of these problems simultaneously in a giant hidden variable model, we believe that such an approach would be computationally and, more importantly, statistically intractable. Instead, we promote studying the interactions among mid-level vision features, such as segmentations and pose estimates, as a route toward solving very difficult recognition problems. In this paper, we discuss and provide results showing how pose and face segmentations mutually influence each other, and provide a surprisingly simple method for estimating pose from segmentations.


Computer Vision and Pattern Recognition | 2010

Improving state-of-the-art OCR through high-precision document-specific modeling

Andrew Kae; Gary B. Huang; Carl Doersch; Erik G. Learned-Miller

Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models that are vulnerable to cases in which the document is noisy or is written in a font dissimilar to the stored fonts. We address these problems by learning character models directly from the document itself, rather than using pre-stored font models. This method has had some success in the past, but we are able to achieve substantial improvement in error reduction through a novel method for creating nearly error-free document-specific training data and building character appearance models from this data. In particular, we first use the state-of-the-art OCR system Tesseract to produce an initial translation. Then, our method identifies a subset of words that we have high confidence have been recognized correctly and uses this subset to bootstrap document-specific character models. We present theoretical justification that a word in the selected subset is very unlikely to be incorrectly recognized, and empirical results on a data set of difficult historical newspaper scans demonstrating that we make only two errors in 56 documents. We then relax the theoretical constraint in order to create a larger training set, and using document-specific character models generated from this data, we are able to reduce the error over properly segmented characters by 34.1% overall from the initial Tesseract translation.
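The bootstrapping step described above can be sketched as a small filtering loop. The word structure, confidence field, and lexicon test below are hypothetical stand-ins for Tesseract's output and for the paper's theoretical guarantee on the selected subset:

```python
def bootstrap_char_models(words, lexicon, conf_threshold=0.95):
    # Keep only words we are highly confident were recognized correctly:
    # high OCR confidence AND membership in a trusted lexicon.  (The paper
    # uses a stronger, theoretically justified selection criterion.)
    clean = [w for w in words
             if w["conf"] >= conf_threshold and w["text"] in lexicon]
    # Harvest per-character training examples (glyph images) from the
    # clean subset to build document-specific character models.
    models = {}
    for w in clean:
        for ch, glyph in zip(w["text"], w["glyphs"]):
            models.setdefault(ch, []).append(glyph)
    return models
```

The resulting character models are then used to re-recognize the rest of the document, which is where the reported error reduction over the initial translation comes from.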


Computer Vision and Pattern Recognition | 2010

Learning class-specific image transformations with higher-order Boltzmann machines

Gary B. Huang; Erik G. Learned-Miller

In this paper, we examine the problem of learning a representation of image transformations specific to a complex object class, such as faces. Learning such a representation for a specific object class would allow us to perform improved, pose-invariant visual verification, such as unconstrained face verification. We build off of the method of using factored higher-order Boltzmann machines to model such image transformations. Using this approach will potentially enable us to use the model as one component of a larger deep architecture. This will allow us to use the feature information in an ordinary deep network to perform better modeling of transformations, and to infer pose estimates from the hidden representation. We focus on applying these higher-order Boltzmann machines to the NORB 3D objects data set and the Labeled Faces in the Wild face data set. We first show two different approaches to using this method on these object classes, demonstrating that while some useful transformation information can be extracted, ultimately the simple direct application of these models to higher-resolution, complex object classes is insufficient to achieve improved visual verification performance. Instead, we believe that this method should be integrated into a larger deep architecture, and show initial results using the higher-order Boltzmann machine as the second layer of a deep architecture, above a first layer convolutional RBM.
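The factored higher-order Boltzmann machines referenced above score an image pair and a hidden transformation code with a three-way energy: per factor, the filter responses of the two images and the hidden vector are multiplied together and summed over factors. A minimal sketch of that energy term (bias terms omitted; all shapes and names are illustrative, not the paper's):

```python
import numpy as np

def gated_energy(x, y, h, Wx, Wy, Wh):
    # Factored three-way energy of a gated / higher-order Boltzmann
    # machine: for each factor f, multiply the three filter responses
    # x @ Wx[:, f], y @ Wy[:, f], h @ Wh[:, f], then sum over factors.
    return -np.sum((x @ Wx) * (y @ Wy) * (h @ Wh))
```

Factoring the three-way weight tensor into three filter matrices is what keeps the parameter count manageable; the paper's finding is that this model alone is insufficient for high-resolution complex classes and works better as an upper layer above a convolutional RBM.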


International Conference on Document Analysis and Recognition | 2007

Cryptogram Decoding for OCR Using Numerization Strings

Gary B. Huang; Erik G. Learned-Miller; Andrew McCallum

OCR systems for printed documents typically require large numbers of font styles and character models to work well. When given an unseen font, performance degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cryptogram decoding algorithm. We present results on real and artificial OCR data.
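The numerization-string idea behind this cryptogram view of OCR can be sketched directly: replacing each symbol of a word by the index of its first appearance yields a repetition pattern that is invariant to the unknown substitution, so clustered glyph sequences can be matched against a lexicon without any character models. A small illustrative sketch (the clustering step itself is assumed):

```python
def numerization(word):
    # Map each distinct symbol to the index of its first appearance,
    # e.g. "letter" -> (0, 1, 2, 2, 1, 3).  Two words can match under
    # a substitution cipher only if their numerization strings agree.
    seen = {}
    return tuple(seen.setdefault(c, len(seen)) for c in word)

def candidate_decodings(cluster_ids, lexicon):
    # A word given as a sequence of glyph-cluster ids can decode to any
    # lexicon word with the same repetition pattern.
    pat = numerization(cluster_ids)
    return [w for w in lexicon if numerization(w) == pat]
```

Ambiguities that remain after pattern matching (e.g. "letter" vs. "better") are resolved by propagating the symbol-to-letter constraints shared across words in the document.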


Neural Information Processing Systems | 2012

Learning to Align from Scratch

Gary B. Huang; Marwan A. Mattar; Honglak Lee; Erik G. Learned-Miller


arXiv: Computer Vision and Pattern Recognition | 2009

Bounding the Probability of Error for High Precision Recognition

Andrew Kae; Gary B. Huang; Erik G. Learned-Miller


Journal of Machine Learning Research | 2012

Bounding the probability of error for high precision optical character recognition

Gary B. Huang; Andrew Kae; Carl Doersch; Erik G. Learned-Miller


Computer Vision and Pattern Recognition | 2012

Learning hierarchical representations for face verification

Gary B. Huang; Honglak Lee; Erik G. Learned-Miller

Collaboration


An overview of Gary B. Huang's collaborations.

Top Co-Authors

Erik G. Learned-Miller, University of Massachusetts Amherst
Andrew Kae, University of Massachusetts Amherst
Honglak Lee, University of Michigan
Carl Doersch, Carnegie Mellon University
Andrew McCallum, University of Massachusetts Amherst
Manjunath Narayana, University of Massachusetts Amherst
Marwan A. Mattar, University of Massachusetts Amherst