Publication


Featured research published by Tamara L. Berg.


Computer Vision and Pattern Recognition | 2005

Shape matching and object recognition using low distortion correspondences

Alexander C. Berg; Tamara L. Berg; Jitendra Malik

We approach recognition in the framework of deformable shape matching, relying on a new algorithm for finding correspondences between feature points. This algorithm sets up correspondence as an integer quadratic programming problem, where the cost function has terms based on similarity of corresponding geometric blur point descriptors as well as the geometric distortion between pairs of corresponding feature points. The algorithm handles outliers, and thus enables matching of exemplars to query images in the presence of occlusion and clutter. Given the correspondences, we estimate an aligning transform, typically a regularized thin plate spline, resulting in a dense correspondence between the two shapes. Object recognition is then handled in a nearest neighbor framework where the distance between exemplar and query is the matching cost between corresponding points. We show results on two datasets. One is the Caltech 101 dataset (Fei-Fei, Fergus and Perona), an extremely challenging dataset with large intraclass variation. Our approach yields a 48% correct classification rate, compared to Fei-Fei et al.'s 16%. We also show results for localizing frontal and profile faces that are comparable to special-purpose approaches tuned to faces.
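
For concreteness, here is a minimal sketch of the kind of integer quadratic program the abstract describes, in our own notation rather than the paper's: let $x_{ia} \in \{0,1\}$ indicate that model feature point $i$ is matched to query image point $a$. The correspondence problem then reads

\[
\min_{x} \; \sum_{i,a} c_{ia}\, x_{ia} \;+\; \sum_{(i,a) \neq (j,b)} H_{ia,jb}\, x_{ia} x_{jb}
\quad \text{s.t.} \quad \sum_{a} x_{ia} \le 1 \;\; \forall i, \quad x_{ia} \in \{0,1\},
\]

where $c_{ia}$ penalizes dissimilarity between the geometric blur descriptors at $i$ and $a$, and $H_{ia,jb}$ penalizes geometric distortion between the point pair $(i, j)$ and its proposed match $(a, b)$.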


Computer Vision and Pattern Recognition | 2011

Baby talk: Understanding and generating simple image descriptions

Girish Kulkarni; Visruth Premraj; Sagnik Dhar; Siming Li; Yejin Choi; Alexander C. Berg; Tamara L. Berg

We posit that visually descriptive language offers computer vision researchers both information about the world and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.
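
As a toy illustration of the pipeline the abstract describes, the sketch below combines hypothetical detector output with hypothetical text-corpus statistics to pick the most likely preposition between two detected objects. All names and numbers here are invented for illustration and are not taken from the authors' system.

from itertools import combinations

# Hypothetical vision output: (object label, best attribute, detector score).
detections = [("dog", "brown", 0.9), ("sofa", "red", 0.7)]

# Hypothetical text statistics: how often a preposition links two nouns.
prep_stats = {("dog", "sofa"): {"on": 0.6, "near": 0.3, "under": 0.1}}

def describe(dets, stats):
    phrase = {obj: f"the {attr} {obj}" for obj, attr, _ in dets}
    sentences = ["There is " + " and ".join(phrase.values()) + "."]
    for (o1, _, _), (o2, _, _) in combinations(dets, 2):
        preps = stats.get((o1, o2), {"near": 1.0})
        prep = max(preps, key=preps.get)  # most likely preposition in text
        sentences.append(f"{phrase[o1].capitalize()} is {prep} {phrase[o2]}.")
    return " ".join(sentences)

print(describe(detections, prep_stats))
# There is the brown dog and the red sofa. The brown dog is on the red sofa.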


European Conference on Computer Vision | 2010

Automatic attribute discovery and characterization from noisy web data

Tamara L. Berg; Alexander C. Berg; Jonathan Shih

It is common to use domain specific terminology - attributes - to describe the visual appearance of objects. In order to scale the use of these describable visual attributes to a large number of categories, especially those not well studied by psychologists or linguists, it will be necessary to find alternative techniques for identifying attribute vocabularies and for learning to recognize attributes without hand labeled training data. We demonstrate that it is possible to accomplish both these tasks automatically by mining text and image data sampled from the Internet. The proposed approach also characterizes attributes according to their visual representation: global or local, and type: color, texture, or shape. This work focuses on discovering attributes and their visual appearance, and is as agnostic as possible about the textual description.
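
One way to make the mining step concrete, under our own assumptions (the scoring recipe below is an illustration, not the paper's exact method): score each candidate word by how well a simple visual classifier can predict its presence in the accompanying text, so that visually grounded words rank above non-visual ones.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def visualness(word, image_ids, captions, features):
    """captions: id -> text; features: id -> image feature vector."""
    y = np.array([word in captions[i].lower().split() for i in image_ids], int)
    if y.sum() < 10 or y.sum() > len(y) - 10:
        return 0.0  # word too rare or too common to evaluate reliably
    X = np.stack([features[i] for i in image_ids])
    # Held-out accuracy above chance ~ how visually predictable the word is.
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()

# Candidate attribute vocabulary: all words sorted by visualness, highest first.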


Computer Vision and Pattern Recognition | 2011

High level describable attributes for predicting aesthetics and interestingness

Sagnik Dhar; Vicente Ordonez; Tamara L. Berg

With the rise in popularity of digital cameras, the amount of visual data available on the web is growing exponentially. Some of these pictures are extremely beautiful and aesthetically pleasing, but the vast majority are uninteresting or of low quality. This paper demonstrates a simple, yet powerful method to automatically select high aesthetic quality images from large image collections. Our aesthetic quality estimation method explicitly predicts some of the possible image cues that a human might use to evaluate an image and then uses them in a discriminative approach. These cues, or high-level describable image attributes, fall into three broad types: 1) compositional attributes related to image layout or configuration, 2) content attributes related to the objects or scene types depicted, and 3) sky-illumination attributes related to the natural lighting conditions. We demonstrate that an aesthetics classifier trained on these describable attributes can provide a significant improvement over baseline methods for predicting human quality judgments. We also demonstrate our method for predicting the “interestingness” of Flickr photos, and introduce a novel problem of estimating query specific “interestingness”.
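
A minimal sketch of the two-stage design the abstract describes, with invented attribute names: attribute predictors are applied first, and their real-valued outputs become the feature vector for a discriminative aesthetics classifier.

import numpy as np
from sklearn.svm import LinearSVC

ATTRIBUTES = ["rule_of_thirds", "contains_face", "clear_sky"]  # hypothetical

def attribute_vector(feat, predictors):
    """predictors: attribute name -> fitted model with a decision_function."""
    return np.array([predictors[a].decision_function([feat])[0]
                     for a in ATTRIBUTES])

def train_aesthetics(feats, labels, predictors):
    """labels: 1 = high aesthetic quality, 0 = low."""
    X = np.stack([attribute_vector(f, predictors) for f in feats])
    return LinearSVC().fit(X, labels)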


Computer Vision and Pattern Recognition | 2012

Parsing clothing in fashion photographs

Kota Yamaguchi; M. Hadi Kiapour; Luis E. Ortiz; Tamara L. Berg

In this paper we demonstrate an effective method for parsing clothing in fashion photographs, an extremely challenging problem due to the large number of possible garment items, variations in configuration, garment appearance, layering, and occlusion. In addition, we provide a large novel dataset and tools for labeling garment items, to enable future research on clothing estimation. Finally, we present intriguing initial results on using clothing estimates to improve pose identification, and demonstrate a prototype application for pose-independent visual garment retrieval.
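
The core unit of a clothing parse is a per-region garment labeling; below is a deliberately simplified sketch. The label set, scorer, and the cheap iterated-conditional-modes smoothing are our illustration, not the paper's model, which is richer (pose features, layering, and more labels).

GARMENTS = ["skirt", "blouse", "boots", "background"]  # example label set

def parse(superpixels, score):
    """score(sp, g) -> real, higher = more likely. Returns sp -> label."""
    return {sp: max(GARMENTS, key=lambda g: score(sp, g)) for sp in superpixels}

def smooth(labels, neighbors, score, penalty=0.5, iters=5):
    """Cheap CRF-style smoothing: each region also pays for disagreeing
    with its neighbors (iterated conditional modes)."""
    for _ in range(iters):
        for sp in labels:
            labels[sp] = min(GARMENTS,
                             key=lambda g: -score(sp, g) + penalty *
                             sum(labels[n] != g for n in neighbors[sp]))
    return labels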


Computer Vision and Pattern Recognition | 2011

Who are you with and where are you going?

Kota Yamaguchi; Alexander C. Berg; Luis E. Ortiz; Tamara L. Berg

We propose an agent-based behavioral model of pedestrians to improve tracking performance in realistic scenarios. In this model, we view pedestrians as decision-making agents who consider a plethora of personal, social, and environmental factors to decide where to go next. We formulate prediction of pedestrian behavior as an energy minimization on this model. Two of our main contributions are simple, yet effective estimates of pedestrian destination and social relationships (groups). Our final contribution is to incorporate these hidden properties into an energy formulation that results in accurate behavioral prediction. We evaluate both our estimates of destination and grouping, as well as our accuracy at prediction and tracking, against a state-of-the-art behavioral model and show improvements, especially in the challenging setting of infrequent appearance observations, something that might occur in the thousands of webcams available on the Internet.
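
Schematically, the per-pedestrian energy has the following flavor; the terms and weights are our illustrative reconstruction, not the paper's exact formulation. For pedestrian $i$ at position $\mathbf{p}_i$ with preferred speed $u_i$, estimated destination $\mathbf{z}_i$, and estimated group $G_i$, the next velocity is

\[
\mathbf{v}_i^{*} = \arg\min_{\mathbf{v}} \Big[ \lambda_1 \bigl(\lVert\mathbf{v}\rVert - u_i\bigr)^2 + \lambda_2 \Bigl(1 - \tfrac{\mathbf{v} \cdot (\mathbf{z}_i - \mathbf{p}_i)}{\lVert\mathbf{v}\rVert\,\lVert\mathbf{z}_i - \mathbf{p}_i\rVert}\Bigr) + \lambda_3 \sum_{j \neq i} e^{-d_{ij}(\mathbf{v})^2 / 2\sigma^2} + \lambda_4\, E_{\mathrm{grp}}(\mathbf{v}; G_i) \Big],
\]

with a preferred-speed term, a destination-direction term, a collision-avoidance term over predicted inter-pedestrian distances $d_{ij}$, and a group-cohesion term.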


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

BabyTalk: Understanding and Generating Simple Image Descriptions

Girish Kulkarni; Visruth Premraj; Vicente Ordonez; Sagnik Dhar; Siming Li; Yejin Choi; Alexander C. Berg; Tamara L. Berg

We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second part, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human-generated reference descriptions. We also collect forced-choice human evaluations between descriptions from the proposed generation system and descriptions from competing approaches. The proposed system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.
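
The surface realization stage can be as simple as filling templates from the planned content; the content representation below is a hypothetical stand-in for illustration (the abstract above mentions multiple approaches for this step).

def realize(plan):
    """plan: ordered triples (subject phrase, preposition, object phrase)
    chosen by the content-planning stage."""
    return " ".join(f"The {s} is {p} the {o}." for s, p, o in plan)

print(realize([("brown dog", "on", "red sofa")]))
# The brown dog is on the red sofa.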


Lecture Notes in Computer Science | 2006

Dataset Issues in Object Recognition

Jean Ponce; Tamara L. Berg; Mark Everingham; David A. Forsyth; Martial Hebert; Svetlana Lazebnik; Marcin Marszalek; Cordelia Schmid; Bryan C. Russell; Antonio Torralba; Christopher K. I. Williams; Jianguo Zhang; Andrew Zisserman

Appropriate datasets are required at all stages of object recognition research, including learning visual models of object and scene categories, detecting and localizing instances of these models in images, and evaluating the performance of recognition algorithms. Current datasets are lacking in several respects, and this paper discusses some of the lessons learned from existing efforts, as well as innovative ways to obtain very large and diverse annotated datasets. It also suggests a few criteria for gathering future datasets.


Computer Vision and Pattern Recognition | 2012

Two-person interaction detection using body-pose features and multiple instance learning

Kiwon Yun; Jean Honorio; Debaleena Chattopadhyay; Tamara L. Berg; Dimitris Samaras

Human activity recognition has the potential to impact a wide range of applications, from surveillance to human-computer interfaces to content-based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g. Microsoft Kinect) provides adequate accuracy for real-time full-body human tracking for activity recognition applications. In this paper, we create a complex human activity dataset depicting two-person interactions, including synchronized video, depth, and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that geometric relational features based on distances between all pairs of joints outperform other feature choices. For whole-sequence classification, we also explore techniques related to Multiple Instance Learning (MIL), in which the sequence is represented by a bag of body-pose features. We find that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
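
The winning feature from these experiments, pairwise joint distances, is straightforward to sketch; the skeleton layout below is hypothetical (a Kinect skeleton has roughly 15 to 20 joints per person).

import numpy as np
from itertools import combinations

def joint_distance_features(joints):
    """joints: (J, 3) array of 3D joint positions, both people stacked.
    Returns the J*(J-1)/2 pairwise Euclidean distances for one frame."""
    return np.array([np.linalg.norm(joints[i] - joints[j])
                     for i, j in combinations(range(len(joints)), 2)])

# Per-frame vectors can feed an SVM directly; for whole-sequence
# classification, a sequence becomes a bag of such vectors for MIL.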


International Conference on Computer Vision | 2013

Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items

Kota Yamaguchi; M. Hadi Kiapour; Tamara L. Berg

Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms the state of the art in parsing accuracy.
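
A minimal sketch of the three-way combination the abstract describes; the fusion rule (a weighted sum of per-pixel label probabilities) and the weights are our assumption for illustration.

import numpy as np

def combine_parses(global_probs, local_probs, transfer_probs,
                   w=(0.4, 0.4, 0.2)):
    """Each input: (num_pixels, num_labels) label probabilities from the
    global models, retrieval-tuned local models, and transferred masks.
    Returns the per-pixel argmax of their weighted combination."""
    fused = w[0] * global_probs + w[1] * local_probs + w[2] * transfer_probs
    return fused.argmax(axis=1)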

Collaboration


Dive into Tamara L. Berg's collaboration.

Top Co-Authors

Vicente Ordonez

University of North Carolina at Chapel Hill


Yejin Choi

University of Washington


Licheng Yu

University of North Carolina at Chapel Hill


M. Hadi Kiapour

University of North Carolina at Chapel Hill


Xufeng Han

Stony Brook University
