Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alexander C. Berg is active.

Publication


Featured research published by Alexander C. Berg.


european conference on computer vision | 2016

SSD: Single Shot MultiBox Detector

Wei Liu; Dragomir Anguelov; Dumitru Erhan; Christian Szegedy; Scott E. Reed; Cheng-Yang Fu; Alexander C. Berg

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stage, and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X, and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at this https URL.


computer vision and pattern recognition | 2015

MatchNet: Unifying feature and metric learning for patch-based matching

Xufeng Han; Thomas Leung; Yangqing Jia; Rahul Sukthankar; Alexander C. Berg

Motivated by recent successes on learning feature representations and on learning feature comparison functions, we propose a unified approach to combining both for training a patch matching system. Our system, dubbed Match-Net, consists of a deep convolutional network that extracts features from patches and a network of three fully connected layers that computes a similarity between the extracted features. To ensure experimental repeatability, we train MatchNet on standard datasets and employ an input sampler to augment the training set with synthetic exemplar pairs that reduce overfitting. Once trained, we achieve better computational efficiency during matching by disassembling MatchNet and separately applying the feature computation and similarity networks in two sequential stages. We perform a comprehensive set of experiments on standard datasets to carefully study the contributions of each aspect of MatchNet, with direct comparisons to established methods. Our results confirm that our unified approach improves accuracy over previous state-of-the-art results on patch matching datasets, while reducing the storage requirement for descriptors. We make pre-trained MatchNet publicly available.


international conference on computer vision | 2015

Where to Buy It: Matching Street Clothing Photos in Online Shops

M. Hadi Kiapour; Xufeng Han; Svetlana Lazebnik; Alexander C. Berg; Tamara L. Berg

In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. This is an extremely challenging task due to visual differences between street photos (pictures of people wearing clothing in everyday uncontrolled settings) and online shop photos (pictures of clothing items on people, mannequins, or in isolation, captured by professionals in more controlled settings). We collect a new dataset for this application containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos. We develop three different methods for Exact Street to Shop retrieval, including two deep learning baseline methods, and a method to learn a similarity measure between the street and shop domains. Experiments demonstrate that our learned similarity significantly outperforms our baselines that use existing deep learning based representations.


european conference on computer vision | 2014

Hipster wars: Discovering elements of fashion styles

M. Hadi Kiapour; Kota Yamaguchi; Alexander C. Berg; Tamara L. Berg

The clothing we wear and our identities are closely tied, revealing to the world clues about our wealth, occupation, and socio-identity. In this paper we examine questions related to what our clothing reveals about our personal style. We first design an online competitive Style Rating Game called Hipster Wars to crowd source reliable human judgments of style. We use this game to collect a new dataset of clothing outfits with associated style ratings for 5 style categories: hipster, bohemian, pinup, preppy, and goth. Next, we train models for between-class and within-class classification of styles. Finally, we explore methods to identify clothing elements that are generally discriminative for a style, and methods for identifying items in a particular outfit that may indicate a style.


european conference on computer vision | 2016

Modeling Context in Referring Expressions

Licheng Yu; Patrick Poirson; Shan Yang; Alexander C. Berg; Tamara L. Berg

Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets - RefCOCO, RefCOCO+, and RefCOCOg (datasets and toolbox can be downloaded from https://github.com/lichengunc/refer) - shows the advantages of our methods for both referring expression generation and comprehension.


workshop on applications of computer vision | 2016

Combining multiple sources of knowledge in deep CNNs for action recognition

Eunbyung Park; Xufeng Han; Tamara L. Berg; Alexander C. Berg

Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional handcrafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary, hand-crafted, feature (e.g. optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources that results in robust prediction by amplifying or suppressing the feature activations based on their agreement. We test these methods in the context of action recognition where information from spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and outperform methods using only CNNs and optical flow features.


international conference on computer vision | 2015

Visual Madlibs: Fill in the Blank Description Generation and Question Answering

Licheng Yu; Eunbyung Park; Alexander C. Berg; Tamara L. Berg


international conference on computer vision | 2013

Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

Olga Russakovsky; Jia Deng; Zhiheng Huang; Alexander C. Berg; Li Fei-Fei


computer vision and pattern recognition | 2017

Transformation-Grounded Image Generation Network for Novel 3D View Synthesis

Eunbyung Park; Jimei Yang; Ersin Yumer; Duygu Ceylan; Alexander C. Berg


international conference on 3d vision | 2016

Fast Single Shot Detection and Pose Estimation

Patrick Poirson; Phil Ammirato; Cheng-Yang Fu; Wei Liu; Jana Kosecka; Alexander C. Berg
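The SSD abstract above describes discretizing the output space of bounding boxes into a set of default boxes over different aspect ratios and scales at each feature map location. A minimal sketch of that tiling idea follows; the feature map size, scale, and aspect ratios here are illustrative choices, not values from the paper's released code:

```python
# Sketch of SSD-style default (anchor) box generation for one feature map.
# Parameters below are hypothetical, chosen only to illustrate the tiling.
import itertools
import math

def default_boxes(fmap_size=8, scale=0.2, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) boxes, normalized to [0, 1] image coordinates."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx = (j + 0.5) / fmap_size   # one box center per feature map cell
        cy = (i + 0.5) / fmap_size
        for ar in aspect_ratios:     # several shapes per location
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes

boxes = default_boxes()
print(len(boxes))  # 8 * 8 * 3 = 192 boxes for this feature map
```

In the full detector, the network predicts per-category scores and box offsets for every default box, and feature maps at several resolutions each contribute their own set of boxes to cover objects of different sizes.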

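The MatchNet abstract notes that the trained system can be disassembled so the feature network runs once per patch and the similarity network runs per candidate pair. A toy sketch of that two-stage split, using small random linear maps as stand-ins for the actual convolutional and fully connected networks:

```python
# Sketch of the MatchNet-style two-stage design: a feature network applied
# once per patch, and a separate similarity network applied to feature pairs.
# The tiny linear "networks" below are stand-ins, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((64, 128))  # feature tower: 64-d patch -> 128-d descriptor
W_sim = rng.standard_normal((256, 1))    # metric network: concatenated pair -> score

def extract(patch):
    # Stage 1: run once per patch and cache the descriptor.
    return np.tanh(patch @ W_feat)

def similarity(f_a, f_b):
    # Stage 2: cheap comparison of two cached descriptors.
    return (np.concatenate([f_a, f_b]) @ W_sim).item()

patches = rng.standard_normal((5, 64))
feats = [extract(p) for p in patches]           # O(n) feature passes
scores = [[similarity(fa, fb) for fb in feats]  # O(n^2) cheap comparisons
          for fa in feats]
```

Caching the n descriptors and comparing them pairwise avoids re-running the expensive feature tower n^2 times, which is the efficiency gain the abstract highlights.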
Collaboration


Dive into Alexander C. Berg's collaborations.

Top Co-Authors

Tamara L. Berg (University of North Carolina at Chapel Hill)
Wei Liu (University of North Carolina at Chapel Hill)
Eunbyung Park (University of North Carolina at Chapel Hill)
Jana Kosecka (George Mason University)
Jia Deng (University of Michigan)
Yejin Choi (University of Washington)
Cheng-Yang Fu (National Tsing Hua University)
Phil Ammirato (University of North Carolina at Chapel Hill)
Vicente Ordonez (University of North Carolina at Chapel Hill)