Publication


Featured research published by Xufeng Han.


Computer Vision and Pattern Recognition | 2015

MatchNet: Unifying feature and metric learning for patch-based matching

Xufeng Han; Thomas Leung; Yangqing Jia; Rahul Sukthankar; Alexander C. Berg

Motivated by recent successes on learning feature representations and on learning feature comparison functions, we propose a unified approach to combining both for training a patch matching system. Our system, dubbed MatchNet, consists of a deep convolutional network that extracts features from patches and a network of three fully connected layers that computes a similarity between the extracted features. To ensure experimental repeatability, we train MatchNet on standard datasets and employ an input sampler to augment the training set with synthetic exemplar pairs that reduce overfitting. Once trained, we achieve better computational efficiency during matching by disassembling MatchNet and separately applying the feature computation and similarity networks in two sequential stages. We perform a comprehensive set of experiments on standard datasets to carefully study the contributions of each aspect of MatchNet, with direct comparisons to established methods. Our results confirm that our unified approach improves accuracy over previous state-of-the-art results on patch matching datasets, while reducing the storage requirement for descriptors. We make pre-trained MatchNet publicly available.
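
A minimal PyTorch sketch of the two-stage idea described above: one tower extracts per-patch features and a small fully connected network scores pairs, so at test time descriptors can be cached and only the metric network rerun. The 64x64 grayscale input, layer sizes, and training step are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Convolutional tower mapping a grayscale patch to a fixed-length descriptor."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(128 * 4 * 4, feat_dim)

    def forward(self, patch):                  # patch: (B, 1, 64, 64)
        return self.fc(self.conv(patch).flatten(1))

class MetricNet(nn.Module):
    """Three fully connected layers that score a pair of descriptors."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 2),                 # match / non-match logits
        )

    def forward(self, f_a, f_b):
        return self.mlp(torch.cat([f_a, f_b], dim=1))

# Training runs both towers end to end; at test time the feature tower is applied
# once per patch and the metric tower scores cached descriptors (the two-stage
# deployment mentioned in the abstract).
feature_net, metric_net = FeatureNet(), MetricNet()
a, b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
logits = metric_net(feature_net(a), feature_net(b))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
```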


International Conference on Computer Vision | 2015

Where to Buy It: Matching Street Clothing Photos in Online Shops

M. Hadi Kiapour; Xufeng Han; Svetlana Lazebnik; Alexander C. Berg; Tamara L. Berg

In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. This is an extremely challenging task due to visual differences between street photos (pictures of people wearing clothing in everyday uncontrolled settings) and online shop photos (pictures of clothing items on people, mannequins, or in isolation, captured by professionals in more controlled settings). We collect a new dataset for this application containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos. We develop three different methods for Exact Street to Shop retrieval, including two deep learning baseline methods, and a method to learn a similarity measure between the street and shop domains. Experiments demonstrate that our learned similarity significantly outperforms our baselines that use existing deep learning based representations.
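
One plausible way to realize a learned street-to-shop similarity is a small network over pairs of precomputed CNN features, as sketched below. The 2048-dimensional inputs, hidden sizes, and binary same-item loss are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

class StreetShopSimilarity(nn.Module):
    """Scores whether a street-photo feature and a shop-photo feature depict the
    same garment. Assumes features were precomputed by some CNN; the input and
    hidden sizes here are placeholders."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 1),               # logit: same item vs. different item
        )

    def forward(self, street_feat, shop_feat):
        return self.scorer(torch.cat([street_feat, shop_feat], dim=1)).squeeze(1)

model = StreetShopSimilarity()
street = torch.randn(16, 2048)               # features of 16 street-photo crops
shop = torch.randn(16, 2048)                 # features of 16 candidate shop photos
same_item = torch.randint(0, 2, (16,)).float()
loss = nn.BCEWithLogitsLoss()(model(street, shop), same_item)
# Retrieval: score one street query against every shop photo and rank by score.
```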


Computer Vision and Pattern Recognition | 2012

Understanding and predicting importance in images

Alexander C. Berg; Tamara L. Berg; Hal Daumé; Jesse Dodge; Amit Goyal; Xufeng Han; Alyssa Mensch; Margaret Mitchell; Aneesh Sood; Karl Stratos; Kota Yamaguchi

What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1) factors related to composition, e.g. size, location, 2) factors related to semantics, e.g. category of object or scene, and 3) contextual factors related to the likelihood of attribute-object, or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.
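
As a rough illustration of the modeling step, the sketch below fits a logistic regression that predicts whether an object gets mentioned from stand-ins for the three factor types (composition, semantics, context). All feature definitions and data are made up for illustration and do not reproduce the paper's models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-object features: composition (relative size, distance from
# image center), semantics (a coarse category id), and context (how likely the
# object is in that scene). Labels say whether people mentioned the object when
# describing the image. All numbers are simulated.
rng = np.random.default_rng(0)
size = rng.uniform(0.0, 0.6, 200)            # object area / image area
centrality = rng.uniform(0.0, 1.0, 200)      # 0 = image center, 1 = corner
category = rng.integers(0, 10, 200)          # coarse object-category id
scene_likelihood = rng.uniform(0.0, 1.0, 200)
X = np.column_stack([size, centrality, category, scene_likelihood])
y = (size + scene_likelihood - centrality + rng.normal(0, 0.2, 200) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X[:3])[:, 1])      # predicted chance each object is described
```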


Workshop on Applications of Computer Vision | 2016

Combining multiple sources of knowledge in deep CNNs for action recognition

Eunbyung Park; Xufeng Han; Tamara L. Berg; Alexander C. Berg

Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional handcrafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary, hand-crafted, feature (e.g. optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources that results in robust prediction by amplifying or suppressing the feature activations based on their agreement. We test these methods in the context of action recognition where information from spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and outperform methods using only CNNs and optical flow features.
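
A minimal PyTorch sketch of the two combination ideas named above: feature amplification (spatially varying soft-gating of intermediate feature maps by an optical-flow cue) and multiplicative fusion of two streams. The sigmoid gate and the plain element-wise product are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def feature_amplification(feature_map, flow_magnitude):
    """Soft-gate CNN activations by an auxiliary cue.

    feature_map:    (B, C, H, W) intermediate CNN activations
    flow_magnitude: (B, 1, H0, W0) per-pixel optical-flow magnitude
    The sigmoid gating is one plausible choice of spatially varying soft-gate.
    """
    gate = torch.sigmoid(
        F.interpolate(flow_magnitude, size=feature_map.shape[-2:],
                      mode="bilinear", align_corners=False)
    )
    return feature_map * gate                 # amplify where motion is strong

def multiplicative_fusion(spatial_maps, temporal_maps):
    """Element-wise multiplicative fusion of two networks' feature maps, so
    activations survive only where both streams agree."""
    return spatial_maps * temporal_maps

feats = torch.randn(2, 64, 14, 14)            # activations from an RGB ("spatial") CNN
flow = torch.rand(2, 1, 224, 224)             # optical-flow magnitude for the same frames
gated = feature_amplification(feats, flow)
fused = multiplicative_fusion(gated, torch.randn(2, 64, 14, 14))
print(gated.shape, fused.shape)
```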


NeuroImage | 2013

Multi-voxel pattern analysis of selective representation of visual working memory in ventral temporal and occipital regions.

Xufeng Han; Alexander C. Berg; Hwamee Oh; Dimitris Samaras; Hoi-Chung Leung

While previous results from univariate analysis showed that the activity level of the parahippocampal gyrus (PHG) but not the fusiform gyrus (FG) reflects selective maintenance of the cued picture category, present results from multi-voxel pattern analysis (MVPA) showed that the spatial response patterns of both regions can be used to differentiate the selected picture category in working memory. The ventral temporal and occipital areas including the PHG and FG have been shown to be specialized in perceiving and processing different kinds of visual information, though their role in the representation of visual working memory remains unclear. To test whether the PHG and FG show spatial response patterns that reflect selective maintenance of task-relevant visual working memory in comparison with other posterior association regions, we reanalyzed data from a previous fMRI study of visual working memory with a cue inserted during the delay period of a delayed recognition task. Classification of FG and PHG activation patterns for the selected category (face or scene) during the cue phase was well above chance using classifiers trained with fMRI data from the cue or probe phase. Classification of activity in other temporal and occipital regions for the cued picture category during the cue phase was relatively less consistent even though classification of their activity during the probe recognition was comparable with the FG and PHG. In sum, these findings suggest that the FG and PHG carry information relevant to the cued visual category, and their spatial activation patterns during selective maintenance seem to match those during visual recognition.
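
The cross-phase decoding recipe can be sketched with a standard linear classifier over ROI voxel patterns: train on one phase, test on the other. The simulated data and the choice of a linear SVM below are assumptions for illustration, not the study's actual preprocessing or classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Toy stand-in for MVPA: rows are trials, columns are voxels from one ROI
# (e.g. FG or PHG); labels are the cued category (0 = face, 1 = scene).
# Real inputs would be per-trial activity estimates; everything here is
# simulated only to show the cross-phase decoding recipe.
rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 300
signal = rng.normal(0, 1, (2, n_voxels))            # per-category voxel pattern
labels_probe = rng.integers(0, 2, n_trials)
labels_cue = rng.integers(0, 2, n_trials)
probe_patterns = signal[labels_probe] + rng.normal(0, 2, (n_trials, n_voxels))
cue_patterns = signal[labels_cue] + rng.normal(0, 2, (n_trials, n_voxels))

# Train on probe-phase patterns, test on cue-phase patterns (cross-phase decoding).
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(probe_patterns, labels_probe)
print("cue-phase decoding accuracy:", clf.score(cue_patterns, labels_cue))
```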


International Journal of Computer Vision | 2016

Large Scale Retrieval and Generation of Image Descriptions

Vicente Ordonez; Xufeng Han; Polina Kuznetsova; Girish Kulkarni; Margaret Mitchell; Kota Yamaguchi; Karl Stratos; Amit Goyal; Jesse Dodge; Alyssa Mensch; Hal Daumé; Alexander C. Berg; Yejin Choi; Tamara L. Berg

What is the story of an image? What is the relationship between pictures, language, and information we can extract using state of the art computational recognition systems? In an attempt to address both of these questions, we explore methods for retrieving and generating natural language descriptions for images. Ideally, we would like our generated textual descriptions (captions) to both sound like a person wrote them, and also remain true to the image content. To do this we develop data-driven approaches for image description generation, using retrieval-based techniques to gather either: (a) whole captions associated with a visually similar image, or (b) relevant bits of text (phrases) from a large collection of image + description pairs. In the case of (b), we develop optimization algorithms to merge the retrieved phrases into valid natural language sentences. The end result is two simple, but effective, methods for harnessing the power of big data to produce image captions that are altogether more general, relevant, and human-like than previous attempts.
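
Option (a), whole-caption transfer, reduces to nearest-neighbor retrieval over image features, sketched below. The cosine-similarity matching and the tiny synthetic database are assumptions for illustration; the features are assumed to come from whatever recognition system is available.

```python
import numpy as np

def transfer_caption(query_feat, database_feats, database_captions):
    """Whole-caption transfer: return the caption of the visually most similar
    database image, using cosine similarity over precomputed image features."""
    q = query_feat / np.linalg.norm(query_feat)
    db = database_feats / np.linalg.norm(database_feats, axis=1, keepdims=True)
    best = int(np.argmax(db @ q))
    return database_captions[best]

# Tiny made-up database of image features and their human-written captions.
rng = np.random.default_rng(2)
db_feats = rng.normal(size=(5, 128))
db_caps = [f"caption of database image {i}" for i in range(5)]
print(transfer_caption(rng.normal(size=128), db_feats, db_caps))
```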


Computer Vision and Pattern Recognition | 2012

DCMSVM: Distributed parallel training for single-machine multiclass classifiers

Xufeng Han; Alexander C. Berg

We present an algorithm and implementation for distributed parallel training of single-machine multiclass SVMs. While there is ongoing and healthy debate about the best strategy for multiclass classification, there are some features of the single-machine approach that are not available when training alternatives such as one-vs-all, and that are quite complex for tree based methods. One obstacle to exploring single-machine approaches on large datasets is that they are usually limited to running on a single machine! We build on a framework borrowed from parallel convex optimization - the alternating direction method of multipliers (ADMM) - to develop a new consensus based algorithm for distributed training of single-machine approaches. This is demonstrated with an implementation of our novel sequential dual algorithm (DCMSVM) which allows distributed parallel training with small communication requirements. Benchmark results show significant reduction in wall clock time compared to current state of the art multiclass SVM implementation (Liblinear) on a single node. Experiments are performed on large scale image classification including results with modern high-dimensional features.
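
A numpy sketch of the consensus ADMM pattern the abstract builds on: each node solves a local subproblem on its own data shard, then all nodes agree on a shared weight matrix, with dual variables enforcing agreement. The local solver here is regularized least squares on one-hot labels purely so the example stays short and closed-form; it does not reproduce DCMSVM's multiclass SVM dual.

```python
import numpy as np

# Consensus ADMM skeleton: each "node" holds a data shard and a local weight
# matrix W[i]; all nodes are driven toward a shared consensus matrix Z.
rng = np.random.default_rng(3)
n_nodes, n_classes, dim, rho = 4, 5, 20, 1.0
shards = []
for _ in range(n_nodes):
    X = rng.normal(size=(100, dim))
    y = rng.integers(0, n_classes, 100)
    Y = np.eye(n_classes)[y]                       # one-hot targets
    shards.append((X, Y))

W = [np.zeros((dim, n_classes)) for _ in range(n_nodes)]
U = [np.zeros((dim, n_classes)) for _ in range(n_nodes)]
Z = np.zeros((dim, n_classes))

for _ in range(20):
    # Local updates (run in parallel, one per machine).
    for i, (X, Y) in enumerate(shards):
        A = X.T @ X + rho * np.eye(dim)
        W[i] = np.linalg.solve(A, X.T @ Y + rho * (Z - U[i]))
    # Consensus update (the only step that needs communication).
    Z = sum(W[i] + U[i] for i in range(n_nodes)) / n_nodes
    # Dual updates push each local solution toward the consensus.
    for i in range(n_nodes):
        U[i] = U[i] + W[i] - Z

print("consensus weights shape:", Z.shape)
```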


Conference of the European Chapter of the Association for Computational Linguistics | 2012

Midge: Generating Image Descriptions From Computer Vision Detections

Margaret Mitchell; Jesse Dodge; Amit Goyal; Kota Yamaguchi; Karl Stratos; Xufeng Han; Alyssa Mensch; Alexander C. Berg; Tamara L. Berg; Hal Daumé


International Conference on Natural Language Generation | 2012

Midge: Generating Descriptions of Images

Margaret Mitchell; Xufeng Han; Jeff Hayes


Archive | 2016

Providing Personalized Content Based on Historical Interaction with a Mobile Device

Lukas M. Marti; Xufeng Han

Collaboration


Dive into Xufeng Han's collaborations.

Top Co-Authors

Alexander C. Berg (University of North Carolina at Chapel Hill)
Tamara L. Berg (University of North Carolina at Chapel Hill)
Alyssa Mensch (Massachusetts Institute of Technology)
Jesse Dodge (Carnegie Mellon University)
Aneesh Sood (Stony Brook University)
Eunbyung Park (University of North Carolina at Chapel Hill)