Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Junshi Huang is active.

Publication


Featured research published by Junshi Huang.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

HCP: A Flexible CNN Framework for Multi-Label Image Classification

Yunchao Wei; Wei Xia; Min Lin; Junshi Huang; Bingbing Ni; Jian Dong; Yao Zhao; Shuicheng Yan

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how CNNs best cope with multi-label images remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), in which an arbitrary number of object segment hypotheses are taken as inputs, a shared CNN is connected with each hypothesis, and finally the CNN outputs from the different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it naturally outputs multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, the mAP reaches 90.5% with HCP alone and 93.2% after fusion with our complementary result in [12] based on hand-crafted features on the VOC 2012 dataset.
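To make the aggregation step concrete, here is a minimal PyTorch-style sketch of the hypotheses/shared-CNN/max-pooling pattern described above. The tiny backbone, tensor shapes, and class name are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class HCPSketch(nn.Module):
    """Hypotheses-CNN-Pooling sketch: a shared CNN scores every object-segment
    hypothesis, and max pooling across hypotheses gives multi-label scores."""
    def __init__(self, num_labels: int):
        super().__init__()
        # Tiny stand-in for the shared CNN (the paper pre-trains a large
        # single-label network, e.g., on ImageNet).
        self.shared_cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_labels),
        )

    def forward(self, hypotheses: torch.Tensor) -> torch.Tensor:
        # hypotheses: (batch, num_hypotheses, 3, H, W) cropped segment proposals
        b, n, c, h, w = hypotheses.shape
        scores = self.shared_cnn(hypotheses.reshape(b * n, c, h, w))
        # Max pooling across hypotheses: a label fires if any hypothesis supports it.
        return scores.reshape(b, n, -1).max(dim=1).values

# Usage: HCPSketch(num_labels=20)(torch.randn(2, 8, 3, 64, 64)) -> (2, 20) scores
```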


Computer Vision and Pattern Recognition | 2015

Deep domain adaptation for describing people based on fine-grained clothing attributes

Qiang Chen; Junshi Huang; Rogério Schmidt Feris; Lisa M. Brown; Jian Dong; Shuicheng Yan

We address the problem of describing people based on fine-grained clothing attributes. This is an important problem for many practical applications, such as identifying target suspects or finding missing people based on detailed clothing descriptions in surveillance videos or consumer photos. We approach this problem by first mining clothing images with fine-grained attribute labels from online shopping stores. A large-scale dataset is built with about one million images and fine-detailed attribute sub-categories, such as various shades of color (e.g., watermelon red, rosy red, purplish red), clothing types (e.g., down jacket, denim jacket), and patterns (e.g., thin horizontal stripes, houndstooth). As these images are taken in ideal pose/lighting/background conditions, it is unreliable to use them directly as training data for attribute prediction in the domain of unconstrained images captured, for example, by mobile phones or surveillance cameras. To bridge this gap, we propose a novel double-path deep domain adaptation network to model the data from the two domains jointly. Several alignment cost layers placed in between the two columns ensure the consistency of the two domains' features and the feasibility of predicting unseen attribute categories in one of the domains. Finally, to achieve a working system with automatic human body alignment, we train an enhanced R-CNN-based detector to localize human bodies in images. Our extensive experimental evaluation demonstrates the effectiveness of the proposed approach for describing people based on fine-grained clothing attributes.
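As a rough illustration of what an alignment cost between the two columns could look like, the toy function below penalizes the distance between mean feature activations of corresponding layers in the two paths. The specific cost (a first-moment L2 match) is an assumption for illustration; the paper's actual alignment layers may be defined differently:

```python
import torch

def alignment_cost(feat_shop: torch.Tensor, feat_wild: torch.Tensor) -> torch.Tensor:
    """Toy alignment cost between corresponding activations of the two columns.
    feat_shop/feat_wild: (batch, dim) features from the shop-image path and the
    unconstrained-image path. Matching the mean activations encourages the two
    domains' feature distributions to stay consistent."""
    return (feat_shop.mean(dim=0) - feat_wild.mean(dim=0)).pow(2).sum()
```

In training, one such term per aligned layer would be added to the attribute classification losses of both paths.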


International Conference on Computer Vision | 2015

Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network

Junshi Huang; Rogério Schmidt Feris; Qiang Chen; Shuicheng Yan

We address the problem of cross-domain image retrieval, considering the following practical application: given a user photo depicting a clothing item, our goal is to retrieve the same or attribute-similar clothing items from online shopping stores. This is a challenging problem due to the large discrepancy between online shopping images, usually taken under ideal lighting/pose/background conditions, and user photos captured in uncontrolled conditions. To address this problem, we propose a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning. More specifically, DARN consists of two sub-networks, one for each domain, whose retrieval feature representations are driven by semantic attribute learning. We show that this attribute-guided learning is a key factor in improving retrieval accuracy. In addition, to further align with the nature of the retrieval problem, we impose a triplet visual similarity constraint for learning to rank across the two sub-networks. Another contribution of our work is a large-scale dataset that makes the network learning feasible. We exploit customer review websites to crawl a large set of online shopping images and corresponding offline user photos with fine-grained clothing attributes, i.e., around 450,000 online shopping images and about 90,000 exact offline counterparts of those online images. All of these images are collected from real-world consumer websites, reflecting the diversity of the data modality, which makes this dataset unique and rare in the academic community. We extensively evaluate the retrieval performance of the networks in different configurations. The top-20 retrieval accuracy is doubled when using the proposed DARN rather than the currently popular solution of using pre-trained CNN features only (0.570 vs. 0.268).
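The triplet constraint mentioned above can be sketched as a standard margin-based ranking loss computed across the two sub-networks; the margin value and function names here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def cross_domain_triplet_loss(user_anchor: torch.Tensor,
                              shop_positive: torch.Tensor,
                              shop_negative: torch.Tensor,
                              margin: float = 0.3) -> torch.Tensor:
    """A user-photo embedding (from the street-photo sub-network) should lie
    closer to its exact shop counterpart than to any other shop image
    (both from the shop-photo sub-network)."""
    d_pos = F.pairwise_distance(user_anchor, shop_positive)
    d_neg = F.pairwise_distance(user_anchor, shop_negative)
    return F.relu(d_pos - d_neg + margin).mean()
```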


IEEE Transactions on Multimedia | 2014

Fashion Parsing With Weak Color-Category Labels

Si Liu; Jiashi Feng; Csaba Domokos; Hui Xu; Junshi Huang; Zhenzhen Hu; Shuicheng Yan

In this paper we address the problem of automatically parsing fashion images with weak supervision from user-generated color-category tags such as “red jeans” and “white T-shirt”. This problem is very challenging due to the large diversity of fashion items and the absence of pixel-level tags, which makes traditional fully supervised algorithms inapplicable. To solve the problem, we propose to combine a human pose estimation module, an MRF-based color and category inference module, and a (super)pixel-level category classifier learning module to generate multiple well-performing category classifiers, which can be directly applied to parse the fashion items in images. Moreover, during the model learning phase, all training images are parsed with color-category labels and their human poses are estimated. We also construct a new fashion image dataset, called Colorful-Fashion, in which all 2,682 images are labeled with pixel-level color-category labels. Extensive experiments on this dataset clearly show the effectiveness of the proposed method for the weakly supervised fashion parsing task.


IEEE Transactions on Multimedia | 2016

Clothes Co-Parsing Via Joint Image Segmentation and Labeling With Application to Clothing Retrieval

Xiaodan Liang; Liang Lin; Wei Yang; Ping Luo; Junshi Huang; Shuicheng Yan

This paper aims at developing an integrated system for clothing co-parsing (CCP), in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations. A novel data-driven system consisting of two phases of inference is proposed. The first phase, referred to as “image cosegmentation,” iterates to extract consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM technique [1]. In the second phase (i.e., “region colabeling”), we construct a multi-image graphical model by taking the segmented regions as vertices and incorporating several contexts of clothing configuration (e.g., item locations and mutual interactions). The joint label assignment can be solved using the efficient Graph Cuts algorithm. In addition to evaluating our framework on the Fashionista dataset [2], we construct a dataset called the SYSU-Clothes dataset, consisting of 2,098 high-resolution street fashion photos, to demonstrate the performance of our system. We achieve 90.29%/88.23% segmentation accuracy and 65.52%/63.89% recognition rate on the Fashionista and SYSU-Clothes datasets, respectively, which is superior to previous methods. Furthermore, we apply our method to a challenging task, cross-domain clothing retrieval: given a user photo depicting a clothing item, retrieve the same clothing items from online shopping stores based on the fine-grained parsing results.


Computer Vision and Pattern Recognition | 2014

Towards Multi-view and Partially-Occluded Face Alignment

Junliang Xing; Zhiheng Niu; Junshi Huang; Weiming Hu; Shuicheng Yan

We present a robust model to locate facial landmarks under different views and possibly severe occlusions. To build reliable relationships between face appearance and shape under large view variations, we propose to formulate face alignment as an l1-induced Stagewise Relational Dictionary (SRD) learning problem. During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for the currently estimated landmarks. During testing, the SRD model automatically selects a sparse set of the most related shape displacements for the testing face and uses them to refine its shape iteratively. To locate facial landmarks under occlusions, we further propose to learn an occlusion dictionary to model different kinds of partial face occlusions. By deploying the occlusion dictionary into the SRD model, the alignment performance for occluded faces can be further improved. Our algorithm is simple, effective, and easy to implement. Extensive experiments on two benchmark datasets and two newly built datasets demonstrate its superior performance over state-of-the-art methods, especially for faces with large view variations and/or occlusions.
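One stage of the refinement loop described above can be sketched as coupled sparse coding: code the pose-indexed features over the appearance dictionary, then apply the same sparse coefficients to the shape-displacement dictionary. Orthogonal matching pursuit is used here only as a convenient sparse solver, and all array shapes and the sparsity level are illustrative assumptions, not the authors' l1 formulation:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def srd_stage_update(appearance_dict: np.ndarray,  # (feat_dim, n_atoms)
                     shape_dict: np.ndarray,       # (2 * n_landmarks, n_atoms)
                     features: np.ndarray,         # (feat_dim,) pose-indexed features
                     shape: np.ndarray,            # (2 * n_landmarks,) current landmarks
                     n_nonzero: int = 5) -> np.ndarray:
    """Sparse-code the appearance, then reuse the coefficients to select a
    sparse set of shape displacements that refine the current shape."""
    coef = orthogonal_mp(appearance_dict, features, n_nonzero_coefs=n_nonzero)
    return shape + shape_dict @ coef
```

Iterating this update over the learned stages plays the role of the stagewise refinement in the SRD model.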


ACM Multimedia | 2013

Towards efficient sparse coding for scalable image annotation

Junshi Huang; Hairong Liu; Jialie Shen; Shuicheng Yan

Content-based retrieval methods remain the development trend of traditional retrieval systems. Image labels, as one of the most popular approaches for the semantic representation of images, can fully capture the representative information of images. To achieve high retrieval performance, precise annotation of images becomes indispensable. However, given the massive number of images on the Internet, one cannot annotate all of them without a scalable and flexible (i.e., training-free) annotation method. In this paper, we investigate the problem of accelerating sparse-coding-based scalable image annotation, whose off-the-shelf solvers are generally inefficient on large-scale datasets. By leveraging the prior that most reconstruction coefficients should be zero, we develop a general and efficient framework to derive an accurate solution to the large-scale sparse coding problem by solving a series of much smaller subproblems. In this framework, an active variable set, which expands and shrinks iteratively, is maintained, with each snapshot of the active variable set corresponding to a subproblem. Meanwhile, the convergence of our proposed framework to the global optimum is theoretically provable. To further accelerate the proposed framework, a sub-linear time complexity hashing strategy, e.g., Locality-Sensitive Hashing, is seamlessly integrated into our framework. Extensive empirical experiments on the NUS-WIDE and ImageNet datasets demonstrate that orders-of-magnitude acceleration is achieved by the proposed framework for large-scale image annotation, with zero/negligible accuracy loss for the cases without/with hashing speed-up, compared to the expensive off-the-shelf solvers.
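The expand/shrink active-set idea can be sketched as follows: restrict the Lasso to a working set of dictionary atoms, solve the small subproblem, then drop zeroed atoms and add atoms that violate the optimality condition. This is a schematic illustration of the general framework (the warm-start size, tolerances, and use of scikit-learn's Lasso are assumptions, not the authors' solver):

```python
import numpy as np
from sklearn.linear_model import Lasso

def active_set_sparse_code(D: np.ndarray, x: np.ndarray,
                           lam: float = 0.1, max_iter: int = 20) -> np.ndarray:
    """Solve min_w 1/(2n) ||x - Dw||^2 + lam ||w||_1 via small active-set
    subproblems. D: (n, n_atoms) dictionary, x: (n,) signal."""
    n, n_atoms = D.shape
    # Warm-start the active set with the atoms most correlated with x.
    active = np.argsort(-np.abs(D.T @ x))[:16]
    coef = np.zeros(n_atoms)
    for _ in range(max_iter):
        sub = Lasso(alpha=lam, fit_intercept=False).fit(D[:, active], x)
        coef[:] = 0.0
        coef[active] = sub.coef_
        # Optimality check for inactive atoms: |D_j^T r| / n <= lam.
        grad = np.abs(D.T @ (x - D @ coef)) / n
        violators = np.setdiff1d(np.where(grad > lam + 1e-9)[0], active)
        if violators.size == 0:
            break                                    # globally optimal solution
        keep = active[np.abs(coef[active]) > 1e-12]  # shrink: drop zeroed atoms
        active = np.union1d(keep, violators)         # expand: add violators
    return coef
```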


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Towards Robust and Accurate Multi-View and Partially-Occluded Face Alignment

Junliang Xing; Zhiheng Niu; Junshi Huang; Weiming Hu; Xi Zhou; Shuicheng Yan

Face alignment is an important task in computer vision. Regression-based methods currently dominate approaches to this problem; they generally employ a series of mapping functions from the face appearance to iteratively update the face shape hypothesis. A key question here is thus how to perform the regression procedure. In this work, we formulate this regression procedure as a sparse coding problem. We learn two relational dictionaries, one for the face appearance and the other for the face shape, with coupled reconstruction coefficients to capture their underlying relationships. To deploy this model for face alignment, we derive the relational dictionaries in a stage-wise manner to perform closed-loop refinement of each other, i.e., the face appearance dictionary is first learned from the face shape dictionary and then used to update the face shape hypothesis, and the updated face shape dictionary from the shape hypothesis is in turn used to refine the face appearance dictionary. To improve the model's accuracy, we extend this model hierarchically from the whole face shape to face part shapes, so that both the global and local view variations of a face are captured. To locate facial landmarks under occlusions, we further introduce an occlusion dictionary into the face appearance dictionary to recover the face shape from partially occluded face appearance. The occlusion dictionary is learned in a data-driven manner from background images to represent a set of elemental occlusion patterns, a sparse combination of which models various practical partial face occlusions. By integrating all these technical innovations, we obtain a robust and accurate approach to locate facial landmarks under different face views and possibly severe occlusions for face images in the wild. Extensive experimental analyses and evaluations on different benchmark datasets, as well as two new datasets built by ourselves, demonstrate the robustness and accuracy of our proposed model, especially for face images with large view variations and/or severe occlusions.


Archive | 2017

Visual Attributes for Fashion Analytics

Si Liu; Lisa M. Brown; Qiang Chen; Junshi Huang; Luoqi Liu; Shuicheng Yan

In this chapter, we describe methods that leverage clothing and facial attributes as mid-level features for fashion recommendation and retrieval. We introduce a system called Magic Closet for recommending clothing for different occasions, and a system called Beauty E-Expert for hairstyle and facial makeup recommendation. For fashion retrieval, we describe a cross-domain clothing retrieval system, which receives as input a user photo of a particular clothing item taken in unconstrained conditions, and retrieves the exact same or similar item from online shopping catalogs. In each of these systems, we show the value of attribute-guided learning and describe approaches to transfer semantic concepts from large-scale uncluttered annotated data to challenging real-world imagery.


ACM Multimedia | 2014

Deep Search with Attribute-aware Deep Network

Junshi Huang; Wei Xia; Shuicheng Yan

Collaboration


Dive into Junshi Huang's collaborations.

Top Co-Authors

Shuicheng Yan

National University of Singapore

Qiang Chen

National University of Singapore

Junliang Xing

Chinese Academy of Sciences

Si Liu

Chinese Academy of Sciences

Jian Dong

National University of Singapore

Wei Xia

National University of Singapore

Zhiheng Niu

National University of Singapore

Weiming Hu

Chinese Academy of Sciences
