Wei Di
eBay
Publication
Featured research published by Wei Di.
computer vision and pattern recognition | 2013
Wei Di; Catherine Wah; Anurag Bhardwaj; Robinson Piramuthu; Neel Sundaresan
With the rapid proliferation of smartphones and tablet computers, search has moved beyond text to other modalities like images and voice. For many applications like Fashion, visual search offers a compelling interface that can capture stylistic visual elements beyond color and pattern that cannot be as easily described using text. However, extracting and matching such attributes remains an extremely challenging task due to high variability and deformability of clothing items. In this paper, we propose a fine-grained learning model and multimedia retrieval framework to address this problem. First, an attribute vocabulary is constructed using human annotations obtained on a novel fine-grained clothing dataset. This vocabulary is then used to train a fine-grained visual recognition system for clothing styles. We report benchmark recognition and retrieval results on the Women's Fashion Coat Dataset and illustrate potential mobile applications for attribute-based multimedia retrieval of clothing items and image annotation.
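As a rough, hypothetical sketch of the attribute-based retrieval idea described in this abstract (not the authors' system), one can train one classifier per attribute in the vocabulary and rank catalog items by the similarity of their attribute-score vectors; all features, labels, and sizes below are made up.

```python
# Minimal sketch of attribute-based retrieval (not the paper's exact pipeline).
# Assumes precomputed image features and binary labels over a small, hypothetical
# attribute vocabulary (e.g. "collar", "floral", "belted").
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))        # visual features (placeholder)
Y_train = rng.random((200, 3)) > 0.5        # per-attribute labels (placeholder)
X_gallery = rng.normal(size=(50, 64))       # catalog images to search over

# One classifier per attribute in the vocabulary.
clfs = [LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, a]) for a in range(3)]

def attribute_scores(X):
    # Each column is the predicted probability of one attribute being present.
    return np.column_stack([c.predict_proba(X)[:, 1] for c in clfs])

gallery_attrs = attribute_scores(X_gallery)

def search(query_feature, k=5):
    q = attribute_scores(query_feature[None, :])[0]
    # Cosine similarity in attribute space: items sharing stylistic attributes rank higher.
    sims = gallery_attrs @ q / (np.linalg.norm(gallery_attrs, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:k]

print(search(X_gallery[0]))
```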
international conference on computer vision | 2015
Zhicheng Yan; Hao Zhang; Robinson Piramuthu; Vignesh Jagadeesh; Dennis Decoste; Wei Di; Yizhou Yu
In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNNs) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a two-level category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global fine-tuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional execution of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both the CIFAR-100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build three different two-level HD-CNNs, and they lower the top-1 error of the standard CNNs by 2.65%, 3.1%, and 1.1%.
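The coarse-to-fine prediction rule can be illustrated with a small, hedged sketch: fine-category probabilities from each branch are averaged with weights given by the coarse classifier, and low-probability branches are skipped (conditional execution). The shapes and the threshold below are illustrative, not the paper's settings.

```python
# Illustrative combination rule for an HD-CNN-style model; all inputs are synthetic.
import numpy as np

def hdcnn_predict(coarse_probs, fine_probs, threshold=0.05):
    """
    coarse_probs: (C,) probabilities over C coarse groups for one image.
    fine_probs:   (C, K) fine-class probabilities from each of the C branch
                  classifiers over all K fine classes.
    threshold:    branches with coarse probability below this are skipped
                  (conditional execution, for scalability).
    """
    combined = np.zeros(fine_probs.shape[1])
    for c, w in enumerate(coarse_probs):
        if w < threshold:          # skip unlikely coarse groups to save compute
            continue
        combined += w * fine_probs[c]
    return combined / max(combined.sum(), 1e-12)

coarse = np.array([0.7, 0.25, 0.05])
fine = np.random.default_rng(0).dirichlet(np.ones(10), size=3)
print(hdcnn_predict(coarse, fine).argmax())
```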
knowledge discovery and data mining | 2014
Vignesh Jagadeesh; Robinson Piramuthu; Anurag Bhardwaj; Wei Di; Neel Sundaresan
We describe a completely automated large-scale visual recommendation system for fashion. Our focus is to efficiently harness the availability of large quantities of online fashion images and their rich meta-data. Specifically, we propose two classes of data-driven models, Deterministic Fashion Recommenders (DFR) and Stochastic Fashion Recommenders (SFR), for solving this problem. We analyze the relative merits and pitfalls of these algorithms through extensive experimentation on a large-scale dataset and baseline them against existing ideas from color science. We also illustrate key fashion insights learned through these experiments and show how they can be employed to design better recommendation systems. The industrial applicability of the proposed models is in the context of mobile fashion shopping. Finally, we also outline a large-scale annotated dataset of fashion images (Fashion-136K) that can be exploited for future research in data-driven visual fashion.
workshop on applications of computer vision | 2014
Mohammad Haris Baig; Vignesh Jagadeesh; Robinson Piramuthu; Anurag Bhardwaj; Wei Di; Neel Sundaresan
The rapid increase in the number of high-quality mobile cameras has opened up an array of new problems in mobile vision. Mobile cameras are predominantly monocular and devoid of any sense of depth, making them heavily reliant on 2D image processing. Understanding the 3D structure of the scenes being imaged can greatly improve the performance of existing vision/graphics techniques. In this regard, the recent availability of large-scale RGB-D datasets begs for more effective data-driven strategies to leverage the scale of the data. We propose a depth recovery mechanism, “im2depth”, that is lightweight enough to run on mobile platforms while leveraging the large-scale nature of modern RGB-D datasets. Our key observation is to form a basis (dictionary) over the RGB and depth spaces, and to represent depth maps by a sparse linear combination of weights over dictionary elements. Subsequently, a prediction function is estimated between weight vectors in the RGB and depth spaces to recover depth maps from query images. A final superpixel post-processor aligns depth maps with occlusion boundaries, creating physically plausible results. We conclude with thorough experimentation against four state-of-the-art depth recovery algorithms, and observe an improvement of over 6.5 percent in shape recovery and over a 10 cm reduction in average L1 error.
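A hedged sketch of the dictionary-based idea (not the authors' im2depth code): learn a basis over RGB descriptors and over depth maps, then regress from RGB sparse codes to depth sparse codes and reconstruct. The data and dictionary sizes below are synthetic placeholders.

```python
# Rough sketch of dictionary-based depth recovery over synthetic data.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
rgb_feats  = rng.normal(size=(300, 128))    # per-image RGB descriptors (placeholder)
depth_maps = rng.normal(size=(300, 256))    # flattened low-res depth maps (placeholder)

rgb_dict   = MiniBatchDictionaryLearning(n_components=32, random_state=0).fit(rgb_feats)
depth_dict = MiniBatchDictionaryLearning(n_components=32, random_state=0).fit(depth_maps)

rgb_codes   = rgb_dict.transform(rgb_feats)                      # sparse weights over the RGB basis
depth_codes = sparse_encode(depth_maps, depth_dict.components_)  # sparse weights over the depth basis

# Prediction function between the two weight spaces.
mapper = Ridge(alpha=1.0).fit(rgb_codes, depth_codes)

def recover_depth(query_rgb_feat):
    code = mapper.predict(rgb_dict.transform(query_rgb_feat[None, :]))
    return code @ depth_dict.components_     # reconstruct depth from the depth dictionary

print(recover_depth(rgb_feats[0]).shape)     # (1, 256)
```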
knowledge discovery and data mining | 2013
Anurag Bhardwaj; Atish Das Sarma; Wei Di; Raffay Hamid; Robinson Piramuthu; Neel Sundaresan
With the explosion of mobile devices with cameras, online search has moved beyond text to other modalities like images, voice, and writing. For many applications like Fashion, image-based search offers a compelling interface compared to text forms by better capturing visual attributes. In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. We show that low-level cues such as color can be used to quantify image similarity and also to discriminate among products with different visual appearances. We demonstrate the effectiveness of our approach through a mobile shopping application (the eBay Fashion App, available at https://itunes.apple.com/us/app/ebay-fashion/id378358380?mt=8; eBay Image Swatch is the feature indexing millions of real-world fashion images). Our approach outperforms several other state-of-the-art image retrieval algorithms on large-scale image data.
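A minimal sketch of color-histogram matching in the spirit of this approach; the color space, bin count, and similarity measure are assumptions, not the deployed eBay system.

```python
# Toy color-based similarity search over synthetic images.
import numpy as np

def color_histogram(image_rgb, bins=8):
    """Quantize each channel into `bins` levels and build a normalized joint histogram."""
    pixels = image_rgb.reshape(-1, 3)
    quant = (pixels.astype(np.float64) / 256 * bins).astype(int)
    idx = quant[:, 0] * bins * bins + quant[:, 1] * bins + quant[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def similarity(h1, h2):
    # Histogram intersection: cheap to compute and easy to index at scale.
    return np.minimum(h1, h2).sum()

rng = np.random.default_rng(0)
query   = rng.integers(0, 256, size=(64, 64, 3))
catalog = [rng.integers(0, 256, size=(64, 64, 3)) for _ in range(100)]
scores  = [similarity(color_histogram(query), color_histogram(im)) for im in catalog]
print(int(np.argmax(scores)))   # index of the best color match
```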
international conference on image processing | 2015
M. Hadi Kiapour; Wei Di; Vignesh Jagadeesh; Robinson Piramuthu
While discriminative visual element mining has been introduced before, in this paper we present an approach that requires minimal annotation at both training and test time. Given only a bounding-box localization of the foreground objects, our approach automatically transforms the input images into a roughly-aligned pose space and discovers the most discriminative visual fragments for each category. These fragments are then used to learn robust classifiers that discriminate between very similar categories under challenging conditions such as large variations in pose or habitat. The minimal required input is a critical characteristic that enables our approach to generalize over visual domains where expert knowledge is not readily available. Moreover, our approach takes advantage of deep networks that are targeted towards fine-grained classification. It learns mid-level representations that are specific to a category and at the same time generalize well across the category's instances. Our evaluations demonstrate that the automatically learned representation based on discriminative fragments significantly outperforms globally extracted deep features in classification accuracy.
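A loose sketch of how discriminative fragments could be mined, under the assumption that roughly-aligned patch descriptors are already available: score each candidate fragment by the cross-validated accuracy of a simple classifier trained on it alone, and keep the best-scoring ones. The data is synthetic and the selection rule is illustrative, not the paper's exact procedure.

```python
# Illustrative fragment mining over synthetic, roughly-aligned patch descriptors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_images, n_fragments, dim = 120, 20, 32
patches = rng.normal(size=(n_images, n_fragments, dim))   # descriptor per aligned patch
labels  = rng.integers(0, 2, size=n_images)                # two very similar categories

def fragment_score(f):
    # Cross-validated accuracy of a linear classifier using only fragment f.
    return cross_val_score(LogisticRegression(max_iter=1000),
                           patches[:, f, :], labels, cv=3).mean()

scores = np.array([fragment_score(f) for f in range(n_fragments)])
top_fragments = np.argsort(-scores)[:5]    # keep the most discriminative fragments
print(top_fragments)
```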
international conference on image processing | 2014
Rohit Pandey; Wei Di; Vignesh Jagadeesh; Robinson Piramuthu; Anurag Bhardwaj
In this paper we present a framework for logo retrieval in natural images. Color-localized spatial masks are used as an alternative to computationally expensive spatial verification techniques like RANSAC. First, keypoints are detected using traditional techniques such as the SIFT detector. Local masks are defined around each keypoint, taking its scale and orientation information into account. To exploit the inherent color information present in brand logos, ordered color histograms are extracted from the masked regions. A separate vocabulary is constructed for both SIFT descriptors (visual words) and color histograms (color words). For faster matching at runtime, a two-stage cascaded index is designed, which maps each (visual word, color word) tuple to a list of relevant images. This list is finally re-ranked with BoW cosine similarity to generate relevant matches for the input query. To demonstrate the efficacy of our method, we conduct experiments on two popular logo datasets: Flickr27 and Flickr32. Our experimental results illustrate state-of-the-art retrieval performance on these datasets, with potential for added speed and a lower memory footprint as indicated by the low response ratio.
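The two-stage cascaded index can be sketched as a toy inverted index keyed by (visual word, color word) tuples; in the full system the returned shortlist is then re-ranked by BoW cosine similarity. Word ids and image names below are made up.

```python
# Toy cascaded index: (visual word, color word) tuple -> set of image ids.
from collections import defaultdict

index = defaultdict(set)

def add_image(image_id, keypoints):
    # keypoints: iterable of (visual_word_id, color_word_id) tuples for one image
    for vw, cw in keypoints:
        index[(vw, cw)].add(image_id)

def query(keypoints):
    # Collect candidates sharing at least one (visual word, color word) pair;
    # a real system would re-rank this shortlist with BoW cosine similarity.
    votes = defaultdict(int)
    for vw, cw in keypoints:
        for image_id in index.get((vw, cw), ()):
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)

add_image("logo_a.jpg", [(12, 3), (57, 1), (90, 3)])
add_image("logo_b.jpg", [(12, 5), (44, 2)])
print(query([(12, 3), (90, 3)]))   # ['logo_a.jpg']
```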
international conference on computer vision systems | 2015
Kevin J. Shih; Wei Di; Vignesh Jagadeesh; Robinson Piramuthu
Text is ubiquitous in the artificial world and easily attainable when it comes to book titles and author names. Using the images from the book cover set of the Stanford Mobile Visual Search dataset and additional book covers and metadata from openlibrary.org, we construct a large-scale book cover retrieval dataset, complete with 100K distractor covers and title and author strings for each. Because our query images are poorly conditioned for clean text extraction, we propose a method for extracting noisy and erroneous OCR readings and matching them against clean author and book title strings in a standard document look-up problem setup. Finally, we demonstrate how to use this text matching as a feature in conjunction with popular retrieval features such as VLAD, using a simple learning setup to achieve significant improvements in retrieval accuracy over that of either VLAD or the text alone.
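A small, assumption-laden sketch of fusing a noisy-OCR text-match score with a VLAD similarity score through a learned combination, as the abstract describes at a high level; the scores and relevance labels below are synthetic stand-ins for the real features.

```python
# Learned fusion of a text-match score and a VLAD similarity score (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
text_score = rng.random(500)                    # document look-up score per candidate
vlad_score = rng.random(500)                    # VLAD cosine similarity per candidate
relevant   = (text_score + vlad_score) > 1.0    # stand-in relevance labels

X = np.column_stack([text_score, vlad_score])
fusion = LogisticRegression().fit(X, relevant)  # learn how to weight the two cues

def rerank(candidates):
    # candidates: array of shape (n, 2) with [text_score, vlad_score] per book cover
    return np.argsort(-fusion.predict_proba(candidates)[:, 1])

print(rerank(X[:10]))
```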
Handbook of Statistics | 2013
Robinson Piramuthu; Anurag Bhardwaj; Wei Di; Neel Sundaresan
Advances in the proliferation of media devices as well as Internet technologies have generated massive image data sets and made them easier to access and share today. These large-scale data sources not only provide rich test beds for solving existing computer vision problems, but also pose a unique challenge for large-scale data processing that demands an effective information retrieval system to browse and search. This is also motivated by many real-world applications where visual search has been shown to offer compelling interfaces and functionalities by capturing visual attributes better than modalities such as audio, text, etc. In this chapter, we describe state-of-the-art techniques in large-scale visual search. Specifically, we outline each phase in a typical retrieval pipeline including information extraction, representation, indexing, and matching, and we focus on practical issues such as memory footprint and speed while dealing with large data sets. We tabulate several public data sets commonly used as benchmarks, along with their summary. The scope of data reveals a wide variety of potential applications for a vision-based retrieval system. Finally, we address several promising research directions by introducing some other core components that serve to improve the current retrieval system.
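As a toy illustration of the pipeline phases named here (extraction, representation, indexing, matching), the following sketch wires random features into a brute-force nearest-neighbor index; every component is a stand-in for the real techniques the chapter surveys.

```python
# Minimal end-to-end retrieval pipeline skeleton with placeholder components.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def extract_features(images):
    # Extraction + representation: in practice a CNN or hand-crafted descriptor.
    return rng.normal(size=(len(images), 128))

gallery = [f"img_{i}.jpg" for i in range(1000)]
feats = extract_features(gallery)

# Indexing: for large data sets this would be an approximate index chosen for
# memory footprint and speed; brute-force cosine search stands in here.
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(feats)

# Matching: look up the closest gallery items for a query feature.
_, nbrs = index.kneighbors(feats[:1])
print([gallery[j] for j in nbrs[0]])
```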
Archive | 2014
Anurag Bhardwaj; Wei Di; Muhammad Raffay Hamid; Robinson Piramuthu; Neelakantan Sundaresan