Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Fei Sha is active.

Publication


Featured research published by Fei Sha.


Computer Vision and Pattern Recognition | 2012

Geodesic flow kernel for unsupervised domain adaptation

Boqing Gong; Yuan Shi; Fei Sha; Kristen Grauman

In real-world applications of visual recognition, many factors - such as pose, illumination, or image quality - can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.
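
The geodesic flow kernel has a closed form derived in the paper; the sketch below instead approximates the integral numerically along the Grassmann geodesic between the two PCA subspaces, which makes the construction easier to follow. Function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def geodesic_flow_kernel(Ps, Pt, n_steps=20):
    """Numerically approximate G = integral_0^1 Phi(t) Phi(t)^T dt by
    averaging bases Phi(t) along the Grassmann geodesic from the source
    subspace to the target subspace. Ps, Pt: (D, d) orthonormal bases.
    (The paper gives a closed form; this is a numerical sketch.)"""
    U, cos_theta, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    sin_theta = np.maximum(np.sin(theta), 1e-12)   # guard zero angles
    # Component of the target basis orthogonal to the source subspace,
    # aligned with the principal angles.
    H = (Pt @ Vt.T - Ps @ (U * cos_theta)) / sin_theta
    G = np.zeros((Ps.shape[0], Ps.shape[0]))
    for t in np.linspace(0.0, 1.0, n_steps):
        Phi = Ps @ (U * np.cos(t * theta)) + H * np.sin(t * theta)
        G += Phi @ Phi.T / n_steps
    return G  # similarity between x and z is then x^T G z
```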


International Conference on Acoustics, Speech, and Signal Processing | 2006

Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition

Fei Sha; Lawrence K. Saul

We develop a framework for large margin classification by Gaussian mixture models (GMMs). Large margin GMMs have many parallels to support vector machines (SVMs) but use ellipsoids to model classes instead of half-spaces. Model parameters are trained discriminatively to maximize the margin of correct classification, as measured in terms of Mahalanobis distances. The required optimization is convex over the model's parameter space of positive semidefinite matrices and can be performed efficiently. Large margin GMMs are naturally suited to large problems in multiway classification; we apply them to phonetic classification and recognition on the TIMIT database. On both tasks, we obtain significant improvement over baseline systems trained by maximum likelihood estimation. For the problem of phonetic classification, our results are competitive with other state-of-the-art classifiers, such as hidden conditional random fields.
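
A minimal sketch of the margin criterion for the single-Gaussian-per-class case, assuming each class c is scored by a Mahalanobis-style form z^T Phi_c z over a bias-augmented input z. The paper trains these PSD matrices by convex optimization, which the sketch does not show; it only evaluates the hinge-style margin violations.

```python
import numpy as np

def large_margin_gmm_loss(Phi, X, y):
    """Margin criterion (sketch): each wrong class's squared distance
    should exceed the true class's by at least 1.
    Phi: (C, D+1, D+1) PSD matrices, one per class (single-Gaussian case).
    X: (N, D) features; y: (N,) integer labels."""
    Z = np.hstack([X, np.ones((len(X), 1))])        # augment with bias
    # squared "distances" z^T Phi_c z for every example and class
    d = np.einsum('nd,cde,ne->nc', Z, Phi, Z)
    d_true = d[np.arange(len(X)), y]
    hinge = np.maximum(0.0, 1.0 + d_true[:, None] - d)
    hinge[np.arange(len(X)), y] = 0.0               # skip the true class
    return hinge.sum()
```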


International Conference on Machine Learning | 2005

Analysis and extension of spectral methods for nonlinear dimensionality reduction

Fei Sha; Lawrence K. Saul

Many unsupervised algorithms for nonlinear dimensionality reduction, such as locally linear embedding (LLE) and Laplacian eigenmaps, are derived from the spectral decompositions of sparse matrices. While these algorithms aim to preserve certain proximity relations on average, their embeddings are not explicitly designed to preserve local features such as distances or angles. In this paper, we show how to construct a low-dimensional embedding that maximally preserves angles between nearby data points. The embedding is derived from the bottom eigenvectors of LLE and/or Laplacian eigenmaps by solving an additional (but small) problem in semidefinite programming, whose size is independent of the number of data points. The solution obtained by semidefinite programming also yields an estimate of the data's intrinsic dimensionality. Experimental results on several data sets demonstrate the merits of our approach.
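
A sketch of this construction, using cvxpy for the small SDP: keep the bottom m eigenvectors, then learn an m-by-m PSD matrix P so that y_i = P^{1/2} v_i preserves local distances up to a per-point scale. The trace normalization and the exact objective weighting are my assumptions, not necessarily the paper's formulation.

```python
import numpy as np
import cvxpy as cp

def conformal_embedding(V, X, neighbors):
    """V: (n, m) bottom eigenvectors from LLE / Laplacian eigenmaps.
    X: (n, D) original data; neighbors: list of index lists per point.
    The SDP variable P is m x m, independent of the number of points."""
    n, m = V.shape
    P = cp.Variable((m, m), PSD=True)
    s = cp.Variable(n)                       # per-point conformal scales
    cost = 0
    for i, nbrs in enumerate(neighbors):
        for a in nbrs:
            for b in nbrs:
                dv = V[a] - V[b]
                dx2 = float(np.sum((X[a] - X[b]) ** 2))
                cost += cp.square(dv @ P @ dv - s[i] * dx2)
    # trace normalization (my assumption) rules out the trivial P = 0
    cp.Problem(cp.Minimize(cost), [cp.trace(P) == 1]).solve()
    evals = np.linalg.eigvalsh(P.value)
    return P.value, evals   # eigenvalue decay estimates intrinsic dim
```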


Computer Vision and Pattern Recognition | 2011

Sharing features between objects and their attributes

Sung Ju Hwang; Fei Sha; Kristen Grauman

Visual attributes expose human-defined semantics to object recognition models, but existing work largely restricts their influence to mid-level cues during classifier training. Rather than treat attributes as intermediate features, we consider how learning visual properties in concert with object categories can regularize the models for both. Given a low-level visual feature space together with attribute- and object-labeled image data, we learn a shared lower-dimensional representation by optimizing a joint loss function that favors common sparsity patterns across both types of prediction tasks. We adopt a recent kernelized formulation of convex multi-task feature learning, in which one alternates between learning the common features and learning task-specific classifier parameters on top of those features. In this way, our approach discovers any structure among the image descriptors that is relevant to both tasks, and allows the top-down semantics to restrict the hypothesis space of the ultimate object classifiers. We validate the approach on datasets of animals and outdoor scenes, and show significant improvements over traditional multi-class object classifiers and direct attribute prediction models.
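
The paper uses a kernelized alternating solver; the linear sketch below conveys only the shared-sparsity mechanism, with an l2,1 penalty solved by proximal gradient descent. Here Y stacks attribute and object prediction targets as columns, so the row-wise shrinkage forces both kinds of tasks onto a common set of features.

```python
import numpy as np

def shared_sparsity_fit(X, Y, lam=0.1, iters=500):
    """Proximal gradient for 0.5 * ||XW - Y||_F^2 + lam * sum_d ||W[d,:]||_2.
    X: (N, D) features; Y: (N, T) stacked task targets."""
    D, T = X.shape[1], Y.shape[1]
    W = np.zeros((D, T))
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(iters):
        V = W - step * (X.T @ (X @ W - Y))   # gradient step on the loss
        # group soft-threshold each feature row: tasks share a support
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        W = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * V
    return W
```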


Computer Vision and Pattern Recognition | 2014

Decorrelating Semantic Visual Attributes by Resisting the Urge to Share

Dinesh Jayaraman; Fei Sha; Kristen Grauman

Existing methods to learn visual attributes are prone to learning the wrong thing -- namely, properties that are correlated with the attribute of interest among training samples. Yet, many proposed applications of attributes rely on being able to learn the correct semantic concept corresponding to each attribute. We propose to resolve such confusions by jointly learning decorrelated, discriminative attribute models. Leveraging side information about semantic relatedness, we develop a multi-task learning approach that uses structured sparsity to encourage feature competition among unrelated attributes and feature sharing among related attributes. On three challenging datasets, we show that accounting for structure in the visual attribute space is key to learning attribute models that preserve semantics, yielding improved generalizability that helps in the recognition and discovery of unseen object categories.
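
A rough rendering of the structured-sparsity idea (my reading, not the paper's exact regularizer): for each feature dimension, attributes within a semantic group share the feature through an l2 norm, while summing group norms (an l1 across groups) makes unrelated groups compete for it.

```python
import numpy as np

def competition_penalty(W, groups, lam=1.0):
    """l1-over-l2 group penalty sketch.
    W: (D, A) attribute classifier weights, one column per attribute.
    groups: list of lists of attribute indices (semantically related sets)."""
    return lam * sum(np.linalg.norm(W[d, g])          # l2 within a group
                     for d in range(W.shape[0])       # per feature dim
                     for g in groups)                 # l1 across groups
```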


European Conference on Computer Vision | 2016

Video Summarization with Long Short-Term Memory

Ke Zhang; Wei-Lun Chao; Fei Sha; Kristen Grauman

We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term Memory (LSTM), a special type of recurrent neural network, to model the variable-range dependencies entailed in the task of video summarization. Our learning models attain state-of-the-art results on two benchmark video datasets. Detailed analysis justifies the design of the models; in particular, we show that it is crucial to model the sequential structure in videos. Beyond these modeling advances, we introduce techniques to address the need for large amounts of annotated data when training complex learning models. Here, our main idea is to exploit auxiliary annotated video datasets, albeit heterogeneous in visual style and content. Specifically, we show that domain adaptation techniques can improve summarization by reducing the discrepancies in statistical properties across those datasets.
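
A minimal PyTorch sketch of the core model shape this describes: a bidirectional LSTM reads per-frame features and a small head emits a per-frame importance score, to be trained against human annotations. The dimensions and layer choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """BiLSTM frame-importance scorer (sketch)."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, frames):            # frames: (B, T, feat_dim)
        h, _ = self.lstm(frames)          # (B, T, 2 * hidden)
        return self.head(h).squeeze(-1)   # (B, T) importance scores
```

Keyframes would then be selected from the top-scoring time steps, subject to a budget on summary length.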


European Conference on Computer Vision | 2016

An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

Wei-Lun Chao; Soravit Changpinyo; Boqing Gong; Fei Sha

We investigate the problem of generalized zero-shot learning (GZSL). GZSL relaxes the unrealistic assumption in conventional zero-shot learning (ZSL) that test data belong only to unseen novel classes. In GZSL, test data might also come from seen classes and the labeling space is the union of both types of classes. We show empirically that a straightforward application of classifiers provided by existing ZSL approaches does not perform well in the setting of GZSL. Motivated by this, we propose a surprisingly simple but effective method to adapt ZSL approaches for GZSL. The main idea is to introduce a calibration factor to calibrate the classifiers for both seen and unseen classes so as to balance two conflicting forces: recognizing data from seen classes and those from unseen ones. We develop a new performance metric called the Area Under Seen-Unseen accuracy Curve to characterize this trade-off. We demonstrate the utility of this metric by analyzing existing ZSL approaches applied to the generalized setting. Extensive empirical studies reveal strengths and weaknesses of those approaches on three well-studied benchmark datasets, including the large-scale ImageNet with more than 20,000 unseen categories. We complement our comparative studies in learning methods by further establishing an upper bound on the performance limit of GZSL. In particular, our idea is to use class-representative visual features as the idealized semantic embeddings. We show that there is a large gap between the performance of existing approaches and the performance limit, suggesting that improving the quality of class semantic embeddings is vital to improving ZSL.
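
The calibration rule and the AUSUC metric are simple to state in code. The sketch below assumes precomputed classifier scores and a boolean mask marking which classes are seen; variable names are illustrative.

```python
import numpy as np

def calibrated_predict(scores, seen_mask, gamma):
    """Subtract a calibration factor gamma from seen-class scores before
    the argmax, trading off seen-class vs. unseen-class recognition.
    scores: (N, C); seen_mask: (C,) bool."""
    return (scores - gamma * seen_mask.astype(float)).argmax(axis=1)

def ausuc(scores, labels, seen_mask, gammas):
    """Sweep gamma, record (seen accuracy, unseen accuracy) pairs, and
    integrate the curve: the Area Under the Seen-Unseen accuracy Curve."""
    from_seen = seen_mask[labels]       # examples whose true class is seen
    pts = []
    for g in gammas:
        pred = calibrated_predict(scores, seen_mask, g)
        acc_s = (pred[from_seen] == labels[from_seen]).mean()
        acc_u = (pred[~from_seen] == labels[~from_seen]).mean()
        pts.append((acc_s, acc_u))
    pts = np.array(pts)
    order = np.argsort(pts[:, 0])
    return np.trapz(pts[order, 1], pts[order, 0])
```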


Computer Vision and Pattern Recognition | 2016

Summary Transfer: Exemplar-Based Subset Selection for Video Summarization

Ke Zhang; Wei-Lun Chao; Fei Sha; Kristen Grauman

Video summarization has unprecedented importance in helping us digest, browse, and search today's ever-growing video collections. We propose a novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization. The main idea is to nonparametrically transfer summary structures from annotated videos to unseen test videos. We show how to extend our method to exploit semantic side information about a video's category/genre, guiding the transfer process with training videos that are semantically consistent with the test input. We also show how to generalize our method to subshot-based summarization, which not only reduces computational costs but also provides more flexible ways of defining visual similarity across subshots spanning several frames. We conduct extensive evaluation on several benchmarks and demonstrate promising results, outperforming existing methods in several settings.
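
A loose nearest-neighbor rendering of the transfer idea, as a simplification rather than the paper's exact estimator: weight each annotated training video by its similarity to the test video, then score every test frame by its kernel similarity to the frames annotators kept in that video's summary. All names and the weighting scheme are assumptions.

```python
import numpy as np

def transfer_frame_scores(video_sims, frame_kernels, train_keep):
    """video_sims:   (M,) similarity of the test video to each training video
    frame_kernels: list of M arrays, each (T_test, T_m) frame similarities
    train_keep:    list of M binary arrays (T_m,), 1 = frame in summary"""
    scores = np.zeros(frame_kernels[0].shape[0])
    for w, Kf, keep in zip(video_sims, frame_kernels, train_keep):
        scores += w * (Kf @ keep) / max(keep.sum(), 1)
    return scores   # keep the top-scoring test frames as the summary
```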


International Journal of Computer Vision | 2014

Learning Kernels for Unsupervised Domain Adaptation with Applications to Visual Object Recognition

Boqing Gong; Kristen Grauman; Fei Sha

Domain adaptation aims to correct the mismatch in statistical properties between the source domain on which a classifier is trained and the target domain to which the classifier is to be applied. In this paper, we address the challenging scenario of unsupervised domain adaptation, where the target domain does not provide any annotated data to assist in adapting the classifier. Our strategy is to learn robust features which are resilient to the mismatch across domains and then use them to construct classifiers that will perform well on the target domain. To this end, we propose novel kernel learning approaches to infer such features for adaptation. Concretely, we explore two closely related directions. In the first direction, we propose unsupervised learning of a geodesic flow kernel (GFK). The GFK summarizes the inner products in an infinite sequence of feature subspaces that smoothly interpolates between the source and target domains. In the second direction, we propose supervised learning of a kernel that discriminatively combines multiple base GFKs. Those base kernels model the source and the target domains at fine-grained granularities. In particular, each base kernel pivots on a different set of landmarks—the most useful data instances that reveal the similarity between the source and the target domains, thus bridging them to achieve adaptation. Our approaches are computationally convenient, automatically infer important hyper-parameters, and are capable of learning features and classifiers discriminatively without demanding labeled data from the target domain. In extensive empirical studies on standard benchmark recognition datasets, our approaches yield state-of-the-art results compared to a variety of competing methods.
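
One way to read the landmark intuition, as a hedged simplification of the paper's selection procedure: score each source instance by its average RBF similarity to the target sample at a given bandwidth, and let the top scorers pivot a base GFK at that granularity.

```python
import numpy as np

def landmark_scores(Xs, Xt, bandwidth):
    """Score source points by how well they sit in high-density regions
    of the target domain under an RBF kernel (a simplification of the
    paper's distribution-matching selection).
    Xs: (Ns, D) source features; Xt: (Nt, D) target features."""
    d2 = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)   # (Ns, Nt)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)
```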


International Conference on Acoustics, Speech, and Signal Processing | 2011

Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop

Geoffrey Zweig; Patrick Nguyen; D. Van Compernolle; Kris Demuynck; L. Atlas; Pascal Clark; Gregory Sell; M. Wang; Fei Sha; Hynek Hermansky; Damianos Karakos; Aren Jansen; Samuel Thomas; S. Bowman; Justine T. Kao

This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. This approach uses a state-of-the-art baseline as a springboard from which to add a suite of novel features including ones derived from acoustic templates, deep neural net phoneme detections, duration models, modulation features, and whole word point-process models. The SCRF framework is able to appropriately weight these different information sources to produce significant gains on both the Broadcast News and Wall Street Journal tasks.
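
A sketch of the segmental decoding such a model relies on, omitting label-to-label transitions for brevity: dynamic programming over segment boundaries, where a seg_score callback stands in for the SCRF's weighted combination of segment-level features (templates, DNN phoneme detections, durations, and so on). The interface is an assumption for illustration.

```python
import numpy as np

def segmental_viterbi(seg_score, T, max_len):
    """Find the best segmentation of T frames.
    seg_score(s, e) -> (score, label): best-label score for span [s, e)."""
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)
    for e in range(1, T + 1):
        for s in range(max(0, e - max_len), e):
            score, label = seg_score(s, e)
            if best[s] + score > best[e]:
                best[e] = best[s] + score
                back[e] = (s, label)
    segments, e = [], T
    while e > 0:                      # trace back the chosen boundaries
        s, label = back[e]
        segments.append((s, e, label))
        e = s
    return segments[::-1]
```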

Collaboration


Dive into Fei Sha's collaborations.

Top Co-Authors

Kristen Grauman
University of Texas at Austin

Wei-Lun Chao
University of Southern California

Boqing Gong
University of Central Florida

Daniel D. Lee
University of Pennsylvania

Gaurav S. Sukhatme
University of Southern California

Kuan Liu
University of Southern California

Minmin Chen
Washington University in St. Louis