
Publication


Featured research published by Felix X. Yu.


computer vision and pattern recognition | 2012

Weak attributes for large-scale image retrieval

Felix X. Yu; Rongrong Ji; Ming-Hen Tsai; Guangnan Ye; Shih-Fu Chang

Attribute-based query offers an intuitive way of image retrieval, in which users can describe the intended search targets with understandable attributes. In this paper, we develop a general and powerful framework to solve this problem by leveraging a large pool of weak attributes comprising automatic classifier scores or other mid-level representations that can be easily acquired with little or no human labor. We extend the existing retrieval model of modeling dependency within query attributes to modeling dependency of query attributes on a large pool of weak attributes, which is more expressive and scalable. To efficiently learn such a large dependency model without overfitting, we further propose a semi-supervised graphical model to map each multi-attribute query to a subset of weak attributes. Through extensive experiments over several attribute benchmarks, we demonstrate consistent and significant performance improvements over the state-of-the-art techniques. In addition, we compile the largest multi-attribute image retrieval dataset to date, including 126 fully labeled query attributes and 6,000 weak attributes of 0.26 million images.
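As an illustrative sketch only (not the paper's semi-supervised graphical model), the retrieval step can be pictured as scoring images by a weighted combination of weak-attribute classifier scores selected for each query attribute; all names below (weak_scores, attr_to_weak_weights, rank_images) are hypothetical.

```python
# Minimal sketch: rank images by weak-attribute scores mapped from query attributes.
# The mapping here is hand-fixed; the paper learns it with a graphical model.
import numpy as np

rng = np.random.default_rng(0)

n_images, n_weak = 1000, 50                      # toy sizes
weak_scores = rng.random((n_images, n_weak))     # weak-attribute classifier scores in [0, 1]

# Hypothetical learned mapping: query attribute -> sparse weights over weak attributes.
attr_to_weak_weights = {
    "smiling": {3: 0.7, 12: 0.2, 40: 0.1},
    "outdoor": {7: 0.5, 19: 0.5},
}

def rank_images(query_attrs, scores, mapping, top_k=5):
    """Rank images by the summed, weighted weak-attribute scores for the query."""
    combined = np.zeros(scores.shape[0])
    for attr in query_attrs:
        for weak_idx, w in mapping[attr].items():
            combined += w * scores[:, weak_idx]
    return np.argsort(-combined)[:top_k]

print(rank_images(["smiling", "outdoor"], weak_scores, attr_to_weak_weights))
```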


international conference on computer vision | 2015

An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Yu Cheng; Felix X. Yu; Rogério Schmidt Feris; Sanjiv Kumar; Alok N. Choudhary; Shih-Fu Chang

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. The circulant structure substantially reduces the memory footprint and enables the use of the Fast Fourier Transform to speed up the computation. Considering a fully-connected neural network layer with d input nodes and d output nodes, this method improves the time complexity from O(d^2) to O(d log d) and the space complexity from O(d^2) to O(d). The space savings are particularly important for modern deep convolutional neural network architectures, where fully-connected layers typically contain more than 90% of the network parameters. We further show that the gradient computation and optimization of the circulant projections can be performed very efficiently. Our experiments on three standard datasets show that the proposed approach achieves this significant gain in storage and efficiency with minimal increase in error rate compared to neural networks with unstructured projections.
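The core trick is easy to verify numerically. Below is a minimal sketch, assuming NumPy and SciPy, showing that a circulant projection circ(r) @ x equals a circular convolution computed with FFTs in O(d log d); the paper's full layer additionally applies a random sign-flipping vector and a nonlinearity, which are omitted here.

```python
# Circulant matrix-vector product via FFT: circ(r) @ x = ifft(fft(r) * fft(x)).
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
d = 8
r = rng.standard_normal(d)   # first column of the circulant matrix (only d parameters stored)
x = rng.standard_normal(d)   # layer input

# Explicit O(d^2) projection for reference.
C = circulant(r)
y_dense = C @ x

# Same projection in O(d log d) via FFT.
y_fft = np.fft.ifft(np.fft.fft(r) * np.fft.fft(x)).real

print(np.allclose(y_dense, y_fft))   # True
```

Only the first column r (d values) needs to be stored, which is where the O(d^2) to O(d) memory saving comes from.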


acm multimedia | 2011

Active query sensing for mobile location search

Felix X. Yu; Rongrong Ji; Shih-Fu Chang

While much exciting progress is being made in mobile visual search, one important question has been left unexplored in all current systems. When the first query fails to find the right target (up to 50% likelihood), how should the user form his/her search strategy in the subsequent interaction? In this paper, we propose a novel Active Query Sensing system to suggest the best way of sensing the surrounding scenes while forming the second query for location search. We accomplish the goal by developing several unique components: an offline process for analyzing the saliency of the views associated with each geographical location based on score distribution modeling, predicting the visual search precision of individual views and locations, estimating the view of an unseen query, and suggesting the best subsequent view change. Using a scalable visual search system implemented over a NYC street view data set (0.3 million images), we show a performance gain as high as twofold, reducing the failure rate of mobile location search to only 12% after the second query. This work may open up an exciting new direction for developing interactive mobile media applications through innovative exploitation of active sensing and query formulation.
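A minimal sketch of the suggestion step, with purely synthetic data: given per-location, per-view search precisions estimated offline, and a belief over candidate locations after a failed query, the system could suggest the view with the highest expected precision. Variable and function names here are illustrative, not from the paper.

```python
# Suggest the next viewing angle with the highest expected search precision.
import numpy as np

rng = np.random.default_rng(0)
n_locations, n_views = 20, 8          # e.g. 8 discretized viewing angles per location

# Offline: estimated search precision of each (location, view) pair, e.g. derived
# from modeling self-retrieval score distributions (random here, for illustration).
view_precision = rng.random((n_locations, n_views))

# Online: belief over candidate locations after the failed first query.
location_posterior = rng.dirichlet(np.ones(n_locations))

def suggest_next_view(precision, posterior):
    """Return the view index maximizing expected search precision."""
    expected = posterior @ precision      # (n_views,) expected precision per view
    return int(np.argmax(expected)), expected

best_view, expected = suggest_next_view(view_precision, location_posterior)
print(best_view, expected[best_view])
```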


international conference on multimedia retrieval | 2014

Minimally Needed Evidence for Complex Event Recognition in Unconstrained Videos

Subhabrata Bhattacharya; Felix X. Yu; Shih-Fu Chang

This paper addresses a fundamental question: how do humans recognize complex events in videos? Normally, humans view videos in a sequential manner. We hypothesize that humans can make high-level inferences, such as whether an event is present in a video, by looking at a very small number of frames, not necessarily in a linear order. We attempt to verify this cognitive capability of humans and to discover the Minimally Needed Evidence (MNE) for each event. To this end, we introduce an online game-based event quiz facilitating selection of the minimal evidence required by humans to judge the presence or absence of a complex event in an open-source video. Each video is divided into a set of temporally coherent microshots (1.5 seconds in length) which are revealed only on player request. The player's task is to identify the positive and negative occurrences of the given target event with a minimal number of requests to reveal evidence. Incentives are given to players for correct identification with the minimal number of requests. Our extensive human study using the game quiz validates our hypothesis: 55% of videos need only one microshot for correct human judgment, and events of varying complexity require different amounts of evidence for human judgment. In addition, the proposed notion of MNE enables us to select discriminative features, drastically improving the speed and accuracy of a video retrieval system.
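For illustration only, the MNE of a video can be read off the quiz logs as the smallest number of revealed microshots with which any player judged the event correctly; the record layout and names below are hypothetical, not the paper's data format.

```python
# Derive per-video Minimally Needed Evidence from (hypothetical) quiz play records.
from collections import defaultdict

# (video_id, num_microshots_revealed, judged_correctly)
plays = [
    ("vid_01", 1, True),
    ("vid_01", 3, True),
    ("vid_01", 2, False),
    ("vid_02", 4, True),
    ("vid_02", 2, False),
]

def minimally_needed_evidence(records):
    """MNE per video: fewest revealed microshots among correct judgments."""
    mne = defaultdict(lambda: float("inf"))
    for video_id, revealed, correct in records:
        if correct:
            mne[video_id] = min(mne[video_id], revealed)
    return dict(mne)

print(minimally_needed_evidence(plays))   # {'vid_01': 1, 'vid_02': 4}
```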


acm multimedia | 2014

Modeling Attributes from Category-Attribute Proportions

Felix X. Yu; Liangliang Cao; Michele Merler; Noel C. F. Codella; Tao Chen; John R. Smith; Shih-Fu Chang

Attribute-based representation has been widely used in visual recognition and retrieval due to its interpretability and cross-category generalization properties. However, classic attribute learning requires manually labeling attributes on the images, which is very expensive and not scalable. In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute based on an N-dimensional category-attribute proportion vector, where each element of the vector characterizes the proportion of images in the corresponding category having the attribute. The attribute learning can be formulated as a learning from label proportions (LLP) problem. Our method is based on a newly proposed machine learning algorithm called ∝SVM. Finding the category-attribute proportions is much easier than manually labeling images, but it is still not a trivial task. We further propose to estimate the proportions from multiple modalities such as human commonsense knowledge, NLP tools, and other domain knowledge. The value of the proposed approach is demonstrated by various applications including modeling animal attributes, visual sentiment attributes, and scene attributes.
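A rough sketch of the learning-from-label-proportions idea, in the alternating spirit of ∝SVM: treat per-image attribute labels as latent, re-assign them within each category (bag) to match that category's known proportion, and retrain an ordinary classifier. The synthetic data, the use of logistic regression, and the fixed iteration count are simplifications, not the paper's exact solver.

```python
# Alternating LLP sketch: latent labels constrained by per-bag proportions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic bags: 3 categories x 100 images with 2-D features.
bags = [rng.normal(loc=m, scale=1.0, size=(100, 2)) for m in ([0, 0], [2, 2], [4, 0])]
proportions = [0.1, 0.8, 0.5]   # known fraction of images per category that have the attribute

X = np.vstack(bags)
bag_idx = np.repeat(np.arange(len(bags)), [len(b) for b in bags])

# Initialize latent labels randomly so each bag matches its proportion.
y = np.zeros(len(X), dtype=int)
for b, p in enumerate(proportions):
    members = np.where(bag_idx == b)[0]
    pos = rng.choice(members, size=int(round(p * len(members))), replace=False)
    y[pos] = 1

clf = LogisticRegression()
for _ in range(10):                       # alternate: fit classifier, re-assign labels per bag
    clf.fit(X, y)
    scores = clf.decision_function(X)
    for b, p in enumerate(proportions):
        members = np.where(bag_idx == b)[0]
        k = int(round(p * len(members)))
        order = members[np.argsort(-scores[members])]
        y[members] = 0
        y[order[:k]] = 1                  # top-k scored instances in the bag keep the attribute

# Predicted per-bag proportions should roughly match the given ones.
pred = clf.predict(X)
print([round(float(pred[bag_idx == b].mean()), 2) for b in range(len(bags))])
```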


computer vision and pattern recognition | 2017

Learning Discriminative and Transformation Covariant Local Feature Detectors

Xu Zhang; Felix X. Yu; Svebor Karaman; Shih-Fu Chang

Robust covariant local feature detectors are important for detecting local features that (1) are discriminative of the image content and (2) can be repeatably detected at consistent locations when the image undergoes diverse transformations. Such detectors are critical for applications such as image search and scene reconstruction. Many learning-based local feature detectors address one of these two problems while overlooking the other. In this work, we propose a novel learning-based method to simultaneously address both issues. Specifically, we extend the covariant constraint proposed by Lenc and Vedaldi [8] by defining the concepts of standard patch and canonical feature and leverage these to train a novel robust covariant detector. We show that the introduction of these concepts greatly simplifies the learning stage of the covariant detector, and also makes the detector much more robust. Extensive experiments show that our method outperforms previous hand-crafted and learning-based detectors by large margins in terms of repeatability.
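A toy numerical illustration of the covariant constraint (for translations only): the location a detector predicts on a shifted patch should equal the shifted version of its prediction on the original patch. The brightest-pixel detector below is a stand-in, not the paper's learned network.

```python
# Covariant constraint check: f(shift_t(P)) should equal f(P) + t (modulo patch size).
import numpy as np

rng = np.random.default_rng(0)

def toy_detector(patch):
    """Stand-in detector: return the (row, col) of the brightest pixel."""
    return np.array(np.unravel_index(np.argmax(patch), patch.shape), dtype=float)

def shift(patch, t):
    """Circularly shift a patch by integer offsets t = (dy, dx)."""
    return np.roll(patch, t, axis=(0, 1))

def covariant_loss(detector, patch, t):
    """Squared error between detecting-then-shifting and shifting-then-detecting."""
    lhs = detector(shift(patch, t))                                  # detect on the transformed patch
    rhs = (detector(patch) + np.array(t)) % np.array(patch.shape)    # transform the detection
    return float(np.sum((lhs - rhs) ** 2))

patch = rng.random((32, 32))
print(covariant_loss(toy_detector, patch, (3, -5)))   # ~0 for a perfectly covariant detector
```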


acm multimedia | 2012

Active query sensing: Suggesting the best query view for mobile visual search

Rongrong Ji; Felix X. Yu; Tongtao Zhang; Shih-Fu Chang

While much exciting progress is being made in mobile visual search, one important question has been left unexplored in all current systems. When searching objects or scenes in the 3D world, which viewing angle is more likely to be successful? More particularly, if the first query fails to find the right target, how should the user control the mobile camera to form the second query? In this article, we propose a novel Active Query Sensing system for mobile location search, which actively suggests the best subsequent query view to recognize the physical location in the mobile environment. The proposed system includes two unique components: (1) an offline process for analyzing the saliencies of different views associated with each geographical location, which predicts the location search precisions of individual views by modeling their self-retrieval score distributions; and (2) an online process for estimating the view of an unseen query and suggesting the best subsequent view change. Specifically, the optimal viewing angle change for the next query is chosen with an online information-theoretic approach. Using a scalable visual search system implemented over a NYC street view dataset (0.3 million images), we show a performance gain by reducing the failure rate of mobile location search to only 12% after the second query. We have also implemented an end-to-end functional system, including user interfaces on iPhones, client-server communication, and a remote search server. This work may open up an exciting new direction for developing interactive mobile media applications through the innovative exploitation of active sensing and query formulation.
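One way to picture the online information-theoretic view selection, as a hedged sketch with a synthetic retrieval confusion model rather than the paper's self-retrieval score distributions: pick the viewing angle whose retrieval outcome is expected to reduce uncertainty about the true location the most.

```python
# Choose the next view by expected information gain about the true location.
import numpy as np

rng = np.random.default_rng(0)
n_loc, n_views = 10, 6

prior = rng.dirichlet(np.ones(n_loc))                  # belief after the failed first query
# p_retrieved[v, i, j] = probability that a query of true location i, taken from view v,
# retrieves location j (synthetic; each row over j sums to 1).
p_retrieved = rng.dirichlet(np.ones(n_loc), size=(n_views, n_loc))

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_information_gain(view):
    """Mutual information between the true location and the retrieval outcome for this view."""
    h_prior = entropy(prior)
    gain = 0.0
    for j in range(n_loc):                              # possible retrieval outcomes
        p_obs = float(p_retrieved[view, :, j] @ prior)  # marginal prob of retrieving j
        if p_obs == 0:
            continue
        posterior = p_retrieved[view, :, j] * prior / p_obs   # Bayes update
        gain += p_obs * (h_prior - entropy(posterior))
    return gain

best = max(range(n_views), key=expected_information_gain)
print(best, expected_information_gain(best))
```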


acm multimedia | 2011

Intelligent query formulation for mobile visual search

Felix X. Yu

While much progress is being made in mobile visual search, most efforts focus on how to improve search performance (precision, recall, speed) given a query. How to help the user form a good query has generally been left unexplored. Successful mobile search should keep users in the loop and have the machine and user work closely as a team to solve the difficult problem: the user provides fast feedback on the fly and the machine does the data-intensive analysis. Therefore, helping the user to form a good query has great potential for improving search performance. We describe a novel framework, Active Query Sensing, to provide interactive query formulation solutions for mobile location search. We also discuss new research directions addressing the important open issues of interactive query formulation for mobile visual search.


acm multimedia | 2016

Tamp: A Library for Compact Deep Neural Networks with Structured Matrices

Bingchen Gong; Brendan Jou; Felix X. Yu; Shih-Fu Chang



Archive | 2013

Additional Remarks on Designing Category-Level Attributes for Discriminative Visual Recognition

Felix X. Yu; Liangliang Cao; Rogério Schmidt Feris; John R. Smith; Shih-Fu Chang

