Adriana Kovashka
University of Pittsburgh
Publication
Featured research published by Adriana Kovashka.
computer vision and pattern recognition | 2010
Adriana Kovashka; Kristen Grauman
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
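The first stage of the pipeline described above (quantizing local motion/appearance descriptors to a visual vocabulary and forming a bag-of-words representation per video) can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the descriptor dimensionality and vocabulary size are placeholder values.

```python
# Minimal sketch of visual-vocabulary quantization for space-time features.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, vocab_size=200, seed=0):
    """Cluster local motion/appearance descriptors into visual words."""
    kmeans = KMeans(n_clusters=vocab_size, random_state=seed, n_init=10)
    kmeans.fit(descriptors)
    return kmeans

def video_histogram(kmeans, video_descriptors):
    """Quantize one video's descriptors and count word occurrences."""
    words = kmeans.predict(video_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalized bag-of-words

# Random stand-in features; real ones would be descriptors extracted at
# space-time interest points in the training videos.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5000, 72))
vocab = build_vocabulary(train_feats, vocab_size=50)
print(video_histogram(vocab, rng.normal(size=(300, 72))).shape)
```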
International Journal of Computer Vision | 2015
Adriana Kovashka; Kristen Grauman
To learn semantic attributes, existing methods typically train one discriminative model for each word in a vocabulary of nameable properties. However, this “one model per word” assumption is problematic: while a word might have a precise linguistic definition, it need not have a precise visual definition. We propose to discover shades of attribute meaning. Given an attribute name, we use crowdsourced image labels to discover the latent factors underlying how different annotators perceive the named concept. We show that structure in those latent factors helps reveal shades, that is, interpretations for the attribute shared by some group of annotators. Using these shades, we train classifiers to capture the primary (often subtle) variants of the attribute. The resulting models are both semantic and visually precise. By catering to users’ interpretations, they improve attribute prediction accuracy on novel images. Shades also enable more successful attribute-based image search, by providing robust personalized models for retrieving multi-attribute query results. They are widely applicable to tasks that involve describing visual content, such as zero-shot category learning and organization of photo collections.
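One way to read the shade-discovery idea is as factoring an annotator-by-image label matrix into latent factors and then grouping annotators by how they use those factors. The sketch below is a hedged illustration of that reading, not the paper's exact model; the matrix sizes, the choice of NMF, and the number of shades are assumptions.

```python
# Sketch: latent factors over crowdsourced labels, then annotator grouping.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Rows: 40 annotators; columns: 100 images; entries: binary attribute labels.
labels = (rng.random((40, 100)) > 0.5).astype(float)

# Latent factors underlying how annotators use the attribute term.
nmf = NMF(n_components=5, init="nndsvda", random_state=1, max_iter=500)
annotator_factors = nmf.fit_transform(labels)   # 40 x 5 factor weights

# Annotators with similar factor usage share a "shade" of the attribute.
shades = KMeans(n_clusters=3, random_state=1, n_init=10).fit_predict(annotator_factors)
print("annotators per shade:", np.bincount(shades))
```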
computer vision and pattern recognition | 2016
Christopher Thomas; Adriana Kovashka
We introduce the novel problem of identifying the photographer behind a photograph. To explore the feasibility of current computer vision techniques for this problem, we created a new dataset of over 180,000 images taken by 41 well-known photographers. Using this dataset, we examined the effectiveness of a variety of features (low- and high-level, including CNN features) at identifying the photographer. We also trained a new deep convolutional neural network for this task. Our results show that high-level features greatly outperform low-level features. We provide qualitative results using these learned models that give insight into our method's ability to distinguish between photographers, and allow us to draw interesting conclusions about what specific photographers shoot. We also demonstrate two applications of our method.
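The quantitative setup described above can be approximated by feeding high-level image features to a linear classifier over the 41 photographer identities. The sketch below uses random stand-in features in place of real CNN activations, so it only illustrates the plumbing, not the reported results.

```python
# Sketch: high-level features -> linear classifier over photographer identity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
num_images, feat_dim, num_photographers = 2000, 2048, 41
features = rng.normal(size=(num_images, feat_dim))        # placeholder CNN features
photographer = rng.integers(0, num_photographers, num_images)

X_tr, X_te, y_tr, y_te = train_test_split(features, photographer, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))  # near chance on random data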
computer vision and pattern recognition | 2017
Zaeem Hussain; Mingda Zhang; Xiaozhong Zhang; Keren Ye; Christopher Thomas; Zuha Agha; Nathan Ong; Adriana Kovashka
There is more to images than their objective physical content: for example, advertisements are created to persuade a viewer to take a certain action. We propose the novel problem of automatic advertisement understanding. To enable research on this problem, we create two datasets: an image dataset of 64,832 image ads, and a video dataset of 3,477 ads. Our data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer (What should I do according to this ad, and why should I do it?), and symbolic references ads make (e.g. a dove symbolizes peace). We also analyze the most common persuasive strategies ads use, and the capabilities that computer vision systems should have to understand these strategies. We present baseline classification results for several prediction tasks, including automatically answering questions about the messages of the ads.
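For concreteness, one possible in-memory representation of a single ad's annotations (topic, sentiment, action/reason question-answers, and symbolic references) is sketched below; the field names are illustrative and are not the dataset's actual schema.

```python
# Sketch of a per-ad annotation record for a baseline prediction task.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AdAnnotation:
    image_path: str
    topic: str                                              # e.g. "cars", "charity"
    sentiment: str                                          # e.g. "amused", "alarmed"
    action_reason_qa: List[str] = field(default_factory=list)  # "I should ... because ..."
    symbols: List[str] = field(default_factory=list)           # e.g. a dove -> "peace"

example = AdAnnotation(
    image_path="ads/00001.jpg",
    topic="environment",
    sentiment="concerned",
    action_reason_qa=["I should recycle because the ad shows waste harming wildlife."],
    symbols=["peace"],
)
print(example.topic, example.symbols)
```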
workshop on applications of computer vision | 2017
Debashis Ganguly; Mohammad H. Mofrad; Adriana Kovashka
While the abundance of visual content on the Internet and the ease with which all users can access it allow us to find relevant content quickly, they also pose challenges. For example, if a parent wants to restrict the visual content their child can see, this content needs either to be automatically tagged as offensive or not, or a computer vision algorithm needs to be trained to detect offensive content. One type of potentially offensive content is sexually explicit or provocative imagery. An image may be sexually provocative if it portrays nudity, but the sexual innuendo could also be contained in the body posture or facial expression of the human subject shown in the photo. Existing methods simply analyze skin exposure, but fail to capture the hidden intent behind images. Thus, they are unable to capture several important ways in which an image might be sexually provocative, and hence offensive to children. We propose to address this problem by extracting a unified feature descriptor comprising the percentage of skin exposure, the body posture of the human in the image, and his/her gestures and facial expressions. We learn to predict these cues, then train a hierarchical model which combines them. We show in experiments that this model more accurately detects sexual innuendos behind images.
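A simple two-stage reading of the cue-combination step described above: train one predictor per cue (skin exposure, body posture, facial expression), then feed their scores to a second-stage classifier. This is a hedged sketch with random stand-in features, not the paper's hierarchical model.

```python
# Sketch: per-cue predictors followed by a second-stage combiner.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
skin_feat = rng.normal(size=(n, 10))
pose_feat = rng.normal(size=(n, 20))
face_feat = rng.normal(size=(n, 15))
provocative = rng.integers(0, 2, n)

# Stage 1: one predictor per cue.
cue_feats = (skin_feat, pose_feat, face_feat)
cue_models = [LogisticRegression(max_iter=500).fit(f, provocative) for f in cue_feats]
cue_scores = np.column_stack([m.predict_proba(f)[:, 1]
                              for m, f in zip(cue_models, cue_feats)])

# Stage 2: combine the cue scores into a final decision.
fusion = LogisticRegression().fit(cue_scores, provocative)
print("training accuracy:", fusion.score(cue_scores, provocative))
```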
Archive | 2017
Adriana Kovashka; Kristen Grauman
Image retrieval is a computer vision application that people encounter in their everyday lives. To enable accurate retrieval results, a human user needs to be able to communicate in a rich and noiseless way with the retrieval system. We propose semantic visual attributes as a communication channel for search because they are commonly used by humans to describe the world around them. We first propose a new feedback interaction where users can directly comment on how individual properties of retrieved content should be adjusted to more closely match the desired visual content. We then show how to ensure this interaction is as informative as possible, by having the vision system ask those questions that will most increase its certainty over what content is relevant. To ensure that attribute-based statements from the user are not misinterpreted by the system, we model the unique ways in which users employ attribute terms, and develop personalized attribute models. We discover clusters among users in terms of how they use a given attribute term, and consequently discover the distinct “shades of meaning” of these attributes. Our work is a significant step in the direction of bridging the semantic gap between high-level user intent and low-level visual features. We discuss extensions to further increase the utility of attributes for practical search applications.
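The attribute-feedback interaction described above can be illustrated with precomputed attribute scores: a statement such as "show me images that are more formal than this one" prunes the candidate set relative to a reference image. The sketch below uses random stand-in scores and an assumed two-attribute vocabulary.

```python
# Sketch of attribute-comparison relevance feedback over precomputed scores.
import numpy as np

rng = np.random.default_rng(4)
num_images = 1000
attribute_scores = {"formal": rng.random(num_images), "shiny": rng.random(num_images)}

def apply_feedback(candidates, attribute, reference_idx, direction):
    """Keep candidates whose attribute score is above/below the reference image's."""
    ref = attribute_scores[attribute][reference_idx]
    scores = attribute_scores[attribute][candidates]
    mask = scores > ref if direction == "more" else scores < ref
    return candidates[mask]

candidates = np.arange(num_images)
candidates = apply_feedback(candidates, "formal", reference_idx=17, direction="more")
candidates = apply_feedback(candidates, "shiny", reference_idx=17, direction="less")
print(len(candidates), "images remain after two feedback statements")
```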
computer vision and pattern recognition | 2016
Christopher Thomas; Adriana Kovashka; Donald M. Chiarulli; Steven P. Levitan
We present a new top-down and bottom-up saliency algorithm designed to exploit the capabilities of coupled oscillators: an ultra-low-power, high performance, non-boolean computer architecture designed to serve as a special purpose embedded accelerator for vision applications. To do this, we extend a widely used neuromorphic bottom-up saliency pipeline by introducing a top-down channel which looks for objects of a particular type. The proposed channel relies on a segmentation of the input image to identify exemplar object segments resembling those encountered in training. The channel leverages pre-computed bottom-up feature maps to produce a novel scale-invariant descriptor for each segment with little computational overhead. We also introduce a new technique to automatically determine exemplar segments during training, without the need for annotations per segment. We evaluate our method on both NeoVision2 DARPA challenge datasets, illustrating significant gains in performance compared to all baseline approaches.
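A hedged sketch of the fusion idea described above: score each precomputed segment by its similarity to exemplar object segments (the top-down channel), broadcast those scores back to pixels, and combine with a bottom-up saliency map. All maps, descriptors, and the fixed fusion weights below are stand-ins, not the proposed architecture.

```python
# Sketch: combining a bottom-up saliency map with a top-down segment channel.
import numpy as np

rng = np.random.default_rng(5)
H, W = 64, 64
bottom_up = rng.random((H, W))                 # e.g. from color/intensity/orientation channels
segments = rng.integers(0, 20, size=(H, W))    # a precomputed segmentation (20 segments)
seg_desc = rng.normal(size=(20, 32))           # one descriptor per segment
exemplars = rng.normal(size=(5, 32))           # descriptors of exemplar object segments

# Top-down score per segment: similarity to the closest exemplar.
dists = np.linalg.norm(seg_desc[:, None, :] - exemplars[None, :, :], axis=-1)
seg_score = 1.0 / (1.0 + dists.min(axis=1))
top_down = seg_score[segments]                 # broadcast segment scores back to pixels

saliency = 0.5 * bottom_up + 0.5 * top_down    # simple fixed-weight fusion
print(saliency.shape, saliency.max())
```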
computer vision and pattern recognition | 2012
Adriana Kovashka; Devi Parikh; Kristen Grauman
international conference on computer vision | 2011
Adriana Kovashka; Sudheendra Vijayanarasimhan; Kristen Grauman
international conference on computer vision | 2013
Adriana Kovashka; Kristen Grauman