Haonan Yu
Purdue University
Publications
Featured research published by Haonan Yu.
Computer Vision and Pattern Recognition | 2013
Yu Cao; Daniel Paul Barrett; Andrei Barbu; Siddharth Narayanaswamy; Haonan Yu; Aaron Michaux; Yuewei Lin; Sven J. Dickinson; Jeffrey Mark Siskind; Song Wang
Recognizing human activities in partially observed videos is a challenging problem with many practical applications. When the unobserved subsequence is at the end of the video, the problem reduces to activity prediction from an unfinished activity stream, which has been studied by many researchers. In the general case, however, an unobserved subsequence may occur at any point, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities from partially observed videos in this general case. Specifically, we formulate the problem in a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments, 2) using spatiotemporal features of the training video samples in each segment as bases and applying sparse coding (SC) to derive the activity likelihood of the test video sample at each segment, and 3) combining the likelihoods across segments to obtain a global posterior over the activities. We further extend the proposed method to include more bases that correspond to a mixture of segments with different temporal lengths (MSSC), which can better represent activities with large intra-class variations. We evaluate the proposed methods (SC and MSSC) on various real videos, and also on two special cases: 1) activity prediction, where the unobserved subsequence is at the end of the video, and 2) human activity recognition on fully observed videos. Experimental results show that the proposed methods outperform existing state-of-the-art comparison methods.
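The final combination step described above can be pictured with a minimal sketch, assuming the per-segment sparse-coding likelihoods have already been computed; the array layout, the uniform prior, and the conditional-independence assumption here are illustrative choices, not necessarily the paper's exact formulation:

```python
import numpy as np

def activity_posterior(segment_likelihoods, prior=None):
    """Combine per-segment activity likelihoods into a global posterior.

    segment_likelihoods: array of shape (num_segments, num_classes), where each
    row holds the sparse-coding-based likelihood of the observed test segment
    under each activity class (hypothetical representation).
    """
    L = np.asarray(segment_likelihoods, dtype=float)
    num_classes = L.shape[1]
    prior = np.full(num_classes, 1.0 / num_classes) if prior is None else np.asarray(prior)

    # Assume observed segments are conditionally independent given the class,
    # so the joint log-likelihood is the sum of per-segment log-likelihoods.
    log_joint = np.log(prior) + np.log(L + 1e-12).sum(axis=0)

    # Normalize in log space to obtain a posterior over activity classes.
    log_joint -= log_joint.max()
    posterior = np.exp(log_joint)
    return posterior / posterior.sum()
```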
Journal of Artificial Intelligence Research | 2015
Haonan Yu; N. Siddharth; Andrei Barbu; Jeffrey Mark Siskind
We present an approach to reasoning simultaneously about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models that represent the meanings of entire sentences, composed out of the meanings of the words in those sentences and mediated by a grammar that encodes predicate-argument relations. We demonstrate that these models faithfully represent the meanings of sentences and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) affect the meaning of a sentence and how it is grounded in video. We exploit this methodology in three ways. In the first, a video clip and a sentence are taken as input and the participants in the event described by the sentence are highlighted, even when the clip depicts multiple similar simultaneous events. In the second, a video clip is taken as input without a sentence and a sentence is generated that describes an event in that clip. In the third, a corpus of video clips is paired with sentences that describe some of the events in those clips, and the meanings of the words in those sentences are learned. We learn these meanings without needing to specify which attribute of the video clips each word in a given sentence refers to. The learned meaning representations are shown to be intelligible to humans.
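As a rough illustration of the compositional scoring idea, the sketch below sums per-word grounding scores over the tracks that the grammar's predicate-argument structure assigns to each word; the interface (word_scorers, role_assignment, tracks) is a hypothetical simplification, not the paper's actual model:

```python
def sentence_score(word_scorers, role_assignment, tracks):
    """Score a (video, sentence) pair by composing per-word groundings.

    word_scorers: dict mapping each word to a function that scores the track(s)
    filling that word's argument roles (hypothetical interface).
    role_assignment: dict mapping each word to the indices of the tracks that
    the grammar assigns to its argument roles.
    tracks: list of participant tracks extracted from the video clip.
    """
    total = 0.0
    for word, scorer in word_scorers.items():
        args = [tracks[i] for i in role_assignment[word]]
        # Each word contributes a log-score for its arguments; the sentence
        # score is their sum, so all words must be satisfied jointly.
        total += scorer(*args)
    return total
```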
International Journal of Computer Vision | 2017
Haonan Yu; Jeffrey Mark Siskind
Video object codiscovery can leverage the weak semantic constraints implied by sentences that describe the video content. Our codiscovery method, like other object codetection techniques, does not employ any pretrained object models or detectors. Unlike most prior work, which focuses on codetecting large objects that are salient in both size and appearance, our method can discover small or medium-sized objects, as well as objects that may be occluded for part of the video. More importantly, our method can codiscover multiple object instances of different classes within a single video clip. Although the semantic information employed is usually simple and weak, it can greatly boost performance by constraining the hypothesized object locations. Experiments show promising results on three datasets: an average IoU score of 0.423 on a new dataset with 15 object classes, 0.373 on a subset of CAD-120 with 5 object classes, and 0.358 on a subset of MPII-Cooking with 7 object classes. Our results on this subset of MPII-Cooking improve upon the previous state-of-the-art methods by significant margins.
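The IoU scores quoted above follow the standard intersection-over-union measure between a hypothesized box and a ground-truth box; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form (the corner representation is an assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```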
Machine Vision and Applications | 2016
Daniel Paul Barrett; Ran Xu; Haonan Yu; Jeffrey Mark Siskind
We make available to the community a new dataset to support action recognition research. This dataset differs from prior datasets in several key ways. It is significantly larger. It contains streaming video with long segments containing multiple action occurrences that often overlap in space and/or time. All actions were filmed against the same collection of backgrounds, so that the background gives little clue as to the action class. We had five humans replicate the annotation of the temporal extent of action occurrences, labeled with their classes, and measured a surprisingly low level of intercoder agreement. Baseline experiments show that recent state-of-the-art methods perform poorly on this dataset, suggesting that it will be a challenging dataset that can foster advances in action recognition research. This manuscript describes the novel content and characteristics of the LCA dataset, presents the design decisions made when filming it, documents the novel methods employed to annotate it, and presents the results of our baseline experiments.
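One plausible way to quantify intercoder agreement on temporal extents is mean pairwise temporal IoU across annotators, sketched below; the (start, end) interval representation and the pairwise-mean aggregation are assumptions for illustration, and the paper's actual agreement measure may differ:

```python
from itertools import combinations

def temporal_iou(seg_a, seg_b):
    """IoU of two temporal intervals given as (start, end) in frames or seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = max(seg_a[1], seg_b[1]) - min(seg_a[0], seg_b[0])
    return inter / union if union > 0 else 0.0

def pairwise_agreement(annotations):
    """Mean pairwise temporal IoU for one action occurrence.

    annotations: list of (start, end) intervals, one per annotator; assumes at
    least two annotators (hypothetical representation).
    """
    pairs = list(combinations(annotations, 2))
    return sum(temporal_iou(a, b) for a, b in pairs) / len(pairs)
```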
Meeting of the Association for Computational Linguistics | 2013
Haonan Yu; Jeffrey Mark Siskind
Archive | 2013
Jeffrey Mark Siskind; Andrei Barbu; Siddharth Narayanaswamy; Haonan Yu
National Conference on Artificial Intelligence | 2015
Haonan Yu; Jeffrey Mark Siskind
arXiv: Robotics | 2015
Daniel Paul Barrett; Scott Alan Bronikowski; Haonan Yu; Jeffrey Mark Siskind
arXiv: Computer Vision and Pattern Recognition | 2015
Haonan Yu; Jeffrey Mark Siskind
IEEE Transactions on Neural Networks | 2018
Daniel Paul Barrett; Scott Alan Bronikowski; Haonan Yu; Jeffrey Mark Siskind