Yuewei Lin
University of South Carolina
Publications
Featured research published by Yuewei Lin.
computer vision and pattern recognition | 2013
Yu Cao; Daniel Paul Barrett; Andrei Barbu; Siddharth Narayanaswamy; Haonan Yu; Aaron Michaux; Yuewei Lin; Sven J. Dickinson; Jeffrey Mark Siskind; Song Wang
Recognizing human activities in partially observed videos is a challenging problem with many practical applications. When the unobserved subsequence is at the end of the video, the problem reduces to activity prediction from an unfinished activity stream, which has been studied by many researchers. In the general case, however, an unobserved subsequence may occur at any time, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities from partially observed videos in this general case. Specifically, we formulate the problem in a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments, 2) using the spatiotemporal features of the training video samples in each segment as bases and applying sparse coding (SC) to derive the activity likelihood of the test video sample at each segment, and 3) combining the likelihoods of all segments to obtain a global posterior over the activities. We further extend the proposed method to include additional bases that correspond to a mixture of segments with different temporal lengths (MSSC), which better represents activities with large intra-class variations. We evaluate the proposed methods (SC and MSSC) on various real videos. We also evaluate the proposed methods on two special cases: 1) activity prediction, where the unobserved subsequence is at the end of the video, and 2) human activity recognition on fully observed videos. Experimental results show that the proposed methods outperform existing state-of-the-art comparison methods.
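A minimal sketch of the segment-wise sparse-coding idea described above, assuming per-segment spatiotemporal feature vectors and a dictionary of training features per segment; the reconstruction error is turned into a per-segment likelihood and the observed segments' likelihoods are combined into a global score. Function names, shapes, and the exponential likelihood are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: per-segment sparse-coding likelihoods for a partially
# observed activity video (names and shapes are illustrative assumptions).
import numpy as np
from sklearn.decomposition import SparseCoder

def segment_likelihood(test_feat, segment_dictionary):
    """Likelihood of one observed segment from its sparse reconstruction error.

    test_feat          : (d,) spatiotemporal feature of the test segment
    segment_dictionary : (n_atoms, d) training features (bases) for this segment
    """
    coder = SparseCoder(dictionary=segment_dictionary,
                        transform_algorithm="lasso_lars",
                        transform_alpha=0.1)
    code = coder.transform(test_feat[None, :])       # (1, n_atoms) sparse code
    recon = code @ segment_dictionary                # reconstruction from the bases
    err = np.linalg.norm(test_feat - recon[0])
    return np.exp(-err)                              # smaller error -> higher likelihood

def activity_posterior(observed_segments, dictionaries_per_activity):
    """Combine per-segment likelihoods into a score for each activity class.

    observed_segments         : dict {segment_index: feature vector}, observed parts only
    dictionaries_per_activity : dict {activity: {segment_index: (n_atoms, d) bases}}
    """
    scores = {}
    for activity, seg_dicts in dictionaries_per_activity.items():
        logp = 0.0
        for s, feat in observed_segments.items():    # unobserved segments are skipped
            logp += np.log(segment_likelihood(feat, seg_dicts[s]) + 1e-12)
        scores[activity] = logp
    return max(scores, key=scores.get), scores
```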
computer vision and pattern recognition | 2015
Xiaochuan Fan; Kang Zheng; Yuewei Lin; Song Wang
We propose a new learning-based method for estimating 2D human pose from a single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN). Recently, many methods have been developed to estimate human pose by using pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective. In this paper, we propose to integrate both the local (body) part appearance and the holistic view of each local part for more accurate human pose estimation. Specifically, the proposed DS-CNN takes a set of image patches (category-independent object proposals for training and multi-scale sliding windows for testing) as input and learns the appearance of each local part while considering its holistic view within the full body. Using DS-CNN, we achieve both joint detection, which determines whether an image patch contains a body joint, and joint localization, which finds the exact location of the joint within the image patch. Finally, we develop an algorithm that combines the joint detection/localization results from all the image patches to estimate the human pose. Experimental results show the effectiveness of the proposed method in comparison with state-of-the-art human-pose estimation methods based on pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective.
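A sketch of the final aggregation step only: patch-level joint-detection scores and joint-localization offsets, assumed to come from an already-trained two-branch network, are fused into one image-level estimate per joint by weighted voting. The box format, normalization, and function names are assumptions for illustration, not the paper's exact combination algorithm.

```python
# Illustrative fusion of patch-level detection and localization outputs (assumed inputs).
import numpy as np

def aggregate_joint_votes(patch_boxes, det_probs, local_offsets, n_joints):
    """
    patch_boxes   : (P, 4) sliding-window boxes as (x, y, w, h)
    det_probs     : (P, n_joints) probability that each patch contains each joint
    local_offsets : (P, n_joints, 2) predicted joint position inside each patch,
                    normalized to [0, 1] relative to the patch
    returns       : (n_joints, 2) estimated joint locations in image coordinates
    """
    joints = np.zeros((n_joints, 2))
    for j in range(n_joints):
        w = det_probs[:, j]
        # map each patch-relative prediction back to image coordinates
        xy = patch_boxes[:, :2] + local_offsets[:, j, :] * patch_boxes[:, 2:4]
        # detection-probability-weighted average of the per-patch votes
        joints[j] = (w[:, None] * xy).sum(axis=0) / (w.sum() + 1e-8)
    return joints
```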
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Yuewei Lin; Yuan Yan Tang; Bin Fang; Zhaowei Shang; Yong-Hui Huang; Song Wang
This paper introduces a new computational visual-attention model for constructing static and dynamic saliency maps. First, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, instead of the Difference-of-Gaussian filter widely used in previous visual-attention models. Second, we propose two biologically inspired nonlinear operations for combining different features: combining subsets of basic features into a set of super features using the Lm-norm, and then combining the super features using a Winner-Take-All mechanism. Third, we extend the proposed model to construct dynamic saliency maps from videos by using EMD to compute the center-surround difference in the spatiotemporal receptive field. We evaluate the performance of the proposed model on both static image data and video data. Comparison results show that the proposed model outperforms several existing models under a unified evaluation setting.
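A simplified sketch of the EMD-based center-surround difference and the two fusion operations, assuming grayscale intensity as the only basic feature: the 1-D Wasserstein distance between center and surround pixel distributions stands in for the full EMD, and the window radii, step size, and m are illustrative choices rather than the paper's settings.

```python
# Simplified center-surround EMD saliency plus Lm-norm / winner-take-all fusion (assumptions noted above).
import numpy as np
from scipy.stats import wasserstein_distance

def center_surround_emd(image, cy, cx, r_center=8, r_surround=24):
    """EMD between the intensity distributions of a center patch and its surround ring."""
    ys, xs = np.ogrid[:image.shape[0], :image.shape[1]]
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    center = image[dist <= r_center]
    surround = image[(dist > r_center) & (dist <= r_surround)]
    return wasserstein_distance(center.ravel(), surround.ravel())

def saliency_map(image, step=16):
    """Block-wise saliency from the center-surround EMD (coarse, for illustration)."""
    sal = np.zeros(image.shape)
    for cy in range(0, image.shape[0], step):
        for cx in range(0, image.shape[1], step):
            sal[cy:cy + step, cx:cx + step] = center_surround_emd(image, cy, cx)
    return sal / (sal.max() + 1e-8)

def lm_norm_combine(feature_maps, m=3.0):
    """Combine a subset of (nonnegative) basic feature maps into one super feature via the Lm-norm."""
    stacked = np.stack(feature_maps)
    return np.mean(stacked ** m, axis=0) ** (1.0 / m)

def winner_take_all(super_maps):
    """Keep, at each location, only the strongest super-feature response."""
    return np.stack(super_maps).max(axis=0)
```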
Neurocomputing | 2013
Weibin Yang; Yuan Yan Tang; Bin Fang; Zhao Wei Shang; Yuewei Lin
This paper proposes a novel method for visual saliency detection based on a universal probabilistic model, which measures saliency by combining low-level features with a location prior. We view the task of estimating visual saliency as searching for the most conspicuous parts of an image, and extract the saliency map by computing the dissimilarity between different regions. We simulate the shifting of the center of the human visual field and describe how this center-shift process affects visual saliency. Furthermore, multiscale analysis is adopted to improve the robustness of our model. Experimental results on three public image datasets show that the proposed approach outperforms 18 state-of-the-art methods for both salient object detection and human eye-fixation prediction.
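An illustrative sketch of the general formulation: a region's saliency is its feature dissimilarity to all other regions, weighted by a Gaussian location prior centered at the (possibly shifted) center of the visual field. The region features, the prior width, and the center-shift rule below are assumptions made for this sketch, not the paper's exact model.

```python
# Region-level saliency with a location prior and a simple center-shift step (illustrative only).
import numpy as np

def region_saliency(features, centers, field_center, sigma=0.25):
    """
    features     : (R, d) mean feature of each region
    centers      : (R, 2) normalized (x, y) centroid of each region
    field_center : (2,) current center of the simulated visual field
    """
    prior = np.exp(-np.sum((centers - field_center) ** 2, axis=1) / (2 * sigma ** 2))
    sal = np.zeros(len(features))
    for i in range(len(features)):
        dissim = np.linalg.norm(features - features[i], axis=1)  # dissimilarity to all regions
        sal[i] = prior[i] * dissim.mean()
    return sal / (sal.max() + 1e-8)

def shift_center(centers, sal):
    """Move the simulated field center toward the currently most salient region."""
    return centers[np.argmax(sal)]
```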
international conference on computer vision | 2015
Yuewei Lin; Kareem Abdelfatah; Youjie Zhou; Xiaochuan Fan; Hongkai Yu; Hui Qian; Song Wang
Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even when a group of people with similar appearance are present in the view. More specifically, we detect a set of persons in each frame as CIP candidates and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns across different videos and high spatiotemporal consistency within each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm.
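A simplified sketch of the core cue only: candidates' motion patterns are correlated across two temporally synchronized videos and the best-agreeing pair is selected. The paper builds a CRF with per-video spatiotemporal consistency on top of such cues; the descriptor format and normalized-cross-correlation score here are assumptions for illustration.

```python
# Cross-video motion-pattern correlation between CIP candidates (simplified stand-in for the CRF).
import numpy as np

def motion_correlation(track_a, track_b):
    """track_* : (T, d) per-frame motion descriptors (e.g., flow magnitude/direction)."""
    a = (track_a - track_a.mean(0)) / (track_a.std(0) + 1e-8)
    b = (track_b - track_b.mean(0)) / (track_b.std(0) + 1e-8)
    return float((a * b).mean())          # normalized cross-correlation over time

def best_cip_pair(candidates_video1, candidates_video2):
    """Pick the cross-video candidate pair whose motion patterns agree the most."""
    scores = np.array([[motion_correlation(a, b) for b in candidates_video2]
                       for a in candidates_video1])
    i, j = np.unravel_index(scores.argmax(), scores.shape)
    return i, j, scores[i, j]
```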
computer vision and pattern recognition | 2016
Hongkai Yu; Youjie Zhou; Jeff P. Simmons; Craig Przybyla; Yuewei Lin; Xiaochuan Fan; Yang Mi; Song Wang
Automatic tracking of large-scale crowded targets is of particular importance in many applications, such as crowded people/vehicle tracking in video surveillance, fiber tracking in materials science, and cell tracking in biomedical imaging. This problem becomes very challenging when the targets show similar appearance and the inter-slice/inter-frame continuity is low due to sparse sampling, camera motion, and target occlusion. The main challenge comes from the association step, which aims at matching the predictions and the observations of the multiple targets. In this paper, we propose a new groupwise method that explores the target group information and employs the within-group correlations for association and tracking. In particular, the within-group association is modeled by a non-rigid 2D Thin-Plate transform, and a sequence of group shrinking, group growing, and group merging operations is then developed to refine the composition of each group. We apply the proposed method to track large-scale fibers in microscopy material images and compare its performance against several other multi-target tracking methods. We also apply the proposed method to track crowded people in videos with poor inter-frame continuity.
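A hypothetical sketch of groupwise association: one group's targets are matched to next-frame detections with a cost that penalizes deviation from the group's common motion. A shared translation stands in for the paper's non-rigid Thin-Plate transform, and the group shrinking/growing/merging operations are omitted; all names and the cost weighting are assumptions.

```python
# Group-consistent assignment between consecutive slices/frames (simplified stand-in).
import numpy as np
from scipy.optimize import linear_sum_assignment

def groupwise_associate(prev_pts, next_pts, lam=1.0):
    """
    prev_pts : (N, 2) positions of one group's targets in the current slice/frame
    next_pts : (M, 2) candidate detections in the next slice/frame
    """
    group_shift = next_pts.mean(axis=0) - prev_pts.mean(axis=0)   # crude common group motion
    pred = prev_pts + group_shift                                 # group-motion prediction
    # cost = distance to the prediction + penalty for disagreeing with the group motion
    d_pred = np.linalg.norm(pred[:, None, :] - next_pts[None, :, :], axis=2)
    d_raw = np.linalg.norm(prev_pts[:, None, :] - next_pts[None, :, :], axis=2)
    cost = d_pred + lam * np.abs(d_raw - np.linalg.norm(group_shift))
    rows, cols = linear_sum_assignment(cost)                      # optimal one-to-one matching
    return list(zip(rows.tolist(), cols.tolist()))
```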
IEEE Transactions on Nanobioscience | 2014
Jing Chen; Yuan Yan Tang; C. L. Philip Chen; Bin Fang; Yuewei Lin; Zhaowei Shang
Protein subcellular location prediction aims to predict, using computational methods, the location where a protein resides within a cell. Considering the main limitations of the existing methods, we propose a hierarchical multi-label learning model, FHML, for both single-location and multi-location proteins. Latent concepts are extracted through feature-space decomposition and label-space decomposition under a nonnegative data factorization framework. The extracted latent concepts are used as a codebook to indirectly connect the protein features to their annotations. We construct dual fuzzy hypergraphs to capture the intrinsic high-order relations embedded in not only the feature space but also the label space. Finally, the subcellular location annotation information is propagated from the labeled proteins to the unlabeled proteins by performing dual fuzzy hypergraph Laplacian regularization. Experimental results on six protein benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art methods and illustrate the benefit of exploiting both feature correlations and label correlations.
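A simplified sketch of the propagation step only: multi-label annotations spread from labeled to unlabeled proteins via graph-based regularization. An ordinary k-NN affinity graph stands in for the paper's dual fuzzy hypergraphs and latent-concept codebook; the kernel, k, and iteration scheme are assumptions.

```python
# Graph-based multi-label propagation (simplified stand-in for the dual fuzzy hypergraph model).
import numpy as np

def propagate_labels(features, Y, labeled_mask, k=10, alpha=0.9, iters=50):
    """
    features     : (n, d) protein feature vectors
    Y            : (n, c) multi-label matrix (rows of unlabeled proteins are zero)
    labeled_mask : (n,) boolean, True where annotations are known
    """
    # Gaussian affinity restricted to a symmetric k-NN graph
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (np.median(d2) + 1e-8))
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    W = np.where(mask | mask.T, W, 0.0)

    S = W / (W.sum(1, keepdims=True) + 1e-8)    # row-normalized transition matrix
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y     # diffuse, then pull back toward the seeds
        F[labeled_mask] = Y[labeled_mask]       # clamp the known annotations
    return F                                     # per-location scores for every protein
```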
european conference on computer vision | 2014
Kang Zheng; Yuewei Lin; Youjie Zhou; Dhaval Salvi; Xiaochuan Fan; Dazhou Guo; Zibo Meng; Song Wang
This paper focuses on developing a new approach for video-based action detection in which a set of temporally synchronized videos is taken by multiple wearable cameras from different and varying views, and our goal is to accurately localize the starting and ending time of each instance of the actions of interest in such videos. Compared with traditional approaches based on fixed-camera videos, this new approach incorporates the visual attention of the camera wearers and allows action detection over a larger area, although it brings new challenges such as unconstrained camera motion. In this approach, we leverage the multi-view information and the temporal synchronization of the input videos for more reliable action detection. Specifically, we detect and track the focal character in each video and conduct action recognition only for the focal character within each temporal sliding window. To more accurately localize the starting and ending time of actions, we develop a strategy that merges temporally adjacent sliding windows when detecting durative actions and non-maximally suppresses temporally adjacent sliding windows when detecting momentary actions. Finally, we propose a voting scheme that integrates the detection results from multiple videos for more accurate action detection. For the experiments, we collect a new dataset of multiple wearable-camera videos that reflects the complex scenarios encountered in practice.
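A minimal sketch of the window post-processing and cross-video voting described above: temporally adjacent positive windows are merged for durative actions, temporal non-maximum suppression keeps only locally strongest windows for momentary actions, and a detection is accepted when enough videos agree. The window tuple format, thresholds, and voting rule are assumptions for illustration.

```python
# Sliding-window merging, temporal NMS, and cross-video voting (illustrative assumptions).
def merge_durative(windows, gap=0):
    """windows: list of (start, end, score); merge windows that touch or overlap."""
    merged = []
    for s, e, sc in sorted(windows):
        if merged and s <= merged[-1][1] + gap:
            ps, pe, psc = merged[-1]
            merged[-1] = (ps, max(pe, e), max(psc, sc))
        else:
            merged.append((s, e, sc))
    return merged

def nms_momentary(windows, min_sep):
    """Keep a window only if no higher-scoring window starts within min_sep frames."""
    kept = []
    for s, e, sc in sorted(windows, key=lambda w: -w[2]):
        if all(abs(s - ks) >= min_sep for ks, _, _ in kept):
            kept.append((s, e, sc))
    return sorted(kept)

def vote_across_videos(per_video_detections, min_votes=2):
    """Accept a detection if at least min_votes videos report an overlapping window."""
    accepted = []
    for i, dets in enumerate(per_video_detections):
        for s, e, sc in dets:
            votes = sum(any(ws < e and s < we for ws, we, _ in other)
                        for j, other in enumerate(per_video_detections) if j != i)
            if votes + 1 >= min_votes:
                accepted.append((s, e, sc))
    return merge_durative(accepted)
```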
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Yuewei Lin; Jing Chen; Yu Cao; Youjie Zhou; Lingfeng Zhang; Yuan Yan Tang; Song Wang
This paper introduces a new method to solve the cross-domain recognition problem. Unlike traditional domain adaptation methods, which rely on a global domain shift for all classes between the source and target domains, the proposed method is more flexible in capturing individual class variations across domains. By adopting the natural and widely used assumption that data samples from the same class should lie on an intrinsic low-dimensional subspace, even if they come from different domains, the proposed method circumvents the limitation of the global domain shift and solves cross-domain recognition by finding the joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct a subspace for each of the classes. We then construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to belong to the same class. The corresponding class label is then assigned by minimizing a cost function that reflects the overlap and the topological structure consistency between subspaces across the source and target domains and within the anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the joint subspaces. Subsequently, one-versus-rest support vector machine classifiers are trained using the data samples belonging to the same joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: 1) an object recognition dataset for computer vision tasks and 2) a sentiment classification dataset for natural language processing tasks. Comparison results demonstrate that the proposed method outperforms the comparison methods on both datasets.
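A minimal sketch of the subspace idea under simplifying assumptions: per-class PCA subspaces are built in the source domain, compact clusters of unlabeled target samples serve as anchor subspaces and are assigned to the class whose subspace reconstructs them best, and a one-versus-rest linear SVM is trained on the joined samples. The paper's full cost (subspace overlap plus topological consistency) is replaced here by a simple reconstruction error; all parameter names and values are illustrative.

```python
# Joint-subspace construction via per-class PCA and anchor clusters (simplified stand-in).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def joint_subspace_classifier(Xs, ys, Xt, n_anchors=10, dim=10):
    """Xs, ys: labeled source samples; Xt: unlabeled target samples."""
    classes = np.unique(ys)
    # one PCA subspace per source class (dimension capped by the class size)
    pcas = {c: PCA(n_components=max(1, min(dim, (ys == c).sum() - 1))).fit(Xs[ys == c])
            for c in classes}

    # anchor subspaces: compact clusters of unlabeled target samples
    anchor_ids = KMeans(n_clusters=n_anchors, n_init=10).fit_predict(Xt)

    X_aug, y_aug = [Xs], [ys]
    for a in range(n_anchors):
        A = Xt[anchor_ids == a]
        # assign the anchor to the class whose subspace reconstructs it best
        errs = {c: np.linalg.norm(A - pcas[c].inverse_transform(pcas[c].transform(A)))
                for c in classes}
        c_star = min(errs, key=errs.get)
        X_aug.append(A)
        y_aug.append(np.full(len(A), c_star))

    # one-versus-rest linear SVM trained on source samples plus pseudo-labeled anchors
    clf = LinearSVC().fit(np.vstack(X_aug), np.concatenate(y_aug))
    return clf.predict(Xt)
```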
IEEE Transactions on Circuits and Systems for Video Technology | 2017
Yuewei Lin; Yan Tong; Yu Cao; Youjie Zhou; Song Wang
Motion is one of the most important cues for separating foreground objects from the background in a video. With a stationary camera, it is usually assumed that the background is static while the foreground objects are moving most of the time. In practice, however, foreground objects may show infrequent motion, such as abandoned objects and sleeping persons, while the background may contain frequent local motions, such as waving trees and/or grass. Such complexities may prevent existing background subtraction algorithms from correctly identifying the foreground objects. In this paper, we propose a new approach that can detect foreground objects with frequent and/or infrequent motions. Specifically, we use a visual-attention mechanism to infer a complete background from a subset of frames and then propagate it to the other frames for accurate background subtraction. Furthermore, we develop a feature-matching-based local motion stabilization algorithm to identify frequent local motions in the background and thereby reduce false positives in the detected foreground. The proposed approach is fully unsupervised, without using any supervised learning for object detection and tracking. Extensive experiments on a large number of videos demonstrate that the proposed approach outperforms state-of-the-art motion detection and background subtraction methods.
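A simplified sketch of the pipeline's core step: a background image is inferred from a subset of frames and then subtracted from every frame. A temporal median over sampled frames stands in for the paper's visual-attention-based background inference, and the feature-matching local-motion stabilization is omitted; the sampling count and threshold are assumptions.

```python
# Background inference and subtraction over a grayscale video (simplified stand-in).
import numpy as np

def infer_background(frames, n_samples=50):
    """frames: (T, H, W) grayscale video; temporal median of sampled frames as the background."""
    idx = np.linspace(0, len(frames) - 1, min(n_samples, len(frames))).astype(int)
    return np.median(frames[idx], axis=0)

def foreground_masks(frames, background, thresh=25):
    """Mark pixels that differ from the inferred background by more than `thresh`."""
    return np.abs(frames.astype(float) - background[None]) > thresh
```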